## Statistical Analysis

Statistical analysis is the study of mathematics used to explain groups of data, or **data sets**. Statistical methods are used to compare different groups or data sets. The following table lists the basic statistic terms: mean, median, mode, and standard deviation.

Statistic | Description | Excel Function |
---|---|---|

Mean | arithmetic average of a set of numbers | AVERAGE |

Median | value that occurs in the middle of a data set when arranged from lowest to highest, half the values above and half below the median value | MEDIAN |

Mode | value that occurs most frequently in a data set | MODE.MULT, MODE.SNGL |

Standard Deviation | measure of how widely the data values are dispersed from the arithmetic mean | STDEV.P, STDEV.S |

Excel has many statistical functions. Here are the basic statistical functions in Excel.

## Basic Statistical Functions

Function | Description |
---|---|

AVERAGE | arithmetic mean of a data set |

MEDIAN | median or the middle value in the dataset |

MODE.SNGL | mode or most frequent number in a dataset; can show only one value |

MODE.MULT | most frequent values in a data set; can show multiple values |

STDEV.P | standard deviation of a population |

STDEV.S | standard deviation of a sample |

LARGE | nth largest value |

SMALL | nth smallest value |

MAX | largest value |

MIN | smallest value |

RANK.AVG | rank in a list of numbers, any ties are averaged |

RANK.EQ | rank in a list of numbers, ties are shown as first number |

### Mean, Median, and Mode

#### AVERAGE

Arithmetic mean of its arguments, which can be numbers or names, arrays, or references that contain numbers.

AVERAGE(number1,number2,…)

number1: number1,number2,… are 1 to 255 numeric arguments for which you want the average.

number2: number1,number2,… are 1 to 255 numeric arguments for which you want the average.

#### MEDIAN

The median or middle number in a group of supplied numbers.

=MEDIAN (number1, [number2], …)

number1 – A number or cell reference that refers to numeric values.

number2 – [optional] A number or cell reference that refers to numeric values.

#### MODE.SNGL

The most frequently occurring, or repetitive, value in an array or range of data.

MODE.SNGL(number1,number2,…)

Number1: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

Number2: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

#### MODE.MULT

Returns a vertical array of the most frequently occurring values in an array or range of data.

MODE.MULT(number1,number2,…)

Number1: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

Number2: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

### Standard Deviation

Standard deviation is a measure of the amount of dispersion or variation in a dataset. To calculate the standard deviation for an entire population use the STDEV.P function. To calculate it for a sample, use the STDEV.S function.

#### STDEV.P

Calculates the standard deviation for a data set. This is the standard deviation of an entire population. STDEV.P replaces the older STDEV function, with the same behavior.

=STDEV.P (number1, [number2], …)

number1 – First number or reference in the sample.

number2 – [optional] Second number or reference.

STDEV.P calculates standard deviation using the “n” method. STDEV.S assumes data is an entire population. When data represents a sample, use the STDEV.S function.

#### STDEV.S

Calculates the standard deviation for a sample set of data. STDEV.S replaces the older STDEV function, with the same behavior.

=STDEV.S (number1, [number2], …)

number1 – First number or reference in the sample.

number2 – [optional] Second number or reference.

STDEV.S calculates standard deviation using the “n–1” method. STDEV.S assumes data is a sample only. When data represents an entire population, use the STDEV.P function.

## Useful Statistical Functions

### Largest, Smallest, and Rank

Function | Description |
---|---|

MAX | largest value in a dataset |

MIN | smallest value in a dataset |

LARGE | nth largest value |

SMALL | nth smallest value |

RANK.AVG | rank in a list of numbers, any ties are averaged |

RANK.EQ | rank in a list of numbers, ties are shown as first number |

#### MAX

Returns the largest value in a set of values. Ignores logical values and text.

MAX(**number1**,number2,…)

number1, number2,…: number1,number2,… are 1 to 255 numbers, empty cells, logical values, or text numbers for which you want the maximum.

#### MIN

Returns the smallest value in a set of values. Ignores logical values and text.

MIN(**number1**,number2,…)

number1, number2,…: number1,number2,… are 1 to 255 numbers, empty cells, logical values, or text numbers for which you want the minimum.

#### LARGE

Returns numeric values based on their position in a list when sorted by value. In other words, it can retrieve “nth largest” values – largest value, 2nd largest value, 3rd largest value, etc.

=LARGE (array, n)

array – The array from which you want to select the kth largest value.

n – An integer that specifies the position from the largest value, i.e. the nth position.

Using LARGE to find the 1st largest, would give the same results as MAX.

#### SMALL

Returns numeric values based on their position in a list ranked by value. In other words, it can retrieve “nth smallest” values – smallest value, 2nd smallest value, 3rd smallest value, etc.

=SMALL (array, n)

array – A range of cells from which to extract the smallest values.

n – An integer that specifies the position from the smallest value, i.e. the nth position.

Using SMALL to find the 1st smallest, would give the same results as MIN.

#### RANK.AVG

Returns the rank of a number against a list of other numeric values. When values contain duplicates, the RANK.AVG function will assign an average rank to each set of duplicates. RANK.AVG replaces the older RANK function.

=RANK.AVG (number, ref, [order])

number – The number to rank.

ref – An array that contains the numbers to rank against.

order – [optional] Rank ascending or descending. Default is zero.

#### RANK.EQ

Returns the rank of a number against a list of other other numeric values. When values contain duplicates, RANK.EQ will assign the higher rank to each set of duplicates. RANK.EQ replaces the older RANK function.

=RANK.EQ (number, ref, [order])

number – The number to rank.

ref – An array that contains the numbers to rank against.

order – [optional] Rank ascending or descending. Default is zero.

### Statistical Functions With Logic

Function | Description |
---|---|

AVERAGEIFS | average based on a logical test (replaces AVERAGEIF) |

COUNTIFS | count based on a logical test (replaces COUNTIF) |

SUMIFS | sum based on a logical test (replaces SUMIF) |

#### AVERAGEIFS

Computes the average of the numbers in a range that meet one or more criteria. The criteria in AVERAGEIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=AVERAGEIFS (avg_rng, range1, criteria1, [range2], [criteria2], …)

avg_rng – The range to average.

range1 – The first range to evaluate.

criteria1 – The criteria to use on range1.

range2 – [optional] The second range to evaluate.

criteria2 – [optional] The criteria to use on range2.

#### COUNTIFS

Counts cells that meet one or more criteria. COUNTIFS can be used with criteria based on dates, numbers, text, and other conditions. COUNTIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=COUNTIFS (range1, criteria1, [range2], [criteria2], …)

range1 – The first range to evaulate.

criteria1 – The criteria to use on range1.

range2 – [optional] The second range to evaluate.

criteria2 – [optional] The criteria to use on range2.

#### SUMIFS

Sums cells that meet multiple criteria. SUMIFS can be used to sum values when adjacent cells meet criteria based on dates, numbers, and text. SUMIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)

sum_range – The range to be summed.

range1 – The first range to evaulate.

criteria1 – The criteria to use on range1.

range2 – [optional] The second range to evaluate.

criteria2 – [optional] The criteria to use on range2.

### Random Numbers and Rounding

Function | Description |
---|---|

RAND | random number from 0 to 1 |

RANDBETWEEN | random number between numbers specified |

ROUND | rounds a number to a specified number of digits |

ROUNDDOWN | rounds a number down toward zero |

ROUNDUP | rounds a number up away from zero |

EVEN | rounds a number up to the next even integer |

ODD | rounds a number up to the next odd integer |

INT | rounds a number down to the nearest integer |

TRUNC | truncates a number to the specified number of decimal places by removing digits to the right |

CEILING | round a number up to nearest specified multiple |

FLOOR | round a number down to nearest specified multiple |

#### RAND

Returns a random number between 0 and 1. RAND recalculates when a worksheet is opened or changed.

=RAND ()

The RAND function is volatile: every change in Excel returns a new random number.

#### RANDBETWEEN

Returns a random integer between given numbers. RANDBETWEEN recalculates when a worksheet is opened or changed.

=RANDBETWEEN (bottom, top)

bottom – An integer representing the lower value of the range.

top – An integer representing the lower value of the range.

#### ROUND

Returns a number rounded to a given number of digits. The ROUND function can round to the right or left of the decimal point.

=ROUND (number, num_digits)

number – The number to round.

num_digits – The number of digits to which number should be rounded.

#### ROUNDDOWN

Returns a number rounded down to a given number of decimal places. Unlike ROUND, where only numbers less than 5 are rounded down, ROUNDDOWN rounds all numbers 1–9 down.

=ROUNDDOWN (number, num_digits)

number – The number to round down.

num_digits – The number of digits to which number should be rounded down.

#### ROUNDUP

Returns a number rounded up to a given number of decimal places. Unlike ROUND, where numbers less than 5 are rounded down, ROUNDUP always rounds numbers 1–9 up.

=ROUNDUP (number, num_digits)

number – The number to round up.

num_digits – The number of digits to which number should be rounded up.

#### EVEN

Returns the next even integer after rounding a given number up. The EVEN function always rounds numbers up (away from zero) so positive numbers become larger and negative numbers become smaller (i.e. more negative).

=EVEN (number)

number – The number to round up to an even integer.

#### ODD

Returns the next odd integer after rounding a given number up. The ODD function always rounds numbers up (away from zero) so positive numbers become larger and negative numbers become smaller (i.e. more negative).

=ODD (number)

number – The number to round up to an odd integer.

#### INT

Returns the integer part of a decimal number by rounding down to the integer. Note the INT function rounds down, so negative numbers become more negative. For example, while INT(10.8) returns 10, INT(–10.8) returns –11.

=INT (number)

number – The number from which you want an integer.

#### TRUNC

Returns a truncated number based on an (optional) number of digits. The TRUNC function does no rounding, it simply truncates as specified.

=TRUNC (number, [num_digits])

number – The number to truncate.

num_digits – [optional] The precision of the truncation (default is 0).

#### CEILING

Returns a given number rounded up to the nearest specified multiple.

=CEILING (number, multiple)

number – The number that should be rounded.

multiple – The multiple to use when rounding.

#### FLOOR

Rounds a given number down to the nearest specified multiple.

=FLOOR (number, multiple)

number – The number that should be rounded.

multiple – The multiple to use when rounding.

## Excel Formula Wildcards

Excel has 3 wildcards you can use in formulas:

- Asterisk (*) – zero or more characters
- Question mark (?) – any one character
- Tilde (~) – escape for literal character (~*) a literal question mark (~?), or a literal tilde (~~).

Usage | Description | Match |
---|---|---|

? | Any one character | A, d, C, r |

?? | Any two characters | AA, BC, dd |

??? | Any three characters | Sky, BBB, win |

* | Any characters | ace, David, D211 |

*st | Ends in “st” | best, East |

w* | Starts with “w” | William, west, w104 |

?* | At least one character | a, z, ab, DEF |

???-??? | 6 characters with a dash | ARZ–123, 107-BBB |

*~? | Ends in a question mark | What?, reply? |

*ou* | Contains “ou” | accounting, shout, country |

### Compatible Functions for Wildcards

Not all functions allow wildcards. Here is a list of the most common functions that allow wildcards:

- AVERAGEIF, AVERAGEIFS
- COUNTIF, COUNTIFS
- SUMIF, SUMIFS
- HLOOKUP, VLOOKUP, XLOOKUP
- MATCH
- SEARCH

### Named Ranges

An alternative to absolute cell references is a **named range**. A single cell or a range can be named and that name can be used directly in a formula or function. For example, a named range can be called ‘Sales’ rather than B7:B45.

To name a cell or a range, first highlight the cell or range and then click the Name Box below the ribbon to the left. The Name Box is just to the left of the Formula Bar. A named range is an absolute cell reference.

Named ranges are located in the Formulas tab on the ribbon under the Define Name button. Named ranges can be added, edited, or deleted.

The INDIRECT function below is useful when working with named ranges. If you have a named range ‘Sales’ it can be referenced by a column heading named Sales.

#### INDIRECT

Returns the reference specified by a text string.

INDIRECT(ref_text,[a1])

Ref_text: is a reference to a cell that contains an A1- or R1C1-style reference, a name defined as a reference, or a reference to a cell as a text string.

A1: is a logical value that specifies the type of reference in Ref_text: R1C1-style = FALSE; A1-style = TRUE or omitted.

## Video 2: Stock and Bond Returns Tutorial

## Video 3: Named Ranges and IFS Tutorial

## Excel Basics Lessons

We have a full set of Excel basics lessons. For all the lessons, see The Ultimate Guide to Finally Learn Excel.

Here are the Excel lessons: