**statistical concepts**

**introduction to statistics** [cdf: version 1; version 2; version 3]

mean

- it's the average value or the expected value
- $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n = sample size; $x_i$ = value of data in the sample

standard deviation (SD)

- a measure of the variation in the data, where ~ 68% of the data (in a normal distribution) falls within ± 1 SD of the mean

standard error of the mean (SEM)

- SD of the distribution of means
- a measure of the variation in the value of the mean
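The three quantities above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own method: the function name `summarize` and the sample values are made up for the example, the SD uses the sample (n - 1) form, and the SEM is taken as SD ÷ √n.

```python
import math
import statistics


def summarize(sample):
    """Return (mean, SD, SEM) for a list of numbers (hypothetical helper)."""
    n = len(sample)
    mean = statistics.fmean(sample)   # sum of x_i divided by n
    sd = statistics.stdev(sample)     # sample SD (divides by n - 1)
    sem = sd / math.sqrt(n)           # SD of the distribution of means
    return mean, sd, sem


# Illustrative data only.
mean, sd, sem = summarize([10, 11, 13, 14])
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
# prints: mean = 12.00, SD = 1.83, SEM = 0.91
```

Note that the SEM shrinks as the sample size grows, while the SD does not: more data pins down the mean more tightly without changing the spread of the data itself.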

__regression analysis__ [for 2 parameter linear equation: pptx; pdf.]

- a method to determine the values of the parameters of a function that describes the experimental data
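For the 2-parameter linear case, y = slope · x + intercept, the parameters can be found by ordinary least squares. The sketch below assumes that is the intended fitting method; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only.
slope, intercept = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# prints: slope = 1.94, intercept = 1.15
```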

p-value [alternative: pdf; pptx]

- used in various statistical tests to identify groups of data that are the same or different
- probability of error (stating that there is a difference when there is no difference)
- convention
  - for 2-tail p-value
    - if p-value < 0.05, then: mean 1 ≠ mean 2, i.e. "the data is different"
    - if p-value > 0.05, then: mean 1 = mean 2, i.e. "the data is the same"
  - for 1-tail p-value
    - if p-value < 0.05, then: mean 1 < mean 2 (or mean 1 > mean 2)
    - if p-value > 0.05, then: mean 1 ≥ mean 2 (or mean 1 ≤ mean 2)

__types of statistical test__

Q-test

- is used to detect an outlier, a data point that differs from the rest of the data
- may be used to delete only a single data point
- data is an outlier if Q = | suspect data - nearest data | ÷ (largest data - smallest data) > Q_{c}

Table 1. Q-test values for 90% confidence.

| N (sample size) | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Q_{c} | 0.94 | 0.76 | 0.64 | 0.56 | 0.51 | 0.47 | 0.44 | 0.41 |

For example, is the value of 25 an outlier in the following data set?

10, 11, 13, 14, 25.

In this example,

Q = | 25 - 14 | ÷ (25 - 10) = 0.73 > Q_{c} = 0.64

thus, the value of 25 is an outlier and may be deleted in subsequent data analysis
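The worked example can be reproduced with a short Python sketch. The critical values are copied from Table 1; the function name `q_test` is hypothetical, and the sketch assumes the suspect value is the smallest or largest point and that the data are not all identical.

```python
# Critical values Q_c at 90% confidence, copied from Table 1.
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data, suspect):
    """Return (Q, is_outlier) for a suspect value at either end of the data."""
    ordered = sorted(data)
    if suspect not in (ordered[0], ordered[-1]):
        raise ValueError("suspect must be the smallest or largest value")
    # Nearest neighbour of the suspect value in the sorted data.
    nearest = ordered[1] if suspect == ordered[0] else ordered[-2]
    q = abs(suspect - nearest) / (ordered[-1] - ordered[0])
    return q, q > Q_CRIT_90[len(ordered)]


q, is_outlier = q_test([10, 11, 13, 14, 25], 25)
print(f"Q = {q:.2f}, outlier: {is_outlier}")
# prints: Q = 0.73, outlier: True
```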

to compare 2 groups of data [pdf, pptx]

| group 1 | group 2 | difference |
| --- | --- | --- |
| X1 | Y1 | X1 - Y1 |
| X2 | Y2 | X2 - Y2 |
| etc. | | |
| mean of group 1 | mean of group 2 | mean difference |

use the following statistical test [mathematica file that does these tests; requires mathematica]

2-sample t-test (or independent sample t-test or unpaired sample t-test)

- compare the mean of group 1 versus the mean of group 2

1-sample t-test (or correlated sample t-test or paired sample t-test)

- compare the mean difference (to zero)
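A rough sketch of the two t statistics, using only the Python standard library. It returns the t value and the degrees of freedom; converting these to a p-value needs a t-table or a statistics package, which is left out here. The function names and the data are hypothetical, and the unpaired version assumes both groups have equal variance (pooled form).

```python
import math
import statistics


def unpaired_t(group1, group2):
    """2-sample t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / std_err
    return t, n1 + n2 - 2


def paired_t(group1, group2):
    """Paired t statistic: compares the mean difference to zero."""
    diffs = [x - y for x, y in zip(group1, group2)]
    n = len(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # SEM of the differences
    return statistics.fmean(diffs) / sem, n - 1


# Illustrative data only.
xs = [12, 15, 11, 14]
ys = [10, 12, 10, 11]
print("unpaired (t, df):", unpaired_t(xs, ys))
print("paired   (t, df):", paired_t(xs, ys))
```

With the same numbers, the paired test gives a larger t value because it removes the variation between pairs and looks only at the differences; it is only valid when each X really is matched to its Y.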

to compare 3 or more groups of data that differ by a single factor, use [pptx; pdf]

- 1-factor analysis of variance (1-anova) - to detect whether there are any pairwise differences among groups of data [refer to p-value]
- Tukey's test - to identify the specific pair(s) of data that differ; done if the p-value < 0.05 in the preceding 1-anova
- mathematica file that does this test [requires mathematica]
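As a rough illustration of what the 1-anova computes, the sketch below finds the F statistic: the variation between group means divided by the variation within groups. It stops short of the p-value (which needs an F-table or a statistics package) and does not cover Tukey's test. The function name and the data are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each data point around its own group mean.
    ss_within = sum(
        sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Illustrative data only: three groups of three measurements.
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f:.1f} with ({df_b}, {df_w}) degrees of freedom")
# prints: F = 13.0 with (2, 6) degrees of freedom
```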

to compare groups of data that differ by two factors, use (optional)

- 2-factor analysis of variance (2-anova) - to detect whether there is any effect of the factor in the column or row; also examines whether there is any interaction between these 2 factors; the interpretation becomes more complicated if the 2 factors interact
- website does calculation; directions on its use (ignore initial portion - it's based on an earlier version of my website)

additional resources:

- vodcast (go to stats tab)
- stats article (includes a supplement; both items are in my pickup folder on the school's server); calculus-based version of the supplement; alternative source of article & algebra-based supplement
- mathematica: statistics
- minitab (version 12; statistics analysis software): on school computer network, go to start --> all programs --> student menu --> science applications
- use of propagation of errors (physics only)
- misuse of statistics