# Connexions

# Collaborative Statistics: Glossary

Summary: This module contains a number of glossary terms related to elementary statistics. This module represents the combined glossary information for the Collaborative Statistics textbook/module (col10522).

If you cannot find what you are looking for in the Collaborative Statistics Glossary, then try one of the links below.

Link to the Statistics Glossary by Dr. Philip Stark, UC Berkeley

http://statistics.berkeley.edu/~stark/SticiGui/Text/gloss.htm

http://www.wikipedia.org/
(Search on "Glossary of probability and statistics.")

## Glossary

Addition Rule:
For any events $A$ and $B$ in the sample space, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.
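
As a quick check of the rule $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, the sketch below models events as Python sets of equally likely outcomes. The fair six-sided die, the events `A` and `B`, and the helper `prob` are illustrative assumptions, not part of the glossary.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed example).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is greater than 3"

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

# "A or B" is the union of the sets; "A and B" is the intersection.
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)
```

Subtracting $P(A \text{ and } B)$ corrects for the outcomes 4 and 6, which would otherwise be counted twice.
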
Analysis of Variance:
Also referred to as ANOVA. A method of testing whether or not the means of three or more populations are equal. The method is applicable if:
• All populations of interest are normally distributed.
• The populations have equal standard deviations.
• Samples (not necessarily of the same size) are randomly and independently selected from each population.
The test statistic for analysis of variance is the F-ratio.
AND:
Logical operation over the subsets of a set. In statistics, if $A$ and $B$ are any two events (subsets in the sample space), then the event “$A$ and $B$” consists of all possible outcomes that are common to both $A$ and $B$.
Arithmetic Mean:
The sum of the values divided by the number of values. The notation for the mean of a sample is $\bar{x}$. The notation for the mean of a population is $\mu$.
Average:
A number that describes the central tendency of the data. There are a number of specialized averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean.
Bayes' Theorem:
Developed by Reverend Bayes in the 1700s. A rule designed to find the probability of one event, $A$, occurring, given that a finite set of other events, $\{B_i, i = 1, 2, \ldots, l\}$, has occurred.
Bernoulli Trials:
An experiment with the following characteristics:
• There are only 2 possible outcomes called “success” and “failure” for each trial.
• The probabilities $p$ of success and $q = 1 - p$ of failure are the same for any trial.
Bias:
A possible consequence if certain members of the population are denied the chance to be selected for the sample.
Binomial Distribution:
A discrete random variable (RV) which arises from Bernoulli trials. There are a fixed number, $n$, of independent trials. “Independent” means that the result of any trial (for example, trial 1) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV $X$ is defined as the number of successes in $n$ trials. The notation is: $X \sim B(n, p)$. The mean is $\mu = np$ and the standard deviation is $\sigma = \sqrt{npq}$. The probability of having exactly $x$ successes in $n$ trials is $P(X = x) = \binom{n}{x} p^x q^{n-x}$. We can add that probability is assigned to discrete points.
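
The binomial formulas can be sketched in a few lines of Python. The function name `binomial_pmf` and the values `n = 10`, `p = 0.3` are assumptions chosen only for illustration.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3                  # assumed example values
mean = n * p                    # mu = np
sd = sqrt(n * p * (1 - p))      # sigma = sqrt(npq)

# The probabilities over x = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(mean, sd, binomial_pmf(3, n, p), total)
```
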
Central Limit Theorem:
Given a random variable (RV) with known mean $\mu$ and known standard deviation $\sigma$, we sample with size $n$ and are interested in two new RVs: the sample mean, $\bar{X}$, and the sample sum, $\Sigma X$. If the size $n$ of the sample is sufficiently large, then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\Sigma X \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$. In other words, for sufficiently large $n$ the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. The mean of the sample means will equal the population mean and the mean of the sample sums will equal $n$ times the population mean. The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.
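
The mean and standard-deviation relations for sample sums and sample means can be checked exactly for a small discrete population. The sketch below assumes a fair six-sided die as the population and a sample size of 30; the helpers `mean_sd` and `sum_distribution` are hypothetical names. It verifies only the means and standard deviations, not the normal shape itself.

```python
from math import sqrt

def mean_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: prob}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, sqrt(var)

def sum_distribution(dist, n):
    """Exact distribution of the sum of n independent draws from dist."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for v, pv in dist.items():
                nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        total = nxt
    return total

die = {face: 1 / 6 for face in range(1, 7)}   # assumed population: a fair die
n = 30
mu, sigma = mean_sd(die)
sum_mu, sum_sd = mean_sd(sum_distribution(die, n))
standard_error = sigma / sqrt(n)               # sd of the sample mean

print(sum_mu, n * mu)              # mean of the sums vs. n * mu
print(sum_sd, sqrt(n) * sigma)     # sd of the sums vs. sqrt(n) * sigma
print(sum_sd / n, standard_error)  # sd of the sample means vs. sigma / sqrt(n)
```
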
Charts:
Special graphical formats used to visualize a frequency distribution. They include, but are not limited to: histograms, frequency polygons, cumulative frequency polygons, box plots, stemplots, bar charts, Venn and tree diagrams, and pie charts.
Class Mark:
Midpoint of the class.
Chi-square Distribution:
A continuous distribution with the following characteristics:
• The random variable (RV) is continuous and takes on only nonnegative values (in fact, it is the sum of squares of $k$ independent standard normal random variables).
• There is a "family" of Chi-square distributions. Each representative of the family is completely defined by the number of degrees of freedom, $k - 1$, where $k$ is the number of categories (not the size of sample).
• The pdf is positively skewed (skewed right). However, as $k$ increases ($k > 90$), the distribution approximates the normal distribution.
The notation is: $\chi^2 \sim \chi^2_{df}$. For the $\chi^2$ distribution, the population mean is $\mu = df$ and the population standard deviation is $\sigma = \sqrt{2\,df}$. The Chi-square distribution is used to calculate the test statistic for the Goodness-of-fit Test (to determine if a population follows a specified distribution), for the Test of Independence (to determine if two factors are related or not), and for the Test of a Single Variance.
Classes:
Intervals in which the data are grouped. It is convenient to group outcomes into classes when working with large amounts of data. For example, every bar in a histogram corresponds to one class (one interval) and the midpoint of the interval can be chosen as a representative of all outcomes in the class. The midpoint of the class is often called the class mark.
Cluster Sampling:
A procedure that is used if the population is dispersed over a wide geographic area. The population is divided into units or groups (counties, precincts, blocks, etc.) called primary units. Then some of the primary units are randomly chosen, and all members of those primary units are the sample.
Coefficient of Correlation:
A measure developed by Karl Pearson (early 1900s) that gives the strength of association between the independent variable and the dependent variable. The formula is:
$$r = \frac{n \sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}}, \tag{1}$$
where $n$ is the number of data points. The coefficient $r$ is not more than 1 nor less than -1.
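
The correlation formula translates directly into code. The sketch below is a minimal illustration using raw sums; the function name `correlation` and the data are made up for the example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson's r from the computational formula with raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(r)
```

The function assumes that neither variable is constant; otherwise the denominator is zero.
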
Cumulative Distribution Function (CDF):
Given a quantitative random variable (RV) $X$, the function $P(X \le x)$ is called the Cumulative Distribution Function (CDF). The CDF is the sum of the probabilities of all values of $X$ that are less than or equal to a particular $x$.
Cumulative Relative Frequency:
The term applies to an ordered set of observations from smallest to largest. The Cumulative Relative Frequency is the sum of the relative frequencies for all values that are less than or equal to the given value.
Complement Event:
The event consisting of all outcomes that are in the sample space but are not in the given event.
Conditional Probability:
The likelihood that an event will occur given that another event has already occurred.
Confidence Interval (CI):
An interval estimate for an unknown population parameter. This depends on:
• The desired confidence level.
• Information that is known about the distribution (for example, known standard deviation).
• The sample and its size.
Confidence Level (CL):
The percent expression for the probability that the confidence interval contains the true population parameter. For example, if the $CL = 90\%$, then in 90 out of 100 samples the interval estimate will enclose the true population parameter.
Contingency Table:
The method of displaying a frequency distribution as a table with rows and columns to show how two variables may be dependent (contingent) upon each other. The table provides an easy way to calculate conditional probabilities.
Continuous Random Variable (RV):
A random variable (RV) whose outcomes are measured. Probability is measured over continuous intervals.

### Example:

The height of trees in the forest is a continuous RV.

Correlation Analysis:
A group of statistical procedures used to measure the strength of the relationship between two variables.
Counting Principle:
If there are $m$ ways of doing one thing and $n$ ways of doing another, then there are $m \times n$ ways of doing both.

### Example:

A cafe offers $m = 5$ kinds of coffee and $n = 7$ kinds of cake. There are 35 ways to serve coffee with cake.
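
The cafe example can be confirmed by listing every pairing explicitly. The item names below are hypothetical placeholders for the 5 coffees and 7 cakes.

```python
from itertools import product

# Hypothetical menu items standing in for the 5 coffees and 7 cakes.
coffees = [f"coffee_{i}" for i in range(1, 6)]
cakes = [f"cake_{j}" for j in range(1, 8)]

# Every (coffee, cake) pair is one way to serve coffee with cake.
pairings = list(product(coffees, cakes))
print(len(pairings))
```
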

Critical Value:
The dividing point between the region where the null hypothesis is not rejected and the region where it is rejected. For a one-tailed hypothesis test, there is only one critical value. For a two-tailed hypothesis test, there are two critical values, one in each tail, with the same absolute value and opposite signs.
Data:
A set of observations (a set of possible outcomes). Most data can be put into two groups: qualitative (hair color, ethnic groups and other attributes of the population) and quantitative (distance traveled to college, number of children in a family, etc.). Quantitative data can be separated into two subgroups: discrete and continuous. Data is discrete if it is the result of counting (the number of students of a given ethnic group in a class, the number of books on a shelf, etc.). Data is continuous if it is the result of measuring (distance traveled, weight of luggage, etc.).
Degrees of Freedom (df):
The number of sample values that are free to vary.
Dependent Samples:
Samples chosen in such a way that they are not independent of each other. Paired samples are dependent because two measurements are taken from the same individual or item.

### Example:

If the test scores of 13 individuals were recorded before a new teaching method was introduced, and then after using the new method, the paired samples are dependent.

Descriptive Statistics:
The numerical and graphical ways used to describe and display the important characteristics of data; for example, charts, frequency distributions, measures of central tendency and measures of spread and skewness.
Discrete Random Variable:
A random variable (RV) whose outcomes are counted.
Domain:
The set of possible values for the independent variable.

### Example:

• We are interested in the longevity of human life in years. The domain is $\{0, 1, 2, 3, \ldots, 120\}$.
• We are interested in the suit of a regular 52-card deck. The domain is {spades, clubs, hearts, diamonds}.

Equally Likely:
Each outcome of an experiment has the same probability of occurring.
Error Bound for a Population Mean (EBM):
The margin of error. It depends on the confidence level, sample size, and the known or estimated population standard deviation.
Error Bound for a Proportion (EBP):
The margin of error. It depends on the confidence level, sample size, and the estimated (from the sample) proportion of successes.
Event:
A subset in the set of all outcomes of an experiment. The set of all outcomes of an experiment is called a sample space and denoted usually by S. An event is any arbitrary subset in S. It can contain one outcome, two outcomes, no outcomes (empty subset), the entire sample space, etc. Standard notations for events are capital letters such as A, B, C, etc.
Expected Value:
The arithmetic average when an experiment is repeated many times. The Expected Value is called the long-term mean or average. Notation: $E(x)$, $\mu$. For a discrete random variable (RV) with probability distribution function $P(X = x)$, the definition also can be written in the form $E(X) = \mu = \sum x P(x)$.
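
For a discrete RV, the sum $\sum x P(x)$ is a one-line computation. The sketch below assumes a fair six-sided die as the example distribution; `expected_value` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed example: X = result of one roll of a fair six-sided die.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """E(X) = sum of x * P(x) over all values x of a discrete RV."""
    return sum(x * p for x, p in dist.items())

mu = expected_value(distribution)
print(mu)
```
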
Exponential Distribution:
A continuous random variable (RV) that appears when we are interested in the intervals of time between some random events, for example, the length of time between emergency arrivals at a hospital. Notation: $X \sim Exp(m)$. The mean is $\mu = \frac{1}{m}$ and the standard deviation is $\sigma = \frac{1}{m}$. The probability density function is $f(x) = m e^{-mx}$, $x \ge 0$, and the cumulative distribution function is $P(X \le x) = 1 - e^{-mx}$.
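
A minimal sketch of the exponential pdf and CDF follows. The function names and the parameter value `m = 0.25` (an average of 4 time units between events) are assumptions for illustration.

```python
from math import exp

def exp_pdf(x, m):
    """f(x) = m * e^(-m x) for x >= 0, and 0 otherwise."""
    return m * exp(-m * x) if x >= 0 else 0.0

def exp_cdf(x, m):
    """P(X <= x) = 1 - e^(-m x) for x >= 0, and 0 otherwise."""
    return 1 - exp(-m * x) if x >= 0 else 0.0

m = 0.25                           # assumed parameter value
mean = 1 / m                       # mu = 1/m
p_within_mean = exp_cdf(mean, m)   # P(X <= mu) = 1 - e^(-1)
print(mean, p_within_mean)
```

Note that $P(X \le \mu) = 1 - e^{-1}$ does not depend on $m$, so more than half of the waiting times fall below the mean.
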
Experiment:
A planned activity carried out under controlled conditions.
F Distribution:
Developed by Sir Ronald Fisher. The F Distribution has the following characteristics:
• The random variable (RV) is a ratio (called the F-ratio) of two sums of weighted squares. It is continuous and takes on only nonnegative values.
• The pdf is positively skewed (skewed to the right).
• There is a "family" of F distributions.
Every representative of the family is defined by 2 parameters: the number of degrees of freedom for the numerator in the F-ratio and the number of degrees of freedom for the denominator in the F-ratio. The F Distribution is used in tests of 2 population variances and in ANOVA hypothesis tests.
Frequency Distribution:
A grouping of data into mutually exclusive classes showing the number of outcomes in each class.
Frequency:
The number of times a value of the data occurs.
Geometric Distribution:
A discrete random variable (RV) which arises from Bernoulli trials. The trials are repeated until the first success. The geometric variable $X$ is defined as the number of trials until the first success. Notation: $X \sim G(p)$. The mean is $\mu = \frac{1}{p}$ and the standard deviation is $\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)}$. The probability that the first success occurs on trial $x$ (that is, after exactly $x - 1$ failures) is given by the formula: $P(X = x) = p(1 - p)^{x-1}$.
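
The geometric formulas can be checked numerically. In the sketch below, `geometric_pmf` and the success probability `p = 0.2` are assumed for illustration, and the infinite sums are truncated at 500 terms.

```python
from math import sqrt

def geometric_pmf(x, p):
    """P(X = x): the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return p * (1 - p) ** (x - 1)

p = 0.2                                   # assumed success probability
mean = 1 / p                              # mu = 1/p
sd = sqrt((1 / p) * (1 / p - 1))          # sigma = sqrt((1/p)(1/p - 1))

# Truncated sums approximate the infinite series; the tail beyond 500 is negligible here.
total = sum(geometric_pmf(x, p) for x in range(1, 501))
approx_mean = sum(x * geometric_pmf(x, p) for x in range(1, 501))
print(mean, sd, total, approx_mean)
```
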
Hypergeometric Distribution:
A discrete random variable (RV) that is characterized by
• A fixed number of trials.
• The probability of success is not the same from trial to trial.
We sample from two groups of items when we are interested in only one group. $X$ is defined as the number of successes out of the total number chosen. Notation: $X \sim H(r, b, n)$, where $r$ = the number of items in the group of interest, $b$ = the number of items in the group not of interest, and $n$ = the number of items chosen.
Hypothesis Testing:
Based on sample evidence, hypothesis testing is a procedure that determines whether the null hypothesis is a reasonable statement and cannot be rejected, or is unreasonable and should be rejected.
Hypothesis:
A statement about the value of a population parameter. In case of two hypotheses, the statement assumed to be true is called the null hypothesis (notation $H_0$) and the contradictory statement is called the alternate hypothesis (notation $H_a$).
Independent Events:
The occurrence of one event has no effect on the probability of the occurrence of any other event. Events A and B are independent if any of the following is true:
• $P(A \mid B) = P(A)$
• $P(B \mid A) = P(B)$
• $P(A \text{ and } B) = P(A)P(B)$
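
The three equivalent conditions for independent events can be verified on a small sample space. The sketch below assumes two fair dice and two events defined on different dice; `prob` and `cond_prob` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: roll two fair dice; all 36 ordered pairs are equally likely.
S = set(product(range(1, 7), repeat=2))
A = {o for o in S if o[0] % 2 == 0}   # first die is even
B = {o for o in S if o[1] > 4}        # second die shows 5 or 6

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

print(cond_prob(A, B) == prob(A))          # P(A|B) = P(A)
print(cond_prob(B, A) == prob(B))          # P(B|A) = P(B)
print(prob(A & B) == prob(A) * prob(B))    # P(A and B) = P(A)P(B)
```
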
Independent Samples:
Samples that are not related in any way.
Inferential Statistics:
Also called statistical inference or inductive statistics. This facet of statistics deals with estimating a population parameter based on a sample statistic. For example, if 4 out of the 100 calculators sampled are defective we might infer that 4 percent of the production is defective.
Interquartile Range (IQR):
The distance between the third quartile (Q3) and the first quartile (Q1). IQR = Q3 - Q1.
Interval Estimate:
Based on sample information, an Interval Estimate is an interval of numbers that may contain a population parameter.
Level of Significance of the Test:
Probability of a Type I error (reject the null hypothesis when it is true). Notation: $\alpha$. In hypothesis testing, the Level of Significance is called the preconceived $\alpha$ or the preset $\alpha$.
Linear Regression Equation:
A linear equation in the form $\hat{y} = a + bx$ that defines the relationship between two variables. It is used to predict the dependent variable $y$ based on a selected value of the independent variable $x$.
Mean:
A number that measures the central tendency. A common name for mean is 'average.' The term 'mean' is a shortened form of 'arithmetic mean.' By definition, the mean for a sample (denoted by $\bar{x}$) is $\bar{x} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (denoted by $\mu$) is $\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.
Median:
A number that separates ordered data into halves. Half the values are the same number or smaller than the median and half the values are the same number or larger than the median. The median may or may not be part of the data.
Mode:
The value that appears most frequently in a set of data.
Multiplication Rule:
For any events $A$ and $B$ in the sample space, $P(A \text{ and } B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$.
Mutually Exclusive:
An observation cannot fall into more than one class (category). Being in one category prevents being in a mutually exclusive category.
Normal Distribution:
A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean of the distribution and $\sigma$ is the standard deviation. Notation: $X \sim N(\mu, \sigma)$. If $\mu = 0$ and $\sigma = 1$, the RV is called the standard normal distribution.
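
The normal pdf can be evaluated directly from its formula. The sketch below uses a hypothetical helper `normal_pdf` and a crude Riemann sum (step size and integration range are assumptions) to confirm that the area under the standard normal curve is close to 1.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma), where sigma is the standard deviation."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Peak of the standard normal density, 1 / sqrt(2 * pi).
print(normal_pdf(0, 0, 1))

# Crude Riemann sum over [-8, 8] to check that the total area is close to 1.
step = 0.001
area = sum(normal_pdf(-8 + i * step, 0, 1) * step for i in range(16000))
print(area)
```
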
One-Tailed Test:
Used when the alternate hypothesis states a direction. The rejection region is in one tail. Example: $H_a$: $\mu > 40$ with the rejection region in the right tail.
OR:
Logical operation over the subsets of a set. In statistics, if $A$ and $B$ are any two events (subsets in the sample space), then the event “$A$ or $B$” consists of all outcomes that are in $A$, or in $B$, or in both $A$ and $B$.
Outcome (observation):
A particular result of an experiment.
Outlier:
An observation that does not fit the rest of the data.
Parameter:
A numerical characteristic of the population.

### Example:

The mean price to rent a 1-bedroom apartment in California.

pdf:
see Probability Density Function (pdf).
PDF:
see Probability Distribution Function (PDF).
Percentile:
A number that divides ordered data into hundredths.

### Example:

Let a data set contain 200 ordered observations starting with $\{2.3, 2.7, 2.8, 2.9, 2.9, 3.0, \ldots\}$. Then the first percentile is $\frac{2.7 + 2.8}{2} = 2.75$, because 1% of the data is to the left of this point on the number line and 99% of the data is on its right. The second percentile is $\frac{2.9 + 2.9}{2} = 2.9$. Percentiles may or may not be part of the data. In this example, the first percentile is not in the data, but the second percentile is. The median of the data is the second quartile and the 50th percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.

Point Estimate:
A single number computed from a sample and used to estimate a population parameter.
Poisson Distribution:
A discrete random variable (RV) that counts the number of times a certain event will occur in a specific interval. Characteristics of the variable:
• The probability that the event occurs in a given interval is the same for all intervals.
• The events occur with a known mean and independently of the time since the last event.
The distribution is defined by the mean $\mu$ of the event in the interval. Notation: $X \sim P(\mu)$. The standard deviation is $\sigma = \sqrt{\mu}$. The probability of having exactly $x$ occurrences in the interval is $P(X = x) = e^{-\mu}\frac{\mu^x}{x!}$. The Poisson distribution is often used to approximate the binomial distribution when $n$ is “large” and $p$ is “small” (a general rule is that $n$ should be greater than or equal to 20 and $p$ should be less than or equal to 0.05); in that case the mean is $\mu = np$.
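
The Poisson approximation to the binomial can be seen by comparing the two probability functions side by side. The values `n = 100` and `p = 0.03` are assumptions chosen to satisfy the rule of thumb; the function names are hypothetical.

```python
from math import comb, exp, factorial, sqrt

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 100, 0.03        # assumed values with n >= 20 and p <= 0.05
mu = n * p              # mean used for the approximation
sd = sqrt(mu)           # Poisson standard deviation

# Print binomial and Poisson probabilities next to each other.
for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
```
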
Population:
The collection, or set, of all individuals, objects, or measurements whose properties are being studied.
Preconceived $\alpha$:
The probability of rejecting the null hypothesis when the null hypothesis is true ($\alpha$ is equal to the probability of a Type I error). $\alpha$ is called the level of significance of the test. Also called the preset $\alpha$.
Probability Density Function (pdf):
A mathematical description of a continuous random variable (RV). For any specific value $x$, $P(X = x) = 0$. By definition, the pdf is any nonnegative function $f(x)$ over the real numbers such that the area bounded above by $f(x)$, below by the x-axis, and from the right by a vertical line $X = x$ is equal to the probability $P(X \le x)$.
Probability Distribution Function (PDF):
A mathematical description of a discrete random variable (RV), given either in the form of an equation (by formula) or in the form of a table listing all the possible outcomes of an experiment and the probability associated with each outcome.

### Example:

A biased coin with probability 0.7 of heads is tossed 5 times. We are interested in the number of heads ($X$ = the number of heads). $X$ is Binomial: $X \sim B(5, 0.7)$. The PDF can be given by the formula $P(X = x) = \binom{5}{x}\, 0.7^x\, 0.3^{5-x}$ or in the form of the table.

Table 1

| $x$ | $P(X = x)$ |
|-----|------------|
| 0 | 0.0024 |
| 1 | 0.0284 |
| 2 | 0.1323 |
| 3 | 0.3087 |
| 4 | 0.3602 |
| 5 | 0.1681 |
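
The values in Table 1 can be reproduced from the binomial formula, up to rounding in the fourth decimal place. The sketch below uses the example's own parameters (5 tosses, probability 0.7 of heads); the names `pmf` and `table` are hypothetical.

```python
from math import comb

n, p = 5, 0.7   # five tosses, probability 0.7 of heads (from the example)

def pmf(x):
    """P(X = x) for X ~ B(5, 0.7)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Build the probability distribution as a mapping from x to P(X = x).
table = {x: pmf(x) for x in range(n + 1)}
for x, prob in table.items():
    print(x, prob)
```
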

Probability Distribution:
The common name for Probability Density Function (pdf) and Probability Distribution Function (PDF).
Probability:
A number between 0 and 1, inclusive, that gives the likelihood that a specific event will occur. The foundation of statistics is given by the following 3 axioms (by A. N. Kolmogorov, 1930s): Let $S$ denote the sample space and let $A$ and $B$ be two events in $S$. Then:
• $0 \le P(A) \le 1$.
• If $A$ and $B$ are any two mutually exclusive events, then $P(A \text{ or } B) = P(A) + P(B)$.
• $P(S) = 1$.
Proportion:
• As a number: A proportion is the number of successes divided by the total number in the sample.
• As a probability distribution: Given a binomial random variable (RV), $X \sim B(n, p)$, consider the ratio of the number $X$ of successes in $n$ Bernoulli trials to the number $n$ of trials, $P' = \frac{X}{n}$. This new RV is called a proportion, and if the number of trials, $n$, is large enough, $P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$.
p-value:
The probability that an event or a more extreme event will happen purely by chance assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence is against the null hypothesis.
Qualitative Data:
see Data.
Quantitative Data:
see Data.
Quartiles:
The numbers that separate the data into quarters. Quartiles may or may not be part of the data. The second quartile is the median of the data. The quartiles are the 25th, 50th and 75th percentiles.
Range:
Difference between the highest and lowest values: Range = Highest value – Lowest value.
Relative Frequency:
The ratio of the number of times a value of the data occurs in the set of all outcomes to the number of all outcomes.
Random Variable (RV):
see Variable (Random Variable).
Sample Space:
The set of all possible outcomes of an experiment.
Sample:
A portion of the population under study. A sample is representative if it characterizes the population being studied.
Sample Error:
The difference between a sample statistic and the corresponding population parameter that can be attributed to sampling (to chance).
Sampling:
A procedure for gathering information about the entire population by selecting only a portion of the population. The more popular random procedures are systematic sampling, simple random sampling, stratified sampling, and cluster sampling.
Scatter Diagram:
A chart that visually depicts the relationship between two variables.
Simple Random Sampling:
A sampling scheme in which every member of the population has the same chance of being selected.
Special Rule for Addition:
For this rule to apply the events must be mutually exclusive: $P(A \text{ or } B) = P(A) + P(B)$.
Special Rule for Multiplication:
For this rule to apply the events must be independent: $P(A \text{ and } B) = P(A)P(B)$.
Standard Deviation:
A number that is equal to the square root of the variance and measures how far data values are from their mean. Notation: $s$ for sample standard deviation and $\sigma$ for population standard deviation.
Standard Error of the Mean:
The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$.
Standard Normal Distribution:
A continuous random variable (RV) $X \sim N(0, 1)$. When $X$ follows the standard normal distribution, it is often noted as $Z \sim N(0, 1)$.
Statistic:
A numerical characteristic of the sample. A statistic estimates the corresponding population parameter. For example, the average number of full-time students in a 7:30 a.m. class for this term (statistic) is an estimate for the average number of full-time students in any class this term (parameter).
Statistics:
The science of collecting, organizing, analyzing, and interpreting numerical data.
Stratified Random Sampling:
A population is divided into groups (called strata) and then a random sample is selected from each stratum.
Student's-t Distribution:
Investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student. The major characteristics of the random variable (RV) are:
• The Student's-t is continuous and assumes any real values.
• The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
• The Student's-t approaches the standard normal distribution as $n$ gets larger.
• There is a "family" of $t$ distributions: every representative of the family is completely defined by the number of degrees of freedom (one less than the number, $n$, of data).
Notation: $t_{df}$ where $df$ is the degrees of freedom. $df = n - 1$.
Systematic Sampling:
A population is arranged in some standard list (for example, alphabetically) and then every m-th (for example, every fifth) representative of the list is taken in the sample starting from a random initial representative.
t statistic:
A statistic, calculated from the data and following the Student's-t distribution, that is used to conduct a hypothesis test and to make the statistical inference about the whole population. If the data contain $n$ observations, then the number of degrees of freedom for the Student's-t distribution is $n - 1$. The $t$ statistic is used, for example, when the population standard deviation is unknown, when $n$ is small, and when samples are dependent (matched pairs hypothesis test). The $t$ statistic formula is $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$.
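
The $t$ statistic for a single sample, $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, can be computed with the standard library. The sample values and the hypothesized mean `mu0` below are made up for illustration, and `t_statistic` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)          # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / sqrt(n)), n - 1

# Made-up sample and hypothesized mean, for illustration only.
sample = [4.1, 5.0, 4.7, 5.3, 4.9]
t, df = t_statistic(sample, mu0=4.5)
print(t, df)
```
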
Test Statistic:
A value calculated from the sample that is used to conduct the hypothesis test and to make the statistical inference about the whole population. The calculation depends on the choice of the appropriate distribution, which often is reflected in the name of the statistic: z-score, t statistic, F statistic (F Ratio), etc.
Tree Diagram:
A useful visual representation of a sample space and events in the form of a “tree” with branches marked by possible outcomes together with their associated probabilities (frequencies, relative frequencies).
Type I Error:
The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.
Type II Error:
The decision is not to reject the null hypothesis when, in fact, the null hypothesis is false.
Uniform Distribution:
A continuous random variable (RV) that has equally likely outcomes over the domain, $a < x < b$. Often referred to as the Rectangular distribution because the graph of the pdf has the form of a rectangle. Notation: $X \sim U(a, b)$. The mean is $\mu = \frac{a + b}{2}$ and the standard deviation is $\sigma = \sqrt{\frac{(b - a)^2}{12}}$. The probability density function is $f(x) = \frac{1}{b - a}$ for $a < x < b$ or $a \le x \le b$. The cumulative distribution is $P(X \le x) = \frac{x - a}{b - a}$.
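
A minimal sketch of the uniform formulas follows. The endpoints `a = 2` and `b = 10` and the function names are assumptions for illustration; outside the interval the pdf is taken to be 0 and the CDF is clamped to 0 or 1.

```python
from math import sqrt

def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], and 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) = (x - a) / (b - a), clamped to 0 below a and 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2, 10                      # assumed endpoints
mean = (a + b) / 2                # mu = (a + b) / 2
sd = sqrt((b - a) ** 2 / 12)      # sigma = sqrt((b - a)^2 / 12)
print(mean, sd, uniform_pdf(5, a, b), uniform_cdf(5, a, b))
```
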
Variable (Random Variable):
A characteristic of interest in a population being studied. The common notation for variables is upper case Latin letters $X$, $Y$, $Z$, .... The common notation for a specific value of a variable is lower case Latin letters $x$, $y$, $z$, .... The variable in statistics differs from the variable in intermediate algebra in the following two ways:
• The domain of a random variable (RV) is not necessarily a numerical set; it may consist of words. For example, if $X$ = hair color then the domain is {black, blond, gray, red, brown}.
• We can tell a specific value $x$ that the variable $X$ takes on only after performing the experiment.
Variance:
Mean of the squared deviations from the mean. Square of the standard deviation. For a set of data, a deviation can be represented as $x - \bar{x}$ where $x$ is a value of the data and $\bar{x}$ is the sample mean. The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and 1.
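
The sample variance described above can be computed by hand and compared with the standard library's `statistics.variance`, which also divides by the sample size minus 1. The data set and the helper name `sample_variance` are made up for illustration.

```python
from statistics import variance

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data for illustration
s2 = sample_variance(data)
s = s2 ** 0.5                      # the standard deviation is the square root
print(s2, s, variance(data))
```
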
Venn Diagram:
A useful visual representation of a sample space and events in the form of circles or ovals showing their intersections.
z-score:
The linear transformation of the form $z = \frac{x - \mu}{\sigma}$. If this transformation is applied to any normal distribution $X \sim N(\mu, \sigma)$, the result is the standard normal distribution $Z \sim N(0, 1)$. If this transformation is applied to any specific value $x$ of the RV with mean $\mu$ and standard deviation $\sigma$, the result is called the z-score of $x$. Z-scores allow us to compare data that are normally distributed but scaled differently.
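
As a small illustration of comparing differently scaled data, the sketch below computes z-scores for two hypothetical test scores. The means, standard deviations, and scores are invented for the example.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma

# Hypothetical scores on two differently scaled, normally distributed tests.
z_a = z_score(650, mu=500, sigma=100)   # test A
z_b = z_score(27, mu=21, sigma=5)       # test B
print(z_a, z_b)
```

The larger z-score marks the score that lies further above its own mean, measured in standard deviations.
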
