Mathematics for Placement Papers - STATISTICS - Concepts and Important Formulas

STATISTICS

Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analysing data, as well as with drawing valid conclusions and making reasonable decisions on the basis of such analysis.

Data: Data is a collection of related facts, observations or figures. A collection of data is called a data set and each observation is called a data point.

Variables :

Discrete and Continuous :

A variable is a symbol, such as X, Y, H, x or y, that can assume any of a prescribed set of values called the domain of the variable. If the variable can assume only one value, it is called a constant.

A variable that can assume any value between two given values is called a continuous variable; otherwise, it is called a discrete variable.

Example : The number N of children in a family, which can assume any of the values 0, 1, 2, 3, ... but cannot be 2.5 or 3.842, is a discrete variable.

The height H of an individual, which can be 67 inches, 65.5 inches or 66.234 inches, depending on the accuracy of measurement, is a continuous variable.

Tabulation : The arrangement of the raw data under various heads in the form of a table is called tabulation.

Frequency : The number of observations falling in a particular class is called the frequency of that class.

Cumulative frequency : The cumulative frequency of a particular class is the sum of the frequencies of this class and those prior to it.

Frequency Distribution Table : The table showing the class intervals along with the corresponding class frequencies is known as the "Frequency Distribution Table".

Length of the class : The difference between the upper and lower boundaries of a class is called the Length of the Class.

Mid value : The average of the lower and upper limits is called the "mid value" of a class interval.
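The definitions above can be sketched in a few lines of Python. The class intervals and frequencies below are made-up illustrative data, not taken from any real survey.

```python
from itertools import accumulate

# Hypothetical class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]

# Cumulative frequency: frequency of this class plus all classes before it.
cum_freqs = list(accumulate(freqs))

# Length of each class: upper boundary minus lower boundary.
lengths = [upper - lower for lower, upper in classes]

# Mid value: average of the lower and upper limits.
mid_values = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), f, cf, mid in zip(classes, freqs, cum_freqs, mid_values):
    print(f"{lower}-{upper}: frequency={f}, cumulative={cf}, mid value={mid}")
```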

Measures of Central Tendency:

A measure of central tendency gives us an idea about the concentration of the values in the central part of the distribution. An average of a statistical series is the value of the variable which is representative of the entire distribution.

Types of Measures of Central Tendency:

1. Mathematical Averages.

a) Arithmetic Mean

b) Geometric Mean

c) Harmonic Mean.

2. Positional Averages.

a) Median

b) Mode

Arithmetic mean :

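i) Individual Series : A.M. of ungrouped data = Sum of the items / Number of items = $\frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X_i}{n}$

ii) Discrete Series : If X₁, X₂, ……X_n are n distinct values with frequencies f₁, f₂, f₃, …….f_n then A.M. = $\frac{\sum f_i X_i}{\sum f_i}$

iii) Continuous Series: If X₁, X₂, X₃…..X_n are the mid values and f₁, f₂, f₃,….f_n are the frequencies of a grouped data then A.M. = $\frac{\sum f_i X_i}{\sum f_i}$

In the step deviation method A.M. = $A + \frac{\sum f_i d_i}{N} \times C$, where A = assumed mean, C = width of the class, N = $\sum f_i$ and $d_i = \frac{X_i - A}{C}$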
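As an illustration, the three ways of computing the arithmetic mean can be sketched in Python. The function names and the sample data are assumptions made for this example; the step deviation version uses the standard form A + C × (Σfd / N) with d = (X − A)/C.

```python
def mean_individual(xs):
    """A.M. of ungrouped data: sum of the items / number of items."""
    return sum(xs) / len(xs)


def mean_frequency(xs, fs):
    """A.M. of a discrete or continuous series (xs are values or mid values)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_step_deviation(mids, fs, A, C):
    """A.M. by the step deviation method with assumed mean A and class width C."""
    N = sum(fs)
    ds = [(x - A) / C for x in mids]  # step deviations d = (X - A) / C
    return A + C * sum(f * d for f, d in zip(fs, ds)) / N


# Hypothetical grouped data: mid values and frequencies.
mids = [5, 15, 25, 35]
fs = [5, 8, 12, 5]

print(mean_individual([2, 4, 6, 8]))
print(mean_frequency(mids, fs))
print(mean_step_deviation(mids, fs, A=25, C=10))
```

The direct method and the step deviation method give the same mean for this data; the step deviation method only makes the hand arithmetic lighter.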

Important Results:

(i) The algebraic sum of deviations taken about mean is zero.

(ii) For any two numbers a and b, their mean is (a+b)/2.

(iii) Every data set has only one mean.
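Result (i) follows directly from the definition of the mean. A short derivation, writing $\bar{X}$ for the mean of X₁, X₂, ……X_n (the bar notation is introduced here only for this sketch):

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0,
\qquad \text{since } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i .
```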

Merits of Arithmetic Average :

(i) Easy to understand and easy to calculate

(ii) It provides a good basis for comparison.

(iii) As every item is taken into the calculation, it is affected by every item.

Demerits of Arithmetic Average :

(i) It cannot be located graphically

(ii) A single extreme observation can bring a big change in the mean.

(iii) It is very difficult to find the actual mean

(iv) We cannot calculate the mean for a data set with open ended classes.

Weighted Arithmetic Mean:

The weighted mean is calculated taking into account the relative importance of each of the values to the total value.

When the observations X₁, X₂, X₃,……X_n and the weights W₁, W₂, W₃,…..W_n are given to each observation, then the weighted Arithmetic mean is given by $\frac{\sum W_i X_i}{\sum W_i}$

Combined Mean: If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two series with m and n observations respectively, the combined mean is $\frac{m\bar{X}_1 + n\bar{X}_2}{m + n}$
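A minimal Python sketch of the weighted mean (Σ W·X / Σ W) and the combined mean ((m × first mean + n × second mean) / (m + n)). The function names, marks and weights are hypothetical, chosen only for illustration.

```python
def weighted_mean(xs, ws):
    """Weighted A.M.: sum of W*X divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def combined_mean(mean1, m, mean2, n):
    """Mean of two series pooled together, given each mean and its size."""
    return (m * mean1 + n * mean2) / (m + n)


# Hypothetical marks in three tests with weights 1, 2 and 3.
print(weighted_mean([70, 80, 90], [1, 2, 3]))

# Hypothetical sections: 20 students with mean 50, 30 students with mean 60.
print(combined_mean(50, 20, 60, 30))
```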

Geometric Mean:

It is useful when we have some quantities that change over a period of time.

(i) Ungrouped data (individual Series)

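G.M. = $(X_1 \cdot X_2 \cdots X_n)^{1/n} = \sqrt[n]{X_1 X_2 \cdots X_n}$ where X₁, X₂, X₃,……X_n are n observations.

(ii) Discrete Series:

G.M. = $\left(X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}\right)^{1/N}$ where f₁, f₂, f₃,….f_n are the frequencies of the observations X₁, X₂, X₃,……X_n and N = $\sum f_i$ is the sum of the frequencies.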

Properties of the Geometric Mean.

i) G.M. is used in calculating the growth rates.

ii) If any observation is zero, G.M becomes zero.

iii) It is difficult to calculate the nth root.

iv) If a and b are two positive numbers then their G.M. is $\sqrt{ab}$.

v) If any observation is negative, G.M. may be imaginary, so it is normally used only for positive observations.
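A hedged Python sketch of the geometric mean for individual and discrete series. It works through logarithms, which is one common way around the difficulty of taking an nth root by hand; the function names and data are assumptions for this example, and negative observations are simply rejected.

```python
import math


def geometric_mean(xs):
    """G.M. of ungrouped data: nth root of the product of n observations."""
    if any(x < 0 for x in xs):
        raise ValueError("G.M. is not computed here for negative observations")
    if any(x == 0 for x in xs):
        return 0.0  # a zero observation makes the whole product zero
    # exp(mean of logs) equals the nth root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def geometric_mean_freq(xs, fs):
    """G.M. of a discrete series with positive values xs and frequencies fs."""
    N = sum(fs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / N)


print(geometric_mean([2, 8]))         # close to sqrt(2 * 8)
print(geometric_mean([1, 3, 9]))      # close to the cube root of 27
print(geometric_mean_freq([2, 8], [2, 2]))
```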

Harmonic Mean:

Harmonic mean of a given series is the reciprocal of the arithmetic average of the reciprocals of the values of its various observations.

i) Ungrouped Data (Individual Series): Let X₁, X₂, X₃,……X_n be n observations, then their H.M. = $\frac{n}{\sum \frac{1}{X_i}}$

ii) Discrete Series:

H.M. = $\frac{N}{\sum \frac{f_i}{X_i}}$

Where N = $\sum f_i$ is the sum of the frequencies and f₁, f₂, f₃,….f_n are the frequencies of the observations X₁, X₂, X₃,……X_n respectively

Properties of H.M.

(i) H.M. is useful for averaging rates, for example the average speed over equal distances.

(ii) If a and b are two numbers, their H.M. is $\frac{2ab}{a+b}$.
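The speed use case can be illustrated with a small Python sketch. The function names and the speeds are hypothetical; the example assumes the same distance is covered at each speed, in which case the average speed for the whole trip is the harmonic mean of the speeds.

```python
def harmonic_mean(xs):
    """H.M. of ungrouped data: n divided by the sum of reciprocals."""
    return len(xs) / sum(1 / x for x in xs)


def harmonic_mean_freq(xs, fs):
    """H.M. of a discrete series: sum of frequencies / sum of f/X."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Going at 40 km/h and returning the same distance at 60 km/h:
# the average speed is 2*40*60 / (40 + 60), not the arithmetic mean 50.
print(harmonic_mean([40, 60]))
print(harmonic_mean_freq([40, 60], [1, 1]))
```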

MEDIAN:

The median, as the name suggests, is the middle value of a series arranged in order of magnitude.

For Ungrouped Data:

(i) If n is odd, the $\frac{n+1}{2}$th observation is the median, after arranging the observations in either ascending or descending order.

(ii) If n is even, then the average of the middle two observations is the median, after arranging the observations either in ascending or descending order.
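The two cases for ungrouped data can be sketched as follows. The function name and sample values are chosen only for illustration.

```python
def median(xs):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n + 1)/2-th observation (index mid when counting from 0).
        return s[mid]
    # n even: average of the two middle observations.
    return (s[mid - 1] + s[mid]) / 2


print(median([7, 3, 9, 1, 5]))  # odd number of observations
print(median([7, 3, 9, 1]))     # even number of observations
```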

For Grouped Data:

Median = $L + \frac{\frac{N}{2} - M}{F} \times C$

Where L= Lower limit of median class

N= Sum of the frequencies

M=The cumulative frequency before the median class

F=frequency of the median class

C=length of the class
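A sketch of the grouped-data calculation in Python, using the usual interpolation formula L + ((N/2 − M) / F) × C with the symbols defined above. It assumes equal class lengths and takes the median class to be the first class whose cumulative frequency reaches N/2; the function name and data are hypothetical.

```python
def grouped_median(lowers, freqs, C):
    """Median of grouped data with class lower limits, frequencies and length C."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("no observations")
    M = 0  # cumulative frequency before the current class
    for L, F in zip(lowers, freqs):
        if M + F >= N / 2:
            # This is the median class: interpolate inside it.
            return L + (N / 2 - M) / F * C
        M += F


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lowers = [0, 10, 20, 30]
freqs = [5, 8, 12, 5]
print(grouped_median(lowers, freqs, C=10))
```

Here N = 30, so N/2 = 15 falls in the class 20-30, with M = 13 and F = 12.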

Properties of Median:

(i) Median is easy to understand and it can be computed from any kind of data even for grouped data with open-ended classes, but excluding the case when median falls in the open-ended class.

(ii) Median can also be calculated for qualitative data

(iii) The sum of absolute deviations taken about median is least

(iv) Calculating the median is time consuming, as the data must be arranged in order before the median can be found.

(v) It is difficult to compute median for data set with large number of observations.

MODE:

Mode is defined as the value of the variable which occurs most frequently in the data set.

Grouped Data Mode = $l + \frac{F - F_1}{2F - F_1 - F_2} \times C$

Where l= lower limit of the modal class

F= frequency of the modal class. F₁ and F₂ are the frequencies of the classes before and after the modal class.

C= length of the class.

Properties of Mode:

i) Mode can be used as a central location for qualitative as well as quantitative data.

ii) It is not affected by extreme values

iii) It can also be used for open-ended classes

iv) It is difficult to find the mode, when a data set contains no value that occurs more than once (or) all items are having the same frequency.

Relation between Mean, Median and Mode.

· In case of a symmetrical distribution, mean, median and mode coincide

i.e. Mean = Median = Mode

· If the distribution is moderately asymmetrical,

Mean − Median = (Mean − Mode)/3

Thus Mode = 3 Median − 2 Mean
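The step from the empirical relation to the formula for the mode is a one-line rearrangement, written out here for completeness:

```latex
\text{Mean} - \text{Median} = \frac{\text{Mean} - \text{Mode}}{3}
\;\Longrightarrow\;
3\,\text{Mean} - 3\,\text{Median} = \text{Mean} - \text{Mode}
\;\Longrightarrow\;
\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}
```

For instance, with illustrative values Mean = 20 and Median = 22, the relation gives Mode = 3 × 22 − 2 × 20 = 26.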

Measures of Dispersion:

A measure of dispersion describes how scattered or spread out the observations in a data set are.

Range :

Range is defined as the difference between the value of the largest observation (L) and the value of the smallest observation (S) present in the distribution, i.e. Range = L − S.

Co-efficient of Range = $\frac{L - S}{L + S}$

Properties of Range:

i) Range is simple to understand and easy to calculate

ii) Range is the quickest way to get a measure of dispersion, although it is not accurate.

iii) It is not based on all the observations in the data

iv) It is influenced by extreme values

v) Range cannot be computed for frequency distribution with open-end classes
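A short Python sketch of the range and its coefficient, taking L as the largest and S as the smallest observation and using the standard coefficient (L − S)/(L + S). The function names and data are hypothetical.

```python
def value_range(xs):
    """Range: largest observation minus smallest observation."""
    return max(xs) - min(xs)


def coefficient_of_range(xs):
    """Coefficient of range: (L - S) / (L + S)."""
    L, S = max(xs), min(xs)
    return (L - S) / (L + S)


data = [12, 7, 25, 18, 3]
print(value_range(data))
print(coefficient_of_range(data))
```

Only the two extreme values 25 and 3 enter the calculation, which is why the range is quick to compute but sensitive to extreme values.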

Inter-Quartile Range:

Quartile Deviation: The range uses only the largest and smallest terms (L − S). Here we leave out the first 25% and the last 25% of the terms to avoid giving undue importance to extreme values.

So, leaving out the first and last 25% of the terms, we are left with the values between Q₁ and Q₃.

A. Inter-Quartile Range = Q₃ − Q₁ and Semi inter-quartile range (Quartile Deviation) = $\frac{Q_3 - Q_1}{2}$

B. Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
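A hedged Python sketch of these quartile measures, with Q₁ and Q₃ denoting the first and third quartiles. Several conventions exist for locating quartiles in a small data set; this example relies on `statistics.quantiles` from the Python standard library (Python 3.8+) with its default "exclusive" method, which may differ slightly from the convention used in a given textbook. The data is made up.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

inter_quartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2           # semi inter-quartile range
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(q1, q3)
print(inter_quartile_range, quartile_deviation, coefficient_of_qd)
```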

Mean Deviation:

M.D. = $\frac{\sum f |D|}{N}$

f= frequency of corresponding interval

N= sum of the frequencies

|D|= deviations from median or mean or mode ignoring ± signs

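Coefficient of Mean Deviation:

Coefficient of M.D. = M.D. divided by the average (mean, median or mode) from which the deviations are taken

A. Individual series:

M.D. = $\frac{\sum |D|}{n}$, where n is the number of observations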
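A minimal Python sketch of the mean deviation and its coefficient, assuming the standard definitions: M.D. is the average of the absolute deviations from a chosen average, and the coefficient divides M.D. by that same average. The function names and data are hypothetical.

```python
def mean_deviation(xs, about):
    """M.D. of an individual series: sum of |X - about| divided by n."""
    return sum(abs(x - about) for x in xs) / len(xs)


def mean_deviation_freq(xs, fs, about):
    """M.D. of a frequency distribution: sum of f*|X - about| divided by N."""
    return sum(f * abs(x - about) for x, f in zip(xs, fs)) / sum(fs)


def coefficient_of_md(md, about):
    """Coefficient of M.D.: M.D. divided by the average it was taken about."""
    return md / about


data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)
md = mean_deviation(data, mean)
print(md, coefficient_of_md(md, mean))
print(mean_deviation_freq([5, 15, 25], [2, 6, 2], about=15))
```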