STATISTICS
Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.
Data: Data is a collection of related facts, observations or figures. A collection of data is called a data set, and each observation is a data point.
Variables: Discrete and Continuous
A variable is a symbol, such as X, Y, H, x or y, that can assume any of a prescribed set of values, called the domain of the variable. If the variable can assume only one value, it is called a constant.
A variable that can assume any value between two given values is called a continuous variable; otherwise, it is called a discrete variable.
Example: The number N of children in a family, which can assume any of the values 0, 1, 2, 3, ... but cannot be 2.5 or 3.842, is a discrete variable.
The height H of an individual, which can be 67 inches, 65.5 inches or 66.234 inches, depending on the accuracy of measurement, is a continuous variable.
Tabulation : The arrangement of the raw data under
various heads in the form of a table is called tabulation.
Frequency: The number of observations falling in a particular class is called the frequency of that class.
Cumulative Frequency: The cumulative frequency of a particular class is the sum of the frequencies of this class and those prior to it.
Frequency Distribution Table: The table showing the class intervals along with the corresponding class frequencies is known as the “Frequency Distribution Table”.
Length of the Class: The difference between the upper and lower boundaries of a class is called the length of the class.
Mid Value: The average of the lower and upper limits is called the “mid value” of a class interval.
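The definitions above can be illustrated with a small sketch. The class intervals and frequencies below are made-up values chosen only for illustration; they are not taken from any data set in these notes.

```python
from itertools import accumulate

# Hypothetical class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [3, 7, 6, 4]

# Cumulative frequency: frequency of this class plus all prior classes.
cumulative = list(accumulate(frequencies))

# Length of the class: upper boundary minus lower boundary.
lengths = [upper - lower for lower, upper in classes]

# Mid value: average of the lower and upper limits.
mid_values = [(lower + upper) / 2 for lower, upper in classes]

# Print a simple frequency distribution table.
print("class    f   cf  length   mid")
for (lower, upper), f, cf, length, mid in zip(
    classes, frequencies, cumulative, lengths, mid_values
):
    print(f"{lower:>2}-{upper:<3} {f:>4} {cf:>4} {length:>7} {mid:>5}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.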
Measures of Central Tendency:
A measure of central tendency gives us an idea about the concentration of the values in the central part of the distribution. An average of a statistical series is the value of the variable that is representative of the entire distribution.
Types of Central Tendency:
1. Mathematical Averages
a) Arithmetic Mean
b) Geometric Mean
c) Harmonic Mean
2. Positional Averages
a) Median
b) Mode
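Arithmetic Mean:
i) Individual Series: A.M. of ungrouped data = Sum of the items / Number of items, i.e. $\bar{X} = \frac{\sum X_i}{n}$
ii) Discrete Series: If X1, X2, ..., Xn are n distinct values with frequencies f1, f2, f3, ..., fn, then $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
iii) Continuous Series: If X1, X2, X3, ..., Xn are the mid values and f1, f2, f3, ..., fn are the frequencies of grouped data, then $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
In the step deviation method, A.M. = $A + \frac{\sum f_i d_i}{N} \times C$, where A = assumed mean, C = width of the class, N = $\sum f_i$ and $d_i = \frac{X_i - A}{C}$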
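A minimal sketch of these calculations, assuming the standard formulas: sum of items over number of items for an individual series, sum of f·X over sum of f for discrete and continuous series, and A + (sum of f·d / N) × C with d = (X − A)/C for the step deviation method. The function names and sample figures are illustrative only.

```python
def mean_individual(values):
    """A.M. of ungrouped data: sum of the items / number of items."""
    return sum(values) / len(values)


def mean_frequency(xs, fs):
    """A.M. of a discrete or continuous series (xs are values or mid values)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_step_deviation(mids, fs, assumed_mean, width):
    """A.M. by step deviation: A + (sum(f * d) / N) * C, with d = (X - A) / C."""
    n = sum(fs)
    total_fd = sum(f * (x - assumed_mean) / width for x, f in zip(mids, fs))
    return assumed_mean + (total_fd / n) * width


# Hypothetical grouped data: mid values and frequencies.
mids = [5, 15, 25, 35]
fs = [3, 7, 6, 4]

direct = mean_frequency(mids, fs)
step = mean_step_deviation(mids, fs, assumed_mean=15, width=10)
print(direct, step)
```

Both methods should agree (here the mean is 20.5, up to floating-point rounding); the step deviation method only makes the hand arithmetic smaller.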
Important Results:
(i) The algebraic sum of the deviations taken about the mean is zero.
(ii) For any two numbers a and b, their mean is (a+b)/2.
(iii) Every data set has only one mean.
Merits of Arithmetic Average:
(i) It is easy to understand and easy to calculate.
(ii) It provides a good basis for comparison.
(iii) As every item is taken into the calculation, it is affected by every item.
Demerits of Arithmetic Average:
(i) It cannot be located graphically.
(ii) A single extreme observation can bring a big change in the mean.
(iii) It is very difficult to find the actual mean.
(iv) We cannot calculate the mean for a data set with open-ended classes.
Weighted Arithmetic Mean:
The weighted mean is calculated by taking into account the relative importance of each of the values to the total value.
When the observations X1, X2, X3, ..., Xn are assigned the weights W1, W2, W3, ..., Wn respectively, the weighted arithmetic mean is given by $\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$
Combined Mean: If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two series with m and n observations respectively, the combined mean is $\bar{X}_{12} = \frac{m\bar{X}_1 + n\bar{X}_2}{m + n}$
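As a rough illustration, the weighted mean and the combined mean can be computed as below. The sketch assumes the standard formulas (sum of W·X over sum of W, and (m·X̄1 + n·X̄2)/(m + n)); the function names and the marks used are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted A.M.: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def combined_mean(mean1, m, mean2, n):
    """Combined mean of two series with m and n observations."""
    return (m * mean1 + n * mean2) / (m + n)


# Hypothetical marks in three tests with weights 1, 2 and 3.
print(weighted_mean([60, 70, 80], [1, 2, 3]))

# Two hypothetical groups: 20 observations with mean 50, 30 with mean 60.
print(combined_mean(50, 20, 60, 30))  # 56.0
```

Note that the combined mean is itself a weighted mean of the two means, with the numbers of observations as weights.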
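Geometric Mean:
It is useful when we have quantities that change over a period of time.
(i) Ungrouped Data (Individual Series):
G.M. = $\sqrt[n]{X_1 X_2 X_3 \cdots X_n}$ = $\text{antilog}\left(\frac{\sum \log X_i}{n}\right)$, where X1, X2, X3, ..., Xn are n observations.
(ii) Discrete Series:
G.M. = $\sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_n^{f_n}}$, where f1, f2, f3, ..., fn are the frequencies of the observations X1, X2, X3, ..., Xn and N = $\sum f_i$ is the sum of the frequencies.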
Properties of the Geometric Mean:
i) G.M. is used in calculating growth rates.
ii) If any observation is zero, G.M. becomes zero.
iii) It is difficult to calculate the nth root.
iv) If a and b are two numbers, then their G.M. is $\sqrt{ab}$.
v) If any observation is negative, G.M. may be imaginary, so it is normally used only for positive observations.
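The nth root is easier to handle through logarithms. The sketch below assumes the standard definition of G.M. as the Nth root of the product of the observations (each raised to its frequency) and computes it as the exponential of the average logarithm. The function name and the decision to reject negative observations are choices made for this example.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. of an individual series (freqs=None) or a discrete series."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x < 0 for x in values):
        # Assumption for this sketch: negative observations are rejected.
        raise ValueError("geometric mean requires non-negative observations")
    if any(x == 0 for x in values):
        # If any observation is zero, the product (and so the G.M.) is zero.
        return 0.0
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)


print(geometric_mean([2, 8]))           # close to 4, i.e. sqrt(2 * 8)
print(geometric_mean([2, 8], [3, 1]))   # fourth root of 2*2*2*8
```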
Harmonic Mean:
The harmonic mean of a given series is the reciprocal of the arithmetic average of the reciprocals of the values of its various observations.
i) Ungrouped Data (Individual Series): Let X1, X2, X3, ..., Xn be n observations; then their H.M. = $\frac{n}{\sum \frac{1}{X_i}}$
ii) Discrete Series:
H.M. = $\frac{N}{\sum \frac{f_i}{X_i}}$
where N = $\sum f_i$ is the sum of the frequencies and f1, f2, f3, ..., fn are the frequencies of the observations X1, X2, X3, ..., Xn respectively.
Properties of H.M.:
(i) H.M. is useful for calculations involving speed and distance.
(ii) If a and b are two numbers, their H.M. is $\frac{2ab}{a+b}$.
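A short sketch of the harmonic mean, assuming the standard definition (number of observations divided by the sum of the reciprocals, with frequencies as multipliers for a discrete series). The speed example is hypothetical: a journey covered at 40 km/h and the same distance back at 60 km/h.

```python
def harmonic_mean(values, freqs=None):
    """H.M. of an individual series (freqs=None) or a discrete series."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


# Average speed over two equal distances at 40 km/h and 60 km/h.
average_speed = harmonic_mean([40, 60])
print(average_speed)  # about 48, which matches 2ab / (a + b)
```

The arithmetic mean of the two speeds would give 50, which overstates the average speed because more time is spent at the slower speed.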
MEDIAN:
The median, as the name suggests, is the middle value of a series arranged in order of magnitude.
For Ungrouped Data:
(i) If n is odd, the $\left(\frac{n+1}{2}\right)$th observation is the median, after arranging the observations in either ascending or descending order.
(ii) If n is even, the average of the middle two observations is the median, after arranging the observations in either ascending or descending order.
For Grouped Data:
Median = $L + \frac{\frac{N}{2} - M}{F} \times C$
where
L = lower limit of the median class
N = sum of the frequencies
M = cumulative frequency before the median class
F = frequency of the median class
C = length of the class
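The two cases can be sketched as follows. The grouped version assumes the standard interpolation formula L + ((N/2 − M)/F) × C, equal class lengths, and takes the median class to be the first class whose cumulative frequency reaches N/2. The function names and the data are illustrative.

```python
def median_ungrouped(values):
    """Median of raw data: middle value, or average of the two middle values."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def median_grouped(lower_limits, freqs, width):
    """Median of grouped data: L + ((N/2 - M) / F) * C."""
    half = sum(freqs) / 2
    cumulative = 0  # M: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")


print(median_ungrouped([7, 3, 9, 1, 5]))   # 5
print(median_ungrouped([7, 3, 9, 1]))      # 5.0

# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
print(median_grouped([0, 10, 20, 30], [5, 8, 12, 5], width=10))
```

In the grouped example N = 30, so N/2 = 15 falls in the class 20-30, with M = 13 and F = 12.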
Properties of Median:
(i) Median is easy to understand, and it can be computed from any kind of data, even grouped data with open-ended classes, except when the median falls in an open-ended class.
(ii) Median can also be calculated for qualitative data that can be arranged in order.
(iii) The sum of the absolute deviations taken about the median is least.
(iv) Finding the median is time consuming, as the data must be arranged in order before the median is calculated.
(v) It is difficult to compute the median for a data set with a large number of observations.
MODE:
Mode is defined as the value of the variable which occurs most frequently in the data set.
For Grouped Data:
Mode = $l + \frac{F - F_1}{2F - F_1 - F_2} \times C$
where
l = lower limit of the modal class
F = frequency of the modal class
F1 and F2 = frequencies of the classes before and after the modal class
C = length of the class
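A minimal sketch of the grouped-data mode, assuming the standard formula l + ((F − F1)/(2F − F1 − F2)) × C. Treating a missing neighbouring class as having frequency 0, and picking the first class with the highest frequency as the modal class, are assumptions of this example. The formula breaks down when 2F − F1 − F2 is zero.

```python
def mode_grouped(lower_limits, freqs, width):
    """Mode of grouped data: l + ((F - F1) / (2F - F1 - F2)) * C."""
    # Modal class: the (first) class with the highest frequency.
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the modal class
    return lower_limits[i] + (f - f1) / (2 * f - f1 - f2) * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
print(mode_grouped([0, 10, 20, 30], [5, 8, 12, 5], width=10))
```

Here the modal class is 20-30 with F = 12, F1 = 8 and F2 = 5, so the mode is 20 + (4/11) × 10.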
Properties of Mode:
i) Mode can be used as a measure of central location for qualitative as well as quantitative data.
ii) It is not affected by extreme values.
iii) It can also be used for open-ended classes.
iv) It is difficult to find the mode when a data set contains no value that occurs more than once, or when all items have the same frequency.
Relation between Mean, Median and Mode:
· In the case of a symmetrical distribution, mean, median and mode coincide, i.e. Mean = Median = Mode
· If the distribution is moderately asymmetrical, Mean - Median = (Mean - Mode)/3
Thus Mode = 3 Median - 2 Mean
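The step from the empirical relation to the formula for the mode is only a rearrangement, which can be written out as follows. The numbers in the check are made up for illustration.

```latex
\begin{align*}
\text{Mean} - \text{Median} &= \tfrac{1}{3}\,(\text{Mean} - \text{Mode}) \\
3\,\text{Mean} - 3\,\text{Median} &= \text{Mean} - \text{Mode} \\
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean}
\end{align*}
% Check with hypothetical values Mean = 20, Median = 22:
% Mode = 3(22) - 2(20) = 26, and (20 - 26)/3 = -2 = 20 - 22.
```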
Measures of Dispersion:
A measure of dispersion describes how scattered or spread out the observations in a data set are.
Range:
Range is defined as the difference between the value of the largest observation (L) and the value of the smallest observation (S) present in the distribution, i.e. Range = L - S
Coefficient of Range = $\frac{L - S}{L + S}$
Properties of Range:
i) Range is simple to understand and easy to calculate.
ii) Range is the quickest way to get a measure of dispersion, although it is not accurate.
iii) It is not based on all the observations in the data.
iv) It is influenced by extreme values.
v) Range cannot be computed for a frequency distribution with open-ended classes.
Inter-Quartile Range and Quartile Deviation:
In the range we calculate L - S using the extreme terms. In this case we leave out the first 25% and the last 25% of the terms to avoid giving undue importance to extreme values. Leaving out the first and last 25% of the terms gives us Q1 and Q3.
A. Inter-Quartile Range = Q3 - Q1, and Semi Inter-Quartile Range (Quartile Deviation) = $\frac{Q_3 - Q_1}{2}$
B. Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
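The range and quartile measures can be sketched with Python's standard `statistics` module. The sketch assumes the usual formulas: Range = L − S, coefficient of range = (L − S)/(L + S), quartile deviation = (Q3 − Q1)/2 and coefficient of quartile deviation = (Q3 − Q1)/(Q3 + Q1). Quartiles can be defined in several ways; `statistics.quantiles` with its default "exclusive" method is one convention and may differ slightly from a textbook's rule. The data are made up.

```python
import statistics

# Hypothetical ungrouped data.
data = [2, 4, 6, 8, 10, 12, 14]

largest, smallest = max(data), min(data)
value_range = largest - smallest                    # L - S
coeff_range = value_range / (largest + smallest)    # (L - S) / (L + S)

# Quartiles under the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                         # inter-quartile range
quartile_deviation = iqr / 2          # semi inter-quartile range
coeff_qd = (q3 - q1) / (q3 + q1)      # coefficient of quartile deviation

print(value_range, coeff_range)
print(q1, q3, iqr, quartile_deviation, coeff_qd)
```

For this data Q1 = 4 and Q3 = 12, so the inter-quartile range is 8 and the quartile deviation is 4.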
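Mean Deviation:
M.D. = $\frac{\sum f|D|}{N}$
where
f = frequency of the corresponding interval
N = total of the frequencies
|D| = deviation from the median, mean or mode, ignoring ± signs
Coefficient of Mean Deviation:
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{Mean, Median or Mode}}$, using the average from which the deviations were taken
A. Individual Series:
M.D. = $\frac{\sum |D|}{n}$
B. Discrete Series:
M.D. = $\frac{\sum f|D|}{N}$
Note: |D| is the deviation of the variable from the mean (X), median (M) or mode (Z), ignoring ± signs.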
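A small sketch of the mean deviation, assuming the standard formula: sum of f·|D| divided by N, where |D| is the absolute deviation from the chosen average. Defaulting to the arithmetic mean when no average is supplied is a choice made for this example, as are the function name and data.

```python
def mean_deviation(values, freqs=None, about=None):
    """M.D. = sum(f * |D|) / N, with deviations taken about `about`."""
    if freqs is None:
        freqs = [1] * len(values)  # individual series: every frequency is 1
    n = sum(freqs)
    if about is None:
        # Assumption for this sketch: default to the arithmetic mean.
        about = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / n


# Individual series with mean 6.
md = mean_deviation([2, 4, 6, 8, 10])
print(md, md / 6)  # M.D. and coefficient of M.D. about the mean

# Discrete series: values 1, 2, 3 with frequencies 1, 2, 1, about the median 2.
print(mean_deviation([1, 2, 3], [1, 2, 1], about=2))  # 0.5
```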
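Standard Deviation:
S.D. = $\sigma = \sqrt{\frac{\sum x^2}{N}}$, where $x = X - \bar{X}$, $x^2 = (X - \bar{X})^2$ and N is the number of observations
Coefficient of Standard Deviation = $\frac{\sigma}{\bar{X}}$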
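The sketch below works through the standard deviation step by step, assuming the population formula (square root of the sum of squared deviations from the mean divided by N) and the coefficient of standard deviation as S.D. divided by the mean. It then compares the result with `statistics.pstdev`, which uses the same population formula. The data are made up.

```python
import math
import statistics

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]   # x^2 = (X - mean)^2
sd = math.sqrt(sum(squared_deviations) / len(data))    # sqrt(sum(x^2) / N)
coeff_sd = sd / mean                                   # coefficient of S.D.

print(mean, sd, coeff_sd)
print(statistics.pstdev(data))  # library value for comparison
```

The squared deviations sum to 32, so the variance is 4 and the standard deviation is 2.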
For a symmetric (approximately normal) distribution, the relationships among Q.D., M.D. and S.D. are approximately:
Q.D. = 2/3 S.D.
S.D. = 3/2 Q.D.
M.D. = 4/5 S.D.
S.D. = 5/4 M.D.
Q.D. = 5/6 M.D.
M.D. = 6/5 Q.D.
Correlation:
The measurement of the degree of relationship between two variables is called correlation.
Coefficient of Rank Correlation:
The relationship between two variables that cannot be measured directly can be found by the coefficient of rank correlation.
Spearman’s Rank Correlation Coefficient (r) = $1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
where $d_i$ is the difference between the ranks of the ith pair of the two quantities and n is the number of observations.
The value of ‘r’ lies between -1 and +1.
The value of r is equal to 1 for a perfect positive correlation.
The value of r is equal to -1 for a perfect negative correlation.
r is 0 for a complete absence of correlation.
If |r| < 0.2, the relationship is ‘negligible’.
If 0.2 ≤ |r| < 0.4, the relationship is ‘slight’.
If 0.4 ≤ |r| < 0.7, the relationship is ‘substantial’.
If 0.7 ≤ |r| < 1, the relationship is ‘very high’.
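A minimal sketch of Spearman's coefficient, assuming the standard formula r = 1 − 6·Σd² / (n(n² − 1)) and no tied values (ties need a corrected formula that is not covered here). The function names, the marks, and the choice to place boundary values such as exactly 0.2 in the higher band are assumptions of this example.

```python
def ranks(values):
    """Rank 1 for the largest value; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rank_correlation(x, y):
    """r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def strength(r):
    """Verbal label for |r| using the bands listed above."""
    size = abs(r)
    if size < 0.2:
        return "negligible"
    if size < 0.4:
        return "slight"
    if size < 0.7:
        return "substantial"
    return "very high"


# Hypothetical marks of five students in two subjects.
x = [85, 60, 73, 40, 90]
y = [93, 75, 65, 50, 80]

r = spearman_rank_correlation(x, y)
print(r, strength(r))
```

For these marks the rank differences are 1, 1, −1, 0, −1, so Σd² = 4 and r = 1 − 24/120 = 0.8.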