Chi-square statistics in research for data analysis
In this blog post, I will explain what a chi-square test is and how it can be used for data analysis.
Things don't always turn out the way you expect in statistics. There might
be a hidden bias in the choices people make, or the data may not be evenly
distributed. We use a statistical test called the chi-square test to compare
the expected with the observed. It works with frequencies (counts) of data
rather than means, unlike many other statistical tests.
The chi-square test is often used to decide whether to retain or reject the
null hypothesis of a study. If you have two categorical variables in your data
and you want to test the relationship between the two, the chi-square test
serves the purpose. For any data analysis, the important thing is to formulate
the analysis plan (test statistic, significance level). It should describe how
the data will be used to reject or fail to reject the null hypothesis. Suppose
you wish to conduct a chi-square test of independence for two categorical
variables; the following are the main ingredients of the analysis:
1. Degrees of freedom
2. Expected frequencies
3. Test statistic
4. P-value
Let me explain what exactly the chi-square test does
and how it can be used for data analysis by means of an example.
I love to watch horror movies. Out of curiosity, I once asked some of my
classmates whether they like to watch horror movies too. I gathered the
responses so that I could investigate them and identify some patterns. The
data I got is:
| Like to watch horror movies | Yes | No |
|---|---|---|
| Women | 32 | 38 |
| Men | 30 | 12 |
| Total | 62 (55.4%) | 50 (44.6%) |
At first glance, it would seem that men and women like horror movies about
equally: 32 women and 30 men said yes. However, if you look closely, they do
not! Only 32 of the 70 women (45.7%) said yes, compared with 30 of the 42 men
(71.4%). This is where bias can play a significant role. This situation led me
to analyse the data for statistical significance.
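To see why the raw "Yes" counts can mislead, it helps to turn each row of the table into a proportion. The snippet below is a small sketch using only the counts from the table; the names `observed` and `share_who_like` are my own and purely illustrative.

```python
# Observed counts from the survey table: (likes horror movies, does not)
observed = {"Women": (32, 38), "Men": (30, 12)}


def share_who_like(counts):
    """Return the fraction of a group that answered 'Yes'."""
    yes, no = counts
    return yes / (yes + no)


shares = {group: share_who_like(counts) for group, counts in observed.items()}

for group, share in shares.items():
    print(f"{group}: {share:.1%} like horror movies")
# Women: 45.7% like horror movies
# Men: 71.4% like horror movies
```

The "Yes" counts are nearly the same, but the groups differ in size, so the proportions are quite different.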
In this case, there are no mean values to work with!
The data is purely categorical in nature, so I should use a test that deals
with count data instead of mean values. To test the statistical significance
in this situation, I would use the widely used chi-square test, which plays
the role for counts that the t-test and F-test play for means. The claim to
be tested, or the problem statement, is:
Null hypothesis: There is no association between movie preference and
gender.
Alternative hypothesis: There is an association between movie preference
and gender.
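And from Agresti (2002), the chi-square test statistic can be
represented as

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency and E_i is the expected frequency in cell i.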
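The Pearson chi-square statistic adds up, over all cells, the squared difference between the observed and expected count divided by the expected count. Here is a minimal sketch of that sum in Python. The function name `chi_square_statistic` and the coin-flip numbers are my own illustration, not data from this post.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data (not from the survey): 100 coin flips with
# 60 heads and 40 tails observed, and 50 of each expected for a fair coin.
toy_statistic = chi_square_statistic([60, 40], [50, 50])
print(toy_statistic)  # 4.0
```

Each cell contributes more to the statistic the further its observed count is from what was expected.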
Now comes a question! What is the expected frequency here? How do you
calculate it?
Before that, we need to frame the null hypothesis: liking horror movies is
independent of gender. Let us calculate the expected frequencies under this
hypothesis.
Under independence, the expected number of women who like to watch horror
movies (Women-yes) is the row total times the column total divided by the
grand total: (70 × 62) / 112 = 38.75. Likewise, the other expected frequencies
are Women-no = 31.25, Men-yes = 23.25, and Men-no = 18.75.
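The expected frequencies can be reproduced from the row and column totals of the survey table. The sketch below assumes the table is stored as a list of rows (Women, then Men) with columns Yes and No; `expected_frequencies` is a helper name I made up for illustration.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Women, Men. Columns: Yes, No.
observed = [[32, 38], [30, 12]]
expected = expected_frequencies(observed)
print(expected)  # [[38.75, 31.25], [23.25, 18.75]]
```

The expected counts keep the same row and column totals as the observed table; only the split inside the table changes.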
Thus, the chi-square value is 7.02 based on the formula. Next, we have to
decide whether it is statistically significant or not. For this, we compare
the value with the critical value of the chi-square distribution with the
corresponding degrees of freedom. Degrees of freedom are calculated as
(number of rows - 1) × (number of columns - 1), which is (2 - 1) × (2 - 1) = 1
here. If the calculated value exceeds the critical value, we conclude there is
a lack of independence. For this horror movies example, our calculated value
of 7.02 is higher than the critical value of 3.84 for 1 degree of freedom at
the 5% level of significance, leading us to reject the null hypothesis, i.e.,
liking horror movies is not independent of gender.
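The whole decision can be sketched with Python's standard library. Two assumptions to flag: the critical value 3.841 is typed in from a standard chi-square table, and the p-value shortcut with `math.erfc` is valid only for 1 degree of freedom, as in this 2 × 2 example. No continuity correction is applied.

```python
import math

# Rows: Women, Men. Columns: Yes, No.
observed = [[32, 38], [30, 12]]
expected = [[38.75, 31.25], [23.25, 18.75]]

# Pearson chi-square statistic summed over all four cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: critical value for df = 1 at the 5% level, from a standard table.
CRITICAL_VALUE_DF1_5PCT = 3.841

# For df = 1 only, P(chi-square > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

reject_null = chi_square > CRITICAL_VALUE_DF1_5PCT

print(f"chi-square = {chi_square:.2f}, df = {df}")  # chi-square = 7.02, df = 1
print(f"p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

The p-value comes out at roughly 0.008, well below 0.05, which agrees with the comparison against the critical value. If you check this with a statistics package, note that some apply a continuity correction to 2 × 2 tables by default (SciPy's `chi2_contingency` does), which gives a somewhat smaller statistic.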
The main use of the chi-square statistic is to test the statistical
significance of the difference between the observed and the expected
frequencies, and it is applicable when the data are categorical (nominal) in
nature. Like the Kolmogorov-Smirnov test, the chi-square test is
non-parametric. Apart from this, the chi-square test has certain limitations:
if any expected value is less than 5, the chi-square test may lead to invalid
results. In addition, with a small sample size, the chi-square test will not
provide reliable results.
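The rule of thumb about expected values below 5 is easy to check before running the test. The following is a rough sketch; `cells_below_threshold` is a hypothetical helper name, and the second table is made-up data used only to show a failing case.

```python
def cells_below_threshold(expected, threshold=5):
    """Return (row, column, value) for every expected count below threshold."""
    return [
        (i, j, value)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < threshold
    ]


# Expected frequencies from the horror movie example: every cell is fine.
survey_expected = [[38.75, 31.25], [23.25, 18.75]]
print(cells_below_threshold(survey_expected))  # []

# Hypothetical small-sample table (not from the post): two cells are too small.
small_expected = [[2.5, 7.5], [2.5, 7.5]]
print(cells_below_threshold(small_expected))  # [(0, 0, 2.5), (1, 0, 2.5)]
```

An empty list means the expected counts pass this check; any flagged cells suggest the chi-square approximation may not be trustworthy for that table.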
Let's look at situations where the chi-square test is useful for data analysis.
- A marketing company wants to identify the relationship between customers'
geographical location and their brand preferences. In such a case, the
chi-square test plays an important role, and based on the value of the
statistic, the company can tailor its marketing strategy to different
locations to improve profits.
- The chi-square test is helpful in data analysis for testing the homogeneity
or independence of categorical variables, or for testing the goodness-of-fit
of the model considered.
- It has the flexibility to handle two or more groups of variables, and it is
used in various fields such as research, marketing, finance and economics,
psychology, and medicine.
- It is a distribution-free (non-parametric) test for categorical data, so it
is robust with respect to the distribution of the data.
- It doesn't require a mean or variance, unlike other tests such as the
t-test, F-test, and ANOVA.
- It is easy to compute, yields detailed information, and is easily carried
out in software such as R, SAS, and SPSS.
- One major application of the chi-square statistic is in medicine. If a
researcher wants to compare the performance of a drug against a control group,
the chi-square test meets the need. Likewise, many areas still use the
chi-square statistic as an omnibus test for identifying the relationship
between two categorical outcomes.
References:
1. Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley, New York.
2. Kateri, M. (2014). Contingency Table Analysis. Springer.
3. Voinov, V., Nikulin, M., & Balakrishnan, N. (2013). Chi-squared Goodness-of-fit Tests with Applications. Elsevier.