Choosing Which Statistical Test to Use
- April 26, 2020
- By Sanjeev Dubey
There are many different tests in statistics, and it can be quite difficult to know which one is correct for a given problem. This article covers seven tests you are likely to use, involving means, proportions, and relationships. When you are trying to work out which test is most appropriate, you should ask three questions:
1. What level of measurement was used for the data we are analyzing?
2. How many samples do we have?
3. What is the purpose of our analysis?
I will now explain each of these questions.
1). Data or level of measurement: Is our data nominal or interval/ratio?
Nominal data is also called categorical or qualitative data, and the tests applied to it are often described as nonparametric. Examples of nominal data are color, whether parts are defective or not, and preferred type of chocolate. Nominal data is usually summarized as frequencies, proportions, or percentages. The tests that involve nominal data are the test for a proportion, the difference of two proportions, and the chi-squared test for independence.
The other type of data is interval/ratio, also called quantitative data. Examples of interval/ratio data are daily sales figures for coconut ties, the weight of peanuts, and temperature. The most common summary value for interval/ratio data is the mean. The tests that involve interval/ratio data are the test for a mean, the difference of two means (independent samples), the difference of two means (paired), and regression analysis.
Ordinal data can be grouped with either nominal or interval/ratio data, depending on the circumstances.
2). Samples: Next, we ask how many samples are involved.
Is there one sample for which we are testing the relevant statistic against a hypothesized value? Are there two samples being compared with each other? Or is there one sample in which each observation has a measure or score for more than one variable, for example when the same sample is measured twice? If we wish to compare a proportion or a mean against a given value, this involves one sample. If we are comparing two different groups of people or things, such as men and women, or people from two different departments, then we have two samples. If we have two sets of information on the same people or things, we say we have one sample with two variables. One example is a set of days with information on how many coconut ties were sold and what the temperature was. Another is a set of people with information on their gender and preferred type of chocolate.
3). Purpose: Finally, we ask what the purpose of the analysis is.
We may be testing a statistic against a hypothesized value, comparing two statistics, or looking for a relationship. The chi-squared test for independence and regression are similar in that both look at the relationship between two variables. The difference between them is the kind of data. If you would summarize the data in a table, you would use a chi-squared test for independence, whereas if you would put it on a scatter plot, you would use regression analysis.
Here is an example of each of these tests. They relate back to our other articles teaching about hypothesis testing. After each description of the scenario, pause and see if you can identify the correct test before reading the answer. Helen is still selling coconut ties.
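The three questions above can be sketched as a simple lookup from (data, samples, purpose) to a test name. This is a minimal illustration, not a standard library: the function name `choose_test` and the label strings such as `"one_two_variables"` are hypothetical choices made for this sketch.

```python
# Hypothetical helper: encodes this article's three questions as a lookup table.
# The label strings are illustrative, not part of any standard API.

def choose_test(data: str, samples: str, purpose: str) -> str:
    """Return the test suggested by the three questions.

    data:    "nominal" or "interval" (interval/ratio)
    samples: "one", "two_independent", or "one_two_variables"
    purpose: "given_value", "compare", or "relationship"
    """
    table = {
        ("interval", "one", "given_value"): "test for a mean",
        ("nominal", "one", "given_value"): "test for a proportion",
        ("interval", "one_two_variables", "compare"): "difference of two means, paired samples",
        ("nominal", "two_independent", "compare"): "difference of two proportions",
        ("interval", "two_independent", "compare"): "difference of two means, independent samples",
        ("interval", "one_two_variables", "relationship"): "regression",
        ("nominal", "one_two_variables", "relationship"): "chi-squared test for independence",
    }
    key = (data, samples, purpose)
    if key not in table:
        # Combinations outside the seven covered tests are rejected, not guessed.
        raise ValueError(f"no test in this article matches {key}")
    return table[key]

# Example 4 (badly wrapped bars from two machines)
print(choose_test("nominal", "two_independent", "compare"))  # difference of two proportions
```

The seven entries correspond to the seven examples; any other combination raises an error rather than silently picking a test.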
Example 1: Sufficient nuts
Helen was concerned about whether the amount of nuts in her coconut ties was sufficient. She took a sample of twenty packets and found the weight of nuts in each packet.
1). Data: The weight was interval/ratio data.
2). Samples: There was just one sample of twenty packets of coconut ties.
3). Purpose: Helen was comparing the sample mean against a given value.
Thus, the test she needs to use is the “test for a mean”.
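For Example 1, a minimal sketch of the one-sample t statistic follows. The article gives neither Helen's twenty weights nor the target weight, so both are left as arguments. The function name `one_sample_t` is hypothetical, and because only Python's standard `math` and `statistics` modules are used, it returns the statistic and degrees of freedom rather than a p-value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Hypothetical helper: t statistic and df for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1
```

With twenty packets there are 19 degrees of freedom. If Helen treats this as a one-sided test (she is only worried about too few nuts), she would reject the target at the 5% level when t falls below about -1.729; for a two-sided test the cutoff is about ±2.093.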
Example 2: Prize tickets
In a promotional campaign, twenty percent of all packs of coconut ties should include tickets for free prizes. Helen takes a sample of fifty packets and finds that seven of them have winning tickets.
1). Data: For each packet we record only yes or no: whether or not there is a ticket. From this nominal data we get a sample proportion of seven out of fifty.
2). Samples: There is one sample of fifty packets.
3). Purpose: Helen is comparing the sample value against a given value: twenty percent.
We conclude that the test she needs to use is the “test for a proportion”.
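Example 2 can be worked through with the usual normal-approximation z test for one proportion. This sketch uses the article's numbers (7 of 50 against 20%); the function name `one_proportion_z` is hypothetical, and the p-value comes from `statistics.NormalDist` in Python's standard library.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Hypothetical helper: z statistic and two-sided p-value for H0: proportion == p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Example 2: 7 winning tickets in 50 packets, claimed rate 20%
z, p = one_proportion_z(7, 50, 0.20)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

Here z is about -1.06, well inside ±1.96, so seven winners in fifty would not be significant at the 5% level. The normal approximation is usually considered acceptable when n*p0 and n*(1 - p0) are not too small; here they are 10 and 40.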
Example 3: Bar longevity compared with nut-bars
Helen thinks her coconut ties last longer than the competition, nut-bars. She gets 36 people to eat one of each and records their eating times.
1). Data: Helen records the time taken in seconds, so this is interval/ratio data.
2). Samples: There is one sample of thirty-six people, but with two scores for each person: the time for the coconut tie and the time for the nut-bar.
3). Purpose: She is looking at whether there is a difference between the times taken to eat the two bars.
Thus, the test is the “difference of two means, paired samples”.
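A paired test works on the difference within each person. The sketch below assumes the two lists hold the 36 eating times in the same person order; the article gives no actual times, and the function name `paired_t` is hypothetical.

```python
import math
import statistics

def paired_t(coconut_times, nutbar_times):
    """Hypothetical helper: t statistic and df for H0: mean paired difference == 0."""
    if len(coconut_times) != len(nutbar_times):
        raise ValueError("paired samples must have the same length")
    # One difference per person; positive means the coconut tie took longer.
    diffs = [c - nb for c, nb in zip(coconut_times, nutbar_times)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Pairing turns the problem into a one-sample t test on the 36 differences (35 degrees of freedom), so person-to-person differences in eating speed cancel out.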
Example 4: Defective wrapping from two wrapping machines
Helen thinks there is a difference in performance between the two wrapping machines in her factory. She checks 200 bars from one machine and 150 bars from the other. For each bar, she checks whether the wrapping is satisfactory. She finds that ten out of 200 bars from the first machine and nine out of 150 bars from the second machine are badly wrapped.
1). Data: The information for each bar is OK or not OK. This is nominal data, summarized as frequencies.
2). Samples: There are two independent samples, one from each of the two machines.
3). Purpose: Helen is comparing the proportions of the two samples.
We can see that the test is the “difference of two proportions”.
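Using the article's counts, a pooled two-proportion z test can be sketched as follows. The function name `two_proportion_z` is hypothetical; pooling the two samples to estimate the common proportion under the null hypothesis is the standard textbook form of this test.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Hypothetical helper: pooled z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Example 4: 10 of 200 badly wrapped (machine 1) vs 9 of 150 (machine 2)
z, p = two_proportion_z(10, 200, 9, 150)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

For Helen's machines (5% versus 6% badly wrapped), z is about -0.41. Since |z| is well below 1.96, the difference would not be significant at the 5% level for a two-sided test.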
Example 5: Do stickers help sales?
Helen is exploring whether having free stickers
makes a difference in sales. She has the sales figures for thirteen days when
she did offer free stickers and ten days when she did not.
1). Data: For each day, Helen has a number corresponding to the sales for that day. This is interval/ratio data, summarized as the mean number of sales.
2). Samples: There are two samples, one for days with stickers and one for days without.
3). Purpose: Helen is comparing the average sales figures for the two treatments.
We conclude that the test to use is the “difference of two means, independent samples”.
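The article does not say whether the two groups' variances should be pooled. This sketch uses Welch's version of the independent-samples t test, which does not assume equal variances; the function name `welch_t` is hypothetical, and the sales figures themselves are not given in the article.

```python
import math
import statistics

def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and approximate df for H0: equal means."""
    na, nb = len(a), len(b)
    se2a = statistics.variance(a) / na  # squared standard error of mean(a)
    se2b = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df
```

With 13 sticker days and 10 non-sticker days, the pooled-variance alternative would use 13 + 10 - 2 = 21 degrees of freedom; Welch's approximate df always lies between 9 and 21.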
Example 6: Are sales affected by temperature?
Helen wants to see if there is a relationship between the daily temperature and sales of coconut ties. She has data on sales and temperature for thirty weekdays.
1). Data: Sales and temperature are both interval/ratio variables.
2). Samples: There is one sample of thirty days, with two measures or scores for each day.
3). Purpose: Helen is interested in the relationship between sales and temperature.
This leads us to decide that the test is “regression”.
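A minimal least-squares sketch for Example 6 is below, assuming `x` holds the daily temperatures and `y` the matching sales. The function name `simple_regression` is hypothetical. The test here is whether the slope differs from zero, using the usual t statistic for the slope.

```python
import math
import statistics

def simple_regression(x, y):
    """Hypothetical helper: line y = a + b*x, Pearson r, and t for H0: slope == 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y with at least 3 points")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = y_bar - b * x_bar           # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    # A perfect fit has no residual error, so t is unbounded in that case.
    t = b / se_b if se_b > 0 else math.copysign(math.inf, b)
    return a, b, r, t
```

With thirty days, the slope's t statistic would be compared with a t distribution on 28 degrees of freedom.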
Example 7: Men, women, and chocolate preference
Helen is thinking of selling dark chocolate, milk chocolate, and white chocolate coconut ties. She thinks that men and women might have different preferences with regard to type. She collects data from fifty customers, noting down whether they are men or women and asking them which variety they prefer.
1). Data: Helen records each person's preferred type of chocolate and sex. These are both nominal variables.
2). Samples: There is one sample of fifty customers, but with two measures or variables.
3). Purpose: Helen is looking at whether there is a relationship between the two variables.
Thus, the test is the “chi-squared test for independence”.
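The chi-squared test for independence compares each observed count in the table with the count expected if the two variables were unrelated. This sketch computes the Pearson statistic and its degrees of freedom for any table of counts; the function name `chi_squared_independence` is hypothetical, and Helen's actual counts are not given in the article.

```python
def chi_squared_independence(table):
    """Hypothetical helper: Pearson chi-squared statistic and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df
```

Helen's table would have two rows (men, women) and three columns (dark, milk, white), giving (2 - 1)(3 - 1) = 2 degrees of freedom, for which the 5% critical value is about 5.991. A common rule of thumb is that expected counts should be at least about 5 for the chi-squared approximation to be reliable.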