Does Data Analysis Involve a Statistical Process?
- April 26, 2020
- By Sanjeev Dubey
Now that we know what statistics is, I want to spend some time talking about the process of doing statistical analysis. The first step in any kind of statistical reporting is determining what it is you want to know. This may seem like a trivial step, but it's actually one that professional pollsters and researchers invest a lot of time in.
Pollsters, for example, want to make sure that their questions don't have any bias, that is, that they don't accidentally suggest an answer. Consider a question like, "If you knew that Senator Jones had been indicted for tax evasion, would you be more or less likely to vote for him?"
The first part of the question establishes a connection between the senator and crime in the mind of the person being asked, and that connection is going to indirectly influence their opinion and therefore their answer.
The second step in the process is determining
the source of your data.
It often isn't possible to survey an entire population; even small states, for example, have populations in the hundreds of thousands. So you'll need to limit your research to a sample, or subset, of the whole population.
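As a minimal illustration of working with a sample instead of the whole population, the Python sketch below draws a simple random sample, in which every member of the population has the same chance of being chosen. The population of 300,000 resident IDs, the sample size of 1,000, and the helper name `simple_random_sample` are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from population, each equally likely to be chosen."""
    # A seeded generator makes the draw reproducible for reporting or review.
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: 300,000 resident IDs for a small state.
population = list(range(300_000))
sample = simple_random_sample(population, 1_000, seed=42)
print(len(sample), len(set(sample)))  # 1000 1000
```

Sampling without replacement (`random.Random.sample`) guarantees nobody is surveyed twice.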
Coming up with a method for getting a sample that represents an entire population is the second major hurdle faced by statistical researchers. This is another area where they have to take great care to avoid introducing any kind of bias in the selection process. For example, if you wanted to get information about voter preferences in Michigan, it would be easy to take your sample only from people living in cities because they're easy to reach. The problem with that approach is that the opinions of people living in rural areas outside large cities may be very different from the opinions of people living in large cities, and those opinions wouldn't be represented in your results. Once you've collected your data, you're ready to start analyzing it.
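One common way to keep easy-to-reach groups from dominating a sample is proportional stratified sampling: split the population into groups (strata), such as urban and rural residents, and sample each group in proportion to its size. The sketch below assumes a hypothetical voter list that is 70% urban and 30% rural; `stratified_sample` is an illustrative helper, not a standard library function. For contrast, it also builds a city-only sample like the one in the Michigan example.

```python
import random
from collections import Counter

def stratified_sample(population, key, n, seed=None):
    """Proportional stratified sample: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        # The stratum's share of the sample matches its share of the population;
        # rounding can leave the total off by one or two for awkward proportions.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical voter list of (voter_id, region): 70,000 urban and 30,000 rural.
voters = [(i, "urban") for i in range(70_000)] + [(i, "rural") for i in range(70_000, 100_000)]

sample = stratified_sample(voters, key=lambda v: v[1], n=1_000, seed=7)
counts = Counter(region for _, region in sample)
print(counts["urban"], counts["rural"])  # 700 300

# A convenience sample drawn only from cities leaves rural voters out entirely.
city_only = [v for v in voters if v[1] == "urban"][:1_000]
print(Counter(region for _, region in city_only)["rural"])  # 0
```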
What I've referred to here as "exploring" the data can mean a lot of different things. You could sort the data and look for outliers, that is to say, values that are unusually large or small. You could look to see whether the values are evenly distributed or whether they form groups or clumps. You could try creating graphs from the data to get a more visual perspective on the results. There are a variety of things you can do here, many of which we'll be talking about in later lectures.
Now that we have a basic understanding of the data, we're ready to make specific choices about how the results should best be presented. That might be with statistics like the average and standard deviation, with a chart or graph, or with something called a frequency distribution.
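The exploring and presenting steps can be sketched with Python's standard `statistics` module. The survey-time data below is hypothetical, and the 1.5 * IQR rule used to flag outliers is one common convention among several, not the only definition of an outlier.

```python
import statistics
from collections import Counter

# Hypothetical data: minutes each of 12 respondents took to finish a survey.
data = [12, 15, 14, 10, 13, 15, 11, 14, 13, 48, 12, 14]

# Sort so that unusually small or large values show up at the ends.
ordered = sorted(data)

# Flag outliers with the 1.5 * IQR rule; quantiles() needs Python 3.8+.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low or x > high]

# Summary statistics for presenting the results.
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

# Frequency distribution: how many observations take each value.
freq = Counter(data)

print("sorted:", ordered)
print("outliers:", outliers)  # outliers: [48]
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
for value, count in sorted(freq.items()):
    print(f"{value:>3} | {'*' * count}")
```

Here the single extreme value pulls the mean up to about 15.92 even though most respondents took between 10 and 15 minutes, which is one reason to explore the data before choosing how to present it.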
At this point, the direction you go is going to be determined by a combination of the researcher's experience presenting results and the expectations of the people who are actually going to use those results.
Finally, you'll want to decide whether or not the results are statistically significant. For example, your sample might tell you that 51% of the voters prefer Candidate A over Candidate B, but is that result significant? Do you have enough confidence in the results to say that Candidate A really is the people's preference? "Confidence" may sound like a vague word, but there are statistical techniques that let us express our "level of confidence" in very precise, numeric ways.
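One standard way to put a number on "confidence" is a confidence interval. The sketch below uses the normal-approximation (Wald) interval for a proportion, assuming a simple random sample from a two-candidate race, so Candidate A is the preference only if true support exceeds 50%. The sample sizes of 1,000 and 10,000 are hypothetical, and other interval methods (for example, the Wilson interval) behave better for small samples or proportions near 0 or 1.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a sample proportion."""
    # Two-sided critical value, about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of a proportion estimated from a simple random sample of size n.
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

p_hat = 0.51  # 51% of sampled voters prefer Candidate A
for n in (1_000, 10_000):  # hypothetical sample sizes
    low, high = proportion_ci(p_hat, n)
    # In a two-candidate race, A leads only if the whole interval is above 50%.
    verdict = "significant" if low > 0.5 else "not significant"
    print(f"n = {n:>6,}: 95% CI ({low:.3f}, {high:.3f}) -> {verdict}")
```

Under these assumptions, 51% from 1,000 voters gives an interval of roughly 48% to 54%, which still includes 50%, so the lead is not significant at the 95% level; the same 51% from 10,000 voters gives an interval that stays just above 50%.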
Now that we have this process in place, you should keep in mind that it's going to be used in different ways in different situations. For example, a financial analyst may want to see the average daily sales for a company's office over the last month. The analyst has to define a specific question, i.e., "What are the office's average daily sales?", but the sampling method is already determined by the question: she's going to want all of the sales figures for that specific office. There really isn't any need to explore the data, because the business has already defined which numbers it's interested in. The managers of this company have decided that average daily sales is a number that's important to them, and that's what they want to see in their reporting.
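Because the analyst uses every sales figure for the office, there is no sampling and no significance test; the answer is a plain average. A minimal sketch, assuming hypothetical sales records stored as (date, office, amount) tuples and a hypothetical helper `average_daily_sales`, that first totals each day's sales and then averages those daily totals:

```python
from statistics import mean

# Hypothetical sales records: (date, office, amount in dollars).
sales = [
    ("2020-03-02", "North", 4200),
    ("2020-03-02", "South", 2900),
    ("2020-03-03", "North", 3900),
    ("2020-03-03", "South", 3100),
    ("2020-03-04", "North", 4550),
    ("2020-03-05", "North", 4100),
    ("2020-03-06", "North", 4750),
]

def average_daily_sales(records, office):
    """Average of one office's daily sales totals, using every record for that office."""
    daily_totals = {}
    for day, where, amount in records:
        if where == office:
            # Several sales on the same day are added into one daily total.
            daily_totals[day] = daily_totals.get(day, 0) + amount
    return mean(list(daily_totals.values()))

print(f"North: ${average_daily_sales(sales, 'North'):,.2f}")  # North: $4,300.00
```

This sketch only counts days that appear in the records; if the business wants days with no sales to count as zero, those days would have to be added explicitly.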
In business situations, you'll often omit the last step, where you look for statistical significance. Corporate managers and executives will usually apply their personal judgment and experience in interpreting numeric results rather than using specific statistical techniques. Now that we have a process in place, we're ready to start looking at the ways the specific steps are implemented, starting with determining our population and sample. Thank you.