Does Data Analysis Involve a Statistical Process?

Statistics, Statistical Modelling, Statistical Analysis


We know what statistics is, so I want to spend some time talking about the process of doing statistical analysis. The first step in doing any kind of statistical reporting is determining what it is you want to know. This may seem like a trivial step, but it's actually one that professional pollsters and researchers invest a lot of time in.

Pollsters, for example, want to make sure that their questions don't have any bias, that is, that they don't accidentally suggest an answer. Consider a question like, "If you knew that Senator Jones had been indicted for tax evasion, would you be more or less likely to vote for him?"

The first part of the question establishes a connection between the senator and crime in the mind of the person being asked, and that connection is going to indirectly influence their opinion and therefore their answer.

The second step in the process is determining the source of your data.

It often isn't possible to survey an entire population. Even small states, for example, have populations in the hundreds of thousands, so you'll need to limit your research to a sample, or subset, of the whole population.

Coming up with a method for getting a sample that represents an entire population is the second major hurdle faced by statistical researchers. This is another area where they have to take great care to avoid introducing any kind of bias in the selection process. For example, if you wanted to get information about voter preferences in Michigan, it would be easy just to take your sample from people living in cities because they're easy to get to. The problem with that approach is that the opinions of people living in rural areas may be very different from the opinions of people living in large cities, and those rural opinions wouldn't be represented in your results.

Once you've collected your data, you're ready to start analyzing it.
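One common way to guard against the city-only bias described above is proportional stratified sampling, where each group is sampled in proportion to its share of the population. The following is a minimal sketch, not a polling methodology: the voter list, the 70/30 urban/rural split, and the `stratified_sample` helper are all made up for illustration.

```python
import random

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a sample whose group proportions mirror the population's."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the population into strata (here: urban vs. rural).
    strata = {}
    for person in population:
        strata.setdefault(key(person), []).append(person)

    sample = []
    for members in strata.values():
        # Give each stratum a quota proportional to its share of the population.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population of 1,000 voters: 70% urban, 30% rural.
voters = [{"id": i, "area": "urban" if i < 700 else "rural"} for i in range(1000)]

sample = stratified_sample(voters, key=lambda v: v["area"], sample_size=100)
rural_share = sum(v["area"] == "rural" for v in sample) / len(sample)
print(len(sample), rural_share)
```

Because the quotas follow the population shares, rural voters make up the same fraction of the sample as they do of the (hypothetical) population, instead of being left out entirely.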

What I've referred to here as "exploring" the data can mean a lot of different things. You could sort the data and look for outliers, that is to say, values that are unusually large or small. You can look to see if the values are evenly distributed or if they form groups or clumps. You could try creating graphs from the data to get a more visual perspective on the results. There are a variety of things you can do here, many of which we'll be talking about in later lectures.

Now that we have a basic understanding of the data, we're ready to make specific choices about how the results should best be presented. That might be with statistics like the average and standard deviation, with a chart or graph, or with something called a frequency distribution.
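To make the exploring and presenting steps concrete, here is a small sketch using only Python's standard library. The ages are invented for illustration, and the "more than two standard deviations from the mean" rule is just one simple rule of thumb for flagging outliers, not the only definition.

```python
import statistics
from collections import Counter

# Hypothetical survey responses (ages); 97 is planted as an unusually large value.
ages = [23, 25, 31, 35, 35, 38, 41, 44, 47, 52, 97]

mean = statistics.mean(ages)
stdev = statistics.stdev(ages)  # sample standard deviation

# Sort the data and flag values far from the mean (rule of thumb: > 2 std devs).
outliers = [x for x in sorted(ages) if abs(x - mean) > 2 * stdev]

# Frequency distribution: count how many values fall into each 10-year bin.
bins = Counter((x // 10) * 10 for x in ages)

print(f"mean={mean:.1f}, stdev={stdev:.1f}, outliers={outliers}")
for start in sorted(bins):
    # A crude text histogram gives a quick visual sense of clumps and gaps.
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```

The text histogram shows most values clumped between 20 and 59, with a single isolated value in the 90s, which is the kind of pattern the exploring step is meant to surface.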

At this point, the direction you go is going to be determined by a combination of the researcher's experience presenting results and the expectations of the people who are going to actually be using those results.

Finally, you'll want to decide whether or not the results are statistically significant. For example, your sample might tell you that 51% of the voters prefer Candidate A over Candidate B, but is that result significant? Do you have enough confidence in the results to say that Candidate A really is the people's preference? "Confidence" may sound like a kind of vague word, but there are statistical techniques that will let us express our "level of confidence" in very precise, numeric ways.
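One such technique is a confidence interval for a proportion. The sketch below uses the standard normal approximation and assumes a simple random sample; the sample sizes of 1,000 and 10,000 are assumptions chosen for illustration, since the example above does not say how many voters were surveyed.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    # z is about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 51% support for Candidate A, with two hypothetical sample sizes.
for n in (1000, 10000):
    low, high = proportion_confidence_interval(0.51, n)
    verdict = "lead is significant" if low > 0.5 else "too close to call"
    print(f"n={n}: {low:.3f} to {high:.3f} ({verdict})")
```

With 1,000 respondents the interval includes 50%, so the same 51% result cannot distinguish a real lead from sampling noise; with 10,000 respondents the interval sits just above 50%.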

Now that we have this process in place, you should keep in mind that it's going to be used in different ways in different situations. For example, a financial analyst may want to see the average daily sales for a company's office for the last month. The analyst has to define a specific question, i.e. "What are the office's average daily sales?" but the sampling method is already determined by the question - she's going to want all of the sales figures for the specific office. There really isn't any need to explore the data because the business has defined which numbers they're interested in. The managers of this company have decided that average daily sales are a number that's important to them and that's what they want to see in their reporting.
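The analyst's calculation is just a mean over every day's figures, with no sampling involved. A tiny sketch, with sales numbers invented for illustration:

```python
from statistics import mean

# Hypothetical daily sales totals for one office (in dollars).
# Every day in the period is included, so this is the full data set, not a sample.
daily_sales = [1200, 1350, 980, 1430, 1040]

average_daily_sales = mean(daily_sales)
print(f"Average daily sales: ${average_daily_sales:,.2f}")
```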

In business situations, you'll often omit the last step where you look for statistical significance. Corporate managers and executives will usually apply their personal judgment and experience in interpreting numeric results rather than trying to use specific statistical techniques. Now that we have a process in place, we're ready to start looking at the ways that the specific steps are implemented, starting with determining our population and a sample. Thank you.
