In statistical analysis we intend to avoid bias, because bias leads to results that are not representative of the population we are studying. To reduce bias, we would prefer to use probability sampling, where every member of the population has a known, non-zero chance of being selected, if resources allow it.
Sampling frame bias
Sampling bias typically arises when inappropriate methods are used to select the sample. One common source is the sampling frame, which can be inappropriate and/or not representative of the population.
Example 1: Say we wish to understand certain conditions among our colleagues. We could draw up a list of all employees and then select staff from this list. This list is the sampling frame: the list of units from which the sample is actually drawn. It may cover the whole population we wish to study or only a subset of it.
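Drawing a simple random sample, the most basic form of probability sampling, from a list like this can be sketched as follows. The 200 employee IDs, the sample size and the helper `simple_random_sample` are hypothetical. Every employee on the list has the same chance of being picked, but anyone missing from the list has no chance at all, which is why the quality of the frame matters.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame, each unit with the same chance."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: a list of 200 employee IDs.
employee_list = [f"emp-{i:03d}" for i in range(1, 201)]

staff_sample = simple_random_sample(employee_list, 20, seed=42)
print(len(staff_sample), len(set(staff_sample)))  # 20 20
```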
Example 2: Say we wish to study certain conditions among Norwegian adults and that we have legal access to a list of cell phone numbers of Norwegian citizens. It could turn out that a relatively large proportion of Norwegian pensioners do not have cell phones. The sampling frame is then flawed: pensioners will be undercovered, that is, under-represented in the sample.
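A minimal sketch of this undercoverage, with made-up numbers: 1,000 adults, of whom 250 are pensioners, and 100 of those pensioners lack a cell phone (only pensioners are assumed to lack one). The names `Person`, `build_population` and `share_of_pensioners` are hypothetical. It shows that even a perfect random sample drawn from the phone list would aim at the wrong share of pensioners.

```python
from dataclasses import dataclass

@dataclass
class Person:
    pensioner: bool
    has_cell_phone: bool

def build_population(n_pensioners, n_others, pensioners_without_phone):
    """Hypothetical population in which only pensioners may lack a cell phone."""
    people = [
        Person(pensioner=True, has_cell_phone=i >= pensioners_without_phone)
        for i in range(n_pensioners)
    ]
    people.extend(Person(pensioner=False, has_cell_phone=True) for _ in range(n_others))
    return people

def share_of_pensioners(people):
    return sum(p.pensioner for p in people) / len(people)

population = build_population(n_pensioners=250, n_others=750, pensioners_without_phone=100)

# The sampling frame is the phone list: anyone without a cell phone is missing from it.
frame = [p for p in population if p.has_cell_phone]

print(round(share_of_pensioners(population), 3))  # 0.25
print(round(share_of_pensioners(frame), 3))       # 0.167
```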
Example 3: A media outlet, say an online and printed magazine, runs an opinion poll among its subscribers and takes the results as representative of the whole country.
For example, a Spanish media outlet publishes an article saying: “Spanish citizens prefer cheeseburgers to paella”. For the sake of this example, say that this outlet is called “Spanish Cheeseburger Lovers”.
Loosely stated, they have polled cheeseburger lovers, asking whether they prefer cheeseburgers to paella. This sampling frame is obviously not representative of the Spanish population as a whole.
How do the individuals in the sample respond to the survey? That depends on how the questions, the poll or the survey are presented to them. Problems arising at this stage are called response bias.
Voluntary response bias
Reality shows, fan culture programs, call-in programs, mail-in surveys and other surveys in which participation has a cost typically attract participants with strong views.
They are willing to pay and to participate in order to get their “voice heard”. Such surveys typically over-represent people who feel passionate about their views and under-represent people who hold different opinions but feel less passionately about them.
Another example of voluntary response bias is a poll in which anyone can vote or express their opinion as many times as they wish.
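To see how strongly passion can distort a call-in poll, here is a small simulation under invented assumptions: 30% of the population holds a passionate “yes” and volunteers with probability 0.5, while 70% holds a mild “no” and volunteers with probability 0.05. The group shares, the probabilities and the function names are hypothetical. The expected share of “yes” among volunteers is 0.15 / (0.15 + 0.035), roughly 81%, although only 30% of the population says yes.

```python
import random

# Hypothetical groups: opinion -> (share of the population, probability of volunteering).
groups = {"yes": (0.30, 0.50), "no": (0.70, 0.05)}

def expected_volunteer_shares(groups):
    """Expected share of each opinion among those who choose to take part."""
    weights = {op: share * p_volunteer for op, (share, p_volunteer) in groups.items()}
    total = sum(weights.values())
    return {op: w / total for op, w in weights.items()}

def simulate_volunteer_poll(groups, n_people, seed=0):
    """Each person independently decides whether to call in; only callers are counted."""
    rng = random.Random(seed)
    opinions = list(groups)
    population_shares = [groups[op][0] for op in opinions]
    counts = dict.fromkeys(opinions, 0)
    for _ in range(n_people):
        opinion = rng.choices(opinions, weights=population_shares)[0]
        if rng.random() < groups[opinion][1]:
            counts[opinion] += 1
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}

expected = expected_volunteer_shares(groups)
print({op: round(v, 3) for op, v in expected.items()})  # {'yes': 0.811, 'no': 0.189}
print(simulate_volunteer_poll(groups, 100_000))  # should land close to the expected shares
```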
Non-response bias
“Oh, one more of those salespeople calling. I really can’t take the time.” Maybe this reaction rings a bell (it does for me, sorry!). This is one of the major problems in polling: the difficulty of getting people to participate. There is no way to get these people’s opinions reflected in the survey, and therefore any poll or survey can contain non-response bias. The bias becomes a problem when the people who decline differ systematically from those who answer.
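The size of non-response bias follows from simple arithmetic. If a fraction r of the people we contact answer, the mean over everyone contacted is r · mean_R + (1 − r) · mean_N, so the respondents’ mean is off by (1 − r) · (mean_R − mean_N). A minimal sketch, with hypothetical numbers and a hypothetical function name:

```python
def respondent_mean_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Difference between the respondents' mean and the mean of everyone we tried to reach.

    The full mean is r * mean_R + (1 - r) * mean_N, so the bias equals
    (1 - r) * (mean_R - mean_N): it vanishes only if non-respondents answer like respondents.
    """
    full_mean = response_rate * mean_respondents + (1 - response_rate) * mean_nonrespondents
    return mean_respondents - full_mean

# Hypothetical poll: 40% pick up and 60% of them support a proposal,
# while 45% of the people who hung up would have supported it.
bias = respondent_mean_bias(0.40, 0.60, 0.45)
print(round(bias, 2))  # 0.09: the poll shows 60% support, everyone contacted holds 51%

# Same low response rate, but non-respondents think like respondents: no bias.
print(abs(respondent_mean_bias(0.40, 0.60, 0.60)) < 1e-12)  # True
```

A low response rate alone does not create bias; the difference between those who answer and those who do not is what matters.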
Questionnaire bias
One question can be asked in different ways:
- Do you believe that it is reasonable to add some extra tax on meat and in this way help address the critical global climate situation?
- Do you believe meat consumption should be punished by adding more tax to its price, ensuring extra income for the state?
The way a question is asked influences the response. In this case, people who feel less passionate about the meat-tax issue may be swayed one way or the other by how the question is worded. This is questionnaire bias.
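One common way to detect wording effects is a split-ballot design: each respondent is randomly given one of the two wordings, and the yes-rates are compared. The response model below is entirely hypothetical: 40% of people have a firm view (split evenly and unaffected by wording), while the undecided say yes with probability 0.7 under the climate wording and 0.3 under the punishment wording. Under these assumptions the expected yes-rates are 0.62 and 0.38, so the same population appears to hold opposite majority views.

```python
import random

WORDINGS = ("climate", "punish")
# Hypothetical: chance that someone without a firm view says "yes" under each wording.
YES_IF_UNDECIDED = {"climate": 0.7, "punish": 0.3}

def split_ballot(n, share_firm=0.4, seed=0):
    """Randomly assign each respondent one wording and return the yes-rate per wording."""
    rng = random.Random(seed)
    asked = dict.fromkeys(WORDINGS, 0)
    yes = dict.fromkeys(WORDINGS, 0)
    for _ in range(n):
        wording = rng.choice(WORDINGS)  # random assignment keeps the two groups comparable
        if rng.random() < share_firm:
            says_yes = rng.random() < 0.5  # firm view, split evenly, unaffected by wording
        else:
            says_yes = rng.random() < YES_IF_UNDECIDED[wording]
        asked[wording] += 1
        yes[wording] += says_yes
    return {w: yes[w] / asked[w] for w in WORDINGS}

print(split_ballot(20_000))
```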
Incorrect response bias
In statistics we work with datapoints, but we should always bear in mind that each point has a quality. A datapoint can be completely wrong compared to the reality it is supposed to describe!
The datapoint can be a yes when it is actually a no, or 10 when it is actually 5. Questions can be answered untruthfully. There can be many reasons why individuals do not answer truthfully, and the resulting distortion is called incorrect response bias.
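The effect of untruthful answers on a yes/no question can be written out directly: the recorded yes-rate is p · (1 − a) + (1 − p) · b, where p is the true yes-rate, a the chance that a true yes answers no, and b the chance that a true no answers yes. The sketch below uses that formula with hypothetical numbers and a hypothetical function name, plus a small numeric case mirroring “10 when it is actually 5”.

```python
def observed_yes_rate(true_yes_rate, p_yes_says_no, p_no_says_yes=0.0):
    """Share of 'yes' answers we record when some respondents misreport.

    p_yes_says_no: chance that a true 'yes' answers 'no'.
    p_no_says_yes: chance that a true 'no' answers 'yes'.
    """
    return true_yes_rate * (1 - p_yes_says_no) + (1 - true_yes_rate) * p_no_says_yes

# Hypothetical sensitive question: 20% would truthfully say yes,
# but 40% of them answer no instead.
print(round(observed_yes_rate(0.20, 0.40), 3))  # 0.12

# Hypothetical numeric answers: one person reports 10 when the true value is 5.
true_values = [5, 5, 5, 5]
reported = [10, 5, 5, 5]
print(sum(true_values) / len(true_values), sum(reported) / len(reported))  # 5.0 6.25
```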
Size bias
Size bias occurs when a subgroup of the population is represented in the sample out of proportion to its actual number of individuals, typically because larger units are more likely to be selected. The subgroup can be over- or under-represented.
Example: What is the mean size, measured in number of employees, of start-up companies based in Malaga, Spain?
Say we go to the Malaga Startup Greenhouse office facilities, where “a great proportion” of Malaga start-ups are housed during their first years. We stand at the entrance door and ask the people who enter about the size of the company they work in.
Say that 3 companies have 20 or more workers each (60 or more employees in total) and 15 companies have fewer than 3 workers each (fewer than 45 in total).
Chances are that more than half of the workers we ask will answer that their company has more than 15 workers, even though only 3 of the 18 companies are that large. So we might end up with the wrong conclusion that most Malaga start-ups have more than 15 workers. This is an example of size bias.
Or the marathon example: I feel intimidated because other runners keep passing me. My sample, in this case, includes only the runners who pass me, because I do not see the ones who do not pass me. This is another example of size bias.
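The Malaga example can be checked with a few lines of arithmetic. As an assumption, the sketch fixes the sizes at exactly 20 workers for each of the 3 larger companies and 2 workers for each of the 15 small ones, and assumes every worker is equally likely to be asked at the door, so a company with s workers contributes s answers. The helper names are hypothetical.

```python
# Hypothetical sizes within the ranges of the example:
# 3 companies with 20 workers each and 15 companies with 2 workers each.
company_sizes = [20] * 3 + [2] * 15

def door_answers(sizes):
    """Answers at the entrance: a company with s workers supplies s answers (one per worker)."""
    return [s for s in sizes for _ in range(s)]

def share_above(values, threshold):
    return sum(v > threshold for v in values) / len(values)

answers = door_answers(company_sizes)

print(sum(company_sizes) / len(company_sizes))   # 5.0, the mean size per company we want
print(sum(answers) / len(answers))               # 14.0, the mean size reported at the door
print(round(share_above(company_sizes, 15), 3))  # 0.167 of companies have more than 15 workers
print(round(share_above(answers, 15), 3))        # 0.667 of answers say "more than 15"
```

Because each company is weighted by its own size, the door survey nearly triples the apparent mean company size in this made-up setup.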
Freelance Data Analyst
+34 616 71 29 85
Spain: Ctra. 404, km 2, 29100 Coín, Malaga
Denmark: c/o Musvitvej 4, 3660 Stenløse
Learning statistics. Doing statistics. Freelance since 2005. Dane. Living in Spain. With my Spanish wife and two children.