The Danish butcher production and retail chain MeatMe is considering whether to start producing dog food, and decides to run a 7-day test production of a product called BargainBone. To set the price, MeatMe wishes to analyze the relation between the quantity produced (in kg) and the total costs of production. It would be logical to assume that the more kilograms of a product are produced, the higher the total costs of production. However, this relation can be altered by, e.g., synergy effects, so MeatMe aims to understand the actual relation between these two parameters and states the following question:

Question 1: Is there a correlation between the production in kg of BargainBone and the total costs of production?

Let’s work towards an answer to Question 1. The 7-day test production resulted in the following observations:

Visualizing the data can help us get an immediate intuition:

From the graph, we could assume that there is a linear correlation between the production of BargainBone and the total production costs: the higher the production of BargainBone, the higher the total costs of production. But we wish to be as exact as possible in our answer, so we run a linear regression analysis to test the linear correlation between the two parameters.

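First, we calculate our regression line:

\( \displaystyle y = mx + b \)

where m is the slope of the line, calculated by:

\( \displaystyle m = \frac{\bar{x}\,\bar{y}-\overline{xy}}{(\bar{x})^2-\overline{x^2}} \)

Here \( \bar{x} \) and \( \bar{y} \) are the means of the observed x and y values, \( \overline{xy} \) is the mean of the products \( x_i y_i \), and \( \overline{x^2} \) is the mean of the squared values \( x_i^2 \). Together with the intercept below, this gives the least-squares line, i.e. the line with the smallest possible sum of squared vertical distances to the observed data points.

The intercept b is where the line crosses the y-axis, i.e. the value of y at x = 0:

\( \displaystyle b = \bar{y} - m\bar{x} \)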
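The two formulas can be sketched in a few lines of Python. This is a minimal illustration that assumes the observations are stored as two equal-length lists. The function name `fit_line` and the sample points are hypothetical and are not MeatMe's test data.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b,
    using the mean-based formulas for slope and intercept."""
    x_bar = fmean(xs)                                # mean of x
    y_bar = fmean(ys)                                # mean of y
    xy_bar = fmean([x * y for x, y in zip(xs, ys)])  # mean of the products x*y
    x2_bar = fmean([x * x for x in xs])              # mean of the squares x^2
    m = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)
    b = y_bar - m * x_bar
    return m, b

# Hypothetical points lying exactly on y = 2x + 1
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```

The denominator \( (\bar{x})^2-\overline{x^2} \) equals minus the variance of x. If all x values are equal, the sketch therefore fails with a division by zero, which reflects the fact that no unique line exists in that case.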

 

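Let’s display a table from which we can derive these parameters:

So, with \( \bar{x} = 1{,}114.3 \) kg, \( \bar{y} = 17.8 \), \( \overline{xy} = 20{,}509 \) and \( \overline{x^2} = 1{,}426{,}429 \), our m and b are:

\( \displaystyle m = \frac{(1{,}114.3\times 17.8) - 20{,}509}{1{,}114.3^2 - 1{,}426{,}429} = \frac{-674.46}{-184{,}764.51} \approx 0.00365 \)

\( \displaystyle b = 17.8 - 0.00365\times 1{,}114.3 \approx 13.73 \)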
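As a quick arithmetic check, the slope and intercept can be recomputed in Python from the four means quoted in the equations above. The observation table itself is not reproduced here, so only these summary values are used:

```python
# Summary statistics of the 7-day test production, as quoted in the text
x_bar = 1114.3      # mean production (kg)
y_bar = 17.8        # mean total costs of production
xy_bar = 20509.0    # mean of the products x*y
x2_bar = 1426429.0  # mean of the squared productions x^2

# Slope and intercept from the mean-based formulas
m = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)
b = y_bar - m * x_bar

print(f"m = {m:.5f}, b = {b:.2f}")  # m = 0.00365, b = 13.73
```

Rounding the slope to 0.0036 before computing b would give about 13.79 instead. The intercept of 13.73 therefore corresponds to the unrounded slope of roughly 0.00365.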

 

Thus, our regression line becomes:

\( \displaystyle y = mx + b \quad\Rightarrow\quad y = 0.00365x + 13.73 \)

Figure: The linear correlation between BargainBone production and the total costs of production

Is our line a good fit? Does it express the real relation between x (production in kg of BargainBone) and y (total production costs)? Is there a linear correlation? These are the questions we need to clarify. The coefficient of determination, denoted \( r^2 \) (the square of the correlation coefficient r), answers them, as it measures how much of the variation in y is accounted for by the linear relationship with x.

\( r^2 \) describes the percentage of the variation in the total costs of production that is explained by the variation in the production of BargainBone. A regression line that fits the observed data perfectly explains 100% of the variation, i.e. \( r^2 = 1 \). So, we take 1 minus the share of the variation that our line fails to explain, which is the error in our line. Let’s find that error:

We find the percentage of the error by comparing the error of the line with the total error around the mean of y.

\( SE_{Line}\): The Squared Error of the Line describes the error of our regression line compared to each of the observed data points. What is the vertical distance from each observed data point to our line? Squaring these distances and adding them up gives the error of our line.

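\( \displaystyle SE_{\text{Line}} = \sum_{i=1}^{n}\bigl(y_{i}-(mx_{i}+b)\bigr)^{2} = \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2} \)

where \( \hat{y}_{i} = mx_{i} + b \) is the total cost predicted by the line for the production \( x_{i} \).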
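\( SE_{Line} \) is a single sum over the observations. Here is a minimal Python sketch; the function name `squared_error_line` and the sample points are hypothetical:

```python
def squared_error_line(xs, ys, m, b):
    """Sum of squared vertical distances between each observed y
    and the line's prediction m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: the line y = x misses the middle point (2, 3) by 1
print(squared_error_line([1, 2, 3], [1, 3, 3], m=1.0, b=0.0))  # 1.0
```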

 

 

\( SE_{\bar{y}} \): The Squared Error of the mean of y describes the total variation in y. It is the squared difference between each observed data point and the mean \( \bar{y} \), summed over all observations.

\( \displaystyle SE_{\bar{y}} = (y_{1}-\bar{y})^2+(y_{2}-\bar{y})^2+\dots+(y_{n}-\bar{y})^2 = \sum_{i=1}^{n}(y_{i}-\bar{y})^{2} \)
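In Python, \( SE_{\bar{y}} \) can be sketched as follows. The function name `squared_error_mean` and the sample values are hypothetical:

```python
from statistics import fmean

def squared_error_mean(ys):
    """Total variation: sum of squared differences between each observed y and the mean of y."""
    y_bar = fmean(ys)
    return sum((y - y_bar) ** 2 for y in ys)

# Hypothetical costs with mean 3: deviations -2, 0, 0, 2
print(squared_error_mean([1, 3, 3, 5]))  # 8.0
```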

 

So, the ratio between the error of our line and the total error expresses the percentage of the variation in y that is NOT explained by the variation in x:

\( \displaystyle \frac{\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}} = \frac{SE_{Line}}{SE_{\bar{y}}} \)

So, now we have the parameters to fill in our formula for \( r^2 \), which is 1 minus this unexplained share:

\( \displaystyle r^2 = 1 - \frac{SE_{Line}}{SE_{\bar{y}}} \)

 

Answer to Question 1:

Yes, there is a clear linear correlation between the production in kg of BargainBone and the total costs of production. Our \( r^2 = 0.947 \), meaning that 94.7% of the variation in the total costs of production can be explained by the variation in the production in kg of BargainBone.
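Since Question 1 asks about correlation, the correlation coefficient r can also be read off the reported \( r^2 \). In a simple linear regression, r has the same sign as the slope m, which is positive here, so:

\( \displaystyle r = +\sqrt{r^2} = \sqrt{0.947} \approx 0.973 \)

A value this close to 1 indicates a strong positive linear correlation.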