Think about it before you read the answer.
The best line is the one that minimizes the sum of the squared vertical distances from the data points to the line.
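To make this idea concrete, here is a minimal sketch in plain Python of the usual least-squares criterion, which picks the slope and intercept that minimize the sum of squared vertical distances. The function names and the five data points are made up for illustration and are not taken from the dataset used later in this article.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative points, NOT the article's dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)
# Any other line, such as one with a slightly steeper slope, has a larger error.
nudged = sum_squared_errors(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best < nudged)
```

Changing the slope or the intercept away from the fitted values always increases the summed squared error, which is what "best line" means here.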
The correlation coefficient indicates the strength and direction of the linear relationship between the independent and the dependent variable, whereas the coefficient of determination (r-squared) measures the proportion of the variance in the dependent variable that is explained by the independent variable.
A correlation coefficient close to 1 indicates a strong positive relationship between the independent and the dependent variable (a value close to -1 indicates a strong negative one), and a coefficient of determination close to 1 indicates a good fit of the data to the predictive model.
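As a rough illustration of these two measures, the sketch below computes the Pearson correlation coefficient and the coefficient of determination in plain Python. The helper names and the data points are assumptions made up for this example. For a simple (one-variable) linear regression, r-squared is the square of the correlation coefficient, which the code checks.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """1 - SS_residual / SS_total for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative points, NOT the article's dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(xs, ys)            # close to +1: strong positive relationship
r_neg = pearson_r(xs, ys[::-1])  # reversed y values: strong negative relationship
r2 = r_squared(xs, ys)           # close to 1: the line fits the points well
print(r, r_neg, r2)
```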
Armed with this knowledge, we can create our first simple linear regression model in either Qlik Sense or QlikView.
Implementing Linear Regression in Qlik
Recently, I stumbled upon this very interesting article that shows the nexus between teen pregnancy and poverty rate in America.
These facts are worth pondering when considering why teen pregnancy leads to a higher poverty rate:
Only 38 percent of girls who have a child before the age of 18 get their high school diplomas by 22.
Two-thirds of teen mothers who move out of their family home live in poverty, and a similar share receive public benefits in the first year of their child’s life.
Seventy-eight percent of children born to teenage mothers who never married and who did not graduate from high school live below the federal poverty level.
It’s a problem that we all should be aware of, and if we can help in any way, we should at least try.
I was lucky enough to find a dataset around this on Pennsylvania State University’s statistics website, STAT462.
So, we’ll be using this dataset to create a simple linear regression model in Qlik Sense.
Go ahead and save it to your machine.
Here is a snapshot of the dataset: This dataset of size n = 51 is for the 50 states and the District of Columbia in the United States.
So, let’s look at the steps and I want you to follow along in Qlik Sense as we go through them.
Step 1: Create a scatter plot.
Step 2: Calculate the correlation coefficient. Create a text and image chart using the below expression:
Step 3: Calculate the coefficient of determination. Create a text and image chart using this expression:
Step 4: Calculate the slope.
Step 5: Calculate the y-intercept.
Step 6: Create a variable x with an initial value of 0. This variable will allow us to change the value of the independent variable, the poverty rate, to predict the birth rate (15 to 17). Click on the variable option from the bottom left corner of the sheet editor:
Step 7: Calculate the predicted birth rate for the teenage group (15 to 17).

As stated here, our Qlik Sense linear regression model matches the fitted line equation: Y = 1.373X + 4.267

At a 0% poverty rate, the teenage birth rate would be 4.267. A one-unit change in the value of the independent variable equates to a 1.373 change in the value of the dependent variable.
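The two readings of the fitted line equation Y = 1.373X + 4.267 can be checked with a few lines of Python. This is only a sketch: the function name `predict_birth_rate` is made up for illustration, and the slope and intercept are simply the values quoted in the equation.

```python
SLOPE = 1.373      # slope of the fitted line quoted in the article
INTERCEPT = 4.267  # y-intercept of the fitted line quoted in the article


def predict_birth_rate(poverty_pct):
    """Predicted birth rate (15 to 17) for a given poverty rate in percent."""
    return SLOPE * poverty_pct + INTERCEPT


# At a poverty rate of 0, the prediction is just the intercept.
at_zero = predict_birth_rate(0)

# Raising the poverty rate by one unit raises the prediction by the slope.
step = predict_birth_rate(11) - predict_birth_rate(10)

print(at_zero)
print(round(step, 3))
```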
What would be the teenage birth rate if the poverty rate is 15%? Here’s the answer:
I can now combine the power of the associative engine to narrow down the list of states and predict the birth rate for the female age group 15 to 17 based on my selections, using a poverty rate of 15%:
Fabulous! Don’t you love the power of Qlik?

Comparing our Results with a Model Built using Python
Next, we will build a similar simple regression model in Python using the Pandas and scikit-learn libraries.
I want to compare the accuracy of the predictive model we created in Qlik Sense against one we will create in Python.
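A minimal sketch of such a model with Pandas and scikit-learn might look like the following. The column names `poverty_pct` and `birth_rate_15_17` are hypothetical, and the five rows are placeholder values that lie exactly on the line y = 2x + 1, not the real data. With the actual dataset you would load the saved file into the DataFrame instead, and the fitted slope and intercept would be the ones to compare against Qlik Sense.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Placeholder values only, NOT the real dataset. Column names are hypothetical.
df = pd.DataFrame({
    "poverty_pct": [5.0, 10.0, 15.0, 20.0, 25.0],
    "birth_rate_15_17": [11.0, 21.0, 31.0, 41.0, 51.0],
})

X = df[["poverty_pct"]]       # independent variable, as a 2-D frame
y = df["birth_rate_15_17"]    # dependent variable

model = LinearRegression().fit(X, y)

slope = float(model.coef_[0])
intercept = float(model.intercept_)
r2 = model.score(X, y)        # coefficient of determination (R-squared)

# "What-if" prediction for a poverty rate of 15%.
new_x = pd.DataFrame({"poverty_pct": [15.0]})
predicted = float(model.predict(new_x)[0])

print(slope, intercept, r2, predicted)
```

The slope, intercept, and R-squared printed here correspond to the values calculated in steps 3 to 5 of the Qlik Sense walkthrough, so the two models can be compared number by number.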
The output from our Python simple regression model matches the one in Qlik Sense.
Let’s compare the predicted value from our Qlik Sense linear regression model against the one we created in Python:

End Notes
We can create a simple regression model to show a “what-if” scenario in Qlik Sense as long as we first validate, using the built-in correlation function, that there is a positive or negative relationship between the independent and the dependent variable.
Also, check that the data is fit for modeling using the coefficient of determination (R-squared).
If its value is close to 1, then our data is suitable for simple regression modeling in Qlik Sense.
Let me know your suggestions and feedback for this article in the comments section below.
About the Author
Shilpan Patel – Co-Founder, Analyticshub.io
BI professional with over 15 years of experience in Business Intelligence, data warehousing and various relational database management systems.
Extensive QlikView development experience with IT full life cycle business intelligence solution delivery.
Proficiently managed and mentored offshore and onshore resources for IT projects.
With incredible data growth, both structured and unstructured, businesses struggle to make sense of data.
Business discovery tools such as QlikView and Tableau offer new opportunities to those who want to help an enterprise make insight-driven decisions.
That’s my passion and that’s my lifelong purpose.