Emulating R regression plots in Python

1. Residual plot

The first plot generated by plot() in R is the residual plot, which draws a scatterplot of residuals against fitted values, with a “locally weighted scatterplot smoothing (lowess)” regression line showing any apparent trend.

This one can be easily plotted using seaborn residplot with the fitted values as the x parameter and the dependent variable as y. lowess=True makes sure the lowess regression line is drawn. Additional parameters are passed to the underlying matplotlib scatter and line functions using scatter_kws and line_kws, and titles and labels are set using matplotlib methods. The ; at the end gets rid of the output text <matplotlib.text.Text at 0x000000000> at the top of the plot. The top 3 absolute residuals are also annotated.

2. QQ plot

This one shows how well the distribution of residuals fits the normal distribution. It plots the standardized (z-score) residuals against the theoretical normal quantiles. Anything far off the diagonal line may be a concern for further investigation.

For this, I’m using ProbPlot and its qqplot method from the statsmodels graphics API. statsmodels actually has a qqplot function that we can use directly, but it’s not very customizable, hence this two-step approach. Annotations were a bit tricky, as the theoretical quantiles from ProbPlot are already sorted.

3. Scale-Location Plot

This is another residual plot, showing their spread, which you can use to assess heteroscedasticity. It’s essentially a scatter plot of the square root of the absolute standardized residuals against fitted values, with a lowess regression line. The scatterplot is a standard matplotlib function; the lowess line comes from seaborn regplot.
