Visualizing models 101, using R

What're we doing?

In this article we'll take a look at visualizing some of the findings you've made with your model, such as the marginal effect of a given explanatory variable on the response variable.

We'll be making pretty plots like this:

[Prediction plot of employment against government support. Caption: who knew that if you receive no government support at all you're forced into employment :O]

Preparing our data and model

We have to do a little bit of housekeeping to prepare our data and model for proper visualization:

dat$gov.support <- exp(dat$gov.support)
dat$employed <- factor(dat$employed, levels(dat$employed)[c(2, 1)])

The first command applies the exp() transformation to our 'gov.support' values, because these values are log-transformed in the raw data. We do this so that when we fit the model with 'log(gov.support)' we get the same results as in the previous article, with the added benefit of being able to 'back'-transform our data, which I will show later!

The second command simply swaps the level order of our categorical response variable 'employed', so that our model fits default to modelling 'employed == "yes"' even when we don't define it explicitly. This is important because it means we can fit the model without using a "logical" response variable that checks whether 'employed == "yes"'.
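
Before refitting anything, it's worth double-checking that the releveling did what we expect; a quick sanity check using nothing but base R:

levels(dat$employed)  # the level ordering controls which outcome the binomial glm() treats as the event
table(dat$employed)   # the counts themselves should be unchanged by the releveling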

This means that while the base model from the last article was created like this:

fit.1 <- glm(employed == "yes" ~ ., data = dat, family = binomial)

we now define it like this:

fit.1 <- glm(employed ~ . - gov.support + log(gov.support), data = dat, family = binomial)
summary(fit.1)

Notice that we don't define a logical condition for our response variable, and that we replace the newly exponentiated 'gov.support' with a log transformation of the variable, which brings us back to our initial model:

[model summary output]

As evident above, we get the exact same summary as we did in the previous article before doing our "cleaning".
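
If you kept a copy of the untransformed data around, you can also confirm this programmatically. A minimal sketch, assuming 'dat.raw' is such a copy (that name is mine, not from the article):

# the logical-response fit on the raw (log-scale) data ...
fit.old <- glm(employed == "yes" ~ ., data = dat.raw, family = binomial)
# ... versus the factor-response fit on the exp()-transformed data
fit.new <- glm(employed ~ . - gov.support + log(gov.support), data = dat, family = binomial)

# the coefficient names differ ("gov.support" vs "log(gov.support)"), so compare values only
all.equal(unname(coef(fit.old)), unname(coef(fit.new)))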

The last thing I want to do is redefine our factor variables so they are clearer to read:

dat$zero.young.children <- factor(dat$young.children == 0)
dat$zero.school.children <- factor(dat$school.children == 0)
dat$zero.children <- factor(dat$young.children + dat$school.children == 0)
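
A quick cross-tabulation (base R again, just as a sanity check) confirms that the new indicators line up with the raw counts:

# each indicator should be TRUE exactly when the corresponding count is zero
with(dat, table(zero.young.children, young.children))
with(dat, table(zero.children, young.children + school.children))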

This means our final fit, which I'll store as 'full.fit' since that's the object we'll be plotting, looks like this:

full.fit <- glm(employed ~ foreigner + log(gov.support) + age + school.children +
                  zero.young.children + zero.school.children + zero.children +
                  I(age^2) + foreigner:age + foreigner:zero.children +
                  age:school.children + log(gov.support):zero.young.children,
                family = binomial, data = dat)

Now let's make some plots!

Let's start by plotting what I find the most interesting: the relationship between 'employed' and 'gov.support'. I'm using an absolutely brilliant package called "sjPlot" for this:

library("sjPlot")
plot_model(full.fit, type = "pred", terms = c("gov.support"))

Giving us this plot:

[prediction plot on the untransformed scale]

Completely useless! What's going on here? Well, for our convenience plot_model spits out the following error:

[plot_model's message about the log-transformed predictor]

This is exactly why we needed to transform the data! Let's try again:

plot_model(full.fit, type = "pred", terms = c("gov.support [exp]"))

[prediction plot with the back-transformed axis]

This is pretty damn close to useless as well, so what's going on now? Let's take a look at our model and how it tries to capture the relationships in the data.

There are a few different ways to do this, but let's begin by looking at the coefficients of our full model:

[coefficient estimates of full.fit]

What's the first thing you notice when looking at this? For me it's the factor variable 'zero.young.children' having a coefficient of almost 13! This isn't necessarily a problem in itself, but let's take a look at the confidence intervals of the coefficients:

confint(full.fit)

[profile confidence intervals for the coefficients]

Right, this looks like trouble! Take a look at the span of 'zero.young.children': it means we'll have huge confidence intervals in our marginal-effect plots, and it's most likely why our prediction plot had such a huge confidence band.
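
If the log-odds scale is hard to read, the same estimates and intervals can be put on the odds-ratio scale with plain base R (a sketch; confint() on a glm uses profile likelihood, so it can take a moment):

exp(cbind(OR = coef(full.fit), confint(full.fit)))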

sjPlot can plot these as well:

plot_model(full.fit, transform = NULL, show.values = TRUE, axis.labels = "", value.offset = .4)

[coefficient plot of full.fit]

This makes it really easy to see that the factor variable 'zero.young.children' is quite problematic with regard to making any plots with any real confidence.
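
You can also pin down the problem numerically by pulling that row out of the model summary. A sketch (the dummy ends in "TRUE" because the factor was built from a logical comparison):

summary(full.fit)$coefficients["zero.young.childrenTRUE", ]  # the standard error should mirror the wide interval above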

Let's take a little detour and create a simpler model without any variable interactions; it turns out this is what we get:

simple.fit <- glm(employed ~ foreigner + age + zero.young.children + zero.children +
                    log(gov.support) + I(age^2), family = binomial, data = dat)
summary(simple.fit)

[model summary output]

A small increase in AIC, which isn't a good thing, but it isn't much, and we gain a lot of simplicity by using this model instead! (If we measure performance with BIC instead, we actually find that this model is better, because BIC penalizes complex models harder: 1047 vs 1068.)
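
The AIC values are reported in the two summaries, but both criteria can also be pulled out side by side; a quick sketch in base R:

# lower is better for both criteria; BIC penalizes the extra interaction terms more heavily
AIC(full.fit, simple.fit)
BIC(full.fit, simple.fit)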

Let's look at the new coefficients:

plot_model(simple.fit, transform = NULL, show.values = TRUE, axis.labels = "", value.offset = .4)

[coefficient plot of simple.fit]

This looks a lot better! Let's try making our prediction plot once more, using this model instead:

plot_model(simple.fit, type = "pred", terms = c("gov.support [exp]"))

[prediction plot]

Nice, something useful! So evidently the probability of being employed is lower the more government support you're entitled to, which makes quite a bit of intuitive sense!
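
If you want the numbers behind this curve rather than just the picture, sjPlot builds these prediction plots with the ggeffects package, so you can pull them out directly; a sketch, assuming ggeffects is installed:

library("ggeffects")
# predicted probability of employment across gov.support, back-transformed like the plot above
ggpredict(simple.fit, terms = "gov.support [exp]")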

How about the difference between foreigners and non-foreigners? Well, plot_model makes this really easy to do as well; just add it as a term:

plot_model(simple.fit, type = "pred", terms = c("gov.support [exp]", "foreigner"))

[prediction plot, grouped by foreigner]

This is a breeze!
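
One last note: plot_model() returns a regular ggplot2 object (at least for a single plot like this one), so the usual ggplot tooling applies if you want to tweak or save these figures. A sketch, where the title and file name are just placeholders of mine:

library("ggplot2")
p <- plot_model(simple.fit, type = "pred", terms = c("gov.support [exp]", "foreigner"))
p + labs(title = "Predicted probability of employment")   # tweak like any other ggplot
ggsave("employment-vs-gov-support.png", p, width = 7, height = 5)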
