Data Visualization with ggplot2

ggplot2 will only use six shapes at a time.

By default, additional groups will go unplotted when we use the shape aesthetic.

One way to add additional variables is with aesthetics.
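As a rough sketch of mapping a third variable to an aesthetic, the example below assumes the mpg dataset that ships with ggplot2 and its displ, hwy, and class columns. None of these names come from the text above, so treat them as illustrative choices. The second plot shows the six-shape limit in practice: class has seven levels, so one of them should go unplotted with a warning.

```r
library(ggplot2)

# Assumption: mpg is the example dataset bundled with ggplot2.
# Map a third variable (class) to the color aesthetic.
by_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Map the same variable to shape instead. class has seven levels, so the
# default six-shape palette leaves one class without a shape.
by_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

by_color
by_shape
```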

Another way, particularly useful for categorical variables, is to split our plot into facets, subplots that each display one subset of the data.

To facet our plot by a single variable, use facet_wrap().

The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”).
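A minimal sketch of facet_wrap() with a one-sided formula, assuming the mpg dataset bundled with ggplot2. The choice of class as the faceting variable and nrow = 2 as the layout are illustrative, not prescribed by the text.

```r
library(ggplot2)

# Assumption: mpg and its displ, hwy, and class columns are used for illustration.
# ~ class is a one-sided formula: one subplot per value of class.
wrapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

wrapped
```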

The variable that we pass to facet_wrap() should be discrete.

To facet our plot on the combination of two variables, we add facet_grid() to our plot call.

The first argument of facet_grid() is also a formula.
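A hedged sketch of facet_grid(), again assuming the mpg dataset bundled with ggplot2. Here the formula has a variable on each side of the ~: the left side (drv) defines the rows and the right side (cyl) defines the columns. Both variable choices are assumptions made for illustration.

```r
library(ggplot2)

# Assumption: drv and cyl are discrete columns of the bundled mpg dataset.
# drv ~ cyl puts drv in the rows and cyl in the columns of the grid.
gridded <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

gridded
```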

This time the formula should contain two variable names separated by a ~.

A geom is the geometrical object that a plot uses to represent data.

People often describe plots by the type of geom that the plot uses.

For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on.

Scatterplots break the trend; they use the point geom.

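Not every aesthetic works with every geom. We could set the shape of a point, but we couldn't set the "shape" of a line. On the other hand, we could set the linetype of a line.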

geom_smooth() will draw a different line, with a different linetype, for each unique value of the variable that we map to linetype.

With the mpg dataset, for example, mapping drv to linetype makes geom_smooth() separate the cars into three lines based on their drv value, which describes a car's drivetrain.

One line describes all of the points with a 4 value, one line describes all of the points with an f value, and one line describes all of the points with an r value.

Here, 4 stands for four-wheel drive, f for front-wheel drive, and r for rear-wheel drive.
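The three-line smooth described above can be sketched as follows, assuming the mpg dataset bundled with ggplot2 and its displ and hwy columns for the axes (the axis choice is an assumption; only drv is named in the text).

```r
library(ggplot2)

# Assumption: displ and hwy are illustrative axis variables from mpg.
# Mapping drv to linetype yields one smoothed line per drivetrain (4, f, r).
smooth_by_drv <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

smooth_by_drv
```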

If we place mappings in a geom function, ggplot2 will treat them as local mappings for the layer.

It will use these mappings to extend or overwrite the global mappings for that layer only.
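A minimal sketch of global versus local mappings, assuming the mpg dataset bundled with ggplot2. The x and y mappings are set globally in ggplot(), while color = class is set locally in geom_point(), so only the point layer is colored by class. The specific variables are illustrative assumptions.

```r
library(ggplot2)

# Global mappings (x, y) apply to every layer.
# The local mapping (color = class) extends them for the point layer only,
# so geom_smooth() still fits a single line to all of the data.
layered <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

layered
```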

This makes it possible to display different aesthetics in different layers.

These are some of the basic features that we can use to generate graphs with ggplot2.

We can generally use geoms and stats interchangeably.

For example, we can create a plot using stat_count() instead of geom_bar().

This works because every geom has a default stat, and every stat has a default geom.

This means that we can typically use geoms without worrying about the underlying statistical transformation.
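The interchangeability of geom_bar() and stat_count() can be sketched like this, assuming the diamonds dataset bundled with ggplot2 and its cut column (an illustrative choice). Both calls should produce the same bar chart, because geom_bar() defaults to the count stat and stat_count() defaults to the bar geom.

```r
library(ggplot2)

# Assumption: diamonds is the example dataset bundled with ggplot2.
# geom_bar() uses stat_count() by default ...
with_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# ... and stat_count() uses the bar geom by default.
with_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

with_geom
with_stat
```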

We might want to draw greater attention to the statistical transformation in our code.
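One way this could look is a call to stat_summary() in place of a geom function. The sketch below assumes the diamonds dataset bundled with ggplot2 and ggplot2 3.3.0 or later, where the summary functions are passed as fun, fun.min, and fun.max (older releases used different argument names). The choice of cut and depth is illustrative.

```r
library(ggplot2)

# Assumption: ggplot2 >= 3.3.0 for the fun, fun.min, and fun.max arguments.
# For each value of cut, draw the median depth with a range from min to max.
depth_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

depth_summary
```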

For example, we might use stat_summary(), which summarizes the y values for each unique x value, to draw attention to the summary that we're computing.

There's one more piece of magic associated with bar charts.

We can color a bar chart using either the color aesthetic or, more usefully, fill.

Note what happens if we map the fill aesthetic to another variable, like clarity: the bars are automatically stacked.

Each colored rectangle represents a combination of cut and clarity.

The stacking is performed automatically by the position adjustment specified by the position argument.

position = "dodge" places overlapping objects directly beside one another.
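A short sketch contrasting the default stacking with position = "dodge", assuming the diamonds dataset bundled with ggplot2 and the cut and clarity columns mentioned above.

```r
library(ggplot2)

# Default position adjustment for geom_bar() is "stack":
# the clarity segments pile up within each cut.
stacked <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))

# position = "dodge" places the clarity bars side by side within each cut.
dodged <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

stacked
dodged
```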

This makes it easier to compare individual values.

Coordinate systems are probably the most complicated part of ggplot2.

The default coordinate system is the Cartesian coordinate system where the x and y position act independently to find the location of each point.

There are a number of other coordinate systems that are occasionally helpful.

coord_flip() switches the x- and y-axes.

This is useful (for example) if we want horizontal boxplots.
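A minimal sketch of horizontal boxplots with coord_flip(), assuming the mpg dataset bundled with ggplot2 and its class and hwy columns (illustrative choices, not named in the text).

```r
library(ggplot2)

# Assumption: class and hwy from the bundled mpg dataset.
# Vertical boxplots: class on the x-axis, hwy on the y-axis.
vertical <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# coord_flip() swaps the axes, giving horizontal boxplots.
horizontal <- vertical + coord_flip()

vertical
horizontal
```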

It's also useful for long labels: it's hard to get them to fit without overlapping on the x-axis.

coord_polar() uses polar coordinates.
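As a sketch of coord_polar(), the example below assumes the diamonds dataset bundled with ggplot2. Setting width = 1 and a square aspect ratio are styling assumptions that make the wedges touch and the circle round. The same bar chart drawn in polar coordinates becomes a Coxcomb chart.

```r
library(ggplot2)

# Assumption: diamonds and its cut column are used for illustration.
# An ordinary bar chart with bars that fill their full width.
bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

# The same layers in polar coordinates: a Coxcomb chart.
coxcomb <- bar + coord_polar()

bar
coxcomb
```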

Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.

We have seen much more than how to make scatterplots, bar charts, and boxplots.

We learned a foundation that we can use to make any type of plot with ggplot2.

To see this, let's make a code template:

    ggplot(data = <DATA>) +
      <GEOM_FUNCTION>(
        mapping = aes(<MAPPINGS>),
        stat = <STAT>,
        position = <POSITION>
      ) +
      <COORDINATE_FUNCTION> +
      <FACET_FUNCTION>

Our new template takes seven parameters, the bracketed words that appear in the template.

The seven parameters in the template compose the grammar of graphics, a formal system for building plots.

The grammar of graphics is based on the insight that we can uniquely describe any plot as a combination of a dataset, a geom, a set of mappings, a stat, a position adjustment, a coordinate system, and a faceting scheme.
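To make the seven components concrete, here is one hedged way to fill in every slot of the template, assuming the diamonds dataset bundled with ggplot2. Each choice (dataset, variables, stat, position, coordinate system, and facets) is an illustrative assumption; any of them could be swapped for another.

```r
library(ggplot2)

# One possible filling of the template, with every slot stated explicitly.
full_plot <- ggplot(data = diamonds) +             # <DATA>
  geom_bar(                                        # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),        # <MAPPINGS>
    stat = "count",                                # <STAT>
    position = "dodge"                             # <POSITION>
  ) +
  coord_flip() +                                   # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                              # <FACET_FUNCTION>

full_plot
```

In practice most of these slots can be left out, since ggplot2 supplies defaults for the stat, position, coordinate system, and faceting.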

We could use this method to build any plot that we imagine.

In other words, we can use the code template that we’ve learned in this article to build hundreds of thousands of unique plots.

Reference: R for Data Science by Hadley Wickham and Garrett Grolemund.
