When creating data visualizations, the “ggplot2” package in R lets you control which data and variables you plot, along with the color scheme, graph type, aesthetic layout, and more.
The package “ggplot2” in R is a framework developed to make your data visualizations customizable. This allows you to compose your graph using a variety of elements rather than only selecting from predefined options. Explore what ggplot2 offers by learning how it works, what types of professionals use it, and where to start with commonly used graphics.
When creating visualizations based on your data, it’s important to represent your information accurately, in a way that is intuitive and easy to follow for people with various backgrounds. In R, a programming language for statistical computing and graphics, you can choose from various packages that allow you to customize your analysis and data representations.
One of the most popular choices for data visualization is the “ggplot2” package, which is based on the “grammar of graphics.” This means the package breaks a graphic into key components, so you can customize each aspect of the picture separately. You can create multilayered graphics tailored to the specifics of your data set. The fundamental concept is that every plot combines your data set, aesthetic choices (e.g., color, size, shape), and geometry (e.g., bar chart, histogram, line plot).
Because ggplot2 is such a powerful data visualization tool, you can find professionals using it in any industry that needs to present information in interpretable, aesthetically pleasing formats. For example:
Marketing specialists may use ggplot2 when designing new advertisement campaigns to present consumer buying habits and product market trends to their team.
Business intelligence analysts may use ggplot2 to design performance reports, which can help their organization analyze current processes and make adjustments to improve business strategies.
Developers may use data visualizations to generate ideas and facilitate team brainstorming sessions. Once ideas are generated, they may continue to use visualizations to illustrate top ideas and communicate across multiple departments within the company.
Data scientists may use ggplot2 to create visualizations to inform decision-making and derive insights across various fields, such as business, finance, research, health care, and city planning.
Creating your graph with ggplot2 involves seven composable parts that fit together into a custom visualization: data, mapping, layers, scales, facets, coordinates, and theme. More advanced visualizations may define all seven, but most parts have sensible defaults that work for many data sets and outputs. Every graph requires you to specify at least your data, your mapping, and one layer.
Your data layer involves the information you are using to construct your plot. You’ll likely get a better output if your data is clean and organized, so it’s recommended that you go through a few data management steps before you begin plotting.
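The code examples in this section use a data set named yourdata with the columns rainfall, crop_growth, crop_type, and year, but never define it. To make them runnable, here is a minimal sketch that simulates a hypothetical version of that data frame in base R. The values are random numbers, not real measurements, and the crop names, value ranges, and units are assumptions. It ends with one simple data-management step: keeping only complete rows.

```r
# Hypothetical, simulated data for illustration only (not real measurements)
set.seed(42)  # makes the random values reproducible
n <- 120
yourdata <- data.frame(
  rainfall  = runif(n, min = 200, max = 1200),  # assumed unit: mm per season
  crop_type = factor(sample(c("maize", "rice", "wheat"), n, replace = TRUE)),
  year      = sample(2021:2023, n, replace = TRUE)
)
# Crop growth rises with rainfall, plus random noise
yourdata$crop_growth <- 0.05 * yourdata$rainfall + rnorm(n, mean = 0, sd = 5)

# A simple cleaning step: keep only rows with no missing values
yourdata <- yourdata[complete.cases(yourdata), ]
str(yourdata)
```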
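# load ggplot2, then start a plot from your data set, here named yourdata
library(ggplot2)
ggplot(data = yourdata)
# on its own, this draws an empty canvas; the mapping and layers add the content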
Your mapping, created with the aes() function, tells ggplot2 which variables in your data control which aesthetics, such as position on the x- and y-axes, color, or size. In a scatterplot, this involves specifying which variable is the independent variable, plotted on the x-axis, and which is the dependent variable, plotted on the y-axis.
# in this code, you are mapping rainfall as the independent (x) variable and crop_growth as the dependent (y) variable
ggplot(yourdata, mapping = aes(x = rainfall, y = crop_growth))
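Variables go inside aes(); fixed values go outside it. The sketch below uses ggplot2’s built-in mpg data set as a stand-in for your own data: color = class inside aes() maps a variable, so each car class gets its own color and a legend entry, while size = 2 outside aes() sets every point to the same size. It adds geom_point() so the mapping is actually drawn.

```r
library(ggplot2)

# mpg ships with ggplot2; displ = engine displacement, hwy = highway mpg
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +  # mapped: varies with the data
  geom_point(size = 2)                                       # set: the same for every point
p
```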
Your layers turn the mapping into a visualization that represents your data. Each layer determines how the data is drawn (its geometry, such as points versus lines), whether the raw data is shown or first transformed by a statistical summary, and how objects are positioned on your plot.
#you are choosing to create a scatterplot with your x and y variables
ggplot(yourdata, aes(rainfall, crop_growth)) +
geom_point()
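Layers stack with +, and a layer can draw either raw data or a statistical transformation of it. As a sketch, again using the built-in mpg data as a stand-in, the first layer below plots the raw points, and the second uses geom_smooth() with method = "lm" to compute and draw a fitted linear model with its confidence band. The line is derived from the data rather than plotted directly.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +                                # layer 1: raw observations
  geom_smooth(method = "lm", formula = y ~ x)   # layer 2: fitted line computed by a stat
p
```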
Scales translate the values in your data into visual properties, such as positions, colors, and sizes, and they control the axes and legends that let viewers read those values back. This includes axis titles and tick marks, legends, limits on plots, the color palette assigned to a variable, and so on. These elements make your graph easier to interpret and let you further curate the right image.
# crop_type is categorical, so use the discrete viridis scale, scale_color_viridis_d(),
# to give each crop type its own color (scale_color_viridis_c() is for continuous variables)
ggplot(yourdata, aes(rainfall, crop_growth, color = crop_type)) +
geom_point() +
scale_color_viridis_d()
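Scales come in continuous and discrete variants, and axes are scales too. This sketch, using the built-in mpg data as a stand-in, maps a numeric variable (cty, city miles per gallon) to color, so it uses the continuous scale_color_viridis_c(), and uses scale_x_continuous() to set the x-axis title and tick positions. The title text and break values are illustrative choices.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, color = cty)) +
  geom_point() +
  # cty is numeric, so the continuous (_c) viridis scale applies
  scale_color_viridis_c(name = "City mpg") +
  # axis scale: a custom title and a tick mark at every whole litre
  scale_x_continuous(name = "Engine displacement (L)", breaks = 2:7)
p
```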
To compare trends across subsets of your data, you can split your plot into separate panels using facets, specified with a formula such as rows ~ columns. For example, you might look at crop growth relative to rainfall by year, by type of crop, by geographic location, and more.
# you are creating one panel per combination of year (rows) and crop type (columns)
ggplot(yourdata, aes(rainfall, crop_growth)) +
geom_point() +
facet_grid(year ~ crop_type)
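When you split by a single variable, facet_wrap() is often a better fit than facet_grid(), because it wraps the panels into several rows instead of placing them all in one row or column. A sketch using the built-in mpg data as a stand-in, with one panel per car class; the choice of three columns is arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 3)   # one panel per class, three panels per row
p
```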
In some cases, you might want to change the coordinate system your plot is drawn in rather than the data itself. For example, you might want to zoom in on a fixed interval of one axis, or show both variables on the same scale so that one unit on the x-axis is as long as one unit on the y-axis. The coordinate layer allows you to make these modifications to explore different visualization options.
# coord_fixed() makes one unit on the x-axis the same length on screen as one unit on the y-axis (ratio = 1 by default)
ggplot(yourdata, aes(rainfall, crop_growth)) +
geom_point() +
coord_fixed()
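Another common coordinate adjustment is zooming. Setting limits on a scale, for example with xlim(), drops the observations outside the range before anything is computed, whereas coord_cartesian() only changes the visible window and keeps all the data. In this sketch, using the built-in mpg data and an illustrative range of 2 to 4 litres, the fitted line is therefore still computed from every point, which would not be the case with xlim().

```r
library(ggplot2)

# Zoom in on engines between 2 and 4 litres without discarding any data:
# every point is still used, only the visible window changes
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  coord_cartesian(xlim = c(2, 4))
p
```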
The theme is the final layer of aesthetic elements. It controls the parts of your plot that don’t represent data, allowing you to change fonts, alter the location of your labels and titles, set background and grid-line colors, and so on. Depending on your preferences and organizational requirements, you can experiment with this layer to create visually appealing plots, including multiple versions of the same type of visualization.
You can specify many elements of your theme, including borders, spacing, color fills, font and size, text positions, and more. By exploring different options and removing predefined default options, you can go beyond traditional visualizations and take your plot to the next level.
# you are choosing theme_minimal() as your base theme and adding a title twice the default text size
ggplot(yourdata, aes(rainfall, crop_growth)) +
geom_point() +
theme_minimal() +
labs(title = "Crop Growth vs Rainfall") +
theme(plot.title = element_text(size = rel(2)))
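To see how the seven parts fit together, here is a sketch that touches each one, using the built-in mpg data as a stand-in for your own. The particular variables, palette, facet, axis range, and theme tweaks are illustrative choices, not recommendations.

```r
library(ggplot2)

p <- ggplot(data = mpg,                                  # 1. data
            mapping = aes(displ, hwy, color = drv)) +    # 2. mapping
  geom_point() +                                         # 3. layer
  scale_color_viridis_d(name = "Drive train") +          # 4. scale (drv is categorical)
  facet_wrap(~ year) +                                   # 5. facets
  coord_cartesian(ylim = c(10, 45)) +                    # 6. coordinates
  theme_minimal() +                                      # 7. theme: a complete preset...
  theme(legend.position = "bottom",                      #    ...then individual tweaks,
        panel.grid.minor = element_blank()) +            #    including removing a default element
  labs(title = "Highway mileage vs engine size",
       x = "Displacement (L)", y = "Highway mpg")
p
```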
While the list of visualizations available in ggplot2 is extensive, you can start by exploring a set of commonly used charts and graphs that fit most data types. These include:
Scatterplots: Plot individual data points on an x-y plane
Line graphs: Connect data points in order along the x-axis, often to show change over time
Histograms: Use bars to show the distribution of continuous data values
Box plots: Demonstrate the characteristics of a distribution (median, quartiles, etc.)
Bar charts: Show the frequency counts of categorical variables
Pie charts: Divide a circle into slices representing the proportions of a whole
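Each chart type in this list corresponds to a geom (and, for pie charts, a coordinate system). The sketch below builds one of each using data sets that ship with R and ggplot2 (mpg, economics, and faithful) as stand-ins for your own data. ggplot2 has no dedicated pie-chart geom, so the pie is built as a single stacked bar drawn in polar coordinates.

```r
library(ggplot2)

charts <- list(
  scatter   = ggplot(mpg, aes(displ, hwy)) + geom_point(),
  line      = ggplot(economics, aes(date, unemploy)) + geom_line(),    # points joined in date order
  histogram = ggplot(faithful, aes(eruptions)) + geom_histogram(bins = 30),
  boxplot   = ggplot(mpg, aes(class, hwy)) + geom_boxplot(),           # median, quartiles, outliers
  bar       = ggplot(mpg, aes(class)) + geom_bar(),                    # counts per category
  # pie: one stacked bar (fill = category) wrapped around a circle
  pie       = ggplot(mpg, aes(x = "", fill = class)) +
                geom_bar(width = 1) +
                coord_polar(theta = "y")
)
charts$pie
```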
When creating your data visualization, you might choose ggplot2 because of its intuitive and consistent syntax and extensive customization options. Because of its “grammar of graphics” and layer approach, you can decide how detailed you want your customizations to be, allowing you to modify your outputs for academic publications, industry presentations, personal use, and so on. For newer programmers, the default settings make ggplot2 feel more approachable. You can continue to integrate new controls and become more comfortable with the syntax. Additionally, if you want a standardized look and feel across your visualizations, you can reproduce the formatting using the same controls across your graphs.
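One way to reproduce the same formatting across graphs is to save your theme settings in an object and add it to each plot, or to make it the session default with theme_set(). A sketch, where the object name my_theme and its settings are hypothetical choices and the built-in mpg data stands in for your own:

```r
library(ggplot2)

# A hypothetical house style: a complete preset plus a few tweaks
my_theme <- theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))

# Option 1: add the saved theme to each plot explicitly
p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point() + my_theme
p2 <- ggplot(mpg, aes(class)) + geom_bar() + my_theme

# Option 2: make it the default theme for the session.
# theme_set() returns the previous theme so it can be restored later;
# the default is applied when a plot is drawn, so print while it is active
old_theme <- theme_set(my_theme)
p3 <- ggplot(mpg, aes(hwy)) + geom_histogram(bins = 20)
print(p3)
theme_set(old_theme)  # restore the previous default
```

Saving the theme as an object keeps the style explicit in each plot’s code, while theme_set() is more convenient when every plot in a report should share one look.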
Like any type of software, ggplot2 has potential disadvantages, and knowing them can help you make informed decisions when choosing visualization software. Because ggplot2 requires writing R code, novice coders might find Tableau or Excel more approachable. If you opt for Tableau, you can still integrate R as your skills grow to enhance your visualizations over time. Be aware that you might have trouble recreating your plot in a different programming environment, because ggplot2’s approach to building plots is specific to the package.
When creating your visualizations, sometimes less is more. While you can go above and beyond with colors, segments, and aesthetic elements, it’s important to keep a few best practices in mind to ensure you’re communicating your insights effectively to your audience. When creating your graphic, consider the following:
Who will be viewing your visualization? Knowing your audience can help you choose the right level of detail. If you have a highly technical audience, you might opt for a more granular approach, while a general overview may be more appropriate for less technical audiences.
What type of data are you visualizing? Different representations may be more or less appropriate depending on whether you are working with categorical, continuous, time-series, or other data types.
What is the key information? Try to focus on your essential message before adding too many complicated elements. You want to keep your message clear and obvious for your audience.
Choosing ggplot2 for your visualizations allows you to customize your graphics beyond traditional visualization software. To explore how data visualizations might benefit your industry, you can take exciting Specializations on Coursera designed to give you a strong foundation in R, different package options, and customization techniques. To start, consider completing the Data Visualization & Dashboarding with R Specialization, offered by Johns Hopkins University.
Editorial Team
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.