How to add maps to a ggplot2 figure in R (CC264)

Published 2022-11-10

Pat shows how to add maps to his ggplot2-generated heatmap in R to better display country boundaries across the world. Along the way, he stylizes the figure and works through some problems with pushing his changes to GitHub. The overall goal of this project is to highlight reproducible research practices using a number of tools. The specific output from this project will be a map-based visual that shows the level of drought across the globe. The website we generate in this episode can be found at www.riffomonas.org/drought_index.

You can find my blog post for this episode at www.riffomonas.org/code_club/2022-11-10-map_backgr….

#case_when #R #github-actions #snakemake #Rstats #github

Support Riffomonas by becoming a Patreon member!
www.patreon.com/riffomonas

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in taking an upcoming 3-day R workshop, be sure to check out our schedule at riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Introduction
3:17 Creating world map with country outlines
8:19 Removing Antarctica and modifying lines
10:01 Putting world map under heatmap
12:42 Improving appearance of figure
13:37 Resolving merge conflict on push
17:19 Final figure

All Comments (20)

@rayflyers 5 months ago

Hey, Pat. Happy 2024! We missed you in 2023. I hope you're okay. I started teaching an R class last year, and I always recommend your videos to my students. Best wishes!
@alexandreloureiro5197 1 year ago

Hi Pat, I just wanted to let you know how much I enjoy your work and how valuable it’s been to me. I’ve literally been binging your episodes and I can honestly say I’ve become a better R user because of you. Thank you, and keep up the good work!
1 year ago

GitHub Actions and Snakemake are fantastic tools. Thank you very much for this video series! I've learned so much <3
@Pvillanueva13 1 year ago

Thanks for these videos! I was looking for an introduction to Snakemake that starts from scratch, and this was the perfect walkthrough.

About the conflicts you were running into: something I've seen pretty often is deploying the webpage from a separate branch. You can set up an action that runs whatever workflow renders your webpage documents and then sends the output to a different branch. Then you change your settings to target that particular branch. The advantage of doing it this way is that you prevent the situation you ran into by keeping the output of the pipeline (the webpage and figure) separate from your code. Then, if you want to make changes to your code, you don't have to worry about pulling down all the revisions resulting from pipeline runs. It's not an issue here, but it also avoids the situation where you're working on a team, everyone is generating their own outputs, and everyone's repo gets out of sync. The action I use is peaceiris/actions-gh-pages. I add a rule to put all of the webpage files into a docs folder, which I target with the action. Maybe a little overkill for this simple website, but this workflow is extensible to more complicated websites (and dovetails nicely with Quarto webpages). You can see my implementation of your project here: https://github.com/pommevilla/drought_index.

Another comment: you use `snakemake -c 1 ...` to run the workflow, and you've mentioned before that you designed the workflow to work with one processor. Snakemake actually determines which rules can be run together based on the DAG. Rules run as soon as their dependencies are completed, so if a rule doesn't have any (for example, leaf nodes in the DAG), it can run right away. In my modified workflow (see the DAG in the README on my repo), there are 4 child nodes, so I could technically call `snakemake -c 4 ...` to run those four jobs in parallel. Also, when `get_all_archive` runs, it can use one of the cores to run one of its two dependencies instead of waiting for the single processor to open up. I'm not sure how much runtime you'd gain here, since the biggest chokepoints are the downloads and reading the dly files, but it's something to keep in mind.

Again, thank you so much for these videos! I learned a lot of good stuff here, and I'm looking forward to future videos.
@niceday2015 1 year ago

Hello my dear Pat, happy new year! Hope to see you soon online! Best wishes
@oluwafemioyedele 1 year ago

Another great tutorial @pat, thank you for always releasing great tutorials!!!
@IarukaSkYouk 1 year ago

Sir, you are so amazing. I am learning a lot from your channel. Thank you for sharing your knowledge with the community!
@mabenba 1 year ago

It's been quite a long vacation, Pat. Come back, we miss you.
@bassamsaleh8034 1 year ago

He hasn't made videos for the last 5 months. His videos were very good, with a lot of useful tips, tricks, and workflows. I hope he's okay and doing well.
@haraldurkarlsson1147 8 months ago

Pat, I am not sure where this would fit, but since you are dealing with large datasets in your climate series, you are probably already familiar with the arrow and duckdb packages. The former allows you to work with larger-than-memory datasets in R. One of the main drawbacks of R is that it loads everything into memory and can thus be slow. arrow, which works with a bunch of different languages (Python, Rust, MATLAB, and so forth), is similar to data.table (in R) but much faster. The key is that arrow uses a data structure (Parquet files) that works much more efficiently than the normal row-wise data structures (e.g., CSV). DuckDB is a structured database that lives on local drives (no need for cloud storage even for large files) and is quickly gaining ground. Both of these programs have R versions (API?) and are excellent for big data. I would love to see you cover these. Thanks, H
@caseyj1144 1 year ago

Hi Pat - just popping in to say you're missed! I hope all is well with you :)
@PA_hunter 1 year ago

Would be really cool if you could show approaches in R that implement the most accurate maps we have today (perhaps Winkel Tripel or AuthaGraph).
@haraldurkarlsson1147 1 year ago

Nice videos as usual. Since you are playing around with different programs I was wondering if you had looked at imagemagick? As far as I can see it can do amazing things both inside and outside R. I would love to learn more about it other than the rudimentary stuff I know. I hope you are willing to explore it and do a video on it. Thanks!
@KamalSingh-dn7gv 1 year ago

Hi Pat. You have fantastic episodes about coding. Thank you. However, we the scientists use a lot of IC50/EC50 computations. Would it be possible to do an episode on this topic? Maybe using drc library from R. Thanks again - Kamal
@AnkitKumar-xh4eh 1 year ago

Hey man! Why are you not creating more videos? I really appreciate what you are doing.
@mabenba 1 year ago

As always, very amazing content! Thank you very much, Pat! Can you make a video about making publication-quality tables in R?
@mahatmaalimibrahim6631 1 year ago

What a skill! Fantastic, really. Professor, could you please do a visualization project using the sf package? Thank you.
@fourlokody 4 months ago

Hi Pat! How did you set up R in VS Code? It seems to be a process where many (me) get tripped up. Thanks!
@musicspinner 1 year ago

What's next for Prof. Schloss and the Code Club?