Adding lines and asterisks of statistical significance on a figure with ggplot2 (CC093)

Published 2021-04-16
Have you ever wondered how to add lines with asterisks to denote statistical significance on a plot you've made with ggplot2? In this episode of Code Club, Pat will walk you through performing a series of statistical tests with kruskal.test and pairwise.wilcox.test and then using geom_line and geom_text to draw the lines and stars.


Pat will use RStudio and base functions including #kruskal.test and #pairwise.wilcox.test, along with the #geom_line and #geom_text functions from the #ggplot2 package and other packages from the tidyverse. The accompanying blog post can be found at riffomonas.org/code_club/2021-04-16-testing-signif…
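
As a rough sketch of the testing step, the two base R functions can be run like this. The built-in PlantGrowth data set and the Benjamini-Hochberg correction are assumptions made for illustration; the video works with its own data and may use different settings.

```r
# PlantGrowth ships with R; it is a stand-in, not the data from the video
data(PlantGrowth)

# Omnibus test: does weight differ among the three groups?
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Pairwise follow-up tests, corrected for multiple comparisons.
# "BH" is an assumed choice; exact = FALSE is passed on to wilcox.test
# to avoid warnings about ties.
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "BH", exact = FALSE)

print(kw$p.value)  # single p-value for the omnibus test
print(pw$p.value)  # matrix of adjusted pairwise p-values
```

The pairwise result is a lower-triangular matrix, so each comparison you might want to mark on the figure appears once.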

Do you have a figure that you would like critiqued or would like help improving? Let me know and I'd be happy to arrange a guest appearance!

If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!

R: r-project.org/
RStudio: rstudio.com/
Raw data: github.com/riffomonas/raw_data/releases/latest
Workshops: www.mothur.org/wiki/workshops

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Introduction
3:23 Running kruskal.test and pairwise.wilcox.test
10:02 ggplot2 figures can be saved as objects
11:13 Drawing line segments to denote comparisons
13:57 Adding text to line segments
16:09 Critique of figure
19:21 Conclusion
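
The plotting steps in the chapters above (saving the figure as an object, drawing a line segment, adding text) can be sketched roughly as follows. The PlantGrowth data, the bracket height, and the choice of which groups to connect are placeholders for illustration, not values from the video.

```r
library(ggplot2)

# Save the base figure as an object so layers can be added to it later
p <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Hypothetical bracket between the 2nd and 3rd groups; on a discrete
# x-axis the groups sit at positions 1, 2, 3
bracket <- data.frame(x = c(2, 3), y = c(6.6, 6.6))

# Hypothetical asterisk centered just above the bracket
star <- data.frame(x = 2.5, y = 6.65, label = "*")

p_annotated <- p +
  geom_line(data = bracket, aes(x = x, y = y), inherit.aes = FALSE) +
  geom_text(data = star, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, size = 6)
```

Running print(p_annotated) draws the figure. Each annotation layer gets its own small data frame, so the x and y values are passed with c() inside data.frame() rather than through the plot object.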

All Comments (21)
  • @Riffomonas
    Anyone figure out the challenge I gave in the video?
  • @11mgarrard
    You are simply fantastic! I thoroughly enjoy each and every one of your videos and have applied several tips into graphics within my PhD chapters. Thank you for being such an influential resource. I hope this channel rewards you with lots of compensation and opportunity. After a while I will have to acknowledge you in the thesis. xx
  • @aelhamamy
    Sir, I learned a lot about ggplot from you today, thanks a lot!
  • @fburton8
    I think there's a useful distinction between P<0.05 and P<0.01, but agree that the smaller P values are not worth distinguishing (at least in biology). That would mean 1 or 2 stars only in plots. The size of the difference is important too, of course, and maybe we shouldn't be using stars at all if two measures differ by less than 1% say and such a difference is unimportant in a biological context, even though the statistics says it's 'significant'.
  • @misskana5
    Thanks a lot, this video was super helpful and to the point! :)
  • @hanswurst4728
    I've seen plots with these types of "significance annotations" quite often when it comes to group differences but as you pointed out they look kinda messy if you're looking at more than say four groups. I think a more eye-pleasing solution is the use of (if necessary, bootstrapped) CIs. Slap them in the background with geom_errorbar() and if some group's CIs overlap, there is no statistically significant difference in means or medians or whatever you've constructed them around. Just don't use theme_minimal or void because some horizontal gridlines come in handy if you're looking at a multitude of CIs.
  • Excellent video. Professor, I have a little problem. Look at the maximum value of y in your graph (30) and its divisions (intervals of 10 units). Additionally, your graph has a blank space at the top that allows you to add the line to compare the categories without modifying the maximum value of y or its divisions. In my case, since I don't have that upper blank space, when I add the line to compare the box plots, it distorts the scale of the y axis and increases my value on the y axis. If I set the y-limits, then it removes the comparison lines from the boxplot and doesn't show them. Any ideas to fix it? I just want some extra white space at the top of the chart that allows me to insert the lines without changing my max y value.
  • @fburton8
    These videos are really useful! The problem I'm grappling with at the moment is positioning text in a plot relative to the axes and relative to data. I would like an automated way rather than doing it manually for each plot. One case is putting significance stars (or single star! :) a fixed distance above the top bar of an errorbar (mean+-SE). Here both x and y depend on the data. The other case is placing annotations a fixed fraction up/down from the bottom/top of the plot, e.g. 5%, independent of plot size. Here the x position depends on the data, but the y position varies with the y-axis limits. I don't know if you "do requests", but if you have time to cover this in a future video I would be extremely grateful!
  • @KN-tx7sd
    Hi Pat, Thanks. If we want to do the same for multifactor grouped bar plots can you please describe how to add ANOVA significance (either as numerical values or as asterisks) like the one you showed for this plot?
  • @ieliemielio
    Hey Pat, what do you do when the factor is significant but none of the levels differ in the pair wise tests due to the correction for multiple comparison?
  • Hey Pat, thanks for the great video! I tried it and it worked very well. Can you maybe explain how this works with the function facet_grid? I want to add one line for all grids. This would be very helpful! :)
  • Thank you so much! As usual, your video was useful. I have a little problem automatically writing a normality test result (Shapiro test) onto a qqplot graphic. Any help please. Thanks
  • @PA_hunter
    Would be interesting to see how you make this reproducible and in a way that would adjust automatically to different data :)
  • @erikacroft7056
    Hello, great video. I am trying it at present but I keep getting an error: Error: unexpected ',' in: "hypopre + geom_line(data= hypopre(x=c(2, 3), y=(16," and when I remove the ",": Error: unexpected symbol in: "hypopre + geom_line(data= hypopre(x=c(2, 3) y". Is it because I have categorical names as my x labels and not numerical ones?
  • @labratatouille
    Hey Pat, this link doesn't work. https://www.riffomonas.org/code_club/...
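
One of the comments above asks how to leave blank space at the top of the plot for comparison lines without changing the labelled y-axis values. A possible approach, not taken from the video, is to set the limits and the breaks separately in scale_y_continuous. The PlantGrowth data and all numeric values below are placeholders.

```r
library(ggplot2)

# The upper limit (7.5) leaves headroom above the tallest box, while the
# breaks stop at 6 so the labelled axis values stay the same
p_room <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(3, 7.5), breaks = seq(3, 6, by = 1))

# A hypothetical bracket drawn below the upper limit remains visible;
# anything placed above the limit would be dropped from the plot
p_room <- p_room +
  geom_line(data = data.frame(x = c(1, 3), y = c(7, 7)),
            aes(x = x, y = y), inherit.aes = FALSE)
```

Because values outside the scale limits are removed, the upper limit has to sit above the height chosen for the bracket.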