Testing for significance with microbiome data on individual taxa using R (CC122)

Published 2021-07-02
Testing for significance across microbial taxa is a critical tool for analyzing microbiome data. Pat will show how he uses wilcox.test with tidy data to compare the relative abundance of bacterial taxa (e.g. genera, OTUs, ASVs) between groups. After correcting for multiple comparisons using p.adjust, he shows how he would visualize the relative abundance of significant taxa across individuals along with an indicator of the median and interquartile range.

In this episode, Pat will use data handling functions from the tidyverse including #nest, unnest, map, and #tidy as well as #wilcox.test in RStudio. The accompanying blog post can be found at www.riffomonas.org/code_club/2021-07-02-wilcox.tes….
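
As a rough sketch of the nest / map / tidy / unnest pattern named above (not the code from the episode), the example below runs wilcox.test once per taxon and then corrects the p-values with p.adjust. The toy data frame, its column names (taxon, sample, group, rel_abund), the group labels, and the choice of the Benjamini-Hochberg method are all assumptions made for illustration; the episode itself compares groups by SRN status using the data linked below.

```r
library(tidyverse)
library(broom)

set.seed(1)

# Hypothetical toy data: 3 taxa x 20 samples, split into two groups.
# Only taxon_a is simulated to differ between the groups.
toy <- expand_grid(taxon = c("taxon_a", "taxon_b", "taxon_c"),
                   sample = 1:20) %>%
  mutate(group = if_else(sample <= 10, "healthy", "case"),
         rel_abund = runif(n()) +
           if_else(taxon == "taxon_a" & group == "case", 1, 0))

results <- toy %>%
  # one row per taxon, with that taxon's samples in a list-column
  nest(data = -taxon) %>%
  # run the Wilcoxon test within each taxon and tidy the result into a tibble
  mutate(test = map(data, ~ tidy(wilcox.test(rel_abund ~ group, data = .x)))) %>%
  # expand the tidied test output back into regular columns
  unnest(test) %>%
  # correct for testing several taxa at once (BH is an assumed choice)
  mutate(p_adj = p.adjust(p.value, method = "BH")) %>%
  select(taxon, p.value, p_adj)

print(results)
```

Filtering results on p_adj (for example p_adj < 0.05) would give the set of taxa to carry forward into a plot of relative abundance.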

If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!

R: r-project.org/
RStudio: rstudio.com/
Raw data: github.com/riffomonas/raw_data/releases/latest
Workshops: www.mothur.org/wiki/workshops

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Introduction
5:19 Testing for significance by SRN status
14:26 Visualizing differences in relative abundance
23:33 Improving appearance of figure
28:35 Recap

All Comments (21)
  • @Riffomonas
    Have you used the set of map functions in the past? What questions do you have about these functions?
  • @revmohamed
    Thanks a lot for all your videos - very helpful!!
  • @nicola84palm
    I am a bioinformatician and lover of the tidyverse and your videos are excellent!!
  • @aleonflux1138
    Another great tutorial, Pat. Your comment re the non (or less-than-ideal) applicability of a reductionist approach to microbiome analysis gave me the inspiration I needed for an up-coming lab group presentation.
  • @signomar
    You are a person from which I can learn
  • Thank you so much! What is the equivalent for testing for significance of individual taxa in a phyloseq object? For some reason it seems like too much trouble shuffling around types of data. Also, do you have or plan on doing a SIMPER analysis tutorial? Thanks again!
  • @bridget9926
    Hi Pat, I'm confused as to when I should adjust p-values... Based on your video I should do it when making multiple comparisons. Does this mean I should also adjust my p-values when looking at alpha diversity between two groups? Thanks.
  • @rupalhatkar4695
    The videos are super useful!! Could you please also post videos of cancer genomics data analysis? For example, analyzing and making copy number plots, structural variants (i.e. circos plot), etc? Thank you!!!
  • Hi, thank you very much for the code. I am new to R. If I want to add transparent box plots with whiskers to the plot, which code do I need to add? Thanks
  • @mikhaeldito
    Thanks! I LOVE your content. Would you be interested in sharing your top tips in QC- and preprocessing microbiome data? Honestly, I am still confused whether I should rarefy my 16S data or not.
  • Hi Pat, I really enjoy your videos and I have learned a lot in the past few weeks. I have a question. Is it always necessary to do the correction for multiple comparisons? In my data (16S rRNA gene sequences for soil) I get significant genera between samples; however, I get none after doing the correction for multiple comparisons. Could there be false negatives after correction? What do you suggest I should do? Thanks in advance.
  • @wmavila_14
    Hey Pat! Thanks for this awesome video! I'm trying to identify significant differences in genera among mice from four different groups. At 6:37, you mentioned another video on the Schubert dataset with three groups, which seems relevant to my analysis. Could you kindly share the link to that episode? Much appreciated!
  • Great video once again, thanks! Is there any video where you show how to make a bubble plot with the relative abundance of each OTU as the size of the bubbles? Like in this graph, with the taxa on the y axis? Thanks, I am learning a lot with your videos!
  • Great content and tutorials. Enjoyed every part of this video. I wonder how I can visualize a number of different plant lifeforms from two forest types (Primary and Secondary) across different elevational gradients. What would be the appropriate test to validate these patterns of diversity within these lifeforms. Any feedback or examples would be much appreciated. Thanks again.