Introduction to building machine learning models in R with mikropml (CC124)

Published 2021-07-07
mikropml is a new R package for building machine learning models that was created by members of Pat's research group at the University of Michigan. In this episode of Code Club, Pat explains why he thinks machine learning is the future of microbiome research, describes the framework that mikropml employs, and demonstrates how to use mikropml's run_ml function with its default parameters. The data he uses come from a microbiome study, published by his lab, that looked for biomarkers associated with colorectal cancer.

In this episode, Pat will use functions from the #mikropml R package and data handling functions from #dplyr in #RStudio. The accompanying blog post can be found at www.riffomonas.org/code_club/2021-07-07-mikropml-i….

If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!

R: r-project.org/
RStudio: rstudio.com/
Raw data: github.com/riffomonas/raw_data/releases/latest
Workshops: www.mothur.org/wiki/workshops

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Why machine learning?
5:59 Framework used by mikropml
12:55 Setting up data for use in mikropml
19:04 Running mikropml with defaults
28:35 Altering training framework
28:00 Recap

All comments (17)
  • @Riffomonas
    What questions do you have about using machine learning methods?
  • @morgomi
    As a Turk, I loved the package's name :D
  • @truemusicmedia
    very informative! very well done lesson. Thank you very much.
  • @yingdongli3433
    Learning from your paper; hope the follow-up videos come soon.
  • Very much looking forward to trying out mikropml! Perhaps this topic is covered in a future video I haven't seen yet, but it would be great if you could discuss dealing with highly imbalanced datasets (e.g. a 450:50 control-to-case structure) when using these ML methods, or better yet show an example of how you typically deal with those types of datasets.
  • @user-ke2jx3rk4r
    Loving the videos, the energy and the help. I have tried to run mikropml but I have a question. When looking at the number of samples in the results, it shows 41 samples with 28 predictors. That does not make much sense, as I initially had 203 samples. Does 41 refer to the samples used to test the model rather than the training set? Thank you!!
  • @wapsyed
    Never thought I would watch Seth Rogen teaching ML in R 😅
  • @8bitgerman477
    These videos are great. Thank you, I just had one question. What are the implications of getting a higher AUC and accuracy on the test data than on the training data?
  • Hello, is it possible to use this package to do ML with 3 levels (values in your "srn")? For example, can we have healthy, symptomatic, and asymptomatic? Thank you very much
  • Hi, thank you for your introduction to mikropml. I am planning to implement the pipeline in my study. In your opinion, what is the minimum number of samples to input into mikropml? Would n=20 samples be enough, or should n be around 100 samples? Thank you once again for your time and your advice
  • @bedece1549
    Thank you so much for the video, very interesting package. But I have a question: is it necessary to set aside 20% for validation (within the 80%) when using a cross-validation method? Thanks for your help
  • Thank you so much for the channel and the amazing explanations. Please, I want to ask about the "adaptive neuro-fuzzy inference system": which package can we use to implement it in R? Thank you so much
  • @rishikeshdash12
    Sir, I have one question. When using machine learning or deep learning on microbiome data to predict healthy or diseased, what type of normalization should we perform on the OTU counts? Should we prefer CLR or relative abundance? I have microbiome data and a few clinical parameters (vitamin D level, WOMAC score, age, years of pain) as features, i.e. input variables, to predict a two-level outcome (healthy or diseased). What type of normalization should I use for the metadata? Should we use the scale() function for all the features, or a different normalization for the features above? Thank you, sir!
  • @rishikeshdash12
    Sir, please suggest a book on ML in R and on ML for microbiome data using R. I want to learn and understand the parameters used in ML models at a basic level.
  • @nosaosawe3158
    What are your social media handles, sir? I really love your work.