Introduction to building machine learning models in R with mikropml (CC124)

Published 2021-07-07

mikropml is a new R package for building machine learning models that was created by members of Pat's research group at the University of Michigan. In this episode of Code Club, Pat explains why he thinks machine learning is the future of microbiome research, introduces the framework that mikropml employs, and demonstrates how to use the default parameters of mikropml's run_ml function. The data he uses come from a microbiome study his lab published looking for biomarkers associated with colorectal cancer.

In this episode, Pat will use functions from the #mikropml R package and data handling functions from #dplyr in #RStudio. The accompanying blog post can be found at www.riffomonas.org/code_club/2021-07-07-mikropml-i….
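As a rough illustration of the workflow described above, here is a minimal sketch of calling run_ml with its default training framework. It does not reproduce the code from the episode: it assumes the mikropml and dplyr packages are installed, and it uses otu_mini_bin, a small example dataset bundled with mikropml whose outcome column is dx, as a stand-in for the colorectal cancer study data. The seed value is arbitrary.

```r
# Assumption: install.packages(c("mikropml", "dplyr")) has already been run.
library(mikropml)
library(dplyr)

# otu_mini_bin ships with mikropml; it has a binary outcome column `dx`
# plus OTU abundance columns. It stands in for the data used in the video.
otu_mini_bin %>%
  count(dx)

# Preprocess the features (by default, center and scale them and drop
# uninformative columns) before modelling.
preprocessed <- preprocess_data(otu_mini_bin, outcome_colname = "dx")

# Fit a regularized logistic regression ("glmnet") using run_ml's default
# training framework. The seed is an arbitrary choice for reproducibility.
results <- run_ml(
  preprocessed$dat_transformed,
  method = "glmnet",
  outcome_colname = "dx",
  seed = 2021
)

# One-row summary of cross-validation and held-out test performance.
results$performance
```

The returned list also holds the fitted model (results$trained_model) and the held-out samples (results$test_data), which is useful for checking how many samples ended up in the test set.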

If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!

R: r-project.org/
RStudio: rstudio.com/
Raw data: github.com/riffomonas/raw_data/releases/latest
Workshops: www.mothur.org/wiki/workshops

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Why machine learning?
5:59 Framework used by mikropml
12:55 Setting up data for use in mikropml
19:04 Running mikropml with defaults
28:35 Altering training framework
28:00 Recap
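The "Altering training framework" chapter listed above can be sketched roughly as follows. This is an assumption-laden example rather than the code shown in the episode: it uses the bundled otu_mini_bin dataset, and the values chosen for kfold, cv_times, and training_frac are illustrative only.

```r
# Assumption: the mikropml package is installed.
library(mikropml)

# Override the default training framework:
#   kfold         = number of cross-validation folds
#   cv_times      = number of times the k-fold partitioning is repeated
#   training_frac = fraction of samples used for training
results_custom <- run_ml(
  otu_mini_bin,
  method = "glmnet",
  outcome_colname = "dx",
  kfold = 2,
  cv_times = 5,
  training_frac = 0.5,
  seed = 2021
)

# With training_frac = 0.5, roughly half of the samples are held out for testing.
nrow(results_custom$test_data)
results_custom$performance
```

Fewer folds and repeats make the run faster but give a noisier estimate of cross-validation performance, so values this small are better suited to a quick check than to a final analysis.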

All comments (17)

@Riffomonas 3 years ago

What questions do you have about using machine learning methods?
@morgomi 3 years ago

As a Turk, I loved the package's name :D
@truemusicmedia 2 years ago

very informative! very well done lesson. Thank you very much.
@yingdongli3433 3 years ago

Learning from your paper; hope the following video comes soon.
@mehrbodestaki6818 3 years ago

Really looking forward to trying out mikropml! Perhaps this topic is covered in a future video I haven't seen yet, but it would be great if you could discuss dealing with highly imbalanced datasets (e.g. a 450:50 control-to-case structure) when using these ML methods, or better yet show an example of how you typically deal with those types of datasets.
@user-ke2jx3rk4r 1 year ago

Loving the videos, the energy and the help. I have tried to run mikropml but I have a question. When looking at the number of samples in the results, it shows 41 samples with 28 predictors. That does not make much sense, as I initially had 203 samples. Is 41 referring to the samples used to test the model and not the training set? Thank you!!
@wapsyed 9 months ago

Never thought I would watch Seth Rogen teaching ML in R 😅
@8bitgerman477 2 years ago

These videos are great. Thank you, I just had one question. What are the implications of getting a higher AUC and accuracy on the test data than on the training data?
@charlottebraley8702 2 years ago

Hello, is it possible to use this package and do ML with 3 levels (values in your "srn")? For example, can we have healthy, symptomatic, asymptomatic? Thank you very much.
@nguyenlephuong9489 2 years ago

Hi, thank you for your introduction to mikropml. I am planning to implement the pipeline in my study. In your opinion, what is the minimum number of samples to input into mikropml? Would n=20 samples be enough, or should n be around 100 samples? Thank you once again for your time and your advice.
@bedece1549 1 year ago

Thank you so much for the video, very interesting package. But I have a question: is it necessary to set aside 20% for validation (within the 80%) when using a cross-validation method? Thanks for your help.
@mohammedarazzaq7847 2 years ago

Thank you so much for the channel and the amazing explanation. Please, I want to ask about the "Adaptive neuro fuzzy inference system": which package can we use to implement it in R? Thank you so much.
@rishikeshdash12 2 years ago

Sir, I have a question. When using machine learning or deep learning on microbiome data to predict healthy vs. diseased, what type of normalization should we perform on OTU counts? Should we prefer CLR or relative abundance? I have microbiome data and a few clinical parameters (vitamin D level, WOMAC score, age, year of pain) as features, i.e. input variables, to predict a two-level outcome (healthy or diseased). What type of normalization should I prefer for the metadata? Should we use the scale() function for all the features, or a different normalization for the features above? Thank you, sir!
@rishikeshdash12 1 year ago

Sir, please suggest a book on ML in R and on ML for microbiome data using R. I want to learn and understand the parameters used in ML models at a basic level.
@rishikeshdash12 2 years ago

What is fit_result here?
@nosaosawe3158 9 months ago

What are your social media handles, sir? I really love your work.