The magrittr and base R pipe: what's the difference? (CC241)

Shared on 2022-08-22
Did you see that R version 4.1.0 added a pipe to base R? Perhaps you're wondering how it compares to the magrittr pipe we know from the tidyverse. Pat breaks down the relevant differences and explains why he's sticking with the magrittr pipe. He also explores the other pipes and aliases that are available through magrittr. He does all this in RStudio using local weather data downloaded from NOAA, with a lot of help from the tidyverse.
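
As a rough illustration of the comparison described here, the sketch below runs the same steps through both pipes. It is not the code from the episode: the built-in mtcars data stands in for the NOAA weather data, and the column names are assumptions chosen for the example. The `_` placeholder needs R 4.2.0 or later, so the script will not parse on R 4.1.

```r
# Sketch only: mtcars stands in for the NOAA weather data used in the video.
library(magrittr)  # provides %>%; the native |> pipe needs R >= 4.1.0

# Both pipes pass the left-hand side as the first argument by default.
base_result     <- mtcars |> subset(cyl == 4) |> nrow()
magrittr_result <- mtcars %>% subset(cyl == 4) %>% nrow()

# magrittr lets the dot stand for the piped value in any argument position.
magrittr_cor <- mtcars %>% cor.test(~ mpg + wt, data = .)

# The native pipe has the `_` placeholder (R >= 4.2.0); it must be
# supplied to a named argument.
base_cor <- mtcars |> cor.test(~ mpg + wt, data = _)
```

Both correlation tests receive the same data frame through the `data` argument, so the two pipes should agree on the estimate.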

You can find my blog post for this episode at www.riffomonas.org/code_club/2022-08-22-pipes.

#magrittr #baseR #pipes #R #Rstudio #Rstats

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in taking an upcoming 3 day R workshop be sure to check out our schedule at riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/

0:00 Introduction
3:25 Calculating correlation with pure base R
7:21 Calculating correlation with dplyr functions
9:46 The base R pipe
12:03 The magrittr pipe
13:27 The exposition pipe
15:18 The tee pipe
17:42 Alias functions from magrittr
22:36 The assignment pipe
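
For readers skimming the chapter list, here is a minimal sketch of the other magrittr pipes and a few alias functions named above. It assumes the built-in mtcars data rather than the weather data from the episode, and the new column names are hypothetical.

```r
# Sketch only: mtcars stands in for the weather data from the episode.
library(magrittr)

# Exposition pipe: exposes the columns of the left-hand side by name.
expo <- mtcars %$% cor.test(mpg, wt)

# Tee pipe: runs the right-hand side for its side effect (here, printing
# the structure), then passes the original left-hand side along unchanged.
tee_rows <- mtcars %T>% str() %>% nrow()

# Alias functions: use_series() is `$`, multiply_by() is `*`,
# extract() is `[`, and set_colnames() is `colnames<-`.
doubled <- mtcars %>% use_series(mpg) %>% multiply_by(2)
renamed <- mtcars %>%
  extract(c("mpg", "wt")) %>%
  set_colnames(c("miles_per_gallon", "weight"))  # hypothetical names

# Assignment pipe: pipes `cars` through the right-hand side and
# overwrites `cars` with the result.
cars <- mtcars
cars %<>% subset(cyl == 4)
```

The assignment pipe is compact, but it is easy to miss that `cars` was modified in place, which is one reason some people prefer an explicit `<-`.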

Comments (21)
  • "ceci n'est pas une* pipe" is a thing of beauty. You should put it on a t-shirt! I'd buy it in a heartbeat.
  • Great video, thanks a lot. Just one side note: starting with R version 4.2, you can use the underscore (_) as a data placeholder, like the dot (.) in magrittr. So the syntax would be: |> cor.test(~prcp + snow, data = _)
  • I can't believe I'm here. I can just watch R videos and understand things well enough to not need to pause and go back to learn something, then resume.
  • @mabenba
    The video we didn't know we needed came the exact same time we needed!! Thanks for sharing. I love your content!
  • I didn't know there were so many pipes. Thanks, very interesting!
  • @sven9r
    Finally Pat started the nerd talk! I love it! I want to add that I have never once typed the pipe manually. For those without Pat's highly trained fingers: just press Ctrl+Shift+M on Windows or Cmd+Shift+M on a Mac.
  • No wonder the output of line 10 is the same as line 7; they both use the same data 😁! Awesome video as usual.
  • Very clear presentation. I want to add a few things to give a fairer comparison to base R (irrespective of the pipe). Base R has na.omit() and subset(), which are essentially the same as drop_na() and filter(), so you can run the first set of code just as cleanly in base R as follows: local_weather |> na.omit() |> subset(snow > 0). Similarly, base R has the transform() and within() functions, which work very similarly to mutate(). As others have mentioned, we now have the _ placeholder for piping to arguments other than the first. Though it's also worth noting that |> pipes to the first unnamed argument. So no_na_no_zero |> cor.test(formula = ~prcp+snow) also works, because the first unnamed argument is data, which the dataset is then passed to. An alternative to %$% in base R is with(), e.g., no_na_no_zero |> with(cor.test(prcp, snow)). The first argument to with() is the dataset and the second is the expression you want to run, which makes it amenable to piping. Just wanted to add to the info you presented for those interested. I think base R gets a bad rap sometimes, but it's not always as obtuse as some make it out to be.
  • @gimanibe
    Great video, Pat. Thanks a lot. I actually use the %T>% pipe inside functions quite often, for example, to plot an intermediate result or print a data.frame I do further work with.
  • @russtin1
    I love the Ctrl-Shift-M hot key for the magrittr pipe. My fingers hate reaching for that top row.
  • @rayflyers
    I'm a data scientist, and given the scale of data I work with, the difference in performance is a consideration for me. I won't deny that the dot notation for the standard Magrittr pipe is incredibly useful though. I'm pretty curious about the trade-off in performance between loading the magrittr pipe for base R functions like gsub versus using the Base R pipe for pipe-friendly functions from tidyverse packages like str_replace_all. I don't expect you to go into all that though. I can test it on my own. Also, huge thanks for the introduction to the other pipes and alias functions because I've been overlooking them, and they could come in super handy for more readable code. Great video!
  • Excellent video. I always forget that cor.test is available. Much easier to pipe into than the cumbersome dataframe %>% select(x,y) %>% cor(x, y). Also had no idea set_colnames existed. I was using a purrr function for years. Similar, but one fewer argument, which I always forgot. Thank you Dr. Pat.
  • Wow. I never knew there were other pipes other than %>%. Thanks for the video.
  • The tee pipe is useful if you want to inject console progress messages while the pipeline is executing.
  • @mikep8857
    I agree that the %T>% pipe is not terribly useful but I do like it to look at both the head and the tail of a data frame in one pipe.
  • Fira Code font displays the native R pipe as an arrowhead which is super cool (but not the Magrittr pipe)
  • Hi Pat. Thanks for the video. I haven't spent any time looking at the magrittr aliases, and they do look useful for certain use cases. One comment though: when you showed the base pipe workflow, you didn't use the `_` placeholder to replace the `.` from {magrittr}. I demonstrate the use of the placeholder below:
    ```
    data("CO2")
    CO2 |> cor.test(~ conc + uptake, data = _)
    ```
    The placeholder was added in R 4.2 (I think) and, as I understand it, the _ placeholder can only be used for a named argument, so not a positional argument, nor a ... argument. I still use the magrittr pipe by default for interactive workflows, but I think that the R base pipe is a better fit when I work on packages. I also haven't found a use for the eager pipe, and I avoid the assignment pipe for the same reasons that you mention. If I want to produce a modified dataframe I will usually pipe the first assignment call:
    ```
    my_data <- read_csv(....) %>% drop_na() %>% filter(....)
    ```
    This makes it obvious that my_data is not the raw data from the read_csv call, but doesn't give me eye-strain looking for %<>% hidden in my code. 🤣
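
Several comments above point to base R functions that pair well with the native pipe. The sketch below tries those suggestions on the built-in airquality data, which stands in for the local_weather data frame from the episode; the `temp_c` column and the `Wind > 10` filter are assumptions made up for the example.

```r
# Sketch only: airquality (built in, contains NAs) stands in for local_weather.

# na.omit() and subset() play the roles of drop_na() and filter().
clean <- airquality |> na.omit() |> subset(Wind > 10)

# transform() adds or modifies columns, much like mutate().
with_celsius <- clean |> transform(temp_c = (Temp - 32) * 5 / 9)

# with() evaluates an expression using the data frame's columns,
# which covers the same ground as magrittr's %$%.
wind_ozone <- clean |> with(cor.test(Ozone, Wind))
```

None of this needs a package to be loaded, which may matter when writing package code or when minimizing dependencies.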