That old and tiresome war between R lovers and Pythonistas
Author: Henrique Costa

Published: June 20, 2023

DOI: https://doi.org/10.59350/83vew-1de35

Having decided to improve my skills in the little time left over between my professional and personal life, I started researching the options available in data science. It quickly became clear that I needed to choose a programming language in which to learn and apply the techniques, and the choice came down to R or Python. For me, both are good options: they are free and open source, have excellent reputations, and each has its own community of users, and a very active one!

I don’t remember where I saw this tip, but for me it was the central factor in my decision and something worth sharing:

If you are a professional who uses analysis tools in a business context and has experience with Microsoft Excel, my tip is: choose R to start your journey in data science. R is a single-threaded language that blends functional and object-oriented programming, and once you understand the main commands, it is intuitive to use. As such, it’s pretty predictable, which has been great for me as a professional. R also has an excellent graphics package, ggplot2, that complements data analysis and data wrangling. Python programmers often turn to ggplot2 to generate beautiful visualizations, or use a ggplot2 theme within Python (lol), and this still happens frequently even though Python’s visualization libraries have improved a lot in recent years.

On the other hand, if you already have some programming experience, or come from the computing field, Python is a much more general-purpose language and a more readable one for people with that background, and it can do most of the things R is good at.

Both groups have strong communities that share their knowledge through blogs and events, and I really think both are great choices. Today the job market is hotter for Python, and professionals who know both languages do very well.

I have been using R for a few years now (since 2013) and have had the chance to use it in the workplace. I use it whenever I can, showing the resources this tool offers and building econometric models and estimation procedures for financial risk modeling to provide business insights. I have already carried out many analysis projects for private clients, and I feel honored that my efforts to learn are generating results.

A colleague who is a Python expert with an interest in machine learning once saw how few lines of R code I needed to organize a dataset, train a model, and predict results, and, frankly, he was shocked. Afterwards we had an awkward R-to-Python conversation trying to understand the differences between data matrices and data frames, but for me it clarified where R is strong: the ease of preparing data and building a model quickly for any type of analysis.

Depending on who you ask, when I first came into contact with R in 2013, R probably had a slight, if not substantial, advantage over Python in user adoption for machine learning and what is now known as data science. Since then, the use of Python has grown substantially, and it would be difficult to argue against the fact that Python is the new favorite, although the race may be tighter than one might expect given the enthusiasm of Python fans for the new, brilliant tool with the biggest hype.

In recent years, Python has benefited greatly from the rapid maturation of free add-ons such as the scikit-learn machine learning framework, the pandas data structure library, the Matplotlib graphing library, and the Jupyter notebook interface, among several other open-source libraries that make it easier than ever to do data science in Python. Of course, these libraries only brought Python up to par with what R and RStudio could do long ago! However, Python is comparatively fast and memory efficient, at least relative to R, which may have contributed to the fact that Python is now arguably the language most frequently taught in formal data science programs and has quickly gained adoption across business domains.

Python’s steep rise does not indicate the imminent death of R. In fact, the use of R is also growing rapidly, and R and RStudio are more popular than ever. Although students sometimes ask whether it’s worth starting with R rather than jumping straight to Python, there are still many good reasons to choose to learn machine learning with R.

Please note that these justifications are quite subjective – not just mine, but any justification on the internet will be like this – and there is no right answer for everyone, so I hesitate to put this in writing! However, as someone who still uses R almost daily as part of my work for a large corporation, here are a few things I’ve noticed:

Hopefully, the reasons above will give you the confidence to begin your journey. There is no shame in starting with R (as some people think). Whether you stay with R long-term, use it side by side with other languages like Python, or specialize in something entirely different, the fundamental principles you learn will transfer to any language or tool you choose. A clear example: if you haven’t yet learned SQL but already understand the basics of the Tidyverse, you will have an easier time studying SQL, and vice versa. Although code written in R is much “prettier” to me, I highly recommend that you use the right tool for the job, whatever it may be. You may find, as I did, that R and RStudio are your tools of choice for many real-world data science and machine learning projects, even if you occasionally take advantage of Python’s unique strengths!
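To make the Tidyverse-to-SQL point concrete, here is a small sketch in Python, chosen only because it ships with SQLite. The `sales` table, its columns, and the data are all hypothetical. The same "group, then summarise" idea (roughly `group_by()` plus `summarise()` in dplyr) is written once as SQL and once as plain Python, and the two results are compared.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data: (region, amount) pairs, invented for illustration.
rows = [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 7.5)]

# SQL version: GROUP BY plays the role of group_by(),
# and SUM() plays the role of summarise().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

# Plain-Python version of the same grouping and aggregation.
py_totals = defaultdict(float)
for region, amount in rows:
    py_totals[region] += amount

# Both approaches should agree on the total per region.
print(sql_totals == dict(py_totals))
```

Once the grouping idea is familiar in one tool, the other versions are mostly a matter of syntax.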

In the end, my recommendation is: just pick one and get started. I still need to learn a lot more about Python myself to get the unique benefits each language offers, benefits that are amplified through collaboration. The good news is that RStudio, an IDE for R, also lets you work with Python and SQL.

Citation

For attribution, please cite this work as:
Costa, Henrique. 2023. “For Me, the Choice Was R or Python.” June 20, 2023. https://doi.org/10.59350/83vew-1de35.