Package inflation!

by Dr Jan Roth

Today I’m not talking about the increase in parcel post, but about statistical software.

From time to time I get asked which statistical software or language a beginner should learn. My answer is usually the same (this is my personal view):

1) In areas like clinical research and epidemiology, I would choose Stata (https://www.stata.com/). It offers nicely curated, menu-based functions if you need them, has a consistent statistical language and documentation, already includes most basic functions, and is faster to learn than, say, R or Python. R is much more error-prone for beginners, and its dependency on many additional packages (even for basic tasks like data manipulation or nice cross-tabulations) can be annoying. SPSS is easy to use but quite limited.

2) If your focus is data science or machine learning, use e.g. R or Python. You can also use the R and Python integration in Stata.

Some may say that R allows better visualizations than Stata. This is no longer true.

Others may say that Stata is not freely available. This is true, but it is available at a reasonable price, especially for students and through research institutions.

Opinions on this topic differ widely. Regardless, epidemos works with researchers to show them the basics of statistical software like Stata, R or SPSS. It is only through hands-on experience that you can truly understand statistics.

What’s your take on this?
