If you have decided to collect quantitative data for your research project, you face the choice of how to process and analyze it. You can rely on software such as MS Excel, Tableau or SPSS, which offers many functions at the click of a button and lets you modify entries directly. Or you can opt to use a programming language. R is one such language (though not the only one), and it can be used for many steps of the empirical research process, such as data preparation, web scraping, statistical analysis, quantitative text analysis, reporting results (in the form of graphs, webpages etc.), and much more. In fact, you can do almost anything that you can do with other computer programs. The difference is that instead of carrying out your tasks “manually”, you tell the computer which tasks to carry out by writing computer code. For example, instead of changing a value in your dataset by typing a new value into a cell in MS Excel, you write code stating that a certain value should be changed.
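To make the last example concrete, here is a minimal sketch of what changing a value "in code" could look like in R. The dataset, its column names, and the values are invented purely for illustration.

```r
# Hypothetical survey data; names and values are made up for illustration.
survey <- data.frame(
  respondent = c(1, 2, 3),
  age        = c(34, 250, 41)  # 250 is an obvious typing error
)

# Instead of overwriting the cell by hand, state the correction in code:
# "for respondent 2, set age to 25".
survey$age[survey$respondent == 2] <- 25

print(survey)
```

Because the correction is written down as an instruction, anyone reading the script later can see exactly which value was changed and under which condition.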
Working with code has several advantages, such as traceability, automation, and flexibility, but also disadvantages: writing code can be time-consuming, and the learning curve is steep at first.
Traceability means that by writing code you automatically document what you did with your data. This makes your data analysis more transparent (to others, but also to yourself), and if you find an error you can easily go back and modify any step of the procedure.
Automation means that if you need to execute the same task many times (for example, drawing the same type of graph), you only need to write the code once. This can save a lot of time.
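As a rough illustration of this idea, the sketch below writes the instructions for one type of graph once and then reuses them for several variables. The data are randomly generated, and the names `scores` and `draw_histogram` are hypothetical choices for this example only.

```r
# Hypothetical data: made-up scores in three subjects for ten students.
set.seed(1)
scores <- data.frame(
  math    = rnorm(10, mean = 70, sd = 10),
  physics = rnorm(10, mean = 65, sd = 12),
  history = rnorm(10, mean = 75, sd = 8)
)

# Write the plotting instructions once, as a function.
draw_histogram <- function(data, variable) {
  hist(data[[variable]],
       main = paste("Distribution of", variable),
       xlab = "Score")
}

# Reuse the same instructions for every column of the dataset.
for (variable in names(scores)) {
  draw_histogram(scores, variable)
}
```

If a fourth subject were added to the data, the loop would draw its histogram as well, without any extra code.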
Flexibility means that you can tailor your code closely to your particular problem. For example, while charts made with MS Excel always share some basic design features, you can customize almost every element of a chart in R.
On the other hand, learning a programming language is, at the beginning, a bit like learning a foreign language. Moreover, writing code sometimes takes more time than using pre-built software. For a small-scale research project that does not involve many steps of data manipulation, it might therefore be easier to use a more “manual” program.
However, probably the most important advantage of using R is its large and growing worldwide user community, which is committed to the open source principle. This means that with a simple web search you can find answers to almost all questions, tons of tutorials, and thousands of “packages” developed by users. A package is essentially a collection of functions that other users can install and adapt to their own needs.
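In practice, using a package usually involves two steps: installing it once and loading it in each session. The sketch below assumes the widely used plotting package ggplot2 as an example and uses `mtcars`, a small dataset that ships with R; any other package would follow the same pattern.

```r
# Installing is needed only once per computer (requires an internet connection):
# install.packages("ggplot2")

# Loading is needed in every new R session. The check below avoids an error
# if the package has not been installed yet.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Use functions provided by the package to draw a scatter plot
  # of car weight against fuel efficiency.
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()
  print(p)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first.")
}
```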
R is not the only language that combines these features. There are many others; a popular alternative is Python.
You can request R via the Software Centre of Leiden University (to use it on a university computer), or you can download it yourself: https://www.r-project.org/ (for R itself) and https://rstudio.com/products/rstudio/#rstudio-desktop (for RStudio, a user-friendly editor for writing R code).
Hadley Wickham & Garrett Grolemund: R for Data Science
One of the best books for getting started with data science in R
- The university library features some books on R
- For searching answers to your questions
To Code or Not to Code: should lawyers learn to code? Lawtomated blog.
Some thoughts about whether lawyers should get into coding or not
Devaux / Chretien / Bobichon 2019.
A paper that demonstrates how to make an analysis transparent