Congratulations on getting started with R!
If you are reading this sentence, you have probably made up your mind to harness the power of R. This article prepares you for take-off in your flight with R: it provides a gentle introduction to R and shows how to set it up on your computer. While we taxi on the runway doing preflight checks, you will also be introduced to some R jargon.
What is R?
R is a free, open source, command line oriented statistical package. R, in common parlance, also refers to the dynamically typed, interpreted, high-level programming language that runs in the statistical package.
Therefore, depending on who your current companions are, the term ‘R’ may refer to different things. If you are with computer science students, R is a programming language that excels (pun intended) at data manipulation. If you are with statisticians, R is an excellent statistical package. And God forbid, if you are with a bunch of executives, then R is free software that runs on almost any computer and whose experts are pretty cheap to hire.
Brief History of R
When learning a new programming language, it pays to know a bit about the roots of the language. Knowing where a programming language comes from helps you write idiomatic code in that language.
R is modeled after the programming language S. To give you a sense of how old S is, development of the S language dates back to the mid-1970s. To put that in perspective, Fortran (the oldest programming language still in wide use) dates back to 1957. S was developed by researchers at AT&T Bell Laboratories; R, in turn, also borrows ideas from Scheme. One of the core distinguishing features of S is that it is interaction-oriented. An interaction-oriented language is one that encourages (and not just allows) interactive programming: the programmer writes a small piece of code (typically one line or one command), views the result and processes the output further. Well-known examples of interaction-oriented languages are LISP (which gave us the famous Read-Eval-Print Loop, or REPL) and Python. To appreciate the innovative spirit of S, recall that the 1970s were still largely an era of batch processing and punched cards. Coming up with an interaction-oriented programming language in that period is remarkable, to say the very least.
R is an open source implementation of the S programming language. Since R introduces a few deviations from S, it is best thought of as a dialect of S. Since R is free (as in zero-cost), open source and runs on almost any computer, it has become far more popular than S. As of this writing, R is the de facto statistical package in colleges and universities in North America and India. Salute the power of being free. Now you know why marketing departments across the world want the word ‘free’ in every advertisement.
Why Choose R?
If you are still wondering whether R is the right statistical package for you, here are a few advantages of R:
- R is free as in zero cost. You don’t need to pay anybody anything for installing, using, modifying or redistributing R. In my opinion this is one of the main selling points of R. Not having to pay anything upfront makes R a natural choice for universities, professors and students.
- R is semi-free as in partly liberated. It’s not controlled by one of those gigantic, monolithic corporations (thinking of the evil search engine company that is engulfing the internet). It’s controlled by a bunch of well-intentioned, hard-working individuals. There have been some attempts at hijacking that freedom (by the old OS company), but the R ecosystem has become so vast that it’s now immune to such ‘takeover’ attempts.
- R is open source, licensed under the GPL. If you want, you can change the source code with no favors owed to anybody. The only condition is that if you redistribute your work, you must make the source code available, at least on demand.
- R has a vast ecosystem. There are a large number of packages that extend base R. There are a number of mailing lists, and experts are available on stackoverflow.com to help you out with your issues.
- R runs on almost every platform. It’s likely to run even on antique hardware, though you may not be able to do practical problem solving there owing to memory constraints. Nevertheless, installing and running R is usually a piece of cake. (SAS developers may pitch in here with their stories.)
- R interoperates with other statistical packages and programming languages as well. R works with Java, PHP, Python and C++, and can be integrated with Microsoft Excel, SAS, SPSS, Pentaho and Stata.
- Last but not least, R has state-of-the-art graphics capabilities. With a little practice you can plot aesthetically pleasing and informative graphs summarizing the information that you generate from your analysis. In fact, it is the graphing abilities of R that have allowed it to gain so much traction.
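To give a taste of the graphics point above, here is a minimal sketch of a base R plot. It assumes nothing beyond a stock R installation: `cars` is a small dataset that ships with R, and the output file name is just a temporary file chosen for illustration. Don't worry about understanding every argument yet.

```r
# Write the plot to a temporary PDF so the sketch also works without a display.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 5)

# 'cars' is a small dataset bundled with R (speed vs. stopping distance).
plot(cars$speed, cars$dist,
     main = "Stopping distance vs. speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch = 19, col = "steelblue")

# Overlay a smoothed trend line.
lines(lowess(cars$speed, cars$dist), col = "firebrick", lwd = 2)

# Close the device so the file is actually written.
invisible(dev.off())
```

In an interactive session you can skip the `pdf()` and `dev.off()` calls, and the plot should open in a graphics window instead.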
R is available on Windows, OS X, UNIX (such as Solaris) and Linux platforms. R also provides a tarball of the source code which you can compile on your platform of choice. Prebuilt binaries are readily available for the three most popular operating systems – Linux, Windows and OS X.
The online repository of R resources is called CRAN (https://cran.r-project.org), which stands for Comprehensive R Archive Network. True to its name, it really is comprehensive. You will find a ton of resources there, including binaries and documentation.
Installing R on Windows
- Go to http://www.r-project.org/
- Click on “CRAN”. Select a mirror site near your location.
- Click on “Download R for Windows” under “Download and Install R”.
- Click on “base”
- Download the exe that matches the architecture of your hardware and OS
- If you are impatient click here – https://cran.r-project.org/bin/windows/base/
- Install R from this exe (yeah, double click :-))
Installing R on OS X
- Go to http://www.r-project.org/
- Click on “CRAN”. Select a mirror site near your location.
- Click on “Download R for (Mac) OS X” under “Download and Install R”.
- Download the latest pkg file
- Double click the pkg file to install R
Installing R on Linux
You can download packages from CRAN for Ubuntu, Red Hat, Suse and Debian like you can for Windows and OS X. You can also install R on Linux using native package managers such as yum or apt-get.
The R package is named “r-base” on Ubuntu and Debian, “R-base” on Suse and “R.i386” on Red Hat. Here is what a typical installation command using a native package manager looks like –
$ sudo apt-get install r-base
$ sudo yum install R.i386
Installing R by compiling the source code is pretty standard and easy on Linux platforms. You can get the source code from the CRAN website under the “Source Code for all Platforms” section.
Once R is installed, it can be run the way you run other programs on your OS.
1. On Windows – click on Start → All Programs → R, or double-click the R icon on your desktop or the installed R exe.
2. On Mac OS X – either click on the icon in the Applications directory, or put the R icon on the Dock and click on it there. Alternatively, you can just type ‘R’ (uppercase) in the Terminal.
3. On Linux flavours – assuming your PATH contains R’s installation directory, type ‘R’ (uppercase) in a terminal.
On Windows, R opens an RGui ‘main’ window that contains the R console and any other windows that might appear during the session. Notice that, unlike spreadsheet programs, there are no ribbons or drill-down menus with thousands of pre-built commands.
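Once the console is up, a quick sanity check is to ask R which version you ended up with. The commands below are a small sketch using functions that ship with base R; the output depends on your installation, so none is shown here.

```r
# The version string of the running R installation.
R.version.string

# The same version as an object you can compare against.
getRversion()

# Platform, locale and attached packages for the current session.
sessionInfo()
```

Type them one at a time at the console and press Enter after each.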
A Simple Sample Session: Hello World in R
The R console is the command line interpreter that you use to provide input and view textual and numeric output. By default, the R prompt is the ‘>’ symbol. The prompt indicates that R is ready and waiting for you to type in a command. Some R programmers don’t like this default prompt because it is easy to confuse with the greater-than operator. A popular alternative is ‘R> ’. You can change the prompt with the options function, as shown below.
> options(prompt="R> ")
This changes the command prompt to ‘R> ’.
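If you would like to be able to undo the change, here is a small sketch that relies on the fact that `options()` hands back the previous values of whatever it changes. The variable names `old` and `new_prompt` are only illustrative.

```r
# options() invisibly returns the old values of the options it changes.
old <- options(prompt = "R> ")

# Read the current value back to confirm the change.
new_prompt <- getOption("prompt")
new_prompt

# To go back to the previous prompt later, pass the saved list to options():
# options(old)
```

The change lasts only for the current session. One way to make it stick, if you want that, is to put the `options()` call in your `.Rprofile` startup file.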
You can use the keyboard up arrow (↑) and down arrow (↓) to scroll through any previously executed commands. This reduces a lot of typing effort.
Now that we have set the command prompt to our liking, let’s get done with the customary Hello World! program. Here is how it goes in R –
R> print("Hello Bitchy World!")
[1] "Hello Bitchy World!"
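You can also use the cat function. Unlike print, cat adds neither quotes nor a trailing newline, which is why the prompt reappears on the same line –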
R> cat("Hello Bitchy World!")
Hello Bitchy World!R>
As you might have noticed, R programmers are less diplomatic, more honest and truthful and quite accurate in using adjectives.
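The difference between the two functions is easier to see side by side. The sketch below uses only base R and a tamer greeting: `cat()` writes its arguments as they are, while `print()` shows R's own representation of the value.

```r
# cat() writes its arguments as-is; add "\n" yourself to end the line.
cat("Hello World!\n")
# Hello World!

# print() shows the R representation: an index prefix and quotes.
print("Hello World!")
# [1] "Hello World!"

# print() can drop the quotes, but the [1] prefix stays.
print("Hello World!", quote = FALSE)
# [1] Hello World!
```

The `[1]` is R telling you that the line starts with the first element of the result; it becomes more useful once your results span several lines.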
Writing Bigger Chunks of R Code
As you might have guessed, the command line interpreter is good for writing one command at a time. If you want to write longer pieces of code (and eventually you will have to), you can use an IDE (such as RStudio or Netbeans), a text editor (such as Sublime Text), or fall back on the script editor inside R.
You can open a new instance of the R source code editor using the R GUI menus (Go to File → New script in Windows or File → New Document in OS X).
The built-in editor features useful keyboard shortcuts (CTRL+R on Windows), which automatically send lines to the console. You can send the line upon which the cursor sits, a highlighted line, a highlighted part of a line, or a highlighted chunk of code.
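To try the editor out, here is a minimal sketch of a first script. The file name `hello.R` and the function name `greet` are made up for illustration; any names will do.

```r
# hello.R: a hypothetical first script (file and function names are illustrative).

# Define a small helper function.
greet <- function(name) {
  paste0("Hello, ", name, "!")
}

# Call it for several names at once; paste0() is vectorized.
greetings <- greet(c("R", "World"))

# Print one greeting per line.
cat(greetings, sep = "\n")
```

Paste it into a new script window and send it to the console line by line or as one highlighted chunk. If you save the file, running `source("hello.R")` from the console (with the working directory set to the folder that contains it) should execute the whole script in one go.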
That’s all for this part, folks!
We got up and running with R in this article. Next, we will look at packages and the help subsystem in R.