Most of the time, you will not be hand-entering data into your R GUI. R can read datasets stored in other places, such as on a hard drive or a website, into its working memory, and it can read almost any data format. In class, you will mostly be working with .csv (comma-separated values) files, so that is what is covered here.
These R Help pages use the “diamonds” dataset from the ggplot2 graphics package in R, by Hadley Wickham and Winston Chang.
A description of the variables is below:
price: price in US dollars
carat: weight of the diamond
cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: diamond colour, from D (best) to J (worst)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x: length in mm
y: width in mm
z: depth in mm
depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table: width of top of diamond relative to widest point
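As a quick check on the depth formula above, here is a minimal sketch in R. The measurements are illustrative values for a single diamond, and the result is multiplied by 100 on the assumption that the percentage is reported on a 0 to 100 scale, as the 43–79 range suggests.

```r
# Illustrative measurements in mm (assumed values for one diamond).
x <- 3.95  # length
y <- 3.98  # width
z <- 2.43  # depth

# In R, mean(x, y) would NOT average x and y: the second argument of mean()
# is `trim`. Combine the values with c() first.
depth_a <- z / mean(c(x, y)) * 100

# The algebraically equivalent form from the variable description.
depth_b <- 2 * z / (x + y) * 100

all.equal(depth_a, depth_b)  # checks that the two forms agree
round(depth_b, 1)            # depth percentage, rounded to one decimal
```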
If you have a Mac, double-click the diamonds.csv file on your computer. If it opens with Excel, you are good to go. If it opens in Numbers, you must change the file association or R won’t be able to read the file in. To accomplish this:
R has the read.csv() function for reading data from your .csv files into your workspace. A common beginner challenge is figuring out how to find your data.
R has the concept of a working directory: the folder where it looks for files by default. If you store your data and analysis scripts there, you won’t need to give R the full path to your file when reading it in. It will know where to look!
To find your current working directory, use the getwd() command.
getwd()
## [1] "C:/Users/lesli/Dropbox/MGSC 291/RHelp"
This is the way a pathname looks on a Windows machine. You can see that my working directory is one of my Dropbox folders. To change the working directory to, say, the desktop, supply the new pathname to the setwd() command. If you aren’t sure of the pathname to the location you want to set, go to that location, right-click a file, choose ‘Properties’ and look at the path next to ‘Location’. Use forward slashes as separators: R treats a single backslash inside a string as an escape character, so if you copy and paste a path that has backslashes, change them to forward slashes (or double each backslash).
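The backslash fix and the setwd() call can be sketched as follows. The folder name is hypothetical, and tempdir() stands in for a real folder so that the example runs on any machine; in practice you would pass your own path to setwd().

```r
# A path as it might be pasted from Windows. Inside an R string each
# backslash has to be doubled. The folder name is hypothetical.
pasted_path <- "C:\\Users\\first.last\\Desktop"

# Swap every backslash for a forward slash.
fixed_path <- chartr("\\", "/", pasted_path)
fixed_path

# Change the working directory, then restore the previous one.
# setwd() returns the old working directory, so it can be saved for later.
old_wd <- setwd(tempdir())  # tempdir() is a stand-in for a real folder
getwd()
setwd(old_wd)
```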
If you use getwd() on a Mac, the pathname will look a bit different.
> getwd()
[1] "/Users/leslie.hendrix/Documents"
To find the pathname where you want to set your working directory, navigate to a file in that location and either left-click the file once and use Cmd + i or right-click and choose ‘Get Info’. The pathname is next to ‘Where:’ under general info. On my Mac, this line for a file on my Desktop says
Where: Macintosh > Users > leslie.hendrix > Desktop
Use the result from getwd() as a guide to how the path in the setwd() command should begin. You can copy and paste the path from the ‘Where:’ information, but you may have to change the arrows to forward slashes if they aren’t converted automatically when you paste.
> setwd("/Users/first.last/Desktop")
> getwd()
[1] "/Users/first.last/Desktop"
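Before reading a file in, it can help to confirm that R actually sees it in the working directory. A minimal sketch, assuming the file is named diamonds.csv as in these pages:

```r
getwd()                                   # the folder R is currently looking in
csv_files <- list.files(pattern = "\\.csv$")  # .csv files in that folder
csv_files

# TRUE if R can find the file by name alone, FALSE otherwise.
in_wd <- file.exists("diamonds.csv")
in_wd
```

If the last line returns FALSE, either the file is saved somewhere else or the working directory is not set where you think it is.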
Once you have your working directory set where you want and your data saved in that location, it’s easy to read in your .csv file using the read.csv() function and the name of the stored .csv file. Our diamonds.csv has a header row of column names, which read.csv() expects by default, so we can simply type the name of the file in quotes to read in the data.
sparkly <- read.csv("diamonds.csv")
You could also read the dataset in by pointing R to a specific location, like this on a Windows machine:
sparkly <- read.csv("C:/Users/lesli/Dropbox/MGSC 291/RHelp/diamonds.csv")
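After reading a file in, it is worth checking that it arrived as expected. The sketch below writes a tiny stand-in file to a temporary folder so that it runs anywhere; the file name diamonds_demo.csv and its three rows are illustrative, not the real dataset.

```r
# Write a tiny stand-in file (illustrative values, not the real data).
csv_path <- file.path(tempdir(), "diamonds_demo.csv")
writeLines(c("carat,cut,price",
             "0.23,Ideal,326",
             "0.21,Premium,326",
             "0.23,Good,327"), csv_path)

# header = TRUE is the default for read.csv(); it is spelled out for clarity.
demo <- read.csv(csv_path, header = TRUE)

dim(demo)   # number of rows and columns
str(demo)   # column names and the type R assigned to each
head(demo)  # the first few rows
```

The same three checks work on sparkly once diamonds.csv has been read in.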