R is a powerful and free software system for data analysis and graphics. This book, R for SAS and SPSS Users by Robert A. Muenchen, introduces R using SAS and SPSS terms with which you are already familiar.
I expect users of most other statistics packages could benefit from this book. An audience I did not expect to serve is R users wanting to learn SAS or SPSS. While only a small percent of SAS and SPSS users take advantage of their output management ...

The code and data sets that accompany the books R for SAS and SPSS Users and R for Stata Users are available for download.
This is the default value so I list it here only to point out its importance. Therefore, if you need to use a blank space as a missing value, you cannot use blanks or tabs as a delimiter. Since we are saving it to a file we have named, you can tell it no.
You can see which packages are installed and ready to load with the library function. It will show you the names of all the packages you have installed, including those you have not yet loaded. You can then choose one from the list. Here I am loading the Hmisc package: library("Hmisc").
Since the Linux version lacks menus, this function is the only way to load packages there. When trying to load a package, you may see an error message saying that there is no package with that name. It means you have either mistyped the package name (remember, capitalization is important) or you have not installed the package before trying to load it.
In this case, the package name is typed accurately, so the problem is that I have not yet installed it.

You can see which packages are currently loaded with the search function. Its output includes entries such as ".GlobalEnv" and "package:Hmisc". We will discuss this function in detail in a later chapter.

Two packages sometimes contain functions with the same name. That can be very confusing until you realize what is happening. For example, the Hmisc and prettyR packages both have a describe function that does similar things. In such a case, the package you load last will mask the function(s) in the package you loaded earlier. For example, I loaded the Hmisc package first, and now I am loading the prettyR package (having installed it in the meantime!). R responds with a message saying that the describe function is masked from package:Hmisc.
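The loading and masking behavior described above can be sketched as follows. This is a minimal illustration, not the book's own code: the helper load_if_installed and the package name noSuchPackageXYZ are hypothetical, and the splines package (which ships with R) stands in for add-on packages such as Hmisc.

```r
# A sketch of loading a package safely and inspecting the search path.
# load_if_installed() is a hypothetical helper written for this example.
load_if_installed <- function(pkg) {
  # requireNamespace() returns FALSE instead of stopping with an error
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(FALSE)
  }
  library(pkg, character.only = TRUE)
  TRUE
}

loaded <- load_if_installed("splines")           # ships with R
absent <- load_if_installed("noSuchPackageXYZ")  # hypothetical name

# search() lists what is attached, in the order R looks things up
print(search())

# conflicts() reports names that exist in more than one attached package
masked <- conflicts(detail = TRUE)
```

Checking first with requireNamespace avoids the error message, which can be convenient in programs that other people will run.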
You can avoid such conflicts by detaching a package you no longer need. For example, the following command will detach the prettyR package: detach("package:prettyR").

If your favorite packages do not conflict with one another, you can have R load them each time you start R by putting the commands in a file named .Rprofile. That file can automate your settings just like the autoexec.bat file. For details, see Appendix C.

Keeping your packages up to date is easy. You simply use the update.packages function. It prints a message, repeated for each package, that tells you what file it is getting from the mirror you requested (Iowa State, in my case) and where it placed the file. As long as you see no error messages, the update is complete.

Moving to a new version of R itself takes a few more steps. First, you download and install the new version just like you did the first one.
Multiple versions can co-exist on the same computer. You can even run them at the same time if you want to compare results across versions. When you install a new version of R, you also have to install any add-on packages again. You can do that in a step-by-step fashion as we discussed above, or you can install them all at once, as I do for all the packages we use in this book.
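Reinstalling add-on packages after an upgrade might look something like the sketch below. The list of package names is an assumption chosen for illustration (the text mentions Hmisc, prettyR, and R Commander, whose package is Rcmdr), and the install step is guarded so that it only runs in an interactive session with internet access.

```r
# Hypothetical list of add-on packages to restore after upgrading R
wanted <- c("Hmisc", "prettyR", "Rcmdr")

# Names of the packages already present in this R installation
installed <- rownames(installed.packages())

# Only the ones that are missing need to be installed
to_install <- setdiff(wanted, installed)
print(to_install)

if (length(to_install) > 0 && interactive()) {
  # Needs an internet connection; R may ask you to choose a mirror
  install.packages(to_install)
}
```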
Once you are confident that you will no longer need an older version of R, you can remove it. On Microsoft Windows, the uninstall menu choice runs the uninstall program, whose name begins with unins. That program will remove R and any packages you have installed. That file is located in a folder on drive c:. To uninstall R on the Macintosh, simply drag the application to the trash.

Although it is rarely necessary to uninstall a single package, you can do so with the remove.packages function. First though, you must make sure it is not in use by detaching it. For example, to remove just the Hmisc package, use detach("package:Hmisc") if it is loaded, and then remove.packages("Hmisc").

If the Packages window does not list the package you need, you may need to choose another repository.
Several repositories are associated with the Bioconductor project. Note that two CRAN repositories are selected by default. You can add others using your operating system's usual ways of selecting a range of items or several separate items. On Microsoft Windows, that is Shift-click and Ctrl-click, respectively. In the text version of the prompt, the choices include Omegahat, BioC software, BioC annotation, BioC experiment, and BioC extra, and R asks you to enter one or more numbers separated by spaces.

R also comes with a number of data sets. You can use these practice datasets directly. For example, you can look at the top of the CO2 file (capital letters C and O, not zero!) with the head function.
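A quick look at one of the practice data sets could be sketched like this. It assumes only the datasets package that ships with R; the object names top and available are made up for the example.

```r
# CO2 is one of the data sets in the "datasets" package that ships with R
top <- head(CO2)      # the first six rows by default
print(top)

# The names of the data sets that the datasets package provides
available <- data(package = "datasets")$results[, "Item"]
print(head(available))
```

Typing CO2 with a zero instead of the letter O would give an "object not found" error, since R treats them as different names.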
There are several ways to run R. When you run it interactively, you can see the result of each command immediately after you submit it. When you run it in batch mode, you enter your program into a file and run it all at once.

You can ease your way into R by continuing to use SAS, SPSS, or your favorite spreadsheet program to enter and manage your data, and then use one of the methods below to import and analyze it. As you find errors in your data (and you know you will), you can go back to your other software, correct them, and then import it again. It is not an ideal way to work, but it does get you into R quickly.
On Microsoft Windows, start R and the main R Console window will appear, looking like the left window in the accompanying figure. Then enter your program using one of the two methods described next: the console or the R Editor.

Enter R functions into the R console. R will execute each line when you press the Enter key. If you enter them into the console, you can retrieve them with the up arrow key and edit them to run again. I find it much easier to use the program editor described in the next step. If you enter the name of a function and an open parenthesis, R ...

Enter R functions into the R Editor. You can see it on the right side of the figure.

Submit your program from the R Editor. To run a block of lines, select them first, and then submit them the same way as a single line. Make any changes you need and submit the program again until finished.

Save your program and output. The console output will contain the commands and their output blended together, like an SPSS output file, rather than the separate log and listing files of SAS.
Save your data and any functions you may have written. You can also save your workspace using the save.image function, as in save.image(file = "myWorkspace.RData"), and read it back later with load("myWorkspace.RData"). We will return to this topic in a later chapter.

Optionally save your history. R has a history file that saves all of the functions you submit in a given session. This is just like the SPSS journal file. SAS has no equivalent. You can also use R functions (savehistory and loadhistory) to do these tasks. I prefer to always save a cumulative history file automatically.

R offers to save your workspace automatically upon exit. If you are using the save.image function to save it to a file you have named, you can tell it no. If you accept, the next time you start R, it will load the contents of the .RData file automatically. Creating a .RData file this way is a convenient way to work. However, I prefer naming each project myself.

On the Macintosh, start R by choosing R in the Applications folder. The R console window will appear (see the left window in the figure).

Enter R functions in the console window. When you type a whole function name, the function's arguments will appear below it in the console window.
The R Editor will start with an empty window. You can see it in the center of the figure.

You can also save and load your workspace using the R functions save.image(file = "myWorkspace.RData") and load("myWorkspace.RData").

R has a history file that saves all of the functions you submit in a given session (but not the output). You can see the command history window on the right side of the figure. Notice that it has alternating stripes, matching its icon. Clicking the icon once makes the history window slide out to the right of the console. Clicking it again causes it to slide back and disappear. You can see the various buttons at the bottom of the history window, such as Save History or Load History. You can use them to save your history or load it from a previous session.
Users of any operating system can quit by submitting the function quit() or just q(). R will offer to save your workspace automatically upon exit. If you accept, the next time you start R, it will load the contents of the .RData file.

On Linux or UNIX, you can enter R functions using either of the two methods described next: the console or a text editor.

In the console, you can retrieve a command with the up arrow key, edit it, and press Enter to run it again. You can include whole R programs from files with the source function, which is described later in this section.
Enter R functions into a text editor. An editor such as Emacs color-codes your commands to help find syntax errors. You can submit your programs directly from Emacs to R. See the R FAQ for details.

Linux or UNIX users can route output to a file with the sink function. You must specify it in advance of any output you wish to save. The file will contain a transcript of the output of your work.

Users of any operating system can save the workspace by calling the save.image function, as in save.image(file = "myWorkspace.RData"). Later, you can read the workspace back in with the command load("myWorkspace.RData"). You can also save or load your history at any time with the savehistory and loadhistory functions.
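The output and workspace steps above can be sketched as follows. The file names and the vector x are assumptions for illustration, and everything is written to a temporary folder so that none of your own files are touched. The sketch also assumes it is run at the top level of a session, since save.image saves the global workspace.

```r
# Hypothetical file names, placed in a temporary folder
out_file <- file.path(tempdir(), "myTranscript.txt")
ws_file  <- file.path(tempdir(), "myWorkspace.RData")

x <- c(1, 2, 3, 4, 5)

sink(out_file)        # start routing printed output to the file
print(mean(x))
sink()                # stop routing; output returns to the console

save.image(file = ws_file)   # save every object in the workspace
rm(x)                        # simulate starting a fresh session
load(ws_file)                # x is available again

transcript <- readLines(out_file)
print(transcript)
```

The savehistory and loadhistory functions are left out of the sketch because they only work in an interactive session.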
Quit R by submitting the function quit() or just q(). If you let R save your workspace on exit, the next time you start R it will load the contents of the .RData file.

To include a program in R, use the source function with the name of your program file in quotes. One catch to keep in mind is that by default R will not display any results that sourced files may have created. Of course, any objects they create (data, functions, etc.) will be available to you.
If the program you source creates output that you want to see, you can add the echo = TRUE argument to the source function call. If you prefer to see only some results, you can use the print function around only those that you do want displayed.
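The difference between a plain source call, one with echo = TRUE, and the use of print can be seen in a small experiment. This is only a sketch: the file name myprog.R and the data in x are assumptions, and the program is written to a temporary folder.

```r
# Write a tiny program to a temporary file (a stand-in for your own script)
prog <- file.path(tempdir(), "myprog.R")
writeLines(c("x <- c(2, 4, 4, 4, 5, 5, 7, 9)",
             "mean(x)",          # not shown by a plain source()
             "print(sd(x))"),    # shown, because of print()
           prog)

# capture.output() collects what would have appeared in the console
plain  <- capture.output(source(prog))               # only the sd appears
echoed <- capture.output(source(prog, echo = TRUE))  # commands and results

print(plain)
print(echoed)
```

Capturing the output is only done here to make the comparison easy; in normal use you would simply call source and read the console.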
For example, if you sourced a program that calls mean(x) on one line and print(sd(x)) on the next, it would display the standard deviation, but not the mean. A program that wraps both calls in the print function would display both.

An alternative to using the source function to include bits of programs you are reusing is to create your own R package.
However, that is beyond the scope of this book.

You can also write a program to a file and run it all at once without any interaction. This is called batch processing. If you had a SAS program named myprog.sas, you would submit it to SAS from your operating system's command line. Similarly, SPSS runs batch programs with the spssb batch command. In R, you can find the details of running batch on your operating system by starting R and entering the command help("BATCH"), in which the letters of BATCH must be all upper case.
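On Microsoft Windows, batch programs are run with Rterm. You will need to change the path of Rterm in the command to match your installation. The command is too long to fit in a standard cmd.exe window, so you will need to open a new cmd.exe window for it. R will execute myprog.R and route your results to myprog.Rout. It is easier to write a small batch file, like myR.bat, that contains the command.

UNIX users can run a batch program with the R CMD BATCH command. It too will write your output to myprog.Rout.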
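One way to try batch processing without leaving R is to launch the batch job from R itself, as in the sketch below. This is an illustration under assumptions rather than the commands from the text: it uses system2 to call R CMD BATCH, writes a one-line program to a temporary folder, and relies on R.home("bin") to find the R executable that is currently running.

```r
# A sketch of running a batch job from within R itself
work <- tempdir()
prog <- file.path(work, "myprog.R")
out  <- file.path(work, "myprog.Rout")
writeLines("print(summary(c(1, 2, 3, 4, 100)))", prog)

# R.home("bin") locates the executables of the R that is running now
r_exe <- file.path(R.home("bin"), "R")

# Equivalent to typing:  R CMD BATCH --no-save myprog.R myprog.Rout
status <- system2(r_exe, c("CMD", "BATCH", "--no-save",
                           shQuote(prog), shQuote(out)))

results <- readLines(out)   # commands and output, blended together
print(results)
```

A status of zero means the batch run finished without error; the output file holds both the commands and their results.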
There are of course many options to give you more control over how your batch programs run. See the help file for details.

You can also run R programs from within SPSS. This approach lets you make the most of your SPSS know-how, calling on R only after the data is cleaned up and ready to analyze. Even if that is what you plan to do, the remainder of this book will still provide valuable information.
The way the two packages deal with many aspects of programming and analysis (handling missing values and selecting variables, to name two) is so different that you will need to know a fair amount about R to take full advantage of this feature.

The plug-in and its manual are available for download online, along with full installation instructions. Installation is quite easy as long as you follow the steps in order. First install SPSS.
Older versions of R are available online. If this is your first time reading this book, you might want to skip this section for now and return to it when you have finished the book.

First, you must do something to get a dataset into SPSS. We will use our practice data set, mydata. If you use the commands, adjust your path specification to match your computer.
Now that you have data in SPSS, you can do any type of modifications you like, perhaps creating new variables or selecting subsets of observations before passing the data to R.
For the next step, you must have an SPSS syntax window open. Enter the BEGIN PROGRAM R statement. From this command on, we will enter R programming statements.
The variables to pass to R are listed using R's c function. We will discuss that function in detail later. You can use the form c("workshop", "gender", "q1 to q4") or simply c("workshop to q4"). I used the longer form to demonstrate that you must enclose in quotes each variable name or set of contiguous names connected by the keyword TO.
You can also use syntax that is common to R, such as c(1:6). That uses the fact that workshop is the 1st variable and q4 is the 6th variable in the data set. See the manual for details.

The row.names argument names the variable whose values are used in the far left column to label the rows. If we had not specified the row.names argument, that variable would have been stored as an ordinary variable. However, the row labels would still appear the same because R always labels the rows, and if you do not provide a variable that contains labels to use, it defaults to simply sequential numbers: 1, 2, 3, and so on.

Now let us do some descriptive statistics on variables q1 to q4.
There are a number of different ways to select variables in R. One way is to use mydata[3:6]. Keep in mind that if we had not listed ID on the row.names argument, it would have counted as the first variable and the positions of the others would shift by one. R can select variables by name, but we will save that topic for later. The summary function in R gets descriptive statistics. Therefore, if we were running this interactively in R rather than in SPSS, we would submit the command summary(mydata[3:6]). However, when submitting this from within SPSS, you will not see the results unless each function call is enclosed within a call to the print function.
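The selection and summary steps described above could be tried in plain R with a stand-in for the practice data. The values below are hypothetical, entered by hand instead of being passed from SPSS, and the row labels come from the row.names argument so that workshop is the first stored variable.

```r
# Hypothetical practice data; the ids 1 to 8 supply the row labels
mydata <- data.frame(
  workshop = c(1, 2, 1, 2, 1, 2, 1, 2),
  gender   = c("f", "f", "f", NA, "m", "m", "m", "m"),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5),
  row.names = as.character(1:8)
)

# Columns 3 through 6 are q1 to q4, since the ids are not a stored variable
selected <- mydata[3:6]

# Interactively, summary(selected) would print by itself; inside a sourced
# program or an SPSS program block it needs print()
print(summary(selected))
```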
Therefore, to see the summary results, you would submit print(summary(mydata[3:6])). This is the same situation we saw with the source function. R is essentially sourcing the program from SPSS, hence the similarity. Advanced programmers often want to run programs without seeing the results right away. Instead they use the unseen results as input to other programs.
They will of course eventually print some final results.

Finally, we will do a linear regression model using standard R commands that we will discuss in detail much later. Our goal here is just to see what the spsspivottable.Display function does. It displays R results as an SPSS pivot table. The table style draws only horizontal lines, as most scientific journals prefer. If you copy this table and paste it into a word processor, it should maintain its nice formatting and be a fully editable table.
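The regression step can be sketched in plain R as follows. The data values and the choice of predicting q4 from q1, q2, and q3 are assumptions for illustration, and the SPSS-specific display function is left out because it only exists inside SPSS; the sketch simply extracts the coefficient table that such a function could format.

```r
# Hypothetical survey answers standing in for the practice data
q1 <- c(1, 2, 2, 3, 4, 5, 5, 4)
q2 <- c(1, 1, 2, 1, 5, 4, 3, 5)
q3 <- c(5, 4, 4, NA, 2, 5, 4, 5)
q4 <- c(1, 1, 3, 3, 4, 5, 4, 5)
mydata <- data.frame(q1, q2, q3, q4)

# Predict q4 from the other three questions; the row with an NA is dropped
model <- lm(q4 ~ q1 + q2 + q3, data = mydata)

# The coefficient table is an ordinary matrix, the kind of object that a
# table-display function could format
coef_table <- coef(summary(model))
print(coef_table)
```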
When I ran the program, this table appeared first in the SPSS output window even though it was the last analysis run. SPSS puts its pivot tables first. If your program contains some R code, then some SPSS code, then more R code, any data sets or variables you created in the earlier R session(s) will still exist.

R itself does not include a point-and-click GUI for running analyses.
There are, however, several GUIs written by R users. One of them is R Commander. It provides menus for many analytic and graphical methods and shows you the R commands that it enters, making it easy to learn the commands as you use it.
Since it does not come with the main R installation, you have to install it one time with the install.packages function. Below are the steps I followed to create the screen image you see in the figure. I started R (for details, see the section Running R Interactively, above).
Then, from within R itself, I started R Commander by loading its package from the library with library("Rcmdr"). That brought up a window that looks something like the figure. When opening the data, I had to tell it to look for All Files because by default it looks for .RDA file types and ours are .RData. I then chose the file mydata.RData. To check the data, I clicked on the View data set button, and the data appeared as shown in the figure. You can also see output at the bottom of the screen in the figure. You can learn more about R Commander from its website.

Another GUI is rattle. Its name stands for the R analytical tool to learn easily.
It is a point-and-click interface that writes and executes R programs for you. Before you install the rattle package, you must install some other tools. See the website for directions. Once it is installed, you load it from your library in the usual way. As the instructions tell you, simply enter the call to the rattle function to begin. It shows the steps it uses to do an analysis on the tabs at the top of its window.
You move from left to right, clicking on each tab to do the following steps:

Data: choose your variables and the roles they play in the analysis.
Explore: examine the variables using summary statistics, distributions, interactive visualization via GGobi, correlation, hierarchical cluster analysis of variables, and principal components.
Transform: replace missing values with reasonable estimates (imputation), convert variables to factors, or look for outliers.
Cluster: find groups of similar cases.
Associate: find association patterns.
Model: apply models from tree, boost, forest, SVM, regression, or all.
Evaluate: see how good your model is using confusion tables, lift charts, ROC curves, etc.

Another GUI is JGR, the Java GUI for R. It adds some helpful tools, like syntax checking in its program editor. It also provides the help files in a way that lets you execute any part of an example you select.
That is very helpful when trying to understand a complicated example. Linux users have some additional minor steps that are described at the site. Its object browser is shown in the figure. Selecting different tabs across the top enables you to see the different types of objects in your workspace.
Here I right-clicked on gender, which brought up a box listing the number of males, females, and missing values (NAs). If you have a list of models, you can sort them easily by various measures, like their R-squared value. Double-clicking on a data frame in the object browser starts a data editor (see the figure). It lets you rename variables, search for values, sort by clicking on variable names, cut and paste values, and add or delete rows or columns.
R's help files and other documentation are extensive. However, they can be somewhat intimidating at first, since many of them are written for intermediate to advanced users and assume you already know a lot about R. For example, the help file for the print function notes that it is a generic function, which means that new printing methods can be easily added for new classes. By the time you finish this book, the help files and other documentation should make much more sense.
On any operating system you can submit the help.start function to open the help system in your web browser. To get help for a certain function such as summary, use the form help(summary) or simply ?summary. Most help files end with examples. You can cut and paste them into a script window to submit in easily digestible chunks. You can also have R execute all of the examples at once with the example function, as in example(mean). Do not try to understand the examples for the mean function now.
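The help functions mentioned above can be tried as in this short sketch. It uses only functions that come with R and assumes the standard help files are installed; the object name h is made up for the example.

```r
# help() finds the documentation for a topic; at the console, typing
# ?summary does the same thing and displays the page
h <- help("summary")
print(class(h))

# Run every example from the end of the help file for mean
example("mean", ask = FALSE)
```

When help is called at the console without assigning its result, the help page is displayed instead of being stored.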
We will cover the mean function later.

Some functions are in add-on packages that you must load from the library before getting help on them. One of these is the contents function in the Hmisc package. Let us try to get help on it before loading the Hmisc package. R responds that it cannot find any documentation for contents. This might remind us that we have not yet loaded the Hmisc package.
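We can do that with the command library("Hmisc"). Loading it produces a message about attaching the package and masking some objects. It does not cause a problem for our purposes. With the package loaded, help for the contents function is available. We do not need to look at the actual help file at the moment. We will cover that function much later.

Some functions, such as generic functions, call other functions to do their work. In many cases the help file for the generic function will refer you to those other functions, providing all the help you need.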
However, in some cases you need to dig for such help in other ways. We will discuss this topic, with an example, in later chapters.

To get help on a package, you must first install it and load it. However, not all packages provide help for the whole package. Most do provide help on the functions that the package contains. We will use a number of functions from the Hmisc package.

Note that R is almost identical to the S language, and books on S usually point out what the differences are.
We will discuss books on graphics in the chapters on that topic.

I recommend signing up for the r-help listserv. There you can learn a lot by reading answers to the myriad of questions people post. If you post your own questions on the list, you are likely to get an answer in an hour or two. However, please read the posting guide first. Taking the time to write a clear and concise question and providing a clear subject line will encourage others to take the time to respond.
Sending a small example that demonstrates your problem clearly is particularly helpful. Also include the version of R you are using and your operating system. You can generate all the relevant details with the sessionInfo function.

Searching the web for a single letter such as R may seem hopeless. However, if you add the letter R to other keywords, it is surprisingly effective. Adding the word package to your search will also narrow it down. If you use the Firefox web browser, there is a free plug-in called Rsitesearch you can use.
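The version details worth including in a posting to the list can be gathered as sketched below. The sketch assumes only base R; the object names info, r_version, and platform are made up for the example.

```r
# Details worth pasting into a question sent to a mailing list
info <- sessionInfo()
print(info)

# Individual pieces, if you only need a line or two
r_version <- R.version.string    # a line of the form "R version x.y.z ..."
platform  <- R.version$platform  # describes the hardware and operating system
print(r_version)
print(platform)
```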
It is available for download online.

Another kind of documentation is the vignette. People who write packages can put anything into a vignette. To see the vignette for a particular package, call the vignette function with its name in quotes.

Chapter 8 Programming Language Basics

R is an object-oriented language. Everything that exists in it (variables, datasets, functions or procedures) is an object.

Object names normally consist of letters, numbers, underscores, or periods. However, if you always put quotes around a variable or dataset (actually any object) name, it can contain any characters, including spaces.
Unlike SAS, the period has no meaning in the name of a dataset. However, given that my readers will often be SAS users, I avoid using the period.
Case matters, so you can have two variables, one named myvar and another named MyVar in the same dataset, although that is not a good idea! Commands can begin and end anywhere on a line and R will ignore any additional spaces. R will try to execute a command when it reaches the end of a line. Therefore, to continue a command on a new line, you must ensure that the fragment you leave behind is not already a complete command by itself.
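Continuing a command on a new line after a comma is usually a safe bet. As you will see, R commands frequently use commas, making them a convenient stopping point. If R shows its continuation prompt, +, when you thought your command was finished, you have probably left out a closing character such as a parenthesis or a quote. Submitting only that character will then finish your command.

You may end any R command with a semicolon, just like SAS. That is not required though, except when entering multiple commands on a single line.

R begins each line of output with a number in square brackets, such as [1], that counts the values displayed. It is only useful when your results run across several lines. We can tell R to generate some data for us to see how the numbering depends upon the width of the output.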
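The rules above about case, line continuation, semicolons, and bracketed output numbers can be tried in a few lines. This is a small sketch; the object names and the width of 40 are chosen only for illustration.

```r
# Case matters: these are two different objects
myvar <- 1
MyVar <- 2

# A command may continue on the next line when the first line is incomplete;
# ending a line with a comma is a simple way to guarantee that
total <- sum(1, 2,
             3, 4)

# Semicolons are only needed to put several commands on one line
a <- 10; b <- 20

# The bracketed number at the start of each output line depends on the width
old <- options(width = 40)
narrow <- capture.output(print(1:30))
options(old)            # restore the previous width
print(narrow)
```

Each element of narrow is one line of output, and each begins with the position of the first value on that line.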
The form 1:n generates the whole numbers from 1 to n. I have set my line width to 64 characters to help things fit in this book. We can use the options function to change the width to 40 and see how the bracketed numbers change. Additional spaces do not affect the commands.

SAS and SPSS both use one main data structure, the data set. Instead, R has several different data structures including vectors, factors, data frames, matrices, arrays, and lists.
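R also has data structures specifically for time series, but those are beyond the scope of this book.

The simplest structure is the vector. It is tempting to think of a vector as a column or a row of data. It is not. It exists by itself and is neither a column nor a row.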
In practice, it is usually one of two things: ...

All our examples will use the same dataset, a pretend survey about how people liked various workshops on statistics packages. We will work interactively. That is, we submit commands and see the results immediately. Although typing out the print function for most of our examples is not necessary, we will do it occasionally when showing how the R code looks in a typical analysis.
Let us create a character variable. Using R jargon, we would say we are going to create a character vector, or a vector whose mode is character. Its values are the genders of our hypothetical students, with one missing value entered as NA. Even when entering string values for gender, never enclose the NA in quotes. Now let us enter the rest of our data in the same way.

The output of a counting function such as table is sparse. There are no percents and no lines drawn to form a table.
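Entering the character vector and counting its values might look like the following sketch. The values are hypothetical stand-ins for the practice survey, and table is used here as one example of a function that gives plain counts.

```r
# NA is typed without quotes, even among character values
gender <- c("f", "f", "f", NA, "m", "m", "m", "m")
print(mode(gender))          # the mode of this vector is character

workshop <- c(1, 2, 1, 2, 1, 2, 1, 2)

# table() gives plain counts and leaves out missing values by default
gender_counts   <- table(gender)
workshop_counts <- table(workshop)
print(gender_counts)
print(workshop_counts)
```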
The output is in a form that other functions can use immediately. Other functions exist that provide more output, like percents. Still others format output into publication quality form. R will usually provide output that is NA when performing an operation on data that contains any missing values. It will typically provide an answer only when you tell it to override that perspective.
There are several ways to do this in R. For the mean function, you set the NA remove argument, na.rm, equal to TRUE.

Some variables are categorical. R has a special data structure called a factor for such variables. Regardless of whether the original data is numeric or character, when it becomes a factor, its mode is numeric. Let us enter workshop again just to see its values and convert it to a factor. Printing the factor shows its values. It also lists the levels so you can see what labels are possible.

Let us review entering our practice data for gender and printing it back out. Notice that the NA prints without quotes. R drops them to let you know it is not a valid character string that might stand for something like North America.
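The na.rm argument and the conversion of workshop to a factor can be sketched as follows. The data values and the labels "R" and "SAS" are assumptions for illustration, not necessarily the labels used in the practice data.

```r
q3 <- c(5, 4, 4, NA, 2, 5, 4, 5)   # hypothetical answers, one missing

print(mean(q3))                    # NA, because one value is missing
m <- mean(q3, na.rm = TRUE)        # average of the 7 observed values
print(m)

workshop <- c(1, 2, 1, 2, 1, 2, 1, 2)
workshop_f <- factor(workshop,
                     levels = c(1, 2),
                     labels = c("R", "SAS"))   # labels are assumptions
print(workshop_f)          # the values, followed by the list of levels
print(mode(workshop_f))    # a factor's mode is numeric
```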
If we are happy with those labels, we can convert gender to a factor by using the simplest form of the factor function. If we prefer other labels, we can supply them with the levels and labels arguments. It works the same way as for workshop, but the values on the levels argument need to be in quotes. The values then print as Male and Female.

Like a SAS or SPSS data set, a data frame is rectangular. In R terminology, the columns are called vectors, variables, or just columns. R calls the rows observations, cases, or just rows.
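Putting the pieces together, a data frame can be built from vectors and factors as in this sketch. The values and labels are hypothetical stand-ins for the practice data, and only two of the survey questions are included to keep it short.

```r
# Factors with labels; the levels for gender must be given in quotes
workshop <- factor(c(1, 2, 1, 2, 1, 2, 1, 2),
                   levels = c(1, 2), labels = c("R", "SAS"))
gender <- factor(c("f", "f", "f", NA, "m", "m", "m", "m"),
                 levels = c("m", "f"), labels = c("Male", "Female"))
q1 <- c(1, 2, 2, 3, 4, 5, 5, 4)
q4 <- c(1, 1, 3, 3, 4, 5, 4, 5)

# Each vector becomes a column; each position becomes a row
mydata <- data.frame(workshop, gender, q1, q4)

print(dim(mydata))     # rows, then columns
print(names(mydata))   # the column (variable) names
print(mydata)
```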
Downloads

Here are the programs, data sets, and files that support my books and the examples on this site.
StevieD says: Is there an actual book available that speaks about migrating from SAS to R?

Bob Muenchen says: Cheers, Bob.

NonSleeper says: Download sources are not working. The reason cited is too much traffic on the public Dropbox link.