The Art of R Programming: A Tour of Statistical Software Design Paperback – Oct 15 2011
About the Author
Norman Matloff, Ph.D., is a Professor of Computer Science at the University of California, Davis. He is the creator of several popular software packages, as well as a number of widely used Web tutorials on computer topics. He has written articles for the New York Times, the Washington Post, Forbes Magazine, the San Francisco Chronicle, and the Los Angeles Times, among others, and is also the author, with Peter Jay Salzman, of The Art of Debugging (No Starch Press).
Top Customer Reviews
Has some typos / mistakes that can cause you to scratch your head sometimes.
Most Helpful Customer Reviews on Amazon.com (beta)
What Matloff does is to lay out the essentials of the R language (or S, if you prefer) in depth but in a readable fashion, with well-chosen examples that reinforce learning about the language itself (as opposed to focusing on statistics or data analysis).
I'm a long-time (12 years) R user; it is my everyday platform for analytics, and I have programmed in a variety of languages from C to Perl. I have long regretted that there is nothing for R comparable to Kernighan & Ritchie ("K&R", The C Programming Language) or similar programming classics; finally there is. Matloff is not quite as beautiful and elegant as K&R (and to be fair, is not in their position as the language creator) but this book has similar goals and comes reasonably close.
I think there are two primary audiences for this book: those who are learning R from a computer science or programming background; and statisticians and others who use the programming language and want a thorough exposition. In my case, for instance, despite having written perhaps 100k lines of R code over the years, there remained areas where I was uneasy (e.g., exactly how do lists relate to data frames). Matloff sets it all straight, in friendly, readable fashion. Even in rudimentary chapters, I learned shortcuts and miscellaneous functions that are quite useful. The examples throughout are more "CS-like" than statistical, which is highly advantageous for this topic.
In addition to the tutorial content, it is well-suited as a quick reference. It doesn't aim to be comprehensive from a function point of view (which is almost impossible, and what R Help is for), but it is comprehensive from a programming conceptual point of view.
In short, if you program in R, and unless you're a member of R-Core, then I believe you'll enjoy this, will learn something, and will refer back to it repeatedly.
Variable scope - Chapter 7
User-defined classes - Ch 9
Debugging - Ch 13
Profiling and performance (mostly, vectorization) - Ch 14
Interfacing with C/C++ and Python - Ch 15
Parallel computation ("pure R" approach using "snow" package, and C++-aided approach using "OpenMP" library) - Ch 16
I have not seen the material of Chapters 15-16 in any other R reference; the other topics have shown up elsewhere - in "R in a Nutshell", for example - but get more attention here. The chapters would have been much shorter if written in a "Nutshell" style; however, I do not automatically consider a verbose, user-friendly writing style a negative.
The early chapters introduce R in a way similar to other books - except for (a) eschewing discussion of the language's statistical repertoire, which makes sense given the "programming" focus, and (b) showing a greater interest in the "matrix" class - and although they do it quite nicely (this said, let me ask the author to reconsider his "extended examples"), I would not recommend "Art of R Programming" to non-SRPs, and point them to Robert Kabacoff's "R in Action" or (the E-Z version) Paul Teetor's "R Cookbook" instead.
Overall, while the book did not quite click for me - I am a "data analyst" and at present do not have much "need for speed" (cf. C/C++); on the other hand, I would like a firmer grasp on R's OOP, but here, "Art of R Programming" only whets one's appetite - I cannot deny its quality and unique value for budding SRPs. If there was any wavering between four and five stars on my part, the appreciation of how pretty and inexpensive the book is tipped the scales.
The book does a great job at times of explaining how the various R functions work, as well as concepts such as "vectorized" functions. A bit of code is shown, and then there is a lot of explanation that describes what it does, and why. Sometimes, the phrasing could use improvement, and I found myself perhaps struggling to master a concept longer than I should have, but it was enough to get the job done.
Then I got about a quarter of the way through the book and hit an extended example of applying logistic regression. First, the code included a tilde operator, which had not been mentioned anywhere in the book before that. Next, it called a function, glm, without explaining what it does, and it showed the results, and said, "Sure enough, we get a 2-by-8 matrix, with the jth column given the pair of estimated B[i] values obtained when we do a logistic regression using the jth explanatory variable."
In effect, the book suddenly shifted from an explain-it-all-as-we-go text to a we-assume-you-know-statistics-as-well-as-exotic-R-operators-and-functions text. I am completely unable to understand this example until and unless I dig into both the related concepts in statistics and the R-related syntax. I can't blame the book too much for my lack of knowledge in statistics, but I can say that it was careful to provide explanations of some much simpler statistical concepts earlier. As for the R syntax, I don't think there is any excuse for that. It also turns out that the caret operator in this context is not at all what a programmer would expect it to be--no coverage of that either.
Somewhat later was a very long example on a Discrete Event Simulator. Here, as in so many other places, the author likes cryptic variable names such as rw, evntty, inspt and appin. If you were to study the code long enough, you would eventually understand what all of these meant. But it's sloppy and irritating and makes the job of understanding the code much harder.
Not long after this, he makes a comment on recursion that made me burst out laughing:
"It's fairly abstract. I knew that the graduate student [who had asked him for advice on writing a function], as a fine mathematician, would take to recursion like a fish to water.... But many programmers find it tough."
What I, a mere dim-as-a-20-watt-bulb programmer, find tough, is a plethora of cryptic variable names. Recursion, not so much. I followed his example with ease. Maybe if I were a math graduate student I could understand those variables!
I've also been disappointed with how little attention the book gives to the fundamental differences between some of R's "families" of functions, such as apply, lapply, sapply, and tapply, or lm and glm. There is a brief hand-waving comment and then off we go. This is unfortunate, especially since, in my view, the built-in R help is often impenetrable and written more as a technical spec than a clear explanation.
I have pushed on to subsequent chapters, and learned more from the book. But be forewarned that it has a tendency to shift suddenly and without warning from a from-the-ground-up perspective to a we're-all-experienced-R-users perspective.
One other comment: as others have noted here, the publisher really should have included data files so that readers could play along with the examples.
Being new to R and having worked through the first five chapters, I struggled with the data files that are referenced in the book. Normally, when I learn a new programming language, working the examples works fine for me, but with this book it proved a nightmare: 1) The book does not explain where the data files can be found. 2) After searching the internet, I found a link to "the data files" on the publisher's website, only to be disappointed even more: many files are missing or have different names from the ones used in the book. Some are corrupt and/or contain different values from those shown in the book.
It really made me wonder what all the five-star ratings for this book were based on. I cannot believe that these reviewers used the book intensively.
This problem is not new, although only a few reviewers mention it: if you google "missing data files art of r programming" you will find many other people who encountered the same problem.
A second problem is that the code fragments often have errors that are really hard for beginners to solve. One example is the Mount Rushmore code on page 65, and another is the code for the word frequency problem on page 98. On the web I found some solutions/corrections by other readers.
Then why did this book earn so many five-star ratings? It probably has to do with the fact that it could be a very good introduction to R, if only the author (and editorial staff at No Starch Press) had paid more attention to detail and had spent some extra work on providing correct data files.
Look for similar items by category
- Books > Computers & Technology > Computer Science > Artificial Intelligence > Computer Mathematics
- Books > Computers & Technology > Programming > Languages & Tools
- Books > Computers & Technology > Software > Mathematical & Statistical
- Books > Professional & Technical > Professional Science > Mathematics > Applied
- Books > Science & Math > Mathematics > Applied > Probability & Statistics
- Books > Textbooks > Computer Science & Information Systems > Programming Languages
- Books > Textbooks > Sciences > Mathematics > Statistics