Speaking as a Pythonista (but one who is in love with ggplot2 and dplyr), the wonderful thing about IPython Notebook is that it's possible to inline R code with no more fuss than adding "%%R" in a cell.
This is my biggest beef with R. It is constantly changing the dimensions and types of your data without telling you. Want to grab some subset of the rows of a matrix? Better add some extra post-processing in case there's only one row that satisfies your query, or else R will change its type!
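For comparison, NumPy draws the same keep-vs-drop line, but it does so syntactically rather than by inspecting the data: an integer index drops a dimension, while a slice or boolean mask keeps it — so a filter that happens to match one row stays a matrix. A minimal sketch (the arrays here are made up):

```python
import numpy as np

A = np.arange(6).reshape(3, 2)

# Integer indexing drops a dimension (like R's default drop=TRUE):
row = A[0]            # shape (2,) -- a 1-D vector, no longer a matrix
# Slice indexing keeps the shape (like R's drop=FALSE):
row_kept = A[0:1]     # shape (1, 2) -- still 2-D

# Boolean row selection always stays 2-D, even when only one row matches:
sub = A[A[:, 0] > 2]  # shape (1, 2)
print(row.shape, row_kept.shape, sub.shape)
```

In R, the equivalent `m[m[,1] > 2, ]` would silently collapse to a plain vector when exactly one row matches, unless you remember to write `m[m[,1] > 2, , drop=FALSE]`.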
The solution is not to make the programmer memorize obscure edge cases.
Although I think the newer defaults are more reasonable (I've got multiple commits at work with messages bemoaning drop=FALSE), they can ironically also mess you up if you got used to the old ones :)
There's no question that many defaults seem wonky to many users; however, you have to take into account that the use cases when the language was created (particularly going back to S) aren't the same as they are now. tl;dr: classic statistics isn't the same as contemporary data science.
I am torn, because Hadley Wickham's tools are truly wonderful, but the underlying R language is such a mess. For example, R has lazy evaluation despite being an imperative stateful language.
I wish Hadley had developed these tools for some other language, such as Python, or in a language-agnostic way. Hopefully that is the direction things will go in the future.
It's Python that's really a mess for data science; you can't avoid it being a programming language first and a tool for data science a distant tenth. Syntax that only a programmer would like is necessary, and quite a bit of it at that. R is a much better fit for people who want to do statistics first, and as little programming as possible.
Things like function parameters being promises make it far easier to deal with functions like optimizers, where there really are 10+ tuning parameters or things you may want to tweak. Iterative languages are far easier to understand for people who don't want to be programmers.
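To make the promises point concrete: R evaluates default arguments lazily, so one default can be written in terms of another argument (e.g. max_iter = 10 * length(x0)). Python's closest idiom is a None sentinel resolved inside the body. A hypothetical sketch (the function and parameter names are made up, not any real optimizer's API):

```python
def minimize(f, x0, tol=1e-8, max_iter=None, step=None):
    # In R, a default like max_iter = 10 * length(x0) is a promise,
    # evaluated lazily when first used; Python needs None sentinels.
    if max_iter is None:
        max_iter = 10 * len(x0)
    if step is None:
        step = tol ** 0.5
    # (a real optimizer would iterate here; this sketch just reports
    # the resolved tuning parameters)
    return {"max_iter": max_iter, "step": step}

print(minimize(lambda x: sum(v * v for v in x), [1.0, 2.0, 3.0]))
```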
You cannot develop plyr or ggplot in a language-agnostic way, because they need the purpose-built syntax R has. Contrast this with, e.g., the fight in Python to get an infix matrix multiplication operator.
I don't see how not having innate language support for an infix matrix multiplication operator matters. In R, all "infix" operators are really functions anyway[1], so you could write your library that way. Alternatively, you could use Python's operator overloading for infix operators. (Also, since when was Python considered not iterative? And for that matter, doesn't R's widespread use of *apply make it more functional anyway?)
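As a sketch of the overloading route: a Python class can define specially named methods like __matmul__, so a library can supply infix behavior without a language change. The toy 2x2 matrix type below is made up purely for illustration:

```python
class Mat:
    # Toy 2x2 matrix: shows that Python infix operators are just
    # specially named methods, much as R's operators are functions.
    def __init__(self, a, b, c, d):
        self.v = (a, b, c, d)

    def __matmul__(self, other):          # implements the @ operator
        a, b, c, d = self.v
        e, f, g, h = other.v
        return Mat(a*e + b*g, a*f + b*h,
                   c*e + d*g, c*f + d*h)

I = Mat(1, 0, 0, 1)
M = Mat(1, 2, 3, 4)
print((I @ M).v)   # identity times M gives M back
```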
[1]: Here is a simple example of that for +: calling "+"(1, 2) returns 3. Note the very strange overloading of quotes.
I well understand R's operators, but why on earth is that relevant?
math:
S = ( H β − r )^T * ( H V H^T )^{-1} * ( H β − r )
Python, ugly mess:
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
Python, better (although @ is an ugly matrix operator):
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
The latter is an order of magnitude easier to understand, and looks just like the math. Having one layer of indirection (math to code) is far better than two (math to code to obfuscated code) just because the language won't let you create infix operators.
Edit: examples stolen from the matrix multiplication operator PEP (PEP 465)
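For what it's worth, the two spellings really do compute the same scalar; a quick sanity check with random made-up data (the shapes are arbitrary):

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
V = np.eye(4)
beta = rng.standard_normal(4)
r = rng.standard_normal(3)

# method-call spelling vs @ spelling of S = (Hb - r)^T (H V H^T)^-1 (Hb - r)
S_dot = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
S_at = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

assert np.isclose(S_dot, S_at)
print(float(S_at))
```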
The problem is that you want to do everything in one line. The proper way to do this would be to save things like H.dot(beta) - r in a separate variable and compute it just once.
Moreover, if it's hidden in a library then why does the user care if it's ugly? It's the library designer's job to test it and make sure it's right.
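The refactoring suggested above might look like this (toy numbers, just so the snippet runs; resid and cov are names invented here for the shared subexpressions):

```python
import numpy as np
from numpy.linalg import inv

# hypothetical inputs, chosen only to make the example runnable
H = np.array([[1.0, 0.0], [0.0, 2.0]])
V = np.eye(2)
beta = np.array([1.0, 1.0])
r = np.array([0.0, 0.0])

resid = H.dot(beta) - r    # the residual (Hb - r), computed once
cov = H.dot(V).dot(H.T)    # H V H^T, also computed once
S = resid.T.dot(inv(cov)).dot(resid)
print(S)
```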
No, I simply want my math to look like math, instead of like a bunch of code that, after careful reading and some notes, implements some math. It's not a library function; it's code I write. R, MATLAB, and Julia all allow users to write something that looks very close to the actual math. Python doesn't see that as a priority.
I disagree. I think that with dplyr + ggplot2 you can go very far without dealing with R's warts, and I hope that the future of R lies in this direction. Honestly, R's syntax is not the problem; the "standard" library, with its inconsistencies, is the biggest problem.
That said, the other big area of complaint in R is the type system. We too often have to coerce types, but I'm not exactly sure of the solution for that.
Wickham et al.'s R packages are great, especially dplyr, and I think they should be taught to new R users pretty much right off the bat. I find R's syntax to be a big hangup for new learners, especially indexing and apply-to-each (sapply, mapply, just plain apply...), but dplyr really makes life much easier.
The %>% operator alone (which to be fair was originally from magrittr) is a great help. Not sure if this is my personal biases, but I always find it easier to read calls chained postfix-style.
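For readers coming from pandas, the closest analogue of a %>% pipeline is method chaining (with .pipe() for your own functions). A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "mass":    [1.0, 3.0, 2.0, 6.0],
})

# dplyr: df %>% filter(mass > 1) %>% group_by(species) %>%
#            summarise(mean_mass = mean(mass))
out = (df[df["mass"] > 1]
         .groupby("species", as_index=False)["mass"]
         .mean()
         .rename(columns={"mass": "mean_mass"}))
print(out)
```

Like a %>% chain, each step reads top to bottom in the order it runs, rather than inside out.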
I'm actually reviewing a book due out this summer called "Data Computing" that introduces the "Hadley stack" as the way of getting started in data analysis and statistics. It's by a professor here in Minneapolis at Macalester College.
I agree with you about dplyr + ggplot and was pretty much gobsmacked at the obviousness of "this is the way it should be taught" and am glad I'm in the position to help review such a text!
I wonder if eventually this is the future of standard R.
It's often a tradeoff between conciseness in one domain and generality in others. A similar story: MATLAB is great for doing math and plotting, but I hated my life when I was developing a GUI in it. I later ported that project to Python, which was great for the GUI (relatively speaking) and a little less concise for the math. I find that tradeoff to be okay.
http://nbviewer.ipython.org/github/davidrpugh/cookbook-code/...
BTW: For pandas-dplyr dictionary: http://nbviewer.ipython.org/gist/TomAugspurger/6e052140eaa5f...