## Statistical Programming With R

Posted by timothy

from the no-r-in-statistics dept.

An anonymous reader writes

*"This series introduces you to R, a rich statistical environment, released as free software. It includes a programming language, an interactive shell, and extensive graphing capability. What's more, R comes with a spectacular collection of functions for mathematical and statistical manipulations -- with still more capabilities available in optional packages."*
## Good-oh... (Score:3, Interesting)

I've heard good things about R, but have never really got to grips with it (although I know it has been around for a while), so any kind of primer is more than welcome as far as I'm concerned.

## SPSS is garbage (Score:3, Informative)

## Re:Good-oh... (Score:3, Informative)

## Linux Stats package (Score:2)

## B ... C ... C++ (Score:4, Funny)

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language"

So, R came from S; that must mean that R++ is coming up next!

## Re:B ... C ... C++ (Score:2)

## Re:B ... C ... C++ (Score:3)

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language"

So, R came from S; that must mean that R++ is coming up next! :-)

No, that's just a bullshit answer so they don't have to admit "R" really stands for 'rithmatic

## Graphing, hah! (Score:5, Interesting)

</cranky old man>

## Re:Graphing, hah! (Score:2)

## Re:Graphing, hah! (Score:3, Informative)

I don't know how R came into being, but IDL was originally designed as an ad hoc statistics and plotting tool. Because everyone was using it as an ad hoc tool, there was an assumption that everyo

## R supports graphics output in many formats (Score:2, Informative)

Output to different graphics devices has been in S, Splus, and R for as long as I can remember (and that's a long time). Maybe you should try having a look at the copious documentation for R; the documentation, like the system itself, is free.

## Re:R supports graphics output in many formats (Score:1)

## Re:R supports graphics output in many formats (Score:2)

## RTFM! (Score:5, Informative)

R is really a beautiful language, for its purpose. It has a very nice correspondence between math and code, and for most parts of "hard" science, that's really important.

Compared to MATLAB, you can easily write R code 5 times as compact as MATLAB code, and still get more understandable code.

## Re:RTFM! (Score:2)

I'm sorry, you don't get to use witticisms like "RTFM" while defending something as fundamentally idiotic as:

The mind shudders at what the less "simple" ways of doing it must look like.

## Re:RTFM! (Score:2)

## Re:Graphing, hah! (Score:2)

Do help("bitmap") for all the details.

Do try to read the man page to the end next time, or ask the dev team a question. Both the jpeg and png man pages mention the function bitmap as a solution to the problem you are having.

All the best.

## Re:Graphing, hah! (Score:1)

## What's a Robust Replacement for Excel??? . . . (Score:2, Interesting)

## Re:What's a Robust Replacement for Excel??? . . . (Score:2, Informative)

http://www.wolfram.com/news/statistics.html [wolfram.com]

## Re:What's a Robust Replacement for Excel??? . . . (Score:2, Informative)

## Re:What's a Robust Replacement for Excel??? . . . (Score:2)

## Re:What's a Robust Replacement for Excel??? . . . (Score:1, Informative)

No data entry facilities, but it handles multi-dim. visualisation very well and has a handful of convenient methods (correlation analysis, PCA, histogram) built in.

I've come to realise that Excel's data vis. is almost totally a joke, and that its value for data entry is almos

## Re:What's a Robust Replacement for Excel??? . . . (Score:4, Informative)

1) Matrix languages (e.g. Matlab, Gauss): These have C-like syntax, with the basic data object being an n x m matrix (so, internally, a scalar is a 1x1 matrix). These languages are the way to go if you want to write your own statistical/simulation algorithms. They do have extensive pre-written routines for many statistical tasks, but they're mainly for people who know that a regression coefficient vector is given by inv(X'X)X'y and aren't afraid to code that. The nice thing is that this computation takes a single line of code. I believe GNU Octave belongs to this category.

2) Data languages (SAS, SPSS): The basic object here is a dataset with variables. Inverting a data matrix here is essentially a meaningless concept, and would be extremely difficult to do, but creating a new variable that sums sales for different people by division for certain months is straightforward (note that this is very difficult in a matrix language). Beyond trivial manipulations, you'd store code in procedures like any programming language.

3) Menu-Driven languages (e.g. EViews): The basic object is still a dataset with variables, but your primary method of manipulation is menu-driven. Want to run a regression? Just select your dependent and independent variables from dropdown lists and click

There's some area of overlap between 2 and 3. 2-type programs provide a rudimentary menu-driven system for those who don't want to code everything, and 3-type languages will allow you to store some command line instructions for future use.

In terms of learning curves, they get progressively flatter (easier) from 1 to 2 to 3.

Pick your poison!
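To make the contrast between categories 1 and 2 concrete, here is a minimal sketch in plain Python, chosen only for illustration; it is none of the packages named above. `ols` computes the regression coefficients inv(X'X)X'y by solving the normal equations (X'X)b = X'y rather than forming the inverse, and `sum_by` does the dataset-style "sum sales by division" task. All function names, column names and data are made up for the example.

```python
from collections import defaultdict


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """Matrix-language style: b = inv(X'X) X'y via the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


def sum_by(rows, key, value):
    """Data-language style: total one variable within groups of another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)
```

Calling `ols([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7])` fits an intercept and a slope to four points that lie on a line, and `sum_by(rows, "division", "sales")` totals a list of dict records by their `division` field. In a real matrix language the first is the one-liner; in a real data language the second is.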

## Re:"Inverting a data matrix" (Score:2)

Not sure if this meets your definition, but I've been using SAS for boocoo years and can tell you that it has a "TRANSPOSE" facility explicitly for making columns into rows & vice versa.

## Re:What's a Robust Replacement for Excel??? . . . (Score:1, Informative)

v.Inverse[X].v

It's mostly shell-based, but the shell includes pretty formulas and graphics, histograms are not too difficult to do, and EVERYTHING in mathematica is a data structure which can be read and manipulated by the shell, includi

## And Of Course... (Score:4, Informative)

## R is cool... (Score:2)

But... ANYTHING is better than SPSS.

dave

## Manipulations (Score:5, Funny)

"What's more, R comes with a spectacular collection of functions for mathematical and statistical manipulations..."

I can see that this package will be quite popular with political campaign managers.

## So, what's the difference... (Score:2)

## Re:So, what's the difference... (Score:4, Informative)

In contrast, R is very close to Splus and comes with an extensive array of statistical toolboxes. Many professional users use, and even prefer, R for their day-to-day work.

If you are doing anything with statistics, graphs of real-world data, or bioinformatics, R is the package to use.

If you are doing other kinds of numerical work, things are less clear. Matlab is widely used, but it is hugely expensive and the language is pretty limited. Octave is the obvious open source choice, but there aren't many packages for it, and Matlab software requires some amount of porting if you want to use it with Octave. Numerical Python is technically far better than either Matlab or Octave, and it has a lot of packages and features that neither offers, but it (obviously) isn't Matlab compatible, so you can't just load existing Matlab packages into it.

## Re:So, what's the difference... (Score:4, Informative)

Anyway, what it doesn't do as well as IDL (*shrug*) is visualization. Its graphing is limited to, well, graphs. Interactive analysis with funny widgets and stuff isn't R's selling point. Nor is R very well developed for image analysis and stuff like that. I think they have multi-D Fourier transforms now, but they didn't two years ago.

IDL, OTOH, doesn't really do statistics at all. For example, it doesn't come with something as fundamental as QQ-plots. Believe it or not, every paper that comes with an assumption of normality should come with a QQ-plot... Or at least the authors should have done one.
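For anyone who has not met a QQ-plot: you sort the sample and pair each value with the quantile a normal distribution would put at that rank. If the points fall near a straight line, the normality assumption is plausible. A minimal sketch in plain Python, for illustration only: the function name is made up, and the plotting positions (i - 0.5) / n are one common convention among several, assumed here.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal QQ-plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))
```

Plot the pairs with any tool you like; only the point coordinates are computed here.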

The syntax of IDL (*shrug*) is unbelievably nasty (*shrug*, aargh, sorry, couldn't resist). I heard they have done something about it now, but two years ago, IDL's concept of scoping was at best, uhm, well, unclear. You could easily modify variables in other people's badly designed code without being aware of it. Then, the COMMON blocks you often needed to pass parameters...? I have a hard time understanding why people would actually use anything like IDL (*shrug*). R has very clear-cut lexical scoping of objects. You've got to really design your code veeery badly to fall into the same traps IDL programmers fall into on a regular basis. I've seen IDL programmers who've been in it since the beginning go WTF over scoping... It was better being a lone R user than an IDL user with a lot of support...
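For readers who have not met the term: lexical scoping means a function looks up its free variables in the environment where it was defined, not wherever it happens to be called from. A minimal sketch of the idea in Python, used only as an analogy; R's own rules differ in detail, and `make_counter` is a made-up name.

```python
count = 100  # an unrelated global that happens to share the name


def make_counter():
    count = 0  # private to this call of make_counter

    def increment():
        nonlocal count  # refers to the enclosing count, not the global one
        count += 1
        return count

    return increment
```

Each call to `make_counter()` gets its own `count`, and the global `count` is never touched, which is the kind of accidental cross-talk the scoping complaint above is about.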

Also, IDL attempted to get OO in with version 5 (IIRC), but it is a mess. OO designers would be rolling in their graves over this. R, OTOH, has decided not to incorporate all OO concepts, but the stuff they have done is very clean, very easy to understand, and perfectly sound.

But the real point of R is to have very clear mapping between code and mathematics. You code your math, it is so easy to see what happens. No iterating over array indices, it simply never happens. That's extremely appealing once you've got the hang of it.
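As a small illustration of writing the math directly instead of looping over indices, here is a sketch that standardizes a sample, z = (x - mean) / sd. It is in Python purely for illustration, and the name `standardize` is hypothetical; the equivalent R expression would be roughly `(x - mean(x)) / sd(x)`.

```python
from statistics import mean, stdev


def standardize(xs):
    """Return z-scores: each value minus the sample mean, over the sample sd."""
    m = mean(xs)
    s = stdev(xs)  # sample standard deviation (n - 1 in the denominator)
    # No index variable appears anywhere; the expression mirrors the formula.
    return [(x - m) / s for x in xs]
```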

I once translated 70 lines of MATLAB code to 7 lines of R code, some interpolation stuff that didn't exist in R. Never finished it though, because I found I didn't need it, but as a proof of concept it was great. And while MATLAB code was pretty hard to grok, the R code was very straightforward, you could just show it to anyone with basic training in math, and they would immediately see what it did. Try that with code from any of the others!

I think that the basic thing is that most numerical math for physics and astronomy is right now more advanced in IDL or MATLAB. If you do any kind of statistics, you should be going over to R. If you are willing to code, I'd argue that R is a platform so much better than IDL and MATLAB, you should be migrating your code starting now. I know I'd be writing thousands of lines of R code rather than going back to IDL (*shrug*)... :-)

Then, you know, you can't inspect the code in the core of IDL or MATLAB. There are likely to be flaws in there, and they may not have mattered for any problems other than yours.... I got hit with three bad bugs in R when I worked with it, I managed to narrow them down, and they were all corrected within hours. To me, this is extremely important. The implementation of math should be available for review just like a derivation of equations is.

## what R isn't (Score:5, Informative)

For people who have never taken real stat classes in college (or never learned it on their own) R will seem like a useless language. Most other languages can handle basic statistics computations.

Statistics is a whole lot more than means and averages. When I took my first real stat class, everything I knew about statistics was literally covered on the first half of the first page. I was totally blown away by what you could do with statistics.

R is for hardcore stat folk who know a bit about programming, not programmers who need to do a little basic computation.

## Re:what R isn't (Score:2)

The other thing is that folks need a better way of handling relations and statistical functions. Right now, you need to learn a _lot_ to do stuff that shouldn't be that hard. It's almost like folks wanted to make sure that any project of this nature would need a DBA _and_ a script developer (or team) _and_ a statistician to get work done. That really

## Re:what R isn't (Score:3, Informative)

However, you do not necessarily need to be into statistics to find R appealing. I'm an astrophysicist, and I wrote my whole thesis based on R. I started out with a bit of C, and I used some small Perl hacks to do some naive parallelizing,

## Comparison of R, Mathematica, S-plus, Matlab, etc (Score:1)

## Re:Comparison of R, Mathematica, S-plus, Matlab, e (Score:3, Informative)

## Comparison to octave? (Score:3, Interesting)

Does anyone have any insight on how this differs from octave [octave.org]?

This is the first I've heard of R, but I've tried using octave a few times. It seems to be a sort of enhanced gnuplot. I was thinking about using it for a project I'm working on, though I may just stick with good ol' C for performance.

Do any of these projects work well with sparse matrices? I'm interested in using them to run a pagerank [wikipedia.org]-like computation, but not if they use n^2 memory.

-jim
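On the memory question: a PageRank-style power iteration does not need a dense n x n matrix. Below is a rough sketch in plain Python that stores only the links that exist (an adjacency list), so memory grows with the number of edges. It illustrates the idea and says nothing about how R or Octave handle sparse matrices. The function name, the damping factor of 0.85 and the fixed iteration count are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over an adjacency list {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dict of scores that sum to roughly 1. Nothing proportional to n^2 is ever allocated.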

## Re:Comparison to octave? (Score:2, Informative)

Octave is basically an open source version of Matlab. This R looks similar just with a different programming language and different libraries.

It looks like it's probably more powerful but I don't know since I haven't used R.

## Re:Comparison to octave? (Score:2)

However, the C and FORTRAN bindings in R are excellent. So, if you're doing statistics on the stuff you find, you might want to look at doing the high-performan

## Minitab? (Score:1)

## Re:Minitab? (Score:1)