Learn how to crunch big data with R

Get started using the open-source R programming language to do statistical computing and graphics on large data sets

Figure 12. The faithful$eruptions data used in this example is from the Old Faithful geyser data built into the R Datasets package.

The power of R

As we’ve seen, R is a useful tool for data scientists and statisticians. Its somewhat nonstandard scripting language will also interest programmers who might otherwise resort to Python (with NumPy, Pandas, and StatsModels), SQL (for data held in a database), or SAS (and its GUI derivative, JMP) for their data analysis. Compared to Excel, R has considerably more statistical and graphing power, especially if you add packages for your particular needs. It is also much more auditable: it’s far easier to validate an R script than a spreadsheet full of formulas.

With the addition of RStudio as an IDE, developing R applications can be quite productive. RStudio Server allows companies to take advantage of the huge RAM and many processors available in big server hardware, Shiny turns R into a Web application server, and R Markdown allows you to use R for reports.

On the other hand, the great power of R and the large number of R packages available can make for a fairly intimidating learning curve. It helps a lot to have some statistics background when learning and using R, but that’s true for all data science. As with any programming language that has many libraries available, your best strategy for learning R is to take it one step at a time.

This story, "Learn how to crunch big data with R," was originally published by InfoWorld.

Copyright © 2015 IDG Communications, Inc.
