Update on Random Forest Package Downloads
I have updated the code from a previous post in which I analysed the download statistics of different random forest packages in R; the code is at the bottom of this article. I calculated the number of CRAN downloads in March 2016 and in March 2017.
Standard random forest
The number of downloads of different packages containing the random forest algorithm in March 2016 and in March 2017:
package | March 2016 | March 2017 | ratio 2017/2016 |
---|---|---|---|
randomForest | 29360 | 55415 | 1.89 |
xgboost | 4929 | 12629 | 2.56 |
randomForestSRC | 2559 | 6106 | 2.39 |
ranger | 1482 | 5622 | 3.79 |
Rborist | 298 | 441 | 1.48 |
All packages that contain the standard random forest show a clear increase in downloads. The ranger package, which can run the random forest in parallel on a single machine, gained the most in popularity, with a factor of 3.79. xgboost and randomForestSRC also grew faster than the standard randomForest package.
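The ratio column is simply the March 2017 count divided by the March 2016 count, rounded to two decimals. As a minimal sketch, the figures from the table above can be re-entered by hand and ranked by growth. The data frame name `std` and its column names are illustrative and not part of the original script.

```r
# Counts copied by hand from the table above; `std` is an illustrative name.
std <- data.frame(
  package = c("randomForest", "xgboost", "randomForestSRC", "ranger", "Rborist"),
  mar2016 = c(29360, 4929, 2559, 1482, 298),
  mar2017 = c(55415, 12629, 6106, 5622, 441)
)

# Growth factor: downloads in March 2017 relative to March 2016.
std$ratio <- round(std$mar2017 / std$mar2016, 2)

# Rank packages by growth rather than by absolute downloads.
std_by_growth <- std[order(std$ratio, decreasing = TRUE), ]
print(std_by_growth, row.names = FALSE)
```

Sorted this way, ranger comes first and Rborist last, although randomForest remains far ahead in absolute numbers.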
Random forest for big datasets
The results of the random forest packages for big datasets:
package | March 2016 | March 2017 | ratio 2017/2016 |
---|---|---|---|
h2o | 3719 | 9666 | 2.60 |
ParallelForest | 281 | 279 | 0.99 |
bigrf | 10 | 3 | 0.30 |
h2o achieved the biggest increase in CRAN downloads, with a factor of 2.60. The other two packages remain largely unknown to most R users.
Packages with algorithms similar to random forest
Lastly, the results for the other R packages that do not contain the standard random forest algorithm but a similar one:
package | March 2016 | March 2017 | ratio 2017/2016 |
---|---|---|---|
rpart | 22769 | 30552 | 1.34 |
party | 15423 | 32888 | 2.13 |
extraTrees | 1446 | 1112 | 0.77 |
RRF | 525 | 1153 | 2.20 |
rFerns | 450 | 488 | 1.08 |
rotationForest | 407 | 342 | 0.84 |
obliqueRF | 267 | 255 | 0.96 |
wsrf | 248 | 435 | 1.75 |
randomUniformForest | 198 | 179 | 0.90 |
trimTrees | 148 | 96 | 0.65 |
roughrf | 137 | 94 | 0.69 |
party and RRF achieved the largest increases in downloads. The rpart package, which simply constructs a single tree, still has high download numbers but grew less strongly. extraTrees dropped somewhat in the download statistics and is not getting as much attention as it did last year.
Conclusion
ranger, xgboost and party are the biggest winners of the last year. Overall, the downloads have approximately doubled, which indicates that random forest is becoming more and more popular in the R community.
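The "approximately doubled" statement can be checked against the three tables. The sketch below re-enters the published counts by hand (the vector names are illustrative) and compares the totals over all 19 packages. It uses only the numbers shown in this article and makes no new CRAN query.

```r
# March 2016 and March 2017 counts, copied from the three tables above
# in table order (standard, big datasets, similar algorithms).
mar2016 <- c(29360, 4929, 2559, 1482, 298,
             3719, 281, 10,
             22769, 15423, 1446, 525, 450, 407, 267, 248, 198, 148, 137)
mar2017 <- c(55415, 12629, 6106, 5622, 441,
             9666, 279, 3,
             30552, 32888, 1112, 1153, 488, 342, 255, 435, 179, 96, 94)

# Overall growth factor: total 2017 downloads over total 2016 downloads.
overall_ratio <- sum(mar2017) / sum(mar2016)
round(overall_ratio, 2)
```

With these figures the overall factor comes out at about 1.86, which is close to a doubling.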
R-code for obtaining the tables:
library(cranlogs)
library(data.table)
library(knitr)
# Standard random forest packages: total downloads per package in March 2016
downloads = cran_downloads(packages = c("randomForest", "xgboost", "randomForestSRC", "ranger", "Rborist"), from = "2016-03-01", to = "2016-03-31")
downloads = data.table(downloads)
downloads = downloads[, sum(count), by = "package"]
# Same packages, total downloads per package in March 2017
downloads_new = cran_downloads(packages = c("randomForest", "xgboost", "randomForestSRC", "ranger", "Rborist"), from = "2017-03-01", to = "2017-03-31")
downloads_new = data.table(downloads_new)
downloads_new = downloads_new[, sum(count), by = "package"]
# Combine both months and add the 2017/2016 ratio
# (cbind assumes both queries return the packages in the same order)
downloads = cbind(downloads, downloads_new$V1, round(downloads_new$V1/downloads$V1, 2))
colnames(downloads) = c("**package**", "**--march 2016**", "**--march 2017**", "**--ratio of 2017/2016**")
kable(downloads, format = "markdown", padding = 2)
# Bar plot of the March 2016 downloads
barplot(unlist(downloads[,2]), names.arg = unlist(downloads[,1]), col = "blue")
# Random forest packages for big datasets: March 2016
downloads = cran_downloads(packages = c("h2o", "ParallelForest", "bigrf"), from = "2016-03-01", to = "2016-03-31")
downloads = data.table(downloads)
downloads = downloads[, sum(count), by = "package"]
# Same packages: March 2017
downloads_new = cran_downloads(packages = c("h2o", "ParallelForest", "bigrf"), from = "2017-03-01", to = "2017-03-31")
downloads_new = data.table(downloads_new)
downloads_new = downloads_new[, sum(count), by = "package"]
# Combine both months and add the 2017/2016 ratio
downloads = cbind(downloads, downloads_new$V1, round(downloads_new$V1/downloads$V1, 2))
colnames(downloads) = c("**package**", "**--march 2016**", "**--march 2017**", "**--ratio of 2017/2016**")
kable(downloads, format = "markdown", padding = 2)
# Packages with algorithms similar to random forest: March 2016
downloads = cran_downloads(packages = c("rpart", "RRF", "obliqueRF", "rotationForest",
"rFerns", "randomUniformForest", "wsrf", "roughrf", "trimTrees", "extraTrees", "party"), from = "2016-03-01", to = "2016-03-31")
downloads = data.table(downloads)
downloads = downloads[, sum(count), by = "package"]
# Sort by the March 2016 downloads, largest first
ordering = order(downloads$V1, decreasing = TRUE)
downloads = downloads[ordering,]
# Same packages: March 2017
downloads_new = cran_downloads(packages = c("rpart", "RRF", "obliqueRF", "rotationForest",
"rFerns", "randomUniformForest", "wsrf", "roughrf", "trimTrees", "extraTrees", "party"), from = "2017-03-01", to = "2017-03-31")
downloads_new = data.table(downloads_new)
downloads_new = downloads_new[, sum(count), by = "package"]
# Align the 2017 rows with the sorted 2016 rows by package name
downloads_new = downloads_new[match(downloads$package, downloads_new$package),]
# Combine both months and add the 2017/2016 ratio
downloads = cbind(downloads, downloads_new$V1, round(downloads_new$V1/downloads$V1, 2))
colnames(downloads) = c("**package**", "**--march 2016**", "**--march 2017**", "**--ratio of 2017/2016**")
kable(downloads, format = "markdown", padding = 2)
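
The three blocks above repeat the same steps with different package lists. One possible way to factor this out is sketched below. The helper names `monthly_totals()` and `compare_downloads()` are hypothetical and not part of cranlogs; the only cranlogs call, shown in the commented usage, is `cran_downloads()` with the same `packages`, `from` and `to` arguments as above. Joining the two months with `merge()` by package name, rather than with `cbind()`, avoids relying on both queries returning the packages in the same order.

```r
# Sum the daily counts per package. `daily` is expected to have the
# columns `package` and `count`, as returned by cran_downloads().
monthly_totals <- function(daily) {
  aggregate(count ~ package, data = daily, FUN = sum)
}

# Join two sets of daily counts by package name, add the ratio column
# and sort by the downloads of the earlier period (largest first).
compare_downloads <- function(daily_old, daily_new) {
  tab <- merge(monthly_totals(daily_old), monthly_totals(daily_new),
               by = "package", suffixes = c("_old", "_new"))
  tab$ratio <- round(tab$count_new / tab$count_old, 2)
  tab[order(tab$count_old, decreasing = TRUE), ]
}

# Example usage (requires internet access to query the CRAN logs):
# pkgs <- c("h2o", "ParallelForest", "bigrf")
# old  <- cranlogs::cran_downloads(packages = pkgs, from = "2016-03-01", to = "2016-03-31")
# new  <- cranlogs::cran_downloads(packages = pkgs, from = "2017-03-01", to = "2017-03-31")
# knitr::kable(compare_downloads(old, new), format = "markdown", row.names = FALSE)
```

Keeping the aggregation separate from the download step means the helpers can be tried on any small data frame with `package` and `count` columns before querying CRAN.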