# Random Forest in R

Random forests were formally introduced by Breiman in 2001. Due to their excellent performance and simple application, random forests have become an increasingly popular modeling strategy in many different research areas.

Random forests are suitable for many different modeling tasks, such as classification, regression, survival time analysis, multivariate classification and regression, multilabel classification and quantile regression.

An overview of existing random forest implementations and their speed can be found in the ranger documentation, although this list is not exhaustive and many new implementations are coming up. The performance of models built with different packages differs slightly, depending on how the random forest algorithm was implemented.

Below I present some random forest implementations in R. A good site for finding all R packages on a specific topic is Metacran.

The table below shows how often different R packages for fitting random forests were downloaded from the RStudio CRAN mirror. It was created with the help of the R package **cranlogs**.
**randomForest** is clearly the most used package in R, probably because it was the first one available, as early as April 2002.

| package | RStudio downloads in the last month |
|---|---|
| randomForest | 28671 |
| party | 13512 |
| randomForestSRC | 2134 |
| ranger | 1405 |
| Rborist | 341 |
| bigrf | 13 |
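
Download counts like these can be pulled with **cranlogs** roughly as follows. This is a minimal sketch, not the exact script behind the table: the helper `total_downloads()` is a hypothetical name introduced here, and the live query needs the package installed and internet access, so it is guarded.

```r
# Sum daily download counts per package and sort them in descending order.
# `daily` needs the columns `package` and `count`, which is the shape
# returned by cranlogs::cran_downloads().
total_downloads <- function(daily) {
  totals <- aggregate(count ~ package, data = daily, FUN = sum)
  totals[order(-totals$count), ]
}

pkgs <- c("randomForest", "party", "randomForestSRC",
          "ranger", "Rborist", "bigrf")

# Live query against the RStudio CRAN mirror logs; skipped when cranlogs
# is missing or the request fails (e.g. no network).
if (requireNamespace("cranlogs", quietly = TRUE)) {
  daily <- tryCatch(
    cranlogs::cran_downloads(packages = pkgs, when = "last-month"),
    error = function(e) NULL
  )
  if (!is.null(daily)) print(total_downloads(daily))
}
```

The numbers will differ from the table above, since they depend on the month in which the query is run.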

So which package should you use? That depends on the statistical problem.

In the classical classification or regression case you have many options. For big datasets, use **ranger** or **Rborist**, which are much faster, or **randomForest.ddR**, an extension of
randomForest. Wright (the author of **ranger**) recommends **Rborist** for
low-dimensional data with large sample sizes (n > 25,000) and **ranger** in all other cases.
The core of **ranger** is written in C++ and connected to R via the R package **Rcpp**; it generally produces results equivalent to those of **randomForest**.
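
To get a feeling for how similar the two packages behave, one can fit both on a small dataset and compare the out-of-bag error. The following is only an illustrative sketch on `iris` (the dataset, seed and number of trees are my choices, not from any benchmark); each fit is skipped if the package is not installed.

```r
# Out-of-bag (OOB) misclassification rate of each package on iris.
# Entries stay NA when the corresponding package is not installed.
oob <- c(ranger = NA_real_, randomForest = NA_real_)

if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(42)
  fit_ranger <- ranger::ranger(Species ~ ., data = iris, num.trees = 500)
  oob["ranger"] <- fit_ranger$prediction.error
}

if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)
  fit_rf <- randomForest::randomForest(Species ~ ., data = iris, ntree = 500)
  # err.rate has one row per tree; the last row covers the whole forest
  oob["randomForest"] <- fit_rf$err.rate[fit_rf$ntree, "OOB"]
}

print(oob)
```

Because both forests are random, the two error rates should be close but need not be identical.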

## Multivariate Classification and Regression

Multivariate classification and regression with random forests can be modelled with **randomForestSRC**.
The multivariate classification case in **randomForestSRC** is used by the **mlr** package to perform multilabel classification with random forests; see the mlr tutorial for more information.
Moreover, many hyperparameters, such as the splitting rule, can be set individually in **randomForestSRC**.
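
A multivariate regression forest in **randomForestSRC** might look like the sketch below. The choice of `mtcars` with `mpg` and `hp` as joint responses is only an assumption for illustration, and the code is skipped if the package is not installed.

```r
mv_fit <- NULL

if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(randomForestSRC)
  set.seed(1)
  # Multivar() on the left-hand side declares several responses at once;
  # all remaining columns of mtcars are used as predictors.
  mv_fit <- rfsrc(Multivar(mpg, hp) ~ ., data = mtcars, ntree = 200)
  print(mv_fit)
}
```

Hyperparameters such as the splitting rule are passed as further arguments to `rfsrc()` (for example `splitrule`, `mtry` or `nodesize`); the package documentation lists the rules available for each model family.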

## Survival Analysis

If you want a random forest for survival analysis, **ranger** or **randomForestSRC** can be used.
In addition, **party** contains the function **cforest**, which implements the random forest and bagging ensemble algorithms using
conditional inference trees as base learners.
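
As a rough sketch of the survival case with **ranger**, the following fits a forest on the `veteran` lung cancer data that ships with the **survival** package. The dataset and settings are my choice for illustration; the code is skipped if either package is missing.

```r
surv_fit <- NULL

if (requireNamespace("ranger", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  library(survival)  # provides Surv() and the veteran data
  vet <- survival::veteran
  set.seed(1)
  # A Surv() response makes ranger grow a survival forest
  surv_fit <- ranger::ranger(Surv(time, status) ~ ., data = vet,
                             num.trees = 200)
  # Estimated survival curves: one row per patient,
  # one column per unique death time
  print(dim(surv_fit$survival))
  print(head(surv_fit$unique.death.times))
}
```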

## Quantile Regression

For quantile regression you can use the package **quantregForest**, which is
based on the **randomForest** package. This implementation can also be used to estimate conditional
densities and conditional probability distributions.
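
A minimal sketch of quantile prediction with **quantregForest** is shown below. It assumes the built-in `airquality` data with `Ozone` as the response, which is just an example choice, and it is skipped if the package is not installed.

```r
q_pred <- NULL

if (requireNamespace("quantregForest", quietly = TRUE)) {
  library(quantregForest)  # also attaches randomForest
  aq <- na.omit(airquality)
  x <- aq[, c("Solar.R", "Wind", "Temp", "Month", "Day")]
  y <- aq$Ozone
  set.seed(1)
  qrf <- quantregForest(x = x, y = y, ntree = 200)
  # `what` selects the conditional quantiles to return:
  # here the 10%, 50% and 90% quantiles for the first five rows
  q_pred <- predict(qrf, newdata = x[1:5, ], what = c(0.1, 0.5, 0.9))
  print(q_pred)
}
```

The gap between the 10% and 90% columns gives a rough 80% prediction interval for each observation.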

## More

Many more packages exist that offer new algorithms based on random forests (**RRF**, **roughrf**, **icRSF** (for survival), **wsrf**, **iRafnet**, **randomUniformForest**, **fuzzyforest**), variable selection (**varSelRF**, **VSURF**, **RFgroove**, **AUCRF**), visualisations (**ggRandomForests**, **forestFloor**) or imputation (**missForest**, **imputeMissings**) with random forests. For binary data, **LogicForest** provides a forest of logic regression trees.
**REEMtree** is useful for longitudinal studies in which random effects are present.

These implementations can be found easily by a quick search on Metacran.