What Is the Random Forest Algorithm?

Byte Size Data Science

Apr 24, 2025

Definition

Random Forest is one of the most popular bagging algorithms, and it uses decision trees as its base learners. What distinguishes it from other bagging algorithms is that it adds feature randomness on top of bootstrap sampling. When building each tree, the algorithm not only bootstraps the training data but also considers only a random subset of features at each split. This produces diverse, weakly correlated trees, each capturing different aspects of the data. Random Forest can also support feature importance measurement, outlier detection, and missing-data handling (how much of this is available depends on the implementation), which makes it one of the most versatile ensemble methods.
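
The two sources of randomness described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not how any particular library implements it: the dataset dimensions, the seed, and the variable names are assumptions made for the example, and the square-root rule for the number of candidate features is a common default rather than a requirement.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical toy dataset dimensions (not from the article).
n_samples, n_features = 10, 9

# Bootstrap sampling: draw n_samples row indices WITH replacement.
# Each tree in the forest would get its own draw like this one.
bootstrap_rows = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are "out-of-bag" for this tree.
oob_rows = np.setdiff1d(np.arange(n_samples), bootstrap_rows)

# Feature randomness: at each split, only a random subset of the
# features is considered. sqrt(n_features) is a common choice.
n_candidates = int(np.sqrt(n_features))
split_features = rng.choice(n_features, size=n_candidates, replace=False)

print("bootstrap rows:", bootstrap_rows)
print("out-of-bag rows:", oob_rows)
print("features considered at this split:", split_features)
```

Repeating the row draw once per tree and the feature draw once per split is what makes the trees differ from one another.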

Implementation

To build a random forest classifier, import RandomForestClassifier from scikit-learn and adjust the model settings with parameters such as:

n_estimators: the number of decision trees in the random forest
oob_score: whether to use out-of-bag samples to estimate the model's accuracy
max_depth: the maximum depth of each decision tree. Limiting max_depth can help prevent overfitting.

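from sklearn.ensemble import RandomForestClassifier

# X_train, y_train and X_test are assumed to be prepared already
# (for example with sklearn.model_selection.train_test_split).
rf = RandomForestClassifier(
    n_estimators=20,   # number of trees in the forest
    max_depth=20,      # cap on the depth of each tree
    oob_score=True,    # also score the model on out-of-bag samples
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)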

RandomForestClassifier exposes useful attributes that support model interpretation: feature_importances_ gives the importance of each input feature, and oob_score_ (available when oob_score=True) reports the model's accuracy on out-of-bag samples.
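
As a rough illustration of reading these two attributes, here is a minimal sketch that fits a forest on synthetic data. The dataset from make_classification, the parameter values, and the names rf_demo, importances and oob_accuracy are assumptions made for this example and are not part of the original walkthrough.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=0
)

rf_demo = RandomForestClassifier(
    n_estimators=100,
    oob_score=True,    # required for oob_score_ to be available
    random_state=0,
)
rf_demo.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = rf_demo.feature_importances_

# Accuracy estimated on the samples each tree did not see in training.
oob_accuracy = rf_demo.oob_score_

# Print features from most to least important.
for idx in importances.argsort()[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
print(f"out-of-bag accuracy: {oob_accuracy:.3f}")
```

Because the out-of-bag estimate comes from rows each tree never trained on, it can serve as a quick check on generalization without setting aside a separate validation set.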
