Random forests explained intuitively

One quick example I use very frequently to explain how random forests work is the way a company holds multiple rounds of interviews to hire a candidate. The reason we have a panel of interviewers is that we assume a committee of people generally makes better decisions than a single individual. I hope you get the gist.

If you have heard about decision trees, then you are not very far from understanding what random forests are. A random forest is a collection of many decision trees. Instead of relying on a single decision tree, you build many decision trees, say 100 of them. And you know what a collection of trees is called: a forest. So you now understand why it is called a forest.

Why is it called random then? Say our dataset has 1,000 rows and 30 columns. There are two levels of randomness in this algorithm:

At the row level: each decision tree gets a random sample of the training data (say 10%), i.e. each tree is trained on roughly 100 of the 1,000 rows rather than on the full dataset.

At the column level: each tree considers only a random subset of the 30 columns when it looks for a split, rather than all of them.

Let us try to understand other aspects of this algorithm.

When is a random forest a poor choice relative to other algorithms?

Random forests don't train well on smaller datasets, as they fail to pick up on the patterns.

You get variable importance, but this may not suffice in many analyses of interest where the objective is to understand the relationship between the response and the independent features.

The time taken to train a random forest can be large, since you are training multiple decision trees.

Unlike linear regression, decision trees, and hence random forests, can't predict values outside the range seen in the training data.

What are the advantages of using a random forest?

Since we are using multiple decision trees, the bias remains the same as that of a single decision tree, while averaging over many trees reduces the variance and hence the risk of overfitting.
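To make the two levels of randomness concrete, here is a minimal sketch using scikit-learn (the library choice and the parameter values are my assumptions, not something specified in the text): 100 trees, each trained on about 10% of the rows, each split considering only a random subset of the columns.

```python
# Minimal sketch, assuming scikit-learn; mirrors the "1,000 rows, 30 columns" example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the dataset described in the text.
X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,      # "say 100 of them": the number of decision trees
    bootstrap=True,
    max_samples=0.1,       # row-level randomness: each tree sees ~10% of the rows
    max_features="sqrt",   # column-level randomness: a random subset of features per split
    random_state=42,
)
forest.fit(X, y)

# Variable importance is available, but it only ranks features; it does not
# describe the form of the relationship between response and features.
print(forest.feature_importances_[:5])
```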
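The extrapolation point can also be shown with a small, assumed example (not from the original post): a random forest regressor cannot predict beyond the range of targets it saw in training, whereas linear regression can.

```python
# Sketch of the extrapolation limitation, assuming scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

X_train = np.arange(0, 10, 0.1).reshape(-1, 1)   # x in [0, 10)
y_train = 2 * X_train.ravel()                    # simple linear trend, y up to ~19.8

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

X_new = np.array([[20.0]])                       # well outside the training range
print(forest.predict(X_new))   # stays near max(y_train), about 19.8
print(linear.predict(X_new))   # extrapolates to about 40
```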
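Finally, a rough sketch (again an assumed example, not from the post) of the bias/variance argument: averaging many trees keeps the bias of a single tree but reduces variance, which typically shows up as better cross-validated accuracy for the forest than for one tree.

```python
# Compare a single decision tree against a random forest, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)

print("single tree:", tree_scores.mean())
print("random forest:", forest_scores.mean())
```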
