Isolation Forest Concept and Pseudocode
Anomaly Detection
Isolation Forest (IF) is an unsupervised learning algorithm for identifying outliers. It is similar to Random Forest in that it is built from decision trees, but it does not need pre-defined labels.
In an Isolation Forest, each tree is built from a randomly chosen subset of the data, splitting on randomly selected features at randomly chosen values. Instances that end up deeper in a tree are less likely to be anomalies, because they required more cuts to be isolated. Conversely, samples that land in shorter branches are likely anomalies, because the tree needed fewer cuts to separate them from the other observations.
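To make the link between depth and anomaly concrete, here is a minimal pure-Python sketch, not a full Isolation Forest implementation. It counts how many random cuts are needed to isolate a point and averages that count over many random subsamples. The function names `path_length` and `average_path_length`, the parameters `n_trees`, `sample_size` and `max_depth`, and the small data set are all assumptions made for illustration. As a further simplification, the point being scored is always added to each subsample.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=10):
    """Count the random cuts needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features whose values differ among the remaining rows can be split.
    splittable = [
        f for f in range(len(point))
        if min(row[f] for row in data) != max(row[f] for row in data)
    ]
    if not splittable:  # all remaining rows are identical
        return depth
    feature = rng.choice(splittable)
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    split = rng.uniform(lo, hi)
    # Follow only the side of the cut that contains `point`.
    if point[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)


def average_path_length(point, data, n_trees=200, sample_size=6, seed=0):
    """Average isolation depth of `point` over many random subsamples."""
    rng = random.Random(seed)
    others = [row for row in data if row != point]
    k = min(sample_size, len(others))
    total = 0
    for _ in range(n_trees):
        # Simplification: the scored point is always part of the subsample.
        sample = rng.sample(others, k) + [point]
        total += path_length(point, sample, rng)
    return total / n_trees


# Hypothetical data: a tight cluster plus one point far away from it.
points = [(1, 5), (2, 6), (3, 8), (2, 7), (1, 6), (3, 6), (2, 5), (3, 7), (50, 60)]
outlier = (50, 60)
inlier = (2, 6)

print("outlier:", average_path_length(outlier, points))
print("inlier: ", average_path_length(inlier, points))
```

In this sketch the far-away point should get a noticeably smaller average depth than the point inside the cluster, which is the signal an Isolation Forest uses to flag anomalies.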
Suppose we have the following two-dimensional data:
| x | y |
|-------|-------|
| 1 | 5 |
| 3 | 8 |
| 2 | 6 |
| 10 | 12 |
| 8 | 10 |
| 15 | 20 |
| 11 | 14 |
| 14 | 18 |
| 13 | 16 |
We want to detect the anomaly in this data. To do that, we can build a decision tree: