Isolation Forest Concept and Pseudocode

Anomaly Detection

Patrizia Castagno
3 min read · Feb 1, 2024

Isolation Forest (IF) is an unsupervised learning algorithm for identifying outliers. Like Random Forest, it is built from an ensemble of decision trees, but it needs no pre-defined labels.


In an Isolation Forest, randomly chosen subsets of the data are split in a tree structure, using randomly selected features and random split values. Instances that travel deeper into the tree are less likely to be anomalies, because they required more cuts to be isolated. Conversely, samples that land in shorter branches are likely to be anomalies, because the tree found it easier to separate them from the other observations.
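The link between branch depth and anomalies can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `isolation_depth` and `mean_depth`, the toy points, and the number of trees are all made up for the example.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random cuts needed to separate `point` from the rest of `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features whose values actually differ can be cut.
    splittable = [f for f in range(len(point))
                  if min(r[f] for r in data) < max(r[f] for r in data)]
    if not splittable:
        return depth
    feature = rng.choice(splittable)
    low = min(r[feature] for r in data)
    high = max(r[feature] for r in data)
    split = rng.uniform(low, high)
    # Follow the branch that `point` falls into and keep cutting there.
    side = point[feature] < split
    branch = [r for r in data if (r[feature] < split) == side]
    return isolation_depth(point, branch, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth of `point` over many random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees

# Hypothetical toy data: a tight cluster plus one point far away from it.
cluster = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
outlier = (10.0, 10.0)
points = cluster + [outlier]

print("outlier:", mean_depth(outlier, points))
print("inlier: ", mean_depth(cluster[0], points))
```

The far-away point should need fewer cuts on average than a point inside the cluster, which is the signal an Isolation Forest uses to flag anomalies.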

Suppose we have the following two-dimensional data:

| x | y |
|---|---|
| 1 | 5 |
| 3 | 8 |
| 2 | 6 |
| 10 | 12 |
| 8 | 10 |
| 15 | 20 |
| 11 | 14 |
| 14 | 18 |
| 13 | 16 |

We want to detect the anomaly in this data. To do so, we can build a decision tree:
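As a rough sketch of what one such tree might look like, the snippet below grows a single random isolation tree over the nine points from the table and prints it as indented text. The helper names `build_tree` and `render`, the nested-dict layout, and the seed are assumptions for illustration only. A real Isolation Forest would average over many such trees.

```python
import random

# The nine (x, y) points from the table.
data = [(1, 5), (3, 8), (2, 6), (10, 12), (8, 10),
        (15, 20), (11, 14), (14, 18), (13, 16)]

def build_tree(points, rng, depth=0):
    """Grow one random isolation tree and return it as a nested dict."""
    if len(points) <= 1:
        return {"leaf": points, "depth": depth}
    feature = rng.randrange(2)  # 0 -> x, 1 -> y
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    if not left or not right:
        # Degenerate cut: stop here to keep the sketch simple.
        return {"leaf": points, "depth": depth}
    return {"feature": "xy"[feature], "split": split,
            "left": build_tree(left, rng, depth + 1),
            "right": build_tree(right, rng, depth + 1)}

def render(node, indent=0):
    """Turn the nested dict into indented text lines."""
    pad = "  " * indent
    if "leaf" in node:
        return [f"{pad}leaf at depth {node['depth']}: {node['leaf']}"]
    lines = [f"{pad}{node['feature']} < {node['split']:.2f}?"]
    lines += render(node["left"], indent + 1)
    lines += render(node["right"], indent + 1)
    return lines

tree = build_tree(data, random.Random(0))
print("\n".join(render(tree)))
```

Points that end up in leaves at a small depth were isolated with few cuts. A single tree is noisy, so the depths are only meaningful once averaged over many trees built with different random cuts.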
