Incremental decision tree
Incremental decision tree algorithms are online machine learning methods that update an existing decision tree as new data instances arrive, without reprocessing the instances already seen. This approach is useful when the full dataset is not retained, is too large to process at once, or changes over time. Applications include online learning, data streams, concept drift scenarios, hierarchical data modeling, and systems requiring interpretable outputs.
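The per-instance update pattern can be sketched with a toy one-level tree (a decision stump). This is a hypothetical illustration, not any of the published algorithms: it splits on a single fixed feature and keeps per-branch class counts that are updated one instance at a time, so earlier instances never need to be revisited.

```python
from collections import Counter, defaultdict

class IncrementalStump:
    """Toy one-level decision tree (illustrative only): splits on a fixed
    feature and maintains per-branch class counts incrementally."""

    def __init__(self, feature):
        self.feature = feature
        self.counts = defaultdict(Counter)  # branch value -> class-label counts

    def learn_one(self, x, y):
        # Incorporate one new instance; previously seen data is not reprocessed.
        self.counts[x[self.feature]][y] += 1

    def predict_one(self, x):
        # Predict the majority class seen so far on the matching branch.
        branch = self.counts.get(x[self.feature])
        if not branch:
            return None
        return branch.most_common(1)[0][0]

stump = IncrementalStump("outlook")
stream = [({"outlook": "sunny"}, "no"),
          ({"outlook": "rain"}, "yes"),
          ({"outlook": "sunny"}, "no")]
for x, y in stream:          # instances arrive one at a time
    stump.learn_one(x, y)

print(stump.predict_one({"outlook": "sunny"}))  # -> no
```

Full incremental tree learners generalize this idea: they keep sufficient statistics at every node and restructure the tree when the statistics justify a different split.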
The CART family includes incremental CART (1989), which adds incremental updating to the non-incremental CART algorithm for classification and regression. The ID3/C4.5 family encompasses several methods: ID3' updates trees by rebuilding them from scratch after each new instance; ID4 incorporates data incrementally but discards subtrees when the split test at a node changes, limiting the concepts it can learn; ID5 avoids discarding subtrees but does not guarantee the same tree as batch learning; ID5R restructures subtrees recursively and does guarantee consistency with the batch output; ITI (1997) is an efficient successor that handles numeric variables, multiclass tasks, and missing values, producing the same tree regardless of instance order or whether training is incremental or batch.
Other notable systems include VFDT (2000), which uses the Hoeffding bound to decide how many stream instances suffice before committing to a split, and CVFDT (2001), which adapts to concept drift via a sliding window. VFDTc extends this with continuous-attribute handling and Naive Bayes classifiers at the leaves. OLIN and IOLIN are based on Info-Fuzzy Networks, while GAENARI is another incremental decision tree implementation.
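VFDT's split criterion can be sketched directly from the Hoeffding inequality: after n observations of a variable with range R, the observed mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability 1 − δ. A minimal sketch, with toy gain values chosen for illustration:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples of a variable with the given range lies within epsilon of the
    true mean."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# VFDT commits to a split once the observed gap between the best and
# second-best attribute's gain exceeds epsilon (numbers below are toy values).
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=5000)
best_gain, second_gain = 0.32, 0.25
if best_gain - second_gain > eps:
    print("split on best attribute")  # gap 0.07 exceeds eps (~0.04)
```

Because ε shrinks as n grows, a node that has seen enough instances can split with high confidence while examining each instance only once, which is what makes VFDT suitable for high-volume streams.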
These algorithms trade learning cost against quality along several axes: the cost of incorporating a single new instance, the speed of convergence to a good tree, the total cost of learning, and whether the resulting tree is consistent with what batch learning would produce.