Dan Steinberg's Blog

CART Tree Cloning: FORCING Identical Structure Across Different Targets

CART 7.0, an integral part of Salford Systems Predictive Modeler SPM 7.0, offers a new feature that essentially clones a tree or subtree structure and imposes it on any target variable you choose. This means, for example, that you can grow a CART tree on dependent variable Y1, optionally prune the tree judgmentally, and then extract the entire structure of the tree and force it onto a new target variable Y2. The second tree will have the identical structure to the first, but it will be "about" Y2 rather than Y1.

Of course, CART users have always had the option of growing a tree on target variable Y1 to create a segmentation of the data. Predictions for any other variable, say Y2, can then be made segment by segment by simply noting the mean value of that variable within each segment, and these means can be taken as CART predictions for the other variable. So what are the advantages of the new tree cloning feature?
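For reference, the older segment-by-segment approach can be sketched in a few lines of Python. This is only an illustration, not SPM code: the tree `TREE_Y1`, the predictor names `age` and `income`, and the data are all hypothetical stand-ins for a tree already grown on Y1.

```python
from collections import defaultdict

# Hypothetical tree grown on Y1: internal nodes hold a split, leaves a segment label.
TREE_Y1 = {
    "var": "age", "cut": 40,
    "left": {"leaf": "A"},
    "right": {
        "var": "income", "cut": 50,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}

def segment(node, row):
    """Drop a record down the tree and return the label of its terminal node."""
    while "leaf" not in node:
        node = node["left"] if row[node["var"]] <= node["cut"] else node["right"]
    return node["leaf"]

def segment_means(tree, rows, target):
    """Mean of `target` within each segment defined by the tree."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        label = segment(tree, row)
        sums[label] += row[target]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

# Made-up records; y2 is the "other" variable we want segment-level predictions for.
rows = [
    {"age": 30, "income": 20, "y2": 1.0},
    {"age": 35, "income": 80, "y2": 3.0},
    {"age": 50, "income": 40, "y2": 10.0},
    {"age": 60, "income": 45, "y2": 14.0},
    {"age": 55, "income": 90, "y2": 7.0},
]

y2_by_segment = segment_means(TREE_Y1, rows, "y2")
print(y2_by_segment)
```

This yields one number per terminal node and nothing else: no node-by-node detail for Y2, which is the gap the cloning feature is meant to fill.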

First, when the tree structure extracted from a model on Y1 is imposed on Y2, we actually get a new CART tree detailing results for the new variable Y2 with complete node detail for every node. Thus, we can review competitors and surrogates for every node. In addition, the user has the option of either imposing the original surrogates as well, or allowing CART to search for and leverage new surrogates specific to Y2. The new CART tree will also truncate branches that cannot be evolved using the Y1 set of splitters, meaning that the tree is properly adapted as closely as possible to the new target Y2.
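The idea of re-deriving node detail for Y2 on a fixed set of splits, and cutting off branches that no longer make sense, can be illustrated with a small Python sketch. It is not how SPM implements cloning. In particular, the truncation rule used here (stop when a child would be empty or when Y2 is constant in the node) is an assumption made for the example, surrogates are not modeled at all, and the names `STRUCTURE`, `impose`, `x1`, and `x2` are hypothetical.

```python
# Hypothetical structure cloned from a tree on Y1; empty dicts mark terminal nodes.
STRUCTURE = {
    "var": "x1", "cut": 5,
    "left": {},
    "right": {"var": "x2", "cut": 2, "left": {}, "right": {}},
}

def impose(node, rows, target, min_cases=1):
    """Rebuild the cloned structure on `target`, recomputing statistics per node."""
    values = [r[target] for r in rows]
    out = {"n": len(values), "mean": sum(values) / len(values)}
    if "var" not in node:
        return out  # terminal node in the original structure
    left = [r for r in rows if r[node["var"]] <= node["cut"]]
    right = [r for r in rows if r[node["var"]] > node["cut"]]
    # Assumed truncation rule: drop the split if a child is too small
    # or if the new target does not vary inside this node.
    if len(left) < min_cases or len(right) < min_cases or len(set(values)) == 1:
        return out
    out.update(
        var=node["var"], cut=node["cut"],
        left=impose(node["left"], left, target, min_cases),
        right=impose(node["right"], right, target, min_cases),
    )
    return out

# Made-up data: y2 is constant on the right branch, so the x2 split is truncated.
rows = [
    {"x1": 1, "x2": 1, "y2": 0.0},
    {"x1": 2, "x2": 3, "y2": 2.0},
    {"x1": 7, "x2": 1, "y2": 5.0},
    {"x1": 8, "x2": 3, "y2": 5.0},
]

cloned = impose(STRUCTURE, rows, "y2")
print(cloned)
```

Every node of the result carries its own count and mean for Y2, and the right-hand branch ends one level earlier than in the original structure because the second split has nothing left to separate.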

Second, tree cloning allows the modeler to take the structure of a regression tree and apply it to a classification problem. This is important when using a two-stage approach to create a single tree that applies to a collection of target variables (the vector target). For a collection of binary targets, the methodology follows these steps:

Use SVD or factor analysis to find the best linear combination of the targets, e.g., the first principal vector, say V1.

Use CART 7.0 to develop a regression tree to model V1.

Clone the tree structure of the optimal or preferred tree for V1.

Impose the cloned structure on each original target, modeled as a binary classification.
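The four steps above can be sketched end to end in plain Python. This is a rough illustration under several assumptions, not the SPM procedure: the first principal direction is found by power iteration rather than a full SVD, the regression tree is a bare-bones greedy grower limited only by depth, and "imposing" the structure is reduced to reporting the share of 1s in each terminal node. All function names and the data are hypothetical.

```python
def first_component(Y, iters=200):
    """Scores of each row on the first principal direction of Y (power iteration)."""
    n, k = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(k)]
    C = [[row[j] - means[j] for j in range(k)] for row in Y]  # centered targets
    w = [1.0] * k
    for _ in range(iters):
        scores = [sum(c * wj for c, wj in zip(row, w)) for row in C]
        w = [sum(s * row[j] for s, row in zip(scores, C)) for j in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return [sum(c * wj for c, wj in zip(row, w)) for row in C]

def sse(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def grow(X, v, idx, depth):
    """Greedy regression tree on v; returns the split structure (None = leaf)."""
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            # Dropping the largest value keeps both children non-empty.
            for cut in sorted({X[i][j] for i in idx})[:-1]:
                L = [i for i in idx if X[i][j] <= cut]
                R = [i for i in idx if X[i][j] > cut]
                cost = sse([v[i] for i in L]) + sse([v[i] for i in R])
                if best is None or cost < best[0]:
                    best = (cost, j, cut, L, R)
    if best is None:
        return None
    _, j, cut, L, R = best
    return {"var": j, "cut": cut,
            "left": grow(X, v, L, depth - 1), "right": grow(X, v, R, depth - 1)}

def leaf_id(tree, x):
    """Path label of the terminal node that record x falls into."""
    path = "root"
    while tree is not None:
        go_left = x[tree["var"]] <= tree["cut"]
        path += "L" if go_left else "R"
        tree = tree["left"] if go_left else tree["right"]
    return path

def leaf_rates(tree, X, y):
    """Share of 1s per terminal node for one binary target."""
    hits, counts = {}, {}
    for x, yi in zip(X, y):
        leaf = leaf_id(tree, x)
        hits[leaf] = hits.get(leaf, 0) + yi
        counts[leaf] = counts.get(leaf, 0) + 1
    return {leaf: hits[leaf] / counts[leaf] for leaf in counts}

# Made-up data: two predictors and three correlated binary targets.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0], [8, 1]]
Y = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],
     [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 1]]

v1 = first_component(Y)                                # step 1: V1
tree = grow(X, v1, list(range(len(X))), depth=2)       # steps 2-3: tree on V1, cloned
rates = [leaf_rates(tree, X, [row[j] for row in Y])    # step 4: one pass per target
         for j in range(len(Y[0]))]
for j, r in enumerate(rates):
    print("target", j, r)
```

The same set of terminal nodes is reused for every target, so a single record lookup returns one prediction per target.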

The performance of these constrained trees can be compared to unconstrained models, both individually and as a group. If the cloned structure performs well enough, it can function as a single model designed to output several (and possibly many) predictions.


Tags: CART, Blog, Tree Cloning