Non-GUI (command line) SPM includes a number of command line options, offering the user some additional flexibility. These can be displayed by launching SPM with the -h flag.
Dan Steinberg's Blog
Users of cross validation (CV) in CART, MARS, and TreeNet have become accustomed to simply requesting this testing method when setting up a predictive model and allowing the software to take care of the details. Of course, the Salford software prepares the data automatically and uses stratified sampling to randomly assign each record to a CV bin. The user has no direct control over how the bins are constructed or managed.
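To make the idea of stratified bin assignment concrete, here is a minimal sketch in plain Python. This is purely illustrative of the general technique, not SPM's internal implementation; the function name and round-robin scheme are my own.

```python
import random
from collections import defaultdict

def assign_cv_bins(labels, n_bins=10, seed=0):
    """Stratified assignment: shuffle records within each class,
    then deal them round-robin into bins so that every bin
    preserves the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    bins = [0] * len(labels)
    offset = 0
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            bins[idx] = (i + offset) % n_bins
        offset += len(idxs)
    return bins

# 60% "yes" / 40% "no" sample: each of the 10 bins ends up
# with 6 "yes" and 4 "no" records.
labels = ["yes"] * 60 + ["no"] * 40
bins = assign_cv_bins(labels, n_bins=10)
```

With simple random (unstratified) assignment, small bins can end up with badly skewed class mixes; stratification removes that source of noise from the CV error estimates.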
This is a very short post, but it could be very useful for the expert user. As you know, Salford data mining and predictive modeling tools produce both pretty GUI (Graphical User Interface) displays of the results and classical, plain text output. The plain text shows up in the “Classic Output” window, and it can occasionally be very convenient to issue commands directly from this window. But you need to get to the bottom of the window and activate the command prompt. To do this instantly, no matter where your cursor is positioned in the Classic Output window, simply click (perhaps repeatedly) on the “Command Line” icon on the toolbar; as soon as the command prompt appears at the bottom of the window, you can type commands directly.
Many have asked if RandomForests (RF) supports regression analysis.
The short answer is: not with the current implementation. Salford Systems plans to support RF regression in our next release.
That said, if you have been thinking about RF regression we urge you to consider using TreeNet regression instead. Some reasons follow:
CART users often ask where they can find the value of the R-Squared for their regression trees. The answer is simple. In conventional statistics,
R-Squared = 1 - SSE/SST, (1)
where SSE is the sum of squared errors of the actual data around the model predictions, and SST, the total sum of squares, is the sum of squared deviations of the dependent variable around its mean. In traditional statistics R-Squared is always calculated using the training data (LEARN SET). CART users can read the R-Squared directly from the output:
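As a quick sanity check, equation (1) is easy to compute by hand. Below is a minimal sketch in plain Python; the toy numbers are illustrative only, not actual CART output.

```python
def r_squared(actual, predicted):
    """R-Squared = 1 - SSE/SST, computed on the learn sample."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y, p) for y, p in ()) if False else \
          sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sse / sst

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]  # e.g. node means from a regression tree
print(r_squared(actual, predicted))  # → 0.975
```

Here SSE = 0.5 and SST = 20, so R-Squared = 1 - 0.5/20 = 0.975, exactly as in equation (1).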
CART (Classification and Regression Trees) was originally developed by Breiman, Friedman, Olshen, and Stone to construct data-driven solutions to a predictive modeling problem. The essence of the technology is recursive partitioning, where the original dataset is progressively split into mutually exclusive regions using a series of binary splits. The resulting solution is presented in the form of a binary tree with key variables shown at each node in a tree.
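The core of recursive partitioning is a search for the best binary split. The sketch below shows the idea for a single numeric predictor with a least-squares criterion; it is a simplified illustration, not CART's actual implementation, which also handles categorical splitters, priors, costs, and surrogate splits.

```python
def best_split(x, y):
    """Scan candidate binary split points on one numeric predictor
    and return the threshold minimizing total within-node SSE."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(x, y))
    best_sse, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_sse = total
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold

x = [1, 2, 3, 10, 11, 12]
y = [5, 5, 5, 20, 20, 20]
print(best_split(x, y))  # → 6.5, splitting cleanly between the two groups
```

In a full tree-growing procedure, this search runs over every available predictor at every node, and the winning split divides the node's data into two child regions, which are then split again recursively.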
We are often asked “Which analytical technology is best for my problem?” This topic not only comes up in practical, day-to-day modeling, but has also been the subject of a few (largely disappointing) academic studies. The short answer we usually give is that for many modeling problems it doesn’t require much time to run several different analyses, so why not rely on experimentation rather than some rule of thumb? If one method stands out for any reason, such as accuracy or intuitive attractiveness of the model, then you have your answer.
Did you know you can easily build a family of CART models with the BATTERY feature? It's true! BATTERY is one of the most powerful aspects of the Salford Predictive Modeling Suite (SPM). For instance, suppose you wish to consider how the size of your CART tree affects the tree's predictive accuracy. You might build a series of individual trees yourself, or you can let BATTERY do it for you. Four batteries -- ATOM, MINCHILD, DEPTH and NODES -- work in similar ways by varying the allowable size of the atom, minchild, tree depth and the number of nodes permitted in the maximal tree. These controls constrain how large your CART tree is permitted to grow. Because they are tree-oriented controls, they work with TreeNet and RandomForests models too. For example, by issuing just the following simple series of commands you will find yourself with eight CART trees, which you can easily compare against one another to find a tradeoff between predictive accuracy and tree complexity that works best for you: