Dan Steinberg's Blog

SPM Handling of Categorical Variables

When a variable is declared categorical in SPM, each distinct value in the dataset is treated as a separate category. This can cause confusion if some of the values of a numeric categorical variable are very close together. Consider, for example, a variable Y with values 0, .9996, .9998, .9999, and 1.0. If Y is declared categorical and the default display precision of three decimal places is used, the four values near 1 all display as 1, and the target frequency table might look something like this:

Variable: Y
N Classes: 5
Data Value                   N      %     Wgt Count      %
----------------------------------------------------------
0                        60243  58.81         60243  58.81
1                            1   0.00             1   0.00
1                            1   0.00             1   0.00
1                         1576   1.54          1576   1.54
1                        40621  39.65         40621  39.65
----------------------------------------------------------
Total                   102442               102442
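
The effect in the table above can be reproduced outside SPM. The following is a minimal Python sketch, not SPM code: the helper `display_label` is a hypothetical stand-in for SPM's display formatting, assumed here to round to three decimal places and drop trailing zeros. It shows five distinct stored values collapsing to two distinct labels on screen.

```python
# Values of Y from the example above.
values = [0.0, 0.9996, 0.9998, 0.9999, 1.0]

def display_label(value, decimals=3):
    """Hypothetical stand-in for the display formatting: round to a fixed
    number of decimals, then drop trailing zeros and a dangling point."""
    text = f"{value:.{decimals}f}"
    return text.rstrip("0").rstrip(".")

labels = [display_label(v) for v in values]

# The data still holds five distinct values (five categories),
# even though only two distinct labels are visible.
distinct_values = len(set(values))
distinct_labels = len(set(labels))

print(labels)
print("distinct values:", distinct_values)
print("distinct labels:", distinct_labels)
```

The count of distinct stored values, not the count of distinct labels, is what determines the number of classes.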

If, as is likely, you were expecting two classes instead of five, this will be highly disconcerting and can cause major problems, because SPM will treat each value as a separate class. If what you really want is two classes, you can correct this by rounding the values to the nearest integer. This can be done in Salford BASIC with the following command:

%let y=int(y+.5)

After the transformation, the table will look like this:

Variable: Y
N Classes: 2
Data Value                   N      %     Wgt Count      %
----------------------------------------------------------
0                        60243  58.81         60243  58.81
1                        42199  41.19         42199  41.19
----------------------------------------------------------
Total                   102442               102442
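
The arithmetic behind the collapsed table can be checked with a short Python sketch. This is an illustration, not Salford BASIC: `round_half_up` is a hypothetical helper that mirrors `int(y+.5)` for non-negative values, and pairing each count with a value assumes the rows of the first table are listed in ascending order of Y.

```python
import math
from collections import Counter

# Counts copied from the first table; the value-to-count pairing
# assumes the rows appear in ascending order of Y.
counts_by_value = {
    0.0: 60243,
    0.9996: 1,
    0.9998: 1,
    0.9999: 1576,
    1.0: 40621,
}

def round_half_up(y):
    """Mirror int(y + .5) for non-negative y: add one half, then floor."""
    return math.floor(y + 0.5)

# Sum the counts of all original values that land on the same integer.
collapsed = Counter()
for value, n in counts_by_value.items():
    collapsed[round_half_up(value)] += n

total = sum(collapsed.values())
for cls in sorted(collapsed):
    share = 100 * collapsed[cls] / total
    print(f"{cls}  {collapsed[cls]:>6}  {share:.2f}")
```

The four values near 1 merge into a single class of 1 + 1 + 1576 + 40621 = 42199 records, which matches the second table.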

