DESCRIPTION OF THE CANCER DATA FILE

The prostate cancer clinical trial data of Byar and Green (1980) are reproduced in Andrews and Herzberg (1985, pp 261-274). They are also available in Statlib at the URL http://lib.stat.cmu.edu/datasets/Andrews/T46.1

Some transformations and deletions have been made to obtain the version of this data set in Cancer.dat; these are described below.

This data set was obtained from a randomized clinical trial comparing four treatments for 506 patients with prostatic cancer. These patients had been grouped by physicians, using clinical criteria, into Stage 3 and Stage 4 of the disease. Multimix.for does not use this classification when grouping the data, but it is a useful reference against which to compare the groupings found by Multimix or any other clustering program. It could also be used as the basis for a discriminant analysis program.

There are twelve pre-trial covariates measured on each patient. Seven may be taken to be continuous and four to be discrete. The remaining variable (Index of tumour stage and histologic grade) is an index, nearly all of whose values lie between 7 and 15, which could be considered either discrete or continuous.

In order, the covariates are:
Age, Weight, Performance rating, Cardiovascular disease history, Systolic blood pressure, Diastolic blood pressure, Electrocardiogram code, Serum haemoglobin, Size of primary tumour, Index of tumour stage and histologic grade, Serum prostatic acid phosphatase, Bone metastases.

Continuous covariates:
Age, Weight, Systolic blood pressure, Diastolic blood pressure, Serum haemoglobin, Size of primary tumour, Index of tumour stage and histologic grade, Serum prostatic acid phosphatase.

Categorical covariates (number of levels):
Performance rating (4), Cardiovascular disease history (2), Electrocardiogram code (7), Bone metastases (2).

A preliminary inspection of the data showed that the size of the primary tumour (SZ) and serum prostatic acid phosphatase (AP) were both skewed.
These variables have therefore been transformed to achieve approximate normality: a square root transformation was used for SZ, and a logarithmic transformation for AP.

Observations with missing values in any of the twelve pre-trial covariates were omitted from further analysis, leaving 475 of the original 506 observations.

When the program Missing.for becomes available, a useful exercise will be to re-estimate the parameters and group assignments using all 506 observations (not forgetting to transform AP and SZ). The parameter estimates based on the 475 complete observations may be used as initial parameter estimates for the new iteration. Because Missing.for is slower to execute than Multimix.for, starting it from these estimates will be the usual way in which it is used.
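
The preparation steps described above (omitting incomplete observations, then transforming SZ and AP) can be sketched in Python as follows. This is only an illustration and not the procedure actually used to build Cancer.dat. Apart from SZ and AP, the column names are hypothetical abbreviations. The sketch assumes that a missing value is stored as None, that the natural logarithm is intended (the base is not stated above), and that every AP value is positive.

```python
import math

# Hypothetical column names for the twelve pre-trial covariates, in order.
# Only SZ and AP are abbreviations taken from the description above.
COVARIATES = ["AGE", "WT", "PF", "HX", "SBP", "DBP",
              "EKG", "HG", "SZ", "SG", "AP", "BM"]


def prepare_records(records):
    """Return the complete records with SZ and AP transformed.

    `records` is a list of dicts keyed by the names in COVARIATES.
    A missing value is assumed to be stored as None.
    The input records are left unmodified.
    """
    prepared = []
    for record in records:
        # Omit any observation with a missing pre-trial covariate.
        if any(record.get(name) is None for name in COVARIATES):
            continue
        row = dict(record)  # copy so the caller's data is untouched
        row["SZ"] = math.sqrt(row["SZ"])  # square root transformation
        row["AP"] = math.log(row["AP"])   # natural log (assumed); needs AP > 0
        prepared.append(row)
    return prepared


if __name__ == "__main__":
    # Two made-up records, for illustration only (not real trial data).
    complete = dict.fromkeys(COVARIATES, 1.0)
    complete["SZ"] = 9.0
    incomplete = dict(complete, WT=None)
    for row in prepare_records([complete, incomplete]):
        print(row["SZ"], row["AP"])
```

When all 506 observations are to be used with a program that handles missing values, the same two transformations would still be applied, but the step that omits incomplete records would be skipped.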