
Cancerous Cells Classification - Neural Network

neural network
classification
pca
ctree
rocr
splom
Classification of malignant cells in digitized images of fine needle aspirate (FNA) samples of a breast mass, fragmented across three dimensions
Author

Oscar Cardec

Published

June 21, 2020

Introduction

Breast cancer is “the most common cause of cancer deaths among women worldwide”. In the United States, breast cancer is second only to lung cancer in cancer-related deaths, making it a critical national health issue. Statistics show breast cancer as the most frequently diagnosed cancer in women in 140 out of 184 countries. Survival and remission of breast cancer are closely linked with early detection and intervention (Henderson 2015).

Early signs of irregular cell growth are detected by sampling and analyzing nuclear changes and parameters using diagnostic tools. The results of these nuclear morphometry tests are evaluated for structural deviations, which are representative of a cancer diagnosis (Narasimha, Vasavi, and Harendra Kumar 2013). Now, considering how much accuracy matters in these evaluations, one may ask how the medical industry uses machine learning models such as neural networks to augment the diagnosis and judgment behind such vital medical assessments.

The following is a post-study report of numerous breast cancer preventive screenings, scrutinizing cell nuclei parameters in order to classify the specimens as either malignant or benign. The data have been previously categorized; the intent here is to employ a neural network to replicate this categorization and measure the algorithm’s effectiveness in supporting medical professionals in the identification and early detection of breast carcinoma.

Data

The employed data set, Breast Cancer Wisconsin (Diagnostic), comes from the UCI Machine Learning Repository. The multivariate set contains 569 instances and 32 attributes, as described below. As previously stated, the data set features quantitative observations derived from the images obtained by means of a fine needle aspirate (FNA). These digitized samples were studied, measured, and recorded, ultimately enabling the classification of every instance as malignant or benign. Essentially, there are 10 real-valued features per cell; however, given the 3-dimensional fragmentation of each cell sampling, this produces a total of 30 observations across 3 planes (William Wolberg 1993).

  • Variables list across each plane
    • 1 - ID Name :: Identification number of the sample

    • 2 - Diagnosis :: Dependent variable label. M = Malignant, B = Benign

    • 3 - Feature_1 Radius :: Mean of distances from center to points on the perimeter

    • 4 - Feature_2 Texture :: Standard deviation of gray-scale values

    • 5 - Feature_3 Perimeter ::

    • 6 - Feature_4 Area ::

    • 7 - Feature_5 Smoothness :: Local variation in radius lengths

    • 8 - Feature_6 Compactness :: Perimeter^2 / Area - 1.0

    • 9 - Feature_7 Concavity :: Severity of concave portions of the contour

    • 10 - Feature_8 Concave Points :: Number of concave portions of the contour

    • 11 - Feature_9 Symmetry ::

    • 12 - Feature_10 Fractal Dimension :: Coastline approximation - 1
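The column layout above can be mapped to names programmatically. A minimal sketch, assuming the raw, header-less `wdbc.data` CSV from the UCI repository sits in the working directory (the file name and path are assumptions):

```r
# Build the 32 column names: ID, Diagnosis, then Feature_1..Feature_30
# (10 real-valued features recorded across 3 planes).
cols <- c("ID", "Diagnosis", paste0("Feature_", 1:30))

# Hypothetical load; uncomment once wdbc.data is available locally:
# df <- read.csv("wdbc.data", header = FALSE, col.names = cols)
# df$ID <- NULL   # drop the identifier before any modeling

length(cols)  # 32
```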

Exploratory Data Analysis

Basic descriptive statistics of the data set.

# sample of statistical summary - Plane 1
describe(df[1:10], interp = TRUE, ranges = FALSE)
           vars   n   mean     sd skew kurtosis    se
Diagnosis*    1 569   1.37   0.48 0.53    -1.73  0.02
Feature_1     2 569  14.13   3.52 0.94     0.81  0.15
Feature_2     3 569  19.29   4.30 0.65     0.73  0.18
Feature_3     4 569  91.97  24.30 0.99     0.94  1.02
Feature_4     5 569 654.89 351.91 1.64     3.59 14.75
Feature_5     6 569   0.10   0.01 0.45     0.82  0.00
Feature_6     7 569   0.10   0.05 1.18     1.61  0.00
Feature_7     8 569   0.09   0.08 1.39     1.95  0.00
Feature_8     9 569   0.05   0.04 1.17     1.03  0.00
Feature_9    10 569   0.18   0.03 0.72     1.25  0.00

Density exploration of diagnosis across four specific variables: radius, texture, area, and concavity.

# Explore density 
g1 <- ggplot(data = df)+
  theme_minimal()+
  geom_density(mapping = aes(Feature_1,  fill = Diagnosis), col="darkgrey", show.legend = FALSE, alpha=0.5)+ 
  labs(title = "Density of Diagnosis per Attribute | Red = Benignant, Blue = Malignant", y = " ", x = "Radius")

g2 <- ggplot(data = df)+
  theme_minimal()+
  geom_density(mapping = aes(Feature_2, fill = Diagnosis), col="darkgrey", show.legend = FALSE, alpha=0.5)+ 
  labs(title = "", y = " ", x = "Texture")

g3 <- ggplot(data = df)+
  theme_minimal()+
  geom_density(mapping = aes(Feature_4, fill = Diagnosis), col="darkgrey", show.legend = FALSE, alpha=0.5)+ 
  labs(title = "", y = " ", x = "Area")

g4 <- ggplot(data = df)+
  theme_minimal()+
  geom_density(mapping = aes(Feature_7, fill = Diagnosis), col="darkgrey", show.legend = FALSE, alpha=0.5)+ 
  labs(title = "", y = " ", x = "Concavity")

grid.arrange(arrangeGrob(g1,  g2,  g3, g4),  nrow=1)

Key insight: density plots are a useful way to illustrate the distribution of continuous data points. The selected variables are just examples of what can swiftly be visualized to identify potential relationships between a feature and the dependent variable.

Pre-Visualization of Data

Here I use a conditional inference tree to estimate relationships across the data and to see how it is recursively partitioned by the algorithm’s criteria. The main intent is to get an idea of what to expect regarding the classification of this data set.

# mutate diagnosis character to numeric
df <- df |> 
  mutate(Diagnosis = ifelse(Diagnosis == "M", 1, 0)) 

# CTREE model and plot
model <- ctree(Diagnosis ~., data = df)
plot(model, type="extended", ep_args = list(justmin=8), 
     main="Breast Cancer | Preliminary Analysis",
     drop_terminal=FALSE, tnex=1.5, 
     gp = gpar(fontsize = 12, col="darkblue"),
     inner_panel = node_inner(model, fill=c("white","green"), pval=TRUE), 
     terminal_panel=node_barplot(model, fill=rev(c("darkred","lightgrey")), beside=TRUE, ymax=1.0, 
                                 just = c(0.95,0.5), ylines=TRUE, widths = 1.0, gap=0.05, 
                                 reverse=FALSE, id=TRUE)
     )

Scatterplot Matrix (SPLOM)

This scatterplot matrix portrays pairwise correlations between the included variables and reveals possible multicollinearity among them. The selected features were limited to the first plane only (items 1 to 10); the other two planes include similar features. My immediate takeaway was to consider a method for dimensionality reduction, even with a neural network classification model in mind.


# Plot Plane I
clrs <- c("darkred", "lightgrey")

pairs(df[1:11], fill=clrs, main = "Plane I - Matrix of Scatterplots", 
      cex.main= 2.0, cex.labels = 1.0, lower.panel = NULL, pch = 21, 
      col="grey", bg = clrs [unclass(df$Diagnosis)])

par (xpd = TRUE)

legend (0.10, 0.01, horiz = TRUE, as.vector(unique(df$Diagnosis)), fill=clrs, bty = "n")

Dimensionality Reduction

As defined, the purpose of dimensionality reduction is to find a method that can represent a given data set using a smaller number of features while still preserving the original data’s properties. There are different methods to accomplish this, e.g., LDA, PCA, t-SNE, UMAP, but I ultimately decided to use PCA for feature extraction. The following steps illustrate how the method identifies the eigenvectors with the largest eigenvalues of the covariance matrix. As a result, I create a subset of the data using the variables with maximum influence on variance.

# pca of original data
res.pca <- PCA(df, scale.unit = TRUE, graph = FALSE, ncp = 4)
eig.val <- get_eigenvalue(res.pca)
var <- get_pca_var(res.pca)

# Color by cos2 values: quality on the factor map
fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE 
             )
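The eigen-decomposition that PCA relies on can also be sketched directly in base R. This is an illustration on a small synthetic matrix standing in for the scaled feature columns, not the analysis pipeline itself:

```r
# Sketch: PCA as an eigen-decomposition of the covariance matrix (base R only).
set.seed(1)
X <- scale(matrix(rnorm(200), ncol = 4))        # 50 x 4 standardized data
eig <- eigen(cov(X))                            # eigenvectors/values of covariance
pct_var <- eig$values / sum(eig$values) * 100   # % of variance per component
scores <- X %*% eig$vectors[, 1:2]              # project onto the top 2 PCs
```

The components with the largest eigenvalues carry the most variance, which is exactly what `fviz_pca_var()` visualizes above.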

Sub-selected Features

Here’s my source code to create a subset based on the PCA, followed by conditioning the target variable as a factor for down-sampling purposes. The down-sampling approach ensures an equal number of each target outcome, avoiding any model bias toward one class or the other. With the down-sampling completed, I scale and center the data to maximize model performance.

# selection of principal components 
pdf <- df |> 
  select(Diagnosis, Feature_1, Feature_2, Feature_3, Feature_4, Feature_6, Feature_7, Feature_8,
         Feature_10, Feature_11, Feature_13, Feature_14, Feature_16, Feature_18, Feature_20,
         Feature_21, Feature_23, Feature_24, Feature_26, Feature_27, Feature_28,
         Feature_30
         )

# mutating as a factor for downsampling 
pdf <- pdf |> 
  dplyr::mutate(Diagnosis = as.factor(Diagnosis))

# class definition
target <- "Diagnosis"

# downsampling
set.seed(12345)
downsampled_df <- downSample(x = pdf[, colnames(pdf) != target], y = pdf[[target]])
colnames(downsampled_df)[colnames(downsampled_df) == "Class"] <- target

# subset, mutate, and scale
df2 <- pdf 
df2 <- df2 |> 
  mutate(Diagnosis = as.character(Diagnosis),
         Diagnosis = as.numeric(Diagnosis))
df2[, -1] <- scale(df2[, -1])

Final summary statistics across the data set.


Data summary
Name df2
Number of rows 569
Number of columns 22
_______________________
Column type frequency:
numeric 22
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Diagnosis 0 1 0.37 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
Feature_1 0 1 0.00 1.00 -2.03 -0.69 -0.21 0.47 3.97 ▂▇▃▁▁
Feature_2 0 1 0.00 1.00 -2.23 -0.73 -0.10 0.58 4.65 ▃▇▃▁▁
Feature_3 0 1 0.00 1.00 -1.98 -0.69 -0.24 0.50 3.97 ▃▇▃▁▁
Feature_4 0 1 0.00 1.00 -1.45 -0.67 -0.29 0.36 5.25 ▇▃▂▁▁
Feature_6 0 1 0.00 1.00 -1.61 -0.75 -0.22 0.49 4.56 ▇▇▂▁▁
Feature_7 0 1 0.00 1.00 -1.11 -0.74 -0.34 0.53 4.24 ▇▃▂▁▁
Feature_8 0 1 0.00 1.00 -1.26 -0.74 -0.40 0.65 3.92 ▇▃▂▁▁
Feature_10 0 1 0.00 1.00 -1.82 -0.72 -0.18 0.47 4.91 ▆▇▂▁▁
Feature_11 0 1 0.00 1.00 -1.06 -0.62 -0.29 0.27 8.90 ▇▁▁▁▁
Feature_13 0 1 0.00 1.00 -1.04 -0.62 -0.29 0.24 9.45 ▇▁▁▁▁
Feature_14 0 1 0.00 1.00 -0.74 -0.49 -0.35 0.11 11.03 ▇▁▁▁▁
Feature_16 0 1 0.00 1.00 -1.30 -0.69 -0.28 0.39 6.14 ▇▃▁▁▁
Feature_18 0 1 0.00 1.00 -1.91 -0.67 -0.14 0.47 6.64 ▇▇▁▁▁
Feature_20 0 1 0.00 1.00 -1.10 -0.58 -0.23 0.29 9.84 ▇▁▁▁▁
Feature_21 0 1 0.00 1.00 -1.73 -0.67 -0.27 0.52 4.09 ▆▇▃▁▁
Feature_23 0 1 0.00 1.00 -1.69 -0.69 -0.29 0.54 4.28 ▇▇▃▁▁
Feature_24 0 1 0.00 1.00 -1.22 -0.64 -0.34 0.36 5.92 ▇▂▁▁▁
Feature_26 0 1 0.00 1.00 -1.44 -0.68 -0.27 0.54 5.11 ▇▅▁▁▁
Feature_27 0 1 0.00 1.00 -1.30 -0.76 -0.22 0.53 4.70 ▇▅▂▁▁
Feature_28 0 1 0.00 1.00 -1.74 -0.76 -0.22 0.71 2.68 ▅▇▅▃▁
Feature_30 0 1 0.00 1.00 -1.60 -0.69 -0.22 0.45 6.84 ▇▃▁▁▁

Visualization of variables as defined by the PCA across the first and second dimensions.


# pca of subset data
res.pca <- PCA(df2, scale.unit = FALSE, graph = FALSE, ncp =4)
eig.val <- get_eigenvalue(res.pca)
var <- get_pca_var(res.pca)

# Color by cos2 values: quality on the factor map
fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE 
             )

Data Partitioning

set.seed(12345)

# splitting  
ind <- sample(1:3, nrow(df2), replace=TRUE, prob=c(0.50, 0.25, 0.25))
trainData <- df2[ind == 1, ]
testData <- df2[ind == 2, ]
xvalData <- df2[ind == 3, ]

Model Training and Visualization

Training of the neural network using the R package neuralnet. This powerful algorithm “is based on the resilient backpropagation without weight backtracking and additionally modifies one learning rate, either the learningrate associated with the smallest absolute gradient (sag) or the smallest learningrate (slr) itself. The learning rates in the grprop algorithm are limited to the boundaries defined in learningrate.limit.” (Fritsch, Guenther, and Wright 2019)

# neuralnet 
nn <- neuralnet(Diagnosis ~., data = trainData, hidden = c(3),
                lifesign = "minimal", linear.output = FALSE, likelihood=TRUE
                # act.fct = "tanh", err.fct = "sse"
                )

# plot model
plot(nn, radius = 0.03, arrow.length = 0.16, intercept = TRUE,
     intercept.factor = 0.2, information = TRUE, information.pos = 8,
     col.entry.synapse = "black", col.entry = "maroon4", line_stag= 0.1,
     col.hidden = "darkblue", col.hidden.synapse = "dimgrey",
     col.out = "green", col.out.synapse = "blue",
     col.intercept = "red", fontsize = 9, dimension = 2,
     show.weights = TRUE, rep = "best")

Model’s Evaluation

Observation & Notes | Section in-progress …

  • Approach
    • Training

    • Testing

    • Cross-validation

# model evaluation
mypredict <- neuralnet::compute(nn, nn$covariate)$net.result
mypredict <- apply(mypredict, c(1), round)
# confusion matrix - training set
print(table(mypredict[1:length(trainData$Diagnosis)], trainData$Diagnosis, dnn = c("Predicted", "Actual")))
         Actual
Predicted   0   1
        0 158   0
        1   0  97
# accuracy(trainData$Diagnosis, mypredict[1:length(trainData$Diagnosis)]) #cross-entropy 
# model evaluation
testPred <- neuralnet::compute(nn, testData[,1:22])$net.result
testPred <- apply(testPred, c(1), round)
# confusion matrix - test set
print(table(testPred[1:length(testData$Diagnosis)], testData$Diagnosis, dnn = c("Predicted", "Actual")))
         Actual
Predicted  0  1
        0 97  5
        1  2 56
# accuracy(testPred[1:length(testData$Diagnosis)], testData$Diagnosis) #cross-entropy 
# model evaluation
xvalPred <- neuralnet::compute(nn, xvalData[,1:22])$net.result
xvalPred <- apply(xvalPred, c(1), round)
# confusion matrix - xval set
print(table(xvalPred[1:length(xvalData$Diagnosis)], xvalData$Diagnosis, dnn = c("Predicted", "Actual")))
         Actual
Predicted  0  1
        0 97  0
        1  3 54
# accuracy(xvalPred[1:length(xvalData$Diagnosis)], xvalData$Diagnosis) #cross-entropy 

Mean Scores Comparison

Observation & Notes | Section in-progress …

n2 <- (mypredict == trainData$Diagnosis)
n3 <- (testPred == testData$Diagnosis)
n4 <- (xvalPred == xvalData$Diagnosis)

mean(n2); mean(n3); mean(n4)
[1] 1
[1] 0.95625
[1] 0.9805195
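Beyond raw accuracy, class-specific rates matter for a medical screen. A small sketch deriving accuracy, sensitivity, and specificity from the test-set confusion matrix counts reported above (rows = predicted, columns = actual):

```r
# Test-set confusion matrix counts from the evaluation above.
cm <- matrix(c(97, 2, 5, 56), nrow = 2,
             dimnames = list(Predicted = c(0, 1), Actual = c(0, 1)))
TN <- cm[1, 1]; FP <- cm[2, 1]; FN <- cm[1, 2]; TP <- cm[2, 2]

accuracy    <- (TP + TN) / sum(cm)  # matches mean(n3) = 0.95625
sensitivity <- TP / (TP + FN)       # recall on malignant cases
specificity <- TN / (TN + FP)       # true-negative rate on benign cases
```

Sensitivity is arguably the key figure here, since a false negative (a missed malignancy) is the costlier error.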

ROC Curve

Observation & Notes | Section in-progress …

# ROC curve analysis on the cross-validation set
pred <- neuralnet::compute(nn, xvalData[, 1:22])$net.result
predObj <- prediction(pred, xvalData$Diagnosis)
rocObj <- performance(predObj, measure="tpr", x.measure="fpr")
aucObj <- performance(predObj, measure = "auc")
plot(rocObj,  main = "ROC Curve", cex.lab=1.25, cex.main = 1.5, col = "blue")
text(.75, .25, paste("Area under the curve:", round(aucObj@y.values[[1]], 4)),
     col = "darkred", cex = 1.25)

References

Fritsch, Stefan, Frauke Guenther, and Marvin N. Wright. 2019. “Neuralnet: Training of Neural Networks.” https://CRAN.R-project.org/package=neuralnet.
Henderson, I. Craig. 2015. “Breast Cancer,” October. https://doi.org/10.1093/med/9780199919987.001.0001.
Narasimha, Aparna, B Vasavi, and ML Harendra Kumar. 2013. “Significance of Nuclear Morphometry in Benign and Malignant Breast Aspirates.” International Journal of Applied and Basic Medical Research 3 (1): 22. https://doi.org/10.4103/2229-516x.112237.
Wolberg, William, and Olvi Mangasarian. 1993. “Breast Cancer Wisconsin (Diagnostic).” UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.

© Copyright 2024 Cardec Solutions
