Abstract:
Selecting a subset of informative genes from microarray gene expression data is an important
data preparation step in breast cancer classification. Among the algorithms developed for gene
selection, CFS-BFS and CONSISTENCY-BFS are the two best performers. For reliable
prognostication of breast cancer subtypes, a novel 2-Stage Gene Selection (2-Stage GeS)
algorithm has been developed. Applying CFS-BFS in the first stage and CONSISTENCY-BFS in
the second removes the majority of the noisy, irrelevant, and redundant genes. To
improve algorithm efficacy, the 2-Stage GeS strategy circumvents the uncertainty problem of
CFS-BFS. Notably, when Hidden Weight Naive Bayes is used with the 2-Stage GeS, more
accurate and reliable results are obtained. The recall, precision, F-score, and fallout values
are encouraging. The top four genes, E2F3, PSMC3IP, GINS1, and PLAGL2, were
further validated using the Kaplan-Meier survival model. E2F3 and GINS1 are likely targets
for precision therapy.
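
For readers unfamiliar with the four evaluation measures named above, the following is a minimal sketch of their standard definitions for one class treated as positive (one-vs-rest). It is not the evaluation code used in this work: the names `ConfusionCounts`, `classification_metrics`, and `counts_from_labels` are hypothetical, and returning 0.0 for an undefined ratio is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one class treated as positive (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return recall, precision, F-score (F1), and fallout for the counts."""
    recall = _ratio(c.tp, c.tp + c.fn)        # true positive rate
    precision = _ratio(c.tp, c.tp + c.fp)     # positive predictive value
    f_score = _ratio(2 * precision * recall, precision + recall)
    fallout = _ratio(c.fp, c.fp + c.tn)       # false positive rate
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "fallout": fallout,
    }


def counts_from_labels(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> ConfusionCounts:
    """Tally one-vs-rest confusion counts for the class `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)
```

For a multi-class problem such as subtype classification, these measures would be computed once per subtype and then averaged; the averaging scheme is not stated here, so none is assumed.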