Machine Learning-Based Gene Clustering on Brain Cancer Using K-Means and Hierarchical Clustering Methods

Dublin Core

Title

Machine Learning-Based Gene Clustering on Brain Cancer Using K-Means and Hierarchical Clustering Methods

Author

Fatih Yilmaz, Samed Jukić

Abstract

K-means and hierarchical clustering algorithms are employed to cluster genes according to their expression levels in order to assess how harmful each gene is in brain cancer. Gene expression data, including a control group, were obtained from The Cancer Genome Atlas (TCGA) database. The optimal number of clusters was determined using the elbow method for K-means and the dendrogram for hierarchical clustering. We identified the ideal number of clusters as three and further classified them into seven groups. We observed that the second cluster contains over half of the genes in healthy people, and that the cluster distributions of a healthy individual and of a patient who died six months after being diagnosed with brain cancer are similar. Further analysis indicated that, across all time periods after a brain cancer diagnosis, group 0 has the highest percentage at one month after diagnosis, while group -2 has the lowest percentage. Comparing the control and disease groups, most genes shift clusters under both the K-means and hierarchical clustering techniques. A dissimilarity measure between the gene expression patterns indicates that the K-means technique outperforms the hierarchical technique, showing a higher rate of cluster change.
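The abstract states that the number of K-means clusters was chosen with the elbow method. The sketch below shows that idea in plain Python under stated assumptions. It is not the authors' code. The toy "gene" data are hypothetical. The deterministic farthest-first seeding stands in for the usual random or k-means++ initialization. Picking the elbow as the k with the largest second difference of the within-cluster sum of squares (WCSS) is one common heuristic, not necessarily the one the paper used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_init(data, k):
    """Assumed deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [data[0]]
    while len(centers) < k:
        nxt = max(data, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(data, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers, wcss)."""
    centers = farthest_first_init(data, k)
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, centers, wcss


def elbow_k(wcss_by_k):
    """Heuristic elbow: the k where the WCSS curve bends most sharply,
    measured by the largest second difference."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = ks[1], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical toy data: nine "genes", each described by expression values
# under two conditions, forming three well-separated groups.
genes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]

wcss_by_k = {k: kmeans(genes, k)[2] for k in range(1, 7)}
best_k = elbow_k(wcss_by_k)
print(best_k)  # → 3
```

On this toy data the WCSS falls from about 200 at k = 1 to about 75 at k = 2 and to about 0.04 at k = 3, then barely changes. The sharpest bend is therefore at k = 3. Real expression matrices rarely give such a clean elbow, so the chosen k is usually checked against other evidence, such as the dendrogram the abstract mentions for hierarchical clustering.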

Keywords

Brain cancer, gene clustering, hierarchical clustering, K-means clustering, machine learning.

Identifier

ISSN 2637-2835

DOI

10.14706/JONSAE2021325

Publisher

International Burch University

Language

English language

Type

Original research
