Machine Learning Approach for the Prediction of Bladder Cancer Stages Based on Next-Generation Sequencing Data

Authors

  • A. Imhenkuomon, Liverpool John Moores University, Liverpool, England.
  • M. I. Omogbhemhe, Ambrose Alli University, Ekpoma, Edo State, Nigeria.

DOI:

https://doi.org/10.26437/hsea1s73

Keywords:

Bioinformatics, bladder cancer, machine learning, next-generation sequencing, RNA-seq

Abstract

Purpose: The purpose of this paper is to apply machine learning algorithms to classify the stages of bladder cancer (BCa) using RNA-seq transcripts per million (TPM) gene expression data and the corresponding pathological stages from the TCGA database. The objective is to assess classification performance across different stages.
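
TPM is a within-sample normalisation: each gene's read count is divided by its length in kilobases, and the resulting rates are rescaled so that the sample sums to one million. The short Python sketch below illustrates that standard formula on hypothetical toy numbers. It is not the authors' preprocessing code, and the gene names, counts and lengths are invented for illustration.

```python
# Minimal sketch of the standard TPM (transcripts per million) calculation.
# Gene names, counts and lengths are hypothetical toy values, not TCGA data.


def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for a single sample.

    counts:     dict mapping gene -> raw read count
    lengths_bp: dict mapping gene -> gene/transcript length in bases
    """
    # Step 1: reads per kilobase, i.e. a length-normalised rate per gene.
    rpk = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    # Step 2: rescale so the rates in this sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {g: r / scale for g, r in rpk.items()}


if __name__ == "__main__":
    counts = {"GENE_A": 500, "GENE_B": 1000, "GENE_C": 250}
    lengths = {"GENE_A": 1000, "GENE_B": 4000, "GENE_C": 500}
    for gene, value in tpm(counts, lengths).items():
        print(f"{gene}: {value:.1f}")  # 400000.0, 200000.0, 400000.0
```

Because every sample is rescaled to the same total, TPM values are compositional: a large increase in a few genes lowers the TPM of all others in that sample, which is one reason TPM profiles alone may blur subtle differences between stages.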

Design/Methodology/Approach: This study applied a computational research design to publicly available BCa gene expression data from The Cancer Genome Atlas (TCGA). Multiple supervised machine learning algorithms were trained and evaluated on binary classification tasks within a 3-fold nested cross-validation (nCV) design, with a forward feature selection technique used to select the best features for each classifier. Data preprocessing was carried out in two phases using the R and Python programming languages.
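
In a nested design, forward feature selection runs only on each outer training split and is scored by an inner cross-validation, so the outer test fold never influences which features are chosen. The pure-Python sketch below illustrates that structure on hypothetical synthetic data. It is not the authors' code: the nearest-centroid classifier stands in for whichever supervised algorithms the study used, and the function names, stopping rule, feature limit and toy data are all assumptions for illustration.

```python
import random
from statistics import mean

# Illustrative sketch only: nearest-centroid classifier, stopping rule and
# toy data are assumptions, not the study's actual models or TCGA data.


def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y, feats):
    """Nearest-centroid 'training': per-class mean of each selected feature."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(r[f] for r in rows) for f in feats]
    return cents


def predict(cents, x, feats):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def dist(c):
        return sum((x[f] - cf) ** 2 for f, cf in zip(feats, c))
    return min(cents, key=lambda label: dist(cents[label]))


def accuracy(X_tr, y_tr, X_te, y_te, feats):
    cents = fit_centroids(X_tr, y_tr, feats)
    return sum(predict(cents, x, feats) == t for x, t in zip(X_te, y_te)) / len(y_te)


def cv_score(X, y, feats, k, seed):
    """Mean k-fold accuracy using only the features in `feats`."""
    scores = []
    for test in kfold_indices(len(y), k, seed):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        scores.append(accuracy([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], [y[i] for i in test], feats))
    return mean(scores)


def forward_select(X, y, max_feats, k, seed):
    """Greedy forward selection: add the feature that best improves inner CV."""
    selected, best = [], 0.0
    while len(selected) < max_feats:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        score, feat = max((cv_score(X, y, selected + [f], k, seed), f) for f in candidates)
        if score <= best:  # stop when no candidate improves the inner score
            break
        selected, best = selected + [feat], score
    return selected


def nested_cv(X, y, outer_k=3, inner_k=3, max_feats=3, seed=0):
    """Outer folds estimate performance; selection sees only outer-train data."""
    results = []
    for fold, test in enumerate(kfold_indices(len(y), outer_k, seed)):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = forward_select(X_tr, y_tr, max_feats, inner_k, seed + fold + 1)
        acc = accuracy(X_tr, y_tr, [X[i] for i in test], [y[i] for i in test], feats)
        results.append((feats, acc))
    return results


def make_toy_data(seed=42, n=60, d=8):
    """Hypothetical data: only features 0 and 1 differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for feats, acc in nested_cv(X, y):
        print(f"selected features {feats}, outer-fold accuracy {acc:.2f}")
```

In practice an established library would normally be preferred (for example scikit-learn's `SequentialFeatureSelector` with `direction="forward"` placed inside a pipeline that is itself evaluated by an outer cross-validation loop); the sketch only makes explicit that selection and evaluation use disjoint data.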

Research Limitation: Reliance on publicly downloaded data means the study may inherit biases introduced by the original data generators.

Findings: This study suggests that TPM profiles of bulk RNA-seq samples cannot reliably separate adjacent stages of bladder cancer. Bulk transcriptomic data should therefore not be used as the sole basis for treatment decisions in bladder cancer; integrating molecular subtyping with multi-omics data, or building models that predict clinical outcomes directly, is likely to be more informative.

Practical Implication: In practical terms, these findings suggest that bulk RNA-seq TPM transcriptomic data should not be relied on alone for staging bladder cancer in clinical or predictive settings. Approaches that combine molecular subtypes, integrate multi-omics data, or model clinical outcomes directly are likely to provide greater value for decision-making and future research.

Social Implication: The study highlights the risks of over-relying on AI diagnostic tools that do not capture the full biological characteristics of a disease; recognising these limits is essential for protecting patient safety.

Originality/Value: This research examined the use of machine learning algorithms to predict bladder cancer stages from RNA-seq TPM gene expression NGS data in the TCGA database, an approach that, to the authors' knowledge, has not previously been explored.

Author Biographies

  • A. Imhenkuomon, Liverpool John Moores University, Liverpool, England.

    Anthony Imhenkuomon is a PhD Student in the School of Computer Science & Mathematics at Liverpool John Moores University, Liverpool, England.

  • M. I. Omogbhemhe, Ambrose Alli University, Ekpoma, Edo State, Nigeria.

    Dr. Mike Izah Omogbhemhe is a Senior Lecturer in the Department of Computer Science at Ambrose Alli University, Ekpoma, Edo State, Nigeria.

Published

28-04-2026

How to Cite

Imhenkuomon, A., & Omogbhemhe, M. I. (2026). Machine Learning Approach for the Prediction of Bladder Cancer Stages Based on Next-Generation Sequencing Data. African Journal of Applied Research, 12(3), 170–192. https://doi.org/10.26437/hsea1s73