Enhancing Precision Oncology: Deep Learning Models vs. Classical Machine Learning Models in Multi-Label Breast Cancer Classification

Min Cho; Yanzhen Qu

doi:10.24018/ejece.2025.9.3.711

Research Article

Min Cho

Colorado Technical University, USA

Yanzhen Qu

Colorado Technical University, USA

* Corresponding author


Submitted 2025-03-29
Published 2025-05-31

Abstract

Advancements in single-cell RNA sequencing (scRNA-seq) provide critical insights into cancer heterogeneity, yet analyzing high-dimensional data remains challenging. This study compares GRU with Low-Rank Adaptation (LoRA), Transformer, and XGBoost for multi-label breast cancer classification. Using k-fold cross-validation and paired t-tests, results show GRU-LoRA and Transformer outperform XGBoost in accuracy, precision, recall, and F1-score, particularly for rare cancer subtypes. While XGBoost offers interpretability, deep learning models excel in capturing complex gene interactions. These findings underscore the potential of deep learning in precision oncology, enabling more scalable and accurate diagnostic tools.

Keywords: Breast cancer classification, deep learning, precision oncology, scRNA-seq

Introduction

Single-cell RNA sequencing (scRNA-seq) has transformed cancer research by providing detailed insights into gene expression at the cellular level. However, its high dimensionality, sparsity, and complexity pose challenges for traditional machine learning models like XGBoost, which rely on manual feature selection and struggle with class imbalance. In contrast, deep learning approaches, such as Gated Recurrent Unit (GRU) with Low-Rank Adaptation (LoRA) and Transformer models, excel in learning complex gene interactions, making them highly effective for breast cancer subtype classification [1].

This study compares XGBoost, GRU-LoRA, and Transformers for multi-label breast cancer classification using scRNA-seq data. Performance is evaluated using k-fold cross-validation, precision, recall, and F1-score, with statistical significance assessed through paired t-tests. The findings highlight the advantages of deep learning in genomic analysis and its potential for enhancing precision oncology [2]. The datasets used in the study were obtained from publicly available repositories, specifically GEO datasets GSE167036, GSE176078, and GSE161529. These datasets include a variety of breast cancer subtypes—ER+, HER2+, TNBC, BRCA1—as well as non-cancer cell types. Appropriate sampling, preprocessing, and quality control enable robust model performance evaluation across high-dimensional and heterogeneous gene expression profiles.

The paper is structured as follows: Section 2 reviews related work, Section 3 outlines the research problem and methodology, Section 4 details model architectures, Section 5 presents results, and Section 6 concludes with future directions.

Related Works

Machine Learning in Genomic Data Analysis

Machine learning (ML) has revolutionized genomic research by enabling efficient classification, prediction, and feature extraction from high-dimensional biological datasets. Classical ML models such as Support Vector Machines (SVM), Random Forest, and XGBoost have been widely used in gene expression analysis, owing to their interpretability and robustness in structured data classification [3]. XGBoost, a tree-based ensemble learning method, is particularly effective in handling imbalanced datasets and nonlinear feature interactions, making it a preferred choice for biological classification tasks [4].

However, classical ML models often struggle with the high-dimensionality and sparsity of single-cell RNA sequencing (scRNA-seq) data, leading to performance limitations in feature selection and representation learning. These challenges have led to an increasing reliance on deep learning approaches, which automatically learn hierarchical representations from raw genomic sequences [5], [6].

GRU-LoRA in Genomic Research

Recurrent Neural Networks (RNNs) are well-suited for processing sequential biological data, such as gene expression time-series and cell differentiation pathways. Among them, Gated Recurrent Units (GRU) have demonstrated strong performance in modeling long-term dependencies and temporal dynamics in gene expression profiles. GRUs address vanishing gradient issues, enabling more efficient training on large genomic datasets [7].

The Low-Rank Adaptation (LoRA) technique, a parameter-efficient fine-tuning method, has further improved GRU’s performance in genomics by reducing memory usage and computational costs while maintaining classification accuracy. LoRA achieves this by injecting small, trainable matrices into pre-trained models, allowing efficient adaptation to new tasks without modifying the entire network [8]. This combination—GRU with LoRA—has been particularly useful for single-cell analysis, where training large models on limited annotated data is a challenge [8].

Transformer-Based Models for Genomic Applications

Transformers have emerged as state-of-the-art architectures for genomic data analysis, surpassing RNN-based models in both speed and accuracy. Unlike RNNs, Transformers process entire sequences in parallel, significantly improving computational efficiency for large-scale scRNA-seq data [9].

The key advantage of Transformers lies in their self-attention mechanism, which enables them to capture long-range dependencies between genes and identify complex gene regulatory networks. Transformer-based architectures, such as those implemented in PyTorch, have been successfully applied to gene classification, sequence alignment, and cancer subtype identification, demonstrating high precision and recall in scRNA-seq tasks [10].

Additionally, Transformers have shown superior generalization capabilities, reducing overfitting through multi-head attention and positional embeddings. Their ability to integrate multi-modal genomic datasets makes them ideal for cancer subtype classification, where spatial and molecular interactions must be considered.

Problem Statement, Hypothesis Statement, and Research Question

Problem Statement

The problem is that traditional machine learning models like XGBoost and Random Forest struggle to capture complex gene interactions in single-cell RNA sequencing (scRNA-seq) data because of its high dimensionality, sparsity, and noise, necessitating the use of deep learning models such as GRU with LoRA and Transformer architectures for improved breast cancer subtype classification.

Hypothesis Statement

Deep learning models, specifically GRU-LoRA and Transformer architectures, improve accuracy, precision, recall, and F1-score for breast cancer subtype classification using multi-modal scRNA-seq data compared to traditional machine learning models like XGBoost.

Research Question

How do deep learning models, specifically GRU-LoRA and Transformer architectures, significantly improve performance in terms of accuracy, precision, recall, and F1-score for breast cancer subtype classification using multi-modal scRNA-seq data compared to traditional machine learning models like XGBoost?

Methodology

The study employs an experimental quantitative approach to evaluate the effectiveness of deep learning models (GRU-LoRA, Transformer) against a classical machine learning model (XGBoost) in classifying breast cancer subtypes using single-cell RNA sequencing (scRNA-seq) data. The study systematically manipulates and controls independent variables—model type (XGBoost, GRU-LoRA and Transformer)—to assess their impact on dependent variables—accuracy, precision, recall, and F1-score—through comparative analysis [10].

The methodology includes data collection, data preprocessing, feature selection, model training, hyperparameter tuning, and statistical analysis to determine which model performs best in classifying breast cancer subtypes. The study focuses on evaluating the models by accuracy, precision, recall, and F1-score while also examining trade-offs between model interpretability and performance in multi-class handling and generalization [11].

Data Collection Procedure

The dataset utilized in this study consists of single-cell RNA sequencing (scRNA-seq) data obtained from publicly available genomic repositories, including the Gene Expression Omnibus (GEO), Cancer Genome Atlas (TCGA), and European Genome-Phenome Archive (EGA). These datasets provide high-dimensional gene expression profiles categorized into five classes: ER+, HER2+, TNBC, BRCA1, and Normal. Data acquisition involved curating samples from multiple studies to ensure complete coverage of cell and gene information [12]. A systematic preprocessing approach was applied to obtain high-quality data for the study. First, multiple scRNA-seq datasets were integrated while addressing batch effects and technical variations. Duplicate and redundant cell entries were removed to prevent data skew. Cells with excessive missing values or ambiguous annotations were discarded to improve dataset reliability. A critical aspect of the preprocessing pipeline was the proper annotation of the cancer and non-cancer labels, ensuring that the five classes (ER+, TNBC, HER2+, BRCA1, and Normal) were clearly defined for downstream analysis. The annotation process was essential for maintaining the biological relevance of the dataset, as inconsistencies in labeling could lead to misclassification errors in predictive analysis [13].

Data Analysis Procedure

A structured preprocessing pipeline ensured data integrity and consistency across models. Quality control removed cells with high mitochondrial gene expression (>10%) or low gene counts (<200), and Scrublet was used to detect and remove doublets. Normalization was performed via library size scaling followed by log transformation, and the top 5000 most variable genes were selected using Scanpy’s variance thresholding method [14].
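The quality-control and normalization steps described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the Scanpy or Scrublet pipeline used in the study: the function names, the dictionary-per-cell data layout, the `target_sum` scaling constant, and the use of plain variance to rank genes are all hypothetical choices made for the example, and doublet detection is omitted.

```python
import math
from statistics import pvariance


def qc_filter(cells, mito_genes, max_mito_frac=0.10, min_genes=200):
    """Keep cells whose mitochondrial fraction is at most max_mito_frac
    and that have at least min_genes detected (non-zero) genes.
    Each cell is a dict mapping gene name -> raw count (assumed layout)."""
    kept = []
    for counts in cells:
        total = sum(counts.values())
        if total == 0:
            continue  # an empty cell cannot pass quality control
        mito = sum(c for g, c in counts.items() if g in mito_genes)
        n_detected = sum(1 for c in counts.values() if c > 0)
        if mito / total <= max_mito_frac and n_detected >= min_genes:
            kept.append(counts)
    return kept


def normalize_log1p(counts, target_sum=1e4):
    """Library-size scaling followed by log(1 + x).
    target_sum is an assumed constant, not a value from the study."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * target_sum) for g, c in counts.items()}


def top_variable_genes(norm_cells, genes, n_top):
    """Rank genes by variance across cells and keep the n_top highest.
    Plain variance is a simplification of dispersion-based selection."""
    var = {g: pvariance([cell.get(g, 0.0) for cell in norm_cells]) for g in genes}
    return sorted(genes, key=lambda g: var[g], reverse=True)[:n_top]
```

With the thresholds reported in the study, the calls would use `max_mito_frac=0.10`, `min_genes=200`, and `n_top=5000`.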

The dataset was split into 80–20 and 50–50 configurations to evaluate model performance and stability, with stratified sampling maintaining a balanced subtype distribution. XGBoost was tuned with a learning rate of 0.01, max depth of six, 100 estimators, and early stopping. GRU-LoRA used two bidirectional layers with 256 hidden units, dropout (0.3), and LoRA adaptation for efficiency. The Transformer model included four self-attention layers, 512 embedding dimensions, and ReLU activation with dropout (0.3).
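A stratified split of the kind described above can be illustrated as follows. This is a sketch under assumptions rather than the study's code: the function name, the fixed random seed, and the per-class rounding rule are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices into train and test sets so that each class
    contributes roughly the same fraction of its samples to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within the class before cutting
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```

Calling `stratified_split(labels, 0.2)` would correspond to the 80–20 configuration and `stratified_split(labels, 0.5)` to the 50–50 configuration.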

XGBoost Model

The XGBoost classifier is a gradient boosting model optimized for multiclass classification. It uses the multi:softmax objective, which applies a softmax transformation to the per-class scores, and applies L2 regularization to the tree leaf weights so that the loss function is minimized while model stability is maintained and overfitting is prevented.

The softmax function converts the raw class scores into class probabilities:

\begin{array}{rcl} P (y = k | x) = \frac{e^{f_{k} (x)}}{\sum_{j = 1}^{K} e^{f_{j} (x)}} \end{array}

The multiclass log loss is:

\begin{array}{rcl} \mathrm{mlogloss} = L = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{k = 1}^{K} y_{i, k} \log p_{i, k} \end{array}

The regularized objective adds the L2 and L1 penalties:

\begin{array}{rcl} L_{total} = \mathrm{mlogloss} + L2 + L1 = L + \lambda \sum_{j} \beta_{j}^{2} + \alpha \sum_{j} | \beta_{j} | \end{array}

Using Taylor approximation for tree optimization:

\begin{array}{rcl} g_{i} = \frac{\partial L}{\partial f (x_{i})}, \quad h_{i} = \frac{\partial^{2} L}{\partial f (x_{i})^{2}} \end{array}

Gradient-based optimal leaf weight for tree optimization:

\begin{array}{rcl} w_{j}^{*} = - \frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + \lambda} \end{array}

where $P (y = k | x)$ is the predicted probability that the input features x belong to class k, $f_{k} (x)$ is the unnormalized logit output by the model for class k, and K is the total number of classes. N is the number of samples, $y_{i, k}$ is 1 if sample i belongs to class k and 0 otherwise, $f (x_{i})$ and $p_{i, k}$ are the raw prediction score and the predicted probability for class k, $\lambda$ and $\alpha$ are the L2 and L1 regularization coefficients, $\beta_{j}$ are the leaf weights, $I_{j}$ is the set of samples assigned to leaf j, $w_{j}^{*}$ is the optimal weight of leaf j, and $g_{i}$ and $h_{i}$ are the first- and second-order derivatives of the loss used for tree optimization.

The XGBoost model minimizes a regularized objective function that integrates prediction error and model complexity. It typically uses multiclass log loss (mlogloss) as the loss function to evaluate how closely the predicted class probabilities align with the true labels. To prevent overfitting and enhance generalization, the model includes regularization terms: L1 regularization encourages sparsity by reducing the influence of less relevant features, while L2 regularization penalizes large weight values to maintain model stability. Optimization is performed using Taylor expansion, which incorporates both the first-order derivative (gradient) and the second-order derivative (Hessian) to guide efficient tree structure updates. The final output is a set of additive decision trees that progressively correct errors from previous iterations and achieve accurate prediction.
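The quantities in this objective can be made concrete with a small numerical sketch. It is illustrative only and is not XGBoost's internal implementation: the function names are hypothetical, the gradient and Hessian are the textbook derivatives of softmax cross-entropy with respect to the logits, and the leaf weight uses the L2-only closed form.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mlogloss(prob_rows, true_classes):
    """Mean negative log-probability assigned to the true class."""
    n = len(true_classes)
    return -sum(math.log(p[k]) for p, k in zip(prob_rows, true_classes)) / n


def grad_hess(logits, true_class):
    """First and second derivatives of the cross-entropy loss with
    respect to each logit: g_k = p_k - y_k and h_k = p_k * (1 - p_k)."""
    p = softmax(logits)
    g = [pk - (1.0 if k == true_class else 0.0) for k, pk in enumerate(p)]
    h = [pk * (1.0 - pk) for pk in p]
    return g, h


def optimal_leaf_weight(grads, hessians, lam):
    """Closed-form leaf weight -G / (H + lambda) for the samples in a leaf."""
    return -sum(grads) / (sum(hessians) + lam)
```

For two samples in one leaf with gradients of -0.5 each and Hessians of 0.25 each, a penalty of 1.0 gives a leaf weight of 1.0 / 1.5, which shows how the L2 term shrinks the update toward zero.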

GRU-LoRA Model

Gated Recurrent Unit (GRU) with Low-Rank Adaptation (LoRA) is a deep learning model for sequence-based genomic data analysis that improves computational efficiency while preserving model expressiveness. The GRU cell processes sequential inputs through gating mechanisms, where the update gate controls how much past information is retained, the reset gate determines how much of the prior hidden state influences the candidate, and the candidate activation generates new state information. The final hidden state is computed as a weighted combination of the past and candidate states [8].

The candidate hidden state is computed as:

\begin{array}{rcl} \tilde{h}_{t} = \tanh (W_{h} x_{t} + U_{h} (r_{t} \odot h_{t - 1}) + b_{h}) \end{array}

The updated hidden state is computed as:

\begin{array}{rcl} h_{t} = (1 - z_{t}) \odot h_{t - 1} + z_{t} \odot \tilde{h}_{t} \end{array}

where $z_{t}$ is the update gate, $r_{t}$ is the reset gate, $x_{t}$ is the input at step t, $h_{t - 1}$ is the previous hidden state, $\odot$ denotes element-wise multiplication, $W_{h}$ and $U_{h}$ are weight matrices, and $b_{h}$ is the bias parameter.
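A single GRU step following the two equations above can be sketched as below. The gate equations are not given in the text, so the sketch assumes the standard sigmoid gates for the update and reset signals; the function names and the parameter dictionary layout (`Wz`, `Uz`, `bz`, and so on) are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gate(W, U, b, x, h, act):
    """Compute act(W x + U h + b) element-wise."""
    return [act(wx + uh + bi) for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h_prev, p):
    """One GRU update: gates, candidate state, then the blended hidden state."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h_prev, sigmoid)  # update gate (assumed form)
    r = gate(p["Wr"], p["Ur"], p["br"], x, h_prev, sigmoid)  # reset gate (assumed form)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]         # r_t ⊙ h_{t-1}
    h_tilde = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)
    # h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
    return [(1.0 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]
```

With all parameters set to zero, both gates equal 0.5 and the candidate is zero, so the new hidden state is simply half of the previous one.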

LoRA enhances this process by decomposing weight updates into low-rank matrices, effectively reducing the number of trainable parameters while maintaining performance. This modification replaces full-rank weight updates with a low-rank factorization, where the weight update $\Delta W = AB$ is the product of two low-rank matrices A and B and introduces minimal additional computation. The final weight follows $W_{new} = W + \Delta W$, ensuring a lightweight yet robust architecture suitable for high-dimensional genomic sequence modeling.
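The parameter savings of the low-rank update can be illustrated with a short sketch. The function names and the rank value in the usage note are assumptions for illustration; the study does not report the rank it used, and practical LoRA implementations often include an additional scaling factor that is omitted here.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lora_effective_weight(W, A, B):
    """Return W + A B, where A is (d_out x r) and B is (r x d_in)."""
    delta = matmul(A, B)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for a full update versus a rank-r update."""
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank
```

For a 256 x 256 weight matrix and a hypothetical rank of 8, `lora_param_counts(256, 256, 8)` gives 65536 parameters for a full update against 4096 for the low-rank update.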

Transformer Model

The Transformer model is designed for high-dimensional genomic sequence classification using self-attention mechanisms to capture long-range dependencies within scRNA-seq data. Unlike recurrent models such as GRU, which process sequences step by step, the Transformer operates in parallel, making it more computationally efficient for high-dimensional features and sequence datasets. The model employed in the study begins with an input projection layer, where the input dimension is projected to a 768-dimensional space using a fully connected layer. This transformation is followed by Batch Normalization to stabilize training and improve convergence. A dropout layer is applied to prevent overfitting before feeding the processed input into the Transformer encoder. The encoder, with six stacked layers and multi-head self-attention, computes attention scores using query (Q), key (K), and value (V) matrices [10], [15].

The model assigns probability scores to each class using the SoftMax function, ensuring that all predicted outputs sum to one, enabling accurate classification:

\begin{array}{rcl} P (y = k | x) = \frac{e^{f_{k} (x)}}{\sum_{j = 1}^{K} e^{f_{j} (x)}} \end{array}

where $P (y = k | x)$ is the predicted probability that the input features x belong to class k, $f_{k} (x)$ is the unnormalized logit output by the model for class k, and K is the total number of classes.

The attention outputs are passed through a two-layer feed-forward network, applying ReLU activation to introduce non-linearity and improve feature extraction:

\begin{array}{rcl} \mathrm{FFN} (x) = \max (0, x W_{1} + b_{1}) W_{2} + b_{2} \end{array}

where x is the input feature vector, $W_{1}$ and $W_{2}$ are weight matrices, and $b_{1}$ and $b_{2}$ are bias parameters.
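The feed-forward computation above can be traced with a small sketch. The helper names are hypothetical, and x is treated as a row vector so that the products match the order written in the equation.

```python
def vecmat(x, W):
    """Multiply a row vector x (length d) by a matrix W (d rows)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def ffn(x, W1, b1, W2, b2):
    """Two-layer feed-forward network: max(0, x W1 + b1) W2 + b2."""
    hidden = [max(0.0, v + b) for v, b in zip(vecmat(x, W1), b1)]  # ReLU
    return [v + b for v, b in zip(vecmat(hidden, W2), b2)]
```

For example, with an identity first layer the input [1.0, -2.0] becomes [1.0, 0.0] after ReLU, so only the first hidden unit contributes to the output.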

Each head learns attention independently, and the results are concatenated and projected to the final representation. Multi-head attention allows the model to learn different attention representations for gene features by processing multiple attention heads in parallel:

\begin{array}{rcl} \mathrm{MultiHead} (Q, K, V) = \mathrm{Concat} (\mathrm{head}_{1}, \mathrm{head}_{2}, \dots, \mathrm{head}_{h}) W_{O} \end{array}
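The attention computation can be sketched as follows. The text does not give the per-head formula, so this assumes the standard scaled dot-product form, softmax(Q K^T / sqrt(d_k)) V. As a further simplification, each head here takes a column slice of Q, K, and V instead of applying its own learned projection; the function names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def attention(Q, K, V):
    """Scaled dot-product attention (assumed standard form)."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


def multi_head(Q, K, V, n_heads, W_O):
    """Run attention per head on column slices, concatenate, project by W_O."""
    d_h = len(Q[0]) // n_heads
    heads = []
    for h in range(n_heads):
        cols = slice(h * d_h, (h + 1) * d_h)
        heads.append(attention([r[cols] for r in Q],
                               [r[cols] for r in K],
                               [r[cols] for r in V]))
    concat = [sum((head[i] for head in heads), []) for i in range(len(Q))]
    return matmul(concat, W_O)
```

When all queries and keys are zero, every attention weight is uniform, so each output row is the column-wise mean of V; this gives a quick sanity check on the implementation.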

Optimization and Loss Computation

To address class imbalance, a weighted cross-entropy loss was used when training and testing the deep learning models, ensuring that minority classes contribute proportionally to the loss. The GRU-LoRA and Transformer models used the AdamW optimizer with a learning rate of 1e-4 and weight decay to improve generalization, while XGBoost used log loss optimization with L1 and L2 penalties to reduce overfitting. Fig. 1 shows the data analysis workflow for classifying the breast cancer and non-cancer labels [15].
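The effect of class weighting on the loss can be illustrated with a short sketch. The study does not state how its class weights were derived, so inverse-frequency weighting is an assumption here, as are the function names; the loss is normalized by the sum of the weights of the observed labels, which is one common convention for a weighted mean.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight_c = n / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(prob_rows, labels, weights):
    """Weighted mean of -log p(true class), normalized by the total weight.
    Each row in prob_rows maps class name -> predicted probability."""
    num = sum(-weights[y] * math.log(p[y]) for p, y in zip(prob_rows, labels))
    den = sum(weights[y] for y in labels)
    return num / den
```

With three Normal cells and one BRCA1 cell, the BRCA1 weight is three times the Normal weight, so an error on the single rare-class cell counts as much as errors on all three majority-class cells combined.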

Performance Metrics

To evaluate the effectiveness of the models in breast cancer subtype classification using single-cell RNA sequencing (scRNA-seq) data, multiple performance metrics were employed. These metrics are accuracy, precision, recall, and F1-score, which together provide a comprehensive assessment of model performance:

\begin{array}{rcl} \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \end{array}

\begin{array}{rcl} \mathrm{Precision} = \frac{TP}{TP + FP} \end{array}

\begin{array}{rcl} \mathrm{Recall} = \frac{TP}{TP + FN} \end{array}

\begin{array}{rcl} \mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \end{array}
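For a multi-class problem these metrics are typically computed per class in a one-vs-rest fashion, treating one subtype as positive and all others as negative. The sketch below illustrates that computation; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def per_class_metrics(y_true, y_pred, positive):
    """One-vs-rest accuracy, precision, recall, and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

A class with perfect precision but one missed case out of three, for instance, has a recall of 2/3 and an F1-score of 0.8, which shows how F1 penalizes the missed rare-class cells that accuracy alone can hide.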

Validation Metrics

To ensure the robustness and generalizability of the models in classifying breast cancer subtypes from single-cell RNA sequencing (scRNA-seq) data, five-fold cross-validation was performed, followed by a paired t-test to compare accuracy, and the F1-score on a minority class such as BRCA1, among the XGBoost, GRU-LoRA, and Transformer models. Five-fold cross-validation is a widely used technique in machine learning to assess model performance while reducing the risk of overfitting. The dataset is divided into five equal folds, and each model is trained on four folds and tested on the remaining fold. This process is repeated five times, so that each fold serves as the test set exactly once. The final accuracy is computed as the mean of the five validation results. The paired t-test is used to determine whether the differences in performance between the deep learning models (GRU-LoRA, Transformer) and the classical model (XGBoost) are statistically significant [15].
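The validation procedure can be sketched as below. This is a simplified illustration under assumptions: the fold assignment is a plain interleaved split without shuffling or stratification, and the function names are hypothetical. The paired t-statistic is the mean of the per-fold score differences divided by its standard error, with n - 1 degrees of freedom; the p-value would then be read from Student's t distribution.

```python
import math
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists so each fold is the test set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic and degrees of freedom for matched fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Here `scores_a` and `scores_b` would hold the per-fold accuracy (or BRCA1 F1-score) of two models evaluated on the same folds, which is what makes the comparison paired.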

Results

The classification reports for both the 80–20 and 50–50 data splits demonstrate the comparative performance of the XGBoost, GRU-LoRA, and Transformer models in predicting breast cancer subtypes. Tables I and II compare the performance metrics of the deep learning models (GRU-LoRA and Transformer) and XGBoost, particularly in terms of F1-score for BRCA1 classification. In the 80–20 split, the Transformer model achieved the highest overall accuracy, followed closely by GRU with Low-Rank Adaptation (LoRA), while XGBoost displayed lower precision and recall, indicating challenges in handling high-dimensional scRNA-seq data. Similarly, in the 50–50 split, both deep learning models maintained superior predictive performance, reinforcing their robustness in breast cancer subtype classification [16].

Table I. Model Comparison (50–50)
Model	Class	Precision	Recall	F1-score
XGBoost	BRCA	0.82	0.67	0.75
XGBoost	ER	0.86	0.90	0.88
XGBoost	HER2	0.81	0.76	0.78
XGBoost	Normal	0.93	0.96	0.94
XGBoost	TNBC	0.84	0.80	0.82
Transformer	BRCA	0.90	0.86	0.88
Transformer	ER	0.93	0.93	0.93
Transformer	HER2	0.88	0.88	0.88
Transformer	Normal	0.97	0.97	0.97
Transformer	TNBC	0.90	0.89	0.90
GRU-LoRA	BRCA	0.91	0.88	0.89
GRU-LoRA	ER	0.94	0.94	0.94
GRU-LoRA	HER2	0.89	0.90	0.89
GRU-LoRA	Normal	0.97	0.98	0.98
GRU-LoRA	TNBC	0.90	0.90	0.90

Table II. Model Comparison (50–50)
Model	Class	Precision	Recall	F1-score
XGBoost	BRCA	0.82	0.67	0.74
XGBoost	ER	0.86	0.90	0.88
XGBoost	HER2	0.80	0.76	0.78
XGBoost	Normal	0.92	0.96	0.94
XGBoost	TNBC	0.84	0.79	0.82
Transformer	BRCA	0.88	0.81	0.85
Transformer	ER	0.90	0.94	0.92
Transformer	HER2	0.89	0.82	0.86
Transformer	Normal	0.95	0.97	0.96
Transformer	TNBC	0.89	0.88	0.88
GRU-LoRA	BRCA	0.88	0.88	0.88
GRU-LoRA	ER	0.92	0.95	0.94
GRU-LoRA	HER2	0.90	0.87	0.88
GRU-LoRA	Normal	0.97	0.97	0.97
GRU-LoRA	TNBC	0.91	0.88	0.89

The ROC-AUC scores in Fig. 2 compare the predictive performance of XGBoost, Transformer, and GRU-LoRA models across two data splits: 80–20 (training-testing) and 50–50 (balanced split). These scores measure how well each model distinguishes between different breast cancer subtypes and normal samples. The results indicate that both Transformer and GRU-LoRA models significantly outperform XGBoost in classifying breast cancer subtypes using scRNA-seq data. GRU-LoRA achieved the highest ROC-AUC scores across all subtypes, demonstrating its ability to effectively capture sequential gene expression patterns [17], [18].

Fig. 3 presents a line chart of accuracy comparing the performance of GRU with LoRA, Transformer, and XGBoost across multiple test runs. The chart illustrates the trend of accuracy improvements over epochs, highlighting how deep learning models, GRU and Transformer, achieve higher accuracy compared to the classical XGBoost model. While XGBoost shows stable performance, its accuracy plateaus earlier, whereas GRU with LoRA demonstrates steady improvement, benefiting from low-rank adaptation (LoRA) for parameter-efficient fine-tuning. The Transformer model also achieves high accuracy, leveraging self-attention mechanisms for learning intricate relationships in scRNA-seq data. The visualization in Fig. 3 reinforces the study’s findings that deep learning models outperform classical methods in classifying breast cancer subtypes with greater precision and reliability [19].

To statistically validate the performance differences, a t-test was conducted to compare the accuracy and F1-scores of the models. As shown in Table III, the p-values for the accuracy comparisons between GRU-LoRA and XGBoost, as well as between Transformer and XGBoost, were below 0.05, confirming that the observed differences were statistically significant. Additionally, to evaluate model performance on underrepresented classes, particularly BRCA1, the F1-scores from both the 80–20 and 50–50 data splits were analyzed. The comparison between the deep learning models (GRU-LoRA and Transformer) and the classical XGBoost model was conducted using a paired t-test to determine statistical significance.

Table III. T-Test Results for Accuracy Comparison
Comparison	T-statistic	P-value
XGBoost vs. GRU-LoRA (80–20)	52.3022	1.9792e−11
XGBoost vs. Transformer (80–20)	38.2234	2.4102e−10
XGBoost vs. GRU-LoRA (50–50)	55.7338	1.1919e−11
XGBoost vs. Transformer (50–50)	34.3837	5.5956e−10

The results in Table III show that the deep learning models (Transformer and GRU-LoRA) significantly outperform XGBoost in the five-fold test accuracy evaluation.

Table IV shows that the GRU-LoRA model exhibited the highest t-statistic (18.8698, p-value = 2.7967e−03) across both data splits for F1-score, indicating a substantial improvement over XGBoost in correctly classifying BRCA1 cases. This suggests that GRU-LoRA’s ability to model sequential dependencies and adapt low-rank parameter updates enhances its robustness, particularly in handling sparse scRNA-seq data, and offers superior predictive accuracy and reliability in breast cancer subtype classification [20].

Table IV. T-Test Results for Minority-Class (BRCA1) F1-Score Comparison
Comparison (F1-Score)	T-statistic	P-value
XGBoost vs. GRU-LoRA (BRCA1)	18.8698	2.7967e−03
XGBoost vs. Transformer (BRCA1)	16.6825	3.5739e−03

Conclusion

This experimental study encapsulates key insights into the comparative analysis of deep learning and classical machine learning methodologies for single-cell RNA sequencing (scRNA-seq) data analysis, specifically for breast cancer subtype classification. The study demonstrated the superior predictive performance of deep learning models in handling high-dimensional genomic data, outperforming classical machine learning approaches.

The study sought to determine whether deep learning architectures such as Gated Recurrent Units (GRU) and Transformers significantly improved predictive accuracy compared to traditional algorithms like XGBoost. The above findings confirmed this hypothesis, illustrating that deep learning models consistently yielded higher accuracy, precision, recall, and F1 scores across evaluation metrics. The results further indicated that models incorporating attention mechanisms, such as Transformers, were particularly effective in capturing long-range dependencies within gene expression data, while GRU-LoRA demonstrated efficiency in optimizing parameter updates [21].

Despite the evident advantages of deep learning models, certain limitations are acknowledged. These include increased computational requirements, model interpretability challenges, and the potential for overfitting in cases of limited data availability. Additionally, although a weighted loss function was applied for class balancing and dropout for regularization, handling highly imbalanced datasets remains an ongoing challenge in this study.

The implications of this study extend beyond model performance comparisons, offering insights into the practical applications of deep learning in genomic research. The ability to accurately classify breast cancer subtypes holds great promise for precision oncology, facilitating improved diagnostic workflows, patient stratification, and targeted treatment planning. Future research should further focus on integrating multi-modal data sources, such as proteomics and epigenomics, to enhance predictive accuracy. Additionally, exploring hybrid models that combine classical approaches’ interpretability with deep networks’ learning efficiency may offer a balanced solution for real-world applications [22].

This experimental quantitative study highlights the transformative potential of deep learning in computational biology and cancer genomics. This study demonstrates its advantages over classical machine learning methods and contributes to the ongoing advancement of artificial intelligence applications in genomic research. Expanding these methodologies to larger and more diverse datasets, optimizing model interpretability, and integrating domain-specific knowledge will be crucial steps toward fully leveraging deep learning for future cancer diagnostics and precision medicine breakthroughs.

References

[1] Abdelwahab MM, Al-Karawi KA, Semary HE. Deep learning-based prediction of Alzheimer’s disease using microarray gene expression data. Biomedicines. 2023;11(12):3304.

[2] Alhusari K, Dhou S. Machine learning-based approaches for breast density estimation from mammograms: a comprehensive review. J Imaging. 2025;11(2):38.

[3] Athaya T, Ripan RC, Li X, Hu H. Multimodal deep learning approaches for single-cell multi-omics data integration. Brief Bioinform. 2023;24(5):bbad313.

[4] Xiong FX, Sun L, Zhang XJ, Chen JL, Zhou Y, Ji XM, et al. Machine learning-based models for advanced fibrosis in non-alcoholic steatohepatitis patients: a cohort study. World J Gastroenterol. 2025;31(9):101383.

[5] Pullin JM, McCarthy DJ. A comparison of marker gene selection methods for single-cell RNA sequencing data. Genome Biol. 2024;25(1):56.

[6] Javanmard Z, Zarean Shahraki S, Safari K, Omidi A, Raoufi S, Rajabi M, et al. Artificial intelligence in breast cancer survival prediction: a comprehensive systematic review and meta-analysis. Front Oncol. 2024;14:1420328.

[7] Chuang KC, Cheng PS, Tsai YH, Tsai MH. Establishing a GRU-GCN coordination-based prediction model for miRNA-disease associations. BMC Genom Data. 2025;26(1):4.

[8] Weng G, Martin P, Kim H, Won KJ. Integrating prior knowledge using transformer for gene regulatory network inference. Adv Sci (Weinh). 2025;12(3):e2409990.

[9] Flores M, Liu Z, Zhang T, Hasib MM, Chiu YC, Ye Z, et al. Deep learning tackles single-cell analysis-a survey of deep learning for scRNA-seq analysis. Brief Bioinform. 2022;23(1):bbab531.

[10] Yao Y, Xu Y, Zhang Y, Gui Y, Bai Q, Zhu Z, et al. Single cell inference of cancer drug response using pathway-based transformer network. Small Methods. 2025 Feb 17:e2400991.

[11] Wan M, Pan S, Shan B, Diao H, Jin H, Wang Z, et al. Lipid metabolic reprograming: the unsung hero in breast cancer progression and tumor microenvironment. Mol Cancer. 2025;24(1):61.

[12] Nasser M, Yusof UK. Deep learning based methods for breast cancer diagnosis: a Systematic review and future direction. Diagnostics (Basel). 2023;13(1):161.

[13] Ikushima H, Watanabe K, Shinozaki-Ushiku A, Oda K, Kage H. A machine learning-based analysis of nationwide cancer comprehensive genomic profiling data across cancer types to identify features associated with recommendation of genome-matched therapy. ESMO Open. 2024;9(12):103998.

[14] Arbatsky M, Vasilyeva E, Sysoeva V, Semina E, Saveliev V, Rubina K. Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation. Front Bioinform. 2025;5:1519468.

[15] Mohammadi H, Baranpouyan M, Thirunarayan K, Chen L. HyperCell: advancing cell type classification with hyperdimensional computing. Annu Int Conf IEEE Eng Med Biol Soc. 2024;2024:1–4.

[16] Huang N, Nie F, Ni P, Luo F, Gao X, Wang J. NeuralPolish: a novel Nanopore polishing method based on alignment matrix construction and orthogonal Bi-GRU Networks. Bioinformatics. 2021;37(19):3120–7.

[17] Iu X, Li B, Lin Y, Ma X, Liu Y, Ma L, et al. Exploring the shared gene signatures and mechanism among three autoimmune diseases by bulk RNA sequencing integrated with single-cell RNA sequencing analysis. Front Mol Biosci. 2024;11:1520050.

[18] Zhang H, Xiong X, Cheng M, Ji L, Ning K. Deep learning enabled integration of tumor microenvironment microbial profiles and host gene expressions for interpretable survival subtyping in diverse types of cancers. mSystems. 2024;9(12):e0139524.

[19] Yang S, Wang Z, Wang C, Li C, Wang B. Comparative evaluation of machine learning models for subtyping triple-negative breast cancer: a deep learning-based multi-omics data integration approach. J Cancer. 2024;15(12):3943–57.

[20] Saleh H, Abd-El Ghany SF, Alyami H, Alosaimi W. Predicting breast cancer based on optimized deep learning approach. Comput Intell Neurosci. 2022;2022:1820777.

[21] Lin E, Mukherjee S, Kannan S. A deep adversarial variational autoencoder model for dimensionality reduction in single-cell RNA sequencing analysis. BMC Bioinformatics. 2020;21(1):64.

[22] Nie Z, Gao M, Jin X, Rao Y, Zhang X. MFPINC: prediction of plant ncRNAs based on multi-source feature fusion. BMC Genomics. 2024;25(1):531.

How to Cite

2025. Enhancing Precision Oncology: Deep Learning Models vs. Classical Machine Learning Models in Multi-Label Breast Cancer Classification. European Journal of Electrical Engineering and Computer Science. 9, 3 (May 2025), 16–22. DOI:https://doi.org/10.24018/ejece.2025.9.3.711.

Issue

Vol. 9 No. 3 (2025)

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Google Scholar

[13] Ikushima H, Watanabe K, Shinozaki-Ushiku A, Oda K, Kage H. A machine learning-based analysis of nationwide cancer com- prehensive genomic profiling data across cancer types to identify features associated with recommendation of genome-matched therapy. ESMO Open. 2024;9(12):103998.
Google Scholar

[14] Arbatsky M, Vasilyeva E, Sysoeva V, Semina E, Saveliev V, Rubina K. Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation. Front Bioinform. 2025;5:1519468.
Google Scholar

[15] Mohammadi H, Baranpouyan M, Thirunarayan K, Chen L. HyperCell: advancing cell type classification with hyperdimensional computing. Annu Int Conf IEEE Eng Med Biol Soc. 2024;2024:1–4.
Google Scholar

[16] Huang N, Nie F, Ni P, Luo F, Gao X, Wang J. NeuralPolish: a novel Nanopore polishing method based on alignment matrix construction and orthogonal Bi-GRU Networks. Bioinformatics. 2021;37(19):3120–7.
Google Scholar

[17] Iu X, Li B, Lin Y, Ma X, Liu Y, Ma L, et al. Exploring the shared gene signatures and mechanism among three autoimmune diseases by bulk RNA sequencing integrated with single-cell RNA sequencing analysis. Front Mol Biosci. 2024;11:1520050.
Google Scholar

[18] Zhang H, Xiong X, Cheng M, Ji L, Ning K. Deep learning enabled integration of tumor microenvironment microbial profiles and host gene expressions for interpretable survival subtyping in diverse types of cancers. mSystems. 2024;9(12):e0139524.
Google Scholar

[19] Yang S, Wang Z, Wang C, Li C, Wang B. Comparative evaluation of machine learning models for subtyping triple-negative breast cancer: a deep learning-based multi-omics data integration approach. J Cancer. 2024;15(12):3943–57.
Google Scholar

[20] Saleh H, Abd-El Ghany SF, Alyami H, Alosaimi W. Predicting breast cancer based on optimized deep learning approach. Comput Intell Neurosci. 2022;2022:1820777.
Google Scholar

[21] Lin E, Mukherjee S, Kannan S. A deep adversarial variational autoencoder model for dimensionality reduction in single-cell RNA sequencing analysis. BMC Bioinformatics. 2020;21(1):64.
Google Scholar

[22] Nie Z, Gao M, Jin X, Rao Y, Zhang X. MFPINC: prediction of plant ncRNAs based on multi-source feature fusion. BMC Genomics. 2024;25(1):531.
Google Scholar