
Post-HCT Survival Prediction for Clinical Risk Stratification

Client

Academic Research Team

Industry

Healthcare

Timeline

4 weeks

Type

ML & Predictive Analytics

Overview

A university research team collaborated with us to build a practical machine learning workflow for post-HCT (hematopoietic cell transplantation) survival prediction. The objective was to move from exploratory analysis to a reproducible modeling process that technical reviewers could trust. In a 4-week sprint, we delivered a full pipeline for data cleaning, feature engineering, model benchmarking, and validation, with a best holdout result of 0.761 ROC-AUC.

Challenge

The project started with a real healthcare machine learning problem: useful clinical features existed, but the modeling workflow was not stable enough for credible comparison.

  • Clinical and transplant variables had missing values and mixed data types.
  • Handling of categorical features had a direct impact on downstream model quality.
  • Single-model testing was not enough for a defensible survival prediction benchmark.
  • The team needed interpretable outputs for technical stakeholders, not only a final score.

Without a structured benchmark, the work risked becoming another one-off notebook result rather than a reusable clinical predictive analytics artifact.

Solution

We implemented a reproducible post-HCT risk modeling pipeline with clear decisions at each stage.

  • Architecture pattern: Sequential workflow across preprocessing, feature engineering, model training, validation, and interpretation.
  • Data strategy: Removed non-core features to keep the modeling scope tight, then standardized missing-value handling for both numerical and categorical signals.
  • Feature strategy: Discretized select numerical features into bins, then encoded categorical features so every model could train on consistent inputs.
  • Model strategy: Benchmarked LightGBM, CatBoost, AdaBoost, Random Forest, XGBoost, and Naive Bayes under consistent train/test and cross-validation conditions (sketched after this list).
  • Evaluation strategy: Used standard classification metrics plus a domain-focused custom metric combining class-1 recall, class-0 precision, and ROC-AUC.
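
To make these conditions concrete, here is a minimal sketch of the benchmarking loop. The synthetic dataset, split sizes, and near-default hyperparameters are stand-ins, not the engagement's actual configuration.

```python
# Benchmarking sketch: six model families, one shared stratified split.
# The synthetic dataset is a stand-in for the real post-HCT features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # one fixed seed throughout
)

models = {
    "LightGBM": LGBMClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(random_state=42, verbose=0),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42, eval_metric="logloss"),
    "Naive Bayes": GaussianNB(),
}

# Every model sees exactly the same split and is scored with the same metric.
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name:>13}: holdout ROC-AUC = {roc_auc_score(y_test, proba):.4f}")
```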

Tradeoff: we prioritized model reliability and interpretability for technical review over building a production API in this phase.

Key Features

  1. End-to-end post-HCT survival prediction pipeline
  2. Missing-value handling for mixed clinical feature types
  3. Categorical-aware model benchmarking across six algorithms
  4. Holdout and 5-fold cross-validation evaluation workflow
  5. Custom metric design aligned with asymmetric classification risk (sketched below)
  6. Explainability layer using feature importance and SHAP outputs
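
A sketch of the custom metric from feature 5. The source does not specify the component weighting, so an unweighted mean is assumed here, and composite_risk_metric is an illustrative name.

```python
# Composite metric sketch: class-1 recall, class-0 precision, ROC-AUC.
# The unweighted mean is an assumption; the project's weighting may differ.
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def composite_risk_metric(y_true, y_proba, threshold=0.5):
    """Reward catching high-risk cases (class-1 recall), trustworthy
    low-risk calls (class-0 precision), and overall ranking (ROC-AUC)."""
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    recall_1 = recall_score(y_true, y_pred, pos_label=1)
    precision_0 = precision_score(y_true, y_pred, pos_label=0)
    auc = roc_auc_score(y_true, y_proba)
    return (recall_1 + precision_0 + auc) / 3.0

# Toy example: a perfectly separated case scores 1.0.
print(composite_risk_metric([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```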

Technical Implementation

Backend & Infrastructure

The implementation was built in Jupyter notebooks with an artifact-first workflow across raw, interim, and processed datasets, which made the experimentation process reproducible and easier to audit. The final training and evaluation flow was deterministic, with a fixed random-state configuration.
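
A minimal sketch of that artifact-first convention, with illustrative directory and file names: each stage reads the previous stage's output from disk and writes its own, so any step can be re-run and audited in isolation.

```python
# Artifact-first layout sketch; paths and file names are illustrative.
from pathlib import Path
import pandas as pd

RANDOM_STATE = 42  # single fixed seed shared by all splits and models
RAW = Path("data/raw")
INTERIM = Path("data/interim")
PROCESSED = Path("data/processed")

def clean_stage() -> None:
    """Stage 1: read raw data, write a cleaned interim artifact."""
    df = pd.read_csv(RAW / "train.csv")
    df = df.drop_duplicates()  # example cleaning step
    INTERIM.mkdir(parents=True, exist_ok=True)
    df.to_csv(INTERIM / "train_clean.csv", index=False)

def feature_stage() -> None:
    """Stage 2: read the interim artifact, write model-ready features."""
    df = pd.read_csv(INTERIM / "train_clean.csv")
    # ... feature engineering happens here ...
    PROCESSED.mkdir(parents=True, exist_ok=True)
    df.to_csv(PROCESSED / "train_features.csv", index=False)
```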

Data & AI Components

The pipeline used approximately 28,800 labeled training records after preprocessing.

Core implementation details (condensed in the sketch after this list):

  • Removed non-modeling columns early to reduce noise.
  • Filled and encoded missing values with explicit strategy instead of silent drops.
  • Converted age features into bins for stronger categorical signal handling.
  • Trained and compared six model families using aligned data splits and metrics.
  • Added cross-validation summaries for stability checks.
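
Those details condense into a preprocessing function along these lines; column names, bin edges, and fill strategies are placeholders rather than the dataset's actual schema.

```python
# Preprocessing sketch; column names and bin edges are placeholders.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop non-modeling columns early to reduce noise.
    df = df.drop(columns=["ID"], errors="ignore")

    # Explicit missing-value strategy: median for numeric columns,
    # an explicit "Missing" category for categoricals (no silent drops).
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.select_dtypes(exclude="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    df[cat_cols] = df[cat_cols].astype("object").fillna("Missing")

    # Discretize age into bins for stronger categorical signal handling.
    if "age" in df.columns:
        df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 40, 60, 120],
                               labels=["0-18", "18-40", "40-60", "60+"])
        df = df.drop(columns=["age"])

    # One-hot encode whatever is non-numeric at this point.
    final_cats = df.select_dtypes(exclude="number").columns
    return pd.get_dummies(df, columns=list(final_cats))
```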

Frontend & User Experience

This was an analytics-first engagement, so the primary UX value was for technical users reviewing model behavior. Outputs were organized for fast comparison, metric traceability, and interpretation rather than visual dashboard polish.
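
The interpretation outputs themselves are a few lines of SHAP on top of a fitted model. This sketch assumes a tree-based model and a held-out feature frame (model, X_test) like those in the benchmarking snippet above.

```python
# Explainability sketch: SHAP values for a fitted tree-based model.
# `model` and `X_test` are assumed from the benchmarking step above.
import shap

explainer = shap.TreeExplainer(model)        # supports CatBoost/LightGBM/XGBoost
shap_values = explainer.shap_values(X_test)  # per-feature contribution per row
shap.summary_plot(shap_values, X_test)       # global view for technical review
```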

Security & Reliability

Reliability came from consistent splits, repeatable preprocessing artifacts, and explicit metric definitions across all models.
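
The stability side of that claim is a standard 5-fold check, sketched here under the same synthetic-data assumptions as the earlier snippets.

```python
# 5-fold cross-validation stability check with a fixed seed.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    CatBoostClassifier(random_state=42, verbose=0), X, y, cv=cv, scoring="roc_auc"
)
print(f"5-fold ROC-AUC: mean={scores.mean():.4f}, std={scores.std():.4f}")
```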

Results

  • 0.761 ROC-AUC achieved on holdout evaluation with CatBoost.
  • 0.7507 ROC-AUC (5-fold mean) demonstrated stable validation performance.
  • ~5.2 percentage points of ROC-AUC improvement over lower-performing benchmark models in the same workflow.
  • Delivered a credible baseline for clinical risk stratification and future healthcare ML iterations.

Technology Stack

  • AI/ML: CatBoost, LightGBM, XGBoost, AdaBoost, SHAP, scikit-learn
  • Backend: Python, pandas, NumPy, mlxtend
  • Frontend: Plotly, Matplotlib, Seaborn
  • Infrastructure: Jupyter notebooks

Interested in Similar Results?

Let's discuss how we can craft a custom solution for your business challenges.

Chat on WhatsApp

Quick response guaranteed