Vision Foundation Models for Computed Tomography


Abstract

Foundation models (FMs) have shown transformative potential in radiology by performing diverse, complex tasks across imaging modalities. Here, we developed CT-FM, a large-scale 3D image-based pre-trained model designed explicitly for various radiological tasks. CT-FM was pre-trained using 148,000 computed tomography (CT) scans from the Imaging Data Commons through label-agnostic contrastive learning. We evaluated CT-FM across four categories of tasks, namely, whole-body segmentation, tumor segmentation, head CT triage, and medical image retrieval, outperforming both baseline and state-of-the-art models. Beyond quantitative success, CT-FM demonstrated the ability to cluster regions anatomically and identify similar anatomical and structural concepts across scans. Furthermore, it remained robust across test-retest settings and highlighted plausible salient regions underlying its embeddings. This study demonstrates the value of large-scale medical imaging foundation models and, by open-sourcing the model weights, code, and data, aims to support more adaptable, reliable, and interpretable AI solutions in radiology.
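To illustrate the label-agnostic contrastive pre-training mentioned above, the sketch below implements a SimCLR-style InfoNCE objective in NumPy. This is a minimal illustration, not the authors' actual training code: the function name, temperature value, and batch shapes are assumptions, and in practice the embeddings would come from a 3D encoder applied to two augmented views of the same CT volume.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """SimCLR-style InfoNCE loss over two views of the same batch.

    z1, z2: (N, D) embeddings of two augmented views; row i of z1 and
    row i of z2 are assumed to come from the same scan (positive pair),
    while all other rows act as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix

    # Positives sit on the diagonal; each row is an N-way classification
    # problem, so we take cross-entropy against the row index.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the loss is driven only by which pairs of views originate from the same scan, no manual labels are required, which is what makes pre-training at the scale of 148,000 scans feasible.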


Publication

Pai S., Hadzic I., et al. Vision Foundation Models for Computed Tomography; Submitted; 2025



Dataset

Our pre-training dataset is sourced from the Imaging Data Commons repository and contains 148,000 CT scans from 81,148 studies and 32,643 patients. Scans were selected using quality-based criteria and span 69 different cohorts with varying inclusion requirements. The NCI Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data, co-located with analysis and exploration tools and resources.


Code, Models and Resources

All our code is publicly available on GitHub, and our models are hosted on Hugging Face with detailed usage instructions and permissive licenses. The code includes pipelines for data preprocessing, pre-training of CT-FM, transfer learning across all demonstrated use cases, and evaluation scripts to generate comparison metrics. The training framework is implemented using our in-house open-source framework, Lighter, a YAML-first configuration system for deep learning pipelines.



AIM Investigators

Acknowledgements

The authors acknowledge financial support from NIH (H.J.W.L.A: NIH-USA U24CA194354, NIH-USA U01CA190234, NIH-USA U01CA209414, NIH-USA R35CA22052, and NIH-USA U54CA274516-01A1) and the European Union - European Research Council (H.J.W.L.A: 866504). This work also used GPUs provided by Jetstream 2 through allocation CIS240307 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by U.S. National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.