:probabl.blog

Open science is powering the Tabular Foundation Models revolution

Written by Gaël Varoquaux | Tuesday, June 9 2026

By Gaël Varoquaux, CSO of Probabl & Research Director at Inria
& Cailean Osborne, Head of Ecosystem Development of Probabl

TL;DR: Tabular foundation models (TFMs) have emerged as the best-performing tool for many use cases in data science, achieving state-of-the-art performance on tabular AI benchmarks straight out of the box. The frontier has been advancing fast, and open science is the reason why. TFM researchers and developers across labs and countries are openly releasing research papers, technical reports, pre-training code, model weights, and pip-installable Python libraries. These, in turn, allow others to use, study, experiment with, modify, build on top of, and redistribute each other’s work. This is the open science flywheel in action, and it’s propelling the entire field forward.

Open science is the flywheel of tabular AI

For tabular data, the data scientist’s toolkit has long been well established: gradient-boosted trees and random forests have dominated benchmarks and production pipelines, and for good reason. They are robust, performant, and battle-tested on structured data – that is, data that lives in tables and databases, the bread and butter of science and enterprise. Neural networks, for all their success in vision and language, have historically struggled to match them here.

TFMs are a new, revolutionary option for data scientists, and a high-performing one for making predictions on structured data. They are models pre-trained specifically for structured data: they acquire general statistical patterns across thousands or millions of synthetic datasets, then transfer these patterns to new datasets at inference time, often without any fine-tuning at all [1].

If you’re optimizing for performance, TFMs are the highest-performing option available to data scientists for many use cases.¹ The evidence is on the scoreboard: TabArena, a rigorous open benchmark hosted on Hugging Face, shows TFMs consistently outperforming tuned gradient-boosted tree ensembles in Elo score across a range of small and medium-sized datasets [2].


Figure 1: Leaderboard for 15 datasets (medium datasets, all tasks) including all (imputed) models. Medium datasets contain between 10,000 and 250,000 samples. Source: https://huggingface.co/spaces/TabArena/leaderboard
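For readers unfamiliar with Elo: it is a rating system borrowed from chess, in which each head-to-head comparison between two competitors shifts their ratings according to how surprising the outcome was. The sketch below shows the classic sequential Elo update, assuming the standard logistic form with a 400-point scale and a K-factor of 32, and treating one model beating another on one dataset as a single "game". It illustrates the idea only; TabArena's actual rating procedure may differ (for example, by fitting all ratings jointly rather than updating them one comparison at a time).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated models; A wins on one dataset.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
# A heavily favored model gains only a little from an expected win.
print(elo_update(1400.0, 1000.0, 1.0))
```

An upset against a much higher-rated model moves ratings far more than an expected win, so a high Elo reflects consistently beating strong competitors.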

Open science turboboosts the development of ever-better TFMs. What makes the TFM space particularly impressive is the speed of scientific progress and innovation. In under four years, the field has moved from a proof of concept to production-ready models that outperform heavily tuned classical baselines.

In July 2022, the TabPFN v1 paper came out, introducing the prior-fitted networks paradigm and demonstrating that transformers could do in-context learning on tabular data [3]. By January 2025, TabPFN v2 had been published in Nature, with more sophisticated simulated data for pre-training [4]. TabICLv1 followed in February 2025, scaling the approach to larger datasets with a more sophisticated architecture [5]. In parallel, TabDPT [13] explored the benefit of blending simulated and real data for pre-training.
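The prior-fitted networks idea can be written as a single pre-training objective. The formulation below is a sketch of the general PFN setup, not the exact loss of any particular model: draw a synthetic dataset D from a prior, split it into a training context D_train and held-out points (x_test, y_test), and train the network q_θ to predict the held-out labels from the context.

```latex
\mathcal{L}(\theta) = \mathbb{E}_{D \sim p(\mathcal{D})}
  \Big[ -\log q_\theta\big(y_{\text{test}} \mid x_{\text{test}}, D_{\text{train}}\big) \Big],
\qquad D = D_{\text{train}} \cup \{(x_{\text{test}}, y_{\text{test}})\}
```

Because this is a cross-entropy over datasets drawn from the prior, its minimizer (given enough model capacity) is the posterior predictive distribution p(y_test | x_test, D_train) under that prior. At inference time, a new real dataset simply becomes the context, and one forward pass yields the prediction, with no gradient updates.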

In November 2025, TabPFN 2.5 pushed the state of the art further, notably by also blending simulated and real data [6]. TabICLv2 reclaimed the throne just three months later, refining both architecture and pre-training and achieving state-of-the-art results while remaining efficient enough to run on a CPU [7]. Then TabPFN 3 arrived in May 2026, switching to an architecture close to that of TabICL and raising the bar yet again [8]. In parallel, TabH2O built on TabICL to simplify deployment, with reduced memory usage and a single model for both regression and classification [14].

This incredible speed of innovation is in large part thanks to open science. TFM developers across institutions and countries are openly releasing research papers, technical reports, code, model weights, and pip-installable Python libraries, which in turn allow others to use, study, experiment with, modify, build on top of, and redistribute each other’s work. This is the open science flywheel in action, and it’s powering the entire field and industry forward.

Below, we explain what we mean by this and why it matters, using two state-of-the-art TFMs as examples: TabICL and TabPFN.

Open science in action: The example of TabICL

TabICL is a TFM developed by the SODA team at Inria, in particular Jingang Qu, David Holzmüller, Marine Le Morvan, and myself (Gaël).

TabICL achieves state-of-the-art performance out of the box. TabICL was pre-trained on millions of synthetic datasets [5, 7] and uses in-context learning to learn from new data in a single forward pass through a Transformer model: y_pred = model(X_train, y_train, X_test). It does not require hyperparameter tuning, and it achieves state-of-the-art performance for tabular classification and regression on the TabArena [2] and TALENT benchmarks [11]. Out of the box, it outperforms heavily tuned XGBoost, CatBoost, or LightGBM models on ~80% of datasets on TabArena [7].
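To make y_pred = model(X_train, y_train, X_test) concrete, here is a deliberately tiny stand-in written with NumPy. It is not TabICL: it has no learned weights and uses one fixed softmax-attention step, in which each test row attends over the training rows and the attention weights mix the training labels. A real TFM replaces that fixed similarity with many transformer layers whose weights were learned during pre-training. What the sketch does share with a TFM is the calling convention: the training data is an input to prediction, and nothing is fitted to it by gradient descent. The function name and the temperature parameter are illustrative.

```python
import numpy as np


def in_context_predict(X_train, y_train, X_test, temperature=1.0):
    """Toy in-context classifier: one fixed softmax-attention step.

    Each test row (query) attends over all training rows (keys); the
    attention weights mix the one-hot training labels (values) into class
    probabilities. No weights are learned, unlike in a real TFM.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    classes, y_idx = np.unique(np.asarray(y_train), return_inverse=True)
    values = np.eye(len(classes))[y_idx]                 # (n_train, n_classes)

    # Similarity = negative squared Euclidean distance, query vs. key.
    diff = X_test[:, None, :] - X_train[None, :, :]      # (n_test, n_train, n_feat)
    logits = -(diff ** 2).sum(axis=-1) / temperature     # (n_test, n_train)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over train rows

    proba = weights @ values                             # (n_test, n_classes)
    return classes[proba.argmax(axis=1)], proba


X_train = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_train = ["a", "a", "b", "b"]
X_test = [[0, 0.5], [5, 5.5]]
y_pred, proba = in_context_predict(X_train, y_train, X_test)
print(y_pred)  # ['a' 'b']
```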

TabICL is open science in action. Let’s take the Linux Foundation’s Model Openness Framework (MOF) as a heuristic for evaluating the openness of TabICL. The MOF breaks down machine learning models into model weights, code, data, and documentation components, and evaluates the openness and completeness of models based on the release of components under appropriate open licenses [9]. Its three tiers of model completeness range from open models to open science models, with each higher tier requiring more components to be released. Evaluated against the MOF, TabICL lands in the top tier: open science.

Here's our component-by-component assessment:

Model weights: Regression and classification checkpoints were openly released for TabICLv2 (default). Classification checkpoints are also available for TabICLv1 and TabICLv1.1. The weights can be loaded directly via the pip-installable package. For advanced configurations, TabICL offers a set of parameters to customize its behavior. The README file in the GitHub repository explains how [10].

Installation code: Fully released. TabICL is pip-installable.

Figure 2: Pip install commands for TabICL. Source: https://github.com/soda-inria/tabicl

Inference code: Fully released. TabICL is scikit-learn-compliant and supports zero-shot classification and regression, KV caching for faster repeated inference, save/load of fitted models, fine-tuning on task-specific data, zero-shot time series forecasting, and SHAP-based explainability. The GitHub repository provides example code for each functionality [10].

Figure 3: Example of inference code for basic usage of TabICL. Source: https://github.com/soda-inria/tabicl
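Because TabICL follows scikit-learn's estimator API, basic usage has the familiar shape: construct the classifier (TabICLClassifier in the package, per the repository README), call fit, then predict. For a TFM, fit involves no gradient-based training; the training rows become the context that the model processes together with the test rows at prediction time. KV caching exploits this: the part of the computation that depends only on the training rows can be done once and reused across predict calls. The class below is a hypothetical, NumPy-only toy (not the tabicl package) that mimics this pattern with a single fixed attention step, caching the standardized training rows ("keys"), their squared norms, and the one-hot labels ("values") in fit.

```python
import numpy as np


class ToyInContextClassifier:
    """Hypothetical scikit-learn-style toy; NOT the tabicl package.

    fit() does no gradient-based training. It stores the training context
    and caches everything about it that prediction needs, loosely
    analogous to caching keys/values over the training rows.
    """

    def __init__(self, temperature=1.0):
        self.temperature = temperature

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, y_idx = np.unique(np.asarray(y), return_inverse=True)
        # Computed once per training set, reused by every predict call.
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-12              # avoid division by zero
        self.keys_ = (X - self.mean_) / self.scale_
        self.key_sq_norms_ = (self.keys_ ** 2).sum(axis=1)
        self.values_ = np.eye(len(self.classes_))[y_idx]
        return self

    def predict_proba(self, X):
        # Only the new rows ("queries") are processed here.
        q = (np.asarray(X, dtype=float) - self.mean_) / self.scale_
        # ||q - k||^2 = ||q||^2 - 2 q.k + ||k||^2, with ||k||^2 from the cache.
        sq_dist = ((q ** 2).sum(axis=1)[:, None]
                   - 2.0 * q @ self.keys_.T
                   + self.key_sq_norms_[None, :])
        logits = -sq_dist / self.temperature
        logits -= logits.max(axis=1, keepdims=True)
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ self.values_

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


clf = ToyInContextClassifier().fit([[0, 0], [0, 1], [5, 5], [5, 6]], ["a", "a", "b", "b"])
print(clf.predict([[0, 0.5], [5, 5.5]]))  # ['a' 'b']
print(clf.predict([[5, 6]]))              # ['b'], reusing the cached context
```

The trailing-underscore attributes and fit returning self follow scikit-learn conventions, which is what lets such an estimator drop into existing pipelines.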

Training code: Partially released. Pre-training code for TabICLv1, including the synthetic data generation pipeline and the three-stage curriculum learning scripts, is available in the repository [10]. Pre-training code for TabICLv2 is in a pull request in the GitHub repo [12].

Training data: The model is trained on synthetically generated data. The generation methodology is documented and fully reproducible from the pre-training code available in the repository [10].
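To give a feel for what "trained on synthetically generated data" means, below is a toy generator in the spirit of the structural-causal-model priors used by PFN-style models. Every choice here (random graph density, tanh nonlinearity, noise scale, quantile binning of the target) is an illustrative assumption, and the function name and parameters are hypothetical; TabICL's actual prior, described in its preprints and pre-training code, is considerably richer.

```python
import numpy as np


def sample_synthetic_task(n_samples=256, n_nodes=8, n_features=5,
                          n_classes=3, edge_prob=0.5, seed=None):
    """Draw one toy classification dataset from a random causal graph.

    1. Sample a random DAG: node j may only depend on nodes i < j.
    2. Generate nodes in that order: tanh(weighted parents) + noise.
    3. Use some nodes as features, another as the target, and bin the
       target into classes at its quantiles.
    """
    if n_features >= n_nodes:
        raise ValueError("need at least one node left over for the target")
    rng = np.random.default_rng(seed)
    edges = np.triu(rng.random((n_nodes, n_nodes)) < edge_prob, k=1)
    weights = rng.normal(size=(n_nodes, n_nodes)) * edges

    nodes = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        # Columns >= j are still zero, and weights[i, j] == 0 for i >= j.
        signal = nodes @ weights[:, j]
        nodes[:, j] = np.tanh(signal) + rng.normal(scale=0.5, size=n_samples)

    order = rng.permutation(n_nodes)
    X = nodes[:, order[:n_features]]
    target = nodes[:, order[n_features]]
    cuts = np.quantile(target, np.linspace(0, 1, n_classes + 1)[1:-1])
    y = np.digitize(target, cuts)                  # labels 0 .. n_classes-1
    return X, y


# Each seed gives a different task; a pre-training loop would draw millions.
for seed in range(3):
    X, y = sample_synthetic_task(seed=seed)
    print(X.shape, np.bincount(y))
```

Calling it repeatedly with fresh seeds yields an unlimited stream of distinct tasks with different feature-target relationships, which is the property that lets a TFM be pre-trained on millions of datasets.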

Technical documentation: A publicly available preprint describes TabICLv2's architecture, training approach, and evaluation methodology [7]. The preprint of the original TabICL is also available [5].

Evaluation: TabICLv2 was benchmarked on tabular classification and regression on the TabArena [2] and TALENT benchmarks [11], achieving state-of-the-art performance and outperforming heavily tuned XGBoost, CatBoost, or LightGBM models on ~80% of TabArena datasets [7].
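A figure such as "~80% of datasets" is a win rate: the share of benchmark datasets on which one model scores better than another. The sketch below shows that arithmetic, assuming higher scores are better and counting a tie as half a win (a common convention, not necessarily the one used on TabArena). The dataset names and scores in the usage example are made up for illustration and are not benchmark results.

```python
def win_rate(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Fraction of shared datasets on which model A beats model B.

    Scores map dataset name -> score (higher is better); ties count as 0.5.
    """
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        raise ValueError("the two models share no datasets")
    wins = sum(
        1.0 if scores_a[d] > scores_b[d] else 0.5 if scores_a[d] == scores_b[d] else 0.0
        for d in shared
    )
    return wins / len(shared)


# Made-up scores on four hypothetical datasets (illustration only).
model_a = {"d1": 0.91, "d2": 0.85, "d3": 0.70, "d4": 0.66}
model_b = {"d1": 0.89, "d2": 0.85, "d3": 0.75, "d4": 0.60}
print(win_rate(model_a, model_b))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```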

License: The repository uses the BSD 3-Clause License, a permissive open source license with no use-case restrictions.

Why this open science approach matters. When a model release includes not only the weights but also the pre-training code, technical documentation, and a pip-installable package, data scientists all over the world are no longer just users: crucially, they can study failure modes, replicate results, and propose improvements. This allows the latest ideas and approaches to be studied, challenged, and improved, all in the open.

Open science is pushing the frontier of TFMs: The example of TabPFN

In May 2026, Prior Labs released TabPFN 3 [8] – a truly remarkable model that set the new state of the art on TabArena, with a single forward pass outperforming all tuned and ensembled baselines on datasets up to one million rows. Its thinking mode applies undisclosed compute tricks to structured data that allow unprecedented scaling to large data, beating AutoGluon extreme in under a tenth of the runtime.

While the weights of TabPFN 3 have not been released, the technical report is a role model of scientific rigor. It covers architecture, inference optimizations, time series, relational data, and causal inference, and it is thorough, particularly in its empirical evaluation, although some technical aspects of the model are not detailed [8].

What the release also did, openly and graciously, was credit TabICL. Multiple technical innovations developed in TabICLv1 and v2 appear in TabPFN 3. The TabPFN team documented this clearly in their technical report and acknowledged it publicly on LinkedIn. We should celebrate this. When your ideas improve another lab’s model and that lab ships the best model in the world, the entire field wins. Practitioners get better tools. Benchmarks improve. New baselines attract new researchers and adopters. The pie gets bigger.

There is a pattern here that should be familiar to anyone who has studied the history of open source. The Linux kernel did not emerge from a single organization's roadmap. It emerged from thousands of contributors building on each other's patches, each improvement immediately available to every other contributor. The result was an open source operating system that underpins virtually every server, phone, and cloud platform on the planet.

TFMs are not yet at that scale of ubiquity. But the dynamic is similar: open science accelerates progress because it compounds. TabICLv2 learned from TabPFN. TabPFN 3 learned from TabICLv2. The next model will learn from both. This is not distillation in the legal sense; it is the normal operation of scientific progress, and it works precisely because the ideas, models, and code are shared publicly.

The TabPFN 3 / TabICLv2 story is a neat example of transparent mutual benefit. Different teams, different ideas, shared results.

Let’s keep the tabular AI flywheel spinning

Tabular AI is moving fast. In the space of a few years, TFMs have gone from interesting research prototypes to production-ready models that outperform the gradient-boosted tree incumbents that dominated structured data problems in labs and enterprises for the last decade.

That speed is not accidental. It is the direct result of the open science flywheel: openly releasing model weights, pre-training code, technical reports, pip-installable packages, scikit-learn-compatible APIs, and teams that acknowledge each other's contributions openly. TabICLv2 and TabPFN 3 are the proof of concept. They built on each other, and both are better for it. This is the flywheel at its best. Let's keep it spinning for the benefit of science and innovation.

Footnotes

1. Disclaimer: While TFMs are the best performing models on tabular AI benchmarks, they may not always be the right choice for your use case. Dataset size, inference speed, compute constraints, and interpretability requirements, among other factors, need to be taken into consideration.

References

[1] Varoquaux, G. "Demystifying Tabular Foundation Models." probabl.ai, 2025. https://blog.probabl.ai/demystifying-tfms

[2] TabArena Leaderboard. Hugging Face Spaces. https://huggingface.co/spaces/TabArena/leaderboard

[3] Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. "TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second." arXiv:2207.01848, 2022. https://arxiv.org/abs/2207.01848

[4] Hollmann, N., Müller, S., Purucker, L., et al. "Accurate predictions on small data with a tabular foundation model." Nature 637, 319–326, 2025. https://doi.org/10.1038/s41586-024-08328-6

[5] Qu, J., Holzmüller, D., Varoquaux, G., and Le Morvan, M. "TabICL: A Tabular Foundation Model for In-Context Learning on Large Data." arXiv:2502.05564, 2025. https://arxiv.org/abs/2502.05564

[6] Grinsztajn, L., Flöge, K., Key, O., et al. "TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models." arXiv:2511.08667, 2025. https://arxiv.org/abs/2511.08667

[7] Qu, J., Holzmüller, D., Le Morvan, M., and Varoquaux, G. "TabICLv2: In-Context Learning for Tabular Data." arXiv:2602.11139, 2026. https://arxiv.org/abs/2602.11139

[8] Hollmann, N., Hutter, F., Grinsztajn, L., et al. "TabPFN 3." arXiv:2605.13986, 2026. https://arxiv.org/abs/2605.13986

[9] White, M., et al. "Model Openness Framework." arXiv:2403.13784, 2024. https://arxiv.org/abs/2403.13784

[10] TabICL GitHub Repository. SODA Team, Inria. https://github.com/soda-inria/tabicl

[11] TALENT Benchmark. "A Comprehensive Tabular Learning Benchmark." arXiv:2407.00956, 2024. https://arxiv.org/abs/2407.00956

[12] Pre-training code for TabICLv2. Pull request #111, TabICL GitHub Repository. https://github.com/soda-inria/tabicl/pull/111

[13] Ma, J., Thomas, V., Hosseinzadeh, R., et al. "TabDPT: Scaling Tabular Foundation Models on Real Data." arXiv:2410.18164, 2024. https://arxiv.org/abs/2410.18164

[14] Pfeiffer, P., Gordeev, D., Müller, M., et al. "TabH2O: A Unified Foundation Model for Tabular Prediction." arXiv:2605.18383, 2026. https://arxiv.org/abs/2605.18383
