By Gaël Varoquaux, CSO at Probabl
Table foundation models (TFMs) are all the rage lately, with promises to extract more value from data in tables, the bread and butter of enterprise data science. However, are they a rupture or a continuity for data science?
My short answer is that they clearly establish the need for dedicated AI for data science, as opposed to generalist LLMs, but they do not change the nature of data science work.
In recent years, much of my scientific work has focused on tabular learning, as Probabl’s CSO but even more so as a researcher at Inria, progressively shaping the ideas that led to the rise of TFMs. Just this week, we released TabICLv2 [1], a fully open TFM that is state-of-the-art, as visible on public benchmarks. Drawing on this experience, let me shed light on frequently asked questions about TFMs.
Foundation models are models pretrained on a large amount of data to embed implicit knowledge and priors. They have powered the ChatGPT revolution, providing incredibly useful technology for natural language or images as they understand the information out of the box. But enterprises’ most valuable data is in tables, and often full of cryptic numbers and codes. Foundation models have long been unable to help process such data, where traditional machine learning shines, from linear models to gradient-boosted trees.
With recent progress, TFMs are pushing the boundaries of tabular machine learning. There are two avenues of progress: one based on capturing the semantics of the strings in tables, the other on better modeling the numbers that are central to tabular data. This second avenue is where we have seen the most excitement, illustrated by popular tools such as TabPFN and TabICL.
TFMs are really tabular learners on steroids. Their benefits are visible on the classic TabArena benchmark [2]. For instance, the figure below (from [1]) positions TabICL and TabPFN on this benchmark, showing how TFMs reduce the gap to the smallest achieved prediction error: a 5-fold reduction compared to random forest and a 3-fold reduction compared to XGBoost. However, this error reduction comes at a cost: state-of-the-art TFMs are 3 times more expensive than XGBoost and 20 times more expensive than random forests. In addition, for mid-sized or largish tables, TFMs require large GPUs, which are rare resources.
Figure 1: Improvability vs. train time on TabArena [1]
First, the game underlying machine learning or statistics has always been to design “the right model” for given data. A model too simple will not make good use of the data, while a model too complex leads to noisy predictions. Better models use priors and inductive biases adapted to the properties of the data, allowing just the right amount of flexibility. The new ingredient in TFMs is that these priors and inductive biases are created by pretraining.
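This trade-off between too little and too much flexibility can be made concrete with a minimal, TFM-agnostic sketch: fitting polynomials of increasing degree to noisy samples of a smooth function. The function, degrees, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth underlying function
x_train = np.linspace(-1, 1, 20)
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)  # noise-free ground truth

def fit_and_eval(degree):
    """Fit a polynomial of the given degree; return (train, test) error."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_err, test_err

# Degree 1 underfits; degree 15 drives the *train* error ever lower but
# fits the noise; an intermediate degree matches the data best.
results = {d: fit_and_eval(d) for d in (1, 5, 15)}
```

The intermediate model wins on held-out data precisely because its flexibility matches the structure of the data, which is what good priors and inductive biases buy.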
Another aspect of current TFMs is that they heavily rely on transformers and “in-context learning”. They still appear as standard machine learning tools, but not much happens during fit: the training data are merely stored. For prediction, given new data, the training data are then used as context for the test data. In a sense, this is a mechanism well known in machine learning, as it is akin to what the nearest neighbor methods do. A simplified but useful view of TFMs is that they combine complex transformations of the input data with a nearest-neighbors mechanism.
For the machine learning experts, a better analogy for the prediction mechanism of TFMs might be that of kernel machines, such as the classic SVM. Indeed TFMs make their predictions by combining information not limited to a small number of nearest neighbors, but by pooling across all training data if useful.
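A toy sketch of this “fit stores, predict computes” mechanism, using a plain nearest-neighbor vote as a stand-in for the learned transformations and attention-based pooling of a real TFM (the class below is purely illustrative, not the TabICL or TabPFN API):

```python
import numpy as np

class InContextClassifier:
    """Toy illustration of in-context prediction: fit() merely stores the
    training data; all the work happens at predict(), where the training
    rows serve as context for each test row."""

    def __init__(self, k=3):
        self.k = k  # number of context rows to pool over

    def fit(self, X, y):
        # Nothing is learned here: the training set is simply memorized.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Pool over the k nearest context rows; a real TFM instead
            # pools across all rows through attention.
            idx = np.argsort(((self.X_ - x) ** 2).sum(axis=1))[: self.k]
            values, counts = np.unique(self.y_[idx], return_counts=True)
            preds.append(values[np.argmax(counts)])
        return np.array(preds)
```

This also makes the cost structure of TFMs visible: fit is nearly free, while every prediction must revisit the stored training data.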
Where TFMs can be game changers is by making the most of small data, “few-shot predictions”, as this is where prior knowledge is make-or-break. To draw a parallel to LLMs: LLMs can solve so many useful problems by drawing analogies to problems that they have seen in the past. In the case of tables, strings (like “Paris” or “frying pan”) in table entries and column names, among others, offer incredible promise to connect much more easily to prior knowledge than numbers. This promise is sketched in our 2024 paper [3] on the CARTE tabular model, as well as in Fundamental’s whitepaper [4] on the Nexus TFMs.
The dream is to bring as much as possible world and procedural knowledge into the analysis of tables. Our experience is that combining tabular models with LLMs to encode strings (for instance, using skrub’s TextEncoder [5]) already brings large benefits.
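The shape of such a pipeline can be sketched as follows. skrub’s TextEncoder would embed the string column with a pretrained language model; here scikit-learn’s TfidfVectorizer stands in so the example runs without downloading a model, and the column names and data are made up for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny made-up table mixing strings and numbers
df = pd.DataFrame({
    "item": ["frying pan", "sauce pan", "office chair", "desk lamp"] * 5,
    "price": [25, 30, 120, 40] * 5,
})
y = [0, 0, 1, 1] * 5  # e.g. 1 = office equipment

encode = ColumnTransformer([
    # TF-IDF as a lightweight stand-in for a pretrained text embedding
    ("text", TfidfVectorizer(), "item"),
    # Numeric columns pass through unchanged
    ("num", "passthrough", ["price"]),
])
model = make_pipeline(encode, LogisticRegression()).fit(df, y)
```

Swapping the text step for skrub’s TextEncoder keeps the same pipeline shape while injecting the world knowledge of a pretrained language model.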
The asset of TFMs is that they can give strong predictions from limited data without relying on careful data preparation. For large datasets (more than 100,000 data points), other models often catch up while TFMs’ quadratic computational cost is a burden.
The few-shot prediction ability of TFMs may be misread as making training data unimportant. Labeled data is still needed with TFMs, and more labeled data will still improve data science: not only does it lead to better-performing models, it also ensures good validation of the data science pipeline. And validating the data science pipeline is a frequent bottleneck.
Likewise, there is still the associated training computation to consider. In the case of TFMs, it just happens at prediction time and not at fit time. This can be a problem, as it pushes cost to inference, and specific techniques are developed to decrease this cost, such as distillation.
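A hedged sketch of the distillation idea: an expensive “teacher” model (a large random forest below, standing in for a TFM) is compressed into a cheap “student” trained to imitate the teacher’s predictions rather than the original labels. Models, sizes, and data are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Expensive teacher: stands in for a TFM whose cost sits at prediction time
teacher = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# The student learns from the teacher's outputs, not the original labels,
# so unlabeled or synthetic rows could be used here as well.
teacher_labels = teacher.predict(X)
student = DecisionTreeClassifier(max_depth=8, random_state=0).fit(
    X, teacher_labels
)

# The student now answers queries at a fraction of the teacher's cost
agreement = (student.predict(X) == teacher_labels).mean()
```

The student trades a little fidelity to the teacher for a much cheaper prediction path, which is exactly what matters when the cost has been pushed to inference.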
TFMs give amazing tools to tackle a well-framed data science problem. But it is important to keep in mind that often, the bottleneck in data science is exactly framing the problem: finding the right data, the right prediction, and the right measure to optimize [6,7]. TFMs do offer benefits here, as they enable fast iterations with models that tend to work well out of the box. However, they are not a magic bullet: the data scientist is still faced with the important challenge of understanding data and applications, and bridging the gap between the two.
References:
[1] Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan. (2026). TabICLv2: A better, faster, scalable, and open tabular foundation model. https://arxiv.org/abs/2602.11139 Code: https://github.com/soda-inria/tabicl Installation: https://pypi.org/project/tabicl/
[2] Erickson, N., Purucker, L., Tschalzev, A., Holzmüller, D., Desai, P. M., Salinas, D., & Hutter, F. (2025). Tabarena: A living benchmark for machine learning on tabular data. https://arxiv.org/abs/2506.16791
[3] MJ Kim, L Grinsztajn, G Varoquaux. (2024). CARTE: pretraining and transfer for tabular learning. ICML 2024. Find via: arXiv preprint, GitHub repo, HF repo. An early paper introducing the idea of bringing background knowledge to tabular learning via strings.
[4] Marta Garnelo, Wojciech Marian Czarnecki. (2026). Developing Foundation Models for Real-World Tabular Data. https://fun-research-whitepaper.s3.us-west-1.amazonaws.com/public/Fundamental_Whitepaper.pdf The whitepaper of Fundamental, probably the largest TFM startup. It describes the importance of rich joint modeling of the data at hand and prior knowledge.
[5] skrub TextEncoder: Encode string features by applying a pretrained language model downloaded from the Hugging Face Hub. https://skrub-data.org/stable/reference/generated/skrub.TextEncoder.html
[6] Sanjana Arun. Unpacking the craft of an applied machine learning product manager. https://www.productledalliance.com/unpacking-the-craft-of-an-applied-machine-learning-product-manager/ Sanjana discusses the importance of understanding how to provide downstream value: defining the right measure is more often the bottleneck than optimizing the model for it.
[7] Lucas Bernardi, Themistoklis Mavridis, and Pablo Estevez. 2019. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 1743–1751. https://doi.org/10.1145/3292500.3330744 This retrospective analysis of the factors of success of data-science projects at booking.com highlights the danger of disconnection between the data-science metric optimized and the downstream value.