1  First look

By the end of this chapter, you’ll know that:
  • Tabular foundation models are not scary to use at all.
  • Underneath, they are a different beast altogether.

Before we go into all the complex stuff like architecture, in-context learning, and pretraining, let’s see how the use of a tabular foundation model differs from traditional machine learning.

Looks like traditional machine learning

Let’s start simple. The following code snippet shows a classic classification task: we want to predict the penguin species from their body measurements (Horst, Hill, and Gorman 2020; Gorman, Williams, and Fraser 2014). After splitting the data into training and test sets, we use TabICL, a tabular foundation model, to classify the test data:

import seaborn as sns
from sklearn.model_selection import train_test_split
from tabicl import TabICLClassifier

penguins = sns.load_dataset("penguins")
X = penguins.drop(columns="species")
y = penguins["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf = TabICLClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

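This looks like … business as usual!? Although TabICL is one of these novel tabular foundation models, it seems to work like any other machine learning algorithm. Structurally, we could swap out two lines and get a Random Forest instead (in practice, scikit-learn’s Random Forest would also need the string-valued columns island and sex encoded as numbers first):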

  • Line 3: from sklearn.ensemble import RandomForestClassifier
  • Line 10: clf = RandomForestClassifier()

The reason applying TabICL looks the same as applying the good old Random Forest is a design choice. The developers of TabICL and of other tabular foundation models build on the scikit-learn API, which standardizes machine learning calls with methods like .fit() and .predict(). But underneath, tabular foundation models work differently from traditional machine learning.
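To make the shared interface concrete, here is a minimal sketch of code that relies only on .fit() and .predict(). The helper fit_and_score and the toy MajorityClassifier are hypothetical names made up for this illustration; they are not part of scikit-learn or TabICL, and the four training rows are invented values.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator (hypothetical) that follows the fit/predict convention."""

    def fit(self, X, y):
        # "Training" here just remembers the most frequent label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # by convention, fit returns the estimator itself

    def predict(self, X):
        return [self.majority_ for _ in X]


def fit_and_score(clf, X_train, y_train, X_test, y_test):
    """Fit any estimator exposing fit/predict and return its accuracy."""
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    hits = sum(p == t for p, t in zip(y_pred, y_test))
    return hits / len(y_test)


# Invented flipper lengths, for illustration only.
X_train = [[181], [186], [195], [217]]
y_train = ["Adelie", "Adelie", "Adelie", "Gentoo"]
X_test = [[190], [220]]
y_test = ["Adelie", "Gentoo"]

accuracy = fit_and_score(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(accuracy)
```

Because fit_and_score only calls .fit() and .predict(), the same helper should accept a TabICLClassifier or a RandomForestClassifier in place of the toy class.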

Things don’t add up in the code above

The .fit doesn’t do much: When you run the code, the .predict call is suspiciously slow compared to the .fit call. In traditional machine learning, .fit does the heavy lifting: It’s where XGBoost grows hundreds or even thousands of trees, or where a multi-layer feed-forward neural network adapts its weights over thousands of batches of data. If .predict were a theater performance, then .fit would be all the rehearsals beforehand. But TabICL is just getting dressed, not rehearsing this particular piece. In other words: calling .fit for TabICL only loads the model weights and preprocesses the data.
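One way to check this yourself is to time the two calls separately. The sketch below is a hedged illustration: time_fit_predict is a hypothetical helper, and StoreOnly is a stand-in estimator so the snippet runs without extra packages.

```python
import time


def time_fit_predict(clf, X_train, y_train, X_test):
    """Return (fit_seconds, predict_seconds) for any fit/predict estimator."""
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    clf.predict(X_test)
    predict_seconds = time.perf_counter() - start
    return fit_seconds, predict_seconds


class StoreOnly:
    """Hypothetical stand-in estimator; replace it with a real model."""

    def fit(self, X, y):
        self.y_ = list(y)
        return self

    def predict(self, X):
        return [self.y_[0] for _ in X]


fit_s, predict_s = time_fit_predict(StoreOnly(), [[1], [2]], ["a", "b"], [[3]])
print(f"fit: {fit_s:.6f}s, predict: {predict_s:.6f}s")
```

Passing TabICLClassifier() together with the penguin splits instead of the stand-in should show the pattern described above. The exact numbers depend on hardware and dataset size, and the first .fit call may include a one-time download of the pretrained weights.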

The model is pretrained: Wait, which weights is the model actually loading if no dataset-specific training happens? Although no training happens on this particular dataset, the weights stem from a pretraining process involving millions of synthetic datasets.

No task-specific training: These pretrained weights don’t change in our code example above, neither in .fit nor in .predict.

.predict does the heavy lifting: Making predictions is surprisingly slow compared to the .fit call, because that’s where the main computations happen. The tabular foundation model performs in-context learning: Provided with both training and test data at inference time, the model attends to the training data points to make predictions.
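The division of labor can be illustrated with a deliberately tiny toy model. This is a sketch of the idea only, not TabICL’s architecture: ToyInContextClassifier is a hypothetical class with no learned weights at all, its temperature parameter is an assumption, and its "attention" is a plain softmax over negative squared distances between test rows and stored training rows.

```python
import numpy as np


class ToyInContextClassifier:
    """Toy illustration of predicting by attending to stored training rows."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature

    def fit(self, X, y):
        # Nothing is learned here: fit only stores the context.
        self.X_ctx_ = np.asarray(X, dtype=float)
        self.classes_, self.y_idx_ = np.unique(y, return_inverse=True)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Scores: negative squared distance between each test row (query)
        # and each stored training row (key).
        diff = X[:, None, :] - self.X_ctx_[None, :, :]
        scores = -np.sum(diff**2, axis=-1) / self.temperature
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        # Values: one-hot labels of the training rows.
        one_hot = np.eye(len(self.classes_))[self.y_idx_]
        return weights @ one_hot

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Invented, unitless toy features.
X_train = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 3.0]]
y_train = ["Adelie", "Adelie", "Gentoo", "Gentoo"]
clf = ToyInContextClassifier().fit(X_train, y_train)
y_pred = clf.predict([[0.2, 0.1], [3.2, 2.8]])
print(y_pred)
```

All the arithmetic sits in predict_proba, which compares every test row with every training row. A real tabular foundation model replaces the fixed distance rule with pretrained transformer layers, but the cost profile is similar in spirit: cheap .fit, expensive .predict.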

No more hyperparameter tuning: You may have noticed that the code snippet above skips hyperparameter tuning. You may have attributed this to the author’s laziness. You would have been right, but with tabular foundation models, my laziness and reality have finally converged. Tabular foundation models don’t need hyperparameter tuning.

Missing data imputation: The penguins dataset has 11 rows with missing values. TabICL swallowed them without complaint. While this could be just a missing-data imputation wrapper, handling missing data reliably is an inherent property of tabular foundation models, due to how they are pretrained. At least for values that are missing at random; values missing not at random are always a different beast.
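If you want to verify such a count yourself, a short pandas check is enough. The helper count_incomplete_rows is a hypothetical name, and the small frame below is made-up data, not the real penguins table.

```python
import numpy as np
import pandas as pd


def count_incomplete_rows(df):
    """Number of rows that have at least one missing value."""
    return int(df.isna().any(axis=1).sum())


# Made-up toy frame, for illustration only.
toy = pd.DataFrame(
    {
        "bill_length_mm": [39.1, np.nan, 46.5],
        "body_mass_g": [3750.0, 3800.0, np.nan],
        "sex": ["Male", None, "Female"],
    }
)
n_incomplete = count_incomplete_rows(toy)
print(n_incomplete)
```

Applied to the penguins frame from the first snippet, count_incomplete_rows(penguins) should reproduce the row count quoted above.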

Many more gimmicks: Tabular foundation models also work with regression, where they predict the entire predictive distribution, meaning you get things like quantile regression and uncertainty quantification for free. You can use these models for time series forecasting as well. They extrapolate well, and are quite “capable” in many more ways we will explore in this book.
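To see why a full predictive distribution gives you quantiles for free, consider the following sketch. It assumes the distribution arrives as probabilities over a fixed grid of bins, which is one possible representation and not necessarily what a given model returns. The function quantile_from_bins, the bin edges, and the probabilities are all made up for this illustration.

```python
import numpy as np


def quantile_from_bins(bin_edges, probs, q):
    """Quantile of a piecewise-uniform distribution over the given bins."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # make sure the probabilities sum to 1
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    # Invert the CDF by linear interpolation between bin edges.
    return float(np.interp(q, cdf, bin_edges))


# Invented predictive distribution for one penguin's body mass in grams.
edges = [3000, 3500, 4000, 4500, 5000]
probs = [0.1, 0.4, 0.4, 0.1]

median = quantile_from_bins(edges, probs, 0.5)
lower = quantile_from_bins(edges, probs, 0.1)
upper = quantile_from_bins(edges, probs, 0.9)
print(median, lower, upper)
```

Under these assumptions, the median serves as a point prediction, and the pair (lower, upper) is an 80% prediction interval, all read off one predicted distribution without fitting a separate model per quantile.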

So, while tabular foundation models carry forward the .fit and .predict tradition, they follow a different paradigm.