Announcing Scikit-learn Central
By Yann Lechelle, Executive President of Probabl
In the world of technology, we often talk about "infrastructure" in terms of cables, data centers, and cloud regions. But for the global data science community, the most critical infrastructure isn’t made of silicon or steel. It’s made of code. Specifically, it is built upon open source libraries like scikit-learn.
When we founded Probabl, we did so with a profound sense of responsibility. As an organization founded by the creators of scikit-learn, we recognized that this library had grown beyond a mere tool; it had become the de facto open standard for machine learning worldwide.
Today, I am proud to announce the launch of Scikit-learn Central, a new digital hub designed to visualize, unite, and stimulate the sprawling ecosystem that has grown around this foundational library.

Figure 1: Scikit-learn Central catalog
The scale of a global standard
To understand why stewarding scikit-learn is important, one only needs to look at the numbers.
Scikit-learn has surpassed 4.1 billion downloads. Every single month, another 160 million downloads occur as data scientists, developers, and students pull the library into their environments. On GitHub, it serves as the foundation for over 1.3 million repositories and 27,000 packages. The project has been sustained by the efforts of over 3,100 contributors and has received over 65,000 stars. To compare it with other popular libraries, such as XGBoost for machine learning and PyTorch and TensorFlow for deep learning, see the chart below, which shows yearly downloads from PyPI and conda. On this metric alone, scikit-learn has grown by more than 70% year on year over the past few years, which is staggering.

Figure 2: Yearly downloads (PyPI + conda) of key machine learning and deep learning libraries
Downloads and stars only tell part of the story. The impact of scikit-learn is also measured in the progress of human knowledge. The seminal paper, “Scikit-learn: Machine Learning in Python,” has 128,960 citations, and scikit-learn is cited in over 7,000 Nature publications. This is world-leading science in action. From identifying genetic markers in cancer research to optimizing complex climate models, scikit-learn provides the mathematical building blocks that allow experts in biology, physics, and ecology to apply machine learning to their specific domains without needing a PhD in applied mathematics.
Mission-driven stewardship
Probabl is not a typical tech company. We are a Société à Mission under French law (a status similar to a B-Corp), with our mission integrated into our corporate bylaws: “To develop, maintain at the state of the art, and sustain a complete suite of open source tools for data science to benefit … the world.”
We believe that open source tools of this magnitude require a sustainable commercial model that respects the community. Our mission is to shepherd scikit-learn, helping the community use it and improve it. That means ensuring the core remains robust while fostering the “extended universe” of libraries that make the workflow complete.
Introducing Scikit-learn Central
The scikit-learn ecosystem is vast. While the core library provides the supervised and unsupervised algorithms, the wider community has built specialized tools to handle the nuances of modern data science.
Scikit-learn Central is our attempt to make this ecosystem navigable. It is a catalog of the “building blocks” that turn a simple model into a production-ready pipeline. When you visit the catalog, you see the full breadth of what is possible:
- Data preparation: Tools like Skrub simplify the often-tedious process of data cleaning and feature engineering, ensuring that the “garbage in, garbage out” mantra doesn't derail your project.
- Workflow acceleration: Skore helps data scientists move faster, with smarter cross-validation, automated evaluation reports, and methodological guidance that catches common pitfalls before they reach production.
- Powerful predictions: We highlight the deep integration with libraries like XGBoost, which rely on scikit-learn’s API to deliver state-of-the-art gradient boosting.
- Operational excellence: For those moving to production, MLflow provides the MLOps framework necessary to track and deploy models at scale.
- Trust and explainability: SHAP offers data scientists tools to evaluate and understand why a model makes the decisions it does.
- Tabular foundation models: The new wave of tabular foundation models often depends implicitly on scikit-learn. For instance, TabICLv2 uses scikit-learn during pretraining to generate good synthetic data that shapes the final model.
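To make the "building blocks" idea concrete, here is a minimal sketch that uses only core scikit-learn. It chains a preprocessing step and a model into one pipeline and evaluates it with cross-validation. The synthetic dataset and the parameter values are assumptions chosen for illustration, not a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Chain preprocessing and a model into a single estimator object.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: the scaler is re-fit on each training fold,
# so no information from the held-out fold leaks into preprocessing.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Because every step follows the same fit/predict convention, an estimator from an ecosystem library that implements that convention can generally replace the final step without changing the rest of the code.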
An invitation to the wider community
At Probabl, we know that the most compelling arguments for open source aren't found in documentation, but in execution. That is why, alongside the catalog, we are building a library of use cases.
We are inviting data scientists from every corner of the globe to share their code and their stories. Whether you are using libraries for time-series analysis in finance, applying nilearn to neuroimaging, or building a custom fraud detection engine for a global bank, your work can inspire and educate others.
Scikit-learn succeeded because it made complex mathematics accessible to everyone. Scikit-learn Central aims to succeed by making the entire ecosystem accessible. We are building a future where open-source machine learning is sustainable, transparent, and more powerful than ever.
I invite you to explore the catalog on Scikit-learn Central, contribute your use cases, and join us in this next chapter.