:probabl.blog

The global pulse of open source: Insights from the GitHub Innovation Graph and the scikit-learn ecosystem

Written by Yann Lechelle | Tuesday, February 3, 2026

 

By Yann Lechelle, Executive President and Chairman at Probabl

GitHub just released fresh data from the GitHub Innovation Graph, which provides accessible data and aggregated statistics on software development activity around the world across the millions of public repositories hosted on the platform. As the software company committed to the stewardship and long-term success of scikit-learn, we at Probabl find this data invaluable for understanding global trends in open source and situating scikit-learn within them.

The data paints a clear picture of the global nature of open source. We see that the EU is now the world leader by some metrics, such as git pushes, dethroning the USA for the first time. At the same time, the USA leads when it comes to the number of developers on GitHub (30 million!). Countries like India, the UK, Brazil, Korea, Japan, and China, among many others, are also world-leading hubs of open source software development on GitHub [1].

The data also underlines that national or regional economies are not islands: open source thrives because of global interlinkages and collaboration. For example, in 2025, EU-based open source projects received millions of code contributions from developers across the world, including the USA (over 934,000), UK (over 553,000), and India (over 347,000).



Source: GitHub (2026), Year recap and future goals for the GitHub Innovation Graph.

These stats resonate with us at Probabl as the steward of scikit-learn, the global open standard framework for machine learning with 3.9 billion downloads and 1.3 million dependents on GitHub. While most core contributors are based in the EU (in particular, France and Germany), scikit-learn also counts on core contributors from the USA, Australia, and China. Its issue reports and pull requests likewise come from developers all over the world. As this map from OSS Insights shows, most pull requests come from developers in the USA, India, Germany, France, the UK, Canada, Japan, and China, among many others. The bottom line: the scikit-learn ecosystem is a global ecosystem.


Source: OSS Insights (2026), scikit-learn

In terms of users, scikit-learn has the wind in its sails, with steadily growing downloads (around 150 million new downloads per month). Interestingly, when viewed alongside PyTorch, the global standard for deep learning, scikit-learn maintains a significantly broader footprint in terms of PyPI downloads, reflecting the vast and growing demand for classical machine learning across almost every industry [2].


Source: Our own analysis of PyPI downloads (2026)

At Probabl, we regularly analyze the underlying causes and trends behind the skyrocketing growth of the scikit-learn ecosystem. One thing is for sure: the users and downloaders are global. Hey GitHub Research Team, we should partner up and investigate usage trends together!

Thank you to the GitHub team for making this data available. We recommend reading more about the GitHub Innovation Graph and diving into the data. And, of course, a big thank you to everyone worldwide who contributes to scikit-learn in one way or another!

References:

[1] We suspect China is underrepresented in these numbers, possibly due to unreported geolocations and/or the use of Chinese alternatives like Gitee. For example, there’s a scikit-learn mirror on Gitee: https://gitee.com/mirrors/scikit-learn

[2] We acknowledge that PyPI is not the only way to download a Python library, so the download stats may be incomplete for scikit-learn and/or PyTorch.