
Scikit-learn acceleration with GPUs: A conversation with Dr. Andy Terrel

 


By Gaël Varoquaux, CSO of Probabl

For over a decade, scikit-learn has served as the bedrock of machine learning, supporting the work of millions of data scientists worldwide. While scikit-learn was originally designed for a CPU-centric world, the advent of new hardware presents an opportunity to supercharge machine learning pipelines.

Speeding up machine learning workflows isn't just about technical benchmarks; it's about turning hours of training into seconds, saving time and money for data scientists and enterprises.

I sat down with Dr. Andy Terrel from NVIDIA to discuss why this community effort is such a game-changer for the scientific Python ecosystem and enterprise data science.

Bringing you up to speed

To achieve GPU acceleration in scikit-learn without fragmenting our codebase, we are re-engineering scikit-learn to be backend agnostic. This is a mighty community effort involving our team at Probabl, our peers at Quansight and NVIDIA, and many others from the wider community.

Historically, supporting GPUs required specialized code for every library, but the array API provides a unified specification that allows scikit-learn to remain flexible. Now, when an estimator is array API-compliant, it can inspect your input data (whether it's a PyTorch tensor or a CuPy array) and delegate the computation to the matching library's optimized functions. If your data lives on the GPU, the computation stays on the GPU, avoiding expensive memory transfers.
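The mechanism can be sketched in a few lines. The hypothetical standardize function below (an illustration, not scikit-learn's actual implementation) asks the input array for its array API namespace and delegates all math to it; run with a NumPy array it computes on CPU, while the same code given a CuPy array would run entirely on GPU:

```python
import numpy as np

def standardize(X):
    # Ask the array which library implements it (NumPy, CuPy,
    # PyTorch, ...) via the array API protocol; fall back to NumPy
    # for arrays that predate the protocol.
    xp = X.__array_namespace__() if hasattr(X, "__array_namespace__") else np
    # All subsequent math is delegated to that library, so data on
    # a GPU is processed on the GPU with no host transfer.
    mean = xp.mean(X, axis=0)
    std = xp.std(X, axis=0)
    return (X - mean) / std

Z = standardize(np.array([[1.0, 2.0], [3.0, 5.0]]))
```

The key point is that the function never names a backend: the input data itself selects the implementation.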

We have already updated 25 estimators and core tools like the scoring API, ensuring they perform consistently across different hardware backends through rigorous automated testing. The real-world impact of this work is significant; for example, recently Olivier Grisel, my fellow scikit-learn core maintainer and ML Engineer at Probabl, demonstrated a 15x speed-up in complex machine learning pipelines by offloading compute-intensive steps to the GPU.

For a deeper dive into the technical implementation and the latest progress, I highly recommend reading the detailed technical updates by my colleague Olivier and Lucy Liu from Quansight.

In conversation with Dr. Andy Terrel

Gaël Varoquaux: Andy, for someone coming into this with zero context: why are we working so hard to accelerate Python libraries like scikit-learn with GPUs?

Andy Terrel: Two main reasons: the forward march of computing technology and the growing needs of science in the AI era. When I started programming computers, multi-threaded programs were rare. HPC centers had wonderful CPUs that could manage 2 threads, but today your phone has 6 cores. What is leading edge in the HPC center will come to commodity systems in time. GPUs are a must in any data center today, and most people are seeing them deployed in their commodity hardware as well for smaller-scale simulations. The other trend is incorporating AI into the scientific workload: we see scientists needing to use tools like scikit-learn to do ensembles of models or build model regressors. By having our beloved Python scientific tools work on the GPU, we allow scientists to be more efficient and utilize the GPU as part of the full application rather than an occasionally used offloading device.

Gaël Varoquaux: At your recent webinar, “Python on the GPU: From Libraries to Kernels,” my colleague and co-maintainer of scikit-learn, Olivier Grisel, spoke about our efforts to adopt the Array API and showcased the benefits of GPU acceleration for the data scientist working with scikit-learn. For example, he showed that using GPUs instead of CPUs results in a 15x speed-up when tuning hyperparameters in a complex machine learning pipeline. This is fantastic, but there’s still much work to be done in scikit-learn and of course in many other Python libraries to enable GPU acceleration. From your perspective, how do you think the ecosystem could evolve to make such an endeavor easier?

Andy Terrel: The Array API is a big step in allowing code to seamlessly integrate with GPUs, but codebases such as NumPy and SciPy are slow to fully convert their core routines to it. It takes time to move such code bases, but we are pointed in the correct direction. The Array API only takes a codebase so far, as most code I work with also needs to adopt the correct array interfaces. The array interfaces, e.g. NumPy's __array_interface__ or Numba's __cuda_array_interface__, help integrate with other native codes. In this vein, the DLPack system has become essential: it provides an interface that recognizes the different devices and avoids memory movement as needed. While some tools have adopted these APIs and interfaces, we still have work to do to expand tooling for the scientific ecosystem. For example, pure Python applications have wonderful tools like pyrefly and ty for type inference, but scientific code can rarely use them because extension types are not cleanly represented in Python type syntax.
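The zero-copy exchange DLPack enables can be demonstrated even within a single library: NumPy's from_dlpack consumes any object exposing the __dlpack__ protocol (a CuPy array, a torch tensor, ...) without copying the underlying buffer. A minimal sketch, using NumPy as both producer and consumer so it runs without a GPU:

```python
import numpy as np

# Any DLPack-capable producer works here (cupy.ndarray,
# torch.Tensor, ...); NumPy plays both roles in this sketch.
a = np.arange(4, dtype=np.float64)
b = np.from_dlpack(a)  # zero-copy: b views a's buffer

a[0] = 99.0
# The write through a is visible through b, showing that no
# memory was copied or moved during the exchange.
```

Across libraries the same call transfers ownership metadata and device information, which is what lets a GPU array pass between frameworks without a round trip through host memory.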

Gaël Varoquaux: Open source communities appreciate choice in software and hardware. How do tools like CuPy, Numba, and the Python Array API help open source maintainers and users navigate the balance between achieving maximum performance while maintaining a healthy, backend-agnostic ecosystem?

Andy Terrel: My viewpoint is that we should build high-level open tools for the 80% of cases and then let users decide whether to specialize for hardware for the extra 20%. There are many cases where that extra 20% of performance is crucial, but for many it is not. Code based on NumPy can be quickly ported to CuPy. From there, if a routine needs a more finely tuned GEMM or FFT, nvmath-python provides bindings to highly optimized CPU and GPU libraries. These optimized libraries are hard to maintain, so letting vendors provide them frees OSS communities to focus on choice rather than optimal performance.
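The NumPy-to-CuPy port Andy describes often amounts to swapping a single import, because CuPy deliberately mirrors NumPy's API. A hedged sketch of the pattern (run here with NumPy, since no GPU is assumed):

```python
import numpy as np

xp = np
# On a CUDA machine, the same function below works unchanged with:
#   import cupy as xp

def softmax(x):
    # Written against the API surface NumPy and CuPy share, so the
    # same source runs on CPU (NumPy) or GPU (CuPy).
    e = xp.exp(x - xp.max(x))  # subtract max for numerical stability
    return e / xp.sum(e)

p = softmax(xp.array([1.0, 2.0, 3.0]))
```

Keeping the backend behind one module alias is a common convention for code that wants to stay portable while leaving the door open to hardware-specific tuning later.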

Gaël Varoquaux: Thanks to GPU acceleration, we can move from minutes to seconds when training models. It’s a win for data scientists, who get to be more productive and remain in the flow. And it’s a win for enterprises, who get to save time and money. What’s your prognosis of how these wins will materialize for enterprise data science teams? Which sectors do you think will reap the greatest rewards?

Andy Terrel: I’ve seen some big wins with scientific instruments. Grid computing was invented because High Energy Physics needed to offload experimental data and process it. Today, scientific workflows that took weeks to process can be done in minutes. When you have instruments with configurable sensors (and all the big ones do these days), scientists can have more control over experiments, as data is processed faster and models can be updated on the fly. This sort of acceleration translates directly to industrial processes, robotics, self-driving cars, etc. I spent time in the manufacturing space before coming to NVIDIA, and we were already seeing better yields and faster times from prototype to production with machining.

Gaël Varoquaux: Most people associate GPUs with LLMs. How do we raise awareness that GPUs are also game-changing for machine learning with tabular data, the type of data that most enterprises actually run on?

Andy Terrel: In my career as a data scientist, I would advise companies to evaluate the speed of decision making with the technology they choose. If a company requires emails of spreadsheets and weekly meetings, it would take 2-4 weeks for decisions to be made. If there was a dashboard with APIs but daily standups, we would see operations changing in 2-4 days. Both these cases are essentially bringing tabular data in front of decision makers, and require careful analysis and subject matter insight to decipher. Now if we can get tabular data to be instant and correct the first time, with no more arguing about domain models, then we can see business adapt to the market in near real time. This is scary to business leaders; they like their spreadsheets, so there needs to be a phased introduction of the tooling to help transform the business. Unfortunately, I don’t know that optimizing enterprises has ever been seen as cool, but operational efficiency will drive leaders to better results, and the tabular data model will be at its heart.

Gaël Varoquaux: Looking ahead, do you envision a world where the distinction between "CPU code" and "GPU code" in the Python stack disappears entirely for data scientists?

Andy Terrel: Today, I work with data centers that have LPUs and quantum chips as well. The essential challenge is that the programming model is so different between these different chips. With AI agents, we are seeing some transfer between GPU and CPU code, but the two code paths still need to be managed differently for efficiency. High core count CPUs may get to a point where the memory hierarchy of the GPU starts getting built in, but I’m a software person and I really don’t know the complexities there.

Gaël Varoquaux: Looking ahead again, what are you the most excited about when it comes to making our favorite Python libraries for data science run on GPUs?

Andy Terrel: I’m most excited about scientific discoveries. Weather prediction, nuclear fission, and astronomical discovery are all being advanced with GPUs today. Tools for scientific data analysis are incorporating AI and machine learning by default; this allows researchers to focus on the important aspects of science and perform more surveys to validate before experimentation.

 

Learn more about scikit-learn acceleration

🖲️ Demo

  • Test the GPU speed-ups in this demo made by Olivier Grisel, ML Engineer at Probabl and scikit-learn core maintainer.


About Dr. Andy Terrel

Andy leads CUDA Python from the product management team. His research focused on domain-specific languages to generate high-performance code for physics simulations with the PETSc and FEniCS projects. Andy is a leader in the Python open-source software community. He is most notably a co-creator of the Dask distributed computing framework, the Conda package manager, the SymPy symbolic computing library, and the NumFOCUS foundation.


For more from Probabl