Effectively deploying and scaling ML models across the development pipeline requires a mix of machine learning, software engineering, and operational skills that is rarely found in a single person, or even in a single team. Additionally, organizations with hundreds of models today face unique challenges arising from the heterogeneity of ML workflows and the siloed nature of these teams.
In this talk, we will discuss the whys and hows of streamlining model release and model management using a model registry.
As providers of an end-to-end MLOps platform, we find that autoscaling ML inference is a frequent customer ask. Recently, serverless computing has been touted as the panacea for elastic compute that can provide flexibility and lower operating costs. However, for ML, the need to precisely define hardware configurations and the long warm-up times of certain ML models exacerbate the limitations of serverless. To provide the best solution to our customers, we have run extensive benchmarking experiments comparing the performance of serverless and traditional computing for inference workloads running on Kubernetes (with KubeFlow and with the ModelDB MLOps Toolkit). Our experiments have spanned a variety of model types, data modalities, hardware, and workloads. In this talk, we present the results from our benchmarking study and provide a guide to architecting your own k8s-based ML inference system.
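The core trade-off the benchmarks probe can be sketched in a few lines: serverless back-ends that scale to zero pay a cold-start (model load) penalty that dominates tail latency, while dedicated replicas pay it once. The simulation below is an illustrative toy, not our actual benchmarking harness; the cold-start and inference costs are assumed numbers chosen only to show the effect on p50 vs. p99.

```python
# Toy simulation of cold-start impact on inference latency percentiles.
# COLD_START_S and INFER_S are assumed values for illustration only.

def simulate_request(warm: bool) -> float:
    """Simulated inference call; a cold start pays a model-load penalty."""
    COLD_START_S = 0.5   # assumed container spin-up + model load cost
    INFER_S = 0.02       # assumed steady-state inference latency
    return INFER_S + (0.0 if warm else COLD_START_S)

def benchmark(n_requests: int, keep_warm: bool) -> dict:
    latencies = []
    warm = False
    for _ in range(n_requests):
        latencies.append(simulate_request(warm))
        warm = keep_warm  # serverless may scale to zero between requests
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99) - 1],
    }

serverless = benchmark(100, keep_warm=False)  # every request may cold-start
dedicated = benchmark(100, keep_warm=True)    # only the first request does
print("serverless p99:", serverless["p99"], "dedicated p99:", dedicated["p99"])
```

Under these assumptions the dedicated deployment's p99 stays near the steady-state latency, while the serverless p99 absorbs the full cold-start cost; real numbers depend heavily on model size and hardware, which is exactly what the benchmarking study measures.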
More ML models are being built today than ever before. However, whether you are a researcher writing a paper, an actuary, or an ML engineer building ML products, reproducing models, tracking their lineage, and versioning them still remains a big challenge. This seemingly simple problem of reproducing models has exposed data science teams in regulated industries to hefty fines, caused ML engineers to spend days remedying issues in production deployments, and caused researchers to spend weeks re-creating results from papers.
At MIT, we faced these challenges of research reproducibility first-hand and developed an open-source model versioning and management tool called ModelDB. Unlike tools that only perform model tracking (e.g., metrics, hyperparameters, checkpoints), ModelDB is the first system to version all the ingredients required to create the model, namely the model code, data, configuration, and environment. Each of these ingredients is snapshotted and stored so that any model can be reproduced from scratch. Since its development, we have used ModelDB to enable reproducible research and model development across many application areas.
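The snapshotting idea can be sketched with content addressing: hash each ingredient, then derive a version ID from the ingredient hashes so that any change to code, data, configuration, or environment produces a new version. This is a minimal illustrative sketch, not ModelDB's actual storage format.

```python
import hashlib
import json

def snapshot(ingredient: bytes) -> str:
    """Content-address one ingredient, Git-style."""
    return hashlib.sha256(ingredient).hexdigest()

def model_version(code: bytes, data: bytes, config: dict, environment: list) -> dict:
    """Snapshot every ingredient needed to rebuild the model from scratch."""
    parts = {
        "code": snapshot(code),
        "data": snapshot(data),
        "config": snapshot(json.dumps(config, sort_keys=True).encode()),
        "environment": snapshot("\n".join(sorted(environment)).encode()),
    }
    # The version ID hashes the ingredient hashes, so changing any one
    # ingredient yields a new, uniquely identifiable model version.
    parts["version_id"] = snapshot(json.dumps(parts, sort_keys=True).encode())
    return parts

v1 = model_version(b"def train(): ...", b"row1,row2", {"lr": 0.01}, ["scikit-learn==1.4"])
v2 = model_version(b"def train(): ...", b"row1,row2", {"lr": 0.02}, ["scikit-learn==1.4"])
```

Here `v1` and `v2` differ only in one hyperparameter, yet they receive distinct version IDs, which is what makes from-scratch reproduction of either one possible.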
In this talk, I will discuss why model versioning is important and only continues to increase in value, present real-world applications where model versioning was able to safeguard against significant fines and save hundreds of researcher-hours, and show how, by using a simple open-source tool like ModelDB, any data scientist using Python can make their models reproducible.
We are excited to pull the covers off the Verta MLOps platform, which we have been heads-down developing and battle-testing. We have been privileged to work with several top customers, including one of the world's leading workplace collaboration companies and several other AI-forward enterprises, demonstrating outcomes including a 10x increase in the speed of new model deployment.
Get a peek into how Verta supports the full MLOps lifecycle.
While machine learning is spreading like wildfire, very little attention has been paid to the ways it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked and/or degrade quickly if the data changes. Having a well-understood MLOps process is necessary for ML security!
In this talk, we will demonstrate the common ways machine learning workflows go wrong, and how to mitigate them using MLOps pipelines that provide reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with less risk, than ever before.
No one set out to 'do devops'. That name was retroactively given to the tools and practices which emerged from the pressure to deliver ever faster at increasingly greater scale. A quick history of this emergent evolution, the pressures behind it, the lessons learned, and a discussion of how they might be applied in other domains.
Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g., spark.ml, scikit-learn). A common set of abstractions enables ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
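The value of a structured representation is that model metadata becomes queryable. The toy store below illustrates the idea with a predicate-based query over logged records; the class and field names are hypothetical and not ModelDB's actual client API.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    """Hypothetical structured record of one trained model."""
    name: str
    framework: str
    hyperparameters: dict
    metrics: dict

class ModelStore:
    """Toy in-memory stand-in for a model metadata store."""
    def __init__(self):
        self._records = []

    def log(self, record: ModelRecord) -> None:
        self._records.append(record)

    def query(self, predicate) -> list:
        """Return all records matching an arbitrary predicate."""
        return [r for r in self._records if predicate(r)]

store = ModelStore()
store.log(ModelRecord("churn", "scikit-learn", {"max_depth": 4}, {"auc": 0.81}))
store.log(ModelRecord("churn", "spark.ml", {"max_depth": 8}, {"auc": 0.86}))

# "Which churn models beat AUC 0.85?" -- the kind of cross-framework
# query a structured metadata representation makes possible.
best = store.query(lambda r: r.name == "churn" and r.metrics["auc"] > 0.85)
```

Because records from spark.ml and scikit-learn share one schema, a single query spans models built in different environments, which is the point of the common abstractions.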
In this talk, we focus on the impact of model versioning on stable and reliable MLOps. During the talk, we will demonstrate two MLOps pipelines: one with a model versioning solution as its foundation and one without, both using Jenkins for building and delivery and Prometheus/Grafana for monitoring. Through a few real-world simulations, we will show how a robust model versioning system can enable fast remediation of incidents and ensure that the MLOps pipeline runs reliably. We will wrap up with a few best practices on building an MLOps pipeline using open-source components.
- What does an MLOps stack look like?
- How to build an MLOps pipeline with open-source technologies, namely ModelDB, Jenkins, and Prometheus?
- Why is versioning key to robust operations?
- Basic understanding of DevOps toolchain including Git, Jenkins, and optionally Prometheus
- Attendees should have some experience building models
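The incident-remediation scenario from the talk can be sketched in miniature: a versioned registry keeps an ordered deployment history, so when a monitoring alert fires (in the talk, via Prometheus), the pipeline can roll back to the last known-good version. Everything below is a hypothetical, synchronous toy, not the API of any of the tools named above.

```python
class ModelRegistry:
    """Toy stand-in for a versioning system backing the pipeline."""
    def __init__(self):
        self.versions = []  # ordered history of deployed versions

    def deploy(self, version: str) -> None:
        self.versions.append(version)

    def rollback(self) -> str:
        # Incident remediation: drop the bad version, restore the previous one.
        self.versions.pop()
        return self.versions[-1]

def check_and_remediate(registry: ModelRegistry, error_rate: float,
                        threshold: float = 0.05) -> str:
    """What a monitoring alert handler might do, sketched synchronously."""
    if error_rate > threshold:
        return registry.rollback()
    return registry.versions[-1]

registry = ModelRegistry()
registry.deploy("model:v1")
registry.deploy("model:v2")  # the bad release in the simulation

# Monitoring reports a 20% error rate -> pipeline rolls back to v1.
serving = check_and_remediate(registry, error_rate=0.20)
```

Without the versioned history, the `rollback` step has nothing to restore, which is the contrast the two demo pipelines are meant to make concrete.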
In a field that is rapidly evolving but lacks infrastructure to operationalize and govern models, ModelDB 2.0 provides the ability to version the full modeling process including the underlying data and training configurations, ensuring that teams can always go back and re-create a model, whether to remedy a production incident or to answer a regulatory query.
- Layered API-focused client: easy extension of functionality and integration with frameworks
- Integration with popular ML frameworks
- Artifact management: reliably track the result of the training process
- Git-based versioning for all components of a model
- Single pane of glass for a company’s model development
- User management support for authentication, RBAC authorization and workspace isolation
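The Git-based versioning bullet above can be made concrete with a commit-graph sketch: each commit records content hashes for the model's components and a parent pointer, so teams can walk history and diff two versions to see exactly what changed. This is an illustrative toy, not the actual ModelDB 2.0 API; the blob names and hash strings are made up.

```python
import hashlib
import json

class Commit:
    """Git-style commit over a model's components (illustrative toy)."""
    def __init__(self, blobs: dict, parent=None):
        self.blobs = dict(blobs)  # component name -> content hash
        self.parent = parent
        payload = json.dumps(
            {"blobs": self.blobs, "parent": parent.id if parent else None},
            sort_keys=True,
        )
        self.id = hashlib.sha256(payload.encode()).hexdigest()

def diff(a: Commit, b: Commit) -> set:
    """Which components changed between two commits."""
    keys = set(a.blobs) | set(b.blobs)
    return {k for k in keys if a.blobs.get(k) != b.blobs.get(k)}

base = Commit({"code": "sha-code-1", "data": "sha-data-1",
               "config": "sha-cfg-1", "environment": "sha-env-1"})
# A new training run that only changed the configuration:
tweaked = Commit({**base.blobs, "config": "sha-cfg-2"}, parent=base)

changed = diff(base, tweaked)  # -> {"config"}
```

Answering a regulatory query or remedying a production incident then reduces to walking parent pointers back to the commit in question and re-creating the model from its recorded components.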
Models are the new code. While machine learning models are increasingly being used to make critical product and business decisions, the process of developing and deploying ML models remains ad hoc. In the "wild west" of data science and ML tools, versioning, management, and deployment of models are massive hurdles in making ML efforts successful. As creators of ModelDB, an open-source model management solution developed at MIT CSAIL, we have helped manage and deploy a host of models ranging from cutting-edge deep learning models to traditional ML models in finance. In each of these applications, we have found that the key to enabling production ML is an often-overlooked but critical step: model versioning. Without a means to uniquely identify, reproduce, or roll back a model, production ML pipelines remain brittle and unreliable.
In this webinar, we draw upon our experience with ModelDB and Verta to present best practices and tools for model versioning, and show how a robust versioning solution (akin to Git for code) can streamline DS/ML work, enable rapid deployment, and ensure high quality of deployed ML models.