So you used the goodness of DevOps and Agile methodologies to release an app that delivers real business value by automating processes, reducing swivel-chair and simplifying the user journeys (process before tools!). You built the app to be container-based with microservice goodness, compliant REST APIs and five types of databases (cloud native of course!).
As your users start using this app, you are amazed at all the high-quality, timely business data that is being collected thanks to shifting away from manual processes. You also get tons of data about the app that helps you support it and improve its reliability.
Suddenly some bright spark has the idea of processing all this business and app data to answer questions like ‘Can we forecast customer order journey time?’ and ‘Can we predict journeys that are likely to get stuck?’
You just got your first business problem with those magic keywords of ‘forecast’ and ‘predict’ that allow you to take scikit-learn for a spin!
You discuss with the benefits owner how they will measure the benefits of this. Thereafter, on a Friday afternoon, you find yourself installing Python and scikit-learn and downloading some data.
Congratulations – you have taken your first steps in MLOps – you are planning to build a model, understanding what features to use and thinking about how to measure its business performance.
Over the next week or so you build some forecasting and classification models that give you good results (business results – not AUC!). The business benefit owner is impressed and gives you the go-ahead to generate a report every week so that orders likely to get stuck can be triaged early. Now you need to start thinking about rebuilding the model regularly, checking its performance and comparing it with the active model. You don’t mind running the model on your laptop and emailing the report every week.
This is your second step in MLOps – to understand how to train/retrain your model and how to select the best one for use. For this you will need to establish feature pipelines that run as part of the training process and ensure the whole thing is a one-command operation so that you can generate the report easily.
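As a rough sketch of what ‘one command’ could look like, the snippet below bundles feature preparation and model fitting into a single scikit-learn `Pipeline`, so every retraining run repeats the same feature steps. The synthetic order data, the column layout (`n_items`, `days_open`, `channel`), the rule that marks an order as stuck and the use of recall as a stand-in for your business metric are all assumptions made for illustration. Swap in your own data and whatever measure you agreed with the benefits owner.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_synthetic_orders(n=400, seed=0):
    """Hypothetical stand-in for real order data.

    Columns: [n_items, days_open, channel], with channel coded 0..2.
    """
    rng = np.random.default_rng(seed)
    n_items = rng.integers(1, 20, size=n)
    days_open = rng.uniform(0, 30, size=n)
    channel = rng.integers(0, 3, size=n)
    # Assumed rule, for illustration only: big orders open for a long time get stuck.
    stuck = ((days_open > 15) & (n_items > 8)).astype(int)
    return np.column_stack([n_items, days_open, channel]), stuck


def build_pipeline():
    """Feature steps and model in one object, so they are always trained together."""
    features = ColumnTransformer([
        ("numeric", StandardScaler(), [0, 1]),
        ("channel", OneHotEncoder(handle_unknown="ignore"), [2]),
    ])
    return Pipeline([
        ("features", features),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def train_and_evaluate(seed=0):
    """Train on one split and score on held-out orders."""
    X, y = make_synthetic_orders(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    pipeline = build_pipeline().fit(X_train, y_train)
    # Recall on stuck orders is only a placeholder for the real business metric.
    recall = recall_score(y_test, pipeline.predict(X_test))
    return pipeline, recall


if __name__ == "__main__":
    _, recall = train_and_evaluate()
    print(f"recall on held-out stuck orders: {recall:.2f}")
```

Saved as a script (say `train.py`, a hypothetical name), the whole thing runs with `python train.py`, which is the kind of single entry point the weekly report can hang off.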
Then someone has a good idea – why not create a dashboard that updates daily and lists the order journeys with a high probability of getting stuck, so that we further improve our response time AND provide the order completion forecast to customers for better customer service (and to reduce inbound service calls)?
This puts you in a fix – because till now you were just running the model on your laptop every week and creating a report in MS Excel – now you need to grow your model and make it part of a product (the app).
It should be deployable outside your laptop, in a repeatable and scalable way (e.g. infra-as-code). Integrations also need to be worked on – your model will need access to feature data as well as an API for serving the results. You also need to stand up your model factory, which will retrain models, compare them with the existing ones (quality control) and deploy as required. You will also need to think about infrastructure and model support. Since it will be used daily and some benefit will depend on it working properly – someone needs to keep an eye on it!
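The quality-control part of that model factory can start out very small. Here is a minimal sketch, assuming each retraining run produces one score per model on the same hold-out data, where higher is better. The `Candidate` class, the `select_model` function and the `min_gain` margin are hypothetical names and values, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A trained model and its score on a shared hold-out set (higher is better)."""
    name: str
    business_score: float


def select_model(active, challenger, min_gain=0.02):
    """Return the model that should be serving after a retraining run.

    The challenger replaces the active model only if it beats it by at least
    min_gain, so noise in the score does not cause needless redeployments.
    The 0.02 default is an arbitrary illustrative margin.
    """
    if challenger.business_score >= active.business_score + min_gain:
        return challenger
    return active


if __name__ == "__main__":
    active = Candidate("model-week-12", 0.70)
    challenger = Candidate("model-week-13", 0.75)
    print(select_model(active, challenger).name)
```

Keeping the decision in one small function makes it easy to log why a model was or was not promoted, which helps whoever ends up supporting it.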
This is the third and biggest step in MLOps that moves you from the realm of ad-hoc ML to an ML Product – with product thinking around it like assurance, support, roadmap and feedback.
Now you have to check in all your code so that others can work on the app in case you are chilling on a beach somewhere.
This third big step brings hidden complexity in monitoring requirements. While the model was on your laptop being used on a weekly basis you did not have to worry about the ‘model environment’ or automated monitoring. You had time for manual monitoring and validation.
Now that the model will be deployed and run daily, with its output being used to drive customer interaction, we cannot depend on manual monitoring. Once the model is deployed it will continuously give forecasts (as orders are generated). If those forecasts are too optimistic then you will increase the inbound call pressure as people chase up on orders. If they are too conservative then you will reduce sales as customers might find the wait too long. Also, unlike software, your ML model could start misbehaving due to data drift (intended or unintended). If you don’t detect and correct for this, the forecast results from the model could stop adding any value (the best case) or actually increase customers’ pain (the worst case).
In traditional software we would trap these data issues at the edge through validations and log them as error events. Here, though, the data can make perfect business sense and still be something the model can’t make sense of when asked for a forecast or a prediction.
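One simple automated check for the optimistic/conservative problem is the average signed error of the forecasts once the real completion times are known. The sketch below assumes forecasts and actuals are journey times in days; the function names and the one-day tolerance are hypothetical choices for illustration.

```python
from statistics import mean


def forecast_bias(forecast_days, actual_days):
    """Mean signed error in days: negative means forecasts were too short."""
    if len(forecast_days) != len(actual_days) or not forecast_days:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(f - a for f, a in zip(forecast_days, actual_days))


def classify_bias(bias_days, tolerance_days=1.0):
    """Turn the bias into a label a dashboard or alert can use.

    tolerance_days is an assumed threshold; agree the real one with the business.
    """
    if bias_days < -tolerance_days:
        return "too optimistic"    # customers were promised earlier than delivered
    if bias_days > tolerance_days:
        return "too conservative"  # customers were quoted a longer wait than needed
    return "ok"


if __name__ == "__main__":
    bias = forecast_bias([5, 6, 7], [8, 9, 10])
    print(bias, classify_bias(bias))
```

A signed error is used on purpose here: an absolute error would tell you the forecasts are off, but not in which direction, and the two directions hurt the business in different ways.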
Therefore we need to start checking the data for drift, the model for bias and performance, the features for validity, and the business performance of the app for continued benefit realisation.
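For the data drift part, one commonly used measure is the Population Stability Index (PSI), which compares how a feature was distributed at training time with how it looks now. What follows is a minimal, standard-library sketch rather than a recommended implementation: the equal-width binning, the `eps` floor for empty bins and the 0.25 alert level (a widely quoted rule of thumb) are assumptions you would want to tune.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI of `current` against `reference` for a single numeric feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant reference feature: everything lands in an edge bin

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def has_drifted(reference, current, threshold=0.25):
    """Flag a feature whose PSI exceeds an assumed rule-of-thumb threshold."""
    return population_stability_index(reference, current) > threshold


if __name__ == "__main__":
    training_days_open = list(range(100))              # made-up training sample
    todays_days_open = [d + 50 for d in training_days_open]
    print(has_drifted(training_days_open, todays_days_open))
```

Run per feature on a schedule, a check like this catches data that still passes every business validation but no longer looks like what the model was trained on.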
Also, this step-by-step approach for AI/ML is old school. We need to adopt a more continuous approach to discovering new problems, which can either be farmed out to multi-disciplinary teams if the org is mature enough or be tackled in phases (going from analytics to prediction to prescription) if the org is still developing the right skill sets.