POSTED: June 16, 2020
The Machine Learning Process
‘All models are wrong, but some are useful’
As this famous aphorism from statistician George Box reminds us, no model is ever going to be 100% accurate. If one is, run for the hills! Rather, models should be evaluated by their impact on the bottom line, or how useful they are to the business. (To be clear, an accurate model is almost always going to be more useful than an inaccurate model.)
In this blog post, we will explore a way in which models can be more useful, by embracing and leveraging uncertainty to maximize business results.
Much of the time, business users want a single number to represent the ‘goodness’ of a model, but machine learning models can tell us so much more than just a single number (like accuracy). With some small tweaks, business users can get dramatically more utility out of their models.
Almost all machine learning models fall under one of two paradigms:
- Regression – predicting a continuous outcome, such as a sales or budget number
- Classification – predicting a categorical outcome, such as a segment, or a grouped outcome.
Even many of the newest developments in machine learning and artificial intelligence (such as Computer Vision, Natural Language Processing, etc.) are just creative applications of these paradigms.
As Dr. Box informs us, these models are certain to have errors. How can they be useful to businesses then? Suppose I have a model that projects my business will have $2,000,000 in sales this next month, but only $1,900,000 comes in. The model was wrong, but was it seriously wrong? Was the $1,900,000 in the expected range of outcomes, or was there a serious issue with an underlying business process? A simple regression point prediction won’t tell you that.
With classification models, variation in the outcome can also be obscured. Suppose we have a model predicting whether the price of a stock will increase or decrease, and it says there is a 55% chance the stock price will go up. This doesn’t give visibility into the magnitude of the change. Is the expected range plus or minus 1%, 5%, 10%, or as much as 50%?
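To make the question concrete, here is a minimal sketch of how a predicted distribution could answer it. It assumes, purely for illustration, that the model's forecast is a normal distribution centered on the $2,000,000 from the example with a standard deviation of $75,000; that spread is a made-up number, not something taken from a real model.

```python
from statistics import NormalDist

# Hypothetical predictive distribution for next month's sales.
# The $2,000,000 mean comes from the example in the text; the $75,000
# standard deviation is an assumed value chosen only for illustration.
predicted_sales = NormalDist(mu=2_000_000, sigma=75_000)
actual_sales = 1_900_000

# Central 90% prediction interval: the 5th and 95th percentiles.
lower = predicted_sales.inv_cdf(0.05)
upper = predicted_sales.inv_cdf(0.95)
within_interval = lower <= actual_sales <= upper

# Probability of seeing a month at least this low if the model is right.
prob_this_low = predicted_sales.cdf(actual_sales)

print(f"90% interval: ${lower:,.0f} to ${upper:,.0f}")
print(f"Actual inside interval: {within_interval}")
print(f"P(sales <= actual): {prob_this_low:.1%}")
```

Under these assumed numbers the miss lands inside the 90% interval, so it would read as an unlucky month rather than a broken process. A tighter assumed spread would change that conclusion.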
Distributional Modeling, A Better Way
One solution that is not often discussed is to model entire probability distributions. Why is this an improvement on simple regression or classification models? Let’s look at the previously mentioned example of a sales projection and apply some modeled distributions to see whether we should be concerned that the model missed by $100,000. Assume that we have a model predicting the expected sales distribution for that month. The distribution accounts not just for the mean and standard deviation, but also for skewness (the direction in which the distribution is imbalanced) and tailweight (how heavy the tails are), so we can gain a lot of insight into possible edge scenarios. Consider each example as coming from a different parallel universe.
Each of these distributions has roughly the same mean, meaning a simple regression model would be at risk of predicting all of these scenarios as the same. As we see from the distributions, however, these are very different scenarios with different risk profiles, upsides, and downsides.
While specific interpretations would depend on the business itself, some potential interpretations would be:
- Example 1: One would interpret the month as disappointing, but well within the range of what would be expected to happen.
- Example 2: While the company missed its sales goal, there was a major downside risk that was avoided.
- Example 3: The month was disappointing, but the results were very near the edge of the distribution, meaning there may be a faulty underlying business process that needs fixing.
- Example 4: The month was about in line with expectations, but the company missed out on major upside.
As I’ve shown here, the shape of distributions can drastically change how you interpret results, beyond what you can see with traditional machine learning models.
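The effect of shape can be sketched with a few hand-built distributions. The three scenarios below are hypothetical and are not the distributions from the examples above: every weight, mean, and standard deviation is an assumption chosen so that all three share a $2,000,000 mean. The skewed case is approximated with a two-component normal mixture, which is one simple way to get a long left tail using only the standard library.

```python
from statistics import NormalDist

ACTUAL = 1_900_000  # the disappointing month from the running example

def mixture_mean(components):
    """Mean of a weighted mixture of normal distributions."""
    return sum(weight * dist.mean for weight, dist in components)

def mixture_cdf(components, x):
    """P(value <= x) under a weighted mixture of normal distributions."""
    return sum(weight * dist.cdf(x) for weight, dist in components)

# Each scenario is a list of (weight, distribution) pairs whose weights
# sum to 1. All parameters are illustrative assumptions.
scenarios = {
    "wide and symmetric": [(1.0, NormalDist(2_000_000, 150_000))],
    "narrow and symmetric": [(1.0, NormalDist(2_000_000, 40_000))],
    "left-skewed (downside risk)": [
        (0.85, NormalDist(2_060_000, 50_000)),
        (0.15, NormalDist(1_660_000, 100_000)),
    ],
}

results = {}
for name, components in scenarios.items():
    mean = mixture_mean(components)
    prob_at_or_below = mixture_cdf(components, ACTUAL)
    results[name] = (mean, prob_at_or_below)
    print(f"{name}: mean=${mean:,.0f}, P(sales <= actual)={prob_at_or_below:.2%}")
```

A point forecast would report the same number for all three scenarios, yet the probability of a month at or below $1,900,000 differs by more than an order of magnitude between the wide and narrow cases.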
What does this mean for business?
Distributions are a great way to understand and leverage uncertainty, but a business user often needs a single number to put in a report. For the sake of simplicity, you can still provide answers in traditional regression or classification ways! With distributions, you can pull the mean, median, or a random value from the distribution to answer a question in a regression way. You can also look at the probability of going over or under a certain value (like last year’s value) to answer the question in a classification way. The added context of the distribution doesn’t take away from the information needed to make business decisions.
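As a rough sketch of both styles of answer, assume the forecast is a normal distribution with a $2,000,000 mean and a $75,000 standard deviation, and that last year's value was $1,950,000. Those last two figures are hypothetical and exist only to make the example run.

```python
from statistics import NormalDist

# Assumed forecast distribution; the standard deviation is hypothetical.
forecast = NormalDist(mu=2_000_000, sigma=75_000)
last_year = 1_950_000  # hypothetical comparison value

# Regression-style answers: collapse the distribution to one number.
point_mean = forecast.mean
point_median = forecast.median
one_draw = forecast.samples(1, seed=42)[0]  # a single simulated month

# Classification-style answer: the probability of beating last year.
prob_beat_last_year = 1 - forecast.cdf(last_year)

print(f"Mean forecast:   ${point_mean:,.0f}")
print(f"Median forecast: ${point_median:,.0f}")
print(f"One random draw: ${one_draw:,.0f}")
print(f"P(beat last year): {prob_beat_last_year:.1%}")
```

The single numbers are still available for the report, and the full distribution stays on hand for anyone who wants to ask a follow-up question about risk.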
As Dr. Box admonishes, we need to think of the usefulness of our selected model. Think of your business. Would you rather have insight into a range of possible scenarios, or just a single value?
Covail can help
Two things we’re passionate about at Covail are trustworthiness and intelligent operations. Distributional modeling enables both. All predictions have some error; distributional modeling allows technical experts and business users to more fully evaluate which misses are due to chance and which are due to bad models, so you can be confident in the models you deploy.
Intelligent operations require an understanding of full scenarios, not just snapshots. As shown above, distributional modeling enables these views, allowing business users to operate with a full understanding of possible scenarios.
If you want to level up your machine learning models and take advantage of distributional modeling, please email us at firstname.lastname@example.org, call us at (614) 591-0440, or visit covail.com. We’re looking forward to optimizing your business.