This is the first in a series of posts about using Machine Learning and optimization methods to improve the performance of marketplaces. From our own experience, we know there are lots of ways to improve a marketplace using Machine Learning, and a good implementation of these systems can bring a great competitive advantage.
But before diving deep into the nooks and crannies of any of these problems and solutions, we’ll give a brief introduction to each of them. After reading this post you will be comfortable explaining what a Marketplace is, what a Three-Sided Marketplace is and how Machine Learning can improve all of its sides.
Marketplace is a term that has been going around for a while now, but what is it exactly? Think of some of the companies that rose to relevance this decade: Uber, Airbnb, Amazon, DoorDash. What do they all have in common? Besides being tech-centric, they are all marketplaces.
Marketplaces are all about connecting vendors to customers, guided by a curated experience brought by the marketplace owner (in this case, any of the companies mentioned before). Unlike a traditional business, the marketplace owners do not own any of the vendors, but instead focus on bringing both parties involved the best possible experience.
Delivering a great experience for a marketplace is not easy, and sometimes it can be even harder when even more sides are involved. For example, a food delivery business with couriers would be a Three-Sided Marketplace: customers order food from a restaurant and are assigned a courier that brings the food to their door.
Three-Sided Marketplaces bring a new actor into play, who will receive a monetary compensation for completing the experience between the customer and the vendor. Traditional marketplaces have lots of possible business optimizations, and if they are three-sided then we gain even more interactions to optimize! What do we mean by this?
Let’s present an example of a possible Three-Sided Marketplace: suppose we have a book delivery platform where customers search for and buy books, bookstores offer their catalogues and, when a book is bought, package it up for a courier to pick up. The example could be about books, food or furniture; all of them would have similar opportunities to improve their business.
And what can we improve exactly? The best possible experience has to be delivered to all three parties involved, based on their interests. Users might want recommendations for books they might enjoy based on their search history or previous purchases. Bookstores might benefit from a forecast of how many orders will arrive over the following days, so they can stock up on a frequently requested title or organize their work hours around the times when most of the demand comes in. Meanwhile, giving the couriers schedules and benefits based on demand and supply would allow them to earn more money. This is where Machine Learning comes in to help us optimize each of these sides of the marketplace.
The term Machine Learning is everywhere nowadays but it’s often misunderstood. The truth is that it’s an application of Artificial Intelligence capable of creating systems that automatically improve through experience. The experience fed into the algorithm is just data, and there’s plenty of that if your marketplace handles many transactions throughout the day.
The predictions these systems produce allow us to make intelligent decisions to optimize each side of the marketplace. The best thing is that the same prediction can have multiple uses: an approximate preparation time for an order lets the user know when the order will arrive at their door, but it can also serve as one of the many possible inputs for better logistics, by assigning the correct courier to that order or dispatching them just in time to avoid getting there too late or too early.
Having good predictions to optimize the marketplace translates into a lower operational cost, a faster product and new opportunities that otherwise wouldn’t be there. These are some of the ways Machine Learning can help your business.
Knowing the demand for a specific day or hour lets you plan ahead: make sure you have enough stock, budget or man hours to handle the demand from your customers. When you can’t see the future, a demand forecast is the next best thing. Using forecasting methods, it’s possible to get a prediction of just how much demand you’ll have for any unit of time. In our book marketplace example, with a demand forecast bookstores would have an estimate of how many books will be ordered that day, allowing them to prepare their stock so that all customers get to buy their book. It’ll also help them know just how many employees they’ll need on-board to package all those orders.
The best thing is that the data you need for a decent forecast is quite simple: the amount of demand you had in the past. Machine Learning methods will take care of finding good estimates based on this historical data. The more specific your data, the better! Knowing that you got 100 orders on a Thursday is not the same as knowing that you got 100 orders on a rainy Thursday the week before Christmas. These extra pieces of information consumed by the model are called features, and they allow it to improve its accuracy.
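To make the idea of features concrete, here is a minimal sketch of a feature-aware baseline. It is not a trained Machine Learning model, just a lookup of past averages, and the history values, the `weekday` and `is_rainy` features and the function names are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (weekday, is_rainy, orders). Values are made up.
history = [
    ("thu", False, 100),
    ("thu", False, 110),
    ("thu", True, 140),
    ("thu", True, 150),
    ("fri", False, 120),
    ("fri", False, 130),
]

def fit_group_means(rows):
    """Average past demand per (weekday, is_rainy) and per weekday alone."""
    by_features = defaultdict(list)
    by_weekday = defaultdict(list)
    for weekday, is_rainy, orders in rows:
        by_features[(weekday, is_rainy)].append(orders)
        by_weekday[weekday].append(orders)
    return (
        {key: mean(values) for key, values in by_features.items()},
        {key: mean(values) for key, values in by_weekday.items()},
    )

def forecast(model, weekday, is_rainy):
    """Use the most specific average available, else fall back to the weekday."""
    by_features, by_weekday = model
    if (weekday, is_rainy) in by_features:
        return by_features[(weekday, is_rainy)]
    return by_weekday[weekday]

model = fit_group_means(history)
print(forecast(model, "thu", False))  # average of dry Thursdays
print(forecast(model, "thu", True))   # average of rainy Thursdays
print(forecast(model, "fri", True))   # no rainy Friday seen: weekday fallback
```

Even in this toy version the rainy-Thursday estimate differs from the dry-Thursday one, which is exactly the extra signal a real model would learn from its features.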
This is what a demand forecast plot usually looks like. Notice that the forecast is not always on point: it might under- or over-predict for a certain hour of the day. But what happens if it’s really important for my business that I never under-predict my demand? For example, I’d rather be over-stocked than let clients go away without their purchase. The good thing is that Machine Learning has us covered: business logic like this can be added to the system to encourage over- or under-predictions and get a result that is more useful for your specific business situation.
If we just have a forecast, someone has to be in charge of looking at it and making the hard decisions. Coming back to the book shop example, only a domain expert like the store owner or a seasoned employee would be able to translate the demand forecast into good business decisions, like how many employees are needed to handle that demand. This gets even harder the more complex the constraints are: what if I only have a limited amount of money to stock up on certain books, or I need to efficiently split the work between my employees to package certain orders before a tight deadline? These are all optimization problems, since we want to know how to get the most revenue while respecting all of those constraints. This is where Supply and Demand Optimization comes in, to make those hard decisions easy, automatic and as optimal as possible.
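One common way to encode a preference for over-predicting is an asymmetric loss. The sketch below uses the quantile (pinball) loss as an example of such business logic; it is our choice for illustration, and the demand values and function names are hypothetical.

```python
def pinball_loss(actual, predicted, quantile):
    """Quantile (pinball) loss: with quantile > 0.5, under-predicting costs more."""
    error = actual - predicted
    if error >= 0:  # under-prediction: real demand exceeded the forecast
        return quantile * error
    return (quantile - 1) * error  # over-prediction: forecast exceeded demand

def best_constant_forecast(demands, quantile):
    """Among the observed values, pick the forecast with the lowest total loss."""
    return min(
        sorted(set(demands)),
        key=lambda candidate: sum(
            pinball_loss(demand, candidate, quantile) for demand in demands
        ),
    )

# Hypothetical demand observations for the same hour on different days.
demands = [8, 9, 9, 10, 10, 11, 12, 13, 30]

balanced = best_constant_forecast(demands, quantile=0.5)
cautious = best_constant_forecast(demands, quantile=0.8)
print(balanced, cautious)
```

With `quantile=0.5` both kinds of error cost the same and the forecast lands on the median. Raising the quantile makes under-predictions more expensive, so the chosen forecast moves up.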
If you know about logistics, you’ll know that there are a lot of very good and well known mathematical and computational methods to solve these kinds of problems. Given your business goals and requirements you can associate a cost function to them. A cost function maps your business goals and constraints into inputs for a mathematical function that can be optimized.
These solutions coming from Operations Research are great, but can be improved with the help of Machine Learning. Remember the inputs to your cost function? Well, why not use one of your forecasts as one of them? If you have a forecast of your current supply, demand, and preparation time you’ll have a cost function which is better at modeling the reality where your business operates.
For example, if we want to accurately assign couriers based on demand and supply, having a real-time demand forecast as an input to our cost function will do wonders. Now the function knows how many orders to expect in that time slot, helping it achieve a more accurate assignment of resources. This helps avoid under-staffing peak hours and over-staffing hours with less work to do. Not only that, our cost function could also assign the same courier to deliver multiple packages on the same trip, improving the results even more!
In this example we can see how, using the demand forecast as input, we can intelligently assign couriers when they are actually needed.
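As a rough sketch of a forecast feeding a cost function, the snippet below picks a courier count per hour by brute force. The cost weights, the capacity of three orders per courier and the forecast values are all assumptions for illustration; a real system would likely rely on an Operations Research solver instead.

```python
def staffing_cost(couriers, forecast_orders, orders_per_courier=3,
                  idle_cost=1.0, unserved_cost=4.0):
    """Cost of one time slot: idle capacity is cheap, unserved orders are not."""
    capacity = couriers * orders_per_courier
    idle = max(capacity - forecast_orders, 0)
    unserved = max(forecast_orders - capacity, 0)
    return idle * idle_cost + unserved * unserved_cost

def plan_couriers(forecast_by_hour, max_couriers=20):
    """For every hour, pick the courier count that minimizes the slot cost."""
    return {
        hour: min(
            range(max_couriers + 1),
            key=lambda count: staffing_cost(count, orders),
        )
        for hour, orders in forecast_by_hour.items()
    }

# Hypothetical demand forecast (orders per hour).
forecast_by_hour = {"11:00": 4, "12:00": 14, "13:00": 9}
plan = plan_couriers(forecast_by_hour)
print(plan)
```

Because unserved orders are weighted more heavily than idle capacity, the plan rounds up at peak hours instead of leaving customers waiting.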
Demand Forecast is just one of the possible inputs for an optimization system. If time is one of the constraints, then having a preparation time prediction will be of great help.
As we mentioned, a preparation time prediction is a great tool to optimize supply and demand scenarios. Deliveries take time and users are not going to wait a month for their order to arrive. This means that your order optimization problem has a time constraint and, as with the demand forecast, a preparation time prediction is the next best thing when you can’t tell the future. Another similarity is that the data needed for these predictions is also simple: how long did your previous orders take? And of course, the more features the better the results: were those orders made in bad weather? Did that hour have high demand, causing slower service? This information can be easily translated into features to improve the accuracy of your prediction.
Your cost function will probably end up representing the time constraint somehow: the more time you waste, the higher the cost. Note that you can waste time both by getting to the destination late and by getting there early! A minute in which the courier waits with nothing to do is a minute wasted. A preparation time prediction gives you an approximate duration for the order, helping your cost function pick the right courier to take that order just in time.
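Here is a minimal sketch of such a time-aware cost. The penalty weights, the courier names and the travel times are hypothetical, and the predicted preparation time is assumed to come from a separate model.

```python
def timing_cost(courier_arrival, order_ready, wait_cost=1.0, late_cost=2.0):
    """Minutes wasted: the courier waits if early, the order waits if late."""
    if courier_arrival <= order_ready:
        return (order_ready - courier_arrival) * wait_cost
    return (courier_arrival - order_ready) * late_cost

def pick_courier(travel_minutes_by_courier, predicted_prep_minutes):
    """Choose the courier whose arrival best matches the predicted ready time."""
    return min(
        travel_minutes_by_courier,
        key=lambda courier: timing_cost(
            travel_minutes_by_courier[courier], predicted_prep_minutes
        ),
    )

# Hypothetical travel times (in minutes) from each courier to the store.
couriers = {"ana": 5, "ben": 14, "carla": 22}
print(pick_courier(couriers, predicted_prep_minutes=15))
print(pick_courier(couriers, predicted_prep_minutes=6))
```

The closest courier is not always the best choice: for an order that takes 15 minutes to prepare, a courier who is 14 minutes away wastes less time than one who arrives after 5 minutes and waits.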
Customers will also benefit from this information: getting a good approximate prep time for their order improves the user experience. Even better if it’s not just some hard coded number, but instead a dynamic prediction made with Machine Learning with features such as weather, traffic, current and forecasted demand, and more!
Speaking of customers, they are also producing tons of data each time they interact with a marketplace. What did they buy before? Did they rate those products positively or negatively? Having this information allows for personalized recommendations of products they might enjoy based on previous interactions.
If we want to build personalized recommendations for a user and we only look at their own data, we might not get very far: we’d probably end up recommending a product they bought before. This means we wasted a recommendation on something they were going to buy anyway!
Robust recommendations start by looking for similar users and items. Looking at the interactions of the user we are targeting, we can find users with similar purchases, or items that are usually bought together, that might be interesting to them.
In this example our target user has bought a shirt, a pair of shoes and a telescope. We can find other users with similar interests; assuming their tastes are similar, their previous purchases will make for good recommendations.
Whatever method we use to build them, recommendations benefit from being as smart as possible. A user might love or hate a recommendation depending on the time of day, their recent activity or whether they are presented with multiple recommendations. Properly ranking them and showing them in the correct order might make the difference.
For example, in a food delivery marketplace a customer will probably have different desires throughout the day: early on they might be more interested in having breakfast with coffee or lunch with a salad, while at night they might be interested in heavier meals.
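A minimal sketch of this idea is shown below, using Jaccard similarity between purchase sets. The users, their purchases and the scoring scheme are assumptions made for illustration; production recommenders typically use more elaborate models.

```python
from collections import Counter

# Hypothetical purchase history per user.
purchases = {
    "target": {"shirt", "shoes", "telescope"},
    "user_a": {"shirt", "shoes", "jacket"},
    "user_b": {"telescope", "star map", "shoes"},
    "user_c": {"blender"},
}

def jaccard(first, second):
    """Share of items two users have in common, between 0 and 1."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0

def recommend(user, purchases, top_n=2):
    """Score unseen items by the similarity of the users who bought them."""
    own_items = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own_items, items)
        for item in items - own_items:  # never recommend what they already own
            scores[item] += similarity
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]

print(recommend("target", purchases))
```

Items the target user already bought are excluded, and the blender from the unrelated user never makes the list because that user shares no purchases with our target.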
Remember how we mentioned before that we might waste a recommendation on someone who was already going to buy that product? A similar thing happens with advertisements. How can we know for sure it was our ad that generated the purchase? Did I just waste money by showing an ad to someone who was already going to purchase that product? And what if the ad actually worsened my probability of generating a purchase?
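This is where Uplift Modeling comes in: it allows us to model the impact of an action on our users and helps us answer the question of whether our advertisement campaign actually helped or not. It does this by assigning each user to one of four possible categories: persuadables, who buy only if they see the ad; sure things, who buy whether they see it or not; lost causes, who don’t buy either way; and sleeping dogs, who buy only if they don’t see the ad.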
Of course it would make sense to target the persuadables who might be interested in buying after seeing an advertisement. This means less wasted effort targeting people not interested in the ad, whether because they were already going to buy the product advertised or because they are simply not interested in seeing an ad about it.
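We can never observe the same user both with and without the ad, so uplift is usually estimated at group level by comparing users who saw the ad against a control group who didn’t. The sketch below shows that comparison; the segment names and counts are made up, and it assumes users were randomly split between the two groups.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """records: (segment, saw_ad, purchased) tuples. Returns segment -> uplift."""
    # For each segment, saw_ad -> [buyers, users]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, saw_ad, purchased in records:
        counts[segment][saw_ad][0] += int(purchased)
        counts[segment][saw_ad][1] += 1
    uplift = {}
    for segment, groups in counts.items():
        treated_buyers, treated_users = groups[True]
        control_buyers, control_users = groups[False]
        if treated_users == 0 or control_users == 0:
            continue  # cannot compare without both groups
        uplift[segment] = (
            treated_buyers / treated_users - control_buyers / control_users
        )
    return uplift

def expand(segment, saw_ad, buyers, users):
    """Helper that builds per-user records from aggregate counts."""
    return [(segment, saw_ad, index < buyers) for index in range(users)]

# Hypothetical campaign results.
records = (
    expand("new_readers", True, 30, 100) + expand("new_readers", False, 10, 100)
    + expand("loyal_readers", True, 60, 100) + expand("loyal_readers", False, 60, 100)
    + expand("recently_unsubscribed", True, 5, 100)
    + expand("recently_unsubscribed", False, 15, 100)
)

result = uplift_by_segment(records)
for segment, value in result.items():
    print(segment, round(value, 2))
```

A segment with positive uplift is where the persuadables are likely to be. Zero uplift suggests the ad made no difference, and negative uplift means the ad did more harm than good.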
This lets us build understandable and effective ad campaigns. Instead of going in blind and trying out different changes and seeing the results, Uplift Modeling lets us predict the response to our campaign and measure its effectiveness.
Marketplaces are usually completely open for anyone to sign up and start using them. Most users are there to receive or provide a service, but a small percentage of them will try to abuse the system for their own gain and to the detriment of the rest of the marketplace. Scams, identity theft and suspicious behavior are some of the common ways people commit fraud. If we think about our bookstore marketplace, it’s a big deal if a vendor is scamming users, but it’s an even bigger deal if our marketplace is a fintech: an account theft could empty out a user’s account. Even a small share of fraudulent transactions could bring a huge loss for the whole marketplace.
The fact that most interactions with the marketplace are not fraudulent is what fuels most Fraud Detection systems: detecting fraud can be treated as detecting outliers. Previously, Rule-Based systems were the go-to solution for detecting these cases, but they require fraud analysts to manually create and adjust the fraud scenarios to check. That’s where Machine Learning comes in: it lets us build a system that detects fraud automatically and in real time. It can be hard or even impossible for the human eye to detect subtle changes in user behavior or to alert on events that are out of place, but it’s not for a machine. Suspicious behavior can be defined for a user based on their own and other users’ previous actions. This means that the behavior of all users helps determine what is and what isn’t an anomaly, using things like their average purchase or transaction amount for a certain time of the day. Machine Learning can find good upper and lower bounds for those values, and automatically detect and report any outlier.
In this example we have bounds around each transaction for a user based on the average transaction. At 12PM, a suspicious transaction falls outside the bounds and is detected by the system. This could mean that an uncommon and suspicious activity took place, like buying a really expensive product or transferring lots of money to an unknown account.
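The simplest version of such bounds is the historical mean plus or minus a few standard deviations. The sketch below uses that rule as a statistical stand-in for bounds learned by a model; the amounts, the timestamps and the choice of three standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev

def transaction_bounds(history, width=3.0):
    """Lower and upper bounds: mean plus or minus `width` standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread

def flag_outliers(history, new_transactions, width=3.0):
    """Return the (time, amount) pairs that fall outside the bounds."""
    low, high = transaction_bounds(history, width)
    return [
        (time, amount)
        for time, amount in new_transactions
        if not low <= amount <= high
    ]

# Hypothetical past transaction amounts for one user.
history = [20, 25, 22, 30, 18, 27, 24, 21]
# Hypothetical transactions made today.
today = [("10AM", 23), ("11AM", 29), ("12PM", 400), ("1PM", 19)]

print(flag_outliers(history, today))
```

Only the 12PM transaction is reported, since its amount is far above anything this user has spent before.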
These are just some of the possible ways to use Machine Learning to boost the performance of your marketplace. Each marketplace has its own complexities and challenges, and the more sides it has the more opportunities to optimize each of them!
All of these solutions should translate into an automatic, reliable and efficient software system, easily scalable and monitorable. Building a good Machine Learning model ends up being just a small part of the process, accompanied by good software engineering, data and business understanding and a scalable infrastructure. It might be easy to try out some of these solutions by just calling a programming library, but it’s no use if it can’t be deployed and consumed reliably by each side of the marketplace!
To avoid common pitfalls and accelerate time to market you will surely benefit from having expertise in building such systems, and we have tons of that at Mutt Data. Our process includes understanding the business and data to build a custom solution, using Data Engineering tools such as Airflow and MLflow to build and monitor our systems. We deploy to Public Cloud Providers like Amazon Web Services or Google Cloud, which greatly speed up the construction of these systems and simplify their maintenance. You can read more about our solutions and processes on our website.
The header picture belongs to Postcards of markets category of Wikimedia Commons:
Champlain Market, Quebec: Unknown author, from the Leonard A. Lauder collection of Raphael Tuck & Sons postcards. Public domain.