RL requires a lot of data, and as such, it has often been associated with domains where simulated data is available (gameplay, robotics). It also isn’t easy to take results from research papers and implement them in applications. Reproducing research results can be challenging even for RL researchers, let alone regular data scientists. As machine learning gets deployed in mission-critical situations, reproducibility and the ability to estimate error become essential. So, at least for now, RL may not be ideal for mission-critical applications that require continuous control.

These challenges notwithstanding, there are already interesting applications and products that rely on RL. There are many settings involving personalization, or the automation of well-defined tasks, that would benefit from sequential decision-making that RL can help automate (or at least, where RL can augment a human expert). The key for companies is to start with simple use cases that fit this profile rather than overly complicated problems that “require AI.” To make things more concrete, let me highlight some of the key application domains where RL is beginning to appear.

Industrial robots are capable of extreme precision and speed, but they normally need to be programmed very carefully in order to do something like grasp an object. This is difficult and time-consuming, and it means that such robots can usually work only in tightly controlled environments. A Fanuc robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge, and trains itself to do the job with great speed and precision. After eight hours or so, it reaches 90% accuracy or above, which is almost the same as if an expert had programmed it.
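To make the trial-and-error idea concrete, here is a minimal sketch of learning from success and failure. It is not Fanuc's system and it is not deep RL: it swaps in a much simpler technique, an epsilon-greedy bandit, and every name and number in it (the three grasp strategies, their success rates, `train_grasp_policy`) is a hypothetical chosen purely for illustration.

```python
import random


def train_grasp_policy(success_prob, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a fixed set of grasp strategies.

    success_prob holds the (hypothetical) true success rate of each
    strategy; the learner never sees these values, only the outcome of
    each attempt. Returns the estimated success rates and attempt counts.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    estimates = [0.0] * n  # running estimate of each strategy's success rate
    counts = [0] * n       # number of times each strategy was attempted

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random strategy.
            arm = rng.randrange(n)
        else:
            # Exploit: use the strategy that currently looks best.
            arm = max(range(n), key=lambda a: estimates[a])

        # Simulated outcome of the grasp: 1.0 on success, 0.0 on failure.
        reward = 1.0 if rng.random() < success_prob[arm] else 0.0

        # Incremental mean update of the estimate for the chosen strategy.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


# Three made-up grasp strategies with made-up success rates.
estimates, counts = train_grasp_policy([0.2, 0.5, 0.9])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated success rates:", [round(e, 2) for e in estimates])
print("attempts per strategy:", counts)
print("preferred strategy:", best)
```

The learner starts with no knowledge, records the outcome of every attempt, and gradually concentrates its attempts on the strategy that succeeds most often. A real grasping system has to generalize across objects and camera images, which is where deep networks come in.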

Pit.ai uses RL for trading strategies. RL is turning out to be a robust tool for training systems to optimize financial objectives.

Online platforms are beginning to experiment with using machine learning to create personalized experiences. Several researchers are investigating the use of RL and other machine learning methods in tutoring systems and personalized learning. The use of RL can lead to training systems that provide custom instruction and materials tuned to the needs of individual students. A group of researchers is developing RL algorithms and statistical methods that require less data for use in future tutoring systems.

Companies collect a lot of text, and good tools that can help unlock unstructured text will find users. Earlier this year, AI researchers at Salesforce used deep RL for abstractive text summarization (a technique for automatically generating summaries from text based on content “abstracted” from some original text document). This could be an area where RL-based tools gain new users, as many companies are in need of better text mining solutions.

RL is also being used to allow dialog systems (i.e., chatbots) to learn from user interactions and thus improve over time (many enterprise chatbots currently rely on decision trees). This is an active area of research and VC investment: see Semantic Machines and VocalIQ (acquired by Apple).

Microsoft recently described an internal system called Decision Service that has since been made available on Azure. The accompanying paper describes applications of Decision Service to content recommendation and advertising. More generally, Decision Service targets machine learning products that suffer from failure modes including “feedback loops and bias, distributed data collection, changes in the environment, and weak monitoring and debugging.”

Other applications of RL include cross-channel marketing optimization and real-time bidding systems for online display advertising.