The A.I. Journals: What is Reinforcement Learning?

Whether in self-driving cars or automated personal assistants, Artificial Intelligence (A.I.) is becoming an integral part of our lives. In the first installment of The A.I. Journals, our new series dedicated to Artificial Intelligence, we'll take some time to explain one of the most interesting fields in the A.I. world: Reinforcement Learning.

Reinforcement Learning is an area of Machine Learning that explores how software agents learn by interacting with an environment and receiving feedback, in the form of rewards or penalties, on the actions they take. A software agent is a goal-oriented program that runs without constant supervision to perform a certain task for an end user or another program. Instead of being shown the correct action for each situation (as in Supervised Learning), a Reinforcement Learning agent learns on its own and decides independently which action to take next.

The agent uses two methods to make decisions. Either it chooses an action based on past experience or, if it has been placed in a completely new environment with no previous experience to draw on, it relies on a strategy we are all familiar with: trial and error. After taking an action, the agent learns from the outcome of its decision and adjusts accordingly.
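The decision loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any production system works: the environment, its reward values and the class names `ToyEnvironment` and `Agent` are all hypothetical. The agent tries any action it has no experience with (trial and error), otherwise picks the action with the best average reward so far (past experience), and updates its estimate after every outcome.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: each action pays a noisy reward around a hidden mean."""

    def __init__(self, true_means, noise=0.05, seed=0):
        self.true_means = true_means
        self.noise = noise
        self.rng = random.Random(seed)

    def step(self, action):
        # Return a reward drawn from a normal distribution around the action's hidden mean.
        return self.rng.gauss(self.true_means[action], self.noise)


class Agent:
    def __init__(self, n_actions):
        self.counts = [0] * n_actions    # how many times each action has been tried
        self.values = [0.0] * n_actions  # running average reward observed per action

    def choose(self):
        # Trial and error: if some action has never been tried, try it.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        # Past experience: otherwise pick the action with the best average so far.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental mean: move the estimate toward the observed outcome.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(steps=100):
    # Hypothetical setup: action 1 is the best choice, but the agent does not know that.
    env = ToyEnvironment(true_means=[0.2, 0.8, 0.5])
    agent = Agent(n_actions=3)
    for _ in range(steps):
        action = agent.choose()
        reward = env.step(action)
        agent.learn(action, reward)
    return agent


if __name__ == "__main__":
    trained = run()
    print(trained.counts, [round(v, 2) for v in trained.values])
```

Because this agent stops experimenting once every action has been tried, it can lock onto a worse choice if its early outcomes happen to be misleading. That limitation is exactly why the balance between exploitation and exploration, discussed later in this article, matters.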

Well-known applications of Reinforcement Learning include Honda's humanoid robot ASIMO, Google's DeepMind (in particular AlphaGo, the first program ever to defeat a professional Go player) and self-driving cars.

For those of you in digital advertising, it may be interesting to learn that Reinforcement Learning is also used to optimize programmatic buying. Campaign optimization engines rely on Reinforcement Learning to learn from the outcomes of their optimization decisions and continuously improve their next actions to achieve better performance.

A good example is the trade-off between exploitation and exploration in campaign optimization. When a campaign optimization engine is set up to optimize a campaign, it receives specific instructions about the campaign's goal (reach, engagement, etc.) and, in most cases, has past performance data to rely on. The engine knows which tactics have produced good results: which consumer segments to target, how much to bid and at which hours of the day, and so on. By exploiting these proven tactics, the engine can build on them and refine them further.

However, the market is dynamic and consumer behaviour shifts every day. There are always new opportunities that could significantly improve campaign performance. So while applying proven tactics, the optimization engine also spends part of its effort exploring: looking for new consumers to target, new locations and new time frames that could bring better results and diversify the campaign. When faced with this unfamiliar territory, the engine relies on its continuously learning nature to test the most promising actions and learn from their outcomes.
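One simple, widely taught way to balance exploitation and exploration is the epsilon-greedy strategy: most of the time, use the tactic with the best observed success rate, but with a small probability (epsilon) test a random tactic instead. The sketch below applies it to made-up campaign tactics. The tactic names, past data, hidden success rates and the function `run_epsilon_greedy` are all hypothetical, and this is not presented as how any particular optimization engine works.

```python
import random


def run_epsilon_greedy(true_rates, past_data, epsilon=0.1, rounds=20_000, seed=42):
    """Epsilon-greedy sketch for choosing among campaign tactics.

    true_rates: hidden success probability of each tactic (known only to the simulation).
    past_data:  {tactic: (successes, trials)} from earlier campaigns; (0, 0) means a new tactic.
    """
    rng = random.Random(seed)
    tactics = list(past_data)
    successes = {t: s for t, (s, _) in past_data.items()}
    trials = {t: n for t, (_, n) in past_data.items()}

    def estimate(tactic):
        # Observed success rate so far; a never-tried tactic starts at 0.0.
        return successes[tactic] / trials[tactic] if trials[tactic] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            tactic = rng.choice(tactics)         # explore: test a random tactic
        else:
            tactic = max(tactics, key=estimate)  # exploit: use the best tactic so far
        # Simulated outcome: 1 for a success (e.g. a conversion), otherwise 0.
        reward = 1 if rng.random() < true_rates[tactic] else 0
        successes[tactic] += reward
        trials[tactic] += 1

    return trials, {t: estimate(t) for t in tactics}


if __name__ == "__main__":
    # Hypothetical numbers: two tactics with history, plus a new tactic that is actually better.
    past = {"segment_a_evening": (30, 1000), "segment_b_morning": (20, 1000), "segment_c_weekend": (0, 0)}
    hidden = {"segment_a_evening": 0.03, "segment_b_morning": 0.02, "segment_c_weekend": 0.08}
    trials, estimates = run_epsilon_greedy(hidden, past)
    print(trials)
    print({t: round(r, 3) for t, r in estimates.items()})
```

In this simulation, the occasional random tests let the engine discover that the untried weekend tactic outperforms the historically proven one, after which exploitation shifts most of the budget to it. A fixed epsilon is only the simplest option; other strategies, such as upper confidence bounds or Thompson sampling, explore more selectively.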