This approach combines the strengths of two algorithms already well known in reinforcement learning, and believed to operate in humans and rodents as well. "Model-based" algorithms learn a model of the environment, which can then be simulated to produce estimates of future reward, while "model-free" algorithms learn estimates of future reward directly from experience in the environment. Model-based algorithms are flexible but computationally expensive; model-free algorithms are computationally cheap but inflexible.
The algorithm inspiring our theory combines some of the flexibility of model-based algorithms with the efficiency of model-free ones. Because the computation is a simple weighted sum, it is as computationally efficient as a model-free algorithm. At the same time, by separating reward expectations from state expectations (the predictive map), it can adapt quickly to changes in reward by updating the reward expectations alone, leaving the state expectations unchanged (see our paper for more details).
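The weighted sum described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the four-state chain environment, the variable names, and the reward values are all assumptions made for the example. State expectations are held in a successor matrix `M`, reward expectations in a vector `R`, and value is their product.

```python
import numpy as np

n_states = 4  # hypothetical 4-state chain: 0 -> 1 -> 2 -> 3 (absorbing)
gamma = 0.9   # discount factor

# Transition matrix for the deterministic chain.
T = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    T[s, s + 1] = 1.0
T[-1, -1] = 1.0  # terminal state loops on itself

# State expectations (the predictive map): M[s, s'] is the expected
# discounted future occupancy of s' starting from s. For a fixed
# policy it has the closed form (I - gamma * T)^-1.
M = np.linalg.inv(np.eye(n_states) - gamma * T)

# Reward expectations: reward only in the terminal state (assumed).
R = np.array([0.0, 0.0, 0.0, 1.0])

# Value is a simple weighted sum: V(s) = sum over s' of M[s, s'] * R[s'].
V = M @ R

# If the reward changes, only R is updated; M is reused unchanged,
# which is what makes adaptation to new rewards fast.
R_new = np.array([0.0, 0.0, 0.0, 2.0])
V_new = M @ R_new
```

Because `M` depends only on the dynamics and the policy, doubling the terminal reward simply doubles every value, without relearning anything about the environment's structure.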
In future work, we intend to test the theory further. Since the predictive map theory can be instantiated as a neural network architecture, we want to explore the extent to which this learning strategy can support flexible and rapid planning in silico.
More generally, a major challenge for the future is to work out how the brain integrates different types of learning. Although we presented this model as an alternative to model-based and model-free learning in the brain, a more realistic view is that the brain coordinates many kinds of learning simultaneously as it learns and plans. Understanding how these learning algorithms are combined is an important step toward understanding the human and animal brain, and may provide key insights for designing similarly complex and versatile artificial intelligence.
Read the full paper