Reinforcement learning: the naturalist, the hedonist and the disciplined

Elena Nisioti · Dec 1

Embracing the chaos of a biological brain and the order of an electronic one.

The pursuit of artificial intelligence has always been intermingled with another struggle, more philosophical, more romantic, less tangible: the understanding of human intelligence.

Although current breakthroughs in supervised learning seem to rest on optimized hardware, sophisticated training algorithms and over-complicated neural network architectures, reinforcement learning is still as old-school as it gets. The idea is quite simple: you are a learning agent in an environment. Our recent successes are not just a product of deep neural networks, but of a deep history of observations, conclusions and attempts to comprehend the mechanisms of learning.

Reinforcement learning is a field whose origins are hard to trace. The algorithms that we still use today required ideas, such as classical conditioning and temporal-difference learning, to formalize the process of learning. Had it not been for a handful of curious biologists, psychologists and non-conformist computer scientists, the AI community would probably not possess the tools to implement learning.

How do we act in unforeseen situations? This question differentiates reinforcement learning from supervised learning, since an agent tries alternatives and selects among them by comparing their consequences. It is associative.

"Excellence, then, is not an act, but a habit." ― Aristotle

A hedonist's guide to learning

When it comes to analyzing the human mind, Klopf is quite concise: "What is the fundamental nature of man? Man is a hedonist."

In his controversial book The Hedonistic Neuron — A Theory of Memory, Learning, and Intelligence, Klopf employs neuroscience, biology, psychology and the disarming simplicity and curiosity of his reasoning to persuade us that our neurons are hedonists. Can such a view lead to explanations for memory, learning and, more generally, intelligence? What was missing, according to Klopf, were the hedonic aspects of behavior: the drive to achieve some result from the environment, to control the environment toward desired ends and away from undesired ends.

In an extensive chapter that criticizes the then-current principles of cybernetics, as machine learning was termed at the time, one can highlight three lines of attack. Should we use deep neural networks? Just to be clear, in the 1950s two layers sufficed for a network to be termed deep.

Nowadays, RL algorithms mainly employ temporal-difference learning, which means that when calculating the "quality" of an action in order to make a decision, we also consider future rewards. The temporal-difference and optimal control threads were fully brought together in 1989 with Chris Watkins's development of Q-learning, one of the most famous reinforcement learning algorithms. In 1992, Tesauro applied temporal-difference learning to agents that played backgammon. This is the moment and application that persuaded the research community that there is potential in this type of machine learning.

Although current threads of research focus on deep learning and games, we would not have the field of reinforcement learning today had it not been for a bunch of guys talking about cats, neurons and dogs. One could say that the reward we got from solving backgammon, an until then unimaginably difficult task, motivated us to further explore the potential of reinforcement learning.
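The temporal-difference idea behind Q-learning can be made concrete in a few lines. Below is a minimal sketch on a toy five-state chain environment of my own invention (the environment, reward scheme and hyperparameters are illustrative assumptions, not taken from the article): the "quality" of an action is updated using the immediate reward plus a discounted estimate of future rewards.

```python
import random

# Toy 5-state chain (hypothetical example environment): the agent starts at
# state 0 and earns reward 1.0 only upon reaching state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.1, 500

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return next_state, (1.0 if next_state == GOAL else 0.0), next_state == GOAL

def train(seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(EPISODES):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration: try alternatives and compare
            # their consequences, as the article describes.
            if random.random() < EPSILON:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Temporal-difference update: immediate reward plus discounted
            # estimate of future rewards, minus the current estimate.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q = train()
```

After training, the greedy policy (picking the action with the higher Q-value in each state) walks straight toward the goal, which is the behavior the temporal-difference update is meant to produce.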
