Reinforcement learning of 2-joint virtual arm reaching in computer model of sensory and motor cortex

Title: Reinforcement learning of 2-joint virtual arm reaching in computer model of sensory and motor cortex
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Neymotin S. A., Chadderdon G. L., Kerr C. C., Francis J. T., & Lytton W. W.
Conference Name: Society for Neuroscience 2012 (SFN '12)
Keywords: SFN, Society for Neuroscience
Abstract

Few attempts have been made to model learning of sensory-motor control using spiking neural units. We trained a 2-degree-of-freedom virtual arm to reach for a target using a spiking-neuron model of sensory and motor cortex that maps proprioceptive representation of limb position onto motor commands and undergoes learning based on reinforcement mechanisms suggested by the dopaminergic reward system. The virtual arm consisted of two joints and two arm segments. Each arm segment was controlled by a pair of flexor/extensor muscles. A model of motor cortex (M1) sent motor commands to the virtual arm and received proprioceptive position information from it via sensory cortex (S1). Output M1 units were partially driven by noise, creating stochastic movements that were shaped to achieve desired outcomes. M1 and S1 each had 576 excitatory and 192 inhibitory event-based integrate-and-fire neurons, with AMPA/NMDA and GABA synapses. Units were interconnected probabilistically. Plasticity was enabled in feedforward, feedback, and recurrent excitatory-to-excitatory and excitatory-to-inhibitory connections. Reinforcement learning (RL) used eligibility traces for synaptic credit/blame assignment, and a global signal (reward, punishment) corresponded to dopaminergic bursting/dipping. The relative timing of pre- and postsynaptic spikes modulated the sign of plasticity (LTP/LTD). Reward (punishment) was delivered when the distance between the hand and the target decreased (increased). Learning occurred over 100 training sessions. In each session the arm started from 256 different positions and was trained for 400 ms or longer. We monitored short-term (200 ms) motor commands after each training session and found that the network learned correct movements after only a few training sessions; however, performance gradually improved with further training. After training, the network reached the arm to the target from multiple starting positions over the course of a 30 s trial. This was most pronounced when the arm started at a large distance from the target. Injecting noise after training sometimes enhanced motor performance in the 30 s trials because the noise occasionally pushed the arm from locations where the movement command was unknown onto a known trajectory. Our model demonstrates the feasibility of training a neuronal network to control motor programs using spike-timing-dependent reinforcement learning. It predicts that basic performance of simple motor programs is learned quickly, but optimal performance takes significantly longer. In addition, our model suggests that reinforcement learning may produce attractors governing neuronal network dynamics.
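
For readers who want a concrete picture of the learning rule, the following is a minimal Python sketch of reward-modulated spike-timing-dependent plasticity with eligibility traces and a distance-based global reward, in the spirit of the mechanism described in the abstract. It is not the authors' simulation code; the population sizes, time constants, and function names (e.g., plasticity_step, reward_signal) are illustrative assumptions.

    import numpy as np

    # Illustrative sketch (not the authors' code) of reward-modulated STDP
    # with eligibility traces; all sizes and parameters are assumptions.
    rng = np.random.default_rng(0)

    N_PRE, N_POST = 20, 10                        # toy population sizes
    w = rng.uniform(0.2, 0.8, (N_PRE, N_POST))    # synaptic weights
    elig = np.zeros_like(w)                       # eligibility traces (credit/blame)

    DT = 1.0          # time step (ms)
    TAU_ELIG = 100.0  # eligibility-trace decay time constant (ms)
    STDP_WIN = 20.0   # window for "recent" pre/post spikes (ms)
    LR = 0.01         # learning rate

    def stdp_tags(pre_spikes, post_spikes, t_pre_last, t_post_last, t_now):
        """Sign-only spike-timing tags: pre-before-post marks a synapse for
        potentiation, post-before-pre for depression."""
        tags = np.zeros((N_PRE, N_POST))
        recent_pre = (t_now - t_pre_last) < STDP_WIN    # pre cells that fired recently
        recent_post = (t_now - t_post_last) < STDP_WIN  # post cells that fired recently
        tags[np.ix_(recent_pre, post_spikes)] += 1.0    # pre -> post ordering
        tags[np.ix_(pre_spikes, recent_post)] -= 1.0    # post -> pre ordering
        return tags

    def reward_signal(dist_prev, dist_now):
        """Global scalar signal mimicking dopaminergic bursting/dipping:
        +1 when the hand moved closer to the target, -1 when it moved away."""
        return float(np.sign(dist_prev - dist_now))

    def plasticity_step(pre_spikes, post_spikes, t_pre_last, t_post_last, t_now, reward):
        """Decay eligibility, add new spike-timing tags, then convert eligibility
        into weight changes gated by the global reward/punishment signal."""
        global elig, w
        elig *= np.exp(-DT / TAU_ELIG)
        elig += stdp_tags(pre_spikes, post_spikes, t_pre_last, t_post_last, t_now)
        w += LR * reward * elig
        np.clip(w, 0.0, 1.0, out=w)

    # Toy usage: random spiking while the hand-target distance shrinks.
    t_pre, t_post = np.full(N_PRE, -1e9), np.full(N_POST, -1e9)
    dist = 10.0
    for t in np.arange(0.0, 400.0, DT):
        pre = rng.random(N_PRE) < 0.05
        post = rng.random(N_POST) < 0.05
        t_pre[pre], t_post[post] = t, t
        new_dist = max(0.0, dist - 0.02)   # pretend the arm moves toward the target
        plasticity_step(pre, post, t_pre, t_post, t, reward_signal(dist, new_dist))
        dist = new_dist

The key design point the sketch tries to capture is the separation of timescales: spike timing only tags synapses as eligible, and the global reward/punishment signal later converts those tags into actual weight changes, which is how credit and blame can be assigned to synapses that were active before the behavioral outcome was known.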