Emmanuel Daucé is an associate professor at the Ecole Centrale de Marseille and carries out his research at the Institut de Neurosciences de la Timone (France).
His research lies at the crossroads of machine learning, artificial intelligence and neuroscience, seeking to develop innovative computational models and methods while remaining consistent with the principles of biological systems. His current work focuses on sample-efficient learning and perception in natural and artificial systems (Daucé, 2018; Daucé et al., 2020; Dabane et al., 2022; Daucé, 2022).
He graduated from the Ecole Nationale Supérieure d'Electronique, d'Electrotechnique, d'Informatique et d'Hydraulique de Toulouse (1995), and obtained a Ph.D. in Knowledge Representation and Formal Reasoning from the Ecole Nationale Supérieure de l'Aeronautique et de l'Espace (2000), with a thesis on learning and plasticity in artificial neural networks with random recurrent connectivity graphs (Daucé et al., 1998). He contributed to extending the model to multiple populations (Daucé et al., 2001) and to spatio-temporal sequence learning (Daucé et al., 2002).
He joined the Institut des Sciences du Mouvement in Marseille in 2001, where he contributed to developing neurally plausible reinforcement schemes in closed-loop control systems (Daucé, 2004; Daucé and Dutech, 2010), to studying spike-timing-dependent plasticity in balanced networks of spiking neurons (Henry et al., 2006; Daucé, 2014), and to developing models of dynamic retention in discrete neural fields (Daucé, 2004).
More recently, he joined Viktor Jirsa's group at the Institut de Neurosciences des Systèmes, at the Faculté de Médecine de La Timone (Marseille), where he contributed to developing online learning methods for non-stationary data streams, adapted to the case of Brain-Computer Interfaces (Daucé and Thomas, 2014), and participated in modelling brain non-stationarities with simple neural-mass dynamics on large-scale connectivity graphs (Golos et al., 2015).
Daucé, E. (2022) Concurrent Credit Assignment for Data-efficient Reinforcement Learning, 2022 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, doi: 10.1109/IJCNN55064.2022.9892560.
The capacity to widely sample the state and action spaces is a key ingredient in building effective reinforcement learning algorithms. The method presented in this paper relies on an occupancy model, that is, the empirical distribution of the states encountered by the agent under a given policy, which defines its “domain of operation”. Under a uniform occupancy prior, an evidence lower bound on the parameters of the policy then expresses a balance between two concurrent tendencies, namely the widening of the occupancy space and the maximization of the rewards, reminiscent of the classical exploration/exploitation trade-off. During training, both the policy and the occupancy model are updated as the exploration progresses and as new states are discovered. Implemented on an off-policy actor-critic algorithm and tested on classic continuous-action benchmarks, this approach is shown to provide a significant increase in sampling efficiency, reflected in reduced training time and higher returns, in both the dense and the sparse reward cases.
Dabane, G., Perrinet, L. U., & Daucé, E. (2022) What You See Is What You Transform: Foveated Spatial Transformers as a bio-inspired attention mechanism, 2022 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, doi: 10.1109/IJCNN55064.2022.9892313.
Decoding the semantic content of images is nowadays dominated by the use of deep convolutional neural networks (DCNNs). However, their generalization capability is still undermined by the limited translation invariance of their max-pooling layers. Taking inspiration from biological vision, we develop here a new methodology for translation-invariant processing with DCNNs. We build upon a recent model that implements two key biological mechanisms: foveated vision and the separation of the visual processing into a “what” and a “where” pathway. Alongside such foveal vision, we demonstrate the capability of a foveated spatial transformer to learn both pathways in an end-to-end fashion, without any spatial labelling whatsoever. Our results pave the way towards a new class of spatial visual transformers, implementing the principles of active (saccadic) vision over large visual displays.
Daucé, E. (2020). End-Effect Exploration Drive for Effective Motor Learning. proc. of the 1st International Workshop on Active Inference (IWAI 2020), CCIS 1326:114-124.
Stemming from the idea that a key objective in reinforcement learning is to invert a target distribution of effects, end-effect drives are proposed as an effective way to implement goal-directed motor learning in the absence of an explicit forward model. An end-effect model relies on a simple statistical recording of the effect of the current policy, used here as a substitute for the more resource-demanding forward models. When combined with a reward structure, it forms the core of a lightweight variational free energy minimization setup. The main difficulty lies in maintaining this simplified effect model together with the online update of the policy. When the prior target distribution is uniform, it provides a way to learn an efficient exploration policy, consistent with intrinsic curiosity principles. When combined with an extrinsic reward, our approach is finally shown to provide faster training than traditional off-policy techniques.
Daucé, E., & Perrinet, L. U. (2020). Visual search as active inference. proc. of the 1st International Workshop on Active Inference (IWAI 2020), CCIS 1326:165-178. Springer, Cham.
Visual search is an essential cognitive ability, offering a prototypical control problem to be addressed with Active Inference. Under a Naive Bayes assumption, the maximization of the information gain objective is consistent with the separation of the visual sensory flow into two independent pathways, namely the “What” and the “Where” pathways. On the “What” side, the processing of the central part of the visual field (the fovea) provides the current interpretation of the scene, here the category of the target. On the “Where” side, the processing of the full visual field (at lower resolution) is expected to provide hints about future central foveal processing, given the potential realization of saccadic movements. A map of the classification accuracies, as obtained by such counterfactual saccades, defines a utility function on the motor space, whose maximal argument prescribes the next saccade. The comparison of the foveal and the peripheral predictions finally forms an estimate of the future information gain, providing a simple and resource-efficient way to implement information-gain-seeking policies in active vision. This dual-pathway information processing framework is found to be efficient on a synthetic visual search task with a variable (eccentricity-dependent) precision. More importantly, it is expected to draw connections toward a more general actor-critic principle in action selection, with the accuracy of the central processing taking the role of a value (or intrinsic reward) of the previous saccade.
Daucé, E., Albiges, P., & Perrinet, L. U. (2020). A dual foveal-peripheral visual processing model implements efficient saccade selection. Journal of Vision 20(8):22.
We develop a visuomotor model that implements visual search as a focal accuracy-seeking policy, with the target’s position and category drawn independently from a common generative process. Consistent with the anatomical separation between the ventral and dorsal pathways, the model is composed of two pathways that respectively infer what to see and where to look. The “What” network is a classical deep learning classifier that only processes a small region around the center of fixation, providing a “foveal” accuracy. In contrast, the “Where” network processes the full visual field in a biomimetic fashion, using a log-polar retinotopic encoding, which is preserved up to the action selection level. In our model, the foveal accuracy is used as a monitoring signal to train the “Where” network, much like in the “actor/critic” framework. After training, the “Where” network provides an “accuracy map” that serves to guide the eye toward peripheral objects. Finally, comparing both networks’ accuracies determines whether to make a saccade or to keep the eye focused at the center to identify the target. We test this setup on a simple task of finding a digit in a large, cluttered image. Our simulation results demonstrate the effectiveness of this approach, increasing by one order of magnitude the radius of the visual field within which the agent can detect and recognize a target, either through a single saccade or with multiple ones. Importantly, our log-polar treatment of the visual information exploits the strong compression rate performed at the sensory level, providing ways to implement visual search in a sublinear fashion, in contrast with mainstream computer vision.
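The final step of this model, comparing the foveal accuracy with the peak of the peripheral accuracy map, can be sketched as a simple decision rule. This is a minimal illustration, not the authors' code: it assumes the “What” network has already produced a scalar foveal accuracy and the “Where” network a 2D array of predicted post-saccadic accuracies (here on an arbitrary grid rather than the log-polar one), and the function name `select_action` is hypothetical.

```python
import numpy as np


def select_action(foveal_accuracy, accuracy_map):
    """Hypothetical decision rule: saccade to the peak of the predicted
    accuracy map if it beats the current foveal accuracy, else keep fixating.

    foveal_accuracy: float in [0, 1], accuracy of the "What" network at the center.
    accuracy_map: 2D array of predicted accuracies after a saccade to each location.
    Returns ("fixate", None) or ("saccade", (row, col)).
    """
    accuracy_map = np.asarray(accuracy_map, dtype=float)
    # Grid coordinates of the most promising peripheral location.
    best = np.unravel_index(np.argmax(accuracy_map), accuracy_map.shape)
    if accuracy_map[best] > foveal_accuracy:
        return "saccade", tuple(int(i) for i in best)
    return "fixate", None
```

For example, `select_action(0.3, [[0.1, 0.2], [0.9, 0.4]])` returns `("saccade", (1, 0))`, while a foveal accuracy of 0.95 on the same map keeps the eye fixating.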
Daucé, E, Albigès, P & Perrinet, L (2019) Learning where to look: a foveated visuomotor control model. In CNS*2019, July 13-17, Barcelona, Spain.
In computer vision, the visual search task consists of extracting scarce and specific visual information (the target) from a large and crowded visual display. This task is usually implemented by scanning the different possible target identities at all possible spatial positions, hence with a heavy computational load. The human visual system employs a different strategy, combining a foveated sensor with the capacity to rapidly move the center of fixation using saccades. Saccade-based visual exploration can be idealized as an inference process, assuming that the target position and category are independently drawn from a common generative process. Knowing that process, visual processing is then separated into two specialized pathways: the where pathway, mainly conveying information about the target position in peripheral space, and the what pathway, mainly conveying information about the category of the target. We consider here a dual neural network architecture that learns independently where to look and then what to see. In particular, this makes it possible to infer the target position in retinotopic coordinates independently of its category. This framework was tested on a simple task of finding digits in a large, cluttered image. Simulation results demonstrate the benefit of specifically learning where to look before actually knowing the target category. The approach is also energy-efficient, as it includes the strong compression rate performed at the sensor level by retina and V1 encoding, which is preserved up to the action selection level, highlighting the advantages of biomimetic strategies over traditional computer vision when computing resources are at stake.
Daucé, E. (2018). Active fovea-based vision through computationally-effective model-based prediction. Frontiers in Neurorobotics, 12, 76.
What motivates an action in the absence of a definite reward? Taking the case of visuomotor control, we consider a minimal control problem, namely how to select the next saccade in a sequence of discrete eye movements when the final objective is to better interpret the current visual scene. The visual scene is modeled here as a partially observed environment, with a generative model explaining how the visual data is shaped by action. This allows us to interpret different action selection metrics proposed in the literature, including the Salience, the Infomax and the Variational Free Energy, under a single information-theoretic construct, namely the view-based Information Gain. Pursuing this analytic track, two original action selection metrics, named the Information Gain Lower Bound (IGLB) and the Information Gain Upper Bound (IGUB), are then proposed. Showing either a conservative or an optimistic bias with respect to the Information Gain, they strongly simplify its calculation. An original fovea-based visual scene decoding setup is then proposed, with numerical experiments highlighting different facets of artificial fovea-based vision. A first and principal result is that state-of-the-art recognition rates are obtained with fovea-based saccadic exploration, using less than 10% of the original image's data. These satisfactory results illustrate the advantage of mixing predictive control with accurate state-of-the-art predictors, namely a deep neural network. A second result is the sub-optimality of some classical action-selection metrics widely used in the literature, which is not manifest with finely tuned inference models but becomes patent when coarse or faulty models are used.
Last, a computationally-effective predictive model is developed using the IGLB objective, with pre-processed visual scan-paths read out from memory, bypassing computationally demanding predictive calculations. This simplified setting is shown to be effective in our case, with both competitive accuracy and good robustness to model flaws.
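The view-based Information Gain mentioned above can be illustrated, under strong simplifying assumptions, for a discrete set of scene classes and a discrete set of possible observations per candidate saccade. The sketch below computes the exact expected information gain (prior entropy minus expected posterior entropy), not the IGLB or IGUB approximations proposed in the paper; the observation model `likelihood[c, o] = p(o | c)` and all function names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def expected_information_gain(prior, likelihood):
    """Expected reduction of the class entropy after one observation.

    prior: shape (C,), current belief over the C scene classes.
    likelihood: shape (C, O), p(o | c) for one candidate saccade (hypothetical model).
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior[:, None] * likelihood   # p(c, o)
    p_obs = joint.sum(axis=0)             # p(o)
    expected_posterior_entropy = 0.0
    for o, po in enumerate(p_obs):
        if po > 0:
            # Bayes rule: p(c | o) = p(c, o) / p(o)
            expected_posterior_entropy += po * entropy(joint[:, o] / po)
    return entropy(prior) - expected_posterior_entropy


def best_saccade(prior, likelihoods):
    """Index of the candidate saccade with the largest expected information gain."""
    gains = [expected_information_gain(prior, lik) for lik in likelihoods]
    return int(np.argmax(gains)), gains
```

With a uniform prior over two classes, a perfectly informative observation model yields a gain of log 2 (about 0.693 nats), whereas an uninformative one yields zero, so `best_saccade` selects the former.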
Daucé, E (2017) Toward predictive machine learning for active vision. IMOL 2017 – Third International Workshop on Intrinsically Motivated Open-ended Learning, October 4-6, Rome, Italy.
We develop a comprehensive description of the active inference framework, as proposed by Friston (2010), from a machine-learning-compliant perspective. Drawing on biological inspiration and auto-encoding principles, we sketch a cognitive architecture that should provide ways to implement estimation-oriented control policies. Computer simulations illustrate the effectiveness of the approach through a foveated inspection of the input data. The pros and cons of the control policy are analyzed in detail, showing interesting promise in terms of processing compression. Although optimizing the future posterior entropy over the action set is shown to be sufficient for locally optimal action selection, offline calculation using class-specific saliency maps is shown to perform better, as it saves processing costs by pre-processing the saccade pathways, with a negligible effect on the recognition/compression rates.
Daucé, E (2016) Predicting the consequence of action in digital control state spaces. arXiv report.
The objective of this dissertation is to shed light on some fundamental impediments to learning control laws in continuous state spaces. In particular, if one wants to build artificial devices capable of learning motor tasks the same way they learn to classify signals and images, one needs to establish control rules that do not require comparisons between quantities of the surrounding space. In that context, we propose to take inspiration from the "end effector control" principle, as suggested by neuroscience studies, as opposed to the "displacement control" principle used in classical control theory.
Zhong, H & Daucé, E (submitted) Sparse online learning with bandit feedback.
The bandit classification problem considers learning the labels of a time-indexed data stream under mere “hit-or-miss” binary guidance. Adapting the OVA (“one-versus-all”) hinge loss setup, we develop a sparse and lightweight solution to this problem. The resulting sequential norm-minimal update solves the classification problem in finite time in the separable case, provided enough redundancy is present in the data. An O(√T) regret is moreover expected in the non-separable case. The algorithm shows effectiveness on both large-scale text-mining and machine learning datasets, with (i) a favorable comparison with the more demanding confidence-based second-order bandit setups on large-scale datasets and (ii) good sparsity and efficacy when a kernel approach is applied to non-separable datasets.
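A one-versus-all hinge-loss learner with bandit feedback can be sketched as follows, as an illustration of the general idea rather than the published algorithm. The learner proposes a label (here with ε-greedy exploration, which is an assumption), receives only a hit-or-miss signal, and applies a classic Passive-Aggressive step (Crammer et al., 2006) to the binary classifier of the proposed label. The class `OVABanditPA` and its parameters are hypothetical, and the sparsity-inducing aspects of the paper are not reproduced.

```python
import numpy as np


class OVABanditPA:
    """Hypothetical one-versus-all Passive-Aggressive learner with bandit feedback."""

    def __init__(self, n_classes, n_features, epsilon=0.1, seed=0):
        self.W = np.zeros((n_classes, n_features))  # one linear classifier per label
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        return int(np.argmax(self.W @ x))

    def choose(self, x):
        # Epsilon-greedy exploration over the label space (illustrative choice).
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.W.shape[0]))
        return self.predict(x)

    def update(self, x, label, correct):
        # Only the hit-or-miss outcome for the proposed label is observed:
        # it becomes a +1/-1 target for that label's one-versus-all classifier.
        z = 1.0 if correct else -1.0
        loss = max(0.0, 1.0 - z * float(self.W[label] @ x))
        sq_norm = float(x @ x)
        if loss > 0.0 and sq_norm > 0.0:
            tau = loss / sq_norm  # classic PA step: smallest change reaching margin 1
            self.W[label] += tau * z * x
```

In a simulated stream, each round calls `choose(x)`, compares the proposed label with the hidden true label, and passes only the boolean outcome to `update`.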
Clerc, M, Daucé, E & Mattout, J (2016) Adaptive Methods in Machine Learning. In Clerc, M., Bougrain, L. and Lotte, F. (eds): Brain–Computer Interfaces 1: Foundations and Methods, John Wiley & Sons, Inc., Hoboken, NJ, USA.
Human biomedical research distinguishes between two principal types of variability, particularly in the domain of cognitive neurosciences and neuroimaging: intrasubject variability and intersubject variability. Research in cognitive neuroscience specifically aims to improve our understanding of the origins of intrasubject variability in behavioral performance. The results of this research may greatly enhance the development of more robust, adaptive brain–computer interfaces (BCIs). This chapter is organized in two parts, which present the two approaches for describing variability: statistical decoding and generative models. In mathematical terms, adaptive learning is an optimization problem in which the current performance must be optimized while preserving the performance acquired during previous training phases. The chapter presents a wide range of adaptive learning methods, grouped into two families: methods that perform statistical decoding and methods based on a generative model. These methods will likely be built with ever-improving mathematical tools suitable for online deployment.
Daucé, E & Zhong, H (2016) Optimisation quadratique pour l’apprentissage en ligne d’un bandit contextuel. In Conférence francophone sur l’apprentissage automatique (CAP 2016), Marseille, France.
We develop an online learning algorithm for multiclass classifiers in the case where the classification information comes in binary form (correct or incorrect response). The absence of explicit label information leads to randomly sampling the label space, following the contextual bandit model. The algorithm relies on optimizing a cost function at each trial, following the “Passive-Aggressive” approach (Crammer et al., 2006). The mathematical analysis establishes bounds on the sum of cumulative losses, in both the separable and the non-separable cases, comparable to the bounds obtained in the supervised case. Numerical experiments confirm the good behavior of the learning algorithm, both on high-dimensional data and on non-linearly separable datasets.
Daucé, E (2016) Apprentissage et Contrôle dans les Architectures Neuronales. Mémoire d'Habilitation à Diriger les Recherches, Aix-Marseille Université, Marseille, France.
The brain, beyond its primary sensorimotor and regulatory functions, is an outstanding adaptive system, capable of developing novel responses in novel situations. The principles of machine learning, a fast-developing domain, are at stake for a better understanding of the learning processes in the brain. Computational models of learning have provided several success stories, among which "layered neural networks" are the most famous. This HDR dissertation presents different kinds of neural network models that adhere more strictly to biological constraints, in particular regarding the recurrent aspect of the neuronal interaction graph, the discreteness of the signals emitted by the neurons and the local aspect of the plasticity rules that govern synaptic changes. We show in particular how recurrent neural networks organize their sensory input in different regions, how synaptic plasticity drives the network toward a "simpler" collective activity, allowing a better separation and prediction of the sensory stimuli, and how motor learning can rely on matching motor primitives with sensory data to organize the physical environment. Several projects are proposed, aiming at expanding some of these ideas into large-scale brain activity models or toward the design of brain-computer interfaces.
Golos, M, Jirsa, V & Daucé, E (2015) Multistability in large-scale models of brain activity, PLoS Computational Biology 11(12).
Noise-driven exploration of a brain network’s dynamic repertoire has been hypothesized to be causally involved in cognitive function, aging and neurodegeneration. The dynamic repertoire crucially depends on the network’s capacity to store patterns, as well as on their stability. Here we systematically explore the capacity of networks derived from human connectomes to store attractor states, as well as various network mechanisms to control the brain’s dynamic repertoire. Using a deterministic graded-response Hopfield model with connectome-based interactions, we reconstruct the system’s attractor space through a uniform sampling of the initial conditions. Large fixed-point attractor sets are obtained in the low-temperature condition, with a larger number of attractors than previously reported. Different variants of the initial model, including (i) a uniform activation threshold or (ii) a global negative feedback, produce a similarly robust multistability in a limited parameter range. A numerical analysis of the distribution of the attractors identifies spatially segregated components, with a centro-medial core and several well-delineated regional patches. These different modes share similarities with the fMRI independent components observed in the “resting state” condition. We demonstrate non-stationary behavior in noise-driven generalizations of the models, with different metastable attractors visited along the same time course. Only the model with a global dynamic density control is found to display robust and long-lasting non-stationarity with no tendency toward either overactivity or extinction. The best fit with empirical signals is observed at the edge of multistability, a parameter region that also corresponds to the highest entropy of the attractors.
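The attractor-sampling procedure described above can be sketched for a small network. This is a minimal illustration under stated assumptions: the coupling matrix `W` stands in for a connectome, the graded response is a logistic function with inverse temperature `beta` (low temperature means large `beta`), updates are synchronous, and runs that do not converge (for example, period-2 cycles) are simply discarded. The function `find_attractors` and its parameters are hypothetical.

```python
import numpy as np


def find_attractors(W, beta, theta=0.0, n_samples=200, n_steps=500, tol=1e-6, seed=0):
    """Estimate the fixed points of x <- sigmoid(beta * (W @ x - theta))
    by iterating from uniformly sampled initial conditions (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    attractors = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 1.0, size=n)  # uniform sampling of initial conditions
        for _ in range(n_steps):
            x_new = 1.0 / (1.0 + np.exp(-beta * (W @ x - theta)))
            if np.max(np.abs(x_new - x)) < tol:
                x = x_new
                break
            x = x_new
        else:
            continue  # no convergence (e.g. a cycle): not counted as a fixed point
        # Merge states that are numerically the same attractor.
        if not any(np.max(np.abs(x - a)) < 1e-3 for a in attractors):
            attractors.append(x)
    return attractors
```

For two mutually inhibiting units (`W = [[0, -4], [-4, 0]]`, `theta = -2`, `beta = 20`), the sampled fixed points are close to the two asymmetric states (1, 0) and (0, 1).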
Zhong, H & Daucé, E (2015) Passive-Aggressive bounds in bandit feedback classification. In Hollmén, J and Papapetrou, P eds, proc. of the ECMLPKDD 2015 Doctoral Consortium: 255-264, September 07-11, Porto, Portugal.
This paper presents a new online multiclass algorithm with bandit feedback, where, after making a prediction, the learning algorithm receives only partial feedback, i.e., whether the prediction is correct or not, rather than the true label. This algorithm, named Bandit Passive-Aggressive online algorithm (BPA), is based on the Passive-Aggressive (PA) online algorithm proposed by Crammer et al. (2006), the latter being an effective framework for performing max-margin online learning. We analyze some of its operating principles, and we also derive a competitive cumulative mistake bound for this algorithm. Further experimental evaluation on several multiclass data sets, including three real-world and two synthetic data sets, shows interesting performance in the high-dimensional and high label cardinality case.
Daucé, E, Proix, T & Ralaivola, L (2015) Reward-based online learning in non-stationary environments: adapting a P300-speller with a “Backspace” key. In proc. of the International Joint Conference on Neural Networks (IJCNN 2015), July 12-17, Killarney, Ireland: 2864-2871.
We adapt a policy gradient approach to the problem of reward-based online learning of a non-invasive EEG-based “P300”-speller. We first clarify the nature of the P300-speller classification problem and present a general regularized gradient ascent formula. We then show that when the reward is immediate and binary (namely “bad response” or “good response”), each update is expected to improve the classifier accuracy, whether the actual response is correct or not. We also estimate the robustness of the method to occasional mistaken rewards, i.e., we show that the learning efficacy decreases only linearly with the rate of invalid rewards. The effectiveness of our approach is tested in a series of simulations reproducing the conditions of real experiments. We show in a first experiment that a systematic improvement of the spelling rate is obtained for all subjects in the absence of initial calibration. In a second experiment, we consider the online recovery that is expected to follow electrode failures. Combined with a specific failure detection algorithm, the spelling error information (typically contained in a “backspace” hit) is shown to be useful for the policy gradient to adapt the P300 classifier to the new situation, provided the feedback is reliable enough (namely, with a reliability greater than 70%).
Daucé, E, Golos, M and Jirsa, V (2015) Global control of attractor switches in large-scale brain dynamics, 1st International Conference on Mathematical Neurosciences, June 8-10, Antibes-Juan-les-Pins, France.
Diffusion Tensor Imaging makes it possible to reconstruct brain connectivity at large scale, forming a network of interactions named the “Connectome”. Dynamical models of brain activity use the connectome couplings to unveil the determinants of large-scale brain dynamics, as observed in electrophysiology or functional imaging signals. They rely on simplifying assumptions that reduce the population activity to a few “neural mass” state variables. The Fokker-Planck equation makes it possible to represent the stationary distribution of activities at the network level, depending on a noise (“temperature”) parameter that can be adjusted to fit the data. However, the many non-stationary behaviors observed in physiological signals are difficult to handle in such models. One question at stake is, for instance, the anomalous scaling of the signal variance when passing from short (100-500 ms) to long (10-20 minutes) temporal ranges. These anomalies are interpreted as a signature of criticality, as observed for instance in spin-glass systems near the critical temperature. Our approach to non-stationarity relies on a thorough evaluation of fixed-point multistability in Connectome-based deterministic dynamical systems. Several variants of a deterministic neural mass model, including a local or global threshold adaptation, inspired by the “graded-response” Hopfield model [4], are used. The resulting multistability maps show non-monotonic transitions from single stability to multiple stability (see Figure 1). Consistent with [5], regions of maximal entropy are identified near the bifurcation line. The number of attractors, however, exceeds by several orders of magnitude the numbers reported so far in previous studies.
A clustering analysis of the attractors’ empirical distributions moreover identifies spatially segregated components, sharing similarities with the fMRI independent components observed in the “resting state” condition. When noise is introduced in the dynamics, a temporally multistable behavior is obtained (with alternating metastable attractors visited along the same time course) in a wide range of the parameter space. Noise, however, causes a large proportion of attractors to vanish and become invisible, leaving room for much smaller attractor sets, including trivial attractors like the “Up” (full brain activation) and “Down” (full brain deactivation) sets. Only the model with a central adaptive threshold, imposing a stable density across time, provides a condition where no tendency toward overactivation or extinction is observed. The multistable behavior is obtained over a large parameter range, but the best fit with the ultra-slow functional connectivity dynamics, as observed in the BOLD time courses, is obtained at the edge of multistability, a parameter region that also corresponds to the highest entropy of the attractor distribution. The general conclusion is the importance of the noise-free dynamics in analyzing the attractor landscape, for identifying high-multistability/high-entropy parameter regions that fit both the most physiological distributions of activity and the most relevant time courses in the noisy condition.
Zhong, H, Daucé, E and Ralaivola, L (2015) Online multiclass learning with "bandit" feedback under a Passive-Aggressive approach. In Verleysen M ed., proc. of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015): 403-408, Bruges, Belgium, April 22-24, 2015.
This paper presents a new approach to online multiclass learning with bandit feedback. This algorithm, named PAB (Passive Aggressive in Bandit), is a variant of the Online Passive-Aggressive Algorithm proposed by [Crammer, 2006], the latter being an effective framework for performing max-margin online learning. We analyze some of its operating principles, and show that it provides a good and scalable solution to the bandit classification problem, in particular on a real-world dataset where it outperforms the best existing algorithms.
Thomas, E, Daucé, E, Devlaminck, D, Mahé, L, Carpentier, A, Munos, R, Perrin, M, Maby, E, Mattout, J, Papadopoulo, T and Clerc, M (2014) CoAdapt P300 speller: optimized flashing sequences and online learning, proc. of the 6th International Brain-Computer Interface Conference, September 16-19, Graz, Austria.
This paper presents a series of recent improvements made on the P300 speller paradigm in the context of the CoAdapt project. The flashing sequence is generated by a new design called RIPRAND, in which the flashing rate of elements can be controlled independently of grid cardinality. Element-based evidence accumulation allows early stopping of the flashes as soon as the symbol has been detected with confidence. No calibration session is necessary, thanks to a mixture-of-experts method which makes the initial predictions. When sufficient data can be buffered, subject-specific spatial and temporal filters are learned, with which the interface seamlessly makes its predictions, and the classifiers are adapted online. This paper, which presents results of three online sessions totalling 26 subjects, is the first to report online performance of a P300 speller with no calibration.
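Element-based evidence accumulation with early stopping can be illustrated with a simple Bayesian sketch. It assumes, for illustration only, that the classifier score of each flash is Gaussian with mean `mu1` when the target symbol is in the flashed group and `mu0` otherwise, with a shared variance; under that assumption each flash adds a log-likelihood ratio to every flashed symbol. The function name, the Gaussian model and the 0.95 threshold are hypothetical choices, not the CoAdapt implementation.

```python
import numpy as np


def accumulate_until_confident(flashes, n_symbols, mu0=0.0, mu1=1.0, sigma=1.0, threshold=0.95):
    """Hypothetical element-based evidence accumulation with early stopping.

    flashes: iterable of (group, score) pairs, where `group` lists the (distinct)
    symbol indices that flashed and `score` is the classifier output for that flash.
    Returns (decided_symbol, n_flashes_used, posterior).
    """
    log_post = np.zeros(n_symbols)  # uniform prior, up to a constant
    posterior = np.full(n_symbols, 1.0 / n_symbols)
    n_used = 0
    for group, score in flashes:
        n_used += 1
        # Gaussian log-likelihood ratio: "target flashed" vs "target did not flash".
        llr = (score * (mu1 - mu0) - 0.5 * (mu1**2 - mu0**2)) / sigma**2
        log_post[list(group)] += llr
        posterior = np.exp(log_post - log_post.max())  # numerically stable softmax
        posterior /= posterior.sum()
        if posterior.max() >= threshold:
            break  # confident enough: stop flashing early
    return int(np.argmax(posterior)), n_used, posterior
```

With four symbols, target 2, and scores of 3.0 for the three flashes that contain it, the posterior of symbol 2 exceeds 0.95 after the third flash, so any remaining flashes are not used.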
Daucé, E and Thomas, E (2014) Evidence build-up facilitates on-line adaptivity in dynamic environments: example of the BCI P300-speller. In Verleysen, M. ed., proc. of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014): 377-382, April 23-25, Bruges, Belgium.
We consider a P300 BCI application where the subjects can write figures and letters in an unsupervised fashion. We (i) show that a generic speller can attain state-of-the-art accuracy without any training phase or calibration and (ii) present an adaptive setup that consistently increases the bit rate for most of the subjects.
Daucé, E (2014) Toward STDP-based population action in large networks of spiking neurons. In Verleysen, M. ed., proc. of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014): 29-34, April 23-25, Bruges, Belgium.
We present simulation results that clarify the role of Spike-Timing Dependent Plasticity (STDP) in brain processing as a putative mechanism to transfer spatio-temporal regularities, as observed in sensory signals, toward action, expressed as a global increase of the target population activity followed by a reset. The repetition of this activation-reset mechanism gives rise to a series of synchronous waves of activity when the same stimulus is repeated over and over. Our simulation results are obtained in recurrent networks of conductance-based neurons under realistic coupling constraints.
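The abstract does not specify the plasticity rule, so the following is only the textbook pair-based exponential STDP window, given to make the notion concrete: a presynaptic spike shortly before a postsynaptic one potentiates the synapse, and the reverse order depresses it. Amplitudes, time constants, the all-to-all pairing scheme and the function names are hypothetical, and no conductance-based network is simulated here.

```python
import numpy as np


def stdp_weight_change(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based exponential STDP window; delta_t = t_post - t_pre, in ms."""
    delta_t = np.asarray(delta_t, dtype=float)
    decay_plus = np.exp(-np.abs(delta_t) / tau_plus)
    decay_minus = np.exp(-np.abs(delta_t) / tau_minus)
    # Pre before post (delta_t > 0): potentiation; otherwise: depression.
    return np.where(delta_t > 0, a_plus * decay_plus, -a_minus * decay_minus)


def stdp_update(w, pre_spikes, post_spikes, w_max=1.0, **window_params):
    """Sum the window over all pre/post spike pairs and clip the weight to [0, w_max]."""
    pre = np.asarray(pre_spikes, dtype=float)
    post = np.asarray(post_spikes, dtype=float)
    delta_t = post[:, None] - pre[None, :]  # every post-minus-pre interval
    dw = float(np.sum(stdp_weight_change(delta_t, **window_params)))
    return float(np.clip(w + dw, 0.0, w_max))
```

For instance, a single pre-then-post pair 10 ms apart changes the weight by 0.01·exp(-0.5), about 0.0061, with the default parameters.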
Daucé, E, Proix, T. and Ralaivola, L (2013) Fast online adaptivity with policy gradient: example of the BCI "P300" speller. In Verleysen, M. ed., proc. of the 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013): 197-202, April 24-26, Bruges, Belgium.
We tackle the problem of reward-based online learning of multiclass classifiers and show that a policy gradient ascent can solve this problem in the linear case. We apply it to the online adaptation of an EEG-based “P300”-speller. When applied from scratch, a robust classifier is obtained in a few steps. When combined with offline calibration, adaptivity to changes is enhanced.
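A linear softmax policy trained by policy gradient (REINFORCE) from a binary reward can be sketched as follows. This is a generic illustration of the technique named in the abstract, not the authors' exact update: the ±1 reward coding, the learning rate, the optional L2 term and the class name `SoftmaxPolicyClassifier` are assumptions. The gradient of log π(a | x) with respect to the weight matrix is (onehot(a) - π(· | x)) xᵀ, so a rewarded response pushes the chosen class up and a penalized one pushes it down.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class SoftmaxPolicyClassifier:
    """Hypothetical linear softmax policy trained with REINFORCE from a binary reward."""

    def __init__(self, n_classes, n_features, lr=0.5, l2=0.0, seed=0):
        self.W = np.zeros((n_classes, n_features))
        self.lr = lr
        self.l2 = l2
        self.rng = np.random.default_rng(seed)

    def act(self, x):
        # Sample a response from the current policy pi(. | x).
        probs = softmax(self.W @ x)
        action = int(self.rng.choice(len(probs), p=probs))
        return action, probs

    def predict(self, x):
        return int(np.argmax(self.W @ x))

    def update(self, x, action, probs, reward):
        # grad_W log pi(action | x) = (onehot(action) - probs) outer x
        grad_log = -np.outer(probs, x)
        grad_log[action] += x
        # Regularized gradient ascent on reward * log pi(action | x).
        self.W += self.lr * (reward * grad_log - self.l2 * self.W)
```

Each trial calls `act(x)`, presents the chosen symbol, and calls `update` with +1 or -1 depending on whether the response was judged correct.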
Daucé, E. and Proix, T. (2013) P300-speller Adaptivity to Change with a Backspace Key. In proc. of TOBI workshop IV: Practical Brain-Computer Interfaces for End-Users: Progresses and Challenges: 105-106, January 23-25, Sion, Switzerland. We develop a simple algorithm that uses the backspace key to recalibrate a standard P300 speller during use. We show its efficiency in a series of computer simulations mimicking an electrode breakdown, in which spelling accuracy recovers within about 50 trials.
Thomas, E., Clerc, M., Daucé, E., Carpentier, A., Devlaminck, D. and Munos, R. (2013) Optimizing P300-Speller Sequences by RIP-ping Groups Apart. In proc. of the 6th International IEEE EMBS Conference on Neural Engineering: 1062-1065, November 6-8, San Diego, CA, USA. So far, P300-speller design has put very little emphasis on optimizing flash patterns, a surprising fact given the importance of the flash sequence for the selection outcome. Previous work in this domain has focused on consecutive flashes, to prevent the same letter or its neighbors from flashing consecutively. To this end, the flashing letters form more random groups than in the original row-column sequences of the P300 paradigm, but the groups remain fixed across repetitions. This has several important consequences, among them a lack of discrepancy between the scores of the different letters. The new approach proposed in this paper accumulates evidence for individual elements and optimizes the sequences by relaxing the constraint that letters should belong to fixed groups across repetitions. The method is inspired by the theory of Restricted Isometry Property (RIP) matrices in Compressed Sensing, and it can be applied to any display grid size and any target flash frequency. The resulting P300 sequences are shown to perform significantly better than the state of the art, both in simulations and in online tests.
Daucé, E. and Ralaivola, L. (2012) Approche adaptative pour les interfaces cerveau-machine [An adaptive approach for brain-computer interfaces]. Proceedings of the 19th Meeting of the Société Francophone de Classification, October 29-31, Marseille, France. We consider an adaptive classification algorithm for brain-computer interfaces based on a "reward" signal indicating whether or not the current response is valid. We test this approach on a dataset of signals recorded during a "P300 speller" experiment, and show that learning is fast enough to make use in real-world conditions conceivable.
Daucé, E. (2009) A model of neuronal specialization using a Hebbian Policy-gradient approach with "slow" noise. In Alippi, C. et al. eds., proc. of the 19th International Conference on Artificial Neural Networks (ICANN 2009), part I, Springer-Verlag, Berlin, Heidelberg: 218-228, September 14-17, Limassol, Cyprus. We study a model of neuronal specialization using a policy gradient reinforcement approach. (1) Neurons fire stochastically according to their synaptic input plus a noise term; (2) the environment is a closed-loop system composed of a rotating eye and a punctual visual target; (3) the network is composed of a foveated retina, a primary layer and a motoneuron layer; (4) the reward depends on the distance between the subjective target position and the fovea; and (5) the weight update depends on a Hebbian trace defined according to a policy gradient principle. To account for the mismatch between neuronal and environmental integration times, we distort the firing probability with a "pink noise" term whose autocorrelation time is on the order of 100 ms, so that the firing probability is overestimated (or underestimated) over periods of about 100 ms. The rewards occurring meanwhile assess the "value" of these elementary shifts and modify the firing probability accordingly. Since every motoneuron is associated with a particular angular direction, we test the preferred output of the visual cells at the end of the learning process. We find that, consistent with the observed final behavior, the visual cells preferentially excite the motoneurons heading in the opposite angular direction.