Electroencephalography (EEG) is a long-established method for investigating event-related cortical processing, in which the electrical activity of the brain is recorded in real time, at high temporal resolution, by scalp electrodes. The resulting collection of signals is highly complex: multivariate, non-stationary, extremely noisy and high-dimensional. These inherent properties make analysis difficult, a problem traditionally overcome by offline averaging of numerous events time-locked to a stimulus. In contrast, machine learning approaches provide tools for detecting and classifying cortical patterns in real time. Moreover, these methods are valuable for efficient dimensionality reduction and feature selection, issues that receive growing attention in neuroscience as hardware technology, particularly multi-channel EEG and fMRI, offers steadily improving spatial and temporal resolution.
The potential for real-time pattern identification is particularly important for applications such as brain-computer interfaces (BCIs), devices that allow the brain to control external appliances directly through the detection of given cortical patterns. Studies in this area have shown, for example, that it is possible to differentiate between the single-trial EEG patterns produced during right and left finger movement, both actual and imagined, in sedentary subjects [2,3].
Much motor-based BCI research has focused exclusively on the primary motor cortex, restricting signal registration to a few predefined, mainly central, electrode locations [4-8]. However, motor actions generate relevant EEG activity in other, complementary areas as well [9-11]. Factors such as dipole orientation, which vary not only throughout a movement but also between individuals, affect the spatial EEG pattern. This makes it difficult to predict which electrodes provide relevant information without imposing potentially restrictive assumptions about the signal source. Similarly, BCIs typically limit EEG signal characterization to preset frequency ranges. Studies focusing on optimizing individual feature sets have, however, reported that the areas and frequencies most relevant for laterality discrimination vary widely between subjects [10-12]. As pointed out by Graimann et al., a BCI based on only one phenomenon, such as the event-related potential or event-related synchronisation and desynchronisation, will be less robust and accurate than a BCI based on several such phenomena.
Including all possible signal features would, however, result in an extremely high-dimensional feature space, given the myriad of methods for transforming and describing the EEG signal mathematically. As a consequence of the curse of dimensionality, the number of observations must increase drastically as the feature space grows in order to maintain the same classification performance. For practical reasons, the amount of EEG data that can be acquired is limited, and thus the number of features used must be minimized.
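The growth in required observations can be illustrated with a deliberately simplified counting argument. The sketch below assumes that each feature axis is split into a fixed number of bins and that a constant average number of trials per cell is wanted; the function names, the bin count and the trials-per-cell value are illustrative assumptions, not quantities from this study.

```python
def cells_in_grid(n_features: int, bins_per_feature: int = 3) -> int:
    """Number of cells when every feature axis is split into equal bins."""
    return bins_per_feature ** n_features


def trials_needed(n_features: int, trials_per_cell: int = 5,
                  bins_per_feature: int = 3) -> int:
    """Trials required to keep the same average number of trials per cell."""
    return trials_per_cell * cells_in_grid(n_features, bins_per_feature)


# The requirement grows exponentially with the number of features.
for d in (1, 2, 5, 10):
    print(d, trials_needed(d))
# prints: 1 15 / 2 45 / 5 1215 / 10 295245 (one pair per line)
```

Under these assumptions, ten features already call for hundreds of thousands of trials, far more than a typical EEG session can provide.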
Common methods of dimensionality reduction include principal component analysis (PCA) and linear discriminant analysis (LDA), in which the original features are mathematically projected onto a lower-dimensional space. Here, however, we look at dimensionality reduction from a combinatorial perspective and attempt to detect which combination of a limited number of features carries relevant information. This process, referred to as feature subset selection, involves discarding redundant or irrelevant features while promoting those that maintain or improve classification accuracy [10,14]. An optimized feature set leads to faster, computationally more efficient and, most importantly, more accurate classification. In addition, a properly designed feature selection process generates a feature relevance ranking, describing how well signal components capture elements of the cortical processing related to given stimuli. There are two distinct approaches to feature subset optimization, termed wrapper and filter feature selection. The wrapper approach involves simultaneous and continuous optimization of classifier parameters and feature subset. The filter approach, on the other hand, selects the feature subset independently of classifier parameter optimization. The wrapper approach typically gives better results owing to the close integration between classifier and feature subset, yet filter feature subset selection is sometimes preferred since it usually requires fewer computational resources.
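The difference between the two approaches can be made concrete with a minimal sketch on synthetic data. Everything here is an assumption chosen for illustration: the data generator, the class-separation score used as the filter criterion, the nearest-centroid classifier, and the greedy forward search standing in for a wrapper. It is not the procedure used in this study; in particular, the wrapper below only refits a fixed classifier for each candidate subset rather than tuning classifier parameters jointly.

```python
import random
from statistics import mean, pstdev


def make_data(n_trials=300, n_features=6, informative=(0, 1), seed=0):
    """Synthetic two-class data; only the `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_trials):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def filter_scores(X, y):
    """Filter: score each feature on its own, without involving a classifier."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def centroid_accuracy(X_train, y_train, X_val, y_val, subset):
    """Nearest-centroid classifier restricted to `subset`; returns accuracy."""
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        cents[lab] = [mean([r[j] for r in rows]) for j in subset]
    hits = 0
    for r, l in zip(X_val, y_val):
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(subset, cents[lab]))
                for lab in (0, 1)}
        hits += int(min(dist, key=dist.get) == l)
    return hits / len(y_val)


def wrapper_forward(X_train, y_train, X_val, y_val, max_features=2):
    """Wrapper: greedily add the feature that most improves classifier accuracy."""
    chosen = []
    while len(chosen) < max_features:
        candidates = [j for j in range(len(X_train[0])) if j not in chosen]
        best = max(candidates, key=lambda j: centroid_accuracy(
            X_train, y_train, X_val, y_val, chosen + [j]))
        chosen.append(best)
    return sorted(chosen)


X, y = make_data()
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

scores = filter_scores(X_train, y_train)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
filter_pick = sorted(ranking[:2])
wrapper_pick = wrapper_forward(X_train, y_train, X_val, y_val)
print("filter:", filter_pick, "wrapper:", wrapper_pick)
```

The filter needs one pass over the data per feature, whereas the wrapper refits and evaluates the classifier for every candidate subset, which is where its additional computational cost comes from.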
The combinatorial aspect of feature selection has been successfully explored using evolutionary algorithms (EAs), both within BCI research [11,12] and in other areas [16,17], although not in combination with classifier tailoring. EAs are population-based optimization methods inspired by Darwinian evolution which can, given a suitable parameter coding, optimize classifiers and feature subsets by either the wrapper or the filter approach. EAs are also suitable for optimizing classifier parameters, such as the weights and architecture of multilayer artificial neural networks (ANNs) [18,19]. ANNs can in principle, given proper design and training, solve any classification problem and have proven effective at generalizing to unseen data. However, standard ANN design procedures require complete external specification of the network architecture, typically based on time-consuming empirical exploration or crude assumptions about the system. In contrast, network optimization using EAs allows the architecture to evolve much as in biological systems, making user intervention and prior assumptions about the system unnecessary. Moreover, when the set of included features is allowed to evolve along with the internal architecture, feature subset selection is performed directly, in a wrapper fashion. Other classification schemes, such as multiple linear regression (MLR), can be optimized similarly.
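As a rough illustration of how an EA can search feature subsets, the sketch below evolves a binary mask with one bit per feature. All details are assumptions made for the example: the synthetic data, the nearest-centroid classifier, the per-feature penalty in the fitness function, and the choice of tournament selection, one-point crossover, bit-flip mutation and elitism. Only the feature mask evolves here; the classifier is fixed, so this is simpler than the combined classifier and feature optimization discussed above.

```python
import random
from functools import lru_cache

rng = random.Random(1)
N_FEATURES = 8
INFORMATIVE = (0, 3)  # hypothetical: only these columns carry class information


def make_trial(label):
    row = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
    for j in INFORMATIVE:
        row[j] += 2.0 * label
    return row


labels = [i % 2 for i in range(300)]
data = [make_trial(lab) for lab in labels]
train = list(zip(data[:200], labels[:200]))
val = list(zip(data[200:], labels[200:]))


def centroid(lab, subset):
    rows = [r for r, l in train if l == lab]
    return [sum(r[j] for r in rows) / len(rows) for j in subset]


@lru_cache(maxsize=None)
def accuracy(mask):
    """Validation accuracy of a nearest-centroid classifier on the masked features."""
    subset = [j for j, bit in enumerate(mask) if bit]
    if not subset:
        return 0.0
    cents = {lab: centroid(lab, subset) for lab in (0, 1)}
    hits = 0
    for r, l in val:
        d = {lab: sum((r[j] - c) ** 2 for j, c in zip(subset, cents[lab]))
             for lab in (0, 1)}
        hits += int(min(d, key=d.get) == l)
    return hits / len(val)


def fitness(mask):
    # Small per-feature penalty favours compact subsets (an assumed design choice).
    return accuracy(mask) - 0.01 * sum(mask)


def tournament(pop, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(pop_size=20, generations=15, p_mut=0.1):
    pop = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randrange(1, N_FEATURES)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit is inverted with probability p_mut.
            child = tuple(1 - bit if rng.random() < p_mut else bit
                          for bit in child)
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best_mask = evolve()
print(best_mask, accuracy(best_mask))
```

Because the fitness of a mask is measured by classifier accuracy, this is a wrapper-style search; replacing `fitness` with a classifier-independent score would turn the same loop into a filter-style search.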
There is reason to believe that systematically tailoring classifiers and feature subsets to each individual will maximize the extraction of relevant information, as opposed to noise, from the EEG. Consequently, the aim of this study was to design and compare methods for automatic classifier tailoring and feature subset optimization in order to maximize EEG pattern detection accuracy. The results have in part been presented previously in poster format.