#### EEG Acquisition and Pre-Processing

The study was performed in accordance with the Declaration of Helsinki and approved by the Göteborg University ethics committee. Four healthy untrained subjects, three female and one male, aged 24–43 years, one left-handed, participated in the study. The subjects, comfortably seated in a chair, were instructed to move either the left or the right index finger in a brisk, self-paced manner according to cues presented on a screen. The interval between the randomized cues was four seconds. Each cue was presented for three seconds, during which the subject moved the finger at a self-determined point. Between 250–900 movements were registered for each subject. Movements were recorded with accelerometers attached to the fingers (EGAX-5 monoaxial, Entran Inc., Fairfield, NJ, USA). EEG was acquired at a sampling rate of 256 Hz using active electrodes and the Active Two digital EEG amplifier and recording system from Biosemi, Inc. (Amsterdam, The Netherlands), with 32 scalp electrodes positioned according to the extended 10/20 system.

After amplification, the acquired data was high-pass filtered with cutoff frequency of 1 Hz and a reference average of all channels was subtracted. No notch filter was used. Epochs of -1000 to +500 ms relative to movement were extracted and visually inspected for eye blink artifacts. All data processing was performed with Matlab™(The Mathworks, Massachusetts, USA) software. In order to limit computing times, only 100 movements, randomly selected, were retained per subject. The epochs were divided into training (80%) and validation (20%) data sets containing equal numbers of left and right finger movements.

#### Feature Extraction using Wavelets

The wavelet transform has been shown to be more effective in single-trial EEG characterization than traditional processing approaches [23]. The continuous wavelet transform (CWT) treats a function of time in constituent oscillations, localized in both time and frequency [24]. The CWT is defined as follows:

(1)

(2)

where * denotes complex conjugation, *τ *is referred to as the translation, giving the position in time, and *s *the scale parameter, which is inversely related to the frequency content. Ψ(*τ*) is called the mother wavelet, and in this study the standard Daubechies function is used. The discrete wavelet transform (DWT), in turn, is the result of selecting scales and translations based on powers of two, yielding a more efficient yet as accurate analysis.

In order to limit the analysis to a smaller number of discriminative signal features, the DWT was applied to the difference of the average of the right and left finger movement epochs of the training data set, and the five coefficients accounting for the largest portions of the difference (i.e. with the largest amplitude) were established. This approach is a modified version of the discriminant pursuit method [25]. Thus, five coefficients were extracted for each of the 32 EEG channels, totaling in a feature pool of size 160 for classification. The processing was performed using the Matlab wavelet toolbox.

#### Classifiers

Two classifiers were investigated in this study: the non-linear artificial neural networks (ANNs) and multiple linear regression (MLR).

An ANN is a biologically inspired information processing paradigm comprised of a network of highly interconnected units called neurons [20]. A feed-forward network is constructed by connecting a number of these neurons in layers. The output of a neuron in layer *j*, out of *m*, is computed as:

(3)

where *n *is the number of units in the preceding layer, *x*_{i }is the outputs from the preceding layer, and *w*_{ij }are the corresponding weights. The sigmoidal activation function used here is:

(4)

An ANN learns a given task by training, that is, adapting its weights according to given training data. A properly designed and trained network is insensitive to noise and can approximate solutions to problems it has previously not been exposed to. ANNs fully model the task internally and no mathematical parameterization of the problem is required. The ANN topology must, however, be specified, either by optimization methods or empirically.

The MLR model, for a system with *m *observations and *n *features, is typically stated as:

(5)

where, for pattern number *j*, out of *m*, *y*_{j }is the category estimation, *x*_{ij }is the *i *: th feature in the feature subset of size *n*, and *α*_{ij }and *β*_{j }are parameters that must be established.

#### Evolutionary Classifier and Feature Subset Optimization

There are two distinct stages to our classification procedure: classifier parameter approximation and feature subset selection. In the wrapper feature selection approach, the classifier parameters and feature subset are tailored to the given problem simultaneously. In contrast, for filter feature selection the classifier parameters and the feature subsets are optimized separately. In the filter approach, traditional methods (least squares estimation for the MLR and standard back-propagation [20] for the ANNs) are used for establishing the classifier parameters, whereas in the wrapper scheme these (including ANN topology) are determined using evolutionary algorithms (EAs).

An EA is an optimization scheme inspired by Darwinian evolution, where potential problem solutions are encoded as individuals in a population [26]. A fitness measure is computed for each individual, after which various genetic operations, including reproduction, mutation and recombination are applied. A few individuals that will parent the next generation are selected according to a given stochastic scheme, in which probability of selection is related to relative fitness. To ensure that the maximum fitness of the population never decreases, the fittest individual is replaced in the new population unchanged. This process is repeated until performance decreases, evolution stagnates, or a pre-set number of generations have been completed.

In this study, tournament selection is used and the fitness is computed as the proportion of correctly classified epochs. Crossover has been proven to ruin rather than improve the distributed structure of ANNs [18], and has therefore been omitted. Mutation operations are attempted according to given mutation rates, and can be either structural (modifying classifier architecture, i.e. ANN topology) or parametric (modifying classifier parameters, i.e. ANN weights, MLR parameters and feature substitution). The wrapper mutation operations are summarized in tables 1 and 2. ANN hidden neuron addition is performed by adding a neuron with weak, random weights.

The ANN weights are mutated as follows:

*w *= *ηc *+ *rw*(6)

where *c *is a random number in the range [-3 3] and *r *is a random normally distributed number in the range [0 2]. That is, depending on the random numbers, the classifier parameters are modified drastically or partially. *η *is a mutation step control parameter that initially is set to 1 and adjusted downwards if evolution stagnates.

The ANN and MLR feature substitution operation, utilized for both filter and wrapper optimization, involves substituting a given feature for another, randomly selected from the pool of unused features. Although not done here, it should be pointed out that the number of features can also be subject to optimization for the wrapper as well as the filter approach.

The algorithms were implemented in Matlab and C on a standard PC by one of the authors (M. Åberg), and EEGlab software was used for scalp visualization [27]. For reference, randomly selected feature subsets were evaluated as well.

In summary, the following classification schemes were investigated:

1. **Linear random**: Multiple linear regression with least squares estimation of classifier parameters and a randomly generated feature subset.

2. **Non-linear random**: Artificial neural networks with back-propagation weight estimation and a randomly generated feature subset.

3. **Linear filter**: Multiple linear regression with least squares estimation of classifier parameters and optimization of feature subset.

4. **Non-linear filter**: Artificial neural networks with back-propagation weight estimation and optimization of feature subset.

5. **Linear wrapper**: Multiple linear regression with optimization of classifier parameters and feature subset.

6. **Non-linear wrapper**: Artificial neural networks with optimization of weights, architecture and feature subset.

For the filter methods, 50% of the training data was used for evolutionary fitness computation, and 50% was used for establishing classifier parameters. The fitness was computed on fully trained classifiers. For the wrapper methods, where classifier parameter estimation and feature subset optimization are integrated, only one training dataset is required for evaluating the fitness.