
Pattern Classification: a Walkthrough Using a Simple Example

Can we recognize a person by his or her typing? Idea: Measure the timing of the keystrokes. Spot the differences between individual typists.

Always a good idea: Start by recording the signals you want to examine into files. This makes it much easier to do experiments and tests. Later, when you know what you're doing, you can switch to real-time data processing. In our case, the recording software is KeyRecorder. It writes CSV files.

Another good idea: Visualize your raw data. Get an idea of what you have at hand. Are there obvious patterns you can leverage? In our case, the CSV files are read by KeyExaminer, software intended to be used as a workspace for data exploration. A spreadsheet or dedicated data analysis software may work as well.
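
To get a feeling for what such a first look can involve, here is a minimal sketch in Python. The column layout ("key", "time_ms") and the file name are assumptions; the actual format of the KeyRecorder files may differ.

    # First look at the raw data: histogram of the time differences
    # between consecutive keystrokes. Column names are assumed.
    import pandas as pd
    import matplotlib.pyplot as plt

    events = pd.read_csv("recording.csv", names=["key", "time_ms"])
    deltas = events["time_ms"].diff().dropna()   # ms between consecutive key events

    plt.hist(deltas, bins=50)
    plt.xlabel("time between consecutive keystrokes [ms]")
    plt.ylabel("count")
    plt.show()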

The first idea is to display the average time difference and its fluctuation (technically: standard deviation) for every pair of consecutive keys. Average = sum of the items / number of the items. Standard deviation = sqrt(average of the squares - square of the average). We see that many pairs of keys never occur (missing data, always tricky) and that only some pairs of keys occur frequently enough for their average and fluctuation values to be reliable.
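
The per-pair statistics can be computed in a single pass with running sums, following exactly the two formulas above. A minimal sketch, again assuming the (key, time in milliseconds) layout of the CSV files:

    # For every ordered pair of consecutive keys, accumulate the time
    # differences and compute average and fluctuation (standard deviation).
    import csv
    import math
    from collections import defaultdict

    sums = defaultdict(float)      # sum of the time differences per key pair
    sums_sq = defaultdict(float)   # sum of the squared time differences
    counts = defaultdict(int)

    with open("recording.csv", newline="") as f:
        rows = [(key, float(t)) for key, t in csv.reader(f)]

    for (key_a, t_a), (key_b, t_b) in zip(rows, rows[1:]):
        dt = t_b - t_a
        pair = (key_a, key_b)
        sums[pair] += dt
        sums_sq[pair] += dt * dt
        counts[pair] += 1

    for pair, n in counts.items():
        avg = sums[pair] / n
        # sqrt(average of the squares - square of the average)
        std = math.sqrt(max(sums_sq[pair] / n - avg * avg, 0.0))
        print(pair, n, avg, std)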

Thus, it seems reasonable to pick only the most frequent pairs of keys, which yields reliable data and avoids trouble with missing data. We pick three pairs of keys, which reduces the dimensionality of our feature space from 64*64*2 to 3*2. (Times two because we look at both average and fluctuation.) A classical visualization for this 6D space is Parallel Coordinates: Draw (in this case) six parallel coordinate axes; for every data item, create a polyline that connects the points representing its components on the axes. Hopefully, one can spot some clusters after plotting tens to thousands of items. Remember that in our case every item represents one recording of a certain user.
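
Parallel coordinates are readily available in common plotting libraries. A sketch using pandas, assuming a feature table with one row per recording (the column names are made up for illustration):

    # Six feature columns plus the user label, one row per recording.
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    features = pd.read_csv("features.csv")   # columns: user, avg_1, std_1, ..., avg_3, std_3
    parallel_coordinates(features, class_column="user")
    plt.show()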

With the right set of features (here: the six values), the values that belong to a certain pattern (here: user/typist) form clusters in a space of low dimension (here: six). The bulk of pattern classification is concerned with how to learn the shape of such clusters during a training phase and how to decide afterwards to which cluster/pattern new data belongs. The simplest approach is to determine the centroid of every cluster during training; to classify a new data point presented afterwards, one chooses the cluster with the closest centroid (nearest-centroid or minimum-distance classification). Other approaches such as Neural Nets and Support-Vector Machines are much more sophisticated but accomplish essentially the same job. Nicely enough, these methods are available through function libraries that hide the ugly details.
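
The centroid approach is small enough to write down directly. A minimal sketch with NumPy; scikit-learn offers the same classifier ready-made as sklearn.neighbors.NearestCentroid and also provides neural nets and support-vector machines.

    # Training: one centroid per user. Classification: pick the user
    # whose centroid is closest (Euclidean distance) to the new vector.
    import numpy as np

    def train(feature_vectors, labels):
        """Return a dict mapping each label to the centroid of its cluster."""
        x = np.asarray(feature_vectors, dtype=float)
        y = np.asarray(labels)
        return {label: x[y == label].mean(axis=0) for label in np.unique(y)}

    def classify(centroids, vector):
        """Return the label whose centroid is closest to the given vector."""
        v = np.asarray(vector, dtype=float)
        return min(centroids, key=lambda label: np.linalg.norm(v - centroids[label]))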

Of course, there are variations on this topic: Sometimes, a pattern classification algorithm may keep learning during its use. Additionally, pattern classification algorithms may need to take temporal order into account. For instance, speech recognition requires finding the sequence of phonemes that is most plausible both in terms of acoustics (the recorded signal) and in terms of linguistics (the words that make sense). This is rather complex. To keep things simple, you would try to create features that incorporate temporal behavior and thus stick to the approach of static clusters in a low number of dimensions. One example of such a feature is the (instantaneous) frequency of a signal. A general signal, however, is composed of a great or even infinite number of frequencies. The Fourier transform is employed to compute the spectrum of the signal. The spectrum tells you which frequency (x-axis) is present with which intensity (y-axis).
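
As an illustration of the last step, here is a sketch that computes the spectrum of a synthetic test signal with NumPy's Fourier transform:

    # Spectrum of a test signal: intensity (y-axis) per frequency (x-axis).
    import numpy as np
    import matplotlib.pyplot as plt

    sample_rate = 1000.0                                   # samples per second
    t = np.arange(0.0, 1.0, 1.0 / sample_rate)
    signal = np.sin(2*np.pi*50*t) + 0.5*np.sin(2*np.pi*120*t)

    spectrum = np.abs(np.fft.rfft(signal))                 # intensity per frequency
    freqs = np.fft.rfftfreq(len(signal), d=1.0/sample_rate)

    plt.plot(freqs, spectrum)
    plt.xlabel("frequency [Hz]")
    plt.ylabel("intensity")
    plt.show()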