Distance functions in the Pattern Recognition Toolbox

The Pattern Recognition Toolbox provides a range of functions for computing distances between vectors. These functions operate on the observation vectors of prtDataSets. The simplest is the Euclidean distance. Consider the following example:

% Create 2 data sets
dsx = prtDataSetStandard('Observations', [0 0; 1 1]);
dsy = prtDataSetStandard('Observations', [1 0; 2 2; 3 3]);
% Compute distance
distance = prtDistanceEuclidean(dsx,dsy)
distance =
    1.0000    2.8284    4.2426
    1.0000    1.4142    2.8284

The above example computes the Euclidean distance from each of the data points [0 0] and [1 1] in the data set dsx to every data point in dsy. The result is a 2-by-3 matrix of doubles, where distance(i,j) is the distance from the ith observation in dsx to the jth observation in dsy.
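The indexing convention can be checked by hand. The following is a minimal sketch in plain MATLAB that does not call the toolbox; the variable names X, Y and D are illustrative only, and the loop is not meant to reflect how prtDistanceEuclidean is implemented internally.

```matlab
% Raw observation matrices matching dsx and dsy from the example above
X = [0 0; 1 1];
Y = [1 0; 2 2; 3 3];

nX = size(X, 1);          % number of observations in X
nY = size(Y, 1);          % number of observations in Y
D  = zeros(nX, nY);       % D(i,j): distance from X(i,:) to Y(j,:)

for i = 1:nX
    for j = 1:nY
        % Euclidean distance: square root of the sum of squared differences
        D(i, j) = sqrt(sum((X(i, :) - Y(j, :)).^2));
    end
end

disp(D)
```

Each row of D corresponds to one observation in X and each column to one observation in Y, which matches the layout of the distance matrix shown above.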

Distance functions as members of prtActions

prtDistance functions are intended to be used within prtActions, for example to determine the distance between observations. A common use is K-means clustering. Different distance metrics can lead to very different clustering results, as the following example illustrates:

ds = prtDataGenMary;         % Create a data set
cluster = prtClusterKmeans;  % Create a K-means clustering object

cluster = cluster.train(ds);  % Train
subplot(2,1,1); plot(cluster) % Plot
title('Euclidean distance metric')

% Change the distance metric to City Block.
cluster.distanceMetricFn = @prtDistanceCityBlock;
cluster = cluster.train(ds);  % Train
subplot(2,1,2); plot(cluster) % Plot
title('City block distance metric')
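To see why the metric can change a clustering result, consider which of two candidate centers is nearest to a single observation. The sketch below uses plain MATLAB rather than the toolbox; the observation x and the two centers are hypothetical values chosen for illustration, and the nearest-center rule stands in for the assignment step of K-means in general, not for the internals of prtClusterKmeans.

```matlab
x       = [0 0];          % a single observation (hypothetical)
centers = [3 0; 2 2];     % two candidate cluster centers (hypothetical)

% Difference between each center and the observation, one row per center
diffs = centers - repmat(x, size(centers, 1), 1);

dEuclid = sqrt(sum(diffs.^2, 2));   % Euclidean distances:  [3; 2.8284]
dCity   = sum(abs(diffs), 2);       % city block distances: [3; 4]

[~, nearestEuclid] = min(dEuclid);  % center 2 is nearest under Euclidean
[~, nearestCity]   = min(dCity);    % center 1 is nearest under city block

fprintf('Nearest center (Euclidean):  %d\n', nearestEuclid);
fprintf('Nearest center (city block): %d\n', nearestCity);
```

The same observation is assigned to a different center depending on the metric, which is why the two plots produced above can show different cluster boundaries.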

All distance functions in the Pattern Recognition Toolbox share the API described above. For a list of all the different techniques, and links to their individual help entries, see A list of commonly used functions.