In this tutorial, we build on our earlier MKL example and we demonstrate low-level CV (via MKL) and mid/high-level CV (via SVM classification) using the algorithm MKLGL. Note, we are not running GAMKLp due to our inefficient implementation in Matlab (all this code and tutorials were built on an underpowered laptop). In this MKL tutorial, we use two features (LBP and HOG) extracted using the VLFeat library (see our earlier tutorial for more details) on the (in)famous MNIST character recognition data set (images of size 28x28). See our paper (for full MK algorithm details); "A. Pinar, J. Rice, L. Hu, D. T. Anderson, T. C. Havens, Efficient Multiple Kernel Classification using Feature and Decision Level Fusion," IEEE Trans. on Fuzzy Systems, 2016. You can download the data set at here. Note, on that site you can find published classifier rates to compare your algorithm against, e.g., SK-SVM, CNN, KNN, etc.

First, clear your path and figures.


close all; 
clear all;

Next, lets do 2 folds. Lets use 75% of our kernel data (sampling) and lets repeat the experiment two times.


% how many folds (in a N-fold cross validation setup) do we want to do?
NF = 1;

% how much do we want to sample the kernel matrices?
% = 1 means use all data
% = 0.1 means use 10% of the data
PercentageOfKernelReduction = 0.75;

% data structures to store results
SingleKernelResults = zeros(1,NF);
MKLGLResults = zeros(1,NF);

% number of times to run? (repeat our experiments)
NumberOfRepeats = 1;

%Main loop (number of repeated experiments)
for main_loop=0:(NumberOfRepeats-1)

    %Print out iteration #
    fprintf(1,'######################## Iteration %d\n',main_loop);
    %How many folds?
    for t=1:NF

        fprintf(1,'####### Fold %d\n',t);		

Now, we load a fraction (how much depends on how much memory you have on your machine and how long you are willing to wait for this tutorial to run) of the MNIST data set. I converted this into a two class case. The first class is characters 1, 2 and 3 (picked arbitrarily) and all other characters make up class 2. Since we are using a SVM you can build a multi-class solution by doing things like training one versus all or with respect to all pairs. We just did not do this for the demo because it sort of detracts from what we are trying to teach; how to apply MKL to a computer vision problem (of which two class is a simple case). I am loading 15 percent of the data set (again, due to limitations on my laptop). D1a are the LBP training features, D1b are the corresponding training HOG features, and D2a and D2b are the LBP and HOG test features. L1 and L2 are the train and test labels.


        %Load our data set
        [D1a,D1b,L1,D2a,D2b,L2] = mkl_load_mnist([1 2 3],0.15);

        %Sorting for us to look at jazz (does not impact results)
        IndsOne = find( L1 == 1 );
        IndsNeg = find( L1 == -1 );
        Inds = [ IndsOne IndsNeg ];
        L1 = L1(Inds);
        D1a = D1a(Inds,:);
        D1b = D1b(Inds,:);

The images in the MNIST data set are of size 28x28. Like stated above, I extract HOG and LBP featuers on the MNIST data (see below). Note, there are different ways to extract these features, e.g., at each pixel, histogram (with what cell configuration or configurations (e.g., a pyramid of different configurations)), etc. I have included the files loadMNISTImages and loadMNISTLabels in our library.


function [D1a,D1b,L1,D2a,D2b,L2] = mkl_load_mnist( class1 , SubSampleAmount )

    % lets do the classical MNIST data set
    % location prefix
    LocPrefix = 'C:\\Users\\anderson\\Dropbox\\FuzzIeee2017Codes\\DataSets\\mnist';
    % load the training data
    images1 = loadMNISTImages(sprintf('%s\\train-images.idx3-ubyte',LocPrefix),0);
    labels1 = loadMNISTLabels(sprintf('%s\\train-labels.idx1-ubyte',LocPrefix));

    % load the test data
    images2 = loadMNISTImages(sprintf('%s\\t10k-images.idx3-ubyte',LocPrefix),0);
    labels2 = loadMNISTLabels(sprintf('%s\\t10k-labels.idx1-ubyte',LocPrefix));
    % setup vlfeat
    % doing two class here
    % do for training
    Inds = [];
    for i=1:length(class1)
        Inds = [ Inds ; find(labels1 == class1(i)) ];
    Inds2 = 1:length(labels1);
    % do for testing
    Inds3 = [];
    for i=1:length(class1)
        Inds3 = [ Inds3 ; find(labels2 == class1(i)) ];
    Inds4 = 1:length(labels2);
    % extract features for training
    Xdata1a = zeros(length(labels1),5*5*58);
    for i=1:length(labels1) %lbp
        tmp = vl_lbp(single(images1(:,:,i)),5);
        Xdata1a(i,:) = tmp(:);
    Xdata1b = zeros(length(labels1),5*5*31);
    for i=1:length(labels1) %hog
        tmp = vl_hog(single(images1(:,:,i)),6,'NumOrientations',9);
        Xdata1b(i,:) = tmp(:);
    % extract features for testing
    Xdata2a = zeros(length(labels2),5*5*58);
    for i=1:length(labels2) %lbp
        tmp = vl_lbp(single(images2(:,:,i)),5);
        Xdata2a(i,:) = tmp(:);
    Xdata2b = zeros(length(labels2),5*5*31);
    for i=1:length(labels2) %hog
        tmp = vl_hog(single(images2(:,:,i)),6,'NumOrientations',9);
        Xdata2b(i,:) = tmp(:);
    % do some subsampling
    if( SubSampleAmount < 1 )
        % do for training
        R1 = randperm(length(Inds));
        R2 = randperm(length(Inds2));
        keepR1 = R1( 1:ceil(length(Inds)*SubSampleAmount) ); 
        keepR2 = R2( 1:ceil(length(Inds2)*SubSampleAmount) ); 
        Inds = Inds(keepR1);
        Inds2 = Inds2(keepR2);
        % do for testing
        R1 = randperm(length(Inds3));
        R2 = randperm(length(Inds4));
        keepR1 = R1( 1:ceil(length(Inds3)*SubSampleAmount) ); 
        keepR2 = R2( 1:ceil(length(Inds4)*SubSampleAmount) ); 
        Inds3 = Inds3(keepR1);
        Inds4 = Inds4(keepR2);        
    % return data 
    L1 = [ ones(1,length(Inds))   (-1)*ones(1,length(Inds2)) ];
    L2 = [ ones(1,length(Inds3))  (-1)*ones(1,length(Inds4)) ];
    D1a = [ Xdata1a( Inds , : ) ; Xdata1a( Inds2 , : ) ];
    D1b = [ Xdata1b( Inds , : ) ; Xdata1b( Inds2 , : ) ];
    D2a = [ Xdata2a( Inds3 , : ) ; Xdata2a( Inds4 , : ) ];
    D2b = [ Xdata2b( Inds3 , : ) ; Xdata2b( Inds4 , : ) ];


For demonstration, I use a single configuration; 4 RBF kernels, 2 on each feature. There are a large number of possible configuratons that you could try, meaning, different kernels, associated kernel parameters, etc. However, I would like to highlight that we have two features, the LBP and HOG. This adds another level of complexity. We could concatenate the features, e.g., [LBP HOG], and do MKL or we could apply some kernels to LBP and others to HOG, and even do [LBP HOG] and [LBP] and [HOG]. The point is, we have many possibilities and no clear indication of what is a good (in general) selection. Furthermore, many have tried to apply something like a single kernel with different parameters, e.g., many RBFs with different sigmas, with the hope that MKL is doing feature selection. As you can see, there are MANY ways to connect our features and kernels and associated parameters in MKL. Last, if you use different kernels, e.g., a dot product and a polynomial and an RBF, they live on different numeric scales. The weights for LCS have to account for both "importance" and "feature space scaling". There are different methods in the literature that have been proposed for kernel normalization if you are using heterogeneous kernels.


        %Number of kernels (hard coded here)
        nk = 4;        
        %Number of data points
        n = size(L1,2);
        n2 = size(L2,2);

First, I build the training kernels.


        %Training data

        %Allocate kernel variables
        K = single( zeros(n,n,nk) );

        %Make kernels
        %Do on the LBP
        d = size(D1a,2);
        Med_dist = ( norm(mad(D1a)) );
        K(:,:,1) = single( mkl_kernel('rbf2',D1a',D1a',1/d) );
        K(:,:,2) = single( mkl_kernel('rbf2',D1a',D1a',1/Med_dist) ); 
        %Do on the HOG		
        d = size(D1b,2);
        Med_dist = ( norm(mad(D1b)) );
        K(:,:,3) = single( mkl_kernel('rbf2',D1b',D1b',1/d) );
        K(:,:,4) = single( mkl_kernel('rbf2',D1b',D1b',1/Med_dist) );  

Next, I build the test kernels.


        %Testing data
        %Allocate kernel variables
        K2 = single( zeros(n2,n,nk) );

        %Make kernels
        d = size(D1a,2);
        Med_dist = ( norm(mad(D1a)) );
        K2(:,:,1) = single( mkl_kernel('rbf2',D2a',D1a',1/d) );
        K2(:,:,2) = single( mkl_kernel('rbf2',D2a',D1a',1/Med_dist) );     
        d = size(D1b,2);
        Med_dist = ( norm(mad(D1b)) );
        K2(:,:,3) = single( mkl_kernel('rbf2',D2b',D1b',1/d) );
        K2(:,:,4) = single( mkl_kernel('rbf2',D2b',D1b',1/Med_dist) ); 

Next, train single kernel and evaluate (take the best one).


        %STEP 1: train single kernel and evaluate (take the best one)
        SingleKernelIndResults = zeros(1,nk);
        for looper=1:nk
            %Do Nystrom sampling and eigen analysis and use linear SVM (but kernel SVM in RHKS)
            %Not commenting this in the later uses of this coding
            z = randperm(size(K(:,:,looper),1)); %step 1, random permutation of indices
            s = ceil( size(K(:,:,looper),1) * PercentageOfKernelReduction ); %step 2: pick samples 
            z = z(1:s); %select them
            Kzz = K(z,z,looper); %Build our first matrix
            Kz = K(:,z,looper); %Build our second matrix
            [U,V]=eig(Kzz); %step 3: do the eigen decomp
            Xz = Kz*U*diag(1./sqrt(diag(V)));  %step 4: compute final kernel matrix (will compute w.r.t. below)
            %Kp = Xz*Xz'; %can expand to get reconstructed kernel matrix
            %err = sum((FinalKernel(:)-Kp(:)).^2); %can calc the final error of the reconstruction
            model = train(double(L1'),sparse(double(Xz)),'-q'); %train with our svm (dot product kernel)
            %Now, test, have to follow same sampling
            Kz = K2(:,z,end); 
            Xz = Kz*U*diag(1./sqrt(diag(V)));
            %Kp = Xz*Xz'; %can look at reconstructed kernel matrix if desired
            [predicted_label, accuracy, decv] = predict(double(L2'),sparse(double(Xz)), model, '-q'); %predict
            SingleKernelIndResults(looper) = accuracy(1) / 100; %calc acc
        %take the best individual
        SingleKernelResults(main_loop*NF + t) = max( SingleKernelIndResults );
        SingleKernelResultsnorm = SingleKernelIndResults ./ sum( SingleKernelIndResults );

Next, run MKLGL.


        %STEP 2: MKLGL
        %Try based on quality of base learner and uniform initial weights

        %Start with uniform weights
        [model,sigmaK,acc,dval,pred] = mkl_mklgl(L1',K,1,1); %call solver
        FinalK = single(zeros(size(K2(:,:,1)))); %allocate final kernel matrix
        for j=1:nk
            FinalK = FinalK + sigmaK(j) .* K2(:,:,j); %do the aggregation - can do faster Matlab wise
        [~,decv] = libsvmpredict_d1(double(L2'), [(1:length(L2))' double(FinalK)] , model); %predict labels
        %Have to deal with case of if L(1) is not a 1
        if( L1(1) == -1 )
            decv = decv * (-1);
        %Make score
        decv( decv < 0 ) = -1;
        decv( decv >= 0 ) = 1;
        SameSign = L2' .* decv;
        SameSign( SameSign < 0 ) = 0;
        Acc1 = sum(SameSign) / length(decv);        

        %Now, try to seed with base learner qualities

        SingleKernelResultsnorm = SingleKernelIndResults ./ sum( SingleKernelIndResults );
        [model,sigmaK,acc,dval,pred] = mkl_mklgl(L1',K,1,1,SingleKernelResultsnorm);   
        FinalK = single(zeros(size(K2(:,:,1))));
        for j=1:nk
            FinalK = FinalK + sigmaK(j) .* K2(:,:,j); %again, could speed up if desired
        [~,decv] = libsvmpredict_d1(double(L2'), [(1:length(L2))' double(FinalK)] , model);
        %Have to deal with case of if L(1) is not a 1
        if( L1(1) == -1 )
            decv = decv * (-1);
        %Make score
        decv( decv < 0 ) = -1;
        decv( decv >= 0 ) = 1;
        SameSign = L2' .* decv;
        SameSign( SameSign < 0 ) = 0;
        Acc2 = sum(SameSign) / length(decv);

        MKLGLResults(main_loop*NF + t) = max( [ Acc1 Acc2 ] );

Last, print off our statistics.


fprintf(1,'\n\nSingle kernel is %f and std of %f (max of %f)\n',mean(SingleKernelResults),std(SingleKernelResults),max(SingleKernelResults));
fprintf(1,'MKLGL is %f and std of %f (max of %f)\n',mean(MKLGLResults),std(MKLGLResults),max(MKLGLResults));

The result is

Single kernel is 0.988008 and std of 0.000000 (max of 0.988008)
MKLGL is 0.990007 and std of 0.000000 (max of 0.990007)