Linear Support Vector Machine

Model

A linear Support Vector Machine implements the linear model $$y = \text{sign}\left(\boldsymbol{w}^T\boldsymbol{x} + b\right),$$ where $y \in \{-1, 1\}$ is a class label, $\boldsymbol{x} \in \mathbb{R}^D$ is an input vector, and $\boldsymbol{w} \in \mathbb{R}^D$, $b \in \mathbb{R}$ are the model parameters.
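A minimal sketch of this prediction rule with NumPy (predict is a hypothetical helper for illustration, not part of the implementation used below):

import numpy as np

def predict(w, b, X):
    # Linear decision rule sign(w^T x + b) applied to each row of X;
    # a score of exactly 0 is mapped to +1 here.
    scores = np.dot(X, w) + b
    return np.where(scores >= 0, 1, -1)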

Objective Function

The objective function of the original (hard margin) Support Vector Machine is

$$\min_{\boldsymbol{w},b} ||\boldsymbol{w}||^2 \text{ subject to } y_i \left( \boldsymbol{w}^T \boldsymbol{x}_i + b \right) \geq 1 \text{ for } i=1,\ldots,n.$$
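Minimizing $||\boldsymbol{w}||^2$ maximizes the margin: a point on the margin boundary satisfies $y_i \left( \boldsymbol{w}^T \boldsymbol{x}_i + b \right) = 1$ and has distance $1 / ||\boldsymbol{w}||$ to the hyperplane, so the width of the margin between the two classes is

$$\frac{2}{||\boldsymbol{w}||},$$

which grows as $||\boldsymbol{w}||$ shrinks.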

To allow misclassification to some degree, we can formulate a relaxed version. The objective function of the soft margin SVM is

$$\min_{\boldsymbol{w},b,\xi_i}||\boldsymbol{w}||^2 + C \sum_{i=1}^{n} \xi_i \text{ subject to } y_i \left( \boldsymbol{w}^T \boldsymbol{x}_i + b \right) \geq 1 - \xi_i \text{ and } \xi_i \geq 0 \text{ for } i=1,\ldots,n.$$
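At the optimum the slack variables take the values $\xi_i = \max\left(0, 1 - y_i \left( \boldsymbol{w}^T \boldsymbol{x}_i + b \right)\right)$, so the soft margin problem can equivalently be written as the unconstrained hinge loss objective

$$\min_{\boldsymbol{w},b} ||\boldsymbol{w}||^2 + C \sum_{i=1}^{n} \max\left(0, 1 - y_i \left( \boldsymbol{w}^T \boldsymbol{x}_i + b \right)\right).$$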

We can measure both parts of the objective: the average hinge loss on the data and the norm of the weight vector.

In [1]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib
In [2]:
def classification_error(y, y_pred):
    # Average hinge loss max(0, 1 - y * f(x)): the data term of the objective.
    return np.mean(np.maximum(0, 1 - y * y_pred))

def regularization(svm):
    # Norm of the weight vector: the regularization term of the objective.
    return np.linalg.norm(svm.coef_)
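As a quick toy check (not from the original notebook): for $y = (1, -1)$ and decision values $(0.5, -2)$ the hinge terms are $\max(0, 1 - 0.5) = 0.5$ and $\max(0, 1 - 2) = 0$, so classification_error returns $0.25$.

y_toy = np.array([1.0, -1.0])
scores_toy = np.array([0.5, -2.0])
classification_error(y_toy, scores_toy)  # (0.5 + 0.0) / 2 = 0.25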

Implementation

A simple SVM implementation can be found on GitHub. It is based on Andrej Karpathy's svm.js. The main part of a good SVM implementation is the optimization algorithm used to minimize the objective. Probably the most widely used library at the moment is libsvm, which is wrapped by sklearn.
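For comparison, a minimal sketch with scikit-learn's libsvm wrapper (sklearn.svm.SVC; this is not the implementation used in the cells below, and the toy data here is made up for illustration):

import numpy as np
from sklearn.svm import SVC

# Two separable blobs, just to have something to fit.
rng = np.random.RandomState(0)
X_toy = np.vstack((rng.randn(20, 2) + [2, 0], rng.randn(20, 2) - [2, 0]))
y_toy = np.hstack((np.ones(20), -np.ones(20)))

clf = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)
w, b = clf.coef_[0], clf.intercept_[0]  # hyperplane parameters w and b of the model above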

Regularization

We take a look at a simple dataset and define a function that compares the decision boundaries obtained for several values of $C$.

In [3]:
from svm import SVM


def plot_decision_surface(X, y, Cs, plot_sv=False):
    random_state = np.random.RandomState(0)
    plt.figure(figsize=(10, 3))
    for i, C in enumerate(Cs):
        svm = SVM(kernel="linear", C=C, random_state=random_state)
        svm.fit(X, y)

        # Decision boundary w_0 x_0 + w_1 x_1 + b = 0, solved for the second coordinate
        xx = np.linspace(0, 1)
        a = -svm.coef_[0] / svm.coef_[1]
        yy = a * xx - svm.intercept_ / svm.coef_[1]

        y_pred = svm.decision_function(X)

        plt.subplot(1, len(Cs), 1 + i)
        if plot_sv:
            plt.scatter(svm.support_vectors_[:, 0],
                        svm.support_vectors_[:, 1], c="g", s=100)
        plt.scatter(X[:, 0], X[:, 1], c=y)
        plt.plot(xx, yy, 'k-')
        # The title shows C and both parts of the objective
        plt.title("$C = %g$, $l = %.3f C + %.3f$"
                  % (C, classification_error(y, y_pred),
                     regularization(svm)))
        plt.xlim(0, 1)
        plt.ylim(0, 1)
        plt.xticks(())
        plt.yticks(())
    plt.show()

The dataset consists of samples drawn from two Gaussian distributions.

In [4]:
random_state = np.random.RandomState(0)
n_samples = 100
X = np.vstack((0.1 * random_state.randn(n_samples // 2, 2) + np.array([0.25, 0.5]),
               0.1 * random_state.randn(n_samples // 2, 2) + np.array([0.75, 0.5])))
y = np.ones(n_samples)
y[X[:, 0] > 0.5] = -1.0

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlim(0, 1)
plt.ylim(0, 1)
Out[4]:
(0, 1)

A linear SVM with a low value of $C$ will have a small weight vector norm but a higher classification error, because the regularization term dominates the objective. With $C = \infty$ (the hard margin case) the classification error reaches the lowest possible value.

In [5]:
plot_decision_surface(X, y, Cs=[1e-1, np.inf])

In the previous example, a high value of $C$ is not a problem at all because the two classes are linearly separable. Let's take a look at the same dataset with just one more point.

In [6]:
random_state = np.random.RandomState(0)
n_samples = 100
X = np.vstack((0.1 * random_state.randn(n_samples // 2, 2) + np.array([0.25, 0.5]),
               0.1 * random_state.randn(n_samples // 2, 2) + np.array([0.75, 0.5])))
X = np.vstack((X, np.array([[0.6, 0.5]])))
y = np.ones(n_samples + 1)
y[X[:, 0] > 0.5] = -1.0
y[-1] = 1.0

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlim(0, 1)
plt.ylim(0, 1)
Out[6]:
(0, 1)

High values of $C$ don't work at all in this case: the additional point makes the classes linearly non-separable, so the hard margin constraints cannot all be satisfied, and a large misclassification penalty lets this single outlier dominate the decision boundary.

In [7]:
plot_decision_surface(X, y, Cs=[10.0, np.inf])