Introduction

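This article records my transcription (写経, retyping the book's code by hand) of Chapter 2 of [第2版]Python 機械学習プログラミング 達人データサイエンティストによる理論と実践. My reasons for choosing this book are given in an earlier post, 新しく写経する本を紹介（Python 機械学習プログラミング）. All code was run in JupyterLab.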

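I organized the transcription as follows.

Although this is a transcription, I wrapped some of the code in functions so that it is easier to re-run.
Where I came up with code that seemed better, I rewrote it and added a comment so that the changes are as explicit as possible.
I left notes on points that caught my attention. I hope they help readers who have similar questions.
Code similar to something written earlier (plotting, for example) is sometimes skipped for efficiency.
The modules used in the article are imported at the very beginning.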

I hope this also serves as a small errata sheet for anyone reading [第2版]Python 機械学習プログラミング 達人データサイエンティストによる理論と実践.

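Main modules and settings used in this article

This article mainly uses the following modules and settings.

Chapter 2: Classification Problems - Training Simple Machine Learning Algorithms

Modules used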

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
from matplotlib.colors import ListedColormap

2.2 Implementing the Perceptron Learning Algorithm in Python

2.2.1 An Object-Oriented Perceptron API

class Perceptron(object):
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors_ = []

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self
    
    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)
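
The update inside fit can be traced by hand for a single sample. The following is a minimal sketch, not code from the book: it assumes zero initial weights (the class above starts from small random weights) and a made-up sample, so the numbers are for illustration only.

```python
import numpy as np

eta = 0.1
w = np.zeros(3)                 # w[0] is the bias, w[1:] are the feature weights
xi = np.array([5.1, 1.4])       # one hypothetical sample with two features
target = -1

# A net input of exactly 0.0 maps to class 1 because the test is >= 0.0.
prediction = np.where(np.dot(xi, w[1:]) + w[0] >= 0.0, 1, -1)

# Misclassified, so update = eta * (target - prediction) = 0.1 * (-2) = -0.2.
update = eta * (target - prediction)
w[1:] += update * xi
w[0] += update

# After the update the same sample falls on the negative side.
new_prediction = np.where(np.dot(xi, w[1:]) + w[0] >= 0.0, 1, -1)
print(w, new_prediction)
```

When the prediction is already correct, target - prediction is 0, so the weights stay unchanged and the sample is not counted in errors_.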

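Note (the `np.random.RandomState` class)

The np.random.RandomState class (numpy.random.RandomState) provides several methods that return random numbers drawn from probability distributions. The normal method is one of them and returns numbers drawn from a normal distribution.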

rgen = np.random.RandomState(1)
rgen.normal(loc=0.0, scale=0.01, size=5)

Output:

    array([ 0.01624345, -0.00611756, -0.00528172, -0.01072969,  0.00865408])
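
As a small supplementary sketch (not from the book), the following shows why passing random_state makes the weight initialization reproducible: two generators created with the same seed return identical draws. The variable names are my own.

```python
import numpy as np

# Two generators seeded with the same value produce the same sequence.
rgen_a = np.random.RandomState(1)
rgen_b = np.random.RandomState(1)

w_a = rgen_a.normal(loc=0.0, scale=0.01, size=3)
w_b = rgen_b.normal(loc=0.0, scale=0.01, size=3)
same_seed_matches = np.array_equal(w_a, w_b)

# A different seed gives different draws.
w_c = np.random.RandomState(2).normal(loc=0.0, scale=0.01, size=3)
different_seed_differs = not np.array_equal(w_a, w_c)

print(same_seed_matches, different_seed_differs)
```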

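Note (the `np.where` function)

In the code above, np.where(self.net_input(X) >= 0.0, 1, -1) returns an array whose elements are 1 where the corresponding element of self.net_input(X) is at least 0.0, and -1 where it is less than 0.0. The docstring of np.where is shown below.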

np.where??

Output:

Docstring:
where(condition, [x, y])

Return elements chosen from `x` or `y` depending on `condition`.

.. note::
    When only `condition` is provided, this function is a shorthand for
    ``np.asarray(condition).nonzero()``. Using `nonzero` directly should be
    preferred, as it behaves correctly for subclasses. The rest of this
    documentation covers only the case where all three arguments are
    provided.

Parameters
----------
condition : array_like, bool
    Where True, yield `x`, otherwise yield `y`.
x, y : array_like
    Values from which to choose. `x`, `y` and `condition` need to be
    broadcastable to some shape.

Returns
-------
out : ndarray
    An array with elements from `x` where `condition` is True, and elements
    from `y` elsewhere.

See Also
--------
choose
nonzero : The function that is called when x and y are omitted

Notes
-----
If all the arrays are 1-D, `where` is equivalent to::

    [xv if c else yv
     for c, xv, yv in zip(condition, x, y)]

Examples
--------
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])

This can be used on multidimensional arrays too:

>>> np.where([[True, False], [True, True]],
...          [[1, 2], [3, 4]],
...          [[9, 8], [7, 6]])
array([[1, 8],
       [3, 4]])

The shapes of x, y, and the condition are broadcast together:

>>> x, y = np.ogrid[:3, :4]
>>> np.where(x < y, x, 10 + y)  # both x and 10+y are broadcast
array([[10,  0,  0,  0],
       [10, 11,  1,  1],
       [10, 11, 12,  2]])

>>> a = np.array([[0, 1, 2],
...               [0, 2, 4],
...               [0, 3, 6]])
>>> np.where(a < 4, a, -1)  # -1 is broadcast
array([[ 0,  1,  2],
       [ 0,  2, -1],
       [ 0,  3, -1]])
Type:      builtin_function_or_method

It can also be used as follows. Personally, this is a function I want to master, since it seems to have many applications.

np.where([True, False, True, True],
          [1, 1, 1, 1],
          [0, 0, 0, 0])

Output:

    array([1, 0, 1, 1])
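
To connect this back to the predict method, here is a short sketch with made-up net-input values (net_input here is a plain array, not the method). It shows the scalars 1 and -1 being broadcast against the condition, and the one-argument form mentioned in the docstring note.

```python
import numpy as np

# Hypothetical net-input values for four samples.
net_input = np.array([-0.3, 0.0, 0.2, -1.5])

# The scalars 1 and -1 are broadcast against the boolean condition.
labels = np.where(net_input >= 0.0, 1, -1)
print(labels)  # [-1  1  1 -1]

# With only the condition, np.where returns the indices where it holds.
positive_idx = np.where(net_input >= 0.0)[0]
print(positive_idx)  # [1 2]
```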

2.3 Training a Perceptron Model on the Iris Dataset

df = pd.read_csv('https://archive.ics.uci.edu/ml/'
                 'machine-learning-databases/iris/iris.data', header=None)
df.tail()

Output:

	0	1	2	3	4
145	6.7	3.0	5.2	2.3	Iris-virginica
146	6.3	2.5	5.0	1.9	Iris-virginica
147	6.5	3.0	5.2	2.0	Iris-virginica
148	6.2	3.4	5.4	2.3	Iris-virginica
149	5.9	3.0	5.1	1.8	Iris-virginica

Note (the `df.tail` method)

The df.tail method takes an integer argument and returns that many rows from the end of the data. By default it returns 5 rows.

print(df.tail.__doc__)

Output:

    
            Return the last `n` rows.
    
            This function returns last `n` rows from the object based on
            position. It is useful for quickly verifying data, for example,
            after sorting or appending rows.
    
            Parameters
            ----------
            n : int, default 5
                Number of rows to select.
    
            Returns
            -------
            type of caller
                The last `n` rows of the caller object.
    
            See Also
            --------
            pandas.DataFrame.head : The first `n` rows of the caller object.
    
            Examples
            --------
            >>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
            ...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
            >>> df
                  animal
            0  alligator
            1        bee
            2     falcon
            3       lion
            4     monkey
            5     parrot
            6      shark
            7      whale
            8      zebra
    
            Viewing the last 5 lines
    
            >>> df.tail()
               animal
            4  monkey
            5  parrot
            6   shark
            7   whale
            8   zebra
    
            Viewing the last `n` lines (three in this case)
    
            >>> df.tail(3)
              animal
            6  shark
            7  whale
            8  zebra

y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)

X = df.iloc[0:100, [0, 2]].values

figsize(10, 7)
plt.rcParams['font.size'] = 20
plt.scatter(X[:50, 0], X[:50, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1],
            color='blue', marker='x', label='versicolor')

plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')

plt.show()

Output:

[Figure: scatter plot of sepal length [cm] against petal length [cm] for setosa and versicolor]

ppn = Perceptron(eta=0.1, n_iter=10)

ppn.fit(X, y)

plt.plot(range(1, len(ppn.errors_) + 1), ppn.errors_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of updates')

plt.show()

Output:

[Figure: number of updates per epoch for the perceptron]

def plot_decision_regions(X, y, classifier, resolution=0.02):

    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    # ravel flattens each grid to 1-D, and .T transposes the stacked pair into
    # the (n_points, 2) layout that the predict method accepts.
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    # Reshape the predictions back to the same shape as xx1 and xx2.
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], 
                    y=X[y == cl, 1],
                    alpha=0.8, 
                    c=colors[idx],
                    marker=markers[idx], 
                    label=cl, 
                    edgecolor='black')
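
The meshgrid, ravel, transpose, and reshape steps in plot_decision_regions are easier to follow on a tiny grid. The sketch below is my own illustration: the grid values are made up, and a simple threshold stands in for classifier.predict.

```python
import numpy as np

# A tiny 2 x 3 grid: x1 in {0, 1, 2}, x2 in {10, 20}.
xx1, xx2 = np.meshgrid(np.arange(0, 3), np.array([10, 20]))

# Flatten both coordinate arrays and pair them up: one row per grid point.
grid_points = np.array([xx1.ravel(), xx2.ravel()]).T
print(grid_points.shape)  # (6, 2)

# Stand-in for classifier.predict: label 1 where x1 >= 1, else -1.
Z = np.where(grid_points[:, 0] >= 1, 1, -1)

# Restore the grid layout so that Z lines up with xx1 and xx2 in contourf.
Z = Z.reshape(xx1.shape)
print(Z)
```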

Note (the `ravel` method)

The ravel method of an array (also available as the function np.ravel) returns the array flattened to one dimension:

np.array([[1, 2], [3, 4]]).ravel()

Output:

    array([1, 2, 3, 4])

There is also an order argument that determines the order in which elements are flattened. It can be set to 'C', 'F', 'A', or 'K', and the default is 'C'. For example, 'F' flattens column by column:

np.array([[1, 2], [3, 4]]).ravel(order='F')

Output:

    array([1, 3, 2, 4])
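
As a supplementary check (my own example, not from the book), column-major flattening of a 2-D array gives the same result as flattening its transpose in the default row-major order:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

row_major = a.ravel()             # default order='C': row by row
col_major = a.ravel(order='F')    # column by column
via_transpose = a.T.ravel()       # transpose first, then row by row

print(row_major)   # [1 2 3 4 5 6]
print(col_major)   # [1 4 2 5 3 6]
```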

figsize(10, 7)
plt.rcParams['font.size'] = 20

plot_decision_regions(X, y, classifier=ppn)
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')

plt.show()

Output:

[Figure: perceptron decision regions, sepal length [cm] against petal length [cm]]

2.5.1 Implementing ADALINE in Python

class AdalineGD(object):
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []
        
        for i in range(self.n_iter):
            net_input = self.net_input(X)
            output = self.activation(net_input)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        return X

    def predict(self, X):
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
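
One pass of the batch update in fit can be checked by hand. The sketch below is my own: it uses a made-up two-sample dataset and zero initial weights (the class above starts from small random weights), so the numbers are for illustration only.

```python
import numpy as np

eta = 0.1
X_demo = np.array([[-1.0], [1.0]])   # two samples, one feature
y_demo = np.array([-1.0, 1.0])
w = np.zeros(2)                      # w[0] is the bias, w[1:] the feature weight


def sse_cost(w):
    """Sum of squared errors divided by 2, as in AdalineGD.fit."""
    errors = y_demo - (X_demo.dot(w[1:]) + w[0])
    return (errors ** 2).sum() / 2.0


cost_before = sse_cost(w)            # (1 + 1) / 2 = 1.0

# One gradient step computed from all samples at once.
errors = y_demo - (X_demo.dot(w[1:]) + w[0])
w[1:] += eta * X_demo.T.dot(errors)  # 0.1 * 2 = 0.2
w[0] += eta * errors.sum()           # 0.1 * 0 = 0.0

cost_after = sse_cost(w)             # (0.64 + 0.64) / 2 = 0.64
print(cost_before, cost_after)
```

Unlike the perceptron, the weights are updated once per epoch from the errors of all samples, and the error is computed from the continuous output rather than the thresholded class label.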

figsize(20, 7)
plt.rcParams['font.size'] = 20

fig, ax = plt.subplots(nrows=1, ncols=2)

ada1 = AdalineGD(n_iter=10, eta=0.01).fit(X, y)
ax[0].plot(range(1, len(ada1.cost_) + 1), np.log10(ada1.cost_), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaline - Learning rate 0.01')

ada2 = AdalineGD(n_iter=10, eta=0.0001).fit(X, y)
ax[1].plot(range(1, len(ada2.cost_) + 1), ada2.cost_, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaline - Learning rate 0.0001')

plt.show()

Output:

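[Figure: two panels. Left: 'Adaline - Learning rate 0.01', log(Sum-squared-error) against Epochs. Right: 'Adaline - Learning rate 0.0001', Sum-squared-error against Epochs]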

2.5.2 Improving Gradient Descent Through Feature Scaling

X_std = np.copy(X)
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = AdalineGD(n_iter=15, eta=0.01)
ada.fit(X_std, y)
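
The column-by-column standardization above can also be written for all columns at once. The following sketch uses a small made-up matrix (X_demo is hypothetical, standing in for the Iris features) and confirms that both forms agree and that each standardized column has mean 0 and standard deviation 1.

```python
import numpy as np

# Hypothetical feature matrix standing in for the two Iris columns.
X_demo = np.array([[5.1, 1.4],
                   [4.9, 1.4],
                   [7.0, 4.7],
                   [6.4, 4.5]])

# Column by column, as in the code above.
X_loop = np.copy(X_demo)
X_loop[:, 0] = (X_demo[:, 0] - X_demo[:, 0].mean()) / X_demo[:, 0].std()
X_loop[:, 1] = (X_demo[:, 1] - X_demo[:, 1].mean()) / X_demo[:, 1].std()

# All columns at once: axis=0 takes per-column statistics, and broadcasting
# applies them to every row.
X_vec = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_loop, X_vec))
```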

figsize(10, 7)
plt.rcParams['font.size'] = 20

plot_decision_regions(X_std, y, classifier=ada)
plt.title('Adaline - Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()

plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Sum-squared-error')

plt.tight_layout()
plt.show()

Output:

[Figure: 'Adaline - Gradient Descent' decision regions, sepal length [standardized] against petal length [standardized]]

Output:

[Figure: Sum-squared-error against Epochs]

class AdalineSGD(object):
    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        self.random_state = random_state
    
    def fit(self, X, y):
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        r = self.rgen.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        self.rgen = np.random.RandomState(self.random_state)
        self.w_ = self.rgen.normal(loc=0.0, scale=0.01, size=1 + m)
        self.w_initialized = True
        
    def _update_weights(self, xi, target):
        output = self.activation(self.net_input(xi))
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost
    
    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        return X

    def predict(self, X):
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
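
The _shuffle method relies on indexing both arrays with the same permutation. A minimal sketch with made-up data (X_demo and y_demo are illustrative, not from the book) shows that rows and labels stay paired after shuffling:

```python
import numpy as np

rgen = np.random.RandomState(1)

# Each row's first feature equals its label, so pairing is easy to verify.
X_demo = np.array([[0, 0], [1, 10], [2, 20], [3, 30]])
y_demo = np.array([0, 1, 2, 3])

# permutation(n) returns the integers 0..n-1 in random order.
r = rgen.permutation(len(y_demo))
X_shuf, y_shuf = X_demo[r], y_demo[r]

still_paired = np.array_equal(X_shuf[:, 0], y_shuf)
print(r, still_paired)
```

Because indexing with an integer array returns new arrays, the caller's original X and y are left in their original order.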

ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)

plot_decision_regions(X_std, y, classifier=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()

plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Average Cost')

plt.tight_layout()
plt.show()

Output:

[Figure: 'Adaline - Stochastic Gradient Descent' decision regions, sepal length [standardized] against petal length [standardized]]

Output:

[Figure: Average Cost against Epochs]

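Note (on the `if y.ravel().shape[0] > 1:` branch)

While reading the code, I wondered whether the if y.ravel().shape[0] > 1: branch was really necessary. It turns out to be essential for this code. The loop over zip(X, y) yields different things depending on whether y.ravel().shape[0] is 1 or greater than 1. zip itself works the same way in both cases, but it iterates over the first axis of X: with several samples X is 2-D and each iteration yields one row, whereas a single sample passed as a 1-D array yields individual feature values.

The following examples show the difference. First, when y has more than one element, zip does what we want.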

y_value = np.array([1, 1])

X_std[0:2, :]

Output:

    array([[-0.5810659 , -1.01435952],
           [-0.89430898, -1.01435952]])

for xi, target in zip(X_std[0:2, :], y_value):
    print(xi, target)

Output:

    [-0.5810659  -1.01435952] 1
    [-0.89430898 -1.01435952] 1

However, when y holds only a single value, zip loops as shown below. Because X_std[0, :] is a 1-D array, zip yields only its first element (the value in row 1, column 1) rather than the whole row.

y_value = np.array([1])

X_std[0, :]

Output:

    array([-0.5810659 , -1.01435952])

for xi, target in zip(X_std[0, :], y_value):
    print(xi, target)

Output:

    -0.5810659036233256 1

Because the zip loop behaves this way, the if y.ravel().shape[0] > 1: branch is needed.
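
The difference can be reproduced without the Iris data. The sketch below uses made-up values, and the second half shows one possible alternative to the branch (my own suggestion, not the book's code): promoting a single sample to a 2-D array with np.atleast_2d so that the same loop handles both cases.

```python
import numpy as np

X_batch = np.array([[-0.58, -1.01],
                    [-0.89, -1.01]])   # two samples, shape (2, 2)
x_single = X_batch[0, :]               # one sample, shape (2,)

# Iterating a 2-D array yields rows; iterating a 1-D array yields scalars.
rows = [xi.shape for xi, _ in zip(X_batch, np.array([1, 1]))]
scalars = [np.ndim(xi) for xi, _ in zip(x_single, np.array([1]))]
print(rows, scalars)  # [(2,), (2,)] [0]

# Possible alternative: promote the single sample to shape (1, 2) first,
# so that the zip loop yields the whole row.
pairs = list(zip(np.atleast_2d(x_single), np.atleast_1d(1)))
print(len(pairs), pairs[0][0].shape)  # 1 (2,)
```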

That is all for this transcription. Thank you for reading this far.
