写経
最後に

　世界標準MIT教科書 Python言語によるプログラミングイントロダクション第2版: データサイエンスとアプリケーション

の写経をしたので記録を公開しておきます。この記事では、17章をまとめておきます。

　Jupyter Labで実行したものをペーストしているので、ちょっとした正誤表になるかもしれません。

　以下、使用モジュールとrcParameterの設定。

import csv
import pylab
import collections
import statistics as stat
import math

pylab.rc('figure', dpi=200)

写経

18.1 バネの振舞い

springDataTxt = 'springData.txt'

code 18.1

def getData(fileName):
    distances = []
    masses = []
    with open(fileName, 'rt', newline='') as fin:
        csvReader = csv.reader(fin, delimiter=' ') # csvモジュールの練習で、ダイアレクトを設定してみた。
        for line in csvReader:
            try: # ヘッダーはハンドリングする。float関数でValueErrorがでることを利用した。
                distances.append(float(line[0]))
                masses.append(float(line[1]))
            except ValueError:
                continue
    return (masses, distances) # テキストファイルの中身並び順をわざわざ変えるの気持ち悪すぎる・・・

code 18.2

def plotData(inputFile):
    masses, distances = getData(inputFile)
    distances = pylab.array(distances)
    masses = pylab.array(masses)
    forces = masses*9.81
    pylab.plot(forces, distances, 'bo', label = 'Measured displacements')
    pylab.title('Measured Displacement of Spring')
    pylab.xlabel('|Force| (Newtons)')
    pylab.ylabel('Distance (meters)')

plotData(springDataTxt)

f:id:koheitsutsumi223:20190224223603p:plain

18.1.1 線形回帰による適合

code 18.3

def fitData(inputFile):
    masses, distances = getData(inputFile)
    distances = pylab.array(distances)
    masses = pylab.array(masses)
    forces = masses*9.81
    pylab.plot(forces, distances, 'bo', label = 'Measured displacements')
    pylab.title('Measured Displacement of Spring')
    pylab.xlabel('|Force| (Newtons)')
    pylab.ylabel('Distance (meters)')
    a, b = pylab.polyfit(forces, distances, 1)
    predictedDistances = a*forces + b
    k = 1.0/a
    pylab.plot(forces, predictedDistances, label=f'Displacements predicted by\nlinear fit, k = {round(k, 5)}')
    pylab.legend(loc='best')

fitData(springDataTxt)

f:id:koheitsutsumi223:20190224223616p:plain

code 18.3 (3次曲線追加)

def fitData(inputFile):
    masses, distances = getData(inputFile)
    distances = pylab.array(distances)
    masses = pylab.array(masses)
    forces = masses*9.81
    pylab.plot(forces, distances, 'bo', label = 'Measured displacements')
    pylab.title('Measured Displacement of Spring')
    pylab.xlabel('|Force| (Newtons)')
    pylab.ylabel('Distance (meters)')
    a, b = pylab.polyfit(forces, distances, 1)
    predictedDistances = a*forces + b
    k = 1.0/a
    pylab.plot(forces, predictedDistances, label=f'Displacements predicted by\nlinear fit, k = {round(k, 5)}')
    fit = pylab.polyfit(forces, distances, 3)
    predictedDistances = pylab.polyval(fit, forces)
    pylab.plot(forces, predictedDistances, 'k:', label = 'cubic fit')
    pylab.legend(loc='best')

fitData(springDataTxt)

f:id:koheitsutsumi223:20190224223620p:plain

指練習 (オリジナル)

　テキストの指練習の指示が曖昧過ぎる（同じ図をプロットするようにしろってなんだよ。x軸の最大値10のくせに図18.5は14以上までプロットしてるぞ・・・わからん・・）ので、各次数のプロットをやらせてみた。

　データの数が19なので、十八次の曲線までプロットしてみよう。

def fitData(inputFile, start=1, stop=18, step=1):
    masses, distances = getData(inputFile)
    distances = pylab.array(distances)
    masses = pylab.array(masses)
    forces = masses*9.81
    pylab.plot(forces, distances, 'bo', label = 'Measured displacements')
    pylab.title('Measured Displacement of Spring')
    pylab.xlabel('|Force| (Newtons)')
    pylab.ylabel('Distance (meters)')
    for i in range(start, stop + 1, step):
        fit = pylab.polyfit(forces, distances, i)
        predictedDistances = pylab.polyval(fit, forces)
        pylab.plot(forces, predictedDistances, label = f'digree: {i}')
    pylab.legend(loc='best')

fitData(springDataTxt, start=1, stop=9)

f:id:koheitsutsumi223:20190224223626p:plain

fitData(springDataTxt, start=10, stop=16)

f:id:koheitsutsumi223:20190224223850p:plain

fitData(springDataTxt, start=17, stop=18)

    C:\Users\ktsutsum\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: RankWarning: Polyfit may be poorly conditioned
      # This is added back by InteractiveShellApp.init_path()
    C:\Users\ktsutsum\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: RankWarning: Polyfit may be poorly conditioned
      # This is added back by InteractiveShellApp.init_path()

f:id:koheitsutsumi223:20190224223900p:plain

　十八次の多項式ならすべての点を通るようなグラフが欠けるかと予想してたけど、どうやら無理っぽい。Polyfit may be poorly conditionedってメッセージがでてるので、これらの点にはなんらかの線形関係があるのかな？

　 # This is added back by InteractiveShellApp.init_path()は意味わからんな。

　同じようなメッセージは直線を四次でフィッティングさせようとしたときも出た。

fit = pylab.polyfit([1, 2, 3], [2, 4, 6], 4)
predictedDistances = pylab.polyval(fit, [1, 2, 3])
pylab.plot([1, 2, 3], predictedDistances)

    C:\Users\ktsutsum\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RankWarning: Polyfit may be poorly conditioned
      """Entry point for launching an IPython kernel.
    

    [<matplotlib.lines.Line2D at 0x23e1d2e6e10>]

f:id:koheitsutsumi223:20190224223907p:plain

3点なら問題なくフィッティング出来た。これで満足しよう。

fit = pylab.polyfit([1, 2, 3], [2, 5, 2], 2)
predictedDistances = pylab.polyval(fit, [1, 2, 3])
pylab.plot([1, 2, 3], predictedDistances)
arange = pylab.arange(1, 3.1, 0.1)
predictedDistances = pylab.polyval(fit, arange)
pylab.plot(arange, predictedDistances)

    [<matplotlib.lines.Line2D at 0x23e1d58a2b0>]

f:id:koheitsutsumi223:20190224223912p:plain

18.2 発射体の振舞い

code 18.4

メモ

　8.3.4. defaultdict オブジェクトの練習で少し書き直した。

　また、僕が入手したlauncherData.txtは区切り文字がスペースとタブが入り乱れているアホみたいなファイルだったので、スペース区切りに直した。あと、ついでにcsv.Snifferの使い方を練習した。csv.Snifferは、入力されたファイルに対する適切なDialectのサブクラスを返す。

def getTrajectoryData(fileName):
    data = collections.defaultdict(list)
    with open(fileName, 'rt', newline='') as fin:
        dialect = csv.Sniffer().sniff(fin.read(2048)) # csv.Snifferの練習
        fin.seek(0) # sniff関数内でfinでの位置が先頭からずれてしまっている（今回は終端まで来てしまっている）ため、先頭に戻さないといけない。さもなくばgetTrajectoryDataは意図しないデータ（空のリスト等）を返す。
        csvReader = csv.reader(fin, dialect)
        for line in csvReader:
            try: # ヘッダーはハンドリングする。浮動小数に変換することができないヘッダーをfloat関数に渡したときにValueErrorがでることを利用した。
                for i in range(len(line)):
                    data[i].append(float(line[i]))
            except ValueError:
                continue
    return (data[0], [data[i] for i in data][1:])

def processTrajectories(fileName):
    distances, heights = getTrajectoryData(fileName)
    numTrials = len(heights)
    distances = pylab.array(distances)
    totHeights = pylab.array(len(distances)*[0.0])
    for h in heights:
        totHeights += pylab.array(h)
    meanHeights = totHeights/numTrials # テキストでは、numTrialsを計算していたのを忘れていたのか？使わせてもらった。
    pylab.title(f'Trajectory of Projectile (Mean of {numTrials} Trials)')
    pylab.xlabel('Inches from Launch Point')
    pylab.ylabel('Inches Above Launch Point')
    pylab.plot(distances, meanHeights, 'ko')
    fit = pylab.polyfit(distances, meanHeights, 1)
    altitudes = pylab.polyval(fit, distances)
    pylab.plot(distances, altitudes, 'b', label='Linear Fit')
    fit = pylab.polyfit(distances, meanHeights, 2)
    altitudes = pylab.polyval(fit, distances) # テキストはミスだろうな。なんで2回もpolyfitやってんねんって感じ。
    pylab.plot(distances, altitudes, 'b', label='Quadratic Fit')
    pylab.legend()

launcherDataTxt = 'launcherData.txt'

メモ

　なぜか、横軸は等間隔でとられておらず、1000付近の方が観測地点が多いようだ。

processTrajectories(launcherDataTxt)

f:id:koheitsutsumi223:20190224223918p:plain

18.2.1 決定係数

code 18.5

def rSquared(measured, predicted):
    estimateError = ((predicted - measured)**2).sum()
    meanOfMeasured = stat.mean(measured)
    variability = ((measured - meanOfMeasured)**2).sum()
    return 1 - estimateError/variability

def getTrajectoryData(fileName):
    data = collections.defaultdict(list)
    with open(fileName, 'rt', newline='') as fin:
        dialect = csv.Sniffer().sniff(fin.read(2048)) # csv.Snifferの練習
        fin.seek(0) # sviff関数内でfinでの位置が先頭からずれてしまっている（今回は終端まで来てしまっている）ため、先頭に戻さないといけない。さもなくばgetTrajectoryDataは意図しないデータ（空のリスト等）を返す。
        csvReader = csv.reader(fin, dialect)
        for line in csvReader:
            try: # ヘッダーはハンドリングする。浮動小数に変換することができないヘッダーをfloat関数に渡したときにValueErrorがでることを利用した。
                for i in range(len(line)):
                    data[i].append(float(line[i]))
            except ValueError:
                continue
    return (data[0], [data[i] for i in data][1:])

def processTrajectories(fileName):
    distances, heights = getTrajectoryData(fileName)
    numTrials = len(heights)
    distances = pylab.array(distances)
    totHeights = pylab.array(len(distances)*[0.0])
    for h in heights:
        totHeights += pylab.array(h)
    meanHeights = totHeights/numTrials # テキストでは、numTrialsを計算していたのを忘れていたのか？使わせてもらった。
    pylab.title(f'Trajectory of Projectile (Mean of {numTrials} Trials)')
    pylab.xlabel('Inches from Launch Point')
    pylab.ylabel('Inches Above Launch Point')
    pylab.plot(distances, meanHeights, 'ko')
    fit = pylab.polyfit(distances, meanHeights, 1)
    altitudes = pylab.polyval(fit, distances)
    pylab.plot(distances, altitudes, 'b', label='Linear Fit')
    print(f'RSquare of linear fit = {rSquared(meanHeights, altitudes)}')
    fit = pylab.polyfit(distances, meanHeights, 2)
    altitudes = pylab.polyval(fit, distances) # テキストはミスだろうな。なんで2回もpolyfitやってんねんって感じ。
    pylab.plot(distances, altitudes, 'b', label='Quadratic Fit')
    print(f'RSquare of quadratic fit = {rSquared(meanHeights, altitudes)}')
    pylab.legend()

processTrajectories(launcherDataTxt)

RSquare of linear fit = 0.0177433205440769
RSquare of quadratic fit = 0.9857653692869693

18.2.2 計算モデルを使う

code 18.6

def getHorizontalSpeed(quadfit, minX, maxX):
    inchesPerFoot = 12
    xMid = (maxX - minX)/2
    a, b, c = quadfit[0], quadfit[1], quadfit[2]
    yPeak = a*xMid**2 + b*xMid + c
    g = 32.16*inchesPerFoot
    t = (2*yPeak/g)**0.5
    print(f'Horizontal speed = {int(xMid/(t*inchesPerFoot))}feet/sec')

def getTrajectoryData(fileName):
    data = collections.defaultdict(list)
    with open(fileName, 'rt', newline='') as fin:
        dialect = csv.Sniffer().sniff(fin.read(2048)) # csv.Snifferの練習
        fin.seek(0) # sviff関数内でfinでの位置が先頭からずれてしまっている（今回は終端まで来てしまっている）ため、先頭に戻さないといけない。さもなくばgetTrajectoryDataは意図しないデータ（空のリスト等）を返す。
        csvReader = csv.reader(fin, dialect)
        for line in csvReader:
            try: # ヘッダーはハンドリングする。浮動小数に変換することができないヘッダーをfloat関数に渡したときにValueErrorがでることを利用した。
                for i in range(len(line)):
                    data[i].append(float(line[i]))
            except ValueError:
                continue
    return (data[0], [data[i] for i in data][1:])

def processTrajectories(fileName):
    distances, heights = getTrajectoryData(fileName)
    numTrials = len(heights)
    distances = pylab.array(distances)
    totHeights = pylab.array(len(distances)*[0.0])
    for h in heights:
        totHeights += pylab.array(h)
    meanHeights = totHeights/numTrials # テキストでは、numTrialsを計算していたのを忘れていたのか？使わせてもらった。
    pylab.title(f'Trajectory of Projectile (Mean of {numTrials} Trials)')
    pylab.xlabel('Inches from Launch Point')
    pylab.ylabel('Inches Above Launch Point')
    pylab.plot(distances, meanHeights, 'ko')
    fit = pylab.polyfit(distances, meanHeights, 1)
    altitudes = pylab.polyval(fit, distances)
    pylab.plot(distances, altitudes, 'b', label='Linear Fit')
    fit = pylab.polyfit(distances, meanHeights, 2)
    altitudes = pylab.polyval(fit, distances) # テキストはミスだろうな。なんで2回もpolyfitやってんねんって感じ。
    pylab.plot(distances, altitudes, 'b', label='Quadratic Fit')
    pylab.legend()
    getHorizontalSpeed(fit, distances[-1], distances[0]) # テキストは絶対ミスやろなぁ・・・getHorizontalSpeedの第一引数は二次多項式の係数のはずだからこれが正解。

processTrajectories(launcherDataTxt)

Horizontal speed = 136feet/sec

18.3 指数的に分布するデータに適合させる

code 18.7

def plotFittingPowerOf3(stop=10):
    xVals = pylab.arange(stop) # テキストではfor文を使って3**iの数値から成るリストを作成していたが、arrayを使った方が簡単だろう。
    vals = 3**xVals            # 2行で終わる。
    pylab.plot(vals, 'ko', label='Actual points')
    fit = pylab.polyfit(xVals, vals, 5)
    yVals = pylab.polyval(fit, xVals)
    pylab.plot(yVals, 'kx', label='Predicted points', markeredgewidth=2, markersize=15) # markersize=25はでかすぎるので15にしておいた。
    pylab.title('Fitting y = 3**x')
    pylab.legend(loc = 'upper left')

ちなみに、図 18.9のタイトルが2**xになっているが、3**xの間違いだろう。

plotFittingPowerOf3()

f:id:koheitsutsumi223:20190224223942p:plain

def plotFittingPowerOf3(stop=10):
    xVals = pylab.arange(stop) # テキストではfor文を使って3**iの数値から成るリストを作成していたが、arrayを使った方が簡単だろう。
    vals = 3**xVals            # 2行で終わる。
    pylab.plot(vals, 'ko', label='Actual points')
    fit = pylab.polyfit(xVals, vals, 5)
    yVals = pylab.polyval(fit, xVals)
    pylab.plot(yVals, 'kx', label='Predicted points', markeredgewidth=2, markersize=15) # markersize=25はでかすぎるので15にしておいた。
    pylab.title('Fitting y = 3**x')
    pylab.legend(loc = 'upper left')
    print(f'Model predicts that 3**20 is roughly {pylab.polyval(fit, [3**20][0])}')
    print(f'Actual value of 3**20 = {3**20}')

plotFittingPowerOf3()

Model predicts that 3**20 is roughly 2.4547827637212492e+48
Actual value of 3**20 = 3486784401

図 18.10のプロット

def plotFig1810(stop=10):
    vals = 3**pylab.arange(stop)
    pylab.plot(vals, 'ko', label='Actual points')
    pylab.semilogy()

plotFig1810()

f:id:koheitsutsumi223:20190224223955p:plain

code 18.8

def createData(f, xVals): # この関数作る必要あるのか？
    return f(xVals)

def fitExpData(xVals, yVals):
    logVals = math.log(yVals, 2)
    fit = pylab.polyfit(xVals, logVals, 1)
    return fit, 2.0

def plotFittingExpFunc(stop=10, anyX=20):
    xVals = pylab.arange(stop)
    f = lambda x: 3**x
    yVals = createData(f, xVals)
    pylab.plot(xVals, yVals, 'ko', label='Actual values')
    fit, base = fitExpData(xVals, yVals)
    predictedYVals = base**(pylab.polyval(fit, xVals))
    pylab.plot(xVals, predictedYVals, label = 'Predicted values')
    pylab.title('Fitting an Exponential Function')
    pylab.legend(loc='upper left')
    print(f'f({anyX}) = {f(anyX)}')
    print(f'Predicted value = {int(base**(pylab.polyval(fit, [anyX])))}') # テキストのylabってなんだよｗ

メモ

　今書いたplotFittingExpFunc関数は以下のエラーが出る。原因はfitExpData関数の中のmath.log(yVals, 2)にある。[Python]exp,sin等の関数に引数として配列を渡したい時を参考にして解決した。

plotFittingExpFunc()

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-189-bc73174cc2e3> in <module>
    ----> 1 plotFittingExpFunc()
    

    <ipython-input-188-16a8181357ef> in plotFittingExpFunc(stop, anyX)
          4     yVals = createData(f, xVals)
          5     pylab.plot(xVals, yVals, 'ko', label='Actual values')
    ----> 6     fit, base = fitExpData(xVals, yVals)
          7     predictedYVals = base**(pylab.polyval(fit, xVals))
          8     pylab.plot(xVals, predictedYVals, label = 'Predicted values')
    

    <ipython-input-187-40ad0566ff29> in fitExpData(xVals, yVals)
          1 def fitExpData(xVals, yVals):
    ----> 2     logVals = math.log(yVals, 2)
          3     fit = pylab.polyfit(xVals, logVals, 1)
          4     return fit, 2.0
    

    TypeError: only size-1 arrays can be converted to Python scalars

f:id:koheitsutsumi223:20190224224002p:plain

以下のfitExpData関数を使えばエラー無く実行できる。

def fitExpData(xVals, yVals):
    # logVals = math.log(yVals, 2)
    logVals = pylab.log2(yVals)
    fit = pylab.polyfit(xVals, logVals, 1)
    return fit, 2.0

plotFittingExpFunc()

f(20) = 3486784401
Predicted value = 3486784401

f:id:koheitsutsumi223:20190224224008p:plain

最後に

　この本読んでるけど、コードが冗長なとこが結構あるな。僕の性格的にこういうのがおおいと結構げんなりする。けど、同時にある程度奇麗に書き直せた時は快感だから引き続き続けていこう。