Essay on programming languages, computer science, information technologies and all.

Saturday, May 28, 2016

Fluent for the educated

Recently I came across a book named 'Fluent Python' and was inspired to make my Python code fluent and concise.

First, there is no need to read each line, parse it, and append it to a list as below.
def LoadTrainingExamples( file, separator ) :
    examples = []
    for line in open( file ) :
        examples.append( [ float(e) for e in line.rstrip().split(separator) ] )
    return examples

>>> Examples = LoadTrainingExamples( 'faults.txt', '\t' )
You just need to call loadtxt() as below.
>>> Examples = np.loadtxt( 'faults.txt' )
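If the separator ever needs to be explicit, or only some columns are wanted, loadtxt() also accepts delimiter and usecols arguments; a small sketch, where the column range is only for illustration:
>>> Examples = np.loadtxt( 'faults.txt', delimiter='\t', usecols=list(range(4,27)) )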


Next, I used array indexes as implicit markers for storing the min and max. It is sore to the eyes, but I have to show it again.
>>> minmax = np.vstack( (A.min(0),A.max(0)) ); 
>>> A = Normalize( A, minmax,[-1,1] );
There is 'namedtuple' to make a C-struct-like class in one line.
>>> import collections
>>> Limit = collections.namedtuple( 'Limit', ['Lower','Upper'] )
>>> limits = [ Limit(min(aj),max(aj)) for aj in A.T ]   # A.T because min, max go across each column vector; named aj instead of ai
>>> A = Normalize( A, limits, Limit(-1,1) )
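Since the fields are named, the target interval can also be passed with keyword arguments, which makes the intent explicit; just a variation of the call above:
>>> A = Normalize( A, limits, Limit(Lower=-1, Upper=1) )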


Then there is the Normalize function, which is another pain to read.
def Normalize( A, minmax, interval ) :
    N = []
    scale = math.fabs( interval[1] - interval[0] )
    offset = interval[0]
    for i in range(A.shape[1]) :
        minv = minmax[0,i]
        maxv = minmax[1,i]
        N.append( [ (((float(a)-minv)/(maxv-minv))*scale + offset)
                    for a in A[:,i] ] )
    return np.matrix( N ).getT()
With the 'Limit' namedtuple and numpy.matrix's operator overloading of + and *, the code can be written as below.
def Normalize( A, Limits, interval ) :
    N = np.matrix([ (aj-li.Lower)/(li.Upper-li.Lower) 
        for aj,li in zip(A.T, Limits) ])                         # convert to [0:1] 
    N = N * (interval.Upper - interval.Lower) + interval.Lower   # scale and offset
    return N.getT() 
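Here is a quick sanity check of the fluent version on a tiny made-up matrix; B and its limits exist only to exercise the function:
>>> B = np.array( [[1., 2.], [3., 4.]] )
>>> lims = [ Limit(min(bj), max(bj)) for bj in B.T ]
>>> Normalize( B, lims, Limit(-1, 1) )
matrix([[-1., -1.],
        [ 1.,  1.]])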

    References
  • Fluent Python, Luciano Ramalho

Wednesday, May 25, 2016

Artificial Neural Network with two layers and weight update

ANN is a popular machine learning subject. Here I describe a basic two-layer neural network with backpropagation. It is the simplest form of neural network that can update itself to reduce classification error.
First, the input ranges from -1 to 1 instead of 0 to 1. Below is the code that normalizes input values to this range.
def Normalize( A, minmax, interval ) :
    N = []
    scale = math.fabs( interval[1] - interval[0] )
    offset = interval[0]
    for i in range(A.shape[1]) :
        minv = minmax[0,i]
        maxv = minmax[1,i]
        N.append( [ (((float(a)-minv)/(maxv-minv))*scale + offset)
                    for a in A[:,i] ] )
    return np.matrix( N ).getT()
Forward propagation with sigmoid neurons can be coded as below. Here X is the input vector, HN is the weight matrix of the hidden neurons - the neurons right after the input - and ON is the weight matrix of the output neurons.
def ForwardPropagation( X, HN, ON ) :
    Z = np.dot( HN, X )
    H = [ (1/(1+math.exp(-zi))) for zi in Z ]
    Z = np.dot( ON, H )
    O = [ (1/(1+math.exp(-zi))) for zi in Z ]
    return H, O
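As a side note, the per-element list comprehensions could be replaced with a vectorized sigmoid over the whole layer. Below is a minimal sketch of the same computation; Sigmoid and ForwardPropagationVec are my own names, not the code used for the results below.
def Sigmoid( Z ) :
    # element-wise logistic function over a vector or matrix
    return 1.0 / (1.0 + np.exp( -np.asarray( Z, dtype=float ) ))

def ForwardPropagationVec( X, HN, ON ) :
    H = Sigmoid( np.dot( HN, X ) ).ravel().tolist()   # hidden layer activations, as a list like the original
    O = Sigmoid( np.dot( ON, H ) ).ravel().tolist()   # output layer activations
    return H, O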
Backward propagation can be expressed in matrix form as below. H and O are the outputs from ForwardPropagation above, T is the observed output, and nu is the training rate.
def BackwardPropagation( X, HN, ON, H, O, T, nu ) :
    D1 = [ yi*(1-yi)*(ti-yi) for yi,ti in zip(O,T) ]
    D2 = [ hi*(1-hi)*pi for hi, pi in zip(H, np.dot(ON.T,D1))]
    ON += nu * np.dot(np.matrix(D1).T, np.matrix(H))
    HN += nu * np.dot(np.matrix(D2).T, np.matrix(X).T )
    return HN, ON
With the steel plate database - as in the previous post - below is the code to run one epoch (running through all examples to train).
Examples = np.array(LoadTrainingExamples( 'faults.txt', '\t' ))
Examples = Examples[ range(0,346) ]  # rows 0-345 go with the classification in columns 27,28
A = Examples[:, range(4,27)]  # attributes - exclude columns 0-3 as they are x,y positions
minmax = np.vstack( (A.min(0),A.max(0)) );
A = Normalize( A, minmax, [-1,1] )
N = A.shape[1]                # number of attributes - input vector length
C = Examples[:, [27,28] ]
M = C.shape[1]                # number of output - output vector length
HN = 0.01 * np.ones( [N,N] )  # 0.01 for initial weight
ON = 0.01 * np.ones( [M,N] ) 

for Ai,Ci in zip( A,C)  :
    H, O = ForwardPropagation( Ai.T, HN, ON )
    HN, ON = BackwardPropagation( Ai.T, HN, ON, H, O, Ci, 0.1)             
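The single pass above can be wrapped in an epoch loop to watch the error drop; a minimal sketch reusing the same variables, where the per-epoch error measure - mean absolute output error - is my own choice:
for epoch in range( 200 ) :
    error = 0.0
    for Ai, Ci in zip( A, C ) :
        H, O = ForwardPropagation( Ai.T, HN, ON )
        HN, ON = BackwardPropagation( Ai.T, HN, ON, H, O, Ci, 0.1 )
        error += sum( abs(oi - ci) for oi, ci in zip( O, Ci ) )
    print( epoch, error / (len(A) * M) )   # mean absolute error over all outputs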

When running the above for 200 epochs, the error rate goes below 1%. Below is a graph that shows how the error rate changes as the ANN gets trained.


Though when running with all 7 fault classes, it does not get trained well, as the graph below shows.


    References
  • An Introduction to Machine Learning, Miroslav Kubat. Chapter 5
  • dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it

Sunday, May 15, 2016

k-Nearest-Neighbor classifier and cross-validation

The next subject is the k-Nearest-Neighbor classifier. When it gets a new sample, it searches through the training examples to find the most similar one (or the k most similar) and says the new sample should be the same as that history. If you have a big set of training samples that covers most cases, then it should pick something similar.

Compared with a decision tree, you may wonder what the pros and cons are. Here is an analogy. When you drive to a city you have never visited before, a GPS navigator can take you to your hotel; it finds the place from the hotel's address. As long as you are going to some well-known hotel, the navigator is the first choice. Following the navigator is like using a decision tree: you just take a turn when the decision tree asks you to. Though if you have the wrong street name, then no matter how good the building number is, you may end up at the wrong address. Or if you don't have an address or the name of the place, but only a description, the navigator leads you nowhere.

Let's say you want somewhere near a hill-top where you can watch the sunset, have a beer at a pub, and then walk to a cosy B&B. Then your best bet is to park the car, take a cab, and explain yourself. An experienced cab driver can take you to a place where all those descriptions meet in the middle. A dumb navigator is no match here. Of course, you have to tip the cab driver dearly, as she has to go through every location in her memory and match it against your description.

The above analogy is quite crude, but I am trying to make the point that the kNN classifier is something you should try when you know the classification has to go through a wide range of choices with uncertainty everywhere.

There is an open database for this kind of test: visit the UCI Machine Learning Repository. Here I am using the Steel Plates Faults dataset.

First, the database is a tab-separated text file, which can be loaded as below.
def LoadTrainingExamples( file, separator ) :
    examples = []
    for line in open( file ) :
        examples.append( [ float(e) for e in line.rstrip().split(separator) ] )
    return examples
>>> LoadTrainingExamples( 'faultsshort.txt', '\t' )
...

When there are a large number of attributes - like this database, which has 27 variables - we have to choose which variables to play with. The chosen variables are, broadly speaking, called a model. The selection can come from an expert or from an extensive automatic search - like hill climbing. No matter where the selection comes from, it has to be verified that the chosen model and the training set are a good match. One verification method is cross-validation: hold out some of the examples, train the model with the rest, and then classify the held-out examples with the trained model to see whether the resulting classification matches the record.

So we need a function that chooses attributes to form a model. Here the examples come with each row as a record and each column as an attribute. The function below picks columns and forms a model.
def PickColumns( A, I ) :
    P = [ A[:,i] for i in I ]
    return np.matrix( P ).getT()
>>> PickColumns( np.array( [[1,2,3],[4,5,6] ] ), [0,2] )
matrix([[1, 3],
        [4, 6]])

Before training, each attribute should be scaled up or down so that each contributes to the result equally. Here, the normalization is done using the min and max of each attribute.
def Normalize( A, minmax ) :
    N = []
    for i in range(A.shape[1]) :
        minv = minmax[0,i]
        maxv = minmax[1,i]
        N.append( [ ((float(a)-minv)/(maxv-minv)) for a in A[:,i] ] )
    return np.matrix( N ).getT()
>>> Normalize( np.matrix('1,2;0,1'), np.matrix('0,0;1,2'))
matrix([[ 1. ,  1. ],
        [ 0. ,  0.5]])

Finding the k nearest neighbors is relatively simple, as we can use scipy's spatial.KDTree (imported with 'from scipy import spatial'). A KDTree is like a binary tree over n-dimensional attributes and should answer a query in O(log2(N)). Here is the code snippet.
    KD = spatial.KDTree( A )
    distance, index = KD.query( X, k )
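For reference, here is a tiny self-contained example of those two calls; the points and the query are made up, and it assumes 'from scipy import spatial' and numpy imported as np:
>>> P = np.array( [[0., 0.], [1., 0.], [0., 1.], [2., 2.]] )
>>> KD = spatial.KDTree( P )
>>> distance, index = KD.query( [0.1, 0.2], k=2 )   # the two nearest rows of P, closest first
>>> index
array([0, 2])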

With the above KDTree, we can find the k nearest neighbors at index. Then, from those neighbors, we have to choose one classification result. Here is a function that selects the most frequent class among the neighbors, by averaging their class vectors and rounding.
def MaxArg( index, C ) :
    if len(index.shape) == 1 :
        S = sum( [ C[i] for i in index ] )
        NS = 1
    else :
        S = sum( [ C[i] for i in index[0] ] )
        NS = len( index[0] )
    return [ int(si/NS+0.5) for si in S ]
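A small made-up call shows what MaxArg() returns for a k=3 query whose three neighbors have recorded classes 1, 1 and 0:
>>> C = np.matrix( '1;0;1;0' )          # one class column for 4 training rows
>>> MaxArg( np.array([[0, 2, 3]]), C )  # neighbors 0 and 2 say 1, neighbor 3 says 0
[1]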

Integrating all the code above, here is a script that does cross-validation. Refer to [Moore and Lee] for LOOCV (Leave-One-Out Cross-Validation).

Examples = np.array(LoadTrainingExamples( 'faults.txt', '\t' ))
AI = range( 4,8 ); CI = range( 27,28 )  # The model is to use column 4,5,6,7 as attribute and 27 as classification
error_sum = 0

k = 1  # k can be 1 or 3 
E = PickColumns( Examples, AI );  
minmax = np.vstack( (E.min(0),E.max(0)) ); 
E = Normalize( E, minmax );
EC = PickColumns( Examples, CI )

# LOOCV ( Leave-One-Out Cross Validation )
for out_one in range(len(Examples)): 
    A = np.delete( E, (out_one), axis=0)    # training attribute
    C = np.delete( EC, (out_one), axis=0)   # training classification
 
    X = E[out_one];                         # validation point
    CX = EC[out_one];                       # recorded classification

    KD = spatial.KDTree( A )
    distance, index = KD.query( X, k )
    predict = MaxArg( index, C )
    error_sum += sum( [ abs(p-CX[0,i]) for i,p in enumerate(predict)])
    
    print( str.format( "Leave-One {0} is classifed by {1}{2} as {3} while sampled as {4}",
           out_one, index, distance, predict, CX[0] ))  

print( str.format( "Error = {0} ({1}/{2})",
    error_sum/(len(Examples)*len(CI)), error_sum, len(Examples)*len(CI)))  

When running the above script with k=1, the cross-validation error is 10.8%.
Leave-One 0 is classified by [1299][ 0.00052379] as [0] while sampled as [[ 1.]]
Leave-One 1 is classified by [115][ 0.00033883] as [1] while sampled as [[ 1.]]
...
Leave-One 1940 is classified by [225][ 0.00013041] as [0] while sampled as [[ 0.]]
Error = 0.10819165378670788 (210.0/1941)

With k=3, the error is slightly lower at 8.6%.
Leave-One 0 is classified by [[1299 1314 1004]][[ 0.00052379  0.00063176  0.00063623]] as [0] while sampled as [[ 1.]]
Leave-One 1 is classified by [[ 115 1595 1554]][[ 0.00033883  0.00036469  0.00039562]] as [0] while sampled as [[ 1.]]
...
Leave-One 1940 is classified by [[ 225 1927  857]][[ 0.00013041  0.00022388  0.00024139]] as [0] while sampled as [[ 0.]]
Error = 0.0865533230293663 (168.0/1941)

The model chosen is not from an expert, yet the result is relatively impressive to me. I mean, it is not easy to be correct in 9 out of 10 cases.
    There are a lot of things to explore from here.
  • Choosing the model automatically using hill-climbing and seeing whether there is 'No Free Lunch'. I mean, my guess - AI = range( 4,8 ) - should be a really bad choice compared to an extensive and thorough search.
  • Every validation run needs a KDTree built from an almost identical training set. If the previous KDTree could be reused - by removing and adding a row - the speed could improve dramatically.
  • To speed up at run time, the training set needs to be trimmed by removing redundant examples. The 'Tomek Link' test can be tried.


    References
  • An Introduction to Machine Learning, Miroslav Kubat. p51
  • Efficient Algorithms for Minimizing Cross Validation Error, Andrew W. Moore, Mary S. Lee
  • dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it

Sunday, May 8, 2016

Decision Tree with Python

I am studying machine learning and writing Python code to see whether I understand the theory of decision trees correctly. For reference, see [MLT].

First, entropy and gain should be calculated, and here is the code snippet. Entrophy() can be called like Entrophy( [9, 5] ), which will return 0.9402859586706309, and Gain( [9,5], [[6,2],[3,3]] ) will return 0.04812703040826927.
import numpy as NP
import itertools
import functools
import math

def Entrophy( S ):
    N = sum( S )
    if N == 0.0 : return 0.0

    P = [Si/N for Si in S]
    return sum( [ (-pi * math.log(pi,2) if pi != 0.0 else 0.0) for pi in P] ) 

def Gain( S, A ):
    NS = sum( S )
    NSv = [ sum(Ai) for Ai in A ]
    return Entrophy(S) - sum( [ (NSv[i]/NS)*Entrophy(A[i]) for i in range(len(A))] ) 
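A quick interactive check of the values quoted above:
>>> Entrophy( [9, 5] )
0.9402859586706309
>>> Gain( [9,5], [[6,2],[3,3]] )
0.04812703040826927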
For a test example, I make up the O variable like below. It is a numpy array so that I can use slice indexing like O[:,-1].

O = NP.array([
    ['Sunny', 'Hot', 'High', 'Weak', 'No'],
    ['Sunny', 'Hot', 'High', 'Strong', 'No'],
    ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
    ['Sunny', 'Mild', 'High', 'Weak', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
    ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
    ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
    ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Strong', 'No'],    
    ])
Then there should be a holder that knows which distinct values each attribute has.
def DistinctAttributeMap( O ) :
    DA = []
    for i in range(len(O[0])) :
        m = {}
        A = list(set(O[:,i]))
        for j in range(0,len(A)) : 
             m[ A[j] ] = j
        m[ 'Reverse' ] = A
        m[ 'Count' ] = len(A)
        DA.append( m  )
    return DA
DistinctAttributeMap goes through each column and figures out how many distinct values there are. With the examples above, it will generate output like below (the 'Count' entries are omitted here).
[
{    'Rain': 0, 'Sunny': 1, 'Overcast': 2,      'Reverse': ['Rain', 'Sunny', 'Overcast']}, 
{    'Cool': 0, 'Hot': 1, 'Mild': 2,          'Reverse': ['Cool', 'Hot', 'Mild']}, 
{    'High': 0, 'Normal': 1,                 'Reverse': ['High', 'Normal']}, 
{    'Weak': 0, 'Strong': 1,                 'Reverse': ['Weak', 'Strong']}, 
{    'No': 0, 'Yes': 1,                         'Reverse': ['No', 'Yes']}
]
And to generate the A matrix for Gain(), here comes Attributes().
def Attributes( O, DA, i ) : 
    A = NP.zeros( [ DA[i]['Count'], DA[-1]['Count'] ] )

    for j in range(len(O)) :
        a = DA[i][ O[j,i] ]
        c = DA[-1][ O[j,-1] ]
        A[ a, c ] =  A[ a, c ] + 1

    return A
Attributes( O, DA, 0 ) will generate a result like below.
array([[ 2.,  3.],
       [ 3.,  2.],
       [ 0.,  4.]])
Armed with the above functions, ID3 is like below.
def ID3( Examples, AttributeMap, Availables, Tree  ) :

    C = Attributes( Examples, AttributeMap, -1 )
    S = [ C[i,i] for i in range(len(C)) ]

    print( 'Examples = ', Examples )
    print( 'Availables = ', Availables )
    print( 'S = ', S )

    for i in range(len(S)) :
        if S[i] == len(Examples) :
            Tree[ 'Leaf' ] = AttributeMap[-1][ 'Reverse' ][i]
            return

    if len(Availables) == 0 :
        maxS = max( S )
        maxIndex = S.index( maxS )
        Tree[ 'Probably' ] = AttributeMap[-1][ 'Reverse' ][maxIndex]
        Tree[ 'Odd' ] = maxS / sum( S )
        return
    
    maxAttr = -1
    maxGain = -1
    for i in Availables :
        Gi = Gain( S, Attributes( Examples, AttributeMap, i ))
        if Gi > maxGain :
            maxAttr = i
            maxGain = Gi

    Tree[ 'Best' ] = maxAttr
    Tree[ 'Gain' ] = maxGain

    leftAttr = Availables.copy()
    leftAttr.remove( maxAttr )

    for a in AttributeMap[maxAttr][ 'Reverse' ] :
        leftExamples = NP.array( list(filter( lambda r : r[maxAttr] == a, Examples )))

        leftClass = set(leftExamples[:,-1])
        if len( leftClass ) == 1 :
            Tree[ a ] = leftClass.pop()
        else :
            Tree[ a ] = {}  
            ID3( leftExamples, AttributeMap, leftAttr, Tree[a] )
            
    return
The result is like below, with some formatting.

>DA = DistinctAttributeMap( O )
>DT = {}
>ID3( O, DA, [0,1,2,3], DT )
>print( 'DecisionTree = ', DT )
DecisionTree ={
'Best': 0,
'Gain': 0.24674981977443911,
'Rain': {
    'Best': 3,
    'Gain': 0.97095059445466858,
    'Weak': 'Yes',
    'Strong': 'No'},
'Sunny': {
   'Best': 2,
   'Gain': 0.97095059445466858,
   'High': 'No',
   'Normal': 'Yes'},
'Overcast': 'Yes', 
}
I used a dictionary to represent the tree structure. I guess I will try to show the result as a drawing, but that will be another blog page.
How about using the 'Gini index' rather than 'Gain'? Will it make any difference? (See the sketch below the questions.)
How do I make it noise resilient - prevent overfitting?
How do I let the user do fine tuning? Like a decision tree with at most 3 levels and at least 95% classification accuracy?
How about presenting the decision tree as a diagram and letting the user verify it? An interactive diagram that shows and updates the decision tree?
How about making this an API so that the user provides data and retrieves a decision tree?
How about dynamically updating the decision tree at run time?
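For the first question, the Gini index could be computed with the same interface as Entrophy(); a minimal sketch, where GiniIndex is my own name and not from [MLT]:
def GiniIndex( S ):
    N = sum( S )
    if N == 0.0 : return 0.0

    P = [ Si/N for Si in S ]
    return 1.0 - sum( [ pi*pi for pi in P ] )   # 1 minus the sum of squared class probabilities
For example, GiniIndex( [9, 5] ) is about 0.459, where Entrophy( [9, 5] ) is about 0.940.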

    References
  • [MLT] Machine Learning, Tom Mitchell. Chapter 3