essay on programming languages, computer science, information techonlogies and all.

Wednesday, May 25, 2016

Aritificial Neural Network with two layers and weight update

ANN is popular machine learning suject. Here I describe basic two layer neural network with backpropagation. It is simplest form of neural network that can update itself to reduce classification error.
First, input range from -1 to 1 instead of 0 to 1. Below is the code that normalize input value on this range.
def Normalize( A, minmax, interval ) :
    N = []
    scale = math.fabs( interval[1] - interval[0] )
    offset = interval[0]
    for i in range(A.shape[1]) :
        minv = minmax[0,i]
        maxv = minmax[1,i]
        N.append( [ (((float(a)-minv)/(maxv-minv))*scale + offset)
                    for a in A[:,i] ] )
    return np.matrix( N ).getT()
Forward propagation with sigmoid neuron can be coded as below. Here X is the input vector and HN is the weight for Hidden Neurons - nerons after input. And ON is the weight of output neuron. Here
def ForwardPropagation( X, HN, ON ) :
    Z = np.dot( HN, X )
    H = [ (1/(1+math.exp(-zi))) for zi in Z ]
    Z = np.dot( ON, H )
    O = [ (1/(1+math.exp(-zi))) for zi in Z ]
    return H, O
Backward propagation can be expressed in matrix form as below. H and O is output from above ForwardPropagation and T is the output observed and nu is training rate.
def BackwardPropagation( X, HN, ON, H, O, T, nu ) :
    D1 = [ yi*(1-yi)*(ti-yi) for yi,ti in zip(O,T) ]
    D2 = [ hi*(1-hi)*pi for hi, pi in zip(H, np.dot(ON.T,D1))]
    ON += nu * np.dot(np.matrix(D1).T, np.matrix(H))
    HN += nu * np.dot(np.matrix(D2).T, np.matrix(X).T )
    return HN, ON
With steel plate database - as from previous blog - below is a code to run a epoch ( running through all example to train ).
Examples = np.array(LoadTrainingExamples( 'faults.txt', '\t' ))
Examples = Examples[ range(0,346) ]  # 0-345 rows is for classification 27,28 column
A = Examples[:, range(4,27)]  # attributes - exclude 0-3 column as it is x,y position
minmax = np.vstack( (A.min(0),A.max(0)) );
A = Normalize( A, minmax, [-1,1] )
N = A.shape[1]                # number of attributes - input vector length
C = Examples[:, [27,28] ]
M = C.shape[1]                # number of output - output vector length
HN = 0.01 * np.ones( [N,N] )  # 0.01 for initial weight
ON = 0.01 * np.ones( [M,N] ) 

for Ai,Ci in zip( A,C)  :
    H, O = ForwardPropagation( Ai.T, HN, ON )
    HN, ON = BackwardPropagation( Ai.T, HN, ON, H, O, Ci, 0.1)             

When running above with 200 epoch, the error rate goes below 1%. Below is a graph shows how the error rates changes as ANN is get trained.


Though when running with all 7 attributes, it doesn't get trained well as below graph shows.


    References
  • An Introduction to Machine Learning, Miroslav Kubat. Chapter5
  • dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it

Sunday, May 15, 2016

k-Nearest-Neighbor classifer and cross-validation

Next subject is k-Nearest-Neighbor classifier. When it gets a new sample, it searches through training example to find similar one ( or k ) and say that it should be same as history. If you have a big training samples that covers most of the case, then it should pick something similiar.

Comparing with Decision Tree, you may wonder what is pros and cons. Here is analogy. When you drive to a city never visited before, a GPS navigator can help you to go to the hotel. It tells you where it is by the hotel address. As long as you are going some well-known hotel, the navigator is the first choice. To follow the navigator is like using a deicision tree. You just need to take turn when decision tree ask you do. Though if you have a wrong street name, then no matter how good the building number is, you may end up at wrong address. Or if you don't have address, or name of the place, but just a description , then navigator leads to nowhere.

Let's say you want somewhere near hill-top where you can see the sunset and have beer at a pub and then go to a cosy B&B in a walking distance , then your best bet is parking the car and take a cab and explain yourself. An experienced cab driver can take you a place in where all description meets in the middle. A dumb navigator is no match in here. Of course, you have to tip the cab driver dearly as she has to go through every locations in her memory and match with those description.

Above analogy is quite crude but I am trying to make a point that kNN classifier is something you should try when you know that the classification should go through a wide range of choices with uncertainty in everywhere.

There is an open database for this kind of test. Try to visit UCI Machine Learning Repository Here I am using Steel Plate Falut

First the database is a tab separated text file which can be loaded up as below.
def LoadTrainingExamples( file, separator ) :
    examples = []
    for line in open( file ) :
        examples.append( [ float(e) for e in line.rstrip().split(separator) ] )
    return examples
>>> LoadTrainingExamples( 'faultsshort.txt', '\t' )
...

When there are large number of attributes - like this database which has 27 variables - we have to choose variables to play with. These chosen variables are called a model in large. The selection can come from an expert or from a extensive automatic search - like hill climbing. No matter how the selection comes, it has to be verified whether the chosen model and training set is a good match. One of the verification method is a cross-validation. It is to take out some of examples and train the model with those examples and then classify the remained examples with the trained model and see if the result classification matches with record.

So we need a function to choose attribute to make a model. Here examples comes with each row as a record and each column as a attribute. Below function pick columns and forms a model
def PickColumns( A, I ) :
    P = [ A[:,i] for i in I ]
    return np.matrix( P ).getT()
>>> PickColumns( np.array( [[1,2,3],[4,5,6] ] ), [0,2] )
matrix([[1, 3],
        [4, 6]])

Before apply training, each attribute should be scaled up/down so that each contribute to the result equally. In here, the normalization is done using max,min of each attribute.
def Normalize( A, minmax ) :
    N = []
    for i in range(A.shape[1]) :
        minv = minmax[0,i]
        maxv = minmax[1,i]
        N.append( [ ((float(a)-minv)/(maxv-minv)) for a in A[:,i] ] )
    return np.matrix( N ).getT()
>>> Normalize( np.matrix('1,2;0,1'), np.matrix('0,0;1,2'))
matrix([[ 1. ,  1. ],
        [ 0. ,  0.5]])

Finding out k-Nearest-Neighbor is relatively simple as we can use scipy 's spatial.KDTree. KDTree is like binary tree with n-Dimenonal attributes and should return result in O(log2(N)). Here is code snippet.
    KD = spatial.KDTree( A )
    distance, index = KD.query( X, k )

With above KDTree, we can find out k-Nerest-Neighbor at index. Then with those neighbor, we have to choose one neighbor or classifcation result. Here is a function that select most frequent neighbor.
def MaxArg( index, C ) :
    if len(index.shape) == 1 :
        S = sum( [ C[i] for i in index ] )
        NS = 1
    else :
        S = sum( [ C[i] for i in index[0] ] )
        NS = len( index[0] )
    return [ int(si/NS+0.5) for si in S ]

Integrating all the above code, here is a code that does cross validation. Refer [Moore and Lee] for LOOCV ( Leave-One-Out Cross Validation )

Examples = np.array(LoadTrainingExamples( 'faults.txt', '\t' ))
AI = range( 4,8 ); CI = range( 27,28 )  # The model is to use column 4,5,6,7 as attribute and 27 as classification
error_sum = 0

k = 1  # k can be 1 or 3 
E = PickColumns( Examples, AI );  
minmax = np.vstack( (E.min(0),E.max(0)) ); 
E = Normalize( E, minmax );
EC = PickColumns( Examples, CI )

# LOOCV ( Leave-One-Out Cross Validation )
for out_one in range(len(Examples)): 
    A = np.delete( E, (out_one), axis=0)    # training attribute
    C = np.delete( EC, (out_one), axis=0)   # training classification
 
    X = E[out_one];                         # validation point
    CX = EC[out_one];                       # recodred classification

    KD = spatial.KDTree( A )
    distance, index = KD.query( X, k )
    predict = MaxArg( index, C )
    error_sum += sum( [ abs(p-CX[0,i]) for i,p in enumerate(predict)])
    
    print( str.format( "Leave-One {0} is classifed by {1}{2} as {3} while sampled as {4}",
           out_one, index, distance, predict, CX[0] ))  

print( str.format( "Error = {0} ({1}/{2})",
    error_sum/(len(Examples)*len(CI)), error_sum, len(Examples)*len(CI)))  

When running above script with k=1, the cross validation error is 10.8%.
Leave-One 0 is classifed by [1299][ 0.00052379] as [0] while sampled as [[ 1.]]
Leave-One 1 is classifed by [115][ 0.00033883] as [1] while sampled as [[ 1.]]
...
Leave-One 1940 is classifed by [225][ 0.00013041] as [0] while sampled as [[ 0.]]
Error = 0.10819165378670788 (210.0/1941)

With k=3, the error is slightly less of 8.6%
Leave-One 0 is classifed by [[1299 1314 1004]][[ 0.00052379  0.00063176  0.00063623]] as [0] while sampled as [[ 1.]]
Leave-One 1 is classifed by [[ 115 1595 1554]][[ 0.00033883  0.00036469  0.00039562]] as [0] while sampled as [[ 1.]]
...
Leave-One 1940 is classifed by [[ 225 1927  857]][[ 0.00013041  0.00022388  0.00024139]] as [0] while sampled as [[ 0.]]
Error = 0.0865533230293663 (168.0/1941)

The model chosen is not from an expert. Though the result is relatively impressive to me. I mean it is not easy to be correct on 9 out of 10 case.
    There are a lot to explore from here.
  • Choosing model automatically using Hill-climbing and see if there is 'No free hunch'. I mean my guess - AI = range( 4,8 )- should be really bad choice compared to extensive and through search
  • Every validation needs KDTree setup with almost similar training set. If previous KDTree can be reused - by removing and adding a row - the speed can be improved dramatically.
  • To speed up at run time, the training set needs to be trimmed by removing redundancy examples. 'Tomek Link' test can be tried


    References
  • An Introduction to Machine Learning, Miroslav Kubat. p51
  • Efficient Algorithms for Minimizing Cross Validation Error, Andrew W. Moore, Mary S.Lee
  • dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it

Sunday, May 8, 2016

Decision Tree with Python

Studying machine learning and writing code to see whether I understand the theory of decision tree correctly or not by writing python code. For the reference, refer [MLT].

First Entrophy and Gain should be calculated and here is the code snippet. Entrophy can be used like Entrophy( [9, 5] ) and it will return 0.9402859586706309. And Gain( [9,5], [[6,2],[3,3]] ) will return 0.04812703040826927. .
import numpy as NP
import itertools
import functools
import math

def Entrophy( S ):
    N = sum( S )
    if N == 0.0 : return 0.0

    P = [Si/N for Si in S]
    return sum( [ (-pi * math.log(pi,2) if pi != 0.0 else 0.0) for pi in P] ) 

def Gain( S, A ):
    NS = sum( S )
    NSv = [ sum(Ai) for Ai in A ]
    return Entrophy(S) - sum( [ (NSv[i]/NS)*Entrophy(A[i]) for i in range(len(A))] ) 
For test example , I make up O variables like below. It is numpy array to use range index like O[:,-1].

O = NP.array([
    ['Sunny', 'Hot', 'High', 'Weak', 'No'],
    ['Sunny', 'Hot', 'High', 'Strong', 'No'],
    ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
    ['Sunny', 'Mild', 'High', 'Weak', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
    ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
    ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
    ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Strong', 'No'],    
    ])
Then there should be holder to know what kind of variables are there.
def DistinctAttributeMap( O ) :
    DA = []
    for i in range(len(O[0])) :
        m = {}
        A = list(set(O[:,i]))
        for j in range(0,len(A)) : 
             m[ A[j] ] = j
        m[ 'Reverse' ] = A
        m[ 'Count' ] = len(A)
        DA.append( m  )
    return DA
DistinctAtrributeMap goes through each column and figure out how many different variables are there. It will generate output like below with the examples at above.
[
{    'Rain': 0, 'Sunny': 1, 'Overcast': 2,      'Reverse': ['Rain', 'Sunny', 'Overcast']}, 
{    'Cool': 0, 'Hot': 1, 'Mild': 2,          'Reverse': ['Cool', 'Hot', 'Mild']}, 
{    'High': 0, 'Normal': 1,                 'Reverse': ['High', 'Normal']}, 
{    'Weak': 0, 'Strong': 1,                 'Reverse': ['Weak', 'Strong']}, 
{    'No': 0, 'Yes': 1,                         'Reverse': ['No', 'Yes']}
]
And to generate A matrix for Gain(), here comes Attributes().
def Attributes( O, DA, i ) : 
    A = NP.zeros( [ DA[i]['Count'], DA[-1]['Count'] ] )

    for j in range(len(O)) :
        a = DA[i][ O[j,i] ]
        c = DA[-1][ O[j,-1] ]
        A[ a, c ] =  A[ a, c ] + 1

    return A
The Attributes( O, DA, 0 ) will generate result like below.
array([[ 2.,  3.],
       [ 3.,  2.],
       [ 0.,  4.]])
Armed with above functions, the ID3 is like below.
def ID3( Examples, AttributeMap, Availables, Tree  ) :

    C = Attributes( Examples, AttributeMap, -1 )
    S = [ C[i,i] for i in range(len(C)) ]

    print( 'Examples = ', Examples )
    print( 'Availables = ', Availables )
    print( 'S = ', S )

    for i in range(len(S)) :
        if S[i] == len(Examples) :
            Tree[ 'Leaf' ] = AttributeMap[ 'Reverse' ][i]
            return

    if len(Availables) == 0 :
        maxS = max( S )
        maxIndex = S.index( maxS )
        Tree[ 'Probably' ] = AttributeMap[-1][maxIndex]
        Tree[ 'Odd' ] = maxS / sum( S )
        return
    
    maxAttr = -1
    maxGain = -1
    for i in Availables :
        Gi = Gain( S, Attributes( Examples, AttributeMap, i ))
        if Gi > maxGain :
            maxAttr = i
            maxGain = Gi

    Tree[ 'Best' ] = maxAttr
    Tree[ 'Gain' ] = maxGain

    leftAttr = Availables.copy()
    leftAttr.remove( maxAttr )

    for a in AttributeMap[maxAttr][ 'Reverse' ] :
        leftExamples = NP.array( list(filter( lambda r : r[maxAttr] == a, Examples )))

        leftClass = set(leftExamples[:,-1])
        if len( leftClass ) == 1 :
            Tree[ a ] = leftClass.pop()
        else :
            Tree[ a ] = {}  
            ID3( leftExamples, AttributeMap, leftAttr, Tree[a] )
            
    return
The result of this is like below with some formatting. .

>DA = DistinctAttributeMap( O )
>DT = {}
>ID3( O, DA, [0,1,2,3], DT )
>print( 'DecisionTree = ', DT )
DecisionTree ={
'Best': 0,
'Gain': 0.24674981977443911}
'Rain': {
    'Best': 3,
    'Gain': 0.97095059445466858,
    'Weak': 'Yes',
    'Strong': 'No'},
'Sunny': {
   'Best': 2,
   'Gain': 0.97095059445466858,
   'High': 'No',
   'Normal': 'Yes'},
'Overcast': 'Yes', 
}
I used dictionary to represent a tree structure. I guess I will try to show result in drawing. But it will be another blog page .
How about using 'Gini index' rather than 'Gain' ? Will it make any difference ?
How do I do to make noise resilent - prevent overfitting ?
How to let user do fine tuning ? Like decision tree with at most 3 depths and at least 95% classification ?
How about presenting the decision tree in diagram and let user verfiy it ? An interactive diagram that shows and update decision tree ?
How about making this as an API so that user provide data and retrieve decision tree ?
How about dynamically updating decision tree in run time ?

Reference :
[MLT] Machine Learning by Tom Mitchell. Chapter 3

Monday, December 15, 2014

An attempt on parallel execution in OpenCL

To achieve heterogeneous computing in full strength, it is needed to use every bit of computing power of a system and also every bandwidth possible. One important thing for that road is parallel execution. And one of parallel execution is parallelism between kernel and bus. In other word, when a GPU is running a kernel, the bus should prepare the next data to process.

First of all, the hardware of GPU should be constructed in such a way that when a kernel is running and accessing the memory in it, the bus e.g. PCIe connecting GPU and PC main memory should be able to transfer the next data to run the kernel. I found that it is not so easy to figure out in datasheets of GPU boards.

I tried with my laptop - HP Pavillion e119wm with AMD A10 and AMD Radeon HD 8650G. And installed AMD APP 2.9. I made up the kernel and data size so that time for kernel and data transfer are equal. Refer below the screen shot using AMD Profiler.


Then I tried using out-of-order queue and putting events to run data transfer for next image start with kernel execution. Though it is pushed out to the last somehow.


Then I tried making two in-order queue. Still it is serialized somehow as below screenshot.

Going through internet, found that OpenCL of AMD APP does not support the out-of-order queue, but not easy to find out about report of multiple queue parallel execution. I will try to update when I have more news.

Thursday, June 26, 2014

SketchUp motion script uploaded on github

Here sketchup-motion.git is the git that contains the ruby script that can make animation in the SketchUp.

And here is a PCB inspection equipment animation.



Have you noticed that the cable duct is also moving around as the stage moves ? I had to spent some times to make this works with Ruby API. It is crude mimic but it should tells how the cable duct works.

Tuesday, April 15, 2014

Heartbeat of SketchUp animation pumping Ruby script

The heartbeat is a timer that ticks regularly. This tick runs Ruby script block, which increments position or angle of SketchUp's instance toward target. Refer below code snippet for a timer usage and block execution by the timer tick.


In the code, I said the heart beats at precise interval of 'period' and you can say that I should somehow replace the timer with atom clock to get that behaviour. Well, we may meet in the middle by measuring duration; We can't make consistent period but we can get elapsed time between consequent timer calls. Refer below code.



Below is code for object translation. 'entity.transform!' updates the position. This change will be shown by model.active_view.update - search up above code.

Updating position in the middle is easy as it is moving in constant speed. Difficulty lies at the boundaries - reminding me of studying differential equations, asking for great care on boundary condition at all times. In here I have to figure out where and when to stop. I am just checking the remaining distance to the target position in here.




This simple approach - constant speed - makes unrealistic movements at the boundaries. It starts and stops in infinite acceleration. Better simulation should involve smooth change at ends.

Reference : Animate Yo' Cheese

Tuesday, April 8, 2014

One step further with Sketchup and RubyAPI

Trying to impress audience, I have gone further on this simulation and created this glass handling robot interacting with AOI and cassette.



Apparent complex movement are all come down to either rotation or translation in a certain axis. And then these simple motions are chained to each other or set to run concurrently. In the end, it is all about designing class to easily create, link and launch motions.