SSH Key Authentication with GitLab

Every time I start building a product for a new company, one of the first steps is creating a repository and uploading an SSH key. Instead of browsing the web for a reminder of how to do it, I decided to post the quickest solution here.


1. Enter the following command in the Terminal window (Mac OS X)

ssh-keygen -t rsa


2. Accept the default location and leave the passphrase blank (or set one; it’s up to you)


3. The key will be generated:

Your identification has been saved in /Users/mariuszprzydatek/.ssh/id_rsa.
Your public key has been saved in /Users/mariuszprzydatek/.ssh/
The key fingerprint is:
ce:80:76:66:5b:5d:d2:29:3d:64:66:65:e8:d3:aa:5e mariuszprzydatek@Mariuszs-MacBook-Pro.local
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|         .       |
|        E .      |
|   .   . o       |
|  o . . S .      |
| + + o . +       |
|. + o = o +      |
| o...o * o       |
|.  oo.o .        |


4. The private key (id_rsa) is saved in the .ssh directory; keep it secret, since it is what proves your identity (the server verifies your signatures using the matching public key). The public key is the key you’ll be uploading to your GitLab account.


5. Copy your public key to the clipboard

pbcopy < ~/.ssh/


6. Paste the key into GitLab


GitLab SSH Key Authentication
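If you want to double-check which key you uploaded, the fingerprint printed in step 3 can be recomputed from the public key. In the legacy format shown above, it is the MD5 hash of the base64-decoded key blob (the second field of the public key line), printed as colon-separated hex pairs; newer OpenSSH releases print a SHA256 fingerprint by default. The helper names below are hypothetical, and this is only a sketch; `ssh-keygen -l -E md5 -f <public key file>` remains the authoritative check.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line):
    # An OpenSSH public key line looks like "<key-type> <base64 key blob> [comment]"
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    # Colon-separated hex pairs, the legacy format shown in step 3
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def sha256_fingerprint(pubkey_line):
    # "SHA256:<base64 digest without '=' padding>", the default in newer ssh-keygen
    blob = base64.b64decode(pubkey_line.split()[1])
    encoded = base64.b64encode(hashlib.sha256(blob).digest()).decode()
    return "SHA256:" + encoded.rstrip("=")
```

Pass either function the single line of text stored in your public key file.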




Git branch name in zsh terminal

Ever wondered how nice it would be to always know which git branch you’re currently on in a given directory? If so, I encourage you to give Prezto (“Instantly Awesome Zsh”) a try.


Git branch name in zsh terminal


You’ll find it here (along with installation instructions):


Prezto integrates nicely with (among others):

iTerm2, SSH, Ruby, Git, various editors, etc.



Iterative Dichotomiser 3 (ID3) algorithm – Decision Trees – Machine Learning

ID3 is the first of a series of algorithms created by Ross Quinlan to generate decision trees.



  • ID3 does not guarantee an optimal solution; it can get stuck in local optima
  • It uses a greedy approach, selecting the best attribute to split the dataset on at each iteration (one possible improvement is to use backtracking during the search for the optimal decision tree)
  • ID3 can overfit to the training data (to avoid overfitting, smaller decision trees should be preferred over larger ones)
  • This algorithm usually produces small trees, but it does not always produce the smallest possible tree
  • ID3 is harder to use on continuous data (if the values of an attribute are continuous, there are many more places to split the data on that attribute, and searching for the best value to split by can be time-consuming).



  • ID3 is a precursor to both the C4.5 and C5.0 algorithms.
  • C4.5 improvements over ID3:
    • discrete and continuous attributes,
    • missing attribute values,
    • attributes with differing costs,
    • pruning trees (replacing irrelevant branches with leaf nodes)
  • C5.0 improvements over C4.5:
    • several orders of magnitude faster,
    • memory efficiency,
    • smaller decision trees,
    • boosting (more accuracy),
    • ability to weight different attributes,
    • winnowing (reducing noise)
  • J48 is an open source Java implementation of the C4.5 algorithm in the Weka data mining tool
  • C5.0 is sold commercially under the following names: C5.0 (Unix/Linux) and See5 (Windows); a single-threaded version is distributed under the terms of the GNU General Public License



  • The ID3 algorithm is trained on a dataset S to produce a decision tree, which is stored in memory.
  • At runtime, the decision tree is used to classify new unseen test cases by working down the tree nodes using the values of a given test case to arrive at a terminal node that tells you what class this test case belongs to.
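To make the runtime step concrete, here is a minimal sketch of classifying a test case by walking a tree stored as nested Python dicts (the representation built by the Python implementation further down). The `classify` helper, the two feature names and the tree itself are hypothetical examples chosen for illustration.

```python
def classify(tree, feat_labels, test_vec):
    # Walk down from the root until we reach a leaf (a plain class label)
    node = tree
    while isinstance(node, dict):
        feat_name = next(iter(node))       # the attribute tested at this node
        branches = node[feat_name]         # attribute value -> subtree or leaf
        value = test_vec[feat_labels.index(feat_name)]
        node = branches[value]             # KeyError if this value was never seen in training
    return node


# Hypothetical tree over two binary features, written as nested dicts
example_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
feature_names = ['no surfacing', 'flippers']
print(classify(example_tree, feature_names, [1, 1]))  # prints: yes
```

Each step down the tree consumes one attribute of the test case, so classification costs at most one lookup per tree level.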



  • Entropy H(S) – measures the amount of uncertainty in the (data) set S
  • Information gain IG(A) – measures how much uncertainty in S was reduced after splitting the (data) set S on an attribute A
  • You’ll find more details on both entropy and information gain here.


High-level inner workings:

  • Calculate the entropy of every attribute using the data set S
  • Split the set S into subsets using the attribute for which entropy is minimum (or, equivalently, information gain is maximum)
  • Make a decision tree node containing that attribute
  • Recurse on subsets using remaining attributes


Detailed algorithm steps:

  1. We begin with the original data set S as the root node
  2. In each iteration, the algorithm goes through every unused attribute of the data set S and calculates the entropy H(S) (or information gain IG(A)) of that attribute
  3. Next it selects the attribute which has the smallest entropy (or largest information gain) value
  4. The data set S is then split by the selected attribute (e.g. age < 50, 50 <= age < 100, age >= 100) to produce subsets of the data
  5. The algorithm continues to recurse on each subset, considering only attributes never selected before
  6. Recursion on a subset may stop in one of these cases:
    • every element in the subset belongs to the same class (+ or -), then the node is turned into a leaf and labelled with the class of the examples
    • there are no more attributes to be selected, but the examples still do not belong to the same class (some are + and some are -), then the node is turned into a leaf and labelled with the most common class of the examples in the subset
    • there are no examples in the subset (this happens when no example in the parent set matched a specific value of the selected attribute, for example if there was no example with age >= 100); then a leaf is created and labelled with the most common class of the examples in the parent set
  7. Throughout the algorithm, the decision tree is constructed with each non-terminal node representing the selected attribute on which the data was split, and terminal nodes representing the class label of the final subset of this branch


Python implementation:

  1. Create a new python file called
  2. Import the log function from the math library, as well as the operator module
        from math import log
        import operator
  3. Add a function to calculate the entropy of a data set
    def entropy(data):
        # Shannon entropy (in bits) of the class labels stored in the last column
        entries = len(data)
        labels = {}
        for feat in data:
            label = feat[-1]
            if label not in labels:
                labels[label] = 0
            labels[label] += 1
        ent = 0.0
        for key in labels:
            probability = float(labels[key]) / entries
            ent -= probability * log(probability, 2)
        return ent
  4. Add a function to split the data set on a given feature
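    def split(data, axis, val):
        # Keep the rows whose feature at index `axis` equals `val`,
        # dropping that feature column from each returned row
        newData = []
        for feat in data:
            if feat[axis] == val:
                reducedFeat = feat[:axis]
                reducedFeat.extend(feat[axis+1:])
                newData.append(reducedFeat)
        return newData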
  5. Add a function to choose the best feature to split on
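    def choose(data):
        # Return the index of the feature with the highest information gain
        # (-1 if no split reduces the entropy)
        features = len(data[0]) - 1
        baseEntropy = entropy(data)
        bestInfoGain = 0.0
        bestFeat = -1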
        for i in range(features):
            featList = [ex[i] for ex in data]
            uniqueVals = set(featList)
            newEntropy = 0.0
            for value in uniqueVals:
                newData = split(data, i, value)
                probability = len(newData)/float(len(data))
                newEntropy += probability * entropy(newData)
            infoGain = baseEntropy - newEntropy
            if (infoGain > bestInfoGain):
                bestInfoGain = infoGain
                bestFeat = i
        return bestFeat
  6. According to step 6 of the “Detailed algorithm steps” section above, there are certain cases in which the recursion stops. When we run out of attributes but the examples in a subset still belong to different classes, the small function below labels the leaf with the majority class:
    def majority(classList):
        # Count the votes for each class label and return the most common one
        classCount = {}
        for vote in classList:
            if vote not in classCount: classCount[vote] = 0
            classCount[vote] += 1
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]
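  7. Finally, add the main function that generates the decision tree
    def tree(data, labels):
        # labels holds the feature names, in the same order as the data columns
        classList = [ex[-1] for ex in data]
        # Stop: every example belongs to the same class
        if classList.count(classList[0]) == len(classList):
            return classList[0]
        # Stop: no attributes left, so use the majority class
        if len(data[0]) == 1:
            return majority(classList)
        bestFeat = choose(data)
        # Stop: no attribute reduces the entropy (e.g. identical rows with different classes)
        if bestFeat == -1:
            return majority(classList)
        bestFeatLabel = labels[bestFeat]
        theTree = {bestFeatLabel: {}}
        # split() removes the chosen feature from the data, so remove its name too
        subLabels = labels[:bestFeat] + labels[bestFeat+1:]
        featValues = [ex[bestFeat] for ex in data]
        uniqueVals = set(featValues)
        for value in uniqueVals:
            theTree[bestFeatLabel][value] = tree(split(data, bestFeat, value), subLabels)
        return theTree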








Measuring Entropy (data disorder) and Information Gain

This is a very short post about two of the most basic metrics in information theory.



Entropy H(S):

  • is a measure of the amount of uncertainty in the (data) set S (i.e. entropy characterizes the (data) set S).
  • in other words, it is the average amount of information contained in each message received (message here stands for an event, sample or character drawn from a distribution or data stream)
  • it characterizes the uncertainty about our source of information (entropy is best understood as a measure of uncertainty rather than certainty, as entropy is larger for more random sources)
  • a data source is also characterized by the probability distribution of the samples drawn from it (the less likely an event is, the more information it provides when it occurs)
  • it makes sense to define the information of an event as the negative logarithm of its probability (the probability distribution of the events, coupled with the information amount of every event, forms a random variable whose average (expected) value is the average amount of information (entropy) generated by this distribution).
  • because entropy is average information, it is also measured in shannons (bits), nats, or hartleys, depending on whether the logarithm used to define it has base 2, e, or 10
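A quick way to see both points (more randomness means more entropy, and the logarithm base only changes the unit) is to evaluate the entropy of a few simple distributions. This is a minimal sketch; the `entropy` helper here takes a list of probabilities and is a hypothetical illustration, separate from the data-set version further down.

```python
import math


def entropy(probs, base=2):
    # H = -sum(p * log_base(p)); outcomes with p == 0 contribute nothing
    return -sum(p * math.log(p, base) for p in probs if p > 0)


fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
fair_die = [1 / 6] * 6

print(entropy(fair_coin))          # about 1.0 shannon (bit)
print(entropy(biased_coin))        # about 0.469 bits: less random, less uncertainty
print(entropy(fair_coin, math.e))  # about 0.693 nats (ln 2)
print(entropy(fair_die, 10))       # about 0.778 hartleys (log10 6)
```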


Math interpretation:
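In standard notation, with X the set of values that the target attribute takes in S and p(x) the proportion of records in S with value x (the Python function below computes this sum with base-2 logarithms):

$$ H(S) = -\sum_{x \in X} p(x) \log_2 p(x) $$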







Python implementation:

import math

# Calculates the entropy of the given data set for the target attribute.
def entropy(data, target_attr):

    val_freq = {}
    data_entropy = 0.0

    # Calculate the frequency of each of the values in the target attr
    for record in data:
        if record[target_attr] in val_freq:
            val_freq[record[target_attr]] += 1.0
        else:
            val_freq[record[target_attr]] = 1.0

    # Calculate the entropy of the data for the target attribute
    for freq in val_freq.values():
        data_entropy += (-freq/len(data)) * math.log(freq/len(data), 2) 

    return data_entropy



Information Gain:

  • is the measure of the difference in entropy from before to after the data set S is split on an attribute A
  • in other words, how much uncertainty in S was reduced after splitting data set S on attribute A
  • it is a synonym for Kullback–Leibler divergence
  • in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the expected value of the Kullback–Leibler divergence of a conditional probability distribution
  • the expected value of the information gain is the mutual information I(X; A) of X and A, i.e. the reduction in the entropy of X achieved by learning the state of the random variable A
  • in machine learning, this concept is used to define a preferred sequence of attributes to investigate in order to narrow down the state of X most rapidly; such a sequence (which, at each stage, depends on the outcome of investigating the previous attributes) is called a decision tree, and an attribute with high mutual information should usually be preferred to other attributes


Math interpretation:
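In standard notation, with T the set of subsets created by splitting S on attribute A, and p(t) = |t| / |S| the fraction of records that fall into subset t (this is what the gain() function below computes):

$$ IG(A) = H(S) - \sum_{t \in T} p(t) \, H(t) $$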





Python implementation:

# Calculates the information gain (reduction in entropy) that would result by splitting the data on the chosen attribute (attr).
def gain(data, attr, target_attr):

    val_freq = {}
    subset_entropy = 0.0

    # Calculate the frequency of each of the values of the chosen attribute
    for record in data:
        if record[attr] in val_freq:
            val_freq[record[attr]] += 1.0
        else:
            val_freq[record[attr]] = 1.0

    # Calculate the sum of the entropy for each subset of records weighted by their probability of occurring in the training set.
    for val in val_freq.keys():
        val_prob = val_freq[val] / sum(val_freq.values())
        data_subset = [record for record in data if record[attr] == val]
        subset_entropy += val_prob * entropy(data_subset, target_attr)

    # Subtract the entropy of the chosen attribute from the entropy of the whole data set with respect to the target attribute (and return it)
    return (entropy(data, target_attr) - subset_entropy)
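To check the numbers by hand, here is a compact Python 3 variant of the two functions above built on collections.Counter, applied to a hypothetical four-record data set (the attribute names and values are made up for illustration). One attribute predicts the target perfectly and the other tells us nothing, which pins the information gain to its two extremes.

```python
import math
from collections import Counter


def entropy(records, target):
    # Entropy (in bits) of the target attribute over a list of dict records
    n = len(records)
    counts = Counter(r[target] for r in records)
    return -sum(c / n * math.log(c / n, 2) for c in counts.values())


def gain(records, attr, target):
    # H(S) minus the size-weighted entropy of the subsets obtained by splitting on attr
    n = len(records)
    remainder = 0.0
    for value, count in Counter(r[attr] for r in records).items():
        subset = [r for r in records if r[attr] == value]
        remainder += count / n * entropy(subset, target)
    return entropy(records, target) - remainder


# Hypothetical toy data: 'outlook' predicts 'play' perfectly, 'windy' tells us nothing
data = [
    {'outlook': 'sunny', 'windy': 'yes', 'play': 'no'},
    {'outlook': 'sunny', 'windy': 'no', 'play': 'no'},
    {'outlook': 'rainy', 'windy': 'yes', 'play': 'yes'},
    {'outlook': 'rainy', 'windy': 'no', 'play': 'yes'},
]
print(entropy(data, 'play'))          # about 1.0 bit: two classes, evenly split
print(gain(data, 'outlook', 'play'))  # about 1.0: each subset is pure
print(gain(data, 'windy', 'play'))    # about 0.0: each subset is still a 50/50 mix
```

ID3 would therefore pick 'outlook' as the root attribute of this data set.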





Amazon AWS – Installing Redis on EBS

In this step-by-step guide I’ll show you how to install Redis on AWS (Amazon Linux AMI).

I’ll assume you’re performing the steps below as root (sudo -s).

  1. First, you need the following tools installed:
    > gcc
    > gcc-c++
    > make

    yum -y install gcc gcc-c++ make


  2. Download Redis:
    cd /usr/local/src
    tar xzf redis-2.8.12.tar.gz
    rm -f redis-2.8.12.tar.gz


  3. Build it:
    cd redis-2.8.12
    make distclean
    make


  4. Create the following directories and copy the binaries:
    mkdir /etc/redis /var/redis
    cp src/redis-server src/redis-cli /usr/local/bin


  5. Copy the Redis template configuration file into /etc/redis/, naming it after the instance’s port number (as recommended on the Redis site):
    cp redis.conf /etc/redis/6379.conf


  6. Create a directory inside /var/redis that will act as the working/data directory for this Redis instance:
    mkdir /var/redis/6379


  7. Edit the Redis config file to make the necessary changes:
    nano /etc/redis/6379.conf


  8. Make the following changes to 6379.conf:
    > Set daemonize to yes (by default it is set to no).
    > Set pidfile to /var/run/
    > Set your preferred loglevel
    > Set logfile to /var/log/redis_6379.log
    > Set dir to /var/redis/6379


  9. Don’t copy the standard Redis init script from the utils directory into /etc/init.d (it isn’t Amazon Linux AMI/chkconfig compliant); instead, download the following:


  10. Move the downloaded Redis init script into place and make it executable:
    mv redis-server /etc/init.d
    chmod 755 /etc/init.d/redis-server


  11. Edit the redis-server init script and change the Redis config file name as follows:
    > REDIS_CONF_FILE="/etc/redis/6379.conf"

    nano /etc/init.d/redis-server


  12. Enable the Redis instance at boot:
    chkconfig --add redis-server
    chkconfig --level 345 redis-server on


  13. Start Redis:
    service redis-server start


  14. (optional) Add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf (according to the Redis site, background saves may otherwise fail under low-memory conditions):
    > vm.overcommit_memory = 1

    nano /etc/sysctl.conf


  15. Activate the new sysctl change:
    sysctl vm.overcommit_memory=1


  16. Try pinging your instance with redis-cli:
    /usr/local/bin/redis-cli ping


  17. Run a few tests with redis-cli and check that the dump file is stored in /var/redis/6379/ (you should find a file called dump.rdb):
    >set testkey testval
    >get testkey
    >del testkey


  18. Check that your Redis instance is logging correctly to the log file:
    cat /var/log/redis_6379.log
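If you’d rather script the checks from steps 16 and 17, here is a minimal sketch that speaks Redis’s RESP wire protocol over a plain socket, using only the Python standard library (no Redis client library is assumed). It handles only the reply types these particular commands return, and the host/port defaults are assumptions matching the 6379 instance configured above.

```python
import io
import socket


def encode_command(*args):
    # RESP request: an array of bulk strings, e.g. PING -> b"*1\r\n$4\r\nPING\r\n"
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def read_reply(f):
    # Parse one reply: simple string (+), error (-), integer (:) or bulk string ($)
    line = f.readline().rstrip(b"\r\n")
    kind, rest = line[:1], line[1:]
    if kind == b"+":
        return rest.decode()
    if kind == b"-":
        raise RuntimeError(rest.decode())
    if kind == b":":
        return int(rest)
    if kind == b"$":
        length = int(rest)
        if length == -1:
            return None  # nil reply, e.g. GET on a missing key
        return f.read(length + 2)[:-2].decode()  # drop the trailing \r\n
    raise ValueError("unsupported reply type: %r" % kind)


def parse_reply(raw):
    # Convenience wrapper for parsing a reply that is already in memory
    return read_reply(io.BytesIO(raw))


def smoke_test(host="127.0.0.1", port=6379):
    # Mirrors steps 16-17: PING, then set, get and delete a test key
    with socket.create_connection((host, port), timeout=5) as sock, sock.makefile("rb") as f:
        def call(*args):
            sock.sendall(encode_command(*args))
            return read_reply(f)

        assert call("PING") == "PONG"
        assert call("SET", "testkey", "testval") == "OK"
        assert call("GET", "testkey") == "testval"
        assert call("DEL", "testkey") == 1
    print("Redis smoke test passed")
```

Run smoke_test() on the instance once the service is up; any unexpected reply raises an AssertionError.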



And that’s basically it. Cheers.


k-Nearest Neighbors (kNN) algorithm – Machine Learning

k-Nearest Neighbors (kNN) is an easy-to-grasp (and quite effective) algorithm, which:

  • finds a group of k objects in the training set that are closest to the test object, and
  • bases the assignment of a label on the predominance of a particular class in this neighborhood.


There are three key elements of this approach:

  • a set of labeled objects, e.g., a set of stored records (data),
  • a distance or similarity metric to compute distance between objects,
  • and the value of k, the number of nearest neighbors.


To classify an unlabeled object/item:

  • the distance of this object to the labeled objects is computed,
  • its k-nearest neighbors are identified,
  • and the class labels of these nearest neighbors are then used to determine the class label of the object.


The figure below provides a high-level summary of the nearest-neighbor classification algorithm.






Distances are calculated using the Euclidean distance, where the distance between two vectors xA and xB, with two elements each, is given by:
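Written out in standard notation (with x_{A0}, x_{A1} the two elements of xA, and likewise for xB), followed by one worked distance between the point (1, 0) and a sample point from the test data used later in this post:

$$ d(x_A, x_B) = \sqrt{(x_{A0} - x_{B0})^2 + (x_{A1} - x_{B1})^2} $$

$$ d\big((1, 0), (0.9, 0.1)\big) = \sqrt{0.1^2 + (-0.1)^2} = \sqrt{0.02} \approx 0.141 $$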



k-Nearest Neighbors Pros vs. Cons:

  • Pros – High accuracy, insensitive to outliers, no assumptions about data
  • Cons – Computationally expensive, requires a lot of memory
  • Works with – Numeric values, nominal values



Before we start coding, here’s what we assume (details in Peter Harrington’s excellent “Machine Learning in Action”):

  • We have the training data (an existing set of example data)
  • We have labels for all of this data
  • We know what class each piece of the data should fall into
  • When we’re given a new piece of data without a label, we compare that new piece of data to the existing data (every piece of existing data)
  • We then take the most similar pieces of data (the nearest neighbors) and look at their labels
  • We look at the top k most similar pieces of data from our known dataset; this is where the k comes from
  • Lastly, we take a majority vote from the k most similar pieces of data, and the majority is the new class we assign to the data we were asked to classify
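The steps above fit in a few lines of plain Python. This is a minimal sketch without NumPy; the `knn_classify` name and the in-memory sample points (which mirror the test_data.txt file created below) are my own choices for illustration. It computes every distance, keeps the k nearest points and takes a majority vote.

```python
import math
from collections import Counter


def knn_classify(x, points, labels, k):
    # 1. Euclidean distance from x to every labelled point
    distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p))) for p in points]
    # 2. Indices of the k closest points
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # 3. Majority vote among their labels (on a tie, the label met first in distance order wins)
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


points = [(0.2, 1.3), (0.1, 1.1), (0.9, 0.1), (1.0, 0.2)]
labels = ['top-left', 'top-left', 'bottom-right', 'bottom-right']
print(knn_classify((1, 0), points, labels, 2))  # prints: bottom-right
```

Every prediction scans the whole training set, which is exactly why kNN is listed above as computationally expensive.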



I’ll be using Python (v3.3.5) for the code examples:

  1. Create a text file named “test_data.txt” that will contain our test data set (x, y coordinates of points in 2D space, with a label in the 3rd column; columns are separated by tabs)
        0.2    1.3    top-left
        0.1    1.1    top-left
        0.9    0.1    bottom-right
        1.0    0.2    bottom-right
  2. Create a new python file called
  3. Import NumPy (a package for scientific computing) and the operator module (used for sorting tuples). NumPy must be installed beforehand (from here).
        from numpy import *
        import operator
  4. Add a function to prepare the data set (load it from the file and transform it into data (matrix) and labels (vector) components)
        def prepareData(fname):
            # Each line holds "x y label", separated by tabs (or other whitespace)
            with open(fname) as file:
                lines = [line.strip() for line in file if line.strip()]
            data = zeros((len(lines), 2))
            labels = []
            for i, line in enumerate(lines):
                fields = line.split()
                data[i, :] = [float(v) for v in fields[0:2]]
                labels.append(fields[2])
            return data, labels
  5. Add a function to classify the data
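        def classify(x, data, labels, k):
            # Euclidean distance from x to every row of data
            size = data.shape[0]
            diff = tile(x, (size,1)) - data
            sqDiff = diff**2
            sqDist = sqDiff.sum(axis=1)
            dist = sqDist**0.5
            # Indices of the rows, sorted by increasing distance
            sortedDist = dist.argsort()
            # Count the labels of the k nearest neighbours
            count = {}
            for i in range(k):
                label = labels[sortedDist[i]]
                count[label] = count.get(label,0) + 1
            # Majority vote: the most frequent label wins
            sCount = sorted(count.items(), key=operator.itemgetter(1), reverse=True)
            return sCount[0][0]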
  6. Save the file
  7. Execute (from Python CLI or IDLE):
        >>> import knn_example
        >>> group,labels = knn_example.prepareData('test_data.txt')
        >>> knn_example.classify([1,0], group, labels, 2)


After executing the above, you should see the following:




Voilà, you’ve just created your first classifier, which classified the point with x,y coordinates of 1,0 by assigning it the ‘bottom-right’ label.





Machine Learning

Some time ago, while reading the journal Knowledge and Information Systems (KAIS; vol. Dec24, 2007), I came across a paper titled “Top 10 Algorithms in Data Mining”.

This paper was presented at the IEEE International Conference on Data Mining (ICDM; 2006, Hong Kong), and a companion book, edited by the paper’s authors (Xindong Wu, Vipin Kumar et al.), was published in 2009.

It was “the paper” that attracted me to the field of Machine Learning, and with this post I’m starting a series of articles related to this exciting area 🙂

Below you’ll find my rough notes (which you may still find useful), as well as my descriptions of some of the Top 10 Algorithms presented in the aforementioned paper.




  • Machine Learning – “making sense of the data”
  • Supervised learning – we specify a target variable and the machine learns from our data (by identifying patterns) to eventually predict the target variable. (“You know what you are looking for”)
    • Sample tasks:
      • Classification – predicting what class an instance of data should fall into
      • Regression – prediction of a numeric value
  • Unsupervised learning – you don’t know what you’re looking for, so you ask the machine to tell you instead. (“What do these data have in common?”)


Examples of supervised learning algorithms:

  • k-Nearest Neighbors (kNN) – uses a distance metric to classify items
  • Decision Trees – map observations about an item to conclusions about the item’s target value
  • Naïve Bayes – uses probability distributions for classification
  • Logistic Regression – finds best parameters to properly classify data
  • Support Vector Machines – construct a hyperplane or set of hyperplanes in a high- or infinite-dimensional space
  • AdaBoost – is made up of a collection of classifiers (a meta-algorithm)


Examples of unsupervised learning algorithms:

  • k-Means clustering
  • Apriori algorithm
  • FP-Growth



  • The target variable can take:
    • nominal values (true, false, car, plane, human, animal, etc.); in this case we’re talking about classification, and the target variable is called a “class”
    • an infinite number of numeric values (in this case we’re talking about regression)
  • Classification Imbalance – a real-world problem where you have more data from one class than from other classes


Overview of the classification algorithms based on their design (Haralambos Marmanis)





Steps in developing a machine learning application:

  • Data collection – using publicly available sources, APIs, RSS feeds, sensors, etc.
  • Input data preparation – algorithm-specific data formatting
  • Input data Analysis – it’s important to “understand the data”
  • Algorithm training – extraction of knowledge or information (this step does not apply to unsupervised learning)
  • Algorithm testing – putting to use information learned in the previous step (evaluating the algorithm)
  • Algorithm usage – solving a problem



Take care.