Welcome to my new website!

Wow! It's been more than a year and a half since I last posted on my website. A lot has happened, and I wanted to refresh the look and content of the site while trying out a new web platform (Ghost).

Because I am a good Internet citizen, I imported all my previous posts and created permanent redirects so that the mighty Google does not SEO-punish me.

These days, I write a lot of Python and... JavaScript, so I have many reasons to complain. Expect some posts in the near future :-)

ttfn


Executing a specific test with nose

Because I am always looking for the right syntax and I have 20 bad invocations of nose in my bash history, here is how you execute a single, specific test with nose:

    nosetests package.module:Class.test_method
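To make the three parts of that name concrete, here is a hypothetical test module (mean, TestMean and its two methods are invented for illustration). If it lived in package/module.py, nosetests package.module:TestMean.test_several_values would run only that one method. For comparison, this sketch runs the same single method with the standard library's unittest by instantiating the TestCase class with the method's name; it is not how nose works internally.

```python
import io
import unittest


def mean(values):
    """Hypothetical function under test."""
    return sum(values) / len(values)


# Hypothetical contents of package/module.py.
# In "package.module:Class.test_method", Class is TestMean and
# test_method is one of the methods below.
class TestMean(unittest.TestCase):
    def test_single_value(self):
        self.assertEqual(mean([4]), 4)

    def test_several_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)


def run_one(test_case_class, method_name):
    """Run exactly one test method, discarding the runner's text output."""
    suite = unittest.TestSuite([test_case_class(method_name)])
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_one(TestMean, "test_several_values")
    print(result.testsRun, result.wasSuccessful())  # prints: 1 True
```

The standard library runner also accepts a fully dotted name for the same selection: python -m unittest package.module.TestMean.test_several_values (a dot where nose uses a colon).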

Job queue in bash

All right, I've been busy starting a new job while finishing my Ph.D. during the weekends, but on the plus side, I have tons of little tricks to post. Let's start with this one.

I configured our server to run a build after each push to our Mercurial repositories: tests, coverage analysis, documentation generation, an email to the team, etc. I wanted to make sure that only one build could run at a time, so I tried the at command in Linux, only to realize later that although the at command has a concept of job queues, it does not ensure that queued jobs are executed one at a time.
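The requirement here is mutual exclusion between builds. As an illustration only, and not the bash trick this post describes, here is a minimal Python sketch of one common alternative: every push hook takes an exclusive flock(2) lock on a shared file before building, so a second build waits until the first one finishes. It assumes a Unix system (the fcntl module is not available on Windows); the lock path and the build command are hypothetical.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock file shared by every build triggered by a push.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "build.lock")


def run_exclusively(command, lock_path=LOCK_PATH, wait=True):
    """Run `command` while holding an exclusive flock(2) lock on lock_path.

    wait=True:  block until the current build releases the lock, so builds
                run one at a time (waiters are not guaranteed to be woken
                in arrival order).
    wait=False: return None right away if another build holds the lock.
    Otherwise, return the command's exit code.
    """
    with open(lock_path, "w") as lock_file:
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(lock_file, flags)
        except BlockingIOError:
            # Raised only with LOCK_NB when another process holds the lock.
            return None
        try:
            return subprocess.run(command).returncode
        finally:
            # Closing the file would also release the lock; unlock explicitly for clarity.
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical build step; replace with your test/coverage/docs commands.
    code = run_exclusively([sys.executable, "-c", "print('building...')"])
    print("build exit code:", code)
```

Because the lock belongs to the open file, it is released when the hook process exits, even if the build crashes, so a failed build does not block the next one.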

Finally, I found a nice trick in bash to do this. It is not perfect (there is still the possibility that two jobs will run concurrently or that the


pymining is now available on PyPI!

Just a quick post to say that I released the very first version of pymining on PyPI. It is now super easy to install this small but hopefully useful library:

    pip install pymining

The library includes three frequent item set mining algorithms and one association rule mining algorithm. As shown in the previous post, running this library with PyPy gives impressive performance.
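This post does not show pymining's own functions, so rather than guess at its API, here is a small self-contained sketch of what the two kinds of algorithms compute. The support of an itemset is the number of transactions that contain it; a rule A -> B is kept when its confidence, support(A ∪ B) / support(A), reaches a threshold. The function names are hypothetical, and the brute-force counting is only suitable for toy data (the real algorithms exist precisely to avoid it).

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, min_support):
    """Brute-force support counting: fine for toy data, exponential in general."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[frozenset(itemset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def mine_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) for rules A -> B.

    confidence(A -> B) = support(A | B) / support(A)
    """
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


if __name__ == "__main__":
    transactions = [("a", "b", "c"), ("b",), ("a",), ("a", "c", "d"), ("b", "c"), ("b", "c")]
    frequent = count_itemsets(transactions, min_support=2)
    for antecedent, consequent, support, confidence in mine_rules(frequent, 0.6):
        print(sorted(antecedent), "->", sorted(consequent), support, round(confidence, 2))
```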

Feature requests, improvements, and bug reports are welcome, as usual.

Happy data mining :-)


On the speed of PyPy

I had heard that PyPy was fast. Like, really fast.

Well, it's true! In this post, I'll show you how one data mining algorithm went from 23 seconds (CPython) to 4 seconds (PyPy), without any modification, tweak, or special compiler/interpreter switch. I installed PyPy 1.5.2 from the Arch Linux community repository, so I did not even compile it.
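To run this kind of comparison on your own code, a minimal approach is to time the same script under both interpreters, for example python3 bench.py and then pypy3 bench.py (the file name and the workload function below are hypothetical stand-ins, not the mining code from this post). Taking the best of several runs reduces noise from other processes, and with PyPy the first run also pays for JIT warm-up, which is another reason to repeat.

```python
import platform
import timeit


def workload(n=10**6):
    """Hypothetical pure-Python, CPU-bound stand-in for a mining run."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def best_time(func, repeat=3):
    """Fastest wall-clock time in seconds over `repeat` single calls of func."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


if __name__ == "__main__":
    # platform.python_implementation() reports e.g. "CPython" or "PyPy".
    print(platform.python_implementation(), "%.3f s" % best_time(workload))
```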

The Story

I recently searched for an implementation of a frequent item set mining algorithm in Python, but I could not find a library that was both easy to use and based on a recent algorithm (Apriori is quite old now).
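For reference, here is a minimal sketch of the classic Apriori idea: a level-wise search where candidate itemsets of size k + 1 are built only from frequent itemsets of size k, and any candidate with an infrequent subset is pruned before its support is counted. This is an illustration of Apriori itself, not one of the three algorithms implemented for pymining; the function name and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support_of(candidates):
        # Count how many transactions contain each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for transaction in transactions:
            for candidate in candidates:
                if candidate <= transaction:
                    counts[candidate] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    current = support_of({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    size = 1
    while current:
        size += 1
        # Join step: unions of two frequent (size - 1)-itemsets with exactly `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = support_of(candidates)
        frequent.update(current)
    return frequent


if __name__ == "__main__":
    transactions = [("a", "b", "c"), ("b",), ("a",), ("a", "c", "d"), ("b", "c"), ("b", "c")]
    for itemset, support in sorted(apriori(transactions, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Each level requires a full pass over the transactions, which is one reason more recent algorithms try to avoid repeated candidate generation and scanning.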

I ended up implementing three frequent item set mining algorithms in Python and, although I was pleased with the result, I found that they were slow. The following session with Python 3.2.1 shows how much time it


Compiling PIL on Ubuntu Natty

Once again, I lost a precious hour trying to install the Python Imaging Library (PIL) in a virtual environment on Ubuntu. Even though I had installed the required dependencies, the install script did not detect FreeType and zlib. The culprit: Ubuntu now installs these libraries in a multiarch directory (/usr/lib/i386-linux-gnu on 32-bit systems) that the PIL setup.py script does not search, so you need to add these directories to setup.py yourself.

First, install the required dependencies:

    sudo apt-get install python-dev \
        libfreetype6-dev zlib1g-dev libjpeg8-dev

Next, extract the PIL 1.1.7 source archive and open its setup.py:

    tar -xvzf Imaging-1.1.7.tar.gz
    cd Imaging-1.1.7
    vim setup.py

Then, in setup.py, set these two variables:

    ZLIB_ROOT = ("/usr/lib/i386-linux-gnu", "/usr/include")
    FREETYPE_ROOT = ("/usr/lib/i386-linux-gnu",
        "/usr/include/freetype2/freetype")
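Which library directory to use depends on the machine: the paths above are for 32-bit Ubuntu, and on a 64-bit install the multiarch directory is normally /usr/lib/x86_64-linux-gnu instead. A small hypothetical helper like the one below checks where the linker names libz.so and libfreetype.so (installed by zlib1g-dev and libfreetype6-dev) actually live before you edit setup.py. It only locates the library directory; the include directories stay as shown above.

```python
import os

# Directories to check: the classic location, the 32-bit multiarch path used
# above, and its usual 64-bit equivalent.
CANDIDATE_LIB_DIRS = (
    "/usr/lib",
    "/usr/lib/i386-linux-gnu",
    "/usr/lib/x86_64-linux-gnu",
)


def find_lib_dirs(libname, candidates=CANDIDATE_LIB_DIRS):
    """Return the candidate directories that contain the linker name lib<libname>.so."""
    return [
        directory
        for directory in candidates
        if os.path.exists(os.path.join(directory, "lib%s.so" % libname))
    ]


if __name__ == "__main__":
    # "z" is zlib (libz.so), "freetype" is FreeType (libfreetype.so).
    for name in ("z", "freetype"):
        print(name, find_lib_dirs(name))
```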

Finally, with your virtual environment activated, just run python setup.py install.
