Our first internal code challenge was held during the end of the summer in e-Travel and ended on September 11th. The reason we came up with the idea of the challenge in the first place was to learn a bit more about some underlying technology that we often encounter in our line of work while having some fun at the process!
The problem of the challenge was the following:
We want to implement a dictionary of words that can work with very little memory. The dictionary will do only two things:
- Load a word to the dictionary.
- Check if a word is present in the dictionary or not.
- The sole data structure that can be used to save/retrieve data is a bit array with a total size of 2 megabytes.
- The dictionary should be able to process at least 100 thousand words of text within the above constraint.
- When the dictionary checks if a word is present it will return False or True. If it returns False, then the word should absolutely not be in the dictionary. If it returns True, then the word is most likely in the dictionary but we can allow for false-positives. We want to have a false-positive ratio below 0.01%.
With a little research on Google, one can quickly find out that a Bloom filter is what the challenge is all about.
It is fairly trivial to write up some code to implement a bloom filter or even easier to find existing code that does the job. So where’s the fun, one might ask? Well, the fun is in the rules of the challenge (which you can read at their entirety here, along with the full challenge description).
First of all, submitters were required to use an instance of a class implementing a specific interface, which provided the bit storage, instead of being free to write up their own. In addition to focusing submitters to the calculation part of the exercise, this also translated to asking for submissions in C#. We use both C# and Ruby, but it’s only natural to expect that developers do have their favorites. By observation during the last few years, there seems to be a general rule that developers that love one of these two, tend to feel a bit strongly against the other one. Therefore, forcing one of these on the submitters might make things interesting.
Moreover submissions were ranked using a formula. The rank took under consideration both the false-positive ratio and the speed of execution of an implementation. So instead of getting a simple thumbs-up or thumbs-down, submitters were getting back a numeric score.
Finally, scores were posted on a leaderboard which was updated regularly for everyone to see - but the code of submissions themselves was kept secret! This probably was the coolest part of the challenge and it soon became obvious that participants kept trying to up their scores and climb to the top.
This GitHub repository has the best submissions, as well as the application that ranked them. We learned a couple of, perhaps trivial, but unexpected things in the process of this challenge. The first one was that bloom filters can be really fast. The submitters wrote code that could perform anything between 5.000 and 16.000 bloom filter checks every millisecond. The second thing we learned (or rather perhaps re-discovered) was that it’s not easy to benchmark small and/or fast pieces of code. While our ranking application did a good job of always finding the winning order of submissions, it was not as easy to come up with a ranking that would be consistent in every workstation that was executed. Perhaps we need to figure out a scheme that benchmarks the CPU and assigns a score to it, then uses it to normalize the score obtained by running an algorithm - all food for thought for our next code challenge.
Chris displaying the materialistic part of the code challenge prize.
The challenge was successfully navigated by several submitters but the one that ranked best was Chris Amanatidis, who won eternal fame by being mentioned both in this blog post and in the GitHub repo and also a t-shirt for finishing first. It’s worth noting that 4 out of the top 5 submissions were entered by developers that either picked up a bit of C# just to enter the challenge or who used it sparingly in the past.
One last bit of trolling trivia. Once each submitter provide an entry in the challenge, they were asked in front of a camera for their name, role and how they expected their submission to fare. Some of these videos were stitched together in the YouTube video that follows!