Latest Toys

I have been toying around with a few ideas lately. The process let me tackle some interesting problems in rather unconventional ways, at least not the ones found on page one of Google results.

Problem One - Random Access of MongoDB Documents

This problem is definitely not new, as indicated by the number of times the same or very similar questions have been asked on Stack Overflow. In general, there are three solutions:

  1. Add a new field filled with a comparable random value, usually a float, and index it for production-ready performance.
  2. Leverage skip, though it is said to be highly expensive.
  3. Not exactly a solution just yet: wait for the MongoDB core team to implement a native, server-side solution. Notably, the issue on their JIRA was once closed as "Won't Fix" but was eventually re-opened due to popular demand.

I am fairly certain that the server-side solution will be as solid as the APIs MongoDB generally exhibits, but it is not here yet. So, crossing out 2 and 3; and I do not really like 1. The project is still very young (young enough that I am not revealing its name at the moment), so I can easily drop the collection or the entire database in MongoDB to rebuild the structure without worrying about consequences. However, I do not like the fact that an additional field has to be added just to support one use case, especially when it also requires indexing, which could leave a mess if I ever decide to drop it.

So here is my solution:


  • No extra field occupying disk and RAM (if indexed); that job is delegated to a Redis SET
  • No messy random-ID determination logic in client code; that job is delegated to the Redis SRANDMEMBER command
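A minimal sketch of the idea (the function, key name, and client-injection style are my own assumptions for illustration, not the project's actual code): keep each document's `_id` in a Redis set alongside the MongoDB writes, then let SRANDMEMBER pick the random one.

```python
def pick_random_doc(redis_client, collection, key='doc_ids'):
    """Fetch a random MongoDB document with no extra field on the documents.

    `redis_client` is a redis-py style client and `collection` a PyMongo
    style collection; the Redis SET under `key` mirrors the collection's
    _id values (SADD on insert, SREM on delete).
    """
    _id = redis_client.srandmember(key)  # O(1) uniform random member
    if _id is None:
        return None  # set is empty, nothing to pick
    return collection.find_one({'_id': _id})
```

Keeping the set in sync costs two extra commands in the write path: `redis_client.sadd(key, _id)` after an insert, and `redis_client.srem(key, _id)` after a delete.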

Of course, I will likely switch to the official solution when the time comes. But for now, I am more than satisfied with what I came up with.

Problem Two - More Accessible Event/Error Logging in Python

Let me elaborate more:

  • I want to get a sense of urgency when an error occurs, but without making myself panic by browsing through 10 gzipped logs containing the same stack trace
  • I want a central hub to see all logs, but without worrying too much about system resource consumption, such as disk space and RAM

In addition, I have been planning to separate the Redis queue implementations from tidehunter for a while. Hence, techies was born. You can also check it out on its GitHub repository page.

To solve problem two, I would do:

from techies import CountQueue, QueueHandler, REF_LOG_FORMAT
import logging

q = CountQueue('app_logs')

logger = logging.getLogger(__name__)
handler = QueueHandler(q)
# use the reference log format shipped with techies
handler.setFormatter(logging.Formatter(REF_LOG_FORMAT))
logger.addHandler(handler)

for i in xrange(5000000):
    try:
        1 / 0
    except ZeroDivisionError as e:
        logger.exception(e)

print(len(q))  # 1
print(q.get())  # ('ERROR_MSG', 5000000)

So imagine that your nicely crafted web app suddenly gets hit by Hacker News viewers. 5,000,000 views on a particular buggy route result in 5,000,000 identical exceptions. No sweat: you get only one error log, with a clear indication of how many times it was recorded. You save resources on storing logs, and time on digging them out. Before anyone even realizes, you have already deployed the fix.

As usual, Redis is required.


I was joking the other day that by looking at the current Ubuntu version number, one could realize how fast time goes by.

Though I believe these time-based concepts are just illusions that human beings created, and I would rather not let them clutter my mind like other unworthy things, they do matter in the real world, or the "meta world" as I would like to call it.

It is very common and natural that when organizations seek new employees, one of the first things they gauge is the years of experience one has with a particular industry/market/technology/practice/product/etc. While I understand that years of experience may be a good measure for manual work, why would it matter in knowledge-work fields, especially when one can easily acquire more than enough information nowadays through various media? It is not uncommon to hear people claim that they only learned certain things on the job, where previous "experience" and even "education" were rendered nearly useless. While that could open up a whole new topic, the focus here is the gap between how people value years of experience and how much it is really worth. In other words, having experience is definitely valuable to oneself, but gauging potential employees (especially knowledge workers) by it as a measurement or standard is becoming ever more trivial to organizations in this information era.

What about time, then? What about the metrics adopted by many organizations to measure the performance of their existing employees? I believe those should be considered trivial as well. It is similar to the notorious practice of measuring a programmer's productivity by her lines of code, while code golf lovers spend hours minimizing the lines of code needed to implement certain algorithms. Sometimes these code golf pieces result in sub-par performance, but do not think for a second that shorter code in general has no practical meaning; in fact, it was a major concern for many when the first personal computers had very limited resources, and source code size (in whatever form at the time) made a huge impact. That again would be a good topic on its own, while the real focus here is that knowledge workers, and their employers, should sincerely think about which measurements fit, which ones actually mean something real, not illusional.

If these illusions really matter to the meta world, then this meta world can surely be changed, just like the concept of the metagame in Dota 2 and League of Legends, Meta Stack Overflow, and many other "metas". They are meant to be changed by, and at the same time adapted to by, people. The trick is to maintain a balance that benefits the majority, if not all.

Work Yet No Work

Ultimately, this is something I want to achieve in the time and space where I create value to sustain my life - work yet no work.

To be exact, I want to make my work enjoyable to the point that it feels like no work. Other than some of the self-employed and GitHub, I have yet to see this become mainstream, despite it being a trending topic and buzz among people, especially the programmer(-alike) community.

TPW gave a great talk and wrote a fairly clear post on how they have done it at GitHub; Zach Holman (one of GitHub's first engineering hires) wrote a three-part series on How GitHub Works, which also addressed some very interesting and inspiring points. These all make sense, yet it is still not mainstream despite the huge success of GitHub and its people. These accounts are largely from people who played leader roles in their stories, while I intend to make it happen within my existing environment, and not as a leader.

How am I going to achieve that goal? I know I can't change people, but I can adapt to the greater environment that I work in and influence some people around me. So here is how I'm going to achieve it, in neither a bootstrapped startup nor a self-employed manner:

  1. Propose a more flexible schedule for myself, and hopefully others too (so it doesn't make anyone too special).
  2. Convince coworkers to use communication means more efficiently. Chat rooms, emails, face-to-face meetings, or whatever works best for specific scenarios. In the case of inevitable meetings, encourage participants to be prepared and be focused.
  3. Convince existing management to lower their tones. I'd encourage them to start with saying more "we"s and hopefully eliminate the use of "I"s.
  4. With the extensive use of digital tools to communicate (see point 2, if it's executed), encourage coworkers to be social. For example, put down the headphones and offer fixed or flexible hours to help others. Also promote team/organization-wide parties and activities. A great thing I learned two days ago about pair programming is that it's a good way for programmers to close social gaps and let their creativity and thoughts be effectively shared and refined.

There might be many more along the way; I'll be sure to log them here (including any changes to the four mentioned above).

Update (2013-10-25): I found this on 37signals' blog; while it's another story told by a leader, it's still inspiring. Enjoy!

League of Legends and Software Engineering

I got into the amazing worlds of programming and strategy games at about the same time, when I was around 16 years old.

Ten years later, I've noticed some astonishing similarities between them.

Of course, these two naturally go together because they both require a computer to begin with. The real similarities, though, are the shared concepts within.

It goes without saying that both require a huge amount of brain work to strategize in order to perform well. In both worlds, strategies are formed around at least three areas: Evaluation, Conceptualization, and Materialization.

Evaluation

This very term applies to both worlds, and in broad senses in both. For example, professional League of Legends players and teams evaluate themselves and their opponents, backed by statistics and recent replays, to identify strengths and weaknesses at the individual and team scopes.

In software engineering, the same should be done, except that most of the time we are facing hard problems to solve, not other human beings to defeat.

Conceptualization

Conceptualization is about forming the core visions and directions of strategies. At the highest scope of LoL gameplay, it takes the form of game balance changes and "metagame" formation. On the battlefield, it involves champion picks (for yourselves) and bans (against opponents) to form confident team compositions.

In software engineering, the pace of "meta" change is rather slow, but conceptualization applies equally heavily in terms of design, architecture, modularization and more.

Materialization

In games like League of Legends, materialization is the in-game portion of strategy play. It's about placing priorities on objectives, carry protection, vision control, itemization, etc.

In software engineering, it's about development process management, choice of technology stacks, workload distribution, task priorities and many more.

Or maybe, these are just my made up excuses to play LoL ;)

Tame (Py)cURL Part 2

Previously, we discussed the possibility of hacking PycURL to achieve a somewhat more controllable HTTP streaming client.

Today, let's add some Redis ingredients into our recipe and make this tool more controllable, so that we can solve the remaining problems mentioned before.

The rationale behind utilizing Redis is that we want a scalable (on both the server and client sides), easy-to-use, in-memory key-value store to maximize our control over PycURL. With that in mind, I actually encouraged the company I work for to open source this project I created during my daytime job.

Introducing TideHunter:

  • Accurate quota limit - total control over your stream quota.
  • An instant off switch - when sht hits the fan and you don't want to crash your process.
  • Redis backed control tools - semi-persisted, fast, and scalable.
  • Core mechanisms based on the solid cURL and PycURL - inherits the built-in goodness (gzip support and more).
  • OAuth support based on python-oauth2 - see this demo in action.

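The gist of those Redis-backed controls can be sketched as follows. This is only an illustration of the concept, not TideHunter's actual API; the class, key names, and client-injection style are all my own assumptions:

```python
class StreamControl(object):
    """Redis-backed stream controls, consulted from a PycURL write callback."""

    def __init__(self, redis_client, name):
        # redis_client: a redis-py style client (injected for testability)
        self.r = redis_client
        self.switch_key = 'stream:%s:switch' % name
        self.count_key = 'stream:%s:count' % name

    def on_record(self):
        """Call once per record; returns False when streaming should stop."""
        if self.r.get(self.switch_key) != b'1':
            return False  # the instant off switch: delete/unset the key
        # the counter lives in Redis, so it is visible mid-session
        self.r.incr(self.count_key)
        return True
```

A write callback that receives False can then return a bogus length to make libcurl abort the transfer, which covers both the quota limit and the off switch.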
This project has been submitted to PyPI, which means its installation is as easy as:

$ pip install tidehunter

The repository is on GitHub with sufficient examples to get you started.

Tame (Py)cURL

Recently I needed something that gives me more control over my data stream client, so that:

  1. When desired, it would stop and close the connection as gracefully as possible.
  2. A precise counter for how many records have been received is in place.

The initial implementation was quite straightforward:

Though practical, this approach could be improved much further; in particular:

  1. It lacks the capability to flip the switch on and off on demand.
  2. The number of records needs to be saved "manually", and only at the end of the session (barring any nasty interpolation). Something accessible mid-session is much preferable.
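For illustration, a write-callback helper in the spirit of that initial implementation might have looked like this (my own reconstruction with hypothetical names; the PycURL wiring is omitted):

```python
class StreamCounter(object):
    """Counts records in a PycURL WRITEFUNCTION and enforces a hard limit.

    This mirrors the limitations above: the limit is fixed up front
    (no on/off switch), and the count lives only in this process until
    it is saved at the end of the session.
    """

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def on_data(self, data):
        # PycURL calls this for every chunk received.
        self.count += 1
        if self.count >= self.limit:
            # Returning a value other than None (or the chunk length)
            # makes libcurl abort the transfer.
            return 0
```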

Coming up next, we'll tame (Py)cURL further A_A

There's No Such Thing as P.Eng

The country I currently reside in has an interesting system where engineers are not encouraged to formally call themselves Engineers until they become licensed "Professional Engineers", or "P.Eng", through a period as an "Engineer in Training" and other regulations.

This did bother me a bit, since I had always wanted to become an engineer, ever since the age of 7. I thought I could call myself an "Engineer" after graduation. Shame.

Do we really need a license before we call ourselves "Engineers"? The ethical guidelines are definitely worth following, but they're also the bottom line of being an engineer, or of any profession. If people cannot follow the ethical bottom line adapted to their professions, then they should not be qualified anyway, with or without meaningless designations.

It's even harder to define "P.Eng" by proficiency. More than once I heard P.Eng-titled professors say things such as "Python does not have network programming capability" at a time when Python had already been at version 2.7 for a while. The point is, the association grants these people a "P.Eng" license without even gauging their proficiency in the practical world, simply because they already hold a Master's or Ph.D. in Engineering. Nonsense. And I thought having a "P.Eng" designation was supposed to show one's professional credibility.

Similar things such as "EIT" (Engineer in Training) do not really make sense either. Or they do, but not as defined by a career rank. Engineers who practice engineering are always in training; they have to constantly train themselves for better quality in their work and services.

Droplet 2

As mentioned earlier, I finally dropped virtual hosting and rented myself a Droplet. It's cheap and easy to set up, with no downside yet (except that it's a really small instance compared to what I work with during the daytime).

I've wanted to move to a VPS for a long time, since I don't like running Wordpress without heavy optimizations, and with a virtual host there wasn't much I could do.

So here's the extended list of what I've done:

  1. apt-get install php-fpm mysql php-apc. I finally get to use a bytecode cache for PHP in a production environment now.
  2. Configured another Nginx server block and content root for this site to be hosted here.
  3. Grabbed the latest Wordpress and placed it under the content root.
  4. scp'ed all the uploads to wp-content/uploads. I didn't have to do this since I already use Jetpack Photon, but it's always good to have a failover.
  5. Imported the SQL dump through the MySQL shell (which is horrible compared to MongoDB's and Redis'). I didn't go the XML export-import way since my girlfriend and I co-author the blog.
  6. Configured the name servers to be CloudFlare's.
  7. Followed this guide to make Wordpress serve in-memory static contents through Redis.
  8. Installed my favorite Wordpress plugins and picked a new cool theme.
  9. Flipped the "CDN + Full Optimization" switch on CloudFlare for this site.

Without too much hassle (thanks to the Open Source community), I managed to make the great yet bloated blogware almost as fast as my truly static blog. Cool. The reason I chose Redis instead of Varnish (which is more popular for this particular use case) is that I want to maximize CloudFlare's duty of caching the already static contents (JS, CSS, images), while using Redis only for the generated HTML pages.

And really, why not Redis?

UPDATE 2013-05-15: The hacky solution of Redis-served cache became a bit problematic. As a stress test revealed its instability, I decided to drop it and find a more sophisticated solution (despite the fact that I still LOVE Redis and use it every day). I know the W3 Total Cache plugin for Wordpress does a great job interfacing with a variety of caching mechanisms, including APC, Memcached and Varnish. But I quickly abandoned that plan because I only intend to use APC the lazy way, mostly for logged-in users and wp-admin stuff; and I don't trust the performance of the Memcached or Varnish PHP drivers.

So I Google'd around and found this. Brilliant solution! The only hacky part is that the needed purge module does not come with a standard Nginx installation. The good news is that compiling Nginx with --add-module was a no-brainer. As of now, with the limited RUSH I could perform, my Wordpress-driven site performs equally as well as this truly static site.
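The approach is most likely along these lines: Nginx's built-in fastcgi_cache plus a third-party cache purge module. A rough sketch of the relevant configuration follows, where the zone name, cache path, socket path, and timings are placeholders of my own:

```nginx
# http context: where cached pages live and the zone that tracks them
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=WPCACHE:64m inactive=60m;

server {
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/var/run/php-fpm.sock;
        fastcgi_cache WPCACHE;
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_cache_valid 200 60m;  # serve generated HTML from cache
    }

    # requires Nginx compiled with --add-module pointing at ngx_cache_purge
    location ~ /purge(/.*) {
        fastcgi_cache_purge WPCACHE "$scheme$request_method$host$1";
    }
}
```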

Needed a Drop

The GitHub page has served me well. But since I already signed up for a $5 VPS, I figured it's time to switch, to gain more control and pollute my GitHub commit count no more.

Here's a list of things I did with this little devil:

  1. Signed up for their service with a promo code. You have to use a credit card to take advantage of the $10 credit, though I really wanted to use my "for fun" budget set aside on Paypal.
  2. Spun up a $5 instance within 60 seconds (as they advertised), with Ubuntu 12.10 x64.
  3. Added my SSH public keys with their control panel.
  4. Logged in with root and the password they sent through email.
  5. Changed root password (passwd).
  6. apt-get update and apt-get upgrade, without upgrading the kernel.
  7. Set up OpenVPN server for Netflix US exclusive shows and China trip.
  8. Set up Nginx with virtual host profiles, which includes this very site.
  9. Switched A Record from GitHub page to this server IP.
  10. Reconfigured the Octopress Rakefile to deploy to this VPS using rsync over SSH.
  11. Set up Dropbox on the VPS for cheap and selective backup. I don't need to back up the whole server, just some key configurations for the OpenVPN server, MySQL dumps (for a future Wordpress host), etc. In short, I'm too cheap to pay for the backup and snapshot service they provide.

The only part I struggled with a bit was the OpenVPN setup, which took me 2 hours last night, but the knowledge gained was impressive.

Next, I want to migrate to this VPS and say bye to Dreamhost's virtual hosting, which not only costs more annually but is also too restrictive for my use cases.

I need some thoughts on how to optimize Wordpress on a VPS using APC, Varnish, Memcached, Redis, etc.

Engineering and Programming

When I was teaching Python as an instructional assistant in Engineering 1, McMaster University, I got into this type of conversation with students a lot:

  • Student: "Why do I have to learn programming?"
  • Me: "Why did you choose engineering in the first place? Why not English or Philosophy?"

And the answers to my question would fall into the following three categories:

  1. I don't know, my parents said that I should get into engineering.
  2. I heard engineering graduates get hired.
  3. I like specific_field related stuff.

Category 1

That's really unfortunate, you're going to have a bad time, guaranteed.

Category 2

You heard wrong. I have plenty of friends who graduated from engineering schools and either ended up going back to their parents for help, pursued higher or other education (because they didn't know what else to do), or simply stayed at home with occasional yet unsuccessful job applications.

I also have another handful of friends who scratched their heads off keeping their GPAs high enough to actually secure the lie they heard (about engineering graduates getting hired). Most of them hate what they're doing right now. They either decide to go back to school for higher education, or constantly look for new, meaningless certificates to stay "competitive" in their field (that topic deserves another post).

Category 3

That's good. If you want to be a good engineer, or a manager in certain engineering field, having some programming skills and insights will give you a huge boost in productivity, if applied appropriately.

Learning how to program has reached a state that's very similar to mathematics and physics -- it's not a must, but it's highly beneficial to have a certain level of understanding.

I have a very long list of "Why you should learn programming", but some of the most important ones are:

  • Programming reflects a logical process of problem solving. It's not even funny to see students who took a "Logical Thinking" course in university and still commit common nonsense every day. In programming, you must think logically to make your software solve your problem precisely and accurately.
  • Programming is a cheap yet efficient prototyping/modeling technique. Wondering why that digital signal processing course never asked you to wire encoders and decoders by hand and test them with $10,000 lab equipment? Because you could test out your theories and processes just fine with programmed simulations in Matlab.
  • Programming gets laborious tasks automated. You don't have to build a sophisticated system that empowers a robot to roam around and fetch newspapers and milk for you (but if you can, go ahead). You can write a little program that collects and extracts data from around the Internet and generates a handy spreadsheet with easy-to-understand graphs and charts for your next Microeconomics 101 report, automagically. In most cases, the limitation is not your programming skill but your creativity, which is another extremely important characteristic of a good engineer.
  • Programming education is cheap. This shouldn't be the main incentive, but it's definitely a good thing if you've already decided to learn it. Just Google around; you don't even need to buy a book.

After all, it's your decision whether to accept the fact that programming is helpful in your (future) career.

TL;DR: let some celebrities convince you instead: