Inbox 0

“Yes, you have my full attention right now.”

TL;DR: Archive emails (ham), with discipline.

What enlightened me about Mailbox in the beginning wasn’t even its swipe-to-archive/read-later feature, but the help guide that suggested archiving all of my emails to clear the inbox. It was a moment of sudden realization about the very concept of an “inbox” - just like a physical inbox, messages and items get delivered into it, and you take them out right away. I doubt anybody would read a letter and then shove it right back into the inbox.

However, the email inbox isn’t exactly the same as the physical one. Even more so than email itself, it is an abstract, virtual artifact. Recent improvements in various email systems (arguably led by Gmail), such as huge storage, visual separation of unread and read messages (now even by auto-detected categories), folders, flags, labels, and so on, all seem to tell users they no longer need to worry about archiving emails. And that’s exactly what I believed before the sudden realization.

What happened after archiving all my emails was fascinating. There was a mental and visual relief when I looked at my inbox from the various email clients I use. There was a sudden anticipation for the next email I’d receive, hence my full attention when it actually arrived. I think it was a weekly newsletter from EB Games, the very first one I actually read since subscribing. After reading it, I archived it right away, and bought Halo 4 after work.

It seems like a lot of work, but try it: it actually saves you far more time than you might imagine. Archive each email when you’re done with it, and keep your inbox at zero. Mailbox is handy for marking some emails as read-later, which helps when you want to keep inbox zero so you can pay full attention to new messages, but also want to handle some of them another time (for example, a message alerting you that your mobile bill is due in 30 days).

For spam, mark it as spam. I think this is currently missing from the Mailbox app.

It is now much more obvious to me what the archive (“All Mail” in Gmail terminology, which still feels a bit weird) is for - just like its physical counterpart, it’s there for you to occasionally go back and search for information. Derp.

It’s easy to either let your read-later list grow indefinitely until it becomes another mental burden, or keep telling yourself you’ll archive this email later, which eventually leads to another thousand-message inbox, often with a portion (sometimes most) of it unread.

Back to square one now, are we? Please don’t.

Once you read an email, decide precisely on:

  • Whether you’re done with it (archive?)
  • If not, when exactly you want to deal with it (read-later?)

Then actually follow up on the ones you aren’t done with and get them done so you can archive them. Mailbox, for example, reminds you by pushing the message right back into your inbox, which by now should have your full attention.

It’s perfectly viable not to use something like Mailbox (hopefully this clarifies that I don’t work for them); you just need a bit more discipline to clear the unfinished messages left in your inbox, because there is no read-later list that reminds you automatically at a later time. Of course, optimizing your overall workflow so that you don’t need to handle things asynchronously at all would be the superior productivity boost. Unfortunately, that’s near impossible in this internet and information era.

Latest Toys

I have been toying around with a few ideas lately, and the process let me tackle some interesting problems in ways that are not so conventional, at least not the ones on page 1 of a Google search.

Problem One - Random Access of MongoDB Documents

This problem is definitely not new, as indicated by the number of times the same or very similar questions have been asked on Stack Overflow. In general, there are three solutions:

  1. Add a new field filled with a comparable random value, usually a float, and index this field for production-ready performance (a sketch of this approach follows the list).
  2. Leverage skip, but it is said to be highly expensive.
  3. Not exactly a solution just yet: wait for the MongoDB core team to implement a native, server-side solution. It is noteworthy that the issue on their JIRA was once closed as “Won’t Fix” but eventually re-opened due to popular demand.
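For reference, here is a minimal sketch of what approach 1 typically looks like with pymongo (the database, collection, and field names are hypothetical):

import random
from pymongo import ASCENDING, MongoClient

coll = MongoClient().mydb.mycoll  # hypothetical database and collection

# One-time migration: attach a random float to every document, then index it.
for doc in coll.find({}, {'_id': 1}):
    coll.update({'_id': doc['_id']}, {'$set': {'rand': random.random()}})
coll.create_index([('rand', ASCENDING)])

# Pick a pseudo-random document by probing around a random pivot value.
pivot = random.random()
doc = coll.find_one({'rand': {'$gte': pivot}}) or \
      coll.find_one({'rand': {'$lt': pivot}})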

I am fairly certain that the server-side solution will be as solid as the APIs MongoDB generally exhibits, but it is not here yet. So solutions 2 and 3 are crossed out, and I do not really like 1. The project is still very young (young enough that I am not naming it at the moment), so I could easily drop the collection or the entire database in MongoDB and rebuild the structure without worrying about consequences. However, I do not like the fact that an additional field has to be added just to support one use case, especially when it also requires indexing, which could result in a mess if I ever decide to drop it.

So here is my solution:
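The snippet was embedded as a gist in the original post; a minimal sketch of the idea, assuming pymongo and redis-py (the database, collection, and Redis key names below are illustrative), looks roughly like this:

import redis
from bson.objectid import ObjectId
from pymongo import MongoClient

coll = MongoClient().mydb.mycoll  # hypothetical database and collection
r = redis.StrictRedis()
IDS_KEY = 'mycoll:ids'  # illustrative Redis key for the SET of document ids

def save(doc):
    # Insert the document as usual, then mirror its _id into the Redis SET.
    _id = coll.insert(doc)
    r.sadd(IDS_KEY, str(_id))
    return _id

def random_doc():
    # SRANDMEMBER returns a uniformly random member without removing it.
    _id = r.srandmember(IDS_KEY)
    return coll.find_one({'_id': ObjectId(_id)}) if _id else None

If a document is ever deleted from MongoDB, the corresponding id should be SREMed from the SET as well, so the two stay in sync.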

Basically:

  • No extra field occupying both disk and RAM (if indexed), delegated to a Redis SET
  • No messy random ID determination logic in client code, delegated to Redis SRANDMEMBER command

Of course, I will likely switch to the official solution when the time comes. But now, I am more than satisfied with what I came up with.

Problem Two - More Accessible Event/Error Logging in Python

Let me elaborate more:

  • I want to get a sense of urgency when errors occur, but not to make myself panic by browsing through 10 gzipped logs containing the same stack trace
  • I want to be able to have a central hub to see all logs, but without worrying too much about system resource consumption, such as disk space and RAM

In addition, I had been planning for a while to separate the Redis queue implementations from tidehunter. Therefore, techies was born. You can also check it out on its GitHub repository page.

To solve problem two, I would do:

from techies import CountQueue, QueueHandler, REF_LOG_FORMAT
import logging

# CountQueue is a Redis-backed queue that stores each unique entry once
# and keeps a count of how many times it has been enqueued.
q = CountQueue('app_logs')
q.clear()

# Route standard logging records into the queue.
logger = logging.getLogger(__name__)
handler = QueueHandler(q)

handler.setFormatter(logging.Formatter(REF_LOG_FORMAT))
logger.addHandler(handler)

# Simulate 5,000,000 identical exceptions being logged.
for i in xrange(5000000):
    try:
        1 / 0
    except ZeroDivisionError as e:
        logger.exception(e)

print(len(q))  # 1
print(q.get())  # ('ERROR_MSG', 5000000)

So imagine that your nicely crafted web app suddenly gets hit by Hacker News visitors. 5,000,000 views on a particular buggy route result in 5,000,000 identical exceptions being raised. No sweat: you’ll only get one error log entry, with a clear indication of how many times it was recorded. You save the resources it would take to store the logs and the time it would take to dig through them. Before anyone even realizes, you’ve already deployed the fix.
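As a hypothetical illustration of the “central hub” side, a small consumer could drain the aggregated entries using only the calls shown above (len and get). I’m assuming here that get() dequeues and returns one (message, count) pair at a time, as the example output suggests; check the techies documentation for the exact semantics.

from techies import CountQueue

# Same Redis-backed queue name as the producer side above.
q = CountQueue('app_logs')

# Drain whatever has been aggregated so far and surface it to a human.
# Assumption: get() removes and returns a (message, count) tuple.
while len(q):
    message, count = q.get()
    print('seen %d time(s): %s' % (count, message))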

As usual, Redis is required.

Meta

I was joking the other day that, by looking at the current Ubuntu version number, one realizes how fast time goes by.

Though I believe these time-based concepts are just illusions that human beings created, and I would rather not let them clutter my mind like other unworthy things, they do matter in the real world, or the meta world, as I would like to call it.

It is very common and natural that when organizations look for new employees, one of the first things they gauge is the candidate’s years of experience with a particular industry/market/technology/practice/product/etc. While I understand that years of experience may be a good measure for manual work, why should it matter so much in knowledge work, especially when one can easily acquire more than enough information nowadays through various media? It is not uncommon to hear people claim that they only learned certain things on the job, where previous “experience” and even “education” were rendered nearly useless. While that could open up a whole new topic, the focus here is the gap between how much people value years of experience and how much it is really worth. In other words, experience is definitely valuable to oneself, but using it as a measurement or standard to gauge potential employees (especially knowledge workers) is becoming less and less meaningful to organizations in this information era.

What about time, then? What about the metrics adopted by many organizations to measure the performance of their existing employees? I believe those should be considered trivial as well. It is similar to the notorious practice of measuring a programmer’s productivity by her lines of code, while code golf lovers spend hours minimizing the lines of code needed to implement certain algorithms. Sometimes these code golf pieces result in sub-par performance, but do not think for a second that shorter code in general has no practical meaning - in fact, it was a major concern for many when the first personal computers had very limited resources, and source code size (in whatever form it took at the time) made a huge impact. That again would be a good topic on its own; the real focus here is that knowledge workers, and their employers, should sincerely think about which measurements actually fit, measurements that mean something real, not illusory.

If these illusions really matter to the meta world, then this meta world can surely be changed, just like the concept of the metagame in Dota 2 and League of Legends, Meta Stack Overflow, and many other “metas”. They’re meant to be changed by, and at the same time adapted to by, people. The trick is to maintain a balance that benefits the majority, if not all.

Work Yet No Work

Ultimately, this is something I want to achieve in the time and space where I create value to sustain my life - work yet no work.

To be exact, I want to make my work enjoyable to the point that it feels like no work at all. Other than some of the self-employed and GitHub, I’ve yet to see this become mainstream, despite it being a trending topic and creating buzz among people, especially in the programmer (and programmer-adjacent) community.

TPW gave a great talk and wrote a fairly clear post on how they have done it at GitHub; Zach Holman (one of the first GitHub engineering hires) wrote a three-part series on How GitHub Works, which also addressed some very interesting and inspiring points. These all make sense, yet it’s still not mainstream despite the huge success of GitHub and its people. These accounts are largely from the people who played the leader roles in their stories, while I intend to make it happen within my existing environment, not as a leader.

How am I going to achieve that goal? I know I can’t change people, but I can adapt to the greater environment that I work in and influence some people around me. So here is how I’m going to achieve it, in neither a bootstrapped startup nor a self-employed manner:

  1. Propose a more flexible schedule for myself, and hopefully for others too (so it doesn’t make anyone seem too special).
  2. Convince coworkers to use communication means more efficiently. Chat rooms, emails, face-to-face meetings, or whatever works best for specific scenarios. In the case of inevitable meetings, encourage participants to be prepared and be focused.
  3. Convince existing management to soften their tone. I’d encourage them to start by saying “we” more and hopefully eliminate the use of “I”.
  4. To balance the heavy use of digital communication tools (see point 2, if it’s executed), encourage coworkers to be social. For example, put down the headphones and offer fixed or flexible hours to help others. Also promote team/organization-wide parties and activities. A great thing I learned two days ago about pair programming is that it’s a good way for programmers to close social gaps and let their creativity and thoughts be effectively shared and refined.

There might be many more along the way; I’ll make sure to log them here (including any changes to the four mentioned above).

Update (2013-10-25): I found this on 37signals’ blog; while it is another story told by a leader, it’s still inspiring. Enjoy!

League of Legends and Software Engineering

I got into the amazing worlds of programming and strategy games at about the same time, when I was about 16 years old.

Ten years later, I’ve noticed some astonishing similarities between the two.

Of course, the two naturally go together because they both require a computer to begin with. The real similarity, though, lies in the concepts they share.

It goes without saying that both require a huge amount of brain work to strategize in order to perform well. In both worlds, strategies are formed around at least three areas: Evaluation, Conceptualization, and Materialization.

Evaluation

This very term applies to both worlds, in a broad sense in each. For example, professional League of Legends players and teams evaluate themselves and their opponents, backed by statistics and recent replays, to identify strengths and weaknesses at both the individual and team level.

In software engineering, the same should be done, except that most of the time we are facing hard problems to solve, not other human beings to defeat.

Conceptualization

Conceptualization is about forming the core visions and directions of strategies. At the highest scope of LoL game play, it takes the form of game-balancing changes and “metagame” formation. On the battlefield, it involves character picks (for your own team) and bans (against opponents) to form confident team compositions.

In software engineering, the pace of “meta” change is rather slow, but conceptualization applies just as heavily in terms of design, architecture, modularization and more.

Materialization

In games like League of Legends, materialization is the in-game portion of strategy play. It’s about placing priorities on objectives, carry protection, vision control, itemization, etc.

In software engineering, it’s about development process management, choice of technology stacks, workload distribution, task priorities and many more.

Or maybe, these are just my made up excuses to play LoL ;)

Tame (Py)cURL Part 2

Previously, we discussed the possibility of hacking PycURL to build a somewhat more controllable HTTP streaming client.

Today, let’s add some Redis ingredients to our recipe and make this tool even more controllable, so that we can solve the remaining problems mentioned earlier.

The rationale behind utilizing Redis is that we want a scalable (on both the server and client side) and easy-to-use in-memory key-value store to maximize our control over PycURL. With that in mind, I actually encouraged the company I work for to open source this project, which I created during my day job.

Introducing TideHunter:

  • Accurate quota limit - total control over your stream quota.
  • An instant off switch - when sht hits the fan and you don’t want to crash your process.
  • Redis backed control tools - semi-persisted, fast, and scalable.
  • Core mechanisms based on the solid cURL and PycURL - inherits the built-in goodness (gzip support and more).
  • OAuth support based on python-oauth2 - see this demo in action.

This project has been submitted to PyPI, which means installation is as easy as:

$ pip install tidehunter

The repository is on GitHub with sufficient examples to get you started.

Tame (Py)cURL

Recently I needed something that gives me more control over my data stream client, so that:

  1. When desired, it would stop and close the connection as gracefully as possible.
  2. A precise counter for how many records have been received is in place.

The initial implementation was quite straightforward:
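The snippet in the original post was embedded as a gist. As a rough, hypothetical sketch (the URL, class, and variable names below are made up, and each received chunk is simply treated as one record), an initial PycURL implementation along these lines could look like this:

import pycurl


class StreamClient(object):

    def __init__(self, url, limit):
        self.url = url
        self.limit = limit  # hard quota of records to accept
        self.count = 0      # only readable in-process; saved manually at the end

    def _on_receive(self, data):
        # libcurl calls this for every chunk received; one chunk == one record here.
        self.count += 1
        if self.count >= self.limit:
            # Returning a value other than None (or len(data)) tells libcurl
            # to abort the transfer.
            return -1

    def run(self):
        conn = pycurl.Curl()
        conn.setopt(pycurl.URL, self.url)
        conn.setopt(pycurl.WRITEFUNCTION, self._on_receive)
        try:
            conn.perform()
        except pycurl.error:
            pass  # aborting from the write callback surfaces as a pycurl.error
        finally:
            conn.close()
        return self.count


client = StreamClient('http://example.com/stream', limit=100)  # hypothetical endpoint
print(client.run())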

Though practical, this approach could be improved much further, in particular:

  1. It lacks the capability to flip the switch on and off on demand.
  2. The number of records has to be saved “manually”, and only at the end of the session (excluding the possibility of nasty interpolation). Something that’s accessible mid-session would be much preferable.

Coming up next, we’ll tame (Py)cURL further A_A

There's No Such Thing as P.Eng

The country where I currently reside has an interesting system in which engineers are not encouraged to formally call themselves Engineers until they become licensed “Professional Engineers” or “P.Eng” through a period as an “Engineer in Training” and other regulations.

This did bother me a bit, since I’ve wanted to become an engineer since the age of 7. I thought I could call myself an “Engineer” after graduation. Shame.

Do we really need a license before we can call ourselves “Engineers”? The ethical guidelines are definitely worth following, but they’re also the bottom line of being an engineer, or of any profession. If people cannot follow the ethical bottom line adapted to their profession, then they should not be qualified anyway, with or without meaningless designations.

It’s even harder to define “P.Eng” by proficiency. More than once I have heard P.Eng-titled professors say things such as “Python does not have network programming capability”, at a time when Python had already been at version 2.7 for a while. The point is, the association gives these people a “P.Eng” license without even gauging their proficiency in the practical world, simply because they already hold a Master’s or Ph.D. in Engineering. Nonsense. I thought having a “P.Eng” designation was supposed to show professional credibility.

Similar things such as “EIT” (Engineer in Training) really do not make sense either. Or they do, but not by their definition as a career rank. Engineers who practice engineering are always in training; they have to constantly train themselves to improve the quality of their work and services.

Droplet 2

As mentioned earlier, I finally dropped virtual hosting and rented myself a Droplet. It’s cheap, easy to set up, and there’s no downside yet (except that it’s a really small instance compared to what I work with during the day).

I’ve wanted to move our xiangpi.ca to a VPS for a long time now, since I don’t like Wordpress without heavy optimizations, and with a virtual host there wasn’t much I could do.

So here’s the extended list of what I’ve done:

  1. apt-get install php-fpm mysql php-apc. I finally get to use a bytecode cache for PHP in a production environment now.
  2. Configured another server block and content root for xiangpi.ca to be hosted here through Nginx.
  3. Grabbed the latest Wordpress and placed it under the content root.
  4. scped all the uploads to wp-content/uploads. I didn’t have to do this since I already use Jetpack Photon. But it’s always good to have a failover.
  5. Imported the SQL dump through the MySQL shell (which is horrible compared to MongoDB’s and Redis’). I didn’t go the XML export-import route since I co-author xiangpi.ca with my girlfriend.
  6. Configured xiangpi.ca name server to be CloudFlare’s.
  7. Followed this guide to make Wordpress serve in-memory static contents through Redis.
  8. Installed my favorite Wordpress plugins and picked a new cool theme.
  9. Flipped the “CDN + Full Optimization” switch on CloudFlare for xiangpi.ca.

Without too much hassle (thanks to the Open Source community), I managed to make the great yet bloated blogware almost as fast as my truly static blog. Cool. The reason I chose Redis over Varnish (which is more popular for this particular use case) is that I want to maximize CloudFlare’s duty to cache the already static content (JS, CSS, images), while using Redis only for generated HTML pages.

And really, why not Redis?

UPDATE 2013-05-15: The hacky solution of serving the cache from Redis became a bit problematic. As a blitz.io stress test revealed its instability, I decided to drop it and find a more sophisticated solution (despite the fact that I still LOVE Redis and use it every day). I know that the W3 Total Cache plugin for Wordpress does a great job interfacing with a variety of caching mechanisms, including APC, Memcached and Varnish. But I quickly abandoned this plan, because I only intend to use APC the lazy way, mostly for logged-in users and wp-admin stuff, and I don’t trust the performance of the Memcached or Varnish PHP drivers.

So I Googled around and found this. Brilliant solution! The only hacky thing here is that the needed purge module does not come with the standard Nginx installation. The good news is that compiling Nginx with --add-module was a no-brainer. As of now, with the limited RUSH I could perform on blitz.io, my Wordpress-driven site performs as well as this truly static site.

Needed a Drop

GitHub Pages has served me well. But since I already signed up for a $5 VPS at digitalocean.com, I figured it’s time to switch, to gain more control and pollute my GitHub commit count no more.

Here’s a list of things I did with this little devil:

  1. Signed up for their service with a promo code. You have to use a credit card to take advantage of the $10 credit, though I really wanted to use the “for fun” budget I set aside on Paypal.
  2. Spun up a $5 instance within 60 seconds (as they advertised), with Ubuntu 12.10 x64.
  3. Added my SSH public keys with their control panel.
  4. Logged in with root and the password they sent through email.
  5. Changed root password (passwd).
  6. apt-get update and apt-get upgrade, without upgrading the kernel.
  7. Set up OpenVPN server for Netflix US exclusive shows and China trip.
  8. Set up Nginx with virtual host profiles, which includes this very site.
  9. Switched A Record from GitHub page to this server IP.
  10. Reconfigured the Octopress Rakefile to deploy using rsync over SSH to this VPS.
  11. Set up Dropbox on the VPS for cheap and selective backup. I don’t need to back up the whole server, just some key configurations for the OpenVPN server, MySQL dumps (for the future Wordpress host), etc. In short, I’m too cheap to pay for the backup and snapshot service they provide.

The only part I struggled with a bit was the OpenVPN setup, which took me 2 hours last night, but the knowledge gained was impressive.

Next, I want to migrate xiangpi.ca to this VPS and say bye to Dreamhost’s virtual hosting, which not only costs more annually, but is also too restrictive for my use cases.

I need some thoughts on how to optimize Wordpress on a VPS using APC, Varnish, Memcached, Redis, etc.