As per my previous post, Blogger is terminating FTP support, and as a result I am terminating this blog in its current form. This will be the last post to it.
I may or may not create a new blog in some form or another eventually. In the meantime, please follow me on Twitter; if you're not a Twitter user, then you can subscribe to the RSS feed of my Twitter posts.
Blogger is terminating support for FTP toward the end of May. I don't like any of their non-FTP options, so I'm looking around for a new solution. I prefer the model where the data lives locally and just static HTML is uploaded to the server via FTP, but I'm not wed to that. Writing my own is still on the table. Suggestions are welcome.
In the meantime this blog may languish a little, but I have been posting to Twitter more lately.
I don't generally post anything about the books I read, but the last three were a hat trick of semi-obscure books that I really enjoyed, so I thought I'd share.
Jennifer Government by Max Barry. This is somewhat absurd (in a good way; think Catch-22) near-future satire, not unlike one of my all-time favorites, Sewer, Gas, and Electric. In an ultracapitalist America, the government is almost nonexistent and big corporations are the seats of power. An ambitious executive decides that his company should take military action against its competitors, which arouses unwanted attention from one of the few remaining government agents.
The Ridiculous Race by Steve Hely and Vali Chandrasekaran. A true, double-first-person narrative by two friends who compete to race each other around the world without using airplanes. The goal is not only to win, but to outdo one another in the experiences had along the way. Absolutely hilarious.
Don't Sleep, There Are Snakes by Daniel Everett. A memoir by a missionary/linguist who has spent much of the last 30 years with a tribe of native Amazonians called the Piraha. They're really enchanting: happy and lighthearted, almost never angry or violent, nonmaterialistic and mostly at peace with each other and with the world. It makes you think about what it really means to be 'civilized.'
I just ran across a pretty good CACM article on API design. This is a topic I feel very strongly about. In numerous interviews, as I touted my user-interface design skills, interviewers have said something along the lines of, "But ours is a batch-mode product; it doesn't have a user interface." My usual response to this is that all programming, especially API design, is user-interface design. The exact same principles that make a good user interface apply to a programming API, and indeed to programming in general. A program is (among other things, obviously) a user interface between programmers, including between you and your future self. A poorly designed API (or more generally, poorly designed code) is a long-term source of friction, and as with most design work, extra time spent at the beginning to create a usable interface can have a tremendous return on investment.
As that article mentions, a really good API almost disappears. Methods exist to do what you want, they are named what you expect them to be, and they do what you need them to without requiring any gymnastics of the sort described in the article. The only two APIs I've ever seen that come close to what I would consider the gold standard are Qt and OpenAccess (disclaimer: I worked on the latter, so I may be biased there).
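To make the "disappearing API" idea concrete, here is a small, entirely hypothetical sketch (the class and method names are inventions, not from any real library): the same lookup done through an API that demands gymnastics, and through one that just does what you expect.

```cpp
#include <map>
#include <string>

// Gymnastics: the caller must build a query object, execute it, and then
// unpack a result object - four steps to ask one question.
class ClunkyStore {
public:
    struct Query { std::string key; };
    struct Result { bool found; std::string value; };
    void put(const std::string& k, const std::string& v) { data_[k] = v; }
    Result execute(const Query& q) const {
        Result r;
        std::map<std::string, std::string>::const_iterator it = data_.find(q.key);
        r.found = (it != data_.end());
        r.value = r.found ? it->second : "";
        return r;
    }
private:
    std::map<std::string, std::string> data_;
};

// The API that disappears: the method is named what you expect, and it does
// what you need in a single call.
class Store {
public:
    void put(const std::string& k, const std::string& v) { data_[k] = v; }
    std::string get(const std::string& k, const std::string& fallback = "") const {
        std::map<std::string, std::string>::const_iterator it = data_.find(k);
        return it == data_.end() ? fallback : it->second;
    }
private:
    std::map<std::string, std::string> data_;
};
```

Both classes do the same work; the difference is entirely in how much ceremony the caller pays at every call site, which is exactly the kind of friction a good API makes vanish.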
My friend Lawrence pointed me to a web page (static local copy) that appears to be some kind of Markov-chain-generated linkfarm nonsense, that contains both of our names. This shocked us at first, but there are a number of logical explanations; for example, Lawrence has commented on my blog, so the two names do coexist elsewhere on the web. Still, I'm not completely sure that's what is going on here, and regardless it's a little bit spooky.
This reminded me a little of Lawrence's outstanding short story Coding Machines, so I'll take this opportunity to plug that.
On the left, a $25 Timex Camper. On the right, a $100 Bertucci A-2T. I wanted the Bertucci, but that black-on-black design was out of stock everywhere, so I got the Timex instead. It amazes me how similar these are; I wonder which was first? I also wonder, as I always do with watches, about the relative quality of these two. There's no doubt that the Bertucci is a better watch, but four times better? I've been dubious about expensive watches ever since my $250 Casio bit the dust after snorkeling in 12 feet of water. I'd love to see someone do a real field stress-test of a bunch of watches in a range of prices.
Aside: The closest thing I could find to an official G.I. standard issue watch looks very similar to these.
I have a lot of old projects laying around in varying degrees of completeness, and I am finally going to put them all to bed so as to clear the decks for some new projects.
The first is a simulated annealing library (tarball or at GitHub). I wrote most of this on a plane several years ago. It is complete, except that there is an issue with the stop criterion that often causes it to pass over better solutions and settle on ones not quite as good. This could be mitigated by writing the move manager to save the best solution seen, but the right answer is to just fix the stop criterion. Both it and the cooling schedule are very simple; my problem in writing an open simulated annealing library is that I know more sophisticated techniques for virtually every aspect of the annealing machinery, but frankly I don't feel like doing the research to figure out whether they're known prior art or whether we invented them at my past jobs (I've written several annealing engines before this one, and this one is extremely unsophisticated compared to those).
It's in C++. There is a Visual Studio solution and project, but no makefile; however, it's straightforward to compile.
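The best-solution-seen mitigation mentioned above can be sketched in a few lines. This is a hypothetical toy, not the library's actual API: a bare-bones annealer with a geometric cooling schedule that records the best solution it ever visits, so that even if the stop criterion overshoots, the caller still gets that best.

```cpp
#include <cmath>
#include <cstdlib>

// Toy cost function for demonstration: minimized at x == 3.
static double quadratic(double x) { return (x - 3.0) * (x - 3.0); }

// Bare-bones simulated annealing over a single double. The key lines for
// the issue described above: we track the best solution ever seen and
// return it, rather than returning whatever state the stop criterion
// happens to end on.
double anneal(double x, double temp, double cooling, double stopTemp,
              double (*cost)(double)) {
    double curCost = cost(x);
    double best = x, bestCost = curCost;
    while (temp > stopTemp) {
        // Propose a random move whose size shrinks with the temperature.
        double candidate = x + temp * (2.0 * std::rand() / RAND_MAX - 1.0);
        double candCost = cost(candidate);
        double delta = candCost - curCost;
        // Metropolis criterion: always accept improvements; accept uphill
        // moves with probability exp(-delta / temp).
        if (delta < 0.0 ||
            std::exp(-delta / temp) > (double)std::rand() / RAND_MAX) {
            x = candidate;
            curCost = candCost;
            if (curCost < bestCost) { best = x; bestCost = curCost; }
        }
        temp *= cooling;  // geometric cooling schedule
    }
    return best;  // the best-seen solution, not merely the final state
}
```

Returning `best` rather than `x` is the move-manager workaround described above: it papers over a stop criterion that wanders past the best solution, without actually fixing the criterion.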
I'm releasing this under what I like to call the "do the right thing" license. Which means, do whatever you want with it. No restrictions whatsoever. But please do the right thing. That means credit me. Let me know if it's useful to you, or if you make improvements to it that you choose to make open. If you use it in something that makes you money, maybe throw an Amazon.com gift certificate or something my way.
Due to the nature of my work, I keep a calendar on my desk at work that is separate from my various personal calendar and note-taking machinery.
Initially I used a standard month-at-a-glance calendar, but I found it a nuisance writing things inside of little boxes. Then I switched to a linear perpetual calendar (similar to this) - basically just a lined, numbered list - but found that when locating today on the page, I often didn't know the exact date, relying instead on the day of the week as an indexing cue.
So what I came up with was a list format, but with the day numbers offset according to the day of the week. Here is this month's in PDF form: weekdays only, which is what I use, or with the full week. One of these days I may automate the generation of these, but in the meantime I'll try to post each month's as I generate it for myself.
A couple of summers ago we were in Costa Rica, and took a tour down the Tempisque River. To point out animals and such on the banks, rather than a flashlight or laser pointer, our guide used a small compact mirror to reflect the sun onto the spot where he wanted us to look. Cheap, easily obtained, no batteries, and no worries about getting wet - just the sort of low-tech solution that I love. I've since learned that this is a common trick used by soldiers as well.
BTW, our tour guides, Jose and Amy, were incredible. Highly recommended.
I was playing around with the OpenStreetMap data, and I noticed how interesting and organic the highway systems of major cities look. So, I extracted a few to monochrome images, which can be found here.
I remember when Star Trek: The Next Generation first aired (yikes, 22 years ago), I looked at their consoles - big flat glass panels covered with virtual buttons and sliders - and thought that had to be the dumbest user interface ever. Now look: With the iPhone and its various clones, and the soon-to-be-legion (if you believe the breathless press) tablet PCs, this would seem to be the user interface of the future. (TechCrunch declares the imminent end of button keyboards.) The jury is still out for me on this; while the interface is fraught with exactly the kinds of problems I envisioned when I saw that Star Trek console, I must admit that I've gotten fairly proficient even on the minuscule keyboard on the iPhone screen, and in general the interface works better than I expected it to. I have to imagine that it will be even more usable with a larger screen. There's no denying that it's incredibly flexible as an interface medium, that it provides a lot of really useful new affordances (e.g. multitouch), and that - perhaps most important of all - it's really cool and fun to use.
Entrepreneurs (and wannabes) often talk about the value of ideas. I'm firmly in the camp that good business ideas are a dime a dozen, and that execution is the hard part. Here are a few ideas for businesses (or just little projects) that I don't expect to ever get around to executing.
A web site for managing emergency contact information. Every time my kids go on a field trip, I have to fill out a form with all of our emergency contact information, the kids' allergies and medications, etc. I'd like to see a web site where I can enter this information once. Then, I can point the field trip leaders to that site. They would request access, and I would get an email letting me allow that access (or not). There are obviously some minor security concerns here, but it's not as if my phone numbers and my kids' allergies require Fort-Knox-level security.
Similarly, a web site for managing immunization records. Doctors would be able to add to the records when the kids get their shots, and schools would be able to access the records; again, with a similar access mechanism that would allow me to grant one-at-a-time access. Same security concerns, but again, this isn't terribly sensitive data.
Matt Mullenweg (founder of WordPress) used to have a business card that just said "(1) Go to Google, (2) Enter 'Matt', (3) Hit 'I'm Feeling Lucky'." It would be cool for those of us who aren't the first result (or even in the first thousand) to have a widget that, given a search term, tells you where a particular page - yours, say - would rank in Google's search results for that term. I'm not sure whether this could be done by scraping, or whether you'd need direct access to Google's indices (i.e. you would have to be Google).
I just love the vibe of little paper-goods stores like Dandelion Patch, but such places always seem very (forgive me) girl-oriented. I'd love to run a store that is sort of a brick-and-mortar version of LifeHacker. It would stock high-end and niche office supplies, planners and other organizational tools, and the like. I'd want sort of a coffeehouse vibe - one of my local Barnes and Nobles has an attached Starbucks, right next to the section with all of the planners, pens, and the like. Like that, except smaller and with a much larger inventory of supplies and only a very small section of books relevant to organization, life management, Getting Things Done, and so forth. I'd love for it to provide an inventory of what are currently mostly "print it yourself" or "have Kinko's make this for you" supplies, such as those from David Seah. They might also offer custom printing services, and classes on GTD and similar organizational skills. Sort of a hybrid of coffeehouse, Barnes and Noble, Dandelion Patch, Office Depot/Staples, Levenger, the Franklin-Covey store, and The Container Store. My best idea for the store's name: Index.
I'd love to have a news reader that learns from what articles you find interesting, and over time shows you more like them. It seems hard to believe that this doesn't already exist. It would also have functionality to allow you to explicitly tag specific subjects for which you'd like to see subsequent articles.
I'd love to see a web-2.0 project management solution. Sort of MS Project, except web-based and dead simple in a 37Signals sort of way.
I'd love to have a single-serving site for "what should I do this weekend?" It would obviously be localized, probably fed by users, and would just dump a few ideas for fun things to do in your area - both specific things that are happening this weekend and interesting destinations in your area that aren't tied to a specific date.
These, like many business ideas, are all things that I wish existed. If anyone is motivated to make them real, please do.
Open letter to web services companies: When you send me emails that need me to follow a link to respond, don't make me jump through a bunch of hoops to do so. For example, take Twitter. About twice a week, I would get an email telling me that someone wants to follow me. All of these are spammers. In order to block them, I have to click a link in the email, log into Twitter, click a button to block them and then another to confirm. Instead, there should be a link in the email to accept and one to block, and neither should require me to log in. This is not a high-security situation; I wouldn't want emails from my bank to work this way, but for Twitter there really aren't any consequences if someone gets into my email and accepts a follow, or if I do so accidentally. In the end, the easiest way for me to solve this problem was to just terminate my Twitter account, which is what I did.
Similarly, Meetup: I get an email asking me to RSVP. They conveniently provide Yes, Maybe, and No buttons in the email. However, in order to follow any of those links, I have to log into my Meetup account. It's a pain. So, when my answer is "No" (as it usually is), I just don't bother. This can only end up hurting the meeting organizers, and thus ultimately hurting Meetup itself.
For an example of doing this right, take Netflix. Periodically I get an email from them asking when a DVD was sent, or when it arrived. There are appropriate links in the email. I click the right one, and I'm done. No login, no confirmation. One click. The companies who are doing this wrong should take a lesson here.
Especially in the software industry, a topic that gets revisited again and again is that of working from home vs. working in an office. There are those who seem to believe that working from home is a thoroughly bad idea, and at the other end of the spectrum there are companies that are entirely locationless, with all of their employees working from their own homes.
I worked for 12 years, for four different companies, from home. All four companies were headquartered in Silicon Valley, and I live in the suburbs of Washington, D.C. I worked from my home, except that I visited headquarters for a week at a time, at intervals ranging (depending on the company and on where we were in the release cycle) from once per month to once per quarter. At the first company, I had worked in the office in California for 18 months before I moved to D.C. At the second, a startup, I moved to California for the first 6 months, then returned to D.C. The remaining companies, I entered cold, working from home from the beginning.
Now, for almost two years, I've switched to the other extreme. I work for a local company, and due to the nature of the work it must be done 100% in the office, and typically only 40 hours per week.
The first thing that people always seem to talk about is productivity. From my own experience, I can report that I was at least as productive at home as I am in the office. However, that claim can be a little misleading; I was more personally productive, but I made fewer contributions to other developers' work, and now I do a lot more of that, and more generally spend more time facilitating the work of others. All in all, I would say that the net sum is about even; however, as my career advances, my job becomes more about collaborating, overseeing, and facilitating others and less about my own productivity, so it becomes more important to have the kind of face-to-face, high-bandwidth contact that you get in an office. Indeed, this is a big part of the reason why I finally stopped working at home in favor of working on-site. Proponents of telecommuting are quick to point out all of the ways that modern technology facilitates communication: Not just phone, but instant messaging, wikis, chat clients, screen sharing and collaboration software, video conferencing, and on and on. These things definitely help - I was certainly able to be more effective in the later days of high-speed broadband than I was at the beginning with nothing but phone and 128Kbit/sec ISDN - but they're still a pale shadow of being together in person in front of a whiteboard.
In some ways, a good telecommuter turns this handicap into a win. You do more of your design up front, and more carefully and thoroughly, because that reduces the need for high-bandwidth communication throughout the development process. You do this design work during those times when you are together face to face. As a side effect, careful design at the beginning leads to better software. On the other hand, it is impossible to nail down all of the design up front; inevitably you figure things out during development that require changes, sometimes large ones, in the architecture of the system. Here, being geographically distant becomes a real hindrance; what happens is generally that the lead architect comes up with a design, and then the other members of the team critique it, and in my experience the result is often inferior to what would have come of true collaboration through the entire design.
As far as your work schedule goes, being at home is obviously very flexible. This is both good and bad. It is good because you can work when you are most productive. I am a lark, and I am at my sharpest first thing in the morning. At home, I could go straight to work when I woke up; now, I spend 90 minutes of what would be my most effective time of day showering, getting dressed, and commuting. Similarly, you can put that flexibility to work on a larger scale: If you are having one of those days where you just can't seem to focus, just step away; clean the house, or take a 3-hour lunch, or get a white chocolate mocha. And on those days when you are really in the zone, you can work 12 or 14 hours while still having a few hours with your family. I also found it easier to stay focused on work in general, because my hours were all mine; not only was there none of the kind of non-productive water-cooler BS'ing that you do in the office, but I also spent very little time surfing the web, reading XKCD, and the like, because it was easy to see that the time I spent doing so was mine. You can also move your schedule around to enable more time with your family; I typically got up very early and did an hour or two of work before anyone else woke up, and did another hour or two of work after the kids went to bed, thus freeing those hours during the day for family time.
The downside is that it's hard to maintain any kind of work-home separation. You feel like you are always at work, or at least like when you're not at work, you should be. Exacerbating this is the fact that the bulk of my teams were in a timezone where it was three hours earlier, so I often fielded calls and IMs through the dinner and kids' bedtime hours. Similarly, despite my constantly telling them not to worry about it, my coworkers were loath to call me when they knew it was my evening, and so whatever issue they had got delayed until mid-day (i.e. their morning) the next day.
Some telecommuters report having trouble staying focused on work, with the constant potential distractions around: Housework to be done, kids to play with, etc. I never had much of a problem with this; I'm a fairly disciplined person, and I love what I do, so except for those occasional times I mentioned where focus is elusive, the call of the laundry was rarely louder than that of the work. And anyway, a lot of household tasks, like laundry, don't require a lot of attention. But if you're the sort who has trouble focusing on work at home, then working at home simply might not be for you. It's also notable that my wife was home during my core work hours, and she ran interference with the kids and such.
Another factor to be considered is the social interaction of being around other people. If you are the sort who needs this on a regular basis, then working at home is probably not for you. However, many software people, even in the office, spend most of their day hunkered down in their cube, talking to others only when work requires it. For such people, being at home may not make much of a difference in their social schedule.
Another important factor is that employees who want to work from home are happy doing so. The effect of this is often underestimated; a happy employee is a productive employee. For many of us, giving us this kind of flexibility and trusting us to do the work in whatever way we find most effective has an incredible ROI in terms of employee productivity.
Most of this applies to the line engineer, a leaf node in the org chart. Things get much more difficult when you need to manage others, or to lead efforts with multiple developers. Then, it becomes not just about you; those other developers need to interact with you, both for technical guidance and for the more touchy-feely managerial stuff. It can be done, especially if you're really proactive about communicating with them and giving feedback, but it's hard and arguably inherently less effective than being together in person. Again, this is why I stopped working at home; I reached a point where I felt that the level of leadership I needed to do could not be effectively accomplished remotely. At least, not by me.
Finally, note that for really great developers, you may not have a choice - they either work from home or they don't work for you. Folklore says that an outstanding developer is many times more productive than an average one, so even if such a person isn't quite as effective from home as they would be from an office, hiring them can often still be a net win. Certainly the companies I worked for were happy with my performance working from home.
The bottom line: A leaf-node developer, if they have the right discipline and temperament, can be every bit as productive as a telecommuter as they would be in the office (if not more so). As their responsibilities of leadership and mentorship grow, this will become increasingly difficult, to the point where they may not be able to effectively do their job remotely. Even for such people, though, the need to be in the office isn't constant, and while I haven't ever been in a situation where this is possible, I believe that even a manager could effectively spend a day or two a week working at home.
Update: I found this study interesting, and it squares well with all of my experience.
I ran into a bug the other day that I've run into enough times in my career that I thought it was worth sharing.
One of the perils of C++ is that it does certain things behind your back, and if you don't understand what those things are, they can bite you. One way this manifests is that apparently meaning-preserving changes can have surprising consequences.
In this recent case, I inherited some code that I was cleaning up. One of the changes was to change this:
foo = factory.getSomeClass();

to this:

SomeClass foo = factory.getSomeClass();
Simple enough, right? But after making this change, a later access to foo segfaulted.
Can you guess why? Here's a hint: SomeClass uses a reference-counted shared-data implementation, similar to std::string.
Here's what happened: SomeClass had defined an assignment operator, but not a copy constructor. The old code used the former, and the new code used the latter. In the absence of a user-defined copy constructor, the compiler generates one that just does a memberwise copy. As a result, the reference count is copied and not incremented, and so later it becomes 0 before it should, and the data is deleted. A subsequent access to that data produces the segfault.
The solution is simple: Define a correct copy constructor. My own policy is never to rely on the compiler-generated copy constructor and assignment operator. When I write a new class, if I think I don't need these, I immediately add method prototypes for them that are private and have no implementations. Then, if you ever invoke them (deliberately or not), the compile will fail, and you'll know that you have to implement them for real (or change the calling code).
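Here is a minimal, hypothetical sketch (the names are inventions, not the real code) showing both halves of the fix: a reference-counted class whose copy constructor actually increments the count, and the declare-but-don't-implement guard described above.

```cpp
// A minimal reference-counted class done right: both the copy constructor
// and the assignment operator share the data and bump the count, so the
// compiler-generated memberwise copy never gets a chance to bite.
class SomeClass {
public:
    SomeClass() : data_(new Data) {}
    SomeClass(const SomeClass& other) : data_(other.data_) { ++data_->refs; }
    SomeClass& operator=(const SomeClass& other) {
        ++other.data_->refs;  // bump first, so self-assignment is safe
        release();
        data_ = other.data_;
        return *this;
    }
    ~SomeClass() { release(); }
    int refCount() const { return data_->refs; }
private:
    struct Data { int refs; Data() : refs(1) {} };
    void release() { if (--data_->refs == 0) delete data_; }
    Data* data_;
};

// The policy from the text: if a class shouldn't (yet) be copyable, declare
// the copy operations private and leave them unimplemented. Any accidental
// copy then fails at compile time (or, from within the class, at link time).
class NonCopyable {
public:
    NonCopyable() {}
private:
    NonCopyable(const NonCopyable&);            // intentionally not implemented
    NonCopyable& operator=(const NonCopyable&); // intentionally not implemented
};
```

In C++11 and later, the same guard is spelled `SomeClass(const SomeClass&) = delete;`, which fails at compile time in every case.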
Since Microsoft is terminating Money, I've been looking around for a replacement, and so far it's just a whole lot of fail. I tried both GnuCash and MoneyDance, and not to put too fine a point on it, but they both suck. Quicken doesn't offer a free trial, as far as I can find. They do offer a free online version, but I just can't get comfortable with giving all of my credit-card logins, much less all of my financial data itself, to an online service. (This is the same thing that has kept me from trying out Mint.) I guess I'll keep looking.
Inquiring minds want to know what happened to abort our move to Sydney. The answer is quite simple: It's really amazingly expensive there. Think Manhattan expensive. You'd think this would've been something I looked into before it got as far as it did, and to some extent it was, but it's actually not an easy thing to find a quantitative answer to. I did push back against the offer I was given by Google, and was told repeatedly that the cost of living there was lower than here in the US, and that relative to it the offer was quite generous. I can only assume that those who told me this were badly mistaken about the cost of living here, or about the state of the family housing market there, or some combination of those and related factors.
The result is that when we got there and began to house-hunt, we discovered that we were totally unable to find anything we would live in, within an hour of work downtown, in a good public high school's catchment, within our budget. Note that said budget was about 30% higher than what we pay for our large 5-bedroom home here in the DC area. Our choices were either a 3-bedroom apartment about the size of an extended-stay hotel (note that I have three kids, so this would've meant the youngest two sharing the third, tiny bedroom), or a slightly larger 4-bedroom house or townhouse that is very old and generally unrenovated. The latter places reminded me of places I lived when I was in college.
After the fact, someone at Google (who wasn't involved in the original conversations) told me this was pretty much the status quo: That families have to live an hour or more from work. I was definitely unwilling to do that, especially in addition to paying so much more for housing and pretty much everything else as well.
I will say, though, that Sydney is a really great city, and the people there are the nicest I've ever encountered anywhere.
So we're staying where we are. I've happily returned to WOTI, though at a different site and in a different job. No hard feelings; though we were excited about the would-be adventure, we love it here, and are quite delighted to be staying.
People may have noticed that the photos of my children are no longer public on Flickr. After the third time someone with no photos of their own on Flickr favorited a picture of my 12-year-old, I decided it was time to draw the curtains. If you're a friend of ours, email me and I'll give you access.
It's a minor thing, but it's the sort of thing that's endemic to the suckfest that is iTunes on Windows. For security reasons, I disabled autorun on my machine. The next time I fired up iTunes, I got this alarming dialog:
Oh, crap. I don't want to turn autorun back on, but this seems to be saying that I won't be able to use CDs with iTunes if I don't. Oh well, I'll click No and deal with this later. The very next thing, I got this dialog:
Just kidding! When I said iTunes wouldn't recognize CDs any more, I meant unless you hit F5. I just wanted to see if I could make you turn autorun back on.
Mission accomplished: I wrote a new blog post every day in the month of February.
I love taking photos, but I hate traveling with a camera. You end up looking at everything as a photo opportunity, and in effect watching your vacation through the viewfinder of the camera instead of fully participating in it.
So it is, I've found, with trying to blog every day. You end up considering every thought and every experience through the filter of, "Can I blog about this?" As a result, you aren't fully participating in those experiences; to some degree, you become a spectator. It's an interesting way to feel, but largely not a positive one; while I'm happy to have done this, I don't expect to continue or repeat the effort. I can only assume that many journalists and writers feel like this all the time, and for that I feel bad for them.
Furthermore, the pressure to post frequently conflicts with the desire to write longer and more deeply considered work. I find that I've generated a small backlog of things that I'd like to write about, but that will require more than a day's investment to produce. Hopefully I'll get to these in the coming months.
This is the story of the biggest mistake I've made in my professional career. (I say I; there were other people involved, but it was as much my fault as anyone's.) It was early in my career, and I was tasked with building a new algorithmic optimization system from the ground up. This meant we needed a database - a database of geometric objects and connectivity, not the kind people normally think of when they hear the word. This is exactly what OpenAccess is; I helped develop it some years later, and had it existed back then, it would've allowed me to avoid this whole fiasco.
When I came on board, we needed a prototype done, like, yesterday. So I slapped together the quickest, dirtiest database that could possibly work, with the intention that eventually it would be replaced by something more production-worthy. That may have been a mistake too, but it's not the big one that the title refers to.
Fast-forward about a year. The prototype is done, and is producing fantastic results. We've written a real database, and it is time to move the system onto it. Here's the big mistake: I did that as open-heart surgery, completely ripping the system apart and replacing the old database calls with new ones. The data model had changed substantially, so this was a major effort, not just a syntactic change, and calls to the database were woven through the entire codebase; to make another surgical analogy, imagine trying to replace all of a person's nerves. There was a period of a couple of months in which the system did not work at all. Eventually we got it running, and then there were several weeks of debugging - just plain old bug fixes, many of them fixing bugs that had probably already been fixed in the old version but whose fixes were lost along with the old database calls.
Meanwhile, we weren't able to just freeze the old system in carbonite; we needed to continue improving it, competing in benchmarks, and the like. So it continued to evolve forward from the code base that had been used to begin the conversion. Had the new version ever worked properly, we would have had to make all of the same improvements to it when we were done that we had made to the original system during those months.
Because here is the worst part: The system running on top of this database was mostly Monte Carlo optimization algorithms. Such machinery is highly dependent, in unpredictable and hard-to-debug ways, on such harmless-seeming transformations as changing the units in which a size is expressed, or changing the order in which a graph vertex's edges are enumerated. There were many such differences between the old and new databases, and the new system never did produce results as good as the old one.
After it was all over, it was clear to me that this way of making this conversion was totally wrong-headed. The right way would have been to first write a bunch of regression tests. Then write a facade over the new database that had the old database's API. Move the system onto it (which is nothing more than recompiling against the new library). Then slowly migrate the code, and the facade API, to look more like the new database's API. Run the regression tests frequently, so that if you make a change that breaks things, you know what change is to blame. Eventually the facade API looks just like the new database's API, and at this point the facade is vestigial and can be removed.
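As a concrete sketch of the facade idea (all names here are invented; the real system was a custom database, not this toy key-value store):

```python
# Hypothetical sketch of the facade approach. NewDatabase stands in for the
# new engine with its own API; OldStyleDB wraps it in the old API so the
# rest of the system keeps working unchanged while it is migrated piecemeal.

class NewDatabase:
    """Stand-in for the new engine, whose API differs from the old one."""
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class OldStyleDB:
    """Facade exposing the old database's API on top of the new engine.

    Call sites are migrated one at a time from store/fetch to put/get;
    when none remain, the facade is vestigial and can be deleted.
    """
    def __init__(self, backend):
        self._backend = backend

    def store(self, record_id, record):       # old API
        self._backend.put(record_id, record)  # delegates to the new API

    def fetch(self, record_id):               # old API
        return self._backend.get(record_id)
```

Throughout the migration there is one working system, and the regression tests can run after every converted call site rather than only at the end.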
This approach has two key features: There is just one version of the system, and it is always working throughout the process. It probably takes substantially more time than the open-heart surgery approach would if everything went smoothly, but how often does that happen?
So imagine how I felt listening to this week's Stack Overflow podcast, in which Jeff talks about facing the same problem with Stack Overflow. Evidently the schemas of his core tables turn out to be really wrong, and force a lot of baroque and complicated over-joining and such in the site's code. Joel suggested something almost identical to what I decided so long ago I should have done: Create a view that looks like what he wants the new table to look like but has the original tables underneath. Then, migrate the code a piece at a time from the underlying tables to the new view. Again, there remains just one, working system the whole time. To my horror, Jeff disagreed quite vehemently and said he planned to go the open-heart surgery route. He went on a bit about the romance and adventure of that sort of swashbuckling. Surprisingly, Joel acquiesced a little and said that might be the right approach for Jeff. I seriously doubt it, and I was disappointed that Joel didn't let Jeff have it with both barrels; after all, this is just a smaller-scale version of the exact same mistake Joel wrote about in Things You Should Never Do, Part I, for pretty much the same reasons. (By the way, that hadn't been written yet when I had my little misadventure.)
Just as Joel says quite unequivocally that you should never do a full rewrite of an application, I'll say just as unequivocally that you shouldn't perform this kind of massive surgery on a working application unless it is simply impossible to do it incrementally. Indeed, in the decade since then I've formed a habit of never having my software be broken for more than a few hours at a time.
If you took the spirit of the Analogy clock, and then stripped away pieces until you couldn't take away anything else without rendering it useless, you might get something like this:
The numbers are the current hour and the next one; the current hour's size shrinks in proportion to how far into the hour it is, and the next hour's size grows correspondingly. Thus, if the top number is huge and the bottom one tiny, it's almost exactly that time; if the bottom one is huge, it's very nearly the next hour; and if they're the same size, it's about half past.
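The sizing rule fits in a few lines; here is a sketch (min_size and max_size are made-up parameters, and the real clock draws on a canvas rather than returning numbers):

```python
# A sketch of the sizing rule: the current hour's size shrinks linearly as
# the hour progresses, while the next hour's grows by the same amount.

def hour_sizes(minute, min_size=10, max_size=100):
    """Return (current_hour_size, next_hour_size) for a minute in [0, 60)."""
    frac = minute / 60.0              # how far into the hour we are
    span = max_size - min_size
    current = max_size - span * frac  # huge at :00, tiny at :59
    nxt = min_size + span * frac      # tiny at :00, huge at :59
    return current, nxt
```

At minute 0 the sizes are (100, 10), and at half past they meet at (55, 55).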
If I were going to spend any more time on this, I would make it animate smoothly as the bottom number moves to the top at the stroke of the hour.
I have always been fascinated by clocks. Their mechanical design, their artistic and industrial design, the way people relate to them and to time itself, all intrigue me.
One theme that I riff on from time to time is approximate clocks. These are clocks that only show about what time it is, to varying degrees of accuracy. After all, sometimes it just doesn't really matter that much. Long ago, I thought of making a clock that only read night, morning, mid-day, afternoon, or evening, or one that shows just the current season. At a somewhat finer resolution, but still pleasing, is Laurence Wilmott's It's About Time clock, which reads the way you would tell someone what time it is; for example, "almost noon." This one shows only the day, and this really interesting one shows just the hour.
A while ago I made this one that displays nothing but the year.
Here is a more accurate, but still rough, one that I threw together:
This works only on browsers that implement the CANVAS tag, and even some of those (e.g. Chrome) don't implement the text API. So if you're not seeing it, it looks like this.
I'm not sure all of the context left and right of the current hour really contributes anything; I considered just showing the minutes bar with the current and next hours' text to its left and right. I have some other ideas in this same vein that I'll be posting soon.
Here's another one I made:
I'm not sure that one really tells you much that you couldn't get by looking out the window, but it looks cool. BTW, I cheated; it doesn't actually compute sunrise and sunset, but just divides the globe evenly into hours. (If you're on a non-CANVAS browser, it looks like this.)
While I'm on the subject, here are some other interesting (though not approximate) clock designs:
Looking around, I found a blog post that lists four efforts along these lines (no pun intended). They are pretty impressive, but it seems that performance is likely to be acceptable only for fairly simple scenes. Nevertheless, these are very much what I was thinking of doing, and their performance is at least as good as I would have expected, so kudos to their authors.
Until then, probably the most sensible way to implement hardware-accelerated 3D graphics in a web browser is to use Flash.
A couple of months ago, as I was preparing to buy a netbook (which I've since bought), I made myself a note to blog about how just about everything I do on a computer could be done in the cloud these days, except for writing software. Since then, a number of pretty full-fledged IDEs have sprung up in the cloud, including Mozilla's Bespin and a few others.
My search for a means of tracking tasks continues. In the cloud, I tried out Remember the Milk, Ta-da List, Todoist, and Toodledo. Of these, Toodledo is my favorite, but ultimately I wasn't happy with it either, and in any event I really wanted a local app rather than a web-based solution, because I want my data kept locally and I want to be able to use it when I'm offline.
In the app world, I tried the Jira bug-tracking system, which is free for personal use, but it is far heavier-weight than I need for this purpose. I tried AxoSoft OnTime and FruitfulTime TaskManager, but didn't really care for either of those. Things looks really nice, but it's Mac-only. My current favorite is Tudumo. It's simple and well-designed, and while it isn't quite perfect, I have few complaints. I like that it is designed with simplicity as a primary goal, it is GTD-friendly, it is reasonably priced (though not free), it has a generous 60-day trial period, and it gives me full control over my data (it will export CSV). If I end up sticking with it, I'll make a later post about how I think it could be improved.
A while ago an idea for a TV ad popped into my head. It was so vivid that I feel like it must be a real ad that I just don't remember having seen, but I can find no evidence that is true.
The ad opens with a wife leaving the house and telling her husband to remember that he is supposed to clean the house while she is gone. Then a montage of him goofing off. Finally he gets up, goes to the garage and gets a leaf blower, and proceeds to clean the house with it, just blowing everything loose into a pile, out the door, and into the trash. Maybe the wife comes home and thanks him for doing such a great job.
Lately I've been studying design. The word is a broad one with a lot of meanings, but I have a very specific one in mind. Building useful, elegant solutions is both an art and a science. The scientific part is engineering, and the artistic part is design. Here are a few of the books and magazines I've been reading on the subject that I highly recommend:
You can't learn to design from a book, any more than you can learn writing or music from a book. But in On Writing, Stephen King said that while no amount of teaching will turn a bad writer into a good one or a good writer into a great one, it can turn a good writer into a better one. I expect the same is true of design. I don't expect to ever be a great designer, but I think I'm a good one, and hopefully on my way to becoming a better one.
Yesterday my youngest daughter asked me, "Where did the first people come from?" I tried to explain evolution to her, in terms that a 6-year-old can understand. This is difficult to do, but then I got an unexpected assist: She said, "Oh, like in Spore!" Spore
I've been looking for an electronic way to track my to-do lists, tasks, and projects. None of the online services I've tried does quite what I want, and anyway I prefer to keep the data locally. This morning I had the revelation that what I wanted was almost identical to the functionality of a bug-tracking system. I could use Bugzilla! I've used it as a client for years, but never tried to install it myself. Here is how the installation instructions begin:
Install a Database Engine.
Install a Webserver.
And anyway, it isn't clear whether it works on Windows. Fail.
It seemed that a smart business model for commercial bug-tracking software would be to give away single-user licenses for personal use. And indeed, AxoSoft does this. However, I installed it, and it is incredibly complicated. I didn't look at it long enough to determine whether it just has way more functionality than I need, or is poorly engineered, but certainly its interface is overwhelmingly complex for the simple things I need to do.
I'd love to try FogBugz for this purpose, but even a single-user license is $200. I wouldn't mind paying, but that's far too much for what I need.
So, I keep looking. I might have to write something myself.
In a recent Stack Overflow podcast, Rands and the guys were talking about learning, and how nerds have a tendency to go wide and shallow - to want to learn about a million different things, not necessarily to any great depth in any of them. This is well-trodden territory for Rands, but this time it really struck a chord with me, because I've started trying to resist that urge.
Professionally, I've never been all that much of a dilettante. I am continuously expanding the envelope of my expertise, but I don't bounce around like a magpie, and I'm not easily seduced by the latest toys. I have steered my career in the directions where I want to continue to learn and grow, and am currently happy to be able to explore at work a lot of new areas of analytics, visualization, web technologies, and user experience design that have been sort of in my peripheral vision for a while now. (Aside: I always bristle when I hear people say that a developer 'should' have side projects, work on open-source, and the like. I prefer to guide my career in directions that keep me stimulated, so that I get that fix from my work. Good for me, good for my career, and good for my employer; everyone wins, except I guess for whatever open-source project I might otherwise be contributing to.)
But personally, as far as hobbies go, I tend to bounce around a little more. And I've noticed something in the past year or two: The amount of readily available information about topics that I find interesting has surpassed the time I have available to digest it. Thus, I'm trying to be more selective about what topics I pursue, paring down the blog feeds I subscribe to and limiting my attention to a couple of key hobby interests: Design (which dovetails nicely with what I'm doing at work, and which is of course a huge and broad subject anyway), and making things.
We'll see how that goes. There are always a lot of shiny things around.
In the mid-90's, long before that atrocious movie, my then-landlady told me about how she'd made a list of things she wanted to do in her life. I thought that was a great idea, and started making one myself. I've always felt like it was a little personal to share, and I'm still not sure that's not true, but it's pretty well established that sharing your goals is a positive thing, and in the interest of this month's blogging experiment, here it is.
As a sample, and to put even more pressure on myself to do some of these things, here are a few that I expect to do in the next few years:
Bike a century.
Build a clock from scratch (buying gears is okay, a kit is not).
Do the Bay Bridge Walk.
Learn to rollerblade well.
Make liquid-nitrogen ice cream.
Read the entire bible.
Internet video is still far from its peak, and I'm already annoyed with it. Don't get me wrong: in the right circumstances I love video. I like Hulu and Netflix on demand. But the trend toward presenting everything exclusively in video format is a real problem for me. I can't watch videos at work. I can't watch them in the car, and I can't watch them in public unless I bring along earbuds. None of which is to say that video shouldn't be used, but it would be nice if a few stills were provided so that I could get some idea what a video contains, and in the case of videos where the picture isn't really required, it would be nice if an audio-only feed were provided. (Yes, some do, and yes, I can extract the audio myself if not, so this isn't a big deal.) The Make blog, which I otherwise love, is particularly notorious about this. And if I have issues with video, I can only imagine how frustrating it is for the blind as information trends in this direction.
(I found this draft sitting in my account from last August.)
My daughters and I got attacked by yellow jackets on Sunday. We were traipsing around in the woods in a local park, and suddenly my youngest, about 15 feet from me, starts screaming like I've never heard her scream before. I look over, and she's standing still, with a bunch of yellow jackets on her and many more swarming around her; presumably she had stepped on a nest (they nest underground). I always worry, and I know I'm not alone here, how I might react in such a situation, but gratifyingly, before my conscious brain even figured out what was going on, I bounded over, grabbed her, and ran. I got us 50 feet away or so, and then swatted the bugs off of her; meanwhile, a bunch of them were stinging me. Once she was clear, I got rid of the ones on me. All in all, we got off pretty easy; she got 6 or 7 stings, and I got about a dozen. My middle child, who was outside the main fray, got a couple too. Fortunately, none of us is allergic. The worst part, perhaps, is that my youngest was already sort of freaky about bees, and now will surely be more so. The pain went away pretty quickly, but two days later, the stings still itch.
I've been trying to figure out how long ago I first had a web presence. The earliest evidence I could find of me having a URL was in the signature line of a usenet post from September 01994. This fits with what I thought I remembered, that I first made a web site a few months before the first release of Netscape (which was December 01994). Hard to believe that was only 15 years ago.
A more vivid case for making résumés visual occurred to me: Suppose someone told you that tomorrow you would have 60 seconds in which to convince a hiring manager to hire you, in person, with whatever props you want. Would you go in there and read to him or her from your work history? Of course not. You'd go in there with pictures, screen shots, maybe a laptop with a demo on it. Your résumé has the same goal, and should be presented similarly.
On the other hand, I found a thread on this topic over at Edward Tufte's site, and many - including Tufte himself - don't like this idea, and think that it's dangerous to break from the résumé status quo. I disagree, naturally. Once I have mine done, I'll run it by my HR contacts and see what they think.
A few more random notes on this work... First, a graphical representation of my education:
It's certainly more compact than the corresponding text would be, though it's not as, well, designy as I'd like. Also, I wonder at this point in my career if anyone - or at least, anyone who didn't go to Virginia Tech or the University of Virginia - cares at all about any aspect of my education other than the fact that I have a Ph.D. in Computer Science. All the more reason to present that information efficiently.
I like LinkedIn. It is my main tool for maintaining contact information for my current and (especially) former business associates. And they've done a few things really well, like the way they made profile completion into a game. However, I feel like there's still a lot of potential left on the table. LinkedIn is not a community in the way that, say, Facebook is. If they offered compelling ways for me to interact regularly with my network, that would be enormously valuable for both me and them. It would turn LinkedIn from a tool into a community. They're trying, but they're not there yet, and I don't have any ideas yet about what would be the killer app that could make this happen.
If you do a join on the lists of male and female first names from the 02000 census, you find that over 300 names appear in both lists. These include the ones you'd expect, such as Chris, Francis, Jamie, Kelly, Pat, and Terry, but also quite a few that are more interesting, such as Anthony, Charles, Dominique, Edward, Norman, Richard, and Thomas. It seems likely that some of these appear due to clerical error, but those surprising ones I listed appear with what would seem to be a statistically significant frequency. Women with traditionally male names seem much more common than the reverse.
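With the two name lists loaded as sets, the "join" is just an intersection; the data below is an invented sample, not the actual census files:

```python
# Toy version of the join described above, using made-up sample data in
# place of the real census name lists.

male = {"Chris", "Pat", "Terry", "Richard", "Edward", "John", "Norman"}
female = {"Chris", "Pat", "Terry", "Richard", "Mary", "Susan", "Dominique"}

unisex = male & female  # names appearing on both lists
```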
Like most people who cook, I suspect, we have a few staple recipes for which we try to keep the ingredients on hand; these are what we make if it gets to be 18:30 and we still haven't thought about what to make for dinner.
Whatever Pot Pie. This is a leftover consumer. Take whatever leftover meat you have: Chicken or turkey is best, but we've used pork or ham as well. Lunchmeat that is nearing the end of its lifespan works. Add whatever leftover or frozen or canned vegetables you want; we usually use frozen peas and carrots, and leftover broccoli if we have any. Add cubed potatoes; either microwave a potato for a few minutes and then dice it, or use frozen cube-style hash browns. Add a can of cream of whatever soup (we prefer cream of chicken, but have also used cream of mushroom, celery, or potato), and a little extra liquid: we use milk and/or white wine. Season as you like; we usually use dill. Put it in a frozen pie crust and cook for 30 or 40 minutes at 375 or so.
Curry Sausage Couscous. Cook sausage (we use turkey sausage), onions, garlic, pistachios, and curry until the sausage is fully cooked. None of the amounts are critical; the recipe we started with said 1T of curry, but I use 2 or 3. Add a little cream or milk right at the end if you wish. Serve over couscous.
I've had this thought for some time, but the past year working on analytics and visualization has really driven it home: Pages full of text are a terrible way to accomplish a résumé's goals. An effective business presentation or a sales brochure is very visual, full of charts and pictures. There is a good reason for this: Such communications need to convey a lot of information, not necessarily at a very high level of detail, to people who have limited attention to give, and who are intrinsically difficult to engage. A résumé's audience and purpose are similar.
I've been playing around with some concepts in this direction. For example, the following chart shows my experience with various programming languages chronologically:
A couple of online services are small steps in this direction. VisualCV adds some charts and such to an otherwise fairly standard résumé format, and codersCV adds a timeline, but in both cases the résumé is still dominated by text, and they are also aimed at online presentation; I don't know if or how well they support print. What I'm working on will look more like something that could have been done by Nick Felton.
The other day my 7-year-old saw me working on my blog, and informed me, "Daddy, 'blog' is not a word. 'Weblog' is a word, but 'blog' isn't."
She has some authority on the subject. A while back my beloved keyboard died, and I clipped off the cord and gave it to her to play with. Now, whenever we have guests over, she brings down that keyboard and "makes them a web site," interviewing them and banging away on the keyboard.
Recently A List Apart and Rands both covered, from somewhat different angles, the notion that much of great design is about details. As far back as I can remember, seeing a message like this has made me want to scream:
You have 1 new messages.
It would have taken the programmer seconds - literally, seconds - to add the conditional that would omit the plural when the number is 1. When I write such code, I also special-case 0, writing no instead of 0.
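The conditional really is a matter of seconds; here is one minimal version, with the "no" for zero being the habit described above:

```python
# The seconds-to-write conditional: singular for 1, "no" in place of 0.

def new_message_text(n):
    if n == 0:
        return "You have no new messages."
    if n == 1:
        return "You have 1 new message."
    return f"You have {n} new messages."
```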
Coincidentally, just days before I read that Rands article, I'd added code to this blog to write Today and Yesterday in place of those dates, and to omit the year when the date falls in the current year. That has long been one of my favorite little design details. A couple of my other favorites: iTunes displays an image of the model and color of whichever iPod is presently docked (too bad the rest of iTunes is such a mess), and my TV, in the final seconds before the sleep timer shuts it off, says good night.
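That date display is a small function; here is a sketch (the exact format strings are illustrative, not the blog's actual ones), with today injected as a parameter so the behavior is easy to check:

```python
# Sketch of the date-display rule: "Today"/"Yesterday" for the two most
# recent days, and the year omitted when it matches the current year.

from datetime import date, timedelta

def friendly_date(d, today=None):
    today = today or date.today()
    if d == today:
        return "Today"
    if d == today - timedelta(days=1):
        return "Yesterday"
    if d.year == today.year:
        return d.strftime("%B %d")      # e.g. "February 10"
    return d.strftime("%B %d, %Y")      # e.g. "December 25, 2008"
```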
Such details sometimes cost extra, especially with physical products. But that cost is often nominal, especially in the software world, as in the 1 message example above. Look for those kinds of opportunities and take advantage of them; they can help make the difference between a product and a thing of beauty.
I have the outlines of two novels pretty well fleshed out in my head. All that's stopping me from writing them is a lack of time; well, that and the fact that I hate the way my creative writing sounds. A friend of mine tells me that all writers hate their own writing, but the fact is, I need a lot more practice at writing fiction.
Anyway, I have this written in a notebook from maybe five years ago: A plot point in one of them hinges on a fictional piece of legislation called the Digital Surveillance Privacy Act (DSPA), which (among other things) would mandate that all cameras must make an audible beep or click when they take a picture. I had intended this to be a sarcastic commentary on ludicrous security-theater legislation. As is typical of satire, I exaggerated it beyond anything I thought likely to really happen. Well, guess what is on the House floor? The Camera Phone Predator Alert Act, which specifies exactly what I imagined. As Bruce Schneier said, this is so silly it defies comment. They even gave the law a goofier name than I did.
Parsing postal addresses is surprisingly difficult. The USPS document that describes how to properly format US addresses is over 200 pages long. How many types of road (e.g. Street, Court, Avenue, etc.) do you suppose are in that document? I once tried to list all of the ones I could think of, and came up with a couple of dozen. That document contains over 200, including some strange ones such as Loaf and Stravenue. Further complicating matters, many names can serve more than one purpose depending on where in an address they are; for example, according to the 02000 census data, there are at least 68 cities in the US whose names are the names of states (including 25 cities named Washington). And of course, US rules don't apply in other countries, whose rules are all different. Supposing you had a good parser for US addresses, you'd like to be able to determine if an address was in the US. One way might be to just search for the names of other countries in your string. However, again from the census data, there are at least 85 cities in the US whose names are the names of countries elsewhere in the world (including, strangely, 17 cities named Lebanon).
It's one of those problems where you can throw together a 75% solution in a few hours, but from there each subsequent level of improvement gets increasingly difficult. It's no surprise, then, that the address parsing software out there is either really expensive, or doesn't work very well, or both. On the long list of projects I hope to get to someday is to write a good open-source address-parsing package.
When I was a kid - and yes, I feel like a total geezer when I start sentences that way - there were a lot of shows on prime-time TV that were ostensibly adult shows, but that were wholesome enough for even fairly young children to watch. I'm thinking of The A-Team, The Dukes of Hazzard, Scarecrow and Mrs. King, Salvage One, Battlestar Galactica, Buck Rogers in the 25th Century, and so forth. Not exactly the highest-quality television, but shows that the entire family could watch together. Today, we like to let our 12-year-old watch TV with us after the two younger girls go to bed, but there is almost nothing on network TV at even 8:00pm that we will let her watch. We watch a lot of Discovery (particularly Mythbusters) and TLC. Those are great, and surely more educational than the shows I watched when I was a kid, but it would be nice if there were some fiction on TV that was suitable for young people as well as adults. I'm surprised this void exists, and I wonder how other parents fill it. Do they watch the same educational programming we do? Do they not let their kids watch network TV at all? Or are they letting their kids watch the shows that I find unsuitable, and if so, are they being too permissive or am I being too restrictive?
One of the problems with my blog(s) is that historically I have posted sporadically, and in general not often enough. To get into the habit of posting regularly, I'm going to try an experiment: making a post every day in the month of February.
You may also notice that I've redesigned: I've moved the blog back to be the front page of joeganley.com, and simplified the design quite a bit. It's a little too simple (and a little gross under the hood), but I'm working on my own blogging system which will require a complete reimplementation of the blog design, so I didn't want to spend too much time fiddling with Blogger templates right now.
I've finally gotten around to reading Norman's book The Design of Everyday Things. This book probably belongs on the list of books programmers haven't really read, which is a pity; it's a fantastic book. It covers many of the same complaints I made about alarm clocks, and it also brought to a head another of my biggest design complaints: iTunes.
I love Apple. By and large, their products are the very model of good design: Simple, intuitive, and elegant. However, even after eight versions, they've completely fumbled with iTunes; almost every time I use it, I find something new to dislike about it.
First, simply from a gestalt point of view, it's a mess. Some functionality is in the menus (including the dreaded Advanced menu), and some is hidden behind buttons bearing cryptic icons, many of which provide little clue what they do. Within the iTunes window, these icons are all over the place; when I need to perform a task, there is little affordance as to where I should look for the widget that makes it happen. The functionality of a music player is intermingled with that of a music-library manager in a most haphazard way. And the Podcasts pane is even worse than the main one.
Playlists are simply a broken model for how to organize music. Consider my family: My wife, my oldest daughter, and I each have an iPod. To subdivide our music library into songs that belong on each iPod, I have to have a playlist for all of us, one for my wife and me, one for my wife and daughter, one for my daughter and me, and one for each of us alone. When my two youngest girls get iPods, this is going to exponentiate; I won't need all 31 possibilities, but the number will at least double. And this doesn't even cover the case of subdividing by mood or genre. The right model is tags. I could then specify that, for example, my daughter's iPod should be stocked with songs tagged megan first, and the remaining space filled with (joe and not explicit) (she likes a lot of the same music I do). I know some of this can be done with smart playlists, but it wouldn't be easy and simple and effortlessly "just work" like tags could. Tags would also allow visualization and manipulation of my collection via tag clouds and similar devices, like you see on tag-based sites like Flickr.
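A sketch of the tag model, with invented song and tag names: each device's contents become a predicate over tag sets, so no combination of listeners needs its own hand-maintained playlist.

```python
# Each song carries a set of tags; a device is stocked by evaluating a
# predicate over those tags rather than by maintaining one playlist per
# combination of listeners.

songs = {
    "song_a": {"megan"},
    "song_b": {"joe"},
    "song_c": {"joe", "explicit"},
    "song_d": {"megan", "joe"},
}

def select(predicate):
    """Return the titles whose tag sets satisfy the predicate."""
    return {title for title, tags in songs.items() if predicate(tags)}

# "songs tagged megan first, remaining space filled with (joe and not explicit)"
megan_first = select(lambda t: "megan" in t)
filler = select(lambda t: "joe" in t and "explicit" not in t) - megan_first
```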
Initially building a library is always an unpleasant experience in iTunes. Recently I installed a new hard drive for music. I reset iTunes to point there, and asked it to consolidate my library, copying songs from anywhere on the machine to the new drive, and organizing them its own way. This took forever, and many hours into it, I got a dialog informing me that the drive was full, and that iTunes would point to the song in its original location. The buttons on this dialog are OK and Cancel. Presumably OK just acknowledges and dismisses the dialog, but what does Cancel mean? Don't add this song to the library, or abort the entire consolidation? So many hours in, I don't want to abort the whole operation, so I hit OK. Now I get the same dialog again, presumably for the next song (it doesn't tell me what song it is complaining about). I hit OK again, and again, and again. Finally, frustrated, I hit Cancel, but I continue to get these dialogs a bunch more times, and then they stop. I can't tell now if I aborted the consolidation, or if it finished. I see no recourse other than to erase everything and start over, and to hit OK for every one of those dialogs (there ended up being a few hundred of them). There is no "Yes to all" or "Don't show this again" option.
Part of the reason the disk is full is duplicates: Songs that appear in multiple places on my machine. For some reason, iTunes copies and inserts into the library every copy. I can think of no reason to include duplicates in a library, especially when iTunes is making its own copies of the songs. Mind you, I'm not talking about songs with the same title or filename; I'm talking about songs that are bitwise identical on disk. I don't want these taking up disk space or appearing multiple times in my library (though you can turn off the latter), but I know of no way to avoid it.
Similarly, consider the dialog you get if you try to add songs to a playlist that are already in it. You are asked whether you want to add them multiple times, or to skip the duplicates. They should have done this the Apple way: Simply don't enable the rare corner case where someone might want the same song in their playlist multiple times, and skip them always. Or at least give the dialog a "Don't ask this again" option.
As an extra added bonus, one of its automatic upgrades deleted all of my ratings data.
It's not just me; if you Google "itunes sucks" you get almost 18,000 hits, which cover most everything I've complained about here and many other equally legitimate issues.
I'm going to move my whole collection to a Linux box running some open-source music manager (I haven't figured out which one yet). I hardly ever buy music from iTunes anyway, because I can get it DRM-free from Amazon without paying extra for the privilege. I'll miss the free songs from Starbucks, but not enough to put up with this piece of garbage any longer. Given how hard it is to move a music collection to a new platform, you can guess how likely I am to ever come back to iTunes, even if they make it awesome. Congrats, Apple - your sorry usability has lost you a customer from the iTunes ecosystem for life. It'd be shameful even for a company that doesn't have such a long history of great design. Maybe I should send them my copy of The Design of Everyday Things; apparently the iTunes team hasn't read it.
NP-completeness isn't just a theoretical notion of interest only to pointy-headed computability nerds. When faced with a problem, it is important to determine whether it is NP-hard (or not) because if it isn't, you don't want to write some heuristic to produce nonoptimal solutions when it is possible to produce optimal solutions efficiently. Conversely, you don't want to waste your time trying to devise optimal solutions to an NP-hard problem, because you will surely fail.
Often, the reduction from a known NP-complete problem to yours is obvious. When it isn't, things get interesting. If you have trouble proving a problem NP-complete, it is often - but certainly not always - because the problem in fact isn't NP-complete. After you've done a lot of these proofs, you start to get a feel for how many degrees of freedom in the solution space might still be handled in polynomial time, and how many will probably lead to NP-completeness. Often, the magic boundary is between 2 and 3 degrees of freedom: 2-coloring a graph can be done efficiently, while 3-coloring is NP-complete, and likewise bipartite matching is in P while three-dimensional matching is NP-complete. However, some of the NP-completeness proofs that have been devised are incredibly complicated and nonobvious, so don't take your difficulty in proving NP-completeness as strong evidence that the problem is not NP-complete.
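The tractable side of that boundary is easy to demonstrate: 2-coloring is nothing more than a breadth-first search that looks for an odd cycle. A minimal sketch:

```python
from collections import deque

def two_color(n, edges):
    """Try to 2-color an undirected graph on vertices 0..n-1.
    Returns a list of colors (0/1) or None if impossible."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: not 2-colorable
    return color
```

Nothing remotely this simple is believed to exist for 3-coloring.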
A productive approach is often to bounce back and forth between trying to devise an efficient algorithm and trying to prove NP-completeness. Your failures at proving NP-completeness can produce inspirations about how to attack the problem algorithmically, and your failures at devising an efficient algorithm can lead to insights into how to establish NP-completeness. One of my favorite examples from my own work was to solve a system of rectilinear orientation constraints. You had a bunch of objects in two dimensions that could have one of the eight axis-parallel orientations, and a set of constraints between pairs of them: That they must have the same orientation, have orientation mirrored about one or the other axis, have orientation rotated 90 degrees from one another, or any compatible combination of these. My initial attempts to prove it NP-complete focused on looking at it as an 8-coloring problem, but the problem didn't seem quite general enough to encode a generic graph coloring problem as a system of these constraints. Ultimately these failed efforts to prove NP-completeness led to the crucial insight to solving the problem efficiently: That the constraints could be viewed as three independent systems of constraints with respect to the three elements of the orientations: mirroring about x, mirroring about y, and 90-degree rotation. Each of those could be formulated as a graph 2-coloring problem independent of the other two problems, and thus efficiently solved. The independence of these three 2-coloring problems was the source of the difficulty in proving NP-completeness, and I'm certain I made that cognitive leap as a result of my attempts to prove the problem NP-complete.
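To illustrate the decomposition (this is a reconstruction of the idea, not the original code, and the encoding of the eight orientations as three bits for mirror-x, mirror-y, and 90-degree rotation is mine for illustration): each bit yields an independent system of parity constraints, and each such system is equivalent to 2-coloring, solvable in near-linear time with a union-find that tracks parity.

```python
def solve_parity(n, constraints):
    """Solve a system of constraints x_u XOR x_v = p over {0,1} with a
    parity-tracking union-find; returns True iff the system is consistent."""
    parent = list(range(n))
    parity = [0] * n  # parity[v] = x_v XOR x_parent[v]

    def find(v):
        if parent[v] == v:
            return v, 0
        root, p = find(parent[v])
        parent[v] = root       # path compression
        parity[v] ^= p         # now relative to the root
        return root, parity[v]

    for u, v, p in constraints:
        ru, pu = find(u)
        rv, pv = find(v)
        if ru == rv:
            if pu ^ pv != p:
                return False   # contradiction
        else:
            parent[ru] = rv
            parity[ru] = pu ^ pv ^ p
    return True

def solve_orientations(n, constraints):
    """constraints: list of (u, v, (dmx, dmy, dr)), meaning object v's
    orientation bits differ from u's by the given XOR pattern. The three
    bits decompose into three independent parity systems."""
    return all(
        solve_parity(n, [(u, v, d[bit]) for u, v, d in constraints])
        for bit in range(3)
    )
```

The key point is exactly the one above: because the three systems never interact, each is a 2-coloring problem, and the whole thing is in P.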
Once you've determined that your problem is NP-complete, that still doesn't necessarily mean it is hard in practice. NP-completeness means that the problem is hard in the worst case and for general inputs. There are some NP-complete problems for which algorithms have been devised that, while technically heuristics -- they cannot guarantee to find optimal solutions for all problem inputs -- nonetheless in practice almost always do find optimal solutions to problem instances that arise in the real world. This is true, for example, of the satisfiability problem: Determining whether there is an assignment of values to boolean variables that makes a particular logic formula evaluate to true.
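To make satisfiability concrete, here is a toy version of the DPLL backtracking search that real SAT solvers build on; actual solvers add clause learning, branching heuristics, and serious engineering that this sketch omits entirely.

```python
def dpll(clauses):
    """Tiny DPLL satisfiability search. Clauses are lists of nonzero ints;
    literal v means variable v is true, -v means false. Returns a
    satisfying set of literals, or None if unsatisfiable."""
    def simplify(clauses, lit):
        out = []
        for c in clauses:
            if lit in c:
                continue                    # clause satisfied
            reduced = [l for l in c if l != -lit]
            if not reduced:
                return None                 # empty clause: conflict
            out.append(reduced)
        return out

    def solve(clauses, assignment):
        # unit propagation: a one-literal clause forces that literal
        while True:
            units = [c[0] for c in clauses if len(c) == 1]
            if not units:
                break
            clauses = simplify(clauses, units[0])
            if clauses is None:
                return None
            assignment = assignment | {units[0]}
        if not clauses:
            return assignment               # all clauses satisfied
        lit = clauses[0][0]                 # branch on some literal
        for choice in (lit, -lit):
            reduced = simplify(clauses, choice)
            if reduced is not None:
                result = solve(reduced, assignment | {choice})
                if result is not None:
                    return result
        return None

    return solve(clauses, set())
```

Worst-case exponential, yet descendants of this search routinely dispatch industrial instances with millions of clauses.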
Further, it is often the case that the inputs your problem will be presented with in whatever real-world situation you are trying to solve will not, in fact, be fully general. Often, an NP-complete problem is no longer NP-complete if the inputs are known to have some kind of structure more restricted than the general case. A column by David Johnson, for example, addresses what graph problems turn out to be efficiently solvable if the input graphs are restricted to specific classes of graphs, e.g. trees, planar graphs, chordal graphs, bipartite graphs, and many more. If you know that your inputs are restricted in some way, sometimes a problem that is NP-complete in general will turn out to be efficiently solvable for those kinds of inputs.
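As a concrete example of restricted inputs taming an NP-hard problem: maximum independent set is NP-hard on general graphs, but on trees a simple dynamic program solves it in linear time. A sketch:

```python
def max_independent_set_tree(n, edges, root=0):
    """Maximum independent set, NP-hard on general graphs, solved in
    linear time on trees by dynamic programming over the rooted tree."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    incl = [1] * n  # best size if this vertex is in the set
    excl = [0] * n  # best size if it isn't
    order, parent = [], [-1] * n
    seen = [False] * n
    seen[root] = True
    stack = [root]
    while stack:  # iterative DFS to record a traversal order
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)
    for u in reversed(order):  # combine children bottom-up
        p = parent[u]
        if p >= 0:
            incl[p] += excl[u]                # child must be excluded
            excl[p] += max(incl[u], excl[u])  # child free to choose
    return max(incl[root], excl[root])
```

The input restriction (no cycles) is exactly what collapses the exponential search into two numbers per vertex.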
Subsequent posts will address what to do if your problem does turn out to be hard in practice for the kinds of inputs you really see.
The comment stream over at Coding Horror has gotten quite toxic, with those defending the original post seeming to say that since it was just supposed to be a lay introduction, it doesn't need to be exactly correct. I find that attitude pretty incomprehensible, so I thought I'd try to write a really simple intro to NP-completeness that is actually right.
There is a class of problems known as NP-complete, as well as a similar class called NP-hard that are at least as difficult; for this discussion you can consider those terms interchangeable. These problems have the property that if any of them can be solved efficiently (that is, in polynomial time), then they all can be. Further, despite much effort by many smart people, no efficient algorithm has ever been devised for any of them. This leads to the prevailing opinion that none of the NP-complete/NP-hard problems can be efficiently solved. You can know that a problem you face is NP-hard if a magic oracle that solved your problem instantly could be used to solve another problem that is known to be NP-hard in polynomial time (hundreds of such problems are known). If this is true, you have a number of options (which I will elaborate upon in future posts), but the most likely one is to devise a heuristic that will produce [hopefully] good, but not necessarily optimal, solutions.
(Aside: The title of this post might be my most obscure ever; +1 to anyone who can identify it without consulting the web.)
Recently Jeff Atwood posted about NP-completeness. I and a few other commenters complained that he made a bit of a mess of it. It's easy to be a critic, so I thought I'd give it a go myself. I intend to write a series of posts; this first one will define NP-completeness (correctly), and subsequent posts will talk about various ways of attacking NP-hard problems.
The executive summary is that there is a large class of problems that are NP-complete, none of which, as far as anyone knows, can be solved on general inputs substantially more quickly than in time exponential in the input size. There is no known polynomial-time algorithm for any of them, and smart money says none exists. This is what is meant by the "P=NP" debate; if a polynomial-time algorithm for any NP-complete problem were found, that would imply that P (the class of problems solvable in polynomial time) and NP (a larger class containing the NP-complete problems, which we'll define in a moment) are the same. This almost certainly isn't true, though, which in practice means that none of these problems can be solved in polynomial time. For a few reasons, this isn't quite as grim as it sounds, as will become clear in subsequent installments.
For a problem to be in the class NP, it must be possible to verify the validity of a solution to the problem in polynomial time. Mostly in NP we see the decision versions of the optimization problems we actually care about: "Does there exist a traveling salesman tour for these points whose length is less than 100?" rather than "Find me the shortest traveling salesman tour for these points." You can easily see that the optimization problem can be used to solve the decision problem, and is thus at least as hard. Often the decision problem is in NP (given a set of points and a tour, it's trivial to determine whether the tour is shorter than the given bound), while the optimization problem is not (there's no polynomial-time way to verify that a tour is the shortest possible, unless P=NP).
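The asymmetry is easy to see in code: checking a proposed tour against a bound is just a loop, while finding the best tour is the hard part. A sketch of the verifier (Euclidean distances assumed for illustration):

```python
import math

def verify_tour(points, tour, bound):
    """Polynomial-time verifier for the TSP decision problem: does the
    given tour visit every point exactly once, with total closed-loop
    length less than the bound?"""
    if sorted(tour) != list(range(len(points))):
        return False  # not a permutation of all the points
    length = sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )
    return length < bound
```

That single pass over the tour is all it takes for the decision problem to be in NP.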
To be NP-complete, a problem must be in NP, and it must also be possible to transform any instance of some known NP-complete problem into an instance of our problem, and to transform a solution to our problem back into a solution to that one. This is where we get the all-or-nothing nature of the P=NP conundrum: If you had a polynomial-time algorithm for any NP-complete problem, you could use it to solve every other NP-complete problem via these transformations. Devising such a transformation is at the heart of proving a problem NP-complete.
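A classic, simple example of such a transformation relates independent set and vertex cover: a set of vertices is independent exactly when the remaining vertices form a cover. The sketch below uses brute-force deciders (illustrative only, and hopelessly slow on real inputs) to show the reduction in action:

```python
from itertools import combinations

def has_independent_set(n, edges, k):
    """Brute force: is there a set of k mutually nonadjacent vertices?"""
    es = set(map(frozenset, edges))
    return any(
        all(frozenset(p) not in es for p in combinations(s, 2))
        for s in combinations(range(n), k)
    )

def via_vertex_cover(n, edges, k):
    """The reduction: an independent set of size >= k exists iff a vertex
    cover of size <= n - k does (the complement of a cover is independent)."""
    def has_cover(size):  # brute-force stand-in for a vertex cover oracle
        return any(
            all(u in set(c) or v in set(c) for u, v in edges)
            for c in combinations(range(n), size)
        )
    return any(has_cover(s) for s in range(n - k + 1))
```

A polynomial-time vertex cover oracle would, through this transformation, decide independent set in polynomial time too.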
A problem to which a known NP-complete problem can be transformed (so it is at least as hard as an NP-complete problem), but which isn't (or isn't known to be) in NP, is referred to as NP-hard rather than NP-complete. The optimization versions of many of the NP-complete decision problems fall into this category.
After you've seen a number of NP-hard problems, you start to get an intuition for whether a new problem is likely to be NP-hard or not. Be careful, though; sometimes problems that 'feel' NP-hard actually aren't, and vice-versa.
A few of the classic NP-complete problems are:
Satisfiability (aka SAT): Given a boolean formula, is there a set of assignments to its variables that makes the formula true?
Vertex cover: Given a graph, is there a set of fewer than n vertices such that at least one endpoint of every edge is in the set?
Bin packing: Given a set of objects with sizes and a collection of bins with capacities, can all the objects be packed into the bins?
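Any of these can be decided by brute force on toy instances; what makes them hard is the exponential blowup as the input grows. A sketch for vertex cover:

```python
from itertools import combinations

def has_vertex_cover(n, edges, k):
    """Decide vertex cover by brute force: is there a set of at most k
    vertices touching every edge? Exponential in n - fine for toy
    instances, hopeless in general."""
    for size in range(k + 1):
        for cover in combinations(range(n), size):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return True
    return False
```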
See Wikipedia for a longer list, though many of them aren't defined. I may have to fix that.
Garey and Johnson's Computers and Intractability: A Guide to the Theory of NP-Completeness is the book to read if you want more; it's one of the better-written CS books I've ever encountered. (I believe somewhere my wife has a photo of me sitting on the beach reading this book. I'm such a nerd.) David Johnson also wrote a column on NP-completeness for Journal of Algorithms for a number of years; it is also excellent reading, and is available online!
A New York Times article mentions offhand that a Presidential election could be won with just 22% of the popular vote, and Kottke wonders if this could possibly be true. It is. For example: Sort the states by decreasing electoral votes per voter, then greedily remove states from the bottom of the list until you are left with 270 electoral votes. If a candidate won each of those states (the ones shown in red) by just one vote, he/she would win the election with just 21.56% of the popular vote (note that this requires more than half the votes in Maine and Nebraska, which aren't winner-take-all). This might be the source of the NYT mention, but the scenario there requires about 700,000 more votes than mine, and he also missed the Maine/Nebraska subtlety. I used the same raw data as in that article.
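The greedy procedure is easy to sketch in a few lines; the state data below is invented for illustration, and reproducing the 21.56% figure requires the actual electoral-vote and turnout numbers plus the Maine/Nebraska adjustment:

```python
def minimal_winning_share(states, needed=270):
    """Greedy sketch of the minimal-popular-vote scenario: sort states by
    electoral votes per voter (descending), take states from the top of
    that list until the electoral threshold is met, and assume a bare
    majority (half the votes plus one) in each state taken. Each state is
    an (electoral_votes, voters) tuple with made-up numbers."""
    states = sorted(states, key=lambda s: s[0] / s[1], reverse=True)
    total_voters = sum(v for _, v in states)
    ev = votes = 0
    for e, v in states:
        if ev >= needed:
            break
        ev += e
        votes += v // 2 + 1  # bare majority in this state
    return votes / total_voters
```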
In today's Washington Post was perhaps the best infographic I've ever seen. It plots, as a timeline, GDP growth, unemployment, and who was president, from Hoover forward. There's an enormous amount of information in it, but you can read it like a book; indeed, I spent about an hour today doing so. Unfortunately, WaPo doesn't seem to have posted the graphic online; the only place I could find it requires registration (it looks like they require money, but if you register you can see a few articles for free).
I have long wanted to be able to create in-cell sparklines in OpenOffice Calc. There are a few commercial tools to do this, but they are made for Excel, and they are proprietary besides. I've put together enough of a solution to prove the concept. The idea is to create a font containing bars, lines, etc., and then an OpenOffice macro generates strings of characters in this font from the spreadsheet data.
To create the font, I used SVG, since it allows fairly simple specification of fonts. I generated the SVG using a Python script. I then used FontForge to convert the SVG font to a TrueType font.
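For the curious, the script was along these lines (this is a reconstruction, not my actual script; the glyph metrics, font name, and character mapping are all illustrative). SVG fonts let you define each glyph as a path, so each character becomes a bar of a different height:

```python
def make_bar_font(levels=9, em=1000):
    """Sketch of an SVG font whose glyphs are bars of increasing height:
    'a' is the shortest bar, each following letter one step taller."""
    width = 250
    glyphs = []
    for i in range(levels):
        char = chr(ord("a") + i)
        height = em * (i + 1) // levels
        # a filled rectangle as an SVG path, drawn bottom-up
        d = f"M50,0 L{width - 50},0 L{width - 50},{height} L50,{height} Z"
        glyphs.append(
            f'<glyph unicode="{char}" horiz-adv-x="{width}" d="{d}"/>'
        )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg"><defs>'
        '<font id="sparkbars" horiz-adv-x="250">'
        f'<font-face font-family="SparkBars" units-per-em="{em}" '
        'ascent="1000" descent="0"/>'
        '<missing-glyph horiz-adv-x="250"/>'
        + "".join(glyphs)
        + "</font></defs></svg>"
    )
```

Feed the output to FontForge and it will convert the SVG font to TrueType.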
Then I wrote some simple OpenOffice macros to generate characters in this font from the spreadsheet data.
To use it, install the font on your system, and import the macros into OpenOffice. Then, in a cell where you want a sparkline, enter =SPARKLINES(cell range) for a line graph or =SPARKBARS(cell range) for a bar graph.
This is pretty raw; like I said, it's just a proof of concept. I figured I'd go ahead and put it out there because I know there's a demand, and I'm hopeful that someone with a little more time to spend on it might be able to polish it up.
Update: This looks way better than my crude prototype.
Note that if you're running Internet Explorer, what you're seeing is an image and not the actual script-generated SVG, so if it is no longer 02008, it'll be wrong. This is because IE is a piece of crap, or more specifically, because it doesn't have native SVG support and doesn't allow scripts to interact with plugins.