Joe Ganley
ABOUT | BLOG RSS | RESUME | PROJECTS | EMAIL
 
 
After our house-hunting trip was unsuccessful to say the least, we are not moving to Australia. Details later.

Labels: ,

Comments (1)
 
After a great deal of back and forth, false starts, and difficult hurdles, we are moving to Australia. I am going to work for Google in Sydney. Wish me luck!

Labels: , ,

Comments (1)
 
People may have noticed that the photos of my children are no longer public on Flickr. After the third time someone with no photos of their own on Flickr favorited a picture of my 12-year-old, I decided it was time to draw the curtains. If you're a friend of ours, email me and I'll give you access.

Labels: , ,

Comments (2)
 

It's a minor thing, but this sort of thing is endemic to the suckfest that is iTunes on Windows. For security reasons, I disabled autorun on my machine. The next time I fired up iTunes, I got this alarming dialog:

autorun bad

Oh, crap. I don't want to turn autorun back on, but this seems to be saying that I won't be able to use CDs with iTunes if I don't. Oh well, I'll click No and deal with this later. The very next thing, I got this dialog:

just kidding

Just kidding! When I said iTunes wouldn't recognize CDs any more, I meant unless you hit F5. I just wanted to see if I could make you turn autorun back on.

Disgraceful.

Labels: , , ,

Comments (1)
 
I'm blogging this.

Mission accomplished: I wrote a new blog post every day in the month of February.

I love taking photos, but I hate traveling with a camera. You end up looking at everything as a photo opportunity, and in effect watching your vacation through the viewfinder of the camera instead of fully participating in it.

So it is, I've found, with trying to blog every day. You end up considering every thought and every experience through the filter of, "Can I blog about this?" As a result, you aren't fully participating in those experiences; to some degree, you become a spectator. It's an interesting way to feel, but largely not a positive one; while I'm happy to have done this, I don't expect to continue or repeat the effort. I can only assume that many journalists and writers feel like this all the time, and for that I feel bad for them.

Furthermore, the pressure to post frequently conflicts with the desire to write longer and more deeply considered work. I find that I've generated a small backlog of things that I'd like to write about, but that will require more than a day's investment to produce. Hopefully I'll get to these in the coming months.

Labels: ,

Comments (1)
 

This is the story of the biggest mistake I've made in my professional career. (I say I; there were other people involved, but it was as much my fault as anyone's.) It was early in my career, and I was tasked with building a new algorithmic optimization system from the ground up. This meant we needed a database; this is a database of geometric objects and connectivity, not the kind people normally think of when they hear the word database. This is exactly what OpenAccess, which I helped develop some years later and which had it existed back then would've allowed me to avoid this whole fiasco, is.

When I came on board, we needed a prototype done, like, yesterday. So I slapped together the quickest, dirtiest database that could possibly work, with the intention that eventually it would be replaced by something more production-worthy. That may have been a mistake too, but it's not the big one that the title refers to.

Fast-forward about a year. The prototype is done, and is producing fantastic results. We've written a real database, and it is time to move the system onto it. Here's the big mistake: I did that as open-heart surgery, completely ripping the system apart and replacing the old database calls with new ones. The data model had changed substantially, so this was a major effort, not just a syntactic change, and calls to the database were woven through the entire codebase; to make another surgical analogy, imagine trying to replace all of a person's nerves. There was a period of a couple of months in which the system did not work at all. Eventually we got it running, and then there were several weeks of debugging - just plain old bug fixes, many of them fixing bugs that had probably already been fixed in the old version but the fixes were lost with the old database calls.

Meanwhile, we weren't able to just freeze the old system in carbonite; we needed to continue improving it, competing in benchmarks, and the like. So it continued to evolve forward from the code base that had been used to begin the conversion. Had the new version ever worked properly, we would have had to make all of the same improvements to it when we were done that we had made to the original system during those months.

Because here is the worst part: The system running on top of this database was mostly Monte Carlo optimization algorithms. Such machinery is highly dependent, in unpredictable and hard-to-debug ways, on such harmless-seeming transformations as changing the units in which a size is expressed, or changing the order in which a graph vertex's edges are enumerated. There were many such differences between the old and new databases, and the new system never did produce results as good as the old one.

After it was all over, it was clear to me that this way of making this conversion was totally wrong-headed. The right way would have been to first write a bunch of regression tests. Then write a facade over the new database that had the old database's API. Move the system onto it (which is nothing more than recompiling against the new library). Then slowly migrate the code, and the facade API, to look more like the new database's API. Run the regression tests frequently, so that if you make a change that breaks things, you know what change is to blame. Eventually the facade API looks just like the new database's API, and at this point the facade is vestigial and can be removed.

This approach has two key features: There is just one version of the system, and it is always working throughout the process. It probably takes substantially more time than the open-heart surgery approach would if everything went smoothly, but how often does that happen?

So imagine how I felt listening to this week's Stack Overflow podcast, in which Jeff talks about facing the same problem with Stack Overflow. Evidently the schemas of his core tables turn out to be really wrong, and force a lot of baroque and complicated over-joining and such in the site's code. Joel suggested something almost identical to what I decided so long ago I should have done: Create a view that looks like what he wants the new table to look like but has the original tables underneath. Then, migrate the code a piece at a time from the underlying tables to the new view. Again, there remains just one, working system the whole time. To my horror, Jeff disagreed quite vehemently and said he planned to go the open-heart surgery route. He went on a bit about the romance and adventure of that sort of swashbuckling. Surprisingly, Joel acquiesced a little and said that might be the right approach for Jeff. I seriously doubt it, and I was disappointed that Joel didn't let Jeff have it with both barrels; after all, this is just a smaller-scale version of the exact same mistake Joel wrote about in Things You Should Never Do, Part I, for pretty much the same reasons. (By the way, that hadn't been written yet when I had my little misadventure.)

Just as Joel says quite unequivocally that you should never do a full rewrite of an application, I'll say just as unequivocally that you shouldn't perform this kind of massive surgery on a working application unless it is simply impossible to do it incrementally. Indeed, in the decade since then I've formed a habit of never having my software be broken for more than a few hours at a time.

Labels: , ,

Comments (1)
 

If you took the spirit of the Analogy clock, and then stripped away pieces until you couldn't take away anything else without rendering it useless, you might get something like this:

No
JavaScript

The numbers are the current hour and the next one, with the current hour's size inversely proportional to how far into the hour it currently is, and the next hour's size directly proportional to how far. Thus, if the top number is huge and the bottom one tiny, it's almost exactly that time; if the bottom one is huge, it's very nearly the next hour, and if they're the same size, it's about half past.

If I were going to spend any more time on this, I would make it animate smoothly as the bottom number moves to the top at the stroke of the hour.

Labels: , ,

Comments (2)
 

I have always been fascinated by clocks. Their mechanical design, their artistic and industrial design, the way people relate to them and to time itself, all intrigue me.

One theme that I riff on from time to time are approximate clocks. These are clocks that only show about what time it is, to varying degrees of accuracy. After all, sometimes it just doesn't really matter that much. Long ago, I thought of making a clock that only read night, morning, mid-day, afternoon, or evening, or one that shows just the current season. At a somewhat finer resolution, but still pleasing, is Laurence Wilmott's It's About Time clock, which reads the way you would tell someone what time it is; for example, "almost noon." This one shows only the day, and this really interestingly shows just the hour.

A while ago I made this one that displays nothing but the year. Here is a more accurate, but still rough, one that I threw together:

This works only on browsers that implement the CANVAS tag, and even some of those (e.g. Chrome) don't implement the text API. So if you're not seeing it, it looks like this. I'm not sure all of the context left and right of the current hour really contributes anything; I considered just showing the minutes bar with the current and next hours' text to its left and right. I have some other ideas in this same vein that I'll be posting soon.

Here's another one I made:

I'm not sure that one really tells you much that you couldn't get by looking out the window, but it looks cool. BTW, I cheated; it doesn't actually compute sunrise and sunset, but just divides the globe evenly into hours. (If you're on a non-CANVAS browser, it looks like this.)

While I'm on the subject, here are some other interesting (though not approximate) clock designs:

Labels: , , ,

Comments (1)
 

I had a dream last night that I had implemented OpenGL in JavaScript on the HTML5 canvas (yes, this is the sort of thing I dream about). Not hardware acceleration, just a 3D graphics context with an OpenGL API. For a few minutes after I woke up, this seemed like a good idea. Then my head cleared, and I realized that OpenGL is a pretty clumsy API to choose if you're not getting hardware acceleration in return, but that nonetheless a 3D graphics context for canvas, written in pure JavaScript, could be useful.

Looking around, I found a blog post that lists four efforts along these lines (no pun intended). They are pretty impressive, but it seems that performance is likely to be acceptable only for fairly simple scenes. Nevertheless, these are very much what I was thinking of doing, and their performance is at least as good as I would have expected, so kudos to their authors.

Dropping the pure-JavaScript requirement, we come to the Canvas:3D project, which is a Mozilla plugin that provides direct access to hardware acceleration through the HTML5 canvas using OpenGL. (There is a similar project for the Opera browser as well.) Having to install a plugin rubs me the wrong way, but it sounds as if this might be integrated into Firefox in the not too distant future.

Until then, probably the most sensible way to implement hardware-accelerated 3D graphics in a web browser is to use Flash.

Labels: , , , ,

Comments (0)
 
A couple of months ago, as I was preparing to buy a netbook (which I've since bought), I made myself a note to blog about how just about everything I do on a computer could be done in the cloud these days, except for writing software. Since then, a number of pretty full-fledged IDEs have sprung up in the cloud, including Mozilla's Bespin and a few others.

Labels: , ,

Comments (0)
 
Adult in Kid's Body Kid in Adult's Body
Male 17 Again
(2009)
Big
(1988)
Female Freaky Friday
(1976 and 2003)
13 Going on 30
(2004)

Labels:

Comments (0)
 

My search for a means of tracking tasks continues. In the cloud, I tried out Remember the Milk, Ta-da List, Todoist, and Toodledo. Of these, Toodledo is my favorite, but ultimately I wasn't happy with it either, and in any event I really wanted a local app rather than a web-based solution, because I want my data kept locally and I want to be able to use it when I'm offline.

In the app world, I tried the Jira bug-tracking system, which is free for personal use, but it is far heavier-weight than I need for this purpose. I tried AxoSoft OnTime and FruitfulTime TaskManager, but didn't really care for either of those. Things looks really nice, but it's Mac-only. My current favorite is Tudumo. It's simple and well-designed, and while it isn't quite perfect, I have few complaints. I like that it is designed with simplicity as a primary goal, it is GTD-friendly, it is reasonably priced (though not free), it has a generous 60-day trial period, and it gives me full control over my data (it will export CSV). If I end up sticking with it, I'll make a later post about how I think it could be improved.

Labels: , , ,

Comments (1)
 

leaf blowerA while ago an idea for a TV ad popped into my head. It was so vivid that I feel like it must be a real ad that I just don't remember having seen, but I can find no evidence that is true.

The ad opens with a wife leaving the house and telling her husband to remember that he is supposed to clean the house while she is gone. Then a montage of him goofing off. Finally he gets up, goes to the garage and gets a leaf blower, and proceeds to clean the house with it, just blowing everything loose into a pile, out the door, and into the trash. Maybe the wife comes home and thanks him for doing such a great job.

It sounds like an ad for Home Depot.

Labels: , ,

Comments (0)
 
Lately I've been studying design. The word is a broad one with a lot of meanings, but I have a very specific one in mind. Building useful, elegant solutions is both an art and a science. The scientific part is engineering, and the artistic part is design. Here are a few of the books and magazines I've been reading on the subject that I highly recommend: You can't learn to design from a book, any more than you can learn writing or music from a book. But in On Writing, Stephen King said that while no amount of teaching will turn a bad writer into a good one or a good writer into a great one, it can turn a good writer into a better one. I expect the same is true of design. I don't expect to ever be a great designer, but I think I'm a good one, and hopefully on my way to becoming a better one.

Labels: ,

Comments (0)
 
Yesterday my youngest daughter asked me, "Where did the first people come from?" I tried to explain evolution to her, in terms that a 6-year-old can understand. This is difficult to do, but then I got an unexpected assist: She said, "Oh, like in Spore!" Spore FTW!

Labels: , , , ,

Comments (0)
 

I've been looking for an electronic way to track my to-do lists, tasks, and projects. None of the online services I've tried does quite what I want, and anyway I prefer to keep the data locally. This morning I had the revelation that what I wanted was almost identical to the functionality of a bug-tracking system. I could use Bugzilla! I've used it as a client for years, but never tried to install it myself. Here is how the installation instructions begin:

  • Install Perl.
  • Install a Database Engine.
  • Install a Webserver.
And anyway, it isn't clear whether it works on Windows. Fail.

It seemed that a smart business model for commercial bug-tracking software would be to give away single-user licenses for personal use. And indeed, AxoSoft does this. However, I installed it, and it is incredibly complicated. I didn't look at it long enough to determine whether it just has way more functionality than I need, or is poorly engineered, but certainly its interface is overwhelmingly complex for the simple things I need to do.

I'd love to try FogBugz for this purpose, but even a single-user license is $200. I wouldn't mind paying, but that's far too much for what I need.

So, I keep looking. I might have to write something myself.

Labels: , , ,

Comments (3)
 

In a recent Stack Overflow podcast, Rands and the guys were talking about learning, and how nerds have a tendency to go wide and shallow - to want to learn about a million different things, not necessarily to any great depth on any of them. This is well-trodden territory for Rands, but this time it really struck a chord for me because I've started trying to resist that urge.

Professionally, I've never been all that much of a dilettante. I am continuously expanding the envelope of my expertise, but I don't bounce around like a magpie, and I'm not easily seduced by the latest toys. I have steered my career in the directions where I want to continue to learn and grow, and am currently happy to be able to explore at work a lot of new areas of analytics, visualization, web technologies, and user experience design that have been sort of in my peripheral vision for a while now. (Aside: I always bristle when I hear people say that a developer 'should' have side projects, work on open-source, and the like. I prefer to guide my career in directions that keep me stimulated, so that I get that fix from my work. Good for me, good for my career, and good for my employer; everyone wins, except I guess for whatever open-source project I might otherwise be contributing to.)

But personally, as far as hobbies go, I tend to bounce around a little more. And I've noticed something in the past year or two: The amount of readily available information about topics that I find interesting has surpassed the time I have available to digest it. Thus, I'm trying to be more selective about what topics I pursue, paring down the blog feeds I subscribe to and limiting my attention to a couple of key hobby interests: Design (which dovetails nicely with what I'm doing at work, and which is of course a huge and broad subject anyway), and making things.

We'll see how that goes. There are always a lot of shiny things around.

Labels: , , , ,

Comments (0)
 

In the mid-90's, long before that atrocious movie, my then-landlady told me about how she'd made a list of things she wanted to do in her life. I thought that was a great idea, and started making one myself. I've always felt like it was a little personal to share, and I'm still not sure that's not true, but it's pretty well established that sharing your goals is a positive thing, and in the interest of this month's blogging experiment, here it is.

As a sample, and to put even more pressure on myself to do some of these things, here are a few that I expect to do in the next few years:
Bike a century.
Build a clock from scratch (buying gears is okay, a kit is not).
Do the Bay Bridge Walk.
Learn to rollerblade well.
Make liquid-nitrogen ice cream.
Read the entire bible.

Labels: , , ,

Comments (0)
 
Still far from peaking, and I'm already annoyed with internet video. Don't get me wrong, in the right circumstances I love video. I like Hulu and Netflix on demand. But the trend toward presenting everything exclusively in video format is a real problem for me. I cannot see videos at work. I cannot watch them in the car, and I can't watch them in public unless I bring along earbuds. None of which is to say that video shouldn't be used, but it would be nice if a few stills were provided so that I could have any idea what the video contained, and in the case of videos where the picture isn't really required, it would be nice if they would provide an audio-only feed. (Yes, some do, and yes, I can extract them myself if not, so this isn't a big deal.) The Make blog, which otherwise I love, is particularly notorious about this. And if I have issues with video, I can only imagine how frustrating it is for the blind for information to trend in this direction.

Labels: , ,

Comments (0)
 

(I found this draft sitting in my account from last August.)

My daughters and I got attacked by yellow jackets on Sunday. We were traipsing around in the woods in a local park, and suddenly my youngest, about 15 feet from me, starts screaming like I've never heard her scream before. I look over, and she's standing still, with a bunch of yellow jackets on her and many more swarming around her; presumably she had stepped on a nest (they nest underground). I always worry, and I know I'm not alone here, how I might react in such a situation, but gratifyingly, before my conscious brain even figured out what was going on, I bounded over, grabbed her, and ran. I got us 50 feet away or so, and then swatted the bugs off of her; meanwhile, a bunch of them are stinging me. Once she was clear, I got rid of the ones on me. All in all, we got off pretty easy; she got 6 or 7 stings, and I got about a dozen. My middle child, who was outside of the main fray, got a couple too. Fortunately, none of us is allergic. The worst part, perhaps, is that youngest was already sort of freaky about bees, and now will surely be moreso. The pain went away pretty quickly, but two days later, they still itch.

Labels: ,

Comments (1)
 
I've been trying to figure out how long ago I first had a web presence. The earliest evidence I could find of me having a URL was in the signature line of a usenet post from September 01994. This fits with what I thought I remembered, that I first made a web site a few months before the first release of Netscape (which was December 01994). Hard to believe that was only 15 years ago.

Labels: , ,

Comments (0)
 

A more vivid case for making résumés visual occurred to me: Suppose someone told you that tomorrow you would have 60 seconds in which to convince a hiring manager to hire you, in person, with whatever props you want. Would you go in there and read to him or her from your work history? Of course not. You'd go in there with pictures, screen shots, maybe a laptop with a demo on it. Your résumé has the same goal, and should be presented similarly.

On the other hand, I found a thread on this topic over at Edward Tufte's site, and many - including Tufte himself - don't like this idea, and think that it's dangerous to break from the résumé status quo. I disagree, naturally. Once I have mine done, I'll run it by my HR contacts and see what they think.

A few more random notes on this work... First, a graphical representation of my education:

education
It's certainly more compact than the corresponding text would be, though it's not as, well, designy as I'd like. Also, I wonder at this point in my career if anyone - or at least, anyone who didn't go to Virginia Tech or the University of Virginia - cares at all about any aspect of my education other than the fact that I have a Ph.D. in Computer Science. All the more reason to present that information efficiently.

Labels: , ,

Comments (0)
 
I like LinkedIn. It is my main tool for maintaining contact information for my current and (especially) former business associates. And they've done a few things really well, like the way they made profile completion into a game. However, I feel like there's still a lot of potential left on the table. LinkedIn is not a community in the way that, say, Facebook is. If they offered compelling ways for me to interact regularly with my network, that would be enormously valuable for both me and them. It would turn LinkedIn from a tool into a community. They're trying, but they're not there yet, and I don't have any ideas yet about what would be the killer app that could make this happen.

Labels: , ,

Comments (0)
 
If you do a join on the lists of male and female first names from the 02000 census, you find that over 300 names appear in both lists. These include the ones you'd expect, such as Chris, Francis, Jamie, Kelly, Pat, and Terry, but also quite a few that were more interesting, such as Anthony, Charles, Dominique, Edward, Norman, Richard, and Thomas. It seems likely that some of these appear due to clerical error, but those surprising ones I listed appear with what would seem to be a statistically significant frequency. Women with traditionally male names seem much more common than the reverse.

Labels: , ,

Comments (0)
 

Like most people who cook, I suspect, we have a few staple recipes for which we try to keep the ingredients on hand; these are what we make if it gets to be 18:30 and we still haven't thought about what to make for dinner.

Whatever Pot Pie. This is a leftover consumer. Take whatever leftover meat you have: Chicken or turkey is best, but we've used pork or ham as well. Lunchmeat that is nearing the end of its lifespan works. Add whatever leftover or frozen or canned vegetables you want; we usually use frozen peas and carrots, and leftover broccoli if we have any. Add cubed potatoes; either microwave a potato for a few minutes and then dice it, or use frozen cube-style hash browns. Add a can of cream of whatever soup (we prefer cream of chicken, but have also used cream of mushroom, celery, or potato), and a little extra liquid: we use milk and/or white wine. Season as you like; we usually use dill. Put it in a frozen pie crust and cook for 30 or 40 minutes at 375 or so.

Curry Sausage Couscous. Cook sausage (we use turkey sausage), onions, garlic, pistachios, and curry until the sausage is fully cooked. None of the amounts are critical; the recipe we started with said 1T of curry, but I use 2 or 3. Add a little cream or milk right at the end if you wish. Serve over couscous.

Labels:

Comments (0)
 

I've had this thought for some time, but the past year working on analytics and visualization has really driven it home: Pages full of text are a terrible way to accomplish a résumé's goals. An effective business presentation or a sales brochure is very visual, full of charts and pictures. There is a good reason for this: Such communications need to convey a lot of information, not necessarily at a very high level of detail, to people who have limited attention to give, and who are intrinsically difficult to engage. A résumé's audience and purpose are similar.

I've been playing around with some concepts in this direction. For example, the following chart shows my experience with various programming languages chronologically:

languages timeline
You can tell at a glance that the majority of my experience is in C++, that I have a lot of experience with Lisp but not recently, and that recently I have been increasingly working with Python, SQL, and web technologies (by which I mean JavaScript, AJAX, CSS, etc.). I'm working on similar charts for non-language technologies (e.g. OpenGL, Qt, COM), though I'm not completely sure those don't belong in the same chart with languages, and another for the types of applications I have developed. I'm also working on a graphical timeline of where I worked, accompanied by screenshots and other visual descriptions of the work I've done. And, of course, ultimately I will make everything much better-looking than that stock Excel chart. The end result I imagine is something that looks a lot like a sales brochure (which is basically what a résumé is) - something that would be totally appropriate to print on glossy paper.

A couple of online services are small steps in this direction. VisualCV adds some charts and such to an otherwise fairly standard résumé format, and codersCV adds a timeline, but in both cases the résumé is still dominated by text, and they are also aimed at online presentation; I don't know if or how well they support print. What I'm working on will look more like something that could have been done by Nick Felton.

Labels: , ,

Comments (3)
 
The other day my 7-year-old saw me working on my blog, and informed me, "Daddy, 'blog' is not a word. 'Weblog' is a word, but 'blog' isn't." She has some authority on the subject. A while back my beloved keyboard died, and I clipped off the cord and gave it to her to play with. Now, whenever we have guests over, she brings down that keyboard and "makes them a web site," interviewing them and banging away on the keyboard.

Labels: ,

Comments (0)
 
iPod

Recently A List Apart and Rands both covered, from somewhat different angles, the notion that much of great design is about details. As far back as I can remember, seeing a message like this has made me want to scream:

You have 1 new messages.

It would have taken the programmer seconds - literally, seconds - to add the conditional that would omit the plural when the number is 1. When I write such code, I also special-case 0, writing no instead of 0.

Coincidentally, just days before I read that Rands article, I'd added code to this blog to write Today and Yesterday in place of those dates, and to omit the year when the date falls in the current year. That has long been one of my favorite little design details. A couple of my other favorites are how in iTunes, it displays an image of the model and color of iPod that is presently docked (too bad the rest of iTunes is such a mess). And how my TV, in the final seconds before the sleep timer shuts it off, says good night.

Such details sometimes cost extra, especially with physical products. But that cost is often nominal, especially in the software world, as in the 1 message example above. Look for those kinds of opportunities and take advantage of them; they can help make the difference between a product and a thing of beauty.

Labels: , ,

Comments (0)
 
I have the outlines of two novels pretty well fleshed out in my head. All that's stopping me from writing them is a lack of time; well, that and the fact that I hate the way my creative writing sounds. A friend of mine tells me that all writers hate their own writing, but the fact is, I need a lot more practice at writing fiction.

Anyway, I have this written in a notebook from maybe five years ago: A plot point in one of them hinges on a fictional piece of legislation called the Digital Surveillance Privacy Act (DSPA), which (among other things) would mandate that all cameras must make an audible beep or click when they take a picture. I had intended this to be a sarcastic commentary on ludicrous security-theater legislation. As is typical of satire, I exaggerated it beyond anything I thought likely to really happen. Well, guess what is on the House floor? The Camera Phone Predator Alert Act, which specifies exactly what I imagined. As Bruce Schneier said, this is so silly it defies comment. They even gave the law a goofier name than I did.

Labels: , ,

Comments (1)
 
Scotland, PAParsing postal addresses is surprisingly difficult. The USPS document that describes how to properly format US addresses is over 200 pages long. How many types of road (e.g. Street, Court, Avenue, etc.) do you suppose are in that document? I once tried to list all of the ones I could think of, and came up with a couple of dozen. That document contains over 200, including some strange ones such as Loaf and Stravenue. Further complicating matters, many names can serve more than one purpose depending on where in an address they are; for example, according to the 02000 census data, there are at least 68 cities in the US whose names are the names of states (including 25 cities named Washington). And of course, US rules don't apply in other countries, whose rules are all different. Supposing you had a good parser for US addresses, you'd like to be able to determine if an address was in the US. One way might be to just search for the names of other countries in your string. However, again from the census data, there are at least 85 cities in the US whose names are the names of countries elsewhere in the world (including, strangely, 17 cities named Lebanon).

It's one of those problems where you can throw together a 75% solution in a few hours, but from there each subsequent level of improvement gets increasingly difficult. It's no surprise, then, that the address parsing software out there is either really expensive, or doesn't work very well, or both. On the long list of projects I hope to get to someday is to write a good open-source address-parsing package.

Labels: , ,

Comments (0)
 
The A-TeamWhen I was a kid - and yes, I feel like a total geezer when I start sentences that way - there were a lot of shows on prime-time TV that were ostensibly adult shows, but that were wholesome enough for even fairly young children to watch. I'm thinking of The A-Team, The Dukes of Hazzard, Scarecrow and Mrs. King, Salvage One, Battlestar Galactica, Buck Rogers in the 25th Century, and so forth. Not exactly the highest-quality television, but shows that the entire family could watch together. Today, we like to let our 12-year-old watch TV with us after the two younger girls to go bed, but there is almost nothing on network TV at even 8:00pm that we will let her watch. We watch a lot of Discovery (particularly Mythbusters) and TLC. Those are great, and surely more educational than those shows I watched when I was a kid, but it would be nice if there was some fiction on TV that was suitable for young people as well as adults. I'm surprised this void exists, and I wonder how other parents fill it. Do they watch the same educational programming we do? Do they not let their kids watch network TV at all? Or are they letting their kids watch the shows that I find unsuitable, and if so, are they being too permissive or am I being too restrictive?

Labels: ,

Comments (1)
28
 
icon from mattvarone.comOne of the problems with my blog(s) is that historically I have posted sporadically, and in general not often enough. To try to get in the habit of doing so, I'm going to try an experiment: I'm going to try and make a post every day in the month of February.

You may also notice that I've redesigned: I've moved the blog back to be the front page of joeganley.com, and simplified the design quite a bit. It's a little too simple (and a little gross under the hood), but I'm working on my own blogging system which will require a complete reimplementation of the blog design, so I didn't want to spend too much time fiddling with Blogger templates right now.

Labels: ,

Comments (0)
 
Our friend Robin gave us these wonderful paintings of our two younger daughters for Christmas. Thank you, Robin!

Labels: , , ,

Comments (0)
 
I've finally gotten around to reading Norman's book The Design of Everyday Things. This book probably belongs on the list of books programmers haven't really read, which is a pity; it's a fantastic book. It covers many of the same complaints I made about alarm clocks, and it also brought to a head another of my biggest design complaints: iTunes.

I love Apple. By and large, their products are the very model of good design: Simple, intuitive, and elegant. However, even after eight versions, they've completely fumbled with iTunes; almost every time I use it, I find something new to dislike about it.

First, simply from a gestalt point of view, it's a mess. Some functionality is in the menus (including the dreaded Advanced menu), and some is hidden behind buttons bearing cryptic icons, many of which provide little clue what they do. Within the iTunes window, these icons are all over the place; when I need to perform a task, there is little affordance as to where I should look for the widget that makes it happen. The functionality of a music player is intermingled with that of a music-library manager in a most haphazard way. And the Podcasts pane is even worse than the main one.

Playlists are simply a broken model for how to organize music. Consider my family: My wife, my oldest daughter, and I each have an iPod. To try to subdivide our music library into songs that belong on each iPod, I have to have a playlist for all of us, one for my wife and I, one for my wife and daughter, one for my daughter and I, and one for each of us alone. When my two youngest girls get iPods, this is going to exponentiate; I won't need all 31 possibilities, but the number will at least double. And this doesn't even cover the case of subdividing by mood or genre. The right model is tags. I could then specify that, for example, my daughter's iPod should be stocked with songs tagged megan first, and the remaining space filled with (joe and not explicit) (she likes a lot of the same music I do). I know some of this can be done with smart playlists, but it wouldn't be easy and simple and effortlessly "just work" like it could. Also, this would further allow visualization and manipulation of my collection via tag clouds and similar devices like you see on tag-based sites like Flickr.

Initially building a library is always an unpleasant experience in iTunes. Recently I installed a new hard drive for music. I reset iTunes to point there, and asked it to consolidate my library, copying songs from anywhere on the machine to the new drive, and organizing them its own way. This took forever, and many hours into it, I got a dialog informing me that the drive was full, and that iTunes would point to the song in its original location. The buttons on this dialog are OK and Cancel. Presumably OK just acknowledges and dismisses the dialog, but what does Cancel mean? Don't add this song to the library, or abort the entire consolidation? So many hours in, I don't want to abort the whole operation, so I hit OK. Now I get the same dialog again, presumably for the next song (it doesn't tell me what song it is complaining about). I hit OK again, and again, and again. Finally, frustrated, I hit Cancel, but I continue to get these dialogs a bunch more times, and then they stop. I can't tell now if I aborted the consolidation, or if it finished. I see no recourse other than to erase everything and start over, and to hit OK for every one of those dialogs (there ended up being a few hundred of them). There is no "Yes to all" or "Don't show this again" option.

Part of the reason the disk is full is duplicates: Songs that appear in multiple places on my machine. For some reason, iTunes copies and inserts in the library every copy. I can think of no reason to include duplicates in a library, especially when iTunes is making its own copies of the songs. Mind you, I'm not talking about songs with the same title or filename; I'm talking about songs that are bitwise identical on the disk. I don't want these taking up disk space or appearing multiple times in my library (though you can turn off the latter), but I know of no way to avoid it.

Similarly, the dialog you get if you try to add songs to a playlist that are already in it. You are asked whether you want to add them multiple times, or to skip the duplicates. They should have done this the Apple way: Simply don't enable the rare corner case where someone might want the same song in their playlist multiple times, and skip them always. Or at least give the dialog a "Don't ask this again" option.

As an extra added bonus, one of its automatic upgrades deleted all of my ratings data.

It's not just me; if you Google "itunes sucks" you get almost 18,000 hits, which cover most everything I've complained about here and many other equally legitimate issues.

I'm going to move my whole collection to a Linux box running some open-source music manager (I haven't figured out which one yet). I hardly ever buy music from iTunes anyway, because I can get it DRM-free from Amazon without paying extra for the privilege. I'll miss the free songs from Starbucks, but not enough to put up with this piece of garbage any longer. Given how hard it is to move a music collection to a new platform, you can guess how likely I am to ever come back to iTunes, even if they make it awesome. Congrats, Apple - your sorry usability has lost you a customer from the iTunes ecosystem for life. It'd be shameful even for a company who doesn't have such a long history of great design. Maybe I should send them my copy of Design of Everyday Things; apparently the iTunes team hasn't read it.

Labels: , , ,

Comments (0)
 
Well, not really. But I did just discover that my Lisp in JavaScript interpreter is part of Google Chrome's test suite.

Labels: , , , ,

Comments (0)
 
NP-completeness isn't just a theoretical notion of interest only to pointy-headed computability nerds. When faced with a problem, it is important to determine whether it is NP-hard (or not) because if it isn't, you don't want to write some heuristic to produce nonoptimal solutions when it is possible to produce optimal solutions efficiently. Conversely, you don't want to waste your time trying to devise optimal solutions to an NP-hard problem, because you will surely fail.

Often, the reduction from your problem to a known NP-complete problem is obvious. When it isn't is when things get interesting. If you have trouble proving a problem NP-complete, it is often - but certainly not always - because the problem in fact isn't NP-complete. After you've done a lot of these proofs, you start to get a feel for how many degrees of freedom in the solution space might still be apprehended in polynomial time, and how many will probably lead to NP-completeness. Often, the magic boundary is between 2 and 3 degrees of freedom: 2-coloring a graph can be done efficiently, while 3-coloring is NP-complete, and likewise bipartite matching is in P while three-dimensional matching is NP-complete. However, some of the NP-completeness proofs that have been devised are incredibly complicated and nonobvious, so don't take your difficulty in proving NP-completeness as strong evidence that the problem is not NP-complete.

A productive approach is often to bounce back and forth between trying to devise an efficient algorithm and trying to prove NP-completeness. Your failures at proving NP-completeness can produce inspirations about how to attack the problem algorithmically, and your failures at devising an efficient algorithm can lead to insights into how to establish NP-completeness. One of my favorite examples from my own work was to solve a system of rectilinear orientation constraints. You had a bunch of objects in two dimensions that could have one of the eight axis-parallel orientations, and a set of constraints between pairs of them: That they must have the same orientation, have orientation mirrored about one or the other axis, have orientation rotated 90 degrees from one another, or any compatible combination of these. My initial attempts to prove it NP-complete focused on looking at it as an 8-coloring problem, but the problem didn't seem quite general enough to encode a generic graph coloring problem as a system of these constraints. Ultimately these failed efforts to prove NP-completeness led to the crucial insight to solving the problem efficiently: That the constraints could be viewed as three independent systems of constraints with respect to the three elements of the orientations: mirroring about x, mirroring about y, and 90-degree rotation. Each of those could be formulated as a graph 2-coloring problem independent of the other two problems, and thus efficiently solved. The independence of these three 2-coloring problems was the source of the difficulty in proving NP-completeness, and I'm certain I made that cognitive leap as a result of my attempts to prove the problem NP-complete.

Once you've determined that your problem is NP-complete, that still doesn't necessarily mean it is hard in practice. NP-completeness means that the problem is hard in the worst case and for general inputs. There are some NP-complete problems for which algorithms have been devised that, while technically heuristics -- they cannot guarantee to find optimal solutions for all problem inputs -- nonetheless in practice almost always do find optimal solutions to problem instances that arise in the real world. This is true, for example, of the satisfiability problem: Determining whether there is an assignment of values to boolean variables that makes a particular logic formula evaluate to true.

Further, it is often the case that the inputs your problem will be presented with in whatever real-world situation you are trying to solve will not, in fact, be fully general. Often, an NP-complete problem is no longer NP-complete if the inputs are known to have some kind of structure more restricted than the general case. A column by David Johnson, for example, addresses what graph problems turn out to be efficiently solvable if the input graphs are restricted to specific classes of graphs, e.g. trees, planar graphs, chordal graphs, bipartite graphs, and many more. If you know that your inputs are restricted in some way, sometimes a problem that is NP-complete in general will turn out to be efficiently solvable for those kinds of inputs.

Subsequent posts will address what to do if your problem does turn out to be hard in practice for the kinds of inputs you really see.

Labels: , ,

Comments (0)
 
The comment stream over at Coding Horror has gotten quite toxic, with those defending the original post seeming to say that since it was just supposed to be a lay introduction, it doesn't need to be exactly correct. I find that attitude pretty incomprehensible, so I thought I'd try to write a really simple intro to NP-completeness that is actually right.

There is a class of problems known as NP-complete, as well as a similar class called NP-hard that are at least as difficult; for this discussion you can consider those terms interchangeable. These problems have the property that if any of them can be solved efficiently (that is, in polynomial time), then they all can be. Further, despite much effort by many smart people, no efficient algorithm has ever been devised for any of them. This leads to the prevailing opinion that none of the NP-complete/NP-hard problems can be efficiently solved. You can know that a problem you face is NP-hard if a magic oracle that solved your problem instantly could be used to solve another problem that is known to be NP-hard in polynomial time (hundreds of such problems are known). If this is true, you have a number of options (which I will elaborate upon in future posts), but the most likely one is to devise a heuristic that will produce [hopefully] good, but not necessarily optimal, solutions.

(Aside: The title of this post might be my most obscure ever; +1 to anyone who can identify it without consulting the web.)

Labels: , ,

Comments (0)
 
Recently Jeff Atwood posted about NP-completeness. I and a few other commenters complained that he made a bit of a mess of it. It's easy to be a critic, so I thought I'd give it a go myself. I intend to write a series of posts; this first one will define NP-completeness (correctly), and subsequent posts will talk about various ways of attacking NP-hard problems.


The executive summary is that there is a large class of problems that are NP-complete, all of which probably cannot be solved on general inputs more quickly than in time exponential in the input size. There is no known polynomial-time algorithm for any of them, and smart money says none exists. This is what is meant by the "P=NP" debate; if a polynomial-time algorithm for any NP-complete problem were found, that would imply that P (the class of polynomial-time solvable problems) and NP (the class of NP-complete problems, which we'll define in a moment) are the same. This almost certainly isn't true, though, which in practice means that none of these problems can be solved in time that is not exponential in the size of the input. For a few reasons, this isn't quite as grim as it sounds, as will become clear in subsequent installments.

For a problem to be in the class NP, it must be possible to verify the validity of a solution to the problem in polynomial time. Mostly in NP we see the decision versions of the optimization problems we actually care about: "Does there exist a traveling salesman tour for these points whose length is less than 100?" rather than "Find me the shortest traveling salesman tour for these points." You can easily see that the optimization problem can be used to solve the decision problem, and is thus at least as hard. Often the decision problem is in NP (given a set of points and a tour, it's trivial to determine whether the tour is shorter than the given bound), while the optimization problem is not (there's no polynomial-time way to verify that a tour is the shortest possible, unless P=NP).

To be NP-complete, a problem must be in NP, and it must also be possible to transform any instance of some known NP-complete problem into an instance of our problem, and to transform a solution to our problem back into a solution to that one. This is where we get the all-or-nothing nature of the P=NP conundrum: If you had a polynomial-time algorithm for any NP-complete problem, you could use it to solve every other NP-complete problem via these transformations. Devising such a transformation is at the heart of proving a problem NP-complete.

A problem that can be transformed to an NP-complete problem, but isn't (or isn't known to be) in NP, is referred to as NP-hard rather than NP-complete. The optimization versions of many of the NP-complete decision problems fall into this category.

After you've seen a number of NP-hard problems, you start to get an intuition for whether a new problem is likely to be NP-hard or not. Be careful, though; sometimes problems that 'feel' NP-hard actually aren't, and vice-versa.

A few of the classic NP-complete problems are:

  • Satisfiability (aka SAT): Given a boolean formula, is there a set of assignments to its variables that makes the formula true?
  • Vertex cover: Given a graph, is there a set of fewer than n vertices such that at least one endpoint of every edge is in the set?
  • Bin packing: Is it possible to pack some sized objects into a set of sized bins?
    See Wikipedia for a longer list, though many of them aren't defined. I may have to fix that.

    Garey and Johnson's Computers and Intractability: A Guide to the Theory of NP-Completeness is the book to read if you want more; it's one of the better-written CS books I've ever encountered. (I believe somewhere my wife has a photo of me sitting on the beach reading this book. I'm such a nerd.) David Johnson also wrote a column on NP-completeness for Journal of Algorithms for a number of years; it is also excellent reading, and is available online!

    Labels: , ,

    Comments (2)
  •  
    A New York Times article mentions offhand that a Presidential election could be won with just 22% of the popular vote, and Kottke wonders if this could possibly be true. It is. For example: Sort the states by decreasing electoral votes per voter, then greedily remove states from the bottom of the list until you are left with 270 electoral votes. If a candidate won each of those states (the ones shown in red) by just one vote, he/she would win the election with just 21.56% of the popular vote (note that this requires more than half the votes in Maine and Nebraska, which aren't winner-take-all). This might be the source of the NYT mention, but the scenario there requires about 700,000 more votes than mine, and he also missed the Maine/Nebraska subtlety. I used the same raw data as in that article.

    Labels: , ,

    Comments (0)
     
    In today's Washington Post was what is perhaps my favorite infographic I've ever seen. It plots, as a timeline, GDP growth, unemployment, and who was president from Hoover forward. There's an enormous amount of information in it, but you can read it like a book; indeed, I spent about an hour today doing so. Unfortunately, WaPo doesn't seem to have posted the graphic online; the only place I could find it requires registration (it looks like they require money, but if you register you can see a few articles for free).

    Labels: , ,

    Comments (0)
     
    Spreadsheet screenshot

    I have long wanted to be able to create in-cell sparklines in OpenOffice calc. There are a few commercial tools to do this, but they are made for Excel, and also they are proprietary. I've put together enough of a solution to prove the concept. The idea is to create a font containing bars, lines, etc., and then an OpenOffice macro generates strings of characters in this font from the spreadsheet data.

    To create the font, I used SVG, since it allows fairly simple specification of fonts. I generated the SVG using a Python script. I then used FontForge to convert the SVG font to a TrueType font.

    Then I wrote some simple OpenOffice macros to generate characters in this font from the spreadsheet data.

    To use it, install the font on your system, and import the macros into OpenOffice. Then, in a cell where you want a sparkline, do =SPARKLINES(cell range) for a line graph or SPARKBARS(cell range) for a bar graph.

    This is pretty raw; like I said, it's just a proof of concept. I figured I'd go ahead and put it out there because I know there's a demand, and I'm hopeful that someone with a little more time to spend on it might be able to polish it up.

    Labels: , ,

    Comments (4)
     
    A little scrap of JavaScript/SVG that I threw together, with an obvious inspiration:

    Note that if you're running Internet Explorer, what you're seeing is an image and not the actual script-generated SVG, so if it is no longer 02008, it'll be wrong. This is because IE is a piece of crap, or more specifically, because it doesn't have native SVG support and doesn't allow scripts to interact with plugins.

    Labels: , , ,

    Comments (0)
     
    My daughter brought home a fun math problem last week; it took a little mental effort to solve, and I haven't found a solution on the web, so I thought I'd post one.

    The problem is this: What is the largest n such that 1005! is evenly divisible by 10n? Put another way, how many 0's does the expansion of 1005! end with?

    Highlight the following blank space for the solution:

    Imagine writing out the expansion of 1005!, and then replacing each number with its prime factorization. The prime factors of 10 are 5 and 2, so our question is equivalent to asking how many (5,2) pairs this expansion contains. It is easy to see that it contains far more 2's than 5's, so the question becomes: How many 5's are in the expansion? Every number in the original expansion of 1005! that is divisible by 5 contributes a 5 in the prime-factorization expansion; there are 201 of these. However, every number that is divisible by 52, i.e. 25, contributes a second 5; this adds 40 more. Every number that is divisible by 53, i.e. 125, contributes a third 5; add 8 more. Finally, the one number that is divisible by 54, i.e. 625, contributes a fourth 5, adding one more and bringing the total count to n = 250.

    Labels: ,

    Comments (0)
     
    I've long had a good handle on the core JavaScript language, but the other night I set out to understand the DOM and the event-handling model, and the result was this toy sudoku app. If any more seasoned JavaScript programmers see anything stupid in there, please comment.

    Labels: , ,

    Comments (0)
    T
     
    Paul Graham just posted a history of the T language written by Olin Shivers, and it got me reminiscing. T was a Lisp/Scheme dialect. I took a compilers class my first year in grad school (ulp - 17 years ago!), and the semester project was to write a compiler for a subset of Pascal (all that was missing, IIRC, was user-defined types) into object code for a hypothetical microprocessor (for which we were provided a virtual machine). At the time, I was ramping up on a master's research track around optimizing the compilation of purely-functional languages like Haskell for doing numerical computations. My C chops were very strong, I figured I wasn't going to learn much more about C programming from this project, and my research would need me to know T, so I got the bright idea of writing the compilers project in T. I brought this idea to my compilers professor; he thought this was a very bad idea, and tried to talk me out of it, but ultimately capitulated with the understanding that I would be free-climbing - if I fell down, he wasn't going to be sympathetic. (This is a man who once said that the only legitimate excuse for turning in an assignment late is your own death.) I'd never really programmed in Lisp before, so learning the language (as any Lisp programmer knows, a major change in mindset) while under the gun on a long, difficult assignment was quite a challenge. However, ultimately I got it done; as far as testing could determine, it worked perfectly, and it had about 30% as many lines of code as my classmates' C implementations. I had fun, I learned a lot about both Lisp programming and writing compilers, and it brought me some small renown in the department (I presented on the project to a number of compilers and programming languages classes). Later, when I told this story in job interviews, my interviewers seemed to appreciate the accomplishment, but especially the attitude involved in bucking the system in this way. Not long thereafter, my would-be advisor left the department, and I ended up doing algorithms research instead; this would end up being a career, and I love it, but I still feel a little wistful about languages/compilers research. Still today, T is my favorite language I've ever written in, though Python is closing fast. BTW, in the unlikely event that anyone is interested, here is the source code for the compiler.

    Labels: ,

    Comments (0)
     
    Paul Graham lists startup ideas that he would like to fund, but the one I'd most like to see isn't on the list: personal information management. I'd like a service that puts all of my info in one place - contacts, bookmarks, notes, photos, etc. There would be a wide variety of ways to get information into the system: By email, as bookmarks, by phone, as paper, etc. It would be indexed by content, and also optionally by tagging it manually. There would be a (login-protected) search engine that I could use to find my information, and it would also be mirrored onto my local machine so that I could access it when offline. It would also serve me, via an email or RSS digest, changes to my information that it finds elsewhere, and content it finds on the web that might be interesting based on similarity to my own info. One could cobble together something that does most of this from various bits and pieces already available, but if someone consolidated all of that functionality into a simple, unified user experience, I'd pay for it.

    Related story

    Labels: ,

    Comments (1)
     
    From The Economist, another interesting infographic ruined by an incredibly distracting background image.

    Ow! My brain!
    ->
    Much better.
    Perfection is achieved not when there is nothing more to add, but when there is nothing more to take away. Ironically, the title of the piece is "The Bare Necessities."

    Update: Here's another, only worse: This time, the 'background' graphic is covering part of the chart.

    Labels: ,

    Comments (0)
     
    Wordle creates really nice-looking tag clouds. Here is one using the entire text of my dissertation as input:

    Dissertation tag cloud

    (Click to enlarge.)

    Labels: ,

    Comments (0)
     
    A trendy thing these days seems to be to represent your résumé as a tag cloud. Just for kicks, here's mine:
    created at TagCrowd.com

    Labels: , ,

    Comments (0)
     
    The other day my daughter was watching me send a text message, and how when two successive letters are on the same key I had to pause a moment until it timed out, and she wondered aloud what was the longest word that did not have two successive letters on the same key. Searching the web, I was surprised not to be able to find an answer, so I fed the 480K-word dictionary on my Linux box through a few tr and grep commands and a little Python code to find the answer, plus a number of other special words.
    • The longest word that is texted with no repeated keys is dichlorodiphenyltrichloroethane (that's the full name of DDT).
    • The longest word that is texted with no repeated keystrokes (i.e. not only no repeated keys, but each letter is the first choice on its key) is pagatpat (a variety of mangrove tree).
    • There is a word that repeats no keys (even non-consecutively): idolatry.
    • The longest words that are texted with only a single key are bacaba (a variety of palm tree) and deeded (the dictionary also contains deedeed, but I can't find a definition and believe that might be a typo of deeded).
    • Several words contain a sequence of seven of the same key, including: fiddledeedee, homonomous, mononomial, and nonmonogamous.

    Labels:

    Comments (0)
     
    I just had a post on Slashdot about finding great programmers for the company where I work. My plug for the company got (understandably) dropped, so I'll put it here where hopefully some /. readers will end up.

    The company is White Oak Technologies. It's a great place to work, and we're hiring. Be surrounded by really smart people, work on hard, interesting problems, and get better-than-market compensation and benefits. What we do is data exploitation work on very large volumes of data. It's really interesting stuff. If you're really good, exactly what languages you prefer aren't all that important; we have a lot of flexibility in that regard. However, among the technologies we use most are Python and C++. Not to put too fine a point on it, our standards are very high, so bring your A game.

    We are in the DC Metro area. Due to the nature of the business, telecommuting is not possible. Also, you must be a US citizen and will be subject to a clearance investigation.

    If you're interested, check out our openings (be sure to tell them how you found us!) or send me email.

    Labels:

    Comments (0)
    I saw where someone had posted their academic genealogy, so just for fun I looked up my own. Here it is, working backward: me - James Cohoon - Sartaj Sahni - Ellis Horowitz - George Collins - Barkley Rosser - Alonzo Church.

    Labels: ,

    Comments (0)
     
    Via the Beautiful Code blog, a couple of nice sorting results: First, the proportion extend sort, a recursive sort that allegedly outperforms quicksort in both theory and practice. (Warning: Their code is pretty hard to understand, undercommented and obfuscated.) Second, the Python distribution contains a very nice writeup of the sorting routing Tim Peters used to implement listsort. The filename is listsort.txt, and it describes a nicely engineered adaptive mergesort with some elegant tricks to make it perform better on lists that already contain some order.

    Labels: ,

    Comments (0)
     
     
    We went to Las Vegas last week (my first time there), and the one casino game that surprised me was War. Yes, just like the one you used to play as a kid until you realized that it is boring. I walked past a table, and then doubled back in disbelief that this was what I thought it was. This is about the most brain-dead card game I can think of, far dumber than anything else in the casino (except, of course, the slot machines).

    Labels: ,

    Comments (0)
     
     
    Years ago, the first time I worked at Cadence, my username was "ganley". I left to go to a startup, where my username was "joe", and when Cadence acquired that startup I kept the username "joe". I quickly discovered that this was a mistake, since spammers send to name@domain for every common name. Only much later did I discover that "ganley" would have been better for another reason: I worked on an open source project, and my username still appears in some of the CVS tags in the source code. It would be much cooler if it was "ganley" instead of "joe" - after all, "joe" could be anyone, right?

    Labels:

    Comments (0)
     
    Consider the problem of choosing a random element from a linked list whose length you do not know. Obviously, you could count the length of the list, then pick a random number between 1 and length, and then walk to that element, but there is a clever way to do this in one pass.

    The algorithm is as follows:

    Element choice;
    int count = 0;
    for (Element e = head; e != NULL; e = e->next) {
       if (rand01() <= (1.0 / ++count)) {
          choice = e;
       }
    }

    You can easily show that when this finishes, choice is equal to any given element with probability 1/n, where n is the number of elements: The probability of setting the ith element to be the new choice is 1/i. The probability of not selecting any of the subsequent elements to be the new choice is the product as j goes from i+1 to n of (1 - 1/j), which if you work out the math ends up as i/n. Multiply the two probabilities together and you get 1/n.

    Labels: ,

    Comments (0)
     
    If I were inclined to start a company, something that I've always thought about is a company that applies Apple-style design sensibilities to relatively mundane consumer products. My company's first product would be an alarm clock. I've probably owned a dozen alarm clocks in my life, and used a couple of dozen more, and every single one has had an atrocious interface. Even the best of them lacked what I would consider some basic requirements. Few of them leave me confident that I've set the alarm properly and it is actually going to go off when I intended it to. It is difficult for me to turn them off in the dark, much less be able to actually set the alarm in the dark. For those of us not inclined to snooze, I've never seen one with a disable button as readily accessible as the snooze bar, but that kills the alarm until the same time tomorrow. And if there are multiple alarms or weekday-only alarms or such, forget it - the interface is so difficult that you'll probably never use those features. Someday...

    Update (May 02007): Hmmm, WidgetStation might be just the thing. Looking forward to its release.
    Update (Mar 02008): WidgetStation seems to be indefinitely delayed, but Chumby seems to be exactly what I want anyway. It's a little pricey ($180), but I still may have to buy one.

    Labels:

    Comments (3)
     
    A scary article in which the author takes an airline ticket stub out of the trash and uses it to find all sorts of personal data on its owner. I'm pretty paranoid about shredding, but I admit to occasionally considering tossing my airline stub. And I routinely toss the tag that they put on your luggage, and I bet that contains much (all?) of the same data. No more!

    I disagree with the author's claim that all of this data collection is bad, however; the data itself isn't the problem, it's the fact that this data is used as authentication of who you are. Experts have long derided security through obscurity, and security techniques that rely on your personal data being secret are doomed in today's world.

    Labels:

    Comments (0)
     
    A number of times I've been faced with what I call the "Chinese menu enumeration" problem - you know, one from column A, one from column B, etc. Given an array num of N entries where each entry contains the number of items in that column, enumerate [the indices of] all of the possible meals. Equivalently, list every N-digit number in which the i'th digit is less than num[i]. This is not a hard problem, but it's all too easy to write ugly code for it. My latest attempt ended up like this:

    int val[N];
    for (int i = N - 1; i >= 0; --i) val[i] = 0;
    for (int inc = 0; inc >= 0;) {
       // do whatever you need to do with val[]
       inc = N - 1;
       while (inc >= 0 && ++val[inc] == num[inc]) {
          val[inc--] = 0;
       }
    }

    Pretty succinct, but I can't shake the feeling that it could be obfuscated simplified further. Any takers?

    Labels:

    Comments (1)
     
    One of the more comprehensive collections of bit-twiddling hacks I've seen.

    Labels:

    Comments (0)
     
    Yesterday I was reading Paul Graham's site, and wondered what other Lisp hackers might be around, so I went to one of the Lispier pages and Googled 'similar links.' To my surprise, one of the first hits was my friend Lawrence. That led me to revisit my friend Brad; he was on the programming team with me, and he and Lawrence were roommates (of each other, not of me). Somewhere in here I also lost half an hour to Drew's site; I don't know him at all, but he's a friend of Lawrence's. (I particularly liked the manually-raytraced sphere.)

    The good that comes of this sort of jaunt is that Lawrence, Brad, and I are making plans to get together when I'm in California next month; it's been an overly large number of years since I've seen them.

    Labels:

    Comments (0)
     
    A coworker and I were discussing the relative rarity of people like me, namely PhDs who are great software engineers and who are good at actually getting things done. He conjectured that there would be a strong correlation between those traits and how quickly one finished their PhD. Certainly that fits in my case; I did my PhD in 2.5 years. I'm curious as to how strong this correlation might be, and I'll be collecting data about it going forward.

    Labels:

    Comments (0)
     
    Reader Bennet Yee pointed out that my Lisp in JavaScript interpreter failed on his Y-combinator code, as follows:
      ((lambda (x y) (x x y)) (lambda (me n) (cond ((< n 1) 1) (t (* n (me me (- n 1)))))) 4)
    Naturally, I suspected a bug in the lambda machinery, but it turned out to be much simpler: The symbol-table lookup code had a conditional that depended on the value of the variable being looked up, and JavaScript (as in C/C++) equates 0 to false. Thus, it failed to find the variable when its value was 0. I hadn't caught this before because I habitually write that terminating condition as the slightly more efficient "< n 2", and thus it terminates before n reaches 0.

    This task also led me to revisit the Y combinator, which is really beautiful. For those who don't want to wade through the details, it's a clever mechanism for implementing a recursive call to a lambda (i.e. nameless) function.

    Labels: ,

    Comments (0)
     
    My wife called me into the room the other day because the Numb3rs episode that she was watching had mentioned Steiner trees (the principal subject of my Ph.D. dissertation). As usual for that show, they got the math half wrong and half ornamented with extraneous jargon, but still it was a tiny thrill to see Steiner trees mentioned on TV.

    Labels: ,

    Comments (0)
     
    Annoyed at having to look up, yet again, one of the couple of C++ operator precedence rules that I don't have memorized (this time, whether equality or ternary is tighter), I took the C++ precedence rules from here and formatted them into a 3x5 reference card. Here it is in PDF form, or in Word if anyone wants to tweak it. (My graphic-design skills are fairly meager, so if anyone makes substantial improvements, please let me know.)

    Labels: ,

    Comments (0)
     
    Recently I wrote some code in which a templatized class declared one of its template parameters a friend. This worked fine in Visual C++, and I was surprised to find that GCC gave the rather unambiguous error, Template parameters cannot be friends. I was even more surprised to find that, indeed, this is not GCC being overly picky - it is not legal C++! There are extremely crufty workarounds, but in general it's clearly a bad idea to rely on platform-specific violations of the C++ standard in production code.

    Labels: ,

    Comments (0)
     
    SnowflakeIt struck me the other day that fuse beads could be a good way to teach my daughter about cellular automata, so we used them to build this snowflake cellular automaton.

    Labels: ,

    Comments (0)
     
    An interesting essay about why developers leave. In particular, I found the analysis in the morale section very interesting. This is one place where I think Google really has it right: Giving your people time to work on projects of their choosing pays huge dividends in morale. (Via particletree.)

    Labels:

    Comments (0)
     
    Joel Spolsky's latest essay is on coding standards, and in particular why Hungarian notation is good. I've always liked Hungarian and similar notations, but as I read this article I found myself thinking that these classifications could be not only denoted but also enforced by making them classes instead of just variables of the same type with differently-denoted names. This requires a little finesse with simple types such as int, but it can be done. The result not only denotes the same information as Hungarian (and less cryptically) in the class names, but the compiler even enforces the relationships for you. No need to "watch" for types with mismatched notation (as in Spolsky's essay); it's part of the type system. Which is not to say that I no longer find Hungarian notation useful, but that in many situations and in many languages it should be supplemented (or in some cases replaced) by type definitions.

    Labels:

    Comments (0)
     
    Maciej Ceglowski writes an excellent, half-facetious rebuttal to Paul Graham's Hackers and Painters essay. I didn't buy this analogy when I read it, and still don't, and Ceglowski does a great job of enumerating why it doesn't fly. I've always thought of programmers more like 'design and build' contractors. The best of them are both good architects and good builders, and like programmers who are good at both of these things, they are rare. Many more are either good architects and spotty builders (their buildings look nice and function well but are poorly constructed) or vice-versa (their buildings are well put-together but don't flow well or are unattractive), and of course some aren't good at either.

    Labels:

    Comments (0)
     
    I just ran across an Eric Sink article that I've somehow previously missed called The Hazards of Hiring. There are many good points in this article; in particular, he does a good job on one of my favorite points: in his words, The "very best" people never stop learning. ... They know their own weaknesses, and they're not insecure in talking about them. Many people seem afraid to say "I don't know." I love to say that, because it means we've just identified a gap in my knowledge, and almost always, it means that we're just about to fill that gap. That is the single most fulfilling thing in life, work-wise anyway.

    Another interesting point in that article is his skepticism toward people with advanced degrees. I see this a lot, and in fact I spend a lot of my time in interviews trying to convince people that despite my having a Ph.D., I am not some sort of ivory-tower computer science researcher. First and foremost, my love is for writing software. Sure, I love to sink my teeth into a really hard CS problem from time to time, but there is also fun to be found in all of the other facets of the software development process, even those that many consider mundane. I really enjoy positions that offer a lot of variety, from hard algorithmic problems to user interface design to library architecture to low-level infrastructure.

    Labels:

    Comments (0)
     
    Among the entries in Google's recent usenet timeline is the original Tom Duff post describing Duff's Device, a loop unrolling technique that exploits a surprisingly legal C quirk. I remember seeing this for the first time in a job interview in 1990 or so, when my interviewer wrote the code on the board and asked me what it did. I still think it's quite cool; though compiler optimization technology has probably rendered this unnecessary, it's still a great illustration of the sorts of things that are made possible by the "anything goes" design philosophy of C.

    Labels:

    Comments (0)
     
    Somewhat in the same spirit as that last post, another great engineering story, this one from Damien Katz. This story illustrates perfectly why it makes me so crazy when job postings demand, "10 years experience with XYZ." Clearly if you can get a really great person who also has XYZ experience, that's optimal, but if you have to choose, you want a really great programmer (who doubtless can learn XYZ very quickly) over a mediocre programmer with lots of XYZ experience. (Link via Ned Batchelder.)

    Labels:

    Comments (0)
     
    I just love this story of the Macintosh Graphing Calculator. It was part of a project that got cancelled, but its authors continued to sneak into Apple to work on it for free, and eventually got it shipped with MacOS. This is an extreme and wonderful illustration of the dedication a good engineer can have to an important project. I just wish it was available for Windows!

    Labels:

    Comments (0)
     
    Ned Batchelder tells a C++ debugging story, the punchline of which is this: If a C++ constructor throws an exception, then that object's destructor is called. I encountered this in my own work, and it turns out exactly the opposite: If the constructor throws, the destructor is not called. These stories have the same moral, though: Constructors really shouldn't throw exceptions. Update May 2005: I reported my findings to Ned, and he wrote an update that describes in detail the precise semantics of what happens when a C++ constructor throws an exception. In short, the destructors of every base class and member whose constructor successfully completed will be called; note that this necessarily precludes the destruction of the class whose constructor threw the exception, since that constructor obviously did not complete execution.

    Labels: ,

    Comments (1)
     
    Via Ned Batchelder I learned about Darcs, which is a fully peer-to-peer source code control system. This sounds ideal for home hobby work, as it doesn't require a dedicated server machine. Also, it surprised the heck out of me that it's written in the [almost] purely functional language Haskell!

    Labels:

    Comments (0)
     
    I agree pretty much 100% with these Hallmarks of a Great Developer. In particular, the first point ("plans before coding") resonated with me. This is a former weakness of mine, and the area where I've made the most improvement in the past couple of years, since joining my current project. The point is even better made by Michael Abrash in his Graphics Programming Black Book (chapter 10) and by Ned Batchelder in his diamond cutter analogy.

    Labels:

    Comments (0)
     
    Red Hat's Alan Cox on writing better software. There is a lot of good stuff in there, but my favorite quote is, "If it does everything it's complicated, and if it's complicated, it's broken."

    Labels:

    Comments (0)
     
    Some interesting observations from Brian Cantrill on The Economics of Software.

    Labels:

    Comments (0)
     
    In last month's C/C++ Users Journal, there was a clever article that described using templates to write loop constructs that are unrolled at compile time. In my last project I wrote some multiobjective optimization code that was full of loops that went from 0 to 2 or 3; this mechanism would've been great for that. On the other hand, five years ago I highly doubt that all the compilers I supported would have processed this kind of fancy template code correctly.

    Labels: ,

    Comments (0)
     
    Michael Abrash has a series of articles appearing in Dr. Dobb's currently. The subject of the articles is code optimization, but the thing I found most interesting is the code being optimized. It is a software-only rasterizer. I wouldn't have thought it possible (even for Michael Abrash) to get DirectX7 performance in software. But most of all, it's noteworthy that state-of-the-art graphics hardware and drivers still have so many bugs and portability problems that Abrash finds it worthwhile to spend a lot of time and sacrifice two generations off the state of the art in order to avoid them. I certainly feel his pain, and I'm jealous that he is able to focus on a single type of processor and thus achieve this level of optimization.

    The other thing that caught my eye was an illustration of how on modern processors, optimization can be extremely nonobvious. The example was a subproblem I've solved before: Given a value x between 0 and 1 and an array of ascending values in the same range, figure out which pair of values x falls between. I used a binary search. It turns out that on the x86, binary search is predicted poorly by the processor's branch prediction logic, while in a linear search every compare but the last one is predicted correctly. A mispredicted branch is very expensive. So, for a small number of bins a linear search is 30-50% faster (!), and the problem size has to reach 64 bins before the improved asymptotic efficiency of the binary search actually beats the linear search in real life. My problem had at most a few tens of bins, and the search was deep in an inner loop, so this would have been useful information to have known.

    Labels: ,

    Comments (0)
     
    Ned Batchelder discusses rewriting assert conditions to make them more readable. Since one of the purposes of assertions is as documentation, this is good advice.

    Labels:

    Comments (0)