I'm excited to start my new job at YouTube in a few weeks. I'll manage the engineering team building the data warehouse for usage metrics.

I like that YouTube is important. It's firmly a part of our culture and I'm sure it will be how my kids watch video. YouTube's impressive statistics are the result. You don't see usage like that without a bunch of hard problems, and hard problems attract bright people. Indeed that's the clincher for why I'm looking forward to working there. People vote with their feet, and I have a lot of friends who have opted for Google, and YouTube specifically. They tell me that it's a great place to work.

YouTube is one of the world's foremost platforms for social commentary, education, and free speech. And it's plenty of entertainment too. Sounds like fun.

Thick Apps Still Lose

Microsoft Excel 2016 Error Message

Thick apps won mobile. Fine.

On laptop (and desktop) it's not so clear. What is better, thick or thin? I tend to live mostly in thin land, although I use some thick apps regularly, like Twitter's Mac client and Apple Photos.

Every so often I give a big native app a try: Excel instead of Google Sheets, instead of Gmail, Reminders instead of the barebones Tasks built into Gmail. (I can't bring myself to try Word). But it's disappointing to see how those fancy apps keep shooting themselves in the foot!

Take for example this Excel error message. Excel whined that it couldn't verify my subscription the first time I ran it untethered (version 15.11.2, for what it's worth). Sure, you can click through the warning, but would a newbie know to do that? At best off-putting, at worst downright disorienting. Why warn me of this at all? And why in a modal that stops me dead in my tracks?

It seems thick apps should win. They rock the unplugged use case. An even better situation is flaky networks -- tethered, conference WiFi, travelling. UIs deal notoriously poorly with intermittent or partial outages. A thick client, relying on that connection only for hitting APIs, can hide the network.

Another place they should shine is the UI itself. They should be fast, beautiful, and featureful. Too often they're not. For example I find to be clunky, difficult to customize, and its keyboard shortcuts few and poorly done. Gmail is pretty good!

Finally there's the upgrade problem. Thick apps need conscious effort from their users before their work sees user time and they get feedback. And that's what drives innovation. Long cycles mean slower (less) invention. One example I love is Gmail's "undo send" feature. Boy, you sure do miss that when you need it and it's not there! That should be on every thick client by now, but I don't think it is. I do know that Gmail has it and still doesn't.

Maybe the Internet can help. Look at Chrome with its awesome auto updates. What makes this work is solid engineering and exceptional quality control. I've never seen behind the Google curtain, but I bet there's no magic, just a lot of good engineering that leads to good software. Like: good design and code reviews, tons of test coverage across many scenarios, diverse and well-instrumented canaries, and thorough performance and resource use testing. If Google didn't do all of that so well, then we wouldn't accept frequent pushes. Without the frequent upgrade cycle, Chrome's feature cycle would languish.

Electron is another bright spot. This is the framework that gives Slack and GitHub's thick clients their fit and finish. It makes these feel like true native apps, even though they are mostly web controls with JavaScript under the covers. Right-clicking still doesn't do what I want, and text controls are finicky, but it's close. And what those rough edges buy you, and the software producer, are frequent, reliable, and clean upgrades.

My natural preference would be for thick apps. If they were done well, I'd use them.

My Next Job


I left my last job a few weeks back and it's high time to look for a new one. If you're working on something interesting and think I could help, let me know!

It's nice to not have a day job while looking for another. I was lucky enough to do this once before, in 2012, and it turned out great. I learned then that time and flexibility let you talk to lots of friends and learn about a breadth of projects. I found a fun project in a new domain (online education), something I doubt I'd have found the normal way.

Maybe I'll get lucky again.

Enough small talk, what am I looking for?

I'm looking for some flavor of line manager. I'm a good senior manager and code-every-day engineer, but I'm exceptional at leading a team and running a project. That's what line managers do: lead engineers, not other managers or departments or matrix-anything. Also, if you're some kind of executive then coding is an indulgence, and I'd rather it just be part of my job. Mostly I'm talking to small companies, say 10-100 people (fun-size).

I want to build on my experience. I know infrastructure and cloud, SaaS and enterprise, and online education. I'm probably not the best person for your storage, security, gaming, e-commerce, or cryptocurrency company. I want to stay working on Internet technology. I like the (micro)services model. For my own projects I choose Python, JavaScript (frontend and backend), and Java. I know web operations, especially the Amazon stack.

Location is important: I don't want to do a daily Menlo Park to San Francisco round-trip. I'd like to work with friends if possible. And I want to do something worthwhile.

You can always get to my resume from the header here, or via this short link. I'm open to a bunch of things, just no kick boxing. Let's have coffee/drink or take a walk.

Lessons from Three Years in AWS

AWS Logo

I've spent the last three years building and operating web sites with Amazon Web Services, and here are a few lessons I've learned. But first I have to come clean: I'm a fan of AWS with only casual experience with other IaaS/PaaS platforms.

S3 Is Amazing. They made the right engineering choices and compromises: cheap, practically infinitely scalable, fast enough, with good availability. $0.03/GB/mo covers up for a lot of sins. Knowing it's there changes how you build systems.

IAM Machine Roles From The Start. IAM with Instance Metadata is a powerful way to manage secrets and rights. Trouble is you can't add a role to an existing machine. Provision with machine roles in big categories (e.g. app servers, utility machines, databases) at the start, even if they're just placeholders.

Availability Zones Are Only Mostly Decoupled. After the 2011 us-east-1 outage we were reassured that a coordinated outage wouldn't happen again, but it happened again just last month.

They Will Lock You In And You'll Like It. Their secondary services work well, are cheap, and are handy. I'm speaking of SQS, SES, Glacier, even Elastic Transcoder. Who wants to run a durable queue again?

CloudFormation No. It's tough to get right. My objection isn't programming in YAML, I don't mind writing Ansible plays, it's the complexity/structure of CloudFormation that is impenetrable. Plus even if you get it working once, you'd never run it again on something that is running.

Boto Yes. Powerful and expressive. Don't script the CLI, use Boto. Easy as pie.
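To make the "use Boto, not the CLI" point concrete, here is a minimal sketch assuming the newer boto3 library rather than whichever Boto version was in use when this was written. The `role` tag name and the function name are made up for illustration; the client is passed in so the function can be exercised without AWS credentials.

```python
def running_instance_ids(ec2_client, role_tag):
    """Return IDs of running EC2 instances whose (hypothetical) 'role' tag matches.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": "tag:role", "Values": [role_tag]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = []
    for page in pages:
        # Instances come back grouped by reservation, one level down.
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids

# Typical use on a machine with credentials configured:
#   import boto3
#   print(running_instance_ids(boto3.client("ec2"), "app-server"))
```

Compared with piping CLI output through text tools, you get real data structures, pagination handled for you, and ordinary exceptions when something goes wrong.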

Qualify Machines Before Use. Some VMs have lousy networking, presumably due to a chatty same-host or same-rack neighbor. Test for loss and latency to other hosts you own and on EBS. (I've used home-grown scripts, don't know of a standard open-source widget, someone should write one).
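A qualification check along these lines could look like the sketch below. It is not the home-grown script mentioned above: it times TCP connects as a stand-in for a proper loss and latency test, it does not cover EBS, and the thresholds (1% loss, 2 ms median) are placeholder assumptions you would tune for your own fleet.

```python
import socket
import statistics
import time

def probe(host, port, attempts=20, timeout=1.0):
    """Time TCP connects to host:port in milliseconds; None marks a failed attempt."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.monotonic() - start) * 1000.0)
        except OSError:
            # Timeouts and refused connections both count as a lost probe.
            samples.append(None)
    return samples

def qualifies(samples, max_loss=0.01, max_median_ms=2.0):
    """Accept a machine only if loss rate and median latency are within bounds."""
    if not samples:
        return False
    ok = [s for s in samples if s is not None]
    loss = 1 - len(ok) / len(samples)
    if loss > max_loss or not ok:
        return False
    return statistics.median(ok) <= max_median_ms
```

Run `probe` from the new VM against a few hosts you already trust, and throw the VM back if `qualifies` says no.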

VPC Yes. If you have machines talking to each other (i.e. not a lone machine doing something lonely) then put them in a VPC. It's not hard.

NAT No. You think that'll improve security, but it will just introduce SPOFs and capacity chokepoints. Give your machines publicly routable IPs and use security groups.

Network ACLs Are A Pain. Try to get as far as you can with just security groups.

You'll Peer VPCs Someday. Choose non-overlapping subnet IP ranges at the start. It's hard to change later.
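Checking a proposed address plan for overlaps takes only a few lines with Python's standard `ipaddress` module. This is a sketch; the CIDR blocks in the usage note are made-up examples, not a recommendation.

```python
import ipaddress
from itertools import combinations

def overlapping_pairs(cidrs):
    """Return every pair of CIDR blocks in the plan that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(nets, 2)
        if a.overlaps(b)
    ]
```

For example, `overlapping_pairs(["10.0.0.0/16", "10.1.0.0/16", "10.0.128.0/17"])` flags the first and third blocks, because the /17 sits inside the first /16. An empty list means the plan is safe to peer.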

Spot Instances Are Tricky. They're only for a very specific use case that likely isn't yours. Setting up a test network? You can spend the money you save by using spot on swear jar fees.

Pick a Management Toolset. Ansible, Chef, all those things aren't all that different when it comes down to it. Just don't dither back and forth. There's a little bit of extra Chef love w/ AWS but not enough to tip the scales in your decision I'd reckon.

Tech Support Is Terrible. My last little startup didn't get much out of the business-level tech support we bought. We bought it so we could call in for help, and we used it to escalate some problems. It was nice to have a number to call when I urgently needed to up a system limit, say. But debugging something real, like a networking problem? Pretty rough.

...Unless You Are Big. Stanford, on the other hand, had a named rep who was responsive and helpful. I guess she was sales, but I used her freely on support issues and she worked the backchannels for us. Presumably this is what any big/important customer would get, that's just not you, sorry.

The Real Power Is On Demand. I'm reaffirming cloud koolaid here. Running this way lets you build and run systems differently, much better. I've relied on this to bring up emergency capacity. I've used it to convert a class of machines on the fly to the double-price double-RAM tier when hitting a surprising capacity crunch. There is a whole class of problems that get much easier when you can have 2x the machines for just a little while. When someone comes to you with that cost/benefit spreadsheet arguing why you should self-host, that's when you need your file of "the cloud saved my bacon" stories at the ready.

Don't Say No By Email


When I have to tell someone no, I pick up the phone. I hate talking on the phone, but I do it anyway.

When you're answering no to someone, you're disappointing them even if just a little bit. So you owe it to them to talk instead of sending an email. It's the polite thing to do.

But there are two other reasons, selfish reasons, for making the call. First, you get immediate feedback on how they took the news. If they're upset then you can do damage control straight away. And at least you know! And second, delivering bad news directly and respectfully is an important skill to develop. We can all use the practice. And it's never as bad as I think it will be.

Fewer Trick or Treaters This Year

This year's Halloween tally was 208. I don't know why we have 32% fewer trick-or-treaters than we had last year, which was down 20% from the year before that. Maybe the rain earlier in the day kept people home. Maybe because it was Friday people opted for parties instead of going door to door. I don't know.

Thanks to my friend Stuart for being on this year's data gathering crew. As usual, the full story is in the numbers.

In Praise of the Hand Tally

Tally Marks

For the past five years I've gathered statistics on how many trick-or-treaters have come by on Halloween. If you want to read about that, check out posts from last year or the year before. This post is about how I track those stats, and how I don't.

Every year I'm tempted to build some fancy system to collect and manage these statistics. Wouldn't it be fun, say, to wire up some Raspberry Pi sensor that automatically counts and tweets running totals? It wouldn't be that hard and sounds like fun.

The problem is making something like that reliable. You'd have to do all the un-fun stuff, like testing and contingency planning. If your baseline is a clipboard, paper, and a ball point pen, your bar for failure is basically "never". Even if I did build something fancy I'd still end up doing backup tallies by hand. At this human scale, the tech ends up being a fun gimmick, not required.

It reminds me of a story from my friend Tony. Tony and his brother Tom run a giant gaming convention every year, the Evolution Championship Series (Evo for short). It's a multi-day convention in Las Vegas that attracts something like ten thousand participants. They run the whole thing with their two other founders and some friends. I'm sure they have some paid help now, but the four guys are the main ones. It's impressive.

Given that Tony and Tom are strong engineers, I figured this would be a slick high-tech operation. Not so.

Tony said they've tried tech at various points and it wasn't worth it. It's easy to see why that is tempting: they have multiple mobile coordinators that need access to changing, shared information, like brackets and schedules. But what they've tried has let them down. Usually it's not the hard parts that fail, but the basics, like batteries and wireless connectivity. So they still run this off of printouts and voice communications (cell phones/walkie-talkies) and periodic data dumps.

And so, this year I'll be gathering my Halloween stats like I always have: clipboard, pen, and a hand-held tally counter. The data will still be timely and accurate.

For the curious few, check out my Halloween Traffic Spreadsheet.

Two postscripts. First, please stop spreading that NASA Space Pen story. I'm sure you've heard it: how do you write in zero G? The wasteful Americans commissioned a multi-million dollar space pen project; the scrappy can-do Russians used pencils. Well, this story has been debunked by the good people at Snopes.

And second, I'd like to plug Tony and Tom's "day job", Stonehearth. I think of it as Starcraft meets Minecraft. I am so eager to play it when it lands.