Notes on developing with HHVM

Dynamically typed, interpreted programming languages are great. They're easy for beginners to learn since there's no static type checking to worry about, they let you push updates to a single source file without pushing a full binary, and they offer the instant gratification that comes with not having to compile code. No benefit, though, is without cost; in the case of interpreted languages, the uptick in programmer productivity is generally offset by a decrease in performance. This only makes sense: much of the work that could have been done by a compiler (lexing, preprocessing, semantic analysis, etc.) now has to be done at runtime instead.

Runtime performance is something I've been thinking about a lot lately. Even by interpreted-language standards PHP is a slow language; there's really no way to deny that. JavaScript (Node.js) and even Python (PyPy) outperform it by a wide margin. Historically I, along with thousands of other people, have been extremely critical of PHP's (lack of) design, its philosophy, and its performance. That said, some projects that I care about very much are committed to using PHP and are far too deep in the stack to consider switching now. So, with switching languages eliminated as an option, how can we make PHP fast enough to scale a website?

As it turns out, Facebook had the same issue a few years ago. Despite Facebook being one of the largest web apps on the internet, its core code is still written in PHP. Their solution to the scaling problem, rather than switching to a faster language, was to build a faster PHP runtime: HHVM. Like the stock PHP interpreter (ZendPHP), HHVM starts by lexing and parsing your PHP source and compiling it into bytecode. What happens after that step is where HHVM differs. ZendPHP immediately executes the generated bytecode and exits. HHVM, similar to the CLR or JVM, translates the bytecode into x64 machine code with a just-in-time (JIT) compiler. The resulting machine code is executed to fulfill the current web request, and the compiled bytecode is also cached in a SQLite database for reuse on later requests. Additionally, since HHVM is both compiling and running PHP source, it can use live variable type inspection to optimize generated code as it runs.

Sandbox mode

As great as the HHVM project is, documentation is not one of its strong points. Sandbox mode is mostly undocumented, but in a development environment, this is how you'll want to run HHVM. Sandbox mode allows a single HHVM daemon to simultaneously serve pages from a number of different source trees. This is ideal for a multi-developer setup, where each developer needs to have their own working copy of the application. Sandbox mode uses a regex match against the HTTP Host header to determine which build to use for a request.

Sandbox mode is enabled in the HHVM server config with the following entry in your server's hdf file:

Sandbox {
    SandboxMode = true
    Pattern = ([A-Za-z0-9]+).dev.example.com
    Home = /home
    ConfFile = .hphp
}

This config tells HHVM to examine the Host header with the regex at Sandbox.Pattern. The first group extracted from the regex is assumed to be the username of the developer. Optionally, the regex could include another group to extract the sandbox name. When HHVM doesn't find a sandbox name, as in the above example, it uses the sandbox name default. This feature would be useful if a single developer needed to have multiple builds; for the sake of simplicity in this example we'll just use the default sandbox.

Once HHVM has extracted the developer's username, it appends it to the Sandbox.Home value to get the developer's home directory. For example, if we request joe.dev.example.com it will construct the path /home/joe. Inside that path HHVM looks for the developer's sandbox configuration file, named by the value of Sandbox.ConfFile. In this example it would read /home/joe/.hphp.
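To make the lookup concrete, here is a rough Python sketch of the steps just described. It is not HHVM's actual code: the function name sandbox_conf_path is made up for illustration, and I'm assuming the pattern is matched from the start of the Host header. The three constants are copied from the Sandbox block above.

```python
import re
from pathlib import PurePosixPath

# Values copied from the Sandbox block above. The pattern is used exactly as
# written there, so its dots match any character, not only a literal ".".
SANDBOX_PATTERN = r"([A-Za-z0-9]+).dev.example.com"
SANDBOX_HOME = "/home"
SANDBOX_CONF_FILE = ".hphp"


def sandbox_conf_path(host):
    """Return the sandbox config path for a Host header, or None if no match.

    Hypothetical helper: it mirrors the description in the text, not HHVM's
    real implementation.
    """
    match = re.match(SANDBOX_PATTERN, host)
    if match is None:
        return None
    username = match.group(1)  # first regex group = developer's username
    return str(PurePosixPath(SANDBOX_HOME) / username / SANDBOX_CONF_FILE)


print(sandbox_conf_path("joe.dev.example.com"))  # /home/joe/.hphp
```

A Host header that doesn't match the pattern simply yields None here; what HHVM itself does in that case is not something I've verified.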

Here is an example sandbox config file:

default.path = /home/joe/src
default.log = /home/joe/logs/hhvm.log
default.accesslog = /home/joe/logs/access.log

This config file tells HHVM where to find the source code for each of the developer's sandboxes and where to write access and error logs (in addition to the global logging setup on the server). Since we're using the default sandbox, the source root for http://joe.dev.example.com/ becomes /home/joe/src/. Any number of additional sandboxes can be defined in this file if needed.
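If it helps to see how those three lines map onto a sandbox, here is a small Python sketch that reads them into a dictionary keyed by sandbox name. The file format isn't documented anywhere I could find, so treating each line as "sandbox.key = value" is my assumption, and parse_sandbox_conf is a made-up name, not part of HHVM.

```python
def parse_sandbox_conf(text):
    """Parse 'sandbox.key = value' lines into {sandbox: {key: value}}.

    Assumption: every setting has the shape seen in the example file.
    Lines without an '=' are skipped.
    """
    sandboxes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Split "default.path" into the sandbox name and the setting name.
        sandbox, _, key = name.strip().partition(".")
        sandboxes.setdefault(sandbox, {})[key] = value.strip()
    return sandboxes


EXAMPLE_CONF = """\
default.path = /home/joe/src
default.log = /home/joe/logs/hhvm.log
default.accesslog = /home/joe/logs/access.log
"""

conf = parse_sandbox_conf(EXAMPLE_CONF)
print(conf["default"]["path"])  # /home/joe/src
```

A second sandbox would just be more lines with a different prefix in place of default.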

Debugging

There are plans to support XDebug in the future, but until then, HHVM uses a custom debugger called HPHPd. HPHPd supports debugging both local scripts and HTTP requests running on a remote server, but the HHVM Server must be running in Sandbox mode for the debugger to work.

Local Scripts

Assume we have the following script, test.php.

<?php

function test($i) {
    for ($j = 0; $j < $i;  $j++) {
        var_dump("{$j} of {$i}");
    }
}

test(10);

To run the script we simply invoke hhvm from the command line, like we would php.

joe@dev.example.com:~$ hhvm test.php
string(7) "0 of 10"
string(7) "1 of 10"
string(7) "2 of 10"
string(7) "3 of 10"
string(7) "4 of 10"
string(7) "5 of 10"
string(7) "6 of 10"
string(7) "7 of 10"
string(7) "8 of 10"
string(7) "9 of 10"
joe@dev.example.com:~$

HPHPd is a GDB-like shell debugger. To debug the script, change the hhvm mode to debug.

joe@dev.example.com:~$ hhvm -m debug test.php
Welcome to HipHop Debugger!
Type "help" or "?" for a complete list of commands.

Program test.php loaded. Type '[r]un' or '[c]ontinue' to go.
hphpd>

You can set breakpoints based on either line number or class/function definition.

hphpd> break test()
Breakpoint 1 set upon entering test()
hphpd> break test.php:5
Breakpoint 2 set on line 5 of test.php
hphpd> break list
  1 ALWAYS    upon entering test() (unbound)
  2 ALWAYS    on line 5 of test.php (unbound)
hphpd>

With our breakpoints set, we're ready to run the actual script.

hphpd> run
Breakpoint 1 reached at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

While at a breakpoint, we can view the state of locally defined variables.

hphpd> variable i
i = 10
hphpd>

Continue execution to the next breakpoint:

hphpd> continue
Breakpoint 2 reached at test() on line 5 of /home/joe/test.php
   4     for ($j = 0; $j < $i;  $j++) {
   5*        var_dump("{$j} of {$i}");
   6     }

hphpd> step
Break at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

HPHPd will inform you when execution completes.

hphpd> continue
string(7) "9 of 10"
Program test.php exited normally.
hphpd>

At this point you can restart execution from the beginning:

hphpd> run
Breakpoint 1 reached at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

Remote Sandboxes

For web requests, HHVM is already running as a daemon in server mode. Instead of running the web application directly, we'll start an instance of HPHPd in debug mode and connect it to the running server.

joe@dev.example.com:~$ hhvm -m debug -h localhost
Welcome to HipHop Debugger!
Type "help" or "?" for a complete list of commands.

Connecting to localhost:8089...
Attaching to joe's default sandbox and pre-loading, please wait...
localhost>

Set a breakpoint, for example, anytime we increment a counter:

localhost> break Stats::increment()
Breakpoint 1 set upon entering Stats::increment()
But wont break until class Stats has been loaded.
localhost>

Run continue to hang the debugger, waiting for a breakpoint to occur.

localhost> continue

As soon as a breakpoint is hit, the debugger will unhang.

Breakpoint 1 reached at Stats::increment() on line 64 of /home/joe/src/Stats.php
  63    public function increment($name, $amount = 1) {
  64*      $this->backend->increment($name, $amount);
  65    }

localhost>

Show the locally defined variables:

localhost> variable
$name = "connection_count"
$amount = 1
localhost>

When done debugging the request, disable your breakpoints and continue to finish execution:

localhost> break disable all
  1 DISABLED  upon entering Stats::increment()
  2 DISABLED  on line 64 of /home/joe/src/Stats.php
localhost> continue

The HPHPd debugger is very powerful and, despite a bit of a learning curve and lack of documentation, can be a hugely valuable tool to gain insight into your code. More information can be gained from running the help command from within the HPHPd shell.



A Failure to Delegate

This week Marissa Mayer unveiled Yahoo's new logo design, ending the 18-year reign of their former logo. Reactions to the redesign have been less than ideal, but whether the logo is actually good or not, something Mayer said in her blog post interested me.

On a personal level, I love brands, logos, color, design, and, most of all, Adobe Illustrator. I think it’s one of the most incredible software packages ever made. I’m not a pro, but I know enough to be dangerous :)

So, one weekend this summer, I rolled up my sleeves and dove into the trenches with our logo design team: Bob Stohrer, Marc DeBartolomeis, Russ Khaydarov, and our intern Max Ma. We spent the majority of Saturday and Sunday designing the logo from start to finish, and we had a ton of fun weighing every minute detail.

Marissa Mayer is the President and CEO of Yahoo, and yet she spent a weekend wearing her "graphic artist" hat and designing Yahoo's new logo. Some would undoubtedly see this evidence of a "hands-on" CEO as a good thing, but I see it as a failure to delegate. Part of being a good leader is recognizing that other people are better at some things than you are, and assigning projects accordingly. A CEO – the top level of all management – who insists on being involved at the implementation level of projects is a CEO who will limit what the company can do.

Microsoft is not an ideal company right now, but they're certainly doing better than Yahoo. Imagine, though, that Ballmer spent half of every day writing C++ for new Microsoft products. What state would Microsoft be in compared to where they are now? Would they be launching Xbox One and pushing Windows 8 at the same time? Can one person be heavily involved with the implementation of both products? Most likely not. So, how is Microsoft managing to do both things? By delegating. A CEO's job isn't to write C++ for core products, just like their job isn't to design the company's logo. Certainly they should be involved in the process – the logo does help set the company's image and tone – but being involved is not the same as implementing.

I have no doubt that Mayer wants Yahoo to succeed. Her acquisitions of companies like Tumblr show that she's desperately trying to get fresh, new developers on-board. If she's going to be successful, though, she needs to ditch the habit of micromanaging. She needs to be humble enough to realize that her job is to set Yahoo's direction, lead them into the future, and leave design, programming, and other sorts of implementation to the professionals that she's worked so hard to hire.

Breakpoints and Quality.

Last summer the power supply on my old, first-generation LCD TV started buzzing. This wasn't the first time it had done that. A few years ago, when it still belonged to my parents, it started buzzing at a frequency just high enough that no one except me could hear it. It was still under warranty, so after convincing my Dad that I wasn't going mad, we sent it in for repair. When I moved to New York in June 2011, my parents decided they were bored with television and told me to take it along. "It'll save you a few hundred inevitable dollars," Dad said. And so to Brooklyn it came, a 27-inch Polaroid television that seemed proportionally worthy of my 200-square-foot apartment.

I should note that I hardly, if ever, watch television. My ADD has a hard time putting up with 20 minutes per hour of commercials for products I don't want and shouldn't buy. I do, though, enjoy an occasional movie courtesy of a friend's "borrowed" Netflix streaming account. My roommate, however, watches football. He did, I should say, until a certain Sunday last fall. I walked in the door that day and began to take off my shoes when my roommate asked, "What's up with the TV? I was watching the game and it just turned off. The standby light isn't even on." I walked over to look at it; the power supply was warmer than usual. I tried turning it on again. It obeyed for a few seconds and then promptly died again. "It's done," I said. "I think the transformer burned out."

I still have decent enough soldering skills that I could probably fix the transformer. I also thought about replacing the unit with something nicer. But then a more intriguing thought occurred to me: "What if this could be a breakpoint?" Like, I suspect, most kids growing up in the States in the nineties, I can't remember not having access to a television. What an interesting experiment it would be to consciously decide to not have TV available. My goal, aside from curiosity and a desire to not spend a few hundred dollars, was to make other, more beneficial means of entertainment more appealing. I would start with reading more and follow with writing more. This experiment, combined with my interests, led me to William Zinsser's phenomenal book, "On Writing Well."

I would highly recommend this book to anyone who ever plans to put at least three words in sequence. Not only is it relevant for long-form writing and essays, but also for public speaking and interoffice communication: two things I seem to do a lot of. What especially impacted me, though, was the final chapter. After just under 300 pages of discussing how to write well, he switches to a more philosophical subject. He begins to discuss why you should write well.

He had a passion for quality and had no patience with the second rate; he never went into a store looking for a bargain. He charged more for his product because he made it with the best ingredients, and his company prospered…

Only later did I realise that I took along on my journey another gift from my father: a bone-deep belief that quality is its own reward. I, too, have never gone into a store looking for a bargain.

Ostensibly, those passages apply to writing. In reality, they apply to so much more. In reality, they represent a mitigation of so much of what's wrong with modern anti-culture. As much as I enjoyed working as a consulting web developer, the consistent disinterest in quality drove me from it. I love writing good software. I love working on teams with people who really care about building something the right way. The way that will last. The way that will make our inevitable successors understand and agree with the choices and tradeoffs we made. The way that, when you have a long view, makes sense. My current team is a great example of this: 7 other people who love doing things right.

Unfortunately this is not the prevailing trend. Not in the software industry or the automobile industry or the world. The prevailing attitude among so many people isn't, "I'd like to buy a really nice pair of shoes, take proper care of them, and wear them for the next 40 years." Instead people are raised to always want a new pair of shoes, a new house, a new car. Advertising indoctrinates them to believe that's ok. "You deserve a new pair of shoes", they hear, "and it's ok, they're mass produced and they're so cheap that everyone can afford them!" And so, instead of buying 2 pairs of Allen Edmonds or Aldens, they buy 20 pairs of Nike and New Balance.

Not only is this wasteful and materialistic, but it robs one of the inherent joy that comes from good. Good is replaced by the cheap imitation that is new. Shiny. Gaudy. This isn't any more sustainable in shoes than it is in software. As programmers and system architects, it's our job to emphasize this. Capitalism is capitalism. It will always prefer cheap to good and now to later. The only way to prevent that is to stop undercutting ourselves and stop minimising the costs that come with doing something wrong.

Sure, the application could be done in a month by an understaffed development team. It's going to be ugly, have a bad user experience, tarnish the company's name, and occasionally delete the user's entire home folder. Or…we could hire another developer, take 3 months to build it, and make record profits.

Some companies will never accept that. A bad application is good enough, they will say. Those companies will eventually fail. As a developer, though, it's still our job to try. "Several magazine editors have told me I'm the only writer they know who cares what happens to his piece after he gets paid for it…yet to defend what you've written is a sign that you are alive." To defend quality is a sign that you're a good programmer. To defend quality is to have satisfaction in your work.

SilverStripe 2.x Templates

SilverStripe and a few other frameworks use a variant of standard HTML comments within their template engines. This enables the engine to easily strip out comments before sending the page to the client, but it does cause an issue for most syntax-highlighting text editors. Since Sublime Text 2 is my editor of choice, here's a simple Gist to make it highlight these comments correctly. Add the excerpt from the Gist to HTML/HTML.tmLanguage in your Sublime Text 2 Packages directory.

View with Gist

skyscraper of bananas

When you sit back and look at it web application development is a total disaster. If you tried to conceive of the "ideal" UI platform (or even a "good" one) you would never, in a million years, come up with what we've got now. It is a testament to the skills of everyone in the industry that things like google maps, Gmail etc. can be built on this sky-scraper of bananas. No wonder it is hard to learn.—josephcooney

I completely agree, yet despite being a skyscraper of bananas with a million moving parts, a well-designed web app still has both better typography and a better user experience than the vast majority of desktop software, especially desktop software on Windows. What a great example of how tools matter a lot less than most people think. What really matters in building a great product isn't the language it's written in, but rather hard work and great attention to detail.

Hacker News: Why Codecademy is overrated and missing it's target audience

Purge.

I don't have enough time. More specifically, I've decided I don't have enough time. Every day I get up at 6:00 AM with high hopes for the day: optimistic views on what I'll accomplish; which tickets I'll finally polish off; how much closer to the next milestone my team will be. Yet every day, by the time I'm too exhausted to think about anything worthwhile, I'm disappointed. The issues I wanted to fix: still open; the random tasks requested of me throughout the day: still unchecked to-do items. It's been this way for a while and I've had enough.

Don't misunderstand what I'm saying: I love my job and I love the people I work with. Like any programmer these days, I'm contacted by recruiters semi-frequently, yet they never receive anything more than "Thanks, but I'm content," from me. Besides, I love being busy. Like most other ADD-afflicted introverts, there are few things I fear more than not being busy. Being busy with meaningful tasks gives someone's life purpose, direction, and satisfaction. So what's the issue?

The issue is that my backlog continues to grow and is quickly exceeding my capacity. I can keep up at work—but that's far from all I need to do in a day. Regularly, I'll decide I need to read a new book or I'll come up with a programming idea I'd like to try. Someone recently gave me a guitar, so now I need to learn how to play it. I have sticky notes everywhere reminding me to take more time to write, to go cycling for a few hours, to meditate, to learn to speak Mandarin. All of these are important tasks with specific (although not always obvious) reasons for their existence; all of them sitting in my backlog waiting to get done. At my current burn-rate that day will never come.

Something needs to change. But what? I can't work less; in all reality I should work more. I can't eliminate any of the projects above. But I can eliminate kipple. How much time do I spend on email every day? What about Twitter? Instagram? Facebook? RSS feeds? Honestly, I don't know. I have absolutely no empirical data that says I'm wasting my time and cognitive energy reading Twitter, but I know it's taking a non-trivial chunk of time. Time that could be better spent finally reading the untouched books sitting on my shelf, becoming more physically fit, meditating and regaining perspective, or spending time with the people I care about.

I've decided to take action. I'm not getting rid of Twitter or Facebook or Reeder, but I'm cutting them off at the knees. I just cut my Twitter following count by 60%. I cut Facebook down to 30 friends that I really, actually know. In Reeder I now subscribe to 36 RSS feeds, down from a peak of over 150 a few years ago.

I'm keeping all of these services around because they do have value: they help me be aware of important news and keep in touch with good friends. I've simply trimmed the fat and raised the bar on the quality of the content I'm going to consume. Too much of the content I was spending time on was completely inconsequential and frighteningly shallow. Instead of content being a way to convey important information, it was a way to spend time and to distract myself from thinking about difficult problems. That's something I can no longer afford to do.

In short: time is the most valuable and exceedingly scarce resource in the Universe, so I've decided to treat it as such.

good design is...

This couch doesn't have drop shadows and an unnecessary 'loading' animation, metaphorically speaking. This article is a great description of the product of what good designers do. It's cheap to manufacture. Easily deployed and assembled. Well suited to its primary purpose. Incorporates multiple additional uses without compromising the primary use. Flexible enough to be reconfigured easily for different locations. Best of all, it doesn't have any unnecessary 'design' frills. – HN comment discussing Ikea furniture

In other words, good design is discovering what you think is a new use case for a product and then realizing, "the designers obviously already thought about this." Good design is the complete lack of arbitrary decisions.

Design, Innovation and Hacking in a couch

The difference between online and print

I really hate that people think writing online needs to be significantly dumbed down and shorter than writing in print. Studies have shown that people are conditioned to skip around and to read only small blocks of text when reading online, but I hardly think that's the fault of the LCD they're looking at. When you compare the two mediums, print and web, I don't think the brain knows the difference between looking at letters on a computer or letters on a sheet of dead tree. LCDs do arguably cause more eye strain, but that's slowly improving with technologies like e-ink.

The difference, I think, is our own fault. When's the last time you saw an online article that looked like a page from The Hitchhiker's Guide to the Galaxy or Anathem? Don't be surprised if you can't think of one, because I can't either. Online reading is plagued by horribly designed Flash ads and other sorts of chrome surrounding the text. We're programmed to be drawn to motion, so if there's an invasive Flash ad complete with video and animation sitting right next to a long-form article, who can blame our brains for tending to stare at it rather than focus on what we're reading?

The solution to online writing, then, is not to write ever shorter articles. Who wants to live in a world where the New York Times produces nothing but Twitter posts? Rather, I think we need to bring back the long-form article and remove what has been slowly killing it: distractions, bad typography and noisy pages. Then, just maybe, we'll be able to regain the attention of the continually more distracted reader.

Node.js isn't a silver bullet, but it's still a bullet.

If you're using Node.js, you're doing life wrong

This morning, on a conference mailing list, I made some disparaging remarks about Node.js (the title of this post, in fact). A couple people asked me why I felt that way. Rather than respond individually, I'll just list my reasons here — codeslinger.posterous.com

I really don't understand all the hate for Node.js. It seems like many of the articles and rants against Node.js assume that Node.js was supposed to be good at everything. It's not. Just like every other framework and language in existence, it's good at some things and bad at others.

Node.js, at least from my understanding, was designed to be great at transporting small bits of information around the internet very quickly, and in real time. Server-side events, instant messaging apps, real-time games, and collaboration tools are all great examples of this. Take Trello, for example. Trello is a real-time collaboration app that leverages Socket.io and Node.js to enable real-time propagation of events and state changes between clients. You could do the same thing with long-polling Ajax or even frequent polling, but those both come with the cost of tying up unnecessary worker threads on the server and dealing with extra requests. Node.js, on the other hand, is inherently great at this. Its asynchronous, event-based architecture makes receiving, processing, and sending real-time events simple, painless, and very fast.

At the same time, Node.js isn't especially good at computation. If, for example, you were trying to build an API to return the nth number in the Fibonacci sequence, Node.js would almost certainly be a bad choice. Why? The whole reason to use Node.js is based around the idea of not waiting on things. Instead of waiting for a database query to return results, it just triggers the query and registers a callback. Then, while the query is processing, your program can be doing other things (like handling another request). This is what makes Node.js seem so fast, without actually using more than one CPU core. In our example of computing Fibonacci numbers, however, the program doesn't need to wait on anything. The speed at which such an API can return results is directly linked to how fast it can compute a result. So here it would be better to use another, computationally faster language like Haskell or Scala.
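The "not waiting on things" idea is easy to see in a few lines. The sketch below uses Python's asyncio rather than Node.js itself, purely as a stand-in for a single-threaded event loop; fake_db_query and handle_request are invented names, and the sleep is a placeholder for a real database round trip.

```python
import asyncio

events = []  # records the order in which things happen


async def fake_db_query(delay):
    # Stand-in for a real database call: awaiting hands control back to the
    # event loop instead of blocking the one and only thread.
    await asyncio.sleep(delay)


async def handle_request(name):
    events.append(f"{name}: query sent")
    await fake_db_query(0.05)
    events.append(f"{name}: response ready")


async def main():
    # Two requests share one thread; neither blocks the other while waiting.
    await asyncio.gather(handle_request("A"), handle_request("B"))


asyncio.run(main())
print(events)
```

Both queries are sent before either response is ready, which is the whole trick. Swap the sleep for a tight loop that computes Fibonacci numbers and request B can't even start until request A has finished, because nothing ever yields back to the event loop.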

So what's the point of all this? The point is that it's silly and irrational to complain that a framework isn't good at completing task A, when it was only ever designed to do task B. Node.js is extraordinarily good at what it was designed to do, so don't rant that it's bad at something else.

You can discuss this post on Hacker News.

there's never been a better case for minimalism

“@sdw: This is how Apple sells its laptops. http://twitpic.com/89a4ay This is how HP sells laptops. http://twitpic.com/89a47m” CC @gruber

Yet HP is Apple's largest, most competent competitor? No wonder Apple made $46.3 billion last quarter.