Notes on developing with HHVM

Dynamically typed, interpreted programming languages are great. They're easy to learn since beginners don't have to worry about static type checking, they let you push updates to a single source file without pushing a full binary, and they offer the instant gratification that comes with not having to compile code. No benefit, though, is without cost; in the case of interpreted languages, the uptick in programmer productivity is generally offset by a decrease in performance. This only makes sense: much of the work that could have been done by a compiler ahead of time (lexing, preprocessing, semantic analysis, etc.) now has to be done at runtime instead.

Runtime performance is something I've been thinking about a lot lately. Even by interpreted-language standards PHP is a slow language; there's really no way to deny that. JavaScript (Node.js) and even Python (PyPy) outperform it by a wide margin. Historically I, along with thousands of other people, have been extremely critical of PHP's (lack of) design, its philosophy, and its performance. That said, some projects that I care about very much are committed to using PHP and are far too deep in the stack to consider switching now. So, with switching languages eliminated as an option, how can we make PHP fast enough to scale a website?

As it turns out, Facebook had the same issue a few years ago. Despite being one of the largest web apps on the internet, their core code is still written in PHP. Their solution to the scaling problem, rather than switching to a faster language, was to build a faster PHP runtime: HHVM. Like the stock PHP interpreter (ZendPHP), HHVM starts by lexing and parsing your PHP source and compiling it into bytecode. What happens after that step is where HHVM differs. ZendPHP immediately interprets the generated bytecode and exits. HHVM, similar to the CLR or JVM, translates the bytecode into x64 machine code through the use of a just-in-time (JIT) compiler. The resulting machine code is executed to fulfill the current web request and is kept in memory for reuse on later requests, while the compiled bytecode is cached in an SQLite database so that unchanged files don't have to be recompiled. Additionally, since HHVM is both compiling and running PHP source, it can use the variable types it observes at runtime to optimize the generated code as it runs.

Sandbox mode

As great as the HHVM project is, documentation is not one of its strong points. Sandbox mode is mostly undocumented, but in a development environment it is how you'll want to run HHVM. Sandbox mode allows a single HHVM daemon to serve pages from a number of different source trees at once. This is ideal for a multi-developer setup, where each developer needs to have their own working copy of the application. Sandbox mode uses a regex match against the HTTP Host header to determine which source tree to use for a request.

Sandbox mode is enabled with the following entry in your server's HDF config file:

Sandbox {
    SandboxMode = true
    Pattern = ([A-Za-z0-9]+).dev.example.com
    Home = /home
    ConfFile = .hphp
}

This config tells HHVM to match the Host header against the regex in Sandbox.Pattern. The first group extracted from the regex is assumed to be the username of the developer. Optionally, the regex can include a second group to extract the sandbox name. When HHVM doesn't find a sandbox name, as in the above example, it uses the sandbox name "default". A second group is useful if a single developer needs to have multiple builds; for the sake of simplicity, in this example we'll just use the default sandbox.

Once HHVM has extracted the developer's username, it appends it to the Sandbox.Home value to get the developer's home directory. For example, if we request joe.dev.example.com it will construct the path /home/joe. Inside that path HHVM looks for the developer's sandbox configuration file, named by the value of Sandbox.ConfFile. In this example it would read /home/joe/.hphp.
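
The lookup described above can be sketched in a few lines of plain PHP. This is only an illustration of the behaviour as described in this post, not HHVM's actual implementation: the function name resolveSandboxConf is made up, and anchoring the pattern and escaping its dots are assumptions of the sketch.

```php
<?php

// Hypothetical helper mirroring the lookup described above; not part of HHVM.
function resolveSandboxConf(string $host, string $pattern, string $home, string $confFile): ?array {
    // Anchor the configured pattern to the whole host name (an assumption of this sketch).
    if (preg_match('#^' . $pattern . '$#', $host, $m) !== 1) {
        return null; // The host does not map to any sandbox.
    }

    // First group: the developer's username.
    $user = $m[1];

    // Optional second group: the sandbox name, falling back to "default".
    $sandbox = (isset($m[2]) && $m[2] !== '') ? $m[2] : 'default';

    return [
        'user'    => $user,
        'sandbox' => $sandbox,
        'conf'    => rtrim($home, '/') . '/' . $user . '/' . $confFile,
    ];
}

$info = resolveSandboxConf(
    'joe.dev.example.com',
    '([A-Za-z0-9]+)\.dev\.example\.com',
    '/home',
    '.hphp'
);

echo $info['conf'], PHP_EOL;    // /home/joe/.hphp
echo $info['sandbox'], PHP_EOL; // default
```

A pattern with a second group, such as ([A-Za-z0-9]+)(?:-([A-Za-z0-9]+))?\.dev\.example\.com, would let a host like joe-api.dev.example.com select a sandbox named "api" in this sketch.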

Here is an example sandbox config file:

default.path = /home/joe/src
default.log = /home/joe/logs/hhvm.log
default.accesslog = /home/joe/logs/access.log

This config file tells HHVM where to find the source code for each of the developer's sandboxes and where to write access and error logs (in addition to the global logging setup on the server). Since we're using the default sandbox, the source root for http://joe.dev.example.com/ becomes /home/joe/src/. Any number of additional sandboxes can be defined in this file if needed.
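
To make the file format a little more concrete, here is a rough sketch of how such a file could be read into per-sandbox settings. The parser is a hypothetical illustration written for this post (parseSandboxConf is a made-up name), and HHVM's own parser may well accept syntax that this one doesn't. The second sandbox, "feature", is invented purely to show what a multi-sandbox file might look like, assuming extra sandboxes follow the same name.key layout.

```php
<?php

// Hypothetical parser for "<sandbox>.<key> = <value>" lines; not HHVM's own code.
function parseSandboxConf(string $text): array {
    $sandboxes = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Expect "name.key = value"; silently skip anything else (blank lines, etc.).
        if (preg_match('/^\s*([^.\s=]+)\.([^\s=]+)\s*=\s*(.*?)\s*$/', $line, $m) !== 1) {
            continue;
        }
        // Group settings by sandbox name, then by key.
        $sandboxes[$m[1]][$m[2]] = $m[3];
    }
    return $sandboxes;
}

$conf = parseSandboxConf(<<<'CONF'
default.path = /home/joe/src
default.log = /home/joe/logs/hhvm.log
default.accesslog = /home/joe/logs/access.log
feature.path = /home/joe/feature-src
CONF
);

echo $conf['default']['path'], PHP_EOL;         // /home/joe/src
echo implode(', ', array_keys($conf)), PHP_EOL; // default, feature
```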

Debugging

There are plans to support Xdebug in the future, but until then, HHVM uses a custom debugger called HPHPd. HPHPd supports debugging both local scripts and HTTP requests running on a remote server, but the HHVM server must be running in sandbox mode for the debugger to work.

Local Scripts

Assume we have the following script, test.php.

<?php

function test($i) {
    for ($j = 0; $j < $i;  $j++) {
        var_dump("{$j} of {$i}");
    }
}

test(10);

To run the script we simply invoke hhvm from the command line, just as we would php.

joe@dev.example.com:~$ hhvm test.php
string(7) "0 of 10"
string(7) "1 of 10"
string(7) "2 of 10"
string(7) "3 of 10"
string(7) "4 of 10"
string(7) "5 of 10"
string(7) "6 of 10"
string(7) "7 of 10"
string(7) "8 of 10"
string(7) "9 of 10"
joe@dev.example.com:~$

HPHPd is a GDB-like shell debugger. To debug the script, change the hhvm mode to debug.

joe@dev.example.com:~$ hhvm -m debug test.php
Welcome to HipHop Debugger!
Type "help" or "?" for a complete list of commands.

Program test.php loaded. Type '[r]un' or '[c]ontinue' to go.
hphpd>

You can set breakpoints based on either line number or class/function definition.

hphpd> break test()
Breakpoint 1 set upon entering test()
hphpd> break test.php:5
Breakpoint 2 set on line 5 of test.php
hphpd> break list
  1	ALWAYS    upon entering test() (unbound)
  2	ALWAYS    on line 5 of test.php (unbound)
hphpd>

With our breakpoints set, we're ready to run the actual script.

hphpd> run
Breakpoint 1 reached at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

While at a breakpoint, we can view the state of locally defined variables.

hphpd> variable i
i = 10
hphpd>

Continue execution to the next breakpoint, or step through the code one line at a time:

hphpd> continue
Breakpoint 2 reached at test() on line 5 of /home/joe/test.php
   4     for ($j = 0; $j < $i;  $j++) {
   5*        var_dump("{$j} of {$i}");
   6     }

hphpd> step
Break at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

HPHPd will inform you when execution completes.

hphpd> continue
string(7) "9 of 10"
Program test.php exited normally.
hphpd>

At this point you can restart execution from the beginning:

hphpd> run
Breakpoint 1 reached at test() on line 4 of /home/joe/test.php
   3 function test($i) {
   4*    for ($j = 0; $j < $i;  $j++) {
   5         var_dump("{$j} of {$i}");

hphpd>

Remote Sandboxes

For web requests, HHVM is already running as a daemon in server mode. Instead of running the web application directly, we'll start HPHPd in debug mode and connect it to the running server.

joe@dev.example.com:~$ hhvm -m debug -h localhost
Welcome to HipHop Debugger!
Type "help" or "?" for a complete list of commands.

Connecting to localhost:8089...
Attaching to joe's default sandbox and pre-loading, please wait...
localhost>

Set a breakpoint, for example, anytime we increment a counter:

localhost> break Stats::increment()
Breakpoint 1 set upon entering Stats::increment()
But wont break until class Stats has been loaded.
localhost>

Run continue to make the debugger block until a breakpoint is hit.

localhost> continue

As soon as a breakpoint is hit, the debugger will return to the prompt.

Breakpoint 1 reached at Stats::increment() on line 64 of /home/joe/src/Stats.php
  63    public function increment($name, $amount = 1) {
  64*      $this->backend->increment($name, $amount);
  65    }

localhost>

Show the locally defined variables:

localhost> variable
$name = "connection_count"
$amount = 1
localhost>

When done debugging the request, disable your breakpoints and continue to finish execution:

localhost> break disable all
  1	DISABLED  upon entering Stats::increment()
  2	DISABLED  on line 64 of /home/joe/src/Stats.php
localhost> continue

The HPHPd debugger is very powerful and, despite a bit of a learning curve and a lack of documentation, can be a hugely valuable tool for gaining insight into your code. More information is available by running the help command from within the HPHPd shell.
