It was a chilly morning in November when Olivia walked into her favorite coffee shop in Brooklyn and ordered a triple shot of espresso. While waiting for the barista to make her drink, Olivia opened her laptop and logged on to her company's webmail interface to read a few emails. Five minutes later her drink arrived, she closed her laptop, and walked off toward the subway. By the time she got off the train in Manhattan, an attacker had compromised her email account, downloaded all of her archived mail, sold the list of recipient email addresses to buyers online, and used her email account to send several thousand pieces of spam.
The above scenario is fictional, but also entirely possible. How? An attacker sitting in the coffee shop was running a WiFi Pineapple in their backpack. The Pineapple impersonated Olivia's home network SSID, causing her laptop to auto-connect to its network instead of the coffee shop's router. Now positioned as a man-in-the-middle, the attacker used SSLStrip to downgrade her HTTPS connection to plain HTTP and Wireshark to record the now-readable traffic between Olivia's laptop and the webmail server. In seconds, without Olivia noticing a thing, the attacker had her username and password.
There are numerous issues here, but what I aim to illustrate is how dangerous it is to send long-lived credentials over the wire. Each time you transmit a reusable credential, especially over an untrusted network, you risk having it intercepted and used by an unauthorized party. We can never entirely eliminate the risk of compromise, but if we can devise a way to transmit secrets less often, we can reduce our exposure to attack. Using asymmetric cryptography, this very thing is possible.
What is asymmetric crypto?
In symmetric (or shared-key) cryptography, both parties share a piece of data which serves as an encryption key. This key is used to both encrypt and decrypt the messages sent between the two parties. Asymmetric (or public-key) cryptography is different in that two keys are used: a public key and a private key. When used for encryption, the public key is capable of encrypting data that only the private key can decrypt. For example, Party-A could distribute their public key to anyone who wants it. Party-B, now that they have Party-A's public key, can use it to encrypt a message that only Party-A can read. Along with that message, Party-B includes their own public key. Party-A then uses it to encrypt their response so that only Party-B can read it.
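To make the public/private split concrete, here is a minimal sketch of textbook RSA in Python with deliberately tiny primes. The numbers and function names are illustrative assumptions only. Real systems rely on vetted libraries, keys of 2048 bits or more, and padding schemes, none of which appear here.

```python
from typing import Tuple

# Toy RSA with tiny primes: for illustration only, never for real secrets.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # Party-A hands this to anyone who asks
private_key = (d, n)       # Party-A never shares this


def encrypt(message: int, key: Tuple[int, int]) -> int:
    """Encrypt an integer smaller than n with the recipient's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: Tuple[int, int]) -> int:
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


# Party-B encrypts with Party-A's public key; only Party-A can reverse it.
message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
```

Here `recovered` equals the original `message`, while someone holding only `public_key` has no practical way to compute `d` once the primes are large enough.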
Asymmetric cryptography, in addition to encryption, has another interesting application: message signing. When used for message signing, a party can use their private key (the key not shared with anyone under any circumstance) to attach a signature to a message. Then, anyone holding the corresponding public key can verify that (1) the sender of the message is actually the holder of the private key and (2) the contents of the message have not been tampered with or altered in any way en route. While this isn't useful for hiding the contents of a message, it is very useful for establishing trust. Party-A can verify that a message claiming to be from Party-B is in fact from Party-B (or at least from someone holding Party-B's private key). Even better, Party-A can do this without ever asking for a password or any other secret to be sent over the wire.
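Signing can be sketched the same way. The example below assumes a toy RSA key (modulus 3233, public exponent 17, private exponent 2753) and hypothetical helper names. It hashes the message, signs the hash with the private exponent, and verifies with the public one. Because the modulus is tiny, the hash is reduced to fit under it, which a real scheme would never do.

```python
import hashlib

# Toy RSA key; far too small for real use.
n, e, d = 3233, 17, 2753  # modulus, public exponent, private exponent


def digest(message: bytes) -> int:
    """Hash the message, then reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Only the holder of the private exponent d can produce this value."""
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (e, n) can run this check."""
    return pow(signature, e, n) == digest(message)


message = b"I am Party-B"
signature = sign(message)

genuine = verify(message, signature)            # signature matches the message
forged = verify(message, (signature + 1) % n)   # altered signature is rejected
```

Note that nothing secret crosses the wire: the verifier needs only the message, the signature, and the public key.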
Asymmetric crypto as authentication
To use asymmetric cryptography for authentication in the context of an HTTP API, we need to define a few different things.
- A message schema for a client to claim who they are
- A message signature algorithm for the client to sign their claim with
- A standard way to include the claim message and signature with each request
- A way for the server to store public keys associated with the users who own them
- Some sort of request middleware on the server to verify authentication claims
Fortunately, open standards describing exactly what we need for points 1 and 2 already exist. A JSON Web Token (RFC 7519) is a token composed of a base64url-encoded JSON header, a base64url-encoded JSON payload, and a signature for verifying the message's contents. The standard doesn't mandate a single signature algorithm. Most often you'll see JWTs signed with HMAC-SHA256, a shared-secret scheme, but in this case, since the creator and consumer of the token aren't the same system, we'll use an RSA signature (RS256 in JWT terms). As for the data included in the JWT, we'll want to include at least three fields.
- Unique user identifier: this tells the server who the client is claiming to be.
- Issued timestamp: this tells the server when the token was created. The server should enforce that this be close to the current timestamp. The tolerance should be large enough to allow for some clock drift, but small enough that we don't have to hold onto nonces forever.
- Nonce: this should be an arbitrary value used to prevent replay attacks using the token. Once a request is authenticated, the server should store this value for at least as long as the timestamp drift tolerance. Any request made within that rolling period of time must have a unique nonce.
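The three fields and the server-side checks they imply can be sketched as follows. This is a minimal illustration under stated assumptions: the claim names `sub`, `iat`, and `jti` are the registered names from RFC 7519 and are one plausible mapping for these fields, the 300-second tolerance is arbitrary, and `make_claims` and `NonceStore` are hypothetical names. Signature verification is left out so the freshness and replay logic stays visible.

```python
import secrets
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed drift tolerance; tune for your system


def make_claims(user_id: str) -> dict:
    """Client side: build the three fields described above."""
    return {
        "sub": user_id,                # unique user identifier
        "iat": int(time.time()),       # issued timestamp
        "jti": secrets.token_hex(16),  # nonce
    }


class NonceStore:
    """Server side: remembers each nonce only while its token could still pass."""

    def __init__(self) -> None:
        self._seen = {}  # nonce -> time after which it can be forgotten

    def accept(self, claims: dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop nonces whose tokens would fail the timestamp check anyway.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        if abs(now - claims["iat"]) > TOLERANCE_SECONDS:
            return False  # too old, or too far in the future
        if claims["jti"] in self._seen:
            return False  # replayed token
        self._seen[claims["jti"]] = claims["iat"] + TOLERANCE_SECONDS
        return True
```

An in-memory dictionary only works for a single process. A deployment with several servers would need a shared store with expiring keys, which is a design choice outside the scope of this sketch.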
Point 3, how we'll transmit the claim, is fairly arbitrary. All that really matters is that the server knows where to look for the client's claim. Since we're talking about HTTP APIs, let's use the HTTP Authorization header.
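As a rough sketch of what that could look like, the helpers below attach a token to a request and pull it back out on the server. The `Bearer` scheme is an assumption, since any scheme name the client and server agree on would do, and both function names are hypothetical.

```python
from typing import Optional

SCHEME = "Bearer"  # assumed scheme name; client and server only need to agree


def build_auth_header(token: str) -> dict:
    """Client side: attach the signed token to an outgoing request."""
    return {"Authorization": f"{SCHEME} {token}"}


def extract_token(headers: dict) -> Optional[str]:
    """Server side: return the token, or None if it is missing or malformed."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != SCHEME.lower() or not token.strip():
        return None
    return token.strip()
```

For example, `extract_token(build_auth_header("aaa.bbb.ccc"))` returns `"aaa.bbb.ccc"`, while a request carrying a different scheme or no header at all yields `None`.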
Finally, points 4 and 5 will vary based on your web framework of choice. I've written a Python/Django implementation using django.contrib.auth and its middleware system. Corresponding implementations could easily be ported to Node.js, Ruby, or any other language with an existing JWT library that supports RSA signatures.
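Independent of any framework, the verification step might look roughly like the sketch below. It is not the Django implementation mentioned above. `authenticate`, `public_keys`, and `verify_signature` are hypothetical names, and the actual RSA check is passed in as a callable so that a real JWT or cryptography library can supply it.

```python
import base64
import json
from typing import Callable, Mapping, Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def authenticate(
    token: str,
    public_keys: Mapping[str, object],
    verify_signature: Callable[[object, bytes, bytes], bool],
) -> Optional[str]:
    """Return the user id if the token checks out, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        b64url_decode(header_b64)  # only checks that the header is well-formed
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:
        return None  # wrong segment count, bad base64, or bad JSON
    if not isinstance(claims, dict):
        return None

    # Point 4: look up the public key stored for the claimed user.
    public_key = public_keys.get(claims.get("sub"))
    if public_key is None:
        return None

    # Point 5: the signature covers the first two segments exactly as sent.
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    if not verify_signature(public_key, signing_input, signature):
        return None

    # The timestamp and nonce checks would also run here before accepting.
    return claims["sub"]
```

In a real middleware, `public_keys` would be backed by the user database and a rejected token would translate into a 401 response.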