Hi. I’m Cody Hatch.

Here are some thoughts of mine.

High performance Nginx and SSL

As we move towards launch, we’re doing a lot of long-delayed polish and testing: checking browser compatibility, adding caching, integrating with a CDN, and generally making sure everything is working just right. We’re also doing some stress tests with the excellent blitz.io tool.

About that…

So, I queued up a basic blitz.io test (or a “rush”, in their terminology), and…

whups

For those unfamiliar, the grey line shows increasing load, the blue line shows replies, and the red line shows errors. Or in simple terms, about the time I got to 40 hits a second, everything fell over.

On the surface, this is kind of surprising, since I’m using Nginx, and serving up a static pre-rendered page. Nginx is well known to be fast and really good at serving static content; most of the blog posts and StackOverflow comments about how to fix “my site falls over when I stress test it” boil down to “use Nginx!”. But I’m already using it, so…um.

A few tweaks

First off, I poked around a bit, and realised that I actually wasn’t serving up the page directly from disk; a small config error meant I was actually hitting a node.js process which was serving up the static page directly from disk. Node isn’t quite as fast as Nginx, but it’s quite fast enough; fixing my config didn’t do anything.

So then I started tweaking Nginx config variables. worker_processes didn’t seem to do much. Neither did worker_rlimit_nofile, open_file_cache, turning access_log off, or, well…anything. My server froze up and cried little tears of pain every time I hit it with the test.

Progress

Well, okay. So something is bottlenecked. And it’s not clear what exactly is bottlenecked either, so…hm. Let’s try running top on the server while we hit it with a test.

Oh look, my CPU usage is 100%. That certainly explains the results I’m seeing, but Nginx is meant to be super low CPU when it comes to serving static files! What’s going on?

At this point, most of you are probably yelling at your monitors: “It says SSL in your title! SSL termination is computationally expensive! You’re a flaming idiot!”

Indeed. It does, it is, and I might be. Because once I realised what was going wrong I ran a rush on the non-SSL URL for our app. And you can probably guess the results:

hmm

What you’re seeing here is that the response time stayed pretty constant, while hits tracked overall volume closely. In short, it responded (quickly!) to whatever I threw at it up to 250 hits/second which is as high as I decided to go. Which is what Nginx is famed for, so uh…good. (Not shown: Our CPU usage, which was essentially nil. No surprise; serving static files from RAM over plain-old-HTTP is practically free.)

But now what? I kind of need SSL termination.

The fix

Hardware

Step one is to look at the hardware I’m throwing at the problem. Which, as it turns out, is a $10/month Digital Ocean droplet. For the price, they’re not bad — 30GB of SSD disk, 1GB of RAM, and 1 vCPU. It’s that last bit which is clearly killing us now though.

Well, it’s the work of a moment to resize our droplet to the next tier up with 2 vCPU. Let’s reset to vanilla Nginx settings and see what that does:

better

Well, that is technically better…kind of. Okay, now with all those tuned worker_processes settings we copied off the internet?

nope

Well, it’s certainly spiky looking. But not what we’re going for.

Software

Nginx is very configurable. Maybe there are some specific SSL settings? But of course there are!

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers ECDHE-RSA-AES256-SHA384:AES256-SHA256:RC4:HIGH:!MD5:!aNULL:!eNULL:!NULL:!DH:!EDH:!AESGCM;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
yes!

Well how about that?

The conclusion

Well, the obvious conclusion is “know how to use your tools”, but that’s a bit too general. Let’s look for something a bit more specific. Such as:

Make sure to configure Nginx properly for SSL!


Testability

The more time I spend in software development, the more I come to understand the importance of testing. Unit tests, functional tests, integration tests…the important thing is to have some tests.

When I was younger and stupider (all of, oh, 2-3 years ago) the benefit of tests seemed more nebulous. And stuff like DI? That’s some crazy Java crap. Ain’t nobody got time for that stuff!

I was kind of dumb

So now I’ve got this fairly massive app (well, massive for a team of 2) we’ve been working on for a couple of years. And I’m starting to really wish I’d been a lot more into testing when we started. That little todo webapp you threw together to learn Ember or whatever probably doesn’t need a lot of tests. But we’ve got something like 9k lines of CoffeeScript these days, plus endless Jade templates, a bunch of Stylus stylesheets, and a ton of random scripts, Ansible playbooks, config files, whatever. And a complicated architecture. We need tests!

…and we don’t really have them.

Meh, just add them!

If only it were that easy! …well, it kind of is. We do have a server (a lightweight Express app mostly just exposing a somewhat RESTful API) and it is pretty easy to add tests to it. Take a heaping helping of Mocha, a dash of Supertest, and a liberal coating of Nock, shake well, and you get some very nice, very fast, very useful functional tests.

A simple test
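express = require 'express'
request = require 'supertest'

config = require './config'

# Build the app under test and mount the real routes on it.
app = express()
require('./routes')(app, config)
# Load the nock definitions that mock out calls to other servers.
require './nocks'

describe 'The  server', ->
  # Start a test instance before any of the tests run.
  before (done) -> app.listen config.PORT, config.BIND, (err) -> done err

  it 'should reject logged out requests', (done) ->
    request(app)
    .get('/api/test')
    .expect(401, done)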

Basically, I require my server code, spin up a testing instance, then make a series of requests to the API endpoints, with all the calls to other servers mocked out with nock. And it’s amazing.
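For a sense of what that 401 check exercises: all it needs is for the server to refuse any request that carries no logged-in user. Here is a minimal sketch of such a guard in plain JavaScript, written so it can be poked at without Express or a network socket. The names `requireLogin`, `req.user` and `makeFakeRes` are assumptions made up for illustration; this is not the app's actual routes code.

```javascript
// Hypothetical Express-style middleware (illustration only, not the real app):
// reject any request that has no logged-in user attached.
function requireLogin(req, res, next) {
  if (!req.user) {
    res.statusCode = 401;
    res.end('Unauthorized');
    return;
  }
  next();
}

// Minimal stand-in for a response object; it just records what the guard did.
function makeFakeRes() {
  return {
    statusCode: 200,
    body: null,
    end(body) { this.body = body; },
  };
}

// Logged out: the guard answers 401 and never calls next().
const res = makeFakeRes();
let nextCalled = false;
requireLogin({}, res, () => { nextCalled = true; });
console.log(res.statusCode, nextCalled);
```

The functional test still earns its keep, since it checks that the guard is actually wired to the right routes; a check like this only covers the guard's own logic.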

But our client is written with KnockoutJS, and it’s not very well organised.

The options

Unit tests

This makes sense. The server can be unit tested trivially, and Knockout is based on the MVVM pattern. Just require the app, and test some view models!

Problem: Over the months and now years, a few “quick hacks” have tangled the viewmodel code with the view code. We have (just a few!) cases where viewmodel functions are referencing the window or document globals directly, or using jQuery selectors. This won’t work in a headless unit test.
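To make that tangle concrete, here is a rough sketch. It is a hypothetical example, not code from our app: `TangledLoginViewModel`, `LoginViewModel`, `focusField` and `makeFakeDom` are made-up names, and plain properties stand in for Knockout observables so the snippet runs in Node with no browser. The first viewmodel reaches for `document` directly and blows up headless; the second is handed its DOM access as an argument, so a test can pass in a fake.

```javascript
// Hypothetical viewmodels, for illustration only (not from the real app).

// Tangled version: touches a browser global, so it can't run in plain Node.
function TangledLoginViewModel() {
  this.username = '';
  this.error = null;
  this.submit = function () {
    if (!this.username) {
      this.error = 'Username is required';
      document.getElementById('username').focus(); // needs a real browser
      return false;
    }
    this.error = null;
    return true;
  };
}

// Testable version: DOM access is injected, so a test can supply a fake.
function LoginViewModel(dom) {
  this.username = '';
  this.error = null;
  this.submit = function () {
    if (!this.username) {
      this.error = 'Username is required';
      dom.focusField('username');
      return false;
    }
    this.error = null;
    return true;
  };
}

// A fake "dom" that only records which fields were asked to take focus.
function makeFakeDom() {
  const focused = [];
  return { focused, focusField: (name) => focused.push(name) };
}

// Headless usage: no browser, no jQuery, just the fake.
const dom = makeFakeDom();
const vm = new LoginViewModel(dom);
vm.submit();
console.log(vm.error, dom.focused);
```

This is the "crazy Java" DI idea in miniature: the validation logic is identical in both versions, and the only difference is who owns the reference to the page.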

Um, unit tests with JSDOM?

Doesn’t implement enough. The code relies on being in a web browser, and refactoring it now, especially without tests (funny how that works) will be a nightmare.

Okay, ZombieJS!

Zombie is pretty awesome; it gives you a headless browser you can play with. But like JSDOM, it isn’t enough. The code expects a Webkit browser.

Webkit? PhantomJS!

An obvious choice. We can actually spin up a PhantomJS instance, and hit a dev server, and fetch an actual copy of the app.

But these aren’t unit tests any more; we’re now into the realm of acceptance or integration tests. Which is fine, but…

Problem 1: They’re really slow. It’s a big, chunky app; it’s meant to be cached and only loaded once. Plus it does a lot of slow processing on initial load, so… These tests are slow.

Problem 2: The PhantomJS API is a huge pain to use. Something simple like loading the app, then doing some tests to make sure the login form properly validates usernames and passwords, then logging in, waiting for the initial data to load, and then checking to make sure it loaded properly is, again, slow, but also a huge pain to write.

Okay, CasperJS?

Not much better. It’s just a painful API.

Right, Capybara?

Well, it’s Ruby-centric, and we’re not a Ruby shop. But it works, and it has a nice API, and I even got a couple of Lettuce+Capybara tests written.

But it’s still very slow to run, and very slow to write. These aren’t a replacement for the unit tests we so desperately need; at most I could write a couple as smoke tests. (Basically “log into the app, connect to the test DB, and make sure a record displays; if that works it can’t be too broken so let’s ship it into production!”)

But we still need unit tests. And this path doesn’t lead to unit tests.

Now what?

That’s a good question. I’ve got over 8k lines of poorly organised Knockout code with no unit tests, and no ability to write unit tests unless I refactor it, which will be a nightmare without…unit tests.

At this point, we wince, accept the technical debt, and push forward, and hope we get the resources for a proper rewrite one day.

And the lesson is?

Testability matters. Which is why you should be writing tests from day one; not because you need them then, but because one day you’ll need them, and having tests guarantees that you can write tests. I look at the search code now, and I wince, because it’s so tightly coupled to, well, everything else, that it’s largely untestable now. But it didn’t have to be that way.

Also: The Knockout docs say almost nothing about testing, and googling for Knockout + testing turns up very little. Culture matters, and I think that one of the big advantages Angular may have over Knockout is nothing technical; it’s just the focus the Angular community puts on testing.


Hello World

Let’s try the blog thing again.