Last Sunday my friend told me that Starman is now way faster than Starlet, and I started to wonder why. There should not be such a difference between the two servers, since they share a lot in common. So I looked into the code yesterday, found the bottlenecks and fixed them, and added some tweaks as well. Here's what I did, and the chart shows how the performance changed.
1) optimize the calculation of Content-Length
Plack::Middleware::ContentLength was the largest bottleneck. Instead of calling the module, Starlet now calculates the Content-Length header by itself in a faster way, and only when necessary.
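A minimal sketch of the idea (not Starlet's actual code; the helper name is made up): fill in Content-Length inline, only when the header is missing and the body is in a form whose length is cheap to determine.

    use Plack::Util;

    # set_content_length is a hypothetical helper illustrating the idea
    sub set_content_length {
        my $res = shift;    # PSGI response: [ $status, \@headers, $body ]
        return if defined Plack::Util::header_get($res->[1], 'Content-Length');
        my $body = $res->[2];
        if (ref $body eq 'ARRAY') {
            my $len = 0;
            $len += length $_ for grep { defined } @$body;
            push @{ $res->[1] }, 'Content-Length' => $len;
        }
        elsif (Plack::Util::is_real_fh($body)) {
            push @{ $res->[1] }, 'Content-Length' => -s $body;
        }
        # otherwise leave the header out and delimit the response by
        # closing the connection
    }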
2) use TCP_DEFER_ACCEPT on linux
TCP_DEFER_ACCEPT is a flag that suppresses the return of accept(2) until data arrives on a newly-established TCP connection. With the flag set, the same number of HTTP requests can be handled by a smaller set of workers, since they no longer need to poll while waiting for HTTP requests to arrive on non-keepalive connections. This leads to less memory pressure and higher performance. Starlet turns the flag on by default.
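On Linux the flag is a socket option at the IPPROTO_TCP level (value 9 in <linux/tcp.h>). A minimal sketch of setting it on a listening socket; the port number is arbitrary:

    use IO::Socket::INET;
    use Socket qw(IPPROTO_TCP SOMAXCONN);

    # TCP_DEFER_ACCEPT is 9 in <linux/tcp.h>; newer Socket.pm releases
    # also export the constant on Linux
    use constant TCP_DEFER_ACCEPT => 9;

    my $listen_sock = IO::Socket::INET->new(
        LocalPort => 8080,          # arbitrary port for the example
        Listen    => SOMAXCONN,
        ReuseAddr => 1,
    ) or die "failed to listen: $!";

    # do not wake the worker in accept(2) until the client sends data;
    # the argument is the timeout in seconds
    setsockopt($listen_sock, IPPROTO_TCP, TCP_DEFER_ACCEPT, 1)
        or die "setsockopt(TCP_DEFER_ACCEPT) failed: $!";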
3) switch from alarm to select-based polling
Starlet used to rely on alarm to enforce I/O timeouts; it now uses select-based polling instead, so apps can use alarm for their own purposes.
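A minimal sketch of what a select-based timeout looks like, using a hypothetical helper around sysread (not Starlet's actual implementation):

    # wait up to $timeout seconds for the socket to become readable,
    # using 4-arg select instead of alarm/SIGALRM
    sub sysread_with_timeout {
        my ($sock, $buf_ref, $len, $timeout) = @_;
        my $rin = '';
        vec($rin, fileno($sock), 1) = 1;
        my $rout = $rin;
        my $nfound = select($rout, undef, undef, $timeout);
        return undef if $nfound <= 0;   # timed out (or select failed)
        return sysread($sock, $$buf_ref, $len);
    }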
4) reuse the fake psgi.input
An optimization copied from Starman.
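A minimal sketch of the technique, assuming the approach Starman takes: requests without a body share a single filehandle opened on an empty scalar, instead of allocating a fresh one per request.

    # a shared "fake" psgi.input, created once at server startup
    my $null_io = do { open my $io, "<", \""; $io };

    # when building the PSGI environment for a request without a body,
    # reuse the shared handle instead of opening a new one per request
    my %env = (
        REQUEST_METHOD => 'GET',
        'psgi.input'   => $null_io,
        # ... other PSGI keys omitted ...
    );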
5) merge short response content with response headers
It is sometimes faster to merge the data within Perl and send them as a single packet than to send the two separately. BTW, anyone going to write a wrapper for writev(2)?
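A minimal sketch of the idea; the 8192-byte threshold and the helper name are assumptions for illustration, not Starlet's actual code.

    # concatenate the serialized headers with a short body so both
    # leave in a single write(2); partial writes are not handled here
    sub write_response {
        my ($sock, $header_lines, $body) = @_;
        my $buf = join('', map { "$_\r\n" } @$header_lines) . "\r\n";
        if (defined $body && length($buf) + length($body) < 8192) {
            syswrite($sock, $buf . $body);   # one write, one packet
        }
        else {
            syswrite($sock, $buf);
            syswrite($sock, $body) if defined $body;
        }
    }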
The conclusion is that Starlet is still a good choice if you are looking for an application server that runs behind a reverse proxy, and maybe a better one than Starman, at least until miyagawa-san copies the optimizations back :-p
NOTE 1: The benchmark was taken on linux-2.6.32 (x86_64), using a single core of a Core 2 Duo running at 3GHz. The options used were "ab -c 50" and "plackup -s Starlet --max-reqs-per-child=10000", with "--workers=64" for the keepalive benchmark and "--workers=16" for the non-keepalive benchmark.