
October 27, 2010

Announce: The Transfer of Mylingual.net

Thank you for using Mylingual.net, a service that automatically localizes the UI of web applications using user-contributed translations.  As of Oct. 29, 2010, the service will be transferred from Cybozu Labs, Inc. to Kazuho Oku (the developer of the service, i.e. me).

  • All the translation data (including editing history, etc.) will be transferred.
  • The browser extensions and userscripts will continue to work as they do today.
  • Translators are requested to re-create their accounts after the transfer.
  • After the transfer, translators will be requested to log in to the new system using Twitter, but the integration might not be complete on Oct. 29.
  • After the transfer, announcements and discussions will be posted to the website of Mylingual or to my personal weblog: Kazuho's Blog.

I am sorry for the inconvenience this causes to the users of the service (especially the translators).  Thank you for your cooperation.

October 21, 2010

Compressing URLs in your Webapp, for size and speed

Last year I had a chance to talk about the internals of our service Pathtraq at the Percona Performance Conference (slides), where I described the methods we use to compress the URLs in our database to below 40% of their original size.  However, I had not released the source code since then.  I am sorry for the delay, but I have finally uploaded the code to github.com/kazuho/url_compress.

It is generally considered difficult to achieve a high compression ratio for short texts.  This is due to the fact that most compression algorithms are adaptive, i.e., short texts reach their end before the compressors learn how to encode them efficiently.

Our approach uses an algorithm known as "static PPM (prediction by partial matching)", which uses pre-built prediction tables instead of building them adaptively.

By using static PPM it is possible to achieve a high compression ratio for hostnames and domain labels (e.g. "www" or "com") and for English words that occasionally appear in paths or query parameters (e.g. "search").  And even when compressing unknown words in URLs, the method works fairly well by falling back to the lower-order prediction tables for syllable-level compression.
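Here is a toy illustration of the idea (this is not the code in the repository; it only trains an order-1 character model and reports the ideal code length, whereas the actual compressor uses higher-order contexts plus a range coder):

#! /usr/bin/perl
use strict;
use warnings;

# a tiny training corpus; the real tables are built from a much larger one
my @corpus = qw(
    http://www.example.com/search?q=perl
    http://www.example.com/index.html
    http://blog.example.org/archives/2010/10/
);

# build a static order-1 model: previous character -> next character counts
my %freq;
for my $url (@corpus) {
    my $prev = '';
    for my $ch (split //, $url) {
        $freq{$prev}{$ch}++;
        $prev = $ch;
    }
}

# estimate the ideal code length (in bits) of a string under the model,
# falling back to 8 bits per octet when no prediction is available
sub ideal_bits {
    my $str = shift;
    my ($bits, $prev) = (0, '');
    for my $ch (split //, $str) {
        if (my $ctx = $freq{$prev}) {
            my $total = 0;
            $total += $_ for values %$ctx;
            if ($ctx->{$ch}) {
                $bits += -log($ctx->{$ch} / $total) / log(2);
                $prev = $ch;
                next;
            }
        }
        $bits += 8;    # unseen context or character
        $prev = $ch;
    }
    return $bits;
}

my $url = 'http://www.example.com/search?q=mysql';
printf "%s => %.1f bits (vs. %d bits raw)\n",
    $url, ideal_bits($url), 8 * length($url);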

The repository includes the source code of the compressor / decompressor (the core of the compressor, an optimized range coder using SSE based on the work by Daisuke Okanohara, is pulled in as a git submodule from github.com/kazuho/rangecoder), Perl scripts for building compression tables, a small URL corpus for building and evaluating the tables, and the source code of a MySQL plugin that can be used to compress / decompress URLs using SQL.

By following the instructions in the README you will be able to build a compressor that compresses the URLs in the corpus to 30.5% of their original size on average, and use it in your MySQL database to store URLs more efficiently.  And one more thing: the benefit is not only space efficiency.  Since the compressor is designed so that prefix searches can be performed without actually decompressing the stored values, in some cases your queries become faster if you store the URLs in compressed form :-)
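Below is a sketch of how the plugin could be used from Perl through DBI.  Note that the function names url_compress / url_decompress and the table are hypothetical ones used for illustration; please check the README for the names actually registered by the plugin.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                       { RaiseError => 1 });

# store the URL in compressed form (hypothetical table "pages" with a
# VARBINARY/BLOB column "url"; url_compress is a hypothetical UDF name)
$dbh->do(
    'INSERT INTO pages (url) VALUES (url_compress(?))',
    undef, 'http://www.example.com/search?q=mysql',
);

# exact-match lookup: since the compression is deterministic, the search
# key can be compressed instead of decompressing the stored column
my $rows = $dbh->selectall_arrayref(
    'SELECT url_decompress(url) FROM pages WHERE url = url_compress(?)',
    undef, 'http://www.example.com/search?q=mysql',
);
print "$_->[0]\n" for @$rows;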

October 16, 2010

My Slides at YAPC::Asia 2010 (Writing Prefork Servers / Workers, Unix Programming with Perl)

The slides I used for my presentations at YAPC::Asia 2010 are now available from the links below.  Thank you to those who came, and to JPA and the volunteer staff for running the event.  I hope you enjoyed (or will enjoy) my slides!

August 17, 2010

Releasing Starlet 0.10, now runs faster than ever

Last Sunday my friend told me that Starman is now way faster than Starlet, and I started to wonder why.  There should not be such a difference between the two servers, since they share a lot in common.  So I looked into the code yesterday, found the bottlenecks and fixed them, as well as adding some tweaks.  Here is what I did, and the chart[1] shows how the performance changed.

Starlet

1) optimize the calculation of Content-Length

Plack::Middleware::ContentLength was the largest bottleneck.  Instead of calling the module, Starlet now calculates the Content-Length header by itself in a faster way, and only when necessary.
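The idea is roughly the following (a simplified sketch, not Starlet's actual code): when the body is an array of strings, its byte length can be summed up directly, and the header is added only in that case.

use strict;
use warnings;

# calculate the Content-Length of a PSGI response body, but only when it is
# cheap to do so (i.e. the body is an array of byte strings); returns undef
# otherwise (e.g. for filehandle-style bodies)
sub calc_content_length {
    my $body = shift;
    return undef unless ref $body eq 'ARRAY';
    my $len = 0;
    $len += length $_ for @$body;
    return $len;
}

# add the header only when the length could be calculated (assuming here
# that the application did not set Content-Length by itself)
my $res = [ 200, [ 'Content-Type' => 'text/plain' ], [ "hello ", "world\n" ] ];
my $cl = calc_content_length($res->[2]);
push @{$res->[1]}, 'Content-Length' => $cl if defined $cl;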

2) use TCP_DEFER_ACCEPT on linux

TCP_DEFER_ACCEPT is a flag that suppresses the return of accept(2) until data arrives on a newly-established TCP socket.  With the flag set, the same number of HTTP requests can be handled by a smaller set of workers, since they do not need to poll waiting for HTTP requests to arrive on non-keepalive connections.  This leads to less memory pressure and higher performance.  Starlet turns the flag on by default.
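Setting the flag on a listening socket looks roughly like this (a sketch, not Starlet's actual code; TCP_DEFER_ACCEPT is Linux-only, hence the $^O check):

use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP);

# TCP_DEFER_ACCEPT is 9 in <linux/tcp.h>; it is a Linux-only option
use constant TCP_DEFER_ACCEPT => 9;

my $listener = IO::Socket::INET->new(
    LocalPort => 8080,
    Listen    => 128,
    ReuseAddr => 1,
) or die "failed to listen: $!";

# do not return from accept(2) until the client sends the first bytes of
# the request, so that workers need not poll on idle, non-keepalive conns
if ($^O eq 'linux') {
    setsockopt($listener, IPPROTO_TCP, TCP_DEFER_ACCEPT, 1)
        or warn "failed to set TCP_DEFER_ACCEPT: $!";
}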

3) switch from alarm to select-based polling

Apps can now use alarm for their own purposes.
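In other words, instead of wrapping the reads in alarm / $SIG{ALRM}, the timeout is handled by waiting on the socket with select before each read.  A minimal sketch of the pattern (not the actual code):

use strict;
use warnings;

# read with a timeout using select(2) instead of alarm, so that $SIG{ALRM}
# stays available for the application
sub read_with_timeout {
    my ($sock, $buf_ref, $len, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($sock), 1) = 1;
    my $rout;
    my $nfound = select($rout = $rin, undef, undef, $timeout);
    return undef if $nfound <= 0;    # timed out (or select error)
    return sysread($sock, $$buf_ref, $len);
}

# usage: my $ret = read_with_timeout($conn, \my $buf, 65536, 300);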

4) reuse the fake psgi.input

An optimization copied from Starman.
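The trick is that requests without a body can share a single filehandle opened on an empty string as psgi.input, instead of creating a new one for every request.  Roughly (a sketch of the idea):

use strict;
use warnings;

# a single, shared filehandle on an empty string, used as psgi.input for
# requests that carry no body (instead of creating one per request)
my $null_io = do {
    open my $fh, '<', \(my $empty = '') or die "open: $!";
    $fh;
};

sub build_env {
    my (%args) = @_;
    seek $null_io, 0, 0;    # rewind, in case the app read from it
    return {
        REQUEST_METHOD => $args{method},
        'psgi.input'   => $null_io,
        # ... other PSGI keys omitted ...
    };
}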

5) merge short content-length with response headers

It is sometimes faster to merge the data within perl and send it as a single packet than to send the two separately.  BTW, anyone going to write a wrapper for writev(2)?
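Something along these lines (a sketch; error handling and partial writes are omitted, and the 8192-byte threshold is an arbitrary figure for illustration):

use strict;
use warnings;

use constant SMALL_BODY_LIMIT => 8192;    # arbitrary threshold for this sketch

# send the response header and body; if the body is small, concatenate the
# two within perl so that they (hopefully) go out as a single packet
sub write_response {
    my ($sock, $header, $body) = @_;
    if (length $body <= SMALL_BODY_LIMIT) {
        syswrite $sock, $header . $body;
    } else {
        syswrite $sock, $header;
        syswrite $sock, $body;
    }
}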

The conclusion is that Starlet is still a good choice if you are looking for an application server that runs behind a reverse proxy, maybe better than Starman until miyagawa-san copies the optimizations back :-p

NOTE 1: The benchmark was taken on linux-2.6.32 (x86_64), using a single core of a Core 2 Duo running at 3GHz.  The options used were "ab -c 50" and "plackup -s Starlet --max-reqs-per-child=10000", with "--workers=64" for keepalive and "--workers=16" for non-keepalive.

July 02, 2010

Writing a Fast and Secure HTTP Parser (in the case of HTTP::Parser::XS)

In May, I had a chance to give a talk at Tsukuba.xs Beer Talks #1.  I had forgotten to upload the slides since then, but finally, here they come :-p

The slides, titled "HTTP::Parser::XS - writing a fast & secure XS module", cover the design and implementation of HTTP::Parser::XS, an XS HTTP parser used by a number of Plack-compatible web servers, and of picohttpparser, its underlying HTTP parser written in C.
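For those who have not tried it, the core of HTTP::Parser::XS is a single function that parses a request buffer into a PSGI-style environment hash.  A small example:

use strict;
use warnings;
use HTTP::Parser::XS qw(parse_http_request);

my $buf = join(
    "\r\n",
    'GET /path?q=hello HTTP/1.1',
    'Host: example.com',
    'User-Agent: example',
    '', '',
);

my %env;
my $ret = parse_http_request($buf, \%env);
if ($ret >= 0) {
    # $ret is the length of the request header; %env now holds the PSGI keys
    print "$env{REQUEST_METHOD} $env{PATH_INFO} (query: $env{QUERY_STRING})\n";
} elsif ($ret == -2) {
    # the request is incomplete; read more octets and call the parser again
} else {
    # -1: the request is broken; respond with 400
}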

My intention was to introduce to programmers who mostly use Perl the fun of writing tight (fast and beautiful) code in the C programming language.  I would be glad if you could feel some of my addiction through the slides.  Enjoy!

PS. And if you find any security holes or bugs, please let me know. Thank you in advance.

June 29, 2010

Q4M 0.9.4 released

I have just uploaded Q4M (Queue for MySQL) 0.9.4 to q4m.31tools.com.  There have been no bug fixes since 0.9.3; the only change is the newly added function queue_compact(table_name), which can be used to trigger table compaction manually.

If you were looking for a way to control queue compaction timing, it would be worth considering upgrading to 0.9.4.
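For example, compaction could be triggered from a maintenance script during off-peak hours.  I am assuming in the sketch below that queue_compact() is called through SELECT like the other Q4M functions and that the table is specified as 'database.table'; please check the documentation for the exact form.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                       { RaiseError => 1 });

# trigger compaction of the queue table at a convenient moment
# (NOTE: calling convention assumed here; see the Q4M docs)
$dbh->do(q{SELECT queue_compact('test.my_queue')});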

For more information on what compaction is, please refer to my last entry on Q4M describing concurrent compaction.

April 26, 2010

MySQL and the XFS stack overflow problem

I have heard that there have been talks at MySQL UC on running MySQL (InnoDB) on top of XFS for better write concurrency than ext3.

Last weekend I had a chance to discuss on Twitter the xfs stack overflow problem (which became apparent this month, on x86_64 systems with 8k stacks) and how, if at all, it would affect MySQL servers.  If I understand correctly, the problem is that a stack overflow might occur when a dirty page needs to be paged out to xfs.

The conclusion was that for stability of MySQL running on xfs:

  1. xfs should not be used on top of LVM or any other MD layer (i.e. software RAID, etc.)
  2. xfs volumes should only contain ibdata files (which are accessed using O_DIRECT) so that the files on xfs never exist as dirty pages within the OS


March 25, 2010

Q4M 0.9.3 prerelease (with support for "concurrent compaction")

Q4M (Queue for MySQL) periodically performs an operation called "compaction", a sort of garbage collection that reclaims empty space in the queue file and returns it to the OS.

The pitfall that existed until now was that during compaction, all operations on the queue table were blocked.

My opinion was (and is) that this is not a serious problem for most users, since the time required for compaction will be small in most cases (it depends on the number and size of the rows alive in the queue table, and that number will usually be small).

But for use cases where fast response is a requirement, I have added a queue_use_concurrent_compaction option to Q4M in the 0.9.3 prerelease.  When the variable is set to one in my.cnf, INSERTs will not be blocked during compaction.  Another configuration variable, queue_concurrent_compaction_interval, is also available to fine-tune the response time of INSERTs during compaction.  The response of INSERTs during compaction becomes faster as you set the variable smaller, although compaction becomes slower as a side effect.

my.cnf
# enable / disable concurrent compaction (0: disabled (default), 1: enabled)
queue_use_concurrent_compaction=1

# handle INSERTs every N bytes of compacted data (default: 1048576)
queue_concurrent_compaction_interval=1048576

If you are already using Q4M without any problems, I recommend sticking to the version you are using, since the stability of Q4M might have degraded due to the introduction of this feature.

On the other hand, if you have been having problems with the issue, or are planning to use Q4M for a new application, I recommend using this release.  Have fun!  And if you find any problems, please send stacktraces to me :-)

January 22, 2010

Q4M 0.9.2 prerelease available, fixing data corruption on 32bit systems

Thanks to a user of Q4M, I have found a bug that would likely lead to data corruption in 32bit versions of Q4M.  64bit versions are unaffected.

Q4M by default uses mmap(2) to read from data files.  On 32bit systems, it tries to map a maximum of 1GB per table into memory using mmap.  When mmap fails due to memory shortage, Q4M falls back to file I/O to read the data.

However, there was a bug in handling the return value of mmap that led to reading corrupt data from the database files when mmap(2) failed after the underlying file had been grown or shrunk by Q4M.  And since Q4M writes the corrupt data back into the database file when rows are consumed, the bug will likely destroy the database files.

I have fixed the bug and uploaded Q4M 0.9.2 to the prerelease directory at q4m.31tools.com/dist/pre.  A source tarball and prebuilt binaries for MySQL 5.1.42 on 32bit linux are available.

If you are using a 32bit version of Q4M, I highly recommend either updating to 0.9.2 or switching to a 64bit version if possible.

BTW, in the 0.9.2 release I also changed the maximum mmap size per table from 1GB to 256MB, to lower the possibility of running out of memory.  However this countermeasure might not be sufficient in some cases, e.g. databases with many Q4M tables, or setups where other storage engines also use a lot of memory.  I am considering simply disabling the use of mmap(2) on 32bit systems in future releases.  If you have any comments on this, please let me know.

January 20, 2010

Building a highly configurable, easy-to-maintain backup solution for LVM-based VMs and MySQL databases

Motives and the Features

For the servers running in our new network, I was in need of a highly configurable but easy-to-use backup solution that could take online backups of VMs and of MySQL databases running multiple storage engines.

Since my colleagues are all researchers or programmers and there are no dedicated engineers managing our system, I decided to write a set of command line scripts to accomplish the task instead of using an existing, highly configurable but time-consuming-to-learn backup solution like Amanda.

What I have come up with is a backup solution with the following characteristics; let me introduce them.

  • a central backup server takes backups of the other servers over SSH using public-key authentication
  • no need to install backup agents on each server
  • LVM snapshot-based online, incremental backups (capable of taking online backups of LVM-based VMs)
  • online backups of MySQL databases running multiple storage engines, with sophisticated lock control
  • no configuration files; only crontab and shell scripts

The solution consists of two tools, blockdiff (kazuho's blockdiff at master - GitHub), and cronlog script of kaztools (kazuho's kaztools at master - GitHub).

Blockdiff

Blockdiff is a set of scripts for taking block-based diffs of files or volumes on a local machine, or on remote machines over the network using SSH.  The script below takes online backups of three LVM volumes on three servers.  In the form below, a full backup is taken once a month, and incremental backups are taken for the rest of the month.

backup.sh
#! /bin/bash

export YEARMONTH=`date '+%Y%m'`

# backup a LVM volume (using snapshot) at /dev/pv/lv on srv00
blockdiff_backup /var/backup/srv00-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
  root@srv00 /dev/pv/lv \
  || exit $?

# backup another LVM volume on another server
blockdiff_backup /var/backup/srv01-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
  root@srv01 /dev/pv/lv \
  || exit $?


# backup a MySQL database stored on volume /dev/pv/lv on server db00
BLOCKSIZE=16384 \
  LVCREATE_PREFIX='mysqllock --host=db00 --user=root --password=XXXX' \
  blockdiff_backup /var/backup/db00-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
  root@db00 /dev/pv/lv \
  || exit $?

The backup command for the last volume uses the mysqllock command included in blockdiff to keep "FLUSH TABLES WITH READ LOCK" in effect while taking a snapshot of the LVM volume on which the database files exist.  It is also possible to implement other kinds of locks, for example so as not to issue the "FLUSH TABLES WITH READ LOCK" while long-running queries are in execution.  Since the flush statement blocks other queries until all of the already-running queries complete, issuing it while long-running queries exist would leave the database unresponsive to other queries for a certain amount of time.
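Conceptually, the sequence that mysqllock arranges around the snapshot is the following (a simplified Perl sketch, not the actual implementation; the snapshot size of 4G is an arbitrary figure here).  The point is that the lock is only held for the few seconds it takes lvcreate to complete, while the backup itself reads from the snapshot afterwards.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:host=db00', 'root', 'XXXX',
                       { RaiseError => 1 });

# block writes and flush the tables so the on-disk state is consistent
$dbh->do('FLUSH TABLES WITH READ LOCK');

# take the LVM snapshot while the lock is held (the only step that needs
# to run under the lock; '4G' is an arbitrary snapshot size for this sketch)
system('lvcreate', '--snapshot', '--size', '4G',
       '--name', 'lvm_dump', '/dev/pv/lv') == 0
    or die "lvcreate failed: $?";

# release the lock; the backup then reads from the snapshot
$dbh->do('UNLOCK TABLES');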

Crontab and the cronlog script

The backup script is invoked by cron via cronlog, a script that logs the output of the executed task and controls the output passed to cron so that an alert mail is sent when the backup script fails.  It uses the setlock command of daemontools to hold an exclusive lock while running the backup script (and to alert the administrator when it fails to acquire the lock).

Crontab script
MAILTO=admin@example.com
5 3 * * * cd /var/backup && exec setlock -nX /tmp/backup.lock cronlog -l /var/backup/backup.log -t -- ./backup.sh 2>&1

This is all that needs to be set up to back up LVM volumes, including those holding MySQL databases.  The log output will look like the following.

backup.log
------------------------------------------------------------------------------
[Sat Jan  9 03:05:02 2010] backup-srv starting: ./backup.sh
[Sat Jan  9 03:05:02 2010] creating snapshot...
[Sat Jan  9 03:05:07 2010]   Logical volume "lvm_dump" created
[Sat Jan  9 03:05:07 2010] running: ssh_blockdiff_dump --gzip "root@srv00" "/dev/pv/lv"...
[Sat Jan  9 03:19:22 2010] removing snapshot /dev/pv/lvm_dump...
[Sat Jan  9 03:19:23 2010]   Logical volume "lvm_dump" successfully removed
[Sat Jan  9 03:19:23 2010] backup completed successfully
(snip)
[Sat Jan  9 03:35:56 2010] creating snapshot...
[Sat Jan  9 03:35:56 2010] issuing lock statement: FLUSH TABLES WITH READ LOCK
[Sat Jan  9 03:36:00 2010]   Logical volume "lvm_dump" created
[Sat Jan  9 03:36:00 2010] issuing unlock statement: UNLOCK TABLES
[Sat Jan  9 03:36:00 2010] running: bin/ssh_blockdiff_dump --gzip "root@db00" "/dev/pv/lv"...
[Sat Jan  9 04:18:44 2010] removing snapshot /dev/pv/lvm_dump...
[Sat Jan  9 04:18:46 2010]   Logical volume "lvm_dump" successfully removed
[Sat Jan  9 04:18:46 2010] backup completed successfully
[Sat Jan  9 04:18:46 2010] command exited with code:0

The files in the backup directory will look like the following.  The .gz files contain the backup data, and the .md5 files contain the per-block checksums used for taking incremental or differential backups.

The backup files
% ls -l db00-pv-lv-201001*
-rw-r--r-- 1 backup backup 50289166539 2010-01-01 05:35 db00-pv-lv-201001.1.gz
-rw-r--r-- 1 backup backup   131072004 2010-01-01 05:35 db00-pv-lv-201001.1.md5
-rw-r--r-- 1 backup backup 10914423057 2010-01-02 04:32 db00-pv-lv-201001.2.gz
-rw-r--r-- 1 backup backup   131072004 2010-01-02 04:32 db00-pv-lv-201001.2.md5
-rw-r--r-- 1 backup backup 13648250036 2010-01-03 04:33 db00-pv-lv-201001.3.gz
-rw-r--r-- 1 backup backup   131072004 2010-01-03 04:34 db00-pv-lv-201001.3.md5
(snip)
-rw-r--r-- 1 backup backup           3 2010-01-18 04:34 db00-pv-lv-201001.ver
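
The idea behind the .md5 files is simple: the volume is read in fixed-size blocks, the checksum of each block is recorded, and on the next run only the blocks whose checksums have changed need to be transferred.  A conceptual sketch (this is not the actual blockdiff code; the 1MB block size is arbitrary):

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

use constant BLOCK_SIZE => 1024 * 1024;

# compute per-block checksums of a volume (or any file); comparing two such
# lists tells which blocks need to be included in an incremental backup
sub block_checksums {
    my $path = shift;
    open my $fh, '<:raw', $path or die "failed to open $path: $!";
    my @md5s;
    while (read($fh, my $block, BLOCK_SIZE)) {
        push @md5s, md5_hex($block);
    }
    return \@md5s;
}

my ($old, $new) = (block_checksums($ARGV[0]), block_checksums($ARGV[1]));
for my $i (0 .. $#$new) {
    print "block $i changed\n"
        if ! defined $old->[$i] or $old->[$i] ne $new->[$i];
}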

For more information, please read the source code and the accompanying documentation.

Conclusion

As can be seen, this is a powerful backup solution that can be built with minimal setup.  It will work well for a small team of experienced engineers, while it might not be suitable for large-scale deployments with many admins.  If you are interested, please give it a try.  I am looking forward to your ideas and / or suggestions.

PS. The blockdiff_merge command can be used to restore the backups.