
August 2009

August 26, 2009

Picoev: a tiny event loop for network applications, faster than libevent or libev

I am sure many programmers writing network applications have their own abstraction layers hiding the differences between the various I/O multiplexing APIs, like select(2), poll(2), epoll(2), ...  And of course, I am one of them.  While writing mycached (see Mycached: memcached protocol support for MySQL for more information), I was at first considering using libev for multiplexing socket I/O.  Libevent was not an option since it does not (yet) provide multithreading support.

But learning how to use libev was a great pain for me.  I do not mean that it is an ugly product.  In fact, I think it is a very well written, excellent library.  However, already being familiar with the problems it tries to hide, learning how things are abstracted was too boring a task for me.

So instead I thought it might be a good occasion to write my own library that could be used in any program I may write in the future.  The result is picoev, and it is faster than libevent or libev!  The benchmark used is a modified version of the one taken from libev.schmorp.de/bench.html and can be found here.

Why is it faster?  Because it uses an array and a ring buffer of bit vectors as its internal structures.  Libevent and libev seem to use some kind of sorted tree to keep track of file descriptors.  However, if we concentrate on Un*x systems, there is a guarantee that descriptors will be small non-negative integers.  Picoev utilizes this fact and stores the information related to file descriptors (such as pointers to callback functions and callback arguments) in an array indexed by the descriptor itself, resulting in faster manipulation of socket states.
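To illustrate the idea, here is a minimal sketch of an fd-indexed table (my own illustration, not picoev's actual definitions; the struct fields and function names are assumptions):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: per-descriptor state kept in a flat array
 * indexed by the fd itself.  On Un*x, socket() and accept() return
 * the smallest unused non-negative integer, so the array stays dense. */
#define MAX_FDS 1024

typedef void (*callback_t)(int fd, int events, void *cb_arg);

typedef struct {
    callback_t callback; /* NULL means the slot is unused */
    void *cb_arg;
    int events;          /* requested event mask */
} fd_entry;

static fd_entry fd_table[MAX_FDS];

/* registering and removing a descriptor are O(1) array writes;
 * no tree rebalancing or hashing is involved */
static void add_fd(int fd, int events, callback_t cb, void *cb_arg) {
    fd_table[fd] = (fd_entry){ cb, cb_arg, events };
}

static void del_fd(int fd) {
    memset(&fd_table[fd], 0, sizeof fd_table[fd]);
}
```

Looking up the callback for a descriptor returned by epoll_wait(2) is then a single array access, which is the whole point of the design.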

Another optimization used by picoev is to avoid an ordered tree for keeping timeout information.  Generally speaking, most network applications do not require accurate timeouts, so a ring buffer (a sliding array) of bit vectors can be used instead.  Each bit vector represents the set of file descriptors that time out at a given second.  Picoev uses 128 bit vectors: the first represents the sockets that time out one second from now, the second those that time out two seconds from now, and so on, and the vectors slide every second.  If the maximum timeout required by the application is greater than 128 seconds, the minimum granularity of the timeouts becomes two seconds.
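The sliding window of bit vectors can be sketched like this (again my own illustration under assumed names, not picoev's actual code):

```c
#include <assert.h>

/* A minimal timeout wheel: slot (base_slot + secs) % RING_SIZE holds
 * one bit per file descriptor.  Once a second the current slot is
 * flushed (those fds have timed out) and the window slides forward. */
#define RING_SIZE 128
#define MAX_FDS   1024
#define WORD_BITS 32

static unsigned ring[RING_SIZE][MAX_FDS / WORD_BITS];
static int base_slot; /* the slot representing "now" */

/* arm a timeout of `secs` seconds (secs < RING_SIZE) for `fd` */
static void set_timeout(int fd, int secs) {
    int slot = (base_slot + secs) % RING_SIZE;
    ring[slot][fd / WORD_BITS] |= 1u << (fd % WORD_BITS);
}

static int has_timeout(int slot, int fd) {
    return (ring[slot][fd / WORD_BITS] >> (fd % WORD_BITS)) & 1;
}

/* called once a second: returns the bit vector of expired fds
 * and advances the window */
static unsigned *tick(void) {
    unsigned *expired = ring[base_slot];
    base_slot = (base_slot + 1) % RING_SIZE;
    return expired;
}
```

Both arming a timeout and the per-second slide are O(1) (plus a scan of one bit vector when it expires), whereas an ordered tree pays a logarithmic cost on every update.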

I would like to reiterate that both libevent and libev are great libraries.  Picoev is not comparable to them, especially in maturity and the number of features.  It only supports select(2), epoll(2), and kqueue(2) for the time being.  However, its design is simpler than that of the other two, and I think it will be a good starting point for writing network applications, or a basis for writing one's own network libraries.

If you are interested in using picoev, the source code can be found at coderepos.org/share/browser/lang/c/picoev/.

Mycached: memcached protocol support for MySQL

It is a well-known fact that the bottlenecks of MySQL do not lie in its storage engines, but rather in the core, for example, its parser and execution planner.  Last weekend I started to wonder how fast MySQL could be if those bottlenecks were skipped.  Not being able to stop my curiosity, I started adding memcached protocol support to MySQL as a UDF.  And that is Mycached.

From what I understand, there are two advantages of using mycached (or the memcached protocol, in general) over using SQL.  One is faster access.  The QPS (queries per second) of mycached is roughly 2x compared to using SQL.  The other is higher concurrency.  As can be seen in the chart below, mycached can handle thousands of connections simultaneously.

The lack of these advantages has been the reason for large-scale web applications to use memcached, even though it requires application developers to take care of keeping the data consistent between mysql and memcached.  In other words, mycached could become a good alternative here.

Mycached is still at a very early stage of development.  I would never recommend using the current version in a production environment, but if you are interested, the details of how to use it are as follows.

The source code of mycached is available at coderepos.org/share/browser/platform/mysql/mycached.  There is no Makefile or configure script yet; use GCC directly to create the UDF.  The source code of mysql is needed and its directories should be passed as include paths (you need to run make on the mysql source tree prior to compiling mycached).

% g++ -DMYCACHED_USE_EPOLL=1 -shared -fPIC -Wall -g -O2 -I ~/test/mysql-5.1.37/src/include -I ~/test/mysql-5.1.37/src/sql -I ~/test/mysql-5.1.37/src/regex mycached_as_udf.cc -o mycached_as_udf.so

Once the compilation succeeds, copy the shared library into the plugin directory under the lib directory of mysql, and use CREATE FUNCTION statements to activate the necessary UDFs.  To start mycached, call mycached_start.  The function takes three arguments: host, port, and the number of worker threads to be used for handling requests.  To stop mycached, call mycached_stop().

mysql> CREATE FUNCTION mycached_start RETURNS INT SONAME 'mycached_as_udf.so';
mysql> CREATE FUNCTION mycached_stop RETURNS INT SONAME 'mycached_as_udf.so';
mysql> SELECT mycached_start(0, 11211, 4);

The only memcached command currently supported by mycached is get (but you can do other things using SQL :-p).  As the following example shows, specify one or more db_name.table_name.primary_key keys, separated by whitespace, to fetch rows of a table.  The returned value will be a repetition of columns, each in the format column_name:value_length:value.  It is also possible to request JSON as the response format by appending ":JSON"; however it will be slower, since the server needs to convert the values of the columns to UTF-8 and escape them.

$ telnet localhost 11211
Connected to localhost.localdomain (
Escape character is '^]'.
get test.t1.1
VALUE test.t1.1 0 68
id:1:1message:51:hello world! hello world! hello world! hello world!
get test.t1.1:json
VALUE test.t1.1 0 72
{"id":1,"message":"hello world! hello world! hello world! hello world!"}
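A client receiving the non-JSON form has to walk the concatenated columns itself.  Here is a sketch of such a decoder (illustration only, not code from mycached; the function name is mine):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode the column_name:value_length:value repetition shown above.
 * Prints each column as name=value and returns the number of columns,
 * or -1 on malformed input. */
static int decode_columns(const char *buf) {
    int n = 0;
    while (*buf != '\0') {
        const char *colon = strchr(buf, ':');
        if (colon == NULL)
            return -1;
        int name_len = (int)(colon - buf);
        char *end;
        long value_len = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != ':' || value_len < 0)
            return -1;
        const char *value = end + 1;
        if ((long)strlen(value) < value_len)
            return -1; /* truncated value */
        printf("%.*s=%.*s\n", name_len, buf, (int)value_len, value);
        buf = value + value_len; /* next column starts right after */
        ++n;
    }
    return n;
}
```

Applied to the payload in the example above ("id:1:1message:51:hello world! ..."), it yields the two columns id=1 and the 51-byte message.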

Thank you for reading.  If you have any ideas or suggestions, please leave a comment.

August 18, 2009

Test::postgresql: a module that automatically sets up a PostgreSQL instance for Perl tests

I have written Test::postgresql, a sister module of the Test::mysqld module I created earlier (see: Test::mysqld, a module that automatically sets up a MySQL environment for Perl tests).  With this module, you can build a temporary, test-only PostgreSQL instance inside your test code and run your tests against it.  The instance is removed automatically when the tests finish.

use DBI;
use Test::postgresql;
use Test::More;

my $pgsql = Test::postgresql->new(
    initdb_args => $Test::postgresql::Defaults{initdb_args} . ' --encoding=utf8',
) or plan skip_all => $Test::postgresql::errstr;

plan tests => 10;

my $dbh = DBI->connect(
    "DBI:Pg:...;port=" . $pgsql->port
);

... test code follows ...

If you would like to use it, it is available from CPAN or coderepos.

August 07, 2009

I will talk about "How to build a scalable web application in 20 minutes" at YAPC::Asia 2009

I have been writing various entries related to MySQL and Perl lately, and the reason is that I want to make it easy to write web applications that are both scalable and easy to administer.

However, blog entries inevitably end up fragmented, so I have decided to give a talk at YAPC::Asia 2009 on how to combine this series of modules and programs to build a scalable web application.

YAPC::Asia 2009 will be held over two days, September 10 (Thu) and 11 (Fri), at the Ookayama campus of Tokyo Institute of Technology.  Ticket sales started today, so if you are interested, I hope you will come.

August 06, 2009

Deployment of MySQL using daemontools, XtraBackup

I am sure many people have already done similar things, but to ease my pain of setting up mysqld in a large-scale environment (I am trying to create a set of database nodes, each consisting of a MySQL failover cluster using semi-sync replication, that can be administered easily), I have just finished writing a deployment script called mysqld_jumpstart.  Its features are:

  • integration with daemontools (mysqld is automatically started)
  • setup of masters and slaves
  • can setup slaves from backup data generated by XtraBackup

The last feature was the one I especially needed.  Thanks to the people at Percona, things have become much easier with XtraBackup (or with InnoDB Hot Backup), since there is no longer any need to detach a mysqld before creating a backup.

Setting up and starting a master database:

# ssh root@ mysqld_jumpstart \
--mysql-install-db=/usr/local/mysql/scripts/mysql_install_db \
--mysqld=/usr/local/mysql/bin/mysqld --base-dir=/var/my_webapp/mysql \
--data-dir=/var/datadrive/my_webapp --server-id=10 \
--replication-user=repl --replication-password=replpass

To create and start a slave from a backup created by XtraBackup (specified by --data-dir):

# ssh root@ mysqld_jumpstart \
--mysql-install-db=/usr/local/mysql/scripts/mysql_install_db \
--mysqld=/usr/local/mysql/bin/mysqld --base-dir=/var/my_webapp/mysql \
--data-dir=/var/datadrive/my_webapp --server-id=10 \
--replication-user=repl --replication-password=replpass \
--replication-network='' \
--master-host= --from-innobackupex

Mysqld_jumpstart can be downloaded from coderepos.org/share/browser/platform/mysql/mysqld_jumpstart.  Use --help to find out how to use the script.  FYI, to run the tests, you need to apply a patch to XtraBackup.

August 04, 2009

Test::mysqld: a module that automatically sets up a MySQL environment for Perl tests

When writing test cases for ORMs and web-application libraries, access to an RDBMS is required.  However, compared to a standalone database like SQLite, writing tests that connect to a server-based database like MySQL is inconvenient in many ways: you have to worry about existing MySQL privilege settings and database names.  So I wrote Test::mysqld, a Perl module that automatically creates a MySQL instance in a temporary directory and removes it when the tests finish.  It is used like this:

use DBI;
use Test::mysqld;
use Test::More;

my $mysqld = Test::mysqld->new(
    my_cnf => { 'skip-networking' => '' }, # do not use TCP connections
) or plan skip_all => $Test::mysqld::errstr;

plan tests => 10;

my $dbh = DBI->connect(
    "DBI:mysql:$db_name;user=root;mysql_socket=" . $mysqld->my_cnf->{socket},
);

... test code follows ...

Because a mysqld server dedicated to the tests is started, there is no risk of destroying existing data.  You are free to create and drop databases.  Also, since the instance is not affected by the configuration of any existing mysqld, there is no need to worry about its privilege settings either.  When the tests finish, the created mysqld instance is removed automatically.

In conclusion: with Test::mysqld, tests that access MySQL can be written (almost) as easily as with SQLite.  If you would like to use it, it is available from CPAN or coderepos.

Update (2009/8/5): Along with the cleanup in Test-mysqld-0.04 of the handling when the required mysql-related programs cannot be found, I have updated the sample code.