
April 26, 2010

MySQL and the XFS stack overflow problem

I have heard that there have been talks at MySQL UC on running MySQL (InnoDB) on top of XFS for better write concurrency than ext3.

Last weekend I had a chance to discuss on Twitter the xfs stack overflow problem (which became apparent this month on x86_64 systems with 8k stacks) and how, if at all, it would affect MySQL servers. If I understand correctly, the problem is that a stack overflow might occur when a dirty page needs to be paged out to xfs.

The conclusion was that for stability of MySQL running on xfs:

  1. xfs should not be used on top of LVM or any other MD layer (software RAID, etc.)
  2. xfs volumes should contain only ibdata files (which are accessed using O_DIRECT), so that files on xfs never exist as dirty pages in the OS page cache
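
As a rough illustration of the second point, a my.cnf fragment along the following lines would keep only the InnoDB system tablespace on xfs. The mount points /mnt/xfs and /mnt/ext3 are hypothetical, and the sketch assumes innodb_file_per_table is off so that all table data lives in the ibdata files. It is an untested sketch, not a recommendation.

```ini
[mysqld]
# Hypothetical mount points; adjust to the actual layout.
# Everything except the ibdata files stays off xfs.
datadir                   = /mnt/ext3/mysql
innodb_log_group_home_dir = /mnt/ext3/mysql

# Only the InnoDB system tablespace (ibdata files) lives on the xfs volume.
innodb_data_home_dir      = /mnt/xfs/mysql
innodb_data_file_path     = ibdata1:10M:autoextend

# Open InnoDB data files with O_DIRECT, bypassing the OS page cache.
innodb_flush_method       = O_DIRECT
```

With this flush method the InnoDB log files are still written through the page cache, which is why the sketch keeps them off the xfs volume.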

Comments

The stack problem isn't an issue these days. Use 8k stacks and you're more than fine.

(I should clarify that yes, the odd issue has cropped up occasionally... but usually with some huge number of disks and 4k stacks... not 8k, which is the kernel default.)

Thank you for your comment.

As you can see from the discussion on LKML, kernel developers agree that the problem still exists on x86_64 systems with 8k stacks.

The stack overflow happens when the kernel tries to page out a dirty page to XFS to reclaim memory.

And when "lumpy reclaim" (a form of memory reclamation that tries to free a contiguous area of physical memory) is running, even the most recently touched pages might be reclaimed. So the problem could arise even if you periodically fsync data on XFS.

The only way to avoid the issue is to never create dirty pages on XFS at all.

(Well, this is what was explained to me on Twitter by a kernel developer who is involved in the LKML discussion and looking for a way to fix the problem, and it seems logical to me.)
