MySQL and the XFS stack overflow problem
I have heard that there have been talks at MySQL UC on running MySQL (InnoDB) on top of XFS for better write concurrency than ext3.
Last weekend I had a chance to discuss on Twitter the XFS stack overflow problem (which became apparent this month, on x86_64 systems with 8k stacks) and whether, and how, it would affect MySQL servers. If I understand correctly, the problem is that a kernel stack overflow might occur when a dirty page needs to be written back to XFS.
The conclusion was that, to keep MySQL stable when running on XFS:
- XFS should not be used on top of LVM or any other multi-device layer (e.g. MD software RAID)
- XFS volumes should only contain ibdata files (which are accessed using O_DIRECT), so that data on XFS never exists as dirty pages in the OS page cache
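As a rough way to audit a server against the two points above, here is a minimal sketch in Python. It is only a heuristic and not something that came out of the discussion: the function names are made up for illustration, the device-name prefixes used to guess at LVM/MD are an assumption (some systems show LVM volumes under other names), and "ibdata" as the file-name prefix assumes the default naming of InnoDB system tablespace files.

```python
# Heuristic sketch only. The device prefixes and the "ibdata" file-name
# prefix are assumptions, not a definitive test.

# Device names that usually indicate device-mapper (LVM) or MD software RAID.
STACKED_PREFIXES = ("/dev/mapper/", "/dev/dm-", "/dev/md")


def xfs_mounts(mounts_text):
    """Return (device, mount_point) pairs for xfs entries in /proc/mounts-style text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "xfs":
            result.append((fields[0], fields[1]))
    return result


def stacked_xfs_mounts(mounts_text):
    """Return xfs mount points whose device name suggests LVM or MD underneath."""
    return [mount_point for device, mount_point in xfs_mounts(mounts_text)
            if device.startswith(STACKED_PREFIXES)]


def non_ibdata_files(file_names):
    """Return the file names that do not look like ibdata files (ibdata1, ibdata2, ...)."""
    return [name for name in file_names if not name.startswith("ibdata")]


# Example with made-up mount entries and file names.
sample_mounts = """\
/dev/sda1 / ext3 rw 0 0
/dev/mapper/vg0-mysql /var/lib/mysql xfs rw 0 0
/dev/sdb1 /data/innodb xfs rw 0 0
"""
print(stacked_xfs_mounts(sample_mounts))
print(non_ibdata_files(["ibdata1", "ib_logfile0", "mysql-bin.000001"]))
```

On a real machine the text would come from /proc/mounts and the file names from a listing of each XFS mount point. Note that ib_logfile0 is flagged in the example: as far as I understand, with innodb_flush_method=O_DIRECT only the data files are opened with O_DIRECT, so the InnoDB log files would still create dirty pages and should live on another filesystem.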
References:
- Togetter - Summary: "xfs stack overflow" (the discussion on Twitter, in Japanese)
- LKML: John Berthels: PROBLEM + POSS FIX: kernel stack overflow, xfs, many disks, heavy write load, 8k stack, x86-64 (ongoing discussion on LKML about how to fix the problem)
- Bug 240077 – Panic under high disk I/O (stack overflow: XFS + LVM) (a bug-report involving MySQL on the issue)
The stack problem isn't an issue these days. Use 8k stacks and you're more than fine.
Posted by: Stewart Smith | April 27, 2010 at 07:04 AM
(I should clarify that yes, the odd issue has cropped up occasionally... but usually with some huge number of disks and 4k stacks... not 8k, which is the kernel default)
Posted by: Stewart Smith | April 27, 2010 at 07:08 AM
Thank you for your comment.
As you can see from the discussion on LKML, kernel developers agree that the problem still exists on x86_64 systems with 8k stacks.
The stack overflow happens when the kernel tries to write out a dirty page to XFS in order to reclaim memory.
And when "lumpy reclaim" (a form of memory reclamation that tries to free a contiguous area of physical memory) is running, even the most recently touched pages might be reclaimed. So the problem could arise even if you periodically fsync data on XFS.
The only way to avoid the issue is to never create dirty pages on XFS at all.
Posted by: kazuho | April 27, 2010 at 08:47 AM
(Well, this is what was explained to me on Twitter by a kernel developer who is involved in the LKML discussion and looking for a way to fix the problem, and it seems logical to me.)
Posted by: kazuho | April 27, 2010 at 08:55 AM