[Rack] noisebridge.net downtime

Andy Isaacson adi at hexapodia.org
Mon Mar 19 03:02:07 UTC 2012

The VPS hosting the wiki and the mailing lists was unresponsive for a
few hours yesterday.  It went into swap hell, apparently because of a
large number of active Apache processes running PHP.

A few days back I noticed that Apache was very slow to serve requests,
and I diagnosed the problem as running out of workers due to "MaxClients
20" in apache2.conf.  I increased the setting to 90 and that seemed to
resolve that episode of slow response times.  Notably, the VPS itself was
not at all sluggish during this episode; only Apache was slow to respond,
because of its configuration and how it was interacting with clients.

Last night I noticed that the wiki was slow again, and when I tried to
log in I found the whole VPS was very slow; a classic swap storm was in
effect, and I pretty much immediately realized I'd caused this with the
previous change.  I decreased MaxClients to 40 and restarted Apache.
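
For what it's worth, a rough way to sanity-check a MaxClients value
against memory is to divide the RAM left over for Apache by the typical
resident size of one worker.  Below is a minimal sketch in Python.  The
helper name estimate_max_clients and all of the numbers are hypothetical
and are not measurements from this VPS; real worker sizes vary a lot
with PHP, so treat the result as a starting point only.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_worker_rss_mb):
    """Return how many workers fit in RAM without pushing the box into swap.

    total_ram_mb      -- physical RAM on the machine (assumed figure)
    reserved_mb       -- RAM set aside for the OS and other services
    avg_worker_rss_mb -- typical resident size of one Apache/PHP worker
    """
    if avg_worker_rss_mb <= 0:
        raise ValueError("avg_worker_rss_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        # Nothing left for Apache at all.
        return 0
    # Round down: one worker too many is what starts the swapping.
    return int(available_mb // avg_worker_rss_mb)


# Hypothetical example: 2048 MB of RAM, 512 MB reserved, 40 MB per worker.
print(estimate_max_clients(2048, 512, 40))  # prints 38
```

With those made-up figures a setting of 90 would be far over budget,
while something near 40 would roughly fit.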

The wiki was unavailable from about 23:15 Saturday until 02:05 Sunday.

Looking at the graphs on status.noisebridge.net (a separate server run
by Dr Jesus, thanks!), it looks like we had another latency burst
lasting about an hour around noon today.  This seems to correlate with a
burst of Apache traffic around the same time and was probably due to the
same kind of MaxClients limit as the first-mentioned slowness episode.

I'll continue to keep an eye on the situation.


