[Rack] noisebridge.net sloooow

Wed Feb 20 01:06:07 UTC 2013

Restarting apache is not the answer; that's just a stupid way to deal with
memory leaks.  We already have dynamic threadpool spawning that regularly
cycles the apache processes.  The real issue is that our apache performance
tuning limits are wrong for high incoming load.  Under load, apache
overloads the machine instead of making the incoming connections wait.
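
To illustrate the kind of limit in question: the worker cap should be the
memory budget divided by the per-worker footprint, so that extra connections
queue instead of spawning more processes.  A minimal sketch in Python; the
500 MiB budget mirrors the cgroup limit quoted below, while the 25 MiB
per-worker figure and the function name are assumptions for illustration,
not measurements from this host.

```python
def max_workers(budget_mib, per_worker_mib, reserve_mib=0):
    """Largest worker count whose combined footprint fits in the budget.

    budget_mib     -- memory we are willing to give apache as a whole
    per_worker_mib -- assumed resident size of one worker process
    reserve_mib    -- headroom kept back for the parent process, caches, etc.
    """
    if per_worker_mib <= 0:
        raise ValueError("per_worker_mib must be positive")
    usable = budget_mib - reserve_mib
    # Always allow at least one worker, otherwise nothing gets served.
    return max(usable // per_worker_mib, 1)


if __name__ == "__main__":
    # 500 MiB budget, hypothetical 25 MiB per apache worker.
    print(max_workers(500, 25))
```

A number like this is what would go into apache's MaxClients setting; the
per-worker size has to be measured on the real box before trusting it.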

Setting a cgroup memory limit on apache limits the damage a DoS can do to 
the rest of the host.  I've already started testing some config changes to 
see if I can keep things from exploding.
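
For anyone following along, setting the limit amounts to writing a byte count
into the group's memory.limit_in_bytes file.  A rough sketch in Python,
assuming the cgroup v1 memory controller is mounted at /cgroup as in the
paths quoted below; the helper names are made up for illustration, and
treating "500MB" as 500 MiB is my assumption.

```python
import os

CGROUP_ROOT = "/cgroup"  # mount point taken from the paths in this thread


def set_memory_limit(group, limit_mib, root=CGROUP_ROOT):
    """Write a memory limit (in MiB) for one cgroup; returns the byte value.

    Needs root on a real system. Assumes the group directory already exists.
    """
    limit_bytes = limit_mib * 1024 * 1024
    path = os.path.join(root, group, "memory.limit_in_bytes")
    with open(path, "w") as f:
        f.write(str(limit_bytes))
    return limit_bytes


def usage_kib(group, root=CGROUP_ROOT):
    """Read current memory use of one cgroup, converted to KiB."""
    path = os.path.join(root, group, "memory.usage_in_bytes")
    with open(path) as f:
        return int(f.read().strip()) // 1024
```

Calling set_memory_limit("apache2", 500) as root would apply the limit, and
usage_kib("apache2") reads back the current use for comparison.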

-ben

On Tue, 19 Feb 2013, Matt Long wrote:

> I'm not totally familiar with how the server is set up, but monit would be a good last resort to
> have apache automatically restarted whenever memory usage goes over a threshold.
> Forgive me if this is a non-sequitur to the discussion :)
> 
> On Tue, Feb 19, 2013 at 4:06 PM, Ben Kochie <ben at nerp.net> wrote:
>       Ok, here's what I've done so far.
>
>       I've added memory cgroup support to most of the init scripts that matter.
>
>       There's now a script for checking memory use by process tree:
>       get-cgroup-memory-use.sh
>
>       11216 KiB - /cgroup/bind9/memory.usage_in_bytes
>       680 KiB - /cgroup/clamsmtp/memory.usage_in_bytes
>       4252 KiB - /cgroup/posfix/memory.usage_in_bytes
>       69432 KiB - /cgroup/mysql/memory.usage_in_bytes
>       80396 KiB - /cgroup/mailman/memory.usage_in_bytes
>       175448 KiB - /cgroup/clamav-daemon/memory.usage_in_bytes
>       51168 KiB - /cgroup/apache2/memory.usage_in_bytes
>
>       The last one output by the script is the system total.
>       1048708 KiB - /cgroup/memory.usage_in_bytes
>
>       I've manually set a memory limit of 500MB for apache2.
>
>       This in theory should catch apache2 going over memory limits.  We can improve the
>       memory cgroup settings in the future.
>
>       The one difficulty is that this does not work well for upstart jobs, since the
>       scripting for upstart is crappy.
>
>       -ben
>
>       On Tue, 19 Feb 2013, Ben Kochie wrote:
>
>             Fucking crap.  I've started testing using memory cgroups to limit apache
>             memory use to keep it from blowing up the machine.
>
>             I'm also testing adjustments to the oom killer to make apache the more
>             likely target.
>
>             I'll likely create some scripts to automatically deal with this.
>
>             -ben
>
>             On Tue, 19 Feb 2013, Andy Isaacson wrote:
>
>                   On Tue, Feb 19, 2013 at 01:10:15PM -0800, Jonathan Lassoff wrote:
>                         Something's awfully slow with the wiki again.
>
>                         Halp.
> 
>
>                   Page loads were taking 10 seconds each.
>
>                   memcached got OOMkilled.  Restarted, loads back down to 300
>                   ms again.
>
>                   Thanks for noticing.  Can we get an external latency
>                   monitoring system up and running again?  Both TCP and
>                   HTTPS GET latency would be useful.
>
>                   -andy
>                   _______________________________________________
>                   Rack mailing list
>                   Rack at lists.noisebridge.net
>                   https://www.noisebridge.net/mailman/listinfo/rack
>