TWC:Sysadmin info

From TWC Wiki
Revision as of 17:21, 24 September 2008 by Simetrical (talk | contribs) (lighttpd: Fix restart command so it logs the errors someplace)

I was bored and decided to put together this page of handy info for our active sysadmins, currently GED and me. I (Simetrical) started this page on September 20, 2008, and I cynically speculate that it will get updated approximately never after that, so I've tried to keep things relatively nonspecific as to exact versions and stuff, and to provide verification procedures where possible. But take it all with a grain of salt.

If you don't have at least shell access to the TWC server, this page is almost certainly useless to you. If you at least know what shell access is, you're probably capable of understanding at least some of it. Maybe you'll find it interesting; good for you.

Hardware

We currently have one dedicated server. We own none of the hardware; we rent it from The Planet on a monthly basis. It's their "Xeon 3060 - RAID" model with some adjustments (8 GB RAM is the only difference from stock, I think). Since we didn't buy the hardware, the exact models and such mostly come from orbit.theplanet.com, the magical control panel they give you, but I don't totally trust it. Some of it is of course obvious from running the right commands (cat /proc/cpuinfo, free -m, . . .).

  • "Dell Single socket 1067FSB - Quad Core Capable PowerEdge 840" motherboard
  • One dual-core Intel Xeon Conroe 3060 at 2.40 GHz
  • 8 GB RAM, DDR2 667 ECC
  • Four 250 GB, 7200 RPM SATA drives (Western Digital WD2500JD) in RAID 10, for 500 GB usable disk space (they say it's SATA II, but peak throughput from hdparm -t is just a hair under 1.5 Gbps, so I wonder . . . but it doesn't matter, anyway, since our disk throughput is orders of magnitude less)
  • "3ware 4 channel SATA2 w/ battery backup 9550SX-4LP" RAID controller
  • 2500 GB monthly bandwidth
  • Some indeterminate-speed network uplink, probably 100 Mbps? It might matter if we get another server.
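The "right commands" mentioned above can be bundled into one quick check against what the control panel claims. This is only a sketch: hw_summary is a made-up helper name, and the output depends entirely on the machine it runs on.

```shell
#!/bin/sh
# Print a short hardware summary to compare against the control panel.
# hw_summary is a hypothetical helper, not something installed on the server.
hw_summary() {
    # CPU model and logical core count, straight from the kernel.
    if [ -r /proc/cpuinfo ]; then
        echo "CPU: $(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')"
        echo "Cores: $(grep -c '^processor' /proc/cpuinfo)"
    else
        echo "CPU: unknown (no /proc/cpuinfo)"
    fi
    # Total RAM in MB, from the "Mem:" row of free -m.
    if command -v free >/dev/null 2>&1; then
        echo "RAM_MB: $(free -m | awk '/^Mem:/ {print $2}')"
    fi
    # Size and free space of the root filesystem.
    echo "Disk: $(df -hP / | awk 'NR==2 {print $2 " total, " $4 " free"}')"
}

hw_summary
```

None of this needs root. Disk throughput (hdparm -t) does, and it also needs the device name, which isn't recorded on this page.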

Software

Linux

The "L" in "LAMP". We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way. (Some people who currently have root access might hold different opinions.) In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie.

The exact version is Red Hat Enterprise Linux 5. The exacter version, from /etc/redhat-release, is "Red Hat Enterprise Linux Server release 5.1 (Tikanga)". uname -a gives "Linux odin.twcenter.net 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux".

Red Hat sucks and I wish we had used Ubuntu. The packages are paleolithic here. 2.6.18 was released in September 2006. I want large argument lists, dammit! But The Planet doesn't offer Ubuntu as an option, and I had enough trouble repartitioning over SSH without having to try installing a new OS too . . .
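For context on the argument-list complaint: kernels before 2.6.23 cap the combined size of arguments plus environment at a fixed 128 KiB, while later kernels tie the limit to the stack size. A small sketch of checking the limit and of the usual workaround, batching through xargs instead of expanding a huge glob. The temporary directory and file names exist only for the demo.

```shell
#!/bin/sh
# Maximum combined size (bytes) of argv + environment accepted by exec().
getconf ARG_MAX

# Workaround that is independent of the limit: let find hand file names to
# xargs, which splits them into batches that fit.
# The directory and files below are hypothetical, created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b.log" "$demo_dir/c.txt"
count=$(find "$demo_dir" -name '*.log' -print0 | xargs -0 ls | wc -l | tr -d ' ')
echo "matched $count log files"
rm -rf "$demo_dir"
```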

lighttpd

The "A" in "LAMP". Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP". We used to use Apache, but it used way too much memory. We switched in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back. Fiddling around with random stuff for fun can pay off sometimes.

lighttpd's config file is at /etc/lighttpd/lighttpd.conf (history). Documentation is at the lighttpd website (config file docs are the most useful). The binary is at /usr/sbin/lighttpd. lighttpd runs as user lighttpd. Access logs are at /var/log/lighttpd/access.log*, automagically rotated by logrotate. The error log is at /var/log/lighttpd-error, apparently because when I set it up I was feeling pig-headed about silly things like file naming conventions. A handy web-based server status thing is available in the secret stuff thread in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).

To restart lighttpd in our current setup, for instance due to new lighttpd.conf or php.ini, I use this:

sudo killall php-cgi && sudo service lighttpd restart 2>/var/log/fastcgi/error.log

This 1) kills the FastCGI processes (probably not necessary if you don't need to restart FastCGI too), 2) restarts lighttpd using the system restart scripts, and 3) redirects stderr for (2) to an appropriate file because FastCGI seems to spam your terminal with PHP warnings forever after if you don't. (I used to redirect it to /dev/null, but apparently error logs are actually useful, who'd have thought? The errors logged aren't dated, but are still useful with tail -f.)
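Three details of the one-liner are worth knowing. The 2> redirection is performed by your own shell, not by sudo, so it only works if your user can write to the log. The && skips the lighttpd restart whenever killall finds no php-cgi processes. And 2> truncates the log on every restart. The following is a sketch of a variant that sidesteps all three, not the command actually in use; restart_web and DRY_RUN are made-up names, and it only prints the command unless told otherwise.

```shell
#!/bin/sh
# Variant of the restart one-liner. restart_web and DRY_RUN are hypothetical
# names; the commands and the log path are the ones quoted above.
LOG=/var/log/fastcgi/error.log

restart_web() {
    # Running everything under "sudo sh -c" makes root open the log file.
    # ';' instead of '&&' restarts lighttpd even if no php-cgi was running.
    # '2>>' appends to the log instead of truncating it.
    cmd="killall php-cgi; service lighttpd restart 2>>$LOG"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "sudo sh -c '$cmd'"
    else
        sudo sh -c "$cmd"
    fi
}

# Dry run by default: prints the command. Set DRY_RUN=0 to really restart.
restart_web
```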

Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something. So don't get too trigger-happy. If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother. Restarts of lighttpd are fairly rare anyway.

lighttpd usually uses a few percent CPU and a few hundred megs of memory, in my experience. It runs on a single thread and doesn't run any scripts itself, so this is pretty reasonable. I wonder what the few hundred megs are for, actually, but thinking back to Apache with mod_php I'm not even going to bother looking into it.

The version is currently:

$ lighttpd -v
lighttpd-1.4.18 (ssl) - a light and fast webserver
Build-Date: Oct  1 2007 23:50:36

This is because it's installed from packages and not compiled from source, and as noted, RHEL 5 has pretty old packages. I've run into a couple of annoying limitations in 1.4 that are fixed in 1.5, but not enough to make me want to bother installing from source, or finding a package repo that has a more recent version.

MySQL

This is the "M" in LAMP. MySQL is nice in some ways: the server process is very resilient and tends not to crash or go into hysterics, and if it does, it automatically restarts. In terms of features, such as the ability to optimize queries or use indexes in a halfway-intelligent fashion, it's probably worse than PostgreSQL. But vBulletin only supports MySQL, like lots of other web apps, so that's what we use.

Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine). "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares". InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads. Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway. (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
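The "everywhere except phrase" claim can be verified through information_schema, which MySQL 5.0 provides. A minimal sketch: non_innodb_query is a made-up helper that only prints the query, and how you authenticate to the server is up to you.

```shell
#!/bin/sh
# Build a query listing every table in a schema that is NOT InnoDB.
# non_innodb_query is a hypothetical helper name.
non_innodb_query() {
    printf "SELECT table_name, engine FROM information_schema.tables WHERE table_schema = '%s' AND engine <> 'InnoDB';\n" "$1"
}

# Prints the query. On the server, pipe it to the client, for example:
#   non_innodb_query totalwar_vb | mysql -u root -p
non_innodb_query totalwar_vb
```

If the description above is still accurate, the only row that should come back for totalwar_vb is phrase.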

MySQL's config file is at /etc/my.cnf (history). Its general-purpose log is at /var/log/mysql (or /var/log/mysqld.log, which is a symlink to that). The slow query log is at /var/log/mysql-slow. (See previous remarks about log naming conventions, or lack thereof.) The slow query log is probably semi-useless since we don't have the microslow patch. The actual databases are in /var/lib/mysql, which is on its own partition in LVM.

To restart MySQL, you just need to do service mysql restart. This is only necessary after config file changes, and usually not even then, since you can change most settings live. Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load: InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while.
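Changing a setting live means SET GLOBAL. As a sketch: max_connections is one of the variables MySQL 5.0 lets you change at runtime, whereas innodb_buffer_pool_size is not, so resizing the buffer pool still needs the restart. The helper name set_global_sql and the value 500 are made up for illustration.

```shell
#!/bin/sh
# Build a SET GLOBAL statement for a dynamic server variable.
# set_global_sql is a hypothetical helper; 500 is an arbitrary example value.
set_global_sql() {
    printf 'SET GLOBAL %s = %s;\n' "$1" "$2"
}

# Prints the statement. On the server, pipe it to the client, for example:
#   set_global_sql max_connections 500 | mysql -u root -p
set_global_sql max_connections 500
```

A value set this way is lost at the next restart unless /etc/my.cnf is updated to match.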

MySQL, being a database, loves RAM, loves it very, very much. The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 4G, fully half of our RAM. Miscellaneous other stuff means that MySQL usually uses around 4.5G. It will usually use a decent fraction of the CPU too, and of course lots and lots of disk space.

We use the packaged version of MySQL, to wit:

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (x86_64) using readline 5.0

I've been pondering getting Percona's patched 5.0 versions, for their nifty features like the enhanced microslow patch plus the extra features added since 5.0.22 (like SHOW PROFILE), but I really can't justify the possible downtime or other issues that might arise from an upgrade.

FastCGI

This is the "P" in LAMP. Unlike with lighttpd above, this isn't due to creative spelling. "P" stands for "PHP", and we run PHP using FastCGI. FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it. In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but that means every Apache process carries a full PHP interpreter even when it's only serving static files, which is a terrible idea for a general-purpose web server.

I don't actually have much of any idea how FastCGI works. I just configured lighttpd to use FastCGI and it handles spawning the processes itself. To restart FastCGI, I just use the command above to restart lighttpd. I mentally categorize FastCGI with lighttpd as "the web server". The php-cgi processes don't seem to die when lighttpd does, but I'm not sure whether a restarted lighttpd reuses the pre-existing ones or spawns new ones or what. I haven't tested. So this is mostly voodoo magic to me. If we get another server, I'll probably have to figure out how to actually use FastCGI instead of getting lighttpd to do it all for me.

The processes are /usr/bin/php-cgi. They run as the lighttpd user right now, since lighttpd spawns them after it setuid()s. PHP is configured in /etc/php.ini (history). It logs warnings and errors to /var/log/php_error. I don't know how to configure FastCGI itself; I just set all the relevant stuff (like number of processes) in lighttpd's config file. Some info on the status of XCache (which caches compiled PHP scripts and, optionally, PHP variables between requests) is in the secret stuff thread in the Tech Cathedral, since it allows viewing and even deleting or changing the values of the cached variables.

Each FastCGI daemon seems to use about 70M shared memory and 40M private memory. The private memory is largely due to memory_limit in php.ini. This limit has the delightful behavior that if it's too low, PHP processes will artificially OOM even if there's plenty of free memory; whereas if it's too high, as far as I know, the extra memory will go totally unused, since I don't think it's free()d between requests. That seems exceptionally stupid; hopefully I'm wrong on the second part. Anyway, it's a fair amount of memory all told, although a lot less than MySQL.
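The per-daemon figures can be rechecked by summing resident memory over the php-cgi processes. A sketch, with sum_rss as a made-up helper name; it relies on the procps form of ps (the -C option), which is what Linux ships.

```shell
#!/bin/sh
# Read one RSS value (in KB) per line on stdin and print the number of
# processes and their total in MB. sum_rss is a hypothetical helper name.
sum_rss() {
    awk '{ total += $1; n++ } END { printf "%d procs, %d MB RSS\n", n, total / 1024 }'
}

# One RSS value per php-cgi process, no header line. RSS counts shared pages
# once per process, so the total overstates the memory really in use.
(ps -C php-cgi -o rss= 2>/dev/null || true) | sum_rss
```

Because the shared part is counted again for every process, read the total as an upper bound rather than as memory that would be freed by killing the daemons.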

The really fun bit is CPU. PHP uses a crazy lot of CPU. By which I mean, all of it, and then ten or twenty times more than that during peak load. (More precisely, this CPU usage is due to vBulletin, which of course is written in PHP.) We might want to get another server solely to provide a home for some of the FastCGI processes. This is our bottleneck right now.
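"All of it, and then ten or twenty times more" presumably refers to the load average, which counts processes that are running or waiting to run; on this dual-core box that would mean a load of roughly 20 to 40 at peak. That reading is an assumption, not something stated above. A sketch for watching it, with load_per_core as a made-up helper name:

```shell
#!/bin/sh
# Divide a load average by the number of cores; about 1.0 means fully busy.
# load_per_core is a hypothetical helper name.
load_per_core() {
    awk -v l="$1" -v c="$2" 'BEGIN { if (c > 0) printf "%.1f\n", l / c; else print "n/a" }'
}

# Linux only: 1-minute load average and logical core count from /proc.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)
    cores=$(grep -c '^processor' /proc/cpuinfo || true)
    echo "load per core: $(load_per_core "$load" "$cores")"
fi
```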

We use the packaged PHP version:

$ php --version
PHP 5.1.6 (cli) (built: Jun 12 2008 05:02:36) 
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
    with XCache v1.2.2, Copyright (c) 2005-2007, by mOo