Difference between revisions of "TWC:Sysadmin info"

From TWC Wiki
m (lighttpd: Config files are not executable, kthx)
(Update to reflect events of last decade)
Line 2: Line 2:
  
 
If you don't have at least shell access to the TWC server, this page is almost certainly useless to you.  If you at least know what shell access ''is'', you're probably capable of understanding at least some of it.  Maybe you'll find it interesting, good for you.
 
If you don't have at least shell access to the TWC server, this page is almost certainly useless to you.  If you at least know what shell access ''is'', you're probably capable of understanding at least some of it.  Maybe you'll find it interesting, good for you.
 +
 +
Rewritten on January 24, 2010.  (It indeed was updated approximately never.)
  
 
==Hardware==
 
==Hardware==
We currently have one dedicated server.  We own none of the hardware, we rent it from [http://www.theplanet.com The Planet] on a monthly basis.  It's their "Xeon 3060 - RAID" model with some adjustments.  8 GB RAM is the only difference from stock, I think.  Since we didn't buy the hardware, the exact models and stuff are mostly from orbit.theplanet.com, the magical control panel they give you, but I don't totally trust it.  Some of it is of course obvious from running the right commands (<code>cat /proc/cpuinfo</code>, <code>free -m</code>, . . .).
+
We currently have one dedicated server.  We own all of the hardware, bought with hard-earned pennies mostly donated by the membership in a 2009 donation drive.  I've provided the command necessary to get info about the component in parentheses, where applicable.
  
* "Dell Single socket 1067FSB - Quad Core Capable PowerEdge 840" motherboard
+
* [http://www.newegg.com/Product/Product.aspx?Item=N82E16816152093 TYAN B4980G24V4H 1U Barebone Server NVIDIA nForce Professional 3600 Quad 1207(F) Four AMD Opteron 1.0GHz Hyper Transport FSB - Retail]
* One dual-core Intel Xeon Conroe 3060 at 2.40 GHz
+
* Four Quad-Core AMD Opteron(tm) Processor 8346 HE, 1.8 GHz (<code>cat /proc/cpuinfo</code>)
* 8 GB RAM, DDR2 667 ECC
+
* 16 GB RAM (<code>free -m</code>)
* Four 250 GB, 7200 RPM SATA drives (Western Digital WD2500JD) in RAID 10, for 500 GB usable disk space (they say it's SATA II, but peak throughput from <code>hdparm -t</code> is just a hair under 1.5 Gbps, so I wonder . . . but it doesn't matter, anyway, since our disk throughput is orders of magnitude less)
+
* Two 500 GB WDC WD5000AACS-0 (apparently [http://www.newegg.com/Product/Product.aspx?Item=N82E16822136149 this]?) for disks, in Linux software RAID1 (<code>scsiadd -p</code>)
* "3ware 4 channel SATA2 w/ battery backup 9550SX-4LP" RAID controller
+
 
* 2500 GB monthly bandwidth
+
Our bandwidth is $50/Mbps/month at the 95th percentile, and we're paying for 10 Mbps uncapped. In principle, if we use more than 10 Mbps at the 95th percentile, we get charged more than $500 for the month.
* Some indeterminate-speed network uplink, probably 100 Mbps? It might matter if we get another server.
 
  
 
==Software==
 
==Software==
  
 
===Linux===
 
===Linux===
The "L" in "LAMP".  We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way.  (Some people who currently have root access might hold different opinions.)  In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie.
+
The "L" in "LAMP".  We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way.  (Some people who currently have root access might hold different opinions.)  In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie. The same logic went for the new new server (thor), since although GrnEyedDvl was around by that point, Simetrical was the one familiar with the existing setup.
 +
 
 +
To be more precise, we use Ubuntu, mainly because it has a huge and up-to-date package repository, and also Simetrical happens to be familiar with it because he uses it at home.  Both of the old servers were RHEL5, and we have no regrets about switching to Ubuntu.  The output of <code>lsb_release -a</code> is currently
  
The exact version is Red Hat Enterprise Linux 5.  The exacter version, from <code>/etc/redhat-release</code>, is "Red Hat Enterprise Linux Server release 5.1 (Tikanga)".  <code>uname -a</code> gives "Linux odin.twcenter.net 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux".
+
No LSB modules are available.
 +
Distributor ID: Ubuntu
 +
  Description: Ubuntu 8.10
 +
  Release: 8.10
 +
Codename: intrepid
  
Red Hat sucks and I wish we had used Ubuntu. The packages are paleolithic here. 2.6.18 was released in September 2006.  I want large argument lists, dammit!  But The Planet doesn't offer Ubuntu as an option, and I had enough trouble repartitioning over SSH without having to try installing a new OS too . . .
+
We were originally on 8.04 LTS (Hardy), but switched to 8.10 to fix kernel problems.  We might upgrade to a later version to get ext4 and improve disk performance.
  
 
===lighttpd===
 
===lighttpd===
 
The "A" in "LAMP".  Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP".  We used to use Apache, but it used way too much memory.  We [http://www.twcenter.net/forums/showthread.php?t=148910 switched] in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back.  Fiddling around with random stuff for fun can pay off sometimes.
 
The "A" in "LAMP".  Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP".  We used to use Apache, but it used way too much memory.  We [http://www.twcenter.net/forums/showthread.php?t=148910 switched] in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back.  Fiddling around with random stuff for fun can pay off sometimes.
  
lighttpd's config file is at [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=lighttpd/lighttpd.conf;hb=HEAD <code>/etc/lighttpd/lighttpd.conf</code>] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=lighttpd/lighttpd.conf;hb=HEAD history]).  Documentation is at [http://trac.lighttpd.net/trac/wiki#Documentation the lighttpd website] (config file docs are the most useful).  The binary is at <code>/usr/sbin/lighttpd</code>.  lighttpd runs as user lighttpd.  Access and error logs are in <code>/var/log/lighttpd/</code>, automagically rotated by logrotate.  A handy web-based server status thing is available in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).
+
lighttpd's config file is at <code>/etc/lighttpd/lighttpd.conf</code>.  Documentation is at [http://trac.lighttpd.net/trac/wiki#Documentation the lighttpd website] (config file docs are the most useful).  The binary is at <code>/usr/sbin/lighttpd</code>.  lighttpd runs as user lighttpd.  Access and error logs are in <code>/var/log/lighttpd/</code>, automagically rotated by logrotate.  A handy web-based server status thing is available in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).
  
 
To restart lighttpd in our current setup, for instance due to new <code>lighttpd.conf</code> or <code>php.ini</code>, I use this:
 
To restart lighttpd in our current setup, for instance due to new <code>lighttpd.conf</code> or <code>php.ini</code>, I use this:
  
  sudo killall php-cgi && sudo service lighttpd restart 2>>/var/log/fastcgi/error.log
+
  sudo killall php-cgi && sudo /etc/init.d/lighttpd restart
  
This 1) kills the FastCGI processes (probably not necessary if you don't need to restart FastCGI too), 2) restarts lighttpd using the system restart scripts, and 3) redirects stderr for (2) to an appropriate file because FastCGI seems to spam your terminal with PHP warnings forever after if you don't. (I used to redirect it to <code>/dev/null</code>, but apparently error logs are actually useful, who'd have thought?  The errors logged aren't dated, but are still useful with <code>tail -f</code>.)
+
This 1) kills the FastCGI processes (not sure if this is necessary), and 2) restarts lighttpd using the system restart scripts.
  
 
Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something.  So don't get too trigger-happy.  If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother.  Restarts of lighttpd are fairly rare anyway.
 
Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something.  So don't get too trigger-happy.  If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother.  Restarts of lighttpd are fairly rare anyway.
Line 41: Line 48:
  
 
  $ lighttpd -v
 
  $ lighttpd -v
  lighttpd-1.4.18 (ssl) - a light and fast webserver
+
  lighttpd/1.4.24 - a light and fast webserver
  Build-Date: Oct  1 2007 23:50:36
+
  Build-Date: Dec 23 2009 19:15:43
  
This is because it's installed from packages and not compiled from source, and as noted, RHEL 5 has pretty old packages.  I've run into a couple of annoying limitations in 1.4 that are fixed in 1.5, but not enough to make me want to bother installing from source, or finding a package repo that has a more recent version.
+
lighttpd is one of the few things where we've compiled from source and aren't using distro packages.  We [http://www.twcenter.net/forums/showthread.php?p=6493596#post6493596 upgraded to the latest version] to take advantage of better handling of out-of-FastCGI errors.
  
 
===MySQL===
 
===MySQL===
Line 51: Line 58:
 
Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine).  "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares".  InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads.  Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway.  (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
 
Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine).  "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares".  InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads.  Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway.  (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
  
MySQL's config file is at [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=my.cnf;hb=HEAD <code>/etc/my.cnf</code>] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=my.cnf;hb=HEAD history]).  Its general-purpose log is at <code>/var/log/mysql</code> (or <code>/var/log/mysqld.log</code>, which is a symlink to that).  The slow query log is at <code>/var/log/mysql-slow</code>.  (I apparently had some stupid ideas about log naming conventions when setting this up, or lack thereof.)  The slow query log is probably semi-useless since we don't have the [http://www.mysqlperformanceblog.com/2008/04/20/updated-msl-microslow-patch-installation-walk-through/ microslow patch].  The actual databases are in <code>/var/lib/mysql</code>, which is on its own partition in LVM.
+
MySQL's config is in <code>/etc/mysql</code> (primary config file: <code>/etc/mysql/my.cnf</code>).  The logging goes to syslog, in <code>/var/log/syslog</code>.  The actual databases are in <code>/var/lib/mysql</code>, which is on its own logical volume in LVM.
  
To restart MySQL, you just need to do <code>service mysql restart</code>.  This is only necessary on config file changes, and even then it's usually not necessary, you can change most settings live.  ''Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load.''  InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while.
+
To restart MySQL, you just need to do <code>/etc/init.d/mysql restart</code>.  This is only necessary on config file changes, and even then it's usually not necessary, you can change most settings live.  ''Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load.''  InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while.  Moreover, the site can be sluggish for as much as a couple of hours after a MySQL restart, because its caches will be cleared and have to repopulate.
  
MySQL, being a database, loves RAM, loves it very, very much.  The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 4G, fully half of our RAM.  Miscellaneous other stuff means that MySQL usually uses around 4.5G.  It will usually use a decent fraction of the CPU too, and of course lots and lots of disk space.
+
MySQL, being a database, loves RAM, loves it very, very much.  The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 5G.  Miscellaneous other stuff means that MySQL usually uses around 5.5G or 6G.  It will usually use a bit of CPU too, although not much, and of course lots and lots of disk space.
  
 
We use the packaged version of MySQL, to wit:
 
We use the packaged version of MySQL, to wit:
  
 
  $ mysql --version
 
  $ mysql --version
  mysql  Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (x86_64) using readline 5.0
+
  mysql  Ver 14.12 Distrib 5.0.67, for debian-linux-gnu (x86_64) using readline 5.2
 
 
I've been pondering getting [http://www.percona.com/mysql/5.0.68/RPM/rhel5/ Percona's patched 5.0 versions], for their nifty features like the enhanced microslow patch plus the extra features added since 5.0.22 (like [http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html SHOW PROFILE]), but I really can't justify the possible downtime or other issues that might arise from an upgrade.
 
  
 
===FastCGI===
 
===FastCGI===
 
This is the "P" in LAMP.  Unlike with lighttpd above, this isn't due to creative spelling.  "P" stands for "PHP", and we run PHP using FastCGI.  FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it.  In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.
 
This is the "P" in LAMP.  Unlike with lighttpd above, this isn't due to creative spelling.  "P" stands for "PHP", and we run PHP using FastCGI.  FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it.  In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.
  
I don't actually have much of any idea how FastCGI works.  I just configured lighttpd to use FastCGI and it handles spawning the processes itself.  To restart FastCGI, I just use the command [[#lighttpd|above]] to restart lighttpd.  I mentally categorize FastCGI with lighttpd as "the web server".  They don't seem to die when lighttpd does, but I'm not sure lighttpd uses pre-existing ones if it's restarted or spawns new ones or what.  I haven't tested.  So this is mostly voodoo magic to me.  If we get another server, I'll probably have to figure out how to actually use FastCGI instead of getting lighttpd to do it all for me.
+
I don't actually have much of any idea how FastCGI works.  I just configured lighttpd to use FastCGI and it handles spawning the processes itself.  To restart FastCGI, I just use the command [[#lighttpd|above]] to restart lighttpd.  I mentally categorize FastCGI with lighttpd as "the web server".  They don't seem to die when lighttpd does, but I'm not sure lighttpd uses pre-existing ones if it's restarted or spawns new ones or what.  I haven't tested. (I think it spawns new ones.) So this is mostly voodoo magic to me.
  
The processes are <code>/usr/bin/php-cgi</code>.  They run as the lighttpd user right now, since lighttpd spawns them after it setuid()s.  PHP is configured in [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=php.ini;hb=HEAD /etc/php.ini] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=php.ini;hb=HEAD history]).  It logs warnings and errors to <code>/var/log/php_error</code>.  FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file.  Some info on XCache status (which caches PHP variables between sessions) is in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, since it allows viewing and even deleting or changing the values of the cached variables.
+
The processes are <code>/usr/bin/php-cgi</code>.  They run as the lighttpd user right now, since lighttpd spawns them after it setuid()s.  PHP is configured in <code>/etc/php5/cgi/php.ini</code>.  It logs warnings and errors to the syslog, <code>/var/log/syslog</code>.  FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file.  Some info on XCache status (which caches PHP variables between sessions) is in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, since it allows viewing and even deleting or changing the values of the cached variables.
  
Each FastCGI daemon seems to use about 70M shared memory and 40M private memory.  The private memory is largely due to <code>memory_limit</code> in <code>php.ini</code>.  This limit has the delightful behavior that if it's too low, PHP processes will artificially OOM even if there's plenty of free memory; whereas if it's too high, as far as I know, the extra memory will go totally unused, I don't think it's free()d between requests.  That seems exceptionally stupid: hopefully I'm wrong on the second part.  Anyway, it's a fair amount of memory all told, although a lot less than MySQL.
+
FastCGI uses a crazy lot of CPU and memory.  Currently it uses maybe 8G of memory and more than half our CPU at peak.  There's some low-hanging fruit to be had in the CPU department, which I hope to pursue when feasible.
 
 
The really fun bit is CPU.  PHP uses a crazy lot of CPU.  By which I mean, all of it, and then ten or twenty times more than that during peak load.  (More precisely, this CPU usage is due to vBulletin, which of course is written in PHP.)  We might want to get another server solely to provide a home for some of the FastCGI processes.  This is our bottleneck right now.
 
  
 
We use the packaged PHP version:
 
We use the packaged PHP version:
  
 
  $ php --version
 
  $ php --version
  PHP 5.1.6 (cli) (built: Jun 12 2008 05:02:36)  
+
  PHP 5.2.6-2ubuntu4.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Nov 26 2009 14:16:15)  
  Copyright (c) 1997-2006 The PHP Group
+
  Copyright (c) 1997-2008 The PHP Group
  Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
+
  Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
 
     with XCache v1.2.2, Copyright (c) 2005-2007, by mOo
 
     with XCache v1.2.2, Copyright (c) 2005-2007, by mOo
 +
 +
=== git ===
 +
I should write about git here.  I use it to store our configuration files, my custom administration scripts, and the source code to the forums, among other things.  I'm not going to write a git tutorial here, but it could be helpful to have some basic commands.
 +
 +
=== vBulletin ===
 +
The one piece of software we use that's not free and open-source.  Website is [http://www.vbulletin.com vbulletin.com].  Not too much needs to be said here, because it's much easier to use than the rest of our software, with a web interface and everything.  Some notes on our particular setup (particularly how we store mounds of hacks in git) would be worth writing up at some point.
 +
 +
=== Other stuff ===
 +
Most of the rest of our software doesn't have much to discuss.  It's worth noting what we've installed that's not part of distro packages.  <code>ls /usr/local/bin /usr/local/sbin</code> yields:
 +
 +
/usr/local/bin:
 +
git  git-cvsserver  gitk  git-receive-pack  git-shell  git-upload-archive  git-upload-pack  indexer  search  searchd  spelldump
 +
 +
/usr/local/sbin:
 +
7zdl    bwsumm-attach  bwsumm-totals  dlswitch    lighttpd        lockupmon    reindex.sh  vbupgrade
 +
bwsumm  bwsumm-dl      dbbackup.sh    groupedsum  lighttpd-angel  profile-proc  rotate.sh
 +
 +
Most of the scripts in the latter directory are written by me.  The upshot is that there are only three pieces of software where we aren't using the packaged version:
 +
 +
; git
 +
: Due to a bug that was throwing weird error messages during rebases.  We could switch back to the packaged version if we upgrade to the next OS version, or maybe we could do it right now.  I don't remember what versions had the error.
 +
; lighttpd
 +
: Upgraded to the latest version so that the site would die a bit less under heavy I/O, as noted above.
 +
; Sphinx
 +
: This is indexer, search, searchd.  We don't use the packaged version because there is none.  There should be a Sphinx package in Ubuntu 10.04, but we're well behind that.

Revision as of 16:01, 24 January 2010

I was bored and decided to put together this page of handy info for our active sysadmins, currently GED and me. I (Simetrical) started this page on September 20, 2008, and I cynically speculate that it will get updated approximately never after that, so I've tried to keep things relatively nonspecific as to exact versions and stuff, and to provide verification procedures where possible. But take it all with a grain of salt.

If you don't have at least shell access to the TWC server, this page is almost certainly useless to you. If you at least know what shell access is, you're probably capable of understanding at least some of it. Maybe you'll find it interesting, good for you.

Rewritten on January 24, 2010. (It indeed was updated approximately never.)

Hardware

We currently have one dedicated server. We own all of the hardware, bought with hard-earned pennies mostly donated by the membership in a 2009 donation drive. I've provided the command necessary to get info about the component in parentheses, where applicable.

Our bandwidth is $50/Mbps/month at the 95th percentile, and we're paying for 10 Mbps uncapped. In principle, if we use more than 10 Mbps at the 95th percentile, we get charged more than $500 for the month.
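For anyone wanting to sanity-check a bill, here is a rough sketch of the 95th-percentile arithmetic. It assumes the usual burstable-billing scheme (sort the month's throughput samples, throw away the top 5%, bill on the highest one left) and that overage is charged at the same $50/Mbps rate as the 10 Mbps commit. Neither detail is confirmed by our contract text here, and the function names are made up for illustration.

```shell
# p95: read one Mbps sample per line on stdin, print the 95th-percentile sample.
# Assumption: the provider discards the top 5% of samples and bills on the next one.
p95() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            i = int((NR * 95 + 99) / 100)   # ceil(0.95 * NR), integer-safe
            print v[i]
        }'
}

# monthly_bill: dollars owed for a given 95th-percentile Mbps figure.
# Assumption: 10 Mbps commit, $50/Mbps, overage billed at the same rate.
monthly_bill() {
    awk -v p="$1" 'BEGIN {
        commit = 10; rate = 50
        if (p < commit) p = commit          # we pay for the commit regardless
        printf "%.2f\n", p * rate
    }'
}
```

Something like `monthly_bill "$(p95 < samples.txt)"` would then give the month's charge, where `samples.txt` is a hypothetical file of per-interval Mbps readings.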

Software

Linux

The "L" in "LAMP". We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way. (Some people who currently have root access might hold different opinions.) In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie. The same logic went for the new new server (thor), since although GrnEyedDvl was around by that point, Simetrical was the one familiar with the existing setup.

To be more precise, we use Ubuntu, mainly because it has a huge and up-to-date package repository, and also Simetrical happens to be familiar with it because he uses it at home. Both of the old servers were RHEL5, and we have no regrets about switching to Ubuntu. The output of lsb_release -a is currently

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 8.10
Release:	8.10
Codename:	intrepid

We were originally on 8.04 LTS (Hardy), but switched to 8.10 to fix kernel problems. We might upgrade to a later version to get ext4 and improve disk performance.

lighttpd

The "A" in "LAMP". Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP". We used to use Apache, but it used way too much memory. We switched in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back. Fiddling around with random stuff for fun can pay off sometimes.

lighttpd's config file is at /etc/lighttpd/lighttpd.conf. Documentation is at the lighttpd website (config file docs are the most useful). The binary is at /usr/sbin/lighttpd. lighttpd runs as user lighttpd. Access and error logs are in /var/log/lighttpd/, automagically rotated by logrotate. A handy web-based server status thing is available in the secret stuff thread in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).

To restart lighttpd in our current setup, for instance due to new lighttpd.conf or php.ini, I use this:

sudo killall php-cgi && sudo /etc/init.d/lighttpd restart

This 1) kills the FastCGI processes (not sure if this is necessary), and 2) restarts lighttpd using the system restart scripts.
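Since a typo in lighttpd.conf means the restart fails and the site stays down, it may be worth checking the config before pulling the trigger. A minimal sketch, assuming the `lighttpd` on the PATH is the one actually being run and using its `-t` (test config) and `-f` (config file) options; the function name `safe_restart` is made up:

```shell
# safe_restart: only restart lighttpd if the config file parses cleanly.
# usage: safe_restart [CONFIG_FILE]
safe_restart() {
    conf=${1:-/etc/lighttpd/lighttpd.conf}

    # -t parses the config and exits without starting a server.
    if ! lighttpd -t -f "$conf"; then
        echo "config check failed, not restarting" >&2
        return 1
    fi

    # Same restart sequence as the one-liner above.
    sudo killall php-cgi && sudo /etc/init.d/lighttpd restart
}
```

Note that, as with the original one-liner, the `&&` means lighttpd is not restarted if `killall` finds no php-cgi processes to kill.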

Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something. So don't get too trigger-happy. If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother. Restarts of lighttpd are fairly rare anyway.

lighttpd usually uses a few percent CPU and a few hundred megs of memory, in my experience. It runs on a single thread and doesn't run any scripts itself, so this is pretty reasonable. I wonder what the few hundred megs are for, actually, but thinking back to Apache with mod_php I'm not even going to bother looking into it.

The version is currently:

$ lighttpd -v
lighttpd/1.4.24 - a light and fast webserver
Build-Date: Dec 23 2009 19:15:43

lighttpd is one of the few things where we've compiled from source and aren't using distro packages. We upgraded to the latest version to take advantage of better handling of out-of-FastCGI errors.

MySQL

This is the "M" in LAMP. MySQL is nice in some ways, like the server process is very resilient and tends not to ever crash or go into hysterics, or if it does it automatically restarts. In terms of features, such as the ability to optimize queries or use indexes in a better than semi-retarded fashion, it's probably worse than PostgreSQL. But, vBulletin only supports MySQL, like lots of other web apps, so that's what we use.

Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine). "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares". InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads. Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway. (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
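To verify the "wherever possible" claim rather than trust it, something like the following should list every table that is not InnoDB. It is a sketch that assumes the `mysql` client can log in without prompting (for instance via credentials in `~/.my.cnf`); the function name is made up, and `information_schema` is available in MySQL 5.0.

```shell
# non_innodb_tables: print schema, table and engine for every non-InnoDB table,
# skipping MySQL's own system schemas.
non_innodb_tables() {
    mysql -N -B -e "
        SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
        FROM information_schema.TABLES
        WHERE TABLE_TYPE = 'BASE TABLE'
          AND ENGINE <> 'InnoDB'
          AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema')
        ORDER BY TABLE_SCHEMA, TABLE_NAME;"
}
```

If the description above is still accurate, the only forum table that shows up should be totalwar_vb.phrase.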

MySQL's config is in /etc/mysql (primary config file: /etc/mysql/my.cnf). The logging goes to syslog, in /var/log/syslog. The actual databases are in /var/lib/mysql, which is on its own logical volume in LVM.

To restart MySQL, you just need to do /etc/init.d/mysql restart. This is only necessary on config file changes, and even then it's usually not necessary, you can change most settings live. Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load. InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while. Moreover, the site can be sluggish for as much as a couple of hours after a MySQL restart, because its caches will be cleared and have to repopulate.
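As an illustration of changing a setting live instead of restarting, here is a small sketch built on `SET GLOBAL`. The function name is made up, it assumes the `mysql` client can log in as a user with the SUPER privilege, and it only works for variables MySQL treats as dynamic (for example `max_connections`; the InnoDB buffer pool size is not dynamic in 5.0).

```shell
# set_live: change a dynamic MySQL variable on the running server, then show it.
# usage: set_live VARIABLE VALUE
set_live() {
    mysql -e "SET GLOBAL $1 = $2; SHOW GLOBAL VARIABLES LIKE '$1';"
}
```

For example, `set_live max_connections 300`. The change is lost on the next restart unless the same value is also written to `/etc/mysql/my.cnf`.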

MySQL, being a database, loves RAM, loves it very, very much. The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 5G. Miscellaneous other stuff means that MySQL usually uses around 5.5G or 6G. It will usually use a bit of CPU too, although not much, and of course lots and lots of disk space.

We use the packaged version of MySQL, to wit:

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.67, for debian-linux-gnu (x86_64) using readline 5.2

FastCGI

This is the "P" in LAMP. Unlike with lighttpd above, this isn't due to creative spelling. "P" stands for "PHP", and we run PHP using FastCGI. FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it. In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.

I don't actually have much of any idea how FastCGI works. I just configured lighttpd to use FastCGI and it handles spawning the processes itself. To restart FastCGI, I just use the command above to restart lighttpd. I mentally categorize FastCGI with lighttpd as "the web server". They don't seem to die when lighttpd does, but I'm not sure lighttpd uses pre-existing ones if it's restarted or spawns new ones or what. I haven't tested. (I think it spawns new ones.) So this is mostly voodoo magic to me.

The processes are /usr/bin/php-cgi. They run as the lighttpd user right now, since lighttpd spawns them after it setuid()s. PHP is configured in /etc/php5/cgi/php.ini. It logs warnings and errors to the syslog, /var/log/syslog. FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file. Some info on XCache status (which caches PHP variables between sessions) is in the secret stuff thread in the Tech Cathedral, since it allows viewing and even deleting or changing the values of the cached variables.

FastCGI uses a crazy lot of CPU and memory. Currently it uses maybe 8G of memory and more than half our CPU at peak. There's some low-hanging fruit to be had in the CPU department, which I hope to pursue when feasible.
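Figures like "maybe 8G" are easy to re-check. A rough sketch, assuming the procps version of `ps` (the one Ubuntu ships), that totals resident memory for all processes with a given command name; the function name is made up:

```shell
# rss_mb: total resident memory, in whole megabytes, of all processes
# whose command name is exactly PROCESS_NAME.
# usage: rss_mb PROCESS_NAME
rss_mb() {
    ps -C "$1" -o rss= | awk '{ kb += $1 } END { printf "%d\n", kb / 1024 }'
}
```

Try `rss_mb php-cgi`, `rss_mb mysqld` or `rss_mb lighttpd`. Pages shared between the php-cgi processes are counted once per process, so the php-cgi total overstates real usage somewhat.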

We use the packaged PHP version:

$ php --version
PHP 5.2.6-2ubuntu4.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Nov 26 2009 14:16:15) 
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
    with XCache v1.2.2, Copyright (c) 2005-2007, by mOo

git

I should write about git here. I use it to store our configuration files, my custom administration scripts, and the source code to the forums, among other things. I'm not going to write a git tutorial here, but it could be helpful to have some basic commands.
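Until that gets written, here is a minimal sketch of the one git operation that matters most for config files: recording a change with a message. It assumes the file lives inside an existing git repository (say one rooted at `/etc`, which is a guess and not something this page states), and the function name is made up.

```shell
# config_commit: stage one file in a repository and commit it with a message.
# usage: config_commit REPO_DIR FILE MESSAGE
config_commit() {
    (
        cd "$1" || exit 1
        git add -- "$2" && git commit -q -m "$3"
    )
}
```

For instance, `config_commit /etc lighttpd/lighttpd.conf "Raise FastCGI process count"`. Before committing, `git status` and `git diff` show what has changed, and afterwards `git log -p -- FILE` shows a file's history with diffs.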

vBulletin

The one piece of software we use that's not free and open-source. Website is vbulletin.com. Not too much needs to be said here, because it's much easier to use than the rest of our software, with a web interface and everything. Some notes on our particular setup (particularly how we store mounds of hacks in git) would be worth writing up at some point.

Other stuff

Most of the rest of our software doesn't have much to discuss. It's worth noting what we've installed that's not part of distro packages. ls /usr/local/bin /usr/local/sbin yields:

/usr/local/bin:
git  git-cvsserver  gitk  git-receive-pack  git-shell  git-upload-archive  git-upload-pack  indexer  search  searchd  spelldump
/usr/local/sbin:
7zdl    bwsumm-attach  bwsumm-totals  dlswitch    lighttpd        lockupmon     reindex.sh  vbupgrade
bwsumm  bwsumm-dl      dbbackup.sh    groupedsum  lighttpd-angel  profile-proc  rotate.sh

Most of the scripts in the latter directory are written by me. The upshot is that there are only three pieces of software where we aren't using the packaged version:

git
Due to a bug that was throwing weird error messages during rebases. We could switch back to the packaged version if we upgrade to the next OS version, or maybe we could do it right now. I don't remember what versions had the error.
lighttpd
Upgraded to the latest version so that the site would die a bit less under heavy I/O, as noted above.
Sphinx
This is indexer, search, searchd. We don't use the packaged version because there is none. There should be a Sphinx package in Ubuntu 10.04, but we're well behind that.
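The packaged-or-not list above can be re-derived by asking dpkg which files it owns. A small sketch, assuming a Debian-style system where `dpkg -S PATH` fails for files that no installed package provides; the function name is made up:

```shell
# unpackaged: print each given path that no installed package owns.
# usage: unpackaged PATH...
unpackaged() {
    for f in "$@"; do
        dpkg -S "$f" >/dev/null 2>&1 || echo "$f"
    done
}
```

Running `unpackaged /usr/local/bin/* /usr/local/sbin/*` should print everything there, while `unpackaged "$(command -v lighttpd)"` is a quick way to tell whether the lighttpd on the PATH is the source build or a distro package.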