
Difference between revisions of "TWC:Sysadmin info"

From TWC Wiki
m (lighttpd: We don't want to truncate the log file every time, do we?)
m (typos)
 
(12 intermediate revisions by 2 users not shown)
Line 2: Line 2:
  
 
If you don't have at least shell access to the TWC server, this page is almost certainly useless to you.  If you at least know what shell access ''is'', you're probably capable of understanding at least some of it.  Maybe you'll find it interesting, good for you.
 
If you don't have at least shell access to the TWC server, this page is almost certainly useless to you.  If you at least know what shell access ''is'', you're probably capable of understanding at least some of it.  Maybe you'll find it interesting, good for you.
 +
 +
Rewritten on January 24, 2010.  (It indeed was updated approximately never.)
 +
 +
Updated and expanded on June 27, 2010.  (This is getting to be slightly more often than never.)
  
 
==Hardware==
 
==Hardware==
We currently have one dedicated server.  We own none of the hardware, we rent it from [http://www.theplanet.com The Planet] on a monthly basis.  It's their "Xeon 3060 - RAID" model with some adjustments. 8 GB RAM is the only difference from stock, I think.  Since we didn't buy the hardware, the exact models and stuff are mostly from orbit.theplanet.com, the magical control panel they give you, but I don't totally trust it.  Some of it is of course obvious from running the right commands (<code>cat /proc/cpuinfo</code>, <code>free -m</code>, . . .).
+
We currently have one dedicated server, known as thor.  We own all of the hardware, bought with hard-earned pennies mostly donated by the membership in a 2009 donation drive.  I've provided the command necessary to get info about the component in parentheses, where applicable.
 +
 
 +
* [http://www.newegg.com/Product/Product.aspx?Item=N82E16816152093 TYAN B4980G24V4H 1U Barebone Server NVIDIA nForce Professional 3600 Quad 1207(F) Four AMD Opteron 1.0GHz Hyper Transport FSB - Retail]
 +
* Four Quad-Core AMD Opteron(tm) Processor 8346 HE, 1.8 GHz (<code>cat /proc/cpuinfo</code>)
 +
* 16 GB RAM (<code>free -m</code>)
 +
* Two 500 GB WDC WD5000AACS-0 (apparently [http://www.newegg.com/Product/Product.aspx?Item=N82E16822136149 this]?) and two 300 GB [http://www.newegg.com/Product/Product.aspx?Item=N82E16822136322&nm_mc=OTC-RSS&cm_mmc=OTC-RSS-_-Internal%20Hard%20Drives-_-Western%20Digital-_-N82E16822136322 WD3000HLFS-0 VelociRaptors] for disks (<code>scsiadd -p</code>), in Linux software RAID (<code>cat /proc/mdstat</code>).
  
* "Dell Single socket 1067FSB - Quad Core Capable PowerEdge 840" motherboard
+
Our bandwidth is $50/Mbps/month at the 95th percentile, and we're paying for 10 Mbps uncapped. In principle, if we use more than 10 Mbps at the 95th percentile, we get charged more than $500 for the month.
* One dual-core Intel Xeon Conroe 3060 at 2.40 GHz
 
* 8 GB RAM, DDR2 667 ECC
 
* Four 250 GB, 7200 RPM SATA drives (Western Digital WD2500JD) in RAID 10, for 500 GB usable disk space (they say it's SATA II, but peak throughput from <code>hdparm -t</code> is just a hair under 1.5 Gbps, so I wonder . . . but it doesn't matter, anyway, since our disk throughput is orders of magnitude less)
 
* "3ware 4 channel SATA2 w/ battery backup 9550SX-4LP" RAID controller
 
* 2500 GB monthly bandwidth
 
* Some indeterminate-speed network uplink, probably 100 Mbps?  It might matter if we get another server.
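
As a rough illustration of how 95th-percentile bandwidth billing at $50/Mbps with a 10 Mbps commitment works out, here is a minimal shell sketch.  It assumes one bandwidth sample in Mbps per line, the nearest-rank percentile method, and that usage above 10 Mbps is billed at the same $50/Mbps rate; the provider's actual sampling and rounding rules aren't stated on this page, and the function names <code>p95</code> and <code>bill</code> are made up for the example.

```shell
# Hypothetical helper: reads one Mbps sample per line on stdin and prints the
# 95th-percentile sample (nearest-rank: sort ascending, take the value at
# position ceil(0.95 * N), i.e. ignore the top 5% of samples).
p95() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            idx = int((NR * 95 + 99) / 100)   # integer ceil(0.95 * NR)
            print v[idx]
        }'
}

# Hypothetical helper: monthly bill in dollars for a given 95th-percentile
# figure, assuming $50 per Mbps and a 10 Mbps minimum commitment.
bill() {
    awk -v p="$1" 'BEGIN { m = (p > 10) ? p : 10; printf "%d\n", m * 50 }'
}

# Example usage (samples.txt is a hypothetical file of per-interval Mbps values):
#   bill "$(p95 < samples.txt)"
```

The point of the percentile is that short bursts don't count: the top 5% of samples are thrown away before the bill is computed, so only sustained usage above 10 Mbps would push the charge past $500.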
 
  
==Software==
+
==Linux==
 +
The "L" in "LAMP".  We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way.  (Some people who currently have root access might hold different opinions.)  In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie.  The same logic went for the new new server (thor), since although GrnEyedDvl was around by that point, Simetrical was the one familiar with the existing setup.
  
===Linux===
+
To be more precise, we use Ubuntu, mainly because it has a huge and up-to-date package repository, and also Simetrical happens to be familiar with it because he uses it at home.  Both of the old servers were RHEL5, and we have no regrets about switching to Ubuntu.  The output of <code>lsb_release -a</code> is currently:
The "L" in "LAMP".  We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way.  (Some people who currently have root access might hold different opinions.)  In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie.
 
  
The exact version is Red Hat Enterprise Linux 5.  The exacter version, from <code>/etc/redhat-release</code>, is "Red Hat Enterprise Linux Server release 5.1 (Tikanga)".  <code>uname -a</code> gives "Linux odin.twcenter.net 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux".
+
No LSB modules are available.
 +
Distributor ID: Ubuntu
 +
  Description: Ubuntu 10.04 LTS
 +
Release: 10.04
 +
  Codename: lucid
  
Red Hat sucks and I wish we had used Ubuntu.  The packages are paleolithic here.  2.6.18 was released in September 2006.  I want large argument lists, dammit!  But The Planet doesn't offer Ubuntu as an option, and I had enough trouble repartitioning over SSH without having to try installing a new OS too . . .
+
We were originally on 8.04 LTS (Hardy), but switched to 8.10 to fix kernel problems.  Later we [http://www.twcenter.net/forums/showthread.php?t=342832 upgraded to 10.04 LTS], although under rather [http://www.twcenter.net/forums/showthread.php?t=365981 unfortunate circumstances].  We kind of had to upgrade because support on the non-LTS version 8.10 expired, but a major goal was also to improve disk performance by switching from ext3 to ext4.  Ironically, I/O on the new OS version was [http://www.twcenter.net/forums/showthread.php?t=368483 slower].
  
===lighttpd===
+
==lighttpd==
 
The "A" in "LAMP".  Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP".  We used to use Apache, but it used way too much memory.  We [http://www.twcenter.net/forums/showthread.php?t=148910 switched] in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back.  Fiddling around with random stuff for fun can pay off sometimes.
 
The "A" in "LAMP".  Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP".  We used to use Apache, but it used way too much memory.  We [http://www.twcenter.net/forums/showthread.php?t=148910 switched] in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back.  Fiddling around with random stuff for fun can pay off sometimes.
  
lighttpd's config file is at [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=lighttpd/lighttpd.conf;hb=HEAD <code>/etc/lighttpd/lighttpd.conf</code>] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=lighttpd/lighttpd.conf;hb=HEAD history]).  Documentation is at [http://trac.lighttpd.net/trac/wiki#Documentation the lighttpd website] (config file docs are the most useful).  The binary is at <code>/usr/sbin/lighttpd.conf</code>.  lighttpd runs as user lighttpd.  Access and error logs are in <code>/var/log/lighttpd/</code>, automagically rotated by logrotate.  A handy web-based server status thing is available in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).
+
lighttpd's config file is at <code>/etc/lighttpd/lighttpd.conf</code>.  Documentation is at [http://trac.lighttpd.net/trac/wiki#Documentation the lighttpd website] (config file docs are the most useful).  The binary is at <code>/usr/sbin/lighttpd</code>.  lighttpd runs as user www-data.  Access and error logs are in <code>/var/log/lighttpd/</code>, automagically rotated by logrotate.  A handy web-based server status thing is available in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).
  
To restart lighttpd in our current setup, for instance due to new <code>lighttpd.conf</code> or <code>php.ini</code>, I use this:
+
To restart lighttpd in our current setup, for instance due to new <code>lighttpd.conf</code> or <code>php.ini</code>, use this:
  
  sudo killall php-cgi && sudo service lighttpd restart 2>>/var/log/fastcgi/error.log
+
  sudo service lighttpd restart
  
This 1) kills the FastCGI processes (probably not necessary if you don't need to restart FastCGI too), 2) restarts lighttpd using the system restart scripts, and 3) redirects stderr for (2) to an appropriate file because FastCGI seems to spam your terminal with PHP warnings forever after if you don't.  (I used to redirect it to <code>/dev/null</code>, but apparently error logs are actually useful, who'd have thought?  The errors logged aren't dated, but are still useful with <code>tail -f</code>.)
+
Actually, replace "lighttpd" with the service name (mysql, sphinxsearch, memcached, . . .) and that's basically how all service restarts work.
  
 
Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something.  So don't get too trigger-happy.  If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother.  Restarts of lighttpd are fairly rare anyway.
 
Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something.  So don't get too trigger-happy.  If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother.  Restarts of lighttpd are fairly rare anyway.
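
Since a restart drops active connections, it can be worth making sure the config file at least parses before restarting.  The following is a minimal sketch, not part of our actual setup: it relies on lighttpd's <code>-t</code> (test config and exit) and <code>-f</code> (config file) options, and the function name <code>safe_restart</code> is made up for the example.

```shell
# Hypothetical wrapper: only restart lighttpd if the config file parses.
# Takes the config path as an optional argument, defaulting to the path
# documented above.
safe_restart() {
    conf=${1:-/etc/lighttpd/lighttpd.conf}
    if lighttpd -t -f "$conf"; then
        # Config looks syntactically OK, go ahead with the normal restart.
        sudo service lighttpd restart
    else
        echo "lighttpd config test failed for $conf, not restarting" >&2
        return 1
    fi
}

# Example usage:
#   safe_restart
#   safe_restart /etc/lighttpd/lighttpd.conf
```

This only catches syntax errors, not a config that parses fine but does the wrong thing, and it does nothing for <code>php.ini</code> changes.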
Line 41: Line 48:
  
 
  $ lighttpd -v
 
  $ lighttpd -v
  lighttpd-1.4.18 (ssl) - a light and fast webserver
+
  lighttpd/1.4.26 (ssl) - a light and fast webserver
  Build-Date: Oct 1 2007 23:50:36
+
  Build-Date: Apr 6 2010 11:42:30
  
This is because it's installed from packages and not compiled from source, and as noted, RHEL 5 has pretty old packages.  I've run into a couple of annoying limitations in 1.4 that are fixed in 1.5, but not enough to make me want to bother installing from source, or finding a package repo that has a more recent version.
+
We previously [http://www.twcenter.net/forums/showthread.php?p=6493596#post6493596 compiled the latest version from source] to take advantage of better handling of out-of-FastCGI errors, but then we upgraded to Lucid and this became unnecessary.
  
===MySQL===
+
==MySQL==
 
This is the "M" in LAMP.  MySQL is nice in some ways, like the server process is very resilient and tends not to ever crash or go into hysterics, or if it does it automatically restarts.  In terms of features, such as the ability to optimize queries or use indexes in a better than semi-retarded fashion, it's probably worse than PostgreSQL.  But, vBulletin only supports MySQL, like lots of other web apps, so that's what we use.
 
This is the "M" in LAMP.  MySQL is nice in some ways, like the server process is very resilient and tends not to ever crash or go into hysterics, or if it does it automatically restarts.  In terms of features, such as the ability to optimize queries or use indexes in a better than semi-retarded fashion, it's probably worse than PostgreSQL.  But, vBulletin only supports MySQL, like lots of other web apps, so that's what we use.
  
 
Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine).  "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares".  InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads.  Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway.  (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
 
Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine).  "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares".  InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads.  Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway.  (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
  
MySQL's config file is at [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=my.cnf;hb=HEAD <code>/etc/my.cnf</code>] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=my.cnf;hb=HEAD history]).  Its general-purpose log is at <code>/var/log/mysql</code> (or <code>/var/log/mysqld.log</code>, which is a symlink to that).  The slow query log is at <code>/var/log/mysql-slow</code>.  (I apparently had some stupid ideas about log naming conventions when setting this up, or lack thereof.)  The slow query log is probably semi-useless since we don't have the [http://www.mysqlperformanceblog.com/2008/04/20/updated-msl-microslow-patch-installation-walk-through/ microslow patch].  The actual databases are in <code>/var/lib/mysql</code>, which is on its own partition in LVM.
+
MySQL's config is in <code>/etc/mysql</code> (primary config file: <code>/etc/mysql/my.cnf</code>).  The error logging goes to <code>/var/log/mysql/error.log</code>.  The actual databases are in <code>/var/lib/mysql</code>, which is on its own logical volume in LVM.
  
To restart MySQL, you just need to do <code>service mysql restart</code>.  This is only necessary on config file changes, and even then it's usually not necessary, you can change most settings live.  ''Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load.''  InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while.
+
To restart MySQL, you just need to do <code>sudo service mysql restart</code>.  This is only necessary on config file changes, and even then it's usually not necessary, you can change most settings live.  ''Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load.''  InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while.  Moreover, the site can be sluggish for as much as a couple of hours after a MySQL restart, because its caches will be cleared and have to repopulate.
  
MySQL, being a database, loves RAM, loves it very, very much.  The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 4G, fully half of our RAM.  Miscellaneous other stuff means that MySQL usually uses around 4.5G.  It will usually use a decent fraction of the CPU too, and of course lots and lots of disk space.
+
MySQL, being a database, loves RAM, loves it very, very much.  The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 5G (<code>grep innodb_buffer_pool_size /etc/mysql/my.cnf</code>).  Miscellaneous other stuff means that MySQL usually uses around 5.5G or 6G.  It will usually use a bit of CPU too, although not much, and of course lots and lots of disk space.
  
 
We use the packaged version of MySQL, to wit:
 
We use the packaged version of MySQL, to wit:
  
 
  $ mysql --version
 
  $ mysql --version
  mysql  Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (x86_64) using readline 5.0
+
  mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1
 +
 
 +
To back up particular tables in the DB (often needed or helpful when upgrading a vBulletin plugin), use the following command:
 +
$ mysqldump --add-drop-table --single-transaction -u ''username'' -p ''database_name'' ''table1'' [''table2 table3 . . . ''] > ''outputfile''.sql
 +
 
 +
To find the relevant db and login information for the forums, see /var/www/forums/includes/config.php on the server. This requires www-data access.
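
The table backup command above can be wrapped in a small helper so the output file gets a dated name.  This is only a sketch under assumptions: the function name <code>backup_tables</code>, the <code>DB_USER</code> variable, and the table names in the usage comment are made up for illustration; the <code>mysqldump</code> options are the ones given above.

```shell
# Hypothetical helper: dump the given tables of one database into a
# timestamped .sql file in the current directory and print the file name.
# Usage: backup_tables DATABASE TABLE [TABLE ...]
backup_tables() {
    db=$1
    shift
    out="${db}-$(date +%Y%m%d-%H%M%S).sql"
    # -p with no value makes mysqldump prompt for the password, so it never
    # ends up in the shell history.
    mysqldump --add-drop-table --single-transaction \
        -u "${DB_USER:?set DB_USER first}" -p "$db" "$@" > "$out" || return 1
    echo "$out"
}

# Example usage (user and table names are illustrative only):
#   DB_USER=someuser
#   backup_tables totalwar_vb post thread
```

Note that <code>--single-transaction</code> only gives a consistent snapshot for InnoDB tables; a MyISAM table gets dumped in whatever state it's in at the time.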
 +
 
 +
=== Loading a backup ===
 +
 
 +
Database backups are stored in /var/local/backup/db/, which should be readable to everyone in the www-data group (i.e., everyone with shell access).  The backups contain a copy of the whole database, not just totalwar_vb, and loading them will wipe out all existing MySQL data, so only do this if you're sure you know it's what you want to do.  There are two likely cases: either you're reloading the passive master from the active master, or both servers' databases were so horribly corrupted that you have no choice but to restore from backup.  If you're restoring to the passive master's database, you need to be root, since the passive master is read-only to non-root users.
 +
 
 +
First copy the backup file someplace.  If you're copying a backup from one server to the other, it's best to copy the whole file first instead of piping it so that it won't fail midway if the network connection fails for some reason (although that's somewhat unlikely for a server one hop away).  scp works for copying the file.  I'll assume the file is in the current working directory and is named data.sql.gz.
 +
 
 +
Next, stop the slave ''on both servers'' with
 +
 
 +
sudo bash -c 'HOME=/root mysql -e "STOP SLAVE"'
 +
 
 +
The funny incantation ensures that your credentials are loaded from root's home directory, not your own, so you're running the command as the root MySQL user.  <big>'''You should never ever run commands as the root MySQL user, because that means you have the SUPER privilege and can write to the passive master, which will copy the writes to the active master and horribly corrupt everything.  Never ever run any command as MySQL root unless you first shut down the slaves on ''both'' servers.  Only run mysql as your own user using the credentials from your own ~/.my.cnf file, never run it as root, except if you've stopped the slave on both servers.'''</big>
 +
 
 +
Now run this command to be double super sure you stopped both slaves:
 +
 
 +
for SERVER in thor.twcenter.net mjollnir.twcenter.net; do sudo ssh $SERVER 'HOME=/root mysql -te "SHOW SLAVE STATUS"'; done
 +
 
 +
This will produce a whole mess of output because it wraps a lot, but carefully inspect it to make sure that Slave_IO_Running and Slave_SQL_Running are both "No" in ''both'' of the two output tables (you might have to scroll up a bunch to get to the earlier one).
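
If squinting at wrapped tables makes you nervous, the check can be sketched as a filter instead.  This is an illustration, not a tested part of our setup: it assumes you ask for vertical output with <code>\G</code> instead of <code>-t</code>, so each field is on its own line, and the function name <code>slaves_stopped</code> is made up.  Still read the raw output yourself before doing anything destructive.

```shell
# Hypothetical filter: reads "SHOW SLAVE STATUS\G" output on stdin and
# succeeds only if both Slave_IO_Running and Slave_SQL_Running were seen
# and both are exactly "No".  Empty output counts as failure.
slaves_stopped() {
    awk -F': *' '
        /Slave_IO_Running:|Slave_SQL_Running:/ {
            seen++
            if ($2 != "No") bad++
        }
        END {
            if (seen >= 2 && bad == 0) exit 0
            exit 1
        }'
}

# Example usage, one server at a time:
#   for SERVER in thor.twcenter.net mjollnir.twcenter.net; do
#       sudo ssh "$SERVER" 'HOME=/root mysql -e "SHOW SLAVE STATUS\G"' \
#           | slaves_stopped || echo "$SERVER: slave NOT confirmed stopped"
#   done
```

Treating missing fields as failure is deliberate: if the ssh or mysql command itself fails and prints nothing, you get a warning rather than a false all-clear.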
 +
 
 +
Now do this, '''which will completely destroy the database on the ''other'' server as well as the current one if you didn't stop both slaves''':
 +
 
 +
pv data.sql.gz | gunzip | sudo bash -c 'HOME=/root mysql'
 +
 
 +
This will start loading the database.  It will take a few hours.  pv prints out a handy progress bar, but its time estimate is going to undershoot, because it takes MySQL longer to copy data as it copies more data (inserts take more than linear time).  In March 2011, loading a backup on mjollnir when it was otherwise idle took two hours and 14 minutes.
 +
 
 +
If everything has finished with no errors, you want to reset the slaves.  If you were reloading the database due to total corruption of both servers, well, ideally you should be replaying the appropriate binary log at this point, but I'm not going to write detailed instructions for all this because it's not what I'm doing right now as I'm writing this tutorial, and I don't want to write down hypotheticals, and I hope this never happens again anyway and that if it does I'm personally available to fix it or at least help.
 +
 
 +
What I'm doing right now is reloading the passive master from the active master, so the instructions for that follow.  You want to set the passive master's slave to point at whatever point in time this backup was taken.  This will not work if the backup was one you took yourself instead of taking it from /var/local/backup/db, unless you used the right mysqldump options, which /usr/local/sbin/dbbackup.sh does.  If the dump does have the info, take a look with
 +
 
 +
zcat data.sql.gz | less
 +
 
 +
Near the top you'll find something like:
 +
 
 +
--
 +
-- Position to start replication or point-in-time recovery from
 +
--
 +
 +
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001573', MASTER_LOG_POS=106;
 +
 
 +
That "CHANGE MASTER" statement is what you want to run.  On the passive master (the one you just loaded the dump on), run the following:
 +
 
 +
sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO...\""
 +
 
 +
with the appropriate CHANGE MASTER statement inserted, as copied out of the backup file you used.
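
To avoid copy-and-paste mistakes, the commented-out statement can be pulled out of the dump with a one-liner.  A minimal sketch, assuming the dump header looks like the excerpt above and the statement appears within the first 100 lines; the function name <code>change_master_stmt</code> is made up for the example.

```shell
# Hypothetical helper: print the CHANGE MASTER statement recorded near the
# top of a gzipped dump, with the leading "-- " comment marker stripped.
# Usage: change_master_stmt data.sql.gz
change_master_stmt() {
    gunzip -c "$1" | head -n 100 \
        | sed -n 's/^-- \(CHANGE MASTER TO .*\)$/\1/p' \
        | head -n 1
}

# Example usage: print it, look at it, and only then paste it into the
# sudo bash -c "HOME=/root mysql -e ..." command given above.
#   change_master_stmt data.sql.gz
```

If this prints nothing, the dump was probably taken without the option that records the binary log position, and it can't be used to reset the slave this way.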
 +
 
 +
Then you need to reset the active master's slave position.  (This would theoretically be unnecessary if we had turned off binary logging for the database reload, but I'm nervous so I'd have done this anyway.)  Since the passive master is read-only, it won't be running any statements except ones from the active master, which the active master will ignore when they bounce back to it, so it doesn't really matter what exact position the active master is set to.  On the ''passive'' master (where you loaded the backup), run
 +
 
 +
mysql -te 'SHOW MASTER STATUS'
 +
 
 +
and then on the ''active'' master (which is the one hopefully running the actual site right now), run
 +
 
 +
sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...\""
 +
 
 +
where you fill in MASTER_LOG_FILE and MASTER_LOG_POS values from the "File" and "Position" of the SHOW MASTER STATUS you just ran.
 +
 
 +
Now both slaves should be set to point to the right place. Triple-check everything, then run this on both servers:
 +
 
 +
sudo bash -c "HOME=/root mysql -e 'START SLAVE'"
  
I've been pondering getting [http://www.percona.com/mysql/5.0.68/RPM/rhel5/ Percona's patched 5.0 versions], for their nifty features like the enhanced microslow patch plus the extra features added since 5.0.22 (like [http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html SHOW PROFILE]), but I really can't justify the possible downtime or other issues that might arise from an upgrade.
+
This will completely destroy both databases if you messed up any of the above steps, but I hope I've scared you enough by now that I don't need to use bold text when I say that.
  
===FastCGI===
+
==FastCGI==
 
This is the "P" in LAMP.  Unlike with lighttpd above, this isn't due to creative spelling.  "P" stands for "PHP", and we run PHP using FastCGI.  FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it.  In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.
 
This is the "P" in LAMP.  Unlike with lighttpd above, this isn't due to creative spelling.  "P" stands for "PHP", and we run PHP using FastCGI.  FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it.  In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.
  
I don't actually have much of any idea how FastCGI works.  I just configured lighttpd to use FastCGI and it handles spawning the processes itself.  To restart FastCGI, I just use the command [[#lighttpd|above]] to restart lighttpd.  I mentally categorize FastCGI with lighttpd as "the web server".  They don't seem to die when lighttpd does, but I'm not sure lighttpd uses pre-existing ones if it's restarted or spawns new ones or what.  I haven't tested.  So this is mostly voodoo magic to me.  If we get another server, I'll probably have to figure out how to actually use FastCGI instead of getting lighttpd to do it all for me.
+
I don't actually have much of any idea how FastCGI works.  I just configured lighttpd to use FastCGI and it handles spawning the processes itself.  To restart FastCGI, I just use the command [[#lighttpd|above]] to restart lighttpd.  I mentally categorize FastCGI with lighttpd as "the web server".  So this is mostly voodoo magic to me.
 
 
The processes are <code>/usr/bin/php-cgi</code>.  They run as the lighttpd user right now, since lighttpd spawns them after it setuid()s.  PHP is configured in [http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=blob;f=php.ini;hb=HEAD /etc/php.ini] ([http://www.twcenter.net/gitweb/gitweb.cgi?p=etc;a=history;f=php.ini;hb=HEAD history]).  It logs warnings and errors to <code>/var/log/php_error</code>.  FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file.  Some info on XCache status (which caches PHP variables between sessions) is in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, since it allows viewing and even deleting or changing the values of the cached variables.
 
  
Each FastCGI daemon seems to use about 70M shared memory and 40M private memory.  The private memory is largely due to <code>memory_limit</code> in <code>php.ini</code>.  This limit has the delightful behavior that if it's too low, PHP processes will artificially OOM even if there's plenty of free memory; whereas if it's too high, as far as I know, the extra memory will go totally unused, I don't think it's free()d between requests. That seems exceptionally stupid: hopefully I'm wrong on the second part.  Anyway, it's a fair amount of memory all told, although a lot less than MySQL.
+
The processes are <code>/usr/bin/php-cgi</code>.  They run as the www-data user right now, since lighttpd spawns them after it setuid()s.  PHP is configured in <code>/etc/php5/cgi/php.ini</code>.  It logs warnings and errors to the syslog, <code>/var/log/syslog</code>.  FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file.  Some info on APC status (which caches PHP files) is in [http://www.twcenter.net/forums/showthread.php?t=114936 the secret stuff thread] in the Tech Cathedral, since it allows viewing and even changing the values of the cached stuff.
  
The really fun bit is CPU.  PHP uses a crazy lot of CPU.  By which I mean, all of it, and then ten or twenty times more than that during peak load.  (More precisely, this CPU usage is due to vBulletin, which of course is written in PHP.)  We might want to get another server solely to provide a home for some of the FastCGI processes.  This is our bottleneck right now.
+
FastCGI uses a crazy lot of CPU and memory.  Currently it uses maybe 8G of memory and more than half our CPU at peak.  There's some low-hanging fruit to be had in the CPU department, which I hope to pursue when feasible.
  
 
We use the packaged PHP version:
 
We use the packaged PHP version:
  
 
  $ php --version
 
  $ php --version
  PHP 5.1.6 (cli) (built: Jun 12 2008 05:02:36)  
+
  PHP 5.3.2-1ubuntu4.2 with Suhosin-Patch (cli) (built: May 13 2010 20:03:45)  
  Copyright (c) 1997-2006 The PHP Group
+
  Copyright (c) 1997-2009 The PHP Group
  Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
+
  Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
    with XCache v1.2.2, Copyright (c) 2005-2007, by mOo
+
 
 +
== memcached ==
 +
We [http://www.twcenter.net/forums/showthread.php?t=366508 started using memcached] a little while ago, instead of XCache/APC cache.  Actually we still use APC cache for one thing because we're lazy.  Not much to say about it.
 +
 
 +
== git ==
 +
We have some git repositories lying around in various places: /etc, /usr/local/sbin, /var/www, /var/www/forums, and /var/www/fpss, for instance.  The first two are owned by root, the second two are owned by www-data.  The first contains site config info like the password file, so is secret; the last two contain copyrighted code, so are also secret.  /usr/local/sbin and /var/www can be viewed via [http://www.twcenter.net/gitweb/gitweb.cgi gitweb], so you can track all the exciting changes I make.  We also have a repository in /usr/local/src/vbulletin that I move new versions into, so that the /var/www/forums git repo can use it as a reference point for the dark art of <code>git rebase</code>.  And there are some repos in /home/aryeh . . . and maybe others lurking elsewhere.  One never knows.
 +
 
 +
The purpose of git is to be [http://en.wikipedia.org/wiki/Revision_control version control software].  This records when all changes were made, so 1) we can figure out who made each change (hint: me), 2) we can figure out when and why a change was made (my memory isn't perfect, and I might be hit by a bus), and 3) it's easy to undo changes if they prove problematic (this happens a lot).  Every time some files are changed, I try to remember to record a commit at least briefly describing the changes I made.  Everyone else with shell access should ideally do this too.
 +
 
 +
The thread [http://www.twcenter.net/forums/showthread.php?t=370420 Using git on a production server] contains a considerable amount of useful info.  You have to create the file ~/.gitconfig with the proper info before using git (see that thread for a sample file).  Some basic git commands are:
 +
 
 +
; <code>git log</code>
 +
: This shows you a list of all commits, nicely paginated, with the most recent ones first.  You can do <code>git log -p</code> to get the exact changes that each commit made (i.e., the diffs).
 +
; <code>git diff</code>
 +
: Lists what changes have been made, but not yet recorded in git.  Normally this should be empty, since people are committing everything to git when they make a change, right?  (Okay, kind of tricky if you only have FTP access, so I wind up committing that stuff.)  This will ignore any newly-created files: those have to be explicitly committed before they'll show up.
 +
; <code>git commit</code>
 +
: Creates a new commit.  You can do <code>git commit -a</code> to commit all the changes listed by <code>git diff</code>, or you can list exactly which files you want to commit, like <code>git commit file1 file2 file3</code>.  (Actually you can commit parts of files too, using <code>git add -i</code>, but let's not go there.)  Note that you have to run this command as the repository owner, so prefix it with "sudo" if the repo is owned by root, or "sudo -u www-data" if it's owned by www-data.
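
Remembering which repo is owned by whom gets old quickly, so the "run it as the repository owner" rule can be sketched as a wrapper.  This is an illustration only: the function name <code>run_as_owner</code> is made up, and it assumes the owner of the repository's top-level directory is the user git should run as.  It uses sudo's <code>-u '#uid'</code> form to pick the user by numeric ID.

```shell
# Hypothetical wrapper: run a command inside a directory as whoever owns
# that directory, using sudo only when the owner isn't the current user.
# Usage: run_as_owner DIR COMMAND [ARGS ...]
run_as_owner() {
    dir=$1
    shift
    # Third field of "ls -ldn" is the numeric owner ID.
    owner_uid=$(ls -ldn "$dir" | awk '{ print $3 }')
    if [ "$owner_uid" = "$(id -u)" ]; then
        (cd "$dir" && "$@")
    else
        (cd "$dir" && sudo -u "#$owner_uid" "$@")
    fi
}

# Example usage:
#   run_as_owner /var/www/forums git diff
#   run_as_owner /var/www/forums git commit -a
#   run_as_owner /etc git log -p
```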
 +
 
 +
/etc is currently (November 2010) sort of mirrored, in an ad hoc fashion, between thor and mjollnir.  If you make a change on thor that should also be on mjollnir, the current way to copy it is
 +
 
 +
# Log in on mjollnir and go to /etc.
# Do <code>sudo git fetch</code> to fetch the /etc info from thor.  (This currently isn't set up to work in reverse, so make shared changes on thor first, not mjollnir.)
# For each commit you want to copy, do <code>sudo git cherry-pick ''abcdef''</code>, where <code>''abcdef''</code> is the hexadecimal identifier for the commit.  Start with the earlier commits and work your way up to later commits.
# You might get conflicts, like if you update /etc/aliases and commit the new aliases.db in one commit on thor.  Either commit the new /etc/aliases and /etc/aliases.db in different commits and then only copy the aliases change on mjollnir (then run newaliases separately there); or else do <code>sudo newaliases; sudo git add aliases.db; sudo git commit</code> to resolve the conflict.  (Todo: can we just use the same aliases.db on thor and mjollnir to avoid these conflicts?)
 +
 
 +
== vBulletin ==
 +
The one piece of software we use that's not free and open-source.  Website is [http://www.vbulletin.com vbulletin.com].  Not too much needs to be said here, because it's much easier to use than the rest of our software, with a web interface and everything.  The interesting point is that our vBulletin copy has lots of extra files added, and lots of existing files hacked.  You can see for yourself by doing <code>git log</code> in /var/www/forums.  There are dozens of changes going back to 2008.  When we upgrade, git will semi-magically transfer all of the changes to the new versions, if coaxed by suitable incantations.  <code>/usr/local/sbin/vbupgrade</code> does this, but if there's a conflict, you'll have to know what you're doing to resolve it.  (Or, alternatively, just get me to do the upgrade.)  Minor version changes shouldn't cause conflicts, so any root should be able to do those by [http://www.twcenter.net/forums/showthread.php?p=6978347#post6978347 following the procedure].  Although you'll still have to fix template conflicts.
 +
 
 +
== Disk administration ==
 +
This comes up a bunch, so I'll document some of this too.
 +
 
 +
=== mdadm: software RAID ===
 +
mdadm can be used to control our RAID array.  One basic useful command doesn't actually use mdadm at all: <code>cat /proc/mdstat</code> (sudo not required) tells you the current status of the RAID.  Sample output:
 +
 
 +
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sde1[0] sdb1[1]
      9767424 blocks [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sde2[0] sdb2[1]
      478616448 blocks [2/2] [UU]
      bitmap: 7/29 pages [28KB], 8192KB chunk

md2 : active raid10 sdc3[0]
      292230144 blocks super 1.1 512K chunks 2 far-copies [2/1] [U_]
      bitmap: 3/3 pages [12KB], 65536KB chunk

unused devices: <none>
 +
 
 +
The RAID arrays are the things named "md0", "md1", and "md2".  It tells you what type of RAID that array uses (here raid1 or raid10), and what disks are being used for each array (sde1, sdb1, sde2, etc.).  The second line for each device contains some critical info, namely whether there are any missing devices.  The [UU] in the first two arrays here tells us that the array is supposed to have two devices and they're both present.  The [U_] in the last one says that it's supposed to have two devices, but one (represented by an underscore) is missing.  This is bad, since it means that if the second disk fails, we'll lose data.
 +
 
 +
If RAID is resyncing, that file will contain information about how fast the resync is going and when it's expected to complete.  In this case, <code>watch cat /proc/mdstat</code> is useful: it will display the file's contents on your screen and refresh it every few seconds, so the resync progress shows up in real time.
 +
 
 +
For actually using <code>mdadm</code>, the most important thing to know is how to handle a failed disk.  First you need to figure out which array had a device failure.  One disk failure might mean multiple device failures, if one disk is part of multiple arrays.  For instance, in the above output, failure of sdb would mean that sdb1 and sdb2 would both fail, and you'd need to handle both failures separately.
 +
 
 +
Let's say that you check /proc/mdstat and find that sdg1 has failed in the array md7.  It will have an "(F)" after its name on the first status line.  The first thing you need to do is remove it: <code>sudo mdadm /dev/md7 --remove failed</code>.  Then you can try re-adding it with <code>sudo mdadm /dev/md7 --add /dev/sdg1</code>.  If you're lucky, this will work, and it will start resyncing.  If not, probably sdg will have to be physically replaced, and then the "--add" command will have to be run on its successor, after that's partitioned.
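A rough sketch of the bookkeeping described above, in case it helps: the function below scans mdstat-style text for members flagged "(F)" and prints the array and device names, which you could then plug into the --remove and --add commands. <code>failed_members</code> is a hypothetical name, not an existing script on the server, and it only reads text; it never calls mdadm itself.

```shell
# Print "<array> <device>" for every RAID member marked failed, i.e. "(F)",
# in /proc/mdstat-style text read from stdin.
# (Hypothetical helper for illustration; it does not modify anything.)
failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # "sdg1[1](F)" becomes "sdg1"
                print $1, dev
            }
        }
    }'
}

# Example:
#   failed_members < /proc/mdstat
# For each "md7 sdg1" line it prints, the commands from the text would be:
#   sudo mdadm /dev/md7 --remove failed
#   sudo mdadm /dev/md7 --add /dev/sdg1
```

If one physical disk is part of several arrays, each affected array shows up on its own line, which matches the point above that every device failure has to be handled separately.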
 +
 
 +
For more info on mdadm, as always, check the man page.  There are a ton of options, and there's no room to explain them all here.
 +
 
 +
=== LVM: logical volume management ===
 +
Most of our filesystems don't lie directly on top of a RAID array.  Instead, we have the RAID array on top of a disk partition, then on top of the RAID array we have an LVM "volume group" (VG).  Within the volume group, we have "logical volumes" (LVs), which we then put filesystems on.  The advantage here is that we can expand a logical volume and the filesystem on top of it without repartitioning.  Also, a volume group can have multiple "physical volumes" (PVs), i.e., it can be placed on top of a bunch of devices.  Logical volumes can be moved between the devices while they're in use, so we don't need to shut down the server to move data around when we change our disk configuration.  There are other uses of LVM too, like snapshots, although we don't use those right now.
 +
 
 +
Three useful commands (must be run as root) are <code>vgdisplay</code>, <code>pvdisplay</code>, and <code>lvdisplay</code>, to show info about current VGs, PVs, and LVs respectively:
 +
 
 +
$ sudo vgdisplay
 +
  --- Volume group ---
 +
  VG Name              LVM
 +
  System ID           
 +
  Format                lvm2
 +
  Metadata Areas        2
 +
  Metadata Sequence No  65
 +
  VG Access            read/write
 +
  VG Status            resizable
 +
  MAX LV                0
 +
  Cur LV                6
 +
  Open LV              6
 +
  Max PV                0
 +
  Cur PV                2
 +
  Act PV                2
 +
  VG Size              735.13 GiB
 +
  PE Size              4.00 MiB
 +
  Total PE              188194
 +
  Alloc PE / Size      122112 / 477.00 GiB
 +
  Free  PE / Size      66082 / 258.13 GiB
 +
  VG UUID              dDcYBl-qdYn-6TJ3-8MwN-AfYH-l0SD-9GamfZ
 +
   
 +
$ sudo pvdisplay
 +
  --- Physical volume ---
 +
  PV Name              /dev/md1
 +
  VG Name              LVM
 +
  PV Size              456.44 GiB / not usable 2.88 MiB
 +
  Allocatable          yes
 +
  PE Size              4.00 MiB
 +
  Total PE              116849
 +
  Free PE              45169
 +
  Allocated PE          71680
 +
  PV UUID              WKOVus-9kwU-8wpz-W90s-pmt1-gJAR-wrAw1Y
 +
   
 +
  --- Physical volume ---
 +
  PV Name              /dev/md2
 +
  VG Name              LVM
 +
  PV Size              278.69 GiB / not usable 1.00 MiB
 +
  Allocatable          yes
 +
  PE Size              4.00 MiB
 +
  Total PE              71345
 +
  Free PE              20913
 +
  Allocated PE          50432
 +
  PV UUID              4ALWTX-serI-HoHN-tzUk-jy9H-4rk4-cvXMXr
 +
   
 +
$ sudo lvdisplay
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/www
 +
  VG Name                LVM
 +
  LV UUID                C1i8qg-Isi2-o35Q-2OTd-2i21-vrz9-7xwXbp
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                5.00 GiB
 +
  Current LE            1280
 +
  Segments              1
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    4096
 +
  Block device          251:0
 +
   
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/userfiles
 +
  VG Name                LVM
 +
  LV UUID                PYeTEP-rEj2-x2kF-anh7-AMUL-3Wkw-m5RDnO
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                95.00 GiB
 +
  Current LE            24320
 +
  Segments              3
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    256
 +
  Block device          251:2
 +
   
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/log
 +
  VG Name                LVM
 +
  LV UUID                sPBgWF-l5Xl-GoJW-ys8h-WXEA-Ef9k-0ZG1cN
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                100.00 GiB
 +
  Current LE            25600
 +
  Segments              2
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    256
 +
  Block device          251:3
 +
   
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/mysql
 +
  VG Name                LVM
 +
  LV UUID                LnDgLQ-9rLv-FJxc-k3pd-ngxK-xWfX-acSc1P
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                80.00 GiB
 +
  Current LE            20480
 +
  Segments              2
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    4096
 +
  Block device          251:1
 +
   
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/backup
 +
  VG Name                LVM
 +
  LV UUID                RcfHSo-ftce-OVx5-7QTZ-Wdtv-owWX-xxLIBm
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                182.00 GiB
 +
  Current LE            46592
 +
  Segments              1
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    4096
 +
  Block device          251:4
 +
   
 +
  --- Logical volume ---
 +
  LV Name                /dev/LVM/sphinx
 +
  VG Name                LVM
 +
  LV UUID                Wjukbc-BxSe-swqy-wOBd-n2Yg-z8tX-oxz7e2
 +
  LV Write Access        read/write
 +
  LV Status              available
 +
  # open                1
 +
  LV Size                15.00 GiB
 +
  Current LE            3840
 +
  Segments              2
 +
  Allocation            inherit
 +
  Read ahead sectors    auto
 +
  - currently set to    4096
 +
  Block device          251:5
 +
 
 +
Most of that is probably incomprehensible, but some important parts (like "Size") should make some sense.  Notice that we have only one volume group, called "LVM", and all PVs are part of it.  This lets us freely move LVs between them.
 +
 
 +
The command to extend a logical volume is <code>sudo lvextend -L+<var>XX</var>G /dev/LVM/<var>something</var></code>, where <var>XX</var> is the number of gigabytes you want to extend it by, and <var>something</var> is the LV name.  After you do this, you'll have to do <code>sudo resize2fs /dev/LVM/<var>something</var></code> to grow the filesystem itself.  Be parsimonious here: you can grow an ext4 filesystem without downtime, but you can't shrink it without unmounting it.  We don't want to use up the free space in the volume group and then have to take the site down for half an hour to free space up.  (This is why /dev/LVM/log is 100 gigabytes: it was grown too much because access logs were using up a lot of space at that particular moment, and now it can't be shrunk without a reboot, since programs will want to write log files as long as the system is running.)
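Since it's easy to over-grow a volume, here is a minimal sketch of a pre-flight check. It assumes you pipe <code>sudo vgdisplay</code> output into it and that the free space is reported in GiB, as in the sample output; <code>plan_lv_grow</code> is a made-up name, and it only prints the two commands from the text instead of running them.

```shell
# Usage: sudo vgdisplay | plan_lv_grow <lv-name> <gigabytes>
# Reads vgdisplay output on stdin, checks the request against the free space
# in the volume group, and prints the commands to run. It runs nothing itself.
# (Hypothetical helper; assumes the VG is named "LVM" as on this server.)
plan_lv_grow() {
    lv=$1
    want=$2
    # Pull the number out of a line like "Free  PE / Size   66082 / 258.13 GiB".
    free=$(awk '/Free +PE \/ Size/ && $NF == "GiB" { print $(NF - 1) }')
    if [ -z "$free" ]; then
        echo "could not find free space in vgdisplay output" >&2
        return 2
    fi
    if ! awk -v w="$want" -v f="$free" 'BEGIN { exit !(w + 0 > 0 && w + 0 <= f + 0) }'; then
        echo "refusing: want ${want}G, volume group has ${free}G free" >&2
        return 1
    fi
    printf 'sudo lvextend -L+%sG /dev/LVM/%s\n' "$want" "$lv"
    printf 'sudo resize2fs /dev/LVM/%s\n' "$lv"
}
```

Printing the commands rather than executing them is deliberate: growing is effectively one-way on a live system, so a human should still look at the numbers before pasting anything.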
 +
 
 +
There are lots of other LVM commands that do many various things.  Check <code>man lvm</code> for more thorough documentation.
 +
 
 +
== Other software ==
 +
Most of the rest of our software doesn't have much to discuss.  We have no software installed from source right now (not counting web apps like vB); it's all packaged.  I wrote some scripts in /usr/local/sbin:
 +
 
 +
$ ls /usr/local/sbin
7zdl           bwsumm-dl      dlswitch    profile-proc  vbupgrade
bwsumm         bwsumm-totals  groupedsum  reindex.sh
bwsumm-attach  dbbackup.sh    lockupmon   rotate.sh
 +
 
 +
Actually, a few of those are originally based on scripts I got from elsewhere, namely reindex.sh, rotate.sh, and dbbackup.sh.

Latest revision as of 09:15, 9 June 2011

I was bored and decided to put together this page of handy info for our active sysadmins, currently GED and me. I (Simetrical) started this page on September 20, 2008, and I cynically speculate that it will get updated approximately never after that, so I've tried to keep things relatively nonspecific as to exact versions and stuff, and to provide verification procedures where possible. But take it all with a grain of salt.

If you don't have at least shell access to the TWC server, this page is almost certainly useless to you. If you at least know what shell access is, you're probably capable of understanding at least some of it. Maybe you'll find it interesting, good for you.

Rewritten on January 24, 2010. (It indeed was updated approximately never.)

Updated and expanded on June 27, 2010. (This is getting to be slightly more often than never.)

Hardware

We currently have one dedicated server, known as thor. We own all of the hardware, bought with hard-earned pennies mostly donated by the membership in a 2009 donation drive. I've provided the command necessary to get info about the component in parentheses, where applicable.

Our bandwidth is $50/Mbps/month at the 95th percentile, and we're paying for 10 Mbps uncapped. In principle, if we use more than 10 Mbps at the 95th percentile, we get charged more than $500 for the month.
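For anyone unsure what "95th percentile" means in practice, here's a rough sketch of the arithmetic, not the provider's actual billing code. It assumes one bandwidth sample in Mbps per line (say, five-minute averages), uses the nearest-rank method, and assumes overage is billed at the same $50/Mbps rate as the 10 Mbps commit. The function names are made up for illustration.

```shell
# Print the 95th-percentile value (nearest-rank) of the numbers on stdin,
# one number per line.
p95() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            i = int((NR * 95 + 99) / 100)   # ceil(0.95 * NR) using integers
            print v[i]
        }'
}

# Estimate the monthly bill in dollars for a 95th-percentile rate in Mbps.
# Assumption: we pay for the larger of the 10 Mbps commit and the measured
# rate, at $50 per Mbps either way.
monthly_bill() {
    awk -v p="$1" -v commit=10 -v rate=50 'BEGIN {
        billable = (p + 0 > commit + 0) ? p : commit
        printf "%.2f\n", billable * rate
    }'
}
```

With samples in a file, <code>p95 < samples.txt</code> gives the rate to bill on. The point of the percentile is that the top 5% of samples are thrown away, so short spikes don't cost anything; only sustained usage above 10 Mbps does.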

Linux

The "L" in "LAMP". We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way. (Some people who currently have root access might hold different opinions.) In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie. The same logic went for the new new server (thor), since although GrnEyedDvl was around by that point, Simetrical was the one familiar with the existing setup.

To be more precise, we use Ubuntu, mainly because it has a huge and up-to-date package repository, and also Simetrical happens to be familiar with it because he uses it at home. Both of the old servers were RHEL5, and we have no regrets about switching to Ubuntu. The output of lsb_release -a is currently

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 10.04 LTS
Release:	10.04
Codename:	lucid

We were originally on 8.04 LTS (Hardy), but switched to 8.10 to fix kernel problems. Later we upgraded to 10.04 LTS, although under rather unfortunate circumstances. We kind of had to upgrade because support on the non-LTS version 8.10 expired, but a major goal was also to improve disk performance by switching from ext3 to ext4. Ironically, I/O on the new OS version was slower.

lighttpd

The "A" in "LAMP". Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP". We used to use Apache, but it used way too much memory. We switched in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back. Fiddling around with random stuff for fun can pay off sometimes.

lighttpd's config file is at /etc/lighttpd/lighttpd.conf. Documentation is at the lighttpd website (config file docs are the most useful). The binary is at /usr/sbin/lighttpd. lighttpd runs as user www-data. Access and error logs are in /var/log/lighttpd/, automagically rotated by logrotate. A handy web-based server status thing is available in the secret stuff thread in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so not fit for public consumption).

To restart lighttpd in our current setup, for instance due to new lighttpd.conf or php.ini, use this:

sudo service lighttpd restart

Actually, replace "lighttpd" with the service name (mysql, sphinxsearch, memcached, . . .) and that's basically how all service restarts work.

Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something. So don't get too trigger-happy. If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother. Restarts of lighttpd are fairly rare anyway.

lighttpd usually uses a few percent CPU and a few hundred megs of memory, in my experience. It runs on a single thread and doesn't run any scripts itself, so this is pretty reasonable. I wonder what the few hundred megs are for, actually, but thinking back to Apache with mod_php I'm not even going to bother looking into it.

The version is currently:

$ lighttpd -v
lighttpd/1.4.26 (ssl) - a light and fast webserver
Build-Date: Apr  6 2010 11:42:30

We previously compiled the latest version from source to take advantage of better handling of out-of-FastCGI errors, but then we upgraded to Lucid and this became unnecessary.

MySQL

This is the "M" in LAMP. MySQL is nice in some ways, like the server process is very resilient and tends not to ever crash or go into hysterics, or if it does it automatically restarts. In terms of features, such as the ability to optimize queries or use indexes in a better than half-baked fashion, it's probably worse than PostgreSQL. But vBulletin only supports MySQL, like lots of other web apps, so that's what we use.

Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine). "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares". InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads. Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway. (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)

MySQL's config is in /etc/mysql (primary config file: /etc/mysql/my.cnf). The error logging goes to /var/log/mysql/error.log. The actual databases are in /var/lib/mysql, which is on its own logical volume in LVM.

To restart MySQL, you just need to do sudo service mysql restart. This is only necessary on config file changes, and usually not even then, since you can change most settings live. Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load. InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while. Moreover, the site can be sluggish for as much as a couple of hours after a MySQL restart, because its caches will be cleared and have to repopulate.

MySQL, being a database, loves RAM, loves it very, very much. The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 5G (grep innodb_buffer_pool_size /etc/mysql/my.cnf). Miscellaneous other stuff means that MySQL usually uses around 5.5G or 6G. It will usually use a bit of CPU too, although not much, and of course lots and lots of disk space.

We use the packaged version of MySQL, to wit:

$ mysql --version
mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1

To back up particular tables in the DB (often needed or helpful if upgrading a vb plugin) use the following command:

$ mysqldump --add-drop-table --single-transaction -u username -p database_name table1 [table2 table3 . . . ] > outputfile.sql

To find the relevant db and login information for the forums, see /var/www/forums/includes/config.php on the server. This requires www-data access.

Loading a backup

Database backups are stored in /var/local/backup/db/, which should be readable to everyone in the www-data group (i.e., everyone with shell access). The backups contain a copy of the whole database, not just totalwar_vb, and loading them will wipe out all existing MySQL data, so only do this if you're sure you know it's what you want to do. There are two likely cases: either you're reloading the passive master from the active master, or both servers' databases were so horribly corrupted that you have no choice but to restore from backup. If you're restoring to the passive master's database, you need to be root, since the passive master is read-only to non-root users.

First copy the backup file someplace. If you're copying a backup from one server to the other, it's best to copy the whole file first instead of piping it so that it won't fail midway if the network connection fails for some reason (although that's somewhat unlikely for a server one hop away). scp works for copying the file. I'll assume the file is in the current working directory and is named data.sql.gz.

Next, stop the slave on both servers with

sudo bash -c 'HOME=/root mysql -e "STOP SLAVE"'

The funny incantation ensures that your credentials are loaded from root's home directory, not your own, so you're running the command as the root MySQL user. You should never ever run commands as the root MySQL user, because that means you have the SUPER privilege and can write to the passive master, which will copy the writes to the active master and horribly corrupt everything. Never ever run any command as MySQL root unless you first shut down the slaves on both servers. Only run mysql as your own user using the credentials from your own ~/.my.cnf file, never run it as root, except if you've stopped the slave on both servers.

Now run this command to be double super sure you stopped both slaves:

for SERVER in thor.twcenter.net mjollnir.twcenter.net; do sudo ssh $SERVER 'HOME=/root mysql -te "SHOW SLAVE STATUS"'; done

This will produce a whole mess of output because it wraps a lot, but carefully inspect it to make sure that Slave_IO_Running and Slave_SQL_Running are both "No" in both of the two output tables (you might have to scroll up a bunch to get to the earlier one).
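If scrolling through the wrapped table makes you nervous, a check along these lines might help. This is only a sketch: it assumes you feed it the vertical (\G) form of SHOW SLAVE STATUS on stdin instead of the -t table used above, and <code>slaves_stopped</code> is a made-up name, not an existing script on the server.

```shell
# Succeed (exit 0) only if both Slave_IO_Running and Slave_SQL_Running are "No"
# in "SHOW SLAVE STATUS\G" output read from stdin. Anything else, including
# missing fields or empty input, counts as "not safely stopped".
# (Hypothetical helper for illustration.)
slaves_stopped() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "No" && sql == "No") }
    '
}

# Example, run once per server after STOP SLAVE:
#   sudo bash -c 'HOME=/root mysql -e "SHOW SLAVE STATUS\G"' | slaves_stopped \
#       && echo "slave is stopped"
```

It fails closed on purpose: if the output can't be parsed, it reports failure rather than guessing, since the cost of being wrong here is both databases.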

Now do this, which will completely destroy the database on the other server as well as the current one if you didn't stop both slaves:

pv data.sql.gz | gunzip | sudo bash -c 'HOME=/root mysql'

This will start loading the database. It will take a few hours. pv prints out a handy progress bar, but its time estimate is going to undershoot, because it takes MySQL longer to copy data as it copies more data (inserts take more than linear time). In March 2011, loading a backup on mjollnir when it was otherwise idle took two hours and 14 minutes.

If everything has finished with no errors, you want to reset the slaves. If you were reloading the database due to total corruption of both servers, well, ideally you should be replaying the appropriate binary log at this point, but I'm not going to write detailed instructions for all this because it's not what I'm doing right now as I'm writing this tutorial, and I don't want to write down hypotheticals, and I hope this never happens again anyway and that if it does I'm personally available to fix it or at least help.

What I'm doing right now is reloading the passive master from the active master, so the instructions for that follow. You want to set the passive master's slave to point at whatever point in time this backup was taken. This will not work if the backup was one you took yourself instead of taking it from /var/local/backup/db, unless you used the right mysqldump options, which /usr/local/sbin/dbbackup.sh does. If the dump does have the info, take a look with

zcat data.sql.gz | less

Near the top you'll find something like:

--
-- Position to start replication or point-in-time recovery from
--

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001573', MASTER_LOG_POS=106;

That "CHANGE MASTER" statement is what you want to run. On the passive master (the one you just loaded the dump on), run the following:

sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO...\""

with the appropriate CHANGE MASTER statement inserted, as copied out of the backup file you used.
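Paging through a multi-gigabyte dump with less works, but the statement can also be pulled out mechanically. A minimal sketch, assuming the dump has the commented-out CHANGE MASTER line shown above; <code>extract_change_master</code> is a hypothetical name.

```shell
# Print the first commented-out CHANGE MASTER statement found in SQL dump text
# on stdin, with the leading "-- " removed, then stop reading.
# (Hypothetical helper for illustration.)
extract_change_master() {
    sed -n '/^-- CHANGE MASTER TO /{s/^-- //p;q;}'
}

# Example:
#   zcat data.sql.gz | extract_change_master
```

Because sed quits at the first match and the statement sits near the top of the file, this should return almost immediately instead of decompressing the whole backup. Still eyeball the result before pasting it into the mysql command.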

Then you need to reset the active master's slave position. (This would theoretically be unnecessary if we had turned off binary logging for the database reload, but I'm nervous so I'd have done this anyway.) Since the passive master is read-only, it won't be running any statements except ones from the active master, which the active master will ignore when they bounce back to it, so it doesn't really matter what exact position the active master is set to. On the passive master (where you loaded the backup), run

mysql -te 'SHOW MASTER STATUS'

and then on the active master (which is the one hopefully running the actual site right now), run

sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...\""

where you fill in MASTER_LOG_FILE and MASTER_LOG_POS values from the "File" and "Position" of the SHOW MASTER STATUS you just ran.
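To avoid copying the file name and position by hand, something like the following could build the statement. It's a sketch that assumes you run SHOW MASTER STATUS with the mysql client's -N and -B options (no column names, tab-separated) rather than -t; <code>change_master_from_status</code> is a made-up name.

```shell
# Turn the first line of tab-separated SHOW MASTER STATUS output
# (File <TAB> Position <TAB> ...) on stdin into a CHANGE MASTER statement.
# \047 is a single quote in awk's printf.
# (Hypothetical helper for illustration; it only prints the statement.)
change_master_from_status() {
    awk -F'\t' 'NR == 1 {
        printf "CHANGE MASTER TO MASTER_LOG_FILE=\047%s\047, MASTER_LOG_POS=%s;\n", $1, $2
    }'
}

# Example, on the passive master:
#   mysql -N -B -e 'SHOW MASTER STATUS' | change_master_from_status
```

The output is the statement to run on the active master with the sudo bash incantation above. It's printed rather than executed so that it can be triple-checked first.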

Now both slaves should be set to point to the right place. Triple-check everything, then run this on both servers:

sudo bash -c "HOME=/root mysql -e 'START SLAVE'"

This will completely destroy both databases if you messed up any of the above steps, but I hope I've scared you enough by now that I don't need to use bold text when I say that.

FastCGI

This is the "P" in LAMP. Unlike with lighttpd above, this isn't due to creative spelling. "P" stands for "PHP", and we run PHP using FastCGI. FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it. In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.

I don't actually have much of any idea how FastCGI works. I just configured lighttpd to use FastCGI and it handles spawning the processes itself. To restart FastCGI, I just use the command above to restart lighttpd. I mentally categorize FastCGI with lighttpd as "the web server". So this is mostly voodoo magic to me.

The processes are /usr/bin/php-cgi. They run as the www-data user right now, since lighttpd spawns them after it setuid()s. PHP is configured in /etc/php5/cgi/php.ini. It logs warnings and errors to the syslog, /var/log/syslog. FastCGI itself I don't know how to configure, I just set all the relevant stuff (like number of processes) in lighttpd's config file. Some info on APC status (which caches PHP files) is in the secret stuff thread in the Tech Cathedral, since it allows viewing and even changing the values of the cached stuff.

FastCGI uses a crazy lot of CPU and memory. Currently it uses maybe 8G of memory and more than half our CPU at peak. There's some low-hanging fruit to be had in the CPU department, which I hope to pursue when feasible.

We use the packaged PHP version:

$ php --version
PHP 5.3.2-1ubuntu4.2 with Suhosin-Patch (cli) (built: May 13 2010 20:03:45) 
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

memcached

We started using memcached a little while ago, instead of XCache/APC cache. Actually we still use APC cache for one thing because we're lazy. Not much to say about it.

git

We have some git repositories lying around in various places: /etc, /usr/local/sbin, /var/www, /var/www/forums, and /var/www/fpss, for instance. The first two are owned by root, the second two are owned by www-data. The first contains site config info like the password file, so is secret; the last two contain copyrighted code, so are also secret. /usr/local/sbin and /var/www can be viewed via gitweb, so you can track all the exciting changes I make. We also have a repository in /usr/local/src/vbulletin that I move new versions into, so that the /var/www/forums git repo can use it as a reference point for the dark art of git rebase. And there are some repos in /home/aryeh . . . and maybe others lurking elsewhere. One never knows.

The purpose of git is to be version control software. This records when all changes were made, so 1) we can figure out who made each change (hint: me), 2) we can figure out when and why a change was made (my memory isn't perfect, and I might be hit by a bus), and 3) it's easy to undo changes if they prove problematic (this happens a lot). Every time some files are changed, I try to remember to record a commit at least briefly describing the changes I made. Everyone else with shell access should ideally do this too.

The thread Using git on a production server contains a considerable amount of useful info. You have to create the file ~/.gitconfig with the proper info before using git (see that thread for a sample file). Some basic git commands are:

git log
This shows you a list of all commits, nicely paginated, with the most recent ones first. You can do git log -p to get the exact changes that each commit made (i.e., the diffs).
git diff
Lists what changes have been made, but not yet recorded in git. Normally this should be empty, since people are committing everything to git when they make a change, right? (Okay, kind of tricky if you only have FTP access, so I wind up committing that stuff.) This will ignore any newly-created files: those have to be explicitly committed before they'll show up.
git commit
Creates a new commit. You can do git commit -a to commit all the changes listed by git diff, or you can list exactly which files you want to commit, like git commit file1 file2 file3. (Actually you can commit parts of files too, using git add -i, but let's not go there.) Note that you have to run this command as the repository owner, so prefix it with "sudo" if the repo is owned by root, or "sudo -u www-data" if it's owned by www-data.
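Since git diff ignores newly-created files, a quick way to see whether a repo has anything at all left uncommitted is git status. A minimal sketch, assuming a git new enough to support the -C option; <code>repo_is_clean</code> is a hypothetical name, not something installed on the server.

```shell
# Exit 0 if the git repository at $1 has no uncommitted changes and no
# untracked files, 1 if it has some, 2 if git itself fails (e.g. not a repo).
# (Hypothetical helper for illustration.)
repo_is_clean() {
    changes=$(git -C "$1" status --porcelain) || return 2
    [ -z "$changes" ]
}

# Example (add sudo, or sudo -u www-data, to match the repository owner):
#   repo_is_clean /var/www || echo "someone forgot to commit in /var/www"
```

Unlike plain git diff, the --porcelain status output also lists untracked files (as lines starting with "??"), so files uploaded over FTP and never added will show up too.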

/etc is currently (November 2010) sort of mirrored, in an ad hoc fashion, between thor and mjollnir. If you make a change on thor that should also be on mjollnir, the current way to copy it is

  1. Log in on mjollnir and go to /etc.
  2. Do sudo git fetch to fetch the /etc info from thor. (This currently isn't set up to work in reverse, so make shared changes on thor first, not mjollnir.)
  3. For each commit you want to copy, do sudo git cherry-pick abcdef, where abcdef is the hexadecimal identifier for the commit. Start with the earlier commits and work your way up to later commits.
  4. You might get conflicts, like if you update /etc/aliases and commit the new aliases.db in one commit on thor. Either commit the new /etc/aliases and /etc/aliases.db in different commits and then only copy the aliases change on mjollnir (then run newaliases separately there); or else do sudo newaliases; sudo git add aliases.db; sudo git commit to resolve the conflict. (Todo: can we just use the same aliases.db on thor and mjollnir to avoid these conflicts?)

vBulletin

The one piece of software we use that's not free and open-source. Website is vbulletin.com. Not too much needs to be said here, because it's much easier to use than the rest of our software, with a web interface and everything. The interesting point is that our vBulletin copy has lots of extra files added, and lots of existing files hacked. You can see for yourself by doing git log in /var/www/forums. There are dozens of changes going back to 2008. When we upgrade, git will semi-magically transfer all of the changes to the new versions, if coaxed by suitable incantations. /usr/local/sbin/vbupgrade does this, but if there's a conflict, you'll have to know what you're doing to resolve it. (Or, alternatively, just get me to do the upgrade.) Minor version changes shouldn't cause conflicts, so any root should be able to do those by following the procedure. Although you'll still have to fix template conflicts.

Disk administration

This comes up a bunch, so I'll document some of this too.

mdadm: software RAID

mdadm can be used to control our RAID array. One basic useful command doesn't actually use mdadm at all: cat /proc/mdstat (sudo not required) tells you the current status of the RAID. Sample output:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md0 : active raid1 sde1[0] sdb1[1]
      9767424 blocks [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sde2[0] sdb2[1]
      478616448 blocks [2/2] [UU]
      bitmap: 7/29 pages [28KB], 8192KB chunk

md2 : active raid10 sdc3[0]
      292230144 blocks super 1.1 512K chunks 2 far-copies [2/1] [U_]
      bitmap: 3/3 pages [12KB], 65536KB chunk

unused devices: <none>

The RAID arrays are the things named "md0", "md1", and "md2". The first line for each array tells you what type of RAID it uses (here raid1 or raid10) and which devices belong to it (sde1, sdb1, sde2, etc.). The second line contains some critical info, namely whether any devices are missing. The [UU] in the first two arrays here tells us that the array is supposed to have two devices and they're both present. The [U_] in the last one says that it's supposed to have two devices, but one (represented by an underscore) is missing. This is bad, since it means that if the remaining disk fails, we'll lose data.
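
If you'd rather not eyeball the brackets, the check can be scripted. The following is a minimal sketch, assuming the /proc/mdstat layout shown above; the function name degraded_arrays is made up for this example.

```shell
# Print the name of every array whose status brackets contain an underscore
# (a missing device). Reads /proc/mdstat unless another file is given as $1.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md2 : active raid10 ...".
        /^md[0-9]+ :/ { name = $1 }
        # A bracket group made only of U and _ with at least one _ means
        # the array is degraded.
        /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

On the sample output above this prints md2 and nothing else. No output means every array has all of its devices.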

If RAID is resyncing, that file will also contain information about how fast the resync is going and when it's expected to complete. In this case, watch cat /proc/mdstat is useful: it redisplays the file's contents every two seconds (the default interval), so the resync progress shows up in close to real time.

For actually using mdadm, the most important thing to know is how to handle a failed disk. First you need to figure out which array had a device failure. One disk failure might mean multiple device failures, if one disk is part of multiple arrays. For instance, in the above output, failure of sdb would mean that sdb1 and sdb2 would both fail, and you'd need to handle both failures separately.

Let's say that you check /proc/mdstat and find that sdg1 has failed in the array md7. It will have an "(F)" after its name on the first status line. The first thing you need to do is remove it: sudo mdadm /dev/md7 --remove failed. Then you can try re-adding it with sudo mdadm /dev/md7 --add /dev/sdg1. If you're lucky, this will work, and it will start resyncing. If not, probably sdg will have to be physically replaced, and then the "--add" command will have to be run on its successor, after that's partitioned.
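
The same procedure can be sketched as a script that only prints the commands, so you can review them before running anything. The function names failed_members and recovery_commands are invented for this example, and it assumes a failed device appears as something like sdg1[1](F) on the array's first status line.

```shell
# Print "array device" for every member marked (F) in mdstat.
# Reads /proc/mdstat unless another file is given as $1.
failed_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)/) {
                    dev = $i
                    sub(/\[.*/, "", dev)   # strip the "[1](F)" suffix
                    print $1, dev
                }
        }
    ' "${1:-/proc/mdstat}"
}

# Print (do not run) the remove and re-add commands for each failed member.
recovery_commands() {
    failed_members "$@" | while read -r array dev; do
        echo "sudo mdadm /dev/$array --remove failed"
        echo "sudo mdadm /dev/$array --add /dev/$dev"
    done
}
```

If the re-add fails, the disk probably needs replacing, and the --add command then has to name the partition on the new disk instead.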

For more info on mdadm, as always, check the man page. There are a ton of options, and there's no room to explain them all here.

LVM: logical volume management

Most of our filesystems don't lie directly on top of a RAID array. Instead, each RAID array sits on top of disk partitions, and on top of the RAID arrays we have an LVM "volume group" (VG). Within the volume group, we have "logical volumes" (LVs), which we then put filesystems on. The advantage here is that we can expand a logical volume and the filesystem on top of it without repartitioning. Also, a volume group can have multiple "physical volumes" (PVs), i.e., it can be placed on top of a bunch of devices. Logical volumes can be moved between the devices while they're in use, so we don't need to shut down the server to move data around when we change our disk configuration. There are other uses of LVM too, like snapshots, although we don't use those right now.

Three useful commands (must be run as root) are vgdisplay, pvdisplay, and lvdisplay, to show info about current VGs, PVs, and LVs respectively:

$ sudo vgdisplay
  --- Volume group ---
  VG Name               LVM
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  65
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               6
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               735.13 GiB
  PE Size               4.00 MiB
  Total PE              188194
  Alloc PE / Size       122112 / 477.00 GiB
  Free  PE / Size       66082 / 258.13 GiB
  VG UUID               dDcYBl-qdYn-6TJ3-8MwN-AfYH-l0SD-9GamfZ
   
$ sudo pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               LVM
  PV Size               456.44 GiB / not usable 2.88 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              116849
  Free PE               45169
  Allocated PE          71680
  PV UUID               WKOVus-9kwU-8wpz-W90s-pmt1-gJAR-wrAw1Y
   
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               LVM
  PV Size               278.69 GiB / not usable 1.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              71345
  Free PE               20913
  Allocated PE          50432
  PV UUID               4ALWTX-serI-HoHN-tzUk-jy9H-4rk4-cvXMXr
   
$ sudo lvdisplay
  --- Logical volume ---
  LV Name                /dev/LVM/www
  VG Name                LVM
  LV UUID                C1i8qg-Isi2-o35Q-2OTd-2i21-vrz9-7xwXbp
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GiB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           251:0
   
  --- Logical volume ---
  LV Name                /dev/LVM/userfiles
  VG Name                LVM
  LV UUID                PYeTEP-rEj2-x2kF-anh7-AMUL-3Wkw-m5RDnO
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                95.00 GiB
  Current LE             24320
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:2
   
  --- Logical volume ---
  LV Name                /dev/LVM/log
  VG Name                LVM
  LV UUID                sPBgWF-l5Xl-GoJW-ys8h-WXEA-Ef9k-0ZG1cN
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GiB
  Current LE             25600
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:3
   
  --- Logical volume ---
  LV Name                /dev/LVM/mysql
  VG Name                LVM
  LV UUID                LnDgLQ-9rLv-FJxc-k3pd-ngxK-xWfX-acSc1P
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                80.00 GiB
  Current LE             20480
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           251:1
   
  --- Logical volume ---
  LV Name                /dev/LVM/backup
  VG Name                LVM
  LV UUID                RcfHSo-ftce-OVx5-7QTZ-Wdtv-owWX-xxLIBm
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                182.00 GiB
  Current LE             46592
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           251:4
   
  --- Logical volume ---
  LV Name                /dev/LVM/sphinx
  VG Name                LVM
  LV UUID                Wjukbc-BxSe-swqy-wOBd-n2Yg-z8tX-oxz7e2
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                15.00 GiB
  Current LE             3840
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           251:5

Most of that is probably incomprehensible, but some important parts (like "Size") should make some sense. Notice that we have only one volume group, called "LVM", and all PVs are part of it. This lets us freely move LVs between them.
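
One part that does become comprehensible with a little arithmetic is the "PE" and "LE" counts: sizes are counted in extents, and each extent is "PE Size" big (4 MiB here). A tiny helper, with the made-up name pe_to_gib, shows the conversion:

```shell
# Convert a number of extents to GiB.
# $1 = extent count, $2 = extent size in MiB (defaults to 4, as in the output above).
pe_to_gib() {
    awk -v pe="$1" -v mib="${2:-4}" 'BEGIN { printf "%.2f\n", pe * mib / 1024 }'
}
```

With the numbers above, pe_to_gib 66082 gives 258.13 (the free space in the VG), and pe_to_gib 1280 gives 5.00 (the size of the www LV).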

The command to extend a logical volume is sudo lvextend -L+XXG /dev/LVM/something, where XX is the number of gigabytes you want to extend it by, and something is the LV name. After you do this, you'll have to do sudo resize2fs /dev/LVM/something to grow the filesystem itself. Be parsimonious here: you can grow an ext4 filesystem without downtime, but you can't shrink it without unmounting it. We don't want to use up the free space in the volume group and then have to take the site down for half an hour to free space up. (This is why /dev/LVM/log is 100 gigabytes: it was grown too much because access logs were using up a lot of space at that particular moment, and now it can't be shrunk without a reboot, since programs will want to write log files as long as the system is running.)
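
To make it harder to overshoot, you could check the request against the free extents first. This is a sketch only: plan_extend is a made-up name, you pass in the "Free PE" count from vgdisplay by hand, and it prints the two commands rather than running them.

```shell
# plan_extend LV_NAME GIB FREE_PE [PE_MIB]
# Print the commands to grow /dev/LVM/LV_NAME by GIB gigabytes, but only if
# the volume group has enough free extents. GIB must be a whole number.
plan_extend() {
    lv=$1
    gib=$2
    free_pe=$3
    pe_mib=${4:-4}                        # extent size in MiB, 4 on thor
    need_pe=$(( gib * 1024 / pe_mib ))    # extents needed for the growth
    if [ "$need_pe" -gt "$free_pe" ]; then
        echo "not enough free extents: need $need_pe, have $free_pe" >&2
        return 1
    fi
    echo "sudo lvextend -L+${gib}G /dev/LVM/$lv"
    echo "sudo resize2fs /dev/LVM/$lv"
}
```

For example, plan_extend www 10 66082 prints the lvextend and resize2fs commands for a 10 GB extension, while plan_extend log 300 66082 refuses, because 300 GB would need 76800 extents.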

There are lots of other LVM commands that do all sorts of things. Check man lvm for more thorough documentation.

Other software

Most of the rest of our software doesn't have much to discuss. We have no software installed from source right now (not counting web apps like vB); it's all packaged. I wrote some scripts in /usr/local/sbin:

$ ls /usr/local/sbin
7zdl           bwsumm-dl      dlswitch    profile-proc  vbupgrade
bwsumm         bwsumm-totals  groupedsum  reindex.sh
bwsumm-attach  dbbackup.sh    lockupmon   rotate.sh

Actually, a few of those are originally based on scripts I got from elsewhere, namely reindex.sh, rotate.sh, and dbbackup.sh.