TWC:Sysadmin info
Latest revision as of 09:15, 9 June 2011
I was bored and decided to put together this page of handy info for our active sysadmins, currently GED and me. I (Simetrical) started this page on September 20, 2008, and I cynically speculate that it will get updated approximately never after that, so I've tried to keep things relatively nonspecific as to exact versions and stuff, and to provide verification procedures where possible. But take it all with a grain of salt.
If you don't have at least shell access to the TWC server, this page is almost certainly useless to you. If you at least know what shell access is, you're probably capable of understanding at least some of it. Maybe you'll find it interesting; good for you.
Rewritten on January 24, 2010. (It indeed was updated approximately never.)
Updated and expanded on June 27, 2010. (This is getting to be slightly more often than never.)
Hardware
We currently have one dedicated server, known as thor. We own all of the hardware, bought with hard-earned pennies mostly donated by the membership in a 2009 donation drive. I've provided the command necessary to get info about the component in parentheses, where applicable.
- TYAN B4980G24V4H 1U Barebone Server NVIDIA nForce Professional 3600 Quad 1207(F) Four AMD Opteron 1.0GHz Hyper Transport FSB - Retail
- Four Quad-Core AMD Opteron(tm) Processor 8346 HE, 1.8 GHz (cat /proc/cpuinfo)
- 16 GB RAM (free -m)
- Two 500 GB WDC WD5000AACS-0 and two 300 GB WD3000HLFS-0 VelociRaptors for disks (scsiadd -p), in Linux software RAID (cat /proc/mdstat)
Our bandwidth is $50/Mbps/month at the 95th percentile, and we're paying for 10 Mbps uncapped. In principle, if we use more than 10 Mbps at the 95th percentile, we get charged more than $500 for the month.
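To make the billing arithmetic concrete, here's a rough sketch of how a 95th-percentile bill is computed. The samples below are made up (a real month has around 8,640 five-minute samples); the idea is that the top 5% of samples are thrown away and we're billed at the highest remaining value:

```shell
# Made-up 5-minute usage samples in Mbps; 50 is a brief traffic spike.
samples="3 4 4 5 5 6 6 7 8 9 9 10 11 12 13 14 15 16 18 50"

# Sort the samples, discard the top 5%, and bill the highest survivor.
billed=$(echo "$samples" | tr ' ' '\n' | sort -n | awk '
    { v[NR] = $1 }
    END {
        i = int(NR * 0.95)      # index of the 95th-percentile sample
        if (i < 1) i = 1
        print v[i]
    }')
echo "billed rate: ${billed} Mbps"
```

So a month like this one bills at 18 Mbps even though there was a 50 Mbps spike: bursts above the committed rate are free as long as they cover less than 5% of the samples.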
Linux
The "L" in "LAMP". We use Linux because, as Simetrical will tell you, it is both technically and morally superior to Windows in every conceivable way. (Some people who currently have root access might hold different opinions.) In fact, the old server (loki) ran Linux, and it was all Simetrical knew how to administer when he was picking out the new server (odin), so it was a fairly pragmatic choice even if he wasn't a penguin-hugging open-source hippie. The same logic went for the new new server (thor), since although GrnEyedDvl was around by that point, Simetrical was the one familiar with the existing setup.
To be more precise, we use Ubuntu, mainly because it has a huge and up-to-date package repository, and also Simetrical happens to be familiar with it because he uses it at home. Both of the old servers were RHEL5, and we have no regrets about switching to Ubuntu. The output of lsb_release -a is currently

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 10.04 LTS
Release:        10.04
Codename:       lucid
We were originally on 8.04 LTS (Hardy), but switched to 8.10 to fix kernel problems. Later we upgraded to 10.04 LTS, although under rather unfortunate circumstances. We kind of had to upgrade because support on the non-LTS version 8.10 expired, but a major goal was also to improve disk performance by switching from ext3 to ext4. Ironically, I/O on the new OS version was slower.
lighttpd
The "A" in "LAMP". Yes, this is fairly weird, alphabetically speaking, but nobody wants to try to pronounce "LLMP". We used to use Apache, but it used way too much memory. We switched in March 2008, saved 1.5G of RAM, got perceptibly faster page load times, and never looked back. Fiddling around with random stuff for fun can pay off sometimes.
lighttpd's config file is at /etc/lighttpd/lighttpd.conf. Documentation is at the lighttpd website (the config file docs are the most useful). The binary is at /usr/sbin/lighttpd. lighttpd runs as user www-data. Access and error logs are in /var/log/lighttpd/, automagically rotated by logrotate. A handy web-based server status thing is available in the secret stuff thread in the Tech Cathedral, for those with access there (it gives IP addresses of all current connections, so it's not fit for public consumption).

To restart lighttpd in our current setup, for instance due to a new lighttpd.conf or php.ini, use this:
sudo service lighttpd restart
Actually, replace "lighttpd" with the service name (mysql, sphinxsearch, memcached, . . .) and that's basically how all service restarts work.
Restarting lighttpd usually takes a few seconds, so no perceptible downtime for most users, but it will shut down any active connections, so anyone doing a big upload/download will get an error or bogus file or something. So don't get too trigger-happy. If you're feeling really kind, you could check the server status and see if anyone's downloading anything large, but personally I don't bother. Restarts of lighttpd are fairly rare anyway.
lighttpd usually uses a few percent CPU and a few hundred megs of memory, in my experience. It runs on a single thread and doesn't run any scripts itself, so this is pretty reasonable. I wonder what the few hundred megs are for, actually, but thinking back to Apache with mod_php I'm not even going to bother looking into it.
The version is currently:
$ lighttpd -v
lighttpd/1.4.26 (ssl) - a light and fast webserver
Build-Date: Apr  6 2010 11:42:30
We previously compiled the latest version from source to take advantage of better handling of out-of-FastCGI errors, but then we upgraded to Lucid and this became unnecessary.
MySQL
This is the "M" in LAMP. MySQL is nice in some ways: the server process is very resilient and tends not to ever crash or go into hysterics, and if it does, it automatically restarts. In terms of features, such as the ability to optimize queries or use indexes in anything better than a rudimentary fashion, it's probably worse than PostgreSQL. But vBulletin only supports MySQL, like lots of other web apps, so that's what we use.
Wherever possible, we use InnoDB (robust, sophisticated, ACID-compliant, high update concurrency) instead of MyISAM (fragile, simple, non-transactional, low update concurrency, default storage engine). "Wherever possible" means "everywhere except totalwar_vb.phrase, which is a single giant row and InnoDB doesn't like that, but that like never changes anyway so who cares". InnoDB provides lots of the lovely features that users of other database engines assume exist, like transactions, granular locking, and non-blocking reads. Unfortunately vBulletin is written with MyISAM in mind and doesn't actually use, for instance, transactions, but it benefits from the high concurrency anyway. (MyISAM has only table-level locks, so updates are serialized with all other queries, which becomes hellish if you have long-running selects.)
MySQL's config is in /etc/mysql (primary config file: /etc/mysql/my.cnf). Error logging goes to /var/log/mysql/error.log. The actual databases are in /var/lib/mysql, which is on its own logical volume in LVM.

To restart MySQL, you just need to do sudo service mysql restart. This is only necessary on config file changes, and even then it's usually avoidable, since you can change most settings live. Restarting MySQL takes several minutes, during which all users will get database errors and no pages requiring the database will load. InnoDB must write all changes from the transaction log to the actual data and index files before it shuts down, and this can take quite a while. Moreover, the site can be sluggish for as much as a couple of hours after a MySQL restart, because its caches will be cleared and will have to repopulate.
MySQL, being a database, loves RAM, loves it very, very much. The InnoDB buffer pool (where InnoDB caches data and index pages, being a good DB engine and not trusting the fickle OS for such a sensitive task) is 5G (grep innodb_buffer_pool_size /etc/mysql/my.cnf). Miscellaneous other stuff means that MySQL usually uses around 5.5G or 6G. It will usually use a bit of CPU too, although not much, and of course lots and lots of disk space.
We use the packaged version of MySQL, to wit:
$ mysql --version
mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1
To back up particular tables in the DB (often needed or helpful when upgrading a vB plugin), use the following command:
$ mysqldump --add-drop-table --single-transaction -u username -p database_name table1 [table2 table3 . . . ] > outputfile.sql
To find the relevant db and login information for the forums, see /var/www/forums/includes/config.php on the server. This requires www-data access.
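Loading such a dump back is just mysql -u username -p database_name < outputfile.sql with the same credentials, but since --add-drop-table means every table in the file gets dropped and recreated on load, it's worth a quick sanity check that the file contains exactly the tables you meant to dump. A minimal sketch, using a tiny synthetic dump file standing in for real mysqldump output:

```shell
# Synthetic stand-in for a real mysqldump output file.
cat > outputfile.sql <<'EOF'
DROP TABLE IF EXISTS `plugin`;
CREATE TABLE `plugin` (`pluginid` int);
INSERT INTO `plugin` VALUES (1);
EOF

# One DROP TABLE line per dumped table; make sure the count and the
# table names match what you intended before feeding this to mysql.
grep '^DROP TABLE' outputfile.sql
```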
Loading a backup
Database backups are stored in /var/local/backup/db/, which should be readable to everyone in the www-data group (i.e., everyone with shell access). The backups contain a copy of the whole database, not just totalwar_vb, and loading them will wipe out all existing MySQL data, so only do this if you're sure it's what you want to do. There are two likely cases: either you're reloading the passive master from the active master, or both servers' databases were so horribly corrupted that you have no choice but to restore from backup. If you're restoring to the passive master's database, you need to be root, since the passive master is read-only to non-root users.
First copy the backup file someplace. If you're copying a backup from one server to the other, it's best to copy the whole file first instead of piping it so that it won't fail midway if the network connection fails for some reason (although that's somewhat unlikely for a server one hop away). scp works for copying the file. I'll assume the file is in the current working directory and is named data.sql.gz.
Next, stop the slave on both servers with
sudo bash -c 'HOME=/root mysql -e "STOP SLAVE"'
The funny incantation ensures that your credentials are loaded from root's home directory, not your own, so you're running the command as the root MySQL user. You should never ever run commands as the root MySQL user in normal operation: root has the SUPER privilege and can write to the passive master, and those writes will replicate to the active master and horribly corrupt everything. Never run any command as MySQL root unless you have first shut down the slaves on both servers; otherwise, only run mysql as your own user, with the credentials from your own ~/.my.cnf file.
Now run this command to be double super sure you stopped both slaves:
for SERVER in thor.twcenter.net mjollnir.twcenter.net; do sudo ssh $SERVER 'HOME=/root mysql -te "SHOW SLAVE STATUS"'; done
This will produce a whole mess of output because it wraps a lot, but carefully inspect it to make sure that Slave_IO_Running and Slave_SQL_Running are both "No" in both output tables (you might have to scroll up a bunch to get to the earlier one).
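Eyeballing the wrapped table is error-prone, so it may be easier to use mysql's \G vertical output (one field per line) and grep for the two fields. A sketch, with canned text standing in for what SHOW SLAVE STATUS\G prints on a properly stopped slave:

```shell
# Canned sample of the relevant lines of "SHOW SLAVE STATUS\G" output;
# on a real server you'd capture it with something like:
#   sudo ssh $SERVER 'HOME=/root mysql -e "SHOW SLAVE STATUS\G"'
status='             Slave_IO_State:
           Slave_IO_Running: No
          Slave_SQL_Running: No'

# Both fields must say "No" before it's safe to load anything as root.
stopped=$(echo "$status" | grep -Ec 'Slave_(IO|SQL)_Running: No')
if [ "$stopped" -eq 2 ]; then
    echo "slave fully stopped"
else
    echo "DANGER: slave may still be running" >&2
fi
```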
Now do this, which will completely destroy the database on the other server as well as the current one if you didn't stop both slaves:
pv data.sql.gz | gunzip | sudo bash -c 'HOME=/root mysql'
This will start loading the database. It will take a few hours. pv prints out a handy progress bar, but its time estimate is going to undershoot, because it takes MySQL longer to copy data as it copies more data (inserts take more than linear time). In March 2011, loading a backup on mjollnir when it was otherwise idle took two hours and 14 minutes.
If everything has finished with no errors, you want to reset the slaves. If you were reloading the database due to total corruption of both servers, well, ideally you should be replaying the appropriate binary log at this point. I'm not going to write detailed instructions for that, because it's not what I'm doing as I write this tutorial and I don't want to write down hypotheticals. I hope this never happens again anyway, and that if it does, I'm personally available to fix it or at least help.
What I'm doing right now is reloading the passive master from the active master, so the instructions for that follow. You want to set the passive master's slave to point at whatever point in time this backup was taken. This will not work if the backup was one you took yourself instead of taking it from /var/local/backup/db, unless you used the right mysqldump options, which /usr/local/sbin/dbbackup.sh does. If the dump does have the info, take a look with
zcat data.sql.gz | less
Near the top you'll find something like:
--
-- Position to start replication or point-in-time recovery from
--

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001573', MASTER_LOG_POS=106;
That "CHANGE MASTER" statement is what you want to run. On the passive master (the one you just loaded the dump on), run the following:
sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO...\""
with the appropriate CHANGE MASTER statement inserted, as copied out of the backup file you used.
Then you need to reset the active master's slave position. (This would theoretically be unnecessary if we had turned off binary logging for the database reload, but I'm nervous so I'd have done this anyway.) Since the passive master is read-only, it won't be running any statements except ones from the active master, which the active master will ignore when they bounce back to it, so it doesn't really matter what exact position the active master is set to. On the passive master (where you loaded the backup), run
mysql -te 'SHOW MASTER STATUS'
and then on the active master (which is the one hopefully running the actual site right now), run
sudo bash -c "HOME=/root mysql -e \"CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...\""
where you fill in MASTER_LOG_FILE and MASTER_LOG_POS values from the "File" and "Position" of the SHOW MASTER STATUS you just ran.
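If you'd rather not copy the numbers by hand, the File and Position columns can be pulled out of the -t table output with awk. A sketch, using a canned table with made-up values in place of live SHOW MASTER STATUS output:

```shell
# Canned stand-in for: mysql -te 'SHOW MASTER STATUS' (values are made up).
table='+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.001580 |  4521908 |              |                  |
+------------------+----------+--------------+------------------+'

# Row 4 is the data row; fields 2 and 3 are File and Position.
# %c with 39 prints a single quote without quoting headaches.
stmt=$(echo "$table" | awk -F'|' 'NR == 4 {
    gsub(/ /, "", $2); gsub(/ /, "", $3)
    printf "CHANGE MASTER TO MASTER_LOG_FILE=%c%s%c, MASTER_LOG_POS=%s", 39, $2, 39, $3
}')
echo "$stmt"
# Then feed it to the active master, e.g.:
#   sudo bash -c "HOME=/root mysql -e \"$stmt\""
```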
Now both slaves should be set to point to the right place. Triple-check everything, then run this on both servers:
sudo bash -c "HOME=/root mysql -e 'START SLAVE'"
This will completely destroy both databases if you messed up any of the above steps, but I hope I've scared you enough by now that I don't need to use bold text when I say that.
FastCGI
This is the "P" in LAMP. Unlike with lighttpd above, this isn't due to creative spelling. "P" stands for "PHP", and we run PHP using FastCGI. FastCGI is basically a bunch of daemons that hang around, which lighttpd asks to execute PHP scripts for it. In Apache this is typically done with mod_php instead, so Apache executes the scripts itself, but this is a terrible idea for a general-purpose web server.
I don't actually have much of any idea how FastCGI works. I just configured lighttpd to use FastCGI and it handles spawning the processes itself. To restart FastCGI, I just use the command above to restart lighttpd. I mentally categorize FastCGI with lighttpd as "the web server". So this is mostly voodoo magic to me.
The processes are /usr/bin/php-cgi. They run as the www-data user right now, since lighttpd spawns them after it setuid()s. PHP is configured in /etc/php5/cgi/php.ini. It logs warnings and errors to the syslog, /var/log/syslog. FastCGI itself I don't know how to configure; I just set all the relevant stuff (like number of processes) in lighttpd's config file. Some info on APC status (which caches PHP files) is in the secret stuff thread in the Tech Cathedral, since it allows viewing and even changing the values of the cached stuff.
FastCGI uses a crazy lot of CPU and memory. Currently it uses maybe 8G of memory and more than half our CPU at peak. There's some low-hanging fruit to be had in the CPU department, which I hope to pursue when feasible.
We use the packaged PHP version:
$ php --version
PHP 5.3.2-1ubuntu4.2 with Suhosin-Patch (cli) (built: May 13 2010 20:03:45)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
memcached
We started using memcached a little while ago, instead of XCache/APC cache. Actually we still use APC cache for one thing because we're lazy. Not much to say about it.
git
We have some git repositories lying around in various places: /etc, /usr/local/sbin, /var/www, /var/www/forums, and /var/www/fpss, for instance. The first two are owned by root, the second two are owned by www-data. The first contains site config info like the password file, so is secret; the last two contain copyrighted code, so are also secret. /usr/local/sbin and /var/www can be viewed via gitweb, so you can track all the exciting changes I make. We also have a repository in /usr/local/src/vbulletin that I move new versions into, so that the /var/www/forums git repo can use it as a reference point for the dark art of git rebase. And there are some repos in /home/aryeh . . . and maybe others lurking elsewhere. One never knows.
git is version control software: it records when all changes were made, so 1) we can figure out who made each change (hint: me), 2) we can figure out when and why a change was made (my memory isn't perfect, and I might be hit by a bus), and 3) it's easy to undo changes if they prove problematic (this happens a lot). Every time some files are changed, I try to remember to record a commit at least briefly describing the changes I made. Everyone else with shell access should ideally do this too.
The thread Using git on a production server contains a considerable amount of useful info. You have to create the file ~/.gitconfig with the proper info before using git (see that thread for a sample file). Some basic git commands are:
- git log: This shows you a list of all commits, nicely paginated, with the most recent ones first. You can do git log -p to get the exact changes that each commit made (i.e., the diffs).
- git diff: Lists what changes have been made, but not yet recorded in git. Normally this should be empty, since people are committing everything to git when they make a change, right? (Okay, kind of tricky if you only have FTP access, so I wind up committing that stuff.) This will ignore any newly-created files: those have to be explicitly committed before they'll show up.
- git commit: Creates a new commit. You can do git commit -a to commit all the changes listed by git diff, or you can list exactly which files you want to commit, like git commit file1 file2 file3. (Actually you can commit parts of files too, using git add -i, but let's not go there.) Note that you have to run this command as the repository owner, so prefix it with "sudo" if the repo is owned by root, or "sudo -u www-data" if it's owned by www-data.
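Here's the whole diff-then-commit cycle in a throwaway scratch repository (the directory, file, and messages are all hypothetical; on the server you'd be in /etc or /var/www and prefixing sudo or sudo -u www-data as described above):

```shell
# Work in a disposable directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email sysadmin@example.com
git config user.name "TWC Sysadmin"

# Record an initial state.
echo 'server.port = 80' > demo.conf
git add demo.conf
git commit -qm 'Add demo config'

# Make a change; "git diff" now lists it as uncommitted.
echo 'server.port = 8080' > demo.conf
git diff --stat

# "git commit -a" records every change git diff listed.
git commit -qam 'Switch the demo port'
git log --oneline          # two commits, newest first
```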
/etc is currently (November 2010) sort of mirrored, in an ad hoc fashion, between thor and mjollnir. If you make a change on thor that should also be on mjollnir, the current way to copy it is:

- Log in on mjollnir and go to /etc.
- Do sudo git fetch to fetch the /etc info from thor. (This currently isn't set up to work in reverse, so make shared changes on thor first, not mjollnir.)
- For each commit you want to copy, do sudo git cherry-pick abcdef, where abcdef is the hexadecimal identifier for the commit. Start with the earlier commits and work your way up to later commits.
- You might get conflicts, like if you update /etc/aliases and commit the new aliases.db in one commit on thor. Either commit the new /etc/aliases and /etc/aliases.db in different commits and then only copy the aliases change on mjollnir (then run newaliases separately there); or else do sudo newaliases; sudo git add aliases.db; sudo git commit to resolve the conflict. (Todo: can we just use the same aliases.db on thor and mjollnir to avoid these conflicts?)
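The same fetch-and-cherry-pick dance can be tried out safely with two scratch repositories standing in for /etc on the two servers (everything below is hypothetical: the directories, the alias file, the commit messages):

```shell
# Two throwaway repos: "thor" is the origin, "mjollnir" pulls from it.
dir=$(mktemp -d)
cd "$dir"

git init -q thor
cd thor
git config user.email sysadmin@example.com
git config user.name "TWC Sysadmin"
echo 'root: admin' > aliases
git add aliases
git commit -qm 'Initial aliases'
cd ..

git clone -q thor mjollnir
cd mjollnir
git config user.email sysadmin@example.com
git config user.name "TWC Sysadmin"
cd ..

# A new commit lands on "thor"...
cd thor
echo 'webmaster: simetrical' >> aliases
git commit -qam 'Add webmaster alias'
commit=$(git rev-parse HEAD)

# ...and gets copied to "mjollnir" by fetch + cherry-pick.
cd ../mjollnir
git fetch -q origin
git cherry-pick "$commit"
tail -1 aliases
```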
vBulletin
The one piece of software we use that's not free and open-source. Website is vbulletin.com. Not too much needs to be said here, because it's much easier to use than the rest of our software, with a web interface and everything. The interesting point is that our vBulletin copy has lots of extra files added, and lots of existing files hacked. You can see for yourself by doing git log in /var/www/forums. There are dozens of changes going back to 2008. When we upgrade, git will semi-magically transfer all of the changes to the new versions, if coaxed by suitable incantations. /usr/local/sbin/vbupgrade does this, but if there's a conflict, you'll have to know what you're doing to resolve it. (Or, alternatively, just get me to do the upgrade.) Minor version changes shouldn't cause conflicts, so any root should be able to do those by following the procedure, although you'll still have to fix template conflicts.
Disk administration
This comes up a bunch, so I'll document some of this too.
mdadm: software RAID
mdadm can be used to control our RAID array. One basic useful command doesn't actually use mdadm at all: cat /proc/mdstat (sudo not required) tells you the current status of the RAID. Sample output:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sde1[0] sdb1[1]
      9767424 blocks [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sde2[0] sdb2[1]
      478616448 blocks [2/2] [UU]
      bitmap: 7/29 pages [28KB], 8192KB chunk

md2 : active raid10 sdc3[0]
      292230144 blocks super 1.1 512K chunks 2 far-copies [2/1] [U_]
      bitmap: 3/3 pages [12KB], 65536KB chunk

unused devices: <none>
The RAID arrays are the things named "md0", "md1", and "md2". It tells you what type of RAID that array uses (here raid1 or raid10), and what disks are being used for each array (sde1, sdb1, sde2, etc.). The second line for each device contains some critical info, namely whether there are any missing devices. The [UU] in the first two arrays here tells us that the array is supposed to have two devices and they're both present. The [U_] in the last one says that it's supposed to have two devices, but one (represented by an underscore) is missing. This is bad, since it means that if the second disk fails, we'll lose data.
If RAID is resyncing, that file will contain information about how fast the resync is going and when it's expected to complete. In this case, watch cat /proc/mdstat is useful: it will display the file's contents on your screen and refresh it every few seconds, so the resync progress shows up in real time.
For actually using mdadm, the most important thing to know is how to handle a failed disk. First you need to figure out which array had a device failure. One disk failure might mean multiple device failures, if one disk is part of multiple arrays. For instance, in the above output, failure of sdb would mean that sdb1 and sdb2 would both fail, and you'd need to handle both failures separately.

Let's say that you check /proc/mdstat and find that sdg1 has failed in the array md7. It will have an "(F)" after its name on the first status line. The first thing you need to do is remove it: sudo mdadm /dev/md7 --remove failed. Then you can try re-adding it with sudo mdadm /dev/md7 --add /dev/sdg1. If you're lucky, this will work, and it will start resyncing. If not, sdg will probably have to be physically replaced, and then the "--add" command will have to be run on its successor, after that's partitioned.
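The failure markers are easy to grep for, which makes a quick degraded-array check scriptable. A sketch against a canned /proc/mdstat excerpt (the hypothetical md7/sdg1 failure above; on the server you'd read the real file instead):

```shell
# Canned excerpt standing in for: cat /proc/mdstat
mdstat='md7 : active raid1 sdg1[0](F) sdh1[1]
      1953512 blocks [2/1] [_U]
md0 : active raid1 sde1[0] sdb1[1]
      9767424 blocks [2/2] [UU]'

# "(F)" flags a failed device; "_" in the [UU] map flags a missing slot.
failed=$(echo "$mdstat" | grep -o '[a-z0-9]*\[[0-9]*\](F)')
degraded=$(echo "$mdstat" | grep -c '\[U*_U*\]')
echo "failed devices: ${failed:-none}"
echo "degraded arrays: $degraded"
```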
For more info on mdadm, as always, check the man page. There are a ton of options, and there's no room to explain them all here.
LVM: logical volume management
Most of our filesystems don't lie directly on top of a RAID array. Instead, we have the RAID array on top of a disk partition, then on top of the RAID array we have an LVM "volume group" (VG). Within the volume group, we have "logical volumes" (LVs), which we then put filesystems on. The advantage here is that we can expand a logical volume and the filesystem on top of it without repartitioning. Also, a volume group can have multiple "physical volumes" (PVs), i.e., it can be placed on top of a bunch of devices. Logical volumes can be moved between the devices while they're in use, so we don't need to shut down the server to move data around when we change our disk configuration. There are other uses of LVM too, like snapshots, although we don't use those right now.
Three useful commands (must be run as root) are vgdisplay, pvdisplay, and lvdisplay, to show info about current VGs, PVs, and LVs respectively:
$ sudo vgdisplay
  --- Volume group ---
  VG Name               LVM
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  65
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               6
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               735.13 GiB
  PE Size               4.00 MiB
  Total PE              188194
  Alloc PE / Size       122112 / 477.00 GiB
  Free  PE / Size       66082 / 258.13 GiB
  VG UUID               dDcYBl-qdYn-6TJ3-8MwN-AfYH-l0SD-9GamfZ

$ sudo pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               LVM
  PV Size               456.44 GiB / not usable 2.88 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              116849
  Free PE               45169
  Allocated PE          71680
  PV UUID               WKOVus-9kwU-8wpz-W90s-pmt1-gJAR-wrAw1Y

  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               LVM
  PV Size               278.69 GiB / not usable 1.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              71345
  Free PE               20913
  Allocated PE          50432
  PV UUID               4ALWTX-serI-HoHN-tzUk-jy9H-4rk4-cvXMXr

$ sudo lvdisplay
  --- Logical volume ---
  LV Name               /dev/LVM/www
  VG Name               LVM
  LV UUID               C1i8qg-Isi2-o35Q-2OTd-2i21-vrz9-7xwXbp
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               5.00 GiB
  Current LE            1280
  Segments              1
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    4096
  Block device          251:0

  --- Logical volume ---
  LV Name               /dev/LVM/userfiles
  VG Name               LVM
  LV UUID               PYeTEP-rEj2-x2kF-anh7-AMUL-3Wkw-m5RDnO
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               95.00 GiB
  Current LE            24320
  Segments              3
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          251:2

  --- Logical volume ---
  LV Name               /dev/LVM/log
  VG Name               LVM
  LV UUID               sPBgWF-l5Xl-GoJW-ys8h-WXEA-Ef9k-0ZG1cN
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               100.00 GiB
  Current LE            25600
  Segments              2
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          251:3

  --- Logical volume ---
  LV Name               /dev/LVM/mysql
  VG Name               LVM
  LV UUID               LnDgLQ-9rLv-FJxc-k3pd-ngxK-xWfX-acSc1P
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               80.00 GiB
  Current LE            20480
  Segments              2
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    4096
  Block device          251:1

  --- Logical volume ---
  LV Name               /dev/LVM/backup
  VG Name               LVM
  LV UUID               RcfHSo-ftce-OVx5-7QTZ-Wdtv-owWX-xxLIBm
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               182.00 GiB
  Current LE            46592
  Segments              1
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    4096
  Block device          251:4

  --- Logical volume ---
  LV Name               /dev/LVM/sphinx
  VG Name               LVM
  LV UUID               Wjukbc-BxSe-swqy-wOBd-n2Yg-z8tX-oxz7e2
  LV Write Access       read/write
  LV Status             available
  # open                1
  LV Size               15.00 GiB
  Current LE            3840
  Segments              2
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    4096
  Block device          251:5
Most of that is probably incomprehensible, but some important parts (like "Size") should make some sense. Notice that we have only one volume group, called "LVM", and all PVs are part of it. This lets us freely move LVs between them.
The command to extend a logical volume is sudo lvextend -L+XXG /dev/LVM/something, where XX is the number of gigabytes you want to extend it by, and something is the LV name. After you do this, you'll have to do sudo resize2fs /dev/LVM/something to grow the filesystem itself. Be parsimonious here: you can grow an ext4 filesystem without downtime, but you can't shrink it without unmounting it. We don't want to use up the free space in the volume group and then have to take the site down for half an hour to free space up. (This is why /dev/LVM/log is 100 gigabytes: it was grown too much because access logs were using up a lot of space at that particular moment, and now it can't be shrunk without a reboot, since programs will want to write log files as long as the system is running.)
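Before extending anything, it's worth checking how much is actually free in the volume group, so a typo in XX can't silently eat the headroom. A sketch that parses the Free PE line (canned here from the vgdisplay output above; the 20 GiB figure is a hypothetical planned extension):

```shell
# Canned "Free PE" line from vgdisplay; live, you'd capture it with:
#   sudo vgdisplay | grep 'Free  PE'
free_line='  Free  PE / Size       66082 / 258.13 GiB'

# The second-to-last field is the free size in GiB.
free_gib=$(echo "$free_line" | awk '{ print int($(NF - 1)) }')

want=20   # hypothetical number of gigabytes we plan to add
if [ "$free_gib" -ge $((want * 2)) ]; then
    echo "plenty of room: ${free_gib} GiB free, adding ${want}G"
else
    echo "too tight: only ${free_gib} GiB free in the VG" >&2
fi
```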
There are lots of other LVM commands that do various things. Check man lvm for more thorough documentation.
Other software
Most of the rest of our software doesn't have much to discuss. We have no software installed from source right now (not counting web apps like vB); it's all packaged. I wrote some scripts in /usr/local/sbin:
$ ls /usr/local/sbin
7zdl           bwsumm-dl      dlswitch    profile-proc  vbupgrade
bwsumm         bwsumm-totals  groupedsum  reindex.sh
bwsumm-attach  dbbackup.sh    lockupmon   rotate.sh
Actually, a few of those are originally based on scripts I got from elsewhere, namely reindex.sh, rotate.sh, and dbbackup.sh.