Production Environments

By Joel Aufrecht

Starting and Stopping an OpenACS instance

The simplest way to start and stop an OpenACS site is to run the startup shell script provided, /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/daemontools/run. This runs as a regular task and logs to the logfile. To stop the site, kill the script.

A more stable way to run OpenACS is with a "keepalive" mechanism of some sort, so that whenever the server halts or is stopped for a reset, it restarts automatically. This is recommended for development and production servers.

The Reference Platform uses Daemontools to control AOLserver. A simpler method, using init, is described under "AOLserver keepalive with inittab".

Daemontools must already be installed. If not, install it.

Each service controlled by daemontools must have a directory in /service. That directory must have a file called run. It works like this:

1) The init program starts every time the computer is booted.
2) A line in init's configuration file, /etc/inittab, tells init to run, and to restart if necessary, svscanboot.
3) svscanboot checks the directory /service every few seconds.
4) If it sees a subdirectory there, it looks for a file in the subdirectory called run.
5) If it finds a run file, it creates a supervise process.
6) supervise executes the run script. Whenever the run script stops, supervise executes it again. It also creates additional control files in the same directory.

Hence, the AOLserver instance for your development server is started by the file /service/$OPENACS_SERVICE_NAME/run. But we use a symlink to make it easier to add and remove entries in /service, so the actual location is /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/daemontools/run.

Daemontools creates additional files and directories to track status and log. A daemontools directory is included in the OpenACS tarball at /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/daemontools. To use it, first kill any existing AOLserver instances.
As root, link the daemontools directory into the /service directory. Daemontools' svscan process checks this directory every five seconds, and will quickly execute run.

    [$OPENACS_SERVICE_NAME etc]$ killall nsd
    nsd: no process killed
    [$OPENACS_SERVICE_NAME etc]$ emacs /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/daemontools/run
    [$OPENACS_SERVICE_NAME etc]$ exit
    [root root]# ln -s /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/daemontools/ /service/$OPENACS_SERVICE_NAME

Verify that AOLserver is running.

    [root root]# ps -auxw | grep nsd
    $OPENACS_SERVICE_NAME 5562 14.4 6.2 22436 15952 ? S 11:55 0:04 /usr/local/aolserver/bin/nsd -it /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/config.tcl -u serve
    root 5582 0.0 0.2 3276 628 pts/0 S 11:55 0:00 grep nsd
    [root root]#

The user $OPENACS_SERVICE_NAME can now control the service $OPENACS_SERVICE_NAME with these commands:

    svc -d /service/$OPENACS_SERVICE_NAME - Bring the server down.
    svc -u /service/$OPENACS_SERVICE_NAME - Start the server up and leave it in keepalive mode.
    svc -o /service/$OPENACS_SERVICE_NAME - Start the server up once. Do not restart it if it stops.
    svc -t /service/$OPENACS_SERVICE_NAME - Stop and immediately restart the server.
    svc -k /service/$OPENACS_SERVICE_NAME - Send the server a KILL signal. This is like kill -9. AOLserver exits immediately. If svc -t fails to fully kill AOLserver, use this option. This does not take the server out of keepalive mode, so it should still bounce back up immediately.

Install a script to automate the stopping and starting of AOLserver services via daemontools. You can then restart a service via restart-aolserver $OPENACS_SERVICE_NAME.

    [root root]# cp /var/lib/aolserver/$OPENACS_SERVICE_NAME/packages/acs-core-docs/www/files/restart-aolserver-daemontools.txt /usr/local/bin/restart-aolserver
    [root root]# chmod 755 /usr/local/bin/restart-aolserver
    [root root]#

At this point, these commands will work only for the root user.
Grant permission for the web group to use svc commands on the $OPENACS_SERVICE_NAME server.

    [root root]# /usr/local/bin/svgroup web /service/$OPENACS_SERVICE_NAME
    [root root]#

Verify that the controls work. You may want to tail -f /var/lib/aolserver/$OPENACS_SERVICE_NAME/log/$OPENACS_SERVICE_NAME-error.log in another window, so you can see what happens when you type these commands.

More information can be found on the AOLserver Daemontools page.

How it Works

Program: svscanboot
    Invoked by this program: init
    Using this file: /etc/inittab
    Where to find errors: ps -auxw | grep readproctitle
    Log goes to: n/a

Program: aolserver
    Invoked by this program: supervise (a child of svscanboot)
    Using this file: /service/$OPENACS_SERVICE_NAME/run
    Where to find errors: /var/lib/aolserver/$OPENACS_SERVICE_NAME/log/error.log
    Log goes to: /var/lib/aolserver/$OPENACS_SERVICE_NAME/log/$OPENACS_SERVICE_NAME.log
    Use these commands to control it: svc -k /service/$OPENACS_SERVICE_NAME

Program: postgresql
    Invoked by this program: Redhat init scripts during boot
    Using this file: /etc/init.d/postgresql
    Where to find errors: /usr/local/pgsql/data/server.log
    Use these commands to control it: service postgresql start (Red Hat), /etc/init.d/postgresql start (Debian)
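The svc commands listed in this section can be wrapped in a small helper so that operators do not have to remember the flags. The following is only a sketch: the names svc_flag_for, openacs_ctl, and the DRY_RUN variable are hypothetical and are not part of OpenACS or daemontools. It assumes svc is on the PATH and that services live under /service as described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OpenACS or daemontools):
# map a friendly action name to the svc flag documented in this section.

svc_flag_for() {
    case "$1" in
        down)    echo "-d" ;;   # bring the server down
        up)      echo "-u" ;;   # start and leave in keepalive mode
        once)    echo "-o" ;;   # start once, do not restart
        restart) echo "-t" ;;   # stop and immediately restart
        kill)    echo "-k" ;;   # send a KILL signal
        *)       return 1 ;;
    esac
}

openacs_ctl() {
    action=$1
    service=$2
    flag=$(svc_flag_for "$action") || {
        echo "usage: openacs_ctl {down|up|once|restart|kill} SERVICE" >&2
        return 2
    }
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Assumption: DRY_RUN=1 only prints the command, for inspection.
        echo "svc $flag /service/$service"
    else
        svc "$flag" "/service/$service"
    fi
}
```

With DRY_RUN=1 set, openacs_ctl restart $OPENACS_SERVICE_NAME prints the svc command instead of running it, which is a cheap way to check the mapping before using it on a live service.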

AOLserver keepalive with inittab

This is an alternative method for keeping the AOLserver process running. The recommended method is to run AOLserver supervised.

This step should be completed as root. This can break every service on your machine, so proceed with caution.

There are 2 general steps to getting this working.

1) Install a script called restart-aolserver. This script doesn't actually restart AOLserver - it just kills it.
2) Ask the OS to restart our service whenever it's not running. We do this by adding a line to /etc/inittab.

Calling restart-aolserver kills our service. The OS notices that our service is not running, so it automatically restarts it. Thus, calling restart-aolserver effectively restarts our service.

Copy this file into /var/tmp/restart-aolserver.txt.

This script needs to be SUID-root, which means that the script will run as root. This is necessary to ensure that the AOLserver processes are killed regardless of who owns them. However, the script should be executable only by the web group, so that the users updating the web page can use the script but general system users cannot. You also need to have Perl installed, with a symbolic link to it in /usr/local/bin.

    [joeuser ~]$ su -
    Password: ***********
    [root ~]# cp /var/tmp/restart-aolserver.txt /usr/local/bin/restart-aolserver
    [root ~]# chown root.web /usr/local/bin/restart-aolserver
    [root ~]# chmod 4750 /usr/local/bin/restart-aolserver
    [root ~]# ln -s /usr/bin/perl /usr/local/bin/perl
    [root ~]# exit

Test the restart-aolserver script. We'll first kill all running servers to clean the slate. Then, we'll start one server and use restart-aolserver to kill it. If it works, then there should be no more servers running. You should see the following lines.
    [joeuser ~]$ killall nsd
    nsd: no process killed
    [joeuser ~]$ /usr/local/aolserver/bin/nsd-postgres -t ~/var/lib/aolserver/birdnotes/nsd.tcl
    [joeuser ~]$ restart-aolserver birdnotes
    Killing 23727
    [joeuser ~]$ killall nsd
    nsd: no process killed

The number 23727 indicates the process id(s) (PIDs) of the processes being killed. It is important that no processes are killed by the second call to killall. If there are processes being killed, it means that the script is not working.

Assuming that the restart-aolserver script worked, log in as root and open /etc/inittab for editing.

    [joeuser ~]$ su -
    Password: ************
    [root ~]# emacs -nw /etc/inittab

Copy this line into the bottom of the file as a template, making sure that the first field nss1 is unique.

    nss1:345:respawn:/usr/local/aolserver/bin/nsd-postgres -i -u nobody -g web -t /home/joeuser/var/lib/aolserver/birdnotes/nsd.tcl

Important: Make sure there is a newline at the end of the file. If there is not a newline at the end of the file, the system may suffer catastrophic failures.

Still as root, enter the following command to re-initialize /etc/inittab.

    [root ~]# killall nsd
    nsd: no process killed
    [root ~]# /sbin/init q

See if it worked by running the restart-aolserver script again.

    [root ~]# restart-aolserver birdnotes
    Killing 23750

If processes were killed, congratulations, your server is now automated for startup and shutdown.

Running multiple services on one machine

Services on different ports

To run a different service on another port but the same ip, simply repeat the installation with a different $OPENACS_SERVICE_NAME, and change the following settings to different values:

    set httpport 8000
    set httpsport 8443

Services on different host names

For example, suppose you want to support http://service0.com and http://bar.com on the same machine. The easiest way is to assign each one a different ip address.
Then you can install two services as above, but with different values for

    set hostname [ns_info hostname]
    set address 127.0.0.1

If you want to install two services with different host names sharing the same ip, you'll need nsvhr to redirect requests based on the contents of the HTTP headers. See AOLserver Virtual Hosting with TCP by markd.

High Availability/High Performance Configurations

Multiple-server configuration

Staged Deployment for Production Networks

($Id: maintenance.xml,v 1.35.2.1 2019/10/05 13:43:47 gustafn Exp $)

By Joel Aufrecht

(THIS SECTION IN DEVELOPMENT)

This section describes two minimal-risk methods for deploying changes on a production network. The important characteristics of a safe change deployment include:

Control: You know for sure that the change you are making is the change that you intend to make and is the change that you tested.

Rollback: If anything goes wrong, you can return to the previous working configuration safely and quickly.

Method 1: Deployment with CVS

With this method, we control the files on a site via CVS. This example uses one developmental server (service0-dev) and one production server (service0). Depending on your needs, you can also have a staging server for extensive testing before you go live. The only way files should move between the server instances is via cvs.

To begin, set up either your developmental installation or your production installation, and follow the instructions for committing your files to CVS. We'll assume in this example that you set up the production server (service0). To set up the developmental instance, you then follow the install guide again, this time creating a new user (service0-dev) that you'll use for the new installation. To get the files for service0-dev, you check them out from cvs (check out service0).

    su - service0-dev
    cvs -d /cvsroot co service0
    mv service0 /var/lib/aolserver/service0-dev
    ln -s /home/service0-dev/web /var/lib/aolserver/service0-dev
    emacs web/etc/config.tcl
    emacs web/etc/daemontools/run

In the config.tcl file, you'll probably want to pay attention to the rollout support section. That will ensure that email on your developmental server will not be sent out to the general world.

Also, instead of going through the OpenACS online installer, you'll actually load live data into your development server.
You can even automate the process of getting live data from your production server. Copy something like this to /home/service0-dev/bin and put it in service0-dev's crontab to run once a night. You'll need to make sure the database backups are set up in service0's crontab and, if the servers are on different physical machines, that the database backup is copied to the developmental machine once per night.

    /usr/local/bin/svc -d /service/service0-dev
    /bin/sleep 60
    # this deletes the dev database!
    /usr/local/pgsql/bin/dropdb service0-dev
    /usr/local/pgsql/bin/createdb -E UNICODE service0-dev
    # this is not necessary from Postgres 7.4 on
    /usr/local/pgsql/bin/psql -f /var/lib/aolserver/service0-dev/packages/acs-kernel/sql/postgresql/postgresql.sql service0-dev
    mv /var/lib/aolserver/service0/database-backup/service0-nightly-backup.dmp.gz /var/lib/aolserver/service0-dev/database-backup/service0-nightly-backup-old.dmp.gz
    /bin/gunzip /var/lib/aolserver/service0-dev/database-backup/service0-nightly-backup.dmp.gz
    /usr/bin/perl -pi -e "s/^\\connect service0$/\\connect service0-dev/" /var/lib/aolserver/service0-dev/database-backup/service0-nightly-backup.dmp
    /usr/local/pgsql/bin/psql service0-dev < /var/lib/aolserver/service0-dev/database-backup/service0-nightly-backup.dmp
    /usr/local/bin/svc -u /service/service0-dev
    /bin/gzip /var/lib/aolserver/service0-dev/database-backup/service0-nightly-backup-old.dmp

Your developmental server will always have data about a day old.
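Because the nightly refresh drops the development database before loading the backup, it may be worth confirming first that a recent backup file is actually present. The following guard is a sketch only: the function name backup_is_fresh and its arguments are hypothetical and not part of the OpenACS scripts, and it assumes a find that supports -mtime.

```shell
#!/bin/sh
# Hypothetical guard for the nightly refresh: succeed only if the backup
# file exists, is non-empty, and was modified within the last N days.

backup_is_fresh() {
    file=$1
    max_age_days=${2:-1}
    # -s is true only for an existing file with size greater than zero
    [ -s "$file" ] || return 1
    # find prints the path only when the file is newer than the cutoff
    [ -n "$(find "$file" -mtime "-$max_age_days" -print 2>/dev/null)" ]
}
```

One possible use is to call backup_is_fresh on the nightly dump at the top of the refresh script and exit before svc -d and dropdb if it fails, so that a missing or stale backup leaves the development server untouched.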
To make changes on service0-dev:

1) change the file on service0-dev as desired
2) test the new file
3) commit the file: if the file is /var/lib/aolserver/service0-dev/www/index.adp, do:

    cd /var/lib/aolserver/service0-dev/www
    cvs diff index.adp

   (this is optional; it's just a reality check) the lines starting > will be added and the lines starting < will be removed when you commit. If that looks okay, commit with:

    cvs commit -m "changing text on front page for February conference" index.adp

   The text after -m is a comment visible only from within cvs commands.

To make these changes take place on service0:

4) update the file on production:

    cd /var/lib/aolserver/service0/www
    cvs up -Pd index.adp

If you make changes that require changes to the database, test them out first on service0-dev, using either -create.sql or upgrade scripts. Once you've tested them, you then update and run the upgrade scripts from the package manager.

The production site can run "HEAD" from cvs.

The drawback to using HEAD as the live code is that you cannot commit new work on the development server without erasing the definition of 'working production code.' So a better method is to use a tag. This guarantees that, at any time in the future, you can retrieve exactly the same set of code. This is useful for both of the characteristics of safe change deployment. For control, you can use tags to define a body of code, test that code, and then know that what you are deploying is exactly that code. For rollback, you can return to the last working tag if the new tag (or new, untagged changes) causes problems.

.... example of using tags to follow ...

Method 2: A/B Deployment

The approach taken in this section is to always create a new service with the desired changes, running in parallel with the existing site. This guarantees control, at least at the final step of the process: you know what changes you are about to make because you can see them directly.
It does not, by itself, guarantee the entire control chain. You need additional measures to make sure that the change you are making is exactly and completely the change you intended to make and tested previously, and nothing more. Those additional measures typically take the form of source control tags and system version numbers. The parallel-server approach also guarantees rollback because the original working service is not touched; it is merely set aside.

This approach has limitations. If the database or file system is regularly receiving new data, you must interrupt this function or risk losing data in the shuffle. It also requires extra steps if the database will be affected.

Simple A/B Deployment: Database is not changed

Simple A/B Deployment - Step 1

Simple A/B Deployment - Step 2

Simple A/B Deployment - Step 3

Complex A/B Deployment: Database is changed

Complex A/B Deployment - Step 1

Complex A/B Deployment - Step 2

Complex A/B Deployment - Step 3

Installing SSL Support for an OpenACS service

Debian Users: apt-get install openssl before proceeding.

Make sure nsopenssl.so is installed for AOLserver.

Uncomment this line from config.tcl.

    #ns_param nsopenssl ${bindir}/nsopenssl.so

Prepare a certificate directory for the service.

    [$OPENACS_SERVICE_NAME etc]$ mkdir /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/certs
    [$OPENACS_SERVICE_NAME etc]$ chmod 700 /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/certs

It takes two files to support an SSL connection. The certificate is the public half of the key pair - the server sends the certificate to a browser requesting ssl. The key is the private half of the key pair. In addition, the certificate must be signed by a Certificate Authority or browsers will protest. Each web browser ships with a built-in list of acceptable Certificate Authorities (CAs) and their keys. Only a site certificate signed by a known and approved CA will work smoothly. Any other certificate will cause browsers to display warnings or block the site. Unfortunately, getting a site certificate signed by a CA costs money. In this section, we'll generate a self-signed certificate which will work in most browsers, albeit with pop-up messages.

Use an OpenSSL perl script to generate a certificate and key.
Debian users: use /usr/lib/ssl/misc/CA.pl instead of /usr/share/ssl/CA

macOS users: use perl /System/Library/OpenSSL/misc/CA.pl -newcert instead of /usr/share/ssl/CA

    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ cd /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/certs
    [$OPENACS_SERVICE_NAME certs]$ perl /usr/share/ssl/misc/CA -newcert
    Using configuration from /usr/share/ssl/openssl.cnf
    Generating a 1024 bit RSA private key
    ...++++++
    .......++++++
    writing new private key to 'newreq.pem'
    Enter PEM pass phrase:

Enter a pass phrase for the CA certificate. Then, answer the rest of the questions. At the end you should see this:

    Certificate (and private key) is in newreq.pem
    [$OPENACS_SERVICE_NAME certs]$

newreq.pem contains our certificate and private key. The key is protected by a pass phrase, which means that we'll have to enter the pass phrase each time the server starts. This is impractical and unnecessary, so we create an unprotected version of the key. Security implication: if anyone gets access to the file keyfile.pem, they effectively own the key as much as you do. Mitigation: don't use this key/cert combo for anything besides providing ssl for the web site.

    [root misc]# openssl rsa -in newreq.pem -out keyfile.pem
    read RSA key
    Enter PEM pass phrase:
    writing RSA key
    [$OPENACS_SERVICE_NAME certs]$

To create the certificate file, we take the combined file, copy it, and strip out the key.

    [$OPENACS_SERVICE_NAME certs]$ cp newreq.pem certfile.pem
    [root misc]# emacs certfile.pem

Strip out the section that looks like

    -----BEGIN RSA PRIVATE KEY-----
    Proc-Type: 4,ENCRYPTED
    DEK-Info: DES-EDE3-CBC,F3EDE7CA1B404997

    S/Sd2MYA0JVmQuIt5bYowXR1KYKDka1d3DUgtoVTiFepIRUrMkZlCli08mWVjE6T
    (11 lines omitted)
    1MU24SHLgdTfDJprEdxZOnxajnbxL420xNVc5RRXlJA8Xxhx/HBKTw==
    -----END RSA PRIVATE KEY-----

If you start up using the etc/daemontools/run script, you will need to edit this script to make sure the ports are bound for SSL. Details of this are in the run script.
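The manual step of stripping the private key out of certfile.pem in an editor can also be done non-interactively. The sketch below assumes newreq.pem contains a standard PEM certificate block delimited by BEGIN CERTIFICATE and END CERTIFICATE lines; the function name extract_certificate is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the certificate block of a combined PEM
# file, leaving out the private key and anything else in the file.

extract_certificate() {
    sed -n '/^-----BEGIN CERTIFICATE-----$/,/^-----END CERTIFICATE-----$/p' "$1"
}
```

Run from the certs directory, extract_certificate newreq.pem > certfile.pem would then produce the certificate file without the key section. It is still worth opening the result once to confirm that no private key material remains.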
Set up Log Analysis Reports

Analog is a program that processes webserver access logs, performs DNS lookup, and outputs HTML reports. Analog should already be installed. A modified configuration file is included in the OpenACS tarball.

    [root src]# su - $OPENACS_SERVICE_NAME
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ cd /var/lib/aolserver/$OPENACS_SERVICE_NAME
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ cp /var/lib/aolserver/$OPENACS_SERVICE_NAME/packages/acs-core-docs/www/files/analog.cfg.txt etc/analog.cfg
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ mkdir www/log
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ cp -r /usr/share/analog-5.32/images www/log/

Edit /var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/analog.cfg and change the variable in HOSTNAME "[my organization]" to reflect your website title. If you don't want the traffic log to be publicly visible, change OUTFILE /var/lib/aolserver/$OPENACS_SERVICE_NAME/www/log/traffic.html to use a private directory. You'll also need to change all instances of service0 to your $OPENACS_SERVICE_NAME.

Run it.

    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ /usr/share/analog-5.32/analog -G -g/var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/analog.cfg
    /usr/share/analog-5.32/analog: analog version 5.32/Unix
    /usr/share/analog-5.32/analog: Warning F: Failed to open DNS input file
      /home/$OPENACS_SERVICE_NAME/dnscache: ignoring it
      (For help on all errors and warnings, see docs/errors.html)
    /usr/share/analog-5.32/analog: Warning R: Turning off empty Search Word Report
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$

Verify that it works by browsing to http://yourserver.test:8000/log/traffic.html

Automate this by creating a file in /etc/cron.daily.
    [$OPENACS_SERVICE_NAME $OPENACS_SERVICE_NAME]$ exit
    logout
    [root root]# emacs /etc/cron.daily/analog

Put this into the file:

    #!/bin/sh
    /usr/share/analog-5.32/analog -G -g/var/lib/aolserver/$OPENACS_SERVICE_NAME/etc/analog.cfg

    [root root]# chmod 755 /etc/cron.daily/analog

Test it by running the script.

    [root root]# sh /etc/cron.daily/analog

Browse to http://yourserver.test/log/traffic.html

External uptime validation

The OpenACS uptime site can monitor your site and send you an email whenever your site fails to respond. If you test the url http://yourserver.test/SYSTEM/dbtest.tcl, you should get back the string success.

Diagnosing Performance Problems

Did performance problems happen overnight, or did they sneak up on you? Any clue what caused the performance problems (e.g. loading 20K users into .LRN)?

Is the file system out of space? Is the machine swapping to disk constantly?

Isolating and solving database problems.

Without daily internal maintenance, most databases slowly degrade in performance. For PostgreSQL, see . For Oracle, use exec dbms_stats.gather_schema_stats('SCHEMA_NAME') (Andrew Piskorski's Oracle notes).

You can track the exact amount of time each database query on a page takes:

1) Go to Main Site : Site-Wide Administration : Install Software
2) Click on "Install New Application" in "Install from OpenACS Repository"
3) Choose "ACS Developer Support"
4) After install is complete, restart the server.
5) Browse to Developer Support, which is automatically mounted at /ds.
6) Turn on Database statistics
7) Browse directly to a slow page and click "Request Information" at the bottom of the page.
8) This should return a list of database queries on the page, including the exact query (so it can be cut-and-pasted into psql or oracle) and the time each query took.
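The dbtest check described under External uptime validation can also be scripted locally, for example from cron. This is a minimal sketch, not the mechanism the OpenACS uptime site uses: the function name is_dbtest_ok is hypothetical, and the commented-out wiring assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical check: read a response body on stdin and succeed only if it
# is the string "success" (all whitespace is stripped before comparing).

is_dbtest_ok() {
    body=$(tr -d '[:space:]')
    [ "$body" = "success" ]
}

# Example wiring, assuming curl is available. It is commented out so that
# loading this file makes no network requests:
# curl -fsS http://yourserver.test/SYSTEM/dbtest.tcl | is_dbtest_ok || echo "dbtest failed"
```

Keeping the comparison separate from the HTTP fetch makes the check easy to exercise with canned input before pointing it at a real server.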

Query Analysis example

Identify a runaway Oracle query: first, use ps aux or top to get the UNIX process ID of a runaway Oracle process.

Log in to SQL*Plus as the admin:

    [$OPENACS_SERVICE_NAME ~]$ svrmgrl

    Oracle Server Manager Release 3.1.7.0.0 - Production

    Copyright (c) 1997, 1999, Oracle Corporation. All Rights Reserved.

    Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
    With the Partitioning option
    JServer Release 8.1.7.3.0 - Production

    SVRMGR> connect internal
    Password:

See all of the running queries, and match the UNIX PID:

    select p.spid -- The UNIX PID
          ,s.sid
          ,s.serial#
          ,p.username as os_user
          ,s.username
          ,s.status
          ,p.terminal
          ,p.program
      from v$session s
          ,v$process p
     where p.addr = s.paddr
     order by s.username
          ,p.spid
          ,s.sid
          ,s.serial# ;

See the SQL behind the oracle processes:

    select s.username
          ,s.sid
          ,s.serial#
          ,sql.sql_text
      from v$session s, v$sqltext sql
     where sql.address = s.sql_address
       and sql.hash_value = s.sql_hash_value
       --and upper(s.username) like 'USERNAME%'
     order by s.username
          ,s.sid
          ,s.serial#
          ,sql.piece ;

To kill a troubled process:

    alter system kill session 'SID,SERIAL#'; --substitute values for SID and SERIAL#

(See Andrew Piskorski's Oracle notes)

Identify a runaway Postgres query. First, logging must be enabled in the database. This imposes a performance penalty and should not be done in normal operation.

Edit the file postgresql.conf - its location depends on the PostgreSQL installation - and change

    #stats_command_string = false

to

    stats_command_string = true

Next, connect to postgres (psql service0) and select * from pg_stat_activity;.
Typical output should look like:

      datid   |   datname   | procpid | usesysid | usename | current_query
    ----------+-------------+---------+----------+---------+-----------------
     64344418 | openacs.org |   14122 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14123 |      101 | nsadmin | delete from acs_mail_lite_queue where message_id = '2478608';
     64344418 | openacs.org |   14124 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14137 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14139 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14309 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14311 |      101 | nsadmin | <IDLE>
     64344418 | openacs.org |   14549 |      101 | nsadmin | <IDLE>
    (8 rows)
    openacs.org=>

Creating an appropriate tuning and monitoring environment

The first task is to create an appropriate environment for finding out what is going on inside Oracle. Oracle provides Statspack, a package to monitor and save the state of the v$ performance views. These reports help find severe problems by exposing summary data about the Oracle wait interface and executed queries. You'll find the installation instructions in $ORACLE_HOME/rdbms/admin/spdoc.txt. Follow the instructions carefully and take periodic snapshots; this way you'll be able to look at historical performance data.

Also turn on timed_statistics in your init.ora file, so that Statspack reports (and all other Oracle reports) are timed, which makes them a lot more meaningful. The overhead of timing data is about 1% per Oracle Support information.

To be able to get an overview of how Oracle executes a particular query, install "autotrace". I usually follow the instructions here http://asktom.oracle.com/~tkyte/article1/autotrace.html.

Make sure that the Oracle CBO works with adequate statistics

The Oracle Cost Based optimizer is a piece of software that tries to find the "optimal" execution plan for a given SQL statement.
To do so, it estimates the costs of running a SQL query in a particular way (by default, up to 80,000 permutations are tested in Oracle 8i). To get an adequate cost estimate, the CBO needs to have adequate statistics. For that, Oracle supplies the dbms_stats package.
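As an illustration of how statistics gathering might be scheduled, the sketch below generates a one-line SQL*Plus command around dbms_stats.gather_schema_stats, using its ownname and cascade parameters. The function name gather_stats_sql is hypothetical, and the choice of cascade => TRUE (which also gathers index statistics) is an assumption made for this example rather than a recommendation from the text above.

```shell
#!/bin/sh
# Hypothetical helper: print a SQL*Plus command that gathers optimizer
# statistics for one schema. The schema name is restricted to letters,
# digits, and underscores so it can be embedded safely in the statement.

gather_stats_sql() {
    schema=$1
    case "$schema" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid schema name" >&2
            return 1
            ;;
    esac
    echo "exec dbms_stats.gather_schema_stats(ownname => '$schema', cascade => TRUE);"
}
```

The output can be saved to a file and run from SQL*Plus, or piped into a SQL*Plus session from a nightly cron job, depending on how the site manages its Oracle credentials.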