<html>
<head>
<title>Monitoring</title>
</head>

<body bgcolor=#ffffff>
<h2>Monitoring</h2>

your <a href="index.html">ArsDigita Community System</a> installation by <a href="http://teadams.com">Tracy Adams</a> and <a href="mailto:jsalz@mit.edu">Jon Salz</a>

<hr>

<ul>
<li>User directory: none
<li>Admin directory: <a href="/admin/monitoring/">/admin/monitoring/</a>
<li>Procedures: /tcl/watchdog-defs.tcl, /tcl/cassandracle-defs.tcl
<li>Binaries: /bin/aolserver-errors.pl
</ul>

<h3>The Big Picture</h3>

The ArsDigita Community System has an integrated set of monitoring tools.

<h3>Parameters</h3>

Monitoring parameters are centralized in the monitoring section of the .ini file. Add a new <code>PersontoNotify</code> line for each person who should receive monitoring alerts.

<blockquote>
<pre>
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
;PersontoNotify=nerd2@yourservicename.com
; location of the watchdog perl script
WatchDogParser=/web/yourservicename/bin/aolserver-errors.pl
; watchdog frequency in minutes
WatchDogFrequency=15
</pre>
</blockquote>

<h3>Current page requests - monitor.tcl</h3>

The "current page request" section (linked from /admin/monitoring/) produces a report like the following.

<p>

<center>
<table width=90>
<tr><td colspan=6>There are a total of 8 requests being served right now (to 8 distinct IP addresses). Note that this number seems to include only the larger requests. Smaller requests, e.g., for .html files and in-line images, seem to come and go too fast for this program to catch.
</td></tr>
<tr><th>conn #<th>client IP<th>state<th>method<th>url<th>n seconds<th>bytes</tr>
<tr><td>17899<td>212.252.145.38<td>running<td>GET<td>/photo/pcd3255/chappy-store-31.4.jpg<td>59<td>158544
<tr><td>18185<td>38.27.213.213<td>running<td>GET<td>/wtr/thebook/html.html<td>21<td>0
<tr><td>18247<td>171.210.228.91<td>running<td>GET<td>/photo/nikon/nikon-reviews.html<td>15<td>0
<tr><td>18367<td>209.86.54.190<td>running<td>GET<td>/bboard/image.tcl<td>8<td>34228
<tr><td>18454<td>199.174.160.135<td>running<td>GET<td>/photo/pcd1669/treptower-big-view-51.4.jpg<td>1<td>34376
<tr><td>18464<td>207.100.29.220<td>running<td>?<td>?<td>1<td>0
<tr><td>18468<td>216.214.210.53<td>running<td>GET<td>/chat/js-refresh.tcl<td>0<td>0
<tr><td>18481<td>216.34.106.252<td>running<td>GET<td>/monitor.tcl<td>0<td>0
</table>
</center>

<p>

This report shows which users are currently waiting on pages from your server. In the report above, the users who are waiting are the ones asking for large images or pages. This is normal because some users have very slow connections.

<p>

If you often see the same .tcl or .adp file, especially with the longest wait times, it is likely that the script is extremely slow or is hogging database handles. You should:

<ul>
<li>Examine and fix the page
<li>Use <a href=proc-one.tcl?proc_name=ad_return_if_another_copy_is_running>ad_return_if_another_copy_is_running</a> to limit the number of copies of the page that can run concurrently (set the limit a few below the number of handles in your database pool). This will prevent multiple simultaneous executions of that page from taking down your whole web service.
</ul>

<p>

If you see a large number of requests from the same IP address, it is likely that a poorly designed spider is attacking your web service. To stop it, ban that IP address from your system.

<h3>Cassandracle (Oracle)</h3>

Cassandracle is a Web-based monitor for an Oracle installation. The goal is that, at a glance, a novice Oracle DBA ought to be able to identify problems and find pointers to relevant reference materials.
<p>

To use Cassandracle in your installation, you will need to give the web service's database user read access to some core Oracle tables.

<ol>
<li>Log in to Oracle via sqlplus
<li>Execute:
<blockquote>
SQL> connect internal
</blockquote>
<li>Run the commands in /sql/cassandracle.sql
<li>Execute the following, where <code>username</code> is the web service's database user:
<blockquote>
SQL> grant ad_cassandracle to username;
</blockquote>
</ol>

<h3>Configuration</h3>

This is a simple section with information about the current machine and connection. The information provided is sparse and should be expanded in the future.

<h3>WatchDog (Error log)</h3>

Every <code>WatchDogFrequency</code> minutes, the service's error logs will be scanned. If errors are found, they will be emailed to everyone configured as a <code>PersontoNotify</code>. The administration pages also have a tool to search the error log for errors.

<h3>Registered Filters and Scheduled Procedures</h3>

The <tt>ad_register_filter</tt> and <tt>ad_schedule_proc</tt> procs are wrappers around the corresponding <tt>ns_</tt> calls, which allow us to track more carefully what is happening on the server and when. /admin/monitoring/filters.tcl shows which filters are called for which URLs and methods, and /admin/monitoring/scheduled-procs.tcl shows which procedures are scheduled to be called in the future.

<hr>

<a href="mailto:teadams@arsdigita.com"><address>teadams@arsdigita.com</address></a>, <a href="mailto:jsalz@mit.edu"><address>jsalz@mit.edu</address></a>