<html>
<head>
<title>Monitoring</title>
</head>
<body bgcolor=#ffffff>
<h2>Monitoring</h2>

your <a href="index.html">ArsDigita Community System</a> installation
by <a href="http://teadams.com">Tracy Adams</a> and <a href="mailto:jsalz@mit.edu">Jon Salz</a>

<hr>

<ul>
<li>User directory:  none
<li>Admin directory:   <a href="/admin/monitoring/">/admin/monitoring/</a> 
<li>Procedures:  /tcl/watchdog-defs.tcl, /tcl/cassandracle-defs.tcl
<li>Binaries: /bin/aolserver-errors.pl
</ul>


<h3>The Big Picture</h3>

The ArsDigita Community System has an integrated set of monitoring 
tools. 

<h3>Parameters</h3>

Monitoring parameters as centralized in the monitoring
section of the .ini file.  Add a new <code>PersontoNotify</code> for
each person who should receive monitoring alerts.

<blockquote>
<pre>
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
;PersontoNotify=nerd2@yourservicename.com
; location of the watchdog perl script
WatchDogParser=/web/yourservicename/bin/aolserver-errors.pl
; watchdog frequency in minutes
WatchDogFrequency=15
</pre>
</blockquote>

<h3>Current page requests - monitor.tcl</h3>

The "current page request" section (linked from /admin/monitoring/)
will produce a report like the following.
<p>

<center>

<table width=90>
<tr><td colspan=6>There are a total of 8 requests being served right now (to 8 distinct IP addresses). Note that this number seems to include only
the larger requests. Smaller requests, e.g., for .html files and in-line images, seem to come and go too fast for this program to
catch. </td></tr>
<tr><th>conn #<th>client IP<th>state<th>method<th>url<th>n seconds<th>bytes</tr>
<tr><td>17899<td>212.252.145.38<td>running<td>GET<td>/photo/pcd3255/chappy-store-31.4.jpg<td>59<td>158544
<tr><td>18185<td>38.27.213.213<td>running<td>GET<td>/wtr/thebook/html.html<td>21<td>0
<tr><td>18247<td>171.210.228.91<td>running<td>GET<td>/photo/nikon/nikon-reviews.html<td>15<td>0
<tr><td>18367<td>209.86.54.190<td>running<td>GET<td>/bboard/image.tcl<td>8<td>34228
<tr><td>18454<td>199.174.160.135<td>running<td>GET<td>/photo/pcd1669/treptower-big-view-51.4.jpg<td>1<td>34376
<tr><td>18464<td>207.100.29.220<td>running<td>?<td>?<td>1<td>0
<tr><td>18468<td>216.214.210.53<td>running<td>GET<td>/chat/js-refresh.tcl<td>0<td>0
<tr><td>18481<td>216.34.106.252<td>running<td>GET<td>/monitor.tcl<td>0<td>0
</table>
</center>
<p>
This report will inform you which users are waiting on pages from your server.
In the report above, users asking for large images or pages are waiting.  This
is normal because some users have very slow connections.
<p>
If you see the same .tcl or .adp file often, especially with the longest wait times, it is likely that the script is extremely slow or is hogging database handles. You should
<ul>
<li>Examine and fix the page
<li>User <a href=proc-one.tcl?proc_name=ad_return_if_another_copy_is_running>ad_return_if_another_copy_is_running</a> to limit the number of times the page can concurrently run (limit to a few less than your total db pool).  
This will prevent multiple executions of that page from destroying your whole web service.
</ul>
<p>
If you see a large number of requests from the same IP address, it is 
likely that a poorly-designed spider is attacking your web service. To stop it,
ban that IP address from your system.

<h3>Cassandracle (Oracle)</h3>

Cassandracle is a Web-based monitor for an Oracle installation. 
The goal is that, at a glance, a novice Oracle DBA ought to be
able to identify problems and find pointers to relevant reference materials. 
<p>
To use Cassandracle in your installation, you will need to 
give the web service's database
user read access to some core Oracle tables.  

<ol>
<li>Log into Oracle via sqlplus
<li>Execute:
<blockquote>
SQL>  connect internal
</blockquote>
<li> Run the commands in /sql/cassandracle.sql
<li> Execute
<blockquote>
SQL> grant ad_cassandracle to username;
</blockquote>
</ul>

<h3>Configuration</h3>

This is a simple section with information about the current machine
and connection. The information provided is pretty sparse and should
expand in the future.

<h3>WatchDog (Error log)</h3>

Every <code>WatchDogFrequency</code> seconds, the service's error logs will
be scanned. If errors are found, they will be emailed to those configured
as a <code>PersontoNotify</code>.  The administration pages have a tool
to search the error log for errors.

<h3>Registered Filters and Schedule Procedures</h3>

The <tt>ad_register_filter</tt> and <tt>ad_schedule_proc</tt> procs are
wrappers around the corresponding <tt>ns_</tt> calls, which allow us to
more carefully track what's happening on the server and when.
/admin/monitoring/filters.tcl shows which filters are called for which URLs and
methods, and /admin/monitoring/scheduled-procs.tcl shows which procedures are
scheduled to be called in the future.

<hr>

<a href="mailto:teadams@arsdigita.com"><address>teadams@arsdigita.com</address></a>,
<a href="mailto:jsalz@mit.edu"><address>jsalz@mit.edu</address></a>