Mike Lubanski
mlubanski@hotmail.com
Introduction
With the massive proliferation of web servers in the enterprise,
it is important to maintain a strategy for monitoring those web
sites for downtime and other potential problems. The Internet is
revolutionizing the way business is conducted, and web sites and web
servers play a key role in that revolution. Companies that use their
web servers to conduct e-business (such as Amazon.com and
E*Trade.com) rely on those servers as much as they rely on any other
machine that provides the information needed to conduct business.
This reliance on web servers demands a monitoring and management
strategy for a company’s key web servers. Microsoft Internet
Information Server 4.0 is one web server technology that is popular
in business. This document gives an overview of, and some specific
details about, how to monitor and manage a Microsoft IIS 4.0 web
server.
Prerequisites and Assumptions
Several prerequisites and assumptions should be met before
proceeding with the monitoring.
- The SNMP service must be installed and configured on each
server. The SNMP trap destination must be configured to point to a
network management console where traps can be viewed, sorted, or
acted upon by the support organization.
- To track any disk performance counters, make sure disk
monitoring is enabled by executing “diskperf -y” on the target
machine and then restarting it.
- The IIS-specific counters should be installed on all IIS
servers. This is normally done during product installation.
- This document covers “what” to monitor, but does not describe
“how” or “by what means.” A monitoring or management tool (of your
choice) that monitors and alerts on IIS events will therefore be
necessary if a more sophisticated solution is desired. Such a tool
allows events to be correlated and coordinated, so that the support
organization can respond to them proactively. Simply monitoring
events, without automation, can be achieved with built-in operating
system tools (such as Windows NT Performance Monitor counters).
- Some entries state “Need Baseline.” This indicates that a
baseline of normal activity is necessary to help determine what is
above or below normal, or what would be considered an error. For
example, CPU utilization should be baselined to determine the
normal utilization of the CPU. From this number, you can
determine what is abnormal.
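One way to turn a “Need Baseline” entry into a concrete threshold is shown below. This is a minimal sketch and is not a method prescribed by this document. It assumes that samples of a counter, such as CPU utilization collected during normal operation, are available as a list of numbers. It also assumes that “abnormal” means more than k sample standard deviations from the mean. The function names and the default k = 3 are hypothetical choices.

```python
from statistics import mean, stdev


def compute_baseline(samples, k=3.0):
    """Return (low, high) bounds of 'normal' activity.

    Assumption (hypothetical rule): normal behaviour lies within k sample
    standard deviations of the mean of the baseline samples.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    avg = mean(samples)
    spread = stdev(samples)
    return avg - k * spread, avg + k * spread


def is_abnormal(value, bounds):
    """True if value falls outside the baseline bounds."""
    low, high = bounds
    return value < low or value > high


# Hypothetical CPU utilization samples (%) taken during normal operation.
cpu_samples = [22, 25, 30, 27, 24, 26, 28, 23]
cpu_bounds = compute_baseline(cpu_samples)
print(cpu_bounds, is_abnormal(95, cpu_bounds))
```

A mean and standard deviation rule is only one option. Percentile-based bounds may suit skewed counters better, and the baseline should be rebuilt whenever the server’s normal workload changes.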
Key
The key to the table is as follows:
Problem Description – This column defines the problem or error.
Method of Detection – This column defines how the problem or error
can be detected. Most of the time, the method of detection will be
an entry in the Windows NT Event Log or an SNMP trap generated by
the machine.
Recommended Action – This column defines what to do when the
problem or error occurs. To turn “monitoring” efforts into
“management” efforts, the recommended actions should be automated
so that they run whenever the problem occurs.
Monitoring Interval – This column defines how often the monitoring
sample should be taken. Most monitors need to sample the system
every 30 or 60 minutes; others, such as a service health check, may
need to sample every 5 minutes.
Severity – This column gives the priority of the problem:
1 = High priority, notify immediately
2 = Medium priority, notify within 1 hour
3 = Low priority, notify within 24 hours
Threshold – This column defines the threshold that triggers an
alert. For events, the threshold will always be one (a single
occurrence). Other monitors, such as cache hit ratio, have a
specific value.
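The following sketch shows how the Recommended Action and Severity columns might be automated together. The restart-then-reboot escalation and the notification deadlines come from this document. The functions `restart_service` and `reboot_server` are hypothetical callables that your management tool would have to supply, because this document does not name a specific tool or API.

```python
from datetime import datetime, timedelta

# Notification deadlines taken from the severity definitions in the key.
NOTIFY_WITHIN = {
    1: timedelta(0),         # high priority: notify immediately
    2: timedelta(hours=1),   # medium priority: notify within 1 hour
    3: timedelta(hours=24),  # low priority: notify within 24 hours
}


def notify_deadline(severity, detected_at):
    """Latest time by which support staff should be notified."""
    return detected_at + NOTIFY_WITHIN[severity]


def restart_then_reboot(service, restart_service, reboot_server):
    """Apply the 'restart the service; if that fails, reboot' action.

    restart_service(name) and reboot_server() are hypothetical hooks
    provided by your management tool; restart_service must return True
    when the service is running again.
    Returns a list describing the actions taken.
    """
    actions = [f"restart {service}"]
    if restart_service(service):
        return actions
    reboot_server()
    actions.append("reboot server")
    return actions
```

For example, `restart_then_reboot("W3SVC", my_restart, my_reboot)` would attempt to restart the World Wide Web Publishing Service and would reboot only if that restart failed. Here `my_restart` and `my_reboot` stand for whatever your tool provides. Keeping the hooks injectable lets the escalation logic be tested without touching a live server.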
Internet Information Server – Operations
This section details what services and processes need to be
monitored in order to keep the IIS web server operational and
available.
Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold
FTP service unavailable | Monitor the FTP Publishing Service in the Service Control Manager. | Attempt to restart the service. If the restart fails, reboot the server. | 15 min | 1 | 1
IIS Admin Service unavailable | Monitor the IIS Admin Service in the Service Control Manager. | Attempt to restart the service. If the restart fails, reboot the server. | 15 min | 1 | 1
Cannot access web page | Monitor the World Wide Web Publishing Service in the Service Control Manager. | Attempt to restart the service. If the restart fails, reboot the server. | 15 min | 1 | 1
Poor web server performance | Monitor the Bytes Total/sec counter of the Web Service performance object. | A high Bytes Total/sec value can indicate heavy traffic on the web server, which may call for load balancing or additional web servers. | 15 min | 2 | Need Baseline
Logged errors | Monitor the %systemroot%\system32\LogFiles\W3Svc1 directory for log files. | Depends on the log entry | 30 min | 2 | 1
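The “Logged errors” row could be checked with a scan like the one below. This is a sketch under one assumption: the site logs in the W3C Extended Log File Format, where a `#Fields:` directive names the space-separated columns and `sc-status` holds the HTTP status code. Treating every status of 400 or higher as an error is also an assumption. The sample lines are a hypothetical excerpt, not real IIS output.

```python
def find_error_entries(lines, min_status=400):
    """Return log entries whose sc-status is >= min_status.

    Assumes W3C Extended Log File Format: a '#Fields:' directive names the
    space-separated columns, and other lines starting with '#' are
    directives to skip.
    """
    fields = []
    errors = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if "sc-status" not in fields:
            continue  # cannot judge entries without a status column
        entry = dict(zip(fields, line.split()))
        status = entry.get("sc-status", "")
        if status.isdigit() and int(status) >= min_status:
            errors.append(entry)
    return errors


# Hypothetical log excerpt for illustration only.
sample_log = """#Version: 1.0
#Fields: time c-ip cs-method cs-uri-stem sc-status
10:00:01 10.0.0.5 GET /default.htm 200
10:00:02 10.0.0.6 GET /missing.htm 404
10:00:03 10.0.0.7 POST /app/order.asp 500
""".splitlines()

for e in find_error_entries(sample_log):
    print(e["sc-status"], e["cs-uri-stem"])
```

To scan a real log, pass in the lines of a file opened from the log directory named in the table. Because the parser rereads `#Fields:` each time it appears, it still works when the logged fields change partway through a file.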
Internet Information Server – Performance
Microsoft is releasing a new tool called “Homer” that can be used to
stress-test IIS servers. As stated on the Homer web page, “The
Microsoft Homer web stress tool is designed to realistically
simulate multiple browsers requesting pages from a web site. You can
use this tool to gather performance and stability information about
your web application. This tool simulates a large number of requests
with a relatively small number of client machines. The goal is to
create an environment that is as close to production as possible so
that you can find and eliminate problems in the web application
prior to deployment.” Homer can be accessed from http://homer.rte.microsoft.com/
When using Homer, you can watch the following Performance Monitor
counters to get an idea of where your performance bottlenecks may
arise:
- Disk Faults
- System: % Total Processor Time
- Bandwidth usage
- Active Server Pages: Requests/Sec
- Web Service: Get Requests/sec
- Web Service: Post Requests/sec
General Tips for Faster Performance (Client Side)
- Send fewer and smaller images to clients
- Use SSL only when necessary
- Cache static data and slowly changing content, and use expiration
dates on data when possible
- Use cache control on proxy servers
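Expiration dates and proxy cache control are carried to clients and proxies as HTTP response headers. In practice the web server is configured to emit these headers. The sketch below only illustrates what they look like for content that may be cached for a given number of seconds. The function name and the choice of `public` caching are assumptions.

```python
import time
from email.utils import formatdate


def expiration_headers(max_age_seconds, now=None):
    """Build HTTP caching headers for content cacheable for max_age_seconds.

    'now' is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return {
        # HTTP/1.1 caches, including proxy servers, honour max-age.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP/1.0 clients and proxies rely on an absolute Expires date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


print(expiration_headers(3600))
```

Sending both headers covers older HTTP/1.0 proxies, which understand only `Expires`, as well as HTTP/1.1 caches, which prefer `Cache-Control`.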
General Tips for Faster Performance (Server Side)
- Use static content whenever possible
- Use adequate hardware
- Scale to multiple processors if possible
- Partition the workload among multiple machines
- Have a good connection to any back-end databases
- Use round-robin DNS or router-based load balancing (such as
Cisco’s LocalDirector)
- Decide between ISAPI, ASP, and CGI. ISAPI gives the best
performance, but is the hardest to maintain and troubleshoot.
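Round-robin DNS spreads load by rotating the order of a host name’s address records on each query, because most clients connect to the first address returned. The toy model below illustrates only that rotation. It is not a DNS server, and the class name and addresses are hypothetical.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of round-robin DNS: each lookup rotates the address list."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses = deque(addresses)

    def resolve(self):
        """Return the addresses in their current order, then rotate by one."""
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return answer


resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(4):
    print(resolver.resolve()[0])  # the address a typical client would use
```

A design caveat applies here. Unlike a load-balancing device such as LocalDirector, plain round-robin DNS knows nothing about server health, so it keeps handing out the address of a failed server. This makes the service monitoring described earlier all the more important.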
Windows NT
This section details what needs to be monitored in the underlying
operating system to keep the web server running and available.
Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold
Runaway process | Monitor the CPU utilization of the server. | Check all server services and restart any that are down, in the proper order. Identify which service is consuming CPU by examining the top 20 CPU-consuming processes. | 10 min | 1 | >80% sustained over 10 min (Need Baseline)
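The “>80% sustained over 10 min” threshold can be evaluated from periodic CPU samples, as sketched below. The sketch assumes that samples arrive in chronological order as (timestamp in seconds, utilization %) pairs. It also assumes that a run of consecutive samples above the threshold means the CPU stayed high between those samples. The class name is hypothetical, and the 80% and 600-second defaults come from the table.

```python
class SustainedThreshold:
    """Signal when a value stays above a threshold for a minimum duration.

    Models the '>80% sustained over 10 min' rule. Feed samples in
    chronological order; any sample at or below the threshold resets the run.
    """

    def __init__(self, threshold=80.0, duration_seconds=600):
        self.threshold = threshold
        self.duration = duration_seconds
        self._run_start = None  # timestamp when the current high run began

    def add(self, timestamp, value):
        """Record a sample; return True once the run has lasted long enough."""
        if value > self.threshold:
            if self._run_start is None:
                self._run_start = timestamp
            return timestamp - self._run_start >= self.duration
        self._run_start = None
        return False


monitor = SustainedThreshold()
# Hypothetical one-minute samples at 90% CPU.
for t in range(0, 601, 60):
    print(t, monitor.add(t, 90.0))
```

Resetting on any low sample avoids alerts on short spikes. The trade-off is that a single brief dip restarts the 10-minute clock, so the sampling interval should be chosen with that in mind.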
Hardware & Network Management
This section details what to monitor and manage in the server
hardware and the network interface card (NIC).
Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold
Hardware errors | Monitor the internal temperature of the server. | Check the hardware for errors. | 10 min | 1 | Need Baseline
Hardware errors | Monitor for critical IDE or SCSI disk failures. | Check the hardware for errors. | 10 min | 1 | Need Baseline
Hardware errors | Monitor for NIC failures. | Check the hardware for errors. | 10 min | 1 | Need Baseline
Hardware errors | Monitor for fan failures. | Check the hardware for errors. | 10 min | 1 | Need Baseline
Hardware errors | Monitor for correctable memory errors. | Check the hardware for errors. | 10 min | 1 | Need Baseline
Network utilization high | Monitor the total bytes/second processed by the NIC. | Check and/or tune the performance of the NIC. | 10 min | 1 | Need Baseline
ICMP errors | Monitor the receipt time for ICMP packets. | Check and/or tune the performance of the NIC. | 10 min | 3 | Need Baseline
ICMP errors | Monitor the number of unreachable destinations. | Check and/or tune the performance of the NIC. | 15 min | 3 | Need Baseline
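A bytes-per-second figure is easier to compare against a baseline when it is expressed as a percentage of link capacity. The arithmetic is: utilization % = bytes/sec × 8 ÷ link speed in bits/sec × 100. The sketch below assumes the nominal link speed is known, for example 100,000,000 bits/sec for a 100 Mbit/s NIC, and the function name is hypothetical. A total bytes figure counts traffic in both directions, so on a full-duplex link this calculation can overstate how close either direction is to saturation.

```python
def nic_utilization_percent(bytes_per_sec, link_bits_per_sec):
    """Percentage of the link's nominal capacity in use.

    bytes_per_sec: a total bytes/second sample for the NIC.
    link_bits_per_sec: nominal link speed, e.g. 100_000_000 for 100 Mbit/s.
    """
    if link_bits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    return bytes_per_sec * 8 / link_bits_per_sec * 100


# 6.25 million bytes/sec on a 100 Mbit/s link:
print(nic_utilization_percent(6_250_000, 100_000_000))
```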