Microsoft Exchange Management Requirements

By ServerWatch Staff
Posted Nov 15, 1998


Mike Lubanski
mlubanski@hotmail.com

Introduction

Messaging systems are so critical to organizations of all sizes that it is important to develop a plan for ongoing systems management, monitoring, and troubleshooting. The type of management plan needed depends on many factors, including size of the company, number of users, location of remote offices, type of messaging software used, network infrastructure and so on. The level of monitoring needed is determined by both the administration and help desk structure in place and the business criticality of the messaging software. For example, if e-mail is mission critical, then heavy monitoring should be in place. If e-mail is not critical, or the e-mail infrastructure is small, less monitoring will suffice.

The architecture of the messaging system at the company described here may not mirror your own, but the server roles detailed below will probably appear somewhere in your architecture.

This document provides a detailed explanation of the systems management requirements for Microsoft Exchange version 5.0 deployed at a large company with a large number of users, postoffices, etc. Not all of these monitoring requirements will apply to your environment, but they serve as a starting point for deciding what to monitor. Smaller companies, or those with only a few sites, may drop some of the monitoring requirements at their discretion.

The company developed these management requirements because of both the business criticality of e-mail services and problems that were discovered only when it was too late to prevent them. With monitors in place, problems and, more importantly, outages can be prevented, and operations can be notified early enough to react and fix the problem. The resulting document, excerpted below, covers systems management at the following levels of e-mail operations:

  • Tier 0-Enterprise send-mail relay (used for firm-wide external mail distribution)
  • Tier I-IMS servers (servers that provide Internet Mail Services)
  • Tier II-Bridgehead servers (servers that provide MTA services for site to site communication)
  • Tier III-Postoffice (servers that contain the mailboxes of users).

Each level contains a list of potential errors that are monitored, what those errors mean, and what corrective actions to take. These lists of errors were derived in joint meetings between the company's Exchange administrators and the New York office of Microsoft Consulting Services. Most of the errors can be trapped on first occurrence. Some errors, however, require a baseline of activity to determine the appropriate error level to monitor. This baseline can be established by recording Performance Monitor counters over a prolonged period of time. At this particular company, several Performance Monitor counters recorded data over a 2-week period for the critical services and processes on the Exchange servers.
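As a rough illustration of turning a baseline into an alert level, the sketch below takes counter readings gathered during a baseline period and derives a threshold as the mean plus a multiple of the standard deviation. The mean-plus-k-sigma rule, the function names, and the sample values are assumptions made for illustration; the article does not say how the company converted its two weeks of data into thresholds.

```python
from statistics import mean, pstdev


def baseline_threshold(samples, k=3.0):
    """Return an alert threshold of mean + k * population std deviation.

    `samples` is a sequence of counter readings (for example, MTA work
    queue length) gathered during the baseline period. The choice of k
    is an assumption; tune it to how noisy the counter is.
    """
    if not samples:
        raise ValueError("baseline needs at least one sample")
    return mean(samples) + k * pstdev(samples)


def exceeds_baseline(value, samples, k=3.0):
    """True when a new reading is above the baseline-derived threshold."""
    return value > baseline_threshold(samples, k)


# Hypothetical queue-length readings from a baseline period.
queue_samples = [4, 6, 5, 7, 5, 6, 4, 5]
```

A reading such as `exceeds_baseline(12, queue_samples)` would then flag a queue well above what the baseline period showed, while ordinary fluctuation stays below the threshold.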

Note: Systems administrators and helpdesk personnel wanting to use this information in their environments should note that these requirements were developed for a specific company and, therefore, may not apply directly to their situation.

Tier 0: Send Mail Server with SMTP Gateways

A UNIX server is used as the primary gateway to transmit mail out of the system to the Internet through SMTP. The gateway runs on UNIX because the e-mail system was deployed before MS Exchange was commercially available. This machine needs to be monitored to ensure that e-mail can leave and enter the company. A failure in this gateway will not affect intra-company mail services. Aside from monitoring port #25 (see below), it is also necessary to monitor the operating system on which this service resides to ensure machine availability.

The table below provides information on the problem, detection method, action recommended, how often to monitor, and severity and threshold levels. Severity definition: 1 = High priority, notify immediately; 2 = Medium priority, notify within 1 hour; 3 = Low priority, notify within 24 hours.

Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold
UNIX send-mail system unable to process mail | Telnet to port #25 of the gateway to obtain an "ESMTP spoken here" response | Contact UNIX System Tech Support for assistance | 15 min. | 1 | 1
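The Telnet check in the table above can be automated. The following is a minimal sketch, not the company's actual monitor: it opens a TCP connection to port 25 and treats a greeting that starts with the standard SMTP reply code 220 as healthy. The function names and the 10-second timeout are assumptions; the "ESMTP spoken here" text is the greeting the article expects from its own gateway, so the sketch checks the reply code rather than that exact wording.

```python
import socket


def read_smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first data the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = conn.recv(1024)
    return data.decode("ascii", errors="replace").strip()


def banner_is_healthy(banner):
    """An SMTP server greets with a line starting with reply code 220."""
    return banner.startswith("220")


def check_gateway(host, port=25, timeout=10.0):
    """Return (ok, detail); any socket error counts as a failed check."""
    try:
        banner = read_smtp_banner(host, port, timeout)
    except OSError as exc:
        return False, f"connection failed: {exc}"
    return banner_is_healthy(banner), banner
```

Run at the 15-minute interval from the table, a failed result from `check_gateway` would raise a severity 1 alert on first occurrence.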

Tier I: IMS Servers

Tier I servers contain the IMS (Internet Mail Service), which carries mail from the internal system to the external gateways for access to the Internet. IMS servers need to be able to send mail outside of the system through the UNIX send-mail host, which must be operational for this to occur. Although the send-mail host falls to the UNIX OS support team, port #25 on these boxes should be monitored nonetheless.

Problems on the IMS affect only messages that travel outside the environment. IMS problems will not affect site to site or post office to post office operations. However, IMS failures will cause messages destined for external addresses to queue at the bridgehead server trying to send the message. The messages will remain in queue until the IMS is operational or they are removed from the queue with a diagnostic tool.

Consider scheduled downtime when monitoring Exchange services and processes. Exchange administrators should be aware of servers scheduled for maintenance to avoid false alerts from the monitors. Also, temporarily disable any "auto-fix" type of monitors during scheduled maintenance. Suggestion: disable all monitors during the part of the day in which maintenance is scheduled to occur. Before doing so, make sure the efforts of the Exchange administrators and those performing maintenance are coordinated.
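One way to act on this advice is to have the monitor consult a list of maintenance windows before alerting. The sketch below assumes daily windows given as start and end times; the function names and the example window are hypothetical and are not taken from the company's setup.

```python
from datetime import datetime, time


def in_maintenance_window(now, start, end):
    """True when `now` falls inside the daily window [start, end).

    Handles windows that cross midnight, such as 23:00 to 02:00.
    """
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end


def should_alert(now, windows):
    """Suppress alerts while any maintenance window is active."""
    return not any(in_maintenance_window(now, s, e) for s, e in windows)


# Hypothetical nightly maintenance window, 23:00 to 02:00.
WINDOWS = [(time(23, 0), time(2, 0))]
```

The same test can gate "auto-fix" actions, so that a monitor does not restart a service that was stopped on purpose.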

All EventLog ID numbers assume use of Microsoft Exchange version 5.0. EventLog IDs for Exchange 5.5 may differ, but the problem description and resolution will remain the same.

Severity definition: 1 = High priority, notify immediately; 2 = Medium priority, notify within 1 hour; 3 = Low priority, notify within 24 hours.

Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold

Connectivity
Unable to send mail upstream | Telnet to port #25 of the IMS to obtain an "ESMTP spoken here" response | Troubleshoot TCP/IP status on the machine. Ensure port #25 is operational and not used by another application. Check the c:\winnt\system32\drivers\etc\services file to determine which application may be using port #25. | 15 min. | 1 | 1

Database problems
Database too fragmented | EventLog ID 65 detected | Use "edbutil" to defrag the database (should be done by Exchange admins only). | 15 min. | 2 | 1 time every 3 months
Database in inconsistent state (this message may also appear in the Directory or Information Store database in the case of a power failure; it usually means that the database is in an inconsistent state and cannot start) | EventLog ID error -550 has occurred | Confirm that the database state is inconsistent, and then try a defragmentation repair. Stop all services and back up all files before you manually run the Edbutil.exe program. Follow the numbered procedure after this table. | 15 min. | 2 | 1
Database reaching capacity | EventLog ID 1112 detected or IS size reached 80% of logical disk capacity | Normally logged after the database has shut down on reaching capacity. Run edbutil /d on the server to free up space, then restart the Information Store. | 20 min. | 2 | 1
Database cache hit rate too low | Monitor the database buffer cache hit ratio for the IS and DS databases | DS and IS buffers can be increased if there is sufficient RAM. If the hit ratios frequently fall below 95%, the buffers are too small. To correct the problem, manually run perfwiz -v. | 30 min. | 3 | Baseline

MTA problems
MTA messages per second too low or too high | Monitor the number of messages being processed by the MTA | Check the status of the MTA and the CPU and memory consumption of the processes. | 15 min. | 1 | Baseline
MTA process is down | Monitor the number of threads in use by the MTA, and EventLog ID 2110 detected | Restart the MTA service. If the service fails to restart, restart ALL Exchange services in order. | 10 min. | 1 | 1
MTA Work Queue length too high | Monitor MTA queue length on the server | Check that the MTA service is up, and check the MTA service on upstream connections (for example, if the MTA queue length of a bridgehead server is too high, check the MTA on the IMS). | 15 min. | 2 / 3 | Baseline

Directory problems
Directory updates failed | EventLog ID 1171 detected (exception event) | A Directory Service problem followed by a 1214 error in the Event log indicates a server failing on a deletion or addition of a directory object. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory updates failed | EventLog ID 1214 detected (KCC event) | The knowledge consistency checker failed to complete successfully. Indicates a corruption in the Directory schema that may affect more than one server in a site or Organizational Unit. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory Services pending replications too high | Monitor the number of pending replications in the DS | A large lag in directory updates may indicate a problem with network connectivity to other bridgehead servers. Confirm that this server can still ping the bridgehead servers it uses for directory replication. This can also occur with servers in the same site. | 30 min. | 2 | Baseline
Directory Services remaining replication updates not decreasing | Monitor the number of objects being processed by the DS | Indicates that the server is failing to exchange directory replication messages, because of either network issues or directory problems. Check the Event logs on the server for details. | 30 min. | 2 | Baseline

IMS problems
IMS NDRs are high | Monitor the NDRs for the IMS. EventLog ID 1001 detected. EventLog ID 1026 or 2007 detected. | Confirm that the Sendmail UNIX relay host is available from the server. EventLog ID 1001 means the IMS service has been stopped or has shut down. EventLog ID 1026 or 2007: contact Microsoft PSS. | 30 min. | 2 | Baseline
IMS inbound queue too high | Monitor the IMS inbound queue | Check the IMS service. | 30 min. | 2 | Baseline
IMS outbound queue too high | Monitor the IMS outbound queue | Check the IMS service. | 30 min. | 2 | Baseline

Overall Exchange problems
Exchange services down | Monitor the service control manager to detect status of services | Check all Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | 1

Windows NT
Exchange process dead | Monitor the CPU and thread utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Runaway Exchange process | Monitor the CPU and memory utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Paging too high | Monitor the paging frequency of the operating system (pagefile usage) | Excessive paging indicates that a memory upgrade is needed. If paging persists, treat as a bug. | 5 min. | 2 | Baseline
Low logical disk free space | Monitor the logical disk space of the Exchange machines | Delete unnecessary files to free up disk space. Install new disk space if necessary. | 15 min. | 1 | Baseline
CPU queue length too high | Monitor the overall queue length of the CPU over a prolonged period | Use Performance Monitor to identify CPU bottlenecks and rectify as necessary. | 30 min. | 2 | Baseline

Hardware / Network
Compaq Insight Manager errors | Monitor the internal temperature of the server | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any critical IDE or SCSI disk failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor NIC failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any fan failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any correctable memory errors | Check hardware for errors | 10 min. | 1 | Baseline
Network utilization high | Monitor total bytes per second processed by the network interface card | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the receipt time for ICMP packets | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the level of unreachable destinations | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline

Procedure for EventLog ID error -550 (database in inconsistent state):

1. To check the state, use Edbutil.exe with the "MH" option on the problem database and dump the output to a text file:

EDBUTIL /MH c:\exchsrvr\dsadata\dir.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\priv.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\pub.edb >c:\edbdump.txt

2. View the Edbdump.txt file. If the database is in an inconsistent state and won't start due to a -550 error in the EventLog, restore the database from the online backup, replay the logs, and restart the consistent database. If and only if the online backup is unavailable, follow step #3.

3. To repair the database, use the following Edbutil syntax:

EDBUTIL /R /DS

Use /ISPRIV or /ISPUB instead of /DS to repair the private or public information stores. The Repair-while-defragmenting option (/d [database] /r) differs from the Recovery option (/r) of EDBUTIL; do not run EDBUTIL /D /R unless specifically directed by Microsoft Premier Support Services (PSS). Refer to Knowledge Base article Q143235 for information on running the Recovery option (edbutil /r).
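For monitors that trap EventLog IDs on first occurrence, the Tier I table above can be expressed as a lookup. The sketch below is only one possible encoding: the IDs, severities, and actions are transcribed from the table, while the `EventRule` type, the dictionary layout, and the function names are assumptions introduced for illustration. The notification deadlines follow the severity definition (immediately, within 1 hour, within 24 hours).

```python
from typing import NamedTuple


class EventRule(NamedTuple):
    problem: str
    severity: int  # 1 = notify immediately, 2 = within 1 hour, 3 = within 24 hours
    action: str


# Values transcribed from the Tier I table; the structure is illustrative.
TIER1_EVENT_RULES = {
    65: EventRule("Database too fragmented", 2, "Defragment with edbutil"),
    -550: EventRule("Database in inconsistent state", 2,
                    "Check state with EDBUTIL /MH, then restore or repair"),
    1112: EventRule("Database reaching capacity", 2,
                    "Run edbutil /d, then restart the Information Store"),
    2110: EventRule("MTA process is down", 1, "Restart the MTA service"),
    1171: EventRule("Directory updates failed", 2, "Contact Microsoft PSS"),
    1214: EventRule("Directory updates failed (KCC)", 2, "Contact Microsoft PSS"),
    1001: EventRule("IMS service stopped or shut down", 2, "Check the IMS service"),
    1026: EventRule("IMS NDRs are high", 2, "Contact Microsoft PSS"),
    2007: EventRule("IMS NDRs are high", 2, "Contact Microsoft PSS"),
}

# Minutes allowed before notification, per the severity definition.
NOTIFY_WITHIN_MINUTES = {1: 0, 2: 60, 3: 24 * 60}


def classify_event(event_id):
    """Return the rule for an EventLog ID, or None if it is not monitored."""
    return TIER1_EVENT_RULES.get(event_id)


def notify_deadline_minutes(event_id):
    """Return the notification deadline in minutes, or None if unmonitored."""
    rule = classify_event(event_id)
    return None if rule is None else NOTIFY_WITHIN_MINUTES[rule.severity]
```

Because the article notes that EventLog IDs may differ in Exchange 5.5, keeping the IDs in data rather than in code makes them easier to revise per version.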

Tier II: Bridgehead Servers

Tier II servers provide the function of a message transfer agent from one distribution module to another. Bridgehead servers send e-mail upstream to the IMS and to other bridgehead servers.

Problems on the bridgehead servers affect all servers within that site and any other sites trying to communicate with the problem site. Messages will still travel within the affected site, but not to any outside sites, including the IMS and the Internet. Bridgehead server failures will cause messages to queue both in the failed server's own queues and in the queues of any sites attempting to communicate with the problem site. The messages will remain in queue until the connection is re-established or they are removed with a diagnostic tool.

Consider scheduled downtime when monitoring Exchange services and processes. Exchange administrators should be aware of servers scheduled for maintenance to avoid false alerts from the monitors. Also, temporarily disable any "auto-fix" type of monitors during scheduled maintenance. Suggestion: disable all monitors during the part of the day in which maintenance is scheduled to occur. Before doing so, make sure the efforts of the Exchange administrators and those performing maintenance are coordinated.

All EventLog ID numbers assume use of Microsoft Exchange version 5.0. EventLog IDs for Exchange 5.5 may differ, but the problem description and resolution will remain the same.

Severity definition: 1 = High priority, notify immediately; 2 = Medium priority, notify within 1 hour; 3 = Low priority, notify within 24 hours.

Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold

Database problems
Database too fragmented | EventLog ID 65 detected | Use "edbutil" to defrag the database (should be done by Exchange admins only). | 15 min. | 2 | 1 time every 3 months
Database state inconsistent (this message may also appear in the Directory or Information Store database in the case of a power failure; it usually means that the database is in an inconsistent state and cannot start) | EventLog ID error -550 has occurred | Confirm the inconsistent database state and then try a defragmentation repair. Stop all services and back up all files before you manually run the Edbutil.exe program. Follow the numbered procedure after this table. | 15 min. | 2 | Recovery
Database reaching capacity | EventLog ID 1112 detected or IS size reached 80% of logical disk capacity | Normally logged after the database has shut down on reaching capacity. Run edbutil /d on the server to free up space, then restart the Information Store. | 20 min. | 2 | 1
Database cache hit rate too low | Monitor the database buffer cache hit ratio for the IS and DS databases | DS and IS buffers can be increased if there is sufficient RAM. If the hit ratios frequently fall below 95%, the buffers are too small. To correct the problem, manually run perfwiz -v. | 30 min. | 1 | Baseline

MTA problems
MTA messages per second too low or too high | Monitor the number of messages processed by the MTA | Check the status of the MTA and the CPU and memory consumption of the processes. | 15 min. | 1 | Baseline
MTA process is down | Monitor the number of threads in use by the MTA | Restart the MTA service. If the service fails to restart, restart ALL Exchange services in order. | 10 min. | 1 | 1
MTA Work Queue length too high | Monitor MTA queue length on the server | Check that the MTA service is up, and check it on upstream connections (for example, if the MTA queue length of a bridgehead server is too high, check the MTA on the IMS). | 15 min. | 2 | Baseline

Directory problems
Directory updates failed | EventLog ID 1171 detected (exception event) | A Directory Service problem followed by a 1214 error in the Event log indicates a server failing on a deletion or addition of a directory object. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory updates failed | EventLog ID 1214 detected (KCC event) | The knowledge consistency checker failed to complete successfully. Indicates a corruption in the Directory schema that may affect more than one server in a site or Organizational Unit. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory Services pending replications too high | Monitor the number of pending replications in the DS | A large lag in directory updates may indicate a problem with network connectivity to other bridgehead servers. Confirm that this server can still ping the bridgehead servers it uses for directory replication. This can also occur with servers in the same site. | 30 min. | 2 | Baseline
Directory Services remaining replication updates not decreasing | Monitor the number of objects being processed by the DS | Indicates that the server is failing to exchange directory replication messages, because of either network issues or directory problems. Check the Event logs on the server for details. | 30 min. | 2 | Baseline

Overall Exchange problems
Exchange services down | Monitor the service control manager to detect status of services | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | 1

Windows NT
Exchange process dead | Monitor the CPU and thread utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Runaway Exchange process | Monitor the CPU and memory utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Paging too high | Monitor the paging frequency of the operating system (pagefile usage) | Excessive paging indicates that a memory upgrade is needed. If paging persists, treat as a bug. | 10 min. | 2 | Baseline
Low logical disk free space | Monitor the logical disk space of the Exchange machines | Delete unnecessary files to free up disk space. Install new disk space if necessary. | 15 min. | 1 | Baseline
CPU queue length too high | Monitor the overall queue length of the CPU over a prolonged period | Use Performance Monitor to identify CPU bottlenecks and rectify as necessary. | 30 min. | 2 | Baseline

Hardware / Network
Compaq Insight Manager errors | Monitor the internal temperature of the server | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any critical IDE or SCSI disk failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor NIC failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any fan failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any correctable memory errors | Check hardware for errors | 10 min. | 1 | Baseline
Network utilization high | Monitor the total bytes per second processed by the network interface card | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the receipt time for ICMP packets | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the level of unreachable destinations | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline

Procedure for EventLog ID error -550 (database state inconsistent):

1. To check the state, use Edbutil.exe with the "MH" option on the problem database and dump the output to a text file:

EDBUTIL /MH c:\exchsrvr\dsadata\dir.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\priv.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\pub.edb >c:\edbdump.txt

2. View the Edbdump.txt file and confirm that the database state is inconsistent. If it is and the database will not start due to a -550 error in the EventLog, restore the database from the online backup, replay the logs, and restart the consistent database. If and only if the online backup is unavailable, follow step #3.

3. To repair the database, use the following Edbutil syntax:

EDBUTIL /R /DS

Use /ISPRIV or /ISPUB instead of /DS to repair the private or public information stores. The Repair-while-defragmenting option (/d [database] /r) differs from the Recovery option (/r) of EDBUTIL; do not run EDBUTIL /D /R unless specifically directed by Microsoft PSS. Refer to Knowledge Base article Q143235 for information on running the Recovery option (edbutil /r).
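The "remaining replication updates not decreasing" condition in the table above is a trend check rather than a fixed threshold. A minimal sketch follows, assuming the monitor keeps the last few readings taken at the 30-minute interval; the function name, the `min_samples` parameter, and the definition of "not decreasing" as no drop anywhere in the window are assumptions, not the company's stated rule.

```python
def is_not_decreasing(samples, min_samples=3):
    """True when remaining updates never drop across the sample window.

    `samples` holds the most recent readings, oldest first. A window
    that ends at zero means nothing is outstanding, so it is not
    treated as a problem.
    """
    if len(samples) < min_samples:
        return False  # not enough data to judge a trend
    if samples[-1] == 0:
        return False  # nothing left to replicate
    return all(later >= earlier
               for earlier, later in zip(samples, samples[1:]))
```

For example, readings of 120, 120, 125 would be flagged, while 120, 90, 40 would not, because the backlog is draining.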

Tier III: Branch Servers

Tier III servers provide the postoffice function, routing messages sent from the branch to other branches, other sites, and destinations outside the environment by sending each message to the appropriate upstream postoffice.

Postoffice servers send e-mail upstream to bridgehead servers. Problems on the postoffice servers will affect only users within that postoffice. Messages will not travel outside the postoffice.

Postoffice server failures can cause messages to queue both in the failed server's own queues and in the queues of any sites attempting to communicate with the problem postoffice. The messages will remain in queue until the connection is re-established or they are removed with a diagnostic tool.

Consider scheduled downtime when monitoring Exchange services and processes. Exchange administrators should be aware of servers scheduled for maintenance to avoid false alerts from the monitors. Also, temporarily disable any "auto-fix" type of monitors during scheduled maintenance. Suggestion: disable all monitors during the part of the day in which maintenance is scheduled to occur. Before doing so, make sure the efforts of the Exchange administrators and those performing maintenance are coordinated.

All EventLog ID numbers assume use of Microsoft Exchange version 5.0. EventLog IDs for Exchange 5.5 may differ, but the problem description and resolution will remain the same.

Severity definition: 1 = High priority, notify immediately; 2 = Medium priority, notify within 1 hour; 3 = Low priority, notify within 24 hours.

Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold

Database problems
Database too fragmented | EventLog ID 65 detected | Use "edbutil" to defrag the database (should be done by Exchange admins only). | 15 min. | 2 | 1 time every 3 months
Database in inconsistent state (this message may also appear in the Directory or Information Store database in the case of a power failure; it usually means that the database is in an inconsistent state and cannot start) | EventLog ID error -550 has occurred | Confirm the inconsistent state, and then try a defragmentation repair. Be sure to stop all services and back up all files before you manually run the Edbutil.exe program. Follow the numbered procedure after this table. | 15 min. | 2 | Recovery
Database reaching capacity | EventLog ID 1112 detected or IS size reached 80% of logical disk capacity | Normally logged after the database has shut down on reaching capacity. Run edbutil /d on the server to free up space, then restart the Information Store. | 20 min. | 2 | 1
Database cache hit rate too low | Monitor the database buffer cache hit ratio for the IS and DS databases | DS and IS buffers can be increased if there is sufficient RAM. If the hit ratios frequently fall below 95%, the buffers are too small. To correct the problem, manually run perfwiz -v. | 30 min. | 1 | Baseline

MTA problems
MTA process is down | Monitor the number of threads in use by the MTA | Restart the MTA service. If the service fails to restart, restart ALL Exchange services in order. | 10 min. | 1 | 1

Directory problems
Directory updates failed | EventLog ID 1171 detected (exception event) | A Directory Service problem, usually followed by a 1214 error in the Event log, indicates a server failing on a deletion or addition of a directory object. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory updates failed | EventLog ID 1214 detected (KCC event) | The knowledge consistency checker failed to complete successfully. Indicates a corruption in the Directory schema that may affect more than one server in a site or Organizational Unit. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1
Directory Services pending replications too high | Monitor the number of pending replications in the DS | A large lag in directory updates may indicate a problem with network connectivity to other bridgehead servers. Confirm that this server can still ping the bridgehead servers it uses for directory replication. This can also occur with servers in the same site. | 30 min. | 2 | Baseline
Directory Services remaining replication updates not decreasing | Monitor the number of objects being processed by the DS | Indicates that the server is failing to exchange directory replication messages, because of either network issues or directory problems. Check the Event logs on the server for more detail. | 30 min. | 2 | Baseline
Internal connection to IS failed | Monitor the number of logons to the IS | Check the status of the IS service. If the service is up but errors persist, check that the IP stack is working properly on the server: for example, ping 127.0.0.1, then run IPCONFIG and ping the server's gateway. If the server has active mailboxes and there are zero connections, then a problem exists. Use a test account to see whether there is a problem making a connection to the server. | 15 min. | 2 | Baseline

Overall Exchange problems
Exchange services down | Monitor the service control manager to detect status of services | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | 1

Windows NT
Exchange process dead | Monitor the CPU utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Runaway Exchange process | Monitor the CPU utilization of the Exchange processes | Check all of the Exchange services. Restart services that are down in order. Gracefully reboot the server if necessary. | 5 min. | 1 | Baseline
Paging too high | Monitor the paging frequency of the operating system (pagefile usage) | Excessive paging indicates that a memory upgrade is needed. If paging persists, treat as a bug. | 5 min. | 2 | Baseline
Low logical disk free space | Monitor the logical disk space of the Exchange machines | Delete unnecessary files to free up more disk space. Install new disk space if necessary. | 15 min. | 1 | Baseline
Memory utilization too high | Monitor the memory utilization of the Exchange processes | Tune the memory utilization of Exchange. Add RAM if necessary. | 15 min. | 1 | Baseline

Hardware / Network
Compaq Insight Manager errors | Monitor the internal temperature of the server | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any critical IDE or SCSI disk failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor NIC failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any fan failures | Check hardware for errors | 10 min. | 1 | Baseline
Compaq Insight Manager errors | Monitor any correctable memory errors | Check hardware for errors | 10 min. | 1 | Baseline
Network utilization high | Monitor the total bytes per second processed by the network interface card | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the receipt time for ICMP packets | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline
ICMP errors | Monitor the level of unreachable destinations | Check and/or tune performance of the NIC. | 10 min. | 3 | Baseline

Procedure for EventLog ID error -550 (database in inconsistent state):

1. To check the state of the database, use Edbutil.exe with the "MH" option on the problem database and dump the output to a text file:

EDBUTIL /MH c:\exchsrvr\dsadata\dir.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\priv.edb >c:\edbdump.txt

-OR-

EDBUTIL /MH c:\exchsrvr\mdbdata\pub.edb >c:\edbdump.txt

2. View the Edbdump.txt file and confirm that the state is inconsistent. If the database is in an inconsistent state and will not start due to a -550 error in the EventLog, restore the database from the online backup, replay the logs, and restart the consistent database. If and only if the online backup is not available, follow step #3.

3. To repair the database, use the following Edbutil syntax:

EDBUTIL /R /DS

Use /ISPRIV or /ISPUB instead of /DS to repair the private or public information stores. The Repair-while-defragmenting option (/d [database] /r) differs from the Recovery option (/r) of EDBUTIL; do not run EDBUTIL /D /R unless specifically directed by Microsoft PSS. Refer to Knowledge Base article Q143235 for information on running the Recovery option (edbutil /r).
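The "Database reaching capacity" row in the table above includes a proactive condition, the information store reaching 80% of logical disk capacity, that can be checked before EventLog ID 1112 ever appears. The sketch below is an illustration under assumptions: the function names are hypothetical, and it compares the size of a single database file with the total capacity of the disk that holds it.

```python
import os
import shutil


def store_fills_disk(store_bytes, disk_total_bytes, limit=0.80):
    """True when the store occupies at least `limit` of the disk's capacity."""
    if disk_total_bytes <= 0:
        raise ValueError("disk capacity must be positive")
    return store_bytes / disk_total_bytes >= limit


def check_store_file(path, limit=0.80):
    """Compare a database file's size with the capacity of its disk."""
    size = os.path.getsize(path)
    folder = os.path.dirname(os.path.abspath(path))
    total = shutil.disk_usage(folder).total
    return store_fills_disk(size, total, limit)
```

Pointing `check_store_file` at a path such as the priv.edb file named in the recovery procedure, at the 20-minute interval from the table, would give warning before the store shuts down for lack of space.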

Conclusion

With these requirements implemented at each of the tiers, Exchange downtime has been minimized and, most importantly, problem resolution time has decreased significantly. By monitoring critical Exchange components, the time needed to determine and resolve a problem was reduced because the cause was pinpointed as soon as trouble occurred. This has helped keep the business-critical e-mail application available, with system or resource problems discovered early enough to react and fix them. The goal was to discover e-mail problems BEFORE the users did and prevent calls to the help desk.
