Blog de Dario Leonel May: Performance and Threshold Counters for Exchange Server 2010

Good morning!!
Knowing the most important performance counters and their thresholds is critical to establishing a performance baseline and a monitoring plan, so that you can proactively monitor your Exchange 2010 environment and troubleshoot and resolve issues when they arise.

Active Database Copy IO Latency Requirements

When these values are exceeded, the client experience will degrade (sluggish user experience, message delivery delays, etc.).

Counters	Threshold	Troubleshooting

*MSExchange Database\I/O Database Reads (Attached) Average Latency*	The average value should be below 20 ms. Spikes (maximum values) should not be higher than 100 ms.
Indicates the average time (in milliseconds) to read from the database file.

*MSExchange Database\I/O Database Writes (Attached) Average Latency*	This counter is not a good indicator of client latency because database writes are asynchronous.	In general, however, this latency should be less than the MSExchange Database\I/O Database Reads (Attached) Average Latency when battery-backed write caching is utilized.
Indicates the average time (in milliseconds) to write to the database file.

*Database\Database Page Fault Stalls/sec*	This counter should be zero on production servers.	If this counter is non-zero, it is an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high.
Indicates the rate of page faults that cannot be serviced because there are no pages available for allocation from the database cache.
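
To show how the three counters above fit together, here is a minimal Python sketch. The function name, the argument names, and the idea of passing in lists of samples are my own assumptions for illustration and are not part of any Exchange tooling; only the thresholds come from the table.

```python
from statistics import mean

def check_active_db_io(read_ms, write_ms, stalls_per_sec, write_cache=True):
    """Return warnings for one sampling window of an active database copy.

    read_ms / write_ms: latency samples in milliseconds (hypothetical inputs).
    stalls_per_sec: samples of Database Page Fault Stalls/sec.
    write_cache: True when battery-backed write caching is in use.
    """
    warnings = []
    if mean(read_ms) >= 20:
        warnings.append("read latency average is not below 20 ms")
    if max(read_ms) > 100:
        warnings.append("read latency spike is higher than 100 ms")
    if any(rate > 0 for rate in stalls_per_sec):
        warnings.append("page fault stalls are non-zero: write latency is probably too high")
    # The write-below-read rule only applies with battery-backed write caching.
    if write_cache and mean(write_ms) >= mean(read_ms):
        warnings.append("write latency is not below read latency")
    return warnings

print(check_active_db_io([15, 30, 120], [2, 3, 4], [0, 1, 0]))
```

An empty list means the window is within all of the thresholds listed above.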

Active Log IO Latency Requirements

When these values are exceeded, the client experience will degrade (sluggish user experience, message delivery delays, etc.).

*MSExchange Database\IO Log Writes Average Latency*	The average value should be below 10 ms. Spikes (maximum values) should not be higher than 50 ms.
Indicates the average time (in milliseconds) to write a log buffer to the active log file.

*Database\Log Record Stalls/sec*	The average value should be below 10 per second. Spikes (maximum values) should not be higher than 100 per second.
Indicates the number of log records that cannot be added to the log buffers per second because the log buffers are full.

*Database\Log Threads Waiting*	The average value should be less than 10 threads waiting.
Indicates the number of threads waiting to complete an update of the database by writing their data to the log.

Passive Database Copy IO Latency Requirements

When these values are exceeded, the database copy may fall behind by not replaying logs into the passive database copy fast enough. Log replication performance may also be impacted.

*MSExchange Database\I/O Database Reads (Recovery) Average Latency*	The average value should be below 200 ms. Spikes (maximum values) should not be higher than 1000 ms.
Indicates the average time (in milliseconds) to read from the database file.

*MSExchange Database\I/O Database Writes (Recovery) Average Latency*	In general, however, this latency should be less than the MSExchange Database\I/O Database Reads (Attached) Average Latency when battery-backed write caching is utilized.
Indicates the average time (in milliseconds) to write to the database file.

*Database\Database Page Fault Stalls/sec*	This counter should be zero on production servers.	If this counter is non-zero, it is an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high.
Indicates the rate of page faults that cannot be serviced because there are no pages available for allocation from the database cache.

Replay Log IO Latency Requirements
When these values are exceeded, the database copy may fall behind by not replaying logs into the passive database copy fast enough. Log replication performance may also be impacted.

*MSExchange Database\IO Log Read Average Latency*	The average value should be below 200 ms. Spikes (maximum values) should not be higher than 1000 ms.
Indicates the average time (in milliseconds) to read data from a log file. Specific to log replay and database recovery operations.

Information Store RPC Processing Counters

*MSExchangeIS\RPC Requests*	Should be below 70 at all times.	The maximum value is 500 RPC requests that can execute at any designated time before the information store starts rejecting any new connections from clients.
Indicates the overall RPC requests that are currently executing within the information store process.

*MSExchangeIS\RPC Averaged Latency*	Should not be higher than 10 ms on average.	To determine if certain protocols are causing overall RPC latencies, monitor MSExchangeIS Client (*)\RPC Average Latency to separate latencies based on client protocol.
Indicates the RPC latency, in milliseconds, averaged for all operations in the last 1,024 packets.
For information about how clients are affected when overall server RPC averaged latencies increase, see RPC Client Throttling.


*MSExchangeIS Mailbox\RPC Averaged Latency*	Should not be higher than 10 ms on average.
Indicates the RPC latency, in milliseconds, averaged for all operations in the last 1,024 packets.
For information about how clients are affected when overall server RPC averaged latencies increase, see RPC Client Throttling.

*MSExchangeIS Client (*)\RPC Average Latency*	Should be less than 10 ms on average.	Wide disparities between different client types, such as IMAP4, Outlook Anywhere, or Other Clients (MAPI), can help direct troubleshooting to appropriate subcomponents.
Shows a server RPC latency, in milliseconds, averaged for the past 1,024 packets for a particular client protocol.
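
To make the "wide disparities between client types" idea a bit more concrete, here is a small Python sketch that lists the client types whose average latency is not below 10 ms, worst first. The function name and the sample numbers are made up for illustration; in practice the values would come from the per-client counter instances.

```python
def clients_over_threshold(latency_by_client, limit_ms=10.0):
    """Return client types whose average RPC latency is not below limit_ms,
    ordered from highest to lowest latency."""
    offenders = [name for name, ms in latency_by_client.items() if ms >= limit_ms]
    return sorted(offenders, key=lambda name: latency_by_client[name], reverse=True)

# Hypothetical averages in milliseconds, one entry per client type.
sample = {"IMAP4": 42.0, "Outlook Anywhere": 6.5, "Other Clients (MAPI)": 11.0}
print(clients_over_threshold(sample))
```

If only one client type shows up in the result, that protocol is a good place to start troubleshooting.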

RPC Client Throttling Counters

*MSExchangeIS\Client: RPCs Failed:Server Too Busy/sec*	Should be 0 at all times.	Higher values may indicate RPC threads are exhausted or client throttling is occurring for clients running versions of Outlook earlier than Microsoft Office Outlook 2007.
Shows the client-reported rate of failed RPCs (since the store was started) due to the Server Too Busy RPC error.

*MSExchangeIS\Client: RPCs Failed:Server Too Busy*	Should be 0 at all times.
The client-reported number of failed RPCs (since the store was started) due to the Server Too Busy RPC error.

Message Queuing Counters

Database Counters

*MSExchange Database ==> Instances(*)\Log Generation Checkpoint Depth*	Should be below 500 at all times for the Mailbox server role. A healthy server should indicate between 20 and 30 for each database instance.	If checkpoint depth increases continually for a sustained period, this is an indicator of either a long-running transaction (which will impact the version store) or of a bottleneck involving the database disks.
Represents the amount of work in the log file count that will need to be redone or undone to the database files if the process fails.
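
A quick way to picture "increases continually for a sustained period" is a check over the last few samples of the counter. The Python sketch below is only an illustration: the function name is hypothetical, and treating five strictly increasing samples as "sustained" is my own assumption, so adjust the window to your sampling interval.

```python
def checkpoint_depth_status(depths, limit=500, window=5):
    """Classify ordered checkpoint depth samples for one database instance.

    Returns "over limit" if any sample reaches the limit, "rising" if the last
    `window` samples are strictly increasing, and "ok" otherwise.
    """
    if any(d >= limit for d in depths):
        return "over limit"
    recent = depths[-window:]
    rising = len(recent) == window and all(b > a for a, b in zip(recent, recent[1:]))
    return "rising" if rising else "ok"

print(checkpoint_depth_status([30, 60, 120, 240, 480]))
```

A "rising" result is the cue to look for a long-running transaction or a database disk bottleneck before the 500 limit is reached.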

*MSExchange Database(Information Store)\Database Page Fault Stalls/sec*	This should be 0 at all times.
Shows the rate at which database file page requests require the database cache manager to allocate a new page from the database cache.	If this value is non-zero, the database is not able to flush dirty pages to the database file fast enough to free pages for new page allocations.

*MSExchange Database(Information Store)\Log Record Stalls/sec*	The average value should be below 10 per second. Spikes (maximum values) should not be higher than 100 per second.	If I/O log write latencies are high, check for RAID5 or sync replication on log devices.
Shows the number of log records that cannot be added to the log buffers per second because the log buffers are full. If this counter is non-zero most of the time, the log buffer size may be a bottleneck.


*MSExchange Database(Information Store)\Log Threads Waiting*	Should be less than 10 on average.	Regular spikes concurrent with log record stall spikes indicate that the transaction log disks are a bottleneck. If the value for log threads waiting is more than the spindles available for the logs, there is a bottleneck on the log disks.
Shows the number of threads waiting for their data to be written to the log to complete an update of the database. If this number is too high, the log may be a bottleneck.


*MSExchange Database(Information Store)\Version buckets allocated*	Should be less than 12,000 at all times.	The default maximum number of version buckets is 16,384. If version buckets reach 70 percent of the maximum, the server is at risk of running out of version store.
Shows the total number of version buckets allocated.
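
The arithmetic behind these two numbers is easy to check. In the Python sketch below the function names are mine, chosen just for illustration; the 16,384 maximum and the 70 percent risk level are the figures quoted above.

```python
DEFAULT_MAX_VERSION_BUCKETS = 16_384  # default maximum quoted above

def version_store_usage(allocated, maximum=DEFAULT_MAX_VERSION_BUCKETS):
    """Fraction of the version store in use (0.0 to 1.0)."""
    return allocated / maximum

def at_risk(allocated, maximum=DEFAULT_MAX_VERSION_BUCKETS, risk_fraction=0.70):
    """True once allocated buckets reach the 70 percent risk level."""
    return version_store_usage(allocated, maximum) >= risk_fraction

print(0.70 * DEFAULT_MAX_VERSION_BUCKETS)              # bucket count at 70 percent
print(round(version_store_usage(12_000) * 100, 1))     # percent used at 12,000 buckets
```

70 percent of 16,384 is about 11,469 buckets, so the 12,000 threshold (roughly 73 percent) is already slightly past the risk level.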

*MSExchange Database Instances(*)\I/O Database Reads Average Latency*	Should be no more than 20 ms on average, with spikes no higher than 50 ms.
Shows the average length of time, in milliseconds, per database read operation.

*MSExchange Database Instances(*)\I/O Database Writes Average Latency*	Should be no more than 50 ms on average.	Spikes of up to 100 ms are acceptable if not accompanied by database page fault stalls.
Shows the average length of time, in milliseconds, per database write operation.

*MSExchange Database(Information Store)\Database Cache Size (MB)*	Maximum value is RAM-2GB (RAM-3GB for servers with sync replication enabled). This and Database Cache Hit % are extremely useful counters for gauging whether a server's performance problems might be resolved by adding more physical memory.	Use this counter along with store private bytes to determine if there are store memory leaks. If the database cache size seems too small for optimal performance and there is little available memory on the system (check the value of Memory\Available Bytes), adding more memory to the system may increase performance. If there is ample memory on the system and the database cache size is not growing beyond a certain point, the database cache size may be capped at an artificially low limit. Increasing this limit may increase performance.
Shows the amount of system memory, in megabytes, used by the database cache manager to hold commonly used information from the database files to prevent file operations.

*MSExchange Database(Information Store)\Database Cache % Hit*	Should be over 90% for companies with majority online mode clients. Should be over 99% for companies with majority cached mode clients.	If the hit ratio is less than these numbers, the database cache may be insufficient.
Shows the percentage of database file page requests that were fulfilled by the database cache without causing a file operation. If this percentage is too low, the database cache size may be too small.


*MSExchange Database\Log Bytes Write/sec*	Should be less than 10,000,000 at all times.	With each log file being 1,000,000 bytes in size, 10,000,000 bytes/sec would yield 10 logs/sec. This may indicate a large message being sent or a looping message.
Shows the rate at which bytes are written to the log.
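
The conversion from bytes per second to log files per second is simple division. This Python sketch uses the round 1,000,000-byte figure from the text above; the function names are hypothetical and only there to show the calculation.

```python
LOG_FILE_BYTES = 1_000_000          # per-log size used in the text above
LOG_BYTES_LIMIT = 10_000_000        # threshold for Log Bytes Write/sec

def logs_per_second(log_bytes_per_sec, log_file_bytes=LOG_FILE_BYTES):
    """Approximate number of log files generated per second."""
    return log_bytes_per_sec / log_file_bytes

def over_limit(log_bytes_per_sec, limit=LOG_BYTES_LIMIT):
    """True when the write rate is not below the threshold."""
    return log_bytes_per_sec >= limit

print(logs_per_second(10_000_000))
print(over_limit(2_500_000))
```

A sustained rate near the limit is worth correlating with message tracking to look for a very large or looping message.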

Client-Related Search Counters

*MSExchangeIS Mailbox(*)\Slow Findrow Rate*	Should be no more than 10 for any specific mailbox store.	Higher values indicate applications are crawling or searching mailboxes, which is affecting server performance. These include desktop search engines, customer relationship management (CRM), or other third-party applications.
Shows the rate at which the slower FindRow needs to be used in the mailbox store.

*MSExchangeIS Mailbox(*)\Search Task Rate*	Should be less than 10 at all times.
Shows the number of search tasks created per second.

*MSExchangeIS\Slow QP Threads*	Should be less than 10 at all times.
Shows the number of query processor threads currently running queries that are not optimized.

*MSExchangeIS\Slow Search Threads*	Should be less than 10 at all times.
Shows the number of search threads currently running queries that are not optimized.

Content Indexing Counters

*Process(Microsoft.Exchange.Search.ExSearch)\% Processor time*	Should be less than 1% of overall CPU typically and not sustained above 5%. Should be less than 10% of what the store process is during steady state.
Shows the amount of processor time that is currently being consumed by the Exchange Search service.
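
The ExSearch threshold combines three separate conditions, which the Python sketch below spells out. The function name and inputs are hypothetical, and reading "sustained above 5%" as "every sample in the window is above 5%" is my own assumption.

```python
from statistics import mean

def exsearch_cpu_findings(exsearch_pct, store_pct):
    """Compare ExSearch CPU samples (percent) with store process samples.

    Both arguments are lists of % Processor Time samples from the same window.
    """
    findings = []
    if mean(exsearch_pct) >= 1.0:
        findings.append("ExSearch averages 1% of CPU or more")
    if min(exsearch_pct) > 5.0:  # assumption: every sample above 5% counts as sustained
        findings.append("ExSearch is sustained above 5% of CPU")
    if mean(exsearch_pct) >= 0.10 * mean(store_pct):
        findings.append("ExSearch uses 10% or more of what the store process uses")
    return findings

print(exsearch_cpu_findings([6.0, 7.0, 8.0], [20.0, 20.0, 20.0]))
```

An empty result means the Exchange Search service is within all three limits for that window.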

*Process(msftefd*)\%Processor Time*	Full crawls will increase overall processing time, but should never exceed overall store CPU capacity. Check throttling counters to determine if throttling is occurring due to server performance bottlenecks.
Shows the amount of processor time that is being consumed to update content indexing within the store process.

*MSExchange Search Indices(*)\Recent Average Latency of RPCs Used to Obtain Content*	Should coincide with the latencies that Outlook clients are experiencing.
Shows the average latency, in milliseconds, of the most recent RPCs to the Microsoft Exchange Information Store service. These RPCs are used to get content for the filter daemon for the specified database.

*MSExchange Search Indices(*)\Average Document Indexing Time*	Should be less than 30 seconds at all times.
Shows the average, in milliseconds, of how long it takes to index documents.

*MSExchange Search Indices(*)\Full Crawl Mode Status*	Indicates whether this .mdb file is going through a full crawl (value=1) or not (value=0).	If CPU usage is high, it is possible content indexing is occurring for a database or set of databases.
Used to determine if a full crawl is occurring for any specified database.

Mailbox Assistant Counters

*Process(MSExchangeMailboxAssistants)\%Processor Time*	Should be less than 5% of overall CPU capacity.
Shows the amount of processor time that is being consumed by mailbox assistants.

*MSExchange Assistants(*)\Events in queue*	Should be a low value at all times. High values may indicate a performance bottleneck.
Shows the number of events in the in-memory queue waiting to be processed by the assistants.

*MSExchange Assistants(*)\Average Event Processing Time in Seconds*	Should be less than 2 at all times.
Shows the average processing time of the events chosen.

Resource Booking Counters

*MSExchange Resource Booking\Average ResourceBooking Processing Time*	Should be a low value at all times. High values may indicate a performance bottleneck.
Shows the average time to process an event in the Resource Booking Attendant.

*MSExchange Resource Booking\Requests Failed*	Should be 0 at all times.
Shows the total number of failures that occurred while the Resource Booking Attendant was processing events.

Calendar Attendant Counters

*MSExchange Calendar Attendant\Average Calendar Attendant Processing time*	Should be a low value at all times. High values may indicate a performance bottleneck.
Shows the average time to process an event in the Calendar Attendant.

*MSExchange Calendar Attendant\Requests Failed*	Should be 0 at all times.
Shows the total number of failures that occurred while the Calendar Attendant was processing events.

Store Client Request Counters

*MSExchange Store Interface(_Total)\RPC Latency average (msec)*	Should be less than 100 ms at all times.
Shows the average latency, in milliseconds, of RPC requests. The average is calculated over all RPCs since exrpc32 was loaded.

*MSExchange Store Interface(_Total)\RPC Requests outstanding*	Should be 0 at all times.
Shows the current number of outstanding RPC requests.

*MSExchange Store Interface(*)\RPC Requests failed (%)*	Should be 0 at all times.
Shows the percentage of failed requests in the total number of RPC requests. Here, failed means the sum of failed with error code plus failed with exception.

*MSExchange Store Interface(*)\RPC Slow Requests (%)*	Should be less than 1 at all times.
Shows the percentage of slow RPC requests among all RPC requests.
A slow RPC request is one that has taken more than 500 ms.
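
Here is how that percentage works out for a batch of request durations, as a minimal Python sketch. The function name and the sample durations are invented for illustration; the 500 ms cut-off and the 1 percent threshold are the ones stated above.

```python
SLOW_RPC_MS = 500  # a request is "slow" when it takes more than this

def slow_rpc_percent(durations_ms):
    """Percentage of requests that took more than 500 ms."""
    if not durations_ms:
        return 0.0
    slow = sum(1 for d in durations_ms if d > SLOW_RPC_MS)
    return 100.0 * slow / len(durations_ms)

# Hypothetical sample: 199 fast requests and one slow one.
durations = [100] * 199 + [900]
pct = slow_rpc_percent(durations)
print(pct, "within threshold" if pct < 1 else "too many slow requests")
```

With one slow request out of 200, the result is 0.5 percent, which is still under the threshold.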

*MSExchangeMailSubmission(*)\Hub Servers In Retry*	Should be 0 at all times.
Shows the number of Hub Transport servers in retry mode.

*MSExchangeMailSubmission(*)\Failed Submissions Per Second*	Should be 0 at all times.

*MSExchangeMailSubmission(*)\Temporary Submission Failures/sec*	Should be 0 at all times.
Shows the number of temporary submission failures per second.

Local Continuous Replication, Cluster Continuous Replication, and Standby Continuous Replication Counters

-Dario

Tuesday, March 22, 2011

Performance and Threshold Counters for Exchange Server 2010 - Mailbox Server
