What should the threshold for maximum response time be set to?

From WebWatchBotWiki

A Watch Item’s threshold for maximum response time is used to trigger an alarm. The right value depends on several factors:

1. How fast should the Watch Item respond? In other words, what is an acceptable amount of time: 2 seconds? 5 seconds?

2. What is the average response time? A Watch Item's response time can fluctuate over time, but an average will be evident after a day of testing or when a hundred or so data points are collected.

3. What is the threshold at which someone should be notified? If a Watch Item responds after 10 seconds once a day, is it a cause for concern or just an anomaly?
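The second and third questions can be explored with a few lines of Python once some data points are collected. This is only a rough sketch: the sample values, the candidate threshold, and the helper name `summarize` are made up for illustration and are not part of WebWatchBot.

```python
from statistics import mean

def summarize(samples_ms, candidate_threshold_ms):
    """Return the average response time and the number of samples
    that exceed the candidate threshold (all values in milliseconds)."""
    average_ms = mean(samples_ms)
    slow_count = sum(1 for s in samples_ms if s > candidate_threshold_ms)
    return average_ms, slow_count

# Hypothetical data points, including one 10-second anomaly.
samples_ms = [800, 1100, 1500, 1200, 900, 10000, 1300, 1000]
average_ms, slow_count = summarize(samples_ms, candidate_threshold_ms=4000)
print(f"average: {average_ms} ms, samples over threshold: {slow_count}")
```

A single slow sample raises the average noticeably, which is one reason to look at how often the threshold would have been exceeded rather than at the average alone.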

Taking everything into account:

1. If a Watch Item should respond in 2 seconds (2000 ms), a recommended threshold is at least twice that amount (4 seconds) to prevent false alarms. Setting the threshold to 2 seconds would cause the alarm to trigger frequently.

2. If the average response time is 3 seconds (3000 ms), then a threshold close to that value, say 4 seconds, may be too low because it does not take into account fluctuations during the day.

3. If someone should be notified as soon as a Watch Item responds slowly, then the threshold should be set to a number that reflects the urgency. If a Watch Item is monitoring a critical web page that cannot ever go down, setting the threshold to a low number will trigger the alarm more often, allowing the opportunity for more scrutiny.
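The first two guidelines can be combined into a simple starting point, sketched below. The function name `suggest_threshold_ms` and the `headroom` factor of 2.0 applied to the average are assumptions chosen for illustration, not values prescribed by WebWatchBot. Only the "at least 2 times the target" rule comes from the text above.

```python
def suggest_threshold_ms(target_ms, average_ms, headroom=2.0):
    """Suggest a starting threshold in milliseconds.

    Takes the larger of:
      - twice the desired response time (guideline 1), and
      - the observed average times a headroom factor, to leave room
        for daily fluctuations (guideline 2; the factor is an assumption).
    """
    return int(max(2 * target_ms, headroom * average_ms))

# Target of 2 seconds, observed average of 3 seconds.
print(suggest_threshold_ms(target_ms=2000, average_ms=3000))
```

The result is only a first guess. It should be lowered for critical pages where early notification matters (guideline 3) or raised if false alarms persist.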


SCENARIO 1: Home page of a corporate website. The site is critical for the company’s image and cannot be unavailable. In the best case, the page should respond in 1 second or less. It averages a download time of 1200 ms: it responds quickly (800 ms) in the morning and nighttime hours, slowly (1500 ms) in the afternoon when traffic peaks, and moderately (1100 ms) in the evening. The system administrator needs to be notified as soon as the site has an outage, that is, when it does not respond at all.

RECOMMENDATION: Set the threshold for maximum response time to 5000 ms, and set the trigger for an alarm to 1 consecutive failure to ensure the sys admin is notified of all problems.

SCENARIO 2: Sub-page of a website that connects to a database and displays dynamic content. The page is not critical but important. The page should respond in 3 seconds, averages around 3 seconds and the response time does not fluctuate throughout the day. Someone should be notified if the page goes down for more than a few minutes.

RECOMMENDATION: Set the threshold for maximum response time to 10000 ms, and set the trigger for an alarm to 3 consecutive failures to ensure someone is notified that there is a problem.

SCENARIO 3: Login page for a company Intranet. Response time is not critical, but having the page available is a must. The page should load and respond in 5 seconds. The average response time is 10 seconds (it’s slow and the website programmers are looking into it) and fluctuates wildly.

RECOMMENDATION: Set the threshold for maximum response time to 50000 ms, and set the trigger for an alarm to 3 consecutive failures.
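
To see how the threshold and the consecutive-failure trigger from the three recommendations interact, the following sketch simulates a series of checks. It is an illustration under assumptions, not WebWatchBot's actual alarm logic: the function `alarm_triggered`, the use of `None` for a check that gets no response, and the sample check results are all hypothetical.

```python
from typing import Optional, Sequence

def alarm_triggered(checks_ms: Sequence[Optional[int]],
                    threshold_ms: int,
                    consecutive_failures: int) -> bool:
    """Return True if the checks contain a run of failures at least
    `consecutive_failures` long. A check fails when there is no
    response (None) or the response time exceeds the threshold."""
    streak = 0
    for elapsed_ms in checks_ms:
        failed = elapsed_ms is None or elapsed_ms > threshold_ms
        streak = streak + 1 if failed else 0
        if streak >= consecutive_failures:
            return True
    return False

# Scenario 1: 5000 ms threshold, alarm on the first failure (an outage).
scenario_1 = alarm_triggered([800, 1500, None], 5000, 1)

# Scenario 2: 10000 ms threshold, 3 consecutive failures required.
# Two isolated slow runs never reach a streak of 3.
scenario_2 = alarm_triggered([3000, 12000, 3100, 11000, 12000], 10000, 3)

# Scenario 3: 50000 ms threshold, 3 consecutive failures required.
scenario_3 = alarm_triggered([10000, 52000, None, 60000], 50000, 3)

print(scenario_1, scenario_2, scenario_3)
```

Requiring several consecutive failures filters out one-off slow responses, while a trigger of 1 reports every failed check immediately.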