How to monitor 99.99% uptime on a Unix box?

Posted: April 21st, 2008 | Author: TnT Admin | Filed under: How-Tos

This question, originally raised in the Yahoo LoadRunner Group, was actually about Solaris boxes rather than Unix in general. First, we need to distinguish between two purposes for the monitoring: (a) are you monitoring during a load test, or (b) are you monitoring in a production environment?
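For a sense of scale, the 99.99% target in the title leaves a very small downtime budget: the allowed downtime is simply the period multiplied by (1 − availability). The short Python sketch below computes it; the function name `allowed_downtime_minutes` is hypothetical and used only for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Downtime budget in minutes for an availability target over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


for label, days in [("30-day month", 30), ("365-day year", 365)]:
    budget = allowed_downtime_minutes(99.99, days)
    print(f"99.99% over a {label}: {budget:.2f} minutes of downtime")
# 99.99% over a 30-day month: 4.32 minutes of downtime
# 99.99% over a 365-day year: 52.56 minutes of downtime
```

In other words, "four nines" allows only a few minutes of outage per month, which is worth keeping in mind when choosing how often to sample.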

In response to the question, I came up with three suggestions:

    1. Use a single Vuser to monitor the Unix system resources for the duration.
    2. Use BAC (Business Availability Center) to monitor.
    3. Use the “uptime” command on the Unix box and periodically write its output to a log file.

Let’s discuss each method in detail.

[1] Use a single Vuser to monitor the Unix system resources for the duration

This method works around the conventional monitoring technique. You will get your graphs from the Unix box, but note that it is far from perfect. When LoadRunner completes a scenario execution (load test), it collates the results from the Load Generators (LGs) back to the Controller at the end of the test. Say we are monitoring for 7 days: all 7 days of monitoring results are sent back to the Controller at once, and this may fail because of (1) the size of the data being transmitted over the network and (2) the resources needed to process the returned monitoring results, which can cause the Controller to crash or hang.
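To see why the data volume matters, count the data points: a monitor collects one value per counter per sampling interval. The Python sketch below assumes a hypothetical setup (a 5-second sampling interval and 20 Unix resource counters); these are illustrative numbers, not LoadRunner defaults, and the function name `monitor_samples` is made up for this example.

```python
def monitor_samples(duration_days: int, interval_s: int, counters: int) -> int:
    """Data points collected: one value per counter per sampling interval."""
    seconds = duration_days * 24 * 60 * 60
    return (seconds // interval_s) * counters


# Hypothetical 7-day run, sampling every 5 s, 20 Unix resource counters.
print(monitor_samples(7, 5, 20))  # 2419200
```

Every one of those points has to be returned to the Controller and processed at the end of the run, which is where the failure modes described above come from.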

[2] Use BAC to monitor

BAC and SiteScope, both from Mercury/HP, can achieve this. Both products are designed mainly for monitoring production environments, sending reports or alerts depending on how they are configured. There are also other monitoring products on the market, but it is up to each organization to decide what suits it best.

Note that LoadRunner and BAC/SiteScope monitor on the same principle (BAC taps into SiteScope’s monitoring capabilities). I discussed this previously in the article “How does the monitoring work in LoadRunner?”, which you may want to read.

[3] Use the “uptime” command on the Unix box and periodically write its output to a log file (Unix only)

This is by far the cheapest method, as it relies only on tools already available on the system. The uptime command reports how long the system has been running since its last boot. For more information, see the uptime(1) manual page; output redirection is covered in the sh(1) manual page.

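By running a cron job periodically, for example every 30 minutes, and appending the output to a log file, you can trace when the system was rebooted, because the reported uptime starts again from a small value after each reboot. Since uptime prints the time of day but not the date, it helps to log the output of date alongside it. For more information on cron jobs, see the crontab(1) manual page. If you want to implement this suggestion, add an entry like the following to your crontab file (for example with crontab -e). The minutes are listed explicitly as 0,30 because Solaris cron does not accept the */30 step syntax supported by some other cron implementations. cron runs the command from your home directory, so uptime.log is created there.

    Example (runs at minutes 0 and 30 of every hour):
    0,30 * * * * (date; uptime) >> uptime.log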
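To turn such a log into a list of reboots, a small script can compare consecutive samples. The Python sketch below assumes the log contains raw uptime output lines written every `interval_min` minutes; any other lines (date stamps, blank lines) are skipped. The names `uptime_minutes` and `find_reboots` are hypothetical, and the parser only covers the uptime formats listed in its docstring, so check it against your own system's output. The logic: without a reboot, each sample's uptime should be at least the previous one plus roughly the interval, so a smaller value means the box rebooted in between, and the interval minus the new uptime is an upper bound on how long it was down.

```python
import re
from typing import List, Optional, Tuple

# Captures the duration between "up" and the "N user(s)" field, e.g.
# "5 day(s),  3:12" (Solaris style) or "1 day, 45 min" (Linux procps style).
_UP_FIELD = re.compile(r"up\s+(.*?),\s+\d+\s+users?")


def uptime_minutes(line: str) -> Optional[int]:
    """Return the uptime reported on one line of `uptime` output, in minutes.

    Handles forms such as 'up 5 day(s),  3:12', 'up 5 days,  3:12',
    'up 12 min(s)', 'up 2 hr(s)' and 'up 1 day, 45 min'.
    Returns None for lines that are not uptime output (e.g. `date` stamps).
    """
    match = _UP_FIELD.search(line)
    if match is None:
        return None
    field = match.group(1)
    total = 0
    days = re.search(r"(\d+)\s+day", field)
    if days:
        total += int(days.group(1)) * 24 * 60
    hh_mm = re.search(r"(\d+):(\d+)", field)
    if hh_mm:
        total += int(hh_mm.group(1)) * 60 + int(hh_mm.group(2))
    hours = re.search(r"(\d+)\s+hr", field)
    if hours:
        total += int(hours.group(1)) * 60
    mins = re.search(r"(\d+)\s+min", field)
    if mins:
        total += int(mins.group(1))
    return total


def find_reboots(lines: List[str], interval_min: int = 30) -> List[Tuple[int, int]]:
    """Return (log_line_number, max_downtime_minutes) for each detected reboot.

    A reboot is assumed when a sample's uptime is less than the previous
    sample's uptime plus interval_min (minus 1 minute of slack for rounding
    and cron start-up delay).
    """
    reboots = []
    prev = None
    for lineno, line in enumerate(lines, start=1):
        up = uptime_minutes(line)
        if up is None:
            continue  # skip date stamps, blank lines, etc.
        if prev is not None and up < prev + interval_min - 1:
            reboots.append((lineno, max(interval_min - up, 0)))
        prev = up
    return reboots


# Made-up sample lines in Solaris uptime format, for illustration only.
SAMPLE_LOG = [
    "  6:00pm  up 5 day(s),  3:12,  2 users,  load average: 0.01, 0.02, 0.03",
    "  6:30pm  up 5 day(s),  3:42,  2 users,  load average: 0.01, 0.02, 0.03",
    "  7:00pm  up 7 min(s),  1 user,  load average: 0.50, 0.20, 0.10",
    "  7:30pm  up 37 min(s),  1 user,  load average: 0.05, 0.04, 0.03",
]
print(find_reboots(SAMPLE_LOG))  # [(3, 23)]
```

Comparing against the previous uptime plus the interval, rather than only checking whether uptime went down, also catches a second reboot that happens shortly after a first one. With samples 30 minutes apart, the log shows that a reboot happened but can only bound the outage to within the sampling interval, which is coarse next to the few minutes of downtime per month that a 99.99% target allows.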
