<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LoadRunner TnT &#187; Analyze</title>
	<atom:link href="http://www.loadrunnertnt.com/category/analyze/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.loadrunnertnt.com</link>
	<description>Performance Testing, LoadRunner Tips &#38; Tricks</description>
	<lastBuildDate>Mon, 08 Mar 2010 07:57:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Find Offending SQL Bottlenecks!</title>
		<link>http://www.loadrunnertnt.com/analyze/find-offending-sql-bottlenecks/</link>
		<comments>http://www.loadrunnertnt.com/analyze/find-offending-sql-bottlenecks/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 07:54:30 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[MS SQL]]></category>
		<category><![CDATA[SQL Statements]]></category>
		<category><![CDATA[Stored Procedures]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=650</guid>
		<description><![CDATA[After the tunable parameters have been changed for optimal performance, your system still fails miserably with poor response times.  The most likely next step is a deep diagnosis of the system.  Break the system up into its components, such as application servers, where diagnostics using probes are required, database servers where [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.loadrunnertnt.com/wp-content/uploads/2010/01/ms_sql_logo1.gif"><img class="alignleft size-full wp-image-674" title="ms_sql_logo" src="http://www.loadrunnertnt.com/wp-content/uploads/2010/01/ms_sql_logo1.gif" alt="" width="178" height="87" /></a>After the tunable parameters have been changed for optimal performance, your system still fails miserably with poor response times.  The most likely next step is a deep diagnosis of the system.  Break the system up into its components: application servers, where diagnostics using probes are required; database servers, where SQL statements and stored procedures are the next to be scrutinized; and so on.<span id="more-650"></span></p>
<p>In this post, we will not cover the diagnostics of the application server; instead, we will cover the basic signs of database bottlenecks at the <strong>SQL statement</strong> level.  The 1<sup>st</sup> symptom is a collective one: the application server (and every other server in the architecture except the database server) is performing reasonably well.  E.g. available memory is well maintained without any exceptional decrease, and the processor % time is always low on those servers.<br />
<br />
The 2<sup>nd</sup> symptom is constantly high processor % time combined with low memory usage (page faults, page reads and available memory) on the database server.  This is a tell-tale sign that the database server is busy processing instructions, most likely a SQL statement that is taking too long.  The database buffer or SQL buffer is unlikely to be the cause: low memory consumption means that buffer reuse is high.</p>
<p>Now with these two symptoms, you are ready to go down deeper.  Access MS SQL and run the following query:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> scheduler_id<span style="color: #66cc66;">,</span> current_tasks_count<span style="color: #66cc66;">,</span> runnable_tasks_count<br />
<span style="color: #993333; font-weight: bold;">FROM</span> sys<span style="color: #66cc66;">.</span>dm_os_schedulers <span style="color: #993333; font-weight: bold;">WHERE</span> scheduler_id <span style="color: #66cc66;">&lt;</span> <span style="color: #cc66cc;">255</span></div></div>
<p>This query against <strong>sys.dm_os_schedulers</strong> shows how many runnable tasks exist.  Values other than zero indicate that tasks are waiting to run, and high values are an indication that the CPU is bottlenecking your performance.  Another query that directly targets SQL statements is the following:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> TOP 50 SUM<span style="color: #66cc66;">&#40;</span>qs<span style="color: #66cc66;">.</span>total_worker_time<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> total_cpu_time<span style="color: #66cc66;">,</span><br />
<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">&#91;</span>TEXT<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">FROM</span> sys<span style="color: #66cc66;">.</span>dm_exec_sql_text<span style="color: #66cc66;">&#40;</span>plan_handle<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span><br />
SUM<span style="color: #66cc66;">&#40;</span>qs<span style="color: #66cc66;">.</span>execution_count<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> total_execution_count<span style="color: #66cc66;">,</span><br />
COUNT<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> number_of_statements<span style="color: #66cc66;">,</span> qs<span style="color: #66cc66;">.</span>plan_handle<br />
<span style="color: #993333; font-weight: bold;">FROM</span> sys<span style="color: #66cc66;">.</span>dm_exec_query_stats qs<br />
<span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> qs<span style="color: #66cc66;">.</span>plan_handle<br />
<span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> SUM<span style="color: #66cc66;">&#40;</span>qs<span style="color: #66cc66;">.</span>total_worker_time<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">DESC</span></div></div>
<p>This query shows which batches or procedures are consuming the most CPU and will also include the actual SQL statement. The query aggregates by a specific plan handle. If the plan handle contains more than one SQL statement you must drill into each statement to determine where the greatest CPU contribution comes from.</p>
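<p>To illustrate the drill-down step, here is a minimal Python sketch. It assumes you have already exported per-statement rows (plan handle, statement text, total worker time) from the server; the sample rows and the helper name <code>top_statement_per_plan</code> are hypothetical and only show the grouping logic.</p>

```python
from collections import defaultdict

# Hypothetical per-statement rows: (plan_handle, statement_text, total_worker_time).
# The values are made up; real rows would be exported from sys.dm_exec_query_stats.
statement_rows = [
    ("0x06A1", "SELECT * FROM Orders WHERE CustomerId = @id", 900_000),
    ("0x06A1", "UPDATE Orders SET Status = 'X' WHERE Id = @o", 100_000),
    ("0x07B2", "SELECT COUNT(*) FROM Customers", 250_000),
]


def top_statement_per_plan(rows):
    """Return {plan_handle: (statement_text, share_of_plan_cpu)} for the costliest statement."""
    by_plan = defaultdict(list)
    for plan_handle, text, cpu in rows:
        by_plan[plan_handle].append((text, cpu))

    result = {}
    for plan_handle, statements in by_plan.items():
        total = sum(cpu for _, cpu in statements)
        # Pick the statement with the largest CPU time inside this plan.
        text, cpu = max(statements, key=lambda s: s[1])
        result[plan_handle] = (text, cpu / total if total else 0.0)
    return result


top = top_statement_per_plan(statement_rows)
for plan_handle, (text, share) in top.items():
    print(plan_handle, f"{share:.0%}", text)
```

<p>With these sample rows, the first plan is dominated by its SELECT statement, so that statement is where tuning effort would go first.</p>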
<p>The code in this post is taken from <a href="http://sql.dotnetbob.com/?p=96">DotNet Bob on SQL</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/find-offending-sql-bottlenecks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Oracle SGA Large Pool</title>
		<link>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-large-pool/</link>
		<comments>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-large-pool/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 06:34:19 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=315</guid>
		<description><![CDATA[
Oracle provides the ability to create an optional area in the SGA (System Global Area), called the Large Pool, to provide large memory allocations for the following. By allocating session memory from the large pool for shared server, Oracle XA, or parallel query buffers, Oracle can use the shared pool primarily for caching shared SQL [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Oracle" src="http://loadrunnertnt.com/images/oracle_logo.jpg" alt="" width="221" height="76" /></p>
<p>Oracle provides the ability to create an optional area in the SGA (System Global Area), called the <strong>Large Pool</strong>, to provide large memory allocations for the uses listed below. By allocating session memory from the large pool for shared server, Oracle XA, or parallel query buffers, Oracle can use the shared pool primarily for caching shared SQL and avoid the performance overhead caused by shrinking the shared SQL cache.<span id="more-315"></span></p>
<p>Some of the uses of <strong>Large Pool</strong> are listed below:</p>
<ul>
<li>Session memory for the shared server and the Oracle XA interface</li>
<li>I/O server processes (DBW0)</li>
<li>Oracle backup and restore operations</li>
<li>Parallel execution message buffers, if the initialization parameter PARALLEL_AUTOMATIC_TUNING is set to true (otherwise, these buffers are allocated to the shared pool)</li>
</ul>
<p><a title="Oracle Memory Architecture" href="http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96524/c08memor.htm" target="_blank">(Source: Oracle Memory Architecture)</a></p>
<p>The counters that we should be watching are as follows:</p>
<ul>
<li>V$SGASTAT: Large Pool bytes</li>
</ul>
<p>What we want to see from the counters:</p>
<ul>
<li>Watch the value for free memory in the results returned. High or increasing values for free memory suggest that you have over-allocated space to the Large Pool. Low or decreasing values suggest that you may need to increase the Large Pool size.</li>
</ul>
<p>The following query returns the Large Pool statistics, including free memory:</p>
<ul>
<li>SELECT name, bytes FROM V$SGASTAT WHERE pool = 'large pool';</li>
</ul>
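<p>As a rough way to read the result, the free memory row can be expressed as a share of the whole pool. The Python sketch below assumes the query output has been copied into a list of (name, bytes) pairs; the figures and the helper name <code>free_memory_share</code> are invented for illustration.</p>

```python
# Hypothetical (name, bytes) rows as returned by the V$SGASTAT query; values are invented.
large_pool_rows = [("free memory", 6_291_456), ("session heap", 2_097_152)]


def free_memory_share(rows):
    """Fraction of the pool that is reported under the 'free memory' name."""
    total = sum(size for _, size in rows)
    free = sum(size for name, size in rows if name == "free memory")
    return free / total if total else 0.0


share = free_memory_share(large_pool_rows)
print(f"Large Pool free: {share:.0%}")
```

<p>Tracking this share across several samples taken during a load test shows whether free memory is trending up (possibly over-allocated) or down (possibly too small).</p>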
<p><strong>Real world scenario</strong></p>
<p>When using the Recovery Manager feature in Oracle, RMAN will buffer its I/O activity in the Shared Pool. If database activity is high while the RMAN backup is running, &#8220;ORA-04031 unable to allocate x bytes of shared memory&#8221; errors can result. This error indicates that a large piece of contiguous memory was requested in the Shared Pool, but could not be obtained. If this request is coming from RMAN, errors in the backup and poor application performance can result.  When the Large Pool is present, RMAN will use the Large Pool to buffer this I/O instead of the Shared Pool. This will result in not only better RMAN performance but also in improved Shared Pool hit-ratios.</p>
<p><a title="OCP Oracle 9i Performance Tuning Study Guide" href="http://www.amazon.com/gp/product/0782140653?ie=UTF8&amp;tag=pertesloatipt-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0782140653" target="_blank">(Source: OCP Oracle 9i Performance Tuning Study Guide)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-large-pool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Oracle SGA Java Pool</title>
		<link>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-java-pool/</link>
		<comments>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-java-pool/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 06:31:37 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=313</guid>
		<description><![CDATA[
Oracle includes several application program interfaces and Java class libraries in order to facilitate the interaction of Java-based applications with Oracle databases.  Oracle also allows you to dedicate a portion of the SGA, called Java Pool, as the location where session-specific Java code and application variables reside during program execution.
Similarly, like the Large Pool, the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Oracle" src="http://loadrunnertnt.com/images/oracle_logo.jpg" alt="" width="221" height="76" /></p>
<p>Oracle includes several application program interfaces and Java class libraries in order to facilitate the interaction of Java-based applications with Oracle databases.  Oracle also allows you to dedicate a portion of the SGA, called <strong>Java Pool</strong>, as the location where session-specific Java code and application variables reside during program execution.<span id="more-313"></span></p>
<p>As with the <strong>Large Pool</strong>, V$SGASTAT is used to determine the amount of memory allocated to the <strong>Java Pool</strong>.  The counters that we should be watching are as follows:</p>
<ul>
<li>V$SGASTAT: Java Pool bytes</li>
</ul>
<p>What we want to see from the counters:</p>
<ul>
<li>The value for <em>free memory</em> can be used to help tune the size of the Java Pool.  If V$SGASTAT shows high or increasing values for <em>free memory</em> when compared to <em>memory in use</em>, you have probably over-allocated space to the Java Pool.  If you have low or decreasing values for <em>free memory</em>, you may need to consider increasing the Java Pool size.</li>
</ul>
<p>The following query returns the Java Pool statistics, including free memory:</p>
<ul>
<li>SELECT name, bytes FROM V$SGASTAT WHERE pool = 'java pool';</li>
</ul>
<p><span><span><a title="OCP Oracle 9i Performance Tuning Study Guide" href="http://www.amazon.com/gp/product/0782140653?ie=UTF8&amp;tag=pertesloatipt-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0782140653" target="_blank"><span style="color: #0066cc;">(Source: OCP Oracle 9i Performance Tuning Study Guide)</span></a></span></span></p>
<p>For Oracle 10g, Oracle recommends that the total required Java pool memory range between 10 and 50MB for Dedicated Servers.  For Shared Servers, this could be more than 100MB.</p>
<p><a title="Oracle 10 Database Java Application Performance" href="http://www.stanford.edu/dept/itss/docs/oracle/10g/java.101/b12021/perf.htm#i1006240" target="_blank">(Source: Oracle® Database Java Developer&#8217;s Guide 10g Release 1 (10.1),10 Oracle Database Java Application Performance) </a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-java-pool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Oracle SGA Database Buffer Cache</title>
		<link>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-database-buffer-cache/</link>
		<comments>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-database-buffer-cache/#comments</comments>
		<pubDate>Wed, 08 Oct 2008 06:36:37 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=318</guid>
		<description><![CDATA[Like the Shared Pool (Library Cache and Data Dictionary Cache), the performance of the Database Buffer Cache is determined by the cache hit-ratio.  Cache hits occur whenever a user process finds that a data buffer needed by its SQL statement is already cached in memory.  Conversely, cache misses occur when the user process does not [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Oracle" src="http://loadrunnertnt.com/images/oracle_logo.jpg" alt="" width="221" height="76" />Like the <strong>Shared Pool</strong> (Library Cache and Data Dictionary Cache), the performance of the <strong>Database Buffer Cache</strong> is determined by the cache hit-ratio.  Cache hits occur whenever a user process finds that a data buffer needed by its SQL statement is already cached in memory.  Conversely, cache misses occur when the user process does not find the requested data in memory, causing the data to be read from disk instead.  High cache hit ratios indicate that your application users are frequently finding that the data buffers they need are already in memory, thus reducing the need (and delay) to read from disk (remember that disk reads are much slower than memory).</p>
<p><span id="more-318"></span>On the other hand, there are also non-hit ratio measures of <strong>Database Buffer Cache</strong> effectiveness such as Free Buffer Inspected Waits, Buffer Busy Wait events and Free Buffer Wait events.  To find out about the hit-ratio for the Database Buffer Cache, we can use V$SYSSTAT, V$SYSTEM_EVENT and STATSPACK.</p>
<p>The counters that we should be watching are as follows:</p>
<ul>
<li>V$SYSSTAT: Buffer Cache Hit Ratio</li>
<li>V$SYSSTAT &amp; V$SYSTEM_EVENT: Free Buffer Inspected</li>
<li>STATSPACK &gt; Instance Efficiency Percentages (Target 100%): Buffer Hit %</li>
<li>STATSPACK &gt; Instance Activity Stats for DB: &lt;instance&gt;: Free Buffer Inspected</li>
<li>STATSPACK &gt; Buffer Pool Statistics for DB: &lt;instance&gt;: Buffer Busy Wait</li>
</ul>
<p>What we want to see from the counters:</p>
<ul>
<li>Buffer Cache Hit Ratio should be higher than 90% for OLTP systems according to Oracle.</li>
</ul>
<p>The following queries will be used to determine the hit-ratio and the non-hit-ratio statistics:</p>
<ul>
<li>SELECT 1 - ((physical.value - direct.value - lobs.value) / logical.value) "Buffer Cache Hit Ratio" FROM V$SYSSTAT physical, V$SYSSTAT direct, V$SYSSTAT lobs, V$SYSSTAT logical WHERE physical.name = 'physical reads' AND direct.name = 'physical reads direct' AND lobs.name = 'physical reads direct (lob)' AND logical.name = 'session logical reads';</li>
<li>SELECT name, value FROM V$SYSSTAT WHERE name IN ('free buffer inspected') UNION SELECT event, total_waits FROM V$SYSTEM_EVENT<br />
WHERE event IN ('free buffer waits', 'buffer busy waits');</li>
</ul>
<p>To know more about the counters, read on.</p>
<p><span style="text-decoration: underline;"><strong>[1] V$SYSSTAT</strong></span></p>
<p>Of the 200 different statistics available, the hit-ratio of the <strong>Database Buffer Cache</strong> is calculated from four: Physical Reads, Physical Reads Direct, Physical Reads Direct [LOB] and Session Logical Reads.   Using these four statistics, we compare the physical reads that were not made intentionally (i.e. those made under normal circumstances) against the total number of hits made on the cache, represented by Session Logical Reads.  The general formula for the Database Buffer Cache hit-ratio is as follows:</p>
<p>1 &#8211; ((physical reads &#8211; physical reads direct &#8211; physical reads direct [LOB]) / session logical reads)</p>
<p>The SQL is as follows:</p>
<p>SELECT 1 - ((physical.value - direct.value - lobs.value) / logical.value) "Buffer Cache Hit Ratio" FROM V$SYSSTAT physical, V$SYSSTAT direct, V$SYSSTAT lobs, V$SYSSTAT logical WHERE physical.name = 'physical reads' AND direct.name = 'physical reads direct' AND lobs.name = 'physical reads direct (lob)' AND logical.name = 'session logical reads';</p>
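<p>To make the formula concrete, here is a small worked example in Python. The counter values are made up, not taken from a real instance, and the function name <code>buffer_cache_hit_ratio</code> is our own; the arithmetic simply mirrors the formula above.</p>

```python
def buffer_cache_hit_ratio(physical_reads, physical_reads_direct,
                           physical_reads_direct_lob, session_logical_reads):
    """1 - ((physical - direct - direct LOB) / logical), as in the formula above."""
    if session_logical_reads == 0:
        raise ValueError("session logical reads is zero; the ratio is undefined")
    # Direct reads bypass the cache on purpose, so they are excluded from the misses.
    cache_misses = physical_reads - physical_reads_direct - physical_reads_direct_lob
    return 1 - cache_misses / session_logical_reads


# Made-up counter values for illustration only.
ratio = buffer_cache_hit_ratio(
    physical_reads=12_000,
    physical_reads_direct=1_500,
    physical_reads_direct_lob=500,
    session_logical_reads=200_000,
)
print(f"Buffer Cache Hit Ratio: {ratio:.2%}")
```

<p>Here 10,000 of the 12,000 physical reads count as cache misses, which gives a hit ratio of 95% against 200,000 logical reads.</p>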
<p><em>Physical Reads; </em>This indicates the number of data blocks (i.e. tables, indexes, and rollback segments) read from disk into the Buffer Cache since instance startup.</p>
<p><em>Physical Reads Direct; </em>This indicates the number of reads that bypassed the Buffer Cache because the data blocks were read directly from disk instead.  Because direct physical reads are done intentionally by Oracle when using certain features like Parallel Query, these reads are subtracted from the Physical Reads value when the Buffer Cache hit ratio is calculated.  Otherwise, including these direct reads in the Buffer Cache hit ratio calculation would result in an artificially low hit ratio.</p>
<p><em>Physical Reads Direct (LOB);</em> This indicates the number of reads that bypassed the Buffer Cache because the data blocks were associated with a Large Object (LOB) data type.</p>
<p>Because direct physical reads are done intentionally by Oracle when accessing segments that contain LOB datatypes, these reads are also subtracted from the Physical Reads value when the Buffer Cache hit-ratio is calculated.  Including these direct reads in the Buffer Cache hit ratio calculation would also result in an artificially low hit ratio.</p>
<p><em>Session Logical Reads; </em>This indicates the number of times a request for a data block was satisfied by using a buffer that was already cached in the Database Buffer cache.  For read consistency, some of these buffers may have contained data from rollback segments.</p>
<p>Hit-ratios are one effective measure of <strong>Database Buffer Cache</strong> performance.  However, there are also non-hit-ratio measures of Database Buffer Cache performance. TIP: Because these statistics can help guide you towards the true tuning trouble-spot, using them as a starting point for Buffer Cache tuning is often more effective than calculating Buffer Cache hit ratios and blindly increasing the size of the Buffer Cache.  V$SYSSTAT and V$SYSTEM_EVENT can be used to monitor these secondary indicators of Buffer Cache performance.  V$SYSSTAT contains the Free Buffer Inspected statistic, and V$SYSTEM_EVENT contains statistics on Free Buffer Waits and Buffer Busy Waits.  All three can be combined into one query using the SQL below:</p>
<p>SELECT name, value FROM V$SYSSTAT WHERE name IN ('free buffer inspected') UNION SELECT event, total_waits FROM V$SYSTEM_EVENT<br />
WHERE event IN ('free buffer waits', 'buffer busy waits');</p>
<p><em>Free Buffer Inspected; </em>Number of Buffer Cache buffers inspected by user Server Process before finding a buffer.  A closely related statistic is dirty buffer inspected, which represents the total number of dirty buffers a user process found while trying to find a free buffer.</p>
<p><em>Free Buffer Waits; </em>Number of waits experienced by user Server Processes during Free Buffer Inspected activity.  These waits occur whenever the Server Process had to wait for Database Writer to write a dirty buffer to disk.</p>
<p><em>Buffer Busy Waits; </em>Number of times user Server Processes waited for a free buffer to become available.  These waits occur whenever a buffer requested by user Server Processes is already in memory, but is in use by another process.  These waits can occur for rollback segment buffers as well as data and index buffers.</p>
<p>The SQL statement queries the current state of the statistics, and it is recommended to record a benchmark over a period of time before proceeding with a load test.  High or steadily increasing values for any of these statistics indicate that user Server Processes are spending too much time searching for, and waiting for access to, free buffers in the Database Buffer Cache.</p>
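<p>Because these counters accumulate from instance startup, comparing a benchmark snapshot against a snapshot taken after the load test is more telling than a single reading. The Python sketch below assumes both snapshots have been saved as name/value dictionaries; the numbers and the helper name <code>counter_deltas</code> are hypothetical.</p>

```python
def counter_deltas(before, after):
    """Return how much each cumulative counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Hypothetical snapshots of the query results; the numbers are illustrative only.
baseline = {"free buffer inspected": 1_200, "free buffer waits": 10, "buffer busy waits": 340}
after_test = {"free buffer inspected": 9_700, "free buffer waits": 85, "buffer busy waits": 2_140}

deltas = counter_deltas(baseline, after_test)
# Print the fastest-growing counters first.
for name, growth in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{growth}")
```

<p>The counter with the largest growth during the test window is a reasonable first place to look.</p>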
<p><span style="text-decoration: underline;"><strong>[2] STATSPACK</strong></span></p>
<p>Unlike V$SYSSTAT and V$SYSTEM_EVENT, STATSPACK does not require you to calculate the statistics, as they are calculated before display.  In the Instance Efficiency Percentages section, look at &#8220;Buffer Hit %&#8221; for the Database Buffer Cache hit-ratio.  For the non-hit-ratio measures, look at Instance Activity Stats for DB: &lt;instance&gt;: Free Buffer Inspected for the free buffer statistics, and at Buffer Pool Statistics for DB: &lt;instance&gt;: Buffer Busy Wait for the number of times user Server Processes waited for a free buffer to become available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-database-buffer-cache/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Oracle SGA Shared Pool (Library Cache)</title>
		<link>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-shared-pool-library-cache/</link>
		<comments>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-shared-pool-library-cache/#comments</comments>
		<pubDate>Sun, 28 Sep 2008 08:05:59 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=345</guid>
		<description><![CDATA[Starting with the Oracle System Global Area, we are going to touch on the basics of Shared Pool &#8211; Library Cache followed by Data Dictionary Cache (which will be in another article). The Shared Pool&#8217;s Library Cache is the area where Oracle caches the SQL and PL/SQL statements that have been recently issued by application users. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Oracle" src="http://loadrunnertnt.com/images/oracle_logo.jpg" alt="" width="221" height="76" />Starting with the <strong>Oracle System Global Area</strong>, we are going to touch on the basics of Shared Pool &#8211; <strong>Library Cache</strong> followed by <strong>Data Dictionary Cache </strong>(which will be in another article). The Shared Pool&#8217;s Library Cache is the area where Oracle caches the SQL and PL/SQL statements that have been recently issued by application users. PL/SQL statements can be in the form of procedures, functions, packages, triggers, anonymous PL/SQL blocks, or Java classes.<span id="more-345"></span></p>
<p>The primary indicator of the performance of the Shared Pool (both Library Cache and Data Dictionary Cache) is the <strong>Cache Hit Ratio</strong>. High cache-hit ratios indicate that the application users are frequently finding the SQL and PL/SQL statements they are issuing already in memory. Oracle recommends tuning the Library Cache before the <strong>Database Buffer Cache</strong> because an under-performing Library Cache has a greater impact on performance than an under-performing cache hit ratio for the Database Buffer Cache. The performance of the Library Cache is measured by calculating its hit ratio. In this article, we will show how to use three tools, namely <strong>V$LIBRARYCACHE</strong>, <strong>STATSPACK</strong> and <strong>OEM Performance Manager</strong>, to analyze the hit ratio.</p>
<p>The following counters are associated with the Library Cache.</p>
<ul>
<li>V$LIBRARYCACHE: GETHITRATIO column</li>
<li>V$LIBRARYCACHE: PINHITRATIO column</li>
<li>V$LIBRARYCACHE: INVALIDATIONS column</li>
<li>V$LIBRARYCACHE: RELOAD ratio</li>
<li>STATSPACK &gt; Instance Efficiency Percentages (Target 100%): Library Hit %</li>
<li>STATSPACK &gt; Library Cache Activity: Invalidations: Pct Miss</li>
<li>STATSPACK &gt; Library Cache Activity: Reloads</li>
<li>OEM  &gt; Performance Manager &gt; Database Instance Category &gt; Library Cache Hit %</li>
</ul>
<p>The thresholds for the Library Cache are as follows:</p>
<ul>
<li>The higher the GETHITRATIO, the better the application is performing. According to Oracle, well-tuned OLTP systems can expect to have a GETHITRATIO of 90% or higher for the SQL Area portion of the Library Cache. DSS and data warehouse applications often have lower hit ratios because of the ad hoc nature of the queries against them.</li>
<li>Oracle recommends that well-tuned OLTP systems strive for PINHITRATIO that exceeds 90% for the SQL Area portion. DSS and data warehouse applications are generally lower.</li>
<li>Low reload ratio for PINS is good. Oracle considers well-tuned systems to be those with Reload Ratios of less than 1 percent.</li>
<li>High values for INVALIDATIONS mean additional overhead for the application. Therefore, performing activities that might cause INVALIDATIONS should be weighed against the expected benefit of those activities.</li>
<li>For STATSPACK, look at Library Hit % for the percentage of Library Cache hit-ratio. The Namespace, <strong>SQL AREA</strong> should have a 90% hit-ratio for the Library Cache.</li>
</ul>
<p>Queries to be used for Library Cache:</p>
<ul>
<li>SELECT NAMESPACE,GETHITRATIO,PINHITRATIO FROM V$LIBRARYCACHE;</li>
<li>SELECT SUM(RELOADS)/SUM(PINS) "Reload Ratio" FROM V$LIBRARYCACHE;</li>
</ul>
<p>The counters are explained below; reading through the explanations is recommended if you are unsure what they mean.</p>
<p><strong><span style="text-decoration: underline;">[1] V$LIBRARYCACHE </span></strong></p>
<p>A library cache miss can occur at either the Parse or Execute phases of SQL statement processing. The <strong>cache hit ratio</strong> related to the Parse Phase is shown in the GETHITRATIO column of <strong>V$LIBRARYCACHE</strong> while the Execute Phase is shown in the PINHITRATIO column of V$LIBRARYCACHE.  Other than GETHITRATIO and PINHITRATIO, the RELOADS and INVALIDATIONS columns of V$LIBRARYCACHE are also examined to uncover the behaviour of the Library Cache.</p>
<p><strong>GETHITRATIO</strong><br />
Oracle uses the term &#8220;get&#8221; to refer to a type of lock, called a &#8220;Parse Lock&#8221;, that is taken out on an object during the Parse Phase for the statement that references that object. Each time a statement is parsed, the value for GETS in the V$LIBRARYCACHE view is incremented by 1. On the other hand, the column GETHITS stores the number of times that the SQL and PL/SQL statements issued by the application found a parsed copy of themselves already in memory. When this occurs, there is no parsing of the statement required and the user&#8217;s server process just executes the copy already in memory instead. The ratio of statements that did not require parsing (GETHITS) to all parsed statements (GETS) is calculated in the GETHITRATIO column of V$LIBRARYCACHE.</p>
<p><strong>PINHITRATIO</strong></p>
<p>PINS, like GETS, are also related to locking. However, while GETS are associated with locks that occur at Parse time, PINS are related to locks that occur at Execution time. These locks are the short-term locks used when accessing an object. Therefore, each library cache GET also requires an associated PIN, in either Shared or Exclusive mode, before accessing the statement&#8217;s referenced objects. Each time a statement is executed, the value for PINS is incremented by 1. The PINHITRATIO column in V$LIBRARYCACHE indicates how frequently executed statements found the associated parsed SQL already in the Library Cache.</p>
<p><strong>RELOADS</strong></p>
<p>The RELOADS column shows the number of times that an executed statement had to be re-parsed because the library cache had aged out or invalidated the parsed version of the statement.  Examining the RELOADS column on its own will not tell you much.  Reload activity can be monitored by comparing the number of statements that have been executed (PINS) to the number of those statements that required a reload (RELOADS), calculated with the following SQL:</p>
<div>SELECT SUM(RELOADS)/SUM(PINS) "Reload Ratio" FROM V$LIBRARYCACHE;</div>
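<p>For readers who prefer to check the arithmetic offline, the Python sketch below recomputes the Reload Ratio and the per-namespace get hit ratio from a few invented V$LIBRARYCACHE rows. The figures and the helper names <code>reload_ratio</code> and <code>get_hit_ratio</code> are assumptions made for this example only.</p>

```python
# Hypothetical V$LIBRARYCACHE rows; the figures are invented for illustration.
library_cache_rows = [
    {"namespace": "SQL AREA", "gets": 50_000, "gethits": 47_500,
     "pins": 200_000, "reloads": 400},
    {"namespace": "TABLE/PROCEDURE", "gets": 8_000, "gethits": 7_600,
     "pins": 50_000, "reloads": 100},
]


def reload_ratio(rows):
    """SUM(RELOADS) / SUM(PINS), mirroring the query above."""
    total_pins = sum(r["pins"] for r in rows)
    return sum(r["reloads"] for r in rows) / total_pins if total_pins else 0.0


def get_hit_ratio(row):
    """GETHITS / GETS for a single namespace."""
    return row["gethits"] / row["gets"] if row["gets"] else 0.0


overall_reload_ratio = reload_ratio(library_cache_rows)
print(f"Reload Ratio: {overall_reload_ratio:.2%}")
for r in library_cache_rows:
    print(r["namespace"], f"{get_hit_ratio(r):.1%}")
```

<p>With these sample figures the Reload Ratio is 0.2%, below the 1 percent guideline, and the SQL AREA get hit ratio is 95%.</p>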
<p><strong>INVALIDATIONS</strong></p>
<p>INVALIDATIONS occur when a cached SQL statement is marked as invalid and forced to parse even though it was already in the Library Cache. Cached statements are marked as invalid whenever the objects they reference are modified in some way. For example, recompiling a view that was used by previously cached SQL statements will cause those cached statements to be marked as invalid. Therefore, any subsequent statements that use this view will need to be parsed even though an exact copy of that statement may already be cached. High values for INVALIDATIONS mean additional overhead for the application. Therefore, performing activities that might cause invalidations should be weighed against the expected benefit of those activities.</p>
<p><span style="text-decoration: underline;"><strong>[2] STATSPACK</strong></span></p>
<p>The same cache hit-ratio information can be found in the output of STATSPACK. Library Cache performance statistics are included in the STATSPACK section headed Instance Efficiency Percentages (Target 100%). Look at Library Hit % for the Library Cache hit ratio. INVALIDATIONS and RELOADS are also reported by STATSPACK, under Library Cache Activity. Observe Pct Miss there: subtracting it from 100% gives the hit percentage.</p>
<p><span style="text-decoration: underline;"><strong>[3] OEM Performance Manager</strong></span></p>
<p>From OEM, access Performance Manager &gt; Database Instance Category &gt; Library Cache Hit %. This gives a runtime view of the Library Cache hit ratio, which eliminates the need to query V$LIBRARYCACHE. However, as this is a runtime view, you cannot use it for analysis once a load test has finished.  For post-test analysis, it is therefore advisable to use the Custom Query of Oracle Monitoring in LoadRunner to query <strong>V$LIBRARYCACHE</strong> during the load test.</p>
<p><a title="OCP Oracle 9i Performance Tuning Study Guide" href="http://www.amazon.com/gp/product/0782140653?ie=UTF8&amp;tag=pertesloatipt-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0782140653" target="_blank">(Source: OCP Oracle 9i Performance Tuning Study Guide)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/analyzing-oracle-sga-shared-pool-library-cache/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Detecting network bottlenecks</title>
		<link>http://www.loadrunnertnt.com/analyze/detecting-network-bottlenecks/</link>
		<comments>http://www.loadrunnertnt.com/analyze/detecting-network-bottlenecks/#comments</comments>
		<pubDate>Sat, 13 Sep 2008 06:44:20 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[network]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=325</guid>
		<description><![CDATA[At the network level, many things can affect performance. The bandwidth (the amount of data that can be carried by the network) tends to be the first culprit checked. Assuming you have determined that bad performance is attributable to the network component of an application, there is more likely cause of bad network performance than [...]]]></description>
			<content:encoded><![CDATA[<p>At the <strong>network</strong> level, many things can affect performance. The <strong>bandwidth</strong> (the amount of data that can be carried by the network) tends to be the first culprit checked. Assuming you have determined that bad performance is attributable to the network component of an application, there is more likely cause of bad network performance than network bandwidth. The most likely cause of network performance is the application itself and how it is handling distributed data and functionality.<span id="more-325"></span></p>
<p>The overall speed of a particular network connection is limited by (a) the slowest link in the connection chain and (b) the length of the chain. Identifying the slowest link is difficult and may not even be consistent: it can vary at different times of the day or for different communication paths. A network communication path lead from an application through a <strong>TCP/IP stack</strong> (which adds various layers of headers, possibly encrypting and compressing data as well), then through the hardware interface, through a modem, over a phone line, through another modem, over to a service provider’s router, through many heavily congested data lines of various carrying capacities and multiple routers with different maximum throughputs and configurations, to a machine at the other end with its own hard interface, TCP/IP stack and application. A typical web download route is just like this. In addition, there are dropped packets, acknowledgments, retries, bus contention, and so on.</p>
<p>Because so many possible causes of bad <strong>network performance</strong> are external to an application, one option you can consider including in an application is a network speed testing facility that reports to the user. This should test the speed of data transfer from the machine to various destinations: to itself, to another machine on the local network, to the Internet Service Provider, to the target server across the network, and to any other destinations appropriate. This type of diagnostics report can tell users that they are obtaining bad performance from something other than your application. If you feel that the performance of your application is limited by the actual network communication speed, and not by other (application) factors, this facility will report the maximum possible speeds to your user.</p>
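<p>A minimal sketch of the first of those measurements (the machine to itself) might look like the following. It assumes a plain TCP connection over the loopback address; the function names and the payload size are invented for the example, and a real facility would repeat the measurement against the other destinations listed above.</p>

```python
import socket
import threading
import time

def _sink(server):
    """Accept one connection and discard everything it sends."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(65536):
            pass

def loopback_throughput(payload_bytes=1_000_000):
    """Return bytes per second for one transfer to this same machine."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    receiver = threading.Thread(target=_sink, args=(server,), daemon=True)
    receiver.start()

    data = b"x" * payload_bytes
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(data)
    receiver.join()  # wait until the receiver has drained the data
    elapsed = time.perf_counter() - start
    server.close()
    return payload_bytes / elapsed

if __name__ == "__main__":
    rate = loopback_throughput()
    print(f"loopback: {rate / 1_000_000:.1f} MB/s")
```

<p>The loopback figure is an upper bound for that machine. Comparing it with the figures for progressively more distant destinations shows where along the path the speed drops.</p>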
<p><span style="text-decoration: underline;"><strong>Latency</strong></span></p>
<p><strong>Latency</strong> is different from the load-carrying capacity (<strong>bandwidth</strong>) of a network. <strong>Bandwidth</strong> refers to how much data can be sent down the communication channel in a given period of time and is limited by the link with the lowest bandwidth in the communication chain. The latency is the amount of time a particular data packet takes to get from one end of the communication channel to the other. Bandwidth tells you the limits within which your application can operate before performance becomes affected by the volume of data being transmitted. Latency often affects the user’s view of the performance even when bandwidth isn’t a problem.</p>
<p>In most cases, especially for Internet traffic, <strong>latency</strong> is an important concern. You can determine the basic round-trip time for data packets between any two machines using the <em>ping</em> utility. (Refer to <a title="Understanding Network - How Ping Works?" href="index.php?option=com_content&amp;view=article&amp;id=75:understanding-network-how-ping-works&amp;catid=34:concepts&amp;Itemid=41" target="_blank">&#8220;Understanding Network &#8211; How Ping works?&#8221;</a>). However, the time measured is for a basic underlying protocol (an ICMP packet) to travel between the machines. If the communication channel is congested and the overlying protocol requires re-transmissions (often the case for Internet traffic), one transmission at the application level can actually be equivalent to many round trips.</p>
<p>It is important to be aware of these limitations. However, it is often possible to tune the application to minimize the number of transfers by (a) packing data together, (b) caching and (c) redesigning the distributed application protocol to aim for a less conversational mode of operation. At the network level, you need to monitor the transmission statistics (using the <em>ping</em> and <em>netstat</em> utilities and packet sniffers) and consider tuning any network parameters that you have access to in order to reduce re-transmissions.</p>
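<p>To see why packing data together helps, consider a deliberately simplified model in which every application-level message waits for one full round trip. The link figures below (100 ms round trip, 1,000,000 bytes per second) and the function name are assumptions chosen for the example.</p>

```python
def transfer_time_s(messages, bytes_per_message, rtt_s, bandwidth_bytes_per_s):
    """Estimate elapsed time when every message waits for one round trip."""
    latency_cost = messages * rtt_s
    volume_cost = messages * bytes_per_message / bandwidth_bytes_per_s
    return latency_cost + volume_cost

# Hypothetical link: 100 ms round trip, 1,000,000 bytes/s of bandwidth.
RTT_S = 0.1
BANDWIDTH = 1_000_000

chatty = transfer_time_s(100, 200, RTT_S, BANDWIDTH)   # 100 small requests
packed = transfer_time_s(1, 20_000, RTT_S, BANDWIDTH)  # same data, one request

print(f"conversational: {chatty:.2f} s, packed: {packed:.2f} s")
```

<p>Both runs move the same 20,000 bytes. In this model the difference comes entirely from the number of round trips, which is why a less conversational protocol pays off on high-latency links.</p>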
<p><span style="text-decoration: underline;"><strong>TCP/IP Stacks</strong></span></p>
<p>The <strong>TCP/IP stack</strong> is the section of code that is responsible for translating each application-level network request (send, receive, connect, etc.) through the transport layers down to the wire and back up to the application at the other end of the connection. Because the stacks are usually delivered with the operating system and performance-tested before delivery (since a slow network connection on an otherwise fast machine and fast network is pretty obvious), it is unlikely that the TCP/IP stack itself is a performance problem.</p>
<p>In addition to the stack itself, stacks include several tunable parameters. One parameter worth mentioning is the <strong>maximum packet size</strong>. When your application sends data, the underlying protocol breaks the data into packets that are transmitted. There is an optimal size for packets transmitted over a particular communication channel, and the packet size actually used by the stack is a compromise. Smaller packets are less likely to be dropped, but they introduce more overhead, as data probably has to be broken up into more packets with more header overhead.</p>
<p>If your communication takes place over a particular set of endpoints, you may want to alter the packet sizes. For a LAN segment with no router involved, the packets can be big (e.g. 8KB). For a LAN with routers, you probably want to set the maximum packet size to the size the routers allow to pass unbroken. (Routers can break up the packets into smaller ones; 1500 bytes is the typical maximum packet size and the standard for Ethernet. The maximum packet size is configurable by the router’s network administrator.) If your application is likely to be sending data over the Internet and you cannot guarantee the route and quality of routers it will pass through, 500 bytes per packet is likely to be optimal.</p>
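<p>The header overhead side of that compromise can be estimated with a little arithmetic. The sketch below assumes 40 bytes of header per packet (a 20-byte IPv4 header plus a 20-byte TCP header, without options) and a 1,000,000-byte payload; the function name is invented for the example, and the cost of dropped packets is not modelled.</p>

```python
HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

def packet_cost(payload_bytes, max_packet_size, header_bytes=HEADER_BYTES):
    """Return (packet_count, total_header_bytes) for one payload."""
    if not max_packet_size > header_bytes:
        raise ValueError("packet size must exceed the header size")
    data_per_packet = max_packet_size - header_bytes
    # Integer ceiling division: a partly filled last packet still counts.
    packets = (payload_bytes + data_per_packet - 1) // data_per_packet
    return packets, packets * header_bytes

for size in (1500, 500):
    packets, overhead = packet_cost(1_000_000, size)
    print(f"max packet {size}: {packets} packets, {overhead} header bytes")
```

<p>Under these assumptions the 500-byte setting needs roughly three times as many packets as the 1500-byte setting, which is the price paid for packets that are less likely to be dropped or fragmented.</p>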
<p><span style="text-decoration: underline;"><strong>Network Bottlenecks</strong></span></p>
<p>Other causes of slow network I/O can be attributed directly to the load or configuration of the network. For example, a LAN may become congested when many machines are simultaneously trying to communicate over the network. The potential throughput of the network could handle the load, but the algorithms to provide communication channels slow the network, resulting in a lower maximum throughput. A congested Ethernet network has an average throughput of approximately one third of the potential maximum throughput. Congested networks have other problems, such as dropped network packets. If you are using TCP, the communication rate on a congested network is much slower as the protocol automatically resends the dropped packets. If you are using UDP, your application must resend multiple copies for each transfer. Dropping packets in this way is common for the Internet. For LANs, you need to coordinate closely with network administrators to alert them to the problem. For single machines connected through a service provider, there are still improvements you can pursue. The phone line to the service provider may be noisier than expected: if so, you also need to speak to the phone line provider. It is also worth checking with the service provider, who should have optimal configurations they can demonstrate.</p>
<p>Dropped packets and re-transmissions are a good indication of network congestion problems, and you should be on constant lookout for them. Dropped packets often occur when routers are overloaded and find it necessary to drop some of the packets being transmitted as the router’s buffers overflow. This means that the overlying protocol will request the packets to be resent. The <em>netstat</em> utility lists re-transmission and other statistics that can identify these sorts of problems. Re-transmissions may indicate that the maximum packet size is too large.</p>
<p><span style="text-decoration: underline;"><strong>DNS Lookup</strong></span></p>
<p>Looking up a network address is an often-overlooked cause of bad network performance. When your application tries to connect to a network address such as foo.bar.something.org, your application first translates foo.bar.something.org into a four-byte network IP address such as 10.33.6.45. This is the actual address that the network understands and uses for routing network packets. <strong>DNS</strong> translation works as follows:</p>
<ol>
<li>The machine running the application sends the text string of the hostname (e.g. foo.bar.something.org) to the DNS server.</li>
<li>The DNS server checks its cache to find an IP address corresponding to that hostname. If the server does not find an entry in the cache, it asks its own DNS server (usually further up the Internet domain-name hierarchy) until ultimately the name is resolved. (This may be by components of the name being resolved, e.g. first .org, then something.org, etc., each time asking another machine as the search request is successively resolved.) This resolved IP address is added to the DNS server’s cache.</li>
<li>The IP address is returned to the original machine running the application.</li>
<li>The application uses the IP address to connect to the desired destination.</li>
</ol>
<p>The address lookup does not need to be repeated once a connection is established, but any other connections (within the same session of the application or in other sessions at the same time and later) need to repeat the lookup procedure to start another connection.</p>
<p>You can improve this situation by running a <strong>DNS server</strong> locally on the machine, or on a local server if the application uses a LAN. A DNS server can be run as a “caching-only” server that resets its cache each time the machine is rebooted. There would be little point in doing this if the machine used only one or two connections per hostname between successive reboots. For more frequent connections, a local DNS server can provide a noticeable speedup to connections. <strong>Nslookup</strong> is useful for investigating how a particular system does translations.</p>
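<p>The effect of caching resolved names can be shown with an in-process analogue. This is not a DNS server; it is only a sketch of the same idea inside an application. The wrapper name <em>make_caching_resolver</em> and the stand-in lookup are assumptions for the example, and the sketch never expires entries, which a real resolver cache would do.</p>

```python
import socket

def make_caching_resolver(lookup=socket.gethostbyname):
    """Wrap a hostname lookup so repeated names skip the resolver."""
    cache = {}

    def resolve(hostname):
        if hostname not in cache:
            cache[hostname] = lookup(hostname)  # only the first call pays
        return cache[hostname]

    return resolve

# Demonstration with a stand-in lookup so no network access is needed.
calls = []

def fake_lookup(hostname):
    calls.append(hostname)
    return "10.33.6.45"

resolve = make_caching_resolver(fake_lookup)
resolve("foo.bar.something.org")
resolve("foo.bar.something.org")
print(f"lookups performed: {len(calls)}")
```

<p>Calling <em>make_caching_resolver()</em> with no argument wraps the system resolver instead of the stand-in, so only the first connection to each hostname pays for the lookup.</p>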
<p><a title="Java Performance Tuning" href="index.php?view=article&amp;catid=38%3Arecommended-resources&amp;id=66%3Abooks&amp;option=com_content&amp;Itemid=41" target="_blank">(Source: &#8220;Java Performance Tuning&#8221;)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/detecting-network-bottlenecks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting memory bottlenecks</title>
		<link>http://www.loadrunnertnt.com/analyze/detecting-memory-bottlenecks/</link>
		<comments>http://www.loadrunnertnt.com/analyze/detecting-memory-bottlenecks/#comments</comments>
		<pubDate>Tue, 09 Sep 2008 06:46:26 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Memory]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=327</guid>
		<description><![CDATA[Monitoring the system memory (RAM) is not usually helpful in identifying memory performance problems. A better indicator will be monitoring paging activities to the page/swap file.  Memory paging activities will be covered in another article.  Most current OS have virtual memory comprising of the actual (real) system memory using physical RAM chips, and [...]]]></description>
			<content:encoded><![CDATA[<p>Monitoring the system memory (RAM) is not usually helpful in identifying <strong>memory</strong> performance problems. A better indicator will be monitoring <strong>paging activities</strong> to the page/swap file.  Memory paging activities will be covered in another article.  Most current OS have <strong>virtual memory</strong> comprising of the actual (real) system memory using physical RAM chips, and one or more page/swap files on the system disks. Processes that are currently running are operating in real memory. The OS can take pages from any of the processes currently in real memory and swap them out to disk. This is known as paging. Paging leaves free space in real memory to allocate to other processes that need to bring in a page from disk. Obviously, if all the processes currently running can fit into real memory, there is no need for the system to swap out any pages. However, if there are too many processes to fit into real memory, <strong>paging</strong> allows the system to free up system memory to run more processes.<span id="more-327"></span></p>
<p><strong>Paging</strong> affects system performance in many ways. One, is that if a process moved some some pages to disk and the process becomes runnable, the OS has to retrieve the pages from the disk before that process can be run which results to delays in performance. In addition, both CPU and the disk I/O spend time doing the paging, reducing available processing power and increasing the load on the disks. This cascading effect involving both the CPU and I/O can degrade the performance of the whole system in such a way that it maybe difficult to even recognize that paging is the problem. The extreme version of too much paging is <strong>thrashing</strong>, in which the system is spending too much time moving pages around that it fails to perform any other significant work. (The next step is likely to be a system crash).</p>
<p>A little paging of the system does not affect the performance enough to cause concern. In fact, some paging can be considered good which indicates that the system’s memory resources are fully utilized. But at the point where paging becomes a significant overhead, the system is overloaded. To monitor paging activities is easy. On UNIX, the utilities <em>vmstat</em> and <em>iostat</em> provide details as to the level of paging, disk activity and memory levels. On Windows, the performance monitor has categories to show these details, such as <em>Pages/sec</em>, <em>Page faults/sec</em> and <em>% Page files usage</em>.</p>
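<p>Whichever tool supplies the numbers, what matters is the rate of change between two samples rather than the raw counters. The sketch below assumes cumulative counters in the &#8220;name value&#8221; layout used by Linux <em>/proc/vmstat</em> (its <em>pswpin</em> and <em>pswpout</em> fields); the sample figures and function names are invented for the example.</p>

```python
def parse_counters(text):
    """Parse 'name value' lines, the layout used by Linux /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def swap_pages_per_second(before, after, interval_s):
    """Pages swapped in plus out per second between two samples."""
    moved = sum(after[key] - before[key] for key in ("pswpin", "pswpout"))
    return moved / interval_s

# Two invented samples taken 10 seconds apart.
first = parse_counters("pswpin 1000\npswpout 4000\n")
second = parse_counters("pswpin 1600\npswpout 4900\n")
print(swap_pages_per_second(first, second, 10.0))
```

<p>A rate that stays near zero during a load test suggests memory is not the constraint. A rate that climbs with the number of virtual users is worth correlating with response times.</p>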
<p>If there is more paging than optimal, the system’s RAM is insufficient or processes are too big. To improve this situation, you need to reduce the memory being used by (a) reducing the number of processes or (b) the memory utilization of some processes. Alternatively, you (c) can add RAM.  If the problem is caused by a combination of your application and others, you can partially address the situation by using <strong>process priorities</strong>. By issuing priorities to the processes, you need to be aware that using this option reduces the amount of RAM available to all other processes, which can make overall system performance worse. Therefore measure the trade-offs before deciding the resolution.</p>
<p><a title="Java Performance Tuning" href="index.php?view=article&amp;catid=38%3Arecommended-resources&amp;id=66%3Abooks&amp;option=com_content&amp;Itemid=41" target="_blank">(Source: &#8220;Java Performance Tuning&#8221;)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/detecting-memory-bottlenecks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Detecting disk bottlenecks</title>
		<link>http://www.loadrunnertnt.com/analyze/detecting-disk-bottlenecks/</link>
		<comments>http://www.loadrunnertnt.com/analyze/detecting-disk-bottlenecks/#comments</comments>
		<pubDate>Thu, 04 Sep 2008 06:50:02 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Disk]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=329</guid>
		<description><![CDATA[In most cases, applications can be tuned so that disk I/O does not cause any serious performance problems. However, performance problem may still persist even after application tuning. In this article, we will be addressing common disk bottlenecks and their tuning techniques.
Identifying whether the system has a problem with disk utilization first. Each system provides [...]]]></description>
			<content:encoded><![CDATA[<p>In most cases, applications can be tuned so that disk I/O does not cause any serious performance problems. However, performance problem may still persist even after application tuning. In this article, we will be addressing common disk bottlenecks and their tuning techniques.</p>
<p>Identifying whether the system has a problem with disk utilization first. Each system provides its own tools to identify disk usage (Windows: <em>Perfmon</em>, and UNIX: <em>sar, vmstat, iostat utilities</em>). For a start, identify whether the paging is an issue (look at disk-scan rates) and assess the overall utilization of your disks (e.g. Disk Transfers/sec on Windows, output from <em>iostat –D</em> on UNIX). It may be that the system has a problem independent of your application (e.g. unbalanced disks), and correcting this problem may resolve it.<span id="more-329"></span></p>
<p>If the disk analysis does not identify an obvious system problem that is causing the I/O overhead, you could try making a disk upgrade or a reconfiguration. This type of tuning can consist of any of the following:</p>
<ul>
<li>Upgrading to faster disks</li>
<li>Adding more swap space to handle larger buffers</li>
<li>Changing the disk to be striped (where files are striped across several disks, thus providing parallel I/O, e.g. with a RAID system)</li>
<li>Running the data on raw partitions when this is shown to be faster.</li>
<li>Distributing simultaneously accessed files across multiple disks to gain parallel I/O</li>
<li>Using memory-mapped disks or files</li>
</ul>
<p>If your application runs on many systems and you are unsure of the specification of the target system, you can never be sure that any particular disk is local to the user.  There is a significant possibility that the disk used by the application is a <strong>network-mounted disk</strong>. This doubles the variability in response times and throughput. A network disk is a shared resource, as is the network itself, so performance is hugely and unpredictably affected by other users and network load.</p>
<p>We will touch on the common areas of performance issues, namely <strong>(a) Disk I/O, (b) Clustering Files, (c) Cached File Systems, (d) Disk Fragmentation</strong> and <strong>(e) Disk Sweet Spots</strong>.</p>
<p><span style="text-decoration: underline;"><strong>Disk I/O</strong></span></p>
<p>Disk writes on the system can adversely impact performance as a whole. System swap files should be placed on a separate disk from databases (as recommended by database vendors). Not doing so can decrease database throughput (and system activity). This performance decrease comes from not splitting the I/O of two disk-intensive applications (in this case, OS paging and database I/O).</p>
<p>Identifying that there is an I/O problem is usually fairly easy. The most basic symptom is that things take longer than expected, while at the same time the CPU is not at all heavily worked. The disk-monitoring utilities will also show that a lot of work is being done on the disks. At the system level, you should determine the average and peak requirements on the disks. Your disks will have some statistics that are supplied by the vendor, including:</p>
<p><strong>Average and peak transfer rates</strong>; normally in megabytes (MB) per second, e.g. 5MB/sec. From this, you can calculate how long an 8K page takes to be transferred from disk. For example, 5MB/sec is about 5K/ms, so an 8K page takes just under 2ms to transfer.</p>
<p><strong>Average seek time</strong>; normally in milliseconds (ms), e.g. 10ms. This is the time required for the disk head to move radially to the correct location on the disk.</p>
<p><strong>Rotational speed;</strong> normally in revolutions per minute (rpm), e.g. 7200rpm. From this, you can calculate the average rotational delay in moving the disk under the disk-head reader, i.e., the time taken for half a revolution. For example, for 7200rpm, one revolution takes 60,000ms (60 seconds) divided by 7200, which is about 8.3ms. So half a revolution takes just over 4ms, which is consequently the average rotational delay.</p>
<p>With the above list, you are able to calculate the total delay for a disk operation to load a random 8K page from disk with the following formula:</p>
<div><em>seek time + rotational delay + transfer time</em></div>
<p>Using the examples given in the list, you have 10 + 4 + 2 = 16 ms to load a random 8K page (almost an order of magnitude slower than the raw disk throughput). This calculation gives you a worst-case scenario for the disk-transfer rates for your application, allowing you to determine if the system is up to the required performance. Note that if you are reading data stored sequentially on disk (as when reading a large file), the <strong>seek time</strong> and <strong>rotational delay</strong> are incurred less than once per 8K page loaded. Basically, these two times are incurred only when first opening the file and whenever the file is fragmented. But this calculation is confounded by other processes also executing I/O to the disk at the same time. This overhead is part of the reason why swap and other I/O-intensive files should not be put on the same disk.</p>
<p>One mechanism for speeding up disk I/O is to <strong>stripe disks</strong>. Disk striping allows data from a particular file to be spread over several disks. Striping allows reads and writes to be performed in parallel across the disks without requiring any application changes. This can speed up disk I/O quite effectively. However, be aware that the seek and rotational overhead previously listed still applies, and there may be no performance gain if you are making many small random reads.</p>
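<p>The same arithmetic can be written as a small calculator, which makes it easy to substitute a vendor&#8217;s figures. The sketch uses the article&#8217;s example values (10 ms seek, 7200rpm, 5MB/sec, 8K page) and keeps the article&#8217;s approximation that 1MB/sec is about 1K/ms; the function names are invented for the example.</p>

```python
def transfer_ms(page_kb, rate_mb_per_s):
    """Time to stream one page, treating 1 MB/s as 1 KB/ms as the text does."""
    return page_kb / rate_mb_per_s

def rotational_delay_ms(rpm):
    """Average rotational delay: the time for half a revolution."""
    return 60_000 / rpm / 2

def random_page_ms(seek_ms, rpm, page_kb, rate_mb_per_s):
    """Seek time + rotational delay + transfer time for one random page."""
    return seek_ms + rotational_delay_ms(rpm) + transfer_ms(page_kb, rate_mb_per_s)

total = random_page_ms(seek_ms=10, rpm=7200, page_kb=8, rate_mb_per_s=5)
print(f"random 8K page: {total:.1f} ms")
```

<p>Without the rounding used in the text, the total comes out slightly under the 16 ms quoted above; the seek time is clearly the dominant term.</p>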
<p>Finally, note again that using remote disks adversely affects I/O performance. You should not be using remote disks mounted from the network with any I/O-intensive operations if you need good performance.</p>
<p><span style="text-decoration: underline;"><strong>Clustering Files</strong></span></p>
<p>Reading many files sequentially is faster if the files are clustered together on the disk, allowing the disk-head reader to flow from one file to the next. This clustering is best done in conjunction with defragmenting the disks. The overhead in finding the location of a file on the disk (detailed in the previous section) is also minimized for sequential reads if the files are clustered.</p>
<p>If you cannot specify clustering files at the disk level, you can still provide similar functionality by putting all the files together into one large file (as is done with the ZIP file systems). This is fine if all the files are read-only files or if there is just one file that is writable (you place that at the end). However, when there is more than one writable file, you need to manage the location of the internal files in your system as one or more grow. This becomes a problem and is not usually worth the effort. (If the files have a known bounded size, you can pad the files internally, thus regaining the single file efficiency.)</p>
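<p>For the read-only case, the idea can be sketched as one concatenated blob plus an index of offsets and lengths. This is a hypothetical layout for illustration only, much simpler than the ZIP format, and the function names and sample file names are assumptions.</p>

```python
import io

def pack(files):
    """Concatenate named byte strings into one blob plus an offset index."""
    index = {}
    blob = io.BytesIO()
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))  # where this member starts
        blob.write(data)
    return blob.getvalue(), index

def read_member(blob, index, name):
    """Return one member by slicing the blob at its recorded offset."""
    offset, length = index[name]
    return blob[offset:offset + length]

blob, index = pack({"a.cfg": b"alpha=1\n", "b.cfg": b"beta=22\n"})
print(index)
print(read_member(blob, index, "b.cfg"))
```

<p>Because every member&#8217;s offset is fixed when the blob is built, growing any member other than the last would shift everything after it, which is the management problem described above.</p>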
<p><span style="text-decoration: underline;"><strong>Cached File Systems (RAM Disks, tmpfs, cachefs)</strong></span></p>
<p>Most operating systems provide the ability to map a file system into system memory. This ability can speed up reads and writes to certain files when you control the target environment. Typically, this technique has been used to speed up the reading and writing of temporary files. For example, some compilers (of languages in general, not specifically Java) generate many temporary files during compilation. If these files are created and written directly to the system memory, the speed of compilation is greatly increased. Similarly, if you have a set of external files that are needed by your application, it is possible to map these directly into the system memory, thus allowing their reads and writes to be sped up greatly.</p>
<p>But note that these types of file systems are not persistent and will be cleared (as with system memory) when the system is rebooted. If the system crashes, anything in a memory-mapped file system is lost. For this reason, these types of file systems are usually suitable only for temporary files or read-only versions of disk-based files (such as mapping a CD-ROM into a memory-resident file system).</p>
<p>Remember that you do not have the same degree of fine control over these file systems that you have over your application. A memory-mapped file system does not use memory resources as efficiently as working directly from your application. If you have direct control over the files you are reading and writing, it is usually better to optimize this within your application rather than outside it. A memory-mapped file system takes space directly from system memory. You should consider whether it would be better to let your application grow in memory instead of letting the file system take up that system memory. For multi-user applications, it is usually more efficient for the system to map shared files directly into memory, as a particular file then takes up just one memory location rather than duplicate in each process.</p>
<p>The creation of memory-mapped file systems is completely system-dependent, and there is no guarantee that it is available on any particular system (though most modern operating systems do support this feature). For UNIX, look up <em>cachefs</em> and <em>tmpfs</em>, while for Windows, look up how to set up a RAM disk (a portion of memory mapped to a logical disk drive). In a similar way, there are products available that pre-cache shared libraries (DLLs) and even executables in memory. This usually only lets an application start, or load its libraries, more quickly, and so may not be much help in speeding up a running system.</p>
<p>But you can apply the technique of memory-mapping file systems directly and quite usefully for applications in which processes are frequently started. Copy the Java distribution and all class files (all JDK, application, and third-party class files) onto a memory-mapped file system and ensure that all executions and classloads take place from the file system. Since everything (executables, DLLs, class files, resources, etc.) is already in memory, the startup time is much faster. Because only the startup (and class loading) time is affected, this technique gives only a small boost to applications that are not frequently starting processes, but can be usefully applied if startup time is a problem.</p>
<p><span style="text-decoration: underline;"><strong>Disk Fragmentation</strong></span></p>
<p>When files are stored on disk, the bytes in the files are not necessarily stored contiguously: their storage depends on file size and the contiguous space available on the disk. This non-contiguous storage is called <strong>fragmentation</strong>. Any particular file may have some chunks in one place, and a pointer to the next chunk that may be quite a distance away on the disk.  Hard disks tend to get fragmented over time. This fragmentation delays both reads from files (including loading applications into computer memory on startup) and writes to files. This delay occurs because the disk head must move on to the next chunk at each fragment.</p>
<p>For optimum performance on any system, it is ideal to periodically defragment the disk. This reunites files that have been split up, so that once the file-header locations have been identified the disk heads spend less time searching for data, thus speeding up data access. However, defragmenting may not be effective on all systems.</p>
<p><span style="text-decoration: underline;"><strong>Disk Sweet Spots</strong></span></p>
<p>Most disks have a location from which data is transferred faster than from other locations. Usually, the closer the data is to the outside edge of the disk, the faster it can be read from the disk. Most hard disks rotate at constant angular speed. This means that the linear speed of the disk under a point is faster the farther away the point is from the center of the disk. Thus, data at the edge of the disk can be read from (and written to) at the fastest possible rate commensurate with the maximum density of data storable on disk.</p>
<p>This location with faster transfer rates is usually termed the <strong>disk sweet spot</strong>. Some commercial utilities provide mapped access to the underlying disk and allow you to reorganize files to optimize access. On most server systems, the administrator has control over how logical partitions of the disk apply to the physical layout, and how to position files on the disk sweet spots. Experts in high-performance database systems sometimes try to position the index tables of the database as close as possible to the disk sweet spot. These tables consist of relatively small amounts of data that affect the performance of the system in a disproportionately large way, so that any speed improvement in manipulating these tables is significant.</p>
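<p>The relationship between radius and linear speed at constant angular speed is simple to check numerically. The radii below (45 mm for an outer track, 20 mm for an inner track) are hypothetical values chosen for the example, as is the function name.</p>

```python
import math

def linear_speed_mm_per_s(radius_mm, rpm):
    """Speed of the platter surface passing under the head at a given radius."""
    return 2 * math.pi * radius_mm * rpm / 60

outer = linear_speed_mm_per_s(45, 7200)  # hypothetical outer track radius
inner = linear_speed_mm_per_s(20, 7200)  # hypothetical inner track radius
print(f"outer/inner speed ratio: {outer / inner:.2f}")
```

<p>Since the rotation rate cancels out, the ratio of speeds is just the ratio of the radii, so the benefit of the outer tracks depends on the disk&#8217;s geometry rather than on its rpm.</p>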
<p>Note that some recent operating systems are beginning to include “awareness” of disk sweet spots and attempt to move executables to them when defragmenting the disk. You may need to ensure that the defragmentation procedure does not disrupt your own use of the disk sweet spot.</p>
<p><a title="Java Performance Tuning" href="index.php?view=article&amp;catid=38%3Arecommended-resources&amp;id=66%3Abooks&amp;option=com_content&amp;Itemid=41" target="_blank">(Source: Java Performance Tuning)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/detecting-disk-bottlenecks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting processor bottlenecks</title>
		<link>http://www.loadrunnertnt.com/analyze/detecting-processor-bottlenecks/</link>
		<comments>http://www.loadrunnertnt.com/analyze/detecting-processor-bottlenecks/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 06:53:26 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[processor]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=332</guid>
		<description><![CDATA[This article, &#8220;Detecting processor bottlenecks&#8221;, gives a general idea of how to identify the bottleneck under two broad categories, namely (a) Processor Load and (b) Process Priorities. The material is taken from the book &#8220;Java Performance Tuning&#8221; by Jack Shirazi; it&#8217;s recommended to go through the book to get a better understanding of determining resource bottlenecks [...]]]></description>
			<content:encoded><![CDATA[<p>This article, &#8220;Detecting processor bottlenecks&#8221;, gives a general idea of how to identify the <strong>bottleneck</strong> under two broad categories, namely <strong>(a) Processor Load</strong> and <strong>(b) Process Priorities</strong>. The material is taken from the book <a title="Java Performance Tuning" href="index.php?view=article&amp;catid=38%3Arecommended-resources&amp;id=66%3Abooks&amp;option=com_content&amp;Itemid=41" target="_blank">&#8220;Java Performance Tuning&#8221;</a> by Jack Shirazi; it&#8217;s recommended to go through the book to get a better understanding of determining resource bottlenecks and of tuning Java technologies. The terms CPU and processor refer to the same thing and are used interchangeably in this article.<span id="more-332"></span></p>
<p><span style="text-decoration: underline;"><strong>Processor Load</strong></span></p>
<p>Two aspects of <strong>Processor Load</strong> are worth watching as primary performance indicators: the <strong>Processor Utilization</strong> (expressed as a percentage) and the <strong>Runnable Queue</strong> of processes and threads (often called the load or the task queue).</p>
<p><strong>Processor Utilization:</strong> The first indicator is simply the percentage of the CPU (or CPUs) being used by all the various threads. If this stays at 100% for significant periods of time, you may have a problem. If it doesn’t, the CPU is under-utilized, which is usually the preferable situation. The number of processes and threads in the system can be huge, and looking at all of them would be overwhelming. Therefore, start with known processes and tasks, such as the application ones (user tasks), followed by OS tasks.</p>
<p>Some common symptoms and what they can indicate:</p>
<ul>
<li>Low CPU usage can indicate that your application may be blocked for significant periods on disk or network I/O (High I/O or poor I/O)</li>
<li>Low CPU usage can indicate that the contention is on another server in the architecture, and that this machine is waiting for that server to complete its task and send data back.</li>
<li>High CPU usage can indicate thrashing (lack of RAM or too many threads)</li>
<li>High CPU contention can indicate inefficient code (meaning that you need to tune the code and reduce the number of instructions being processed to reduce the impact on the CPU).</li>
</ul>
<p>A reasonable target is <strong>75% CPU utilization</strong> (the figure varies between authors, from 75% to 85%). This means that the system is being worked toward its optimum while still leaving some slack for spikes due to other system or application requirements. However, note that if more than 50% of the CPU is used by system processes (i.e. administrative and OS processes), your CPU is probably under-powered. You can identify this by looking at the load of the system over a period when you are not running any applications (always let the system run a normal/no-load scenario first to log its initial benchmark).</p>
<p><strong>Runnable Queue:</strong> The second indicator is the average number of processes or threads waiting to be scheduled on the processor by the OS. They are runnable, but the processor has no time to run them and keeps them waiting for a significant amount of time. As soon as the run queue goes above zero, the system may display contention for resources. There are exceptions, however, where the runnable queue is above zero and the system still performs at an acceptable level. A good way to identify the acceptable runnable queue for a system is to graph the Avg. Transaction Response Time against the runnable queue statistics (on Windows, the <strong>Processor Queue Length</strong> counter) and observe any degradation of response time as the runnable queue increases. For capacity planning, a guideline proposed by Adrian Cockcroft is that performance starts to degrade if the <span style="color: #ff0000;">run queue grows bigger than four times the number of CPUs</span>.</p>
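<p>The two thresholds above can be turned into a simple check on monitoring data. The Python sketch below is an illustration under stated assumptions: the function name, the constant names and the sample values are hypothetical, and the samples are assumed to be CPU percentages already collected by whatever monitor you use.</p>
<pre>
```python
from statistics import mean

TARGET_UTILIZATION = 75.0  # percent; the target suggested in this article
SYSTEM_SHARE_LIMIT = 50.0  # percent of the CPU used by system processes


def assess_cpu(total_samples, system_samples):
    """Return a list of warnings for averaged CPU samples (percent values)."""
    warnings = []
    avg_total = mean(total_samples)
    avg_system = mean(system_samples)
    if avg_total > TARGET_UTILIZATION:
        warnings.append(
            f"average utilization {avg_total:.1f}% is above the target"
        )
    if avg_system > SYSTEM_SHARE_LIMIT:
        warnings.append(
            f"system processes use {avg_system:.1f}% of the CPU on average"
        )
    return warnings


# Hypothetical samples, for illustration only.
for message in assess_cpu([82.0, 91.5, 88.0], [20.0, 25.5, 22.0]):
    print("WARNING:", message)
```
</pre>
<p>Averaging over a period, rather than reacting to a single sample, follows the advice above to look at the load over some time. Short spikes above the target are exactly what the slack is for.</p>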
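<p>The capacity-planning guideline above is easy to express as a check. The following Python sketch is a minimal illustration: the function and parameter names are assumptions, and the run queue length is expected to come from your own monitor (for example the Processor Queue Length counter on Windows).</p>
<pre>
```python
import os


def run_queue_is_excessive(run_queue_length, cpu_count=None, factor=4):
    """Apply the guideline: trouble starts above factor * number of CPUs."""
    if cpu_count is None:
        # Fall back to the number of CPUs reported for the local machine.
        cpu_count = os.cpu_count() or 1
    return run_queue_length > factor * cpu_count


# Hypothetical readings for a 2-CPU server, for illustration only.
for reading in (3, 8, 12):
    print(reading, run_queue_is_excessive(reading, cpu_count=2))
```
</pre>
<p>Treat the result as a hint rather than a verdict. As noted above, the acceptable run queue for a particular system is best confirmed by graphing it against the transaction response time.</p>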
<p>If you can upgrade the CPU of the target environment, doubling the CPU speed is usually better than doubling the number of CPUs. And remember that parallelism in an application doesn’t necessarily need multiple CPUs. If I/O is significant, the CPU will have plenty of time for many threads.</p>
<p><span style="text-decoration: underline;"><strong>Process Priorities</strong></span></p>
<p>The OS can also prioritize processes when handing out processor time, by assigning process priority levels. Process priorities provide a way to throttle high-demand processes, thus giving other processes a greater share of the processor. If other processes need to run on the same machine but it doesn’t matter if they run slowly, you can give your application processes a (much) higher priority than those other processes, thus allowing your application the lion’s share of CPU time on a congested system. If your application consists of multiple processes, you should also consider giving your various processes different levels of priority. Being tempted to adjust the priority levels of processes, however, is often a sign that the CPU is underpowered for the tasks you have given it.</p>
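<p>As an illustration of throttling a less important process, the Python sketch below starts a child process with a lower scheduling priority. It assumes a Unix-like system, where a larger &#8220;nice&#8221; value means a lower priority; the function name and the increment of 5 are hypothetical choices made for the example.</p>
<pre>
```python
import os
import subprocess
import sys


def run_with_lower_priority(args, increment=5):
    """Run a command with its niceness raised by `increment` (Unix only)."""
    return subprocess.run(
        args,
        # Executed in the child just before the command starts, so only
        # the child is deprioritized and the parent keeps its priority.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # The child simply reports its own niceness; os.nice(0) changes nothing.
    child_code = "import os; print(os.nice(0))"
    result = run_with_lower_priority([sys.executable, "-c", child_code])
    print("parent niceness:", os.nice(0))
    print("child niceness:", result.stdout.strip())
```
</pre>
<p>Lowering the priority of background work is generally possible for ordinary users, whereas raising a priority usually needs administrative rights. On Windows the equivalent adjustment is made through priority classes rather than nice values.</p>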
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/detecting-processor-bottlenecks/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>WASJDK: java.lang.OutOfMemoryError due to allocating large heap objects</title>
		<link>http://www.loadrunnertnt.com/analyze/wasjdk-java-lang-outofmemoryerror-due-to-allocating-large-heap-objects/</link>
		<comments>http://www.loadrunnertnt.com/analyze/wasjdk-java-lang-outofmemoryerror-due-to-allocating-large-heap-objects/#comments</comments>
		<pubDate>Thu, 31 Jul 2008 06:40:23 +0000</pubDate>
		<dc:creator>TnT Admin</dc:creator>
				<category><![CDATA[Analyze]]></category>
		<category><![CDATA[WAS]]></category>

		<guid isPermaLink="false">http://www.loadrunnertnt.com/?p=321</guid>
		<description><![CDATA[Taken from the IBM Public Library for WebSphere 4.x (a little out-dated).  In this featured article for WebSphere 4.x, the concept is applicable to most J2EE applications in determining whether java.lang.OutOfMemoryError is due to the allocation of large heap objects, which causes heap fragmentation. Heap fragmentation can be detected when there is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="IBM" src="http://loadrunnertnt.com/images/company_ibm_logo.gif" alt="" width="110" height="52" />Taken from the IBM Public Library for <strong>WebSphere 4.x</strong> (a little out-dated).  In this featured article for <strong>WebSphere 4.x</strong>, the concept is applicable to most J2EE applications in determining whether java.lang.OutOfMemoryError is due to the allocation of large heap objects, which causes heap fragmentation. Heap fragmentation can be detected when there is (a) a large amount of free heap and (b) a large amount of memory in the heap, while (c) a &#8220;totally out of heap space&#8221; message occurs. The article also discusses techniques for resolving the problem, such as coding to prevent <strong>heap fragmentation</strong> and diminishing the effect of potential <strong>heap fragmentation</strong>.  To get the article, click on the following link:<span id="more-321"></span></p>
<ul>
<li><a title="WASJDK: java.lang.OutOfMemoryError due to allocating large heap objects " href="http://publib.boulder.ibm.com/infocenter/wasinfo/v4r0/index.jsp?topic=/com.ibm.support.was40.doc/html/Java_SDK/swg21176438.html" target="_blank">WASJDK: java.lang.OutOfMemoryError due to allocating large heap objects </a></li>
</ul>
<p>For the latest version of the WebSphere documentation and tuning guide, you can refer to the <a title="IBM Websphere 6.x Information Center" href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp" target="_blank">IBM Public Library for WebSphere 6.x</a>, where there are various resources for performance tuning <strong>WebSphere 6.x.</strong></p>
<p><a title="CSS Corp" href="http://www.csscorp.com/" target="_blank">(Special thanks to Aravind Kumar for contribution, CSS Corp)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loadrunnertnt.com/analyze/wasjdk-java-lang-outofmemoryerror-due-to-allocating-large-heap-objects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
