Friday, November 20, 2009

Randomly Clicking links in JMeter (sometimes known as spidering)

A follow-up to Spidering a site with JMeter
A user on the JMeter mailing list posted his solution using the HTML link parser[1] to spider a site. The spidering consists of clicking a link at random from the links parsed from the last accessed page.
The test plan (script available at Spider.jmx) consists of the following elements:
The Initial Request is used by the HTML Link Parser to get the initial set of URLs from which one will be chosen.
The While Controller's condition is simply true, since we want it to loop forever.
The Spider HTTP Sampler has a path of .*.
The If Controller has a condition ${__javaScript(!${JMeterThread.last_sample_ok})}
This simply checks whether the last sample failed (because of the .* path, or because it fetched a CGI/PDF response that can't be parsed for links) and, if so, re-executes the Initial Request.
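The control flow described above can be sketched outside JMeter. This is only an illustration of the logic, not the test plan itself: the in-memory SITE dictionary, the spider function and the fixed step count (standing in for the endless While loop) are all assumptions made for the example.

```python
import random

# Hypothetical in-memory site: page path -> links parsed from that page.
# None marks a response that cannot be parsed for links (e.g. a PDF).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/report.pdf"],
    "/b": ["/a"],
    "/report.pdf": None,
}

def spider(site, initial, steps, rng):
    """Return the list of paths requested over `steps` iterations."""
    requested = [initial]        # the Initial Request seeds the link set
    links = site[initial]
    for _ in range(steps):       # stands in for the While Controller
        if links:
            # Last sample was usable: click one of its links at random.
            path = rng.choice(links)
        else:
            # Last sample gave no links: redo the Initial Request,
            # which is the role the If Controller plays in the test plan.
            path = initial
        requested.append(path)
        links = site.get(path)
    return requested

if __name__ == "__main__":
    print(spider(SITE, "/", 10, random.Random()))
```

Passing in the random generator keeps the sketch repeatable when a seed is supplied, which is handy when comparing runs.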
There are numerous tweaks you can implement. Instead of re-executing the initial request, you might pick a request at random, or reuse the last successful request. You might also check each request being made in order to restrict the paths.
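As a rough illustration of the last tweak, restricting the paths amounts to filtering the candidate links before one is picked. The pick_link helper and the /docs/.* pattern below are hypothetical; in JMeter itself the equivalent would presumably be a narrower regular expression in place of the .* path.

```python
import random
import re

def pick_link(links, allowed_pattern, rng):
    """Pick a random link whose path fully matches allowed_pattern.

    Returns None when no link is allowed, so the caller can fall back
    to the initial request (or whatever recovery it prefers).
    """
    allowed = [link for link in links if re.fullmatch(allowed_pattern, link)]
    return rng.choice(allowed) if allowed else None

if __name__ == "__main__":
    links = ["/docs/a.html", "/docs/b.html", "/logout", "/files/x.pdf"]
    # Only the /docs/ links are candidates here.
    print(pick_link(links, r"/docs/.*", random.Random()))
```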

Note that this clicks a link at random from a set of links acquired from the previously clicked page. This cannot ensure that a link is not repeated and cannot ensure that all links are fetched.
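To see this limitation concretely, a small simulation can count which pages a random walk repeats and which it never reaches. The four-page SITE graph and the walk_stats function are made up for the example and say nothing about any real site.

```python
import random
from collections import Counter

# Hypothetical site graph: page path -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b"],
    "/b": ["/", "/a", "/c"],
    "/c": ["/"],
}

def walk_stats(site, start, steps, rng):
    """Random-walk `steps` pages; return (repeated pages, unvisited pages)."""
    counts = Counter()
    current = start
    for _ in range(steps):
        counts[current] += 1
        current = rng.choice(site[current])  # click a random link
    repeated = sorted(p for p, n in counts.items() if n > 1)
    unvisited = sorted(set(site) - set(counts))
    return repeated, unvisited

if __name__ == "__main__":
    repeated, unvisited = walk_stats(SITE, "/", 10, random.Random())
    print("repeated:", repeated)
    print("never visited:", unvisited)
```

If full coverage matters, a visited set plus a queue of pending links (a conventional crawler) would be the usual alternative, at the cost of no longer behaving like a randomly clicking user.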

[1] http://jakarta.apache.org/jmeter/usermanual/component_reference.html#HTML_Link_Parser

1 comment:

Anonymous said...

Also, the HTML Link parser is unable to distinguish between GET links and POST links. What the parser really needs is an option to ignore POST/form requests. It's nearly unusable as-is.