This section reviews some of the features controlled by command line flags.
There are two types of "cookies" in common usage on the web. The first
kind follows the HTTP Cookie specification, and is sent as part of the
HTTP header. This section discusses this type of cookie. Another
type of "cookie" is a string that is embedded in the URL itself, and is
passed from server to client by embedding it directly into the body
of a web page (usually in some url-encoded form). This second
type of cookie is also supported by webclient, and the mechanisms
for dealing with it are discussed in further detail in the section
URL-embedded State below.
webclient will automatically accept and cache any and all cookies
returned by the server. The cookies will then be handled following
the usual cookie semantics for a browser: if path names match, then
the cookie will be returned to the server. webclient does *not* age
cookies, and thus they will not expire from the cache in that fashion.
Also, webclient does not maintain a persistent store of cookies: once
webclient exits, any cookies it had are lost.
In order to make webclient a more realistic multi-user stress tool,
it will flush the cookie cache at the end of a session. That is,
each new session is started with an empty cookie cache, simulating
a new user with a recently restarted browser.
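The cookie handling described above can be sketched roughly as follows. This is an illustration only, not webclient's actual code: the class name CookieCache and its methods are hypothetical, and "path names match" is assumed to mean a simple prefix match.

```python
class CookieCache:
    """Hypothetical in-memory cookie cache (illustration only)."""

    def __init__(self):
        self._cookies = {}  # (path, name) -> value

    def store(self, name, value, path="/"):
        # Accept every cookie the server returns; nothing ever expires.
        self._cookies[(path, name)] = value

    def cookies_for(self, request_path):
        # Assumption: a cookie matches when its path is a prefix of the request path.
        return {
            name: value
            for (path, name), value in self._cookies.items()
            if request_path.startswith(path)
        }

    def flush(self):
        # Called at the end of a session, so the next session starts empty.
        self._cookies.clear()
```

Nothing is written to disk, so the cache disappears when the process exits, mirroring the lack of a persistent cookie store.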
In order to help verify correct operation, webclient can be made to
check for the presence of cookies on certain paths, and to print an
error and exit if the server did not return a cookie for that path.
The --cookie-path flag can be specified any number of times to add a
path to the error checking code.
Some web and application servers refer to a state maintenance technique
called url-encoding in connection with a discussion of cookies. Note
that url-encoding does not use cookies in the sense in which the HTTP
spec implies; rather, the server embeds unique, long strings directly into
the urls in the body of the web page. These long strings are used by the
server to provide a cookie-like function. webclient provides
support for these types of "url-cookies", and is able to track them with
the --handle flag described in the section
URL-embedded State below.
By default, in order to maintain backwards compatibility, webclient
will check for the presence of a cookie on the path /proclogin.ns.
This is the same as specifying the flag --cookie-path=/proclogin.ns.
If any cookie path is explicitly specified, then the default
/proclogin.ns is not set.
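The rule for the default cookie path can be sketched as below. The function names are hypothetical and the code only illustrates the selection logic described above; it is not taken from webclient.

```python
DEFAULT_COOKIE_PATH = "/proclogin.ns"

def required_cookie_paths(explicit_paths):
    """Paths that must yield a cookie; the default applies only if none are given."""
    return list(explicit_paths) if explicit_paths else [DEFAULT_COOKIE_PATH]

def missing_cookie_paths(required, paths_with_cookies):
    """Return the required paths for which the server set no cookie."""
    return [p for p in required if p not in paths_with_cookies]
```

A non-empty result from missing_cookie_paths corresponds to the case where webclient would print an error and exit.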
The -g and the -c flags enable the fetching and caching of
images.
webclient is able to scan a web page and fetch any images
that it finds embedded in the page. It does so by scanning the returned
page for references of the form IMG SRC= and extracting and
fetching the specified URL. It uses a fairly sophisticated pattern-matching
algorithm to find the URL, and is able to pick its way through some of the more
obscure quotation mark and white-space combinations, such as those that
might occur in JavaScript. Note, however, that webclient does
not provide a JavaScript interpreter, and that therefore it can get
confused by more complex image-fetching JavaScript applets. It does not
support images fetched with client-side Java applets.
Images are fetched only if the -g option is set; by default,
image-fetching is disabled.
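As a rough illustration of scanning a page for IMG SRC= references, the following sketch uses a regular expression that tolerates quoted and unquoted URLs and extra white space. The pattern and the function name find_image_urls are assumptions made for this example; webclient's own matcher is described as more sophisticated and is not reproduced here.

```python
import re
from urllib.parse import urljoin

# IMG ... SRC= followed by a double-quoted, single-quoted, or bare URL.
IMG_SRC = re.compile(
    r"""<img\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)

def find_image_urls(page_url, html):
    """Return absolute image URLs in document order."""
    urls = []
    for match in IMG_SRC.finditer(html):
        # Exactly one of the three alternatives matched.
        src = next(g for g in match.groups() if g is not None)
        urls.append(urljoin(page_url, src))
    return urls
```

Like webclient, this sketch does not interpret JavaScript, so image URLs assembled by scripts at run time are not found.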
Emulation of a browser's gif-cache is supported with the -c flag.
That is, if webclient notices that it has previously
fetched a given gif url this session, it will not fetch that url again.
The result is that the number of gif files fetched by
webclient should match the number of gif files fetched by the browser
during an entire session, assuming that the gif cache was empty
when the user requested the server's logon page.
If the -c option is not specified, every gif is fetched every time
the page is requested.
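The effect of the -c flag can be pictured with the small sketch below. The class SessionImageCache and the fetch callback are hypothetical names used only for this illustration.

```python
class SessionImageCache:
    """Sketch of the -c behaviour: fetch each image URL at most once per session."""

    def __init__(self, fetch, enabled=True):
        self._fetch = fetch        # caller-supplied function that downloads a URL
        self._enabled = enabled    # False models running without -c
        self._seen = set()

    def get(self, url):
        """Fetch the URL unless it is cached; return True if a fetch happened."""
        if self._enabled and url in self._seen:
            return False
        self._seen.add(url)
        self._fetch(url)
        return True

    def new_session(self):
        # Each session starts with an empty image cache.
        self._seen.clear()
```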
By default, webclient uses four threads and the HTTP/1.1
Persistent Connection protocol for fetching gif files in parallel
over four sockets. The number of threads and the protocol used
can be changed as explained below.
Note:
By default, webclient uses four threads and the HTTP/1.1
Persistent Connection protocol for fetching gif files in parallel
over four sockets. This behaviour can be modified with three flags:
--no-keep-alive, --num-threads=nnn and --http-version=1.x.
By default, the HTTP/1.1 protocol specifies that Persistent
Connections are to be used when a browser talks to the web server.
What this means is that once the browser has opened a socket to the
server, it keeps that socket open for further URL requests. This helps
eliminate the overhead of negotiating a new socket for each request.
By default, webclient does the same, in order to better emulate
a real web user. However, this behaviour can be disabled by specifying
the --no-keep-alive flag. This flag causes the Connection: Close
header field to be added to the HTTP header, and the socket to be closed
after all of the data has been received.
The de facto industry-standard Netscape extensions to the HTTP/1.0 protocol
had a similar concept, called Keep-Alive. webclient can be made
to use this protocol by using the --http-protocol=1.0 flag. Currently,
there are only two valid values that this flag can take: HTTP/1.0
and HTTP/1.1. By specifying HTTP/1.0, webclient will try to use
Keep-Alive by including the header field Connection: Keep-Alive
with each request (and keeping the socket open). This can again be
disabled by using the --no-keep-alive flag.
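The interaction between the protocol version and the --no-keep-alive flag can be summarised in a short sketch. The function below is hypothetical and only models the behaviour described above. It spells the HTTP/1.0 token as Keep-Alive, as in the Netscape extension, and it assumes that Connection: Close is sent for either protocol version when keep-alive is disabled.

```python
def connection_header(http_version="1.1", keep_alive=True):
    """Return (protocol string, Connection header line or None)."""
    if http_version not in ("1.0", "1.1"):
        raise ValueError("only HTTP/1.0 and HTTP/1.1 are supported")
    if not keep_alive:
        header = "Connection: Close"       # socket is closed after the response
    elif http_version == "1.0":
        header = "Connection: Keep-Alive"  # Netscape extension to HTTP/1.0
    else:
        header = None                      # HTTP/1.1 is persistent by default
    return "HTTP/" + http_version, header
```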
To further improve performance, browsers open a number of sockets to
the web server for fetching gifs in parallel. The default number of
sockets is four for both Netscape(TM) Navigator and Microsoft(TM) Internet
Explorer, although users can adjust this value from the control panel
or preferences dialog. To emulate this behaviour, webclient
maintains a pool of four threads for gif fetching. Each thread handles
the i/o on one socket. The number of threads (and thus the number of
sockets) that are used can be changed with the --num-threads=nnn flag.
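A fixed pool of worker threads for parallel fetching might look like the sketch below. It is an illustration of the general technique, not webclient's implementation: fetch_one is a caller-supplied, hypothetical download function, and unlike webclient this sketch does not tie each thread to one long-lived socket.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_images(urls, fetch_one, num_threads=4):
    """Fetch all URLs with a fixed pool of worker threads.

    Results are returned in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch_one, urls))
```

Changing num_threads plays the role of the --num-threads=nnn flag: it bounds how many fetches are in flight at once.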
Note that once webclient has opened a socket to the server, it
will keep it open indefinitely (as long as the --no-keep-alive
flag wasn't specified). However, webservers have only a limited
pool of connections, and busy webservers will routinely close the socket
on unsuspecting browsers. webclient does notice when this occurs,
and keeps statistics on how often it was able to reuse an open socket,
and how often an open socket was unexpectedly closed by the server.
These stats are printed as part of the normal stats output.
Note:
webclient supports a number of substitution and re-writing modes.
These include:
Authorization
to be added to the header on a per-client basis, without
requiring multiple copies of an input file.
webclient to
be used in complex scripts while retaining a single input
file. It is typically used to substitute for logon ID's
and passwords that might get embedded in the request URL.
The HTTP headers generated and sent by webclient can be
fully customized and rewritten. By default, webclient
sends a simple, basic HTTP header. A fully customized header
can be specified with the --header-file flag, or alternately,
the header can be placed in the input file, using the
<<HEADER>> directive.
Whether or not a custom header has been specified, key-value pairs
in the header can be substituted for or added to the header with the
--header-subst and --header-add flags. These flags are
particularly useful when creating multi-user scripts, where each
running copy of webclient needs to send a slightly different
header. In particular, this is needed in order to perform HTTP-style
authentication.
The default header that webclient currently sends should resemble:
User-Agent: webclient/WebLoad v4.0beta3 (Linux OpenSSL 0.9)
Host: webby.com:80
Referer: webby.com/page.html
Accept: */*
Accept-Language: en
Accept-Charset: iso8859-1, *, utf-8
The User-Agent value will reflect the current actual version of webclient.
It can be modified with the -U flag described below,
or by specifying a custom header. It can be omitted by using a custom
header which does not contain it.
The Host value is automatically generated and updated by webclient
depending on the server being contacted. If this tag is present in the header,
then webclient will always update its value as appropriate.
It can be omitted by using a custom header which does not contain it.
The Referer tag will be automatically added and updated based on the
most recent URL that webclient had requested. There is currently no
way to disable the presence or automatic update of this tag.
--header-file Flag
A fully customized HTTP header can be specified with the --header-file
command-line flag, for example: webclient --header-file=some.file.name
This header will be used for all fetches, including the fetching
of gifs. A typical header file might look like the following:
Accept: image/gif, image/x-bitmap, image/jpeg, image/png
Accept-Language: en
Pragma: no-cache
Authorization: Basic amFtZXM6amQpMrT=
Note that the header file should not contain the HTTP method
(viz. GET, POST); this is handled separately.
Note that the header file should not contain the body for
a POST request; this is handled separately with the
<<POSTDATA>> input file directive.
The header file should not contain blank lines or comment lines.
It will be parsed into key-value pairs which can be substituted
for with the --header-subst and --header-add flags.
--header-subst and --header-add Flags
Values in the HTTP header can be substituted for with the
--header-subst flag. For example,
webclient --header-subst="Accept-Language: fr"
will change the value of the Accept-Language tag in the header to be fr.
The substitution will only be made if the tag already appears in
the header. If the tag does not appear, then the substitution will
not be made.
The --header-add flag can be used to make a substitution
for an existing value, or to add the tag-value pair if it is not
already present.
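The difference between the two flags can be shown with a small sketch that treats the header as key-value pairs. The function names are hypothetical, and the exact-case key comparison is an assumption of this example rather than documented webclient behaviour.

```python
def parse_header(text):
    """Parse 'Key: value' lines into a dict that preserves line order."""
    header = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header

def header_subst(header, pair):
    """Like --header-subst: replace the value only if the key already exists."""
    key, _, value = pair.partition(":")
    if key.strip() in header:
        header[key.strip()] = value.strip()

def header_add(header, pair):
    """Like --header-add: replace the value, or add the pair if the key is absent."""
    key, _, value = pair.partition(":")
    header[key.strip()] = value.strip()
```

With a header containing only Accept and Accept-Language, header_subst(h, "Pragma: no-cache") leaves the header unchanged, while header_add(h, "Pragma: no-cache") appends the new pair.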
Some web sites require authentication using the HTTP 401
response code in conjunction with the Authorization
header field. That is, the web server will deny access
to a web page unless the browser (webclient) supplies
a field of the form
Authorization: Basic amFtZXM6amQpMrT=
in the header sent with the URL request. The string of
seemingly random letters is an encoded username-password pair.
Appropriate values for the encoded string can be obtained
by using the webmon tool with tracing enabled.
These values can be placed in the webclient request
header file. Alternately, it might be more convenient to
specify these on the command line, using the --header-add
flag. This is particularly the case when multiple copies of
webclient must run, each with its own login.
The following can be used to add the above line to the
header:
webclient --header-add="Authorization: Basic amFtZXM6amQpMrT="
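As an alternative to capturing the value from a trace, the encoded string for HTTP Basic authentication can be computed directly: it is the base64 encoding of the username and password joined by a colon. The helper below is a minimal sketch with a hypothetical name; the credentials in the usage note are the well-known example pair from the HTTP authentication specification, not values from this document.

```python
import base64

def basic_authorization(username, password):
    """Build an HTTP Basic 'Authorization' header line."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")
```

For example, basic_authorization("Aladdin", "open sesame") yields "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", which could then be passed to the --header-add flag.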
The difference between the --header-subst and the
--header-add flags is that the former will make the
substitution only if the key is already present in the header,
whereas the latter will either substitute or will add
the key-value pair if it is not present.
By default, webclient sets the User-Agent tag in HTTP headers
sent to the server to webclient 4.0pre0 (Linux) or similar.
However, some web servers (in particular, the Netscape Enterprise Server)
check for the User-Agent type, and respond differently to different
browser types. Sometimes the differences are subtle, and yet they can
change overall behavior dramatically: things like redirects, socket
close semantics and returned headers can change, and sometimes even
bugs will be exhibited for some cases but not others.
To get webclient to trick the webserver into behaving more appropriately,
the -U flag can be used to change the value of the User-Agent field.
Before you start, you must figure out what the browser you are trying to
impersonate is sending. To do this, use webmon with the -t
(trace) option. Run a few requests and then stop webmon.
Look in the trace file for a line that begins with "User-Agent:".
The string that follows this is the string that must be specified on
the -U option. For example, with Netscape 4.04, under AIX,
the string is:
User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)
You would then pass this flag to webclient as shown below. Note the use of the single quote marks to delimit the string. The quotes are needed whenever there is embedded whitespace in the string, and also to delimit shell special characters, such as "(".
(Note: The DOS shell under Windows95/98/NT cannot use quote marks to delimit a string. In order to prevent the embedded blanks from causing a problem, convert them to hash marks (# signs). webclient will automatically convert them back into spaces).
webclient -U 'Mozilla/4.04 [en] (X11; AIX 4.1; Nav)'
Some other User-Agent strings:
-U 'Mozilla/3.0 (Win95; I)'                          Netscape Version 3 for Windows 95
-U 'Mozilla/3.04 (Win95; U)'                         Netscape Version 3 for Windows 95
-U 'Mozilla/2.02 (OS/2; U)'                          Netscape Version 2 for OS/2
-U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)'        NS for AIX
-U 'Mozilla/4.05 [en] (X11; U; Linux 2.0.32 i586)'   NS for Linux
Note that the -U flag is entirely equivalent to the longer, more
verbose flag --header-add. The previous example is completely equivalent
to the following:
webclient --header-add="User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)"
Some web-site designs embed customer-specific information into
URL's as an alternative mechanism to "cookies" for maintaining
state information. webclient can track this state information
in an automated fashion, generating the appropriate URL's
dynamically as it traverses a web site. There is a restriction:
webclient assumes that the state information is url-encoded
as a key-value pair in the URL.
This is best illustrated with an example. Suppose that when
a user visits a website, they request the URL
/cgi-bin/firstpage, and that the page that is issued in
response to this contains the URL
/cgi-bin/secondpage?this=that&token=qwertyuiop&up=down
where the string qwertyuiop is generated dynamically and differs
for every visitor to the site. Then webclient can be
configured to navigate this site by using an input file
similar to the following:
GET /cgi-bin/firstpage
GET /cgi-bin/secondpage?this=that&token=xxx&up=down
and using the command line
webclient --handle=token
This will cause webclient to scan each web page it receives for
new values of the key "token", and substitute for its
value in any subsequent GET or POST requests, including POST data
bodies. The particular value "xxx" used in the input file
does not matter. Substitutions for multiple handles can be done
by specifying as many --handle= flags as needed.
Note that if a token appears multiple times on the same page
with different values, webclient will record only the last
value that it finds on the page. This may not be the desired
behavior in some cases. Note that an ampersand (&),
white space, a (single or double) quote-mark, or a right angle
bracket (>) is assumed to delimit the end of the token.
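The handle-tracking behaviour can be sketched as follows. The function names and the regular expression are assumptions made for illustration; the sketch only reproduces the rules stated above (the last value on a page wins, and a value ends at an ampersand, white space, a quote mark, or a right angle bracket).

```python
import re

# Characters that end a value: &, white space, either quote mark, or >.
VALUE = r"[^&\s\"'>]*"

def scan_handles(page, keys, handles):
    """Record, for each tracked key, the last value found on the page."""
    for key in keys:
        found = re.findall(r"\b" + re.escape(key) + "=(" + VALUE + ")", page)
        if found:
            handles[key] = found[-1]

def apply_handles(request, handles):
    """Replace key=<placeholder> in a request URL or POST body."""
    for key, value in handles.items():
        request = re.sub(
            r"\b" + re.escape(key) + "=" + VALUE,
            lambda m, k=key, v=value: k + "=" + v,  # avoids backslash expansion
            request,
        )
    return request
```

Given the example page above, scan_handles records token=qwertyuiop, and apply_handles then rewrites the placeholder request containing token=xxx into the URL the server expects.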
The flag --substitute can be used to make generic substitutions
in the request URI and in the POST body. Thus, for example,
if the input file to webclient contains a URL of the form
GET /some/where/blort.html, and the client is started as
webclient --substitute=blort:page001
then the actual URL that
will be requested will be /some/where/page001.html.
This substitution mechanism allows webclient to be used
in perl and shell scripts, where different urls need to be
fetched by different clients, but maintaining dozens or hundreds
of client specific URL files is not desired. Typically,
this flag is used to substitute for user-names and passwords
(see below). Substitutions are carried out in both the
URL's and the POST bodies. As many --substitute
flags can be specified as needed.
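A generic substitution of this kind amounts to plain string replacement applied to each request line and POST body. The sketch below uses hypothetical function names and assumes that the flag value is split at its first colon; how webclient itself parses the value is not stated here.

```python
def parse_substitutions(flag_values):
    """Turn ['old:new', ...] into (old, new) pairs, splitting at the first colon."""
    pairs = []
    for value in flag_values:
        old, _, new = value.partition(":")
        pairs.append((old, new))
    return pairs

def substitute(text, pairs):
    """Apply each substitution, in order, to a request URL or POST body."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text
```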
When benchmarking password-protected web sites, each copy of webclient
will typically need to use its own username/password pair. Authentication
by web sites is usually handled in one of two different ways: either by
using the HTTP Authorization mechanism
or by embedding the username and password into the request or post data.
The former approach was discussed above;
the latter approach can be handled with the -u flag.
Rather than creating a unique input file for each client, with a
username/password hard-coded into the input file, a substitution can
be performed. Thus, for example, if the input file contains the request
GET /path/to/cgi?login=<<USER>>&idcode=<<PIN>>&pwd=<<PASSWD>>
and you wanted to substitute the values linas, 1234 and
r00tp4ssw0rd for the login, idcode and pwd, you could
specify
webclient --substitute='<<USER>>:linas' \
--substitute='<<PIN>>:1234' \
--substitute='<<PASSWD>>:r00tp4ssw0rd'
on the command line. Alternately, you can use the abbreviated form with
the -u flag, by merely specifying
webclient -u linas:1234:r00tp4ssw0rd
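The abbreviated form can be read as shorthand for three placeholder substitutions. The sketch below illustrates that expansion; the function names are hypothetical, and the code is not taken from webclient.

```python
def expand_user_flag(value):
    """Expand 'user:pin:password' into the three equivalent substitutions."""
    user, pin, passwd = value.split(":", 2)
    return [("<<USER>>", user), ("<<PIN>>", pin), ("<<PASSWD>>", passwd)]

def fill_request(request, value):
    """Apply the expanded substitutions to one request line or POST body."""
    for placeholder, replacement in expand_user_flag(value):
        request = request.replace(placeholder, replacement)
    return request
```

Applied to the example request above with the value linas:1234:r00tp4ssw0rd, this produces the request with login=linas, idcode=1234 and pwd=r00tp4ssw0rd filled in.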
webclient contains a number of facilities to simplify error
detection and reporting. Some of these are described below.
-L or --log-file flag. This eliminates
the need for searching the report files for an error
message. Note that multiple clients can safely specify
the same log file: all writes to this log file are
serialized. The log file will remain empty if no
error occurs.
-A or --alarm flag. The value following
this flag should be a timeout expressed in seconds.
If the webserver fails to respond to a request after
this length of time, an error will be logged in the log file,
and a message will be printed to stdout.
webclient can leave a user logged in and unable
to log in a second time. To avoid this situation, the
--clean-exit flag can be used to specify a sequence
of URL's that will be run when webclient encounters
a fatal error.
See below for more details.
webclient will compute a checksum for the page, and compare it to
the checksum stored in the input file. This checking
can be disabled on a page by page basis, or globally
with the -i flag. Checksums in the input file
can be recomputed with the -v flag.
See below for more details.
Some web site designs prevent a user from logging on more than
once at the same time. There are a variety of reasons to design
a web site in this way, and many websites enforce this.
When using webclient to access such a site, it becomes desirable
to log the user off in the case of an error, so that the user is
not blocked from making future logins.
Note that simply logging off by running webclient a second time
may not be an option because websites that enforce logins usually
use cookies to keep track of the user. That is, a user cannot
log-off unless they also present the right cookie. When webclient
exits for any reason, the current cookie(s) are lost, and thus
it can become impossible to log-off after webclient has exited.
In order to work in this environment, a log-off script can be
specified with the --clean-exit flag.
In the case of an error, or if it is interrupted, webclient can
be made to send a series of HTTP requests by using the --clean-exit
flag to specify a file containing the HTTP requests to run.
The format for the clean-exit file is the same as the input file.
Errors that can trigger a clean exit include any unexpected HTTP
errors (such as 404 Not Found, 500 Server Error, etc.), timeouts
(due to the use of the -A flag), or an interrupt (ctrl-C from the
terminal, or SIGUSR1 or SIGINT from a controlling shell script).
Note that this last usage simplifies the management of multiple copies
of webclient via controlling scripts.
Webclient is designed to check the validity of the data that is returned
for a particular request by calculating a checksum for that
page. It then compares the checksum to the one
that is stored in the session request file. If the checksum does
not match, then webclient assumes that an error has occurred.
Checksum mismatches normally cause webclient to print
a detailed error message and trace information, and then stop.
If instead you want it to continue, and just
print a warning message, then specify the
--warn-checksums option.
However, checksums can be troublesome when a web page includes variable, changing data, such as the current date or time, or a rotating banner advertisement, or other data that changes daily and/or every time the web page is fetched.
To work around pesky checksum pages, validation can be disabled
in one of two ways: on a per-URL basis, or for the entire run.
Validation can be disabled on a per-url basis simply by editing
the input file, and setting the checksum value equal to "-1".
This will cause validation for that page to be skipped.
Validation of checksums for the entire run can be disabled by
specifying the -i option to webclient. In general, it is
important not to disable checksums globally, since if you do,
the server could return completely bogus data and you will never
find out that you are timing a bogus page.
The HTTP header is not included in the checksum calculation; therefore variations in the header due to cookies, expiration date pragmas, or server versions will not affect the checksum.
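The validation rules can be sketched as below. The checksum algorithm that webclient actually uses is not stated in this document, so CRC-32 from Python's zlib module is used here purely as a stand-in; the function names are likewise hypothetical.

```python
import zlib

def body_checksum(response):
    """Checksum of the body only; the header ends at the first blank line."""
    _, _, body = response.partition(b"\r\n\r\n")
    return zlib.crc32(body)  # stand-in algorithm, not necessarily webclient's

def validate(response, expected, ignore_all=False):
    """Return True if the page passes; -1 or ignore_all (like -i) skips the check."""
    if ignore_all or expected == -1:
        return True
    return body_checksum(response) == expected
```

Because only the body is hashed, two responses that differ in cookies or pragma headers but carry the same page produce the same checksum.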
If the web pages are changing only infrequently, the -v flag
can be used to recompute the checksums, and output a new
session file with the new checksums in it. Alternately, the
-v flag can be used to create checksums for a request file
that does not already have them. (In normal operation, the
session file will have been created by webmon, and webmon
will have computed and written out the appropriate checksum.
This is the preferred mode of operation, as the correctness
of the web page can be visually inspected with webmon.)
Webclient supports the concept of 'think time' in order to better
simulate multi-user loads on a server. The think time is the
amount of time that webclient pauses between URL requests,
simulating a user who has stopped to read a web page before clicking
on the next hyperlink. The think time may be specified either
in the input file, or with the --think-time=<float> flag.
The <float> parameter
specifies the time, in seconds, as a floating point number.
Think-times that are fixed or are randomly distributed may be specified.
By default, an exponentially random distribution is used, although a
gaussian or a fixed distribution may also be specified. One of these
mutually-exclusive distributions may be specified with the
--think-fixed, --think-exponential or the
--think-gaussian flags.
The image below shows both distributions, for a mean think time of
ten seconds.
The exponential distribution is given by
P(t;m) = (1/m) exp (-t/m)
where m is the mean. The standard deviation of the exponential
distribution is m.
The gaussian distribution is given by
P(t;m) = 2 L t exp (- L t^2)
where L = pi / (4 m^2), where pi = 3.14... and
m is the mean. Note that the standard deviation is given by
m sqrt (4/pi - 1) = 0.5227... m.
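Think times following these two densities can be drawn as sketched below. The function names are hypothetical. The second sampler inverts the cumulative distribution of the density given above, 1 - exp(-L t^2), which gives t = sqrt(-ln(1 - u) / L) for a uniform random u in [0, 1).

```python
import math
import random

def think_exponential(mean, rng=random):
    """Sample from P(t;m) = (1/m) exp(-t/m)."""
    return rng.expovariate(1.0 / mean)

def think_gaussian(mean, rng=random):
    """Sample from P(t;m) = 2 L t exp(-L t^2), with L = pi / (4 m^2)."""
    L = math.pi / (4.0 * mean * mean)
    u = rng.random()                      # uniform in [0, 1)
    return math.sqrt(-math.log(1.0 - u) / L)
```

Passing a seeded random.Random instance as rng gives repeatable think times. For a mean of ten seconds, the sample standard deviation should come out near 10 for the first sampler and near 5.2 for the second, in line with the formulas above.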
The exponential distribution has been long accepted as an appropriate model for typing behaviour at a computer terminal keyboard. The gaussian distribution, with a small probability of a small think time, might more accurately describe web browser users.
The following flags are not documented above, but are still very important and useful:
Print a command summary and exit.
Print webclient version info and exit.
Specify the file to which the webclient report will be written.
What is written to the report file is nearly identical to
what webclient writes to standard out, unless the --quiet
flag has been specified.
Write a webserver-style access log. The industry-standard logfile
format is used. Note that the ip address written in the logfile
is that of the server that was contacted. The result code is
the result code that webclient received, and the length is
the length (including the header) that was received. The timestamp
that is printed is taken after the entire message has been received.
Write a trace of all of the HTTP traffic to the indicated file.
Don't collect or print summary performance statistics.
Print individual response time observations.
Print request start and end timestamps.
Minimize the number of messages written to standard out.
Write out each URL as it is fetched. Useful for visually inspecting
the forward progress of webclient through a series of requests.
Note that this can generate a lot of output on a fast system.
Disable bug-compatibility mode. Currently, this flag disables only one bug: the 'Content-Length-off-by-two' bug. In this bug, the Netscape browser will send a POST body with an appended CRLF, and then will set the Content-Length in the header two bytes shorter than the actual message. Unfortunately, some servers, notably Sun url-decoding Java Servlets, depend on this incorrect length being set, generating parse errors or NullPointerExceptions if not. The correct HTTP protocol for the Content-Length is documented in RFC 2616, Section 4.4. Note that by default, bug-compatibility is enabled, and a warning message will be printed whenever the bug occurs.
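The off-by-two behaviour can be made concrete with a small sketch that builds a raw POST request. The function build_post is hypothetical, and what webclient sends when bug-compatibility is disabled is an assumption here: the sketch simply omits the extra CRLF so that the declared length matches the bytes sent.

```python
def build_post(path, body, bug_compatible=True):
    """Build raw POST request bytes for an ASCII body string."""
    payload = body.encode("ascii")
    content_length = len(payload)   # length of the body proper
    if bug_compatible:
        payload += b"\r\n"          # trailing CRLF not counted in Content-Length
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Content-Length: {content_length}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + payload
```

In bug-compatible mode a ten-byte body is sent as twelve bytes while the header still declares ten, which is the mismatch the affected servers rely on.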
Override number of times that the session will be run. Normally, the number of times that a session will be played is specified in the request input file. The value specified with this flag will override that value.
Pause after each session trial, before starting the next session. Normally, once a session has been completed, a new session is started immediately. You can use this flag to specify a delay between sessions. Alternately, you can specify a think-time after the last URL of the session (or before the first URL of the session), leading to the same effect.
Specify a seed value to be used with the random-number generator used to generate random think times. This flag is useful for getting repeatable think times and thus repeatable results.
Fork this process to run in the background after validating all of
the command-line arguments. This is a handy feature for starting
webclient from a shell script: if some obvious startup error
occurs, the shell can deal with the failing client in the foreground.
Otherwise, once past the initial startup, the client will move to
background, freeing the shell to start another client.
Specify the common shared memory location for webclient to use.
This flag is required when synchronizing multiple copies of
webclient; it allows the ramp-up and statistics gathering
phases to be appropriately synchronized, and allows some basic
reporting back to the controlling program.