sar(ADM)

sar, cpusar, mpsar, sa1, sa2, sadc -- system activity report package

Syntax

/usr/bin/sar [ -abBcdghLmnOpqrRSuvwy ] [ -A ] [ -o file ] t [ n ]
/usr/bin/sar [ -abBcdghLmnOpqrRSuvwy ] [ -A ] [ -s time ] [ -e time ] [ -i sec ] [ -f file ]

/usr/bin/cpusar -P cpu [ -IjQu ] [ -A ] t [ n ]
/usr/bin/cpusar -P cpu [ -IjQu ] [ -A ] [ -s time ] [ -e time ] [ -i sec ] [ -f file ]

/usr/bin/mpsar [ -abBcdFghILmnOpQqrRSuvwy ] [ -A ] [ -o file ] t [ n ]
/usr/bin/mpsar [ -abBcdFghILmnOpQqrRSuvwy ] [ -A ] [ -s time ] [ -e time ] [ -i sec ] [ -f file ]

/usr/lib/sa/sadc [ t n ] [ ofile ]
/usr/lib/sa/sadc R ofile

/usr/lib/sa/sa1 [ t [ n ] ]

/usr/lib/sa/sa2 [ -abBcdghLmnOpqrRSuvwy ] [ -s time ] [ -e time ] [ -i sec ]

Description

sar reports system activity on single-processor systems. cpusar and mpsar report activity on multiprocessor machines.

These utilities report the status of counters in the operating system that are incremented as the system performs various activities. These include counters for CPU utilization, buffer usage, disk I/O activity, TTY device activity, switching and system-call activity, file access, queue activity, inter-process communications, swapping and paging.

See ``Managing performance'' in the Performance Guide for more information about performance tuning.

cpusar reports activity for an individual CPU specified by the cpu argument to the -P option. cpu must be a number in the range from 1, the base processor, to the number of processors configured on the system. cpusar reports the data on CPU utilization, inter-CPU interrupts, and number of locked processes.

mpsar reports activity for an entire SMP system by combining the data of all the CPUs.

sar, cpusar, and mpsar sample cumulative activity counters in the operating system at n intervals of t seconds, where t should be 1 or greater. The default value of n is 1.

Gathering data for later examination

If you specify the -o option, sar and mpsar save the sampled data in a binary format file for later analysis. You can read the file later by specifying it with the -f option.

cpusar cannot save data. It can read a data file written by mpsar. This file contains the per-CPU data required by cpusar.

If you do not specify an input file or time interval, sar, cpusar, and mpsar read the standard system activity daily data file /usr/adm/sa/sadd for the current day of the month (dd).

You can select a section of the report file by specifying the start and end times with the -s and -e options. The time argument to these takes the form hh[:mm[:ss]]. Here hh is the hour (00-23), mm is the minute (00-59), and ss is the second (00-59). An example would be 20:30.
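As an illustration of the time format, the following sketch converts an hh[:mm[:ss]] argument to seconds since midnight and checks that an end time is later than a start time, which is the condition behind the ``Incompatible start and end times'' diagnostic. The function names to_secs and check_range are hypothetical and are not part of sar; this is a POSIX shell and awk sketch, not sar's own parsing code.

```sh
# Hypothetical helper (not part of sar): convert an hh[:mm[:ss]] time
# argument, as accepted by -s and -e, to seconds since midnight.
# Prints nothing and returns 1 if the argument is malformed or out of range.
to_secs() {
    echo "$1" | awk -F: '
        NF < 1 || NF > 3 { exit 1 }
        {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^[0-9][0-9]?$/) exit 1   # one or two digits per field
            h = $1 + 0                               # awk reads "08" as 8, not octal
            m = (NF >= 2) ? $2 + 0 : 0
            s = (NF == 3) ? $3 + 0 : 0
            if (h > 23 || m > 59 || s > 59) exit 1
            print h * 3600 + m * 60 + s
        }'
}

# Hypothetical helper: succeed only if the end time is strictly later than
# the start time, mirroring the "etime <= stime" diagnostic.
check_range() {
    _start=$(to_secs "$1") || return 1
    _end=$(to_secs "$2") || return 1
    [ "$_end" -gt "$_start" ]
}
```

For example, to_secs 20:30 prints 73800, check_range 8:00 18:01 succeeds, and check_range 18:00 8:00 fails.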

The -i option selects records at sec second intervals. Otherwise, all intervals found in the data file are reported.

Any information displayed as ``/s'' is a per second average over the interval t. Each of these values is calculated as the total number of occurrences of the event, over the duration of the interval t, divided by the interval t.
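The calculation can be sketched as follows, assuming two readings of the same cumulative counter taken at the start and end of an interval of t seconds. per_sec is a hypothetical helper, and the sketch ignores counter wraparound.

```sh
# Hypothetical helper: per-second average for a "/s" column, computed from
# two readings of a cumulative counter taken t seconds apart.
per_sec() {
    # $1 = counter at start of interval, $2 = counter at end, $3 = t
    awk -v c0="$1" -v c1="$2" -v t="$3" 'BEGIN {
        if (t + 0 <= 0) exit 1              # t must be positive
        printf "%.2f\n", (c1 - c0) / t      # occurrences / interval length
    }'
}
```

For example, per_sec 1000 1600 60 prints 10.00: 600 events over 60 seconds.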

Options to examine use of the CPU(s)

The following options report on different aspects of CPU activity:

-q, -R, -u

On SMP systems, the following options may also be used:

-F, -I, -j, -Q

See ``Tuning CPU resources'' in the Performance Guide for more information.

Options to examine use of memory

The following options report on different aspects of virtual memory activity and usage:

-p, -r, -v, -w

See ``Tuning memory resources'' in the Performance Guide for more information.

Options to examine use of I/O

The following options report on different aspects of I/O activity including file access, buffer cache, block devices, namei cache, and serial devices:

-a, -b, -B, -d, -g, -h, -n, -O, -S, -y

See ``Tuning I/O resources'' in the Performance Guide for more information.

Options to examine system call utilization

The following options report on different aspects of system call activity:

-c, -L, -m

See ``Tuning system call activity'' in the Performance Guide for more information.

sar, cpusar, and mpsar options

The following options are common to sar, cpusar, and mpsar:

-A

Equivalent to specifying all the options that do not require arguments.

-u

Report CPU utilization (the default report if no other is specified):

%usr: percentage of time running in user mode
%sys: percentage of time running in system mode
%wio: percentage of time idle with processes waiting for I/O
%idle: percentage of time otherwise idle

On a typical timesharing system, %sys and %usr have about the same value. In special applications, either of these can be larger than the other without anything being abnormal, depending on the number of system calls that the application mix makes.

A value of %wio consistently greater than 15% generally means a disk bottleneck. If the value of %idle is close to 0, and the value of %sys is high and much greater than %usr, the bottleneck may be caused by excessive swapping and paging due to a memory shortage.

If the value of %idle is consistently close to 0, together with degraded response time for the users, this may imply that the system is swapping or paging due to memory shortage.

%idle is normally higher than 40% on multiuser systems, even those with a large number of active users. When this figure falls consistently below 30%, the critical resource is processor power. (Run ps(C) to check if the excessive CPU usage is due to a runaway process that is stealing every spare CPU cycle.)
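The rules of thumb above can be collected into a small check on one sar -u sample. This is a sketch only: the function name cpu_hints is hypothetical, ``close to 0'' is taken to mean below 5%, and ``much greater'' is taken to mean more than twice %usr. Those two cutoffs are assumptions, not values from this page, and the advice applies to values that persist across many samples rather than to a single reading.

```sh
# Hypothetical helper: apply the rule-of-thumb thresholds from the -u
# description to a single sample. Arguments: %usr %sys %wio %idle.
# The 5% and 2x cutoffs in the last test are assumptions.
cpu_hints() {
    awk -v usr="$1" -v sys="$2" -v wio="$3" -v idle="$4" 'BEGIN {
        usr += 0; sys += 0; wio += 0; idle += 0   # force numeric comparison
        if (wio > 15)
            print "%wio above 15: possible disk bottleneck"
        if (idle < 30)
            print "%idle below 30: processor power may be the critical resource"
        if (idle < 5 && sys > 2 * usr)
            print "%idle near 0 with high %sys: check for swapping and paging"
    }'
}
```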

If you are running a large number of users, it may help to use intelligent serial cards or network terminal concentrators to take some of the burden off the CPU.

Database servers or machines running computationally intensive processes should normally be expected to run with a very low value of %idle at peak usage.

In addition, examine the crontab(C) files to see which jobs can be rescheduled to run at off-peak times. Encourage users to run large, non-interactive commands at off-peak hours using batch(C). Run commands at a lower priority using nice(C), or use renice(C) to set a lower priority on existing processes.

On SMP systems, a CPU's usage is reported as ``idle'' if it has been turned off with cpuonoff(ADM).

sar and mpsar options

Options that control the report activity for both sar and mpsar are:

-a

Report use of file access operations:

iget/s: number of files located by inode entry per second
namei/s: number of filesystem path searches per second
dirblk/s: number of directory block reads issued per second

The larger the values reported, the more time the kernel spends accessing user files. This indicates how heavily programs and applications are using the filesystem(s). Since pathnames usually have several components, iget/s is normally greater than namei/s. In general, if the ratio of iget/s to namei/s is consistently high, the organization of your filesystem may be inefficient.

-b

Report buffer cache activity:

bread/s: number of kilobytes read per second from disk and other block devices which were not found in the buffer cache
lread/s: total number of kilobytes read per second
%rcache: read cache hit percentage, (lread/s - bread/s) × 100 / lread/s
bwrit/s: number of kilobytes written per second from the buffer cache to block devices by the buffer flushing daemon (bdflush)
lwrit/s: total number of kilobytes written per second
%wcache: write cache hit percentage, (lwrit/s - bwrit/s) × 100 / lwrit/s
pread/s: number of reads per second via the raw (physical) device mechanism
pwrit/s: number of writes per second via the raw (physical) device mechanism

The cache hit ratios, %rcache and %wcache, measure the effectiveness of system buffering. If %rcache and %wcache are consistently low, it may be possible to improve performance by increasing the number of buffers and buffer hash queues (by adjusting the NBUF and NHBUF tunable parameters). If your application is I/O-intensive and you have a large memory configuration, you may want to tune the buffer cache so that the values of %rcache and %wcache are high.
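Both hit ratios use the same arithmetic: the logical rate minus the physical rate, times 100, divided by the logical rate. A minimal sketch follows, with the hypothetical function name cache_pct; printing n/a when there was no logical activity is an assumption and may not match what sar itself prints in that case.

```sh
# Hypothetical helper: cache hit percentage from one sar -b sample.
#   %rcache: cache_pct lread/s bread/s
#   %wcache: cache_pct lwrit/s bwrit/s
cache_pct() {
    awk -v l="$1" -v p="$2" 'BEGIN {
        if (l + 0 == 0) { print "n/a"; exit }   # no logical I/O in the interval
        printf "%.1f\n", (l - p) * 100 / l
    }'
}
```

For example, cache_pct 1000 50 prints 95.0, since 950 of every 1000 kilobytes read were satisfied from the buffer cache.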

sar should be run for the entire duration of an I/O-intensive application to establish if the size of the buffer cache is adequate.

-B

Report copy buffer activity:

cpybuf/s: number of copy buffers required per second
slpcpybuf/s: number of times per second that processes had to sleep while waiting for a copy buffer

Copy buffers are needed by some DMA controllers, disk controllers, and SCSI host adapters that cannot perform DMA to buffers that lie above the first 16MB of memory. If slpcpybuf/s is consistently greater than 0, increase the number of copy buffers (set by the multiphysical buffer parameter NMPBUF), increase the percentage of buffers in the first 16MB of memory (set by PLOWBUFS), or upgrade to controllers that support 32-bit DMA addressing.
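Several options on this page advise acting when a value is ``consistently greater than 0''. One way to check that over a saved series, sketched here under the assumption that ``consistently'' means every sample, is to feed one column of values to a filter that succeeds only if all of them are positive. The function name consistently_positive is hypothetical; extracting the column from sar output is not shown, since this page does not specify the exact report layout.

```sh
# Hypothetical filter: read one numeric value per line on standard input
# (for example, successive slpcpybuf/s samples) and exit 0 only if there
# was at least one value and every value was greater than 0.
consistently_positive() {
    awk 'NF { n++; if ($1 + 0 <= 0) bad = 1 }
         END { exit (n == 0 || bad) }'
}
```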

-c

Report system calls:

scall/s: total number of system calls per second
sread/s: number of read(S) calls per second
swrit/s: number of write(S) calls per second
fork/s: number of fork(S) calls per second
exec/s: number of exec(S) calls per second
rchar/s: number of characters read per second (by read)
wchar/s: number of characters written per second (by write)

This report is of interest mostly to programmers who are testing application programs.

Typically, reads plus writes account for about half of the total system calls, although this varies greatly with the activities that are being performed by the system. If scall/s is high over an extended period of time, or the number of characters read or written per read or write call is low, this may indicate inefficient application code.
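Both checks in this paragraph are simple ratios of the -c columns: (sread/s + swrit/s) as a percentage of scall/s, and rchar/s divided by sread/s (or wchar/s divided by swrit/s) for the average number of characters transferred per call. A sketch with hypothetical function names; what counts as ``low'' depends on the application.

```sh
# Hypothetical helper: reads plus writes as a percentage of all system calls.
rw_share() {
    # $1 = scall/s, $2 = sread/s, $3 = swrit/s
    awk -v sc="$1" -v r="$2" -v w="$3" 'BEGIN {
        if (sc + 0 == 0) { print "n/a"; exit }
        printf "%.1f\n", (r + w) * 100 / sc
    }'
}

# Hypothetical helper: average characters per call,
# e.g. chars_per_call rchar/s sread/s or chars_per_call wchar/s swrit/s.
chars_per_call() {
    awk -v chars="$1" -v calls="$2" 'BEGIN {
        if (calls + 0 == 0) { print "n/a"; exit }
        printf "%.0f\n", chars / calls
    }'
}
```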

Note that vmstat(C) also provides cumulative statistics about system calls executed (-s option) and number of forks (-f option).

-d

Report activity for each block device, for example, disk drives, floppy disk drives, and SCSI devices including tape, CD-ROM, and floptical drives. In the report, the device column shows the device type and number; for example, Sdsk0 represents the first SCSI disk.

The activity data reported is:

device: name of the device whose activity is being reported
%busy: percentage of time device was busy servicing a transfer request
avque: average number of requests outstanding
r+w/s: number of data transfers to and from the device per second
blks/s: number of 512-byte blocks transferred per second
avwait: average time, in milliseconds, that transfer requests wait idly in the queue
avserv: average time, in milliseconds, for request to be serviced (for disks, this includes seek, rotational latency, and data transfer times)
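Because blks/s counts 512-byte blocks and r+w/s counts transfers, dividing one by the other gives the average size of a transfer, which can help distinguish many small requests from fewer large ones. This derived figure is not printed by sar; the sketch and its name, kb_per_xfer, are illustrative only.

```sh
# Hypothetical helper: average kilobytes per transfer for one device,
# from the blks/s and r+w/s columns of sar -d.
kb_per_xfer() {
    awk -v blks="$1" -v xfers="$2" 'BEGIN {
        if (xfers + 0 == 0) { print "n/a"; exit }   # device idle
        printf "%.1f\n", blks * 512 / 1024 / xfers   # 512-byte blocks -> KB
    }'
}
```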

Note that avque and avwait are measured only while the queue is occupied. If %busy is small, large queues and service times probably represent periodic flushes of the buffer cache to disk.

The optimum setup is to keep %busy high and avque low by balancing filesystems and swap areas across all disks, disk controllers, and host adapters.

-g

Report on serial I/O:

ovsiohw/s: number of serial I/O hardware interrupt overruns per second
ovsiodma/s: number of serial I/O DMA cache overflows per second
ovclist/s: number of character list buffer overflows per second

If the value of ovclist/s (and warnings on the console) shows that the character list buffers are overflowing, increase the value of NCLIST. You can increase the number of clist buffers on a temporary basis using setconf(ADM) to avoid having to relink and reboot the kernel immediately.

-h

Report scatter-gather and physical I/O, and DMA transfer buffer statistics:

mpbuf/s: number of filesystem scatter-gather buffers allocated per second
ompb/s: number of times the system ran out of scatter-gather buffers per second
mphbuf/s: number of scatter-gather request headers allocated per second
omphbuf/s: number of times the system ran out of scatter-gather request headers per second
pbuf/s: number of asynchronous I/O (AIO) buffers allocated per second
spbuf/s: number of times per second processes had to sleep waiting for AIO buffers
dmabuf/s: number of DMA transfer buffers allocated per second
sdmabuf/s: number of times per second processes had to sleep waiting for DMA transfer buffers

If the value of ompb/s or sdmabuf/s is consistently greater than 0, increase the value of the multiphysical buffer parameter NMPBUF.

-L

Report on those latches whose information changes during the sampling interval:

name: name of the latch
sleep/s: number of times per second processes had to sleep due to blocking on the latch
usp-sl/s: number of times per second that processes blocked spinning on the latch are made to sleep because another process has acquired the latch and set it to sleep-type
ksp-sl/s: number of times per second that the kernel modified the latch to sleep-mode blocking
sp-acq: number of times that processes blocked spinning on the latch acquired it

-m

Report System V interprocess communication (IPC) message queue and semaphore activity:

msg/s: number of message primitives per second
sema/s: number of semaphore primitives per second

If you are not running application programs that use message queues and semaphores, these should print as 0.00. If you are using these facilities and the value of sema/s is high, the application may not be using IPC efficiently.

-n

Report namei cache statistics:

c_hits: number of namei cache hits
cmisses: number of namei cache misses
hit %: namei cache hit percentage, c_hits × 100 / (c_hits + cmisses)

The namei cache reduces the time required to search a full pathname when first accessing a file. Generally, the higher the value of hit %, the better. If hit % is consistently low, increase the value of CACHEENTS. Note that pathname elements longer than 14 characters are not cached.
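Unlike %rcache, hit % divides by the total number of lookups (hits plus misses) rather than by a separate total column. A minimal sketch with the hypothetical name namei_hit_pct:

```sh
# Hypothetical helper: namei cache hit percentage,
# c_hits * 100 / (c_hits + cmisses).
namei_hit_pct() {
    awk -v hits="$1" -v misses="$2" 'BEGIN {
        if (hits + misses == 0) { print "n/a"; exit }   # no lookups
        printf "%.1f\n", hits * 100 / (hits + misses)
    }'
}
```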

-O

Report asynchronous I/O (AIO) requests:

read/s: number of AIO read requests per second
write/s: number of AIO write requests per second
blks/s: total number of blocks being handled asynchronously (both read and write) per second
%direct: percentage of AIO requests being passed to the relevant disk driver by the POSIX.1b aio functions. Requests that are not passed directly to the disk driver are handled by the aio(HW) driver

-p

Report paging activities:

vflt/s: address translation page faults (valid page not in memory) per second
pflt/s: page faults per second caused by attempts to write to a page marked ``copy-on-write'' (COW), or by protection errors (illegal access to page)
pgfil/s: address translation faults per second satisfied by paging in from filesystem
rclm/s: pages added to the free list per second

High values of vflt/s can indicate that the application programs that you are running are not efficient for a paging system due to poor locality of reference.

-q

Report average run and swap queue length while occupied, and percentage of time occupied:

runq-sz: number of runnable processes in memory
%runocc: percentage occupancy of the run queue in memory
swpq-sz: number of runnable processes on swap
%swpocc: percentage occupancy of the run queue on the swap device(s)

If runq-sz is greater than 2 and %runocc is greater than 90%, the CPU is heavily loaded and response time may be degraded. In this case, upgrading to a more powerful CPU or a multiprocessing configuration might improve system response time. If your applications perform a large number of floating-point calculations, you should consider adding floating-point hardware if this does not already exist on your system.
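The test in the first sentence can be written as a one-line check on a sar -q sample. The function name is hypothetical; as with the other guidelines, the result is meaningful only when the condition holds across many samples.

```sh
# Hypothetical check: succeed (exit 0) if runq-sz > 2 and %runocc > 90,
# the heavy-load condition described for sar -q; fail otherwise.
cpu_saturated() {
    awk -v runq="$1" -v occ="$2" \
        'BEGIN { exit ((runq + 0 > 2 && occ + 0 > 90) ? 0 : 1) }'
}
```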

If %swpocc is non-zero, your system has swapped out processes; adding more memory or reducing the number of buffers may help reduce swapping and paging activity.

-r

Report unused memory pages and swap area disk blocks:

freemem: number of 4KB pages available to user processes
freeswp: number of physical disk (512-byte) blocks available for swapping and paging

The Average statistics also display a count of the number of samples taken.
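Since freemem is reported in 4KB pages and freeswp in 512-byte blocks, converting both to megabytes makes them easier to compare with installed memory and swap size. A sketch with hypothetical function names:

```sh
# Hypothetical conversions for sar -r output.
pages_to_mb() {   # freemem: 4KB pages -> MB
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 4 / 1024 }'
}
blocks_to_mb() {  # freeswp: 512-byte blocks -> MB
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 512 / 1048576 }'
}
```

For example, a freemem of 2560 is 10.0MB, and a freeswp of 204800 is 100.0MB.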

If freemem drops below GPGSLO at a clock tick, the page stealer (vhand) starts adding clean (that is, text or unchanged) or dirty (changed) pages to the free list until freemem is greater than GPGSHI; it also copies dirty pages to swap. If freemem drops to zero between clock ticks, the swapper (sched) becomes active and starts swapping out whole processes to disk.

Steady drops in the value of freeswp also indicate that the system is swapping or paging. The -l option to swap(ADM) shows the total size of the swap area(s) in blocks.

-R

Report on process scheduling activity:

dptch/s: number of times the dispatcher is run per second
idler/s: number of times the idler is run per second
swidle/s: number of times the idler is switched to per second

-S

Report SCSI request block statistics:

reqblk/s: number of SCSI request blocks allocated per second
oreqblk/s: number of times per second that the system ran out of SCSI request blocks

If oreqblk/s is consistently greater than 0, increase the value of SDSKOUT using configure(ADM).

-v

Report status of kernel tables:

proc-sz: used entries and grown size of process table
inod-sz: used entries and grown size of in-core inode table
file-sz: used entries and grown size of open file table
lock-sz: used entries and grown size of record lock table
ov: overflows between sampling points for each table

If set to values greater than zero, the tunables MAX_PROC, MAX_INODE, MAX_FILE, and MAX_FLCKREC determine the maximum possible size to which the process, in-core inode, file, and record lock tables may grow. The pstat(C) command provides similar information in a different format.

-w

Report system swapping and switching activity:

swpin/s: number of transfers from swap into memory per second (includes initial loading of some programs)
bswin/s: number of 4KB pages transferred from swap into memory per second (includes initial loading of some programs)
swpot/s: number of transfers from memory to swap per second
bswot/s: number of 4KB pages transferred from memory to swap per second
pswch/s: context switches per second

If swpot/s is greater than zero, you may need to increase the amount of memory or reduce the number of buffers. Disk accesses are significantly slower than memory accesses, so minimizing activity to swap is important for good system performance. The -s option of vmstat(C) also provides information about swapping and context-switching activity.

-y

Report terminal (tty) device activity:

rawch/s: number of raw input characters per second
canch/s: number of input characters processed per second in the canonical input queue
outch/s: number of characters output per second
rcvin/s: number of receive hardware interrupts per second
xmtin/s: number of transmit hardware interrupts per second
mdmin/s: number of modem interrupts per second

Not all terminal drivers are written to produce these statistics. Most SCO OpenServer serial, console, and pseudo-terminal drivers, as well as drivers produced by third-party vendors, produce the rawch/s, canch/s, and outch/s statistics.

The ratios of rcvin/s to rawch/s, rcvin/s to canch/s, and xmtin/s to outch/s should be fairly constant. For non-intelligent I/O cards, these ratios should be close to 1. Intelligent I/O drivers move many characters per interrupt, and some drivers do not even use interrupts, so the ratios should be lower in these cases.
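Each of these ratios is interrupts divided by characters for the same sample, for example rcvin/s divided by rawch/s, or xmtin/s divided by outch/s. A sketch with a hypothetical function name: a result near 1 suggests one interrupt per character, as expected for a non-intelligent card, while a much smaller value is typical of intelligent drivers.

```sh
# Hypothetical helper: interrupts per character from one sar -y sample,
# e.g. int_per_char "$rcvin" "$rawch" or int_per_char "$xmtin" "$outch".
int_per_char() {
    awk -v ints="$1" -v chars="$2" 'BEGIN {
        if (chars + 0 == 0) { print "n/a"; exit }   # no characters moved
        printf "%.2f\n", ints / chars
    }'
}
```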

If the number of interrupts per transmitted character starts to increase dramatically, this may indicate a bad line generating extraneous interrupts. A faulty modem line may be indicated if mdmin/s is significantly greater than 0.

cpusar options

The following option is specific to cpusar:

-j

Report number of interrupts serviced by each interrupt handler for the specified CPU:

vector: name of interrupt handler
ints/s: number of interrupts per second serviced by the handler

mpsar options

The following option is specific to mpsar:

-F

Report on floating-point activity:

prfp: number of processes requiring floating-point hardware
%prfp: percentage of processes requiring floating-point hardware
prfpem: number of processes requiring floating-point emulation
%prfpem: percentage of processes requiring floating-point emulation

cpusar and mpsar options

The following options are common to cpusar and mpsar:

-I

Report on inter-CPU activity:

cpuint_snd/s: inter-CPU interrupts sent per second
cpuint_rcv/s: inter-CPU interrupts received per second
IOcpuints/s: inter-CPU interrupts per second for I/O

-Q

Report number of processes locked to processors. This produces a snapshot of activity at the end of the specified period:

pltoCPU: number of processes locked to processors
%pltoCPU: percentage of total number of processes running on the system locked to processors

Data gathering scripts

sadc and the shell scripts, sa1 and sa2, can be used to sample, save, and process system activity data.

sadc is a data collector. It samples system data n times, with an interval of t seconds between samples, and writes in binary format to ofile or to standard output. The sampling interval t should be greater than 5 seconds; otherwise, the activity of sadc itself may affect the sample. If t and n are omitted, a special record is written.

The R flag to sadc marks the time at which the system counters are reset to 0 when the system goes to multiuser mode. The /etc/init.d/perf file writes the restart mark to the daily data file using the command:

   su sys -c "/usr/lib/sa/sadc R /usr/adm/sa/sa`date +%d`"

sa1 collects and stores data in the binary file /usr/adm/sa/sadd where dd is the current day of the month. The arguments t and n cause records to be written n times at an interval of t seconds, or once if n is omitted.

sa2 writes a readable daily report to the file /usr/adm/sa/sardd.

Enabling and disabling system activity recording

Use the sar_enable(ADM) command to enable or disable system activity recording.

If system activity recording is enabled, the following entries in /usr/spool/cron/crontabs/sys (see crontab(C)) produce records every 20 minutes during working hours and hourly otherwise:

   0 * * * 0-6 /usr/lib/sa/sa1
   20,40 8-17 * * 1-5 /usr/lib/sa/sa1
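To make the schedule concrete: the first entry fires at the top of every hour on every day, and the second adds runs at 20 and 40 minutes past each hour from 8 through 17 on weekdays, so a weekday gets 24 + (2 × 10) = 44 sa1 records and a weekend day gets 24. The helper below, whose name is hypothetical, simply encodes that arithmetic.

```sh
# Hypothetical helper: number of sa1 runs per day produced by the two
# crontab entries above. $1 is the crontab day of week (0 = Sunday).
sa1_runs() {
    case $1 in
        [1-5]) echo $((24 + 2 * 10)) ;;  # hourly, plus :20 and :40 for hours 8-17
        [06])  echo 24 ;;                # hourly only
        *)     return 1 ;;               # not a valid day of week
    esac
}
```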

The following /usr/spool/cron/crontabs/root entry runs sa2 at 18:05 each weekday to produce a readable report of all activities (-A) between 8:00 and 18:01, at 20-minute (1200-second) intervals:

   5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

Running the sa1 and sa2 scripts in this way keeps a record of the system's daily activity for the previous month in the directory /usr/adm/sa, as binary data files (sadd) and ASCII reports (sardd).

Exit values

sar exits with 0 upon successful completion. It exits with value 2 if an invalid option is specified. It exits with value 1 for all other errors.
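A script that runs sar can branch on these exit values. A minimal sketch; the function name sar_status is hypothetical, and only the three documented values are distinguished.

```sh
# Hypothetical helper: describe a sar exit status.
# Typical use:  sar -u 5 3 > /tmp/cpu.out; sar_status $?
sar_status() {
    case $1 in
        0) echo "success" ;;
        2) echo "invalid option" ;;
        *) echo "error" ;;      # documented as 1 for all other errors
    esac
}
```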

Diagnostics

The following messages are common to sar, cpusar, and mpsar:

sar: Incompatible start and end times specified (etime <= stime): Start and end times have been specified but the end time is the same as, or before, the start time.
sar: Time step and/or number of steps requested are invalid: The time interval or number of intervals specified are not integer values, are negative values or are otherwise invalid.
sar: Can't open filename: The input file specified with the -f option cannot be opened.
sar: ofile same as ffile: The specified input and output files are identical.
sar: argument -- illegal argument for option option: The argument specified for option is invalid.
sar: cannot open namelist file: The namelist file (/unix) cannot be opened.
sar: read error in namelist file: The namelist file (/unix) cannot be read.
sar: namelist not in a.out format: The namelist file (/unix) is not in a readable format.
sar: read error in section headers: A read error occurred when reading the section headers for the namelist file (/unix).
sar: .text, .data, or .bss was not found in section headers: A region was not found in the section headers for the namelist file (/unix).
sar: cannot allocate space for namelist: Space could not be allocated for the kernel namelist.
sar: error in seeking to string table: An error occurred while trying to find the string table in the namelist file.
sar: read error for string table size: An error occurred while trying to read the string table size from the namelist file.
sar: cannot allocate space for string table: Space could not be allocated for the string table.
sar: read error in string table: An error occurred while trying to read the string table in the namelist file.

The following diagnostic messages are specific to cpusar:

cpusar: Cannot monitor CPU cpu: An unserialized CPU has been specified with the -P option. cpu must be between 1 and the number of serialized CPUs. CPUs inactivated with cpuonoff remain valid.
cpusar: This is not an SMP system: The installed system does not have an SCO OpenServer SMP® License.

Examples

To see today's CPU activity so far (if activity recording is enabled on your system):

   sar

To watch CPU activity evolve for 10 minutes and save data once per minute:

   sar -o /tmp/sartmp 60 10

To review activity on block devices from that period:

   sar -d -f /tmp/sartmp

To watch interrupt handler activity on the first additional CPU of a multiprocessor system for 10 minutes at 10-second intervals:

   cpusar -P 2 -j 10 60

To watch CPU activity on a multiprocessor system evolve for 10 minutes and save data once per minute:

   mpsar -o /tmp/mpsartmp 60 10

To review activity on block devices from that period:

   mpsar -d -f /tmp/mpsartmp

To review base CPU utilization and interrupt activity from that period:

   cpusar -P 1 -u -j -f /tmp/mpsartmp

Limitations

Suggested numeric values quoted for sar statistics should be treated as examples only. They may not be desirable, optimal, or achievable on all systems. A desired performance goal should take into account the mix of applications running on a system and the underlying limitations imposed by hardware.

Running multiple copies of sar can affect the results. Data collection is performed automatically by the kernel, and the data is extracted using sar. It is the extraction process, not the collection, that consumes resources; therefore, results produced when running multiple copies might not reflect the actual performance of the system.

The current version of sar is compatible with older versions of sar. Any data files saved with older versions can be read with the current version.

Files

/usr/adm/sa/sadd: daily data file
/usr/adm/sa/sardd: daily report file
/usr/lib/sa/sa.adrfl: address file

Standards conformance

sa1, sa2, sadc and sar are conformant with AT&T SVID Issue 2.

cpusar and mpsar are not part of any currently supported standard; they are extensions of AT&T System V provided by The Santa Cruz Operation, Inc.