DESCRIPTION
proc rangebar draws one boxplot (also known as box-and-whisker plot).
This is a compact way to display the range and distribution of data.
A rangebar includes a box. The median is displayed within the box, and
the extent of the box shows the interquartile range (from the 25th to the 75th percentile).
Tails at either end show either the 5th and 95th percentiles, the minima and maxima,
or the extent of 1.5 interquartile ranges.
proc rangebar may also be used to show means and standard deviations.
FEATURES
Can compute the necessary statistics, or accept externally calculated ones.
Can output the computed statistics; two formats available.
Can produce rangebars based on median and quartiles, or
mean and standard deviations.
Rangebars may be vertical or horizontal.
Allows appearance control over all lines, the bar, and the median indicator.
Automatic display of N as well as number of missing observations.
Display and reporting of outliers.
EXAMPLES
See the Gallery Rangebar page
VARIABLES THAT ARE SET
NVALUES is set to N. If nothing was plotted, this will be equal to 0.
The following variables are set as soon as statistics are computed, whether or
not a plot is actually drawn.
RANGEBARMEDIAN is set to hold the median value.
This might be useful in drawing a line for comparing medians.
Not set when using meanmode.
RANGEBARMEAN is set to hold the mean value.
This might be useful in drawing a line for comparing means.
Set only when using meanmode.
RANGEBARIQRMIN and RANGEBARIQRMAX are set to hold the
values of the bottom and top of the box. Not set when using meanmode.
RANGEBARMIN and RANGEBARMAX are set to hold the
values at the ends of the tails.
UNPLOTTABLE DATA
Invalid values are omitted (however, the number of invalid values may be shown
using the printmissing attribute).
Rangebars that lie completely out of the plotting
area are omitted with a warning.
If outliers are being plotted, outliers that are out of range
are omitted with a warning.
Unless truncate is set to no,
bars and tails are truncated to the bounds of the
plotting area.
MODES
This proc can calculate the median, quartiles, etc.
(use the datafield attribute) before drawing the box plot.
Or it can plot from pre-calculated descriptive statistic values,
if the values or plotfields attributes are used.
There is also meanmode, which calculates only the mean and
standard deviation and draws an error bar.
To turn on display of outliers, set showoutliers to yes.
Computed statistics may be written to a file, stderr, etc.
This proc may also be used only to compute statistics
(and set variables)
without doing any plotting at all; see statsonly.
PREREQUISITES
A plotting area must be set up using proc areadef
and proc getdata must be executed to
access or define some data.
MANDATORY ATTRIBUTES
None. Default behavior is for statistics to be calculated from
data field 1 to produce a vertical rangebar against the Y axis
at X location 1.0 and for an N= label to be placed just
above the X axis.
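The default behavior described above can be sketched as a complete script. This is an illustrative sketch only, not tested against a particular ploticus version: the data values and the areadef rectangle, ranges, and stub increment are made up for the example, and proc rangebar is invoked with no attributes so that all of its defaults apply.

```
// Hypothetical data: one field, one observation per row.
#proc getdata
data:
3
5
6
7
8
9
10
11
12
14
15
18

// Hypothetical plotting area; X range chosen so that the default
// bar location (X = 1.0) falls inside it.
#proc areadef
rectangle: 1 1 3 4
xrange: 0 2
yrange: 0 20
yaxis.stubs: inc 5

// No attributes: statistics come from data field 1, and a vertical
// rangebar is drawn at X location 1.0 with an N= label.
#proc rangebar
```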
ATTRIBUTES
axis x | y
-
-
Determines which axis to plot against.
x results in horizontal rangebars, while
y results in vertical ones. Default is y.
Example: axis: x
datafield dfield
-
-
Specifies the data field on which to compute descriptive statistics.
Example: datafield: 2
barloc plotvalue
-
-
Location where the rangebar is to be rendered.
For vertical boxplots this is a plottable value in X;
for horizontal boxplots this is a plottable value in Y.
Example: barloc: 3
barwidth n
-
-
The width of the box portion of the rangebar, in absolute units.
Default is 0.2 inches.
Example: barwidth: 0.1
mediansym symboldetails | line
-
-
Specifies the symbol that will be displayed to show the median.
May be a symbol specification (to get dots, etc.) or line which
is the default.
Example: mediansym: shape=diamond
values 5thpercentile 25th median 75th 95th [N]
-
-
Specify pre-computed descriptive statistics that should just
be plotted. 5 plottable values should be given. If a 6th value
is given it is taken to be N, and printn is implied.
Min/max may be substituted for the 5th and 95th percentiles
if desired.
Example: values: 0.3 3.0 5.3 6.2 9.4 236
plotfields 5thpercentile 25th median 75th 95th [N]
or with meanmode: plotfields mean stddev [N]
-
-
Similar to values, but rather than literal values, a set of dfields is given.
The data will be accessed from one row, which may be
specified using plotrecord. (If plotrecord is not specified,
data will be taken from the first row by default. Proc processdata
action: select may be used to isolate one row of data.)
Example:
plotfields: 1 2 3 4 5
plotrecord: 1
plotrecord n
-
-
Used with plotfields; indicates which data row to get the values from.
First row is 1. Example: see plotfields above.
If not specified, first row is assumed.
tailmode 5/95 | minmax | 1.5iqr
-
-
Specifies whether the rangebar tails are to extend to the 5th and 95th percentile,
to the min and max, or using 1.5 x IQR.
Only relevant when statistics are computed internally.
Default is 5/95.
Note: 1.5iqr mode causes the tails to be rendered using the IQR, which is the
interquartile range (the difference between the 75th and 25th percentiles).
This follows the formal Tukey specification, in that the
lower tail extends from the bottom of the box downward to the lowest data
point on or above (25th - (1.5 x IQR)). The upper tail likewise extends from
the top of the box upward to the highest data point on or below (75th + (1.5 x IQR)).
Example: tailmode: minmax
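As a hedged illustration of the Tukey-style tails, the fragment below combines tailmode: 1.5iqr with showoutliers, so that any points lying beyond the ends of the tails are drawn. It assumes that proc getdata and proc areadef have already been executed; the field number and bar location are hypothetical.

```
// Assumes data field 2 holds the values to summarize (hypothetical).
#proc rangebar
datafield: 2
barloc: 1
tailmode: 1.5iqr
showoutliers: yes
```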
95tics yes | no
-
-
If tailmode is minmax, allows display of 5th and 95th percentile
by adding tics.
Example: 95tics: yes
taildetails linedetails
-
-
Controls color, width, etc. of tail lines.
Example: taildetails: color=blue width=1.8
outline yes | no
-
-
If yes, box is outlined with a line.
outlinedetails linedetails
-
-
Controls color, width, etc. of box outline.
color color
-
-
Specifies the color of the box area.
Example: color: yellow
truncate yes | no
-
-
If yes, bars are truncated to plotting area.
Default is yes.
ticlen n
-
-
Length, in absolute units, of the tics that appear at the ends of the tails.
Default is 70% of the width of the bar.
meanmode yes | no
-
-
If yes, mean and standard deviation are computed and drawn as
an error bar, instead of a boxplot of median/quartiles, etc.
Mean is shown as a point (style can be controlled using mediansym).
Number of standard deviations may be controlled using nstddevs.
Cannot be used with values; use proc bars instead.
Appearance of the lines may be controlled using taildetails.
logmean yes | no
-
-
If yes, mean and standard deviation are computed in log space.
Useful with data having log characteristics; when plotted in
log space the standard deviations will appear equidistant from the mean.
If log+1 scaling is in effect for the plot, then this feature
will operate in log+1 space (allowing 0.0 values).
skipmed yes | no
-
-
When using meanmode, this attribute may be specified as yes to speed things
up a bit by avoiding computation of medians and percentiles.
Default is no.
nstddevs n
-
-
Used with meanmode. Specifies the number of standard deviations
to use in each direction when drawing the error bar. Default is 1.
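The mean-related attributes above might be combined as in the following fragment. It is a sketch under stated assumptions: proc getdata and proc areadef have already run, and the field number, bar location, and symbol and line settings are hypothetical choices rather than recommended values.

```
// Error bar of mean +/- 2 standard deviations (hypothetical settings).
#proc rangebar
datafield: 2
barloc: 1
meanmode: yes
nstddevs: 2
// Skip median/percentile computation, which meanmode does not need.
skipmed: yes
// In meanmode the mean point is styled through mediansym.
mediansym: shape=circle style=filled fillcolor=red radius=0.04
taildetails: color=blue width=1.8
```

For data with log characteristics, logmean: yes could be added to the same fragment so that the mean and standard deviation are computed in log space.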
meansym yes | no | symboldetails
-
-
Specifies a symbol to be placed at the mean, when a median-based rangebar
is being rendered. If yes, a default symbol (a small black dot)
will be placed at the mean
(its symbol specification is: shape=circle style=filled fillcolor=black radius=0.02).
Other symbols may be rendered by giving
other symboldetails specifications. If no, a mean symbol will not
be rendered. Default is no.
Example: meansym: yes
select conditional-expression
-
-
Only relevant when computing statistics internally.
Allows cases to be selected for inclusion using a selection expression.
statsonly yes | no
-
-
If yes, statistics will be computed and internal @variables set, but nothing
will be plotted.
PERTAINING TO OUTPUT OF COMPUTED STATS
showstats yes | no | only
-
-
If yes or only, all the computed descriptive statistics
will be written to the showstatsfile.
If only, the statistics will be printed but no bars will be drawn.
showbriefstats yes | no | only
-
-
If yes, and if N > 0, the most important computed descriptive statistics
will be written to the showstatsfile. All fields are on one line, TAB-delimited,
for convenient use by other programs.
The fields are in this order:
-
-
tag, datafield, N, mean, standard deviation, median, min, max, #missing.
-
-
The tag contents may be set using the briefstatstag attribute; if this attribute is not
set then the first result field will be datafield.
briefstatstag text
-
-
Set the contents of the tag that will be written at the beginning of each 'briefstats' record.
This may be useful in identifying cases and groups in the 'brief stats' output.
showstatsfile filename
-
-
If specified, statistics will be written to this file.
If not specified, statistics will be written to the diagnostic stream,
usually stderr.
File will be opened in append mode, so the caller may want
to remove previous contents of the file before invoking ploticus.
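A possible way to collect one brief statistics record per rangebar is sketched below. The tag text and file name are hypothetical, and the fragment assumes that proc getdata and proc areadef have already been executed.

```
// Writes one TAB-delimited record (tag, datafield, N, mean, standard
// deviation, median, min, max, #missing) to the named file.
#proc rangebar
datafield: 2
barloc: 1
showbriefstats: yes
briefstatstag: groupA
showstatsfile: stats.out
```

Because the file is opened in append mode, repeated runs add records to stats.out; removing the file beforehand keeps the output limited to the current run.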
PERTAINING TO THE N= and MISSING= LABELS
printn yes | no
-
-
If yes, a label showing N (the number of observations) is produced.
Default is yes.
Example: printn: no
nlocation locvalue
-
-
Where to position the N label. The label will be aligned with the rangebar.
For vertical rangebars location indicates where to place the label
in Y; for horizontal, X.
Example: nlocation: -4
nword string
-
-
A template that determines the format of the N label.
Default is N=@@N.
The N value is substituted for the @@N symbol.
Example: nword: (N = @@N)
printmissing yes | no
-
-
If yes, a label showing the number of non-plottable (missing) values is produced.
Only relevant if statistics are calculated internally.
Example: printmissing: yes
mlocation locvalue
-
-
Where to position the missing values label.
The label will be aligned with the rangebar.
For vertical rangebars location indicates where to place the label
in Y; for horizontal, X.
mword string
-
-
A template that determines the format of the label showing number of
missing values.
Default is M=@M.
The number of missing values is substituted for the @M symbol.
Example: mword: (@M missing)
ntextdetails textdetails
-
-
Set the size, color, font or fine-tune the position of the N labels.
Alternate name for backward compatibility: textdetails.
Example: ntextdetails: size=8 style=I
mtextdetails textdetails
-
-
Set the size, color, font or fine-tune the position of the missing labels.
Example: mtextdetails: size=8 adjust=0,0.05
mwhenexists yes | no
-
-
If yes, the missing label will be displayed only if the number of
missing observations is greater than 0. Default is no.
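The label attributes in this section could be used together roughly as follows. This is a sketch, assuming proc getdata and proc areadef have already run; the field number and bar location are hypothetical, and the templates and text details are copied from the individual examples above.

```
#proc rangebar
datafield: 2
barloc: 1
// N label, using a custom template.
printn: yes
nword: (N = @@N)
ntextdetails: size=8
// Missing-values label, shown only when at least one value is missing.
printmissing: yes
mwhenexists: yes
mword: (@M missing)
mtextdetails: size=8 adjust=0,0.05
```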
OUTLIER HANDLING OPTIONS
showoutliers yes | no
-
-
If yes, outliers will be displayed and/or reported upon.
An outlier is any point that is beyond the end of either of the rangebar's tails.
The default way of displaying outliers is circles for the near outliers
and asterisks for the far outliers. The default boundary between near
and far is 3 times the interquartile range.
outlierprint yes | no
-
-
If yes, a report on each outlier will be printed to standard error.
outliernearsym symboldetails | none
-
-
Specifies the symbol for displaying near outliers.
Default is a small circle: shape=circle style=outline radius=0.05.
Use none if you are displaying outliers as lines or labels
and don't want any geometric symbol.
outlierfarsym symboldetails | none
-
-
Specifies the symbol for displaying far outliers.
Default is an asterisk: shape=circle style=spokes radius=0.05.
Use none if you are displaying outliers as lines or labels
and don't want any geometric symbol.
outliernearfarcutoff n
-
-
The boundary between near outliers and far outliers will be n times
the interquartile range. Default is 3.0.
outlierlinelen n
-
-
Display outliers as line segments.
If specified, all outliers (near and far)
are displayed as short line segments rather than symbols.
n will be the length of these segments, in absolute units.
The color and width of these line segments may be controlled using
outlierlinedetails.
Example: outlierlinelen: 0.1
outlierlabelfield dfield
-
-
If specified, the contents of this field will appear as a text label for each outlier.
For vertical rangebars the label will appear a bit to the right
of the point.
For horizontal rangebars the label will appear in rotated text, a bit above
the point.
The size, style, and location may be adjusted if desired using the outlierlabeldetails
attribute.
outlierlabeldetails textdetails
-
-
Details related to the outlier labels.
outlierlinedetails linedetails
-
-
When outliers are being rendered as line segments, specifies
color, line width, etc. of these line segments.
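As a sketch of how the outlier options might be combined, the fragment below displays outliers with a custom near symbol, labels each outlier from another data field, and reports them to standard error. It assumes proc getdata and proc areadef have already been executed; the field numbers, cutoff, and symbol and text settings are hypothetical.

```
// Assumes field 1 holds an identifier and field 2 the values (hypothetical).
#proc rangebar
datafield: 2
barloc: 1
showoutliers: yes
// Near/far boundary at 2.5 x IQR instead of the default 3.0.
outliernearfarcutoff: 2.5
outliernearsym: shape=circle style=outline radius=0.03
// Label each outlier with the contents of field 1.
outlierlabelfield: 1
outlierlabeldetails: size=6
outlierprint: yes
```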
data display engine
Copyright Steve Grubb