
# HISTOGRAM

Name:
... HISTOGRAM
Type:
Graphics Command
Purpose:
Generates a histogram.
Description:
A histogram is a graphical data analysis technique for summarizing the distributional information of a variable. The response variable is divided into equal sized intervals (or bins). The number of occurrences of the response variable is calculated for each bin. The histogram consists of:

 Vertical axis = frequencies or relative frequencies;
 Horizontal axis = response variable (i.e., the mid-point of each interval).

There are 4 types of histograms:

1. histogram (absolute counts);
2. relative histogram (converts counts to proportions);
3. cumulative histogram;
4. cumulative relative histogram.
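
The relationship between the four types can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Dataplot code: the function name and the bin counts are hypothetical, and the relative values here are plain proportions of the total count.

```python
from itertools import accumulate


def histogram_types(counts):
    """Return the four histogram variants for a list of per-bin counts."""
    total = sum(counts)
    relative = [c / total for c in counts]            # counts as proportions
    cumulative = list(accumulate(counts))             # running totals of counts
    cumulative_relative = [c / total for c in cumulative]  # running proportions
    return {
        "histogram": list(counts),
        "relative": relative,
        "cumulative": cumulative,
        "cumulative relative": cumulative_relative,
    }


counts = [2, 5, 8, 4, 1]  # hypothetical bin counts
variants = histogram_types(counts)
for name, values in variants.items():
    print(name, values)
```

The last cumulative count equals the total number of observations, and the last cumulative relative value equals one.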

The histogram and the frequency plot contain the same information. The histogram draws bars at the frequency values, whereas the frequency plot draws lines connecting the frequency values.

Syntax 1:
<type> <x>             <SUBSET/EXCEPT/FOR qualification>
where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
<x> is a variable of raw data values;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when you have raw data. Note that <x> can be either a variable or a matrix. If <x> is a matrix, then a histogram will be generated for all values in that matrix.

Syntax 2:
<type> <y> <x>             <SUBSET/EXCEPT/FOR qualification>
where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
<y> is a variable containing pre-computed frequencies;
<x> is a variable containing the bin mid-points;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when you have grouped data with equi-sized bins.

Syntax 3:
<type> <y> <xlow> <xhigh>             <SUBSET/EXCEPT/FOR qualification>
where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
<y> is a variable containing pre-computed frequencies;
<xlow> is a variable containing the lower limits for the bins;
<xhigh> is a variable containing the upper limits for the bins;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when you have grouped data with unequal sized bins.

Syntax 4:
SUBSET <type> <y> <x>             <SUBSET/EXCEPT/FOR qualification>
where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
<y> is a variable of raw data values;
<x> is a group-id variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax can be used to highlight the contribution to the histogram for particular subsets of the data. It is demonstrated in the program examples below.

Examples:
HISTOGRAM TEMP
RELATIVE HISTOGRAM TEMP
CUMULATIVE HISTOGRAM TEMP
CUMULATIVE RELATIVE HISTOGRAM TEMP
HISTOGRAM COUNTS STATE
RELATIVE HISTOGRAM COUNTS STATE
CUMULATIVE HISTOGRAM COUNTS STATE
CUMULATIVE RELATIVE HISTOGRAM COUNTS STATE
Note:
The appearance of the bars on the histogram (i.e., whether they are filled or not, the line width of the bar border, etc.) is controlled by the various bar attribute commands. A few are listed in the RELATED COMMANDS section below. See the documentation for the BAR command for a complete list of the bar attribute commands. This is demonstrated with the sample program below.
Note:
You can extract a frequency table from the histogram with the following commands:

HISTOGRAM Y
LET YFREQ = YPLOT
LET XVAL = XPLOT

Then the variables YFREQ and XVAL contain a frequency table. You can also use the

LET Y2 X2 = BINNED Y

command for this purpose.

Note:
By default, DATAPLOT uses a class width of 0.3 times the sample standard deviation of the variable. Use the CLASS WIDTH command to override this default. DATAPLOT also tends to generate a large number of zero-frequency classes at the lower and upper tails, which compresses the histogram on the horizontal axis. Use the XLIMITS command or the CLASS LOWER and CLASS UPPER commands to avoid plotting these zero-frequency classes.

A number of alternative choices for class width can be set with the command

SET HISTOGRAM CLASS WIDTH

Enter HELP HISTOGRAM CLASS WIDTH for details.
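
The default class width described in this note is simple to compute by hand. The following Python sketch shows the arithmetic only. The function name and data are hypothetical, and it assumes the standard deviation is the usual sample standard deviation with an n-1 denominator.

```python
import statistics


def default_class_width(values, factor=0.3):
    """Class width as a multiple of the sample standard deviation."""
    # statistics.stdev uses the n-1 denominator (assumed to match "s" here)
    return factor * statistics.stdev(values)


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
width = default_class_width(data)
print(width)
```

A quick estimate like this can help in choosing a value to pass to the CLASS WIDTH command when the default gives too many or too few classes.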

Note:
By default, Dataplot sets the lower and upper class limits to xbar -/+ 6*s (with xbar and s denoting the sample mean and standard deviation, respectively). This can occasionally result in a few outlying points being excluded from the histogram. To adjust the lower and upper class limits so that these outlying points are included, enter the command

SET HISTOGRAM OUTLIERS ON

To revert to the default, enter

SET HISTOGRAM OUTLIERS OFF
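
To see when points can fall outside the default class limits, the limits can be computed directly. The Python sketch below is an illustration under stated assumptions, not Dataplot's implementation: the function names and data are hypothetical, and s is taken to be the sample standard deviation with an n-1 denominator.

```python
import statistics


def default_class_limits(values, k=6.0):
    """Return (lower, upper) = xbar -/+ k*s for the given data."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # n-1 denominator (assumption)
    return xbar - k * s, xbar + k * s


def points_outside(values, lower, upper):
    """List the data values that lie outside the interval [lower, upper]."""
    return [v for v in values if v < lower or v > upper]


# Hypothetical data: 99 zeros and one large value
data = [0.0] * 99 + [100.0]
lower, upper = default_class_limits(data)
excluded = points_outside(data, lower, upper)
print(lower, upper)
print(excluded)
```

For this data set the mean is 1 and the sample standard deviation is 10, so the limits are -59 and 61 and the single value 100 lies outside them.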
Note:
By default, the histogram draws all cells, even those with zero frequency. To suppress these zero frequency cells, enter

SET HISTOGRAM EMPTY BINS OFF

To restore the default, enter

SET HISTOGRAM EMPTY BINS ON
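
The effect of suppressing empty bins can be pictured as a filter on the frequency table. The Python sketch below is illustrative only. The function name and the table values are hypothetical.

```python
def drop_empty_bins(midpoints, counts):
    """Keep only the (mid-point, count) pairs whose count is non-zero."""
    kept = [(x, n) for x, n in zip(midpoints, counts) if n != 0]
    kept_midpoints = [x for x, _ in kept]
    kept_counts = [n for _, n in kept]
    return kept_midpoints, kept_counts


midpoints = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical bin mid-points
counts = [0, 3, 7, 0, 2]                 # hypothetical frequencies
kept_x, kept_n = drop_empty_bins(midpoints, counts)
print(kept_x, kept_n)
```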
Note:
Previously, Dataplot only generated histograms for the case where the bin widths were equal. This has been extended to the case with unequal bin widths. The syntax is

HISTOGRAM Y XLOW XHIGH

with XLOW containing the values for the lower bin limit and XHIGH containing the values for the upper bin limit.

Note:
Dataplot also supports the syntax

SUBSET HISTOGRAM Y X

In this case, X is a group-id variable. This syntax can be used to highlight the contribution to the histogram for particular subsets of the data.
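
The idea of a subset's contribution to the histogram can be sketched by counting each group separately within the same bins. The Python code below only illustrates that bookkeeping. The function names, bin settings, and data are hypothetical, and it makes no claim about how Dataplot draws the bars.

```python
def bin_index(value, lower, width):
    """Index of the equal-width bin containing value, counting from lower."""
    return int((value - lower) // width)


def counts_by_group(values, groups, lower, width, nbins):
    """Per-group bin counts; values outside the bins are ignored."""
    table = {}
    for v, g in zip(values, groups):
        k = bin_index(v, lower, width)
        if 0 <= k < nbins:
            table.setdefault(g, [0] * nbins)[k] += 1
    return table


values = [1.2, 1.8, 2.5, 2.7, 3.1, 3.9]  # hypothetical raw data
groups = [1, 2, 1, 1, 2, 2]              # hypothetical group-id values
table = counts_by_group(values, groups, lower=1.0, width=1.0, nbins=3)
# Summing the per-group counts bin by bin recovers the overall histogram
total = [sum(col) for col in zip(*table.values())]
print(table)
print(total)
```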

Note:
There are two methods for relative histograms.

The first method simply divides the count in the bin by the total count. That is, the relative frequency of the i-th bin is $$n_{i}/\sum_{j} n_{j}$$ where $$n_{i}$$ is the count of the i-th bin. In this case, the sum of the relative frequencies is one. To specify this method, enter the command

SET RELATIVE HISTOGRAM PERCENT

The second method normalizes the counts so that the area sums to one. That is, the relative frequency of the i-th bin is $$n_{i}/\sum_{j} n_{j} c_{j}$$ where $$c_{j}$$ is the width of the j-th bin. To specify this method, enter the command

SET RELATIVE HISTOGRAM AREA

The advantage of the AREA method is that it makes the relative histogram an estimator of the underlying probability distribution. The histogram in this case is actually a simple kernel density estimator of the underlying distribution of the data. This is not the case when the PERCENT option is used.

The default is AREA.
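
The two normalizations can be compared numerically. The Python sketch below implements the two formulas given in this note. The function names and the counts and widths are hypothetical, and this is an illustration of the arithmetic rather than Dataplot's implementation.

```python
def relative_percent(counts):
    """PERCENT method: each count divided by the total count."""
    total = sum(counts)
    return [n / total for n in counts]


def relative_area(counts, widths):
    """AREA method: each count divided by the sum of count * bin width."""
    area = sum(n * c for n, c in zip(counts, widths))
    return [n / area for n in counts]


counts = [2, 5, 3]        # hypothetical bin counts
widths = [0.5, 0.5, 0.5]  # hypothetical bin widths
pct = relative_percent(counts)
dens = relative_area(counts, widths)

# PERCENT: the bar heights sum to one
print(sum(pct))
# AREA: the bar heights times the bin widths sum to one
print(sum(h * c for h, c in zip(dens, widths)))
```

With the AREA method the bar heights depend on the bin width, so they can exceed one when the bins are narrow. The total area is still one.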

Default:
None
Synonyms:
A synonym for CUMULATIVE RELATIVE HISTOGRAM is RELATIVE CUMULATIVE HISTOGRAM. HIGHLIGHT is a synonym for SUBSET in syntax 4.
Related Commands:
 FREQUENCY PLOT = Generate a frequency plot.
 KERNEL DENSITY PLOT = Generate a kernel density plot.
 PERCENT POINT PLOT = Generate a percent point plot.
 PROBABILITY PLOT = Generate a probability plot.
 PPCC PLOT = Generate a probability plot correlation coefficient plot.
 PLOT = Generate a data or function plot.
 CLASS LOWER = Set the lower class minimum for histograms, frequency plots, and pie charts.
 CLASS UPPER = Set the upper class maximum for histograms, frequency plots, and pie charts.
 CLASS WIDTH = Set the class width for histograms, frequency plots, and pie charts.
 HISTOGRAM CLASS WIDTH = Specify alternative default class width algorithms for histograms.
 MINIMUM = Set the frame minima for all plots.
 MAXIMUM = Set the frame maxima for all plots.
 LIMITS = Set the frame limits for all plots.
 BARS = Set the on/off switches for plot bars.
 BAR WIDTH = Set the widths for plot bars.
 BAR FILL = Set the on/off switches for plot bar fills.
 BAR PATTERN = Set the types for bar fill patterns.
 BAR BORDER LINE = Set the types for bar border lines.
Reference:
Applications:
Data Analysis
Implementation Date:
Pre-1987
2004/09: Support alternative class width algorithms
2007/03: Option to compute histogram of a matrix
2010/01: Support for HIGHLIGHT/SUBSET option
2010/01: Support for non-equispaced histograms
2010/01: Option to suppress empty bins
2010/01: Option to include outliers
Program 1:

LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
MULTIPLOT 2 2
MULTIPLOT SCALE FACTOR 2
MULTIPLOT CORNER COORDINATES 0 0 100 100
XLIMITS -5 5
TITLE CASE ASIS
TITLE OFFSET 2
TITLE Counts Histogram
HISTOGRAM Y
BAR FILL ON
TITLE Relative Histogram
RELATIVE HISTOGRAM Y
BAR FILL OFF
BAR BORDER THICKNESS 0.3
TITLE Cumulative Counts Histogram
CUMULATIVE HISTOGRAM Y
BAR FILL ON
BAR PATTERN D1
BAR PATTERN SPACING 3
TITLE Cumulative Relative Histogram
CUMULATIVE RELATIVE HISTOGRAM Y
END OF MULTIPLOT

Program 2:

. Demonstrate the SUBSET option
skip 25
read rehm.dat y1 y2 x1 x2
.
bar on on on
bar fill on on on
bar fill color lblue red
line blank blank
xlimits 350 650
.
multiplot 2 2
let tag = x2
let tag = 1 subset x2 = 1
let tag = 2 subset x2 <> 1
title Red = Patient 1
highlighted hist y1 tag
let tag = 1 subset x2 = 2
let tag = 2 subset x2 <> 2
title Red = Patient 2
highlighted hist y1 tag
let tag = 1 subset x2 = 3
let tag = 2 subset x2 <> 3
title Red = Patient 3
highlighted hist y1 tag
bar fill color lblu blue red
title Red = Patient 1, Blue = Patient 2
highlighted hist y1 x2
end of multiplot
.
xlimits
move 50 97
just center
case asis
text Highlighted Histograms for REHM.DAT (Y1 = High Air Flow and X2 = Patient ID)

NIST is an agency of the U.S. Commerce Department.

Date created: 11/30/2010
Last updated: 10/14/2015