Dataplot Vol 1 Auxiliary Chapter

HISTOGRAM

Name:
    ... HISTOGRAM
Type:
    Graphics Command
Purpose:
    Generates a histogram.
Description:
    A histogram is a graphical data analysis technique for summarizing the distributional information of a variable. The response variable is divided into equal-sized intervals (or bins). The number of occurrences of the response variable is calculated for each bin. The histogram consists of:

      Vertical axis = frequencies or relative frequencies;
      Horizontal axis = response variable (i.e., the mid-point of each interval).

    There are 4 types of histograms:

    1. histogram (absolute counts);
    2. relative histogram (converts counts to proportions);
    3. cumulative histogram;
    4. cumulative relative histogram.

    The histogram and the frequency plot have the same information except the histogram has bars at the frequency values, whereas the frequency plot has lines connecting the frequency values.
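
    The relationship between the four histogram types can be sketched in Python. This is an illustration only, not Dataplot's implementation: the function name, its arguments, and the sample data are hypothetical, and the relative heights here use the simple "proportion of counts" convention from the list above.

```python
from itertools import accumulate

def histogram_types(values, lower, width, nbins):
    """Return bin mid-points and the four kinds of bar heights."""
    counts = [0] * nbins
    for v in values:
        k = int((v - lower) // width)   # index of the bin holding v
        if 0 <= k < nbins:              # values outside the bins are ignored
            counts[k] += 1
    total = sum(counts)
    mids = [lower + (k + 0.5) * width for k in range(nbins)]
    relative = [c / total for c in counts]            # counts -> proportions
    cumulative = list(accumulate(counts))             # running sum of counts
    cumulative_relative = [c / total for c in cumulative]
    return mids, counts, relative, cumulative, cumulative_relative

# Hypothetical sample data, binned into four unit-width classes on [0, 4).
data = [1.0, 1.5, 2.2, 2.8, 3.1, 3.9, 3.9, 0.4]
mids, counts, rel, cum, cumrel = histogram_types(data, 0.0, 1.0, 4)
print(mids, counts, rel, cum, cumrel)
```

    The cumulative variants are running sums of the corresponding non-cumulative heights, so the last cumulative relative value is 1.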

Syntax 1:
    <type> <x>             <SUBSET/EXCEPT/FOR qualification>
    where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
                <x> is a variable of raw data values;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used when you have raw data. Note that <x> can be either a variable or a matrix. If <x> is a matrix, then a histogram will be generated for all values in that matrix.

Syntax 2:
    <type> <y> <x>             <SUBSET/EXCEPT/FOR qualification>
    where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
                <y> is a variable containing pre-computed frequencies;
                <x> is a variable containing the bin mid-points;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used when you have grouped data with equi-sized bins.

Syntax 3:
    <type> <y> <xlow> <xhigh>             <SUBSET/EXCEPT/FOR qualification>
    where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
                <y> is a variable containing pre-computed frequencies;
                <xlow> is a variable containing the lower limits for the bins;
                <xhigh> is a variable containing the upper limits for the bins;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used when you have grouped data with unequal sized bins.

Syntax 4:
    SUBSET <type> <y> <x>             <SUBSET/EXCEPT/FOR qualification>
    where <type> is one of HISTOGRAM, RELATIVE HISTOGRAM, CUMULATIVE HISTOGRAM, or CUMULATIVE RELATIVE HISTOGRAM;
                <y> is a variable of raw data values;
                <x> is a group-id variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax can be used to highlight the contribution to the histogram for particular subsets of the data. It is demonstrated in the program examples below.

Examples:
    HISTOGRAM TEMP
    RELATIVE HISTOGRAM TEMP
    CUMULATIVE HISTOGRAM TEMP
    CUMULATIVE RELATIVE HISTOGRAM TEMP
    HISTOGRAM COUNTS STATE
    RELATIVE HISTOGRAM COUNTS STATE
    CUMULATIVE HISTOGRAM COUNTS STATE
    CUMULATIVE RELATIVE HISTOGRAM COUNTS STATE
Note:
    The appearance of the bars on the histogram (i.e., whether they are filled or not, the line width of the bar border, etc.) is controlled by the various bar attribute commands. A few are listed in the Related Commands section below. See the documentation for the BAR command for a complete list of the bar attribute commands. Program 1 below demonstrates several of them.
Note:
    You can extract a frequency table from the histogram with the following commands:

      HISTOGRAM Y
      LET YFREQ = YPLOT
      LET XVAL = XPLOT

    Then the variables YFREQ and XVAL contain a frequency table. You can also use the

      LET Y2 X2 = BINNED Y

    command for this purpose.

Note:
    By default, DATAPLOT uses a class width of 0.3 times the standard deviation of the variable. Use the CLASS WIDTH command to override this default. DATAPLOT also tends to generate a large number of zero frequency classes at the lower and upper tails. This tends to compress the histogram on the horizontal axis. Use the XLIMITS command or the CLASS LOWER and CLASS UPPER commands to avoid plotting these zero frequency classes.

    A number of alternative choices for class width can be set with the command

      HISTOGRAM CLASS WIDTH

    Enter HELP HISTOGRAM CLASS WIDTH for details.

Note:
    By default, Dataplot sets the lower and upper class limits to xbar -/+ 6*s (with xbar and s denoting the sample mean and standard deviation, respectively). This can occasionally result in a few outlying points being excluded from the histogram. To adjust the lower and upper class limits so that these outlying points are included, enter the command

      SET HISTOGRAM OUTLIERS ON

    To revert to the default, enter

      SET HISTOGRAM OUTLIERS OFF
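
    The default class width (0.3*s) and default class limits (xbar -/+ 6*s) described in the notes above can be worked through in a short Python sketch. It is an illustration under stated assumptions, not Dataplot's code: the function name and sample data are hypothetical, and s is assumed to be the usual sample standard deviation with an n - 1 denominator.

```python
import statistics

def default_classes(values, width_factor=0.3, limit_factor=6.0):
    """Default class width, class limits, and points outside the limits."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)        # assumption: n - 1 sample formula
    width = width_factor * s            # default class width, 0.3 * s
    lower = xbar - limit_factor * s     # default lower class limit
    upper = xbar + limit_factor * s     # default upper class limit
    # Points beyond the limits are the ones the default histogram would omit.
    excluded = [v for v in values if v < lower or v > upper]
    return width, lower, upper, excluded

# Hypothetical data: 99 tightly grouped values and one extreme value.
data = [10.0] * 50 + [11.0] * 49 + [100.0]
width, lower, upper, excluded = default_classes(data)
print(width, lower, upper, excluded)
print(round((upper - lower) / width))   # classes spanned by the default limits
```

    Under these assumptions the default limits span 12*s, which at a width of 0.3*s is about 40 classes. That is consistent with the many zero frequency classes in the tails noted above, and the single extreme value in this data set falls outside the limits.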
Note:
    By default, the histogram draws all cells, even those with zero frequency. To suppress these zero frequency cells, enter

      SET HISTOGRAM EMPTY BINS OFF

    To restore the default, enter

      SET HISTOGRAM EMPTY BINS ON
Note:
    Previously, Dataplot only generated histograms for the case where the bin widths were equal. This has been extended to the case with unequal bin widths. The syntax is

      HISTOGRAM Y XLOW XHIGH

    with XLOW containing the values for the lower bin limit and XHIGH containing the values for the upper bin limit.

Note:
    The following option has been added:

      SUBSET HISTOGRAM Y X

    In this case, X is a group-id variable. This syntax can be used to highlight the contribution to the histogram for particular subsets of the data.
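
    The idea of showing each subset's contribution to a bar can be sketched in Python as per-bin counts split by group id. This is an illustration only: the function name, arguments, and data are hypothetical, and how Dataplot actually draws the subsets within each bar is not specified here.

```python
from collections import Counter

def subset_counts(values, groups, lower, width, nbins):
    """Per-bin counts split by group id; the splits sum to the bin total."""
    per_bin = [Counter() for _ in range(nbins)]
    for v, g in zip(values, groups):
        k = int((v - lower) // width)   # index of the bin holding v
        if 0 <= k < nbins:              # values outside the bins are ignored
            per_bin[k][g] += 1
    return per_bin

# Hypothetical response values and a two-level group-id variable.
y = [1.2, 1.7, 2.1, 2.4, 2.9, 3.3]
tag = [1, 2, 1, 1, 2, 2]
bins = subset_counts(y, tag, 1.0, 1.0, 3)
totals = [sum(b.values()) for b in bins]  # heights of the ordinary histogram
print(bins, totals)
```

    Summing the group counts within each bin recovers the ordinary counts histogram, so the subsets partition each bar.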

Note:
    A bug in the CUMULATIVE RELATIVE HISTOGRAM for the AREA case has been fixed. If SET RELATIVE AREA HISTOGRAM is set to AREA (the default), relative histograms are normalized so that the area is equal to 1; if it is set to PERCENT, the sum of the bar heights is equal to 1. The PERCENT case did not have this bug.
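
    The difference between the AREA and PERCENT normalizations of a relative histogram can be illustrated with a small Python sketch for equal-width bins. The function name and sample counts are hypothetical, and the code shows only the arithmetic of the two conventions, not Dataplot's implementation.

```python
def relative_heights(counts, width, mode="AREA"):
    """Bar heights for a relative histogram with equal-width bins."""
    total = sum(counts)
    if mode == "AREA":
        # Heights chosen so that sum(height * width) equals 1.
        return [c / (total * width) for c in counts]
    if mode == "PERCENT":
        # Heights chosen so that sum(height) equals 1.
        return [c / total for c in counts]
    raise ValueError("mode must be 'AREA' or 'PERCENT'")

counts = [2, 5, 3]      # hypothetical bin counts
width = 0.5             # hypothetical common bin width
area = relative_heights(counts, width, "AREA")
percent = relative_heights(counts, width, "PERCENT")
print(sum(h * width for h in area))   # total area of the AREA bars
print(sum(percent))                   # total height of the PERCENT bars
```

    The two conventions coincide only when the bin width is 1; otherwise the AREA heights are the PERCENT heights divided by the bin width.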
Default:
    None
Synonyms:
    A synonym for CUMULATIVE RELATIVE HISTOGRAM is RELATIVE CUMULATIVE HISTOGRAM. HIGHLIGHT is a synonym for SUBSET in syntax 4.
Related Commands:
    FREQUENCY PLOT = Generate a frequency plot.
    KERNEL DENSITY PLOT = Generate a kernel density plot.
    PERCENT POINT PLOT = Generate a percent point plot.
    PROBABILITY PLOT = Generate a probability plot.
    PPCC PLOT = Generate a probability plot correlation coefficient plot.
    PLOT = Generate a data or function plot.
    CLASS LOWER = Set the lower class minimum for histograms, frequency plots, and pie charts.
    CLASS UPPER = Set the upper class maximum for histograms, frequency plots, and pie charts.
    CLASS WIDTH = Set the class width for histograms, frequency plots, and pie charts.
    HISTOGRAM CLASS WIDTH = Specify alternative default class width algorithms for histograms.
    MINIMUM = Set the frame minima for all plots.
    MAXIMUM = Set the frame maxima for all plots.
    LIMITS = Set the frame limits for all plots.
    BARS = Set the on/off switches for plot bars.
    BAR WIDTH = Set the widths for plot bars.
    BAR FILL = Set the on/off switches for plot bar fills.
    BAR PATTERN = Set the types for bar fill patterns.
    BAR BORDER LINE = Set the types for bar border lines.
Applications:
    Data Analysis
Implementation Date:
    Pre-1987
    2004/09: Support alternative class width algorithms
    2007/03: Option to compute histogram of a matrix
    2010/01: Support for HIGHLIGHT/SUBSET option
    2010/01: Support for non-equispaced histograms
    2010/01: Option to suppress empty bins
    2010/01: Option to include outliers
Program 1:
     
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    MULTIPLOT 2 2
    MULTIPLOT SCALE FACTOR 2
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    XLIMITS -5 5
    TITLE CASE ASIS
    TITLE OFFSET 2
    TITLE Counts Histogram
    HISTOGRAM Y
    BAR FILL ON
    TITLE Relative Histogram
    RELATIVE HISTOGRAM Y
    BAR FILL OFF
    BAR BORDER THICKNESS 0.3
    TITLE Cumulative Counts Histogram
    CUMULATIVE HISTOGRAM Y
    BAR FILL ON
    BAR PATTERN D1
    BAR PATTERN SPACING 3
    TITLE Cumulative Relative Histogram
    CUMULATIVE RELATIVE HISTOGRAM Y
    END OF MULTIPLOT
        
    plot generated by sample program
Program 2:
     
    . Demonstrate the SUBSET option
    skip 25
    read rehm.dat y1 y2 x1 x2
    .
    bar on on on
    bar fill on on on
    bar fill color lblue red
    line blank blank
    xlimits 350 650
    .
    multiplot 2 2
    let tag = x2
    let tag = 1 subset x2 = 1
    let tag = 2 subset x2 <> 1
    title Red = Patient 1
    highlighted hist y1 tag
    let tag = 1 subset x2 = 2
    let tag = 2 subset x2 <> 2
    title Red = Patient 2
    highlighted hist y1 tag
    let tag = 1 subset x2 = 3
    let tag = 2 subset x2 <> 3
    title Red = Patient 3
    highlighted hist y1 tag
    bar fill color lblu blue red
    title Red = Patient 1, Blue = Patient 2
    highlighted hist y1 x2
    end of multiplot
    .
    xlimits
    move 50 97
    just center
    case asis
    text Highlighted Histograms for REHM.DAT (Y1 = High Air Flow and X2 = Patient ID)
        
    plot generated by sample program

Date created: 11/30/2010
Last updated: 12/06/2010
Please email comments on this WWW page to alan.heckert@nist.gov.