Dataplot Vol 1 Auxiliary Chapter

FLUCTUATION PLOT

Name:
    FLUCTUATION PLOT (LET)
Type:
    Graphics Command
Purpose:
    Generate a fluctuation plot.
Description:
    The fluctuation plot is a variant of the mosaic plot. The mosaic plot was proposed by John Hartigan as a method for visualizing the counts from contingency tables. In the mosaic plot, a rectangle is drawn for every combination of categories, and the area of each rectangle is proportional to its count. A mosaic plot is constructed as follows.

    1. The horizontal axis is divided according to the category counts of the first variable.

    2. If there is a second variable, then each vertical column is divided according to the counts of the second variable.

    3. If there are more than two variables, repeat steps 1 and 2 according to the counts for each additional variable. That is, each rectangle created in steps 1 and 2 is further sub-divided horizontally and vertically for the third and fourth variables. This subdivision is repeated until all variables have been used.
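
    The first two steps can be sketched in Python for the two-variable case. This is a minimal illustration, not Dataplot's implementation: it assumes the raw data arrive as a list of (first, second) category pairs, and the function name mosaic_rectangles is hypothetical. Each rectangle is returned as (x, y, width, height) in the unit square, so its area (width times height) equals the cell count divided by the total count.

```python
from collections import Counter


def mosaic_rectangles(pairs):
    """Return {(a, b): (x, y, width, height)} for a two-variable mosaic plot.

    Step 1: the horizontal axis is divided by the counts of the first variable.
    Step 2: each resulting column is divided vertically by the counts of the
    second variable within that column.
    """
    pairs = list(pairs)
    total = len(pairs)
    first_counts = Counter(a for a, _ in pairs)
    joint_counts = Counter(pairs)  # combinations that never occur count as 0
    second_levels = sorted({b for _, b in pairs})

    rects = {}
    x = 0.0
    for a in sorted(first_counts):
        width = first_counts[a] / total
        y = 0.0
        for b in second_levels:
            height = joint_counts[(a, b)] / first_counts[a]
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Toy data: 8 observations of two categorical variables.
data = [("A", "x")] * 2 + [("A", "y")] * 2 + [("B", "x")] * 4
rects = mosaic_rectangles(data)
```

    Steps 1 and 2 would be applied again inside each rectangle for a third and fourth variable; the sketch stops at two variables to keep the geometry visible.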

    For the fluctuation plot, a grid is created so that each combination of categories has a fixed position on the grid.

    At each grid position, two rectangles are drawn. The first is drawn in a background color at full size (i.e., the size corresponding to the maximum count). The second is drawn in a foreground color with a height proportional to the count for that particular combination of categories. The background rectangle gives a sense of scale. If you do not want it, set its color equal to the background color of the plot.
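
    The bar geometry for a table of counts can be sketched as follows. This assumes the two-way table is stored as a list of rows, and the function name fluctuation_heights is hypothetical. Each background rectangle has height 1.0, and each foreground bar has height equal to its count divided by the largest count in the table.

```python
def fluctuation_heights(table):
    """Foreground bar heights for a two-way table of counts.

    Every grid cell gets a full-size background rectangle (height 1.0); the
    foreground bar height is the count divided by the largest count, so the
    largest cell exactly fills its background rectangle.
    """
    largest = max(max(row) for row in table)
    return [[count / largest for count in row] for row in table]


# Counts from the matrix read in Program 1 below.
counts = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]
heights = fluctuation_heights(counts)
```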

    Some analysts find the fluctuation plot easier to interpret than the mosaic plot.

    Although the mosaic and fluctuation plots were developed to visualize counts for categorical data, Dataplot can also generate the fluctuation plot for various statistics. For example, you could use it to display mean values for several factor variables. In particular, we have found it useful for displaying binomial probabilities. When a statistic is displayed, the minimum value of the statistic over all combinations of categories is drawn with zero height and the maximum value over all combinations is drawn at full height. Intermediate values are scaled between the minimum and maximum values.
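
    Assuming the scaling is linear, the bar height for a combination of categories c can be written as below, where s_c is the value of the statistic for c and the minimum and maximum run over all combinations j (this also assumes the maximum exceeds the minimum). The smallest statistic gives h_c = 0 and the largest gives h_c = 1.

```latex
h_c = \frac{s_c - \min_j s_j}{\max_j s_j - \min_j s_j}, \qquad 0 \le h_c \le 1
```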

Syntax 1:
    FLUCTUATION <stat> PLOT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <stat> is one of the following statistics:
        COUNT (or NUMBER or SIZE),
        MEAN, MIDMEAN, MEDIAN, TRIMMED MEAN,
        WINSORIZED MEAN,
        GEOMETRIC MEAN, HARMONIC MEAN, HODGES LEHMAN,
        BIWEIGHT LOCATION, LP LOCATION,
        SUM, PRODUCT,
        STANDARD DEVIATION, STANDARD DEVIATION OF MEAN,
        VARIANCE, VARIANCE OF THE MEAN,
        TRIMMED MEAN STANDARD ERROR,
        AVERAGE ABSOLUTE DEVIATION (or AAD),
        MEDIAN ABSOLUTE DEVIATION (or MAD),
        IQ RANGE, BIWEIGHT MIDVARIANCE, BIWEIGHT SCALE,
        PERCENTAGE BEND MIDVARIANCE,
        WINSORIZED VARIANCE, WINSORIZED STANDARD DEVIATION,
        VARIANCE OF LP LOCATION, SD OF LP LOCATION,
        RELATIVE STANDARD DEVIATION, RELATIVE VARIANCE (or
        COEFFICIENT OF VARIATION),
        RANGE, MIDRANGE, MAXIMUM, MINIMUM, EXTREME,
        LOWER HINGE, UPPER HINGE, LOWER QUARTILE, UPPER QUARTILE,
        <FIRST/SECOND/THIRD/FOURTH/FIFTH/SIXTH/SEVENTH/EIGHTH/
        NINTH/TENTH> DECILE,
        PERCENTILE, QUANTILE, QUANTILE STANDARD ERROR,
        SKEWNESS, KURTOSIS, NORMAL PPCC,
        AUTOCORRELATION, AUTOCOVARIANCE,
        SIN FREQUENCY, SIN AMPLITUDE,
        CP, CPK, CNPK, CPM, CC,
        EXPECTED LOSS, PERCENT DEFECTIVE,
        TAGUCHI SN0 (or SN), TAGUCHI SN+ (or SNL),
        TAGUCHI SN- (or SNS), TAGUCHI SN00 (or SN2);
                <y> is a response variable;
                <x1> ... <xk> is a list of one to six categorical variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated) and you are computing a statistic that requires a single response variable.

    If <stat> is omitted, then COUNT is assumed (i.e., the frequency counts are computed for each combination of categories). For the COUNT case, the response variable is omitted.
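
    The grouping implied by this syntax can be sketched in Python: the raw observations are grouped by their combination of category values, and the chosen statistic is computed within each group. This is a minimal sketch assuming plain Python lists for <y> and <x1> ... <xk>. The function names cell_statistic and cell_counts are hypothetical, and Python's statistics.mean stands in for Dataplot's MEAN.

```python
from collections import defaultdict
from statistics import mean


def cell_statistic(y, *xs, stat=mean):
    """Apply stat to the y values in each combination of category values."""
    groups = defaultdict(list)
    for value, key in zip(y, zip(*xs)):
        groups[key].append(value)
    return {key: stat(values) for key, values in groups.items()}


def cell_counts(*xs):
    """COUNT case: frequency of each combination; no response variable."""
    counts = defaultdict(int)
    for key in zip(*xs):
        counts[key] += 1
    return dict(counts)


y = [1.0, 2.0, 3.0, 4.0]
x1 = [1, 1, 2, 2]
x2 = [1, 1, 1, 2]
means = cell_statistic(y, x1, x2)   # {(1, 1): 1.5, (2, 1): 3.0, (2, 2): 4.0}
counts = cell_counts(x1, x2)        # {(1, 1): 2, (2, 1): 1, (2, 2): 1}
```

    Only combinations that actually occur in the data appear in the result; how empty cells are drawn is left to the plotting step.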

Syntax 2:
    FLUCTUATION <stat> PLOT <y1> <y2> <x1> ... <xk>
                <SUBSET/EXCEPT/FOR qualification>
    where <stat> is one of the following statistics:
        WEIGHTED MEAN, WEIGHTED SD, WEIGHTED VARIANCE,
        LINEAR INTERCEPT, LINEAR SLOPE, LINEAR RESSD,
        LINEAR CORRELATION,
        CORRELATION, RANK CORRELATION,
        COVARIANCE, RANK COVARIANCE,
        WINSORIZED CORRELATION, WINSORIZED COVARIANCE,
        BIWEIGHT MIDCOVARIANCE, BIWEIGHT MIDCORRELATION,
        PERCENTAGE BEND CORRELATION,
        ODDS RATIO, ODDS RATIO STANDARD ERROR,
        LOG ODDS RATIO, LOG ODDS RATIO STANDARD ERROR,
        FALSE POSITIVE, FALSE NEGATIVE,
        TRUE POSITIVE, TRUE NEGATIVE,
        TEST SENSITIVITY, TEST SPECIFICITY,
        POSITIVE PREDICTIVE VALUE, NEGATIVE PREDICTIVE VALUE,
        RELATIVE RISK,
        RATIO;
                <y1> is the first response variable;
                <y2> is the second response variable;
                <x1> ... <xk> is a list of one to six categorical variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated) and you are computing a statistic that requires two response variables.
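
    For a two-response statistic, the same grouping applies to (y1, y2) pairs. The sketch below uses a plain Pearson correlation as an example of such a statistic; it assumes Python lists for the inputs, the names pearson and cell_statistic2 are hypothetical, and it does not guard against cells whose values are constant (those would divide by zero).

```python
import math
from collections import defaultdict


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def cell_statistic2(y1, y2, *xs, stat=pearson):
    """Apply a two-variable stat to the (y1, y2) pairs in each category combination."""
    groups = defaultdict(lambda: ([], []))
    for u, v, key in zip(y1, y2, zip(*xs)):
        first, second = groups[key]
        first.append(u)
        second.append(v)
    return {key: stat(a, b) for key, (a, b) in groups.items()}


y1 = [1, 2, 3, 1, 2, 3]
y2 = [2, 4, 6, 3, 2, 1]
group = ["a", "a", "a", "b", "b", "b"]
corr = cell_statistic2(y1, y2, group)   # {("a",): 1.0, ("b",): -1.0}
```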

Syntax 3:
    FLUCTUATION DIFFERENCE OF <stat> PLOT <y1> <y2> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <stat> is one of the following statistics:
        MEAN, MIDMEAN, MEDIAN, TRIMMED MEAN, WINSORIZED MEAN,
        GEOMETRIC MEAN, HARMONIC MEAN, HODGES LEHMAN,
        MIDRANGE, BIWEIGHT LOCATION, SUM,
        STANDARD DEVIATION, STANDARD DEVIATION OF MEAN,
        VARIANCE, VARIANCE OF THE MEAN,
        TRIMMED MEAN STANDARD ERROR,
        AVERAGE ABSOLUTE DEVIATION (or AAD),
        MEDIAN ABSOLUTE DEVIATION (or MAD),
        IQ RANGE, BIWEIGHT MIDVARIANCE, BIWEIGHT SCALE,
        PERCENTAGE BEND MIDVARIANCE,
        WINSORIZED VARIANCE, WINSORIZED STANDARD DEVIATION,
        RELATIVE STANDARD DEVIATION, RELATIVE VARIANCE,
        COEFFICIENT OF VARIATION, RANGE,
        MAXIMUM, MINIMUM, EXTREME, QUANTILE,
        SKEWNESS, KURTOSIS;
                <y1> is the first response variable;
                <y2> is the second response variable;
                <x1> ... <xk> is a list of one to six categorical variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated) and you are computing the difference between two response variables for the specified statistic. The variables can be either independent (i.e., not paired) or dependent (i.e., paired), but the two response variables must have the same number of elements.
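
    A minimal sketch of the per-cell difference, assuming the categorical variables label y1 and y2 element by element (which is why both responses need the same length). The function name cell_difference is hypothetical, and statistics.mean stands in for the chosen statistic.

```python
from collections import defaultdict
from statistics import mean


def cell_difference(y1, y2, *xs, stat=mean):
    """stat(y1) - stat(y2) within each combination of category values."""
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same number of elements")
    groups1, groups2 = defaultdict(list), defaultdict(list)
    for u, v, key in zip(y1, y2, zip(*xs)):
        groups1[key].append(u)
        groups2[key].append(v)
    return {key: stat(groups1[key]) - stat(groups2[key]) for key in groups1}


y1 = [10, 12, 20, 22]
y2 = [9, 11, 15, 15]
x = ["a", "a", "b", "b"]
diffs = cell_difference(y1, y2, x)   # difference of means: 1 for "a", 6 for "b"
```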

Syntax 4:
    FLUCTUATION PLOT <m>             <SUBSET/EXCEPT/FOR qualification>
    where <m> is a matrix containing a two-way table;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where the data have already been cross-tabulated into a two-way table. Although this is typically used for the COUNT case, the table can in fact contain values for any statistic that has been previously cross-tabulated (including statistics not listed in Syntax 1 through Syntax 3 above).

Syntax 5:
    FLUCTUATION BINOMIAL PROBABILITY PLOT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is a response variable;
                <x1> ... <xk> is a list of one to six categorical variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case when you want to compute binomial probabilities from raw data. In this case, the response variable should be set to 1 to indicate "success" and to 0 to indicate "failure".
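
    With the 1/0 coding, the binomial probability for a cell is the number of successes divided by the number of observations in that cell. The sketch below assumes Python lists and uses toy data; the function name binomial_probabilities is hypothetical.

```python
from collections import defaultdict


def binomial_probabilities(y, *xs):
    """Proportion of successes (y == 1) in each combination of category values."""
    successes = defaultdict(int)
    trials = defaultdict(int)
    for value, key in zip(y, zip(*xs)):
        if value not in (0, 1):
            raise ValueError("response must be coded 1 (success) or 0 (failure)")
        trials[key] += 1
        successes[key] += value
    return {key: successes[key] / trials[key] for key in trials}


correct = [1, 0, 1, 1, 1, 0]
inst = [1, 1, 1, 2, 2, 2]
src = [1, 1, 2, 1, 1, 1]
p = binomial_probabilities(correct, inst, src)
# {(1, 1): 0.5, (1, 2): 1.0, (2, 1): 0.6666666666666666}
```

    Because probabilities already lie between 0 and 1, they can be drawn directly as bar heights on a 0 to 1 scale.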

Examples:
    FLUCTUATION COUNT PLOT X1 X2 X3 X4
    FLUCTUATION BINOMIAL PROBABILITY PLOT Y X1 X2
    FLUCTUATION PLOT M
Note:
    When there is a single categorical variable, the division is performed horizontally.

    When there are two or more categorical variables, the division is first performed vertically, then horizontally. This vertical/horizontal subdivision is repeated until all the categorical variables are accommodated.

Note:
    In some cases, a few extreme values may dominate the plot. You can specify minimum or maximum values with the commands

      SET FLUCTUATION PLOT FLOOR <value>
      SET FLUCTUATION PLOT CEILING <value>

    Values less than the floor are set to the floor value, and values greater than the ceiling are set to the ceiling value.

    The default is to use the minimum and maximum values of the computed statistic. For the COUNT case, the floor value will be set to 0. For the BINOMIAL PROBABILITY case, the floor and ceiling values will be set to 0 and 1, respectively.
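
    The floor/ceiling rule and its defaults can be sketched in Python. This assumes the per-cell statistics are held in a dictionary keyed by category combination and that heights are scaled linearly between floor and ceiling. The function name bar_heights is hypothetical, and drawing every bar at full height when the floor equals the ceiling is an assumption of this sketch, not documented Dataplot behavior.

```python
def bar_heights(stats, floor=None, ceiling=None):
    """Map per-cell statistics to bar heights in [0, 1].

    floor/ceiling default to the minimum/maximum of the statistic; values
    outside [floor, ceiling] are clamped before scaling.
    """
    lo = min(stats.values()) if floor is None else floor
    hi = max(stats.values()) if ceiling is None else ceiling
    heights = {}
    for cell, value in stats.items():
        clamped = min(max(value, lo), hi)
        # Assumption: if floor == ceiling, draw every bar at full height.
        heights[cell] = (clamped - lo) / (hi - lo) if hi > lo else 1.0
    return heights


stats = {"a": 0.0, "b": 5.0, "c": 100.0}
default = bar_heights(stats)              # "c" dominates: "b" is only 0.05
capped = bar_heights(stats, ceiling=10)   # {"a": 0.0, "b": 0.5, "c": 1.0}
```

    Following the defaults described above, the COUNT case would call bar_heights(counts, floor=0) and the BINOMIAL PROBABILITY case bar_heights(p, floor=0, ceiling=1).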

    After the fluctuation plot is generated, Dataplot will save the internal parameters STATMINI and STATMAXI that contain the minimum and maximum values, respectively, of the computed statistic.

Note:
    By default, the bars in the fluctuation plot have a constant width. If you want the width of each bar to be proportional to the sample size for that combination of categories, enter the command

      SET FLUCTUATION PLOT WIDTH PROPORTIONAL

    To restore fixed-width bars, enter the command

      SET FLUCTUATION PLOT WIDTH FIXED

    This option does not apply to the case where the statistic being computed is the frequency counts (COUNT). In this case, the height of the bars already indicates the frequency counts.
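
    One way to realize proportional widths is sketched below. It assumes the cell with the largest sample size fills the full grid-cell width and the others scale linearly; that normalization is an assumption of this sketch, and the function name bar_widths is hypothetical.

```python
def bar_widths(sizes, proportional=False):
    """Bar width as a fraction of the grid-cell width."""
    if not proportional:
        return {cell: 1.0 for cell in sizes}
    largest = max(sizes.values())
    # Assumption: the largest sample fills the cell; others scale linearly.
    return {cell: n / largest for cell, n in sizes.items()}


sizes = {("a",): 10, ("b",): 5}
fixed = bar_widths(sizes)                      # both 1.0
prop = bar_widths(sizes, proportional=True)    # ("a",): 1.0, ("b",): 0.5
```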

Note:
    The example programs below demonstrate how to control the color for the bars in the fluctuation plot and also how to label the levels of the categories.
Default:
    None
Synonyms:
    None
Reference:
    Unwin, Theus, and Hofmann (2006), "Graphics of Large Data Sets: Visualizing a Million", Springer, chapter 5.

    Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 90.

Applications:
    Graphical Analysis of Categorical Data
Implementation Date:
    2009/1
Program 1:
    .  Example from page 61 of Friendly
    .  Data denotes counts.
    read matrix m
     5  29 14 16
    15  54 14 10
    20  84 17 94
    68 119 26 7
    end of data
    .
    label case asis
    tic mark label case asis
    title case asis
    title offset 2
    .
    x3label
    title Fluctuation Plot
    y1label Eye Color
    x1label Hair Color
    tic offset units data
    xlimits 1 4
    major xtic mark number 4
    minor xtic mark number 0
    xtic mark offset 1 1
    x1tic mark label format alpha
    x1tic mark label content Black Brown Red Blond
    ylimits 1 4
    major ytic mark number 4
    minor ytic mark number 0
    ytic mark offset 1 1
    y1tic mark label format alpha
    y1tic mark label content Green Hazel Blue Brown
    y1tic mark label justification right
    .
    line color g75 black
    region fill color g75 black
    region border color g75 black
    .
    fluctuation plot m
        
    plot generated by sample program

Program 2:
     
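    .  Skip the 25-line header of the data file
    skip 25
    read alarm.dat inst src expalarm obsalarm
    .  Define CORRECT = 1 when the expected and observed alarms agree
    .  (both 0 or both 1), and CORRECT = 0 otherwise
    let n = size expalarm
    let correct = 0 for i = 1 1 n
    let correct = 1 subset expalarm = 0 subset obsalarm = 0
    let correct = 1 subset expalarm = 1 subset obsalarm = 1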
    .
    label case asis
    tic mark label case asis
    title case asis
    title offset 2
    .
    x3label
    title Fluctuation Plot of Binomial Probability for Correct Alarm
    y1label Instrument
    x1label Source
    tic offset units data
    xlimits 1 6
    major xtic mark number 6
    minor xtic mark number 0
    xtic mark offset 1 1
    ylimits 1 15
    major ytic mark number 15
    minor ytic mark number 0
    ytic mark offset 1 1
    .
    line color g75 black
    region fill color g75 black
    region border color g75 black
    .
    set fluctuation plot width proportional
    fluctuation binomial probability plot correct inst src
        
    plot generated by sample program

Program 3:
     
    skip 25
    read ripken.dat y x1 to x4
    .
    label case asis
    tic mark label case asis
    title case asis
    .
    x3label
    title Fluctuation Plot for Cal Ripken Mean Batting Average
    let string v1 = Low
    let string v2 = Middle
    let string v3 = Left:sp()High
    let string v4 = Low
    let string v5 = Middle
    let string v6 = Right:sp()High
    let igy = group label v1 to v6
    let string h1 = Inside
    let string h2 = Middlecr()Fastball
    let string h3 = Outside
    let string h4 = Inside
    let string h5 = Middlecr()Curveball
    let string h6 = Right
    let igx = group label h1 to h6
    .
    tic offset units data
    xlimits 1 6
    major xtic mark number 6
    minor xtic mark number 0
    xtic mark offset 1 1
    x1tic mark label format group label
    x1tic mark label content igx
    ylimits 1 6
    major ytic mark number 6
    minor ytic mark number 0
    ytic mark offset 1 1
    y1tic mark label format group label
    y1tic mark label content igy
    y1tic mark label justification right
    .
    line color g75 black
    region fill color g75 black
    region border color g75 black
    .
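    fluctuation mean plot y x2 x1 x4 x3
    .
    .  Annotate the plot with the minimum and maximum statistic values
    .  saved in the internal parameters STATMINI and STATMAXI
    move 50 92
    just center
    text (Minimum BA: ^statmini, Maximum BA: ^statmaxi)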
        
    plot generated by sample program

Date created: 1/6/2009
Last updated: 1/6/2009
Please email comments on this WWW page to alan.heckert@nist.gov.