
STREAM READ

Name:
    STREAM READ
The STREAM READ command was added to allow large data sets to be read, and certain statistics to be computed, without reading the entire data set into memory. Although only a limited number of analyses can be performed with this command, it may allow some useful initial exploratory analysis of these large data sets. There are several variations of this command; each is described separately.
Syntax 1:
    STREAM READ WRITE <file> <x1> <x2> ... <xk>
where <file> is the name of the input file; and where <x1>, <x2>, ... <xk> is a list of variables to be read.

This version of the command reads the input file and writes a new version of the data using a specified Fortran-like format.

Its main use is to speed up later reads. Large data files can take a long time to read, and reading the data with the SET READ FORMAT command can be significantly faster. For example, reading the data set used by the example programs below took 24.7 cpu seconds on a Linux machine running CentOS. Performing the same read on the same platform with SET READ FORMAT took 0.6 cpu seconds. Cpu times will vary with the hardware and operating system, but this indicates the relative performance improvement that SET READ FORMAT can provide. This example file is not particularly large (361,920 rows). The speed improvement becomes even more important for files with multiple millions of rows.

Large data sets are often not initially in a form where SET READ FORMAT can be used. In that case, this command can be used once, together with the SET WRITE FORMAT command, to create a new version of the file that SET READ FORMAT can read. The new file is then used in subsequent Dataplot sessions that use this data.
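To make the idea of a fixed Fortran-like format concrete, the following Python sketch shows how a record laid out as 3F7.0 (the format used in the example programs) can be split by column position alone. It is only an illustration of the layout, not Dataplot's reader; the function name and the sample record are hypothetical, and the suggestion that positional slicing is what makes formatted reads faster is an assumption rather than something this page states.

```python
def parse_fixed_width(line, n_fields=3, width=7):
    """Split one record into n_fields numeric fields of equal width.

    Mirrors the layout that a format such as 3F7.0 describes: three
    fields, each 7 characters wide. Blank fields are returned as None.
    """
    fields = []
    for i in range(n_fields):
        text = line[i * width:(i + 1) * width].strip()
        fields.append(float(text) if text else None)
    return fields


# Hypothetical record: three right-justified 7-character fields.
record = f"{140:7d}{1:7d}{1:7d}"
print(parse_fixed_width(record))
```

Because every field starts at a known column, no searching for delimiters is needed, which is one plausible reason a formatted read can be much cheaper than a free-format read.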
Syntax 2:
    STREAM READ GROUP STATISTIC <stat> <file> <x1> <x2> ... <xk>
where <stat> is one of Dataplot's supported univariate statistics; <file> is the name of the input file; and where <x1>, <x2>, ... <xk> is a list of variables to be read.

This syntax reads the file a user-specified number of rows at a time and replaces those rows with the specified statistic. For example, you can read 1,000 rows, compute (and save) the mean of those 1,000 rows for each variable, then repeat for the next 1,000 rows. The original data are thus replaced with the means of fixed intervals of the data.

To specify the number of rows to read at a time, enter
    SET STREAM READ SIZE <value>
where <value> is the number of rows (for example, SET STREAM READ SIZE 100).

Alternatively, you can specify one of the variables to define the group (i.e., a change in the value of the specified variable denotes the start of a new group). For this option, enter
    SET STREAM READ GROUP VARIABLE <var>
where <var> is the name of the group variable (for example, SET STREAM READ GROUP VARIABLE ROWID).
This capability is motivated by the desire to handle large data sets that may exceed Dataplot's storage limits. This command allows you to compute some basic statistics (mean, minimum, maximum, standard deviation, and so on) for slices of the data. Often, some useful exploratory analysis can be performed on this compressed data.
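The two grouping modes described above can be sketched in Python as follows. This is a minimal illustration of the idea, not Dataplot's implementation: the function names are hypothetical, and summarizing a short final block on its own is an assumption, since this page does not say how a trailing partial block is handled.

```python
from itertools import groupby, islice
from statistics import mean


def stat_by_fixed_size(rows, size, stat=mean):
    """Replace each block of `size` consecutive values with stat(block).

    `rows` may be any iterable (for example a generator over a file),
    so only one block is held in memory at a time.
    """
    it = iter(rows)
    out = []
    while True:
        block = list(islice(it, size))
        if not block:
            break
        out.append(stat(block))
    return out


def stat_by_group_change(rows, stat=mean):
    """Summarize runs of (group_value, x) pairs.

    A new group starts whenever group_value differs from the previous
    row, so a value that reappears later starts another group.
    """
    out = []
    for key, run in groupby(rows, key=lambda pair: pair[0]):
        out.append((key, stat([x for _, x in run])))
    return out


print(stat_by_fixed_size([1, 2, 3, 4, 5, 6, 7], size=3))
print(stat_by_group_change([(1, 10), (1, 20), (2, 5), (2, 7), (1, 1)]))
```

In both cases the result has one value per group, which is why the compressed data can fit in memory even when the original file cannot.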
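Syntax 3:
    STREAM READ DEFAULT STATISTICS <file> <x1> <x2> ... <xk>
where <file> is the name of the input file; and where <x1>, <x2>, ... <xk> is a list of variables to be read. This is a variant of Syntax 2 that computes a default set of statistics in a single pass of the data. The following statistics are computed:
For this syntax, a tag variable (TAGSTAT) is created that identifies the statistic (i.e., each row of TAGSTAT contains a value from 1 to 21). TAGSTAT can be used to extract the desired statistic for each group. For example, Program 2 below uses TAGSTAT values 4, 5, 6, and 7 to extract the minimum, maximum, mean, and standard deviation, respectively.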
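The role of the tag variable can be illustrated with a short Python sketch. It assumes the statistics are stacked in one column with a parallel tag column, which is what the RETAIN ... SUBSET TAGSTAT commands in Program 2 suggest. The four codes come from Program 2; the data values and the function name are hypothetical and only serve the illustration.

```python
# Codes as used in Program 2 of this page; the other codes (1 to 21)
# are not listed here and are deliberately left out.
TAG_MIN, TAG_MAX, TAG_MEAN, TAG_SD = 4, 5, 6, 7


def extract_statistic(values, tagstat, code):
    """Keep only the rows of `values` whose tag equals `code`."""
    return [v for v, t in zip(values, tagstat) if t == code]


# Hypothetical stacked output for two groups, restricted to four codes.
red = [140.0, 900.0, 300.5, 80.2, 150.0, 4095.0, 410.0, 700.1]
tagstat = [4, 5, 6, 7, 4, 5, 6, 7]

red_mean = extract_statistic(red, tagstat, TAG_MEAN)
print(red_mean)
```

The filter does not depend on how the rows are ordered, only on the tag and value columns staying aligned.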
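Syntax 4:
    STREAM READ FULL STATISTICS <file> <x1> <x2> ... <xk>
where <file> is the name of the input file; and where <x1>, <x2>, ... <xk> is a list of variables to be read. This syntax computes the following statistics for all of the data using a one-pass algorithm:
    1. size (number of observations)
    2. minimum
    3. maximum
    4. mean
    5. standard deviation
    6. skewness
    7. kurtosis
    8. range
Each of <x1> ... <xk> will contain 8 rows holding these eight statistics, in this order, for the corresponding column read.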
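A one-pass computation of these eight statistics can be sketched in Python with running updates of the central moment sums. This is a generic streaming-moments technique and not necessarily the algorithm Dataplot uses. The function name is hypothetical, and the definitions chosen here (sample standard deviation with an n-1 divisor, moment-based skewness, and kurtosis without subtracting 3) are assumptions, since this page does not state which definitions Dataplot applies.

```python
import math


def one_pass_stats(values):
    """Return [size, min, max, mean, sd, skewness, kurtosis, range].

    Each value is seen once; only a few running totals are stored.
    """
    n = 0
    mean = m2 = m3 = m4 = 0.0
    lo, hi = math.inf, -math.inf
    for x in values:
        n1 = n
        n += 1
        delta = x - mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        mean += delta_n
        # Order matters: M4 uses the old M3 and M2, M3 uses the old M2.
        m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
               + 6 * delta_n2 * m2 - 4 * delta_n * m3)
        m3 += term1 * delta_n * (n - 2) - 3 * delta_n * m2
        m2 += term1
        lo = min(lo, x)
        hi = max(hi, x)
    if n == 0:
        raise ValueError("no data")
    # Assumed definitions; see the note above the code.
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else math.nan
    skew = math.sqrt(n) * m3 / m2 ** 1.5 if m2 > 0 else math.nan
    kurt = n * m4 / (m2 * m2) if m2 > 0 else math.nan
    return [n, lo, hi, mean, sd, skew, kurt, hi - lo]


print(one_pass_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

The returned list follows the same row order as the eight statistics above, so index 0 is the size and index 7 is the range.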
Examples:
    STREAM READ WRITE BIG.DAT X1 TO X10
    SET STREAM READ SIZE 100
    STREAM READ GROUP STATISTIC MEAN BIG.DAT X1 TO X10
    STREAM READ GROUP STATISTIC STANDARD DEVIATION BIG.DAT X1 TO X10
    STREAM READ DEFAULT STATISTICS BIG.DAT X1 TO X10
    STREAM READ FULL STATISTICS BIG.DAT X1 TO X10
Program 1:
    . Step 1:   Demonstrate the group statistic option of stream read
    .
    skip 25
    set read format 3F7.0
    set stream read group variable rowid
    stream read group statistics mean elliottr.dat redcolme rowid colid
    .
    . Step 2:   Generate plot of column means
    .
    title offset 2
    title case asis
    label case asis
    .
    title Column Means for Red Pixels for ELLIOTTR.DAT
    y1label Column Mean
    x1label Row
    .
    plot redcolme vs rowid
    .
    . Step 3:   Reset read settings
    .
    skip 0
    set read format

Program 2:
    . Step 1:   Demonstrate the default statistic option of stream read
    .
    skip 25
    set read format 3F7.0
    set stream read group variable rowid
    stream read default statistics elliottr.dat red rowid colid
    .
    let redmean = red
    retain redmean subset tagstat = 6
    let redsd = red
    retain redsd subset tagstat = 7
    let redmin = red
    retain redmin subset tagstat = 4
    let redmax = red
    retain redmax subset tagstat = 5
    .
    . Step 2:   Plot some of the statistics
    .
    multiplot corner coordinates 5 5 95 95
    multiplot scale factor 2
    multiplot 2 2
    .
    label case asis
    title case asis
    case asis
    title offset 2
    .
    title Mean of Columns
    plot redmean
    .
    title SD of Columns
    plot redsd
    .
    title Minimum of Columns
    plot redmin
    .
    title Maximum of Columns
    plot redmax
    .
    end of multiplot
    .
    justification center
    move 50 97
    text Statistics for Columns of Red Pixels in ELLIOTTR.DAT
    .
    . Step 3:   Reset read settings
    .
    skip 0
    set read format

Program 3:
    . Step 1:   Demonstrate the full statistics option of stream read
    .
    skip 25
    set read format 3F7.0
    stream read full statistics elliottr.dat red rowid colid
    .
    . Step 2:   Print statistics for red variable
    .
    feedback off
    set write decimals 2
    print "Statistics for variable RED:"
    print " "
    print " "
    let aval = red(1)
    print "Size: ^aval"
    let aval = red(2)
    print "Minimum: ^aval"
    let aval = red(3)
    print "Maximum: ^aval"
    let aval = red(4)
    let aval = round(aval,2)
    print "Mean: ^aval"
    let aval = red(5)
    let aval = round(aval,2)
    print "SD: ^aval"
    let aval = red(6)
    let aval = round(aval,2)
    print "Skewness: ^aval"
    let aval = red(7)
    let aval = round(aval,2)
    print "Kurtosis: ^aval"
    let aval = red(8)
    print "Range: ^aval"
    feedback on
    .
    . Step 3:   Reset read settings
    .
    skip 0
    set read format

Output of Program 3:
    Statistics for variable RED:

    Size: 361920
    Minimum: 140
    Maximum: 4095
    Mean: 369.38
    SD: 745.5
    Skewness: 4.23
    Kurtosis: 20
    Range: 3955
NIST is an agency of the U.S. Commerce Department.
Date created: 07/24/2017 