ASCII FILES

This section provides guidance on reading ASCII data files in Dataplot. This includes discussion of some commands added to the 1/2004 version of Dataplot. In particular, discussion is included for ASCII files created by the Excel program.

Dataplot has limited support for binary data files. Currently, only binary files created using Fortran unformatted WRITE are supported. Enter HELP SET READ FORMAT for details.

Also, Dataplot does not currently support directly reading files from other statistical/spreadsheet programs or database files. Some support may be provided in future releases, but for now you need to save the data from these programs in an ASCII file in order to read them into Dataplot. XML based data files are becoming increasingly popular as well. At this time, Dataplot does not support XML based data files, although we anticipate looking at this issue for subsequent releases.

IDEAL CASE

By default, Dataplot assumes rectangular data files containing numeric data where the data columns are separated by one or more spaces, commas, or tabs.

In this case, you can read the file with a command like the following:

    READ FILE.DAT Y X1 X2

The first argument after the READ is the name of the ASCII file. The remaining arguments identify the variable names. Variable names can be up to eight characters long and should be limited to alphabetic (A-Z) and numeric (0-9) characters. Although other characters can in fact be used, this is discouraged since their use can cause problems in some contexts. Variable names are not case sensitive (Dataplot converts all alphabetic characters to upper case). Variable names are separated with one or more spaces (commas are not allowed as delimiters in this context).

Dataplot recognizes the first argument as a file name if it finds a "." in the name. If no "." is found, Dataplot assumes the first argument is a variable name and it tries to read from the keyboard rather than the file.

The remainder of this section discusses various issues that may cause problems when reading ASCII files and provides suggestions on how to deal with these issues. The following topics are discussed:

  1. Viewing ASCII files within Dataplot
  2. Header lines/restricted rows or columns
  3. Long data records
  4. Automatic variable names
  5. Reading fixed columns
  6. Reading variables with unequal lengths
  7. Reading character data
  8. Reading row oriented data
  9. Comment lines in data files
  10. Reading Excel files
  11. File name restrictions
  12. Comma as decimal point
  13. Missing values and undefined numbers
  14. Reading date and time fields
  15. Reading IP addresses
  16. Reading monetary data (e.g., $23,461.58)
  17. Reading numeric values with trailing "+" or "-"
  18. Commas within character fields
  19. Reading binary data
  20. Reading image data

If you create the ASCII file yourself, it is recommended that you create it with variables of equal length (pick some numeric value to signify missing data) and with data items separated by one or more spaces. Inclusion of a header giving a description of the data file is optional, but we find it helpful (Dataplot can skip over the header lines). When the ASCII files are created by another program (e.g., Excel), then you may have less control over the format of the file. Hopefully, most ASCII files you encounter can be handled using the commands discussed below.

VIEWING THE ASCII FILE WITHIN DATAPLOT

In order to identify some of the issues discussed below, it is often helpful to view the ASCII file before trying to read it into Dataplot. You can do this with the command

    LIST FILE.DAT

This lists the file 20 lines at a time (you can change the number of lines with the SET LIST LINES command). You can then enter a carriage return to view the next 20 lines or a "no" to stop viewing the file.

For some of the commands given below, you need to know the appropriate line numbers or column numbers.

To view the file with line numbers, enter the command

    NLIST FILE.DAT

To identify appropriate columns, enter the command

    RULER

This will identify the first 80 columns.

HEADER LINES/RESTRICTED ROWS OR COLUMNS

Many data files contain header lines at the beginning of the file that provide a description of the file. In order to skip over these lines, enter the command

    SKIP N

where N identifies how many lines to skip.

Most of the sample data files that are distributed with Dataplot contain a line starting with hyphens ("---"). You can use the command

    SKIP AUTOMATIC

for these files. Dataplot will skip all lines until a line starting with three or more hyphens is encountered.

In a related issue, if you want to restrict the read to certain rows in the file, you can enter the command

    ROW LIMITS N1 N2

with N1 and N2 denoting the first and last rows to read, respectively.

You can also restrict the read to certain columns of the file using the command

    COLUMN LIMITS C1 C2

with C1 denoting the first column to read and C2 the last column to read.

LONG DATA RECORDS

When reading from the keyboard, Dataplot restricts a single record to a maximum of 80 columns.

When reading from a file, Dataplot previously restricted a single record to a maximum of 132 columns. The March, 2003 version raised the default limit to 255 characters. In addition, the following command was added:

    MAXIMUM RECORD LENGTH N

with N denoting the size of the largest record to be read.

Dataplot accepts values of N up to 9999. However, be aware that some Fortran compilers may impose their own limit. These limits tend not to be well documented, but with modern compilers they should be sufficiently large that this should not be a problem in practice.

If you specify a SET READ FORMAT command (discussed below), you do not need to specify the maximum record length.

AUTOMATIC VARIABLE NAMES

Dataplot normally takes the variable names from the READ command. However, many ASCII files give the variable names directly in the file. Alternatively, Dataplot can assign the variable names automatically.

Specific methods include the following.

  1. Many of the sample files provided in the Dataplot installation use a syntax like
     Y     X1   X2
     ----------------
     <data values>
           
    For these files, you can enter the commands

      SKIP AUTOMATIC
      READ FILE.DAT

    In this case, Dataplot will skip all lines until a line starting with three or more hyphens is encountered. It will then backspace to the previous line and read the variable names from that line.

  2. Many ASCII data files will have the variable names on the first line of the file. For these files, you can enter the commands

      SET VARIABLE LABELS ON
      READ FILE.DAT

  3. If you would like Dataplot to simply assign the variable names, enter the command

      READ FILE.DAT

    Dataplot will read the first line of the file to determine the number of variables. It will then assign the names X1, X2, and so on to the variables.

Note that Dataplot's usual rules for variable names still apply. That is, a maximum of eight characters will be used and spaces will delimit variable names. The use of special (i.e., not a number and not an alphabetic character) characters is discouraged. You may need to edit the file if the variable names do not follow these rules (more than eight characters will simply be ignored, so the issue is more one of duplicate variable names in the first eight characters).
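The first and third methods above can be mimicked outside Dataplot. The following Python sketch is not Dataplot code and not its implementation; it only follows the rules as described in this section. The function names are hypothetical, and ignoring leading blanks before the hyphens is an assumption.

```python
def read_skip_automatic(lines):
    """Return (names, rows) for lines laid out as: header, names, '---', data."""
    for i, line in enumerate(lines):
        # Assumption: leading blanks before the hyphens are ignored.
        if line.lstrip().startswith("---"):
            # Names come from the line just before the hyphens; they are
            # upper-cased and cut to eight characters, as the text describes.
            names = [n.upper()[:8] for n in lines[i - 1].split()] if i > 0 else []
            rows = [[float(v) for v in l.split()]
                    for l in lines[i + 1:] if l.strip()]
            return names, rows
    raise ValueError("no line starting with three or more hyphens")


def default_names(first_data_line):
    """Names X1, X2, ... based on the number of values on the first line."""
    return [f"X{k}" for k in range(1, len(first_data_line.split()) + 1)]
```

For example, passing the lines of a small file with a description line, a names line, a hyphen line, and two data rows returns the upper-cased names and the rows as lists of floats.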

READING FIXED COLUMNS

By default, Dataplot performs free format reads. That is, you do not need to line up the columns neatly. You do need to provide one or more spaces (tabs, commas, colons, semi-colons, parentheses, or brackets can be used as well) between data fields.

Many data files will contain fixed fields. There are several reasons you may want or need to take advantage of these fixed fields rather than using a free format read.

  1. If your data fields do not contain spaces (or some other delimiter) between data columns, you need to tell Dataplot how to interpret the columns.

  2. In some cases, you may only want to read selected variables in the data file.

  3. Using a formatted read can significantly speed up the reading of the data. If you have small or moderate size data files (say 500 rows or fewer), this is really not an issue. However, if you are reading 50,000 rows, you can significantly speed up the read by specifying the format.

  4. If the data fields have unequal lengths, Dataplot will not interpret the data file correctly with a free format read. It assigns the data items in the order they are encountered to the variable names in the order they are given. Dataplot does not try to guess if a data item is missing based on the columns.

    The issue of unequal lengths is discussed in detail in the next section.

There are two basic cases for fixed fields.

  1. The data fields are justified by the decimal point.

    In this case, you can use the

      SET READ FORMAT

    command to specify a Fortran-like format to read the file. Enter HELP READ FORMAT for details.

    Using a formatted read is significantly faster than a free format read.

  2. Many programs will write ASCII files with fixed columns, but the data fields will be either left or right justified rather than lined up by the decimal point.

    In this case, you can use a special form of the COLUMN LIMITS command that was introduced with the January, 2004 version. Normally, the first and last columns to read are specified. However, you can now enter variables for the lower and upper limits as in the following example:

      LET LOWER = DATA 1 21 41
      LET UPPER = DATA 10 30 50
      COLUMN LIMITS LOWER UPPER

    That is, if variables rather than parameters are specified, separate column limits are specified for each data field. In this case, the first data field is between columns 1 and 10, the second field is between columns 21 and 30, and the third field is between 41 and 50.

    When this syntax is used, only one variable is read for each specified field. If the field is blank, then this is interpreted as a missing value.
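The variable form of COLUMN LIMITS can be pictured with a short Python sketch. This is an illustration of the rule described above, not Dataplot's implementation. The function name and the `missing` parameter are hypothetical; the value used for a blank field is left to the caller here.

```python
def read_fixed_fields(line, lower, upper, missing):
    """Read one value per (lower, upper) column pair from a fixed-width line."""
    values = []
    for lo, hi in zip(lower, upper):
        # Columns are 1-based and inclusive, as in COLUMN LIMITS.
        field = line[lo - 1:hi].strip()
        # A blank field is reported as the missing value.
        values.append(float(field) if field else missing)
    return values
```

With limits 1 to 10, 21 to 30, and 41 to 50, a line whose third field is blank yields two numbers followed by the missing value, regardless of whether the fields are left or right justified.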

READING VARIABLES OF UNEQUAL LENGTH

Dataplot normally expects all variables to be of equal length. If some variables have missing rows, this can have undesired results. Dataplot will assign the first value read to the first variable name, the second value to second variable and so on. If fewer values than variables are specified, then variables that have no data values are not read at all (even if they have values for other rows).

If you have a data file where the columns have unequal lengths, you can do one of the following things.

  1. Pick some value to represent a missing value and fill in missing data points with that value. After reading the data, you can use a RETAIN command to remove them. For example, if you use -99 to signify a missing value, you can enter

      RETAIN Y SUBSET Y > -99

    Alternatively, you can use a SUBSET clause on subsequent plot and analysis commands.

  2. Use the variable form of the COLUMN LIMITS command as described above. By default, when a blank field is encountered, it is set to zero. You can specify the value to use by entering the command

      SET READ MISSING VALUE <value>

    This option depends on having consistent columns for each of the data fields.

  3. If your data has both columns of unequal length and inconsistent columns for given data fields, an alternative is to use a comma delimited data file. That is, separate data values with a comma. If there is no data between successive commas, this is treated as a missing value. The default is to assign a value of zero. Alternatively, you can use the SET READ MISSING VALUE command described above.

    You can specify a delimiter other than a comma with the command

      SET READ DELIMITER <character>

    The variable form of the COLUMN LIMITS, the SET READ MISSING VALUE, and the SET READ DELIMITER commands were introduced in the January, 2004 version. The interpretation of successive commas as a missing value was also introduced in the January, 2004 version.
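As a rough illustration of the delimiter rule in option 3, the Python sketch below treats an empty field between successive delimiters as a missing value, with zero as the default as stated above. It is not Dataplot code; the function name and parameters are hypothetical.

```python
def split_delimited(line, delimiter=",", missing=0.0):
    """Split a delimited record; empty fields become the missing value."""
    return [float(field) if field.strip() else missing
            for field in line.split(delimiter)]
```

Here `split_delimited("1.0,,3.0")` gives three values with zero in the middle, and passing a different `delimiter` or `missing` plays the role of SET READ DELIMITER and SET READ MISSING VALUE.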

READING DATA WITH CHARACTER FIELDS

Dataplot has not previously supported character data. If character data were encountered, Dataplot would generate an error message and not read the data file correctly. The one exception is that you could read row labels with the READ ROW LABEL command (enter HELP READ ROW LABEL for details).

With the January 2004 version, we have introduced some limited support for character data. Specifically, we have added the command

    SET CONVERT CHARACTER <ON/IGNORE/ERROR>

Setting this to ERROR will continue the current Dataplot action of reporting character data as an error. This is recommended for the case when a file is supposed to contain only numeric data and the presence of character data is in fact indicative of an error in the data file.

Setting this to IGNORE will instruct Dataplot to simply ignore any fields containing character data. This can be useful if you simply want to extract the numeric data fields in the file without entering COLUMN LIMITS or SET READ FORMAT commands.

Setting this to ON will read character fields and write them to the file "dpzchf.dat". Note that Dataplot saves numeric data "in memory" for fast access. Since character data has limited use in Dataplot, we have decided to save character data externally to minimize memory requirements. Dataplot keeps a separate name table for the character data fields (the names for character variables are stored in the file "dpzchf.dat").

There are some restrictions on when Dataplot will try to read character data:

  1. This only applies to the variable read case. That is, READ PARAMETER and READ MATRIX will ignore character fields or treat them as an error.

  2. Dataplot will only try to read character data from a file. When reading from the keyboard (i.e., when READ is specified with no file name), character data will be ignored when a SET CONVERT CHARACTER ON is specified.

  3. This capability is not supported for the SERIAL READ case.

  4. The SET READ FORMAT command does not accept the "A" format specification for reading character fields.

  5. A maximum of 20 character variables will be saved.

  6. A maximum of 24 characters for each character variable will be saved.

  7. The character variables from at most one data file will be saved in a given session.

Some of these restrictions may be addressed in subsequent releases of Dataplot.

Currently, Dataplot has limited support for character variables. Specifically,

  1. The row label can be used for the plot character by entering the command

      CHARACTER ROWLABEL

  2. You can convert a character variable to a coded numeric variable with the command

      LET Y = CHARACTER CODE IX
      LET Y = ALPHABETIC CHARACTER CODE IX

    with IX denoting the name of the character variable. These commands assign a numeric value for each unique name in the character variable.

    For the CHARACTER CODE case, the coding is from 1 to K where K is the number of unique values. The order is based on the order these values are found in the file.

    For the ALPHABETIC CHARACTER CODE case, the coding is from 1 to K where K is the number of unique values. The codes are assigned in alphabetical order.
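The two coding schemes can be illustrated with a small Python sketch. It is not Dataplot's implementation, and the function names are hypothetical. Python's default string ordering (which is case sensitive) stands in for "alphabetical order" here, which is an assumption.

```python
def character_code(values):
    """Code 1..K in the order the unique values are first seen."""
    codes = {}
    return [codes.setdefault(v, len(codes) + 1) for v in values]


def alphabetic_character_code(values):
    """Code 1..K following the sorted order of the unique values."""
    order = {v: k for k, v in enumerate(sorted(set(values)), start=1)}
    return [order[v] for v in values]
```

For the values red, blue, red, green, the first function codes red as 1 because it appears first, while the second codes blue as 1 because it sorts first.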

We anticipate additional use of character variables in subsequent releases of Dataplot.

If your character fields contain non-numeric/non-alphabetic characters, then it is recommended that the character fields be enclosed in quotes. When Dataplot encounters a quote (either a single or double quote), it interprets everything until a matching quote is found as part of that character field. If the quotes are not used, then spaces, tabs, parentheses, brackets, colons, and semi-colons are interpreted as delimiters that signify the end of that data item.

READING ROW ORIENTED DATA

Dataplot assumes a column oriented format. That is, a row of data represents a single record (or case) and a column of data represents a variable. If a data file has a row orientation, then this is reversed. A row of data represents a variable and a column of data represents a record (or case).

The following example shows one way of correctly reading the data into Dataplot. Suppose that your data file contains five rows with each row corresponding to a single variable. You can do the following:

    LOOP FOR K = 1 1 5
      ROW LIMITS K K
      SERIAL READ FILE.DAT X^K
    END OF LOOP
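The effect of the loop above can be pictured in Python: each row of the file becomes one variable named X1, X2, and so on. This is only a sketch of the outcome, not Dataplot code, and the function name is hypothetical.

```python
def read_row_oriented(lines):
    """Map each row to a variable X1, X2, ... holding that row's values."""
    return {f"X{k}": [float(v) for v in line.split()]
            for k, line in enumerate(lines, start=1)}
```

Each variable may hold a different number of values, since every row is read on its own.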

COMMENT LINES IN DATA FILES

It is sometimes convenient to include comments in data files. If these comments are contained at the beginning of the file, then the SKIP command can be used. To have Dataplot check for comment lines in the data file, enter the command

    COMMENT CHECK ON

The default comment character is a ".". That is, any line starting with a ". " is treated as a comment line and ignored. To specify a different comment character, enter the command

    COMMENT CHARACTER <char>

with <char> denoting the desired comment character.
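A minimal Python sketch of the comment check follows. It takes the description above literally, so a comment line is one that starts with the comment character followed by a space. That reading is an assumption, and the function name is hypothetical.

```python
def drop_comment_lines(lines, comment_char="."):
    """Remove lines that begin with the comment character and a space."""
    marker = comment_char + " "
    return [line for line in lines if not line.startswith(marker)]
```

Under this reading, a data line that begins with a value such as `.5` is kept, because the period is not followed by a space.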

EXCEL FILES

At the current time (1/2004), Dataplot does not support the direct reading of Excel data files. We are planning to add this capability in a future release of Dataplot. Until that time, you need to save the data in Excel to an ASCII file and read that ASCII file into Dataplot.

Excel provides the following options for writing ASCII data files:

  1. Formatted text (space delimited) (.PRN extension)

    This format will use consistent columns for the data fields. The variable form of the COLUMN LIMITS command can be used when the data columns have unequal length.

    Character fields will often not have the separating space. The variable form of the COLUMN LIMITS command can be used in this case as well.

  2. CSV (Comma delimited) (.CSV extension)

    This format will separate data fields with a single comma. Missing data is represented with successive commas. Dataplot can now (as of the January 2004 version) handle this correctly.

  3. Text (Tab delimited) (.TXT extension) and Text (MS-DOS) (.TXT extension)

    These files will separate data fields with a tab character. Note that Dataplot converts all non-printing characters (including tabs) to a single space character.

    This format is not appropriate for data containing variables with unequal lengths since it will not generate consistent columns for the data fields. Use either the space delimited or comma delimited file for that case.

The 2014/12 version of Dataplot added the capability of reading and writing to the system clipboard under Windows. Using the "copy" function in Excel and then using the READ CLIPBOARD command in Dataplot will in many cases be the easiest way to retrieve data from Excel files. Enter HELP CLIPBOARD for details.

FILE NAME RESTRICTIONS

A few comments on file names.

  1. File names are limited to 80 characters or fewer (this includes the path name if given).

  2. If the file name contains either spaces or hyphens, it should be enclosed in double quotes. For example,

      READ "C:\My Documents\SAMPLE.DAT" Y X1 X2

  3. The file name should be a valid file name on the local operating system.

  4. The file name must contain a period "." in the file name itself or as a trailing character. Dataplot strips off trailing periods on those systems where it is appropriate to do so. On systems where trailing periods can be a valid file name (e.g., Unix), Dataplot first tries to open the file with the trailing period. If this fails, it will try to open the file name without the trailing period.

  5. On systems where file names are case sensitive (e.g., Unix), Dataplot first tries to open the file name as given. If the file is not found, it then tries to match the file name after converting the name to all upper case characters. If it is still not found, it will convert the file name to all lower case characters.

    If your file name contains a mixture of upper and lower case characters, then you need to enter the case for the file name correctly on the READ command.

COMMA AS DECIMAL POINT

Dataplot follows the United States convention where the decimal point is the period ".". Some locales may use a different character to denote the decimal point. In particular, some countries use the comma ",".

To allow Dataplot to read files that use a character other than the "." for the decimal point, enter the command

    SET DECIMAL POINT <value>

where <value> denotes the character that specifies the decimal point.

Note this support is fairly limited. Specifically, it applies to free-format reads (i.e., no SET READ FORMAT command has been entered). In addition,

  1. This option is not supported for the WRITE command. WRITE will always use a period for the decimal point.

  2. Dataplot alphanumeric output (e.g., the output from the FIT command) is generated with the period as the decimal point.

  3. As mentioned above, if you read your data with a SET READ FORMAT command, the data must use the period for the decimal point.
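The idea behind SET DECIMAL POINT can be sketched in Python as follows. This is not Dataplot code; the function name is hypothetical, and the sketch assumes that fields are separated by blanks, since the comma is taken by the decimal point.

```python
def read_values(line, decimal_point="."):
    """Parse blank-separated numbers written with the given decimal point."""
    # Assumption: blanks are the only field delimiter in this sketch.
    return [float(field.replace(decimal_point, "."))
            for field in line.split()]
```

For instance, `read_values("3,14 2,5", ",")` reads two values written with a decimal comma.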

MISSING VALUES AND UNDEFINED NUMBERS

Some software programs will have special characters to denote missing values or undefined values (e.g., the result of trying to divide by 0).

In particular, Unix/Linux software often uses "nan" to denote an undefined number. If Dataplot encounters a "nan" in a numeric field, it will convert it to the Dataplot "missing value". The "nan" search is not case sensitive (i.e., it will check for "NAN", "NaN", etc.). You can specify what Dataplot will use for the missing value by entering the command

    SET READ MISSING VALUE <value>

where <value> is a numeric value.

Missing value flags are specific to individual programs. You can specify a character string that denotes a missing value with the command

    SET DATA MISSING VALUE <value>

where <value> is a string with 1 to 4 characters. If Dataplot encounters <value> in a numeric field, it will convert it to the Dataplot "missing value". The missing value string is not case sensitive. You can specify what Dataplot will use for the missing value by entering the command

    SET READ MISSING VALUE <value>

where <value> is a numeric value.
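The two substitutions described in this section can be pictured with the Python sketch below. It is an illustration only, not Dataplot's implementation. The function and parameter names are hypothetical: `missing_string` plays the role of SET DATA MISSING VALUE and `missing_value` the role of SET READ MISSING VALUE.

```python
def parse_field(field, missing_value, missing_string=None):
    """Convert one field, mapping 'nan' or the missing string to missing_value."""
    token = field.strip().lower()
    # Both comparisons ignore case, as the text describes.
    if token == "nan":
        return missing_value
    if missing_string is not None and token == missing_string.lower():
        return missing_value
    return float(field)
```

The check for "nan" happens before the numeric conversion, so the field never reaches `float`.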

READING DATE AND TIME FIELDS

Date and time fields will typically have syntax like

    2016/06/22
    12:43:08

Dataplot treats the "/" and ":" as indicating character fields (based on the SET CONVERT CHARACTER command, this will either cause an error, result in this field being ignored, or the field being read as a character variable).

The following commands were added (2016/06) to help deal with date and time fields.

    SET DATE DELIMITER <character>
    SET TIME DELIMITER <character>

Although Dataplot does not have explicit date or time variables, these commands allow the components of date and time fields to be read as separate numeric variables. For example,

    SET DATE DELIMITER /
    SET TIME DELIMITER :
    READ YEAR MONTH DAY HOUR MIN SEC
    2016/06/22 23:19:03
    END OF DATA
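The effect of the two delimiter settings on the example above can be sketched in Python. This is not Dataplot code; it simply treats the date and time delimiters as extra field separators, and the function name is hypothetical.

```python
def split_date_time(line, date_delimiter="/", time_delimiter=":"):
    """Read the components of date and time fields as separate numbers."""
    for delimiter in (date_delimiter, time_delimiter):
        line = line.replace(delimiter, " ")
    return [float(value) for value in line.split()]
```

Applied to `2016/06/22 23:19:03`, this gives six numbers that correspond to YEAR, MONTH, DAY, HOUR, MIN, and SEC in the READ command above.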

READING IP ADDRESSES

IP addresses typically have a syntax like

    129.6.37.209

By default, Dataplot will generate an error when trying to read a field of this type. To address this, you can enter the command

    SET READ IP ADDRESSES ON

If this switch is ON, Dataplot will scan the line and if a field is encountered that contains more than one period ".", Dataplot will convert these periods to spaces before parsing the line.

The default is OFF since this adds additional processing time to the READ and most data sets do not contain IP addresses.
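The scan described above can be sketched in Python as follows. This is an illustration, not Dataplot's implementation; the function name is hypothetical, and the sketch assumes fields are separated by blanks.

```python
def expand_ip_fields(line):
    """Replace periods with spaces in any field holding more than one period."""
    fields = []
    for field in line.split():
        # An ordinary number has at most one period and is left alone.
        fields.append(field.replace(".", " ") if field.count(".") > 1 else field)
    return " ".join(fields)
```

After this step, an address such as 129.6.37.209 parses as four separate numeric fields, while a value such as 3.5 is unchanged.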

READING MONETARY DATA

Monetary data may sometimes be given as

    $23,461.58

The "$" and "," in these numeric fields will cause problems. The "$" will be treated as a non-numeric value (depending on other SET commands, this will be treated as an error or the numeric field will be read as a character field). The comma is typically treated as a field delimiter. If you have this kind of data, enter the commands

    set read dollar sign ignore on
    set read comma ignore on

To reset the defaults, enter

    set read dollar sign ignore off
    set read comma ignore off

Note that if you enter the SET READ COMMA IGNORE ON command, the comma will no longer be treated as the delimiter. Dataplot cannot currently handle the case where the comma is used both for monetary data and also as a field delimiter.
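With both switches on, the handling of a monetary field can be pictured with this Python sketch. It is not Dataplot code, and the function name is hypothetical.

```python
def parse_money(field):
    """Drop the dollar sign and the commas, then read the number."""
    return float(field.replace("$", "").replace(",", ""))
```

Because every comma is discarded, the sketch shares the limitation noted above: it cannot be combined with a comma used as the field delimiter.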

READING NUMERIC VALUES WITH TRAILING "+" OR "-"

On occasion, numeric fields may have a trailing "+" or a trailing "-". The "+" is typically used to indicate that the value is greater than or equal to the entered value. Likewise, the "-" is used to indicate that the value is less than or equal to the entered value. This may be used when data is truncated at a high or low value. If you have data that uses this convention, enter

    set read trailing plus minus ignore on

Dataplot does not have any convention for indicating that a number in fact means "greater than" or "less than", so it will read the numeric value and simply ignore the "+" or "-".

To reset the default, enter

    set read trailing plus minus ignore off
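Ignoring a trailing sign can be sketched in Python as below. This is an illustration of the behavior described in this section, not Dataplot's implementation, and the function name is hypothetical.

```python
def parse_trailing_sign(field):
    """Read a number, ignoring a single trailing '+' or '-'."""
    if field[-1:] in ("+", "-"):
        field = field[:-1]
    return float(field)
```

A leading sign is untouched, so `-5-` is still read as a negative number.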

COMMAS WITHIN CHARACTER FIELDS

If you are reading data that may contain character fields, you can specify whether you want commas in the character fields to be treated as part of the character field or as a delimiter.

To have the comma treated as a delimiter, enter

    set character field comma delimiter on

To have the comma not be interpreted as a delimiter (i.e., it will simply be another character in the character field), enter

    set character field comma delimiter off

The default is OFF.

READING BINARY DATA

Currently, the only types of binary data that Dataplot supports are:

  1. A few types of image files can be read on some platforms. This is discussed in the next section.

  2. Dataplot may be able to read some files created using Fortran unformatted data files. Dataplot is most likely to have success reading unformatted Fortran files that contain only numeric data and use a consistent record structure. Unformatted Fortran files that contain a mixture of character and numeric data will not be read successfully.

Support for other types of binary files may be added in future releases. However, this support will probably be for specific types of binary files as opposed to arbitrary binary files.

The advantage of using unformatted Fortran files is that file sizes may be significantly smaller and reading the data can be significantly faster. One potential use of unformatted Fortran files is to save a large data file that you will read many times in Dataplot.

The disadvantages of using unformatted Fortran files are that they are not human readable, they cannot be edited or modified using an ASCII editor, and, most importantly, they are not portable between operating systems and compilers. That is, unformatted Fortran files typically need to be read using the same operating system and compiler that was used to create them.

For details on using unformatted Fortran files, enter

    HELP SET READ FORMAT

READING IMAGE DATA

If Dataplot was built with support for the GD library, Dataplot can read image data in PNG, JPEG, or GIF format. If you have image data in another format, you may be able to use an image conversion program (e.g., NetPBM or ImageMagick) to convert it to one of the supported formats.

For further information, enter


NIST is an agency of the U.S. Commerce Department.

Date created: 07/07/2004
Last updated: 07/19/2017

Please email comments on this WWW page to alan.heckert@nist.gov.