Apart from going out of business, retiring old systems, or replacing inadequate systems with commercial off-the-shelf systems, there are three basic techniques for addressing the year 2000 computer problem: windowing, encoding or encapsulation, and expansion. Each technique, along with its variations, carries factors that should be weighed when choosing a conversion strategy for the year 2000 problem. This bulletin presents an overview of these methods and compares the pros and cons of each. While by no means an exhaustive comparison of solutions or characteristics, it shows that there are many considerations to weigh before a final decision is made.
Background
The year 2000 problem is pernicious and pervasive. It stems from the long-standing habit of programmers of dropping the first two digits of the year when manipulating date information. Typically, 1998 is manipulated as 98 with the 19 assumed. On January 1, 2000, this assumption will no longer hold. As an example, a computer program that computes the age of a person from his or her birth date would perform a calculation such as subtracting the person's birth year from the current year. Today, May 5, 1998, a person born in 1937 would be 61 years old, i.e., 98 - 37 = 61. If the same program tries to compute the age of the same person on January 1, 2000, the result would be different: 00 - 37 = -37. Some programs drop the minus sign, so the result would be 37, which is a valid age but is wrong. A person applying for pension benefits might be denied them based on the faulty age. This assumes that the program does not crash when trying to compute a negative age.
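The arithmetic in this example can be reproduced in a few lines. The following is a minimal sketch; the function names are hypothetical and stand in for whatever routine a legacy system actually uses.

```python
def age_two_digit(current_yy: int, birth_yy: int) -> int:
    """Age as a legacy routine might compute it, from 2-digit years only."""
    return current_yy - birth_yy


def age_four_digit(current_year: int, birth_year: int) -> int:
    """The same subtraction with fully specified 4-digit years."""
    return current_year - birth_year


print(age_two_digit(98, 37))        # 1998 case: 61
print(age_two_digit(0, 37))         # 2000 case: -37
print(abs(age_two_digit(0, 37)))    # minus sign dropped: 37, plausible but wrong
print(age_four_digit(2000, 1937))   # 63, the intended answer
```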
Fixing this problem may seem to be a simple matter of just adding the two digits needed to differentiate 1900 from 2000. The fix is simple in many cases, but problems like this are everywhere in programs that compute finance charges, keep track of back orders for products, regulate the amount of water flow in a cooling system, compute billing transactions for telephone calls, or hundreds of thousands of other applications where dates are pertinent, and some where dates appear to have nothing to do with the computations.
Statistical analysis of systems has shown that date information has an impact on 3 to 6 percent of the source code. That is, for every 100 lines of source code, 3 to 6 of those lines have something to do with date information, such as date calculations, date comparisons, or sorting of information involving dates. At first glance, this does not seem to be significant, but every line of the source code must be examined in order to determine if it is involved in any date manipulations. With estimates from $0.50 to $1.50 per line of source code for repairing "run-of-the-mill" data processing systems, or up to $8.00 per line for embedded systems in military equipment, the cost for repairing a 100,000-line program could be anywhere from $50,000 to $800,000. A small manufacturing organization with sales of $50,000,000 per year and 50 employees may have as many as 5,000,000 lines of code in their accounting, inventory, payroll, and general ledger systems. Repairing such an environment would cost in the neighborhood of $2,500,000 to $7,500,000. A large corporation may have 100,000,000 lines of code in its systems and may have to spend close to $100 million to repair them, so any savings that can be attributed to the solution technique become extremely important in the profitability of the organization.
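The cost figures above follow from multiplying the line count by the per-line repair rate. The sketch below reworks that arithmetic; the function names are illustrative, and rates are expressed in cents so that the results stay in whole dollars.

```python
def cost_range(lines: int, low_cents: int, high_cents: int) -> tuple[int, int]:
    """Return the (low, high) repair cost in dollars for a program of `lines` lines."""
    return lines * low_cents // 100, lines * high_cents // 100


def date_lines_range(lines: int) -> tuple[int, int]:
    """Return the (low, high) count of lines touched by dates at 3 to 6 percent."""
    return lines * 3 // 100, lines * 6 // 100


# 100,000 lines at $0.50 (data processing) up to $8.00 (embedded military) per line
print(cost_range(100_000, 50, 800))      # (50000, 800000)
# 5,000,000 lines at $0.50 to $1.50 per line
print(cost_range(5_000_000, 50, 150))    # (2500000, 7500000)
# Lines expected to involve dates in a 100,000-line program
print(date_lines_range(100_000))         # (3000, 6000)
```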
Windowing
The windowing solution entails defining a 100-year window in which a special algorithm determines whether a 2-digit year is in the 1900s or the 2000s. The selected window, either static or moving, depends on the selection of a base year, which, in the case of a moving window, may be incremented every year that it is in use. For example, a 100-year window starting in base year 1950 would be good until 2049. The century-determining algorithm would ascertain that all years 50 through 99 are in the 1900s and all years 00 through 49 are in the 2000s. The moving window would move one year for each year that it was in use, such that the next year the window would be 1951 through 2050, and so on.
The windowing technique is probably the least expensive to implement. It is also the most volatile and requires the most administering. The data files and database, hardware, and networking components would require minimal changes if any, leaving mainly the software component to be repaired.
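A century-determining routine for the fixed and moving windows described above might look like the following sketch. The function names and the `first_year_in_use` parameter are assumptions made for illustration; the base year 1950 is the one used in the example above.

```python
def expand_year_fixed(yy: int, base_year: int = 1950) -> int:
    """Map a 2-digit year into the 100-year window that starts at base_year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    year = base_year - base_year % 100 + yy   # try the base year's century first
    if year < base_year:                      # falls before the window: next century
        year += 100
    return year


def expand_year_moving(yy: int, current_year: int,
                       first_year_in_use: int = 1998,
                       initial_base: int = 1950) -> int:
    """Moving window: the base advances one year for each year in use."""
    base = initial_base + (current_year - first_year_in_use)
    return expand_year_fixed(yy, base)


print(expand_year_fixed(50), expand_year_fixed(99))   # 1950 1999
print(expand_year_fixed(0), expand_year_fixed(49))    # 2000 2049
print(expand_year_moving(50, current_year=1999))      # 2050 (window 1951-2050)
```

Note that in the moving case the same stored value, 50, changes meaning from one year to the next, which is one reason the technique requires ongoing administration.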
Encoding or Encapsulation
The encapsulation solution gets its name from enclosing the data in routines that separate the physical content of the data from the software that manipulates it. This is done through an encoding mechanism and software designed specifically to input and output date information to and from the system, and to store and retrieve this information from a database. Modifications to the software and the data files/database contents are necessary in this technique. Hardware and networking components require modifications similar to those required in windowing.
Encoding can take many forms. The form chosen has an effect on the amount of modification required. Though there are many others, three representative coding schemes are described in this bulletin: offset encoding, replacement encoding, and Modified Julian Day Number (MJD).
Offset Encoding
Offset encoding adjusts the system to use dates that are 28 years earlier by modifying the date with a year offset. This is accomplished by modifying input and output routines and database store and retrieval routines to subtract 28 years (the offset) from the actual date entered in the system or stored in the database. The effect is that the year 2000 problem is postponed by 28 years, giving the organization that much more time to solve it by replacing systems, rewriting software, or applying some other technique. The offset of 28 years was chosen because the calendar repeats itself on the same day-of-the-week, including leap year, every 28 years.
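The 28-year repetition can be checked with the standard `datetime` module. This is a small sketch with a hypothetical helper name. The repetition relies on every fourth year being a leap year, so it does not hold across a century year that is not a leap year, such as 1900; the last line shows one such case.

```python
from datetime import date


def same_weekday_after_offset(d: date, offset_years: int = 28) -> bool:
    """True if the date `offset_years` earlier falls on the same day of the week."""
    earlier = d.replace(year=d.year - offset_years)
    return earlier.weekday() == d.weekday()


print(same_weekday_after_offset(date(1998, 5, 5)))    # True  (May 5, 1970)
print(same_weekday_after_offset(date(2000, 2, 29)))   # True  (Feb 29, 1972)
print(same_weekday_after_offset(date(1920, 3, 1)))    # False (span crosses 1900)
```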
Input and output routines are modified to accept 4-digit or 2-digit years with some combination of windowing or built-in intelligence to determine which century is meant. These routines then subtract 28 from the year and pass this date on to other routines. Computation routines receive dates in the form already offset and ready to be manipulated. For example, today's date is May 5, 1998, which would be May 5, 1970, with the offset. A routine that computes age based on some prior date, such as May 4, 1950, and today's date would perform the computation internally as 70 - 22 = 48. The offset would also be applied to the 1950 date before it was handed off to the age computation routine, in this case 50 - 28 = 22.
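The input and output steps described above can be sketched as a pair of routines around an unchanged 2-digit computation. The names are hypothetical, and the sketch assumes 4-digit years arrive at the input routine (the century having already been resolved) and that every internal 2-digit year is read as 19yy.

```python
OFFSET_YEARS = 28  # the offset used in the bulletin


def to_internal_yy(actual_year: int) -> int:
    """Input routine: subtract the offset and keep the 2-digit internal year."""
    offset_year = actual_year - OFFSET_YEARS
    if not 1900 <= offset_year <= 1999:
        raise ValueError("offset year must fall in 1900-1999")
    return offset_year % 100


def to_actual_year(internal_yy: int) -> int:
    """Output routine: remove the offset to recover the true year."""
    return 1900 + internal_yy + OFFSET_YEARS


def age(current_yy: int, birth_yy: int) -> int:
    """Unmodified legacy computation on 2-digit (offset) years."""
    return current_yy - birth_yy


today, born = to_internal_yy(1998), to_internal_yy(1950)
print(today, born, age(today, born))       # 70 22 48
print(to_actual_year(today))               # 1998
print(age(to_internal_yy(2005), born))     # 55, correct past the year 2000
```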
For data files and databases, the offset date would be stored and retrieved, necessitating no changes in the corresponding access functions. Without a modification to the database to add a flag or indicator that an offset has been applied, the software using these dates must be aware that an offset is being used. An across-the-board policy of applying an offset to all dates is the preferred technique. If any date is stored without the offset, maintenance of the system will become almost impossible.
Routines that prepare date information for output to a display, printed matter, or another system must be modified to remove the offset and provide the true date. Care must be taken that every occurrence of these output dates is found. If one is missed, there is not only the chance that the displayed date will be interpreted incorrectly, it may be used as manual input to another process and thereby corrupt the data.
Sorting dates using offset encoding does not change as long as the offset dates are used for sort input. If the year, however, is 1900 or earlier, the sort routine will have to be changed to take the very early dates into consideration. Archived data must be handled in one of two ways: the archived data may be converted to replace dates with fully specified dates, or any programs that read the archived data must perform the offset as the data is retrieved. Offset encoding cannot be used if the offset date results in a value greater than 1999, i.e., if dates of 2028 or later can be expected.
Third-party software may not be able to use offset encoded dates directly. The data may have to be converted before being exported to other systems.
Replacement Encoding
Replacement encoding replaces the data in the data files and databases with new values. Instead of subtracting an offset, the year itself is replaced with a code to denote which century applies. Several schemes have been proposed, each of which is appropriate in specific cases. For Binary Coded Decimal (BCD) or character string date fields where the actual date is stored in yymmdd or related format, the first digit of the year is replaced with a code. For example, 98 may be replaced with A8, 99 with A9, 00 with B0, and so on, where A represents years in the 1900s and B represents those in the 2000s. Date fields that are formatted as packed decimal or zoned decimal store their digits in the four low order bits of an 8-bit byte leaving the high order 4 bits unused. These high order bits are replaced with a flag that denotes which century is applied.
In both cases, the routines that store and retrieve elements of the database and data files must be modified to extract this coded flag or coded character from the date before it reaches any of the date manipulation routines. From then on, the process of modifying the system is similar to that used in offset encoding.
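The flag-in-the-high-bits variant can be sketched as follows for a 2-digit year held one digit per byte. The flag values 0xA and 0xB and the function names are assumptions chosen for illustration; a real system would pick values that suit its own storage format.

```python
# Hypothetical century flags stored in the high-order 4 bits of each year byte.
CENTURY_FLAGS = {19: 0xA, 20: 0xB}
FLAG_TO_CENTURY = {flag: century for century, flag in CENTURY_FLAGS.items()}


def encode_year(year: int) -> bytes:
    """Store routine: digits go in the low nibbles, the century flag in the high."""
    century, yy = divmod(year, 100)
    flag = CENTURY_FLAGS[century]          # KeyError for unsupported centuries
    tens, ones = divmod(yy, 10)
    return bytes([(flag << 4) | tens, (flag << 4) | ones])


def decode_year(field: bytes) -> int:
    """Retrieve routine: strip the flag before the year reaches date routines."""
    century = FLAG_TO_CENTURY[field[0] >> 4]
    return century * 100 + (field[0] & 0x0F) * 10 + (field[1] & 0x0F)


print(encode_year(1998).hex())             # a9a8
print(encode_year(2000).hex())             # b0b0
print(decode_year(encode_year(2000)))      # 2000
```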
This technique is applicable to all dates in general, including 1900 and before or 2000 and after. Third-party software may not be able to use replacement-encoded dates directly. The data may have to be converted before being exported to other systems. In addition, the database field containing this date information cannot be a DATE datatype since it is no longer a valid date from the point of view of the database. As a result, the built-in date manipulation functions of modern database management systems can no longer be used.
Modified Julian Day Number
Modified Julian Day Number (MJD) is similar to replacement encoding, but in this case the entire date is replaced with a new value based on the Julian Day Number (JDN). JDN is a representation of the number of days that have passed since some epoch or base date, in this case 4713 BC. Today's date, May 5, 1998, would be 2450938. Tomorrow would be 2450939. Since these are 7-digit numbers, they will not fit within the 6-digit spaces occupied by current yymmdd formatted dates. To solve this problem, the leftmost digit, in this case the 2, is dropped since it will not change for another 1,500 years. Now, May 5, 1998, is represented as 450938 which will fit within the original 6-digit field. This is the Modified Julian Day Number (MJD). Determining century is not a problem since there are standard routines that convert MJD to any calendar system based solely on the count of days since the beginning of the cycle.
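The conversion can be sketched by counting days from a known pair. The sketch below anchors on the figure given above (May 5, 1998 as 2450938) rather than deriving the count from the 4713 BC epoch; the function names are hypothetical. Astronomical tables that count days from noon may assign a value one greater to the same calendar date, so the anchor should be adjusted to whatever convention a system adopts.

```python
from datetime import date, timedelta

ANCHOR_DATE = date(1998, 5, 5)
ANCHOR_JDN = 2450938          # day number the bulletin gives for the anchor date
LEADING_DIGIT = 2_000_000     # the dropped leftmost digit


def to_mjd(d: date) -> int:
    """Return the 6-digit value: the day number with the leading 2 dropped."""
    jdn = ANCHOR_JDN + (d - ANCHOR_DATE).days
    if not LEADING_DIGIT <= jdn < LEADING_DIGIT + 1_000_000:
        raise ValueError("date does not fit in six digits")
    return jdn - LEADING_DIGIT


def from_mjd(mjd: int) -> date:
    """Convert a 6-digit value back to a Gregorian calendar date."""
    return ANCHOR_DATE + timedelta(days=mjd + LEADING_DIGIT - ANCHOR_JDN)


print(to_mjd(date(1998, 5, 5)))   # 450938
print(to_mjd(date(1998, 5, 6)))   # 450939
print(from_mjd(450938))           # 1998-05-05
```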
In general, the same Gregorian date formats can be entered and displayed (i.e., 2-digit years), except where century may be ambiguous. This means that input and output routines do not have to be modified except in those instances where users may not be able to determine which century is meant, or the context of the application requires 4-digit years. Windowing may be a useful technique to overcome this problem area.
Date computations, such as computing the age of a person, are simplified. Computing age entails subtracting one MJD from another and taking the absolute value. If the actual Gregorian date is required, it can be converted from the MJD as needed. Internally, the date computation routines would undergo modifications similar to those required for other encoding solutions, i.e., to allow 4-digit year manipulation. Alternatively, all computation routines can be converted to use MJD directly. In most cases, this would simplify the computations and reduce system load. All storing and retrieving would be performed by routines modified specifically to handle MJD. These routines would be called only when needed to retrieve or store information in the database or data files. Export of the data can be performed in either Gregorian format or MJD, depending on whether other users of the interchange data files use Gregorian or MJD. Sort routines do not have to be modified. MJD is numerical and can be sorted according to the collating sequence of digits or binary number representation.
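The difference and sorting behavior can be illustrated with three sample records. The MJD values below follow the bulletin's convention in which May 5, 1998 is 450938; the record layout and names are hypothetical.

```python
# (label, yymmdd string, MJD) for three sample dates.
records = [
    ("2000-01-01", "000101", 451544),
    ("1998-05-05", "980505", 450938),
    ("1950-05-04", "500504", 433405),
]


def days_between(mjd_a: int, mjd_b: int) -> int:
    """Elapsed days: subtract one MJD from the other and take the absolute value."""
    return abs(mjd_a - mjd_b)


# Sorting on the 2-digit-year string places the year 2000 first, which is wrong.
by_yymmdd = [label for label, yymmdd, _ in sorted(records, key=lambda r: r[1])]
# Sorting on the MJD integer gives true chronological order.
by_mjd = [label for label, _, mjd in sorted(records, key=lambda r: r[2])]

print(by_yymmdd)                      # ['2000-01-01', '1950-05-04', '1998-05-05']
print(by_mjd)                         # ['1950-05-04', '1998-05-05', '2000-01-01']
print(days_between(450938, 433405))   # 17533 days from May 4, 1950 to May 5, 1998
```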
A drawback of MJD is that everyone using the database must convert their systems at the same time. The chance of corrupting export data is significant if not all system managers have planned for the modifications to go into effect during a single changeover process, i.e., no organization can change until all are ready to change. Third-party software may not be able to use MJD encoded dates directly. The data may have to be converted before being exported to other systems. In addition, the database field containing this date information cannot be a DATE datatype since it is no longer a valid date from the point of view of the database. As a result, the built-in date manipulation functions of modern database management systems can no longer be used.
Expansion
The most straightforward but still extremely difficult conversion with respect to the year 2000 problem is the expansion of all date fields to use 4-digit years. This involves modifying the software and the structure of data files and databases. For systems that use flat file systems or custom-built database management systems, this could cause serious problems in the overall conversion process. Modifying such database structures has the disadvantage of propagating changes to almost every program that touches the modified structures, whether they have anything to do with date manipulations or not.
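The effect on a flat file can be illustrated with a hypothetical fixed-width record of a 10-character name, a 6-character yymmdd date, and a 6-character amount. The layout, the function name, and the pivot used to pick the century for existing data are all assumptions made for this sketch.

```python
def expand_record(old: str, pivot_yy: int = 50) -> str:
    """Rewrite a 22-character record with an 8-character yyyymmdd date field."""
    name, yymmdd, amount = old[0:10], old[10:16], old[16:22]
    # Assumed rule for existing data: yy at or above the pivot is 19yy, else 20yy.
    century = "19" if int(yymmdd[:2]) >= pivot_yy else "20"
    return name + century + yymmdd + amount


old = "SMITH".ljust(10) + "980505" + "001250"
new = expand_record(old)

print(len(old), len(new))   # 22 24
print(new[10:18])           # 19980505
print(new[18:24])           # 001250, the amount at its new position
print(new[16:22])           # 050012, what a program still using old offsets reads
```

The last line shows why a program that never touches the date, but reads the amount by position, must still be changed once the record is expanded.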
Not only do input/output, computation, sort, and database store and retrieve routines have to be modified, but virtually all common code, such as COBOL copy books and C include files, must be modified. The same conversion requirements that befall networking and hardware must be implemented as well. Expansion is comparable to changing operating systems or database management systems in the amount of disruption that can be caused during conversion. Significant planning and coordination are necessary to undertake this type of project. For this reason alone, many organizations opt to use windowing or encoding solutions rather than 4-digit year expansion.
If a commercial database management system (DBMS) is used, modification of the date formats and structure can be made invisible to the software. Since commercially available DBMSs provide functions for converting dates into 15 or 20 different forms, and for performing date computations, many of the structure changes and computation routines can be removed from the software that uses the DBMS.
The most important aspect of the date expansion technique is that an organization will garner a system that is more robust and better able to handle date processing for the long term.
Conclusion
There are many other types of solutions used in the year 2000 conversion problem. These include retirement of systems that will no longer be used, replacement of systems with new custom-built systems, reengineering of existing systems, replacement with purchased commercial off-the-shelf systems, going out of business altogether, or being purchased by another organization that has the wherewithal to provide resources for conversion. An organization contemplating the strategy to use in addressing its year 2000 problem must determine which solution is appropriate based primarily on the factors that are important to its well-being. The number and size of systems operating within an organization, the number of information technology professionals available to work on the conversion effort, the management commitment to the year 2000 conversion effort, the resources available to use in the conversion effort, and many other factors must be taken into consideration in the decision process. This bulletin describes only some of the technical issues of several specific and widely used solutions. The following table summarizes the characteristics of each solution discussed in this bulletin.
| CHARACTERISTIC | WINDOWING (FIXED/MOVING) | ENCODING/ENCAPSULATION (OFFSET/REPLACEMENT/MJD) | EXPANSION |
|---|---|---|---|
| Need to modify internal date formats | Yes/Yes | Yes/Yes/Yes | Yes |
| Need for modified I/O or computation routines | Yes/Yes | Yes/Yes/Yes | Yes |
| Can handle dates >99 years apart | No/No | No/Yes/Yes | Yes |
| Need to modify database values | No/No | Yes/Yes/Yes | Yes |
| Need to modify database routines | No/No | No/No/No | No |
| Need to modify sort routines | Yes, if >99 years/Yes, if >99 years | No/Depends on encoding used/No | No |
An individual organization must determine what weight each of these characteristics has in relation to the resources and time available. An organization that has the funds but not the personnel to perform the conversion may opt to hire a contractor to perform the expansion conversion. Another organization with the personnel and expertise but insufficient funds may decide to implement the fixed windowing solution. The crux of the decision is which characteristics are most important to the organization.
Summary
There are numerous technical methods to ameliorate the year 2000 conversion problem, each with its own set of characteristics that make it more or less applicable to an organization's system environment. The three major techniques described and compared in this bulletin, windowing, encoding or encapsulation, and expansion, are all time-consuming and expensive. The selection of one solution over another will have grave and lasting consequences on the conversion project in terms of the amount of code that must be modified, whether or not data values and data structures must be modified, and the resultant stability and maintainability of the converted systems. The importance of each characteristic is a function of the weight attached to it by an organization through the resources and capabilities that are available to support the conversion effort. A decision must be made in order to begin the process. The deadline for finishing conversion is approaching and will not be moved.