1
National Software Reference Library
  • Douglas White
  • Information Technology Laboratory
  • July 2004
2
Introduction
  • The National Software Reference Library is:
  • A physical collection of over 5,000 software packages on secured shelves
  • A database of file “fingerprints” (or “hashes”) and additional information to uniquely identify each file on the shelves
  • A Reference Data Set (RDS) extracted from the database onto CD, used by law enforcement, investigators, researchers, others
3
Use of the NSRL
  • Eliminate as many known files as possible from the examination process using automated means
  • Discover files with an expected name but unknown contents
  • Identify origins of files
  • Look for malicious files, e.g., hacker tools
  • Identify duplicate files
  • Provide rigorously verified data for forensic investigations
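
The known-file elimination described on this slide can be sketched as a hash lookup. This is a minimal illustration, not the NSRL's own tooling: the function names `sha1_of_file` and `split_known_unknown` are hypothetical, the set `known_hashes` is assumed to have been loaded from a reference set beforehand, and upper-case hex digests are an assumption made for the example.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Upper-case hex is an assumption for this sketch.
    return digest.hexdigest().upper()


def split_known_unknown(root, known_hashes):
    """Partition files under root into (known, unknown) by SHA-1 lookup."""
    known, unknown = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            bucket = known if sha1_of_file(path) in known_hashes else unknown
            bucket.append(path)
    return known, unknown
```

Only the files in the `unknown` list would then need manual examination; a name that looks familiar but whose hash is absent from the set lands there as well.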
4
How Did the NSRL Start?
  • Law Enforcement needed software hashes that could be used in investigations and in court.
  • Source must be unbiased - NIST is a neutral organization
  • Data produced must be of the highest quality
  • Data must be traceable and repeatable
  • There must be a repository of original software
  • NIST provides an open rigorous process
5
NSRL Software Collection
  • Balance of most popular (encountered often) and most desired (pirated often)
    • Currently 32 languages, used internationally
  • Software is purchased commercially
  • Software is donated under non-use policy
  • List of contents available on website
  • www.nsrl.nist.gov
6
NSRL Software Database
  • Information to uniquely identify every file on every piece of media in every application
  • Database schema is available on website
  • 4,200 Bytes per application
  • 750 Bytes per file
  • Total database size now 20 GB for 5,000 applications with 31,900,000 files
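
As a rough back-of-envelope check of the figures on this slide, the per-record sizes can be multiplied out. The sketch assumes the per-application and per-file byte counts are averages and uses decimal units (1 GB = 10^9 bytes); the variable names are illustrative only.

```python
BYTES_PER_APPLICATION = 4_200
BYTES_PER_FILE = 750
APPLICATIONS = 5_000
FILES = 31_900_000

application_bytes = APPLICATIONS * BYTES_PER_APPLICATION
file_bytes = FILES * BYTES_PER_FILE
total_gb = (application_bytes + file_bytes) / 1e9

print(f"applications: {application_bytes / 1e6:.0f} MB")
print(f"files:        {file_bytes / 1e9:.1f} GB")
print(f"total:        {total_gb:.1f} GB")
```

The per-file records dominate: the estimate comes to roughly 24 GB, the same order of magnitude as the 20 GB total quoted above.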
7
NSRL Reference Data Set
  • The Reference Data Set (RDS) is a selection of information from the NSRL database
  • Allows positive identification of manufacturer, product, operating system, version, and file name from a file “signature”
  • Data format available for forensic tool developers
  • Published quarterly, free redistribution
  • Critical data can be published outside the regular schedule; in February 2004 NSRL supplied 500,000 Arabic file signatures to FBI & DoD
8
RDS Field Use Concept
9
RDS Field Use Example
10
Hashes
  • Like a person’s fingerprint
  • Uniquely identifies the file based on contents
  • You can’t create the file from the hash
  • Primary hash value used is Secure Hash Algorithm (SHA-1) specified in FIPS 180-1, a 160-bit hashing algorithm
    • 2^160 (about 1.5 × 10^48) possible 160-bit values
  • “Computationally infeasible” to find two different files less than 2^64 bits in size producing the same SHA-1
    • 2^64 bits is about 2.3 million terabytes
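
The properties listed on this slide can be illustrated with Python's standard `hashlib` module. This is a small sketch, not NSRL code; the input `b"abc"` is simply a short sample chosen for the example.

```python
import hashlib

data = b"abc"
sha1_hex = hashlib.sha1(data).hexdigest()
md5_hex = hashlib.md5(data).hexdigest()

# Changing a single byte of the input gives a different digest.
changed_hex = hashlib.sha1(b"abd").hexdigest()

# SHA-1 always yields 160 bits, whatever the input length.
sha1_bits = hashlib.sha1().digest_size * 8
possible_values = 2 ** sha1_bits

# The 2^64-bit input limit, expressed in bytes.
max_message_bytes = 2 ** 64 // 8

print("SHA-1:", sha1_hex)
print("MD5:  ", md5_hex)
print("digest bits:", sha1_bits)
print(f"possible values: {possible_values:.3e}")
print(f"input limit: {max_message_bytes / 1e12:,.0f} TB")
```

The digest is a fixed-length summary of the contents, which is why the original file cannot be rebuilt from it.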
11
Hash Examples
12
NSRL & National Archives and Records Administration
  • Use hashing process on non-classified Presidential materials
  • Identify application files
  • Identify duplicate files
  • Access to older installed software
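
Duplicate identification by hash can be sketched as grouping items that share a digest. The function name `find_duplicates` and the in-memory mapping of names to contents are assumptions made to keep the example self-contained; a real collection would hash files on disk.

```python
import hashlib
from collections import defaultdict


def find_duplicates(named_contents):
    """Map each SHA-1 digest shared by two or more items to their names."""
    by_digest = defaultdict(list)
    for name, content in named_contents.items():
        by_digest[hashlib.sha1(content).hexdigest()].append(name)
    return {
        digest: sorted(names)
        for digest, names in by_digest.items()
        if len(names) > 1
    }


sample = {
    "memo_v1.doc": b"draft text",
    "copy_of_memo.doc": b"draft text",
    "report.doc": b"final text",
}
duplicates = find_duplicates(sample)
print(duplicates)
```

Files are matched on contents alone, so copies stored under different names are still grouped together.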
13
NSRL & Voting Systems Needs
  • Determine that software used during elections is the expected software
    • Tested, certified version is definitively identifiable
    • Same during distribution, installation, setup, or use
    • “Chain of custody”
  • Transparency
    • The NSRL methodology is in the public domain, available for inspection
    • Jurisdictions can share knowledge with each other
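
One way to picture the check described on this slide is a comparison of observed files against the hashes recorded for the tested, certified version. This is a hypothetical sketch: `verify_against_manifest`, the manifest layout (file name mapped to SHA-1 hex digest), and the sample file names are all assumptions, not an NSRL or EAC procedure.

```python
import hashlib


def verify_against_manifest(certified, observed_files):
    """Return (name, problem) pairs; an empty list means everything matches."""
    problems = []
    for name, expected in sorted(certified.items()):
        if name not in observed_files:
            problems.append((name, "missing"))
        elif hashlib.sha1(observed_files[name]).hexdigest() != expected:
            problems.append((name, "modified"))
    for name in sorted(set(observed_files) - set(certified)):
        problems.append((name, "unexpected"))
    return problems


# Hypothetical certified manifest built from known-good contents.
certified = {"tally.exe": hashlib.sha1(b"certified build").hexdigest()}

print(verify_against_manifest(certified, {"tally.exe": b"certified build"}))
print(verify_against_manifest(certified, {"tally.exe": b"patched build"}))
```

Running the same comparison at distribution, installation, setup, and use would give one record per stage, in the spirit of a chain of custody.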
14
EAC & NSRL
  • Can verify that operating system file contents have not been modified
  • Can verify that application file contents have not been modified
  • Can verify that known static sections of files have not been modified
  • On an 866 MHz processor, SHA-1 of a 50 MB file takes ~5 sec.; MD5 of the same file takes ~4 sec.
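
Verifying a known static section of a file, rather than the whole file, can be sketched by hashing only a byte range. The function name `sha1_of_range` and the idea of describing a section by offset and length are assumptions for illustration; how static sections are actually located is not specified here.

```python
import hashlib


def sha1_of_range(path, offset, length, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of `length` bytes starting at `offset`."""
    digest = hashlib.sha1()
    remaining = length
    with open(path, "rb") as handle:
        handle.seek(offset)
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("file ends before the requested range")
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

If bytes outside the chosen range change (for example, a configuration area written during setup), the digest of the static section stays the same, while any change inside the range alters it.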
15
Voting Research Issues
  • Working with software companies to get access to software
  • Distribution vs. installation hashes
  • If there is any setup after the hashes are made, how do you know what changes are valid?
  • Possible/practical to have on-location, time-of-certification hashing?
  • Verification within time/space/security constraints
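
The distribution-versus-installation question can be made concrete by comparing two hash manifests. This is a sketch under stated assumptions: `diff_manifests` is a hypothetical helper, each manifest is assumed to be a mapping of file name to hash, and the short hash strings below are placeholders rather than real digests.

```python
def diff_manifests(distribution, installation):
    """Compare two {file name: hash} mappings and report the differences."""
    dist_names, inst_names = set(distribution), set(installation)
    return {
        "removed": sorted(dist_names - inst_names),
        "added": sorted(inst_names - dist_names),
        "changed": sorted(
            name
            for name in dist_names & inst_names
            if distribution[name] != installation[name]
        ),
    }


# Placeholder hashes for illustration only.
distribution = {"app.exe": "aa11", "setup.cab": "bb22", "config.ini": "cc33"}
installation = {"app.exe": "aa11", "config.ini": "dd44", "app.log": "ee55"}

report = diff_manifests(distribution, installation)
print(report)
```

The report shows what differs between the two stages, but not which differences are legitimate; deciding that is the open research issue raised on this slide.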
16
Discussion
  • Questions about the NSRL
  • Discussion of the NSRL and Voting Systems


17
Contact
  • Douglas White
  • Software Diagnostics and Conformance Testing
  • Information Technology Laboratory
  • Telephone: 301-975-4761
  • Email:  nsrl@nist.gov
  • Web:    www.nsrl.nist.gov