|
1
|
- Douglas White
- Information Technology Laboratory
- July 2004
|
|
2
|
- The National Software Reference Library is:
- A physical collection of over 5,000 software packages on secured shelves
- A database of file “fingerprints” (or “hashes”) and additional
information to uniquely identify each file on the shelves
- A Reference Data Set (RDS) extracted from the database onto CD, used by
law enforcement, investigators, researchers, and others
|
|
3
|
- Eliminate as many known files as possible from the examination process
using automated means
- Discover files with an expected name but unknown contents
- Identify origins of files
- Look for malicious files, e.g., hacker tools
- Identify duplicate files
- Provide rigorously verified data for forensic investigations
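Eliminating known files by automated means can be sketched as a hash-set lookup. The following is a minimal Python illustration, not the NSRL's own tooling: the function names `sha1_of_file` and `split_known_unknown` are hypothetical, and the set of known hashes is assumed to have been loaded already from a reference data set.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the upper-case SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def split_known_unknown(root, known_hashes):
    """Partition the files under root into (known, unknown) lists.

    known_hashes is an iterable of SHA-1 hex strings (hypothetical input,
    e.g. values taken from a reference data set).
    """
    known_set = {value.upper() for value in known_hashes}
    known, unknown = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            bucket = known if sha1_of_file(path) in known_set else unknown
            bucket.append(path)
    return known, unknown
```

Only the files in the `unknown` list would then need manual examination.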
|
|
4
|
- Law Enforcement needed software hashes that could be used in
investigations and in court.
- Source must be unbiased - NIST is a neutral organization
- Data produced must be of the highest quality
- Data must be traceable and repeatable
- There must be a repository of original software
- NIST provides an open, rigorous process
|
|
5
|
- Balance of most popular (encountered often) and most desired (pirated
often)
- Currently 32 languages, used internationally
- Software is purchased commercially
- Software is donated under non-use policy
- List of contents available on website
- www.nsrl.nist.gov
|
|
6
|
- Information to uniquely identify every file on every piece of media in
every application
- Database schema is available on website
- 4,200 bytes per application
- 750 bytes per file
- Total database size now 20 GB for 5,000 applications with 31,900,000
files
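As a rough cross-check, the per-record figures can be multiplied out. This sketch assumes the 4,200 and 750 byte figures are averages and that 1 GB means 10^9 bytes; the function name is hypothetical.

```python
BYTES_PER_APPLICATION = 4_200  # per-application figure from the slide
BYTES_PER_FILE = 750           # per-file figure from the slide


def estimated_database_bytes(applications, files):
    """Estimate database size from record counts (assumes average sizes)."""
    return applications * BYTES_PER_APPLICATION + files * BYTES_PER_FILE


total = estimated_database_bytes(applications=5_000, files=31_900_000)
print(f"{total / 1e9:.1f} GB")  # 23.9 GB
```

The estimate is of the same order as the quoted 20 GB; the file records account for nearly all of it.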
|
|
7
|
- The Reference Data Set (RDS) is a selection of information from the NSRL
database
- Allows positive identification of manufacturer, product, operating
system, version, file name from file “signature”
- Data format available for forensic tool developers
- Published quarterly, free redistribution
- Critical data can be published outside the regular schedule; in February
2004 the NSRL supplied 500,000 Arabic file signatures to the FBI and DoD
|
|
10
|
- Like a person’s fingerprint
- Uniquely identifies the file based on contents
- You can’t create the file from the hash
- Primary hash value used is Secure Hash Algorithm (SHA-1) specified in
FIPS 180-1, a 160-bit hashing algorithm
- 2^160 (about 1.5 × 10^48) possible 160-bit values
- “Computationally infeasible” to find two different files less than 2^64
bits in size producing the same SHA-1
- 2^64 bits is about 2.3 million terabytes
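The fingerprint properties can be tried out with Python's standard `hashlib` module. This is a small sketch only; the helper name `sha1_hex` is hypothetical, and the input `b"abc"` is the well-known test message from the SHA-1 standard.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of a byte string as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


original = b"abc"
altered = b"abd"  # differs from the original in a single byte

print(sha1_hex(original))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1_hex(original) == sha1_hex(altered))  # False

# The digest is always 160 bits, whatever the input size.
print(hashlib.sha1().digest_size * 8)  # 160

# Size of the 160-bit value space, and 2^64 bits expressed in terabytes
# (taking 1 TB as 10^12 bytes).
print(f"{2 ** 160:.2e}")  # 1.46e+48
print(f"{2 ** 64 // 8 / 1e12:,.0f} TB")  # 2,305,843 TB
```

The same content always gives the same digest, while a one-byte change gives an unrelated one, which is what makes the hash usable as a fingerprint.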
|
|
12
|
- Use hashing process on non-classified Presidential materials
- Identify application files
- Identify duplicate files
- Access to older installed software
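Identifying duplicate files amounts to grouping files by hash. The sketch below shows one possible way to do this in Python; `find_duplicates` is a hypothetical helper, and it reads each file fully into memory, so very large files would call for chunked reading instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return lists of paths under root whose contents share a SHA-1."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; acceptable for a sketch.
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only the hashes that occur more than once.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Files are grouped by content alone, so copies stored under different names end up in the same group.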
|
|
13
|
- Determine that software used during elections is the expected software
- Tested, certified version is definitively identifiable
- Same during distribution, installation, setup, or use
- “Chain of custody”
- Transparency
- The NSRL methodology is in the public domain, available for inspection
- Jurisdictions can share knowledge with each other
|
|
14
|
- Can verify that operating system file contents have not been modified
- Can verify that application file contents have not been modified
- Can verify that known static sections of files have not been modified
- On an 866 MHz processor, SHA-1 of 50 MB takes ~5 sec.; MD5 of 50 MB takes ~4 sec.
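Verification of this kind can be sketched as a comparison against a manifest of expected hashes. The Python below is an illustration under stated assumptions, not a certified procedure: the manifest is assumed to be a plain mapping of relative path to SHA-1 hex string, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def verify_against_manifest(root, manifest):
    """Compare files under root with expected SHA-1 values.

    manifest maps relative path -> expected SHA-1 hex (assumed format).
    Returns a dict of relative path -> "missing" or "modified".
    """
    problems = {}
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
            continue
        actual = hashlib.sha1(target.read_bytes()).hexdigest()
        if actual.lower() != expected.lower():
            problems[relative_path] = "modified"
    return problems


def sha1_of_section(path, offset, length):
    """Hash only a known static byte range of a file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return hashlib.sha1(handle.read(length)).hexdigest()
```

An empty result from `verify_against_manifest` means every listed file is present and matches its expected hash; `sha1_of_section` covers the case where only part of a file is expected to stay constant.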
|
|
15
|
- Working with software companies to get access to software
- Distribution vs. installation hashes
- If there is any setup after the hashes are made, how do you know what
changes are valid?
- Possible/practical to have on-location, time-of-certification hashing?
- Verification within time/space/security constraints
|
|
16
|
- Questions about the NSRL
- Discussion of the NSRL and Voting Systems
|
|
17
|
- Douglas White
- Software Diagnostics and Conformance Testing
- Information Technology Laboratory
- Telephone: 301-975-4761
- Email: nsrl@nist.gov
- Web: www.nsrl.nist.gov
|