H-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives
H-Net Digital Preservation Policies and Procedures
H-Net E-Mail List Conformance to OAIS: Information Packages
H-Net messages and their accompanying metadata are present in the system as the three standard Information Package (IP) variants from the Open Archival Information System (OAIS) model: Submission Information Packages (SIPs), Archival Information Packages (AIPs), and Dissemination Information Packages (DIPs). Seven-day concatenations of messages gathered into "notebook" files, along with accompanying preservation metadata, are AIP specializations known as Archival Information Collections (AICs).
Figure 1. H-Net Information Packages
Submission Information Packages (SIPs) in the H-Net preservation system consist of the messages posted by the list editors. The LISTSERV e-mail list software strips out any header information peculiar to the sender's mailer software. Messages then include the body of the message and the following header fields:
- Date--date and time message sent
- Reply-To--list name and e-mail
- Sender--list name and e-mail
- From--name and e-mail of individual sender
- Mime-Version--such as "1.0 (Apple Message framework v752.3)"
- Content-Transfer-Encoding--such as "7bit"
- Content-Type--such as "text/plain; charset=US-ASCII; delsp=yes; format=flowed"
All metadata but the subject line is provided by either the submitter's system or the LISTSERV software. The original author or the list editor must provide subject line metadata.
Within 24 hours of submission, a SHA-256 hash (message digest) is computed for each message. These hashes are stored in a database and used for fixity checks. At the same time, key metadata is extracted and an MD5 hash is created for each message. This metadata, which is stored in a log browse cache database to expedite message retrieval, includes:
- filename--name of notebook file where message is stored
- offset--byte position in notebook file where message is stored
- from--name and e-mail address of original author or default to editor's information
- subject--as posted by the editor
- dpb--date posted
- cbd--date in a different format for sorting purposes
- messageid--MD5 hash
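The ingest step described above can be sketched roughly as follows. This is a minimal illustration, not H-Net's actual code: the `CacheRecord` class and `ingest` function are hypothetical names, the date fields (dpb, cbd) are omitted because their formats are not given here, and the sketch assumes both digests are computed over the complete message bytes (header and body).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CacheRecord:
    """Subset of the log browse cache fields listed above (hypothetical layout)."""
    filename: str   # notebook file where the message is stored
    offset: int     # byte position of the message in that notebook
    sender: str     # the "from" field
    subject: str    # subject as posted by the editor
    messageid: str  # MD5 hex digest of the message


def ingest(message: bytes, filename: str, offset: int,
           sender: str, subject: str):
    """Return (cache record, SHA-256 fixity digest) for one message.

    Assumption: both digests cover the full message bytes.
    """
    messageid = hashlib.md5(message).hexdigest()
    fixity = hashlib.sha256(message).hexdigest()
    record = CacheRecord(filename, offset, sender, subject, messageid)
    return record, fixity


record, fixity = ingest(
    b"Subject: Example\n\nBody text\n",
    filename="h-albion.log0702b",
    offset=0,
    sender="Example Author <author@example.org>",
    subject="Example",
)
```

Keeping the MD5 identifier in the cache record and the SHA-256 digest in a separate return value mirrors the split between the log browse cache database and the fixity database.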
An Archival Information Package (AIP) consists of the message ingested (body and header)--the Content Information--and additional associated metadata known as Preservation Description Information (PDI) stored in the cache and fixity databases. (See below for how OAIS defines PDI and how H-Net message metadata fulfills the OAIS requirements.) A collection of AIPs is described by the OAIS model as an AIP specialization known as an Archival Information Collection (AIC). In the H-Net system, notebook files consisting of a seven-day accumulation of messages for a given list make up part of an AIC. The AIC also includes a SHA-256 hash assigned to the notebook and stored in the fixity database as PDI.
The OAIS model recommends the association of digital objects with Preservation Description Information (PDI). This PDI becomes part of the AIP or AIC and is also preserved within the system; it includes Reference Information, Context Information, Provenance Information, and Fixity Information. Messages and notebook files preserved in the H-Net system are associated with all four types of PDI.
- Reference Information. The name of an H-Net notebook file acts as its Reference Information. For example, "h-albion.log0702b" is the Reference Information for the notebook file from the H-Albion list for days 8-14 of July 2002. Reference Information for an individual message consists of the name of the notebook file in which it is embedded plus the MD5 hash assigned to it on ingest and stored in the metadata cache. Refer to [H-Net Message Ingest, Storage, and Retrieval Processes] for an explanation of how notebook files are formed and named, and how a combination of a notebook filename and an MD5 hash is used to identify a unique message.
- Context Information. For both notebook files and messages, Context Information may also be gleaned from the notebook filename. Again, the file "h-albion.log0702b" includes all messages posted to the H-Albion list during days 8-14 of July 2002. Additional Context Information for individual messages may be found in the subject line of the message. Subject lines may also show the relationship of a response to an original message and other responses, providing users with the means to follow the thread of a discussion. Messages are also written to notebook files in their original order of submission and approval, which provides context about other issues under discussion at that time. (See Figure 2.)
- Provenance Information. Provenance Information for both H-Net notebook files and messages is likewise found in the notebook filename, which identifies the list to which the notebook and message belong. Additional Provenance Information for a message may be found in its header, especially the "from" line and metadata about the list editor. Subject lines provide references to threads that can help determine how a message originated.
- Fixity Information. As noted above, within 24 hours of posting, the SHA-256 message digest algorithm is used to establish fixity for a message; the resulting hash becomes that message's Fixity Information. The SHA-256 message hashes are stored in a database and used to perform fixity checks when a notebook file closes. If the hashes reconcile, the closed notebook file will receive its own SHA-256 hash as Fixity Information. All notebook file hashes will be stored in the fixity database, and notebook files will be validated on a weekly basis using message digest calculations. Refer to Ensuring the Integrity of the H-Net E-Mail Lists for a more detailed explanation of how H-Net message and notebook fixity is established and checked.
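The fixity workflow in the last item can be sketched as below. This is an illustrative outline under stated assumptions, not the production procedure: the function names are hypothetical, and the sketch assumes the stored message hashes are available in the same order as the messages in the notebook.

```python
import hashlib
from typing import Optional, Sequence


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def close_notebook(messages: Sequence[bytes],
                   stored_hashes: Sequence[str],
                   notebook: bytes) -> Optional[str]:
    """Check every message against its stored hash when a notebook closes.

    Returns the notebook's own SHA-256 digest if all hashes reconcile,
    or None if any message fails the check.
    """
    if len(messages) != len(stored_hashes):
        return None
    for message, expected in zip(messages, stored_hashes):
        if sha256_hex(message) != expected:
            return None
    return sha256_hex(notebook)


def validate_notebook(notebook: bytes, stored_hash: str) -> bool:
    """Periodic (for example, weekly) check of a closed notebook."""
    return sha256_hex(notebook) == stored_hash
```

Returning no notebook digest when a message fails reflects the condition that a notebook receives its own hash only if the message hashes reconcile.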
Figure 2. Example H-Albion Browser View Showing Temporal Context of Messages
Table 1. PDI for AIPs and AICs in the H-Net E-Mail List Preservation System
|Preservation Description Information (PDI)||Message (AIP)||Notebook File (AIC)|
|Reference Information||filename + messageid||filename|
|Context Information||filename, subject||filename|
|Provenance Information||filename, from, subject||filename|
|Fixity Information||SHA-256 hash for message||SHA-256 hash for notebook|
When a user selects a message through the H-Net web-based browser interface, the system retrieves it by referencing the metadata stored in the cache. The selected message and metadata from the notebook make up the Dissemination Information Package (DIP). Metadata displayed includes:
- Name and e-mail address of original author
- Name and e-mail address of list editor
- Author's subject
- Editor's subject
- Date the message was written
- Date the message was posted to the list
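Retrieval from the cache metadata can be illustrated with a short sketch. It is a simplified assumption of how the stored offset might be used, not the actual H-Net implementation: `read_message` is a hypothetical helper, and the end of a message is taken to be the offset of the next message in the same notebook (or the end of the file), since the cache fields listed earlier record only the starting byte position.

```python
import io
from typing import BinaryIO, Optional


def read_message(notebook: BinaryIO, offset: int,
                 next_offset: Optional[int] = None) -> bytes:
    """Read one message from an open notebook file.

    offset is the byte position stored in the cache; next_offset is the
    position of the following message, or None for the last message.
    """
    notebook.seek(offset)
    if next_offset is None:
        return notebook.read()
    return notebook.read(next_offset - offset)


# In-memory stand-in for a notebook file holding two messages.
notebook = io.BytesIO(b"first message\nsecond message\n")
first = read_message(notebook, 0, 14)
second = read_message(notebook, 14)
```

Seeking directly to the stored byte position avoids scanning the whole notebook, which is the purpose the cache serves in expediting retrieval.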
Figure 3 shows a screenshot of a DIP retrieved through the H-Net browser interface. Although the web-based browser interface is the access method preferred by the majority of users, users can also access H-Net DIPs from an e-mail program by sending LISTSERV commands.
Figure 3. Example DIP Retrieved through H-Net Browser Interface
Last Revised July 2009