H-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives
H-Net Digital Preservation Policies and Procedures
H-Net Message Ingest, Storage, and Retrieval Processes
The H-Net electronic mailing lists run on L-Soft International Inc.'s LISTSERV automated electronic mailing list software application. As such, LISTSERV provides the basis for the ingest, storage, and retrieval processes of the H-Net e-mail list system. H-Net list editors and subscribers interact with the LISTSERV software through a web-based browser interface or by typing commands in their e-mail program.
When a subscriber sends a message to an H-Net list, it is forwarded to a list editor for approval. The editor either approves the message as is or edits the content before forwarding it back to LISTSERV for posting. Some private H-Net lists permit subscribers to post messages without going through an editor. If an editor makes any changes to a message, the author and date written (creation date) metadata change to reflect the editor's name and the current date. The editor has the option to manually re-enter the original message's author, date, and subject.
When the editor sends the message for posting, he or she receives an acknowledgement message. (Note that the editor has the option to turn off this acknowledgement feature.) The message will then appear in the mailboxes of all subscribers who have chosen to receive messages as they post. In terms of the Open Archival Information System (OAIS) model, a message sent to the LISTSERV server for posting by an editor or other responsible party is a Submission Information Package (SIP). (Refer to H-Net E-Mail List Conformance to OAIS: Information Packages for more information on H-Net SIPs.)
Figure 1. H-Net Message Posting Process
In addition to distributing H-Net e-mail messages, the LISTSERV software also creates, stores, and maintains message collections. Within 24 hours of ingest, each message receives SHA-256 and MD5 hashes. The SHA-256 hashes are stored in a database and will be used for performing fixity checks. (See Ensuring the Integrity of the H-Net E-Mail Lists.) Key metadata is extracted and cached with the MD5 hashes in a separate database to be used for message discovery and retrieval purposes. (See Figure 2.)
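The hashing and fixity steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the message is assumed to be available as raw bytes, and the stored SHA-256 value is assumed to be a hexadecimal string. It does not describe H-Net's actual implementation.

```python
import hashlib


def compute_digests(message: bytes) -> dict:
    """Return hex SHA-256 and MD5 digests of the raw message bytes.

    Hypothetical helper: SHA-256 is the value kept for fixity checks,
    MD5 is the value cached with metadata for discovery and retrieval.
    """
    return {
        "sha256": hashlib.sha256(message).hexdigest(),
        "md5": hashlib.md5(message).hexdigest(),
    }


def passes_fixity_check(message: bytes, stored_sha256: str) -> bool:
    """Recompute SHA-256 and compare it with the stored hex value.

    Assumption: the fixity database returns the hash as a hex string.
    """
    current = hashlib.sha256(message).hexdigest()
    return current == stored_sha256.strip().lower()
```

A fixity check passes only if the stored bytes are identical to those that were hashed at ingest; a change of even one byte produces a different SHA-256 value.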
Figure 2. Example of Cached Metadata for Two Messages on H-Albion List
The message itself becomes part of a file known as a "notebook." Each notebook is a flat text file to which messages are appended as they post to a given list during a seven-day time period. The time periods run every seven days by day of the month, with an extra time period for months having 29, 30, or 31 days. Days 1-7 of the month are known as time period "a," days 8-14 as time period "b," and so on. (See Table 1.) At the end of a time period, the active notebook file closes and a new one is created. Each closed notebook file receives a SHA-256 hash that is stored in the fixity database.
Table 1. H-Net Notebook File Time Periods
|Days of Month|Time Period|
|1-7|a|
|8-14|b|
|15-21|c|
|22-28|d|
|29-31|e|
Names of notebook files are constructed to include the name of the H-Net list and the year, month, and time period of messages included in that file. For example, the filename of the notebook file covering the first seven days of February 2008 in the H-Africa list would be "h-africa.log0802a." Every message posted to the list during that time period will be in that notebook file, in the order in which it posted. (See Figure 3.)
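The naming convention can be expressed as a short Python sketch. The function names are hypothetical, and the letters for the third through fifth time periods ("c," "d," and "e") are an assumption that extends the "a" and "b" pattern described above to the remaining days of the month.

```python
from datetime import date


def period_letter(day: int) -> str:
    """Map a day of the month to its seven-day time period letter.

    Days 1-7 -> "a", 8-14 -> "b"; "c", "d", and "e" (days 29-31)
    are assumed by continuing the same pattern.
    """
    if not 1 <= day <= 31:
        raise ValueError(f"day of month out of range: {day}")
    return "abcde"[(day - 1) // 7]


def notebook_filename(list_name: str, posted: date) -> str:
    """Build a name such as "h-africa.log0802a" from list name and date."""
    # %y%m gives the two-digit year followed by the two-digit month.
    return f"{list_name.lower()}.log{posted:%y%m}{period_letter(posted.day)}"
```

For instance, `notebook_filename("h-africa", date(2008, 2, 3))` yields `h-africa.log0802a`, matching the example given above.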
Figure 3. Portion of Notebook File from H-Albion List
In OAIS model terms, the ingested message and its metadata stored in the fixity and cached metadata databases comprise an Archival Information Package (AIP). Each notebook file--a collection of AIPs--plus its SHA-256 hash make up an Archival Information Collection (AIC). (Refer to H-Net E-Mail List Conformance to OAIS: Information Packages for more information on H-Net AIPs and AICs.)
Users of a given H-Net public list may access messages through the H-Net web-based browser interface or by using LISTSERV commands. To access a message through the browser, the user navigates to the list of choice and clicks on the "Discussion Logs" link on that list's home web page. From there, the user selects the month for which they wish to view messages. They are then presented with a list of messages and can click on the "View" link next to a message they wish to view. (See Figure 4.) A log browse application pulls the message from the original notebook file and builds a URL for the message consisting of the notebook filename and the unique MD5 hash associated with that particular message. The message is then transformed into HTML for viewing in the browser. (See Figure 5.)
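As a rough illustration of how a message URL could be assembled from the notebook filename and the message's MD5 hash, consider the following Python sketch. The base address, the query parameter names, and the function name are all hypothetical placeholders; the actual H-Net URL layout is not specified in this text and is not reproduced here.

```python
from urllib.parse import urlencode

# Hypothetical base address; not the real H-Net log browse endpoint.
BASE_URL = "https://archive.example.org/logbrowse"


def message_url(notebook: str, md5_hex: str) -> str:
    """Combine a notebook filename and a message MD5 hash into a URL.

    The parameter names "notebook" and "msg" are assumptions made
    for illustration only.
    """
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in "0123456789abcdef" for c in md5_hex):
        raise ValueError("expected a 32-character hexadecimal MD5 digest")
    query = urlencode({"notebook": notebook, "msg": md5_hex})
    return f"{BASE_URL}?{query}"
```

Because the notebook filename identifies the file and the MD5 hash identifies the message within it, the two values together are enough to locate a single message.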
Figure 4. Posted Message List in H-Albion
Figure 5. Sample H-Net Retrieved Message View, with URL Detail Above
Using LISTSERV commands, a user may also bring up a desired e-mail message in their e-mail program. Users may also search for messages through the browser interface. Every 24 hours, the newest messages in the notebook file are parsed and copied to a BRS database from which they are available for full-text search. Selected messages are displayed in the browser. In OAIS terms, a message displayed in the user's browser or e-mail program and its accompanying metadata comprise a Dissemination Information Package (DIP) in the H-Net system. (Refer to H-Net E-Mail List Conformance to OAIS: Information Packages for more information on H-Net DIPs.)
Figure 6. H-Net Message Ingest, Storage, and Retrieval Processes
Last Revised July 2009