H-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives
About the Project
In October 2005, H-Net developed its first strategic plan, an ambitious roadmap to a new networking and administrative platform. One element of the plan calls for a more comprehensive backup and preservation program for all of H-Net's digital records. To that end, this project will, for the first time in the organization's history, submit H-Net's electronic records and preservation protocols to formal review by an advisory board of electronic records archivists. The H-Net council and MATRIX staff have long regarded this body of records as a permanent resource and have sought to implement daily procedures and best practices for storage and access that go beyond the tools typically provided by standard electronic mailing list software.
This research and development project has two primary goals. First, we seek to ensure the preservation, authenticity and sustainability of H-Net records. Second, we seek to improve and extend access to the H-Net collection, a valuable and extensive public record of scholarly discussion, by developing tools and methods for preserving, searching and sorting electronic mail records. The results of this research should also benefit other topical discussion lists and large collections of electronic mail generally.
Preserving H-Net records
To ensure the preservation, sustainability and authenticity of the H-Net records (Goal One), the first stage of this research and development project involves reviewing current preservation practices against the detailed InterPARES framework for preserving authentic records.
We are working with the Archivist Advisory Board to evaluate how preservation is carried out in four areas:
- Managing the preservation function;
- Bringing records into the preservation system;
- Maintaining them over time; and
- Outputting the records.
This will entail:
- Fully documenting all current authenticity, preservation, and persistence practices of H-Net;
- Evaluating these practices in light of the recommendations of the InterPARES framework for preservation of authentic electronic records and RLG's An Audit Checklist for the Certification of Trusted Digital Repositories;
- Preparing a plan for improving H-Net's practices for preserving its electronic records, including MD5 hash analysis and potential participation in LOCKSS;
- Consulting the Archivist Advisory Board, the H-Net governing council and the Editor Advisory Board throughout the three preceding steps, regarding relevance both to H-Net and to large listserv archives more generally;
- Implementing the preservation plan; and
- Formalizing a plan for longer-term needs and a migration strategy.
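MD5 hash analysis of this kind usually takes the form of a fixity check: record a digest for each file when it enters the archive, then recompute the digests later and compare them to detect corruption or alteration. The following is a minimal sketch, assuming the archive is a directory tree of message files; the function names and directory layout are hypothetical and do not describe H-Net's actual tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to archive_dir) to its MD5 digest."""
    return {
        str(p.relative_to(archive_dir)): md5_of_file(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(archive_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive's current contents with a previously stored manifest."""
    current = build_manifest(archive_dir)
    return {
        # Recorded earlier but no longer present on disk.
        "missing": sorted(set(manifest) - set(current)),
        # Present on disk but never recorded.
        "added": sorted(set(current) - set(manifest)),
        # Present in both, but the digest no longer matches.
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In practice the manifest would be stored separately from the files it describes, so that damage to the archive cannot silently alter the reference digests. MD5 is adequate for detecting accidental corruption, but it is not collision-resistant, so it offers weaker protection against deliberate tampering.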
Enhancing access to H-Net Archive through improved search tools
Goal Two, improving and extending access through the development of tools and methods for preserving, searching and sorting, will be achieved in part through the preservation steps described above, and in part through expert user feedback and research on developing and testing the Semantic Augmented Consensus Clustering (SACC) approach for the H-Net records. The SACC approach, chosen for its flexibility and its capacity to incorporate expert user feedback, involves three major areas of activity:
- First, we will develop an architecture for clustering text based on a consensus of multiple clustering techniques.
- Second, we will augment the standard clustering techniques in the consensus with information derived from a novel structured representation called Semantic Relationship Graphs (SRG).
- Third, we will provide a way for users to participate in the clustering process and influence the results of consensus.
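The idea of clustering by consensus can be illustrated with evidence accumulation, one common consensus-clustering technique (not necessarily the one SACC uses). Each base clustering votes on whether two documents belong together, the votes form a co-association matrix, and documents whose agreement exceeds a threshold are linked into final clusters. This is a minimal sketch under stated assumptions: documents are indexed 0..n-1, each base clustering is a list of labels, SRG augmentation is not modeled, and the `must_link` parameter is a hypothetical stand-in for expert user feedback.

```python
from itertools import combinations


def co_association(labelings: list[list[int]]) -> list[list[float]]:
    """Fraction of base clusterings that place documents i and j in the same cluster."""
    if not labelings:
        raise ValueError("need at least one base clustering")
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every base clustering must label every document")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(labelings)
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)] for i in range(n)]


def consensus_clusters(
    labelings: list[list[int]],
    threshold: float = 0.5,
    must_link: tuple[tuple[int, int], ...] = (),
) -> list[list[int]]:
    """Link document pairs whose co-association exceeds threshold; return the groups."""
    matrix = co_association(labelings)
    # Hypothetical user feedback: pairs an expert asserts belong together.
    for i, j in must_link:
        matrix[i][j] = matrix[j][i] = 1.0

    n = len(matrix)
    parent = list(range(n))  # union-find forest over documents

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for doc in range(n):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())
```

For example, `consensus_clusters([[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2], [5, 5, 5, 7, 7, 7]])` returns `[[0, 1, 2], [3, 4, 5]]`: the second clustering's split is outvoted. With the default threshold of 0.5, a pair is linked only when a majority of base clusterings agree; raising the threshold yields smaller, more conservative clusters.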
Steps for the Semantic Augmented Consensus Clustering work will include:
- Preparing the H-Net records for SACC testing by migrating to discrete storage and applying additional metadata tagging and interoperability standards;
- Developing interfaces for presenting subtopic clustering results;
- User testing by H-Net scholars and researchers and by the H-Net Editor Advisory Board, with review by the Archivist Advisory Board; and
- Refining the SACC approach based on expert user feedback, including field-specific semantic relationships.
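The metadata-tagging step could begin by extracting descriptive fields from each message's RFC 5322 headers with Python's standard `email` package. This sketch assumes raw messages are available as text; the mapping onto Dublin Core-style element names (`dc:title`, `dc:creator` and so on), the `message_metadata` function and the `list_name` parameter are illustrative assumptions, not a documented H-Net schema.

```python
from email import message_from_string
from email.policy import default


def message_metadata(raw: str, list_name: str) -> dict[str, str]:
    """Extract descriptive metadata from one raw message (hypothetical field mapping)."""
    msg = message_from_string(raw, policy=default)
    # Under policy=default, the Date header is parsed into a DateHeader object
    # exposing a timezone-aware .datetime attribute.
    date_header = msg["Date"]
    when = date_header.datetime if date_header is not None else None
    return {
        "dc:title": str(msg["Subject"] or ""),
        "dc:creator": str(msg["From"] or ""),
        "dc:date": when.isoformat() if when else "",
        "dc:identifier": str(msg["Message-ID"] or ""),
        "dc:isPartOf": list_name,
        # Kept so that discussion threads can be reconstructed later.
        "in_reply_to": str(msg["In-Reply-To"] or ""),
    }
```

Normalizing dates to ISO 8601 and keeping `Message-ID` and `In-Reply-To` makes records sortable by time and linkable into threads, both of which matter for searching and clustering large list archives.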