The research objective underlying the work of PREFORMA is to explore the critical factors that determine the quality of standard implementations, in order to establish a long-term sustainable ecosystem around a range of practical tools, together with a variety of stakeholder groups. The tools should be innovative and provide a reference implementation of the most common file format standards, both for assessing the collections to be archived and for correcting the digital archive. This involves acquiring knowledge about:
- how to establish a methodology or an objective frame of reference to interpret and implement the standard specifications, given the current variation in interpretations and implementations by software vendors; is there a need to consolidate the diverse implementations, or is it better to centralize the interpretation in a specific implementation (i.e. promote one interpretation and implementation as the standard)?
- given the answer to the first question, how to determine whether a file is what it claims to be, that is, what makes a file valid in this context (conformant to the “standard”)?
- how can the open source project continue to be developed and sustained in the short and long run; can an open source community operate as the normative source for the answers to the first and second questions?
PREFORMA research and development activities aim to empower memory institutions to gain full control over the technical properties of preservation files. This is achieved through the development of an open-source conformance checker and the establishment of a healthy ecosystem around an open source ‘reference’ implementation for specific file formats.
The first activity is to develop an open-source toolset for conformance checking of digital files, intended for long-term preservation in memory institutions. The conformance checker:
- verifies whether a file has been produced according to the specifications of a standard file format, and hence,
- verifies whether a file matches the acceptance criteria for long-term preservation by the memory institution,
- reports in human- and machine-readable formats which properties deviate from the standard specification and acceptance criteria,
- performs automated fixes for simple deviations in the metadata of the preservation file, leaving the original bitstream untouched and creating a corrected copy of the object to be preserved.
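The four responsibilities listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the PREFORMA tools: the toy "format" (a metadata dictionary plus a payload), the rule constants, and the names `PreservationFile`, `Report`, `check` and `fix_metadata` are all hypothetical. It only shows how specification rules, acceptance criteria, dual-format reporting and non-destructive fixes might fit together.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical rules for a toy format. A real checker would derive the first
# from the standard specification and the second from the memory
# institution's acceptance criteria.
SPEC_REQUIRED_FIELDS = ("title", "created")
POLICY_MAX_PAYLOAD_BYTES = 1024


@dataclass(frozen=True)
class PreservationFile:
    metadata: dict
    payload: bytes


@dataclass
class Report:
    conforms_to_spec: bool
    meets_policy: bool
    deviations: list = field(default_factory=list)

    def to_json(self) -> str:
        """Machine-readable report."""
        return json.dumps(asdict(self), indent=2)

    def to_text(self) -> str:
        """Human-readable report."""
        lines = [
            f"conforms to specification: {self.conforms_to_spec}",
            f"meets acceptance criteria: {self.meets_policy}",
        ]
        lines += [
            f"- [{d['source']}] {d['property']}: {d['problem']}"
            for d in self.deviations
        ]
        return "\n".join(lines)


def check(f: PreservationFile) -> Report:
    """Collect deviations from the toy specification and the toy policy."""
    deviations = []
    for name in SPEC_REQUIRED_FIELDS:
        value = f.metadata.get(name)
        if value is None:
            deviations.append({"source": "spec", "property": name,
                               "problem": "missing", "fixable": False})
        elif isinstance(value, str) and value != value.strip():
            deviations.append({"source": "spec", "property": name,
                               "problem": "surrounding whitespace",
                               "fixable": True})
    if len(f.payload) > POLICY_MAX_PAYLOAD_BYTES:
        deviations.append({"source": "policy", "property": "payload",
                           "problem": "larger than allowed", "fixable": False})
    spec_ok = all(d["source"] != "spec" for d in deviations)
    # Acceptance presupposes conformance to the specification.
    policy_ok = spec_ok and all(d["source"] != "policy" for d in deviations)
    return Report(spec_ok, policy_ok, deviations)


def fix_metadata(f: PreservationFile) -> PreservationFile:
    """Return a corrected copy; the original object is left untouched."""
    fixed = {k: v.strip() if isinstance(v, str) else v
             for k, v in f.metadata.items()}
    return PreservationFile(metadata=fixed, payload=f.payload)
```

In this sketch, checking a file, calling `fix_metadata` on it and checking the result again yields two separate reports, while the original metadata and payload stay as they were deposited.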
Development of the conformance checker focuses on four use cases that facilitate the interaction between suppliers, academic research and memory institutions. They are compliant with the OAIS Reference Model and represent conformance checking procedures at different moments in the life cycle of a preservation file:
- Conformance Checking at Creation time: Producers proactively check whether the technical properties of a file meet the acceptance criteria of an OAIS Archive, e.g. government agencies checking conformance of text documents to be deposited at public archives when the document is made available.
- Conformance Checking at Transfer time: Archives check the technical properties of files ingested in the OAIS Archive, assessing whether they meet the acceptance criteria for ingest and conform to the relevant preservation file formats, e.g. libraries monitoring the preservation status of digital publications deposited in their digital repository.
- Conformance Checking at Digitization time: Archives check whether the technical properties of digital representations of collection items, internally or externally produced, meet the requirements specified in the digitization tender, e.g. museums doing quality control on the digital representations and documentation produced by photographers.
- Conformance Checking at Migration time: Archives check the technical properties of files that are repackaged or transcoded, following the rules defined in the preservation strategy of the OAIS Archive, e.g. libraries doing quality control when transcoding audiovisual files from a ‘transmission’ to a ‘preservation’ format.
The conformance checker should allow for deployment in different infrastructures and environments such as:
- the PREFORMA project website, demonstrating the scope and functionality of the tool,
- deployment within an evaluation framework that facilitates gathering structured feedback on the conformance checking process. PREFORMA will require deployment within the DIRECT infrastructure for testing and evaluation of the tool in the PCP procedure (see next paragraph),
- the package file must be executable and capable of running stand-alone on a PC, so that the conformance checker can be used in small-scale institutions without centralized IT infrastructure,
- the tool must allow for deployment in network-based solutions (dedicated server, cloud solutions) for digital repositories,
- the tool must allow for integration into proprietary legacy systems via APIs.
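One way to satisfy the stand-alone and API requirements at the same time is to keep a single core function and wrap it in a thin command-line entry point. The Python sketch below is only an illustration under stated assumptions: `check_file` and `main` are hypothetical names, and the "file must not be empty" rule is a placeholder standing in for real conformance rules.

```python
import argparse
import json
import sys
from pathlib import Path


def check_file(path: str) -> dict:
    """Core entry point, importable by other systems (the API case)."""
    size = Path(path).stat().st_size
    # Placeholder rule (an assumption): a real checker would apply the
    # rules of the relevant file format standard here.
    deviations = [] if size > 0 else ["file is empty"]
    return {"file": str(path), "conforms": not deviations,
            "deviations": deviations}


def main(argv=None) -> int:
    """Command-line wrapper (the stand-alone case).

    Prints a JSON report and returns 0 only if every file conforms.
    A packaged executable would call sys.exit(main()).
    """
    parser = argparse.ArgumentParser(description="Toy conformance checker")
    parser.add_argument("files", nargs="+", help="files to check")
    args = parser.parse_args(argv)
    reports = [check_file(p) for p in args.files]
    json.dump(reports, sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0 if all(r["conforms"] for r in reports) else 1
```

Because the command-line wrapper only parses arguments and serializes the result, the same `check_file` function could equally be called from a server process or from a legacy system's integration layer.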
The second activity is to establish a network of common interest in order to gain control over the technical properties of preservation files. This involves the adoption of a ‘reference implementation’ by other software applications, and continuous improvement of the ‘standard’ specification through engagement in the standardization process.
The network gathers all stakeholders that control different stages in the lifecycle of a preservation file, providing a sustainable and viable ecosystem for the deployment of tools developed by PREFORMA as well as tools adopting the reference implementation. These stakeholders include:
- developers, who control the production of preservation files (e.g. through file editors or transcoders) and aim to improve the effectiveness and interoperability of their software,
- digital preservationists, who control the acceptance and management of preservation files in digital repositories and aim to improve the preservation status of the digital collection they maintain and the effectiveness of the ingest procedures,
- standardization bodies, which maintain the formal specifications of file formats in standards and aim to improve the specification of the standard.