29-255-3 Me. Code R. § 9

Current through 2024-46, November 13, 2024
Section 255-3-9 - INDEXING AND METADATA

Since digitized images do not have intelligence within them indicating their contents, appropriate index information or metadata is required to properly identify and later retrieve digitized images. For digital images, indexing and file naming schema are essential for locating and retrieving stored imaged records. Indexing typically consists of a structured format and controlled vocabulary that allows more precise description of a record's content.

The state agency must define and document specific indexing requirements needed to access the records efficiently prior to the performance of any imaging and indexing.

Indexing must comply with the specific requirements of the state agency but at minimum it must include the following:

Unique Identifier for Documents: Each document (including each multi-page document) must have a unique filename or other identifier, preferably sequential, which can be numeric, alphanumeric, or alphabetic as required by the government entity. Each filename must be unique across all records series and storage media, not merely within a single disc or other piece of removable media. If required, images will be filed in appropriate electronic folders on the designated storage media.
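As an illustration only, the following Python sketch shows one way sequential filenames could be assigned while checking uniqueness against every name already in use across all media. The function name `assign_filenames`, the `DOC` prefix, the eight-digit width, and the `.tif` extension are assumptions made for this example and are not prescribed by this section.

```python
def assign_filenames(document_titles, used_names, prefix="DOC", width=8, extension=".tif"):
    """Assign each document a sequential filename not already in use.

    `used_names` should hold every filename already assigned across all
    records series and storage media, not just the current disc.
    """
    used = set(used_names)
    assigned = {}
    counter = 1
    for title in document_titles:
        name = f"{prefix}-{counter:0{width}d}{extension}"
        # Skip forward past any identifier that is already taken.
        while name in used:
            counter += 1
            name = f"{prefix}-{counter:0{width}d}{extension}"
        used.add(name)
        assigned[title] = name
        counter += 1
    return assigned
```

A multi-page document would be passed in once and receive a single identifier, so that all of its pages share one filename.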

Indexing Fields/Descriptive Metadata: The index of documents must consist of a number of fields to ensure adequate access to the records. Whenever possible, the field data must consist of objective indexing terms (such as personal names, file numbers, and dates) or terms from a controlled vocabulary (such as subjects or geographical information), rather than subjective data. Index data often includes record type, creation date, record creator, and disposition date, among other information.
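A minimal Python sketch of how an index entry might be checked for objective fields and controlled-vocabulary terms follows. The field names, the `RECORD_TYPES` vocabulary, and the ISO 8601 date format are hypothetical choices for illustration; each state agency would substitute its own documented requirements.

```python
from datetime import date

# Hypothetical controlled vocabulary; an agency would define its own terms.
RECORD_TYPES = {"permit", "license", "correspondence"}

REQUIRED_FIELDS = ("record_type", "creation_date", "record_creator")

def validate_index_entry(entry):
    """Return a list of problems found in one index entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing field: {field}")
    record_type = entry.get("record_type")
    if record_type and record_type not in RECORD_TYPES:
        problems.append(f"record_type not in controlled vocabulary: {record_type}")
    creation_date = entry.get("creation_date")
    if creation_date:
        try:
            # Dates are assumed to be ISO 8601 strings (YYYY-MM-DD).
            date.fromisoformat(creation_date)
        except ValueError:
            problems.append(f"creation_date is not a valid date: {creation_date}")
    return problems
```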

Indexing Structure: Although the structure of an electronic content management system (ECMS) database is outside the scope of these guidelines, the state agency must have a methodology in place to transfer all the images and corollary index data to the intended retrieval system. The indexing data must be stored in a non-proprietary format to allow its transfer to other systems and databases as needed throughout the conversion project and for the entire retention period of the records. Each record within the database must be associated with the respective digital image or document via its unique filename.
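By way of example, index data could be exported to comma-separated values (CSV), one commonly used non-proprietary format, with each row keyed to its image by filename. The sketch below uses only the Python standard library; the function name `export_index_csv` and the column names are assumptions for illustration, and CSV is one possible choice rather than a format required by this section.

```python
import csv
import io

def export_index_csv(entries, fieldnames):
    """Serialize index entries to CSV text, one row per image or document.

    Every entry must carry a unique 'filename' so the row can be tied
    back to its digital image after transfer to another system.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        filename = entry.get("filename")
        if not filename:
            raise ValueError("index entry has no filename")
        if filename in seen:
            raise ValueError(f"duplicate filename in index: {filename}")
        seen.add(filename)
        writer.writerow(entry)
    return buffer.getvalue()
```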

Optical Character Recognition: If required, optical character recognition (OCR) or intelligent character recognition (ICR) may be performed to convert digital images into electronic text. The government or its chosen vendor must certify the conversion to be at least 95% accurate as measured by character count, and the converted text must be associated with the respective digital image or document. Due to this error rate, OCR will not be used as the sole finding aid when retrieving digitized images. Some manual indexing is always required.
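This section does not specify how accuracy by character count is to be computed. One plausible interpretation, sketched below in Python, compares a sample of OCR output against a hand-verified transcription and counts character-level insertions, deletions, and substitutions (the Levenshtein edit distance). The function names and the choice of edit distance are assumptions for illustration, not the certification method required by this section.

```python
def edit_distance(reference, candidate):
    """Levenshtein distance: minimum single-character edits between two strings."""
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, cand_char in enumerate(candidate, 1):
            cost = 0 if ref_char == cand_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def character_accuracy(reference, ocr_text):
    """Fraction of reference characters recovered correctly (0.0 to 1.0)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1 - errors / len(reference))
```

Under this interpretation, one wrong character in a 20-character sample yields an accuracy of 0.95, which sits exactly at the minimum stated above.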

a. Correcting or Making Allowances for OCR Output: Depending on the need for accuracy in the OCR'd text, the text may be reviewed and corrected or fuzzy searching may be used to retrieve character strings. Post-OCR correction consists of review of the OCR output against the original text and hand-correction of the OCR output. Fuzzy searching works by searching for character strings that match or predominantly match the character string being searched.
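Fuzzy searching of this kind can be sketched with Python's standard `difflib` module, which scores how closely two strings match. In the example below, the function name `fuzzy_find`, the 0.8 similarity cutoff, and the sample filenames are assumptions for illustration; production retrieval systems typically provide their own fuzzy matching.

```python
import difflib

def fuzzy_find(query, ocr_text_by_file, cutoff=0.8):
    """Return {filename: [matching words]} for words close to `query`.

    `ocr_text_by_file` maps each image filename to its OCR'd text.
    Matching is case-insensitive and word-by-word.
    """
    hits = {}
    needle = query.lower()
    for filename, text in ocr_text_by_file.items():
        words = sorted(set(text.lower().split()))
        matches = difflib.get_close_matches(needle, words, n=5, cutoff=cutoff)
        if matches:
            hits[filename] = matches
    return hits
```

For instance, a search for "Augusta" would still retrieve a page whose OCR output reads "Augvsta", because six of the seven characters match in order.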

Directory Structure: Regardless of the image filename, files will be organized in a file directory or folder system that will link to metadata stored elsewhere in a database. Directories may have their own organization independent of the image files, such as folders arranged by date or records series number, or they may replicate the physical or logical organization of the originals being digitized.
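As one hypothetical arrangement, folders might be organized by records series number and then by imaging year and month, independent of the image filename. The Python sketch below builds such a path; the root folder, the series number format, and the function name `storage_path` are illustrative assumptions only.

```python
from datetime import date
from pathlib import PurePosixPath

def storage_path(root, series_number, imaging_date, filename):
    """Build a folder path of the form root/series/YYYY/MM/filename."""
    return (PurePosixPath(root)
            / series_number
            / f"{imaging_date:%Y}"
            / f"{imaging_date:%m}"
            / filename)
```

The resulting path can then be stored in the index database alongside the unique filename so that the metadata links back to the file's location.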

Technical Metadata: During the imaging process, production metadata will be maintained either within the individual images or separate from but associated with each body of digitized images. For instance, these metadata may be created as part of a digital file during actual imaging, may be added to the file after imaging, may be associated with each file in an ECMS, or may be retained entirely separate from the files but associated to each file by their unique filenames. These metadata will include, at minimum, the following:

Unique identifier
Title of records series
State Archives or other retention schedule name and item number (from the State General Schedule, Agency Specific Schedule or Records Disposition Authorization Number)
State agency name
Name of the imaging vendor or government staff person conducting the imaging
Date of the imaging
Pixels per inch (ppi)
Equipment used to capture the images
Software used to capture the images

The state agency will maintain these metadata for the life of the records.
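For illustration, the minimum technical metadata listed above could be captured as one structured record per image and serialized to JSON so that it remains readable apart from any particular system. The Python sketch below is one possible layout; the class name, field names, and the choice of JSON are assumptions and are not mandated by this section.

```python
import json
from dataclasses import asdict, dataclass, fields

@dataclass(frozen=True)
class TechnicalMetadata:
    """One record of production metadata, tied to an image by its identifier."""
    unique_identifier: str
    records_series_title: str
    retention_schedule: str      # schedule name and item number
    state_agency: str
    imaged_by: str               # vendor or staff person conducting the imaging
    imaging_date: str            # assumed ISO 8601 (YYYY-MM-DD)
    pixels_per_inch: int
    capture_equipment: str
    capture_software: str

    def missing_fields(self):
        """Names of any required elements left empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None)]

def to_json(metadata):
    """Serialize a metadata record to JSON text with stable key order."""
    return json.dumps(asdict(metadata), indent=2, sort_keys=True)
```

A record with any element reported by `missing_fields()` would be incomplete under the minimum list and could be flagged before the images are accepted.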

29-255 C.M.R. ch. 3, § 9