
Creating a New JHOVE2 Format Module

Sheila Morrissey, Portico

Code4Lib 2011, Bloomington IN, February 7, 2011

The preservation problem

Managing the gap between what you were given and what you need

– That gap is only manageable if it is quantifiable

– Characterization tells you what you have, as a stable starting point for iterative preservation planning and action

Adapted from A. Brown, “Developing Practical Approaches to Active Preservation,” IJDC 2:1 (June 2007): 3-11.

[Figure: Characterization, Preservation planning, Preservation action]

“What? So what?”

Characterization is the automated determination of the intrinsic and extrinsic properties of a formatted object

– Identification

– Feature extraction

– Validation

– Assessment

Identification: determining the presumptive format of a digital object based on suggestive extrinsic hints and intrinsic signatures

Feature extraction: reporting the intrinsic properties of an object significant for classification, analysis, and planning

Supported formats

JHOVE2 can identify (by DROID) many more formats than it can validate (by modules)

– PRONOM registry documents over 550 “formats”: http://www.nationalarchives.gov.uk/PRONOM

Supported formats

– ICC color profile (ICC.1:2004-10)
– JPEG 2000: JP2 (ISO/IEC 15444-1), JPX (ISO/IEC 15444-2)
– PDF: PDF 1.0 – 1.7, ISO 32000-1, PDF/A-1 (ISO 19005-1), PDF/X-1 (ISO 15930-1), -1a (ISO 15930-4), -2 (ISO 15930-5), -3 (ISO 15930-6)
– SGML
– Shapefile: Main, Index, dBASE, …
– TIFF: TIFF 4 – 6, Class B, F, G, P, R, Y, TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), GeoTIFF, Exif (JEITA CP-3451), DNG
– UTF-8: ASCII (ANSI X3.4)
– WAVE: BWF (EBU N22-1997)
– XML
– Zip

Contributed format modules

From Wegener Institute (http://www.awi-potsdam.de)
– netCDF
– Grib

From Bibliothèque nationale de France (BnF) (http://www.bnf.fr/fr/acc/x.accueil.html)
– arc
– gzip

YOU!!!
– ???

Characterization strategy

1. Identify format (if not previously identified)

2. Dispatch to appropriate format module

   a) Extract format features and validate
      – If a nested source unit is found, process recursively…

   b) Validate format profiles (if registered)

3. Assess

4. If unitary source unit, calculate message digests (optional)

5. If an aggregate source unit, try to identify aggregate format, and if successful, process recursively…
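The recursive shape of this strategy can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `SourceUnit`, `StrategySketch`, and the extension-based `identify` are hypothetical stand-ins, not the JHOVE2 `Source` API or DROID identification, and steps 2 to 5 are reduced to a log entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a source unit; not the real JHOVE2 Source API. */
final class SourceUnit {
    final String name;
    final List<SourceUnit> children = new ArrayList<>();
    String format; // null until identified

    SourceUnit(String name) { this.name = name; }

    SourceUnit add(SourceUnit child) { children.add(child); return this; }
}

final class StrategySketch {
    /** Toy identification by extension (assumption; real tools use signatures). */
    static String identify(SourceUnit unit) {
        if (!unit.children.isEmpty()) return "Directory";
        int dot = unit.name.lastIndexOf('.');
        return dot < 0 ? "Unknown" : unit.name.substring(dot + 1).toUpperCase(Locale.ROOT);
    }

    /** Identifies a unit if needed, records it, then recurses into nested units. */
    static List<String> characterize(SourceUnit unit, List<String> log) {
        if (unit.format == null) {
            unit.format = identify(unit);            // step 1
        }
        log.add(unit.name + " -> " + unit.format);   // step 2: module dispatch elided
        for (SourceUnit child : unit.children) {
            characterize(child, log);                // nested units, processed recursively
        }
        return log;
    }
}
```

Calling `StrategySketch.characterize` on a directory unit with two files yields one log line for the directory followed by one per file.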

Characterization strategy

directory/
    abc.shp  abc.shx  abc.dbf  abc.tif  xyz.pdf

Identify each file:
    abc.shp → Main
    abc.shx → Index
    abc.dbf → dBASE
    abc.tif → GeoTIFF
    xyz.pdf → PDF

Clump abc.shp + abc.shx + abc.dbf → Shapefile

Clump Shapefile + abc.tif → “GIS object”

API design idioms

Separation of concerns
– Annotation and reflection
  confluence.ucop.edu/display/JHOVE2Info/Background+Papers

Inversion of control (IOC) / dependency injection
– Martin Fowler
  martinfowler.com/articles/injection.html
– Spring framework
  www.springsource.org/

Separation of concerns

“Let POJOs be POJOs”
– Focus on modeling the format itself

“Let the code write itself”
– Reportables “know” how to expose their properties for display
– Reference documentation generated from the code

Annotation and Reflection: Reportable properties

Each reportable property is represented by a field and accessor and mutator methods.
The accessor method must be marked with the @ReportableProperty annotation.

public class MyReportable implements Reportable {
    protected String myProperty;

    @ReportableProperty(order = 1, value = "description", ref = "reference")
    public String getMyProperty() {
        return this.myProperty;
    }

    public void setMyProperty(String property) {
        this.myProperty = property;
    }
}
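How a framework can find and read such annotated accessors is easiest to see in a small, self-contained sketch. The `@Reported` annotation and the classes below are hypothetical stand-ins written for this example; they are not JHOVE2's `@ReportableProperty` or its displayer code, only the general annotation-plus-reflection technique.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a reportable-property annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Reported {
    int order() default 1;
    String value() default "Not available.";
}

/** A plain object that only models its own properties. */
final class SampleReportable {
    @Reported(order = 2, value = "Number of records")
    public int getRecordCount() { return 3; }

    @Reported(order = 1, value = "Field delimiter")
    public String getDelimiterCharacter() { return ","; }

    // Not annotated, so it is never reported.
    public String getInternalState() { return "not reported"; }
}

final class ReflectionSketch {
    /** Returns "description=value" strings keyed by the declared order. */
    static Map<Integer, String> report(Object target) {
        Map<Integer, String> out = new TreeMap<>();
        for (Method m : target.getClass().getMethods()) {
            Reported r = m.getAnnotation(Reported.class);
            if (r == null) {
                continue; // skip accessors without the annotation
            }
            try {
                out.put(r.order(), r.value() + "=" + m.invoke(target));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + m.getName(), e);
            }
        }
        return out;
    }
}
```

The reportable class contains no display logic at all; ordering and descriptions travel with the annotation, which is the point of "let POJOs be POJOs".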

Dependency injection

All JHOVE2 function is embodied in pluggable modules

– Flexible customization
    Re-sequencing of pre-existing modules

– Easy extensibility
    Additional format modules and profiles
    Additional aggregate identifiers
    Additional displayers
    New behaviors

RenderabilityModule

JHOVE2 framework

Embodiment of a characterization strategy as a configurable sequence of command-invoked modules

public void characterize(Source source, Input input)
        throws IOException, JHOVE2Exception {
    source.getTimerInfo().setStartTime();
    /* Update summary counts of source units, by type. */
    this.sourceCounter.incrementSourceCounter(source);
    for (Command command : this.commands) {
        TimerInfo time2 = command.getTimerInfo();
        time2.resetStartTime();
        try {
            command.execute(this, source, input);
        } finally {
            time2.setEndTime();
        }
    }
    source.getTimerInfo().setEndTime();
}

Characterization

Creating a New Format Module: What are the deliverables?

• Source code
• Configuration files
• Sample (test) files
• Documents

Format Module Artifacts: Source Code

• Module classes
– Module (extends org.jhove2.module.format.BaseFormatModule)
– Profiles (extend org.jhove2.module.format.AbstractFormatProfile) as required by format
– Supporting classes expressing format content model as required by format

• Test classes
– JUnit test(s)

Format Module Artifacts: Configuration Files

• Spring IOC Bean XML configuration files
– For Module
– For unit test as needed
– For Assessment criteria

• Messages properties file additions if needed

• Properties files
– Displayer
– Units of measure
– Module-specific

Format Module Artifacts: Sample (Test) Files

• Sample files used in unit test
– Valid files
– Invalid files to exercise validity constraints

Format Module Artifacts: Documentation

• Module Specification Document

See examples on the JHOVE2 wiki “Modules Documents” page

<https://bitbucket.org/jhove2/main/wiki/Module>

Format Module Artifacts List: New CSV Format Module

Source code
    src/main/java/org/jhove2/module/format/csv/CsvModule.java
    src/test/java/org/jhove2/module/format/csv/CsvModuleTest.java

Configuration files
  Spring
    config/spring/module/format/csv/jhove2-csv-config.xml
    config/spring/module/assess/jhove2-ruleset-csv-config.xml
    src/test/resources/config/module/format/csv/test-config.xml
  Messages
    config/messages/jhove2_message.properties (update, not new)
  Display
    config/properties/module/displayer/org/jhove2/module/format/csv/CsvModule_displayer.properties
    config/properties/module/units/org/jhove2/module/format/csv/CsvModule_unit.properties (optional)
  Module-specific properties files
    config/properties/module/format/csv/csv.properties (optional, implementation-determined)

Test file(s)
    src/test/resources/examples/csv/goodFile.csv
    src/test/resources/examples/csv/badFile01.csv
    src/test/resources/examples/csv/badFile02.csv
    ….

Documentation
    CSV Module specification document: JHOVE2 wiki

Format Module Artifacts: The Good News

• Generate module and profile from interfaces and base classes via inheritance
– Classes reflect format’s own content model: cross-cutting “JHOVE2” concerns handled via annotation (persistence, serialization, generation of JHOVE2 identifiers for reportable properties)

• Template for Spring XML Module configuration files

• Utilities to generate
– Displayer properties files
– Units of measure properties files
– XML assessment configuration file

• Utilities for specification document
– Script to generate tabular content for specification document
– Macro to import utility-generated tabular content

Format Module: Research and Analysis

• Format Definition (org.jhove2.core.format.Format)
– Names
– Type (format/family)
– Ambiguity (ambiguous/unambiguous)
– Identifiers
– Specifications
– Validity (comprehensive/selective)
– Profiles (none)

• Significant (Reportable) properties (org.jhove2.module.format.csv.CsvModule)

Format Definition: CSV Names

• JHOVE2 canonical name
– Comma Separated Values

• Format aliases
– CSV
– DSV

Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml

Format Definition: CSV Formal Identifiers

• JHOVE2 identifier (see org.jhove2.core.I8R$Namespace)
– [JHOVE2] http://jhove2.org/terms/format/csv

• PRONOM (PUID) identifier (used by DROID)
– [PUID] x-fmt/18

• MIME type identifier
– [MIME] text/csv

• RFC identifier
– [RFC] RFC 4180

• Other identifiers in other namespaces (see org.jhove2.core.I8R$Namespace)

Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml

If you are not using DROID, then you MUST have the identifier(s) from the namespace of your identification tool

Format Definition: CSV Formal Identifiers in Spring

<!-- Comma Separated Values JHOVE2 identifier bean -->
<!-- (canonical identifier in JHOVE2 namespace) -->
<!-- Single constructor arg defaults to JHOVE2 namespace -->
<bean id="CommaSeparatedValuesIdentifier" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="http://jhove2.org/terms/format/csv"/>
</bean>

<!-- Comma Separated Values PUID identifier bean -->
<!-- (canonical identifier in PRONOM namespace, used by DROID identifier tool) -->
<bean id="CommaSeparatedValuesPUID1" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="x-fmt/18"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="PUID"/>
</bean>

Format Definition: CSV Formal Identifiers in Spring

<!-- Comma Separated Values MIME type aliasIdentifier bean -->
<bean id="CommaSeparatedValuesMIMEType" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="text/csv"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="MIME"/>
</bean>

<!-- Comma Separated Values RFC aliasIdentifier bean -->
<bean id="CommaSeparatedValuesRFC4180" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="RFC 4180"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="RFC"/>
</bean>

Format Definition: CSV Specifications

• For CSV, many variants
• Closest document to a format spec is RFC
– RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)

Format Definition: CSV Specification in Spring

<bean id="CsvSpec" class="org.jhove2.core.Document" scope="singleton">
    <constructor-arg type="java.lang.String"
                     value="RFC 4180 Common Format and MIME Type for CSV Files"/>
    <constructor-arg type="org.jhove2.core.Document$Type" value="Specification"/>
    <constructor-arg type="org.jhove2.core.Document$Intention" value="Authoritative"/>
    <property name="author" value="Y. Shafranovich"/>
    <property name="date" value="October 2005"/>
    <property name="identifiers">
        <list value-type="org.jhove2.core.I8R">
            <ref bean="CsvSpecificationURI"/>
        </list>
    </property>
    <property name="publisher" value="The Internet Engineering Task Force (IETF)"/>
</bean>

<!-- CSV RFC specification URI bean -->
<bean id="CsvSpecificationURI" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="http://www.ietf.org/rfc/rfc4180.txt"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="URI"/>
</bean>

Format Definition: CSV Format Bean Definition in Spring

<!-- Bean for the JHOVE2 Comma Separated Values Format Bean -->
<bean id="CommaSeparatedValuesFormat" class="org.jhove2.core.format.Format" scope="singleton">
    <constructor-arg type="java.lang.String" value="Comma Separated Values"/>
    <constructor-arg ref="CommaSeparatedValuesIdentifier"/>
    <constructor-arg type="org.jhove2.core.format.Format$Type" value="Format"/>
    <constructor-arg type="org.jhove2.core.format.Format$Ambiguity" value="Unambiguous"/>
    <property name="aliasIdentifiers">
        <set value-type="org.jhove2.core.I8R">
            <ref bean="CommaSeparatedValuesIdentifier"/>
            <ref bean="CommaSeparatedValuesPUID1"/>
            <ref bean="CommaSeparatedValuesMIMEType"/>
            <ref bean="CommaSeparatedValuesRFC4180"/>
        </set>
    </property>
    <property name="aliasNames">
        <set>
            <value>CSV</value>
            <value>DSV</value>
        </set>
    </property>
    <property name="specifications">
        <list value-type="org.jhove2.core.Document">
            <ref bean="CsvSpec"/>
        </list>
    </property>
</bean>

Format Module: Format Module Recipe

• Create package
• Place in inheritance hierarchy
• Enforce persistence requirements
• Populate static (non-user-configurable) fields
• Implement 2-argument constructor
• Create module’s Spring Bean
• Define reportable properties and associated methods
• Annotate reportable properties accessors
• Configure Message properties file
• Override parse() method
• Implement Validator interface methods

Format Module: Create Package

• Package
– org.jhove2.module.format.csv

Format Module: Inheritance Hierarchy

• Inheritance
– Extends org.jhove2.module.format.BaseFormatModule
– Implements org.jhove2.module.format.Validator

Format Module: Persistence requirements

• Module must be annotated with the Berkeley DB JE @Persistent annotation
• Module must have a 0-argument constructor
• Module should not contain any non-static nested (inner) classes
• Module field type must be
– “simple” Java type or
– Persistent type or
– Have a com.sleepycat.persist.model.PersistentProxy implementation created for it in package org.jhove2.persist.berkeleydpl.proxies

Annotate Reportable Properties

Format Module: Persistence requirements

import com.sleepycat.persist.model.Persistent;

// Berkeley DB JE annotation
@Persistent
public class CsvModule extends BaseFormatModule implements Validator {
    /**
     * No-arg constructor required by persistence layer
     */
    @SuppressWarnings("unused")
    private CsvModule() {
        this(null, null);
    }
    …

Format Module: Non-configurable fields

@Persistent
public class CsvModule extends BaseFormatModule implements Validator {
    /** CSV module version identifier. */
    public static final String VERSION = "n.n.n";
    /** CSV module release date. */
    public static final String RELEASE = "yyyy-mm-dd";
    /** CSV module rights statement. */
    public static final String RIGHTS = "Copyright YYYY by "
            + "Copyright holder name "
            + "Available under the terms of the BSD license.";
    /** Module validation coverage. */
    public static final Coverage COVERAGE = Coverage.Inclusive;
    /** CSV validation status. */
    protected Validity validity;

Format Module: Two-argument Constructor

/**
 * @param format
 * @param formatModuleAccessor
 */
public CsvModule(Format format, FormatModuleAccessor formatModuleAccessor) {
    super(VERSION, RELEASE, RIGHTS, format, formatModuleAccessor);
    this.validity = Validity.Undetermined;
}
…

Format Module: Spring Bean

<bean id="CSVModule" class="org.jhove2.module.format.csv.CsvModule" scope="prototype">
    <constructor-arg ref="CommaSeparatedValuesFormat"/>
    <!-- persistence manager bean ref; same for all format modules -->
    <constructor-arg ref="FormatModuleAccessor"/>
    <property name="developers">
        <list value-type="org.jhove2.core.Agent">
            <ref bean="CSVAgent"/>
        </list>
    </property>
</bean>

<!-- Module author bean -->
<bean id="CSVAgent" class="org.jhove2.core.Agent" scope="singleton">
    <constructor-arg type="java.lang.String" value="CSV Author Name"/>
    <!-- Personal or Corporate -->
    <constructor-arg type="org.jhove2.core.Agent$Type" value="Personal"/>
    <property name="URI" value="http://www.csvagent.org/"/>
</bean>

Format Module: Reportable Properties: CSV Base Definition

file        = [header CRLF] record *(CRLF record) [CRLF]
header      = name *(COMMA name)
record      = field *(COMMA field)
name        = field
field       = (escaped / non-escaped)
escaped     = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA       = %x2C
CR          = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE      = %x22 ;as per section 6.1 of RFC 2234 [2]
LF          = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF        = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA    = %x20-21 / %x23-2B / %x2D-7E

From RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)

Format Module: Reportable Properties: CSV Complications

• Delimiter character might be “;” instead of “,”
• EOL might be “\n” instead of “\r\n”
• EOL might be embedded in contents of field
• Different implementations escape the escape character differently
– “” vs. \”
• Last record in file might not have EOL
• All records might not have same number of fields
• Some implementations trim leading/trailing whitespace in escaped fields
• Some implementations allow characters other than ASCII-printable characters
• No syntactic way to detect if first record is “header” record
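Several of these complications can be absorbed by a small state machine. The following is a minimal sketch, not the CSV module's actual parser: `CsvSplitSketch` is a hypothetical name, the delimiter is passed in by the caller, and only the doubled-quote (“”) escape style is handled, not the backslash style.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvSplitSketch {
    /**
     * Splits CSV text into records of fields. Handles quoted fields,
     * doubled quotes as an escaped quote, delimiters and EOLs embedded in
     * quoted fields, "\n" or "\r\n" record ends, and a missing final EOL.
     */
    static List<List<String>> parse(String text, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // may be an embedded delimiter or EOL
                }
            } else if (c == '"') {
                inQuotes = true;
                pending = true;
            } else if (c == delimiter) {
                record.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single EOL
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) { // last record had no trailing EOL
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```

For example, `CsvSplitSketch.parse("a;b\n", ';')` returns one record with two fields. The sketch makes no attempt to decide whether the first record is a header, since that cannot be detected syntactically.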

Format Module: CSV Reportable Properties

• Delimiter character
• EOL character(s)
• Escape character
• Escape character sequence within field
• Number of records
• Number of fields
– First record
– Max
– Min
– Per record
• Field names from header row
• Count of records with embedded EOL
• Count of records with embedded escape characters
• Count of records with leading/trailing whitespace in escaped fields
• Does last record in file have EOL?
• Does file contain characters other than ASCII-printable ones?
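The count-style properties in this list fall out of a single pass over the parsed records. The sketch below assumes records have already been split into lists of fields; `CsvStatsSketch` is a hypothetical helper whose field names merely echo the property list, and it is not part of the JHOVE2 code base.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvStatsSketch {
    final int recordCount;
    final int fieldCountFirstRecord;
    final int fieldCountMax;
    final int fieldCountMin;
    final List<Integer> fieldsPerRecord = new ArrayList<>();

    /** Derives the field-count properties from already-parsed records. */
    CsvStatsSketch(List<List<String>> records) {
        int max = 0;
        int min = records.isEmpty() ? 0 : Integer.MAX_VALUE;
        for (List<String> record : records) {
            int n = record.size();
            fieldsPerRecord.add(n);
            max = Math.max(max, n);
            min = Math.min(min, n);
        }
        this.recordCount = records.size();
        this.fieldCountFirstRecord = records.isEmpty() ? 0 : records.get(0).size();
        this.fieldCountMax = max;
        this.fieldCountMin = min;
    }

    /** True when every record has the same number of fields. */
    boolean allRecordsSameFieldCount() {
        return fieldCountMax == fieldCountMin;
    }
}
```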

Format Module: CSV Reportable Properties

• Add significant properties as protected fields to module class
– Might need to create ancillary @Persistent class to reflect model of format
– Class should extend org.jhove2.core.reportable.AbstractReportable
• Create public accessors for those fields
• Annotate accessors with @ReportableProperty annotation

Format Module: Reportable Properties: Fields

// Add significant properties as protected fields
protected String delimiterCharacter;
protected String eolString;
protected String escapeCharacter;
protected String escapeCharacterSequenceWithinField;
protected int recordCount;
protected int fieldCountFirstRecord;
protected int fieldCountMax;
protected int fieldCountMin;
protected List<Integer> fieldsPerRecord;
protected List<String> fieldNames;
protected int recordsWithEmbeddedEolCount;
protected int recordsWithEmbeddedEscapeCharCount;
protected int recordsWithUntrimmedWhitespaceCount;
protected boolean eolInLastRecord;
protected boolean containsNonAsciiPrintableChars;

Format Module: Reportable Properties: Accessors

// Create public accessors for reportable properties fields
public String getDelimiterCharacter() {...}
public String getEolString() {...}
public String getEscapeCharacter() {...}
public String getEscapeCharacterSequenceWithinField() {...}
public int getRecordCount() {...}
public int getFieldCountFirstRecord() {...}
public int getFieldCountMax() {...}
public int getFieldCountMin() {...}
public List<Integer> getFieldsPerRecord() {...}
public List<String> getFieldNames() {...}
public int getRecordsWithEmbeddedEolCount() {...}
public int getRecordsWithEmbeddedEscapeCharCount() {...}
public int getRecordsWithUntrimmedWhitespaceCount() {...}
public boolean isEolInLastRecord() {...}
public boolean isContainsNonAsciiPrintableChars() {...}

Format Module: Reportable Properties: Annotation

public @interface ReportableProperty {
    /** Default description and reference value. */
    public static final String DEFAULT = "Not available.";

    /**
     * Property type: raw or descriptive. A raw property reports itself in the
     * exact form that was found in the source unit; a descriptive property
     * reports itself in a more human-readable form.
     */
    public enum PropertyType { Default, Raw, Descriptive }

    /**
     * Ordinal position of this property relative to all properties directly
     * defined in a class.
     */
    public int order() default 1;

    /**
     * Property reference, a citation to an external source document that
     * defines the property.
     */
    public String ref() default DEFAULT;

    /** Property type: raw or descriptive. */
    public PropertyType type() default PropertyType.Default;

    /** Property description. */
    public String value() default DEFAULT;
}

Format Module: Reportable Properties: Annotation

@ReportableProperty(order = 10,
        value = "Character used to delimit fields in record.",
        ref = "RFC 4180, Section 2, paragraph 4")
public String getDelimiterCharacter() {
    return delimiterCharacter;
}

Format Module: Reportable Message Properties

import org.jhove2.core.Message;

…
// (Reportable) Message properties
protected Message delimiterCharNotFoundMessage;

Format Module: Configure Message Properties File

##########################################################################
# Message templates for class org.jhove2.module.format.csv.CsvModule
##########################################################################

org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage=No occurrence of delimiter character {0} found in source

#

Added to file config/messages/jhove2_messages.properties

Format Module: Message Creation

Object[] messageArgs = new Object[] { csvDelimiterChar };

delimiterCharNotFoundMessage = new Message(
        Severity.WARNING,
        Context.OBJECT,
        "org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage",
        messageArgs,
        jhove2.getConfigInfo());
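The `{0}` placeholder in the template matches the positional syntax of `java.text.MessageFormat`. Whether JHOVE2's `Message` class uses `MessageFormat` internally is an assumption here; the sketch below only shows how a key, a template, and an argument array combine, using a hypothetical `MessageTemplateSketch` helper and an in-memory `Properties` object in place of the real properties file.

```java
import java.text.MessageFormat;
import java.util.Properties;

final class MessageTemplateSketch {
    /** Message key as it appears in the properties file on the slide. */
    static final String KEY =
        "org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage";

    /** Looks up the template for a key and substitutes positional arguments. */
    static String localize(Properties templates, String key, Object... args) {
        String template = templates.getProperty(key);
        if (template == null) {
            return key; // fall back to the raw key when no template is configured
        }
        return MessageFormat.format(template, args);
    }
}
```

Note that the key in the code must match the key in the properties file character for character, including the "Delimitor" spelling used on the slides.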

Format Module: Override parse() method

/**
 * Parse a source unit.
 * @param jhove2 JHOVE2 framework
 * @param source CSV source unit
 * @param input CSV source input
 * @return Number of bytes consumed
 * @throws EOFException
 * @throws IOException
 * @throws JHOVE2Exception
 */
@Override
public long parse(JHOVE2 jhove2, Source source, Input input)
        throws IOException, JHOVE2Exception {
    // where the real work happens
    // parse the Source (take care of those CSV complications!!)
    // populate reportable properties
    // construct any Error, Warning, or Info messages
    return 0;
}

Format Module: Override parse() method

Some Implementation Choices:

• Write from scratch
– TIFF
– WAV
– UTF-8
– ICC

• Wrap existing Java library
– XML
– Beware of persistence traps: inner classes, non-persistable fields

• Wrap existing non-Java library
– SGML
– Beware of performance hits (shell out) or memory leaks (JNI)

Format Module: Implement Validator methods

/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#getCoverage()
 */
@Override
public Coverage getCoverage() {
    return COVERAGE;
}

/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#isValid()
 */
@Override
public Validity isValid() {
    return this.validity;
}

Format Module: Implement Validator methods

/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#validate(org.jhove2.core.JHOVE2,
 *      org.jhove2.core.source.Source, org.jhove2.core.io.Input)
 */
@Override
public Validity validate(JHOVE2 jhove2, Source source, Input input)
        throws JHOVE2Exception {
    // Parse might already have set validity; if not, test
    // reportable field values and set
    if (this.validity.equals(Validity.Undetermined)) {
        //...
    }
    return this.validity;
}
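What goes in the elided branch depends on the format. As one illustrative possibility for CSV, validity could be derived from the field-count properties. The sketch below is an assumption throughout: the nested `Validity` enum is a stand-in (only `Undetermined` appears on the slides; `True` and `False` are assumed names), and "all records have the same number of fields" is just one example rule.

```java
final class ValiditySketch {
    /** Hypothetical stand-in for the validity states; not the JHOVE2 enum. */
    enum Validity { True, False, Undetermined }

    /**
     * Keeps a decision already made during parsing; otherwise decides from
     * the reportable field counts.
     */
    static Validity decide(Validity current, int recordCount,
                           int fieldCountMin, int fieldCountMax) {
        if (current != Validity.Undetermined) {
            return current; // parse() already set validity
        }
        if (recordCount == 0) {
            return Validity.Undetermined; // nothing to judge
        }
        return fieldCountMin == fieldCountMax ? Validity.True : Validity.False;
    }
}
```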

Format Module: Unit Test

• JUnit 4
• Important to include both good and bad sample files

Format Module: Unit Test

package org.jhove2.module.format.csv;

import static org.junit.Assert.*;

import org.junit.Before;
import org.junit.Test;

public class CsvModuleTest {

    @Before
    public void setUp() throws Exception {
    }

    @Test
    public void testValidate() {
        fail("Not yet implemented");
    }

    @Test
    public void testParse() {
        fail("Not yet implemented");
    }
}

Format Module: Unit Test: Where it Goes

Unit tests:
    src/test/java/org/jhove2/module/format/csv

Sample (test) files:
    src/test/resources/examples/csv

Spring beans for unit tests:
    src/test/resources/config/module/format/csv
– Update Spring configuration file filepaths-config.xml with base path of your sample file

<bean id="csvDirBasePath" class="java.lang.String">
    <constructor-arg type="java.lang.String" value="examples/csv/"/>
</bean>

Format Module Artifacts: What’s Left?

• Source code
• Configuration files
• Sample (test) files
• Documents

Format Module Artifacts: Configuration Files

• Spring IOC Bean XML configuration files
– For Module
– For unit test as needed
– For assessment

• Messages properties file additions if needed

• Properties files
– Displayer
– Units of measure
– Module-specific

Format Module: CSV Assessment Criteria

• Delimiter character=?
• EOL character(s)=?
• Escape character=?
• Escape character sequence=?
• All records have same number of columns?
• Contains no escaped fields with untrimmed whitespace?
• Contains no characters other than ASCII-printable?
• Contains no fields with embedded EOL?

See Richard Anderson’s workshop this afternoon!!!!

Configuration Files: “We’ve got an app for that!”

• Displayer
– jhove2_dpfg.cmd (Windows)
– jhove2_dpfg.sh (Unix)

• Units of measure
– jhove2_upfg.cmd (Windows)
– jhove2_upfg.sh (Unix)

Configuration Files: Displayer Properties

USAGE:
    jhove2_dpfg.cmd <fully-qualified-classname> <output-directory-path>

Example:
    jhove2_dpfg.cmd org.jhove2.module.format.csv.CsvModule c:\props

Command line output:
    Succesfully created displayer property file for class org.jhove2.module.format.csv.CsvModule
    File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_displayer.properties

Configuration Files: Editable File

# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EolString Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacterSequenceWithinField Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldNames Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldsPerRecord Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isEolInLastRecord Always | Never | IfTrue | IfFalse
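The property identifiers in this file follow a visible pattern: a fixed prefix, the class name with dots turned into slashes, and the accessor name with a leading "get" removed (a leading "is" appears to be kept). The sketch below reproduces that pattern as inferred from the listing; it is not the generator utility's actual code, and `PropertyIdSketch` is a hypothetical name.

```java
final class PropertyIdSketch {
    static final String PREFIX = "http://jhove2.org/terms/property/";

    /** Builds a property identifier from a class name and an accessor name. */
    static String propertyId(String className, String accessorName) {
        // "get" is dropped; "is" is kept, matching the listing above.
        String name = accessorName.startsWith("get")
                ? accessorName.substring(3)
                : accessorName;
        return PREFIX + className.replace('.', '/') + "/" + name;
    }

    /** Builds the relative path of the generated displayer properties file. */
    static String displayerFile(String className) {
        return className.replace('.', '/') + "_displayer.properties";
    }
}
```

In the properties file itself the colon after "http" is written as `\:` because an unescaped colon would end the key in Java properties syntax.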

Configuration Files: Editable File

# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse

Configuration Files: Editable File

# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord IfPositive

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars IfTrue
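The directive semantics spelled out in the file's comments translate directly into a pair of predicates. This is a sketch of those stated rules only, not the JHOVE2 displayer implementation; `DisplaySketch` is a hypothetical name, and treating a directive of the wrong kind (for example `IfTrue` on a number) as "show" is an assumption.

```java
final class DisplaySketch {
    /** Directive names as listed in the generated properties file. */
    enum Visibility {
        Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
        IfNonZero, IfPositive, IfTrue, IfZero, Never
    }

    /** Decides whether a numeric property value should be displayed. */
    static boolean isVisible(Visibility v, long value) {
        switch (v) {
            case Always:        return true;
            case Never:         return false;
            case IfNegative:    return value < 0;   // ...,-2,-1
            case IfNonNegative: return value >= 0;  // 0,1,2,...
            case IfPositive:    return value > 0;   // 1,2,3,...
            case IfNonPositive: return value <= 0;  // ...,-2,-1,0
            case IfZero:        return value == 0;
            case IfNonZero:     return value != 0;
            default:            return true; // boolean-only directive on a number (assumption)
        }
    }

    /** Decides whether a boolean property value should be displayed. */
    static boolean isVisible(Visibility v, boolean value) {
        switch (v) {
            case Never:   return false;
            case IfTrue:  return value;
            case IfFalse: return !value;
            default:      return true; // Always, or a numeric directive on a boolean (assumption)
        }
    }
}
```

With the edited file above, a `FieldCountFirstRecord` of 0 would be hidden by `IfPositive`, and `isContainsNonAsciiPrintableChars` would be shown only when it is true.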

Configuration Files: Units of Measure Properties

USAGE:
    jhove2_upfg.cmd <fully-qualified-classname> <output-directory-path>

Example:
    jhove2_upfg.cmd org.jhove2.module.format.csv.CsvModule c:\props

Command line output:
    Succesfully created unit property file for class org.jhove2.module.format.csv.CsvModule
    File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_unit.properties

Configuration Files: Editable File

# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount

Configuration Files: Editable File

# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.

http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount record

Format Module Artifacts: What’s Left?

• Source code
• Configuration files
• Sample (test) files
• Documents
– Format Module Specification Document
• “We’ve got an app for (part of) that!”

Documentation: Specification Sections

1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes

Documentation: Minimal template edit

1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes

Documentation: Sections from Tabular Data

1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes

Documentation: Write “By Hand”

1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes

Documentation: Module Specification Recipe

• Create module specification from Word Template
• Generate tabular information (reportable properties)
• Use Word macro to format tabular information for pasting into module specification
• Complete other sections
• Add specification document to JHOVE2 wiki

Documentation: Create Tabular Data

• Generate tabular information (reportable properties) for format module specification
– jhove2_doc.cmd (Windows)
– jhove2_doc.sh (Unix)

Documentation: Create Tabular Data

USAGE:
    jhove2_doc.cmd <fully-qualified-classname> <output-directory-path>

Documentation: Create Tabular Data

• Outputs
– CsvModule_id.txt (Section 2: Identification)
– CsvModule_ref.txt (Section 3: References)
– CsvModule_Reportable_properties.txt (Section 7: Reportable properties)

Documentation: Format tabular data with Macro

• Edit the output file in WordPad or NotePad (to save with MS line endings)
• Follow instructions in Macro file to create formatted text
• Copy and paste in Specification document

Documentation: Create Tabular Data

In generated file:

Property DelimiterCharacter
Identifier http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
Type java.lang.String
Description Character used to delimit fields in record.
Reference RFC 4180, Section 2, paragraph 4

Documentation: Create Tabular Data

DelimiterCharacter Property
    Identifier    http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
    Type          java.lang.String
    Description   Character used to delimit fields in record.
    Reference     RFC 4180, Section 2, paragraph 4


Questions?

http://jhove2.org

[email protected]@listserv.ucop.edu

CDL: Stephen Abrams, Patricia Cruse, John Kunze, Isaac Rabinovitch, Marisa Strong, Perry Willett

Stanford University: Richard Anderson, Tom Cramer, Hannah Frost

Portico: John Meyer, Sheila Morrissey

Library of Congress: Martha Anderson, Justin Littman

With help from: Walter Henry, Nancy Hoebelheinrich, Keith Johnson, Evan Owens

Advisory Board: Deutsche Nationalbibliothek, DSpace / MIT, Ex Libris, Fedora Commons / Rutgers, Florida Center for Library Automation, Harvard University, Koninklijke Bibliotheek, National Archives (UK), National Archives (US), National Library of Australia, National Library of New Zealand, Bibliothèque nationale de France (BnF), Planets / Universität zu Köln, Tessella

