NANETH TIFFs
“TIFF stands for Thousands of Incompatible File Formats”
…0000 0000 1000 0000…
…0000 0000 0100 0000…
…0000 0000 0100 0000…
…0000 0000 1000 0000…
OR
What does “valid” mean?
• Well-formed? – These images are!
• Verified by external tool? – These images are!
• Confirmed as matching spec? – Who knows?
• Internally consistent? – These images are definitely NOT!
How do we detect it?
$ exiftool -n -if '$FileSize# * 8 < $ImageWidth * $ImageHeight * $BitsPerSample' -p '$filename is TOO SMALL - $FileSize - ($ImageWidth*$ImageHeight*$BitsPerSample)' *.tif
NL-HaNA_2.24.01.09_0_901-2649.tif is TOO SMALL - 9952834 - (3454*2877*16)
NL-HaNA_2.24.01.09_0_901-2809.tif is TOO SMALL - 9952834 - (3454*2877*16)
NL-HaNA_2.24.01.09_0_901-3431.tif is TOO SMALL - 9809944 - (2859*3425*16)
NL-HaNA_2.24.01.09_0_901-4419.tif is TOO SMALL - 9807680 - (3425*2859*16)
NL-HaNA_2.24.01.09_0_901-4451.tif is TOO SMALL - 9807680 - (3425*2859*16)
NL-HaNA_2.24.01.09_0_901-5197.tif is TOO SMALL - 9809944 - (2859*3425*16)
NL-HaNA_2.24.01.09_0_901-5556.tif is TOO SMALL - 9807680 - (3425*2859*16)
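BUT: As it stands, this only works on grayscale TIFFs (for RGB, BitsPerSample holds one value per channel, e.g. "16 16 16", so the multiplication no longer makes sense)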
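The same size check can be sketched in Python in a form that also covers multi-channel images, by summing the per-channel bit depths. This is a minimal sketch, not the method used for the slides: the function names are hypothetical, the values are assumed to have been read from the TIFF tags already (for example with exiftool), and the pixel data is assumed to be uncompressed, since a compressed TIFF can legitimately be smaller than this bound.

```python
def min_pixel_bytes(width, height, bits_per_sample):
    """Lower bound on the bytes needed for uncompressed pixel data.

    bits_per_sample is a list with one entry per channel,
    e.g. [16] for grayscale or [8, 8, 8] for RGB (hypothetical input format).
    """
    total_bits = width * height * sum(bits_per_sample)
    return (total_bits + 7) // 8  # round up to whole bytes


def is_too_small(file_size, width, height, bits_per_sample):
    """True if the file cannot hold the pixel data its own tags describe."""
    return file_size < min_pixel_bytes(width, height, bits_per_sample)


# First file from the listing above: 9952834 bytes, 3454 x 2877, declared 16-bit.
print(min_pixel_bytes(3454, 2877, [16]))        # 19874316
print(is_too_small(9952834, 3454, 2877, [16]))  # True
print(is_too_small(9952834, 3454, 2877, [8]))   # False
```

The file is roughly half the size its declared 16 bits per sample would require, but comfortably large enough for 8 bits per sample.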
How do we FIX it?
$ exiftool -BitsPerSample=8 NL-HaNA_2.24.01.09_0_901-5556.tif
AGAIN: As it stands, this only works on grayscale TIFFs
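Before overwriting BitsPerSample, it may be worth confirming that 8 is the value the file size actually supports. The sketch below lists which candidate bit depths would fit in a file of a given size. It is an illustration under assumptions: the helper name and the candidate list are hypothetical, the data is assumed to be uncompressed, and a depth that "fits" is only plausible, not proven.

```python
def plausible_bit_depths(file_size, width, height,
                         samples_per_pixel=1, candidates=(8, 16)):
    """Return the candidate bit depths whose uncompressed pixel data fits in the file."""
    fits = []
    for bits in candidates:
        # Bytes needed for the pixel data at this depth, rounded up.
        needed = (width * height * samples_per_pixel * bits + 7) // 8
        if needed <= file_size:
            fits.append(bits)
    return fits


# NL-HaNA_2.24.01.09_0_901-5556.tif: 9807680 bytes, 3425 x 2859, grayscale.
print(plausible_bit_depths(9807680, 3425, 2859))  # [8]
```

For this file only 8 bits per sample fits, which is consistent with the value written by the exiftool command above.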
What do we take away?
• Don’t always assume your vendors/digitisers are doing the job right.
• Don’t always assume that “successful validation” is meaningful. (Also: learn the limitations of your tools.)
• The only thing better than double-checking is triple-checking.
• KNOW WHAT YOU ARE “PRESERVING”!
One last time…