Seminar by Maurizio Agelli, 20-09-2012


Archiving and Cataloging Digital Photographs

Maurizio Agelli, CRS4

{ agelli@crs4.it }

September 20th 2012, 5.30pm

Aula Magna Facoltà di Architettura - Via Corte d'Appello - Cagliari

Point de vue du Gras, Nicéphore Niépce, 1826 (from Wikimedia Commons)

Boulevard du Temple, Louis Daguerre, 1838 (from Wikimedia Commons)

The first photograph was taken less than 200 years ago ...

How many photos have ever been taken?

[ source: Jonathan Good, 2011 - 1000memories.com ]

500 to 800 billion photos taken in 2011 [source: Observatoire des Professions de l'Image]

Number of photos ever shot (up to 2011): ~3.5 x 10^12

Presentation Outline

1) Archiving as part of the photographic workflow

2) Describing photographs: metadata

3) Organizing images in catalogs

4) Ensuring long-term storage: backup and migration

5) An overview of image archiving tools

6) A Digital Asset Management platform developed at CRS4

- 1 -

Archiving as part of the photographic workflow

Photo Archive

A collection of images kept in secure, long-term storage.

[ dpBestflow.org ]

Photo by Seeweb - CC BY-SA 2.0

Photo by M. Agelli - CC BY-SA 2.0

Building a digital photo archive involves many decisions ...

○ File formats
○ Metadata
○ File naming
○ Folder structure
○ Catalog organization
○ Backup policies
○ Migration policies
○ Archiving platform
○ What to archive?

... which strongly depend on the photographic workflow

A general workflow

Capture Ingestion Working Publishing

Archive

No single workflow suits all photographers and all clients [UPDIG]

Workflow decisions are determined by production volume, turnaround, image quality requirements, regulations, costs, etc.

A general workflow, more in detail

Capture (camera)
All camera-related stuff

Ingestion (computer): focus on volume and speed
- Image transfer
- File renaming
- Add bulk metadata
- Batch editing
- Format conversion

Working: focus on quality
- Image editing
- Metadata editing
- Create derivative work

Publishing
- Export images
- Print images
- Publish to web

Archive: store, search, organize, ...
Digital Asset Management Platform

File formats / 1

Camera sensor → in-camera processing → RAW, JPEG, (DNG), (TIFF)

Film → scanner → TIFF, JPEG, (DNG)

RAW
Many RAW formats (>200). Proprietary, undocumented. Encodes the values from the camera sensor before demosaicing (12-16 bit/pixel, 1 color/pixel). Lossless. May be compressed.

TIFF
Open standard. 8, 16 or 32 bits per channel, RGB. Lossless, big file size! Possible PSD replacement (supports layers).

DNG (DIGITAL NEGATIVE)
Open standard, created by Adobe. Intended to replace RAW, but still has limited adoption in the industry.

File formats / 2

JPEG
Open standard. Compressed, lossy. 8 bit RGB: suitable for display, not good for editing.

JPEG 2000
Better compression than JPEG (wavelet transform vs. discrete cosine transform). 8 or 16 bit RGB. Lossless or lossy. Many extra features: regions of interest, progressive decoding, multi-resolution decoding.

Example: 6 Mpixel image (Nikon D40)
○ TIFF, 48 bit/pixel, uncompressed: ~35 MB
○ NEF, 12 bit/pixel, compressed: ~5.3 MB
○ DNG, 12 bit/pixel, compressed: ~5 MB
○ JPEG, 90% quality: ~0.6 MB
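The uncompressed figure above follows from simple arithmetic: pixels x bits per pixel / 8. A minimal Python sketch, assuming the D40 writes 3008 x 2000 pixel images (the slide only says "6 Mpixel"); compressed formats such as NEF, DNG and JPEG fall below this bound by an amount that depends on the image content.

```python
def uncompressed_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of the pixel payload alone, ignoring headers and embedded metadata."""
    return width * height * bits_per_pixel // 8


def to_mib(size_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return size_bytes / 2**20


# Assumption: the Nikon D40 writes 3008 x 2000 pixel images (about 6 Mpixel).
WIDTH, HEIGHT = 3008, 2000

tiff_rgb16 = uncompressed_size_bytes(WIDTH, HEIGHT, 48)  # 3 channels x 16 bit
raw_12bit = uncompressed_size_bytes(WIDTH, HEIGHT, 12)   # 1 colour sample per pixel

print(f"TIFF, 48 bit/pixel: {to_mib(tiff_rgb16):.1f} MiB")                     # 34.4 MiB
print(f"RAW, 12 bit/pixel, before compression: {to_mib(raw_12bit):.1f} MiB")  # 8.6 MiB
```

The TIFF result (about 34.4 MiB) matches the ~35 MB on the slide; a 12-bit RAW would take about 8.6 MiB before compression, which NEF and DNG bring down to roughly 5 MB.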

File formats and image editing

[Figure: editing paths from CAPTURE (camera, RAW) through INGESTION (RAW or DNG; TIFF or DNG) and WORKING (parametric editing, or raster editing saved as TIFF or DNG) to EXPORT as JPG]

Parametric Image Editing
Image data are not modified. The source file is preserved. Editing is saved as a list of rules which are applied at rendering time (e.g. Lightroom, Aperture).

Raster Image Editing
Image pixels are modified. A new file containing the edited image must be saved in order to preserve the original (e.g. Photoshop, Picture Window Pro).
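The difference between the two editing models can be shown with a toy grayscale image. In this minimal Python sketch, a parametric image keeps its source untouched and stores only a list of rules that are applied when rendering, while a raster edit overwrites the pixels. The operation names ("exposure", "offset") and the data layout are hypothetical and do not reflect how Lightroom, Aperture or Photoshop store edits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Toy grayscale image: rows of 0..255 values (a stand-in for real pixel data).
Pixels = List[List[int]]


def _clamp(v: float) -> int:
    """Round and clip a value to the 0..255 range."""
    return max(0, min(255, round(v)))


# Hypothetical edit operations; each returns a NEW grid and leaves its input alone.
OPERATIONS: Dict[str, Callable[[Pixels, float], Pixels]] = {
    "exposure": lambda px, factor: [[_clamp(v * factor) for v in row] for row in px],
    "offset": lambda px, delta: [[_clamp(v + delta) for v in row] for row in px],
}


@dataclass
class ParametricImage:
    """Parametric editing: keep the source, store edits as a list of rules."""
    source: Pixels
    edits: List[Tuple[str, float]] = field(default_factory=list)

    def add_edit(self, op: str, value: float) -> None:
        if op not in OPERATIONS:
            raise ValueError(f"unknown operation: {op}")
        self.edits.append((op, value))  # only the rule is saved, no pixel changes

    def render(self) -> Pixels:
        """Apply all rules, in order, to a copy of the source at rendering time."""
        pixels = [row[:] for row in self.source]
        for op, value in self.edits:
            pixels = OPERATIONS[op](pixels, value)
        return pixels


def raster_edit(pixels: Pixels, delta: int) -> None:
    """Raster editing: the pixel values themselves are overwritten in place."""
    for row in pixels:
        for i, v in enumerate(row):
            row[i] = _clamp(v + delta)


original = [[100, 200], [50, 250]]
img = ParametricImage(source=original)
img.add_edit("exposure", 1.2)
img.add_edit("offset", -10)
print(img.render())  # [[110, 230], [50, 245]]
print(original)      # [[100, 200], [50, 250]]: the source is untouched
```

With the raster model, preserving the original means copying the grid (or, in practice, saving the result as a new file) before calling `raster_edit`.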


File formats decision tree

[Figure: decision tree starting from CAMERA or SCANNER; capture formats JPG, RAW, DNG or TIFF; ingestion and working formats JPG, TIFF, RAW or DNG; every path ends with JPG for publishing]

Note: unusual decision paths have been omitted

Which files to archive?

[Figure: the Capture, Ingestion, Working and Publishing stages above the Archive]

○ ORIGINAL FILES
○ MASTER FILES
○ DERIVATIVE FILES

- 2 -

Metadata

The importance of metadata

"An image is worth 1000 words", but ...

... there are questions which only words can answer:

When was it shot?

... and where?

Who are those people?

Who took this photograph?

Can I use it freely?

Photo by Maurizio Agelli - CC BY-SA 2.0

Metadata

Information about content.

Photo by M. Agelli - CC BY-SA 2.0

A more precise definition

METADATA

"Structured encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities"

[source: American Library Association]

Image metadata is nothing new ...

Photo by anyjazz65 [CC BY-NC 2.0] http://www.flickr.com/photos/49024304@N00/

Where can digital image metadata be written?

○ inside the image file (metadata + image data)
○ in a sidecar file (metadata kept separate from the image data)
○ in a database
○ in an online registry
○ in the file name, e.g. d40-20120920-DSC_0153-edited.jpg (camera - date - id - derived)
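Metadata stored in the file name is only useful if the naming convention is strict enough to be parsed back. A minimal Python sketch, assuming a hypothetical convention modelled on the example above (camera, date, original id, optional derivation tag, separated by hyphens); the regular expression and function names are illustrative, not a standard.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical convention modelled on the slide's example:
#   <camera>-<YYYYMMDD>-<original id>[-<derivation tag>].<extension>
# Hyphens separate fields, so no field may contain a hyphen.
NAME_RE = re.compile(
    r"^(?P<camera>[a-z0-9]+)-(?P<date>\d{8})-(?P<id>[A-Za-z0-9_]+)"
    r"(?:-(?P<derived>[a-z0-9]+))?\.(?P<ext>[A-Za-z0-9]+)$"
)


@dataclass
class NameInfo:
    camera: str
    shot_on: date
    original_id: str
    derived: Optional[str]  # None for original files
    extension: str


def parse_name(filename: str) -> NameInfo:
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"name does not follow the convention: {filename}")
    d = m["date"]
    return NameInfo(
        camera=m["camera"],
        shot_on=date(int(d[:4]), int(d[4:6]), int(d[6:])),  # rejects invalid dates
        original_id=m["id"],
        derived=m["derived"],
        extension=m["ext"],
    )


def build_name(camera: str, shot_on: date, original_id: str,
               extension: str, derived: Optional[str] = None) -> str:
    parts = [camera, shot_on.strftime("%Y%m%d"), original_id]
    if derived:
        parts.append(derived)
    return "-".join(parts) + "." + extension


info = parse_name("d40-20120920-DSC_0153-edited.jpg")
print(info.camera, info.shot_on, info.original_id, info.derived)
# d40 2012-09-20 DSC_0153 edited
```

Keeping the camera's original id (DSC_0153) in every derived name makes it possible to find all renditions of a shot even without a catalog.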

Image metadata standards

EXIF, IPTC, XMP, MPEG-7, DICOM, PLUS, Creative Commons, Dublin Core

IPTC IIM - Information Interchange Model
Created in 1991 by the International Press Telecommunications Council. Adobe defined the mechanism for embedding IPTC IIM metadata in image files (1994).
Driven by the NEWS INDUSTRY.
Focused on high-level properties (description, geo location, ...).
Cannot be extended.

EXIF - Exchangeable Image File Format
Created in 1995 by the Japan Electronic Industries Development Association.
Driven by CAMERA MANUFACTURERS.
Focused on low-level properties (camera settings, geo coordinates, date/time, ...).
Cannot be extended.

[Figure: image file containing image data, EXIF and IPTC IIM]

XMP - Extensible Metadata Platform
Open standard, created by Adobe
○ defines a data model and a serialization model (RDF/XML)
○ also covers video, audio, text
○ structured as a set of schemas
○ can be extended with new metadata schemas
○ multi-lingual qualifiers
○ can be serialized and stored in most file formats (not in RAW!)
○ widely supported by the industry

[Figure: image file containing image data, EXIF, IPTC IIM and XMP; the XMP block groups schemas such as Legacy Metadata, Dublin Core, XMP Basic, Rights, Media Management, Photoshop, Camera RAW, EXIF, IPTC Core, IPTC Extension, ...]

A timeline of image standards

[Figure: timeline 1986-2012 marking: first DSLR (Kodak DCS-100), professional DSLRs, EXIF (first release), JPEG (first release), Kodak Photo CD, TIFF (first release), IPTC IIM, IPTC headers (Adobe), XMP (first release), consumer DSLRs]

A quick look inside XMP: >200 properties, plus all EXIF and IPTC properties

○ TITLE (dc:title)
○ DESCRIPTION (dc:description)
○ DESCRIPTION WRITER (photoshop:CaptionWriter)
○ RATING (xmp:Rating)
○ KEYWORDS (dc:subject)
○ GEO COORDINATES (exif:GPSLatitude, exif:GPSLongitude)
○ LOCATION (photoshop:Country, photoshop:State, photoshop:City, ...)
○ AUTHOR (dc:creator, tiff:Artist)
○ RIGHTS (dc:rights)
○ ...
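To see how a few of these properties look once serialized as RDF/XML, here is a minimal Python sketch that builds an XMP sidecar packet with the standard library's ElementTree. The namespace URIs are the published ones for x, rdf, dc, xmp and photoshop; the function and its parameters are hypothetical, and real workflows would normally rely on a dedicated tool such as ExifTool. Packets embedded in image files are usually also wrapped in `<?xpacket?>` processing instructions, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URIs used by XMP (Adobe), RDF (W3C) and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # emit dc:, xmp:, ... instead of ns0:, ns1:

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix: str, local: str) -> str:
    """Qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS[prefix]}}}{local}"


def xmp_sidecar(title: str, keywords: List[str], rating: int, city: str) -> str:
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: rdf:Alt with xml:lang qualifiers.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:subject (keywords) is an unordered rdf:Bag.
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for kw in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = kw

    # Simple-valued properties.
    ET.SubElement(desc, q("xmp", "Rating")).text = str(rating)
    ET.SubElement(desc, q("photoshop", "City")).text = city
    return ET.tostring(xmpmeta, encoding="unicode")


print(xmp_sidecar("Aula Magna", ["architecture", "seminar"], 4, "Cagliari"))
```

Written next to a RAW file (e.g. as DSC_0153.xmp), such a packet carries the metadata that cannot be stored inside the RAW itself.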

A quick look inside XMP: date/time metadata

○ The original painting (~1507): Iptc4xmpExt:AODateCreated
○ An old postcard (1925): photoshop:DateCreated
○ The digital representation of the postcard (2008): xmp:CreateDate
○ The archived image (metadata last edited in 2012): xmp:MetadataDate

Extending XMP: Creative Commons

CC provides a legal and technical infrastructure to help people share knowledge and creativity.

Photo by Creative Commons - CC BY 3.0

CC defines a set of properties that allow authors to specify under which conditions their content can be distributed and used.

CC recommends XMP for embedding CC properties inside resources.

Extending XMP: PLUS

Picture Licensing Universal System
Non-profit organization whose mission is to simplify and facilitate the communication and management of image rights.
PLUS Registry
○ unique ids for creators, rights holders, images, ...
○ access to rights information and other metadata
PLUS License Data Format (LDF)
○ metadata schema for embedding image licenses
○ 88 properties
○ dedicated XMP PLUS namespace

Extending XMP: PRISM

Publishing Requirements for Industry Standard Metadata
Defined by IDEAlliance, a global community of content and media creators.
PRISM Metadata for Images provides information about:
○ objects pictured (manufacturer, model, description, ...)
○ slideshows (sequences of images)
○ shooting info (viewpoint, season, visual technique, ...)
PRISM Advertising Metadata provides information about the usage of the image in an advertising campaign.
PRISM defines dedicated XMP namespaces: pmi and pam.

Extending XMP: Area Tagging

Metadata Working Group

○ XMP-MP Schema for face tags○ adopted by Picasa

Microsoft has created a new XMP schema for tagging people

Handling social tagging: a research issue

[ source: Jonathan Good, 2011 - 1000memories.com ]

140 billion photos in Facebook (up to 2011)

- 3 -

Organizing images in catalogs

Picture by Henry Trotter, 2005 - Source: Wikimedia Commons

catalog (noun)
a list of the contents of a library or a group of libraries, arranged according to any of various systems
[ Dictionary.com ]

catalog (transitive verb)
1. to make an itemized list of
2. to classify (a book or publication, for example) according to a categorical system
[ Dictionary.com ]

Photo Cataloging Software

Prime goals of photo cataloging software:
○ provide secure, long-term storage
○ find the images when you need them
○ interoperate with other tools of the same ecosystem (in the present as well as in the future)

Photo cataloging software falls into the broad domain of Digital Asset Management. Let's try grabbing some definitions ...

An ecosystem is made up of many parts that must not only coexist but also work with each other to survive. When all the elements work in concert, the system can thrive.(Peter Krogh, The DAM Book)

Digital Asset Management

a way of keeping an overview of your digital files and make sure they don't get lost or altered unintentionally [J.Jacobsen, T.Schlenker, L.Edwards, Implementing a DAM System, Elsevier]

the protocol for downloading, renaming, backing up, rating, grouping, archiving, optimizing, maintaining, thinning, and exporting files [P. Krogh, The DAM Book, O'Reilly]

a complete toolbox to the author, publisher, and the end users of the media to efficiently utilize the assets [D.Austerberry, Digital Asset Management 2nd edition, Focal Press]

a term open to many definitions ...

... and whose scope goes beyond the domain of photography

Digital Libraries

Creative Industries

Publishing

Enterprise Content Management

Core functionalities of a photo catalog / DAM software (the two terms are used interchangeably here)

○ Import images
○ Harvest metadata
○ Manage metadata in a database (+ index for search)
○ Synchronize metadata
○ Export images
○ Organize photos with hierarchical keywords
○ Manage original, master and derivative files as different renditions of the same item

Extra functionalities such as file rename, raw converter, editor, publishing tools may be provided too.

Harvesting and synchronizing metadata

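[Figure: image files in the image storage carry image data plus EXIF, IPTC IIM, XMP, ... blocks. The catalog harvests metadata from the files into its database and synchronizes metadata back to the files; images enter and leave the storage through import and export; users work through the user interface.]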
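Harvesting and synchronizing can be sketched in a few lines of Python, using the standard library's sqlite3 as the catalog database. Everything here is an assumption for illustration: the metadata blocks are taken as already decoded into dictionaries with common field names, XMP is given precedence over IPTC IIM and EXIF, and the table keeps one row per field. The Metadata Working Group guidelines define more detailed reconciliation rules, which this sketch does not implement.

```python
import sqlite3
from typing import Dict, List

# Hypothetical precedence: XMP wins over IPTC IIM, which wins over EXIF.
PRECEDENCE = ("xmp", "iptc", "exif")


def harvest(blocks: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge the metadata blocks read from one file into a single record."""
    merged: Dict[str, str] = {}
    for source in reversed(PRECEDENCE):        # lowest priority first
        merged.update(blocks.get(source, {}))  # higher priority overwrites
    return merged


def open_catalog() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (path TEXT, field TEXT, value TEXT, "
               "PRIMARY KEY (path, field))")
    return db


def store(db: sqlite3.Connection, path: str, record: Dict[str, str]) -> None:
    """Insert or update the catalog rows for one file."""
    db.executemany(
        "INSERT OR REPLACE INTO metadata (path, field, value) VALUES (?, ?, ?)",
        [(path, k, v) for k, v in record.items()],
    )


def search(db: sqlite3.Connection, field: str, value: str) -> List[str]:
    rows = db.execute("SELECT path FROM metadata WHERE field = ? AND value = ? "
                      "ORDER BY path", (field, value))
    return [r[0] for r in rows]


def pending_sync(db: sqlite3.Connection, path: str,
                 file_blocks: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Catalog values that differ from what is currently in the file."""
    in_file = harvest(file_blocks)
    rows = db.execute("SELECT field, value FROM metadata WHERE path = ?", (path,))
    return {f: v for f, v in rows if in_file.get(f) != v}


blocks = {"exif": {"Artist": "M. Agelli", "DateTimeOriginal": "2012:09:20 17:30:00"},
          "xmp": {"Artist": "Maurizio Agelli"}}
db = open_catalog()
store(db, "d40-20120920-DSC_0153.nef", harvest(blocks))
store(db, "d40-20120920-DSC_0153.nef", {"Artist": "Maurizio Agelli, CRS4"})  # edit in catalog
print(pending_sync(db, "d40-20120920-DSC_0153.nef", blocks))
# {'Artist': 'Maurizio Agelli, CRS4'}
```

The fields returned by `pending_sync` are the ones a synchronization step would write back into the file (or its sidecar).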

Hierarchical keywords

Photo by Isabelle Palatin - CC BY-SA 2.0

○ typically mapped to dc:subject
○ no semantic rules for describing the hierarchy; special characters are used, e.g.: Organizations|Industry|ACME
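Because the hierarchy lives only in a separator convention, mapping keyword paths onto the flat dc:subject bag loses the parent-child links. A minimal Python sketch with hypothetical function names, assuming "|" as the separator shown above: flattening yields plain terms, and rebuilding the tree requires the full paths, which is why some tools store them in a separate property (Lightroom, for instance, writes them to lr:hierarchicalSubject).

```python
from typing import List

SEPARATOR = "|"  # the convention shown on the slide; other tools may differ


def split_path(keyword_path: str) -> List[str]:
    """'Organizations|Industry|ACME' -> ['Organizations', 'Industry', 'ACME']."""
    return [t.strip() for t in keyword_path.split(SEPARATOR) if t.strip()]


def flatten(keyword_paths: List[str], include_ancestors: bool = True) -> List[str]:
    """Derive flat dc:subject terms; order is kept, duplicates are dropped."""
    terms: List[str] = []
    for path in keyword_paths:
        parts = split_path(path)
        for term in (parts if include_ancestors else parts[-1:]):
            if term not in terms:
                terms.append(term)
    return terms


def all_prefixes(keyword_paths: List[str]) -> List[str]:
    """Every node of the keyword tree, as a full path from the root."""
    nodes: List[str] = []
    for path in keyword_paths:
        parts = split_path(path)
        for i in range(1, len(parts) + 1):
            prefix = SEPARATOR.join(parts[:i])
            if prefix not in nodes:
                nodes.append(prefix)
    return nodes


paths = ["Organizations|Industry|ACME", "Organizations|Research|CRS4"]
print(flatten(paths))         # ['Organizations', 'Industry', 'ACME', 'Research', 'CRS4']
print(flatten(paths, False))  # ['ACME', 'CRS4']
```

Once only ['ACME', 'CRS4'] is written to dc:subject, nothing in the file says that both are organizations; this is the interoperability issue the keyword-hierarchy caveat refers to.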

Renditions / Version sets

[Figure: image storage with import and export]
○ ORIGINAL
○ MASTER (edited)
○ DERIVATIVES ...

Under certain circumstances, different files related to the same image should be managed as a single item.

Covered by XMP-MM (Media Management)

Cataloging applications provide different solutions (e.g. stacking, version sets): 1 item, N renditions
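A version set can be modelled as one item holding N renditions, each tagged with its role and with the rendition it was derived from (loosely mirroring the xmpMM:DerivedFrom idea in XMP Media Management). A minimal Python sketch; the class names and the single-original rule are assumptions, not the data model of any particular cataloging application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Role(Enum):
    ORIGINAL = "original"
    MASTER = "master"
    DERIVATIVE = "derivative"


@dataclass
class Rendition:
    path: str
    role: Role
    derived_from: Optional[str] = None  # path of the parent rendition, if any


@dataclass
class Item:
    """One catalog item with N renditions (a 'version set' or 'stack')."""
    item_id: str
    renditions: List[Rendition] = field(default_factory=list)

    def add(self, path: str, role: Role, derived_from: Optional[str] = None) -> Rendition:
        if role is Role.ORIGINAL and self.by_role(Role.ORIGINAL):
            raise ValueError("this sketch allows a single original per item")
        if derived_from is not None and derived_from not in {r.path for r in self.renditions}:
            raise ValueError(f"unknown parent rendition: {derived_from}")
        rendition = Rendition(path, role, derived_from)
        self.renditions.append(rendition)
        return rendition

    def by_role(self, role: Role) -> List[Rendition]:
        return [r for r in self.renditions if r.role is role]

    def lineage(self, path: str) -> List[str]:
        """Follow derived_from links from a rendition back to the original."""
        index = {r.path: r for r in self.renditions}
        chain = [path]
        while index[chain[-1]].derived_from is not None:
            chain.append(index[chain[-1]].derived_from)
        return chain


item = Item("DSC_0153")
item.add("d40-20120920-DSC_0153.nef", Role.ORIGINAL)
item.add("d40-20120920-DSC_0153.dng", Role.MASTER, derived_from="d40-20120920-DSC_0153.nef")
item.add("d40-20120920-DSC_0153-web.jpg", Role.DERIVATIVE,
         derived_from="d40-20120920-DSC_0153.dng")
print(item.lineage("d40-20120920-DSC_0153-web.jpg"))
```

A search hit on any rendition can then be shown as the single item, with the original, master and derivatives one click away.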

- 4 -

Ensuring long-term storage: backup and migration

There are many causes of data loss

disk / hardware failure

viruses

lightning

transfer errors

theft

loss

fire

human errors

floods

Photo by Lucina M - CC BY-NC 2.0

Which files to back up

PRIMARY STORAGE: Original Files, Working Files, Master Files, Derivative Files, Catalog (DB)

A possible backup strategy for a single-user workflow
1. ON-LINE BACKUP (e.g. NAS), synchronized with rsync (*)
2. OFF-LINE BACKUP: storage media are swapped at every backup
3. OFF-SITE BACKUP: additional copy on a remote NAS
4. CLOUD BACKUP: additional copy on a cloud service (Amazon S3, Elephant Drive, Symform, ...)
5. Copy to optical storage (ORIGINALS, MASTERS, DERIVATIVES)

(*) deleting files on the receiving side must be disabled for ORIGINALS, MASTERS and DERIVATIVES
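The rsync step and its (*) rule can be sketched as a small Python helper that builds the command lines, passing `--delete` only for folders where deletions may safely propagate. The folder names and paths are hypothetical; `-a`, `--delete` and `--dry-run` are standard rsync options. The helper only builds the argument list; running it would be a matter of passing the list to `subprocess.run(cmd, check=True)`.

```python
import shlex
from typing import List

# Hypothetical folder names on primary storage; adapt to the real layout.
NEVER_DELETE = {"originals", "masters", "derivatives"}  # the (*) rule
MIRROR_EXACTLY = {"working", "catalog"}                 # deletions may propagate


def rsync_command(folder: str, source_root: str, dest_root: str,
                  dry_run: bool = False) -> List[str]:
    """Build (but do not run) the rsync command line for one folder."""
    if folder not in NEVER_DELETE | MIRROR_EXACTLY:
        raise ValueError(f"no backup rule for folder: {folder}")
    cmd = ["rsync", "-a"]  # archive mode: recursive, preserves times, permissions, links
    if folder in MIRROR_EXACTLY:
        cmd.append("--delete")  # remove receiver files that no longer exist at the source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing anything
    # Trailing slash on the source: copy the folder's contents into dest/folder/.
    cmd += [f"{source_root}/{folder}/", f"{dest_root}/{folder}/"]
    return cmd


for folder in sorted(NEVER_DELETE | MIRROR_EXACTLY):
    print(shlex.join(rsync_command(folder, "/photos", "/mnt/nas/photos")))
```

The trade-off of leaving `--delete` off is that files renamed or culled on primary storage accumulate on the backup; in exchange, an accidental deletion of an original is never propagated.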

Migration

○ file formats can become obsolete (just think of what is happening to Kodak Photo CD ...)
○ storage evolves (higher capacity, higher speed, ...)
○ solutions:
  ○ monitoring the storage process
  ○ conversion to newer and safer formats (e.g. DNG)
  ○ periodic replacement of storage devices

Currently there are no permanent solutions for storing digital content. No media lasts forever, and file formats become obsolete. Migration must be considered as a necessary part of every storage strategy.

[ dpBestflow.org ]
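One common way of "monitoring the storage process" is fixity checking: record a checksum for every archived file, then periodically recompute and compare, so that silent corruption or missing files are noticed while a good copy still exists elsewhere. The slides do not prescribe a method; this minimal Python sketch uses SHA-256 manifests, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large RAW/TIFF files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the files under root with a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in set(manifest) & set(current)
                          if manifest[k] != current[k]),
        "new": sorted(set(current) - set(manifest)),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "originals").mkdir()
        (root / "originals" / "a.nef").write_bytes(b"raw data")
        saved = build_manifest(root)                             # stored with the archive
        (root / "originals" / "a.nef").write_bytes(b"raw dat4")  # simulated bit rot
        print(verify(root, saved))
        # {'missing': [], 'changed': ['originals/a.nef'], 'new': []}
```

The same manifest is useful during migration: verifying it on the new storage device confirms that every file arrived intact before the old device is retired.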

- 5 -

An overview of image archiving tools and services

Image management applications: application types

○ INGESTION TOOL
○ CULLING APPLICATION
○ RASTER IMAGE EDITOR
○ PARAMETRIC IMAGE EDITOR
○ RAW PROCESSOR
○ SPECIAL PURPOSE EDITOR
○ PUBLISHING TOOLS
○ DEDICATED PRINTING SOFTWARE
○ Image Browser
○ DAM (Photo Catalog)
○ SCANNER SOFTWARE

Image management applications: examples

Fast Picture Viewer, Photomatix, Picture Window Pro, Photoshop, Lightroom, Vuescan, Aperture, IDImager, Bridge, Adobe Camera Raw, ImageIngester Pro, Silverfast, Qimage, Quad Tone RIP, Bibble Pro

A few photo cataloging applications

Product | Notes | Platforms | Cost (EUR)
Adobe Lightroom 4 | includes Adobe Camera RAW, many export features | WIN / MAC | 130
Photo Supreme (formerly known as IDIMAGER) | very powerful catalog explorer, multi-user DB | WIN / MAC | 80
Phase One Media Pro (formerly known as Expression Media, and before that as iView) | | WIN / MAC | ~85
Apple Aperture 3 | | MAC | 63
Corel AfterShot Pro (formerly known as Bibble Pro) | | WIN / MAC | ~50
Digikam Software Collection 3 | RAW processing based on dcraw, rendition support from version 2 | Linux | free
Picasa 3.9 | | WIN / MAC | free
PicaJet | basic editing, multi-user DB | WIN | ~50

Common features:
○ parametric editor, with the possibility to use an external editor
○ XMP support (with some issues when exporting/importing keyword hierarchies)
○ some kind of rendition support
○ trial period (typically 30 days)

Multi-user photo management

○ commercial
  ○ Daminion http://daminion.net/
  ○ Canto Cumulus http://www.canto.com/
  ○ Celum http://www.celum.com/
○ open-source
  ○ ZenPhoto (GPL)
  ○ Montala Resource Space (BSD)
  ○ Gallery (GPL)
  ○ Razuna (AGPL)
  ○ NotreDAM (GPL3)

- 6 -

NotreDAM: an open-source DAM platform developed at CRS4

Bibliography

References

1. Jonathan Good - How many photos have ever been taken? - September 15, 2011 - http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox

2. Observatoire des Professions de l'Image - Les chiffres officiels 2010 du marché de la photo et de l'image en France et dans le Monde - http://www.sipec.org/pdf/OPI2011.pdf

3. UPDIG Photographers Guidelines v4.0 - Universal Photographic Imaging Guidelines - http://www.updig.org/pdfs/updig_photographers_guidelines_v40.pdf

4. dpBestflow.org Best Practices - http://dpbestflow.org/links/32

5. Maurizio Agelli, Maria Laura Clemente, Mauro Del Rio, Daniela Ghironi, Orlando Murru and Fabrizio Solinas, CRS4 - NotreDAM, a multi-user, web based Digital Asset Management platform - TPDL 2011 Conference on Theory and Practice of Digital Libraries, Berlin - http://notredam.org/wp-content/uploads/2012/02/TPDL2011-notredam-demo.pdf

6. MS Windows Dev center - People tagging Overview - http://msdn.microsoft.com/en-us/library/windows/desktop/ee719905(v=vs.85).aspx#_people_tagging

Metadata Standards

○ Exchangeable image file format for digital still cameras: Exif Version 2.3 http://www.cipa.jp/english/hyoujunka/kikaku/pdf/DC-008-2010_E.pdf

○ IPTC Information Interchange Model (IIM), IIM Schema for XMP, Specification Version 1.0, Document Revision 1, 2008 http://www.iptc.org/std/IIM/4.1/specification/IPTC-IIM-Schema4XMP-1.0-spec_1.pdf

○ XMP Specification http://www.adobe.com/devnet/xmp.html
  ○ Part 1: Data Model, Serialization and Core Properties
  ○ Part 2: Additional Properties
  ○ Part 3: Storage in Files

○ PLUS Technical Specification http://ns.useplus.org/go.ashx

○ PRISM 2.0 Specifications http://www.prismstandard.org/specifications/

Further reading

○ Peter Krogh - The DAM Book, Digital Asset Management for Photographers, 2nd edition - O'Reilly

○ Patti Russotti, Richard Anderson - Digital Photography Best Practices and Workflow - Focal Press

○ Metadata Working Group - Guidelines for Handling Image Metadata - http://www.metadataworkinggroup.org/specs/