LIS654 lecture 2 Omeka internals Thomas Krichel 2012-09-21.

LIS654 lecture 2

Omeka internals

Thomas Krichel 2012-09-21

foreword to Omeka

• Terminology is one of the difficult problems in digital librarianship.

• I will use the double quotes here to represent a term that is used as it is in Omeka.

• Please open your WinSCP, Omeka web, web admin and phpMyAdmin.

• In Omeka, you store “items”.

• Items are either digital resources
– images
– video

• or something non-digital of which you are storing a digital representation
– person
– event

some item properties

• Items can be “public”. By default, as a basic security precaution, items are not public.

• Items can be “featured”. An item that is featured is highlighted on the site in a particular way. This allows you to change the appearance of the site by providing different featured items over time.

• Items have item types. Each item is of one type.

• Each item may belong to a collection.

table: “items”

• It stores data about each “item”
– “id” of the item, an autoincrement
– “item_type_id”, a number (see next slide)
– “collection_id”, a number (see the slide after next)
– whether it is “featured”, a Boolean
– whether it is “public”, a Boolean
– when last “modified”, a time
– when “added”, a time

table: “item_types”

• Each item is of one type. Types are described in the “item_types” table, with the columns
– “id”, an autoincrement
– “name”, the name of the item type, a string
– “description”, a longer explanation of what the item type means

• Each record in the “items” table references an “id” in the “item_types” table.

table: “collections”

• Each collection is described here
– “id”, an auto_increment
– “name”, a string
– “description”, a string
– users who are “collectors”, a string (not further discussed)
– whether it is “public”, a Boolean
– whether it is “featured”, a Boolean
– when “added”, a time
– when last “modified”, a time
– the “owner_id” (not further discussed)
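The way the three tables point at each other can be tried out in a small sketch. This is a toy version only: it uses Python's built-in sqlite3 instead of the MySQL database you see in phpMyAdmin, it keeps only some of the columns listed on the slides, and the sample rows are made up.

```python
import sqlite3

# Toy schema modelled on the columns listed in the slides; the real Omeka
# tables live in MySQL and have more columns. Sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_types (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    description TEXT
);
CREATE TABLE collections (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    description TEXT
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    item_type_id INTEGER REFERENCES item_types(id),
    collection_id INTEGER REFERENCES collections(id),  -- may be NULL
    featured INTEGER DEFAULT 0,
    public INTEGER DEFAULT 0                           -- not public by default
);
""")
conn.execute("INSERT INTO item_types (name) VALUES ('Still Image')")
conn.execute("INSERT INTO collections (name) VALUES ('Manhattan')")
conn.execute("INSERT INTO items (item_type_id, collection_id) VALUES (1, 1)")
conn.execute("INSERT INTO items (item_type_id) VALUES (1)")  # no collection

# Resolve the two id columns of each item into readable names.
rows = conn.execute("""
    SELECT items.id, item_types.name, collections.name, items.public
    FROM items
    JOIN item_types ON item_types.id = items.item_type_id
    LEFT JOIN collections ON collections.id = items.collection_id
    ORDER BY items.id
""").fetchall()
print(rows)
```

The LEFT JOIN is there because an item may, but need not, belong to a collection; the second item comes back with no collection name.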

items to files

• An item has two aspects to it.
– There is the metadata about the item.
– There is the item itself. This is in fact a collection of “file”s.

• File records store information about files on the server that hold information related to an item.

• The item can be viewed as a conceptual container of (possibly zero) files.

• Each file is attached to an item.

table: “files” |1|

• The fields in that table are
– “id”, an auto_increment
– “item_id” of the item the file attaches to
– “size” in bytes
– “has_derivative_image”, a Boolean
– when last “modified”, a time
– when “added”, a time
– whether it was “stored”, a Boolean

table: “files” |2|

• More fields of this table
– the “authentication”, a checksum of the path to the file
– the “mime_browser”, a mime type as sent to the browser
– the “mime_os”, the mime type as determined by the Omeka installation, using an external application
– the “type_os”, the file type as determined by the Omeka installation, using an external application
– the “archive_filename”, a random file name
– the “original_filename”, filename or URL of origin
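The idea of an item as a container of possibly zero files can be sketched as a query. Again this is a toy sqlite3 version with only a few of the “files” columns, and the file names and sizes are made up.

```python
import sqlite3

# Toy tables: two items, only the first one has files attached.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items(id),  -- each file attaches to an item
    size INTEGER,                                   -- in bytes
    original_filename TEXT
);
INSERT INTO items (id) VALUES (1), (2);
INSERT INTO files (item_id, size, original_filename) VALUES
    (1, 2048, 'front.jpg'),
    (1, 4096, 'back.jpg');
""")

# Number of files and total bytes per item; items without files still appear.
file_counts = conn.execute("""
    SELECT items.id, COUNT(files.id), COALESCE(SUM(files.size), 0)
    FROM items
    LEFT JOIN files ON files.item_id = items.id
    GROUP BY items.id
    ORDER BY items.id
""").fetchall()
print(file_counts)
```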

file storage

• The “archive” directory stores files.

• The original is in “files”.

• Derivative files are in
– “thumbnails”
– “fullsize”
– “square_thumbnails”

• I don’t know why the original size is not the full size.

metadata

• Metadata is a description that can be attached to a “record”.

• Records are items and files, and some aggregates of items
– collections
– exhibits (only used by ExhibitBuilder)

• Metadata is a set of attribute/value pairs. The attributes are called “elements”.

table: “elements”

• We start with the “elements” table. It contains all the properties one can attach to records.
– an “id”, an auto_increment
– a “record_type_id”, the id of a “record_type” (see next slide)
– a “data_type_id”, the id of a “data_type” (two slides on)
– an “element_set_id”, the id of an “element_set” (three slides on)
– an “order” that always appears to be null, unused
– a “name” for the property
– a “description” containing the fill-in instructions

table: “record_types”

This table contains two records

id | name | description

1 | All | Elements, element sets, and element texts assigned to this record type relate to all possible records i.e. items and their aggregates.

2 | Item | Elements, element sets, and element texts assigned to this record type relate to item records.

table: “data_types”

Only contains these records

id | name | description

1 | Text | A long, typically multi-line text string. Up to 65535 characters.

2 | Tiny Text | A short, typically one-line text string. Up to 255 characters.

3 | Date Range | A date range, begin to end. In format yyyy-mm-dd yyyy-mm-dd.

4 | Integer | Set of numbers consisting of the natural numbers including 0 (0, 1, 2, ...) and their negatives (0, -1, -2, ...).

9 | Date | A date in format yyyy-mm-dd

10| Date Time | A date and time combination in the format: yyyy-mm-dd hh:mm:ss

table: “element_sets”

element set. These elements are common to all Omeka resources, including items, files, collections, exhibits, and entities. See http://dublincore.org/documents/dces/.”

3 | 2 | Item Type Metadata | “The item type metadata element set, consisting of all item type elements bundled with Omeka and all item type elements created by an administrator.”

item-type specific metadata

• You can create data elements (aka metadata fields) for a specific item type.

• You cannot, however, share these fields across item types.

• So if you want to express the “geekiness” of your item, and you have several types that can be geeky, you have to add “geekiness” as an element for each item type separately.

creating item types

• You can create your own item types.

• When you create an item type, it automatically has the Dublin Core metadata property fields attached to it.

• But if your item type is, say, a room, you can create properties such as “size”, “height”, “cul-de-sac-ness”.

Omeka tags

• A tag is a way for Omeka to group individual items together.

• Each item can have multiple tags.

• Each tag can be attached to multiple items.

• We say that there is a many-to-many relationship between items and tags.

• For LIS650 veterans, it’s like grouping HTML elements in the <body> into classes.

table: elements_texts

• This contains the values of properties. Fields are
– id | an auto_increment
– record_id | the id of the record it is attached to
– record_type_id | the id of the record type of the record. I am not sure why this is required.
– element_id | the id of the element (property)
– html | a Boolean, whether HTML or not
– text | the value of the property
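Putting the element names and the stored values together gives the attribute/value pairs of a record. Here is a minimal sketch, assuming the table and column names as written on the slides, a toy sqlite3 database and made-up values. The filter on record_type_id is only my guess at why the column exists: ids of different kinds of records could otherwise collide.

```python
import sqlite3

# Toy versions of the two tables, with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elements (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE elements_texts (
    id INTEGER PRIMARY KEY,
    record_id INTEGER,
    record_type_id INTEGER,
    element_id INTEGER REFERENCES elements(id),
    html INTEGER DEFAULT 0,
    text TEXT
);
INSERT INTO elements (id, name) VALUES (1, 'Title'), (2, 'Creator');
INSERT INTO elements_texts (record_id, record_type_id, element_id, text) VALUES
    (7, 2, 1, 'Das Kapital'),
    (7, 2, 2, 'Marx, Karl Heinrich');
""")


def metadata_for(record_id, record_type_id=2):
    """Return (element name, value) pairs for one record.

    record_type_id=2 is "Item" in the record_types table shown earlier.
    """
    cursor = conn.execute("""
        SELECT elements.name, elements_texts.text
        FROM elements_texts
        JOIN elements ON elements.id = elements_texts.element_id
        WHERE elements_texts.record_id = ?
          AND elements_texts.record_type_id = ?
        ORDER BY elements_texts.id
    """, (record_id, record_type_id))
    return cursor.fetchall()


print(metadata_for(7))
```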

table: tags

• Each tag is recorded in this table. It has only two columns
– “id”, an autoincrement identifier
– the “name”, a string up to 256 characters long

• This table stores all the tags. A “tag” here is the value that a tagging takes. This is not what we would commonly call a tag.

table: taggings

• This table has the following columns
– “id”, an auto_increment
– “relation_id” gives the id of the item that has been tagged
– “tag_id” gives the number of the tag being given
– “entity_id”, who did it (not further discussed)
– “type”, a type of action taken (not further discussed)
– “time”, a timestamp when the action happened
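The many-to-many relationship between items and tags runs through the “taggings” table. A small sketch, assuming a toy sqlite3 database, only the columns discussed above, and invented tag names and item ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taggings (
    id INTEGER PRIMARY KEY,
    relation_id INTEGER,                 -- id of the item that has been tagged
    tag_id INTEGER REFERENCES tags(id)   -- which tag was given
);
INSERT INTO tags (id, name) VALUES (1, 'bridge'), (2, 'park');
INSERT INTO taggings (relation_id, tag_id) VALUES (10, 1), (10, 2), (11, 1);
""")


def tags_of(item_id):
    """Names of all tags attached to one item."""
    cursor = conn.execute("""
        SELECT tags.name FROM taggings
        JOIN tags ON tags.id = taggings.tag_id
        WHERE taggings.relation_id = ?
        ORDER BY tags.name
    """, (item_id,))
    return [name for (name,) in cursor]


def items_tagged(tag_name):
    """Ids of all items that carry one tag."""
    cursor = conn.execute("""
        SELECT taggings.relation_id FROM taggings
        JOIN tags ON tags.id = taggings.tag_id
        WHERE tags.name = ?
        ORDER BY taggings.relation_id
    """, (tag_name,))
    return [item_id for (item_id,) in cursor]


print(tags_of(10))
print(items_tagged("bridge"))
```

One item has several tags and one tag sits on several items, and neither the “items” nor the “tags” table needs a column for the other.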

Dublin Core data

• Dublin Core is a metadata set that is used in Omeka.

• This is the common set for all types.

• We need to review the official meaning of these elements here.

• I quote from Hillman’s Dublin Core usage guide. http://dublincore.org/documents/usageguide/elements.shtml

Dublin core: title

• “The name given to the resource. Typically, a Title will be a name by which the resource is formally known.”

• “If in doubt about what constitutes the title, repeat the Title element.”

Dublin core: subject

• “The topic of the content of the resource. Typically, a Subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.”

Dublin core: description

• “An account of the content of the resource. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.”

• “Use full sentences.”

Dublin core: type

• “The nature or genre of the content of the resource. Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMIType vocabulary). To describe the physical or digital manifestation of the resource, use the FORMAT element.”

Dublin core: source

• “A Reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system”… “include in this area information about a resource that is related intellectually to the described resource but does not fit easily into a Relation element.”

Dublin core: relation

• “A reference to a related resource. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.”

Dublin core: coverage

• “The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary.”

Dublin core: creator

• “An entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. Typically the name of the Creator should be used to indicate the entity.”

• “Creators should be listed separately, preferably in the same order that they appear in the publication.”

Dublin core: publisher

• “The entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.”

• “The intent of specifying this field is to identify the entity that provides access to the resource.”

Dublin core: contributor

• “An entity responsible for making contributions to the content of the resource. Examples of a Contributor include a person, an organization or a service. Typically, the name of a Contributor should be used.”

• “The same general guidelines for using names of persons or organizations as Creators apply here.”

Dublin core: rights

• “Information about rights held in and over the resource. Typically a Rights element will contain a rights management statement for the resource, or reference a service providing such information.”

• “Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.”

Dublin core: date

• “A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601” “and follows the YYYY-MM-DD format.”
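If you want to check your date values before you put them into Omeka, something like the following would do. This is only a sketch of the YYYY-MM-DD shape quoted above; the function name is mine, and the full ISO 8601 profile allows other forms (such as a year alone) that this check rejects.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
YYYY_MM_DD = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_yyyy_mm_dd(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = YYYY_MM_DD.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for dates such as 02-30
    except ValueError:
        return False
    return True


for candidate in ["2012-09-21", "2012-9-21", "2012-02-30"]:
    print(candidate, is_yyyy_mm_dd(candidate))
```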

Dublin core: format

• “The physical or digital manifestation of the resource. Typically, Format may include the media-type or dimensions of the resource. Examples of dimensions include size and duration.”

• “Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [http://www.iana.org/assignments/media-types/]).”

Dublin core: identifier

• “An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI)” …

Dublin core: language

• “A language of the intellectual content of the resource. Recommended best practice for the values of the Language element is defined by RFC 3066 [RFC 3066, http://www.ietf.org/rfc/rfc3066.txt] which, in conjunction with ISO 639 [ISO 639, http://www.oasis-open.org/cover/iso639a.html], defines two- and three-letter primary language tags with optional subtags.”

item type specific metadata

• There are a bunch of different types that are built-in.

• Each type takes Dublin Core metadata as well as some extra metadata

• These item-specific metadata fields can be changed using the web interface.

Omeka built-in item types 1

• Document A resource containing textual data.

• Moving Image A series of visual representations that, when shown in succession, impart an impression of motion.

• Oral History A resource containing historical information obtained in interviews with persons having firsthand knowledge.

• Sound A resource whose content is primarily intended to be rendered as audio.

• Still Image A static visual representation. Examples of still images are: paintings, drawings, graphic designs, plans and maps.

• Website A resource comprising of a web page or web pages and all related assets ( such as images, sound and video files, etc. ).

• Event A non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration.

• Email A resource containing textual messages and binary attachments sent electronically from one person to another or one person to many people.

• Lesson Plan Instructional materials.

• Hyperlink Title, URL, Description or annotation.

• Person An individual, biographical data, birth and death, etc.

• Interactive Resource A resource requiring interaction from the user to be understood, executed, or experienced. Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments

Omeka user types

• Omeka has user types that are defined in the PHP code.

• $userRoles = array('admin', 'contributor', 'researcher');

• To this we have to add ‘super’ as a super user.

• We cannot change these types, unless we change the PHP code.

plugin example

• I am demonstrating the CSV import plugin.

• I do this because I hate interacting with a computer.

• Comma-separated values are a simple text format to represent a table.

• The table is a sequence of lines.

• The first line contains field names.

• Each following line contains the field values of one record.

comma separation

• The separation of fields is done by comma.

• If the value contains a comma, it has to be surrounded by double quotes.

• Example table of economists

Name, Birthday
"Krichel, Thomas", 1965-06-05
"Marx, Karl Heinrich", 1818-05-05
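To see how a program reads such a table, here is a short sketch with Python's standard csv module. It is not what the Omeka plugin does internally, just an illustration of the quoting rule. Note that the quotes in a real CSV file must be straight double quotes, and that skipinitialspace is switched on because the example has a blank after each comma.

```python
import csv
import io

# The example table of economists, written with straight double quotes.
sample = '''Name, Birthday
"Krichel, Thomas", 1965-06-05
"Marx, Karl Heinrich", 1818-05-05
'''

# The first line supplies the field names; every later line is one record.
reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)
records = list(reader)

for record in records:
    print(record["Name"], "|", record["Birthday"])
```

The comma inside the quoted name does not split the field, so each record still has exactly two values.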

CSV plugin

• The CSV plugin allows you to upload a CSV file.

• That file will describe resources you want to include in your collection.

• The resources can then be included in bulk into your Omeka installation.

install

• Local computer way
– Download the plugin from its URL, say URL, to your local machine.
– Upload the unzipped plugin directory to ssh://user@dlib.info/omeka/plugins

• On tie
– cd omeka/plugins
– GET URL > csv.zip
– unzip csv.zip
– rm csv.zip

activate plugin

• From the main menu, look for “Manage Plugins”.

• Look for “CSV Import”. If this is not there, you have not put the plugin files into the right place.

• Click “install” next to it.

• Accept the defaults on the next screen.

• You now see the “CSV Import” option on the

compose and upload the csv file

• A sample csv file is at http://wotan.liu.edu/home/krichel/courses/lis654/examples/csv/manhattan.csv

• You have to place it into the omeka/plugins/CsvImport/csv_files/ folder of your home directory. Delete test.csv.

• I don’t think that the name matters, but avoid blanks and other exotic characters in the name and give it the ending “.csv”.
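A cautious naming rule can be written down as a pattern. This is a sketch of my advice, not a rule that Omeka enforces; the set of allowed characters (letters, digits, underscore, hyphen) is an assumption on my part.

```python
import re

# One or more safe characters followed by the ending ".csv".
SAFE_CSV_NAME = re.compile(r"[A-Za-z0-9_-]+\.csv")


def is_safe_csv_name(filename):
    """Return True for names without blanks or exotic characters, ending in .csv."""
    return SAFE_CSV_NAME.fullmatch(filename) is not None


for name in ["manhattan.csv", "my file.csv", "manhattan.txt"]:
    print(name, is_safe_csv_name(name))
```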

steps 1

• Step 1: Select File and Item Settings
– CSV File
– Item Type
– Collection

• After that step, the CSV file is read and checked to have the proper format.

• If that check fails you have to edit the file.

step 2

• The names of your columns have been recognized and you are asked to match them to the Omeka information.

• You have three options
– match to a metadata element
– match to tags
– match to a file

map to element

• “Map To Element”
– Dublin Core (common)
– Item-type dependent fields

• “Use HTML” means interpret value as HTML
– “File” Don’t map to metadata, interpret values as file

• You can match to several metadata fields at

match to tags

• “Tags” Don’t map to element, interpret values as tags.

• Note that tags have to be separated by the tag separator that you have fixed in your general settings. Otherwise a tag string cannot be parsed into separate tags.
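What goes wrong with the wrong separator is easy to show. The function below is a hypothetical illustration, not Omeka's code, and the default separator "," is an assumption; use whatever you have set in your general settings.

```python
def split_tags(tag_string, separator=","):
    """Split one CSV cell into tag names, dropping blanks around and between tags."""
    return [tag.strip() for tag in tag_string.split(separator) if tag.strip()]


# Cell written with the configured separator: three separate tags.
print(split_tags("bridge, park, river"))

# Cell written with a different separator: it stays one long tag.
print(split_tags("bridge; park; river"))
```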

match to file for uploads

• “File” values have to be URLs starting with http.
– Pointers to files on tie are not supported.
– No big deal because Thomas has given you the web site.

• Omeka will fire up a browser and fetch the file from the web.
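Before you import, you can check the “File” column of your CSV yourself. A minimal sketch with Python's standard urllib.parse; the function name is mine and this is not the check the plugin runs. I treat https as acceptable, since it also starts with http.

```python
from urllib.parse import urlparse


def is_fetchable_url(value):
    """Return True if value looks like a web URL: http or https plus a host name."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://wotan.liu.edu/home/krichel/courses/lis654/examples/csv/manhattan.csv",
    "/home/user/omeka/picture.jpg",  # a path on the server: not supported
]
for candidate in candidates:
    print(candidate, is_fetchable_url(candidate))
```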

successful import

• If the import is successful,
– go to the public interface with another web browser window
– check that you like the result

• If you don’t like what you see, hit the undo link in the csv import screen. That link may not be available later.

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!
