Polish Optical Internet PIONIER
as a digital infrastructure
for the arts and humanities
Cezary Mazurek, Maciej Stroiński, Marcin Werla, Jan Węglarz
Poznań Supercomputing and Networking Center
Digital Humanities Centres: Experiences and Perspectives, Warszawa, 8-9.12.2016
PSNC: General introduction
Poznań Supercomputing and Networking Center (PSNC)
Founded in 1993
Affiliated with the Institute of Bioorganic
Chemistry of the Polish Academy of Sciences
291 employees in 2016
Leading operator of Polish eInfrastructure
PSNC Activities
National Research and Education Network – PIONIER (Polish Optical Internet)
Research Metropolitan Area Network - POZMAN
HPC Center
Data repositories and Digital Libraries Federation
New Generation Networks
HPC, Grids & Clouds
Grand challenges applications
New media and visualization technologies
Knowledge Platforms
Internet of Things
Future Internet - Technology, Applications and Services for IS
Cyber Security
Center of e-Infrastructure
Center of Research & Development
PSNC: eInfrastructure
Pan-European backbone of PIONIER network
6479 km of fiber infrastructure in Poland
2359 km of fiber in Europe (IRU)
8838 km of fiber in total
Services deployed for eScience in PIONIER Network
Service areas: Communication; Access & Security; Clouds; Data Preservation; New Media
Eduroam
Federated Identity (Pionier.Id)
Virtual firewall
Applications on Demand
Databases on Demand
Services for digital libraries
Archiving on Demand
Infrastructure / Resources on Demand
Environment:
• 1600 sqm in data center #1
• 300+ sqm in data center #2
• Air- and liquid-cooled systems, DLC
• Video monitoring, fire protection
• 24h monitoring
• PSNC is part of:
• European HPC infrastructure (PRACE)
• European and national grid infrastructure (EGI, PL-GRID)
• European (EUDAT) and National data infrastructure
(Platon, NDS)
High Performance Computing Center in Poznan
Computing:
• HPC infrastructure (1.4 PFLOPS now, 110th position on TOP500 list)
• 2 data-centers
• HPC/HTC systems
• GPGPU clusters
• Additional prototype installations
Storage:
• Hierarchical data infrastructure (47 PB)
• Part of European Data Infrastructure
PlatonTV
• 6x Production Studios, 15x Recording
Studios, 1x OB-VAN – equipped with HD
broadcast quality production infrastructure
• Live / VoD production supported
• Platform deployed in PIONIER network to
support content production, broadcast,
storage, delivery and presentation
Interactive scientific HD TV platform – deployed in PIONIER network
[Diagram: PlatonTV pipeline. Stages: Production / Acquisition, Storage, Broadcast / Distribution, Presentation. Components: Recording Studio, Production Studio, Outside Broadcasting Van, Content Repository, Virtual Channel System, Content Delivery System, Portal / AoD System. Flows between components: Files, Live, VoD, Metadata.]
See more:
http://tv.pionier.net.pl/Default.aspx?id=2083
• More than 4500 assets (1700h
/ 28 TB) in repository and still
growing
• 145 servers deployed
• Platform developed by PSNC
• In-house TV production teams
PSNC: Digital Humanities
Digital Humanities Laboratories, Resources and Services
PSNC Laboratories:
• Laboratory of digitization workflows
• Laboratory of visualization and interaction
• Laboratory of voice interface technologies for next generation services
• Laboratory of service software technologies
• Laboratory of ICT integration with the surroundings
• Virtual Laboratory for e-Science
• Laboratory of integration of network services with IOT
• Laboratory of environment-friendly information technology - "Green ICT"
• Laboratory of telemedicine
• Laboratory of cyberspace security and critical infrastructures protection
• …
Laboratory Scenario – Example with ROThA project
• Cooperation with Adam Mickiewicz
University in Poznań
• Aim: to build an annotated text corpus
from medieval court judgments, reprinted,
transcribed and published in a book series
~50 years ago
Task                                                                UAM   PSNC
Selection of source material                                         +     -
Scanning and batch OCR                                               -     +
OCR correction and segmentation                                      +     -
Development of XML schema for annotation (TEI-based)                 +     +
Automated conversion of OCR to XML/TEI                               -     +
Annotation of XMLs                                                   +     -
XML database setup and data import                                   -     +
Development of web application for viewing and editing the corpora   -     +
Platform for Polish scientific and cultural heritage
resources – http://fbc.pionier.net.pl/
[Diagram: FBC between users and Polish on-line services with scientific and cultural heritage resources. Labels on the flows: aggregation, normalisation, enrichment, indexing, searching, dynamic linking (one dynamic-linking path marked as work in progress). Legend: users flow, data flow.]
In total: 4 168 368 metadata records from 131 data sources,
of which 3 169 487 are in open access
(data as of 2016-12-01)
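As a quick check of what these totals imply, the open-access share can be computed directly from the two figures on the slide. The variable names are illustrative only.

```python
# Figures quoted on the slide (data as of 2016-12-01).
total_records = 4_168_368
open_access_records = 3_169_487

# Share of metadata records that point to open-access objects.
open_access_share = open_access_records / total_records

if __name__ == "__main__":
    print(f"Open-access share: {open_access_share:.1%}")  # prints "Open-access share: 76.0%"
```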
Users traffic in FBC
In November 2016: 193 969 visits and 227 326 click-throughs to digital objects
in Polish on-line services with scientific and cultural heritage resources
Estimated number of visits to the FBC portal in 2016: 1 745 000
(~87% of visits are from Poland)
Source: FBC web analytics
FBC as a source of traffic for cooperating websites
Example: TOP 3 traffic sources for websites:
• Wrocław University Digital Library
• Digital Library of Greater Poland
• Digital National Museum in Warsaw
• Digital Repository of Research Institutes
Data for November 2016, excluding Google traffic. Source: FBC web analytics
Technical aspects crucial for end users
• High availability
• Good indexing and positioning in Google
Source: PSNC service monitoring system (left, 01.12.2016), Google Search Console (right, 28.11.2016)
Europeana Cloud
• Cloud infrastructure for Europeana:
aggregation, storage, processing
and access to data at scale
• Developed since 2013 by
Europeana, The European Library
and PSNC
• Hosted in PSNC
• Used in production to give access
to Europeana Newspapers content
(via IIIF protocol)
• Further development works under
Europeana Digital Services
Infrastructure (DSI)
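The slide mentions access via the IIIF protocol. As a minimal sketch, assuming the IIIF Image API is meant, an image request URL follows the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The base URL and identifier below are hypothetical placeholders, not actual Europeana Cloud endpoints, and `iiif_image_url` is a name chosen for this example.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded so that any slashes inside it are
    not mistaken for path separators, as the Image API requires.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and identifier, used only to show the URL shape.
    base = "https://example.org/iiif"
    # Whole page at the largest size the server offers.
    print(iiif_image_url(base, "newspaper/page 1"))
    # A 1024x1024 pixel region from the top-left corner, scaled to 512 px wide.
    print(iiif_image_url(base, "newspaper/page 1",
                         region="0,0,1024,1024", size="512,"))
```

The size keyword `max` is used by Image API versions 2.1 and 3.0; servers that implement only version 2.0 expect `full` in that position.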
https://locloudhosting.net/
PSNC cloud service, recommended for collections up to 25 000 objects
In production since January 2015
Free service, developed further under Europeana DSI-2 project
https://ptpn.locloudhosting.net/
https://teatr.locloudhosting.net/
VTL System
• Free service for OCR and transcription
of historical documents
• Built-in OCR engine with recognition
profiles for Polish old-prints
• Supports crowdsourcing-based
transcription and closed team
cooperation
• Interoperable with FBC – allows importing
objects from digital libraries
(based on OAI-PMH protocol)
http://wlt.pcss.pl/
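The OAI-PMH-based import mentioned above can be sketched as follows. This is not VTL's implementation: the endpoint `https://example.org/oai` and the sample response are hypothetical, and the helper names are chosen for this example. The `ListRecords` verb, the `oai_dc` metadata prefix, the XML namespaces and the `resumptionToken` element come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract (identifier, title) pairs and the resumption token, if any."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append({"identifier": identifier, "title": title})
    # A non-empty token means the repository has more pages to fetch.
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample old print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with `verb=ListRecords&resumptionToken=...` until no token is returned, then fetch the scans referenced by each record.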
TEI-based systems
Example:
Karaites Digital Archive
• Based on XML database,
TEI as native storage format
• Data imported from
previously used MS Word
and MS Access
• Developed further to
support full bi-directional
integration with VTL
https://jazyszlar.karaimi.org/
In collaboration with Adam
Mickiewicz University
http://wlt.synat.pcss.pl/
Summary
Broadband network access
Hosting and support for different content
providers
Multimedia services and infrastructure
Advanced laboratories
Long term data archiving
Complementary European eInfrastructure
Services (GEANT, EGI, EUDAT, OpenAIRE)
Integration with Europeana Cloud
Cooperation within DARIAH-PL/DARIAH-EU
Joint research and development projects
Advanced use-case-centric services
Scalable computations (OCR, format
conversion, content aggregation)
Web analytics
Semantics and Knowledge Platforms
Interoperable research tools (Virtual
Transcription Laboratory, TEI-based
databases)
Conferences, trainings, workshops, public
digitization days