VaVeL H2020 - 688380 D1.3 - Data Management Plan

VaVeL H2020 - 688380

D1.3 - Data Management Plan

National and Kapodistrian University of Athens

June 7, 2016

Status: Final

Scheduled Delivery Date: 31/05/2016

VaVeL H2020-688380

D1.3, Version 1.0, May 2016, http://www.vavel-project.eu/

Document History

• (June 3rd, 2016) Version 1.0 Submitted to the EC and uploaded to VaVeL website.

• (June 4th, 2016) Version 1.1, added links and comments on potential open data repositories that the consortium will investigate as candidates for data publication.

• (June 7th, 2016) Version 1.2, minor changes.

Executive summary

This document includes information about the data sources that the VaVeL consortium will work and conduct research on. More specifically, for each data source the partners have defined a data management plan. The plan consists of information about legal issues, privacy, infrastructure changes, archiving, maintenance, standards and accessibility. The document will be regularly updated as more information becomes available and data issues are resolved. Please check the website of the project (www.vavel-project.eu) under the deliverables section for updates. Early on, the consortium has agreed to make every effort to provide open access to as many datasets as possible. This document reflects this continuous effort.

Document Information

Contract Number: H2020-688380
Acronym: VaVeL
Name: Variety, Veracity, VaLue: Handling the Multiplicity of Urban Sensors
Project URL: http://www.vavel-project.eu/
EU Project Officer: First Name - Last Name

Deliverable: D1.3 Data Management Plan
Work Package Number: WP1
Date of Delivery: 31/05/2016, Actual: 31/05/2016
Status: Final
Nature: Report
Distribution Type: Public
Authoring Partner: National and Kapodistrian University of Athens
QA Partner: IBM
Contact Person: Ioannis Katakis [email protected], Dimitrios Gunopulos [email protected]

List of Contributors: Ioannis Katakis (UoA), Dimitrios Gunopulos (UoA), Jaroslaw Legierski (OPL), Izabella Krzeminska (OPL), Robert Kunicki (CoW), Jakub Marecek (IBM), Maggie O’Donnell (DCC), Aaron O’Connor (DCC).

Project Information

This document is part of a research project funded by the Horizon 2020 (H2020) programme of the Commission of the European Communities as project number 688380. The Beneficiaries in this project are:

No.  Name                                                                  Short Name            Country
1    National and Kapodistrian University of Athens                        UoA                   Greece
2    Technische Universitat Dortmund                                       TUD                   Germany
3    Technion - Israel Institute of Technology                             Technion              Israel
4    Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.  Fraunhofer            Germany
5    IBM Ireland Limited                                                   IBM                   Ireland
6    AGT International                                                     AGT GROUP (R&D) GMBH  Germany
7    Orange Polska S.A.                                                    OPL                   Poland
8    Dublin City Council                                                   DCC                   Ireland
9    City of Warsaw                                                        CoW                   Poland
10   Warsaw University of Technology                                       WUT                   Poland

Table of Contents

1 Introduction

2 Dublin City Council Data
2.1 Data from a Traffic Management System (SCATS)
2.2 Public Transport Data
2.3 Closed-Circuit Television Data
2.4 Measurements of Weather and Pollution
2.5 Social Media

3 City of Warsaw Data
3.1 Real time trams location
3.2 Bus Data
3.3 19115 Non-emergency notification system
3.4 Public transport timetables
3.5 Bus & trams stops locations
3.6 Park & Ride
3.7 Bike roads
3.8 Bike stations location (Veturilo)
3.9 Metro Entrances
3.10 Address points
3.11 Streets
3.12 City of Warsaw Twitter
3.13 RSS services
3.14 Veturilo stations (Warsaw City Bike system)
3.15 Orange subscribers location statistics

Index of Figures

1 Real-Time Data from City of Warsaw
2 Real-Time Data from City of Dublin
3 VaVeL’s history in publishing data
4 The vertex-based transit graph. Cited verbatim from https://github.com/openplans/OpenTripPlanner/wiki/GraphStructure.
5 An illustration of a delay function, which gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof.
6 NRA data visualisation. Arrows indicate wind speed and direction; heatmap blobs indicate cumulative rain intensity at each station, in mm/h.
7 Sample Tweets from Live Drive.
8 ZTM-Warsaw’s Twitter Account

1 Introduction

In this document the VaVeL consortium presents information about the data that will be exploited in the context of the project. It is by design that the major data providers are also members of the consortium. These are:

• The Dublin City Council, which will provide traffic, public transport, weather and video data.
• The City of Warsaw, which will provide public transport data, data on emergency calls and citizen reporting.
• Orange Polska, which will provide subscribers’ location data.

Figure 1: Real-Time Data from City of Warsaw
Figure 2: Real-Time Data from City of Dublin

It is important to note that most of the above data sources will be provided in real-time. This document serves as a management plan for the above data sources. More specifically, it addresses the following issues and questions.

Meta-data: Details about the meta-information that accompany our data (if available).

Standards: We mention any standards that are followed by the data or by the way the consortium or the data providers provide access to the data.

Infrastructure Improvements: The infrastructure providing the data is as important as the data themselves. Hence, we present information on necessary infrastructure changes that were required in order to improve any data-related aspects (accessibility, information richness, volume, etc).

Quality: Data veracity is one of the main objectives of VaVeL. We provide brief information about the consortium’s efforts to address data quality issues if necessary. Depending on the case, ‘quality’ might imply cleaning, pre-processing, adding meta-data, transforming to a more convenient format or providing easier access.

Accessibility: We explicitly describe the access level provided for each data source for various user groups (consortium, public, etc). On top of that we outline the technical means that are necessary to access the data. We also report on our efforts to make the data easier to discover.

Assessable and Intelligible: We describe the means that we provide in order to make the data easier to use and its content and value easier to understand.

Legal Issues & Privacy: The consortium provides details about legal issues related to each dataset as well as the path of resolving them.

Maintenance Plan: In this section we will describe the maintenance plan for each dataset. More specifically, we will discuss the archiving of historical data and the potential for maintaining/utilizing the data after the end of the project.

Some of the above items were inspired by the document “Guidelines on Data Management in Horizon 2020”, published by the European Commission, Directorate-General for Research & Innovation (Version 1.0, 15 February 2016).

VaVeL’s Open Data Strategy and History. The consortium is willing to publicly share and make easily accessible and discoverable as many datasets as possible. Many data sets are already available online (see following sections) and every effort will be made to make even more data sources accessible. On top of that, the consortium intends to make available tools that analyze urban data. More importantly, the consortium has a history in publishing open data. Dublinked (see Figure 3a) is a web platform hosting multiple data resources originating from Dublin. On the other hand, the City of Warsaw along with its technical partners (Orange) has a history in making APIs for processing and accessing data open (api.um.warszawa.pl - see Figure 3b).

(a) The Dublinked website in Dublin (b) Open APIs in Warsaw

Figure 3: VaVeL’s history in publishing data

Open Data Portals. On top of the above, the VaVeL consortium is currently investigating the exploitation of additional portals and ways to disseminate, archive, register and index its datasets and APIs in order to make the resources more discoverable. Such are the following:

• European Union Open Data Portal (http://data.europa.eu/euodp/en/data) - where a lot of European Organizations archive open data sets.

• The programmable Web (http://www.programmableweb.com/apis) - where more than 15,000 APIs are indexed. This repository is especially suitable for the APIs available from the City of Warsaw.

2 Dublin City Council Data

2.1 Data from a Traffic Management System (SCATS)

The Sydney Co-ordinated Adaptive Traffic System (SCATS) provides information on vehicular traffic at fixed sensor locations as spatio-temporal time series. The SCATS data are produced by aggregating the primary source data that are collected by the Dublin SCATS traffic sensor monitoring system.

The primary data are given in the Strategic Monitoring (SM) format. Each sensor sends messages with varying frequency (depending on the location, conditions and other factors). The SM format specifies the message parameters. In practice, data are imported from two sources. For a period of time in 2012, the data have been recorded¹ as a sequence of the following tuples:

• streetSegId: a unique identifier for a street segment,
• armNumber: an identifier for the arm on a street segment,
• armAngle: bearing of the arm,
• gpsArm: GPS position 20 meters into the arm,
• gpsCentroid: GPS position of the centroid of the intersection,
• aggerateCount: aggregated vehicle volume count on the arm,
• flow: flow ratio calculated as the volume divided by the highest volume that has been measured in a sliding window of a week.
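The flow ratio in the last item can be sketched as follows. This is a minimal illustration only, not the code used by the data provider: the function name flow_ratios, the use of timestamps in seconds, and the choice to include the current sample in the one-week window are all assumptions made for the example.

```python
from collections import deque

WEEK_SECONDS = 7 * 24 * 3600  # length of the sliding window


def flow_ratios(samples, window_seconds=WEEK_SECONDS):
    """Return (timestamp, ratio) pairs for time-ordered (timestamp, volume) samples.

    The ratio is the volume divided by the highest volume seen in the
    trailing window. Timestamps are assumed to be in seconds.
    """
    window = deque()  # samples that are still inside the window
    result = []
    for ts, volume in samples:
        window.append((ts, volume))
        # Drop samples that are a full window (or more) older than the current one.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        peak = max(v for _, v in window)
        result.append((ts, volume / peak if peak > 0 else 0.0))
    return result
```

For three hypothetical samples taken 6 minutes apart with volumes 10, 20 and 5, the ratios would be 1.0, 1.0 and 0.25. The linear scan for the peak keeps the sketch short; a monotonic queue would avoid it for long series.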

These samples are captured at 6-minute intervals. The more recent data from 01/01/2013 onwards are sampled every minute and are provided by DCC and IBM as a sequence of the following items:

• year, month, day, hour, minute: denoting the timestamp,
• site: measurement location,
• strategicApproachLink,
• isLink,
• detector index: index of the detector,
• degreeOfSaturation: flow/capacity,
• flow: current flow value.

These samples are used in conjunction with a file, which contains the coordinates. The detector index from the sequence refers to the lane number in the detectors.csv file.

These messages, in addition to the information that is maintained after the aggregation to the SCATS format, include additional system information that is not used in our analysis. This dataset is a sequence of tuples (z, m, t), where z is the geographic location of the observation (the sensor position), m is a metric and t is an integer. The location is either a detector index, or a vector consisting of a number of elements, including the GPS coordinates of the detector. The metric m contains:

• aggerateCount: aggregated vehicle volume count on the arm,
• flow: flow ratio calculated as the volume divided by the highest volume that has been measured in a sliding window of a week.

¹ Available at: http://www.dublinked.ie/datastore/server/FileServerWeb/FileChecker?metadataUUID=a5aaaf4ca2404e0ca02e21fc0bdf1882&filename=SCATS-Dublin.zip

The integer t is the timestamp of the 5-minute interval, counted from the POSIX epoch, i.e. the number of microseconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), 1 January 1970.
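To make the (z, m, t) representation concrete, the sketch below models one observation and shows how a microsecond timestamp could be converted to UTC and aligned to the start of its 5-minute interval. The names Observation, to_utc and interval_start are hypothetical and chosen for illustration; they do not come from the SCATS software.

```python
from datetime import datetime, timezone
from typing import NamedTuple

MICROS_PER_SECOND = 1_000_000
INTERVAL_MICROS = 5 * 60 * MICROS_PER_SECOND  # one 5-minute interval


class Observation(NamedTuple):
    z: int   # location, here simply a detector index
    m: dict  # metric, e.g. {"aggerateCount": 12, "flow": 0.4}
    t: int   # microseconds since 1 January 1970, 00:00:00 UTC


def to_utc(t_micros):
    """Convert microseconds since the epoch to a timezone-aware UTC datetime."""
    seconds, micros = divmod(t_micros, MICROS_PER_SECOND)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


def interval_start(t_micros):
    """Round a timestamp down to the start of its 5-minute interval."""
    return t_micros - t_micros % INTERVAL_MICROS
```

For instance, to_utc(1_356_998_400_000_000) corresponds to 1 January 2013, 00:00:00 UTC.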

Data Collection. SCATS Region has automated collection of operational and performance data. Traffic counts are collected on a lane-by-lane basis wherever detectors are installed. Collected data can be sent to the SCATS Central Manager for backup. If there is a failure in the communications with SCATS Central Manager, SCATS Region maintains a queue of data until the communications are restored. This ensures that there is no loss of data on the SCATS Central Manager.
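The queueing behaviour described above follows a general store-and-forward pattern, which can be sketched as below. This is an illustration of the pattern under stated assumptions, not the SCATS implementation: the class RegionBuffer and the send callable (which returns True when the central manager has accepted a record) are hypothetical.

```python
from collections import deque


class RegionBuffer:
    """Hold records locally until the central manager has accepted them."""

    def __init__(self, send):
        self._send = send        # callable(record) -> True on success
        self._pending = deque()  # records not yet delivered, oldest first

    def submit(self, record):
        """Queue a new record and try to deliver everything that is pending."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Deliver pending records in order; stop at the first failure.

        Returns the number of records still waiting.
        """
        while self._pending:
            if not self._send(self._pending[0]):
                break  # link is down, keep the record for the next attempt
            self._pending.popleft()
        return len(self._pending)
```

A record is removed from the queue only after a successful send, so an outage delays delivery but does not drop data, and the original order is preserved once the link is restored.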

SCATS Central Manager manages the connection of up to 64 SCATS Regions (8 regions are connected on the DCC system) and provides a global view of the whole system. SCATS Region is the software that is used to manage the traffic signal sites in a region. SCATS primarily manages the dynamic timing of signal phases at traffic intersections. The system uses sensors at each traffic intersection to detect vehicle presence in each lane and pedestrian demands. The vehicle sensors are inductive loops installed beneath the road surface.

Metadata: XY coordinate data for SCATS intersections. Traffic volumes for intersections. SCATS Picture is the application that is used to create or modify the site location details and site graphics stored in the SCATS Central Manager database. Metadata for the site is stored in the LX files.

Standards: SCATS Access version 6.9.2, Copyright © 2014 Roads and Maritime Services (RMS). The SCATS proprietary format is the property of RMS. The format of the data that has been passed on to the consortium can currently be made available to 3rd parties. The SCATS data stream is provided in JSON format.

Infrastructure Improvements: To add resilience, the SCATS Management System has been changed from a physical to a virtual environment.

Figure: SCATS-CMS Virtualised Environment

Quality: The SCATS data that is collected is 100% accurate in relation to the data that it receives from its sensors.

Accessibility: Information contained in SCATS Access version 6.9.2 documentation may be of a commercially sensitive nature and must not be given to any individual or organisation without prior written consent from Roads and Maritime Services. The SM data from the Dublin system has been supplied to IBM for processing and translating to an open format which can then be of use to the consortium. Access to the SCATS data is via an AWS cloud-based structure. For the VaVeL project the inclusion of other data sources from SCATS will be explored to assess their validity for inclusion as a data stream to assist in automatic incident detection. Using AWS brought the following advantages: compute services based on pay-as-you-use rates; 24x7 management of servers up to the OS layer, including a regular back-up schedule; all operating system licenses built into the price; high resilience, as it is running over 2 Availability Zones (AZ) and a storage area in AWS S3 (Storage Bucket); and a SCATS data stream provided in JSON.

Cloud Services Architecture

Assessable and Intelligible: Associated software produced and/or used in the project may be assessable by and intelligible to third parties in contexts such as scientific scrutiny and peer review (e.g. whether the minimal datasets are handled together with scientific papers for the purpose of peer review, and whether data are provided in a way that judgments can be made about their reliability and the competence of those who created them).

Legal Issues and Privacy: Dublin City Council Law Department has advised that they have no issue with the release of the data. Data that relates to system changes is stored in the SCATS application logs. No personal/confidential data is stored.

Maintenance Plan: The SCATS system is under an annual maintenance contract based on the software licences used by DCC. This is to ensure that the system has adequate support to maintain the level of service required. The SCATS configuration backups have two subdirectories: LX files, containing a backup of SCATS data that is created daily, and RAM data backups, which are also created daily. SCATS Region collects data and stores it in daily files. Each file includes the date and time at which the data was collected and the data itself. The collection of data occurs automatically. The data is retained for the period specified when configuring SCATS Region. The default is 365 days. A backup and archival system is in place to ensure that old data is still accessible outside the SCATS Region’s specified data retention period. Metadata for the site is stored in the LX files.

Table 1: SCATS DATA - Management Plan

2.2 Public Transport Data

The street map is represented as a graph, where vertices represent important locations in space for a given means of transport (e.g. road intersections for cars). Each edge represents a means of traversing between the vertices, which can involve actual movement (e.g. between two intersections) or waiting (e.g. at a bus-stop). The graph is illustrated in Figure 4.

Figure 4: The vertex-based transit graph. Cited verbatim from https://github.com/openplans/OpenTripPlanner/wiki/GraphStructure.
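As a rough illustration of such a graph, the sketch below stores vertices in a dictionary and labels each edge as movement or waiting, then finds the fastest traversal with Dijkstra's algorithm. The vertex names, edge labels and travel times are invented for the example, and this is not the OpenTripPlanner data structure itself.

```python
import heapq

# Hypothetical toy graph: vertex -> list of (neighbour, seconds, kind of edge).
GRAPH = {
    "junction_A": [("junction_B", 90, "drive"), ("stop_A", 30, "walk")],
    "stop_A": [("stop_A_departure", 120, "wait")],   # waiting at a bus stop
    "stop_A_departure": [("stop_B", 60, "ride")],
    "stop_B": [("junction_B", 20, "walk")],
    "junction_B": [],
}


def shortest_time(graph, source, target):
    """Return the minimum total seconds from source to target, or None."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == target:
            return cost
        if cost > best.get(vertex, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbour, seconds, _kind in graph.get(vertex, []):
            new_cost = cost + seconds
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return None
```

Because waiting is modelled as an ordinary weighted edge, the same search handles movement and waiting uniformly; here shortest_time(GRAPH, "junction_A", "stop_B") sums a walk, a wait and a ride.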

Figure 5: An illustration of a delay function, which gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof. (Plot of scaled travel time against utilisation from 0 to 1, showing a piecewise-linear and a piecewise-convex curve.)

In principle, the GPS data are a sequence of vectors y_{z,t}, where z is a traffic object, e.g. a bus with an on-board GPS receiver, and t is an integer, e.g. the POSIX time of the acquisition.

The overall data model is rather complex, but closely parallels those used by OpenStreetMap, OpenTripPlanner, and the General Transit Feed Specification; we hence direct the reader to the reference documentation for those.

Our custom extensions to the standard format consist of:

• the travel-time estimates, which correspond to the weights of the edges in the graph,
• the altitude data, which correspond to weights of the vertices in the graph.

The travel-time estimates are stored as delay functions and vehicle count data. A delay function gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof. See Figure 5 for an example. The delay functions are computed from the vehicle-count data (SCATS) and traces of vehicle movement (Bus GPS) described above.
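A piecewise-linear delay function of this kind can be sketched as linear interpolation between breakpoints. The breakpoint values below are made up for illustration, and the helper names utilisation and make_delay_function are assumptions; the sketch does not show how the consortium fits the functions from SCATS and bus data.

```python
from bisect import bisect_right


def utilisation(concurrent_users, max_users):
    """Ratio of the number of concurrent users to the maximum thereof."""
    return concurrent_users / max_users


def make_delay_function(breakpoints):
    """Build a piecewise-linear delay function.

    breakpoints: at least two (utilisation, travel_time) pairs, sorted by
    strictly increasing utilisation. Inputs outside the covered range are
    clamped to the first or last breakpoint.
    """
    xs = [u for u, _ in breakpoints]
    ys = [t for _, t in breakpoints]

    def delay(u):
        u = min(max(u, xs[0]), xs[-1])
        i = min(bisect_right(xs, u), len(xs) - 1)  # right end of the segment
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (u - x0) / (x1 - x0)

    return delay
```

With hypothetical breakpoints (0.0, 60 s), (0.5, 70 s) and (1.0, 120 s), a utilisation of 0.75 lies halfway along the second segment and yields 95 s.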

The vehicle GPS traces are imported from three very different data sources, even in the case of Dublin. Instead of plain coordinates, there is a more complex data model based on the General Transit Feed Specification. There, a vehicleJourney (or “route” in GTFS) is a particular instance of a journeyPattern starting at a given time. A journeyPattern is a sequence of two or more stops. In between each two stops, there are one or more blocks within a trip (or “segments” in GTFS and elsewhere)². Notice that the production time table starts at 6am and ends at 3am in Dublin.

The first source of GPS traces captures the movement of buses in Dublin in the period from 01/02/2012 till 30/04/2012 (except the days 10th till 12th February 2012) and contains the following values:

• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no,1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no,1=yes].

² Please see https://developers.google.com/transit/gtfs/reference for a detailed reference.

The second source of GPS traces captures the movement of buses in Dublin during a part of November 2012 (06/11/2012 till 30/11/2012) and contains tuples of the following elements:

• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no,1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no,1=yes].

The third source of GPS traces captures the movement of buses in Dublin during January 2013 (01/01/2013 till 31/01/2013) and contains tuples of the following elements:

• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no,1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no,1=yes].
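As an example of how these fields might be used, the sketch below averages the delay per bus line over the records that were taken at a stop. It assumes that each record has already been parsed into a dictionary keyed by the field names listed above; the parsing step, the function name mean_delay_at_stops and the sample line identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def mean_delay_at_stops(records):
    """Average the delay (in seconds) per lineId, using only atStop == 1 records.

    Negative results mean that buses on the line run ahead of schedule
    on average.
    """
    totals = defaultdict(lambda: [0, 0])  # lineId -> [sum of delays, count]
    for record in records:
        if int(record["atStop"]) != 1:
            continue  # ignore positions reported between stops
        entry = totals[record["lineId"]]
        entry[0] += int(record["delay"])
        entry[1] += 1
    return {line: total / count for line, (total, count) in totals.items()}
```

Given two at-stop records for a line "46A" with delays of 60 and -20 seconds, the function reports a mean delay of 20 seconds for that line and ignores any records with atStop equal to 0.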

Data Collection. The standardised SIRI Vehicle Monitoring Service reports current positions of vehicles that are located and monitored in an ITCS. The data receiving client system may use this data for visualisation of the vehicles in a map, in tables, lists or diagrams or for any other purpose.

Metadata: XY coordinates of the Dublin Bus stops. Distance and route patterns.

Standards: The system uses the SIRI (Service Interface for Real-time Information) protocol. SIRI specifies a European interface standard for exchanging information about the planned, current or projected performance of real-time public transport operations between different computer systems.

Infrastructure Improvements: The standardised SIRI services work based on bidirectional communication. For security reasons a virtual private network (VPN) is established. The exchange of visualisation data starts with the subscription request of the data receiving system (MORTPI). Once the request is done, trip information is transmitted by the data producer (AVLC) to the data receiving system throughout its entire validity period. The method and frequency of repetition is a matter for the data producer, but can be specified by the displaying system in the scope of the subscription.

Quality: The Service Interface for Real Time Information (SIRI) specifies a European interface standard for exchanging information about the planned, current or projected performance of real-time public transport operations between different computer systems.

Accessibility: Access to this information has been agreed as per the INSIGHT project and the same platform and access rights will apply for the VaVeL project.

Assessable and Intelligible: Associated software produced and/or used in the project may be assessable by and intelligible to third parties in contexts such as scientific scrutiny and peer review (e.g. whether the minimal datasets are handled together with scientific papers for the purpose of peer review, and whether data are provided in a way that judgments can be made about their reliability and the competence of those who created them).

Legal Issues and Privacy: Dublin City Council has advised that they have no issue with the use of the bus data and the accumulation and storage of this data by a third party for data distribution.

Maintenance Plan: The Public Transport Data and the systems which provide the data are covered under a maintenance contract. Any changes or additional requirements to the system or to provide access to data will be carried out under the supervision of the maintenance contractor and in accordance with the terms and conditions of the contract, to ensure that the integrity of the system and/or the data provided by the system is not compromised.

Archiving and Preservation: The Public Transport Data systems are an integral part of the DCC traffic systems and it is our aim to ensure that the integrity of the system and access to the system by third parties, as agreed, is maintained now and into the future. Any access agreements made for data during the lifetime of this project will be made under the current and subsequent maintenance contracts. These will be reviewed as part of any new maintenance contract and every effort will be made to facilitate the access to data now and into the future. Any changes to the system to allow for expansion or upgrade will be made in such a manner that all stakeholders are consulted and informed of subsequent changes.

Table 2: Public Transport Data - Management Plan

2.3 Closed-Circuit Television Data

Closed Circuit Television (CCTV) has been in use by Dublin City Council Environment and Transportation department for over 20 years, with 280 camera installations at present. The use of traffic cameras is an essential tool for traffic management in the city, in conjunction with an adaptive traffic control system, SCATS. Currently the Traffic Control Centre operators use the CCTV cameras to manually scan the traffic network to detect, verify and manage incidents. Selections of cameras are displayed on the Audio Visual wall and there is rotation of the entire CCTV camera list on one IP input. Each operator has access to Indigo Vision on their desktop, which can be customized to display CCTV combinations as required. Traffic surveillance is an integral part of the traffic management system, and the closer the time of the detection of the incident to the time of its occurrence, the greater the impact the traffic control centre operator can have in effectively managing it. It would also be in the scope of the research to assess how this CCTV data could be combined with other sensory data from SCATS, weather data and public transport data in detecting incidents on the traffic network.

The Traffic CCTV system consists of 2 backend systems running side by side, Meyertech Analogue CCTV and Indigo Vision IP CCTV. Every camera in the system is available in both Analogue and IP format. This redundancy is to ensure that one system is always available to the Control Centre. The analogue cameras are available in IP by encoding the stream using Indigo Vision 10, and the same also applies for IP streams, which are decoded using Indigo Vision hardware and software to make them available to the Meyertech system. All analogue cameras are compressed to IP via an Indigo Vision 9000 encoder to H264 format. The codec can operate at Cif / 2 Cif / 4 Cif at a variable bandwidth. The current operation stream 1 is set to 2048 kbs, and stream 2, the “Mobile Centre” stream for remote access, is set to 1024 kbs. Transmission of images from site is normally by high quality fibre optic cable and using uncompressed digital transmission equipment, ensuring no errors are introduced to the image prior to reaching the station equipment in DCC. Where a site has no fibre transmission available, the analogue camera is compressed on site by an Indigo Vision 9000 codec and transmitted to a fibre point via an NGW Express IP VPN Tunnel. The IP from site is transported to the CCTV stack via fibre and the stream is then pointed to an Indigo Vision decoder, which allows the video to be viewed on the Meyertech system. Web images from a selection of cameras are made available on the Dublin City Council web site using Fusion Capture software. The Fusion Capture application provides images to the web site only if the camera is in the “home position”, which is when the camera is zoomed out. These images are updated every 10 minutes, an interval that depends on the number of cameras in the cycle when in the home position, and are not updated if the camera is in use by an operator.

Metadata Data on the XY coordinates for the CCTV camera locations has beensupplied to IBM and the consortium.

Standards Standards operated by DCC are currently ONVIF compliant. An advisorynote from the CCTV contrac on ONVIF there are several differentlayers and it does not always work seamlessly. ONVIF standards applyto IP only.

Infrastructure Improvements

It should be noted that the Fusion Capture technology was designed 12 years ago. Compatible computer components for the system design are increasingly hard to obtain today. The system currently runs on Windows XP and has some driver issues with the capture card. No updates or patches are available.

For Fusion Capture images, it should be noted that several operators are able to set a preset, which can be set anywhere, zoomed in or out. Fusion Capture has no way of knowing what the camera is looking at, or even whether the camera has responded to the request to “Goto” Preset 1.

As part of the VaVeL project, work with the consortium will explore the possibilities of using the CCTV cameras as sensors, whereby the cameras can be trained to detect incidents and automatically alert the traffic control operators that there is an incident on the traffic network. This could result in a faster, more efficient response to incidents, which in turn could reduce traffic congestion. To develop this research, video data that captures the scene for different incident types will be used to train algorithms to provide incident detection capability.

Discussions are currently underway with the CCTV maintenance contractor to formalise how the requirements of this project will be included under the current maintenance framework agreement, and how these will be documented to provide data for future use.


Quality The Meyertech system provides analogue 1 Vpp video images. Indigo takes a 1 Vpp image and compresses it to IP at CIF / 2CIF / 4CIF with variable bandwidth. The Fusion Capture compression is done by the capture card in the XP machine; the compression would be low, but details are unknown.

Accessibility There is an API available as part of the SDK from Indigo Vision, and this is only released under an NDA. The SDK contains commercially sensitive information and will not be released or distributed for general use. This issue is currently under discussion between DCC and the maintenance contractor, who are investigating the option of providing an “isolated” sample of some cameras to the third party until the development is complete.

Assessable and Intelligible

The data produced and/or used in the project is usable by third parties even a long time after the collection of the data. This applies to the CCTV images which are currently available on the DCC website and to the accumulation and storage of these images by a third party from the DCC website.

Legal Issues and Privacy

Dublin City Council Law Department has advised that they have no issue with the use of the CCTV images which are currently available on the DCC website, or with the accumulation and storage of these images by a third party from the DCC website. A select number of cameras are made available to the public on Dublin City Council's traffic home page. Other agencies with access to the cameras include the other three local authorities in the Dublin Region, Railway Procurement Agency, Dublin Port Tunnel, Dublin Bus and An Garda Siochana.


Maintenance Plan CCTV is an integral part of the DCC traffic management infrastructure, and it is DCC's policy to maintain and enhance its CCTV system, as it has done since 1989.

The CCTV system is covered under a maintenance contract which includes: CCTV out-station equipment; maintenance of Indigo Vision equipment and Meyertech equipment located at all remote operator user sites; maintenance of wireless and radio link communication equipment; maintenance of all in-station Traffic Control Centre CCTV equipment, including recording equipment, hard drives, fans, filters, and monitors; maintenance and cleaning of CCTV equipment, poles, connections, wiring, and housing sealing; supply, installation, testing, and commissioning of CCTV cameras, encoders, CCTV monitors and work stations, video recording facilities, CCTV masts and poles, mini pillars, and cabinets, including all traffic management and safety requirements associated with the works; supply and installation of CCTV camera poles in Dublin City and environs; supply and installation of all communications equipment associated with CCTV; and supply, installation, testing and commissioning of all equipment and software supplied as part of the contract.

Any changes or additional requirements to the system, or to provide access to data, will be carried out under the supervision of the maintenance contractor and in accordance with the terms and conditions of the contract, to ensure that the integrity of the system and/or the data provided by the system is not compromised.

Archiving and Preservation. The CCTV system is an integral part of the traffic management system and is envisaged to remain so into the future. It is our aim to ensure that the integrity of the system, and access to the system by third parties as agreed, is maintained now and into the future. Any access agreements made for data during the lifetime of this project will be made under the current and subsequent maintenance contracts. These will be reviewed as part of any new maintenance contract, and every effort will be made to facilitate access to data now and into the future. Any changes to the system to allow for expansion or upgrade will be made in such a manner that all stakeholders are consulted and informed of subsequent changes.

Table 3: Closed-Circuit Television Data - Management Plan

2.4 Measurements of Weather and Pollution

Ireland’s National Roads Authority (NRA) maintains a network of sensor stations around Dublin city, each of which samples a variety of environmental factors at ten-minute intervals. As part of the initial data-collection effort, we have created a tool which pulls information from thirteen


of these stations into a central database. At present, our focus is on creating a historical archive for future exploitation rather than providing the data in real time, and as such the data is harvested only once per day; this can, of course, be changed at a later date to account for the project’s evolving requirements. The database also contains meta-information about the various data points, allowing human-readable reports to be generated with ease.

The database can be queried using standard SQL. It is currently only accessible from within IBM, but it can be easily migrated to another location as necessary.

The full list of stations from which these sensor data are drawn is provided in Table 4, while some of the more interesting points captured by the database are highlighted in Table 5. A visualisation of some of this data is shown in Figure 6.

Table 4: NRA stations

Dublin Port Tunnel
M1 Drogheda Bypass
M1 Dublin Airport
M11 Bray Bypass
M4 Enfield
M50 Blanchardstown Master
M50 Blanchardstown Slave
M50 Dublin Airport
M50 Sandyford Bypass Tipping Bucket
M50 Sandyford Master
M7 Newbridge Bypass
M7 Portlaoise Bypass
N81 Tallaght

Table 5: Illustrative NRA datapoints

Code  Description          Unit
CL    Cloud State          Status Code: Clear, Cloud, Cloud and Rain
PW    Present Weather      Status Code: 0 (unobstructed) to 99 (tornado)
WL    Water Layer          mm
SL    Snow Layer           mm
IL    Ice Layer            mm
RH    Relative Humidity    %
PR    Precipitation Total  mm
RI    Rain Intensity       mm/h
P     Pressure             hPa
T     Air Temperature      °C
TS    Surface Temperature  °C
VI    Visibility           m
WD    Wind Direction       °
WS    Wind Speed           m/s
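As a minimal sketch of the kind of SQL query such an archive could support, the Python snippet below builds a small in-memory SQLite database and computes a per-station average for one data point. The table and column names (`readings`, `station`, `code`, `ts`, `value`) and all sample values are hypothetical, since the schema of the actual database is not described in this document; only the data-point codes and station names are taken from Tables 4 and 5.

```python
import sqlite3

# Hypothetical schema: one row per (station, data-point code, timestamp).
# The schema of the real archive is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (station TEXT, code TEXT, ts TEXT, value REAL)"
)

# Made-up sample rows at ten-minute intervals, for illustration only.
rows = [
    ("M50 Dublin Airport", "T", "2016-05-01 10:00:00", 11.0),
    ("M50 Dublin Airport", "T", "2016-05-01 10:10:00", 12.0),
    ("N81 Tallaght", "T", "2016-05-01 10:00:00", 9.5),
    ("N81 Tallaght", "RH", "2016-05-01 10:00:00", 80.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)


def average_by_station(connection, code):
    """Return {station: mean value} for one data-point code."""
    cursor = connection.execute(
        "SELECT station, AVG(value) FROM readings "
        "WHERE code = ? GROUP BY station ORDER BY station",
        (code,),
    )
    return dict(cursor.fetchall())


# Average air temperature (code T) per station.
averages = average_by_station(conn, "T")
```

Keeping the data-point code in its own column, as assumed here, would let a single query shape serve every row of Table 5.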

Alternatively, weather data are available through The Weather Company. In January 2016, IBM acquired The Weather Company’s B2B, mobile and cloud-based web properties, weather.com, Weather Underground, The Weather Company brand and WSI, its global business-to-business brand. Such data can be accessed, e.g., via wunderground (see footnote 3). These can be queried

3 http://www.wunderground.com


Figure 6: NRA data visualisation. Arrows indicate wind speed and direction; heatmap blobs indicate cumulative rain intensity at each station, in mm/h

for weather information at a particular location, e.g. Dublin, by issuing the following request (the <key> has to be generated in advance by registering on the wunderground website):

http://api.wunderground.com/api/<key>/hourly10day/q/Ireland/Dublin.json

As a result, a JSON object is returned which contains the following fields:

FCTTIME: the time of the weather forecast
temp: the temperature
condition: the weather condition, e.g. “Rain”
icon: an icon to depict on a map, e.g. “rain”
icon_url: link to an icon for graphical user interfaces
humidity: humidity in percent
feelslike: the perceived temperature
and many undocumented fields.
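The request and the field list above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a client for the service: the URL pattern is the one quoted above, the field written here as `icon_url` is assumed to carry an underscore, and the sample entry, including the nested shape of `FCTTIME`, `temp` and `feelslike`, is made up. The helper names `hourly_url` and `summarise` are hypothetical.

```python
import json

API_ROOT = "http://api.wunderground.com/api"

# Fields named in the list above; everything else is ignored.
DOCUMENTED_FIELDS = (
    "FCTTIME", "temp", "condition", "icon", "icon_url",
    "humidity", "feelslike",
)


def hourly_url(key, country, city):
    """Build the hourly10day request URL in the form quoted above."""
    return f"{API_ROOT}/{key}/hourly10day/q/{country}/{city}.json"


def summarise(entry):
    """Keep only the documented fields of one forecast entry.

    Fields missing from the entry are reported as None.
    """
    return {name: entry.get(name) for name in DOCUMENTED_FIELDS}


# Made-up forecast entry; the values and nesting are illustrative only.
sample = json.loads(
    '{"FCTTIME": {"hour": "14"}, "temp": {"metric": "12"},'
    ' "condition": "Rain", "icon": "rain", "humidity": "81",'
    ' "feelslike": {"metric": "10"}, "uvi": "1"}'
)
summary = summarise(sample)
url = hourly_url("YOUR_KEY", "Ireland", "Dublin")
```

Filtering to a fixed list of names keeps downstream code independent of the undocumented fields, which may change without notice.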

2.5 Social Media

Further input to the system is provided by Twitter Inc., a social network operating a short messaging service. Twitter issues a stream of messages (“tweets”) up to 140 characters long, optionally including one or more “hashtags”, that is, arbitrary words preceded by a hash character, used to denote topics to which the message relates (e.g., #dublin). Tweets may also include links to websites and other auxiliary data; see Figure 7 for some examples.


Metadata The metadata are described above and at https://www.wunderground.com/weather/api/

Standards The standards are described at https://www.wunderground.com/weather/api/

Infrastructure Improvements

None to be disclosed at the moment

Quality The data quality is discussed at https://www.wunderground.com/weather/api/

Accessibility IBM has unrestricted access to the complete history of data, current data, and weather forecasts. This access has not been shared with the other partners.

Assessable and Intelligible

The intelligibility issues are discussed at https://www.wunderground.com/weather/api/

Legal Issues and Privacy

IBM will make the data available for commercial licensing, aiming for long-term availability into the future.

Maintenance Plan a) Archiving and Preservation: IBM aims for long-term preservation of the data into the future. b) Usable beyond the original purpose for which it was collected: No.

Table 6: Weather Data - Management Plan

“N3: Heavy delays from the M50 to J3 due to a collision on the R121 at Blanchardstown SC. Traffic is down to one lane in both directions.”

“M50: Emergency services are at the scene before J5 Finglas. Middle and right lanes now blocked. Traffic almost back to the M1/M50 roundabout”

Figure 7: Sample Tweets from Live Drive.

The Twitter web application and its public API allow developers to retrieve a substream of messages based on a given set of criteria: specific hashtags, for instance, or tweets produced by a certain user. The stream is a sequence of tweets, which primarily consist of:

tweetId: a unique tweet identifier
date: integer, POSIX time of the tweet publication
twitterUserId: Twitter user identifier
coordinate: geo-localization of the tweet
messageText: tweet text.

The stream is indexed by hashtag and clustered according to a given set of criteria (e.g. GPS co-ordinates). The Twitter substream generated within a geographical area of interest can be isolated by following relevant users (e.g., @livedrive) and monitoring certain hashtags (e.g., #dublin).

Note that the input stream is not limited to users who are already known to the system; all tweets by Twitter users who are publicly tweeting in the area of interest are collected.


In more detail, we can access the following fields in the Twitter stream:

tweetId: a unique tweet ID, assigned by Twitter
twitterUserId: the Twitter ID of the tweeting user. Unique per Twitter account.
twitterUserScreenName: mnemonic user name (login name)
latitude: geographic latitude of the sending device
longitude: geographic longitude of the sending device
messageText: the actual tweet in raw textual form. It may include non-ASCII characters.
messageDate: timestamp of message sending. Format 'YYYY-MM-DD hh:mm:ss'
location: tweet location place name, for gazetteer lookups
countryCode: ISO short country code (two characters)
retweetStatusId: reference (tweetId) to the embedded retweet (original tweet). 0 if not a retweet, -1 if not set or invalid value.
isRetweeted: boolean flag ('y' or 'n') indicating whether the tweet contained a retweet
replyStatusId: reference (tweetId) if the tweet is in reply to another tweet. -1 if not an answer-to tweet.
replyUserId: reference (twitterUserId) to the author of the original tweet being answered. -1 if not an answer-to tweet.
isFavorite: boolean flag ('y' or 'n') indicating whether the tweet was marked as a favorite
followersCount: number of Twitter users currently following the tweet author
followingCount: number of Twitter users the tweet author currently follows
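A subset of the fields listed above can be modelled as a typed record, as in the following Python sketch. It assumes that each tweet arrives as a dictionary of strings keyed by the field names above; that input shape, the helper names (`parse_tweet`, `is_retweet`, `in_area`), the sample record and the approximate bounding box around Dublin are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    tweetId: int
    twitterUserId: int
    messageText: str
    messageDate: datetime
    latitude: float
    longitude: float
    retweetStatusId: int
    isFavorite: bool


def parse_tweet(raw):
    """Convert one raw record (a dict of strings) into a typed Tweet."""
    return Tweet(
        tweetId=int(raw["tweetId"]),
        twitterUserId=int(raw["twitterUserId"]),
        messageText=raw["messageText"],
        # Timestamp format as documented: 'YYYY-MM-DD hh:mm:ss'.
        messageDate=datetime.strptime(raw["messageDate"], "%Y-%m-%d %H:%M:%S"),
        latitude=float(raw["latitude"]),
        longitude=float(raw["longitude"]),
        retweetStatusId=int(raw["retweetStatusId"]),
        # Boolean flags are encoded as 'y' or 'n'.
        isFavorite=raw["isFavorite"] == "y",
    )


def is_retweet(tweet):
    """retweetStatusId is 0 for non-retweets and -1 when unset or invalid."""
    return tweet.retweetStatusId > 0


def in_area(tweet, south, west, north, east):
    """True if the sending device lies inside the bounding box."""
    return south <= tweet.latitude <= north and west <= tweet.longitude <= east


# Made-up record and an approximate, hypothetical bounding box around Dublin.
raw = {
    "tweetId": "1001", "twitterUserId": "42",
    "messageText": "M50: Middle and right lanes now blocked.",
    "messageDate": "2016-05-01 08:15:00",
    "latitude": "53.39", "longitude": "-6.39",
    "retweetStatusId": "0", "isFavorite": "n",
}
tweet = parse_tweet(raw)
DUBLIN_BOX = (53.2, -6.5, 53.5, -6.0)  # south, west, north, east
```

A bounding-box test of this kind corresponds to the spatial filtering described in the text; it would sit alongside, not replace, filtering by user or hashtag.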

Batch data samples are retrieved from the Twitter API using a spatial query. Furthermore, for Dublin, the Live Drive Radio data set results from Twitter messages sent by people driving in Dublin who report traffic hazards to the local radio.

Metadata The metadata are described above and at https://dev.twitter.com/rest/public

Standards The standards are described above and at https://dev.twitter.com/rest/public

Infrastructure Improvements

None to be disclosed at the moment

Quality None to be disclosed at the moment

Accessibility In November 2014, IBM Corporation entered into a licensing agreement with Twitter Inc., which allows for unlimited access to the data by IBM. A limited subset of the data is publicly available at https://dev.twitter.com/rest/public

Assessable and Intelligible

See https://dev.twitter.com/rest/public

Legal Issues and Privacy

IBM cannot share this access with the consortium.

Maintenance Plan a) Archiving and Preservation: Data may be archived by Twitter Inc. b) Usable beyond the original purpose for which it was collected: Within IBM Corporation.

Table 7: Social Media Data - Management Plan


3 City of Warsaw Data

3.1 Real time trams location

Warsaw operates 25 tram lines. The total tram line length exceeds 360 km. The number of trams in use is as follows:

Morning peak: 414
Mid-day: 313
Afternoon peak: 421

The real-time tram location web service exposes information about the geographical location of trams. The data set contains information about all vehicles active at the moment. The data is updated every 15 seconds. This dataset has been released by Warsaw Trams.

Metadata (data category from CKAN): real time, trams, online data. The metadata describing this dataset are used only in documentation, as keywords (e.g. real time, trams, online data). The CKAN platform, used as middleware for exposing CoW open data, supports metadata in the form of RDF, but this functionality is currently not used by the CoW IT department.

Standards RESTlike Web Services, polling, data refreshed every 15 seconds. CoW tram locations are exposed using RESTlike Web Services, via the HTTP GET method. Information about tram locations is refreshed every 15 seconds and must be retrieved by developers using a request-response model (polling). The geographical coordinates are floating-point numbers compliant with EPSG 4326 (WGS 84). Example: 20.992 for the longitude, 51.242 for the latitude.

Infrastructure Improvements

The data caching period on the MUNDO backend was changed from 30 to 15 seconds for the VaVeL project

Quality The quality of the data is currently being analyzed by consortium members. Some issues have already been resolved.

Accessibility Publicly available open data after registration and acceptance of the terms and conditions.

Assessable and Intelligible

Documentation available

Legal Issues and Privacy

Open data registration on api.um.warszawa.pl needed. Terms of use are available at the https://api.um.warszawa.pl website.

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. Online data are available via the API. Historical data collected in CSV files are available at the CoW Cloud. b) Usable beyond the original purpose for which it was collected: not possible.

Table 8: Real Time Tram Locations - Management Plan

The real-time tram location web service exposes information about the geographical location of trams. The data set contains information about all vehicles active at the moment. Warsaw


operates 25 tram lines. The average number of vehicles varies from 414 during the morning peak, through 313 at mid-day, to 421 in the afternoon. The data is updated every 15 seconds. This dataset has been released by the Warsaw Trams CoW agency.

The HTTP response parameters are listed below:

Time - datetime - timestamp
Lat - float - latitude (GPS)
Lon - float - longitude (GPS)
FirstLine - string - number of the first line realized by the vehicle
Lines - string - the numbers of all lines (for multiline brigades there will be more than one line)
Brigade - string - the number of the brigade
Status - string - task status; can assume the values “RUNNING” or “FINISHED”
LowFloor - bool - indicates whether the tram is a low-floor one (1 = yes, 0 = no)
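The request-response (polling) model and the response parameters listed above can be sketched in Python as follows. The sketch assumes that each vehicle arrives as a dictionary keyed by the parameter names above, and it leaves `Time` as a string because its exact format is not specified here. The `fetch` callable stands in for the HTTP GET request, whose URL and parameters are not given in this section; `TramPosition`, `parse_position` and `poll` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass
class TramPosition:
    time: str          # kept as text; the timestamp format is not specified
    lat: float
    lon: float
    first_line: str
    lines: str
    brigade: str
    running: bool
    low_floor: bool


def parse_position(record):
    """Convert one response record (a dict) into a TramPosition."""
    return TramPosition(
        time=str(record["Time"]),
        lat=float(record["Lat"]),
        lon=float(record["Lon"]),
        first_line=str(record["FirstLine"]),
        lines=str(record["Lines"]),
        brigade=str(record["Brigade"]),
        running=record["Status"] == "RUNNING",
        low_floor=bool(int(record["LowFloor"])),  # 1 = yes, 0 = no
    )


def poll(fetch, rounds, interval=15.0, sleep=time.sleep):
    """Call fetch() `rounds` times, pausing `interval` seconds in between.

    `fetch` is any callable returning a list of records; here it replaces
    the real HTTP GET request.
    """
    snapshots = []
    for i in range(rounds):
        snapshots.append([parse_position(r) for r in fetch()])
        if i < rounds - 1:
            sleep(interval)
    return snapshots


def fake_fetch():
    """Stand-in data source returning one made-up vehicle."""
    return [{
        "Time": "2016-05-01 08:00:00", "Lat": 52.23, "Lon": 21.01,
        "FirstLine": "17", "Lines": "17", "Brigade": "5",
        "Status": "RUNNING", "LowFloor": 1,
    }]


pauses = []  # records the requested pauses instead of actually sleeping
snapshots = poll(fake_fetch, rounds=2, interval=15.0, sleep=pauses.append)
```

Matching the polling interval to the 15-second refresh period avoids both stale reads and redundant requests.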

Trams location historical data. Archival data are available to the consortium. Data from 21 March 2016 are collected in CSV files and are available at the CoW Cloud.

Time - datetime - timestamp
Lat - float - latitude (GPS)
Lon - float - longitude (GPS)
FirstLine - string - number of the first line realized by the vehicle
Lines - string - the numbers of all lines (for multiline brigades there will be more than one line)
Brigade - string - the number of the brigade
FirstLine Brigade - concatenation of the two fields (tram ID for a day)
Status - string - task status; can assume the values “RUNNING” or “FINISHED”
LowFloor - bool - indicates whether the tram is a low-floor one (1 = yes, 0 = no)

3.2 Bus Data

285 bus lines
Total line length: 4379.9 km
Number of buses in use: 1729 (1366 operated by MZA)
Morning peak: 1644
Mid-day: 1035
Afternoon peak: 1619

Historical data. Archive data from 21 April 2016 are available in the CoW cloud. Data format: Side Number (vehicle number), Unix timestamp, latitude GPS, longitude GPS, Line, Brigade

Example: 1525,1461362407,21.170208,52.160407,146,4
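A minimal Python sketch for reading a row in this format is given below. One point needs care: Warsaw lies near 52° N, 21° E, so in the example row the first coordinate (21.170208) looks like a longitude and the second (52.160407) like a latitude, which is the reverse of the order stated in the format description. The sketch therefore treats the larger of the two values as the latitude; this is an assumption that only holds for locations around Warsaw and should be checked against the actual files. The function name `parse_bus_row` and the output keys are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

EXAMPLE = "1525,1461362407,21.170208,52.160407,146,4"


def parse_bus_row(row):
    """Parse one CSV row: side number, Unix time, two coordinates, line, brigade."""
    side_number, stamp, coord_a, coord_b, line, brigade = row
    a, b = float(coord_a), float(coord_b)
    # Assumption: near Warsaw the latitude (about 52) exceeds the
    # longitude (about 21), so the larger value is taken as the latitude.
    lat, lon = (a, b) if a > b else (b, a)
    return {
        "side_number": side_number,
        # Unix timestamps count seconds since 1970-01-01 00:00:00 UTC.
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "lat": lat,
        "lon": lon,
        "line": line,
        "brigade": brigade,
    }


records = [parse_bus_row(r) for r in csv.reader(io.StringIO(EXAMPLE))]
```

The same function can be applied to a whole archive file by passing an open file object to `csv.reader` instead of the in-memory string.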

3.3 19115 Non-emergency notification system

The API enables locals and visitors to report various issues to the City, such as failures, defects and non-critical threats concerning, e.g., the state of roads, snow removal, damage and acts of vandalism. The API also allows users to obtain information filtered by keys.

Information available:


Metadata CKAN: bus

Standards Websocket interface exposed by MZA (Warsaw’s Buses Authority), accessible only in the internal CoW network

Infrastructure Improvements

None

Quality The quality of the data is currently being analyzed by consortium members. Some issues have already been resolved.

Accessibility Public data with restricted access. Available only to VaVeL consortium members.

Assessable and Intelligible

Documentation available

Legal Issues and Privacy

Public data with restricted access. Available only to VaVeL consortium members.

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 9: Bus Data - Management Plan

Metadata CKAN keywords: bus, historical data

Standards Flat CSV files with bus locations

Infrastructure Improvements

Dedicated data collector was developed

Quality The quality of the data is currently being analyzed by consortium members. Some issues have been resolved.

Accessibility Public data with restricted access. Available only to VaVeL consortium members.

Assessable and Intelligible

Documentation available

Legal Issues and Privacy

Public data with restricted access. Available only to VaVeL consortium members.

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 10: Bus Historical Data - Management Plan


siebelEventId: event ID in the city CRM system
deviceType: type of device used to submit the notification
street: notification street name; this field is only used in notifications registered by CRM operators. It is validated against the city's street names dictionary.
street2: notification street name; this field is only used in notifications submitted by citizens (i.e. notifications generated outside of the CRM). Not validated.
district: district of the notification
city: city of the notification
houseNumber: building number of the notification
aparmentNumber: apartment number of the notification
category: notification category
subcategory: notification subcategory (dictionary value, as when reporting)
event: the process of intervention (dictionary value, as when reporting)
description: notification description
createDate: creation date
notificationNumber: notification number
xCoordWGS84: latitude of the notification in the WGS84 standard
yCoordWGS84: longitude of the notification in the WGS84 standard
xCoordOracle: latitude of the notification in the Oracle Spatial standard
yCoordOracle: longitude of the notification in the Oracle Spatial standard
notificationType: INCIDENT("Awaria/Interwencja"), INFORMATIONAL("Informacyjne"), COMPLAINT("Reklamacja"), STATUS("Status sprawy/zgłoszenia"), PUBLIC INFORMATION("Wniosek o dostęp do informacji publicznej"), FREEFORM("Wolne wnioski i uwagi")
statuses: list of notification statuses (changeDate: change date, status: status, description: description)
Source: notification source: API("API"), CALL("CALL"), CKM("CKM"), MAIL("MAIL"), MOBILE("MOBILE"), PHONE("Phone"), PORTAL("PORTAL"), SMS("SMS"), WEB("Web"), WEBCHAT("WEBCHAT"), EMPTY("brak")

3.4 Public transport timetables

The public transport timetables data set is managed by Warsaw's Public Transport Authority and stored in a MySQL database. This information is exposed to developers as an Open API by the api.um.warszawa.pl portal, in the form of RESTlike Web Services. The exposed API provides information about timetables and about bus or tram lines for the selected stops.

The API provides three methods. The first (getBusstopId) is a prerequisite for using the others and is used to obtain the stop identifier. The other two (getTimetable and getLines) are used to obtain data about the lines and timetables related to the stop.
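The dependency between the three methods can be sketched in Python as follows. Only the method names getBusstopId, getLines and getTimetable come from the text; the parameter names (`name`, `busstopId`), the shape of the return values, and the `call` function that stands in for the actual web service request are hypothetical, and the sample data are made up.

```python
def stop_overview(call, stop_name):
    """Resolve a stop name to its ID first, then query lines and timetable.

    `call(method, params)` is a placeholder for the real web service
    request; its parameter names are assumptions, not the portal's API.
    """
    stop_id = call("getBusstopId", {"name": stop_name})
    return {
        "stop_id": stop_id,
        "lines": call("getLines", {"busstopId": stop_id}),
        "timetable": call("getTimetable", {"busstopId": stop_id}),
    }


# Stand-in backend with made-up data, so the sketch runs without a network.
calls = []


def fake_call(method, params):
    calls.append((method, dict(params)))
    canned = {
        "getBusstopId": "7009",
        "getLines": ["17", "33"],
        "getTimetable": [{"line": "17", "time": "08:15:00"}],
    }
    return canned[method]


overview = stop_overview(fake_call, "Centrum")
```

Passing the transport in as a function keeps the call order visible and lets the same logic be exercised against recorded responses.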

Historical data. The public transport historical timetables data set is managed by Warsaw's Public Transport Authority and stored in a MySQL database. This information is exposed to developers as an Open API by the api.um.warszawa.pl portal, in the form of RESTlike Web Services. The exposed API


Metadata (data category from CKAN): not emergency issue, real time, online data

Standards RESTlike Web Services, data from Siebel CRM database

Infrastructure Improvements

Data exposed during MUNDO project, no additional improvements needed

Quality The quality of the data is currently being analyzed by consortium members

Accessibility Publicly available open data after registration and acceptance of the terms and conditions

Assessable and Intelligible

Documentation available.

Legal Issues and Privacy

Open data registration on api.um.warszawa.pl needed. Terms of use are available at the https://api.um.warszawa.pl website.

Maintenance Plan a) Archiving and Preservation: This data set will be utilized as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. Historical data for a period of approx. 2 months are available via the API. b) Usable beyond the original purpose for which it was collected: not possible.

Table 11: 19115 Data - Management Plan

Metadata (data categories from CKAN): transport, timetables

Standards RESTlike Web Services, data from the ZTM MySQL database exposed by the MUNDO platform

Infrastructure Improvements

Data exposed during MUNDO project, no additional improvements needed

Quality The quality of the data is currently being analyzed by consortium members.

Accessibility Publicly available open data after registration and acceptance of the terms and conditions.

Assessable and Intelligible

Documentation available.

Legal Issues and Privacy

Open data registration on api.um.warszawa.pl needed. Terms of use are available at the http://www.ztm.waw.pl/?c=628&l=1 website.

Maintenance Plan a) Archiving and Preservation: The current timetable is available via the API. This data set will be utilized as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 12: Public transport timetables - Management Plan


Metadata (data categories from CKAN): transport, timetables historical data

Standards RESTlike Web Services, data from the ZTM MySQL database exposed by the MUNDO platform.

Infrastructure Improvements

New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server

Quality The quality of the data is currently being analyzed by consortium members

Accessibility Public data with restricted access. Available only to VaVeL consortium members after registration and acceptance of the terms and conditions. Documentation available.

Assessable and Intelligible

Documentation available.

Legal Issues and Privacy

Restricted access, only for VaVeL consortium members

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. The timetable for the past 4 months is available via the API. b) Usable beyond the original purpose for which it was collected: not possible.

Table 13: Public transport historical timetables - Management Plan

allows developers to obtain information about timetables and about bus or tram lines for the selected stops at a selected time in the past.

3.5 Bus & trams stops locations

The Bus & trams stops locations API offers developers information about the current geographical location of bus and tram stops in Warsaw. This dataset has been released by the Public Transport Authority (ZTM).

3.6 Park & Ride

The Park & Ride parking information data set contains information about P&R parking stations in the City of Warsaw for selected geographical areas.

3.7 Bike roads

The Bike roads data set contains information about bike roads in the City of Warsaw.

3.8 Bike stations location (Veturilo)

The Bike stations location (Veturilo) data set contains information about Veturilo rent-a-bike stations in the City of Warsaw for selected geographical areas.


Metadata (data categories from CKAN): bus, tram, stops, geolocation

Standards RESTlike Web Services

Infrastructure Improvements

New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server

Quality The quality of the data is currently being analyzed by consortium members

Accessibility Public data with restricted access. New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server

Assessable and Intelligible

Documentation available

Legal Issues and Privacy

Available only to VaVeL consortium members after registration and acceptance of the terms and conditions. Additional terms of use are available at the http://www.ztm.waw.pl/?c=628&l=1 website.

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. Descriptions of the current bus & tram stop parameters are accessible via the API. b) Usable beyond the original purpose for which it was collected: not possible.

Table 14: Bus and Trams Stop Locations - Management Plan

Metadata (data categories from CKAN): park & ride, static data, geolocation

Standards RESTlike Web Services, WFS

Infrastructure Improvements

None

Quality The quality of the data will be analyzed by consortium members.

Accessibility Public data with restricted access.

Assessable and Intelligible

Documentation available.

Legal Issues and Privacy

Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 15: Park & Ride - Management Plan


Metadata (data categories from CKAN): bike roads, static data, vector map, WFS

Standards RESTlike Web Services, WFS

Infrastructure Improvements

None

Quality The quality of the data will be analyzed by consortium members

Accessibility Public data with restricted access.

Assessable and Intelligible

Documentation available.

Legal Issues and Privacy

Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 16: Bike Roads - Management Plan

Metadata (data categories from CKAN): city bike, static data, Veturilo

Standards RESTlike Web Services, WFS

Infrastructure Improvements

None

Quality The quality of the data must be analyzed by consortium members

Accessibility Public data with restricted access.

Assessable and Intelligible

Documentation available. WFS standard documentation is publicly available at http://www.opengeospatial.org/standards/wfs

Legal Issues and Privacy

Public data with restricted access

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 17: City bike stations - Management Plan


Metadata (data categories from CKAN): metro entrances, static data

Standards RESTlike Web Services, WFS

Infrastructure Improvements

None

Quality The quality of the data will be analyzed by the consortium members.

Accessibility Public data with restricted access.

Assessable and Intelligible

Documentation available. WFS standard documentation is publicly available at http://www.opengeospatial.org/standards/wfs

Legal Issues and Privacy

Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html

Maintenance Plan a) Archiving and Preservation: This data set will be used as a source of information for the realization of use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 18: Metro entrances - Management Plan

3.9 Metro Entrances

The Metro Entrances data set exposes information about metro entrances in Warsaw. The API allows retrieving information for a selected geographical area and filtering data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in vector map form: http://www.opengeospatial.org/standards/wfs.
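As an illustration of retrieving features for a selected geographical area, the sketch below builds a WFS GetFeature request URL restricted to a bounding box. The endpoint URL, the layer name and the coordinates are hypothetical placeholders, since this document does not list the actual values; only the key-value parameters (service, version, request, typeName, bbox, maxFeatures) come from the WFS 1.1.0 standard.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, for illustration only;
# the real values are not given in this document.
WFS_ENDPOINT = "https://example.org/wfs"
LAYER_NAME = "metro_entrances"


def build_getfeature_url(endpoint, type_name, bbox, max_features=None):
    """Build a WFS 1.1.0 GetFeature URL limited to a bounding box.

    bbox is (min_x, min_y, max_x, max_y), in the axis order and
    coordinate reference system expected by the service.
    """
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(v) for v in bbox),
    }
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    return endpoint + "?" + urlencode(params)


# Example bounding box (made-up values roughly around Warsaw).
url = build_getfeature_url(
    WFS_ENDPOINT, LAYER_NAME, (20.9, 52.1, 21.1, 52.3), max_features=50
)
print(url)
```

The same request shape would apply to the other WFS-based data sets by changing the layer name; filtering on keys would need an additional filter parameter whose content depends on the attributes the service actually exposes.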

3.10 Address points

The Address points data set offers information on addresses in the City of Warsaw for a selected geographical area. The API allows retrieving information for the selected geographical area and filtering data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in vector map form. The Address points dataset is maintained by the Office of Surveying and Cadastre (BGiK) of the City of Warsaw and exposed using a URL (endpoint).

3.11 Streets

This data set exposes information about the location of streets in the City of Warsaw. The API allows retrieving information for a selected geographical area and filtering data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in vector map form. The Streets dataset is maintained by the Office of Surveying and Cadastre (BGiK) of the City of Warsaw and exposed using a URL.

Metadata (data categories from CKAN): address points, static data, geolocation

Standards: RESTlike Web Services, WFS

Infrastructure Improvements: None

Quality: The quality of the data will be analyzed by consortium members.

Accessibility: Public data with restricted access.

Assessable and Intelligible: Documentation available. WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs

Legal Issues and Privacy: Public data with restricted access.

Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for realizing the use cases defined by CoW and stored in the data processing system installed at CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 19: Address points - Management Plan

Metadata (data categories from CKAN): streets, static data, geolocation

Standards: RESTlike Web Services, WFS

Infrastructure Improvements: None

Quality: The quality of the data will be assessed by the consortium members.

Accessibility: Public data with restricted access.

Assessable and Intelligible: WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs

Legal Issues and Privacy: Public data with restricted access.

Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for realizing the use cases defined by CoW and stored in a data processing system installed at CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 20: Streets - Management Plan

Figure 8: ZTM-Warsaw’s Twitter Account

3.12 City of Warsaw Twitter

The ZTM Public Transport Authority publishes information about public transport on Twitter: https://twitter.com/ztm_warszawa. It primarily covers public transport failures, their resolution, and sudden timetable changes resulting from unplanned events (demonstrations, accidents, etc.).

Standards: Data from Twitter is available through the Twitter Open API. Details can be found at https://dev.twitter.com/overview/documentation. Twitter offers two Open API sets for developers:

• REST API, mostly dedicated to off-line access - https://dev.twitter.com/rest/public

• Stream API, dedicated to real-time data access - https://dev.twitter.com/streaming/overview

Access to the Twitter API requires a developer account (OAuth credentials are needed) and an application registered in the Twitter developers portal (https://dev.twitter.com/apps).

3.13 RSS services

The Public Transport Authority runs an RSS (Rich Site Summary) service with information about public transport. The RSS service contains 5 main categories:

• News

• Press releases

• Changes in public transport

• Public procurement

• Difficulties

Metadata (data categories from CKAN): hashtags

Standards: RESTlike Web Services, WFS

Infrastructure Improvements: External data provider - N/A

Quality: The quality of the data will be assessed by VaVeL’s consortium members.

Accessibility: Private data with open access. Data publicly available after registration.

Assessable and Intelligible: Publicly available open data delivered by the Warsaw Public Transport Authority.

Legal Issues and Privacy: Acceptance of the Twitter Terms and Conditions is needed: https://dev.twitter.com/overview/terms/agreement-and-policy

Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for realizing the use cases defined by CoW and stored in the data processing system installed at CoW.

Table 21: Twitter Data - Management Plan

Metadata: RSS (XML) meta-data

Standards: RSS, multiple feeds:

• News: http://www.ztm.waw.pl/rss.php?l=1&IDRss=1

• Press releases: http://www.ztm.waw.pl/rss.php?l=1&IDRss=2

• Changes in public transport: http://www.ztm.waw.pl/rss.php?l=1&IDRss=3

• Public procurement: http://www.ztm.waw.pl/rss.php?l=1&IDRss=4

• Changes in public transport: http://www.ztm.waw.pl/rss.php?l=1&IDRss=6

Infrastructure Improvements: Data is delivered by the Warsaw Public Transport Authority. Changes are not possible.

Quality: The quality of the data will be assessed by VaVeL’s consortium members.

Accessibility: Open data exposed on the Internet for everyone. Publicly available open data delivered by the Warsaw Public Transport Authority.

Assessable and Intelligible: RSS is a well-known standard for publishing information. The RSS content is in Polish.

Legal Issues and Privacy: Open Data

Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for realizing the use cases defined by CoW and stored in a data processing system installed at CoW.

Table 22: RSS services - Management Plan
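To illustrate how the RSS feeds listed in Table 22 could be consumed, the following minimal sketch extracts the title, link and publication date of each item from an RSS 2.0 document. The sample XML is made up for the example and only follows the generic RSS 2.0 layout (channel, item, title, link, pubDate); the actual ZTM feeds may carry different or additional elements, and their text is in Polish.

```python
import xml.etree.ElementTree as ET

# Made-up sample in generic RSS 2.0 shape; real feed content will differ.
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Tram line diverted</title>
      <link>http://example.org/news/1</link>
      <pubDate>Wed, 04 May 2016 08:00:00 +0200</pubDate>
    </item>
    <item>
      <title>Timetable change</title>
      <link>http://example.org/news/2</link>
      <pubDate>Thu, 05 May 2016 09:30:00 +0200</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return a list of dicts with title, link and pubDate for each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["pubDate"], "-", entry["title"])
```

In practice the XML text would be downloaded from one of the feed URLs before being passed to the parser; the download step is left out here so that the example runs without network access.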

Metadata: None.

Standards: XML data exposed using a dedicated URL.

Infrastructure Improvements: Data is delivered by the Warsaw Public Bikes operator (external system). Changes are not possible.

Quality: The quality of the data will be assessed by the VaVeL consortium.

Accessibility: Data exposed on the Internet for everyone. Publicly available data delivered by the Warsaw Public Bikes operator.

Assessable and Intelligible: The XML file structure is clear. However, there is no API documentation.

Legal Issues and Privacy: Public data. Unfortunately, the API usage terms and conditions are currently not accessible on the Nextbike web page.

Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for realizing the use cases defined by CoW and stored in the data processing system installed at CoW. b) Usable beyond the original purpose for which it was collected: depends on confirmation from the Warsaw Public Bikes operator, Nextbike.

Table 23: Veturilo stations - Management Plan

3.14 Veturilo stations (Warsaw City Bike system)

Warsaw’s City Bike system (Veturilo) exposes an API that provides information about bike availability at Veturilo stations. Near-real-time Warsaw Public Bikes information is provided by the portal http://nextbike.net (data is refreshed every minute).

3.15 Orange subscribers location statistics

The dataset of mobile subscribers' location statistics contains statistical information on the number of terminals that communicated with given cells of the Public Land Mobile Network (PLMN).

Subscriber activity is detected on the basis of network events (13 different events are taken into account) that are triggered together with voice and xMS communication. Inactive terminals are periodically updated according to network and terminal settings (usually every 1-2 hrs). For the VaVeL project, data samples will be delivered from the urban area for selected cells located in Warsaw, for a defined period of time. The raw stream of data from mobile cells in Warsaw is between 300 and 400 events per second. The volume of raw data is between 18 and 20 M events per 24-hour period for the Warsaw area (data from about 6000 cells). Statistical information is collected in csv files. The average size of a file with the aggregated events from 24 hours for the Warsaw area is about 8-9 MB.

Metadata: None

Standards: Mobile data statistics are calculated and collected by a dedicated network system based on events from the MSS (Mobile Switching Centre Server). Statistics are provided as flat csv files. File names have the following form: statistics hours YYYY-MM-DD.csv. A file is generated daily at 1:00 am and contains data from the previous day. For example, a file named stats hours 2016-05-04.csv is created on May 5 at 1:00 AM and includes statistics from May 4. The data structure in the files contains the following columns: date & hour,x,y,number of events, where the csv file columns are listed below:

• number of events - numeric (1,10) - number of unique MSISDNs detected in the given period

• x - float - latitude (GPS) of the cell center

• y - float - longitude (GPS) of the cell center

• radius - int - cell radius

• date & hour - date and hour (e.g. 2016-05-05 01:00:00 means 2016-05-05 between 0:00 am and 1:00 am)

Infrastructure Improvements: To expose the data described in this chapter, the event-collecting system was re-developed. The changes include:

• automation of statistics recording

• data compression

• automation of historical data cleaning

The system used for data collection is a pre-production instance and has numerous restrictions, e.g. limited storage and limited performance.

Quality: Because of the aforementioned limitations, there is a possibility that not all events from all cells will be reported. Due to its single-node architecture, the test instance cannot provide a high SLA (no redundancy mechanism is implemented). Since the event generation mechanism is related to TDM events, not all subscriber activities are reported (e.g. the statistics might not contain information about mobile data usage).

Accessibility: For data analysis, this data set will be sent via e-mail as an encrypted attachment to the consortium leader. The password will be sent over a separate communication channel (e.g. SMS). Because of restrictions in Polish telecommunication law and internal Orange Polska regulations, this data set can be used only by consortium members for the VaVeL project. Open access to this dataset and any sharing of it with 3rd parties is prohibited.

Assessable and Intelligible: Sharing raw data with other parties (non-consortium members) is currently not possible because of telecommunication law (operators cannot process any telecommunication data without the specific and clear consent of the end user of the terminal) and internal Orange regulations (the privacy policy does not allow sharing any data that could reveal business information about the network).

The statistics used for the VaVeL project are prepared using a validated mechanism. Based on all available network events on the MSS, we calculate the number of unique MSISDNs that appeared in a defined cell in a defined quantum of time (most often one hour). The mechanism is simple, without any other conditions, so the data are not further processed or calibrated. In line with legal regulations, this data aggregation procedure is not reversible. In addition, if data is exposed close to real time, there are additional restrictions under which statistics lower than 10 for a cell cannot be displayed. In the case of the VaVeL project this is not implemented, because we transfer historical location statistics. This data set is well documented in a data manual provided to the consortium.

Legal Issues and Privacy: Based on current regulations in Polish telecommunication law and the EU directives on the processing of telecommunication transmission data, we list below some points that require attention. From these documents we outline some basic rules on how operators may use data:

1. Where personal data are concerned, the highest level of protection required by law should always be applied (Telecommunication Law (TL), Law on the protection of personal data (LPPD), and EU directives, in particular Directive 2002/58/EC on e-Privacy and Directive 95/46/EC on personal data protection).

2. Processing of personal data requires the consent of the data subject.

3. Anonymised data can be used without the consent of the subject only if they are aggregated from the first step of processing and cannot be associated with an individual.

4. Everything that can be legally conducted by OLP can also be performed by subcontractors (based on appropriate contracts). The use of personal data by other companies should be specified in the consent for data usage given by the subject.

The laws protecting the privacy of operators' data do not make any special exceptions for the R&D area. For the VaVeL project the rules are therefore the same as for any other use of operators' data for purposes other than delivering the telco services ordered by the end user or terminal owner. On that basis we investigate and deploy techniques that allow the statistics permitted by law to be calculated in the closest network area (MSS).

That is why we will be able to share with other participants only those aggregated statistics which are safe from a legal point of view. This mechanism of statistics calculation was investigated in several proof-of-concept projects, and we evaluated its reliability as a source of location statistics. We prepare extrapolations and compare operators' data with calculations based on optical observation (camera-based).

On the other hand, we also believe that, even for aggregated statistics, data taken from all users have better quality and value than data taken only from those who give consent. Users who provide consent for the use of their location data are in effect a sub-sample, but the selection mechanism cannot be treated as “random selection”, and we are not able to predict the bias this introduces into the data (especially for small areas of observation).

Maintenance Plan: a) Archiving and Preservation: Finally, the Orange Mobile Subscriber’s Location statistics will be used as a source of information for realizing the use cases defined by CoW and stored in the data processing system installed at CoW. b) Usable beyond the original purpose for which it was collected: not possible.

Table 24: Orange subscribers location statistics - Management Plan
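The conventions described in Table 24 can be sketched in a few lines of Python: an hour label marks the end of the hour it covers, a file generated at 1:00 am covers the previous day, and each statistic is the number of unique subscribers seen in a cell during an hour. This is only an illustrative sketch under stated assumptions, not the operator's implementation: the function names, the event tuple layout (cell id, hour label, subscriber id) and the optional `min_count` parameter are hypothetical. The threshold of 10 mentioned for near-real-time exposure is shown as an option only, since the table states it is not applied to the historical data delivered to VaVeL.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

HOUR_FORMAT = "%Y-%m-%d %H:%M:%S"


def label_to_interval(label):
    """Return (start, end) for an hour label that marks the END of the hour."""
    end = datetime.strptime(label, HOUR_FORMAT)
    return end - timedelta(hours=1), end


def data_date_for_file(generated_on):
    """A file generated at 1:00 am covers the previous calendar day."""
    return generated_on - timedelta(days=1)


def unique_counts(events, min_count=None):
    """Count distinct subscriber ids per (cell, hour label).

    events: iterable of (cell_id, hour_label, subscriber_id) tuples
            (hypothetical layout, for illustration only).
    min_count: if set, results lower than this value are dropped.
    """
    seen = defaultdict(set)
    for cell_id, hour_label, subscriber_id in events:
        seen[(cell_id, hour_label)].add(subscriber_id)
    counts = {key: len(ids) for key, ids in seen.items()}
    if min_count is not None:
        counts = {k: v for k, v in counts.items() if v >= min_count}
    return counts


# Made-up events: subscriber "a" appears twice in cell c1 but counts once.
hour = "2016-05-05 01:00:00"
events = [("c1", hour, "a"), ("c1", hour, "a"), ("c1", hour, "b"), ("c2", hour, "a")]

print(label_to_interval(hour))
print(data_date_for_file(date(2016, 5, 5)))
print(unique_counts(events))
```

Because only the count per cell and hour is kept and the subscriber identifiers are discarded, the output cannot be mapped back to individual terminals, which matches the non-reversibility requirement stated in the table.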
