VaVeL H2020-688380
D1.3 - Data Management Plan
National and Kapodistrian University of Athens
June 7, 2016
Status: Final
Scheduled Delivery Date: 31/05/2016
D1.3, Version 1.0, May 2016 2 http://www.vavel-project.eu/
Document History
• (June 3rd, 2016) Version 1.0 Submitted to the EC and uploaded to VaVeL website.
• (June 4th, 2016) Version 1.1, added links and comments on potential open data repositories that the consortium will investigate as candidates for data publication.
• (June 7th, 2016) Version 1.2, minor changes.
Executive summary
This document includes information about the data sources that the VaVeL consortium will work and conduct research on. More specifically, for each data source the partners have defined a data management plan. The plan consists of information about legal issues, privacy, infrastructure changes, archiving, maintenance, standards and accessibility. The document will be regularly updated as more information becomes available and data issues are resolved. Please check the website of the project (www.vavel-project.eu) under the deliverables section for updates. Early on, the consortium has agreed to make every effort to provide open access to as many datasets as possible. This document reflects this continuous effort.
Document Information
Contract Number: H2020-688380
Acronym: VaVeL
Name: Variety, Veracity, VaLue: Handling the Multiplicity of Urban Sensors
Project URL: http://www.vavel-project.eu/
EU Project Officer: First Name - Last Name
Deliverable: D1.3 Data Management Plan
Work Package Number: WP1
Date of Delivery: 31/05/2016; Actual: 31/05/2016
Status: Final
Nature: Report
Distribution Type: Public
Authoring Partner: National and Kapodistrian University of Athens
QA Partner: IBM
Contact Person: Ioannis Katakis [email protected]; Dimitrios Gunopulos [email protected]
Fax:
List of Contributors: Ioannis Katakis (UoA), Dimitrios Gunopulos (UoA), Jaroslaw Legierski (OPL), Izabella Krzeminska (OPL), Robert Kunicki (CoW), Jakub Marecek (IBM), Maggie O’Donnell (DCC), Aaron O’Connor (DCC).
Project Information
This document is part of a research project funded by the Horizon H2020 programme of the Commission of the European Communities as project number 688380. The Beneficiaries in this project are:
No. | Name | Short Name | Country
1 | National and Kapodistrian University of Athens | UoA | Greece
2 | Technische Universitat Dortmund | TUD | Germany
3 | Technion - Israel Institute of Technology | Technion | Israel
4 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Fraunhofer | Germany
5 | IBM Ireland Limited | IBM | Ireland
6 | AGT International | AGT GROUP (R&D) GMBH | Germany
7 | Orange Polska S.A. | OPL | Poland
8 | Dublin City Council | DCC | Ireland
9 | City of Warsaw | CoW | Poland
10 | Warsaw University of Technology | WUT | Poland
Table of Contents
1 Introduction
2 Dublin City Council Data
2.1 Data from a Traffic Management System (SCATS)
2.2 Public Transport Data
2.3 Closed-Circuit Television Data
2.4 Measurements of Weather and Pollution
2.5 Social Media
3 City of Warsaw Data
3.1 Real time trams location
3.2 Bus Data
3.3 19115 Non-emergency notification system
3.4 Public transport timetables
3.5 Bus & trams stops locations
3.6 Park & Ride
3.7 Bike roads
3.8 Bike stations location (Veturilo)
3.9 Metro Entrances
3.10 Address points
3.11 Streets
3.12 City of Warsaw Twitter
3.13 RSS services
3.14 Veturilo stations (Warsaw City Bike system)
3.15 Orange subscribers location statistics
Index of Figures
1 Real-Time Data from City of Warsaw
2 Real-Time Data from City of Dublin
3 VaVeL’s history in publishing data
4 The vertex-based transit graph. Cited verbatim from https://github.com/openplans/OpenTripPlanner/wiki/GraphStructure.
5 An illustration of a delay function, which gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof.
6 NRA data visualisation. Arrows indicate wind speed and direction; heatmap blobs indicate cumulative rain intensity at each station, in mm/h
7 Sample Tweets from Live Drive.
8 ZTM-Warsaw’s Twitter Account
1 Introduction
In this document the VaVeL consortium presents information about the data that will be exploited in the context of the project. It is by design that the major data providers are also members of the consortium. These are:
• The Dublin City Council, which will provide traffic, public transport, weather and video data.
• The City of Warsaw, which will provide public transport data, data on emergency calls and citizen reporting.
• Orange Polska, which will provide subscribers’ location data.
Figure 1: Real-Time Data from City of Warsaw
Figure 2: Real-Time Data from City of Dublin
It is important to note that most of the above data sources will be provided in real-time. This document serves as a management plan for the above data sources. More specifically, it addresses the following issues and questions.
Meta-data: Details about the meta-information that accompanies our data (if available).
Standards: We mention any standards that are followed by the data or by the way the consortium or the data providers provide access to the data.
Infrastructure Improvements: The infrastructure providing the data is as important as the data themselves. Hence, we present information on necessary infrastructure changes that were required in order to improve any data-related aspects (accessibility, information richness, volume, etc).
Quality: Data veracity is one of the main objectives of VaVeL. We provide brief information about the consortium’s efforts to address data quality issues if necessary. Depending on the case, ‘quality’ might imply cleaning, pre-processing, adding meta-data, transforming to a more convenient format or providing easier access.
Accessibility: We explicitly describe the access level provided for each data source for various user groups (consortium, public, etc). On top of that we outline the technical means that are necessary to access the data. We also report on our efforts to make the data easier to discover.
Assessable and Intelligible: We describe the means that we provide in order to make the data easier to use and its content and value easier to understand.
Legal Issues & Privacy: The consortium provides details about legal issues related to each dataset as well as the path to resolving them.
Maintenance Plan: In this section we will describe the maintenance plan for each dataset. More specifically, we will discuss the archiving of historical data and the potential for maintaining and utilizing the data after the end of the project.
Some of the above items were inspired by the document “Guidelines on Data Management in Horizon 2020”, published by the European Commission, Directorate-General for Research & Innovation (Version 1.0, 15 February 2016).
VaVeL’s Open Data Strategy and History. The consortium is willing to publicly share and make easily accessible and discoverable as many datasets as possible. Many datasets are already available online (see following sections) and every effort will be made to make even more data sources accessible. On top of that, the consortium intends to make available tools that analyze urban data. More importantly, the consortium has a history in publishing open data. Dublinked (see Figure 3a) is a web platform hosting multiple data resources originating from Dublin. Similarly, the City of Warsaw along with its technical partners (Orange) has a history in making APIs for processing and accessing data open (api.um.warszawa.pl - see Figure 3b).
(a) The Dublinked website in Dublin (b) Open APIs in Warsaw
Figure 3: VaVeL’s history in publishing data
Open Data Portals. On top of the above, the VaVeL consortium is currently investigating the exploitation of additional portals and ways to disseminate, archive, register and index its datasets and APIs in order to make the resources more discoverable. These include the following:
• European Union Open Data Portal (http://data.europa.eu/euodp/en/data), where many European organizations archive open data sets.
• The programmable Web (http://www.programmableweb.com/apis), where more than 15,000 APIs are indexed. This repository is especially suitable for the APIs available from the City of Warsaw.
2 Dublin City Council Data
2.1 Data from a Traffic Management System (SCATS)
The Sydney Co-ordinated Adaptive Traffic System (SCATS) provides information on vehicular traffic at fixed sensor locations as spatio-temporal time series. The SCATS data are produced by aggregating the primary source data that are collected by the Dublin SCATS traffic sensor monitoring system.
The primary data are given in the Strategic Monitoring (SM) format. Each sensor sends messages with varying frequency (depending on the location, conditions and other factors). The SM format specifies the message parameters. In practice, data are imported from two sources. For a period of time in 2012, the data have been recorded (see footnote 1) as a sequence of the following tuples:
• streetSegId: a unique identifier for a street segment,
• armNumber: an identifier for the arm on a street segment,
• armAngle: bearing of the arm,
• gpsArm: GPS position 20 meters into the arm,
• gpsCentroid: GPS position of the centroid of the intersection,
• aggerateCount: aggregated vehicle volume count on the arm,
• flow: flow ratio calculated as the volume divided by the highest volume that has been measured in a sliding window of a week.
These samples are captured at 6-minute intervals. The more recent data from 01/01/2013 onwards are sampled every minute and are provided by DCC and IBM as a sequence of the following items:
• year, month, day, hour, minute: denoting the timestamp
• site: measurement location
• strategicApproachLink
• isLink
• detector index: index of the detector
• degreeOfSaturation: flow/capacity
• flow: current flow value
These samples are used in conjunction with a file, which contains the coordinates. The detector index from the sequence refers to the lane number in the detectors.csv file.
These messages, in addition to the information that is maintained after the aggregation to the SCATS format, include additional system information that is not used in our analysis. This dataset is a sequence of tuples (z, m, t), where z is a geographic location of the observation (the sensor position), m is a metric and t is an integer. The location is either a detector index, or a vector consisting of a number of elements, including the GPS coordinates of the detector. The metric m contains:
1 Available at: http://www.dublinked.ie/datastore/server/FileServerWeb/FileChecker?metadataUUID=a5aaaf4ca2404e0ca02e21fc0bdf1882&filename=SCATS-Dublin.zip
• aggerateCount: aggregated vehicle volume count on the arm,
• flow: flow ratio calculated as the volume divided by the highest volume that has been measured in a sliding window of a week.
The integer t is the timestamp of the 5-minute interval, expressed as the number of microseconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), 1 January 1970 (the POSIX epoch).
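The timestamp convention and the flow ratio defined above can be illustrated with a minimal Python sketch. It is not the consortium's implementation: the function names are hypothetical, the input is assumed to be a time-ordered sequence of (timestamp in microseconds, aggregate count) pairs for a single arm, and the sliding-window maximum is assumed to include the current sample.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WEEK_US = 7 * 24 * 3600 * 1_000_000  # one week, in microseconds


def to_utc(t_us):
    """Convert microseconds since the epoch to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=t_us)


def flow_ratios(samples):
    """Yield (t_us, ratio) for time-ordered (t_us, aggregate_count) pairs.

    Assumption: the one-week sliding maximum includes the current sample.
    """
    window = deque()  # samples from the trailing week, oldest first
    for t_us, count in samples:
        window.append((t_us, count))
        while window[0][0] <= t_us - WEEK_US:
            window.popleft()  # drop samples older than one week
        peak = max(c for _, c in window)
        yield t_us, (count / peak if peak else 0.0)
```

Rescanning the window for the maximum keeps the sketch short; for long series a monotonic queue would avoid the repeated scan.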
Data Collection. SCATS Region has automated collection of operational and performance data. Traffic counts are collected on a lane-by-lane basis wherever detectors are installed. Collected data can be sent to the SCATS Central Manager for backup. If there is a failure in the communications with SCATS Central Manager, SCATS Region maintains a queue of data until the communications are restored. This ensures that there is no loss of data on the SCATS Central Manager.
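The queueing behaviour described above is a store-and-forward pattern. The following Python sketch illustrates the idea only; it does not reflect the SCATS software, and the class name and the `send` callback (assumed to return True on successful delivery) are hypothetical.

```python
from collections import deque


class RegionBuffer:
    """Hypothetical stand-in for a region that queues data while the link is down."""

    def __init__(self, send):
        self._send = send        # callable(record) -> True if delivered
        self._pending = deque()  # records not yet delivered, oldest first

    def submit(self, record):
        """Queue a new record and try to deliver everything outstanding."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Deliver queued records in order; stop at the first failure."""
        while self._pending:
            if not self._send(self._pending[0]):
                return False     # link still down, keep the backlog
            self._pending.popleft()
        return True
```

A record is removed from the queue only after the callback reports success, so records collected during an outage are delivered in their original order once the link returns.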
SCATS Central Manager manages the connection of up to 64 SCATS Regions (8 regions are connected on the DCC system) and provides a global view of the whole system. SCATS Region is the software that is used to manage the traffic signal sites in a region. SCATS primarily manages the dynamic timing of signal phases at traffic intersections. The system uses sensors at each traffic intersection to detect vehicle presence in each lane and pedestrian demands. The vehicle sensors are inductive loops installed beneath the road surface.
Metadata: XY coordinate data for SCATS intersections. Traffic volumes for intersections. SCATS Picture is the application that is used to create or modify the site location details and site graphics stored in the SCATS Central Manager database. Metadata for the site is stored in the LX files.
Standards: SCATS Access version 6.9.2, Copyright © 2014 Roads and Maritime Services (RMS). The SCATS proprietary format is the property of RMS. The format of the data that has been passed on to the consortium can currently be made available to 3rd parties. The SCATS data stream is provided in JSON format.
Infrastructure Improvements: To add resilience, the SCATS Management System has been changed from a physical to a virtual environment.
SCATS-CMS Virtualised Environment
Quality: The SCATS data that is collected is 100% accurate in relation to the data that it receives from its sensors.
Accessibility: Information contained in the SCATS Access version 6.9.2 documentation may be of a commercially sensitive nature and must not be given to any individual or organisation without prior written consent from Roads and Maritime Services. The SM data from the Dublin system has been supplied to IBM for processing and translating to an open format which can then be of use to the consortium. Access to the SCATS data is via an AWS cloud-based structure. For the VaVeL project, the inclusion of other data sources from SCATS will be explored to assess their validity for inclusion as a data stream to assist in automatic incident detection. Using AWS brought the following advantages: compute services based on pay-as-you-use rates; 24x7 management of servers up to the OS layer, including a regular back-up schedule; all operating system licenses built into the price; high resilience, as it is running over 2 Availability Zones (AZ) with a storage area in AWS S3 (Storage Bucket); and the SCATS data stream provided in JSON.
Cloud Services Architecture
Assessable and Intelligible: Associated software produced and/or used in the project may be assessable by and intelligible to third parties in contexts such as scientific scrutiny and peer review (e.g. whether the minimal datasets are handled together with scientific papers for the purpose of peer review, and whether data are provided in a way that judgments can be made about their reliability and the competence of those who created them).
Legal Issues and Privacy: Dublin City Council Law Department has advised that they have no issue with the release of the data. Data stored in the SCATS application logs relates to system changes. No personal/confidential data is stored.
Maintenance Plan: The SCATS system is under an annual maintenance contract based on the software licences used by DCC. This is to ensure that the system has adequate support to maintain the level of service required.
SCATS data backup files for configuration backups have two subdirectories: LX files, containing backups of SCATS data which are created daily, and RAM data backups, which are also created daily.
SCATS Region collects data and stores it in daily files. Each file includes the date and time at which the data was collected and the data itself. The collection of data occurs automatically. The data is retained for the period specified when configuring SCATS Region. The default is 365 days. A backup and archival system is in place to ensure that old data is still accessible outside the SCATS Region’s specified data retention period. Metadata for the site is stored in the LX files.
Table 1: SCATS DATA - Management Plan
2.2 Public Transport Data
The street map is represented as a graph, where vertices represent important locations in space for a given means of transport (e.g. road intersections for cars). Each edge represents a means of traversing between the vertices, which can involve actual movement (e.g. between two intersections) or waiting (e.g. at a bus-stop). The graph is illustrated in Figure 4.
Figure 4: The vertex-based transit graph. Cited verbatim from https://github.com/openplans/OpenTripPlanner/wiki/GraphStructure.
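As a rough illustration of such a graph, the Python sketch below stores, for each vertex, a list of outgoing edges labelled with a traversal time and a kind (movement or waiting), and computes the fastest traversal with Dijkstra's algorithm. The vertex names, edge kinds and times are invented for the example and do not come from the project data or from OpenTripPlanner.

```python
import heapq

# Hypothetical toy graph: vertex -> list of (neighbour, seconds, kind).
GRAPH = {
    "junction_A": [("junction_B", 90, "drive"), ("stop_1", 30, "walk")],
    "stop_1": [("stop_1_boarded", 120, "wait")],   # waiting at a bus-stop
    "stop_1_boarded": [("junction_B", 60, "ride")],
    "junction_B": [],
}


def shortest_time(graph, source, target):
    """Return the minimum total traversal time in seconds, or None if unreachable."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == target:
            return cost
        if cost > best.get(vertex, float("inf")):
            continue  # stale heap entry
        for neighbour, seconds, _kind in graph[vertex]:
            new_cost = cost + seconds
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return None
```

Modelling waiting as an ordinary edge means a single shortest-path routine handles both movement and waiting.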
Figure 5: An illustration of a delay function, which gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof. (Horizontal axis: utilisation, from 0 to 1; vertical axis: scaled travel time; curves: piecewise-linear and piecewise-convex.)
In principle, the GPS data are a sequence of vectors y_{z,t}, where z is a traffic object, e.g. a bus with an on-board GPS receiver, and t is an integer, e.g. the POSIX time of the acquisition.
The overall data model is rather complex, but closely parallels those used by OpenStreetMap, OpenTripPlanner, and the General Transit Feed Specification; we hence direct the reader to the reference documentation for those.
Our custom extensions to the standard format consist of:
• the travel-time estimates, which correspond to the weights of the edges in the graph
• the altitude data, which correspond to weights of the vertices in the graph
The travel-time estimates are stored as delay functions and vehicle count data. A delay function gives the travel-time along a segment of a road as a function of its utilisation, i.e. the ratio of the number of concurrent users to the maximum thereof. See Figure 5 for an example. The delay functions are computed from the vehicle-count data (SCATS) and traces of vehicle movement (Bus GPS) described above.
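A piecewise-linear delay function of the kind described above can be sketched in Python as follows. The breakpoints are placeholders chosen for illustration, not values estimated from the Dublin data, and the function name is hypothetical.

```python
from bisect import bisect_right


def make_delay_function(breakpoints):
    """Build a piecewise-linear delay function.

    breakpoints: at least two (utilisation, scaled_travel_time) pairs with
    strictly increasing utilisation values.
    """
    xs = [u for u, _ in breakpoints]
    ys = [t for _, t in breakpoints]

    def delay(utilisation):
        u = min(max(utilisation, xs[0]), xs[-1])   # clamp to the known range
        i = min(bisect_right(xs, u), len(xs) - 1)  # index of the right breakpoint
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (u - x0) / (x1 - x0)

    return delay


# Hypothetical breakpoints: travel time grows faster as the segment fills up.
delay = make_delay_function([(0.0, 0.6), (0.5, 0.7), (1.0, 1.0)])
utilisation = 30 / 40  # concurrent users divided by the maximum
scaled_time = delay(utilisation)
```

Here `utilisation` follows the definition in the text (concurrent users over the maximum), and inputs outside the range covered by the breakpoints are clamped rather than extrapolated.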
The vehicle GPS traces are imported from three very different data sources, even in the case of Dublin. Instead of plain coordinates, there is a more complex data model based on the General Transit Feed Specification. There, a vehicleJourney (or “route” in GTFS) is a particular instance of a journeyPattern starting at a given time. A journeyPattern is a sequence of two or more stops. In between each two stops, there are one or more blocks within a trip (or “segments” in GTFS and elsewhere; see footnote 2). Notice that the production time table starts at 6am and ends at 3am in Dublin.
The first source of GPS traces captures the movement of buses in Dublin in the period from 01/02/2012 till 30/04/2012 (except the days 10th till 12th February 2012) and contains the following values:
• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no, 1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no, 1=yes].
2 Please see https://developers.google.com/transit/gtfs/reference for a detailed reference.
The second source of GPS traces captures the movement of buses in Dublin during a part of November 2012 (06/11/2012 till 30/11/2012) and contains tuples of the following elements:
• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no, 1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no, 1=yes].
The third source of GPS traces captures the movement of buses in Dublin during January 2013 (01/01/2013 till 31/01/2013) and contains tuples of the following elements:
• timestamp: timestamp in microseconds since 01/01/1970 00:00:00 GMT,
• lineId: bus line identifier,
• direction: a string identifying the direction,
• journeyPatternId,
• timeFrame: the start date of the production time table (in Dublin the production time table starts at 6am and ends at 3am),
• vehicleJourneyId: a given run on the journey pattern,
• operator: bus operator, not the driver,
• congestion: boolean value [0=no, 1=yes],
• gpsPos: GPS position of the vehicle,
• delay: seconds, negative if bus is ahead of schedule,
• blockId: section identifier of the journey pattern,
• vehicleId: vehicle identifier,
• stopId: stop identifier,
• atStop: boolean value [0=no, 1=yes].
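The record layout listed above could be read into typed objects along the following lines. This is a sketch under stated assumptions, not the project's loader: it assumes comma-separated rows with the fields in the listed order and gpsPos split into two columns (longitude, then latitude). The class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class BusSample:
    timestamp: datetime        # converted from microseconds since the epoch
    line_id: str
    direction: str
    journey_pattern_id: str
    time_frame: str            # start date of the production time table
    vehicle_journey_id: str
    operator: str
    congestion: bool
    lon: float                 # assumption: gpsPos stored as lon, lat columns
    lat: float
    delay_s: int               # negative if the bus is ahead of schedule
    block_id: str
    vehicle_id: str
    stop_id: str
    at_stop: bool


def parse_row(row):
    """Convert one 15-column row of strings into a BusSample."""
    (ts, line, direction, pattern, frame, journey, operator,
     congestion, lon, lat, delay, block, vehicle, stop, at_stop) = row
    return BusSample(
        timestamp=EPOCH + timedelta(microseconds=int(ts)),
        line_id=line, direction=direction, journey_pattern_id=pattern,
        time_frame=frame, vehicle_journey_id=journey, operator=operator,
        congestion=(congestion == "1"),
        lon=float(lon), lat=float(lat),
        delay_s=int(delay),
        block_id=block, vehicle_id=vehicle, stop_id=stop,
        at_stop=(at_stop == "1"),
    )


def parse_csv(text):
    """Parse CSV text into a list of BusSample objects."""
    return [parse_row(row) for row in csv.reader(io.StringIO(text))]
```

Identifiers are kept as strings so that leading zeros in values such as journeyPatternId survive parsing.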
Data Collection. The standardised SIRI Vehicle Monitoring Service reports current positions of vehicles that are located and monitored in an ITCS. The data receiving client system may use this data for visualisation of the vehicles in a map, in tables, lists or diagrams or for any other purpose.
Metadata: XY coordinates of the Dublin Bus stops. Distance and route patterns.
Standards: The system uses the SIRI (Service Interface for Real-time Information) protocol. SIRI specifies a European interface standard for exchanging information about the planned, current or projected performance of real-time public transport operations between different computer systems.
Infrastructure Improvements: The standardised SIRI services are based on bidirectional communication. For security reasons a virtual private network (VPN) is established.
The exchange of visualisation data starts with the subscription request of the data receiving system (MORTPI). Once the request is made, trip information is transmitted by the data producer (AVLC) to the data receiving system throughout its entire validity period. The method and frequency of repetition is a matter for the data producer, but can be specified by the displaying system in the scope of the subscription.
Quality: The Service Interface for Real Time Information (SIRI) specifies a European interface standard for exchanging information about the planned, current or projected performance of real-time public transport operations between different computer systems.
Accessibility: Access to this information has been agreed as per the INSIGHT project and the same platform and access rights will apply for the VaVeL project.
Assessable and Intelligible: Associated software produced and/or used in the project may be assessable by and intelligible to third parties in contexts such as scientific scrutiny and peer review (e.g. whether the minimal datasets are handled together with scientific papers for the purpose of peer review, and whether data are provided in a way that judgments can be made about their reliability and the competence of those who created them).
Legal Issues and Privacy: Dublin City Council has advised that they have no issue with the use of the bus data and the accumulation and storage of this data by a third party for data distributed.
Maintenance Plan: The Public Transport Data and the system which provides the data are covered under a maintenance contract. Any changes/additional requirements to the system or to provide access to data will be carried out under the supervision of the maintenance contractor and in accordance with the terms and conditions of the contract to ensure that the integrity of the system and/or the data provided by the system is not compromised.
Archiving and Preservation: The Public Transport Data systems are an integral part of the DCC traffic systems and it is our aim to ensure that the integrity of the system, and access to the system by third parties as agreed, is maintained now and into the future. Any access agreements made for data during the lifetime of this project will be made under the current and subsequent maintenance contracts. These will be reviewed as part of any new maintenance contract and every effort will be made to facilitate the access to data now and into the future. Any changes to the system to allow for expansion or upgrade will be made in such a manner that all stakeholders are consulted and informed of subsequent changes.
Table 2: Public Transport Data - Management Plan
2.3 Closed-Circuit Television Data
Closed Circuit Television (CCTV) has been in use by Dublin City Council Environment and Transportation department for over 20 years with 280 camera installations at present. The use of traffic cameras is an essential tool for traffic management in the city in conjunction with an adaptive traffic control system, SCATS. Currently the Traffic Control Centre operators use the CCTV cameras to manually scan the traffic network to detect, verify and manage incidents. Selections of cameras are displayed on the Audio Visual wall and there is rotation of the entire CCTV camera list on one IP input. Each operator has access to Indigo Vision on their desktop, which can be customized to display CCTV combinations as required. Traffic surveillance is an integral part of the traffic management system and the closer the time of the detection of the incident to the time of its occurrence, the greater the impact the traffic control centre operator can have in effectively managing it. It would also be in the scope of the research to assess how this CCTV data could be combined with other sensory data from SCATS, weather data and public transport data in detecting incidents on the traffic network.
The Traffic CCTV system consists of 2 backend systems running side by side, Meyertech Analogue CCTV and Indigo Vision IP CCTV. Every camera in the system is available in both Analogue and IP format. This redundancy is to ensure that one system is always available to the Control Centre. The analogue cameras are available in IP by encoding the stream using Indigo Vision 10 and the same also applies for IP streams, which are decoded using Indigo Vision hardware and software to make them available to the Meyertech system. All analogue cameras are compressed to IP via an Indigo Vision 9000 encoder to H264 format. The codec can operate at Cif / 2 Cif / 4 Cif at a variable bandwidth. The current operation stream (stream 1) is set to 2048 kbps and the “Mobile Centre” stream for remote access (stream 2) is set to 1024 kbps. Transmission of images from site is normally by high quality fibre optic cable and using uncompressed digital transmission equipment, ensuring no errors are introduced to the image prior to reaching the station equipment in DCC. Where a site has no fibre transmission available, the analogue camera is compressed on site by an Indigo Vision 9000 codec and transmitted to a fibre point via an NGW Express IP VPN Tunnel. The IP from site is transported to the CCTV stack via fibre and the stream is then pointed to an Indigo Vision decoder which allows the video to be viewed on the Meyertech system. Web images from a selection of cameras are made available on the Dublin City Council web site using Fusion Capture software. The Fusion Capture application provides images to the web site only if the camera is in the “home position”, which is when the camera is zoomed out. These images are updated approximately every 10 minutes, depending on the number of cameras in the cycle, and are not updated if the camera is in use by an operator.
Metadata: Data on the XY coordinates for the CCTV camera locations has been supplied to IBM and the consortium.
Standards: Standards operated by DCC are currently ONVIF compliant. An advisory note from the CCTV contractor on ONVIF: there are several different layers and it does not always work seamlessly. ONVIF standards apply to IP only.
Infrastructure Improvements: It should be noted that the Fusion Capture technology was designed 12 years ago. The system design is struggling to meet the availability of compatible computer components available today. The system currently runs on Windows XP and has some driver issues with the capture card. No updates or patches are available.
For Fusion Capture images it should be noted that several operators have the capability to set a preset, which can be set anywhere, zoomed in or out. The Fusion Capture has no capability to know what the camera is looking at or even if the camera has responded to the request to “Goto” Preset 1.
As part of the VaVeL project, DCC will work with the consortium to explore the possibilities of using the CCTV cameras as sensors, whereby the cameras can be trained to detect incidents and automatically alert the traffic control operators that there is an incident on the traffic network. This could result in a faster, more efficient response to incidents, which in turn could reduce traffic congestion. To develop this research, video data that captures the scene for different incident types will be used to train algorithms to provide incident detection capability.
Discussions are currently underway with the CCTV maintenance contractor to formalise how the requirements of this project will be included under the current maintenance framework agreement and how these will be included and documented to provide data for future use.
Quality The Meyertech system provides analogue 1Vpp Video images. Indigotakes a 1Vpp image and compresses it to IP compression Cif / 2 Cif/ 4 Cif variable bandwidth. The Fusion Capture compression is doneby the card in the XP machine compression would be low but detailsunknown.
Accessibility: An API is available as part of the SDK from Indigo Vision, and this is only released under an NDA. The SDK contains commercially sensitive information and will not be released or distributed for general use. This issue is currently under discussion between DCC and the maintenance contractor, who are investigating the option to provide an “isolated” sample of some cameras to the third party until the development is complete.
Assessable and Intelligible: The data produced and/or used in the project is usable by third parties even a long time after the collection of the data. This applies to the CCTV images which are currently available on the DCC website and to the accumulation and storage of these images by a third party from the DCC website.
Legal Issues and Privacy: Dublin City Council Law Department has advised that they have no issue with the use of the CCTV images which are currently available on the DCC website and the accumulation and storage of these images by a third party from the DCC website. A select number of cameras are made available to the public on Dublin City Council’s traffic home page. Other agencies with access to the cameras include the other three local authorities in the Dublin Region, the Railway Procurement Agency, Dublin Port Tunnel, Dublin Bus and An Garda Síochána.
Maintenance Plan: CCTV is an integral part of the DCC traffic management infrastructure and it is DCC’s policy to maintain and enhance its CCTV system, as it has done since 1989.
The CCTV system is covered under a maintenance contract which includes:
• CCTV out-station equipment;
• maintenance of Indigo Vision equipment and Meyertech equipment located at all remote operator user sites;
• maintenance of wireless and radio link communication equipment;
• maintenance of all in-station Traffic Control Centre CCTV equipment, including recording equipment, hard drives, fans, filters, and monitors;
• maintenance / cleaning of CCTV equipment, poles, connections, wiring, and housing sealing;
• supply, installation, testing, and commissioning of CCTV cameras, encoders, CCTV monitors and work stations, video recording facilities, CCTV masts and poles, mini pillars and cabinets, including all traffic management and safety requirements associated with the works;
• supply and installation of CCTV camera poles in Dublin City and environs;
• supply and installation of all communications equipment associated with CCTV;
• supply, installation, testing and commissioning of all equipment and software supplied as part of the contract.
Any changes or additional requirements to the system, or to provide access to data, will be carried out under the supervision of the maintenance contractor and in accordance with the terms and conditions of the contract, to ensure that the integrity of the system and/or the data provided by the system is not compromised.
Archiving and Preservation: The CCTV system is an integral part of the traffic management system and is envisaged to remain so into the future. It is our aim to ensure that the integrity of the system, and access to the system by third parties as agreed, is maintained now and into the future. Any access agreements made for data during the lifetime of this project will be made under the current and subsequent maintenance contracts. These will be reviewed as part of any new maintenance contract, and every effort will be made to facilitate access to data now and into the future. Any changes to the system to allow for expansion or upgrade will be made in such a manner that all stakeholders are consulted and informed of subsequent changes.
Table 3: Closed-Circuit Television Data - Management Plan
2.4 Measurements of Weather and Pollution
Ireland’s National Roads Authority (NRA) maintains a network of sensor stations around Dublin city, each of which samples a variety of environmental factors at ten-minute intervals. As part of the initial data-collection effort, we have created a tool which pulls information from thirteen
of these stations into a central database. At present, our focus is on creating a historical archive for future exploitation rather than providing the data in real time, and as such the data is harvested only once per day; this can, of course, be changed at a later date to account for the project’s evolving requirements. The database also contains meta-information about the various data points, allowing human-readable reports to be generated with ease.
The database can be queried using standard SQL. It is currently only accessible from within IBM, but it can be easily migrated to another location as necessary.
The full list of stations from which these sensor data are drawn is provided in Table 4, while some of the more interesting points captured by the database are highlighted in Table 5. A visualisation of some of this data is shown in Figure 6.
Table 4: NRA stations
Dublin Port Tunnel
M1 Drogheda Bypass
M1 Dublin Airport
M11 Bray Bypass
M4 Enfield
M50 Blanchardstown Master
M50 Blanchardstown Slave
M50 Dublin Airport
M50 Sandyford Bypass Tipping Bucket
M50 Sandyford Master
M7 Newbridge Bypass
M7 Portlaoise Bypass
N81 Tallaght
Table 5: Illustrative NRA datapoints
Code   Description            Unit
CL     Cloud State            Status Code: Clear, Cloud, Cloud and Rain
PW     Present Weather        Status Code: 0 (unobstructed) to 99 (tornado)
WL     Water Layer            mm
SL     Snow Layer             mm
IL     Ice Layer              mm
RH     Relative Humidity      %
PR     Precipitation Total    mm
RI     Rain Intensity         mm/h
P      Pressure               hPa
T      Air Temperature        °C
TS     Surface Temperature    °C
VI     Visibility             m
WD     Wind Direction         °
WS     Wind Speed             m/s
Figure 6: NRA data visualisation. Arrows indicate wind speed and direction; heatmap blobs indicate cumulative rain intensity at each station, in mm/h
Alternatively, weather data are available through The Weather Company. In January 2016, IBM acquired The Weather Company’s B2B, mobile and cloud-based web properties: weather.com, Weather Underground, The Weather Company brand and WSI, its global business-to-business brand. Such data can be accessed, e.g., via wunderground (http://www.wunderground.com). The service can be queried for weather information at a particular location, e.g. Dublin, by posing the following request (the <key> has to be generated in advance by registering on the wunderground website):
http://api.wunderground.com/api/<key>/hourly10day/q/Ireland/Dublin.json
As a result, a JSON object is returned which contains the following fields:
FCTTIME: the time of the weather forecast
temp: the temperature
condition: the weather condition, e.g. “Rain”
icon: an icon to depict on a map, e.g. “rain”
icon_url: link to an icon for graphical user interfaces
humidity: humidity in percent
feelslike: the perceived temperature
and many undocumented fields.
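The request pattern and the field list above can be illustrated with a minimal Python sketch. It only builds the request URL and keeps the documented fields of one forecast entry; it performs no network call. The function names, the constant names and the sample entry (including its `wspd` field) are assumptions made for illustration, and the exact nesting of the real response is not specified here.

```python
from urllib.parse import quote

# Host and path are taken from the request shown in the text.
API_ROOT = "http://api.wunderground.com/api"

# Fields named in the text; a real response may carry further, undocumented ones.
DOCUMENTED_FIELDS = (
    "FCTTIME", "temp", "condition", "icon", "icon_url", "humidity", "feelslike",
)


def hourly_forecast_url(key: str, country: str, city: str) -> str:
    """Build the 10-day hourly forecast request for a city."""
    return f"{API_ROOT}/{quote(key)}/hourly10day/q/{quote(country)}/{quote(city)}.json"


def documented_fields(entry: dict) -> dict:
    """Keep only the documented fields of one forecast entry, ignoring the rest."""
    return {name: entry[name] for name in DOCUMENTED_FIELDS if name in entry}


# Hypothetical forecast entry, invented for illustration only.
sample_entry = {"condition": "Rain", "icon": "rain", "humidity": "87", "wspd": "n/a"}

print(hourly_forecast_url("MYKEY", "Ireland", "Dublin"))
print(documented_fields(sample_entry))
```

Filtering on an explicit list of field names keeps downstream code independent of the undocumented fields, which may change without notice.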
2.5 Social Media
Further input to the system is provided by Twitter Inc, a social network operating a short messaging service. Twitter issues a stream of messages (“tweets”) up to 140 characters long, optionally including one or more “hashtags” - that is, arbitrary words preceded with a hash character, used to denote topics to which the message relates (e.g., #dublin). Tweets may also include links to websites and other auxiliary data; see Figure 7 for some examples.
Metadata: The metadata are described above and at https://www.wunderground.com/weather/api/
Standards: The standards are described at https://www.wunderground.com/weather/api/
Infrastructure Improvements: None to be disclosed at the moment.
Quality: The data quality is discussed at https://www.wunderground.com/weather/api/
Accessibility: IBM has unrestricted access to the complete history of data, current data, and weather forecasts. This access has not been shared with the other partners.
Assessable and Intelligible: The intelligibility issues are discussed at https://www.wunderground.com/weather/api/
Legal Issues and Privacy: IBM will make the data available for commercial licensing, aiming for long-term availability into the future.
Maintenance Plan: a) Archiving and Preservation: IBM aims for long-term preservation of the data into the future. b) Usable beyond the original purpose for which it was collected: No.
Table 6: Weather Data - Management Plan
“N3: Heavy delays from the M50 to J3 due to a collision on the R121 at Blanchardstown SC. Traffic is down to one lane in both directions.”
“M50: Emergency services are at the scene before J5 Finglas. Middle and right lanes now blocked. Traffic almost back to the M1/M50 roundabout”
Figure 7: Sample Tweets from Live Drive.
The Twitter web application and its public API allow developers to retrieve a substream of messages based on a given set of criteria: specific hashtags, for instance, or tweets produced by a certain user. The stream is a sequence of tweets, which primarily consist of:
tweetId: a unique tweet identifier
date: integer, POSIX time of the tweet publication
twitterUserId: twitter user identifier
coordinate: geo-localization of the tweet
messageText: tweet text.
The stream is indexed by hashtag and clustered according to a given set of criteria (e.g. GPS co-ordinates). The Twitter substream generated within a geographical area of interest can be isolated by following relevant users (e.g., @livedrive) and monitoring certain hashtags (e.g., #dublin).
Note that the input stream is not limited to users who are already known to the system; all tweets by Twitter users who are publicly tweeting in the area of interest are collected.
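The isolation of a substream by followed users and monitored hashtags can be sketched as follows. This is a minimal illustration over records that carry the fields listed above, not the consortium's implementation; the helper names and the three sample records are assumptions, and the user identifiers are placeholders rather than real Twitter ids.

```python
import re

# A hashtag is a hash character followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(message_text: str) -> set:
    """Return the lower-cased hashtags found in a tweet text."""
    return {tag.lower() for tag in HASHTAG.findall(message_text)}


def in_substream(tweet: dict, followed_users: set, watched_tags: set) -> bool:
    """A tweet is selected if its author is followed or it carries a watched hashtag."""
    by_followed_user = tweet["twitterUserId"] in followed_users
    has_watched_tag = bool(hashtags(tweet["messageText"]) & watched_tags)
    return by_followed_user or has_watched_tag


# Hypothetical records, invented for illustration only.
tweets = [
    {"tweetId": 1, "twitterUserId": "livedrive", "messageText": "M50: delays before J5"},
    {"tweetId": 2, "twitterUserId": "someone", "messageText": "Sunny morning in #Dublin"},
    {"tweetId": 3, "twitterUserId": "someone", "messageText": "Unrelated message"},
]

selected = [t["tweetId"] for t in tweets if in_substream(t, {"livedrive"}, {"dublin"})]
print(selected)
```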
In more detail, we can access the following fields in the Twitter stream:
tweetId: a unique tweet ID, assigned by Twitter
twitterUserId: the Twitter ID of the tweeting user. Unique per Twitter account.
twitterUserScreenName: mnemonic user name (login name)
latitude: geographic latitude of the sending device
longitude: geographic longitude of the sending device
messageText: the actual tweet in raw textual form. It may include non-ASCII characters.
messageDate: timestamp of message sending. Format ’YYYY-MM-DD hh:mm:ss’
location: tweet location place name, for Gazetteer lookups
countryCode: ISO short country code (two characters)
retweetStatusId: reference (tweetId) to the embedded retweet (original tweet). 0 if not a retweet, -1 if not set or invalid value.
isRetweeted: boolean flag (’y’|’n’) indicating whether the tweet contained a retweet
replyStatusId: reference (tweetId) to the tweet being replied to. -1 if not an answer-to tweet.
replyUserId: reference (twitterUserId) to the author of the original tweet being answered. -1 if not an answer-to tweet.
isFavorite: boolean flag (’y’|’n’) indicating whether the tweet was marked as favorite
followersCount: number of Twitter users currently following the tweet author
followingCount: number of Twitter users the tweet author currently follows
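Several of these fields use string or sentinel encodings (the ’YYYY-MM-DD hh:mm:ss’ timestamp, the ’y’/’n’ flags, and the 0 and -1 markers). A minimal sketch of converting one record into native Python values is given below. The function names, the output keys and the sample record are assumptions for illustration; only the input field names and encodings come from the list above.

```python
from datetime import datetime
from typing import Optional


def parse_flag(value: str) -> bool:
    """Map the 'y'/'n' flags to booleans."""
    return value == "y"


def parse_reference(value: int) -> Optional[int]:
    """Map the sentinel values 0 and -1 to None and keep real tweet ids."""
    return value if value > 0 else None


def normalise(record: dict) -> dict:
    """Convert the string and sentinel encodings of one record into Python values."""
    return {
        "tweetId": record["tweetId"],
        "sent": datetime.strptime(record["messageDate"], "%Y-%m-%d %H:%M:%S"),
        "position": (record["latitude"], record["longitude"]),
        "retweetOf": parse_reference(record["retweetStatusId"]),
        "replyTo": parse_reference(record["replyStatusId"]),
        "isFavorite": parse_flag(record["isFavorite"]),
    }


# Hypothetical record, invented for illustration only.
sample = {
    "tweetId": 42,
    "messageDate": "2016-05-31 08:15:00",
    "latitude": 53.35,
    "longitude": -6.26,
    "retweetStatusId": 0,
    "replyStatusId": -1,
    "isFavorite": "n",
}

print(normalise(sample))
```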
Batch data samples are retrieved from the Twitter API using a spatial query. Furthermore, for Dublin, the Live Drive Radio data set results from Twitter messages sent by people driving in Dublin who report traffic hazards to the local radio.
Metadata: The metadata are described above and at https://dev.twitter.com/rest/public
Standards: The standards are described above and at https://dev.twitter.com/rest/public
Infrastructure Improvements: None to be disclosed at the moment.
Quality: None to be disclosed at the moment.
Accessibility: In November 2014, IBM Corporation entered into a licensing agreement with Twitter Inc., which allows for unlimited access to the data by IBM. A limited subset of the data is publicly available at https://dev.twitter.com/rest/public
Assessable and Intelligible: See https://dev.twitter.com/rest/public
Legal Issues and Privacy: IBM cannot share this access with the consortium.
Maintenance Plan: a) Archiving and Preservation: Data may be archived by Twitter Inc. b) Usable beyond the original purpose for which it was collected: Within IBM Corporation.
Table 7: Social Media Data - Management Plan
3 City of Warsaw Data
3.1 Real time trams location
Warsaw operates 25 tram lines. The total tram line length exceeds 360 km. The number of trams in use is as follows:
Morning peak: 414
Mid-day: 313
Afternoon peak: 421
The real-time tram location web service exposes information about the geographical location of trams. The data set contains information about all vehicles active at the moment. The data is updated every 15 seconds. This dataset has been released by Warsaw Trams.
Metadata: (data category from CKAN): real time, trams, online data. Metadata describing this dataset are used only in the documentation, as keywords: e.g. real time, trams, online data. The CKAN platform, used as middleware for exposing CoW open data, supports metadata in the form of RDF, but this functionality is currently not used by the CoW IT department.
Standards: RESTlike Web Services, polling, with data refreshed every 15 seconds. CoW tram locations are exposed using RESTlike Web Services via the HTTP GET method. Information about tram locations is refreshed every 15 seconds and must be retrieved by developers using the request-response model (polling). The geographical coordinates are float numbers compliant with EPSG 4326 (WGS 84). Example: 20.992 for the longitude, 51.242 for the latitude.
Infrastructure Improvements: The data caching period on the MUNDO backend was changed from 30 to 15 seconds for the VaVeL project.
Quality: The quality of the data is currently being analyzed by consortium members. Some issues have already been resolved.
Accessibility: Publicly available open data after registration and acceptance of the terms and conditions.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Open data; registration on api.um.warszawa.pl is needed. Terms of use are available on the https://api.um.warszawa.pl website.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. Online data is available via the API. Historical data collected in csv files is available in the CoW Cloud. b) Usable beyond the original purpose for which it was collected: not possible.
Table 8: Real Time Tram Locations - Management Plan
The real-time tram location web service exposes information about the geographical location of trams. The data set contains information about all vehicles active at the moment. Warsaw operates 25 tram lines. The average number of vehicles varies from 414 during the morning peak and 313 at mid-day to 421 in the afternoon. The data is updated every 15 seconds. This dataset has been released by the Warsaw Trams CoW agency.
The HTTP response parameters are listed below:
Time - datetime - timestamp
Lat - float - latitude (GPS)
Lon - float - longitude (GPS)
FirstLine - string - number of the first line realized by the vehicle
Lines - string - the numbers of all lines (for multiline brigades there will be more than one line)
Brigade - string - the number of the brigade
Status - string - task status; can assume the values “RUNNING” or “FINISHED”
LowFloor - bool - indicates whether the tram is a low-floor one (1 = yes, 0 = no)
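Since the service follows a request-response (polling) model with a 15-second refresh, a client can be sketched as a loop that fetches, parses and waits. The sketch below is a minimal illustration under stated assumptions: `fetch` is a caller-supplied function returning a list of records with the parameters listed above (no endpoint or authentication is assumed), the class and function names are hypothetical, and `Time` is kept as a string because its exact format is not specified here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class TramPosition:
    timestamp: str   # kept as received; the exact datetime format is not specified
    lat: float
    lon: float
    first_line: str
    lines: str
    brigade: str
    running: bool
    low_floor: bool


def parse_position(record: dict) -> TramPosition:
    """Turn one response record into a typed object."""
    return TramPosition(
        timestamp=record["Time"],
        lat=float(record["Lat"]),
        lon=float(record["Lon"]),
        first_line=record["FirstLine"],
        lines=record["Lines"],
        brigade=record["Brigade"],
        running=record["Status"] == "RUNNING",
        low_floor=bool(int(record["LowFloor"])),  # 1 = yes, 0 = no
    )


def poll(
    fetch: Callable[[], List[dict]],
    rounds: int,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[TramPosition]]:
    """Yield one parsed snapshot per round, waiting `interval` seconds in between."""
    for i in range(rounds):
        yield [parse_position(r) for r in fetch()]
        if i + 1 < rounds:
            sleep(interval)
```

Polling more often than the refresh period only returns repeated snapshots, so the default interval matches the 15-second update cycle.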
Tram location historical data: Archival data is available to the consortium. Data from 21 March 2016 onwards are collected in csv files and available in the CoW Cloud.
Time - datetime - timestamp
Lat - float - latitude (GPS)
Lon - float - longitude (GPS)
FirstLine - string - number of the first line realized by the vehicle
Lines - string - the numbers of all lines (for multiline brigades there will be more than one line)
Brigade - string - the number of the brigade
FirstLine Brigade - concatenation of the FirstLine and Brigade fields (tram ID for a day)
Status - string - task status; can assume the values “RUNNING” or “FINISHED”
LowFloor - bool - indicates whether the tram is a low-floor one (1 = yes, 0 = no)
3.2 Bus Data
285 bus lines
Total line length: 4379.9 km
Number of buses in use: 1729 (1366 operated by MZA)
Morning peak: 1644
Mid-day: 1035
Afternoon peak: 1619
Historical data: Archive data from 21 April 2016 onwards are available in the CoW Cloud. Data format: side number (vehicle number), Unix timestamp, latitude GPS, longitude GPS, line, brigade
Example: 1525,1461362407,21.170208,52.160407,146,4
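A row in this format can be read with the standard csv module, as in the minimal sketch below. Note one assumption that should be verified against the real files: in the example row the third value (21.170208) falls in Warsaw's longitude range and the fourth (52.160407) in its latitude range, so the sketch treats the third column as longitude and the fourth as latitude, which is the reverse of the order given in the format description. The function name and the output keys are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def parse_bus_rows(text: str) -> list:
    """Parse rows of the form: side number, Unix timestamp, coordinate, coordinate, line, brigade."""
    rows = []
    for side_number, unix_ts, third, fourth, line, brigade in csv.reader(io.StringIO(text)):
        rows.append({
            "vehicle": side_number,
            "time": datetime.fromtimestamp(int(unix_ts), tz=timezone.utc),
            # Assumption: column order inferred from the example row, not from the description.
            "lon": float(third),
            "lat": float(fourth),
            "line": line,
            "brigade": brigade,
        })
    return rows


rows = parse_bus_rows("1525,1461362407,21.170208,52.160407,146,4\n")
print(rows[0]["time"].isoformat(), rows[0]["lat"], rows[0]["lon"])
```

Converting the Unix timestamp to a timezone-aware UTC datetime avoids ambiguity when the archive is later joined with data from other sources.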
3.3 19115 Non-emergency notification system
The API enables locals and visitors to report various issues to the City, such as failures, defects and non-critical threats concerning e.g. the state of roads, snow removal, damage, acts of vandalism, etc. The API also allows users to obtain information filtered by keys.
Information available:
Metadata: CKAN: bus
Standards: Websocket interface exposed by MZA (Warsaw’s Buses Authority), accessible only in the internal CoW network.
Infrastructure Improvements: None
Quality: The quality of the data is currently being analyzed by consortium members. Some issues have already been resolved.
Accessibility: Public data with restricted access. Available only for VaVeL consortium members.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Public data with restricted access. Available only for VaVeL consortium members.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 9: Bus Data - Management Plan
Metadata: CKAN keywords: bus, historical data
Standards: Flat csv files with bus locations.
Infrastructure Improvements: A dedicated data collector was developed.
Quality: The quality of the data is currently being analyzed by consortium members. Some issues have been resolved.
Accessibility: Public data with restricted access. Available only for VaVeL consortium members.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Public data with restricted access. Available only for VaVeL consortium members.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 10: Bus Historical Data - Management Plan
siebelEventId: event ID in the city CRM system
deviceType: type of device used to submit the notification
street: notification street name; this field is only used in notifications registered by CRM operators. It is validated against the city’s street names dictionary.
street2: notification street name; this field is only used in notifications submitted by citizens (i.e. notifications generated outside of CRM). Not validated.
district: district of the notification
city: city of the notification
houseNumber: building number of the notification
aparmentNumber: apartment number of the notification
category: notification category
subcategory: notification subcategory (dictionary value, as when reporting)
event: the process of intervention (dictionary value, as when reporting)
description: notification description
createDate: creation date
notificationNumber: notification number
xCoordWGS84: latitude of the notification in the WGS84 standard
yCoordWGS84: longitude of the notification in the WGS84 standard
xCoordOracle: latitude of the notification in the Oracle Spatial standard
yCoordOracle: longitude of the notification in the Oracle Spatial standard
notificationType: INCIDENT("Awaria/Interwencja"), INFORMATIONAL("Informacyjne"), COMPLAINT("Reklamacja"), STATUS("Status sprawy/zgłoszenia"), PUBLIC INFORMATION("Wniosek o dostęp do informacji publicznej"), FREEFORM("Wolne wnioski i uwagi")
statuses: list of notification statuses (changeDate: change date; status: status; description: description; Source: notification source, one of API("API"), CALL("CALL"), CKM("CKM"), MAIL("MAIL"), MOBILE("MOBILE"), PHONE("Phone"), PORTAL("PORTAL"), SMS("SMS"), WEB("Web"), WEBCHAT("WEBCHAT"), EMPTY("brak"))
3.4 Public transport timetables
The public transport timetables data set is managed by Warsaw’s Public Transport Authority and stored in a MySQL database. This information is exposed to developers as an Open API by the api.um.warszawa.pl portal, in the form of RESTlike Web Services. The exposed API allows developers to obtain information about timetables and about bus or tram lines for the selected stops.
The API provides three methods. The first of them (getBusstopId) is mandatory for the use of the others and is used to obtain the stop identifier. The other two (getTimetable and getLines) are used to obtain data about the lines and timetables related to the stop.
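The dependency between the three methods can be sketched as follows. Only the method names (getBusstopId, getTimetable, getLines) come from the description above. The `call_api` callable, the parameter names `name` and `stopId`, and the shape of the returned values are hypothetical placeholders, so this is an illustration of the call order rather than of the portal's actual request format.

```python
from typing import Callable


def stop_overview(call_api: Callable[[str, dict], object], stop_name: str) -> dict:
    """Resolve the stop identifier first, then query lines and timetable with it."""
    # getBusstopId must be called first: the other two methods need its result.
    stop_id = call_api("getBusstopId", {"name": stop_name})
    return {
        "stopId": stop_id,
        "lines": call_api("getLines", {"stopId": stop_id}),
        "timetable": call_api("getTimetable", {"stopId": stop_id}),
    }


# Stand-in for the real service, invented for illustration only.
calls = []


def fake_api(method: str, params: dict) -> object:
    calls.append(method)
    canned = {"getBusstopId": "7009", "getLines": ["17", "33"], "getTimetable": ["08:00", "08:10"]}
    return canned[method]


overview = stop_overview(fake_api, "Centrum")
print(overview)
```

Passing the transport function in as an argument keeps the call order testable without network access or credentials.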
Historical data: The public transport historical timetables data set is managed by Warsaw’s Public Transport Authority and stored in a MySQL database. This information is exposed to developers as an Open API by the api.um.warszawa.pl portal, in the form of RESTlike Web Services. The exposed API
Metadata: (data category from CKAN): not emergency issue, real time, online data
Standards: RESTlike Web Services, data from the Siebel CRM database.
Infrastructure Improvements: Data exposed during the MUNDO project; no additional improvements needed.
Quality: The quality of the data is currently being analyzed by consortium members.
Accessibility: Publicly available open data after registration and acceptance of the terms and conditions.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Open data; registration on api.um.warszawa.pl is needed. Terms of use are available on the https://api.um.warszawa.pl website.
Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. Historical data for a period of approx. 2 months are available via the API. b) Usable beyond the original purpose for which it was collected: not possible.
Table 11: 19115 Data - Management Plan
Metadata: (data categories from CKAN): transport, timetables
Standards: RESTlike Web Services, data from the ZTM MySQL database exposed by the MUNDO platform.
Infrastructure Improvements: Data exposed during the MUNDO project; no additional improvements needed.
Quality: The quality of the data is currently being analyzed by consortium members.
Accessibility: Publicly available open data after registration and acceptance of the terms and conditions.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Open data; registration on api.um.warszawa.pl is needed. Terms of use are available on the http://www.ztm.waw.pl/?c=628&l=1 website.
Maintenance Plan: a) Archiving and Preservation: The current timetable is available via the API. This data set will be utilized as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 12: Public transport timetables - Management Plan
Metadata: (data categories from CKAN): transport, timetables, historical data
Standards: RESTlike Web Services, data from the ZTM MySQL database exposed by the MUNDO platform.
Infrastructure Improvements: New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server.
Quality: The quality of the data is currently being analyzed by consortium members.
Accessibility: Public data with restricted access. Available only for VaVeL consortium members after registration and acceptance of the terms and conditions. Documentation available.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Restricted access, only for VaVeL consortium members.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. Via the API the timetable is available for the past 4 months. b) Usable beyond the original purpose for which it was collected: not possible.
Table 13: Public transport historical timetables - Management Plan
allows developers to obtain information about timetables and about bus or tram lines for the selected stops at a selected time in the past.
3.5 Bus & trams stops locations
The bus & tram stop locations API offers developers information about the current geographical location of bus and tram stops in Warsaw. This dataset has been released by the Public Transport Authority (ZTM).
3.6 Park & Ride
The Park & Ride parking information data set contains information about P&R parking stations in the City of Warsaw for selected geographical areas.
3.7 Bike roads
The Bike roads data set contains information about bike roads in the City of Warsaw.
3.8 Bike stations location (Veturilo)
The Bike stations location (Veturilo) data set contains information about Veturilo rent-a-bike stations in the City of Warsaw for selected geographical areas.
Metadata: (data categories from CKAN): bus, tram, stops, geolocation
Standards: RESTlike Web Services.
Infrastructure Improvements: New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server.
Quality: The quality of the data is currently being analyzed by consortium members.
Accessibility: Public data with restricted access. New data set implemented for the VaVeL project. CKAN extensions were redeveloped and deployed on the MUNDO Data server.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Available only for VaVeL consortium members after registration and acceptance of the terms and conditions. Additional terms of use are available on the http://www.ztm.waw.pl/?c=628&l=1 website.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. The descriptions of the current bus & tram stop parameters are accessible via the API. b) Usable beyond the original purpose for which it was collected: not possible.
Table 14: Bus and Trams Stop Locations - Management Plan
Metadata: (data categories from CKAN): park & ride, static data, geolocation
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data will be analyzed by consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 15: Park & Ride - Management Plan
Metadata: (data categories from CKAN): bike roads, static data, vector map, WFS
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data will be analyzed by consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: Documentation available.
Legal Issues and Privacy: Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 16: Bike Roads - Management Plan
Metadata: (data categories from CKAN): city bike, static data, Veturilo
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data must be analyzed by consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: Documentation available. WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs
Legal Issues and Privacy: Public data with restricted access.
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 17: City bike stations - Management Plan
Metadata: (data categories from CKAN): metro entrances, static data
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data will be analyzed by the consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: Documentation available. WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs
Legal Issues and Privacy: Public data with restricted access. Terms of use: http://mapa.um.warszawa.pl/warunki.html
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 18: Metro entrances - Management Plan
3.9 Metro Entrances
The Metro Entrances data set exposes information about metro entrances in Warsaw. The API allows developers to retrieve information for a selected geographical area and to filter data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in the form of vector maps: http://www.opengeospatial.org/standards/wfs.
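For orientation, a generic WFS GetFeature request restricted to a bounding box can be composed as in the sketch below. The key-value parameters (service, version, request, typeNames, bbox) are those of the OGC WFS 2.0 standard. The endpoint URL and the layer name are hypothetical placeholders, the CoW portal may wrap WFS behind its own RESTlike interface with different parameters and an API key, and the axis order expected in the bounding box depends on the server and coordinate reference system.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint: str, type_name: str, bbox: tuple) -> str:
    """Compose a WFS 2.0 GetFeature request limited to a bounding box."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        # Four corner values, in the axis order the server expects.
        "bbox": ",".join(str(value) for value in bbox),
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = wfs_getfeature_url("https://example.invalid/wfs", "metro_entrances", (20.9, 52.1, 21.1, 52.3))
print(url)
```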
3.10 Address points
The Address points data set offers information on addresses in the City of Warsaw for a selected geographical area. The API allows developers to retrieve information for the selected geographical area and to filter data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in the form of vector maps. The Address points dataset is maintained by the Office of Surveying and Cadastre (BGiK) of the City of Warsaw and exposed using a URL (endpoint).
3.11 Streets
This data set exposes information about the location of streets in the City of Warsaw. An API allows developers to retrieve information for a selected geographical area and to filter data based on defined keys. Access to the data is based on the Web Feature Service (WFS) standard, defined by the Open Geospatial Consortium for exposing geospatial information in the form of vector maps. The Streets dataset is maintained by the Office of Surveying and Cadastre (BGiK) of the City of Warsaw and exposed using a URL.
Metadata: (data categories from CKAN): address points, static data, geolocation
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data will be analyzed by consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: Documentation available. WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs
Legal Issues and Privacy: Public data with restricted access.
Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for the realization of the use cases defined by CoW and stored in the data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 19: Address points - Management Plan
Metadata: (data categories from CKAN): streets, static data, geolocation
Standards: RESTlike Web Services, WFS.
Infrastructure Improvements: None
Quality: The quality of the data will be assessed by the consortium members.
Accessibility: Public data with restricted access.
Assessable and Intelligible: WFS standard documentation is publicly available: http://www.opengeospatial.org/standards/wfs
Legal Issues and Privacy: Public data with restricted access.
Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for the realization of the use cases defined by CoW and stored in a data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 20: Streets - Management Plan
Figure 8: ZTM-Warsaw’s Twitter Account
3.12 City of Warsaw Twitter
The ZTM Public Transport Authority publishes information about public transport on Twitter (https://twitter.com/ztm_warszawa). It primarily covers public transport failures, their resolution, and sudden timetable changes resulting from unplanned events (demonstrations, accidents, etc.).
Standards: Data from Twitter is available through the Twitter Open API. Details can be found at https://dev.twitter.com/overview/documentation. Twitter offers two Open API sets for developers:
• REST API, mostly dedicated to off-line access: https://dev.twitter.com/rest/public
• Streaming API, dedicated to real-time data access: https://dev.twitter.com/streaming/overview
Access to the Twitter API requires a developer account (OAuth credentials are needed) and a registered application in the Twitter developers portal (https://dev.twitter.com/apps).
3.13 RSS services
The Public Transport Authority runs an RSS (Rich Site Summary) service with information about public transport. The RSS service contains 5 main categories:
• News
• Press releases
• Changes in public transport
• Public procurement
• Difficulties
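To illustrate how items from one of these feeds could be turned into records for later processing, the following minimal sketch parses an RSS 2.0 document with the Python standard library. The sample feed content is hypothetical (real entries are published in Polish), and the sketch assumes the feeds follow the usual RSS 2.0 layout with `title`, `link` and `pubDate` elements inside each `item`; this has not been verified against the live feeds.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample; real feed content will differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Tram line 9 diverted</title>
      <link>http://example.org/item/1</link>
      <pubDate>Thu, 05 May 2016 08:15:00 +0200</pubDate>
    </item>
    <item>
      <title>Bus stop closed</title>
      <link>http://example.org/item/2</link>
      <pubDate>Thu, 05 May 2016 09:40:00 +0200</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text, category):
    """Return one dict per <item>, tagged with the feed category."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("./channel/item"):
        pub = item.findtext("pubDate")
        items.append({
            "category": category,
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # pubDate uses the RFC 822 date format; keep None if it is absent.
            "published": parsedate_to_datetime(pub) if pub else None,
        })
    return items


items = parse_rss_items(SAMPLE_FEED, "Difficulties")
for entry in items:
    print(entry["published"], entry["title"])
```

Tagging each record with its category keeps items from the five feeds distinguishable once they are stored together.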
Metadata (data categories from CKAN): hashtags
Standards: RESTlike Web Services, WFS
Infrastructure Improvements: External data provider - N/A
Quality: The quality of the data will be assessed by VaVeL's consortium members.
Accessibility: Private data with open access. Data publicly available after registration.
Assessable and Intelligible: Publicly available open data delivered by the Warsaw Public Transport Authority.
Legal Issues and Privacy: Acceptance of the Twitter Terms and Conditions is needed: https://dev.twitter.com/overview/terms/agreement-and-policy
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the use cases defined by CoW and stored in the data processing system installed at CoW.
Table 21: Twitter Data - Management Plan
Metadata: RSS (XML) meta-data
Standards: RSS, multiple feeds:
• News: http://www.ztm.waw.pl/rss.php?l=1&IDRss=1
• Press releases: http://www.ztm.waw.pl/rss.php?l=1&IDRss=2
• Changes in public transport: http://www.ztm.waw.pl/rss.php?l=1&IDRss=3
• Public procurement: http://www.ztm.waw.pl/rss.php?l=1&IDRss=4
• Difficulties: http://www.ztm.waw.pl/rss.php?l=1&IDRss=6
Infrastructure Improvements: Data is delivered by the Warsaw Public Transport Authority. Changes are not possible.
Quality: The quality of the data will be assessed by VaVeL's consortium members.
Accessibility: Open data exposed on the Internet for everyone. Publicly available open data delivered by the Warsaw Public Transport Authority.
Assessable and Intelligible: RSS is a well-known standard for exposing information. The RSS content is in Polish.
Legal Issues and Privacy: Open Data
Maintenance Plan: a) Archiving and Preservation: This data set will be used as a source of information for the use cases defined by CoW and stored in a data processing system installed at CoW.
Table 22: RSS services - Management Plan
Metadata: None.
Standards: XML data exposed using dedicated URL.
Infrastructure Improvements: Data is delivered by Warsaw Public Bikes operator (external system). Changes are not possible.
Quality: The quality of the data will be assessed by the VaVeL consortium.
Accessibility: Data exposed in Internet for everyone. Publicly available data delivered by Warsaw Public Bikes operator.
Assessable and Intelligible: XML file structure is clear. However there is no API documentation.
Legal Issues and Privacy: Public data. Unfortunately API usage terms and conditions are currently not accessible on the Nextbike web page.
Maintenance Plan: a) Archiving and Preservation: This data set will be utilized as a source of information for realization of use cases defined by CoW and stored in data processing system installed in CoW. b) Usable beyond the original purpose for which it was collected: depends on confirmation from Warsaw Public Bikes operator Nextbike.
Table 23: Veturilo stations - Management Plan
3.14 Veturilo stations (Warsaw City Bike system)
Warsaw's City Bike system (Veturilo) exposes an API with information about bike availability at Veturilo stations. Near-real-time information on Warsaw Public Bikes is provided by the portal http://nextbike.net (data is refreshed every minute).
3.15 Orange subscribers location statistics
The mobile subscribers' location statistics dataset contains statistical information on the number of terminals that communicated with given cells of the Public Land Mobile Network (PLMN).
Subscriber activity is detected on the basis of network events (13 different events are taken into account) that are triggered together with voice and xMS communication. Inactive terminals are updated periodically according to network and terminal settings (usually every 1-2 hours). For the VaVeL project, data samples will be delivered from the urban area for selected cells located in Warsaw, for a defined period of time. The raw stream of data from mobile cells in Warsaw is between 300 and 400 events per second. The volume of raw data is between 18 and 20 million events per 24 hours for the Warsaw area (data from about 6000 cells). Statistical information is collected in csv files. The average size of a file aggregating 24 hours of events for the Warsaw area is about 8-9 MB.
Metadata: None
Standards: Mobile data statistics are calculated and collected by a dedicated network system based on events from the MSS (Mobile Switching Centre Server). Statistics are provided in the form of flat csv files. File names have the following form: statistics hours YYYY-MM-DD.csv. A file is generated daily at 1:00 am and contains data from the previous day. For example, a file named stats hours 2016-05-04.csv is created on May 5 at 1:00 am and includes statistics from May 4. Each file contains the columns date & hour, x, y, number of events, described below:
• number of events - numeric (1,10) - number of unique MSISDNs detected in the given period
• x - float - latitude (GPS) of the cell center
• y - float - longitude (GPS) of the cell center
• radius - int - cell radius
• date & hour - date and hour (e.g. 2016-05-05 01:00:00 means 2016-05-05 between 0:00 am and 1:00 am)
Infrastructure Improvements: To expose the data described in this chapter, the event-collecting system was redeveloped. The changes include:
• automation of statistics recording
• data compression
• automation of historical data cleaning
The system used for data collection is a pre-production instance and has numerous restrictions, e.g. limited storage and limited performance.
Quality: Because of the aforementioned limitations, it is possible that not all events from all cells will be reported. Due to its single-node architecture, the test instance cannot provide a high SLA (no redundancy mechanism is implemented). Since the event generation mechanism is related to TDM events, not all subscriber activities are reported (e.g. the statistics might not contain information about mobile data usage).
Accessibility: For data analysis, this data set will be sent via e-mail as an encrypted attachment to the consortium leader. The password will be sent through a separate communication channel (e.g. SMS). Because of restrictions in Polish telecommunication law and internal Orange Polska regulations, this data set can be used only by consortium members for the VaVeL project. Open access and any sharing of this dataset with third parties are prohibited.
Assessable and Intelligible: Sharing raw data with other parties (non-members of the consortium) is currently not possible because of telecommunication law (operators cannot process any telecommunication data without the explicit consent of the terminal's end user) and internal Orange regulations (the privacy policy does not allow sharing any data that could reveal business information about the network).
The statistics used for the VaVeL project are prepared using a validated mechanism. Based on all available network events on the MSS, we calculate the number of unique MSISDNs that appeared in a given cell in a given time interval (most often one hour). The mechanism is simple and applies no other conditions, so the data are not further processed or calibrated. As required by legal regulation, this data aggregation procedure is not reversible. In addition, if data is exposed close to real time, a further restriction applies: statistics lower than 10 for a cell cannot be displayed. In the VaVeL project this restriction is not implemented because we transfer historical location statistics. The data set is well documented in a data manual provided to the consortium.
Legal Issues and Privacy: Based on current regulations in Polish telecommunication law and the EU directives concerning the processing of telecommunication transmission data, we list below some points that require attention. From these documents we derive some basic rules on how operators may use data:
1. For personal data in particular, the highest level of protection required by law should always be applied (Telecommunication Law (TL), Law on the Protection of Personal Data (LPPD), and EU directives, in particular Directive 2002/58/EC on e-Privacy and Directive 95/46/EC on personal data protection).
2. Processing personal data requires the consent of the data subject.
3. Anonymised data can be used without the consent of the subject only if they are aggregated from the first step of processing and cannot be associated with an individual.
4. Everything that can legally be done by OPL can also be performed by subcontractors (based on appropriate contracts). The use of personal data by other companies should be specified in the consent given by the data subject.
The law protecting the privacy of operator data makes no special exceptions for R&D. For the VaVeL project the rules are therefore the same as for any other use of operator data for purposes other than delivering the telco services ordered by the end user or terminal owner. For this reason we investigate and deploy techniques that calculate the statistics allowed by law within the network, at the MSS level. As a result, we will be able to share with other participants only those aggregated statistics that are safe from a legal point of view. This mechanism of statistics calculation was investigated in several proof-of-concept projects, in which we evaluated its reliability as a source of location statistics. We prepared extrapolations and compared operator data with counts based on visual observation (camera-based).
On the other hand, we also believe that, even for aggregated statistics, data taken from all users have better quality and value than data taken only from those who give consent. Users who consent to the use of their location data form a sub-sample, but the selection mechanism cannot be treated as "random selection", and we are not able to predict the bias this introduces into the data (especially for small areas of observation).
Maintenance Plan: a) Archiving and Preservation: The Orange mobile subscribers' location statistics will be used as a source of information for the use cases defined by CoW and stored in the data processing system installed at CoW. b) Usable beyond the original purpose for which it was collected: not possible.
Table 24: Orange subscribers location statistics - Management Plan
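The file layout described in the Standards entry of this plan can be read with a few lines of code. The following is a minimal sketch, assuming that each file has no header row, is comma-separated, and contains exactly the four columns in the order date & hour, x, y, number of events; the sample rows are hypothetical. Each timestamp marks the end of the hour it describes, so the sketch derives the interval start explicitly. It also shows, purely for illustration, how the minimum-count rule for near-real-time exposure (values lower than 10 per cell are not displayed) could be applied, although that rule is not used for the historical data transferred in VaVeL.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample rows: date & hour, x (latitude), y (longitude), events.
SAMPLE = """2016-05-04 01:00:00,52.2297,21.0122,134
2016-05-04 02:00:00,52.2297,21.0122,7
2016-05-04 02:00:00,52.2401,21.0005,58
"""


def load_hourly_stats(text):
    """Parse hourly statistics rows into dicts with explicit intervals."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        ts, lat, lon, count = row
        end = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        records.append({
            "start": end - timedelta(hours=1),  # timestamp marks the hour's end
            "end": end,
            "lat": float(lat),
            "lon": float(lon),
            "count": int(count),
        })
    return records


def suppress_small_counts(records, threshold=10):
    """Return copies of the records with counts below `threshold` set to None."""
    return [
        dict(r, count=r["count"] if r["count"] >= threshold else None)
        for r in records
    ]


records = load_hourly_stats(SAMPLE)
masked = suppress_small_counts(records)
for r in masked:
    print(r["start"], r["end"], r["lat"], r["lon"], r["count"])
```

Returning copies from `suppress_small_counts` leaves the loaded records untouched, so the same data can serve both internal analysis and any restricted display.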