Data Management Plan - Ew-Shopp · 2017-09-14 · Data Management Plan Deliverable n: 2.1 Date: 30...

Data Management Plan

Deliverable n: 2.1
Date: 30 June 2017
Status: Final
Version: 1.0
Authors: Angelo Marguglio (ENG), Andrea Maurino (UNIMIB), Matteo Palmonari (UNIMIB), Nikolay Nikolov (SINTEF)
Contributors: ALL
Reviewers: Titi Roman (SINTEF), Fernando Perales (JOT)
Distribution: PU
Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1

EW_Shopp GA number: 732590 H2020-ICT-2016-2017/H2020-ICT-2016-1

History of Changes

| Version | Date | Description | Revised by |
| --- | --- | --- | --- |
| 0.2 | 23/02/2017 | Outline and main sections added to the deliverable | Angelo Marguglio (ENG) |
| 0.3 | 09/05/2017 | Updated document template | Angelo Marguglio (ENG) |
| 0.4 | 21/05/2017 | Following comments from Matteo Palmonari (UNIMIB), provided further details on §5, §6, and §7.4. | Angelo Marguglio (ENG) |
| 0.5 | 24/05/2017 | Assigned main roles | Angelo Marguglio (ENG) |
| 0.6 | 29/05/2017 | First sample description of JOT datasets to be used in BC4 | Angelo Marguglio (ENG) |
| 0.7 | 02/06/2017 | Minor revisions | Angelo Marguglio (ENG) |
| 0.8 | 08/06/2017 | Updated dataset description structure; added BC4 description; integrated contributions from SINTEF | Angelo Marguglio (ENG), Nikolay Nikolov (SINTEF) |
| 0.9 | 13/06/2017 | Changed order of “Ethics and Legal Compliance” and “Project Data Management” chapters; added “Ethics and Legal requirements” paragraph; added legal requirements regarding personal data; added Dataset-BC mapping table. Deliverable ready for peer review. | Angelo Marguglio (ENG), Andrea Maurino (UNIMIB), Nikolay Nikolov (SINTEF) |
| 1.0 | 28/06/2017 | Deliverable revised according to review comments: added more information about the GDPR in the general section; added final input to complete the dataset descriptions; added an explanation of interoperability and vocabulary in the EW-Shopp project. Final check by the project coordinator. | Angelo Marguglio (ENG), Matteo Palmonari (UNIMIB) |

Executive summary

EW-Shopp aims to support companies operating in the fragmented European ecosystem of the eCommerce, Retail and Marketing industries in increasing their efficiency and competitiveness by leveraging deep customer insights that are currently too challenging for them to obtain. The integration of public and private data collected by different business partners will cover customer interactions and activities across different channels, providing insights into rich customer journeys. These integrated data will be further enriched with information about weather and events, two crucial factors impacting consumer choices. To realize these objectives, a platform, referred to as the EW-Shopp platform, will be built.

The Data Management Plan (DMP) reports on the data that the EW-Shopp project will use and generate during its lifetime, from the setup of the EW-Shopp Platform to the business exploitation of its services.

Following the Horizon 2020 guidelines¹, this deliverable defines the general approach to data management policies that will be adopted in the EW-Shopp project. In accordance with these guidelines, it includes information about the handling of data during and after the end of the project, paying particular attention to the methodology and standards to be applied.

In addition to the guidelines provided by the European Commission, this document also describes the plan for addressing the legal and ethical issues related to the data that will be collected, in close collaboration with the activities undertaken by the EW-Shopp Ethics Advisory Board and the main outcomes of WP7.

The deliverable describes the approach established in EW-Shopp to ensure the life-cycle management of the public and proprietary datasets provided to the project by the consortium members, as well as of other datasets produced by the Consortium during project execution.

In particular, this report describes the rules, best practices and standards used to make the data findable, accessible, interoperable and reusable (FAIR data), and the process for collecting and managing data in compliance with ethical and legal requirements. The deliverable includes a high-level description of the four business cases (BC1: Big Bang, Ceneje and Browsetel; BC2: GfK; BC3: Measurence; BC4: JOT Internet Media) and descriptions of the datasets provided for the EW-Shopp project, which detail the identification, origin, format, access and security of the data and take into account legal and ethical requirements.

¹ European Commission, Directorate-General for Research & Innovation (26 July 2016). Guidelines on FAIR Data Management in Horizon 2020. Retrieved from http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Table of Contents

History of Changes
Executive summary
Table of Contents
List of Tables
Acronyms table
Chapter 1 Introduction
1.1 Principles underlying EW-Shopp DMP
1.2 General Approach
1.3 Applicable documents and references
1.4 Updates of this deliverable
Chapter 2 Project Data Management
2.1 Project purposes
2.2 Project data
2.3 Audience
2.4 Roles and responsibilities
Chapter 3 Ethics and Legal Compliance
3.1 Legal requirements regarding personal data
3.1.1 Core concepts
3.1.2 Fundamental Principles
3.1.3 Notification process and data protection impact assessment
3.1.4 Notification process in EW-Shopp project
3.2 Ethics requirements regarding the involvement of human rights
3.3 Intellectual Property Rights
Chapter 4 Business Case high-level description
4.1 Big Bang, CENEJE (BC1)
4.2 GfK (BC2)
4.3 Measurence (BC3)
4.4 JOT (BC4)
Chapter 5 EW-Shopp Methodology for DMP
5.1 Elements of EW-Shopp Data Management Plan
5.1.1 Dataset IDENTIFICATION
5.1.2 Dataset ORIGIN
5.1.3 Dataset FORMAT
5.1.4 Dataset ACCESS
5.1.5 Data SECURITY
5.2 Process to collect dataset details
Chapter 6 Dataset description
6.1 CE Dataset - Consumer Data: Purchase Intent
6.1.1 Dataset IDENTIFICATION
6.1.2 Dataset ORIGIN
6.1.3 Dataset FORMAT
6.1.4 Dataset ACCESS
6.1.5 Dataset SECURITY
6.1.6 Ethics and Legal requirements
6.2 ME Dataset - Consumer Data: Location analytics data (Hourly)
6.2.1 Dataset IDENTIFICATION
6.2.2 Dataset ORIGIN
6.2.3 Dataset FORMAT
6.2.4 Dataset ACCESS
6.2.5 Dataset SECURITY
6.2.6 Ethics and Legal requirements
6.3 ME Dataset - Consumer Data: Location analytics data (Daily)
6.3.1 Dataset IDENTIFICATION
6.3.2 Dataset ORIGIN
6.3.3 Dataset FORMAT
6.3.4 Dataset ACCESS
6.3.5 Dataset SECURITY
6.3.6 Ethics and Legal requirements
6.4 BB Dataset - Consumer Data: Customer Purchase History
6.4.1 Dataset IDENTIFICATION
6.4.2 Dataset ORIGIN
6.4.3 Dataset FORMAT
6.4.4 Dataset ACCESS
6.4.5 Dataset SECURITY
6.4.6 Ethics and Legal requirements
6.5 BB Dataset - Consumer Data: Consumer Intent and Interaction
6.5.1 Dataset IDENTIFICATION
6.5.2 Dataset ORIGIN
6.5.3 Dataset FORMAT
6.5.4 Dataset ACCESS
6.5.5 Dataset SECURITY
6.5.6 Ethics and Legal requirements
6.6 ME Dataset - Consumer Data: Location analytics data (Weekly)
6.6.1 Dataset IDENTIFICATION
6.6.2 Dataset ORIGIN
6.6.3 Dataset FORMAT
6.6.4 Dataset ACCESS
6.6.5 Dataset SECURITY
6.6.6 Ethics and Legal requirements
6.7 BT Dataset - Customer Communication Data: Contact and Consumer Interaction History
6.7.1 Dataset IDENTIFICATION
6.7.2 Dataset ORIGIN
6.7.3 Dataset FORMAT
6.7.4 Dataset ACCESS
6.7.5 Dataset SECURITY
6.7.6 Ethics and Legal requirements
6.8 ECMWF Dataset - Weather: MARS Historical Data
6.8.1 Dataset IDENTIFICATION
6.8.2 Dataset ORIGIN
6.8.3 Dataset FORMAT
6.8.4 Dataset ACCESS
6.8.5 Dataset SECURITY
6.8.6 Ethics and Legal requirements
6.9 CE Dataset - Products and Categories: Product Attributes
6.9.1 Dataset IDENTIFICATION
6.9.2 Dataset ORIGIN
6.9.3 Dataset FORMAT
6.9.4 Dataset ACCESS
6.9.5 Dataset SECURITY
6.9.6 Ethics and Legal requirements
6.10 JSI Dataset - Media: Event Registry
6.10.1 Dataset IDENTIFICATION
6.10.2 Dataset ORIGIN
6.10.3 Dataset FORMAT
6.10.4 Dataset ACCESS
6.10.5 Dataset SECURITY
6.10.6 Ethics and Legal requirements
6.11 GfK Dataset - Consumer data: Consumer data
6.11.1 Dataset IDENTIFICATION
6.11.2 Dataset ORIGIN
6.11.3 Dataset FORMAT
6.11.4 Dataset ACCESS
6.11.5 Dataset SECURITY
6.11.6 Ethics and Legal requirements
6.12 GfK Dataset - Market data: Sales data
6.12.1 Dataset IDENTIFICATION
6.12.2 Dataset ORIGIN
6.12.3 Dataset FORMAT
6.12.4 Dataset ACCESS
6.12.5 Dataset SECURITY
6.12.6 Ethics and Legal requirements
6.13 GfK Dataset - Products & Categories: Product attributes
6.13.1 Dataset IDENTIFICATION
6.13.2 Dataset ORIGIN
6.13.3 Dataset FORMAT
6.13.4 Dataset ACCESS
6.13.5 Dataset SECURITY
6.13.6 Ethics and Legal requirements
6.14 ME Dataset - Consumer Data: Door counter data
6.14.1 Dataset IDENTIFICATION
6.14.2 Dataset ORIGIN
6.14.3 Dataset FORMAT
6.14.4 Dataset ACCESS
6.14.5 Dataset SECURITY
6.14.6 Ethics and Legal requirements
6.15 BB Dataset - Products and Categories: Product Attributes
6.15.1 Dataset IDENTIFICATION
6.15.2 Dataset ORIGIN
6.15.3 Dataset FORMAT
6.15.4 Dataset ACCESS
6.15.5 Dataset SECURITY
6.15.6 Ethics and Legal requirements
6.16 CE Dataset - Market data: Products price history
6.16.1 Dataset IDENTIFICATION
6.16.2 Dataset ORIGIN
6.16.3 Dataset FORMAT
6.16.4 Dataset ACCESS
6.16.5 Dataset SECURITY
6.16.6 Ethics and Legal requirements
6.17 ME Dataset - Consumer Data: Sales data
6.17.1 Dataset IDENTIFICATION
6.17.2 Dataset ORIGIN
6.17.3 Dataset FORMAT
6.17.4 Dataset ACCESS
6.17.5 Dataset SECURITY
6.17.6 Ethics and Legal requirements
6.18 JOT Dataset - Consumer data: Traffic source (Bing)
6.18.1 Dataset IDENTIFICATION
6.18.2 Dataset ORIGIN
6.18.3 Dataset FORMAT
6.18.4 Dataset ACCESS
6.18.5 Dataset SECURITY
6.18.6 Ethics and Legal requirements
6.19 JOT Dataset - Consumer data: Traffic source (Google)
6.19.1 Dataset IDENTIFICATION
6.19.2 Dataset ORIGIN
6.19.3 Dataset FORMAT
6.19.4 Dataset ACCESS
6.19.5 Dataset SECURITY
6.19.6 Ethics and Legal requirements
6.20 JOT Dataset - Market data: Twitter trends
6.20.1 Dataset IDENTIFICATION
6.20.2 Dataset ORIGIN
6.20.3 Dataset FORMAT
6.20.4 Dataset ACCESS
6.20.5 Dataset SECURITY
6.20.6 Ethics and Legal requirements
6.21 LOD Dataset - Geographic: DBpedia
6.21.1 Dataset IDENTIFICATION
6.21.2 Dataset ORIGIN
6.21.3 Dataset FORMAT
6.21.4 Dataset ACCESS
6.21.5 Dataset SECURITY
6.21.6 Ethics and Legal requirements
6.22 LOD Dataset - Geographic: Linked OpenStreetMaps
6.22.1 Dataset IDENTIFICATION
6.22.2 Dataset ORIGIN
6.22.3 Dataset FORMAT
6.22.4 Dataset ACCESS
6.22.5 Dataset SECURITY
6.22.6 Ethics and Legal requirements
6.23 LOD Dataset - Geographic: LinkedGeoData
6.23.1 Dataset IDENTIFICATION
6.23.2 Dataset ORIGIN

6.23.3 Dataset FORMAT

6.23.4 Dataset ACCESS

6.23.5 Dataset SECURITY

6.23.6 Ethics and Legal requirements

6.24 LOD Dataset – Geographic: GeoNames

6.24.1 Dataset IDENTIFICATION

6.24.2 Dataset ORIGIN

6.24.3 Dataset FORMAT

6.24.4 Dataset ACCESS

6.24.5 Dataset SECURITY

6.24.6 Ethics and Legal requirements

6.25 Mapping between Dataset and Business case

Chapter 7 Storage and Re-use

7.1 Storage

7.2 Backup and Recovery

7.3 Data Archiving

7.4 Security

7.5 Permission

7.6 Access, Re-use and Licensing

Annex A – DMP Survey

Page 12: Data Management Plan - Ew-Shopp · 2017-09-14 · Data Management Plan Deliverable n: 2.1 Date: 30 June 2017 Status: Final Version: 1.0 Authors: Angelo Marguglio (ENG), Andrea Maurino

EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 12

List of Tables

Table 1. Abbreviations and acronyms
Table 2. Short references for project partners
Table 3. Roles and responsibilities of beneficiaries
Table 4. Core concepts – European data protection legislation
Table 5. Dataset identification – Purchase intent
Table 6. Dataset origin – Purchase intent
Table 7. Dataset format – Purchase intent
Table 8. Making data accessible – Purchase intent
Table 9. Making data interoperable – Purchase intent
Table 10. Dataset security – Purchase intent
Table 11. Dataset identification – Location analytics data
Table 12. Dataset origin – Location analytics data
Table 13. Dataset format – Location analytics data
Table 14. Making data accessible – Location analytics data
Table 15. Making data interoperable – Location analytics data
Table 16. Dataset security – Location analytics data
Table 17. Dataset identification – Location analytics data
Table 18. Dataset origin – Location analytics data
Table 19. Dataset format – Location analytics data
Table 20. Making data accessible – Location analytics data
Table 21. Making data interoperable – Location analytics data
Table 22. Dataset security – Location analytics data
Table 23. Dataset identification – Customer purchase history
Table 24. Dataset origin – Customer purchase history
Table 25. Dataset format – Customer purchase history
Table 26. Making data accessible – Customer purchase history
Table 27. Making data interoperable – Customer purchase history
Table 28. Dataset security – Customer purchase history
Table 29. Dataset identification – Consumer intent and interaction
Table 30. Dataset origin – Consumer intent and interaction
Table 31. Dataset format – Consumer intent and interaction
Table 32. Making data accessible – Consumer intent and interaction
Table 33. Making data interoperable – Consumer intent and interaction
Table 34. Dataset security – Consumer intent and interaction
Table 35. Dataset identification – Location analytics data
Table 36. Dataset origin – Location analytics data
Table 37. Dataset format – Location analytics data
Table 38. Making data accessible – Location analytics data
Table 39. Making data interoperable – Location analytics data


Table 40. Dataset security – Location analytics data
Table 41. Dataset identification – Contact and consumer interaction history
Table 42. Dataset origin – Contact and consumer interaction history
Table 43. Dataset format – Contact and consumer interaction history
Table 44. Making data accessible – Contact and consumer interaction history
Table 45. Making data interoperable – Contact and consumer interaction history
Table 46. Dataset security – Contact and consumer interaction history
Table 47. Dataset identification – MARS historical data
Table 48. Dataset origin – MARS historical data
Table 49. Dataset format – MARS historical data
Table 50. Making data accessible – MARS historical data
Table 51. Making data interoperable – MARS historical data
Table 52. Dataset security – MARS historical data
Table 53. Dataset identification – Product attributes
Table 54. Dataset origin – Product attributes
Table 55. Dataset format – Product attributes
Table 56. Making data accessible – Product attributes
Table 57. Making data interoperable – Product attributes
Table 58. Dataset security – Product attributes
Table 59. Dataset identification – Event Registry
Table 60. Dataset origin – Event Registry
Table 61. Dataset format – Event Registry
Table 62. Making data accessible – Event Registry
Table 63. Making data interoperable – Event Registry
Table 64. Dataset security – Event Registry
Table 65. Dataset identification – Consumer data
Table 66. Dataset origin – Consumer data
Table 67. Dataset format – Consumer data
Table 68. Making data accessible – Consumer data
Table 69. Making data interoperable – Consumer data
Table 70. Dataset security – Consumer data
Table 71. Dataset identification – Sales data
Table 72. Dataset origin – Sales data
Table 73. Dataset format – Sales data
Table 74. Making data accessible – Sales data
Table 75. Making data interoperable – Sales data
Table 76. Dataset security – Sales data
Table 77. Dataset identification – Product attributes
Table 78. Dataset origin – Product attributes
Table 79. Dataset format – Product attributes
Table 80. Making data accessible – Product attributes
Table 81. Making data interoperable – Product attributes
Table 82. Dataset security – Product attributes
Table 83. Dataset identification – Door counter data
Table 84. Dataset origin – Door counter data
Table 85. Dataset format – Door counter data
Table 86. Making data accessible – Door counter data
Table 87. Making data interoperable – Door counter data


Table 88. Dataset security – Door counter data
Table 89. Dataset identification – Product attributes
Table 90. Dataset origin – Product attributes
Table 91. Dataset format – Product attributes
Table 92. Making data accessible – Product attributes
Table 93. Making data interoperable – Product attributes
Table 94. Dataset security – Product attributes
Table 95. Dataset identification – Products price history
Table 96. Dataset origin – Products price history
Table 97. Dataset format – Products price history
Table 98. Making data accessible – Products price history
Table 99. Making data interoperable – Products price history
Table 100. Dataset security – Products price history
Table 101. Dataset identification – Sales data
Table 102. Dataset origin – Sales data
Table 103. Dataset format – Sales data
Table 104. Making data accessible – Sales data
Table 105. Making data interoperable – Sales data
Table 106. Dataset security – Sales data
Table 107. Dataset identification – Traffic source (Bing)
Table 108. Dataset origin – Traffic source (Bing)
Table 109. Dataset format – Traffic source (Bing)
Table 110. Making data accessible – Traffic source (Bing)
Table 111. Making data interoperable – Traffic source (Bing)
Table 112. Dataset security – Traffic source (Bing)
Table 113. Dataset identification – Traffic source (Google)
Table 114. Dataset origin – Traffic source (Google)
Table 115. Dataset format – Traffic source (Google)
Table 116. Making data accessible – Traffic source (Google)
Table 117. Making data interoperable – Traffic source (Google)
Table 118. Dataset security – Traffic source (Google)
Table 119. Dataset identification – Twitter trends
Table 120. Dataset origin – Twitter trends
Table 121. Dataset format – Twitter trends
Table 122. Making data accessible – Twitter trends
Table 123. Making data interoperable – Twitter trends
Table 124. Dataset security – Twitter trends
Table 125. Dataset identification – DBpedia
Table 126. Dataset origin – DBpedia
Table 127. Dataset format – DBpedia
Table 128. Making data accessible – DBpedia
Table 129. Making data interoperable – DBpedia
Table 130. Dataset security – DBpedia
Table 131. Dataset identification – Linked OpenStreetMaps
Table 132. Dataset origin – Linked OpenStreetMaps
Table 133. Dataset format – Linked OpenStreetMaps
Table 134. Making data accessible – Linked OpenStreetMaps
Table 135. Making data interoperable – Linked OpenStreetMaps


Table 136. Dataset security – Linked OpenStreetMaps
Table 137. Dataset identification – LinkedGeoData
Table 138. Dataset origin – LinkedGeoData
Table 139. Dataset format – LinkedGeoData
Table 140. Making data accessible – LinkedGeoData
Table 141. Making data interoperable – LinkedGeoData
Table 142. Dataset security – LinkedGeoData
Table 143. Dataset identification – GeoNames
Table 144. Dataset origin – GeoNames
Table 145. Dataset format – GeoNames
Table 146. Making data accessible – GeoNames
Table 147. Making data interoperable – GeoNames
Table 148. Dataset security – GeoNames
Table 149. Mapping dataset and business case


Acronyms table

Abbreviations and acronyms used in this deliverable are reported in Table 1.

Table 1. Abbreviations and acronyms

DMP    Data Management Plan

DPA    Data Protection Authority

DPIA   Data Protection/Privacy Impact Assessment

EAN    European Article Number

EC     European Commission

EU     European Union

GDPR   General Data Protection Regulation

GPC    Global Product Classification

GS1    Global Standards One

IPR    Intellectual Property Rights

LOD    Linked Open Data

PbD    Privacy by Design

PD     Personal Data

RDB    Relational Database

ROM    Rough Order of Magnitude


Chapter 1 Introduction

According to the Guidelines on FAIR Data Management in Horizon 2020, the Data Management Plan (DMP) is a key element of good data management. A DMP describes the data management life cycle for the data to be collected, processed and/or generated by a Horizon 2020 project.

This document sets up a DMP in accordance with the H2020 Guidelines. It includes information and suggestions about the handling of data during and after the end of the project: what data will be collected, processed and/or generated, which methodology and standards will be applied, whether data will be shared or made open access, and how data will be curated and preserved (including after the end of the project).

In addition to the guidelines provided by the European Commission, this document also describes the plan to address the legal and ethical issues related to the data that will be collected.

The deliverable describes the approach established in EW-Shopp to ensure the life-cycle management of the public and proprietary datasets provided by the consortium members to the project, as well as of other datasets produced by the Consortium during the project execution, as defined at M6.

Chapter 1 defines the principles underlying the EW-Shopp DMP, the approach followed to generate its structure, the main contents of the document and the links to other deliverables and documents. Chapter 2 introduces the EW-Shopp project, its purpose, the kinds of datasets involved in the project, the audience and the responsibilities defined around the DMP. Chapter 3 introduces core concepts and fundamental legal principles, outlines an ethical assessment for data owners and, concerning legal requirements, provides detailed guidelines about the obligations that data owners need to comply with. Chapter 4 gives a high-level description of the four business cases, to provide an overall view of the project scope. Chapter 5 explains the relevant information regarding the datasets and describes the process used to collect this information from the data owners. Chapter 6 reports, for each dataset, all the information required for dataset identification, origin, format, access and security, and with respect to ethical and legal requirements. Chapter 7 discusses data storage policies, data archiving, security, permissions, data access, re-use and licensing.

Finally, the survey that was submitted to all dataset providers is reported in Annex A.


1.1 Principles underlying the EW-Shopp DMP

The EW-Shopp project aims at deploying and hosting a platform that eases data integration tasks by embedding shared data models, robust data management techniques and semantic reconciliation methods. This platform will offer a framework for unifying fragmented business data and integrating it with external event and weather data, which will support data analytics services that offer key competitive advantages in the modern commerce space. In general, research data should be 'FAIR', that is findable, accessible, interoperable and re-usable. These principles precede implementation choices and do not necessarily suggest any specific technology, standard or implementation solution. In this context, the Data Management Plan is a key activity, and it will elaborate on the general principles underlying the EW-Shopp Data Management Plan (from [DoA]):

• EW-Shopp Privacy Policy: We will set up and explicitly define a Privacy Policy for the EW-Shopp project, with which all partners and all data processing activities carried out in the project must comply. […] If PD are used in an intermediate data processing step, this information will be properly anonymized and used only upon consent to secondary use collected from the users. The EW-Shopp Privacy Policy will ensure that data processing activities in EW-Shopp comply with national and EU legislation, including legislation on personal data protection.

• Statistical data not containing PD: The majority of datasets consist of statistical data (all datasets classified as not containing personal data in the data description tables). These data do not contain PD, only information treated at an aggregate level that cannot be linked back to single individuals. Therefore, the specific data subjects will not be visible or recognizable in such sets of data. These data have been collected by business partners in their daily operations in compliance with national regulations, both in relation to privacy protection and to informed consent to data processing.

• Anonymization of data containing PD: The remaining datasets are classified as containing personal data in the data description tables. These data will be anonymized before being used in the project, so as to comply with the privacy protection policy and with national and EU legislation. Among these datasets, we consider three notable cases, for which we specify how we plan to satisfy the privacy protection constraints.
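To illustrate what "information treated at an aggregate level that cannot be linked back to single individuals" can mean in practice, the sketch below aggregates per-customer records into per-(region, category) statistics and suppresses any group with fewer than a minimum number of distinct customers (small-cell suppression, a common statistical disclosure control step). This is an illustrative assumption, not the procedure defined by the EW-Shopp Privacy Policy: the record fields, the function name `aggregate_with_suppression` and the threshold `K_MIN = 5` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical minimum number of distinct customers per published group.
# Groups below this threshold are dropped (small-cell suppression).
K_MIN = 5


def aggregate_with_suppression(records, k_min=K_MIN):
    """Aggregate per-customer records into per-(region, category) statistics.

    records: iterable of dicts with the hypothetical keys
             'customer_id', 'region', 'category', 'amount'.
    Returns a dict mapping (region, category) -> (n_customers, total_amount),
    omitting every group with fewer than k_min distinct customers.
    Customer identifiers never appear in the output.
    """
    customers = defaultdict(set)   # group -> set of distinct customer ids
    totals = defaultdict(float)    # group -> summed purchase amount
    for r in records:
        key = (r["region"], r["category"])
        customers[key].add(r["customer_id"])
        totals[key] += r["amount"]

    return {
        key: (len(ids), round(totals[key], 2))
        for key, ids in customers.items()
        if len(ids) >= k_min
    }


if __name__ == "__main__":
    sample = [
        {"customer_id": f"c{i}", "region": "Lombardy", "category": "TV", "amount": 100.0}
        for i in range(6)
    ] + [
        {"customer_id": "c99", "region": "Molise", "category": "TV", "amount": 250.0},
    ]
    # The single-customer Molise group is suppressed; only Lombardy/TV is kept.
    print(aggregate_with_suppression(sample))
```

Suppression alone does not guarantee anonymity (for example, differencing between overlapping aggregates can still reveal individuals), so in a real deployment a step like this would complement, not replace, the legal assessment described in this DMP.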

1.2 General Approach

The EW-Shopp DMP will be developed by taking into account the DMP template that matches the demands and suggestions of the Guidelines on Data Management in Horizon 2020, which is available through the DMPonline platform².

The principal contents indicated in the template are listed below:

² https://dmponline.dcc.ac.uk/


• Dataset description
• FAIR data (making data findable, accessible, interoperable and re-usable)
• Data security
• Data archiving and preservation
• Ethical aspects

These contents were used as a guide, and the document was then customized according to the specific requirements of the project.

1.3 Applicable documents and references

The following documents are applicable to the subject discussed in this deliverable and will be referenced as indicated in round brackets:

1. EW-Shopp – Grant Agreement number 732590 ([GA])

2. [GA] Annex 1 – Description of the Action ([DoA])

3. EW-Shopp – Consortium Agreement ([CA])

4. D7.2 POPD – Requirement No. 2 ([D7.2])

Short references may be used to refer to project beneficiaries, also frequently referred to as partners. These references are listed in Table 2.

Table 2. Short references for project partners

No.   Beneficiary (partner) name as in [GA]                          Short reference

1     UNIVERSITA’ DEGLI STUDI DI MILANO-BICOCCA                      UNIMIB

2     CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO           CE

3     BROWSETEL (UK) LIMITED                                         BT

4     GFK EURISKO SRL.                                               GFK

5     BIG BANG, TRGOVINA IN STORITVE, DOO                            BB

6     MEASURENCE LIMITED                                             ME

7     JOT INTERNET MEDIA ESPAÑA SL                                   JOT

8     ENGINEERING – INGEGNERIA INFORMATICA SPA                       ENG

9     STIFTELSEN SINTEF                                              SINTEF

10    INSTITUT JOZEF STEFAN                                          JSI


1.4 Updates of this deliverable

This deliverable will be updated over the course of the project whenever significant changes arise, to ensure compliance with the Horizon 2020 guidelines. Such changes include the addition of new datasets, changes in consortium policies, changes in consortium composition, and external factors.


Chapter 2 Project Data Management

2.1 Project purposes

EW-Shopp aims at supporting companies operating in the fragmented European ecosystem of the eCommerce, Retail and Marketing industries in increasing their efficiency and competitiveness, by leveraging deep customer insights that are too challenging for them to obtain today.

Improved insights will result from the analysis of large amounts of data acquired from different sources and sectors, and in multiple languages. Integrating consumer and market data collected by different business partners will make it possible to cover customer interactions and activities across different channels, providing insights on rich customer journeys. These integrated data will be further enriched with information about weather and events, two crucial factors impacting consumer choices.

By increasing the analytical power that comes from integrating cross-sectorial and cross-language data sources, as well as new data sources, companies will be able to deploy real-time responsive services for digital marketing, reporting-style services for market research, advanced data and resource management services for Retail & eCommerce companies and their technology providers, and enhanced location intelligence services. For example, by using a predictive model built on top of integrated data about the click-through rate of products, weather and events, we will develop a service able to increase advertising of top-gear sport equipment on a sunny weekend afternoon during the Tour de France.

To realize these objectives, a platform, also referred to as the EW-Shopp platform, will be built. The platform will support:

• The integration of consumer and market data, covering customer interactions across different channels and in different languages, and providing insights on rich customer journeys;

• The enrichment of the integrated data with information about weather and events;

• The analysis of the enriched data using visual, descriptive and predictive analytics.

2.2 Project data

EW-Shopp makes use of a mix of public and proprietary datasets. The broad classes of data include the following:

• Market data – data extracted from marketing research and commercial activity

• Consumer data – profiles from marketing research, e-commerce, digital advertising, and IoT devices

• Category/product data – data coming from commercial activities

• Events reported in media – popular online media data

• Weather data and forecasts

The EW-Shopp platform will provide data services and tools to process and harmonise data. It will produce a set of agreed data models, including a shared system of entity identifiers, to represent the aforementioned datasets. The data will furthermore be represented in a way that supports multiple input languages.

2.3 Audience

Project data are oriented to:

• The consortium partners;

• All stakeholders involved in the project;

• The European Commission.

Because of the sensitivity of the business data used in the EW-Shopp innovation action, no commitment to publish datasets provided by business partners as open data is made in [DoA]. For this reason, we do not include external stakeholders in the audience for project data. By external stakeholders we mean any party that is not a beneficiary, not a linked third party in EW-Shopp, and not the European Commission. Although we do not expect to make datasets openly accessible to external stakeholders, models and methodologies developed in the project to support interoperability between different parties will be disseminated to a larger audience of stakeholders.

2.4 Roles and responsibilities

Table 3 (Roles and Responsibilities of Beneficiaries) describes the main roles of beneficiaries in the consortium and their responsibilities with regard to data and services developed in business cases. In the table, business cases are referred to by their number; they are further explained in Chapter 4.

In the table, we distinguish between two main roles of beneficiaries in the consortium:

- Business partners: partners that develop services within the project by exploiting the technology developed in the project, i.e., the EW-Shopp platform, on their own datasets and/or with the help of datasets provided by other partners in the project. These partners will also contribute indirectly to the technology by driving its development with the specifications coming from their business cases.

- Technology partners: partners whose main role in the project is to develop the technology that will support the EW-Shopp platform. These partners will also contribute indirectly to the business cases by performing the following activities:

o Providing or supporting access to core datasets, i.e., datasets such as product data, locations, weather and events, used to integrate and enrich business data.

o Supporting the development of pilots and services by helping business partners integrate or analyze the data.


Table 3. Roles and Responsibilities of Beneficiaries

Partner Partner Role Resp. wrt Data Resp. wrt Business Cases

Business Tech. Owner Facilitator Service Data Tech. Support (Integration) Tech. Support (Analytics)

UNIMIB X X BC2,BC3

CE X X BC1 BC1

BT X X X BC1 BC1

GFK X X X BC2 BC1,BC2

BB X X BC1 BC1

ME X X BC3 BC3

JOT X X BC4 BC4

ENG X BC4 BCALL

SINTEF X BC1

JSI X X X BCALL

At a general level, responsibilities with respect to data managed in the project can be summarized as follows:

- Data owner: a partner that provides the consortium with data that it owns.

- Data facilitator: a partner that eases access to data that are:

o provided by beneficiaries (e.g., UNIMIB will support access to product data owned by GFK);

o provided by linked third parties (e.g., JSI will provide access to weather data provided by ECMWF);

o available as open data (e.g., UNIMIB will provide access to relevant data about locations available in sources such as DBpedia³).

Partners may thus have different responsibilities with respect to the development of business cases and pilots (see Table 3 for the specification of the responsibilities of individual beneficiaries in each business case):

- Service developer (referred to as “Service” in the table) is a beneficiary that is responsible for developing a service within a business case.

- Data provider (referred to as “Data” in the table) is a beneficiary that is responsible for providing its data to support a business case.

- Technical support (integration) is a technology partner that is responsible for providing support in a business case by helping business partners in the data integration process.

³ dbpedia.org


- Technical support (analytics) is a technology partner that is responsible for providing support in a business case by helping business partners in the data analytics process.

The assignment of business cases to technology partners may be subject to change in the course of the project; Table 3 reports the assignments that have been used to collect the requirements included in this document. In addition to the EW-Shopp beneficiaries, the project also includes two further parties having a role in the project:

- The European Centre for Medium-Range Weather Forecasts (ECMWF) is an independent intergovernmental organisation founded in 1975 and supported by 34 states (http://www.ecmwf.int). Data from ECMWF are provided to the EW-Shopp project to be used by every partner. ECMWF will contribute to EW-Shopp by making available, for the scope of the project, its meteorological archive of forecasts (MARS) of the past 35 years and sets of reanalysis forecasts.

- CDE is a Slovene Ltd IT company providing IT solutions for communication and customer relationship management, linked to Browsetel (BT). CDE will act as a data and infrastructure provider and software developer in the context of BC1 in WP4, while BT will focus on business development. The responsibilities of CDE in EW-Shopp are included in the responsibilities of BT in Table 3.


Chapter 3 Ethics and Legal Compliance

3.1 Legal requirements regarding personal data

The EW-Shopp project must comply with all EU laws regarding data protection. The purpose of this section is to explain the core principles and concepts of the right to protection of personal data in scientific research.⁴

In the 1990s, the European Union started a process of codification of data protection and privacy rights in order to harmonise different national legislation. Directive 95/46/EC⁵ (“Data Protection Directive”) and Directive 2002/58/EC⁶ (“E-Privacy Directive”) are the main legal provisions that define the legal framework, together with the EU Charter of Fundamental Rights⁷ and the national legislation that transposed these EU directives.

This multilevel legal environment is going to change in 2018 when, in May, a new European Regulation becomes applicable.⁸ Indeed, the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679)⁹ was approved by the EU Parliament on 14 April 2016. It entered into force 20 days after its publication in the Official Journal of the EU and will be directly applicable in all Member States two years after that date. It is designed to harmonize data privacy laws across Europe, to protect and empower all EU citizens' data privacy, and to reshape the way organizations across the region approach data privacy.

Although the new Regulation confirms the main principles of the above-cited Directives, it repeals Directive 95/46/EC and will take precedence over the national data protection legislation that transposed it.

3.1.1 Core concepts

⁴ According to article 19 of Regulation (EU) No 1291/2013 (Horizon 2020): “all the research and innovation activities carried out under Horizon 2020 shall comply with ethical principles and relevant national, Union and international legislation, including the Charter of Fundamental Rights of the European Union and the European Convention on Human Rights and its Supplementary Protocols. Particular attention shall be paid to the principle of proportionality, the right to privacy, the right to the protection of personal data, the right to the physical and mental integrity of a person, the right to non-discrimination and the need to ensure high levels of human health protection.”

⁵ Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.

⁶ Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communications). This Directive was later amended by Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009.

⁷ Article 8 (Protection of Personal Data) of the EU Charter of Fundamental Rights: “1. Everyone has the right to the protection of personal data concerning him or her. 2. Such data must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law. Everyone has the right of access to data which has been collected concerning him or her, and the right to have it rectified. 3. Compliance with these rules shall be subject to control by an independent authority.”

⁸ Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).

⁹ http://www.eugdpr.org/


European Data Protection legislation is based on some core concepts concerning the subjects who are going to acquire, collect, process, profile, and use data; the different types of data; and notification procedures. The most important definitions for scientific research activities are listed below. These definitions have been extrapolated from EU legislation, EU and Member State (MS) official documents, or other legal documents.

All text in italics refers to the new 2018 European regulation and its additional requirements.

Table 4. Core concepts - European Data Protection legislation

CORE CONCEPT Definition

SUBJECTS IN DATA PROCESS Data Controller¹⁰: The natural or legal person which, alone or jointly with others, determines the purposes and means of the processing of personal data.

Data Processor¹¹: A natural or legal person which processes personal data on behalf of the controller.

DIFFERENT TYPES OF DATA Personal Data¹²: Any information relating to an identified or identifiable natural person (“data subject”); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. Personal data may be processed only if the data subject has unambiguously given his consent (“prior consent”).

NB: Anonymised data are no longer personal data. See below.

Sensitive (Personal) Data¹³: Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation. Sensitive data may be processed only if the data subject has given his explicit consent to the processing of those data (“prior written consent”).

NB: Anonymised data are no longer personal data. See below.

Genetic Data¹⁴: Personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.

NB: Anonymised data are no longer personal data. See below.

¹⁰ Art. 2, lett. d), Directive 95/46/EC and art. 4, n. 7), Regulation (EU) 2016/679.

¹¹ Art. 2, lett. e), Directive 95/46/EC and art. 4, n. 8), Regulation (EU) 2016/679.

¹² Art. 2, lett. a), Directive 95/46/EC and art. 4, n. 1), Regulation (EU) 2016/679.

¹³ Art. 8, Directive 95/46/EC and art. 9, Regulation (EU) 2016/679.

¹⁴ Art. 4, n. 13), Regulation (EU) 2016/679.

Biometric Data¹⁵: Personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person.

NB: Anonymised data are no longer personal data. See below.

Anonymization (Anonymised Data)¹⁶: Processing of data with the aim of removing information that could lead to an individual being identified. Data can be considered anonymised when they do not allow identification of the individuals to whom they relate, and it is not possible that any individual could be identified from the data by any further processing of those data or by processing them together with other information which is available or likely to be available. Use of anonymised data does not require the consent of the “data subject”.

Simulated Data: Imitation or creation of data that closely match real-world data but are not real-world data. For these data, consent is not necessary, since it is not possible to identify the “data subject”.

Pseudonymization¹⁷: The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

Big Data¹⁸: High-volume, high-velocity, high-value and high-variety information assets (the 4Vs) that demand innovative forms of information processing.

¹⁵ Art. 4, n. 14), Regulation (EU) 2016/679.

¹⁶ For definitions, see for example the Irish Data Protection Authority, https://www.dataprotection.ie/docs/Anonymisation-and-pseudonymisation/1594.htm, and the UK Information Commissioner, https://ico.org.uk/for-organisations/guide-to-data-protection/anonymisation/.

¹⁷ Art. 4, n. 5), Regulation (EU) 2016/679.

¹⁸ For the 4Vs theory see Big Data to Smart Data, Iafrate Fernando [2015]. The UK Data Protection Authority refers to Gartner’s definition: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”, Information Commissioner’s Office, Big Data and Data Protection, 6 [2014].


Open Data¹⁹: Data that can be freely used, re-used, and redistributed by anyone, subject only, at most, to the requirement to attribute and share-alike.

PROCESSES Processing of Personal Data²⁰: Any operation (or set of operations) that is performed upon personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure, or destruction.

Profiling²¹: Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location, or movements.

NOTIFICATION Notification: According to different national legislation, data controllers have to notify their National Data Protection Authority (DPA) of their intention to use data before starting to process data. Requirements, notification processes, and conditions vary across national DPAs.
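The Pseudonymization definition in Table 4 can be illustrated with a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256 from the standard library) as the pseudonymisation technique; the record fields and the handling of the key are hypothetical examples, not measures prescribed by the project.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always yields the same pseudonym under the same key,
    so records can still be linked for analysis; without the key, the
    pseudonym cannot be recomputed from a guessed identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key plays the role of the "additional information" in the definition:
# generate it once and store it separately from the pseudonymised data,
# under its own access controls (hypothetical arrangement).
key = secrets.token_bytes(32)

records = [
    {"customer_email": "anna@example.com", "product": "tent"},
    {"customer_email": "anna@example.com", "product": "bicycle"},
    {"customer_email": "marko@example.com", "product": "tent"},
]

# Drop the e-mail address and keep only the pseudonym.
pseudonymised = [
    {"customer_id": pseudonymise(r["customer_email"], key), "product": r["product"]}
    for r in records
]
```

Because whoever holds the key can re-link pseudonyms to identifiers, pseudonymised data remain personal data under Regulation (EU) 2016/679; only anonymised data fall outside its scope.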

3.1.2 Fundamental Principles

European Data Protection legislation provides that personal data must be collected, used, and processed fairly, stored safely, and not disclosed to any other person unlawfully. From this perspective, we can outline the following fundamental principles regarding personal data use²²:

1. Personal data must be obtained and processed fairly, lawfully, and in a transparent way²³: according to EU and MS national legislation, the data controller has to respect certain conditions, for example completing the notification process before starting to collect personal data, or obtaining prior consent from the natural person (the “data subject”) before collecting his/her personal data;

2. Personal data should only be collected for specified, explicit, and legitimate purposes and not further processed in any way incompatible with those purposes: personal data must be collected for specific, clear, and lawfully stated purposes, which the data controller has to specify to the “data subject” and to the national Data Protection Authority (DPA);

¹⁹ Definition from the Open Data Handbook, http://opendatahandbook.org/guide/en/what-is-open-data/

²⁰ Art. 2, lett. b), Directive 95/46/EC and art. 4, n. 2), Regulation (EU) 2016/679. Parts of sentences in italics were added by the new Regulation (EU) 2016/679.

²¹ Art. 4, n. 4), Regulation (EU) 2016/679.

²² These principles are extrapolated from Directive 95/46/EC.

²³ The new EU Regulation also requires that personal data be processed in a transparent manner (article 5, Regulation (EU) 2016/679).

3. Personal data should be used in an adequate, relevant, and not excessive way in relation to the purposes for which they are collected and/or further processed: processing of personal data should be compatible with the specified purposes for which they were obtained;

4. Keep personal data accurate, complete, and, where necessary, up-to-date;

5. Keep personal data safe and secure: the data controller must ensure adequate technical, organisational, and security measures to prevent unauthorised or unlawful processing, alteration, or loss of personal data;

6. Retain personal data for no longer than is necessary: personal data should not be kept for longer than is necessary for the purposes for which they were obtained;

7. No transfer of personal data overseas without adequate protection: it is prohibited to transfer personal data to any country outside the European Union and the European Economic Area that does not ensure an adequate level of protection.
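Principle 6 (retention) can be made operational by periodically purging records whose retention period has expired. The following Python sketch assumes hypothetical processing purposes and retention periods; in practice these would follow from the purposes declared by the data controller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    subject_id: str        # pseudonymous identifier of the data subject
    collected_at: datetime
    purpose: str


# Hypothetical retention periods, one per declared purpose.
RETENTION = {
    "purchase_analysis": timedelta(days=365),
    "email_marketing": timedelta(days=180),
}


def apply_retention(records, now):
    """Return only the records still within the retention period of their purpose.

    Records whose purpose has no declared retention period are dropped,
    so that nothing is kept unless a period has been defined for it."""
    kept = []
    for r in records:
        limit = RETENTION.get(r.purpose)
        if limit is not None and now - r.collected_at <= limit:
            kept.append(r)
    return kept


now = datetime(2017, 6, 30, tzinfo=timezone.utc)
records = [
    Record("a1", datetime(2017, 1, 10, tzinfo=timezone.utc), "purchase_analysis"),  # 171 days old
    Record("b2", datetime(2016, 11, 1, tzinfo=timezone.utc), "email_marketing"),    # 241 days old
    Record("c3", datetime(2017, 6, 1, tzinfo=timezone.utc), "undeclared"),          # no period
]
kept = apply_retention(records, now)
print([r.subject_id for r in kept])  # ['a1']
```

Dropping records with an undeclared purpose is a deliberate default-deny choice: a missing retention rule leads to deletion rather than indefinite storage.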

The new European Regulation has also added some other principles to correctly manage privacy and data protection rights. These new principles are as follows:

• Data Controller accountability: taking into account the nature, scope, context, purposes, and risks of processing, the Data Controller has to implement appropriate technical and organisational measures.²⁴

• Principles of data protection by design and by default²⁵ must be applied:

o Privacy by design²⁶: The Data Controller, before starting collection and processing of personal data as well as during the processing itself (“the whole lifecycle of data”), has to implement appropriate technical and organisational measures, such as pseudonymization, which are designed to implement data protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing. In other words, before starting “working” with personal data, the entire process has to be designed from the start in compliance with the required technical and legal safeguards of data protection regulations (e.g. adequate security);

o Privacy by default: The Data Controller has to implement appropriate technical and organisational measures for ensuring that, by default, only personal data that are necessary for each specific purpose of the processing are processed.²⁷
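Privacy by default can be sketched as a default-deny allow-list: each processing purpose declares the fields it needs, and everything else is stripped before processing. The purposes, field names and values in this Python sketch are hypothetical examples, not EW-Shopp data models.

```python
# Hypothetical mapping from processing purpose to the fields it strictly needs.
# Anything not listed is excluded by default.
FIELDS_PER_PURPOSE = {
    "sales_forecasting": {"product_ean", "store_region", "purchase_date", "quantity"},
    "delivery": {"customer_name", "delivery_address", "product_ean"},
}


def minimise(record: dict, purpose: str) -> dict:
    """Return only the fields needed for `purpose`; unknown purposes get nothing."""
    allowed = FIELDS_PER_PURPOSE.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}


order = {
    "customer_name": "Anna Novak",
    "delivery_address": "Example Street 1, Ljubljana",
    "product_ean": "5901234123457",
    "store_region": "SI-LJ",
    "purchase_date": "2017-06-24",
    "quantity": 1,
}
result = minimise(order, "sales_forecasting")
print(result)  # only the four fields needed for sales forecasting
```

Returning an empty record for an undeclared purpose means that adding a new use of the data requires an explicit decision about which fields it needs.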

More specifically, the core concepts²⁸ of “Privacy by design” (PbD) are:

1. Being proactive not reactive, preventative not remedial: “The PbD approach is characterized by proactive rather than reactive measures. It anticipates and prevents privacy invasive events before they happen. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred; it aims to prevent them from occurring. In short, Privacy by Design comes before-the-fact, not after”;

²⁴ Art. 24, Regulation (EU) 2016/679.

²⁵ Art. 25, Regulation (EU) 2016/679.

²⁶ The “privacy by design” approach was developed by the Information and Privacy Commissioner of Ontario, Canada, in the mid-1990s; see https://www.ipc.on.ca/wp-content/uploads/2013/09/pbd-primer.pdf and https://www.iab.org/wp-content/IAB-uploads/2011/03/fred_carter.pdf. Some European Data Protection Authorities referred directly to this approach even before “Privacy by design” was explicitly provided for in the new European regulation.

²⁷ For a practical guide on how privacy by design and by default principles can be made concrete and effective, see European Union Agency for Network and Information Security (ENISA), Privacy and Data Protection by Design: From Policy to Engineering, December 2014, https://www.enisa.europa.eu/publications/privacy-and-data-protection-by-design.

²⁸ Concepts are extrapolated from the PbD approach of the Information and Privacy Commissioner of Ontario, see https://www.ipc.on.ca/wp-content/uploads/2013/09/pbd-primer.pdf.

2. Having privacy as the default setting: “PbD seeks to deliver the maximum degree of privacy by ensuring that personal data are automatically protected in any given IT system or business practice. If an individual does nothing, their privacy still remains intact. No action is required on the part of the individual to protect their privacy; it is built into the system, by default”;

3. Having privacy embedded into design: “PbD is embedded into the design and architecture of IT systems and business practices. It is not bolted on as an add-on, after the fact. The result is that privacy becomes an essential component of the core functionality being delivered. Privacy is integral to the system, without diminishing functionality”;

4. Avoiding the pretence of false dichotomies, such as privacy vs. security: “PbD seeks to accommodate all legitimate interests and objectives in a positive-sum win-win manner, not through a dated, zero-sum approach, where unnecessary trade-offs are made. PbD avoids the pretence of false dichotomies, such as privacy vs. security, demonstrating that it is possible to have both”;

5. Providing full life-cycle management of data: “PbD, having been embedded into the system prior to the first element of information being collected, extends securely throughout the entire lifecycle of the data involved: strong security measures are essential to privacy, from start to finish. This ensures that all data are securely retained, and then securely destroyed at the end of the process, in a timely fashion. Thus, PbD ensures cradle to grave, secure lifecycle management of information, end-to-end”;

6. Ensuring visibility and transparency of data: “PbD seeks to assure all stakeholders that whatever the business practice or technology involved, it is in fact operating according to the stated promises and objectives, subject to independent verification. Its component parts and operations remain visible and transparent, to users and providers alike. Remember, trust but verify”;

7. Being user-centric and respecting user privacy: “PbD requires architects and operators to protect the interests of the individual by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly options. Keep it user-centric”.

3.1.3 Notification process and data protection impact assessment

Generally, every data controller has to notify its national Data Protection Authority (DPA) of its decision to collect personal data before starting this process. This notification aims at communicating in advance the creation of a new “database”, explaining the reasons for and purposes of this, and the technical and organisational safeguards in place to protect the personal data. Consequently, DPAs are able to verify the legal and technical safeguards required by EU legislation. However, the conditions attaching to and the procedures for submitting such a notification differ from one EU state to another, with the strongest protections in place in Germany and the Netherlands and the least in Ireland and the UK.

The new European Regulation, however, will introduce a different way to manage data protection issues, following PbD principles. Where processing is likely to result in a high risk to the rights and freedoms of natural persons, the Data Controller has to carry out an assessment of the impact of the processing operations on the protection of personal data before starting the processing itself, to evaluate the origin, nature, particularity, and severity of the risk²⁹ attaching to the proposed processing. Such Data Protection/Privacy Impact Assessments (DPIA) can then be used to define appropriate measures to ensure data protection and compliance with EU legislation.

A DPIA is required in the case of:

• Systematic and extensive evaluation of personal aspects based on automated processing (e.g. profiling);

• Large-scale processing of sensitive data or of personal data relating to criminal convictions and offences;

• Systematic monitoring of a publicly accessible area on a large scale.

The main aspects of DPIAs are:

a) A systematic description of the processing operations and the purposes of the processing;

b) An assessment of the necessity and proportionality of the processing operations in relation to the purposes;

c) An assessment of the risks to the rights and freedoms of data subjects;

d) Measures to deal with the risks, including safeguards, security measures, and mechanisms to ensure data protection and to demonstrate compliance with EU legislation.

In the event that a DPIA indicates a high risk in terms of data protection and privacy rights, the Data Controller must consult the National Data Protection Authority prior to the processing.³⁰
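The three DPIA cases listed above can be encoded as a simple screening check, for example when cataloguing processing operations. This Python sketch is a hypothetical helper, not a legal test: Article 35 gives these cases as examples, so a negative result does not by itself rule out the need for a DPIA.

```python
from dataclasses import dataclass


@dataclass
class ProcessingOperation:
    description: str
    # systematic and extensive evaluation based on automated processing (e.g. profiling)
    systematic_evaluation: bool
    # large-scale processing of sensitive data or of criminal convictions/offences data
    large_scale_sensitive: bool
    # systematic large-scale monitoring of a publicly accessible area
    public_monitoring: bool


def dpia_required(op: ProcessingOperation) -> bool:
    """True if any of the three listed cases applies.

    The list is not exhaustive, so False means "not triggered by these
    cases", not "no DPIA needed"."""
    return op.systematic_evaluation or op.large_scale_sensitive or op.public_monitoring


segmentation = ProcessingOperation("customer segmentation for e-mail campaigns", True, False, False)
stock_report = ProcessingOperation("aggregated weekly sales per store", False, False, False)
print(dpia_required(segmentation))  # True
print(dpia_required(stock_report))  # False
```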

3.1.4 Notification process in the EW-Shopp project

The use of datasets within the EW-Shopp project has to comply with applicable international, EU and national law (in particular, EU Directive 95/46/EC).

To this aim, data owners have been asked to evaluate each of their datasets in order to confirm the nature and sensitivity of the data to be used within the EW-Shopp project.

In order to make this evaluation, dataset owners have to clarify, for each dataset, whether it contains personal data (PD). If the dataset contains PD, they have to provide the notification and the informed consent for secondary use.

If the dataset to be used for the EW-Shopp project does not contain PD, it is necessary to clarify whether it is derived from a dataset which contains PD. If so, the data owner should prepare a statement explaining that data produced in the project will not be used to enrich the dataset containing PD for the purposes of this DMP, and should also provide the EC with the notification concerning the original dataset containing PD, to be included in deliverable [D7.2].

²⁹ Art. 35, Regulation (EU) 2016/679.

³⁰ Art. 36, Regulation (EU) 2016/679.


If the dataset does not contain PD (and is not derived from a dataset containing PD), the data owner should provide a statement detailing that the dataset does not contain PD (explaining the implemented procedures, etc.).

All notifications and copies of opinions provided by owners of datasets containing PD will be collected in deliverable [D7.2].
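The evaluation procedure of Section 3.1.4 amounts to a small decision tree applied to each dataset. A minimal Python sketch, in which the function name and the returned document labels are illustrative only:

```python
def required_documents(contains_pd: bool, derived_from_pd: bool) -> list:
    """Documents a data owner provides for one dataset (Section 3.1.4).

    contains_pd:     the dataset used in EW-Shopp contains personal data (PD).
    derived_from_pd: the dataset is derived from a dataset containing PD;
                     only checked when contains_pd is False.
    """
    if contains_pd:
        return ["notification", "informed consent for secondary use"]
    if derived_from_pd:
        return [
            "statement: project data will not enrich the original PD dataset",
            "notification concerning the original PD dataset (for D7.2)",
        ]
    return ["statement: dataset contains no PD (with implemented procedures)"]


for contains, derived in [(True, False), (False, True), (False, False)]:
    print(contains, derived, required_documents(contains, derived))
```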

3.2 Ethics requirements regarding the involvement of human rights

The EW-Shopp project is implemented in accordance with fundamental ethical standards, to ensure quality and excellence both during the process and after the life of the project. Horizon 2020 specifies that ethical research conduct implies the application of fundamental ethical principles and legislation to scientific research in all possible domains of research. In line with the Horizon 2020 ethics procedure, and in order to engage the scientific research with its ethical dimension, each BC owner in the EW-Shopp project has been asked to answer the following questions:

• Are there any ethical issues that can have an impact on data sharing?

• Have you taken the necessary measures to protect human rights and freedoms?

• How did/could these measures impact the BC?

• Do you assess the risks linked to the specific type of data your organization provides?

3.3 Intellectual Property Rights

In the context of the EW-Shopp project, IPR ownership is fundamentally regulated by the underlying principles of two main official documents (namely [CA] and [GA]); further considerations will be detailed within the WP5 frame and provided in its outcome “D5.4 – Update of Exploitation and Dissemination Strategy (M24)”.

Two main IPR management concerns could impact the current deliverable:

• Existing or developed datasets will be available to the whole Consortium during the project time span, but any further use in exploitation activities must follow specific limitations and/or conditions (as stated in Article 25.3 of the [GA] and described in its Attachment 1).

• All the identified datasets will be available to all Beneficiaries in order to develop the business cases used to validate the project results, as explicitly mentioned in the description tables contained in “Chapter 6 - Dataset description” (see the Dataset ACCESS section).


Chapter 4 Business Case high-level description

The main business objective of EW-Shopp is to develop a cross-domain data integration platform that will enable the fragmented European business ecosystem to increase its efficiency and competitiveness by building relevant customer insights and business knowledge. This platform will enable us to regain lost positions in competing against global internet service giants, which have based their growth and sector transformation on the intensive exploitation of integrated big data generated on their proprietary platforms.

4.1 BingBang,CENEJE(BC1)

The goal of the business case is to follow user experience based on real time cross channel dataintegration. The business case will develop analytical predictive models for managing marketingactivities, sales resources, operations, data quality and content management that will increasepartnerefficiencyandsales.Itwillfurthermoreenablethedevelopmentofmarketdataenrichmentservicesandconsequentmonetization. Thiswill bedone through integrating cross-channel intent,research,interest,interactionandpurchasedatawithpointofsalessolutions.

The data that will be integrated are:

• Purchase intent: a collection of user journey data (page views, search terms, redirects to sellers and similar).

• Product attributes: a collection of product attributes (ranging from generic ones, such as name, EAN, brand, categorization and color, to more specific ones, such as dimensions or technical specifications).

• Product price history: a collection of seller quotes for products.

• Customer purchase history: sell-out data matched with customer baskets in a defined timeframe.

• Consumer intent and interaction: a collection of user journey data from Google Analytics (page views, page events, search terms, redirects to channels, etc.).

• Contact and consumer interaction history: calls (outbound, inbound and simulated), other contact events (email, SMS, click-through, fax, scan, or any other document) and other events.

To achieve the business case goals, in EW-Shopp we will set up a virtual lab in a data cloud environment where we will create a set of scenarios by integrating partner datasets of anonymized user paths to purchase, which should include all possible engagements, decisions and purchase information. The data will be used in order to:

• develop models of purchase behavior;
• cluster similar behaviors to optimize operations;
• enable user experience advertising;
• develop efficient sales promotions;
• provide efficient marketing and communication tools;
• build segmented mailing groups for efficient automation of e-mail marketing;
• increase efficiency in above-the-line (mass media) and below-the-line (one-to-one) activities;
• create efficient POS solutions for sales.

4.2 GfK (BC2)

The goal of business case two is to find which external variables predict the sales and success of products, and with what weights. Besides the integration of the two datasets provided by GfK, this business case also aims at integrating external data, such as event and weather data, in order to improve predictability.

The business case concerns two services, the Retail Sales Data Reporting System and Echo: the former helps to maximize sales and profit in order to keep customers coming back, while the latter tracks and improves customer experiences in real time. The predictive model learned on the integrated data about customer feedback, together with third-party data, will identify which actions drive growth.

The data that will be integrated are:

• Market data: Sales Data (tech goods), Product Attributes and Prices Data (tech goods), and Purchase Data

• Consumer data: Demographics, TV Behaviour & Exposure Data (passive / survey), Online Behaviour & Exposure Data, Individual Purchase Data (passive / survey), and Mobile Usage & Exposure Data

• Event data, including Sport Events (World Cup, Champions League, Olympic Games, etc.), Social Events (strikes, terrorism, epidemics, etc.), Political Events (elections, relevant laws, etc.) and Natural Events (earthquakes, floods, etc.)

• Historical Weather Data: relevant weather information across different countries

• Social media data: measures of customer engagement across different platforms (e.g., email marketing, search)

• Purchase intent and search data: data about purchase research and intent by category, and search behaviour based on keyword interaction through advertising.

4.3 Measurence (BC3)

The goal of the business case is to improve Measurence Scout, a location scouting solution that helps businesses choose the best location. This will optimize real estate investments by analyzing the traffic around the locations of interest.

The traffic data, anonymized at the source, are collected by Measurence WiFi technology at a high level of granularity. Moreover, in order to better understand a potential location, Measurence also needs external data such as weather data, event data, geographic data, sales data of businesses, etc.

The data that are planned to be integrated are:

• Weather data at a high level of granularity
• Event data around a location: we need to be able to filter these events based on their venue and, ideally, on the number of people expected to attend
• Geographical data: businesses in the area (shopping, restaurants, etc.), schools, tourist attractions, nightlife, etc.
• Sales data: business volume of businesses in the area, aggregated by kind of activity (e.g., restaurants, clothes shops, etc.)

4.4 JOT (BC4)

The goal of the JOT business case is to use big data technology and to integrate cross-domain purchase intention data at the level of search, communication and content interactions, in order to enable JOT to increase its clients’ communication efficiency and improve their allocation of marketing effort. Current methods for online marketing prediction have failed simply because there is no single rule that can be universally applied to all markets, products and sectors. The only way to effectively find an online marketing method is to analyse user behaviour and traffic sources, taking into account the different external environmental and behavioural variables that affect them. By analysing marketing campaign performance, JOT can obtain behaviour patterns that can be used to establish a behavioural baseline. Thanks to this baseline, JOT will be able to predict the likely pattern for certain days or time zones with similar characteristics. Behaviour analysis could be obtained by cross-referencing geographical data with peak times, baseline traffic, daily impression trends, real-time conversion and bounce rates, to name just a few metrics. Furthermore, in order to achieve accurate results, a vast amount of data will have to be collected so as to ensure the accuracy of the data sample.

JOT had planned to provide three different datasets within the project (two are proprietary and meaningful mainly in their own business case):

• Traffic sources (Bing): historical marketing campaign performance statistics of search data in Bing advertising platforms.

• Traffic sources (Google): historical marketing campaign performance statistics of data in the Google platform.

• Twitter trends: trending topics as available through Twitter APIs.

In respect of the [DoA], JOT has revised its datasets to simplify the usage of its data within the EW-Shopp project, without impacting the support of the services foreseen in this business case:

• the original Pixel Dataset has been unified with Traffic source Google and Traffic source Bing;

• the Email marketing campaign dataset could no longer be provided by the company Impacting. JOT confirmed that this dataset does not affect the goal of the project, being just a complement to the Traffic Source ones, so its absence will not interfere with the success of the business case. Moreover, this has removed, at the source, the problem related to IP and geo-localisation.

Other datasets will be added to the above-mentioned ones in order to realize the JOT business case:

• Events: a dataset covering different kinds of events (sporting events, large-scale concerts, congresses, elections) for the different countries that wish to take part in the use case will be needed. This kind of dataset is provided through the Event Registry dataset.

• Weather history: this dataset will contain the historical weather data that JOT will use for the project. It will show the real weather conditions, down to a specific hour / minute, during the time period chosen for the study. This dataset is provided through the MARS (historical data) dataset.

• Weather forecast: same time period as the previous dataset, except that the information will be the weather forecast for the given times, not necessarily the actual climatic conditions.

The purpose of this business case is to carry out systematic analyses to predict the effect of different variables, such as weather and other events, on the performance of marketing campaigns. These analyses will lead to the development of different business services:

1. Event and weather-aware campaign scheduling. This service will be used by JOT to predict the very best moment to launch or run a marketing campaign based on weather conditions and events.

2. Event-based customer engagement analysis. This service supports the analysis of the possible impact of events on online shopping.

3. Event-based digital marketing management. This service supports intelligent bidding on digital marketing platforms, programmed based on events.

4. Weather-responsive digital marketing. This service offers intelligent bidding on digital marketing platforms, based on real-time weather conditions.
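The behavioural baseline described for this business case can be sketched as follows. This is a minimal illustration under stated assumptions, not JOT's model: it averages a campaign metric per (weekday, hour) slot and then measures the lift observed when a condition holds, such as a rainy day. The click values, the dates and the rain flag are hypothetical; a real service would draw them from the traffic-source and MARS weather datasets.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def behavioural_baseline(observations):
    """Average metric value per (weekday, hour) slot; observations are (datetime, value)."""
    slots = defaultdict(list)
    for ts, value in observations:
        slots[(ts.weekday(), ts.hour)].append(value)
    return {slot: mean(values) for slot, values in slots.items()}


def condition_lift(observations, baseline, condition):
    """Mean ratio of observed value to baseline over observations where condition(ts) holds."""
    ratios = [
        value / baseline[(ts.weekday(), ts.hour)]
        for ts, value in observations
        if condition(ts) and baseline[(ts.weekday(), ts.hour)] > 0
    ]
    return mean(ratios) if ratios else None


# Hypothetical clicks on three Mondays at 10:00; 12 June is flagged as rainy.
obs = [
    (datetime(2017, 6, 5, 10), 100),
    (datetime(2017, 6, 12, 10), 140),
    (datetime(2017, 6, 19, 10), 120),
]
rainy_days = {datetime(2017, 6, 12).date()}
base = behavioural_baseline(obs)
print(base[(0, 10)])  # 120
print(round(condition_lift(obs, base, lambda ts: ts.date() in rainy_days), 3))  # 1.167
```

A lift above 1 under a condition is the kind of signal a weather-responsive bidding service could turn into a bid adjustment; how that mapping is done is left open here.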

Chapter 5 EW-Shopp Methodology for DMP

The aim of this chapter is to explain all the information required from data owners in order to make data findable, accessible, interoperable and re-usable (FAIR), and to share the process followed in EW-Shopp to collect these data.

5.1 Elements of the EW-Shopp Data Management Plan

The DMP should address some important points on a dataset-by-dataset basis and should reflect the current status of reflection within the consortium about the data that will be produced. As a key element of good data management, the DMP has to describe the life-cycle management applied to the data to be collected, processed and/or generated by a Horizon 2020 project.

In order to make data findable, accessible, interoperable and re-usable (FAIR), a DMP should include:

• Dataset Identification: specifying what data will be collected, processed and/or generated.
• Dataset Origin: specifying whether existing data is being re-used (if any), the origin of the data and the expected size of the data (if known).
• Dataset Format: describing the structure and type of the data, its time and spatial coverage, and its language and naming conventions.
• Data Access: specifying whether data will be shared/made open access, in particular:

o Making data accessible: specifying whether, and which, data produced and/or used in the project will be made openly available, and explaining why certain datasets cannot be shared (or need to be shared under restrictions), separating legal and contractual reasons from voluntary restrictions.

o Making data interoperable: specifying whether the data produced in the project is interoperable, that is, whether it allows data exchange and re-use, and which data and metadata vocabularies, standards or methodologies it is meant to follow to make data interoperable.

• Data Security: specifying which provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data), as well as whether personal data is present and, in that case, which privacy management procedures are put in practice.

The following paragraphs give more details on each class of attributes listed above, and will be used as a guide to describe the datasets provided for EW-Shopp purposes, in accordance with the Guidelines on Data Management in Horizon 2020.

5.1.1 Dataset IDENTIFICATION

First of all, the dataset to be produced needs to be identified, and its details provided in terms of a description of the data that will be generated or collected.

Following H2020 guidelines, a set of relevant information has been defined to help describe the dataset identification:

• Category: dataset typology (Market, Consumer, Products, Weather, Media).
• Data name: name of the dataset, which should be self-explanatory.
• Description: description of the dataset, providing more details.
• Provider: name of the beneficiary providing the dataset (or in charge of bringing it into the project).
• Contact Person: name of the person to be contacted for further details about the dataset.
• Business Cases number: BC involved (i.e., BCx).
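The identification fields above map naturally onto a small record type. The sketch below is a hypothetical illustration, not a tool used in EW-Shopp: it keeps the six fields and reports which ones are missing or malformed, assuming the category values listed above and business case identifiers BC1 to BC4 (the four business cases described in Chapter 4).

```python
import re
from dataclasses import dataclass, field

# Category values listed in Section 5.1.1.
CATEGORIES = {"Market", "Consumer", "Products", "Weather", "Media"}

# Business cases are referred to as "BCx"; this plan describes BC1 to BC4.
BC_PATTERN = re.compile(r"BC[1-4]")


@dataclass
class DatasetIdentification:
    """Hypothetical record holding the Dataset IDENTIFICATION fields."""

    category: str
    name: str
    description: str
    provider: str
    contact_persons: list[str]
    business_cases: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means every field is filled in."""
        problems = []
        if self.category not in CATEGORIES:
            problems.append(f"unknown category: {self.category!r}")
        for attr in ("name", "description", "provider"):
            if not getattr(self, attr).strip():
                problems.append(f"missing {attr}")
        if not self.contact_persons:
            problems.append("missing contact person")
        if not self.business_cases:
            problems.append("no business case listed")
        for bc in self.business_cases:
            if not BC_PATTERN.fullmatch(bc):
                problems.append(f"malformed business case id: {bc!r}")
        return problems


purchase_intent = DatasetIdentification(
    category="Consumer",
    name="Purchase intent",
    description="User journey data: page views, search terms, redirects to sellers.",
    provider="Ceneje",
    contact_persons=["David Creslovnik", "Uros Mevc"],
    business_cases=["BC1"],
)
print(purchase_intent.validate())  # []
```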

5.1.2 Dataset ORIGIN

Following H2020 guidelines, a set of relevant information has been defined to help describe the dataset origin:

• Available at (M): project month in which the dataset will be available.
• Core Data (Y|N): indicates whether the dataset is mandatory and will be part of the data shared across the different UCs, or whether it is discretionary and has only limited usage.
• Size: a rough order of magnitude (ROM) estimate in terms of MB/GB/TB.
• Growth: a dynamic rough order of magnitude (ROM) estimate, selecting the most appropriate frequency, in terms of MB/GB/TB per hour/day/week/month/other.
• Type and format: dataset format, specifying whether it uses, for example, CSV, Excel spreadsheet, XML, JSON, etc.
• Existing data (Y|N): whether the data already exist or are generated for the project’s purpose.
• Data origin: how the data in the dataset are collected/generated (e.g., SQL table, Google API, etc.).
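Because Size and Growth are both ROM estimates, they can be combined into a rough projection of the storage a dataset will need. The sketch below assumes linear growth, decimal units and a 30-day month, and uses the figures reported later for the Location analytics dataset (~600 GB, ~5 GB/location/month) together with a hypothetical count of 10 locations; the function name and parameters are illustrative only.

```python
# Bytes per unit (decimal units) and days per growth period.
UNIT_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12}
PERIOD_DAYS = {"hour": 1 / 24, "day": 1, "week": 7, "month": 30}  # assumption: 30-day month


def project_size_gb(size, size_unit, growth, growth_unit, period, horizon_days, multiplier=1):
    """Linear ROM projection of a dataset's size, in GB, after `horizon_days`.

    `multiplier` scales a per-entity growth estimate, e.g. the number of
    locations for a "GB/location/month" figure.
    """
    start_bytes = size * UNIT_BYTES[size_unit]
    added_bytes = growth * UNIT_BYTES[growth_unit] * multiplier * horizon_days / PERIOD_DAYS[period]
    return (start_bytes + added_bytes) / UNIT_BYTES["GB"]


# ~600 GB today, ~5 GB/location/month, 10 locations (hypothetical), one 360-day year.
print(project_size_gb(600, "GB", 5, "GB", "month", horizon_days=360, multiplier=10))  # 1200.0
```

A linear model is deliberately coarse, which matches the ROM nature of the fields; a dataset whose growth rate itself changes would need a different estimate.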

5.1.3 Dataset FORMAT

Following H2020 guidelines, a set of relevant information has been defined to help describe the dataset format:

• Dataset structure: description of the structure and type of the data (e.g., the header columns, the JSON schema, REST response fields, etc.).

• Dataset format: definition of the dataset format (e.g., specifying whether it uses CSV, Excel spreadsheet, XML, JSON, GeoJSON, Shapefile, HTTP stream, etc.).

• Time coverage: if the dataset has a time dimension, an indication of the period it covers.

• Spatial coverage: if the dataset relates to a spatial region, an indication of its coverage.

• Languages: languages of metadata, attributes, code lists and descriptions.

• Identifiability of data: reference to the identifiability of the data and to the standard identification mechanism.

• Naming convention: if the dataset is not static, a description of how the dataset can be identified when updated or after a versioning task has been performed.

• Versioning: reference to how often the data is updated (e.g., no planned updating, annually, quarterly, monthly, weekly, daily, hourly, every few minutes, every few seconds, real-time) and how versioning is managed (e.g., if daily, whether every day a new dataset is generated containing only the newly created data, or every day a new dataset containing all the data generated since the beginning of the collection overrides the old one, …)

• Metadata standards: specification of the standards for metadata creation (if any). If there are no standards, a description of what metadata will be created and how.
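The two daily versioning strategies named in the Versioning field differ in what each file holds. The sketch below contrasts them under stated assumptions: the path layout /{root}/YYYY/MM/DD.tsv, the "si" root and the records are hypothetical. It shows that concatenating the incremental files in date order reproduces the cumulative snapshot, which is why an incremental naming convention must make the date recoverable from the file name.

```python
from datetime import date


def daily_path(root: str, day: date) -> str:
    """Build a path of the form /{root}/YYYY/MM/DD.tsv (assumed layout)."""
    return f"/{root}/{day:%Y/%m/%d}.tsv"


def incremental_versions(batches: dict, root: str) -> dict:
    """Incremental strategy: one new file per day holding only that day's records."""
    return {daily_path(root, day): rows for day, rows in sorted(batches.items())}


def cumulative_snapshot(batches: dict, up_to: date) -> list:
    """Cumulative strategy: a single file, overwritten daily, holding every record so far."""
    snapshot = []
    for day in sorted(batches):
        if day <= up_to:
            snapshot.extend(batches[day])
    return snapshot


batches = {date(2017, 6, 1): ["a", "b"], date(2017, 6, 2): ["c"]}  # hypothetical records
files = incremental_versions(batches, "si")
print(list(files))  # ['/si/2017/06/01.tsv', '/si/2017/06/02.tsv']

# Concatenating the incremental files in date order rebuilds the cumulative snapshot.
rebuilt = [row for path in sorted(files) for row in files[path]]
print(rebuilt == cumulative_snapshot(batches, date(2017, 6, 2)))  # True
```

Zero-padded month and day components keep lexicographic path order equal to chronological order, so a plain sort of the file names is enough.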

5.1.4 Dataset ACCESS

Following H2020 guidelines, a set of relevant information has been defined to help describe dataset access, with the aim of making data accessible and interoperable:

• Dataset license: if the dataset is released as open data, an indication of the license used: CC0[31], CC-BY[32], CC-BY-SA[33], CC-BY-ND[34], CC-BY-NC[35], CC-BY-NC-SA[36], CC-BY-NC-ND[37], PDDL[38], ODC-by[39], ODbL[40], other or proprietary (with a link if possible). Otherwise, specify who has access to the dataset (for example, all partners in the consortium, some partners for the purpose of tool development, only a sample will be disclosed, etc.).

• Availability (public|private): whether the dataset is public or private.
• Availability to EW-Shopp partners (Y|N): whether the dataset is available to EW-Shopp partners.
• Availability method: specification of how the data will be made available (e.g., web page in the browser, web service (REST/SOAP APIs), query endpoint, file download, DB dump, directly shared by the responsible organization, etc.).

• Tools to access: specification of which methods or software tools are needed to access the data.

• Dataset source URL: specification of where the data and the associated metadata, documentation and code are deposited (e.g., the dataset source URL).

[31] https://creativecommons.org/share-your-work/public-domain/cc0/
[32] https://creativecommons.org/licenses/by/2.0/
[33] https://creativecommons.org/licenses/by-sa/2.0/
[34] https://creativecommons.org/licenses/by-nd/2.0/
[35] https://creativecommons.org/licenses/by-nc/2.0/
[36] https://creativecommons.org/licenses/by-nc-sa/2.0/
[37] https://creativecommons.org/licenses/by-nc-nd/2.0/
[38] https://opendatacommons.org/licenses/pddl/
[39] https://opendatacommons.org/category/odc-by/
[40] https://opendatacommons.org/licenses/odbl/

• Access restrictions: specification of how access will be provided in case there are any restrictions.

• Keyword/Tags: categorization of the dataset through some relevant keywords/tags (e.g., product categories, price, etc.).

• Archiving and preservation: description of the procedures that will be put in place for the long-term preservation of the data, with an indication of how long the data should be preserved, its approximate end volume, the associated costs and how these are planned to be covered.

• Data interoperability: specification of which data and metadata vocabularies, standards or methodologies will be followed to facilitate interoperability.

• Standard vocabulary: specification of which standard vocabulary will be used for all data types present in the dataset, to allow inter-disciplinary interoperability. If none is used, a mapping to more commonly used ontologies has to be provided.
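When a dataset is released under one of the open licenses listed in the Dataset license field, its headline re-use conditions can be derived from the license identifier. The sketch below is a simplified summary that covers only attribution, share-alike, non-commercial and no-derivatives conditions and ignores any further clauses the full license texts contain; the function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution: bool     # credit to the licensor is required
    share_alike: bool     # derived data must carry the same license
    commercial_use: bool  # commercial re-use is allowed
    derivatives: bool     # adapted or modified versions may be shared


def terms_for(license_id: str) -> LicenseTerms:
    """Headline conditions of the open licenses listed in the Dataset license field."""
    lid = license_id.upper()
    if lid in {"CC0", "PDDL"}:  # public-domain dedications
        return LicenseTerms(False, False, True, True)
    if lid == "ODC-BY":
        return LicenseTerms(True, False, True, True)
    if lid == "ODBL":
        return LicenseTerms(True, True, True, True)
    if lid.startswith("CC-BY"):
        elements = set(lid.split("-")[2:])  # e.g. {"NC", "SA"} for CC-BY-NC-SA
        return LicenseTerms(True, "SA" in elements, "NC" not in elements, "ND" not in elements)
    raise ValueError(f"not one of the listed open licenses: {license_id}")


print(terms_for("CC-BY-NC-ND"))
# LicenseTerms(attribution=True, share_alike=False, commercial_use=False, derivatives=False)
```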

We provide some further clarifications about the approach used to describe the Data interoperability and Standard vocabulary dimensions in EW-Shopp. Because of the sensitivity of the business data used in the EW-Shopp innovation action, no commitment to publish the datasets provided by business partners as open data is made in the [DoA]. Thus, the primary focus concerning interoperability in EW-Shopp is on supporting data integration tasks, rather than on supporting the discoverability of datasets by third parties.

For this reason, under Data interoperability, we will focus on the methodologies that will be adopted to support interoperability between the described dataset and other datasets. Here we briefly describe the interoperability methodologies that we plan to use, while more details will be provided in D3.1 – Interoperability Requirements, which will be published at M8.

o Publication as linked data (RDF-ization). Linked data represented with the RDF[41] language supports data interoperability by: i) representing information with graph-based abstractions, often referred to as Knowledge Graphs (descriptions of typed entities, their properties and mutual relations), ii) using global identifiers (URIs) for the entities described in a dataset, and iii) using terms (classes, properties, data types) from shared vocabularies and ontologies. Publishing a source dataset using linked data principles makes it easy to access and use the data for future integration tasks. This methodology is used in particular for EW-Shopp core data, i.e., data that are used as joints to integrate different information sources, such as product data or product classification schemes, that are not already available as linked data.

o N/A (Linked Open Data). For data that are already available as linked data, we consider the interoperability methodology not applicable.

o Semantic data enrichment. This is a key pillar of the EW-Shopp approach to supporting interoperability. Given an input dataset provided in a format different from RDF, and after applying suitable transformations if needed, the dataset will be semantically annotated using semantic labelling techniques. We assume that the input dataset is transformed into a table in CSV format; then i) the column headers will be aligned with shared vocabularies (e.g., XSD used to define the data types, or predicates of Schema.org[42] used to describe offers in eCommerce portals), while ii) the values will be linked to shared systems of identifiers (e.g., location identifiers from DBpedia). Annotations will support i) the enrichment of the data, using the shared systems of identifiers as joints, and ii) the publication of the data as Knowledge Graphs represented in RDF (if useful). For example, after linking a column representing product names to EAN codes, we can retrieve the brand of each product from a linked product data source, thus enriching the original dataset. Semantic data enrichment also provides a methodology to publish data that come in tabular format as linked data. However, such publication is not a mandatory step in semantic data enrichment.

[41] https://www.w3.org/RDF/

o References to shared systems of identifiers and standard data types. A data source is made interoperable by using shared systems of identifiers, without requiring a full RDF-ization. For example, we may want to invoke weather data APIs using DBpedia identifiers for locations.
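The semantic data enrichment steps described above (aligning CSV headers with shared vocabularies, linking values to shared identifiers, and optionally publishing the result as RDF) can be sketched in plain Python. This is a minimal illustration, not the EW-Shopp tooling: the column annotations are hypothetical, and a small lookup table stands in for a real semantic labelling service. Each row becomes a schema:Offer serialized as N-Triples; values that cannot be linked are skipped rather than guessed.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"
DBR = "http://dbpedia.org/resource/"

# Hypothetical column annotations: CSV header -> (predicate, XSD datatype or "entity").
COLUMN_ANNOTATIONS = {
    "product": (SCHEMA + "name", XSD + "string"),
    "price": (SCHEMA + "price", XSD + "decimal"),
    "city": (SCHEMA + "areaServed", "entity"),
}

# Hypothetical linker: a lookup table stands in for a real semantic labelling service.
CITY_LINKS = {"Ljubljana": DBR + "Ljubljana", "Milan": DBR + "Milan"}


def enrich(csv_text: str, base: str = "http://example.org/offer/") -> list[str]:
    """Annotate each CSV row as a schema:Offer and emit N-Triples lines."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        subject = f"<{base}{i}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SCHEMA}Offer> .")
        for header, value in row.items():
            predicate, kind = COLUMN_ANNOTATIONS[header]
            if kind == "entity":
                uri = CITY_LINKS.get(value)
                if uri is None:
                    continue  # value could not be linked; skip it rather than guess
                obj = f"<{uri}>"
            else:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"^^<{kind}>'
            triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


for line in enrich("product,price,city\nPhone X,199.90,Ljubljana\n"):
    print(line)
```

Once values carry shared identifiers such as the DBpedia URI for Ljubljana, they can act as joints to other sources, for instance weather data keyed by the same location identifiers.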

For Standard vocabulary, we refer to shared vocabularies, where “shared” refers to adoption by a community of users. Among shared vocabularies we consider ISO standards, e.g., ISO 8601[43] date formats; languages and vocabularies recommended by W3C[44], e.g., RDF or Time OWL2[45]; but also vocabularies and systems of identifiers that are becoming de-facto standards because of their usage, e.g., Schema.org, DBpedia, Wikipedia. We will consider the following shared vocabularies, which will be used in the project to support interoperability:

o Terminologies from language specifications
• Predicates, classes and data types specified in languages recommended by W3C (i.e., XSD Data Types[46], RDF, SKOS[47], RDFS[48], OWL[49]); these terms are used throughout the project, thus they will not be added to the descriptions of individual datasets.

o Classifications
• Interlinked product classifications. This classification will be built in EW-Shopp by linking Google Categories (from the Google product taxonomy), the Global Product Classification by GS1[1] and GfK product categories, i.e., the categories used in the GfK Product Catalog[2] (GS1 categories are derived from GfK categories and the two classifications are aligned).

o Domain ontologies and shared systems of identifiers
• Linked product data.

§ Schema-level terminology (e.g., Schema.org, GoodRelations[50])

[42] http://schema.org/
[43] https://www.iso.org/iso-8601-date-and-time-format.html
[44] https://www.w3.org/
[45] https://www.w3.org/TR/owl-time/
[46] https://www.w3.org/TR/xmlschema11-1/
[47] https://www.w3.org/2004/02/skos/
[48] https://www.w3.org/TR/rdf-schema/
[49] https://www.w3.org/TR/owl-features/
[50] http://purl.org/goodrelations/

§ Schema-level terminology and identifiers (GfK Product Catalog for retail, with internal identifiers partially aligned to EAN codes[3])

• Temporal ontologies. Standard vocabularies, and other vocabularies and ontologies recommended by W3C, to represent temporal information (e.g., ISO 8601, XSD Date and Time Data Types, Time OWL2).

• Spatial ontologies and locations. Ontologies covering spatial schema-level terminology as well as identifiers of locations and administrative units across Europe (e.g., Basic Geo WGS84[51], DBpedia Ontology[52], Schema.org, Geonames Ontology[53], LinkedGeoData[54], Linked OpenStreetMaps[55]).

• Wikipedia entities. Wikipedia provides identifiers for a very large number and variety of entities described in Wikipedia, which are adopted by a very large community of data providers and consumers. By Wikipedia entities, we also refer to identifiers used in data sources derived from Wikipedia (e.g., DBpedia) or linked to Wikipedia identifiers (e.g., Wikidata[56]). While location identifiers play a prominent role in EW-Shopp and are covered by spatial ontologies and locations, here we refer to entities of other types, used, e.g., to annotate events.
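As an example of the temporal vocabularies listed above, timestamps from source systems can be normalised to ISO 8601 and typed as xsd:dateTime literals. The sketch below assumes the source timestamps are in UTC and that their format string is known in advance (both are assumptions; the real time zone of each dataset must be checked). The second call shows a day-first format of the kind found in the dataset descriptions.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def to_xsd_datetime(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Parse a timestamp string (assumed to be UTC) into an xsd:dateTime literal."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return f'"{parsed.isoformat()}"^^<{XSD_DATETIME}>'


print(to_xsd_datetime("2015-01-01 08:30:00"))
# "2015-01-01T08:30:00+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
print(to_xsd_datetime("1. 1. 2015", fmt="%d. %m. %Y"))
# "2015-01-01T00:00:00+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
```

Making the time zone explicit in the literal avoids ambiguity when records from several countries are joined on time.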

5.1.5 Data SECURITY

Following H2020 guidelines, a set of relevant information has been defined to help describe dataset security:

• Personal Data (Y|N): confirmation of whether personal data is present in the dataset.
• Anonymized (Y|N|NA): confirmation of whether personal data is anonymized.
• Data recovery and secure storage: information about how data recovery and secure storage are managed.
• Privacy management procedures: specification of the procedures adopted to manage privacy.
• PD At The Source (Y|N): confirmation of whether personal data is present at the source.
• PD - Anonymised during project (Y|N): confirmation of whether personal data is anonymised during the project.
• PD - Anonymised before project (Y|N): confirmation of whether personal data was anonymised before the project.
• Level of Aggregation (for PD anonymized by aggregation): indication of the level of aggregation used to anonymize personal data.
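The Level of Aggregation field refers to anonymization by aggregation. A common way to apply it is to count records per group at the chosen level and suppress groups that are too small to hide individuals. The sketch below illustrates that technique; the threshold k = 5 and the record fields are assumptions, not a procedure prescribed by this plan.

```python
from collections import Counter


def aggregate_with_suppression(records, keys, k=5):
    """Count records per combination of `keys`, dropping groups with fewer than k records.

    Only the grouping keys survive in the output, so user-level identifiers are discarded.
    """
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {group: n for group, n in counts.items() if n >= k}


# Hypothetical records: pseudonymous ids plus location and hour of observation.
records = [{"user": f"u{i}", "location": "A", "hour": 10} for i in range(6)]
records += [{"user": f"v{i}", "location": "B", "hour": 10} for i in range(2)]
print(aggregate_with_suppression(records, keys=("location", "hour")))  # {('A', 10): 6}
```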

[51] https://www.w3.org/2003/01/geo/
[52] http://dbpedia.org/ontology/
[53] http://www.geonames.org/ontology/documentation.html
[54] http://linkedgeodata.org/ontology
[55] Auer, Sören, Jens Lehmann, and Sebastian Hellmann. "LinkedGeoData: Adding a spatial dimension to the web of data." The Semantic Web - ISWC 2009 (2009): 731-746.
[56] https://www.wikidata.org

5.2 Process to collect dataset details

All the information described in the previous paragraphs was collected for the EW-Shopp datasets through the process described below.

The first step was to set up a table with the main sections of the dataset description: Dataset Identification, Dataset Origin, Dataset Format, Dataset Access and Dataset Security. Each of these sections was further decomposed to contain all the information described in the related paragraphs of this Chapter 5.

The second step consisted in preparing a sort of survey in the form of a textual description (see Annex A – DMP Survey), with the aim of giving a clear understanding of all the required information and easing the completion of the table.

The third step was a collection process, in which each business case owner filled in the table and was then interviewed by a technical partner to discuss the information provided.

At the end of the process, all the information collected was merged into an integrated spreadsheet. The same information is discussed in the following chapter using a table format, in order to ease the understanding of each dataset description.

Chapter 6 Dataset description

The aim of this chapter is to provide, for each dataset, a description that answers all the information requirements listed in Chapter 5, in accordance with the Guidelines on FAIR Data Management in Horizon 2020 and with ethics and legal requirements. As the following paragraphs show, a “dataset” refers to an individual dataset but also to a family of datasets with the same structure, created at different moments in time or under other discriminating conditions.

6.1 CE Dataset - Consumer Data: Purchase Intent

6.1.1 Dataset IDENTIFICATION

The dataset “Purchase Intent” is proprietary and contains user journey metrics and logs.

Table 5. DATASET IDENTIFICATION – Purchase Intent

Category: Consumer data
Data name: Purchase intent
Description: A collection of user journey data (page views, search terms, redirects to sellers and similar). Data is logged to local databases, and we provide data from 1 January 2015. Local databases consist of SQL databases and NoSQL databases.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1

6.1.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. The dataset already existed.

Table 6. DATASET ORIGIN – Purchase Intent

Available at (M): M1
Core Data (Y|N): N
Size: Local: 65 million page views; 17 million deep links
Growth: 300,000 page views per day; 15,000 searches per day; 25,000 redirects per day
Type and format: structured documents, TSV
Existing data (Y|N): Y
Data origin: SQL tables; NoSQL documents

6.1.3 Dataset FORMAT

The dataset is in TSV (SQL) or JSON (NoSQL) format; the data structure is illustrated in the following table. The data is not language specific, has been collected since 2015, and covers information at country level. The data is updated daily, which means that each day’s dataset contains only the newly generated data.

Table 7. DATASET FORMAT – Purchase Intent

Dataset structure:
*SQL tables*
Product page views
- IdProduct (INT)
- NameProduct (STRING)
- L1 (STRING): Level 1 category
- L2 (STRING): Level 2 category
- L3 (STRING): Level 3 category
- IdUsers (INT)
- Date (DATETIME)
Product deep links (redirects to sellers)
- IdProduct (INT)
- NameProduct (STRING)
- L1 (STRING): Level 1 category
- L2 (STRING): Level 2 category
- L3 (STRING): Level 3 category
- IdUsers (INT)
- IdSeller (INT)
- Date (DATETIME)
*NoSQL documents*
Page search
{
  "_id": (ObjectId),
  "IdUsers": (INT),
  "TimeStamp": (ISODate),
  "Search": {
    "NumberOfResults": (INT),
    "Query": (STRING)
  }
}

Dataset format: SQL: TSV; NoSQL: JSON

Time coverage: since 2015
Spatial coverage: Country
Languages: not language specific
Identifiability of data: Yes
Naming convention: /{country}/YYYY/MM/DD.tsv
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
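The structure in Table 7 can be read into typed records with the standard library. The sketch below rests on assumptions not stated in the table: the page view TSV files have no header row and list the columns in the order shown, the Date column uses the format YYYY-MM-DD HH:MM:SS, and the NoSQL export stores ObjectId and ISODate values as plain JSON strings (MongoDB exports may instead use extended JSON). The sample rows are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime

# Column order assumed from the "Product page views" table; the files may differ.
PAGEVIEW_COLUMNS = ["IdProduct", "NameProduct", "L1", "L2", "L3", "IdUsers", "Date"]


@dataclass
class PageView:
    id_product: int
    name_product: str
    categories: tuple  # (L1, L2, L3) category path
    id_user: int
    date: datetime


def read_pageviews(tsv_text: str) -> list:
    """Parse header-less TSV rows into typed PageView records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    views = []
    for row in reader:
        rec = dict(zip(PAGEVIEW_COLUMNS, row))
        views.append(PageView(
            id_product=int(rec["IdProduct"]),
            name_product=rec["NameProduct"],
            categories=(rec["L1"], rec["L2"], rec["L3"]),
            id_user=int(rec["IdUsers"]),
            date=datetime.strptime(rec["Date"], "%Y-%m-%d %H:%M:%S"),  # assumed format
        ))
    return views


def read_search(doc_text: str) -> tuple:
    """Extract (IdUsers, Query, NumberOfResults) from one "Page search" document."""
    doc = json.loads(doc_text)
    return doc["IdUsers"], doc["Search"]["Query"], doc["Search"]["NumberOfResults"]


views = read_pageviews("101\tPhone X\tElectronics\tPhones\tSmartphones\t42\t2015-01-01 10:15:00\n")
print(views[0].categories)  # ('Electronics', 'Phones', 'Smartphones')
search = read_search('{"_id": "1", "IdUsers": 42, "TimeStamp": "2015-01-01T10:15:30Z",'
                     ' "Search": {"NumberOfResults": 17, "Query": "phone x"}}')
print(search)  # (42, 'phone x', 17)
```

Keeping IdUsers as an opaque integer is consistent with the anonymization at the source described in the security section; the parser does nothing to re-identify users.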

6.1.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through file download by means of WGET/Curl. The dataset will be deposited on AWS or on the Ceneje static content server, and access is provided through credentials.

Table 8. MAKING DATA ACCESSIBLE – Purchase Intent

Dataset license: Owner: Ceneje; Access: all members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: File download (zip)
Tools to access: WGET/Curl
Dataset source URL: AWS or Ceneje static content server
Access restrictions: Credentials
Keyword/Tags: N/A
Archiving and preservation: NO (can be generated on demand)
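The credential-protected file download in Table 8 can also be scripted in Python instead of calling WGET/Curl. The sketch below only builds the requests: the base URL is a placeholder, HTTP Basic authentication is an assumption (the table only says "Credentials"), and it uses the .tsv path from the naming convention in Table 7, although the table lists zip archives as the delivery format.

```python
import base64
import urllib.request
from datetime import date, timedelta


def daily_requests(base_url, country, start, end, user, password):
    """Build one authenticated GET request per daily file from start to end (inclusive)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    reqs = []
    day = start
    while day <= end:
        url = f"{base_url.rstrip('/')}/{country}/{day:%Y/%m/%d}.tsv"
        reqs.append(urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}))
        day += timedelta(days=1)
    return reqs


# Hypothetical server and credentials.
reqs = daily_requests("https://example.org/purchase-intent/", "si",
                      date(2017, 6, 1), date(2017, 6, 3), "partner", "secret")
print(reqs[0].full_url)  # https://example.org/purchase-intent/si/2017/06/01.tsv
# To fetch a file: with urllib.request.urlopen(reqs[0]) as resp: data = resp.read()
```

Because each daily file holds only that day's data, a partner can resume an interrupted download by regenerating the requests for the missing dates only.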

Table 9. MAKING DATA INTEROPERABLE – Purchase Intent

Data interoperability: Semantic data enrichment
Standard vocabulary: Interlinked product classification; Linked product data; Temporal ontologies
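Table 9 pairs semantic data enrichment with the interlinked product classification, and the L1/L2/L3 category columns of Table 7 are the natural join keys. The sketch below shows one way to link a category path to a shared classification, falling back to the parent level when no exact match exists. The crosswalk entries and the "shared:" identifiers are hypothetical placeholders; the real crosswalk is the one to be built in EW-Shopp from Google, GS1 and GfK categories.

```python
# Hypothetical crosswalk from (L1, L2, L3) category paths to shared-classification ids.
CROSSWALK = {
    ("Electronics", "Phones", "Smartphones"): "shared:smartphones",
    ("Electronics", "Phones"): "shared:phones",
    ("Electronics",): "shared:electronics",
}


def link_category(l1: str, l2: str, l3: str):
    """Return (shared id, matched depth), falling back to parent levels when needed."""
    path = (l1, l2, l3)
    for depth in (3, 2, 1):
        shared_id = CROSSWALK.get(path[:depth])
        if shared_id is not None:
            return shared_id, depth
    return None, 0


print(link_category("Electronics", "Phones", "Smartphones"))     # ('shared:smartphones', 3)
print(link_category("Electronics", "Phones", "Feature phones"))  # ('shared:phones', 2)
print(link_category("Garden", "Tools", "Spades"))                # (None, 0)
```

Returning the matched depth lets downstream analyses tell exact links from coarser fallbacks.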

6.1.5 Dataset SECURITY

The dataset does not contain personal data, because these were anonymized before being used in the project. Secure storage and regular backups are expected.

Table 10. DATASET SECURITY – Purchase Intent

Personal Data (Y|N): N
Anonymized (Y|N|NA): Y
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y|N): Y
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): Y
Level of Aggregation (for PD anonymized by aggregation): UserId level (anonymous)

6.1.6 Ethics and Legal requirements

The source of the data contains PD, but the data are anonymized before the project and shared within the project without PD. Since Ceneje has already notified its Data Protection Officer (DPO) that no PD will be shared, it does not need to obtain an additional opinion. The notification to the Data Protection Officer is included in deliverable [D7.2].

There are no ethical issues that could have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation of free association, etc.

6.2 ME Dataset - Consumer Data: Location analytics data (Hourly)

6.2.1 Dataset IDENTIFICATION

The dataset “Location analytics data”, provided by Measurence, focuses on the hourly number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors.

Table 11. DATASET IDENTIFICATION – Location analytics data

Category: Consumer Data
Data name: Location analytics data
Description: Hourly number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3

6.2.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. It is delivered in JSON format through APIs, with a size of ~600 GB and a growth of ~5 GB per location per month. The dataset already existed before the project.

Table 12. DATASET ORIGIN – Location analytics data

Available at (M): M1
Core Data (Y|N): N
Size: ~600 GB
Growth: ~5 GB / location / month
Type and format: APIs - JSON format
Existing data (Y|N): Y
Data origin: Proprietary sensors

6.2.3 Dataset FORMAT

The dataset is available in JSON and CSV format. It contains numerical data gathered since 2015 and covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, meaning that each day's delivery contains only newly generated data.

Table 13 DATASET FORMAT – Location analytics data

Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/MM/DD/HH
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
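As a minimal sketch of the naming convention in Table 13, the Python below builds and parses the /location_id/YYYY/MM/DD/HH path for one hourly slice. The functions hourly_path and parse_hourly_path are hypothetical helpers, and it is assumed that location identifiers contain no "/" characters.

```python
from datetime import datetime
from typing import Tuple


def hourly_path(location_id: str, ts: datetime) -> str:
    """Build /location_id/YYYY/MM/DD/HH for the hour that contains ts."""
    # datetime accepts strftime codes directly inside f-string format specs.
    return f"/{location_id}/{ts:%Y}/{ts:%m}/{ts:%d}/{ts:%H}"


def parse_hourly_path(path: str) -> Tuple[str, datetime]:
    """Recover the location identifier and the start of the hour from a path."""
    # Assumes location_id itself contains no "/" characters.
    location_id, year, month, day, hour = path.strip("/").split("/")
    return location_id, datetime(int(year), int(month), int(day), int(hour))


path = hourly_path("loc42", datetime(2017, 3, 5, 9, 30))
print(path)                     # /loc42/2017/03/05/09
print(parse_hourly_path(path))  # ('loc42', datetime.datetime(2017, 3, 5, 9, 0))
```

Zero-padding month, day and hour keeps the lexicographic order of paths equal to their chronological order, which is convenient when listing an archive.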

6.2.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through an API over an authenticated, encrypted channel.

Table 14 MAKING DATA ACCESSIBLE – Location analytics data

Dataset license: Owner: ME. Access: members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: API endpoint
Access restrictions: Credentials / API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the latest version of the algorithm
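The access model in Table 14 (API, credentials or API keys, authenticated encrypted channel) could be exercised from Python roughly as follows. The endpoint URL, the query parameter names and the Bearer authorization scheme are all hypothetical, since the plan does not publish the API; the sketch only builds the request and refuses non-HTTPS endpoints.

```python
import urllib.request
from urllib.parse import urlencode, urlparse


def build_presence_request(base_url: str, location_id: str, day: str,
                           api_key: str) -> urllib.request.Request:
    """Prepare an authenticated request for one location and day (hypothetical API)."""
    # The plan requires an encrypted channel, so plain http:// is rejected.
    if urlparse(base_url).scheme != "https":
        raise ValueError("an https:// endpoint is required")
    query = urlencode({"location_id": location_id, "date": day})
    request = urllib.request.Request(f"{base_url}?{query}")
    # Hypothetical credential scheme; the real API may expect something else.
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


req = build_presence_request("https://api.example.com/v1/presence",
                             "loc42", "2017-03-05", "MY_API_KEY")
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=30)
```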

Table 15 MAKING DATA INTEROPERABLE – Location analytics data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.2.5 Dataset SECURITY

The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.

Table 16 DATASET SECURITY - Location analytics data

Personal Data (Y|N): N
Anonymized (Y|N|NA): Y, prior to storing data in a database (no PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All data are anonymized before storage (see paragraph 6.2.6)
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.2.6 Ethics and Legal requirements

The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with SHA-256, the 256-bit member of the SHA-2 family of cryptographic hash functions⁵⁷ designed by the United States National Security Agency (NSA). Measurence followed a privacy-by-design approach: after hashing, the hashed MAC address is sent to Measurence's servers and the original MAC address is discarded directly by the sensor, so the real MAC address is never stored on the servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address other than by attempting a brute-force attack (which applies to any cryptographic hash function). Based on the above description, this dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

57 https://simple.wikipedia.org/wiki/Cryptographic_hash_function
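The sensor-side step described above can be sketched in Python with the standard hashlib module. This is not Measurence's code: the MAC normalisation rule and the record layout are assumptions, and the plan does not say whether a salt or secret key is mixed into the hash, so plain SHA-256 is used here as literally described.

```python
import hashlib


def hash_mac(mac: str) -> str:
    """Return the SHA-256 hex digest of a normalised MAC address."""
    # Assumed normalisation: lower case, colon-separated, so that
    # "AA-BB-CC-DD-EE-FF" and "aa:bb:cc:dd:ee:ff" hash to the same value.
    normalised = mac.strip().lower().replace("-", ":")
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def sensor_record(raw_mac: str) -> dict:
    """Build the record a sensor would send; the raw MAC is not part of it."""
    # Hypothetical record layout: only the digest leaves the sensor.
    return {"device": hash_mac(raw_mac)}


record = sensor_record("AA-BB-CC-DD-EE-FF")
print(len(record["device"]))  # 64 hex characters = 256 bits
```

Because the digest is deterministic, the same device yields the same value on every sighting, which is what allows devices to be counted without storing the original address.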

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.3 ME Dataset - Consumer Data: Location analytics data (Daily)

6.3.1 Dataset IDENTIFICATION

The dataset “Location analytics data”, provided by Measurence, describes the daily number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors.

Table 17. DATASET IDENTIFICATION – Location analytics data

Category: Consumer Data
Data name: Location analytics data
Description: Daily number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3

6.3.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. It is delivered in JSON format through APIs, with a size of ~600 GB and a growth of ~5 GB per location per month. The dataset already existed before the project.

Table 18. DATASET ORIGIN – Location analytics data

Available at (M): M1
Core Data (Y|N): N
Size: ~600 GB
Growth: ~5 GB / location / month
Type and format: APIs - JSON format
Existing data (Y|N): Y
Data origin: Proprietary sensors

6.3.3 Dataset FORMAT

The dataset is available in JSON and CSV format. It contains numerical data gathered starting from 2015 and covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, meaning that each day's delivery contains only newly generated data.

Table 19 DATASET FORMAT – Location analytics data

Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/MM/DD/
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
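Assuming that the hourly and daily figures are counts of distinct hashed devices, a daily value cannot be obtained by summing the hourly values, because a device seen in several hours would be counted several times. The Python sketch below, with a hypothetical (timestamp, device_hash) input, computes both granularities from the same sightings.

```python
from collections import defaultdict
from datetime import datetime


def distinct_devices(sightings, bucket_format: str) -> dict:
    """Count distinct device hashes per time bucket (bucket given as a strftime pattern)."""
    buckets = defaultdict(set)
    for ts, device_hash in sightings:
        buckets[ts.strftime(bucket_format)].add(device_hash)
    return {bucket: len(devices) for bucket, devices in sorted(buckets.items())}


# Hypothetical sightings: device "h1" is seen in two different hours.
sightings = [
    (datetime(2017, 3, 5, 9, 10), "h1"),
    (datetime(2017, 3, 5, 9, 40), "h1"),
    (datetime(2017, 3, 5, 10, 5), "h1"),
    (datetime(2017, 3, 5, 10, 20), "h2"),
]
hourly = distinct_devices(sightings, "%Y-%m-%d %H")
daily = distinct_devices(sightings, "%Y-%m-%d")
print(hourly)  # {'2017-03-05 09': 1, '2017-03-05 10': 2}
print(daily)   # {'2017-03-05': 2}, not 1 + 2 = 3
```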

6.3.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through an API over an authenticated, encrypted channel.

Table 20 MAKING DATA ACCESSIBLE – Location analytics data

Dataset license: Owner: ME. Access: members
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: TBD / API endpoint
Access restrictions: Credentials / API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the latest version of the algorithm

Table 21 MAKING DATA INTEROPERABLE – Location analytics data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.3.5 Dataset SECURITY

The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.

Table 22 DATASET SECURITY - Location analytics data

Personal Data (Y|N): N
Anonymized (Y|N|NA): Y, prior to storing data in a database (no PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All data are anonymized before storage (see paragraph 6.3.6)
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.3.6 Ethics and Legal requirements

The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with SHA-256, the 256-bit member of the SHA-2 family of cryptographic hash functions designed by the United States National Security Agency (NSA). Measurence followed a privacy-by-design approach: after hashing, the hashed MAC address is sent to Measurence's servers and the original MAC address is discarded directly by the sensor, so the real MAC address is never stored on the servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address other than by attempting a brute-force attack (which applies to any cryptographic hash function). Based on the above description, this dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.4 BB Dataset - Consumer Data: Customer Purchase History

6.4.1 Dataset IDENTIFICATION

The dataset “Customer purchase history” is proprietary and contains data on customers and their purchases.

Table 23. DATASET IDENTIFICATION – Customer Purchase History

Category: Consumer data
Data name: Customer purchase history
Description: Sell-out data matched with customer baskets in a defined timeframe.
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1

6.4.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. It has a size of 29,000 products and grows by 2,000 new products per year. The dataset already existed before the project.

Table 24 DATASET ORIGIN – Customer Purchase History

Available at (M): M1
Core Data (Y|N): N
Size: 29,000 products
Growth: 2,000 new products per year
Type and format: structured tabular data
Existing data (Y|N): Y
Data origin: Google Analytics, DWH (SQL tables), Excel structured data

6.4.3 Dataset FORMAT

The dataset is in CSV/XLS format. It contains data gathered since 2013 and covers information either in total or per store location (18 stores + web). The data is updated daily and contains both newly generated data and history.

Table 25 DATASET FORMAT – Customer Purchase History

Dataset structure: BB Classification (can be matched with GPC Classification); purchase data table structured (SQL)
Dataset format: CSV/XLS
Time coverage: since 2013
Spatial coverage: Total or per store location (18 stores + web)
Languages: Slovenian
Identifiability of data: Yes
Naming convention: /{country}/companyname/purchaseid.json
Versioning: daily (new + history)
Metadata standards: Google Analytics

6.4.4 Dataset ACCESS

The dataset is public, but access requires a username and password. The data will be made available for download.

Table 26 MAKING DATA ACCESSIBLE – Customer Purchase History

Dataset license: Admin - Full User (Owner). Access: all members through password and username
Availability (public|private): public (password, username restricted)
Availability to EW-Shopp partners (Y|N): Y
Availability method: Download, view, edit (based on license)
Tools to access: Accessible on web
Dataset source URL: BB virtual server
Access restrictions: Credentials
Keyword/Tags: OrderId, ProductId, StoreId, …. Same as the Sample
Archiving and preservation: Can be generated on demand
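Since the Keyword/Tags in Table 26 name OrderId, ProductId and StoreId, a consumer of the CSV export might rebuild customer baskets as sketched below. The column names are taken from those tags, while the comma delimiter, header row and sample values are assumptions for illustration.

```python
import csv
import io


def build_baskets(csv_text: str) -> dict:
    """Group purchase rows into one basket per OrderId, remembering the store."""
    baskets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # If one OrderId appeared in several stores, only the first StoreId is kept.
        basket = baskets.setdefault(
            row["OrderId"], {"store": row["StoreId"], "products": []}
        )
        basket["products"].append(row["ProductId"])
    return baskets


sample = "OrderId,ProductId,StoreId\n1001,P1,web\n1001,P2,web\n1002,P3,S01\n"
for order_id, basket in build_baskets(sample).items():
    print(order_id, basket["store"], basket["products"])
```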

Table 27 MAKING DATA INTEROPERABLE – Customer Purchase History

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
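To illustrate the RDF-ization listed under data interoperability in Table 27, the sketch below serialises one basket as N-Triples using schema.org terms (Order, orderNumber, orderDate, orderedItem). The base IRI http://example.org/ewshopp/ is a placeholder, and the choice of schema.org is an assumption rather than the vocabulary selected by the project.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
SCHEMA = "http://schema.org/"
BASE = "http://example.org/ewshopp/"  # placeholder namespace


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def order_to_ntriples(order_id: str, order_date: str, product_ids: list) -> str:
    """Serialise one order as N-Triples (IDs assumed IRI-safe, e.g. alphanumeric)."""
    subject = f"<{BASE}order/{order_id}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}Order> .",
        f"{subject} <{SCHEMA}orderNumber> {literal(order_id)} .",
        f"{subject} <{SCHEMA}orderDate> {literal(order_date)}^^<{XSD_DATE}> .",
    ]
    for product_id in product_ids:
        triples.append(f"{subject} <{SCHEMA}orderedItem> <{BASE}product/{product_id}> .")
    return "\n".join(triples)


print(order_to_ntriples("1001", "2017-03-05", ["P1", "P2"]))
```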

6.4.5 Dataset SECURITY

The dataset does not contain personal data because these data were anonymized at the source. Secure storage and constant download options are expected.

Table 28 DATASET SECURITY – Customer Purchase History

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, constant download options
Privacy management procedures: Personal data will not be processed during the project. All data are returned by an analytics engine that will not provide PD.
PD at the source (Y|N): Y
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.4.6 Ethics and Legal requirements

The source of the data contains PD, but the data are anonymized before the project and shared within the project without PD. Since Big Bang has already notified its Data Protection Officer (DPO) that no PD will be shared, no additional opinion is required. The notification to the Data Protection Officer is included in deliverable [D7.2].

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.5 BB Dataset - Consumer Data: Consumer Intent and Interaction

6.5.1 Dataset IDENTIFICATION

The dataset “Consumer intent and interaction” is proprietary and contains data on customer journeys recorded using Google Analytics.

Table 29. DATASET IDENTIFICATION – Consumer Intent and Interaction

Category: Consumer data
Data name: Consumer intent and interaction
Description: A collection of user journey data from Google Analytics (page views, page events, search terms, redirects to channels, etc.). Data has been recorded since December 2012.
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1

6.5.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. The dataset already existed before the project.

Table 30 DATASET ORIGIN - Consumer Intent and Interaction

Available at (M): M1
Core Data (Y|N): N
Size: 130 million page views, 20 million sessions, 8 million users, 70,000 transactions (since December 2012)
Growth: 10,000 users per day
Type and format: numeric
Existing data (Y|N): Y
Data origin: Google Analytics

6.5.3 Dataset FORMAT

The dataset is in CSV format. It contains data gathered since 2013 and covers the whole world. The data is updated daily and contains both newly generated data and history.

Table 31 DATASET FORMAT – Consumer Intent and Interaction

Dataset structure: Google Analytics specified
Dataset format: CSV, XLS
Time coverage: since 2013
Spatial coverage: Global
Languages: not language specific
Identifiability of data: Yes
Naming convention: N/A
Versioning: daily (new + history)
Metadata standards: Google Analytics

6.5.4 Dataset ACCESS

The dataset is public, but access requires a username and password. The data will be made available for download.

Table 32 MAKING DATA ACCESSIBLE – Consumer Intent and Interaction

Dataset license: Admin - Full User (Owner). Access: all members through password and username
Availability (public|private): public (password, username restricted)
Availability to EW-Shopp partners (Y|N): Y
Availability method: Download, view, edit (based on license)
Tools to access: N/A
Dataset source URL: BB virtual server
Access restrictions: Credentials
Keyword/Tags: Google search tags
Archiving and preservation: N/A because data is used only for analytical purposes

Table 33 MAKING DATA INTEROPERABLE – Consumer Intent and Interaction

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies

6.5.5 Dataset SECURITY

The dataset does not contain personal data. Secure storage and constant download options are expected.

Table 34 DATASET SECURITY – Consumer Intent and Interaction

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, backup
Privacy management procedures: Google Analytics data only, so no PD is included. In this case the data is at the level of product/categories/page.
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.5.6 Ethics and Legal requirements

Based on the above dataset description, the dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.6 ME Dataset - Consumer Data: Location analytics data (Weekly)

6.6.1 Dataset IDENTIFICATION

The dataset “Location analytics data”, provided by Measurence, describes the weekly number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors.

Table 35. DATASET IDENTIFICATION – Location analytics data

Category: Consumer Data
Data name: Location analytics data
Description: Weekly number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3

6.6.2 Dataset ORIGIN

This dataset is available from January 2017 and cannot be defined as “core data”. It is delivered in JSON format through APIs, with a size of ~600 GB and a growth of ~5 GB per location per month. The dataset already existed before the project.

Table 36. Dataset ORIGIN – Location analytics data

Available at (M): M1
Core Data (Y|N): N
Size: ~600 GB
Growth: ~5 GB / location / month
Type and format: APIs - JSON format
Existing data (Y|N): Y
Data origin: Proprietary sensors

6.6.3 Dataset FORMAT

The dataset is available in JSON and CSV format. It contains numerical data gathered starting from 2015 and covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, meaning that each day's delivery contains only newly generated data.

Table 37 DATASET FORMAT – Location analytics data

Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/weeknum
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
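For the weekly naming convention /location_id/YYYY/weeknum, the plan does not define which week numbering is used. The sketch below assumes ISO 8601 weeks, in which case YYYY must be the ISO week-year: 1 January 2017, a Sunday, belongs to week 52 of 2016. The helper weekly_path is hypothetical, and two-digit padding of the week number is also an assumption.

```python
from datetime import date


def weekly_path(location_id: str, day: date) -> str:
    """Build /location_id/YYYY/weeknum using the ISO 8601 week-year and week number."""
    iso_year, iso_week, _weekday = day.isocalendar()
    # Using day.year here instead would file 2017-01-01 under 2017, week 52.
    return f"/{location_id}/{iso_year}/{iso_week:02d}"


print(weekly_path("loc42", date(2017, 1, 1)))  # /loc42/2016/52
print(weekly_path("loc42", date(2017, 1, 2)))  # /loc42/2017/01
```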

6.6.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through an API over an authenticated, encrypted channel.

Table 38 MAKING DATA ACCESSIBLE – Location analytics data

Dataset license: Owner: ME. Access: members
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: TBD / API endpoint
Access restrictions: Credentials / API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the latest version of the algorithm

Table 39 MAKING DATA INTEROPERABLE – Location analytics data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.6.5 Dataset SECURITY

The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.

Table 40 DATASET SECURITY - Location analytics data

Personal Data (Y|N): N
Anonymized (Y|N|NA): Y, prior to storing data in a database (no PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All data are anonymised before storage (see paragraph 6.6.6)
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.6.6 Ethics and Legal requirements

The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with SHA-256, the 256-bit member of the SHA-2 family of cryptographic hash functions designed by the United States National Security Agency (NSA). Measurence followed a privacy-by-design approach: after hashing, the hashed MAC address is sent to Measurence's servers and the original MAC address is discarded directly by the sensor, so the real MAC address is never stored on the servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address other than by attempting a brute-force attack (which applies to any cryptographic hash function). Based on the above description, this dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.7 BT Dataset - Customer Communication Data: Contact and Consumer Interaction History

6.7.1 Dataset IDENTIFICATION

The dataset “Contact and Consumer Interaction history” is proprietary and contains data on communications with customers.

Table 41. DATASET IDENTIFICATION – Contact and Consumer Interaction History

Category: Customer Communication Data
Data name: Contact and Consumer Interaction History
Description: The dataset contains the following data:
• calls
  o every outbound call, successful or not (every attempt counts)
  o every inbound call, successful or not
  o every simulated call
• other contact events
  o every inbound email, SMS, click-through, fax, scan, or any other document
  o every outbound email, SMS, fax, or any other sent document
• other events
  o a record of an agent's time spent waiting for a contact
  o a record of every time an agent logs in or out
  o a record of every time an agent joins or leaves a campaign
  o a record of every CCServer (CDE COCOS CEP Contact Center Server) startup or shutdown
Using this data, it is possible to create statistics and reports on telephony and on the performance of single agents, groups of agents, campaigns and the call center. Nearly all the reports provided by CCServer are made from this table. Although this table is not meant to serve as a basis for content-related reports (i.e., interview statistics), some fields in the table may be used for this kind of report as well.
The data are either generated by the CCServer system or collected from the contact signaling (protocol). They are intended for handling Customer Engagement Platform (CEP) campaigns; they are already used for this purpose and are intended for the same purposes in the future. The existing data carries all information about realized connection types and services and will be reused and upgraded with new communication channels, trends and services.
Provider: Browsetel / CDE
Contact Person: Matej Žvan, Aleš Štor
Business Cases number: BC1

6.7.2 Dataset ORIGIN

This dataset is available from March 2017 and can be defined as “core data”. Its size is 5-20 GB, with a growth of 5-20 GB/year. The dataset already existed before the project.

Table 42 DATASET ORIGIN – Contact and Consumer Interaction History

Available at (M): M3
Core Data (Y|N): Y
Size: 5-20 GB
Growth: 5-20 GB/year
Type and format: Current format is SQL, target format CSV UTF-8 text file (compressed)
Existing data (Y|N): Y
Data origin: Contact center and Customer Interaction Management data

6.7.3 Dataset FORMAT

The dataset is in CSV UTF-8 format. It covers the Slovenian area and is in English. The data is updated monthly.

Table 43 DATASET FORMAT – Contact and Consumer Interaction History

Dataset structure: RAW data. Optimized data from the system “Call History” table and history from Customer Interaction Management. Records describing contacts can be described by additional information records. Fields: EVENTID, CAMPAIGN, RESULT_CCS, RESULT_CODE, CALL_PRIORITY, ATTEMPT_NR, MANUAL_MODE, CCS_ENDSTATE, COST, CONTACT_COUNT, FOR_APPOINTMENT, CALL_TYPE, CALL_DIRECTION, DISC_CAUSE, DISC_CAUSE_DESC, QUEUE_SIZE, ALL_QUEUE_SIZE, DISC_BY_CUSTOMER, CUSTOM_DATA, CALLED_NUMBER, VRU_NUMBER, TRANSFERS, REJECTS, IGNORES, ..., CALL_REASON, EVENT_SERVICE_ORIGIN, EVENT_ORIGIN, EVENT_TYPE, EVENT_DATE, EVENT_LOCATION, MEDIA_TYPE, TOTAL_TIME, CONVERSATION_TIME
Dataset format: CSV UTF-8
Time coverage: 1 year (at the start), updated during the project duration
Spatial coverage: Slovenia
Languages: English
Identifiability of data: Persistent and unique identifiers are used, e.g. EVENT_ID, CAMPAIGN_ID, CHANNEL_ID…
Naming convention: Not used
Versioning: Monthly
Metadata standards: Proprietary solution in the form of relational tables
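As an example of the campaign-level reports that the Call History data supports, the sketch below averages CONVERSATION_TIME per CAMPAIGN from a CSV export. The column names come from Table 43; the comma delimiter, seconds as the time unit and the sample values are assumptions. Attempts with no conversation are skipped, since every call attempt is recorded whether or not it succeeded.

```python
import csv
import io
from collections import defaultdict


def average_conversation_time(csv_text: str) -> dict:
    """Average CONVERSATION_TIME per CAMPAIGN over calls that had a conversation."""
    totals = defaultdict(lambda: [0.0, 0])  # campaign -> [sum of seconds, call count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        seconds = float(row["CONVERSATION_TIME"] or 0)
        if seconds > 0:  # unsuccessful attempts are logged too; leave them out
            totals[row["CAMPAIGN"]][0] += seconds
            totals[row["CAMPAIGN"]][1] += 1
    return {campaign: total / count for campaign, (total, count) in sorted(totals.items())}


sample = (
    "EVENTID,CAMPAIGN,CALL_DIRECTION,CONVERSATION_TIME\n"
    "1,C1,OUT,120\n"
    "2,C1,OUT,0\n"
    "3,C1,OUT,60\n"
    "4,C2,IN,30\n"
)
print(average_conversation_time(sample))  # {'C1': 90.0, 'C2': 30.0}
```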

6.7.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available from Secure FTP as compressed CSV UTF-8.

Table 44 MAKING DATA ACCESSIBLE – Contact and Consumer Interaction History

Dataset license: No licensing for the duration of the EW-Shopp project. Access via ACL is enabled for all partners in the consortium
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: Data available from Secure FTP in compressed CSV UTF-8
Tools to access: Secure FTP client
Dataset source URL: Browsetel, secure file server
Access restrictions: Credentials
Keyword/Tags: Contacts
Archiving and preservation: Data will be preserved for the duration of the EW-Shopp project. The end volume is estimated at approximately 20 GB.
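Once a file has been fetched with a Secure FTP client, reading the compressed CSV UTF-8 export in Python could look like the sketch below. The plan does not name the compression format, so gzip is an assumption; the write function exists only to produce a test file, and the download step itself is not shown.

```python
import csv
import gzip
import io


def write_compressed_csv(fileobj, header, rows) -> None:
    """Write rows as gzip-compressed, UTF-8 encoded CSV to a binary file object."""
    with gzip.open(fileobj, "wt", encoding="utf-8", newline="") as text:
        writer = csv.writer(text)
        writer.writerow(header)
        writer.writerows(rows)


def read_compressed_csv(fileobj) -> list:
    """Read a gzip-compressed UTF-8 CSV back as a list of dicts keyed by header."""
    with gzip.open(fileobj, "rt", encoding="utf-8", newline="") as text:
        return list(csv.DictReader(text))


buffer = io.BytesIO()  # stands in for a file downloaded over SFTP
write_compressed_csv(buffer, ["EVENTID", "CAMPAIGN"], [["1", "Poletna akcija č"]])
buffer.seek(0)
print(read_compressed_csv(buffer))
```

Passing an existing file object to gzip.open leaves that object open after the gzip stream is closed, which is why the buffer can be rewound and read back.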

Table 45 MAKING DATA INTEROPERABLE – Contact and Consumer Interaction History

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.7.5 Dataset SECURITY

The dataset does not contain PD because PD was removed at the source.

Table 46 DATASET SECURITY – Contact and Consumer Interaction History

Personal Data (Y|N): N
Anonymized (Y|N|NA): N
Data recovery and secure storage: N
Privacy management procedures: The caller number is ignored and not recorded (it is not needed in analytical processing)
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.7.6 Ethics and Legal requirements

Based on the above dataset description, the dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.8 ECMWF Dataset - Weather: MARS Historical Data

6.8.1 Dataset IDENTIFICATION

The dataset “MARS Historical Data” is proprietary and contains meteorological data.

Table 47. DATASET IDENTIFICATION – MARS Historical Data

Category: Weather
Data name: Meteorological Archival and Retrieval System (MARS) Historical Data
Description: Meteorological archive of forecasts of the past 35 years and sets of reanalysis forecasts.
Provider: European Centre for Medium-Range Weather Forecasts (ECMWF)
Contact Person: Aljaž Košmerlj
Business Cases number: BC1, BC2, BC3, BC4

6.8.2 Dataset ORIGIN

This dataset is available from April 2017 and can be defined as “core data”. Its size is over 85 PB. The dataset already existed before the project.

Table 48 DATASET ORIGIN – MARS Historical Data

Available at (M): M4
Core Data (Y|N): Y
Size: >85 PB
Growth: Complete status of the atmosphere twice a day
Type and format: structured, CSV
Existing data (Y|N): Y
Data origin: ECMWF MARS API

6.8.3 Dataset FORMAT

The dataset is in CSV format. It covers the whole Earth and is in English. The data is updated in real time.

Table 49 DATASET FORMAT – MARS Historical Data

Dataset structure: N/A
Dataset format: CSV
Time coverage: past 35 years
Spatial coverage: Global
Languages: English
Identifiability of data: Yes
Naming convention: /{country}/YYYY/MM/DD.CSV
Versioning: Real-time
Metadata standards: N/A

6.8.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through API access.

Table 50 MAKING DATA ACCESSIBLE – MARS Historical Data

Dataset license: Owner: ECMWF. Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API access
Tools to access: REST API, Python API
Dataset source URL: http://apps.ecmwf.int/mars-catalogue/
Access restrictions: Credentials
Keyword/Tags: weather, climate
Archiving and preservation: ECMWF-maintained archive
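Table 50 lists a Python API for MARS. The sketch below only builds a request dictionary in MARS keyword syntax for surface forecasts at 00 and 12 UTC, matching the twice-daily growth in Table 48. All values (parameter 167.128, i.e. 2 metre temperature, the steps, the grid, the dates and the approximate bounding box for Slovenia) are illustrative choices. Submitting it with ECMWF's ecmwf-api-client, for example ECMWFService("mars").execute(request, "target.grib"), requires valid credentials and is not attempted here; MARS typically delivers GRIB, so conversion to the CSV format in Table 49 would be a further step not shown.

```python
def mars_request(param: str, start: str, end: str, area: list) -> dict:
    """Build a MARS retrieval request for surface forecasts issued at 00 and 12 UTC."""
    north, west, south, east = area
    return {
        "class": "od",          # operational archive
        "stream": "oper",       # operational atmospheric model stream
        "expver": "1",          # operational experiment version
        "type": "fc",           # forecast fields
        "levtype": "sfc",       # surface level
        "param": param,         # e.g. "167.128" = 2 metre temperature
        "date": f"{start}/to/{end}",
        "time": "00:00:00/12:00:00",
        "step": "0/6/12",       # forecast steps in hours (illustrative)
        "grid": "0.5/0.5",      # regrid to 0.5 degrees
        "area": f"{north}/{west}/{south}/{east}",
    }


request = mars_request("167.128", "2017-01-01", "2017-01-31", [47, 13, 45, 17])
print(request["area"])  # 47/13/45/17
```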

Table 51 MAKING DATA INTEROPERABLE – MARS Historical Data

Data interoperability:
• Semantic data enrichment
• References to shared systems of identifiers and standard data types
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.8.5 Dataset SECURITY

The dataset does not contain PD.

Table 52 DATASET SECURITY – MARS Historical Data

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Yes, both managed by ECMWF
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N/A
PD anonymised before project (Y|N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A

6.8.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “MARS Historical Data” does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion does not need to be collected.

There are no ethical issues that can have an impact on sharing this dataset.

6.9 CE Dataset - Products and Categories: Product Attributes

6.9.1 Dataset IDENTIFICATION

The dataset “Product attributes” is proprietary and contains information about individual attributes for various products.

Table 53. DATASET IDENTIFICATION – Product Attributes

Category: Products and categories
Data name: Product attributes
Description: A collection of product attributes (varying from generic ones, such as name, EAN, brand, categorization and color, to more specific ones, such as dimensions or technical specifications). Data is collected from more than one thousand online stores in 5 countries and then automatically and manually merged into an organized dataset.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1

6.9.2 Dataset ORIGIN

This dataset is available from January 2017 and it can be defined as “core data”. The dataset already existed before the project.

Table 54 DATASET ORIGIN – Product Attributes

Available at (M): M1
Core Data (Y|N): Y
Size: 12 million products; 10 million product specifications
Growth: 10,000 new products per day; 7,000 product specifications per day
Type and format: structured tabular data
Existing data (Y|N): Y
Data origin: SQL tables

6.9.3 Dataset FORMAT

The dataset collects data starting from 2016 at country level, in the Slovenian, Croatian and Serbian languages. The data is updated daily.

Table 55 DATASET FORMAT – Product Attributes

Dataset structure: Product attributes: IdProduct (INT), NameProduct (STRING), L1 (STRING), L2 (STRING), L3 (STRING), AttName (STRING), AttValue (STRING)
Dataset format: SQL: tabular
Time coverage: since 2016
Spatial coverage: Country
Languages: Slovenian, Croatian, Serbian
Identifiability of data: Yes
Naming convention: /{country}/product_attributes.tsv
Versioning: Daily (every day the dataset contains the full generated data)
Metadata standards: N/A
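Because Table 55 stores one row per product attribute (a long format keyed by IdProduct), a consumer will typically regroup the rows per product. The sketch below assumes, hypothetically, that each /{country}/product_attributes.tsv file is UTF-8, tab-separated and starts with a header row carrying the Table 55 column names; the function names and the sample rows are illustrative only.

```python
import csv
import io

# Column names from Table 55; the presence of a header row is an assumption.
COLUMNS = ["IdProduct", "NameProduct", "L1", "L2", "L3", "AttName", "AttValue"]


def attribute_file_path(country: str) -> str:
    """Path following the naming convention /{country}/product_attributes.tsv."""
    return f"/{country}/product_attributes.tsv"


def load_product_attributes(tsv_text: str) -> dict:
    """Regroup one-row-per-attribute records into one record per product."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    products = {}
    for row in reader:
        product_id = int(row["IdProduct"])
        record = products.setdefault(product_id, {
            "name": row["NameProduct"],
            "category": (row["L1"], row["L2"], row["L3"]),  # three-level categorization
            "attributes": {},
        })
        record["attributes"][row["AttName"]] = row["AttValue"]
    return products


sample = (
    "IdProduct\tNameProduct\tL1\tL2\tL3\tAttName\tAttValue\n"
    "1\tPhone X\tElectronics\tPhones\tSmartphones\tColor\tBlack\n"
    "1\tPhone X\tElectronics\tPhones\tSmartphones\tBrand\tAcme\n"
)
products = load_product_attributes(sample)
print(products[1]["attributes"])  # {'Color': 'Black', 'Brand': 'Acme'}
```

Since each daily file is a full snapshot, the most recent file alone is enough to rebuild the current state.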

6.9.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through file download.

Table 56 MAKING DATA ACCESSIBLE – Product Attributes

Dataset license: Owner: Ceneje. Access: all members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: File download (zip)
Tools to access: wget / curl
Dataset source URL: AWS or Ceneje static content server
Access restrictions: Credentials
Keywords/Tags: N/A
Archiving and preservation: No (can be generated on demand)
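Once the archive has been fetched (for example with wget or curl; how the credentials are passed is not specified here), the TSV can be read directly from the zip without unpacking it to disk. The member name product_attributes.tsv and the UTF-8 encoding are assumptions in this minimal sketch.

```python
import io
import zipfile


def read_member(zip_bytes: bytes, member: str = "product_attributes.tsv") -> str:
    """Return one text file from a downloaded zip archive, decoded as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member not in archive.namelist():
            raise KeyError(f"{member} not found; archive contains {archive.namelist()}")
        return archive.read(member).decode("utf-8")


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("product_attributes.tsv", "IdProduct\tAttName\tAttValue\n1\tColor\tBlack\n")

print(read_member(buffer.getvalue()).splitlines()[1])
```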

Table 57 MAKING DATA INTEROPERABLE – Product Attributes

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
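As a rough illustration of "publication as linked data (RDF-ization)", a product record can be serialised as N-Triples. The choice of schema.org terms (schema:Product, schema:name, and schema:gtin13 for a 13-digit EAN) and the base IRI http://example.org/ew-shopp/product/ are assumptions made for this sketch, not vocabulary choices stated in this plan.

```python
from typing import List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BASE = "http://example.org/ew-shopp/product/"  # hypothetical namespace


def literal(value: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def product_to_ntriples(product_id: int, name: str, ean: Optional[str] = None) -> List[str]:
    """Emit one N-Triples line per statement about the product."""
    subject = f"<{BASE}{product_id}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}Product> .",
        f"{subject} <{SCHEMA}name> {literal(name)} .",
    ]
    if ean:  # not every product carries an EAN
        triples.append(f"{subject} <{SCHEMA}gtin13> {literal(ean)} .")
    return triples


for line in product_to_ntriples(42, 'TV 55"', "0000000000000"):
    print(line)
```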

6.9.5 Dataset SECURITY

The dataset does not contain PD.

Table 58 DATASET SECURITY – Product Attributes

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of aggregation (for PD anonymized by aggregation): Product level

6.9.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “Product Attributes” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of an opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

6.10 JSI Dataset - Media: Event Registry

6.10.1 Dataset IDENTIFICATION

The dataset “Event Registry” is proprietary and contains clustered information about events, based on online news articles.

Table 59. DATASET IDENTIFICATION – Event Registry

Category: Media
Data name: Event Registry
Description: A registry of news articles which are automatically clustered into events, i.e. sets of articles about the same real-world event. The articles are collected from over 150 thousand sources from all over the world, in 21 languages. Article text is processed and annotated using a linguistic and semantic analysis pipeline. The articles and events are linked based on content similarity; these links are made automatically and across different languages.
Provider: JSI
Contact Person: Aljaž Košmerlj
Business Cases number: BC1, BC2, BC3, BC4
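Event Registry's own pipeline is cross-lingual and considerably more elaborate. The sketch below only illustrates the general idea of grouping articles into events by content similarity, using bag-of-words cosine similarity and a greedy single-pass assignment; the 0.5 threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_articles(articles: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Join the most similar existing event if similarity >= threshold, else open a new one."""
    events = []  # each event: {"centroid": summed term counts, "members": article indices}
    for index, text in enumerate(articles):
        vector = vectorize(text)
        best, best_sim = None, 0.0
        for event in events:
            sim = cosine(vector, event["centroid"])
            if sim > best_sim:
                best, best_sim = event, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["centroid"].update(vector)  # cosine ignores scale, so a sum acts as a centroid
        else:
            events.append({"centroid": Counter(vector), "members": [index]})
    return [event["members"] for event in events]


print(cluster_articles([
    "Earthquake hits city center",
    "Strong earthquake hits the city",
    "Football team wins the cup",
]))  # [[0, 1], [2]]
```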

6.10.2 Dataset ORIGIN

This dataset is available from January 2017 and it can be defined as “core data”. The dataset already existed before the project.

Table 60 DATASET ORIGIN – Event Registry

Available at (M): M1
Core Data (Y|N): Y
Size: 136 million articles and 4.8 million events
Growth: 150 thousand articles and 400 events added per day
Type and format: text + metadata
Existing data (Y|N): Y
Data origin: online news sites, Event Registry API

6.10.3 Dataset FORMAT

The dataset collects data starting from December 2013; it covers the whole Earth, in many languages. The data is updated in real time.

Table 61 DATASET FORMAT – Event Registry

Dataset structure: Full documentation available at: https://github.com/EventRegistry/event-registry-python/wiki/Data-models
Dataset format: JSON
Time coverage: since December 2013
Spatial coverage: Whole Earth
Languages: English, German, Spanish, Catalan, Portuguese, Italian, French, Russian, Chinese, Slovene, Croatian, Serbian, Arabic, Turkish, Persian, Armenian, Kurdish, Lithuanian, Somali, Urdu, Uzbek
Identifiability of data: Y
Naming convention: Wikipedia URIs
Versioning: Real-time
Metadata standards: N/A

6.10.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through API access.

Table 62 MAKING DATA ACCESSIBLE – Event Registry

Dataset license: Owner: JSI. Access: all members
Availability (public|private): limited open and private (subscription-based); full access will be available to project members
Availability to EW-Shopp partners (Y|N): Y
Availability method: API access
Tools to access: REST, Python API
Dataset source URL: http://eventregistry.org/
Access restrictions: Credentials
Keywords/Tags: news, articles, events
Archiving and preservation: long-term database storage

Table 63 MAKING DATA INTEROPERABLE – Event Registry

Data interoperability:
• Semantic data enrichment
• References to shared systems of identifiers and standard data types
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.10.5 Dataset SECURITY

The dataset does not include PD collected directly from its users. It contains only publicly available PD (mentions of natural persons in news articles) as part of its news archive. PD can be removed upon request by any individual.

Table 64 DATASET SECURITY – Event Registry

Personal Data (Y|N): N
Anonymized (Y|N|NA): N
Data recovery and secure storage: Secure storage, no sensitive data, local backups
Privacy management procedures: "Right to be forgotten" guaranteed
PD at the source (Y|N): Y
PD anonymised during project (Y|N): N/A
PD anonymised before project (Y|N): N/A
Level of aggregation (for PD anonymized by aggregation): N/A

6.10.6 Ethics and Legal requirements

JSI has already obtained an opinion of the Slovenian Information Commissioner regarding the use of Event Registry data in another EU project (H2020 project RENOIR, grant agreement No 691152). A copy of this opinion and an explanation of why it also applies to the EW-Shopp project are included in deliverable [D7.2]. The opinion states that, even though Event Registry collects and indexes news data which is publicly available, this may still constitute processing of personal data, and some users may want to have their data removed from the index. This is the so-called “right to be forgotten”, which must also be offered by web search engines such as Google. It can be defined as “the right to silence on past events in life that are no longer occurring”, and it allows individuals to have information about themselves deleted from certain internet records so that they cannot be found by search engines. To comply with this, Event Registry supports the option to request the removal of personal links from its index. The Information Commissioner does not foresee any other necessary privacy protection measures.
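The removal option can be pictured as a delisting filter applied at query time: once a request is recorded, a given link is no longer returned for searches on the requesting person's name. This is a minimal sketch under that assumption; the class, its methods and the example URLs are hypothetical and do not describe Event Registry's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class NewsIndex:
    """Toy full-text index of news articles keyed by URL (hypothetical)."""
    articles: Dict[str, str]
    delisted: Set[Tuple[str, str]] = field(default_factory=set)

    def request_removal(self, person: str, url: str) -> None:
        """Record a right-to-be-forgotten request for one (person, link) pair."""
        self.delisted.add((person.casefold(), url))

    def search(self, person: str) -> List[str]:
        """Return URLs that mention the person, skipping delisted pairs."""
        name = person.casefold()
        return [url for url, text in self.articles.items()
                if name in text.casefold() and (name, url) not in self.delisted]


index = NewsIndex({"https://news.example/a": "Jane Doe opens a shop",
                   "https://news.example/b": "Interview with Jane Doe"})
index.request_removal("Jane Doe", "https://news.example/a")
print(index.search("jane doe"))  # ['https://news.example/b']
```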

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

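6.11 GfK Dataset - Consumer data: Consumer data

6.11.1 Dataset IDENTIFICATION

The dataset “Consumer data” is proprietary and contains consumer panel data: TV and online behavior and exposure, household and individual purchases, mobile usage, and household and individual demographic and segmentation information.

Table 65. DATASET IDENTIFICATION – Consumer data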

Category: Consumer data
Data name: Consumer data
Description: TV Behavior & Exposure, Online Behavior & Exposure, HH & Individual Purchase Level, Mobile Usage, Household & Individual Demographic and Segmentation Information in Italy, Germany, Poland and the Netherlands.
Provider: GfK
Contact Person: Stefano Albano
Business Cases number: BC2

6.11.2 Dataset ORIGIN

The dataset is available from May 2017 and it cannot be defined as “core data”. Its size is 80 GB, with a growth of 40 GB per year. The dataset already existed before the project.

Table 66 DATASET ORIGIN – Consumer data

Available at (M): M5
Core Data (Y|N): N
Size: 80 GB
Growth: 40 GB per year
Type and format: structured tabular data, CSV
Existing data (Y|N): Y
Data origin: GfK receives the data directly from the panelists, who are connected to GfK via GPRS technology with an ad hoc tablet, via the web with a PC/laptop, or via smartphone. Data are collected actively (with questionnaires) or passively (installed apps). Data are anonymized and stored in GfK’s storage systems.

6.11.3 Dataset FORMAT

The dataset is in CSV format. It collects numerical data since 2016 and covers Italy, Germany, Poland and the Netherlands. The data is updated monthly.

Table 67 DATASET FORMAT – Consumer data

Dataset structure: Data are stored in a data warehouse and can be extracted or visualized through software.
Dataset format: structured tabular data, CSV
Time coverage: Monthly/daily data since 2016
Spatial coverage: Italy, Germany, Poland, Netherlands
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: Static DB
Versioning: Monthly
Metadata standards: N/A

6.11.4 Dataset ACCESS

The dataset is private and it is not available to consortium members.

Table 68 MAKING DATA ACCESSIBLE – Consumer data

Dataset license: Available only for GfK
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): N
Availability method: N/A
Tools to access: N/A
Dataset source URL: N/A
Access restrictions: N/A
Keywords/Tags: N/A
Archiving and preservation: N/A

Table 69 MAKING DATA INTEROPERABLE – Consumer data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
• Spatial ontologies and locations

6.11.5 Dataset SECURITY

The dataset does not contain PD because those data were removed at the source.

Table 70 DATASET SECURITY – Consumer data

Personal Data (Y|N): N
Anonymized (Y|N|NA): Y
Data recovery and secure storage: Y
Privacy management procedures: See 6.11.6
PD at the source (Y|N): Y
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): Y
Level of aggregation (for PD anonymized by aggregation): Data are not aggregated

6.11.6 Ethics and Legal requirements

GfK collects the data according to the current privacy law, asking each panelist for consent to transfer the data to GfK for data analysis. GfK has sent a notification to the National Data Protection Authority (attached in [D7.2]).

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.
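The principle that the analytics engine returns only aggregated figures for groups of users can be sketched as a group-by with small-cell suppression. The grouping fields (country, age_band), the tv_minutes measure and the minimum group size are illustrative assumptions; this plan does not state a threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def aggregate_by_group(records: Iterable[dict], group_keys: Sequence[str],
                       value_key: str, min_group_size: int = 10) -> Dict[Tuple, dict]:
    """Mean of value_key per group; groups smaller than min_group_size are dropped."""
    totals: Dict[Tuple, float] = defaultdict(float)
    counts: Dict[Tuple, int] = defaultdict(int)
    for record in records:
        key = tuple(record[k] for k in group_keys)
        totals[key] += record[value_key]
        counts[key] += 1
    return {key: {"count": counts[key], "mean": totals[key] / counts[key]}
            for key in totals if counts[key] >= min_group_size}


panel = [
    {"country": "IT", "age_band": "18-34", "tv_minutes": 60},
    {"country": "IT", "age_band": "18-34", "tv_minutes": 90},
    {"country": "IT", "age_band": "18-34", "tv_minutes": 120},
    {"country": "PL", "age_band": "35-54", "tv_minutes": 45},
]
print(aggregate_by_group(panel, ["country", "age_band"], "tv_minutes", min_group_size=3))
# {('IT', '18-34'): {'count': 3, 'mean': 90.0}}
```

The single Polish record is suppressed because a group of one would expose an individual panelist.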

6.12 GfK Dataset - Market data: Sales data

6.12.1 Dataset IDENTIFICATION

The dataset “Sales data” contains monthly data (in value / number) of Consumer Electronics, Information Technology, Telecommunication, Major Domestic Appliances and Small Domestic Appliances products.

Table 71. DATASET IDENTIFICATION – Sales data

Category: Market data
Data name: Sales data
Description: Monthly data (in value / number) of Consumer Electronics, Information Technology, Telecommunication, Major Domestic Appliances and Small Domestic Appliances products.
Provider: GfK
Contact Person: Alessandro De Fazio
Business Cases number: BC1, BC2, BC3

6.12.2 Dataset ORIGIN

The dataset is available from January 2017 and it cannot be defined as “core data”. Its size is 80 GB per country, with a growth of 5 GB per country per year.

Table 72 DATASET ORIGIN – Sales data

Available at (M): M1
Core Data (Y|N): N
Size: 80 GB per country
Growth: 5 GB per country per year
Type and format: structured tabular data, CSV
Existing data (Y|N): Y
Data origin: GfK receives sales data from the POS, split per product, in different formats (electronic and manual). Data are checked, verified and uploaded into a tool where they are connected to the product sheet. The data are collected on a representative sample of POS and are projected ("exploded") to the universe.
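Projecting ("exploding") a POS sample to the universe is commonly done with expansion factors per stratum. The sketch below uses the simplest variant, the ratio of stores in the universe to stores in the sample within each channel; GfK's actual weighting scheme is not described in this plan, and the stratum names and figures are made up.

```python
from typing import Dict


def project_to_universe(sample_sales: Dict[str, float],
                        sample_stores: Dict[str, int],
                        universe_stores: Dict[str, int]) -> Dict[str, float]:
    """Scale sales observed in each stratum by universe_stores / sample_stores."""
    projected = {}
    for stratum, units in sample_sales.items():
        expansion_factor = universe_stores[stratum] / sample_stores[stratum]
        projected[stratum] = units * expansion_factor
    return projected


estimate = project_to_universe(
    sample_sales={"hypermarkets": 120, "specialists": 50},
    sample_stores={"hypermarkets": 4, "specialists": 10},
    universe_stores={"hypermarkets": 20, "specialists": 200},
)
print(estimate)  # {'hypermarkets': 600.0, 'specialists': 1000.0}
```

Stratifying by channel keeps a few large sampled stores from being over-weighted against many small ones.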

6.12.3 Dataset FORMAT

The dataset is in CSV format. It collects data since 2004 for all European countries (except Albania, Kosovo, Macedonia and Montenegro). The data is updated monthly.

Table 73 DATASET FORMAT – Sales data

Dataset structure: Data are stored in a global data warehouse accessible online. The inputs are four dimensions (Product, Time, Facts, Channels) that can be processed like an Excel pivot table.
Dataset format: structured tabular data, CSV
Time coverage: Monthly data since 2004
Spatial coverage: All European countries (except Albania, Kosovo, Macedonia and Montenegro)
Languages: English
Identifiability of data: Yes
Naming convention: Static DB
Versioning: Monthly
Metadata standards: N/A
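The four-dimensional model (Product, Time, Facts, Channels) can be queried like a spreadsheet pivot table: some dimensions become rows, one becomes columns, and a fact is summed in each cell. The field names and the Units fact in this pure-Python sketch are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def pivot(rows: Iterable[dict], row_dims: Sequence[str], col_dim: str,
          fact: str) -> Dict[Tuple, Dict[str, float]]:
    """Sum `fact` into cells indexed by (row_dims values) x (col_dim value)."""
    table: Dict[Tuple, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        row_key = tuple(row[d] for d in row_dims)
        table[row_key][row[col_dim]] += row[fact]
    return {key: dict(cells) for key, cells in table.items()}


sales = [
    {"Product": "TV", "Channel": "Online", "Time": "2017-01", "Units": 5},
    {"Product": "TV", "Channel": "Online", "Time": "2017-01", "Units": 3},
    {"Product": "TV", "Channel": "Retail", "Time": "2017-02", "Units": 4},
]
print(pivot(sales, ["Product", "Channel"], "Time", "Units"))
# {('TV', 'Online'): {'2017-01': 8.0}, ('TV', 'Retail'): {'2017-02': 4.0}}
```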

6.12.4 Dataset ACCESS

The dataset is private and it is available only to Università Bicocca. The data is available through FTP, but a username and password are required.

Table 74 MAKING DATA ACCESSIBLE – Sales data

Dataset license: The data will be transferred to Università Bicocca for data analysis, while the analysis (not the data) will be transferred by Università Bicocca to the consortium.
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): Y
Availability method: CSV files via FTP
Tools to access: No
Dataset source URL: FTP
Access restrictions: username and password needed to access FTP
Keywords/Tags: sales data
Archiving and preservation: N/A
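Retrieval over FTP with a username and password can be done with Python's standard ftplib. Host, credentials and file name below are placeholders. Because plain FTP sends the password unencrypted, ftplib.FTP_TLS (followed by prot_p() after login) may be preferable if the server supports it. The ftp_factory parameter exists only so that this sketch can be exercised without a network connection.

```python
import ftplib
import io


def fetch_via_ftp(host: str, user: str, password: str, remote_name: str,
                  ftp_factory=ftplib.FTP) -> bytes:
    """Download one file in binary mode and return its content."""
    buffer = io.BytesIO()
    with ftp_factory(host) as ftp:  # ftplib.FTP supports the context-manager protocol
        ftp.login(user=user, passwd=password)
        ftp.retrbinary(f"RETR {remote_name}", buffer.write)
    return buffer.getvalue()


class FakeFTP:
    """Offline stand-in for an FTP server, used only to exercise fetch_via_ftp."""

    def __init__(self, host):
        self.host = host

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def login(self, user, passwd):
        self.user = user

    def retrbinary(self, cmd, callback):
        callback(b"product,month,units\n")


print(fetch_via_ftp("ftp.example.org", "user", "secret", "sales.csv", FakeFTP))
```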

Table 75 MAKING DATA INTEROPERABLE – Sales data

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
• Spatial ontologies and locations

6.12.5 Dataset SECURITY

The dataset does not contain PD.

Table 76 DATASET SECURITY – Sales data

Personal Data (Y|N): N
Anonymized (Y|N|NA): N
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N/A
PD anonymised before project (Y|N): N
Level of aggregation (for PD anonymized by aggregation): N/A

6.12.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “Sales Data” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of an opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

6.13 GfK Dataset – Products & Categories: Product attributes

6.13.1 Dataset IDENTIFICATION

The dataset “Product attributes” contains Technical Product Data Sheets of all the products in the Consumer Electronics, IT, Telecommunication, Major Domestic Appliances and Small Domestic Appliances sectors.

Table 77. DATASET IDENTIFICATION – Product attributes

Category: Products & Categories
Data name: Product attributes
Description: Technical Product Data Sheets of all the products in the Consumer Electronics, IT, Telecommunication, Major Domestic Appliances and Small Domestic Appliances sectors. Product sheets are defined within the GfK categorization and include: Brand, Product name, Model, ID, data, EAN code (for 80% of the products) and Technical features.
Provider: GfK
Contact Person: Marco Tobaldo
Business Cases number: BC1, BC2, BC4

6.13.2 Dataset ORIGIN

The dataset is available from February 2017 and it can be defined as “core data”. Its size is 2 GB per country (Germany, UK, Italy), with a growth of 2% per year.
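Assuming the 2% yearly growth compounds, the size per country after n years is 2 GB × 1.02^n: after three years that is about 2.12 GB per country, or roughly 6.37 GB for the three countries listed. The helper below only evaluates that formula; the compounding interpretation is an assumption.

```python
def projected_size_gb(initial_gb: float, yearly_growth: float, years: int) -> float:
    """Compound growth: the initial size multiplied by (1 + rate) once per year."""
    return initial_gb * (1 + yearly_growth) ** years


per_country = projected_size_gb(2.0, 0.02, 3)
print(round(per_country, 4))      # 2.1224
print(round(3 * per_country, 2))  # 6.37 (Germany, UK, Italy)
```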

Table 78 DATASET ORIGIN – Product attributes

Available at (M): M2
Core Data (Y|N): Y
Size: 2 GB per country (Germany, UK, Italy)
Growth: 2% per year
Type and format: Relational
Existing data (Y|N): Y
Data origin: GfK receives the data of all the products sold in POS. When there is a new product, GfK sets up its sheet, obtaining the features of the product from the manufacturer. All the sheets are created manually, according to the GfK data plan, in the country where the new product has been sold.

6.13.3 Dataset FORMAT

The dataset is in CSV or XML format. It collects product data since 1982 and has European coverage. The dataset is updated daily (every day the dataset contains only the newly generated data).

Table 79 DATASET FORMAT – Product attributes

Dataset structure: We describe here the main structure of the relational database (RDB) by describing the five CSV files that we extract from it and share in EW-Shopp:
• Country_EWS_2017_12_31_Feature_Data.txt (values of the technical features of the products)
• Country_EWS_2017_12_31_Feature_List.txt (names of the features of the products)
• Country_EWS_2017_12_31_Feature_Value_List.txt (code frame of the features)
• Country_EWS_2017_12_31_Master_Data.txt (main information about the products)
• Country_EWS_2017_12_31_Productgroup_Feature_List.txt (list of the technical features available for each product)
Each file contains several columns; for the complete structure, we refer to the documentation in "Spex_retail_CSVrelationalidbased.pdf" shared with the consortium.
Dataset format: structured (R-DB), CSV or XML
Time coverage: The dataset includes product data since 1982 and is updated daily
Spatial coverage: European coverage: Austria, Belgium, Denmark, Finland, France, Germany, UK, Greece, Italy, Luxembourg, Netherlands, Poland, Portugal, Czech Republic, Slovakia, Sweden, Norway, Hungary. Catalog not available in Ireland, Slovenia, Croatia, Bulgaria, Cyprus, Estonia, Latvia, Lithuania, Malta, Romania.
Languages: Arabic, Czech, Chinese, Korean, Danish, French, Greek, English, Italian, Dutch, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, German, Turkish, Hungarian
Identifiability of data: Yes
Naming convention: Country_EWS_2017_12_31_Feature_Data.txt, Country_EWS_2017_12_31_Feature_List.txt, Country_EWS_2017_12_31_Feature_Value_List.txt, Country_EWS_2017_12_31_Master_Data.txt, Country_EWS_2017_12_31_Productgroup_Feature_List.txt
Versioning: Daily (every day the dataset contains only the newly generated data); old data are overwritten.
Metadata standards: N/A
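When several daily extracts accumulate, it helps to recover the country, extract date and table type from each file name. This sketch assumes, hypothetically, that the leading "Country" token is replaced by a country name or code and that the date part follows YYYY_MM_DD; neither is stated explicitly above.

```python
import re
from datetime import date
from typing import Tuple

# The five tables listed in Table 79.
TABLES = {"Feature_Data", "Feature_List", "Feature_Value_List",
          "Master_Data", "Productgroup_Feature_List"}
PATTERN = re.compile(
    r"^(?P<country>[A-Za-z]+)_EWS_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"_(?P<table>[A-Za-z_]+)\.txt$"
)


def parse_extract_name(filename: str) -> Tuple[str, date, str]:
    """Split an extract file name into (country, extract date, table name)."""
    match = PATTERN.match(filename)
    if match is None or match["table"] not in TABLES:
        raise ValueError(f"unexpected file name: {filename}")
    extract_date = date(int(match["year"]), int(match["month"]), int(match["day"]))
    return match["country"], extract_date, match["table"]


print(parse_extract_name("Italy_EWS_2017_12_31_Master_Data.txt"))
# ('Italy', datetime.date(2017, 12, 31), 'Master_Data')
```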

6.13.4 Dataset ACCESS

The dataset is private and it is available to all consortium members. The data are available through FTP.

Table 80 MAKING DATA ACCESSIBLE – Product attributes

Dataset license: Private license: the data will be transferred to Università Bicocca for data analysis, while the analysis (not the data) will be transferred by Università Bicocca to the consortium
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): Y
Availability method: FTP
Tools to access: No tools
Dataset source URL: It will be created when needed
Access restrictions: username and password needed to access FTP
Keywords/Tags: product categories / product features / value
Archiving and preservation: Regular disaster recovery / backup on original data

Table 81 MAKING DATA INTEROPERABLE – Product attributes

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data

6.13.5 Dataset SECURITY

The dataset does not contain PD.

Table 82 DATASET SECURITY – Product attributes

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N/A
PD anonymised before project (Y|N): N/A
Level of aggregation (for PD anonymized by aggregation): N/A

6.13.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “Product attributes” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of an opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

6.14 ME Dataset - Consumer Data: Door counter data

6.14.1 Dataset IDENTIFICATION

The dataset “Door counter data” contains data from customers' door counters.

Table 83. DATASET IDENTIFICATION – Door counter data

Category: Consumer Data
Data name: Door counter data
Description: Data from customers' door counters
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3

6.14.2 Dataset ORIGIN

The dataset is available from January 2017 and it cannot be defined as “core data”. Its size is 2 Mb, with a growth of 60 kB/mb/location. The dataset already existed.

Table 84 DATASET ORIGIN – Door counter data

Available at (M): M1
Core Data (Y|N): N
Size: 2 Mb
Growth: 60 kB/mb/location
Type and format: structured data
Existing data (Y|N): Y
Data origin: Measurence's customers' own data

6.14.3 Dataset FORMAT

The dataset is in CSV format. It collects numerical data since 2016 related to the Milan area. The dataset is updated daily.

Table 85 DATASET FORMAT – Door counter data

Dataset structure: N/A, because there is no access to the data through a URL
Dataset format: CSV
Time coverage: 2016
Spatial coverage: Milan
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: /location_idYYYY/MM/week
Versioning: Daily
Metadata standards: N/A

6.14.4 Dataset ACCESS

The dataset is private and it is not available to consortium members.

Table 86 MAKING DATA ACCESSIBLE – Door counter data

Dataset license: Owner: ME
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): N
Availability method: CSV
Tools to access: text editor / spreadsheet
Dataset source URL: N/A
Access restrictions: N/A
Keywords/Tags: door counters
Archiving and preservation: cloud

Table 87 MAKING DATA INTEROPERABLE – Door counter data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.14.5 Dataset SECURITY

The dataset does not contain PD.

Table 88 DATASET SECURITY – Door counter data

Personal Data (Y|N): N
Anonymized (Y|N|NA): N
Data recovery and secure storage: Y
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of aggregation (for PD anonymized by aggregation): N

6.14.6 Ethics and Legal requirements

This dataset does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of an opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

6.15 BB Dataset - Products and Categories: Product Attributes

6.15.1 Dataset IDENTIFICATION

The dataset “Product attributes” is proprietary and contains data on product specifications.

Table 89. DATASET IDENTIFICATION – Product Attributes

Category: Products and categories
Data name: Product attributes
Description: Detailed product specifications for products which are included in Big Bang's selling portfolio (from generic to specific technical details)
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1

6.15.2 Dataset ORIGIN

The dataset is available from January 2017 and it can be defined as “core data”. Its size is 20,000 products, with a growth of 1,000 new products per year.

Table 90 DATASET ORIGIN – Product Attributes

Available at (M): M1
Core Data (Y|N): Y
Size: 20,000 products
Growth: 1,000 new products per year
Type and format: character and numeric
Existing data (Y|N): Y
Data origin: DWH

6.15.3 Dataset FORMAT

The dataset is in XLS format. It collects data related to Slovenia, in Slovenian and English. The dataset is updated daily.

Table 91 DATASET FORMAT – Product Attributes

Dataset structure: BB Classification (can be mostly matched with the GS1 Classification)
Dataset format: XLS, SQL, CSV
Time coverage: All time
Spatial coverage: Slovenia for all products
Languages: Slovenian, English
Identifiability of data: Yes
Naming convention: BB_productCategoriesYYYY/MM/dd
Versioning: daily (new + history)
Metadata standards: N/A
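Since the BB Classification can only mostly be matched with the GS1 Classification, a mapping step should report which categories remain unmatched and what share was covered. The category names and target codes below are invented placeholders, not real BB categories or GS1 bricks.

```python
from typing import Dict, Iterable, List, Tuple


def map_categories(bb_categories: Iterable[str],
                   bb_to_gs1: Dict[str, str]) -> Tuple[Dict[str, str], List[str], float]:
    """Split categories into matched/unmatched and report the share matched."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for category in dict.fromkeys(bb_categories):  # de-duplicate, keep order
        if category in bb_to_gs1:
            matched[category] = bb_to_gs1[category]
        else:
            unmatched.append(category)
    total = len(matched) + len(unmatched)
    coverage = len(matched) / total if total else 0.0
    return matched, unmatched, coverage


matched, unmatched, coverage = map_categories(
    ["TV sets", "Washing machines", "Gift cards", "TV sets"],
    {"TV sets": "GS1-PLACEHOLDER-1", "Washing machines": "GS1-PLACEHOLDER-2"},
)
print(unmatched, round(coverage, 2))  # ['Gift cards'] 0.67
```

The unmatched list is what would need manual review before the data can be interlinked with a GS1-based product classification.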

6.15.4 Dataset ACCESS

The dataset is private, but it is available to all consortium members. The data can be downloaded via VPN.

Table 92 MAKING DATA ACCESSIBLE – Product Attributes

Dataset license: Owner: Big Bang. Access: all members
Availability (public|private): Public, restricted with credentials
Availability to EW-Shopp partners (Y|N): Y
Availability method: Download, view
Tools to access: URL with credentials
Dataset source URL: URL link secured with credentials
Access restrictions: Credentials
Keywords/Tags: Database keywords
Archiving and preservation: Secure storage, backup

Table 93 MAKING DATA INTEROPERABLE – Product Attributes

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data

6.15.5 Dataset SECURITY

The dataset does not contain PD.

Table 94 DATASET SECURITY – Product Attributes

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, daily backup
Privacy management procedures: Data only at the level of product / category
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of aggregation (for PD anonymized by aggregation): N/A

6.15.6 Ethics and Legal requirements

Based on the above dataset description, the dataset does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of an opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitations to free association, etc.

6.16 CE Dataset - Market data: Products price history

6.16.1 Dataset IDENTIFICATION

The dataset “Products price history” is proprietary and contains quotes for various products.

Table 95. DATASET IDENTIFICATION – Products price history

Category: Market data
Data name: Products price history
Description: A collection of seller quotes for products. Prices for all of Ceneje's organized products have been recorded and regularly archived since 2016.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1

6.16.2 Dataset ORIGIN

The dataset is available from January 2017 and it cannot be defined as “core data”. Its size is about 3 billion quotes, with a growth of 2 million quotes per day.

Table 96 DATASET ORIGIN – Products price history

Available at (M): M1
Core Data (Y|N): N
Size: about 3 billion quotes
Growth: 2 million per day
Type and format: structured tabular data
Existing data (Y|N): Y
Data origin: SQL tables

6.16.3 Dataset FORMAT

The dataset collects data at country level since 2016. The dataset is updated daily.

Table 97 DATASET FORMAT – Products price history

Dataset structure: History: IdProduct (INT), NameProduct (STRING), L1 (STRING), L2 (STRING), L3 (STRING), IdSeller (INT), Price (MONEY), Timestamp (Slovenian time, GMT+1)
Dataset format: SQL: tabular (TSV)
Time coverage: since 2016
Spatial coverage: Country
Languages: not language-specific
Identifiability of data: Yes
Naming convention: {country}/YYYY/mm/DD/history.tsv
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
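Because each daily history.tsv holds only that day's new quotes (unlike the Ceneje Product attributes files, which are full daily snapshots), reading a period means collecting every daily file in it. This sketch assumes header-less, tab-separated files in the Table 97 column order, a YYYY-MM-DD HH:MM:SS timestamp text and a country folder such as "si"; all of these are assumptions. Slovenia observes UTC+2 in summer, so if the timestamps are local civil time, zoneinfo.ZoneInfo("Europe/Ljubljana") would be more accurate than the fixed GMT+1 offset stated in the table.

```python
import csv
import io
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal
from typing import List

# Table 97 states GMT+1; a fixed offset does not model summer time.
SLOVENIAN_TIME = timezone(timedelta(hours=1))
FIELDS = ["IdProduct", "NameProduct", "L1", "L2", "L3", "IdSeller", "Price", "Timestamp"]


def history_path(country: str, day: date) -> str:
    """Path following {country}/YYYY/mm/DD/history.tsv."""
    return f"{country}/{day:%Y/%m/%d}/history.tsv"


def paths_for_range(country: str, start: date, end: date) -> List[str]:
    """Daily files hold only new quotes, so a period needs every file in it."""
    days = (end - start).days + 1
    return [history_path(country, start + timedelta(days=i)) for i in range(days)]


def parse_history(tsv_text: str) -> List[dict]:
    """Parse header-less TSV rows, converting timestamps to UTC."""
    quotes = []
    for values in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        row = dict(zip(FIELDS, values))
        local = datetime.strptime(row["Timestamp"], "%Y-%m-%d %H:%M:%S")
        quotes.append({
            "product": int(row["IdProduct"]),
            "seller": int(row["IdSeller"]),
            "price": Decimal(row["Price"]),  # Decimal avoids float rounding on money
            "utc": local.replace(tzinfo=SLOVENIAN_TIME).astimezone(timezone.utc),
        })
    return quotes


print(paths_for_range("si", date(2017, 2, 27), date(2017, 3, 1)))
print(parse_history("7\tTV\tA\tB\tC\t3\t499.99\t2017-03-01 12:00:00\n")[0]["utc"])
```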

6.16.4 Dataset ACCESS

The dataset is private, but it is available to all consortium members through file download.

Table 98. MAKING DATA ACCESSIBLE – Products price history

Dataset license: Owner: Ceneje; Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: File download (zip)
Tools to access: WGET/Curl
Dataset source URL: N/A
Access restrictions: Credentials
Keyword/Tags: N/A
Archiving and preservation: N/A (can be generated on demand)
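The table lists wget/curl with credentials as the access tools. A Python equivalent might look like the sketch below. It assumes that the credentials are checked with HTTP Basic authentication, which is what `curl -u user:password` sends by default, and that the zip archive contains one or more `.tsv` files. No source URL is defined for this dataset, so any URL passed to `download_history` is a placeholder.

```python
import base64
import io
import urllib.request
import zipfile


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Attach an HTTP Basic Authorization header (assumed authentication scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def extract_tsv_files(zip_bytes: bytes) -> dict:
    """Return {member name: decoded text} for every .tsv member of the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.endswith(".tsv")
        }


def download_history(url: str, user: str, password: str) -> dict:
    """Fetch the zip over HTTP(S) and unpack its TSV files in memory."""
    with urllib.request.urlopen(build_request(url, user, password)) as response:
        return extract_tsv_files(response.read())
```

Unpacking in memory suits the daily delta files. Large back-fills would be better written to disk first.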

Table 99. MAKING DATA INTEROPERABLE – Products price history

Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies

6.16.5 Dataset SECURITY

The dataset does not contain personal data (PD).

Table 100. DATASET SECURITY – Products price history

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): Product | Seller level


6.16.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “Products price history” does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.17 ME Dataset - Consumer Data: Sales data

6.17.1 Dataset IDENTIFICATION

The dataset “Sales data” contains the number of receipts that Measurence receives from its customers.

Table 101. DATASET IDENTIFICATION – Sales data

Category: Consumer Data
Data name: Sales data
Description: Number of receipts we get from our customers
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC2

6.17.2 Dataset ORIGIN

The dataset is available from January 2017 and cannot be defined as “core data”. Its size is about 2 Mb, with a growth of 60 kB/mb/location.

Table 102. DATASET ORIGIN – Sales data

Available at (M): M1
Core Data (Y|N): N
Size: 2 Mb
Growth: 60 kB/mb/location
Type and format: structured data
Existing data (Y|N): Y
Data origin: Measurence customers' own data


6.17.3 Dataset FORMAT

The dataset has been collecting data about the Milan area since 2016 and is updated weekly.

Table 103. DATASET FORMAT – Sales data

Dataset structure: N/A, because there is no access to the data through a URL
Dataset format: CSV
Time coverage: 2016
Spatial coverage: Milan
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: /location_id/YYYY/MM/week
Versioning: weekly
Metadata standards: N/A

6.17.4 Dataset ACCESS

The dataset is private and is not available to consortium members.

Table 104. MAKING DATA ACCESSIBLE – Sales data

Dataset license: Owner: ME
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): N
Availability method: CSV
Tools to access: text editor/spreadsheet
Dataset source URL: N/A, because the company's dataset is not available through a URL
Access restrictions: N/A
Keyword/Tags: receipts
Archiving and preservation: Cloud

Table 105. MAKING DATA INTEROPERABLE – Sales data

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations


6.17.5 Dataset SECURITY

The dataset does not contain personal data (PD).

Table 106. DATASET SECURITY – Sales data

Personal Data (Y|N): N
Anonymized (Y|N|NA): N
Data recovery and secure storage: Y
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N

6.17.6 Ethics and Legal requirements

This dataset does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.18 JOT Dataset - Consumer data: Traffic source (Bing)

6.18.1 Dataset IDENTIFICATION

The dataset “Traffic sources (Bing)”, provided by JOT, focuses on historical campaign performance statistics of search data in Bing advertising platforms.

Table 107. DATASET IDENTIFICATION – Traffic source (Bing)

Category: Consumer Data
Data name: Traffic sources (Bing)
Description: Historical campaign performance statistics of search data in Bing advertising platforms
Provider: JOT
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4


6.18.2 Dataset ORIGIN

This dataset is available from February 2017 and cannot be defined as “core data”. It has a structured format, with a size of 1 TB and a growth of 1.5 GB per day. The dataset is generated in CSV format expressly for the project's purpose.

Table 108. DATASET ORIGIN – Traffic source (Bing)

Available at (M): M2
Core Data (Y|N): N
Size: 1 TB
Growth: 1.5 GB daily
Type and format: structured, CSV
Existing data (Y|N): N
Data origin: BING API

6.18.3 Dataset FORMAT

The dataset “Traffic source (Bing)” is in CSV format, and its data structure is illustrated in the following table. It has collected data from different European countries, in different languages (German, Spanish, French, English), since 2016, and it covers information related to City/Region/Country. The data is updated daily, which means that every day the dataset contains only the newly generated data.

Table 109. DATASET FORMAT – Traffic source (Bing)

Dataset structure:
Country: Country where the campaign is oriented.
Language: Language of the keywords and ads.
Category: Topic of the keyword. We have 22 categories, such as Travel, Finance, Vehicles and so forth [58].
Campaign Name: An account is formed by campaigns. The name of these campaigns contains some information, like the language or the category.
AdgroupId: Number given by Bing that identifies an ad group. A campaign is formed by ad groups.
AdNetworkType2: The network where keywords appear. It can be Bing search (the typical Bing search engine at www.bing.com) or a partner network (other web pages with the Bing search box).
Clicks: Number of times users click the ad.
Impressions: Number of times the ad is served and appears on the web.
Date: Date (XXXX/XX/XX) when the ad appears.
DayOfWeek: Day of the week when the ad appears.
Device: The device (PC, Tablet, Mobile) where the ad appears.
MonthOfYear: Month of the year when the ad appears.
Keyword: The search that the user types.
Bing_posicion_anuncio (Bing_Ad_Position): Position of the ad in the browser.
Location: City/Region/Country
Concordancia (Match type): Match type of the keyword. It shows how similar a user's query needs to be to the keyword for the ad to be shown.
Dataset format: CSV
Time coverage: since 2016
Spatial coverage: City/Region/Country
Languages: German, Spanish, French, English
Identifiability of data: Yes
Naming convention: BING_YYMMDD_XX
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A

[58] Taxonomy behind the categories used in BING can be found here: https://advertiseonbing.blob.core.windows.net/blob/bingads/media/library/o/blogpost/june%202015/bing_category_taxonomy.txt
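Each row carries both `Clicks` and `Impressions`, so a natural first analysis is the click-through rate (CTR = clicks / impressions) along one dimension. The sketch below aggregates a daily file by any column. It assumes that the CSV has a header row whose names match the field names above (for example `Device`, `Keyword`, `Clicks`, `Impressions`) and that the counts are integers. The exact column headers of the JOT files are not confirmed here.

```python
import csv
import io
from collections import defaultdict


def ctr_by(csv_text: str, key: str = "Device") -> dict:
    """Sum clicks and impressions per value of `key` and return the CTR per group."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        clicks[row[key]] += int(row["Clicks"])
        impressions[row[key]] += int(row["Impressions"])
    # Groups without impressions are skipped to avoid division by zero.
    return {k: clicks[k] / n for k, n in impressions.items() if n > 0}
```

Summing the counts before dividing, instead of averaging per-row CTRs, weights each group by its traffic. This is usually what campaign comparisons need.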

6.18.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform, and access is provided through credentials.

Table 110. MAKING DATA ACCESSIBLE – Traffic source (Bing)

Dataset license: Owner: JOT. Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Online Searches (Keywords)
Archiving and preservation: 5 years after project end

Table 111. MAKING DATA INTEROPERABLE – Traffic source (Bing)

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations


6.18.5 Dataset SECURITY

The dataset “Traffic source (Bing)” does not contain personal data. Secure storage and data recovery by JOT are expected.

Table 112. DATASET SECURITY – Traffic source (Bing)

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y|N): N/A
PD - anonymised during project (Y|N): N/A
PD - anonymised before project (Y|N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A

6.18.6 Ethics and Legal requirements

All the data that JOT Internet generates, shares and processes for the purpose of the EW-Shopp project (in compliance with the Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) does not include personal data. For that reason, JOT considers that the data managed in the project does not include any personal data, and no further action is needed.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.19 JOT Dataset - Consumer data: Traffic source (Google)

6.19.1 Dataset IDENTIFICATION

The dataset “Traffic sources (Google)”, provided by JOT, focuses on historical campaign performance statistics of search data in Google platforms.

Table 113. DATASET IDENTIFICATION – Traffic source (Google)

Category: Consumer Data
Data name: Traffic sources (Google)
Description: Historical campaign performance statistics of data in the Google platform.
Provider: JOT
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4

6.19.2 Dataset ORIGIN

The dataset is available from February 2017 and is defined as “core data”. It has a structured format (i.e. CSV), with a size of up to 3 TB and a growth of 4 GB per day. The dataset is generated expressly for the project's purpose.

Table 114. DATASET ORIGIN – Traffic source (Google)

Available at (M): M2
Core Data (Y|N): Y
Size: >3 TB
Growth: 4 GB daily
Type and format: structured, CSV
Existing data (Y|N): N
Data origin: GOOGLE API

6.19.3 Dataset FORMAT

The dataset “Traffic source (Google)” is in CSV format. It has collected data from different countries, in different languages (German, Spanish, Italian, Dutch, French, English, Portuguese, Russian), since 2016, and it covers information related to City/Region/Country. The data is updated daily, which means that every day the dataset contains only the newly generated data. The data structure is illustrated in the following table.

Table 115. DATASET FORMAT – Traffic source (Google)

Dataset structure:
Country: Country where the campaign is oriented.
Language: Language of the keywords and ads.
Category: Topic of the keyword. We have 22 categories, such as Travel, Finance, Vehicles and so forth [59].
Campaign Name: An account is formed by campaigns. The name of these campaigns contains some information, like the language or the category.
AdgroupId: Number given by Google that identifies an ad group. A campaign is formed by ad groups.
AdNetworkType2: The network where keywords appear. It can be Google search (the typical Google search engine at www.google.com) or a partner network (other web pages with the Google search box).
Clicks: Number of times users click the ad.
Impressions: Number of times the ad is served and appears on the web.
Date: Date (XXXX/XX/XX) when the ad appears.
DayOfWeek: Day of the week when the ad appears.
Device: The device (PC, Tablet, Mobile) where the ad appears.
MonthOfYear: Month of the year when the ad appears.
Keyword: The search that the user types.
Google_posicion_anuncio (Google_Ad_Position): Position of the ad in the browser.
Location: City/Region/Country
Concordancia (Match type): Match type of the keyword. It shows how similar a user's query needs to be to the keyword for the ad to be shown.
Dataset format: CSV
Time coverage: since 2016
Spatial coverage: City/Region/Country
Languages: German, Spanish, Italian, Dutch, French, English, Portuguese, Russian
Identifiability of data: Yes
Naming convention: GOOGLE_YYMMDD_XX
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A

[59] Taxonomy behind the categories used in Google can be found here: https://support.google.com/ads/answer/2842480?hl=en
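The JOT files follow the pattern `PROVIDER_YYMMDD_XX`, with the prefixes BING, GOOGLE and TWITTER. A small parser such as the one below can recover the source and the date from a file name, for example to pick the newest daily delta. This plan does not specify the meaning of the `XX` suffix, so the parser keeps it as an opaque string. It also assumes an optional `.csv` extension.

```python
import re
from datetime import date, datetime
from typing import List, Tuple

# PROVIDER_YYMMDD_XX with an optional .csv extension (assumption); XX is kept opaque.
FILENAME = re.compile(r"^(BING|GOOGLE|TWITTER)_(\d{6})_([A-Za-z0-9]+)(?:\.csv)?$")


def parse_jot_filename(name: str) -> Tuple[str, date, str]:
    """Split a JOT file name into (provider, day, suffix)."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    provider, yymmdd, suffix = match.groups()
    day = datetime.strptime(yymmdd, "%y%m%d").date()  # two-digit year, e.g. 17 -> 2017
    return provider, day, suffix


def latest(names: List[str], provider: str) -> str:
    """Return the most recent file of one provider, judged by the embedded date."""
    candidates = [n for n in names if FILENAME.match(n) and n.startswith(provider + "_")]
    if not candidates:
        raise ValueError(f"no files for provider {provider!r}")
    return max(candidates, key=lambda n: parse_jot_filename(n)[1])
```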

6.19.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform, and access is provided through credentials.

Table 116. MAKING DATA ACCESSIBLE – Traffic source (Google)

Dataset license: Owner: JOT. Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Online Searches (Keywords)
Archiving and preservation: 5 years after project end


Table 117. MAKING DATA INTEROPERABLE – Traffic source (Google)

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations

6.19.5 Dataset SECURITY

The dataset “Traffic source (Google)” does not contain personal data. Secure storage and data recovery by JOT are expected.

Table 118. DATASET SECURITY – Traffic source (Google)

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y|N): N/A
PD - anonymised during project (Y|N): N/A
PD - anonymised before project (Y|N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A

6.19.6 Ethics and Legal requirements

All the data that JOT Internet generates, shares and processes for the purpose of the EW-Shopp project (in compliance with the Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) does not include personal data. For that reason, JOT considers that the data managed in the project does not include any personal data, and no further action is needed.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.


6.20 JOT Dataset - Market data: Twitter trends

6.20.1 Dataset IDENTIFICATION

The dataset “Twitter Trends” is open data and focuses on trending topics as available through the Twitter APIs.

Table 119. DATASET IDENTIFICATION – Twitter trends

Category: Market data
Data name: Twitter Trends
Description: Trending topics as available through Twitter APIs
Provider: Open Data
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4

6.20.2 Dataset ORIGIN

The dataset “Twitter Trends” is available from May 2017 and cannot be defined as “core data”. It has a structured format, with a growth of 10 MB per day. The dataset is generated expressly for the project's purpose.

Table 120. DATASET ORIGIN – Twitter trends

Available at (M): M5
Core Data (Y|N): N
Size: N/A
Growth: 50 trending topics every 15 min per country (10 MB daily)
Type and format: structured, CSV
Existing data (Y|N): N
Data origin: Twitter API
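The growth figure can be restated as a number of rows per country per day. If one row is stored for each trending topic in each 15-minute snapshot, as the table suggests, the daily volume per country works out as follows.

```latex
\[
  50\ \frac{\text{topics}}{\text{snapshot}}
  \times \frac{24 \times 60\ \text{min/day}}{15\ \text{min/snapshot}}
  = 50 \times 96
  = 4800\ \frac{\text{topics}}{\text{country} \cdot \text{day}}
\]
```

The link to the stated 10 MB per day depends on the number of countries tracked and on the row width. The table gives neither figure.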

6.20.3 Dataset FORMAT

The dataset “Twitter trends” is in CSV format, and its data structure is illustrated in the following table. The dataset does not depend on language. Its spatial coverage is at country level, and it has collected data since May 2017. The data is updated daily.

Table 121. DATASET FORMAT – Twitter trends

Dataset structure:
Location: Country of the hashtag.
Date: Day of the list.
Hashtag: Name of the hashtag.
Promoted_Content: Shows whether a hashtag is promoted or not.
Tweets_Volume: Number of tweets of a hashtag.
Relevance: Hashtag's position.
Dataset format: CSV
Time coverage: M5
Spatial coverage: Country
Languages: N/A
Identifiability of data: Yes
Naming convention: TWITTER_YYMMDD_XX
Versioning: Daily
Metadata standards: N/A
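To use the trends as a market signal, one option is to keep only organic (non-promoted) hashtags and rank them by `Relevance`. The sketch rests on assumptions about the file layout:
• the file has a header row with the column names above;
• `Promoted_Content` holds a boolean-like string such as `True`/`False`;
• `Relevance` is an integer rank, with 1 as the top position;
• `Tweets_Volume` may be empty, because the Twitter API can report a trend without a volume.

```python
import csv
import io
from typing import List, Optional, Tuple

TRUTHY = {"true", "1", "yes"}  # assumed encodings of a promoted hashtag


def top_organic(csv_text: str, location: str, n: int = 3) -> List[Tuple[str, Optional[int]]]:
    """Return up to n (hashtag, tweet volume) pairs for one location, best rank first."""
    rows = [
        row for row in csv.DictReader(io.StringIO(csv_text))
        if row["Location"] == location
        and row["Promoted_Content"].strip().lower() not in TRUTHY
    ]
    rows.sort(key=lambda row: int(row["Relevance"]))  # rank 1 comes first
    return [
        (row["Hashtag"], int(row["Tweets_Volume"]) if row["Tweets_Volume"] else None)
        for row in rows[:n]
    ]
```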

6.20.4 Dataset ACCESS

The dataset is private, but it is accessible to all consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform, and access is provided through credentials.

Table 122. MAKING DATA ACCESSIBLE – Twitter trends

Dataset license: Owner: JOT. Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Hashtags
Archiving and preservation: 5 years after project end

No standard vocabulary or taxonomy is available for the “Twitter trends” dataset.

Table 123. MAKING DATA INTEROPERABLE – Twitter trends

Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities


6.20.5 Dataset SECURITY

The dataset “Twitter trends” does not contain personal data. Secure storage and data recovery by JOT are expected.

Table 124. DATASET SECURITY – Twitter trends

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y|N): N/A
PD - anonymised during project (Y|N): N/A
PD - anonymised before project (Y|N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A

6.20.6 Ethics and Legal requirements

All the data that JOT Internet generates, shares and processes for the purpose of the EW-Shopp project (in compliance with the Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) does not include personal data. For that reason, JOT considers that the data managed in the project does not include any personal data, and no further action is needed.

There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.

6.21 LOD Dataset - Geographic: DBpedia

6.21.1 Dataset IDENTIFICATION

The dataset “DBpedia” is publicly available and contains factual information from different areas of human knowledge, extracted from Wikipedia pages.

Table 125. DATASET IDENTIFICATION – DBpedia

Category: Geographic Dataset
Data name: DBpedia
Description: DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. The English version of the DBpedia knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4

6.21.2 Dataset ORIGIN

The dataset is available from January 2017 and cannot be defined as “core data”.

Table 126. DATASET ORIGIN – DBpedia

Available at (M): M1
Core Data (Y|N): N
Size: 735,000 places (including 478,000 populated places)
Growth: Not a fixed number, e.g. DBpedia 3.8: 2.8 GB, DBpedia 3.9: 2.4 GB, DBpedia 2015-04: 4.7 GB. More info: http://wiki.dbpedia.org/downloads-2016-04
Type and format: rdf, tuples
Existing data (Y|N): Y
Data origin: http://wiki.dbpedia.org/datasets

6.21.3 Dataset FORMAT

The dataset has worldwide coverage and contains data up to October 2016 in 125 languages.

Table 127. DATASET FORMAT – DBpedia

Dataset structure: provides data in N-Triples format (<subject> <predicate> <object> .)
Dataset format: .ttl, .tql
Time coverage: up to 10/2016
Spatial coverage: Global
Languages: Localized versions of DBpedia in 125 languages: English, German, Spanish, Catalan, Portuguese, Italian, French, Russian, Chinese, Slovenian, Croatian, Serbian, Arabic, Turkish, etc.
Identifiability of data: No
Naming convention: dbpedia_version/year
Versioning: No
Metadata standards: Yes: DBO, FOAF, SCHEMA.ORG, SKOS, etc.
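Each N-Triples line holds one statement terminated by ` .`. A deliberately simplified line parser such as the one below can be enough for quick inspection of a dump. It handles only IRIs, blank nodes, and plain, language-tagged or datatyped literals on a single line. For production use, a full RDF library such as rdflib is preferable.

```python
import re
from typing import Iterable, List, Tuple

# Simplified N-Triples grammar: subject, predicate, object, then " ." at end of line.
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def parse_ntriples(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples, skipping blank lines and comments."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a supported N-Triples line: {line!r}")
        triples.append(match.groups())
    return triples
```

The terms are returned exactly as written, with angle brackets, quotes, language tags and datatypes intact. Unescaping and typing literals is left to the caller.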

6.21.4 Dataset ACCESS

The dataset is public, and it is accessible to all consortium members.

Table 128. MAKING DATA ACCESSIBLE – DBpedia

Dataset license: GNU Free Documentation License
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: SPARQL ENDPOINT, DUMP
Tools to access: web service (REST/SOAP APIs), query endpoint
Dataset source URL: http://wiki.dbpedia.org/datasets
Access restrictions: No access restriction
Keyword/Tags: cross-domain: places, person, films, food, music, history etc.
Archiving and preservation: N/A
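For the SPARQL endpoint access method, a query can be sent over HTTP following the W3C SPARQL Protocol: a GET request with a `query` parameter, with the JSON results read back. The sketch assumes the public endpoint `https://dbpedia.org/sparql`, which the table above does not list. The query uses the DBpedia ontology terms `dbo:PopulatedPlace`, `dbo:country` and `dbo:populationTotal` purely as an example.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?place ?population WHERE {
  ?place a dbo:PopulatedPlace ;
         dbo:country dbr:Italy ;
         dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """GET request per the SPARQL Protocol, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_query(query: str = EXAMPLE_QUERY) -> List[Dict[str, str]]:
    """Send the query over the network and return the flattened bindings."""
    with urllib.request.urlopen(build_sparql_request(query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Public endpoints usually cap result sizes and execution time. Bulk extraction across many entities is therefore better served by the dumps.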

Table 129. MAKING DATA INTEROPERABLE – DBpedia

Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.21.5 Dataset SECURITY

The dataset does not contain personal data.

Table 130. DATASET SECURITY – DBpedia

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A


6.21.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “DBpedia” does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.

There are no ethical issues that can have an impact on sharing this dataset.

6.22 LOD Dataset - Geographic: Linked OpenStreetMaps

6.22.1 Dataset IDENTIFICATION

The dataset “Linked OpenStreetMaps” is publicly available and contains an editable map of the whole world.

Table 131. DATASET IDENTIFICATION – Linked OpenStreetMaps

Category: Geographic Dataset
Data name: Linked OpenStreetMaps
Description: OpenStreetMap is built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4

6.22.2 Dataset ORIGIN

The dataset is available from January 2017 and cannot be defined as “core data”.

Table 132. DATASET ORIGIN – Linked OpenStreetMaps

Available at (M): M1
Core Data (Y|N): N
Size: 5,027,330,590 GPS points
Growth: Not a fixed number
Type and format: Data normally comes in the form of XML-formatted OSM files
Existing data (Y|N): Y
Data origin: http://planet.openstreetmap.org/planet/planet-latest.osm.bz2


6.22.3 Dataset FORMAT

The dataset has worldwide coverage and collects data in all languages.

Table 133. DATASET FORMAT – Linked OpenStreetMaps

Dataset structure: XML
Dataset format: The two main formats used are PBF and compressed OSM XML. PBF is a binary format that is smaller to download and much faster to process, and it should be used when possible. Most common tools that use OSM data support PBF.
Time coverage: up to date
Spatial coverage: Worldwide. All the nodes, ways and relations that make up the map
Languages: All languages
Identifiability of data: No
Naming convention: N/A
Versioning: Each week, a new and complete copy of all data in OpenStreetMap is made available as both a compressed XML file and a custom PBF format file. Also available is the 'history' file, which contains not only up-to-date data but also older versions of data and deleted data items.
Metadata standards: Yes: DBO, FOAF, SCHEMA.ORG, SKOS, etc.
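The planet file is very large, so it is usually processed as a stream instead of being loaded into memory. The sketch below reads OSM XML nodes with the standard library's incremental parser. It keeps only the nodes that carry a given tag key, for example `amenity`, which covers cafés. The sketch handles only the XML flavour. Reading PBF files would need a dedicated library such as pyosmium, which is not shown here.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Optional


def iter_nodes(stream: IO[bytes], key: Optional[str] = None) -> Iterator[Dict[str, object]]:
    """Yield OSM nodes (id, lat, lon, tags), optionally only those having tag `key`."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "node":
            continue
        tags = {tag.get("k"): tag.get("v") for tag in elem.findall("tag")}
        if key is None or key in tags:
            yield {
                "id": int(elem.get("id")),
                "lat": float(elem.get("lat")),
                "lon": float(elem.get("lon")),
                "tags": tags,
            }
        # Drop the node's children and attributes once processed to limit memory use.
        elem.clear()


def parse_osm_bytes(data: bytes, key: Optional[str] = None) -> Iterator[Dict[str, object]]:
    """Convenience wrapper for small, already-loaded OSM XML extracts."""
    return iter_nodes(io.BytesIO(data), key)


def iter_planet_nodes(path: str, key: Optional[str] = None) -> Iterator[Dict[str, object]]:
    """Stream nodes straight from a bzip2-compressed file such as planet-latest.osm.bz2."""
    with bz2.open(path, "rb") as stream:
        yield from iter_nodes(stream, key)
```

Ways and relations are skipped here. Linking them to places requires resolving their node references, which is a separate pass.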


6.22.4 Dataset ACCESS

The dataset is public, and it is accessible to all consortium members.

Table 134. MAKING DATA ACCESSIBLE – Linked OpenStreetMaps

Dataset license: OpenStreetMap is open data, licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump, keyword based
Tools to access: API/dump, SPARQL wrapper
Dataset source URL: http://wiki.openstreetmap.org/wiki/Use_OpenStreetMap
Access restrictions: No access restriction
Keyword/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A

Table 135. MAKING DATA INTEROPERABLE – Linked OpenStreetMaps

Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.22.5 Dataset SECURITY

The dataset does not contain personal data.

Table 136. DATASET SECURITY – Linked OpenStreetMaps

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A


6.22.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “Linked OpenStreetMaps” does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.

There are no ethical issues that can have an impact on sharing this dataset.

6.23 LOD Dataset - Geographic: LinkedGeoData

6.23.1 Dataset IDENTIFICATION

The dataset “LinkedGeoData” is publicly available and contains geographic information for places, cities, countries, etc.

Table 137. DATASET IDENTIFICATION – LinkedGeoData

Category: Geographic Dataset
Data name: LinkedGeoData
Description: LinkedGeoData is an effort to add a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4

6.23.2 Dataset ORIGIN

The dataset is available from January 2017 and cannot be defined as “core data”.

Table 138. DATASET ORIGIN – LinkedGeoData

Available at (M): M1
Core Data (Y|N): N
Size: 8.3 GB
Growth: Not a fixed number
Type and format: .nt
Existing data (Y|N): Y
Data origin: http://downloads.linkedgeodata.org/releases/

6.23.3 Dataset FORMAT

The dataset contains data up to November 2015, in English.

Table 139. DATASET FORMAT – LinkedGeoData

Dataset structure: N-Triples
Dataset format: .nt
Time coverage: up to November 2015
Spatial coverage: It consists of more than 3 billion nodes and 300 million ways, and the resulting RDF data comprises approximately 20 billion triples. The data is available according to the Linked Data principles and interlinked with DBpedia and GeoNames.
Languages: English
Identifiability of data: No
Naming convention: N/A
Versioning: No versioning
Metadata standards: Linked open geo vocabulary

6.23.4 Dataset ACCESS

The dataset is public, and it is accessible to all consortium members through a dump.

Table 140. MAKING DATA ACCESSIBLE – LinkedGeoData

Dataset license: ODbL
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump
Tools to access: dump
Dataset source URL: http://downloads.linkedgeodata.org/releases/
Access restrictions: No access restriction
Keyword/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A

Table 141. MAKING DATA INTEROPERABLE – LinkedGeoData

Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.23.5 Dataset SECURITY

The dataset does not contain personal data.

Table 142. DATASET SECURITY – LinkedGeoData

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

6.23.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “LinkedGeoData” does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.

There are no ethical issues that can have an impact on sharing this dataset.

6.24 LOD Dataset - Geographic: GeoNames

6.24.1 Dataset IDENTIFICATION

The dataset “GeoNames” is publicly available and contains geographic information for places, cities, countries, etc.

Table 143 DATASET IDENTIFICATION – GeoNames

Category: Geographic Dataset
Data name: GeoNames
Description: The GeoNames geographical database is available for download free of charge under a Creative Commons attribution license. It contains over 10 million geographical names and consists of over 9 million unique features, 2.8 million of which are populated places, plus 5.5 million alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4

6.24.2 Dataset ORIGIN

The dataset has been available since January 2017 and is not classified as “core data”.

Table 144 DATASET ORIGIN – GeoNames

Available at (M): M1
Core Data (Y|N): N
Size: 10.6 GB zipped
Growth: Not a fixed number
Type and format: RDF
Existing data (Y|N): Y
Data origin: https://drive.google.com/file/d/0B1tUDhWNTjO-WEZZb2VwOG5vZkU/edit?usp=sharing/

6.24.3 Dataset FORMAT

The dataset collects data related to all countries.

Table 145 DATASET FORMAT – GeoNames

Dataset structure: RDF
Dataset format: RDF
Time coverage: up to date
Spatial coverage: All countries and points in degrees (long & lat)
Languages: English
Identifiability of data: No
Naming convention: No
Versioning: daily dump
Metadata standards: GeoNames vocabulary
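The table states that GeoNames is RDF described with the GeoNames vocabulary and that its points are expressed in degrees (long & lat). A consumer will typically want to pull names and coordinates out of each feature. The Python sketch below does this for a single RDF/XML document using only the standard library. Several things are assumptions to check against the actual dump rather than facts stated in this plan:

- The namespaces are the GeoNames ontology (`http://www.geonames.org/ontology#`) and the W3C WGS84 position vocabulary.
- Features are encoded as `gn:Feature` elements with `gn:name`, `wgs84_pos:lat` and `wgs84_pos:long` children.
- The plan does not describe how records are delimited inside the 10.6 GB archive, so the sketch only handles one RDF/XML document string.

The names `Place` and `parse_features` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Assumed namespaces: RDF, the GeoNames ontology and the W3C WGS84 position vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gn": "http://www.geonames.org/ontology#",
    "wgs84_pos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}
RDF_ABOUT = "{" + NS["rdf"] + "}about"


class Place(NamedTuple):
    uri: str
    name: str
    lat: float   # decimal degrees
    long: float  # decimal degrees


def parse_features(rdf_xml: str) -> list[Place]:
    """Extract URI, name and coordinates from top-level gn:Feature elements."""
    root = ET.fromstring(rdf_xml)
    places = []
    for feature in root.findall("gn:Feature", NS):
        lat = feature.findtext("wgs84_pos:lat", namespaces=NS)
        long_ = feature.findtext("wgs84_pos:long", namespaces=NS)
        if lat is None or long_ is None:
            continue  # skip features that carry no coordinates
        places.append(Place(
            uri=feature.get(RDF_ABOUT, ""),
            name=feature.findtext("gn:name", default="", namespaces=NS),
            lat=float(lat),
            long=float(long_),
        ))
    return places
```

Features without coordinates are skipped rather than rejected. This is a design choice of the sketch, not something the dataset description requires.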


6.24.4 Dataset ACCESS

The dataset is public and is accessible to all consortium members as a dump.

Table 146 MAKING DATA ACCESSIBLE – GeoNames

Dataset license: CC-BY 3.0 (see footnote 60)
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump
Tools to access: dump
Dataset source URL: http://download.geonames.org/export/dump/
Access restrictions: No access restriction
Keywords/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A

Table 147 MAKING DATA INTEROPERABLE – GeoNames

Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities

6.24.5 Dataset SECURITY

The dataset does not contain personal data.

Table 148 DATASET SECURITY – GeoNames

Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD anonymised during project (Y|N): N
PD anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A

60 https://creativecommons.org/licenses/by/3.0


6.24.6 Ethics and Legal requirements

Based on the above dataset description, the dataset “GeoNames” does not contain personal data. Therefore, the national and European legal framework that regulates the use of personal data does not apply, and no copy of opinion needs to be collected.

There are no ethical issues that can have an impact on sharing this dataset.

6.25 Mapping between Dataset and Business case

The following table shows which datasets are used in each business case.

Table 149 Mapping Dataset and Business case

id | Dataset name | Provider | BC1 BC2 BC3 BC4
1 | Purchase intent | Ceneje | X
2 | Location analytics data (hourly) | Measurence | X
3 | Location analytics data (daily) | Measurence | X
4 | Customer Purchase History | BigBang | X
5 | Consumer Intent and Interaction | BigBang | X
6 | Location analytics data (weekly) | Measurence | X
7 | Contact and Consumer Interaction History | Browsetel | X
8 | MARS (historical data) | ECMWF | X X X X
9 | Product attributes | Ceneje | X
10 | Event Registry | JSI | X X X X
11 | Consumer data | GfK | X
12 | Sales data | GfK | X X X
13 | Product attributes | GfK | X X X
14 | Door counter data | Measurence | X
15 | Product attributes | BigBang | X
16 | Products price history | Ceneje | X
17 | Sales data | Measurence | X
18 | Traffic sources (Bing) | JOT | X
19 | Traffic sources (Google) | JOT | X
20 | Twitter Trends | JOT | X
21 | DBpedia | LOD | X X X X
22 | Linked Open Street Maps | LOD | X X X X
23 | LinkedGeoData | LOD | X X X X
24 | GeoNames | LOD | X X X X
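Table 149 is a many-to-many relation between datasets and business cases. A common query on it is the inverse lookup: which datasets does a given business case draw on? The Python sketch below shows that inversion.

The column position of the single "X" marks is not recoverable from this version of the table. The sketch therefore encodes only the six rows marked in all four business-case columns: MARS, Event Registry and the four LOD datasets. The names `DATASET_TO_BCS` and `datasets_by_business_case` are illustrative, not part of the EW-Shopp platform.

```python
from collections import defaultdict

ALL_BCS = ("BC1", "BC2", "BC3", "BC4")

# (dataset name, provider) -> business cases, limited to the Table 149 rows
# that are marked in all four business-case columns.
DATASET_TO_BCS = {
    ("MARS (historical data)", "ECMWF"): ALL_BCS,
    ("Event Registry", "JSI"): ALL_BCS,
    ("DBpedia", "LOD"): ALL_BCS,
    ("Linked Open Street Maps", "LOD"): ALL_BCS,
    ("LinkedGeoData", "LOD"): ALL_BCS,
    ("GeoNames", "LOD"): ALL_BCS,
}


def datasets_by_business_case(mapping):
    """Invert a dataset -> business-cases mapping into business case -> datasets."""
    index = defaultdict(list)
    for (name, provider), business_cases in mapping.items():
        for bc in business_cases:
            index[bc].append(f"{name} ({provider})")
    return dict(index)
```

The remaining rows can be added once their business-case columns are confirmed, using the same `(name, provider) -> tuple of BCs` shape.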


Chapter 7 Storage and Re-use

7.1 Storage

Data in EW-Shopp will be exchanged and made available through a two-tier storage policy. The policy will consist of:

• Tier 1: a shared data space for exchanging raw input data between Consortium partners.
• Tier 2: structured data storage with integrated data, based on the DataGraft platform, which will be used to produce the integrated data according to a shared data model.

Tier 1 will be implemented using a file or data sharing solution. It will use cloud hosting infrastructure services to enable easy access over the web. Data will be stored using a data hosting service and exchanged using secure data sharing protocols to ensure that they are not compromised.
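The plan does not state how a recipient in Tier 1 confirms that a received file has not been corrupted or tampered with. One common practice, shown here only as an assumption, is for the sender to publish a SHA-256 checksum alongside the file and for the recipient to recompute and compare it. This is a minimal Python sketch using the standard `hashlib` module. The function names are hypothetical, and the sketch is not a mechanism the consortium has adopted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large dumps need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: Path, expected_hex: str) -> bool:
    """True if the received file matches the checksum published by the sender."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A checksum only detects tampering if the checksum itself reaches the recipient over a trusted channel. Confidentiality in transit still depends on the transfer protocol (for example SFTP or TLS).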

Tier 2 will be implemented on the DataGraft platform, where the shared data model will be published and the output data will be imported into a database management system and registered in the catalogue, taking into account the user access restrictions for each dataset.

7.2 Backup and Recovery

Back-up and recovery mechanisms will be implemented on a case-by-case basis for each output dataset. Input datasets already have back-up and recovery in place (where needed) and are managed directly by the data providers. Therefore, no backup or recovery mechanism for input datasets falls within the scope of the EW-Shopp platform.

The concrete data back-up and recovery mechanisms to be adopted at the EW-Shopp platform level will be discussed in future versions of the Data Management Plan as they evolve throughout the project. They may also be covered in other deliverables dealing with technical aspects, such as the detailed design of the platform or the business case implementation plans.

7.3 Data Archiving

The data used and produced during the project will be updated each time they change within the project lifetime. For each dataset update, a reference document will also be produced, reporting the changes to the dataset with respect to the previous version.
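The plan does not fix the format of the per-update reference document. As one possible starting point, the Python sketch below computes the raw material for such a report: which records were added and which were removed between two versions.

It assumes each version can be treated as an unordered set of records, such as N-Triples lines or CSV rows. Under that assumption a modified record shows up as one removal plus one addition. The names `ChangeReport` and `diff_versions` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChangeReport:
    previous_version: str
    new_version: str
    added: frozenset
    removed: frozenset

    def summary(self) -> str:
        return (f"{self.previous_version} -> {self.new_version}: "
                f"{len(self.added)} added, {len(self.removed)} removed")


def diff_versions(previous_version: str, previous_records: Iterable[str],
                  new_version: str, new_records: Iterable[str]) -> ChangeReport:
    """Compare two versions of a dataset whose records are order-independent."""
    before, after = set(previous_records), set(new_records)
    return ChangeReport(previous_version, new_version,
                        added=frozenset(after - before),
                        removed=frozenset(before - after))
```

For example, `diff_versions("v1", ["a", "b"], "v2", ["b", "c"]).summary()` gives `"v1 -> v2: 1 added, 1 removed"`. Datasets larger than memory would need an external sort or a database-side comparison instead of in-memory sets.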

EW-Shopp datasets used in the demonstrator will be maintained for at least five years after project termination. Sensitive data preservation will follow the guidelines that the EW-Shopp consortium will provide during the project.


7.4 Security

The EW-Shopp framework will ensure the secure storage and exchange of data in the project to protect sensitive data from being compromised. One of the main components that will be used for the EW-Shopp framework and data setup is the DataGraft platform (Tier 2). DataGraft security is implemented on several layers, as follows:

1) User login – Account information is protected by a password, which is encrypted; DataGraft does not store the non-encrypted version. Furthermore, current deployments of DataGraft use SSL certificates enabled through the CloudFront CDN on AWS. Other SSL configurations are also possible if necessary;

2) OAuth2 – DataGraft uses a standard implementation of RFC 6749, a token-based authorisation layer for controlling client access to resources;

3) API keys for database – The public API of the DataGraft back-end database (Ontotext S4) is accessible through an API key, which can be created and managed by registered users of the platform; and

4) Encrypted cookies – Front-end cookies containing session information are exchanged between the web UI and the back-end. These cookies store a session identifier and encrypted session data while users are logged into the DataGraft Portal.
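Item 2 refers to the OAuth 2.0 framework (RFC 6749). The Python sketch below illustrates what a generic client does under that standard. It uses the client credentials grant (RFC 6749, section 4.4): the client builds a token request authenticated with HTTP Basic (section 2.3.1), then presents the issued access token as a Bearer header (RFC 6750, section 2.1).

DataGraft's token endpoint, supported grant types and scopes are not described in this plan. The URL used with the sketch is therefore a placeholder, and the grant type is an assumption. The request is only built, not sent.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def build_token_request(token_url: str, client_id: str, client_secret: str,
                        scope: Optional[str] = None) -> urllib.request.Request:
    """Client credentials grant request (RFC 6749, section 4.4.2); not sent here."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    # Client authentication with HTTP Basic (RFC 6749, section 2.3.1):
    # id and secret are form-urlencoded, joined with ':' and base64-encoded.
    pair = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(pair.encode("ascii")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(access_token: str) -> dict:
    """Header for presenting the issued token to a resource server (RFC 6750, section 2.1)."""
    return {"Authorization": f"Bearer {access_token}"}
```

In practice the request would be sent with `urllib.request.urlopen` over HTTPS. The `access_token` field of the JSON response would then be passed to `bearer_header` for subsequent API calls.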

Security will also be considered for data exchange between partners (Tier 1) and for sharing before the final data integration/publication. The particular security measures will be taken on a case-by-case basis, depending on the medium used for data exchange and the precise needs of each data provider. They will include the following:

1) Setting up security policies on cloud service providers
2) Setting up a secure FTP server for transferring files over the Internet
3) Setting up secret SSH keys for accessing the servers or server clusters running the databases that host any shared dataset

7.5 Permission

Permission policies will be provided to make EW-Shopp compliant with privacy-preserving data management. The platform will provide authentication mechanisms that ensure data security, as stated in Section 7.4 (supported by the chosen data exchange medium in Tier 1 and by the DataGraft platform). These mechanisms will restrict access to data files to the research personnel involved in EW-Shopp development.
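The per-dataset tables in Chapter 6 record two access fields: "Availability (public|private)" and "Availability to EW-Shopp partners (Y|N)". Section 7.5 adds that non-public data is restricted to authenticated research personnel. The Python sketch below combines these into a single access check. It is only an illustration under those assumptions and does not represent DataGraft's actual permission model. The classes and the `can_access` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    name: str
    public: bool            # "Availability (public|private)"
    partners_access: bool   # "Availability to EW-Shopp partners (Y|N)"


@dataclass(frozen=True)
class User:
    username: str
    authenticated: bool     # passed the authentication layers described in Section 7.4
    project_member: bool    # research personnel involved in EW-Shopp


def can_access(user: User, policy: DatasetPolicy) -> bool:
    """Public datasets are open to all; others only to authenticated project members."""
    if policy.public:
        return True
    return user.authenticated and user.project_member and policy.partners_access
```

Under this rule, a public dataset such as GeoNames is readable by anyone. A private dataset requires an authenticated project member and must also be marked as available to partners.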

7.6 Access, Re-use and Licensing

Details of how each input dataset is shared can be found in Chapter 6 under "Dataset ACCESS", together with the individual license of each dataset. Access to these datasets will be provided to the whole EW-Shopp Consortium, exclusively for the project objectives.


Datasets produced as a result of the project work will be shared within the Consortium. External sharing will only be allowed with the consensual approval of the relevant Consortium stakeholders and upon acceptance of the terms and conditions of use, as appropriate. The license for the access, sharing and re-use of EW-Shopp material and output datasets will be defined by the Consortium on a case-by-case basis.

The research data will be presented in scientific publications that the consortium will write and publish during the funding period. Materials generated under the Project will be disseminated in accordance with the Consortium Agreement.


Annex A – DMP Survey

This annex contains the DMP survey with the questions asked to the data providers.

Topic Question

DATASET IDENTIFICATION
a. Name of the dataset. Specify a self-explanatory name for the dataset.

b. Dataset owner/publisher/provider name. Specify the name of the beneficiary providing the dataset (or in charge of bringing it into the project).

c. Contact person. Specify the name and contact details of the person to be contacted for further details about the dataset.

d. State the purpose of the data collection/generation. Specify what the data is about and what it is currently used for.

e. Explain the relation to the objectives of the project. Which business case does it relate to?

DATASET ORIGIN
a. Specify the types and formats of data generated/collected. Provide a high-level description of the data constituting the dataset.

b. Specify if existing data is being re-used (if any). Specify whether the dataset re-uses existing data, and from where.

c. Specify the origin of the data. Specify how the data in the dataset is being collected/generated. Select one of the following categories: Web scraping, Web metering software, derived from other datasets, Instrumentation or sensors, Administrative archives, Crowdsourcing, Survey or census, Other (add explanation), N/A.

d. State the expected size of the dataset (if known). For a static dataset, provide a ROM estimate in MB/GB/TB. For a dynamic dataset, provide a ROM estimate at the most appropriate rate in MB/GB/TB per hour/day/week/month/other.

DATASET FORMAT
a. Outline the dataset structure (metadata provision). Describe the structure and type of the data. For example, describe the header columns, the JSON schema, REST response fields, etc.

b. Outline the dataset format. Specify whether it uses, for example, CSV, Excel spreadsheet, XML, JSON, GeoJSON, Shapefile, HTTP stream, etc.

c. Time coverage. If the dataset has a time dimension, what period does it cover?

d. Spatial coverage. If the dataset relates to a spatial region, what is its coverage?

e. Language. Languages of metadata, attributes, code lists, descriptions.

f. Outline the identifiability of data and refer to standard identification mechanisms. Do you make use of persistent and unique identifiers such as Digital Object Identifiers?

g. Outline the naming conventions used. If the dataset is not static, describe how the dataset can be identified when updated or after a versioning task has been performed.

h. Outline the approach for versioning. How often is the data updated (No planned updating, Annually, Quarterly, Monthly, Weekly, Daily, Hourly, Every few minutes, Every few seconds, Real-time)? How is versioning managed? For example, if daily: is a new dataset generated every day with only the newly created data, or does a new dataset every day override the old one, containing all the data generated since the beginning of the collection, …?

i. Specify standards for metadata creation (if any). If there are no standards in your discipline, describe what metadata will be created and how.

If you annotate your dataset with metadata, please specify whether you are using any existing standard. For example, if your dataset contains a text entry and you annotate the text with metadata belonging to a specific taxonomy, please specify the taxonomy you are referring to.

MAKING DATA ACCESSIBLE
a. Specify dataset license. If the dataset is released as open data, specify the license used: CC0, CC-BY, CC-BY-SA, CC-BY-ND, CC-BY-NC, CC-BY-NC-SA, CC-BY-NC-ND, PDDL, ODC-by, ODbL, Other or proprietary (please provide a link if possible). Otherwise, specify who has access to the dataset (for example, all partners in the consortium, some partners for the purpose of tool development, only a sample will be disclosed, etc.).


b. Specify how the data will be made available. For example, web page in the browser, web service (REST/SOAP APIs), query endpoint, file download, DB dump, directly shared by the responsible organisation, etc.

c. Specify what methods or software tools are needed to access the data. Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. as open source code)?

d. Specify where the data and associated metadata, documentation and code are deposited (provide the dataset source URL, if applicable).

e. Specify how access will be provided in case there are any restrictions.

f. Theme/tags. Categorize the dataset and provide some relevant keywords/tags. Select one of the following categories: "product categories", "price", "consumer electronics", "other (add explanation)".

g. Archiving and preservation (including storage and backup). Describe the procedures that will be put in place for long-term preservation of the data. Indicate how long the data should be preserved, what its approximate end volume is, what the associated costs are and how these are planned to be covered.

MAKING DATA INTEROPERABLE

a. Assess the interoperability of your data. Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability.

b. Specify whether you will be using a standard vocabulary for all data types present in your dataset, to allow inter-disciplinary interoperability. If not, will you provide a mapping to more commonly used ontologies?

DATA SECURITY
a. Address data recovery as well as secure storage and transfer of sensitive data.

b. Refer to other national/funder/sectorial/departmental procedures for data management and/or privacy that you are using (if any).

c. Data privacy. Specify whether the dataset includes personally identifiable information (PII). Are data anonymised? If so, what technique is used? Does the data owner collect privacy permission to process the data (and what are the limitations)? Does your data include IP addresses?

