
Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Date posted: 11-Jul-2015
Uploaded by: kimmens
Managing the Consistency of (Evolving) Information Systems with Intensional Views and Relational Algebra, applied to a case of IP phone localisation. David Colpaert, Kim Mens & Bernard Lambeau. Presented at BENEVOL 2014, Amsterdam, 28 November 2014. Based on David Colpaert’s Master Thesis in Computer Science at UCL, Belgium, 19 June 2014.
Transcript
Page 1: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Managing the Consistency of (Evolving) Information Systems with Intensional Views and Relational Algebra

applied to a case of IP phone localisation

David Colpaert, Kim Mens & Bernard Lambeau
Presented at BENEVOL 2014, Amsterdam, 28 November 2014

Based on David Colpaert’s Master Thesis in Computer Science at UCL, Belgium, 19 June 2014

Page 2: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra


MAY CONTAIN TRACES OF FRENCH

Page 3: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Plan
• Introduction
• Initial Solution: Case Study, Validation
• Improved Solution: Intensional Views, Validation
• Conclusion

Page 4: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

When evolving, migrating or merging databases, how to detect potential inconsistencies that may exist in the data?
• Data coming from multiple contradictory or incomplete sources
• Preferably via an easy-to-understand graphical user interface

Case study: localisation of IP telephones at a university

Need a generic tool to describe and detect consistency rules

Page 5: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Localise IP telephones at a university in case of emergency calls
• By merging data coming from different sources
• Via automated scripts
• While identifying potential errors in the data

Generic tool for managing IP telephones

Web Interface

Page 6: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

3 sources for localisation data:
• Via the « deployment » Excel files
• Via the network (IP switches)
• Via the telephone exchange system MX1 and the SAP system

Diagram: the Deployment, Network and MX1+SAP sources feed into a Merging step.

Extract of the deployment file:

Numéro UCL (UCL number) | Bâtiment (building) | Local (room)
UA-00001 | SC16 | A 001

Page 7: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra


Page 8: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

3 sources for localisation data:
• Via the « deployment » Excel files
• Via the network (IP switches)
• Via the telephone exchange system MX1 and the SAP system

One script per source

Merging of the data from these sources

Scripts and merges executed daily

20,000 lines of code

Currently used in production at the university

Page 9: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra
Page 10: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra


Page 11: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Multiple errors, inconsistencies and missing information
• In each of the sources individually
• When merging the data

Errors logged in files
• Difficult to manipulate
• Difficult to understand
• Difficult to solve
• Hard to see the «bigger picture»

Examples of some errors and warnings (translated from the French log output):

[ERROR] The room for phone number 43325 is missing at line 34.

[WARNING] A deployment already exists for UCL number TA-00803 at line 47.

[ERROR] Switch identifier not found: SalleOleffe01281098 (172.31.28.137) for MAC address: 00:08:5d:35:32:cc

[ERROR] The building for MAC address 00:08:5d:35:3c:d2 is missing

[ERROR] The building Logement 456 cannot be converted to a building code because this name is unknown in the buildings table

[ERROR] Phone number 73999 does not exist in the SAP file

Page 12: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Different kinds of inconsistencies (labels shown on the slide, in order):
• Missing data
• Inconsistent data
• Missing data
• Missing data
• Incorrect data
• Missing data

Page 13: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Figure panels:
• Errors in the deployment files
• Errors in the network data
• Errors in the MX1+SAP data
• Errors in the merged data

Page 14: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Expressing constraints on the data
• Over multiple tables and fields
• While filtering irrelevant entries

Detecting and inspecting inconsistencies
• With respect to these constraints

Simplicity of expression without sacrificing expressiveness

Page 15: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Combine 2 existing ideas:
• Intensional Views
• Relational Algebra Querying with Alf

Make a generic tool for defining and checking constraints over the data

Via an easy-to-use user interface

Validation by applying it to our case

Page 16: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Intensional Views

Originally designed for software (code) quality assurance purposes

Allows expressing and verifying structural source-code regularities

Reuse this idea for expressing and detecting database constraints

Page 17: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Alf (www.try-alf.org)

A database query language based on relational algebra

project, restrict, join, union, intersect, minus, …

« join_on(left_table, right_table, [:mac]) »
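To make the operator names on this slide concrete, here is a minimal Python sketch of what project, restrict, join and minus compute. It is not Alf code and does not reproduce Alf's API: the function names only mirror the operators listed above, relations are modelled as plain lists of dictionaries, and the sample rows are illustrative assumptions.

```python
# Hypothetical in-memory model: a relation is a list of dicts (one dict per tuple).

def project(relation, attrs):
    """Keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

def restrict(relation, predicate):
    """Keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]

def join_on(left, right, attrs):
    """Combine tuples of both relations that agree on the given attributes."""
    return [
        {**r, **l}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in attrs)
    ]

def minus(left, right):
    """Tuples of the left relation that do not occur in the right one."""
    return [row for row in left if row not in right]

# Illustrative data only (not taken from the university's databases).
network = [
    {"mac": "00:08:5d:00:00:01", "building": "SC16"},
    {"mac": "00:08:5d:00:00:02", "building": "SC052"},
]
attributions = [
    {"mac": "00:08:5d:00:00:01", "ucl_id": "UA00001"},
]

joined = join_on(network, attributions, ["mac"])
buildings = project(network, ["building"])
not_in_sc16 = minus(network, restrict(network, lambda r: r["building"] == "SC16"))
```

Here `joined` contains the single phone whose MAC address appears in both relations, which is the kind of result the slide's `join_on(left_table, right_table, [:mac])` example is after.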

Intensional views

Initially intended for software maintenance

Checking constraints on source code

Here, constraints on a database

It is also possible to define additional filters in order to consider only a subset of the data, for instance, only one particular building. This can be useful, for example, when the user is analyzing a very large database with many inconsistencies and wants to inspect the inconsistencies for a particular subset of the data only. In our particular example, we didn’t apply any such filters.

Finally, when the user clicks on the ‘Check constraint’ button, three different Alf queries are generated. The first one is a query to find the positive results, i.e. all tuples that satisfy the declared constraint. A second query calculates the mismatches in the source table, i.e. all tuples in the source table that do not satisfy the declared constraint. A third query calculates the mismatches in the target table.

The generated Alf query for the positive results looks somewhat like this:

    restrict(
      join_on(source_table, target_table, common_key),
      eq(:source_table_building, :target_table_building) &
      eq(:source_table_room, :target_table_room))

From this query it can be observed that the two concerned tables are first joined on their common key, and that the results are then restricted to the tuples satisfying all conditions, i.e. that the buildings and rooms must be equal. In reality, the actual generated query (see footnote 2) is a bit more complex than this, to take into account custom mappings (in our example, for instance, there is no common key but the correspondence between MAC addresses and UCL IDs needs to be looked up in an intermediate table), renaming (for instance, when two corresponding fields have a different name in the different tables), and filters (an extra restriction based on the specified filters should be applied).
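The three generated queries (positive results, mismatches in the source table, mismatches in the target table) can be sketched in Python as follows. This is only an approximation under stated assumptions: it is not the tool's query generator, it assumes the simplified case of a common key that is unique in each table (no intermediate mapping table, renaming or filters), and the function name `check_constraint` and all sample rows are hypothetical.

```python
def check_constraint(source, target, key, fields):
    """Return (positives, source_mismatches, target_mismatches) as lists of keys.

    Assumes `key` is unique within each table (a simplification).
    """
    source_by_key = {row[key]: row for row in source}
    target_by_key = {row[key]: row for row in target}

    def satisfied(row, other_by_key):
        # A tuple satisfies the constraint if a corresponding tuple exists
        # in the other table and all compared fields are equal.
        other = other_by_key.get(row[key])
        return other is not None and all(row[f] == other[f] for f in fields)

    positives = [r[key] for r in source if satisfied(r, target_by_key)]
    source_mismatches = [r[key] for r in source if not satisfied(r, target_by_key)]
    target_mismatches = [r[key] for r in target if not satisfied(r, source_by_key)]
    return positives, source_mismatches, target_mismatches

# Illustrative data only.
network = [
    {"ucl_id": "UA00001", "building": "SC16", "room": "A 001"},
    {"ucl_id": "UA00002", "building": "SC052", "room": "A003"},
    {"ucl_id": "UA00003", "building": "SC16", "room": "A 004"},
]
deployments = [
    {"ucl_id": "UA00001", "building": "SC16", "room": "A 001"},
    {"ucl_id": "UA00002", "building": "SC051", "room": "A 002"},
]

positives, source_mismatches, target_mismatches = check_constraint(
    network, deployments, "ucl_id", ["building", "room"]
)
```

In this sketch a phone lands in the mismatches either because a compared field differs or because no corresponding tuple exists in the other table, which matches the two meanings of a negative result described in the text.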

Each of these generated queries is then executed through Alf. As exemplified by Figure 1, positive results are displayed in the table at the bottom center of the GUI, whereas negative results are shown on the bottom left and right, respectively. (For non-bidirectional relations there will be no table either on the left or on the right.)

In our example, we see that only one phone (the one with MAC address 00:08:5d:00:00:01 and UCL-ID UA00001) satisfies the constraint of having the same location in both sources. For all other phones, we find inconsistencies and they thus end up in the negative results. A negative result means that either the building or room was different in the other table, or that no correspondence whatsoever was found for this phone in the other table.

Whereas the presented positive and negative results already provide a lot of useful information about detected (in)consistencies in the data, they are not always easy for the end-user to interpret because they are not shown in the context of the original tables. For this purpose, our tool provides an alternative highlighted view which simply highlights the detected (in)consistencies in the original tables. To open this view it suffices to click on the button ‘Highlighted view’ at the bottom of the intensional view editor.

Footnote 2: More details on the query generation process can be found in [8].

Fig. 2. Inspecting data (in)consistencies with the highlighted view.

Figure 2 illustrates what this highlighted view would look like for our previous example. It displays each of the concerned tables, that is, the source and target tables but also the intermediate table used for defining the key mapping. For each of these tables the tuples are coloured red if they correspond to an inconsistency, green if they correspond to a positive result, or left white if the tuple is not concerned by this particular constraint.

In our example, we see that three tables are concerned: the locations from the network and from deployments, but also the intermediate attribution table which maps MAC addresses to phone IDs. The only positive case appears in green, all others in red. One element in the attributions table appears in white because no element in either the network or deployments table had such a MAC address or UCL-ID.
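The colouring rule of the highlighted view can be sketched as a small Python function. This is an assumption-laden illustration, not the tool's implementation: the name `highlight`, the use of key sets and the sample attribution rows are all hypothetical.

```python
def highlight(table, key, positive_keys, negative_keys):
    """Assign a colour to each tuple of a table.

    green: the tuple takes part in a positive result
    red:   the tuple takes part in an inconsistency
    white: the tuple is not concerned by the constraint
    """
    colours = []
    for row in table:
        if row[key] in positive_keys:
            colours.append("green")
        elif row[key] in negative_keys:
            colours.append("red")
        else:
            colours.append("white")
    return colours

# Illustrative intermediate table mapping MAC addresses to phone IDs.
attributions = [
    {"mac": "00:08:5d:00:00:01", "ucl_id": "UA00001"},
    {"mac": "00:08:5d:00:00:02", "ucl_id": "UA00002"},
    {"mac": "00:08:5d:00:00:09", "ucl_id": "UA00009"},
]

colours = highlight(
    attributions,
    "ucl_id",
    positive_keys={"UA00001"},
    negative_keys={"UA00002"},
)
```

The third row stays white because its ID appears in neither key set, mirroring the white attribution tuple mentioned in the text.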

Using the highlighted view we can observe, for instance, that the information for the phone with MAC address 00:08:5d:00:00:02 and UCL-ID UA00002 is inconsistent, since it appears with location SC052–A003 in the network table, whereas it has location SC051–A 002 in the deployments table.

IV. VALIDATION

As explained above, intensional views allow the end-user to declare high-level constraints between data sources with relative ease, and reported inconsistencies can then be inspected in two different views to help the user identify the causes of the inconsistencies.

In our actual case study, containing the data for about 6500 phones, many inconsistencies were found, such as missing phones, missing information for a given phone, missing mappings between phone IDs and their MAC address, and unknown buildings. All these inconsistencies can be found with our tool. The number of phones dealt with and the number of inconsistencies discovered were simply too large to be handled manually, which was the prime reason for creating this intensional view tool for analyzing data inconsistencies.

For the constraint declared in the previous section, for example, when applied to the 6500 IP phones in use at the university, comparing locations in network and deployments

Page 18: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

If time (and beamer) permit: video of comparing localisation from network and deployment sources

Page 19: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra


Page 20: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra


Page 21: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Consistency of localisation data (network vs. deployment)
• 1322 positive results
• 2787/4098 (68%) negatives in the network source
• 2057/3368 (61%) negatives in the deployment source

Identical localisation data in all three sources
• ~1000/7512 (13%) cases

Contradictory data in all three sources
• 104/7512 (1%) cases

Missing attributions in the network source
• 701/4098 (17%) cases
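The percentages on this slide follow from the reported counts by simple division and rounding. The short Python check below recomputes them; the dictionary labels are paraphrased from the slide, and the count of 1000 is the slide's approximate figure (~1000), so its percentage is approximate as well.

```python
# Counts as reported on the slide: (number of cases, total considered).
counts = {
    "negatives in the network source": (2787, 4098),
    "negatives in the deployment source": (2057, 3368),
    "identical in all three sources": (1000, 7512),  # slide gives ~1000
    "contradictory in all three sources": (104, 7512),
    "missing attributions in the network source": (701, 4098),
}

# Percentage of cases, rounded to the nearest integer.
percentages = {
    label: round(100 * cases / total)
    for label, (cases, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```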


Page 22: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra

Achieved Objectives:
• New approach to express, verify and visualise data constraints
• Combining intensional views and relational algebra

Possible Improvements:
• Constraints on more than two tables
• Increase expressivity (aggregation, user-defined predicates, deviations, logic queries)
• Ergonomics of the user interface, efficiency improvements, …

Cross-Fertilisation:
• Think outside the box
• Apply old ideas to new domains (here we applied code tools to data)
• (Could DB tools also be applied to code by seeing it as structured data?)

Page 23: Managing the Evolution of Information Systems with Intensional Views and Relational Algebra
