Scoda openrefine-directordata

Date post: 27-Jan-2015
Upload: tony-hirst
Transcript
Page 1: Scoda openrefine-directordata

A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL

For more information, contact: schoolOfData.org

Page 2: Scoda openrefine-directordata

Here's the sort of thing we're starting with: a list of companies…

Page 3: Scoda openrefine-directordata

Here's the sort of thing we want: lists of directors associated with each company (where that information is available).

Page 4: Scoda openrefine-directordata

The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.

Page 5: Scoda openrefine-directordata

The URLs should take the form:

http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID

If you already have company page URLs in a column, add a column based on that column using: value.replace('http://','http://api.')

If you have JURISDICTION/COMPANY_ID in a column, use the formula: "http://api.opencorporates.com/companies/"+value
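
As a rough illustration of the two URL-building rules, here is a minimal Python sketch. It is not part of the OpenRefine recipe itself; the function names and the placeholder identifier gb/00000000 are hypothetical, and it assumes company page URLs begin with http://opencorporates.com/.

```python
API_PREFIX = "http://api.opencorporates.com/companies/"


def api_url_from_page_url(page_url: str) -> str:
    """Turn a company page URL into an API URL by inserting 'api.' after the scheme."""
    return page_url.replace("http://", "http://api.")


def api_url_from_id(jurisdiction_and_id: str) -> str:
    """Build an API URL from a JURISDICTION/COMPANY_ID string."""
    return API_PREFIX + jurisdiction_and_id


# Placeholder values for illustration only (not a real company).
print(api_url_from_page_url("http://opencorporates.com/companies/gb/00000000"))
print(api_url_from_id("gb/00000000"))
```

Both calls should yield the same API address, which is the value OpenRefine would then fetch for each row.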

Page 6: Scoda openrefine-directordata

The data comes back as JSON data, which we will need to process.

Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
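
To make the path concrete, the following Python sketch walks the same results, company, officers route through a small hand-written response. The sample JSON is invented for illustration; in particular, the shape of each officer entry (an "officer" object with id, name, position, start_date and end_date keys) is an assumption, not something shown in the slides.

```python
import json

# Hand-written sample response; the officer entry layout is an assumption.
sample_response = """
{"results": {"company": {"name": "EXAMPLE LTD", "officers": [
  {"officer": {"id": 1, "name": "A N OTHER", "position": "director",
               "start_date": "2010-01-01", "end_date": null}}
]}}}
"""


def extract_officers(response_text: str) -> list:
    """Follow results -> company -> officers; return [] if no officers are listed."""
    company = json.loads(response_text)["results"]["company"]
    return company.get("officers") or []


print(extract_officers(sample_response))
```

Returning an empty list when the officers key is absent reflects the earlier caveat that director information is not always available.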

Page 7: Scoda openrefine-directordata

Let's parse the JSON data and put the directors' information into another column…

Page 8: Scoda openrefine-directordata

What we are aiming for is a contrivance based on the form:

32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
…

where we list director ID, name, position, appointment date, termination date.

Page 9: Scoda openrefine-directordata

This function will parse the data into a string of the form:

32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…

The function reads as follows: "for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||".

The use of two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
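
The two-level join described above can be sketched in Python as follows. This is an illustrative equivalent, not the OpenRefine expression used in the recipe; the field names (id, name, position, start_date, end_date) and the "officer" wrapper around each entry are assumptions about the JSON layout.

```python
FIELDS = ("id", "name", "position", "start_date", "end_date")  # assumed key names


def officers_to_string(officers: list) -> str:
    """Join each officer's fields with '::', then join the officers with '||'."""
    records = []
    for entry in officers:
        officer = entry["officer"]
        # Missing or empty values are written as the literal text 'null'.
        values = ("null" if officer.get(f) is None else str(officer[f]) for f in FIELDS)
        records.append("::".join(values))
    return "||".join(records)


sample_officers = [
    {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                 "position": "director", "start_date": "2010-04-07", "end_date": None}},
    {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                 "position": "director", "start_date": "2006-01-17", "end_date": "2012-02-22"}},
]
print(officers_to_string(sample_officers))
```

The delimiters only work if neither :: nor || can occur inside a field value, which is why the choice of separators matters.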

Page 10: Scoda openrefine-directordata

The parsed data is put into a new column in this combined list form.

Page 11: Scoda openrefine-directordata

We can then split the data so that we create a new row for each director using the delimiter we defined: ||

Page 12: Scoda openrefine-directordata

Note that values from the other columns will not be copied into any newly created rows. We will have to do that ourselves, either now or later.
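
The row-splitting behaviour, including the blank cells it leaves behind, can be imitated with a short Python sketch. Rows are modelled here as dictionaries; the column names "company" and "directors" are hypothetical, and the function only approximates what OpenRefine does when splitting multi-valued cells.

```python
def split_into_rows(rows: list, column: str, separator: str = "||") -> list:
    """Create one row per separated value; only the first row keeps the other columns."""
    out = []
    for row in rows:
        parts = (row[column] or "").split(separator)
        for i, part in enumerate(parts):
            # Rows after the first start with every column blank (None).
            new_row = dict(row) if i == 0 else {key: None for key in row}
            new_row[column] = part
            out.append(new_row)
    return out


table = [{"company": "EXAMPLE LTD",
          "directors": "1::A N OTHER::director::2010-01-01::null"
                       "||2::B N OTHER::director::2006-01-17::2012-02-22"}]
for r in split_into_rows(table, "directors"):
    print(r)
```

The second printed row has no company value, which is exactly the gap that a later fill-down step closes.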

Page 13: Scoda openrefine-directordata

For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).

Page 14: Scoda openrefine-directordata

We can do this by splitting on the other separator type we used: ::

Page 15: Scoda openrefine-directordata

The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
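
Splitting a director record on :: and giving the pieces convenient names can be sketched together in Python. The column names below are suggestions only; OpenRefine itself generates numbered names that you then rename by hand.

```python
# Suggested (hypothetical) column names, in the order the fields were joined.
COLUMN_NAMES = ["director_id", "name", "position", "appointment_date", "termination_date"]


def split_record(record: str, separator: str = "::") -> dict:
    """Split one director record into named fields."""
    values = record.split(separator)
    return dict(zip(COLUMN_NAMES, values))


example = "32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22"
print(split_record(example))
```

Note that a missing termination date arrives as the text null rather than an empty cell, so it may need cleaning before export.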

Page 16: Scoda openrefine-directordata

Finally, we can do a little more tidying. For any columns we want to export, such as company name, or company ID, we can Fill down using the corresponding values from the original row the directors' information was pulled from.
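
The effect of Fill down can be pictured with this small Python sketch, which copies the last non-blank value in a column into the blank cells beneath it. The dictionary-based rows and the column names are hypothetical stand-ins for the OpenRefine table.

```python
def fill_down(rows: list, columns: list) -> list:
    """Replace blank cells in the given columns with the nearest value above them."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) in (None, ""):
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


director_rows = [
    {"company": "EXAMPLE LTD", "name": "A N OTHER"},
    {"company": None, "name": "B N OTHER"},
    {"company": "SAMPLE PLC", "name": "C N OTHER"},
]
for r in fill_down(director_rows, ["company"]):
    print(r)
```

Fill down is only safe while the rows are still in their original order, since it relies on each director row sitting directly below the company row it came from.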

Page 17: Scoda openrefine-directordata

If you want to know more, contact us…
