
Posted on 24 September 2020



PangeaMT · Manuel Herranz

PangeaMT: A Solution Built by the Language Industry for Language as a Business

#manuelhrrnz #pangeanic · E: pangeanic

Why Machine Translation?

The Data Deluge
✔  As of May 2009: 487 billion gigabytes, or 1,000,000,000 × 487,000,000,000 = 4.87 × 10^20 bytes
✔  Estimates:
•  Up 50% a year (Oracle)
•  Doubles every 11 hours (IBM)

As Content Volume Explodes, Machine Translation Becomes an Inevitable Part of Global Content Strategy http://

•  In 2011, it took the world about two days to create the same 5 exabytes of data that had taken humanity eons to generate.

•  In 2013, it took the world just 10 minutes to create 5 exabytes.

•  Humankind has stored more than 295 billion gigabytes (or 295 exabytes) of data since 1986 (ComputerWorld, 2011).

•  Researchers at the University of California, Berkeley, found that the amount of data generated from the dawn of time through 2002 was about 5 exabytes.
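As a quick sanity check on the data-deluge figures above, the unit conversions can be reproduced in a few lines of Python. This is only a sketch that assumes decimal (SI) prefixes, i.e. 1 GB = 10^9 bytes and 1 EB = 10^18 bytes; with binary units (GiB, EiB) the results would differ slightly. The function and variable names are illustrative, not part of any PangeaMT tooling.

```python
# Assumption: decimal (SI) byte multiples; binary units (GiB, EiB) would differ.
GIGABYTE = 10**9   # bytes
EXABYTE = 10**18   # bytes


def gigabytes_to_bytes(gb: int) -> int:
    """Convert a count of decimal gigabytes to bytes (exact integer arithmetic)."""
    return gb * GIGABYTE


def gigabytes_to_exabytes(gb: int) -> float:
    """Convert a count of decimal gigabytes to exabytes."""
    return gb * GIGABYTE / EXABYTE


may_2009_gb = 487 * 10**9           # "As of May 2009: 487 billion gigabytes"
stored_since_1986_gb = 295 * 10**9  # "295 billion gigabytes ... since 1986"

print(f"{gigabytes_to_bytes(may_2009_gb):.2e} bytes")        # 4.87e+20 bytes
print(f"{gigabytes_to_exabytes(may_2009_gb):g} EB")          # 487 EB
print(f"{gigabytes_to_exabytes(stored_since_1986_gb):g} EB")  # 295 EB
```

Keeping the byte count as a Python integer avoids any floating-point rounding until the final formatting step.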

Where is data stored?

MT Usage
The application, new usage and success of machine translation depend on its purpose:

[Diagram, repeated for each purpose: content domains (sports, politics, social, etc.) → output format]

✔  MT for assimilation: "gisting" or "understanding"
•  Practically unlimited demand, but free web-based services reduce the incentive to improve the technology
•  Coverage is more important; quality must be instant

✔  MT for dissemination: "publication"
•  Publishable quality can only be achieved by humans; MT and tools act as a productivity booster

✔  MT for direct communication
•  Current R&D and military uses: systems for spoken MT, first applications for smartphones, online help, multilingual chat systems

PangeaMT System – Domain Creation

PangeaMT System – Data Cleaning

PangeaMT System – Engine Creation

PangeaMT System – Engine Training

A Success Story

Sony Professional Europe (Salomé Lopez-Lavado)
-  Need: improve publication in French, Italian and Spanish
-  8M-word training set
-  Time-to-market: from 3 days down to 1.5 days (HTML, InDesign)
-  Outsourcing cost: -20%
-  Volume: 1.5M words/year

Japanese automotive manufacturer
-  Spanish
-  8M words/year
-  Time-to-market reduced by 2–3 weeks, from 8 weeks to 6 or 5
-  Team of 17 freelancers down to 4–7 post-editors
-  Outsourcing cost: -30%

Spanish LSP working for the banking sector
-  Spanish
-  1–2M words/year
-  Time-to-market: from 1 week down to 2 days
-  Formats: DOCX, HTML, TMX
-  Staff: down from 2–3 in-house staff and 2–3 freelancers to 2 in-house

Successfully applied (third-party applications / beneficiaries)

Use Case

✔  Even with small data sets!

•  PangeaMT can be self-hosted when data security is critical (all processes remain internal to the organization), e.g. for:
-  commercially sensitive data
-  financial, legal and institutional content
-  intelligence and knowledge gathering
-  product pre-release material, etc.

•  Control panel with full system statistics

•  Retraining and updates performed by the client, for data privacy and greater accuracy

Potential Uses of Machine Translation

•  Information discovery: patents, unknown documents

•  Automatic, on-demand creation of foreign-language versions / web apps; keyword testing

•  Multilingual crawling and data discovery

•  Pre-translation


Myth: MT will never be as good as humans

"We cannot solve the problem using the same tools and the way of thinking that created it" (A. Einstein)

Uhmmm, it is going to get really good...

1st stage (2009–2015 or 2020): we are creating usable engines; first post-editing (PE) experiences.

2nd stage: PE material and more data make engines even more predictable; more specialist engines.

3rd stage: beyond 2030... no predictions.
