Date post: | 17-Feb-2017 |
Category: | Business |
Upload: | konstantin-dranch |
Can Big Data from the Cloud Revolutionize Translation Metrics?
Quick intro
• 2010: Memsource founded
• 2015: 50,000 users & 100+ million words translated monthly
• Some of the world’s largest translation providers and buyers are customers
SEGA FUJIFILM
Cloud tools lead to Big Data
• Server tools – private data silos
• Cloud tools – centralized data
And the clouds are getting bigger…
In May alone, users processed 0.8 billion words in Memsource
…Which opens opportunities for benchmarking and trendwatching
Impact
• Find market pain points
• Usage stats
• Universal performance metrics
• Eliminate free tests
• ROI tracking
• Identify synergies
• Higher margins
• Real-time benchmarking
• Notifications that help manage operations
Translation companies Buyers
Technology providers
Freelancers and Project managers
Example problem - quality
• Free testing
• Since the end of LISA everyone has a unique quality metric
• Can we embed a certain standard into the tool itself?
Freelancer profile on Upwork.com
Our analytics building blocks
SQL technology
Visualization: 400 filters
Legacy solution
Visualization: about 20 filters
Kibana console look
So what can we track there?
• In theory, anything:
  • Translation data
  • Productivity
  • Business analytics
  • Notifications
• In practice (challenges):
  • Data clean-up
  • Relevance
  • Interpretation
Translation memory used for 85% of jobs
Users save 10 to 40% with TM
[Chart: Overall TM leverage by top 50 volume users. X-axis: users (1 to 50); y-axis: share of words (0% to 100%); series: repetitions, tm.match101, tm.match100, tm.match95, tm.match85, tm.match75, tm.match50, tm.match0]
Data for jobs where post-editing analysis has been performed, December 2015 - May 2016
Sample: 9 bn words
Savings approx. $300 million
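As a rough illustration of how a TM savings percentage can be derived from a match-band breakdown, the sketch below weights each band's word count by a net rate. The band names come from the chart legend; the net rates and the sample job are hypothetical values chosen for illustration, not Memsource figures.

```python
# Hypothetical net-rate scheme: fraction of the full per-word rate paid per band.
# These fractions are illustrative assumptions only.
NET_RATES = {
    "repetitions": 0.10,
    "tm.match101": 0.00,
    "tm.match100": 0.10,
    "tm.match95": 0.30,
    "tm.match85": 0.60,
    "tm.match75": 0.60,
    "tm.match50": 1.00,
    "tm.match0": 1.00,
}


def tm_savings(words_by_band):
    """Return the share of cost saved (0.0 to 1.0) for a word-count breakdown."""
    total = sum(words_by_band.values())
    if total == 0:
        return 0.0
    # Weighted word count: words that are effectively paid at the full rate.
    weighted = sum(words * NET_RATES[band] for band, words in words_by_band.items())
    return 1.0 - weighted / total


# Hypothetical job: 10,000 words, 20% of them repetitions or exact matches.
job = {"repetitions": 1000, "tm.match100": 1000, "tm.match0": 8000}
print(f"Savings: {tm_savings(job):.0%}")
```

With a different rate scheme the result shifts, so any benchmark of TM savings has to state the rates it assumes.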
MT is currently used on 31% of projects
Top MT Engines
ENGINE %
Microsoft with Feedback 15.8%
Microsoft Translator Hub 9.9%
Google Translate 2.6%
Microsoft Translator 2.5%
SDL BeGlobal 0.4%
Other 0.6%
MT not used 68.2%
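The engine shares in the table can be cross-checked with a few lines of arithmetic. The sketch below sums the engine rows, which gives about 31.8% of projects using MT (in line with the roughly 31% headline), and then expresses each engine as a share of MT-using projects only. The variable names are illustrative.

```python
# Shares of all projects, copied from the table above (percent).
engine_share = {
    "Microsoft with Feedback": 15.8,
    "Microsoft Translator Hub": 9.9,
    "Google Translate": 2.6,
    "Microsoft Translator": 2.5,
    "SDL BeGlobal": 0.4,
    "Other": 0.6,
}
mt_not_used = 68.2

# Total share of projects that used any MT engine.
mt_used = sum(engine_share.values())
print(f"MT used on {mt_used:.1f}% of projects")

# Sanity check: the rows should cover all projects.
assert abs(mt_used + mt_not_used - 100.0) < 1e-9

# Re-base each engine on MT-using projects only.
share_among_mt = {name: 100.0 * pct / mt_used for name, pct in engine_share.items()}
for name, pct in share_among_mt.items():
    print(f"{name}: {pct:.1f}% of MT projects")
```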
Up to 80% content pasted from MT then edited
Sample size 20 million words, December 2015 - May 2016
[Chart: Edit distance for major language pairs. X-axis: sample language pairs (en:es, pt:en, en:pt, es:en, en:ru, ru:en, en:de, pt:es, en:fr, es:pt); y-axis: % of words in segments from MT (0% to 100%); series: mt.match100, mt.match95, mt.match85, mt.match75, mt.match50, mt.match0; annotations: MT not used, Raw MT, Moderate edits, Heavily edited]
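To make the edit-distance bands concrete, here is a minimal sketch of how a post-edited segment could be scored against the raw MT output and assigned to a band. It uses a character-level Levenshtein distance and treats each band name as a lower bound on similarity; both choices are assumptions for illustration, since the slides do not describe the scoring used in the tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(mt_output: str, final_text: str) -> float:
    """Similarity in percent: 100 means the MT output was left unchanged."""
    longest = max(len(mt_output), len(final_text))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(mt_output, final_text) / longest)


# Assumed interpretation: each band name is the lower bound of its range.
BANDS = [(100, "mt.match100"), (95, "mt.match95"), (85, "mt.match85"),
         (75, "mt.match75"), (50, "mt.match50")]


def band(score: float) -> str:
    for threshold, name in BANDS:
        if score >= threshold:
            return name
    return "mt.match0"


# Hypothetical segment pair: raw MT output versus the post-edited text.
print(band(similarity("The cat sat on the mat.", "The cat sat on the mat.")))
```

Under this reading, mt.match100 corresponds to raw MT accepted as is, and the lower bands to progressively heavier editing.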
Many linguists translate more than 10 pages a day consistently
[Chart: Top 1000 Linguist role Productivity, Pages in April 2016. X-axis: users; y-axis: pages completed in April (0 to 1600)]
Chart annotations:
• Norm: 8 pages a day x 20 days
• 10 pages a day
• 20 pages a day
• Probably not human translation
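The reference lines on this chart translate into monthly page counts with simple arithmetic: 8 pages a day over 20 days is 160 pages, 10 pages a day is 200, and 20 pages a day is 400. The sketch below classifies a monthly total against those lines. Treating everything above 20 pages a day as "probably not human translation" is an assumption about where that label sits on the chart, and the function name is illustrative.

```python
WORKING_DAYS = 20  # from the slide: "8 pages a day x 20 days"


def classify_monthly_pages(pages_in_month: float) -> str:
    """Place a linguist's monthly page count relative to the chart's reference lines."""
    per_day = pages_in_month / WORKING_DAYS
    if per_day > 20:
        # Assumed position of the "probably not human translation" label.
        return "probably not human translation"
    if per_day > 10:
        return "more than 10 pages a day"
    if per_day >= 8:
        return "at or above the norm"
    return "below the norm"


# Monthly thresholds implied by the daily figures.
for pages_per_day in (8, 10, 20):
    print(pages_per_day, "pages a day =", pages_per_day * WORKING_DAYS, "pages a month")
```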
Project manager productivity
[Chart: Jobs created by PMs and completed by linguists in the last 30 days – test organization]
Project manager   Jobs
Renato            408
Joana             325
Kris              313
John              263
Bill              159
Robert            143
Alex              122
Sandor            74
Dave              68
Millingan         63
Mihiko            31
Olga              10
Barbora           5
Benchmarking possibilities
[Chart: PM Productivity, Completed Jobs Per Month, December – May 2016; annotation: Top 10%]
Number of jobs completed   Number of users
1 or less                  674
from 1 to 10               440
from 11 to 100             428
from 101 to 200            94
from 200 to 300            37
from 300 to 400            13
from 400 to 500            9
from 500 to 600            5
from 501 to 1000           12
from 1001 to 2000          10
more than 2000             7
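One way to estimate where the top 10% of PMs begins is to walk the histogram from the busiest bin downwards until a tenth of all users is covered. The sketch below does that with the user counts read from the chart; the bin labels are kept exactly as they appear there, and the function name is illustrative. It is an estimate from binned data, not necessarily how the "Top 10%" marker on the slide was placed.

```python
# (bin label, number of users), ordered from fewest to most jobs completed per month.
histogram = [
    ("1 or less", 674),
    ("from 1 to 10", 440),
    ("from 11 to 100", 428),
    ("from 101 to 200", 94),
    ("from 200 to 300", 37),
    ("from 300 to 400", 13),
    ("from 400 to 500", 9),
    ("from 500 to 600", 5),
    ("from 501 to 1000", 12),
    ("from 1001 to 2000", 10),
    ("more than 2000", 7),
]


def top_share_bin(bins, share=0.10):
    """Return the bin in which the top `share` of users begins."""
    total = sum(count for _, count in bins)
    cutoff = total * share
    covered = 0
    # Walk from the most productive bin down to the least productive one.
    for label, count in reversed(bins):
        covered += count
        if covered >= cutoff:
            return label
    return bins[0][0]


total_users = sum(count for _, count in histogram)
print("Users in sample:", total_users)
print("Top 10% starts in bin:", top_share_bin(histogram))
```

With these counts the top 10% boundary falls in the "from 101 to 200" bin, so a PM completing a few hundred jobs a month is already well inside the top decile.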
Project manager productivity
[Chart repeated from the earlier "Project manager productivity" slide: jobs created by PMs and completed by linguists in the last 30 days, per project manager]
Top 10% of Global PM User Population
“In fact, Big Data applications are bound only by the human imagination”.
Peter Pham
What you can do now
• What to track?
• How can organizations benefit from each other’s data?
• Which data should not be shared?