
Copyright 2011 Trend Micro Inc.

Mathematics, Algorithms, Technologies and Products

Liwei Ren, Ph.D

Data Security Research, Trend Micro

March, 2013, NUS, Singapore


Agenda

• Introduction

• Mathematical Problems → Products

• Amazing Stories of Mathematical Success

• My Practices

• Conclusions

• Q&A


Introduction

• Liwei Ren

– Education

• MS/BS in mathematics, Tsinghua University, Beijing, China

• MS in information science, University of Pittsburgh, USA

• Ph.D. in mathematics, University of Pittsburgh, USA

– Research Interests

• Data security, differential compression, storage optimization, and fast data transfer protocols.

– Major Works

• N academic papers, M patents and K startup company, where N ≥ 10, M ≥ 15 and K = 1

• Trend Micro™

– Global security software company with headquarters in Tokyo and R&D centers in Silicon Valley, Nanjing, and Taipei.

– One of the top cloud security vendors

– One of the top 3 anti-malware vendors


Introduction

• Why am I visiting NUS?

– To share how mathematics can make a difference in industry through successful technologies and products

– To share how mathematicians can make significant contributions to society through industry.

• What do you think about the role of mathematics in the world?

– A philosophy

– An academic discipline

– An art of abstract elegance and supreme beauty

– A language for describing sciences

– A tool for our life

– All of the above

Introduction

• Why did you become a mathematician?

– To pursue truth and virtue

– I enjoy it and it is a part of my life

– I can contribute the most to the world with my talent

– To teach mathematics with joy

– To publish many papers and achieve great things

– It is a way of making a living

– To solve practical problems as an applied mathematician


Introduction

• In the era of the Internet and cloud computing:

– There are many challenging mathematical problems.

– Applied mathematicians and software engineers invent advanced technologies by solving real mathematical problems.

– Some of them even go further to found start-up companies with their new inventions.

Mathematical Problems → Products

• Mathematical problems → products?

• How?

• Two general approaches:

– Top-down

– Bottom-up

Mathematical Problems → Products

Top-Down Approach

Mathematical Problems → Products

Bottom-Up Approach


Amazing Stories of Mathematical Success

• Some mathematicians and computer scientists from universities have built successful high-tech companies on algorithmic technologies.

• Three excellent examples.


Amazing Stories of Mathematical Success

• RSA Security, Inc.

– Founded in 1982 by three applied mathematicians: Ron Rivest, Adi Shamir and Len Adleman

– Key technology: RSA public key cryptography algorithm

– Industry Sector: Data Security Software

– Affiliation: MIT

– Excellence in Mathematics Award at RSA Conference

– Acquired by EMC for $2.1B in 2006

Amazing Stories of Mathematical Success

• Akamai Technologies, Inc.

– Founded in 1998 by mathematicians Prof. Tom Leighton and his student Daniel M. Lewin

– Key technology: dynamic content routing algorithms

– Industry Sector: content delivery network (CDN)

– Affiliation: MIT

– "A Mathematical Success Story", SIAM News, Vol. 32, No. 10, 1999

– Akamai is a public company (Nasdaq: AKAM) with revenue of $1.27B in 2012.

Amazing Stories of Mathematical Success

• Data Domain, Inc.

– Founded in 2001 by computer scientist Prof. Kai Li and others

– Key technology: data de-duplication algorithms

– Industry Sector: computer storage.

– Affiliation: Princeton University

– Also acquired by EMC, for $2.5B in 2009

My Practices

• Let me share my mathematical practices in industry.

• My relevant experience in the software industry

– Sr. software engineer, 2 IT companies, 1996–2002

– Principal research engineer, InnoPath Software, 2002–2005

– Chief scientist & co-founder, Provilla Technologies, 2005–2007

– Sr. architect and research director, Trend Micro, 2007–present

• Two technical domains with mathematical practices:

– Data Loss Prevention (DLP)

– Firmware Over The Air (FOTA)

My Practices

• Two simple yet valuable problems to share with you:

1. Near duplicate document identification (NDDI)

2. Differential compression for executable files (DCE)

| | NDDI | DCE |
|---|---|---|
| Math Model | Textual Fixed Points | Secondary Code Change |
| Algorithm | DataDNA | Secondary Change Removal |
| Technology | Document Fingerprinting | Differential Compression of Executables |
| Product | LeakProof™ | DeltaUpgrade™ |
| Technical domain | Data Loss Prevention (DLP) | Firmware Over the Air (FOTA) |
| Company | Provilla / Trend Micro | InnoPath |
| Contribution | Created a company with many jobs | Better FOTA for 30 million phones |

My Practices

• NDDI (near duplicate document identification) is a fundamental problem that any DLP system must solve.

• Problem Definition:

– Let S = {T1, T2, …, Tn} be a set of known texts.

– Given a query text T, identify one or more texts t ∈ S such that T and t share a significant amount of common textual content.

• A technology that solves this problem is called document fingerprinting.

My Practices

• Alternate Problem Definition:

– Let S = {T1, T2, …, Tn} be a set of known texts.

– Given a query text T and a threshold X%, identify one or more texts t ∈ S such that SIM(T, t) ≥ X%, where SIM(x, y) is a similarity function that needs to be defined mathematically.

My Practices

• Mathematical Modeling

– Observation:

• Across multiple versions of a text, many characters remain unchanged together with their neighborhoods (the surrounding characters).

• For example:

– … The research required to solve mathematical problems can take years or even centuries of sustained inquiry…

– … The research required to solve mathematical problems can take many years of sustained inquiry…

– Textual Fixed Points:

• If a character together with its neighborhood appears as the same textual string in two texts, this character is a fixed point of the two texts.

• Two near-duplicate texts have many fixed points.

– For efficiency, we only need a subset of them.

• One needs to extract a subset of fixed points from a given text T and generate hash values from their neighborhoods.

• Let us denote the extracted subset of fixed points as FS(T).
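
The extraction of FS(T) can be sketched as follows. This is only an illustration of the fixed-point idea, not the patented DataDNA algorithm: the selection rule (keep a neighborhood when its hash is divisible by `modulus`), the `radius` parameter, and the function names are all assumptions made for the example. The rule depends only on the neighborhood itself, so a neighborhood shared by two texts is either selected in both or skipped in both.

```python
import hashlib


def neighborhood_hash(text: str, center: int, radius: int) -> int:
    """Hash the substring of width 2*radius+1 centered on `center`."""
    window = text[center - radius : center + radius + 1]
    digest = hashlib.sha1(window.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fixed_point_subset(text: str, radius: int = 4, modulus: int = 4) -> set[int]:
    """Return FS(T) as a set of neighborhood hashes (hypothetical sampling rule)."""
    selected = set()
    for center in range(radius, len(text) - radius):
        h = neighborhood_hash(text, center, radius)
        # Content-based sampling: the decision depends only on the
        # neighborhood, never on its position in the text.
        if h % modulus == 0:
            selected.add(h)
    return selected


v1 = ("The research required to solve mathematical problems can take "
      "years or even centuries of sustained inquiry")
v2 = ("The research required to solve mathematical problems can take "
      "many years of sustained inquiry")

shared = fixed_point_subset(v1) & fixed_point_subset(v2)
print(f"{len(shared)} sampled fixed points shared by the two versions")
```

A larger `modulus` keeps fewer fixed points, trading recall for a smaller fingerprint.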


My Practices

• Mathematical Modeling

– The concept of near-duplicate texts can be expressed as:

• FS(T1) ∩ FS(T2) ≠ ∅

– The NDDI problem can be described as:

• Given a query text T, identify one or more texts t ∈ S such that |FS(T) ∩ FS(t)| ≥ M, where M is a predefined integer.
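
One way to evaluate the condition |FS(T) ∩ FS(t)| ≥ M without comparing T against every known text is an inverted index from fingerprint to document. The sketch below assumes the fingerprint sets have already been computed; the index layout, the function names, and the integer fingerprints are illustrative assumptions, not the product's actual design.

```python
from collections import Counter, defaultdict


def build_index(fingerprints_by_doc: dict[str, set[int]]) -> dict[int, set[str]]:
    """Map each fingerprint to the ids of the known texts containing it."""
    index: dict[int, set[str]] = defaultdict(set)
    for doc_id, fingerprints in fingerprints_by_doc.items():
        for fp in fingerprints:
            index[fp].add(doc_id)
    return index


def query(index: dict[int, set[str]], query_fps: set[int], min_shared: int) -> dict[str, int]:
    """Return {doc_id: |FS(T) ∩ FS(t)|} for every t with at least min_shared hits."""
    counts: Counter[str] = Counter()
    for fp in query_fps:
        for doc_id in index.get(fp, ()):
            counts[doc_id] += 1
    return {doc_id: n for doc_id, n in counts.items() if n >= min_shared}


# Toy fingerprint sets standing in for FS(T1), FS(T2), FS(T3).
known = {"T1": {11, 22, 33, 44}, "T2": {33, 44, 55}, "T3": {66, 77}}
index = build_index(known)

matches = query(index, query_fps={22, 33, 44, 99}, min_shared=2)
print(matches)  # T1 shares 3 fingerprints, T2 shares 2, T3 shares none
```

The query cost grows with the number of fingerprints in T and the number of matching postings, not with the size of S.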


My Practices

• A solution is an algorithm that solves the problem described by this mathematical model.

– In industry, the corresponding technology is called document fingerprinting.

– We designed two different fingerprinting technologies over the years.

• DataDNA 1.0

– Liwei Ren et al., US patent 7516130, Matching engine with signature generation.

– Liwei Ren et al., US patent 7747642, Matching engine for querying relevant documents.

– Liwei Ren et al., US patent 7860853, Document matching engine using asymmetric signature generation.

• DataDNA 2.0

– Liwei Ren et al., US patent 8359472, Document fingerprinting with asymmetric selection of anchor points.

My Practices

• Summary:

– A document fingerprinting technology was developed based on the DataDNA 1.0 algorithm.

– Provilla raised funding from investors with this technology in early 2005.

– We developed a DLP product, LeakProof™, with document fingerprinting as its core technology.

– Provilla™ was acquired by Trend Micro™ in late 2007.

• DataDNA technology played an important role in Trend Micro's decision to acquire Provilla.

My Practices

• Differential Compression of Executables (DCE) is a mathematical problem for a FOTA system.

• Differential compression in general operates on two general files, T and R, where T stands for target and R for reference.
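
A common way to realize differential compression is to describe T as a sequence of "copy from R" and "add new bytes" operations. The sketch below assumes that formulation; the operation format and the names `make_delta` and `apply_delta` are hypothetical, and Python's `difflib` is used only to find matching regions, which is not how a production delta encoder would do it.

```python
from difflib import SequenceMatcher


def make_delta(reference: bytes, target: bytes) -> list[tuple]:
    """Describe `target` as copy/add operations against `reference`."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, r1, r2, t1, t2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", r1, r2 - r1))    # reuse bytes already in R
        elif tag in ("replace", "insert"):
            ops.append(("add", target[t1:t2]))   # literal bytes only T has
        # "delete": bytes of R that T does not use, so nothing is emitted
    return ops


def apply_delta(reference: bytes, delta: list[tuple]) -> bytes:
    """Rebuild the target from the reference and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, start, length = op
            out += reference[start : start + length]
        else:
            out += op[1]
    return bytes(out)


R = b"firmware image version 1: boot, radio, camera"
T = b"firmware image version 2: boot, radio, camera, gps"

delta = make_delta(R, T)
assert apply_delta(R, delta) == T
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "add")
print(f"target is {len(T)} bytes, delta carries {literal_bytes} literal bytes")
```

The closer T is to R, the fewer literal bytes the delta has to carry, which is why reducing the differences between the two files improves the diff rate.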


My Practices

• Differential Compression for FOTA:


My Practices

• If T and R are executable files, information theory suggests that we should achieve a better diff rate than for general files.

– How can we achieve this?

• Mathematical Modeling:

– To optimize differential compression, one way is to reduce the differences between the two files.

– We need to figure out what the code changes are between two versions of a software executable.

• Primary code change: instructions are altered due to source code changes.

• Secondary code change: an instruction is altered at the byte level due to code changes elsewhere.

• We use JUMP as an example to illustrate the concept. A JUMP instruction is a few bytes that encode the distance between the source and the destination.
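
A secondary code change can be reproduced with a toy instruction encoding. The encoding here is an assumption chosen for illustration (one opcode byte followed by the distance as a 4-byte little-endian signed integer); real instruction sets differ in detail, for example in which address the distance is measured from.

```python
import struct

JUMP_OPCODE = 0xE9   # hypothetical one-byte opcode
JUMP_LENGTH = 5      # opcode + 4-byte distance


def encode_jump(src_addr: int, dest_addr: int) -> bytes:
    """Encode a jump as opcode + signed 32-bit distance (dest - src)."""
    return bytes([JUMP_OPCODE]) + struct.pack("<i", dest_addr - src_addr)


# Old version: a jump at 0x1000 targets code at 0x1200.
old_instr = encode_jump(src_addr=0x1000, dest_addr=0x1200)

# New version: the jump itself is untouched in the source code, but 0x140
# bytes of new code were inserted between the jump and its target.
new_instr = encode_jump(src_addr=0x1000, dest_addr=0x1200 + 0x140)

changed_bytes = [i for i in range(JUMP_LENGTH) if old_instr[i] != new_instr[i]]
print(old_instr.hex(), new_instr.hex(), changed_bytes)
```

The two byte strings differ even though the programmer never edited the jump, and every jump that crosses the inserted code is affected in the same way.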


My Practices

• Mathematical Model:

– Secondary code change for JUMP

My Practices

• Mathematical Modeling:

– Secondary Change Removal for JUMP:

• The secondary code change causes instr1 ≠ instr2.

• Given the file R, if we can derive instr2 from instr1, we can replace instr1 in R with instr2.

• Applying the same substitution to all such instructions in R transforms R into another file, denoted P(R, H(R, T)), where H stands for hints. This gives a new formal representation of the differential compression problem.

My Practices

• Mathematical Modeling:

– Let us start with the common code blocks between two versions; we can usually identify these common blocks across versions.


My Practices

• Mathematical Modeling:

– How to derive a new JUMP instruction from an old one?

• instr2 = Encode(destAddr2 - srcAddr2)

• destAddr2 - srcAddr2
= (destAddr1 - srcAddr1) + (destAddr2 - srcAddr2) - (destAddr1 - srcAddr1)
= Decode(instr1) + (destAddr2 - destAddr1) - (srcAddr2 - srcAddr1)
= Decode(instr1) + (destBlkAddr2 - destBlkAddr1) - (srcBlkAddr2 - srcBlkAddr1)

• The last step holds because the source and the destination each lie in a common block, so each address shifts by the same amount as the start of its block.

– instr2 = Encode(Decode(instr1) + (destBlkAddr2 - destBlkAddr1) - (srcBlkAddr2 - srcBlkAddr1))

– We can treat other instructions, such as data pointers, similarly.

– Instructions such as JUMP or data pointers are called profitable instructions.
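
The derivation can be checked numerically with a toy encoding. In this sketch `encode` and `decode` are hypothetical stand-ins for Encode and Decode (one opcode byte plus a 4-byte little-endian signed distance), and the block addresses are made-up values; it is not the algorithm of the patent, only an instance of the formula above.

```python
import struct

JUMP_OPCODE = 0xE9  # hypothetical one-byte opcode


def encode(distance: int) -> bytes:
    """Encode: distance -> instruction bytes (assumed toy format)."""
    return bytes([JUMP_OPCODE]) + struct.pack("<i", distance)


def decode(instr: bytes) -> int:
    """Decode: instruction bytes -> distance."""
    return struct.unpack("<i", instr[1:5])[0]


def derive_new_jump(instr1: bytes,
                    src_blk_addr1: int, src_blk_addr2: int,
                    dest_blk_addr1: int, dest_blk_addr2: int) -> bytes:
    """Predict instr2 from instr1 using only the start addresses of the
    common blocks that contain the jump and its target."""
    distance2 = (decode(instr1)
                 + (dest_blk_addr2 - dest_blk_addr1)
                 - (src_blk_addr2 - src_blk_addr1))
    return encode(distance2)


# Old version: the jump sits at 0x1010 inside a block starting at 0x1000
# and targets 0x2020 inside a block starting at 0x2000.
instr1 = encode(0x2020 - 0x1010)

# New version: the source block moved to 0x1100, the destination block to 0x2400.
predicted = derive_new_jump(instr1, 0x1000, 0x1100, 0x2000, 0x2400)
actual = encode(0x2420 - 0x1110)

assert predicted == actual
print(predicted.hex())
```

Only the block start addresses are needed as hints, which is far less information than the changed instruction bytes themselves.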


My Practices

• A solution is an algorithm that identifies all the profitable instructions and removes the secondary code changes accordingly.

• US patent 7089270 provides one such solution:

– Liwei Ren et al., Processing software image for use in generating difference files.

My Practices

• Summary:

– An optimized differential compression technology was developed based on the algorithm described in patent 7089270.

– InnoPath™ significantly enhanced its flagship product DeltaUpgrade™ by integrating this technology.

– InnoPath™ won many new customer deals thanks to its technical advantage over competitors.

– This technology supports 30 million mobile phones.

My Practices

• Beyond NDDI and DCE, I have worked on many other mathematical problems in practice:

– Subgraph isomorphism

– Multi-value binary search

– RegEx pattern optimization

– Keyword proximity match

• An interesting extension is the minimal M-color enclosing circle problem.

– Remote differential compression

– Malware clustering and detection

– …

Conclusions

• Mathematics can make a difference in industry through mathematical models and algorithms.

• Mathematicians can contribute significantly to society by inventing novel technologies, building useful products and creating job opportunities.

• "There has never been a better time to be a mathematician." (James R. Schatz, Chief of Math Research Group, NSA)

Q&A session

• I hope you enjoyed this talk, even though the topic is not quite academic.

• Thank you.

• Please do not hesitate to ask if you have questions.
