    Multimedia Technology
    Lecture 1: Overview and Arrangement

    Lecturer: Dr. Wan-Lei Zhao

    Autumn Semester 2015

    About this Course


    1 About this Course

    2 Syllabus

    3 Course plan

    4 Brief History about IR and Web

    5 Brief History about WWW

    About this Course

    Major subjects
        Deal with information such as text, image and video
            Text retrieval, content-based image retrieval and video retrieval
        Focus on how to retrieve the above-mentioned information
        Popular machine learning approaches will be covered
            K-means, SVM and decision tree
        Popular model fitting approaches will be covered
            RANSAC and Hough transform
        Popular algorithms in computer vision will be covered
            SIFT, BoVW and Hamming Embedding

    Objectives
        Bring you into this interesting topic
        Get you familiar with basic & popular algorithms in this field
        Be able to build a simple but workable search engine on your own
        Be able to apply algorithms to solve the problems in your field

    Text Retrieval (4 × 2 hours)
        Brief History about IR and Web
        Pre-processing on Text Information
        Three Retrieval Models
            Boolean, vector and probability models
        Evaluation Measure
        Web Search
        Parallel Computing in IR

    Machine Learning Approaches (2 × 2 hours)
        K-means
        Spectral clustering
        Decision Tree
        K-Nearest Neighbour
        Support Vector Machine (SVM)

    Nearest Neighbour Search (1 × 2 hours)
        R-Tree
        KD-Tree
        Locality Sensitive Hashing
        Product Quantizer

    Model Fitting
        RANSAC
        Hough Transform

    Image & Video Retrieval (2 × 2 hours)
        Challenges & Trends
        Image Features: SIFT et al.
        BoVW Framework
        Fisher Kernel Framework
        Challenges in Video Retrieval
        Temporal Verification Approach

    Image Classification and MISC (1 × 2 hours)
        Challenges & Trends
        One-against-all Framework
        Tricks in model training
        Convolutional Neural Network

    Course work in the lab (3 × 2 hours)
        Three experiments
        Subjects that you learn in the class
        Kept secret until the lab time
        Each time, it is also a quiz
        10 marks for each experiment
        NO team work!!!
        Late submission is allowed, but with 30% discount

    Presentation of the course project (2 × 2 hours)
        Two course projects
        Implement after class
        Team work is encouraged, but size(team) ≤ 4
        15 minutes for each team to present their project
        A hardcopy of the project report is also required

    Prerequisites of this course

    Data Structure
        You have to be familiar with it
        Otherwise, you are advised not to take this course

    Good at C/C++
        It will be used in the lab
        It is recommended for your course project

    Basic knowledge about Internet
        Internet protocols
        Mechanism of WWW
        HTML and JavaScript

    Matlab is a plus
        It will be used in the lab
        Even if you do not know it, it does not matter
        You will learn its basics during this course

    Teaching assistants for this course

    Mr. Zhihui Chen will be in charge of the course project related issues

    Miss Haihui Liu helps to proofread the course materials

    Experiment lectures are held in the Laboratory building, Room 501

    Time slot: 2:30pm - 4:20pm, in the 6th, 8th and 10th weeks
    I will remind you one week ahead

    Course website

    Platform of online teaching in XMU
    URL: l.xmu.edu.cn, please go there and register for the course
    Password: 007

    Language in the Class

    English or Chinese?
        You might be uncomfortable at the beginning
        Me too :)

    Several advantages:
        Computer science is defined in English
        Get you guys used to English


    Intersection of four disciplines



    Related (top-ranked) Journals:
        IEEE Trans. on Knowledge and Data Engineering
        IEEE Trans. on Pattern Analysis and Machine Intelligence
        International Journal of Computer Vision
        IEEE Trans. on Multimedia
        IEEE Trans. on Image Processing
        Computer Vision and Image Understanding

    Reference Books
        R. Baeza-Yates et al., Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition)
        Richard Szeliski, Computer Vision: Algorithms and Applications
        Lecture notes of Machine Learning by Dr. Andrew Ng, from Stanford University

    Related papers will be suggested as reading assignments

    Online Resources:
        Youku
        Wikipedia
        Baidu Baike


    Question: can our brain understand how our brain works?
    We are going to get a taste of how tough this question is from two aspects

    1 Computer Vision
    2 Machine Learning


    Course plan

    Evaluation: 3 lab experiments + 2 course projects
    S = 30% + 35% + 35%
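
    The marking scheme above can be read as simple arithmetic. The sketch below is only one possible reading: it assumes each lab experiment is marked out of 10 (as stated in the syllabus), that each project is marked out of its 35% share, and that the "30% discount" means a late lab submission keeps 70% of its marks. The names `lab_mark` and `final_score` are hypothetical and used for illustration only.

```python
LAB_MAX = 10        # "10 marks for each experiment"
PROJECT_MAX = 35    # assumption: each project is marked out of its 35% share
LATE_FACTOR = 0.7   # assumption: "30% discount" keeps 70% of a late lab mark


def lab_mark(raw, late=False):
    """Clamp a raw lab mark to [0, LAB_MAX] and apply the late discount."""
    raw = min(max(raw, 0), LAB_MAX)
    return raw * LATE_FACTOR if late else raw


def final_score(labs, projects):
    """labs: three (raw_mark, late) pairs; projects: two marks out of 35."""
    if len(labs) != 3 or len(projects) != 2:
        raise ValueError("expected 3 lab experiments and 2 course projects")
    lab_total = sum(lab_mark(raw, late) for raw, late in labs)
    project_total = sum(min(max(p, 0), PROJECT_MAX) for p in projects)
    return lab_total + project_total  # S, out of 100
```

    For example, `final_score([(10, True), (10, False), (10, False)], [35, 35])` gives about 97 under these assumptions, since the one late lab loses 3 of its 10 marks.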

    About course projects
        Implemented in C, C++/Python, Matlab
        If you do not know Python or Matlab, learn it!!
        Sample code will be given, you only need to fill in the blanks
        Team work is encouraged for the two course projects
        Team leader will be marked 5 credits higher or lower depending on the
        Report (only the second one) and presentation (both) are required (in English if possible)

    Failure is acceptable but no cheating or plagiarism
        If it happens, you are OUT!!

    Any questions?


    Be an Active Learner

    Level 1
        Catch the concept

    Level 2
        Understand the idea
        Know how to use it

    Level 3
        Able to re-implement the algorithms
        Knows where it works
        Knows where it fails


    Brief History about IR and Web

    Human Languages (1)

    7,000 languages in the world

    90% of these languages are used by fewer than 100,000 people

    Based on your knowledge and imagination, please list the top 5 most widely used languages

    Give the rank also, do it now ...


    Language    Population     Category               Region
    Mandarin    1.2 billion    isolating language     China
    English     508 million    inflecting language    UK, North America
    Hindi       497 million    inflecting language    India & Pakistan
    Spanish     392 million    inflecting language    Spain & South America
    Russian     277 million    inflecting language    Russia & East Europe

    We mainly talk about retrieval of English documents
    We mention a little about processing of Chinese documents


    Human Languages (2)

    Figure: Weights of real impact to the world.

    In terms of real influence, the rank changes (study conducted by Webb)

    Influence: economically, politically, size of population and number of countries

    Distribution of World Languages

    Note that not all languages have a written form


    Evolution of Storage Media

    Egyptian papyrus (not paper in the real sense)
    Babylonian clay tablet (3000 B.C.)
    Chinese oracle bones (1400 B.C.)

    In 105 A.D., paper was invented in China

    Story of Rosetta Stone

    Written in both ancient Egyptian and Greek in 196 BC on behalf of King Ptolemy V

    Discovered in 1799

    Key to the understanding of ancient Egyptian
    J.-F. Champollion decoded the language


    library comes from the Latin word liber, meaning book

    bibliothek comes from the Greek word biblion, meaning book written on papyrus


    Spread of ancient civilizations

    Five ancient civilizations: ancient Egypt, ancient Babylon, ancient India, ancient China, ancient Maya


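    The first library (as far as we know) was established in north Syria, around 3000 BC

    Later, the Assyrian Empire built the Library of Nineveh (near present-day Mosul) in the 7th century BC, before the city fell in 612 BC

    The best-known library, the Library of Alexandria, was founded in Egypt in the early 3rd century BC, in the city established by Alexander the Great

    In China, libraries appeared around 800 BC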


    Evolution of Storage Media

    After the advent of the computer


    IR in two different eras

                  before WWW                          WWW era
    Media         text document, TV, film & CD        in electronic forms
    Publishing    months or years                     hours
    Storage       books & papers                      disc, DVD etc. & web
    Indexing      title, author, keywords and date    and contents
    Interface     library                             browser

    According to IBM, 90% of the knowledge in the world was created in the last two years

    A powerful IR system is required to coordinate the distribution of information/knowledge


    Brief History about WWW

    The Birth of WWW

    1980-1991: the invention of the Web
        In 1980, Tim Berners-Lee worked at CERN (European Organization for Nuclear Research)
            Managed information for physicists so that they could share it
        In 1984, he returned to CERN
        In 1989, he wrote a proposal for a large hypertext database
        By Christmas 1990, he had built all the necessary elements of the Web
            HTTP, HTML, web browser and httpd


    The growth of World Wide Web

    Early times of growth (1991-1995)
        Cello: the first browser for Microsoft Windows
        Mosaic (from UIUC) is the first successful browser
        W3C was founded by Berners-Lee in 1994 at MIT

    Commercialization (1996-1998)
        More and more dot-coms appeared

    Boom and Bust (1999-2001)
        More and more dot-coms appeared
        Internet becomes popular in China
        Many currently well-known companies were established: Baidu, Alibaba
        Search engines were born


    The growth of World Wide Web

    Early times of growth (1991-2001)
        First version of Java was released in 1995
        First version of PHP was released in 1995
        JavaScript was invented by Netscape in 1995
        Static web to dynamic web
        Strong support for multimedia


    WWW is everywhere

    Ubiquitous web (2002-present)
        Introduction of Web 2.0 is the milestone
        Wikipedia was born in 2001
        Flickr was born in 2004
        Facebook was born in 2004
        YouTube was born in 2005
        Twitter was born in 2006
        The first iPhone was released in 2007

    All technologies and media are intertwined to reshape the world

    Impact on our daily life of many aspects

    IR becomes the main interface to them all


    Semantic Web

    Web 3.0 (20??)
        Proposed by Berners-Lee
        Websites are linked by semantic metadata
        Machine builds the link automatically
        Requires technology of natural language understanding
        Still a vague concept
        Automatic documenting, e.g. books and recipes

    All rights are reserved by Wan-Lei Zhao

    Statistics on WWW




    [Figure: Num. of websites and num. of users per year, 2000-2013]

    The growth rate of users is much higher than that of websites
    The growth rate of clicks would be even higher


    Challenges in Modern Information Retrieval

    How to bridge such a semantic gap

    A word is worth a thousand pictures

    A picture is worth a thousand words


    Scalability in the age of BIG data (1)

    A glance at big data today
        1.1 billion websites as of Nov. 2014
        >3,000 images uploaded to Flickr every minute (statistics collected on Apr. 28th, 2010)
        >200,000 videos uploaded per day to YouTube (>1,000 years)
        TV News: thousands of hours of programs broadcast each day
        >100 billion photos on Facebook as of Jun. 2011

    Challenges: facilitate fast browsing and sharing
        How to store?
        How to organize?
        How to retrieve?

    Scalability in the age of BIG data (2)

    Given the thickness of one photo: 0.2 mm
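
    The figure for this slide is not recoverable here, but the arithmetic it hints at is easy to sketch. Assuming the ">100 billion photos" quoted for Facebook were printed and stacked, each 0.2 mm thick, the stack would be roughly 20,000 km high. The function name `stack_height_km` is hypothetical, and pairing the photo count with the thickness is an assumption about what the slide intended.

```python
PHOTO_THICKNESS_MM = 0.2        # thickness of one photo, from the slide
NUM_PHOTOS = 100_000_000_000    # ">100 billion photos" (a lower bound)
MM_PER_KM = 1_000_000           # 1 km = 1,000,000 mm


def stack_height_km(num_photos, thickness_mm):
    """Height in kilometres of a stack of printed photos."""
    return num_photos * thickness_mm / MM_PER_KM


if __name__ == "__main__":
    height = stack_height_km(NUM_PHOTOS, PHOTO_THICKNESS_MM)
    print(f"Stack height: about {height:,.0f} km")
```

    Even this lower bound is a stack thousands of kilometres high, which is the point of the slide: storage, organization and retrieval all have to work at that scale.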

    Top Rank Search Engines

    Google takes the lion's share of the market

    Baidu is not in the rank (unfortunately)

    Cited from: http://www.ebizmba.com/articles/search-engines

    Sketch the framework of a search engine

    Draw the framework of a search engine in 5 minutes

    Put in all the elements you can think of, do it now ...


    Framework of a search engine

    Observations
        Information is highly distributed on the Internet
        The indexer (search engine) keeps information in a centralized manner
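
    The two observations can be illustrated with a toy inverted index: pages live at many different places, while the index that answers queries is a single central structure. This is a minimal sketch, not the framework drawn on the slide. The URLs and the names `build_index` and `search` are hypothetical, and whitespace tokenization with AND-only queries is a simplifying assumption.

```python
from collections import defaultdict


def build_index(pages):
    """Map each term to the set of URLs whose text contains it.

    pages: dict of url -> page text, standing in for distributed content.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages hosted on three different sites.
PAGES = {
    "http://a.example.org/ir": "information retrieval with text models",
    "http://b.example.org/cv": "image retrieval with SIFT features",
    "http://c.example.org/ml": "text classification with SVM",
}
INDEX = build_index(PAGES)
```

    With these pages, `search(INDEX, "text retrieval")` returns only the first URL, because it is the only page containing both terms. The query never touches the original sites; it is answered entirely from the central index.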


    Structure of a crawler


    The crawler plays a very important role
    Experiences of using Baidu and Google
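
    The diagram for this slide is not reproduced in the text. As a rough illustration of what a crawler does, the sketch below assumes a breadth-first design with a frontier queue and a set of already-seen URLs, which is a common textbook structure and not necessarily the one on the slide. `TOY_WEB`, `crawl` and `fetch` are hypothetical names, and the in-memory link graph stands in for real HTTP requests.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "missing"],
    "d": [],
}


def crawl(seed, fetch, max_pages=100):
    """Visit pages breadth-first from the seed and return them in visit order.

    fetch(url) returns the list of outgoing links, or None if the page
    cannot be retrieved.
    """
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        links = fetch(url)
        if links is None:      # dead link: skip it
            continue
        visited.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

    Calling `crawl("a", TOY_WEB.get)` visits each reachable page once, even though "c" links back to "a", and it skips the link that cannot be fetched. A real crawler would add politeness delays, robots.txt handling and URL normalization on top of this loop.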


    Q & A


    Thanks for your attention!


