1
Introduction to Computer Science
Jiaheng Lu
Department of Computer Science
Renmin University of China
www.jiahenglu.net
2
Today’s Class
What is Computer Science?
Subfields of Computer Science
3
What is Computer Science?
Bierman: Computer science is the study of algorithms:
how to conceive them and write them down, programming-in-the-small vs. programming-in-the-large
how to execute them (why does a machine act the way it does, what are limitations, what improvements are possible)
4
What is Computer Science? (v. 2)
Brookshear: "Computer Science is the discipline that seeks to build a scientific foundation for such topics as computer design, computer programming, information processing, algorithmic solutions of problems, and the algorithmic process itself."
Most fundamental concept of CS is an algorithm: a set of steps that defines how a task is performed
An algorithm is instantiated in a program and then executed on a machine
5
Brookshear's Diagram
Algorithm
Limitations of: theory of computation,…
Execution of: architecture, operating systems, networks,…
Communication of: software engineering,…
Analysis of: algorithmics,…
Discovery of: artificial intelligence,…
Representation of: data structures, programming language design,…
6
What is Computer Science? (v. 3)
Schneider and Gersting start with what computer science is not:
1. Computer science is not the study of computers. Fellows and Parberry: "Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes, or chemistry is about beakers and test tubes. Science is not about tools. It is about how we use them, and what we find out when we do."
7
What is Computer Science? (v. 3)
2. Computer science is not the study of how to write computer programs.
Programming is a very important tool for studying new ideas and building and testing new solutions.
A program is a means to an end (solving some problem), not the end in itself.
8
What is Computer Science? (v. 3)
3. Computer science is not the study of the uses and applications of computers and software.
Schneider and Gersting: "Learning to use a software package is no more a part of computer science than driver's education is a branch of automotive engineering."
A computer scientist works on specifying, designing, building, and testing software for others to use.
9
What is Computer Science? (v. 3)
Schneider and Gersting: Computer science is "the study of algorithms, including
1. their formal and mathematical properties
2. their hardware realizations
3. their linguistic realizations
4. their applications"
10
Schneider & Gersting's Diagram
Algorithmic Foundations of CS: design & analysis of algorithms,…
The Hardware World: computer organization,…
The Virtual Machine: assemblers, operating systems,…
The Software World: programming langs, compilers,…
Applications: artificial intelligence,…
Social Issues
11
What is Computer Science? (v. 4)
C.A.R. Hoare: the central core of computer science is "the art of designing efficient and elegant methods of getting a computer to solve problems"
D. Reed: Identifies 3 main themes:
hardware: circuit design, chip manufacturing, systems architects, parallel processing
software: systems software (e.g., operating systems), development software (e.g., compilers), applications software (e.g., web browsers)
theory: understand inherent capabilities and limitations of different models of computation (for instance, proving that certain problems CANNOT be solved algorithmically)
12
Subfields of Computer Science
Algorithms and Data Structures
Architecture
Operating Systems and Networks
Software Engineering
Artificial Intelligence and Robotics
Bioinformatics
Programming Languages
Databases and Information Retrieval
Graphics
Human-Computer Interaction
Computational Science
Organizational Informatics
13
Research in Theoretical Computer Science
14
Overview
Part I: Introduction to Theory of Computation.
Part II: Perspective on (immediate) relevance.
Part III: A current research direction.
Introverted Algorithms
Communication with errors: Meaning of bits
15
Part I: Introduction to Theory of CS
16
Theory of Computing
Mathematical study of Computation and its consequences.
Computation: Sequence of simple steps, leading to complex change in information.
Measures: Efficiency of algorithm/program:
Depends on hardware and implementation.
Can ask how it scales?
If I double the hardware capacity (speed/memory)
Will this increase the biggest size of problem I can solve by constant factor? (polynomial solution)
Or by additive constant? (exponential solution)
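The scaling question on this slide can be tried out numerically. The following is a minimal sketch and not part of the original slides: the helper name `largest_solvable` and the two cost functions (n^2 steps for the polynomial case, 2^n steps for the exponential case) are assumptions chosen for illustration.

```python
def largest_solvable(cost, budget):
    """Largest n with cost(n) <= budget, assuming cost is increasing in n."""
    hi = 1
    while cost(hi) <= budget:      # grow until the budget is exceeded
        hi *= 2
    lo = hi // 2                   # cost(lo) <= budget, or lo == 0
    while hi - lo > 1:             # binary search between lo and hi
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def quadratic(n):
    return n ** 2                  # assumed polynomial-time cost


def exponential(n):
    return 2 ** n                  # assumed exponential-time cost


if __name__ == "__main__":
    budget = 10 ** 6               # hypothetical number of steps we can afford
    for name, cost in [("n^2", quadratic), ("2^n", exponential)]:
        before = largest_solvable(cost, budget)
        after = largest_solvable(cost, 2 * budget)
        print(name, before, "->", after)
```

With a budget of 10^6 steps, doubling the budget moves the quadratic case from n = 1000 to n = 1414 (a constant factor of about 1.41), while the exponential case moves only from n = 19 to n = 20 (an additive constant).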
17
Theory of Computing
Mathematical study of Computation and its consequences.
Computation: Sequence of simple steps, leading to complex change in information.
Issues:
Algorithms: Design efficient sequence of steps that produce a desired effect. What is efficient?
Complexity: When is inefficiency inherent?
Implications: What effect does (in)efficiency have on human (intelligent) interaction?
Surprisingly broad in scope and impact.
18
Example: Integer Arithmetic
Addition: Linear!
  2 3 1 5 6 7
+   5 8 9 1 4
-------------
  2 9 0 4 8 1
Multiplication:
Factoring:
19
Example: Integer Arithmetic
Addition: Linear!
Multiplication: Quadratic! Fastest? Not Linear?
          2 3 1 5 6 7
          x 5 8 9 1 4
---------------------
          9 2 6 2 6 8
        2 3 1 5 6 7
    2 0 8 4 1 0 3
  1 8 5 2 5 3 6
1 1 5 7 8 3 5
---------------------
1 3 6 4 2 5 3 8 2 3 8
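The "linear" and "quadratic" labels can be checked by counting single-digit steps in the schoolbook methods. This is a minimal sketch under stated assumptions: the function names and the choice to count one step per digit operation are illustrative, not taken from the slides.

```python
def to_digits(n):
    """Decimal digits of n, least-significant first."""
    return [int(c) for c in reversed(str(n))]


def from_digits(digits):
    """Inverse of to_digits; leading zeros are harmless."""
    return int("".join(str(d) for d in reversed(digits)))


def add_digits(a, b):
    """Schoolbook addition. Returns (digits, number of digit steps)."""
    result, carry, steps = [], 0, 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(s % 10)
        carry = s // 10
        steps += 1                     # one step per digit column
    if carry:
        result.append(carry)
    return result, steps


def multiply_digits(a, b):
    """Schoolbook multiplication. Returns (digits, number of digit steps)."""
    result = [0] * (len(a) + len(b))
    steps = 0
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = result[i + j] + x * y + carry
            result[i + j] = t % 10
            carry = t // 10
            steps += 1                 # one step per pair of digits
        result[i + len(b)] += carry
    return result, steps


if __name__ == "__main__":
    a, b = to_digits(231567), to_digits(58914)
    total, add_steps = add_digits(a, b)
    product, mul_steps = multiply_digits(a, b)
    print(from_digits(total), add_steps)
    print(from_digits(product), mul_steps)
```

For the slide's operands (6 and 5 digits), addition takes 6 digit steps and multiplication takes 6 x 5 = 30, which is the linear versus quadratic growth the slide refers to.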
20
Example: Integer Arithmetic
Addition: Linear!
Multiplication: Quadratic! Fastest? Not-linear
Factoring? Write 13642538238 as product of two integers (each less than 1000000)
Inverse of above problem. Not known to be linear/quadratic/cubic.
Believed to require exponential time.
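For contrast with addition and multiplication, here is the most naive way to attack the factoring question: trial division. This is a minimal sketch, and the function name `factor_pair` and its `limit` parameter are assumptions for illustration. It tries up to about sqrt(n) candidates, which is exponential in the number of digits of n.

```python
from math import isqrt


def factor_pair(n, limit=1_000_000):
    """Return (a, b) with a * b == n and 1 < a <= b < limit, or None.

    Trial division: roughly sqrt(n) candidate divisors in the worst case,
    i.e. about 10 ** (d / 2) steps for a d-digit number.
    """
    for a in range(2, isqrt(n) + 1):
        if n % a == 0 and n // a < limit:
            return a, n // a
    return None


if __name__ == "__main__":
    print(factor_pair(13642538238))
```

The pair found need not be the one used on the multiplication slide; any two factors below the limit answer the question as posed.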
21
Fundamental quests of CS Theory
Algorithms: Given a task (e.g., multiplication) find fast algorithms.
First algorithm we think of may not be fastest.
Complexity: Prove lower bounds on resources required to solve problem.
Is multiplication harder than addition?
Is factoring harder than multiplication?
Implications: Cryptography …
Economics: Markets implement efficient computation.
Biology: Nature implements efficient computation.
Networks: Errors implement efficient computation.
22
Long-range questions
Is “P=NP?”
Formally, is all computation reversible? (e.g., multiplication vs. factoring?)
Philosophically, can every designer (mathematician, physicist, engineer, biologist) be replaced by a computer?
- (Most of us don’t expect this).
- Can we factor integers efficiently?
- (Hopefully, still no).
- If not, can we build secure communication based on this?
- Led to RSA. Still many challenges today.
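To illustrate how the difficulty of factoring can support secure communication, here is a toy version of textbook RSA. It is a sketch only: the tiny primes 61 and 53, the exponent 17, and the function names are illustrative assumptions, and real deployments use far larger keys plus padding schemes that are omitted here.

```python
def make_toy_keys(p=61, q=53, e=17):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q                      # public modulus; factoring n reveals p and q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def encrypt(message, public_key):
    n, e = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    n, d = private_key
    return pow(ciphertext, d, n)


if __name__ == "__main__":
    public, private = make_toy_keys()
    secret = 65
    ciphertext = encrypt(secret, public)
    print(ciphertext, decrypt(ciphertext, private))
```

Anyone who can factor n back into p and q can recompute the private exponent d, which is why the security of this scheme rests on factoring being hard.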
23
Modern addenda to long-term quests
Is the universe random?
Maybe … if so:
Can build efficient algorithms this way (modern examples due to Karger, Rubinfeld, Indyk, Kelner)
Can synchronize distributed systems (essential, as shown by Lynch et al.)
Can generate and preserve secrets (essential, as shown by Goldwasser and Micali).
Maybe not … if so:
Might still look random to us, because P ≠ NP. (Long history … Blum, Micali, Yao)
Is the universe quantum? Factoring easy
24
Current quests in computation
Algorithms for Massive data sets
How can we leverage the computational power of a laptop, to understand data such as the WWW?
Main issue: Massive data – won’t fit in our storage.
Factors in our favor:
We can perform random sampling
We don’t have to deliver “guaranteed answers”
Many Results [Karger, Vempala, Rubinfeld, Indyk]
Can tell if there’s a “trend change” [Rubinfeld et al.]
Can tell if a signal has high-intensity in some frequency. [Indyk et al.]
Underlying emphasis on Randomness.
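The two "factors in our favor" can be seen in a few lines: sample a small number of items at random and accept an approximate answer. This is a minimal sketch, not one of the cited results; the function name `estimate_fraction`, the fixed seed, and the example predicate are assumptions for illustration.

```python
import random


def estimate_fraction(items, predicate, sample_size, seed=0):
    """Estimate the fraction of items satisfying predicate by uniform sampling.

    Only sample_size items are inspected, regardless of len(items).
    The result is approximate, with no guaranteed answer.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    hits = 0
    for _ in range(sample_size):
        item = items[rng.randrange(len(items))]
        if predicate(item):
            hits += 1
    return hits / sample_size


if __name__ == "__main__":
    data = range(10_000_000)           # stand-in for a data set too large to scan
    print(estimate_fraction(data, lambda x: x % 10 == 0, 10_000))
```

The estimate's error shrinks with the sample size rather than the data size, which is what makes this approach attractive for massive data sets.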
25
Part II: Perspective of theory
26
History of theoretical CS
1930s: Turing – invented Turing machine.
Universality: One machine implements all algorithms.
Why? To model thought/reasoning/logic
theorems and proofs
Became foundation of modern computers (von Neumann)
1960s: Non-trivial algorithms:
Peterson – BCH decoder
Cooley-Tukey – FFT
Dijkstra – shortest paths
1970s: NP-completeness, Cryptography, RSA.
1990s: Internet algorithms (Yahoo!, Akamai, Google).
27
Theory vs. Practice
Theoretical Perspective
Focus on Long-term time horizon; not very close attention to current nature of:
Hardware
Domain-specific information
Solution feasibility
Why should you care (today?)
Lessons learned from past are useful (theories more important than theorems).
Good insight into problems of the future.
Occasionally … solutions useful today!
28
Database Research
29
Outline
Five challenges on database research
Structured and unstructured data
Declarative programming
Database engine revisiting
Cloud data management
Mobile application
Our research to meet those challenges
Database challenges:
Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus.
Laguna Beach, Calif. in 1989
Palo Alto, Calif. (“Lagunita”) in 1990 and 1995
Cambridge, Mass. in 1996
Asilomar, Calif. in 1998
Lowell, Mass. in 2003
Claremont Meeting
About 20 Database researchers
Claremont Resort, Berkeley, CA May 29-30, 2008
32
The interplay of structured and unstructured data(1)
Witnessing a growing amount of structured data
Millions of hidden databases (Deep Web)
Millions of HTML tables and Mashups
Web 2.0 services: photo and video websites
33
The interplay of structured and unstructured data(2)
Research challenge:
Extract structured meaning from unstructured data (IR, ML)
Querying and deriving insight from heterogeneous data
Keyword queries
Pay-as-you-go fashion
34
XML search (1)
XML twig query processing (SIGMOD’05, VLDB’05)
Problem Statement
Given an XML twig pattern Q, and an XML database D, we need to find ALL the matches of Q on D.
[Figure: an XML tree with nodes s1, s2, f1, p1, t1, t2; twig pattern: Section with branches Title and Figure]
Query answers: (s1, t1, f1) (s2, t2, f1) (s1, t2, f1)
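The problem statement can be made concrete with a brute-force matcher. This is a sketch only and not the algorithm from the cited papers. The tree shape, the "Paragraph" label for p1, and the reading of the pattern edges as ancestor-descendant are all assumptions, chosen so that the result agrees with the three query answers listed on the slide.

```python
from itertools import product

# Hypothetical tree: node id -> (label, child ids). The exact shape is an
# assumption; the slide's figure is not recoverable from the text.
TREE = {
    "s1": ("Section", ["t1", "s2"]),
    "t1": ("Title", []),
    "s2": ("Section", ["t2", "p1"]),
    "t2": ("Title", []),
    "p1": ("Paragraph", ["f1"]),
    "f1": ("Figure", []),
}


def descendants(tree, node):
    """All nodes strictly below node."""
    found, stack = [], list(tree[node][1])
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(tree[current][1])
    return found


def match_twig(tree, root_label, branch_labels):
    """Naive matching of a one-level twig with ancestor-descendant edges."""
    matches = []
    for node, (label, _children) in tree.items():
        if label != root_label:
            continue
        below = descendants(tree, node)
        candidates = [[d for d in below if tree[d][0] == wanted]
                      for wanted in branch_labels]
        for combo in product(*candidates):   # every combination is a match
            matches.append((node, *combo))
    return matches


if __name__ == "__main__":
    for match in sorted(match_twig(TREE, "Section", ["Title", "Figure"])):
        print(match)
```

This enumeration revisits subtrees repeatedly, so it serves only to pin down what a "match" is, not as an efficient method.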
35
XML search (2)
XML keyword search (ICDE’09)
Problem Statement
How to efficiently rank the results of XML keyword query
Contribution:
Extend TF/IDF by incorporating the structure of XML data
36
Approximate string search
Approximate string queries (ICDE’08,09)
Problem Statement
Given a collection of string data, how to efficiently perform approximate search
[Figure: a collection of "Star" strings (…, Schwarzenger, Samuel Jackson, Keanu Reeves) searched with the query "Schwarrzenger"]
Output: strings s that satisfy Sim(q,s)≤δ
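A baseline for this problem is to compare the query against every string in the collection. The sketch below assumes that Sim(q,s) is edit distance (the slide does not say which measure is used), and the function names are illustrative. It scans the whole collection, so it shows what the output should be rather than how to compute it efficiently.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def approximate_search(query, strings, delta):
    """All strings within edit distance delta of the query (linear scan)."""
    return [s for s in strings if edit_distance(query, s) <= delta]


if __name__ == "__main__":
    stars = ["Schwarzenger", "Samuel Jackson", "Keanu Reeves"]
    print(approximate_search("Schwarrzenger", stars, delta=1))
```

With the slide's data, the query "Schwarrzenger" is one deletion away from "Schwarzenger", so a threshold of 1 returns that string and nothing else.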
37
Revisiting database engines
Research topics
Remote RAM and flash as persistent media
Treat query optimization and physical data layout as a unified, adaptive, self-tuning task
Compressing and encrypting data with query optimization
Designing systems that embrace non-relational data models
38
Declarative programming for Emerging platforms (1)
Data-centric approach for emerging platforms
Manycore chips
Distributed services
Cloud computing platforms
…..
39
Declarative programming for Emerging platforms (2)
Good examples
Map-reduce:
  data-parallelism
Ruby, Rails
  query-like logic
XQuery
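The data-parallel style of map-reduce can be shown with a single-process word count. This is a minimal sketch: the function names are illustrative, and a real framework would run the map and reduce calls on many machines, which this example does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word; documents are independent."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework would do between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key; keys are independent."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The programmer states only what happens per document and per key; because those calls do not depend on each other, the platform is free to run them in parallel.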
40
Cloud data management (1)
Cloud service: shared commodity hardware for computing and storage
Application service (salesforce.com)
Storage service (Amazon Web service)
Computing service (Google App Engine)
Data service (Microsoft SQLServer data center)
41
Cloud data management (2)
Research challenge
Self-management database: limited human intervention, various workloads
Large scale query processing and optimization
Data security and privacy with sharing
42
Cloud data management
43
Research topics about cloud data (1)
Self management and self tuning
Query optimization on thousands of nodes
44
Research topics about cloud data (2)
Source scheduling
Investigate the way the scheduling algorithm is currently implemented.
Multi-Tenant-Efficient
45
Mobile applications
“On the go” interaction
Location based service
46
Thank you
Q & A
Download slides: www.jiahenglu.net