Updating Computer Science Education
Jacques Cohen, Brandeis University, Waltham, MA, USA
January 2007
Topics
Preliminary remarks
Present state of affairs and concerns
Objectives of this talk
Trends (hardware, software, networks, others)
Illustrative examples
Suggestions
Present state of affairs and concerns
Huge increase in PC and internet usage.
Decreasing enrollment (USA mainly).
Possible Reasons
Previous high school preparation
Bubble burst (2000) + outsourcing
Widespread usage of computers by lay persons
Interest in interdisciplinary topics (e.g., biology, business, economics)
Public perception about: What is Computer Science?
The Nature of Computer Science
Two main components:
Theoretical and Experimental
Mathematics and Engineering
What characterizes CS is the notion of algorithms
Emphasis on the discrete and logic
An interdisciplinary approach with other sciences may well revive the interest in the continuous (or the use of qualitative reasoning)
Related fields
Sciences in general (scientific computing), Management, Psychology (human interaction), Business, Communications, Journalism, Arts, etc.
The role of Computer Science among other sciences
(How we are perceived by the other sciences)
In physics, chemistry, biology, nature is the ultimate umpire. Discovery is paramount.
In math and engineering: aesthetics, ease of use, acceptance, permanence play key roles.
Uneasy dialogue with biologists
It is not unusual to hear from a physicist, chemist or biologist:
"If computer scientists do not get involved in our field, we will do it ourselves!!"
It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century.
Differences in approaches
Most scientific and creative discoveries proceed in a bottom-up manner
Computer scientists are taught to emphasize top-down approaches
Polya's "How to Solve It" often mentions: first specialize, then generalize.
Hacking is beautiful (mostly bottom-up)
Objectives
Provide a bird's eye view of what is happening in CS education (USA) and attempt to make recommendations about possible directions. Hopefully, some of it will be applicable to European universities.
Premise
Changes ought to be gradual and depend on resources and time constraints
First we have to observe current trends: generality, storage, speed, networks, others.
Trying to make sense of present directions.
Difficult and risky to foresee the future, e.g., PC (windows, mouse), internet, parallelism
Topics influencing computer science education
Trends in hardware, software, networks.
Huge volume of data (terabytes and petabytes)
Statistical nature of data
Clustering, classification
Probability and statistics become increasingly important
Trend towards generality
Need to know more about what is going on in related topics
A few examples:
Robotics and mechanical engineering
Hardware, electrical engineering, material science, nanotechnology
Multi-field visualization (e.g., medicine)
Biophysics and bioinformatics
Nature of data structures
Sequences (strings), streams
Trees, DAGs, and graphs
3D structures
Emphasis on discrete structures
Neglect of the continuous should be corrected (e.g., use of MatLab)
Trends on data growth
How Much Information Is There in the World?
The 20-terabyte size of the Library of Congress is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.
From Lesk: http://www.lesk.com/mlesk/ksg97/ksg.html
Library of Congress data (cont)
1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.
This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).
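The estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and takes the map and sound-recording totals as quoted, since the slides give no per-item size for them. The variable names are illustrative.

```python
# Back-of-the-envelope check of the Library of Congress estimate.
# Assumption: decimal units, i.e. 1 TB = 1,000 GB = 1,000,000 MB.
MB_PER_TB = 1_000_000
GB_PER_TB = 1_000

estimates_tb = {
    "books": 20_000_000 * 1 / MB_PER_TB,        # 20 million books at 1 MB each
    "photographs": 13_000_000 * 1 / MB_PER_TB,  # 13 million photos at 1 MB each
    "maps": 200.0,                              # quoted total for 4 million maps
    "movies": 500_000 * 1 / GB_PER_TB,          # 500,000 movies at 1 GB each
    "sound recordings": 2_000.0,                # quoted as "almost 2,000 TB"
}

total_tb = sum(estimates_tb.values())
total_pb = total_tb / 1_000

for name, tb in estimates_tb.items():
    print(f"{name:>16}: {tb:8.0f} TB")
print(f"{'total':>16}: {total_tb:8.0f} TB (about {total_pb:.1f} PB)")
```

The quoted figures add up to roughly 2,700 TB, which is consistent with "perhaps about 3 petabytes".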
How Much Information Is There in the World?
Lesk's Conclusions
There will be enough disk space and tape storage in the world to store everything people write, say, perform or photograph. For writing this is true already; for the others it is only a year or two away.
Lesk's Conclusions (cont)
The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the work of others rather than re-create it.
The huge volume of data implies:
Linearity of algorithms is a must
Emphasis on pattern matching
Increased preprocessing
Different levels of memory transfer rates
Algorithmic incrementality (avoid redoing tasks)
Need for approximate algorithms (optimization)
Distributed computing
Centralized parallelism (Blue Gene, Argonne)
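As a small illustration of incrementality (avoid redoing tasks), the sketch below keeps a running mean that is updated in constant time per new item instead of rescanning all the data. The class name `RunningMean` is hypothetical and not taken from the talk.

```python
class RunningMean:
    """Maintain the mean of a stream without storing or rescanning it."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, x):
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


stream = [2, 4, 6, 8]
rm = RunningMean()
for value in stream:
    rm.add(value)
print(rm.count, rm.mean)
```

Each update costs O(1), so processing the whole stream stays linear in its length, which is the property the list above asks for when data volumes are huge.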
The importance of pattern matching (searches) in large number of items
Pattern matching has to be "tolerant" (approximate)
Find closest matches (dynamic programming, optimization)
Sequences
Pictures
3D structures (e.g. proteins)
Sound
Photos
Video
Trends in computer cycles (speed)
Moore's law appears to be applicable until at least 2020
Use of supercomputers (2006)
Researchers at Los Alamos National Laboratory have set a new world's record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulations performed to date.
Graphical visualization of the simulation of a ribosome at work
Network transmission speed (Lambda Rail Net)
USA backbone
Trends in Transmission Speed
The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of 131.6 Gbps, beating their previous mark for peak throughput of 101 Gbps by 50 percent.
Trends in Transmission Speed II
The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10 minutes.
Trend in Languages
Importance of scripting and string processing
XML, Java, C++; trend towards Python, Matlab, Mathematica
No ideal languages
No agreement on what the first language ought to be
A recently proposed language (Fortress 2006)
From Guy Steele, The Fortress Programming Language, Sun Microsystems
http://iic.harvard.edu/documents/steeleLecture2006public.pdf
Fortress Language (Sun, Guy Steele)
Meta-level approach to teaching
Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
Hopefully, the same will occur in learning a topic in depth. Once in-depth research is taught using a particular area it can be extrapolated to other areas.
Increasing usage of canned programs or data banks. Typical examples: GraphViz, WordNet
Trends in Algorithmic Complexity
Overcoming the scare of NP problems (it happened before with undecidability)
3-SAT lessons
Mapping polynomial problems within NP
Optimization, approximate or random algorithms
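To make the idea of random algorithms for NP problems concrete, here is a minimal sketch of a random-walk search for 3-SAT, in the spirit of Schöning's algorithm. The talk does not name a specific method, so the choice of algorithm, the clause encoding (a positive integer k means variable k is true, a negative one means it is false) and the function names are all assumptions made for illustration.

```python
import random


def is_satisfied(clause, assignment):
    # A clause holds if at least one literal agrees with the assignment.
    return any((lit > 0) == assignment[abs(lit)] for lit in clause)


def random_walk_sat(clauses, num_vars, tries=200, seed=0):
    """Return a satisfying assignment, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(tries):
        # Start each try from a fresh random assignment.
        assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(3 * num_vars):
            unsatisfied = [c for c in clauses if not is_satisfied(c, assignment)]
            if not unsatisfied:
                return assignment
            # Flip one variable of a randomly chosen unsatisfied clause.
            var = abs(rng.choice(rng.choice(unsatisfied)))
            assignment[var] = not assignment[var]
        if all(is_satisfied(c, assignment) for c in clauses):
            return assignment
    return None


clauses = [(1, 2, -3), (-1, 3, 4), (2, -4, 3), (-2, -3, 4)]
solution = random_walk_sat(clauses, num_vars=4)
print(solution)
```

The search may fail on a satisfiable formula, so a `None` result is not a proof of unsatisfiability; that trade-off is typical of randomized approaches.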
Three Examples
Example I: The lessons of BLAST (preprocessing, incrementality, approximation)
Example II: The importance of analyzing very large networks (probability, sensors, sociological implications)
Example III: Time series (data mining, pattern searches, classification)
Example I (History of BLAST): sequence alignment
Biologists matched sequences of nucleotides or amino acids empirically using dot matrices
Dot matrices
No exact matching
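A dot matrix can be sketched in a few lines: cell (i, j) is marked when character i of one sequence equals character j of the other, so similar regions show up as diagonal runs. The two example sequences and the function names are made up for illustration.

```python
def dot_matrix(seq_a, seq_b):
    # matrix[i][j] is True when seq_a[i] and seq_b[j] are the same symbol.
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a, seq_b):
    # Text rendering: '*' for a match, '.' otherwise.
    rows = ["  " + " ".join(seq_b)]
    for a, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        rows.append(a + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


print(render("GATTACA", "GATCACA"))
```

The main diagonal is broken where the sequences differ, which is how inexact matches were spotted by eye.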
Alignment with Gaps
Dynamic Programming Approach
Dynamic programming complexity: O(n^2)
Two solutions with gaps
Complexity can be exponential for determining all solutions
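The dynamic programming approach to alignment with gaps can be sketched as a global alignment in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) and the function name `align` are assumptions for illustration, not taken from the slides. The table has (n+1) x (m+1) cells, hence the O(n^2) cost, and the traceback returns only one of possibly many optimal alignments.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = align("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

For this pair the gap can be placed opposite either of the two adjacent T's with the same score, a small instance of multiple solutions with gaps.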
The BLAST approach
Complexity is almost linear
Equivalent dot matrices would have 3 billion columns (human genome) and Z rows, where Z is the size of the sequence being matched against a genome (possibly tens of thousands)
BLAST Tricks
Preprocessing: compile the locations in a genome containing all possible "seeds" (combinations of 6 nucleotides or amino acids)
Hacking: follow diagonals as much as possible (BLAST strategy)
Use dynamic programming as a last resort
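The two tricks above can be illustrated with a toy sketch: index every 6-letter seed of the genome once, then look up the seeds of a query and extend each hit along its diagonal while the letters keep matching. This is not the actual BLAST implementation; the function names, the exact-match extension rule and the example sequences are assumptions, and the dynamic programming fallback is left out.

```python
from collections import defaultdict


def build_seed_index(genome, k=6):
    # Preprocessing: map each k-letter seed to all its positions in the genome.
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def extend_on_diagonal(genome, query, g_pos, q_pos, k):
    # Grow the exact match to the left and right along the diagonal.
    left = 0
    while (g_pos - left > 0 and q_pos - left > 0
           and genome[g_pos - left - 1] == query[q_pos - left - 1]):
        left += 1
    right = k
    while (g_pos + right < len(genome) and q_pos + right < len(query)
           and genome[g_pos + right] == query[q_pos + right]):
        right += 1
    return g_pos - left, q_pos - left, left + right


def find_hits(genome, query, index, k=6):
    # Returns (genome_start, query_start, length) tuples, longest first.
    hits = set()
    for q_pos in range(len(query) - k + 1):
        for g_pos in index.get(query[q_pos:q_pos + k], ()):
            hits.add(extend_on_diagonal(genome, query, g_pos, q_pos, k))
    return sorted(hits, key=lambda h: -h[2])


genome = "CCCCCGATTACAGGTCCCCC"
query = "AAGATTACAGGTAA"
index = build_seed_index(genome)
print(find_hits(genome, query, index))
```

The index is built once per genome and reused for every query, so the search cost depends mostly on the query length and the number of seed hits rather than on the full dot matrix.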
Lots of approximations but a very successful outcome
No multiple solutions
BLAST may not find best matches
The notion of p-values becomes very important (probability of matches in random sequences)
Tuning of the BLAST algorithm parameters
Mixture of hacking and theory
Advantage: satisfies incrementality
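The notion of a p-value (probability of matches in random sequences) can be illustrated empirically: score the real query, then score many shuffled copies of it and count how often chance does at least as well. This is a simplified sketch, not how BLAST computes its statistics (BLAST relies on an analytical model rather than shuffling). The score used here, the length of the longest common substring, and all names are assumptions chosen for brevity.

```python
import random


def longest_common_substring(a, b):
    # prev[j] = length of the common run ending at the previous row, column j.
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
        best = max(best, max(cur))
        prev = cur
    return best


def empirical_p_value(query, target, trials=200, seed=0):
    """Fraction of shuffled queries scoring at least as well as the real one."""
    rng = random.Random(seed)
    observed = longest_common_substring(query, target)
    letters = list(query)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if longest_common_substring("".join(letters), target) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)


observed, p = empirical_p_value("GATTACAGGT", "CCCCCGATTACAGGTCCCCC")
print(observed, p)
```

A small p-value suggests the match is unlikely to arise by chance in a random sequence with the same letter composition.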
Example II (Networks and Sociology)
Money travels (bills)
Probabilities P(time, distance)
Money travels
The entire process could be implemented using sensors.
Mimics spread of disease.
The impact of computing will go deeper into the sciences and spread more into the social sciences (Jon Kleinberg, 2006)
Example III (Time Series)
Illustrates data mining and how much CS can help other sciences
Slides from Dr Eamonn Keogh, University of California, Riverside, CA
Examples of time series
Time Series (cont 1)
Time Series (cont 2)
Time Series (cont 3)
Time Series (cont 4)
Time Series (cont 5)
Using Logic Programming in Multivariate Time Series (Sleep Apnea)
from G. Guimarães and L. Moniz Pereira
[Figure: multivariate sleep recording from 04:00:00 to 04:04:00 (vertical axis 0 to 9000) with the channels Airflow, Ribcage movements, Abdominal movements and Snoring. Annotated events (Event2, Event3, Event5, Tacet) include: no airflow without snoring; strong airflow with snoring; no ribcage and abdominal movements without snoring; strong ribcage and abdominal movements; reduced ribcage and abdominal movements without snoring.]
Back to curricula recommendations
Present status (USA) and suggested changes
Current recommended curricula: ACM, SIGCSE 2001 (USA)
1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)
From Domik G.: Glimpses into the Future of Computer Science Education, University of Paderborn, Germany
Changing Curricula
Two extremes
Increased generality and limited depth
Limited generality and increased depth
The two extremes in graphical form
[Figure: breadth (generality) versus depth]
The MIT pilot program for freshmen
At MIT there is a unified EECS department
Two choices for the first year course:
Robotics using probabilistic Bayesian approaches (CS)
Study of cell phones inside out (EE)
Concrete suggestions I
Teaching is inextricably linked to research.
Time and resources govern curriculum changes.
Gradual changes are essential.
Avoid overlap of material among different required courses.
If possible, introduce an elective course on current trends in computer science.
Deal with massive data even in intro courses.
Concrete suggestions II
When teaching algorithms stress the potential of:
Preprocessing
Incrementality
Parallelization
Approximations
Taking advantage of sparseness
Concrete suggestions III
Emphasize probability and statistics
Bayesian approaches
Hidden Markov Models
Random algorithms
Clustering and classification
Machine learning and data mining
Finally, …
Encourage interdisciplinary work.
It will inspire new directions in computer science.
Thank you!!
Future of Computer Intensive Science in the U.S. (Daniel Reed 2006)
Ten years – a geological epoch on the computing time scale. Looking back, a decade brought the web and consumer email, digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce and computational science.
It also brought spam, phishing, identity theft, software insecurity, outsourcing and globalization, information warfare and blurred work-life boundaries. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security?
What will it mean for research and education?
Daniel A. Reed is the director of the Renaissance Computing Institute. He also is Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.
Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World (Sangtae Kim, Purdue, 2006)
Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field.
But beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. But even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world and the consequential acceleration in economic growth.
Miscellaneous I
Claytronics
Game theory (economics - psychology)
Other examples in bioinformatics
Beautiful interaction between sequence (strings) and structures
Reverse engineering
In biology, Geography and Phenotype (external structural appearance) are of paramount importance
Systems Biology
Miscellaneous II
Crossword puzzle using Google
Skiena and statistical NLP