Date post: | 22-Jan-2018 |
Category: | Education |
Upload: | william-hall |
Session 11: Episode 3(3) —
Birth & explosion of the World Wide Web
William P. Hall, President, Kororoit Institute Proponents and Supporters Assoc., Inc. - http://kororoit.org - http://www.orgs-evolution-knowledge.net
Access my research papers from Google Citations
Tonight
From the point of view of information science, the last session considered how the growth of knowledge overwhelmed paper-based libraries and how computers changed scholars’ personal access to published knowledge.
Tonight we begin to explore how the Internet and World Wide Web grew out of research into broad-scale communications networks to become the technology that now gives billions of people nearly universal access to the bulk of the world’s externally preserved knowledge.
Episode 3(3) – Birth & explosion of the World Wide Web
The World Wide Web
– Web Origins and History
– Vannevar Bush’s Memex
– Tim Berners–Lee Invents the World Wide Web
– Basic Web Tools
– The Web Explodes
– How Much Knowledge Does the Internet Access?
Was the communications infrastructure of the Internet invented to retain command & control after a nuclear war?
– Hardware, standards, applications (glossed over in the book)
Some think DARPA invented the internet to help command and control survive a nuclear first strike
ARPA/DARPA (Defense Advanced Research Projects Agency) – Established 1958 to formulate and execute research and development projects to expand the frontiers of technology and science.
Packet-switching vs direct point-to-point networking – Data streams cut into standard-sized blocks wrapped in header information used by interface message processors (routers) to direct the contents to a particular destination
– One sending device can direct packets to many different destinations & vice versa
– Video: Computing Conversations: Vint Cerf on the History of Packets
– 1968-9 research project to develop packet-switching interfaces between different ARPA labs so computer resources could be shared
– Packet switching offered a solution for slow & unreliable connections, but it needed to cope with:
Multiple paths
Packets arriving out of order
Lost packets
Duplicated packets (i.e., the same packet received via different routes)
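To make the three problems above concrete, here is a minimal Python sketch of packetizing a message and reassembling it at the destination. The header fields (destination, message id, sequence number, total count) and the function names are hypothetical simplifications chosen for illustration; they are not the actual ARPANET IMP or TCP packet formats.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: a small header wrapped around one block of data."""
    dest: str      # where the routers should send it
    msg_id: int    # which message it belongs to
    seq: int       # position of this block within the message
    total: int     # how many blocks the message was cut into
    payload: bytes


def packetize(dest: str, msg_id: int, data: bytes, size: int = 8) -> list[Packet]:
    """Cut a data stream into standard-sized blocks, each wrapped in a header."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    return [Packet(dest, msg_id, n, len(blocks), b) for n, b in enumerate(blocks)]


def reassemble(received: list[Packet]) -> tuple[bytes | None, list[int]]:
    """Rebuild a message from packets that may be out of order, duplicated or lost.

    Returns (data, missing_seq_numbers); data is None while anything is missing.
    """
    by_seq: dict[int, Packet] = {}
    for p in received:
        by_seq.setdefault(p.seq, p)      # keep the first copy, ignore duplicates
    total = received[0].total if received else 0
    missing = [n for n in range(total) if n not in by_seq]
    if missing:
        return None, missing             # a real protocol would request retransmission
    return b"".join(by_seq[n].payload for n in range(total)), []


message = b"Packets may take different routes to one destination"
arrived = packetize("lab-B", 1, message)
random.seed(1)
random.shuffle(arrived)                  # simulate out-of-order arrival
arrived.append(arrived[0])               # simulate a duplicate via another route
data, missing = reassemble(arrived)
print(data == message, missing)          # True []
print(reassemble([p for p in arrived if p.seq != 2]))   # (None, [2])
```

Sequence numbers in the header are what let the receiver sort out-of-order packets, discard duplicates and notice gaps; detecting a gap is only half the job, since a real protocol must also arrange retransmission.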
Growth in technology and interconnections
First ARPANET message sent 1969; the network reached the East Coast in 1970
1972-1982 Gov’t funded research & infrastructure
– Backbone interconnecting universities & research labs
– Standards for exchanging text & digital data
1971 FTP (File Transfer Protocol), with many improvements over time
1973 Email (based on store & forward technologies)
1981 National Science Foundation (NSF) funded the Computer Science Network (CSNET), connecting additional CS departments
1982 Internet Protocol Suite standard (TCP/IP)
– End-to-end connectivity specifying how data should be packetized, addressed, transmitted, routed and received at the destination
– Transmission Control Protocol (TCP) controls assembly & disassembly of packets for network transmission
– Internet Protocol (IP) controls addressing
Video: Vint Cerf TCP/IP 40th Anniversary Event (11:48)
– Made it possible to inter-connect networks = “Internet”
Video: How did 'internetworking' become THE INTERNET? (with Vint Cerf)
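IP’s addressing role, and the way it lets separate networks be interconnected, can be illustrated with Python’s standard `ipaddress` module. The addresses below come from the documentation-only ranges of RFC 5737, and the `next_hop` rule is a hypothetical toy: real routers consult routing tables using longest-prefix matching.

```python
import ipaddress

# Hypothetical addresses from the documentation-only ranges (RFC 5737).
lab_network = ipaddress.ip_network("192.0.2.0/24")   # one site's network
host = ipaddress.ip_address("192.0.2.17")            # a machine on that network
remote = ipaddress.ip_address("198.51.100.5")        # a machine on another network


def next_hop(dest, local_net, gateway="gateway"):
    """Toy forwarding decision: deliver locally if the destination address is
    inside the local network, otherwise hand the packet to a gateway (router)."""
    return "deliver locally" if dest in local_net else f"forward to {gateway}"


print(lab_network.num_addresses)          # 256 addresses in a /24
print(next_hop(host, lab_network))        # deliver locally
print(next_hop(remote, lab_network))      # forward to gateway
```

The network prefix is what makes internetworking scale: a router only needs to know which gateway leads toward a whole block of addresses, not where every individual host lives.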
Exponential growth of host numbers largely driven by data & knowledge sharing (email & file sharing)
Hypertext adds cognitive links / relationships to the Internet
– Includes a variety of knowledge objects in the cognitive structure of a document
– Content begins to take on a life of its own
[Figure: growth of Internet host numbers, annotated with: ARPANET; Internet Protocol (TCP/IP) introduced 1982; Hypertext Transfer Protocol (HTTP) & Hypertext Markup Language (HTML); World Wide Web; Web browsers: Mosaic & Netscape]
Partial map of the Internet on January 15, 2005
See also the Lumeta map (2006); http://internet-map.net/
Web applications – What is the infrastructure good for?
Key ideas: Vannevar Bush’s Memex
Vannevar Bush – engineer
– During WWII headed the U.S. Office of Scientific Research and Development (OSRD)
– Initiation and early administration of the Manhattan Project
– 1945 Atlantic Monthly article “As We May Think”
Memex – see Life Magazine’s take on it
– Bush developed the concept in the 1930s
– Based on storing, indexing, & retrieving microfilm images
– Based on indexing textual/visual objects to one another as the knowledge worker developed concepts
– Applied the concept of “associative memory” to understand relationships among content objects (mapping the memory of an object against other objects)
– Also included the ability to annotate all relationships or links
Basis for the hypertext/hypermedia concepts developed by Ted Nelson & Doug Engelbart
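The Memex ideas above (objects indexed to one another, annotated links, and trails a reader can follow) map naturally onto a small graph. The following Python sketch is a hypothetical toy, loosely modelled on Bush’s bow-and-arrow example in “As We May Think”; the class, method names and item ids are invented for illustration, and Bush’s machine was of course electromechanical and microfilm-based, not software.

```python
from collections import defaultdict


class Memex:
    """Toy model of associative trails: items joined by annotated links."""

    def __init__(self):
        self.items = {}                   # item id -> content (e.g., a microfilm frame)
        self.links = defaultdict(list)    # item id -> [(linked item id, annotation)]

    def add(self, item_id, content):
        self.items[item_id] = content

    def link(self, a, b, note):
        """Tie two items together and annotate the association (both directions)."""
        self.links[a].append((b, note))
        self.links[b].append((a, note))

    def trail(self, start, max_steps=10):
        """Follow the first not-yet-visited association from each item in turn."""
        path, current = [start], start
        for _ in range(max_steps):
            nxt = next((t for t, _ in self.links[current] if t not in path), None)
            if nxt is None:
                break
            path.append(nxt)
            current = nxt
        return path


m = Memex()
m.add("turkish_bow", "Note on the Turkish short bow")
m.add("english_longbow", "Note on the English long bow")
m.add("elasticity", "Textbook section on elasticity")
m.link("turkish_bow", "english_longbow", "compare ranges")
m.link("english_longbow", "elasticity", "why the materials matter")
print(m.trail("turkish_bow"))   # ['turkish_bow', 'english_longbow', 'elasticity']
```

Storing the annotation on the link itself, rather than on the items, is what distinguishes this associative organization from a fixed catalog hierarchy.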
Invention of the World Wide Web
Tim Berners–Lee (1989-91)
– Hypertext as an organizational knowledge management system for preserving & managing knowledge at CERN
“a ‘web’ of notes with links (like references) between them is far more useful than a fixed hierarchical system… to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards” [1990. Information Management: A Proposal]
– Concept included application-independent standards:
HTML – markup tags to encode document formats & components, defined using a simple SGML document type definition (DTD)
HTTP – a request-response protocol implemented in the client-server computing model
URL – (1992-4) a way to express and locate the unique address of a file that is accessible on the Internet
– Two types of applications give life to the standards
Browser – end-user ‘client’ application for retrieving, presenting and traversing information resources on the World Wide Web
Web server – system storing, processing and delivering web pages to clients via HTTP
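How a browser turns a URL into an HTTP request-response exchange can be sketched with Python’s standard `urllib.parse`. This is an illustrative assumption-laden sketch: it uses the later HTTP/1.1 request form (Berners-Lee’s earliest HTTP, version 0.9, sent just a one-line `GET` with the path), the two helper functions are hypothetical, and the server response is made up so that no network access is needed.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Turn a URL into the text of a minimal HTTP/1.1 GET request."""
    parts = urlsplit(url)
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (f"GET {target} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")            # a blank line ends the request headers


def parse_status_line(response: str) -> tuple[str, int, str]:
    """Split the first line of a server response, e.g. 'HTTP/1.1 200 OK'."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


url = "http://info.cern.ch/hypertext/WWW/TheProject.html"
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.path)  # http info.cern.ch /hypertext/WWW/TheProject.html
print(build_get_request(url))
# A made-up response, as a web server might send it back:
print(parse_status_line("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"))
```

The division of labour follows the slide: the URL says which server and which document, HTTP carries the request and response, and the HTML body that follows the headers is what the browser renders.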
The Web transforms a communications infrastructure into a knowledge repository
Application-independent standards for use by anyone
Authoring tools to create content
– Text editors: SGML, HTML and XML are all expressed in ASCII characters, so they can be written using any character-based editor
– WYSIWYG editors try to show what the page will look like
– Structure editors show logical structure as well as WYSIWYG
Web servers to provide content
– Single PC in a home office
– Server farms, e.g., Google probably has more than 2 M servers
Browsers, e.g.,
– NCSA Mosaic (1993)
– Netscape Navigator (1994)
– Microsoft Internet Explorer (1995)
– Firefox (2002)
– Apple Safari (2003)
– Google Chrome (2008)
Search & retrieval engines
Content is useless if it cannot be found
Discovery & retrieval tools are essential
– Web directories were initially important but are now essentially extinct
Generally human-curated catalogs of websites organized by some conceptual categorization, e.g., DMOZ, Yahoo Directory
Labor-intensive and difficult to administer
– Automated search engines – technologically complex, vastly powerful
Web crawler visits linked web pages under the control of a policy to collect metadata and content for indexing
Indexing engine indexes web pages by content, metadata, and perhaps other factors, such as numbers of ingoing and outgoing links, according to search-engine-specific policy
Query processing applies input from the user against actively maintained indexes to identify relevant web pages, and returns links to these pages to the user
Rise and fall of the web portals
– Attempt to syndicate and provide access to a range of information retrieval & display tools via a single “easy to use” web page (e.g., Yahoo, Bigpond)
– For search, simplicity (e.g., Google) won the day
– Portal technology still provides front-ends to corporate intranets
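The three search-engine stages listed above (crawl, index, process queries) can be sketched end to end in Python using only the standard library. Everything here is a hypothetical toy: the three-page in-memory `WEB` dictionary stands in for fetching pages over HTTP, the index is a plain word-to-pages inverted index, and ranking by the number of incoming links is only a crude stand-in for the link-based factors real engines weigh.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collect the href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical in-memory "web": URL -> HTML (stands in for HTTP fetches).
WEB = {
    "/": '<h1>Home</h1><a href="/memex">Memex</a> <a href="/www">WWW</a>',
    "/memex": '<p>Bush described the memex and associative trails.</p><a href="/www">web</a>',
    "/www": '<p>Berners-Lee proposed hypertext links at CERN.</p><a href="/">home</a>',
}


def crawl(start):
    """Breadth-first crawler: visit each linked page once, keep its text and links."""
    seen, queue, pages = {start}, [start], {}
    while queue:
        url = queue.pop(0)
        parser = LinkAndTextParser()
        parser.feed(WEB.get(url, ""))
        pages[url] = (" ".join(parser.text), parser.links)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index (word -> set of URLs) plus a count of in-links per page."""
    index, inlinks = defaultdict(set), defaultdict(int)
    for url, (text, links) in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
        for target in links:
            inlinks[target] += 1
    return index, inlinks


def search(query, index, inlinks):
    """Return pages containing every query word, most linked-to first."""
    words = re.findall(r"[a-z]+", query.lower())
    hits = set.intersection(*(index.get(w, set()) for w in words)) if words else set()
    return sorted(hits, key=lambda u: (-inlinks[u], u))


pages = crawl("/")
index, inlinks = build_index(pages)
print(search("hypertext links", index, inlinks))   # ['/www']
print(search("home", index, inlinks))              # ['/www', '/']
```

Keeping the crawler, the index and the query processor separate mirrors the slide: each can follow its own policy, and the index can be rebuilt as the crawler revisits pages.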
Search engines and web portals were the killer applications that caused the Web to explode
Fuel for explosive infrastructure growth
Web (and Internet) highly subsidized by the US government
– Communications infrastructure
– Storage
Major fractions of the knowledge being placed in the Web were freely available to end users
Fuelled by the growing epistemic value of content that can be retrieved essentially for free, the Internet’s rate of growth was unprecedented in human history
– It soon grew beyond anything the government could economically support
Rise of the commercial Internet service provider (ISP)
– Similar organization and fees to commercial telecoms
– Web access as common as phones
Early growth of the Internet and Web
Date | Hosts (1) | Domains (2) | Web sites | WHR (%) (3)
1969 | 4 | | |
Jul 81 | 210 | | |
Jul 89 | 130,000 | 3,900 | – |
Jul 92 (4) | 992,000 | 16,300 | 50 | 0.005
Jul 93 | 1,776,000 | 26,000 | 150 | 0.01
Jul 94 (5) | 3,212,000 | 46,000 | 3,000 | 0.1
Jul 95 | 6,642,000 | 120,000 | 25,000 | 0.4
Jul 96 | 12,881,000 | 488,000 | 300,000 | 2.3
Jul 97 | 19,540,000 | 1,301,000 | 1,200,000 | 6.2
Jan 98 | 29,670,000 | 2,500,000 | 2,450,000 | 8.3
Jul 98 | 36,739,000 | 4,300,000 | 4,270,000 | 12.0
Jul 01 | 126,000,000 | 30,000,000 | 28,200,000 | 22.0
Source: Gromov 2011
(4) Experimental HTML
(5) Launch of the public Web
(1) A host is a domain name having an IP address record associated with it.
(2) A domain is a domain name that has name server (NS) records associated with it and subdomains or hosts within the global domain.
Web sites are specifically HTTP servers for HTML & other objects.
(3) Web sites to Hosts Ratio: roughly estimates the percentage of Web users who are trying to become Web authors by creating their own Web sites.
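The WHR column appears to be simply the number of web sites divided by the number of hosts, expressed as a percentage. The short Python check below recomputes it from the table’s own figures for July 1992 to January 1998; the results land close to the listed WHR values (the later rows in the source are rounded more loosely). The `whr` helper is a hypothetical name introduced only for this illustration.

```python
# (date, hosts, web sites) copied from the table above (Gromov 2011).
rows = [
    ("Jul 92", 992_000, 50),
    ("Jul 93", 1_776_000, 150),
    ("Jul 94", 3_212_000, 3_000),
    ("Jul 95", 6_642_000, 25_000),
    ("Jul 96", 12_881_000, 300_000),
    ("Jul 97", 19_540_000, 1_200_000),
    ("Jan 98", 29_670_000, 2_450_000),
]


def whr(hosts, sites):
    """Web sites to hosts ratio, as a percentage."""
    return 100 * sites / hosts


for date, hosts, sites in rows:
    print(f"{date}: WHR = {whr(hosts, sites):.3f}%")
```

Because both columns grow exponentially but web sites grow faster, the ratio itself climbs by orders of magnitude over these six years.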
Phenomenal growth
Some numbers (Witiger.Com)
– Number of Internet devices:
1984 1,000 (one thousand)
1992 1,000,000 (one million)
2008 1,000,000,000 (one billion)
– To reach 50,000,000 (fifty million) users it took the:
Telephone 38 years
Television 13 years
Internet 4 years
iPod 3 years
Facebook 2 years
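The device counts above imply a growth rate that can be made explicit. Assuming steady compound (exponential) growth between the listed years, which is a simplification, a growth factor of $N_1/N_0$ over $t$ years corresponds to an annual factor of $(N_1/N_0)^{1/t}$ and a doubling time of $t \ln 2 / \ln(N_1/N_0)$. The helper name below is hypothetical.

```python
import math


def annual_growth_and_doubling(n0, n1, years):
    """Average compound growth factor per year, and the implied doubling time."""
    factor = (n1 / n0) ** (1 / years)
    doubling = years * math.log(2) / math.log(n1 / n0)
    return factor, doubling


for (y0, n0), (y1, n1) in [((1984, 1_000), (1992, 1_000_000)),
                           ((1992, 1_000_000), (2008, 1_000_000_000))]:
    f, d = annual_growth_and_doubling(n0, n1, y1 - y0)
    print(f"{y0}-{y1}: x{f:.2f} per year, doubling every {d:.1f} years")
# 1984-1992: x2.37 per year, doubling every 0.8 years
# 1992-2008: x1.54 per year, doubling every 1.6 years
```

Each step is the same thousandfold increase, but the second took twice as long, so the doubling time roughly doubled even though the absolute growth was vastly larger.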
How much knowledge is held in the Web?
My primary interest is meaningful “content” (web pages, documents, books), not data
Three Webs
– Surface web – freely accessible to a browser
Inktomi: Jan 2000 1,000,000,000 pages
Notess (2006): Dec 2000 600,000,000; Dec 2001 1,500,000,000; Nov 2002 3,000,000,000; Feb 2004 4,000,000,000; 2006 20,000,000,000
Wikipedia: current 36,607,000 pages (~4 M content pages)
Google (2008): Jul 2008 1,000,000,000,000 unique URLs (w/o duplicates)
Indexed Web: current ~47,000,000,000 (Google)
Web Archive: current 8,083,803 (books & texts)
– Deep/hidden Web – requires a subscription or password to access, e.g.
e-Journals: the University of Melbourne Library accesses 116,279
– Some are available free on the web, most are not (Scholar indexes)
e-Book titles on Amazon: 6,911,733 (437,674 are free, the rest are not)
Subscription news, financial reports, other databases, etc.
– Dark Web – encrypted & deeply hidden content (TOR, privacy, hacking, …)
See Dr Gareth Owen 2015 Tor: Hidden Services and Deanonymisation
Quantification is difficult (~80% of visits to hidden services appear to be to child abuse sites)
Some other uses of the Web/Internet
Blogs (WordPress; Blogger)
Cloud apps (Google Docs; Office 365)
eCommerce (Kogan, eStore, Coles Online)
Entertainment media (e.g., Netflix; Foxtel)
Navigation & geolocation (e.g., Google Earth, Nearmap)
News media (e.g., Google News, Huffington Post, CNN)
Photography & Video (e.g., Flickr, Panoramio)
Self storage (Dropbox, Google Drive)
Sex/pornography
Social networking (e.g., Facebook; Twitter ; Meetup; LinkedIn)
Telephony & teleconferencing (Skype; Webex)
Video sharing (YouTube, Vimeo)
Some thoughts on the history, what it means, and where it is taking us
Why has the Web been so overwhelmingly successful?
Tony Smith (1995) “Why the Web” (before I knew it existed)
– A modest extra layer on established & working technologies
– Developers worked in real world with open collaboration
– URL is human-readable and printable way to address any Internet resource
– Climate for the Web established by a succession of grand visions
– Marc Andreessen built a user-friendly graphical interface
– Newbies rapidly found the Web effectively eliminated distribution and publication costs for desk-top publishing
Puts the evolutionary growth of knowledge into hyperdrive
The World Wide Web links a vast network of … actors, human, non-human, material and ethereal. The six above-listed causes of the Web’s success dance with those actors across a profusion of interconnections. The ideas of human visionaries become memes propagating an epidemic of Web ‘surfing’. The Web’s computer codes become epidemic across the Internet. Loops in the Web’s links, and in its actor-network, feed back positively and cybernetically—fuelling its continued near exponential growth and its ever-accelerating transformation into cyberspace proper. [Smith 1995]
Where is the Web likely to go in the future?
Trends
– Ubiquity is almost here now
– Increasing epistemic power
Web applications are making more and more decisions on their own before consulting their human users
Able to make decisions with ever-increasing information sources
– Generalization/convergence: more and more functions incorporated in single applications
e.g., Google Earth/Maps as a geolocated memory prosthesis
Future
– Increasing replacement (not extension) of human cognitive functions
E.g., spatial navigation
E.g., memory and recall (life-logging?)
– Emergent functions: Global brain?
– Burnout?
Next session
Wrapping up the Web – I’ve already covered concepts from most of the book sections listed below in earlier Meetup sessions
– Here I’ll say a bit more about how the technology carries out cognitive processes in the Web
– I hope Tony will join me in a free-form discussion of Web history and our experiences with it
– There is a lot more to be said about human interactions with the technology and the Web, but that will be left until after an Interlude where I take a much deeper look from physical and evolutionary points of view at the emergence and interrelations of life and knowledge
Episode 3(4) - Emerging cognition in the Web itself
Retrieving Value from the Web Semantically
Cataloging Approaches
Indexing Approaches
Using Portals
Multimedia
Wrapping Up the Web