
Searching the internet - what patent searchers should know

Date post: 16-Apr-2017
Upload: eric-sieverts
Page 1: Searching the internet - what patent searchers should know

searching the internet what patent searchers should know

Eric Sieverts

WON, 11-12-2012

UB Utrecht HvA-MIC GO Opleidingen

Page 2: Searching the internet - what patent searchers should know

agenda

• searching the web
• the volatile google landscape
• search tools and functionality
• google options
• beyond google
• beyond general web search

Page 3: Searching the internet - what patent searchers should know

agenda

[diagram: general web search versus specific material search]
• general web search: the general web ?=? everything; how to …
• specific material search: importance of specific material types? how to …; when & why

Page 4: Searching the internet - what patent searchers should know

an ever-changing google landscape

• unreliable numbers
• irreproducible results
• disappearing functions
• changing interfaces

Page 5: Searching the internet - what patent searchers should know

"coping" with numbers of results

• expectations about how combining terms affects the number of results are generally met in structured databases, but:

• with Google (and other web search engines) the numbers are unstable, irreproducible and unreliable, with inexplicable effects
– refining by adding an AND-relation can give a larger number
– expanding by adding an OR-relation can give a smaller number
– numbers are only a quick extrapolation from part of the search index
– depends on distribution of the index over servers
– depends on Google version, browser, whether logged in, history, ...
– depends on Bing geographic setting

• Danny Sullivan explains why Google can not calculate: http://searchengineland.com/why-google-cant-count-results-properly-53559

Why Google Can’t Count Results Properly
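The expectation that holds in a structured database can be illustrated with a small sketch. The index below is hypothetical toy data, not taken from any real system: adding an AND-term can never increase the count and adding an OR-term can never decrease it. Web search engines break this because their numbers are estimates, not exact counts.

```python
# Toy inverted index: term -> set of document ids (hypothetical data).
INDEX = {
    "catalytic": {1, 2, 3, 5},
    "converter": {2, 3, 8},
    "greenhouse": {4, 5, 9},
}


def count_and(*terms):
    """Number of documents containing every term (set intersection)."""
    sets = [INDEX.get(term, set()) for term in terms]
    return len(set.intersection(*sets)) if sets else 0


def count_or(*terms):
    """Number of documents containing at least one term (set union)."""
    return len(set().union(*(INDEX.get(term, set()) for term in terms)))


base = count_and("catalytic")
narrowed = count_and("catalytic", "converter")   # refining with AND
broadened = count_or("catalytic", "greenhouse")  # expanding with OR
print(base, narrowed, broadened)
```

In an exact index `narrowed <= base <= broadened` always holds; when a web search engine violates it, that is a sign the displayed number is an extrapolation.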

Page 6: Searching the internet - what patent searchers should know

Google as a vanishing engine

some services and options disappear completely
– timeline, wonder wheel, toolbar, ...
– + operator
– real time results, code search
– google buzz, google wave, ...

others are only hidden
– links for advanced search and for settings hidden under “cog wheel” (sometimes dependent on browser)
– Scholar and Directory no longer mentioned in drop down menus
– backlink search no longer in advanced search
– search for "similar" pages & "cache"-link are hidden in "invisible" pop-up page preview
– …

Page 9: Searching the internet - what patent searchers should know

like faceted search in for instance Scopus

Page 10: Searching the internet - what patent searchers should know

refinements and additional functions
like in modern "web scale discovery" systems

but meanwhile this is already an "old" interface!

Page 11: Searching the internet - what patent searchers should know

google.nl [until 2 weeks ago]

from clear left column facets & refinements
to blurry top menus (for mobile users?)

google.com

Page 12: Searching the internet - what patent searchers should know

all options by material type, in old interface

Page 13: Searching the internet - what patent searchers should know

search terms: Google is thinking for us

Google tries to improve and to broaden your queries
• automatic spelling corrections (veilgheid >> veiligheid)
• searches for words with the same word stem (singular/plural, verb conjugation, inflection, …)
• expands acronyms (jfk >> john f kennedy | wwii >> world war II)
• adds synonyms (vaccination >> immunization)
• transforms separate words into a compound term & vice versa (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
• may treat a term as optional if it is not differentiating enough
• personalises search, based on previous search behaviour

more often and more elaborately in English than in Dutch
you are never sure what is applied, or when

and if you don't like all of this ........ >> "verbatim"
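The effect of these rewrites, and of switching them off, can be sketched with a toy model. This is only an illustration: the lookup tables and the `expand_query` function are hypothetical (the example pairs are the ones from the slide), and Google's real behaviour is neither documented nor this simple.

```python
# Hypothetical rewrite tables; the example pairs come from the slide.
ACRONYMS = {"jfk": "john f kennedy", "wwii": "world war II"}
SYNONYMS = {"vaccination": "immunization"}
COMPOUNDS = {"catfood": "cat food"}


def expand_query(query, verbatim=False):
    """Return the query variants a broadening engine might also search for.

    With verbatim=True the query is left exactly as typed.
    """
    if verbatim:
        return [query]
    words = query.split()
    variants = [query]
    for table in (ACRONYMS, SYNONYMS, COMPOUNDS):
        for term, replacement in table.items():
            if term in words:
                # Replace only the matching word, keep the rest of the query.
                variants.append(
                    " ".join(replacement if w == term else w for w in words)
                )
    return variants


print(expand_query("jfk vaccination"))
print(expand_query("jfk vaccination", verbatim=True))
```

The point for a patent searcher is reproducibility: with the broadening active, the set of terms actually searched is larger than, and different from, the set that was typed.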

Page 15: Searching the internet - what patent searchers should know

verbatim: new option introduced early 2012

on google.nl: "woord voor woord"

recently moved to top menu

Page 18: Searching the internet - what patent searchers should know

standard semantic coding allowed Google to make a recipe search engine

"embedded metadata": standardisation of property descriptions in the HTML of recipe pages, with "microformats" / "rich snippets markup"
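What such embedded metadata looks like to a crawler can be sketched as follows. The sample page is made up, and schema.org microdata (`itemprop` attributes) is assumed here as one possible markup flavour; microformats proper use class names instead. The collector class is a hypothetical helper built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Made-up recipe page fragment using schema.org microdata (an assumption).
SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Pea soup</h1>
  <meta itemprop="cookTime" content="PT2H">
  <span itemprop="recipeIngredient">split peas</span>
  <span itemprop="recipeIngredient">celeriac</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect itemprop values from 'content' attributes or element text."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            self.properties.setdefault(name, []).append(attrs["content"])
        else:
            self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.properties.setdefault(self._current, []).append(data.strip())
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


collector = ItempropCollector()
collector.feed(SAMPLE)
print(collector.properties)
```

Because every recipe site labels the same properties in the same way, a search engine can offer filters on ingredients or cooking time without understanding the page text.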

Page 19: Searching the internet - what patent searchers should know

Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties (but only in English)

Page 20: Searching the internet - what patent searchers should know

dates

??

no:

Page 21: Searching the internet - what patent searchers should know

publication dates

• limitation while searching google
– before search: only "past day/week/month/year"
– after search: also limitation on custom range "from .. to .."

search tools:

Page 22: Searching the internet - what patent searchers should know

publication dates

• how reliable are google's dates? NOT
• how else to determine the date?
– look at the page text (especially top and bottom, or blogging date)
– look in the page source (HTML) for metadata
– try entering javascript in the browser URL bar, but this does NOT work for CMS generated pages:
  javascript:alert(document.lastModified)
– try to find a recent time-stamped version in the Web Archive (waybackmachine)
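Looking in the page source for date metadata can be partly automated. The sketch below is a minimal example under stated assumptions: the list of meta names is a guess at common conventions, not a standard, and the sample HTML and header value are invented. It only parses strings; fetching the page is left out.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Assumed, non-exhaustive list of meta names that often carry a date.
DATE_META_NAMES = {
    "date",
    "dcterms.modified",
    "last-modified",
    "article:published_time",
    "article:modified_time",
}


class DateMetaParser(HTMLParser):
    """Collect <meta> tags whose name/property suggests a date."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = (
            attrs.get("name") or attrs.get("property") or attrs.get("http-equiv") or ""
        ).lower()
        if key in DATE_META_NAMES and attrs.get("content"):
            self.dates[key] = attrs["content"]


def dates_from_html(html):
    """Return a dict of date-like meta values found in the HTML source."""
    parser = DateMetaParser()
    parser.feed(html)
    return parser.dates


def parse_last_modified(header_value):
    """Parse an HTTP Last-Modified header value into a datetime."""
    return parsedate_to_datetime(header_value)


sample = '<head><meta name="date" content="2012-12-11"><meta name="author" content="x"></head>'
print(dates_from_html(sample))
print(parse_last_modified("Tue, 11 Dec 2012 09:30:00 GMT"))
```

As with `document.lastModified`, a date found this way on a CMS generated page often only reflects when the page was rendered, so it is evidence to weigh, not proof of a publication date.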

Page 25: Searching the internet - what patent searchers should know

previous (disappeared) versions of pages

• recently disappeared: try a search engine cache (not just google!):
– Bing
– Yahoo
– Exalead

Page 26: Searching the internet - what patent searchers should know

previous (disappeared) versions of pages

• older versions: try web archive (waybackmachine)

• links within same site are mostly working

• if a particular page has not been crawled, you get an overview of which other pages on the website have been crawled
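Web Archive lookups follow a predictable URL pattern, which makes them easy to script. The helper names below are hypothetical; the URL patterns are the ones the Wayback Machine publicly uses, but treat the details as assumptions to verify. The sketch only builds the URLs and does not contact the archive.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(url, timestamp):
    """URL of the archived copy closest to timestamp (YYYYMMDD or longer)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def wayback_calendar_url(url):
    """URL of the overview page listing all captures of a page."""
    return f"https://web.archive.org/web/*/{url}"


def wayback_availability_query(url, timestamp=None):
    """Query URL for the archive's JSON availability lookup."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


page = "http://www.domain.zz/folder/file.html"
print(wayback_snapshot_url(page, "20121211"))
print(wayback_calendar_url(page))
print(wayback_availability_query(page, "20121211"))
```

The capture timestamp in the resulting archive URL is useful evidence that a page existed in that form on that date.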

Page 28: Searching the internet - what patent searchers should know

reliability - general

general website assessment criteria
• professional lay-out
• indication of author/organisation (“about us”)
• data about organisation: address, telephone, map/driving directions
• indication of targeted audience
• not too many advertisements and pop-ups (although every site has them)
• clear navigation
• internal search option
• speed of web server
• backlinks from well known organisations **
• up to date-ness (with date given)
• language use
• interpret the URL/domain-name (eg: edu, edu.au, edu.sg, edu.ng, edu.lb, ac.uk, gov, gov.uk, gov.hk, gov.au, gov.on.ca, gob.es, gob.mx, gob.ve, gob.ec, ...)
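Interpreting the domain name can be sketched as a simple suffix check. The suffix lists below contain only the examples given above and are far from complete; `classify_url` is a hypothetical helper, and a matching suffix is a hint about the kind of organisation, not a guarantee of reliability.

```python
from urllib.parse import urlparse

# Suffixes taken from the examples on the slide; deliberately incomplete.
ACADEMIC = ("edu", "edu.au", "edu.sg", "edu.ng", "edu.lb", "ac.uk")
GOVERNMENT = (
    "gov", "gov.uk", "gov.hk", "gov.au", "gov.on.ca",
    "gob.es", "gob.mx", "gob.ve", "gob.ec",
)


def classify_url(url):
    """Return 'academic', 'government' or 'unclassified' for a full URL."""
    host = (urlparse(url).hostname or "").lower()
    for label, suffixes in (("academic", ACADEMIC), ("government", GOVERNMENT)):
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return label
    return "unclassified"


for candidate in (
    "https://www.ox.ac.uk/about",
    "http://www.example.gov.on.ca/",
    "http://www.example.gob.mx/",
    "http://www.uu.nl/",
):
    print(candidate, "->", classify_url(candidate))
```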

Page 29: Searching the internet - what patent searchers should know

reliability - organisation

Information about organisation:
• Google pagerank (backlinks)
  use for instance: http://www.prchecker.info/
  http://www.checkpagerank.net/

• Alexa rank (web traffic)
  see for instance: http://www.alexa.com/
  http://www.seomastering.com/alexa-rank-checker.php

• domain owner:
  use for instance: http://centralops.net/co/DomainDossier.aspx
  http://whois.domaintools.com/

• search for "backlinks"

Page 30: Searching the internet - what patent searchers should know

reliability - backlinks

search backlinks to particular web-page/-site
• Google: link:http://www.domain.zz/folder/file.html
  very incomplete result
• Yahoo site explorer: died last year
• DuckDuckGo: link:http://www.domain.zz/folder/file.html
  no total number of results displayed
• OpenSiteExplorer: linking pages + linking domains
  very complete; also domain & page authority
  paid subscription for more than 3 queries a day
• Exalead: link:http://www.domain.zz/
  no backlinks to specific page, but to whole site

Page 31: Searching the internet - what patent searchers should know

some more "how to" (google)

• domain search: site:edu OR site:edu.* [for all edu (sub)domains]

• url search: inurl:novelty

• title search: intitle:catalytic

• filetype search: filetype:pdf
  filetype:xls OR filetype:xlsx
  filetype:doc OR filetype:docx
  filetype:rss
  (more than shown in advanced search drop-down)

• exact search: "greenhouses" [or VERBATIM for all words]
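These operators can be combined in one query string, which is convenient when the same search has to be repeated or documented. The sketch below uses hypothetical helper names; it assumes that OR applies to the terms directly next to it, as in the examples above, and it only builds the URL without sending the search.

```python
from urllib.parse import urlencode


def build_query(terms="", sites=(), inurl=None, intitle=None, filetypes=()):
    """Combine plain terms with site:, inurl:, intitle: and filetype: operators."""
    parts = []
    if terms:
        parts.append(terms)
    if sites:
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if filetypes:
        parts.append(" OR ".join(f"filetype:{f}" for f in filetypes))
    return " ".join(parts)


def google_url(query):
    """Return a Google search URL for the query (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("catalytic", sites=["edu", "edu.*"], filetypes=["pdf"])
print(query)
print(google_url("intitle:catalytic filetype:pdf"))
```

Keeping the literal query string alongside the date of the search gives at least a record of what was asked, even though the results themselves are not reproducible.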

Page 34: Searching the internet - what patent searchers should know

web search engines to try besides google
• Bing (microsoft, large)
• Yahoo! (content=Bing, large)
• Exalead (french, many advanced functions, primarily demo system)
• Blekko (uses slashtags to search more [domain-] selectively; also many predefined slashtags, e.g. /likes for Facebook)
• DuckDuckGo (assures privacy, no personalisation, no filter-bubble, !Bang-function offers many extras, rather small)
• Gigablast ("green" search engine, rather small, a few unique functions)
• Millionshort (leaves out results from most popular sites: the long tail)
• Wolfram|Alpha (knowledge engine, facts, calculations)

together, these others have 30% market share in US; in NL only 3%

• Yandex (in Russia more popular than Google)
• Baidu (in China more popular than Google)
• Naver, Daum (in South Korea more popular than Google)
• Seznam (in Czechia more popular than Google)

Page 35: Searching the internet - what patent searchers should know

material type specific search

blogs: google blogs, icerocket, technorati

[rss]: CTRLQ, RSS SearchHub

video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news

images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search)

science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov

news: google news, yahoo news, bing news, cnn, bbc, historische kranten KB, historic american newspapers (LOC)

tweets: twitter search, topsy, tweetzi, tweetscan, postpost, snapbird

social: socialsearcher, socialmention, samepoint, whostalkin, kurrently

forums: google groups, omgili, boardtracker

Page 41: Searching the internet - what patent searchers should know


tweets & social search

• Twitter in 140 characters
– often with shortened links
– often with photo- or video-link
– often with hashtags (#agreeduponkeyword)

• search (often limited to last 1 - 2 weeks, and .... to those 140 characters)
– twitter (also advanced search)
– topsy, tweetzi, …
– postpost (your own timeline - i.e. everything you're following)
– snapbird (all tweets by 1 person - enter his/her twittername)
– tweetscan (on limited scale also older messages)
– twicsy (photos on twitter)
– ...

overview/review of tools: All the easiest ways to search old tweets

Page 49: Searching the internet - what patent searchers should know


tweets & social search

• “Real time / social search engines”
– socialsearcher, socialmention, samepoint, whostalkin, kurrently, … (tweets + blogs + facebook + …)
– Google personal results / Google+ ("search plus your world")
– real-time pictures: skylines

• Forum discussions
– omgili, boardtracker, ...
– Google groups (also old newsgroup discussions)

for research methods:
– advice from Henk van Ess (Dutch): "de digitale detective" (2012)
– How to: use social media in newsgathering (2012)
– 100+ Social Media Monitoring Tools (2010)
