An Interagency Model for Collaboration and Operation
Sharon Jordan, Assistant Director
DOE Office of Scientific and Technical Information (Operating Agent for Science.gov)
CENDI Meeting, Nov. 4, 2010
Background - Relationship to CENDI - Funding - Operations - Milestones
What Is Science.gov?
An interagency science discovery tool, providing single-query access to multiple government-sponsored R&D results and other S&T information
A cross-agency search that integrates and simplifies access to 200 million pages of content from 14 U.S. science agencies
The “USA.gov” science portal (formerly “FirstGov for Science”)
A voluntary large-scale collaboration of U.S. government agencies
A Unique Collaboration with Tangible Results!
Drills down to selected databases and websites in parallel, then presents relevancy-ranked search results
Science.gov originated from two workshops:
2000: Blue-ribbon panel explored concept of a physical science information infrastructure. http://www.osti.gov/physicalsciences/wkshprpt.pdf
This prompted interagency involvement.
2001: “Strengthening the Public Information Infrastructure for Science” http://www.science.gov/workshop/index.html
Here the interagency Science.gov Alliance was formed.
Participants included federal agencies, academia, information professionals and science experts.
Science.gov gained approval as “FirstGov for Science” in early 2002.
Science.gov was launched in December 2002.
How Did It Begin?
Founding Agencies in 2001
Department of Agriculture
Department of Commerce
Department of Defense
Department of Education
Department of Energy
Department of Health and Human Services
Department of Interior
Environmental Protection Agency
National Aeronautics and Space Administration
National Science Foundation
New Alliance Members
Department of Transportation
Library of Congress
United States Government Printing Office
National Archives and Records Administration
Support and coordination by CENDI – an interagency forum of senior information managers
Alliance Only
United States Forest Service
National Institute of Standards and Technology
Shared Premises
Science is not bounded by agency, organization or geography
Each agency has vast stores of information that fulfill its mission
A single web gateway is the tool of choice*
A commitment to voluntary collaboration is necessary
*OCLC's Perceptions of Libraries and Information Resources reported that 84% of the public began their searches with search engines; only 1% began with online databases. Thus a “Google-like” easy search of authoritative sources with relevant results was desired.
Integration Challenges
Broad scope of Federal science and technology research and development missions
Wide-ranging interest of potential audiences
Information organization (taxonomy) issues given the broad scope of disciplines and audiences
Blending information resources from different agencies into cohesive functionality and page design
Politics, human resources, funding, sustainability
Guiding Principles for Content
Select authoritative web-based government-sponsored information resources
Rich science content, not merely organization pages
Databases contain primarily R&D results in the form of STI (bibliographic data and/or full documents)
Supplemented by websites for currency
Only freely available content that is well maintained
Our audience is “the science-attentive citizen”!
Agencies brought to the Internet table their unique information specialties and resources
Flagship service a commitment
Notable contributions of many:
Science.gov Alliance and CENDI - seized opportunity without mandate
FirstGov.gov - supported the early stages with advice and two grants
Member agencies - provided participation of 200 staff members to working teams
NLM - provided usability testing prior to initial launch
USGS - managed original website search engine (surface web search)
NTIS - created initial catalog of S&T websites
IIa Inc. - provided secretariat support (CENDI special task)
DOE/OSTI - conceived idea, developed technologies/deep web search and hosted website
NAL and USGS - provided Science.gov Alliance co-chairs
Alliance enjoyed extraordinary voluntary collaboration
Vision and strategic direction provided by Alliance principals
Administration provided by Chair(s) selected from Alliance
Technical team provided original technical direction and recommendations
Major support provided by CENDI
Additional task groups formed as needed
  Science.gov taxonomy
  Content guidance and development
  Website management and redesign
  Outreach activities
  Enhancement development
  Subject expansion
  Image library
Collaboration Is Key
The Funding Approach
Built and maintained with “in-kind” contributions: each agency’s staff time and existing information resources
Initial development benefitted from CIO Council e-gov grants for catalog + initial deep web search
Alliance annual dues help fund routine operations
CENDI support leverages resources
In-kind contributions supported special events
SBIR R&D resulted in innovations that were implemented in subsequent versions
“Pass the hat” contributions to take advantage of an opportunity, such as Version 3.0 development
Science.gov Funding
2001: Cross agency portal grants: $170,000
2002: DOE SBIR conducts relevancy ranking research
2003-2004: Voluntary Pass-the-Hat contributions: $200,000
2001-Present: Participating agencies and in-kind support develop and maintain Science.gov. Average since 2005 = approx $180K annually (fees plus in-kind support)
Doing “a lot with a little” by implementing creative funding methods
CENDI promotes the productive intersection of science content, technology and interrelationships
The Alliance, made up of CENDI agencies plus others, provides direction and support for this intersection in the form of Science.gov
Through financial and in-kind commitments from its agencies, CENDI provides the ongoing infrastructure needed to offer a large-scale collaboration across organizational boundaries
Overview of CENDI Finances
CENDI Reserve
Maintenance Costs include Alliance Only dues*
Executive Secretariat for CENDI Includes Science.gov Support
A portion of Secretariat effort is used for Science.gov Tasks
*Science.gov Alliance Only dues are deposited into the CENDI treasury, with option of being used for direct costs/purchases for Science.gov (such as exhibit expenses) or being included in funding for overall Secretariat support of Science.gov.
Total Membership Funds Are Combined into One “Pot”
Content Management Is Distributed
NTIS developed the original “catalog” with input from agencies
CENDI Secretariat now maintains catalog with agency participation
Agency content managers submit and edit their information via a web form
Websites identified in the catalog were indexed by USGS; now done by OSTI
Deep web databases are identified by agencies and reviewed by team for suitability
Real-time search of content in large databases is maintained by OSTI, which continues to host the website and serve as operations manager
Provides administrative information, meeting minutes, usage statistics, content selection and cataloging guidelines, subject category information, and outreach materials such as presentations and flyers.
The Alliance Members’ Page
User Name: scigov
PW: scigov#1
Provides Alliance members and content managers a secure tool to quickly retrieve Agency metadata, add or edit resource records, and expedite the maintenance and quality control of the metadata and URLs.
Metadata Input System: For Websites in Searchable Index (“Surface Web” portion of Science.gov)
Science.gov Phase 1 (2001-2002)
  Established policy & governance, technical design teams
  Agreed on goals, policies, website look & feel
  Created taxonomy
  Selected, cataloged and indexed agency resources
Version 2.0 launched May 2004
  Introduced relevancy ranking of metasearch results
  One-step search across ALL databases
  Added advanced search
Version 3.0
  Enhanced precision searching, metarank & boolean/fielded searching
  Other types of science content explored
Version 4.0
  Enhanced relevancy ranking, also full-text relevancy ranking
Development Milestones
Version 5.0 (Sept 2008)
  Clustering of results by subtopics or dates to help target your search
  Wikipedia results related to your search terms
  EurekAlert news results related to your search terms
  Mark-and-send option to email results to friends and colleagues
  More science sources for a more thorough search
  Enhanced information related to your real-time search
  New look and feel
  Updated Alerts Service
  Standardized citation formats available for download
Version 5.1
  Aggregated news feeds from 11 science agencies
  Internships and Fellowships section made searchable
  Image Search Library (Coming soon!)
Development Milestones
Science.gov: Finds Content from 200 Million Pages at 2100+ Websites and 42 Databases with One Query
Searches selected websites (“surface web”) and databases (“deep web”) from one search point
Combines results from all sources, ranks and displays by relevance and clusters
Sends weekly “alerts” for user-defined topics of interest
Displays related Wikipedia and EurekAlert items
Provides browsing of selected websites
Displays an integrated news feed from science agencies
Links to special collections and other information
Featured search and sites highlight hot topics
Science.gov Today
Less than 1% overlap with Google; approximately 3.2% overlap with Google Scholar
Goes where traditional search engines cannot go. Full-text documents, if searchable on the target site, are searchable via Science.gov.
Real-time search: if a target database adds a document or record, it is available on Science.gov immediately
During the query, the most relevant documents or records from each source are gathered (approximately 100-200 from each source), and then the combined set is relevancy ranked
Topic and date clusters for search results: subtopics and publication years are displayed on the fly to enable efficient drilling down
More About Science.gov You May Not Know
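The gather-then-rank behavior described on this slide can be illustrated with a minimal Python sketch. It is not the Science.gov implementation: the source names, records, `score` function, and `per_source_limit` parameter are all hypothetical stand-ins, and the toy term-matching score only takes the place of whatever relevancy ranking the real service uses. The sketch queries each source in parallel, keeps the top hits from each, ranks the combined set, and groups results by publication year.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str
    title: str
    year: int


# Hypothetical in-memory stand-ins for agency databases (illustrative data only).
SOURCES = {
    "agency_a": [
        Record("agency_a", "Climate change and crop yields", 2009),
        Record("agency_a", "Diabetes risk factors in adults", 2008),
    ],
    "agency_b": [
        Record("agency_b", "Regional climate models", 2010),
        Record("agency_b", "Sea level change observations", 2009),
    ],
    "agency_c": [
        Record("agency_c", "Diabetes treatment outcomes", 2010),
    ],
}


def score(record, query):
    """Toy relevance: fraction of query terms that appear in the title."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = record.title.lower().split()
    return sum(term in words for term in terms) / len(terms)


def search_source(name, query, per_source_limit):
    """Return the best-matching records from one source, capped at the limit."""
    hits = [r for r in SOURCES[name] if score(r, query) > 0]
    hits.sort(key=lambda r: score(r, query), reverse=True)
    return hits[:per_source_limit]


def federated_search(query, per_source_limit=100):
    """Query every source in parallel, then rank the combined result set."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [
            pool.submit(search_source, name, query, per_source_limit)
            for name in SOURCES
        ]
        combined = [record for future in futures for record in future.result()]
    return sorted(combined, key=lambda r: score(r, query), reverse=True)


def cluster_by_year(results):
    """Group ranked results by publication year for drill-down."""
    clusters = {}
    for record in results:
        clusters.setdefault(record.year, []).append(record)
    return clusters


if __name__ == "__main__":
    results = federated_search("climate change")
    for record in results:
        print(f"{score(record, 'climate change'):.2f}  {record.source}  {record.title}")
    print({year: len(group) for year, group in cluster_by_year(results).items()})
```

Because each source is queried at request time rather than from a prebuilt central index, a record newly added to a source would appear in the next search, which is the real-time property the slide describes.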
Science.gov Page View Totals (Dec 02 - Sep 10)
FY03: 751,180
FY04: 965,146
FY05: 1,793,483
FY06: 2,593,449
FY07: 2,591,717
FY08: 2,946,801
FY09: 5,166,126
FY10: 4,074,747
Usage Continues to Grow
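As a worked check on the page view chart, the short Python sketch below computes the cumulative total and the year-over-year change. It assumes the eight figures pair with FY03 through FY10 in the order listed, and it treats FY03 as a partial year because the chart starts in December 2002; the names `PAGE_VIEWS` and `year_over_year` are illustrative only.

```python
# Page view totals from the chart, assuming values pair with fiscal years in listed order.
PAGE_VIEWS = {
    "FY03": 751_180,  # partial year: the chart starts in Dec 2002
    "FY04": 965_146,
    "FY05": 1_793_483,
    "FY06": 2_593_449,
    "FY07": 2_591_717,
    "FY08": 2_946_801,
    "FY09": 5_166_126,
    "FY10": 4_074_747,
}


def year_over_year(views):
    """Percent change from each fiscal year to the next, keyed by the later year."""
    years = list(views)
    return {
        current: (views[current] - views[previous]) / views[previous] * 100
        for previous, current in zip(years, years[1:])
    }


total = sum(PAGE_VIEWS.values())
changes = year_over_year(PAGE_VIEWS)

if __name__ == "__main__":
    for year, pct in changes.items():
        print(f"{year}: {pct:+.1f}%")
    print(f"Total page views, FY03-FY10: {total:,}")
```

Read over the whole period, annual page views in FY10 are more than five times the FY03 figure, although individual years vary.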
Large voluntary collaboration between agencies is often cited as a model
Collaboration AND infrastructure served as model for WorldWideScience.org; Science.gov then became the U.S. content contribution
Also a model for ScienceEducation.gov
A top 10 Google result for “science” alongside other major science outlets
Provides core project for spin-offs such as Science Internships, Aggregated Science News, Science Image Search, and more!
Notable Achievements
Science.gov is among 10 government websites “meeting and exceeding” the Obama Administration’s transparency goals, according to a special report by Government Computer News, released July 27, 2009.
Science.gov In the News
Real Time Search?
Ads?
Relevancy Ranked?
All Govt. Science?
Known Sources?
Scholarly Info?
Content and Purpose: Science.gov vs Data.gov
Science.gov:
Searches for science topics at the full record level
Ease of searching, with immediate, useful results
For the science-attentive citizen including researchers, teachers, students, business people, and the general public
A Google-like interface with an advanced option for power users
Drills down into the “deep web”
Examples: 2668 results for diabetes from 35 sources; 2772 results for climate change from 38 sources
Data.gov:
Searches at the source level only, not at the record level
Interface with search results pointing only to sources or databases
Emphasizes machine-readable datasets, available in raw formats; some files are quite large, ranging up to hundreds of megabytes
Data generally requires additional manipulation; of limited use to general public. Expect public interest groups, reporters, academics, and others to review information, build interfaces, and report on findings
Examples: zero results for specific terms such as diabetes; one result (database pointer) for climate change
Ready to use info. with user friendly interface?
Record level information?
Science research and results only?
Information from multiple agencies?
Repository of datasets and tools?
Provides pointer to database/source?
A perfect platform on which to launch new technologies
Access to new forms of STI
Translation
Precision searches
Image searching
Current Science.gov Prototype
Future Opportunities
What will Science.gov 10.0 look like?