KnowledgeLake User Conference 2012 Welcome to SharePoint ECM Heaven
Transcript
1. SharePoint 2010 -> SharePoint 2013
Managed Property -> (Multiple) Search Schemas
Best Bets -> Promoted Results (Query Rule)
Scope and Federated Location -> Result Source
Content By Query -> Content By Search
Incremental Crawl -> Continuous Crawl
MCM -> MCSM
2. Continuous Crawl
Continuous Crawl Benefits: No more waiting for index merge. Does not
wait for other crawls to complete. Can have multiple continuous
crawls running simultaneously.
Continuous Crawl Facts: Runs every 15 minutes by default. Default
interval can be changed with PowerShell. Should be used instead of
incremental crawls for SharePoint content sources. Continuous crawls
ignore errors.
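Because a continuous crawl starts on a fixed interval and does not wait for earlier crawls to finish, several crawls can overlap. The sketch below works through that arithmetic. It assumes every crawl takes the same fixed time, which is a simplification, and `overlapping_crawls` is a hypothetical helper name, not a SharePoint API.

```python
import math

DEFAULT_INTERVAL_MINUTES = 15  # default continuous crawl interval from the slide


def overlapping_crawls(crawl_minutes, interval_minutes=DEFAULT_INTERVAL_MINUTES):
    """Estimate the peak number of continuous crawls running at once.

    Assumes a new crawl starts every interval_minutes and that each crawl
    takes exactly crawl_minutes (an illustrative simplification).
    """
    if crawl_minutes <= 0 or interval_minutes <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(crawl_minutes / interval_minutes)


# A 40-minute crawl on the default 15-minute interval overlaps with the
# crawls started 15 and 30 minutes after it.
print(overlapping_crawls(40))  # 3
```

Shortening the interval raises the number of simultaneous crawls for the same content, so crawl load is worth watching after changing the default.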
3. [Search architecture diagram] Content Sources: HTTP, File Share,
User Profile, SharePoint, Other. Queries: End User Query or Content
Process Initiated Query. Components: Crawl Component, Content
Processing Component, Index Component, Query Processing Component,
Analytics Processing Component. Storage: Index Partition(s), Crawl
Database(s), Link Database, Analytics Database, Event Store.
4. Crawl Component
What it Does: Crawls content sources to populate index. Delivers crawl
items (binary) and metadata to content processor. Invokes connectors
or protocol handlers to interact with content sources to retrieve
data. Uses one or more crawl databases to store info about crawl items
and crawl history.
Important Facts: We can have multiple crawl components. MS Recommends:
2 Crawl Components per Search Service Application. MS Recommends:
8 (4 VM) CPU / 8 GB RAM per Crawl Component.
5. Content Processing Component
What it Does: Processes crawl items and feeds them to the index
component. Transforms crawl items into artifacts that can be included
in the search index (performs document parsing and property mapping).
Writes information about links and URLs in the link database (which
are analyzed by analytics to calculate relevance and currency; results
are written back to the search index by the content processing
component). Generates phonetic name variations to improve people
search.
Important Facts: We must have only one (1) content processing
component per server; more will hurt, not help, crawl performance. Max
of 2 per search service application. Feeding sessions are scaled based
on CPU cores using a default coefficient of 3: 8 (cores) * 3 = 24
feeding sessions; 4 (cores) * 3 = 12 feeding sessions. MS Recommends:
8 (4 VM) CPU / 8 GB RAM per Content Processing Component. Feeding
sessions require RAM. More RAM is necessary when more cores are
present; monitoring required.
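The feeding-session arithmetic on this slide can be written out as a short calculation. This is only a sketch of the rule as quoted (cores times a default coefficient of 3); `feeding_sessions` is a hypothetical helper, not a SharePoint setting or cmdlet.

```python
FEEDING_SESSION_COEFFICIENT = 3  # default coefficient quoted on the slide


def feeding_sessions(cpu_cores, coefficient=FEEDING_SESSION_COEFFICIENT):
    """Return the number of feeding sessions for a given core count."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores * coefficient


for cores in (4, 8):
    print(f"{cores} cores -> {feeding_sessions(cores)} feeding sessions")
# 4 cores -> 12 feeding sessions
# 8 cores -> 24 feeding sessions
```

Since each feeding session needs RAM, doubling the cores doubles the sessions, which is why the slide pairs extra cores with extra memory and monitoring.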
6. Analytics Processing Component
What it Does: Runs analytics jobs that analyze crawl items and user
interaction with search results to perform both search analytics and
usage analytics. Analyzes link and anchor text, click distance, search
clicks, deep links, social tags, social distance, search reports,
recommendations, usage counts, and activity ranking. Improves search
relevance and creates search results. Output is included in the search
index by the content processor.
Important Facts: Maximum of 6 per search service application. Add more
Analytics Processing Components to improve analytics performance. MS
Recommends: 8 (4 VM) CPU / 8 GB RAM / 300 GB disk space per Analytics
Processing Component. Interacts with the Analytics Reporting database
to store statistical information. Interacts with the Link database to
store information about searches and crawled documents.
7. Index Component
What it Does: Receives processed items from the content processing
component and writes the items to the index file. Receives queries
from the query processing component and returns result sets.
Redistributes content among index partitions when the index
architecture is changed by the Search Administration Component.
Important Facts: Maximum of 60 index components (20 index partitions
x 3 index replicas) per search service application. Must provision one
Index Component for each index replica. MS Recommends: 8 (4 VM) CPU /
16 GB RAM / 500 GB disk space per Index Component.
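The sizing rule on this slide (one index component per replica, with 20 partitions x 3 replicas = 60 as the ceiling) can be sketched as a small calculator. The limits and per-component figures are the ones quoted on the slide; `index_tier` is a hypothetical helper name used only for illustration.

```python
MAX_INDEX_COMPONENTS = 60    # 20 partitions x 3 replicas, as quoted on the slide
RAM_GB_PER_COMPONENT = 16    # "MS Recommends" figure from the slide
DISK_GB_PER_COMPONENT = 500  # "MS Recommends" figure from the slide


def index_tier(partitions, replicas_per_partition):
    """Size the index tier: one index component per replica of each partition."""
    if partitions < 1 or replicas_per_partition < 1:
        raise ValueError("need at least one partition and one replica")
    components = partitions * replicas_per_partition
    return {
        "components": components,
        "within_limit": components <= MAX_INDEX_COMPONENTS,
        "ram_gb": components * RAM_GB_PER_COMPONENT,
        "disk_gb": components * DISK_GB_PER_COMPONENT,
    }


# Four partitions with two replicas each need eight index components.
print(index_tier(4, 2))
```

For example, `index_tier(20, 3)` reports 60 components, which is exactly at the quoted limit, while 21 partitions with 3 replicas would exceed it.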
8. Index Architecture
An index partition is a logical portion of the entire search index
(same as before). An index partition is served by one or more index
components. Index components can be a primary "replica" or a secondary
"replica". The primary replica is contacted by the content processing
component to write new data in the index. A secondary replica is a
read-only copy that gets updated with the data. Adding replicas
improves query performance under load. Add partitions to handle an
increased content corpus. A partition can't be removed after it has
been added.
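The primary and secondary replica roles described above can be illustrated with a toy model. It is not how SharePoint implements its index: the class names are hypothetical, the update of secondaries is shown as an immediate copy, and spreading queries round-robin across replicas is an assumption made only to show why extra replicas help under query load.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Replica:
    name: str
    items: dict = field(default_factory=dict)


class IndexPartition:
    """Toy model: the primary replica takes writes, all replicas serve reads."""

    def __init__(self, primary, secondaries=()):
        self.primary = primary
        self.secondaries = list(secondaries)
        # Round-robin read distribution is an assumption for illustration.
        self._readers = cycle([primary, *self.secondaries])

    def write(self, doc_id, text):
        # New data is written to the primary replica first...
        self.primary.items[doc_id] = text
        # ...and the read-only secondaries are then updated with it.
        for replica in self.secondaries:
            replica.items[doc_id] = text

    def query(self, term):
        # Each query is answered by the next replica in the rotation.
        replica = next(self._readers)
        hits = sorted(d for d, t in replica.items.items() if term in t)
        return replica.name, hits


partition = IndexPartition(Replica("primary"), [Replica("secondary")])
partition.write("doc1", "sharepoint search")
print(partition.query("search"))  # ('primary', ['doc1'])
print(partition.query("search"))  # ('secondary', ['doc1'])
```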
9. Query Processing Component
What it Does: Analyzes and processes queries and results. After
receiving a query, it analyzes and processes the query to optimize
precision, recall and relevance. Submits processed queries to the
index component. Processes the result set returned by the index
component before returning it to the querying entity.
Important Facts: Maximum of 1 per server. MS Recommends: 8 (4 VM) CPU
/ 8 GB RAM per Query Processing Component.
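The per-server and per-application limits quoted on the component slides lend themselves to a simple placement check. The sketch below is a minimal illustration: the limits are taken from the slides as stated (not verified against Microsoft documentation), and the component keys and the `check_topology` function are hypothetical names.

```python
from collections import Counter

# Limits as quoted on the slides; the keys are illustrative labels only.
MAX_PER_SERVER = {"content_processing": 1, "query_processing": 1}
MAX_PER_SERVICE_APP = {"analytics": 6, "index": 60}


def check_topology(hosts):
    """Return a list of limit violations for a proposed search topology.

    hosts maps a host name to the list of component types placed on it.
    """
    problems = []
    totals = Counter()
    for host, components in hosts.items():
        counts = Counter(components)
        totals.update(counts)
        for kind, limit in MAX_PER_SERVER.items():
            if counts[kind] > limit:
                problems.append(
                    f"{host}: {counts[kind]} {kind} components (max {limit} per server)"
                )
    for kind, limit in MAX_PER_SERVICE_APP.items():
        if totals[kind] > limit:
            problems.append(
                f"{totals[kind]} {kind} components (max {limit} per service application)"
            )
    return problems


topology = {
    "Host 3": ["query_processing", "index", "content_processing"],
    "Host 4": ["query_processing", "query_processing", "index"],
}
for problem in check_topology(topology):
    print(problem)
```

Here only Host 4 is flagged, because it carries two query processing components against the quoted maximum of one per server.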
11. [Topology diagram, six hosts] Hosts 1 and 2: web servers and
application servers with Office Web Apps Server. Hosts 3 and 4:
application servers running Query Processing and a replica of index
partition 0, plus Crawl, Admin, Analytics, and Content processing.
Hosts 5 and 6: all SharePoint databases (Search admin db, Link db,
Crawl db, Analytics db, SharePoint Config db, all other SharePoint
databases). Redundant copies of all databases using SQL clustering,
mirroring, or SQL Server 2012 AlwaysOn.
12. [Topology diagram, Hosts A through H] Application servers running
Query Processing, Analytics, Content processing, Crawl, and Admin
components. Index partitions 0 through 3, each with two replicas.
SharePoint databases: Search admin db, Crawl db, Link db, Analytics
db. Redundant copies of all databases using SQL clustering, mirroring,
or SQL Server 2012 AlwaysOn.
13. [Topology diagram, Hosts A through R] Application servers running
Query Processing, Analytics, Content processing, Crawl, and Admin
components. Index partitions 0 through 9, each with two replicas.
SharePoint databases: Search admin db, Link db, Analytics db, and
multiple Crawl dbs. Redundant copies of all databases using SQL
clustering, mirroring, or SQL Server 2012 AlwaysOn.
14. Schema can be managed by site admins, reducing the load on the
search administrator. Schema can be configured to allow more
granularity (query, retrieve, refine, sort, etc.); this affects
content index size. Remote result sources can be crawled locally and
then queried by remote farms, which has a huge impact on
geo-distributed search. KL may be able to help! Individual items can
be re-crawled easily. Automatic URL balancing in crawl databases
minimizes host name restrictions for large archive repositories.
Scalability limit changes will have a big impact on farm design for
large archive content repositories in the near future.