Post on 25-Apr-2018
transcript
2
About Me
• Senior Learning Technologist at WellPoint, Inc • Developer for 14 years • Developing in ColdFusion for 8 years • Started in SQL Server, ASP, ASP.NET, VB.NET • Also work in Flash Builder/Flex, Java, and C#
3
Where We’ve Been: Growth and Consolidation
WellPoint, Inc. was formed in 2004 as the result of a merger between Anthem, Inc. and WellPoint Health Networks, creating the nation’s largest health benefits company by membership
4
Where We Are: National Scale
1 out of 9 Americans are covered by WellPoint’s affiliated health plans
Note: Provider Network refers to BlueCard® PPO Network
• Nation’s Largest Insurer • ~34 million medical members
• Total Revenue • Nearly $60 billion
• Provider Network Advantage • ~94% Hospitals • ~82% Primary Physicians • ~84% Specialists
• Blue Licensee • 14 states
5
Agenda
• Problem and Goal
• Why Apache Solr for ColdFusion 9.0.1
• Solr Multi-core Overview
• Replication Overview
• Installation
• Replication Configuration
• Managing Collections on Multiple Solr Instances
• Extending ColdFusion Solr Schema
• Creating a Custom Search
• Q & A
• Resources
6
Problem and Goal
• Problem • Slow search response
• Constant corruption issues
• Verity wasn’t scalable
• No redundancy
• Goal • Improve search response
• Create an enterprise scalable solution
• Implement redundancy for high availability
• Maintain compatibility with <cfsearch /> & <cfindex /> tags
7
Why Apache Solr for ColdFusion 9.0.1
• Performance
• Fast, very fast
• Optimized for high volume web traffic
• Scalable
• Distributed searches
• Replication
• Redundancy
• Replication supports
  – Master
  – Slave
  – Repeater
9
Technologies Used
• Windows Server 2008 64-bit
• IIS 7.0
• Application Request Routing
• ColdFusion 9.0.1 Multi-server
• Apache Tomcat 6
  – Master instance
• Apache Solr Standalone Installation for ColdFusion 9.0.1
  – Slave instances
• Java SE JDK 1.6.0_26 64-bit
10
Solr Multi-core Overview
• Solr core = ColdFusion collection
• Multiple cores
  – Single Solr instance
  – Each Solr core has its own configuration and index
  – Unified administration
• Multi-core template
  – A template is used for creating a new core (collection)
  – The template contains a directory structure and the configuration files needed to create a new core
  – Location: SolrInstallationDirectory\multicore\template
11
Solr Multi-core Template
• conf directory
  – Contains configuration files used when creating a new Solr core
• Two key files:
  – schema.xml
    – Contains the details about which fields your index can contain
    – How those fields should be dealt with when adding documents to the index
    – How those fields should be dealt with when querying those fields
  – solrconfig.xml
    – Contains the configuration settings for the Solr core
    – Used to configure replication
12
Solr Multi-core Template Continued
• conf directory continued
• Files referenced by schema.xml:
  – protwords.txt
    – Words that need protection from stemming
    – e.g. “maine” would otherwise be stemmed to “main”
  – stopwords.txt
    – Words to not index, e.g. a, an, and
  – synonyms.txt
    – Synonym groups, e.g. GB,gib,gigabyte,gigabytes
    – Mappings used for spelling corrections, e.g. hippa => hipaa
13
Solr Multi-core Template Continued
• conf directory continued
• Optional file:
  – solrcore.properties
    – User defined properties to be referenced within solrconfig.xml
    – Syntax: Property=Value
    – File is referenced by default when present in conf directory
    – Example:
• data directory
  – Empty directory
  – Solr will create the following directories the first time content is indexed: index, spellindex
14
Solr Replication Overview
• Replication Features
  – Efficient and automated distribution of index additions, updates, and deletions
  – Pull strategy allows for easy addition of slaves
  – Configurable distribution interval allows a tradeoff between timeliness and cache utilization; the interval is set by the slave instance
  – Replication and automatic reloading of configuration files
  – Works over HTTP
  – Works across platforms with the same configuration
• Replication Modes
  – Master – optimized for indexing
  – Slave – optimized for searches
  – Repeater – used in WAN to reduce bandwidth between data centers
15
Solr Replication Considerations & Challenges
• Considerations
  – Replication is not a server level configuration
  – Replication is configured at the Solr core (search collection) level
  – New cores need to be created on all Solr instances
• Challenges
  – Modify the multi-core template to implement replication when new cores are created
  – Automate the creation of a Solr core on all Solr instances
  – Create a consolidated view of cores on all instances
16
Solr Replication Requirements
• Basic Requirement
  – One master Solr instance
  – One or more slave Solr instances
  – Configuration of replication request handlers on master and slave instances
• Replication Request Handler
  – Configuration is handled in solrconfig.xml
  – Replication is defined by adding a request handler using XML syntax
  – Settings are used to set the properties for the request handler
  – Master and slave instances are both configured using a request handler, but use different settings to define their roles
17
Master Replication Request Handler
• Replication request handler with all possible attributes • Screen shot
18
Required Master Settings
• replicateAfter • Configures when replication will be triggered
• Valid values: startup, commit, optimize
• If using startup option, it is necessary to have a commit/optimize entry also, if you want to trigger replication on future commits/optimizes.
• Example:
19
Recommended Master Settings
• confFiles • Used to specify configuration files to be replicated
• Comma delimited list of files to replicate
• Can be configured to rename files on replication
  – Syntax: source_file_name.xml:destination_file_name.xml
• Example:
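A minimal sketch of what a master replication handler with these settings might look like. It is not the example from the original slide. The handler name, class, and setting names follow the standard Solr replication configuration. The file list in confFiles is an assumption chosen for illustration. The Python wrapper only parses the XML to show which values Solr would read.

```python
import xml.etree.ElementTree as ET

# Hypothetical master-side handler for solrconfig.xml; the files listed in
# confFiles are illustrative and not taken from the original slides.
MASTER_HANDLER = """
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Repeat replicateAfter once per trigger. -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <!-- source:destination renames a file as it is copied to the slave. -->
    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
"""

handler = ET.fromstring(MASTER_HANDLER)
master = handler.find("./lst[@name='master']")

# Collect every replication trigger and split the comma delimited file list.
replicate_after = [el.text for el in master.findall("str[@name='replicateAfter']")]
conf_files = master.find("str[@name='confFiles']").text.split(",")

print(replicate_after)
print(conf_files)
```

Listing both startup and commit reflects the note that a startup entry alone does not trigger replication on later commits.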
20
Optional Master Settings
• backupAfter • Configures when a backup will be created
• Valid values: optimize, startup, commit
• maxNumberOfBackups • Maximum number of backups to retain
• commitReserveDuration • Default 10 seconds
• If commits are very frequent and network is slow, you can tweak this value
21
Slave Replication Request Handler
• Slave replication request handler with all possible settings • Add screen shot and high level notes
22
Required Slave Settings
• Configuration file • solrconfig.xml
• masterUrl
  – Sets the URL of the Solr master instance
  – ${solr.core.name} – system variable
• pollInterval
  – Sets how often the slave polls the master for changes
  – Considerations
    – Frequency of updates to the index
    – Network bandwidth
    – Acceptable latency
23
Optional Slave Settings
• httpConnTimeout • Sets connection timeout on the underlying HttpConnectionManager • Default value 5000ms
• httpReadTimeout • Sets timeout when fetching index from master • Default value 10000ms
• httpBasicAuthUser • Use if basic authentication is enabled on master
• httpBasicAuthPassword • Use if basic authentication is enabled on master
• Compression • Use only if your bandwidth is low
24
Slave Replication Configuration Examples
• Basic configuration example
• Using solrcore.properties configuration example
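A minimal sketch of a slave handler that takes its values from solrcore.properties. It is not the example from the original slides. The handler and setting names follow the standard Solr replication configuration. The host name, port, core name, and poll interval are hypothetical. The `resolve` helper only imitates Solr's `${name}` substitution so the effective values can be inspected.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical slave-side handler for solrconfig.xml.
SLAVE_HANDLER = """
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">${MASTER_CORE_URL}/${solr.core.name}/replication</str>
    <str name="pollInterval">${POLL_TIME}</str>
  </lst>
</requestHandler>
"""

# Hypothetical values: the first two would live in solrcore.properties,
# the core name is supplied by Solr for each core.
properties = {
    "MASTER_CORE_URL": "http://solr-master.example.com:8080/solr",
    "POLL_TIME": "00:00:60",
    "solr.core.name": "mycollection",
}


def resolve(text, props):
    """Imitate Solr's ${name} substitution for illustration only."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], text)


slave = ET.fromstring(SLAVE_HANDLER).find("./lst[@name='slave']")
settings = {el.get("name"): resolve(el.text, properties) for el in slave.findall("str")}
print(settings)
```

Keeping the master URL and poll time in a properties file means the same solrconfig.xml can be used for every core on the slave.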
25
Slave Solr Installation
• Slave Servers
  – Windows Server 2008 (64-bit, 8 GB RAM)
  – Install Java SE JDK 1.6.0_26 64-bit
    – Note the location of the installation directory
    – Example: D:\Apps\Java\jdk1.6.0_26
  – Execute the Apache Solr Standalone Installation for ColdFusion 9.0.1 installer
    – Change Java Home from the default to javaInstallationDirectory\jdk1.6.0_26\jre
    – Example: D:\Apps\Java\jdk1.6.0_26\jre
26
Master Solr Installation
• Master Solr Server
  – Windows Server 2008 (64-bit, 8 GB RAM)
  – Download Java JDK 1.6.0_26 64-bit
  – Download Apache Tomcat 6 32-bit/64-bit Windows Service Installer
  – Execute the Java JDK installer
    – Note the installation directory
    – Example: E:\Apps\java
  – Execute the Tomcat 6 installer
    – Java JRE: specify the JRE in the JDK 1.6.0_26 installation
      – Example: E:\Apps\Java\jdk1.6.0_26\jre
    – Select installation directory
      – Example: E:\Apps\tomcat6
27
Master Solr Installation Continued
• Master Solr Installation continued
  – Create a solr directory, for example E:\Apps\solr
  – Copy the following from the slave installation
    – solr.war to the solr directory
      – installationDirectory\webapps\solr.war
    – Multi-core directory to the solr directory
      – installationDirectory\multicore
• Configure Tomcat service • Launch Configure Tomcat
• Java tab
• Set initial memory pool
• Set maximum memory pool
28
Configure Tomcat for Solr
• Stop Apache Tomcat 6 service
• Create solr context
  – A Context is what Tomcat calls a web application
  – Location: tomcatInstallDir\conf\Catalina\localhost\
  – Create a solr.xml file
  – Edit solr.xml and define the Solr context
  – Example:
• Start Apache Tomcat 6 service
• Launch Tomcat 6 - http://127.0.0.1:8080/manager/html
• Navigate to solr application
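A minimal sketch of a Solr context file for Tomcat. It is not the example from the original slide. The `Context` and `Environment` elements follow the commonly documented Solr on Tomcat setup. The paths assume solr.war and the multicore directory were copied to E:\Apps\solr, so they are illustrative. The Python wrapper only parses the XML to show the two values that matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of tomcatInstallDir\conf\Catalina\localhost\solr.xml.
# docBase points at the copied solr.war; solr/home points at the copied
# multicore directory (both paths are assumptions).
SOLR_CONTEXT = r"""
<Context docBase="E:\Apps\solr\solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="E:\Apps\solr\multicore" override="true"/>
</Context>
"""

context = ET.fromstring(SOLR_CONTEXT)
env = context.find("Environment")

print("war file: ", context.get("docBase"))
print(env.get("name"), "=", env.get("value"))
```

Because the file is named solr.xml, Tomcat serves the application under the /solr path.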
30
Slave Configuration
• Apache Solr for ColdFusion 9.0.1 runs in a Jetty servlet container
• Jetty Configuration
  – Configuration file location: SolrInstallationDirectory\etc\jetty.xml
  – Connector system properties
    – jetty.port – default = 8983
    – jetty.host – default = not defined
  – Default configuration listens only on 127.0.0.1
  – Add the jetty.host system property to the connector setting
    – 0.0.0.0 = listen on all IPs
    – Example:
32
Slave Service Configuration
• Service start up configuration
  – Default Java maximum memory setting is 256 MB
  – Configuration file: InstallationDirectory\solr.lax
• Adjust maximum memory setting -Xmx
• Add a minimum memory setting -Xms
• Example:
33
Master Solr Multi-core Template Configuration
• Create solrcore.properties
  – Create a text file named solrcore.properties in the Solr multicore template directory
  – Add two properties
    – MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr
    – POLL_TIME=hh:mm:ss
  – Example:
• Create solrconfig_slave.xml
  – Make a copy of solrconfig.xml in the master Solr multicore template directory
  – Name the file solrconfig_slave.xml
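A minimal sketch of the two-property solrcore.properties file described above, with a small parser to show the Property=Value syntax. The host name, port, and poll time are hypothetical values, not the ones from the original slide.

```python
# Hypothetical file contents; only the two property names come from the slides.
SOLRCORE_PROPERTIES = """\
# Values below are placeholders for illustration
MASTER_CORE_URL=http://solr-master.example.com:8080/solr
POLL_TIME=00:00:60
"""


def parse_properties(text):
    """Parse simple Property=Value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SOLRCORE_PROPERTIES)
print(props)
```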
34
Master Solr Multi-core Template Configuration Continued
• Configure solrconfig.xml for replication
  – Add master and slave replication request handlers
    – solrconfig.xml
    – solrconfig_slave.xml
35
Slave Solr Multi-core Template Configuration
• solrcore.properties
• Copy solrcore.properties in template/conf directory on master to template/conf directory on slave
• solrconfig.xml • Delete solrconfig.xml file in template/conf on slave
• Copy solrconfig_slave.xml in template/conf directory on master to template/conf directory on slave
• Rename solrconfig_slave.xml to solrconfig.xml on slave
36
Creating New Collections
• Collections (cores) need to be created on all Solr instances
• Use Solr API to create new cores
  – REST-like API
• Create new core parameters
  – action – CREATE
  – name – name of new core
  – instanceDir – directory path for new instance
  – template – directory path for the core template
  – wt – writer type
    – Format of response
    – Options: json, javabin, xml
    – Default = xml
  – version = 1
37
Creating New Collections Code
• In CF create an array of server instances • Define collection name
38
Creating New Collections Code Continued
• Loop over server instance array • Create collection on each instance
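The original slides showed ColdFusion code here. As a rough sketch of the same idea in Python, the block below builds one CREATE request URL per Solr instance using the parameters listed earlier. The host names, the `/admin/cores` path, and the directory values are assumptions for illustration. The sketch only builds the URLs and does not send any request.

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, instance_dir, template):
    """Build a core CREATE URL from the parameters named in the slides."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "template": template,
        "wt": "json",
        "version": "1",
    }
    # The /admin/cores path is assumed; check the adminPath of your install.
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Hypothetical master and slave instances.
instances = [
    "http://solr-master.example.com:8080/solr",
    "http://solr-slave1.example.com:8983/solr",
]
collection = "mycollection"

# One URL per instance, so the core is created everywhere.
create_urls = [
    build_create_url(base, collection, f"multicore/{collection}", "multicore/template")
    for base in instances
]
for url in create_urls:
    print(url)
```

Each URL could then be requested with any HTTP client, and the JSON response checked per instance.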
39
Collection Create Result Struct
• De-serialized file content (cfdump from previous slide)
  – core – collection name
  – responseHeader
    – QTime – query time in milliseconds
    – status
  – saved
    – File path to multicore\solr.xml
    – The multicore\solr.xml file is used to store core names and instance directories
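A minimal sketch of reading such a result in Python. The JSON below is a hypothetical response shaped like the struct described above, and the values are made up. A status of 0 is treated as success, which is the usual Solr convention.

```python
import json

# Hypothetical CREATE response; field names follow the struct above.
SAMPLE_RESPONSE = r"""
{
  "responseHeader": {"status": 0, "QTime": 125},
  "core": "mycollection",
  "saved": "E:\\Apps\\solr\\multicore\\solr.xml"
}
"""


def core_created(result):
    """Return True when the response header reports status 0."""
    return result["responseHeader"]["status"] == 0


result = json.loads(SAMPLE_RESPONSE)
print(result["core"], "created:", core_created(result))
print("core list saved to:", result["saved"])
```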
40
Solr Admin Master Replication
• Core admin • Navigate to Replication
• Replication admin • Index version
• Location
• Size
41
Solr Admin Slave Replication
• Core admin • Navigate to Replication
• Replication admin • Master
• Poll Interval
• Local Index Version & location
• Replication status
• Controls
  – Disable Poll
  – Replicate Now
42
Deleting Collections
• Collections (cores) should be deleted from all Solr instances
• Use Solr API to delete cores
• Delete core parameters
  – action – UNLOAD
  – core – name of core to delete
  – wt – writer type
    – Format of response: json, javabin, xml
    – Default = xml
  – version = 1
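A rough Python sketch of the delete step, building one UNLOAD request URL per instance from the parameters above. The host names and the `/admin/cores` path are assumptions, and no request is sent.

```python
from urllib.parse import urlencode


def build_unload_url(base_url, core):
    """Build a core UNLOAD URL from the parameters named in the slides."""
    params = {"action": "UNLOAD", "core": core, "wt": "json", "version": "1"}
    # The /admin/cores path is assumed; check the adminPath of your install.
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Hypothetical master and slave instances.
instances = [
    "http://solr-master.example.com:8080/solr",
    "http://solr-slave1.example.com:8983/solr",
]

unload_urls = [build_unload_url(base, "mycollection") for base in instances]
for url in unload_urls:
    print(url)
```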
44
Extend ColdFusion Solr Schema (cfcore)
• Reasons to extend/change default functionality
  – Change default operator
    – The default is OR
  – Enable delete by key capability
  – Enable case sensitivity on search
• Possible changes to schema.xml
  – Default operator between words is OR
    – Changing the default operator to AND will reduce the number of results
45
Extend ColdFusion Solr Schema – Enable Delete by Key
• Enable delete by key
  – Default unique key is a system generated identifier
  – Possible use case
    – Use API to delete indexed content by the key value
  – Changes
    – Create a copy of schema.xml and name it schema_slave.xml
    – Update the replication confFiles setting to use schema_slave.xml:schema.xml
    – Changes to schema.xml
      – Change the indexed attribute on the key field to true
      – Change the unique key from uid to key
    – Changing the unique key on slave instances will break the cfsearch tag
46
Extend ColdFusion Solr Schema – Enable case sensitivity on search
• Enable case sensitivity on search • Default configuration uses a filter to change text to lower case
• Possible use case
  – Search by title and retain case sensitivity
• Schema change
  – Comment out solr.LowerCaseFilterFactory
47
Creating a Custom Search
• Use case
  – Return category facet counts
  – Date range search
• Solr Search API
  – Basic query parameters
    – q – search query
    – fq – filter query
    – qt – query type; name of the request handler in solrconfig.xml
    – start – start row
    – rows – number of rows to return in response
    – fl – comma delimited list of fields to include in response
    – wt – response writer type
48
Creating a Custom Search Continued
• Solr Search API continued
  – Highlight parameters
    – hl – enable highlighted snippets to be generated
    – hl.fragsize – the size, in characters, of the snippets created by the highlighter
    – hl.snippets – maximum number of snippets to generate per field
    – hl.simple.pre – text which appears before a highlighted term
    – hl.simple.post – text which appears after a highlighted term
  – Facet parameters
    – facet – enable facet counts in query response
    – facet.field – specify a field which should be treated as a facet
    – facet.mincount – minimum count to include facet in response
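A minimal Python sketch that combines the basic, highlight, and facet parameters into one search URL for the category facet and date range use case. The core URL, the field names (`category`, `modified`, `key`, `title`, `summary`), and all values are assumptions for illustration. The `/select` path is the standard Solr search handler.

```python
from urllib.parse import urlencode


def build_search_url(core_url, query, facet_field, date_filter):
    """Build a Solr select URL with highlighting and facet counts enabled."""
    params = [
        ("q", query),
        ("fq", date_filter),           # filter query, here a date range
        ("start", 0),
        ("rows", 10),
        ("fl", "key,title,summary,score"),
        ("wt", "json"),
        ("hl", "true"),
        ("hl.fragsize", 150),
        ("hl.snippets", 2),
        ("hl.simple.pre", "<b>"),
        ("hl.simple.post", "</b>"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),
    ]
    return f"{core_url}/select?{urlencode(params)}"


# Hypothetical core on a slave instance and hypothetical field names.
search_url = build_search_url(
    "http://solr-slave1.example.com:8983/solr/mycollection",
    "health benefits",
    "category",
    "modified:[2011-01-01T00:00:00Z TO NOW]",
)
print(search_url)
```

Sending searches to a slave keeps query load away from the master, which matches the roles described in the replication overview.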
49
Creating a Custom Search Continued
• JSON specific parameter
  – json.nl
    – Controls the output format of the NamedList used for field faceting data
    – flat (default) – flat array
      – Example: [name1,val1, name2,val2]
    – map – JSON object
      – A NamedList can have repeated keys and preserves order; a JSON object (a hash) may lose that information
    – arrarr – an array of two element arrays
      – Example: [[name1,val1], [name2, val2], [name3,val3]]
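A small Python sketch of how a client might handle the default flat format by pairing up alternating names and values. The facet names and counts are made up for illustration.

```python
def flat_to_pairs(flat):
    """Turn [name1, val1, name2, val2, ...] into [(name1, val1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("flat NamedList must have an even number of items")
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical facet counts as returned with json.nl=flat.
flat = ["claims", 12, "benefits", 7, "providers", 3]

pairs = flat_to_pairs(flat)            # order is preserved
arrarr = [list(p) for p in pairs]      # same shape as json.nl=arrarr
as_map = dict(pairs)                   # same shape as json.nl=map

print(pairs)
print(arrarr)
print(as_map)
```

Converting to a dict is convenient for lookups, but a repeated name would overwrite the earlier entry, which is the information loss the map format risks.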
52
Q & A
Dan Sirucek, Sr. Learning Technologist, Learning Technologies and Content
21555 Oxnard Dr, MS: CAAC08-088I, Woodland Hills, CA 91316
Tel (818) 234-8017, Mobile (323) 251-1236
www.wellpoint.com dan.sirucek@wellpoint.com
53
Resources
• Apache Tomcat 6 - http://tomcat.apache.org/download-60.cgi
• Apache Solr Standalone Installer for ColdFusion 9.0.1 - http://www.adobe.com/support/coldfusion/downloads.html
• Java JDK 1.6.0_26 download - http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u26-download-400750.html
• Apache Solr - http://lucene.apache.org/solr/
• Solr Wiki - http://wiki.apache.org/solr/FrontPage
• Solr Replication - http://wiki.apache.org/solr/SolrReplication
• Solr JSON Response Writer - http://wiki.apache.org/solr/SolJSON#JSON_Query_Response_Format
• Solr Facet Parameters - http://wiki.apache.org/solr/SimpleFacetParameters
• Solr Highlighting Parameters - http://wiki.apache.org/solr/HighlightingParameters