Date post: | 17-Nov-2014 |
Category: | Technology |
Upload: | perforce |
#
Mastering Your Universe
P4Search
Sven Erik Knop, Technical Marketing Manager
Ralf Gronkowski, Principal Product Consultant
#
Sven Erik Knop, Perforce Software
Ralf Gronkowski, Perforce Software
#
• Why P4Search?
• What is P4Search?
• Implementation Details and Demonstration
Overview
#
Why P4Search?
#
What is Search?
p4 files / p4 fstat / ...
???
File names, Changes ...
File content?
C#
.h
JAVA
PPTX
#
• Built-in command, since Perforce 2010.1
• Searches files stored in P4D based on content
– Case sensitive and insensitive searches
– Can use regular expressions
– Can search through all revisions
– Provides context search
• Returns depot paths
p4 grep
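As a rough illustration of the options listed above, the sketch below assembles a `p4 grep` argument list in Python. The helper name `build_p4_grep` and its parameters are made up for this example; the flags it emits (`-i`, `-a`, `-n`, `-C`, `-e`) are standard `p4 grep` options. It only builds the command and does not contact a server.

```python
from typing import List, Optional


def build_p4_grep(pattern: str, path: str, *,
                  ignore_case: bool = False,
                  all_revisions: bool = False,
                  line_numbers: bool = False,
                  context: Optional[int] = None) -> List[str]:
    """Assemble an argument list for `p4 grep` (hypothetical helper)."""
    cmd = ["p4", "grep"]
    if ignore_case:
        cmd.append("-i")              # case-insensitive match
    if all_revisions:
        cmd.append("-a")              # search all revisions, not just head
    if line_numbers:
        cmd.append("-n")              # prefix matches with line numbers
    if context is not None:
        cmd += ["-C", str(context)]   # lines of context around each match
    cmd += ["-e", pattern, path]      # the pattern, then the file spec
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() on a machine with a configured p4 client.
    print(" ".join(build_p4_grep("EBolt", "//depot/Talkhouse/...",
                                 ignore_case=True, context=2)))
```

Returning a list rather than a single string keeps patterns with spaces or shell metacharacters intact when the command is handed to `subprocess.run`.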
#
• A few drawbacks:
– Text search only, limited to lines of up to 4K characters
– No search for Metadata such as attributes
• Performance concerns:
– Limited to 10,000 revisions by default
– Memory and CPU consumption
– But: lockless with peeking since 2013.3
What’s Not to Like?
#
Solution: External Index
p4 files/p4 fstat
index
store
search
• Search engine indexes content
• Stores it in its own database
• Users search the index first
• Index returns a depot path
• Index and Perforce Server can live on separate hosts
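The external-index idea can be sketched with a toy inverted index: content is tokenized once at index time, and a later search returns depot paths without touching the Perforce server. This is not how Lucene or Solr is implemented; the class `ToyIndex` and its methods are purely illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class ToyIndex:
    """Toy inverted index mapping lowercase tokens to depot paths."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, depot_path: str, content: str) -> None:
        # Index time: split content into word tokens and record the path
        # under each token.
        for token in re.findall(r"\w+", content.lower()):
            self._postings[token].add(depot_path)

    def search(self, term: str) -> Set[str]:
        # Query time: a dictionary lookup, no file content is read.
        return set(self._postings.get(term.lower(), set()))


if __name__ == "__main__":
    index = ToyIndex()
    index.add("//depot/Talkhouse/rel1.0/EBolt.java", "public class EBolt {}")
    index.add("//depot/Talkhouse/rel1.0/bolt.h", "int bolt_count;")
    print(index.search("EBolt"))
```

The search result is only a set of depot paths; fetching the file itself remains a job for the Perforce server.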
#
• Lucene
– Scalable, high performance indexing
– Search Algorithms
• Solr
– Stand-alone enterprise search server
– HTML Administration interface
– Extensible
• Tika
– Content analysis tool
Apache Lucene, Solr and Tika
#
• P4Search
– Index queue (processing indexing requests)
– Search controller (security)
– RESTful API (integration into other tools)
– UI (simple searches)
• Runs in Jetty
Additional Components Required
#
What We Want to Search For
//depot/Talkhouse/rel1.0/com/walkerbros/common/widget/EBolt.java#10
#
• Changes/Changelists
• Branches
• Jobs
• Users
• Workspaces
• Depots
What We Don’t Want to Search For
#
• Content
• Metadata (whatever that might be)
What We Search By
#
There is Content …
#
• Accessible through p4 files / p4 fstat ...
And There is P4 Metadata
#
And There is Common Metadata
#
• For ordinary folks
– p4 edit file
– p4 attribute -n tags -v cool file
– p4 submit -d "just defined a cool tag on file rev"
• For admins
– p4 attribute -f -n tags -v cool file#rev
• Find them with
– p4 fstat -Oa -F "attr-tags=cool" //depot/...
There is Even Custom P4 Metadata
#
• File content
• P4 Metadata
• P4 attributes
• And the common Metadata if desired
P4Search Will Index ...
#
Details
#
What We Store in Solr
+ other fields
#
Solr Search Does Know A Lot But…
No ACLs, no permissions
#
• Is the query endpoint for users
• Has a simplified API
• Provides P4 authentication (password|ticket)
• Filters query results honoring the existing P4 protections
So A Search Controller
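The filtering step can be pictured as a post-processing pass over the raw index hits. In the sketch below the function `filter_by_protections` and the `can_read` callback are hypothetical; a real controller would answer that question by consulting the Perforce server on behalf of the authenticated user, which is stubbed out here with a simple prefix rule.

```python
from typing import Callable, Iterable, List


def filter_by_protections(depot_paths: Iterable[str], user: str,
                          can_read: Callable[[str, str], bool]) -> List[str]:
    """Keep only the hits the given user is allowed to read."""
    return [path for path in depot_paths if can_read(user, path)]


def demo_can_read(user: str, depot_path: str) -> bool:
    # Stand-in for a real protections check (assumption for this example):
    # everyone may read //depot/public/..., only "admin" may read the rest.
    return depot_path.startswith("//depot/public/") or user == "admin"


if __name__ == "__main__":
    hits = ["//depot/public/readme.txt", "//depot/secret/plan.pptx"]
    print(filter_by_protections(hits, "alice", demo_can_read))
    print(filter_by_protections(hits, "admin", demo_can_read))
```

Filtering after the search keeps the index itself free of permission data, so changes to the protections table need no re-indexing.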
#
Accessing the Index
P4Search: Search controller
Solr: Search index
#
• External index and protection table?
• Solution:
– Use a programmable search engine
– Use Perforce protections to filter results
• Users need read access to files to be able to search
Security Concerns
#
• Jetty
– Solr
• Lucene
• Jetty
– P4Search
• Search queue/Indexer
• Search controller
• RESTful API
• UI
Implementation
#
• swarm.workshop.perforce.com/projects/perforce-software-p4search/files/main
Open source – Where To Find
#
• Download from the Workshop
• Follow the provided instructions to install
• Run two services
– p4search-solr
– p4search-jetty
Installation
#
• On first run, index your entire depot
– You probably don't want to do this
• On submit, index new file revs
– change-commit trigger on depot location
• At any time, index any given change
– curl -X POST --data commit,change# http://p4search:8080/api/queue/{token}
Ways to Populate the Index
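The queue request from the last bullet can also be issued from a script. The sketch below builds the same POST with Python's standard library. The helper name `build_queue_request` is invented here, and the body format `commit,<change number>` is an assumption read off the curl line above, not a confirmed description of the P4Search API.

```python
import urllib.request


def build_queue_request(base_url: str, token: str,
                        change: int) -> urllib.request.Request:
    """Build (but do not send) a POST that queues one change for indexing."""
    url = f"{base_url.rstrip('/')}/api/queue/{token}"
    # Assumed payload shape: the literal word "commit", a comma, and the
    # changelist number.
    body = f"commit,{change}".encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    request = build_queue_request("http://p4search:8080", "TOKEN", 1234)
    print(request.get_method(), request.full_url, request.data)
    # To actually submit it against a running P4Search instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Separating request construction from sending makes the call easy to inspect or log before anything reaches the server.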
#
• Indexing
– Through the P4D trigger, so ultimately any given client and user
• Searching
– P4Search UI
– Piper
– Commons
– Custom through P4Search API
Who Uses P4Search Today
#
• Deep dive after learning Lucene/Solr
• Starting point: p4search/solr/example/solr/collection1/conf
– schema.xml
– solrconfig.xml
Tweaking P4Search
#
DEMO