Ingesting and Manipulating Data with JavaScript

Date post: 21-Jan-2018
Upload: lucidworks
Transcript
Page 1: Ingesting and Manipulating Data with JavaScript

Ingesting and Manipulating Data with JavaScript

Produces the world’s largest open source user conference dedicated to Lucene/Solr

Lucidworks is the primary sponsor of the Apache Solr project

Employs over 40% of the active committers on the Solr project

Contributes over 70% of Solr's open source codebase

Based in San Francisco

Offices in Bangalore, Bangkok, New York City, Raleigh, London

Over 300 customers across the Fortune 1000

Fusion, a Solr-powered platform for search-driven apps

Lucidworks Fusion product suite

An optimized search experience for every user using relevance boosting and machine learning.

Create custom search and discovery applications in minutes.

Highly scalable search engine and NoSQL datastore that gives you instant access to all your data.

• 50+ connectors

• Full SQL compatibility

• End-to-end security

• Multi-dimensional real-time ingestion

• Administration and analytics

• Personalized recommendations

• Machine learning out-of-the-box

• Powerful recommenders and classifiers

• Predictive search

• Point-and-click relevancy tuning

• Quick prototyping

• Fine-grained security

• Stateless architecture

• Support 25+ data platforms

• Full library of components

• Pre-tested reusable modules

Fusion Pipelines

Index Pipeline

Fusion Query Pipeline

Javascript Index Pipeline Stage

This is a Fusion JavaScript pipeline stage

Why Javascript?

JavaScript vs. Pipeline Stage

o Existential discussion at Lucidworks

o My opinion only…

Pipeline stages are good for…

And…

Not…

o 20 discrete operations I have to do to convert one field…

o Conditional operations (if this, then that; otherwise do this other thing)

o Canned functionality you have elsewhere.

o I don’t want to do anything that feels like programming in form fields…
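
The conditional case in the list above is the kind of thing that tends to be simpler as a few lines of JavaScript than as chained form-based stages. A minimal sketch, assuming hypothetical field names (`price`, `price_band_s`) and a small stand-in object (`makeDoc`) that only imitates the PipelineDocument method names so the function can be run outside Fusion. Inside Fusion, only the stage function itself would be pasted into the stage.

```javascript
// Hypothetical stand-in for PipelineDocument, only so this sketch runs outside Fusion.
// It imitates a few method names from the deck; it is not the real class.
function makeDoc(id, fields) {
  var store = {};
  for (var k in fields) { store[k] = [].concat(fields[k]); }
  return {
    getId: function () { return id; },
    hasField: function (name) { return store.hasOwnProperty(name); },
    getFirstFieldValue: function (name) { return store[name] ? store[name][0] : null; },
    setField: function (name, value) { store[name] = [value]; }
  };
}

// The stage: "if this then this, otherwise do this other thing".
// Field names and thresholds are assumptions made up for illustration.
var priceBandStage = function (doc) {
  if (!doc.hasField('price')) {
    doc.setField('price_band_s', 'unknown');
    return doc;
  }
  var price = parseFloat(doc.getFirstFieldValue('price'));
  if (isNaN(price)) {
    doc.setField('price_band_s', 'unknown');
  } else if (price < 10) {
    doc.setField('price_band_s', 'budget');
  } else if (price < 100) {
    doc.setField('price_band_s', 'standard');
  } else {
    doc.setField('price_band_s', 'premium');
  }
  return doc;
};

var banded = priceBandStage(makeDoc('1', { price: '42.50' }));
var missing = priceBandStage(makeDoc('2', {}));
```

Here `banded` ends up with `price_band_s` set to `standard`, and `missing` falls through to `unknown`.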

com.lucidworks.apollo.common.pipeline.PipelineDocument

PipelineDocument Highlights

https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/common/pipeline/PipelineDocument.html

PipelineDocument{

addField(name, value);

getAllFieldNames(); //include internal use names

getFieldNames(); //exclude internal use names

getFirstField(name);

getLastField(name);

removeFields(name);

setField(name, value);

...

}

The Javascript Function

Basic

function (doc) {

// do really important things.

return doc;

}

With Context

function (doc, ctx) {

// do really important things.

return doc;

}

https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/pipeline/Context.html

With Collection

function (doc, ctx, collection) {

// do really important things.

return doc;

}

With solrServer

function (doc, ctx, collection, solrServer) {

// do really important things.

// solrServer can index/query things

return doc;

}

https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/BufferingSolrServer.html

With solrServerFactory (aka SolrClientFactory)

function (doc, ctx, collection, solrServer, solrServerFactory) {
  // do really important things.
  // solrServerFactory can look up other collections
  return doc;
}

https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/SolrClientFactory.html

Common Problems

Add a Field

function (doc) {
  // setField replaces any values currently in the field with the new one.
  doc.setField('some-new-field', 'some field value');

  // addField appends: for a multi-valued field the new value is combined with
  // any existing values; if the field does not exist yet, it is created.
  doc.addField('some-new-field', 'some field value');
  return doc;
}
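
The difference between the two calls is easy to mix up, so here is a small sketch of the behavior the comments above describe. `makeDoc` is a hypothetical stand-in written for this example, not the real PipelineDocument, and `tags_ss` is an assumed field name. It only mirrors the replace-versus-append semantics stated in the slide.

```javascript
// Hypothetical stand-in for PipelineDocument so the sketch runs outside Fusion.
// It imitates only the setField/addField behavior described on the slide.
function makeDoc() {
  var store = {};
  return {
    // Replace whatever the field held with a single new value.
    setField: function (name, value) { store[name] = [value]; },
    // Append to the field, creating it first if it does not exist.
    addField: function (name, value) {
      if (!store[name]) { store[name] = []; }
      store[name].push(value);
    },
    // Return a copy of the field's values (empty array if the field is absent).
    getFieldValues: function (name) { return store[name] ? store[name].slice() : []; }
  };
}

var doc = makeDoc();
doc.addField('tags_ss', 'solr');     // creates the field
doc.addField('tags_ss', 'fusion');   // appended next to 'solr'
var afterAdd = doc.getFieldValues('tags_ss');

doc.setField('tags_ss', 'lucene');   // replaces both earlier values
var afterSet = doc.getFieldValues('tags_ss');
```

After the two `addField` calls the field holds two values. After `setField` it holds only the new one.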

Glue Two Fields

function (doc) {
  var value = "";
  if (doc.hasField("Actor1Geo_Lat") && doc.hasField("Actor1Geo_Long")) {
    value = doc.getFirstFieldValue("Actor1Geo_Lat") + "," +
            doc.getFirstFieldValue("Actor1Geo_Long");
    doc.addField("Actor1Geo_p", value);
  }
  return doc;
}
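
The function above glues whatever is in the two fields, so an empty or out-of-range coordinate would still produce a value. One possible refinement, sketched here as an assumption rather than anything from the deck, is a small helper (`toLatLon`, a made-up name) that returns the "lat,lon" string only when both parts parse as numbers within the usual latitude and longitude ranges.

```javascript
// Hypothetical helper: build a "lat,lon" string, or return null when either
// part is missing, non-numeric, or outside the valid coordinate range.
function toLatLon(lat, lon) {
  var la = parseFloat(lat);
  var lo = parseFloat(lon);
  if (isNaN(la) || isNaN(lo)) { return null; }
  if (la < -90 || la > 90 || lo < -180 || lo > 180) { return null; }
  return la + ',' + lo;
}

var good = toLatLon('37.77', '-122.42');
var outOfRange = toLatLon('95', '10');
var empty = toLatLon('', '10');
```

In the stage, the helper's result would be checked for null before calling `doc.addField("Actor1Geo_p", ...)`, so documents with bad coordinates simply get no point field.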

Iterate through the fields

function (doc) {
  // list of doc fields to iterate over
  var fields = doc.getFieldNames().toArray();
  for (var i = 0; i < fields.length; i++) {
    var fieldName = fields[i];
    var fieldValue = doc.getFirstFieldValue(fieldName);
    logger.info("field name: " + fieldName + ", field value: " + fieldValue);
  }
  return doc;
}

Logging

logger.info("field name: " + fieldName + ", field value: " + fieldValue);

fusion/3.1.x/var/log/connectors/connectors.log

Preview a field

function (doc) {
  if (doc.getId() != null) {
    var fromField = "body_t";
    var toField = "preview_t";
    var value = doc.getFirstFieldValue(fromField);
    // guard against a missing field before calling string methods on it
    value = value ? String(value) : "";
    // turn newlines and tabs into spaces
    var pattern = /\n|\t/g;
    value = value.replace(pattern, " ");
    // keep at most the first 500 characters
    var length = value.length < 500 ? value.length : 500;
    value = value.substr(0, length);
    doc.addField(toField, value);
  }
  return doc;
}
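
A hard cut at 500 characters can end the preview in the middle of a word. As a variation on the function above, and not something the deck itself shows, the truncation could be pulled into a helper that backs up to the last space. `makePreview` and its `maxLen` parameter are hypothetical names for this sketch.

```javascript
// Hypothetical helper: normalize whitespace, then cut at maxLen characters,
// backing up to the last space so the preview does not end mid-word.
function makePreview(text, maxLen) {
  if (text === null || text === undefined) { return ''; }
  var clean = String(text).replace(/\s+/g, ' ').trim();
  if (clean.length <= maxLen) { return clean; }
  var cut = clean.substring(0, maxLen);
  var lastSpace = cut.lastIndexOf(' ');
  // Only back up when the cut actually landed inside a word.
  if (clean.charAt(maxLen) !== ' ' && lastSpace > 0) {
    cut = cut.substring(0, lastSpace);
  }
  return cut;
}

var preview = makePreview('hello\n\tworld again', 13);
```

The stage body would then reduce to reading `body_t`, calling the helper with 500, and adding the result to `preview_t`.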

Bust up a document

function (doc) {
  var field = doc.getFieldValues('price');
  var id = doc.getId();
  var newDocs = [];
  // emit one new document per value of the multi-valued field
  for (var i = 0; i < field.size(); i++) {
    newDocs.push({ 'id': id + '-' + i,
                   'fields': [ { 'name': 'subject', 'value': field.get(i) } ] });
  }
  return newDocs;
}
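
To make the shape of the returned array concrete, here is a runnable sketch of the same splitting idea. The stand-in `makeDoc` is hypothetical: it returns field values through `size()` and `get(i)` to imitate the list the stage sees in Fusion. The `parent_id_s` field is an assumption added for illustration, so each child can be traced back to its source document.

```javascript
// Hypothetical stand-in: getFieldValues returns an object with size()/get(i),
// imitating the Java list a stage receives. Not the real PipelineDocument.
function makeDoc(id, fields) {
  return {
    getId: function () { return id; },
    getFieldValues: function (name) {
      var values = fields[name] || [];
      return {
        size: function () { return values.length; },
        get: function (i) { return values[i]; }
      };
    }
  };
}

// Split one document into one child per value of fieldName.
// parent_id_s is an assumed field name, not something from the deck.
function splitByField(doc, fieldName) {
  var values = doc.getFieldValues(fieldName);
  var parentId = doc.getId();
  var children = [];
  for (var i = 0; i < values.size(); i++) {
    children.push({
      id: parentId + '-' + i,
      fields: [
        { name: fieldName, value: values.get(i) },
        { name: 'parent_id_s', value: parentId }
      ]
    });
  }
  return children;
}

var children = splitByField(makeDoc('sku42', { price: [9.99, 12.5] }), 'price');
```

With two prices on `sku42`, `children` holds two entries with ids `sku42-0` and `sku42-1`.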

Look up in another collection

function doWork(doc, ctx, collection, solrServer, solrServerFactory) {
  var imports = new JavaImporter(
      org.apache.solr.client.solrj.SolrQuery,
      org.apache.solr.client.solrj.util.ClientUtils);
  with (imports) {
    var sku = doc.getFirstFieldValue("sku");
    if (!doc.hasField("mentions")) {
      var mentions = "";
      var productsSolr = solrServerFactory.getSolrServer("products");
      if (productsSolr != null) {
        var q = "sku:" + sku;
        var query = new SolrQuery();
        query.setRows(100);
        query.setQuery(q);
        var res = productsSolr.query(query);
        // size() counts the rows returned, so this is capped at 100
        mentions = res.getResults().size();
        doc.addField("mentions", mentions);
      }
    }
  }
  return doc;
}

Reject a document

function (doc) {

if (doc.hasValue('foo')) {

return null; // stop this document from being indexed.

}

return doc;

}
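
The contract in the function above is that returning null drops the document and returning the document keeps it. The sketch below imitates that contract outside Fusion. `makeDoc` and `runStage` are hypothetical helpers, `title_t` is an assumed field name, and nothing here reflects how Fusion runs stages internally.

```javascript
// Hypothetical stand-in for PipelineDocument with just the methods used here.
function makeDoc(id, fields) {
  return {
    getId: function () { return id; },
    hasField: function (name) {
      return Object.prototype.hasOwnProperty.call(fields, name);
    }
  };
}

// Stage: drop documents that have no title (title_t is an assumed field name).
var requireTitle = function (doc) {
  if (!doc.hasField('title_t')) {
    return null; // stop this document from being indexed
  }
  return doc;
};

// Imitates the contract only: keep whatever the stage returns unless it is null.
function runStage(stage, docs) {
  var kept = [];
  for (var i = 0; i < docs.length; i++) {
    var out = stage(docs[i]);
    if (out !== null) { kept.push(out); }
  }
  return kept;
}

var kept = runStage(requireTitle, [
  makeDoc('a', { title_t: 'Hello' }),
  makeDoc('b', { body_t: 'no title here' })
]);
```

Only document `a` survives; `b` is rejected because it lacks the required field.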

Java + JavaScript

var ArrayList = Java.type("java.util.ArrayList");

var a = new ArrayList;

Next Steps

o Grab Fusion https://lucidworks.com/download/

o Ingest some data

o Create a JavaScript pipeline stage and manipulate the data

o https://doc.lucidworks.com/fusion/latest/Indexing_Data/Custom-JavaScript-Indexing-Stages.html

o Attend a training

o Get support

Thank You
