
What will be new in Apache NiFi 1.2.0

Date post: 21-Jan-2018
Upload: koji-kawamura
What will be new in Apache NiFi 1.2.0 (was not released as of this writing, but released on May 8th). Apr 28th, 2017. Apache NiFi committer: Koji Kawamura (ijokarumawak)
Transcript
Page 1: What will be new in Apache NiFi 1.2.0

What will be new in Apache NiFi 1.2.0

(was not released as of this writing, but released on May 8th)

Apr 28th, 2017

Apache NiFi committer: Koji Kawamura (ijokarumawak)

Page 2: What will be new in Apache NiFi 1.2.0

Disclaimer: See the release notes for details!

• Apache NiFi 1.2.0 was released on May 8th, 2017! Please see the official release notes for the official list of what's new: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.2.0

• The contents of this slide deck are derived from Apache NiFi JIRA issues labeled with the next release target 1.2.0, and from source code available on GitHub (already merged into the master branch). However, this does NOT mean these features are guaranteed to be released; they are still subject to change.

• The motivation for this presentation is to share what has been introduced into the project since the latest release, Apache NiFi 1.1.2.

• The contents are created from information available under the Apache NiFi project. However, the way it is summarized reflects solely my personal thoughts, not a consensus built among the Apache NiFi community.

Page 3: What will be new in Apache NiFi 1.2.0

Themes

• Schema Registry, Record Reader/Writer

• Multiple versions of NAR

• Support EL for various Processor properties

• Performance Improvement

• CDC (Change Data Capture)

• Rollback on Failure

• Flow control

• UX

• Security

Page 4: What will be new in Apache NiFi 1.2.0

Schema Management is a pain…

We need a schema to:
- Analyze data
- Convert one format to another
- Validation
- … etc

Centralized schema management is needed.

Page 5: What will be new in Apache NiFi 1.2.0

Schema?

Page 6: What will be new in Apache NiFi 1.2.0

Schema Registry

• AvroSchemaRegistry
  • Provides a service for registering and accessing schemas. You can register a schema as a dynamic property where 'name' represents the schema name and 'value' represents the textual representation of the actual schema following the syntax and semantics of Avro's Schema format.

• HortonworksSchemaRegistry
  • Provides a Schema Registry Service that interacts with a Hortonworks Schema Registry, available at https://github.com/hortonworks/registry
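
To make the AvroSchemaRegistry idea concrete, here is a minimal Python sketch (not NiFi code). A plain dict stands in for the controller service's dynamic properties, mapping a schema name to Avro schema text. The schema name `user`, its fields, and the helper `get_schema` are made up for illustration.

```python
import json

# Stand-in for AvroSchemaRegistry dynamic properties:
# property name = schema name, property value = Avro schema text.
# The "user" schema and its fields are hypothetical.
registry = {
    "user": json.dumps({
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": ["null", "string"], "default": None},
        ],
    })
}


def get_schema(name):
    """Look up a schema by name and parse its JSON text (hypothetical helper)."""
    return json.loads(registry[name])


schema = get_schema("user")
field_names = [field["name"] for field in schema["fields"]]
print(field_names)
```

Readers and writers would then refer to the schema by name instead of each carrying its own copy, which is the point of centralizing it.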

Page 7: What will be new in Apache NiFi 1.2.0

Record Reader/Writer

Page 8: What will be new in Apache NiFi 1.2.0

Schema Registry and Reader/Writer

• ConvertRecord
  • Converts records from one data format to another using configured Record Reader and Record Writer Controller Services.

• SplitRecord
  • Splits up an input FlowFile that is in a record-oriented data format into multiple smaller FlowFiles.

• PutDatabaseRecord
  • The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file.

• QueryRecord
  • Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering.
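
The reader/writer idea behind ConvertRecord and QueryRecord can be sketched in plain Python. This is only an illustration under assumptions: the CSV content is invented, `csv` and `json` play the roles of Record Reader and Record Writer, and `sqlite3` is a stand-in for the SQL engine NiFi actually uses. The table is named FLOWFILE to mirror how QueryRecord exposes the FlowFile content to the query.

```python
import csv
import io
import json
import sqlite3

# Hypothetical FlowFile content in CSV format.
csv_content = "id,name,age\n1,alice,34\n2,bob,19\n3,carol,52\n"

# "Record Reader": parse the CSV text into a list of records (dicts).
records = list(csv.DictReader(io.StringIO(csv_content)))

# ConvertRecord-like step: a "Record Writer" emits the same records as JSON.
json_content = json.dumps(records)

# QueryRecord-like step: run SQL against the records.
# sqlite3 is only a stand-in for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT, age INTEGER)")
db.executemany(
    "INSERT INTO FLOWFILE VALUES (?, ?, ?)",
    [(int(r["id"]), r["name"], int(r["age"])) for r in records],
)
over_30 = db.execute(
    "SELECT name, age FROM FLOWFILE WHERE age > 30 ORDER BY id"
).fetchall()

print(json_content)
print(over_30)
```

The query result (here, the rows with age over 30) is what would become the content of the output FlowFile.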

Page 9: What will be new in Apache NiFi 1.2.0

Multiple versions of Nar

MANIFEST.MF
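
The slide image is not in this transcript, so here is a rough sketch of the idea: a NAR identifies itself by group, id and version entries in its MANIFEST.MF, which is what lets several versions of the same NAR coexist. The entry names and values below are written from memory and should be treated as assumptions to check against a real NAR; the parser is a simplified, hypothetical helper that ignores manifest line continuations.

```python
# Assumed manifest content; verify the entry names against an actual NAR.
manifest_text = """\
Manifest-Version: 1.0
Nar-Group: org.apache.nifi
Nar-Id: nifi-standard-nar
Nar-Version: 1.2.0
Nar-Dependency-Group: org.apache.nifi
Nar-Dependency-Id: nifi-standard-services-api-nar
Nar-Dependency-Version: 1.2.0
"""


def parse_manifest(text):
    """Parse 'Key: value' lines into a dict (simplified, no continuation lines)."""
    entries = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            entries[key] = value
    return entries


entries = parse_manifest(manifest_text)
# The (group, id, version) triple distinguishes one NAR version from another.
coordinate = (entries["Nar-Group"], entries["Nar-Id"], entries["Nar-Version"])
print(coordinate)
```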

Page 10: What will be new in Apache NiFi 1.2.0

New Processors: CDC

• CaptureChangeMySQL

Retrieves Change Data Capture (CDC) events from a MySQL database. CDC Events include INSERT, UPDATE, DELETE operations. Events are output as individual flow files ordered by the time at which the operation occurred.

Page 11: What will be new in Apache NiFi 1.2.0

How CDC works

Page 12: What will be new in Apache NiFi 1.2.0

Rollback on Failure

• PutSQL

• PutHiveQL

• PutHiveStreaming

• PutDatabaseRecord

Page 13: What will be new in Apache NiFi 1.2.0

PutSQL : default behavior

[Figure: input FlowFiles 1, 2 and 3 pass through PutSQL to the RDBMS; each FlowFile is routed to either 'success' or 'failure']

Page 14: What will be new in Apache NiFi 1.2.0

PutSQL : Rollback on Failure

[Figure: input FlowFiles 1, 2 and 3 pass through PutSQL to the RDBMS; nothing is routed to 'success' or 'failure']

Rollback!

Modified records are rolled back.

No FlowFiles are output; they are kept in the input queue.
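
The difference between the two modes can be modeled with a toy Python sketch. This is not how PutSQL is implemented; it assumes each FlowFile carries one SQL statement, uses an in-memory `sqlite3` database as the RDBMS, and the function names `new_db` and `put_sql` are hypothetical.

```python
import sqlite3


def new_db():
    """Create an in-memory database with one table (stand-in for the RDBMS)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    db.commit()
    return db


def put_sql(db, flowfiles, rollback_on_failure=False):
    """Toy model of PutSQL. Returns (success, failure, kept_in_queue) as id lists."""
    success, failure = [], []
    for ff in flowfiles:
        try:
            db.execute(ff["sql"])
            success.append(ff["id"])
        except sqlite3.Error:
            if rollback_on_failure:
                # Undo everything done in this session and emit nothing;
                # all FlowFiles stay in the input queue.
                db.rollback()
                return [], [], [f["id"] for f in flowfiles]
            failure.append(ff["id"])
    db.commit()
    return success, failure, []


# FlowFile 2 fails because it inserts a duplicate primary key.
FLOWFILES = [
    {"id": 1, "sql": "INSERT INTO t VALUES (10)"},
    {"id": 2, "sql": "INSERT INTO t VALUES (10)"},
    {"id": 3, "sql": "INSERT INTO t VALUES (30)"},
]

db_default = new_db()
default_result = put_sql(db_default, FLOWFILES)
default_rows = db_default.execute("SELECT id FROM t ORDER BY id").fetchall()

db_rollback = new_db()
rollback_result = put_sql(db_rollback, FLOWFILES, rollback_on_failure=True)
rollback_rows = db_rollback.execute("SELECT id FROM t").fetchall()

print(default_result, default_rows)
print(rollback_result, rollback_rows)
```

In the default mode the good statements are committed and only the bad FlowFile is routed to 'failure'. With rollback enabled, the table stays untouched and all three FlowFiles remain queued for a later retry.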

Page 15: What will be new in Apache NiFi 1.2.0

New Processors: Flow Control

• EnforceOrder
  • Enforces the expected ordering of FlowFiles that belong to the same data group.

• Wait
  • Routes incoming FlowFiles to the 'wait' relationship until a matching release signal is stored in the distributed cache from a corresponding Notify processor. When a matching release signal is identified, a waiting FlowFile is routed to the 'success' relationship, with attributes copied from the FlowFile that produced the release signal from the Notify processor.

• Notify
  • Caches a release signal identifier in the distributed cache, optionally along with the FlowFile's attributes. Any flow files held at a corresponding Wait processor will be released once this signal in the cache is discovered.
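
As an illustration of what EnforceOrder does, here is a small Python sketch under assumptions: each FlowFile is a dict with hypothetical `group` and `order` keys, and early arrivals are simply buffered until their turn. The real processor has more options (timeouts, relationships for skipped or overtaken FlowFiles) that this sketch leaves out.

```python
def enforce_order(flowfiles, initial_order=0):
    """Release FlowFiles per group in ascending 'order', buffering early arrivals."""
    next_expected = {}  # group -> next order number to release
    waiting = {}        # group -> {order: flowfile}
    released = []
    for ff in flowfiles:
        group = ff["group"]
        waiting.setdefault(group, {})[ff["order"]] = ff
        expected = next_expected.setdefault(group, initial_order)
        # Release as many consecutive FlowFiles as are now available.
        while expected in waiting[group]:
            released.append(waiting[group].pop(expected))
            expected += 1
        next_expected[group] = expected
    return released


# Group "a" arrives out of order; group "b" is tracked independently.
arrivals = [
    {"group": "a", "order": 1},
    {"group": "a", "order": 0},
    {"group": "b", "order": 0},
    {"group": "a", "order": 2},
]
released = [(ff["group"], ff["order"]) for ff in enforce_order(arrivals)]
print(released)
```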

Page 16: What will be new in Apache NiFi 1.2.0

Flow Control using Wait/Notify
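
The Wait/Notify handshake can be sketched in a few lines of Python. This is a simplified model, not the processors' implementation: a plain dict stands in for the distributed cache, the attribute names are invented, and letting the waiting FlowFile's own attributes win on conflict is an arbitrary choice made for this sketch.

```python
# A plain dict stands in for the distributed cache shared by Notify and Wait.
cache = {}


def notify(flowfile, signal_id):
    """Store a release signal along with the notifying FlowFile's attributes."""
    cache[signal_id] = dict(flowfile["attributes"])


def wait(flowfile, signal_id):
    """Route to 'wait' until a matching signal exists, then to 'success'."""
    signal = cache.get(signal_id)
    if signal is None:
        return "wait", flowfile
    # Copy attributes from the FlowFile that produced the release signal.
    merged = {**signal, **flowfile["attributes"]}
    return "success", {"attributes": merged}


waiting = {"attributes": {"batch.id": "42", "filename": "data.csv"}}

first_route, _ = wait(waiting, "42")  # no signal yet

notify({"attributes": {"rows.loaded": "100"}}, "42")
second_route, released = wait(waiting, "42")  # signal found

print(first_route, second_route, released["attributes"])
```

In a real flow the FlowFile routed to 'wait' is typically looped back into the Wait processor, so the check repeats until the signal arrives.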

Page 17: What will be new in Apache NiFi 1.2.0

New Processors: GCS

• DeleteGCSObject

• FetchGCSObject

• ListGCSBucket

• PutGCSObject

Page 18: What will be new in Apache NiFi 1.2.0

New Processors

• ConsumeEWS
  • Consumes messages from Microsoft Exchange using Exchange Web Services. The raw bytes of each received email message are written as contents of the FlowFile.

• ConvertExcelToCSVProcessor
  • Consumes a Microsoft Excel document and converts each worksheet to CSV.

• ExtractCCDAAttributes
  • Extracts information from a Consolidated CDA formatted FlowFile and provides individual attributes as FlowFile attributes.

• ExtractGrok
  • Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content.
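
For readers new to Grok, the ExtractGrok idea can be approximated with Python regular expressions. This is a sketch under assumptions: the three patterns below are simplified stand-ins for a real Grok pattern file, `compile_grok` is a hypothetical helper, and the log line is invented.

```python
import json
import re

# Simplified stand-ins for entries of a Grok pattern file.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}


def compile_grok(expression):
    """Turn '%{PATTERN:field}' references into named regex groups."""
    def expand(match):
        pattern_name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[pattern_name])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


content = "192.168.0.10 GET 200 0.043"
grok = compile_grok("%{IP:client} %{WORD:method} %{NUMBER:status} %{NUMBER:duration}")

# The matched fields could become FlowFile attributes...
attributes = grok.match(content).groupdict()
# ...or replace the FlowFile content as JSON.
json_content = json.dumps(attributes)

print(attributes)
print(json_content)
```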

Page 19: What will be new in Apache NiFi 1.2.0

New Processors

• FetchHBaseRow
  • Fetches a row from an HBase table.

• FuzzyHashContent
  • Calculates a fuzzy/locality-sensitive hash value for the Content of a FlowFile and puts that hash value on the FlowFile as an attribute whose name is determined by the <Hash Attribute Name> property.

• ISPEnrichIP
  • Looks up ISP information for an IP address and adds the information to FlowFile attributes.

• ListenBeats
  • Listens for messages sent by libbeat compatible clients (e.g. filebeats, metricbeats, etc.) using Libbeat's 'output.logstash', writing its JSON formatted payload to the content of a FlowFile. This processor replaces the now deprecated ListenLumberjack.

Page 20: What will be new in Apache NiFi 1.2.0

New Processors

• ExecuteScript
  • ClojureScriptEngine is added!

• UpdateCounter
  • This processor allows users to set specific counters and key points in their flow. It is useful for debugging and basic counting functions.

• AttributeRollingWindow
  • Track a Rolling Window based on evaluating an Expression Language expression on each FlowFile and add that value to the processor's state.

• GetTCP
  • Connects over TCP to the provided endpoint(s). Received data will be written as content to the FlowFile.

• QueryDatabaseTable
  • MSSQL2008DatabaseAdapter, MSSQLDatabaseAdapter
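
To give a feel for what AttributeRollingWindow tracks, here is a minimal Python sketch of a time-based rolling window. It is an illustration only: the class name, the explicit timestamps, and the returned keys are assumptions, and the value passed to `add` plays the role of the evaluated Expression Language result for each FlowFile.

```python
from collections import deque


class RollingWindow:
    """Keep (timestamp, value) pairs and aggregate those inside the window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.entries = deque()  # the processor's "state" in this sketch

    def add(self, timestamp, value):
        self.entries.append((timestamp, value))
        # Drop entries that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.entries and self.entries[0][0] <= cutoff:
            self.entries.popleft()
        values = [v for _, v in self.entries]
        total = sum(values)
        return {"count": len(values), "sum": total, "mean": total / len(values)}


window = RollingWindow(window_seconds=10)
first = window.add(timestamp=0, value=5)
second = window.add(timestamp=4, value=7)
third = window.add(timestamp=12, value=3)  # the entry at t=0 has expired

print(first, second, third)
```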

Page 21: What will be new in Apache NiFi 1.2.0

Deep Linking!

Page 22: What will be new in Apache NiFi 1.2.0

Align Components!

Page 23: What will be new in Apache NiFi 1.2.0

More context menu at root Process Group

BTW, did you know you can ‘Refresh’ the flow with ‘Cmd + r’?

Page 24: What will be new in Apache NiFi 1.2.0

… and more!!

1.2.0 is discussed to be released soon! Released and available for download!

Thank you :)

https://nifi.apache.org/download.html

