Date posted: 19-Jan-2017
Category: Technology
Uploaded by: shivram-mani
PXF: A Unified Access Framework for HDFS Datasets
Shivram Mani (Pivotal)
Agenda
● Motivations
● PXF Introduction
● Architecture/Design
● Developer View
● Usage/Plugins
● Value Proposition to new applications
● What's coming
Motivations: SQL on Hadoop
What an RDBMS offers:
● ANSI SQL
● Cost-based optimizer
● Transactions
● ...
Open question: how does the RDBMS reach the various formats and storage systems supported on HDFS?
Answer: foreign tables!
What is PXF?
PXF is an extension framework that does the following:
● Provides a uniform tabular view of heterogeneous data sources
● Exploits parallelism for data access
● Offers a pluggable framework for custom connectors
● Provides built-in connectors for accessing data in HDFS files, Hive/HBase tables, etc.
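PXF Communication
● Native tables: segments read HDFS through libhdfs3 (written in C)
● External tables: segments call the PXF REST API over HTTP, port 51200
● The PXF webapp runs in Apache Tomcat and exposes the REST API
● PXF reaches the underlying data stores through their client interfaces (Java API, Java/Thrift)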
Deployment Architecture
Node | Services
HAWQ master node | NN, pxf, HBase Master
DN1 | pxf, HAWQ seg1, HBase Region Server1
DN2 | pxf, HAWQ seg2, HBase Region Server2
DN3 | pxf, HAWQ seg3, HBase Region Server3
DN4 | pxf, HAWQ seg4
* PXF needs to be installed on all DNs (DataNodes)
* PXF is recommended to be installed on the NN (NameNode)
PXF Components
Component | Role
Fragmenter | Splits the dataset into partitions; returns the location of each partition
Accessor | Understands and reads/writes the fragment; returns records
Resolver | Converts records to a consumable format (data types)
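The division of labour between the three components can be pictured with a toy pipeline. This is a minimal Python sketch for illustration only: PXF's real plugin API is Java, and every name here (ToyFragmenter, ToyAccessor, ToyResolver, scan, the in-memory "dataset") is hypothetical rather than taken from PXF.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Fragment:
    """One partition of the dataset plus the hosts that store it."""
    index: int
    hosts: List[str]
    lines: List[str]  # stand-in for the bytes of one split


class ToyFragmenter:
    """Splits the dataset into partitions and reports where each one lives."""

    def __init__(self, lines, hosts, lines_per_fragment=2):
        self.lines = lines
        self.hosts = hosts
        self.size = lines_per_fragment

    def get_fragments(self) -> List[Fragment]:
        fragments = []
        for start in range(0, len(self.lines), self.size):
            index = start // self.size
            # Assumption: fragments are spread over hosts round-robin.
            host = self.hosts[index % len(self.hosts)]
            fragments.append(Fragment(index, [host], self.lines[start:start + self.size]))
        return fragments


class ToyAccessor:
    """Reads one fragment and yields raw records."""

    def read(self, fragment: Fragment) -> Iterator[str]:
        yield from fragment.lines


class ToyResolver:
    """Converts a raw record into typed fields."""

    def __init__(self, column_types):
        self.column_types = column_types

    def resolve(self, record: str, delimiter: str = ",") -> tuple:
        fields = record.split(delimiter)
        return tuple(cast(value) for cast, value in zip(self.column_types, fields))


def scan(fragmenter, accessor, resolver):
    """Chain the three roles: fragment the data, read records, resolve types."""
    rows = []
    for fragment in fragmenter.get_fragments():
        for record in accessor.read(fragment):
            rows.append(resolver.resolve(record))
    return rows


data = ["1,apple,0.5", "2,pear,0.75", "3,plum,1.25"]
rows = scan(ToyFragmenter(data, ["dn1", "dn2"]), ToyAccessor(), ToyResolver([int, str, float]))
```

Keeping the three roles separate is what makes the framework pluggable: a new storage format would typically need only a new accessor/resolver pair, while a new storage system would also need its own fragmenter.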
Architecture - Read Data Flow
Nodes shown: HAWQ master node (NN, pxf); DN1 (pxf, HAWQ seg1)
Query: select * from ext_table0
1. The HAWQ master calls the getFragments() API at pxf://<location>:<port>/<path>
2. PXF (Fragmenter) returns the fragments (JSON)
3. The master builds the split mapping (fragment -> segment)
4. The query is dispatched to segments 1, 2, 3… (Interconnect)
5. Each segment issues a Read() REST call to pxf
6. The Accessor and Resolver produce records
7. Records are streamed back to the segment
8. The query result is returned
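The split mapping (fragment -> segment) in the read flow can be pictured as a small assignment problem. The slides do not describe the algorithm HAWQ actually uses, so the following Python sketch assumes a simple policy: prefer a segment on a host that stores the fragment, and otherwise fall back to the least-loaded segment. The function name and the data layout are hypothetical.

```python
from collections import defaultdict


def map_fragments_to_segments(fragments, segments):
    """Assign each fragment to a segment, preferring a co-located one.

    fragments: list of dicts with "index" and "hosts"
    segments:  dict of segment name -> host it runs on
    """
    load = {name: 0 for name in segments}
    assignment = defaultdict(list)
    for fragment in fragments:
        local = [name for name, host in segments.items() if host in fragment["hosts"]]
        candidates = local or list(segments)
        # Pick the least-loaded candidate; ties go to the alphabetically first name.
        chosen = min(candidates, key=lambda name: (load[name], name))
        assignment[chosen].append(fragment["index"])
        load[chosen] += 1
    return dict(assignment)


fragments = [
    {"index": 0, "hosts": ["dn1"]},
    {"index": 1, "hosts": ["dn2"]},
    {"index": 2, "hosts": ["dn1"]},
    {"index": 3, "hosts": ["dn9"]},  # no segment runs on this host
]
segments = {"seg1": "dn1", "seg2": "dn2", "seg3": "dn3"}
mapping = map_fragments_to_segments(fragments, segments)
```

With this input, fragments 0 and 2 stay on seg1 (local to dn1), fragment 1 goes to seg2, and the fragment with no local segment lands on the idle seg3.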
Read Data Flow - Take 2
PXF Developer View
PXF Usage
Built-in with Plugins
● HDFS
● Hive
● HBase
Community (https://bintray.com/big-data/maven/pxf-plugins/view)
● Cassandra
● Accumulo
● Redis
● ...
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
    ( column_name data_type [, ...] )
LOCATION ('pxf://host[:port]/path-to-data?PROFILE=<profile-name>[&custom-option=value...]')
FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
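To make the syntax template concrete, here is a small Python sketch that fills it in. The helper names (pxf_location, create_external_table), the host "namenode", the path, and the table definition are all hypothetical; only the clause layout, the PROFILE option, the HdfsTextSimple profile name, and port 51200 come from the slides. The helpers do no identifier quoting or escaping, so they are for illustration rather than for building statements from untrusted input.

```python
from urllib.parse import urlencode


def pxf_location(host, path, profile, port=None, **options):
    """Build the pxf:// URI used in the LOCATION clause."""
    authority = f"{host}:{port}" if port is not None else host
    # PROFILE comes first, followed by any custom options.
    query = urlencode({"PROFILE": profile, **options})
    return f"pxf://{authority}/{path.lstrip('/')}?{query}"


def create_external_table(name, columns, location, fmt, format_props, writable=False):
    """Render a CREATE EXTERNAL TABLE statement following the template."""
    kind = "WRITABLE" if writable else "READABLE"
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE {kind} EXTERNAL TABLE {name} ({cols})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT '{fmt}' ({format_props});"
    )


location = pxf_location("namenode", "data/sales.csv", "HdfsTextSimple", port=51200)
ddl = create_external_table(
    "sales",
    [("id", "int"), ("item", "text")],
    location,
    fmt="TEXT",
    format_props="delimiter=E','",
)
print(ddl)
```

The rendered statement reads a comma-delimited text file through the HdfsTextSimple profile; switching the profile argument is all that changes when the same table shape is backed by a different format.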
PXF HDFS Plugin
Fragment - Splits (blocks)
● Supports reading multiple formats (see the profiles listed here)
● Supports writing to Sequence Files
● Chunked read optimization
● Support for stats
Profile | Description
HdfsTextSimple | Read delimited single-line records (plain text)
HdfsTextMulti | Read delimited multiline records (plain text)
Avro | Read Avro records
JSON | Supports simple/pretty-printed JSON with field projection
PXF Hive Plugin
Fragment - Splits of the file stored in the table
Supported storage formats:
● Text based
● SequenceFile
● RCFile
● ORCFile
● Parquet
● Avro
* Complex types are converted to text
Partition Filtering
Metadata API *
Profile | Description
Hive | Read all Hive tables (all types)
HiveRC | Hive tables stored in RC (serialized with ColumnarSerDe/LazyBinaryColumnarSerDe)
HiveText | Faster access for Hive tables stored as Text
PXF HBase Plugin
Fragment - Regions
● Read only. Uses profile 'HBase'
● Filter push down to the HBase scanner
○ (Operators: EQ, NE, LT, GT, LE, GE & AND)
● Direct Mapping
● Indirect Mapping
○ Lookup table - pxflookup
○ Maps attribute name to HBase <cf:qualifier>
(row key) | mapping
sales | id=cf1:saleid
sales | cmts=cf8:comments
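The difference between direct and indirect mapping can be sketched as a name-resolution step. The Python below is an illustration under assumptions: the dictionary stands in for the pxflookup table (row key read as the HBase table name, each entry mapping an attribute name to a column family and qualifier, as in the slide's "sales" example), and resolve_column is a hypothetical helper, not part of PXF.

```python
# Hypothetical in-memory stand-in for the pxflookup table:
# row key -> {attribute name: "<column family>:<qualifier>"}.
PXF_LOOKUP = {
    "sales": {"id": "cf1:saleid", "cmts": "cf8:comments"},
}


def resolve_column(table, attribute, lookup=PXF_LOOKUP):
    """Return (family, qualifier) for an attribute, directly or via the lookup."""
    if ":" in attribute:
        # Direct mapping: the attribute name already is "cf:qualifier".
        target = attribute
    else:
        # Indirect mapping: translate the alias through the lookup table.
        target = lookup.get(table, {}).get(attribute)
        if target is None:
            raise KeyError(f"no mapping for {attribute!r} in table {table!r}")
    family, qualifier = target.split(":", 1)
    return family, qualifier


print(resolve_column("sales", "id"))         # indirect, through the lookup
print(resolve_column("sales", "cf2:price"))  # direct, no lookup needed
```

Indirect mapping lets the external table use short, SQL-friendly column names while the HBase side keeps its own family and qualifier naming.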
Value Proposition of PXF
● Abstracts the application from external datasources/APIs/versions
● Focus on one data layout
● Off-the-shelf support for various datasources
● Extensibility: ease of supporting custom datasources
● Provides means for filter push down
● Dataset statistics for performance optimization
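PXF with Postgres
● Using FDW callback functions that will interact with PXF.
● FDW calls the PXF REST API over HTTP, port 51200
● The PXF webapp runs in Apache Tomcat and exposes the REST API
● PXF reaches the underlying data stores through their client interfaces (Java API, Java/Thrift)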
What's coming
● HA
● Schema Auto Discovery (Metadata)
● Support for more dataset statistics
● Time series data optimization
● More plugins (Gemfire, Solr, etc.)
● Additional filter push down support
● Custom Output Format
Contribution
Feature areas: custom plugins (storage, formats), push down filters, custom applications
Documentation | Wiki/Docs | cwiki.apache.org/confluence/display/HAWQ/PXF, http://hawq.incubator.apache.org/docs/pxf/javadoc
Code / Review | Github (Apache) | github.com/apache/incubator-hawq/tree/master/pxf
Issues | issues.apache.org/jira/browse/HAWQ, Component = PXF
Join Discussion/Ask Questions | Apache DLs | [email protected]@hawq.incubator.apache.org
Github (Field) | github.com/Pivotal-Field-Engineering/pxf-field
Thank you!