Date post: 11-May-2020
Databricks Product Datasheet

Databricks: Product Datasheet

Databricks offers a cloud platform powered by Apache® Spark™ that makes it easy to turn data into value, from ingest to production, without the hassle of managing complex infrastructure, systems, and tools. It is a complete solution for data scientists and engineers.

Notebooks

Supported Languages
Databricks lets you mix commands in different languages across the cells of a notebook.
• Scala
• Python
• Spark SQL
• R
• Markdown

Collaboration
• Multiple users can collaborate on the same notebook concurrently
• Ability to lock notebooks to prevent accidental overwrites
• Ability to add comments within notebooks for contextual feedback
• Ability to review notebook history and sync with GitHub

Visualizations
• Embedded charts including bar, scatter plot, box plot, pivot, map, etc.
• Visualize ML models using the display() command
• D3
• ggplot
• matplotlib

Inline Documentation
• Markdown
• HTML
• CSS
• JavaScript

Libraries
• Python libraries including scikit-learn, NumPy, netlib-blas, pandas, etc.
• Java-based libraries
• Scala libraries
• Custom libraries for machine learning, graph processing, statistics, linear algebra, etc. can be imported just as in any IDE
• Both the UI and the REST API allow you to manage libraries on a per-cluster or account-wide basis
• R packages (many are preinstalled, including caret, glmnet, splines, randomForest, dplyr)

Databricks Guide
Every release ships with an up-to-date Databricks Guide that provides many examples of new features and common use cases, collected over many years of Databricks employee experience and from the large Spark community.

One-Click Publishing from Notebooks
Create shareable dashboards from notebooks with a single click. One notebook can be tailored into multiple dashboard views.

Continuous Real-Time Updates
Publish dashboards where the data and results are automatically updated periodically.

Parameterized Dashboards
Provide drop-downs in the dashboards so that viewers can change the input parameters behind the dashboard values.

Export Notebooks
• DBC archive
• Source file
• IPython Notebook (Python only)
• HTML

Spark Progress Reporting and Spark UI Integration
View the real-time progress of all the jobs and stages of a Spark command directly from the progress bar of a command run within a notebook.


Infrastructure

Supported Compute Environments
Amazon Web Services (AWS) EC2, including memory-optimized, compute-optimized, and GPU-accelerated instance types. See databricks.com/product/pricing/instance-types for the full list.

Minimum Cluster Size
60GB (2 nodes: 1 master, 1 worker)

Maximum Cluster Size
Contact us for anything greater than 1PB

Performance Optimizations
• AWS-optimized configurations
• Data caching via cluster RAM
• Data caching via worker-local SSD
• Encrypted AWS Elastic Block Storage (EBS)

Cluster Resource Management
• Simplified cluster administration
• Simplified cluster configuration: specify memory requirements, and appropriate instances are automatically configured and started
• Simplified cluster management
• Automatic OS and software patching
• Early access to open source Apache Spark bug fixes and performance improvements
• Pre-configured and tuned for performance
• Scale clusters up and down as desired to help control costs

Jobs
• Simplified Jobs feature (open source spark-submit within a simplified UI that does not require compilation)
• Simplified job scheduling, configurable in a cron-like fashion
• Includes error handling, retries, and timeouts
• Job state change notifications via email
• Execute jobs for production pipelines on a specified schedule directly from a notebook or dashboard
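
As a rough illustration of the scheduling, retry, timeout, and email options listed above, the sketch below builds a job definition as JSON. The field names follow the Jobs endpoint of the REST API 2.0 as it is commonly documented; the job name, notebook path, cluster ID, and email address are hypothetical placeholders, and the fields accepted by a given release may differ.

```python
import json


def build_job_settings(notebook_path, cluster_id, cron, notify):
    """Return a job definition dict (a sketch; all values are placeholders)."""
    return {
        "name": "nightly-report",  # hypothetical job name
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
        # Quartz cron fields: second minute hour day-of-month month day-of-week
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": "UTC",
        },
        "max_retries": 2,         # retry a failed run up to two times
        "timeout_seconds": 3600,  # stop a run that exceeds one hour
        "email_notifications": {"on_failure": [notify]},
    }


settings = build_job_settings(
    "/Users/someone@example.com/report",  # hypothetical notebook path
    "0000-000000-example0",               # hypothetical cluster ID
    "0 0 2 * * ?",                        # every day at 02:00
    "oncall@example.com",                 # hypothetical recipient
)
payload = json.dumps(settings, indent=2)
print(payload)
```

The resulting payload would typically be sent to the jobs/create endpoint; it is only printed here so the sketch runs without a workspace.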

Security
• Completion of SOC 2 Type 1 certification
• HIPAA-compliant offering available
• AWS IAM role integration
• User management including role assignments and super-user privileges
• Access control lists
• Encryption at rest
• Single sign-on
• Cluster ACLs
• Audit logs


Integration

Supported BI Integrations
• JDBC / ODBC
• MicroStrategy
• Pantera
• Pentaho
• Qlik
• Tableau
• TIBCO / Jaspersoft
• Zoomdata
• Beeline (open source, remote Hive command line)

Supported Data Source Integrations
• SQL stores (JDBC/ODBC)
• NoSQL stores (Cassandra, HBase)
• Columnar stores (Redshift, Vertica)
• Document-oriented stores (MongoDB)
• Hadoop and Hive including custom UDFs, UDAFs, and UDTs
• File stores (S3, AWS/EFS coming soon)
• File formats (CSV, JSON, Parquet, SequenceFile, Avro, RCFile, ORCFile)
• Search engines (Lucene, Solr, Elasticsearch)
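
To make the list above more concrete, here is a minimal sketch of reading one file-based source and one SQL store through Spark's DataFrame reader. It assumes `spark` is an existing SparkSession (as a notebook would normally provide) and that a suitable JDBC driver is available on the cluster; the function name and its arguments are hypothetical and chosen only for illustration.

```python
def load_examples(spark, parquet_path, jdbc_url, table):
    """Read a Parquet dataset and a JDBC table into DataFrames.

    `spark` is assumed to be a SparkSession; the paths, URL, and table
    name are supplied by the caller and are not part of the datasheet.
    """
    # File format source: Parquet files, for example on S3
    events = spark.read.format("parquet").load(parquet_path)

    # SQL store reached over JDBC
    customers = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .load()
    )
    return events, customers
```

Other formats in the list (CSV, JSON, Avro, and so on) follow the same `format(...).load(...)` pattern, though some require an additional connector library to be attached to the cluster.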

REST-Based API
A REST-based API provides programmatic access to:
• Cluster management
• Databricks File System (DBFS)
• Jobs
• Libraries

For more information, please refer to the Databricks Platform Native REST API 2.0: https://community.cloud.databricks.com/doc/api/
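
As a small sketch of what a call to the cluster management part of this API might look like, the snippet below builds (but does not send) a request to the clusters/list endpoint of the REST API 2.0. The workspace host and token are hypothetical placeholders, and the bearer-token header is an assumption; the authentication scheme depends on the deployment and release.

```python
import urllib.request

# Hypothetical workspace host; replace with your own deployment's URL.
API_ROOT = "https://example.cloud.databricks.com/api/2.0"


def build_request(endpoint, token):
    """Build (but do not send) a GET request to a REST API 2.0 endpoint."""
    return urllib.request.Request(
        f"{API_ROOT}/{endpoint}",
        # Assumed auth scheme: a personal access token sent as a bearer token
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_request("clusters/list", "<your-token>")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a real workspace; that step is left out so the sketch runs offline.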

© Databricks 2016. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.

