Schema Agnostic Indexing with Azure DocumentDB
DEEKSHA SINGH: 2641679
YASH THAKKAR: 2642764
ABSTRACT
Azure DocumentDB is Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.
Automatically indexes documents without requiring a schema or secondary indices.
Operates within an extremely frugal resource budget.
OUTLINE
DocumentDB
DocumentDB Capabilities
Resource Model
System Topology
Design Goals
Schema Agnostic Indexing
Logical Index Organization
When not to use and when to use DocumentDB
INTRODUCTION
DocumentDB supports the JSON data model and the JavaScript language directly within its database engine.
The indexing subsystem needs to support:
Automatic indexing of documents
DocumentDB’s query language
Real time, consistent queries
Multi-tenancy under extremely frugal resource budgets
Predictable performance guarantees
DOCUMENTDB CAPABILITIES
DocumentDB query language supports rich relational & hierarchical queries.
By default, the database engine automatically indexes all documents without requiring schema or secondary indexes from developers.
Transactional execution of application logic.
DocumentDB offers well defined consistency levels for developers.
All machine and resource management is abstracted from users.
RESOURCE MODEL
A tenant of DocumentDB starts by provisioning a database account.
A DocumentDB database manages a set of entities (users, permissions and collections), collectively referred to as resources.
A collection is a schema-agnostic container of arbitrary user-generated documents.
Developers can interact with resources via HTTP, using the standard verbs for CRUD operations, queries and stored procedures.
Tenants can elastically scale a resource by simply creating new resources, which are placed across resource partitions.
SYSTEM TOPOLOGY
Deployed worldwide across multiple Azure regions.
Managed and deployed on clusters of machines, each with dedicated local SSDs (to provide durability and high availability).
The DocumentDB database engine consists of the following components:
Replicated state machine (RSM) for coordination
JavaScript language runtime
Query processor
Storage and indexing subsystems
DESIGN GOALS FOR INDEXING
Automatic Indexing
Configurable storage/performance tradeoffs
Efficient, rich hierarchical and relational queries
Consistent queries in the face of a sustained volume of document writes
Multi-tenancy
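The storage/performance tradeoff can be pictured as a choice of which document paths are indexed at all. The following is only a minimal sketch under that assumption: the function `select_paths`, the tuple representation of paths, and the sample data are hypothetical names introduced for illustration and are not the DocumentDB indexing policy format.

```python
def select_paths(paths, excluded_prefixes):
    """Return the paths that do not start with any excluded prefix.

    Both paths and prefixes are tuples of labels (hypothetical representation).
    """
    def is_excluded(path):
        return any(path[:len(prefix)] == prefix for prefix in excluded_prefixes)

    return [path for path in paths if not is_excluded(path)]


# Hypothetical paths extracted from one document.
all_paths = [
    ("title", "Report"),
    ("body", "text", "a very long text value"),
    ("tags", 0, "azure"),
]

# Excluding everything under "body" saves index storage,
# at the cost of not being able to serve queries on that subtree from the index.
indexed_paths = select_paths(all_paths, excluded_prefixes=[("body",)])
print(indexed_paths)
```

Indexing fewer paths reduces storage and write cost, while queries over the excluded paths lose index support; that is the tradeoff the design goal refers to.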
SCHEMA AGNOSTIC INDEXING
No Schema, No Problem!
• Makes no assumptions about documents and allows them to vary in schema.
Documents as Trees
• Documents are represented as trees to blur the boundary between the schema of JSON documents and their instance values.
Index as a Document
• Every path in a document tree is indexed.
• Each update of a document leads to an update of the structure of the index.
DocumentDB Queries
• Developers can query DocumentDB collections using queries written in SQL and JavaScript.
• Queries are translated to the DocumentDB Query IL (intermediate language).
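The idea of treating a document as a tree in which every path is indexed can be sketched as follows. This is an illustrative sketch, not the engine's implementation: the function name `document_paths`, the use of array positions as labels, and the sample documents are assumptions made for this example.

```python
import json


def document_paths(node, prefix=()):
    """Yield every root-to-leaf path of a JSON value as a tuple of labels.

    Property names, array positions and leaf values are all treated as labels,
    so structure and instance values live in the same tree.
    Empty objects and arrays yield no paths in this simplified sketch.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from document_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from document_paths(value, prefix + (position,))
    else:
        yield prefix + (node,)


# Two documents with different shapes; no schema is declared for either.
doc_a = json.loads('{"headquarters": "Belgium", "exports": [{"city": "Moscow"}, {"city": "Athens"}]}')
doc_b = json.loads('{"headquarters": "Italy", "employees": 250}')

paths_a = list(document_paths(doc_a))
paths_b = list(document_paths(doc_b))
for path in paths_a + paths_b:
    print(path)
```

Because values appear as the last label of each path, the same traversal handles documents of any shape without a schema.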
QUERY IL
Designed to exploit JSON and JavaScript integration
Rooted in JavaScript type system
Follows JavaScript language semantics for expression evaluation & function invocation
Designed to be a target of translation from multiple query language frontends
LOGICAL INDEX ORGANIZATION
The index is the union of all document trees and is itself represented as a tree.
Each node of the index tree contains a list of document ids corresponding to the documents containing the given label.
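A minimal in-memory sketch of this logical organization is shown below. It is an assumption-laden illustration rather than the actual storage format: `IndexNode`, `iter_paths`, `add_document` and `lookup` are hypothetical names, and the sketch does not distinguish a property name from an equal leaf value at the same tree position.

```python
class IndexNode:
    """One node of the index tree: the ids of documents containing this label."""

    def __init__(self):
        self.doc_ids = set()
        self.children = {}


def iter_paths(value, prefix=()):
    """Yield root-to-leaf label paths of a JSON-like Python value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for position, child in enumerate(value):
            yield from iter_paths(child, prefix + (position,))
    else:
        yield prefix + (value,)


def add_document(root, doc_id, document):
    """Merge one document tree into the index, tagging each visited node."""
    for path in iter_paths(document):
        node = root
        for label in path:
            node = node.children.setdefault(label, IndexNode())
            node.doc_ids.add(doc_id)


def lookup(root, path):
    """Return the ids of documents that contain the given label path."""
    node = root
    for label in path:
        node = node.children.get(label)
        if node is None:
            return set()
    return set(node.doc_ids)


index = IndexNode()
add_document(index, 1, {"headquarters": "Belgium", "exports": [{"city": "Moscow"}]})
add_document(index, 2, {"headquarters": "Italy", "exports": [{"city": "Athens"}]})

print(lookup(index, ("headquarters",)))
print(lookup(index, ("exports", 0, "city", "Athens")))
```

Paths shared by several documents collapse into a single branch of the index tree, so an equality lookup is a walk from the root followed by reading the id set at the final node.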
WHEN TO USE DOCUMENTDB
Consider Azure DocumentDB when you need:
To build new web and mobile cloud-based applications
Rapid development and high-scalability requirements
Query and processing of user and device generated data
To run a document store in virtual machines
A managed service model
QUESTIONS?