
A Novel methodology for handling Document Level Security in Search Based Applications

Date post: 11-May-2015
Upload: lucenerevolution
Description:
Presented by Rajani Maski, Senior Software Engineer, Happiest Minds Technologies. An important problem with document search in any content management system (CMS) is the handling of permission-based search requests for each user. In this session, we present an algorithm and framework that allows the search engine to plainly index both public and privileged documents without any early-binding overhead, enforcing document-level security policies only at search time. With our late-binding approach for ACLs (access control lists) and some custom components, we have achieved a reduction in search-time overhead. We will also discuss the order of complexity and execution time for the search overhead.
Transcript
Page 1: A Novel methodology for handling Document Level Security in Search Based Applications

Rajani Maski - Senior Software Engineer

DOCUMENT LEVEL SECURITY IN SEARCH BASED APPLICATIONS

Page 2: A Novel methodology for handling Document Level Security in Search Based Applications

Agenda

• Introduction to Search Based Applications

• Requirement Analysis of Document Level Security

• Access Control Lists

• Multiple Solutions

• Summary

Page 3: A Novel methodology for handling Document Level Security in Search Based Applications

Search Based Applications

Search based applications are software applications in which a search engine platform is used as the core infrastructure for information access and reporting.

E-commerce web applications and content management systems are examples of search based applications.

Page 4: A Novel methodology for handling Document Level Security in Search Based Applications

Overview of Search Based System

Authentication

• The user is authenticated before being given access to the application.

Application

• Presents a full-fledged user interface.

• Performs user operations such as uploading documents, sending emails, and searching.

Unified Data Layer

• Search server

• Indexes content across the sources.

• Retrieves data at very high speed.

Data Storage

• Volumes of data from different repositories

[Diagram: User Authentication System, Search Based Application Server, Unified Data Layer, and the data sources: Archives, Documents, Emails, File Server]

Page 5: A Novel methodology for handling Document Level Security in Search Based Applications

So Far, So Good!

What’s the problem?

Page 6: A Novel methodology for handling Document Level Security in Search Based Applications

Common Access to Unified Data Layer

[Diagram: User Authentication System, Search Based Application, Unified Data Layer, and the data sources: Archives, Documents, Emails, File Servers]

How is this a threat?

Page 7: A Novel methodology for handling Document Level Security in Search Based Applications

Consider a Sample Use Case

User A:

• Logs in to the application.

• Performs a search with keywords such as ‘Pay Slips’, ‘Personal’ or ‘appraisal’.

Sample results are shown for “appraisal”.

Page 8: A Novel methodology for handling Document Level Security in Search Based Applications

Unauthorized Results

Search Results

Page 9: A Novel methodology for handling Document Level Security in Search Based Applications

Observations

Relevant search results: [Correct]

- User A received search results relevant to the query, such as exact matches, “more like this” matches, and synonym matches.

Unauthorized search results: [Wrong]

- Some of the retrieved results were documents that User A was not authorized to view.

Threats:

• Exposure of other users’ confidential documents

• Access to unauthorized information

How are we doing with this?

Page 10: A Novel methodology for handling Document Level Security in Search Based Applications

Problem Definition

• To develop a search platform where every user can access only the documents he/she is authorized to view.

• To ensure that confidential uploaded data is not globally searchable unless it is intended to be globally accessible.

How can we achieve this?

Page 11: A Novel methodology for handling Document Level Security in Search Based Applications

Solution

Maintain an Access Control List mapped to each document object.

Access Control List?

Page 12: A Novel methodology for handling Document Level Security in Search Based Applications

Access Control List

• Access controls are security features that control how users [subjects] and documents [objects] communicate and interact with one another.

• Subject: An active entity [User] that requests access to an object [Document].

• Object: A passive entity [Document] that contains information.

[Diagram: Subject and Object (Document) linked by Interaction]

Page 13: A Novel methodology for handling Document Level Security in Search Based Applications

Data Model

Let’s first understand the data model of the search engine.

How are documents stored in the search engine?

Document-oriented approach.

Relational rows:

3424  Kiwi  reds    340
5612  Reh   Mo’s    664
1167  Alec  Miller  570

1167  1  NY
1167  2  NJ
1167  3  CA

Document:

Alec_1167 {_id:"1167", Name:"Ale C", Agent:"Miller", Place:"NY, NJ, CA", Units:570}
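The contrast between relational rows and a single document object can be sketched in a few lines of Python. This is only an illustration: the slide does not name the table columns, so the field names below are borrowed from the document example, and the split of each row into name and agent is an assumption.

```python
from collections import defaultdict

# Row values are taken from the slide; the column meanings are assumed.
accounts = [
    (3424, "Kiwi", "reds", 340),
    (5612, "Reh", "Mo's", 664),
    (1167, "Alec", "Miller", 570),
]
places = [(1167, 2, "NJ"), (1167, 3, "CA"), (1167, 1, "NY")]


def to_documents(accounts, places):
    """Join the two row sets into one flat document per account id."""
    places_by_id = defaultdict(list)
    # Sorting orders each account's places by their position column.
    for account_id, _position, place in sorted(places):
        places_by_id[account_id].append(place)
    return {
        str(account_id): {
            "_id": str(account_id),
            "Name": name,
            "Agent": agent,
            "Place": places_by_id[account_id],  # multi-valued field
            "Units": units,
        }
        for account_id, name, agent, units in accounts
    }


docs = to_documents(accounts, places)
print(docs["1167"])
```

The join that a relational store performs at query time is done once here, before indexing, which is why the search engine can treat each document as self-contained.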

Page 14: A Novel methodology for handling Document Level Security in Search Based Applications

Indexing and Storing Document Object

• User A uploads a document into the system.

• Metadata and text are extracted.

• The document is converted to a flat structure.

• It is passed to the search engine as input.

[Diagram: Document -> Metadata Extract -> Search Engine -> Document Saved]

Page 15: A Novel methodology for handling Document Level Security in Search Based Applications

• We missed capturing something!

• What did we miss?

– User information for each document!

• Who uploaded the document?

• With whom did the user share it?

• How do we maintain this information?

– An access control list for each document object.

[Diagram: Document -> Metadata Extract -> Search Engine -> Document Saved]

Page 16: A Novel methodology for handling Document Level Security in Search Based Applications

Conventional Solution

• Access control lists for each user.

• At the time of search:

– Retrieve the search results,

– Check each document against the user’s authorization, and

– Finally return the results.

[Diagram: Search Engine -> Security Filter on Each Document -> Return Results to User]
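The conventional flow (retrieve, check each hit, return) can be sketched as follows. The ACL store is modeled as a plain dictionary and the function name is hypothetical; a real system would look authorization up in whatever store it uses.

```python
def search_with_post_check(matches, user, acl):
    """Return only the matched documents the user is allowed to see.

    `acl` maps a document id to the set of users allowed to view it.
    Documents with no ACL entry are treated as not viewable.
    """
    return [doc_id for doc_id in matches if user in acl.get(doc_id, set())]


acl = {"d1": {"userA"}, "d2": {"userB"}, "d3": {"userA", "userB"}}
matches = ["d1", "d2", "d3", "d4"]  # ids returned by the search engine
visible = search_with_post_check(matches, "userA", acl)
print(visible)
```

The cost of this check grows with the number of matched documents, which is the search-time overhead the later slides try to reduce.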

Page 17: A Novel methodology for handling Document Level Security in Search Based Applications

Multiple Solutions

Page 18: A Novel methodology for handling Document Level Security in Search Based Applications

Access Control Models

Solutions depend on the access control model we choose.

Two important types of access control models:

1. Non-Discretionary Access Control (Role Based)

2. Discretionary Access Control (DAC)

Page 19: A Novel methodology for handling Document Level Security in Search Based Applications

1. Non-Discretionary (Role Based)

Definition:

• Non-Discretionary ACL uses an administered set of rules to determine how users and documents interact.

• It is referred to as non-discretionary because assigning a user to a role is unavoidable.

[Diagram labels: Sales, Super User, Manager; Sales Documents, Marketing Documents, Engineering Documents, Admin Documents]

Page 20: A Novel methodology for handling Document Level Security in Search Based Applications

Solution For Role Based ACL - Type 1

For a system that has

• Roles defined at design time and a static ACL set on each document,

• We choose “Early Binding with ACL bound to Document Objects”.

In such systems,

• Document objects include a multi-valued Role-id field containing the list of role-Ids that have access to the document.

Documents with ACLs at index time:

Document 1 role-Ids: [“1”, “2”, “3”]

Document 2 role-Ids: [“2”, “3”]
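As a rough sketch of the index-time step, the snippet below attaches a multi-valued role field to each document before it is sent for indexing. The field name `role_ids`, the helper name, and the other document fields are illustrative assumptions; only the role values come from the slide.

```python
import json


def with_roles(doc, role_ids):
    """Return a copy of the document with a multi-valued role field added."""
    flat = dict(doc)
    flat["role_ids"] = [str(role_id) for role_id in role_ids]
    return flat


batch = [
    with_roles({"id": "doc1", "title": "Document 1"}, [1, 2, 3]),
    with_roles({"id": "doc2", "title": "Document 2"}, [2, 3]),
]
payload = json.dumps(batch)  # body that would be posted to the indexer
print(payload)
```

Because the ACL travels with the document, any change to a document's roles means re-indexing that document, which is why this variant suits static ACLs.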

Page 21: A Novel methodology for handling Document Level Security in Search Based Applications

Continued…

At the time of search,

• The user’s search query should be appended with the user’s role-Id.

• Solr’s Filter Query feature and its caching techniques give the most efficient solution for such ACL techniques. This approach is called the ‘Early Binding’ approach.

[Diagram: Early Binding. Query Request with User Role-Id -> SolrJ Client -> Query Response]
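A minimal sketch of appending the role restriction as a Solr filter query (`fq`) is shown below. It only builds the request parameters; the field name `role_ids` and the function name are assumptions, and the application is assumed to take the role ids from the authenticated session rather than from user input.

```python
from urllib.parse import parse_qs, urlencode


def build_query(user_query, role_ids, acl_field="role_ids"):
    """Build Solr request parameters with the user's roles as a filter query."""
    if not role_ids:
        # Never fall back to an unfiltered search.
        raise ValueError("user has no roles")
    roles = " OR ".join(str(role_id) for role_id in role_ids)
    return urlencode({"q": user_query, "fq": f"{acl_field}:({roles})"})


params = build_query("appraisal", [2, 3])
print(parse_qs(params)["fq"])
```

Keeping the restriction in `fq` rather than in `q` lets Solr cache the filter separately from the main query, so users who share the same roles can reuse the same cached filter.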

Page 22: A Novel methodology for handling Document Level Security in Search Based Applications

Solution For Role Based ACL - Type 2

For systems that have

• Roles that change often, data is normalized by segregating access control information into separate tables.

• This approach is called ‘Early Binding with Externalized ACL’.

In such systems:

• Role-Ids are not attached to the document object.

• Instead, they are stored in separate tables with a foreign-key relation.

• Pseudo joins are used at the time of search.

Document1: D1

Doc ID    Role-Ids
D1        1, 2, 3, N
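One way to express such a pseudo join in Solr is the join query parser, used here inside a filter query. The sketch below only builds the filter string. The core name `acl` and the field names `doc_id`, `id` and `role_id` are assumptions about how the externalized ACL might be laid out, not details from the talk.

```python
def acl_join_filter(role_ids, from_field="doc_id", to_field="id",
                    acl_core="acl", role_field="role_id"):
    """Build a join filter selecting documents whose ACL rows match the roles."""
    if not role_ids:
        raise ValueError("user has no roles")
    roles = " OR ".join(str(role_id) for role_id in role_ids)
    return (
        f"{{!join from={from_field} to={to_field} fromIndex={acl_core}}}"
        f"{role_field}:({roles})"
    )


fq = acl_join_filter([1, 2])
print(fq)
```

With this layout a role change touches only the small ACL records, and the main documents do not need to be re-indexed.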

Page 23: A Novel methodology for handling Document Level Security in Search Based Applications

2. Discretionary Access Control

Definition:

• Discretionary: the document owner has the authority to control access to the document.

• A system that enables the document owner to specify the set of users with access to a set of documents.

[Diagram: Owner specifies the users/groups who can access the Object]

Page 24: A Novel methodology for handling Document Level Security in Search Based Applications

Solution for Discretionary ACL - Type 1

For a system that has

• Frequent changes in the ACL, and

• An ACL defined for each user and document pair,

• We choose ‘Late Binding Approach with Externalized ACL’.

In such systems,

• The ACL is a 2D matrix with users along its rows and documents along its columns.

Users    Doc1  Doc2  Doc N
User A   1     1     1
User B   0     1     1
User M

Encoded values: 0 = no access, 1 = access. M: number of users, N: number of documents.
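The matrix can be derived from a set of (user, document) grants, as in this small sketch. The grant set reproduces the two filled rows of the table; the function and variable names are illustrative.

```python
def build_acl_matrix(users, docs, grants):
    """Return one row per user and one column per document (1 = access)."""
    return [[1 if (user, doc) in grants else 0 for doc in docs] for user in users]


users = ["User A", "User B"]
docs = ["Doc1", "Doc2", "Doc N"]
grants = {
    ("User A", "Doc1"), ("User A", "Doc2"), ("User A", "Doc N"),
    ("User B", "Doc2"), ("User B", "Doc N"),
}
matrix = build_acl_matrix(users, docs, grants)
print(matrix)
```

Because the matrix lives outside the index, granting or revoking access changes one cell and requires no re-indexing.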

Page 25: A Novel methodology for handling Document Level Security in Search Based Applications

Continued…

For implementation, the ACL matrix can be represented as an array of bits.

This compact representation improves search efficiency and reduces memory overhead.

Users   Doc1  Doc2  Doc N
UserA   1     1     1
UserB   0     1     1

Bit arrays:

[1] 111
[2] 011

Page 26: A Novel methodology for handling Document Level Security in Search Based Applications

Example

Consider:

• The maximum number of documents in the search system is 5, with document ids {1, 2, 3, 4, 5}.

• The maximum number of users is 2, with ids {1, 2}.

• User 1 has access to documents {1, 2, 3}.

• User 2 has access to documents {1, 2, 3, 4, 5}.

• ACL matrix and array representation:

User  1  2  3  4  5
1     1  1  1  0  0
2     1  1  1  1  1

[1] 11100
[2] 11111
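The worked example can be reproduced with Python integers standing in for the bit arrays. This is a sketch of the encoding only; placing document id 1 in the leftmost bit follows the way the slide writes the arrays, and the helper names are hypothetical.

```python
NUM_DOCS = 5


def row_to_bits(row):
    """Pack a 0/1 row into an int; document id 1 becomes the leftmost bit."""
    bits = 0
    for flag in row:
        bits = (bits << 1) | flag
    return bits


def has_access(bits, doc_id, num_docs=NUM_DOCS):
    """Test the bit that belongs to the given 1-based document id."""
    return ((bits >> (num_docs - doc_id)) & 1) == 1


acl = {
    1: row_to_bits([1, 1, 1, 0, 0]),  # user 1: documents {1, 2, 3}
    2: row_to_bits([1, 1, 1, 1, 1]),  # user 2: documents {1, 2, 3, 4, 5}
}
print(format(acl[1], "05b"), format(acl[2], "05b"))
print(has_access(acl[1], 4), has_access(acl[2], 4))
```

Each access check is a shift and a mask, and each user's row costs one bit per document.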

Page 27: A Novel methodology for handling Document Level Security in Search Based Applications

Solr Implementation

Solution 1

• Solr has a PostFilter interface that can be implemented to develop a custom plugin.

• The interface supplies a collector with a method called ‘collect()’.

• collect() is called for each document matched by the user’s search query.

– For each document, get the document-Id from the Field Cache and apply the ACL using the bit array.

• Code snippets: https://gist.github.com/rajanim/7197154

Bit array: 1 1 1 0 0
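The control flow of such a post filter can be simulated in plain Python. The class below is a stand-in, not the Solr API: a dictionary plays the role of the Field Cache lookup, a list plays the role of the downstream collector, and all names are hypothetical. The author's actual plugin code is in the gist linked above.

```python
class AclCollector:
    """Forwards a matched document only if the user's ACL bit is set."""

    def __init__(self, user_bits, doc_id_lookup, num_docs, delegate):
        self.user_bits = user_bits          # the user's row of the ACL matrix
        self.doc_id_lookup = doc_id_lookup  # internal doc number -> document id
        self.num_docs = num_docs
        self.delegate = delegate            # receives the permitted documents

    def collect(self, internal_doc):
        doc_id = self.doc_id_lookup[internal_doc]
        if (self.user_bits >> (self.num_docs - doc_id)) & 1:
            self.delegate.append(doc_id)


permitted = []
collector = AclCollector(0b11100, {0: 1, 1: 4, 2: 3}, 5, permitted)
for internal_doc in (0, 1, 2):  # documents matched by the query
    collector.collect(internal_doc)
print(permitted)
```

Because the filter runs after the main query, it is evaluated only for documents that already matched, not for the whole index.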

Page 28: A Novel methodology for handling Document Level Security in Search Based Applications

Other Implementation Solution

Solution 2

• Using BitSet utilities

• Get the bitset of documents matched by the search query from the search engine.

• Get the user’s ACL bitset instance.

• Obtain the intersection of the two bitsets [intersect(bitset other)].

[Diagram: 1 1 1 0 0 AND 1 1 1 0 0 gives 1 1 1 0 0]
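The intersection step amounts to a bitwise AND. In this sketch Python integers stand in for the bitset objects, with document id 1 in the leftmost bit; the matched set is made up for illustration and differs from the diagram so that the effect of the intersection is visible.

```python
def intersect(matched_bits, user_acl_bits):
    """Keep only documents that both matched the query and are permitted."""
    return matched_bits & user_acl_bits


def bits_to_doc_ids(bits, num_docs):
    """List the 1-based document ids whose bit is set."""
    return [
        doc_id
        for doc_id in range(1, num_docs + 1)
        if (bits >> (num_docs - doc_id)) & 1
    ]


matched = 0b01110   # query matched documents 2, 3, 4 (illustrative)
user_acl = 0b11100  # user may see documents 1, 2, 3
allowed = bits_to_doc_ids(intersect(matched, user_acl), 5)
print(allowed)
```

Unlike the per-document check in Solution 1, this approach filters the whole result set in one word-level operation.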

Page 29: A Novel methodology for handling Document Level Security in Search Based Applications

Solution for Discretionary ACL - Type 2

For

• Discretionary ACL systems that have a static ACL,

• We choose “Early Binding with ACL bound to Document Objects”.

In such systems,

• Document objects include a multi-valued user-id field that contains the list of user-ids with access to the document.

• The user-id field has to be indexed.

Page 30: A Novel methodology for handling Document Level Security in Search Based Applications

Continued…

• This solution requires the ACL and document data to be de-normalized into a flat structure.

[Diagram: Index Time: Parse Document -> Add List of Users Who Have Access. Search Time: Query Request With User ID -> SolrJ Client -> Query Response]

Page 31: A Novel methodology for handling Document Level Security in Search Based Applications

Summary

Page 32: A Novel methodology for handling Document Level Security in Search Based Applications

Summary

• Discretionary ACL with a late-binding solution is a complex model and requires extensive verification.

• Leverage Solr’s smart caching capability.

• Since an ACL always adds overhead, it has to be optimized to keep the added delay to a minimum.

Page 33: A Novel methodology for handling Document Level Security in Search Based Applications

References:

• searchhub.org/2012/02/22/custom-security-filtering-in-solr/

• Secure Search in Enterprise Webs: Tradeoffs in Efficient Implementation for Document Level Security, by Peter Bailey, David Hawking, Brett Matson

• All in One Book (Shon Harris, 2005)

• http://www.searchtechnologies.com/enterprise-search-document-level-security.html

• http://alvinalexander.com/java/jwarehouse/lucene/src/test/org/apache/lucene/search/TestFilteredQuery.java.shtml

• https://github.com/Zvents/score_stats_component/blob/master/src/main/java/com/zvents/solr/components/ScoreStatsPostFilter.java

Page 34: A Novel methodology for handling Document Level Security in Search Based Applications

Thank You

