

Object Replication: Design and Implementation for the Global Enterprise
Jamal Shakra – Sr. Product Manager
Yasser Muwakki – Sr. Solution Architect

Developer Conference
San Ramon, CA

October 2004


Agenda
– Overview
– Implementing Replication Solutions
– 5.3 Replication Enhancements
– Open Discussion


Replication Drivers
– Share common content across dispersed geographic locations (e.g. branch offices)
  – WAN performance limitations
  – WAN low reliability
– Control the scope of content being shared for business or security purposes


Common Replication Models
– Distributed content
– Object replication


Object Replication

Object Replication is the process of duplicating objects and content from one Repository (the source) to another (the target). Each complete replication run is scheduled and configured using a Replication job, so different objects and content can be replicated between different source and target Repositories by scheduling individual Replication jobs. A Replication job consists of 3 major stages:

– Generate Dump file at Source Repository
– Transfer Dump file to Target Repository
– Load Dump file at Target Repository

[Diagram: dump file flows from the Source Repository to the Target Repository]
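To make the flow concrete, here is a minimal sketch of those three stages as an orchestration script. The stub functions (dump_objects, transfer_file, load_dump) are hypothetical stand-ins for illustration, not the actual Documentum job internals:

```python
# Minimal sketch of the three stages of a Replication job
# (dump, transfer, load). All helpers are hypothetical stubs,
# not Documentum DMCL calls.

import shutil

def dump_objects(source_repo: str, folder: str) -> str:
    """Stage 1: dump objects under `folder` at the source (stub)."""
    path = f"/tmp/{source_repo}_dump.bin"
    open(path, "wb").close()  # placeholder dump file
    return path

def transfer_file(dump_file: str, target_repo: str) -> str:
    """Stage 2: move the dump file to the target site (stub).
    In practice this may be automatic (network) or manual (media)."""
    dest = f"/tmp/{target_repo}_inbound_dump.bin"
    shutil.copy(dump_file, dest)
    return dest

def load_dump(target_repo: str, dump_file: str) -> None:
    """Stage 3: filter and load the dump at the target (stub)."""
    print(f"loading {dump_file} into {target_repo}")

def run_replication_job(source: str, target: str, folder: str = "/ROGIR") -> None:
    load_dump(target, transfer_file(dump_objects(source, folder), target))

run_replication_job("WashingtonDC", "Post01")
```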


Implementing Replication Solutions
– Business Analysis
– Technical Design and Implementation Approach
– Final Solution
– Conclusions and Lessons Learned


Business Analysis

– Gather Customer Functional and Technical Requirements
– Analyze Requirements
– Choose An Agreed-Upon Approach
– Identify Replication Requirements
– Understand Replication Limitations
– Q&A


Customer Functional Requirements

– Read-Only Global Intelligence Repository (ROGIR) in Washington DC (DC)
– ROGIR is accessible to hundreds of Posts around the world
– Posts submit content to ROGIR
– Posts can quickly metadata/full-text search and retrieve content from ROGIR
– ROGIR contains read-only, simple, multi-lingual MS Office and PDF docs
– Posts can open and shut down every few months

1. ROGIR (Read-Only Global Intelligence Repository): A collection of Read Only, Finalized documents that is accessible to all Posts. Any Post can commit a document to ROGIR to be shared with all Posts. A document committed to ROGIR at one Post may, however, not be available to other Posts immediately.

2. ROGIR provides immediate query results, content that is no more than 1 month out of date, and both free-form text and metadata searching.

3. If a Post does not have the necessary infrastructure to meet the access and freshness requirements for direct document searching and retrieval of ROGIR or their Administrative Documents stored at the Washington DC Post, then these repositories shall be replicated at the Post and synchronized on a regular basis.

4. The System shall provide a mechanism to allow System Administrators to modify document content and profile data maintained by ROGIR.


Customer Technical Requirements

– ROGIR’s size is 1 million docs in 1 year - 100GB
– Total site contributions = 5000 docs/month, or 6% yearly growth
– Total global user base is 10,000 users, 1,000 concurrent
– Performance must be reasonably fast
– All data must have a Backup and Recovery Strategy

1. The potential size of ROGIR over 3 years is 1,000,000 documents. The number of folders is 200.

2. The average number of documents being added or changed per month in Washington DC would be around 5000. This includes contributions by Washington DC users and by all other 50 Post Repositories.


Analyze Requirements
– Backup/Recovery, Capacity Planning
– Network Bottlenecks
– System Administration/Support
– Demographics – 50 Hub Posts spanning 5 Continents
– Content Simplicity
– Document Types, Users, Groups, Security, Folders, Data Dictionary
– Authentication
– Internationalization/Localization
– Post Life Span
– Overall System Performance

1. Each Post in the African, Asian, South American, Middle East, and East European regions will be an independent docbase with limited or no network connectivity to Washington DC. This is potentially 51 Repositories.

2. The ROGIR folder structure is /ROGIR/<DocType>/document and can only be modified by Washington DC.

3. Washington DC will maintain the Folders.

4. Network connectivity between Washington DC and the Posts is not stable. The bandwidth ranges from 32 Kb/s to partial T1, with latencies over 200 ms, using dial-up, satellite, ISDN, etc.

5. All ROGIR objects that are replicas will be READ-ONLY.

6. ROGIR will not have any documents that are related to any other documents outside of the /ROGIR Cabinet. Virtual Documents will not be in ROGIR, and there will not be any annotations or any other document relations.

7. During a closure of a Post, the Post repository will be destroyed. The life term of a Post ranges from a few weeks to a few months. Washington DC Repository is permanent.


Which Technical Approach?

– Single Repository
– Single Repository – Distributed Content
– Single Repository – Distributed Web Publishing with email submissions
– Multi-Repository – Object Replication with Federation
– Multi-Repository – Object Replication without Federation


Identify Replication Requirements

– What data must be replicated?
– How much data needs to be replicated?
– Where are the replication source and target Repositories?
– How will object types, users, groups, ACLs, and the Data Dictionary be coordinated across all Repositories?
– What operations can be done on the replicas?
– What are the folder structures?
– Who can modify and coordinate the folder structures?
– Do we need to do offline or online replication?


Identify Replication Requirements

– Can we implement fast replication?
– Should we use non-federated or federated replication mode?
– Who owns the replicated data?
– What permissions will be applied to replicated data?
– Do we have any internationalization issues?
– What resources do I need, where, and when?
– How many replication jobs will we need?
– When can we schedule the jobs?
– Where will the jobs run? (push, pull, or mediator job)


Understand Replication Limitations

– Replication occurs in 3 phases; phases 1 & 3 require a connection to Source and Target
– A Replication job can only replicate docs to 1 target from 1 source
– Replica objects inherit the r_modify_date of their source
– Replication jobs cannot run concurrently when job1's source folder = job2's target folder
– Poor network bandwidth/latency = very poor performance
– Replicas cannot be deleted manually using DCTM apps
– Lifecycles are replicated

1. OOTB Replication Job needs to establish a connection to both the Source and Target Repositories to allow replication to succeed even during offline or manual replication. Posts that have no network connectivity to Washington DC will not be able to participate in OOTB Documentum Replication Services.

2. A replication job can only replicate documents to one target Repository from one source Repository.

3. Replica objects “inherit” the modification date of their corresponding source objects. Hence, if the source was modified on April 1 and the replication job ran on April 10, the modification date on the replica still says April 1 even though the replica was created on April 10.

4. Replication occurs in 3 phases. For each phase, the job requires communication with both the Source and Target Repositories, except in phase 2 if manual transfer is performed.

1. Phase 1 – On the source, replication performs a cleanup, dump, and delete synchronization.
2. Phase 2 – Transferring data to the target Repository (could be manual or automatic).
3. Phase 3 – On the target, replication performs a filter of the dump, delete synchronization, a load of the filtered output, and cleanup.

5. If a folder is a target folder in one replication job and a source folder in another replication job, never schedule the two jobs to run concurrently or overlapping. However, if the jobs share the same source folder but have different target folders, the jobs can run in the same time window.

6. Replication performance is affected by network latency (combination of network traffic and network speed). For adequate performance, a ping between any two participating sites should take 200 milliseconds or less. See Appendix B for more details on performance.

7. Fast Incremental Replication will only dump the objects and any associated ACL, filestore, cabinet, content, format, folder trees, Lifecycles, Lifecycle Actions and Procedures that are new or have been changed since the last replication job.

8. If an object is replicated, the dump process will also dump all the folders the object is linked to, including the folder trees, even if the folders are replicas. If ROGIR has a complex folder structure, many folders may be replicated because of which objects are modified and where those objects are linked. However, if the folders were dumped on a previous run of the replication job, they will not be dumped again if there were no changes to the folders.

9. All the following operations can be executed on replicas with no connectivity to the source Repository: copy, view, link to a folder or cabinet, delete links, move the object’s storage location, change the object’s indexing status or conduct full-text searches on the object, replace the replica’s ACL (non-federated mode), create renditions, annotate, and include the object in a virtual document. However, security will only allow users to search and view ROGIR documents, copy them into their SWP, or export them to their local disks.

10. Replicas in ROGIR cannot be deleted using the Documentum Client applications. Only deleting the source and running the replication job will delete a replica. If the replica is linked to other folders, the links can be deleted, but security will prevent users from deleting ROGIR documents or folders.


Understand Replication Limitations

– Do not move a replicated folder
– Cannot replicate a specific version
– All repositories must use ASCII where appropriate
– Disk capacity for dump, filtered dump, and load
– Do not change job settings during job runtime
– Multi-Content Server repositories – replication must use only 1 Content Server for all phases
– Repository recovery from backup can cause loss of data integrity

1. Although it is possible, do not move a replicated folder or document in a target Repository to another location. It is acceptable to link a replicated folder to another, local folder or cabinet; however, do not remove the primary link to the target folder. If the document or folder needs to be moved, move it in the source Repository only. In any case, security will prevent users from moving ROGIR documents or folders.

2. Subfolders of the ROGIR source parent folder must be named identically in both Repositories participating in bi-directional replication. If subfolders are named consistently, all documents, whether replicas or not, will appear in one folder in both the source and target Repositories. Problems may emerge if Posts create folders in ROGIR that are misspelled or spelled differently than the same folders in other ROGIR Posts or in Washington DC’s ROGIR, so a global ROGIR folder structure must be managed. However, Posts will never create their own folders in their local ROGIR, and security will prevent users from creating any new ROGIR folders.

3. The replication process copies or replicates changes to the entire version tree of the source object. Hence, if a ROGIR document that was replicated earlier is linked to a SWP document in the source Repository, any new versions applied to the SWP document will be replicated to the target ROGIR.

4. All Repositories must use ASCII where appropriate (see the Internationalization document for more details):

1. Repository names, Repository owner user name/password, Installation owner user name/password, object type names and attribute names, registered table names and column names, Documentum Server installation directory, location object names, the value of the file_system_path attribute of a location object, mount point object names, format names, format DOS extensions, all filestore names, etc.

2. If the participating Repositories have different server_os_codepage values (Cyrillic and Latin-1), the names of the source and target replication folders must consist of only ASCII characters.

5. Disk capacity: both the source and the target sites for each replication job must have enough disk space to accommodate the temporary dump files created by the replication jobs. Each target Repository needs twice the dump file size for the filter file, and the target site must have enough disk space to accommodate the replicated content.

6. Security with firewall and encryption technology can work with replication. Since the replication job uses the Documentum DMCL API to perform all replication operations, Documentum Trusted Content Services can be enabled to support SSL encryption of all data traffic between the job and the Source and Target Repositories. The firewall must open the Content Server TCP port on both source and target Repositories for 2-way communication.


Q&A - Business Analysis

– Gather Customer Functional and Technical Requirements
– Analyze Requirements
– Choose An Agreed-Upon Approach
– Identify Replication Requirements
– Understand Replication Limitations

QUESTIONS?


Technical Design and Implementation Approach

– Design Approach
– Implementation Options
– Circular Replication
– Sub Circular Replication
– Star Replication
– Star Replication Nice-to-Have Optimizations
– Q&A


Design Approach

– Washington DC will be the main ROGIR Repository
– 50 repositories (1 for each hub post)
– No Federation
– DC maintains object types, folders, data dictionary, ACLs
– Only DC can modify the ROGIR folder structure
– All ROGIR objects will be assigned a READ-only ACL
– Online/offline FAST non-federated incremental replication will be used between repositories
– ROGIR data – single version, simple, non-related docs
– When a Post closes, its Repository will be destroyed

1. Washington DC will maintain the Folders. Normally, the folder structure is created initially and replicated to the Posts on the initial full replication. If new documents require a new folder, Washington DC must be notified to create the folder in ROGIR.

2. When a new Post is established, a Full replication will occur on the Washington DC internal network with the Post Repository. Then the Post hardware with the replicated ROGIR is shipped to the Post site.

3. During a closure of a Post, the Post repository will be destroyed. The life term of a Post ranges from a few weeks to a few months. Washington DC Repository is permanent.

4. All ROGIR objects that are replicas will be READ-ONLY. If you want only to create read-only replicas in the target Repository, then cross-projection (each participating site must project to the DocBroker at each of the other participating sites) is not required.

5. Deleting or demoting a SWP document version that has a “link” to its ROGIR counterpart will not delete the ROGIR document. A ROGIR document can only be replaced by subsequent promoted versions of the SWP document. Hence, the only way to remove invalid ROGIR docs is a manual process by the System Administrator.

6. ROGIR will not have any documents that are related to any other documents outside of the /ROGIR Cabinet. Virtual Documents will not be in ROGIR and there will not be any annotations or any other document relations.

7. A Federation will not be implemented in this design because of the limitations of network connectivity and complexity of bi-directional and circular replication.

8. Online and offline (manual) FAST Incremental replication will be implemented between all Repositories.

9. Remap security: security and storage will be re-mapped on the replicas in the target Repository. The owner_name, acl_name, acl_domain, and a_storage_type attributes are considered local attributes and will be remapped to default values provided in the replication job’s definition – all replicas created by any job will have the same ACL and are stored in the same storage area in the target Repository.
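As a rough illustration of this remapping, the sketch below shows what a job’s local-attribute defaults might look like. The dictionary keys mirror the attribute names above, but the structure, values, and helper function are hypothetical, not actual dm_job fields:

```python
# Hypothetical sketch of a replication job's remap defaults;
# key names mirror the attributes above, values are assumed.
replication_job_defaults = {
    "owner_name":     "rogir_owner",      # assumed standard ROGIR owner
    "acl_name":       "rogir_read_only",  # assumed global read-only ACL
    "acl_domain":     "rogir_admin",
    "a_storage_type": "filestore_01",
}

def remap_local_attrs(replica: dict, defaults: dict) -> dict:
    """Overwrite the replica's local attributes with the job's defaults."""
    replica.update(defaults)
    return replica

print(remap_local_attrs({"object_name": "doc1", "owner_name": "post_user"},
                        replication_job_defaults))
```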

10. A standard script will be executed on each Repository to provide default ROGIR owner name(s), ROGIR groups, ROGIR ACLs, a standardized set of ROGIR object types, and standard set of ROGIR controlled vocabularies that construct ROGIR folder structure so that these default object types, users and ACLs, and folders will be assigned to the replicas during a replication process. Security will be standardized for Global Read access to ROGIR. The script is executed once per installation.


Implementation Options

Circular


CIRCULAR REPLICATION

– /ROGIR is replicated from 1 Repository to the next
– Replicas can be replicated
– Only need 1 job for each Repository – 51 jobs
– Each job runs in a different time window from its neighbors
– If 1 link is broken, no updates reach the remaining repositories
– Repository recovery from backup can have bad ripple effects
– Post/Repository shutdown can have bad ripple effects
– Cannot work because of unreliable network links, hardware infrastructure, and support staff

Replicas can be replicated; this is how you do a least-cost-replication scheme. Replicas will not be replicated BACK to the original source Repository. Replicate from Repository 1 to Repository 2, 2 to 3, 3 to 4, 4 to 5, … 49 to 50, and 50 to 1.

Circular replication would not work because the links may not exist between each Post, and the customer does not want to depend on Posts to support the replication chain; they need the control to be initiated and monitored from Washington DC. If a link is broken in the chain, Repository 1 will never be refreshed with new changes from the rest. And if Repository 10 crashes, how will you recover and re-synch all the other Repositories?

This approach results in the least number of jobs for ROGIR replication but will not satisfy subscription requirements, since the entire ROGIR is replicated. However, if we implement circular replication to the Tier-1 Regional Repositories that have high bandwidth (at least T1), and then have all the near Repositories in each region use subscription and submission replication, we can still limit the number of jobs.


Implementation Options

[Diagram: Sub Circular topology – US (DC) hub with EU, AFRICA, and APAC regional rings]

Sub Circular


SUB CIRCULAR REPLICATION

– Repositories are grouped into Continental Hubs
– /ROGIR is bi-directionally replicated between each HUB and DC
– All the repositories in each HUB will use circular replication
– Each Repository in every HUB will inherit the same properties and problems from circular replication
– Cannot work because of unreliable network links, hardware infrastructure, and support staff within the Hubs


Implementation Options

STAR Replication


STAR REPLICATION

– /ROGIR is bi-directionally replicated between DC and every Post
– Need 50 Submission jobs and 50 Subscription jobs
– Submission jobs must run in their own time windows
– Repository recovery from backup can have bad ripple effects
– Post/Repository shutdown can have bad ripple effects
– Cannot work because of job scheduling challenges and Repository shutdown/recovery issues


Star Replication Nice-to-Have Optimizations

– Optimized Subscription – dump once, load to all 50 repositories – requires customization
– Job does not require a connection between Source and Target – requires customization
– External distributed scheduler which facilitates job management across Posts


Q&A Technical Design and Approach

– Design Approach
– Implementation Options
– Circular Replication
– Sub Circular Replication
– Star Replication
– Star Replication Nice-to-Have Optimizations

QUESTIONS?


Star Replication Final Solution

– Subscription Job: DC/ROGIR -> Px/ROGIR
– Submission Job: Px/ROGIR -> Px/Px -> DC/Px -> DC/ROGIR
– All jobs run from DC
– All jobs can run at the same time
– 100 total jobs for 50 Repository Posts

Solution: Washington DC/ROGIR -> Postx/ROGIR and Postx/Postx -> Washington DC/Postx, which would be a total of 120 jobs for 50 Field Office Repositories. The solution is fast on-line incremental replication using manual or online transfer. All jobs will run from the Washington DC Repository to keep administration centralized. Network accessibility to the Post is required during Phase 3 of the replication process. The only concern is the dump for submissions and the load for subscriptions, since they are not remote method calls in a Post session; the dump and load are initiated by a DMCL client call from the Washington DC Repository. This API call would take around 10 minutes for a dump or load operation of 1000 objects.

Subscription: /Washington DC/ROGIR -> /Postx/ROGIR. 50 jobs. For all jobs, the source is always the Washington DC /ROGIR Cabinet. Every Post can run this job at the same time since the target Cabinet is different (specific to a Post). Any Washington DC replicas that originated from the target Post Repository will not be replicated back to that Post Repository. We are creating a feature request and following up with PMO to allow and support a one-job solution that can execute one dump and load into many targets. Subscriptions should occur at least once a month.

Submission: /Postx/Postx -> /Washington DC/Postx. 50 jobs. Each Post will have a dedicated Cabinet, /Postx, in the Washington DC Repository. In the Postx Repository, new or modified documents get promoted by a copy to the local Postx/ROGIR (if a copy already exists, it gets replaced), and this copy gets LINKed to the Postx/Postx Cabinet. Every Post can run this job at the same time since the target is a different cabinet in the Washington DC Repository (specific to a Post). Since the job replicates only the /Postx/ cabinet (hidden from users), only Postx objects get replicated. If a Post local document gets deleted, all its links in /ROGIR and /Postx should get deleted as well. In this case, delete synchronization on Washington DC will remove the copy in Washington DC/Postx, which will automatically remove the copy in Washington DC/ROGIR. This will in turn cause the delete synchronization on the other Posts during the next subscription. Submissions should occur once a week.
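The resulting job inventory is easy to picture in code. This sketch merely enumerates the job definitions implied above (one subscription and one submission per Post, all run from Washington DC); the field names are illustrative, not Documentum job attributes:

```python
# Illustrative sketch of the star-topology job layout.
# Job-definition fields are hypothetical, for illustration only.

posts = [f"Post{i:02d}" for i in range(1, 51)]

jobs = []
for post in posts:
    # Subscription: DC/ROGIR -> Postx/ROGIR (ROGIR fan-out)
    jobs.append({"name": f"subscribe_{post}",
                 "source": ("WashingtonDC", "/ROGIR"),
                 "target": (post, "/ROGIR"),
                 "schedule": "monthly"})
    # Submission: Postx/Postx -> DC/Postx (per-Post inbox cabinet)
    jobs.append({"name": f"submit_{post}",
                 "source": (post, f"/{post}"),
                 "target": ("WashingtonDC", f"/{post}"),
                 "schedule": "weekly"})

print(len(jobs))  # 100 jobs for 50 Posts
```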


Star Replication Final Solution
– Each repository will have a working area
– Content is promoted to /ROGIR from the work area
– Promotion copies content into /ROGIR and the local /Px cabinet
– Post submission copies content from DC/Px to DC/ROGIR to keep ROGIR data local
– Because of the copies, all submitted content in DC/ROGIR is local, not replicas
– No Post shutdown issues
– Simpler recovery process

When the job completes, a post-replication process runs on the Washington DC Repository that copies the /Washington DC/Postx/ docs (the cabinet is hidden from users) into /Washington DC/ROGIR based on their attributes.

Advantages:

All jobs can run at the same time. However, for best resource balancing, jobs can be scheduled to run twice a month: 2 x 100 = 200 jobs a month, or about 7 jobs per day. If each job takes around 3 hours on a 28.8 Kb/s line replicating 33 MB, they can be scheduled throughout the day to reduce server overhead. The scheduling can be changed in the future; keep in mind it is best not to schedule jobs during the backup time window. This satisfies all the best practices.
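A quick sanity check of that scheduling arithmetic, as a sketch (the 10% protocol-overhead factor is taken from the bandwidth formula later in this deck):

```python
# Rough check of the scheduling arithmetic above.
jobs_per_month = 2 * 100          # each of the 100 jobs runs twice a month
print(jobs_per_month / 30)        # ~6.7, i.e. about 7 jobs per day

# Transfer time for 33 MB over a 28.8 Kb/s line, with ~10% protocol overhead
size_kb = 33 * 1024
hours = (size_kb * 8 * 1.10) / 28.8 / 3600
print(round(hours, 1))            # ~2.9 hours, consistent with "around 3 hours"
```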

Disadvantages: since there will be 100 replication jobs running throughout the month, the Washington DC server must be sized accordingly and the database tuned to accommodate the overhead. Copying the new replicas into the Washington DC /ROGIR cabinet during the concurrent subscription jobs should not cause problems, but this needs to be tested; some of those replicas will be skipped and picked up on the next run. Since the dump is a collection of read-only queries, newly added objects should not interfere.


Q&A Final Solution

QUESTIONS?


Conclusions and Lessons Learned
– Issues
– Job Performance Measurements
– Lessons Learned
– Q&A


Issues
– Network Limitations
– Job Scheduling
– Job Performance
– Washington DC Performance
– Restoring From Backup
– Repository Administration

An enterprise with a number of large Repositories at relatively autonomous sites will probably require one additional full-time Documentum Administrator per site to administer replication. This individual will need conventional Documentum skills (API, DQL, server environment, client environment, and scripting) as well as business knowledge in order to translate business replication requests into appropriate replication jobs. A Documentum Administrator is needed for:

– Adding groups and users to the appropriate Repositories
– Resolving problems when networks or Repositories are not available
– Monitoring the progress of replication jobs and resolving failures
– Coordinating object type definitions across Repositories

Multi-site, multi-Repository users, groups, and access permissions must be coordinated as a business function between Repositories. Object types, users, groups, and ACLs (dm_acl objects) are not replicated as part of the replication job.

If any of the object types are not identical on all Repositories, during replication the system will copy all information possible and ignore any information that cannot be replicated, so ensure that object types are defined identically in all Repositories. If you replicate an object into a Repository in which the object’s type does not exist, the operation creates the type in the target Repository. However, if the type already exists in both source and target Repositories, it should be defined identically in each.

A user who has access to a document in one Repository may not necessarily have access to the document’s replicas in other Repositories; replication between Repositories does not require user definitions and ACLs to be the same across the Repositories. In such cases, the replication job maps the ownership and ACLs of the replicated objects to users and ACLs in the target Repository.

Maintain a standard set of controlled vocabularies across all Repositories. All the controlled vocabularies are managed by Washington DC only. Options:

– Controlled vocabularies could be defined in dedicated sysobjects (lists of values defined through DQL queries) so that they are also replicated across Repositories.
– Or let the tables be propagated by dumping them into a flat file or dbf and sending it with the DVD. This includes the technical sectors.

We do not recommend using the Data Dictionary (DD), since it is not replicated across Repositories. However, a script can be used to apply updates to the DD on a scheduled basis.


Job Performance Measurement

Dump/Load process
– CPU, disk I/O speeds, and peak vs. off-peak times
– Dump/load time (s) = 2.43 sec/doc x number of docs + (file size KB / 111 KB/s)

Bulk data transfer – depends on bandwidth, latency, dump file size, compression, protocol overhead, traffic, packet loss, collisions
– DCTM or 3rd-party transfer options
– Bandwidth time (s) = (file size KB x 8 x 1.10) / bandwidth Kb/s
– Latency time (s) = (((file size KB / 64 KB) + 6) x latency (ms) x 1.10) / 1000

NOTE: DCTM content transfer mechanism. The replication dump file is not compressed at all. The dump file is transferred as the content associated with an object, and as such is transferred the same way as any other content via getfile/setfile (depending on whether you’re using push or pull replication). The content transfer algorithm breaks the content up into 64 KB pieces until the last piece is less than 64 KB and then sends smaller packets, i.e., 32 KB, 16 KB, etc. The OS TCP layer is free to packetize the transfer however it wants.

So a 1 MB file would break up into 16 64 KB chunks; a 1.1 MB file would have 17 x 64 KB, 1 x 32 KB, 1 x 16 KB, …; a 63 KB file would have 1 x 32 KB, 1 x 16 KB, 1 x 8 KB, 1 x 4 KB, 1 x 2 KB, 1 x 1 KB. This leads to the conclusion that a file will be broken into at most round(file_size / 64) + 6 chunks to be transferred.

The number of TCP packets depends on a number of factors, including any max MTU limitation imposed by tunneling; this becomes much more difficult to model from the available information. Assuming that each chunk is sent and verified before the next chunk is sent, the latency cost can be roughly calculated as ((size of document / 64 KB) + 6) x latency. Packet loss may add an additional 10 or 15 percent to this value on less reliable network connections.
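Putting the bandwidth, latency, and chunking formulas together, a rough end-to-end transfer-time estimator might look like the sketch below (the 1.10 factor is the deck’s ~10% overhead allowance; this models only the bulk transfer, not the dump/load time):

```python
# Rough transfer-time model built from the formulas above.
# file_size_kb: dump file size in KB; bandwidth_kbps: link speed in Kb/s;
# latency_ms: round-trip latency in milliseconds.

def transfer_time_seconds(file_size_kb: float,
                          bandwidth_kbps: float,
                          latency_ms: float) -> float:
    overhead = 1.10  # ~10% protocol overhead / packet loss allowance
    # Serialization time: bits on the wire divided by link speed
    bandwidth_time = (file_size_kb * 8 * overhead) / bandwidth_kbps
    # Chunking cost: at most round(size/64)+6 verified 64 KB chunks,
    # each paying one round-trip of latency
    chunks = round(file_size_kb / 64) + 6
    latency_time = (chunks * latency_ms * overhead) / 1000
    return bandwidth_time + latency_time

# Example: 500 MB dump (as in the subscription chart) over 128 Kb/s at 500 ms
secs = transfer_time_seconds(500 * 1024, 128, 500)
print(f"{secs / 3600:.1f} hours")
```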


Job Performance Measurement

[Chart: Subscription Replication Performance – 500 MB dump file (5000 objects); time (hours, 0–60) vs. bandwidth (28.8–1024 Kb/s) for latencies of 0–2000 ms]


Job Performance Measurement

[Chart: Submission Replication – 10 MB dump file (100 objects); time (minutes, 0–70) vs. bandwidth (28.8–1024 Kb/s) for latencies of 0–2000 ms]


Lessons Learned
– Be Prepared
– Time Management – Plan Well
– Understand Replication Limitations
– Testing, Testing, More Testing
– Managing Customer’s Expectations
– Reality versus Theory


Q&A - Conclusions and Lessons Learned
– Issues
– Job Performance Measurements
– Lessons Learned

QUESTIONS?


5.3 Replication Enhancements

Increase reliability by defining a configurable maximum on the number of objects to be dumped, transferred, or loaded:

– Avoids need to allocate space for a huge file
– Limits lost work when failures occur
– Avoids need to keep connections open for long intervals

Synchronize replication by sequencing jobs:
– Runs a set of jobs in a manner that respects job dependencies
– Avoids running conflicting jobs across repositories
– Works for other jobs too
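To illustrate the job-sequencing idea, here is a minimal dependency-ordered scheduler sketch using a plain topological sort. The job names and dependencies are hypothetical, and this is not the actual 5.3 implementation:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical example: a Post's submission job must finish before the
# subscription job that fans its content back out.
dependencies = {
    "subscribe_Post01": {"submit_Post01"},
    "subscribe_Post02": {"submit_Post02"},
}

# Run jobs in an order that respects the declared dependencies.
for job in TopologicalSorter(dependencies).static_order():
    print("run", job)  # submit_* jobs come before their subscribe_* jobs
```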


