
FAILOVER AND REDUNDANCY FOR VMS SYSTEMS

Benefits and drawbacks of different fault-tolerant deployment models in Video Management Systems

Salient Systems Corp.

4616 W. Howard Lane

Building 1, Suite 100

Austin, TX 78728

SALIENT SYSTEMS WHITE PAPER


Failover capability may be perceived as an important feature for some video security deployments. Two questions to ask when considering failover as a purchasing criterion are:

1. What are the most likely points of failure that require backup?
2. What is the cost of providing backup for components of the system vs. the level of risk that component failure represents to the organization?

When selecting Video Management Software (VMS) for security deployments requiring failover, consumers and security integrators generally limit their search to VMS platforms advertising N+1 Failover capability. But what does N+1 Failover capability of a VMS platform protect against? Do those protections address the greatest point of risk in the system? Furthermore, how likely is the risk to occur compared to the cost of protecting against it? For some organizations requiring video security, the risk considered is “What if a critical event occurs during a system failure and we miss the video?” For these organizations, even if a failover implementation costs twice as much as a traditional deployment, the possibility of missing an event justifies the cost. What is usually not considered is that components of the system other than the Network Video Recorder (NVR) can also fail, and failure of any one of them can cause a loss of live video and recordings. Consider the following points of failure:

If the network switch connecting the cameras goes down, none of the corresponding cameras will be able to stream. If the cameras are powered via PoE, they will also turn off, which prevents them from recording to onboard storage, if installed.

If the network cable connecting a camera fails, that camera will lose connectivity and possibly power.

If the camera itself goes down, there will be no video.

If the power goes out, switches, NVRs, cameras and more may lose power without UPS backup or a secondary power source.

Full system failover is generally impractical. Drawing from the list of components presented above, a fully redundant system would require two cameras at each position, duplicate network cables and switches, backup power and failover NVRs. Backup lighting or camera illumination may be required if the primary lighting goes out. If there is a fire in the building, video stored on premises may be lost if it is not also streamed and recorded offsite. Failover in a video security system comes with tradeoffs and represents a significant financial investment.
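One way to weigh the first purchasing question is to treat a single camera's recording path as a chain in which power, switch, cable, camera and NVR must all be working. The sketch below assumes independent failures and uses made-up availability figures (placeholders, not measured or vendor-published values) to show why protecting only the NVR leaves the other points of failure in place.

```python
# Series-availability sketch. Assumptions: components fail independently,
# every component in the path must be up for video to be recorded, and the
# availability figures below are illustrative placeholders only.
from math import prod


def series_availability(components: dict[str, float]) -> float:
    """Availability of a chain in which every component must be working."""
    return prod(components.values())


def redundant(availability: float, copies: int = 2) -> float:
    """Availability of `copies` independent, identical units where any one suffices."""
    return 1 - (1 - availability) ** copies


camera_path = {
    "power": 0.999,
    "switch": 0.9995,
    "cable": 0.9999,
    "camera": 0.9995,
    "nvr": 0.999,
}

baseline = series_availability(camera_path)
nvr_failover = series_availability({**camera_path, "nvr": redundant(camera_path["nvr"])})

print(f"baseline:          {baseline:.4f}")
print(f"with NVR failover: {nvr_failover:.4f}")
```

With these placeholder numbers, duplicating the NVR removes only the NVR's share of downtime; power, switch, cable and camera still bound the result.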


Goals of Fault Tolerance for VMS Systems

When protecting against failure conditions related specifically to the VMS and NVR component of the video security system, there are three primary considerations:

High Availability

Data Integrity

Disaster Recovery

High Availability applied to a VMS system means the system will continue to function during a failure event. Taken further, clients will continue to have access to live video when the primary VMS server, which normally provides the live video streams to connected clients, fails. High Availability can also mean that access to recordings, which is necessary to perform investigations, continues during a failure event.

Data Integrity with VMS systems means video will continue to record during a failure of the primary VMS recording server. Users generally expect that following a failure event, when the primary VMS server is restored, the recordings made during the failure event will be automatically moved from the temporary failover recording point to the primary VMS. This action would ‘fill in the gap’ created in recorded video during the failure event. This is the capability most sought after when failover is desired.

Disaster Recovery indicates that in the event of total loss of the VMS server, the existing recordings are still available. Disaster Recovery protects against events like a fire or flood which may affect all equipment on premises. This feature is particularly difficult to implement with VMS systems because the live video generally needs to be recorded in duplicate offsite to protect against a disaster affecting the local site. Because live video streams are high bitrate, securing an internet connection with sufficient bandwidth for offsite streaming can be difficult and costly for sites with more than a few cameras.

The sections to follow will examine how each type of failover implementation works and identify the features and benefits each provides.


N+1 Software-Based Failover

When most security professionals seek failover with a VMS system they’re looking for N+1 failover, but is it the best choice?

N+1 refers to a software-based failover system which allows a single idle server (‘+1’) to provide failover to several other VMS servers (‘N’).

Using a simple example, two VMS servers exist at a site and are recording two cameras each. A third server also exists at the site to provide the failover capability. The failover server periodically sends a network message, commonly referred to as a “heartbeat”, to each of the primary servers and waits for a response. If there is no response after a period of time, the failover server assumes the primary server is down.

Using a stored copy of the primary server’s configuration, the failover server begins recording video from the affected cameras. Any clients connected to the primary server for live video viewing detect the failure and automatically connect to the failover server.

Once the failure is resolved and the primary VMS server is back online, it responds to the heartbeat message again, notifying the failover server that normal operations can resume.
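The heartbeat behavior described above can be sketched as a small state machine. This is a minimal illustration assuming the failover server counts consecutive missed replies and takes over after a hypothetical threshold (`max_missed`); the class and field names are invented for the example and do not correspond to any VMS product's API, and the camera list stands in for the stored copy of the primary's configuration.

```python
# Minimal N+1 heartbeat supervisor. Assumes the failover server polls each
# primary at a fixed interval and declares it down after `max_missed`
# consecutive missed replies. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Primary:
    name: str
    cameras: list[str]          # stands in for the stored configuration copy
    missed: int = 0
    failed_over: bool = False


@dataclass
class FailoverServer:
    max_missed: int = 3
    recording: set[str] = field(default_factory=set)

    def on_heartbeat(self, primary: Primary, replied: bool) -> str:
        if replied:
            primary.missed = 0
            if primary.failed_over:
                # Primary is answering again: stop pulling its cameras.
                primary.failed_over = False
                self.recording.difference_update(primary.cameras)
                return "restored"
            return "ok"
        primary.missed += 1
        if not primary.failed_over and primary.missed >= self.max_missed:
            # Take over recording using the stored camera configuration.
            primary.failed_over = True
            self.recording.update(primary.cameras)
            return "failed_over"
        return "waiting"


spare = FailoverServer(max_missed=3)
nvr1 = Primary("nvr1", ["cam1", "cam2"])
events = [spare.on_heartbeat(nvr1, r) for r in (True, False, False, False, False, True)]
print(events)  # ['ok', 'waiting', 'waiting', 'failed_over', 'waiting', 'restored']
```

A production monitor would also need a per-heartbeat reply timeout and client notification, both omitted here.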

The failover server then unwinds the actions taken during the failure event. It may notify any connected clients to reconnect to the primary VMS server. It also disconnects from the cameras so as not to pull live video streams while the primary server is operating. Finally, some systems, but not all, initiate a synchronization of recordings to “fill in” gaps in recorded video on the primary VMS server using the video recorded on the failover system during the failure event.
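For systems that do synchronize recordings after failback, the core task is interval arithmetic: find where the primary's recordings have holes and copy only the matching portions of the failover server's recordings. The sketch assumes recordings are tracked as `(start, end)` offsets in seconds for one camera; that representation and the function names are hypothetical.

```python
# Sketch of the optional "fill in the gap" step after failback. Recordings are
# modeled as (start, end) offsets in seconds for one camera; the representation
# and function names are hypothetical.
Interval = tuple[float, float]


def find_gaps(recorded: list[Interval], window: Interval) -> list[Interval]:
    """Return the parts of `window` that `recorded` does not cover."""
    gaps: list[Interval] = []
    cursor = window[0]
    for start, end in sorted(recorded):
        if end <= cursor:
            continue
        gap_end = min(start, window[1])
        if gap_end > cursor:
            gaps.append((cursor, gap_end))
        cursor = max(cursor, end)
        if cursor >= window[1]:
            break
    if cursor < window[1]:
        gaps.append((cursor, window[1]))
    return gaps


def segments_to_copy(failover: list[Interval], gaps: list[Interval]) -> list[Interval]:
    """Clip each failover-server segment to the gaps it overlaps."""
    out = []
    for fs, fe in failover:
        for gs, ge in gaps:
            lo, hi = max(fs, gs), min(fe, ge)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)


primary = [(0, 100), (160, 300)]        # primary was down from 100 to 160
failover = [(112, 130), (130, 165)]     # failover began recording after detection
gaps = find_gaps(primary, (0, 300))
print(gaps)                              # [(100, 160)]
print(segments_to_copy(failover, gaps))  # [(112, 130), (130, 160)]
```

In this example 100 to 112 stays empty even after synchronization: the failover server was still detecting the failure, which corresponds to the N+1 detection delay.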

Here is how N+1 failover delivers on fault-tolerant capabilities:


High Availability (delay): Continuous service to clients of live video is available with N+1 failover; however, video recorded on the primary server is not available during the failure event. There will be a delay between the start of the failure event and the reconnection of the clients, made up of the time it takes for the failure event to be detected, the time for the failover server to connect to the cameras and the time needed for the clients to connect to the failover server.

Data Integrity (delay): Continuous recording of video occurs during the failure event, albeit with some delay before recording is reestablished. The delay is related to the time it takes to detect the failure event and then connect to the affected cameras.

× Disaster Recovery: In order to implement disaster recovery, the failover server would need to be located in a physically different location. This would likely involve the failover server connecting to the primary recording servers and cameras via the Internet. While this is certainly possible to implement, it would likely be impractical due to bandwidth constraints. During a failure event, all of the primary recording server’s cameras would need to be streamed over the Internet to the failover server. Any system requiring failover will also likely have many cameras, and streaming many cameras over the Internet to an offsite location is cost prohibitive for most deployments.
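The two N+1 delays listed above add up from their parts. A rough estimate, assuming failover is declared after a fixed number of consecutive missed heartbeats and ignoring each heartbeat's own reply timeout, looks like the following; every input value is hypothetical.

```python
# Rough N+1 delay estimate. Assumes failover is declared after a fixed
# number of consecutive missed heartbeats and ignores each heartbeat's own
# reply timeout; every input value here is hypothetical.


def detection_time(heartbeat_interval_s: float, missed_before_failover: int) -> float:
    """Worst case: the primary fails right after answering a heartbeat."""
    return heartbeat_interval_s * missed_before_failover


def recording_gap(detect_s: float, camera_connect_s: float) -> float:
    """Data Integrity delay: detect the failure, then connect to the cameras."""
    return detect_s + camera_connect_s


def live_view_gap(detect_s: float, camera_connect_s: float, client_reconnect_s: float) -> float:
    """High Availability delay: the recording gap plus client reconnection."""
    return recording_gap(detect_s, camera_connect_s) + client_reconnect_s


detect = detection_time(heartbeat_interval_s=5, missed_before_failover=3)
print("recording gap (s):", recording_gap(detect, camera_connect_s=8))                         # 23
print("live view gap (s):", live_view_gap(detect, camera_connect_s=8, client_reconnect_s=10))  # 33
```

Shortening the heartbeat interval reduces the detection term but increases the risk of declaring a failure during a brief network hiccup.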

With N+1 failover come cost considerations and other tradeoffs. First, an additional NVR is required at a site, increasing the hardware cost. If the site has only one NVR, implementing N+1 nearly doubles the hardware cost of the VMS deployment. Compared to other fault-tolerant techniques, hardware savings for an N+1 implementation are available only when there are multiple NVRs per site, making it most applicable to deployments with a high camera count per site (like casinos). In addition, VMS systems supporting N+1 generally require software licenses for failover, often adding dramatically to the software cost of the solution. Finally, N+1 can back up only one server failure at a time. If multiple NVRs could fail simultaneously, a VMS supporting N+2 or higher should be used, limiting choices and increasing hardware costs.
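The hardware side of this tradeoff is simple arithmetic: k spare servers protecting N primaries add k/N to the server count. The sketch counts servers only and leaves out failover licensing, which varies by vendor. With N = 1 the server count doubles, which is why a single-NVR site sees its VMS hardware cost nearly double.

```python
# Server overhead of N+k failover. Counts servers only; failover licensing,
# which varies by vendor, is deliberately left out.


def spare_overhead(primaries: int, spares: int = 1) -> float:
    """Extra servers as a fraction of the primary servers they protect."""
    if primaries < 1:
        raise ValueError("need at least one primary server")
    return spares / primaries


for n in (1, 2, 4, 10):
    print(f"N={n:>2}: +{spare_overhead(n):.0%} servers for N+1, "
          f"+{spare_overhead(n, 2):.0%} for N+2")
```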


VM Failover

The Virtual Machine (VM) based failover model combines Salient CompleteView-specific technologies with industry-standard VM platforms supporting High Availability. High Availability VM configurations allow VM software administrators to create a copy of a virtual machine which can sit idle until the primary virtual machine goes offline. The duplicate VM can also run on a different physical computer than the primary VM, protecting the VM configuration from a hardware failure.

In this configuration, the primary VM running CompleteView is configured to record video to a Network Attached Storage (NAS). Using NAS for the primary recording volume allows both VMs to access and contribute to the same set of recordings regardless of which one is operating.

A failure of the primary VM is detected by the VM hypervisor software, and the VM host initiates a boot of the duplicate VM. When the duplicate VM boots, CompleteView starts on it with the same camera and recording configuration that was running on the primary VM. The duplicate VM connects to the affected cameras and resumes their recording. Because the duplicate VM has the same IP address as the primary VM, CompleteView clients automatically reconnect to the duplicate VM, allowing live video viewing to continue during the failure event.

Video recorded by the duplicate VM is also recorded to the same NAS volume as the primary VM. Because of CompleteView’s unique Stable Recording Architecture, video is written in small chunks on disk rather than into a single database file. New video files from the duplicate VM are written to the same structure already created by the primary VM, without affecting existing recordings. When it loads, CompleteView scans for all recordings; this allows the duplicate VM to identify all video previously recorded by the primary VM, giving investigators continued access to all recordings during the failure event.
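Because each chunk is its own file, two writers can share one recording volume, and a freshly started instance can rebuild its view of the recordings by listing files. The sketch below illustrates that general idea with a hypothetical `<camera>/<start>_<writer>.mp4` layout on a temporary directory standing in for the NAS; it is not CompleteView's actual on-disk format or indexing code.

```python
# Chunked recording on shared storage. Assumes a hypothetical layout of one
# file per chunk, <camera>/<start>_<writer>.mp4; this is NOT CompleteView's
# actual on-disk format, only an illustration of the idea.
import tempfile
from pathlib import Path


def write_chunk(root: Path, camera: str, start: int, writer: str, data: bytes) -> Path:
    """Write one chunk as its own file, so earlier chunks are left untouched."""
    folder = root / camera
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{start}_{writer}.mp4"
    path.write_bytes(data)
    return path


def scan(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Rebuild a per-camera index of (start, writer) by listing chunk files."""
    index: dict[str, list[tuple[int, str]]] = {}
    for chunk in root.glob("*/*.mp4"):
        start, writer = chunk.stem.split("_", 1)
        index.setdefault(chunk.parent.name, []).append((int(start), writer))
    return {camera: sorted(chunks) for camera, chunks in index.items()}


with tempfile.TemporaryDirectory() as tmp:
    nas = Path(tmp)                        # stands in for the NAS volume
    write_chunk(nas, "cam1", 1000, "primary", b"\x00")
    write_chunk(nas, "cam1", 1060, "primary", b"\x00")
    # The primary VM fails; the duplicate VM boots and keeps writing.
    write_chunk(nas, "cam1", 1140, "duplicate", b"\x00")
    index = scan(nas)                      # what a freshly started instance would see

print(index)  # {'cam1': [(1000, 'primary'), (1060, 'primary'), (1140, 'duplicate')]}
```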


When the primary VM is restored, the hypervisor shuts down the duplicate VM. CompleteView running on the primary VM connects to its corresponding cameras and resumes recording. Clients connected to the duplicate VM reconnect to the primary VM automatically.

When CompleteView loads on the primary VM, it scans the NAS recording volume for recorded video and updates its recordings database with pointers to all clips recorded by the duplicate VM during the failure event, making those recordings automatically available to investigators.
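On failback the primary already has a recordings database, so the rescan only needs to add pointers it does not yet hold. A minimal reconciliation step, assuming both the database and the scan are reduced to sets of chunk start times per camera (a hypothetical simplification of a real recordings database), might look like this:

```python
# Reconcile an existing recordings index with a fresh scan of shared storage.
# Both are modeled as {camera: set of chunk start times}; this is a
# simplified, hypothetical stand-in for a real recordings database.


def refresh_index(db: dict[str, set[int]], scanned: dict[str, set[int]]) -> dict[str, set[int]]:
    """Add pointers for chunks found on disk but missing from `db`; return what was added."""
    added: dict[str, set[int]] = {}
    for camera, starts in scanned.items():
        new = starts - db.get(camera, set())
        if new:
            db.setdefault(camera, set()).update(new)
            added[camera] = new
    return added


db = {"cam1": {1000, 1060}}                    # known before the failure
scanned = {"cam1": {1000, 1060, 1140, 1200}}   # includes duplicate-VM chunks
added = refresh_index(db, scanned)
print(sorted(added["cam1"]))                   # [1140, 1200]
```

Running the refresh a second time adds nothing, so repeating the scan on every startup is safe.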

VM-based failover provides fault-tolerant capabilities comparable to those of N+1, with some distinct advantages:

High Availability (delay): Continuous service to clients of live video is available with VM-based failover. Unlike N+1 failover, VM failover allows continuous access to recordings made before the failure event because they are centrally located on the NAS. There will be a delay between the start of the failure event and the reconnection of the clients, made up of the time it takes for the failure event to be detected, the time to start the duplicate VM, the time for the failover VM to connect to cameras and the time needed for clients to connect to the failover VM. Failover delay can be largely avoided by using the ‘Fault Tolerance’ features of VM software instead of the ‘High Availability’ VM feature sets, at the cost of additional hardware resources.

Data Integrity (delay): Continuous recording of video occurs during the failure event, albeit with some delay before recording is reestablished. The delay is related to the time it takes to detect the failure event, start up the duplicate VM and connect to the affected cameras.

× Disaster Recovery: Disaster recovery is likely impractical to implement with VM-based failover for the same reasons described for N+1 failover. Streaming the camera feeds at their full resolution and frame rate, which are normally designed to be transported over a LAN, would likely be impractical over a WAN.

VM-based failover shares a similar cost profile with N+1 failover. An additional server is required to back up ‘N’ other systems. Furthermore, storage is centralized on a NAS appliance instead of being built into each recording server. Instead of purchasing failover licenses from the VMS vendor, virtualization software would be purchased.

VM-based failover can vary in delay time. Depending on the need, options range from slightly delayed to instantaneous failover. Instantaneous failover requires a duplicate VM running as a ‘hot standby’, whereas delayed failover requires the duplicate VM to boot before servicing clients and recording cameras. One key advantage of VM-based failover, as compared to N+1, is access to the full set of recordings during the failure event.


Dual Stream Recording

Dual Stream Recording, very simply put, is recording an IP camera stream to two VMS servers simultaneously.

Most IP cameras and encoders can send multiple streams of video at similar quality levels simultaneously. Using this feature, two duplicate VMS systems can be set up to run in parallel with the same configuration.

When the two VMS servers are running simultaneously, a complete backup of recordings exists between the two. If one of the two servers fails, recording is already occurring on the other server, so no time is required to “fail over” recording to the backup system.

Using CompleteView, any connected clients can log off the primary VMS server and log into the backup VMS server for continued access to live video and recordings. Although the process requires manual user intervention, the user can perform the task easily by simply logging out of and back into CompleteView. Within CompleteView, this setup can be accomplished using different account credentials or separate client configuration files hosted on two different Config Servers.

A unique attribute of Dual Stream Recording is the ability to specify a video stream with different properties for recording to the backup recording server. Most cameras and encoders can not only provide multiple identical video streams but can generally also provide those streams at different resolutions and frame rates. This allows the security system designer to plan for a lower frame rate and resolution on the backup recording server.


The failover methods described thus far are limited in their ability to provide Disaster Recovery, not because it cannot be configured, but because streaming full-resolution, full-frame-rate video over a WAN connection is impractical for most deployments. Dual Stream Recording can allow a server located offsite to receive lower-bitrate streams, which provides disaster recovery, albeit by recording video at lower quality than the primary onsite recording server.
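The effect of a reduced backup stream on WAN feasibility can be estimated with simple division. In the sketch below the two stream profiles, their bitrates and the 80% uplink utilization target are all assumptions; real bitrates depend on the camera, codec and scene and should be measured.

```python
# Hypothetical Dual Stream plan: a full-quality stream recorded onsite and a
# reduced stream recorded offsite. All bitrates are assumed figures.
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamProfile:
    resolution: str
    fps: int
    mbps: float


PRIMARY = StreamProfile("1920x1080", 30, 6.0)   # onsite server, over the LAN
BACKUP = StreamProfile("640x360", 5, 0.5)       # offsite server, over the WAN


def max_cameras_offsite(uplink_mbps: float, profile: StreamProfile, utilization: float = 0.8) -> int:
    """How many cameras' streams of this profile fit within the usable uplink."""
    return int(uplink_mbps * utilization // profile.mbps)


print(max_cameras_offsite(50, PRIMARY))   # 6
print(max_cameras_offsite(50, BACKUP))    # 80
```

With these placeholder figures, an uplink that could carry only a handful of full-quality streams can carry the reduced streams for a much larger site.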

The Dual Stream Recording fault tolerance profile includes:

High Availability: Continued availability of live video and recordings using Dual Stream Recording requires manual intervention by users of the system: a client user needs to log out and log back in. Although the backup recording server is immediately available in this configuration, the time required to recover client services in a failure event depends on how quickly the user identifies the issue and takes action to recover.

Data Integrity: Because video is continuously recording to both the primary and backup recording servers, video continues to record during the failure event to either the backup or primary recording server without interruption or lag time.

Disaster Recovery: Using Dual Stream Recording for Disaster Recovery is more practical than with the other methodologies mentioned herein. The backup server can be located offsite and continuously record cameras over a WAN connection by pulling lower-bitrate video streams.

Dual Stream Recording provides some powerful benefits and comes with some tradeoffs. It can provide some of the highest levels of protection by keeping the camera connection active on two servers simultaneously. Similarly, recorded video is kept on two storage arrays simultaneously. On the downside, two servers may be required per site. If only one server per site would normally be required, the hardware cost would be similar to that of N+1 failover, and possibly lower because failover licensing would not need to be purchased. In fact, when CompleteView is used for Dual Stream Recording, the added licenses are provided at no cost to the user. When Dual Stream Recording is used for Disaster Recovery, hardware costs are limited compared to other methods because only one additional, centrally located recording server is required.
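Sizing that single central server is mostly a storage question. The sketch assumes continuous recording and decimal units, with hypothetical camera counts, bitrates and a 30-day retention; it compares the reduced backup streams with what the same cameras would need at primary quality.

```python
# Storage estimate for one central offsite server recording reduced backup
# streams from several sites. Assumes continuous recording and decimal units;
# camera counts, bitrates and retention are hypothetical.


def storage_tb(cameras: int, mbps: float, retention_days: int) -> float:
    """Terabytes (10**12 bytes) needed to keep `retention_days` of continuous video."""
    bits = cameras * mbps * 1e6 * 86_400 * retention_days
    return bits / 8 / 1e12


sites = {"site_a": 40, "site_b": 25, "site_c": 60}   # cameras per site
backup_mbps, primary_mbps, days = 0.5, 6.0, 30

total_cameras = sum(sites.values())
print(f"backup streams offsite: {storage_tb(total_cameras, backup_mbps, days):.1f} TB")
print(f"same cameras at primary quality: {storage_tb(total_cameras, primary_mbps, days):.1f} TB")
```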


Camera SD Card Recording

Recording video to the designated VMS recording server while simultaneously recording to camera SD cards provides some unique fault tolerance features.

Implementation of edge recording support varies by VMS. With CompleteView, video is recorded to both the VMS server and the camera SD card at the same quality level. CompleteView allows administrators to define a synchronization schedule ranging from every hour to every week. The synchronization schedule defines when CompleteView will download missing video segments from the camera SD card to fill in any gaps in video on the CompleteView recording server.
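Because the SD card records at the same quality as the server, the card's capacity limits how long missing segments remain available for the next synchronization. The sketch below assumes the camera overwrites its oldest video when the card is full (verify this for the specific camera) and uses a hypothetical safety margin of two sync intervals; only the hourly-to-weekly schedule range comes from the text above.

```python
# Check whether a camera SD card holds enough video to survive until the
# next scheduled synchronization. Assumes continuous recording at a fixed
# bitrate and oldest-first overwrite when the card is full.
from datetime import timedelta

MIN_SCHEDULE = timedelta(hours=1)
MAX_SCHEDULE = timedelta(weeks=1)


def sd_retention(card_gb: float, mbps: float) -> timedelta:
    """Time until the card wraps, using decimal gigabytes."""
    return timedelta(seconds=card_gb * 8e9 / (mbps * 1e6))


def schedule_is_safe(interval: timedelta, card_gb: float, mbps: float, margin: float = 2.0) -> bool:
    """True if the card retains at least `margin` sync intervals of video."""
    if not MIN_SCHEDULE <= interval <= MAX_SCHEDULE:
        raise ValueError("sync schedule must be between hourly and weekly")
    return sd_retention(card_gb, mbps) >= interval * margin


for interval in (timedelta(hours=1), timedelta(days=1), timedelta(weeks=1)):
    print(interval, schedule_is_safe(interval, card_gb=64, mbps=6.0))
```

With a hypothetical 64 GB card at 6 Mbps, the card wraps in under a day, so only the hourly schedule passes this check; a larger card or a lower recording bitrate would allow a longer schedule.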

When a failure event occurs, video continues to record to the SD card. Because the recording occurs at the camera, this method protects not only against VMS recording server failures but also against cabling and infrastructure problems. With other failover techniques, if the network cable connecting the camera fails, or infrastructure equipment such as the switch the camera is connected to fails, video would not be accessible on either the primary recording server or the failover server. To benefit from protection against infrastructure failures, the cameras cannot use PoE power, as cabling failures may prevent the camera from receiving power.

Furthermore, this method provides a small level of geographic distribution of the recorded video, which can be an advantage in some installation scenarios. For example, if a surveillance system is installed on a container or cruise ship, the cameras are dispersed across the vessel. Because recording occurs both at the camera and on the VMS recording server, if a flood or fire occurred in the room containing the VMS recording server, recordings would still be present on the camera SD cards. This provides a level of disaster recovery which generally is unavailable with other fault-tolerant deployment architectures.

Fault-tolerant capabilities of camera SD card recording include:


× High Availability: During a failure event in which the VMS recording server is down, there is no backup server for clients to reconnect to. Although the system would still be recording, the live video viewing and investigation functions of the video security system would be inaccessible.

Data Integrity: Because video is continuously recording to both the VMS recording server and the camera SD cards, there is no interruption to recording during a failure event. A unique benefit of SD card based fault tolerance is the ability to continue recording even during a failure related to network cabling or switch infrastructure, provided the camera is powered from a local power source. PoE power to cameras could be interrupted if the cabling or switch fails.

Disaster Recovery: Using SD card recording, video recordings are physically located both on the VMS recording server and at the camera. Because of this physical distribution of the recordings, flood, fire, tampering or other events which could cause physical damage to the VMS recording server may not affect the cameras and the corresponding recordings on the SD cards. This can provide a level of disaster recovery which may benefit some markets, such as cruise and container vessel installations.

SD card recording is a relatively affordable way to implement a level of fault tolerance, requiring only the purchase of SD cards and compatible cameras. It also provides unique benefits which can be used in conjunction with the other failover methods described herein, adding protection against failure of network and cabling infrastructure as well as geographic distribution of the recordings.


Hardware Failover

Redundancy can be built directly into the VMS recording server by using a hardware platform with built-in hardware-based failover.

Several computer manufacturers offer single-box server solutions which contain a duplicate set of components combined with a watchdog controller. If any individual component fails, the entire platform instantly fails over to the duplicate set of hardware components.

These systems generally use a single storage array shared by the two sets of computing components, which ensures that data access is not compromised if the hardware fails over. Hardware-based failover solutions are simple to implement and are not dependent on VMS feature sets or architecture. Additionally, failover using this method is nearly instantaneous.

Although this solution provides the highest availability, it may also represent the highest hardware cost. Because there is a duplicate set of components, the consumer pays not only for the equivalent of two servers but also a premium for the watchdog controller functionality which monitors the hardware and performs the failover actions.

Fault-tolerant capabilities of hardware-based failover solutions include:

High Availability: Hardware-based failover can provide the best high availability of the methods reviewed. Failover should be nearly instantaneous, allowing clients to maintain their connections for live video viewing. Hardware failover platforms generally have a pool of storage shared between the duplicate sets of computing components, so access to video recorded prior to the failure event is not interrupted. This ensures not only that new video will be recorded during the failure event, but also that investigators will have access to video recorded prior to the failure.


Data Integrity: The ultra-fast failover speed offers an advantage over the other methods described. When a failure event occurs, existing camera connections should be maintained and recording should continue without interruption.

× Disaster Recovery: Because all of the video and computing resources are (usually) contained in a single chassis, there is no disaster recovery capability using this method.

Hardware-based failover offers high performance, simple setup and universal compatibility. Although failover licensing and virtual machine software are not required, this solution does not come without budgetary considerations. The higher cost of the hardware itself puts this solution at a cost equal to or greater than that of the other methods reviewed.


Summary

There are a variety of failover methods which can be applied to VMS platforms. Some are not dependent on VMS features or architecture and can therefore be applied to any VMS platform. Additionally, failover methods can be combined for the best protection. For example, camera SD card recording can be used in combination with other methods to protect against both VMS recording server failures and network infrastructure and cabling failures.

Each method has different attributes related to the type of fault it protects against, performance, compatibility and cost. As such, identifying the risks to protect against and the budget for fault tolerance are the first steps in planning for the correct failover technology.
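The fault-tolerance profiles reviewed in this paper can be encoded so that methods, alone or in combination, are filtered against the goals an organization actually needs. The numeric ratings below are an interpretation: a capability marked with a delay, requiring manual intervention, or described as "a level of" disaster recovery is scored as partial, and rating a combination by its strongest member per goal is a simplifying assumption rather than a rule from this paper.

```python
# Encodes this paper's fault-tolerance profiles so methods, alone or combined,
# can be filtered against required goals. The PARTIAL/FULL scoring and the
# "best member per goal" rule for combinations are simplifying assumptions.
from itertools import combinations

NONE, PARTIAL, FULL = 0, 1, 2
GOALS = ("HA", "DI", "DR")  # High Availability, Data Integrity, Disaster Recovery

METHODS = {
    "N+1": {"HA": PARTIAL, "DI": PARTIAL, "DR": NONE},
    "VM failover": {"HA": PARTIAL, "DI": PARTIAL, "DR": NONE},
    "Dual Stream": {"HA": PARTIAL, "DI": FULL, "DR": PARTIAL},
    "SD card": {"HA": NONE, "DI": FULL, "DR": PARTIAL},
    "Hardware failover": {"HA": FULL, "DI": FULL, "DR": NONE},
}


def combined(names) -> dict[str, int]:
    """Rate a combination by its strongest member for each goal."""
    return {goal: max(METHODS[n][goal] for n in names) for goal in GOALS}


def meets(names, required: dict[str, int]) -> bool:
    score = combined(names)
    return all(score[goal] >= level for goal, level in required.items())


required = {"HA": FULL, "DI": FULL, "DR": PARTIAL}
options = [c for k in (1, 2) for c in combinations(METHODS, k) if meets(c, required)]
print(options)  # [('Dual Stream', 'Hardware failover'), ('SD card', 'Hardware failover')]
```

Under these assumptions no single method meets that requirement on its own, which reflects the point that failover methods can be combined for the best protection.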


ABOUT SALIENT SYSTEMS

Salient Systems offers network friendly, comprehensive IP and analog video surveillance management systems (VMS) built on open architecture. As the recognized transition leader from analog to digital video, Salient Systems’ VMS, CompleteView™, is scalable and provides everything needed to manage a multi-server enterprise from a single desktop. Salient delivers simple and scalable security today…and tomorrow. For more information about Salient Systems and CompleteView, visit www.salientsys.com.

ABOUT THE AUTHOR

Brian Carle is the Director of Product Strategy for Salient Systems Corporation. Prior to Salient he worked as the ADP Program Manager for Axis Communications. For information about this white paper or CompleteView, email [email protected].

©2017 Salient Systems Corporation. Company and product names mentioned are registered trademarks of their respective owners.

Salient Systems 4616 W. Howard Lane Building 1, Suite 100 Austin, TX 78728 512.617.4800 512.617.4801 Fax www.salientsys.com
