FAILOVER AND REDUNDANCY FOR VMS SYSTEMS
Benefits and drawbacks of different fault-tolerant deployment models in Video Management Systems
Salient Systems Corp.
4616 W. Howard Lane
Building 1, Suite 100
Austin, TX 78728
SALIENT SYSTEMS WHITE PAPER
Failover capability may be perceived as an important feature for some video security deployments. Two questions to ask when considering failover as an important purchasing criterion are:
1. What are the most likely points of failure that require backup?
2. What is the cost of providing backup to components of the system vs. the level of risk component failure represents to the organization?
When selecting Video Management Software (VMS) for security deployments requiring failover, consumers and security integrators generally limit their search to VMS platforms advertising N+1 Failover capability. But what does the N+1 Failover capability of a VMS platform protect against? Do those protections address the greatest point of risk in the system? Furthermore, how likely is the risk to occur as compared to the cost?
To some organizations requiring video security, the risk considered is “What if a critical event occurs during system failure and we miss the video?” For them, even if a failover implementation costs twice as much as a traditional deployment, the possibility of missing an event justifies the cost.
What is usually not considered is the set of components other than the Network Video Recorder that could fail. Failure of any of these components will cause a loss of live video and recordings. Consider the following points of failure:
If the network switch connecting cameras goes down, none of the corresponding cameras will be able to stream. Cameras powered via PoE will turn off, which also prevents them from recording to onboard storage, if installed.
If the network cable connecting a camera fails, that camera will lose connectivity and possibly power.
If the camera itself fails, there will be no video.
If the power goes out, switches, NVRs, cameras and other equipment may lose power unless they are backed up by a UPS or a secondary power source.
Full system failover is generally impractical. Drawing from the list of components presented above, a fully redundant system would require two cameras at each position, duplicate network cables and switches, backup power and failover Network Video Recorders (NVRs). Backup lighting or camera illumination may be required if the primary lighting goes out. If there is a fire in the building, video stored on premise may be lost if not also streamed and recorded offsite. Considering failover in a video security system comes with tradeoffs, and represents a significant financial investment.
Goals of Fault Tolerance for VMS Systems
When protecting against failure conditions related specifically to the VMS and NVR
component of the video security system, there are three primary considerations:
High Availability
Data Integrity
Disaster Recovery
High Availability applied to a VMS system means the system will continue to function
during a failure event. Taken further, the clients will continue to have access to live
video when the primary VMS server, which normally would provide the live video
streams to connected clients, fails. High Availability can also mean access to recordings,
necessary to perform investigations, continues to function during a failure event.
Data Integrity with VMS systems means video will continue to record during a failure of
the primary VMS recording server. Users generally expect that following a failure event,
when the primary VMS server is restored, the recordings made during the failure event
will be automatically moved from the temporary failover recording point to the primary
VMS. This action would ‘fill in the gap’ created in recorded video during the failure
event. This is the capability most sought after when failover is desired.
Disaster Recovery indicates that in the event of total loss of the VMS server, the existing
recordings are still available. Disaster Recovery protects against events like a fire or
flood which may affect all equipment on premise. This feature is particularly difficult to
implement with VMS systems because generally the live video needs to be recorded in
duplicate off site, to protect against disaster affecting the local site. Because live video
streams are high bitrate, securing an internet connection with sufficient bandwidth to
allow for offsite streaming can be difficult and costly for sites with more than a few
cameras.
The sections to follow will examine how each type of failover implementation works and
identify the features and benefits each provides.
N+1 Software-Based Failover
When most security professionals seek failover with a VMS system they’re looking for N+1
failover, but is it the best choice?
N+1 refers to a software based failover system which allows a single idle server (‘+1’) to provide
failover to several other VMS servers (‘N’).
Using a simple example, two VMS servers
exist at a site and are recording two
cameras each. A third server also exists at
the site to provide the failover capability.
The failover server sends a network
message periodically to each of the
primary servers, commonly referred to as
a “heartbeat”, and waits for a response. If
there is no response after a period of
time, the failover server assumes the
primary server is down.
Using a stored copy of the primary server’s
configuration, the failover server begins
recording video from the affected cameras.
Any clients connected to the primary server
for live video viewing detect the failure and
automatically connect to the failover server.
Once the failure is resolved and the primary
VMS server is back online, the heartbeat
message is responded to, notifying the
failover server that normal operations can resume.
The failover server will then unwind the actions taken during the failure event. The failover
server may notify any connected clients to reconnect to the primary VMS server. It will also
disconnect from cameras so as not to pull live video streams while the primary server is
operating. Finally, some systems, but not all, will initiate a synchronization of recordings to “fill
in” gaps in recorded video on the primary VMS server from the video recorded during the
failure event on the failover system.
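The heartbeat, takeover and hand-back sequence described above can be sketched as a small state tracker. This is a minimal illustration only, not any vendor's implementation: the class name, the action labels and the 10 second timeout are all assumptions, and real products add retries, configuration sync and client notification.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Tracks heartbeat replies from primary servers (all names hypothetical)."""
    timeout_s: float = 10.0                          # assumed detection window
    last_reply: dict = field(default_factory=dict)   # primary name -> time of last reply
    covering: set = field(default_factory=set)       # primaries currently failed over

    def record_reply(self, primary: str, now: float) -> None:
        """Call whenever a primary answers a heartbeat message."""
        self.last_reply[primary] = now

    def poll(self, now: float) -> list:
        """Return (action, primary) pairs to carry out at time `now`."""
        actions = []
        for primary, seen in self.last_reply.items():
            silent = (now - seen) > self.timeout_s
            if silent and primary not in self.covering:
                # No reply within the timeout: assume the primary is down
                self.covering.add(primary)
                actions.append(("take_over", primary))
            elif not silent and primary in self.covering:
                # Primary answers again: unwind the failover actions
                self.covering.discard(primary)
                actions.append(("hand_back", primary))
        return actions


monitor = FailoverMonitor(timeout_s=10.0)
monitor.record_reply("nvr-a", now=0.0)
print(monitor.poll(now=12.0))   # [('take_over', 'nvr-a')]
```

The timeout is the tradeoff to notice: a shorter window reduces the gap in live video and recording, while a longer one avoids failing over on a brief network hiccup.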
Here's how N+1 failover delivers on fault tolerant capabilities:
High Availability (delay): Continuous service to clients of live video is available with N+1
failover, however video recorded on the primary server is not available during the
failure event. There will be a delay between the start of the failure event and the
reconnection of the clients related to the time it takes for the failure event to be
detected, the time for the failover server to connect to cameras and the time needed
for the client to connect to the failover server.
Data Integrity (delay): Continuous recording of video occurs during the failure event,
albeit with some delay before recording is reestablished. The delay is related to the
time it takes to detect the failure event and then connect to the affected cameras.
× Disaster Recovery: In order to implement disaster recovery, the failover server would
need to be located in a physically different location. This would likely involve the
failover server connecting to the primary recording servers and cameras via the
Internet. While this is certainly possible to implement, it would likely be impractical due
to bandwidth constraints. During a failure event, all of the primary recording server’s
cameras would need to be streamed over the internet to the failover server. Any
system requiring failover will also likely have many cameras, and streaming many
cameras over the internet to an offsite location is cost prohibitive for most
deployments.
With N+1 failover come cost considerations and other tradeoffs. First, an additional NVR
would be required at a site, increasing the hardware cost. In the event the site has only one
NVR, implementing N+1 would nearly double the hardware cost of the VMS deployment.
Compared to other fault tolerant techniques, hardware savings for an N+1 implementation are
available only when there are multiple NVRs per site, making it most applicable to deployments
with a high camera count per site (like casinos). In addition, VMS systems supporting N+1 generally
require software licenses for failover, often adding dramatically to the software cost of the
solution. Finally, N+1 can back up only one server failure at a time. If multiple NVRs could fail
simultaneously, a VMS supporting N+2 or higher should be used, limiting choices and increasing
hardware costs.
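The hardware side of this tradeoff is simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes every server costs the same and ignores licensing, storage and installation, so treat the percentages as a rough guide rather than a quote.

```python
def n_plus_one_hardware_ratio(primary_servers: int, spares: int = 1) -> float:
    """Ratio of total servers to primary servers, assuming identical unit cost."""
    if primary_servers < 1 or spares < 0:
        raise ValueError("need at least one primary server and a non-negative spare count")
    return (primary_servers + spares) / primary_servers


for n in (1, 2, 4, 10):
    extra = (n_plus_one_hardware_ratio(n) - 1) * 100
    print(f"N={n:>2}: +{extra:.0f}% server hardware")
```

With one primary server the spare doubles the server count, while with ten primaries it adds about ten percent, which is why the savings appear only at sites with several NVRs.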
VM Failover
The Virtual Machine based failover model combines Salient CompleteView-specific
technologies with industry standard VM platforms supporting High Availability.
High Availability VM configurations allow VM software administrators to create a copy of a
virtual machine which can sit idle until the primary virtual machine goes offline. The duplicate
VM can also run on a different physical computer than the primary VM, protecting the VM
configuration from a hardware failure.
In this configuration, the primary VM running
CompleteView is configured to record video
to a Network Attached Storage (NAS). Using
NAS for the primary recording volume allows
both VMs to access and contribute to the
same set of recordings regardless of which
one is operating.
A failure of the primary VM is detected by the
VM Hypervisor software. The VM Host will
initiate a boot up of the duplicate VM. When
the duplicate VM boots, CompleteView starts
on the duplicate VM with the same camera
and recording configuration as was running
on the primary VM. The duplicate VM
connects to the affected cameras and
resumes their recording. Because the
duplicate VM will have the same IP address as
the primary VM, CompleteView clients will automatically reconnect to the duplicate VM,
allowing live video viewing to continue during the failure event.
Video recorded by the duplicate VM is also recorded to the same NAS volume as the primary
VM. Because of CompleteView’s unique Stable Recording Architecture, video is written outside
of a single database file, and in small chunks on disk. New video files from the duplicate VM are
written to the same structure already created by the primary VM, without affecting existing
recordings. Upon loading, CompleteView will scan for all recordings; this feature allows the
duplicate VM to identify all existing video previously recorded by the primary VM, allowing
investigators continued access to all recordings during the failure event.
When the primary VM is restored, the Hypervisor shuts down the duplicate VM.
CompleteView running on the primary VM connects to its corresponding cameras and resumes
recording. Clients connected to the duplicate VM reconnect to the primary VM automatically.
When CompleteView loads on the primary VM it scans the NAS recording volume for recorded
video and updates its recordings database with pointers to all clips recorded by the duplicate
VM during the failure event, making those recordings automatically available to investigators.
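The idea of rebuilding a recordings index by scanning a shared volume can be illustrated with a short sketch. It does not reflect CompleteView's actual on-disk format, which this paper does not describe: the directory layout, the file naming scheme and the function name below are assumptions made purely for illustration.

```python
from pathlib import Path


def scan_recordings(root: Path) -> dict:
    """Rebuild a camera -> [(start, end, path), ...] index from files on disk.

    Assumed (hypothetical) layout: <root>/<camera>/<start>_<end>.mp4,
    with start and end as integer Unix timestamps.
    """
    index = {}
    for clip in root.glob("*/*.mp4"):
        try:
            start, end = (int(part) for part in clip.stem.split("_"))
        except ValueError:
            continue  # skip files that do not match the assumed naming scheme
        index.setdefault(clip.parent.name, []).append((start, end, clip))
    for clips in index.values():
        clips.sort(key=lambda c: c[:2])  # order each camera's clips by time
    return index
```

Because the index is derived from the files themselves, whichever VM starts next sees every clip on the volume regardless of which VM wrote it.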
VM based failover provides fault tolerant capabilities comparable to those of N+1, with some
distinct advantages:
High Availability (delay): Continuous service to clients of live video is available with VM
based failover. Unlike with N+1 based failover, VM failover allows continuous access to
recordings made before the failure event because they’re centrally located on NAS.
There will be a delay between the start of the failure event and the reconnection of the
clients related to the time it takes for the failure event to be detected, the time for the
failover VM to connect to cameras and the time needed for the client to connect to the
failover VM. Failover delay can be largely avoided using ‘Fault Tolerance’ features of
VM software as opposed to ‘High Availability’ VM feature sets at the cost of additional
hardware resources.
Data Integrity (delay): Continuous recording of video occurs during the failure event,
albeit with some delay before recording is reestablished. The delay is related to the
time it takes to detect the failure event, start up the duplicate VM and connect to the
affected cameras.
× Disaster Recovery: Disaster recovery is likely impractical to implement with VM based
failover for the same reasons as described with N+1 based failover. Streaming the
camera feeds at their full resolution and frame rate, normally designed to transport over
a LAN, would likely be impractical over a WAN.
VM based failover has a cost profile similar to that of N+1 based failover. An additional server is
required to back up ‘N’ number of other systems. Furthermore, storage is centralized on a NAS
appliance as opposed to being built into each recording server. Instead of purchasing failover
licenses from the VMS vendor, virtualization software would be purchased.
VM based failover can vary in delay time. Depending on the need, options range from slightly
delayed to instantaneous failover. Instantaneous failover requires a duplicate VM running as a
‘hot standby’ whereas delayed failover requires the duplicate VM to boot up before servicing
clients and recording cameras. One key advantage of VM based failover, as compared to N+1,
is having access to the full set of recordings during the failure event.
Dual Stream Recording
Dual Stream Recording, very simply put, is recording an IP camera stream to two VMS servers
simultaneously.
Most IP cameras and encoders
have the ability to send multiple
streams of video at similar quality
levels simultaneously. Using this
feature, two duplicate VMS
systems can be set up to run in
parallel with the same
configuration.
When the two VMS servers are
running simultaneously, a complete backup of recordings exists between the two. If one of the
two servers fails, recording is already occurring on the other server, providing the benefit of no
time required to “failover” recording to the backup system.
Using CompleteView, any connected
clients would be able to log off the
primary VMS server and log into the
backup VMS server for continued
access to live video and recordings.
Although the process requires
manual user intervention, the task
can be performed easily by the user
simply logging out of and back into
CompleteView. Within
CompleteView this setup can be accomplished using different account credentials or separate
client configuration files hosted on two different Config Servers.
A unique attribute of Dual Stream Recording is the ability to specify a video stream with
different properties for recording to the backup recording server. Most cameras and encoders
not only have the ability to provide multiple video streams that are the same but can also
generally provide those streams at different resolutions and different frame rates. Using this
feature of streaming devices allows the security system designer to plan for a lower frame rate
and resolution to the backup recording server.
Failover methods described thus far are limited in their ability to provide Disaster Recovery, not
because it is impossible to configure, but because it is impractical for most deployments to stream full
resolution and frame rate video over a WAN connection. Dual Stream Recording can allow a
server located offsite to receive lower bit rate streams which would provide disaster recovery,
albeit by recording video at lower quality than the primary, onsite, recording server.
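A quick bandwidth estimate shows why a lower bit rate stream changes the picture for offsite recording. The sketch below is a rough sizing aid under stated assumptions: the camera count, per-camera bitrates, uplink speed and 75% headroom factor are illustrative figures chosen for the example, not measurements or recommendations.

```python
def total_mbps(cameras: int, per_camera_mbps: float) -> float:
    """Aggregate bandwidth needed to stream every camera at the given bitrate."""
    return cameras * per_camera_mbps


def fits_uplink(cameras: int, per_camera_mbps: float, uplink_mbps: float,
                headroom: float = 0.75) -> bool:
    """True if the aggregate stream fits within a fraction of the WAN uplink."""
    return total_mbps(cameras, per_camera_mbps) <= uplink_mbps * headroom


# Illustrative numbers only: 40 cameras and a 50 Mbps WAN uplink
print(total_mbps(40, 4.0), fits_uplink(40, 4.0, 50.0))   # full-quality streams
print(total_mbps(40, 0.5), fits_uplink(40, 0.5, 50.0))   # reduced backup streams
```

Under these assumed figures the full-quality streams need 160 Mbps and do not fit, while the reduced streams need 20 Mbps and do.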
The Dual Stream Recording fault tolerance profile includes:
High Availability: Continued availability of live video and recordings using Dual Stream
Recording requires manual intervention by users of the system. A client user would
need to log out and log back in. Although the backup recording server is immediately
available using this configuration, the time required to recover client services in a failure
event is predicated on the speed of the user identifying the issue and taking action to
recover.
Data Integrity: Because video is continuously recording to both the primary and backup
recording servers, video continues to record during the failure event to either the backup
or primary recording server without interruption or lag time.
Disaster Recovery: Using Dual Stream recording for Disaster Recovery is more practical
than other methodologies mentioned herein. The backup server can be located offsite
and continuously record cameras over a WAN connection by pulling lower bitrate video
streams.
Dual Stream Recording provides some powerful benefits and comes with some tradeoffs. It can
provide some of the highest levels of protection by keeping the camera connection active on
two servers simultaneously. Similarly, recorded video is kept on two storage arrays
simultaneously. On the down-side, two servers may be required per site. If there is only one
server per site normally required, hardware cost would be similar to that of N+1 failover, and
possibly lower because failover licensing would not need to be purchased. In fact, in the case
of using CompleteView for Dual Stream recording, added licenses are provided at no cost to the
user. When using Dual Stream Recording for Disaster Recovery, hardware costs are limited as
compared to other methods, because only one additional, centrally located recording server is
required.
Camera SD Card Recording
Recording video to the designated VMS recording server and simultaneously recording to
camera SD cards does provide some unique fault tolerance features.
Implementation of edge
recording support varies by
VMS. Using CompleteView,
video records to the VMS
server and camera SD card at
the same quality level.
CompleteView allows
administrators to define a
synchronization schedule
from every hour to every
week. The synchronization
schedule defines when CompleteView will download missing video segments from the camera
SD card to fill in any gaps in video on the CompleteView recording server.
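The gap-filling step of such a synchronization can be sketched as interval arithmetic. This is an illustration of the general technique, not CompleteView's implementation: both function names are hypothetical, and times are plain numbers (for example Unix timestamps) rather than any product-specific format.

```python
def find_gaps(recorded, window_start, window_end):
    """Return the (start, end) spans inside the window that no clip covers.

    `recorded` is a list of (start, end) tuples already on the recording server.
    """
    gaps = []
    cursor = window_start
    for start, end in sorted(recorded):
        if end <= cursor:
            continue              # clip lies entirely before the cursor
        if start >= window_end:
            break                 # remaining clips start after the window
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


def segments_to_download(sd_segments, gaps):
    """Pick the SD card segments that overlap at least one gap."""
    return [
        (s, e) for s, e in sd_segments
        if any(s < gap_end and e > gap_start for gap_start, gap_end in gaps)
    ]


server_clips = [(0, 100), (160, 300)]
gaps = find_gaps(server_clips, 0, 300)
print(gaps)                                                    # [(100, 160)]
print(segments_to_download([(90, 120), (120, 150), (180, 210)], gaps))
```

Only segments overlapping a gap are fetched, which keeps each scheduled synchronization from re-downloading video the server already holds.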
When a failure event occurs, video continues to record to the SD card. Because the recording
occurs at the camera, this method of failover protects not only against VMS recording server
failures but also against cabling and infrastructure problems. Using other failover techniques, if
the network cable connecting the camera fails, or infrastructure equipment, such as the switch
the camera is connected to fails, video would not be accessible on either the primary recording
server or the failover server. In order to benefit from protection against infrastructure failures,
the cameras cannot use PoE power, as cabling failures may prevent the camera from receiving
power.
Furthermore, this method provides a small level of geographic distribution of the recorded
video which can be an advantage in some installation scenarios. For example, if a surveillance
system is installed on a container or cruise ship, cameras would be dispersed across the vessel.
Because recording occurs at both the camera and on the VMS recording server, if a flood or fire
occurred in the room containing the VMS recording server, recording would still be present on
the camera SD card. This provides a level of disaster recovery which generally is unavailable
with other fault tolerant deployment architectures.
Camera SD Card recording fault tolerant capabilities include:
× High Availability: During a failure event, where the VMS recording server is down, there
would be no backup server for clients to reconnect to. Although the system would still
be recording during a failure event, live video viewing and investigation functions of the
video security system would be inaccessible.
Data Integrity: Because video is continuously recording to both the VMS recording
server and to the camera SD cards there would be no interruption to recording during a
failure event. A unique benefit of SD card based fault tolerance is the ability to continue
recording even during a failure related to network cabling or switch infrastructure,
provided the camera is powered from a local power source. PoE power to cameras
could be interrupted if the cabling or switch fails.
Disaster Recovery: Using SD Card recording, video recordings are physically located on
the VMS recording server and at the camera. Because of physical distribution of the
recordings, flood, fire, tampering or other events which could cause physical damage to
the VMS recording server may not affect the cameras and corresponding recordings
present on the SD cards. This can provide a level of disaster recovery which may be of
benefit for some markets like cruise and container vessel installations.
SD card recording is a relatively affordable way to implement a level of fault tolerance,
requiring only the purchase of SD cards and compatible cameras. It also provides unique
benefits which can be used in conjunction with other failover methods described herein by
adding protection against failure of network and cabling infrastructure as well as geographic
distribution of the recordings.
Hardware Failover
Redundancy can be built directly into the VMS recording server using a hardware platform with
built in hardware based failover.
Several computer
product
manufacturers offer
single-box server
solutions which
contain a duplicate set
of components
combined with a
watchdog controller. If
any individual
component fails, the
entire platform
instantly fails over to the duplicate set of hardware components.
These systems generally use a single storage array shared by the two sets of computing
components. This ensures that if the hardware fails over, data access is not compromised.
Hardware based failover solutions are simple to implement and are not dependent on VMS
feature sets or architecture. Additionally, failover using this method is nearly instantaneous.
Although this solution provides the highest availability it may also represent the highest
hardware cost. Because there is a duplicate set of components, the consumer pays not only for
the equivalent of two servers but also a premium for the watchdog controller functionality
which monitors the hardware and performs the failover actions.
Fault tolerant capabilities of hardware based failover solutions include:
High Availability: Hardware based failover can provide the best high availability of the
methods reviewed. Failover should be nearly instantaneous allowing clients to maintain
their connection for live video viewing. Hardware failover platforms generally have a pool
of storage shared between the duplicate sets of computing components, so access to
video recorded prior to the failure event is not interrupted. This ensures that not only new
video will be recorded during the failure event, but investigators will also have access to
video recorded prior to the failure.
Data Integrity: The ultra-fast failover speed offers an advantage over other methods
described. When a failure event occurs, existing camera connections should be
maintained and recording should occur without interruption.
× Disaster Recovery: Because all of the video and computing resources are (usually)
contained in a single chassis, there is no disaster recovery capability using this method.
Hardware based failover offers high performance, simple setup and universal compatibility.
Although failover licensing or virtual machine software is not required, this
solution does not come without budgetary considerations. The higher cost of the hardware itself puts
this solution at equal or greater cost than other methods reviewed.
Summary
There are a variety of failover methods which can be applied to VMS platforms. Some are not
dependent on VMS features or architecture and can therefore be applied to any VMS platform.
Additionally, failover methods can be combined for best protection. As an example, camera SD
card recording can be used in combination with other methods to protect against both VMS
recording server failures and network infrastructure and cabling failures.
Each method has different attributes related to the type of fault it protects against,
performance, compatibility and cost. As such, planning for the type of risk to protect against
and identifying the budget for fault tolerance are the first steps to planning for the correct
failover technology.
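As a planning aid, the ratings given in each section can be collected into a small lookup and filtered by the goals a site requires. The labels below are a compressed reading of this paper's text ("delay" for failover with a recovery delay, "manual" for user intervention, "partial" for the limited disaster recovery of SD cards), and the function name is hypothetical; cost and compatibility are deliberately left out.

```python
CAPABILITIES = {
    # method: (high availability, data integrity, disaster recovery)
    "N+1 software failover": ("delay", "delay", "no"),
    "VM failover":           ("delay", "delay", "no"),
    "Dual Stream Recording": ("manual", "yes", "yes"),
    "Camera SD card":        ("no", "yes", "partial"),
    "Hardware failover":     ("yes", "yes", "no"),
}


def methods_meeting(high_availability=False, data_integrity=False,
                    disaster_recovery=False):
    """List methods rated anything other than 'no' for every required goal."""
    required = (high_availability, data_integrity, disaster_recovery)
    return [
        name for name, ratings in CAPABILITIES.items()
        if all(rating != "no" for rating, needed in zip(ratings, required) if needed)
    ]


print(methods_meeting(disaster_recovery=True))
print(methods_meeting(high_availability=True, disaster_recovery=True))
```

A filter like this only narrows the candidates; the qualifiers in each section (recovery delay, manual steps, reduced quality offsite) still need to be weighed against budget.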
ABOUT SALIENT SYSTEMS
Salient Systems offers network friendly, comprehensive IP and analog video surveillance
management systems (VMS) built on open architecture. As the recognized transition leader
from analog to digital video, Salient Systems’ VMS, CompleteView™, is scalable and provides
everything needed to manage a multi-server enterprise from a single desktop. Salient delivers
simple and scalable security today…and tomorrow. For more information about Salient Systems
and CompleteView, visit www.salientsys.com.
ABOUT THE AUTHOR
Brian Carle is the Director of Product Strategy for Salient Systems Corporation. Prior to Salient he worked as the ADP Program Manager for Axis Communications. For information about this white paper or CompleteView, email [email protected].
©2017 Salient Systems Corporation. Company and product names mentioned are registered trademarks of their respective owners.
Salient Systems 4616 W. Howard Lane Building 1, Suite 100 Austin, TX 78728 512.617.4800 512.617.4801 Fax www.salientsys.com