Thursday, 01 January 2009
Simpana™ 7.0 - Single Instance Storage
Frequently Asked Questions
Q. How is the Single Instance Storage feature licensed?
A. Single Instance Storage (SIS) is licensed per Media Agent. It is not supported on
NetWare Media Agents. If an SIS-enabled storage policy copy is shared between multiple
Media Agents, each Media Agent involved must have an SIS license.
Q. When/where is Single Instance Storage best used?
A. The principal benefit of SIS comes from retaining multiple full backups of the same
client data. Additional benefit may be realized from similar data across multiple clients and,
lastly, some small benefit from duplicate objects within the same client. Potential space
savings can be estimated by subtracting the average incremental backup size from
the full backup size and multiplying the result by the number of retained full backups
minus 1 (the first full backup is always stored in full).
Q. Is there a predictor tool to tell me how much Single Instance Storage will save me?
A. The primary benefit of Single Instance Storage comes from static files in repeated full
backups, so the more full backups you retain, the more single instance savings you get. The basic
formula is: Backup Size x (Number of Full Backups - 1), where the 1 is the first full backup. For example, if
you have 100 GB full backups and retain 6 full backups on magnetic storage, your potential
savings are 100 GB x (6 - 1), or 500 GB. Of course, this formula assumes that your full backup
contains only files that do not change, which is unlikely. A more exact estimate would
need to factor in new and changed files.
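Here is the basic formula as a short Python sketch. The function name `naive_sis_savings` is hypothetical. The result is an upper bound because it assumes nothing changes between full backups.

```python
def naive_sis_savings(full_backup_gb, retained_fulls):
    """Upper-bound SIS savings: every retained full backup after the first
    is assumed to duplicate the first full backup exactly."""
    if retained_fulls < 1:
        raise ValueError("at least one full backup must be retained")
    # The first full backup is always stored; each additional full backup
    # could, at best, be entirely single instanced.
    return full_backup_gb * (retained_fulls - 1)


print(naive_sis_savings(100, 6))  # 500
```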
There could be additional savings from single instancing duplicate files from the same or
different clients. However, this is considered less significant in storage savings than
repeated full backups.
Predicting what files are static between full backups and the volume associated with those
files is the challenge. New and changed files are included in an incremental backup.
How much of the incremental volume consists of new files rather than changed files affects
the number and frequency of static files that can be single
instanced. Any single instancing prediction based on these assumptions would be of
limited value; a large file included in a single incremental backup could skew the
prediction one way or the other.
Knowing all this, if you still want to make a prediction, here is the easiest way:
If you are already running backups, subtract the average size of your incremental backups
from the size of your full backup to get the estimated single instance storage savings
per full backup. Multiplying this by the number of full backups you intend to retain,
minus 1, gives the potential Single Instance Storage savings.
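Below is a minimal Python sketch of this estimate. `estimated_sis_savings` is a hypothetical name. The 5 GB average incremental size in the usage line is an assumed figure for illustration, not a measurement.

```python
def estimated_sis_savings(full_gb, avg_incremental_gb, retained_fulls):
    """Estimate SIS savings from existing backup history.

    The average incremental size approximates the new and changed data
    between full backups; the rest of each full is treated as static and
    therefore single instanced.
    """
    if retained_fulls < 1:
        raise ValueError("at least one full backup must be retained")
    static_per_full = max(full_gb - avg_incremental_gb, 0)
    return static_per_full * (retained_fulls - 1)


# 100 GB fulls, 5 GB average incremental (assumed), 6 fulls retained
print(estimated_sis_savings(100, 5, 6))  # 475
```

Clamping at zero keeps the estimate sensible when incrementals are unusually large relative to the full backup.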
If you are not yet running backups, you can find the maximum potential savings from
single instancing by using the script listed below. It adds up the bytes of all potentially single-instanced
files in the folder where it is run. Read the instructions and caveats at the beginning of
the script. Subtract your estimated average incremental volume as a percentage (for example, 5%) and
multiply by the number of full backups you intend to retain, minus 1. This gives the
maximum potential Single Instance Storage savings.
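The arithmetic can also be applied to the SIS-eligible volume reported by sis.cmd, as in this Python sketch. The function name and the 5% incremental rate are assumptions for illustration. The script reports megabytes, so the result is also in megabytes.

```python
def max_sis_savings_mb(sis_eligible_mb, incremental_fraction, retained_fulls):
    """Maximum potential SIS savings in MB from a file-system scan.

    sis_eligible_mb: the per-full saving reported by sis.cmd (files at or
    above the SIS minimum object size).
    incremental_fraction: assumed share of that volume that changes between
    full backups, for example 0.05 for 5%.
    """
    if not 0 <= incremental_fraction <= 1:
        raise ValueError("incremental_fraction must be between 0 and 1")
    if retained_fulls < 1:
        raise ValueError("at least one full backup must be retained")
    static_mb = sis_eligible_mb * (1 - incremental_fraction)
    return static_mb * (retained_fulls - 1)


# Scan reported 20,000 MB; assume 5% changes between fulls; retain 6 fulls
print(max_sis_savings_mb(20_000, 0.05, 6))  # about 95,000 MB
```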
@ECHO OFF
SETLOCAL
:::::::::::::::::::::::::::::
::
:: sis.cmd
::
:: Author: M.C. Dahlmeier
:: Date: 11-06-07
::
:: Purpose: Calculates potential Single Instance Storage savings
:: for Full backups on a Windows File System
::
:: Caveats
::
:: - File System objects only
:: - Must be run in each subclient content root folder level
:: - Does not account for exclusions/exceptions
:: - Does not account for compression
:: - SET /A uses 32-bit integers, so individual files larger than
::   2 GB cannot be added and produce an "Invalid number" error
::
:: Instructions: Copy to the host system's subclient content folder
:: and save as sis.cmd
::
:: The script scans all files and subfolders in the current directory and
:: calculates the total volume and the potential savings from single
:: instance storage.
::
:: Results are reported in megabytes for the directory total volume and
:: the single instance storage savings from each subsequent full backup.
:: Actual storage savings depend on the number of retained full backups
:: that are single instanced.
::
:: The variable SIS_MIN assumes the default value of 50 KB (51200 bytes)
:: as the minimum size of a file that is single instanced; smaller files
:: are not single instanced. Change this value to match the actual setting.
::
SET /A SIS_MIN=51200
::
::::::::::::::::::::::::::::::
:: Initialize running sum holders
SET /A SIS_SAVINGS=0
SET /A TOTAL_VOLUME=0
:: Get current directory (quoted so paths containing spaces are kept whole)
SET "CUR_DIR=%CD%"
:::::::::::::::::::::::::::::::
::
:MAIN
::
:: Main Routine to calculate and report totals
::
:::::::::::::::::::::::::::::::
:: Parse all files and folders in current directory
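:: "delims=" keeps paths that contain spaces intact, and /a-d lists
:: every file (including hidden and system files) but no directories
FOR /F "delims=" %%I in ('dir /s /b /a-d 2^>nul') DO CALL :SIS_CALC %%~zI
:: Convert kilobytes to megabytes (1 MB = 1024 KB)
SET /A SIS_SAVINGS /=1024
SET /A TOTAL_VOLUME /=1024
:: Report Results
cls
ECHO.
ECHO Total Volume Found in %CUR_DIR% = %TOTAL_VOLUME% MB
ECHO.
ECHO Potential Single Instance Storage Saving for each
ECHO additional Full backup = %SIS_SAVINGS% MB
GOTO :EOF
::::::::::::::::::::::::::::::::
::
:SIS_CALC
::
:: Subroutine to sum file sizes
::
::::::::::::::::::::::::::::::::
:: Skip entries with a blank size (nothing to add)
if [xxx%1]==[xxx] GOTO :EOF
:: Increment the running sums in kilobytes (rounded to the nearest KB) so
:: that the 32-bit SET /A totals do not overflow once they pass 2 GB
SET /A "TOTAL_VOLUME+=(%1+512)/1024"
IF %1 GEQ %SIS_MIN% SET /A "SIS_SAVINGS+=(%1+512)/1024"
GOTO :EOF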
::::::::::::::::::::::::::::::::
::
:: End of Script
::::::::::::::::::::::::::::::::
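If you would rather not run a batch file, the scan that sis.cmd performs can be sketched in Python. This is a hypothetical port, not a CommVault tool. It walks the folder tree with `os.walk`, applies the same assumed 50 KB (51200-byte) `SIS_MIN` threshold, and reports binary megabytes. Python integers do not overflow, so the 32-bit limits of `SET /A` do not apply.

```python
import os

SIS_MIN = 51200              # bytes; the 50 KB default assumed by sis.cmd
BYTES_PER_MB = 1024 * 1024   # binary megabyte


def scan_sis_candidates(root, sis_min=SIS_MIN):
    """Walk `root` and return (total_bytes, sis_eligible_bytes).

    Files of at least `sis_min` bytes count toward the potential single
    instance savings for each additional full backup.
    """
    total = eligible = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or cannot be read; skip it
            total += size
            if size >= sis_min:
                eligible += size
    return total, eligible


def report(root):
    """Print the same two figures that sis.cmd reports."""
    total, eligible = scan_sis_candidates(root)
    print(f"Total volume found in {root} = {total // BYTES_PER_MB} MB")
    print("Potential Single Instance Storage saving for each additional "
          f"full backup = {eligible // BYTES_PER_MB} MB")
```

Calling `report(os.getcwd())` from the subclient content root mirrors running sis.cmd there. Like the script, it ignores exclusions and compression. Symbolic links to directories are not followed, which is the `os.walk` default.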
Q. Are files with identical content but different names or different owners single instanced?
A. A typical backup stream consists of the actual data and the metadata (such as ACLs)
associated with the file. When SIS is enabled, only the data stream is single instanced.
The metadata, such as the file name and security information, is specific to each object
and is stored individually in the SIDB. This allows the same data to be single instanced
even when it is owned by different users or hosts or stored under different file names.
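The split between the shared data stream and per-object metadata can be modeled with a toy content-addressed store in Python. This is a hypothetical illustration, not the SIDB format. The class names are invented, and SHA-256 is assumed as the signature algorithm because the FAQ does not name one.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Per-object metadata kept for every file, even when its data is shared."""
    name: str
    owner: str
    signature: str  # points at the single stored copy of the data


@dataclass
class ToySisStore:
    """Hypothetical model: data is stored once per unique signature,
    while metadata is stored once per file."""
    data: dict = field(default_factory=dict)     # signature -> bytes
    records: list = field(default_factory=list)  # one FileRecord per file

    def add(self, name, owner, content):
        # The signature covers the data stream only, so the name and owner
        # do not affect whether the content is shared.
        sig = hashlib.sha256(content).hexdigest()
        self.data.setdefault(sig, content)
        record = FileRecord(name, owner, sig)
        self.records.append(record)
        return record


store = ToySisStore()
store.add("report.doc", "alice", b"quarterly numbers")
store.add("copy of report.doc", "bob", b"quarterly numbers")
print(len(store.records), len(store.data))  # 2 1
```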
Q. Will my archive, backup, or auxiliary copy jobs run faster when writing to a Single Instance Storage-enabled storage policy copy?
A. No. SIS improves storage efficiency, not throughput. Incremental jobs and software
compression can be used to minimize the amount of data being transferred.
All original backup, archive, and auxiliary copy data destined for a Single Instance Storage-enabled
copy is written to the magnetic media in full, regardless of where signature
generation is performed. After the data is written, a check is made to determine whether a
duplicate baseline object already exists in the Single Instance Data Base. If so, the
new physical copy is linked to the baseline object and the redundant copy is deleted.
Thus, SIS is “state only” within the storage policy copy: it reduces the data kept at rest, not the data transferred.
A restore or auxiliary copy job always reads the full data set from an SIS-enabled
copy. If the auxiliary copy target is another single-instanced copy, the data is rehashed
and single instanced again at the destination Media Agent.
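The write-then-check sequence can be sketched as a toy Python model. All names are hypothetical, and SHA-256 is an assumed signature. The point is that `bytes_written` grows by the full size of every job, while only the retained bytes shrink.

```python
import hashlib


class PostWriteSisCopy:
    """Hypothetical model of post-write single instancing: each object is
    written in full, then redundant copies are linked to an existing
    baseline and deleted."""

    def __init__(self):
        self.sidb = {}          # signature -> baseline object id
        self.stored = {}        # object id -> bytes kept on media
        self.links = {}         # object id -> baseline object id
        self.bytes_written = 0  # data transferred and written to media
        self._next_id = 0

    def write(self, content):
        obj_id = self._next_id
        self._next_id += 1
        # 1. The full object is always written, so throughput is unchanged.
        self.stored[obj_id] = content
        self.bytes_written += len(content)
        # 2. Only afterwards is the signature checked against the SIDB.
        sig = hashlib.sha256(content).hexdigest()
        baseline = self.sidb.get(sig)
        if baseline is None:
            self.sidb[sig] = obj_id        # first copy becomes the baseline
        else:
            self.links[obj_id] = baseline  # 3. link to the baseline...
            del self.stored[obj_id]        # ...and delete the redundant copy
        return obj_id

    def bytes_retained(self):
        return sum(len(c) for c in self.stored.values())

    def read(self, obj_id):
        # Restores always return the full data, following a link if needed.
        return self.stored[self.links.get(obj_id, obj_id)]


copy = PostWriteSisCopy()
for _ in range(3):
    copy.write(b"A" * 1000)
print(copy.bytes_written, copy.bytes_retained())  # 3000 1000
```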
Q. How do I enable Single Instance Storage on a storage policy copy?
A. Single Instance Storage must be enabled on the storage policy copy when the copy is created.
This setting cannot be changed afterwards. If single instancing is no longer required, you
can disable single instancing on all subclients or use an auxiliary copy to move the data
to a non-single-instanced copy.
Any storage policy copy associated with a magnetic library can be enabled for SIS. For
example, for additional performance gains you might enable spooling on the primary copy and
enable SIS on a secondary copy.
Q. Can I enable Single Instance Storage on a Storage Policy copy using a Static Shared Magnetic Library?
A. Yes. Single instancing can be used with a Static Shared Magnetic Library. The configuration
requires that one non-participating Media Agent host the Single Instance Data Base
(SIDB). This requirement exists because the SIDB is transaction-log based
and only one Media Agent can write to it at any one time. All other Media Agents sharing
the library pass their generated signatures to that Media Agent for comparison.
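Here is a minimal Python sketch of the single-writer rule. It assumes Media Agents are modeled as threads and the SIDB as an in-memory dictionary, both purely illustrative. Only the owner thread reads or updates the SIDB; the data path agents send it signatures through a queue.

```python
import hashlib
import queue
import threading


def sidb_owner(signature_queue, sidb):
    """The only thread that reads or updates the SIDB, mirroring the rule
    that a single Media Agent hosts and writes to it."""
    while True:
        item = signature_queue.get()
        if item is None:                # shutdown sentinel
            break
        signature, reply = item
        is_duplicate = signature in sidb
        sidb[signature] = sidb.get(signature, 0) + 1
        reply.put(is_duplicate)


def data_path_agent(chunks, signature_queue, results):
    """A data path Media Agent: it hashes its own data and passes each
    signature to the SIDB owner instead of writing to the SIDB itself."""
    reply = queue.Queue()
    for chunk in chunks:
        signature_queue.put((hashlib.sha256(chunk).hexdigest(), reply))
        results.append(reply.get())


signature_queue = queue.Queue()
sidb = {}
owner = threading.Thread(target=sidb_owner, args=(signature_queue, sidb))
owner.start()

results_a, results_b = [], []
agents = [
    threading.Thread(target=data_path_agent, args=([b"x", b"y"], signature_queue, results_a)),
    threading.Thread(target=data_path_agent, args=([b"x", b"z"], signature_queue, results_b)),
]
for agent in agents:
    agent.start()
for agent in agents:
    agent.join()
signature_queue.put(None)  # stop the owner once all agents are done
owner.join()
print(len(sidb))  # 3
```

Which agent sees the shared chunk first depends on thread scheduling, but exactly one of the two `b"x"` submissions is reported as a duplicate.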
Q. Can a Single Instance Storage policy copy on a Static Shared Magnetic Library use replicated disks?
A. Yes. However, a real-time replication configuration is not recommended, because the amount
of initial I/O involved in creating the SIS is significant. Scheduled replication using periodic
snapshots is much more efficient and is an effective means of retaining duplicate SIS copies in more
than one location. CommVault provides this scheduled replication for Static Shared
Magnetic Libraries with replicated disks using the Remote Office Backup configuration of
CommVault’s Continuous Data Replication (CDR) product.
Q. How big does the Single Instance Database get?
A. The average size of a unique entry in the SIDB is approximately 200 bytes, and each
duplicate saved object adds another 8 bytes. The size of the SIDB is therefore determined by
the number of objects processed and their level of uniqueness and duplication. Assuming
full uniqueness, you can calculate the maximum expected size of an SIDB by multiplying
200 bytes x number of objects x number of retained backups.
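The sizing rule of thumb can be turned into a quick calculator. The per-entry constants come from this FAQ. The function names and the 1,000,000-object example are assumptions for illustration.

```python
UNIQUE_ENTRY_BYTES = 200   # approximate average size of a unique SIDB entry
DUPLICATE_ENTRY_BYTES = 8  # approximate addition per duplicate saved object


def sidb_size_bytes(unique_objects, duplicate_objects):
    """Approximate SIDB size from counts of unique and duplicate objects."""
    return (unique_objects * UNIQUE_ENTRY_BYTES
            + duplicate_objects * DUPLICATE_ENTRY_BYTES)


def sidb_max_size_bytes(objects_per_backup, retained_backups):
    """Worst case: every object in every retained backup is unique."""
    return UNIQUE_ENTRY_BYTES * objects_per_backup * retained_backups


# 1,000,000 objects per full backup, 4 full backups retained
print(sidb_max_size_bytes(1_000_000, 4))      # 800000000
# If every object were static: 1,000,000 unique plus 3,000,000 duplicates
print(sidb_size_bytes(1_000_000, 3_000_000))  # 224000000
```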
Q. If I lose the Single Instance Data Base can I still restore data?
A. The SIDB is not used during restores. Hence, the loss of the SIDB has no impact on
restore capability or performance.
Q. What happens if I do lose the Single Instance Data Base?
A. Loss of the SIDB causes subsequent data protection operations to restart single instancing
as the index is regenerated, so a new copy of each object will be written. However, this
should be of little concern, since each baseline object is already regenerated on a
regular basis according to the SIS maximum baseline age setting. Your main concern should be
that losing the SIDB prevents pruning of the previously single-instanced data.
Without the SIDB, the data aging process can’t identify which files to prune. To prevent this,
CommVault recommends backing up the SIDB on a regular basis.
A File System iDataAgent should have a subclient configured to protect the SIDB to
another storage policy/library. You should restore the SIDB before continuing SIS
backups. If a new SIDB is generated before the restore is done, the restored SIDB and the
new SIDB cannot be merged.
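A simple reference-count model in Python illustrates why pruning depends on the SIDB. This is a hypothetical sketch, not the actual SIDB schema. Each signature counts how many retained jobs reference it, and data becomes prunable only when that count reaches zero. Without these counts, data aging cannot tell whether shared data is still needed.

```python
class ToySidb:
    """Hypothetical reference-count model of why data aging needs the SIDB."""

    def __init__(self):
        self.refs = {}  # signature -> number of retained job references

    def reference(self, signature):
        self.refs[signature] = self.refs.get(signature, 0) + 1

    def age_job(self, signatures):
        """Drop one aged job's references and return the signatures whose
        data can now be pruned because no retained job still uses them."""
        prunable = []
        for sig in signatures:
            self.refs[sig] -= 1
            if self.refs[sig] == 0:
                del self.refs[sig]
                prunable.append(sig)
        return prunable


sidb = ToySidb()
job1 = ["a", "b"]
job2 = ["a", "c"]
for job in (job1, job2):
    for sig in job:
        sidb.reference(sig)
print(sidb.age_job(job1))  # ['b']  ("a" is still used by job2)
```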
Q. Can I use Data Encryption or Compression with Single Instance Storage?
A. Yes. This is one of the advantages CommVault’s Single Instance Storage has over other
deduplication software and hardware products. If Data Encryption and/or Data Compression is
enabled, the system automatically runs the signature module after data compression and
before data encryption. If the setup contradicts this order, the system automatically
performs compression, signature generation, and encryption on the source client computer.
(Remember to CHEck: Compress, Hash, Encrypt.)
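The CHE order can be demonstrated in Python with `zlib` for compression and SHA-256 for the signature. Python's standard library has no cipher, so `toy_encrypt` is a hypothetical, insecure stand-in that uses a random nonce. It shows only why hashing must happen before encryption: identical data still yields identical signatures even though the encrypted output differs every time.

```python
import hashlib
import os
import zlib


def toy_encrypt(data, key):
    """Placeholder cipher for illustration only (NOT secure): XOR with a
    SHA-256-derived keystream and a random nonce, so the same plaintext
    encrypts differently every time."""
    nonce = os.urandom(16)
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(d ^ s for d, s in zip(data, stream))


def compress_hash_encrypt(data, key):
    """Apply the CHE order: Compress, then Hash (signature), then Encrypt."""
    compressed = zlib.compress(data)
    signature = hashlib.sha256(compressed).hexdigest()
    return signature, toy_encrypt(compressed, key)


key = b"example-key"
sig1, blob1 = compress_hash_encrypt(b"same file contents" * 100, key)
sig2, blob2 = compress_hash_encrypt(b"same file contents" * 100, key)
print(sig1 == sig2)    # True: identical data yields identical signatures
print(blob1 == blob2)  # False: the encrypted streams differ (random nonce)
```

Hashing `blob1` and `blob2` instead would produce two different signatures, so duplicates would never be detected.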
Q. Can I use CommVault’s Single Instance Storage with other dedup devices?
A. SIS should be enabled to optimize data storage on NAS hardware de-dup devices such
as NetApp ONTAP A-SIS or Quantum DXi (CIFS/NFS). Data Domain devices do not require SIS
for optimized storage.
Q. What Single Instance Storage controls are available?
A. You can control the following:
• Where the signature (hash) generation is performed.
Signature generation is enabled on the properties page of the associated subclient.
Options for signature generation are None (default), Client, or Media Agent. If
multiple SIS subclients concurrently use the Media Agent for signature generation,
you may get CPU thrashing and extremely poor performance. For scalability,
signature generation on the client can be enabled and configured
using a subclient policy.
• Where the Single Instance Data Base (SIDB) is located.
The storage policy name and storage policy copy name are combined to create the
folder on the first mount path for the Single Instance Data Base (SIDB) location.
Make sure you do not use illegal file name characters (e.g. /) in your naming convention.
You can change the name and location at any time.
If the system has problems creating or accessing the SIDB folder, the SIS tab in the
Storage Policy Copy Properties page may appear disabled. Verify that the SIDB folder
exists; if it does not, validate the user access privileges and re-create
the storage policy copy.
CommVault recommends not using the default SIDB location (the first mount path)
if multiple mount paths are configured in the Magnetic Library. If the
SIDB mount path fills up, single-instanced jobs to the other mount paths will fail.
• Accessible data path(s) to the SIDB.
While UNC paths can be used, we strongly recommend using only a local path
to the SIDB. Only one Media Agent can write to the SIDB, and only one needs to. In a Static
Shared Magnetic Library configuration, the Media Agent with write access to the
SIDB should not be among the data paths available to clients for the storage policy copy.
Data path Media Agents writing to the shared magnetic library pass their
signatures to the SIDB Media Agent for comparison. Additionally, since the SIDB is
not required for restores or auxiliary copies, there is no value in adding multiple data
paths. The ability to add and remove a data path is provided to facilitate moving
the SIDB if necessary. Only Media Agents of the same byte order (endianness) should
be used when sharing an SIDB. Some host operating systems are bi-endian and
allow you to switch byte order, but in general, use Media Agents running the same
operating system.
• Minimum object size being single instanced.
On the Storage Policy Copy Properties page’s Single Instance tab you can specify
the minimum object size threshold to determine what objects get single instanced.
This has a default setting of 50KB. It is not efficient to reduce that value below
20KB. Objects that are not single instanced are stored as N-xxx-xxx-xxx files.
Single instanced objects are stored as S-xxx-xxx-xxx files.
• Maximum age of a baseline reference object.
On the Storage Policy Copy Properties page’s Single Instance tab you can define
the maximum age of a baseline reference object. As new duplicate objects are
written to the store, if the reference object is older than this value then a new
baseline object will be written to the system. This improves performance over time.
The default value is 90 days.
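The minimum object size and maximum baseline age controls can be combined into a small decision helper. This is a hypothetical Python sketch using the defaults stated above (50 KB and 90 days). The function name and return strings are illustrative only.

```python
from datetime import datetime, timedelta

MIN_OBJECT_BYTES = 50 * 1024           # 50 KB default minimum object size
MAX_BASELINE_AGE = timedelta(days=90)  # 90-day default maximum baseline age


def sis_action(size_bytes, baseline_written_at, now):
    """Decide how a new object is stored (hypothetical decision helper).

    baseline_written_at: when the matching baseline object was written,
    or None if no matching baseline exists in the SIDB.
    """
    if size_bytes < MIN_OBJECT_BYTES:
        return "store without single instancing"  # below the size threshold
    if baseline_written_at is None:
        return "write new baseline"               # first copy of this data
    if now - baseline_written_at > MAX_BASELINE_AGE:
        return "write new baseline"               # baseline too old; refresh it
    return "link to existing baseline"


now = datetime(2009, 1, 1)
print(sis_action(10 * 1024, None, now))                        # store without single instancing
print(sis_action(200 * 1024, now - timedelta(days=30), now))   # link to existing baseline
print(sis_action(200 * 1024, now - timedelta(days=120), now))  # write new baseline
```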
©1999-2008 CommVault Systems, Inc. All rights reserved. CommVault, CommVault and logo, the “CV” logo, CommVault Systems,
Solving Forward, SIM, Singular Information Management, Simpana, CommVault Galaxy, Unified Data Management, QiNetix, Quick
Recovery, QR, CommNet, GridStor, Vault Tracker, InnerVault, QuickSnap, QSnap, Recovery Director, CommServe and CommCell
are trademarks or registered trademarks of CommVault Systems, Inc. All other third party brands, products, service names,
trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners.
All specifications are subject to change without notice.