Tuesday, July 3, 2012

Running Commands with SRM 5.x

Over the last few months I’ve had the opportunity to develop various commands to be run within SRM recovery plans. During this time I’ve uncovered some interesting results and gotchas.

Command generated Errors – If you look around, there are various blog posts and community pages that provide an insight into running commands on the SRM Server. Most of these suggest using a batch (.bat) or command (.cmd) file to call the script or run the command that you want. The problem with this approach is that the error state reported within the SRM recovery plan relates only to the success or failure of calling the batch or command file named in the recovery plan. For example, ‘c:\windows\system32\cmd.exe /c c:\srm-scripts\callout.cmd’ will call a file called ‘callout.cmd’ located in the c:\srm-scripts folder. If this file exists and can be called, a ‘0’ is returned to SRM and the step gets a ‘Success’ status. If SRM is not able to call the file, a ‘1’ is returned and the step gets a ‘Warning’ status.
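
One way to make the step status reflect the real outcome is for whatever SRM calls to hand the inner command's exit code straight back. The following is a minimal sketch of that idea in Python, not a tested SRM integration. It assumes a Python interpreter is available on the SRM Server and that SRM reacts to the exit code in the ‘0’/‘1’ way described above; the function name run_and_propagate is purely illustrative.

```python
import subprocess
import sys


def run_and_propagate(argv):
    """Run a command and return its exit code unchanged.

    argv is the command and its arguments as a list, for example
    ["cmd.exe", "/c", r"c:\\srm-scripts\\callout.cmd"].
    """
    completed = subprocess.run(argv)
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wrapper.py <command> [args...]
    # The wrapper exits with the same code as the command it ran,
    # so the caller sees the inner failure rather than a blanket 0.
    sys.exit(run_and_propagate(sys.argv[1:]))
```

The point of the design is simply that nothing between the real script and SRM swallows the exit code.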

Calling PowerShell – If you have ever used SRM you will know that currently (5.0 or earlier) SRM is still a 32-bit application, even though it can run on a 64-bit operating system. Most of you will have realized this when setting up the environment, as it requires a 32-bit ODBC system DSN to connect to the database. This holds true when calling PowerShell as well: to call PowerShell you must use the 32-bit path ‘C:\Windows\SysWOW64\WindowsPowerShell\v1.0\Powershell.exe’, otherwise nothing will happen.
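
To make the path choice explicit, here is a small sketch in Python that builds the command line for either flavour of PowerShell on a 64-bit Windows install. The helper names and the script path c:\srm-scripts\callout.ps1 are hypothetical, and the C:\Windows default is an assumption about where Windows is installed.

```python
import ntpath


def powershell_exe(windows_dir=r"C:\Windows", use_32bit=True):
    """Return the path to powershell.exe on 64-bit Windows.

    SysWOW64 holds the 32-bit binaries and System32 the 64-bit ones.
    """
    subdir = "SysWOW64" if use_32bit else "System32"
    return ntpath.join(
        windows_dir, subdir, "WindowsPowerShell", "v1.0", "powershell.exe"
    )


def powershell_command(script_path, use_32bit=True):
    """Build an argument list that runs a script file with PowerShell."""
    return [powershell_exe(use_32bit=use_32bit), "-File", script_path]


# Hypothetical script location, used only for illustration.
command = powershell_command(r"c:\srm-scripts\callout.ps1")
print(" ".join(command))
```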

Single Quotes v Double Quotes – When running commands that have switches that take a value, I’ve had different levels of success depending on the contents of the value being passed. If you’re a programmer (which I’m not) you will know that single quotes tell the command not to interpret anything inside the quotation marks, whereas double quotes allow variables inside them to be interpreted. Using double quotes recently caused unusual behavior in some of our PowerShell commands for values that contained a space and for passwords with unusual characters. I’ve not been able to find any SRM documentation that provides guidance on this, but I would suggest always using single quotes.
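
As an illustration of the single-quote approach, the sketch below wraps a value in PowerShell single quotes. In a PowerShell single-quoted string nothing is expanded, and a literal single quote is written by doubling it, which is the only escaping the helper does. The function name and the sample values are made up for the example, and how SRM itself passes the quoted text on is not something this sketch can confirm.

```python
def ps_single_quote(value):
    """Wrap a value in single quotes for use in a PowerShell command.

    Inside single quotes PowerShell treats the text literally, so
    characters such as $ are left alone. An embedded single quote
    is escaped by doubling it.
    """
    return "'" + value.replace("'", "''") + "'"


# Hypothetical values, for illustration only.
print(ps_single_quote("Recovery Site A"))
print(ps_single_quote("P@ss $w0rd"))
print(ps_single_quote("it's"))
```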

Calling Linux Scripts – Traditionally my clients have been 100% Windows shops, and when looking for guidance on SRM and calling scripts on Linux machines there is a distinct lack of discussion or information available. On a recent engagement this is exactly what I needed to do, and thanks to my colleague and SRM guru Lee Dilworth the answer turned out to be very simple: use the syntax ‘/bin/sh /srm-scripts/callout.sh’.


Thursday, June 7, 2012

SRM 5 & Dell Compellent Best Practice Procedures

I was recently involved in an engagement to implement SRM 5 with a Dell Compellent storage solution. During the engagement we had various issues around how to implement the Dell components. Dell have produced a best practice document, but some elements of it are a little light on detail. This post is a collection of tested procedures that we followed to prove the solution.

Setting up the Environment

During our testing we set up the environment using the following steps:


1. Set up the LUNs and replication using Compellent Enterprise Manager client.
2. Save the restore points using Enterprise Manager client. This is a manual process that Dell talks about within their Best Practice guide and must be done after all major SRM events.
3. Set up datastores in vSphere and present them to the appropriate hosts.
4. Create a Protection Group(s) in SRM for the newly created datastores.
5. Create a Recovery Plan(s) in SRM for the newly created datastores.
6. Configure SRM debug logging in both Data Collectors to collect SRM information. (We did this step in order to capture detailed logging to provide to Dell if required.)
7. Confirm that the SiteA SRA is talking to the remote data collector (hosted in SiteA) and the SiteB SRA is talking to the primary data collector (hosted in SiteB), as per Compellent Best Practice. (This configuration is not 100% clear in the best practice document, and we were given conflicting information by Dell Support when we asked for verification. The document did not define Primary vs Remote Data Collectors and simply referred to Enterprise Manager in both sites.)


Defining the Failover procedure

We defined the following procedure to be run during a failover type of 'Disaster Recovery':

1. Run the Recovery Plan, selecting Disaster Recovery as the recovery type.
2. Save the restore points using Enterprise Manager client, ignoring the inactive ones flagged. (The Compellent Best Practice document says that these can be removed, but our testing showed that this created errors at the reprotect stage so we skipped this process).
3. Reprotect (to reverse replication).
4. Save the restore points using Enterprise Manager client.

Defining the Failback procedure

We defined the following procedure to be run during a failback type of 'Planned Migration':

1. Run the same Recovery Plan as used for the Failover procedure, selecting 'Planned Migration' as the recovery type.
2. Save the restore points using Enterprise Manager client, ignoring the inactive ones flagged. (The Compellent Best Practice document says that these can be removed, but our testing showed that this created errors at the reprotect stage so we skipped this process).
3. Reprotect (to reverse replication).
4. Save the restore points using Enterprise Manager client.

Testing

After defining the procedures we then carried out the following tests to verify the process.

Test 1 - Both Compellent Enterprise Manager data collectors running


Ensuring both data collectors were up and running, the failover procedure was run for the following scenarios:

- DR from SiteB to SiteA, virtual machine TESTVM001, hosted on datastore SRMTest_1
- DR from SiteA to SiteB, virtual machine TESTVM001, hosted on datastore SRMTest_2

Both failovers completed without error, so we then followed the failback procedure to bring everything back to where it started. Both failback procedures completed without error.

Test 2 - Primary data collector shut down

Shutting down the primary data collector hosted at SiteB, we simulated a failure of SiteB and then ran the following scenario:

- DR from SiteB to SiteA, virtual machine TESTVM001, hosted on datastore SRMTest_1

The failover completed successfully, but with a number of errors at the following stages of the recovery plan:

- Pre-Synchronise Storage
- Prepare Protected Site VMs for Migration
- Synchronise Storage

None of these errors were fatal, but they did mean that a Reprotect of the protection group could not be completed until they had been resolved. This meant running a second Recovery once the primary data collector was back online. That Recovery completed successfully and then allowed a Reprotect to run.
With both data collectors online, the failback procedure was then run to bring everything back to where it started. This completed without error.


Test 3 - Remote data collector shut down

Shutting down the remote data collector hosted in SiteA, we simulated a failure of SiteA and then ran the following scenario:

- DR from SiteA to SiteB, virtual machine TESTVM002, hosted on datastore SRMTest_2

This failover completed successfully, but with the same non-critical errors as seen in test 2 along the way. Resolution was the same as test 2, bringing the remote data collector back online and then running a second Recovery. This completed successfully and allowed a Reprotect to run.
With both data collectors online, the failback procedure was then run to bring everything back to where it started. This completed without error.

Wednesday, June 6, 2012

Joining a VM to a Domain with vCloud Director

Q. According to this KB http://kb.vmware.com/kb/1026326, Windows VMs in vCD need to be configured with DHCP to be able to join a domain. I have a VM with a static IP address. How do I get around this requirement?

A. The DHCP issue is Microsoft-related rather than VMware-related. We rely on Sysprep to perform the customization, and this is where the DHCP requirement comes from. As a workaround, we ran a script called SetupCommand.cmd, which calls the netdom.exe command with the following syntax:

netdom.exe join %COMPUTERNAME% /Domain:vmware.com /OU:OU=vApp-VMs,DC=vmware,DC=com /Userd:???? /PasswordD:???? /Reboot

To configure this, before submitting the vApp to the catalog:

1. Right click the VM and select ‘Properties’.
2. Select the ‘Guest OS Customization’ tab.
3. Scroll down to the ‘Customization Script’ section and enter your command.

Friday, June 1, 2012

Installing SRM with HDS Storage

These are notes on how to set up SRM with HDS Storage using a command control device.

If using a VM for the SRM server ensure the following are configured before attempting to install the SRM components:


1. RDM presented to the SRM VM, attached, initialized, but not formatted or assigned a drive letter.
2. Install Hitachi RAID Manager/Command Control Interface to set up the HORCM service (Protected site HORCM0 & Recovery Site HORCM1).
3. Configure the horcm0.conf and horcm1.conf files on the respective SRM Servers.
4. Start the HORCM service (you will need to alter the name of the run config file located in C:\HORCM\Tools to match the service name, 0 or 1).
5. Test the connection locally using C:\HORCM\Tools\pairdisplay -g GRP1 -l (local check) and pairdisplay -g GRP1 -fcx (remote side check). Alter GRP1 to match the group defined in your HORCM.conf file.


FAQ:

Does the SRA work at an individual Hitachi LUN level or HUR (Hitachi Universal Replicator) journal level?
Answer: HUR keeps all LUN pairs in the JNL group in the same status so if the pairs are split for a failover, all pairs in the JNL groups are split. The SRA cannot override this behavior. To split a single LUN it must be in a JNL group without any other LUNs.

Can the SRA do split/sync for just one LUN?
Answer: Yes, this is possible, depending on the configuration of HORCM. The HUR pairs need to be created as a single LUN per JNL group.

If we have multiple LUNs in the same journal, does it split all LUNs at the same time?
Answer: Yes

Does the SRA start a reverse sync immediately (Reverse P-VOL and S-VOL) once the secondary site is brought up?
Answer:  With SRM 4, HUR / TC do horctakeover so replication is reversed and if possible resync is run in the reverse direction. This behavior was changed for SRM 5 to support the additional functions. SRM 5 will only split and reverse the replication but will not resync until the reverseReplication function is called.

Do we have to create a separate HUR journal for each LUN, for individual LUN failover/failback?
Answer: Yes, each LUN needs its own separate JNL group.
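
The journal group behaviour described in these answers can be pictured with a small model. The sketch below is only an illustration in Python of the rule that every LUN in a JNL group is split together. The group names, LUN names and function names are invented for the example and do not come from HORCM or the SRA.

```python
# Hypothetical layout: JNL group name -> LUNs whose pairs it contains.
JOURNAL_GROUPS = {
    "JNL0": ["LUN_A", "LUN_B", "LUN_C"],
    "JNL1": ["LUN_D"],
}


def luns_split_together(journal_groups, lun):
    """Return every LUN that is split when the given LUN is split.

    All pairs in a JNL group share the same status, so splitting one
    LUN splits every other LUN in the same group.
    """
    for members in journal_groups.values():
        if lun in members:
            return list(members)
    raise KeyError("LUN not found in any journal group: " + lun)


def can_fail_over_alone(journal_groups, lun):
    """A LUN can be failed over on its own only if it is alone in its group."""
    return luns_split_together(journal_groups, lun) == [lun]


print(luns_split_together(JOURNAL_GROUPS, "LUN_B"))
print(can_fail_over_alone(JOURNAL_GROUPS, "LUN_B"))
print(can_fail_over_alone(JOURNAL_GROUPS, "LUN_D"))
```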

Useful links:
http://www.hds.com/solutions/applications/vmware/download.html

Friday, April 20, 2012

VMware OVF Tool

So the other day I wanted to import a virtual appliance into vCloud Director, but unfortunately I could only find it in OVA format. Not a problem, I thought, I can use the VMware OVF Tool to convert it and then carry on.

Trouble is, it took a little while to find. So, for future reference and in case anyone else needs to get hold of it in a hurry, the download and documentation can be found using the following communities link:

http://communities.vmware.com/community/vmtn/server/vsphere/automationtools/ovf