Migrating an MS SQL cluster with a shared RDM disk in a VMware environment

by David Fong

We needed to migrate an MS SQL cluster with a shared RDM disk in a VMware environment to new storage, for both the OS disks and the RDM.  The two nodes of the cluster are located on different ESXi hosts.  The database files and logs sit on the RDM disk, while the OS is on a VMFS datastore.  It was not a very straightforward migration: it involves un-mapping and re-mapping RDMs, copying the databases and all the related files, and finally migrating the OS drives.

We decided to copy all the data to the new database disk and change the drive letter(s) to match the original ones while MS SQL was offline, and that worked out for us. Here’s a brief description of the steps we took:

Preparation:

  1. Allocate a new raw LUN for the database, visible to both ESXi hosts
  2. Allocate two new 2TB LUNs for the OS disks, formatted as VMFS datastores, one for each ESXi host and visible to both (so that the 2nd node can find the RDM mapping file)
  3. Back up the whole OS
  4. Turn off the related application so there will be no new writes to the databases
  5. Back up all databases (a PowerShell sketch follows this list)
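
A minimal sketch of step 5, assuming the SqlServer PowerShell module is installed; the instance name and backup share below are placeholders for your environment:

    # Back up every user database before touching the storage.
    Import-Module SqlServer

    $instance   = "DB1C"            # hypothetical SQL instance name
    $backupPath = "\\backupsrv\sql" # hypothetical backup share

    Get-SqlDatabase -ServerInstance $instance |
        Where-Object { $_.Name -ne "tempdb" } |
        ForEach-Object {
            Backup-SqlDatabase -ServerInstance $instance `
                -Database $_.Name `
                -BackupFile (Join-Path $backupPath "$($_.Name).bak")
        }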

DB LUN (RDM) migration:

  1. Add the new LUN to cluster node 1 as an RDM
    a. Pick SCSI controller 1, SCSI ID 1:1 (the old one is using SCSI 1:0)
    b. Make a note of the new disk’s path and volume ID
  2. Add the new LUN to cluster node 2 as an existing hard disk
    – Pick SCSI controller 1, SCSI ID 1:1, and the independent, persistent option
  3. In Disk Management, create and format new volumes as needed
  4. In Failover Cluster Manager, add the new disk to the cluster (steps 4-12 are sketched in PowerShell after this list)
  5. Assign the new disk to the MSSQL role
  6. DO NOT STOP the MSSQL role; instead:
    a. Click the MS SQL role, then click the Resources tab at the bottom
    b. Stop the SQL Server/SQL Agent resources and leave the disk resources online
    c. Make sure SQL Server is offline
  7. Copy/restore all the DB files from backups to the new disk
  8. Change the drive letter(s)
  9. Update the SQL Server/Agent dependencies
  10. Restart SQL Server/SQL Agent to test
  11. Test DB failover
  12. Check/update resource dependencies
  13. Remove the old disk from the MSSQL role AND from the cluster disk resources
  14. Remove the old disk in Disk Management
  15. Remove the old RDM from the VMs
  16. Test the DB again
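
Here is roughly what steps 4 through 12 look like in PowerShell, assuming the FailoverClusters and Storage modules on the active node; the resource, role, drive-letter, and node names are hypothetical, so check yours with Get-ClusterResource and Get-ClusterGroup first:

    # Steps 4-5: add the new disk to the cluster and move it into the SQL role
    Get-ClusterAvailableDisk | Add-ClusterDisk
    Move-ClusterResource -Name "Cluster Disk 2" -Group "SQL Server (MSSQLSERVER)"

    # Step 6: stop only the SQL resources, leaving the disk resources online
    Stop-ClusterResource -Name "SQL Server"
    Stop-ClusterResource -Name "SQL Server Agent"

    # Step 7: copy the database files to the new disk (example paths)
    robocopy E:\ F:\ /E /COPYALL /DCOPY:DAT /R:1 /W:1

    # Step 8: swap drive letters so the new disk takes over the old letter
    Set-Partition -DriveLetter E -NewDriveLetter T
    Set-Partition -DriveLetter F -NewDriveLetter E

    # Steps 9 and 12: point the SQL Server resource at the new cluster disk
    Add-ClusterResourceDependency    -Resource "SQL Server" -Provider "Cluster Disk 2"
    Remove-ClusterResourceDependency -Resource "SQL Server" -Provider "Cluster Disk 1"

    # Steps 10-11: bring SQL back and test a failover to the other node
    Start-ClusterResource -Name "SQL Server"
    Start-ClusterResource -Name "SQL Server Agent"
    Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node db2c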

OS migration:

  1. Shut down both cluster nodes
  2. Remove the DB disk from cluster node 2; DO NOT DELETE THE DATA FILE
  3. Remove the RDM disk from cluster node 1; DO NOT DELETE THE DATA FILE
  4. Migrate both nodes to their corresponding new datastores (a PowerCLI sketch follows this list)
  5. Add the new LUN back on cluster node 1 as an RDM
    – Pick SCSI controller 1, SCSI ID 1:0; note the disk file path and match the volume ID
  6. Add the new LUN as an existing hard disk to cluster node 2
    – Pick SCSI controller 1, SCSI ID 1:0, and the independent, persistent option
  7. Start-VM db1c
  8. Start-VM db2c
  9. Make sure the MS SQL role starts up correctly
  10. Test DB/failover
  11. Test the application
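
The VMware side of this, as a rough PowerCLI sketch; the vCenter address and datastore names are placeholders, while db1c/db2c are the node names from the steps above:

    # Connect to vCenter first (hypothetical address)
    Connect-VIServer vcenter.example.com

    # Step 4: cold-migrate each powered-off node to its new datastore
    Move-VM -VM db1c -Datastore (Get-Datastore "NewDS-Host1")
    Move-VM -VM db2c -Datastore (Get-Datastore "NewDS-Host2")

    # Steps 7-8: power the nodes back on, node 1 first
    Start-VM -VM db1c
    Start-VM -VM db2c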

And that concludes the migration process.  There are other migration options, like using ALTER DATABASE to change the physical location of the DB files, building a new cluster and doing a full restore, vMotioning the VMs along with the DB files, and so on, but since we needed to keep the same drive letter(s), along with other requirements, we eventually decided that the procedure we followed worked best for us.
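
For reference, the ALTER DATABASE route we decided against would look roughly like this (hypothetical database and file names; the database has to be taken offline and the files moved by hand before it is brought back up):

    # Repoint SQL Server's catalog at the new file locations
    Invoke-Sqlcmd -ServerInstance "DB1C" -Query "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, FILENAME = 'F:\Data\MyDb.mdf');"
    Invoke-Sqlcmd -ServerInstance "DB1C" -Query "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log, FILENAME = 'F:\Logs\MyDb_log.ldf');"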

We did a test migration and recorded the whole process here:

https://drive.google.com/open?id=1tpfEnRvrVny3vmEw3E3x94FqFup_9tPy

It took about 45 minutes for the test run to complete, but that’s because the test DBs were very small and the OS drives had only a basic Windows install.  Please take a look if you’re interested, and shoot me an email at davidmfong@stanford.edu if you have any questions or comments.