Provisioning Services High Availability with a Local vDisk Store
When setting up high availability in a Provisioning Services environment, the question of where to put the vDisk store always comes up. You have several options: the local system, a Windows CIFS share, Network Attached Storage, an iSCSI SAN, or a Fibre Channel SAN. Each option has its pros and cons; for more information, see Citrix support article CTX119286 – Provisioning Server High Availability Considerations. In this post we are going to focus on high availability with a local vDisk store.
The steps for configuring high availability with a local vDisk store are fairly straightforward. The main concern is the disk space needed on each Provisioning Server if you have multiple vDisk images. The upside is that drives are cheap these days, and using a local vDisk store for Provisioning Services high availability is also cheaper than purchasing Network Attached Storage or an iSCSI or Fibre Channel SAN. It also offers more redundancy than a Windows CIFS share, because having a vDisk store on each Provisioning Server eliminates the file server hosting the share as a single point of failure. You could cluster the file server for the CIFS share, but then you are still spending extra money on some type of shared storage for the cluster.

Another concern is the number of disks you can install in your servers. If you have enough room to split the operating system and vDisk store onto different RAID sets, do so; putting the RAID sets on different RAID controllers is even better, and I strongly recommend it. A good example is a server with six drive bays: use two bays for your operating system in a RAID 1 (mirror) set, then use the other four bays for your vDisk store in a RAID 5 set with a hot spare or a RAID 10 set. The more spindles, the better the performance.

The only downside to using high availability with a local vDisk store is that you have to manually copy the vDisk files to all of your Provisioning Servers in the farm. I’m sure this can be automated with scripting or another process; a sketch of one possible approach follows below.
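As a rough illustration, here is a minimal Python sketch of what such a copy script might look like. The server names, the D$ administrative share, and the vDisk name are all assumptions for illustration; adjust them to your environment.

```python
import shutil
from pathlib import Path

# Hypothetical values -- replace with your own servers, store path, and vDisk name.
VDISK_NAME = "XenDesktopImage"      # base name of the vDisk (assumption)
LOCAL_STORE = Path(r"D:\vDisks")    # local vDisk store path
OTHER_SERVERS = ["PVS02", "PVS03"]  # remaining Provisioning Servers in the farm

def copy_vdisk():
    """Copy the .vhd and .pvp files to the same store path on each remaining server."""
    for server in OTHER_SERVERS:
        # Reach the remote store through the administrative share (assumes D$ is accessible).
        remote_store = Path(rf"\\{server}\D$\vDisks")
        for ext in (".vhd", ".pvp"):
            src = LOCAL_STORE / f"{VDISK_NAME}{ext}"
            dst = remote_store / src.name
            print(f"Copying {src} -> {dst}")
            shutil.copy2(src, dst)  # copy2 preserves file timestamps

if __name__ == "__main__":
    copy_vdisk()
```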
So let’s go through the process of setting up Provisioning Services high availability with a local vDisk store. Follow the steps below to configure high availability with a local vDisk store:
1. Configure the vDisk store to be available on all Provisioning Servers in the farm, using the same path to the vDisk store on each server – for example, D:\vDisks.
2. Set up a vDisk on one of the Provisioning Servers in the farm without load balancing or high availability enabled. Make sure the vDisk is in Private Mode (single device, R/W access).
3. Create a Target Device and assign the vDisk to it.
4. Boot the Target Device to the vDisk and create your master image on the vDisk using XenConvert.
5. Shut down the Target Device and make a copy of the vDisk for backups/updates. Keep the backup/updates copy on this server only; do not copy it to the other Provisioning Servers, and do not enable load balancing or high availability on it.
6. Put the vDisk in Standard Mode (multi-device, write-cache enabled).
7. Copy the vDisk (both .vhd and .pvp files) to the remaining Provisioning Servers in the farm, using the same path you created for the original vDisk in step 1 – for example, D:\vDisks.
8. Once the vDisk (.vhd and .pvp files) has been copied to the remaining Provisioning Servers in the farm, enable load balancing and high availability on the vDisk (see the verification sketch after this list).
9. Create the remaining Target Devices and assign the vDisk to them.
10. Now you can boot Target Devices to the vDisk.
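Before enabling load balancing and high availability in step 8, it is worth confirming that the copies on each server actually match the original. Here is a hedged Python sketch that compares MD5 hashes of the local and remote .vhd files; the server names, vDisk name, and paths are the same illustrative assumptions as above.

```python
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a large file in chunks so a multi-gigabyte .vhd fits in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical names/paths -- adjust to your environment.
local = Path(r"D:\vDisks\XenDesktopImage.vhd")
reference = md5_of(local)
for server in ["PVS02", "PVS03"]:
    remote = Path(rf"\\{server}\D$\vDisks\XenDesktopImage.vhd")
    status = "OK" if md5_of(remote) == reference else "MISMATCH - recopy before enabling HA"
    print(f"{server}: {status}")
```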
Note: When you need to update your image, use the backup/updates copy. Make sure the backup/updates copy is in Private Mode (single device, R/W access) and assign it to one Target Device. Make your changes/updates to the image, then start again at step 5 to update all of your Provisioning Servers with the highly available, load-balanced vDisk using a local vDisk store. Another thing to be aware of is that copying large vDisk files between Provisioning Servers during the day can cause performance issues and put unnecessary load on your Provisioning Server network cards. Copying vDisk updates between Provisioning Servers should be done during non-production hours.
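One way to keep the copy out of production hours is to hand the copy script to the Windows Task Scheduler. A minimal sketch, assuming the script from earlier is saved locally; the task name, start time, and script path are placeholders, and the schtasks switches shown are the standard ones, but verify them against your Windows version.

```python
import subprocess

# Register a one-time scheduled task that runs the copy script at 2:00 AM
# (task name, time, and script path are placeholders -- adjust as needed).
subprocess.run(
    [
        "schtasks", "/Create",
        "/TN", "Copy PVS vDisks",                    # task name
        "/TR", r"python C:\Scripts\copy_vdisks.py",  # command to run
        "/SC", "ONCE",                               # run once
        "/ST", "02:00",                              # start time, non-production hours
    ],
    check=True,
)
```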
You should now see the Target Devices get load balanced across the Provisioning Servers in the farm. The Target Devices will also fail over between Provisioning Servers if one goes down, since high availability is enabled. When the downed Provisioning Server comes back up, you can right-click the Provisioning Servers and click Rebalance devices to distribute Target Devices evenly across your Provisioning Services farm.
If you have found this article interesting or if you have any other insights, please feel free to leave comments on this article.
Jarian – can you clarify something for me regarding this – I have read in the documentation “Load balancing is not supported in the NIC failover implementation.” – and have seen conflicting posts about this over in the Citrix forums. Is this regarding active/passive NIC failover to different switches with regards to physical PVS servers? I am assuming that teamed physical NICs using LACP/LAG, on physical PVS servers, working in tandem with other PVS servers leveraging an HA/LB configuration is fine, correct? Thanks.
I just discovered what might be a bug. I’m running PVS 5.1 SP2.
I found that if a vDisk is configured for HA using local storage, when the vDisk is deleted from the store via the console, and the checkbox to delete the associated VHD files is checked, the actual .VHD, .PVP, and .LOK files are only removed from one of the two HA servers.
From what I can tell, it seems to consistently only remove it from the first one in the list. I have to go purge the files from the second one manually.
Although this is not a huge deal if you are aware of it, if disk space is something you need to keep an eye on, this is something you should definitely look out for.
Has anyone else experienced this, or know of a way to fix it?
Cheers,
Shawn
My method of monitoring performance was to keep a close eye on the target device boot times by filtering the application event log. As long as I didn’t see a spike after the file copy was started, I was confident that all was good.
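For what it’s worth, that kind of check can be scripted too. The sketch below shells out to the built-in wevtutil tool to pull recent Application-log events; the provider name used in the query is an assumption, so substitute whatever source your PVS target devices actually log their boot-time events under.

```python
import subprocess

# Provider name is an assumption -- check Event Viewer for the actual source
# your target devices log their boot-time events under.
PROVIDER = "StreamProcess"

# Query the 20 most recent Application-log events from that provider, newest first.
query = f"*[System[Provider[@Name='{PROVIDER}']]]"
result = subprocess.run(
    ["wevtutil", "qe", "Application", f"/q:{query}", "/f:text", "/c:20", "/rd:true"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)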
This article is extremely timely. I’ve been running PVS since it was Ardence 4.0. I used to have one production server and one development server, each of which had vdisks located on local storage. I would do the large file copy at off hours using a script just like Rich posted.
Over the past few months, I’ve migrated over to PVS 5.1 SP2. We now have two production servers set up for HA/LB, and one development server for creating/updating vdisks.
All three servers connect to the same database, which is located on one of the two production servers. This server also runs the license manager.
All three servers are VMs on vSphere 4.0, and all target devices are physical.
I learned long ago that doing large file copies to a production server with target devices connected will have a huge impact on the performance of those target devices.
Here’s something I have recently discovered. Now that I have HA/LB at my fingertips, I’ve found that I can stop the stream service on server “A”, which fails all the target devices over to server “B” within seconds. Then, I can remove server “A” from the list of servers providing that particular store. This ensures target devices are not going to be connecting to “A” at which point, I can safely copy large vdisks to that server during regular hours. Once I’m done with “A”, I simply reverse the process, and copy the vdisks to “B”.
This does two things. First, it means I don’t have to wait until off hours to get a vdisk pushed out. Second, it provides a bigger window for file copies, which has become a concern as the number of vdisks I maintain has increased over time.
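Shawn’s drain-and-copy sequence lends itself to scripting as well. A rough sketch, assuming the stream service can be stopped remotely with the built-in sc utility; the service name and server/vDisk names shown are placeholders, so confirm the actual Stream Service name on your PVS servers first.

```python
import subprocess
import shutil
from pathlib import Path

SERVER = "PVS-A"           # server being drained (placeholder)
SERVICE = "StreamService"  # PVS Stream Service name -- verify on your servers
VDISK = Path(r"D:\vDisks\XenDesktopImage.vhd")

def set_service(server: str, action: str) -> None:
    """Stop or start a service on a remote server using the built-in sc utility."""
    subprocess.run(["sc", rf"\\{server}", action, SERVICE], check=True)

# 1. Stop the stream service so target devices fail over to the other server.
set_service(SERVER, "stop")

# 2. (Manual step in the console: remove the server from the store's server list.)

# 3. Copy the vDisk to the drained server through its admin share.
shutil.copy2(VDISK, rf"\\{SERVER}\D$\vDisks\{VDISK.name}")

# 4. Bring the stream service back up and re-add the server to the store.
set_service(SERVER, "start")
```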
Jarian, is there any reason I should not do things this way?
As long as everything fails over properly and there is no performance impact while copying, it should be fine. Nice process. I will have to test that out.
UPDATE:
Today I followed the same process I outlined in my previous post. However this time I had a different result from what I usually experience. I manually stopped the stream service on one of two HA/LB servers, as I always do. Then, I went back to the PVS console, and refreshed the screen to verify that the server was marked as “down” and that all clients had been re-connected to the other HA/LB server.
After stopping the service, I also removed that server from the list of available servers for that particular store.
Now, here’s the strange thing. As I refreshed, I watched the number of client connections climb up, up, up, then drop down again. This happened over and over for several minutes, until they finally stabilized.
A few minutes later I was notified that while this was going on, those same clients had temporarily hung. When the service for those clients finally came back there was a balloon window on the client stating “Provisioning Server: Service Restored.”
Could it be the order in which I am doing things? Might I have to perform certain tasks (such as removing a server from the list of available servers for a particular store) on specific servers in order for them to work properly?
I am concerned, since I rely on this process to be able to work on these servers during business hours without impacting end users.
Any thoughts or advice are welcome.
Cheers,
Shawn
We have 2 PVS servers with the same vDisk. We have enabled load balancing and high availability on the vDisk, but when we stop the stream service on server “A”, the targets do NOT fail over to server “B”.
Does anyone have an idea where I should look?
Great article! First time I’ve ever set up with local storage and it works great. Did Rich ever share the script? I’d be interested in that same scenario.
Thanks!