Introduction and Evaluation of Windows Server 2016 TP4 Hyper-V using Storage Spaces Direct for Hyper-Converged Infrastructure
This is a two-part series. To read the second post, click here.
A couple of years ago, Microsoft released a new feature in Windows Server called Storage Spaces. Among the customers I work with, the feature has not been widely adopted, but it has attracted plenty of interest from Microsoft shops, particularly those looking to leverage it for Hyper-V SMB storage or for scale-out private clouds. I suspect this is mostly due to one of the original limitations of Storage Spaces: every host in the cluster had to see the same storage, either directly or through JBOD SAS expanders, which are not common in the enterprise space. With the upcoming release of Windows Server 2016, a new enhancement to Storage Spaces is being introduced, called Storage Spaces Direct.
To draw a comparison, this is very similar to other scale-out storage solutions such as VMware Virtual SAN (VSAN) and other Hyper-Converged Infrastructure (HCI) offerings on the market. Unlike most HCI offerings, however, Storage Spaces Direct doesn't necessarily need to be used in conjunction with Hyper-V, although it's a very logical fit. For Microsoft shops, this feature is enticing because it promises the lowest-cost, simplest building block for scale-out storage architectures, since the Windows Server licenses are typically already assumed. Additionally, in a future blog post, I'll show you how you can leverage Microsoft's free hypervisor, Hyper-V Server 2016, to deploy Failover Clustering and Storage Spaces Direct without any Windows licensing requirements! Everyone likes free, right?
Initially, I anticipate Storage Spaces Direct will be leveraged by Small-Medium Businesses (SMB) or larger Enterprises with Remote Office, Branch Office (ROBO) distributed architectures where some degree of shared storage is desired in remote locations. I could also see Storage Spaces Direct being used in highly distributed application architectures where single points of failure are addressed at a higher level in the application stack, such as Microsoft Azure and Hyper-V Private Clouds, IIS Web Farms, Virtual Desktop Infrastructure, Remote Desktop Services, etc. In this blog post I'm going to show you how to set up a lab environment running three Windows Server 2016 TP4 virtual nodes with Storage Spaces Direct, specifically for Hyper-V workloads, all nested on a single VMware vSphere 6.0 host. Then, I'll show you how to expand compute and storage to a fourth node, a common scenario when evaluating Hyper-Converged Infrastructure.
I frequently prefer to do lab builds nested inside VMware vSphere, as it requires less coordination than shuffling multiple pieces of hardware around, especially for this type of lab exercise, where a minimum of three physical server nodes running Windows Server 2016 TP4 would otherwise be required. It is much easier to do in a virtualized lab. The type of setup I'm going to demonstrate in this blog post makes a lot of sense to deploy on a 2U 4-node configuration, although you could easily do it with rack-based pizza-box servers. Here's a visual of what the physically equivalent setup would look like using either HP Apollo 2000 or SuperMicro TwinPro hardware:
Doing some quick ballpark pricing, I am confident a three-node setup with spinning or flash local storage, Windows Server licensing, and Storage Spaces Direct can start as low as $15,000-$30,000, or $5,000-$10,000 per server node. Hopefully you can quickly see how compelling and attractive this can be, as most competitive solutions start around $20,000-$30,000 per server node (when including some type of shared or HCI storage). Don't take my word for it; go ahead and price out a four-node SuperMicro TwinPro kit for yourself using this public configuration website: http://www.thinkmate.com/system/superserver-2028tp-hc1r/117613. Unfortunately, this reseller/distributor is fairly limited in their configurations, so you may need a custom configuration for three nodes or single-processor systems. For ROBO, I would probably start with the following physical server configuration per node: 1x E5-2680 v3 processor, 64GB RAM, dual 10GbE, MicroSD, 1x 800GB (or 960GB) SSD, and a Windows Server OEM license. Don't forget to add at least one 10 Gigabit switch for storage replication and VM traffic.
To learn more about Storage Spaces Direct, see Microsoft’s site: https://technet.microsoft.com/en-us/library/mt126109.aspx and blog: http://blogs.technet.com/b/clausjor/archive/2015/05/14/storage-spaces-direct.aspx. Let’s get started!
Introduction and Overview of Deployment Tips
Since I'll be showing how to leverage Storage Spaces Direct for a Hyper-V use case, we need to set up a Windows Server 2016 TP4 template with the correct VMX flags to enable nested virtualization. For this blog post, I will be using VMware vSphere 6.0. Setting the flags for Hyper-V on vSphere 6.0 works well, with one exception: if you deploy nested virtual machines, the Hyper-V Integration Services will not start inside the nested VMs, as it improperly detects that Windows is virtualized. For more details, see the following: https://communities.vmware.com/thread/520832. Ignoring this one limitation, we can easily proceed with building a three-node cluster, nested in VMware vSphere 6.0.
Updated 2016.3.17: If you use vSphere 6.0u2 or later, the Hyper-V Integration Services issue has now been resolved! See the screenshot in the communities thread above showing Integration Services working properly in a nested VM.
To start, build a template VM for Windows Server 2016 Technical Preview 4 with the following:
Guest OS: Microsoft Windows Server Threshold (64-bit)
VMX Parameter: guestOS = "windows9srv-64"
Compatibility: ESXi 6.0 and later (VM version 11)
VMX Parameter: virtualHW.version = "11"
VMware Tools: version:9537 (Current)
VMX Parameter: N/A
Virtual Hardware: CPU
Hardware virtualization: Expose hardware assisted virtualization to the guest OS
Additional Advanced VMX Parameters:
hypervisor.cpuid.v0 = false
vhv.enable = TRUE
Additional parameters can be set through vSphere Web Client, VM -> Edit Settings -> VM Options -> Advanced -> Edit Configuration.
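If you'd rather script these settings than click through the Web Client, here's a minimal PowerCLI sketch. It assumes PowerCLI is installed, you've already run Connect-VIServer, and the template name WS2016TP4-Template is just a placeholder for your own VM:
# Placeholder VM name; connect to vCenter with Connect-VIServer first
$vm = Get-VM -Name "WS2016TP4-Template"
# Equivalent of checking "Expose hardware assisted virtualization to the guest OS"
New-AdvancedSetting -Entity $vm -Name "vhv.enable" -Value "TRUE" -Force -Confirm:$false
# Hide the hypervisor CPUID bit so Hyper-V will install inside the guest
New-AdvancedSetting -Entity $vm -Name "hypervisor.cpuid.v0" -Value "FALSE" -Force -Confirm:$false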
I have attempted a variety of settings and combinations of configurations, and this setup is consistently successful. For example, if the Guest OS or Hardware Version is set lower, Hyper-V may install, but nested VMs won't power on, failing with the following error:
Similar errors appear if you try to run Windows Server 2016 Technical Preview 4 on VMware vSphere 5.5. Unfortunately, there is no workaround to get the Hyper-V Integration Services to start within the nested VMs (short of moving to vSphere 6.0u2 or later, per the update above).
I'd recommend you get a single VM working with Windows Server 2016 TP4 Hyper-V to use as a template for deploying additional VMs. You can download the latest Windows Server 2016 Technical Preview ISO from the following link: https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-technical-preview
Creating a Template VM for Windows Server 2016 Hyper-V and Failover Clustering
For the template, install Windows Server 2016 TP4 using the defaults. Additionally, I disable the firewall and enable RDP prior to converting the VM to a template. If you use MDT to slipstream the drivers into the installation, you can use the VMXNET3 network adapter and the Paravirtual SCSI controller for the system disk.
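For those keeping everything scripted, here's a hedged sketch of that template prep from an elevated PowerShell prompt (lab use only; the registry value below is the standard RDP toggle):
# Disable the Windows Firewall on all profiles (fine for a lab, not for production)
Set-NetFirewallProfile -Profile Domain,Private,Public -Enabled False
# Allow Remote Desktop connections
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' -Name fDenyTSConnections -Value 0
# Open the RDP firewall rule group (harmless if the firewall is already disabled)
Enable-NetFirewallRule -DisplayGroup "Remote Desktop"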
Next, from this working template I've deployed three VMs for my lab environment. On each of these three VMs, I've added a secondary SCSI controller and one 800GB virtual disk. For the SCSI controller, you can use Paravirtual, but you must select SCSI Bus Sharing: Physical. If you don't select this option for the SCSI controller, the Cluster Validation process will fail and you won't be able to add the disks to Storage Spaces (it looks for the SCSI page 83h VPD descriptor).
Here's what one of the VMs looks like during customized deployment:
Once all three VMs have been deployed and are on the domain, we can start with the configuration. All of these configurations can be done through the GUI using Server Manager and various Control Panel commands. Instead of capturing and sharing screenshots, I’ll provide a narrative of the steps to perform, along with PowerShell scripts to automate the process.
Configuring Windows Server 2016 Virtual Servers for Hyper-V and Failover Clustering using PowerShell
First, we need to set Static IP addresses for each of our three nodes. Feel free to customize and reuse the following snippet for assigning a static IP address via PowerShell:
### BEGIN VARIABLES ###
$OldNICName = "Ethernet0"
$NICName = "VLAN24_Desktops"
$IPAddress = "172.16.254.134"
$PrefixLength = "19"
$DefaultGateway = "172.16.224.1"
$DNSServerList = "172.16.2.201","172.16.2.202","172.16.2.203","172.16.2.204"
### END VARIABLES ###
Get-NetAdapter $OldNICName | Rename-NetAdapter -NewName $NICName
New-NetIPAddress -InterfaceAlias $NICName -IPAddress $IPAddress -PrefixLength $PrefixLength -DefaultGateway $DefaultGateway
Set-DnsClientServerAddress -InterfaceAlias $NICName -ServerAddresses $DNSServerList
Next, we need to install a couple of Roles and Features using Server Manager: specifically Hyper-V, Failover Clustering, and the associated administrative tools. Since this needs to be done on each of the three nodes, again, feel free to customize and reuse the following snippet to install the roles and features via PowerShell:
Add-WindowsFeature Hyper-V
Add-WindowsFeature RSAT-Hyper-V-Tools
Add-WindowsFeature Failover-Clustering
Add-WindowsFeature RSAT-Clustering
Stringing it all together, here’s what the output looks like after running from an elevated PowerShell prompt:
After the roles and features have been installed, we need to reboot each node. Of course, we can do this via CMD or PowerShell using the following:
Shutdown -f -r -t 0
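If you'd rather stay in PowerShell, something like the following works too; the remote variant assumes WinRM is available and uses the node names from this lab:
# Restart the local node
Restart-Computer -Force
# Or restart all three nodes remotely and wait until PowerShell is reachable again
Restart-Computer -ComputerName VMWS16TP4HV-01,VMWS16TP4HV-02,VMWS16TP4HV-03 -Force -Wait -For PowerShell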
Once each node has rebooted, we can add a Hyper-V virtual switch using the following:
New-VMSwitch -Name "VLAN24_Desktops" -AllowManagementOS $true -NetAdapterName "VLAN24_Desktops"
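Since the switch needs to exist on all three nodes, a hedged sketch using PowerShell remoting (assuming WinRM is enabled, as it is by default on domain-joined Windows Server, and that the adapter was renamed to VLAN24_Desktops on each node) would be:
$nodes = "VMWS16TP4HV-01","VMWS16TP4HV-02","VMWS16TP4HV-03"
Invoke-Command -ComputerName $nodes -ScriptBlock {
    # Create the external virtual switch bound to the renamed management adapter
    New-VMSwitch -Name "VLAN24_Desktops" -AllowManagementOS $true -NetAdapterName "VLAN24_Desktops"
}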
At this stage we can review the Failover Cluster and Hyper-V consoles to see the changes that have been applied to each node. Since Storage Spaces Direct for Failover Clustering is a fairly new concept with Windows Server 2016, I’ll share screenshots for manual configuration, along with the PowerShell commands to automate the process.
Creating and Validating the Cluster using Failover Cluster Manager
Once all nodes have been configured using the process above, open Failover Cluster Manager and create a new cluster. You can find Failover Cluster Manager under the Start Menu, All Apps, Windows Administrative Tools.
On the right Actions Pane, click Create Cluster:
Click next:
Enter the server names of each of the nodes to be added to the cluster. Ensure each connect and validate, then click Next:
Click Next to run the validation report:
Click Next:
Select Run only tests I select and click Next:
Windows Server 2016 introduces a new test to validate against, Storage Spaces Direct. This test is not selected by default when you run all tests, so we have to manually check the box to validate for Storage Spaces Direct. Check the box and click Next:
Click Next to confirm:
Running all tests will take a couple minutes. You can watch the progress as it’s completing:
Click View Report once finished:
Verify there are no Failed categories in the report:
If the VM’s SCSI controller was not set to SCSI Bus Sharing: Physical you will see the Storage Spaces Direct section with a Red X and a Failed status. Digging a little deeper, you’ll see an error similar to the following:
If you see the SCSI page 83h VPD descriptor errors, go back and set each host's SCSI controller to SCSI Bus Sharing: Physical. This is why I recommend putting the secondary (non-system) disks on a separate Paravirtual SCSI controller, as described in the intro.
Once the report validation process has completed, we can finish building the cluster. Provide a Cluster Name and IP Address for the Cluster network and click Next:
Click Next:
Click Finish:
The cluster should now be online!
Creating and Validating the Cluster using PowerShell
To create and validate the cluster using PowerShell, change the variables below and run the following:
New-Cluster -Name VMWS16TP4HV-CL01 -Node VMWS16TP4HV-01,VMWS16TP4HV-02,VMWS16TP4HV-03 -StaticAddress 172.16.224.130
Test-Cluster -Cluster VMWS16TP4HV-CL01 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
Simply substitute the Cluster Name, list of nodes, and IP Address prior to running the above commands. Don’t forget to review the Cluster Validation Report, which is permanently stored under C:\Windows\Cluster\Reports. Here’s what the PowerShell output looks like:
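Once the cluster is online, a quick sanity check from PowerShell (using the same cluster name as above) might look like this:
# Confirm the cluster exists and all nodes are Up
Get-Cluster -Name VMWS16TP4HV-CL01
Get-ClusterNode -Cluster VMWS16TP4HV-CL01 | Select-Object Name, State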
Next, open Server Manager and go to All Servers. In the top right, click Tasks and Refresh:
Since the servers are now in a cluster, Server Manager will automatically discover and add all of the cluster's nodes to the console:
Enabling Cluster Storage Spaces Direct using PowerShell
If you navigate to File and Storage Services, you'll probably be as baffled as I initially was while plugging through creating the Storage Spaces Direct virtual pools and disks. Unfortunately, it doesn't appear that enabling Storage Spaces Direct has been exposed through the Server Manager GUI. If I'm simply missing where it's hidden, feel free to leave a comment below! To enable Storage Spaces Direct, run the following from an elevated PowerShell prompt:
Enable-ClusterStorageSpacesDirect
Without running this command, Server Manager will behave very similar to Storage Spaces prior to Windows Server 2016, only showing local disks to each node.
While the command is running, watch the progress in the top header section as Storage Spaces reaches out to each of the nodes in the cluster to determine eligible disks.
There’s no output once the command has finished running, but if you go back to Server Manager it will look drastically different. Let’s go ahead and create a pool and disk using Storage Spaces Direct.
Disks view prior to enabling cluster Storage Spaces Direct:
Disks view after enabling cluster Storage Spaces Direct:
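If you prefer to double-check from PowerShell rather than Server Manager, a hedged way to see the disks the clustered subsystem now considers poolable is:
# List the clustered storage subsystem and its poolable disks
Get-StorageSubSystem -FriendlyName "Clustered*"
Get-StorageSubSystem -FriendlyName "Clustered*" | Get-PhysicalDisk |
    Where-Object CanPool -eq $true |
    Select-Object FriendlyName, SerialNumber, Size, CanPool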
Deploying a Storage Pool and Disk using Server Manager
Navigate to Server Manager -> File and Storage Services -> Volumes -> Storage Pools. In the top right, click Tasks, New Storage Pool:
Click Next:
Name the pool and select the Primordial pool from the list. Click Next:
Select all appropriate disks and click Next:
Click Create:
Review the progress:
Check the box at the bottom to create a virtual disk and click Close:
Click Next:
Select the Pool and click Next:
Name the Disk and click Next. If I had multiple tiers of storage, this would be the time to specify to use storage tiers.
Leave Enclosure Awareness selected and click Next. If I had multiple disks attached to each of the nodes, this would ensure that my mirror and parity copies of data do not end up on the same nodes.
Select the desired storage layout (Mirror or Parity) and click Next:
Specify the size of the disk, which will in turn be the size of the CSV where we'll place virtual machine storage. This can be expanded at a later time if you start too small. Click Next.
Review and click Create:
Deselect Create Volume in the bottom left and click Close. We need to deselect Create Volume because the disk needs to be put into maintenance mode (through Failover Cluster Manager) before we can create and format a volume.
Open up Failover Cluster Manager if it was closed and connect to the cluster. Expand Storage and select Disks. Select the newly created disk and from the Actions pane click More Actions -> Turn On Maintenance Mode:
Determine the Owner Node listed for the volume and on that server open up Disk Management to create a new volume, NTFS formatted. Right click the Unallocated space and click New Simple Volume:
Click Next:
Click Next:
Click Next:
Name the volume and click Next:
Click Finish:
Validate the new partition was successfully created and formatted:
In Failover Cluster Manager, take the disk out of maintenance mode by selecting More Actions -> Turn Off Maintenance Mode:
From the Actions Pane, select Add to Cluster Shared Volumes:
The Assigned To property will change from Available Storage to Cluster Shared Volume as shown below:
We can verify the new Volume1 appears correctly under C:\ClusterStorage:
Deploying the Storage Pool and Disk through PowerShell
To repeat the manual steps outlined above, there are a couple of simple PowerShell commands we can run.
The following command will create a Storage Spaces Direct virtual pool using all available disks in the cluster. Change the StorageSubSystemName and FriendlyName values shown below. Additionally, you can change ResiliencySettingNameDefault from Mirror to Parity:
New-StoragePool -StorageSubSystemName "VMWS16TP4HV-CL01.demo.entisys.com" -FriendlyName VMWS16TP4HV-CL01-Pool1 -WriteCacheSizeDefault 0 -ProvisioningTypeDefault Fixed -ResiliencySettingNameDefault Mirror -EnclosureAwareDefault 1 -PhysicalDisk (Get-StorageSubSystem -FriendlyName "Clustered*" | Get-PhysicalDisk)
The following command will create a virtual disk named Disk1 from Pool1 with a size of 400GB. Replace FriendlyName, StoragePoolFriendlyName, and Size (the 400 in the math in parentheses). Additionally, you can change ResiliencySettingName from Mirror to Parity:
$NewvDisk = New-VirtualDisk -FriendlyName VMWS16TP4HV-CL01-Disk1 -StoragePoolFriendlyName VMWS16TP4HV-CL01-Pool1 -Size (400*1024*1024*1024) -ResiliencySettingName Mirror
Put the Cluster disk into Maintenance Mode:
Get-ClusterResource -InputObject $NewvDisk | Suspend-ClusterResource
Create a partition, drive letter, and format the newly created Virtual Disk:
Get-Disk -VirtualDisk $NewvDisk | New-Partition -UseMaximumSize -AssignDriveLetter | Format-Volume -Force
Take the cluster disk out of Maintenance Mode and add as Cluster Shared Volume:
Get-ClusterResource -InputObject $NewvDisk | Resume-ClusterResource | Add-ClusterSharedVolume
Here’s what the PowerShell output looks like, all put together:
Reviewing the results, doing it through PowerShell has the exact same outcome, but requires about 20-30 fewer clicks!
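As a final sanity check after either method, the pool, virtual disk, and CSV can all be reviewed from PowerShell (names match the examples above):
# Confirm the pool and virtual disk are healthy and the CSV is online
Get-StoragePool -FriendlyName VMWS16TP4HV-CL01-Pool1 | Select-Object FriendlyName, HealthStatus, Size
Get-VirtualDisk | Select-Object FriendlyName, ResiliencySettingName, HealthStatus, Size
Get-ClusterSharedVolume -Cluster VMWS16TP4HV-CL01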
Deploying a Nested Virtual Machine and Reviewing Basic Storage Performance Results
Next, we’ll create a Windows Server 2016 TP4 Virtual Machine inside this CSV on the Storage Spaces Direct pool. In Failover Cluster Manager right click Roles. Select Virtual Machines -> New Virtual Machine:
Select the node that you’re logged into and click OK:
Click Next:
Name the VM and place it in the appropriate volume:
Select Gen2:
Assign the appropriate memory:
Attach to the appropriate network:
Create the virtual hard disk:
For the ISO, I'll attach a LiteTouch boot image I created from my MDT server. You can just as easily copy down the Windows Server 2016 TP4 ISO and build the VM manually:
Click Finish:
Verify the VM was successfully made highly available and click Finish:
Right click and Start the VM, then right click again and Connect to open the console:
If you have any issues getting the VM to power on, please review the recommended VM configurations and additional advanced VMX parameters to make sure nested virtualization is correctly configured.
During the VM installation process, I experienced surprisingly good storage performance, especially considering all the VMs are on the same host and share the same underlying storage. While this shouldn't be used as a reference point for Storage Spaces Direct performance, it's certainly not bad given all the factors weighing against it (disk transfers between 100-200 MB/s and 1,000-1,500 IOPS for a single VM).
Given what I've seen so far, I'd be keenly interested in doing a high-end hardware performance test just to see how it holds up against what I've come to expect from storage.
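If you want to put your own rough numbers on it, one option is Microsoft's free DiskSpd utility run inside the nested VM. The parameters below are just an illustrative starting point (10GB test file, 60 seconds, 8K random I/O at 30% writes with caching disabled), not a tuned benchmark, and the file path is arbitrary:
# Run from an elevated prompt in the guest after downloading DiskSpd from Microsoft
.\diskspd.exe -c10G -d60 -r -w30 -t4 -o8 -b8K -Sh -L C:\dsptest.dat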
Node Failure Scenario Testing and Results
Now that we have a VM deployed and up and running, let’s do some failure tests:
With the VM still running, I’m going to forcefully stop one of the other nodes in the cluster.
Within a couple seconds, Failover Cluster Manager reflects the failure (requires a Refresh of the console):
The VM is still fully responsive and even responds as expected to a reboot request:
Server Manager shows the node is failing to respond to queries. Additionally, in the Storage Pools view, we can see the degraded status on the Virtual Disk and the corresponding Physical Disk that is offline:
Interestingly, I am not prevented from creating additional virtual disks during the degraded state, but I do get a warning about enclosure awareness:
The disk is created successfully, though of course with a failure when attempting to remotely connect to the failed node:
With confidence that data services were not degraded in this failure scenario, I’ll power back on the previously failed node.
Refreshing the Server Manager console and rescanning the storage pools shows the pools and virtual disks returned to a healthy state:
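The same health information is available from PowerShell, which is handy if you want to script these failure tests; something along these lines:
# Review pool, virtual disk, and physical disk health during and after the failure
Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus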
Adding a New Node and Scaling Out the Virtual Storage Pool
Next, let's walk through adding a new node to the cluster. I'll build the fourth node identically to the procedures defined above, installing the same Roles and Features and assigning a static IP address. To add the new node, open Failover Cluster Manager and, in the Nodes section, click Add Node in the Actions Pane:
Click Next:
Enter the server name and click Next:
Select Yes and click Next to run through the same validation tests using the procedures outlined above:
Be warned that if you run any Storage tests while workloads are actively using the storage, it will be taken offline during the test. To avoid this, manually deselect the Storage tests.
Once the tests have completed, ensure that Add All Eligible Storage is selected and click Next:
Validate the new node was added to the cluster successfully and click Finish:
Go back to Server Manager and click the All Servers node. In the top right click Actions -> Refresh for Server Manager to add the new node to the console:
Go back to Storage Pools and after refreshing, you will see the primordial pool from the new server node. Next to Physical Disks click Tasks -> Add Physical Disk:
Select the newly added disk and click OK:
A dialog will pop up with progress for a couple seconds, then the console will update showing the updated capacity for the Storage Pool:
That's all there is to it! We've successfully added a new node to the cluster and brought its local storage into management, increasing the Storage Pool size from 2.34TB to 3.12TB. The process is nearly identical for adding new storage to existing nodes: simply install the new storage, rescan, and add the disks from the primordial pool to the respective storage pool. Pretty awesome stuff.
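For reference, a hedged PowerShell sketch of the same scale-out steps is below; the fourth node name follows the naming pattern used earlier in this post and is assumed, as is the pool name:
# Add the fourth node to the existing cluster
Add-ClusterNode -Cluster VMWS16TP4HV-CL01 -Name VMWS16TP4HV-04
# Add its newly visible local disk(s) to the existing Storage Spaces Direct pool
$newDisks = Get-StorageSubSystem -FriendlyName "Clustered*" | Get-PhysicalDisk | Where-Object CanPool -eq $true
Add-PhysicalDisk -StoragePoolFriendlyName VMWS16TP4HV-CL01-Pool1 -PhysicalDisks $newDisks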
Summary
As a recap, in this blog post I’ve shown how you can evaluate Windows Server 2016 Technical Preview 4 using VMware vSphere 6.0, including the necessary VMX modifications and tricks to get Storage Spaces Direct working inside a vSphere VM. I’ve also shown you how you can take a default installation of WS 2016 TP4 and customize it with the necessary roles and features for Hyper-V, Failover Clustering, and Storage Spaces Direct capabilities using PowerShell. Then, I’ve shown how to Enable and Configure Storage Spaces Direct, both through the GUI and through PowerShell. Finally, I’ve performed some basic failure scenarios and added new nodes to the cluster, showing how simple it is to expand storage using Storage Spaces Direct.
Continue reading part two, where I will cover Free MDT Tools and PowerShell Scripts to Fully Automate the Deployment and Configuration of Hyper-V Server 2016, Failover Clustering, and Storage Spaces Direct for Hyper-Converged Infrastructure!
I hope this blog post has been informative and useful as we get closer to Windows Server 2016 general availability! As always, if you want to leave feedback with any comments, questions, or concerns please feel free to do so in the section below! If this blog post has been useful to you, please tweet or share with your personal network.
Thanks!
Hi, love this post, very informative. Thanks.
Q1) if I have 1 HP Apollo 2000, 2U 4-Node, dual-port 10GB RDMA nic and 2x 10GB L3 Switches.
can is just modify the host file, and not use any external server, just use the 4 hyper-v server to create my HCI?
Q2) Storage Space Direct & Cluster Shared Volume, are these 2 not the same? what is the diff? don’t they do the same job?
Q3) It’s 2017-Nov, if I change a few stuff, what will the power-shell script look like:
a) if I had 1x 250GB SSD (for cache) & 1x 2TB HDD (for slow data)?
a1) if I wanted to use ReFS for all storage format? hypervisor & VM.
a2) if I then later added a 1TB SSD (for fast data) ?
a3) if we enable dedup & compression?
Please & Thanks.
Very very nice post!! Great job and useful!
Pretty stoked about this 2016 feature and I just followed your guide – worked a treat.
Also, Thanks!
Hi Dane,
What a great and elaborate post. Walks you through the whole thing and gives you a real understanding of how it works. I really appreciate the PowerShell commands; GUI is nice for one time but PowerShell is better.
I also wanted you to know that the link at the end refering to part 2 is not working, it’s a redirect to wp-admin. Should be http://blog.itvce.com/2015/12/03/free-mdt-tools-and-powershell-scripts-to-fully-automate-the-deployment-and-configuration-of-hyper-v-server-2016-failover-clustering-and-storage-spaces-direct-for-hyper-converged-infrastructure/
Regards, Ruud
Excellent, thanks for the feedback. I’ve fixed the broken link at the bottom.
Hopefully this blog post was useful to you!
–@youngtech
Hi!
Nice work! Clear and easy to follow guideline for anyone who wants to evaluate S2D! But one question: so you managed to configure S2D clustered storage with only 3 nodes? This is contradicting the official hardware requirement as posted by Microsoft https://technet.microsoft.com/en-sg/library/mt126109.aspx?f=255&MSPPError=-2147217396
It says: Storage hardware: The storage system consisting of a minimum of four storage nodes with local storage. Each storage node can have internal disks, or disks in an external SAS connected JBOD enclosure. The disk devices can be SATA disks, NVMe disks or SAS disks.
This is an interesting find, because some folks out there are using the 4 nodes requirement as a minus point to compare with products like EMC ScaleIO.
Hi David,
Yes, that’s correct. Microsoft’s documentation clearly states four node minimum, but I demonstrated there’s only a three node minimum to establish quorum (node majority) for Storage Spaces Direct. Failing a node does not impact the cluster, even when going from three to two. Not sure why Microsoft’s documentation states four node minimum, but it doesn’t seem to be a hard limitation.
Hope you enjoyed the blog post and series.
Thanks!
Dane