Last year I spent some time integrating an IBM DS3524 SAN into our existing iSCSI network. After implementation I found that performance was not up to specification, so I decided to turn to IOMeter to gather some statistics. Using the VMware unofficial storage thread (http://communities.vmware.com/thread/73745) I ran the outlined profiles and compared the results to the other admins' findings. The performance I was seeing was much lower in comparison.
Preconfigured profiles can be downloaded from:
http://vmktree.org/iometer/
Moving on, we started digging further using the "Max-Throughput 100%Read" test, and these were our findings:
CONFIGURATION:

SAN
IBM DS3524
2 controllers each with four 1000Mbps host ports.
24 136GB 15K RPM Disk drives.
Each host port assigned to VLAN 2 or 3 - see above image
SWITCHES
2 HP ProCurve 2510 10/100/1000 switches
Each switch configured with an isolated VLAN (Switch 1 = VLAN 2, Switch 2 = VLAN 3)
Jumbo frames enabled
ESX (I will explain our iSCSI configuration in another blog post)
ESXi 5.1
2 x 1000Mbps NICs assigned to iSCSI
Added LUNs set to use Round-Robin
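As an aside, the jumbo frame and path policy settings above can be sanity-checked from the ESXi shell once SSH is enabled (see CHANGING THE IOPS below). This is only a quick sketch: the first command lists the vmkernel interfaces and their MTU (look for 9000 on the iSCSI ports), the second lists each device and its Path Selection Policy (look for VMW_PSP_RR).
esxcli network ip interface list
esxcli storage nmp device list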
RESULTS
The results from IOMeter showed that we were getting a maximum of 104-113MBps using the Max-Throughput 100%Read test. This was even after trying jumbo frames (on/off), flow control, different RAID options (RAID 1/5/10), the turbo license, smaller LUNs, etc.
We had started to exhaust our options and were putting it down to a SAN fault. Even after getting a replacement SAN (demo unit) in, the issue was still apparent.
Quite a bit of research later, I came across a blog post outlining an issue with ESXi, Round-Robin path selection and iSCSI. It seems that, by default, when ESXi is configured to use Round-Robin it sends 1000 I/O operations down one interface before switching to the next. Changing the IOPS value to 1 balances the load more effectively and increases performance. For us this roughly doubled throughput to approximately 207-210MBps, which was within our tolerance, as the maximum throughput of a 1000Mbps interface is 125MBps, totalling 250MBps across the two configured interfaces.

During our testing we also found the following:
If a single RAID array is created across all disks on a DS3524, any performance-intensive server can affect the performance of other servers, as the read/write operations are spread across all disks.
LESSON: CREATE SMALLER RAID ARRAYS USING 4-6 DISK CONFIGURATIONS
If a device or server is accessing a LUN, the LUN is locked. This can affect the performance of other systems, as the other servers have to wait for the LUN to be freed.
LESSON: IF CREATING LUNS FOR ESX VMs, CREATE SMALLER LUNS WITHIN EACH ARRAY.
Of course, the above lessons apply to ESX LUNs and will differ for other workloads.
CHANGING THE IOPS
Enable SSH:
Within the vSphere Client > Click the ESXi host > Click the "Configuration" tab > Click "Security Profile" > Locate the "Services" heading > Click "Properties"
Locate and Left Click "SSH" > Click "Options" > Ensure "Start and Stop Manually" is selected > Click "Start"
SETTING THE IOPS
Using PuTTY, connect to the ESXi host's IP over SSH > Enter the username and password
Once logged in, type the command below into PuTTY. The output lists all of the iSCSI LUNs that are configured on the ESXi host. I usually copy and paste the list into Notepad++; I can then search and replace to put together a small script which includes the rest of the commands (a sketch of such a script is included after the individual commands below).
esxcli storage nmp device list | grep naa.600
For the commands below, replace naa.<number> with an naa. entry from the list gathered in the step above.
Set the LUN to use Round Robin
esxcli storage nmp device set -d naa.<number> --psp=VMW_PSP_RR
Check that Round-Robin has been configured
esxcli storage nmp device list -d naa.<number>
Set the IOPS to 1 for a LUN
esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d naa.<number>
Check the IOPS for a LUN
esxcli storage nmp psp roundrobin deviceconfig get -d naa.<number>
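For reference, instead of building the command list by hand in Notepad++, the per-device commands above can be wrapped in a small loop run directly in the ESXi shell. This is only a rough sketch, assuming that every device whose identifier starts with naa.600 is an iSCSI LUN that should use Round-Robin with an IOPS limit of 1; adjust the filter to suit your environment.
for DEVICE in $(esxcli storage nmp device list | grep "^naa.600"); do
   echo "Configuring ${DEVICE}"
   # set the path selection policy to Round-Robin
   esxcli storage nmp device set -d ${DEVICE} --psp=VMW_PSP_RR
   # set the Round-Robin IOPS limit to 1
   esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d ${DEVICE}
   # confirm the change
   esxcli storage nmp psp roundrobin deviceconfig get -d ${DEVICE}
done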