How I fixed an iSCSI link and found 28 bytes

One of the many popular features of the XIV is the ability to replicate using iSCSI.  On  XIV Gen3 there are now at least 10 and up to 22 active iSCSI ports on each machine (depending on how many modules you order).

Implementation of the iSCSI connection between two XIVs is a piece of cake.   If both XIVs are defined to the XIV GUI (which they should be), you just need to drag and drop links between XIVs to bring the iSCSI mirroring connections alive.   If the networking gods are with you, the link goes green.  But…  if the networking gods are against you… the link stays red and then the question is… what to do?

Old fashioned problem diagnosis leads us straight to the ping command.  However I routinely find that the ping command works fine (all interfaces respond), but the link stubbornly remains  red.

The first possible problem is that iSCSI uses TCP port 3260, so hopefully there are no firewalls blocking that port.
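A quick way to check the firewall question from a host on the same network segment is to attempt a plain TCP connection to port 3260.  The following is a minimal sketch, not an XIV tool: the function name tcp_port_open is made up for illustration, and a successful connect only shows that the port is reachable, not that an iSCSI login would succeed.

```python
import socket


def tcp_port_open(host: str, port: int = 3260, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration. It only tests TCP reachability
    (for example, that no firewall is dropping the port); it does not
    speak the iSCSI protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

You would call it as tcp_port_open("address-of-remote-iSCSI-port") and expect True when nothing in the path blocks TCP port 3260.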

The second possible problem is the MTU size (Maximum Transmission Unit).   When we define the iSCSI interfaces on the XIV we set the MTU as a value of up to 4500 bytes. When we establish connections between two XIVs, each XIV will send test packets that are sized to the MTU.   If the intervening network does not support that packet size, the packets will be dropped by the network, because the XIV sets the don’t fragment flag to ON.

So how to work out what the MTU is?   Well the first thing to do is ask your friendly networking team member.  But sometimes I find that the intervening networks are controlled by third parties, which means that getting a straight (and reliable) answer can prove difficult.  Even worse, some of these third parties charge a fee every time you call them, so there may be hesitation to even get them involved!

One simple trick is to re-use the ping command but play with payload sizes. We can use a command that looks like this:

ping -f -l 1472

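That command sends a ping with a payload of 1472 bytes to the target IP address, which goes at the end of the command (it is not shown in the example).  This is the Windows syntax: the -l parameter sets the payload size and we add the -f  parameter to prevent packet fragmentation.  What you then do is slowly increase the payload until you no longer get a reply.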
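If stepping the payload up by hand gets tedious, the search can be narrowed more quickly with a binary search.  This is a sketch of that swapped-in technique rather than the manual method described above.  The probe argument is an assumption: any function you supply that returns True when a don't-fragment ping of the given payload size gets a reply.  The sketch also assumes that once a size fails, every larger size fails too.

```python
from typing import Callable


def find_max_payload(probe: Callable[[int], bool],
                     low: int = 0, high: int = 9000) -> int:
    """Return the largest payload in [low, high] for which probe(size) is True.

    probe is a caller-supplied (hypothetical) function that sends one
    don't-fragment ping with the given payload and reports success.
    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    # Invariant: probe(low) is True; anything above high is known to fail
    # or is outside the search range.
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in probe that simulates a path passing payloads up to 1472 bytes.
print(find_max_payload(lambda size: size <= 1472))
```

With a range of 0 to 9000 this needs about 14 pings instead of hundreds of manual attempts.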

This process works fine and is a great way to determine the maximum payload size the end to end network will support.  However if you’re using the payload size to determine the maximum transmission unit, there is a little trick.  The MTU is the maximum packet size, but a ping sends a payload wrapped in 28 bytes of headers (a 20 byte IPv4 header plus an 8 byte ICMP header).   So our example:

ping -f -l 1472

sends a 1500 byte packet to the IP address (1472 bytes of payload and 28 bytes of headers).

If this command succeeds, you can use an MTU of 1500 in the XIV GUI or XCLI (rather than an MTU of 1472, which is 28 bytes smaller).
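The arithmetic is simple enough to do in your head, but here it is as a small sketch in case you want to script it.  The function names are made up for illustration, and the 20 byte figure assumes an IPv4 header with no options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def mtu_from_ping_payload(payload: int) -> int:
    """MTU implied by the largest ping payload that got through unfragmented."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def ping_payload_for_mtu(mtu: int) -> int:
    """Ping payload to test with if you want to confirm a given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(mtu_from_ping_payload(1472))  # 1500
print(ping_payload_for_mtu(4500))   # 4472
```

So to confirm that the network can carry the XIV maximum MTU of 4500, the ping payload to test with is 4472 bytes.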

For those who are wondering how I did the networking sniffing to get the screen captures above, I used a brilliant piece of free, open source software called Wireshark.   My only warning is that your corporate security policies may have rules on sniffing the network.   Don’t take my blog post as permission to use it  #;-)   And for the networking geeks among you, yes I know that extra headers could actually be wrapped around our packet for things like VLAN tags or encapsulation, but hopefully this should not affect our mathematics.

Controlling the background traffic

Final pointer.  Having finally gotten the link up and going, you are now free to start replicating volumes.  But how much traffic can the cross site link support?  The XIV can  limit the background copy bandwidth with a parameter called max_initialization_rate.   This is useful to stop you flooding the cross site link and annoying your link co-tenants.    To display the current setting, open an XCLI window and issue the following command:

target_list -x

For each target you should see three parameters:

<max_initialization_rate value="100"/>
<max_resync_rate value="300"/>
<max_syncjob_rate value="300"/>

These three settings should be tuned to reflect the  possible throughput of the cross site links.

  • The max_initialization_rate controls the initial sync to create a mirror.  Increasing it will speed up initial mirror creation.
  • The max_resync_rate controls how fast a mirror will be returned to sync after a link failure.
  • The max_syncjob_rate controls how quickly the most recent snapshot is replicated to the remote site.
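To get a feel for sensible values, it can help to convert the link speed into the same units as the rate settings and estimate how long an initial sync would take.  The sketch below assumes the three rates are expressed in MB per second (check this against your XCLI documentation), ignores protocol overhead and other tenants on the link, and uses made-up function names.

```python
def link_mbps_to_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte).

    This is a theoretical ceiling; real throughput will be lower.
    """
    return link_mbps / 8


def initial_sync_hours(volume_gb: float, rate_mb_per_s: float) -> float:
    """Rough hours needed to copy volume_gb at rate_mb_per_s.

    Uses decimal units (1 GB = 1000 MB) and assumes the rate is sustained.
    """
    return volume_gb * 1000 / rate_mb_per_s / 3600


print(link_mbps_to_mb_per_s(1000))    # 125.0
print(initial_sync_hours(3600, 100))  # 10.0
```

In other words, a 1 Gbps link tops out at 125 MB per second before overhead, and at a rate of 100 a 3600 GB volume would need roughly ten hours for its initial copy.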

To change the settings use a command like this (change the target name and the rates to suit):

target_config_sync_rates target="Remote_XIV" max_initialization_rate=120 max_syncjob_rate=240 max_resync_rate=240

If you want to see the current throughput rate, open XIV Top on the remote machine.  You should see how much write I/O is being sent to the mirror target volumes in MBps.

So hopefully you’re now better positioned to diagnose iSCSI link issues, maximize your MTU, and tune and monitor your link speed.

Questions?  Fire away…


About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.

7 Responses to How I fixed an iSCSI link and found 28 bytes

  1. Pingback: How I fixed an iSCSI link and found 28 bytes « Storage CH Blog

  2. Claudio says:

    Do you know if it is possible to set the MTU from the GUI on a V7000? I want to configure jumbo frames with ESXi hosts.


    • You cannot set it in the GUI but you can set it in the CLI.
      So fire up PuTTY and check out this command:

      svctask cfgportip -h

      The help output includes examples, like:

      An invocation example to set an MTU of 1600 on port #1 in I/O group 0

      cfgportip -mtu 1600 -iogrp 0 1

      • Claudio says:

        Thank you Anthony, just one last question. Since I have a V7000 with no additional 10Gb Ethernet ports, I have to use the same ports for management and iSCSI. Would it be best to set up a separate network with jumbo frames enabled, connected to Ethernet Port 2 of each node? Port 1 carries the management network and I don't want to mix them, since they are on different IP segments. What would you do?


      • Normally I recommend using one port on each node canister for management and one for iSCSI. That has problems of its own but at least separates management traffic from iSCSI traffic.

  3. Claudio says:

    Anthony, is it safe to do it with the storage in production?
