I have a vSphere 5.x environment consisting of 10 ESXi hosts. We have successfully deployed a Cisco Nexus 1000V solution (L3 implementation) using 2 x VSM virtual appliances configured for HA failover. The ESXi hosts are connected to 2 x Cisco Nexus 2000 fabric extenders (using 2 x 10 GbE adapters), which are each connected to 2 x Nexus 5000 switches.
We only have the 2 x 10 GbE adapters configured in the hosts, and both are uplinks to our Nexus 1000V dvSwitch (i.e. no standard vSwitches).
We also run a virtual vCenter Server, which uses the dvSwitch. This was all checked for supportability and compatibility with Cisco at the design stage of the project.
We are currently doing intrusive testing of the solution prior to handing it over to the customer, and have come across a potential issue whereby, if both of our Nexus 1000V VSM appliances are shut down, they cannot connect to the network when they are restarted. To make things worse, if a vMotion is performed in our cluster, the VM loses network connectivity on the receiving host; moving it back to the source host does not fix this (no vEthernet port exists).
The only way we have found to fix this is to take one of the 10 GbE adapters out of the dvSwitch, create a standard vSwitch with that adapter as its uplink, create a standard port group on the same VLAN as the vCenter Server, and change the primary VSM to use this port group. Hey presto, everything springs into life again.
Once we start the secondary VSM (still connected to the dvSwitch), the two VSMs successfully form an HA pair, at which point we can move the primary VSM back to the dvSwitch.
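For reference, this is roughly how the recovery workaround could be scripted with pyVmomi (Python). It's only a sketch of the manual steps above; the vCenter/host names, vmnic, VLAN ID, VSM name and port group name are placeholders for our environment, not the real values:

```python
# Rough sketch of the temporary recovery workaround (placeholder names throughout).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator',
                  pwd='***', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_obj(vimtype, name):
    """Look up a managed object by name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    objs = {o.name: o for o in view.view}
    view.DestroyView()
    return objs[name]

host = find_obj(vim.HostSystem, 'esxi01.example.local')
net_sys = host.configManager.networkSystem

# 1. Create a temporary standard vSwitch backed by one of the 10 GbE uplinks
#    (the vmnic has to be removed from the dvSwitch first, which we did in the UI).
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=['vmnic1']))
net_sys.AddVirtualSwitch(vswitchName='vSwitchTemp', spec=vss_spec)

# 2. Add a standard port group on the vCenter/VSM management VLAN.
pg_spec = vim.host.PortGroup.Specification(
    name='VSM-Recovery', vlanId=100, vswitchName='vSwitchTemp',
    policy=vim.host.NetworkPolicy())
net_sys.AddPortGroup(portgrp=pg_spec)

# 3. Re-point the primary VSM's NIC at the new standard port group.
vsm = find_obj(vim.VirtualMachine, 'N1KV-VSM-Primary')
nic = next(d for d in vsm.config.hardware.device
           if isinstance(d, vim.vm.device.VirtualEthernetCard))
nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
    deviceName='VSM-Recovery')
nic_change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=nic)
vsm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[nic_change]))

Disconnect(si)
```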
To minimise the risk of both VSMs in the pair being down at the same time, we have implemented DRS host groups and rules so that VSM1 runs on hosts 1-5 and VSM2 on hosts 6-10. However, I still need to deliver a fix for this issue!
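In case it's useful, this is roughly how the DRS group and "should run on" rule for VSM1 look when scripted via pyVmomi; cluster, host and VM names are placeholders, and VSM2 gets the mirror-image rule for hosts 6-10:

```python
# Sketch of the DRS VM/Host groups and rule keeping VSM1 on hosts 1-5 (placeholder names).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator',
                  pwd='***', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_obj(vimtype, name):
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    objs = {o.name: o for o in view.view}
    view.DestroyView()
    return objs[name]

cluster = find_obj(vim.ClusterComputeResource, 'Prod-Cluster')
vsm1 = find_obj(vim.VirtualMachine, 'N1KV-VSM-Primary')
hosts_1_5 = [h for h in cluster.host if h.name in
             ('esxi01', 'esxi02', 'esxi03', 'esxi04', 'esxi05')]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation='add',
            info=vim.cluster.VmGroup(name='VSM1-VMs', vm=[vsm1])),
        vim.cluster.GroupSpec(operation='add',
            info=vim.cluster.HostGroup(name='Hosts-1-5', host=hosts_1_5)),
    ],
    rulesSpec=[
        # "Should run on" rule: mandatory=False keeps it a soft preference.
        vim.cluster.RuleSpec(operation='add',
            info=vim.cluster.VmHostRuleInfo(
                name='VSM1-on-Hosts-1-5', enabled=True, mandatory=False,
                vmGroupName='VSM1-VMs', affineHostGroupName='Hosts-1-5')),
    ])
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

Disconnect(si)
```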
Thanks in advance