Saturday, June 18, 2011

L2TPv3 performance tweaks for Hyper-V GeoCluster Live Migration

L2TPv3 is a great feature for extending a Layer 2 network across sites. This is very useful when you are using Live Migration with Hyper-V, or any time you want to pass Layer 2 traffic to a different location.
 
Recently, while working on a SAN GeoClustering test case, we wanted to stretch a VLAN from one lab to another (in different buildings). This gets tricky because we (my team) don’t own or manage the network infrastructure between these buildings. Many companies use MPLS/VPLS to create a VPN or VLL (Virtual Leased Line) between two sites, but that can be expensive and a pain to get set up if you only need the link for a short period of time.
 
Stretched VLANs are becoming more and more common as features like Live Migration allow a system to be moved from one datacenter to another at the push of a button (well, a few mouse clicks).
 
The setup:
Sites: We were testing in labs about ¼ of a mile apart, on opposite sides of Main Campus.
 
The test network consisted of 4 VLANs. These networks were duplicated at both sites:
·         Public – internet-facing network for requests to Hyper-V hosted servers
·         Cluster – heartbeat network for the Failover Cluster
·         Live Migration – network that carries server state during live migration
·         Management – backend network for Remote Desktop access to servers and devices
 
Servers: We had 6 physical servers for the test (3 at each site). Each site had the same server setup:
·         Server 1 – hosts infrastructure systems (AD, DNS, DHCP, WDS)
·         Servers 2 & 3 – Failover Cluster hosts that would host the workload VMs
 
To interconnect the 4 VLANs across buildings we used the existing IPv4 infrastructure and 2 Cisco routers. The building infrastructure offers > 1 Gbps in and out of each lab, so we connected the routers' 1 Gb interfaces to the uplinks. We also had no visibility into the MTU of the building (intra- and inter-building) infrastructure.
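Since we could not see the MTU along that path, a quick sanity check from the routers is worth doing. The sketch below is not from the original test notes; it is a generic IOS extended ping with the Don’t Fragment bit set toward the far-side tunnel endpoint (10.216.44.47 is the peer address from the configs further down). Step the size down until the pings succeed, and keep in mind that the L2TPv3 encapsulation adds its own overhead on top of the inner Ethernet frame.
---
! Probe the usable path MTU toward the far-side tunnel endpoint (DF bit set)
ping 10.216.44.47 size 1500 df-bit
ping 10.216.44.47 size 1460 df-bit
ping 10.216.44.47 size 1400 df-bit
---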
 
So down to what we found and what we changed:
Original setup (poor performance) – config shown below:
·         Single L2TPv3 tunnel between sites
·         Lab-facing interfaces were connected to a switch port configured as a trunk allowing our 4 VLANs
          o   switchport mode trunk
Original design findings:
·         Traffic passed (with 802.1Q tags) correctly through the tunnel
·         Max bandwidth when copying files/Live Migration from site to site was 20-30 Mbps
·         Large number of dropped packets in the interface queues (checked as sketched after this list)
·         Router CPU maxed out at 100%
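For reference, this is roughly where we looked for those symptoms (a generic IOS sketch, not captured output from the test routers):
---
! Interface counters: watch the "Input queue ... drops" and "Total output drops" lines
show interfaces GigabitEthernet0/0
! CPU load, sorted by the busiest processes
show processes cpu sorted
---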
 
Tweaked design (best performance):
·         4 tunnels (one for each VLAN) using subinterfaces
·         No change to the switch trunk config
·         Modified hold-queue on each physical interface
·         Enabled path MTU discovery on the pseudowire-class (ip pmtu)
·         Enabled the Don’t Fragment bit on the pseudowire-class (ip dfbit set)
Tweaked design findings:
·         Traffic passed correctly through the tunnels
          o   Each VLAN correctly mapped on either side (a verification sketch follows this list)
·         Max bandwidth when copying files/Live Migration from site to site was ~800 Mbps
·         Router CPU stayed around 40%
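A quick way to confirm each of the four pseudowires is up and bound to the right VC ID (again a generic IOS sketch, not output from the test routers):
---
! One line per xconnect; both segments should show UP
show xconnect all
! L2TPv3 sessions, one per VLAN/VC ID (125-128 in the configs below)
show l2tun session brief
---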
 
Original config (each site was identical except the xconnect peer IP and the interface IP address were swapped) – SLOW!
---
l2tp-class class1
authentication
password 0 123456789
!
pseudowire-class ethernet
encapsulation l2tpv3
protocol l2tpv3 class1
ip local interface GigabitEthernet0/1
!
interface GigabitEthernet0/0
no ip address
duplex full
speed auto
no cdp enable
xconnect 10.216.44.47 125 pw-class ethernet
!
interface GigabitEthernet0/1
ip address 10.197.251.215 255.255.255.240
duplex auto
speed auto
no cdp enable
!
ip route 0.0.0.0 0.0.0.0 10.197.251.209
!
no cdp run
!
end
 
Tweaked config
---
l2tp-class class1
authentication
password 0 123456789
!
pseudowire-class ethernet
encapsulation l2tpv3
protocol l2tpv3 class1
ip local interface GigabitEthernet0/1
ip pmtu
ip dfbit set
!
interface GigabitEthernet0/0
no ip address
duplex full
speed auto
no cdp enable
hold-queue 2048 in
hold-queue 2048 out
!
interface GigabitEthernet0/0.2996
encapsulation dot1Q 2996
no cdp enable
xconnect 10.216.44.47 126 encapsulation l2tpv3 pw-class ethernet
!
interface GigabitEthernet0/0.2997
encapsulation dot1Q 2997
no cdp enable
xconnect 10.216.44.47 127 encapsulation l2tpv3 pw-class ethernet
!
interface GigabitEthernet0/0.2998
encapsulation dot1Q 2998
no cdp enable
xconnect 10.216.44.47 128 encapsulation l2tpv3 pw-class ethernet
!
interface GigabitEthernet0/0.2999
encapsulation dot1Q 2999
no cdp enable
xconnect 10.216.44.47 125 encapsulation l2tpv3 pw-class ethernet
!
interface GigabitEthernet0/1
ip address 10.197.251.215 255.255.255.240
duplex auto
speed auto
no cdp enable
hold-queue 2048 in
hold-queue 2048 out
!
ip route 0.0.0.0 0.0.0.0 10.197.251.209
!
no cdp run
!
end
 

2 comments:

  1. Hi Mike,

    What model routers did you use in this test?

    Thanks,
    Elmo

  2. nice work bro - settings were good to us.. we definitely had some fragmentation issues before Google brought us to your page...

    and after reading about your honey bee "fan" - i'm going to make some wasp hives on my property :)

    Joe
    #19366

