SDN


Redefining the WAN

One of the more interesting recent trends in the networking space has been software defined WAN (SDWAN).  While I’ll admit I didn’t give it much attention at first, I’ve since taken a harder look and see quite a bit of promise in the technology.  The WAN is a part of the network that, until recently, hasn’t received much attention, particularly as it relates to SDN.  SDN in the enterprise space seems mostly focused on the data center since that’s where the network always seems to be the most complicated.  The unfortunate outcome of that mindset is that while we focus on the data center network, technologies like SDWAN appear and don’t always get the attention they deserve.  I think the primary reason for this is that many of us have WANs that we think are ‘working just fine’.  And while that may be the case, I think SDWAN has the potential to significantly reduce costs, improve WAN performance, and increase network agility.

One of the vendors in this market that I’ve recently had the chance to hear about is Silver Peak.  Silver Peak has been around for quite some time and is well known in the WAN optimization space.  In the past year Silver Peak has released its SDWAN product called Unity EdgeConnect.  The solution also includes Unity Orchestrator to manage your SDWAN endpoints and Unity Boost, which adds WAN optimization to the endpoints.  Let’s talk a little bit about each piece of the solution.

The heart and soul of the solution lives in the EdgeConnect appliances.  These are your SDWAN endpoints, and they terminate all of the overlay network tunnels on either side of your WAN.  What I found most interesting about EdgeConnect was the pricing model.  Traditionally we’re used to spending a lot upfront for remote site hardware, but Silver Peak obviously isn’t looking to make a lot of money on hardware margin; the appliances are very reasonably priced.  There’s also a virtual edition allowing you to use your own hardware if you prefer.  The licensing model is simple at $199 per site regardless of bandwidth or which size hardware appliance you deploy.  And while not unique in this space, the EdgeConnect appliances support zero touch provisioning and are managed centrally from the Unity Orchestrator.

The central point of control for Silver Peak’s SDWAN is the Unity Orchestrator.  In another interesting move, Silver Peak makes this software free with any Unity deployment.  The controller allows for single-screen administration of your entire SDWAN and offers visibility into key metrics for monitoring and troubleshooting.  This also includes heat-map-like functionality to give a high-level overview of how certain pieces of the WAN are performing.  This allows you to quickly isolate issues by site and region, which is key when you consider that a major use case for SDWAN is using internet-based circuits.  The orchestrator is also where you define what Silver Peak calls ‘business intent policies’, which define how certain application traffic is handled as it traverses the WAN.

The last, optional component of the solution is Unity Boost.  Boost adds Silver Peak’s well-known WAN optimization features to the solution.  And just like the two other components, the pricing on this piece is also innovative.  Boost is purchased ‘by the bit’.  That is, you buy a pool of WAN optimization capacity and allocate it as you see fit across your SDWAN.  This opens up some interesting use cases given that WAN optimization has usually been an all or nothing proposition.  Traditional WAN optimization was either at the site or not at the site, often wasn’t needed all the time, and was typically an expensive solution to have if not required.  In this model you can dole it out as needed.  One of your WAN sites starts having connectivity issues?  Have a large migration coming up that could benefit from one of the many WAN optimization features?  Now you can allocate it as you see fit.

While you can use SDWAN over any type of circuit, I believe the real gain is had when using it in conjunction with internet-based circuits.  With that in mind, the focus of any SDWAN solution should be on making non-SLA circuit types (the internet) act more like a dedicated private link.  Silver Peak has a variety of features that all fall into the category of path conditioning…

Adaptive forward error correction (FEC) – FEC is a means to rebuild lost packets on the far side of a link, which avoids the delay induced by having to resend them.  The solution sends parity packets along with the real data that can be used to rebuild any packets lost in transit.  The feature scales dynamically, minimizing parity packet overhead when it isn’t required.
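
To make the parity idea concrete, here’s a minimal Python sketch of XOR-based FEC, assuming one parity packet protects a group of equal-length data packets.  This is a simplification for illustration only; Silver Peak doesn’t document its exact scheme here.

```python
# Toy XOR parity FEC: one parity packet per group of equal-length
# data packets lets the receiver rebuild any single lost packet.
from functools import reduce

def make_parity(packets: list[bytes]) -> bytes:
    """XOR all packets together to build the parity packet."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def rebuild_lost(survivors: list[bytes], parity: bytes) -> bytes:
    """XOR the surviving packets with the parity to recover the lost one."""
    return make_parity(survivors + [parity])

group = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(group)
# Suppose the middle packet is lost in transit:
recovered = rebuild_lost([group[0], group[2]], parity)
assert recovered == b"BBBB"
```

The ‘adaptive’ part would adjust how many data packets share one parity packet based on observed loss, trading overhead for recoverability.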

Real-time Packet Order Correction – Ensures that packets are delivered in order on either side of the link by resequencing packets that arrive out of order.  This can cut both ways, as waiting for out-of-order packets can itself cause problems.  However, as with all of these features, timeout settings can be configured to meet your needs.
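
The resequencing behavior can be sketched in a few lines of Python: a toy reorder buffer that delivers packets in sequence order and holds early arrivals.  The timeout handling the real feature exposes as a setting is omitted for brevity.

```python
# Toy reorder buffer: deliver packets in sequence order, buffering
# any that arrive before their predecessors.
def resequence(arrivals):
    expected, buffer, delivered = 0, {}, []
    for seq, payload in arrivals:
        buffer[seq] = payload
        # Flush every contiguous packet we're now able to deliver.
        while expected in buffer:
            delivered.append(buffer.pop(expected))
            expected += 1
    return delivered

# Packets 1 and 2 are swapped on the wire but delivered in order:
print(resequence([(0, "p0"), (2, "p2"), (1, "p1"), (3, "p3")]))
# → ['p0', 'p1', 'p2', 'p3']
```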

Tunnel bonding and failover – This is what allows you to combine multiple physical circuits into one or many logical circuits.  Having the ability to abstract the physical network is one of the main features that allows you to implement business intent policies across the WAN. 

Silver Peak is not alone in the SDWAN space, but I believe they are unique in many of their features and their pricing model.  If you’re interested in hearing more about their products and SDWAN solutions I’d suggest you check out these videos…

Introduction to Silver Peak with David Hughes

Silver Peak Unity EdgeConnect SD-WAN Overview

Silver Peak Creating Business Intent Policies with Silver Peak’s EdgeConnect SD-WAN Solution Demo

Silver Peak Delivering Broadband QoS with Silver Peak’s EdgeConnect SD-WAN Solution Demo

Silver Peak Zscaler Security Demo and Discussion

Last night I finally finished watching all of the Big Switch Networks Networking Field Day 10 videos.  If you haven’t seen them yet, I’d recommend taking a look at them on YouTube…

Big Switch Networks – Overview

Big Switch Networks – Why SDN Fabrics?

Big Switch Networks – Big Cloud Fabrics

Big Switch Networks – Big Cloud Fabric GUI demo

Big Switch Networks – Big Cloud Fabric for VMware

Big Switch Networks – Monitoring Fabric

All of the presentations were awesome and well worth your time, especially if you’re new to their products.

If you haven’t looked at Big Switch before, their name sort of says it all.  Their base concept is disaggregating a standard chassis switch into individual components.  The breakdown would look something like this…

image 
As you can see, each component of a standard data center chassis switch has a similar component in the Big Cloud Fabric.  Leaf switches are the new line cards, spine switches are the fabric modules or backplane, and the Big Cloud controller is the supervisor.  Big Switch then uses a standard IP management network to connect all of the components together.  This isn’t a very big leap to make, as most chassis-based switches actually have an out-of-band Ethernet bus that’s used to communicate with all of the chassis components.  Disaggregating them and having them talk to each other over a management network isn’t that big of a change.

So what’s so special about Big Cloud Fabric?  While the majority of new data center network builds are predicated on a leaf and spine design, we’re still doing all of the switch configuration manually.  Adding a new VLAN or VRF to the fabric means touching a lot of switches.  Big Switch takes care of that for you by eliminating switch-by-switch configuration.  With their software on each switch, and the Big Cloud controller managing the switches, configuration changes are done from a single controller.  Not only does this make things a whole heck of a lot easier, it also helps eliminate configuration inconsistencies.  In addition to managing physical switches, Big Switch can now also control virtual switches in a KVM environment.
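
As a rough illustration of why controller-driven configuration matters, one logical change fans out to every switch in the fabric.  The classes below are hypothetical and not Big Switch’s actual API; they just show the pattern.

```python
# Hypothetical sketch: a single controller applies one logical change
# (add a VLAN) to every switch, eliminating box-by-box configuration
# and the inconsistencies that come with it.
class Switch:
    def __init__(self, name: str):
        self.name = name
        self.vlans = set()

    def configure_vlan(self, vlan_id: int):
        self.vlans.add(vlan_id)

class FabricController:
    def __init__(self, switches):
        self.switches = switches

    def add_vlan(self, vlan_id: int):
        # One operation, applied consistently fabric-wide.
        for sw in self.switches:
            sw.configure_vlan(vlan_id)

fabric = FabricController([Switch(f"leaf{i}") for i in range(32)])
fabric.add_vlan(119)
assert all(119 in sw.vlans for sw in fabric.switches)
```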

One of the things I really appreciated about the Big Switch presentation was the sizing discussion.  While some vendors like to talk about how large they can scale, Big Switch seems pretty content keeping their fabric reasonably sized.  Their fabric is sized for 16 fully redundant racks encompassing up to 40 devices (32 leaf switches, 2-6 spines, and 2 controllers).  They also acknowledged that making this any larger might make your ‘splash zone’ (read: failure domain) too large.  I would agree on this point and would much rather have smaller (to be clear, 16 racks isn’t small) managed zones than put all my eggs in one basket.

I also appreciated the discussion around upgrades and actual real-world testing.  The fabric is built with the ability to do hitless upgrades.  This is premised on having redundant leaf switches in each cabinet and is driven entirely from the controller.  They’ve also done quite extensive resiliency testing on the fabric.  Implementing a network version of Chaos Monkey, they performed switch failovers every 8 seconds and link failures every 4 seconds across the fabric.  The testing showed no perceivable degradation of service across the network while it ran.  That’s a pretty big claim to make and one that should certainly draw some attention.

All in all I came away from the videos impressed.  The product is constantly evolving, and I thought they were really focusing on the right areas.  Adding the ability to span any port to any port with Fabric span is an obvious win and something that’s very much needed in today’s network fabrics.  I was also glad to see that they were integrating open source tools like Elasticsearch rather than spending the time building their own.

If you’re anxious to spend more time with their products (like me), check out their online labs at…

http://labs.bigswitch.com/


In my last post, we wrapped up the base components required to deploy NSX.  In this post, we’re going to configure some logical routing and switching.  I’m specifically referring to this as ‘logical’ since we are only going to deal with VM to VM traffic in this post.  NSX allows you to logically connect VMs at either layer 2 or layer 3.  So let’s look at our lab diagram…

image

If you recall, we had just finished creating the transport zones at the end of the last post.  The next step is to provision logical switches.  Since we want to test layer 2 and layer 3 connectivity, we’re going to provision NSX in two separate fashions.  The first method will use the logical distributed router functionality of NSX.  In this method, tenant 1 will have two logical switches: one for the app layer and one for the web layer.  We will then use the logical distributed router to allow the VMs to route to one another.  The second method will be to have both the web and app VMs on the same logical layer 2 segment.  We will apply this method to tenant 2.  So let’s get started…

Tenant 1
So the first thing we need to do is create the logical switches for our tenant.  This is done on the ‘Logical Switches’ tab of the NSX menu.  Navigate there and then click on the plus sign to add a new one…

image

Give it a descriptive name, and ensure the control plane is set to Unicast.  Do the same thing for the App switch for tenant 1…

image

Once both switches are created, you should see them both under logical switches showing a status of ‘Normal’.  Note that NSX allocated a segment ID for each of the switches out of the pool we created in the last post. 

The next step is to attach the tenant VMs to the logical switch.  If we look at the DVS, we see that any DVS we associated with the transport zone has a port-group associated with this new logical switch…

image

So each logical switch really has its own port-group.  One would think this means we could just manually edit a VM’s properties and select the port-group to associate it with the logical switch.  From my testing, this didn’t work.  The association needs to occur from the NSX management portal.  This is done by clicking on the ‘Add Virtual Machine’ button on the logical switch menu (highlighted in red below)…

image

So let’s start with the tenant1-app VM.  Select it and click next.  On the next screen, select the VNIC you want to move to the logical switch and then click next…

image

The last screen has you confirm the move.  Click finish to submit the change…

image

That’s all it takes.  Now tenant1-app is successfully associated with the logical switch tenant1_app.  Let’s do the same changes for tenant1-web…

image

image

image

So now we have two VMs connected to two different logical network segments.  How do we get them to talk to each other?  We need some sort of layer 3 gateway that each host can use to get off subnet.  This is where the distributed local router (DLR) comes in.  The DLR is considered an ‘edge’ in NSX, so let’s click on the edges menu in NSX and add an edge…

image

Make sure you select ‘Logical (Distributed) Router’.  Just a quick FYI: I’ve seen it referred to both as the logical distributed router and as the distributed local router.  The Edge services gateway is used to connect your logical networks to the physical network; we’ll use those later on in upcoming posts.  Above I’ve filled out the basic information.  Click next to continue…

image

Enter the credentials you want to use and enable SSH access to the edge (I’m a network guy, I still need CLI).  Click Next to move on…

Note: For packet capture reasons, I decided to deploy my DLR controllers on the management cluster.  In order to do this, I had to go back to the transport zone for tenant1 and add the management cluster to it.  Despite doing this, the change didn’t seem to ‘take’.  I rebooted the NSX controllers, then the managers.  I still couldn’t provision the DLRs to the management cluster.  I finally rebooted vCenter, which resolved the issue.

image

As you can see above, I chose to deploy the DLRs to the management cluster.  This is so I’ll be able to more easily implement packet captures between nodes later on.  The next thing we need to do is allocate the IP addressing for the DLR.  We’ll need an IP in each logical switch for the VMs to use as a gateway.  In addition, the DLR allows you to provision a management interface.  We’ll pick an IP out of the ESX management VLAN for the management interface and use the following networks for the logical switch interfaces…

image
When provisioning the interfaces on the DLR, ensure you select the ‘Internal’ option for the type.  Then select the logical switch you want the interface on and assign it an IP address…

image

Do the same for the second interface for the web segment…

image

When you’re done, you should have both interfaces configured as shown below…

image

Note: As you may have noticed, you don’t get to pick a default gateway for the management IP.  What does this mean?  As far as I can tell, it makes the management IP useless.  I’ll need to follow up on this to see what it can be used for.

The configure HA screen asks for two IP addresses to use in the HA configuration.  As far as I know, these IPs are only locally significant, so I just picked two at random…

image 

On the last screen, make sure that everything looks correct and then click finish…

image

After you click finish, NSX will begin deploying the DLR VMs.  Keep in mind that these VMs aren’t in the data path; they just provide control plane operations for the DLR instances located on each physical ESX host.

Once they’re deployed, we should see two DLR VMs in the management cluster…

image

In addition, NSX should report the edge as deployed…

image

So now that the DLR is deployed, let’s check our VMs.  As shown above, the App VM has an IP address of 10.20.20.74/29 and the Web VM has an IP address of 10.20.20.66/29.  Since they’re in different subnets, they’ll have to route to talk to each other…
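
As a quick sanity check, Python’s ipaddress module confirms that the two addresses land in different /29 networks, which is exactly why routing is required here:

```python
# Verify the two lab VMs sit in different /29 subnets.
import ipaddress

app = ipaddress.ip_interface("10.20.20.74/29")
web = ipaddress.ip_interface("10.20.20.66/29")

print(app.network)  # 10.20.20.72/29
print(web.network)  # 10.20.20.64/29
assert app.network != web.network  # off subnet, so traffic must route
```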

So let’s take a look at the App VM…

image

As you can see, its IP is correct and it can ping the web VM off subnet.  Let’s check the web VM and see if we see similar results…

image

Yep, looks like we’re routing just fine.  However, this doesn’t seem like anything crazy at this point.  We’ve connected two VMs, on two different hosts, that normally would have had to route to talk to each other anyway.  All we did was make them route through a logical router.  The tenant 1 configuration looks like this on our diagram…

image

With the black dotted line I’m showing the path the app VM’s traffic took to reach the web server.  Seems like nothing’s changed, right?  We’re still routing.  Actually, a lot has changed: we’re tunneling the routed VM traffic with VXLAN.

1 – The app server tries to talk to the web server.  Being off subnet, it needs to talk to its default gateway, which is on the DLR.
2 – The DLR on the local ESX host receives the traffic and knows that the destination (the web server) is on one of its directly connected interfaces.  That directly connected interface is logical switch tenant1_web.  Through the NSX control plane, the DLR knows that the web server is actually on another host.  It encapsulates the original packet in a VXLAN header and sends the packet towards the physical network.
3 – The encapsulated VXLAN packet now reaches the physical NIC with a source of the VTEP interface on Thumper3 and a destination of the VTEP interface on Thumper2.  The ESX host must now encapsulate the packet in a dot1q header for VLAN 119 to get it onto the physical network.
4 – The MLS receives the dot1q packet, strips the layer 2 header, and routes the original IP packet.
5 – Leaving the physical switch, the packet is retagged with another dot1q header for VLAN 118, which is the VLAN where the VTEP interface for Thumper2 resides.
6 – Thumper2 receives the packet, strips the header, and passes the VXLAN packet to the DLR.
7 – The DLR strips the VXLAN header and examines the inner IP packet.  Since we are now on the right host, the DLR forwards the IP packet accordingly.
8 – The web server receives the packet.
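
To make the encapsulation in step 2 concrete, here’s a small Python sketch of the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set, three reserved bytes, a 24-bit VNI (NSX’s segment ID, 5000 here), and a final reserved byte.  This builds only the VXLAN header itself, not the full outer IP/UDP encapsulation.

```python
# Build and parse the 8-byte VXLAN header from RFC 7348.
import struct

def vxlan_header(vni: int) -> bytes:
    flags = 0x08 << 24          # I flag set: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)  # VNI sits in the upper 24 bits

def vni_from_header(header: bytes) -> int:
    _, word = struct.unpack("!II", header)
    return word >> 8            # drop the trailing reserved byte

hdr = vxlan_header(5000)
print(len(hdr), vni_from_header(hdr))  # 8 5000
```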

To see this in action, let’s look at a packet capture I pulled off the wire of a ping between the app and web servers…

Note: The VLAN tag shows as 118 in both packet captures.  This is because I was only spanning the packets on the switch interface facing Thumper2.

image

Let’s look at what happened and see if we can match it up with some of the steps from above…

-The original data packet has a source of 10.20.20.74 (Tenant1-App VM) and a destination of 10.20.20.66 (Tenant1-Web VM).
-The original data packet is encapsulated in VXLAN.  Note the segment ID is 5000, which matches the segment ID given to the web logical switch.
-The VXLAN outer packet has a source of 10.20.20.42 (the VTEP on Thumper3) and a destination of 10.20.20.35 (the VTEP on Thumper2).

The return looks similar but in reverse…

image

The VXLAN encapsulation is what makes NSX so powerful.  We can fully encapsulate layer 2 and layer 3 in a layer 3 header and route it.  So like I said, this is nice and all, but let’s look at an example of where NSX can really shine when using VXLAN.  Let’s move ahead with setting up tenant 2.

Tenant 2
As we said earlier, we’re going to deploy the tenant 2 app and web VMs on the same subnet so they’re layer 2 adjacent.  Normally, this would mean that the two VMs would need to be on the same host in the same port-group (or same VLAN), or on separate hosts that trunked the same VLAN.  You’ll note that in our case, each of the tenant 2 VMs is on a different physical host per the diagram above.  In addition, I’m not trunking a common VLAN to both hosts for this purpose.  So let’s deploy a new logical switch just called ‘tenant2’…

image

Now let’s add both of the tenant2 VMs onto the logical switch (I’m not going to show how to do that since we did it above for tenant1).  Once both of the VMs are on the same logical switch, let’s take a look at the IP address allocation we have for tenant2…

image

So, pretty straightforward.  Now let’s try to ping from host to host and see what happens…

image

Cool, it works.  Wait, what?!?!  Yeah, that’s right.  NSX just extended layer 2 for us across layer 3 boundaries.  Let’s look at one of those pings in the packet capture so you can see it in action…

  image

Check out the MAC addresses in the VXLAN-encapsulated packet.  They match up perfectly with the MAC addresses on the VMs…

image

image

As we’ve seen today, NSX can create logical networks to encapsulate layer 2 and layer 3 network traffic inside of VXLAN.  In addition, NSX’s control plane removes the need for me to support multicast on the physical network gear when using VXLAN.  In other words, this all just sort of worked without having to tweak my physical network config.  Next up, we’ll start looking at routing logical networks back to the physical network.

