VMWare


In our last post, we talked about how to deploy what I referred to as logical networking.  I classify logical networking as any type of switching or routing that occurs solely on the ESXi hosts.  It should be noted that with logical networking, the physical network is still used, but only for IP transport of overlay encapsulated packets. 

That being said, in this post I’d like to talk about how to connect one of our tenants to the outside world.  In order for the logical tenant network to talk to the outside world, we need a means to connect the logical networks out to the physical network.  In VMware NSX, this is done with the edge gateway.  The edge gateway is similar to the DLR (distributed local router) we deployed in the last post; however, there is one significant difference: the edge gateway is in the data plane, that is, it’s actually in the forwarding path for the network traffic. 

Note – I will sometimes refer to the edge services gateway as the edge gateway or simply the edge.  Despite the edge services gateway and the DLR both being considered ‘NSX edges’, I will not refer to the DLR as an edge, for the sake of clarity.

At the end of our last post, the lab logically looked like this…

image

The web and the app VMs were able to route to each other through the DLR.  In order to extend that connectivity to the physical network, we need to deploy the edge gateway and connect it to the physical MLS.  By the end of this post, our logical network should look like this…

image

So let’s jump right in!  Before we start deploying the edge, we need to do a little additional configuration on the DLR.  Since we want the logical segments connected to the DLR to talk to the physical network through the edge, we need traffic to flow from the DLR to the edge gateway.  That means we need a network segment to connect the DLR and the edge.  Let’s add another logical switch called ‘Tenant1_DLR_Uplink’ for this purpose…

image

Now, let’s update the DLR to include an IP interface on this uplink which it can use to talk to the edge.  To do this, browse to your NSX edges and double click on the DLR you created in the last post (yes, I know, there’s no edit button, I don’t know why…).  Then navigate to Settings and Interfaces, and create the interface the DLR will use to uplink to the edge…

image

Now that we have the IP interface in place, let’s add a static default route to the DLR so it knows where to route traffic.  Navigate over to the Routing tab, click ‘Static Routes’, then the plus button to add one…

image

We’ll point the default route to the interface of the edge we’re going to deploy later on.  Once that’s done, we can deploy our edge appliance.  Navigate back to the NSX menu, click on the NSX edges tab, and then the plus sign to deploy a new edge…

image

We’ve seen this before; just make sure you’re deploying the ‘Edge Services Gateway’ and not the DLR this time.  On the next screen, give it a username and password and enable SSH access if you wish…

Note – I find it slightly humorous that they enforce password complexity restrictions like this…
image 
But they don’t have you confirm the password.  Make sure you type it correctly!

image

On the next screen you’ll be asked to size the edge.  Despite really wanting to have a ‘Quad Large’ edge in my lab, I’m going to stick with compact.  I’m also going to leave the checkbox in place for the ‘auto rule generation’.  We’ll look at that further later on.  Next pick the cluster, host, and data store you wish to use for deployment…

image

The next task is to configure the IP interfaces for the edge.  In our case, we want to connect the DLR to the edge and the edge to the physical network, so we’ll need two IP interfaces.  Let’s first configure the interface that matches the one we just created on the DLR…

image

Next we’ll add the uplink interface that we’ll use to talk to the physical network.  Note that I’m connecting that interface to a distributed port group which is trunking the VLAN (120) that I’ll use for external connectivity.  In other words, the other end of this /30 is a VLAN 120 IP interface on the physical switch (10.20.20.50)…

image

Let’s leave the default gateway settings as is and move on…

image

On the next screen, let’s change the default traffic policy to allow and configure the management IPs for the HA configuration.  Here again I just picked two random non-routable IPs…

image

Then click ‘Finish’ to kick off the configuration!

image

After hitting finish, I got this error…

image

Weird – last time I checked my subnet math, those two IPs were in the exact same /30.  After trying multiple times to get HA working and receiving the same error, I rebuilt the edge from scratch without HA and it seemed to work.  So no HA for now…
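
As a side note, Python’s ipaddress module makes this kind of subnet sanity check easy.  The HA addresses I actually typed in aren’t shown here, so the pair below is just an illustrative stand-in for ‘two IPs in the same /30’…

```python
import ipaddress

# Illustrative stand-ins only -- the HA IPs I actually used aren't shown in this post.
ha_ip1 = ipaddress.ip_interface("169.254.1.1/30")
ha_ip2 = ipaddress.ip_interface("169.254.1.2/30")

print(ha_ip1.network)                    # 169.254.1.0/30
print(ha_ip1.network == ha_ip2.network)  # True -- both addresses live in the same /30
```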

Once it’s deployed, you should see the edge listed under your NSX edge tab…

image

The next thing we have to do is configure some routes on the edge so it knows how to get places.  Double click on the edge to edit it.  We want to add three routes: a default pointing out to the physical network, a route pointing in for the app layer, and another pointing in for the web layer…

image
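
Just to make the intent of those three routes concrete, here’s a rough sketch of the edge’s routing table as I understand it: the default points at the switch side of the VLAN 120 /30 (10.20.20.50), and the two tenant /29s point back in at the DLR’s address on the Tenant1_DLR_Uplink segment.  That transit address isn’t shown here, so it’s left as a label in the sketch…

```python
import ipaddress

# A sketch of the edge's routing table.  The DLR's address on the
# Tenant1_DLR_Uplink transit segment isn't shown in this post, so it's left as a label.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"),      "10.20.20.50"),    # default, out to the physical MLS
    (ipaddress.ip_network("10.20.20.72/29"), "dlr-uplink-ip"),  # app segment, in toward the DLR
    (ipaddress.ip_network("10.20.20.64/29"), "dlr-uplink-ip"),  # web segment, in toward the DLR
]

def lookup(dst: str):
    """Longest-prefix match -- the same decision the edge makes when forwarding."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen)

print(lookup("10.20.20.74"))  # app VM -> 10.20.20.72/29 via the DLR uplink
print(lookup("10.20.30.41"))  # my desktop -> default route toward 10.20.20.50
```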

The last step is to tell my physical network to route to the edge to get to the app and web layers.  We do this by adding two static routes on the physical MLS…

image 
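
For reference, those two routes boil down to something like the following.  I’m printing IOS-style commands purely as an illustration (the MLS in my lab is a Cisco switch); adjust the syntax for whatever your switch actually runs.  The next hop in both cases is the edge’s VLAN 120 address (10.20.20.49)…

```python
import ipaddress

# The two tenant prefixes live behind the edge, whose VLAN 120 address is 10.20.20.49.
edge_ip = "10.20.20.49"
tenant1_prefixes = [
    ipaddress.ip_network("10.20.20.64/29"),  # web segment
    ipaddress.ip_network("10.20.20.72/29"),  # app segment
]

# Print IOS-style static routes to illustrate the config intent.
for prefix in tenant1_prefixes:
    print(f"ip route {prefix.network_address} {prefix.netmask} {edge_ip}")
```
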
Now I should have full connectivity between the logical network VMs and the physical network VMs.  I can test this by pinging the VMs from my desktop, which is not inside of the vCenter environment…

image

So now this is where the traffic flow gets really interesting.  For instance, let’s take a look at the traffic flow for my desktop pinging the Tenant 1 App server…

image

1 – My traffic sourced from the IP of 10.20.30.41 hits the MLS which does a route lookup and determines that to get to the App server (10.20.20.74) I need to route to the edge gateway (10.20.20.49).  My traffic enters VLAN 120 through the VLAN interface on the switch.

2 – The traffic destined for the edge traverses the physical trunk to Thumper tagged on VLAN 120.  The host sends that traffic towards the edge gateway VM.

3 – The edge gateway receives the traffic, does a route lookup, and sees that it has a route for 10.20.20.72 /29 pointing at the DLR interface.  Recall that the DLR is local to each host so the edge must now send the traffic back out to the physical network to get to the correct host.

4 – The traffic, first encapsulated in VXLAN and then tagged with a dot1q header for VLAN 118, leaves the host and returns to the switch destined for the VTEP of Thumper3.

5 – The switch routes the traffic between VLANs 118 and 119.

6 – The traffic leaves the switch encapsulated in a dot1q header for VLAN 119 destined for the VTEP of Thumper3.

7 – The traffic can now be stripped of its VXLAN header and delivered to the app guest VM.
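
If you want to pull these fields out of a capture yourself, a few lines of Scapy will do it.  Treat this as a rough sketch: the pcap file name is made up, and it assumes your span session preserves the dot1q tags like mine did…

```python
from scapy.all import rdpcap, Dot1Q, IP
from scapy.layers.vxlan import VXLAN  # importing this binds UDP/4789 to the VXLAN dissector

# File name is hypothetical -- point it at whatever you captured off the span port.
for pkt in rdpcap("edge_to_app.pcap"):
    if not pkt.haslayer(VXLAN):
        continue
    outer_vlan = pkt[Dot1Q].vlan if pkt.haslayer(Dot1Q) else None
    outer_ip = pkt[IP]              # the VTEP-to-VTEP header from steps 4-6
    inner = pkt[VXLAN].payload      # the original Ethernet frame
    print(f"vlan {outer_vlan} | VTEP {outer_ip.src} -> {outer_ip.dst} | "
          f"VNI {pkt[VXLAN].vni} | inner {inner[IP].src} -> {inner[IP].dst}")
```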

The cool part is that we can see pieces of this in a packet capture on the wire…

Step 1/2 – My initial ping heading towards the host, note the VLAN 120 dot1q tag

image

Step 4/5 – The VXLAN encapsulated traffic destined for the VTEP of Thumper3.  Note that the outer MAC address is destined for the Cisco switch VLAN interface, and the layer 2 header has a tag of 118, indicating it’s coming into the switch off of the Thumper VTEP interface.

image

Step 6 – The VXLAN encapsulated packet leaves the switch interface destined for the VTEP of Thumper3.  Note that the external layer 2 header lists a destination of the Thumper3 (VMware) VTEP interface, and we can verify this by seeing that it leaves the switch tagged dot1q on VLAN 119.

image

Pretty cool, huh?  A similar traffic flow would be seen in reverse as the VM replies to my pings with ICMP echo replies.  So there you have it.  Over the last three posts we’ve successfully deployed NSX, created logical networks, and routed them out to the physical network.  The primary goal of all of this for me was to see how NSX uses VXLAN, and that makes good sense now.  Hopefully I’ll have more time later to dig into some of the other features of NSX.


In my last post, we wrapped up the base components required to deploy NSX.  In this post, we’re going to configure some logical routing and switching.  I’m specifically referring to this as ‘logical’ since we are only going to deal with VM to VM traffic in this post.  NSX allows you to logically connect VMs at either layer 2 or layer 3.  So let’s look at our lab diagram…

image

If you recall, we had just finished creating the transport zones at the end of the last post.  The next step is to provision logical switches.  Since we want to test both layer 2 and layer 3 connectivity, we’re going to provision NSX in two different ways.  The first method will use the logical distributed router functionality of NSX.  In this method, tenant 1 will have two logical switches: one for the app layer and one for the web layer.  We will then use the logical distributed router to allow the VMs to route to one another.  The second method will be to have both the web and app VMs on the same logical layer 2 segment.  We will apply this method to tenant 2.  So let’s get started…

Tenant 1
So the first thing we need to do is create the logical switches for our tenant.  This is done on the ‘Logical Switches’ tab of the NSX menu.  Navigate there and then click on the plus sign to add a new one…

image

Give it a descriptive name, and ensure the control plane is set to Unicast.  Do the same thing for the App switch for tenant 1…

image

Once both switches are created, you should see them both under logical switches showing a status of ‘Normal’.  Note that NSX allocated a segment ID for each of the switches out of the pool we created in the last post. 

The next step is to attach the tenant VMs to the logical switch.  If we look at the DVS, we see that any DVS we associated with the transport zone has a port-group associated with this new logical switch…

image

So each logical switch really has its own port-group.  One would think that this would mean we could just manually edit a VM’s properties and select the port-group to associate it with the logical switch.  From my testing, this didn’t work.  The VM association needs to occur from the NSX management portal.  This is done by clicking on the ‘Add Virtual Machine’ button on the logical switch menu (highlighted in red below)…

image

So let’s start with the tenant1-app VM.  Select it and click next.  On the next screen, select the VNIC you want to move to the logical switch and then click next…

image

The last screen has you confirm the move.  Click finish to submit the change…

image

That’s all it takes.  Now tenant1-app is successfully associated with the logical switch tenant1_app.  Let’s do the same changes for tenant1-web…

image

image

image

So now we have two VMs connected to two different logical network segments.  How do we get them to talk to each other?  We need some sort of layer 3 gateway that each VM can use to get off its subnet.  This is where the distributed local router (DLR) comes in.  The DLR is considered to be an ‘edge’ in NSX, so let’s click on the NSX edges menu and add an edge…

image

Make sure you select ‘Logical (Distributed) Router’.  Just a quick FYI, I’ve seen it referred to both as the logical distributed router and as the distributed local router.  The Edge Services Gateway is used to connect your logical networks to the physical network; we’ll use that in upcoming posts.  Above I’ve filled out the basic information.  Click next to continue…

image

Enter the credentials you want to use and enable SSH access to the edge (I’m a network guy, I still need CLI).  Click Next to move on…

Note: For packet capture reasons, I decided to deploy my DLR control VMs on the management cluster.  In order to do this, I had to go back to the transport zone for tenant1 and add the management cluster to it.  Despite doing this, the change didn’t seem to ‘take’.  I rebooted the NSX controllers, then the managers.  I still couldn’t provision the DLRs to the management cluster.  I finally rebooted vCenter, which resolved the issue.

image

As you can see above, I chose to deploy the DLRs to the management cluster.  This is so I’ll be able to more easily implement packet captures between nodes later on.  The next thing we need to do is allocate the IP addressing for the DLR.  We’ll need an IP in each logical switch for the VMs to use as a gateway.  In addition, the DLR allows you to provision a management interface.  We’ll pick an IP out of the ESX management VLAN for the management interface and use the following networks for the logical switch interfaces…

image
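
Here’s that plan sketched out in one place.  The VM addresses (.74 and .66) show up later in this post; the DLR gateway addresses (.73 and .65) are my reading of the screenshot above, so treat those two as assumptions…

```python
import ipaddress

# Tenant 1 addressing plan.  Gateway IPs are assumed from the screenshot above.
tenant1 = {
    "tenant1_app": {"subnet": "10.20.20.72/29", "dlr_gateway": "10.20.20.73", "vm": "10.20.20.74"},
    "tenant1_web": {"subnet": "10.20.20.64/29", "dlr_gateway": "10.20.20.65", "vm": "10.20.20.66"},
}

for name, seg in tenant1.items():
    net = ipaddress.ip_network(seg["subnet"])
    # Both the gateway and the VM should land inside the segment's /29.
    assert ipaddress.ip_address(seg["dlr_gateway"]) in net
    assert ipaddress.ip_address(seg["vm"]) in net
    print(f"{name}: {net} gw={seg['dlr_gateway']} vm={seg['vm']}")
```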
When provisioning the interfaces on the DLR, ensure you select the ‘Internal’ option for the Type.  Then select the logical switch you want the interface on and assign an IP address to it…

image

Do the same for the second interface for the web segment…

image

When you’re done, you should have both interfaces configured as shown below…

image

Note: If you noticed, you don’t get to pick a default gateway for the management IP. What does this mean? As far as I can tell, it makes the management IP useless. I’ll need to follow up on this to see what it can be used for.

The Configure HA screen asks for two IP addresses to use in the HA configuration.  As far as I know, these IPs only need to be locally significant, so I just picked two randomly…

image 

On the last screen, make sure that everything looks correct and then click Finish…

image

After you click Finish, NSX will begin deploying the DLR VMs.  Keep in mind that these VMs aren’t in the data path; they just provide control plane operations for the DLR instance on each physical ESX host. 

Once they’re deployed, we should see two DLR VMs in the management cluster…

image

In addition, NSX should report the edge as deployed…

image

So now that the DLR is deployed, let’s check our VMs.  As shown above, the App VM has an IP address of 10.20.20.74/29 and the Web VM has an IP address of 10.20.20.66/29.  Since they’re in different subnets, they’ll have to route to talk to each other…

So let’s take a look at the App VM…

image

As you can see, its IP is correct and it can ping the web VM off subnet.  Let’s check the web VM and see if we get similar results…

image

Yep, it looks like we’re routing just fine.  However, this doesn’t seem like anything crazy at this point.  We’ve connected two VMs, on two different hosts, that normally would have had to route to talk to each other anyway.  All we did was make them route through a logical router.  The tenant 1 configuration looks like this on our diagram…

image

With the black dotted line I’m showing the path the VM’s traffic took to reach the web server.  Seems like nothing’s really changed, right?  We’re still routing.  Actually, a lot has changed: we’re tunneling the routed VM traffic with VXLAN. 

1 – The app server tries to talk to the web server.  Being off subnet, it needs to talk to its default gateway, which is on the DLR.
2 – The DLR on the local ESX host receives the traffic and knows that the destination (the web server) is on one of its directly connected interfaces.  That directly connected interface is logical switch tenant1_web.  Through the NSX control plane, the DLR knows that the web server is actually on another host.  It encapsulates the original packet in a VXLAN header and sends the packet towards the physical network.
3 – The encapsulated VXLAN packet now reaches the physical NIC and has a source of the VTEP interface on Thumper3 and a destination of the VTEP interface on Thumper2.  The ESX host must now encapsulate the packet in a dot1q header for VLAN 119 to get it onto the physical network.
4 – The MLS receives the dot1q packet, strips the layer 2 header, and routes the outer (VTEP-to-VTEP) IP packet.
5 – Leaving the physical switch, the packet gets retagged with another dot1q header for VLAN 118, which is the VLAN where the VTEP interface for Thumper2 resides.
6 – Thumper2 receives the packet, strips the header, and passes the VXLAN packet to the DLR.
7 – The DLR strips off the VXLAN header and examines the inner IP packet.  Since we are now on the right host, the DLR forwards the IP packet accordingly.
8 – The Web server receives the packet.

To see this in action, let’s look at a packet capture I pulled off the wire of a ping between the app and web servers…

Note: The VLAN tag shows as 118 in both packet captures.  This is because I was only spanning the packets on the switch interface facing Thumper2.

image

Let’s look at what happened and see if we can match it up with some of the steps from above…

-The original data packet has a source of 10.20.20.74 (Tenant1-App VM) and a destination of 10.20.20.66 (Tenant1-Web VM).
-The original data packet is encapsulated in VXLAN.  Note the segment ID is 5000, which matches the segment ID given to the Web logical switch.
-The VXLAN outer packet has a source of 10.20.20.42 (the VTEP on Thumper3) and a destination of 10.20.20.35 (the VTEP on Thumper2).
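
To make the nesting explicit, here’s a hedged Scapy reconstruction of that frame using the values above.  The MAC addresses and the UDP source port aren’t something I pulled from the capture, so treat those as placeholders; everything else matches the breakdown…

```python
from scapy.all import Ether, Dot1Q, IP, UDP, ICMP
from scapy.layers.vxlan import VXLAN

frame = (
    Ether(src="00:00:00:aa:aa:aa", dst="00:00:00:bb:bb:bb")    # placeholder MACs (host NIC / switch)
    / Dot1Q(vlan=118)                                          # tag seen on the Thumper2-facing port (see note above)
    / IP(src="10.20.20.42", dst="10.20.20.35")                 # Thumper3 VTEP -> Thumper2 VTEP
    / UDP(sport=49152, dport=4789)                             # VXLAN's UDP port; sport is a placeholder
    / VXLAN(vni=5000)                                          # segment ID of the tenant1_web logical switch
    / Ether(src="00:00:00:cc:cc:cc", dst="00:00:00:dd:dd:dd")  # placeholder MACs for the two VMs
    / IP(src="10.20.20.74", dst="10.20.20.66")                 # Tenant1-App -> Tenant1-Web
    / ICMP()                                                   # the ping itself
)
frame.show()  # prints the nested headers, outermost to innermost
```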

The return looks similar but in reverse…

image

The VXLAN encapsulation is what makes NSX so powerful.  We can fully encapsulate layer 2 and layer 3 in a layer 3 header and route it.  So like I said, this is nice and all, but let’s look at an example of where NSX can really shine when using VXLAN.  Let’s move ahead with setting up tenant 2. 

Tenant 2
As we said earlier, we’re going to deploy the tenant 2 app and web VMs on the same subnet so they’re layer 2 adjacent.  Normally, this would mean that the two VMs would need to be on the same host in the same port-group (or same VLAN), or on separate hosts that trunked the same VLAN.  You’ll note that in our case, each of the tenant 2 VMs is on a different physical host per the diagram above.  In addition, I’m not trunking a common VLAN to both hosts for this purpose.  So let’s deploy a new logical switch just called ‘tenant2’…

image

Now let’s add both of the tenant2 VMs onto the logical switch (I’m not going to show how to do that since we did it above for tenant1).  Once both of the VMs are on the same logical switch, let’s take a look at the IP address allocation we have for tenant2…

image

So, pretty straightforward.  Now let’s try to ping from VM to VM and see what happens…

image

Cool, it works.  Wait, what?!?!  Yeah, that’s right.  NSX just extended layer 2 for us across layer 3 boundaries.  Let’s look at one of those pings in the packet capture so you can see it in action…

  image

Check out the MAC addresses in the VXLAN encapsulated packet.  They match up perfectly with the MAC addresses on the VMs…

image

image

As we’ve seen today, NSX can create logical networks that encapsulate layer 2 and layer 3 network traffic inside of VXLAN.  In addition, NSX’s control plane removes the need for me to support multicast on the physical network gear when using VXLAN.  In other words, this all just sort of worked without having to tweak my physical network config.  Next up, we’ll start looking at routing logical networks back to the physical network.


VCP – EVC

EVC, or Enhanced vMotion Compatibility, is a feature that ensures vMotion compatibility within a cluster.  That is, you could have a cluster that contains a mixed batch of servers: some with newer Intel processors, some with older Intel processors, and so forth.  The problem is that, based on the feature set of the CPU, certain features are presented to the VM guest.  If you try to vMotion a guest to a host whose CPU lacks one of those features, vMotion will fail.  EVC takes into consideration the CPUs on all of the hosts in the cluster and basically computes the lowest common denominator.  It then presents that feature set to the VM guests.  So even though a newer server might have a newer feature set, it would be masked by the hypervisor to ensure vMotion compatibility with the other hosts in the cluster. 
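
Conceptually, you can think of the EVC baseline as a set intersection across the hosts’ CPU feature flags.  The sketch below is purely illustrative (hypothetical host names and a made-up handful of features), not how vSphere actually implements it…

```python
# Hypothetical hosts and features, just to illustrate the 'lowest common denominator' idea.
host_cpu_features = {
    "esx01": {"sse4_2", "aes", "avx", "avx2"},  # newest CPU
    "esx02": {"sse4_2", "aes", "avx"},
    "esx03": {"sse4_2", "aes"},                 # oldest CPU in the cluster
}

# The baseline feature set every guest in the cluster gets to see.
evc_baseline = set.intersection(*host_cpu_features.values())
print(sorted(evc_baseline))  # ['aes', 'sse4_2']
```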

EVC is turned on at the cluster level.  To enable EVC, right click the cluster and edit settings…

image

As you can see on my cluster, the EVC feature is turned off.  Let’s try and turn it on.  Click the ‘Change EVC Mode’ button…

image

For kicks, I tried enabling EVC for AMD hosts.  The compatibility checker immediately noticed that I actually have Intel-based hosts.  It also noticed that the CPUs I do have are not compatible with EVC.  The CPU has to have a specific feature to support feature set masking, and the processors I have don’t support it.

A couple of other quick notes to consider…

-EVC can only be configured on a cluster that doesn’t have any running guests.  This is to ensure that a feature currently in use by a guest doesn’t get taken away (masked) by EVC.

-EVC masks features, so in some cases this can be undesirable.  Keep that in mind when you build your clusters.  Purpose-built clusters are always a good idea in my mind.

-EVC can save you some pain when using things like DRS.  I’ve configured DRS before and then days later determined it wasn’t working because the hosts weren’t vMotion compatible. 

