Switching

So here’s another common question I see from other engineers.  Goes something like this…

"I ran a traceroute from the switch and I’m getting all sorts of goofy responses.  Lines that show multiple hops and hops that definitely shouldn’t appear where they do.  What’s going on?"

What they are referring to is the dreaded multiple response per hop output…

MLS# traceroute 192.168.5.1
Type escape sequence to abort.
Tracing the route to 192.168.5.1
  1 172.172.172.26 464 msec
    172.172.172.22 360 msec
    172.172.172.26 80 msec
  2 172.172.172.2 960 msec
    172.172.172.6 1004 msec
    172.172.172.2 1216 msec
  3 172.172.172.14 1216 msec 1680 msec 1852 msec

So you might look at this and be entirely confused. Or you might look at it and know that you probably should have run the trace from a workstation rather than an MLS, without knowing why (the boat I was in a few years ago). Let’s dig in and see why we get output like this.

First off, you need to know how traceroute works. If you don’t (you should), I’ll explain it quickly. The machine generating the traceroute sends packets towards the destination, incrementing the packet’s TTL with each new round of probes. Basically, it starts with a TTL of 1. The next hop router (the next layer 3 hop) gets the packet and decrements the TTL by one. If the resulting TTL is 0, the router can’t send the packet on its way, so it instead returns a ‘TTL Exceeded’ message to the host that sent the packet. When the machine generating the trace gets the response, it sends another packet towards the destination with a TTL 1 larger than the last one. This way, the host generating the traceroute gets a response back from each layer 3 hop on the way to the destination. Cisco devices send three probes at each TTL by default, which is why each hop in the output above shows three results. Make sense?
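If it helps to see the TTL logic spelled out, here is a minimal Python sketch of it. This is a simulation, not a real traceroute: nothing is sent on the wire, the function name is my own, and the hop addresses in the example path are made up for illustration.

```python
def simulate_traceroute(path, max_ttl=30):
    """Return (ttl, responder, kind) for each TTL value that gets probed.

    path is a non-empty list of layer 3 hops; the last entry is the destination.
    """
    results = []
    last = len(path) - 1
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for index, hop in enumerate(path):
            if index == last:
                # The probe reached the destination, which answers for itself.
                results.append((ttl, hop, "reply"))
                break
            remaining -= 1  # each layer 3 hop decrements the TTL by one
            if remaining == 0:
                # This hop cannot forward the packet, so it reports back.
                results.append((ttl, hop, "ttl-exceeded"))
                break
        if results[-1][2] == "reply":
            break  # destination reached, stop raising the TTL
    return results


# Hypothetical three-hop path, for illustration only.
HYPOTHETICAL_PATH = ["10.0.0.1", "10.0.1.1", "192.168.5.1"]

for ttl, responder, kind in simulate_traceroute(HYPOTHETICAL_PATH):
    print(ttl, responder, kind)
```

Each pass through the outer loop is one TTL value, and the responder for that TTL is whichever hop runs the counter down to zero.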

So now that we know that, what’s going on with the output we are getting? Basically, the host has two equal cost paths to the destination you are trying to reach. At this point, a smart network engineer would say "Ok, but doesn’t the MLS use the CEF default of per-destination load balancing?". Ah ha! You’ve hit the nail on the head here! CEF defaults to per-destination load balancing. This ensures that traffic from the same source, headed to the same destination, always takes the same path (saving us from out of order packets). So that being said, this still doesn’t make sense. If each probe takes the same path, then why are we getting two different paths at the first two hops?

The answer is so simple it initially baffled me. I can’t help myself here, so let me give you one more clue before I give you the answer. If the same traceroute were run from a standalone device plugged into this same MLS, we would get a traceroute with the same hop for each probe per TTL (in other words, a normal looking traceroute). Did that help?

Ok, ok. So the difference is that in this case, we are running the trace from the MLS itself. That being said, we aren’t technically using the CEF table to do destination lookups. Since the MLS originates the packets, it process switches them. When the traceroute probes are process switched, they use the default process switching load balancing method, which is per-packet. Since the host we are generating the packets from has two equal cost paths, per-packet load balancing sends the first probe one way, the second probe the other way, and the third probe the first way again. The same goes for each subsequent hop. If we generated the trace from a PC plugged into this MLS, the PC wouldn’t have two equal cost paths. The MLS does, but it’s just going to do per-destination load balancing on the trace probes that the PC sends. So we won’t see different results for each probe.
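To make the difference concrete, here is a small Python sketch of the two load balancing behaviors. It is only a model under my own assumptions: the round-robin picker stands in for per-packet balancing, and a CRC32 of the source and destination stands in for the per-destination hash (the real CEF hash is different). The source address is made up; the two next hops are the ones from hop 1 of the trace above.

```python
import zlib
from itertools import cycle

# The two equal cost next hops seen at hop 1 in the trace above.
NEXT_HOPS = ["172.172.172.26", "172.172.172.22"]


def per_packet(next_hops):
    """Round-robin across the next hops, one packet at a time."""
    rotation = cycle(next_hops)
    return lambda src, dst: next(rotation)


def per_destination(next_hops):
    """Pin each source/destination pair to one next hop.

    CRC32 is only a stand-in here, not the hash CEF actually uses.
    """
    def pick(src, dst):
        key = f"{src}->{dst}".encode()
        return next_hops[zlib.crc32(key) % len(next_hops)]
    return pick


def probe_next_hops(pick, src, dst, probes=3):
    """Return the next hop chosen for each of the probes sent at one TTL."""
    return [pick(src, dst) for _ in range(probes)]


# Hypothetical source address, for illustration only.
SRC, DST = "10.1.1.10", "192.168.5.1"
print("per-packet:     ", probe_next_hops(per_packet(NEXT_HOPS), SRC, DST))
print("per-destination:", probe_next_hops(per_destination(NEXT_HOPS), SRC, DST))
```

The per-packet list alternates .26, .22, .26, just like hop 1 in the trace, while the per-destination list repeats a single next hop for all three probes.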

Does that make sense?  The bottom line is that there is significant difference between a switch forwarding a packet, and a switch generating a packet. 


Since I’ve recently become more interested in the actual switching and fabric architectures of Cisco devices, I decided to take a deeper look at the 6500 series switches. I’ve worked with them for years but until recently I didn’t have a solid idea of how they actually switched packets. I had a general idea of how it worked and why DFCs were a good thing, but I wanted to know more. Based on my research, this is what I’ve come up with. I’d love to hear any feedback on the post since there is a chance that some of what I’ve read isn’t totally accurate. That being said, let’s dive right in…

Control vs Data Plane
All actions on a switch can be considered to be part of either the control or the data plane. The 6500 series switch is a hardware based switch, which means that it performs switching in hardware rather than software. The pieces of the switch that perform switching in hardware are considered to be part of the data plane. That being said, there still needs to be a software component of the switch that tells the data plane how to function. The parts of the switch that function in software are considered to be the control plane. These components make decisions and perform advanced functions which then tell the data plane how to function. Cisco’s implementation of forwarding in hardware is called CEF (Cisco Express Forwarding).

Switch Design
The 6500 series switch is a modular switch that is comprised of a few main components. Let’s take a look at each briefly.

The Chassis
The 6500 series switch comes in many shapes and sizes. The most common (in my opinion) is the 6509. The last number indicates the number of slots in the chassis itself. There are also 3, 4, 6, and 13 slot chassis available. The chassis is what holds all of the other components and facilitates connecting them together. The modules plug into a circuit board called the backplane.

The Backplane
The backplane is the most crucial component of the chassis.  It has all of the connectors on it that the other modules plug into.  It has a few main components that are highlighted on the diagram below.

image 
The diagram shows the backplane of a standard 9 slot chassis.  Each slot has a connection to the crossbar switching fabric, the three buses (D,R,C) that compose the shared bus, and a power connection. 

The switch fabric in the 6500 is referred to as a ‘crossbar’ fabric.  It provides unique paths for each of the connected modules to send and receive data across the fabric.  In initial implementations the SUP didn’t have an integrated switch fabric which required the use of a separate module referred to as the SFM (Switch Fabric Module).  With the advent of the SUP720 series of SUPs the switch fabric is now integrated into the SUP itself.  The cross bar switching fabric provides multiple non-blocking paths between different modules.  The speed of the fabric is a function of both the chassis as well as the device providing the switch fabric. 

Standard 6500 Chassis
  Provides a total of 40Gbps per slot
Enhanced (E) 6500 Chassis
  Provides a total of 80Gbps per slot

SFM with SUP32 Supervisor
  Single 8 gig Fabric Connection
  256Gbps switching fabric
  18 Fabric Channels
SUP720 through SUP720-3B Supervisor
  Single 20 gig Fabric Connection
  720Gbps switching fabric
  18 Fabric Channels
SUP720-3C Supervisor
  Dual 20 gig Fabric Connections
  720Gbps switching fabric
  18 Fabric Channels
SUP2T Supervisor
  Dual 40 gig Fabric Connections
  2.08Tbps switching fabric
  26 Fabric Channels

So as you can see, there are quite a few combinations you can use here.  The bottom line is that with the newest SUP2T and the 6500e chassis, you could have a module with eight 10Gbps ports that wasn’t oversubscribed.
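The oversubscription math is simple enough to check in a few lines of Python. This is just arithmetic on the per-slot figures listed above; the function name is my own.

```python
def oversubscription_ratio(port_count, port_gbps, slot_gbps):
    """Ratio of total front panel bandwidth to fabric bandwidth for one slot.

    A value above 1.0 means the module is oversubscribed.
    """
    return (port_count * port_gbps) / slot_gbps


# Eight 10Gbps ports in an enhanced chassis slot (80Gbps per slot).
print(oversubscription_ratio(8, 10, 80))

# The same module in a standard chassis slot (40Gbps per slot).
print(oversubscription_ratio(8, 10, 40))
```

The first case works out to a 1:1 ratio, and the second to 2:1, which is why the chassis matters as much as the SUP.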

The other bus in the 6500 is referred to as a shared bus.  In the initial 6500 implementation the fabric bus wasn’t used.  Rather, all communication came across the shared bus.  The shared bus is actually comprised of 3 distinct buses.

DBus (Data Bus) – Is the main bus in which all data is transmitted.   The speed of the DBus is 32Gbps.
RBus (Results Bus) – Used by the supervisor to forward the result of the forwarding operation to each of the attached line cards.  The speed of the RBus is 4Gbps.
CBus (Control Bus) – Relays information between line cards and the supervisor. This is also sometimes referred to as Ethernet Out of Band, EOB, or EOBC (Ethernet Out of Band Channel). The speed of the CBus is 100Mbps half duplex.

The Supervisor (Or as we’ll call them, SUPs)
The switch supervisor is the brains of the operation. In the initial implementation of the 6500 the SUP handled the processing of all packets and made all of the forwarding decisions. A supervisor is made up of three main components: the switch fabric, the MSFC (Multi-Layer Switch Feature Card), and the PFC (Policy Feature Card). The image below shows a top down view of a SUP 720 and the location of each component on the physical card.

image

 

MSFC – The Multi-Layer Switch Feature Card is considered to be the control plane of the switch.  The MSFC runs processes that help build and maintain the layer 3 forwarding table (routing table), process ACLs, run routing protocols, and other services that are not run in hardware.  The MSFC is actually comprised of two distinct pieces. 

SP – The SP (Switch Processor) handles booting the switch. The SP copies the SP part of an IOS image from bootflash, boots itself, and then copies the RP part of the IOS image to the RP. Once the RP is booted the SP hands control of the switch over to the RP. From that point on the RP is what the administrator talks to in order to administer the switch. In most cases, the SP still handles layer 2 switch protocols such as ARP and STP.

RP – The RP (Route Processor) handles all layer 3 functions of the 6500, including running routing protocols and building the RIB from which the FIB is populated. Once the FIB is built on the RP it can be downloaded to the data plane TCAM for hardware based forwarding of packets. The RP runs in parallel with the SP, which continues to provide the layer 2 functions of the switch.

PFC – The Policy Feature Card receives a copy of CEF’s FIB from the MSFC. Since the MSFC doesn’t actually deal with forwarding any packets, the MSFC downloads the FIB into the hardware on the PFC. Basically, the PFC is used to accelerate layer 2 and layer 3 switching, and it learns how to do that from the MSFC. The PFC is considered to be part of the data plane of the switch.
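Here is a rough Python sketch of that control plane / data plane split. It assumes a toy RIB (the prefixes are made up, the next hops are borrowed from the traceroute post) and models the downloaded FIB as a list searched longest prefix first, which is the matching rule a FIB lookup follows. The real PFC does this in TCAM, not in a loop, and the function names are my own.

```python
import ipaddress


def build_fib(rib):
    """Control plane side: turn a {prefix: next_hop} RIB into a lookup table.

    Entries are ordered longest prefix first so the first match wins.
    """
    entries = [(ipaddress.ip_network(prefix), next_hop)
               for prefix, next_hop in rib.items()]
    return sorted(entries, key=lambda entry: entry[0].prefixlen, reverse=True)


def fib_lookup(fib, destination):
    """Data plane side: return the next hop for the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    for network, next_hop in fib:
        if address in network:
            return next_hop
    return None  # no matching route


# Hypothetical routes, for illustration only.
RIB = {
    "0.0.0.0/0": "172.172.172.2",
    "192.168.5.0/24": "172.172.172.26",
}

fib = build_fib(RIB)
print(fib_lookup(fib, "192.168.5.1"))
print(fib_lookup(fib, "8.8.8.8"))
```

The point of the split is that `build_fib` runs once whenever routing changes, while `fib_lookup` is the part that has to happen for every packet.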

Line Cards
The line cards of a 6500 series switch provide the port density to connect end user devices.  Line cards come in different port densities and support many different interface types.  Line cards connect to the SUP via the backplane. 

The other pieces…
The 6500 also has a fan tray slot as well as two slots for redundant power supplies.  I’m not going to cover these in detail since they don’t play into the switch architecture. 

Switching modes
Now that we’ve discussed the main components of the 6500, let’s talk about the different ways in which a 6500 switches packets. There are 5 main modes in which this occurs, and the mode that is used relies heavily on what type of hardware is present in the chassis.

Classic mode
In classic mode the attached modules make use of the shared bus in the chassis. When a switchport receives a packet it is first locally queued on the card. The line card then requests permission from the SUP to send the packet onto the DBUS. If the SUP says yes, the packet is sent onto the DBUS and subsequently copied to the SUP as well as all other line cards. The SUP then performs a lookup on the PFC. The result of that lookup is sent along the RBUS to all of the cards. The card containing the destination port receives information on how to forward the packet, while all other cards receive word to terminate processing on the packet and delete it from their buffers. The speed of classic mode is 32Gbps half duplex since it’s a shared medium.
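The classic mode flow can be sketched in Python roughly like this. It is a toy model under my own assumptions: the class and function names are invented, the PFC lookup is a plain dictionary, the port names are made up, and bus arbitration (asking the SUP for permission) is left out.

```python
class LineCard:
    """Toy line card that buffers DBus copies and acts on RBus results."""

    def __init__(self, slot, ports):
        self.slot = slot
        self.ports = set(ports)
        self.buffer = []   # packets copied off the DBus
        self.sent = []     # (egress_port, packet) pairs actually transmitted

    def on_dbus(self, packet):
        # Every card on the shared bus gets a copy of the packet.
        self.buffer.append(packet)

    def on_rbus(self, packet, egress_port):
        # Every card sees the result; only the card owning the port forwards.
        self.buffer.remove(packet)
        if egress_port in self.ports:
            self.sent.append((egress_port, packet))


def classic_forward(packet, cards, pfc_table):
    """Flood the packet on the DBus, look it up, then flood the result."""
    for card in cards:
        card.on_dbus(packet)
    egress_port = pfc_table[packet["dst"]]  # stand-in for the PFC lookup
    for card in cards:
        card.on_rbus(packet, egress_port)
    return egress_port


# Hypothetical cards, ports, and forwarding entry, for illustration only.
cards = [LineCard(1, ["Gi1/1", "Gi1/2"]), LineCard(2, ["Gi2/1", "Gi2/2"])]
pfc_table = {"192.168.5.1": "Gi2/1"}
classic_forward({"dst": "192.168.5.1"}, cards, pfc_table)
print([(card.slot, card.sent) for card in cards])
```

Notice that every card touches every packet, which is exactly why the shared bus becomes the bottleneck.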

CEF256
In CEF256 mode each module has a connection to the shared 32Gbps bus as well as an 8Gbps connection to the switch fabric. In addition, each line card has a local 16Gbps bus (LCDBUS) on the card itself. When a switchport receives a packet it is flooded on the LCDBUS and the fabric interface receives it. The fabric interface floods the packet header onto the DBUS. The PFC receives the header and makes the forwarding decision. The result is flooded on the RBUS back to the line card and the fabric interface receives the forwarding information. At that point, the entire packet is sent across the 8Gbps fabric connection to the destination line card. The fabric interface on the egress line card floods the packet on the LCDBUS and the egress switchport sends the packet on its way out of the switch.

dCEF256
In dCEF256 mode each line card has dual 8Gbps connections to the switch fabric and no connection to the shared bus. In this mode, the line card also has a DFC (Distributed Forwarding Card) which holds a local copy of the FIB as well as its own layer 2 adjacency table. Since the card doesn’t need to forward packets or packet headers to the SUP for processing, there is no need for a connection to the shared bus. Additionally, dCEF256 cards have dual 16Gbps local line card buses. The first LCDBUS handles half of the ports on the line card and the second LCDBUS handles the other half. Communication from a port on one LCDBUS to a port on the second LCDBUS goes through the switch fabric. Since the line card has all of the forwarding information that it needs, it can forward packets directly across the fabric to the egress line card without talking to the SUP.

CEF720
Identical in operation to CEF256 but with some upgrades. The switch fabric is now integrated into the SUP rather than on an SFM, and the dual fabric connections from each line card are now 20Gbps apiece rather than 8Gbps.

dCEF720
Identical to dCEF256 with addition of same upgrades present in CEF720 (Faster fabric connections and SF in SUP). 

Centralized vs Distributed Forwarding
I had indicated earlier that the early implementations of the switch utilized the SUP to make all switching and forwarding decisions. This is considered centralized switching since the SUP provides all of the functionality required to forward a packet or frame. Let’s take a look at how a packet is forwarded using centralized forwarding.

Line cards by default (in most cases) come with a CFC or centralized forwarding card.  The card has enough logic on it to know how to send frames and packets to the Supervisor when it needs an answer.  In addition, most cards can accept a DFC or distributed forwarding card.  DFCs are the functional equivalent to the PFC located on the SUP and hold an entire copy of CEF’s FIB and adjacency tables.  With a DFC in place, a line card can perform distributed forwarding which takes the SUP out of the picture. 

How centralized forwarding works…
1. Frame arrives at the port on a line card and is passed to the CFC on the local line card.
2. The bus interface on the CFC forwards the headers to the supervisor on the DBus.  All other line cards connected to the DBus ignore the headers.
3. The PFC on the supervisor makes a forwarding decision based on the headers and floods the result on the RBus.  All other line cards on the RBus ignore the result.
4. The CFC forwards the results, along with the packet, to the line card’s fabric interface. The fabric interface forwards the results and the packet onto the switch fabric towards their final destination.
5. The egress line card’s fabric ASIC receives the packet and forwards the data out towards the egress port. 

How distributed forwarding works…
1. Frame arrives at the port on a line card and is passed to the fabric interface on the local line card. 
2. The fabric interface sends just the headers to the DFC located on the local line card.
3. The DFC returns the forwarding decision of its lookup to the fabric interface.
4. The fabric interface transmits the packet onto the switch fabric and towards the egress line card.
5. Egress line card receives the packet and forwards the packet on to the egress port.

So as you can see, distributed forwarding is much quicker than centralized forwarding just from a process perspective.  In addition, it doesn’t require the use of the shared bus. 
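The two step lists above can be put side by side in a short Python sketch. It assumes a dictionary as a stand-in for the FIB and simply records which components a packet’s headers touch; the function name, trace strings, and port name are my own.

```python
def forward(packet, fib, has_dfc):
    """Return (egress_port, trace) for one packet.

    has_dfc selects distributed forwarding (local DFC lookup) over
    centralized forwarding (lookup on the supervisor's PFC).
    """
    trace = ["frame arrives on ingress port"]
    if has_dfc:
        trace.append("headers -> DFC on the local line card")
        trace.append("DFC result -> local fabric interface")
    else:
        trace.append("headers -> DBus -> PFC on the supervisor")
        trace.append("PFC result -> RBus -> ingress line card")
    egress_port = fib[packet["dst"]]  # same FIB contents either way
    trace.append(f"packet -> switch fabric -> egress port {egress_port}")
    return egress_port, trace


# Hypothetical forwarding entry, for illustration only.
FIB = {"192.168.5.1": "Gi2/1"}

for has_dfc in (False, True):
    port, steps = forward({"dst": "192.168.5.1"}, FIB, has_dfc)
    print("distributed" if has_dfc else "centralized", "->", port)
    for step in steps:
        print("   ", step)
```

Both paths arrive at the same egress port since the DFC holds a copy of the same FIB. The only difference is whether the lookup has to cross the shared bus to get an answer.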

Conclusion
There are many pieces of the 6500 that I didn’t cover in this post but hopefully it’s enough to get you started if you are interested in knowing how these switches work.  Hopefully I’ll have time soon to do a similar post on the Nexus 7000 series switch.

Tags: ,

For those of you who don’t know what MDIX is, it stands for Medium Dependent Interface Crossover. In other words, it’s the feature on switches that allows you to use a patch (straight through) cable rather than a crossover cable to interconnect switches. It’s a great feature to have, but there is some debate as to whether or not it should be used.

Personally I never use it. Why? Since I started working with Cisco gear it has been beaten into my head that trunks use crossover cables. That’s just how it was. Truthfully, most trunk links these days are going to be fiber, but if we do run across a copper trunk we’ll use a crossover cable.

So why would we still mess around with using crossover cables when managed switches can flip the pairs for us?  Because there are a few things that you might not know about the auto-mdix feature on Cisco switches that can leave you perplexed if you don’t fully understand it.

The one big problem with auto-mdix is that you HAVE to use auto duplex and auto speed settings on the trunk ports.  Let’s take a look at an example of an auto-mdix configuration.

2940# config t
Enter configuration commands, one per line.  End with CNTL/Z.
2940(config)# int faste0/4
2940(config-if)# mdix auto
2940(config-if)#
1w6d: %LINK-3-UPDOWN: Interface FastEthernet0/4, changed state to up
1w6d: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/4, changed state to up
2940(config-if)# duplex full
1w6d: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/4, changed state to down
1w6d: %LINK-3-UPDOWN: Interface FastEthernet0/4, changed state to down
2940(config-if)#

I start by configuring the port for auto-mdix. As you can see, the instant I configure the option, the interface comes up. However, the instant I hard code the port to a duplex of full, the interface goes down. A documented requirement of the auto-mdix feature is that you have to let both sides do auto duplex and speed negotiation. So, if your company standard is to hard code speed and duplex on trunk ports, then you’ll be using a crossover cable.
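If you want to catch this combination before it bites someone, a quick script over a saved config can flag it. The Python sketch below assumes IOS-style text where each `interface` line starts a section and `!` ends it; the function names and the sample config are my own, made up for illustration.

```python
def is_hard_coded(line):
    """True for 'speed X' or 'duplex X' where X is anything other than auto."""
    parts = line.split()
    return (len(parts) == 2
            and parts[0] in ("speed", "duplex")
            and parts[1] != "auto")


def mdix_conflicts(config_text):
    """Map interface name -> hard coded speed/duplex lines on auto-mdix ports."""
    interfaces = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            interfaces[current] = []
        elif line == "!" or not line:
            current = None  # end of the interface section
        elif current is not None:
            interfaces[current].append(line)

    conflicts = {}
    for name, lines in interfaces.items():
        hard_coded = [line for line in lines if is_hard_coded(line)]
        if "mdix auto" in lines and hard_coded:
            conflicts[name] = hard_coded
    return conflicts


# Hypothetical saved configuration, for illustration only.
SAMPLE_CONFIG = """
interface FastEthernet0/3
 switchport mode trunk
 duplex full
!
interface FastEthernet0/4
 switchport mode trunk
 mdix auto
 duplex full
!
"""

print(mdix_conflicts(SAMPLE_CONFIG))
```

Only FastEthernet0/4 gets flagged here, since FastEthernet0/3 hard codes duplex without relying on auto-mdix.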

I will admit that it’s a great feature to use in a pinch. Sometimes I just don’t have a crossover cable with me, and in those cases I’ll use it temporarily. But I always go back and put a crossover cable in its place. That’s just me though. I know some people who use it religiously, and others who won’t touch it. I see it as an unnecessary complication that can cause issues down the road. If an engineer doesn’t see the auto-mdix configuration and sets the speed or duplex, they can end up stumped for days. You should be aware that it’s an option, but also be aware of its limitations.

