After some reflection I’ve realized that while I’ve spent a lot of time talking about BGP in its many forms I haven’t really ever done a deep dive on it. To be clear – I’m not aiming to talk about how to configure BGP, or how path selection works, or even how to troubleshoot BGP. What I want to examine is what BGP is doing on the wire: how it communicates with peers, when it sends updates, and what kind of things are in the updates. I’m hoping to write several posts starting with the basics and then diving deeper as we go. That said, let’s get started!
To start things out, I think it makes sense to use a simple lab consisting of two BGP nodes that are peering together. Something like this…
I don’t want to spend a lot of time focusing on the configuration syntax and basic configuration parameters, so let’s just run BIRD on both of the nodes so we can get off the ground with minimal effort. Let’s assume that both BGP peers shown above are just normal Ubuntu VMs and that both have a single interface on a common 169.254.10.0/31 network.
Note: All of the links will come out of the /24 and be /31 IPv4 networks.
To install BIRD we’ll simply….
apt install bird2
Once that completes we should be up and running with a default BIRD configuration. You can validate this by simply typing birdc show proto…
root@vm0:~# birdc show proto
BIRD 2.0.7 ready.
Name Proto Table State Since Info
device1 Device --- up 18:12:43.178
direct1 Direct --- down 18:12:43.178
kernel1 Kernel master4 up 18:12:43.178
kernel2 Kernel master6 up 18:12:43.178
static1 Static master4 up 18:12:43.178
root@vm0:~#
Perfect! Now the default configuration that comes with BIRD is meant to be more of an example of what’s possible rather than a meaningful configuration. That said – let’s add our own basic configuration to the BIRD-0 host. You can simply overwrite the configuration in /etc/bird/bird.conf with this…
log syslog all;
router id 169.254.10.0;
debug protocols {states,interfaces,events};
protocol bgp bird_1_iBGP_peering {
  local as 65000;
  neighbor 169.254.10.1 as 65000;
  ipv4 {
    import all;
    export all;
  };
}
And then restart the bird service (systemctl restart bird). Now let’s do something similar on the BIRD-1 host…
log syslog all;
router id 169.254.10.1;
debug protocols {states,interfaces,events};
protocol bgp bird_0_iBGP_peering {
  local as 65000;
  neighbor 169.254.10.0 as 65000;
  ipv4 {
    import all;
    export all;
  };
}
After you have this configuration in place on both nodes, and have restarted the BIRD service, you should be able to run the show proto command again and see something like this…
root@vm0:~# birdc show proto
BIRD 2.0.7 ready.
Name Proto Table State Since Info
bird_1_iBGP_peering BGP --- up 18:53:51.344 Established
root@vm0:~#
Woohoo! That might be the quickest zero to BGP peering we’ve ever done here! At this point all we have is a BGP session, but that by itself required a lot of communication to happen. I’d like to spend the rest of this post talking about what happened on the wire for these two BGP peers to decide it was OK to form a peering relationship. So let’s kill the BIRD service on the BIRD-0 host, start a packet capture on the wire, and then restart the service so we can see the communication that happens when the hosts try to BGP peer…
Now this is hard to see but above is the total output of our BGP session startup. So let’s walk through these packets one at a time to make sure we know what’s going on…
Packet 1 – Packet one shows the host BIRD-1 (169.254.10.1) attempting to set up a TCP session to the host BIRD-0 (169.254.10.0) by sending a SYN packet. This makes sense as we only stopped the BIRD service on BIRD-0, so BIRD-1 will happily try to kick things off.
Packet 2 – The host BIRD-0 sends a reset (RST) to BIRD-1. It does this because it’s not listening on TCP port 179 for BGP yet as the service is still down.
Packet 3 – We restart the BIRD service on the host BIRD-0 and it immediately attempts to set up a TCP connection to the host BIRD-1 by sending a SYN packet. Let’s use this TCP session setup as a means to remind ourselves how the TCP three-way handshake works. You’ll note that the sequence number in this SYN packet is 2120355052, which is a random 32-bit number chosen by the client. This initial sequence value is referred to as the “Initial Sequence Number” or ISN for short. Also worth noting is that the acknowledgement number in this packet is set to 0, which it should always be in the first SYN of a TCP session.
Note: Keep in mind that Wireshark is trying to be helpful which is why the “Sequence number” field is listed as 0. It’s 0 relative to the first TCP segment seen. To see the real sequence number you should be looking at the “raw” version of this field.
Packet 4 – In packet 4 we see BIRD-1’s response to the SYN sent by BIRD-0. Note that both the SYN and ACK flags are set and that the acknowledgement number is 2120355053, which is the sequence number received in the SYN packet from BIRD-0 (2120355052) plus 1. In addition BIRD-1 has sent its own ISN of 147274070 in the reply to BIRD-0.
Packet 5 – In packet 5 we see the final ACK of the TCP session setup. Note that the raw sequence number is the ISN shown in packet 3 incremented by 1. Also the acknowledgment number is the ISN sent in packet 4 incremented by 1. In terms of TCP flags we see that the ACK flag is set. While there’s a lot more going on at the TCP level (windowing etc.) that we aren’t covering here, we now have an established TCP session.
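To make the handshake arithmetic from packets 3 through 5 concrete, here is a minimal Python sketch. The two ISNs are the ones from the capture; the helper names (next_seq, relative) are made up for illustration, and relative() only approximates how Wireshark derives its relative numbers.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit values and wrap around


def next_seq(seq: int, consumed: int = 1) -> int:
    """Advance a sequence number. A SYN consumes exactly one sequence number."""
    return (seq + consumed) % MOD


def relative(raw: int, isn: int) -> int:
    """Offset of a raw sequence number from the sender's ISN (Wireshark-style)."""
    return (raw - isn) % MOD


client_isn = 2120355052  # BIRD-0's ISN, seen in packet 3
server_isn = 147274070   # BIRD-1's ISN, seen in packet 4

syn = {"seq": client_isn, "ack": 0}                                      # packet 3
syn_ack = {"seq": server_isn, "ack": next_seq(client_isn)}               # packet 4
final_ack = {"seq": next_seq(client_isn), "ack": next_seq(server_isn)}   # packet 5

print(syn_ack["ack"])                           # 2120355053
print(final_ack["seq"], final_ack["ack"])       # 2120355053 147274071
print(relative(final_ack["seq"], client_isn))   # 1
```

The modulo is there because an ISN close to 2^32 would wrap back around to 0 rather than overflow.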
Note: It’s worth mentioning here that there are 6 states a BGP session can be in during establishment. While checking on the status of a BGP peer you might see either Idle, Connect, Active, OpenSent, OpenConfirm, or Established. The first 3 typically refer to the TCP session setup while the last 3 refer to the BGP session setup. I’ll note here that the “Active” state has led to more than one conversation around “Well I don’t know why your end is down – my end says it’s active” when turning up BGP peerings.
Packet 6 – Our first BGP protocol packet! In this packet we see host BIRD-0’s attempt to start a BGP session. To do this the host originates a BGP open message. The BGP open message includes a surprising amount of information about the peer’s BGP configuration. Let’s talk through it line by line…
Marker – This one deserves a paragraph of its own, which we’ll cover in a later post. For now just know that in a standard BGP open message this field is just filled with all 1s.
Length – The total length of the BGP message, header included, which is 53 bytes here. In other words the TCP payload of this segment is 53 bytes.
Type – There is a small set of defined BGP message types. 1 happens to be the BGP Open message.
Version – The BGP version being used. While many folks assume that version 4 implies that this is an IPv4 peering – that’s not actually true. It’s the version of the BGP protocol itself, which means the previous version was in fact BGP-3. BGP-4 has been the de facto standard for longer than I would have guessed: it was specified in the mid 1990s (RFC 1771 dates from 1995), and the current specification is RFC 4271 from 2006.
My AS – Just what it sounds like. The Autonomous system this BGP speaker lives in.
Hold time – This describes the hold time for the BGP session. There is often confusion around hold time and keepalive timers and whether they need to match between peers. The reality is that they don’t need to match, and the only thing that’s negotiated is the hold time. The hold time negotiation is really just the two peers picking the lower of the two values in the exchanged open messages. The keepalive is not negotiated and is assumed to be 1/3 of the hold time by default. That said – the keepalive can be manually set on each peer as well, although I’d argue you’d have to have a very good reason to manually change the keepalive timers. But it does mean that you can configure one peer to send keepalives every second and let the other use the default keepalive (in BIRD the default hold time is 240 seconds, yielding a keepalive of 80 seconds). The summary here is that hold times are negotiated and keepalives are derived from the hold time by default. They can be overridden in the peer configuration so each peer sends them at different rates.
BGP Identifier – This is the peer’s router-ID. There is also general confusion around this. In most cases a router-ID is configured to align with the peering IP address. This makes things easy to comprehend. The reality though is that a router-ID can be pretty much anything you want. Some operating systems mandate that the router-ID is one of the defined IP addresses on the routing system. They even go so far as to have logic that dynamically picks a router-ID based on the available IPs. That should be your first hint that there isn’t a strong binding between the peering relationship and the router-ID. They are only mandating that you have it configured as “an” IP address on the system. That could be anything! Now you might be wondering – what happens if the router-IDs overlap between peers? Turns out so long as it’s an eBGP peering that works! The only rule is that a router-ID cannot overlap inside of a given AS. At the end of the day the best approach is to align the router-ID with something that’s significant. For iBGP peerings that’s likely the source transport loopback; for eBGP it’s the peering interface IP address in the case of a single peer.
Optional Parameters – These define the optional parameters, which are typically a list of capabilities the peer can support. There can be other things than capabilities here, but the most common type of optional parameter is type 2 (capabilities). We won’t dive into each capability listed in this open (there’s a complete reference here) – but it’s important to know that peers need to align on the available capabilities. That is – if one peer supports something the other peer doesn’t, that capability won’t work across the peering. The capability negotiation (I use that term very, very loosely here) is actually sort of a mess. Since the exchange of capabilities only occurs as part of the BGP open, some BGP implementations will forcibly restart a session if you change your local capabilities. Others might expect you to know that you have to restart the session and will happily keep running without the new capability until you manually restart the session. Ivan Pepelnjak wrote an interesting post about this a while back and mentioned the draft RFC for capability exchange, which appears to have still gone nowhere. Since we’re using BIRD I’ll note that if you change the capabilities and ask BIRD to refresh the configuration (birdc configure) the host will actually send a BGP notification message (type 3) with major code 6 and minor code 6. If we look those codes up in the reference linked above, we see it’s a notification for Cease because of a configuration change. Put another way – BIRD will forcibly restart the session if the peer capabilities change. That said, if there’s not consensus on a capability it’s just not used.
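The fixed part of the open message walked through above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names are made up, the optional parameters are left empty (so the message is 29 bytes rather than the 53 bytes seen in the capture, the other 24 bytes being the capabilities), and the peer hold time of 90 seconds is a hypothetical value chosen to show the “pick the lower one” behaviour.

```python
import socket
import struct

MARKER = b"\xff" * 16  # 16 bytes of all 1s
OPEN = 1               # BGP message type 1


def build_open(my_as: int, hold_time: int, router_id: str,
               opt_params: bytes = b"") -> bytes:
    """Build a BGP-4 open message: 19-byte header plus the open fields."""
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(router_id), len(opt_params))
    body += opt_params
    header = MARKER + struct.pack("!HB", 19 + len(body), OPEN)
    return header + body


def parse_open(msg: bytes) -> dict:
    """Decode the fixed fields of an open message into a dict."""
    marker, length, msg_type = struct.unpack("!16sHB", msg[:19])
    if marker != MARKER or msg_type != OPEN or length != len(msg):
        raise ValueError("not a well-formed BGP open message")
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack(
        "!BHH4sB", msg[19:29])
    return {
        "length": length,
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": socket.inet_ntoa(bgp_id),
        "opt_params": msg[29:29 + opt_len],
    }


def negotiate_timers(local_hold: int, peer_hold: int) -> tuple:
    """Hold time is the lower of the two; keepalive defaults to a third of it."""
    hold = min(local_hold, peer_hold)
    return hold, hold // 3


bird0_open = build_open(65000, 240, "169.254.10.0")
parsed = parse_open(bird0_open)
print(parsed["length"], parsed["my_as"], parsed["bgp_id"])  # 29 65000 169.254.10.0

# Hypothetical peer advertising a 90 second hold time
print(negotiate_timers(parsed["hold_time"], 90))   # (90, 30)
print(negotiate_timers(240, 240))                  # (240, 80)
```

Note that the router-ID is packed as a plain 4-byte value. Nothing in the message ties it to the address the TCP session is actually running over.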
Packet 7 – Packet 7 is simply host BIRD-1 ACKing the BGP open packet it got from BIRD-0. If you’re still playing the TCP game with me you’ll notice that the ACK number is the length of the open packet TCP segment (53) plus the initial 1 from the TCP session setup.
Packet 8 – Now it’s host BIRD-1’s turn to send its BGP open message to BIRD-0. These packets should look shockingly similar as the two hosts share almost identical configuration.
Packet 9 – Host BIRD-0 ACKing the open from BIRD-1. Again you can follow the TCP seq/ack numbers if you’re interested. A fun exercise if you’re so inclined on these smaller more manageable captures.
Packet 10 – BIRD-1 sends BIRD-0 a keepalive message. While keepalive messages have a distinct use case in terms of keeping a BGP session alive, this one is actually super important. It signals to BIRD-0 that BIRD-1 has agreed to the BGP session (agreed to the terms in the open message) and acts as a sort of final ACK that we’re good to establish the BGP peering. After this initial keepalive, future keepalive messages are used to keep the session open.
Packet 11 – BIRD-0 sends BIRD-1 its own keepalive, acknowledging BIRD-1’s open message in the same way.
Packet 12 – BIRD-0 ACKing the keepalive from BIRD-1
Packet 13 – BIRD-1 ACKing the keepalive from BIRD-0. The BGP session is now fully established!
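For a sense of how small the keepalives in packets 10 and 11 are: a BGP keepalive is nothing more than the 19-byte message header with the type set to 4. A minimal Python sketch (the function names are made up for illustration):

```python
import struct

MARKER = b"\xff" * 16  # 16 bytes of all 1s
KEEPALIVE = 4          # BGP message type 4


def build_keepalive() -> bytes:
    """A keepalive is just the header: marker, length 19, type 4."""
    return MARKER + struct.pack("!HB", 19, KEEPALIVE)


def is_keepalive(msg: bytes) -> bool:
    """True if msg is exactly one well-formed keepalive message."""
    if len(msg) != 19 or msg[:16] != MARKER:
        return False
    return struct.unpack("!HB", msg[16:19]) == (19, KEEPALIVE)


keepalive = build_keepalive()
print(len(keepalive), is_keepalive(keepalive))  # 19 True
```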
Packet 14 – The next thing we see right after the session comes up is what I would describe as an empty update packet. We can see it’s a type 2 BGP message, which is an update. But it looks sort of desperately empty. If you’re thinking this makes sense because we aren’t advertising any routes yet you’d be mostly correct. But the reality is this is a very special kind of BGP update message that’s referred to as an “End of RIB marker”. While initially defined for use as part of the BGP graceful restart capability (we’ll talk about that in an upcoming post because it is some truly interesting signaling that takes place), it was proposed to be used as part of general BGP convergence and is used in some other capabilities such as route target constraints (another interesting use case we’ll eventually get to). In either case – you might have noticed that one of the default capabilities our peers have is – you guessed it – graceful restart! Since we negotiated that capability we use it, which means following the RFC, which says…
The Receiving Speaker MUST send the End-of-RIB marker once it completes the initial update for an address family (including the case that it has no routes to send) to the peer.
So in our case we see in both packet 14 and 16 that each BGP speaker sends this End of RIB marker to the other peer to indicate that it has no more updates to send. But what makes this an End of RIB marker? Specifically the fact that it’s an update with a withdrawn routes length of 0 and a total path attribute length of 0, which also means it carries no prefixes at all.
Packet 15 – ACKing the End of RIB marker from BIRD-0
Packet 16 – End of RIB marker from BIRD-1
Packet 17 – ACKing the End of RIB marker from BIRD-1
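The End of RIB check described for packets 14 and 16 can be sketched in Python as well. This is a rough illustration rather than a full update parser: the function names are made up, and it only covers the IPv4 unicast form of the marker seen in this capture (other address families signal End of RIB differently).

```python
import struct

MARKER = b"\xff" * 16  # 16 bytes of all 1s
UPDATE = 2             # BGP message type 2


def build_end_of_rib() -> bytes:
    """An update with no withdrawn routes, no path attributes and no NLRI."""
    body = struct.pack("!HH", 0, 0)  # withdrawn routes length, path attribute length
    return MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body


def is_end_of_rib(msg: bytes) -> bool:
    """True if msg is an update that carries nothing at all."""
    if len(msg) < 23:
        return False
    marker, length, msg_type = struct.unpack("!16sHB", msg[:19])
    if marker != MARKER or msg_type != UPDATE or length != len(msg):
        return False
    (withdrawn_len,) = struct.unpack("!H", msg[19:21])
    attr_offset = 21 + withdrawn_len
    if attr_offset + 2 > len(msg):
        return False
    (attr_len,) = struct.unpack("!H", msg[attr_offset:attr_offset + 2])
    nlri_len = len(msg) - attr_offset - 2 - attr_len
    return withdrawn_len == 0 and attr_len == 0 and nlri_len == 0


marker_msg = build_end_of_rib()
print(len(marker_msg), is_end_of_rib(marker_msg))  # 23 True

# An update withdrawing 10.0.0.0/8 is not an End of RIB marker
withdrawn = b"\x08\x0a"  # prefix length 8, then the single prefix byte 10
withdraw_msg = (MARKER + struct.pack("!HB", 19 + 2 + len(withdrawn) + 2, UPDATE)
                + struct.pack("!H", len(withdrawn)) + withdrawn
                + struct.pack("!H", 0))
print(is_end_of_rib(withdraw_msg))  # False
```

At 23 bytes this is the smallest update message BGP allows, which is why it looks so empty in the capture.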
So there you have it – a walk through of basic BGP session establishment. In the next post we’re going to cover some other use cases around this basic BGP Peering before we move into the exciting world of actually exchanging routing information!