BGP operates on a Router in either of the following modes, as shown in Figure 1:
Internal BGP (IBGP)
External BGP (EBGP)
BGP is called IBGP when it runs within an AS. It is called EBGP when it runs between ASs.
Speaker: The device that sends BGP messages is called a BGP speaker. The speaker receives or generates new routing information, and then advertises the routing information to other BGP speakers. When receiving a new route from another AS, a BGP speaker compares the route with the current route. If the route takes precedence over the existing route, or the route is new, the speaker advertises this route to all other BGP speakers except the BGP speaker that sent this route.
Peer: BGP speakers that exchange messages with each other are called peers. Multiple peers compose a peer group.
BGP runs by sending five types of BGP messages: Open, Update, Notification, Keepalive, and Route-refresh.
Open message: is the first message that is sent after a TCP connection is set up, and is used to set up BGP peer relationships. After the peer receives an Open message and peer negotiation succeeds, the peer sends a Keepalive message to confirm and maintain the peer relationship. Then, peers can exchange Update, Notification, Keepalive, and Route-refresh messages.
Update message: is used to exchange routes between BGP peers. Update messages can be used to send the following communications:
Advertise multiple reachable routes with the same attributes. These routes can share a group of route attributes. Route attributes contained in an Update message are applicable to all destination addresses (expressed by IP prefixes) contained in the Network Layer Reachability Information (NLRI) field of the Update message.
Withdraw multiple unreachable routes. Each route is identified by its destination address, which identifies routes previously advertised between BGP speakers.
Withdraw routes only. In this case, the message does not need to carry the path attributes or NLRI. Conversely, an Update message can be used only to advertise the reachable routes, so it does not need to carry information about withdrawn routes.
Notification message: is sent to its peer when BGP detects an error. The BGP connection is then torn down immediately.
Keepalive message: is sent periodically to the peer to maintain the peer relationship.
Route-refresh message: is used to request that the peer resend its routing information. It can be used only with peers that have the Route-refresh capability.
If all BGP devices have the Route-refresh capability enabled, the local BGP device sends Route-refresh messages to its peers when the BGP import routing policy changes. After receiving the message, the peers resend their routing information to the local BGP device. In this way, the BGP routing table is refreshed dynamically and the new routing policy takes effect without tearing down BGP connections.
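The five message types share a fixed header. The following is a minimal sketch, assuming the header layout and type codes defined in RFC 4271 and RFC 2918 (16-byte marker of all ones, 2-byte length, 1-byte type). The names `MsgType`, `build_message`, and `parse_header` are hypothetical and are not part of any device software.

```python
import struct
from enum import IntEnum


class MsgType(IntEnum):
    """BGP message type codes (RFC 4271, RFC 2918)."""
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4
    ROUTE_REFRESH = 5


# Network byte order: 16-byte marker, 2-byte length, 1-byte type (19 bytes).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16
MAX_LENGTH = 4096  # maximum message size in RFC 4271


def build_message(msg_type, body=b""):
    """Prefix a message body with the fixed BGP header."""
    length = HEADER.size + len(body)
    if length > MAX_LENGTH:
        raise ValueError("message exceeds 4096 bytes")
    return HEADER.pack(MARKER, length, int(msg_type)) + body


def parse_header(data):
    """Return (message type, total length) from the start of a message."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("marker is not all ones")
    if not HEADER.size <= length <= MAX_LENGTH:
        raise ValueError("invalid length field")
    return MsgType(msg_type), length


# A Keepalive message consists of the header only.
keepalive = build_message(MsgType.KEEPALIVE)
print(parse_header(keepalive))
```

Message bodies (for example, the path attributes and NLRI of an Update message) are not modeled in this sketch.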
The BGP Finite State Machine (FSM) has six states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established.
In Idle state, BGP denies all connection requests. This is the initial status of BGP.
Upon receiving a Start event, BGP initiates a TCP connection to the remote BGP peer, starts the ConnectRetry Timer with the initial value, listens for a TCP connection initiated by the remote BGP peer, and changes its state to Connect.
In Connect state:
If the TCP connection succeeds, BGP stops the ConnectRetry Timer, sends an Open message to the remote peer, and changes its state to OpenSent.
If the TCP connection fails, BGP restarts the ConnectRetry Timer with the initial value, continues to listen for a TCP connection initiated by the remote peer, and changes its state to Active.
If the ConnectRetry Timer has expired before a TCP connection is established, BGP restarts the timer with the initial value, initiates a TCP connection to the remote BGP peer, and stays in the Connect state.
In Active state:
If the TCP connection succeeds, BGP stops the ConnectRetry Timer, sends an Open message to the remote peer, and changes its state to OpenSent.
If the ConnectRetry Timer has expired before a TCP connection is established, BGP restarts the timer with the initial value and changes its state to Connect.
If BGP initiates a TCP connection with an unknown IP address, the TCP connection fails. When this occurs, BGP restarts the ConnectRetry Timer with the initial value and stays in the Active state.
In OpenSent state:
If there are no errors in the Open message received, BGP changes its state to OpenConfirm.
If there are errors in the Open message received, BGP sends a Notification message to the remote peer and changes its state to Idle.
If the TCP connection fails, BGP restarts the ConnectRetry Timer with the initial value, continues to listen for a TCP connection initiated by the remote peer, and changes its state to Active.
In OpenConfirm state:
If BGP receives a Notification message or the TCP connection fails, BGP changes its state to Idle.
If BGP receives a Keepalive message, BGP changes its state to Established.
In Established state, BGP peers can exchange Update, Route-Refresh, Keepalive, and Notification messages.
If BGP receives an Update or a Keepalive message, its state stays in Established.
If BGP receives a Notification message, BGP changes its state to Idle.
The BGP peer relationship can be established only when both BGP peers are in the Established state. The two peers send Update messages to exchange routes.
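The state transitions described above can be sketched as a lookup table. This is a simplified model, not a full RFC 4271 state machine: the event names (`start`, `tcp_ok`, `tcp_fail`, and so on) are hypothetical labels chosen for illustration, timers are not modeled, and each transition is assigned to the state it appears to belong to in the description.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_ok"): State.OPEN_SENT,
    (State.CONNECT, "tcp_fail"): State.ACTIVE,
    (State.CONNECT, "retry_expired"): State.CONNECT,
    (State.ACTIVE, "tcp_ok"): State.OPEN_SENT,
    (State.ACTIVE, "tcp_fail"): State.ACTIVE,
    (State.ACTIVE, "retry_expired"): State.CONNECT,
    (State.OPEN_SENT, "open_ok"): State.OPEN_CONFIRM,
    (State.OPEN_SENT, "open_error"): State.IDLE,
    (State.OPEN_SENT, "tcp_fail"): State.ACTIVE,
    (State.OPEN_CONFIRM, "keepalive"): State.ESTABLISHED,
    (State.OPEN_CONFIRM, "notification"): State.IDLE,
    (State.OPEN_CONFIRM, "tcp_fail"): State.IDLE,
    (State.ESTABLISHED, "update"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive"): State.ESTABLISHED,
    (State.ESTABLISHED, "notification"): State.IDLE,
}


def next_state(state, event):
    """Return the next state, or raise for events this sketch does not model."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not modeled in {state.value}") from None


# Walk one possible path from Idle to Established.
state = State.IDLE
for event in ["start", "tcp_fail", "retry_expired", "tcp_ok", "open_ok", "keepalive"]:
    state = next_state(state, event)
    print(event, "->", state.value)
```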
BGP adopts TCP as its transport layer protocol. Before the BGP peer relationship is set up, a TCP connection must be set up between the peers. Then, BGP peers exchange Open messages to negotiate related parameters, and finally establish the BGP peer relationship.
After the peer relationship is set up, BGP peers exchange BGP routing tables. BGP does not periodically update the routing table. When BGP routes change, however, BGP updates the BGP routing table incrementally through Update messages.
BGP sends Keepalive messages to maintain the BGP connection between peers. When BGP detects an error, for example, when it receives a malformed packet or a packet indicating an unsupported capability during negotiation, it sends a Notification message to report the error, and the BGP connection is torn down.
BGP route attributes are a set of parameters that further describe routes. Based on route attributes, BGP can filter and select routes. BGP route attributes are classified into the following types:
Well-known mandatory: can be identified by all BGP devices. This type of attribute is mandatory and must be carried in Update messages. Without this attribute, errors occur in the routing information.
Well-known discretionary: can be identified by all BGP devices. The attribute is discretionary and is not necessarily carried in Update messages.
Optional transitive: indicates an optional attribute that can be transmitted between ASs. A BGP device may not recognize this attribute, but it still accepts the attribute and advertises it to other peers.
Optional non-transitive: indicates an optional attribute that is not passed on when it is not recognized. A BGP device that does not recognize this attribute ignores it and does not advertise it to other peers.
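As a rough illustration of the four categories, the sketch below classifies an attribute from its type code and flag bits. It assumes the encoding in RFC 4271, where the high-order flag bit (0x80) marks an attribute as optional and the next bit (0x40) marks it as transitive. The function names are hypothetical.

```python
OPTIONAL = 0x80    # flag bit: attribute is optional (otherwise well-known)
TRANSITIVE = 0x40  # flag bit: attribute is transitive

# Well-known mandatory attributes and their type codes in RFC 4271.
WELL_KNOWN_MANDATORY = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP"}


def classify(type_code, flags):
    """Return the attribute category for a type code and flags octet."""
    if flags & OPTIONAL:
        if flags & TRANSITIVE:
            return "optional transitive"
        return "optional non-transitive"
    if type_code in WELL_KNOWN_MANDATORY:
        return "well-known mandatory"
    return "well-known discretionary"


def forward_if_unrecognized(flags):
    """An unrecognized attribute is passed on only if it is optional transitive."""
    return bool(flags & OPTIONAL) and bool(flags & TRANSITIVE)


print(classify(1, 0x40))   # ORIGIN
print(classify(5, 0x40))   # LOCAL_PREF
print(classify(4, 0x80))   # MULTI_EXIT_DISC
print(classify(8, 0xC0))   # COMMUNITY
```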
The common BGP route attributes are described as follows:
Origin: defines the origin of a route and marks the paths of a BGP route. The Origin attributes are classified into the following types:
IGP: indicates the highest priority. For routing information obtained through an IGP of the AS that originates the route, the Origin attribute is IGP. For example, for routes imported to the BGP routing table through the network command, the Origin attribute is IGP.
Exterior Gateway Protocol (EGP): indicates the second highest priority. The Origin attribute of routes obtained through EGP is EGP.
Incomplete: indicates the lowest priority. The Origin attribute of routes learned by other means is Incomplete. For example, for the routes imported by BGP through the import-route command, the Origin attribute is Incomplete.
AS_Path: is used to record all ASs that a route passes through from the local end to the destination in the distance-vector (DV) order.
Assume that the BGP speaker advertises a local route:
When advertising the route to other ASs, the BGP speaker adds the local AS number in the AS_Path list, and advertises it to the neighboring devices through Update messages.
When advertising the route to the local AS, the BGP speaker creates an empty AS_Path list in an Update message.
Assume that the BGP speaker advertises the routes learned from Update messages of other BGP speakers:
When advertising the route to other ASs, the BGP speaker adds the local AS number to the beginning of the AS_Path list. According to the AS_Path attribute, the BGP device that receives the route can detect the ASs through which the route passes to the destination. The number of the AS that is nearest to the local AS is placed at the top of the list. The other AS numbers are arranged in sequence.
When the BGP speaker advertises the route to the local AS, it does not change the AS_Path.
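The AS_Path handling rules above can be sketched as a single function. The function name and the list representation of AS_Path are assumptions made for illustration. A locally originated route is modeled as a route with an empty AS_Path.

```python
def as_path_for_advertisement(as_path, local_as, peer_is_ebgp):
    """Return the AS_Path to place in an Update message sent to a peer.

    as_path      -- AS numbers already on the route ([] for a local route)
    local_as     -- AS number of the advertising BGP speaker
    peer_is_ebgp -- True if the peer is in another AS
    """
    if peer_is_ebgp:
        # The local AS number goes at the beginning of the list.
        return [local_as] + list(as_path)
    # Within the local AS, the AS_Path is left unchanged
    # (and is therefore empty for a locally originated route).
    return list(as_path)


print(as_path_for_advertisement([], 100, peer_is_ebgp=True))
print(as_path_for_advertisement([200, 300], 100, peer_is_ebgp=True))
print(as_path_for_advertisement([200, 300], 100, peer_is_ebgp=False))
```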
The AS_Confed_Sequence and AS_Confed_Set attributes are used to prevent route loops and to select routes among the various sub-ASs in a confederation.
Next_Hop: differs from the next hop used by an IGP. It is not necessarily the IP address of a neighboring device. Generally, the Next_Hop attribute complies with the following principles:
When advertising a route to an EBGP peer, the BGP speaker sets the next hop of the route to the address of the local interface through which the BGP peer relationship is set up.
When advertising a locally generated route to an IBGP peer, the BGP speaker sets the next hop of the route to the address of the local interface through which the BGP peer relationship is set up.
When advertising a route learned from an EBGP peer to an IBGP peer, the BGP speaker does not change the next hop of the route.
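The three Next_Hop principles above can be sketched as follows. The function name, the string labels for the route source and peer type, and the sample addresses are hypothetical. Cases that the principles do not cover raise an error rather than guess.

```python
def next_hop_for_advertisement(route_next_hop, learned_from, peer_type, local_addr):
    """Return the Next_Hop to advertise.

    route_next_hop -- Next_Hop currently carried by the route
    learned_from   -- "local" or "ebgp" (source of the route)
    peer_type      -- "ebgp" or "ibgp" (peer receiving the advertisement)
    local_addr     -- address of the local interface used for the peer relationship
    """
    if peer_type == "ebgp":
        return local_addr
    if peer_type == "ibgp" and learned_from == "local":
        return local_addr
    if peer_type == "ibgp" and learned_from == "ebgp":
        return route_next_hop  # left unchanged
    raise ValueError("case not covered by the principles above")


print(next_hop_for_advertisement("192.0.2.1", "ebgp", "ibgp", "10.1.1.1"))
```

Because the next hop of an EBGP-learned route is unchanged when the route is sent to an IBGP peer, the IBGP peer needs a route to that address for the BGP route to be usable.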
Multi_Exit Discriminator (MED): is exchanged only between two neighboring ASs. The AS that receives the MED does not advertise it to any other ASs.
MED is similar to the metric used by an IGP. It is used to determine the optimal route when traffic enters an AS. When a BGP device obtains multiple routes to the same destination address but with different next hops through EBGP peers, the route with the smallest MED value is selected as the optimal route.
Local_Pref: indicates the preference of a BGP device for a route. It is exchanged only between IBGP peers and is not advertised to other ASs.
The Local_Pref attribute is used to determine the optimal route when traffic leaves an AS. When a BGP device obtains multiple routes to the same destination address but with different next hops through IBGP peers, the route with the largest Local_Pref value is selected.
When there are multiple routes to the same destination, BGP selects routes according to the following policies:
Prefers the route with the highest PrefVal.
PrefVal is a Huawei-specific parameter. It is valid only on the device where it is configured.
Prefers the route with the highest Local_Pref.
If a route does not carry Local_Pref, it is treated as having the value set using the default local-preference command, or 100 if that command is not configured.
Prefers a locally originated route. A locally originated route takes precedence over a route learned from a peer.
Locally originated routes include routes imported using the network command or the import-route command, manually summarized routes, and automatically summarized routes.
Prefers the route with the shortest AS_Path.
Prefers the route with the highest-priority Origin type. IGP takes precedence over EGP, and EGP takes precedence over Incomplete.
Prefers the route with the lowest MED.
Prefers EBGP routes over IBGP routes.
EBGP is higher than IBGP, IBGP is higher than LocalCross, and LocalCross is higher than RemoteCross.
If the export route target (ERT) of a Virtual Private Network version 4 (VPNv4) route in the routing table of a VPN instance on a Provider Edge (PE) matches the import route target (IRT) of another VPN instance on the PE, the VPNv4 route is added to the routing table of the second VPN instance. This is called LocalCross. If the ERT of a VPNv4 route learned by the local PE from a remote PE matches the IRT of a VPN instance on the local PE, the VPNv4 route is added to the routing table of that VPN instance. This is called RemoteCross.
Prefers the route with the lowest IGP metric to the BGP next hop.
After the bestroute igp-metric-ignore command is run, the IGP metrics are not compared for routes during route selection.
Assume that load balancing is configured. If the routes are equal under all the preceding rules and there are multiple external routes with the same AS_Path, load balancing is performed among them, up to the configured number of routes.
Prefers the route with the shortest Cluster_List.
Prefers the route advertised by the device with the smallest router ID.
If routes carry the Originator_ID, the originator ID is substituted for the router ID during route selection. The route with the smallest Originator_ID is preferred.
Prefers the route learned from the peer with the smallest address if the IP addresses of peers are compared in the route selection process.
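The selection order above can be approximated as a sort key in which each rule becomes one element of a tuple. This is a simplified sketch under several assumptions: the `Route` class and its field names are hypothetical, the default values for PrefVal and MED are assumed to be 0, MED is compared across all routes, and the LocalCross and RemoteCross distinction and the load-balancing rule are not modeled.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import Optional

# Lower rank means higher priority: IGP > EGP > Incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Route:
    peer_addr: str
    router_id: str
    as_path: list = field(default_factory=list)
    pref_val: int = 0            # assumed default
    local_pref: int = 100        # default value stated in the text
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0                 # assumed default
    is_ebgp: bool = True
    igp_metric: int = 0
    cluster_list: list = field(default_factory=list)
    originator_id: Optional[str] = None


def selection_key(r):
    """Smaller key means more preferred; elements follow the rule order."""
    tie_id = r.originator_id or r.router_id  # Originator_ID replaces router ID
    return (
        -r.pref_val,                 # highest PrefVal
        -r.local_pref,               # highest Local_Pref
        not r.locally_originated,    # locally originated first
        len(r.as_path),              # shortest AS_Path
        ORIGIN_RANK[r.origin],       # IGP, then EGP, then Incomplete
        r.med,                       # lowest MED
        not r.is_ebgp,               # EBGP over IBGP
        r.igp_metric,                # lowest IGP metric to the next hop
        len(r.cluster_list),         # shortest Cluster_List
        int(IPv4Address(tie_id)),    # smallest router ID or Originator_ID
        int(IPv4Address(r.peer_addr)),  # smallest peer address
    )


def best_route(routes):
    return min(routes, key=selection_key)


a = Route("10.0.0.2", "2.2.2.2", as_path=[200, 300])
b = Route("10.0.0.3", "3.3.3.3", as_path=[400])
c = Route("10.0.0.4", "4.4.4.4", as_path=[500, 600, 700], local_pref=200)
print(best_route([a, b]).peer_addr)
print(best_route([a, b, c]).peer_addr)
```

Tuple comparison stops at the first element that differs, which mirrors the way a later rule is consulted only when all earlier rules tie.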
When multiple equal-cost routes have the same destination address, traffic can be evenly load balanced using BGP Equal Cost Multiple Path (ECMP).
Condition for BGP ECMP: Routes must have the same first eight attributes defined in the preceding "Policies for BGP Route Selection".
BGP adopts the following policies for the BGP speaker to advertise routes:
Advertises only the optimal route to its peer when there are multiple valid routes.
Advertises the routes learned from EBGP devices to all BGP peers, including EBGP peers and IBGP peers.
Does not advertise the routes learned from IBGP devices to its IBGP peers.
Advertises the routes learned from IBGP devices to its EBGP peers.
Advertises all BGP optimal routes to new peers when the peer relationship is established.
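The advertisement policies above reduce to a small decision function. The sketch below is illustrative only: the function name and string labels are hypothetical, and route reflection and confederations, which relax the IBGP rule, are not modeled.

```python
def should_advertise(learned_from, peer_type, is_optimal):
    """Decide whether a route is advertised to a peer.

    learned_from -- "ebgp" or "ibgp" (type of peer the route was learned from)
    peer_type    -- "ebgp" or "ibgp" (type of peer receiving the advertisement)
    is_optimal   -- True if the route is the optimal route for its destination
    """
    if not is_optimal:
        return False  # only the optimal route is advertised
    if learned_from == "ibgp" and peer_type == "ibgp":
        return False  # IBGP-learned routes are not sent to IBGP peers
    return True


for source in ("ebgp", "ibgp"):
    for peer in ("ebgp", "ibgp"):
        print(source, "->", peer, should_advertise(source, peer, True))
```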
IBGP and IGP are synchronized to prevent routes that are unreachable inside the AS from being advertised to devices in external ASs.
If a non-BGP device in an AS provides forwarding service, IP packets forwarded by this AS might be discarded because the destination address is unreachable. As shown in Figure 2, RouterE learns route 8.0.0.0/8 of RouterA from RouterD through BGP, and then forwards the packet to RouterD. RouterD searches the routing table and detects that the next hop is RouterB. RouterD forwards the packet to RouterC through route iteration, because RouterD obtained a route to RouterB through IGP. RouterC, however, does not obtain the route to 8.0.0.0/8 and discards the packet.
If synchronization is configured, devices check the IGP routing table before they add an IBGP route to the routing table and advertise it to EBGP peers. The IBGP route is added to the routing table and advertised to EBGP peers only when the IGP has also obtained this route.
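The synchronization check can be sketched as a filter over IBGP routes. This is a minimal illustration that assumes an exact prefix match against the IGP routing table. The function name is hypothetical and the prefixes are sample values, with 8.0.0.0/8 taken from the example above.

```python
from ipaddress import ip_network


def usable_ibgp_routes(ibgp_prefixes, igp_prefixes, synchronization=True):
    """Return the IBGP routes that may be installed and advertised to EBGP peers."""
    if not synchronization:
        return list(ibgp_prefixes)
    igp_table = {ip_network(p) for p in igp_prefixes}
    # Keep an IBGP route only if the IGP has obtained the same route.
    return [p for p in ibgp_prefixes if ip_network(p) in igp_table]


ibgp = ["8.0.0.0/8", "9.0.0.0/8"]
igp = ["9.0.0.0/8", "10.1.1.0/24"]
print(usable_ibgp_routes(ibgp, igp))
print(usable_ibgp_routes(ibgp, igp, synchronization=False))
```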
The synchronization can be disabled in the following cases:
In the FW, the synchronization function is disabled by default.