IS-IS Graceful Restart (GR) implements non-stop forwarding by extending IS-IS to support the GR capability. It is one of the high availability (HA) technologies. RFC 3847 defines the IS-IS GR standard.
IS-IS is a link state routing protocol. All routers in an area must maintain the same network topologies, that is, the same LSDBs.
After the master/slave switchover, no neighbor information is stored on the restarted router. Thus, the first Hello packets sent by the router do not contain the neighbor list. After receiving the Hello packets, the neighbor checks the 2-way neighbor relationship and finds that it is not in the neighbor list of the Hello packets sent by the router. Thus, the neighbor relationship is interrupted.
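The 2-way check described above can be illustrated with a minimal sketch. The function name and the system IDs are hypothetical, and a real implementation checks the neighbor information carried in the IIH PDU rather than a plain Python list.

```python
def two_way_check(my_system_id, hello_neighbor_list):
    """Return True if the receiver finds itself in the sender's neighbor list."""
    return my_system_id in hello_neighbor_list


NEIGHBOR_ID = "0000.0000.0002"  # hypothetical system ID of the receiving neighbor

# Before the switchover, the router lists the neighbor in its Hello packets.
hello_before_switchover = [NEIGHBOR_ID]
# After the switchover, no neighbor information is stored, so the list is empty.
hello_after_switchover = []

for label, hello in (("before", hello_before_switchover),
                     ("after", hello_after_switchover)):
    state = "kept" if two_way_check(NEIGHBOR_ID, hello) else "interrupted"
    print(f"{label} switchover: adjacency {state}")
```

Without GR, the empty list in the first Hello packet after the switchover is what causes the neighbor to tear down the adjacency.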
The neighbor then generates new LSPs and floods the topology changes to all other routers in the area. Routers in the area then calculate routes based on the new LSDBs, which leads to route interruption or routing loops.
Because no LSDB is stored on the restarted router, the router needs to synchronize its LSDB with those of the neighbors after the master/slave switchover.
If IS-IS is not restarted in GR mode, IS-IS neighbor relationships are reset and LSPs are regenerated and flooded. This triggers the SPF calculation in the entire area, which causes route flapping and forwarding interruption in the area.
The IETF defined the GR standard, RFC 3847, for IS-IS. The standard covers protocol restart both when the FIB table is preserved and when it is not. Thus, the route flapping and the interruption of traffic forwarding caused by the restart can be avoided.
When a router fails, neighbors at the routing protocol layer detect that their neighbor relationships go Down and then become Up again after a period of time. This is the flapping of neighbor relationships. The flapping of neighbor relationships causes route flapping, which leads to black hole routes on the restarted router or causes data services from the neighbors to be looped on the restarted router. This decreases the reliability of the network. GR is thus introduced to address route flapping.
IS-IS GR involves two roles, namely, GR restarter and GR helper.
GR restarter
The GR restarter refers to the router that restarts in GR mode.
GR helper
The GR helper refers to another GR router that helps the restarter to complete the GR process. The GR restarter must have the capability of the GR helper.
By default, the device supports the GR helper.
To implement GR, IS-IS introduces the restart Type-Length-Value (TLV), T1 timer, T2 timer, and T3 timer.
The restart TLV is an extension to the IS-to-IS Hello (IIH) PDU. All IIH packets sent by a router that supports IS-IS GR contain the restart TLV. The restart TLV carries the parameters for the protocol restart. Figure 1 shows the format of the restart TLV.
Table 1 describes the fields of the restart TLV.
| Field | Length | Description |
|---|---|---|
| Type | 1 byte | Indicates the TLV type. If the value is 211, the TLV is the restart TLV. |
| Length | 1 byte | Indicates the length of the TLV. |
| RR | 1 bit | Indicates the restart request bit. A router sets RR in its IIH packets to notify the neighbors that it is restarting or starting, and to request that the neighbors retain the current IS-IS adjacency and return CSNPs. |
| RA | 1 bit | Indicates the restart acknowledgement bit. A router sets RA in its IIH packets to acknowledge an IIH packet in which RR is set. |
| SA | 1 bit | Indicates the suppress adjacency advertisement bit. A starting router sets SA to request that its neighbors suppress the advertisement of their adjacency with it, which prevents routing loops. |
| Remaining Time | 2 bytes | Indicates the time, in seconds, during which the neighbor does not reset the adjacency. This field is mandatory when RA is set. |
| Restarting Neighbor System ID | 6 bytes | Indicates the system ID of the restarting neighbor to which the RA refers. |
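The fields in Table 1 can be encoded and decoded as in the following minimal sketch. It assumes that the three flag bits share one byte with RR as the least significant bit, followed by RA and SA, and that the optional fields appear in the order listed in the table. The class and function names are hypothetical and do not come from any product or library.

```python
import struct
from dataclasses import dataclass
from typing import Optional

RESTART_TLV_TYPE = 211
# Assumed bit positions within the flags byte.
RR_BIT, RA_BIT, SA_BIT = 0x01, 0x02, 0x04


@dataclass
class RestartTLV:
    rr: bool
    ra: bool
    sa: bool
    remaining_time: Optional[int] = None        # seconds, 2 bytes
    neighbor_system_id: Optional[bytes] = None  # 6 bytes


def encode_restart_tlv(tlv: RestartTLV) -> bytes:
    flags = ((RR_BIT if tlv.rr else 0) | (RA_BIT if tlv.ra else 0)
             | (SA_BIT if tlv.sa else 0))
    value = bytes([flags])
    if tlv.remaining_time is not None:
        value += struct.pack("!H", tlv.remaining_time)
        # The system ID can only follow the Remaining Time field.
        if tlv.neighbor_system_id is not None:
            value += tlv.neighbor_system_id
    return bytes([RESTART_TLV_TYPE, len(value)]) + value


def decode_restart_tlv(data: bytes) -> RestartTLV:
    if len(data) < 3 or data[0] != RESTART_TLV_TYPE:
        raise ValueError("not a restart TLV")
    length = data[1]
    value = data[2:2 + length]
    if length < 1 or len(value) != length:
        raise ValueError("truncated restart TLV")
    flags = value[0]
    tlv = RestartTLV(rr=bool(flags & RR_BIT), ra=bool(flags & RA_BIT),
                     sa=bool(flags & SA_BIT))
    if length >= 3:
        tlv.remaining_time = struct.unpack("!H", value[1:3])[0]
    if length >= 9:
        tlv.neighbor_system_id = value[3:9]
    return tlv
```

For example, `encode_restart_tlv(RestartTLV(rr=True, ra=False, sa=False))` yields a three-byte TLV that carries only the flags, which is the shape of the request a restarting router would send under these assumptions.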
IS-IS GR uses three timers: T1, T2, and T3.
T1
Any interface enabled with IS-IS GR maintains a T1 timer. On a Level-1-2 router, broadcast interfaces maintain a T1 timer for Level-1 and Level-2 neighbor relationships respectively.
If the GR restarter has sent an IIH packet in which RR is set but has not received an IIH packet that carries the restart TLV with RA set from the GR helper when the T1 timer expires, the GR restarter resets the T1 timer and resends the IIH packet that carries the restart TLV.
If the ACK packet is received or the T1 timer expires three times, the T1 timer is deleted. The default value of a T1 timer is 3 seconds.
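The T1 behavior on a single interface can be sketched as a small simulation. It uses the default values given above (3 seconds, three expiries). The function name and its parameter are hypothetical, and the sketch assumes that no IIH packet is resent on the final expiry.

```python
T1_INTERVAL_S = 3     # default T1 value
T1_MAX_EXPIRIES = 3   # T1 is deleted after it expires this many times


def run_t1(ack_after_expiries):
    """Simulate T1 on one interface.

    ack_after_expiries: number of T1 expiries that occur before the IIH
    with RA set arrives, or None if it never arrives.
    Returns (acknowledged, iihs_sent, elapsed_seconds).
    """
    iihs_sent = 1  # initial IIH with RR set
    expiries = 0
    while True:
        if ack_after_expiries is not None and expiries >= ack_after_expiries:
            return True, iihs_sent, expiries * T1_INTERVAL_S  # ACK: delete T1
        expiries += 1
        if expiries >= T1_MAX_EXPIRIES:
            return False, iihs_sent, expiries * T1_INTERVAL_S  # give up: delete T1
        iihs_sent += 1  # reset T1 and resend the IIH with the restart TLV


print(run_t1(1))     # ACK arrives after one expiry
print(run_t1(None))  # ACK never arrives
```

In this sketch an unanswered restarter stops retrying after 9 seconds on that interface.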
T2
Level-1 and Level-2 LSDBs maintain separate T2 timers.
T2 is the maximum time that the system waits for the synchronization of various LSDBs. T2 is generally 60 seconds.
T3
The entire system maintains a T3 timer.
The T3 timer specifies the maximum time allowed for GR to complete.
If the T3 timer expires, GR fails.
The initial value of the T3 timer is 65535 seconds. After IIH packets that carry RA are received from neighbors, the T3 timer is set to the smallest Remaining Time value carried in those IIH packets.
The T3 timer applies to only restarting devices.
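The way the T3 value is derived can be shown with a short sketch. The function name and the sample Remaining Time values are hypothetical.

```python
T3_INITIAL_S = 65535  # initial T3 value in seconds


def compute_t3(remaining_times):
    """Return T3 after processing the Remaining Time of each IIH with RA set."""
    t3 = T3_INITIAL_S
    for remaining in remaining_times:
        t3 = min(t3, remaining)  # keep the smallest value seen so far
    return t3


print(compute_t3([120, 90, 300]))  # smallest Remaining Time among three neighbors
print(compute_t3([]))              # no acknowledgement received yet
```

Taking the minimum means that GR must complete before the first neighbor would otherwise let its adjacency with the restarter time out.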
For differentiation, GR triggered by the master/slave switchover or the restart of an IS-IS process is referred to as restarting. In this case, the FIB table remains unchanged. GR triggered by router restart is referred to as starting. In this case, the FIB table is updated.
The following describes the process of IS-IS GR in restarting and starting modes:
Figure 2 shows the process of IS-IS restarting.
After performing the protocol restart, the GR restarter performs the following actions:
After receiving an IIH packet, the GR helper performs the following actions:
If the GR helper does not support GR, it ignores the restart TLV and resets the adjacency with the GR restarter according to the normal processing of IS-IS.
After the GR restarter receives the IIH response packet, in which RR is set to 0 and RA is set to 1, from the neighbor, it performs the following actions:
The deletion of a T2 timer indicates that the LSDB of that level has been synchronized.
The starting device does not keep the FIB table. Thus, the starting device requires the neighbors whose adjacency with it was Up before it started to reset that adjacency, and requests them to suppress the advertisement of the adjacency. The IS-IS starting process is different from the IS-IS restarting process, as shown in Figure 3.
After the GR restarter is started, it performs the following actions:
Sends IIH packets that contain the restart TLV from all interfaces. In such a packet, RR is set to 0, and SA is set to 1.
RR set to 0 indicates that the router is starting.
SA set to 1 indicates that the router requests its neighbors to suppress the advertisement of their adjacency with it until they receive an IIH packet in which SA is set to 0.
After the neighbor receives the IIH packet that carries the restart TLV, it performs the following actions according to whether GR is supported:
GR is supported.
Re-initiates the adjacency.
Deletes the description of the adjacency with the GR restarter from the sent LSP. The neighbor also ignores the link connected to the GR restarter when performing SPF calculation until it receives an IIH packet in which SA is set to 0.
GR is not supported.
Ignores the restart TLV and resets the adjacency with the GR restarter.
Replies with an IIH packet that does not contain the restart TLV. The neighbor then falls back to the normal IS-IS processing. In this case, the neighbor does not suppress the advertisement of the adjacency with the GR restarter. On a P2P link, the neighbor also sends a CSNP.
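The two branches above can be summarized in a small decision function. This is only a sketch: the function name and the action strings are hypothetical labels for the steps in the text, and the CSNP on a P2P link is placed in the branch where the text mentions it.

```python
def neighbor_actions_on_starting_iih(gr_supported, is_p2p):
    """List the neighbor's actions on receiving an IIH with SA set from a starting router."""
    if gr_supported:
        return [
            "re-initiate the adjacency",
            "remove the adjacency with the GR restarter from its own LSP",
            "ignore the link to the GR restarter in SPF until an IIH with SA=0 arrives",
        ]
    actions = [
        "ignore the restart TLV",
        "reset the adjacency",
        "reply with an IIH that has no restart TLV",
    ]
    if is_p2p:
        actions.append("send a CSNP")
    return actions


for step in neighbor_actions_on_starting_iih(gr_supported=True, is_p2p=False):
    print(step)
```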
After the GR restarter receives the IIH ACK packet and CSNP from the neighbor, it deletes the T1 timer.
If the GR restarter does not receive the IIH packet or CSNP, it repeatedly resets the T1 timer and resends the IIH packet in which RR and SA are set to 1. If the number of T1 timer timeouts exceeds the threshold, the GR restarter forcibly deletes the T1 timer and falls back to the normal IS-IS processing to complete LSDB synchronization.