I’ve finally had time to do some proper studying for JNCIE, and I noticed something that I may have been getting wrong for a looong time. It is minor, but could have bad consequences in a route-reflection environment.
I have a lab topology set up that looks like this:
R1 is advertising a direct network of 10.0.5.0 to the route reflectors R3 and R4. When I looked at R5 I was expecting to see R1 as the “protocol next-hop” but instead I was seeing R3 and R4. That didn’t look right to me.
Some explanation first: when you look at a route using “extensive” you get quite a lot of information, including two types of next hop. The “protocol next hop” is the address carried in the BGP next-hop attribute, normally the BGP speaker that is advertising the route. The “forwarding next hop” is (literally) the next IP hop on the way to that protocol next hop. The forwarding next hop is derived from the IGP, but the protocol next hop comes from iBGP. I was expecting the forwarding next hop to be the other end of one of the circuits to R3 or R4 (10.0.2.2 or 10.0.2.10), and the protocol next hop to be R1 (10.0.6.1). But it wasn’t, as the output below shows:
root@R5> show route 10.0.5.0 extensive

inet.0: 27 destinations, 30 routes (27 active, 0 holddown, 0 hidden)
10.0.5.0/24 (2 entries, 1 announced)
TSI:
KRT in-kernel 10.0.5.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x94036dc
                Next-hop reference count: 7
                Source: 10.0.3.3
                Next hop type: Router, Next hop index: 573
                Next hop: 10.0.2.2 via ge-0/0/0.0, selected
                Session Id: 0x4
                Protocol next hop: 10.0.3.3
                Indirect next hop: 0x975c000 1048574 INH Session ID: 0x5
                State: <Active Int Ext>
                Local AS: 65020 Peer AS: 65020
                Age: 1:04       Metric2: 1
                Validation State: unverified
                Task: BGP_65020.10.0.3.3+179
                Announcement bits (2): 0-KRT 4-Resolve tree 1
                AS path: I (Originator)
                Cluster list:  10.0.3.3
                Originator ID: 10.0.6.1
                Communities: 65020:30
                Accepted
                Localpref: 100
                Router ID: 10.0.3.3
                Indirect next hops: 1
                        Protocol next hop: 10.0.3.3 Metric: 1   <==== Uh oh - protocol next hop is route-reflector...
                        Indirect next hop: 0x975c000 1048574 INH Session ID: 0x5
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.0.2.2 via ge-0/0/0.0
                                Session Id: 0x4
                        10.0.3.3/32 Originating RIB: inet.0
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.0.2.2 via ge-0/0/0.0
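To make the two next-hop types a bit more concrete, here is a rough Python sketch of how a protocol next hop gets resolved to a forwarding next hop. It is only a toy model and not how Junos implements it: the `IGP_ROUTES` table and the `resolve` function are names made up for illustration, and the two entries are the /32 routes, forwarding next hops and metrics that appear in the “show route extensive” output in this post.

```python
import ipaddress

# Toy IGP table for R5 (hypothetical structure). The values are the ones
# visible in the "show route ... extensive" output in this post:
# prefix -> (forwarding next hop, outgoing interface, IGP metric)
IGP_ROUTES = {
    "10.0.3.3/32": ("10.0.2.2", "ge-0/0/0.0", 1),  # route reflector R3
    "10.0.6.1/32": ("10.0.2.2", "ge-0/0/0.0", 2),  # originator R1
}


def resolve(protocol_next_hop):
    """Resolve a BGP protocol next hop with a longest-prefix match
    against the IGP table and return the forwarding next hop entry."""
    addr = ipaddress.ip_address(protocol_next_hop)
    best_net, best_entry = None, None
    for prefix, entry in IGP_ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_entry = net, entry
    # None means the protocol next hop cannot be resolved at all.
    return best_entry


if __name__ == "__main__":
    for pnh in ("10.0.3.3", "10.0.6.1"):
        print(pnh, "->", resolve(pnh))
```

In this lab both protocol next hops resolve to the same forwarding next hop (10.0.2.2 via ge-0/0/0.0); only the metric differs, which matches the Metric2 value shown in the route output.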
As far as I always understood it, changing of next hops is only done when an eBGP prefix is advertised to iBGP speakers. The route reflector in my lab has a policy doing next-hop-self for eBGP learned routes, and this was exported on the iBGP sessions:
root@R3# show policy-options policy-statement NHS
term T1 {
    from {
        protocol bgp;
    }
    then {
        next-hop self;
    }
}
I thought that the route-reflector would follow the rules and not change the next-hop of reflected iBGP-learned routes, but it was changing them. This could be pretty bad in a production network because it then makes all traffic go through a non-optimal path – i.e. through the route-reflector.
How to fix it? Simply modify the NHS policy to match external routes only:
root@R3# show policy-options policy-statement NHS
term T1 {
    from {
        protocol bgp;
        route-type external;
    }
    then {
        next-hop self;
    }
}
Now, I’m getting the expected behaviour – forwarding hops are on the local links, and the protocol next hop is the originating iBGP speaker.
root@R5> show route 10.0.5.0 extensive

inet.0: 27 destinations, 30 routes (27 active, 0 holddown, 0 hidden)
10.0.5.0/24 (2 entries, 1 announced)
TSI:
KRT in-kernel 10.0.5.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x9403688
                Next-hop reference count: 4
                Source: 10.0.3.3
                Next hop type: Router, Next hop index: 573
                Next hop: 10.0.2.2 via ge-0/0/0.0, selected
                Session Id: 0x4
                Protocol next hop: 10.0.6.1
                Indirect next hop: 0x975c220 1048574 INH Session ID: 0x8
                State: <Active Int Ext>
                Local AS: 65020 Peer AS: 65020
                Age: 53:01      Metric2: 2
                Validation State: unverified
                Task: BGP_65020.10.0.3.3+179
                Announcement bits (2): 0-KRT 4-Resolve tree 1
                AS path: I (Originator)
                Cluster list:  10.0.3.3
                Originator ID: 10.0.6.1
                Communities: 65020:30
                Accepted
                Localpref: 100
                Router ID: 10.0.3.3
                Indirect next hops: 1
                        Protocol next hop: 10.0.6.1 Metric: 2   <=== Correct proto next hop!
                        Indirect next hop: 0x975c220 1048574 INH Session ID: 0x8
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.0.2.2 via ge-0/0/0.0
                                Session Id: 0x4
                        10.0.6.1/32 Originating RIB: inet.0
                          Metric: 2                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.0.2.2 via ge-0/0/0.0
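The difference between the two versions of the NHS policy can be modelled roughly like this in Python. This is a deliberately simplified sketch, not Junos policy evaluation: the `Route` class, the `export_nhs` function and the `match_external_only` flag are invented for illustration, and the eBGP-learned prefix (192.0.2.0/24 via 198.51.100.1) is a made-up example that does not exist in my lab.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str    # e.g. "bgp"
    route_type: str  # "internal" (iBGP-learned) or "external" (eBGP-learned)
    next_hop: str    # BGP protocol next hop


def export_nhs(route, self_address, match_external_only):
    """Toy model of the NHS export policy applied by the route reflector.

    match_external_only=False models 'from protocol bgp' only;
    match_external_only=True models the added 'route-type external' match.
    """
    if route.protocol != "bgp":
        return route  # term does not match, route is left untouched
    if match_external_only and route.route_type != "external":
        return route  # reflected iBGP route keeps its original next hop
    return replace(route, next_hop=self_address)  # 'then next-hop self'


R3 = "10.0.3.3"
# R1's route as reflected by R3 (values from the output in this post).
reflected = Route("10.0.5.0/24", "bgp", "internal", "10.0.6.1")
# Hypothetical eBGP-learned route, for comparison only.
external = Route("192.0.2.0/24", "bgp", "external", "198.51.100.1")

if __name__ == "__main__":
    print(export_nhs(reflected, R3, match_external_only=False).next_hop)  # 10.0.3.3
    print(export_nhs(reflected, R3, match_external_only=True).next_hop)   # 10.0.6.1
    print(export_nhs(external, R3, match_external_only=True).next_hop)    # 10.0.3.3
```

With the original policy every BGP route matches, so the reflected route leaves R3 with R3 as its next hop. With the extra match only the eBGP-learned route is rewritten, and the reflected route keeps R1 (10.0.6.1) as its protocol next hop.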
Not that any of this matters since my topology makes everything pass through the route-reflector anyway!
Your RR *would* follow the rules. But applying import/export policies overrides the default behaviour, so of course the router rewrites the next hop to itself (NHS).
It also matters where you apply the policy. If you had only applied it to the group with your eBGP neighbor, you also wouldn’t have had this problem. ‘set protocols bgp import {…}’ is very different from ‘set protocols bgp group <group-name> import {…}’
Yeah, you’re right of course! I removed the policy completely after I wrote the post above and it worked as you describe. Maybe I need to edit the post…
My policy was exported on the IBGP group actually – I’ve always done it that way for some reason. I figure I’m changing the next hop on EBGP-learned routes for the benefit of my IBGP routers, so I do it on that group. Would you do it as an import on the EBGP group instead normally?
It all depends. I prefer to modify the routes as I receive them, so the router advertising the routes isn’t in a “do as I say, not as I do” position with respect to the rest of the AS. Internal consistency.
Very nice catch.