Protocol Next-hops in a Junos Route-Reflection Cluster

5 11 2014

I’ve finally had time to do some proper studying for JNCIE, and I noticed something that I may have been getting wrong for a looong time.  It is minor, but could have bad consequences in a route-reflection environment.

I have a lab topology set up that looks like this:

Route-reflection in JNCIE study lab

R1 is advertising a directly connected network, 10.0.5.0/24, to the route reflectors R3 and R4. When I looked at R5 I was expecting to see R1 as the “protocol next-hop” but instead I was seeing R3 and R4. That didn’t look right to me.

Some explanation first: when you look at a route using “extensive” you get quite a lot of information, including two types of next hop. The “protocol next hop” is the address carried in the BGP next-hop attribute, normally that of the BGP speaker that is advertising the route. The “forwarding next hop” is (literally) the next IP hop on the way to the protocol next hop. The forwarding next hop is derived by resolving the protocol next hop through the IGP, while the protocol next hop itself comes from iBGP. I was expecting the forwarding next hop to be the other end of one of the circuits to R3 or R4 (10.0.2.2 or 10.0.2.10), and the protocol next hop to be R1 (10.0.6.1). But it wasn’t – see below:

root@R5> show route 10.0.5.0 extensive

inet.0: 27 destinations, 30 routes (27 active, 0 holddown, 0 hidden)
10.0.5.0/24 (2 entries, 1 announced)
TSI:
KRT in-kernel 10.0.5.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x94036dc
                Next-hop reference count: 7
                Source: 10.0.3.3
                Next hop type: Router, Next hop index: 573
                Next hop: 10.0.2.2 via ge-0/0/0.0, selected
                Session Id: 0x4
                Protocol next hop: 10.0.3.3
                Indirect next hop: 0x975c000 1048574 INH Session ID: 0x5
                State: <Active Int Ext>
                Local AS: 65020 Peer AS: 65020
                Age: 1:04       Metric2: 1
                Validation State: unverified
                Task: BGP_65020.10.0.3.3+179
                Announcement bits (2): 0-KRT 4-Resolve tree 1
                AS path: I (Originator)
                Cluster list:  10.0.3.3
                Originator ID: 10.0.6.1
                Communities: 65020:30
                Accepted
                Localpref: 100
                Router ID: 10.0.3.3
                Indirect next hops: 1
                        Protocol next hop: 10.0.3.3 Metric: 1               <==== Uh oh - protocol next hop is route-reflector...
                        Indirect next hop: 0x975c000 1048574 INH Session ID: 0x5
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.0.2.2 via ge-0/0/0.0
                                Session Id: 0x4
                        10.0.3.3/32 Originating RIB: inet.0
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.0.2.2 via ge-0/0/0.0

As far as I had always understood it, the next hop is only changed when an eBGP-learned prefix is advertised to iBGP speakers. The route reflector in my lab has a policy that was meant to do next-hop-self for eBGP-learned routes, and this was exported on the iBGP sessions:

root@R3# show policy-options policy-statement NHS
term T1 {
    from {
        protocol bgp;
    }
    then {
        next-hop self;
    }
}

I thought that the route-reflector would follow the rules and not change the next hop of reflected iBGP-learned routes, but it was changing them: the “from protocol bgp” term matches iBGP-learned routes as well as eBGP-learned ones. This could be pretty bad in a production network because all traffic for those prefixes is then drawn through the route-reflector, which may well be a non-optimal path.
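
For context, this is roughly how the policy was attached on R3. Treat it as a sketch only: the group name IBGP is a placeholder I’ve made up for illustration, the neighbor statements are left out, and the cluster ID is simply taken from the cluster list in the output above. An export policy is evaluated against every route sent to the group, so reflected routes pass through NHS just like eBGP-learned ones do.

set protocols bgp group IBGP type internal
set protocols bgp group IBGP cluster 10.0.3.3
set protocols bgp group IBGP export NHS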

How to fix it?  Simply modify the NHS policy to match external routes only:

root@R3# show policy-options policy-statement NHS
term T1 {
    from {
        protocol bgp;
        route-type external;
    }
    then {
        next-hop self;
    }
}

Now, I’m getting the expected behaviour – forwarding hops are on the local links, and the protocol next hop is the originating iBGP speaker.

root@R5> show route 10.0.5.0 extensive

inet.0: 27 destinations, 30 routes (27 active, 0 holddown, 0 hidden)
10.0.5.0/24 (2 entries, 1 announced)
TSI:
KRT in-kernel 10.0.5.0/24 -> {indirect(1048574)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x9403688
                Next-hop reference count: 4
                Source: 10.0.3.3
                Next hop type: Router, Next hop index: 573
                Next hop: 10.0.2.2 via ge-0/0/0.0, selected
                Session Id: 0x4
                Protocol next hop: 10.0.6.1
                Indirect next hop: 0x975c220 1048574 INH Session ID: 0x8
                State: <Active Int Ext>
                Local AS: 65020 Peer AS: 65020
                Age: 53:01      Metric2: 2
                Validation State: unverified
                Task: BGP_65020.10.0.3.3+179
                Announcement bits (2): 0-KRT 4-Resolve tree 1
                AS path: I (Originator)
                Cluster list:  10.0.3.3
                Originator ID: 10.0.6.1
                Communities: 65020:30
                Accepted
                Localpref: 100
                Router ID: 10.0.3.3
                Indirect next hops: 1
                        Protocol next hop: 10.0.6.1 Metric: 2              <=== Correct proto next hop!
                        Indirect next hop: 0x975c220 1048574 INH Session ID: 0x8
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.0.2.2 via ge-0/0/0.0
                                Session Id: 0x4
                        10.0.6.1/32 Originating RIB: inet.0
                          Metric: 2                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.0.2.2 via ge-0/0/0.0
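
If you only want the relevant lines rather than the whole extensive output, you can filter it. The second command is run on the reflector to see what it is actually advertising. <R5-address> is a placeholder for R5’s peering address, which isn’t shown anywhere in this post.

root@R5> show route 10.0.5.0 extensive | match "Protocol next hop"

root@R3> show route advertising-protocol bgp <R5-address> 10.0.5.0/24 extensive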

Not that any of this matters since my topology makes everything pass through the route-reflector anyway!


4 responses

5 11 2014
StuckInActive

Your RR *would* follow the rules. But applying import/export policies changes the rules, so of course the router rewrites the next hop to itself.

It also matters where you apply the policy. If you only applied it to the group with your ebgp neighbor then you also wouldn’t have had this problem. ‘set protocols bgp import {…}’ is very different from ‘set protocols bgp groups import {…}’

5 11 2014
DataPlumber

Yeah – you’re right of course! I removed the policy completely after I wrote the post above and it worked as you describe. Maybe I need to edit the post…

My policy was exported on the IBGP group actually – I’ve always done it that way for some reason. I figure I’m changing the next hop on EBGP-learned routes for the benefit of my IBGP routers, so I do it on that group. Would you do it as an import on the EBGP group instead normally?

5 11 2014
StuckInActive

It all depends. I prefer to modify the routes as I get them, so the router advertising the routes isn’t “do as I say, not as I do” with respect to the rest of the AS. Internal consistency.

24 05 2015
-tc

Very nice catch.
