DHCP Relay Issues With Microsoft Surface Pro Docks and Junos

7 09 2020

After deploying some new Juniper EX4600 core switches, my customer complained of a delay of about 45 seconds in getting an IP address on a Surface Pro connected to a dock. On reconnecting to the same dock it took about 8 seconds, which was more acceptable. The 45-second delay came back every time they moved the Surface Pro to a new dock.

After ruling out a few things like Spanning Tree and LLDP, we isolated it to the core switch. An older core switch elsewhere was configured for BootP Helper rather than DHCP relay, and clients connected to that switch did not have the problem.

Other devices didn’t exhibit the problem either – a MacBook was given an IP address about 4 seconds after connecting. The Surface Pro consistently took 8 seconds to connect when using a USB dongle. So the issue seemed to centre around the dock.

If you haven’t seen one of these before, it is a black brick with some ports on it, supplied with power by another black brick.

The cable from the dock ends in an edge connector that plugs into the side of the Surface Pro.

On its own, the dock does not bring up the network port’s link light – only when a Surface is connected does link come up. Written in grey on the black underside of the dock is the MAC address. So basically, the dock has the network adapter inside it, and when a Surface moves to another dock it is like changing network adapters. This is unlike the MacBook, whose adapter stays with it, or the USB dongle mentioned above, which moves with the Surface as it connects to different ports.

I turned on some traceoptions under the system stanza:

processes {
    dhcp-service {
        traceoptions {
            file dhcp.log size 2m;
            flag all;
        }
    }
}

With this enabled I could see that when a Surface moved to a new dock, Junos was complaining that the IP address was already in use by another client (the trailing number in the message looks like a session ID rather than a MAC address):

Sep 7 10:48:51 SWITCH jdhcpd: DH_SVC_DUPLICATE_IPADDR_ERR: Failed to add 172.16.2.102 as it is already used by 1618
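
If you want to pull these errors out of the trace file with a script, a minimal Python sketch might look like the following. It assumes the message format is exactly as in the line above; the group names "ip" and "holder" are labels I have made up for the sketch, not Junos terminology.

```python
import re

# Pattern for the duplicate-address message shown above. The group names
# "ip" and "holder" are labels chosen for this sketch, not Junos terms.
DUPLICATE_RE = re.compile(
    r"DH_SVC_DUPLICATE_IPADDR_ERR: Failed to add (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"as it is already used by (?P<holder>\S+)"
)


def parse_duplicate(line):
    """Return (ip, holder) for a duplicate-address log line, else None."""
    match = DUPLICATE_RE.search(line)
    if match is None:
        return None
    return match.group("ip"), match.group("holder")


line = ("Sep 7 10:48:51 SWITCH jdhcpd: DH_SVC_DUPLICATE_IPADDR_ERR: "
        "Failed to add 172.16.2.102 as it is already used by 1618")
print(parse_duplicate(line))
```

Run over a whole trace file, this gives a quick list of which addresses are being refused and which existing binding holds each one.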

What has happened here is that Windows has remembered the IP address it had at docking station A, and has asked for the same IP address again in its initial DHCP DISCOVER packet, sent from docking station B. But as we know, the docking station has the MAC address, not the Surface Pro, so from Junos’s point of view, the client is different, but requesting an IP address that is already in use.
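
The conflict can be pictured with a toy model of a binding table keyed by MAC address. This is only an illustration of the reasoning above, not how jdhcpd is actually implemented; the class name and the two MAC addresses (taken from the documentation range) are hypothetical.

```python
class RelayBindings:
    """Toy model of a relay binding table keyed by client MAC address."""

    def __init__(self):
        self.ip_by_mac = {}

    def add(self, mac, ip):
        """Bind ip to mac, refusing an address held by a different MAC."""
        for other_mac, other_ip in self.ip_by_mac.items():
            if other_ip == ip and other_mac != mac:
                return False  # duplicate: another client already holds ip
        self.ip_by_mac[mac] = ip
        return True


# Hypothetical MAC addresses standing in for dock A and dock B.
DOCK_A = "00:00:5e:00:53:0a"
DOCK_B = "00:00:5e:00:53:0b"

table = RelayBindings()
print(table.add(DOCK_A, "172.16.2.102"))  # Surface on dock A gets a binding
print(table.add(DOCK_B, "172.16.2.102"))  # same Surface on dock B asks for the same IP
print(table.add(DOCK_B, "172.16.2.116"))  # a fresh address is accepted
```

The second call is refused because, as far as the table can tell, a different client already owns that address.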

After the error above comes up, there is a 45 second delay, during which time nothing much appears in the log file. Looking at the DHCP relay bindings for the MAC address on the base of docking station B repeatedly, we see about 35 seconds of nothing, then 13 seconds of ‘SELECTING’ state, and finally a new binding with a new IP address:

{master:0}
admin@SWITCH> show dhcp relay binding | match bc:83:85:f6:61:99
0.0.0.0 1634 bc:83:85:f6:61:99 0 SELECTING irb.32

{master:0}
admin@SWITCH> show dhcp relay binding | match bc:83:85:f6:61:99
172.16.2.116 1634 bc:83:85:f6:61:99 691199 BOUND irb.32
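
If you are collecting this output repeatedly, the rows are easy to split into fields. The sketch below assumes the columns are IP address, session ID, hardware address, expiry in seconds, state and interface, which is my reading of the output above; the Binding type and parse_binding function are names invented for the example.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    ip: str
    session_id: int
    mac: str
    expires: int
    state: str
    interface: str


def parse_binding(line):
    """Split one binding row into fields (column order is assumed)."""
    ip, session_id, mac, expires, state, interface = line.split()
    return Binding(ip, int(session_id), mac, int(expires), state, interface)


rows = [
    "0.0.0.0 1634 bc:83:85:f6:61:99 0 SELECTING irb.32",
    "172.16.2.116 1634 bc:83:85:f6:61:99 691199 BOUND irb.32",
]
for row in rows:
    b = parse_binding(row)
    print(b.mac, b.state, b.ip)
```

Timestamping each sample as it is parsed would show how long the binding sits in SELECTING before it becomes BOUND.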

This is how DHCP works: the client asks for the IP address it had before, and if it can’t use that it requests a fresh one. If the DHCP server is authoritative, and the requested IP is already used, the client should get a DHCP NAK. However, it seems that the client wasn’t getting the NAK and instead waited 45 seconds before trying for a fresh IP.

This situation was resolved by making DHCP relay behave more like BootP helper – turning off the bindings using this command:

set forwarding-options dhcp-relay forward-only

I hope this helps someone out there.

