First of all, this non-compliant behavior is observed only on some Juniper devices, not all.
The potential effect is that certain OSPF routes fail to propagate as expected.
Here is the scenario: the SRX originally used a loopback address, 10.0.0.11, as its Router ID. When that loopback was deleted, it changed its Router ID to a different address, 10.0.17.4. This is expected behavior so far.
srx-node0> show interfaces lo0.1
error: interface lo0.1 not found
srx-node0> show ospf overview instance VR
Instance: VR
Router ID: 10.0.17.4
…
However, a closer look at the SRX's OSPF database reveals that it still holds LSAs with 10.0.0.11 (the old Router ID) as their Advertising Router ID.
srx-node0> show ospf database summary instance VR
OSPF database, Area 0.0.0.0
Type      ID            Adv Rtr     Seq         Age   Opt   Cksum   Len
Summary   10.0.17.0     10.0.0.11   0x80000008  1423  0x22  0xc857  28
Summary  *10.0.17.0     10.0.17.4   0x80000002   792  0x22  0x8794  28
OSPF database, Area 0.0.0.69
Type      ID            Adv Rtr     Seq         Age   Opt   Cksum   Len
Summary   10.0.16.3     10.0.0.11   0x8000001c  2280  0x22  0xfbfc  28
Summary   10.0.16.7     10.0.0.11   0x8000001c  2137  0x22  0xd321  28
Summary   10.0.16.8     10.0.0.11   0x8000001c  1994  0x22  0xbf35  28
Summary   10.0.17.16    10.0.0.11   0x8000001d  3423  0x22  0xfdfc  28
Summary   10.0.17.32    10.0.0.11   0x8000001d  1851  0x22  0xaf2e  28
According to RFC 2328:
"If a router's OSPF Router ID is changed, the router's OSPF software should be restarted before the new Router ID takes effect. In this case the router should flush its self-originated LSAs from the routing domain before restarting ..."
Note that there are two desired behaviors when the Router ID changes: 1) OSPF restarts; 2) the self-originated LSAs are flushed. Since that did not happen with JUNOS, the old Router ID is still the “Advertising Router ID” in those LSAs, even though that address is no longer valid.
Why is that a problem? Because these LSAs will be flooded to neighbors (assuming the router here is an ABR). A neighbor will have noticed the change of Router ID, so it will check the validity of the Advertising Router ID.
This field indicates the Router ID of the router advertising the summary-LSA or AS-external-LSA that led to this path.
Since the neighbor sees that the Advertising Router ID (the old Router ID) no longer matches the new Router ID, it will discard the LSA.
When troubleshooting OSPF routing involving Juniper devices, check the OSPF databases for invalid entries.
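For a large network, eyeballing the database gets tedious, so here is a minimal Python sketch of how that check could be scripted. It assumes the database output has been saved as text with one LSA per line, in the column layout shown earlier, and that you can supply the set of Router IDs you consider valid. The names parse_lsas, find_suspect_lsas and known_router_ids are made up for this illustration and are not part of any Juniper tool.

```python
import re
from typing import NamedTuple


class Lsa(NamedTuple):
    area: str
    lsa_type: str
    lsa_id: str
    adv_rtr: str
    age: int


IP = r"\d+\.\d+\.\d+\.\d+"
AREA_RE = re.compile(r"OSPF database, Area (\S+)")
# Type, optional '*' (self-originated marker), LSA ID, Adv Rtr, Seq, Age
ROW_RE = re.compile(rf"^\s*(\w+)\s+\*?({IP})\s+({IP})\s+0x[0-9a-fA-F]+\s+(\d+)")


def parse_lsas(text):
    """Parse LSA rows from saved database output, tracking the current area."""
    area = None
    lsas = []
    for line in text.splitlines():
        m = AREA_RE.search(line)
        if m:
            area = m.group(1)
            continue
        m = ROW_RE.match(line)
        if m:  # header lines do not match because "ID" is not an address
            lsas.append(Lsa(area, m.group(1), m.group(2), m.group(3), int(m.group(4))))
    return lsas


def find_suspect_lsas(text, known_router_ids):
    """Return LSAs whose Advertising Router is not in the known set (assumption:
    the caller maintains that set, e.g. from an inventory of configured Router IDs)."""
    return [lsa for lsa in parse_lsas(text) if lsa.adv_rtr not in known_router_ids]


# Sample taken from the output shown in this post.
SAMPLE = """\
OSPF database, Area 0.0.0.0
Type      ID            Adv Rtr     Seq         Age   Opt   Cksum   Len
Summary   10.0.17.0     10.0.0.11   0x80000008  1423  0x22  0xc857  28
Summary  *10.0.17.0     10.0.17.4   0x80000002   792  0x22  0x8794  28
OSPF database, Area 0.0.0.69
Type      ID            Adv Rtr     Seq         Age   Opt   Cksum   Len
Summary   10.0.16.3     10.0.0.11   0x8000001c  2280  0x22  0xfbfc  28
Summary   10.0.16.7     10.0.0.11   0x8000001c  2137  0x22  0xd321  28
Summary   10.0.16.8     10.0.0.11   0x8000001c  1994  0x22  0xbf35  28
Summary   10.0.17.16    10.0.0.11   0x8000001d  3423  0x22  0xfdfc  28
Summary   10.0.17.32    10.0.0.11   0x8000001d  1851  0x22  0xaf2e  28
"""

if __name__ == "__main__":
    for lsa in find_suspect_lsas(SAMPLE, {"10.0.17.4"}):
        print(f"area {lsa.area}: {lsa.lsa_type} {lsa.lsa_id} from {lsa.adv_rtr} (age {lsa.age})")
```

With only 10.0.17.4 listed as valid, every LSA still advertised by 10.0.0.11 is reported. In a real network the known set would have to include every legitimate router, otherwise valid LSAs from other routers get flagged too.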
To prevent such pitfalls, always set the Router ID explicitly in OSPF. More importantly, base the Router ID on a loopback address, and make sure that loopback is not accidentally deleted.
I experienced something like this in our network.
I know that it was OSPF that caused our whole ISP network to go down. We never found the root cause. Our OSPF database was HUGE! It completely overloaded our core devices.
This made me think of that. Any idea if this issue could happen if you make changes on another vendor router? (a week later)
Thanks for sharing this info.
Standards-based inter-vendor behavior usually works. But when an odd event occurs (in this case a change of Router ID), the "hooks" each vendor builds in to handle such an event may not all be consistent.
I don't know of any easier way to detect this, especially when you are dealing with a large network. Maybe focus on a few problem routes and examine the OSPF database. Figuring out one is the same as figuring them all out.