LDAP and Kerberos Dependencies Under Latency

DNS, Site Classification, and Why Hybrid Architectures Break Authentication


1. The Real Problem (Beyond “ISE is Slow”)

LDAP and Kerberos rarely fail because of bandwidth exhaustion. They fail because of latency, incorrect DNS resolution, and wrong server selection.

In hybrid environments, authentication often degrades even when:

  • All Domain Controllers are online

  • Network links are “healthy”

  • No service is technically down

The root cause is usually identity traffic crossing regions unnecessarily, driven by DNS behavior.

Kerberos fails architecturally before it fails technically.


2. Why Kerberos Is Highly Latency-Sensitive

Kerberos (RFC 4120) depends on:

  • Multiple request/response exchanges

  • Strict time validity windows

  • Correct KDC (Domain Controller) selection

  • DNS-based service discovery (SRV records)

Unlike single request/response protocols, Kerberos involves several dependent exchanges and time-bound tickets. As latency increases:

  • Ticket requests take longer

  • Retries become more frequent

  • Time skew tolerance becomes tighter

  • Failures become intermittent and non-deterministic

A Kerberos environment may appear stable in low-latency labs but collapse under WAN or inter-region RTT.
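
The effect of RTT compounds because one authentication involves several sequential round trips to identity services. Below is a minimal arithmetic sketch. The breakdown in `ASSUMED_ROUND_TRIPS` (six round trips in total) is a hypothetical illustration, not a measurement of ISE or Active Directory:

```python
# Illustrative latency arithmetic: identity lookups run one after another,
# so their round trips add up. The counts below are ASSUMPTIONS chosen for
# illustration, not measurements of ISE or Active Directory.
ASSUMED_ROUND_TRIPS = {
    "DNS SRV lookup (uncached)": 1,
    "Kerberos AS exchange (pre-auth challenge + retry)": 2,
    "Kerberos TGS exchange": 1,
    "LDAP bind": 1,
    "LDAP group search": 1,
}


def identity_latency_ms(rtt_ms: float, round_trips: dict[str, int] = ASSUMED_ROUND_TRIPS) -> float:
    """Network time spent waiting on identity services for one authentication."""
    return rtt_ms * sum(round_trips.values())


for rtt in (5, 150, 300):
    print(f"RTT {rtt:>3} ms -> {identity_latency_ms(rtt):>5.0f} ms of identity wait")
```

Under this assumed breakdown, the same flow that spends 30 ms waiting on a 5 ms LAN spends 1,800 ms at 300 ms RTT, before any retry. DC processing time is ignored.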


3. DNS Is the Control Plane of Kerberos

Kerberos does not “find” Domain Controllers on its own. It relies entirely on DNS SRV records, such as _kerberos._tcp.<domain> and _ldap._tcp.dc._msdcs.<domain>.

If DNS returns a valid but distant Domain Controller, Kerberos will still use it.

DNS correctness does not imply DNS optimality.
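
The sketch below shows why a valid but distant answer is still used. It implements the first pick of the weighted selection RFC 2782 describes for SRV answers: lowest priority first, then a weighted random choice among equal priorities. The hostnames and RTT values are hypothetical. The `rtt_ms` field exists only for the illustration, because nothing in the selection logic looks at distance:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    target: str
    rtt_ms: float  # illustration only: real SRV records carry no latency data


def pick_srv_target(records: list[SrvRecord], rng: random.Random) -> str:
    """Pick a target as RFC 2782 describes: lowest priority wins, then a
    weighted random choice among records sharing that priority.
    rtt_ms is never consulted."""
    best = min(r.priority for r in records)
    # Zero-weight records are placed first, as RFC 2782 specifies.
    candidates = sorted((r for r in records if r.priority == best), key=lambda r: r.weight)
    total = sum(r.weight for r in candidates)
    point = rng.randint(0, total)  # uniform over 0..total inclusive
    running = 0
    for r in candidates:
        running += r.weight
        if running >= point:
            return r.target
    return candidates[-1].target  # not reached: running ends at total >= point


answers = [  # hypothetical answer to a domain-wide SRV query
    SrvRecord(0, 100, "dc01.us.example.local", rtt_ms=4),
    SrvRecord(0, 100, "dc02.us.example.local", rtt_ms=5),
    SrvRecord(0, 100, "dc01.br.example.local", rtt_ms=180),
]
rng = random.Random(7)
picks = [pick_srv_target(answers, rng) for _ in range(300)]
print({target: picks.count(target) for target in sorted(set(picks))})
```

With equal priorities and weights, the Brazilian DC is chosen roughly one time in three, even though its RTT is about 40 times higher than that of the US DCs.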


4. Baseline Scenario – Simple, Healthy Environment

4.1 Architecture

  • Single on-prem site

  • Local Domain Controllers

  • Cisco ISE on-prem

  • Local DNS resolvers

  • RTT ≤ 5 ms

4.2 Behavior

  • DNS resolves only local DCs

  • Kerberos tickets are issued locally

  • LDAP group lookups are fast

  • RADIUS timers are respected

  • Authentication and posture are stable

4.3 Why This Works

  • No site ambiguity

  • No cross-region identity traffic

  • DNS answers are implicitly correct

  • Kerberos assumptions hold true

This environment often masks poor architectural decisions.


5. Hybrid Scenario – Where Things Break

5.1 Architecture

  • Primary on-prem site in Brazil

  • Domain Controllers distributed across regions

  • Cisco ISE deployed in cloud (US or Europe)

  • DNS resolvers located on-prem

  • Single global DNS domain


6. Bad Process (Very Common in Real Deployments)

6.1 Step-by-step failure chain

  1. ISE (US cloud) queries DNS in Brazil

  2. DNS returns SRV records for Brazilian DCs

  3. ISE selects a Brazilian DC (valid but distant)

  4. Kerberos exchanges cross continents

  5. RTT increases (150–300+ ms)

  6. Kerberos ticket validation slows or retries

  7. LDAP group queries are delayed

  8. RADIUS responses are delayed

  9. NAD retries authentication

  10. Queues grow, sessions churn

Operators conclude: “ISE is slow.” Everything is functionally correct, yet architecturally broken.
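
Steps 8 to 10 are where delay turns into load. Below is a minimal model of that step. It assumes a NAD that retransmits an Access-Request every `timeout_ms`, up to `max_retries` times. It also assumes a fixed ISE response time, although in practice the extra requests add load and push that time higher. The timer values are hypothetical, not vendor defaults:

```python
def nad_attempts(response_ms: float, timeout_ms: float, max_retries: int) -> tuple[int, bool]:
    """Return (Access-Requests sent, whether a reply arrived before the NAD gave up).

    Model assumptions: the NAD sends at t = 0, timeout_ms, 2*timeout_ms, ...
    and stops retransmitting once the reply to the first request arrives at
    t = response_ms. Each retransmission is assumed to reach ISE as extra work.
    """
    sent = 0
    for k in range(max_retries + 1):
        send_time = k * timeout_ms
        if sent > 0 and send_time >= response_ms:
            break  # reply already received; no further retransmissions
        sent += 1
    # The NAD gives up one timeout after its last transmission.
    answered = response_ms <= (max_retries + 1) * timeout_ms
    return sent, answered


# Hypothetical NAD timer: 2 s timeout, 3 retransmissions.
for label, response in (("local DC", 30), ("cross-region DC", 2500), ("congested", 9000)):
    sent, ok = nad_attempts(response, timeout_ms=2000, max_retries=3)
    print(f"{label:>16}: {sent} request(s) sent, answered in time: {ok}")
```

In this model, the cross-region case doubles the requests ISE must handle for a single login. The congested case sends four requests and still fails. This is the queue growth and session churn of step 10.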


7. Why Kerberos Breaks First

Kerberos assumes:

  • Low RTT between client and KDC

  • Predictable response times

  • Minimal retransmissions

In cross-region designs:

  • Ticket exchanges exceed timing expectations

  • Retries amplify latency

  • Failures appear random

  • Authentication becomes unstable under load

Kerberos does not degrade gracefully with distance.


8. The Wrong Fix: Increasing Timeouts

8.1 Common reaction

  • Increase RADIUS timeout

  • Increase LDAP timeout

  • Increase Kerberos retry thresholds

8.2 What this actually does

  • Hides the root cause

  • Increases session duration

  • Increases queue depth

  • Expands security exposure windows

Timeout tuning is not an architectural fix.
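
The queue-depth point follows from Little's law. In a steady-state system, the average number of authentications in flight equals the arrival rate multiplied by the average time each one takes (L = λW). Raising timeouts does not shrink W. It only allows W to grow further before a request fails. Below is a minimal sketch with a hypothetical load of 200 authentications per second:

```python
def in_flight(arrivals_per_s: float, avg_duration_s: float) -> float:
    """Little's law, L = lambda * W: average number of concurrent authentications."""
    return arrivals_per_s * avg_duration_s


LOAD = 200  # hypothetical authentications per second
for label, duration_s in (("local identity", 0.05), ("cross-region identity", 2.0)):
    print(f"{label:>21}: ~{in_flight(LOAD, duration_s):.0f} sessions in flight")
```

For the same user population, the cross-region case keeps about forty times more sessions open at once, along with their state and exposure windows.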


9. The Right Fix: DNS and Site Classification

9.1 Core principle

Identity resolution must be local to the consumer.

This requires DNS, AD Sites & Services, and resolver placement to work together.


10. Bad DNS Design Pattern

10.1 Characteristics

  • Single flat DNS domain

  • Global SRV record usage

  • No site-based resolution

  • Remote DNS resolvers

  • No regional separation

10.2 Result

  • Random DC selection

  • Cross-region Kerberos traffic

  • High RTT baked into auth flow

  • Authentication stability tied to WAN quality
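
The phrase "high RTT baked into auth flow" can be quantified roughly. Suppose a flat SRV answer lists DCs from every region, and the client picks among them with equal probability (a simplifying assumption). The expected RTT is then the mean over all of them. The DC names and RTT values below are hypothetical, as seen from an ISE node in the US cloud:

```python
from statistics import mean

# Hypothetical RTTs (ms) from a US-cloud ISE node to each DC.
DC_RTT_MS = {
    "dc01.us": 4, "dc02.us": 6,
    "dc01.br": 170, "dc02.br": 175,
    "dc01.eu": 95,
}


def expected_rtt_ms(candidates: list[str]) -> float:
    """Mean RTT when every candidate DC is equally likely to be selected."""
    return mean(DC_RTT_MS[dc] for dc in candidates)


flat_answer = list(DC_RTT_MS)                                 # global SRV: every DC
site_answer = [dc for dc in DC_RTT_MS if dc.endswith(".us")]  # site-specific SRV

print(f"flat DNS:       {expected_rtt_ms(flat_answer):.0f} ms expected RTT")
print(f"site-aware DNS: {expected_rtt_ms(site_answer):.0f} ms expected RTT")
```

The mean hides a worse problem. Individual authentications alternate between 4 ms and 175 ms paths, so failures look random rather than consistently slow.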


11. Good DNS Design – Site-Aware Resolution

11.1 Key design principles

  • DNS resolution must be local

  • Cloud ISE → cloud-local DNS

  • On-prem ISE → on-prem DNS

  • AD Sites & Services must reflect the actual network topology

11.2 Required elements

  • Correct subnet mapping

  • Cloud subnets mapped to logical sites

  • Site link costs aligned with latency

  • Site-specific SRV records:

    • _ldap._tcp.<site>._sites.<domain>

    • _kerberos._tcp.<site>._sites.<domain>
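
The relationship between the site-specific and domain-wide names can be sketched as a small helper. This hypothetical helper illustrates a site-first lookup order, similar in spirit to how the Windows DC locator falls back from site records to domain-wide records. It is not ISE's documented algorithm. `example.local` is a placeholder domain:

```python
from typing import Optional


def srv_query_names(domain: str, site: Optional[str]) -> list[str]:
    """SRV names to try for LDAP and Kerberos, site-specific first.

    The domain-wide names at the end are the fallback: they list every
    DC in the domain regardless of location.
    """
    domain_wide = [f"_ldap._tcp.{domain}", f"_kerberos._tcp.{domain}"]
    if site is None:  # client not mapped to any site
        return domain_wide
    site_specific = [
        f"_ldap._tcp.{site}._sites.{domain}",
        f"_kerberos._tcp.{site}._sites.{domain}",
    ]
    return site_specific + domain_wide


for name in srv_query_names("example.local", "US-CLOUD"):
    print(name)
```

The `site is None` branch is the failure mode from section 10: an unmapped subnet leaves only the domain-wide answer.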


12. Practical Classification Example

12.1 Regional model

Region → Logical Site

  • Brazil On-Prem → BR-ONPREM

  • US Cloud → US-CLOUD

  • EU Cloud → EU-CLOUD

12.2 Correct DNS resolution flow

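ISE in US-CLOUD resolves the site-specific records for its own site:

  • _ldap._tcp.US-CLOUD._sites.<domain>

  • _kerberos._tcp.US-CLOUD._sites.<domain>

DNS returns only US-based DCs.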
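
Which site ISE belongs to depends on its IP address and the subnet objects in AD Sites & Services. Below is a minimal classification sketch using Python's `ipaddress` module. It assumes longest-prefix matching, where the most specific subnet wins. The subnets are hypothetical placeholders for the three logical sites above:

```python
import ipaddress
from typing import Optional

# Hypothetical subnet objects; real ones live in AD Sites & Services.
SUBNET_TO_SITE = {
    "10.10.0.0/16": "BR-ONPREM",
    "172.20.0.0/16": "US-CLOUD",
    "172.30.0.0/16": "EU-CLOUD",
}


def classify(ip: str) -> Optional[str]:
    """Return the logical site for an IP, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    networks = ((ipaddress.ip_network(net), site) for net, site in SUBNET_TO_SITE.items())
    matches = [(net, site) for net, site in networks if addr in net]
    if not matches:
        return None  # unmapped: only domain-wide SRV records would apply
    # Longest prefix = most specific subnet.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


ise_ip = "172.20.14.5"  # hypothetical US-cloud ISE node
site = classify(ise_ip)
print(f"{ise_ip} -> {site}; query _ldap._tcp.{site}._sites.<domain>")
```

Suppose an ISE node sits in a cloud subnet that was never added to AD Sites & Services. It falls into the `None` branch, so a correct-looking deployment ends up with the flat DNS behavior described in section 10.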

12.3 Result

  • Kerberos stays regional

  • LDAP RTT remains low

  • RADIUS responses fit within the NAD timer budget

  • Authentication and posture stabilize


13. DNS Resolver Placement (Critical Detail)

13.1 Bad practice

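ISE queries DNS resolvers in another region, for example ISE in the US cloud using the on-prem resolvers in Brazil. This adds latency before authentication even starts.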

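13.2 Good practice

Each ISE node queries resolvers in its own region: cloud ISE uses cloud-local DNS, and on-prem ISE uses on-prem DNS. DNS answers are then already locality-aware, and no WAN round trip precedes authentication. This matters because DNS query time is part of authentication latency.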
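
To see how much of the authentication budget DNS consumes, you can measure resolution time from a test host placed next to the ISE node. This assumes a general-purpose host, not ISE itself. The sketch uses `socket.getaddrinfo`, which goes through the host's configured resolver and looks up A/AAAA records, not SRV records. It therefore measures resolver reachability, not DC locator behavior:

```python
import socket
import time


def resolve_times_ms(hostname: str, attempts: int = 3) -> list[float]:
    """Wall-clock time, in ms, for the system resolver to answer each lookup.

    The first attempt may be the uncached one; later attempts may be served
    from a local cache, depending on the host's configuration.
    """
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        socket.getaddrinfo(hostname, None)
        timings.append((time.perf_counter() - start) * 1000)
    return timings
```

For example, run `resolve_times_ms("dc01.example.local")` (a hypothetical DC name) from one host configured with the on-prem resolver and another configured with a cloud-local resolver. The comparison shows the per-lookup cost of resolver placement directly.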


14. Visual Comparison (Mermaid)

14.1 Bad flow – Cross-region identity


14.2 Good flow – Localized identity


15. Impact on Posture

Posture depends on:

  • Stable authentication

  • Timely authorization updates

  • Predictable session lifecycle

When Kerberos or LDAP responses are slow:

  • Posture results arrive late

  • Sessions may reauthenticate

  • Cached posture states persist longer

  • Compliance assurance weakens

Identity latency becomes security debt.


16. Design Checklist (Actionable)

  • Map all subnets correctly in AD Sites & Services

  • Create logical sites for cloud regions

  • Ensure ISE uses local DNS resolvers

  • Validate site-specific SRV resolution

  • Measure ISE ↔ DC RTT (not just ISE ↔ NAD)

  • Avoid global DNS answers for identity services

  • Never rely on timeout increases as a fix
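
For the "Measure ISE ↔ DC RTT" item, a TCP handshake to a DC's Kerberos (88) and LDAP (389) ports is a rough, unprivileged proxy for RTT. This works because `connect` returns after one SYN/SYN-ACK exchange. Below is a minimal sketch, again assumed to run from a test host in the same subnet as the ISE node:

```python
import socket
import statistics
import time


def tcp_rtt_ms(host: str, port: int, samples: int = 5, timeout_s: float = 2.0) -> float:
    """Median time, in ms, to complete a TCP handshake with host:port.

    Pass an IP address rather than a hostname so DNS resolution time is
    not included in the measurement.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connected: the handshake has completed
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

Run `tcp_rtt_ms(dc_ip, 88)` and `tcp_rtt_ms(dc_ip, 389)` against each DC address that the SRV lookups return (`dc_ip` is a placeholder). The results show whether ISE is talking to DCs in its own region.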


Key Takeaway

Kerberos failures in hybrid environments are almost always DNS and site-design failures.

Cisco ISE exposes these issues because it sits at the intersection of:

  • RADIUS

  • Kerberos

  • LDAP

  • DNS

  • NTP

  • Posture state

Fix identity locality, and latency stops being a mystery.

