Active Directory Sites & Services for Hybrid Identity

Preventing “Far DC” Authentication, Latency Collapse, and Unstable ISE Operations

Active Directory Sites & Services is not just an organizational tool. It is the performance control plane for:

  • Domain Controller (DC/KDC) selection

  • Kerberos authentication stability

  • LDAP query latency and consistency

  • Group membership resolution time

  • Cisco ISE authentication latency and queue behavior

  • Posture session stability and time-to-compliant

In hybrid environments, the most common root cause of “ISE latency” is:

ISE and/or endpoints selecting a remote Domain Controller due to incorrect site/subnet mapping and DNS resolver placement.


1. What AD Sites & Services Actually Controls

AD Sites & Services influences which DCs are “closest” by modeling:

  • Sites: logical representations of network locations

  • Subnets: IP ranges mapped to sites

  • Site links: replication topology + link cost and schedule

  • KDC/DC discovery: through site-aware DNS SRV records

Correct site modeling ensures that clients and services preferentially use local DCs for:

  • Kerberos (KDC)

  • LDAP (directory queries)

  • Global Catalog lookups (if used)

  • Group membership resolution


2. Why Cisco ISE Is Extremely Sensitive to Site Design

ISE is a policy engine sitting in the middle of multiple dependencies:

  • NAD ↔ ISE (RADIUS/EAP)

  • ISE ↔ DNS (SRV discovery)

  • ISE ↔ DC/KDC (Kerberos and/or LDAP)

  • ISE ↔ directory attribute/group queries

If ISE selects a remote DC:

  • Identity response time increases

  • NAD timers get pressured

  • Retries begin

  • Queues grow

  • Authentication becomes unstable under peak load

The failure often looks like “ISE is slow”, but the real cause is:

ISE is waiting on remote identity operations.


3. Key Mechanism: Site-Aware SRV Records

AD publishes SRV records in two forms: site-specific records such as _kerberos._tcp.<site>._sites.domain.local, and generic records such as _kerberos._tcp.domain.local.

When a client or service is associated with an Active Directory site (typically via subnet-to-site mapping), it resolves site-specific SRV records, which direct it to local or regional domain controllers.

When a client or service is not associated with a site, it falls back to non-site-specific SRV records, which may return domain controllers from any site in the forest.

This distinction is critical for latency-sensitive services such as authentication.
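The two record forms can be sketched as follows. This is a minimal illustration of the DNS names involved; "example.local" and the site name are placeholders, not values from any real environment.

```python
# Sketch of the DNS names DC Locator queries, with and without a site
# association. "example.local" and the site name are placeholders.

def srv_names(domain, site=None):
    """Return the SRV names a client would query for Kerberos/LDAP."""
    if site:
        # Site-specific records: answered only by DCs covering that site
        return [
            f"_kerberos._tcp.{site}._sites.{domain}",
            f"_ldap._tcp.{site}._sites.{domain}",
        ]
    # Generic records: may return any DC in the domain
    return [
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.{domain}",
    ]

print(srv_names("example.local", site="US-CLOUD")[0])
# _kerberos._tcp.US-CLOUD._sites.example.local
```

A site-less client never queries the first form, which is exactly why unmapped subnets lead to non-deterministic DC selection.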


4. Baseline Scenario: Single-Site On-Prem (Works “by Default”)

4.1 Environment

  • Single on-premises site

  • All subnets mapped to the same AD site

  • DNS resolvers local to the site

  • Domain controllers local to the site

  • ISE deployed locally

4.2 Why It Is Stable

  • Even non-site-specific SRV lookups return local DCs

  • DC selection remains deterministic despite poor site modeling

  • End-to-end latency stays low regardless of DC choice

  • RADIUS and EAP timers remain well within budget

As a result, suboptimal site design often goes unnoticed.

In a single-site environment, “any DC” is still a local DC.

This is why many teams underestimate the importance of AD site design until the environment becomes distributed or hybrid.


5. Hybrid Failure Scenario (Classic Pattern)

5.1 Environment

  • Primary on-premises site: Brazil

  • Secondary on-premises sites: branch locations

  • Cisco ISE deployed in a cloud region (US or EU)

  • DNS resolvers remain on-premises (Brazil)

  • AD Sites and Services not updated for cloud subnets

5.2 What Happens

  1. ISE queries DNS for _kerberos._tcp.domain.local

  2. DNS returns non-site-specific SRV records

  3. Returned DCs are located in Brazil

  4. ISE selects a Brazilian DC/KDC

  5. Kerberos and LDAP traffic crosses regions

  6. RTT increases to 150–300 ms or more, with added jitter

  7. RADIUS retransmissions begin

  8. Authentication queues grow

  9. Authentication becomes increasingly unstable

All components are technically correct and standards-compliant.

The failure is architectural, not functional.

Everything works — until load, jitter, or retries push the system past its timer budget.
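The timer-budget pressure can be made concrete with simple arithmetic. All numbers below are illustrative assumptions (RTTs, the number of sequential directory round trips per authentication, and the NAD timeout), not measured values from this scenario.

```python
# Illustrative timer-budget arithmetic. A single authentication may
# require several sequential Kerberos/LDAP round trips before ISE can
# answer the NAD. All numbers here are assumptions for illustration.

def auth_time_ms(rtt_ms, round_trips):
    """Pure network wait for one authentication, ignoring server time."""
    return rtt_ms * round_trips

nad_timeout_ms = 5000   # assumed NAD RADIUS timeout
round_trips = 6         # assumed directory exchanges per authentication

local = auth_time_ms(2, round_trips)     # same-region DC, ~2 ms RTT
remote = auth_time_ms(250, round_trips)  # cross-region DC, ~250 ms RTT

print(local, remote)             # 12 ms vs 1500 ms of pure network wait
print(remote < nad_timeout_ms)   # still "works" -- until retries and queuing stack up
```

A single remote authentication fits the budget; concurrency, jitter, and retransmissions are what push the system past it.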


5.3 Key Insight

Hybrid environments expose weaknesses that were always present but invisible in single-site designs.

Site awareness is not an optimization. It is a requirement once authentication dependencies cross regions.


6. The Golden Rule: Subnets Must Reflect Reality

Active Directory site association is driven primarily by IP subnet mapping.

When subnet definitions are missing, incomplete, or inaccurate:

  • Clients and services become effectively “site-less”

  • DC Locator falls back to non-site-specific SRV records

  • Domain controller selection becomes non-deterministic

  • Selected DCs are frequently remote

  • Latency becomes unpredictable and difficult to reason about

This behavior is standards-compliant, but operationally dangerous in distributed environments.

If a subnet is not mapped, AD cannot make a locality decision.


6.1 What Must Be Mapped

A correct site model must include all subnets from which identity-related traffic originates, including:

  • Wired client subnets (IPv4)

  • Wireless client subnets (IPv4)

  • VPN address pools (IPv4)

  • Cloud VPC/VNet subnets hosting ISE nodes (IPv4)

  • IPv6 subnets, if in use (commonly overlooked)

  • Management or infrastructure subnets where identity services initiate connections

Missing any of these creates implicit remote dependencies.


7. Designing a Hybrid Site Model (Step by Step)

7.1 Step 1 — Define Logical Sites by Latency Domain

Start by defining sites based on latency boundaries, not organizational structure.

A common and effective pattern is:

  • One AD site per major geographic region

  • Separate sites for each cloud region

  • Shared forest, distinct sites

Example:

  Location                      AD Site Name
  Brazil primary DC location    BR-ONPREM
  US cloud region               US-CLOUD
  EU cloud region               EU-CLOUD

Cloud regions should always be treated as first-class sites, even when connected by high-bandwidth links.


7.2 Step 2 — Map All Relevant Subnets to Those Sites

Subnet-to-site mapping must reflect where services actually run, not where they are logically managed.

Conceptual example:

  • 10.10.0.0/16 → BR-ONPREM

  • 10.20.0.0/16 → US-CLOUD

  • 10.30.0.0/16 → EU-CLOUD

Critical point:

Cloud subnets hosting ISE nodes must be mapped. If they are not, ISE becomes site-less and DC selection degrades immediately.
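Subnet-to-site association can be sketched with Python's standard ipaddress module. The subnets and site names mirror the conceptual example above; AD likewise resolves a client's site by most-specific (longest-prefix) subnet match.

```python
# Minimal sketch of subnet-to-site resolution, mirroring AD's
# longest-prefix subnet match. Subnets and site names are placeholders.
import ipaddress

SUBNET_TO_SITE = {
    ipaddress.ip_network("10.10.0.0/16"): "BR-ONPREM",
    ipaddress.ip_network("10.20.0.0/16"): "US-CLOUD",
    ipaddress.ip_network("10.30.0.0/16"): "EU-CLOUD",
}

def site_for(ip):
    """Return the site for an IP, or None if the subnet is unmapped."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in SUBNET_TO_SITE if addr in net]
    if not matches:
        return None  # unmapped -> the client is effectively "site-less"
    # Most-specific subnet wins, as in AD's subnet-to-site association
    return SUBNET_TO_SITE[max(matches, key=lambda n: n.prefixlen)]

print(site_for("10.20.5.9"))    # US-CLOUD
print(site_for("192.168.1.1"))  # None -> falls back to generic SRV records
```

The None case is the failure mode this section warns about: an ISE node in an unmapped cloud subnet behaves like the second lookup.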


7.3 Step 3 — Validate Site-Aware SRV Resolution

From each ISE node, validate DNS behavior from that node’s perspective.

For an ISE node running in the US cloud, DNS resolution should prioritize the site-specific records for its own site:

  • _kerberos._tcp.US-CLOUD._sites.domain.local

  • _ldap._tcp.US-CLOUD._sites.domain.local

Both should resolve to domain controllers in the US cloud region, not in Brazil.
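A small helper can generate the queries to run from each node. This is a sketch that only builds the SRV names and the corresponding nslookup command strings; the domain and site names are placeholders.

```python
# Sketch: build the site-specific SRV names to validate from a given
# node, plus equivalent nslookup commands to run on it.
# Domain and site names are placeholders.

def validation_queries(domain, site):
    """Return nslookup commands for the site-specific SRV records."""
    names = [
        f"_kerberos._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
    ]
    return [f"nslookup -type=SRV {name}" for name in names]

for cmd in validation_queries("domain.local", "US-CLOUD"):
    print(cmd)
```

If these queries return no answers, the node's subnet is likely unmapped and the node is operating site-less.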


8. DNS Resolver Locality (Mandatory for Hybrid)

Even with a correct and complete AD site model, resolver placement can still break locality.

Active Directory site awareness depends on where DNS queries originate, not only on how sites and subnets are defined.

If DNS resolvers are remote, site-aware logic becomes distorted.


8.1 Anti-Pattern: Remote DNS Resolvers

Scenario

  • ISE nodes deployed in a US cloud region

  • DNS resolvers remain on-premises (Brazil)

Result

  • DNS queries incur additional RTT

  • SRV responses are influenced by resolver location and network path

  • Increased probability of selecting DCs outside the intended latency domain

  • Higher variance and less predictable authentication time

Even with correct subnet-to-site mapping, resolver locality can undermine the design.


8.2 Correct Pattern: Local DNS Resolvers

Scenario

  • ISE nodes deployed in a US cloud region

  • DNS resolvers deployed in the same cloud region

Result

  • Faster DNS resolution

  • Consistent site-aware SRV responses

  • Deterministic DC/KDC selection

  • Identity and policy locality preserved

DNS locality is not optional in hybrid environments. It is a core dependency of AD site behavior.


8.3 Key Principle

DNS resolver placement is part of the Active Directory site strategy, not a separate concern.


9. Site Links and Cost Modeling

AD sites and subnets answer the question: “Where am I?”

Site links answer a different question: “What is closest?”

DC locator, replication, and fallback behaviors rely on site link costs when local options are unavailable.


9.1 When Costs Do Not Reflect Latency

If site link costs do not reflect real-world latency:

  • The “closest” DC may actually be far

  • Fallback behavior may select suboptimal or remote DCs

  • Authentication latency becomes inconsistent

  • Large-scale environments behave unpredictably

This effect is amplified in hybrid and multi-region designs.


9.2 Recommendations

  • Assign site link costs that align with actual RTT and latency boundaries

  • Prefer simple, readable models over overly granular designs

  • Avoid assuming bandwidth equals proximity

  • Revisit site link costs after:

    • WAN redesigns

    • SD-WAN changes

    • Cloud connectivity updates

Site links are not cosmetic. They directly influence authentication behavior under stress and failure conditions.
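One way to keep costs aligned with latency is to derive them from measured RTT bands. The banding below is a heuristic sketch and an assumption of this example, not Microsoft guidance; the point is only that lower cost should always mean lower latency.

```python
# Heuristic sketch (an assumption, not a Microsoft formula): derive
# site link costs from RTT bands so "cheaper" always means "closer".

def link_cost(rtt_ms):
    """Map a measured RTT to a site link cost band."""
    if rtt_ms < 10:
        return 100   # same metro or same cloud region
    if rtt_ms < 50:
        return 200   # same continent
    if rtt_ms < 150:
        return 400   # cross-continent
    return 800       # intercontinental / high-latency path

print(link_cost(3), link_cost(250))  # 100 800
```

Recomputing the bands after WAN or cloud connectivity changes keeps fallback behavior aligned with the real latency map.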


9.3 Architectural Insight

In hybrid environments:

  • Subnets define locality

  • DNS resolvers determine how locality is interpreted

  • Site links determine fallback and proximity decisions

All three must align to preserve latency budgets and authentication stability.


10. Good vs Bad Topology

10.1 Bad: Cloud ISE + On-Prem DNS + Remote DC


Typical outcome:

  • High authentication latency

  • Retries and request queues

  • Unstable posture assessment

10.2 Good: Cloud ISE + Local DNS + Local DC


Typical outcome:

  • Stable Kerberos operations

  • Predictable LDAP response time

  • Reduced retransmissions


11. Operational Validation (What to Check)

11.1 Validate DC/KDC selection from ISE

  • Which DC is used for authentication?

  • Does it match the ISE node region?

11.2 Validate DNS SRV responses

From the ISE node:

  • Resolve _ldap._tcp.<site>._sites.domain.local

  • Resolve _kerberos._tcp.<site>._sites.domain.local

11.3 Measure RTT distribution

  • ISE ↔ DC RTT (not just average)

  • 95th / 99th percentile RTT and jitter
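The value of looking at the tail rather than the mean can be shown with the standard library. The RTT samples below are synthetic: mostly local responses with two cross-region hits.

```python
# Sketch: why p95/p99 matter more than the average.
# The RTT samples (ms) are synthetic illustration data.
import statistics

rtt_ms = [2, 2, 3, 2, 3, 2, 2, 3, 250, 260]  # mostly local, two remote hits

q = statistics.quantiles(rtt_ms, n=100)  # percentile cut points
p95, p99 = q[94], q[98]

print(statistics.mean(rtt_ms))  # the average hides the problem
print(p95, p99)                 # the tail exposes the remote-DC hits
```

A mean in the tens of milliseconds can coexist with a p99 in the hundreds; it is the p99 authentications that trip NAD timers and trigger retries.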

11.4 Validate at peak load

  • Many designs work at low load and collapse under concurrency

