Active Directory Sites & Services for Hybrid Identity

Preventing “Far DC” Authentication, Latency Collapse, and Unstable ISE Operations

Active Directory Sites & Services is not just an organizational tool. It is the performance control plane for:

  • Domain Controller (DC/KDC) selection

  • Kerberos authentication stability

  • LDAP query latency and consistency

  • Group membership resolution time

  • Cisco ISE authentication latency and queue behavior

  • Posture session stability and time-to-compliant

In hybrid environments, the most common root cause of “ISE latency” is:

ISE and/or endpoints selecting a remote Domain Controller due to incorrect site/subnet mapping and DNS resolver placement.


1. What AD Sites & Services Actually Controls

AD Sites & Services influences which DCs are “closest” by modeling:

  • Sites: logical representations of network locations

  • Subnets: IP ranges mapped to sites

  • Site links: replication topology + link cost and schedule

  • KDC/DC discovery: through site-aware DNS SRV records

Correct site modeling ensures that clients and services preferentially use local DCs for:

  • Kerberos (KDC)

  • LDAP (directory queries)

  • Global Catalog lookups (if used)

  • Group membership resolution


2. Why Cisco ISE Is Extremely Sensitive to Site Design

ISE is a policy engine sitting in the middle of multiple dependencies:

  • NAD ↔ ISE (RADIUS/EAP)

  • ISE ↔ DNS (SRV discovery)

  • ISE ↔ DC/KDC (Kerberos and/or LDAP)

  • ISE ↔ directory attribute/group queries

If ISE selects a remote DC:

  • Identity response time increases

  • NAD timers get pressured

  • Retries begin

  • Queues grow

  • Authentication becomes unstable under peak load

The failure often looks like “ISE is slow”, but the real cause is:

ISE is waiting on remote identity operations.


3. Key Mechanism: Site-Aware SRV Records

AD publishes SRV records in two forms: site-specific records such as _kerberos._tcp.<site>._sites.domain.local, and generic records such as _kerberos._tcp.domain.local.

When a client or service is associated with an Active Directory site (typically via subnet-to-site mapping), it resolves site-specific SRV records, which direct it to local or regional domain controllers.

When a client or service is not associated with a site, it falls back to non-site-specific SRV records, which may return domain controllers from any site in the forest.

This distinction is critical for latency-sensitive services such as authentication.
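The two record forms can be sketched as follows. This is a minimal illustration of the DNS names involved; "example.local" and the site name are placeholders, not values from any real environment.

```python
# Sketch of the DNS names DC Locator queries, with and without a site
# association. "example.local" and the site name are placeholders.

def srv_names(domain, site=None):
    """Return the SRV names a client would query for Kerberos/LDAP."""
    if site:
        # Site-specific records: answered only by DCs covering that site
        return [
            f"_kerberos._tcp.{site}._sites.{domain}",
            f"_ldap._tcp.{site}._sites.{domain}",
        ]
    # Generic records: may return any DC in the domain
    return [
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.{domain}",
    ]

print(srv_names("example.local", site="US-CLOUD")[0])
# _kerberos._tcp.US-CLOUD._sites.example.local
```

A site-less client never queries the first form, which is exactly why unmapped subnets lead to non-deterministic DC selection.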


4. Baseline Scenario: Single-Site On-Prem (Works “by Default”)

4.1 Environment

  • Single on-premises site

  • All subnets mapped to the same AD site

  • DNS resolvers local to the site

  • Domain controllers local to the site

  • ISE deployed locally

4.2 Why It Is Stable

  • Even non-site-specific SRV lookups return local DCs

  • DC selection remains deterministic despite poor site modeling

  • End-to-end latency stays low regardless of DC choice

  • RADIUS and EAP timers remain well within budget

As a result, suboptimal site design often goes unnoticed.

In a single-site environment, “any DC” is still a local DC.

This is why many teams underestimate the importance of AD site design until the environment becomes distributed or hybrid.


5. Hybrid Failure Scenario (Classic Pattern)

5.1 Environment

  • Primary on-premises site: Brazil

  • Secondary on-premises sites: branch locations

  • Cisco ISE deployed in a cloud region (US or EU)

  • DNS resolvers remain on-premises (Brazil)

  • AD Sites and Services not updated for cloud subnets

5.2 What Happens

  1. ISE queries DNS for _kerberos._tcp.domain.local

  2. DNS returns non-site-specific SRV records

  3. Returned DCs are located in Brazil

  4. ISE selects a Brazilian DC/KDC

  5. Kerberos and LDAP traffic crosses regions

  6. RTT increases to 150–300 ms or more, with added jitter

  7. RADIUS retransmissions begin

  8. Authentication queues grow

  9. Authentication becomes increasingly unstable

All components are technically correct and standards-compliant.

The failure is architectural, not functional.

Everything works — until load, jitter, or retries push the system past its timer budget.
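The timer-budget pressure can be made concrete with simple arithmetic. All numbers below are illustrative assumptions (RTTs, the number of sequential directory round trips per authentication, and the NAD timeout), not measured values from this scenario.

```python
# Illustrative timer-budget arithmetic. A single authentication may
# require several sequential Kerberos/LDAP round trips before ISE can
# answer the NAD. All numbers here are assumptions for illustration.

def auth_time_ms(rtt_ms, round_trips):
    """Pure network wait for one authentication, ignoring server time."""
    return rtt_ms * round_trips

nad_timeout_ms = 5000   # assumed NAD RADIUS timeout
round_trips = 6         # assumed directory exchanges per authentication

local = auth_time_ms(2, round_trips)     # same-region DC, ~2 ms RTT
remote = auth_time_ms(250, round_trips)  # cross-region DC, ~250 ms RTT

print(local, remote)             # 12 ms vs 1500 ms of pure network wait
print(remote < nad_timeout_ms)   # still "works" -- until retries and queuing stack up
```

A single remote authentication fits the budget; concurrency, jitter, and retransmissions are what push the system past it.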


5.3 Key Insight

Hybrid environments expose weaknesses that were always present but invisible in single-site designs.

Site awareness is not an optimization. It is a requirement once authentication dependencies cross regions.


6. The Golden Rule: Subnets Must Reflect Reality

Active Directory site association is driven primarily by IP subnet mapping.

When subnet definitions are missing, incomplete, or inaccurate:

  • Clients and services become effectively “site-less”

  • DC Locator falls back to non-site-specific SRV records

  • Domain controller selection becomes non-deterministic

  • Selected DCs are frequently remote

  • Latency becomes unpredictable and difficult to reason about

This behavior is standards-compliant, but operationally dangerous in distributed environments.

If a subnet is not mapped, AD cannot make a locality decision.


6.1 What Must Be Mapped

A correct site model must include all subnets from which identity-related traffic originates, including:

  • Wired client subnets (IPv4)

  • Wireless client subnets (IPv4)

  • VPN address pools (IPv4)

  • Cloud VPC/VNet subnets hosting ISE nodes (IPv4)

  • IPv6 subnets, if in use (commonly overlooked)

  • Management or infrastructure subnets where identity services initiate connections

Missing any of these creates implicit remote dependencies.


7. Designing a Hybrid Site Model (Step by Step)

7.1 Step 1 — Define Logical Sites by Latency Domain

Start by defining sites based on latency boundaries, not organizational structure.

A common and effective pattern is:

  • One AD site per major geographic region

  • Separate sites for each cloud region

  • Shared forest, distinct sites

Example:

  Location                      AD Site Name
  Brazil primary DC location    BR-ONPREM
  US cloud region               US-CLOUD
  EU cloud region               EU-CLOUD

Cloud regions should always be treated as first-class sites, even when connected by high-bandwidth links.


7.2 Step 2 — Map All Relevant Subnets to Those Sites

Subnet-to-site mapping must reflect where services actually run, not where they are logically managed.

Conceptual example:

  • 10.10.0.0/16 → BR-ONPREM

  • 10.20.0.0/16 → US-CLOUD

  • 10.30.0.0/16 → EU-CLOUD

Critical point:

Cloud subnets hosting ISE nodes must be mapped. If they are not, ISE becomes site-less and DC selection degrades immediately.
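Subnet-to-site association can be sketched with Python's standard ipaddress module. The subnets and site names mirror the conceptual example above; AD likewise resolves a client's site by most-specific (longest-prefix) subnet match.

```python
# Minimal sketch of subnet-to-site resolution, mirroring AD's
# longest-prefix subnet match. Subnets and site names are placeholders.
import ipaddress

SUBNET_TO_SITE = {
    ipaddress.ip_network("10.10.0.0/16"): "BR-ONPREM",
    ipaddress.ip_network("10.20.0.0/16"): "US-CLOUD",
    ipaddress.ip_network("10.30.0.0/16"): "EU-CLOUD",
}

def site_for(ip):
    """Return the site for an IP, or None if the subnet is unmapped."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in SUBNET_TO_SITE if addr in net]
    if not matches:
        return None  # unmapped -> the client is effectively "site-less"
    # Most-specific subnet wins, as in AD's subnet-to-site association
    return SUBNET_TO_SITE[max(matches, key=lambda n: n.prefixlen)]

print(site_for("10.20.5.9"))    # US-CLOUD
print(site_for("192.168.1.1"))  # None -> falls back to generic SRV records
```

The None case is the failure mode this section warns about: an ISE node in an unmapped cloud subnet behaves like the second lookup.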


7.3 Step 3 — Validate Site-Aware SRV Resolution

From each ISE node, validate DNS behavior from that node’s perspective.

For an ISE node running in the US cloud, DNS resolution should prioritize the site-specific records for its own site:

  • _kerberos._tcp.US-CLOUD._sites.domain.local

  • _ldap._tcp.US-CLOUD._sites.domain.local

Both should resolve to domain controllers in the US cloud region, not in Brazil.
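A small helper can generate the queries to run from each node. This is a sketch that only builds the SRV names and the corresponding nslookup command strings; the domain and site names are placeholders.

```python
# Sketch: build the site-specific SRV names to validate from a given
# node, plus equivalent nslookup commands to run on it.
# Domain and site names are placeholders.

def validation_queries(domain, site):
    """Return nslookup commands for the site-specific SRV records."""
    names = [
        f"_kerberos._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
    ]
    return [f"nslookup -type=SRV {name}" for name in names]

for cmd in validation_queries("domain.local", "US-CLOUD"):
    print(cmd)
```

If these queries return no answers, the node's subnet is likely unmapped and the node is operating site-less.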


8. DNS Resolver Locality (Mandatory for Hybrid)

Even with a correct and complete AD site model, resolver placement can still break locality.

Active Directory site awareness depends on where DNS queries originate, not only on how sites and subnets are defined.

If DNS resolvers are remote, site-aware logic becomes distorted.


8.1 Anti-Pattern: Remote DNS Resolvers

Scenario

  • ISE nodes deployed in a US cloud region

  • DNS resolvers remain on-premises (Brazil)

Result

  • DNS queries incur additional RTT

  • SRV responses are influenced by resolver location and network path

  • Increased probability of selecting DCs outside the intended latency domain

  • Higher variance and less predictable authentication time

Even with correct subnet-to-site mapping, resolver locality can undermine the design.


8.2 Correct Pattern: Local DNS Resolvers

Scenario

  • ISE nodes deployed in a US cloud region

  • DNS resolvers deployed in the same cloud region

Result

  • Faster DNS resolution

  • Consistent site-aware SRV responses

  • Deterministic DC/KDC selection

  • Identity and policy locality preserved

DNS locality is not optional in hybrid environments. It is a core dependency of AD site behavior.


8.3 Key Principle

DNS resolver placement is part of the Active Directory site strategy, not a separate concern.


9. Site Links and Cost Modeling

AD sites and subnets answer the question: “Where am I?”

Site links answer a different question: “What is closest?”

DC locator, replication, and fallback behaviors rely on site link costs when local options are unavailable.


9.1 When Costs Do Not Reflect Latency

If site link costs do not reflect real-world latency:

  • The “closest” DC may actually be far

  • Fallback behavior may select suboptimal or remote DCs

  • Authentication latency becomes inconsistent

  • Large-scale environments behave unpredictably

This effect is amplified in hybrid and multi-region designs.


9.2 Recommendations

  • Assign site link costs that align with actual RTT and latency boundaries

  • Prefer simple, readable models over overly granular designs

  • Avoid assuming bandwidth equals proximity

  • Revisit site link costs after:

    • WAN redesigns

    • SD-WAN changes

    • Cloud connectivity updates

Site links are not cosmetic. They directly influence authentication behavior under stress and failure conditions.
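One way to keep costs aligned with latency is to derive them from measured RTT bands. The banding below is a heuristic sketch and an assumption of this example, not Microsoft guidance; the point is only that lower cost should always mean lower latency.

```python
# Heuristic sketch (an assumption, not a Microsoft formula): derive
# site link costs from RTT bands so "cheaper" always means "closer".

def link_cost(rtt_ms):
    """Map a measured RTT to a site link cost band."""
    if rtt_ms < 10:
        return 100   # same metro or same cloud region
    if rtt_ms < 50:
        return 200   # same continent
    if rtt_ms < 150:
        return 400   # cross-continent
    return 800       # intercontinental / high-latency path

print(link_cost(3), link_cost(250))  # 100 800
```

Recomputing the bands after WAN or cloud connectivity changes keeps fallback behavior aligned with the real latency map.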


9.3 Architectural Insight

In hybrid environments:

  • Subnets define locality

  • DNS resolvers determine how locality is interpreted

  • Site links determine fallback and proximity decisions

All three must align to preserve latency budgets and authentication stability.


10. Good vs Bad Topology

10.1 Bad: Cloud ISE + On-Prem DNS + Remote DC


Typical outcome:

  • High authentication latency

  • Retries and request queues

  • Unstable posture assessment

10.2 Good: Cloud ISE + Local DNS + Local DC


Typical outcome:

  • Stable Kerberos operations

  • Predictable LDAP response time

  • Reduced retransmissions


11. Operational Validation (What to Check)

11.1 Validate DC/KDC selection from ISE

  • Which DC is used for authentication?

  • Does it match the ISE node region?

11.2 Validate DNS SRV responses

From the ISE node:

  • Resolve _ldap._tcp.<site>._sites.domain.local

  • Resolve _kerberos._tcp.<site>._sites.domain.local

11.3 Measure RTT distribution

  • ISE ↔ DC RTT (not just average)

  • 95th / 99th percentile RTT and jitter
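The value of looking at the tail rather than the mean can be shown with the standard library. The RTT samples below are synthetic: mostly local responses with two cross-region hits.

```python
# Sketch: why p95/p99 matter more than the average.
# The RTT samples (ms) are synthetic illustration data.
import statistics

rtt_ms = [2, 2, 3, 2, 3, 2, 2, 3, 250, 260]  # mostly local, two remote hits

q = statistics.quantiles(rtt_ms, n=100)  # percentile cut points
p95, p99 = q[94], q[98]

print(statistics.mean(rtt_ms))  # the average hides the problem
print(p95, p99)                 # the tail exposes the remote-DC hits
```

A mean in the tens of milliseconds can coexist with a p99 in the hundreds; it is the p99 authentications that trip NAD timers and trigger retries.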

11.4 Validate at peak load

  • Many designs work at low load and collapse under concurrency

