Failure Modes: Posture Assessment Under Latency

(AnyConnect / Cisco Secure Client + ISE)

Posture is more latency-sensitive than AAA because it is not a single request/response. It is a stateful workflow involving handshakes, provisioning, data collection, reporting, decision-making, and often a CoA to change the authorization level.

AnyConnect / Secure Client documentation describes posture modules (HostScan / ISE Posture Module) that evaluate endpoint compliance and restrict access until the endpoint becomes compliant.


1. Primary Posture Failure Modes Under Latency

1.1 Endless “Posture Pending” (State Stuck)

A classic failure mode reported in the Cisco Community:

  • Endpoint reports Compliant

  • Posture result appears in reports

  • ISE remains in Pending

  • No CoA is sent to update the authorization profile

  • Endpoint stays locked in Unknown / Quarantine

Common workarounds seen in the field:

  • Interface shut/no shut

  • Manual CoA

  • Forcing posture reassessment on the client (e.g. toggling Block Untrusted Servers)

Why this happens under latency / partial failure:

  • The posture event arrives, but the state transition or CoA execution fails mid-path:

    • internal queue delay

    • Dynamic Authorization reachability issues

    • NAD unreachable at CoA time

    • session correlation loss
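
The stuck condition above can be expressed as a simple predicate over what each party reports. This is a minimal sketch, not an ISE API: the `SessionView` type, its field names, and the status strings are assumptions chosen to mirror the bullets in this section, and the values would have to be collected by hand from the client, ISE reports, and the NAD.

```python
from dataclasses import dataclass


@dataclass
class SessionView:
    """Hypothetical snapshot of one session as seen from three places."""
    endpoint_status: str  # what the posture agent reports, e.g. "Compliant"
    ise_status: str       # what ISE shows for the session, e.g. "Pending"
    coa_sent: bool        # whether any CoA was observed leaving the PSN
    nad_authz: str        # authorization currently applied on the NAD


def is_stuck_pending(view: SessionView) -> bool:
    """True when the endpoint is compliant but ISE and the NAD never moved on."""
    return (
        view.endpoint_status == "Compliant"
        and view.ise_status == "Pending"
        and not view.coa_sent
        and view.nad_authz in {"Unknown", "Quarantine"}
    )
```

A session that matches this predicate is a candidate for the workarounds listed above; one that does not match points to a different failure mode.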


1.2 CoA Is “Sent” but Never Applied

In the same reported cases:

  • Live Logs show intent to send CoA

  • No evidence of CoA leaving or being applied

Typical root causes:

  • Dynamic Authorization misalignment (NAD config, ACLs, firewalls)

  • Session ownership change (load balancer without stickiness → CoA sent to the wrong PSN)

  • An internal PSN queue delays side effects such as the CoA after the endpoint has already advanced its posture state

Cisco guidance is explicit: all RADIUS traffic for a session (auth, reauth, accounting, CoA) must remain on the same PSN.


1.3 Reassessment Loops / “Reassessment Failed”

Observed behavior:

  • Periodic posture reassessment fails

  • Manual reassessment succeeds

This usually indicates:

  • Timer misalignment

  • Partial reachability issues

  • Posture channel state no longer matching reevaluation windows


2. Posture State Synchronization: A Very Real Failure Mode

Cisco provides a TechNote (ISE 3.1+) describing Posture State Synchronization, including:

  • Bidirectional communication over TCP 8449

  • Validation using:

    • DART bundle

    • AnyConnect_ISEPosture.log

  • Example log messages such as:

    • HTTP Probe failed/timed-out, Retrying...
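
When reviewing a DART bundle, it can help to count how often the probe message quoted above appears. The sketch below assumes only that the message text occurs somewhere on a log line; the real layout of AnyConnect_ISEPosture.log (timestamps, prefixes, encoding) is not assumed, and the function name is hypothetical.

```python
def count_probe_failures(
    log_text: str,
    marker: str = "HTTP Probe failed/timed-out",
) -> int:
    """Count log lines that contain the probe-failure marker text."""
    return sum(1 for line in log_text.splitlines() if marker in line)
```

A count that grows during the window in which a session is stuck suggests the synchronization probes are not getting through, which is worth checking against the dACL in force at that time.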


2.1 Common Trap: Incorrect dACL Blocking Synchronization

The TechNote highlights a subtle but critical issue:

  • If the dACL applied after compliance does not explicitly deny posture synchronization traffic when appropriate, ISE may raise alarms and disable synchronization

  • Recovery may require restarting Secure Client

Translated into failure modes:

  • Posture works initially

  • Then enters a stuck state

  • Only reauth or client restart clears it

  • In high-latency or complex ACL environments, this becomes intermittent


3. How Latency Breaks State (Not Just Response Time)

Posture involves multiple correlated components:

  • RADIUS / EAP session (identity)

  • Posture channels (HTTPS / portals)

  • Authorization logic (Unknown → Compliant / Non-Compliant)

  • CoA to enforce the change on the NAD

If any step exceeds correlation timers, results include:

  • State divergence (endpoint says compliant, ISE shows pending)

  • Access divergence (ISE decided, NAD never applied)

  • Session split-brain (one PSN handles auth, another handles posture)

Again: session stickiness is mandatory in multi-PSN or load-balanced environments.
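
Session split-brain can be checked mechanically once the handling PSN is known for each stage of a session. This is an illustrative sketch: the `(stage, psn)` event tuples are an assumed input format that would have to be assembled from Live Logs by hand, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def owning_psns(events: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the distinct PSNs that handled any stage of one session."""
    return {psn for _stage, psn in events}


def is_split_brain(events: Iterable[Tuple[str, str]]) -> bool:
    """True when more than one PSN handled stages of the same session."""
    return len(owning_psns(events)) > 1
```

For example, `[("auth", "psn-a"), ("posture", "psn-b")]` is flagged, while a session handled end to end by `psn-a` is not.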


4. Mitigations (Architecture + Operations)

4.1 Reduce Posture Blocking in the Critical Path

If posture is mandatory, design it in phases:

  • Minimal access (pre-auth / quarantine)

  • Elevation only after posture + CoA


4.2 Guarantee CoA Reliability

“Works sometimes” almost always means:

  • Reachability issues

  • ACL or firewall state problems

  • NAD Dynamic Authorization misconfiguration

Validate explicitly:

  • Does CoA leave the PSN?

  • Does it reach the NAD?

  • Does the NAD apply it?
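
The three questions form an ordered chain, so the first "no" localizes the fault. A minimal sketch of that triage follows; the function name and return strings are hypothetical, and each boolean is assumed to come from separate evidence (a capture at the PSN, a capture or debug at the NAD, and the NAD's session state).

```python
from typing import Optional


def first_coa_break(
    left_psn: bool,
    reached_nad: bool,
    applied_by_nad: bool,
) -> Optional[str]:
    """Return the first failing hop of the CoA path, or None if all passed."""
    if not left_psn:
        return "CoA never left the PSN"
    if not reached_nad:
        return "CoA left the PSN but did not reach the NAD"
    if not applied_by_nad:
        return "CoA reached the NAD but was not applied"
    return None
```

Checking in this order avoids debugging NAD configuration when the packet never left ISE in the first place.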


4.3 Control Session Churn

  • Prevent PSN ownership changes (load balancers must use stickiness)

  • Avoid mass reauth triggered by aggressive timers


4.4 Instrumentation That Actually Helps

  • Live Logs focused on posture state transitions

  • Endpoint-side validation via DART bundle

  • Review AnyConnect_ISEPosture.log as shown in Cisco TechNotes


5. Posture Architecture Under Latency (Deep Dive)

This section extends the previous failure analysis by mapping where posture breaks architecturally, not just operationally.

Posture relies on strict correlation between:

  • RADIUS session state

  • Posture workflow state

  • Authorization state on the NAD

Latency causes failures when any of these drift out of alignment.


5.1 Posture Control-Plane Flow (Expanded)

Critical Dependency

The CoA must execute successfully after posture evaluation and before correlation timers expire.

Latency anywhere in this chain breaks enforcement.


5.2 Where Latency Actually Breaks Posture

Posture failure rarely happens at the evaluation stage. It happens during state transition and enforcement.

Latency-Sensitive Points

  • Internal PSN queues delaying state commit

  • Dynamic Authorization reachability

  • NAD responsiveness to CoA

  • Session ownership changes (multi-PSN environments)

  • Load balancer re-dispatching traffic

These failures are silent: posture appears successful, but enforcement never happens.


6. Advanced Failure Modes (Observed in the Field)

6.1 State Split-Brain (Endpoint vs ISE vs NAD)

Result

  • Endpoint believes it is compliant

  • ISE cannot advance session state

  • NAD never updates authorization

This condition often persists until reauthentication.


6.2 PSN Ownership Drift (LB Without Stickiness)

What Breaks

  • Auth session is owned by PSN-A

  • Posture event processed by PSN-B

  • CoA sent by the wrong node

  • NAD rejects or ignores the CoA

Symptom

  • Live Logs show “Sending CoA”, but nothing changes on the NAD


6.3 Correlation Timer Expiry

Posture relies on finite correlation windows:

  • RADIUS session lifetime

  • Posture evaluation timers

  • NAD authorization update expectations

Latency stretches these beyond safe limits:

  • Posture result arrives “too late”

  • ISE refuses to apply it to an aging session

  • State remains frozen
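
The "too late" case is simple arithmetic: the time spent in each stage has to fit inside the correlation window. The sketch below shows that calculation; the stage names and all durations are made-up illustrative values, not ISE defaults, and the function name is hypothetical.

```python
from typing import Mapping


def enforcement_margin(
    stage_seconds: Mapping[str, float],
    window_seconds: float,
) -> float:
    """Seconds left in the window after all stages; negative means too late."""
    return window_seconds - sum(stage_seconds.values())


# Illustrative numbers only.
stages = {"collection": 20.0, "report": 2.0, "decision": 1.0, "coa": 2.0}
healthy = enforcement_margin(stages, window_seconds=30.0)
delayed = enforcement_margin({**stages, "queue_delay": 10.0}, window_seconds=30.0)
```

With these numbers the healthy path finishes with a few seconds to spare, while a single added queue delay pushes the CoA past the window even though every individual stage still succeeded.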


7. Why Posture Fails Harder Than AAA

AAA failures are usually binary:

  • Accept

  • Reject

  • Timeout

Posture failures are stateful and asymmetric:

  • Partial success

  • Silent enforcement failure

  • Divergent views of reality

Latency creates invisible failure states, not explicit errors.


8. Architectural Design Rules for Posture (Latency-Aware)

8.1 Geographic and Topological Rules

  • Posture PSNs must be local to NADs

  • Avoid posture flows across regions or continents

  • Do not centralize posture enforcement

Rule:

If AAA latency is “barely acceptable”, posture will be unstable.


8.2 Load Balancer Requirements

If a load balancer is used:

  • RADIUS session stickiness is mandatory

  • Stickiness must persist across:

    • Auth

    • Reauth

    • Accounting

    • CoA

Preferred affinity:

  • Source-IP + Calling-Station-ID

Without this, posture will fail intermittently.
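
The effect of affinity on Source-IP + Calling-Station-ID can be illustrated with a deterministic hash: the same NAD and endpoint pair always maps to the same PSN. This is a conceptual sketch only. It does not describe how any particular load balancer implements persistence (most use persistence tables with timeouts), and `pick_psn` and the PSN names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_psn(source_ip: str, calling_station_id: str, psns: Sequence[str]) -> str:
    """Map a (NAD source IP, Calling-Station-ID) pair to one PSN deterministically."""
    if not psns:
        raise ValueError("PSN pool must not be empty")
    # Normalize case so the same MAC address always produces the same key.
    key = f"{source_ip}|{calling_station_id.lower()}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(psns)
    return psns[index]
```

Note that a plain modulo hash remaps sessions whenever the pool size changes, which is itself a source of ownership drift; that is one reason to validate stickiness behavior during PSN maintenance and failover, not only in steady state.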


8.3 CoA Path Engineering

Explicitly validate:

  • CoA source IP

  • NAD reachability

  • Firewall state

Design rules:

  • Avoid asymmetric routing for CoA

  • Treat CoA as critical control-plane traffic

Treat “sometimes works” as a design flaw, not as a transient glitch.


8.4 Reduce Posture in the Critical Path

Design posture in layers:

  1. Minimal access (pre-auth / quarantine)

  2. Posture evaluation

  3. Explicit elevation via CoA

Never assume posture == immediate enforcement.


9. Operational Guardrails

9.1 Instrument What Matters

Live Logs:

  • Focus on state transitions, not just results

Endpoint:

  • DART bundle

  • AnyConnect_ISEPosture.log

NAD:

  • CoA received?

  • CoA applied?


9.2 Symptoms That Indicate Latency-Induced Posture Failure

  • “Pending” that never clears

  • Manual reassessment works, automatic fails

  • CoA visible in logs but not on NAD

  • Fix requires reauth or interface bounce



10. Healthy vs Broken Posture: Architectural Comparison

This section contrasts what “good” looks like versus what actually breaks under latency.

10.1 Healthy Posture Flow (Latency-Aware Design)

Characteristics

  • Deterministic path

  • Low RTT across all hops

  • Single PSN owns the full session lifecycle

  • CoA is delivered and applied within correlation timers


10.2 Broken Posture Flow (Latency + Misalignment)

Resulting State

  • Endpoint: Compliant

  • ISE: Pending

  • NAD: Quarantine / Unknown

This condition is stable-but-wrong.


11. Design Rules for Reliable Posture (Latency-Aware)

These rules assume posture is mandatory and enforcement matters.


11.1 Geographic and Topology Rules

  • Posture PSNs must be regionally local to NADs

  • Avoid intercontinental posture flows

  • Do not centralize posture for distributed access layers

Rule of thumb:

If RTT NAD ↔ PSN exceeds 20–30 ms, posture reliability degrades rapidly.
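
The rule of thumb can be applied to measured RTT samples. This sketch uses the 20 ms and 30 ms figures from the rule above as its default thresholds; using the median and the three labels are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def rtt_risk(samples_ms: Sequence[float], low_ms: float = 20.0, high_ms: float = 30.0) -> str:
    """Classify NAD-to-PSN RTT samples against the 20-30 ms rule of thumb."""
    typical = median(samples_ms)
    if typical < low_ms:
        return "ok"
    if typical <= high_ms:
        return "at-risk"
    return "degraded"
```

The median hides spikes, so for a site with bursty latency it may be more honest to run the same check against a high percentile as well.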


11.2 Session Ownership Rules

One PSN must own:

  • Authentication

  • Posture

  • Reassessment

  • CoA

Design requirements:

  • Load balancers must enforce stickiness

  • Reauth and CoA must return to the same PSN

Session ownership drift is the #1 hidden posture killer.


11.3 CoA Engineering Rules

Treat CoA as critical control-plane traffic.

Ensure:

  • Symmetric routing

  • Firewall state alignment

  • Correct NAD Dynamic Authorization configuration

Operational guidance:

  • Validate under load, not only in lab tests

If CoA is unreliable, posture is theater.


11.4 Authorization Model Rules

Use two-phase access:

  1. Minimal / quarantine access

  2. Explicit elevation after posture + CoA

Avoid designs that assume:

  • posture == immediate enforcement

Posture is event-driven, not synchronous.
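
The two-phase model can be written as a small state transition in which a compliant posture result alone is never enough. This is a conceptual sketch, not ISE policy syntax: the `Access` states and the `coa_acked` flag are assumptions used to show that elevation depends on both events.

```python
from enum import Enum


class Access(Enum):
    QUARANTINE = "quarantine"
    ELEVATED = "elevated"


def next_access(current: Access, posture_compliant: bool, coa_acked: bool) -> Access:
    """Elevate only when posture is compliant AND the NAD acknowledged the CoA."""
    if current is Access.QUARANTINE and posture_compliant and coa_acked:
        return Access.ELEVATED
    return current
```

Modeling it this way makes the stuck case explicit: `posture_compliant=True` with `coa_acked=False` leaves the session in quarantine, which is exactly the state the earlier sections describe.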


12. Posture Anti-Patterns (Seen Repeatedly)

12.1 Centralized Posture PSNs

Characteristics:

  • Global posture PSN

  • Regional NADs

  • High RTT

  • Silent CoA failures

Outcome: Random “Pending” states and manual recovery.


12.2 Load Balancing Without Stickiness

Failure pattern:

  • HTTPS posture hits PSN-B

  • RADIUS auth owned by PSN-A

  • CoA sent from the wrong node

Symptom:

  • Logs say “Sending CoA”

  • NAD never changes state


12.3 Aggressive Reauth + Posture

Characteristics:

  • Short reauth timers

  • Periodic posture

  • Latency in the control plane

Outcome: Constant churn, reassessment failures, auth storms.
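
The load created by short reauth timers is easy to estimate. The sketch below assumes reauthentications are spread evenly over the timer interval, which is the best case; the endpoint count and timer values in the example are made up, and the function name is hypothetical.

```python
def reauths_per_second(endpoints: int, reauth_timer_s: float) -> float:
    """Average reauthentication rate if sessions are evenly spread over the timer."""
    if reauth_timer_s <= 0:
        raise ValueError("reauth timer must be positive")
    return endpoints / reauth_timer_s


# Illustrative numbers only: the same population under two timers.
hourly = reauths_per_second(20000, 3600)
five_minute = reauths_per_second(20000, 300)
```

Shortening the timer from one hour to five minutes multiplies the steady-state rate by twelve, before counting the posture reassessments and CoAs that each reauth may trigger.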


12.4 Treating Posture as “Just Another Policy”

Indicators:

  • Multiple external lookups

  • REST, MDM, OCSP on every posture event

  • No defined latency budget

Outcome: Posture collapses first.


13. Operational Checklist (Posture Failure Mode)

Use this checklist when posture is unstable:

If ≥2 answers are “no”, posture issues are architectural, not client-side.
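
The decision rule above can be applied to whatever checklist answers are recorded. This is a minimal sketch: the function name is hypothetical, the questions are supplied by the caller, and only the threshold of two "no" answers comes from the rule itself.

```python
from typing import Mapping


def looks_architectural(answers: Mapping[str, bool], threshold: int = 2) -> bool:
    """True when at least `threshold` checklist answers are "no" (False)."""
    no_count = sum(1 for ok in answers.values() if not ok)
    return no_count >= threshold
```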


Key Takeaway

AAA failures are noisy. Posture failures are quiet and misleading.

Latency does not cause posture to fail fast; it causes posture to fail silently and persistently.

A posture design that is not:

  • latency-aware

  • state-aware

  • ownership-aware

will eventually drift into an unrecoverable-but-accepted failure mode.

Posture reliability is an architecture problem, not a tuning problem.

