Out-of-Band Management
What It Is
Out-of-band (OOB) management is the ability to manage a machine through a channel that is independent of the machine's main OS. When the OS is hung, crashed, or not yet installed, OOB management still works.
"Out-of-band" means "not through the normal path." The normal path (SSH, web console) requires the OS to be running. OOB works even when the OS doesn't.
Why It Matters
Machines fail. Kernels panic, disks fill up, network configs go wrong. When the OS is unresponsive, you need a way to:
- Power cycle the machine (without walking to it)
- See the console output (to diagnose boot failures)
- Boot from an alternative image (to recover)
Without OOB, recovery requires physical access. For a VPS, that means a support ticket. For a homelab server in another room, that means walking over. For a machine in a data center, that means a drive to the facility.
The Technologies
Intel AMT (Active Management Technology)
AMT runs on the Intel Management Engine (ME) and is available on vPro-branded Intel chips. It provides:
- Remote power control (on/off/reset/boot-to-PXE)
- KVM remote desktop (full keyboard/video/mouse, works during BIOS)
- Serial over LAN (remote serial console)
- IDE redirection (mount a remote ISO as a virtual CD)
AMT communicates on ports 16992 (HTTP) and 16993 (HTTPS) using WS-Management (WS-MAN), a SOAP-based protocol. It has its own TLS stack and credentials, independent of the OS.
Availability: Intel vPro business/enterprise chips only. Not on consumer chips (Core i3/i5/i7 without vPro).
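A quick way to see whether a machine exposes the AMT ports named above is a plain TCP connect. This is a minimal sketch: `probe_amt` and its parameters are illustrative names, not part of any AMT SDK, and an open port only shows that something is listening, not that AMT is provisioned.

```python
import socket

# AMT web-service ports from the text above: 16992 (HTTP), 16993 (HTTPS).
AMT_PORTS = (16992, 16993)


def probe_amt(host, ports=AMT_PORTS, timeout=2.0):
    """Return {port: reachable} using a plain TCP connect to each port."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `probe_amt("192.0.2.10")` (a documentation address, substitute a real host) returns a dict with one boolean per port.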
AMD DASH
DASH (Desktop and mobile Architecture for System Hardware) is a DMTF standard for client/desktop OOB management. AMD's implementation is its counterpart to Intel AMT and is available on AMD PRO-branded business platforms.
Provides: remote power control, hardware inventory, health monitoring, OS state detection. Less feature-rich than AMT (no KVM remote desktop in most implementations).
Availability: AMD PRO business platforms only.
IPMI / BMC
IPMI (Intelligent Platform Management Interface) is the server equivalent of AMT. It is implemented by a BMC (Baseboard Management Controller), a dedicated processor on the server motherboard that provides:
- Remote power/reset
- Serial console
- Virtual media (remote ISO mount)
- Hardware sensors (temperature, fan speed, voltage)
- KVM over IP (web-based remote console)
Common BMC implementations: iLO (HP), iDRAC (Dell), IMM (Lenovo). Redfish is not a BMC implementation but the modern DMTF standard API that is replacing IPMI; current BMCs generally expose it.
Availability: Server hardware only. Not available on consumer/desktop.
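For BMCs that speak Redfish, a power reset is an HTTP POST to the system's `ComputerSystem.Reset` action. The sketch below only builds the request and does not send it. The helper name, host, and credentials are placeholders. The system ID differs between vendors, and a careful client reads the action target from the system resource rather than assuming the conventional path used here.

```python
import base64
import json
import urllib.request


def build_redfish_reset(bmc_host, system_id, user, password,
                        reset_type="ForceRestart"):
    """Build (but do not send) a Redfish ComputerSystem.Reset request."""
    # Conventional action path; the authoritative target is listed under
    # "Actions" in the system resource returned by the BMC.
    url = (f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    body = json.dumps({"ResetType": reset_type}).encode()
    # HTTP Basic auth for brevity; session tokens are the other standard option.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` with an SSL context appropriate for the BMC's certificate.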
VPS Provider Console
Cloud/VPS providers offer their own OOB via the management dashboard:
- Serial console (via web browser)
- Rescue mode (boot into a recovery OS)
- Hard reset
- ISO mounting
This is provider-specific, not a hardware standard.
How FortrOS Uses It
FortrOS treats OOB management as an org-policy-controlled resource:
Policy: use
- During enrollment, the maintainer provisions AMT/DASH (sets admin credentials over the local network)
- Credentials are stored encrypted in org state
- The org can power-cycle unresponsive nodes via AMT
- Self-healing: detect failure via gossip -> AMT reboot -> node re-enrolls -> org heals without human intervention
Policy: neutralize
- Intel ME is neutralized (me_cleaner / HAP bit) during provisioning
- No OOB management available
- Recovery requires physical access
- Appropriate for government/opsec deployments where the ME is considered an unacceptable risk
Policy: ignore (default)
- Don't touch the management engine
- Don't provision AMT
- OOB is whatever the vendor shipped (may or may not work)
- Appropriate for homelab where ME is a non-concern
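The three policies can be pictured as a small enum plus the enrollment steps each one implies. This is an illustrative sketch only: `OobPolicy` and `enrollment_actions` are hypothetical names, not FortrOS's actual types, and the step strings simply restate the bullets above.

```python
from enum import Enum


class OobPolicy(Enum):
    USE = "use"
    NEUTRALIZE = "neutralize"
    IGNORE = "ignore"


# The text above names "ignore" as the default policy.
DEFAULT_POLICY = OobPolicy.IGNORE


def enrollment_actions(policy):
    """Return the enrollment-time steps a policy implies (hypothetical)."""
    if policy is OobPolicy.USE:
        return [
            "provision AMT/DASH admin credentials",
            "store credentials encrypted in org state",
        ]
    if policy is OobPolicy.NEUTRALIZE:
        return ["neutralize Intel ME (me_cleaner / HAP bit)"]
    # IGNORE: leave the management engine untouched.
    return []
```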
The Self-Healing Loop
When AMT is available and provisioned:
1. SWIM gossip detects node-X is unreachable (failed probes)
2. Maintainer on node-Y sends AMT power-reset to node-X
3. Node-X reboots from local preboot UKI
4. Preboot authenticates via TPM, unlocks /persist, and kexecs into the next kernel
5. Node-X rejoins the gossip mesh
6. Org state converges, workloads re-reconcile
No human involved. No physical access. The org detected the failure and healed itself.
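Steps 1 and 2 of the loop can be sketched as a failure counter that triggers a reset callback. This is a simplified illustration, not FortrOS code: real SWIM adds indirect probes and a suspicion phase, and `Healer`, `record_probe`, and the threshold of three are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Healer:
    # Callback that issues the OOB reset (e.g. an AMT power-reset); hypothetical.
    power_reset: Callable[[str], None]
    max_failed_probes: int = 3
    failures: Dict[str, int] = field(default_factory=dict)

    def record_probe(self, node: str, ok: bool) -> bool:
        """Record one probe result; return True if a reset was issued."""
        if ok:
            # A successful probe clears any accumulated failures.
            self.failures.pop(node, None)
            return False
        count = self.failures.get(node, 0) + 1
        if count >= self.max_failed_probes:
            # Clear the counter so the rebooting node starts from zero.
            self.failures.pop(node, None)
            self.power_reset(node)
            return True
        self.failures[node] = count
        return False
```

A usage example: with `resets = []` and `Healer(power_reset=resets.append)`, three consecutive failed probes for `"node-X"` append `"node-X"` to `resets`. Steps 3 to 6 then happen on the rebooted node itself.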
Additional Recovery Channels
OOB isn't a single mechanism -- FortrOS uses every available recovery channel, taking positive control of each when possible. These aren't alternatives to each other; they're layers that apply in different environments:
Hardware watchdog: A hardware timer that resets the machine if the OS stops servicing it, which is what happens when the OS hangs. Built into most hardware. Always active, no external dependency. Detects OS hangs but not network issues or boot failures.
PDU (Power Distribution Unit) control: Smart PDUs remotely toggle power to individual outlets. Coarser than AMT (power cycle, no console) but works with any hardware. FortrOS provisions PDU credentials during enrollment where available.
Wake-on-LAN (WoL): Send a magic packet containing the target machine's MAC address, typically as a broadcast on the local segment, to power it on. Only powers on -- can't reset or access console. Requires L2 network access. Useful for bringing up nodes after planned power-downs.
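A magic packet has a fixed layout: six 0xFF bytes followed by the target MAC address repeated 16 times. The sketch below builds that payload and sends it as a UDP broadcast. The function names are illustrative, and UDP port 9 is a common convention rather than a requirement.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + raw * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```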
VPS provider API: Cloud providers offer hard reset, serial console, rescue mode, and ISO mounting via API. FortrOS uses provider APIs for automated recovery of VPS nodes.
VM host management: Nodes running as VMs on Proxmox, vSphere, or similar can be power-cycled via the hypervisor's management API. FortrOS provisions API credentials during enrollment for automated recovery.
The principle: every management channel the hardware or environment offers is a recovery path FortrOS can use. The maintainer's OOB module detects what's available (AMT, IPMI, provider API, PDU, WoL) and uses the best option for each node.
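Picking "the best option" can be sketched as a walk down a preference list. The ordering below simply follows the order in which the channels are listed above and is an assumption, as are the channel identifiers and the `best_channel` helper. The actual ranking used by the maintainer's OOB module is not specified here.

```python
# Assumed preference order, most capable channel first (hypothetical).
PREFERENCE = ("amt", "ipmi", "provider_api", "pdu", "wol")


def best_channel(available):
    """Return the most preferred channel present in `available`, or None."""
    for channel in PREFERENCE:
        if channel in available:
            return channel
    return None
```

For a node that offers only a smart PDU and WoL, `best_channel({"pdu", "wol"})` picks the PDU, since it can power-cycle a hung machine while WoL can only power one on.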
Links
- Intel AMT
- AMD DASH
- DMTF DASH
- Redfish API -- Modern replacement for IPMI