Thursday, 14:32 — the Z2M cascade no one was meant to hear.

A fictional shift from the perspective of a solo integrator: three Z2M crashes after a silent add-on update — and the forty minutes between the first symptom and a quiet fix.

14:32 — A tile turns red

It’s Thursday, early afternoon. Sandra — a fictional solo integrator somewhere between Zürich and Konstanz, looking after 23 Home Assistant installations — sits in her office over a price quote. The fleet dashboard runs on the second monitor. She doesn’t look at it. It’s always open, almost never loud.

At 14:32:04 one tile flips from green to amber. Kanzlei Weber. Disk usage above 70 %. No drama. She takes a sip of coffee.

14:34 — The second. The third.

Two minutes later the Haus Zürichberg tile tips. Status: critical. Source: zigbee2mqtt. Adapter unreachable. Eight seconds later Büro Meier. Same adapter, same message.

Three installations, three customers, the same symptom, within two minutes. That’s not coincidence. That’s a pattern.

Sandra closes the quote document. With a bit of luck none of the three customers has yet noticed that the lunch-break lighting automation didn’t fire.

14:38 — The shared add-on

She filters the dashboard on zigbee2mqtt. Seven installations run it. Three are down. Four are not. She sorts by add-on version:

  • All three installations that are down run 2.5.1, auto-updated this morning.
  • The four still running are on 2.4.x, with auto-update disabled because those customers wanted it that way.

Correlation found. It’s not the hub. It’s the update.
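
For readers who want to see the filter-and-sort step as code, here is a minimal Python sketch. It is not how the dashboard is implemented. The data layout, the field names (`site`, `addon_version`, `adapter_ok`), the function name and the four "Site D" to "Site G" entries are all assumptions made up for illustration, and the exact 2.4.x patch versions are placeholders.

```python
from collections import defaultdict

# Hypothetical snapshot of the seven installations that run zigbee2mqtt.
# Site names D-G and the exact 2.4.x patch levels are placeholders.
fleet = [
    {"site": "Kanzlei Weber", "addon_version": "2.5.1", "adapter_ok": False},
    {"site": "Haus Zürichberg", "addon_version": "2.5.1", "adapter_ok": False},
    {"site": "Büro Meier", "addon_version": "2.5.1", "adapter_ok": False},
    {"site": "Site D", "addon_version": "2.4.6", "adapter_ok": True},
    {"site": "Site E", "addon_version": "2.4.6", "adapter_ok": True},
    {"site": "Site F", "addon_version": "2.4.5", "adapter_ok": True},
    {"site": "Site G", "addon_version": "2.4.5", "adapter_ok": True},
]


def failures_by_version(installations):
    """Return {version: (down, total)} with versions in sorted order."""
    counts = defaultdict(lambda: [0, 0])  # version -> [down, total]
    for inst in installations:
        entry = counts[inst["addon_version"]]
        entry[1] += 1
        if not inst["adapter_ok"]:
            entry[0] += 1
    return {version: tuple(counts[version]) for version in sorted(counts)}


for version, (down, total) in failures_by_version(fleet).items():
    print(f"{version}: {down}/{total} down")
```

Grouping by version makes the pattern visible at a glance: every installation on 2.5.1 is down, and none on 2.4.x is. That is a strong hint, not proof, which is why the rollback doubles as the test.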

14:44 — Three maintenance requests

Sandra opens the bulk action: pick three customers, send one maintenance request with the same wording. Subject: “Restart Z2M adapter after update”. Reason: “Known bug in 2.5.1 — rollback to 2.4.6”. Duration: 30 minutes.

She knows they won’t all accept at once. Frau Weber is in a client meeting. The Meier family is on holiday. That’s fine — the tunnel waits for each one individually.

14:51 — Three clicks at the customer side

At Haus Zürichberg the request arrives as an actionable item directly in Home Assistant. The owner is at the kitchen table, sees the prompt, reads “Restart Z2M adapter”, clicks Accept. 38 seconds later Sandra has a URL — it opens directly onto his HA frontend.

Frau Weber accepts thirteen minutes later. The Meier family follows at 15:04, from a beach café. Three clicks, three sessions, three tunnels — each with a 30-minute window, then automatically closed.
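
The time-boxed window can be pictured with a small Python sketch. It rests on assumptions: that the 30 minutes start counting at the moment of acceptance, and that the session is simply treated as closed once the window has passed. The function names and the calendar date are invented for illustration and do not describe the product's actual mechanism.

```python
from datetime import datetime, timedelta


def session_window(accepted_at, minutes=30):
    """Return (start, end) of a maintenance window that begins on acceptance."""
    return accepted_at, accepted_at + timedelta(minutes=minutes)


def is_open(window, now):
    """A session is open from start (inclusive) until end (exclusive)."""
    start, end = window
    return start <= now < end


# Haus Zürichberg accepts at 14:51; the date is an arbitrary Thursday.
window = session_window(datetime(2025, 1, 2, 14, 51))

print(is_open(window, datetime(2025, 1, 2, 15, 9)))   # inside the window
print(is_open(window, datetime(2025, 1, 2, 15, 21)))  # the window has ended
```

Because each window is derived from that customer's own acceptance time, three customers accepting at three different moments get three independent expiry times.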

15:09 — Rollback, tunnel closed

The fix itself is trivial: downgrade the add-on to 2.4.6, restart the adapter, status green. Sandra needs under four minutes per customer. While she’s working on the third tunnel she’s already typing the note in the internal wiki:

“Z2M 2.5.1 — pause auto-update until patched.” Tag: incident, severity: medium, affected: 3/7.

At 15:24 everything is green again. None of the three customers called.


What would have been different before

Before the fleet dashboard, this story would have played out differently. Realistically:

  • Sandra would have seen the Z2M crashes only the next morning — while checking her own home automation.
  • Frau Weber would have called in the evening because the office lighting wasn’t responding. Appointment on Friday.
  • The Meier family would have written from holiday, annoyed that the heating wasn’t ramping up. Trust slightly damaged.
  • Three separate drives, three separate VPN sessions, three separate explanations.

Instead: forty minutes between the first symptom and a quiet fix. Three happy customers who didn’t notice a thing — and one solo integrator who saved her Thursday afternoon.

This story is fictional. The people, companies and version numbers don’t exist as described. What is real: the pattern that HA·Fleet·Manager is built to address.

Denny Ovčar
Founder · ha-fleet-manager.com