Prevention steps for the future: I added an additional node to the cluster and upped the replica count for sidekiq and streaming to 2 for better redundancy against a single node failure. I have a generic monitoring/alerting solution written, but haven't deployed it on this cluster yet, oops! I'll get that deployed so I get better mobile downtime alerts on triggers like frequent pod restarts, 0 available replicas in a set, or no endpoints for a critical service.
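
For anyone curious what those three triggers boil down to, here is a minimal Python sketch. It is not the actual alerting solution mentioned above, just an illustration under assumptions: the `ClusterSnapshot` shape, the `evaluate_triggers` name, and the `RESTART_THRESHOLD` value are all hypothetical, and collecting the numbers from the cluster and delivering the alert to a phone are left out.

```python
from dataclasses import dataclass

# Hypothetical threshold: restarts within one check window that count as "frequent".
RESTART_THRESHOLD = 3


@dataclass
class ClusterSnapshot:
    """Illustrative snapshot of cluster state; the field names are assumptions."""
    pod_restarts: dict        # pod name -> restarts seen in the check window
    available_replicas: dict  # workload name -> currently available replicas
    service_endpoints: dict   # critical service name -> number of ready endpoints


def evaluate_triggers(snap: ClusterSnapshot) -> list:
    """Return one human-readable alert string per trigger that fired."""
    alerts = []
    # Trigger 1: a pod restarting too often within the window.
    for pod, restarts in sorted(snap.pod_restarts.items()):
        if restarts >= RESTART_THRESHOLD:
            alerts.append(f"frequent restarts: {pod} ({restarts})")
    # Trigger 2: a replica set with nothing available.
    for workload, available in sorted(snap.available_replicas.items()):
        if available == 0:
            alerts.append(f"0 available replicas: {workload}")
    # Trigger 3: a critical service with no endpoints behind it.
    for service, endpoints in sorted(snap.service_endpoints.items()):
        if endpoints == 0:
            alerts.append(f"no endpoints: {service}")
    return alerts


if __name__ == "__main__":
    # Made-up example values, not data from the real cluster.
    snap = ClusterSnapshot(
        pod_restarts={"sidekiq-abc": 5, "web-xyz": 0},
        available_replicas={"sidekiq": 2, "streaming": 0},
        service_endpoints={"web": 1, "streaming": 0},
    )
    for alert in evaluate_triggers(snap):
        print(alert)
```

Keeping the trigger logic as a pure function over a snapshot makes it easy to test without a live cluster; whatever gathers the real numbers can be swapped in separately.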

