On Sunday, July 26th, 2020, we experienced a major service outage from 2020-07-26 23:06 UTC to 2020-07-27 01:08 UTC that caused:
Login failures on the admin dashboard
Permanent schedule changes for tasks that have an interval longer than 2 hours and were scheduled to execute during this window. Cron-based tasks were not affected.
The outage was caused by a network connectivity issue between our admin dashboard servers, our task execution servers, and our primary database. The connectivity issue was in turn caused by an improper VPC network configuration between the app servers and the primary database.