Alerting and monitoring system
Existing systems
Zabbix
Requirements
- Must:
- alert the Slack team within 5 minutes of key infrastructure going offline
- Should:
- be easy to update for new equipment
- be easy to configure to notify new volunteers
- be easy to deploy
- be reliable
- be configurable through a version-controlled config to enable easy updates
- be editable by multiple volunteers
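One possible shape for the "Must" requirement and the version-controlled config, as a rough Python sketch rather than a decided design. Everything named here is an assumption for illustration: the host names, the `notify` lists, the 60-second check interval, and the 3-minute grace period are hypothetical values, and the only figure taken from the requirements is the 5-minute budget. The sketch builds the message body but does not send it. How it reaches Slack (for example an incoming webhook) is still open.

```python
import json

# From the requirement above: the team must be alerted within 5 minutes.
ALERT_BUDGET_SECONDS = 5 * 60
# Assumptions, not decided values: how often checks run, and how long a host
# may stay silent before it counts as offline.
CHECK_INTERVAL_SECONDS = 60
GRACE_SECONDS = 3 * 60

# Worst case, an outage is noticed one check interval after the grace period
# ends, so the two together must fit inside the alert budget.
assert GRACE_SECONDS + CHECK_INTERVAL_SECONDS < ALERT_BUDGET_SECONDS

# Hypothetical version-controlled config: one entry per piece of equipment.
# Adding equipment or a volunteer is a one-line change that can be reviewed.
CONFIG = {
    "hosts": [
        {"name": "example-hub", "notify": ["volunteer-a"]},
        {"name": "example-router", "notify": ["volunteer-a", "volunteer-b"]},
    ]
}


def offline_hosts(config, last_seen, now, grace=GRACE_SECONDS):
    """Return config entries whose last successful check is older than grace.

    last_seen maps host name to the Unix time of its last successful check.
    A host with no entry is treated as offline.
    """
    overdue = []
    for host in config["hosts"]:
        seen = last_seen.get(host["name"])
        if seen is None or now - seen > grace:
            overdue.append(host)
    return overdue


def slack_message(host):
    """Build a JSON body with a single "text" field for a Slack message."""
    names = ", ".join(host["notify"])
    text = f"{host['name']} appears offline. Notify: {names}"
    return json.dumps({"text": text})
```

For example, with `now = 1000` and `last_seen = {"example-hub": 990, "example-router": 700}`, only `example-router` is returned, because it has been silent for 300 seconds and the grace period is 180. Keeping the host list as plain data is what lets several volunteers edit it through normal version-control review.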
Questions
- Major
- What key metrics should we alert based on?
- Minor
Proposed software
Next Steps
Log
- prompted by this Slack discussion on the Grand St. outage
- added Zabbix server during Hack night