Alerting and monitoring system
Out of date
Existing systems
Within NYC Mesh only
- #monitoring-unms / UISP: https://10.70.76.21/nms/login
- Grafana/Prometheus
  - Public, set up 4 years ago: stats.nycmesh.net
  - Mesh only, Omnis etc.:
    - Grafana: http://10.70.90.82:3000/dashboards
    - Prometheus General: http://10.70.90.82:9090
    - Prometheus Omni only: http://10.70.90.142:9090
    - snmp-exporter: http://10.70.90.82:9116
- Zabbix
  - IP: http://10.70.73.58/
  - Details: Runs on Quincy's server, connected to Beta Slack
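
As a rough illustration of how the Prometheus General instance listed above could feed an alert, the sketch below runs the instant query `up` through the Prometheus HTTP API and lists the targets reporting 0. The `up` metric only says whether the last scrape of a target succeeded; treating that as "offline" is an assumption, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Address taken from the list above; only reachable from inside the mesh.
PROMETHEUS_URL = "http://10.70.90.82:9090"


def down_targets(response: dict) -> list[str]:
    """Return the `instance` label of every target whose `up` value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response}")
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string
        if float(value) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


def query_up(base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

From inside the mesh, `down_targets(query_up())` would return the `instance` labels of the targets whose last scrape failed.
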
Requirements
- Must:
  - alert the Slack team within 5 minutes when key infrastructure goes offline
- Should:
  - be easy to update for new equipment
  - be easy to configure to notify new volunteers
  - be easy to deploy
  - be reliable
  - be configurable through a version-controlled config to enable easy updates
  - be editable by multiple volunteers
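
To make these requirements concrete, here is a minimal sketch of one way they could fit together: a small config that could live in a version-controlled file, a check for devices that have not been seen recently, and a message posted to a Slack incoming webhook. The device name, the `notify` field, the webhook URL, and the 3-minute threshold are all placeholders invented for illustration, not a description of any existing NYC Mesh setup.

```python
import json
import urllib.request

# Assumption: flag a device after 3 minutes of silence so that, with checks
# running about once a minute, the alert still lands inside the 5-minute target.
OFFLINE_AFTER_SECONDS = 3 * 60

# Hypothetical config; in practice this could be a file kept in version control.
CONFIG = {
    "slack_webhook_url": "https://hooks.slack.com/services/EXAMPLE",  # placeholder
    "devices": [
        {"name": "example-hub-router", "notify": ["volunteer-a"]},
    ],
}


def overdue_devices(last_seen: dict[str, float], now: float,
                    config: dict = CONFIG,
                    offline_after: float = OFFLINE_AFTER_SECONDS) -> list[dict]:
    """Return configured devices never seen, or silent for `offline_after` seconds."""
    overdue = []
    for device in config["devices"]:
        seen = last_seen.get(device["name"])
        if seen is None or now - seen >= offline_after:
            overdue.append(device)
    return overdue


def slack_payload(device: dict) -> dict:
    """Build the JSON body for a Slack incoming webhook message."""
    notify = ", ".join(device["notify"]) or "nobody configured"
    return {"text": f"{device['name']} appears to be offline (notify: {notify})"}


def post_to_slack(payload: dict, webhook_url: str) -> None:
    """POST the payload as JSON to the webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10).close()
```

Adding new equipment or a new volunteer then only means editing the config, which keeps changes reviewable by several people. How `last_seen` gets populated (UISP, Prometheus, Zabbix, or plain pings) is left open here.
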
 
Questions
- Major
  - What key metrics should we alert based on?
- Minor
  - Frequency? ~1 point/hour
Proposed software

Next Steps

Log
- Prompted by this Slack discussion on Grand St. outage
- Added Zabbix server during Hack night
 
