# uc-monitor-go-test

[![Go Report Card](https://goreportcard.com/badge/lightkone.guifi.net/lightkone/uc-monitor-go-test)](https://goreportcard.com/report/lightkone.guifi.net/lightkone/uc-monitor-go-test)

A proof-of-concept application that leverages [AntidoteDB](https://syncfree.github.io/antidote/) to orchestrate the Guifi.net network nodes monitoring system for the [LightKone](https://www.lightkone.eu/) project.

## Description

The monitoring application is divided into three main blocks:

- Network description fetching and feeding (functional)
- Node assignment among the different monitoring servers (functional, WiP)
- Actual node monitoring (functional, WiP)

This proof of concept takes advantage of the [AntidoteDB Java tutorial](https://github.com/SyncFree/antidote-java-tutorial) by [Deepthi Akkoorath (@deepthidevaki)](https://github.com/deepthidevaki) and uses [Mathias Weber (@mweberUKL)](https://github.com/mweberUKL)'s Go client for AntidoteDB (previously, [João Neto (@joaomlneto)](https://github.com/joaomlneto)'s [HTTP/HTTPS REST API for AntidoteDB](https://github.com/LightKone/antidote-rest-server) was used).

## Installation

### Docker

Install `docker-ce` using your preferred method as [described here](https://docs.docker.com/install/).

### Go and required libraries

Install Go using your operating system's package manager or follow the [instructions here](https://golang.org/doc/install).

After installing Go, download and install the following external libraries:

- golang/glog: `go get github.com/golang/glog`
- sparrc/go-ping: `go get github.com/sparrc/go-ping`
- antidote-go-client: `go get github.com/AntidoteDB/antidote-go-client`

### AntidoteDB Java tutorial

Download the tutorial's source code from [here](https://github.com/SyncFree/antidote-java-tutorial) and [start the two AntidoteDB nodes](https://github.com/SyncFree/antidote-java-tutorial#starting-antidote-nodes).

### HTTP/HTTPS REST API

The AntidoteDB REST API server is no longer needed, but you can follow the [instructions here](https://github.com/LightKone/antidote-rest-server) to install it, as it comes in handy for quick tests.

## Running the application

### Fetch the network description and feed it to AntidoteDB

The description file of the Guifi-UPC network (a small fraction of the whole Guifi.net) is included in `assets/cnml/upc.xml` and is used by default. The `monitor-fetch` program reads the data from the file and pushes it to AntidoteDB:

```bash
$ cd src/monitor/fetch/
$ go run monitor-fetch.go
19 nodes read from ../assets/cnml/upc.xml
63 devices read from ../assets/cnml/upc.xml
54 devices exported to /tmp/gmonitor2/devs.json
16 devices removed from AntidoteDB (0 success, 0 fail)
16 graphservers removed from AntidoteDB (0 success, 0 fail)
27 IPv4 addresses removed from AntidoteDB (0 success, 0 fail)
54 devices added to AntidoteDB (54 success, 0 fail)
54 graphservers added or updated to AntidoteDB (54 success, 0 fail)
67 IPv4 addresses added or updated to AntidoteDB (67 success, 0 fail)
```

Descriptions of other small sub-networks are available in the `assets/cnml` folder; some of them overlap and some do not.

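
The data structures section of this README mentions a `checksum` register holding the SHA256 checksum of the CNML data pushed to the database. As a minimal sketch of how such a checksum could be computed and compared, assuming the raw file bytes are what gets hashed (the helper names `cnmlChecksum` and `needsUpdate` are hypothetical and not taken from `monitor-fetch`):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cnmlChecksum returns the hex-encoded SHA-256 digest of raw CNML bytes.
// Hypothetical helper: hashing the raw file content is an assumption.
func cnmlChecksum(cnml []byte) string {
	sum := sha256.Sum256(cnml)
	return hex.EncodeToString(sum[:])
}

// needsUpdate reports whether freshly read CNML data differs from the
// checksum previously stored (e.g. in the guifi/checksum register).
func needsUpdate(cnml []byte, stored string) bool {
	return cnmlChecksum(cnml) != stored
}

func main() {
	data := []byte("<cnml></cnml>") // stand-in for the contents of a CNML file
	stored := cnmlChecksum(data)
	fmt.Println(stored)
	fmt.Println(needsUpdate(data, stored)) // prints "false": nothing changed
}
```

Comparing checksums before pushing would let a fetcher skip a full rewrite when the CNML data has not changed; whether `monitor-fetch` does this is not stated here.
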
Beyond the development and testing phase, the whole Guifi.net CNML description can be loaded (warning, it's HUGE):

```bash
$ go run monitor-fetch.go -cnml_file ../assets/cnml/guifi.xml
57642 nodes read from ../assets/cnml/guifi.xml
53461 devices read from ../assets/cnml/guifi.xml
39988 devices exported to /tmp/gmonitor2/devs.json
0 devices removed from AntidoteDB (0 success, 0 fail)
0 graphservers removed from AntidoteDB (0 success, 0 fail)
0 IPv4 addresses removed from AntidoteDB (0 success, 0 fail)
39988 devices added to AntidoteDB (39988 success, 0 fail)
39988 graphservers added or updated to AntidoteDB (39988 success, 0 fail)
49671 IPv4 addresses added or updated to AntidoteDB (49671 success, 0 fail)
```

### Assign network nodes to monitoring servers

Each monitoring instance consists of three pieces of code, `monitor-assign`, `monitor-ping` and `monitor-snmp`. Eventually they will be merged into a single one encompassing all the functions.

The `monitor-assign` program gets the whole Guifi.net network description from AntidoteDB, checks which devices are not being monitored (or are not being monitored with enough redundancy) and randomly picks some of them to start monitoring them. In the future, this random picking will be replaced by a smarter algorithm.

Different options can be specified, like the monitor's `ID`, the maximum number of devices to take care of monitoring, the minimum redundancy, etc.:

```bash
$ go run monitor-assign.go -id 12345 -maxDevs 5 -minMons 3
Initializing...
Using ID 12345
Setting timestamp to 1560419260
Updating globalAssign...
Adding device 35578 from cnmlDevices into globalAssign
Adding device 35580 from cnmlDevices into globalAssign
Adding device 41236 from cnmlDevices into globalAssign
Adding device 52800 from cnmlDevices into globalAssign
Adding device 53410 from cnmlDevices into globalAssign
Adding device 55625 from cnmlDevices into globalAssign
Adding device 58266 from cnmlDevices into globalAssign
Adding device 66287 from cnmlDevices into globalAssign
Adding device 67954 from cnmlDevices into globalAssign
Adding device 69514 from cnmlDevices into globalAssign
Adding device 74780 from cnmlDevices into globalAssign
Adding device 74943 from cnmlDevices into globalAssign
Adding device 75036 from cnmlDevices into globalAssign
Adding device 75038 from cnmlDevices into globalAssign
Adding device 75651 from cnmlDevices into globalAssign
Adding device 92844 from cnmlDevices into globalAssign
globalAssign updated!
Initialization done.
Entering infinite loop...
Setting timestamp to 1560419265
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Ended assignation list sanitization...
Reassignation of devices
0 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
16 devices unassigned
Picking 1 nodes randomly
1 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419270
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Ended assignation list sanitization...
Reassignation of devices
1 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
15 devices unassigned
Picking 1 nodes randomly
2 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419275
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
2 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
14 devices unassigned
Picking 1 nodes randomly
3 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419280
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
3 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
13 devices unassigned
Picking 1 nodes randomly
4 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419285
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75038
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
4 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
12 devices unassigned
Picking 1 nodes randomly
5 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419290
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database: 12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75038
Monitor 12345 found, keeping it for device 75651
Monitor 12345 found, keeping it for device 92844
Ended assignation list sanitization...
Reassignation of devices
5 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Not assigning any new device
```

`monitor-assign` not only assigns itself the devices to monitor, but also pushes this information to AntidoteDB, so that other monitors can indirectly coordinate and pick the right devices to monitor. Additionally, it periodically sanitizes the global assignation in AntidoteDB by pruning assignations from monitors that are no longer in the system (crashed, unresponsive, in another network partition...).

### Monitor the nodes' liveliness (ping RTT and TTL) [WiP]

The `monitor-ping` program periodically pings the devices assigned to the monitor and writes the information to AntidoteDB. To have enough resolution for graphing, each device must be pinged *at least* every five minutes (and, ideally, this will happen simultaneously in other monitors, achieving the required redundancy).

The `monitor-ping` instance **must** use the same `ID` parameter as the local `monitor-assign` instance (in the future they will be merged into a single process):

```bash
$ go run monitor-ping.go -id 12345
Initializing...
Using ID 12345
Initialization done.
Entering infinite loop...
Pinging devices...
Pinging device 101994 10.1.27.13
I was asked to ping 10.1.27.13
false
==> offline
Pinging device 38720 10.1.26.97
I was asked to ping 10.1.26.97
true
==> online
Pinging device 42628 10.1.25.225
I was asked to ping 10.1.25.225
true
==> online
Pinging device 83865 10.1.25.130 10.228.207.130
I was asked to ping 10.1.25.130
true
==> online
Pinging device 87503 10.1.25.194
I was asked to ping 10.1.25.194
true
==> online
Pinging devices...
Pinging device 101994 10.1.27.13
I was asked to ping 10.1.27.13
false
Pinging device 38720 10.1.26.97
I was asked to ping 10.1.26.97
```

#### Note

Prior to running the program, a system setting must be configured:

```bash
$ sudo sysctl -w net.ipv4.ping_group_range="0 2147483647"
```

as per this note on Linux support:

```
this library attempts to send an "unprivileged" ping via UDP. On Linux, this must be enabled by setting
```

### Monitor the nodes' network traffic (SNMP) [WiP]

The `monitor-snmp` program periodically asks the devices assigned to the monitor for their [SNMP information](https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol) to get details about their interfaces and the inbound/outbound traffic. To have enough resolution for graphing, each device must be probed *at least* every five minutes (and, ideally, this will happen simultaneously in other monitors, achieving the required redundancy).

The `monitor-snmp` instance **must** use the same `ID` parameter as the local `monitor-assign` instance (in the future they will be merged into a single process).

Development of this piece of code is WiP.

## Data structures in AntidoteDB

### Devices⇔Monitors assignation

The primary data source for this application is the CNML file. The `monitor-fetch` application parses the specified CNML file and pushes its contents to AntidoteDB. There is a single `monitor-fetch` instance, and its writes/updates are __authoritative__.

In AntidoteDB, the data are structured as follows (some structures are shared with the *Monitoring data*, described below):

#### guifi (bucket)

The `guifi` bucket contains the lists of monitors and network devices to be monitored:

##### guifi (bucket) => devices (set)

The `devices` *set* in the `guifi` *bucket* is an `array` of `strings`, each `string` containing the `ID` of a Guifi.net device.
For example:

```bash
$ curl localhost:3000/set/read/guifi/devices
["22110","26932","38720","40605","40962","41175","42331","42626","42627","42628",
"46654","46656","47103","48030","51580","57728","59001","60415","64962","64963",
"64965","64966","65291","65720","72843","73952","79715","81297","82096","82097",
"82098","82099","82103","82104","82105","82111","83865","85877","87503","90228",
"92032","92802","92803","92804","94210","94965","96225","96676","96684"]
```

##### guifi (bucket) => monitors (set)

The `monitors` *set* in the `guifi` *bucket* is an `array` of `strings`, each `string` containing the `ID` of a Guifi.net monitor. For example:

```bash
$ curl localhost:3000/set/read/guifi/monitors
["a45632","a47363", "21435"]
```

When started, each monitoring instance registers with the system by adding its `ID` to this *set*.

##### guifi (bucket) => checksum (LWW<sup>1</sup> register)

The `checksum` *LWW register* in the `guifi` *bucket* is a `string` containing the *SHA256 checksum* of the CNML data fetched from the Guifi.net website and pushed to the database. For example:

```bash
$ curl localhost:3000/register/read/guifi/checksum
35aaa826b841ed412897691bb1f50278d742ef9a76da9750a8ae509d3b01f8ee
```

#### device-i (bucket)

The `device-i` bucket, where `i` is the numeric `ID` of a device in the `guifi/devices` set, contains the information about a Guifi.net device:

##### device-i (bucket) => ipv4s (set)

The `ipv4s` *set* in the `device-i` *bucket* is an `array` of `strings`, each `string` containing an IPv4 address of the device. For example:

```bash
$ curl localhost:3000/set/read/device-26932/ipv4s
["10.139.37.226","172.25.40.188","172.25.40.189"]
```

##### device-i (bucket) => monitors (set)

The `monitors` *set* in the `device-i` *bucket* is an `array` of `strings`, each `string` containing the `ID` of a monitor the device is assigned to (i.e. the `ID` of a monitor that is in charge of monitoring the device).
For example:

```bash
$ curl localhost:3000/set/read/device-26932/monitors
["a45632","a47363", "21435"]
```

##### device-i (bucket) => graphserver (LWW register)

The `graphserver` *LWW register* in the `device-i` *bucket* is a `string` containing the ID of a monitor the device is assigned to **in the Guifi.net** website (i.e., not automatically assigned by the monitoring application, but done manually on the Guifi.net website, and included in the CNML). For example:

```bash
$ curl localhost:3000/register/read/device-26932/graphserver
71808
```

### Monitoring data

The collected monitoring data are stored in AntidoteDB using the following structure and data types (some structures are shared with the *Devices⇔Monitors assignation* data, described above):

#### device-i (bucket)

The `device-i` bucket, where `i` is the numeric `ID` of a device in the `guifi/devices` set, contains the monitoring data collected about a Guifi.net device:

##### device-i (bucket) => rawping (map)

The `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored. The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/
{["2018", "2019"]}
```

##### device-i (bucket) => rawping (map) => year (map)

The `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored. The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/
{["01","02","03","04","05","06"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map)

The `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored.
The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/
{["01","02","03","04","05","06","07","08","09","10","11","12","13","14","15",
"16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map)

The `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored. The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/
{["rtt","ttl"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => rtt (map)

The `rtt` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of `sets`, named with a HHmmss-monitorID format, where the raw RTT (round-trip time) values are stored (times are UTC). The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/rtt/
{["000000-1234"],["000000-3456"],["000000-5678"],
["000230-1234"],["000230-3456"],["000230-5678"],
["000500-1234"],["000500-3456"],["000500-5678"],
["000730-1234"],["000730-3456"],["000730-5678"],
...
["235730-1234"],["235730-3456"],["235730-5678"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => rtt (map) => HHmmss-monitorID (set)

The `HHmmss-monitorID` *set* in the `rtt` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a `set` where the raw RTT (round-trip time) values from the ping probe performed at time *HHmmss* (UTC) by monitor *monitorID* are stored. Set values are in microseconds (µs).
The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/set/read/device-26932/rawping/2019/01/03/rtt/235730-3456/
{["12452","9873","10347","12906","96991"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => ttl (map)

The `ttl` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of `sets`, named with a HHmmss-monitorID format, where the raw TTL (time-to-live) values are stored. The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/ttl/
{["000000-1234"],["000000-3456"],["000000-5678"],
["000230-1234"],["000230-3456"],["000230-5678"],
["000500-1234"],["000500-3456"],["000500-5678"],
["000730-1234"],["000730-3456"],["000730-5678"],
...
["235730-1234"],["235730-3456"],["235730-5678"]}
```

##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => ttl (map) => HHmmss-monitorID (set)

The `HHmmss-monitorID` *set* in the `ttl` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a `set` where the raw TTL (time-to-live) values from the ping probe performed at time *HHmmss* (UTC) by monitor *monitorID* are stored. Set values are integers ≤ 64. The following call will not work<sup>2</sup>, but gives an example of the data structure:

```bash
$ curl localhost:3000/set/read/device-26932/rawping/2019/01/03/ttl/235730-3456/
{["59","59","58","59","59"]}
```

---

<sup>1</sup> LWW: last writer wins

<sup>2</sup> The REST API does not support nested maps (but the Go client does)