Developing lightweight computation at the DSG edge

Commit 8f7edbb6 authored by Roger Pueyo Centelles

Update README


Signed-off-by: Roger Pueyo Centelles <rpueyo@ac.upc.edu>
parent 3e11e4be
......@@ -5,15 +5,15 @@ A proof-of-concept application that leverages [AntidoteDB](https://syncfree.gith
## Description
The monitoring application is distributed in three main blocks:
- Network description fetching and feeding (functional)
- Nodes assignment among the different monitoring servers [WiP]
- Actual nodes monitorisation [WiP]
- Nodes assignment among the different monitoring servers (functional, WiP)
- Actual nodes monitorisation (functional, WiP)
This proof of concept takes advantage of the [AntidoteDB Java tutorial](https://github.com/SyncFree/antidote-java-tutorial) by [Deepthi Akkoorath (@deepthidevaki)](https://github.com/deepthidevaki) and uses [Mathias Weber (@mweberUKL)](https://github.com/mweberUKL)'s Go client for AntidoteDB (previously, [João Neto (@joaomlneto)](https://github.com/joaomlneto)'s [HTTP/HTTPS REST API for AntidoteDB](https://github.com/LightKone/antidote-rest-server) was used).
## Installation
### Docker
Install `docker-ce` using your
Install `docker-ce` using your preferred method as [described here](https://docs.docker.com/install/).
### Go and required libraries
Install Go using your operating system's package manager or follow the [instructions here](https://golang.org/doc/install).
......@@ -27,7 +27,7 @@ After installing Go, download and install the following external libraries needed
`go get github.com/sparrc/go-ping`
- antidote-go-client:
- antidote-go-client:
`go get github.com/AntidoteDB/antidote-go-client`
......@@ -36,52 +36,306 @@ Get the tutorial's source code [here](https://github.com/SyncFree/antidote-java-
### HTTP/HTTPS REST API
Install the AntidoteDB REST API server following the [instructions here](https://github.com/LightKone/antidote-rest-server). Once everything is installed, start the server with the `antidote-rest-server` command. The server connects to one of the AntidoteDB instances (the one running on port 8087) from the Java tutorial.
The AntidoteDB REST API server is no longer needed, but you can follow the [instructions here](https://github.com/LightKone/antidote-rest-server) to install it, as it comes in handy for quick tests.
## Building and running the application
## Running the application
### Fetch the network description and feed it to AntidoteDB
The whole Guifi.net network description is included in the `cnml.xml` file, which can be downloaded from the Guifi.net website, and is used by default:
```
The Guifi-UPC network (a small fraction of the whole Guifi.net) description file is included in `assets/cnml/upc.xml` and is used by default. The `monitor-fetch` program fetches the data from the file and pushes it to AntidoteDB:
```bash
$ cd src/monitor/fetch/
$ go run monitor-fetch.go
57642 nodes read from cnml.xml
53461 devices read from cnml.xml
39988 devices exported to /tmp/gmonitor2/devs.json
0 devices removed from AntidoteDB (0 success, 0 fail)
0 IPv4 addresses removed from AntidoteDB (0 success, 0 fail)
10634 devices added to AntidoteDB (10634 success, 0 fail) ...
```
etc. Smaller subnetworks can be used instead, like the Guifi-UPC subnetwork, for testing purposes:
19 nodes read from ../assets/cnml/upc.xml
63 devices read from ../assets/cnml/upc.xml
54 devices exported to /tmp/gmonitor2/devs.json
16 devices removed from AntidoteDB (0 success, 0 fail)
16 graphservers removed from AntidoteDB (0 success, 0 fail)
27 IPv4 addresses removed from AntidoteDB (0 success, 0 fail)
54 devices added to AntidoteDB (54 success, 0 fail)
54 graphservers added or updated to AntidoteDB (54 success, 0 fail)
67 IPv4 addresses added or updated to AntidoteDB (67 success, 0 fail)
```
$ go run monitor-fetch.go -cnml_file upc.xml
19 nodes read from upc.xml
Other small sub-networks' descriptions are available in the `assets/cnml` folder, some of them overlapping and some not.
Beyond the development and testing phase, the whole Guifi.net CNML description can be loaded (warning, it's HUGE):
```bash
$ go run monitor-fetch.go -cnml_file ../assets/cnml/guifi.xml
57642 nodes read from ../assets/cnml/guifi.xml
53461 devices read from ../assets/cnml/guifi.xml
39988 devices exported to /tmp/gmonitor2/devs.json
0 devices removed from AntidoteDB (0 success, 0 fail)
0 graphservers removed from AntidoteDB (0 success, 0 fail)
0 IPv4 addresses removed from AntidoteDB (0 success, 0 fail)
49 devices added to AntidoteDB (49 success, 0 fail)
67 IPv4 addresses added or updated to AntidoteDB (67 success, 0 fail)
```
### Assign network nodes to monitoring servers [WiP]
39988 devices added to AntidoteDB (39988 success, 0 fail)
39988 graphservers added or updated to AntidoteDB (39988 success, 0 fail)
49671 IPv4 addresses added or updated to AntidoteDB (49671 success, 0 fail)
```
go run monitor-assign.go
### Assign network nodes to monitoring servers
Each monitoring instance consists of three pieces of code, `monitor-assign`, `monitor-ping` and `monitor-snmp`. Eventually they will be merged into a single one encompassing all the functions.
The `monitor-assign` program gets the whole Guifi.net network description from AntidoteDB, checks which devices are not being monitored (or are not being monitored with enough redundancy) and randomly picks some of them to start monitoring them.
In the future, this random picking will be replaced by a smarter algorithm.
Different options can be specified, like the monitor's `ID`, the maximum number of devices to take care of monitoring, the minimum redundancy, etc.:
<details><summary><b>$ go run monitor-assign.go -id 12345 -maxDevs 5 -minMons 3</b> (click here to see the whole content)</summary>
<p>
```
Initializing...
Using ID 12345
Setting timestamp to 1560419260
Updating globalAssign...
Adding device 35578 from cnmlDevices into globalAssign
Adding device 35580 from cnmlDevices into globalAssign
Adding device 41236 from cnmlDevices into globalAssign
Adding device 52800 from cnmlDevices into globalAssign
Adding device 53410 from cnmlDevices into globalAssign
Adding device 55625 from cnmlDevices into globalAssign
Adding device 58266 from cnmlDevices into globalAssign
Adding device 66287 from cnmlDevices into globalAssign
Adding device 67954 from cnmlDevices into globalAssign
Adding device 69514 from cnmlDevices into globalAssign
Adding device 74780 from cnmlDevices into globalAssign
Adding device 74943 from cnmlDevices into globalAssign
Adding device 75036 from cnmlDevices into globalAssign
Adding device 75038 from cnmlDevices into globalAssign
Adding device 75651 from cnmlDevices into globalAssign
Adding device 92844 from cnmlDevices into globalAssign
globalAssign updated!
Initialization done. Entering infinite loop...
Setting timestamp to 1560419265
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Ended assignation list sanitization...
Reassignation of devices
0 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
16 devices unassigned
Picking 1 nodes randomly
1 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419270
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Ended assignation list sanitization...
Reassignation of devices
1 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
15 devices unassigned
Picking 1 nodes randomly
2 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419275
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
2 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
14 devices unassigned
Picking 1 nodes randomly
3 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419280
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
3 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
13 devices unassigned
Picking 1 nodes randomly
4 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419285
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75038
Monitor 12345 found, keeping it for device 75651
Ended assignation list sanitization...
Reassignation of devices
4 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Assigning 1 new devices
12 devices unassigned
Picking 1 nodes randomly
5 devices currently assigned to this monitor
Exporting the new assigned devices list
Setting timestamp to 1560419290
Managing the monitors list...
I am monitor 12345
1 monitors registered in the database:
12345
Updating globalAssign...
globalAssign updated!
Sanitizing the assignation list...
Getting the current monitors list...
Updating the current cnml...
Monitor 12345 found, keeping it for device 52800
Monitor 12345 found, keeping it for device 75036
Monitor 12345 found, keeping it for device 75038
Monitor 12345 found, keeping it for device 75651
Monitor 12345 found, keeping it for device 92844
Ended assignation list sanitization...
Reassignation of devices
5 devices currently assigned to this monitor (maximum: 5 devices)
Updating the current cnml...
Not assigning any new device
```
</p>
</details>
<p>
`monitor-assign` not only cares about assigning itself the devices to monitor, but also pushes this information to AntidoteDB, so that other monitors can indirectly coordinate and pick the right devices to monitor.
Additionally, it periodically sanitizes the global assignation in AntidoteDB by pruning assignations from monitors that are no longer in the system (crashed, unresponsive, in another network partition...).
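The pruning step can be pictured as follows. The README does not state how a monitor is judged to have left the system, so this sketch assumes that every monitor periodically publishes a timestamp and is dropped once that timestamp is older than a timeout. The names `sanitize`, `lastSeen` and `timeout`, and the plain Go maps standing in for the AntidoteDB objects, are all hypothetical.

```go
package main

import "fmt"

// sanitize returns a copy of the device -> monitors assignation that keeps
// only the monitors whose last published timestamp is at most `timeout`
// seconds older than `now`. Monitors with no timestamp at all are dropped.
// Assumption: liveness is decided by timestamps; the real criterion may differ.
func sanitize(assign map[string][]string, lastSeen map[string]int64, now, timeout int64) map[string][]string {
	clean := make(map[string][]string, len(assign))
	for dev, mons := range assign {
		kept := []string{}
		for _, m := range mons {
			seen, ok := lastSeen[m]
			if ok && now-seen <= timeout {
				kept = append(kept, m)
			}
		}
		clean[dev] = kept
	}
	return clean
}

func main() {
	assign := map[string][]string{
		"52800": {"12345", "99999"},
		"75651": {"99999"},
	}
	lastSeen := map[string]int64{
		"12345": 1560419290, // fresh
		"99999": 1560419000, // stale
	}
	fmt.Println(sanitize(assign, lastSeen, 1560419290, 30))
}
```

Devices left with too few monitors after pruning become candidates for reassignment in the next round.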
### Monitor the nodes' liveliness (ping RTT and TTL) [WiP]
The `monitor-ping` program periodically pings the devices assigned to the monitor and writes the results to AntidoteDB.
To have enough resolution for graphing, each device must be pinged *at least* every five minutes (and, ideally, this will happen simultaneously in other monitors, achieving the required redundancy).
The `monitor-ping` instance **must** use the same `ID` parameter as the local `monitor-assign` instance (in the future they will be merged into a single process):
<details><summary><b>$ go run monitor-ping.go -id 12345</b> (click here to see the whole content)</summary>
<p>
```
Initializing...
Using ID 12345
Initialization done. Entering infinite loop...
Pinging devices...
Pinging device 101994
10.1.27.13
I was asked to ping 10.1.27.13
false ==> offline
Pinging device 38720
10.1.26.97
I was asked to ping 10.1.26.97
true ==> online
Pinging device 42628
10.1.25.225
I was asked to ping 10.1.25.225
true ==> online
Pinging device 83865
10.1.25.130
10.228.207.130
I was asked to ping 10.1.25.130
true ==> online
Pinging device 87503
10.1.25.194
I was asked to ping 10.1.25.194
true ==> online
Pinging devices...
Pinging device 101994
10.1.27.13
I was asked to ping 10.1.27.13
false
Pinging device 38720
10.1.26.97
I was asked to ping 10.1.26.97
### Monitor the nodes [WiP]
```
go run monitor-ping.go <host>
</p>
</details>
#### Note
Prior to running the program, a system setting must be configured:
```bash
$ sudo sysctl -w net.ipv4.ping_group_range="0 2147483647"
```
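To check whether the setting is already in place before starting the monitor, the current range can be read back from `/proc/sys/net/ipv4/ping_group_range`, which holds the lowest and highest group IDs allowed to open unprivileged ping sockets. The helper below is a minimal sketch and not part of the monitoring code; `gidAllowed` is a hypothetical name, and only the effective group ID is checked, not supplementary groups.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// gidAllowed parses the contents of ping_group_range ("<low> <high>",
// whitespace separated) and reports whether gid falls inside the range.
func gidAllowed(contents string, gid int) (bool, error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return false, fmt.Errorf("unexpected ping_group_range contents: %q", contents)
	}
	low, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return false, err
	}
	high, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return false, err
	}
	g := int64(gid)
	return low <= g && g <= high, nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("could not read the setting (not running on Linux?):", err)
		return
	}
	ok, err := gidAllowed(string(data), os.Getegid())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("unprivileged ping allowed for this group:", ok)
}
```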
### Note
Previously as of this note on Linux support: _this library attempts to send an "unprivileged" ping via UDP. On Linux, this must be enabled by setting_:
```sudo sysctl -w net.ipv4.ping_group_range="0 2147483647"```
as per this note on Linux support:
```this library attempts to send an "unprivileged" ping via UDP. On Linux, this must be enabled by setting```
### Monitor the nodes' network traffic (SNMP) [WiP]
The `monitor-snmp` program periodically queries the devices assigned to the monitor for their [SNMP information](https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol) to get details about their interfaces and the inbound/outbound traffic.
To have enough resolution for graphing, each device must be probed *at least* every five minutes (and, ideally, this will happen simultaneously in other monitors, achieving the required redundancy).
The `monitor-snmp` instance **must** use the same `ID` parameter as the local `monitor-assign` instance (in the future they will be merged into a single process).
Development of this piece of code is WiP.
## Data structures in AntidoteDB
### Devices/Monitors assignation
### Devices⇔Monitors assignation
The primary data source for this application is the CNML file. The `monitor-fetch` application parses the specified CNML file and pushes its contents to AntidoteDB. There is a single `monitor-fetch` instance, and its writes/updates are __authoritative__.
In AntidoteDB, the data are structured as follows:
In AntidoteDB, the data are structured as follows (some structures are shared with the *Monitoring data*, described below):
#### guifi (bucket)
The `guifi` bucket contains the lists of monitors and network devices to be monitored:
......@@ -136,5 +390,99 @@ $ curl localhost:3000/register/read/device-26932/graphserver
71808
```
### Monitoring data
The collected monitoring data are stored in AntidoteDB using the following structure and data types (some structures are shared with the *Devices⇔Monitors assignation* data, described above):
#### device-i (bucket)
The `device-i` bucket, where `i` is the numeric `ID` of a device in the `guifi/devices` set, contains the monitoring data collected about a Guifi.net device:
##### device-i (bucket) => rawping (map)
The `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored.
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/
{["2018", "2019"]}
```
##### device-i (bucket) => rawping (map) => year (map)
The `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored.
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/
{["01","02","03","04","05","06"]}
```
##### device-i (bucket) => rawping (map) => year (map) => month (map)
The `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored.
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/
{["01","02","03","04","05","06","07","08","09","10","11","12","13","14","15",
"16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31"]}
```
##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map)
The `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of nested maps where the raw ping data are stored.
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/
{["rtt","ttl"]}
```
##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => rtt (map)
The `rtt` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of `sets`, named with a HHmmss-monitorID format, where the raw RTT (round-trip time) values are stored (times are UTC).
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/rtt/
{["000000-1234"],["000000-3456"],["000000-5678"],
["000230-1234"],["000230-3456"],["000230-5678"],
["000500-1234"],["000500-3456"],["000500-5678"],
["000730-1234"],["000730-3456"],["000730-5678"],
...
["235730-1234"],["235730-3456"],["235730-5678"],
```
##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => rtt (map) => HHmmss-monitorID (set)
The `HHmmss-monitorID` *set* in the `rtt` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a `set` where the raw RTT (round-trip time) values from the ping probe performed at time *HHmmss* (UTC) by monitor *monitorID* are stored. Set values are in microseconds (µs).
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/set/read/device-26932/rawping/2019/01/03/rtt/235730-3456/
{["12452","9873","10347","12906","96991"]}
```
##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => ttl (map)
The `ttl` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a collection of `sets`, named with a HHmmss-monitorID format, where the raw TTL (time-to-live) values are stored (times are UTC).
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/map/list/device-26932/rawping/2019/01/03/ttl/
{["000000-1234"],["000000-3456"],["000000-5678"],
["000230-1234"],["000230-3456"],["000230-5678"],
["000500-1234"],["000500-3456"],["000500-5678"],
["000730-1234"],["000730-3456"],["000730-5678"],
...
["235730-1234"],["235730-3456"],["235730-5678"],
```
##### device-i (bucket) => rawping (map) => year (map) => month (map) => day (map) => ttl (map) => HHmmss-monitorID (set)
The `HHmmss-monitorID` *set* in the `ttl` *map* in the `day` *map* in the `month` *map* in the `year` *map* in the `rawping` *map* in the `device-i` *bucket* is a `set` where the raw TTL (time to live) values from the ping probe performed at time *HHmmss* (UTC) by monitor *monitorID* are stored. Set values are integers ≤ 64.
The following call will not work<sup>2</sup>, but gives an example of the data structure:
```bash
$ curl localhost:3000/set/read/device-26932/rawping/2019/01/03/ttl/235730-3456/
{["59","59","58","59","59"]}
```
---
<sup>1</sup> LWW: last writer wins
<sup>2</sup> The REST API does not support nested maps (but the Go client does)