Custom Monitoring using StatsD
Overview#
StatsD is a popular standard for developing infrastructure and application plugins. A wide suite of standard plugins is available from the StatsD community.
The sfAgent statsd plugin integrates with a statsd client as follows:
- Runs a daemon that listens on a UDP port for data sent by the statsd client and accumulates all metrics received in the last N seconds (the flushInterval)
- Translates the data from statsd format to SnappyFlow’s format
- Forwards the data to SnappyFlow with the necessary tags
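On the client side, the first step above amounts to sending small text datagrams over UDP. The following is a minimal sketch of such a client, not part of sfAgent or of any particular statsd library; the function names `format_metric` and `send_metric` are hypothetical, and the host and port are assumptions matching the default port 8125 described in this document.

```python
import socket


def format_metric(name: str, value, metric_type: str) -> bytes:
    """Build a statsd datagram such as b'Kafka1.General.numTopic:5|g'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one metric as a single UDP datagram (fire and forget).

    host and port are assumptions: a listener on the local machine
    using the default statsd port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, metric_type), (host, port))


if __name__ == "__main__":
    # Gauge metric taken from the rules example in this document.
    send_metric("Kafka1.General.numTopic", 5, "g")
```

Because UDP is connectionless, the send succeeds even when nothing is listening, so a missing listener has to be detected on the receiving side.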
Prerequisites#
- Create a rules file for the statsd client, or contact support@snappyflow.io to have a rules file created for a specific statsd client.
Configuration#
Users can also manually add the configuration shown below to config.yaml under the /opt/sfagent/ directory.
    key: <profile_key>
    tags:
      Name: <name>
      appName: <app_name>
      projectName: <project_name>
    metrics:
      plugins:
        - name: statsd
          enabled: true
          config:
            port: 8125
            flushInterval: 30
            ruleFile: /path/to/rules/file
- port: The UDP port on which the statsd client sends metrics. sfAgent runs a statsd server listening on this port for the UDP datagrams. Default value is 8125.
- flushInterval: SnappyFlow’s statsd plugin collects all the metrics received in the last N seconds, where N is this value, and sends the data to SnappyFlow as a single document.
- ruleFile: Path to the user-generated statsd rules file. Contact support@snappyflow.io to have a rules file created for a specific statsd client.
Operating Instructions#
Validate the statsd configuration and the rules. It is mandatory to run this command after any change to the statsd rules file, and then to restart the sfAgent service.
    sudo /opt/sfagent/sfagent -check-config
Creating Rules File#
type= Topic1, metric= Lag, value= 500, metricType= g (gauge)
- The field type is optional. If this field is present, it enforces a nested JSON; otherwise, the resulting JSON is flat.
Example
Kafka1.General.numTopic:5|g. In this case, namespace= Kafka1, prefix= General, metric= numTopic, value= 5, metricType= g (gauge)
Note: In special cases where the namespace is not present and the metrics start directly with the prefix, set namespace: none.
Supported datatypes are float, double, long, integer.
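The field layout described above can be illustrated with a small parser. This is a sketch for illustration only, not sfAgent's actual parser: the names `Metric`, `parse_datagram`, and `has_namespace` are hypothetical, every value is read as a float, and the prefix is assumed to be a single dot-separated segment.

```python
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    namespace: Optional[str]
    prefix: str
    type: Optional[str]  # optional segment between prefix and metric
    metric: str
    value: float
    metric_type: str     # e.g. "g" for gauge, "c" for counter


def parse_datagram(line: str, has_namespace: bool = True) -> Metric:
    """Split 'namespace.prefix[.type].metric:value|metricType' into fields.

    Pass has_namespace=False for the 'namespace: none' case, where the
    name starts directly with the prefix.
    """
    name, _, rest = line.partition(":")
    raw_value, _, metric_type = rest.partition("|")
    parts = name.split(".")
    namespace = parts.pop(0) if has_namespace else None
    if len(parts) == 2:
        prefix, metric = parts
        type_ = None
    elif len(parts) == 3:
        prefix, type_, metric = parts
    else:
        raise ValueError(f"unexpected metric name: {name!r}")
    return Metric(namespace, prefix, type_, metric, float(raw_value), metric_type)


if __name__ == "__main__":
    print(parse_datagram("Kafka1.General.numTopic:5|g"))
```

A name with one extra segment between the prefix and the metric yields a non-empty `type`, which is the case that leads to nested output.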
Rule to create nested json: "NESTED"#
Syntax
    <json_key> = NESTED(namespace: <namespace>, prefix: <prefix_name>, key: <type_key>, metric: [<list of metrics along with datatypes>])
- <json_key>: key of the final nested json.
- <namespace>: This rule is applied to all metrics having this namespace.
- <prefix>: This rule is applied to all metrics having this prefix.
- <key>: name of the key under which the type field of the metric is stored, adding a key:value pair (for example diskName: disk1) to each object in the nested json.
- <metric>: Specify all the metrics to collect for this prefix.
Example
    DB.host1.disk1.readLatency:20|g
    DB.host1.disk1.writeLatency:50|g
Rule
    latency = NESTED(namespace: DB, prefix: host1, key: diskName, metric:[readLatency:float, writeLatency:float])
Output
    "latency": [
      { "diskName": "disk1", "readLatency": 20, "writeLatency": 50 },
      { "diskName": "disk2", "readLatency": 25, "writeLatency": 45 }
    ]
Rule to create flat json: "FLAT"#
Syntax
    <json_key> = FLAT(namespace: <namespace>, prefix: <prefix_name>, metric: <metric_name>)
- <namespace>: This rule is applied to all metrics having this namespace.
- <prefix>: This rule is applied to all metrics having this prefix.
- <metric>: Specify all the metrics to collect for this prefix.
Example
    Kafka1.System.cpuutil:10|g
    Kafka1.System.ramutil:20|g
Rule
    computeMetrics = FLAT(namespace: Kafka1, prefix: System, metric: [cpuutil:float, ramutil:float])
Output
    "cpuutil": 10, "ramutil": 20
"RENDER" Rule:#
The extraction rules described above extract a set of metrics from statsd datagrams. The extracted metrics are grouped together in documents and shipped to SnappyFlow. Render rules describe how metrics are grouped into a documentType.
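The two stages described above (extract with FLAT and NESTED, then group with RENDER) can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not sfAgent's implementation: the helper names `parse`, `flat`, `nested`, and `render` are hypothetical, datatype declarations are ignored (every value is read as a float), and the prefix is assumed to be a single segment.

```python
from collections import defaultdict


def parse(line):
    """Split 'a.b.c:1|g' into (['a', 'b', 'c'], 1.0)."""
    name, _, rest = line.partition(":")
    return name.split("."), float(rest.partition("|")[0])


def flat(lines, namespace, prefix, metrics):
    """Collect namespace.prefix.metric values into one flat dict."""
    out = {}
    for line in lines:
        parts, value = parse(line)
        if len(parts) == 3 and parts[:2] == [namespace, prefix] and parts[2] in metrics:
            out[parts[2]] = value
    return out


def nested(lines, namespace, prefix, key, metrics):
    """Group namespace.prefix.type.metric values by the type segment."""
    groups = defaultdict(dict)
    for line in lines:
        parts, value = parse(line)
        if len(parts) == 4 and parts[:2] == [namespace, prefix] and parts[3] in metrics:
            groups[parts[2]][parts[3]] = value
    return [{key: type_value, **fields} for type_value, fields in groups.items()]


def render(document_type, flat_results, nested_results):
    """Merge flat dicts and named nested lists into a single document."""
    doc = {"documentType": document_type}
    for result in flat_results:
        doc.update(result)
    doc.update(nested_results)
    return doc


# Datagrams taken from the FLAT and NESTED examples in this document.
datagrams = [
    "Kafka1.System.cpuutil:10|g",
    "Kafka1.System.ramutil:20|g",
    "DB.host1.disk1.readLatency:20|g",
    "DB.host1.disk1.writeLatency:50|g",
]
compute_metrics = flat(datagrams, "Kafka1", "System", ["cpuutil", "ramutil"])
latency = nested(datagrams, "DB", "host1", "diskName", ["readLatency", "writeLatency"])
document = render("system", [compute_metrics], {"latency": latency})
print(document)
```

In this sketch a FLAT result is merged directly into the top level of the document, while a NESTED result is stored as a list under its rule name.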
Syntax
    RENDER(_documentType: <doctype>, m1, m2, …, mn)
where m1..mn can be metric names or rule names.
Example
RENDER(_documentType: system, computeMetrics, latency) creates the following document:
    {
      plugin: statsd
      documentType: system
      "cpuutil": 10,
      "ramutil": 20,
      "latency": [
        { "diskName": "disk1", "readLatency": 20, "writeLatency": 50 },
        { "diskName": "disk2", "readLatency": 25, "writeLatency": 45 }
      ]
    }
Tagging#
The sfAgent statsd plugin can parse and forward the tags contained in statsd metric datagrams. Tags are expressed in different formats depending on whether the intended destination is Datadog, Influx or Graphite.
Add a TAGTYPE rule to the statsd rules file to enable tag parsing. The default TAGTYPE is None, i.e. no custom tags are present. In each supported format, the tags are recognized and passed forward into SnappyFlow documents. For example:
    Cluster1.Kafka1.cpuUtil;_tag_appName=testApp1;_tag_projectName=apmProject;_documentType=cpuStats:35|c
Sidekiq Use-case#
This section shows how to monitor Sidekiq using statsd with sfAgent.
Description#
We will use a simple Ruby on Rails application that shows data about endangered sharks.
- Two Sidekiq workers are configured: AddEndangeredWorker, which adds shark data, and RemoveEndangeredWorker, which removes it.
- A Sidekiq statsd client is also configured to collect the metrics.
- For this example, sidekiq-statsd by phstc is used as the client.
Installation#
Skip this part if the statsd client is already configured.
- Follow this documentation to set up the Ruby on Rails application, if needed.
- To add the statsd client:
  - Create a new file sidekiq.rb under config/initializers/ and add the configuration specified here.
  - Install the [sidekiq-statsd gem](https://github.com/phstc/sidekiq-statsd#installation) and run the application.
Sample Metrics#
Metrics are generated upon worker activation in the application.
Add endangered worker metrics
    production.worker.AddEndangeredWorker.processing_time:1113|ms
    production.worker.AddEndangeredWorker.success:1|c
    production.worker.enqueued:0|g
    production.worker.retry_set_size:0|g
    production.worker.processed:69|g
    production.worker.failed:0|g
    production.worker.queues.default.enqueued:0|g
    production.worker.queues.default.latency:0|g
Remove endangered worker metrics
    production.worker.RemoveEndangeredWorker.processing_time:1472|ms
    production.worker.RemoveEndangeredWorker.success:1|c
    production.worker.enqueued:0|g
    production.worker.retry_set_size:0|g
    production.worker.processed:107|g
    production.worker.failed:0|g
    production.worker.queues.default.enqueued:0|g
    production.worker.queues.default.latency:0|g
Rules#
Follow the Rules User Guide section to understand the rules.
TAGTYPE = None
worker = NESTED(namespace: production, prefix: worker, key: worker_name, metric:[processing_time:double, success:float])
queues = NESTED(namespace: production, prefix: worker.queues, key: queue_name, metric:[enqueued:float, latency:float])
processedJobs = FLAT(namespace: production, prefix: worker, metric: processed:integer)
RENDER(_documentType: sidekiq, worker, queues, processedJobs)
sfAgent Configuration#
Content of /opt/sfagent/config.yaml. The rules file is /opt/sfagent/statsd-rules.txt.
    key: <profile_key>
    tags:
      Name: <instance-name>
      appName: <app-name>
      projectName: <project-name>
    metrics:
      plugins:
        - name: statsd
          enabled: true
          config:
            port: 8125
            flushInterval: 10
            ruleFile: '/opt/sfagent/statsd-rules.txt'
Output
    {
      "_documentType": "sidekiq",
      "_tag_Name": "vm",
      "queues": [
        { "latency": 0, "queue_name": "default", "enqueued": 0 }
      ],
      "_plugin": "statsD",
      "processedJobs": 107,
      "worker": [
        { "processing_time": 1472, "worker_name": "RemoveEndangeredWorker", "success": 1 },
        { "processing_time": 1113, "worker_name": "AddEndangeredWorker", "success": 1 }
      ],
      "_tag_projectName": "statsDProject",
      "_tag_uuid": "080027957dd8",
      "time": 1616132931981,
      "_tag_appName": "statsDApp"
    }