Writing Custom Metrics to Stackdriver in Golang

Posted on Wed 13 February 2019 in golang

Instrumentation is a critical part of any application. Along with system counters like CPU, heap, and free disk, it's important to create application-level metrics so that health is measured closer to your customer's experience.

Example metrics could be user-registration, password-change, profile-change, etc. A major spike or dip in any of these metrics can indicate a wider problem.

For this example a custom metric was needed, and no infrastructure (e.g. collectd) was in place for harvesting it. Golang is handy for creating an easy-to-install daemon that performs the measurement and periodically sends the data to Stackdriver.

The overall flow for custom metrics in Stackdriver goes like this:

  1. Create a custom metric with the Stackdriver Metric Descriptor API. This assigns a data type, dimensions, name and resource category to the metric.
  2. Push values to the metric, recording its value, resource and additional dimensions.

In the following example, we wanted to measure the number of files being uploaded by a backup agent. Each uploaded file produces a line in a logfile, so we used the line count as our indicator (see LineCounter and WatchFile). By sending the line count to Stackdriver every 5 seconds, we were able to create a dashboard and an alert that monitor backup activity.

1. Create the Metric Descriptor

For your first metric, it's easiest to use the API explorer, which will autocomplete the definition. As the project matures, call metricDescriptors.create via gcloud.

  1. Visit Explorer
  2. name: backup-count
  3. valueType: INT64 (the API's integer type)
  4. Remove labels. We will assign a relevant resource in the writeTimeSeriesValue call.

2. Collect and Harvest Custom Metric with Golang

For this example, we're trivially tracking the line count. It can be expanded to parse success vs. error lines and send those as two metrics, e.g. backup-count and error-backup-count.

The code below creates a 5-second timer that counts the file's lines and sends that count to Stackdriver. Stackdriver reporting lets you run statistics such as avg() and max() and see the backup-count growth over time.

package main

import (
        "bytes"
        "context"
        "fmt"
        "io"
        "log"
        "os"
        "time"

        monitoring "cloud.google.com/go/monitoring/apiv3"
        timestamp "github.com/golang/protobuf/ptypes/timestamp"
        metricpb "google.golang.org/genproto/googleapis/api/metric"
        monitoredres "google.golang.org/genproto/googleapis/api/monitoredres"
        monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

const metricType = "custom.googleapis.com/photos/backup-count"

const (
        projectID  = "YOUR-GCP-PROJECT-ID"
        instanceID = "INSTANCE-ID-FOR-TESTING"
        zone       = "us-east1-c"
        // INTERVAL is the number of seconds between measurements.
        INTERVAL = 5
)

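// writeTimeSeriesValue sends a single int64 point for metricType to Stackdriver.
func writeTimeSeriesValue(projectID, metricType string, value int) error {
        ctx := context.Background()
        c, err := monitoring.NewMetricClient(ctx)
        if err != nil {
                return err
        }
        // A new client is created on every call, so release it before returning.
        defer c.Close()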
        now := &timestamp.Timestamp{
                Seconds: time.Now().Unix(),
        }
        req := &monitoringpb.CreateTimeSeriesRequest{
                Name: "projects/" + projectID,
                TimeSeries: []*monitoringpb.TimeSeries{{
                        Metric: &metricpb.Metric{
                                Type: metricType,
                        },
                        Resource: &monitoredres.MonitoredResource{
                                Type: "gce_instance",
                                Labels: map[string]string{
                                        "project_id":  projectID,
                                        "instance_id": instanceID,
                                        "zone":        zone,
                                },
                        },
                        Points: []*monitoringpb.Point{{
                                Interval: &monitoringpb.TimeInterval{
                                        StartTime: now,
                                        EndTime:   now,
                                },
                                Value: &monitoringpb.TypedValue{
                                        Value: &monitoringpb.TypedValue_Int64Value{
                                                Int64Value: int64(value),
                                        },
                                },
                        }},
                }},
        }
        log.Printf("writeTimeseriesRequest: %+v\n", req)

        err = c.CreateTimeSeries(ctx, req)
        if err != nil {
                return fmt.Errorf("could not write time series value: %v", err)
        }
        return nil
}

// LineCounter returns the number of newline characters read from r.
func LineCounter(r io.Reader) (int, error) {
        buf := make([]byte, 32*1024)
        count := 0
        lineSep := []byte{'\n'}

        for {
                c, err := r.Read(buf)
                count += bytes.Count(buf[:c], lineSep)

                switch {
                case err == io.EOF:
                        return count, nil

                case err != nil:
                        return count, err
                }
        }
}

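// WatchFile emits the number of lines in filename once per interval.
func WatchFile(interval time.Duration, filename string) <-chan int {
        handle, err := os.Open(filename)
        if err != nil {
                log.Fatalf("could not open %s: %v", filename, err)
        }
        ticker := time.NewTicker(interval)
        sizeChan := make(chan int)
        go func() {
                for range ticker.C {
                        // Rewind so that every tick counts the whole file.
                        if _, err := handle.Seek(0, io.SeekStart); err != nil {
                                log.Printf("seek failed: %v", err)
                                continue
                        }
                        lineCount, err := LineCounter(handle)
                        if err != nil {
                                log.Printf("line count failed: %v", err)
                                continue
                        }
                        sizeChan <- lineCount
                }
        }()
        return sizeChan
}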

func main() {
        if len(os.Args) < 2 {
                fmt.Printf("%s FILENAME\n", os.Args[0])
                os.Exit(1)
        }
        sizeChan := WatchFile(INTERVAL*time.Second, os.Args[1])
        // Receive each measurement once and forward it to Stackdriver.
        for curSize := range sizeChan {
                fmt.Printf("Lines in %s : %d\n", os.Args[1], curSize)
                if err := writeTimeSeriesValue(projectID, metricType, curSize); err != nil {
                        log.Printf("write failed: %v", err)
                }
        }
}