Testing Google Cloud Alerting with Synthetic Log Injection
Testing alerting in Google Cloud Monitoring requires a proactive approach, especially for scenarios that rarely occur naturally. For assertions that almost always pass, such as DNS record verification or health checks, you can simulate failures using synthetic log injection.
The Alerting Signal: assertion_passed: false
At the heart of this strategy is a convention: services perform their internal checks and log a JSON payload with an `assertion_passed` boolean. When a check fails, whether from a DNS mismatch or a stale cache, the service logs `{"assertion_passed": false}`.
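The service side of this convention can be sketched in a few lines. This is a hypothetical helper (the `log_assertion` name and message format are illustrative, not from a real service); Cloud Run forwards any stdout line that parses as JSON into the entry's `jsonPayload`:

```python
import json
import sys

def log_assertion(check_name: str, passed: bool) -> None:
    """Emit a one-line JSON payload to stdout; Cloud Run captures lines
    that parse as JSON objects and exposes them as jsonPayload fields."""
    entry = {
        "assertion_passed": passed,
        "message": f"{check_name}: {'ok' if passed else 'assertion failed'}",
    }
    print(json.dumps(entry), file=sys.stdout)

# A failed DNS check produces the alerting signal:
log_assertion("mx-record-check", False)
```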
A typical alerting policy filter for this signal would look like this:
jsonPayload.assertion_passed = false
LOG_ID("run.googleapis.com/stdout")
Alerts fire and notify the admin contact when a log entry matches these criteria.
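The policy's matching behavior can be illustrated in miniature. This sketch is not the Cloud Monitoring implementation; it simply applies the two filter conditions to a parsed log entry (note that Cloud Logging URL-encodes the `/` in the log name as `%2F`):

```python
def matches_alert_filter(entry: dict) -> bool:
    """Mimic the filter: a boolean assertion_passed of False, written
    to the run.googleapis.com/stdout log stream."""
    return (
        entry.get("jsonPayload", {}).get("assertion_passed") is False
        and entry.get("logName", "").endswith("run.googleapis.com%2Fstdout")
    )

failing = {
    "logName": "projects/tonym-us/logs/run.googleapis.com%2Fstdout",
    "jsonPayload": {"assertion_passed": False},
}
healthy = {
    "logName": "projects/tonym-us/logs/run.googleapis.com%2Fstdout",
    "jsonPayload": {"assertion_passed": True},
}
print(matches_alert_filter(failing), matches_alert_filter(healthy))  # True False
```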
The Strategy: Synthetic Log Injection
The most efficient way to test log-based alerting policies is to manually write a log entry that mirrors the exact structure of a failure event. Using the gcloud CLI, you can inject a JSON payload into Cloud Logging that targets specific monitored resources, such as Cloud Run Jobs.
Execution
To simulate a failure for a specific job, you must match the monitored-resource-type and its corresponding labels. This ensures the alerting policy’s filter correctly identifies and processes the entry.
The following command demonstrates how to inject an error log for a Cloud Run Job named mx-check-task:
gcloud logging write "run.googleapis.com/stdout" \
'{"assertion_passed": false, "message": "Manual alert test: MX record assertion failed", "test_event": true}' \
--project="tonym-us" \
--payload-type=json \
--severity=ERROR \
--monitored-resource-type="cloud_run_job" \
--monitored-resource-labels="job_name=mx-check-task,location=us-west1,project_id=tonym-us"
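If you run several assertion jobs, the same command can be generated per job. A small helper along these lines (the job names, project, and region defaults are assumptions carried over from the example above) assembles the argument list for each injection:

```python
import json

def build_injection_command(job_name: str, project: str = "tonym-us",
                            location: str = "us-west1") -> list[str]:
    """Assemble the gcloud logging write invocation for one Cloud Run Job."""
    payload = {
        "assertion_passed": False,
        "message": f"Manual alert test: {job_name} assertion failed",
        "test_event": True,
    }
    return [
        "gcloud", "logging", "write", "run.googleapis.com/stdout",
        json.dumps(payload),
        f"--project={project}",
        "--payload-type=json",
        "--severity=ERROR",
        "--monitored-resource-type=cloud_run_job",
        f"--monitored-resource-labels="
        f"job_name={job_name},location={location},project_id={project}",
    ]

cmd = build_injection_command("mx-check-task")
print(" ".join(cmd))
```

Passing the list to `subprocess.run` avoids shell-quoting pitfalls with the JSON payload.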
Key Considerations
- Payload Accuracy: The alerting policy relies on specific JSON keys. Setting `--payload-type=json` is critical for ensuring the `jsonPayload.assertion_passed` field is parsed as a boolean rather than a raw string.
- Resource Alignment: Alerting filters often restrict scope to specific resources. By defining `--monitored-resource-type` and `--monitored-resource-labels`, the simulated log becomes indistinguishable from a real service failure.
- Incident Lifecycle: Google Cloud Monitoring suppresses duplicate notifications for the same open incident. If an incident is already active in the console, resolve it manually before running the simulation to verify that notification channels (e.g., email, PagerDuty, Slack) are functioning correctly.
- Auditing & Filtering: To prevent synthetic tests from skewing your production metrics, include a `test_event: true` flag in the payload. You can then update your SLO and SLA dashboard filters to exclude these events using `NOT jsonPayload.test_event=true`, so synthetic entries still exercise the alerting pipeline without distorting SLA measurement.
- LOG_ID Matching: The `LOG_ID()` function in your filter must match the name of the log stream you are writing to (e.g., `"run.googleapis.com/stdout"`). This scopes the alerting policy strictly to your application's output, preventing triggers from unrelated logs that happen to contain similar JSON keys.
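The auditing split behaves like this sketch: the alerting policy accepts synthetic entries, while a dashboard filter carrying `NOT jsonPayload.test_event=true` drops them from SLO views (the function name is illustrative):

```python
def excluded_from_dashboard(payload: dict) -> bool:
    """Models NOT jsonPayload.test_event=true: synthetic entries are
    dropped from SLO/SLA dashboards but still reach alerting."""
    return payload.get("test_event") is True

entries = [
    {"assertion_passed": False, "test_event": True},  # synthetic injection
    {"assertion_passed": False},                      # real failure
]
real_failures = [e for e in entries if not excluded_from_dashboard(e)]
print(len(real_failures))  # 1
```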
Verification
After injection, confirm the entry appears in the Logs Explorer using the following query:
resource.type="cloud_run_job"
resource.labels.job_name="mx-check-task"
jsonPayload.assertion_passed=false
jsonPayload.test_event=true
This method provides a reliable, repeatable framework for validating the entire alerting pipeline without modifying application code or disrupting production configuration.