Based on Stackdriver, I want to send notifications to my Centreon monitoring (behind Nagios) for workflow reasons, do you have any idea on how to do so?
Thank you
Stackdriver alerting allows webhook notifications, so you can run a server to forward the notifications anywhere you need to (including Centreon), and point the Stackdriver alerting notification channel to that server.
There are two ways to send external information in the Centreon queue without a traditional passive agent mode.
First, you can use the Centreon DSM (Dynamic Services Management) addon.
It is interesting because you don't have to register a dedicated and already known service in your configuration to match the notification.
With Centreon DSM, Centreon can receive events such as SNMP traps resulting from the detection of a problem and assign the event dynamically to a slot defined in Centreon, like a tray event.
A resource has a set number of “slots” on which alerts will be assigned (stored). While this event has not been taken into account by human action, it will remain visible in the Centreon web frontend. When the event is acknowledged, the slot becomes available for new events.
The event must be transmitted to the server via an SNMP Trap.
All the configuration is made through Centreon web interface after the module installation.
Complete explanations, screenshots, and tips are described on the online documentation: https://documentation.centreon.com/docs/centreon-dsm/en/latest/user.html
Secondly, Centreon developers added a Centreon REST API you can use to submit information to the monitoring engine.
This feature is easier to use than the SNMP Trap way.
In that case, you have to create both host/service objects before any API utilization.
To send status, please use the following URL using POST method:
api.domain.tld/centreon/api/index.php?action=submit&object=centreon_submit_results
Header
key value
Content-Type application/json
centreon-auth-token the value of authToken you got on the authentication response
Example of service body submit: The body is a JSON with the parameters provided above formatted as below:
{
"results": [
{
"updatetime": "1528884076",
"host": "Centreon-Central"
"service": "Memory",
"status": "2"
"output": "The service is in CRITICAL state"
"perfdata": "perf=20"
},
{
"updatetime": "1528884076",
"host": "Centreon-Central"
"service": "fake-service",
"status": "1"
"output": "The service is in WARNING state"
"perfdata": "perf=10"
}
]
}
Example of body response: :: The response body is a JSON with the HTTP return code, and a message for each submit:
{
"results": [
{
"code": 202,
"message": "The status send to the engine"
},
{
"code": 404,
"message": "The service is not present."
}
]
}
More information is available in the online documentation: https://documentation.centreon.com/docs/centreon/en/19.04/api/api_rest/index.html
Centreon REST API also allows to get real-time status for hosts, services and do the object configuration.
Related
I am trying to set up self-hosted runners for GitHub using Terraform with Phillips-Labs terraform-aws-github-runner module. I see the GH webhook send/receive messages, SQS queue receiving messages and those messages being retrieve. The scale-up lambda is firing and I see the following logs:
2023-01-31 11:50:15.879 INFO [scale-up:22b11002-76d2-5596-9451-4c51746730c2 index.js:119051 scaleUp] Received workflow_job from {my-org}/terraform-aws-github-self-hosted-runners
{}
2023-01-31 11:50:15.880 INFO [scale-up:22b11002-76d2-5596-9451-4c51746730c2 index.js:119084 scaleUp] Received event
{
"runnerType": "Org",
"runnerOwner": "my-org",
"event": "workflow_job",
"id": "11002102910"
}
2023-01-31 11:50:16.188 DEBUG [gh-auth:22b11002-76d2-5596-9451-4c51746730c2 index.js:118486 createAuth] GHES API URL: {"runnerType":"Org","runnerOwner":"my-org","event":"workflow_job","id":"11002102910"}
2023-01-31 11:50:16.193 WARN [scale-runners:22b11002-76d2-5596-9451-4c51746730c2 index.js:118529 Runtime.handler] Ignoring error: error:1E08010C:DECODER routines::unsupported
{
"runnerType": "Org",
"runnerOwner": "my-org",
"event": "workflow_job",
"id": "11002102910"
}
I do not see any EC2 instances being creating. I suspect the GHES API URL: should have a value after it, but I'm not certain. Also, the final log says it is ignoring an error...
I have confirmed my private key pem file is stored as a multi-line secret in secrets manager.
Any advice would be much appreciated!
It looks like not all the permissions needed by the github app are documented. I needed to add a subscription to the Workflow run event.
I am using an AWS lambda function to serve my NodeJS codebase for an Alexa Skill.
The skill makes external API calls to a custom API as well as the Amazon GameOn API, it also uses URL's which serve audio files and images from an S3 Bucket.
The issue I am having is intermittent, and is affecting about 20% of users. At random points of the skill, the user request will produce an invalid response from the skill, with the following error:
{
"Request": {
"type": "System.ExceptionEncountered",
"requestId": "amzn1.echo-api.request.ab35c3f1-b8e6-4478-945c-16f644359556",
"timestamp": "2020-05-16T19:54:24Z",
"locale": "en-US",
"error": {
"type": "INVALID_RESPONSE",
"message": "Read timed out for requestId amzn1.echo-api.request.323b1fbb-b4e8-4cdf-8f31-30c9b67e4a5d"
},
"cause": {
"requestId": "amzn1.echo-api.request.323b1fbb-b4e8-4cdf-8f31-30c9b67e4a5d"
}
},
I have looked up this issue, I believe it's something wrong with the lambda function configuration but can't figure out where!
I've tried increasing the Memory the function uses (now 256MB).
It should be noted that the function timeout is 8000ms, since this is the max time you are allowed for an Alexa response.
What causes this Read timeout issue, and what measures can I take to debug and resolve it?
Take a look at AWS XRay. By using this with your Lambda you should be able to identify the source of these timeouts.
This link should help you understand how to apply it.
We found that this was occurring when the skill was trying to access a resource which was stored on our Azure website.
The CPU and Memory allocation for the azure site was too low, and it would fail when facing a large amount of requests.
To fix, we improved the plan the app service was running on.
We're using a lambda function to respond to the 'User Migration' trigger in AWS Cognito. When something like a syntax error occurs, you can see it in cloud watch logs. However, "Exception during user migration" errors seen on the login page are no where to be found in the cloud watch logs.
Where are we supposed to look for these? I can't find any anything in the documentation and assumed it would have gone to cloud watch.
I can't test it in the lambda interface because one of the parameters being passed into the lambda function will have a function nested within the object and I can't create a test JSON setup that has that. There's also no test trigger for user migration that is pre-built.
Any ideas as to why I can't see this in cloud watch or where the exceptions would be shown would be greatly appreciated.
Unfortunately Cogntio doesn't expose any logs (or metrics, for that matter!).
The closest you can get is to view the lambda's logs in CloudWatch. If you log your response, and watch your lambda's error metric then you should mostly be able to debug issues internal to the lambda.
This does leave a few edge cases:
You won't see anything if the lambda can't be invoked (this would only happen under heavy concurrent loads either on that single lambda, or on all lambdas across your account)
If you return a bad response the lambda will succeed but the trigger action will fail and Cognito will give you a fairly generic message. At this point you're at the mercy of AWS' documentation to work out what's wrong (which can be a bit hit and miss- although StackOverflow always helps!).
You can find an example payload for the lambda in the trigger documentation:
{
"userName": "THE USERNAME",
"request": {
"password": "THE PASSWORD"
},
"response": {
// it is your responsibility to fill this bit in and return the completed object back:
"userAttributes": {
"string": "string",
...
},
"finalUserStatus": "string",
"messageAction": "string",
"desiredDeliveryMediums": [ "string", ... ],
"forceAliasCreation": boolean
}
}
n.b. As an aside, which you might know, but Lambda payloads always have to be in JSON, which does not store functions. So you should always be able to derive a test payload to use in the console.
Currently, the message specified in the Document field while creating alerting policy appears in the Document field of the Stackdriver alert email.
I would like to overwrite the entire email message body with my custom content.
How can I overwrite the message body of Stackdriver Alert email with my custom message?
Is there any other workaround to do this?
You should be able to send the notification to a webhook, and this could directly be an HTTP-triggered Cloud Function.
This Cloud Function would receive all the information from the alert, and you can follow this tutorial to use SendGrid to send your alerts.
This is a lot more complex than just setting the email notifications, but also provides you with an amazing flexibility regarding alerts, as you'll be able to not just write the message however you want, but you could process the data in any way you want:
You have low priority alerts? Then store them and just send a digest
once in a while instead of spamming.
Want to change who is sent the
alert depending on a calendar rotation? Use the function to look up
who should be notified.
And those are just some random quick ideas I got while writing this message.
The information provided in the POST body is this one (that's just a sample):
{
"incident": {
"incident_id": "f2e08c333dc64cb09f75eaab355393bz",
"resource_id": "i-4a266a2d",
"resource_name": "webserver-85",
"state": "open",
"started_at": 1385085727,
"ended_at": null,
"policy_name": "Webserver Health",
"condition_name": "CPU usage",
"url": "https://app.google.stackdriver.com/incidents/f333dc64z",
"summary": "CPU for webserver-85 is above the threshold of 1% with a value of 28.5%"
},
"version": 1.1
}
You can create a single webhook that handles all the alerts, or you can create a webhook on a per-policy basis to handle things separately.
We have a case where we need to send a json object with a push notification. Reading the documentation I found out I can do the following
iOS
{
default: req.body.message,
"APNS": {
"aps": {
"alert": {
"message": req.body.message,
"data": "{JSON Object}"
},
},
}
Android:
{
"GCM": {
"data": {
"messagee": {
"message": req.body.message,
"data": "{JSON Object}"
}
}
}
}
But, I got sceptical if we should use Message Attributes if not then what is the us of the Message Attributes !
Based on your description it seems like you do not need to use message attributes. Quoting the AWS docs:
You can also use message attributes to help structure the push notification message for mobile endpoints. In this scenario the message attributes are only used to help structure the push notification message and are not delivered to the endpoint, as they are when sending messages with message attributes to Amazon SQS endpoints.
There are some use cases for attaching message attributes to push notifications. One such use case is for TTLs on outbound messages. Again quoting the docs:
The TTL message attribute is used to specify expiration metadata about a message. This allows you to specify the amount of time that the push notification service, such as Apple Push Notification Service (APNS) or GCM, has to deliver the message to the endpoint. If for some reason (such as the mobile device has been turned off) the message is not deliverable within the specified TTL, then the message will be dropped and no further attempts to deliver it will be made. To specify TTL within message attributes, you can use the AWS Management Console, AWS software development kits (SDKs), or query API.