I would like to have the possibility to log a message (info, debug, or error) from sequences.
My scenario: in case of an error, call the Error Sequence, and in it call a proxy or connector to log a custom message.
I'm thinking of Stackify
http://support.stackify.com/hc/en-us/categories/200398739-Errors-Logs
Is this possible, or can anyone point me in the right direction on how to do it?
Thanks
If you're already using Retrace (Stackify) on the server and capturing the error itself, you can do this via an error log monitor with a notification group containing just yourself.
Create the notification group with yourself in it
Find the error in the error dashboard and filter down until just that error is showing (usually done by filtering for specific message text in combination with app and environment filters).
Save the query (top right corner of the screen, save icon) and name it something you find appropriate and descriptive.
Go to the log monitors screen and add a new log monitor using the saved query. Set it to alert after one occurrence, looking back an hour (to give you time to go see it), and run it every minute or every 5 minutes.
Set your notification group to send at the alert level you chose (or all levels).
Hope that helps.
(Edit after some thought)
You can do the same thing from any log message, by the way, whether it's INFO, ERROR, DEBUG, etc. All you need to do is filter down to what you want, save the query, and create a monitor for it.
Related
I have an alert configured to send an email when the sum of executions of cloud functions that finished in a status other than 'error' or 'ok' is above 0 (grouped by function name).
The way I defined the alert is:
And the secondary aggregator is delta.
The problem is that once the alert is open, it looks like the filters don't matter anymore, and the alert stays open because it sees that the cloud function is triggered and finishes with any status (even an 'ok' status keeps it open, as long as the function is triggered often enough).
At the moment the only solution I can think of is to define a log-based metric that does the counting itself, and then base the alert on that custom metric instead of on the built-in one.
Is there something that I'm missing?
Edit:
Adding another image to show what I think might be the problem:
From the image above we see that the graph won't go down to 0 but stays at 1, which is not how other normal incidents behave.
According to the official documentation:
"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."
That made me think there are times when the condition is not observed in a way that would close the incident, which is confirmed here:
"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."
The lack of HTTP requests isn't a reason to close the incident, since the policy keeps using the last recorded value (the one that triggered the alert).
So using alerts on HTTP request metrics is fine, but you need to close the incidents yourself. I think it would be better to use a custom metric instead if you want them to close automatically.
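A rough sketch of that log-based-metric idea (the exact log text below is an assumption; check what your functions actually write, since the status value is usually quoted in the log line): build a Logging filter that matches only the executions you care about, for example

resource.type="cloud_function"
resource.labels.function_name="[YOUR_FUNCTION_NAME_HERE]"
"finished with status"
NOT "finished with status: 'ok'"
NOT "finished with status: 'error'"

Save it as a counter metric and base the alert policy on that metric instead of the built-in execution count, so the condition only ever sees the executions that ended with an unexpected status.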
I've enabled the notifications in Stackdriver and I'm getting notification e-mails for exceptions just fine.
The problem is that I don't get any notification for timeouts.
Is there any way to be notified when a Google Cloud Function is killed by timeout?
Even though a timeout is not reported as an error, you can still set up a metric for timeout log entries, and then an alert on the metric exceeding a zero threshold.
From the GCP console, go to the Stackdriver Logging viewer (/logs/viewer), and build a filter like this:
resource.type="cloud_function"
resource.labels.function_name="[YOUR_FUNCTION_NAME_HERE]"
"finished with status: timeout"
The third line is a "contains" text filter. Timeout messages consistently contain this text. You can add other things or modify as needed.
Click Create Metric. Give the metric a name like "Function timeouts", and make sure the type is counter. You can leave the optional fields blank. Submit the form, and you should be redirected to /logs/metrics.
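If you prefer the command line, roughly the same metric can be created with gcloud (the metric name function_timeouts is just an example):

gcloud logging metrics create function_timeouts \
  --description="Cloud Function executions that finished with status: timeout" \
  --log-filter='resource.type="cloud_function" AND resource.labels.function_name="[YOUR_FUNCTION_NAME_HERE]" AND "finished with status: timeout"'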
Under User-defined Metrics, you should see your new metric. Click the three-dot button on the right and select Create alert from metric.
Give the alert policy a meaningful name. Under target, you may also get some red text about being unable to produce a line plot; click the helpful link to switch the aligner to mean and the aggregator to none. Then under Configuration, set the condition to "is above," the threshold to "0", and the evaluation to "most recent value."
Proceed with building the notification and documentation as desired. Make sure you add a notification channel so you get alerted. The UI should include hints on each field.
More detail is in the official documentation.
Navigate to "Create alerting policy" using the search box at the top of the dashboard.
Under "What do you want to track?" click "Add Condition."
Configure the new condition like so:
Click "Add."
Click "Next."
Select a notification channel or create a new one.
I unchecked "Notify on incident resolution."
Click "Next."
Provide a descriptive alert name and optional documentation.
Click "Save."
Ensure that at the top of the policy you see the word "Enabled" along with a green checkmark.
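The same policy can also be defined in a file and created from the command line; the values below are a sketch (adjust the metric name, duration, and alignment period to your setup):

# policy.yaml
displayName: Cloud Function timeouts
combiner: OR
conditions:
- displayName: Timeout count above zero
  conditionThreshold:
    filter: metric.type="logging.googleapis.com/user/function_timeouts" AND resource.type="cloud_function"
    comparison: COMPARISON_GT
    thresholdValue: 0
    duration: 0s
    aggregations:
    - alignmentPeriod: 300s
      perSeriesAligner: ALIGN_SUM

Create it with gcloud alpha monitoring policies create --policy-from-file=policy.yaml, and either list your notification channels under notificationChannels in the file or attach one afterwards.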
I came up with a workaround for this by forcing an error before Cloud Functions times out. In terms of workflow, I find this much easier to control, and it consolidates all the errors in one place rather than having to configure settings elsewhere.
Basically something like the code snippet below, where doSomething stands in for the actual work:
exports.cloudFunction = async (event, context) => {
  // A throw inside a setTimeout callback escapes the surrounding try/catch,
  // so reject a promise instead and race it against the actual work.
  let timer;
  const timeoutGuard = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Timeout: ${JSON.stringify(event)}`));
    }, 58000); // 2s buffer off the default 60s timeout
  });

  try {
    await Promise.race([
      doSomething(event), // DO SOMETHING -- replace with the real work
      timeoutGuard,
    ]);
  } catch (e) {
    // HANDLE ERROR, then rethrow so Cloud Functions records the failure
    throw e;
  } finally {
    clearTimeout(timer);
  }
};
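One caveat with this kind of guard: if the work is synchronous and CPU-bound, the timer callback cannot fire until the work finishes, so it mainly helps when the slow part is awaiting I/O.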
I am trying to create a flow using Google Cloud Dataprep. The flow takes a dataset from BigQuery which contains app event data from Firebase Analytics, in order to flatten event parameters for easier analysis. I keep getting the following error before even being able to create the first step (recipe):
Transformation engine unavailable due to prior crash (exit code: -1)
See the top right corner in the screenshot below.
Screenshot
The error message you received is particularly challenging in that it is so generic. The root cause could be within the platform, or it could be in whatever execution environment you used for the job. Unfortunately, we don't have the resources right now to capture and document all of the error messages that can be emitted during the job execution process, which can span a wide variety of servers and other software platforms.
I encountered the same problem. First I tried the following steps:
Refresh the browser (i.e., click the Reload button top left)
"Hard refresh" the browser (i.e., ctrl + Reload)
Clear cache + cookies (i.e., https://support.google.com/accounts/answer/9098093?co=GENIE.Platform=Desktop&hl=en&visit_id=636802035537591679-2642248633&rd=1)
References:
https://community.trifacta.com/s/question/0D51L00005dG3MXSA0/i-was-working-on-a-recipe-and-i-received-the-error-message-transformation-engine-unavailable-due-x-to-prior-crash-exit-code-1-why-am-i-getting-this-error
https://community.trifacta.com/s/question/0D51L00005choIbSAI/unable-to-develop-on-our-trifacta-42-platform-for-the-past-12-hours-steps-added-to-recipes-are-lost-and-having-to-recode-the-error-given-is-transformation-engine-unavailable-what-is-causing-this-error
However, this did not solve the problem. Then I tried:
Confirm that your Chrome version is 68+. If not, please upgrade.
Navigate to chrome://nacl/ and ensure that PNaCl is enabled.
Navigate to chrome://components/ and ensure that the PNaCl version is not 0.0.0.0. Click on Check for Updates.
This did not solve the problem either.
References:
https://community.trifacta.com/s/question/0D51L00005dDrcmSAC/not-able-to-preview-data-sources-or-edit-recipes
I got the info from Trifacta that there had been an internal issue after maintenance. So if none of the above solutions work, you just have to wait and see when they fix the problem.
This is regarding the re-sending of notifications for errors of the same kind.
In my current project, my errors are being grouped.
For example: if an SQL error occurs for the first time, I receive a notification, but when it occurs again after 2 or 3 hours it is grouped under the same entry and no notification is sent.
On what basis does Error Reporting group the errors?
I tried to randomize the error messages in order to distinguish them, but they are still grouped under the same category (for example, messages like "service unavailable - 12", "service unavailable - 23", etc.).
I want to receive a notification for each and every error, irrespective of its type or repetition.
Can anyone suggest a solution?
What you're describing is alerting based on a log-based metric: https://cloud.google.com/logging/docs/logs-based-metrics/charts-and-alerts#creating_a_simple_alerting_policy_on_a_counter_metric
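As a rough sketch of that approach (the metric name every_error and the filter are examples to adapt to your resource types), you could count every ERROR-level log entry with a log-based metric:

gcloud logging metrics create every_error \
  --description="Every log entry at ERROR severity or above" \
  --log-filter='severity>=ERROR'

Then create an alerting policy on the metric logging.googleapis.com/user/every_error with a threshold of 0, as described in the linked guide. On the grouping question: Error Reporting groups errors by analyzing their stack traces rather than the message text, which is why randomizing the message does not create new groups.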
I have read blogs and one question close to mine, but have not found a solution to my problem. I have a transformation job set up to extract three tables from 84 DBs to generate one report. My problem is that when a DB connection is not available, the whole job stops.
I would like to be able to check DB connections before initializing the job, log errors for inaccessible DBs, and create a new dynamic list of successful tests from which I will then run my job. I have used the Check DB connections step, but it still stalls when a connection fails. How can I process my list of DBs, running through to the end, without aborting the job?
First of all, you have used the correct step to check the DB connections. Now for your question, I will try to explain it in parts (I hope I am correct):
Case I: "My problem is when a DB connection is not available, the whole job stops"
This behavior is expected: whenever a step encounters an error, it throws an exception and stops the entire execution of the job.
But does that mean the "Check Db connections" step stops checking the remaining connections once one of them fails? The answer is no. The step finishes testing all the connections even if it hits an error on one of them in the middle. Look at the logs carefully; they give you a final consolidated list of all the checked DB connections (check the image below):
I tried testing with 4 DB connections, out of which I got one error and 3 successes.
Now for the "whole job stops" part: since the stopping behavior is expected (as mentioned above), what you can do is route the flow through an error hop, so that if the job encounters an error, it takes the error hop. Check the image below:
Here I have used two hops: one success and one error. If the job fails, it takes the error path (the red hop); otherwise it takes the success path (the green hop).
Case II: "log errors for inaccessible DBs and create a new dynamic list of successful tests"
You can log the errors either into a separate log file or into a table (depending on your requirement) and then read through that log to generate a list of DB connections. Check the image below:
The output is a list of connections along with an error flag:
Y: failure connecting to the database
N: successful connection
Note: I have used a Text file input step since I logged the previous step into a text file instead of a database. You can customize this as per your requirements.
I have placed sample code in a gist, which you can check for reference.
Hope it helps :)