Can we generate LogDNA alerts based only on the "presence" or "absence" of a specified number of lines?

I have the following logs:
Taking a pg_dump
pg_dump successful
logging into cloud
logged in successfully
Uploading the backup
Upload successful
I want to generate a LogDNA alert if line 1 AND line 6 do not appear every hour.
(not worried about the rest of the lines)
I am aware that I can generate alerts if a specified number of lines do not show up within some time interval. But I wanted to understand whether I can generate one when a string/regex is missing.

Got it!
We can add query strings to the view and then alert based on the number of lines.
In my case, I filter for line 1 AND line 6 and alert if fewer than 2 lines appear in an hour.

Related

How can I trigger an alert based on log output?

I am using GCP and want to create an alert when a certain pattern stops appearing in the output logs of a process.
As an example, my CLI process will output "YYYY-MM-DD HH:MM:SS Successfully checked X" every second.
I want to know when this fails (indicated by no log output). I am collecting logs using the normal GCP log collector.
Can this be done?
I am creating the alerts via the UI at:
https://console.cloud.google.com/monitoring/alerting/policies/create
You can create an alert based on a log-based metric. To do that, create a log-based metric in Cloud Logging with the log filter that you want.
Then create an alerting policy, aggregate the metric per minute, and trigger when the value is below 60.
You won't get an alert for each missing message, but on a per-minute basis you will get an alert when the expected count isn't reached.
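For reference, here is a minimal sketch of creating such a log-based metric with the Python client library (the metric name and log filter are placeholders you would adapt to your process's output); the alerting policy with the below-60-per-minute threshold can then be configured on that metric in Cloud Monitoring:

from google.cloud import logging

client = logging.Client()

# Hypothetical metric name and filter; the filter should match the
# "Successfully checked" lines your CLI process emits every second.
metric = client.metric(
    "successful_checks",
    filter_='resource.type="gce_instance" AND textPayload:"Successfully checked"',
    description="Counts the 'Successfully checked' log lines",
)
metric.create()

# In Cloud Monitoring, alert on logging.googleapis.com/user/successful_checks,
# aggregated per minute (sum), with a "below 60" threshold.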

Number of items in PCollection is not affecting allocated number of workers

I have a pipeline that comprises three steps. The first step is a ParDo that accepts 5 URLs in a PCollection, and each of the 5 items generates thousands of URLs and outputs them. So the input of the second step is another PCollection, which can be of size 100-400k. In the last step the scraped output of each URL is saved to a storage service.
I have noticed that the first step, which generates the URL list out of the 5 input URLs, got allocated 5 workers and generated the new set of URLs. But once the first step completed, the number of workers dropped to 1, and the second step is running on only 1 worker. (With 1 worker my Dataflow job has been running for the last 2 days, so by looking at the logs I am making the logical assumption that the first step is completed.)
So my question is: even though the size of the PCollection is big, why is it not split between workers, or why are more workers not getting allocated? Step 2 is a simple web scraper that scrapes the given URL and outputs a string, which is then saved to a storage service.
Dataflow tries to connect steps together to create fused steps. So even though you have a few ParDos in your pipeline, they will be fused together and executed as a single step.
Also, once fused, Dataflow's scaling is limited by the step at the beginning of the fused stage.
I suspect you have a Create transform that consists of a few elements at the top of your pipeline. In this case Dataflow can only scale up to the number of elements in this Create transform.
One way to prevent this behavior is to break fusion after one (or more) of your high-fanout ParDo transforms. This can be done by adding a Reshuffle.viaRandomKey() transform after it (which contains a GroupByKey). Given that Reshuffle is an identity transform, your pipeline should not require additional changes.
See the Dataflow documentation on fusion optimization for more information regarding fusion and ways to prevent it.
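For illustration, here is a minimal sketch of that fusion break in the Beam Python SDK, where the equivalent of Reshuffle.viaRandomKey() is beam.Reshuffle(); the expand and scrape functions are hypothetical stand-ins for the pipeline's real steps:

import apache_beam as beam

def expand(url):
    # Hypothetical high-fanout step: each seed URL yields many URLs.
    for i in range(1000):
        yield f"{url}/page/{i}"

def scrape(url):
    # Hypothetical scraper: fetch the URL and return a string.
    return f"scraped:{url}"

with beam.Pipeline() as p:
    (
        p
        | "SeedUrls" >> beam.Create(["u1", "u2", "u3", "u4", "u5"])
        | "ExpandUrls" >> beam.FlatMap(expand)     # 5 elements -> hundreds of thousands
        | "BreakFusion" >> beam.Reshuffle()        # breaks fusion so the scrape step can scale out
        | "Scrape" >> beam.Map(scrape)
        | "Save" >> beam.io.WriteToText("output")  # placeholder sink; replace with your storage service
    )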

Max file count using BigQuery Data Transfer job

I have about 54,000 files in my Cloud Storage bucket. When I try to schedule a BigQuery Data Transfer job to move the files from the bucket to BigQuery, I get the following error:
Error code 9 : Transfer Run limits exceeded. Max size: 15.00 TB. Max file count: 10000. Found: size = 267065994 B (0.00 TB) ; file count = 54824.
I thought the max file count was 10 million.
I think that the BigQuery transfer service lists all the files matching the wildcard and then uses that list to load them. So it will be the same as providing the full list to bq load ..., therefore hitting the 10,000-URI limit.
This is probably necessary because the BigQuery transfer service skips already-loaded files, so it needs to look at them one by one to decide which to actually load.
I think that your only option is to schedule a job yourself and load the files directly into BigQuery, for example using Cloud Composer or writing a little Cloud Run service that can be invoked by Cloud Scheduler.
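For example, here is a minimal sketch of such a self-scheduled load with the BigQuery Python client (bucket, table, and file format are placeholders); this is the kind of job a small Cloud Run service invoked by Cloud Scheduler could run:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,  # adjust to your files' format
    autodetect=True,
)

# A single wildcard URI is passed to the load job instead of an explicit
# list of 54k files, so the job is not given 54k separate source URIs.
load_job = client.load_table_from_uri(
    "gs://your-bucket/your-prefix/*.csv",
    "your-project.your_dataset.your_table",
    job_config=job_config,
)
load_job.result()  # waits for the load job to complete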
The error message Transfer Run limits exceeded, as mentioned before, is related to a known limit for load jobs in BigQuery. Unfortunately this is a hard limit and cannot be changed. There is an ongoing feature request to increase this limit, but for now there is no ETA for it to be implemented.
The main recommendation for this issue is to split the single operation into multiple processes that send data in requests that don't exceed this limit. With this we cover the main question: "Why do I see this error message, and how do I avoid it?"
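Here is a minimal sketch of that splitting idea, assuming the Cloud Storage and BigQuery Python clients (bucket, prefix, and table names are placeholders): list the objects, group them into batches that stay within the 10,000-URI limit, and submit one load job per batch.

from google.cloud import bigquery, storage

BATCH_SIZE = 10000  # stay within the 10,000-URI limit per load job

bq = bigquery.Client()
gcs = storage.Client()

# Placeholder bucket and prefix; list every object that should be loaded.
uris = [
    f"gs://your-bucket/{blob.name}"
    for blob in gcs.list_blobs("your-bucket", prefix="exports/")
]

for i in range(0, len(uris), BATCH_SIZE):
    batch = uris[i:i + BATCH_SIZE]
    job = bq.load_table_from_uri(
        batch,
        "your-project.your_dataset.your_table",
        job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV),
    )
    job.result()  # wait for each batch before submitting the next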
It is normal now to ask, "How can I automate or perform these actions more easily?" I can think of involving more products:
Dataflow, which will help you process the data that will be added to BigQuery. This is where you can send multiple requests.
Pub/Sub, which will help you listen to events and automate when the processing starts.
Please take a look at this suggested implementation, where the aforementioned scenario is described in more detail.
Hope this is helpful! :)

Where to find the number of active concurrent invocations in Google Cloud Functions

I am looking for a way to see how many concurrent invocations are active at any point in time, e.g. within a one-minute range. I am looking for this because I received the following error:
Forbidden: 403 Exceeded rate limits: too many concurrent queries for
this project_and_region. For more information, see
https://cloud.google.com/bigquery/
The quotas are listed here: https://cloud.google.com/functions/quotas
I am fine with having quotas, but I would like to see this number in a chart. Where can I find this?
Currently there is no way of seeing that information directly, but there is a workaround. You can do the following:
Go to Google Cloud Console > Stackdriver Logging
At the text box that says "Filter by label or text search", click on the small arrow at the end of the text box.
Choose "Convert to advanced filter"
Type that query inside:
resource.type="cloud_function"
resource.labels.function_name="[GOOGLE_CLOUD_FUNCTION_NAME]"
"Function execution started"
At "Last hour" drop down menu, choose "Custom"
Fix the start and end time
This will list all the times that the Cloud Function was executed in the time range. If it was executed multiple times, instead of counting one by one you can use the following Python script:
Open Google Cloud Shell
Install the Google Cloud Logging library: $ pip install google-cloud-logging
Create a main.py file using my GitHub code example. (I have tested it and it is working as expected; a sketch of what such a script might look like is shown after the expected output below.)
Change the date_a_str and set it as start date.
Change the date_b_str and set it as end date.
In function_name = "[CLOUD_FUNCTION_NAME]" change [CLOUD_FUNCTION_NAME] to the name of your Cloud Function.
Execute the Python code: $ python main.py
You should see a response as follows:
Found entries: [XX]
Waiting up to 5 seconds.
Sent all pending logs.
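For reference, here is a minimal sketch of what such a main.py might look like (the actual GitHub example may differ; the dates and function name are placeholders):

from google.cloud import logging

client = logging.Client()

function_name = "[CLOUD_FUNCTION_NAME]"   # your Cloud Function's name
date_a_str = "2020-01-01T00:00:00Z"       # start date (RFC 3339)
date_b_str = "2020-01-02T00:00:00Z"       # end date (RFC 3339)

log_filter = (
    'resource.type="cloud_function" '
    f'resource.labels.function_name="{function_name}" '
    '"Function execution started" '
    f'timestamp>="{date_a_str}" timestamp<="{date_b_str}"'
)

# Each matching entry corresponds to one function execution in the window.
entries = list(client.list_entries(filter_=log_filter))
print(f"Found entries: {len(entries)}")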

AWS CloudWatch logs open from middle

All of a sudden, the AWS CloudWatch logs started to open from the middle, or from the beginning of the log stream. They used to open from the end of the log stream, showing the latest lines. I wonder if this is something that I can configure, or has AWS just changed something?
It is really frustrating when you want to follow the progress of your Lambda app but cannot, because when you open the log in AWS it shows the first lines in that log stream, and in order to see the latest lines you need to set a custom time frame. It also doesn't allow you to set a future timestamp as the end time, which forces you to keep updating the end time to see the new lines. I hope there is a solution for getting it to open the tail of the log stream.
Try clicking on ALL in the timeframe option. For me, they recently started setting a start time and showing logs from that time onwards, like you described, but when I click on ALL, it shows the logs normally, like it used to.
The second thing you can do is have a rolling start for the logs (like the last 15 minutes or 1 hour).
To do that, add:
;start=PT1H at the end of your URL if you want the last hour
;start=PT15M at the end of your URL if you want the last 15 minutes
You can change the numbers depending on the timeframe you want.