Doing a join query with CloudWatch - amazon-web-services

I have two sets of metrics in CloudWatch:
M1 with dimensions Status, Instance
M2 with dimensions Application, Instance
So I can run each of these queries individually and correctly:
SELECT SUM(M1) FROM SCHEMA("mynamespace", status,instance) GROUP BY status
SELECT SUM(M2) FROM SCHEMA("mynamespace", application,instance) GROUP BY application
So one set contains the status and the other contains the application name. Both sets intersect on instance id.
But what I wish I could run here is something like this:
SELECT SUM(M1) FROM SCHEMA("mynamespace", status,application,instance) GROUP BY status,application
I can achieve that with a tool like Prometheus, but I do not see how to do this with CloudWatch.
Is there a way?

Related

BigQuery: Trying to recognize scheduled queries in INFORMATION_SCHEMA. Possible idea: Add key:value label

I am trying to identify scheduled queries on GCP to see which scheduled query is costing me the most.
But if I run the following query, I can only see the job_id, which is not really informative, and the labels also don't contain the resource name of the scheduled query.
SELECT creation_time, job_id, parent_job_id, user_email, labels, bytes_billed
FROM `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
WHERE starts_with(job_id, 'scheduled_query') and parent_job_id is null
ORDER BY creation_time desc
One solution could be to add labels to the scheduled query.
I tried adding a label to a session by adding this to my scheduled query, but this only added the query_key and label to a child job:
SET @@query_label = "scheduled_query:some_name_that_i_like";
https://cloud.google.com/bigquery/docs/adding-labels#adding-label-to-session
I also tried the bq CLI client, but this also didn't work:
bq update --set_label scheduled_query:some_name_that_i_like --transfer_config projects/some_project_id_number/locations/europe/transferConfigs/some_transfer_config_number
How can I make it easier to recognize which scheduled query was run?
This question is (somewhat) related:
BigQuery - Get the display name of scheduled query from INFORMATION_SCHEMA

How to deduplicate GCP logs from Logs Explorer?

I am using GCP Logs explorer to store logging messages from my pipeline.
I need to debug an issue by looking at the logs for a specific event. The message of each error is identical except for an event ID at the end.
So for example, the error message is
Event ID does not exist: foo
I know that I can use the following syntax to construct a query that will return the logs with this particular message structure
resource.type="some_resource"
resource.labels.project_id="some_project"
resource.labels.job_id="some_id"
severity=WARNING
jsonPayload.message:"Event ID does not exist:"
The last line in that query will then return every log where the message has that string.
I end up with a result like this
Event ID does not exist: 1A
Event ID does not exist: 2A
Event ID does not exist: 2A
Event ID does not exist: 3A
so I wish to deduplicate that to end up with only
Event ID does not exist: 1A
Event ID does not exist: 2A
Event ID does not exist: 3A
But I don't see support for this type of deduplication in the query language docs.
Due to the number of rows, I also cannot download a delimited log file.
Is it possible to deduplicate the rows?
To deduplicate records with BigQuery, follow these steps:
Identify whether your dataset contains duplicates.
Create a SELECT query that aggregates the desired column using a GROUP BY clause.
Materialize the result to a new table using CREATE OR REPLACE TABLE [tablename] AS [SELECT STATEMENT].
You can review the full tutorial in this link.
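For example, applied to logs routed into BigQuery, the aggregation and materialization steps might look roughly like this; the project, dataset, and table names are placeholders, and the column layout assumes the standard jsonPayload.message field of routed log entries:
-- Sketch only: collapse repeated warning messages into one row each.
CREATE OR REPLACE TABLE `my_project.logs_dataset.deduplicated_events` AS
SELECT
  jsonPayload.message AS message,
  COUNT(*) AS occurrences
FROM
  `my_project.logs_dataset.warning_logs`
WHERE
  jsonPayload.message LIKE 'Event ID does not exist:%'
GROUP BY
  message;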
To analyze a large quantity of logs, you could route them to BigQuery using Fluentd and analyze them there.
Fluentd has an output plugin that can use BigQuery as a destination for storing the collected logs. Using the plugin, you can directly load logs into BigQuery in near real time from many servers.
In this link, you can find a complete tutorial on how to Analyze logs using Fluentd and BigQuery.
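As a very rough illustration, an output section for that plugin might look like the following. This assumes the fluent-plugin-bigquery output plugin; the parameter names may differ between plugin versions and the required schema settings are omitted, so treat it as a sketch rather than a tested configuration:
<match app.logs.**>
  # Sketch only: stream matched log records into a BigQuery table.
  # All values below are placeholders; schema settings are omitted.
  @type bigquery_insert
  auth_method json_key
  json_key /path/to/service_account_key.json
  project my-project-id
  dataset logs_dataset
  table warning_logs
</match>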
To route your logs to BigQuery, it is first necessary to create a sink with BigQuery as the destination; you can do this from the console by following the steps below, or from the command line as sketched after them.
Sinks control how Cloud Logging routes logs. Using sinks, you can route some or all of your logs to supported destinations. Sinks belong to a given Google Cloud resource: Cloud projects, billing accounts, folders, and organizations. When the resource receives a log entry, it routes the log entry according to the sinks contained by that resource. The log entry is sent to the destination associated with each matching sink.
You can route log entries from Cloud Logging to BigQuery using sinks. When you create a sink, you define a BigQuery dataset as the destination. Logging sends log entries that match the sink's rules to partitioned tables that are created for you in that BigQuery dataset.
1) In the Cloud console, go to the Logs Router page:
2) Select an existing Cloud project.
3) Select Create sink.
4) In the Sink details panel, enter the following details:
Sink name: Provide an identifier for the sink; note that after you create the sink, you can't rename the sink but you can delete it and create a new sink.
Sink description (optional): Describe the purpose or use case for the sink.
5) In the Sink destination panel, select the sink service and destination:
Select sink service: Select the service where you want your logs routed. Based on the service that you select, you can select from the following destinations:
BigQuery table: Select or create the particular dataset to receive the routed logs. You also have the option to use partitioned tables.
For example, if your sink destination is a BigQuery dataset, the sink destination would be the following:
bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID
Note that if you are routing logs between Cloud projects, you still need the appropriate destination permissions.
6) In the Choose logs to include in sink panel, do the following:
In the Build inclusion filter field, enter a filter expression that matches the log entries you want to include. If you don't set a filter, all logs from your selected resource are routed to the destination.
To verify you entered the correct filter, select Preview logs. This opens the Logs Explorer in a new tab with the filter prepopulated.
7) (Optional) In the Choose logs to filter out of sink panel, do the following:
In the Exclusion filter name field, enter a name.
In the Build an exclusion filter field, enter a filter expression that matches the log entries you want to exclude. You can also use the sample function to select a portion of the log entries to exclude. You can create up to 50 exclusion filters per sink. Note that the length of a filter can't exceed 20,000 characters.
8) Select Create sink.
More information about Configuring and managing sinks here.
To review details, the formatting, and rules that apply when routing log entries from Cloud Logging to BigQuery, please follow this link.
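If you prefer the command line, roughly the same sink can be created with gcloud; the sink name, project, and dataset below are placeholders, and the filter simply reuses the query from the question:
# Sketch: route matching warning logs to a BigQuery dataset.
gcloud logging sinks create dedup-warning-logs \
  bigquery.googleapis.com/projects/some_project/datasets/logs_dataset \
  --log-filter='resource.type="some_resource" AND severity=WARNING AND jsonPayload.message:"Event ID does not exist:"'
# After creation, grant the sink's writer identity the BigQuery Data Editor role on the dataset.
Either way, once the logs are in BigQuery you can run the GROUP BY query shown earlier against the routed table.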

Is there a way to change DynamoDB table status from updating to active?

I ran two table update commands consecutively on each table, and now all of our tables are stuck in UPDATING status. The first command added KMS encryption using a CMK. The second command set up table replication. I did this through a script for 60 tables. Now all tables have been in UPDATING status for about an hour, and not one table has finished updating. These are non-production tables with little data in them. It looks like the KMS key was applied successfully, but I don't see any global table settings, so it looks like the replication command did not apply. What could possibly be wrong? If possible, how can I get a DynamoDB table back to ACTIVE status?
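For reference, the two consecutive updates described above would look roughly like this with the AWS CLI; the table name, key ARN, and regions are placeholders. Note that a second update-table call issued while the table is still UPDATING typically fails rather than being queued:
# Sketch of the two updates described above (all values are placeholders).
# 1) Enable server-side encryption with a customer managed KMS key:
aws dynamodb update-table \
  --table-name my-table \
  --sse-specification Enabled=true,SSEType=KMS,KMSMasterKeyId=arn:aws:kms:us-east-1:123456789012:key/example-key-id

# 2) Add a replica (global tables version 2019.11.21):
aws dynamodb update-table \
  --table-name my-table \
  --replica-updates 'Create={RegionName=us-west-2}'

# Check the current status while waiting for the table to return to ACTIVE:
aws dynamodb describe-table --table-name my-table --query 'Table.TableStatus'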

Can I load a whitelist with an event table in WSO2?

I have a Postgres blacklist table, and I want to load this table and do a join with it using the event table feature of WSO2 DAS.
But it does not allow me to use the blacklist table as the from source in the query.
This is my sample code:
#From(eventtable='rdbms', jdbc.url='jdbc:postgresql://localhost:5432/pruebabg', username='postgres', password='Easysoft16', driver.name='org.postgresql.Driver', table.name='Trazablack')
define table Trazablack (sensorValue double);
#From(eventtable='rdbms', jdbc.url='jdbc:postgresql://localhost:5432/pruebabg', username='postgres', password='Easysoft16', driver.name='org.postgresql.Driver', table.name='Trazawhite')
define table TrazaExtend (Trazawhite double);
from Trazablack
select *
insert into TrazaFiltrada;
This is the error:
"Stream/table definition with ID 'Trazablack' has not been defined in execution plan "ExecutionPlan""
Is it possible?
You can't read a table like that in Siddhi; it should be done with a join query (triggered by an incoming event). Without an incoming event stream, there's no way to trigger the query.
If you don't want to feed any external events to trigger this query, you can use a Trigger in Siddhi (refer to this doc for more information).
Example query that is triggered every 5 minutes:
define trigger FiveMinTriggerStream at every 5 min;
from FiveMinTriggerStream join Trazablack as t
select t.sensorValue as sensorValue
insert into TrazaFiltrada;
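If you do have an incoming event stream to drive the lookup, the join mentioned above would look roughly like this; the SensorStream name and its attribute are assumptions chosen to match the sensorValue column of the table:
define stream SensorStream (sensorValue double);

from SensorStream as s join Trazablack as t
    on s.sensorValue == t.sensorValue
select s.sensorValue as sensorValue
insert into TrazaFiltrada;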

Unit Testing Azure EventHub, Stream Analytics Job and Storage Table

I am working on a project which uses an EventHub -> Stream Analytics Job -> Storage Table / Blob structure, and I want to write a couple of unit tests for it.
I can test the EventHub sender status to see if queries have the expected behavior, but how can I check whether the data is being sent to Table Storage, given that the whole process doesn't happen instantly and there is a pretty long delay between the moment I hit the EventHub and the moment the data is saved in Storage?
First create a new Azure Table storage account and then create a new Azure table within that account. In your Stream Analytics job, add a new output for Table storage. When you set up the Output details, you will need to specify the storage account, account key, table name, and which column names in the event will represent the Azure Table partition and row keys. As an example, I set mine up so that the pk and rk fields from the event become the partition and row keys.
After the output is set up, you can create a simple Stream Analytics query that maps input events from Event Hub to the Azure Table output. I also have an Event Hub input named 'eventhub' with Send/Listen permissions. My query looks like this:
SELECT
*
INTO
tableoutput
FROM
eventhub
At this point, hit the 'Start' button in the Azure portal to run the Stream Analytics job. To generate the events, you can follow the instructions here, but change the event message to this:
string guid = Guid.NewGuid().ToString();
var message = "pk,rk,value\n" + guid + ",1,hello";
Console.WriteLine("{0} > Sending message: {1}", DateTime.Now, message);
eventHubClient.Send(new EventData(Encoding.UTF8.GetBytes(message)));
To eyeball the Azure Table results, download a tool like TableXplorer and enter the storage account details. Double-click your Azure Table and you should see the rows that were sent. Keep in mind you may need to periodically hit F5 on your TableXplorer query for 10-60 seconds until the data gets pushed through.
For programmatic unit testing you will need to push the partition key / row key values generated in your Event Hub code into a data structure and have a worker poll the Azure Table using point queries. A good overview on Azure Table usage is here.
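As a rough sketch of that polling step, assuming the classic WindowsAzure.Storage table client (the connection string, table name, and retry limits below are placeholders):
// Sketch only: poll the Azure Table with a point query until the entity
// sent through Event Hub shows up, then return it for assertions.
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static class TableAssertions
{
    public static async Task<DynamicTableEntity> WaitForEntityAsync(
        string connectionString, string tableName, string partitionKey, string rowKey)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var table = account.CreateCloudTableClient().GetTableReference(tableName);

        for (int attempt = 0; attempt < 30; attempt++)
        {
            var result = await table.ExecuteAsync(
                TableOperation.Retrieve<DynamicTableEntity>(partitionKey, rowKey));
            if (result.Result is DynamicTableEntity entity)
            {
                return entity; // assert on entity.Properties["value"] in the test
            }
            await Task.Delay(TimeSpan.FromSeconds(2));
        }
        return null; // not found within the timeout - fail the test
    }
}
In a test, you would call this with the guid you generated before sending the event as the partition key and "1" as the row key, matching the pk and rk columns in the sample message above, and then assert that the returned entity is not null.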