How can I set up a delete trigger on AWS RDS - amazon-web-services

I have a setup on AWS RDS with MariaDB 10.3, with several DBs on the RDS instance. I'm trying to replicate a table (routes) from one DB (att) to another DB (pro) using triggers. I have triggers for create, update and delete. The create and update triggers work fine, while the delete trigger fails with the error message below. I've tested all triggers locally and they work.
My trigger looks like this.
CREATE DEFINER=`root`@`%` TRIGGER routes_delete
AFTER DELETE ON `routes`
FOR EACH ROW
BEGIN
    DELETE FROM `pro`.`routes`
    WHERE `route_id` = OLD.route_id;
END
Error message
Query execution failed
Reason:
SQL Error [1442] [HY000]: (conn:349208) Can't update table 'routes' in
stored function/trigger because it is already used by statement which
invoked this stored function/trigger
Query is : DELETE FROM `att`.routes WHERE route_code = 78 AND company_id = 3
I don't understand what other statement is using the routes table, since there is nothing else linked to it. What adjustment is needed to get this working on AWS RDS?

what other statement is using the routes table
The "other" query is the query that invoked the trigger.
...is already used by [the] statement which invoked this stored function/trigger
A trigger is not allowed to modify its own table. BEFORE INSERT and BEFORE UPDATE triggers can modify the current row before it is written to the table using the NEW alias, but that is the extent to which a trigger can modify the table where it is defined.
Triggers are subject to all the same limitations as stored functions, and a stored function...
Cannot make changes to a table that is already in use (reading or writing) by the statement invoking the stored function.
https://mariadb.com/kb/en/library/stored-function-limitations/

Related

How to apply Path Patterns in GCP Eventarc for BigQuery service's jobCompleted method?

I am developing a solution where a cloud function calls a BigQuery procedure and, upon successful completion of that stored proc, triggers another cloud function. For this I am using the Audit Logs "jobservice.jobcompleted" method. The problem with this approach is that it triggers the cloud function for every job completed in BigQuery, irrespective of dataset and procedure.
Is there any way to add a path pattern to the filter so that it triggers only for a specific query's completion and not for all of them?
My query starts something like: CALL storedProc() ...
Also, when I tried to create a 2nd gen function from the console, I tried an Eventarc trigger, but to my surprise the BigQuery event provider doesn't have an event for jobCompleted.
Now I'm wondering whether it's possible to trigger based on a job-complete event at all.
Update: I changed my logic to use the google.cloud.bigquery.v2.TableService.InsertTable method, so that after a record is inserted into a table an audit log message is produced and I can trigger the next service. This insert statement is the last statement in the BigQuery procedure.
After running the procedure, the insert statement inserts the data, but the resource name comes through as projects/<project_name>/jobs
I was expecting something like projects/<project_name>/tables/<table_name> so that I can apply a path pattern to the resource name.
Do I need to use a different protoPayload.method?
Try creating a log sink for the jobcompleted entries, filtered on the unique principal-email of the service account, and use Pub/Sub as the sink destination.
Then use the published Pub/Sub event to run the destination service.
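A minimal sketch of creating such a sink with the google-cloud-logging Python client; the project, topic, and service-account email are placeholders, not values from the question:
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project

# Match only jobcompleted audit-log entries emitted by the dedicated service account.
log_filter = (
    'protoPayload.serviceName="bigquery.googleapis.com" '
    'AND protoPayload.methodName="jobservice.jobcompleted" '
    'AND protoPayload.authenticationInfo.principalEmail='
    '"proc-runner@my-project.iam.gserviceaccount.com"'
)

sink = client.sink(
    "bq-proc-completed",
    filter_=log_filter,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-proc-completed",
)
sink.create()
The sink's writer identity still needs the Pub/Sub Publisher role on the topic, and the destination service (Cloud Run or the 2nd gen function) subscribes to that topic.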

AWS-CDK - DynamoDB Initial Data

I'm using the AWS CDK for a serverless project, but I've hit a sticking point. My project deploys a DynamoDB table which I need to populate with data prior to my Lambda function executing.
The data that needs to be loaded is generated by making API calls and isn't static data that can be loaded from a .json file or something simple.
Any ideas on how to approach this requirement for a production workload?
You can use AwsCustomResource in order to make a PutItem call to the table.
AwsSdkCall initializeData = AwsSdkCall.builder()
        .service("DynamoDB")
        .action("putItem")
        .physicalResourceId(PhysicalResourceId.of(tableName + "_initialization"))
        .parameters(Map.ofEntries(
                Map.entry("TableName", tableName),
                Map.entry("Item", Map.ofEntries(
                        Map.entry("id", Map.of("S", "0")),
                        Map.entry("data", Map.of("S", data))
                )),
                Map.entry("ConditionExpression", "attribute_not_exists(id)")
        ))
        .build();

AwsCustomResource tableInitializationResource = AwsCustomResource.Builder.create(this, "TableInitializationResource")
        .policy(AwsCustomResourcePolicy.fromStatements(List.of(
                PolicyStatement.Builder.create()
                        .effect(Effect.ALLOW)
                        .actions(List.of("dynamodb:PutItem"))
                        .resources(List.of(table.getTableArn()))
                        .build()
        )))
        .onCreate(initializeData)
        .onUpdate(initializeData)
        .build();

tableInitializationResource.getNode().addDependency(table);
The PutItem operation will be triggered when the stack is created or when the table is updated (tableName is expected to be different in that case). If that doesn't work for some reason, you can set physicalResourceId to a random value, e.g. a UUID, to trigger the operation on every stack update (the operation is idempotent thanks to the ConditionExpression).
CustomResource allows you to write custom provisioning logic. In this case you could use something like an AWS Lambda function in a custom resource to fetch the data via the API calls and write it to DynamoDB.
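A minimal sketch of such a handler, assuming the CDK Provider framework (aws_cdk.custom_resources.Provider); fetch_seed_records() is a hypothetical stand-in for the API calls, and TABLE_NAME would be passed in as an environment variable from the stack:
import os
import boto3

table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

def fetch_seed_records():
    # Hypothetical placeholder for the API calls that generate the data.
    return [{"id": "0", "data": "example"}]

def on_event(event, context):
    # The Provider framework calls this for Create, Update and Delete events.
    if event["RequestType"] in ("Create", "Update"):
        with table.batch_writer() as batch:
            for item in fetch_seed_records():
                batch.put_item(Item=item)
    # A stable physical ID prevents spurious resource replacement.
    return {"PhysicalResourceId": f"{table.name}-seed"}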

GCloud Dataflow recreate BigQuery table if it gets deleted during job run

I have set up a GCloud Dataflow pipeline which consumes messages from a Pub/Sub subscription, converts them to table rows and writes those rows to a corresponding BigQuery table.
Table destination is decided based on the contents of the Pub/Sub message and will occasionally lead to the situation that a table does not exist yet and has to be created first. For this I use create disposition CREATE_IF_NEEDED, which works great.
However, I have noticed that if I manually delete a newly created table in BigQuery while the Dataflow job is still running, Dataflow will get stuck and will not recreate the table. Instead I get an error:
Operation ongoing in step write-rows-to-bigquery/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 05m00s without outputting or completing in state finish at sun.misc.Unsafe.park(Native Method) at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at
java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at
java.util.concurrent.FutureTask.get(FutureTask.java:191) at
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:816) at
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:881) at
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:143) at
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:115) at
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
If I go back to BigQuery and manually recreate this table, the Dataflow job continues working.
However, I am wondering: is there a way to instruct the Dataflow pipeline to recreate the table if it gets deleted during the job run?
This is not possible in the current BigQueryIO connector. From the GitHub link of the connector present here you will observe that for StreamingWriteFn, which is what your pipeline uses, the table creation is done in getOrCreateTable, which is called from finishBundle. A map of createdTables is maintained, and in finishBundle the table is created only if it is not already present; once it has been created and stored in the map it is not re-created, as shown below:
public TableReference getOrCreateTable(BigQueryOptions options, String tableSpec)
    throws IOException {
  TableReference tableReference = parseTableSpec(tableSpec);
  if (!createdTables.contains(tableSpec)) {
    synchronized (createdTables) {
      // Another thread may have succeeded in creating the table in the meanwhile, so
      // check again. This check isn't needed for correctness, but we add it to prevent
      // every thread from attempting a create and overwhelming our BigQuery quota.
      if (!createdTables.contains(tableSpec)) {
        TableSchema tableSchema = JSON_FACTORY.fromString(jsonTableSchema, TableSchema.class);
        Bigquery client = Transport.newBigQueryClient(options).build();
        BigQueryTableInserter inserter = new BigQueryTableInserter(client);
        inserter.getOrCreateTable(tableReference, WriteDisposition.WRITE_APPEND,
            CreateDisposition.CREATE_IF_NEEDED, tableSchema);
        createdTables.add(tableSpec);
      }
    }
  }
  return tableReference;
}
To meet your requirement you might have to maintain your own version of BigQueryIO in which you don't perform this specific check:
if (!createdTables.contains(tableSpec)) {
The more important question, though, is why a table would get deleted by itself in a production system. That problem should be fixed rather than trying to re-create the table from Dataflow.

Daily AWS Lambda not creating Athena partition, however command runs successfully

I have an Athena database set up pointing at an S3 bucket containing ALB logs, and it all works correctly. I partition the table by a column called datetime and the idea is that it has the format YYYY/MM/DD.
I can manually create partitions through the Athena console, using the following command:
ALTER TABLE alb_logs ADD IF NOT EXISTS PARTITION (datetime='2019-08-01') LOCATION 's3://mybucket/AWSLogs/myaccountid/elasticloadbalancing/eu-west-1/2019/08/01/'
I have created a Lambda that runs daily to create a new partition, however this doesn't seem to work. I use the boto3 Python client and execute the following:
result = athena.start_query_execution(
    QueryString="ALTER TABLE alb_logs ADD IF NOT EXISTS PARTITION (datetime='2019-08-01') LOCATION 's3://mybucket/AWSLogs/myaccountid/elasticloadbalancing/eu-west-1/2019/08/01/'",
    QueryExecutionContext={
        'Database': 'web'
    },
    ResultConfiguration={
        "OutputLocation": "s3://aws-athena-query-results-093305704519-eu-west-1/Unsaved/"
    }
)
This appears to run successfully without any errors, and the query execution even returns a QueryExecutionId as it should. However, if I run SHOW PARTITIONS web.alb_logs; via the Athena console, the partition has not been created.
I have a feeling it could be down to permissions, but I have given the Lambda execution role full permissions to all resources on S3 and full permissions to all resources on Athena, and it still doesn't seem to work.
Since Athena query execution is asynchronous, your Lambda function never sees the result of the query execution; it only gets the result of starting the query.
I would be very surprised if this wasn't a permissions issue, but because of the above the error will not appear in the Lambda logs. What you can do is log the query execution ID and look it up with the GetQueryExecution API call to confirm whether the query actually succeeded.
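A sketch of that check with boto3, reusing the athena client and result from the code above; the one-second polling interval is an arbitrary choice:
import time

def wait_for_query(athena, query_execution_id):
    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            # StateChangeReason spells out failures such as missing permissions.
            return status["State"], status.get("StateChangeReason")
        time.sleep(1)

state, reason = wait_for_query(athena, result["QueryExecutionId"])
print(state, reason)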
Even better would be to rewrite your code to use the Glue APIs directly to add the partitions. Adding a partition is a quick and synchronous operation in Glue, which means you can make the API call and get a status in the same Lambda execution. Have a look at the APIs for working with partitions: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html
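A sketch of adding the partition through Glue instead, using the database, table, and location from the question; the Lambda role would also need glue:GetTable and glue:CreatePartition:
import boto3

glue = boto3.client("glue")

def add_partition(database, table, value, location):
    # Reuse the table's storage descriptor so the partition inherits the
    # table's SerDe and input/output formats, overriding only the location.
    table_def = glue.get_table(DatabaseName=database, Name=table)["Table"]
    storage = dict(table_def["StorageDescriptor"], Location=location)
    glue.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput={
            "Values": [value],  # one value per partition key, here `datetime`
            "StorageDescriptor": storage,
        },
    )

add_partition(
    "web",
    "alb_logs",
    "2019-08-01",
    "s3://mybucket/AWSLogs/myaccountid/elasticloadbalancing/eu-west-1/2019/08/01/",
)
Unlike ADD IF NOT EXISTS, create_partition raises AlreadyExistsException when the partition is already there, so catch that exception if the Lambda can run more than once for the same day.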

Lambda Function Inconsistently Missing DynamoDB Triggers

Perhaps I have a misunderstanding of something, but I have a Lambda function that is triggered by new items being added to a DynamoDB table. My trigger is configured like so:
DynamoDB Table Name: My Table
Batch Size: 100
Starting Position: Latest
My function's code filters out any events that are not INSERT, and for the most part this is functioning well. I am noticing, however, that some of my new records occasionally do not trigger the Lambda function (I update the record with a completed tag when the function has run). I cannot find any rhyme or reason as to why, but I'm wondering whether I'm misunderstanding what the batch size means (I want every new record to trigger the function to run, as my users will be publishing individual records to the table).
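In simplified form, the filtering looks something like this (a sketch; process() stands in for the real work):
def handler(event, context):
    # A stream invocation delivers up to `batch size` records;
    # each record's eventName is INSERT, MODIFY, or REMOVE.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]  # DynamoDB-typed attribute map
        process(new_image)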
Is this common behavior or is there more I could share to learn what could be causing this?