Create custom AWS CloudWatch metric with ID from Postgres table - amazon-web-services

I have an interesting problem I need to resolve. I have a table A in Postgres. This table is treated like a queue holding a set of tasks; ID is an incremental id in Postgres.
I want a metric containing the current processed position (ID) and the maximum ID. Both numbers grow every second.
Is there an efficient way to do it?
The easiest way off the top of my head is to execute a SQL query like this every 10 seconds (varies):
SELECT id FROM table_a ORDER BY id ASC LIMIT 1;
to get the smallest id, and the same approach with ORDER BY id DESC to get the largest id.
But this query is expensive. Is there any better way to do this?

When you insert a new record into the table, return the record ID. When you extract a record do the same. You could cache this in memory, a file, a different DB table, etc. Then run a scheduled task to post these values to CloudWatch as a custom metric.
Example (very simple) SQL statement to return the ID when inserting new records:
INSERT INTO table_a (name) VALUES ('bob') RETURNING id;
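A minimal sketch of the scheduled-task half of this, assuming the cached processed ID and max ID are read from wherever you stored them; the namespace and metric names here are made up for illustration:

import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_queue_position(processed_id, max_id):
    # Push the two cached IDs to CloudWatch as custom metrics.
    # "TaskQueue", "ProcessedId" and "MaxId" are placeholder names.
    cloudwatch.put_metric_data(
        Namespace="TaskQueue",
        MetricData=[
            {"MetricName": "ProcessedId", "Value": float(processed_id), "Unit": "None"},
            {"MetricName": "MaxId", "Value": float(max_id), "Unit": "None"},
        ],
    )

# Call this from the scheduled task, e.g. every 10 seconds:
# publish_queue_position(processed_id=1200, max_id=1250)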

Related

How to get AWS QuickSight to show the old and new value of a particular column of a table (for comparison purposes)?

What I have seen so far is that the AWS Glue crawler creates the table based on the latest changes in the S3 files.
Let's say the crawler creates a table and then I upload a CSV with updated values in one column. The crawler is run again and it updates the table's column with the updated values. I want to be able to eventually show a comparison of the old and new data in QuickSight; is this scenario possible?
For example, right now my CSV file is set up as details of one AWS service: RDS is the CSV file name and the columns are account id, account name, what region it is in, etc.
There was one percentage column with a value of 50%, and it gets updated to 70%. Would I be able to somehow get the old value as well to show in QuickSight, to say that previously it was 50% and now it's 70%?
Maybe this scenario is not even valid? I want to be able to show what account has what cost in a given month and how the cost differs in other months. If I make separate tables on each update of the CSV then there would be 1000+ tables at some point.
If I have understood your question correctly, you are aiming to track data over time. Above you suggest creating a table for each time series; why not instead maintain a record in a table for each time series? You can then create various analyses over the data, comparing specific months or tracking month-by-month values.
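One minimal sketch of that idea, assuming you control the step that produces the CSV: stamp each load with a snapshot date before it lands in S3, so the crawled table accumulates records over time instead of being overwritten. The bucket, file and column names are placeholders, and pandas needs s3fs installed to write to s3:// paths:

import datetime
import pandas as pd

# Load the latest export, e.g. the RDS details CSV described above.
df = pd.read_csv("rds_details.csv")

# Tag every row with the date of this load.
today = datetime.date.today().isoformat()
df["snapshot_date"] = today

# Write to a new S3 key per load instead of overwriting the old one,
# so the table keeps both the old 50% row and the new 70% row.
df.to_csv(f"s3://my-bucket/rds_details/snapshot_date={today}/data.csv", index=False)

In QuickSight you can then filter or pivot on snapshot_date to show the old and new values side by side.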

Automatic job to delete BigQuery table records

Is there a way to schedule deletion of rows from a BigQuery table based on a column condition? Something like a job scheduled to run every day.
For example, let's say I have a column called creation_date in the table. I need to delete records when creation_date is less than the current date minus one week (creation_date < current date - 7). I need the job to run every day at a specified time and delete records based on the creation_date condition.
If there aren't any built-in scheduler operations, could you suggest any options available?
You have a couple of simple options within BigQuery itself.
The simplest is likely scheduled queries. This just executes a statement on a schedule; you can run a DELETE statement or take some other approach (see the sketch below).
Additionally you could set table or partition expirations. This one involves a little more legwork but would achieve a similar result. Based on your description it would likely be a partition expiration you would want to set up.
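For the scheduled-query route, a minimal sketch of the statement involved, wrapped here in the Python BigQuery client so the same DELETE could also be fired from cron or Cloud Scheduler; the dataset and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Delete rows older than one week; my_dataset.my_table stands in for
# your actual dataset and table names, creation_date is the column
# from the question (assumed to be a DATE column here).
sql = """
DELETE FROM `my_dataset.my_table`
WHERE creation_date < DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""

client.query(sql).result()  # blocks until the DELETE job completes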

DynamoDB sorted pull

So I'm working with DynamoDB. My primary key is some unique identifier, and one of the other columns is a timestamp column. I'd like to pull just the first N events and then continue to pull later events if the client requests this.
I don't know if this is possible to do in Dynamo. Is it possible? How would I go about it? I can redesign the table if needed.
Right now, it seems like the only way to do this would be to query all the data in the table and then sort the results returned.

How to select a partition key for a DynamoDB query?

I have created a DynamoDB table named "sample". It has the columns below. CreatedDate holds the creation time of any record inserted into this table.
Itemid,
ItemName,
ItemDescription,
CreatedDate,
UpdatedDate
I am creating a python-flask based REST API which always fetches the last 100 records inserted into this table. This API (python-flask function) does not have any input parameters; it should just return the most recently inserted records.
Question 1
What should be the partition key for this table? I am using the boto3 library to fetch records from DynamoDB. I prefer not to do a scan operation because it may cause performance issues. If I use the query function it asks for a partition key; since this REST API does not accept any input, I am not sure what to use.
Question 2
Has anyone faced a similar situation? And what was done to fix it?
Note: I am pretty much a newbie to DynamoDB, NoSQL and Boto.
To query your table using CreatedDate without knowing the ItemId, you can use Global Secondary Index write sharding by adding an attribute (e.g., ShardId) containing a (0-N) value to every item that you will use for the global secondary index partition key.
Depending on how your items are distributed against CreatedDate, you can set the ShardId so that it is likely to have evenly distributed access patterns. For example: YYYY, YYYYMM or YYYYMMDD. Then, you create a global secondary index with ShardId as an index partition key and CreatedDate as an index sort key.
Knowing the partition key for your GSI (since the ShardId value is derived from CreatedDate), you can query the table for the 100 most recent items with the query's Limit parameter (and page with LastEvaluatedKey if your item set is larger than 1 MB of data).
See Using Global Secondary Index Write Sharding for Selective Table Queries.
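A minimal sketch of what that query might look like with boto3, assuming a GSI named CreatedDateIndex with ShardId as partition key and CreatedDate as sort key; the index name and shard value are placeholders:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("sample")

# ShardId derived from CreatedDate, e.g. a monthly bucket such as "202405".
response = table.query(
    IndexName="CreatedDateIndex",            # hypothetical GSI name
    KeyConditionExpression=Key("ShardId").eq("202405"),
    ScanIndexForward=False,                  # newest CreatedDate first
    Limit=100,
)
items = response["Items"]
# If fewer than 100 items came back, repeat the query against the previous
# shard (e.g. "202404"), or page with response.get("LastEvaluatedKey").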

How to query a Dynamo DB table without knowing the table name before runtime?

I want to query a DynamoDB table based on an attribute UpdateTime such that I get the records which were updated in the last 24 hours. But this attribute is not an index in the table. I understand that I need to make this attribute an index, but I do not know how to write a query expression for this.
I saw this question but the problem is I do not know the table name on which I want to query before runtime.
To find out the table names in your DynamoDB instance, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table to get the results you want. Keep in mind that scanning a table is a costly operation.
Alternatively you can create a DynamoDB Stream that captures all of the changes to your tables: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.
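A minimal sketch of the list-then-scan route with boto3; the attribute name comes from the question, but the ISO-8601 timestamp format is an assumption, and as noted above a scan reads the whole table:

import boto3
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
client = boto3.client("dynamodb")

# Discover table names at runtime via ListTables.
table_names = client.list_tables()["TableNames"]

# Records updated in the last 24 hours.
cutoff = (datetime.utcnow() - timedelta(hours=24)).isoformat()

for name in table_names:
    table = dynamodb.Table(name)
    # Assumes UpdateTime is stored as an ISO-8601 string, so string
    # comparison matches chronological order.
    response = table.scan(FilterExpression=Attr("UpdateTime").gt(cutoff))
    recent_items = response["Items"]
    # For large tables, keep scanning with response.get("LastEvaluatedKey").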