Export RDS query to S3 gives an error after some time

I am trying to export data from a table in my PostgreSQL database to S3. When I execute the query everything goes well and the data is exported correctly to S3, until suddenly, after about 16 hours, the query fails with this error:
ERROR: could not upload to Amazon S3
DETAIL: Amazon S3 client returned 'Unable to parse ExceptionName: ExpiredToken Message: The provided token has expired.'.
CONTEXT: SQL function "query_export_to_s3" statement 1
What could be the problem? I thought that the token was renewed 5 minutes before its expiration.
UPDATE: The role we use to execute the query has a session duration of 12 hours.
More updates: The query I am running migrates a large amount of data to S3, probably around 500 GB. I ran a separate query to verify the number of records; the total is 500 million, and that query took 4 hours to complete. I then ran the query to export those 500 million records to S3, and after about 16 hours I got the message you see above.
In S3 the result was saved in parts of 6 GB.
We have repeated the export query about 3 times, and the result is always the same: after about 16 hours I get the expired-token error.
I'm running the query from an EC2 instance.
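For reference, the export uses the aws_s3.query_export_to_s3 function (the one named in the error). A simplified sketch of the kind of call, driven from Python with psycopg2 and with placeholder connection, table, and bucket names rather than my actual ones:
import psycopg2

# All names below are placeholders, not the real database or bucket.
conn = psycopg2.connect(host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
                        dbname="mydb", user="export_user", password="***")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT * FROM aws_s3.query_export_to_s3(
            'SELECT * FROM big_table',
            aws_commons.create_s3_uri('my-export-bucket', 'exports/big_table', 'us-east-1')
        )
    """)
    print(cur.fetchall())  # (rows_uploaded, files_uploaded, bytes_uploaded)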

Please check the AWS authentication documentation:
The minimum session duration is 1 hour, and it can be set to a maximum of 12 hours.
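Since IAM role credentials cannot outlive the role's maximum session duration (12 hours at the upper bound), a 16-hour export will eventually be running on an expired token. A minimal boto3 sketch for checking that setting, assuming a hypothetical role name:
import boto3

iam = boto3.client("iam")

# "rds-s3-export-role" is a hypothetical name; use the role your export runs under.
role = iam.get_role(RoleName="rds-s3-export-role")
print("MaxSessionDuration (seconds):", role["Role"]["MaxSessionDuration"])
If the value is already at the 43200-second ceiling, one practical option is to split the export into smaller pieces so that each one finishes well within the session window.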

Related

GCP Datastore times out on large download

I'm using Objectify to access my GCP Datastore set of entities. I have a full list of around 22,000 items that I need to load into the frontend:
List<Record> recs = ofy().load().type(Record.class).order("-sync").list();
The number of records has recently increased and I get an error from the backend:
com.google.apphosting.runtime.HardDeadlineExceededError: This request (00000185caff7b0c) started at 2023/01/19 17:06:58.956 UTC and was still executing at 2023/01/19 17:08:02.545 UTC.
I thought that the move to Cloud Firestore in Datastore mode last year would have fixed this problem.
My only solution is to break down the load() into batches using 2 or 3 calls to my Ofy Service.
Is there a better way to grab all these Entities in one go?
Thanks
Tim

How to increase your Quicksight SPICE data refresh frequency

Quicksight only supports 24 refreshes per 24 hours for FULL REFRESH.
I want to refresh the data every 30 minutes.
Answer:
Scenario:
Let us say I want to fetch the data from the source (Jira) and push it to SPICE and render it in Quicksight Dashboards.
Requirement:
Push the data once every 30 minutes.
Quicksight supports the following:
Full refresh
Incremental refresh
Full refresh:
Process - Old data is replaced with new data.
Frequency - Once every hour
Refresh count - 24 / Day
Incremental refresh:
Process - New data gets appended to the dataset.
Frequency - Once every 15 minutes
Refresh count - 96 / Day
Issue:
We need to push the data once every 30 minutes.
It is going to be a FULL_REFRESH.
When it comes to full refresh, Quicksight only supports hourly scheduling.
Solution:
We can leverage API support from AWS.
Package - Python Boto 3
Class - Quicksight.client
Method - create_ingestion
Process - You can manually refresh datasets by starting a new SPICE ingestion.
Refresh cycle: Each 24-hour period is measured starting 24 hours before the current date and time.
Limitations:
Enterprise edition accounts: 32 times in a 24-hour period.
Standard edition accounts: 8 times in a 24-hour period.
Sample code:
Python - Boto for AWS:
import boto3
client = boto3.client('quicksight')
response = client.create_ingestion(
    DataSetId='string',
    IngestionId='string',
    AwsAccountId='string',
    IngestionType='INCREMENTAL_REFRESH'|'FULL_REFRESH'
)
awswrangler (cancelling a running ingestion):
import awswrangler as wr
wr.quicksight.cancel_ingestion(ingestion_id="jira_data_sample_refresh", dataset_name="jira_db")
CLI:
aws quicksight create-ingestion --data-set-id dataSetId --ingestion-id jira_data_sample_ingestion --aws-account-id AwsAccountId --region us-east-1
API:
PUT /accounts/AwsAccountId/data-sets/DataSetId/ingestions/IngestionId HTTP/1.1
Content-type: application/json
{
  "IngestionType": "string"
}
Conclusion:
Using this approach we can achieve 56 full refreshes per day for our dataset (24 scheduled refreshes plus 32 API-triggered ingestions on Enterprise edition). We can also go one step further, find the peak hours of our source tool (Jira), and configure the data refresh accordingly. This way we can even achieve a refresh frequency of once every 10 minutes.
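As a concrete illustration of the API-driven part, here is a minimal sketch of a job you could schedule every 30 minutes (for example via cron or an EventBridge-triggered Lambda). The account ID, dataset ID, and ingestion-ID prefix are placeholders:
import boto3
from datetime import datetime, timezone

quicksight = boto3.client("quicksight")

def trigger_full_refresh():
    # Ingestion IDs must be unique per dataset, so derive one from the current timestamp.
    ingestion_id = "jira-refresh-" + datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    response = quicksight.create_ingestion(
        AwsAccountId="111122223333",   # placeholder account ID
        DataSetId="jira-dataset-id",   # placeholder dataset ID
        IngestionId=ingestion_id,
        IngestionType="FULL_REFRESH",
    )
    return response["IngestionStatus"]

if __name__ == "__main__":
    print(trigger_full_refresh())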
Ref:
Quicksight
Quicksight Gallery
SPICE
Boto - Python
Boto - Create Ingestion
AWS Wrangler
CLI
API

What is the concurrency error in the AWS StartQueryExecution operation and how do I solve it?

I am currently facing an issue in a project where the S3 buckets contain on average 50 tables, and after running the Glue job I see the following error. I don't think it is an issue of memory or worker nodes.
{
"Error":"States.TaskFailed",
"Cause":"{\"AllocatedCapacity\":5,\"Arguments\":{\"--quotes_col_list\":\"Null\",\"--processed_prefix\":\"processed/cat2/uber/\",\"--replicated_prefix\":\"replicated/cat2/uber/\",\"--table_folder\":\"SALES_ORDER_DOCUMENT_TYPE/\",\"--devops_prefix\":\"uber_processing/glue_configuration/rename_glue_file/replicated/uber/\",\"--tablename\":\"sales_order_document_type\",\"--companies\":\"uber\",\"--metadata_path\":\"cat2/cat2_metadata.csv\",\"--reject_prefix\":\"reject/cat2/uber/\"},\"Attempt\":0,\"CompletedOn\":1641759367801,\"ErrorMessage\":\"TooManyRequestsException: An error occurred (TooManyRequestsException) when calling the StartQueryExecution operation: You have exceeded the limit for the number of queries you can run concurrently. Please reduce the number of concurrent queries submitted by this account. Contact customer support to request a concurrent query limit increase.\",\"ExecutionTime\":51,\"GlueVersion\":\"2.0\",\"Id\":\"jr_b8haonpeno503no0n3020
\",\"JobName\":\"uber_job\",\"JobRunState\":\"FAILED\",\"LastModifiedOn\":1641759367801,\"LogGroupName\":\"/aws-glue/jobs\",\"MaxCapacity\":5.0,\"NumberOfWorkers\":5,\"PredecessorRuns\":[],\"StartedOn\":1641759312689,\"Timeout\":2880,\"WorkerType\":\"G.1X\"}"
}
When I checked the query function, it doesn't show me any query running in the Glue job.
response = athena_client.start_query_execution(
    QueryString='msck repair table ' + args['audit_table'],
    ResultConfiguration={
        'OutputLocation': args['athena_resultpath']
    }
)
Can someone help me with QueryString='msck repair table ' + args['audit_table'] - what is the argument?
You mentioned the word "concurrency" but didn't mention exactly what the error message is:
"ErrorMessage":"TooManyRequestsException: An error occurred (TooManyRequestsException) when calling the StartQueryExecution operation: You have exceeded the limit for the number of queries you can run concurrently
Athena has some built-in soft limits, as mentioned in the docs:
A DML or DDL query quota includes both running and queued queries. For example, if you are using the default DML quota and your total of running and queued queries exceeds 25, query 26 will result in a TooManyRequestsException error.
You are simply going over the limits, so your query fails - specifically the "DML query quota", I'm assuming. These soft limits are somewhat flexible and can be increased by submitting a request via the Service Quotas console.
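Until the quota is raised, a common mitigation is to reduce how many Glue jobs call Athena at the same time, or to retry StartQueryExecution with a backoff when this exception comes back. A minimal sketch of the retry approach (the client, query, and output location mirror the snippet in the question):
import time
import boto3
from botocore.exceptions import ClientError

athena_client = boto3.client("athena")

def start_query_with_backoff(query, output_location, max_attempts=6):
    # Retry StartQueryExecution while Athena reports too many concurrent queries.
    for attempt in range(max_attempts):
        try:
            return athena_client.start_query_execution(
                QueryString=query,
                ResultConfiguration={"OutputLocation": output_location},
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "TooManyRequestsException":
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("StartQueryExecution is still being throttled after retries")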

AWS Elasticsearch snapshot stuck in state IN_PROGRESS

I am using the Elasticsearch Service from AWS. I am receiving a "Snapshot failure" status in the overall health. I can see there is one snapshot that has been stuck for almost 2 days.
id status start_epoch start_time end_epoch end_time duration indices successful_shards failed_shards total_shards
2020-07-13t13-30-56.2a009367-21fd-48ab-accc-36a3f61db683 IN_PROGRESS 1594647056 13:30:56 0 00:00:00 1.8d 342 0 0 0
I am not allowed to DELETE this snapshot:
{"Message":"Your request: '/_snapshot/cs-automated-enc/2020-07-13t13-30-56.2a009367-21fd-48ab-accc-36a3f61db683' is not allowed."}
I do not know what to do next. I am not able to fix it, and a lot of API calls do not work as they normally should. I can see it suddenly resolved by itself after 2 days, but basically I do not know how to fix it or where the problem was.
Questions:
Can I configure where, and how often, Elasticsearch should create a snapshot of the whole cluster? Or maybe just choose which indices should be snapshotted?
Can I see the files in cs-automated-enc in S3, or is that not available to the user since it is part of the AWS Elasticsearch service?
Are snapshots stored in cs-automated-enc included in the Elasticsearch service price?

AWS LastModified S3 Bucket different

I'm developing a Node.js function that lists the objects in an S3 bucket via the listObjectsV2 call. In the returned JSON results, the date is not the same as the date shown in the S3 bucket nor in an aws cli s3 listing. In fact, they are different days. I'm not sure how this is happening.
Any thoughts?
aws cli ls
aws s3 ls s3://mybucket
2018-11-08 19:38:55 24294 Thought1.mp3
S3 page on AWS (screenshot)
JSON results (screenshot)
They are the same times, but in different timezones.
The listObjectsV2 response is giving you Zulu time (UTC / Greenwich Mean Time), which appears to be 6 hours ahead of your local time.
In the JSON picture you have 2018-11-09T01:38:55.000Z, which is Zulu time (the Z at the very end). It means UTC/GMT time.
In the S3 console picture you have Nov 8, 2018 7:38:55 PM GMT-0600 - this is GMT minus 6 hours (see GMT-0600 at the end), which is likely US Central time or similar. The difference between the two is exactly 6 hours.
The output from the AWS CLI is probably from your local computer and shows local time in 24-hour format without a timezone, so it is harder to see the reason, but it matches the S3 console time.
In general, AWS returns times in the UTC time zone. This is usually quite helpful once you start deploying in multiple time zones. On the other hand, it can become tricky if, for example, you run your code on an EC2 instance that is configured with a different timezone. So be careful when you convert between your local time and UTC - I would suggest using a library like https://momentjs.com/, or you may create more problems for yourself.
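To illustrate (the question uses Node.js, but every SDK behaves the same way), here is a small Python sketch with a placeholder bucket name and timezone, showing that list_objects_v2 returns LastModified as a timezone-aware UTC timestamp which you can convert for display:
import boto3
from zoneinfo import ZoneInfo  # Python 3.9+

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="mybucket")  # placeholder bucket name
for obj in resp.get("Contents", []):
    utc_ts = obj["LastModified"]                               # timezone-aware datetime in UTC
    local_ts = utc_ts.astimezone(ZoneInfo("America/Chicago"))  # GMT-0600 in the example above
    print(obj["Key"], utc_ts.isoformat(), local_ts.isoformat())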