AWS Elasticsearch snapshot stuck in state IN_PROGRESS

I am using the Elasticsearch Service from AWS. I am receiving a Snapshot failure status under Overall health. I can see there is one snapshot that has been stuck for almost 2 days:
id: 2020-07-13t13-30-56.2a009367-21fd-48ab-accc-36a3f61db683
status: IN_PROGRESS
start_epoch: 1594647056
start_time: 13:30:56
end_epoch: 0
end_time: 00:00:00
duration: 1.8d
indices: 342
successful_shards: 0
failed_shards: 0
total_shards: 0
I am not allowed to DELETE this snapshot:
{"Message":"Your request: '/_snapshot/cs-automated-enc/2020-07-13t13-30-56.2a009367-21fd-48ab-accc-36a3f61db683' is not allowed."}
I do not know what to do next. I am not able to fix it, and a lot of API calls do not work as they normally would. The issue eventually resolved itself after 2 days, but I still do not know how to fix it or where the problem was.
Questions:
Can I configure where, and how often, Elasticsearch creates a snapshot of the whole cluster? Or can I at least choose which indices get snapshotted?
Can I see the files in cs-automated-enc in S3, or is that repository not available to the user because it is part of the AWS Elasticsearch service?
Are snapshots stored in cs-automated-enc included in the Elasticsearch service price?
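For reference: the automated snapshots in cs-automated-enc are managed by the service itself, so the repository is not browsable from your own S3 account, the schedule is managed by the service, and the automated snapshots are included in the service price. If you want to choose the storage location and which indices get snapshotted, you can register a manual repository in your own bucket. A rough sketch using signed requests, where the domain endpoint, bucket, role ARN, and index names are placeholders:

import boto3
import requests
from requests_aws4auth import AWS4Auth

host = "https://my-domain.eu-west-1.es.amazonaws.com"  # placeholder domain endpoint
region = "eu-west-1"
creds = boto3.Session().get_credentials()
awsauth = AWS4Auth(creds.access_key, creds.secret_key, region, "es", session_token=creds.token)

# Register a manual snapshot repository backed by your own S3 bucket.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "my-es-snapshots",  # placeholder bucket
        "region": region,
        "role_arn": "arn:aws:iam::123456789012:role/my-es-snapshot-role",  # role the domain can assume
    },
}
r = requests.put(f"{host}/_snapshot/my-manual-repo", auth=awsauth, json=repo_body)
print(r.status_code, r.text)

# Snapshot only selected indices into that repository.
r = requests.put(
    f"{host}/_snapshot/my-manual-repo/snapshot-2020-07-15",
    auth=awsauth,
    json={"indices": "index-a,index-b", "include_global_state": False},
)
print(r.status_code, r.text)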

Related

GCP Datastore times out on large download

I'm using Objectify to access my GCP Datastore set of Entities. I have a full list of around 22,000 items that I need to load into the frontend:
List<Record> recs = ofy().load().type(Record.class).order("-sync").list();
The number of records has recently increased and I get an error from the backend:
com.google.apphosting.runtime.HardDeadlineExceededError: This request (00000185caff7b0c) started at 2023/01/19 17:06:58.956 UTC and was still executing at 2023/01/19 17:08:02.545 UTC.
I thought that the move to Cloud Firestore in Datastore mode last year would have fixed this problem.
My only solution is to break down the load() into batches using 2 or 3 calls to my Ofy Service.
Is there a better way to grab all these Entities in one go?
Thanks
Tim
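The question is written against Objectify, but the underlying fix is the same in any client: page through the query with cursors instead of materializing all ~22,000 entities in one call. As a language-neutral illustration of that batching idea, a sketch using the Python google-cloud-datastore client (the Record kind and -sync ordering mirror the names above; the page size is arbitrary):

from google.cloud import datastore

client = datastore.Client()

def fetch_all_records(page_size=1000):
    # Page through all Record entities with cursors instead of one giant list().
    records, cursor = [], None
    while True:
        query = client.query(kind="Record")
        query.order = ["-sync"]
        it = query.fetch(limit=page_size, start_cursor=cursor)
        page_items = list(next(it.pages))  # one RPC per page
        records.extend(page_items)
        cursor = it.next_page_token
        if cursor is None or len(page_items) < page_size:
            break
    return records

recs = fetch_all_records()
print(len(recs))

In Objectify the equivalent is typically done by re-running the query with a limit plus the cursor returned by the previous batch.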

Export RDS query to S3 gives an error after some time

I am trying to export data from a table in my PostgreSQL database to S3. When I execute the query everything goes well and the data is exported correctly to S3, until suddenly, after about 16 hours, the query gives an error:
ERROR: could not upload to Amazon S3
DETAIL: Amazon S3 client returned 'Unable to parse ExceptionName: ExpiredToken Message: The provided token has expired.'.
CONTEXT: SQL function "query_export_to_s3" statement 1
What could be the problem? I thought that the token was renewed 5 minutes before its expiration.
UPDATE: The role we use to execute the query has a session duration of 12h
More updates: The query I am running migrates several hundred GB of data to S3, probably around 500 GB. I ran a separate query to verify the number of records; the total is 500 million, and that query took 4 hours to complete. I then ran the query to export those 500 million records to S3, and after about 16 hours I got the message you see above.
In S3 the result was saved in parts of 6 GB.
We repeated the query that exports to S3 about 3 times, always with the same result: after about 16 hours I get the expired-token error.
I'm running the query from an EC2 instance.
Please check the AWS authentication documentation:
The minimum session duration is 1 hour, and can be set to a maximum of 12 hours.
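Given that 12-hour cap, one workaround is to split the export into several aws_s3.query_export_to_s3 calls so that each call finishes well within the credential lifetime. A rough sketch with psycopg2, assuming the table has a numeric id column to range over; the connection details, table name, bucket, and chunk size are placeholders:

import psycopg2

conn = psycopg2.connect(
    host="mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="mydb", user="myuser", password="...",
)
conn.autocommit = True

chunk = 50_000_000        # rows per export call; size it so each call stays well under 12 h
total = 500_000_000
with conn.cursor() as cur:
    for part, lo in enumerate(range(0, total, chunk)):
        inner = f"SELECT * FROM big_table WHERE id >= {lo} AND id < {lo + chunk}"
        cur.execute(
            "SELECT * FROM aws_s3.query_export_to_s3(%s, aws_commons.create_s3_uri(%s, %s, %s))",
            (inner, "my-export-bucket", f"export/part-{part}", "eu-west-1"),
        )

Each chunk then completes inside a single session's lifetime, and a failed chunk can be retried without redoing the whole 500 million rows.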

Where to find node logs in AWS EMR cluster?

I have a PySpark program running on an AWS EMR cluster.
The cluster config is: emr-5.31.0, Hadoop 2.10.0, Hive 2.3.7, Hue 4.7.1, Pig 0.17.0.
The program processes some files on the HDFS file system, but at some point it starts getting errors.
In the Amazon console - YARN applications - application_XXX (Spark) - executors - driver - stderr:
'could not obtain block ... file=
A little before this message there is 'Task 0 in stage 35 failed 4 times. aborting job'
If I go to the Amazon console - YARN applications - application_XXX (Spark) - stages - 35 - tasks - 0 - stdout - I don't see anything bad at first glance except a lot of 'GC (Allocation Failure)' messages.
In its stderr there is a WARN: 'Could not obtain block XXX, file= No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException.'
If I go to the monitoring tab - node status - I see that one node became unhealthy at that time, and that's it. The number of nodes also changed in the 'live data nodes', 'MR total nodes', 'MR active nodes', and 'MR lost nodes' charts.
As I understand it, the task cannot find the file on HDFS because the node it was hosted on became unhealthy.
My question is: where can I find the reason the node became unhealthy? I wasn't able to find any other logs in the Amazon console. Maybe there are some node-local places where this reason is stored?
Hi, I launched an EMR cluster myself some time ago and don't remember the details about the logs, but consulting the docs here:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html
it states that the logs are stored on the machines (for which I assume you have the keys); they are also stored in S3 by default. I'm not sure in which bucket they will be created.
Best Regards :)
On the Summary page for your EMR cluster there is a section named "Configuration details".
Below that, there is a label named "Log URI". It points to an S3 URI, but there is also a small folder icon.
Click on that icon and you can browse to the logs on the nodes for your EMR cluster.
Actually, for Amazon there are more logs accessible via the S3 location: there are logs for the node boot and configuration part, and logs from the services running on the node (HDFS and YARN), which is what I was looking for. The path looks like this: s3 location/cluster id/node/node id/applications. Here I was able to find the HDFS and YARN logs.
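If you prefer to script this, the same node logs can be listed from the cluster's Log URI with boto3. A minimal sketch, where the cluster id is a placeholder and the prefix parsing assumes the usual s3://bucket/prefix/ form of the Log URI:

import boto3

cluster_id = "j-XXXXXXXXXXXXX"  # placeholder cluster id
emr = boto3.client("emr")
log_uri = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["LogUri"]

# LogUri looks like s3n://my-log-bucket/logs/ ; split it into bucket and key prefix.
bucket, _, prefix = log_uri.replace("s3n://", "").replace("s3://", "").partition("/")

# Node-level logs (boot, hdfs, yarn) live under <prefix><cluster-id>/node/<node-id>/...
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=f"{prefix}{cluster_id}/node/"):
    for obj in page.get("Contents", []):
        print(obj["Key"])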

Errors importing large CSV file to DynamoDB using Lambda

I want to import a large CSV file (around 1 GB, with 2.5m rows and 50 columns) into DynamoDB, so I have been following this blog from AWS.
However, it seems I'm up against a timeout issue. I got to ~600,000 rows ingested, and then it falls over.
Reading the CloudWatch log, I think the timeout occurs during the boto3 read of the CSV file (it opens the entire file first, then iterates through it and batches items up for writing)... I tried reducing the file size (3 columns, 10,000 rows as a test), and I still got a timeout after 2,500 rows.
Any thoughts here?!
TIA :)
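For anyone hitting the same wall: before abandoning Lambda, one thing to try is streaming the object from S3 and writing with a batch writer, instead of reading the whole 1 GB file into memory first. A rough sketch with hypothetical bucket, key, and table names; note that the 15-minute Lambda limit can still be a problem for 2.5m rows, so very large files may need to be split or fanned out across invocations:

import csv
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("my-import-table")  # placeholder table

def handler(event, context):
    obj = s3.get_object(Bucket="my-import-bucket", Key="data.csv")  # placeholder bucket/key
    # Stream the body line by line rather than loading the entire file.
    lines = (line.decode("utf-8") for line in obj["Body"].iter_lines())
    rows = csv.reader(lines)
    header = next(rows)
    # batch_writer() buffers items into BatchWriteItem calls and resends unprocessed items.
    with table.batch_writer() as batch:
        for values in rows:
            batch.put_item(Item=dict(zip(header, values)))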
I really appreciate the suggestions (Chris & Jarmod). After trying and failing to break things programmatically into smaller chunks, I decided to look at the approach in general.
Through research I understood there were 4 options:
Lambda Function - as per the above this fails with a timeout.
AWS Pipeline - Doesn't have a template for importing CSV to DynamoDB
Manual Entry - of 2.5m items? no thanks! :)
Use an EC2 instance to load the data to RDS and use DMS to migrate to DynamoDB
The last option actually worked well. Here's what I did:
Create an RDS database (I used the db.t2.micro tier as it was free) and create a blank table.
Create an EC2 instance (free Linux tier) and:
On the EC2 instance: use SCP to upload the CSV file to the EC2 instance.
On the EC2 instance: first sudo yum install mysql to get the tools needed, then use mysqlimport with the --local option to import the CSV file into the RDS MySQL database, which took literally seconds to complete.
At this point I also did some data cleansing to remove some white spaces and some character returns that had crept into the file, just using standard SQL queries.
Using DMS I created a replication instance, endpoints for the source (RDS) and target (DynamoDB) databases, and finally created a task to import.
The import took around 4 hr 30 min.
After the import, I removed the EC2, RDS, and DMS objects (and associated IAM roles) to avoid any potential costs.
Fortunately, I had a flat structure to work against, and it was only one table. I needed the cheap speed of DynamoDB; otherwise I'd have stuck with RDS (I almost did, halfway through the process!!!).
Thanks for reading, and best of luck if you have the same issue in the future.
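For what it's worth, the DMS leg of this can also be scripted rather than clicked through the console. A minimal boto3 sketch, where the replication instance and endpoint ARNs, schema, and table name are placeholders for the objects created above:

import json
import boto3

dms = boto3.client("dms")

task = dms.create_replication_task(
    ReplicationTaskIdentifier="csv-rds-to-dynamodb",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",   # placeholder
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE", # placeholder
    MigrationType="full-load",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": "mydb", "table-name": "my_table"},  # placeholders
            "rule-action": "include",
        }]
    }),
)

# The task has to reach the "ready" state before it can be started
# (poll describe_replication_tasks or use the replication_task_ready waiter).
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)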

Amazon Redshift query aborts automatically after 1 hour

I have around 500 GB of compressed data in Amazon S3. I wanted to load this data into Amazon Redshift. For that, I created a table in AWS Athena, and I am trying to load the data into an internal table in Amazon Redshift.
Loading this much data into Amazon Redshift takes more than an hour. The problem is that when I fire a query to load the data, it gets aborted after 1 hour. I tried it 2-3 times, but it always gets aborted after 1 hour. I am using the Aginity tool to fire the query, and the Aginity tool shows the query as currently running with the loader spinning.
More Details:
The Redshift cluster has 12 nodes with 2 TB of space per node, and I have used 1.7 TB.
The S3 files are not all the same size. One of them is 250 GB; some of them are in the MB range.
I am using the command
create table table_name as select * from athena_schema.table_name
It stops exactly after 1 hour.
Note: I have set the current query timeout in Aginity to 90000 sec.
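One thing worth checking when a Redshift query dies at a round wall-clock limit like this is the WLM queue timeout (max_execution_time), since it cancels queries server-side regardless of any client setting. A small sketch that reads it with psycopg2; the connection details are placeholders:

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.eu-west-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="mydb", user="myuser", password="...",
)
with conn.cursor() as cur:
    # Service classes 6-13 correspond to the user-defined WLM queues; max_execution_time is in ms (0 = no limit).
    cur.execute(
        "SELECT service_class, name, max_execution_time "
        "FROM stv_wlm_service_class_config WHERE service_class BETWEEN 6 AND 13;"
    )
    for service_class, name, timeout_ms in cur.fetchall():
        print(service_class, name.strip(), timeout_ms)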
I know this is an old thread, but for anyone coming here because of the same issue: I've realised that, at least in my case, the problem was the Aginity client; it's not related to Redshift or its Workload Manager, but only to that third-party client. In summary, use a different client like SQL Workbench and run the COPY command from there.
Hope this helps!
Carlos C.
More information about my environment:
Redshift:
Cluster Type: Multi Node
Cluster: ds2.xlarge
Nodes: 4
Cluster Version: 1.0.4852
Client Environment:
Aginity Workbench for Redshift
Version 4.9.1.2686 (build 05/11/17)
Microsoft Windows NT 6.2.9200.0 (64-bit)
Network:
Connected to OpenVPN, via SSH Port tunneling.
The connection is not being dropped. This issue is only affecting the COPY command. The connection remains active.
Command:
copy tbl_XXXXXXX
from 's3://***************'
iam_role 'arn:aws:iam::***************:role/***************';
S3 Structure:
120 files of 6.2 GB each, and 20 files of 874 MB.
Output:
ERROR: 57014: Query (22381) cancelled on user's request
Statistics:
Start: ***************
End: ***************
Duration: 3,600.2420863
I'm not sure if the following answer will solve your exact problem of the timeout at exactly 1 hour.
But, based on my experience, loading data into Redshift via the COPY command is the best and fastest way, so I feel that the timeout issue shouldn't happen at all in your case.
The COPY command in Redshift can load data from S3 or via SSH.
e.g.
Simple copy
copy sales
from 'emr://j-SAMPLE2B500FC/myoutput/part-*'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
delimiter '\t' lzop;
e.g. Using a manifest
copy customer
from 's3://mybucket/cust.manifest'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
manifest;
PS: Even if you use a manifest and divide your data into multiple files, it will be even faster, as Redshift loads data in parallel.
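And if the abort really does come from the client side, running the COPY from a small script removes any GUI-imposed timeout from the picture. A rough sketch with psycopg2; the connection details, table, manifest path, and role ARN are placeholders, GZIP assumes gzip-compressed input, and note that a WLM queue timeout could still cancel the statement server-side:

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.eu-west-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="mydb", user="myuser", password="...",
)
conn.autocommit = True
with conn.cursor() as cur:
    # Lift the session statement timeout so a multi-hour COPY is not cancelled by it.
    cur.execute("SET statement_timeout TO 0;")
    cur.execute("""
        COPY tbl_sales
        FROM 's3://my-bucket/sales/manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        MANIFEST GZIP;
    """)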