AWS DMS S3 target folders appear and disappear - amazon-web-services

I have a DMS replication task on-going from RDS Aurora (MySQL) to S3.
My S3 endpoint settings are
{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": ",",
    "BucketFolder": "dms",
    "BucketName": "mybucketname",
    "CompressionType": "NONE",
    "EncryptionMode": "SSE_KMS",
    "ServerSideEncryptionKmsKeyId": "arn:aws:kms:xxxxxxxxxxxxxxxx",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "CdcInsertsOnly": false,
    "TimestampColumnName": "TIMESTAMP",
    "DatePartitionEnabled": true,
    "DatePartitionSequence": "yyyymmdd",
    "DatePartitionDelimiter": "slash",
    "AddColumnName": true,
    "Rfc4180": true
}
Some folders in my S3 bucket appear and then disappear.
In these particular folders I don't have the LOAD00000001.csv file, only the folder structure for the current day (say /2023/01/16/) and the files within it.
Is this normal behavior? If so, how can I fix it? I expect nothing to be deleted at all, even in the case of schema/DDL changes.
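For reference, this is how I check what actually exists under the prefix at a given moment (just a diagnostic command, not part of the task configuration):
# list every object DMS has written under the bucket folder
aws s3 ls s3://mybucketname/dms/ --recursive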
Thanks

Related

Linking to a Google Cloud bucket file in a terminal command?

I'm trying to find my way with Google Cloud.
I have a Debian VM instance that I am running a server on. It is installed and working; I connect via SSH in a browser window. The command to start the server is "./ninjamsrv config-file-path.cfg"
I have the config file in my default Google Firebase storage bucket, as I will need to update it regularly.
I want to start the server referencing the cfg file in the bucket, e.g.:
"./ninjamsrv gs://my-bucket/ninjam-config.cfg"
But the file is not found:
error opening configfile 'gs://my-bucket/ninjam-config.cfg'
Error loading config file!
However if I run:
"gsutil acl get gs://my-bucket/"
I see:
[
    {
        "entity": "project-editors-XXXXX",
        "projectTeam": {
            "projectNumber": "XXXXX",
            "team": "editors"
        },
        "role": "OWNER"
    },
    {
        "entity": "project-owners-XXXXX",
        "projectTeam": {
            "projectNumber": "XXXXX",
            "team": "owners"
        },
        "role": "OWNER"
    },
    {
        "entity": "project-viewers-XXXXX",
        "projectTeam": {
            "projectNumber": "XXXXX",
            "team": "viewers"
        },
        "role": "READER"
    }
]
Can anyone advise what I am doing wrong here? Thanks
The first thing to verify is whether the error thrown is indeed a permissions one. Checking the logs related to the VM’s operations will certainly provide more details in that respect, and a 403 error code would confirm that this is a permission issue. If the VM is a Compute Engine instance, you can refer to the Compute Engine documentation about logging.
If the error is indeed a permissions one, then you should verify whether the access for this object is set as “fine-grained”. This would mean that each object has its own set of permissions, regardless of the bucket-level access. You can read more about this in the Cloud Storage access-control documentation. You could either change the level of access to “uniform”, which would apply the same bucket-level access to all objects in the relevant bucket, or make the appropriate permissions change for this particular object.
If the issue is not a permissions one, then I would recommend trying to start the server from the same .cfg file hosted in the local directory of the VM. This would point the error at the file itself rather than its hosting on Cloud Storage. If the server starts successfully from there, you may want to re-upload the file to GCS, in case it was corrupted during the initial upload.
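If the goal is simply for the server to read that file, one workaround (a minimal sketch; the paths and filenames are the ones from the question and may need adjusting) is to copy the object to the VM's local disk first and point the server at the copy, which also doubles as the local-file test described above:
# copy the config from Cloud Storage to the VM's local directory (assumes gsutil is installed and authenticated)
gsutil cp gs://my-bucket/ninjam-config.cfg ./ninjam-config.cfg
# start the server against the local copy
./ninjamsrv ./ninjam-config.cfg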

ElasticSearch massive purge of deleted documents reasons? (AWS ES service)

Is there any info on when and why ES may automatically trigger a purge of documents marked for deletion?
Where can I find logs with possible info about the trigger?
The service in question is actually AWS ES, but I don't think that is relevant to the topic... maybe I'm wrong?
The version in question is ElasticSearch 5.1.
When a merge happens, the documents marked for deletion are purged.
There are merge policies that determine when the merge process is triggered, for example when the number of segment files is more than 300 or when marked documents make up more than 15% of a segment.
There is some information here for Elasticsearch 1.4:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-merge.html
It seems that the developers don't want to clarify the policies anymore.
This is an example of merge policy settings:
"merge": {
"scheduler": {
"max_thread_count": "1",
"auto_throttle": "true",
"max_merge_count": "6"
},
"policy": {
"reclaim_deletes_weight": "2.0",
"floor_segment": "2mb",
"max_merge_at_once_explicit": "30",
"max_merge_at_once": "10",
"max_merged_segment": "5gb",
"expunge_deletes_allowed": "10.0",
"segments_per_tier": "10.0",
"deletes_pct_allowed": "33.0"
}
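To see which merge settings are actually in effect for one of your own indices (defaults included), you can ask the cluster directly; a small sketch, with my-index as a placeholder index name:
# dump the index settings, including defaults, and pull out the merge section
curl -s 'localhost:9200/my-index/_settings?include_defaults=true&pretty' | grep -A 20 '"merge"'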
For logging the merge process, I think you should change the log level to INFO or DEBUG (log4j settings).
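On a managed service like AWS ES you usually can't edit the log4j configuration files directly, but Elasticsearch 5.x also lets you change logger levels dynamically through the cluster settings API (the managed service may restrict some settings). A rough sketch; the logger name below covers the whole org.elasticsearch.index package, which is broader than just the merge code, and is my assumption rather than something from this thread:
# raise index-level logging (including merge activity) to DEBUG via a transient cluster setting (ES 5.x)
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": { "logger.org.elasticsearch.index": "DEBUG" }
}'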

AWS Amplify: How to delete the environment, when resources are already partially deleted?

TL;DR: How do you delete an Amplify environment when some of the service's resources have already been deleted manually in the console?
So, I took a course on egghead to learn the AWS Amplify CLI. Unfortunately, it doesn't teach you how to delete the environment (otherwise it's great, though!). My Google search back then said you would have to delete the resources manually. I tried (/did) that for the resources I used. I deleted the user account for the CLI (🤦🏻‍♂️), "deleted" the Cognito user pool (it still shows up in amplify status), and deleted the DynamoDB table and the AppSync API (which also still show up).
Now as I mentioned when I run amplify status I get:
| Category | Resource name | Operation | Provider plugin |
| -------- | --------------- | --------- | ----------------- |
| Auth | cognito559c5953 | No Change | awscloudformation |
| Api | AmplifyTodoApp | No Change | awscloudformation |
I wondered - since I thought I deleted them - do they still exist?
So I googled some more. Now it turns out there is also the command amplify delete which automatically deletes all resources associated with your amplify project. Since I deleted the account that I used for the project, that command throws:
The security token included in the request is invalid.
Is there any way I can delete these resources without the user? Are these resources even still online (since I manually deleted them and they do not show up in the console online - even in the CloudFront console)? Or will I have to delete my whole AWS account? I don't want to end up with a big bill one day for these resources.
EDIT: I also deleted the S3 bucket.
EDIT 2: So I managed to use another profile (by changing local-aws-info.json) so I no longer get the invalid security token error. Now I get the error:
Missing region in config
amplify status still yields the same response.
The Amplify CLI determines the status by diffing the amplify/#current-cloud-backend and amplify/backend folders inside your project, so what you see when you run amplify status isn't accurate in your case.
If you have created multiple environments (in different regions), make sure you delete them too. The easiest way to delete them, if you can't use amplify delete, is to go to CloudFormation in the region where you created the environment and delete the root stack, which ensures that all the resources created by that stack are removed (a CLI sketch follows below).
PS: The CLI creates roles for auth and unauth users when initialized, and creates policies for the resources (they don't cost anything if they just exist). You can delete them if you don't want them hanging around.
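If you prefer doing that from the command line rather than the CloudFormation console, a rough sketch (region, project, env and id below are placeholders; Amplify root stacks are usually named amplify-<project>-<env>-<id>):
# list candidate stacks in the region where the environment lives
aws cloudformation list-stacks --region us-east-1 \
    --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \
    --query "StackSummaries[].StackName"
# delete the Amplify root stack, which also tears down the nested stacks it created
aws cloudformation delete-stack --region us-east-1 --stack-name amplify-myproject-dev-123456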
When some resources have been deleted manually (S3 & CloudFormation), then
$ amplify delete
gives the following:
Unable to remove env: dev because deployment bucket amplify-amplifyAPPName-dev-XYZ-deployment does not exist or has been deleted.
Stack has already been deleted or does not exist
Please look at this:
C:user\samadhan\Amplify-Projects\amplifyapp-demo>amplify delete
? Are you sure you want to continue? This CANNOT be undone. (This will delete all the environments of the project from the cloud and wipe out all the local files created by Amplify CLI) Yes
- Deleting resources from the cloud. This may take a few minutes...
Deleting env: dev.
Unable to remove env: dev because deployment bucket amplify-amplifyinitdemo-dev-131139-deployment does not exist or has been deleted.
Stack has already been deleted or does not exist
\ Deleting resources from the cloud. This may take a few minutes...
App dfwx13s2bgtb1 not found.
App dfwx13s2bgtb1 not found.
√ Project already deleted in the cloud.
Project deleted locally.
The Amplify app is still showing in the console and I am unable to delete it from the console.
Solution:
You can fix this issue using the AWS CLI.
Step 1) Make sure the AWS CLI is configured with the same AWS account; if not, create an IAM user and configure it with the same region.
C:user\samadhan\Amplify-Projects\amplifyapp-demo>aws configure
AWS Access Key ID [****************HZHF]: ****************ICHK
AWS Secret Access Key [****************iBJl]:****************SnaX
Default region name [ap-south-1]: ap-south-1
Default output format [json]: json
Step 2) Use the following AWS CLI commands.
C:user\samadhan\Amplify-Projects\amplifyapp-demo>aws amplify help
Available Commands
******************
* create-app
* create-backend-environment
* create-deployment
* delete-app
* delete-backend-environment
* get-app
* list-apps
* list-backend-environments
C:user\samadhan\Amplify-Projects\amplifyapp-demo>aws amplify list-apps
{
    "apps": [
        {
            "appId": "d39pvb2qln4v7l",
            "appArn": "arn:aws:amplify:ap-south-1:850915XXXXX:apps/d39pvb2qln4v7l",
            "name": "react-amplify-demo-project",
            "tags": {},
            "platform": "WEB",
            "createTime": 1640206703.371,
            "updateTime": 1640206703.371,
            "environmentVariables": {
                "_LIVE_PACKAGE_UPDATES": "[{\"pkg\":\"@aws-amplify/cli\",\"type\":\"npm\",\"version\":\"latest\"}]"
            }
        },
        {
            "appId": "d2jsl78ex1asqy",
            "appArn": "arn:aws:amplify:ap-south-1:85091xxxxxxxx:apps/d2jsl78ex1asqy",
            "name": "fullstackapp",
            "tags": {},
            "platform": "WEB",
            "createTime": 1640250148.974,
            "updateTime": 1640250148.974,
            "environmentVariables": {
                "_LIVE_PACKAGE_UPDATES": "[{\"pkg\":\"@aws-amplify/cli\",\"type\":\"npm\",\"version\":\"latest\"}]"
            }
        }
    ]
}
Step 3) Use the following CLI command to delete the app or an app environment.
C:user\samadhan\Amplify-Projects\amplifyapp-demo>aws amplify delete-app --app-id d39pvb2qln4v7l
{
    "app": {
        "appId": "d39pvb2qln4v7l",
        "appArn": "arn:aws:amplify:ap-south-1:8509xxxxx:apps/d39pvb2qln4v7l",
        "name": "react-amplify-demo-project",
        "repository": "https://gitlab.com/samadhanfuke/react-amplify-demo-project",
        "platform": "WEB",
        "createTime": 1639077857.194,
        "updateTime": 1639077857.194,
        "iamServiceRoleArn": "arn:aws:iam::850915xxxx:role/amplifyconsole-backend-role",
        "environmentVariables": {
            "_LIVE_UPDATES": "[{\"name\":\"Amplify CLI\",\"pkg\":\"@aws-amplify/cli\",\"type\":\"npm\",\"version\":\"latest\"}]"
        },
        "defaultDomain": "d39pvb2qln4v7l.amplifyapp.com",
        "enableBranchAutoBuild": false,
        "enableBranchAutoDeletion": false,
        "enableBasicAuth": false,
        "customRules": [
            {
                "source": "/<*>",
                "target": "/index.html",
                "status": "404-200"
            }
        ],
        "productionBranch": {
            "lastDeployTime": 1639078272.607,
            "status": "SUCCEED",
            "branchName": "preview"
        },
        "buildSpec": "version: 1\nbackend:\n phases:\n # IMPORTANT - Please verify your build commands\n build:\n commands:\n - '# Execute Amplify CLI with the helper script'\n - amplifyPush --simple\nfrontend:\n phases:\n build:\n commands: []\n artifacts:\n # IMPORTANT - Please verify your build output directory\n baseDirectory: /\n files:\n - '**/*'\n cache:\n paths: []\n",
        "customHeaders": "",
        "enableAutoBranchCreation": false
    }
}
The Amplify app along with its environment is successfully deleted.
Check in the Amplify Console.
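If you only need to remove a single backend environment rather than the whole app, the same CLI also offers delete-backend-environment (listed in the help output above); a small sketch with placeholder values:
# delete just the dev backend environment of the given Amplify app
aws amplify delete-backend-environment --app-id <your-app-id> --environment-name dev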
As of 9/26/2022, several updates have been released that fix issues with deleting apps/backends, including cases where the S3 bucket or CloudFormation stack was already deleted.
Note that deleting the Amplify application as documented here does not remove the resources created in S3. You need to delete these manually.
The content in the bucket amplify-{project name}-{env name}-{some id}-deployment is created and updated when you run amplify init, amplify push, among others. It appears to be used as the remote synchronisation directory.
The S3 buckets will be recreated by the Amplify root CloudFormation stack whenever you create a new env or run amplify init.
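If you want to remove that leftover deployment bucket from the CLI as well, a small sketch (the bucket name is a made-up example following the pattern above; --force removes all objects first, so double-check the name):
# delete the deployment bucket and everything in it
aws s3 rb s3://amplify-myproject-dev-123456-deployment --force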

Why is my S3 lifecycle policy not taking effect?

I have an S3 lifecycle policy to delete objects after 3 days, and I am using a prefix. My problem is that the policy works for all but one sub-directory. For example, let's say my bucket looks like this:
s3://my-bucket/myPrefix/env=dev/
s3://my-bucket/myPrefix/env=stg/
s3://my-bucket/myPrefix/env=prod/
When I check the stg and prod directories, there are no objects older than 3 days. However, when I check the dev directory, there are objects a lot older than that.
Note: there is a huge difference between the volume of data in dev compared to the other two. Dev holds a lot more logs than the others.
My initial thought was that it was taking longer for eventual consistency to show what was deleted and what wasn't, but that theory is out considering the time that has passed.
The issue seems related to the amount of data in this location under the prefix compared to the others, but I'm not sure what I can do to resolve this. Should I have another policy specific to this location, or is there somewhere I can check to see what is causing the failure? I did not see anything in CloudTrail for this event.
Here is my policy:
{
    "Rules": [
        {
            "Expiration": {
                "Days": 3
            },
            "ID": "Delete Object When Stale",
            "Prefix": "myPrefix/",
            "Status": "Enabled"
        }
    ]
}
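For what it's worth, this is how I pull the lifecycle configuration that is actually attached to the bucket to double-check it (bucket name is a placeholder):
# show the lifecycle configuration currently applied to the bucket
aws s3api get-bucket-lifecycle-configuration --bucket my-bucket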

Elasticsearch Snapshot Fails With RepositoryMissingException

Three-node ElasticSearch cluster on AWS. Bigdesk and Head both show a healthy cluster. All three nodes are running ES 1.3 and the latest Amazon Linux updates. When I fire off a snapshot request like:
http://localhost:9200/_snapshot/taxanalyst/201409031540-snapshot?wait_for_completion=true
the server churns away for several minutes before responding with the following:
{
    "snapshot": {
        "snapshot": "201409031521-snapshot",
        "indices": [
            "docs",
            "pdflog"
        ],
        "state": "PARTIAL",
        "start_time": "2014-09-03T19:21:36.034Z",
        "start_time_in_millis": 1409772096034,
        "end_time": "2014-09-03T19:28:48.685Z",
        "end_time_in_millis": 1409772528685,
        "duration_in_millis": 432651,
        "failures": [
            {
                "node_id": "ikauhFYEQ02Mca8fd1E4jA",
                "index": "pdflog",
                "reason": "RepositoryMissingException[[faxmanalips] missing]",
                "shard_id": 0,
                "status": "INTERNAL_SERVER_ERROR"
            }
        ],
        "shards": {
            "total": 10,
            "failed": 1,
            "successful": 9
        }
    }
}
These are three nodes on three different virtual EC2 machines, but they're able to communicate via 9300/9200 without any problems. Indexing and searching works as expected. There doesn't appear to be anything in the elasticsearch log files that speaks to the server error.
Does anyone know what's going on here, or at least where a good place to start would be?
UPDATE: It turns out that each of the nodes in the cluster needs to have a snapshot directory that matches the directory specified when you register the snapshot repository with the Elasticsearch cluster.
I guess the next question is: when you want to tgz up the snapshot directory so you can archive it, or provision a backup cluster, is it sufficient to just tgz the snapshot directory on the master node? Or do you have to somehow consolidate the snapshot directories of all the nodes? (That can't be right, can it?)
Elasticsearch supports a shared file system repository, which uses a shared file system to store snapshots.
In order to register the shared file system repository, it is necessary to mount the same shared filesystem to the same location on all master and data nodes.
All you need to do is put the same repository path in elasticsearch.yml on all 3 nodes.
For example:
path.repo: ["/my_repository"]
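Once path.repo is set and the shared filesystem is mounted at the same path on every node, you register the repository through the snapshot API; a sketch using the repository name from the question and the path above:
# register a shared-filesystem snapshot repository named taxanalyst
curl -XPUT 'localhost:9200/_snapshot/taxanalyst' -d '{
    "type": "fs",
    "settings": { "location": "/my_repository" }
}'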
I think you are looking for this AWS plugin for Elasticsearch (I guess you already installed it to configure your cluster): https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
It will allow you to create a repository mapped to an S3 bucket.
To use (create/restore/whatever) a snapshot, you need to create a repository first. Then, when you perform actions on a snapshot, Elasticsearch will manage it directly in your S3 bucket.
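For example, once the cloud-aws plugin is installed on every node, registering an S3-backed repository looks roughly like this (repository name, bucket and region are placeholders):
# register an S3 snapshot repository; the plugin picks up AWS credentials from its own configuration
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
    "type": "s3",
    "settings": { "bucket": "my-snapshot-bucket", "region": "us-east-1" }
}'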