Deletion Operation in AWS DocumentDB

I have a question about deleting data in AWS DocumentDB.
I am using PuTTY to connect to an EC2 instance, and from there I use the mongo shell to connect to my DocumentDB cluster.
I checked the AWS DocumentDB documentation but I couldn't find how to delete a single document, or all documents, in one collection. For example:
rs0:PRIMARY> show databases
gg_events_document_db 0.000GB
rs0:PRIMARY> use gg_events_document_db
switched to db gg_events_document_db
rs0:PRIMARY> db.data_collection.find({"Type": "15"})
{"Type" : "15", "Humidity" : "14.3%"}
Now I have found the data and I want to delete it. What query do I need to run?
Or what if I want to delete all the data in the collection? How can I do that without deleting the collection itself?
Probably I am asking very basic questions, but I couldn't find a query like this on my own.
I would be so happy if some people experienced with AWS DocumentDB could help me or share some resources.
Thanks a lot 🙌

Amazon DocumentDB is compatible with the MongoDB 3.6 and 4.0 APIs, so the same APIs can be used here. With respect to:
Or what if I want to delete all the data in the collection? How can I
do that without deleting the collection itself?
You can use:
db.data_collection.deleteMany({})
This removes every document but keeps the collection and its indexes. Note that db.data_collection.drop() also clears the data, but it deletes the collection itself.

To delete a single document matching a filter, you would use the deleteOne() method.
For example, in your case that would be:
db.data_collection.deleteOne({"Type": "15"})
To delete all documents matching a filter, use deleteMany().
There is also a remove() method, but it is deprecated.
The drop() method deletes the entire collection.
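If you are scripting against the cluster rather than working in the shell, the same operations exist in the drivers. Here is a minimal sketch with pymongo; the endpoint, credentials, and CA bundle path are placeholders, not values from the question:

# A minimal pymongo sketch; connection details below are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://user:password@my-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="rds-combined-ca-bundle.pem",
    retryWrites=False,  # DocumentDB does not support retryable writes
)
db = client["gg_events_document_db"]

# Delete the first document matching the filter
db.data_collection.delete_one({"Type": "15"})

# Delete every document matching the filter
db.data_collection.delete_many({"Type": "15"})

# Delete all documents but keep the collection and its indexes
db.data_collection.delete_many({})

# Drop the collection entirely
db.data_collection.drop()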

Related

Data Pipeline Solution

We have a use case to build a data pipeline solution in which we need the following:
Ability to have multiple steps (the output of one step should feed as input to the next)
Ability to have multiple algorithms (a SQL query or perhaps a REST endpoint invocation) in each step
Input to the first step can be anything. We have DW tables, but we can pre-process and keep the relevant information in AWS S3 or another data store.
Is there an existing solution that already provides functionalities similar to this or can be modified to support this?
Having something in AWS would be easier to integrate.
How about AWS Glue? It sounds like a fit for your goals: Glue jobs can read from S3 or your DW tables, run SQL transformations with Spark, and be chained with Glue triggers or workflows so one step's output feeds the next.
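As a rough illustration (the bucket paths and the query below are invented for the example, not taken from the question), a single pipeline step as a Glue job might look like this:

# A sketch of one pipeline step as a Glue job (PySpark). Paths and the SQL
# query are placeholders; chain steps with Glue triggers or workflows.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the pre-processed input this step starts from
events = spark.read.parquet("s3://my-pipeline-bucket/step-1-output/")

# The "algorithm" of this step: a SQL query over the input
events.createOrReplaceTempView("events")
result = spark.sql("SELECT type, count(*) AS n FROM events GROUP BY type")

# Write where the next step will pick it up
result.write.mode("overwrite").parquet("s3://my-pipeline-bucket/step-2-input/")

job.commit()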

AWS: Is there a way to delete every artifact using string matching?

I need to remove a lot of created resources in AWS: buckets, Lambdas, CloudFormation stacks, and more. I know everything I need to delete will start with "ABC". Is there a way to delete everything that starts with "ABC" from the AWS CLI, or even delete resource types that start with that string?
Sadly there is not a single command for all of these. You would have to create a custom script or program, e.g. in Python, to list all your resources in question, filter them by name, and delete what is needed.
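A minimal sketch of such a script with boto3, covering just two of the resource types from the question; the same list-filter-delete pattern extends to the others:

# A sketch of a prefix-based cleanup script using boto3. Only S3 and Lambda
# are shown; versioned buckets and other resource types need extra handling.
import boto3

PREFIX = "ABC"

# S3: a bucket must be emptied before it can be deleted
s3 = boto3.resource("s3")
for bucket in s3.buckets.all():
    if bucket.name.startswith(PREFIX):
        bucket.objects.all().delete()
        bucket.delete()
        print(f"deleted bucket {bucket.name}")

# Lambda: list_functions is paginated, so iterate with a paginator
lam = boto3.client("lambda")
for page in lam.get_paginator("list_functions").paginate():
    for fn in page["Functions"]:
        if fn["FunctionName"].startswith(PREFIX):
            lam.delete_function(FunctionName=fn["FunctionName"])
            print(f"deleted function {fn['FunctionName']}")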
While it won't handle everything (CloudFormation isn't on their list, unfortunately), cloud-nuke can delete artifacts based on regex strings (both inclusive and exclusive) so this might be a good tool for most cases.

AWS Elasticsearch log rotation

I want to use AWS Elasticsearch to store the logs of my application. Since there is a huge amount of data going in (~30 GB daily), I would only keep 3 days of data. Is there a way to schedule data removal from AWS Elasticsearch, or to do log rotation? What happens if the AWS Elasticsearch storage is full?
Thanks for the help
A possible way is to set the index parameter of the elasticsearch output to something like logstash-%{appname}-%{date_format}. You can then use the Curator tool to delete the old indices by number of days or similar.
This SO answer pretty much explains the same approach. Hope it helps!
I assume you are using the Amazon Elasticsearch Service?
The storage type is an EBS volume with a fixed amount of disk space. If you want to keep only the last three days, I assume you then have 3 indices, like these:
my-index-2017.01.30
my-index-2017.01.31
my-index-2017.02.01
Basically you can write a simple script which deletes indices older than 3 days. With the REST API it is just DELETE my-index-2017.01.30 (e.g. in Sense).
I recommend using Elasticsearch Curator for the job. See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/delete_indices.html
I'm not sure if the service interface itself has an option for that, but Elasticsearch Curator should do the job for you.
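For completeness, here is a minimal sketch of that "simple script" in Python; the domain endpoint and index prefix are placeholders, and it assumes the daily index naming shown above:

# Delete indices older than three days via the REST API. Endpoint, prefix,
# and the date-suffixed naming scheme are assumptions from this answer.
from datetime import datetime, timedelta

import requests

ES_ENDPOINT = "https://my-es-domain.us-east-1.es.amazonaws.com"
PREFIX = "my-index-"
KEEP_DAYS = 3

cutoff = datetime.utcnow() - timedelta(days=KEEP_DAYS)
indices = requests.get(f"{ES_ENDPOINT}/_cat/indices/{PREFIX}*?format=json").json()
for entry in indices:
    name = entry["index"]
    try:
        day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d")
    except ValueError:
        continue  # skip indices that don't follow the date pattern
    if day < cutoff:
        requests.delete(f"{ES_ENDPOINT}/{name}")
        print(f"deleted {name}")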
Update for 2020:
AWS ES now has support for Index State Management, which lets you define custom management policies to automate routine tasks and apply them to indices and index patterns. You no longer need to set up and manage external processes to run your index operations.
For example, you can define a policy that moves your index into a read_only state after 30 days and then ultimately deletes it after 90 days.
Index State Management - https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/ism.html
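As a rough sketch of what such a policy could look like, created over the REST API (the field names follow the Open Distro ISM documentation, so verify them against the version your domain runs; the endpoint and policy id are placeholders):

# Create an ISM policy that makes indices read-only after 30 days and
# deletes them after 90. Endpoint and policy id are placeholders.
import json

import requests

ES_ENDPOINT = "https://my-es-domain.us-east-1.es.amazonaws.com"

policy = {
    "policy": {
        "description": "read_only after 30 days, delete after 90",
        "default_state": "active",
        "states": [
            {
                "name": "active",
                "actions": [],
                "transitions": [
                    {"state_name": "read_only", "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "read_only",
                "actions": [{"read_only": {}}],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}

resp = requests.put(
    f"{ES_ENDPOINT}/_opendistro/_ism/policies/log-rotation",
    data=json.dumps(policy),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)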

Query AWS SNS Endpoints by User Data

Simple question, but I suspect it doesn't have a simple or easy answer. Still, worth asking.
We're creating an implementation for push notifications using AWS with our Web Server running on EC2, sending messages to a queue on SQS, which is dealt with using Lambda, which is sent finally to SNS to be delivered to the iOS/Android apps.
The question I have is this: is there a way to query SNS endpoints based on the custom user data that you can provide on creation? The only way I see to do this so far is to list all the endpoints in a given platform application and then search that list for the user data I'm looking for; a more direct approach would be far better.
Why I want to do this is simple: if I could attach a user identifier to these device endpoints and query based on that, I could completely avoid having to save the ARN to our DynamoDB database. It would save a lot of implementation time and complexity.
Let me know what you guys think, even if what you think is that this idea is impractical and stupid, or if searching through all of them is the best way to go about this!
Cheers!
There isn't the ability to have a "where" clause in ListEndpointsByPlatformApplication. I see two possibilities:
Create a new SNS topic per user that has some identifiable id in it. So, for example, the ARN would be something like "arn:aws:sns:us-east-1:123456789:known-prefix-user-id". The obvious downside is that you have the potential for a boatload of SNS topics.
Use a service designed for this type of usage like PubNub. Disclaimer - I don't work for PubNub or own stock but have successfully used it in multiple projects. You'll be able to target one or many users this way.
According to the AWS documentation, if you try to create a new platform endpoint with the same user data you should get a response with an exception that includes the ARN associated with the existing platform endpoint.
It's definitely not ideal, but it would be a roundabout way of querying the user data endpoint attributes via exception.
// Query CustomUserData by exception
CreatePlatformEndpointRequest cpeReq = new CreatePlatformEndpointRequest()
        .withPlatformApplicationArn(applicationArn)
        .withToken("dummyToken")
        .withCustomUserData("username");
CreatePlatformEndpointResult cpeRes = client.createPlatformEndpoint(cpeReq);
You should get an exception with the ARN if an endpoint with the same CustomUserData exists.
Then you just use that ARN and away you go.
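A sketch of the same exception-based lookup in Python with boto3. Note that SNS keys endpoints by device token, so whether the conflict fires depends on the token and attributes matching; the application ARN, token, and regex here are assumptions, not tested against the question's setup:

# Probe for an existing endpoint by catching the "already exists" error.
# SNS reports the conflicting endpoint's ARN inside the error message.
import re

import boto3
from botocore.exceptions import ClientError

sns = boto3.client("sns")

try:
    resp = sns.create_platform_endpoint(
        PlatformApplicationArn="arn:aws:sns:us-east-1:123456789012:app/APNS/my-app",
        Token="device-token-from-client",
        CustomUserData="username",
    )
    endpoint_arn = resp["EndpointArn"]
except ClientError as err:
    match = re.search(r"Endpoint (arn:aws:sns:\S+) already exists", str(err))
    if not match:
        raise  # a different error; re-raise it
    endpoint_arn = match.group(1)

print(endpoint_arn)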

Does EMRFS support custom query parameters in S3 URLs?

Is it possible to add custom query parameters to an S3 URL?
We would like to add some custom metadata to S3 objects, but would like it to be transparent to EMRFS.
Something like:
s3://bucket-name/object-name?x-amz-meta-tag=magic-tag
Then in our PySpark or hadoop job, we would like to write:
data.write.csv('s3://bucket-name/object-name?x-amz-meta-tag=magic-tag')
Trying this with EMRFS shows that it treats "object-name?x-amz-meta-tag=magic-tag" as the entire object name instead of ignoring the query parameters.
I can't speak for the closed-source EMRFS, but for the ASF S3 connectors the answer is "no". Interesting proposal, though; maybe you should think about contributing it to the ASF. Of course, that adds a new problem: what if existing users are creating files with "?" in their names? How do you retain compatibility?
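Since the URL approach isn't supported, one workaround (a sketch of my own, not an EMRFS feature; the bucket and prefix are placeholders) is to attach the metadata in a second pass after the job writes its output, e.g. with boto3:

# Attach custom metadata after the write, since S3 metadata can only be set
# when an object is created or copied. Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "bucket-name"
PREFIX = "object-name/"  # Spark/Hadoop write a directory of part files

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # Copy the object onto itself, replacing its metadata
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
            Metadata={"tag": "magic-tag"},
            MetadataDirective="REPLACE",
        )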