Delete all data in AWS Neptune

I have an AWS Neptune cluster into which I inserted a lot of N-Triples and N-Quads data using the SPARQL HTTP API:
curl -X POST --data-binary 'update=INSERT DATA { <http://test.com/s> <http://test.com/p> <http://test.com/o> . }' http://your-neptune-endpoint:8182/sparql
I would like to delete all the data I inserted (not the instance itself).
How can I do that?

You can run a SPARQL DROP ALL update to delete all your data.
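For example, a minimal sketch assuming the same endpoint as your insert:
# Removes every triple in every graph (default and named) on the cluster
curl -X POST --data-binary 'update=DROP ALL' http://your-neptune-endpoint:8182/sparql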
If you want a truly empty database (no data, no CloudWatch metrics, no audit history, etc.), then I would highly recommend creating a fresh new cluster. It takes only a few minutes.

If you want to remove only the data you inserted, one strategy is to use named graphs. When you insert the data, insert it into a named graph. When you delete, delete the graph.
To insert, one way is to use a call similar to the insert you gave, except you insert into a named graph:
curl -X POST --data-binary 'update=INSERT DATA { GRAPH <http://www.example.com/named/graph> { <http://test.com/s> <http://test.com/p> <http://test.com/o> . } }' \
  https://endpoint:8182/sparql
An alternative is to insert using the Graph Store Protocol:
curl --request POST -H "Content-Type: text/turtle" \
  --data-raw "<http://test.com/s> <http://test.com/p> <http://test.com/o> ." \
  'https://endpoint:8182/sparql/gsp/?graph=http%3A//www.example.com/named/graph'
Another option is to use the bulk loader, which has a namedGraphUri option (https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-load.html); a sketch of such a request follows.
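The S3 source, IAM role ARN, and region below are placeholders; see the load API reference linked above for the full parameter list:
# Bulk-load N-Triples from S3 into a named graph via the loader endpoint
curl -X POST -H 'Content-Type: application/json' https://endpoint:8182/loader -d '
{
  "source": "s3://your-bucket/your-data.nt",
  "format": "ntriples",
  "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
  "region": "us-east-1",
  "parserConfiguration": {
    "namedGraphUri": "http://www.example.com/named/graph"
  }
}'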
Here is a delete that removes the named graph:
curl --request DELETE 'https://endpoint:8182/sparql/gsp/?graph=http%3A//www.example.com/named/graph'
See https://docs.aws.amazon.com/neptune/latest/userguide/sparql-graph-store-protocol.html for details on the Graph Store Protocol in Neptune.

Related

How can I programmatically download data from QuestDB?

Is there a way to download query results from the database, such as tables or other datasets? The UI supports a CSV file download, but at the moment this is manual work, browsing and downloading files. Is there a way I can automate this? Thanks
You can use the export REST API endpoint; this is what the UI uses under the hood. To export a table via this endpoint:
curl -G --data-urlencode "query=select * from my_table" http://localhost:9000/exp
query= may be any SQL query, so if you have a more granular report that needs to be generated regularly, it can be passed into the request. If you don't need anything complicated, you can redirect the curl output to a file:
curl -G --data-urlencode "query=select * from my_table" \
http://localhost:9000/exp > myfile.csv

gsutil / gcloud storage list files by limits and pagination

Is there any way we can list files from a GCS bucket with a limit?
Say I have 2k objects in my bucket, but when I do gsutil ls, I only want the first 5 objects, not all of them.
How can I achieve this?
Also, is there any pagination available?
gsutil ls gs://my-bucket/test_file_03102021* 2>/dev/null | grep -i ".txt$" || :
From looking at gsutil help ls, gsutil doesn't currently have an option to limit the number of items returned from an ls call.
While you could pipe the results to something like awk to get only the first 5 items, that would be pretty wasteful if you have lots of objects in your bucket (since gsutil would continue making paginated HTTP calls until it listed all N of your objects).
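For example, with a placeholder bucket name (and per the caveat above, gsutil still pages through the entire listing before awk discards the rest):
gsutil ls gs://my-bucket/ | awk 'NR<=5'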
If you need to do this routinely on a bucket with lots of objects, you're better off writing a short script that uses one of the GCS client libraries. As an example, check out the google-cloud-storage Python library -- specifically, see the list_blobs method, which accepts a max_results parameter.
Pagination is available when you use the API directly. If you want only the first 5 objects and you use gsutil, you will have to wait for the full listing of hundreds (thousands, millions, ...) of files before getting only the first 5.
If you use the API, you can do this:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://storage.googleapis.com/storage/v1/b/<BUCKET_NAME>/o?alt=json&&maxResults=5" \
| jq .items[].name
Of course, you can change the maxResults value.
You can also include a prefix parameter when you filter. More details are in the API documentation.
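Pagination works with the same endpoint: the JSON response includes a nextPageToken, which you pass back as pageToken on the next call. A sketch (jq assumed, <BUCKET_NAME> as above):
# First page of 5 results; capture nextPageToken from the response
TOKEN=$(curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/storage/v1/b/<BUCKET_NAME>/o?alt=json&maxResults=5" | jq -r .nextPageToken)
# Next page of 5 results, resumed from that token
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/storage/v1/b/<BUCKET_NAME>/o?alt=json&maxResults=5&pageToken=$TOKEN" \
  | jq '.items[].name'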

AWS, Elasticsearch, Filebeat: Apply index policy to index pattern instead of specific index

I am working with AWS Elasticsearch (7.9), where we are pushing logs via Filebeat. To remove old logs I have a policy, but Filebeat creates new indices every day, and I have a single index pattern, filebeat-*.
Is there a way to apply an index lifecycle policy to a given pattern?
Or is there a way to tell Filebeat to use a single index and keep pushing to it?
Screenshot for issue:
As you can see, only 1 index has the policy. I cannot log in every day and apply the policy to new indices; that's not practical. :-) Any help would be nice. Thank you.
Add the policy to indices:
curl --user "user:password" -X POST "https://your_elasticsearch_url/_opendistro/_ism/add/sandbox-2021.01.28" -H 'Content-Type: application/json' -d'
{
"policy_id": "cleanup_old_indices"
}' | jq -r
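If the ISM add API accepts an index pattern the same way it accepts a single index (an assumption worth verifying against your Open Distro version), you could attach the policy to all existing filebeat indices at once; note this would not cover indices created later:
# Assumption: the ISM add endpoint accepts a wildcard pattern; check your Open Distro docs
curl --user "user:password" -X POST "https://your_elasticsearch_url/_opendistro/_ism/add/filebeat-*" -H 'Content-Type: application/json' -d'
{
  "policy_id": "cleanup_old_indices"
}'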

What's the best way to add project level metadata to a google cloud project?

Labels are project-level but have character limitations, like not allowing spaces. I could add metadata through a BigQuery table, or on each server. I could also make a README.txt in the default appspot bucket.
What's the best way to add metadata at a project level? Things like what the project is about, why it's there, people responsible, stakeholders, developers, context/vocabulary. E.g. when I get fired, people can see what is what.
Storing Metadata:
1. Console
This is quite straightforward. Once you navigate to the Metadata section under Compute Engine (Compute Engine > Metadata), you can add project-level key:value pairs in the console.
2. gcloud
Type the following command in the Cloud Shell of the project:
gcloud compute project-info add-metadata --metadata projectMailID=abc@gmail.com
3. API
Send a POST request to the Google API. This is a more manual route: you first make a GET request to fetch the current metadata fingerprint, and then POST to the API using that fingerprint (a sketch follows below).
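A minimal sketch of that flow, assuming the same www.googleapis.com endpoint used in the describe example further down, with gcloud and jq supplying the token and fingerprint (<project> is a placeholder):
TOKEN=$(gcloud auth print-access-token)
# 1. GET the project to read the current metadata fingerprint
FINGERPRINT=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "https://www.googleapis.com/compute/v1/projects/<project>" | jq -r .commonInstanceMetadata.fingerprint)
# 2. POST the new key:value pair together with that fingerprint
#    (setCommonInstanceMetadata replaces the whole set, so include any existing items as well)
curl -s -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://www.googleapis.com/compute/v1/projects/<project>/setCommonInstanceMetadata" \
  -d "{\"fingerprint\": \"$FINGERPRINT\", \"items\": [{\"key\": \"projectMailID\", \"value\": \"abc@gmail.com\"}]}"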
Querying Metadata:
1. curl or wget
This is the most frequently used option for getting instance or project metadata from within an instance.
curl "http://metadata.google.internal/computeMetadata/v1/project/" -H "Metadata-Flavor: Google"
The above command will list all the metadata associated with the given project. Metadata can be stored either as a directory or as a single entry. If the URL ends in /, it lists the directory; otherwise it shows the value of the single entry key.
The custom metadata entries are stored under the attributes directory. They can be retrieved with:
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/" -H "Metadata-Flavor: Google"
The above command lists all custom entries made in the project. To get the value of a single entry, try this:
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/ProjectMailID" -H "Metadata-Flavor: Google"
Metadata-Flavor: Google
This header indicates that the request was sent with the intention of retrieving metadata values, rather than unintentionally or from an insecure source.
2. gcloud
The following gcloud command lists all metadata and other information about the project:
gcloud compute project-info describe
3. API
Making a GET request to the API is equivalent to the gcloud command:
GET https://www.googleapis.com/compute/v1/projects/<project>
Additional Information:
Waiting For Updates
This option allows you to wait for any changes to the metadata and then retrieve the updated value. This can be done by appending ?wait_for_change=true as a query parameter.
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/?wait_for_change=true" -H "Metadata-Flavor: Google"
Recursive
This option recursively prints the entries in the directory. This can be done by appending ?recursive=true as a query parameter.
curl "http://metadata.google.internal/computeMetadata/v1/project/attributes/?recursive=true" -H "Metadata-Flavor: Google"

What is the optimised way to get a count of keys in a riak bucket?

I have a Riak cluster set up with 3 servers. I can look at the Bitcask directories to establish how much disk space the cluster is currently using, but I'd also like to find out how many items are currently being stored in the cluster.
The cluster is being used to store images, meaning that binary data is stored against a key in a set of buckets. I have tried to use MapReduce functions against the HTTP interface to return the number of items in a bucket, but they timed out.
What is the most time-efficient way to get the count of keys in a specific bucket?
Counting the number of keys in a bucket on a Riak cluster is not very efficient, even with the use of MapReduce functions.
The most efficient way I have found to count the number of items is to do it on the client through the streaming API. The following example uses node.js to do this.
First, install the riak-js client:
npm install riak-js@latest
Then run the following on the command line to get your count:
node -e "require('riak-js').getClient({ host: 'hostname', port: 8098 }).count('bucket');"
Here is what worked for me - paste it into the console, no further installs:
curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d '
{"inputs":"THE_BUCKET",
 "query":[{"map":{"language":"javascript",
                  "keep":false,
                  "source":"function(riakobj) { return [1]; }"}},
          {"reduce":{"language":"javascript",
                     "keep":true,
                     "name":"Riak.reduceSum"}}]}'
There is also an open request on features.basho.com to make this easier (because, as bennettweb pointed out, it's not the most straightforward task):
http://features.basho.com/entries/20721603-efficiently-count-keys-in-a-bucket
Upvotes, comments, etc., are encouraged.
Mark
See http://docs.basho.com/riak/latest/dev/using/2i/, section "Count Bucket Objects via $bucket Index":
curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{"inputs":{
         "bucket":"mybucket",
         "index":"$bucket",
         "key":"mybucket"
       },
       "query":[{"reduce":{"language":"erlang",
                           "module":"riak_kv_mapreduce",
                           "function":"reduce_count_inputs",
                           "arg":{"reduce_phase_batch_size":1000}
                          }
                }]
      }'
A reduce phase over the $bucket index is more efficient than running a full MapReduce over the data.