I am using the latest boto tools for Python to add and search documents on Amazon CloudSearch. I haven't been able to find any documentation about updating documents. There is documentation for the old API here: http://boto.readthedocs.org/en/latest/cloudsearch_tut.html. There, when adding a document you give a version number, and to quote the docs:
If you wish to update a document, you must use a higher version ID.
However, I can't find this feature in the boto namespaces for the new API (the ones with cloudsearch2). The add function no longer takes a version.
Currently, to update a document I fetch it by ID and then add it again; the logic for updating the fields lives on my side.
What would be nice is to add a document with the same ID and a higher version number, fill in only the fields I want overridden, and have the document be updated accordingly.
Is there still a way to use the version of a document in the new boto API?
There is no way to use 'version' with the new boto API (cloudsearch2) because that library is built for CloudSearch version 2013-01-01, which removed the version field.
CloudSearch also does not allow you to selectively update certain fields of a document, although I agree that would be useful. This was not possible with the old version either.
In case you want to have a look at the underlying interface, this page describes the SDF format for submitting documents that boto implements for you:
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/preparing-data.html
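In case it's useful, here is a minimal sketch of that re-upload workaround using boto's cloudsearch2 layer 2 interface. The region, domain name, document ID and field names are all placeholders, and this assumes a plain boto 2.x install:

from boto.cloudsearch2.layer2 import Layer2

# Placeholder region and domain name -- adjust to your setup.
conn = Layer2(region='us-east-1')
domain = conn.lookup('my-search-domain')
doc_service = domain.get_document_service()

# There is no version argument any more: to "update" a document you
# re-submit the complete document under the same ID. Merging the changed
# fields into the full document has to happen in your own code first.
updated_doc = {
    'title': 'New title',          # the field you actually changed
    'summary': 'Unchanged text',   # unchanged fields must be included too,
    'tags': ['foo', 'bar'],        # otherwise they are dropped
}
doc_service.add('doc-42', updated_doc)
doc_service.commit()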
I have a service running on my Linux machine that reads data stored in a .json file when the machine is booting. The service then validates the incoming JSON data and modifies specific system configurations according to the data. The service is written in C++ and for the validation I'm using https://github.com/pboettch/json-schema-validator.
In development it was easy to modify the JSON schema and just adapt the data manually. I've started to use semantic versioning for my JSON schema and included it in the following way:
JSON schema:
{
"$id": "https://my-company.org/schemas/config/0.1.0/config.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
// Start of Schema definition
}
JSON data:
{
"$schema": "https://my-comapny.org/schemas/config/0.1.0/config.schema.json",
// Rest of JSON data
}
With the addition of the version, I am able to check if a version mismatch exists before validating.
What I am looking for is a way to automatically migrate the JSON data to match the newer schema version when a version mismatch is identified. Is there any way to achieve this automatically, or is the only way to manually edit the JSON data to match the schema?
Since I plan on releasing this as open source, I would really like to include some form of automatic migration, so that when a version mismatch is identified I can ask the user whether they want to migrate to the newest schema version instead of throwing an error.
What you're asking for is something that needs to make assumptions in order to work.
This is an age-old problem, and it is similar for databases. Schema migrations can be generated automatically for many simple changes, but that is not viable if you also want to translate the existing data automatically.
Let's look at a basic example. You rename a field.
How would a tool know you've renamed a field rather than removed an old one and added a new one? It essentially cannot.
So, you need to write your migrations by hand.
You could use JSON transformation tools like jq or fx to create migration scripts without writing them in application code, which may or may not be preferable. (jq has a steeper learning curve, but it's also very powerful.)
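As an illustration, here is a tiny hand-written migration in Python (the renamed field, version numbers and file name are invented for the example); it upgrades a config file from schema 0.1.0 to 0.2.0 where one field was renamed, which is exactly the kind of decision a tool cannot infer on its own:

import json

def migrate_0_1_0_to_0_2_0(config):
    # Hypothetical change between the two schema versions:
    # 'timeout' was renamed to 'timeout_seconds'.
    if 'timeout' in config:
        config['timeout_seconds'] = config.pop('timeout')
    config['$schema'] = 'https://my-company.org/schemas/config/0.2.0/config.schema.json'
    return config

with open('config.json') as f:
    config = json.load(f)

# Only migrate when the data still points at the old schema version.
if '/0.1.0/' in config.get('$schema', ''):
    config = migrate_0_1_0_to_0_2_0(config)
    with open('config.json', 'w') as f:
        json.dump(config, f, indent=2)

Keeping each migration as a small, versioned function also makes it easy to chain them (0.1.0 -> 0.2.0 -> 0.3.0) when a user is several versions behind, before asking them whether to apply the upgrade.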
Is there a way to bulk-tag BigQuery tables with the Python google.cloud.datacatalog client?
If you want to take a look at sample code that uses the Python google.cloud.datacatalog client library, I've put together an open source utility script that creates Tags in bulk using a CSV file as the source. If you want to use a different source, you can use this script as a reference; hope it helps.
create bulk tags from csv
For this purpose you may consider using the DataCatalogClient class from the google.cloud.datacatalog_v1 module, which is part of the google-cloud-datacatalog PyPI package and talks to the Google Cloud Data Catalog API service.
1. First, enable the Data Catalog and BigQuery APIs in your project.
2. Install the Python Cloud Client Library for the Data Catalog API:
pip install --upgrade google-cloud-datacatalog
3. Set up authentication by exporting the GOOGLE_APPLICATION_CREDENTIALS environment variable, pointing it at the JSON file that contains your service account key:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
4. Refer to this example from the official documentation, which clearly shows how to create a Data Catalog tag template and attach the appropriate tag fields to the target BigQuery table using the create_tag_template() function.
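If it helps, below is a rough sketch along the lines of that documentation example, assuming a recent version of the google-cloud-datacatalog library and hypothetical project, dataset and table names with a single string tag field; it creates a tag template once and then loops over a list of BigQuery tables to tag them in bulk:

from google.cloud import datacatalog_v1

datacatalog = datacatalog_v1.DataCatalogClient()

project_id = "my-project"    # placeholder
location = "us-central1"     # placeholder

# 1. Create a tag template with a single string field (done once).
template = datacatalog_v1.types.TagTemplate()
template.display_name = "Data governance"
template.fields["data_owner"] = datacatalog_v1.types.TagTemplateField()
template.fields["data_owner"].display_name = "Data owner"
template.fields["data_owner"].type_.primitive_type = (
    datacatalog_v1.types.FieldType.PrimitiveType.STRING
)
template = datacatalog.create_tag_template(
    parent=f"projects/{project_id}/locations/{location}",
    tag_template_id="data_governance",
    tag_template=template,
)

# 2. Tag several BigQuery tables in bulk (placeholder table names).
tables = ["dataset_a.table_1", "dataset_a.table_2", "dataset_b.table_3"]
for qualified_name in tables:
    dataset_id, table_id = qualified_name.split(".")
    linked_resource = (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )
    # Look up the Data Catalog entry that represents the BigQuery table.
    entry = datacatalog.lookup_entry(request={"linked_resource": linked_resource})

    tag = datacatalog_v1.types.Tag()
    tag.template = template.name
    tag.fields["data_owner"] = datacatalog_v1.types.TagField()
    tag.fields["data_owner"].string_value = "analytics-team"
    datacatalog.create_tag(parent=entry.name, tag=tag)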
If you have any doubts, feel free to extend your initial question or add a comment below this answer, so we can address your particular use case.
I am getting the following error while building a bot with AWS Lex:
"checksum must be specified in PUT API, when the resource already exists"
Can someone tell me what it means and how to fix it?
I was getting the same error when building my bot in the console. I found the answer here.
Refresh the page and then set the version of the bot to Latest.
The documentation states that you have to provide the checksum of a bot that already exists if you are trying to update it: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexModelBuildingService.html#putBot-property
"checksum — (String)
Identifies a specific revision of the $LATEST version.
When you create a new bot, leave the checksum field blank. If you specify a checksum you get a BadRequestException exception.
When you want to update a bot, set the checksum field to the checksum of the most recent revision of the $LATEST version. If you don't specify the checksum field, or if the checksum does not match the $LATEST version, you get a PreconditionFailedException exception."
That's the aws-sdk for JavaScript docs, but the same concept applies to any SDK as well as the AWS CLI.
This requires calling get-bot first, which returns the checksum of the bot among other data. Save that checksum and pass it in the params when you call put-bot.
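For example, with boto3 the round trip looks roughly like this (the bot name is a placeholder, and a bot created in the console may have additional fields you need to copy across):

import boto3

lex = boto3.client("lex-models")

# Read the current $LATEST revision; the response includes its checksum.
bot = lex.get_bot(name="OrderFlowers", versionOrAlias="$LATEST")

# Send the definition back together with the checksum so Lex knows which
# revision is being updated; omit the checksum only when creating a new bot.
lex.put_bot(
    name=bot["name"],
    checksum=bot["checksum"],
    locale=bot["locale"],
    childDirected=bot["childDirected"],
    intents=bot["intents"],
    clarificationPrompt=bot["clarificationPrompt"],
    abortStatement=bot["abortStatement"],
    idleSessionTTLInSeconds=bot["idleSessionTTLInSeconds"],
    processBehavior="BUILD",  # or "SAVE" to save without building
)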
I would recommend using the tutorials here: https://docs.aws.amazon.com/lex/latest/dg/gs-console.html
That tutorial demonstrates using the AWS CLI, but the same concepts can be abstracted to use any SDK you desire.
Had the same problem.
I guess once you have published a bot, you can no longer modify or build it.
Create another bot.
I am currently using Protégé 5.0 and have created a very simple ontology (the pizza example). I was wondering how I would export this ontology to DynamoDB on AWS. I was hoping someone could post a link to a good tutorial on Protégé 5.0 or walk me through this. Thanks!
If you are using DynamoDB just to store the content of a file and to be able to access the file at a specific URL, then the process is the same as for any other file type you would store on DynamoDB. The default way for Protégé and most other OWL-related tools to access an ontology is a simple HTTP GET on the provided IRI.
I think the title says it all. We have a site that uses URL parts to specify locale, category, product, and product variation. For example:
/[country-code]/[category-slug]/[product-slug]/[variation-id]
As we support about 10 different locales, and some products have up to 30 variations, changing a category can mean purging up to 1500 URLs.
Is it possible using the Akamai CCU REST API to add a wildcard value, say for the country code, or variation id?
/*/[category-slug]/[product-slug]/*
I have seen some mentions of wildcards around, but I'm not sure if they're supported by the API.
This should be possible. When you go to CCU, click on "Refresh by Directory & File Extension". Hope this helps.
You can do a purge by CPCode.
I've just written a getting started guide for the CCU API:
https://community.akamai.com/community/developer/blog/2015/08/19/getting-started-with-the-v2-open-ccu-api?sr=stream
The only difference for your case is that you'll want to allow purge by CPCode, and set up a CPCode for the file areas you want to purge at once.
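For reference, a purge by CPCode through the v2 CCU API is a single POST. Here is a rough Python sketch using requests plus the edgegrid-python library; the base URL, tokens and CP code are placeholders that come from your own OPEN API credentials:

import requests
from akamai.edgegrid import EdgeGridAuth

session = requests.Session()
session.auth = EdgeGridAuth(
    client_token="YOUR_CLIENT_TOKEN",
    client_secret="YOUR_CLIENT_SECRET",
    access_token="YOUR_ACCESS_TOKEN",
)

# Base URL from your API client credentials (placeholder).
baseurl = "https://akaa-xxxxxxxxxxxx.purge.akamaiapis.net"

# Purge everything under one CP code instead of listing ~1500 URLs.
response = session.post(
    baseurl + "/ccu/v2/queues/default",
    json={
        "type": "cpcode",
        "objects": ["123456"],   # the CP code covering these pages
        "action": "remove",      # or "invalidate"
    },
)
print(response.status_code, response.json())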