Automatically migrate JSON data to newest version of JSON schema

Automatically migrate JSON data to newest version of JSON schema - c++

I have a service running on my linux machine that reads data stored in a .json file when the machine is booting. The service then validates the incoming JSON data and modifies specific system configurations according to the data. The service is written in C++ and for the validation im using https://github.com/pboettch/json-schema-validator.
In development it was easy to modify the JSON schema and just adapt the data manually. I've started to use semantic versioning for my JSON schema and included it the following way:
JSON schema:
{
"$id": "https://my-company.org/schemas/config/0.1.0/config.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
// Start of Schema definition
}
JSON data:
{
"$schema": "https://my-comapny.org/schemas/config/0.1.0/config.schema.json",
// Rest of JSON data
}
With the addition of the version, I am able to check if a version mismatch exists before validating.
What I am looking for is a way to automatically migrate the JSON data to match the newer schema version, if a version mismatch is identified. Is there any way to automatically achieve this, or is the only way to manually edit the JSON data to match the schema?
Since I plan on releasing this as open source I would really like to include some form of automatic migration so I can just ask the user if he wants to migrate to conform to the newest schema version instead of throwing an error, if a version mismatch was identified.

What you're asking for is something which will need to make assumptions to work.
This is an age old problem and similar for databases. You can have schema migrations generated with many simple changes, but this is not viable if you wish to translate existing data automatically too.
Let's look at a basic example. You rename a field.
How would a tool know you've renamed a field vs removed an old one and added a new one? It essentially, cannot.
So, you need to write your migrations by hand.
You could use JSON transformation tools like jq or fx to create migration scripts without writing it in code, which may or may not be preferable. (jq has a steeper learning curve but it's also very powerful.)

Related

TEIID Importing ddl into vdb ddl

Currently my VDB DDL file is getting quite big. I want to split into different files using the following.
IMPORT FROM REPOSITORY "DDL-FILE"
INTO test OPTIONS ("ddl-file" '/path/to/schema1.ddl')
However, this does not seem to work.
Can the DDL file path be relative, how?
The schema test, can it be VIRTUAL?
Does "DDL-FILE" refer to "ddl-file"?
What should I put in my main VDB ddl and what should I put in my extra ddl's. Should the
extra ddl's contain server configuration details or should they be defined as a VDB.
I would like to see a working example on how to use this.
This will be used in a teiid springboot project where you can only load one main vdb file. It is not workable to have one very large ddl file.
I tried multiple approaches but it does not seem to work, either giving me a null pointer with no error codes or error codes that tell me nothing.
Also the syntax in Teiid 9.3 seems different:
IMPORT FOREIGN SCHEMA public
FROM REPOSITORY DDL-FILE
INTO test OPTIONS ("ddl-file" '/path/to/schema.ddl')

This feature is currently not implemented in Teiid Spring Boot. This issue is captured in https://issues.redhat.com/browse/TEIIDSB-219
Update: I added the needed code to master, should be available with 1.7 release meanwhile you can build the master branch and test it out.

Is it possible to export a Lucidchart diagram to json and then import that into draw.io?

I am trying to convert a large Lucidchart diagram that took quite a while to Draw.io. Draw.io recommends ctr-a, ctr-c, ctr-v, but that doesn't seem to be working. Draw.io also cryptically mentions however:
draw.io supports importing the Lucidchart JSON file format. Lucidchart makes it difficult to obtain that data, so the easiest way to import is to copy and paste from editor to editor.
Has anyone ever figured out how to get the this json from Lucidchart?

Essentially, you're asking about Lucidchart's JSON exportability. Lucidchart supports JSON export from their Cloud Insights product - and steps on how to export here.
Note: this is not going to work for the standard ULM or chart style diagrams, and JSON isn't one of the current export options for the standard diagrams.
One thing you could try to hack together would be to connect your Lucidchart account to one of Zapier's two "search" actions, and then use a trigger to send and structure that data to an application like Firebase / Cloud Firestore. Once in the database, you could export the JSON file. (I haven't tried this particular use case before, but have used Zapier to successfully create a JSON tree structure from data coming from multiple applications). Hope this is helpful.

WSO2 Siddhi RDBMS Store Extension - how to set batchEnable to false

I'm using siddhi to create some app which also interacts with PostgreSQL DB. Although I'm not sure, I believe, there is a bug about making multiple updates on the same PG table, within a single event (i.e. upon receiving an event, update a record in the table, and create another one again in the same table) it seems the batch updates are causing some problems. SO, I just want to give it a try after disabling batchUpdate (it is enabled by default). I just don't know how to configure it using siddhi-sdk (via Intellij plugin). There are two related tickets:
https://github.com/wso2-extensions/siddhi-store-rdbms/issues/43
https://github.com/wso2/product-sp/issues/472
Until these are documented, I'd like to get some quick response how to set these fields.
Best regards...

I'm using siddhi to create some app which also interacts with PostgreSQL DB. Although I'm not sure, I believe, there is a bug about making multiple updates on the same PG table, within a single event (i.e. upon receiving an event, update a record in the table, and create another one again in the same table) it seems the batch updates are causing some problems.
When batchEnabled has been set to true, it will perform the insert/update operation on batch of events instead of performing those operations on each and every single event. Simply, this has been introduced to improve the performance.
The default value of this parameter is currently set to "true".
However, batchEnable configurations is done through a system parameter called, "{{RDBMS-Name}}.batchEnable" which have to be configured in the WSO2 Stream Processor's deployment.yaml
If you want to overide this property in Product-SP please find the steps below.
Open the deployment.yaml file located in {Product-SP-Home}/conf/editor/
Insert the following lines in the file.
siddhi:
extensions:
extension:
name: store
namespace: rdbms
properties:
PostgreSQL.batchEnable: true
But currently there is no way to overwrite those system configurations from the siddhi app level. Since you are using the SDK, what you can do is changing the default value of above parameter to "false".
Please find the steps below do it.
Find the siddhi-store-rdbms-4.x.xx.jar file in the siddhi
sdk. This is located in the {siddhi-sdk-home}/lib/ .
Open the jar file using an archive manager and open the
rdbms-table-config.xml file located inside it with a text editor.
Set false in <batchEnable>true</batchEnable> attribute under the
<database name="PostgreSQL"> tag and save it.

Thanks Raveen. with a simple dash (-) before "extension" I was able to set the config.
siddhi:
extensions:
- extension:
name: store
namespace: rdbms
properties:
PostgreSQL.batchEnable: false

When using cloud-init, is it possible to both specify a boot script and a data blob that can be queried from inside the instance?

I have some current instances that get some data by passing a json blob through the user data string. I would like to also pass a script to be run at boot time through the user data. Is there a way to do both of these things? I've looked at cloud-config, but setting an arbitrary value doesn't seem to be one of the options.

You're correct that on EC2, there is only one 'user-data' blob that can be specified. Cloud-init addresses this limitation by allowing the blob to be an "archive" format of sorts.
Mime Multipart
Cloud-config Archive
cloud-config archive is unfortunately not documented now, but there is an example in doc/examples/cloud-config-archive.txt. It is expected to be yaml and start with '#cloud-config-archive'. Note that yaml is a strict superset of json, so any thing that can dump json can be used to produce this yaml.
Both of these formats require changes to all consumers to "share" the shared resource of user-data. cloud-init will ignore mime types that it does not understand, and handle those that it does. You'd have to modify the other application producing and consuming user-data to do the same.

Well, cloud-init supports multi-part MIME. With that in mind you could have your boot script as one part, then a custom mime part. Note that you would need to write a python handler that tells cloud-init what to do with that part (most likely moving it to wherever your app expects it). This handler code ends up in the handlers directory as described here.

Update CloudSearch document using Python boto

I am using the latest boto tools for Python to add and search documents on Amazon CloudSearch. I haven't been able to find any documentation regarding the updates of documents. There is documentation for the old API here: http://boto.readthedocs.org/en/latest/cloudsearch_tut.html. Here, when adding a document you give a version number, and to quote the docs:
If you wish to update a document, you must use a higher version ID.
However, I don't find this feature in the boto namespaces for the new API (the ones with cloudsearch2). The add function no longer takes a version.
Currently what I am doing to update a document is getting it by ID, then adding it again. The logic of updating the fields is on my side.
What would be nice is to add a document with the same ID and higher version number and only fill in the fields that you want overridden, and the document should be updated.
Is there still a way to use the version of a document in the new boto API?

There is no way to use 'version' with the new boto API (cloudsearch2) because that library is built for CloudSearch version 2013-01-01, which removed the version field.
CloudSearch also does not allow you to selectively update certain fields of a document, although I agree that would be useful. This was not possible with the old version either.
This describes the SDF format for submitting documents that boto is implementing for you, in case you want to have a look at the underlying interface.
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/preparing-data.html

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js