Update only one column in a GCP Datastore table - google-cloud-platform

I want to update only one column in a GCP Datastore table.
For example: the table has the columns id, name, descriptions, price, data.
I receive data to update only descriptions. I want to update only the descriptions column without reading the other data (I want to avoid a read before write).
Is it possible to update only one column in Datastore without reading the data from Datastore?
If not, what other database in GCP allows this?

Cloud Datastore is a document database that stores entities; there are no fixed columns or schema. Instead, each entity can have a different set of properties, which are similar to columns in a traditional relational database. Check this document for more information.
You cannot update specific properties of an entity. As this documentation says, you have to update the entire entity. To update a specific property, you need to retrieve the entire entity, modify the desired property, and then write the entire entity back to the database.
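A minimal sketch of that read-modify-write cycle, using the google-cloud-datastore Python client (the kind, key ID, and property names below are assumptions based on the question's example, and the sketch assumes the entity already exists):

from google.cloud import datastore

client = datastore.Client()
key = client.key("Product", 12345)  # hypothetical kind and numeric ID

# A transaction keeps the read and the write atomic, so a concurrent
# update to another property is not silently overwritten.
with client.transaction():
    entity = client.get(key)                     # read the entire entity
    entity["descriptions"] = "new description"   # change one property
    client.put(entity)                           # write the entire entity back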

Related

How to fetch the latest schema change in BigQuery and restore a deleted column within 7 days

Right now I fetch the columns and data types of BQ tables via the command below:
SELECT COLUMN_NAME, DATA_TYPE
FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name="User"
But if I drop a column using the command: ALTER TABLE User DROP COLUMN blabla;
the column blabla is not actually deleted within 7 days (TTL), according to the official documentation.
After running that command, the column is still present in the schema as well as in the table Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.
It is just that I cannot insert data into that column or view it in the GCP console. This inconsistency really causes an issue.
I want to write a bash script to monitor schema changes and perform operations based on them, so I need more visibility into BigQuery table schemas. The least I need is:
for Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS to store a flag column that indicates deleted or TTL:7days.
My questions are:
How can I fetch the correct schema in BigQuery, one that reflects the recently deleted column?
If the column is not actually deleted, is there any way to easily restore it?
If you want to find the recently deleted column, you can try searching through Cloud Logging. If you want to use Bash, you can use gcloud to fetch the logs, though it will be difficult to parse the output to get the information you want.
The command below fetches the logs for google.cloud.bigquery.v2.JobService.InsertJob (since an ALTER TABLE is considered an InsertJob) and filters on the actual query text where it says drop. The regex I used is not strict (for the sake of example); I suggest making it stricter.
gcloud logging read 'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table .*drop.*"'
In a sample log entry returned by the command above, the query text shows that the column PADDING was dropped.
If you have options other than Bash, I suggest creating a BigQuery sink for your logs; you can then run queries there to get this information. You can also use client libraries (Python, Node.js, etc.) to either query the sink or query Cloud Logging directly.
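For instance, a minimal Python sketch using the google-cloud-logging client, reusing the same filter as the gcloud command above:

from google.cloud import logging

client = logging.Client()
log_filter = (
    'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" '
    'AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table .*drop.*"'
)
for entry in client.list_entries(filter_=log_filter):
    # Each entry is an audit-log record; the payload holds the job details,
    # including the query text that dropped the column.
    print(entry.timestamp, entry.payload)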
As per this SO answer, you can use BigQuery's time travel feature to query the deleted column. That answer also explains BigQuery's behavior of retaining a deleted column for 7 days, and gives a workaround to delete the column instantly. See the actual query used to retrieve the deleted column, and the workaround for deleting a column, at the previously provided link.
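As an illustration of the time-travel approach (the table name comes from the question; the one-hour interval is an assumption), a query of this shape reads the table as it existed before the drop, when the column was still visible:

from google.cloud import bigquery

bq = bigquery.Client()
sql = """
SELECT *
FROM `Dataset.User`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
# Rows include the dropped column as it was at the chosen timestamp.
for row in bq.query(sql).result():
    print(dict(row))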

Perform data mapping in GCP

I have data coming from multiple hotels. These hotels do not use the same naming convention for storing order information. I have a predefined dataset created in BigQuery (called hotel_order). I want to map the data coming from the different hotels to this single dataset in GCP, so it is easier for me to do comparisons in BigQuery.
If a column name from hotel1 matches a column name in the BigQuery dataset, then BigQuery should load the data into that column; if the column names (from the hotel orders data and the dataset in BigQuery) don't match, then the column in BigQuery should have a null value. How do I implement this mapping in GCP?
If you want to join tables together and show a null value where a match doesn't exist, you can do so using a LEFT JOIN.
Rough example:
SELECT main.*, Hotel_One.order_information
FROM hotel.orders AS main LEFT JOIN hotel_number_one AS Hotel_One ON main.order_information = Hotel_One.order_information
It is difficult to give a more detailed answer without more details or a working example, e.g. on dbfiddle.

How to get rid of __key__ columns in BigQuery table for every 'Record' Type field?

For every 'Record' type field of my Firestore table, BigQuery automatically adds __key__ columns. I do not want these added for each 'Record' type field. How can I get rid of these extra columns automatically added by BigQuery? (I want to get rid of the columns highlighted in yellow in my BigQuery table schema.)
This is intended behavior. Citing the BigQuery GCP documentation:
Each document in Firestore has a unique key that contains information such as the document ID and the document path. BigQuery creates a RECORD data type (also known as a STRUCT) for the key, with nested fields for each piece of information, as described in the following table.
Because the Firestore export method is fully integrated with GCP's managed import and export service, you can't change this behavior, i.e. you can't stop the __key__.* properties from being created for each RECORD field in the target BigQuery table.
I suspect that in your use case, modifying the BigQuery table will require some hands-on intervention, since it means manually changing the schema data.
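One hypothetical workaround (the dataset and table names below are invented) is to hide at least the top-level __key__ RECORD behind a view. Note that nested __key__ fields inside other RECORD columns would have to be projected out explicitly, since SELECT * EXCEPT only applies to top-level columns:

from google.cloud import bigquery

bq = bigquery.Client()
# Create a view over the exported table that drops the top-level
# __key__ RECORD; the underlying export table is left untouched.
bq.query("""
CREATE OR REPLACE VIEW `my_dataset.orders_view` AS
SELECT * EXCEPT (__key__)
FROM `my_dataset.orders_raw`
""").result()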
To make this behavior configurable, I would encourage you to raise a feature request with the vendor via the Google public issue tracker.

Automatically generate data documentation in the Redshift cluster

I am trying to automatically generate data documentation in the Redshift cluster for all the maintained data products, but I am having trouble doing so.
Is there a way to fetch/store metadata about tables/columns in Redshift directly?
Is there also some automatic way to determine what are the unique keys in a Redshift table?
For example, an ideal solution would have:
Table location (cluster, schema, etc.)
Table description (what is the table for)
Each column's description (what is each column for, data type, is it a key column, if so what type, etc.)
Column's distribution (min, max, median, mode, etc.)
Columns which together form a unique entry in the table
I fully understand that generating the descriptions automatically is pretty much impossible, but I couldn't find a way to store the descriptions in Redshift directly; instead I'd have to use third-party solutions, or generally keep the documentation outside the SQL scripts, which I'm not a big fan of given the way the data products are built right now. So a way to store each table's/column's description in Redshift would be greatly appreciated.
Amazon Redshift has the ability to store a COMMENT on:
TABLE
COLUMN
CONSTRAINT
DATABASE
VIEW
You can use these comments to store descriptions. Reading them back might need a bit of table joining.
See: COMMENT - Amazon Redshift
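A minimal sketch of both directions, assuming a psycopg2 connection and hypothetical table/column names: COMMENT stores a description, and pg_description joined to pg_class and pg_attribute reads it back.

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", dbname="dev",
                        user="admin", password="...", port=5439)
cur = conn.cursor()

# Store descriptions on a table and one of its columns.
cur.execute("COMMENT ON TABLE sales IS 'Daily sales fact table'")
cur.execute("COMMENT ON COLUMN sales.price IS 'Unit price in USD'")

# Read them back: join the system catalogs to resolve the comments.
# Table-level comments have objsubid = 0, so column_name is NULL for them.
cur.execute("""
SELECT c.relname  AS table_name,
       a.attname  AS column_name,
       d.description
FROM pg_description d
JOIN pg_class c ON c.oid = d.objoid
LEFT JOIN pg_attribute a
       ON a.attrelid = d.objoid AND a.attnum = d.objsubid
WHERE c.relname = 'sales'
""")
for row in cur.fetchall():
    print(row)
conn.commit()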

Data Model in DynamoDB

When using Mobile Hub (AWS) to build a DynamoDB table, there is at some point the option to download the Data Model for the table. But we do not see this option (AFAIK) if we do not use Mobile Hub. So the question is: is there a way to get the Data Model for a table when not using Mobile Hub?
Just to clarify: DynamoDB doesn't have a full data model like an RDBMS. However, it does have the partition (hash) key, the sort (range) key if defined, and all the index details.
You can get this information using the DescribeTable API. The API returns the output in JSON format. Kindly look at the link for more information.
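A minimal sketch using boto3 (the table name is hypothetical): DescribeTable returns the key schema, the attribute definitions for the key attributes, and the index details as JSON.

import json
import boto3

client = boto3.client("dynamodb")
response = client.describe_table(TableName="Orders")

table = response["Table"]
print(json.dumps(table["KeySchema"], indent=2))             # partition/sort keys
print(json.dumps(table["AttributeDefinitions"], indent=2))  # key attribute types
# Secondary indexes, if any, are included as well:
print(json.dumps(table.get("GlobalSecondaryIndexes", []), indent=2, default=str))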
Please note that non-key attributes are not included in the data model. This is a basic concept of NoSQL databases, and it is part of the flexibility NoSQL offers compared to an RDBMS:
The item structure (non-key attributes) need not be defined while creating the table. In fact, DynamoDB doesn't allow you to define non-key attributes while creating the table.
The non-key attributes in one item need not be the same as in another item.