aws amplify datastore syncing the whole database - amazon-web-services

In my DynamoDB I have about 200k data points, and there will be more in the future. When I log out, my local data storage gets cleared. When I log in, DataStore starts to sync it with the cloud. The problem is that the syncing takes a really long time for over 200k data points. The data points are sensor data that is displayed on a chart.
My idea is to fetch only the data I need directly from the database, without bloating up my entire local storage.
Is there a way to fetch the data that I need without saving it into the offline storage? I was thinking of using an AWS time series service for my chart data instead.

A syncExpression configuration is required to fetch only the specific data you need.
DOC: https://docs.amplify.aws/lib/datastore/sync/q/platform/js/
import { DataStore, syncExpression } from 'aws-amplify';
import { Post, Comment } from './models';

DataStore.configure({
  syncExpressions: [
    syncExpression(Post, () => {
      return post => post.rating.gt(5);
    }),
    syncExpression(Comment, () => {
      return comment => comment.status.eq('active');
    }),
  ]
});
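As a sketch of how this could apply to the question (the SensorReading model and its timestamp field are assumptions, not names from the question), you could sync only a recent time window instead of all 200k records:

import { DataStore, syncExpression } from 'aws-amplify';
import { SensorReading } from './models'; // hypothetical sensor-data model

// Only keep readings from the last 24 hours in the local store.
const dayAgo = () => new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();

DataStore.configure({
  syncExpressions: [
    syncExpression(SensorReading, () => {
      return reading => reading.timestamp.gt(dayAgo());
    }),
  ]
});

Everything outside that window stays in the cloud and is not copied into the local storage.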

Related

Cloud Function to import data into CloudSQL from cloud storage bucket but getting already schema exist error

I'm trying to import data into a Cloud SQL instance from a Cloud Storage bucket using a Cloud Function.
How can I delete the schemas before importing the data, using a single Cloud Function?
I am using Node.js in the Cloud Function.
error:
error: exit status 3 stdout(capped at 100k bytes): SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET stderr: ERROR: schema "< >" already exists
https://cloud.google.com/sql/docs/mysql/admin-api/rest/v1beta4/instances/import
In the code below, where do I need to put the logic that deletes all existing schemas apart from the public schema?
Entry point : importDatabase
index.js
const {google} = require('googleapis');
const {auth} = require('google-auth-library');

var sqlAdmin = google.sqladmin('v1beta4');

exports.importDatabase = (_req, res) => {
  async function doIt() {
    const authRes = await auth.getApplicationDefault();
    let authClient = authRes.credential;
    var request = {
      project: 'my-project',   // TODO: Update placeholder value.
      instance: 'my-instance', // TODO: Update placeholder value.
      resource: {
        importContext: {
          kind: "sql#importContext",
          fileType: "SQL", // CSV
          uri: <bucket path>,
          database: <database-name>
          // Options for importing data as SQL statements.
          // sqlimportOptions: {
          // /**
        }
      },
      auth: authClient,
    };
    sqlAdmin.instances.import(request, function(err, result) {
      if (err) {
        console.log(err);
      } else {
        console.log(result);
      }
      res.status(200).send("Command completed", err, result);
    });
  }
  doIt();
};
package.json
{
  "name": "import-database",
  "version": "0.0.1",
  "dependencies": {
    "googleapis": "^39.2.0",
    "google-auth-library": "3.1.2"
  }
}
The error looks to be occurring because a previous aborted import managed to transfer the "schema_name" schema, and this subsequent import was done without first re-initializing the DB. Check the helpful document on Cloud SQL import.
One way to prevent this issue is to change the create statements in the SQL file from:
CREATE SCHEMA schema_name;
to
CREATE SCHEMA IF NOT EXISTS schema_name;
As far as removing the currently created schemas is concerned: by default, only user or service accounts with the Cloud SQL Admin (roles/cloudsql.admin) or Owner (roles/owner) role have the permission to delete a Cloud SQL instance. Please check the helpful document on cloudsql.instances.delete to help you understand the next steps. You can also define an IAM custom role for the user or service account that includes the cloudsql.instances.delete permission; this permission is supported in IAM custom roles.
As a best practice for import/export operations, we recommend that you adopt the principle of least privilege, which in this case would mean creating a custom role, adding that specific permission and assigning it to your service account. Alternatively, the service account could be given the "Cloud SQL Admin" role or the "Cloud Composer API Service Agent" role, which include this permission and would therefore allow you to execute this command.
NOTE: It is recommended to revalidate any delete actions you perform, as they may lead to loss of useful data.
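To answer the "where do I delete the schemas" part directly: one possible sketch (assuming a PostgreSQL instance reachable from the function through the pg package, with connection details in environment variables; none of this is in the original code) is to drop the non-public schemas before calling instances.import:

const { Client } = require('pg');

async function dropExtraSchemas() {
  // Connection details are assumed to come from environment variables.
  const client = new Client({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASS,
    database: process.env.DB_NAME,
  });
  await client.connect();

  // List every schema except the public and system schemas, then drop them.
  const { rows } = await client.query(
    `SELECT schema_name FROM information_schema.schemata
     WHERE schema_name NOT IN ('public', 'information_schema')
       AND schema_name NOT LIKE 'pg_%'`
  );
  for (const { schema_name } of rows) {
    await client.query(`DROP SCHEMA "${schema_name}" CASCADE`);
  }
  await client.end();
}

You would call this at the start of doIt(), before sqlAdmin.instances.import(...).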

CRMint - Import audiences from BigQuery to Google Analytics

I am not sure what's the easiest way to import audiences from BigQuery to Google Analytics. I have a BigQuery table called dataset_a.georgia with 2 columns fullVisitorId (string) and predictions (string).
Based on CRMint documentation, there is a worker called GAAudiencesUpdater which is doing what I am looking for.
However, one of the required parameters is a GA audience JSON template. I have a hard time understanding how exactly I should write this JSON, as the basic JSON example is pretty long and hard to understand. I just have 2 columns to import; fullVisitorId is already a default variable in Google Analytics, while I just created a custom dimension called predictions with index 183 and user scope.
I talked to a senior developer who had a bit of experience with CRMint, and he suggested that I not use the GAAudiencesUpdater because of how hard it is to write the GA audience JSON template. He suggested exporting my BigQuery data to a CSV, storing that CSV in Google Cloud Storage, and then importing that CSV into Google Analytics with the GADataImporter worker.
Any suggestions?
They updated the version, so you don't have to import the JSON now.
You can find an example here:
GAAudiencesUpdater
Here's the basic Google Analytics JSON template example:
{
  "audienceType": "STATE_BASED",
  "linkedAdAccounts": [
    {
      "linkedAccountId": "{% AW_ACCOUNT_ID %}",
      "type": "ADWORDS_LINKS"
    }
  ],
  "linkedViews": [
    {% GA_VIEW_ID %}
  ],
  "name": "CRMint Tier GA %(tier)i",
  "stateBasedAudienceDefinition": {
    "includeConditions": {
      "daysToLookBack": 7,
      "segment": "users::condition::ga:dimension{% CD_TIER %}=#%(code)s",
      "isSmartList": false,
      "membershipDurationDays": %(duration)i
    },
    "excludeConditions": {
      "segment": "users::condition::ga:dimension{% CD_TIER %}==dur93j#%(tier)ihg#2d6",
      "exclusionDuration": "TEMPORARY"
    }
  }
}

How to return an entire Datastore table by name using Node.js on a Google Cloud Function

I want to retrieve a table (with all rows) by name. I want to make an HTTP request with something like this in the body: {"table": user}.
I tried this code without success:
'use strict';

const {Datastore} = require('@google-cloud/datastore');

// Instantiates a client
const datastore = new Datastore();

exports.getUsers = (req, res) => {
  // Get list of entities of kind 'users'
  const query = datastore.createQuery('users');
  datastore.runQuery(query)
    .then(results => {
      const customers = results[0];
      console.log('User:');
      customers.forEach(customer => {
        const cusKey = customer[datastore.KEY];
        console.log(cusKey.id);
        console.log(customer);
      });
      res.status(200).json(customers); // return the entities to the caller
    })
    .catch(err => {
      console.error('ERROR:', err);
      res.status(500).send(err.message);
    });
};
Google Datastore is a NoSQL database that works with entities, not tables. What you want is to load all the "records", which are entities identified by keys in Datastore, together with all their "properties", which are the "columns" that you see in the Console. And you want to load them based on the "Kind" name, which is the "table" you are referring to.
Here is a solution for retrieving all the key identifiers and their properties from Datastore, using an HTTP-triggered Cloud Function running in the Node.js 8 environment.
Create a Google Cloud Function and set the trigger to HTTP.
Choose the runtime to be Node.js 8
In index.js replace all the code with this GitHub code.
In package.json add:
{
  "name": "sample-http",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/datastore": "^3.1.2"
  }
}
Under Function to execute add loadDataFromDatastore, since this is the name of the function that we want to execute.
NOTE: This will log all the loaded records into the Stackdriver logs of the Cloud Function. The response for each record is a JSON, therefore you will have to convert the response to a JSON object to get the data you want. Get the idea and modify the code accordingly.
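The linked GitHub code is not reproduced in the answer, but a minimal sketch of what such a loadDataFromDatastore function could look like (taking the kind name from the request body is an assumption) would be:

const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

exports.loadDataFromDatastore = async (req, res) => {
  // e.g. the request body is {"table": "users"}; "table" maps to a Datastore kind.
  const kind = req.body.table;
  const query = datastore.createQuery(kind);
  const [entities] = await datastore.runQuery(query);

  entities.forEach(entity => {
    const key = entity[datastore.KEY];
    console.log(key.id || key.name, entity); // ends up in the Stackdriver logs
  });

  res.status(200).json(entities);
};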

Google Cloud Firestore Triggers

How can I add a new attribute to a newly created document from a Cloud Function that was triggered by the onCreate() Cloud Firestore trigger?
Can one use the same approach to update a document on the client side as well as on the server side, i.e. in Cloud Functions?
Per the docs, you can use event.data.ref to perform operations:
exports.addUserProperty = functions.firestore
  .document('users/{userId}')
  .onCreate(event => {
    // Get an object representing the document
    // e.g. {'name': 'Marie', 'age': 66}
    var data = event.data.data();
    // add a new property to the user object, write it to Firestore
    return event.data.ref.update({
      "born": "Poland"
    });
  });
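For the client-side part of the question: updating an existing document from the app uses the same update() call on a document reference, just without a trigger. A minimal sketch with the web SDK (assuming the app is already initialized and userId is known on the client):

// Client-side (web SDK, namespaced API): update the same document directly.
var db = firebase.firestore();
db.collection('users').doc(userId).update({
  born: 'Poland'
});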

Copying one table to another in DynamoDB

What's the best way to identically copy one table over to a new one in DynamoDB?
(I'm not worried about atomicity).
Create a backup (Backups option) and restore the table with a new table name. That would get all the data into the new table.
Note: this takes a considerable amount of time depending on the table size.
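If you prefer to do that programmatically, here is a rough sketch with the AWS SDK for JavaScript v3 (the region, table names and backup name are placeholders): it creates an on-demand backup, waits for it to become AVAILABLE, then restores it under the new name.

const {
  DynamoDBClient,
  CreateBackupCommand,
  DescribeBackupCommand,
  RestoreTableFromBackupCommand,
} = require("@aws-sdk/client-dynamodb");

const client = new DynamoDBClient({ region: "us-east-1" });
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function copyViaBackup(srcTable, dstTable) {
  // 1. Create an on-demand backup of the source table.
  const { BackupDetails } = await client.send(
    new CreateBackupCommand({ TableName: srcTable, BackupName: `${srcTable}-copy` })
  );

  // 2. Wait until the backup is AVAILABLE before restoring.
  let status = BackupDetails.BackupStatus;
  while (status !== "AVAILABLE") {
    await sleep(5000);
    const { BackupDescription } = await client.send(
      new DescribeBackupCommand({ BackupArn: BackupDetails.BackupArn })
    );
    status = BackupDescription.BackupDetails.BackupStatus;
  }

  // 3. Restore the backup as a new table.
  await client.send(
    new RestoreTableFromBackupCommand({
      BackupArn: BackupDetails.BackupArn,
      TargetTableName: dstTable,
    })
  );
}

copyViaBackup("src_table", "dst_table").catch(console.error);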
I just used the python script, dynamodb-copy-table, making sure my credentials were in some environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), and it worked flawlessly. It even created the destination table for me.
python dynamodb-copy-table.py src_table dst_table
The default region is us-west-2, change it with the AWS_DEFAULT_REGION env variable.
AWS Data Pipeline provides a template which can be used for this purpose: "CrossRegion DynamoDB Copy"
See: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-crossregion-ddb-create.html
The result is a simple pipeline that copies the source table to the destination table.
Although it's called CrossRegion, you can easily use it for the same region as long as the destination table name is different (remember that table names are unique per account and region).
You can use Scan to read the data and save it to the new table.
On the AWS forums a guy from the AWS team posted another approach using EMR: How Do I Duplicate a Table?
Here's one solution to copy all items from one table to another, using just shell scripting, the AWS CLI and jq. It will work OK for smallish tables.
# exit on error
set -eo pipefail
# tables
TABLE_FROM=<table>
TABLE_TO=<table>
# read
aws dynamodb scan \
--table-name "$TABLE_FROM" \
--output json \
| jq "{ \"$TABLE_TO\": [ .Items[] | { PutRequest: { Item: . } } ] }" \
> "$TABLE_TO-payload.json"
# write
aws dynamodb batch-write-item --request-items file://"$TABLE_TO-payload.json"
# clean up
rm "$TABLE_TO-payload.json"
If you want both tables to be identical, you'd want to delete all items in TABLE_TO first.
DynamoDB now supports importing from S3.
https://aws.amazon.com/blogs/database/amazon-dynamodb-can-now-import-amazon-s3-data-into-a-new-table/
So, in probably almost all use cases, the easiest and cheapest way to replicate a table is:
Use the "Export to S3" feature to dump the entire table into S3. Since this uses a backup to generate the dump, the table's throughput is not affected, and it is very fast as well. You need to have backups (PITR) enabled. See https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
Use "Import from S3" to import the dump created in step 1. The import always creates a new table; you cannot import into an existing one. A rough SDK sketch of both steps follows.
Use this Node.js module: copy-dynamodb-table
This is a little script I made to copy the contents of one table to another.
It's based on the AWS-SDK v3. Not sure how well it would scale to big tables but as a quick and dirty solution it does the job.
It gets your AWS credentials from a profile in ~/.aws/credentials; change "default" to the name of the profile you want to use.
Other than that, it takes two args: one for the source table and one for the destination.
const { fromIni } = require("@aws-sdk/credential-providers");
const { DynamoDBClient, ScanCommand, PutItemCommand } = require("@aws-sdk/client-dynamodb");

const ddbClient = new DynamoDBClient({
  credentials: fromIni({ profile: "default" }),
  region: "eu-west-1",
});

const args = process.argv.slice(2);
console.log(args);

async function main() {
  const { Items } = await ddbClient.send(
    new ScanCommand({
      TableName: args[0],
    })
  );
  console.log("Successfully scanned table");
  console.log("Copying", Items.length, "Items");

  const putPromises = [];
  Items.forEach((item) => {
    putPromises.push(
      ddbClient.send(
        new PutItemCommand({
          TableName: args[1],
          Item: item,
        })
      )
    );
  });

  await Promise.all(putPromises);
  console.log("Successfully copied table");
}

main();
Usage
node copy-table.js <source_table_name> <destination_table_name>
Python + boto3 🚀
The script is idempotent as long as you maintain the same keys.
import boto3


def migrate(source, target):
    dynamo_client = boto3.client('dynamodb', region_name='us-east-1')
    dynamo_target_client = boto3.client('dynamodb', region_name='us-west-2')

    dynamo_paginator = dynamo_client.get_paginator('scan')
    dynamo_response = dynamo_paginator.paginate(
        TableName=source,
        Select='ALL_ATTRIBUTES',
        ReturnConsumedCapacity='NONE',
        ConsistentRead=True
    )
    for page in dynamo_response:
        for item in page['Items']:
            dynamo_target_client.put_item(
                TableName=target,
                Item=item
            )


if __name__ == '__main__':
    migrate('awesome-v1', 'awesome-v2')
On November 29th, 2017 Global Tables was introduced. This may be useful depending on your use case, which may not be the same as the original question. Here are a few snippets from the blog post:
Global Tables – You can now create tables that are automatically replicated across two or more AWS Regions, with full support for multi-master writes, with a couple of clicks. This gives you the ability to build fast, massively scaled applications for a global user base without having to manage the replication process.
...
You do not need to make any changes to your existing code. You simply send write requests and eventually consistent read requests to a DynamoDB endpoint in any of the designated Regions (writes that are associated with strongly consistent reads should share a common endpoint). Behind the scenes, DynamoDB implements multi-master writes and ensures that the last write to a particular item prevails. When you use Global Tables, each item will include a timestamp attribute representing the time of the most recent write. Updates are propagated to other Regions asynchronously via DynamoDB Streams and are typically complete within one second (you can track this using the new ReplicationLatency and PendingReplicationCount metrics).
Another option is to download the table as a .csv file and upload it with the following snippet of code.
This also eliminates the need to provide your AWS credentials to a package such as the one @ezzat suggests.
Create a new folder and add the following two files and your exported table
Edit uploadToDynamoDB.js and add the filename of the exported table and your table name
Run npm install in the folder
Run node uploadToDynamoDB.js
File: Package.json
{
  "name": "uploadtodynamodb",
  "version": "1.0.0",
  "description": "",
  "main": "uploadToDynamoDB.js",
  "author": "",
  "license": "ISC",
  "dependencies": {
    "async": "^3.1.1",
    "aws-sdk": "^2.624.0",
    "csv-parse": "^4.8.5",
    "fs": "0.0.1-security",
    "lodash": "^4.17.15",
    "uuid": "^3.4.0"
  }
}
File: uploadToDynamoDB.js
var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');
var _ = require('lodash');
var AWS = require('aws-sdk');

// If your table is in another region, make sure to update this
AWS.config.update({ region: "eu-central-1" });

var ddb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

var csv_filename = "./TABLE_CSV_EXPORT_FILENAME.csv";
var tableName = "TABLENAME";

function prepareData(data_chunk) {
  const items = data_chunk.map(obj => {
    const keys = Object.keys(obj);
    let attr = Object.values(obj);

    attr = attr.map(a => {
      let newAttr;
      // Can we make this an integer
      if (isNaN(Number(a))) {
        newAttr = { "S": a };
      } else {
        newAttr = { "N": a };
      }
      return newAttr;
    });

    let item = _.zipObject(keys, attr);

    return {
      PutRequest: {
        Item: item
      }
    };
  });

  var params = {
    RequestItems: {
      [tableName]: items
    }
  };
  return params;
}

var rs = fs.createReadStream(csv_filename);
var parser = parse({
  columns: true,
  delimiter: ','
}, function (err, data) {
  // BatchWriteItem accepts at most 25 items per request, so split the rows
  // into chunks of 25 before writing.
  var split_arrays = [], size = 25;
  while (data.length > 0) {
    split_arrays.push(data.splice(0, size));
  }

  async.each(split_arrays, function (item_data, callback) {
    const params = prepareData(item_data);
    ddb.batchWriteItem(params, function (err, data) {
      if (err) {
        console.log("Error", err);
      } else {
        console.log("Success", data);
      }
      callback(); // tell async.each this chunk is done
    });
  }, function () {
    // run after loops
    console.log('all data imported....');
  });
});

rs.pipe(parser);
It's been a very long time since the question was posted, and AWS has been continuously improving its features. At the time of writing this answer, we have the option to export the table to an S3 bucket and then use the import feature to import this data from S3 into a new table, which automatically re-creates a new table with the data. Please refer to this blog for more on export & import.
The best part is that you get to change the name, PK or SK.
Note: You have to enable PITR (which might incur additional costs). It's always best to refer to the documentation.
Here is another simple python util script for this: ddb_table_copy.py. I use it often.
usage: ddb_table_copy.py [-h] [--dest-table DEST_TABLE] [--dest-file DEST_FILE] source_table
Copy all DynamoDB items from SOURCE_TABLE to either DEST_TABLE or DEST_FILE. Useful for migrating data during a stack teardown/re-creation.
positional arguments:
source_table Name of source table in DynamoDB.
optional arguments:
-h, --help show this help message and exit
--dest-table DEST_TABLE
Name of destination table in DynamoDB.
--dest-file DEST_FILE
2) a valid file path string to save the items to, e.g. 'items.json'.