How can I add new training phrases to an intent in Dialogflow CX using its API?

I want to know if it's possible to train Dialogflow CX through the API by placing new training phrases in my code (I am using Node.js) and automatically updating the list of phrases in that intent. One thing to add: I want to add a new phrase to the intent's list, not update an existing phrase.
Thank you in advance!
I was reading the Dialogflow CX documentation and found this sample: https://github.com/googleapis/nodejs-dialogflow-cx/blob/main/samples/update-intent.js. But this implementation updates a specific phrase instead of adding a new one to the list.

Using the sample code you provided in your question, I updated it to show how to add a new phrase to the list. newTrainingPhrase contains the training phrase; append newTrainingPhrase to intent[0].trainingPhrases and set updateMask to "training_phrases" to point to the part of the intent you would like to update.
See code below:
'use strict';

async function main(projectId, agentId, intentId, location, displayName) {
  const {IntentsClient} = require('@google-cloud/dialogflow-cx');
  const intentClient = new IntentsClient({apiEndpoint: 'us-central1-dialogflow.googleapis.com'});

  async function updateIntent() {
    const projectId = 'your-project-id';
    const agentId = 'your-agent-id';
    const intentId = 'your-intent-id';
    const location = 'us-central1'; // define your location
    const displayName = 'store.hours'; // define display name
    const agentPath = intentClient.projectPath(projectId);
    const intentPath = `${agentPath}/locations/${location}/agents/${agentId}/intents/${intentId}`;

    // Define your new training phrase.
    const newTrainingPhrase = {
      parts: [
        {
          text: 'What time do you open?',
          parameterId: '',
        },
      ],
      id: '',
      repeatCount: 1,
    };

    // Fetch the current intent and append the new phrase to its list.
    const intent = await intentClient.getIntent({name: intentPath});
    intent[0].trainingPhrases.push(newTrainingPhrase);

    // Restrict the update to the training phrases field.
    const updateMask = {
      paths: ['training_phrases'],
    };
    const updateIntentRequest = {
      intent: intent[0],
      updateMask,
      languageCode: 'en',
    };

    // Send the request to update the intent.
    const result = await intentClient.updateIntent(updateIntentRequest);
    console.log(result);
  }
  updateIntent();
}

process.on('unhandledRejection', err => {
  console.error(err.message);
  process.exitCode = 1;
});

main(...process.argv.slice(2));

Related

Google cloud video intelligence can't annotate multiple features

I've been using Google Cloud Video Intelligence for text detection. Now I want to use it for speech transcription as well, so I added the SPEECH_TRANSCRIPTION feature to TEXT_DETECTION, but the response only contains the result for one feature, the last one.
const gcsUri = 'gs://path-to-the-video-on-gcs'
const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION', 'SPEECH_TRANSCRIPTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
const [operationResult] = await operation.promise();
const annotationResult = operationResult.annotationResults[0]
const textAnnotations = annotationResult.textAnnotations
const speechTranscriptions = annotationResult.speechTranscriptions
console.log(textAnnotations) // --> []
console.log(speechTranscriptions) // --> [{...}]
Is this a case where annotation is performed on only one feature at a time?
Annotation will be performed for both features. Below is example code.
const videoIntelligence = require('@google-cloud/video-intelligence');
const client = new videoIntelligence.VideoIntelligenceServiceClient();
const gcsUri = 'gs://cloud-samples-data/video/JaneGoodall.mp4';

async function analyzeVideoTranscript() {
  const videoContext = {
    speechTranscriptionConfig: {
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
    },
  };
  const request = {
    inputUri: gcsUri,
    features: ['TEXT_DETECTION', 'SPEECH_TRANSCRIPTION'],
    videoContext: videoContext,
  };
  const [operation] = await client.annotateVideo(request);
  console.log('Waiting for operation to complete...');
  const results = await operation.promise();
  // Gets annotations for the video.
  console.log('Result------------------->');
  console.log(results[0].annotationResults);
  var i = 1;
  results[0].annotationResults.forEach(annotationResult => {
    console.log('annotation result no: ' + i + ' =======================>');
    console.log(annotationResult.speechTranscriptions);
    console.log(annotationResult.textAnnotations);
    i++;
  });
}
analyzeVideoTranscript();
N.B.: I have found that annotationResults may not return the results in the same order as the declared features, so you may want to adjust the code accordingly.
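For example, a minimal sketch (assuming the results object from the code above) that picks out each feature's result by checking which field is populated rather than relying on order:
// Select each feature's result by inspecting which field is non-empty.
const annotationResults = results[0].annotationResults;
const textResult = annotationResults.find(r => r.textAnnotations && r.textAnnotations.length > 0);
const speechResult = annotationResults.find(r => r.speechTranscriptions && r.speechTranscriptions.length > 0);
console.log(textResult ? textResult.textAnnotations : 'no text detected');
console.log(speechResult ? speechResult.speechTranscriptions : 'no speech transcribed');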

SAM invoke won't take local env vars

I have a sample SAM application with basic endpoints. I just want to run it locally by:
sam local invoke -e events/event-post-item.json putItemFunction --profile myprofile -n local.json
local.json is as follows:
{
  "getAllItems": {
    "SAMPLE_TABLE": "mywebservices-SampleTable-1BS18COYN2SHV"
  },
  "getById": {
    "SAMPLE_TABLE": "mywebservices-SampleTable-1BS18COYN2SHV"
  },
  "putItem": {
    "SAMPLE_TABLE": "mywebservices-SampleTable-1BS18COYN2SHV"
  }
}
And following is the code for putItemFunction
// Create clients and set shared const values outside of the handler

// Create a DocumentClient that represents the query to add an item
const dynamodb = require('aws-sdk/clients/dynamodb');
const docClient = new dynamodb.DocumentClient();

// Get the DynamoDB table name from environment variables
const tableName = process.env.SAMPLE_TABLE;

/**
 * A simple example includes a HTTP post method to add one item to a DynamoDB table.
 */
exports.putItemHandler = async (event) => {
  const { body, httpMethod, path } = event;
  if (httpMethod !== 'POST') {
    throw new Error(`postMethod only accepts POST method, you tried: ${httpMethod} method.`);
  }
  // All log statements are written to CloudWatch by default. For more information, see
  // https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-logging.html
  console.log('received:', JSON.stringify(event));

  // Get id and name from the body of the request
  const { id, name } = JSON.parse(body);

  // Creates a new item, or replaces an old item with a new item
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html#put-property
  const params = {
    TableName: tableName,
    Item: { id, name },
  };
  await docClient.put(params).promise();

  const response = {
    statusCode: 200,
    body,
  };
  console.log(`response from: ${path} statusCode: ${response.statusCode} body: ${response.body}`);
  return response;
};
I run this and get a "resource not found" error. I have made sure that the profile details are correct.
The problem is with this line in the handler: const tableName = process.env.SAMPLE_TABLE;
If I hard-code the table name here, it works fine. Otherwise the function always ends up with the tableName value "SampleTable"...
It should take the value from the env variables I have provided, not "SampleTable"... What am I doing wrong?
The keys in the local environment file local.json must be the Lambda function names as defined in your SAM template, so it should be "putItemFunction", not "putItem", in your case.

How do I force to open a specific activity after a user set the alarm?

I'm trying to make a dose schedule app: when the alarm the user set goes off, the app shows a page asking whether the user has taken the medicine, and the user chooses snooze or done by swiping ("done" to the left, "snooze" to the right).
I want the app to open automatically from the background when the alarm time arrives.
I've already tried "nativescript-local-notification", but with that approach the user must tap the notification to open the app. I've also read about the NativeScript background service, but it seems to have the same limitation.
Could you point me in the right direction or give me an example?
I've solved it myself. I'm posting the solution here in case it helps someone else.
First, set the alarm.
alarm.helper.js
// Assumes the usual NativeScript imports are available in this module, e.g.:
//   import * as application from 'tns-core-modules/application'
//   import * as utils from 'tns-core-modules/utils/utils'
import * as AlarmReceiver from '@/services/AlarmReceiver' // Do not remove

export const setAlarm = data => {
  const ad = utils.ad
  const context = ad.getApplicationContext()
  const alarmManager = application.android.context.getSystemService(android.content.Context.ALARM_SERVICE)
  const intent = new android.content.Intent(context, io.nerdrun.AlarmReceiver.class)
  const { id, time, title, name } = data

  // Set up the alarm payload.
  intent.putExtra('id', id)
  intent.putExtra('title', title)
  intent.putExtra('name', name)
  intent.putExtra('time', time.toString())

  const pendingIntent = android.app.PendingIntent.getBroadcast(context, id, intent, android.app.PendingIntent.FLAG_UPDATE_CURRENT)
  alarmManager.setExact(alarmManager.RTC_WAKEUP, time.getTime(), pendingIntent)
  console.log('registered alarm')
}
Extend android.content.BroadcastReceiver to create the AlarmReceiver.
AlarmReceiver.js
export const AlarmReceiver = android.content.BroadcastReceiver.extend('io.nerdrun.AlarmReceiver', {
  init: function() {
    console.log('init receiver')
  },
  onReceive: function(context, intent) {
    console.log('You got the receiver man!!')
    const activityIntent = new android.content.Intent(context, com.tns.NativeScriptActivity.class)
    const id = intent.getExtras().getInt('id')
    const title = intent.getExtras().getString('title')
    const name = intent.getExtras().getString('name')
    const time = intent.getExtras().getString('time')
    activityIntent.putExtra('id', id)
    activityIntent.putExtra('title', title)
    activityIntent.putExtra('name', name)
    activityIntent.putExtra('time', time)
    activityIntent.setFlags(android.content.Intent.FLAG_ACTIVITY_NEW_TASK)
    context.startActivity(activityIntent)
  }
})
Register the receiver in your manifest.
AndroidManifest.xml
<receiver android:name="io.nerdrun.AlarmReceiver" />
Of course, you could also extend Activity on Android in your project, but I haven't implemented that.
After the receiver runs, it navigates to the main activity; you can handle whatever you want in app.js as shown below:
app.js
application.on(application.resumeEvent, args => {
  if (args.android) {
    console.log('resume succeed!!!')
    const android = args.android
    const intent = android.getIntent()
    const extras = intent.getExtras()
    if (extras) {
      const id = extras.getInt('id')
      const title = extras.getString('title')
      const name = extras.getString('name')
      const time = extras.getString('time')
      // Pass the alarm data to the page as props.
      const props = { id, title, name, time }
      Vue.prototype.$store = store
      Vue.prototype.$navigateTo(routes.home, { clearHistory: true, props: props })
    }
  }
})

Alexa ASK Lambda bug

I'm trying to make a skill where, after the LaunchRequest, an initial welcome message is played in the StartGame function asking the user for their school; the user then says their school via the SetSchool intent, and the skill responds with a message. Right now there's a bug in the last part, and I don't know how to debug it.
The error:
My code:
/* eslint-disable func-names */
/* eslint-disable dot-notation */
/* eslint-disable new-cap */
/* eslint quote-props: ['error', 'consistent']*/
/**
* This sample demonstrates a simple skill built with the Amazon Alexa Skills
* nodejs skill development kit.
* This sample supports the en-US language.
* The Intent Schema, Custom Slots and Sample Utterances for this skill, as well
* as testing instructions are located at https://github.com/alexa/skill-sample-nodejs-trivia
**/
'use strict';
const Alexa = require('alexa-sdk');
const questions = require('./question');
const ANSWER_COUNT = 4; // The number of possible answers per trivia question.
const GAME_LENGTH = 10; // The number of questions per trivia game.
const GAME_STATES = {
TRIVIA: '_TRIVIAMODE', // Asking trivia questions.
START: '_STARTMODE', // Entry point, start the game.
HELP: '_HELPMODE', // The user is asking for help.
};
const APP_ID = undefined; // TODO replace with your app ID (OPTIONAL)
const languageString = {
'en': {
'translation': {
'QUESTIONS': questions['HS_QUESTIONS_EN_US'],
'GAME_NAME': 'Science Bowl',
'HELP_MESSAGE': 'I will ask you %s multiple choice questions. Respond with the number of the answer. ' +
'For example, say one, two, three, or four. To start a new game at any time, say, start game. ',
'REPEAT_QUESTION_MESSAGE': 'To repeat the last question, say, repeat. ',
'ASK_MESSAGE_START': 'Would you like to start playing?',
...
},
},
};
const newSessionHandlers = {
'LaunchRequest': function () {
this.handler.state = GAME_STATES.START;
this.emitWithState('StartGame', true);
},
'SetSchool': function() {
this.handler.state = GAME_STATES.START;
this.emitWithState('School', true);
},
'AMAZON.StartOverIntent': function () {
this.handler.state = GAME_STATES.START;
this.emitWithState('StartGame', true);
},
'AMAZON.HelpIntent': function () {
this.handler.state = GAME_STATES.HELP;
this.emitWithState('helpTheUser', true);
},
'Unhandled': function () {
const speechOutput = this.t('START_UNHANDLED');
this.emit(':ask', speechOutput, speechOutput);
},
};
...
const startStateHandlers = Alexa.CreateStateHandler(GAME_STATES.START, {
'StartGame': function (newGame) {
let speechOutput = newGame ? this.t('NEW_GAME_MESSAGE', this.t('GAME_NAME')) + this.t('WELCOME_MESSAGE', GAME_LENGTH.toString()) : '';
this.handler.state = GAME_STATES.START;
this.emit(':ask', speechOutput, speechOutput);
},
'School': function(newGame) {
this.handler.state = GAME_STATES.START;
this.response.speak('test');
this.emit(':responseReady');
}
});
exports.handler = function (event, context) {
const alexa = Alexa.handler(event, context);
alexa.appId = APP_ID;
// To enable string internationalization (i18n) features, set a resources object.
alexa.resources = languageString;
alexa.registerHandlers(newSessionHandlers, startStateHandlers, triviaStateHandlers, helpStateHandlers); // these were defined earlier
alexa.execute();
};
I excluded most of the code so it would fit here. I would like to try and debug it but I don't even know how to view the error messages. What do I do?
If you're hosting on AWS Lambda, logs can be found in CloudWatch.
From the AWS console, open CloudWatch, then click the Logs link in the left-hand menu. You should be able to find your Lambda function's log group from there.
That said, your problem seems to be an issue of intent definition by state.
You've already set the state to START, but startStateHandlers doesn't have the SetSchool intent defined.
To fix it, you'd either have to add a SetSchool intent definition to startStateHandlers, OR reset the state to one that does contain the SetSchool intent before emitting your response in the StartGame handler.
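For example, a minimal sketch of the first option (assuming the alexa-sdk v1 handlers shown above; the slot name "school" is a placeholder for whatever your interaction model uses):
const startStateHandlers = Alexa.CreateStateHandler(GAME_STATES.START, {
  'StartGame': function (newGame) {
    // ... unchanged from the code above ...
  },
  'SetSchool': function () {
    // Read the school slot if present (the slot name is an assumption).
    const slots = this.event.request.intent.slots;
    const school = slots && slots.school && slots.school.value ? slots.school.value : 'your school';
    this.handler.state = GAME_STATES.START;
    this.response.speak(`Thanks, I registered ${school}. Would you like to start playing?`);
    this.emit(':responseReady');
  },
});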

How can I import bulk data from a CSV file into DynamoDB?

I am trying to import a CSV file data into AWS DynamoDB.
Here's what my CSV file looks like:
first_name last_name
sri ram
Rahul Dravid
JetPay Underwriter
Anil Kumar Gurram
In which language do you want to import the data? I just wrote a function in Node.js that can import a CSV file into a DynamoDB table. It first parses the whole CSV into an array, splits the array into chunks of 25, and then runs batchWriteItem on each chunk.
Note: DynamoDB only allows writing up to 25 records at a time in a batch write, so we have to split our array into chunks.
var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');
var csv_filename = "YOUR_CSV_FILENAME_WITH_ABSOLUTE_PATH";
rs = fs.createReadStream(csv_filename);
parser = parse({
columns : true,
delimiter : ','
}, function(err, data) {
var split_arrays = [], size = 25;
while (data.length > 0) {
split_arrays.push(data.splice(0, size));
}
data_imported = false;
chunk_no = 1;
async.each(split_arrays, function(item_data, callback) {
ddb.batchWriteItem({
"TABLE_NAME" : item_data
}, {}, function(err, res, cap) {
console.log('done going next');
if (err == null) {
console.log('Success chunk #' + chunk_no);
data_imported = true;
} else {
console.log(err);
console.log('Fail chunk #' + chunk_no);
data_imported = false;
}
chunk_no++;
callback();
});
}, function() {
// run after loops
console.log('all data imported....');
});
});
rs.pipe(parser);
Updated 2019 Javascript code
I didn't have much luck with any of the JavaScript code samples above. Starting with Hassan Siddique's answer above, I've updated it to the latest API, included sample credential code, moved all user config to the top, added uuid()s where missing, and stripped out blank strings.
const fs = require('fs');
const parse = require('csv-parse');
const async = require('async');
const uuid = require('uuid/v4');
const AWS = require('aws-sdk');
// --- start user config ---
const AWS_CREDENTIALS_PROFILE = 'serverless-admin';
const CSV_FILENAME = "./majou.csv";
const DYNAMODB_REGION = 'eu-central-1';
const DYNAMODB_TABLENAME = 'entriesTable';
// --- end user config ---
const credentials = new AWS.SharedIniFileCredentials({
profile: AWS_CREDENTIALS_PROFILE
});
AWS.config.credentials = credentials;
const docClient = new AWS.DynamoDB.DocumentClient({
region: DYNAMODB_REGION
});
const rs = fs.createReadStream(CSV_FILENAME);
const parser = parse({
columns: true,
delimiter: ','
}, function(err, data) {
var split_arrays = [],
size = 25;
while (data.length > 0) {
split_arrays.push(data.splice(0, size));
}
data_imported = false;
chunk_no = 1;
async.each(split_arrays, function(item_data, callback) {
const params = {
RequestItems: {}
};
params.RequestItems[DYNAMODB_TABLENAME] = [];
item_data.forEach(item => {
for (key of Object.keys(item)) {
// An AttributeValue may not contain an empty string
if (item[key] === '')
delete item[key];
}
params.RequestItems[DYNAMODB_TABLENAME].push({
PutRequest: {
Item: {
id: uuid(),
...item
}
}
});
});
docClient.batchWrite(params, function(err, res, cap) {
console.log('done going next');
if (err == null) {
console.log('Success chunk #' + chunk_no);
data_imported = true;
} else {
console.log(err);
console.log('Fail chunk #' + chunk_no);
data_imported = false;
}
chunk_no++;
callback();
});
}, function() {
// run after loops
console.log('all data imported....');
});
});
rs.pipe(parser);
I've created a gem for this.
Now you can install it by running gem install dynamocli, then you can use the command:
dynamocli import your_data.csv --to your_table
Here is the link to the source code: https://github.com/matheussilvasantos/dynamocli
As a lowly dev without perms to create a Data Pipeline, I had to use this JavaScript. Hassan Siddique's code was slightly out of date, but this worked for me:
var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');
const AWS = require('aws-sdk');
const dynamodbDocClient = new AWS.DynamoDB({ region: "eu-west-1" });
var csv_filename = "./CSV.csv";
rs = fs.createReadStream(csv_filename);
parser = parse({
columns : true,
delimiter : ','
}, function(err, data) {
var split_arrays = [], size = 25;
while (data.length > 0) {
//split_arrays.push(data.splice(0, size));
let cur25 = data.splice(0, size)
let item_data = []
for (var i = cur25.length - 1; i >= 0; i--) {
const this_item = {
"PutRequest" : {
"Item": {
// your column names here will vary, but you'll need to define the type
"Title": {
"S": cur25[i].Title
},
"Col2": {
"N": cur25[i].Col2
},
"Col3": {
"N": cur25[i].Col3
}
}
}
};
item_data.push(this_item)
}
split_arrays.push(item_data);
}
data_imported = false;
chunk_no = 1;
async.each(split_arrays, (item_data, callback) => {
const params = {
RequestItems: {
"tagPerformance" : item_data
}
}
dynamodbDocClient.batchWriteItem(params, function(err, res, cap) {
if (err === null) {
console.log('Success chunk #' + chunk_no);
data_imported = true;
} else {
console.log(err);
console.log('Fail chunk #' + chunk_no);
data_imported = false;
}
chunk_no++;
callback();
});
}, () => {
// run after loops
console.log('all data imported....');
});
});
rs.pipe(parser);
You can use AWS Data Pipeline, which is made for things like this. You can upload your CSV file to S3 and then use Data Pipeline to retrieve it and populate a DynamoDB table. They have a step-by-step tutorial.
I wrote a tool to do this using parallel execution that requires no dependencies or developer tooling installed on the machine (it's written in Go).
It can handle:
Comma separated (CSV) files
Tab separated (TSV) files
Large files
Local files
Files on S3
Parallel imports within AWS using AWS Step Functions to import > 4M rows per minute
No dependencies (no need for .NET, Python, Node.js, Docker, AWS CLI etc.)
It's available for MacOS, Linux, Windows and Docker: https://github.com/a-h/ddbimport
Here are the results of my tests, showing that it can import a lot faster in parallel using AWS Step Functions.
I'm describing the tool in more detail at AWS Community Summit on the 15th May 2020 at 1155 BST - https://www.twitch.tv/awscomsum
Before getting to my code, some notes on testing this locally
I recommend using a local version of DynamoDB, in case you want to sanity-check this before you start incurring charges. I made some small modifications before posting this, so be sure to test with whatever means make sense to you. There is a fake batch upload job I commented out, which you could use in lieu of any DynamoDB service, remote or local, to verify in stdout that this is working for your needs.
dynamodb-local
See dynamodb-local on npmjs or manual install
If you went the manual install route, you can start dynamodb-local with something like this:
java -Djava.library.path=<PATH_TO_DYNAMODB_LOCAL>/DynamoDBLocal_lib/\
-jar <PATH_TO_DYNAMODB_LOCAL>/DynamoDBLocal.jar\
-inMemory\
-sharedDb
The npm route may be simpler.
dynamodb-admin
Along with that, see dynamodb-admin.
I installed dynamodb-admin with npm i -g dynamodb-admin. It can then be run with:
dynamodb-admin
Using them:
dynamodb-local defaults to localhost:8000.
dynamodb-admin is a web page that defaults to localhost:8001. Once you launch these two services, open localhost:8001 in your browser to view and manipulate the database.
The script below doesn't create the database. Use dynamodb-admin for this.
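Alternatively, if you prefer the CLI over dynamodb-admin, here is a rough sketch for creating the table against the local endpoint (the table name Provider and the id partition key match the script below; adjust them to your own schema):
aws dynamodb create-table \
  --endpoint-url http://localhost:8000 \
  --table-name Provider \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST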
Credit goes to...
Ben Nadel.
The code
I'm not as experienced with JS & Node.js as I am with other languages, so please forgive any JS faux pas.
You'll notice each group of concurrent batches is purposely slowed down by 900ms. This was a hacky solution, and I'm leaving it here to serve as an example (and because of laziness, and because you're not paying me).
If you increase MAX_CONCURRENT_BATCHES, you will want to calculate the appropriate delay amount based on your WCU, item size, batch size, and the new concurrency level.
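As a rough illustration of that calculation (a sketch only; the provisioned WCU figure and the ~1 KB item size are assumptions):
// Each item up to 1 KB costs 1 WCU, so one round of concurrent batches consumes
// roughly MAX_CONCURRENT_BATCHES * MAX_RECORDS_PER_BATCH write capacity units.
const PROVISIONED_WCU = 100                                          // your table's write capacity (assumed)
const wcuPerRound = MAX_CONCURRENT_BATCHES * MAX_RECORDS_PER_BATCH   // e.g. 4 * 25 = 100
const delayMs = Math.ceil((wcuPerRound / PROVISIONED_WCU) * 1000)    // ~1000 ms between rounds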
Another approach would be to turn on Auto Scaling and implement exponential backoff for each failed batch. Like I mention below in one of the comments, this really shouldn't be necessary with some back-of-the-envelope calculations to figure out how many writes you can actually do, given your WCU limit and data size, and just let your code run at a predictable rate the entire time.
You might wonder why I didn't just let AWS SDK handle concurrency. Good question. Probably would have made this slightly simpler. You could experiment by applying the MAX_CONCURRENT_BATCHES to the maxSockets config option, and modifying the code that creates arrays of batches so that it only passes individual batches forward.
/**
* Uploads CSV data to DynamoDB.
*
* 1. Streams a CSV file line-by-line.
* 2. Parses each line to a JSON object.
* 3. Collects batches of JSON objects.
* 4. Converts batches into the PutRequest format needed by AWS.DynamoDB.batchWriteItem
* and runs 1 or more batches at a time.
*/
const AWS = require("aws-sdk")
const chalk = require('chalk')
const fs = require('fs')
const split = require('split2')
const uuid = require('uuid')
const through2 = require('through2')
const { Writable } = require('stream');
const { Transform } = require('stream');
const CSV_FILE_PATH = __dirname + "/../assets/whatever.csv"
// A whitelist of the CSV columns to ingest.
const CSV_KEYS = [
"id",
"name",
"city"
]
// Inadequate WCU will cause "insufficient throughput" exceptions, which in this script are not currently
// handled with retry attempts. Retries are not necessary as long as you consistently
// stay under the WCU, which isn't that hard to predict.
// The number of records to pass to AWS.DynamoDB.DocumentClient.batchWrite
// See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
const MAX_RECORDS_PER_BATCH = 25
// The number of batches to upload concurrently.
// https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html
const MAX_CONCURRENT_BATCHES = 1
// MAKE SURE TO LAUNCH `dynamodb-local` EXTERNALLY FIRST IF USING LOCALHOST!
AWS.config.update({
region: "us-west-1"
,endpoint: "http://localhost:8000" // Comment out to hit live DynamoDB service.
});
const db = new AWS.DynamoDB()
// Create a file line reader.
var fileReaderStream = fs.createReadStream(CSV_FILE_PATH)
var lineReaderStream = fileReaderStream.pipe(split())
var linesRead = 0
// Attach a stream that transforms text lines into JSON objects.
var skipHeader = true
var csvParserStream = lineReaderStream.pipe(
through2(
{
objectMode: true,
highWaterMark: 1
},
function handleWrite(chunk, encoding, callback) {
// ignore CSV header
if (skipHeader) {
skipHeader = false
callback()
return
}
linesRead++
// transform line into stringified JSON
const values = chunk.toString().split(',')
const ret = {}
CSV_KEYS.forEach((keyName, index) => {
ret[keyName] = values[index]
})
ret.line = linesRead
console.log(chalk.cyan.bold("csvParserStream:",
"line:", linesRead + ".",
chunk.length, "bytes.",
ret.id
))
callback(null, ret)
}
)
)
// Attach a stream that collects incoming json lines to create batches.
// Outputs an array (<= MAX_CONCURRENT_BATCHES) of arrays (<= MAX_RECORDS_PER_BATCH).
var batchingStream = (function batchObjectsIntoGroups(source) {
var batchBuffer = []
var idx = 0
var batchingStream = source.pipe(
through2.obj(
{
objectMode: true,
writableObjectMode: true,
highWaterMark: 1
},
function handleWrite(item, encoding, callback) {
var batchIdx = Math.floor(idx / MAX_RECORDS_PER_BATCH)
if (idx % MAX_RECORDS_PER_BATCH == 0 && batchIdx < MAX_CONCURRENT_BATCHES) {
batchBuffer.push([])
}
batchBuffer[batchIdx].push(item)
if (MAX_CONCURRENT_BATCHES == batchBuffer.length &&
MAX_RECORDS_PER_BATCH == batchBuffer[MAX_CONCURRENT_BATCHES-1].length)
{
this.push(batchBuffer)
batchBuffer = []
idx = 0
} else {
idx++
}
callback()
},
function handleFlush(callback) {
if (batchBuffer.length) {
this.push(batchBuffer)
}
callback()
}
)
)
return (batchingStream);
})(csvParserStream)
// Attach a stream that transforms batch buffers to collections of DynamoDB batchWrite jobs.
var databaseStream = new Writable({
objectMode: true,
highWaterMark: 1,
write(batchBuffer, encoding, callback) {
console.log(chalk.yellow(`Batch being processed.`))
// Create `batchBuffer.length` batchWrite jobs.
var jobs = batchBuffer.map(batch =>
buildBatchWriteJob(batch)
)
// Run multiple batch-write jobs concurrently.
Promise
.all(jobs)
.then(results => {
console.log(chalk.bold.red(`${batchBuffer.length} batches completed.`))
})
.catch(error => {
console.log( chalk.red( "ERROR" ), error )
callback(error)
})
.then( () => {
console.log( chalk.bold.red("Resuming file input.") )
setTimeout(callback, 900) // slow down the uploads. calculate this based on WCU, item size, batch size, and concurrency level.
})
// return false
}
})
batchingStream.pipe(databaseStream)
// Builds a batch-write job that runs as an async promise.
function buildBatchWriteJob(batch) {
let params = buildRequestParams(batch)
// This was being used temporarily prior to hooking up the script to any dynamo service.
// let fakeJob = new Promise( (resolve, reject) => {
// console.log(chalk.green.bold( "Would upload batch:",
// pluckValues(batch, "line")
// ))
// let t0 = new Date().getTime()
// // fake timing
// setTimeout(function() {
// console.log(chalk.dim.yellow.italic(`Batch upload time: ${new Date().getTime() - t0}ms`))
// resolve()
// }, 300)
// })
// return fakeJob
let promise = new Promise(
function(resolve, reject) {
let t0 = new Date().getTime()
let printItems = function(msg, items) {
console.log(chalk.green.bold(msg, pluckValues(batch, "id")))
}
let processItemsCallback = function (err, data) {
if (err) {
console.error(`Failed at batch: ${pluckValues(batch, "line")}, ${pluckValues(batch, "id")}`)
console.error("Error:", err)
reject()
} else {
var params = {}
params.RequestItems = data.UnprocessedItems
var numUnprocessed = Object.keys(params.RequestItems).length
if (numUnprocessed != 0) {
console.log(`Encountered ${numUnprocessed}`)
printItems("Retrying unprocessed items:", params)
db.batchWriteItem(params, processItemsCallback)
} else {
console.log(chalk.dim.yellow.italic(`Batch upload time: ${new Date().getTime() - t0}ms`))
resolve()
}
}
}
db.batchWriteItem(params, processItemsCallback)
}
)
return (promise)
}
// Build request payload for the batchWrite
function buildRequestParams(batch) {
var params = {
RequestItems: {}
}
params.RequestItems.Provider = batch.map(obj => {
let item = {}
CSV_KEYS.forEach((keyName, index) => {
if (obj[keyName] && obj[keyName].length > 0) {
item[keyName] = { "S": obj[keyName] }
}
})
return {
PutRequest: {
Item: item
}
}
})
return params
}
function pluckValues(batch, fieldName) {
var values = batch.map(item => {
return (item[fieldName])
})
return (values)
}
Here's my solution. I relied on the fact that there was some type of header indicating which column did what. Simple and straightforward. No Data Pipeline nonsense for a quick upload.
import os, json, csv, yaml, time
from tqdm import tqdm

# For Database
import boto3

# Variable store
environment = {}

# Environment variables
with open("../env.yml", 'r') as stream:
    try:
        environment = yaml.load(stream)
    except yaml.YAMLError as exc:
        print(exc)

# Get the service resource.
dynamodb = boto3.resource('dynamodb',
                          aws_access_key_id=environment['AWS_ACCESS_KEY'],
                          aws_secret_access_key=environment['AWS_SECRET_KEY'],
                          region_name=environment['AWS_REGION_NAME'])

# Instantiate a table resource object without actually
# creating a DynamoDB table. Note that the attributes of this table
# are lazy-loaded: a request is not made nor are the attribute
# values populated until the attributes
# on the table resource are accessed or its load() method is called.
table = dynamodb.Table('data')

# Header
header = []

# Open CSV
with open('export.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')

    # Parse each line
    with table.batch_writer() as batch:
        for index, row in enumerate(tqdm(reader)):
            if index == 0:
                # save the header to be used as the keys
                header = row
            else:
                if row == "":
                    continue

                # Create JSON object and push to DynamoDB
                data = {}

                # Iterate over each column
                for index, entry in enumerate(header):
                    data[entry.lower()] = row[index]

                response = batch.put_item(
                    Item=data
                )

# Repeat
Another quick workaround is to load your CSV into RDS or any other MySQL instance first, which is quite easy to do (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html), and then use DMS (AWS Database Migration Service) to load the entire dataset into DynamoDB. You'll have to create a role for DMS before you can load the data, but this works wonderfully without having to run any scripts.
I used https://github.com/GorillaStack/dynamodb-csv-export-import. It is super simple and worked like a charm. I just followed the instructions in the README:
# Install globally
npm i -g @gorillastack/dynamodb-csv-export-import
# Set AWS region
export AWS_DEFAULT_REGION=us-east-1
# Use it for your CSV and dynamo table
dynamodb-csv-export-import my-exported-file.csv MyDynamoDbTableName
Here's a simpler solution. And with this solution, you don't have to remove empty string attributes.
require('./env'); //contains aws secret/access key
const parse = require('csvtojson');
const AWS = require('aws-sdk');
// --- start user config ---
const CSV_FILENAME = __dirname + "/002_subscribers_copy_from_db.csv";
const DYNAMODB_TABLENAME = '002-Subscribers';
// --- end user config ---
//You could add your credentials here or you could
//store it in process.env like I have done aws-sdk
//would detect the keys in the environment
AWS.config.update({
region: process.env.AWS_REGION
});
const db = new AWS.DynamoDB.DocumentClient({
convertEmptyValues: true
});
(async ()=>{
const json = await parse().fromFile(CSV_FILENAME);
//this is efficient enough if you're processing small
//amounts of data. If your data set is large then I
//suggest using dynamodb method .batchWrite() and send
//in data in chunks of 25 (the limit) and find yourself
//a more efficient loop if there is one
for(var i=0; i<json.length; i++){
console.log(`processing item number ${i+1}`);
let query = {
TableName: DYNAMODB_TABLENAME,
Item: json[i]
};
await db.put(query).promise();
/**
* Note: If "json" contains other nested objects, you would have to
* loop through the json and parse all child objects.
* likewise, you would have to convert all children into their
* native primitive types because everything would be represented
* as a string.
*/
}
console.log('\nDone.');
})();
One way of importing/exporting stuff:
"""
Batch-writes data from a file to a dynamo-db database.
"""
import json
import boto3
# Get items from DynamoDB table like this:
# aws dynamodb scan --table-name <table-name>
# Create dynamodb client.
client = boto3.client(
'dynamodb',
aws_access_key_id='',
aws_secret_access_key=''
)
with open('', 'r') as file:
data = json.loads(file.read())['Items']
# Execute write-data request for each item.
for item in data:
client.put_item(
TableName='',
Item=item
)
The simplest solution is probably to use a template / solution made by AWS:
Implementing bulk CSV ingestion to Amazon DynamoDB
https://aws.amazon.com/blogs/database/implementing-bulk-csv-ingestion-to-amazon-dynamodb/
With this approach, you use the provided template to create a CloudFormation stack that includes an S3 bucket, a Lambda function, and a new DynamoDB table. The Lambda function is triggered on upload to the S3 bucket and inserts into the table in batches.
In my case, I wanted to insert into an existing table, so I just changed the Lambda function's environment variable once the stack was created.
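If you take the same route, a rough sketch of changing that variable from the AWS CLI (the function name and the environment variable name here are placeholders; check the stack's template for the real ones, and note that this call replaces the function's entire environment variable map):
aws lambda update-function-configuration \
  --function-name csv-to-dynamodb-import-function \
  --environment "Variables={TABLE_NAME=MyExistingTable}"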
Follow the instructions in the following link to import data into existing tables in DynamoDB:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.LoadData.html
Please note, the name of the table is what you must find here:
https://console.aws.amazon.com/dynamodbv2/home
The table name is used inside the JSON file; the name of the JSON file itself is not important. For example, I have a table called Country-kdezpod7qrap7nhpjghjj-staging, so to import data into that table I must make a JSON file like this:
{
  "Country-kdezpod7qrap7nhpjghjj-staging": [
    {
      "PutRequest": {
        "Item": {
          "id": {
            "S": "ir"
          },
          "__typename": {
            "S": "Country"
          },
          "createdAt": {
            "S": "2021-01-04T12:32:09.012Z"
          },
          "name": {
            "S": "Iran"
          },
          "self": {
            "N": "1"
          },
          "updatedAt": {
            "S": "2021-01-04T12:32:09.012Z"
          }
        }
      }
    }
  ]
}
If you don't know how to create the items for each PutRequest, you can create an item in your DB with a mutation and then try to duplicate it; that will show you the structure of one item:
If you have a huge list of items in your CSV file, you can use the following npm tool to generate the JSON file:
https://www.npmjs.com/package/json-dynamo-putrequest
Then we can use the following command to import the data:
aws dynamodb batch-write-item --request-items file://Country.json
If it imports the data successfully, you should see the following output:
{
  "UnprocessedItems": {}
}
Also please note that with this method you can only have 25 PutRequest items in your array. So if you want to push 100 items you need to create 4 files.
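A small helper sketch for that splitting step (hypothetical file name; it assumes Country-items.json holds the full array of PutRequest objects generated above):
// Split an array of PutRequest objects into files of 25 items each,
// so each file can be passed to `aws dynamodb batch-write-item`.
const fs = require('fs');
const items = require('./Country-items.json');
const TABLE = 'Country-kdezpod7qrap7nhpjghjj-staging';
for (let i = 0; i < items.length; i += 25) {
  const chunk = { [TABLE]: items.slice(i, i + 25) };
  fs.writeFileSync(`Country-${i / 25}.json`, JSON.stringify(chunk, null, 2));
}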
You can try using batch writes and multiprocessing to speed up your bulk import.
import csv
import time
import boto3
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)

current_milli_time = lambda: int(round(time.time() * 1000))
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table_name')

def add_users_in_batch(data):
    with table.batch_writer() as batch:
        for item in data:
            batch.put_item(Item=item)

def run_batch_migration():
    start = current_milli_time()
    row_count = 0
    batch = []
    batches = []
    with open(CSV_PATH, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t', quotechar='|')
        for row in reader:
            row_count += 1
            item = {
                'email': row[0],
                'country': row[1]
            }
            batch.append(item)
            if row_count % 25 == 0:
                batches.append(batch)
                batch = []
        batches.append(batch)
        pool.map(add_users_in_batch, batches)

    print('Number of rows processed - ', str(row_count))
    end = current_milli_time()
    print('Total time taken for migration : ', str((end - start) / 1000), ' secs')

if __name__ == "__main__":
    run_batch_migration()
Try this. It is quite simple and helpful.
You can now natively bulk import into DynamoDB in CSV, DynamoDB JSON or Amazon Ion formats. This requires your data to be present in an S3 bucket. No code required.
blog - https://aws.amazon.com/blogs/database/amazon-dynamodb-can-now-import-amazon-s3-data-into-a-new-table/
docs - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataImport.HowItWorks.html
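For reference, a rough sketch of kicking off such an import from the AWS CLI (the bucket, key prefix, table name, and key schema are placeholders; see the docs above for the full set of options):
aws dynamodb import-table \
  --s3-bucket-source S3Bucket=my-bucket,S3KeyPrefix=imports/my-data.csv \
  --input-format CSV \
  --table-creation-parameters '{
      "TableName": "MyImportedTable",
      "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
      "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
      "BillingMode": "PAY_PER_REQUEST"
  }'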
Key considerations while using this native feature, particularly for CSV data:
You can specify the table's Partition Key (PK)/Sort Key (SK) and their data types, and all other CreateTable parameters
The feature currently supports only importing into a new table each time
Data with the same PK and SK will be overwritten (similar to a PutItem operation)
Except for the PK and SK, all other fields in the CSV will be treated as DynamoDB Strings. If this is not desirable, you can convert the data into DynamoDB JSON/Amazon Ion format with explicit data types before importing
Any Global Secondary Indexes created as part of the ImportTable operation will be populated free of cost. Import cost depends on the uncompressed source data size.
GSIs created at import time will also map data types as per the source data; all non-key attributes will still be treated as DynamoDB Strings
ImportTable consumes no write capacity on the table, so you could create the table with 1 WCU and the import performance will be the same as an ImportTable performed on a table with 100K WCU