UpdateExpression Increment value of nested map within list

I have the following table structure:
{myHashKey: x, someList: [{info: uniqueID, myAttribute: number, ...}, ...]}
I'd like to increment that number for a given info sub-attribute.
For example, say I have the following item inserted in DynamoDB:
{myHashKey: 'xxxx', someList: [{info: 'a', myAttribute: 1}, {info: 'b', myAttribute: 42}]}
What UpdateExpression would I have to perform in order to increment myAttribute by a given amount for a given info? Say I want to increment the myAttribute of info 'a' from 1 to 5, i.e. get from the above to this:
{myHashKey: 'xxxx', someList: [{info: 'a', myAttribute: 5}, {info: 'b', myAttribute: 42}]}
I've read a bunch of docs and other Stack Overflow posts but still can't achieve it.

You cannot target a list element by the value of one of its attributes: an UpdateExpression can only address list elements by index (e.g. someList[0]), so with this structure you would have to read the item first to find the right index. I had the same issue some time ago and didn't find a better way, so I changed my table structure.
I think your table is not well designed for what you want to do. DynamoDB is a key-value store; design your keys around your access patterns.
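For completeness, if you do know (or have just looked up) the element's position, an index-based update works. A minimal sketch, assuming the info 'a' entry sits at index 0 and a hypothetical table name:
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

docClient.update({
    TableName: 'myTable', // hypothetical table name
    Key: { myHashKey: 'xxxx' },
    // someList[0] addresses the element by position, not by its info value
    UpdateExpression: 'SET someList[0].myAttribute = someList[0].myAttribute + :val',
    // guard against the list having been reordered since you read it
    ConditionExpression: 'someList[0].info = :info',
    ExpressionAttributeValues: { ':val': 4, ':info': 'a' },
    ReturnValues: 'UPDATED_NEW'
}).promise().then(function (data) {
    console.log(data.Attributes); // the updated attribute(s)
});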
Table suggestion:
Hash: myHashKey // the same as now
Range: info
Other fields: myAttribute
When you know myHashKey and info, you can easily find and update the item:
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

async function increment(myHashKeyValue, infoValue, value = 1) {
    var params = {
        TableName: table, // assumes `table` holds your table name
        Key: {
            "myHashKey": myHashKeyValue,
            "info": infoValue,
        },
        UpdateExpression: "set #myAttribute = #myAttribute + :val",
        ExpressionAttributeValues: {
            ":val": value,
        },
        ExpressionAttributeNames: {
            "#myAttribute": "myAttribute",
        },
        ReturnValues: "UPDATED_NEW"
    };
    await docClient.update(params).promise();
}
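A hedged usage sketch (the key values come from the question; the increment amount is arbitrary):
increment('xxxx', 'a', 4)
    .then(function () { console.log('updated'); })
    .catch(function (err) { console.error(err); });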

Related

Loading saved search in SuiteScript doesn't include all columns - NetSuite

When loading a saved search in SuiteScript it doesn't include all columns; for example, the summed columns at the end are not included. I tried the getResults function, but because I'm loading this in a map/reduce getInputData function with a huge amount of data, the script time limit gets exceeded (SSS_TIME_LIMIT_EXCEEDED).
The columns marked in the screenshot below are not visible when I do this:
function getInputData() {
    var mainSrch = search.load({ id: 'customsearch1000' });
    return mainSrch;
}
Below is the result I get in the script:
{
    "recordType": null,
    "id": "16187",
    "values": {
        "GROUP(trandate)": "22/06/2022",
        "GROUP(type)": {
            "value": "VendBill",
            "text": "Bill"
        },
        "GROUP(tranid)": "36380",
        "GROUP(location)": {
            "value": "140",
            "text": "ACBD"
        },
        "GROUP(custitem_item_category.item)": {
            "value": "13",
            "text": "Frozen Food"
        },
        "GROUP(custitem_item_subcategory.item)": {
            "value": "66",
            "text": "Frozen Fruits & Vegetables"
        },
        "GROUP(itemid.item)": "MN-FGGH10271310",
        "GROUP(displayname.item)": "ABC Product",
        "GROUP(custcol_po_line_barcode)": "883638668390",
        "GROUP(locationquantityonhand.item)": "4",
        "SUM(quantity)": "1",
        "SUM(totalvalue.item)": "4460.831",
        "SUM(custcol_po_unit_price)": "8.00",
        "SUM(formulanumeric)": "0"
    }
}
Is there any way to get all the columns while loading saved search?
I haven't seen this particular issue before, but NetSuite does have an issue sorting by any formulaX column other than the first one, so seeing this is not surprising.
If you have no selection criteria on the aggregate values you could (there is a sketch of this shape after the list):
modify your search to have no summary types or formula numeric columns
in the map phase group them by the original search's grouping columns (no governance cost)
in the reduce phase calculate the values for the formulanumeric columns (no governance cost)
proceed with your original reduce phase logic.
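A minimal sketch of that shape, with hypothetical grouping keys and summed fields standing in for your real search columns:
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search'], function (search) {
    function getInputData() {
        // The search, modified to have no summary types or formula columns
        return search.load({ id: 'customsearch1000' });
    }
    function map(context) {
        var result = JSON.parse(context.value);
        // Group by the original search's grouping columns (hypothetical field ids)
        var key = result.values.trandate + '|' + result.values.tranid;
        context.write({ key: key, value: JSON.stringify(result.values) });
    }
    function reduce(context) {
        // Recalculate the summed / formulanumeric values per group (no extra governance cost)
        var totalQty = context.values
            .map(function (v) { return JSON.parse(v); })
            .reduce(function (sum, line) { return sum + Number(line.quantity || 0); }, 0);
        context.write({ key: context.key, value: totalQty });
        // ...then proceed with the original reduce-phase logic
    }
    function summarize(summary) { /* log errors, write output, etc. */ }
    return { getInputData: getInputData, map: map, reduce: reduce, summarize: summarize };
});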
As an alternative to my previous answer you can split your process into parts.
Modify your saved search to include column labels
Use N/task to schedule your search with a map/reduce script as a dependency using addInboundDependency
If your search finishes successfully the map/reduce script will be called with your search file
Return the file from your getInputData phase. You'll have to modify your map/reduce script to handle a different format, but if your search can complete at all you'll be able to process it.
Below is a fragment of a script that does this, but uses a scheduled script as the dependency. Map/reduce scripts are also supported.
var filePath = folderPath + (folderPath.length ? '/' : '') + name;
var searchTask = task.create({
    taskType: task.TaskType.SEARCH,
    savedSearchId: searchId,
    filePath: filePath
});
var dependency = task.create({
    taskType: task.TaskType.SCHEDULED_SCRIPT,
    scriptId: 'customscript_kotn_s3_defer_transfer',
    deploymentId: deferredDeployment,
    params: {
        custscript_kotn_deferred_s3_folder: me.getParameter({ name: 'custscript_kotn_s3_folder' }),
        custscript_kotn_deferred_s3_file: filePath
    }
});
searchTask.addInboundDependency(dependency);
var taskId = searchTask.submit();
log.audit({
    title: 'queued ' + name,
    details: taskId
});
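On the consuming side, a hedged sketch of one way the dependent map/reduce script could feed the generated CSV into its map stage; the script parameter name is hypothetical, the parsing depends on your columns, and very large files may need a different approach than getContents():
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/file', 'N/runtime'], function (file, runtime) {
    function getInputData() {
        var filePath = runtime.getCurrentScript().getParameter({
            name: 'custscript_search_result_path' // hypothetical parameter holding the CSV path
        });
        var csv = file.load({ id: filePath });
        // One array element per data row; each line arrives in map() as context.value
        return csv.getContents().split(/\r?\n/).slice(1);
    }
    function map(context) { /* parse the CSV line, then carry on as before */ }
    function summarize(summary) { }
    return { getInputData: getInputData, map: map, summarize: summarize };
});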

How to use auto increment for primary key id in DynamoDB

I am new to DynamoDB. I want to auto-increment the id value when I use putItem with DynamoDB.
Is it possible to do that?
This is an anti-pattern in DynamoDB, which is built to scale across many partitions/shards/servers. DynamoDB does not support auto-increment primary keys because of those scaling properties: a counter cannot be guaranteed across multiple servers.
A better option is to assemble the primary key from multiple parts. The primary key can be up to 2048 bytes. There are a few options:
Use UUID as your key - possibly time based UUID which makes it unique, evenly distributed and carries time value
Use randomly generated number or timestamp + random (possibly bit-shifting) like: ts << 12 + random_number
Use another service or DynamoDB itself to generate incremental unique id (requires extra call)
The following code will atomically increment a counter in DynamoDB, and you can then use the returned value as your primary key.
var AWS = require('aws-sdk');
var documentClient = new AWS.DynamoDB.DocumentClient();
var params = {
    TableName: 'sampletable',
    Key: { HashKey: 'counters' },
    UpdateExpression: 'ADD #a :x',
    ExpressionAttributeNames: { '#a': 'counter_field' },
    ExpressionAttributeValues: { ':x': 1 },
    ReturnValues: 'UPDATED_NEW' // ensures you get the new value back
};
documentClient.update(params, function (err, data) {});
// once you get the new value, use it as your primary key
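A hedged continuation showing how the returned value might then be used, assuming the same table layout where HashKey is the partition key:
documentClient.update(params, function (err, data) {
    if (err) { console.error(err); return; }
    var newId = data.Attributes.counter_field; // the freshly incremented value
    documentClient.put({
        TableName: 'sampletable',
        Item: { HashKey: 'item_' + newId, createdAt: Date.now() } // hypothetical item shape
    }, function (putErr) {
        if (putErr) { console.error(putErr); }
    });
});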
My personal favorite is using timestamp + random, inspired by Instagram's sharding ID generation, described at http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
The following function will generate an id for a specific shard (provided as a parameter). This way you can have a unique key assembled from the timestamp, the shard number and some randomness (0-511).
var CUSTOMEPOCH = 1300000000000; // artificial epoch
function generateRowId(shardId /* range 0-63 for shard/slot */) {
    var ts = new Date().getTime() - CUSTOMEPOCH; // limit to recent
    var randid = Math.floor(Math.random() * 512); // 0-511
    ts = (ts * 64); // bit-shift << 6
    ts = ts + shardId;
    return (ts * 512) + randid;
}
var newPrimaryHashKey = "obj_name:" + generateRowId(4);
// output is: "obj_name:8055517407349240"
DynamoDB doesn't provide this out of the box. You can generate something in your application such as UUIDs that "should" be unique enough for most systems.
I noticed you were using Node.js (I removed your tag). Here is a library that provides UUID functionality: node-uuid
Example from README
var uuid = require('node-uuid');
var uuid1 = uuid.v1();
var uuid2 = uuid.v1({node:[0x01,0x23,0x45,0x67,0x89,0xab]});
var uuid3 = uuid.v1({node:[0, 0, 0, 0, 0, 0]})
var uuid4 = uuid.v4();
var uuid5 = uuid.v4();
You probably can use AtomicCounters.
With atomic counters, you can use the UpdateItem operation to implement an atomic counter—a numeric attribute that is incremented, unconditionally, without interfering with other write requests. (All write requests are applied in the order in which they were received.) With an atomic counter, the updates are not idempotent. In other words, the numeric value increments each time you call UpdateItem.
You might use an atomic counter to track the number of visitors to a website. In this case, your application would increment a numeric value, regardless of its current value. If an UpdateItem operation fails, the application could simply retry the operation. This would risk updating the counter twice, but you could probably tolerate a slight overcounting or undercounting of website visitors.
I came across a similar issue where I required an auto-incrementing primary key in my table. We could use some randomization technique to generate a random key and store it, but the keys won't be incremental.
If you require something incremental, you can use Unix time as your primary key. I'm not promising exact one-by-one incrementation, but every record you put will get a key that increases with the time at which the record was inserted.
It's not a complete solution if you don't want to read the entire table, get its last id, and then increment it.
Following is the code for inserting a record into DynamoDB using Node.js:
...
const params = {
    TableName: RANDOM_TABLE,
    Item: {
        ip: this.ip,
        id: new Date().getTime()
    }
};
dynamoDb.put(params, (error, result) => {
    console.log(error, result);
});
...
If you are using DynamoDB with Dynamoose, you can easily set a default unique id. Here is a simple user-creation example:
// User.model.js
const dynamoose = require("dynamoose");

const userSchema = new dynamoose.Schema(
    {
        id: {
            type: String,
            hashKey: true,
        },
        displayName: String,
        firstName: String,
        lastName: String,
    },
    { timestamps: true },
);

const User = dynamoose.model("User", userSchema);
module.exports = User;

// User.controller.js
const { v4: uuidv4 } = require("uuid");
const User = require("./User.model");

exports.create = async (req, res) => {
    const user = new User({ id: uuidv4(), ...req.body }); // set unique id
    const [err, response] = await to(user.save()); // `to` is the author's promise-to-[err, result] helper
    if (err) {
        return badRes(res, err);
    }
    return goodRes(res, response);
};
Update for 2022:
I was looking into the same issue and came across the following research.
DynamoDB still doesn't support auto-increment of primary keys.
https://aws.amazon.com/blogs/database/simulating-amazon-dynamodb-unique-constraints-using-transactions/
Also, the package node-uuid is now deprecated. Its maintainers recommend using the uuid package instead, which creates RFC 4122 compliant UUIDs.
npm install uuid
import { v4 as uuidv4 } from 'uuid';
uuidv4(); // ⇨ '9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'
For Java developers, there is the DynamoDBMapper, which is a simple ORM. This supports the DynamoDBAutoGeneratedKey annotation. It doesn't increment a numeric value like a typical "Long id", but rather generates a UUID like other answers here suggest. If you're mapping classes as you would with Hibernate, GORM, etc., this is more natural with less code.
I see no caveats in the docs about scaling issues. And it eliminates the issues with under or over-counting as you have with the auto-incremented numeric values (which the docs do call out).

dynamodb - scan items by value inside array

I'm doing a table scan. This table has an array as one of its fields, the "apps" field (apps is not a key of any kind). I want to select all rows whose apps array contains a certain value, "MyApp". I tried something along these lines, but my syntax is incorrect:
ComparisonOperator = "#apps CONTAINS :v",
ExpressionAttributeNames = {
'#apps': 'apps'
},
ExpressionAttributeValues = {
":v": "MyApp"
}
Thanks.
The documentation about Condition Expressions clearly states that the appropriate syntax is:
contains(#apps, :v)
The correct request would be:
FilterExpression: "contains(#apps, :v)",
ExpressionAttributeNames: { "#apps": "apps" },
ExpressionAttributeValues: { ":v": "MyApp" }
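Put together as a full call, a minimal sketch (assuming the DocumentClient and a hypothetical table named 'MyTable'; note that a Scan with a FilterExpression still reads the whole table and filters afterwards):
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

docClient.scan({
    TableName: 'MyTable', // hypothetical table name
    FilterExpression: 'contains(#apps, :v)',
    ExpressionAttributeNames: { '#apps': 'apps' },
    ExpressionAttributeValues: { ':v': 'MyApp' }
}, function (err, data) {
    if (err) { console.error(err); return; }
    console.log(data.Items); // rows whose apps array contains "MyApp"
});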

Should I denormalize or run multiple queries in DocumentDb?

I'm learning about data modeling in DocumentDb. Here's where I need some advice.
Please see what my documents look like down below.
I can take two approaches here both with pros and cons.
Scenario 1:
If I keep the data denormalized (see my documents below) by keeping project team member information i.e. first, last name, email, etc. in the same document as the project, I can get the information I need in one query BUT when Jane Doe gets married and her last name changes, I'd have to update a lot of documents in the Projects collection. I'd also have to be extremely careful in making sure that all collections with documents that contain employee information get updated as well. If, for example, I update Jane Doe's name in Projects collection but forget to update the TimeSheets collection, I'd be in trouble!
Scenario 2:
If I keep data somewhat normalized and keep only EmployeeId in the project documents, I can then run three queries whenever I want to get a projects list:
Query 1 returns projects list
Query 2 would give me EmployeeId's of all project team members that appear in the first query
Query 3 for employee information i.e. first, last name, email, etc. I'd use the result of Query 2 to run this one
I can then combine all the data in my application.
The problem here is that DocumentDb seems to have a lot of limitations right now. I may be reading hundreds of projects with hundreds of employees in the project teams. There appears to be no efficient way to get information for all the employees whose Ids appear in my second query. Again, please keep in mind that I may need to pull hundreds of employees' information here. If the following SQL query is what I'd use for employee data, I may have to run the same query a few times to get all the information I need, because I don't think I can have hundreds of OR statements:
SELECT e.Id, e.firstName, e.lastName, e.emailAddress
FROM Employees e
WHERE e.Id = 1111 OR e.Id = 2222
I understand that DocumentDb is still in preview and some of these limitations will be fixed. With that said, how should I approach this problem? How can I efficiently both store/manage and retrieve all project data I need -- including project team information? Is Scenario 1 a better solution or Scenario 2 or is there a better third option?
Here's what my documents look like. First, the project document:
{
    id: 789,
    projectName: "My first project",
    startDate: "9/6/2014",
    projectTeam: [
        { id: 1111, firstName: "John", lastName: "Smith", position: "Sr. Engineer" },
        { id: 2222, firstName: "Jane", lastName: "Doe", position: "Project Manager" }
    ]
}
And here are two employee documents which reside in the Employees collection:
{
    id: 1111,
    firstName: "John",
    lastName: "Smith",
    dateOfBirth: "1/1/1967",
    emailAddresses: [
        { email: "jsmith@domain1.com", isPrimary: "true" },
        { email: "john.smith@domain2.com", isPrimary: "false" }
    ]
},
{
    id: 2222,
    firstName: "Jane",
    lastName: "Doe",
    dateOfBirth: "3/8/1975",
    emailAddresses: [
        { email: "jane@domain1.com", isPrimary: "true" }
    ]
}
I believe you're on the right track in considering the trade-offs between normalizing or de-normalizing your project and employee data. As you've mentioned:
Scenario 1) If you de-normalize your data model (couple projects and employee data together) - you may find yourself having to update many projects when you update an employee.
Scenario 2) If you normalize your data model (decouple projects and employee data) - you would have to query for projects to retrieve employeeIds and then query for the employees if you wanted to get the list of employees belonging to a project.
I would pick the appropriate trade-off given your application's use case. In general, I prefer de-normalizing when you have a read-heavy application and normalizing when you have a write-heavy application.
Note that you can avoid having to make multiple roundtrips between your application and the database by leveraging DocumentDB's stored procedures (the queries are performed server-side within DocumentDB).
Here's an example stored procedure for retrieving employees belonging to a specific projectId:
function(projectId) {
    /* the context method can be accessed inside stored procedures and triggers */
    var context = getContext();
    /* access all database operations - CRUD, query against documents in the current collection */
    var collection = context.getCollection();
    /* access HTTP response body and headers from the procedure */
    var response = context.getResponse();

    /* Callback for processing query on projectId */
    var projectHandler = function (documents) {
        var i;
        for (i = 0; i < documents[0].projectTeam.length; i++) {
            // Query for the Employees
            queryOnId(documents[0].projectTeam[i].id, employeeHandler);
        }
    };

    /* Callback for processing query on employeeId */
    var employeeHandler = function (documents) {
        // getBody() is undefined on the first call, hence the fallback to ''
        response.setBody((response.getBody() || '') + JSON.stringify(documents[0]));
    };

    /* Query on a single id and call back */
    var queryOnId = function (id, callbackHandler) {
        collection.queryDocuments(collection.getSelfLink(),
            'SELECT * FROM c WHERE c.id = "' + id + '"', {},
            function (err, documents) {
                if (err) {
                    throw new Error('Error' + err.message);
                }
                if (documents.length < 1) {
                    throw 'Unable to find id';
                }
                callbackHandler(documents);
            }
        );
    };

    // Query on the projectId
    queryOnId(projectId, projectHandler);
}
Even though DocumentDB supports only limited OR statements during the preview, you can still get relatively good performance by splitting the employee-id lookups into a bunch of asynchronous server-side queries.
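For reference, a hedged sketch of invoking such a stored procedure from Node.js with the legacy documentdb client; the endpoint, key and stored-procedure link are hypothetical placeholders:
var DocumentClient = require('documentdb').DocumentClient;
var client = new DocumentClient('https://myaccount.documents.azure.com:443/', { masterKey: 'myKey' }); // hypothetical account
var sprocLink = 'dbs/mydb/colls/Projects/sprocs/getProjectEmployees'; // hypothetical link to the sproc above

// Pass the projectId as the stored procedure's parameter
client.executeStoredProcedure(sprocLink, [789], function (err, result) {
    if (err) { throw err; }
    console.log(result); // the concatenated employee documents set via response.setBody
});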

Couchbase Map/Reduce to count total by document type

I'm storing event data in Couchbase documents like this:
{
user: {
id: '0BE2DA2B-9C8F-432D-88C2-B2C1D8D0E4B4',
device: { 'manufacturer': 'Apple', 'os': 'iOS', 'name': 'iPhone', 'version': '5S' }
},
event_type: 'INTERACTION_A',
country: 'GB',
timestamp: 1398781631233
}
I have created Map/Reduce queries to tell me how many events iPhone users have submitted. However, is it possible to use Map/Reduce to query how many unique devices by OS are submitting events?
Each individual device might have submitted 1000s of events, but the result would show how many unique devices, by OS, the system has seen. I'm trying to end up with a data that looks something like this:
{ 'iOS': 2343, 'Android': 6343 }
Is it possible to do this in a single Couchbase view?
Yes, it's possible. You just need to use group=true&group_level=1 in your query.
Create a view like:
map: function (doc, meta) {
    emit(doc.os, null); // or doc.user.device.os for the documents shown above
}
reduce: _count
Then add group=true&group_level=1 to your query:
http://127.0.0.1:8092/default/_design/dev_<designDocName>/_view/<viewName>?connection_timeout=60000&limit=10&skip=0&group=true&group_level=1
Also check these links for more examples:
Writing a simple group by with map-reduce (Couchbase)
http://hardlifeofapo.com/basic-couchbase-querying-for-sql-people/
http://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys
I think my original question might have been too vague. However, I have reached this solution:
map: function (doc, meta) {
    emit([doc.user.device.os, doc.user.id], null);
}
reduce: function (keys, values, rereduce) {
    var os = {};
    keys.forEach(function (k) { os[k] = 1; });
    return Object.keys(os).length;
}
Running this view with group=true&group_level=1 gives me what I wanted.
I'm not confident it will scale, or whether it needs to consider rereduce, however it works for my test data set.
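For what it's worth, a hedged alternative that sidesteps the rereduce question: keep the same compound-key map function, use the built-in _count reduce instead, query with group=true&group_level=2 so each returned row is one unique [os, userId] pair, and fold the rows client-side. A minimal sketch, assuming the same design document and view names as above:
// ...?group=true&group_level=2 returns one row per unique [os, userId] pair:
// { rows: [ { key: ["iOS", "0BE2DA2B-..."], value: 1234 }, ... ] }
function uniqueDevicesByOs(rows) {
    var perOs = {};
    rows.forEach(function (row) {
        var os = row.key[0];              // key is [os, userId]
        perOs[os] = (perOs[os] || 0) + 1; // each row is one unique device
    });
    return perOs;                         // e.g. { 'iOS': 2343, 'Android': 6343 }
}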