Function scan in DynamoDB doesn't return some of the results

I have a function in AWS Lambda that lists every patient in a DynamoDB table, and I realized that some items from the table are missing from the list. This is my list function:
module.exports.listPatients = async (event) => {
  try {
    const queryString = {
      limit: 5,
      ...event.queryStringParameters,
    };
    const { limit, next, name } = queryString;
    const localParams = {
      ...patientsParams,
      Limit: limit,
      FilterExpression: "contains(full_name, :full_name)",
      ExpressionAttributeValues: { ":full_name": name },
    };
    if (next) {
      localParams.ExclusiveStartKey = {
        id: next,
      };
    }
    const data = await dynamoDb.scan(localParams).promise();
    const nextToken = data.LastEvaluatedKey ? data.LastEvaluatedKey.id : "";
    const result = {
      items: data.Items,
      next_token: nextToken,
    };
    return {
      statusCode: 200,
      body: JSON.stringify(result),
    };
  } catch (error) {
    console.log("Error: ", error);
    return {
      statusCode: error.statusCode ? error.statusCode : 500,
      body: JSON.stringify({
        error: error.name ? error.name : "Exception",
        message: error.message ? error.message : "Unknown error",
      }),
    };
  }
};
Am I missing something?
I tried with and without a limit, and with the filters removed, and still nothing.
I tested one of the missing ids with get() to check whether the server could find it, and it worked.
I am using Serverless to deploy the code, and when I run it offline, it works.
Stack Overflow recommended this post while I was writing my question, but I am using DynamoDB.DocumentClient without specifying the full attribute type in the filter expression:
How to scan in DynamoDB without primary sort key with Nodejs

Looks like you are paginating using scan(). Using query() with some Global Secondary Indexes and ScanIndexForward would give you much better performance; scan() doesn't scale well as your data grows.
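As a rough sketch of that suggestion (none of this is in the question: the patients table name, the name-index GSI, its constant-valued entity_type partition key, and the full_name sort key are all assumptions), a paginated query() through the DocumentClient could look like this:
// Sketch only: assumes a GSI "name-index" whose partition key is a constant
// entity_type attribute ("patient") and whose sort key is full_name.
const AWS = require("aws-sdk");
const dynamoDb = new AWS.DynamoDB.DocumentClient();

const listPatientsByName = async (prefix, limit, next) => {
  const params = {
    TableName: "patients",   // assumed table name
    IndexName: "name-index", // assumed GSI
    KeyConditionExpression: "entity_type = :type AND begins_with(full_name, :prefix)",
    ExpressionAttributeValues: {
      ":type": "patient",
      ":prefix": prefix,
    },
    Limit: limit,
    ScanIndexForward: true,  // ascending order on full_name
  };
  if (next) {
    params.ExclusiveStartKey = next; // pass back the whole LastEvaluatedKey from the previous page
  }
  const data = await dynamoDb.query(params).promise();
  return { items: data.Items, next: data.LastEvaluatedKey };
};
Note that a key condition only supports equality, ranges, and begins_with on the sort key, so a contains()-style match would still have to live in a FilterExpression; also remember that Limit is applied before any filter, so a page can come back with fewer items than the limit.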

Related

DynamoDB JavaScript PutItemCommand is neither failing nor working

Please note: although this question mentions AWS SAM, it is 100% a DynamoDB JavaScript SDK question at heart and can be answered by anyone with experience writing JavaScript Lambdas (or any client-side apps) against DynamoDB using the AWS DynamoDB client/SDK.
So I used AWS SAM to provision a new DynamoDB table with the following attributes:
FeedbackDynamoDB:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: commentary
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S
    KeySchema:
      - AttributeName: id
        KeyType: HASH
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
    StreamSpecification:
      StreamViewType: NEW_IMAGE
This configuration successfully creates a DynamoDB table called commentary. However, when I view this table in the DynamoDB web console, I noticed a few things:
it has a partition key of id (type S)
it has no sort key
it has no (0) indexes
it has a read/write capacity mode of "5"
I'm not sure if this raises any red flags with anyone but I figured I would include those details, in case I've configured anything incorrectly.
Now then, I have a JavaScript (TypeScript) Lambda that instantiates a DynamoDB client (using the JavaScript SDK) and attempts to add a record/item to this table:
// this code is in a file named app.ts:
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { User, allUsers } from './users';
import { Commentary } from './commentary';
import { PutItemCommand } from "@aws-sdk/client-dynamodb";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";

export const lambdaHandler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  try {
    const ddbClient = new DynamoDBClient({ region: "us-east-1" });
    let status: number = 200;
    let responseBody: string = "\"message\": \"hello world\"";
    const { id, content, createdAt, providerId, receiverId } = JSON.parse(event.body);
    const commentary = new Commentary(id, content, createdAt, providerId, receiverId);
    console.log("deserialized this into commentary");
    console.log("and the deserialized commentary has content of: " + commentary.getContent());
    await provideCommentary(ddbClient, commentary);
    responseBody = "\"message\": \"received commentary -- check dynamoDb!\"";
    return {
      statusCode: status,
      body: responseBody
    };
  } catch (err) {
    console.log(err);
    return {
      statusCode: 500,
      body: JSON.stringify({
        message: err.stack,
      }),
    };
  }
};

const provideCommentary = async (ddbClient: DynamoDBClient, commentary: Commentary) => {
  const params = {
    TableName: "commentary",
    Item: {
      id: {
        S: commentary.getId()
      },
      content: {
        S: commentary.getContent()
      },
      createdAt: {
        S: commentary.getCreatedAt()
      },
      providerId: {
        N: commentary.getProviderId()
      },
      receiverId: {
        N: commentary.getReceiverId()
      }
    }
  };
  console.log("about to try to insert commentary into dynamo...");
  try {
    console.log("wait for it...")
    const rc = await ddbClient.send(new PutItemCommand(params));
    console.log("DDB response:", rc);
  } catch (err) {
    console.log("hmmm something awry. something....in the mist");
    console.log("Error", err.stack);
    throw err;
  }
};
Where commentary.ts is:
class Commentary {
  private id: string;
  private content: string;
  private createdAt: Date;
  private providerId: number;
  private receiverId: number;

  constructor(id: string, content: string, createdAt: Date, providerId: number, receiverId: number) {
    this.id = id;
    this.content = content;
    this.createdAt = createdAt;
    this.providerId = providerId;
    this.receiverId = receiverId;
  }

  public getId(): string {
    return this.id;
  }

  public getContent(): string {
    return this.content;
  }

  public getCreatedAt(): Date {
    return this.createdAt;
  }

  public getProviderId(): number {
    return this.providerId;
  }

  public getReceiverId(): number {
    return this.receiverId;
  }
}
export { Commentary };
When I update the Lambda with this handler code, and hit the Lambda with the following curl (the Lambda is invoked by an API Gateway URL that I can hit via curl/http):
curl -i --request POST 'https://<my-api-gateway>.execute-api.us-east-1.amazonaws.com/Stage/feedback' \
--header 'Content-Type: application/json' -d '{"id":"123","content":"test feedback","createdAt":"2022-12-02T08:45:26.261-05:00","providerId":457,"receiverId":789}'
I get the following HTTP 500 response:
{"message":"SerializationException: NUMBER_VALUE cannot be converted to String\n
Am I passing it a bad request body (in the curl) or do I need to tweak something in app.ts and/or commentary.ts?
Interestingly, the DynamoDB API expects the numeric fields of items to be sent as strings. For example:
"N": "123.45"
The doc says:
Numbers are sent across the network to DynamoDB as strings, to maximize compatibility across languages and libraries. However, DynamoDB treats them as number type attributes for mathematical operations.
Have you tried sending your input with the numerical parameters as strings as shown below? (See providerId and receiverId)
{
  "id":"123",
  "content":"test feedback",
  "createdAt":"2022-12-02T08:45:26.261-05:00",
  "providerId":"457",
  "receiverId":"789"
}
You can convert these IDs into strings when you're populating your input Item:
providerId: {
  N: String(commentary.getProviderId())
},
receiverId: {
  N: String(commentary.getReceiverId())
}
You could also use .toString() but then you'd get errors if the field is not set (null or undefined).
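A quick illustration of that difference (my example, not from the original answer):
const maybeId = undefined;

String(maybeId);      // "undefined" -- never throws, but produces the string "undefined"
// maybeId.toString() // throws a TypeError, because maybeId is undefined

// So it is worth validating the fields before building the Item:
if (maybeId === null || maybeId === undefined) {
  throw new Error("providerId and receiverId must be set");
}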
Try using a promise to see the outcome:
client.send(command).then(
  (data) => {
    // process data.
  },
  (error) => {
    // error handling.
  }
);
Everything seems alright with your table setup; I believe it's a Lambda async issue with the JS SDK. I'm guessing Lambda is not waiting on your code and is exiting early. Can you include your full Lambda code?

lambda function returning null for deleting item in DynamoDB

Hi, I've been trying to get my Lambda function to delete an item in DynamoDB, but the function is simply returning null and I have no idea how to even start debugging it. I'm hoping someone here has the knowledge to help.
My table has guid as its partition key and username as its sort key.
Here's my code in .js:
const AWS = require("aws-sdk");

// Initialising the DynamoDB SDK
const documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const { guid, username } = event;
  const params = {
    TableName: "Items", // The name of your DynamoDB table
    Key: {
      "guid": { "S": guid },
      "username": { "S": username }
    }
  };
  try {
    // Calling the delete method to remove the item from the table
    documentClient.delete(params, function(err, data) {
      if (err) {
        return ("Unable to delete item. Error JSON:", JSON.stringify(err, null, 2));
      } else {
        return ("DeleteItem succeeded:", JSON.stringify(data, null, 2));
      }
    });
  } catch (e) {
    return {
      statusCode: 500,
      body: e
    };
  }
};
This is the payload for the test event I'm using in Lambda:
{
  "guid": "34",
  "username": "newusername"
}
You are using an async function handler, so your function probably just finishes before your code actually has a chance to execute.
You can overcome this issue by wrapping your code in a new Promise, as shown in the docs.
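A minimal sketch of that fix, assuming the same documentClient and table as in the question (note that DocumentClient expects plain JavaScript values for the key, without the { "S": ... } wrappers):
exports.handler = async (event) => {
  const { guid, username } = event;
  const params = {
    TableName: "Items",
    Key: { guid, username } // plain values; DocumentClient handles the typing
  };

  // Wrap the callback-style call in a Promise so the async handler waits for it
  const data = await new Promise((resolve, reject) => {
    documentClient.delete(params, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });

  return {
    statusCode: 200,
    body: JSON.stringify(data)
  };
};
With DocumentClient you can also skip the wrapper entirely and write const data = await documentClient.delete(params).promise();.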

Apollo client mutation with writeQuery not triggering UI update

I have a mutation to create a new card object, and I expect it to be added to the user interface after the update. The cache, the Apollo Chrome tool, and console logging all reflect the changes, but the UI does not update without a manual reload.
const [createCard, { loading, error }] = useMutation(CREATE_CARD, {
  update(cache, { data: { createCard } }) {
    let localData = cache.readQuery({
      query: CARDS_QUERY,
      variables: { id: deckId }
    });
    localData.deck.cards = [...localData.deck.cards, createCard];
    client.writeQuery({
      query: CARDS_QUERY,
      variables: { id: parseInt(localData.deck.id, 10) },
      data: { ...localData }
    });
  },
});
I have changed cache.writeQuery to client.writeQuery, but that didn't solve the problem.
For reference, here is the Query I am running...
const CARDS_QUERY = gql`
  query CardsQuery($id: ID!) {
    deck(id: $id) {
      id
      deckName
      user {
        id
      }
      cards {
        id
        front
        back
        pictureName
        pictureUrl
        createdAt
      }
    }
    toggleDeleteSuccess @client
  }
`;
I achieved the same result without the cloneDeep method. Just using the spread operator solved my problem.
const update = (cache, {data}) => {
  const queryData = cache.readQuery({query: USER_QUERY})
  const cartItemId = data.cartItem.id
  queryData.me.cart = queryData.me.cart.filter(v => v.id !== cartItemId)
  cache.writeQuery({query: USER_QUERY, data: {...queryData}})
}
Hope this helps someone else.
OK, I finally ran into a long GitHub thread discussing solutions to this same issue. The solution that ultimately worked for me was deep cloning the data object (I personally used Lodash's cloneDeep); after passing the mutated clone to cache.writeQuery, the UI finally updated. It still seems like there ought to be a simpler way to trigger the UI update, considering the cache already reflects the changes.
Here's the "after"; see my original question above for the "before":
const [createCard, { loading, error }] = useMutation(CREATE_CARD, {
  update(cache, { data: { createCard } }) {
    const localData = cloneDeep( // Lodash cloneDeep to make a fresh object
      cache.readQuery({
        query: CARDS_QUERY,
        variables: { id: deckId }
      })
    );
    localData.deck.cards = [...localData.deck.cards, createCard]; //Push the mutation to the object
    cache.writeQuery({
      query: CARDS_QUERY,
      variables: { id: localData.deck.id },
      data: { ...localData } // Cloning ultimately triggers the UI update since writeQuery now sees a new object.
    });
  },
});

Adding a where clause to AWS DynamoDB

I am trying to create a getItem request in AWS Lambda to access DynamoDB like so:
dynamodb.getItem({
  TableName: "DataTable",
  Key: {
    user: {
      S: user
    },
    deleted: {
      BOOL: false
    }
  }
}, function(err, data) {
  if (err) return fn(err);
  else {
    if ('Item' in data) {
      fn(null, user);
    } else {
      fn(null, null); // User not found
    }
  }
});
It worked fine when I passed only the user in, as that was the primary key on the table. I then added a deleted boolean to implement a soft delete on users. But once I added that, errors started to happen, as deleted isn't part of the primary key. Coming from the relational DB world, I want a way to add it as a where clause. How is this done? Thanks. :o)
GetItem cannot be used if the data has to be filtered by any non-key attributes.
In the above case, the 'deleted' attribute is a non-key attribute, so the Query API should be used to filter the data along with the key attribute.
Please refer to the FilterExpression in the example below:
FilterExpression : 'deleted = :deleted'
(AWS.Request) query(params = {}, callback)
Sample code:
var params = {
  TableName : table,
  KeyConditionExpression : 'yearkey = :hkey and title = :rkey',
  FilterExpression : 'deleted = :deleted',
  ExpressionAttributeValues : {
    ':hkey' : year_val,
    ':rkey' : title,
    ':deleted' : {BOOL : false}
  }
};

docClient.query(params, function(err, data) {
  if (err) {
    console.error("Unable to query. Error JSON:", JSON.stringify(err, null, 2));
  } else {
    console.log("Query succeeded:", JSON.stringify(data, null, 2));
  }
});
For your use case, the key condition, filter expression, and expression attribute values should be as shown below:
KeyConditionExpression : 'user = :user',
FilterExpression : 'deleted = :deleted',
ExpressionAttributeValues : {
  ':user' : 'John',
  ':deleted' : {BOOL : false}
}
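Put together for the table in the question, a sketch using the same low-level client could look like this (the attribute names are aliased via ExpressionAttributeNames purely to stay clear of DynamoDB's reserved words):
dynamodb.query({
  TableName: "DataTable",
  KeyConditionExpression: "#u = :user",
  FilterExpression: "#d = :deleted",
  ExpressionAttributeNames: {
    "#u": "user",
    "#d": "deleted"
  },
  ExpressionAttributeValues: {
    ":user": { S: user },
    ":deleted": { BOOL: false }
  }
}, function(err, data) {
  if (err) return fn(err);
  // Query returns an Items array; an empty array means the user
  // either doesn't exist or has been soft-deleted
  if (data.Items && data.Items.length > 0) {
    fn(null, user);
  } else {
    fn(null, null); // User not found
  }
});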

How Do I Make a Faster Riak MapReduce Query?

How can we make our MapReduce Queries Faster?
We have built an application using a five-node Riak cluster.
Our data model is composed of three buckets: matches, leagues, and teams.
Matches contains links to leagues and teams:
Model
var match = {
  id: matchId,
  leagueId: meta.leagueId,
  homeTeamId: meta.homeTeamId,
  awayTeamId: meta.awayTeamId,
  startTime: m.match.startTime,
  firstHalfStartTime: m.match.firstHalfStartTime,
  secondHalfStartTime: m.match.secondHalfStartTime,
  score: {
    goals: {
      a: 1*safeGet(m.match, 'score.goals.a'),
      b: 1*safeGet(m.match, 'score.goals.b')
    },
    corners: {
      a: 1*safeGet(m.match, 'score.corners.a'),
      b: 1*safeGet(m.match, 'score.corners.b')
    }
  }
};

var options = {
  index: {
    leagueId: match.leagueId,
    teamId: [match.homeTeamId, match.awayTeamId],
    startTime: match.startTime || match.firstHalfStartTime || match.secondHalfStartTime
  },
  links: [
    { bucket: 'leagues', key: match.leagueId, tag: 'league' },
    { bucket: 'teams', key: match.homeTeamId, tag: 'home' },
    { bucket: 'teams', key: match.awayTeamId, tag: 'away' }
  ]
};

match.model = 'match';
modelCache.save('matches', match.id, match, options, callback);
Queries
We write a query that returns results from several buckets. One way is to query each bucket separately; the other is to use links to combine results in a single query.
Both versions of the query we tried take over a second, no matter how small our buckets are.
The first version uses two map phases, which we modeled after this post (Practical Map-Reduce: Forwarding and Collecting).
#!/bin/bash
curl -X POST \
  -H "content-type: application/json" \
  -d @- \
  http://localhost:8091/mapred \
<<EOF
{
  "inputs": {
    "bucket": "matches",
    "index": "startTime_bin",
    "start": "2012-10-22T23:00:00",
    "end": "2012-10-24T23:35:00"
  },
  "query": [
    {"map": {"language": "javascript", "source": "
      function(value, keydata, arg){
        var match = Riak.mapValuesJson(value)[0];
        var links = value.values[0].metadata.Links;
        var result = links.map(function(l) {
          return [l[0], l[1], match];
        });
        return result;
      }
    "}},
    {"map": {"language": "javascript", "source": "
      function(value, keydata, arg) {
        var doc = Riak.mapValuesJson(value)[0];
        return [doc, keydata];
      }
    "}},
    {"reduce": {
      "language": "javascript",
      "source": "
        function(values) {
          var merged = {};
          values.forEach(function(v) {
            if(!merged[v.id]) {
              merged[v.id] = v;
            }
          });
          var results = [];
          for(key in merged) {
            results.push(merged[key]);
          }
          return results;
        }
      "
    }}
  ]
}
EOF
In the second version we do four separate Map-Reduce queries to get the objects from the three buckets:
async.series([
  //First get all matches
  function(callback) {
    db.mapreduce
      .add(inputs)
      .map(function (val, key, arg) {
        var data = Riak.mapValuesJson(val)[0];
        if(arg.leagueId && arg.leagueId != data.leagueId) {
          return [];
        }
        var d = new Date();
        var date = data.startTime || data.firstHalfStartTime || data.secondHalfStartTime;
        d.setFullYear(date.substring(0, 4));
        d.setMonth(date.substring(5, 7) - 1);
        d.setDate(date.substring(8, 10));
        d.setHours(date.substring(11, 13));
        d.setMinutes(date.substring(14, 16));
        d.setSeconds(date.substring(17, 19));
        d.setMilliseconds(0);
        startTimestamp = d.getTime();
        var short = {
          id: data.id,
          l: data.leagueId,
          h: data.homeTeamId,
          a: data.awayTeamId,
          t: startTimestamp,
          s: data.score,
          c: startTimestamp
        };
        return [short];
      }, {leagueId: query.leagueId, page: query.page}).reduce(function (val, key) {
        return val;
      }).run(function (err, matches) {
        matches.forEach(function(match) {
          result.match[match.id] = match; //Should maybe filter this
          leagueIds.push(match.l);
          teamIds.push(match.h);
          teamIds.push(match.a);
        });
        callback();
      });
  },
  //Then get all leagues, teams and lines in parallel
  function(callback) {
    async.parallel([
      //Leagues
      function(callback) {
        db.getMany('leagues', leagueIds, function(err, leagues) {
          if (err) { callback(err); return; }
          leagues.forEach(function(league) {
            visibleLeagueIds[league.id] = true;
            result.league[league.id] = {
              r: league.regionId,
              n: league.name,
              s: league.name
            };
          });
          callback();
        });
      },
      //Teams
      function(callback) {
        db.getMany('teams', teamIds, function(err, teams) {
          if (err) { callback(err); return; }
          teams.forEach(function(team) {
            result.team[team.id] = {
              n: team.name,
              h: team.name,
              s: team.stats
            };
          });
          callback();
        });
      }
    ], callback);
  }
], function(err) {
  if (err) { callback(err); return; }
  _.each(regionModel.getAll(), function(region) {
    result.region[region.id] = {
      id: region.id,
      c: 'https://d1goqbu19rcwi8.cloudfront.net/icons/silk-flags/' + region.icon + '.png',
      n: region.name
    };
  });
  var response = {
    success: true,
    result: {
      modelRecords: result,
      paging: {
        page: query.page,
        pageSize: 50,
        total: result.match.length
      },
      time: moment().diff(a)/1000.00,
      visibleLeagueIds: visibleLeagueIds
    }
  };
  callback(null, JSON.stringify(response, null, '\t'));
});
How do we make these queries faster?
Additional info:
We are using riak-js and node.js to run our queries.
One way to make it at least a bit faster would be to deploy the JavaScript mapreduce functions to the server instead of passing them through as part of the job (see the description of the js_source_dir parameter here). This is usually recommended if you have JavaScript functions that you run repeatedly.
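Roughly, that means placing the functions in a file inside the configured js_source_dir (the file name, namespace, and path below are made up for illustration) and then referencing them by name in the job instead of shipping the source each time:
// /etc/riak/js_source/myapp.js -- directory set via js_source_dir in app.config,
// e.g. {riak_kv, [{js_source_dir, "/etc/riak/js_source"}]}
var MyApp = {
  // same logic as the first map phase in the question, now deployed server-side
  forwardLinks: function(value, keydata, arg) {
    var match = Riak.mapValuesJson(value)[0];
    var links = value.values[0].metadata.Links;
    return links.map(function(l) {
      return [l[0], l[1], match];
    });
  }
};
The map phase would then be declared as {"map": {"language": "javascript", "name": "MyApp.forwardLinks"}} rather than embedding the source string.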
As there is some overhead associated with running JavaScript mapreduce functions compared to native ones implemented in Erlang, using non-JavaScript functions where possible may also help.
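For instance, a trivial JavaScript map function can often be swapped for one of the phases that ship with Riak, such as the riak_kv_mapreduce:map_object_value function mentioned below, declared like this in the job:
{"map": {"language": "erlang", "module": "riak_kv_mapreduce", "function": "map_object_value"}}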
The two map phase functions in your first query appear to be designed to work around the limitation that a normal linking phase (which I believe is more efficient) does not pass on the record being processed (the matches record). The first function includes all the links and passes on the match data as additional data in JSON form, while the second passes on the data of the match as well as the linked record in JSON form.
I have written a simple Erlang function that includes all links as well as the ID of the record passed in. This could be used together with the native Erlang function riak_kv_mapreduce:map_object_value to replace the two map phase functions in your first example, removing some of the JavaScript usage. As in the existing solution, I would expect you to receive a number of duplicates as several matches may link to the same league/team.
-module(riak_mapreduce_example).
-export([map_link/3]).

%% @spec map_link(riak_object:riak_object(), term(), term()) ->
%%                [{{Bucket :: binary(), Key :: binary()}, Props :: term()}]
%% @doc map phase function for adding linked records to result set
map_link({error, notfound}, _, _) ->
    [];
map_link(RiakObject, Props, _) ->
    Bucket = riak_object:bucket(RiakObject),
    Key = riak_object:key(RiakObject),
    Meta = riak_object:get_metadata(RiakObject),
    Current = [{{Bucket, Key}, Props}],
    Links = case dict:find(<<"Links">>, Meta) of
                {ok, List} ->
                    [{{B, K}, Props} || {{B, K}, _Tag} <- List];
                error ->
                    []
            end,
    lists:append([Current, Links]).
The results of these can either be sent back to the client for aggregation or passed into a reduce phase function as in the example you provided.
The example function would need to be compiled and installed on all nodes, and may require a restart.
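Putting the pieces together, the query section of the original job could then be replaced with two Erlang map phases, roughly like this (a sketch reusing the index input from the first version, not tested):
{
  "inputs": {
    "bucket": "matches",
    "index": "startTime_bin",
    "start": "2012-10-22T23:00:00",
    "end": "2012-10-24T23:35:00"
  },
  "query": [
    {"map": {"language": "erlang", "module": "riak_mapreduce_example", "function": "map_link"}},
    {"map": {"language": "erlang", "module": "riak_kv_mapreduce", "function": "map_object_value"}}
  ]
}
The JavaScript reduce phase from the first version (or client-side aggregation) could still be appended to de-duplicate the merged results.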
Another way to improve performance (which very well may not be an option for you) would be to alter the data model in order to avoid having to use mapreduce for performance-critical queries altogether.