How to Search a Nested Array of Objects in DynamoDB

I am new to DynamoDB and would like to search nested array properties. For example, my table has the sample data given below:
[{
  id: '123',
  name: 'test',
  subShops: [
    {
      shopId: '234',
      shopName: 'New Shop'
    },
    {
      shopId: '345',
      shopName: 'New Shop 2'
    }
  ]
},
{
  id: '1234',
  name: 'test2',
  subShops: [
    {
      shopId: '2345',
      shopName: 'New Shop 3'
    },
    {
      shopId: '3456',
      shopName: 'New Shop 4'
    }
  ]
}]
I want to search where name is in ['test', 'test2', 'test3'] or where subShops[].shopName is in ['New Shop', 'New Shop 2', 'New Shop 3'].
I have existing code that handles only name in ['test', 'test2', 'test3']:
const params: AWS.DynamoDB.DocumentClient.ScanInput = {
  TableName: VENDOR_TABLE_INFO.Name,
  ExpressionAttributeNames: { "#Id": "name" },
  FilterExpression: `#Id in (${Object.keys(keyValues).toString()}) or contains (subShops, :category2)`,
  ExpressionAttributeValues: {
    ...keyValues,
    ':category2': {
      ...keyValues
    }
  }
};

Please note that DynamoDB (DDB) is primarily a hyperscale, serverless key-value datastore with very limited query patterns and flexibility; you need to be OK with that to use it.
In each DDB table you can define only one hash key (partition key) and up to 5 local secondary indexes (sort keys) for querying, and you can have up to 20 global secondary indexes (GSIs).
In your example, the hash key is "id". If you need to query by "name" only, you need to build a GSI with name as its hash key and include the needed fields in the projection. There is no way to query by "shopName" inside the subShops array unless you "flatten" the JSON tree structure.
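For instance, querying such a GSI with the DocumentClient could look like this minimal sketch (the table name "Vendors" and index name "name-index" are assumptions for illustration, not from the original post):
// Sketch: query a GSI whose hash key is "name".
// "Vendors" and "name-index" are hypothetical names.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
  TableName: 'Vendors',
  IndexName: 'name-index',
  KeyConditionExpression: '#n = :name',
  ExpressionAttributeNames: { '#n': 'name' }, // "name" is a reserved word
  ExpressionAttributeValues: { ':name': 'test' }
};

docClient.query(params, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Items);
});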
In short, if you want JSON-tree-level data queries and manipulation and all of your data is JSON documents, I would suggest using Amazon DocumentDB, which is MongoDB 4 compatible, or MongoDB itself.

Related

DynamoDB can't satisfy KeySchema

I am trying to build a DynamoDB table using boto, which will save the various aspects of an IAM policy in the table. I have defined the attributes for the key schema, but I do not understand the error. I am very new to DynamoDB and AWS. This is my code:
table = dynamodb.create_table(
    TableName='GoodTable',
    KeySchema=[
        {
            'AttributeName': 'Name',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'Instance Type',
            'KeyType': 'RANGE'
        },
        {
            'AttributeName': 'Region',
            'KeyType': 'RANGE'
        },
        {
            'AttributeName': 'Volume Size',
            'KeyType': 'RANGE'
        },
    ],
    AttributeDefinitions=[
        {
            "AttributeName": "Name",
            "AttributeType": "S"
        },
        {
            "AttributeName": "Instance Type",
            "AttributeType": "S"
        },
        {
            "AttributeName": "Region",
            "AttributeType": "S"
        },
        {
            "AttributeName": "Volume Size",
            "AttributeType": "N"
        }
    ],
    ProvisionedThroughput={
        "ReadCapacityUnits": 1,
        "WriteCapacityUnits": 1
    }
)
time.sleep(20)
table = dynamodb.Table('GoodTable')
response = table.put_item(
    Item={
        'Name': 'GoodName',
    }
)
response = table.put_item(
    Item={
        'Instance Type': 't2.micro',
    }
)
response = table.put_item(
    Item={
        'Region': 'us-east-1',
    }
)
response = table.put_item(
    Item={
        'Volume Size': '20',
    }
)
This is the error I am getting:
botocore.exceptions.ClientError: An error occurred (ValidationException) when
calling the CreateTable operation: 1 validation error detected: Value '[com.amazonaws.dynamodb.v20120810.KeySchemaElement#ad4dcbcd, com.amazonaws.dynamodb.v20120810.KeySchemaElement#126b7ad8, com.amazonaws.dynamodb.v20120810.KeySchemaElement#ca666a07, com.amazonaws.dynamodb.v20120810.KeySchemaElement#6478bc3a]' at 'keySchema' failed to satisfy constraint: Member must have length less than or equal to 2
You can have at most 2 fields as a primary key in DynamoDB: one hash key and, at most, one range key.
From the CreateTable documentation:
For a composite primary key (partition key and sort key), you must provide exactly two elements, in this order: The first element must have a KeyType of HASH, and the second element must have a KeyType of RANGE.
You can set up secondary indexes in DynamoDB if you need to query by other attributes.
There are two issues in your code:
As already pointed out, you can't have more than two key attributes unless you use global or local secondary indexes.
dynamodb.Table('GoodTable') is incorrect as written, because you need a boto3 resource.
Here is the modified code:
import time

import boto3

dynamodb = boto3.resource('dynamodb')  # Table() must be called on a resource

table = dynamodb.create_table(
    TableName='GoodTable',
    KeySchema=[
        {
            'AttributeName': 'Name',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            "AttributeName": "Name",
            "AttributeType": "S"
        }
    ],
    ProvisionedThroughput={
        "ReadCapacityUnits": 1,
        "WriteCapacityUnits": 1
    }
)

# Give the new table time to become active before writing to it
time.sleep(20)

table = dynamodb.Table('GoodTable')
response = table.put_item(
    Item={
        'Name': 'GoodName',
        'Instance Type': 't2.micro',
    }
)
response = table.put_item(
    Item={
        'Name': 'GoodName2',
        'Instance Type': 't2.micro',
    }
)
response = table.put_item(
    Item={
        'Name': 'GoodName3',
        'Region': 'us-east-1',
    }
)
response = table.put_item(
    Item={
        'Name': 'GoodName4',
        'Volume Size': '20',
    }
)
The field name is "KeySchema", not "TableSchema" or anything else, and it defines only the key. The table is "schema-less", which means that each record can have a different structure and there is no need to define it; you must define only the key. In DynamoDB the key is either a HASH column alone or HASH + RANGE columns. You should think about which of those two possibilities you want to use. If you use HASH + RANGE you have to query the table with both as well. Reading many records is costly.
So think a bit about what you want to store and how you would query it. Design the hash key accordingly.
There is a strong argument for the single-table, composite-hash-key data model from AWS Principal NoSQL technologist Rick Houlihan: https://youtu.be/HaEPXoXVf2k?t=2573 . When I watched the video I started designing my DynamoDB tables differently, and it improved my life.
The natural tendency is to select one column which is more or less unique and use it as a hash key, but that really limits your query options. A well-designed hash key can help you query without additional indexes, so your solution is cheaper and more efficient.
As I mentioned above, besides the key there is no structure defined. That does not mean each record should be completely random; it does make sense, however, to store multiple item types in one table where each item type has a consistent structure. See the video, it's worth it.
In your case, using the instance name as a hash key can be risky, because it may not be unique across regions. You can even have two instances with the same name in the same region, because the name is just a tag. If you do not know or do not want to store the instance ID, you have to come up with some other clever solution.
For example, the hash key can be INSTANCE:: and the sort key can be the instance creation time. There is some additional work to compose and decompose the key for each record; I solved it by creating a Python class which wraps put_item/get_item in methods that handle the keys.
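That key-composition idea might look roughly like the following sketch (in JavaScript with the DocumentClient rather than the Python class the answer mentions; the table name, key attribute names, and INSTANCE:<name> format are assumptions for illustration):
// Sketch: compose/decompose a composite key, with a hypothetical
// INSTANCE:<name> hash key and the creation time as the sort key.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const TABLE = 'Instances';

const makePk = (name) => `INSTANCE:${name}`;   // compose the hash key
const parsePk = (pk) => pk.split(':')[1];      // decompose it again

function putInstance(name, createdAt, attrs) {
  return docClient.put({
    TableName: TABLE,
    Item: { pk: makePk(name), sk: createdAt, ...attrs }
  }).promise();
}

function getInstance(name, createdAt) {
  return docClient.get({
    TableName: TABLE,
    Key: { pk: makePk(name), sk: createdAt }
  }).promise().then(res => res.Item);
}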

Update list items in dynamodb

My data structure in AWS DynamoDB looks like this:
{ key: 'roomNameOne',
  value: {
    attendees: ['A', 'B', 'C'], // this is a set
    wsConnections: [
      { connectionId: 'foo', domain: 'xyz.com' },
      { connectionId: 'bar', domain: 'xyz.com' }
    ]
  }
}
{ key: 'roomNameTwo',
  value: {
    attendees: ['X', 'Y', 'Z'],
    wsConnections: [
      { connectionId: 'foo', domain: 'xyz.com' },
      { connectionId: 'bar', domain: 'xyz.com' }
    ]
  }
}
Now, when I get a request that connectionId 'foo' is lost, I want to remove that entry from all items.
So after the DynamoDB update operation my data should look like this:
{ key: 'roomNameOne',
  value: {
    attendees: ['A', 'B', 'C'], // this is a set
    wsConnections: [
      { connectionId: 'bar', domain: 'xyz.com' }
    ]
  }
}
{ key: 'roomNameTwo',
  value: {
    attendees: ['X', 'Y', 'Z'],
    wsConnections: [
      { connectionId: 'bar', domain: 'xyz.com' }
    ]
  }
}
Can you please help me with the update query? The trick here is that I don't know the room names at update time, but at connection time I do know which room names a connection is interested in.
Unfortunately, DynamoDB does not allow this type of operation on a complex attribute (e.g. a list of maps).
Modeling one-to-many relationships using complex attributes is a useful pattern. However, one of the drawbacks of this approach is that you won't be able to perform the types of operations you're describing.
If you have access patterns that require you to update wsConnections, you might consider modeling the relationship by making each entry of the wsConnections list its own item in DynamoDB.
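For example, the items might be laid out like this (a sketch; the PK and SK attribute names and the wsConnection# prefix are assumptions, consistent with the delete call below):
PK             SK                  domain
roomNameOne    wsConnection#foo    xyz.com
roomNameOne    wsConnection#bar    xyz.com
roomNameTwo    wsConnection#foo    xyz.com
roomNameTwo    wsConnection#bar    xyz.com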
Storing your data in this way would make it easier for you to remove connections. For example, if you wanted to remove bar from your connections, you could perform the following operation
ddbClient.delete({
  TableName: "YOUR_TABLE_NAME",
  Key: { PK: "roomNameOne", SK: "wsConnection#bar" }
})
EDIT: If you don't have access to the PK, your only option is a scan operation.
ddbClient.scan({
  "TableName": "YOUR TABLE NAME",
  "FilterExpression": "contains(#key, :value)",
  "ExpressionAttributeValues": {
    ":value": {
      "S": "foo"
    }
  },
  "ExpressionAttributeNames": {
    "#key": "connections"
  }
})
This will scan the entire table looking for items whose connections attribute contains "foo", letting you fetch the matching items, update them, and persist them back to DDB.
This approach is not ideal, though. The scan operation searches the entire table, which can be horribly inefficient. You'd also have to issue multiple requests to DDB: one to fetch and one to update. Multiple round trips aren't the end of the world, but again, not ideal.
To unlock more flexible and efficient access patterns, it would be ideal to get the data out of the wsConnections list attribute. As long as the data is buried in a complex attribute, your options will be limited. A sketch of the fetch-then-update round trip follows.
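A minimal sketch of that round trip with the DocumentClient, assuming the table is named 'Rooms' and the items have the shape shown in the question:
// Sketch: remove a lost connection from every room item (names assumed).
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function removeConnection(connectionId) {
  // Fetch all room items. (contains() on a list of maps only matches a
  // whole map, so the filtering here is done client-side instead.)
  const { Items } = await docClient.scan({ TableName: 'Rooms' }).promise();

  for (const item of Items) {
    const conns = item.value.wsConnections || [];
    const remaining = conns.filter(c => c.connectionId !== connectionId);
    if (remaining.length === conns.length) continue; // nothing to remove

    // Persist the shortened list back. "value" is a reserved word, so it
    // needs an expression attribute name.
    await docClient.update({
      TableName: 'Rooms',
      Key: { key: item.key },
      UpdateExpression: 'SET #v.wsConnections = :conns',
      ExpressionAttributeNames: { '#v': 'value' },
      ExpressionAttributeValues: { ':conns': remaining }
    }).promise();
  }
}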

How to get a response of BatchWriteItem DocumentClient?

I'm putting an item into two DynamoDB tables. These are my request params for the BatchWriteItem operation:
RequestItems: {
  first_table: [{
    PutRequest: {
      Item: {
        employee_id: '123',
        company_id: '123',
        job_position: 'manager'
      }
    }
  }],
  second_table: [{
    PutRequest: {
      Item: {
        facility_id: '123',
        company_id: '123',
        job_position: 'manager'
      }
    }
  }]
},
ReturnConsumedCapacity: "TOTAL"
My items are written successfully, but I get this response:
UnprocessedItems: {}
How can I get a response containing the written data? Thanks.
It's not possible to get a response containing the items you have put using BatchWriteItem. PutItem can return overwritten values, but not new ones.
You might consider:
1) Using the data you already have. After all, you know the items have been written, and you already have them.
2) If you want some statistics on your batch write, you could use
"ReturnItemCollectionMetrics": "SIZE"
3) Querying for the items after you have written them; see the sketch after this list.
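For option 3, a minimal sketch using the DocumentClient's batchGet (assuming employee_id and facility_id are the respective tables' partition keys; the key values are taken from the request above):
// Sketch: read the items back after the batch write.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

docClient.batchGet({
  RequestItems: {
    first_table: { Keys: [{ employee_id: '123' }] },
    second_table: { Keys: [{ facility_id: '123' }] }
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Responses); // items grouped by table name
});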

Apollo: Refetch queries that have multiple variable permutations after mutation

Let's say I have a table that lists a bunch of Posts using a query like:
const PostsQuery = gql`
  query posts($name: String) {
    posts(name: $name) {
      id
      name
      status
    }
  }
`;
const query = apolloClient.watchQuery({query: PostsQuery});
query.subscribe({
  next: (posts) => console.log(posts) // [ {name: "Post 1", id: '1', status: 'pending' }, { name: "Paul's Post", id: '2', status: 'pending'} ]
});
Then later my user comes along and enters a value in a search field and calls this code:
query.setVariables({name: 'Paul'})
It fetches the filtered posts and logs it out fine.
// [ { name: "Paul's Post", id: '2', status: 'pending'} ]
Now, in my table there is a button that changes the status of a post from 'Pending' to 'Active'. The user clicks that and it calls code like:
const PostsMutation = gql`
  mutation activatePost($id: ID!) {
    activatePost(id: $id) {
      ok
      object {
        id
        name
        status
      }
    }
  }
`;
apolloClient.mutate({mutation: PostsMutation, variables: {id: '2'}});
All is well with the mutation, but now I want to refetch the table data so it has the latest, so I make a change:
apolloClient.mutate({
  mutation: PostsMutation,
  variables: {id: '2'},
  refetchQueries: [{query: PostsQuery, variables: {name: 'Paul'}}]
});
Hurray, it works!
// [ { name: "Paul's Post", id: '2', status: 'active'} ]
But... now my user clears the search query, expecting the results to update.
query.setVariables({});
// [ {name: "Post 1", id: '1', status: 'pending' }, { name: "Paul's Post", id: '2', status: 'pending'} ]
Oh no! Because the data was not refetched in our mutation with our "original" variables (meaning none), we are getting stale data!
So how do you handle a situation where you have a mutation that may affect a query that could have many permutations of variables?
I had a similar issue. I am using Apollo with Angular, so I am not sure whether this method works with the React client, but it should.
If you look closely at the refetchQueries property of the mutate method, you will see that the function can also return a string array of query names to refetch. By returning just the query name as a string, you do not need to worry about the variables. Be advised that this refetches all queries matching that name, so if you had a lot of queries with different variables it could end up being a large request; in my case it is worth the trade-off. If this is a problem, you could also get access to the query manager through apolloClient.queryManager, which you could use for more fine-grained control over what to refetch. I didn't implement it, but it looked very possible. I found the solution below fits my needs fine.
In your code, what you need to do is:
apolloClient.mutate({
  mutation: PostsMutation,
  variables: {id: '2'},
  refetchQueries: (mutationResult) => ['posts']
});
This will refetch any query with the operation name posts. Again, it is possible to refetch only a subset of them if you dig into the queryManager and do some filtering on the active watch queries. But that is another exercise.

Multi-level include filter with LoopBack JS

My problem is that I can't figure out how to get multilevel relation structures in one request with a LoopBack backend. I have 3 models: Continent, Country, County. What I would like to do is GET a continent and receive all the countries, and all the counties within.
The relationship between them:
Continent hasMany Country, and Country belongsTo Continent
Country hasMany County, and County belongsTo Country
So the REST API call to /api/Continent/1 returns
{
  "id": 1,
  "name": "Europe"
}
Now, I want to get all the countries and counties with the continent, so I do a query to /api/Continent/1?filters[include]=country
Still, I don't get the counties.
What kind of query should I make to get a result which includes both relation levels? Like this:
{
  "id": 1,
  "name": "Europe",
  "country": [
    {
      "id": 1,
      "name": "United Kingdom",
      "county": [
        {"id": 1, "name": "Avon"},
        {"id": 2, "name": "Bedfordshire"},
        ...
      ]
    },
    ...
  ]
}
Thanks for your help!
The syntax is:
/api/Continent/1?filter={"include": {"country": "countys"}}
Hoping it's not too late for an answer: after thoroughly flipping their docs inside and out on this issue, I ended up writing a remote method to achieve that deep-level multiple include. It's not so clear how to go about it at the REST API level.
Continent.listContinents = function(limit, skip, order, cb) {
  Continent.find({
    limit: limit,
    skip: skip,
    order: order,
    include: [{country: 'county'}],
  }, cb);
};

Continent.remoteMethod('listContinents', {
  description: "List continents. Include the related country and county information",
  returns: {arg: 'continents', type: 'array'},
  accepts: [
    {arg: 'limit', type: 'number', http: { source: 'query' }},
    {arg: 'skip', type: 'number', http: { source: 'query' }},
    {arg: 'order', type: 'string', http: { source: 'query' }}
  ],
  http: {path: '/list', verb: 'get'}
});
I have added some additional query-string parameters (limit, order and skip) to enable pagination and ordering, but they are not a must :)
Also, this assumes you already have relation types defined between Continent and Country, and between Country and County, roughly as sketched below.
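For reference, those relations might be declared like this in the model definition files (a sketch; the file paths and foreign-key names are assumptions). In common/models/continent.json:
{
  "name": "Continent",
  "relations": {
    "country": {
      "type": "hasMany",
      "model": "Country",
      "foreignKey": "continentId"
    }
  }
}
And in common/models/country.json:
{
  "name": "Country",
  "relations": {
    "county": {
      "type": "hasMany",
      "model": "County",
      "foreignKey": "countryId"
    }
  }
}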