I'm currently working with the DynamoDB AWS API and just found two different ways of doing apparently the same thing. I was wondering if there is any performance difference or other benefit in using one instead of the other.
My current scenario is restricted to just a table, so I will be only manipulating that one without going any further.
Is there any benefit from using this (which I think is simpler in my scenario than forming a request every single time I want to check key existence, and which also lets me keep a fixed Table object to query any time I need to)...
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("my-table");
table.getItem("Artist", "sample1", "SongTitle", "sample2");
... with this one?
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
HashMap<String, AttributeValue> key = new HashMap<String, AttributeValue>();
key.put("Artist", new AttributeValue().withS("sample1"));
key.put("SongTitle", new AttributeValue().withS("sample2"));
GetItemRequest request = new GetItemRequest()
    .withTableName("my-table")
    .withKey(key);
GetItemResult result = client.getItem(request);
I've taken the code from the actual examples on the AWS website.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.SDKs.Interfaces.LowLevel.html
https://docs.aws.amazon.com/en_en/amazondynamodb/latest/developerguide/JavaDocumentAPIWorkingWithTables.html#JavaDocumentAPIListTables
Table is more or less a lightweight wrapper around the DynamoDB client object. (You can see this by looking at its source code on GitHub.) You can use whichever one you think makes your code the most readable and maintainable.
dynamodb yields an item in a string format:
cdb = boto3.client('dynamodb', region_name='us-west-2')
db = boto3.resource('dynamodb', region_name='us-west-2')
table = db.Table('my-table')
response = table.scan()
my_data = response['Items']
foo = my_data[0]
print(foo)
# {'theID': 'fffff8f-dfsadfasfds-fsdsaf', 'theNumber': Decimal('1')}
Now, when I treat this like a black-box unit, do nothing, and return it to the db via put-item, I'll get many errors indicating none of the values in the dictionary are the expected type:
cdb.put_item(TableName='my-table', Item=foo, ReturnValues="ALL_OLD")
# many errors
I'd like to rely on boto3 to do everything and avoid manipulating the values if possible. Is there a utility available that will convert a response item into the format it needs to be in to be placed back in the db?
You should use your table resource to write items because you also use it to read.
Something like this should do the trick:
table.put_item(Item=foo, ReturnValues="ALL_OLD")
You're reading the data using the higher-level resource API, which maps many native Python types to DynamoDB types, and trying to write using the lower-level client API, which doesn't do that.
The simple solution is to use the resource API for writing as well, and it will perform the mappings.
(I'm inferring this from your method signature for put_item; the question is not overly clear...)
Apparently there is also a serializer that can be used:
from boto3.dynamodb.types import TypeSerializer
I have a DynamoDB-based web application that uses DynamoDB to store my large JSON objects and perform simple CRUD operations on them via a web API. I would like to add a new table that acts like a categorization of these values. The user should be able to select from a selection box which category the object belongs to. If a desirable category does not exist, the user should be able to create a new category specifying a name which will be available to other objects in the future.
It is critical to the application that every one of these categories be given an integer ID that increments, starting at 1. These auto-generated numbers will turn into reproducible serial numbers for back-end reports that will not use the user-visible text name.
So I would like to have a simple API available from the web front end that allows me to:
A) GET /category : produces { int : string, ... } of all categories mapped to an ID
B) PUSH /category : accepts string and stores the string to the next integer
Here are some ideas for how to handle this kind of project.
Store it in DynamoDB with integer indexes. This has some benefits but leaves a lot to be desired. Firstly, there's no auto-incrementing ID in DynamoDB, but I could definitely get the state of the table, create a new ID, and store the result. This might have issues with consistency and race conditions, but there's probably a way to achieve this safely. It might, however, be a big anti-pattern to use DynamoDB this way.
Store it in DynamoDB as one object in a table with some random index. Just store the mapping as a JSON object. This really forgets the notion of tables in DynamoDB and uses it as a simple file. It might also run into some issues with race conditions.
Use AWS ElastiCache to have a Redis key-value store. This might be "the right" decision, but the downside is that ElastiCache is an always-on DB offering where you pay per hour. For a low-traffic web site like mine I'd be paying a minimum of $12/mo, I think, and I would really like this to be pay per access/update due to the low volume. I'm not sure there's an auto-increment feature for Redis built in the way I'd need it, but it's pretty trivial to make a transaction that gets the length of the table, adds one, and stores a new value. Race conditions are easily avoided with this solution.
Use a SQL database like AWS Aurora or MySQL. This has the same upsides as Redis, but it's even more overkill than Redis is, it costs a lot more, and it's still always on.
Run my own in memory web service or MongoDB etc... still you're paying for constant containers running. Writing my own thing is obviously silly but I'm sure there are services that match this issue perfectly but they'd all require a constant container to run.
Is there a good way to just store a simple list or integer mapping like this that doesn't incur a constant monthly cost? Is there a better way to do this with DynamoDB?
Store the maxCounterValue as an item in DynamoDB.
For the PUSH /category, perform the following:
1. Get the current maxCounterValue.
2. TransactWrite:
   a. Put the category name and id into a new item with id = maxCounterValue + 1.
   b. Update maxCounterValue to maxCounterValue + 1, with a ConditionExpression that checks maxCounterValue = :valueFromGetOperation.
3. If the TransactWrite fails, start again at step 1, retrying up to X more times.
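The transaction in these steps could be sketched roughly as follows. This only builds the TransactItems list in the low-level attribute format; the single-table layout, the key name pk, the COUNTER and CATEGORY# prefixes and the attribute categoryName are all assumptions made for illustration, and the counter item is assumed to already exist (for example with maxCounterValue = 0).

```python
def build_category_transaction(table_name, category_name, current_max):
    """Build the two writes that must succeed or fail together.

    current_max is the maxCounterValue read in the preceding Get step.
    All table, key and attribute names here are hypothetical.
    """
    new_id = current_max + 1
    return [
        {
            # Store the new category under the next integer ID.
            "Put": {
                "TableName": table_name,
                "Item": {
                    "pk": {"S": f"CATEGORY#{new_id}"},
                    "id": {"N": str(new_id)},
                    "categoryName": {"S": category_name},
                },
                # Refuse to overwrite an existing category item.
                "ConditionExpression": "attribute_not_exists(pk)",
            }
        },
        {
            # Bump the counter only if nobody else changed it since the read.
            "Update": {
                "TableName": table_name,
                "Key": {"pk": {"S": "COUNTER"}},
                "UpdateExpression": "SET maxCounterValue = :new",
                "ConditionExpression": "maxCounterValue = :expected",
                "ExpressionAttributeValues": {
                    ":new": {"N": str(new_id)},
                    ":expected": {"N": str(current_max)},
                },
            }
        },
    ]


items = build_category_transaction("categories", "Rock", 4)
print(items[0]["Put"]["Item"]["id"])
```

With boto3, a list like this would be passed as client.transact_write_items(TransactItems=items). If another request won the race, the call should fail with a TransactionCanceledException, at which point you re-read the counter and try again.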
I want to truncate a DynamoDB table which can have up to 3 to 4 million records. What is the best way?
Right now I am using a scan, which does not give good performance (I have tried deleting only a few records so far: 3):
DynamoDB dynamoDB = new DynamoDB(amazonDynamoDBClient);
Table table = dynamoDB.getTable("table-test");
ItemCollection<ScanOutcome> resultItems = table.scan();
Iterator<Item> itemsItr = resultItems.iterator();
while (itemsItr.hasNext()) {
    Item item = itemsItr.next();
    String itemPk = (String) item.get("PK");
    String itemSk = (String) item.get("SK");
    DeleteItemSpec deleteItemSpec = new DeleteItemSpec().withPrimaryKey("PK", itemPk, "SK", itemSk);
    table.deleteItem(deleteItemSpec);
}
The best way is to delete your table and create a new one with the same name. This is how clearing all data from DynamoDB is usually performed.
As Marcin already answered, the best way is to delete your table and create a new one. It is certainly the cheapest way - because any other way would require scanning the entire table and paying for the read capacity units required to do it.
In some cases, however, you might want to delete old items while the table is still actively used. In that case you can use a Scan like you wanted, but do it much more efficiently than you did. First, don't run individual DeleteItem requests sequentially, waiting for one delete to complete before issuing the next one; you can send batches of up to 25 deletes in one BatchWriteItem request. You can also send multiple BatchWriteItem requests in parallel. Finally, for even faster deletion, you can parallelize your Scan across multiple threads or even machines (see the parallel scan section of the DynamoDB documentation). Just don't forget that if you delete items while the table is still actively written to, you need a way to distinguish old items that you want to delete from new items that you don't, as the scan may start returning these new items as well.
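To illustrate the batching part, here is a small sketch in Python that groups scanned keys into BatchWriteItem-sized chunks. The function name delete_batches and the sample keys are made up; each yielded dictionary has the shape of the RequestItems parameter of BatchWriteItem.

```python
def delete_batches(table_name, keys, batch_size=25):
    """Yield RequestItems dictionaries holding at most batch_size deletes.

    keys is a list of primary keys in the low-level attribute format.
    25 is the per-request limit of BatchWriteItem.
    """
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {table_name: [{"DeleteRequest": {"Key": key}} for key in chunk]}


# 60 hypothetical keys using the PK/SK names from the question.
keys = [{"PK": {"S": f"pk-{i}"}, "SK": {"S": f"sk-{i}"}} for i in range(60)]

batches = list(delete_batches("table-test", keys))
print([len(batch["table-test"]) for batch in batches])
```

Each batch can then be sent independently (and in parallel) with something like client.batch_write_item(RequestItems=batch). Anything the response returns under UnprocessedItems has to be resent. If you are using boto3's table resource, its batch_writer() context manager takes care of the chunking and resending for you.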
Finally, if you find yourself often clearing old data from a table - you should consider whether you can use DynamoDB's TTL feature, where DynamoDB automatically looks for expired items (based on an expiration-time attribute on each item) and deletes them - at no cost to you.
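As a rough sketch of what the expiration-time attribute looks like: TTL expects a Number attribute holding a Unix timestamp in seconds. The attribute name expiresAt and the helper with_expiry are assumptions for illustration; the attribute you pick must be the one TTL is enabled on for the table.

```python
import time


def with_expiry(item, ttl_seconds, now=None):
    """Return a copy of item stamped with an expiration time.

    The value is epoch seconds, the format DynamoDB TTL works with.
    'expiresAt' is a hypothetical attribute name.
    """
    now = int(time.time()) if now is None else int(now)
    stamped = dict(item)
    stamped["expiresAt"] = now + ttl_seconds
    return stamped


# Fixed 'now' so the example is reproducible: expire one hour later.
example = with_expiry({"PK": "a", "SK": "b"}, 3600, now=1_700_000_000)
print(example)
```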
I'm trying to add a DynamoDBVersionAttribute to incorporate optimistic locking when accessing/updating items in a DynamoDB table. However, I'm unable to figure out how exactly to add the version attribute.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html seems to state that using it as an annotation in the class that creates the table is the way to go. However, our codebase is creating new tables in a format similar to this:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
List<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
attributeDefinitions.add(new AttributeDefinition().withAttributeName("Id").withAttributeType("N"));
List<KeySchemaElement> keySchema = new ArrayList<KeySchemaElement>();
keySchema.add(new KeySchemaElement().withAttributeName("Id").withKeyType(KeyType.HASH));
CreateTableRequest request = new CreateTableRequest()
.withTableName(tableName)
.withKeySchema(keySchema)
.withAttributeDefinitions(attributeDefinitions)
.withProvisionedThroughput(new ProvisionedThroughput()
.withReadCapacityUnits(5L)
.withWriteCapacityUnits(6L));
Table table = dynamoDB.createTable(request);
I'm not able to find out how to add the VersionAttribute through the Java code as described above. It's not an attribute definition, so I'm unsure where it goes. Any guidance as to where I can add this VersionAttribute in the CreateTable request?
As far as I'm aware, the @DynamoDBVersionAttribute annotation for optimistic locking is only available for tables modeled specifically for DynamoDBMapper queries. Using DynamoDBMapper is not a terrible approach, since it effectively creates an ORM for CRUD operations on DynamoDB items.
But if your existing codebase can't make use of it, your next best bet is probably to use conditional writes to increment a version number if it's equal to what you expect it to be (i.e. roll your own optimistic locking). Unfortunately, you would need to include the increment and the condition in every write you want to be optimistically locked.
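A sketch of what such a hand-rolled conditional write could look like, shown in Python for brevity. It only builds the arguments for an UpdateItem call; the attribute name version and the helper versioned_update are hypothetical, and it assumes every item is created with a version attribute already set.

```python
def versioned_update(key, attribute, value, expected_version):
    """Build UpdateItem arguments that set one attribute and bump the
    version, but only if the stored version is still the one we read."""
    return {
        "Key": key,
        "UpdateExpression": "SET #attr = :value, #ver = :next",
        # The write is rejected if someone else updated the item first.
        "ConditionExpression": "#ver = :expected",
        "ExpressionAttributeNames": {"#attr": attribute, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":value": value,
            ":next": expected_version + 1,
            ":expected": expected_version,
        },
    }


kwargs = versioned_update({"Id": 7}, "title", "new title", expected_version=3)
print(kwargs["ConditionExpression"])
```

With boto3's table resource this would be used as table.update_item(**kwargs). A stale version should surface as a ClientError with the code ConditionalCheckFailedException, which is the signal to re-read the item and retry.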
Your code just creates a table, but then in order to use DynamoDBMapper to access that table, you need to create a class that represents it. For example if your table is called Users, you should create a class called Users, and use annotations to link it to the table.
You can keep your table creation code, but you need to create the DynamoDBMapper class. You can then do all of your loading, saving and querying using the DynamoDBMapper class.
When you have created the class, just give it a field called version and put the annotation on it; DynamoDBMapper will take care of the rest.
Is it possible to get DynamoDB to automatically generate unique IDs when adding new items to a table?
I noticed the Java API mentions @DynamoDBAutoGeneratedKey, so I'm assuming there's a way to get this working with PHP as well.
If so, does the application code generate these IDs or is it done on the DynamoDB side?
Good question - while conceptually possible, this does not currently seem to be available as a DynamoDB API-level feature, insofar as neither CreateTable nor PutItem refers to such functionality.
The @DynamoDBAutoGeneratedKey notation you have noticed is a Java annotation, i.e. syntactic sugar offered by the Java SDK:
An annotation, in the Java computer programming language, is a special form of syntactic metadata that can be added to Java source code.
As such, @DynamoDBAutoGeneratedKey is one of the Amazon DynamoDB Annotations offered as part of the Object Persistence Model within the Java SDK's high-level API (see Using the Object Persistence Model with Amazon DynamoDB):
Marks a hash key or range key property as being auto-generated. The Object Persistence Model will generate a random UUID when saving these attributes. Only String properties can be marked as auto-generated keys.
When working with DynamoDB in JavaScript with Node.js, I use the npm module uuid to generate unique keys.
Example:
const uuid = require('uuid');
const id = uuid.v1();
Reference: uuid on npm
With the schema-based AWS DynamoDB Data Mapper library for Node.js, the hash key (id) is generated automatically. Auto-generated IDs are UUID v4 values.
For more details, have a look at the following AWS packages:
Data Mapper with annotations
Data Mapper package for JavaScript
Sample snippet:
import {table, autoGeneratedHashKey, rangeKey} from '@aws/dynamodb-data-mapper-annotations';

@table('my_table')
class MyDomainClass {
    @autoGeneratedHashKey()
    id: string;

    @rangeKey({defaultProvider: () => new Date()})
    createdAt: Date;
}
The client can create a (for all intents and purposes) unique ID either by picking a long random ID (DynamoDB's Number type supports up to 38 digits of precision, for example), or by picking an ID which contains the client's IP address, CPU number, and current time, or something along these lines.
The UUID standard even includes a standard way to do this (and you have libraries in various languages to create such UUIDs on the client side), but you don't really need to use a standard.
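For instance, in Python the standard library already ships such a generator, so a client-side ID could be produced like this (the function name new_item_id is just for illustration):

```python
import uuid


def new_item_id():
    """Return a random (version 4) UUID as a string, suitable for use
    as a string hash key generated on the client side."""
    return str(uuid.uuid4())


first_id = new_item_id()
second_id = new_item_id()
print(first_id, second_id)
```

The ID is created entirely in application code; DynamoDB just stores it like any other string attribute.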
An interesting question is how you plan to find these items if they have random keys. Or are you planning to use a secondary index?
The 2022 answer is here:
https://dev.to/prabusah_53/aws-lambda-in-built-uuid-382f
External libraries are no longer needed.
Here is another good method taken from mkyong
http://www.mkyong.com/java/how-to-get-current-timestamps-in-java/
I adjusted his method to get the milliseconds instead of the actual date
java.util.Date date = new java.util.Date();
// Timestamp is java.sql.Timestamp; getTime() returns milliseconds since the Unix epoch.
System.out.println(new Timestamp(date.getTime()).getTime());
The approach I'm taking is to use the current timestamp for the hash-key (or the range-key, if using a range-key too). Store the timestamp as an integer, representing the number of milliseconds since the start of the "UNIX epoch" (in the UTC timezone). Many date/time libraries can produce this number for you.
This has the advantage that if you want to have a "creation time" field in your table, your ID already stores this information. Just call another method in your date/time library to convert the timestamp to a readable format.
(Be sure to handle the exception which will occur if a second item is created in the same table with the same millisecond timestamp; just fall back and retry the operation in that case, with a slightly later, current timestamp.)
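The fall-back-and-retry idea can be sketched like this. The put_if_absent callable is a stand-in I made up for a conditional write (against DynamoDB that would be a PutItem with a condition such as attribute_not_exists on the key); here an in-memory dictionary plays the role of the table so the example runs on its own.

```python
import time


def put_with_timestamp_id(put_if_absent, attributes, max_attempts=5):
    """Try to store attributes under a millisecond-timestamp ID,
    retrying with a later timestamp if that ID is already taken."""
    for _ in range(max_attempts):
        candidate = time.time_ns() // 1_000_000  # ms since the Unix epoch (UTC)
        if put_if_absent(candidate, attributes):
            return candidate
        time.sleep(0.001)  # wait so the next timestamp is later
    raise RuntimeError("could not find a free timestamp ID")


# In-memory stand-in for the table and its conditional put.
store = {}


def fake_put_if_absent(item_id, attributes):
    if item_id in store:
        return False  # same millisecond already used
    store[item_id] = attributes
    return True


first = put_with_timestamp_id(fake_put_if_absent, {"name": "alice"})
second = put_with_timestamp_id(fake_put_if_absent, {"name": "bob"})
print(first, second)
```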
For example:
User table
hash-key only: userID (timestamp of the creation of this user).
WidgetAttributes table
hash-key plus range-key.
hash-key: userID (use the userID from the User table of the user to whom the widget belongs).
range-key: attribID (use the timestamp of the creation of this widget-attribute).
Now you can run "query" operations on the WidgetAttributes table to get all widget-attributes for a certain user; by using "greater-than-zero" as the query-parameter for the range-key.