How to force a BatchWriteItem failure - amazon-web-services

I'm currently writing integration tests for my BatchWriteItem logic using Spock/Groovy. For this purpose I'm running a Docker container which spins up a real DynamoDB table.
This is my Java logic for BatchWriteItem:
public Promise<Boolean> createItemsInBatch(ClientKey clientKey, String accountId, List<SrItems> srItems) {
List<Item> items = srItems.stream()
.map(srItem -> createItemFromSrItem(clientKey, createItemRef(srItem.getId(), accountId), srItem))
.collect(Collectors.toList());
List<List<Item>> batchItems = Lists.partition(items, 25);
var promises = batchItems.stream().map(itemsList -> Blocking.get(() -> {
TableWriteItems tableWriteItems = new TableWriteItems(table.getTableName());
tableWriteItems.withItemsToPut(itemsList);
BatchWriteItemOutcome outcome = dynamoDB.batchWriteItem(tableWriteItems);
return outcome.getUnprocessedItems().values().stream().flatMap(Collection::stream).collect(Collectors.toList());
})).collect(Collectors.toList());
return ParallelPromises.yieldAll(promises).map((List<? extends ExecResult<List<WriteRequest>>> results) -> {
if(results.isEmpty()) {
return true;
} else {
results.stream().map(Result::getValue).flatMap(Collection::stream).forEach(failure -> {
var failedItem = failure.getPutRequest().getItem();
logger.error(append("item", failedItem), "Failed to batch write item");
});
return false;
}
});
}
And this is my current implementation for the test (happy path)
@Unroll
def "createItemsInBatch - #description"(description, srItemsList, createResult) {
given:
def dynamoItemService = new DynamoItemService(realTable, amazonDynamoDBClient1) //passing the table running in the docker image + the dynamo client associated
when:
def promised = ExecHarness.yieldSingle {
dynamoItemService.createItemsInBatch(CLIENT_KEY, 'account-id', srItemsList as List<SrItem>)
}
then:
promised.success == createResult
where:
description | srItemsList | createResult
"single batch req not reaching batch size limit" | srItems(10) | true
"double batch req reaching batch size limit" | srItems(25) | true
"triple batch req reaching batch size limit" | srItems(51) | true
}
For context:
srItems() is a function that just creates a bunch of different items to be injected into the service for the BatchWriteItem request.
What I want now is to be able to test the unhappy path of my logic, i.e. get some UnprocessedItems in my outcome, to verify that the code below is actually doing its job:
BatchWriteItemOutcome outcome = dynamoDB.batchWriteItem(tableWriteItems);
return outcome.getUnprocessedItems().values().stream().flatMap(Collection::stream).collect(Collectors.toList());
Any help would be greatly appreciated

This is actually quite easy to do: we can force throttling on your DynamoDB table, which will result in UnprocessedItems.
Configure your table with 1 WCU and disable auto-scaling. Now run your BatchWriteItem in batches of 25 for a couple of seconds and DynamoDB will begin to throttle requests, returning the throttled items in the UnprocessedItems response and exercising your unhappy path.
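For illustration, a minimal sketch with the v1 Java SDK, reusing the client and table from the test above (forceLowThroughput is a hypothetical helper, and whether a local DynamoDB container actually enforces provisioned throughput depends on the image; the real service does):
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

// Drop the test table to 1 RCU / 1 WCU so back-to-back 25-item batches get throttled.
static void forceLowThroughput(AmazonDynamoDB client, String tableName) {
    client.updateTable(new UpdateTableRequest()
            .withTableName(tableName)
            .withProvisionedThroughput(new ProvisionedThroughput(1L, 1L)));
}
In the unhappy-path feature method you would call something like forceLowThroughput(amazonDynamoDBClient1, realTable.getTableName()), submit a few 25-item batches in quick succession, and assert that the promise resolves to false.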

Related

Write more than 25 items using BatchWriteItemEnhancedRequest Dynamodb JAVA SDK 2

I have a List of items to be inserted into a DynamoDB table. The size of the list may vary from 100 to 10k. I'm looking for an optimised way to batch write all the items using BatchWriteItemEnhancedRequest (Java SDK 2). What is the best way to add the items to the WriteBatch builder and then write the request using BatchWriteItemEnhancedRequest?
My Current Code:
WriteBatch.Builder<T> builder = WriteBatch.builder(itemType).mappedTableResource(getTable()); // itemType: the mapped bean class
items.forEach(item -> { builder.addPutItem(item); });
BatchWriteItemEnhancedRequest bwr = BatchWriteItemEnhancedRequest.builder().writeBatches(builder.build()).build();
BatchWriteResult batchWriteResult =
DynamoDB.enhancedClient().batchWriteItem(getBatchWriteItemEnhancedRequest(builder));
do {
// Check for unprocessed keys which could happen if you exceed
// provisioned throughput
List<T> unprocessedItems = batchWriteResult.unprocessedPutItemsForTable(getTable());
if (unprocessedItems.size() != 0) {
unprocessedItems.forEach(unprocessedItem -> {
builder.addPutItem(unprocessedItem);
});
batchWriteResult = DynamoDB.enhancedClient().batchWriteItem(getBatchWriteItemEnhancedRequest(builder));
}
} while (batchWriteResult.unprocessedPutItemsForTable(getTable()).size() > 0);
I'm looking for batching logic and a better way to execute the BatchWriteItemEnhancedRequest.
I came up with a utility class to deal with that. The batches-of-batches approach in v2 is overly complex for most use cases, especially since we're still limited to 25 items overall.
public class DynamoDbUtil {
private static final int MAX_DYNAMODB_BATCH_SIZE = 25; // AWS blows chunks if you try to include more than 25 items in a batch or sub-batch
/**
* Writes the list of items to the specified DynamoDB table.
*/
public static <T> void batchWrite(Class<T> itemType, List<T> items, DynamoDbEnhancedClient client, DynamoDbTable<T> table) {
List<List<T>> chunksOfItems = Lists.partition(items, MAX_DYNAMODB_BATCH_SIZE);
chunksOfItems.forEach(chunkOfItems -> {
List<T> unprocessedItems = batchWriteImpl(itemType, chunkOfItems, client, table);
while (!unprocessedItems.isEmpty()) {
// some failed (provisioning problems, etc.), so write those again
unprocessedItems = batchWriteImpl(itemType, unprocessedItems, client, table);
}
});
}
/**
* Writes a single batch of (at most) 25 items to DynamoDB.
* Note that the overall limit of items in a batch is 25, so you can't have nested batches
* of 25 each that would exceed that overall limit.
*
* @return those items that couldn't be written due to provisioning issues, etc., but were otherwise valid
*/
private static <T> List<T> batchWriteImpl(Class<T> itemType, List<T> chunkOfItems, DynamoDbEnhancedClient client, DynamoDbTable<T> table) {
WriteBatch.Builder<T> subBatchBuilder = WriteBatch.builder(itemType).mappedTableResource(table);
chunkOfItems.forEach(subBatchBuilder::addPutItem);
BatchWriteItemEnhancedRequest.Builder overallBatchBuilder = BatchWriteItemEnhancedRequest.builder();
overallBatchBuilder.addWriteBatch(subBatchBuilder.build());
return client.batchWriteItem(overallBatchBuilder.build()).unprocessedPutItemsForTable(table);
}
}
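Calling the utility is then a one-liner. A usage sketch, assuming an enhanced client and a bean-mapped table for a hypothetical Customer class (the table name and loadCustomers() are placeholders):
import java.util.List;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbEnhancedClient;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.TableSchema;

DynamoDbEnhancedClient client = DynamoDbEnhancedClient.create();
DynamoDbTable<Customer> customers =
        client.table("customer", TableSchema.fromBean(Customer.class)); // hypothetical table/bean
List<Customer> items = loadCustomers();                                 // hypothetical source of items
DynamoDbUtil.batchWrite(Customer.class, items, client, customers);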

Is it possible to build an OLTP/CRUD HTTP server using AkkaHttp, AkkaStreams, Alpakka and a database?

It is clear to me that it is possible using actors: for instance https://github.com/chbatey/akka-http-typed.git uses Akka HTTP and typed actors.
But it is unclear to me whether, using just Akka Streams and its Alpakka connectors library (which includes databases), it is possible to build regular CRUD / OLTP services, or only data replication from one database to another and other OLAP / batch / stream processing scenarios.
If you know how it can be done please indicate a few details and if you can provide an example on github for instance that would be great.
The way I imagine it may be possible is that the server takes part in two conversations / stateful stream transformations: one with the outside world over HTTP, and one with the database. I am not sure whether it can be modelled like that.
https://doc.akka.io/docs/alpakka/current/slick.html seems to offer both UPDATE/INSERTS as a Sink as well as pointed SELECT to a certain id as a Source. Do you know if an example app is there or can you broadly mention how the wiring would happen with Akka Http?
I put a demo here; I hope it helps.
Create the table (the database is MySQL):
CREATE TABLE test(id VARCHAR(32))
sbt:
"com.lightbend.akka" %% "akka-stream-alpakka-slick" % "1.1.0",
"mysql" % "mysql-connector-java" % "5.1.40"
Code:
package tech.parasol.scala.crud
import java.sql.SQLException
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives.{complete, get, path, _}
import akka.stream.alpakka.slick.scaladsl.{Slick, SlickSession}
import akka.stream.scaladsl.Sink
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}
import com.typesafe.config.ConfigFactory
import scala.concurrent.Future
import scala.io.StdIn
import scala.util.{Failure, Success}
object CrudTest1 {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("CrudTest1")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
val hostName = "127.0.0.1"
val rocketDbConfig =
s"""
|db-config {
| profile = "slick.jdbc.MySQLProfile$$"
| db {
| dataSourceClass = "slick.jdbc.DriverDataSource"
| properties = {
| driver = "com.mysql.jdbc.Driver"
| url = "jdbc:mysql://${hostName}:3306/rocket?useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true&useSSL=false"
| user = "root"
| password = "passw0rd"
| }
| }
|}
|
""".stripMargin
implicit val session = SlickSession.forConfig("db-config", ConfigFactory.parseString(rocketDbConfig))
import session.profile.api._
def persistence(message: String) = {
def insert(message: String): DBIO[Int] = {
sqlu"""INSERT INTO test(id) VALUES (${message})"""
}
session.db.run(insert(message)).map {
case _ => message
}.recover {
case e : SQLException => {
throw new Exception("Database error ===>")}
case e : Exception => {
throw new Exception("Database error.")}
}
}
val route = path("hello" / Segment ) { name =>
get {
val res = persistence(name)
onComplete(res) {
case Success(value) => {
complete(s"<h1>Say hello to ${name}</h1>")
}
case Failure(e) => {
complete(s"<h1>Failed to say hello to ${name}</h1>")
}
}
}
}
val bindingFuture = Http().bindAndHandle(route, "localhost", 8088)
println(s"Server online at http://localhost:8088/\nPress RETURN to stop...")
StdIn.readLine() // let it run until user presses return
bindingFuture
.flatMap(_.unbind()) // trigger unbinding from the port
.onComplete(_ => system.terminate()) // and shutdown when done
}
}
Yes. Basically, for every request received in Akka HTTP, we create an Akka Streams graph (typically just a pipeline): essentially the Alpakka Slick Source for the database query, possibly prefixed by some operators, and return it from Akka HTTP, which of course supports Source. More details at https://www.quora.com/Is-it-possible-to-build-an-OLTP-CRUD-HTTP-server-using-Akka-HTTP-Akka-Streams-Alpakka-and-a-database-Do-you-know-any-examples-of-code-on-GitHub-or-elsewhere/answer/Nicolae-Marasoiu

Calling a Web Service (containing multiple pages) does not load all the pages (without an added sleep delay)

My question is about a strange behaviour I notice both on my iPhone device and in the Codename One simulator (NetBeans).
I invoke the code below, which calls a Google web service to provide a list of food places around a GPS coordinate.
The web service that is called is as follows (KEY OBSCURED):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXXXXXXXXXXXXXXXX
Each result contains the next page token and thus, the second call (for the subsequent page) is as follows:
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXXXXXXXXXXXXXXXX&pagetoken=YYYYYYYYYYYYYYYYYY
public static byte[] getWSResponseData(String urlString, boolean usePost)
{
ConnectionRequest r = new ConnectionRequest();
r.setUrl(urlString);
r.setPost(usePost);
InfiniteProgress prog = new InfiniteProgress();
Dialog dlg = prog.showInifiniteBlocking();
r.setDisposeOnCompletion(dlg);
NetworkManager.getInstance().addToQueueAndWait(r);
try
{
Thread.sleep(2000);
}
catch (InterruptedException ex)
{
}
byte[] responseData = r.getResponseData();
return responseData;
}
public static void getLocationsList(double lat, double lng)
{
boolean done = false;
while (!done)
{
byte[] responseData = getWSResponseData(finalURL,false);
result = Result.fromContent(parser.parseJSON(new InputStreamReader(new ByteArrayInputStream(responseData))));
String venueNames[] = result.getAsStringArray("/results/name");
nextToken = result.getAsString("/next_page_token");
if ( nextToken == null || nextToken.equals(""))
done = true;
else
finalURL = completeURL + "&pagetoken=" + nextToken;
}
.....
}
This code works fine with the sleep timer, but when I remove the Thread.sleep, only the first page gets called.
Any help would be appreciated.
Using the debugger does not help as this is a timing issue and the issue does not occur when using the debugger.
Also when I put some print statements into the code
while (!done)
{
String nextToken = null;
**System.out.println(finalURL);**
...
}
System.out.println("Total Number of entries returned: " + itemCount);
I get the following output:
First Run (WITHOUT SLEEP):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXX
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXX&pagetoken=CqQCF...
Total Number of entries returned: 20
Using the network monitor I see that the response to the second WS call returns:
{
"html_attributions" : [],
"results" : [],
"status" : "INVALID_REQUEST"
}
Which is strange as when I cut and paste the WS URL into my browser, it works fine...
Second Run (WITH SLEEP):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX&pagetoken=CqQCFQEAA...
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX&pagetoken=CsQDtQEAA...
Total Number of entries returned: 60
Well, it seems to be a Google API issue, as indicated here:
Paging on Google Places API returns status INVALID_REQUEST
I still could not get it to work by adding a random parameter to the WS URL as they suggested, but I will keep trying and post something here if I get it to work. For now I will just keep a 2-second delay between the calls, which seems to work.
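If you would rather not sleep unconditionally, here is a sketch of a retry that only waits while the page token is not yet valid, reusing getWSResponseData and the JSON parsing from the question (getPageWithRetry and the attempt count are hypothetical names; the status check is based on the INVALID_REQUEST response shown above):
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;
import com.codename1.io.JSONParser;

// Retry the page request while Google still reports INVALID_REQUEST (token not active yet).
static byte[] getPageWithRetry(String url, int maxAttempts) throws IOException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        byte[] data = getWSResponseData(url, false);
        Map<String, Object> json = new JSONParser()
                .parseJSON(new InputStreamReader(new ByteArrayInputStream(data)));
        if (!"INVALID_REQUEST".equals(json.get("status"))) {
            return data;
        }
        try {
            Thread.sleep(500); // the token usually becomes valid within a second or two
        } catch (InterruptedException ignored) {
        }
    }
    return null; // caller decides how to handle a persistently invalid token
}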
Well, I gave up on using the Google WS for this and switched to Yelp; it works very well:
https://api.yelp.com/v3/businesses/search?.....

Why don't I see high performance with Reactive Kafka? (0.11 release)

Why can't I see the high TPS (transactions/second) performance with Reactive Kafka that has been reported by the project's authors?
This code, derived from the benchmark code in the Reactive Kafka project, runs against 2M records populated in a single-partition topic. When run, I get a TPS of about 140K. Not awful, but far short of the hundreds of thousands hoped for.
My largest concern here is that this is only a 1-partition topic, which isn't really a realistic test case.
case class RunTest4(msgCount: Int, producer: com.foo.Producer, kafkaHost: String, groupId: String, topic: String)(implicit system: ActorSystem) {
// Pre-populate a topic w/some records (2 million)
producer.populate(msgCount, topic)
Thread.sleep(2000)
partitionInfo(topic)
val partitionTarget = msgCount - 1
val settings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer)
.withBootstrapServers(kafkaHost)
.withGroupId(groupId)
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
def consumerAtLeastOnceBatched(batchSize: Int)(implicit mat: Materializer): Unit = {
val promise = Promise[Unit]
val control = Consumer.committableSource(settings, Subscriptions.topics(topic))
.map {
msg => msg.committableOffset
}
.batch(batchSize.toLong, first => CommittableOffsetBatch.empty.updated(first)) { (batch, elem) =>
batch.updated(elem)
}
.mapAsync(3) { m =>
m.commitScaladsl().map(_ => m)(ExecutionContexts.sameThreadExecutionContext)
}
.toMat(Sink.foreach { batch =>
if (batch.offsets().head._2 >= partitionTarget)
promise.complete(Success(()))
})(Keep.left)
.run()
println("Control is: " + control.getClass.getName)
val now = System.currentTimeMillis()
Await.result(promise.future, 30.seconds)
val later = System.currentTimeMillis()
println("TPS: " + (msgCount / ((later - now) / 1000.0)))
control.shutdown()
groupInfo(groupId)
}
private def partitionInfo(topic: String) =
kafka.tools.GetOffsetShell.main(Array("--topic", topic, "--broker-list", kafkaHost, "--time", "-1"))
private def groupInfo(group: String) =
kafka.admin.ConsumerGroupCommand.main(Array("--describe", "--group", group, "--bootstrap-server", kafkaHost, "--new-consumer"))
}
This test is (I hope) a good way to handle multiple partitions per topic--a much more realistic situation. When I run this with a batch size of 10,000 and a topic w/2M records populated across 4 topic partitions my test times out with a wait of 30 seconds, meaning whenever it finished it would have had TPS of <67K (2M/30)... not great really. (This test will succeed with a smaller record population, but that's not the test!)
(For reference, my LateKafka project (produces a Source), which admittedly is skeletal, hits above 300K TPS for the same test, and using a native KafkaConsumer is around 500K on my laptop.)
case class RunTest3(msgCount: Int, producer: com.foo.Producer, kafkaHost: String, groupId: String, topic: String)(implicit system: ActorSystem) {
// Pre-populate a topic w/some records (2 million)
producer.populate(msgCount, topic)
Thread.sleep(2000)
partitionInfo(topic)
val partitionTarget = msgCount - 1
val settings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer)
.withBootstrapServers(kafkaHost)
.withGroupId(groupId)
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
def consumerAtLeastOnceBatched(batchSize: Int)(implicit mat: Materializer): Unit = {
val promise = Promise[Unit]
val control = Consumer.committablePartitionedSource(settings, Subscriptions.topics(topic))
.flatMapMerge(4, _._2)
.map {
msg => msg.committableOffset
}
.batch(batchSize.toLong, first => CommittableOffsetBatch.empty.updated(first)) { (batch, elem) =>
batch.updated(elem)
}
.mapAsync(3) { m =>
m.commitScaladsl().map(_ => m)(ExecutionContexts.sameThreadExecutionContext)
}
.toMat(Sink.foreach { batch =>
if (batch.offsets().head._2 >= partitionTarget)
promise.complete(Success(()))
})(Keep.left)
.run()
println("Control is: " + control.getClass.getName)
val now = System.currentTimeMillis()
Await.result(promise.future, 30.seconds)
val later = System.currentTimeMillis()
println("TPS: " + (msgCount / ((later - now) / 1000.0)))
control.shutdown()
groupInfo(groupId)
}
private def partitionInfo(topic: String) =
kafka.tools.GetOffsetShell.main(Array("--topic", topic, "--broker-list", kafkaHost, "--time", "-1"))
private def groupInfo(group: String) =
kafka.admin.ConsumerGroupCommand.main(Array("--describe", "--group", group, "--bootstrap-server", kafkaHost, "--new-consumer"))
}
Are these expected results or is there something wrong with my test code?

Deletion from amazon dynamodb

Is there any efficient way to delete all the items from an Amazon DynamoDB table at once? I have gone through the AWS docs, but they only show deleting a single item.
Do the following steps:
Make delete table request
In the response you will get the TableDescription
Using TableDescription create the table again.
For step 1 and 2 click here
for step 3 click here
That's what I do in my application.
DynamoDBMapper will do the job in a few lines:
AWSCredentials credentials = new PropertiesCredentials(credentialFile);
client = new AmazonDynamoDBClient(credentials);
DynamoDBMapper mapper = new DynamoDBMapper(this.client);
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
PaginatedScanList<LogData> result = mapper.scan(LogData.class, scanExpression);
for (LogData data : result) {
mapper.delete(data);
}
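If the table holds more than a handful of items, that one-DeleteItem-per-row loop gets slow. The same mapper can also delete in batches of 25 under the hood via batchDelete; a sketch reusing mapper and result from the snippet above (anything that could not be deleted, e.g. due to throttling, comes back as failed batches):
import java.util.List;

// Same scan as above, but issue BatchWriteItem deletes (25 per request) instead of one DeleteItem per item.
List<DynamoDBMapper.FailedBatch> failedBatches = mapper.batchDelete(result);
if (!failedBatches.isEmpty()) {
    // retry or log the unprocessed deletes here
}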
As ihtsham says, the most efficient way is to delete and re-create the table. However, if that is not practical (e.g. due to complex configuration of the table, such as Lambda triggers), here are some AWS CLI commands to delete all records. They require the jq program for JSON processing.
Deleting records one-by-one (slow!), assuming your table is called my_table, your partition key is called partition_key, and your sort key (if any) is called sort_key:
aws dynamodb scan --table-name my_table | \
jq -c '.Items[] | { partition_key, sort_key }' | \
tr '\n' '\0' | \
xargs -0 -n1 -t aws dynamodb delete-item --table-name my_table --key
Deleting records in batches of up to 25 records:
aws dynamodb scan --table-name my_table | \
jq -c '[.Items | keys[] as $i | { index: $i, value: .[$i]}] | group_by(.index / 25 | floor)[] | { "my_table": [.[].value | { "DeleteRequest": { "Key": { partition_key, sort_key }}}] }' | \
tr '\n' '\0' | \
xargs -0 -n1 -t aws dynamodb batch-write-item --request-items
If you start seeing non-empty UnprocessedItems responses, your write capacity has been exceeded. You can account for this by reducing the batch size. For me, each batch takes about a second to submit, so with a write capacity of 5 per second, I set the batch size to 5.
Just for the record, a quick solution with item-by-item delete in Python 3 (using Boto3 and scan()):
(Credentials need to be set.)
def delete_all_items(table_name):
# Deletes all items from a DynamoDB table.
# You need to confirm your intention by pressing Enter.
import boto3
client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(table_name)
response = client.describe_table(TableName=table_name)
keys = [k['AttributeName'] for k in response['Table']['KeySchema']]
response = table.scan()
items = response['Items']
number_of_items = len(items)
if number_of_items == 0: # no items to delete
print("Table '{}' is empty.".format(table_name))
return
print("You are about to delete all ({}) items from table '{}'."
.format(number_of_items, table_name))
input("Press Enter to continue...")
with table.batch_writer() as batch:
for item in items:
key_dict = {k: item[k] for k in keys}
print("Deleting " + str(item) + "...")
batch.delete_item(Key=key_dict)
delete_all_items("test_table")
Obviously, this shouldn't be used for tables with a lot of items (100+). For those, the delete / recreate approach is cheaper and more efficient.
You will want to use BatchWriteItem if you can't drop the table. If all your entries are within a single HashKey, you can use the Query API to retrieve the records, and then delete them 25 items at a time. If not, you'll probably have to Scan.
Alternatively, you could provide a simple wrapper around AmazonDynamoDBClient (from the official SDK) that collects a Set of Hash/Range keys that exist in your table. Then you wouldn't need to Query or Scan for the items you inserted after the test, since you would already have the Set built. That would look something like this:
public class KeyCollectingAmazonDynamoDB implements AmazonDynamoDB
{
private final AmazonDynamoDB delegate;
// HashRangePair is something you have to define
private final Set<Key> contents;
public KeyCollectingAmazonDynamoDB( AmazonDynamoDB delegate )
{
this.delegate = delegate;
this.contents = new HashSet<>();
}
@Override
public PutItemResult putItem( PutItemRequest putItemRequest )
throws AmazonServiceException, AmazonClientException
{
contents.add( extractKey( putItemRequest.getItem() ) );
return delegate.putItem( putItemRequest );
}
private Key extractKey( Map<String, AttributeValue> item )
{
// TODO Define your hash/range key extraction here
// Create a Key object
return new Key( hashKey, rangeKey );
}
@Override
public DeleteItemResult deleteItem( DeleteItemRequest deleteItemRequest )
throws AmazonServiceException, AmazonClientException
{
contents.remove( deleteItemRequest.getKey() );
return delegate.deleteItem( deleteItemRequest );
}
@Override
public BatchWriteItemResult batchWriteItem( BatchWriteItemRequest batchWriteItemRequest )
throws AmazonServiceException, AmazonClientException
{
// Similar extraction, but in bulk.
for ( Map.Entry<String, List<WriteRequest>> entry : batchWriteItemRequest.getRequestItems().entrySet() )
{
String tableName = entry.getKey();
List<WriteRequest> writeRequests = entry.getValue();
for ( WriteRequest writeRequest : writeRequests )
{
PutRequest putRequest = writeRequest.getPutRequest();
if ( putRequest != null )
{
// Add to Set just like putItem
}
DeleteRequest deleteRequest = writeRequest.getDeleteRequest();
if ( deleteRequest != null )
{
// Remove from Set just like deleteItem
}
}
}
// Write through to DynamoDB
return delegate.batchWriteItem( batchWriteItemRequest );
}
// remaining methods elided, since they're direct delegation
}
Key is a class within the DynamoDB SDK that accepts zero, one, or two AttributeValue objects in the constructor to represent a hash key or a hash/range key. Assuming its equals and hashCode methods work, you can use it within the Set I described. If they don't, you'll have to write your own Key class.
This should get you a maintained Set for use within your tests. It's not specific to a table, so you might need to add another layer of collection if you're using multiple tables. That would change Set<Key> to something like Map<TableName, Set<Key>>. You would need to look at the getTableName() property to pick the correct Set to update.
Once your test finishes, grabbing the contents of the table and deleting should be straightforward.
One final suggestion: use a different table for testing than you do for your application. Create an identical schema, but give the table a different name. You probably even want a different IAM user to prevent your test code from accessing your production table. If you have questions about that, feel free to open a separate question for that scenario.
You can recreate a DynamoDB table using the AWS Java SDK:
// Init DynamoDB client
AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.standard().build();
// Get table definition
TableDescription tableDescription = dynamoDB.describeTable("my-table").getTable();
// Delete table
dynamoDB.deleteTable("my-table");
// Create table
CreateTableRequest createTableRequest = new CreateTableRequest()
.withTableName(tableDescription.getTableName())
.withAttributeDefinitions(tableDescription.getAttributeDefinitions())
.withProvisionedThroughput(new ProvisionedThroughput()
.withReadCapacityUnits(tableDescription.getProvisionedThroughput().getReadCapacityUnits())
.withWriteCapacityUnits(tableDescription.getProvisionedThroughput().getWriteCapacityUnits())
)
.withKeySchema(tableDescription.getKeySchema());
dynamoDB.createTable(createTableRequest);
I use the following JavaScript code to do it:
async function truncate(table, keys) {
const limit = (await db.describeTable({
TableName: table
}).promise()).Table.ProvisionedThroughput.ReadCapacityUnits;
let total = 0;
let lastEvaluatedKey = null;
do {
const qp = {
TableName: table,
Limit: limit,
ExclusiveStartKey: lastEvaluatedKey,
ProjectionExpression: keys.join(' '),
};
const qr = await ddb.scan(qp).promise();
lastEvaluatedKey = qr.LastEvaluatedKey;
const dp = {
RequestItems: {
},
};
dp.RequestItems[table] = [];
if (qr.Items) {
for (const i of qr.Items) {
const dr = {
DeleteRequest: {
Key: {
}
}
};
keys.forEach(k => {
dr.DeleteRequest.Key[k] = i[k];
});
dp.RequestItems[table].push(dr);
if (dp.RequestItems[table].length % 25 == 0) {
await ddb.batchWrite(dp).promise();
total += dp.RequestItems[table].length;
dp.RequestItems[table] = [];
}
}
if (dp.RequestItems[table].length > 0) {
await ddb.batchWrite(dp).promise();
total += dp.RequestItems[table].length;
dp.RequestItems[table] = [];
}
}
console.log(`Deleted ${total}`);
setTimeout(() => {}, 1000);
} while (lastEvaluatedKey);
}
(async () => {
truncate('table_name', ['id']);
})();
In this case, you may delete the table and create a new one.
Example:
from __future__ import print_function # Python 2/3 compatibility
import boto3
dynamodb = boto3.resource('dynamodb', region_name='us-west-2', endpoint_url="http://localhost:8000")
table = dynamodb.Table('Movies')
table.delete()