Consistency level LOCAL_ONE is not supported for this operation. Supported consistency levels are: LOCAL_QUORUM

I am working with AWS Keyspaces and trying to insert data from C#, but I am getting this error: "Consistency level LOCAL_ONE is not supported for this operation. Supported consistency levels are: LOCAL_QUORUM". Can anyone please help out here?
AWS keyspace
CREATE KEYSPACE IF NOT EXISTS "DevOps"
WITH REPLICATION={'class': 'SingleRegionStrategy'} ;
Table
CREATE TABLE IF NOT EXISTS "DevOps"."projectdetails" (
"id" UUID PRIMARY KEY,
"name" text,
"lastupdatedtime" timestamp,
"baname" text,
"customerid" UUID)
C# code
public async Task AddRecord(List<projectdetails> projectDetails)
{
    try
    {
        if (projectDetails.Count > 0)
        {
            foreach (var item in projectDetails)
            {
                projectdetails projectData = new projectdetails();
                projectData.id = item.id;
                projectData.name = item.name;
                projectData.baname = "Vishal";
                projectData.lastupdatedtime = item.lastupdatedtime;
                projectData.customerid = 1;
                await mapper.InsertAsync<projectdetails>(projectData);
            }
        }
    }
    catch (Exception e)
    {
    }
}

The error clearly says that you need to use the consistency level LOCAL_QUORUM instead of LOCAL_ONE, which is used by default. The AWS documentation says that for write operations it's the only consistency level supported. You can set the consistency level by using the overload of InsertAsync that accepts CqlQueryOptions, like this (ideally, create the query options instance only once, during initialization of the application):
mapper.InsertAsync<projectdetails>(projectData,
    new CqlQueryOptions().SetConsistencyLevel(ConsistencyLevel.LocalQuorum))

Related

Strange behavior trying to perform data migration in Dynamo DB

We're trying to make a simple data migration in one of our tables in DDB.
Basically we're adding a new field and we need to backfill all the Documents in one of our tables.
This table has around 700K documents.
The process we follow is quite simple:
Manually trigger a lambda that scans the table and updates each document, continuing until it gets close to the 15-minute execution limit; at that point it
puts the LastEvaluatedKey into SQS to trigger a new lambda execution that uses that key to continue scanning.
The process goes on spawning lambdas sequentially as needed until there are no more documents.
The problem we found is as follows...
Once the migration is done, we noticed that the number of documents updated is way lower than the total number of documents in the table. The shortfall is not always the same; it ranges from tens of thousands to hundreds of thousands (the worst case we've seen was a 300K difference).
This is obviously a problem, because if we scan the documents again it's clear some documents were not migrated. At first we thought this was because of clients updating/inserting new documents, but the throughput on that table is not large enough to justify such a big difference, so it's not that new documents are being added while we run the migration.
We tried a second approach that scanned first: if we only scan, the number of scanned documents matches the count of documents in the table, so we dumped the IDs of the documents into another table and then scanned that table to update those items. Funny thing, the same problem happens with this new table that contains just IDs; it has far fewer entries than the count of the table we want to update, so we're back to square one.
We thought about using parallel scans, but I don't see how that would help, and I don't want to compromise the table's read capacity while running the migration.
Can anybody with experience in data migrations in DDB shed some light here? We're not able to figure out what we're doing wrong.
UPDATE: Sharing the function that is triggered and actually scans and updates
@Override
public Map<String, AttributeValue> migrateDocuments(String lastEvaluatedKey, String typeKey){
    LOG.info("Migrate Documents started {} ", lastEvaluatedKey);
    int noOfDocumentsMigrated = 0;
    Map<String, AttributeValue> docLastEvaluatedKey = null;
    DynamoDBMapperConfig documentConfig = new DynamoDBMapperConfig.TableNameOverride("KnowledgeDocumentMigration").config();
    if(lastEvaluatedKey != null) {
        docLastEvaluatedKey = new HashMap<String,AttributeValue>();
        docLastEvaluatedKey.put("base_id", new AttributeValue().withS(lastEvaluatedKey));
        docLastEvaluatedKey.put("type_key",new AttributeValue().withS(typeKey));
    }
    Instant endTime = Instant.now().plusSeconds(840);
    LOG.info("Migrate Documents endTime:{}", endTime);
    try {
        do {
            ScanResultPage<Document> docScanList = documentDao.scanDocuments(docLastEvaluatedKey, documentConfig);
            docLastEvaluatedKey = docScanList.getLastEvaluatedKey();
            LOG.info("Migrate Docs- docScanList Size: {}", docScanList.getScannedCount());
            docLastEvaluatedKey = docScanList.getLastEvaluatedKey();
            LOG.info("lastEvaluatedKey:{}", docLastEvaluatedKey);
            final int chunkSize = 25;
            final AtomicInteger counter = new AtomicInteger();
            final Collection<List<Document>> docChunkList = docScanList.getResults().stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
            List<List<Document>> docListSplit = docChunkList.stream().collect(Collectors.toList());
            docListSplit.forEach(docList -> {
                TransactionWriteRequest documentTx = new TransactionWriteRequest();
                for (Document document : docList) {
                    LOG.info("Migrate Documents- docList Size: {}", docList.size());
                    LOG.info("Migrate Documents- Doc Id: {}", document.getId());
                    if (!StringUtils.isNullOrEmpty(document.getType()) && document.getType().equalsIgnoreCase("Faq")) {
                        if (docIdsList.contains(document.getId())) {
                            LOG.info("this doc already migrated:{}", document);
                        } else {
                            docIdsList.add(document.getId());
                        }
                        if ((!StringUtils.isNullOrEmpty(document.getFaq().getQuestion()))) {
                            LOG.info("doc FAQ {}", document.getFaq().getQuestion());
                            document.setTitle(document.getFaq().getQuestion());
                            document.setTitleSearch(document.getFaq().getQuestion().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    } else if (StringUtils.isNullOrEmpty(document.getType())) {
                        if (!StringUtils.isNullOrEmpty(document.getTitle())) {
                            if (!StringUtils.isNullOrEmpty(document.getQuestion())) {
                                document.setTitle(document.getQuestion());
                                document.setQuestion(null);
                            }
                            LOG.info("title {}", document.getTitle());
                            document.setTitleSearch(document.getTitle().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    }
                }
                if (documentTx.getTransactionWriteOperations() != null
                        && !documentTx.getTransactionWriteOperations().isEmpty() && docList.size() > 0) {
                    LOG.info("DocumentTx size {}", documentTx.getTransactionWriteOperations().size());
                    documentDao.executeTransaction(documentTx, null);
                }
            });
            noOfDocumentsMigrated = noOfDocumentsMigrated + docScanList.getScannedCount();
        } while(docLastEvaluatedKey != null && (endTime.compareTo(Instant.now()) > 0));
        LOG.info("Migrate Documents execution finished at:{}", Instant.now());
        if(docLastEvaluatedKey != null && docLastEvaluatedKey.get("base_id") != null)
            sqsAdapter.get().sendMessage(docLastEvaluatedKey.get("base_id").toString(), docLastEvaluatedKey.get("type_key").toString(),
                    MIGRATE, MIGRATE_DOCUMENT_QUEUE_NAME);
        LOG.info("No Of Documents Migrated:{}", noOfDocumentsMigrated);
    } catch(Exception e) {
        LOG.error("Exception", e);
    }
    return docLastEvaluatedKey;
}
Note: I would've added this speculation as a comment, but my reputation does not allow it.
I think the issue you're seeing here could be caused by the Scans not being ordered. As long as your Scan is executed within a single lambda, I'd expect everything to be handled fine. However, as soon as you hit the runtime limit of the lambda and start a new one, your Scan essentially gets a new "ScanID" which might come back in a different order. Because of that different order, you end up skipping a certain set of entries.
I haven't tried to replicate this behavior, and sadly there is no clear indication in the AWS documentation whether a Scan request can be resumed from a new session/application.
I think @Charles' suggestion might help you in this case, as you can simply run the entire migration in one process.
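For reference, here is a minimal sketch (low-level v1 SDK, with a hypothetical table name, not the code from the question) of how a scan is normally resumed across invocations by carrying the full LastEvaluatedKey forward as the next ExclusiveStartKey; calling Scan without it starts a fresh scan whose order is undefined:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.util.Map;

// Resumes a scan from the exact LastEvaluatedKey of the previous page.
// Passing null starts a brand-new scan whose order is undefined and
// unrelated to any previous scan's order.
public Map<String, AttributeValue> scanOnePage(AmazonDynamoDB ddb,
        Map<String, AttributeValue> exclusiveStartKey) {
    ScanRequest request = new ScanRequest()
            .withTableName("KnowledgeDocument")        // hypothetical table name
            .withExclusiveStartKey(exclusiveStartKey); // must contain ALL key attributes
    ScanResult result = ddb.scan(request);
    // ... process result.getItems() here ...
    return result.getLastEvaluatedKey(); // null once the scan is complete
}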

Update specific attribute using UpdateItemEnhancedRequest DynamoDb java sdk2

We have a DynamoDB table which has an attribute counter that will be decremented asynchronously by multiple lambdas based on an event. I am trying to update the counter using UpdateItemEnhancedRequest (using the DynamoDB Enhanced Client, Java SDK 2). I am able to build the condition for updating the counter, but it updates the entire item and not just the counter. Can somebody please guide me on how to update a single attribute using the DynamoDB Enhanced Client?
Code Sample
public void update(String counter, T item) {
    AttributeValue value = AttributeValue.builder().n(counter).build();
    Map<String, AttributeValue> expressionValues = new HashMap<>();
    expressionValues.put(":value", value);
    Expression myExpression = Expression.builder()
            .expression("nqctr = :value")
            .expressionValues(expressionValues)
            .build();
    UpdateItemEnhancedRequest<T> updateItemEnhancedRequest =
            UpdateItemEnhancedRequest.builder(collectionClassName)
                    .item(item)
                    .conditionExpression(myExpression)
                    .build();
    getTable().updateItem(updateItemEnhancedRequest);
}
When you update a specific column, you need to specify which column to update. Assume we have a Work table whose items have an id key and an archive column, and we want to update the archive column. You need to specify that column in your code. Here we change the archive column of the item that corresponds to the key to Closed (a single-column update). Notice we specify the column name by using the HashMap object named updatedValues.
// Archives an item based on the key
public String archiveItem(String id){
    DynamoDbClient ddb = getClient();
    HashMap<String,AttributeValue> itemKey = new HashMap<String,AttributeValue>();
    itemKey.put("id", AttributeValue.builder()
            .s(id)
            .build());
    HashMap<String, AttributeValueUpdate> updatedValues =
            new HashMap<String,AttributeValueUpdate>();
    // Update the column specified by name with updatedVal
    updatedValues.put("archive", AttributeValueUpdate.builder()
            .value(AttributeValue.builder()
                    .s("Closed").build())
            .action(AttributeAction.PUT)
            .build());
    UpdateItemRequest request = UpdateItemRequest.builder()
            .tableName("Work")
            .key(itemKey)
            .attributeUpdates(updatedValues)
            .build();
    try {
        ddb.updateItem(request);
        return "The item was successfully archived";
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
    return "";
}
NOTE: This is not the Enhanced Client.
This code is from the AWS tutorial that shows how to build a Java web app by using Spring Boot. Full tutorial here:
Creating the DynamoDB web application item tracker
To update a single column using the Enhanced Client, call the enhanced client's table method. This returns a DynamoDbTable instance. Now you can call the updateItem method.
Here is the logic to update the archive column using the Enhanced Client. Notice you get a Work object, call its setArchive method, and then pass the Work object to workTable.updateItem(r->r.item(work));
Java code:
// Update the archive column by using the Enhanced Client.
public String archiveItemEC(String id) {
    DynamoDbClient ddb = getClient();
    try {
        DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
                .dynamoDbClient(getClient())
                .build();
        DynamoDbTable<Work> workTable = enhancedClient.table("Work", TableSchema.fromBean(Work.class));
        // Get the Key object.
        Key key = Key.builder()
                .partitionValue(id)
                .build();
        // Get the item by using the key.
        Work work = workTable.getItem(r->r.key(key));
        work.setArchive("Closed");
        workTable.updateItem(r->r.item(work));
        return "The item was successfully archived";
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
    return "";
}
This answer shows both ways to update a single column in a DynamoDB table. The above tutorial now shows this way.
In your original solution, you're misinterpreting the meaning of the conditionExpression attribute. This is used to validate conditions that must be true on the item that matches the key in order to perform the update, not the expression to perform the update itself.
There is a way to perform this operation with the enhanced client without needing to fetch the object before making an update. The UpdateItemEnhancedRequest class has an ignoreNulls attribute that will exclude all null attributes from the update. This is false by default, which is what causes a full overwrite of the object.
Let's assume this is the structure of your item (without all the enhanced client annotations and boilerplate, you can add those):
class T {
    public String partitionKey;
    public Integer counter;
    public String someOtherAttribute;

    public T(String partitionKey) {
        this.partitionKey = partitionKey;
        this.counter = null;
        this.someOtherAttribute = null;
    }
}
You can issue an update of just the counter, and only if the item exists, like this:
public void update(Integer counter, String partitionKey) {
    T updateItem = new T(partitionKey);
    updateItem.counter = counter;
    Expression itemExistsExpression = Expression.builder()
            .expression("attribute_exists(partitionKey)")
            .build();
    UpdateItemEnhancedRequest<T> updateItemEnhancedRequest =
            UpdateItemEnhancedRequest.builder(collectionClassName)
                    .item(updateItem)
                    .conditionExpression(itemExistsExpression)
                    .ignoreNulls(true)
                    .build();
    getTable().updateItem(updateItemEnhancedRequest);
}

Google Dataflow template job not scaling when writing records to Google datastore

I have a small dataflow job triggered from a cloud function using a dataflow template. The job basically reads from a table in Bigquery, converts the resultant Tablerow to a Key-Value, and writes the Key-Value to Datastore.
This is what my code looks like :-
PCollection<TableRow> bigqueryResult = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQuery()).usingStandardSql()
                .withoutValidation());
bigqueryResult.apply("WriteFromBigqueryToDatastore", ParDo.of(new DoFn<TableRow, String>() {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        TableRow row = pc.element();
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        KeyFactory keyFactoryCounts = datastore.newKeyFactory().setNamespace("MyNamespace")
                .setKind("MyKind");
        Key key = keyFactoryCounts.newKey("Key");
        Builder builder = Entity.newBuilder(key);
        builder.set("Key", BooleanValue.newBuilder("Value").setExcludeFromIndexes(true).build());
        Entity entity = builder.build();
        datastore.put(entity);
    }
}));
This pipeline runs fine when the number of records I try to process is anywhere in the range of 1 to 100. However, when I put more load on the pipeline, i.e., ~10000 records, it does not scale (even though autoscaling is set to THROUGHPUT based and maximumWorkers is set as high as 50 with an n1-standard-1 machine type). The job keeps processing 3 or 4 elements per second with one or two workers. This is impacting the performance of my system.
Any advice on how to scale up the performance is very welcome.
Thanks in advance.
Found a solution by using DatastoreIO instead of the Datastore client.
Following is the snippet I used:
PCollection<TableRow> row = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQueryForSegmentedUsers()).usingStandardSql()
                .withoutValidation());
PCollection<com.google.datastore.v1.Entity> userEntity = row.apply("ConvertTablerowToEntity", ParDo.of(new DoFn<TableRow, com.google.datastore.v1.Entity>() {
    @SuppressWarnings("deprecation")
    @ProcessElement
    public void processElement(ProcessContext pc) {
        final String namespace = "MyNamespace";
        final String kind = "MyKind";
        com.google.datastore.v1.Key.Builder keyBuilder = DatastoreHelper.makeKey(kind, "root");
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        final com.google.datastore.v1.Key ancestorKey = keyBuilder.build();
        TableRow row = pc.element();
        String entityProperty = "sample";
        String key = "key";
        com.google.datastore.v1.Entity.Builder entityBuilder = com.google.datastore.v1.Entity.newBuilder();
        com.google.datastore.v1.Key.Builder keyBuilder1 = DatastoreHelper.makeKey(ancestorKey, kind, key);
        if (namespace != null) {
            keyBuilder1.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        entityBuilder.setKey(keyBuilder1.build());
        entityBuilder.getMutableProperties().put(entityProperty, DatastoreHelper.makeValue("sampleValue").build());
        pc.output(entityBuilder.build());
    }
}));
userEntity.apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId(options.getProject()));
This solution was able to scale from 3 elements per second with 1 worker to ~1500 elements per second with 20 workers.
At least with python's ndb client library it's possible to write up to 500 entities at a time in a single .put_multi() datastore call - a whole lot faster than calling .put() for one entity at a time (the calls are blocking on the underlying RPCs)
I'm not a java user, but a similar technique appears to be available for it as well. From Using batch operations:
You can use the batch operations if you want to operate on multiple entities in a single Cloud Datastore call.
Here is an example of a batch call:
Entity employee1 = new Entity("Employee");
Entity employee2 = new Entity("Employee");
Entity employee3 = new Entity("Employee");
// ...
List<Entity> employees = Arrays.asList(employee1, employee2, employee3);
datastore.put(employees);

Dynamodb batch put if item does not exist

DynamoDB does not support batch update; it supports batch put only.
But is it possible to batch put an item only if an item with that key does not exist?
In the BatchWriteItem documentation, there is the following note:
For example, you cannot specify conditions on individual put and delete requests, and BatchWriteItem does not return deleted items in the response.
Instead, I would recommend using putItem with a conditional expression. Towards the bottom of the putItem documentation there is the following note:
[...] To prevent a new item from replacing an existing item, use a
conditional expression that contains the attribute_not_exists function
with the name of the attribute being used as the partition key for the
table [...]
So make sure to add a ConditionExpression like the following (using NodeJS syntax here):
const params = {
    Item: {
        userId: {
            S: "Beyonce",
        }
    },
    ConditionExpression: "attribute_not_exists(userId)"
};
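If you are on the Java SDK v2 instead, a roughly equivalent conditional put might look like the sketch below; the table name and key attribute are placeholders, not something defined in the question:
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public void putIfAbsent(DynamoDbClient ddb) {
    PutItemRequest request = PutItemRequest.builder()
            .tableName("Users") // placeholder table name
            .item(Map.of("userId", AttributeValue.builder().s("Beyonce").build()))
            // The put only succeeds if no item with this partition key exists yet.
            .conditionExpression("attribute_not_exists(userId)")
            .build();
    try {
        ddb.putItem(request);
    } catch (ConditionalCheckFailedException e) {
        // An item with this key already exists; skip or handle it as needed.
    }
}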

How to use auto increment for primary key id in dynamodb

I am new to DynamoDB. I want to auto-increment the id value when I use PutItem with DynamoDB.
Is it possible to do that?
This is an anti-pattern in DynamoDB, which is built to scale across many partitions/shards/servers. DynamoDB does not support auto-increment primary keys due to scaling limitations, because they cannot be guaranteed across multiple servers.
A better option is to assemble the primary key from multiple parts. The primary key can be up to 2048 bytes. There are a few options:
Use UUID as your key - possibly time based UUID which makes it unique, evenly distributed and carries time value
Use randomly generated number or timestamp + random (possibly bit-shifting) like: ts << 12 + random_number
Use another service or DynamoDB itself to generate incremental unique id (requires extra call)
The following code will atomically increment a counter in DynamoDB, which you can then use as a primary key.
var documentClient = new AWS.DynamoDB.DocumentClient();
var params = {
    TableName: 'sampletable',
    Key: { HashKey : 'counters' },
    UpdateExpression: 'ADD #a :x',
    ExpressionAttributeNames: {'#a' : "counter_field"},
    ExpressionAttributeValues: {':x' : 1},
    ReturnValues: "UPDATED_NEW" // ensures you get value back
};
documentClient.update(params, function(err, data) {});
// once you get new value, use it as your primary key
My personal favorite is using timestamp + random inspired by Instagram's Sharding ID generation at http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
The following function will generate an id for a specific shard (provided as a parameter). This way you can have a unique key assembled from the timestamp, the shard number, and some randomness (0-511).
var CUSTOMEPOCH = 1300000000000; // artificial epoch
function generateRowId(shardId /* range 0-64 for shard/slot */) {
    var ts = new Date().getTime() - CUSTOMEPOCH; // limit to recent
    var randid = Math.floor(Math.random() * 512);
    ts = (ts * 64); // bit-shift << 6
    ts = ts + shardId;
    return (ts * 512) + randid;
}
var newPrimaryHashKey = "obj_name:" + generateRowId(4);
// output is: "obj_name:8055517407349240"
// output is: "obj_name:8055517407349240"
DynamoDB doesn't provide this out of the box. You can generate something in your application such as UUIDs that "should" be unique enough for most systems.
I noticed you were using Node.js (I removed your tag). Here is a library that provides UUID functionality: node-uuid
Example from README
var uuid = require('node-uuid');
var uuid1 = uuid.v1();
var uuid2 = uuid.v1({node:[0x01,0x23,0x45,0x67,0x89,0xab]});
var uuid3 = uuid.v1({node:[0, 0, 0, 0, 0, 0]})
var uuid4 = uuid.v4();
var uuid5 = uuid.v4();
You can probably use atomic counters.
With atomic counters, you can use the UpdateItem operation to implement an atomic counter: a numeric attribute that is incremented, unconditionally, without interfering with other write requests. (All write requests are applied in the order in which they were received.) With an atomic counter, the updates are not idempotent. In other words, the numeric value increments each time you call UpdateItem.
You might use an atomic counter to track the number of visitors to a website. In this case, your application would increment a numeric value, regardless of its current value. If an UpdateItem operation fails, the application could simply retry the operation. This would risk updating the counter twice, but you could probably tolerate a slight overcounting or undercounting of website visitors.
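As a rough sketch of the same idea in Java (SDK v2), reusing the table, key, and attribute names from the Node.js snippet above purely for illustration:
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ReturnValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemResponse;

// Atomically adds 1 to counter_field and returns the new value,
// which can then be used as a generated id.
public long nextId(DynamoDbClient ddb) {
    UpdateItemRequest request = UpdateItemRequest.builder()
            .tableName("sampletable")                  // illustrative table name
            .key(Map.of("HashKey", AttributeValue.builder().s("counters").build()))
            .updateExpression("ADD #a :x")
            .expressionAttributeNames(Map.of("#a", "counter_field"))
            .expressionAttributeValues(Map.of(":x", AttributeValue.builder().n("1").build()))
            .returnValues(ReturnValue.UPDATED_NEW)     // get the incremented value back
            .build();
    UpdateItemResponse response = ddb.updateItem(request);
    return Long.parseLong(response.attributes().get("counter_field").n());
}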
I came across a similar issue where I required an auto-incrementing primary key in my table. We could use some randomization technique to generate a random key and store it, but it won't be incremental.
If you require something incremental, you can use the Unix time as your primary key. It won't give you accurate one-by-one incrementation, but every record you put will get a larger id than the previous one, reflecting the time at which each record was inserted.
It's not a complete solution if you don't want to read the entire table, get its last id, and then increment it.
Following is the code for inserting a record in DynamoDB using NodeJS:
.
.
const params = {
    TableName: RANDOM_TABLE,
    Item: {
        ip: this.ip,
        id: new Date().getTime()
    }
}
dynamoDb.put(params, (error, result) => {
    console.log(error, result);
});
.
.
If you are using DynamoDB with Dynamoose, you can easily set a default unique id. Here is a simple user-creation example:
// User.model.js
const dynamoose = require("dynamoose");
const { v4: uuidv4 } = require("uuid");

const userSchema = new dynamoose.Schema(
    {
        id: {
            type: String,
            hashKey: true,
        },
        displayName: String,
        firstName: String,
        lastName: String,
    },
    { timestamps: true },
);

const User = dynamoose.model("User", userSchema);
module.exports = User;

// User.controller.js
exports.create = async (req, res) => {
    const user = new User({ id: uuidv4(), ...req.body }); // set unique id
    const [err, response] = await to(user.save());
    if (err) {
        return badRes(res, err);
    }
    return goodRes(res, response);
};
Update for 2022:
I was looking into the same issue and came across the following research.
DynamoDB still doesn't support auto-increment of primary keys.
https://aws.amazon.com/blogs/database/simulating-amazon-dynamodb-unique-constraints-using-transactions/
Also, the package node-uuid is now deprecated. They recommend using the uuid package instead, which creates RFC 4122 compliant UUIDs.
npm install uuid
import { v4 as uuidv4 } from 'uuid';
uuidv4(); // ⇨ '9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'
For Java developers, there is the DynamoDBMapper, which is a simple ORM. This supports the DynamoDBAutoGeneratedKey annotation. It doesn't increment a numeric value like a typical "Long id", but rather generates a UUID like other answers here suggest. If you're mapping classes as you would with Hibernate, GORM, etc., this is more natural with less code.
I see no caveats in the docs about scaling issues, and it eliminates the under- or over-counting issues you can get with auto-incremented numeric values (which the docs do call out).
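As a minimal sketch (the class, attribute, and table names here are purely illustrative, not from the question), a mapped class with an auto-generated key might look like this:
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAutoGeneratedKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

@DynamoDBTable(tableName = "Work") // illustrative table name
public class WorkItem {

    private String id;
    private String title;

    // DynamoDBMapper populates this with a random UUID on save when it is null.
    @DynamoDBHashKey(attributeName = "id")
    @DynamoDBAutoGeneratedKey
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    @DynamoDBAttribute(attributeName = "title")
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}
Saving an instance with DynamoDBMapper's save method then leaves getId() returning the generated UUID.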