Getting local partitions on Ignite node - c++

Is there a way to get partitions of an Ignite node in C++?
I would like to parallelize scan queries over the partitions.
Something similar to this in Java:
ignite.compute(ignite.cluster().forDataNodes("myCache"))
    .broadcast(new IgniteCallable<Void>() {
        @IgniteInstanceResource
        private Ignite ignite0;

        @Override public Void call() throws Exception {
            ClusterNode localNode = ignite0.cluster().localNode();
            // get the primary partitions of the local node
            int[] parts = ignite0.affinity("myCache").primaryPartitions(localNode);
            Arrays.stream(parts).parallel().forEach(p -> {
                ScanQuery<Integer, Record> qry = new ScanQuery<Integer, Record>().setLocal(true).setPartition(p);
                // query over the partition.
                ...
            });
            return null;
        }
    });

There is no Cluster API in Ignite C++ yet, though there is some Compute API. You can track ticket [1] for updates on the Cluster API.
[1] - https://issues.apache.org/jira/browse/IGNITE-5708
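Once you have the partition ids, the fan-out itself is ordinary parallel code. As a minimal plain-Java sketch (no Ignite dependency), the code below runs one task per partition id in parallel and collects one result per partition. `scanPartition` is a hypothetical stand-in for a local ScanQuery with setPartition(p), and the in-memory map stands in for the cache's partitions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionFanOut {
    // Hypothetical stand-in for a local ScanQuery over one partition:
    // here a "partition" is just a list of integers, and the scan sums them.
    static long scanPartition(Map<Integer, List<Integer>> data, int partition) {
        return data.getOrDefault(partition, List.of()).stream()
                .mapToLong(Integer::longValue)
                .sum();
    }

    // Runs one scan per partition id in parallel and collects results by partition id.
    static Map<Integer, Long> scanAll(Map<Integer, List<Integer>> data, int[] parts) {
        return Arrays.stream(parts)
                .parallel()
                .boxed()
                .collect(Collectors.toConcurrentMap(p -> p, p -> scanPartition(data, p)));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> data = Map.of(
                0, List.of(1, 2, 3),
                5, List.of(10),
                7, List.<Integer>of());
        // e.g. what primaryPartitions(localNode) might return on one node
        int[] primaryParts = {0, 5, 7};
        System.out.println(scanAll(data, primaryParts));
    }
}
```

A concurrent collector is used because the per-partition tasks finish in no particular order.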

Related

Update specific attribute using UpdateItemEnhancedRequest DynamoDb java sdk2

We have a DynamoDB table with a counter attribute that is decremented asynchronously by multiple Lambda functions in response to an event. I am trying to update the counter using UpdateItemEnhancedRequest (with the DynamoDB Enhanced Client, Java SDK 2). I am able to build the condition for updating the counter, but the update overwrites the entire item, not just the counter. Can somebody please explain how to update a single attribute using the DynamoDB Enhanced Client?
Code Sample
public void update(String counter, T item) {
AttributeValue value = AttributeValue.builder().n(counter).build();
Map<String, AttributeValue> expressionValues = new HashMap<>();
expressionValues.put(":value", value);
Expression myExpression = Expression.builder()
.expression("nqctr = :value")
.expressionValues(expressionValues)
.build();
UpdateItemEnhancedRequest<T> updateItemEnhancedRequest =
UpdateItemEnhancedRequest.builder(collectionClassName)
.item(item)
.conditionExpression(myExpression)
.build();
getTable().updateItem(updateItemEnhancedRequest);
}
When you update a specific column, you need to specify which column to update. Assume we have a table named Work with an id partition key and an archive column.
Now assume we want to update the archive column. You need to specify the column in your code. Here we change the archive column of the item that corresponds to the key to Closed (a single column update). Notice we specify the column name by using the HashMap object named updatedValues.
// Archives an item based on the key
public String archiveItem(String id) {
    DynamoDbClient ddb = getClient();
    HashMap<String, AttributeValue> itemKey = new HashMap<String, AttributeValue>();
    itemKey.put("id", AttributeValue.builder()
            .s(id)
            .build());
    HashMap<String, AttributeValueUpdate> updatedValues =
            new HashMap<String, AttributeValueUpdate>();
    // Update only the column named "archive", setting it to "Closed"
    updatedValues.put("archive", AttributeValueUpdate.builder()
            .value(AttributeValue.builder()
                    .s("Closed").build())
            .action(AttributeAction.PUT)
            .build());
    UpdateItemRequest request = UpdateItemRequest.builder()
            .tableName("Work")
            .key(itemKey)
            .attributeUpdates(updatedValues)
            .build();
    try {
        ddb.updateItem(request);
        return "The item was successfully archived";
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
    return "";
}
NOTE: This is not the Enhanced Client.
This code is from the AWS tutorial that shows how to build a Java web app by using Spring Boot. Full tutorial here:
Creating the DynamoDB web application item tracker
To update a single column using the Enhanced Client, call the table method on DynamoDbEnhancedClient. This returns a DynamoDbTable instance, on which you can call the updateItem method.
Here is the logic to update the archive column using the Enhanced Client. Notice that you get a Work object, call its setArchive method, and then pass the Work object to workTable.updateItem(r -> r.item(work));
Java code:
// Update the archive column by using the Enhanced Client.
public String archiveItemEC(String id) {
    DynamoDbClient ddb = getClient();
    try {
        DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
                .dynamoDbClient(ddb)
                .build();
        DynamoDbTable<Work> workTable = enhancedClient.table("Work", TableSchema.fromBean(Work.class));
        // Get the Key object.
        Key key = Key.builder()
                .partitionValue(id)
                .build();
        // Get the item by using the key.
        Work work = workTable.getItem(r -> r.key(key));
        work.setArchive("Closed");
        // Write the modified item back.
        workTable.updateItem(r -> r.item(work));
        return "The item was successfully archived";
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
    return "";
}
This answer shows both ways to update a single column in a DynamoDB table. The tutorial above now uses the Enhanced Client approach.
In your original solution, you're misinterpreting the meaning of the conditionExpression attribute. This is used to validate conditions that must be true on the item that matches the key in order to perform the update, not the expression to perform the update itself.
There is a way to perform this operation with the enhanced client without needing to fetch the object before making an update. The UpdateItemEnhancedRequest class has an ignoreNulls attribute that will exclude all null attributes from the update. This is false by default, which is what causes a full overwrite of the object.
Let's assume this is the structure of your item (omitting the Enhanced Client annotations and boilerplate; you can add those):
class T {
    public String partitionKey;
    public Integer counter;
    public String someOtherAttribute;

    public T(String partitionKey) {
        this.partitionKey = partitionKey;
        this.counter = null;
        this.someOtherAttribute = null;
    }
}
You can issue an update of just the counter, and only if the item exists, like this:
public void update(Integer counter, String partitionKey) {
    T updateItem = new T(partitionKey);
    updateItem.counter = counter;
    // The condition only guards the write; it does not define what is written.
    Expression itemExistsExpression = Expression.builder()
            .expression("attribute_exists(partitionKey)")
            .build();
    UpdateItemEnhancedRequest<T> updateItemEnhancedRequest =
            UpdateItemEnhancedRequest.builder(collectionClassName)
                    .item(updateItem)
                    .conditionExpression(itemExistsExpression)
                    // Skip null attributes so only the non-null ones (here, counter) are written.
                    .ignoreNulls(true)
                    .build();
    getTable().updateItem(updateItemEnhancedRequest);
}
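To make the difference between conditionExpression and ignoreNulls concrete, here is a plain-Java model that uses a HashMap as a stand-in for the table. It is a sketch of the semantics described above, not the SDK's implementation; `TABLE`, `item`, and `updateItem` are hypothetical names. The condition decides whether the write happens at all; ignoreNulls decides whether null attributes wipe stored values.

```java
import java.util.HashMap;
import java.util.Map;

public class IgnoreNullsModel {
    // Hypothetical in-memory table: partition key -> attributes.
    static final Map<String, Map<String, Object>> TABLE = new HashMap<>();

    // Mirrors the answer's class T: every attribute is present, unset ones are null.
    static Map<String, Object> item(Integer counter, String someOtherAttribute) {
        Map<String, Object> m = new HashMap<>();
        m.put("counter", counter);
        m.put("someOtherAttribute", someOtherAttribute);
        return m;
    }

    static void updateItem(String key, Map<String, Object> item,
                           boolean requireExists, boolean ignoreNulls) {
        Map<String, Object> stored = TABLE.get(key);
        // Models attribute_exists(partitionKey): it only gates the write.
        if (requireExists && stored == null) {
            throw new IllegalStateException("condition failed: item does not exist");
        }
        Map<String, Object> result = stored == null ? new HashMap<>() : new HashMap<>(stored);
        for (Map.Entry<String, Object> e : item.entrySet()) {
            if (e.getValue() != null) {
                result.put(e.getKey(), e.getValue());
            } else if (!ignoreNulls) {
                result.remove(e.getKey()); // a null overwrites: the stored value is lost
            }
        }
        TABLE.put(key, result);
    }

    public static void main(String[] args) {
        TABLE.put("pk1", item(10, "keep me"));
        updateItem("pk1", item(9, null), true, true);
        System.out.println(TABLE.get("pk1")); // counter changed, someOtherAttribute kept
        updateItem("pk1", item(8, null), true, false);
        System.out.println(TABLE.get("pk1")); // someOtherAttribute removed
    }
}
```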

Consistency level LOCAL_ONE is not supported for this operation. Supported consistency levels are: LOCAL_QUORUM

I am working with AWS Keyspaces and trying to insert data from C#, but I get this error: "Consistency level LOCAL_ONE is not supported for this operation. Supported consistency levels are: LOCAL_QUORUM". Can anyone please help?
AWS keyspace
CREATE KEYSPACE IF NOT EXISTS "DevOps"
WITH REPLICATION={'class': 'SingleRegionStrategy'} ;
Table
CREATE TABLE IF NOT EXISTS "DevOps"."projectdetails" (
"id" UUID PRIMARY KEY,
"name" text,
"lastupdatedtime" timestamp,
"baname" text,
"customerid" UUID)
C# code
public async Task AddRecord(List<projectdetails> projectDetails)
{
try
{
if (projectDetails.Count > 0)
{
foreach (var item in projectDetails)
{
projectdetails projectData = new projectdetails();
projectData.id = item.id;
projectData.name = item.name;
projectData.baname = "Vishal";
projectData.lastupdatedtime = item.lastupdatedtime;
projectData.customerid = 1;
await mapper.InsertAsync<projectdetails>(projectData);
}
}
}
catch (Exception e)
{
}
}
The error says that you need to use the LOCAL_QUORUM consistency level instead of LOCAL_ONE, which is used by default. The AWS documentation states that LOCAL_QUORUM is the only consistency level supported for write operations. You can set the consistency level by using the overload of InsertAsync that accepts CqlQueryOptions, like this (you may want to create the query options instance only once, during application initialization):
await mapper.InsertAsync<projectdetails>(projectData,
    new CqlQueryOptions().SetConsistencyLevel(ConsistencyLevel.LocalQuorum));
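For context, LOCAL_ONE needs an acknowledgement from a single replica, while LOCAL_QUORUM needs a majority of the replicas in the local region. The small sketch below shows the usual Cassandra majority formula, replicas / 2 + 1. The replica count of 3 is an assumption based on AWS describing Keyspaces as storing three copies of the data, so treat the numbers as illustrative.

```java
public class QuorumMath {
    // Cassandra-style quorum: a strict majority of the replicas must acknowledge.
    static int quorum(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1; // integer division floors for positive values
    }

    public static void main(String[] args) {
        int replicas = 3; // assumption: three copies of the data in the local region
        System.out.println("LOCAL_ONE needs 1 ack, LOCAL_QUORUM needs " + quorum(replicas));
    }
}
```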

Using Spanner within Apache Beam Dataflow

I am trying to add a Spanner connection within an Apache Beam ParDo(DoFn). I need to look up some rows as part of the ParDo. The Dataflow job creates a number of workers (usually 4 max), and I use the startBundle and finishBundle methods to open and close the Spanner connections for the worker's lifetime. Then, within the processElement method, I perform the lookup for each item, passing the DatabaseClient and using a singleUseReadOnlyTransaction.
I should add that this is running as a Dataflow job on GCP.
Some code to illustrate this.
private static CustomDoFn<String, TransactionImport> processRow = new CustomDoFn<String, TransactionImport>() {
    private static final long serialVersionUID = 1L;
    private Spanner spanner = null;
    private DatabaseClient dbClient = null;

    @StartBundle
    public void startBundle(StartBundleContext c) {
        TransactionFileOptions options = c.getPipelineOptions().as(TransactionFileOptions.class);
        com.google.cloud.spanner.SpannerOptions spannerOptions = com.google.cloud.spanner.SpannerOptions.newBuilder().build();
        spanner = spannerOptions.getService();
        String spannerProjectID = options.getSpannerProjectId();
        String spannerInstanceID = options.getSpannerInstanceId();
        String spannerDatabaseID = options.getSpannerDatabaseId();
        DatabaseId db = DatabaseId.of(spannerProjectID, spannerInstanceID, spannerDatabaseID);
        dbClient = spanner.getDatabaseClient(db);
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
        spanner.close();
    }

    @ProcessElement
    public void processElement(DoFn<String, TransactionImport>.ProcessContext c) throws Exception {
        String text = c.element(); // assumed: the input line is the lookup value
        TransactionImport importRecord = new TransactionImport(); // "import" is a reserved word
        Statement statement = Statement.newBuilder("SELECT * FROM Table1 WHERE Name = @Name")
                .bind("Name").to(text)
                .build();
        ResultSet resultSet = dbClient.singleUseReadOnlyTransaction().executeQuery(statement);
        // set some value on importRecord depending on the retrieved value
        c.output(importRecord);
    }
};
This always results in the Dataflow job not completing, and when I check the log I see:
Processing stuck in step Process Rows for at least 05m00s without outputting or completing in state process
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
at com.google.common.util.concurrent.Uninterruptibles.takeUninterruptibly(Uninterruptibles.java:233)
at com.google.cloud.spanner.SessionPool$Waiter.take(SessionPool.java:411)
at com.google.cloud.spanner.SessionPool$Waiter.access$3300(SessionPool.java:399)
at com.google.cloud.spanner.SessionPool.getReadSession(SessionPool.java:754)
at com.google.cloud.spanner.DatabaseClientImpl.singleUseReadOnlyTransaction(DatabaseClientImpl.java:52)
at com.mycompany.pt.SpannerDataAccess.getBinDetails(SpannerDataAccess.java:197)
at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1.processLine(TransactionFileDataflow.java:411)
at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1.processElement(TransactionFileDataflow.java:336)
at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1$DoFnInvoker.invokeProcessElement(Unknown Source)
Does anyone have any experience using Spanner like this within a ParDo?
I'm not a Spanner expert, but maybe I can help:
You should use @Setup/@Teardown to connect to and disconnect from Spanner. @StartBundle/@FinishBundle get called multiple times over the lifetime of a worker. See here for more details: https://beam.apache.org/documentation/execution-model/#bundling-and-persistence
Does your processElement method ever emit an element using c.output(...)? If not, Beam will think your pipeline is stuck.
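To illustrate the lifecycle point, here is a plain-Java simulation (no Beam dependency) of how one DoFn instance is driven: setup once, then start/process/finish for each bundle, then teardown. `FakeClient` is hypothetical and only counts how many clients get created; the bundle layout is arbitrary. Opening the client per bundle creates one client per bundle, while opening it in setup reuses a single client for all of them.

```java
import java.util.List;

public class DoFnLifecycleModel {
    // Hypothetical stand-in for a Spanner client: it only tracks open/close calls.
    static final class FakeClient implements AutoCloseable {
        static int created = 0;
        static int open = 0;

        FakeClient() { created++; open++; }

        String lookup(String name) { return "row-for-" + name; }

        @Override
        public void close() { open--; }
    }

    // Simulates a runner driving ONE DoFn instance over several bundles and
    // returns how many clients were created along the way.
    static int run(List<List<String>> bundles, boolean connectPerBundle) {
        FakeClient.created = 0;
        FakeClient.open = 0;
        FakeClient client = connectPerBundle ? null : new FakeClient(); // like @Setup
        for (List<String> bundle : bundles) {
            if (connectPerBundle) client = new FakeClient();            // like @StartBundle
            for (String element : bundle) {
                client.lookup(element);                                 // like @ProcessElement
            }
            if (connectPerBundle) client.close();                       // like @FinishBundle
        }
        if (!connectPerBundle) client.close();                          // like @Teardown
        return FakeClient.created;
    }

    public static void main(String[] args) {
        List<List<String>> bundles = List.of(List.of("a"), List.of("b", "c"), List.of("d"));
        System.out.println("clients created per bundle: " + run(bundles, true));
        System.out.println("clients created in setup:   " + run(bundles, false));
    }
}
```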

Google Dataflow template job not scaling when writing records to Google datastore

I have a small dataflow job triggered from a cloud function using a dataflow template. The job basically reads from a table in Bigquery, converts the resultant Tablerow to a Key-Value, and writes the Key-Value to Datastore.
This is what my code looks like:
PCollection<TableRow> bigqueryResult = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQuery()).usingStandardSql()
                .withoutValidation());
bigqueryResult.apply("WriteFromBigqueryToDatastore", ParDo.of(new DoFn<TableRow, String>() {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        TableRow row = pc.element();
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        KeyFactory keyFactoryCounts = datastore.newKeyFactory().setNamespace("MyNamespace")
                .setKind("MyKind");
        Key key = keyFactoryCounts.newKey("Key");
        Builder builder = Entity.newBuilder(key);
        builder.set("Key", StringValue.newBuilder("Value").setExcludeFromIndexes(true).build());
        Entity entity = builder.build();
        datastore.put(entity);
    }
}));
This pipeline runs fine when the number of records I try to process is anywhere in the range of 1 to 100. However, when I put more load on the pipeline, i.e., ~10000 records, the pipeline does not scale (even though autoscaling is set to throughput-based and maximumWorkers is set as high as 50 with an n1-standard-1 machine type). The job keeps processing 3 or 4 elements per second with one or two workers. This is impacting the performance of my system.
Any advice on how to scale up the performance is very welcome.
Thanks in advance.
I found a solution: use DatastoreIO instead of the Datastore client.
Here is the snippet I used:
PCollection<TableRow> row = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQueryForSegmentedUsers()).usingStandardSql()
                .withoutValidation());
PCollection<com.google.datastore.v1.Entity> userEntity = row.apply("ConvertTablerowToEntity", ParDo.of(new DoFn<TableRow, com.google.datastore.v1.Entity>() {
    @SuppressWarnings("deprecation")
    @ProcessElement
    public void processElement(ProcessContext pc) {
        final String namespace = "MyNamespace";
        final String kind = "MyKind";
        com.google.datastore.v1.Key.Builder keyBuilder = DatastoreHelper.makeKey(kind, "root");
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        final com.google.datastore.v1.Key ancestorKey = keyBuilder.build();
        TableRow row = pc.element();
        String entityProperty = "sample";
        String key = "key";
        com.google.datastore.v1.Entity.Builder entityBuilder = com.google.datastore.v1.Entity.newBuilder();
        com.google.datastore.v1.Key.Builder keyBuilder1 = DatastoreHelper.makeKey(ancestorKey, kind, key);
        if (namespace != null) {
            keyBuilder1.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        entityBuilder.setKey(keyBuilder1.build());
        entityBuilder.getMutableProperties().put(entityProperty, DatastoreHelper.makeValue("sampleValue").build());
        pc.output(entityBuilder.build());
    }
}));
userEntity.apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId(options.getProject()));
This solution was able to scale from 3 elements per second with 1 worker to ~1500 elements per second with 20 workers.
At least with Python's ndb client library, it's possible to write up to 500 entities at a time in a single .put_multi() Datastore call, which is a whole lot faster than calling .put() for one entity at a time (each call blocks on the underlying RPC).
I'm not a Java user, but a similar technique appears to be available there as well. From Using batch operations:
You can use the batch operations if you want to operate on multiple
entities in a single Cloud Datastore call.
Here is an example of a batch call:
Entity employee1 = new Entity("Employee");
Entity employee2 = new Entity("Employee");
Entity employee3 = new Entity("Employee");
// ...
List<Entity> employees = Arrays.asList(employee1, employee2, employee3);
datastore.put(employees);
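If you have more entities than fit in one call, the batching idea amounts to splitting the list into chunks of at most 500 (the per-call limit mentioned above) and issuing one batched put per chunk. A minimal plain-Java sketch, where the `writeBatch` callback is a hypothetical stand-in for one datastore.put(List) call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    // Per-call entity limit mentioned in the answer above.
    static final int MAX_BATCH = 500;

    // Splits items into consecutive chunks of at most batchSize and hands each
    // chunk to writeBatch; returns the number of calls made.
    static <T> int writeInBatches(List<T> items, int batchSize, Consumer<List<T>> writeBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int calls = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            writeBatch.accept(new ArrayList<>(items.subList(from, to)));
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> entities = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            entities.add(i);
        }
        int calls = writeInBatches(entities, MAX_BATCH,
                batch -> System.out.println("put " + batch.size() + " entities"));
        System.out.println(calls + " calls instead of " + entities.size());
    }
}
```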

How to implement an asynchronous response?

I have a controller with a method that blocks the Play server thread because of a very slow database query. I need to implement the controller method so that it doesn't block the thread.
I have read documentation: http://www.playframework.org/documentation/1.2.4/asynchronous
There are absolutely no examples anywhere of how to do this. The only thing I found that comes close is this: https://github.com/playframework/play/blob/master/samples-and-tests/chat/app/controllers/LongPolling.java
It simply wraps the result in await().
When I try to do that, it doesn't work.
routes:
GET /blog Controller.blog
Controller (this is not an actual slow query but everything else is identical):
public static void blog() {
String queryStr = "SELECT b FROM Blog b ORDER BY createTime DESC";
JPAQuery q = Blog.find(queryStr);
List<Blog> bList = q.fetch(100);
List<BlogDTO> list = new ArrayList<BlogDTO>(bList.size());
for (Blog b : bList) {
BlogDTO obj = new BlogDTO(b);
list.add(obj);
}
renderJSON(list);
}
I tried List<Blog> bList = await(q.fetch(100)); but that doesn't work.
I have not worked with futures and promises before.
Can anyone give me any pointers on how to approach this?
For me, the best way to do this is to use a Job that returns a List. Then, in your controller, you can await the job's completion:
public static void blog() {
List<BlogDTO> list = await(new BlogPostJob().now());
renderJSON(list);
}
and put your JPA code in the job (in its doJobWithResult() method).
Because JDBC uses blocking IO, any slow database query will always block a Thread.
The only way seems to be to use a Job for that purpose.
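Play's Job/await mechanism is specific to Play 1.x, so the following is only a plain-JDK analogue of the same idea, not Play's API: the blocking query runs on a separate pool via CompletableFuture.supplyAsync, and the caller gets a future back immediately. `slowQuery` and `fetchBlogsAsync` are hypothetical stand-ins for the JPA fetch and for new BlogPostJob().now(). As noted above, the blocking JDBC call still occupies a thread; it is simply a pool thread rather than the request thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OffloadBlockingQuery {
    // Dedicated pool for blocking work (roughly the role of Play's job pool).
    // Daemon threads so the JVM can exit without an explicit shutdown.
    static final ExecutorService JOB_POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "job-pool");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical stand-in for the slow JPA query; it really does block.
    static List<String> slowQuery(int limit) {
        try {
            Thread.sleep(100); // simulates a slow, blocking JDBC round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return IntStream.range(0, limit)
                .mapToObj(i -> "blog-" + i)
                .collect(Collectors.toList());
    }

    // Analogue of "new BlogPostJob().now()": submit the work and return a future at once.
    static CompletableFuture<List<String>> fetchBlogsAsync(int limit) {
        return CompletableFuture.supplyAsync(() -> slowQuery(limit), JOB_POOL);
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> pending = fetchBlogsAsync(3);
        System.out.println("caller thread is free while the query runs");
        // join() blocks here only so this demo waits for the result before exiting.
        System.out.println("render " + pending.join());
    }
}
```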