I have a requirement where I need 40 threads to do a certain task (merging) and around 20 threads to do another task (persistence). Merging takes about 5x as long as persistence.
I am using message-driven beans to accomplish this concurrency.
I created one MDB, RecordMerger, with the following configuration:
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "testing/RecordMerger"),
    @ActivationConfigProperty(propertyName = "maxSessions", propertyValue = "40")
})
and I did a similar thing for persistence:
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "testing/RecordPersistor"),
    @ActivationConfigProperty(propertyName = "maxSessions", propertyValue = "20")
})
My configuration in tomee.xml is as follows:
<Container id="MyJmsMdbContainer" ctype="MESSAGE">
    ResourceAdapter = MyJmsResourceAdapter
    InstanceLimit = 40
</Container>
The record-merging queue has very fast producers, so there are always new messages in it. The record merger puts its output into the record-persistence queue.
The problem I am facing is that when the record merger is configured to use 40 threads, my TomEE server does not instantiate the record-persistence MDB at all, which results in records piling up in that queue. If I reduce the maxSessions property of the record merger to 20, both MDBs start getting instantiated.
Can anyone please guide me on what I need to do to ensure that both MDBs are running and the record merger has 40 threads?
You likely also need to set threadPoolSize=40 (it defaults to 30) in the resource adapter definition in tomee.xml.
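For reference, a sketch of what that could look like with the ActiveMQ resource adapter that TomEE ships with (the broker URLs are illustrative; the threadPoolSize line is the point here):

<Resource id="MyJmsResourceAdapter" type="ActiveMQResourceAdapter">
    BrokerXmlConfig = broker:(tcp://localhost:61616)?useJmx=false
    ServerUrl = tcp://localhost:61616
    threadPoolSize = 40
</Resource>

The reasoning: with the default pool of 30 threads, 40 merger sessions can starve the adapter before any persistence session is ever served, which would match the behavior described in the question.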
We're trying to make a simple data migration in one of our tables in DDB.
Basically we're adding a new field and we need to backfill all the Documents in one of our tables.
This table has around 700K documents.
The process we follow is quite simple:
Manually trigger a Lambda that scans the table and, for each document, updates it, continuing until it gets close to the 15-minute execution limit; at that point it
puts the LastEvaluatedKey into SQS to trigger a new Lambda execution that uses that key to continue scanning.
The process goes on, spawning Lambdas sequentially as needed, until there are no more documents.
The problem we found is as follows...
Once the migration is done, we noticed that the number of documents updated is way lower than the total number of documents in the table. It's a random value, not always the same, ranging from tens of thousands to hundreds of thousands (the worst case we've seen was a 300K difference).
This is obviously a problem: if we scan the documents again, it's clear some documents were not migrated. At first we thought this was caused by clients updating/inserting documents while we migrate, but the throughput on that table is nowhere near large enough to justify such a big difference, so it's not that new documents are being added while we run the migration.
We tried a second approach: scan first, because if we only scan, the number of scanned documents equals the count of documents in the table. So we dumped the IDs of the documents into another table, then scanned that table and updated the original items from there. Funnily enough, the same problem happens with this new IDs-only table: it holds far fewer entries than the count of the table we want to update, so we're back to square one.
We thought about using parallel scans, but I don't see how that would help, and I don't want to compromise the table's read capacity while running the migration.
Can anybody with experience in DDB data migrations shed some light here? We're not able to figure out what we're doing wrong.
UPDATE: Sharing the function that is triggered and actually scans and updates:
@Override
public Map<String, AttributeValue> migrateDocuments(String lastEvaluatedKey, String typeKey) {
    LOG.info("Migrate Documents started {} ", lastEvaluatedKey);
    int noOfDocumentsMigrated = 0;
    Map<String, AttributeValue> docLastEvaluatedKey = null;
    DynamoDBMapperConfig documentConfig = new DynamoDBMapperConfig.TableNameOverride("KnowledgeDocumentMigration").config();
    if (lastEvaluatedKey != null) {
        docLastEvaluatedKey = new HashMap<String, AttributeValue>();
        docLastEvaluatedKey.put("base_id", new AttributeValue().withS(lastEvaluatedKey));
        docLastEvaluatedKey.put("type_key", new AttributeValue().withS(typeKey));
    }
    Instant endTime = Instant.now().plusSeconds(840);
    LOG.info("Migrate Documents endTime:{}", endTime);
    try {
        do {
            ScanResultPage<Document> docScanList = documentDao.scanDocuments(docLastEvaluatedKey, documentConfig);
            docLastEvaluatedKey = docScanList.getLastEvaluatedKey();
            LOG.info("Migrate Docs- docScanList Size: {}", docScanList.getScannedCount());
            LOG.info("lastEvaluatedKey:{}", docLastEvaluatedKey);
            final int chunkSize = 25;
            final AtomicInteger counter = new AtomicInteger();
            final Collection<List<Document>> docChunkList = docScanList.getResults().stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
            List<List<Document>> docListSplit = docChunkList.stream().collect(Collectors.toList());
            docListSplit.forEach(docList -> {
                TransactionWriteRequest documentTx = new TransactionWriteRequest();
                for (Document document : docList) {
                    LOG.info("Migrate Documents- docList Size: {}", docList.size());
                    LOG.info("Migrate Documents- Doc Id: {}", document.getId());
                    if (!StringUtils.isNullOrEmpty(document.getType()) && document.getType().equalsIgnoreCase("Faq")) {
                        // docIdsList is a field on this class, used to detect re-processed documents
                        if (docIdsList.contains(document.getId())) {
                            LOG.info("this doc already migrated:{}", document);
                        } else {
                            docIdsList.add(document.getId());
                        }
                        if (!StringUtils.isNullOrEmpty(document.getFaq().getQuestion())) {
                            LOG.info("doc FAQ {}", document.getFaq().getQuestion());
                            document.setTitle(document.getFaq().getQuestion());
                            document.setTitleSearch(document.getFaq().getQuestion().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    } else if (StringUtils.isNullOrEmpty(document.getType())) {
                        if (!StringUtils.isNullOrEmpty(document.getTitle())) {
                            if (!StringUtils.isNullOrEmpty(document.getQuestion())) {
                                document.setTitle(document.getQuestion());
                                document.setQuestion(null);
                            }
                            LOG.info("title {}", document.getTitle());
                            document.setTitleSearch(document.getTitle().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    }
                }
                if (documentTx.getTransactionWriteOperations() != null
                        && !documentTx.getTransactionWriteOperations().isEmpty() && docList.size() > 0) {
                    LOG.info("DocumentTx size {}", documentTx.getTransactionWriteOperations().size());
                    documentDao.executeTransaction(documentTx, null);
                }
            });
            // note: this counts scanned documents, not documents actually updated
            noOfDocumentsMigrated = noOfDocumentsMigrated + docScanList.getScannedCount();
        } while (docLastEvaluatedKey != null && (endTime.compareTo(Instant.now()) > 0));
        LOG.info("Migrate Documents execution finished at:{}", Instant.now());
        if (docLastEvaluatedKey != null && docLastEvaluatedKey.get("base_id") != null)
            // getS() rather than toString(): AttributeValue.toString() returns a
            // debug representation, not the raw key value
            sqsAdapter.get().sendMessage(docLastEvaluatedKey.get("base_id").getS(), docLastEvaluatedKey.get("type_key").getS(),
                    MIGRATE, MIGRATE_DOCUMENT_QUEUE_NAME);
        LOG.info("No Of Documents Migrated:{}", noOfDocumentsMigrated);
    } catch (Exception e) {
        LOG.error("Exception", e);
    }
    return docLastEvaluatedKey;
}
Note: I would have added this speculation as a comment, but my reputation does not allow it.
I think the issue you're seeing here could be caused by the Scans not being ordered. As long as your Scan is executed within a single Lambda, I'd expect everything to be handled fine. However, as soon as you hit the Lambda runtime limit and start a new one, your Scan essentially gets a new "ScanID", which might return items in a different order. Based on that different order, you end up skipping a certain set of entries.
I haven't tried to replicate this behavior, and sadly there is no clear indication in the AWS documentation as to whether a Scan request can be resumed from a new session/application.
I think @Charles' suggestion might help you in this case, as you can simply run the entire migration in one process.
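For illustration, a minimal sketch of that single-process approach, using a plain paginated Scan loop (the table name is an assumption, and credentials/region setup plus the per-document update logic from the question are omitted):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;
import java.util.Map;

// One continuous Scan from a single process: the LastEvaluatedKey never leaves
// this loop, so it cannot be mangled or reordered between Lambda invocations.
public final class SingleProcessMigration {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> lastKey = null;
        long seen = 0;
        do {
            ScanResult page = ddb.scan(new ScanRequest()
                    .withTableName("KnowledgeDocument") // assumed table name
                    .withExclusiveStartKey(lastKey));
            seen += page.getCount();
            // ... apply the same per-document update logic as in the question ...
            lastKey = page.getLastEvaluatedKey();
        } while (lastKey != null);
        System.out.println("Documents seen: " + seen);
    }
}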
We have Navision Dynamics 2017 which has Sales Orders exposed as a SOAP web service. Technically, I am supposed to be able to create sales orders via this web service.
We also have another system built in C# .NET that has staff sales orders that need to go into Navision. This ordering system has all the information (customer, item, quantity, price, etc.) needed to create a valid order in Navision.
Can someone tell me how I can call the service and create a sales header and lines in Navision from the staff sales order system?
Preferably, a walkthrough tutorial would be ideal. I've searched and can't seem to find one that I can follow.
The classic go-to for NAV services has always been the following blog post, albeit it's an example for PHP. Take into account that changes are required in NAV to be able to interact with the service (hint: activate NTLM):
https://blogs.msdn.microsoft.com/freddyk/2010/01/19/connecting-to-nav-web-services-from-php/
There is now an updated version by the same author, complementing the original post:
https://blogs.msdn.microsoft.com/freddyk/2016/11/06/connecting-to-nav-web-services-from-php-take-2/
Example for C#:
https://blogs.msdn.microsoft.com/freddyk/2010/01/19/connecting-to-nav-web-services-from-c-using-web-reference/
Example for completing a Sales Order:
https://blogs.msdn.microsoft.com/freddyk/2009/11/17/extending-page-web-services-and-creating-a-sales-order-again/
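To give a feel for it before following the links: once a service reference is generated from the Sales Order page (the URL, credentials, and item/customer numbers below are assumptions, not values from the question), the C# side typically reduces to a Create/Update round-trip:

// Service reference generated from e.g.
// http://server:7047/DynamicsNAV100/WS/CRONUS/Page/SalesOrder
var client = new SalesOrder_PortClient();
client.ClientCredentials.Windows.ClientCredential =
    new System.Net.NetworkCredential("user", "password", "DOMAIN"); // NTLM, as hinted above

var order = new SalesOrder();
client.Create(ref order);                 // inserts the header; NAV assigns the order No.

order.Sell_to_Customer_No = "10000";      // illustrative customer
order.SalesLines = new[]
{
    new Sales_Order_Line
    {
        Type = Type.Item, TypeSpecified = true,
        No = "1000",                      // illustrative item
        Quantity = 1, QuantitySpecified = true
    }
};
client.Update(ref order);                 // writes the lines back to NAV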
I googled it for you. It's for NAV 2013, but it is all the same in 2017.
https://community.dynamics.com/nav/b/ishwarsblogspot/archive/2016/09/26/register-and-consume-codeunit-as-a-web-service-in-nav-2013-r2
Here's what I did in the REST API that I developed with .NET 5: I created two pages (one for the sales header and one for the sales lines) and one codeunit for invoking the calculate-discount action in NAV.
[Route("order")]
[HttpPost]
private async Task<dynamic> createOrder(orderDTO request)
{
var systemService = this.OrderServiceProvider.GetProxy();
List<OrderServiceReference.Sales_Quote_Line> lineList = new List<OrderServiceReference.Sales_Quote_Line>();
foreach (OrderLine orderLine in request.Sales_Quote_Line)
{
Sales_Quote_Line line = new Sales_Quote_Line()
{
Type = OrderServiceReference.Type.Item,
TypeSpecified = true,
No = orderLine.No,
Quantity = orderLine.Quantity,
QuantitySpecified = true
};
lineList.Add(line);
}
var task = await systemService.CreateAsync(new OrderServiceReference.Create()
{
Dis_SQ = new Dis_SQ()
{
Salesperson_Code = request.Salesperson_Code,
Sell_to_Customer_No = request.Sell_to_Customer_No,
Order_Date = new DateTime(),
SalesLines = lineList.ToArray()
}
});
var salesQuotes = task.Dis_SQ;
// var systemService2 = new DiscountServiceReference.Dis_Discount_Cal_PortClient();
var calculateSystemWebService = this.CalculateDiscountServiceProvider.GetProxy();
await calculateSystemWebService.CalcOffersSHAsync(new CalcOffersSH()
{
Body = new CalcOffersSHBody()
{
pDocNo = salesQuotes.No,
pDocType = 0
}
});
// get sales lines
var systemService1 = this.OrderLinesServiceProvider.GetProxy();
var task1 = await systemService1.ReadMultipleAsync(new SalesOrderServiceReference.ReadMultiple()
{
filter = new HHT_SO_Filter[]
{
new HHT_SO_Filter()
{
Criteria = salesQuotes.No,
Field = HHT_SO_Fields.Document_No
}
},
bookmarkKey = "",
setSize = 200
});
return new
{
sales_header = task.Dis_SQ,
sales_line = task1.ReadMultiple_Result1
};
}
If you need any more help comment below.
I have a small Dataflow job triggered from a Cloud Function using a Dataflow template. The job basically reads from a table in BigQuery, converts the resulting TableRow to a key-value pair, and writes that key-value pair to Datastore.
This is what my code looks like:
PCollection<TableRow> bigqueryResult = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQuery()).usingStandardSql()
                .withoutValidation());

bigqueryResult.apply("WriteFromBigqueryToDatastore", ParDo.of(new DoFn<TableRow, String>() {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        TableRow row = pc.element();
        // creates a Datastore client and issues one blocking put() per element
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        KeyFactory keyFactoryCounts = datastore.newKeyFactory().setNamespace("MyNamespace")
                .setKind("MyKind");
        Key key = keyFactoryCounts.newKey("Key");
        Builder builder = Entity.newBuilder(key);
        builder.set("Key", StringValue.newBuilder("Value").setExcludeFromIndexes(true).build());
        Entity entity = builder.build();
        datastore.put(entity);
    }
}));
This pipeline runs fine when the number of records I try to process is anywhere in the range of 1 to 100. However, when I put more load on the pipeline, i.e., ~10,000 records, it does not scale (even though autoscaling is set to THROUGHPUT_BASED and maxNumWorkers is set as high as 50 with an n1-standard-1 machine type). The job keeps processing 3 or 4 elements per second with one or two workers. This is impacting the performance of my system.
Any advice on how to scale up the performance is very welcome.
Thanks in advance.
Found a solution by using DatastoreIO instead of the Datastore client.
Following is the snippet I used:
PCollection<TableRow> row = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQueryForSegmentedUsers()).usingStandardSql()
                .withoutValidation());

PCollection<com.google.datastore.v1.Entity> userEntity = row.apply("ConvertTablerowToEntity",
        ParDo.of(new DoFn<TableRow, com.google.datastore.v1.Entity>() {
            @SuppressWarnings("deprecation")
            @ProcessElement
            public void processElement(ProcessContext pc) {
                final String namespace = "MyNamespace";
                final String kind = "MyKind";
                com.google.datastore.v1.Key.Builder keyBuilder = DatastoreHelper.makeKey(kind, "root");
                if (namespace != null) {
                    keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
                }
                final com.google.datastore.v1.Key ancestorKey = keyBuilder.build();
                TableRow row = pc.element();
                String entityProperty = "sample";
                String key = "key";
                com.google.datastore.v1.Entity.Builder entityBuilder = com.google.datastore.v1.Entity.newBuilder();
                com.google.datastore.v1.Key.Builder keyBuilder1 = DatastoreHelper.makeKey(ancestorKey, kind, key);
                if (namespace != null) {
                    keyBuilder1.getPartitionIdBuilder().setNamespaceId(namespace);
                }
                entityBuilder.setKey(keyBuilder1.build());
                entityBuilder.getMutableProperties().put(entityProperty,
                        DatastoreHelper.makeValue("sampleValue").build());
                pc.output(entityBuilder.build());
            }
        }));

userEntity.apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId(options.getProject()));
This solution was able to scale from 3 elements per second with 1 worker to ~1500 elements per second with 20 workers.
At least with Python's ndb client library it's possible to write up to 500 entities at a time in a single .put_multi() Datastore call, which is a whole lot faster than calling .put() for one entity at a time (the calls block on the underlying RPCs).
I'm not a Java user, but a similar technique appears to be available there as well. From Using batch operations:
You can use the batch operations if you want to operate on multiple entities in a single Cloud Datastore call.
Here is an example of a batch call:
Entity employee1 = new Entity("Employee");
Entity employee2 = new Entity("Employee");
Entity employee3 = new Entity("Employee");
// ...
List<Entity> employees = Arrays.asList(employee1, employee2, employee3);
datastore.put(employees);
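For what it's worth, this also explains the Dataflow result above: DatastoreIO.v1().write() batches mutations into larger commit RPCs internally, whereas the original DoFn issued one blocking put() per element.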
I've been researching A LOT for the past 2 weeks and can't pinpoint the exact reason why my Meteor app returns results so slowly.
Currently I have only a single collection in my Mongo database, with around 200,000 documents. To search, I am using Meteor subscriptions driven by a given keyword. Here is my query:
db.collection.find({$or:[
{title:{$regex:".*java.*", $options:"i"}},
{company:{$regex:".*java.*", $options:"i"}}
]})
When I run the above query in the mongo shell, the results are returned instantly. But when I use it in the Meteor client, the results take almost 40 seconds to come back from the server. Here is my Meteor client code:
Template.testing.onCreated(function () {
    var instance = this;
    // initialize the reactive variables
    instance.loaded = new ReactiveVar(0);
    instance.limit = new ReactiveVar(20);

    instance.autorun(function () {
        // get the limit
        var limit = instance.limit.get();
        var keyword = Router.current().params.query.k;
        var searchByLocation = Router.current().params.query.l;
        var startDate = Session.get("startDate");
        var endDate = Session.get("endDate");
        // subscribe to the posts publication
        var subscription = instance.subscribe('sub_testing', limit, keyword, searchByLocation, startDate, endDate);
        // if subscription is ready, set loaded to the new limit
        $('#searchbutton').val('Searching');
        if (subscription.ready()) {
            $('#searchbutton').val('Search');
            instance.loaded.set(limit);
        } else {
            console.log("> Subscription is not ready yet. \n\n");
        }
    });

    instance.testing = function () {
        return Collection.find({}, { sort: { id: -1 }, limit: instance.loaded.get() });
    };
});
And here is my Meteor server code:
Meteor.publish('sub_testing', function (limit, keyword, searchByLocation, startDate, endDate) {
    Meteor._sleepForMs(200);
    var pat = ".*" + keyword + ".*";
    var pat2 = ".*" + searchByLocation + ".*";
    return Jobstesting.find({
        $or: [
            { title: { $regex: pat, $options: "i" } },
            { company: { $regex: pat, $options: "i" } },
            { description: { $regex: pat, $options: "i" } },
            { location: { $regex: pat2, $options: "i" } },
            { country: { $regex: pat2, $options: "i" } }
        ],
        date_posted: { $gte: endDate, $lt: startDate }
    }, { sort: { date_posted: -1 }, limit: limit, skip: limit });
});
One point I'd also like to mention here: I use "Load More" pagination, and by default the limit parameter gets 20 records. On each "Load More" click I increment the limit parameter by 20, so on the first click it is 20, on the second click 40, and so on...
Any help with where I'm going wrong would be appreciated.
But when I use it in Meteor client, the results take almost 40 seconds to return from server.
You may be misunderstanding how Meteor is accessing your data.
Queries run on the client are processed on the client.
Meteor.publish - Makes data available on the server
Meteor.subscribe - Downloads that data from the server to the client.
Collection.find - Looks through the data on the client.
If you think the Meteor side is slow, you should time it server side (print time before/after) and file a bug.
If you're implementing a pager, you might try a Meteor method instead, or a pager package.
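A minimal sketch of the method approach, under the caveat that the method name and the client-side handling are assumptions (collection and field names mirror the question):

// Server: run the search where the data lives and return only one page.
Meteor.methods({
    searchJobs: function (keyword, limit) {
        check(keyword, String);
        check(limit, Number);
        var pat = new RegExp(keyword, 'i');
        return Jobstesting.find(
            { $or: [{ title: pat }, { company: pat }] },
            { sort: { date_posted: -1 }, limit: limit }
        ).fetch();
    }
});

// Client: fetch a page on demand instead of syncing the whole result set.
Meteor.call('searchJobs', 'java', 20, function (err, results) {
    if (!err) Session.set('searchResults', results);
});

Unlike a subscription, this ships a plain array once, with no live-sync bookkeeping on the server for thousands of documents.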
I have the following config:
akka {
  actor {
    deployment {
      /my-router {
        dispatcher = akka.actor.my-dispatcher
        router = round-robin-pool
        nr-of-instances = 100
        cluster {
          enabled = on
          max-nr-of-instances-per-node = 30
        }
      }
    }
    my-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = 4
        parallelism-factor = 2.0
        parallelism-max = 20
      }
    }
  }
}
I've found (with the help of VisualVM) that no threads of my-dispatcher are being used. However, if I specify my-dispatcher via .withDispatcher("akka.actor.my-dispatcher") when I create the Props for my router via FromConfig, I can observe those threads. I can state that I observe them because I see threads with names like actorSystemName-akka.actor.my-dispatcher-8.
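For concreteness, this is the programmatic variant being described (Worker is a placeholder routee class; the actor name "my-router" must match the deployment section):

// Router created from config, with the dispatcher attached in code.
ActorRef router = system.actorOf(
        FromConfig.getInstance()
                .withDispatcher("akka.actor.my-dispatcher")
                .props(Props.create(Worker.class)),
        "my-router");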
So my questions are:
How do I set a dispatcher for a router via config?
Will this dispatcher be used for the routees (which are obviously children of the router)?
What are the differences between specifying the dispatcher via config and via withDispatcher?
I've also tried surrounding the config's dispatcher setting in quotes, but still didn't observe threads with the dispatcher's name in VisualVM. So, do thread names follow the pattern {actorSystemName}-{dispatcher}-{number}?
EDIT
I've found that the pool-dispatcher property can be used to set the router's dispatcher for its children (routees). But FromConfig, which extends Pool, does not override the usePoolDispatcher method. So one more question: is this (usePoolDispatcher not being overridden in FromConfig) intentional, or is FromConfig simply not designed for such usage?