How to configure different batch sizes for different queues in WebJobs v3?

There is some documentation on configuring different batch sizes for different queues using a custom queue processor in WebJobs v2 on .NET Framework. How is this handled in WebJobs v3?
var builder = new HostBuilder()
    .UseEnvironment("Development")
    .ConfigureWebJobs(b =>
    {
        b.AddAzureStorageCoreServices();
        b.AddAzureStorage(a =>
        {
            a.BatchSize = 1;
        });
    });
This batch size applies to every QueueTrigger in the code. How can I use different values for different queues?

If you want to set BatchSize per queue, you can implement an IQueueProcessorFactory:
using Microsoft.Azure.WebJobs.Host.Queues;

public class CustomQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        if (context.Queue.Name.Equals("fooqueue"))
        {
            // demonstrates how batch processing behavior can be customized
            // per queue (as opposed to the global settings that apply to ALL queues)
            context.BatchSize = 3;
            context.NewBatchThreshold = 4;
            // ...
        }
        return new QueueProcessor(context);
    }
}
In this case, all queues use your default BatchSize configuration, except "fooqueue", whose queue trigger gets a BatchSize of 3.
Register CustomQueueProcessorFactory in your ConfigureServices call like this:
builder.ConfigureServices((services) =>
{
    services.AddSingleton<IQueueProcessorFactory, CustomQueueProcessorFactory>();
});
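If several queues need their own settings, the single if statement in Create can grow into a lookup table keyed by queue name, with the global values as the fallback. The sketch below shows only that lookup, written in plain Java for illustration rather than against the WebJobs C# API; QueueBatchSettings and PerQueueSettings are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the two per-queue knobs used in the answer. */
final class QueueBatchSettings {
    final int batchSize;
    final int newBatchThreshold;

    QueueBatchSettings(int batchSize, int newBatchThreshold) {
        this.batchSize = batchSize;
        this.newBatchThreshold = newBatchThreshold;
    }
}

/** Resolves settings per queue name, falling back to the global defaults. */
final class PerQueueSettings {
    private final QueueBatchSettings defaults;
    private final Map<String, QueueBatchSettings> overrides;

    PerQueueSettings(QueueBatchSettings defaults, Map<String, QueueBatchSettings> overrides) {
        this.defaults = defaults;
        this.overrides = new HashMap<>(overrides); // defensive copy
    }

    /** Returns the override registered for queueName, or the defaults if there is none. */
    QueueBatchSettings forQueue(String queueName) {
        return overrides.getOrDefault(queueName, defaults);
    }
}
```

In CustomQueueProcessorFactory.Create, the C# equivalent would look up context.Queue.Name in such a table and copy BatchSize and NewBatchThreshold onto context before returning new QueueProcessor(context).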

Related

Can someone please explain the proper usage of Timers and Triggers in Apache Beam?

I'm looking for examples of using triggers and timers in Apache Beam. In Python, I want to use processing-time timers to read my data from Pub/Sub every 5 minutes, and processing-time triggers to process all the data collected over an hour together.
Please take a look at the following resources: "Stateful processing with Apache Beam" and "Timely (and Stateful) Processing with Apache Beam".
The first blog post covers state handling in general, and the second has examples of buffering elements and flushing them after a certain period of time, which seems similar to what you are trying to do.
A full example was requested. Here is what I was able to come up with (in Java):
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.joda.time.Duration;

PCollection<String> records =
    pipeline.apply(
        "ReadPubsub",
        PubsubIO.readStrings()
            .fromSubscription("projects/{project}/subscriptions/{subscription}"));
TupleTag<Iterable<String>> every5MinTag = new TupleTag<>();
TupleTag<Iterable<String>> everyHourTag = new TupleTag<>();
PCollectionTuple timersTuple =
    records
        // State and timers require a keyed PCollection (KV<>). A constant key keeps the
        // example simple; keying by a field of the data is more appropriate in practice.
        .apply("WithKeys", WithKeys.of(1))
        .apply(
            "Batch",
            ParDo.of(
                    new DoFn<KV<Integer, String>, Iterable<String>>() {
                      @StateId("buffer5Min")
                      private final StateSpec<BagState<String>> bufferedEvents5Min =
                          StateSpecs.bag();
                      @StateId("count5Min")
                      private final StateSpec<ValueState<Integer>> countState5Min =
                          StateSpecs.value();
                      @TimerId("every5Min")
                      private final TimerSpec every5MinSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
                      @StateId("bufferHour")
                      private final StateSpec<BagState<String>> bufferedEventsHour =
                          StateSpecs.bag();
                      @StateId("countHour")
                      private final StateSpec<ValueState<Integer>> countStateHour =
                          StateSpecs.value();
                      @TimerId("everyHour")
                      private final TimerSpec everyHourSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

                      @ProcessElement
                      public void process(
                          @Element KV<Integer, String> record,
                          @StateId("count5Min") ValueState<Integer> count5MinState,
                          @StateId("countHour") ValueState<Integer> countHourState,
                          @StateId("buffer5Min") BagState<String> buffer5Min,
                          @StateId("bufferHour") BagState<String> bufferHour,
                          @TimerId("every5Min") Timer every5MinTimer,
                          @TimerId("everyHour") Timer everyHourTimer) {
                        // The first element of a batch arms the timer at the next 5-minute boundary.
                        Integer stored5Min = count5MinState.read();
                        int count5Min = stored5Min == null ? 0 : stored5Min;
                        if (count5Min == 0) {
                          every5MinTimer
                              .offset(Duration.standardMinutes(5))
                              .align(Duration.standardMinutes(5))
                              .setRelative();
                        }
                        buffer5Min.add(record.getValue());
                        count5MinState.write(count5Min + 1);

                        // Same for the hourly buffer.
                        Integer storedHour = countHourState.read();
                        int countHour = storedHour == null ? 0 : storedHour;
                        if (countHour == 0) {
                          everyHourTimer
                              .offset(Duration.standardMinutes(60))
                              .align(Duration.standardMinutes(60))
                              .setRelative();
                        }
                        bufferHour.add(record.getValue());
                        countHourState.write(countHour + 1);
                      }

                      @OnTimer("every5Min")
                      public void onTimerEvery5Min(
                          OnTimerContext context,
                          @StateId("buffer5Min") BagState<String> bufferState,
                          @StateId("count5Min") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(every5MinTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }

                      @OnTimer("everyHour")
                      public void onTimerEveryHour(
                          OnTimerContext context,
                          @StateId("bufferHour") BagState<String> bufferState,
                          @StateId("countHour") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(everyHourTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }
                    })
                .withOutputTags(every5MinTag, TupleTagList.of(everyHourTag)));
timersTuple
    .get(every5MinTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every 5 min>>);
timersTuple
    .get(everyHourTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<< do something every hour>>);
pipeline.run().waitUntilFinish();
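To see the intended buffering behavior of the DoFn above without running a pipeline, here is a plain-Java simulation (no Beam dependency) for a single key: every element goes into a 5-minute buffer and an hourly buffer, the first element of each batch arms a deadline at the next period boundary, and when the deadline is reached the buffer is emitted and cleared. AlignedBuffer and TwoRateBatcher are hypothetical names, and time is passed in explicitly as epoch milliseconds instead of coming from a real processing-time clock.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates one BagState buffer plus its processing-time timer for a single key. */
final class AlignedBuffer {
    private final long periodMillis;
    private final List<String> buffer = new ArrayList<>();
    private long deadlineMillis = -1; // -1 means the timer is not armed

    AlignedBuffer(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    /** Buffers a value; the first value of a batch arms the timer at the next period boundary. */
    void add(String value, long nowMillis) {
        if (buffer.isEmpty()) {
            deadlineMillis = nowMillis - (nowMillis % periodMillis) + periodMillis;
        }
        buffer.add(value);
    }

    /** If the timer is due at nowMillis, returns the buffered batch and clears it; else null. */
    List<String> fireIfDue(long nowMillis) {
        if (buffer.isEmpty() || nowMillis < deadlineMillis) {
            return null;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        deadlineMillis = -1;
        return batch;
    }
}

/** Feeds every element into a 5-minute buffer and an hourly buffer. */
final class TwoRateBatcher {
    final AlignedBuffer every5Min = new AlignedBuffer(5 * 60 * 1000L);
    final AlignedBuffer everyHour = new AlignedBuffer(60 * 60 * 1000L);

    void process(String value, long nowMillis) {
        every5Min.add(value, nowMillis);
        everyHour.add(value, nowMillis);
    }
}
```

One difference to keep in mind: here the caller decides when to check the deadlines, whereas in Beam the runner fires the timers; the simulation only illustrates the buffering and alignment logic.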

Parallel HTTP requests with limited concurrency in redux-observable epic using rxjs

I've been trying to solve this issue for a while.
Currently I have an array of objects (I call them tiles), which is pretty big.
I have an API endpoint to which I should send these objects one by one; this API returns nothing, just a status.
I need to send these objects to the endpoint in parallel with limited concurrency, and when the last of them succeeds I should emit a string value that goes to the Redux store.
import { ofType } from 'redux-observable';
import { defer, from } from 'rxjs';
import { map, mergeAll } from 'rxjs/operators';

const tilesEpic = (action$, _state$) =>
  action$.pipe(
    ofType('TILE_ACTION'),
    map(tilesArray => postTilesConcurrently(tilesArray)),
    map(someId => someReduxAction(someId)),
  );

const postTilesConcurrently = (tilesArray) => {
  const tilesToObservables = tilesArray.map(tile => defer(() => postTile(tile)));
  return from(tilesToObservables).pipe(mergeAll(concurrencyLimit));
};
The problem is that I don't know how to emit someId from postTilesConcurrently only once; right now the action is dispatched after each request completes.
mergeAll() subscribes to the sources in parallel (up to the concurrency limit, if one is given), but it also emits each result immediately. Instead, you could use forkJoin(), for example (the toArray() operator would work as well).
forkJoin(tilesToObservables)
  .pipe(
    map(results => results???), // Get `someId` somehow from results
  );
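forkJoin() will emit just once, after all source Observables have emitted at least once and completed. This means that for each source Observable you'll get only the last value it emitted. Note that forkJoin() subscribes to all of its sources at the same time, so it does not apply concurrencyLimit; if the limit matters, keep mergeAll(concurrencyLimit) and add toArray() after it.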
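The shape the answer describes (run the requests with limited concurrency, then emit a single value once everything has completed) can be checked in isolation. Here is a plain-Java analogue, not RxJS: a fixed thread pool of size concurrencyLimit plays the role of mergeAll(concurrencyLimit), and waiting on every future before building one result plays the role of toArray() or forkJoin(). ConcurrentPoster and postAll are hypothetical names; post stands in for postTile, and onAllDone for deriving someId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ConcurrentPoster {
    /**
     * Calls post(tile) for every tile with at most concurrencyLimit calls in flight,
     * waits until all of them have finished, and only then builds one value with onAllDone.
     * If any call fails, the failure is rethrown (much like forkJoin erroring).
     */
    static <T, R, S> S postAll(List<T> tiles, int concurrencyLimit,
                               Function<T, R> post, Function<List<R>, S> onAllDone) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrencyLimit);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T tile : tiles) {
                tasks.add(() -> post.apply(tile));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task is done; the futures keep the input order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task's exception as ExecutionException
            }
            return onAllDone.apply(results);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for posts", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a post failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because onAllDone runs once after the last request finishes, only one "action" is produced, no matter how many tiles there are.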
After Martin's reply, I adjusted my code to use forkJoin:
const tilesEpic = (action$, _state$) =>
  action$.pipe(
    ofType('TILE_ACTION'),
    concatMap(tilesArray => postTilesConcurrently(tilesArray)),
    map(({ someId }) => someReduxAction(someId)),
  );

const postTilesConcurrently = (tilesArray) => {
  const tilesToObservables = tilesArray.map(tile => defer(() => postTile(tile)));
  return forkJoin({
    images: from(tilesToObservables).pipe(mergeAll(concurrencyLimit)),
    someId: from([someId]),
  });
};

Write more than 25 items using BatchWriteItemEnhancedRequest Dynamodb JAVA SDK 2

I have a List of items to be inserted into a DynamoDB table. The size of the list may vary from 100 to 10k. I'm looking for an optimized way to batch-write all the items using BatchWriteItemEnhancedRequest (Java SDK 2). What is the best way to add the items to the WriteBatch builder and then write the request using BatchWriteItemEnhancedRequest?
My current code:
WriteBatch.Builder<T> builder = WriteBatch.builder(itemClass) // itemClass: the Class<T> of the items (name assumed)
        .mappedTableResource(getTable());
items.forEach(item -> { builder.addPutItem(item); });
BatchWriteItemEnhancedRequest bwr = BatchWriteItemEnhancedRequest.builder().writeBatches(builder.build()).build();
BatchWriteResult batchWriteResult =
        DynamoDB.enhancedClient().batchWriteItem(getBatchWriteItemEnhancedRequest(builder));
do {
    // Check for unprocessed keys which could happen if you exceed
    // provisioned throughput
    List<T> unprocessedItems = batchWriteResult.unprocessedPutItemsForTable(getTable());
    if (unprocessedItems.size() != 0) {
        unprocessedItems.forEach(unprocessedItem -> {
            builder.addPutItem(unprocessedItem);
        });
        batchWriteResult = DynamoDB.enhancedClient().batchWriteItem(getBatchWriteItemEnhancedRequest(builder));
    }
} while (batchWriteResult.unprocessedPutItemsForTable(getTable()).size() > 0);
I'm looking for batching logic and a better way to execute the BatchWriteItemEnhancedRequest.
I came up with a utility class to deal with that. The "batches of batches" approach in SDK v2 is overly complex for most use cases, especially since a single request is still limited to 25 items overall.
import com.google.common.collect.Lists;
import java.util.List;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbEnhancedClient;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.model.BatchWriteItemEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.model.WriteBatch;

public class DynamoDbUtil {
    // DynamoDB rejects a BatchWriteItem request with more than 25 put/delete requests in total,
    // however they are split across sub-batches
    private static final int MAX_DYNAMODB_BATCH_SIZE = 25;

    /**
     * Writes the list of items to the specified DynamoDB table.
     */
    public static <T> void batchWrite(Class<T> itemType, List<T> items, DynamoDbEnhancedClient client, DynamoDbTable<T> table) {
        // Guava's Lists.partition returns a List of consecutive sublists, not a Stream
        List<List<T>> chunksOfItems = Lists.partition(items, MAX_DYNAMODB_BATCH_SIZE);
        chunksOfItems.forEach(chunkOfItems -> {
            List<T> unprocessedItems = batchWriteImpl(itemType, chunkOfItems, client, table);
            while (!unprocessedItems.isEmpty()) {
                // some failed (provisioning problems, etc.), so write those again
                unprocessedItems = batchWriteImpl(itemType, unprocessedItems, client, table);
            }
        });
    }

    /**
     * Writes a single batch of (at most) 25 items to DynamoDB.
     * Note that the overall limit of items in a batch is 25, so you can't have nested batches
     * of 25 each that would exceed that overall limit.
     *
     * @return those items that couldn't be written due to provisioning issues, etc., but were otherwise valid
     */
    private static <T> List<T> batchWriteImpl(Class<T> itemType, List<T> chunkOfItems, DynamoDbEnhancedClient client, DynamoDbTable<T> table) {
        WriteBatch.Builder<T> subBatchBuilder = WriteBatch.builder(itemType).mappedTableResource(table);
        chunkOfItems.forEach(subBatchBuilder::addPutItem);
        BatchWriteItemEnhancedRequest.Builder overallBatchBuilder = BatchWriteItemEnhancedRequest.builder();
        overallBatchBuilder.addWriteBatch(subBatchBuilder.build());
        return client.batchWriteItem(overallBatchBuilder.build()).unprocessedPutItemsForTable(table);
    }
}
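The partition-then-retry logic can be tested without AWS by hiding the actual write behind a small interface. In this sketch, BatchWriter is a hypothetical stand-in for batchWriteImpl (it returns the items that came back unprocessed), Guava's Lists.partition is replaced by a plain subList loop, and an exponential backoff between retries is added, as the DynamoDB BatchWriteItem documentation recommends for unprocessed items. The maxAttempts cap and the delay values are assumptions to tune.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one BatchWriteItem call: returns the items left unprocessed. */
interface BatchWriter<T> {
    List<T> writeBatch(List<T> batch);
}

final class ChunkedBatchWriter {
    static final int MAX_BATCH_SIZE = 25; // DynamoDB BatchWriteItem limit per request

    /** Splits items into consecutive chunks of at most `size` elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            chunks.add(new ArrayList<>(items.subList(start, Math.min(start + size, items.size()))));
        }
        return chunks;
    }

    /**
     * Writes all items in chunks of 25, retrying each chunk's unprocessed items with
     * exponential backoff. Throws if a chunk still has unprocessed items after maxAttempts.
     */
    static <T> void writeAll(List<T> items, BatchWriter<T> writer, int maxAttempts, long baseDelayMillis) {
        for (List<T> chunk : partition(items, MAX_BATCH_SIZE)) {
            List<T> pending = chunk;
            for (int attempt = 1; !pending.isEmpty(); attempt++) {
                if (attempt > maxAttempts) {
                    throw new IllegalStateException(pending.size() + " items still unprocessed");
                }
                if (attempt > 1) {
                    sleep(baseDelayMillis << (attempt - 2)); // base, 2*base, 4*base, ...
                }
                pending = writer.writeBatch(pending);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }
}
```

With the real client, writeBatch would build one WriteBatch from the list and return unprocessedPutItemsForTable(table), exactly as batchWriteImpl does.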

Delaying actions using Decentraland's ECS

How do I make an action occur after a delay, like a timeout?
The setTimeout() function doesn’t work in Decentraland scenes, so is there an alternative?
For example, I want an entity to wait 300 milliseconds after it’s clicked before I remove it from the engine.
To implement this you’ll have to create:
A custom component to keep track of time
A component group to keep track of all the entities with a delay in the scene
A system that updates the timers on all these components on each frame.
It sounds rather complicated, but once you've created one delay, implementing another delay only takes one line.
The component:
@Component("timerDelay")
export class Delay implements ITimerComponent {
  elapsedTime: number;
  targetTime: number;
  onTargetTimeReached: (ownerEntity: IEntity) => void;
  private onTimeReachedCallback?: () => void

  /**
   * @param millisecs amount of time in milliseconds
   * @param onTimeReachedCallback callback for when time is reached
   */
  constructor(millisecs: number, onTimeReachedCallback?: () => void) {
    this.elapsedTime = 0
    this.targetTime = millisecs / 1000
    this.onTimeReachedCallback = onTimeReachedCallback
    this.onTargetTimeReached = (entity) => {
      if (this.onTimeReachedCallback) this.onTimeReachedCallback()
      entity.removeComponent(this)
    }
  }
}
The component group:
export const delayedEntities = engine.getComponentGroup(Delay)
The system:
// define system
class TimerSystem implements ISystem {
  update(dt: number) {
    // iterate over a copy: removing a Delay component may change the group while looping
    for (let entity of delayedEntities.entities.slice()) {
      let timerComponent = entity.getComponent(Delay)
      timerComponent.elapsedTime += dt
      if (timerComponent.elapsedTime >= timerComponent.targetTime) {
        timerComponent.onTargetTimeReached(entity)
      }
    }
  }
}
// instance system
engine.addSystem(new TimerSystem())
Once all these parts are in place, you can simply do the following to delay an execution in your scene:
const myEntity = new Entity()
myEntity.addComponent(new Delay(1000, () => {
log("time ran out")
}))
engine.addEntity(myEntity)
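The same elapsed-time pattern can be exercised outside Decentraland. This plain-Java sketch (not the Decentraland API) keeps the logic of the component and system above: each pending delay accumulates the frame time dt passed to update(), and once it reaches its target the callback runs and the delay is removed. Entities are reduced to a list of pending delays, DelayTimer and TimerSystem are hypothetical names, and, as in the TypeScript code, dt is in seconds, so the millisecond delay is divided by 1000.

```java
import java.util.ArrayList;
import java.util.List;

/** Mirrors the Delay component: tracks elapsed seconds against a target. */
final class DelayTimer {
    double elapsedTime = 0;
    final double targetTime; // seconds
    final Runnable onTimeReached;

    DelayTimer(long millisecs, Runnable onTimeReached) {
        this.targetTime = millisecs / 1000.0;
        this.onTimeReached = onTimeReached;
    }
}

/** Mirrors TimerSystem: advances every pending delay by dt on each frame. */
final class TimerSystem {
    private final List<DelayTimer> delayed = new ArrayList<>();

    void add(DelayTimer timer) {
        delayed.add(timer);
    }

    int pendingCount() {
        return delayed.size();
    }

    /** Called once per frame with the frame time dt in seconds. */
    void update(double dt) {
        // Iterate over a snapshot so callbacks can safely add new delays.
        for (DelayTimer timer : new ArrayList<>(delayed)) {
            timer.elapsedTime += dt;
            if (timer.elapsedTime >= timer.targetTime) {
                timer.onTimeReached.run();
                delayed.remove(timer); // like entity.removeComponent(this)
            }
        }
    }
}
```

Because the callback fires on the first frame where the accumulated time reaches the target, the actual delay is rounded up to a whole number of frames.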
A few years late, but the accepted answer is somewhat outdated, because you can now accomplish a delay with the decentraland-ecs-utils library:
import { Delay } from "node_modules/decentraland-ecs-utils/timer/component/delay"
const ent = new Entity()
ent.addComponent(new Delay(3 * 1000, () => {
  // this code will run when time is up
}))
engine.addEntity(ent) // the entity has to be in the engine for its Delay to be updated
Read the docs.
Use the utils.Delay component from the utils library.
Its constructor just takes the delay time in milliseconds and the function you want to execute.
Here's the full documentation, explaining how to add the library and how to use this component, including example code:
https://www.npmjs.com/package/decentraland-ecs-utils

Schedule/batch for large number of webservice callouts?

I'm new to Apex, and I have to call a web service for every account (some thousands of accounts).
Usually a single web service request takes 500 to 5000 ms.
As far as I know, schedulable and batchable classes are required for this task.
My idea was to group the accounts by country code (Europe only) and start a batch for every group.
The first batch is started by the schedulable class; the next ones are started in the batch's finish method:
global class AccValidator implements Database.Batchable<sObject>, Database.AllowsCallouts {
    private List<String> countryCodes;
    private Integer countryIndex;

    global AccValidator(List<String> countryCodes, Integer countryIndex) {
        this.countryCodes = countryCodes;
        this.countryIndex = countryIndex;
        // ...
    }

    // Get Accounts for current country code
    global Database.QueryLocator start(Database.BatchableContext bc) { /* ... */ }

    global void execute(Database.BatchableContext bc, List<Account> myAccounts) {
        for (Account acc : myAccounts) {
            // Callout for every Account
            HttpRequest request = new HttpRequest(); // ... (request setup elided)
            Http http = new Http();
            HttpResponse response = http.send(request);
            // ...
        }
    }

    global void finish(Database.BatchableContext BC) {
        if (this.countryIndex < this.countryCodes.size() - 1) {
            // start next batch
            Database.executeBatch(new AccValidator(this.countryCodes, this.countryIndex + 1), 200);
        }
    }

    global static List<String> getCountryCodes() { /* ... */ }
}
And my schedule class:
global class AccValidatorSchedule implements Schedulable {
global void execute(SchedulableContext sc) {
List<String> countryCodes = AccValidator.getCountryCodes();
Id AccAddressID = Database.executeBatch(new AccValidator(countryCodes, 0), 200);
}
}
Now I'm stuck on Salesforce's execution governors and limits:
For nearly all callouts I get the exceptions "Read timed out" or "Exceeded maximum time allotted for callout (120000 ms)".
I also tried asynchronous callouts, but they don't work with batches.
So, is there any way to schedule a large number of callouts?
Have you tried limiting the batch scope (the number of records passed to each execute() call) to 100? Salesforce allows only 100 callouts per transaction. For example:
Id AccAddressID = Database.executeBatch(new AccValidator(countryCodes, 0), 100);
Perhaps this might help you:
https://salesforce.stackexchange.com/questions/131448/fatal-errorsystem-limitexception-too-many-callouts-101
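Besides the 100-callout limit, Apex also caps the cumulative time of all callouts in one transaction at 120 seconds, which looks like the source of the "Exceeded maximum time allotted for callout (120000 ms)" error. A rough way to size the batch scope is to take the smaller of 100 and 120000 divided by the worst-case request time. The sketch below only does that arithmetic, in plain Java rather than Apex; CalloutBudget and maxScope are hypothetical names, it assumes one callout per record, and the limit values should be checked against the current Salesforce governor limits documentation.

```java
/** Picks a batch scope (records per execute call) that stays within both callout limits. */
final class CalloutBudget {
    static final int MAX_CALLOUTS_PER_TRANSACTION = 100;
    static final long MAX_CUMULATIVE_CALLOUT_MILLIS = 120_000L;

    /** Assumes one callout per record and that every callout may take worstCaseMillis. */
    static int maxScope(long worstCaseMillis) {
        if (worstCaseMillis <= 0) {
            throw new IllegalArgumentException("worstCaseMillis must be positive");
        }
        long byTime = MAX_CUMULATIVE_CALLOUT_MILLIS / worstCaseMillis; // integer division rounds down
        // Never below 1 record, never above the callout-count limit.
        return (int) Math.max(1, Math.min(MAX_CALLOUTS_PER_TRANSACTION, byTime));
    }
}
```

With the question's worst case of 5000 ms per request, maxScope(5000) gives 24, so the schedulable class (and finish()) would call Database.executeBatch(new AccValidator(countryCodes, 0), 24); at 500 ms per request the full 100 fits. The "Read timed out" errors are a separate limit: a single callout times out after 10 seconds by default, and HttpRequest.setTimeout can raise that to at most 120000 ms.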