Schedule/batch for large number of webservice callouts?

I'm new to Apex and I have to call a web service for every account (some thousands of accounts).
Usually a single web service request takes 500 to 5000 ms.
As far as I know, schedulable and batchable classes are required for this task.
My idea was to group the accounts by country code (Europe only) and start a batch for every group.
The first batch is started by the schedulable class; the following ones are started in the batch's finish method:
global class AccValidator implements Database.Batchable<sObject>, Database.AllowsCallouts {
    private List<String> countryCodes;
    private Integer countryIndex;
    global AccValidator(List<String> countryCodes, Integer countryIndex) {
        this.countryCodes = countryCodes;
        this.countryIndex = countryIndex;
        ...
    }
    // Get Accounts for current country code
    global Database.QueryLocator start(Database.BatchableContext bc) {...}
    global void execute(Database.BatchableContext bc, List<Account> myAccounts) {
        for (Integer i = 0; i < myAccounts.size(); i++) {
            // Callout for every Account in the current scope
            HttpRequest request ...
            Http http = new Http();
            HttpResponse response = http.send(request);
            ...
        }
    }
    global void finish(Database.BatchableContext bc) {
        if (this.countryIndex < this.countryCodes.size() - 1) {
            // start next batch
            Database.executeBatch(new AccValidator(this.countryCodes, this.countryIndex + 1), 200);
        }
    }
    global static List<String> getCountryCodes() {...}
}
And my schedule class:
global class AccValidatorSchedule implements Schedulable {
    global void execute(SchedulableContext sc) {
        List<String> countryCodes = AccValidator.getCountryCodes();
        Id AccAddressID = Database.executeBatch(new AccValidator(countryCodes, 0), 200);
    }
}
Now I'm stuck on Salesforce's execution governors and limits:
For nearly all callouts I get the exceptions "Read timed out" or "Exceeded maximum time allotted for callout (120000 ms)".
I also tried asynchronous callouts, but they don't work with batches.
So, is there any way to schedule a large number of callouts?

Have you tried limiting the batch scope size to 100? Salesforce allows only 100 callouts per transaction, and each execute call is one transaction. For example:
Id AccAddressID = Database.executeBatch(new AccValidator(countryCodes, 0), 100);
Perhaps this might help you:
https://salesforce.stackexchange.com/questions/131448/fatal-errorsystem-limitexception-too-many-callouts-101
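
The scope size may also have to respect the callout time budget, not only the callout count. Below is a rough sizing sketch in Python (not Apex). It assumes the 120000 ms figure from the error message is a cumulative budget for all callouts in one transaction, and it uses the 5000 ms worst case from the question. The function name max_scope_size is hypothetical.

```python
def max_scope_size(callout_limit, time_budget_ms, worst_case_ms):
    """Largest batch scope that stays within both the callout-count
    limit and the (assumed cumulative) callout-time budget."""
    by_time = time_budget_ms // worst_case_ms
    return max(1, min(callout_limit, by_time))


# Figures from the thread: 100 callouts per transaction, 120000 ms
# allotted for callouts, 500 to 5000 ms per request.
scope = max_scope_size(callout_limit=100, time_budget_ms=120_000, worst_case_ms=5_000)
print(scope)  # 24
```

Under these assumptions a scope of around 20 would leave some headroom, so Database.executeBatch(..., 20) could be worth trying before 100.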

Related

Can someone please explain the proper usage of Timers and Triggers in Apache Beam?

I'm looking for examples of how to use triggers and timers in Apache Beam. I want to use processing-time timers to collect my data from Pub/Sub every 5 minutes, and processing-time triggers to process the data collected over an hour all together, in Python.

Please take a look at the following resources: Stateful processing with Apache Beam and Timely (and Stateful) Processing with Apache Beam
The first blog post is more general in how to handle states for context, and the second has some examples on buffering and triggering after a certain period of time, which seems similar to what you are trying to do.
A full example was requested. Here is what I was able to come up with:
PCollection<String> records =
    pipeline.apply(
        "ReadPubsub",
        PubsubIO.readStrings()
            .fromSubscription(
                "projects/{project}/subscriptions/{subscription}"));
TupleTag<Iterable<String>> every5MinTag = new TupleTag<>();
TupleTag<Iterable<String>> everyHourTag = new TupleTag<>();
PCollectionTuple timersTuple =
    records
        .apply("WithKeys", WithKeys.of(1)) // A KV<> is required to use state. Keying by data is more appropriate than a hardcoded key.
        .apply(
            "Batch",
            ParDo.of(
                    new DoFn<KV<Integer, String>, Iterable<String>>() {
                      @StateId("buffer5Min")
                      private final StateSpec<BagState<String>> bufferedEvents5Min =
                          StateSpecs.bag();
                      @StateId("count5Min")
                      private final StateSpec<ValueState<Integer>> countState5Min =
                          StateSpecs.value();
                      @TimerId("every5Min")
                      private final TimerSpec every5MinSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
                      @StateId("bufferHour")
                      private final StateSpec<BagState<String>> bufferedEventsHour =
                          StateSpecs.bag();
                      @StateId("countHour")
                      private final StateSpec<ValueState<Integer>> countStateHour =
                          StateSpecs.value();
                      @TimerId("everyHour")
                      private final TimerSpec everyHourSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
                      @ProcessElement
                      public void process(
                          @Element KV<Integer, String> record,
                          @StateId("count5Min") ValueState<Integer> count5MinState,
                          @StateId("countHour") ValueState<Integer> countHourState,
                          @StateId("buffer5Min") BagState<String> buffer5Min,
                          @StateId("bufferHour") BagState<String> bufferHour,
                          @TimerId("every5Min") Timer every5MinTimer,
                          @TimerId("everyHour") Timer everyHourTimer) {
                        // Arm each timer only for the first element after a flush;
                        // the counts are written back so later elements do not re-arm it.
                        // MoreObjects is Guava's replacement for the removed Objects.firstNonNull.
                        int count5Min = MoreObjects.firstNonNull(count5MinState.read(), 0);
                        if (count5Min == 0) {
                          every5MinTimer
                              .offset(Duration.standardMinutes(5))
                              .align(Duration.standardMinutes(5))
                              .setRelative();
                        }
                        count5MinState.write(count5Min + 1);
                        buffer5Min.add(record.getValue());
                        int countHour = MoreObjects.firstNonNull(countHourState.read(), 0);
                        if (countHour == 0) {
                          everyHourTimer
                              .offset(Duration.standardMinutes(60))
                              .align(Duration.standardMinutes(60))
                              .setRelative();
                        }
                        countHourState.write(countHour + 1);
                        bufferHour.add(record.getValue());
                      }
                      @OnTimer("every5Min")
                      public void onTimerEvery5Min(
                          OnTimerContext context,
                          @StateId("buffer5Min") BagState<String> bufferState,
                          @StateId("count5Min") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(every5MinTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }
                      @OnTimer("everyHour")
                      public void onTimerEveryHour(
                          OnTimerContext context,
                          @StateId("bufferHour") BagState<String> bufferState,
                          @StateId("countHour") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(everyHourTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }
                    })
                .withOutputTags(every5MinTag, TupleTagList.of(everyHourTag)));
timersTuple
    .get(every5MinTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every 5 min>>);
timersTuple
    .get(everyHourTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<< do something every hour>>);
pipeline.run().waitUntilFinish();

Delaying actions using Decentraland's ECS

How do I make an action occur after a delay?
The setTimeout() function doesn’t work in Decentraland scenes, so is there an alternative?
For example, I want an entity to wait 300 milliseconds after it’s clicked before I remove it from the engine.

To implement this you’ll have to create:
A custom component to keep track of time
A component group to keep track of all the entities with a delay in the scene
A system that updates the timers on all these components on each frame
It sounds rather complicated, but once you have created one delay, implementing another takes only one line.
The component:
@Component("timerDelay")
export class Delay implements ITimerComponent{
    elapsedTime: number;
    targetTime: number;
    onTargetTimeReached: (ownerEntity: IEntity) => void;
    private onTimeReachedCallback?: ()=> void
    /**
     * @param millisecs amount of time in milliseconds
     * @param onTimeReachedCallback callback for when time is reached
     */
    constructor(millisecs: number, onTimeReachedCallback?: ()=> void){
        this.elapsedTime = 0
        this.targetTime = millisecs / 1000
        this.onTimeReachedCallback = onTimeReachedCallback
        this.onTargetTimeReached = (entity)=>{
            if (this.onTimeReachedCallback) this.onTimeReachedCallback()
            entity.removeComponent(this)
        }
    }
}
The component group:
export const delayedEntities = engine.getComponentGroup(Delay)
The system:
// define system
class TimerSystem implements ISystem {
    update(dt: number){
        for (let entity of delayedEntities.entities) {
            // look up the Delay component defined above
            let timerComponent = entity.getComponent(Delay)
            timerComponent.elapsedTime += dt
            if (timerComponent.elapsedTime >= timerComponent.targetTime){
                timerComponent.onTargetTimeReached(entity)
            }
        }
    }
}
// instance system
engine.addSystem(new TimerSystem())
Once all these parts are in place, you can simply do the following to delay an execution in your scene:
const myEntity = new Entity()
myEntity.addComponent(new Delay(1000, () => {
log("time ran out")
}))
engine.addEntity(myEntity)
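
The same component-plus-system idea can be sketched outside the Decentraland SDK. The Python below is only an illustration of the pattern, not the SDK's API: the class names mirror the answer, a plain list stands in for the component group, and update is called by hand instead of by the engine on each frame.

```python
class Delay:
    """Timer component: counts elapsed seconds and fires a callback once."""

    def __init__(self, millisecs, on_time_reached=None):
        self.elapsed_time = 0.0
        self.target_time = millisecs / 1000  # seconds, the same unit as dt
        self.on_time_reached = on_time_reached


class TimerSystem:
    """Advances every registered Delay on each simulated frame."""

    def __init__(self):
        self.delays = []  # stands in for the component group

    def add(self, delay):
        self.delays.append(delay)

    def update(self, dt):
        # Iterate over a copy so finished delays can be removed safely.
        for delay in list(self.delays):
            delay.elapsed_time += dt
            if delay.elapsed_time >= delay.target_time:
                self.delays.remove(delay)
                if delay.on_time_reached:
                    delay.on_time_reached()


fired = []
system = TimerSystem()
system.add(Delay(300, lambda: fired.append("removed")))
for _ in range(10):  # simulate 10 frames of 0.1 s each
    system.update(0.1)
print(fired)  # ['removed']
```

Removing the delay before invoking the callback keeps it from firing a second time on later frames, which is the role entity.removeComponent(this) plays in the answer.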

A few years late, but the OP's selected answer is more or less outdated, because you can now accomplish a delay by doing:
import { Delay } from "node_modules/decentraland-ecs-utils/timer/component/delay"
const ent = new Entity
ent.addComponent(new Delay(3 * 1000, () => {
// this code will run when time is up
}))
Read the docs.

Use the utils.Delay() function in the utils library.
This function just takes the delay time in milliseconds, and the function you want to execute.
Here's the full documentation, explaining how to add the library + how to use this function, including example code:
https://www.npmjs.com/package/decentraland-ecs-utils

Calling a Web Service (containing multiple pages) does not load all the pages (without an added sleep delay)

My question is about a strange behaviour I notice both on my iPhone device and in the Codename One simulator (NetBeans).
I invoke the code below, which calls a Google web service to provide a list of food places around a GPS coordinate.
The web service call is as follows (key obscured):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXXXXXXXXXXXXXXXX
Each result contains the next page token and thus, the second call (for the subsequent page) is as follows:
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXXXXXXXXXXXXXXXX&pagetoken=YYYYYYYYYYYYYYYYYY
public static byte[] getWSResponseData(String urlString, boolean usePost)
{
    ConnectionRequest r = new ConnectionRequest();
    r.setUrl(urlString);
    r.setPost(usePost);
    InfiniteProgress prog = new InfiniteProgress();
    Dialog dlg = prog.showInifiniteBlocking();
    r.setDisposeOnCompletion(dlg);
    NetworkManager.getInstance().addToQueueAndWait(r);
    try
    {
        Thread.sleep(2000);
    }
    catch (InterruptedException ex)
    {
    }
    byte[] responseData = r.getResponseData();
    return responseData;
}
public static void getLocationsList(double lat, double lng)
{
    boolean done = false;
    while (!done)
    {
        byte[] responseData = getWSResponseData(finalURL,false);
        result = Result.fromContent(parser.parseJSON(new InputStreamReader(new ByteArrayInputStream(responseData))));
        String venueNames[] = result.getAsStringArray("/results/name");
        nextToken = result.getAsString("/next_page_token");
        if ( nextToken == null || nextToken.equals(""))
            done = true;
        else
            finalURL = completeURL + "&pagetoken=" + nextToken;
    }
    .....
}
This code works fine with the sleep timer, but when I remove the Thread.sleep, only the first page gets called.
Any help would be appreciated.
Using the debugger does not help as this is a timing issue and the issue does not occur when using the debugger.
Also when I put some print statements into the code
while (!done)
{
    String nextToken = null;
    System.out.println(finalURL);
    ...
}
System.out.println("Total Number of entries returned: " + itemCount);
I get the following output:
First Run (WITHOUT SLEEP):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXX
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXX&pagetoken=CqQCF...
Total Number of entries returned: 20
Using the network monitor I see that the response to the second WS call returns:
{
"html_attributions" : [],
"results" : [],
"status" : "INVALID_REQUEST"
}
Which is strange as when I cut and paste the WS URL into my browser, it works fine...
Second Run (WITH SLEEP):
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX&pagetoken=CqQCFQEAA...
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.714353,-74.00597299999998&radius=200&types=food&key=XXXXXXXXX&pagetoken=CsQDtQEAA...
Total Number of entries returned: 60

Well, it seems to be a Google API issue, as indicated here:
Paging on Google Places API returns status INVALID_REQUEST
I still could not get it to work by changing the WS URL with a random parameter as they suggested, but I will keep trying and post something here if I get it to work. For now I will just keep a 2 second delay between the calls which seems to work.
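
A variation on the fixed delay is to wait only when a page comes back with status INVALID_REQUEST and then retry the same token. This retry loop is a swapped-in technique, not something the thread confirms for the Google API. The Python sketch below runs against a fake service; fetch_page, make_fake_api and the retry counts are hypothetical, and only the 2 second delay and the field names come from the thread.

```python
import time


def fetch_all_pages(fetch_page, delay_s=2.0, max_retries=3, sleep=time.sleep):
    """Collect results across pages, retrying a page that is not ready yet.

    fetch_page(token) is a hypothetical callable returning the decoded JSON
    response as a dict; token is None for the first page.
    """
    results, token = [], None
    while True:
        for _ in range(max_retries + 1):
            page = fetch_page(token)
            if page.get("status") != "INVALID_REQUEST":
                break
            sleep(delay_s)  # token not accepted yet: wait, then retry
        else:
            raise RuntimeError("page token never became valid")
        results.extend(page.get("results", []))
        token = page.get("next_page_token")
        if not token:
            return results


def make_fake_api():
    """Stand-in for the web service: each page token fails once before it works."""
    pages = {None: (["a", "b"], "t1"), "t1": (["c"], "t2"), "t2": (["d"], None)}
    seen = set()

    def fetch_page(token):
        if token is not None and token not in seen:
            seen.add(token)
            return {"results": [], "status": "INVALID_REQUEST"}
        names, next_token = pages[token]
        page = {"results": names, "status": "OK"}
        if next_token:
            page["next_page_token"] = next_token
        return page

    return fetch_page


waits = []  # record the requested waits instead of really sleeping
all_results = fetch_all_pages(make_fake_api(), sleep=waits.append)
print(all_results)  # ['a', 'b', 'c', 'd']
```

Passing sleep as a parameter keeps the example fast to run; in real use the default time.sleep would apply the delay.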

Well, I gave up on using the Google WS for this and switched to Yelp, which works very well:
https://api.yelp.com/v3/businesses/search?.....

Koa middleware - generator concurrency testing

I've hit an interesting roadblock while writing unit tests for some middleware: I can't come up with a feasible way to fake two concurrent connections for a generator function that is a piece of Koa middleware.
I have a constructor function that takes some setup options and returns a generator. This generator has access, via closure, to some variables that increment per request and decrement when the requests complete. Here is a subset of the code to give you an idea of what I'm trying to accomplish.
module.exports = function (options = {}) {
    let connections = 0;
    let {
        max = 100
        ...
    } = options;
    return function *() {
        connections++
        ...
        if (connections > max) {
            connections--;
            // callback here
        }
        ...
    }
}
In simple terms, I want to keep track of multiple simultaneous "connections" and fire a callback when the maximum number of requests has been reached. However, in my test I get back a single instance of this generator and can only call it once, mimicking a single request, so I can never meet the connections > max condition.
it("Should trigger callback when max connections reached", () => {
    const gen = middleware({
        max: 1,
        onMax: function (current, max) {
            this.maxReached = true;
        }
    }).call(context);
    gen.next();
    expect(context.maxReached).to.be.true;
});

Sometimes you just need a good night's sleep to dream up your answer. It was simply a matter of calling the same generator function with two different contexts representing two different requests, and storing a value to test against on the latter. The counter still increments because I never return up the middleware chain (the response phase), which is where it would be decremented. It's more of a fake concurrency.
const middleware = limiter({
max: 1,
onMax: function (current, max) {
this.maxReached = true;
}
});
middleware.call(reqContext).next();
middleware.call(secondReqContext).next();
expect(secondReqContext.maxReached).to.be.true;
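
The fake-concurrency trick does not depend on Koa. The sketch below shows the same idea with Python generators: one closure counter is shared, the first "request" is left suspended at its yield, and a second one started meanwhile trips the limit. The names limiter and mark_max_reached are hypothetical, and plain dicts stand in for the request contexts.

```python
def limiter(max_connections=100, on_max=None):
    """Return a generator function whose instances share one counter via closure."""
    connections = 0

    def middleware(context):
        nonlocal connections
        connections += 1
        try:
            if connections > max_connections:
                if on_max:
                    on_max(context, connections, max_connections)
                return
            yield  # stands in for "yield next": the request is in flight
        finally:
            connections -= 1  # runs when the request finishes or is closed

    return middleware


def mark_max_reached(context, current, maximum):
    context["max_reached"] = True


middleware = limiter(max_connections=1, on_max=mark_max_reached)
first_ctx, second_ctx = {}, {}
first = middleware(first_ctx)
next(first)                         # first request stays suspended at yield
next(middleware(second_ctx), None)  # second request arrives meanwhile
print(second_ctx)  # {'max_reached': True}
```

Closing the first generator (first.close()) runs its finally block and frees the slot, which corresponds to returning up the middleware chain.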

MySQL Asynchronous?

I'm basically facing a blocking problem.
My server is written in C++ on Boost.Asio and uses 8 threads, since the machine has 8 logical cores.
My problem is that a thread may block for 0.2~1.5 seconds on a MySQL query, and I honestly don't know how to get around that, since MySQL Connector/C++ does not support asynchronous queries, and I don't know how to design the server "correctly" to use multiple threads for the queries.
This is where I'm asking for opinions on what to do in this case.
Should I create 100 threads for asynchronous SQL queries?
Could I have an opinion from experts about this?

Okay, the proper solution to this would be to extend Asio and write a mysql_service implementation to integrate this. I was almost going to find out how this is done right away, but I wanted to get started using an "emulation".
The idea is to have
your business processes using an io_service (as you are already doing)
a database "facade" interface that dispatches async queries into a different queue (io_service) and posts the completion handler back onto the business_process io_service
A subtle tweak is needed here: you need to keep the io_service on the business process side from shutting down as soon as its job queue is empty, since it might still be awaiting a response from the database layer.
So, modeling this into a quick demo:
namespace database
{
    // data types
    struct sql_statement { std::string dml; };
    struct sql_response { std::string echo_dml; }; // TODO cover response codes, resultset data etc.
I hope you will forgive my gross simplifications :/
    struct service
    {
        service(unsigned max_concurrent_requests = 10)
            : work(io_service::work(service_)),
              latency(mt19937(), uniform_int<int>(200, 1500)) // random 0.2 ~ 1.5s
        {
            for (unsigned i = 0; i < max_concurrent_requests; ++i)
                svc_threads.create_thread(boost::bind(&io_service::run, &service_));
        }
        friend struct connection;
      private:
        void async_query(io_service& external, sql_statement query, boost::function<void(sql_response response)> completion_handler)
        {
            service_.post(bind(&service::do_async_query, this, ref(external), std::move(query), completion_handler));
        }
        void do_async_query(io_service& external, sql_statement q, boost::function<void(sql_response response)> completion_handler)
        {
            this_thread::sleep_for(chrono::milliseconds(latency())); // simulate the latency of a db-roundtrip
            external.post(bind(completion_handler, sql_response { q.dml }));
        }
        io_service service_;
        thread_group svc_threads; // note the order of declaration
        optional<io_service::work> work;
        // for random delay
        random::variate_generator<mt19937, uniform_int<int> > latency;
    };
The service is what coordinates a maximum number of concurrent requests (on the "database io_service" side) and ping/pongs the completion back onto another io_service (the async_query/do_async_query combo). This stub implementation emulates latencies of 0.2~1.5s in the obvious way :)
Now comes the client "facade"
    struct connection
    {
        connection(int connection_id, io_service& external, service& svc)
            : connection_id(connection_id),
              external_(external),
              db_service_(svc)
        { }
        void async_query(sql_statement query, boost::function<void(sql_response response)> completion_handler)
        {
            db_service_.async_query(external_, std::move(query), completion_handler);
        }
      private:
        int connection_id;
        io_service& external_;
        service& db_service_;
    };
} // namespace database
connection is really only a convenience so we don't have to explicitly deal with various queues on the calling site.
Now, let's implement a demo business process in good old Asio style:
namespace domain
{
    struct business_process : id_generator
    {
        business_process(io_service& app_service, database::service& db_service_)
            : id(generate_id()), phase(0),
              in_progress(io_service::work(app_service)),
              db(id, app_service, db_service_)
        {
            app_service.post([=] { start_select(); });
        }
      private:
        int id, phase;
        optional<io_service::work> in_progress;
        database::connection db;
        void start_select() {
            db.async_query({ "select * from tasks where completed = false" }, [=] (database::sql_response r) { handle_db_response(r); });
        }
        void handle_db_response(database::sql_response r) {
            if (phase++ < 4)
            {
                if ((id + phase) % 3 == 0) // vary the behaviour slightly
                {
                    db.async_query({ "insert into tasks (text, completed) values ('hello', false)" }, [=] (database::sql_response r) { handle_db_response(r); });
                } else
                {
                    db.async_query({ "update tasks set text = 'update' where id = 123" }, [=] (database::sql_response r) { handle_db_response(r); });
                }
            } else
            {
                in_progress.reset();
                lock_guard<mutex> lk(console_mx);
                std::cout << "business_process " << id << " has completed its work\n";
            }
        }
    };
}
This business process starts by posting itself on the app service. It then does a number of db queries in succession, and eventually exits (by doing in_progress.reset() the app service is made aware of this).
A demonstration main, starting 10 business processes on a single thread:
int main()
{
    io_service app;
    database::service db;
    ptr_vector<domain::business_process> bps;
    for (int i = 0; i < 10; ++i)
    {
        bps.push_back(new domain::business_process(app, db));
    }
    app.run();
}
In my sample, the business_processes don't do any CPU-intensive work, so there's no use in scheduling them across CPUs, but if you wanted to, you could easily achieve this by replacing the app.run() line with:
thread_group g;
for (unsigned i = 0; i < thread::hardware_concurrency(); ++i)
    g.create_thread(boost::bind(&io_service::run, &app));
g.join_all();
See the demo running Live On Coliru

I'm not a MySQL guru, but the following is generic multithreading advice.
Having NumberOfThreads == NumberOfCores is appropriate when none of the threads ever block and you are just splitting the load over all CPUs.
A common pattern is to have multiple threads per CPU, so one is executing while another is waiting on something.
In your case, I'd be inclined to set NumberOfThreads = n * NumberOfCores, where 'n' is read from a config file, a registry entry or some other user-settable value. You can test the system with different values of 'n' to find the optimum. I'd suggest somewhere around 3 for a first guess.
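
One way to try different values of 'n' is a small timing harness. The Python sketch below is an illustration only, not the C++ server: time.sleep stands in for a blocking MySQL round trip, and run_blocking_queries with its default parameters is hypothetical. Real measurements would have to come from the actual workload.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def run_blocking_queries(threads_per_core, query_count=24, latency_s=0.05):
    """Run simulated blocking queries on n * cores threads.

    Returns (pool size, elapsed seconds).
    """
    cores = os.cpu_count() or 1
    pool_size = threads_per_core * cores
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # time.sleep stands in for a blocking database round trip.
        list(pool.map(lambda _: time.sleep(latency_s), range(query_count)))
    return pool_size, time.perf_counter() - start


for n in (1, 3):
    size, elapsed = run_blocking_queries(n)
    print(f"n={n}: {size} threads, {elapsed:.2f}s")
```

Because the simulated work only waits, more threads finish the batch sooner until the pool is larger than the number of queued queries; with CPU-bound work the picture would differ.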
