BulkProcessor .add() not finishing when number of bulks > concurrentRequests

Here is a sample of the code flow:
Trigger the process through an API call that specifies bulkSize and totalRecords.
Use those parameters to fetch the data from the DB.
Create a processor with the bulkSize.
Pass both the data and the processor into a method which:
- iterates over the result set, assembles a JSON object for each result and, if the final JSON is not empty, calls a method that adds it to the processor using processor.add().
This is where the outcome splits.
If the concurrentRequests parameter is 0, 1, or any value < (totalRecords/bulkSize), the code stalls at the processor.add() line and never reaches the next debug line.
However, when we increase concurrentRequests to a value > (totalRecords/bulkSize), .add() returns and the code moves on to the next line.
I suspect the problem is in our BulkProcessorListener, which somehow keeps .add() from finishing as it should. I would really appreciate some insight on this!
Here is the Listener we are using:
private class BulkProcessorListener implements Listener {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        // Some log statements
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        // More log statements
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        // Log statements
    }
}
Here is the createProcessor():
public synchronized BulkProcessor createProcessor(int bulkActions) {
    Builder builder = BulkProcessor.builder((request, bulkListener) -> {
        long timeoutMin = 60L;
        try {
            request.timeout(TimeValue.timeValueMinutes(timeoutMin));
            // Log statements
            client.bulkAsync(request, RequestOptions.DEFAULT, new ResponseActionListener<BulkResponse>());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }, new BulkProcessorListener());
    builder.setBulkActions(bulkActions);
    builder.setBulkSize(new ByteSizeValue(bulkSize, ByteSizeUnit.MB));
    builder.setFlushInterval(TimeValue.timeValueSeconds(5));
    builder.setConcurrentRequests(0);
    builder.setBackoffPolicy(BackoffPolicy.noBackoff());
    return builder.build();
}
Here is the method where we call processor.add():
@SuppressWarnings("deprecation")
private void addData(BulkProcessor processor, String indexName, JSONObject finalDataJSON,
        Map<String, String> previousUniqueObject) {
    // Debug logs
    processor.add(new IndexRequest(indexName, INDEX_TYPE, previousUniqueObject.get(COMBINED_ID))
            .source(finalDataJSON.toString(), XContentType.JSON));
    // Debug logs
}
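One possible explanation, offered as an assumption rather than a confirmed diagnosis: the lambda passed to BulkProcessor.builder receives a bulkListener, but createProcessor() hands client.bulkAsync a different listener (new ResponseActionListener<BulkResponse>()), so bulkListener is never notified. If the processor limits in-flight bulks with permits that are only released when that listener completes, add() would block as soon as the permits run out, which matches the symptom. The sketch below models that mechanism with a plain java.util.concurrent.Semaphore. PermitGate and its methods are hypothetical and are not Elasticsearch classes.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of a processor that allows a fixed number of in-flight bulks. */
final class PermitGate {
    private final Semaphore permits;

    PermitGate(int concurrentRequests) {
        // Assumption: 0 concurrent requests still means one bulk at a time.
        this.permits = new Semaphore(Math.max(1, concurrentRequests));
    }

    /**
     * Tries to start one bulk. Returns false when no permit is free; a real
     * blocking add() would wait here instead of returning.
     */
    boolean submit(boolean notifyListener) {
        if (!permits.tryAcquire()) {
            return false;
        }
        if (notifyListener) {
            permits.release(); // what a completed listener callback would do
        }
        return true;
    }

    /** Counts how many of the requested bulks could be started. */
    static int started(int concurrentRequests, int bulks, boolean notifyListener) {
        PermitGate gate = new PermitGate(concurrentRequests);
        int count = 0;
        for (int i = 0; i < bulks; i++) {
            if (gate.submit(notifyListener)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(started(2, 5, false)); // prints 2: permits are never returned
        System.out.println(started(2, 5, true));  // prints 5: each bulk returns its permit
    }
}
```

If this is indeed the cause, the change to try would be passing bulkListener itself as the third argument of client.bulkAsync, so that the processor learns when each bulk has finished.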

Related

Can someone please explain the proper usage of Timers and Triggers in Apache Beam?

I'm looking for examples of how to use Triggers and Timers in Apache Beam. I want to use processing-time timers to collect my data from Pub/Sub every 5 minutes, and processing-time triggers to process all the data collected over an hour together, in Python.

Please take a look at the following resources: Stateful processing with Apache Beam and Timely (and Stateful) Processing with Apache Beam
The first blog post is more general in how to handle states for context, and the second has some examples on buffering and triggering after a certain period of time, which seems similar to what you are trying to do.
A full example was requested. Here is what I was able to come up with:
PCollection<String> records =
    pipeline.apply(
        "ReadPubsub",
        PubsubIO.readStrings()
            .fromSubscription("projects/{project}/subscriptions/{subscription}"));

TupleTag<Iterable<String>> every5MinTag = new TupleTag<>();
TupleTag<Iterable<String>> everyHourTag = new TupleTag<>();

PCollectionTuple timersTuple =
    records
        // A KV<> is required to use state. Keying by data is more appropriate than a hardcoded key.
        .apply("WithKeys", WithKeys.of(1))
        .apply(
            "Batch",
            ParDo.of(
                new DoFn<KV<Integer, String>, Iterable<String>>() {
                  @StateId("buffer5Min")
                  private final StateSpec<BagState<String>> bufferedEvents5Min = StateSpecs.bag();

                  @StateId("count5Min")
                  private final StateSpec<ValueState<Integer>> countState5Min = StateSpecs.value();

                  @TimerId("every5Min")
                  private final TimerSpec every5MinSpec =
                      TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

                  @StateId("bufferHour")
                  private final StateSpec<BagState<String>> bufferedEventsHour = StateSpecs.bag();

                  @StateId("countHour")
                  private final StateSpec<ValueState<Integer>> countStateHour = StateSpecs.value();

                  @TimerId("everyHour")
                  private final TimerSpec everyHourSpec =
                      TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

                  @ProcessElement
                  public void process(
                      @Element KV<Integer, String> record,
                      @StateId("count5Min") ValueState<Integer> count5MinState,
                      @StateId("countHour") ValueState<Integer> countHourState,
                      @StateId("buffer5Min") BagState<String> buffer5Min,
                      @StateId("bufferHour") BagState<String> bufferHour,
                      @TimerId("every5Min") Timer every5MinTimer,
                      @TimerId("everyHour") Timer everyHourTimer) {
                    // Arm each timer only for the first element buffered since it last fired.
                    int count5Min = Objects.firstNonNull(count5MinState.read(), 0);
                    if (count5Min == 0) {
                      every5MinTimer
                          .offset(Duration.standardMinutes(5))
                          .align(Duration.standardMinutes(5))
                          .setRelative();
                    }
                    // Record the element so later elements do not re-arm the timer.
                    count5MinState.write(count5Min + 1);
                    buffer5Min.add(record.getValue());

                    int countHour = Objects.firstNonNull(countHourState.read(), 0);
                    if (countHour == 0) {
                      everyHourTimer
                          .offset(Duration.standardMinutes(60))
                          .align(Duration.standardMinutes(60))
                          .setRelative();
                    }
                    countHourState.write(countHour + 1);
                    bufferHour.add(record.getValue());
                  }

                  @OnTimer("every5Min")
                  public void onTimerEvery5Min(
                      OnTimerContext context,
                      @StateId("buffer5Min") BagState<String> bufferState,
                      @StateId("count5Min") ValueState<Integer> countState) {
                    if (!bufferState.isEmpty().read()) {
                      context.output(every5MinTag, bufferState.read());
                      bufferState.clear();
                      countState.clear();
                    }
                  }

                  @OnTimer("everyHour")
                  public void onTimerEveryHour(
                      OnTimerContext context,
                      @StateId("bufferHour") BagState<String> bufferState,
                      @StateId("countHour") ValueState<Integer> countState) {
                    if (!bufferState.isEmpty().read()) {
                      context.output(everyHourTag, bufferState.read());
                      bufferState.clear();
                      countState.clear();
                    }
                  }
                })
                .withOutputTags(every5MinTag, TupleTagList.of(everyHourTag)));

timersTuple
    .get(every5MinTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every 5 min>>);

timersTuple
    .get(everyHourTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every hour>>);

pipeline.run().waitUntilFinish();

Graceful termination

I am trying to implement the following use case as part of learning Akka.
I would like to calculate the total number of streets in all cities of all states. I have a database that contains the details needed. Here is what I have so far.
Configuration
akka.actor.deployment {
  /CityActor {
    router = random-pool
    nr-of-instances = 10
  }
  /StateActor {
    router = random-pool
    nr-of-instances = 1
  }
}
Main
public static void main(String[] args) {
    try {
        Config conf = ConfigFactory
                .parseReader(
                        new FileReader(ClassLoader.getSystemResource("config/forum.conf").getFile()))
                .withFallback(ConfigFactory.load());
        System.out.println(conf);
        final ActorSystem system = ActorSystem.create("AkkaApp", conf);
        final ActorRef masterActor = system.actorOf(Props.create(MasterActor.class), "Migrate");
        masterActor.tell("", ActorRef.noSender());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
MasterActor
public class MasterActor extends UntypedActor {
    private final ActorRef randomRouter = getContext().system()
            .actorOf(Props.create(StateActor.class).withRouter(new akka.routing.FromConfig()), "StateActor");

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            getContext().watch(randomRouter);
            for (String aState : getStates()) {
                randomRouter.tell(aState, getSelf());
            }
            randomRouter.tell(new Broadcast(PoisonPill.getInstance()), getSelf());
        } else if (message instanceof Terminated) {
            Terminated ater = (Terminated) message;
            if (ater.getActor().equals(randomRouter)) {
                getContext().system().terminate();
            }
        }
    }

    public List<String> getStates() {
        return new ArrayList<String>(Arrays.asList("CA", "MA", "TA", "NJ", "NY"));
    }
}
StateActor
public class StateActor extends UntypedActor {
    private final ActorRef randomRouter = getContext().system()
            .actorOf(Props.create(CityActor.class).withRouter(new akka.routing.FromConfig()), "CityActor");

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            System.out.println("Processing state " + message);
            for (String aCity : getCitiesForState((String) message)) {
                randomRouter.tell(aCity, getSelf());
            }
            Thread.sleep(1000);
        }
    }

    public List<String> getCitiesForState(String stateName) {
        return new ArrayList<String>(Arrays.asList("Springfield-" + stateName, "Salem-" + stateName,
                "Franklin-" + stateName, "Clinton-" + stateName, "Georgetown-" + stateName));
    }
}
CityActor
public class CityActor extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            System.out.println("Processing city " + message);
            Thread.sleep(1000);
        }
    }
}
Did I implement this use case properly?
I cannot get the code to terminate properly; I get dead letter messages. I know why I am getting them, but I am not sure how to implement this properly.
Any help is greatly appreciated.
Thanks

I tested and ran your use case with Akka 2.4.17. It works and terminates properly, without any dead letters logged.
Here are some remarks/suggestions to improve your understanding of the Akka toolkit:
Do not use Thread.sleep() inside an actor. It is never good practice, since the same thread may run tasks for many actors (this is the default behavior with a shared thread pool), so a sleeping actor holds up the others. Instead, you can use an Akka scheduler or assign a single thread to a specific Actor (see this post for more details). See also the Akka documentation on that topic.
Having some dead letters is not always an issue. They generally arise when the system stops an Actor that still has messages in its mailbox. In this case, the remaining unprocessed messages are sent to the deadLetters of the ActorSystem. I recommend checking the configuration you provided for the logging of dead letters. If the file forum.conf you provided is your complete configuration file for Akka, you may want to customize some additional settings. See the pages Logging of Dead Letters and Stopping actors on Akka's website. For instance, you could have a section like this:
akka {
  # instead of System.out.println(conf);
  log-config-on-start = on
  # Max number of dead letters to log
  log-dead-letters = 10
  log-dead-letters-during-shutdown = on
}
Instead of using System.out.println() to log/debug, it is more convenient to set up a dedicated logger for each Actor, which gives you additional information such as dispatchers, Actor name, etc. If you are interested, have a look at the Logging page.
Use custom immutable message objects instead of using Strings everywhere. At first, it may seem painful to declare additional classes, but in the end it helps to design complex behaviors and makes the code more readable. For instance, an actor A can answer a RequestMsg coming from an actor B with an AnswerMsg or a custom ErrorMsg. Then, for your actor B, you will end up with the following onReceive() method:
@Override
public void onReceive(Object message) {
    if (message instanceof AnswerMsg) {
        // OK
        AnswerMsg answerMsg = (AnswerMsg) message;
        // ...
    } else if (message instanceof ErrorMsg) {
        // Not OK
        ErrorMsg errorMsg = (ErrorMsg) message;
        // ...
    } else {
        // Unexpected behaviour, log it
        log.error("Error, received " + message.toString() + " object.");
    }
}
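The AnswerMsg and ErrorMsg classes are not shown in this answer. As a minimal sketch, assuming each one carries a single String field (the field names are hypothetical), immutable messages and the dispatch on them could look like this. It is plain Java with no Akka dependency, so the handler is an ordinary static method rather than onReceive():

```java
/** Hypothetical immutable reply message: final class, final field, no setters. */
final class AnswerMsg {
    private final String payload;

    AnswerMsg(String payload) {
        this.payload = payload;
    }

    String getPayload() {
        return payload;
    }
}

/** Hypothetical immutable error message. */
final class ErrorMsg {
    private final String reason;

    ErrorMsg(String reason) {
        this.reason = reason;
    }

    String getReason() {
        return reason;
    }
}

final class MessageDispatch {
    /** Mirrors the if / else-if / else chain of onReceive(), returning a description. */
    static String handle(Object message) {
        if (message instanceof AnswerMsg) {
            return "answer: " + ((AnswerMsg) message).getPayload();
        } else if (message instanceof ErrorMsg) {
            return "error: " + ((ErrorMsg) message).getReason();
        } else {
            return "unexpected: " + message;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new AnswerMsg("42")));     // prints "answer: 42"
        System.out.println(handle(new ErrorMsg("timeout"))); // prints "error: timeout"
        System.out.println(handle("plain string"));          // prints "unexpected: plain string"
    }
}
```

Because the fields are final and there are no setters, a message can be shared between actors without either side being able to change it after it is sent.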
I hope these resources will be useful to you.
Happy Akka programming! ;)

Issue with a WS verifier method when migrating from Play 2.4 to Play 2.5

I have a method I need to refactor, as F.Promise has been deprecated in Play 2.5. It's pretty readable actually. It sends a request and authenticates via a custom security token and returns true if the response is 200.
public boolean verify(final String xSassToken) {
    WSRequest request = WS.url(mdVerifyXSassTokenURL)
            .setHeader("X-SASS", xSassToken)
            .setMethod("GET");
    final F.Promise<WSResponse> responsePromise = request.execute();
    try {
        final WSResponse response = responsePromise.get(10000);
        int status = response.getStatus();
        if (status == 200) { // ok
            return true;
        }
    } catch (Exception e) {
        return false;
    }
    return false;
}
First thing I had to do was change this line:
final F.Promise<WSResponse> responsePromise = request.execute();
To this:
final CompletionStage<WSResponse> responsePromise = request.execute();
However, CompletionStage<T> doesn't have an equivalent get() method, so I'm not sure what the quickest and easiest way is to get a WSResponse whose status I can verify.

Yes, it does not. At least not directly.
What you are doing is "wrong" in the context of PlayFramework. get is a blocking call and you should avoid blocking as much as possible. That is why WS offers a non blocking API and a way to handle asynchronous results. So, first, you should probably rewrite your verify code to be async:
public CompletionStage<Boolean> verify(final String xSassToken) {
    return WS.url(mdVerifyXSassTokenURL)
            .setHeader("X-SASS", xSassToken)
            .setMethod("GET")
            .execute()
            .thenApply(response -> response.getStatus() == Http.Status.OK);
}
Notice how I'm using thenApply to return a new java.util.concurrent.CompletionStage instead of a plain boolean. That means that the code calling verify can do the same. For instance, an action in your controller can do something like this:
public class MyController extends Controller {
    public CompletionStage<Result> action() {
        return verify("whatever").thenApply(success -> {
            if (success) return ok("successful request");
            else return badRequest("xSassToken was not valid");
        });
    }

    public CompletionStage<Boolean> verify(final String xSassToken) { ... }
}
This way your application will be able to handle a bigger workload without hanging.
Edit:
Since you have to maintain compatibility, this is what I would do to both evolve the design and also to keep code compatible while migrating:
/**
 * @param xSassToken the token to be validated
 * @return if the token is valid or not
 *
 * @deprecated Will be removed. Use {@link #verifyToken(String)} instead since it is non blocking.
 */
@Deprecated
public boolean verify(final String xSassToken) {
    try {
        return verifyToken(xSassToken).toCompletableFuture().get(10, TimeUnit.SECONDS);
    } catch (Exception e) {
        return false;
    }
}

public CompletionStage<Boolean> verifyToken(final String xSassToken) {
    return WS.url(mdVerifyXSassTokenURL)
            .setHeader("X-SASS", xSassToken)
            .setMethod("GET")
            .execute()
            .thenApply(response -> response.getStatus() == Http.Status.OK);
}
Basically, deprecate the old verify method and encourage users to migrate to the new one.
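To see the blocking bridge in isolation, here is a sketch that runs without Play. A CompletionStage<Integer> stands in for the HTTP status that WS would deliver, and the class and method names are hypothetical. It shows that toCompletableFuture().get(timeout, unit) gives back the old blocking behaviour, including the "return false on timeout" case.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.TimeUnit;

final class BlockingBridge {
    /** Non-blocking check: maps a future status code to "is it 200?". */
    static CompletionStage<Boolean> verifyAsync(CompletionStage<Integer> status) {
        return status.thenApply(code -> code == 200);
    }

    /** Blocking wrapper: waits up to timeoutMillis and treats any failure as false. */
    static boolean verifyBlocking(CompletionStage<Integer> status, long timeoutMillis) {
        try {
            return verifyAsync(status)
                    .toCompletableFuture()
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (Exception e) {
            return false; // timeout or failed stage
        }
    }

    /** Helper: a response that has already arrived with the given status. */
    static boolean withStatus(int code) {
        return verifyBlocking(CompletableFuture.completedFuture(code), 100);
    }

    /** Helper: a response that never arrives, so the wait times out. */
    static boolean neverCompletes() {
        return verifyBlocking(new CompletableFuture<Integer>(), 50);
    }

    public static void main(String[] args) {
        System.out.println(withStatus(200));   // prints true
        System.out.println(withStatus(404));   // prints false
        System.out.println(neverCompletes());  // prints false after about 50 ms
    }
}
```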

Project Reactor: wait while broadcaster finish

There is a Broadcaster that accepts strings and appends them to a StringBuilder.
I want to test it.
Currently I have to use Thread#sleep to wait until the broadcaster finishes processing the strings. I want to remove the sleep.
I tried to use Control#debug(), without success.
public class BroadcasterUnitTest {
    @Test
    public void test() {
        // prepare
        Environment.initialize();
        // run broadcaster in separate thread (dispatcher)
        Broadcaster<String> sink = Broadcaster.create(Environment.newDispatcher());
        StringBuilder sb = new StringBuilder();
        sink
            .observe(s -> sleep(100)) // long-running operation
            .consume(sb::append);
        // do
        sink.onNext("a");
        sink.onNext("b");
        // assert
        sleep(500); // wait until the broadcaster has finished (without this line the test fails)
        assertEquals("ab", sb.toString());
    }

    private void sleep(int millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}

I'm not familiar with Broadcaster (and it's probably deprecated since the question is old), but these 3 ways could be helpful in general:
When testing Project Reactor's Fluxes and the like, you're probably better off using their testing library, which is made specially for this. Their reference and the Javadoc on that part are pretty good, and I'll just copy an example that speaks for itself here:
@Test
public void testAppendBoomError() {
    Flux<String> source = Flux.just("foo", "bar");
    StepVerifier.create(
            appendBoomError(source))
        .expectNext("foo")
        .expectNext("bar")
        .expectErrorMessage("boom")
        .verify();
}
You could just block() on the Fluxes and Monos yourself and then run checks. Note that if an error is emitted, this will result in an exception. But I have a feeling you'll find yourself needing to write more code for some cases (e.g., checking that the Flux has emitted 2 items X & Y and then terminated with an error), at which point you'd be re-implementing StepVerifier.
@Test
public void testFluxOrMono() {
    Flux<Integer> source = Flux.just(2, 3);
    List<Integer> result = source
        .flatMap(i -> multiplyBy2Async(i))
        .collectList()
        .block();
    // run your asserts on the list. Reminder: the order may not be what you expect because of the `flatMap`

    // Or with a Mono:
    Integer resultOfMono = Mono.just(5)
        .flatMap(i -> multiplyBy2Async(i))
        .map(i -> i * 4)
        .block();
    // run your asserts on the integer
}
You could use general solutions to async testing such as CountDownLatch but, again, I wouldn't recommend it, as it will give you trouble in some cases. For example, if you don't know the number of receivers in advance you'll need to use something else.
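For completeness, the CountDownLatch approach could look like the following sketch. It uses a plain single-threaded ExecutorService as a stand-in for the asynchronous dispatcher (an assumption; this is not Reactor code, and LatchWait is a hypothetical name). Note that it needs the number of expected items up front, which is exactly the limitation described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LatchWait {
    /** Appends each item on a background thread and waits until all are processed. */
    static String appendAsync(String... items) {
        StringBuffer sb = new StringBuffer(); // thread-safe buffer shared with the worker
        CountDownLatch done = new CountDownLatch(items.length);
        // A single worker thread processes the items in submission order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            for (String item : items) {
                worker.submit(() -> {
                    sb.append(item);
                    done.countDown();
                });
            }
            // Wait for completion instead of sleeping for a guessed duration.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for items");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendAsync("a", "b")); // prints "ab"
    }
}
```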

Per answer above, I found blockLast() helped.
@Test
public void MyTest() throws InterruptedException
{
    Logs.Info("Start test");

    /* 1 */
    // Make a request
    WebRequest wr1 = new WebRequest("1", "2", "3", "4");
    String json1 = wr1.toJson(wr1);
    Logs.Info("Flux");
    Flux<String> responses = controller.getResponses(json1);

    /* 2 */
    Logs.Info("Responses in");
    responses.subscribe(s -> mySub.myMethod(s)); // Test for strings is in myMethod
    Logs.Info("Test thread sleeping");
    Thread.sleep(2000);

    /* 3 */
    Logs.Info("Test thread blocking");
    responses.blockLast();
    Logs.Info("Finish test");
}

Clojure java class casting

Is it possible to cast in Clojure the way it is done in Java?
This is the Java code which I want to implement in Clojure:
public class JavaSoundRecorder {
    // the line from which audio data is captured
    TargetDataLine line;

    /**
     * Captures the sound and record into a WAV file
     */
    void start() {
        try {
            AudioFormat format = new AudioFormat(16000, 8,
                    2, true, true);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
            System.out.println(AudioSystem.isLineSupported(info));
            // checks if system supports the data line
            if (!AudioSystem.isLineSupported(info)) {
                System.out.println("Line not supported");
                System.exit(0);
            }
            //line = (TargetDataLine) AudioSystem.getLine(info);
            line = AudioSystem.getTargetDataLine(format);
            line.open(format);
            line.start(); // start capturing
            System.out.println("Start capturing...");
            AudioInputStream ais = new AudioInputStream(line);
            System.out.println("Start recording...");
            // start recording
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File("RecordAudio.wav"));
        } catch (LineUnavailableException ex) {
            ex.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    /**
     * Closes the target data line to finish capturing and recording
     */
    void finish() {
        line.stop();
        line.close();
        System.out.println("Finished");
    }

    /**
     * Entry to run the program
     */
    public static void main(String[] args) {
        final JavaSoundRecorder recorder = new JavaSoundRecorder();
        // creates a new thread that waits for a specified
        // amount of time before stopping
        Thread stopper = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(6000);
                } catch (InterruptedException ex) {
                    ex.printStackTrace();
                }
                recorder.finish();
            }
        });
        stopper.start();
        // start recording
        recorder.start();
    }
}
And this is what I wrote in Clojure:
(def audioformat (new javax.sound.sampled.AudioFormat 16000 8 2 true true))
(def info (new javax.sound.sampled.DataLine$Info javax.sound.sampled.TargetDataLine audioformat))
(if (not (javax.sound.sampled.AudioSystem/isLineSupported info))
  (print "dataline not supported")
  (print "ok lets start\n"))
(def line (javax.sound.sampled.AudioSystem/getTargetDataLine audioformat))
(.open line audioformat)
Are there any solutions?

This issue was explained rather well on the Clojure group here:
https://groups.google.com/forum/#!topic/clojure/SNcT6d-TTaQ
You should not need to do the cast (see the discussion in the comments about the super types of the object we have); however, you will need to type hint the invocation of open:
(.open ^javax.sound.sampled.TargetDataLine line audioformat)
Remember that Java casts don't really do very much (unlike C++, where a cast might completely transform an underlying object).
I am not sure what this code is supposed to do, so I don't know whether it has worked or not. I can, however, now run your example without error.
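To illustrate the point about casts on the Java side (a small sketch that is not part of the recorder code; CastDemo is a hypothetical name): a reference cast only changes the static type the compiler sees. The object itself is untouched, and the only runtime effect is a type check that can fail.

```java
final class CastDemo {
    /** A successful cast yields the very same object, not a converted copy. */
    static boolean sameObjectAfterCast() {
        Object o = new StringBuilder("abc");
        CharSequence cs = (CharSequence) o; // checked at runtime, no conversion
        return o == cs;
    }

    /** A cast to an unrelated type does not convert; it throws instead. */
    static boolean badCastThrows() {
        Object o = "text";
        try {
            Integer i = (Integer) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAfterCast()); // prints true
        System.out.println(badCastThrows());       // prints true
    }
}
```

This is why a type hint is enough in Clojure: the object returned by getTargetDataLine already is a TargetDataLine, and the hint only tells the compiler which open method to call on it.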
