RxJava - switchMap-like operator with multiple (limited) active streams - concurrency

I'm wondering how to transform an observable similarly to switchMap, but instead of limiting it to a single active stream, allow multiple (limited) active streams.
The purpose is to have multiple tasks working concurrently up to some task-count limit, and to let new tasks start with a FIFO strategy: any new task that arrives starts immediately, and the oldest running task is cancelled.
switchMap creates an Observable for each emission of the source and cancels the previously running Observable once a new one is created. I want to achieve something similar, but with a concurrency level (like flatMap): allow the Observables created for each emission to run concurrently up to some concurrency limit; when the limit is reached, the oldest Observable is cancelled and the new one is started.
Actually, this is also similar to flatMap with maxConcurrent, but instead of new Observables waiting in a queue when maxConcurrent is reached, the older Observables are cancelled and the new one enters immediately.
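For reference, here is a minimal sketch of the standard flatMap-with-maxConcurrent behaviour whose queueing I want to change (assuming RxJava 1.x; the "tasks" are just interval ticks):
import java.util.concurrent.TimeUnit;
import rx.Observable;

public class FlatMapMaxConcurrentDemo {
    public static void main(String[] args) {
        // flatMap with maxConcurrent = 2: while two inner Observables are active,
        // additional ones WAIT in a queue instead of cancelling the oldest one.
        Observable.just(0, 1, 2)
                .flatMap(v -> Observable.interval(100, TimeUnit.MILLISECONDS)
                        .take(3)
                        .map(i -> "task " + v + " tick " + i), 2)
                .toBlocking()
                .forEach(System.out::println);
    }
}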

You could try with this transformer:
public static <T, R> Observable.Transformer<T, R> switchFlatMap(
        int n, Func1<T, Observable<R>> mapper) {
    return f ->
        Observable.defer(() -> {
            // running id of the inner Observables started so far
            final AtomicInteger ingress = new AtomicInteger();
            // broadcasts the id of each newly started inner Observable
            final Subject<Integer, Integer> cancel =
                    PublishSubject.<Integer>create().toSerialized();

            return f.flatMap(v -> {
                int id = ingress.getAndIncrement();
                // the inner stream with this id is cancelled when stream id + n
                // starts, so at most n streams stay active at a time
                Observable<R> o = mapper.call(v)
                        .takeUntil(cancel.filter(e -> e == id + n));
                cancel.onNext(id);
                return o;
            });
        });
}
The demonstration:
public static void main(String[] args) {
    PublishSubject<Integer> ps = PublishSubject.create();

    @SuppressWarnings("unchecked")
    PublishSubject<Integer>[] pss = new PublishSubject[3];
    for (int i = 0; i < pss.length; i++) {
        pss[i] = PublishSubject.create();
    }

    AssertableSubscriber<Integer> ts = ps
            .compose(switchFlatMap(2, v -> pss[v]))
            .test();

    ps.onNext(0);
    ps.onNext(1);

    pss[0].onNext(1);
    pss[0].onNext(2);
    pss[0].onNext(3);

    pss[1].onNext(10);
    pss[1].onNext(11);
    pss[1].onNext(12);

    ps.onNext(2);

    pss[0].onNext(4);

    pss[2].onNext(20);
    pss[2].onNext(21);
    pss[2].onNext(22);

    pss[1].onCompleted();
    pss[2].onCompleted();
    ps.onCompleted();

    ts.assertResult(1, 2, 3, 10, 11, 12, 20, 21, 22);
}

Though a ready-made solution is unavailable, something like the below should help.
public static void main(String[] args) {
    Observable.create(subscriber -> {
        for (int i = 0; i < 5; i++) {
            Observable.timer(i, TimeUnit.SECONDS).toBlocking().subscribe();
            subscriber.onNext(i);
        }
    })
    .switchMap(n -> {
        System.out.println("Main task emitted event - " + n);
        return Observable.interval(1, TimeUnit.SECONDS)
                .take((int) n * 3)
                .doOnUnsubscribe(() ->
                        System.out.println("Unsubscribed for main task event - " + n));
    })
    .subscribe(n2 -> System.out.println("\t" + n2));

    Observable.timer(20, TimeUnit.SECONDS).toBlocking().subscribe();
}
The Observable.create section creates a slow producer which emits items like this: emit 0, sleep for 1s and emit 1, sleep for 2s and emit 2, and so on.
switchMap creates an Observable for each element, and each of those emits numbers every second. Note that a line is printed every time the main Observable emits an element and also whenever an inner Observable is unsubscribed.
So, in your case, you might be interested in closing the oldest task from doOnUnsubscribe. Hope it helps.
The pseudo code below might help with understanding.
getTaskObservable()
    .switchMap(task -> {
        System.out.println("Main task emitted event - " + task);
        return Observable.create(subscriber -> {
            initiateTaskAndNotify(task, subscriber);
        }).doOnUnsubscribe(() -> checkAndKillIfMaxConcurrentTasksReached(task));
    })
    .subscribe(value -> System.out.println("Done with task and got output " + value));

Related

How to define a testable timer loop in Kotlin (Android)?

I want to have a periodic timer loop (e.g. 1 second intervals). There are many ways to do that, but I haven't found a solution that would be suitable for unit testing.
Timer should be precise
Unit test should be able to skip the waiting
The closest that I came to a solution was to use coroutines: A simple loop with delay, runBlockingTest and advanceTimeBy.
coScope.launch {
    while (isActive) {
        // do stuff
        delay(1000L)
    }
}
and
@Test
fun timer_test() = coScope.runBlockingTest {
    ... // start job
    advanceTimeBy(9_000L)
    ... // cancel job
}
It works to some degree, but the timer is not precise, as it does not account for the execution time.
I haven't found a way to query the internal timer used in a coroutine scope, or the remaining timeout value inside withTimeoutOrNull:
coScope.launch {
    withTimeoutOrNull(999_000_000L) { // max allowed looping time
        while (isActive) {
            // do stuff
            val timeoutLeft // How to get that value ???
            delay(timeoutLeft.mod(1000L))
        }
    }
}
Next idea was to use ticker:
coScope.launch {
    val tickerChannel = ticker(1000L, 0L, coroutineContext)
    var referenceTimer = 0L
    for (event in tickerChannel) {
        // do stuff
        println(referenceTimer)
        referenceTimer += 1000L
    }
}
However, the combination of TestCoroutineDispatcher() and ticker does not produce the right results:
private val coDispatcher = TestCoroutineDispatcher()

@Test
fun timerTest() = runBlockingTest(coDispatcher) {
    myTimer.launchPeriodicJob()
    advanceTimeBy(20_000L) // or delay(20_000L)
    myTimer.cancelPeriodicJob()
    println("end_of_test")
}
rather consistently results in:
0
1000
2000
3000
4000
5000
6000
end_of_test
I am also open to any alternative approaches that satisfy the two points above.

Flutter add item to list

I would like to add an item to a list:
void submitAll() async {
  List<UserSearchItem> userSearchItems = [];

  Firestore.instance
      .collection('insta_users')
      .snapshots()
      .listen((data) => data.documents.forEach((doc) {
            print(data.documents.length);
            User user = new User.fromDocument(doc);
            UserSearchItem searchItem = new UserSearchItem(user);
            userSearchItems.add(searchItem);
            print(user.bio);
          }));

  print("Loaded");
  print(userSearchItems.length);
}
But if I print the length of the list to the console, it always says the list is 0 long...
print(userSearchItems.length);
Any suggestions?
Best Regards
I will try to give an explanation of what is happening here; take a look at this code:
import 'dart:async';

void main() {
  List<int> userSearchItems = [];
  Timer _sendTimeOutTimer;
  const oneSec = Duration(seconds: 2);

  _sendTimeOutTimer = Timer.periodic(oneSec, (Timer t) {
    userSearchItems.add(1);
    print(userSearchItems.length); // result 1, and it will be executed after 2 seconds
    _sendTimeOutTimer.cancel();
  });

  print(userSearchItems.length); // result 0, and it will be executed first
}
The print inside the asynchronous action (the Timer) is executed after 2 seconds, i.e. after the asynchronous action ends, but the one outside the asynchronous action is executed immediately, without waiting 2 seconds. In your case the asynchronous action is listening to data (.listen((data) =>), so if you print the length outside of your asynchronous action you will not see the difference, because the item has not been added yet.
Solution: you can create a function which returns a Future, wait until it has finished, and then print the length.
List<UserSearchItem> userSearchItems = [];

Future<String> submitAll() async {
  // Await the first snapshot (instead of only attaching a listener) so that
  // the returned Future completes after the data has actually arrived.
  QuerySnapshot data = await Firestore.instance
      .collection('insta_users')
      .snapshots()
      .first;

  data.documents.forEach((doc) {
    print(data.documents.length);
    User user = new User.fromDocument(doc);
    UserSearchItem searchItem = new UserSearchItem(user);
    userSearchItems.add(searchItem);
    print(user.bio);
  });

  return 'success';
}
void yourFunction() async {
  await submitAll();
  print("Loaded");
  print(userSearchItems.length);
}
Then call yourFunction().
Try to add print(userSearchItems.length); inside forEach after adding the item and you will see the real length.
There is a simple way to add new data to a list in Flutter.
for (var i = 0; i < list.length; i++) {
  double newValue = i + 1.0; // This is just an example; put whatever you're trying to add here.
  list[i]["newValueName"] = newValue; // This is how we add the new value to the list.
}
See if it works by doing a:
print(list); // You can put a breakpoint here to see it more clearly
Hope it helps.

What does the "throughput-deadline-time" configuration option do?

I've stumbled on the throughput-deadline-time configuration property for Akka dispatchers, and it looks like an interesting option; however, the only mention of it I could find in the whole documentation is the following:
# Throughput deadline for Dispatcher, set to 0 or negative for no deadline
throughput-deadline-time = 0ms
I think we can agree that this is not very helpful.
So what does throughput-deadline-time control, and what impact does it have when set on my dispatcher?
So I had a look at the Akka source code, and found this method in the Mailbox that seems to implement the behavior of throughput-deadline-time:
/**
 * Process the messages in the mailbox
 */
@tailrec private final def processMailbox(
    left: Int = java.lang.Math.max(dispatcher.throughput, 1),
    deadlineNs: Long = if (dispatcher.isThroughputDeadlineTimeDefined == true) System.nanoTime + dispatcher.throughputDeadlineTime.toNanos else 0L): Unit =
  if (shouldProcessMessage) {
    val next = dequeue()
    if (next ne null) {
      if (Mailbox.debug) println(actor.self + " processing message " + next)
      actor invoke next
      if (Thread.interrupted())
        throw new InterruptedException("Interrupted while processing actor messages")
      processAllSystemMessages()
      if ((left > 1) && ((dispatcher.isThroughputDeadlineTimeDefined == false) || (System.nanoTime - deadlineNs) < 0))
        processMailbox(left - 1, deadlineNs)
    }
  }
This piece of code makes it clear: throughput-deadline-time configures the maximum amount of time that will be spent processing the same mailbox, before switching to the mailbox of another actor.
In other words, if you configure a dispatcher with:
my-dispatcher {
  throughput = 100
  throughput-deadline-time = 1ms
}
Then an actor's mailbox will process at most 100 messages at a time, for at most 1 ms; whenever the first of those limits is hit, Akka switches to another actor/mailbox.
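For completeness, a minimal Java sketch (assuming Akka classic; MyActor is just a placeholder actor class) of how an actor gets bound to such a dispatcher so these limits actually apply to its mailbox:
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class DispatcherDemo {
    // placeholder actor; the dispatcher block above lives in application.conf
    static class MyActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .matchAny(msg -> System.out.println("got " + msg))
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        // "my-dispatcher" refers to the configuration block shown above
        ActorRef ref = system.actorOf(
                Props.create(MyActor.class).withDispatcher("my-dispatcher"),
                "worker");
        ref.tell("hello", ActorRef.noSender());
        system.terminate();
    }
}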

Worker pool for a potentially recursive task (i.e., each job can queue other jobs)

I'm writing an application that the user can start with a number of "jobs" (URLs actually). At the beginning (main routine), I add these URLs to a queue, then start x goroutines that work on these URLs.
In special cases, the resource a URL points to may contain even more URLs which have to be added to the queue. The 3 workers are waiting for new jobs to come in and process them. The problem is: once EVERY worker is waiting for a job (and none is producing any), the workers should stop altogether. So either all of them work or no one works.
My current implementation looks something like this and I don't think it's elegant. Unfortunately I couldn't think of a better way that wouldn't include race conditions and I'm not entirely sure if this implementation actually works as intended:
var queue // from somewhere
const WORKER_COUNT = 3

var done chan struct{}

func work(working chan int) {
	absent := make(chan struct{}, 1)
	// if x>1 jobs in sequence are popped, send to the "absent" channel only 1 struct.
	// This implementation also assumes that the select statement will be evaluated "in-order"
	// (channel 2 only if channel 1 yields nothing) - is this actually correct?
	// EDIT: It is, according to the specs.
	one := false
	for {
		select {
		case u, ok := <-queue.Pop():
			if !ok {
				close(absent)
				return
			}
			if !one {
				// I have started working (delta + 1)
				working <- 1
				absent <- struct{}{}
				one = true
			}
			// do work with u (which may lead to queue.Push(urls...))
		case <-absent: // no jobs at the moment. consume absent => wait
			one = false
			working <- -1
		}
	}
}

func Start() {
	working := make(chan int)
	for i := 0; i < WORKER_COUNT; i++ {
		go work(working)
	}
	// the amount of actually working workers...
	sum := 0
	for {
		delta := <-working
		sum += delta
		if sum == 0 {
			queue.Close() // close channel -> kill workers.
			done <- struct{}{}
			return
		}
	}
}
Is there a better way to tackle this problem?
You can use a sync.WaitGroup (see docs) to control the lifetime of the workers, and use a non-blocking send so workers can't deadlock when they try to queue up more jobs:
package main

import "sync"

const workers = 4

type job struct{}

func (j *job) do(enqueue func(job)) {
	// do the job, calling enqueue() for subtasks as needed
}

func main() {
	jobs, wg := make(chan job), new(sync.WaitGroup)
	var enqueue func(job)

	// workers
	for i := 0; i < workers; i++ {
		go func() {
			for j := range jobs {
				j.do(enqueue)
				wg.Done()
			}
		}()
	}

	// how to queue a job
	enqueue = func(j job) {
		wg.Add(1)
		select {
		case jobs <- j: // another worker took it
		default: // no free worker; do the job now
			j.do(enqueue)
			wg.Done()
		}
	}

	todo := make([]job, 1000)
	for _, j := range todo {
		enqueue(j)
	}
	wg.Wait()
	close(jobs)
}
The difficulty with trying to avoid deadlocks with a buffered channel is that you have to allocate a big enough channel up front to definitely hold all pending tasks without blocking. That's problematic unless, say, you have a small and known number of URLs to crawl.
When you fall back to doing ordinary recursion in the current thread, you don't have that static buffer-size limit. Of course, there are still limits: you'd probably run out of RAM if too much work were pending, and theoretically you could exhaust the stack with deep recursion (but that's hard!). So you'd need to track pending tasks in some more sophisticated way if you were, say, crawling the Web at large.
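(As an aside, the same "no free worker, so run the job in the caller" fallback exists outside Go too; purely for comparison, a minimal Java sketch of that shape using the JDK's bounded pool plus CallerRunsPolicy:)
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
    public static void main(String[] args) throws InterruptedException {
        // 4 workers, a small bounded queue, and "run it yourself" when both are full,
        // so submitters never block (or deadlock) waiting for queue space.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(8),
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (int i = 0; i < 100; i++) {
            final int job = i;
            pool.execute(() -> System.out.println(
                    Thread.currentThread().getName() + " did job " + job));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}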
Finally, as a more complete example: I'm not super proud of this code, but I happened to write a function that kicks off a parallel sort which is recursive in the same way your URL fetching is.

MySQL Asynchronous?

I'm basically facing a blocking problem.
My server is coded based on C++ Boost.Asio, using 8 threads since the server has 8 logical cores.
My problem is that a thread may block for 0.2~1.5 seconds on a MySQL query, and I honestly don't know how to get around that, since the MySQL C++ Connector does not support asynchronous queries and I don't know how to design the server "correctly" to use multiple threads for the queries.
This is where I'm asking for opinions on what to do in this case.
Create 100 threads for async SQL queries?
Could I have an opinion from experts about this?
Okay, the proper solution to this would be to extend Asio and write a mysql_service implementation to integrate this. I was almost going to find out how this is done right away, but I wanted to get started using an "emulation".
The idea is to have:
- your business processes using an io_service (as you are already doing)
- a database "facade" interface that dispatches async queries into a different queue (io_service) and posts the completion handler back onto the business_process io_service
A subtle tweak is needed here: you need to keep the io_service on the business-process side from shutting down as soon as its job queue is empty, since it might still be awaiting a response from the database layer.
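Before the Asio demo, a rough single-file Java analogue of the same two-queue shape may make the idea concrete (plain executors, not Boost.Asio; all names here are mine):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TwoQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService appQueue = Executors.newSingleThreadExecutor(); // "business process" side
        ExecutorService dbQueue = Executors.newFixedThreadPool(10);     // at most 10 concurrent "queries"

        appQueue.execute(() -> {
            String sql = "select * from tasks where completed = false";
            dbQueue.execute(() -> {
                String response = runBlockingQuery(sql); // blocks a db thread only
                // post the completion handler back onto the business-process queue
                appQueue.execute(() -> System.out.println("handled: " + response));
            });
        });

        Thread.sleep(1000); // crude stand-in for a "work" guard; real code would track outstanding work
        dbQueue.shutdown();
        appQueue.shutdown();
    }

    static String runBlockingQuery(String sql) {
        try {
            Thread.sleep(500); // simulate 0.2~1.5s of MySQL latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "echo: " + sql; // placeholder, like sql_response.echo_dml below
    }
}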
So, modeling this into a quick demo:
namespace database
{
    // data types
    struct sql_statement { std::string dml; };
    struct sql_response  { std::string echo_dml; }; // TODO cover response codes, resultset data etc.
I hope you will forgive my gross simplifications :/
    struct service
    {
        service(unsigned max_concurrent_requests = 10)
            : work(io_service::work(service_)),
              latency(mt19937(), uniform_int<int>(200, 1500)) // random 0.2 ~ 1.5s
        {
            for (unsigned i = 0; i < max_concurrent_requests; ++i)
                svc_threads.create_thread(boost::bind(&io_service::run, &service_));
        }

        friend struct connection;
      private:
        void async_query(io_service& external, sql_statement query, boost::function<void(sql_response response)> completion_handler)
        {
            service_.post(bind(&service::do_async_query, this, ref(external), std::move(query), completion_handler));
        }

        void do_async_query(io_service& external, sql_statement q, boost::function<void(sql_response response)> completion_handler)
        {
            this_thread::sleep_for(chrono::milliseconds(latency())); // simulate the latency of a db-roundtrip
            external.post(bind(completion_handler, sql_response { q.dml }));
        }

        io_service service_;
        thread_group svc_threads; // note the order of declaration
        optional<io_service::work> work;

        // for random delay
        random::variate_generator<mt19937, uniform_int<int> > latency;
    };
The service is what coordinates a maximum number of concurrent requests (on the "database io_service" side) and ping/pongs the completion back onto another io_service (the async_query/do_async_query combo). This stub implementation emulates latencies of 0.2~1.5s in the obvious way :)
Now comes the client "facade"
    struct connection
    {
        connection(int connection_id, io_service& external, service& svc)
            : connection_id(connection_id),
              external_(external),
              db_service_(svc)
        { }

        void async_query(sql_statement query, boost::function<void(sql_response response)> completion_handler)
        {
            db_service_.async_query(external_, std::move(query), completion_handler);
        }

      private:
        int connection_id;
        io_service& external_;
        service& db_service_;
    };
connection is really only a convenience so we don't have to explicitly deal with the various queues at the call site.
Now, let's implement a demo business process in good old Asio style:
namespace domain
{
    struct business_process : id_generator
    {
        business_process(io_service& app_service, database::service& db_service_)
            : id(generate_id()), phase(0),
              in_progress(io_service::work(app_service)),
              db(id, app_service, db_service_)
        {
            app_service.post([=] { start_select(); });
        }

      private:
        int id, phase;
        optional<io_service::work> in_progress;
        database::connection db;

        void start_select() {
            db.async_query({ "select * from tasks where completed = false" },
                           [=](database::sql_response r) { handle_db_response(r); });
        }

        void handle_db_response(database::sql_response r) {
            if (phase++ < 4)
            {
                if ((id + phase) % 3 == 0) // vary the behaviour slightly
                {
                    db.async_query({ "insert into tasks (text, completed) values ('hello', false)" },
                                   [=](database::sql_response r) { handle_db_response(r); });
                } else
                {
                    db.async_query({ "update * tasks set text = 'update' where id = 123" },
                                   [=](database::sql_response r) { handle_db_response(r); });
                }
            } else
            {
                in_progress.reset();
                lock_guard<mutex> lk(console_mx);
                std::cout << "business_process " << id << " has completed its work\n";
            }
        }
    };
}
This business process starts by posting itself on the app service. It then does a number of db queries in succession, and eventually exits (by doing in_progress.reset(), the app service is made aware of this).
A demonstration main, starting 10 business processes on a single thread:
int main()
{
    io_service app;
    database::service db;

    ptr_vector<domain::business_process> bps;

    for (int i = 0; i < 10; ++i)
    {
        bps.push_back(new domain::business_process(app, db));
    }

    app.run();
}
In my sample, business_processes don't do any CPU-intensive work, so there's no use in scheduling them across CPUs, but if you wanted to, you could easily achieve this by replacing the app.run() line with:
thread_group g;
for (unsigned i = 0; i < thread::hardware_concurrency(); ++i)
    g.create_thread(boost::bind(&io_service::run, &app));
g.join_all();
See the demo running Live On Coliru
I'm not a MySQL guru, but the following is generic multithreading advice.
Having NumberOfThreads == NumberOfCores is appropriate when none of the threads ever block and you are just splitting the load over all CPUs.
A common pattern is to have multiple threads per CPU, so one is executing while another is waiting on something.
In your case, I'd be inclined to set NumberOfThreads = n * NumberOfCores, where 'n' is read from a config file, a registry entry or some other user-settable value. You can test the system with different values of 'n' to find the optimum. I'd suggest somewhere around 3 as a first guess.
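Purely as an illustration of that sizing rule (a minimal Java sketch; the same arithmetic applies to a C++ thread pool, and the "pool.multiplier" property stands in for the user-settable 'n'):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int n = Integer.getInteger("pool.multiplier", 3); // user-settable, e.g. -Dpool.multiplier=3
        int cores = Runtime.getRuntime().availableProcessors();
        // NumberOfThreads = n * NumberOfCores: the extra threads cover time spent blocked on queries
        ExecutorService queryPool = Executors.newFixedThreadPool(n * cores);
        System.out.println("query pool size: " + (n * cores));
        queryPool.shutdown();
    }
}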