Is QuestDB (embedded) TableWriter thread-safe?

I am using QuestDB (embedded) to store a bunch of time series.
I would like to run my storage method inside a parallel stream, but I don't know whether TableWriter is thread-safe.
Here is the code:
SqlExecutionContextImpl ctx = new SqlExecutionContextImpl(engine, 1);
try (TableWriter writer = engine.getWriter(ctx.getCairoSecurityContext(), name, "writing")) {
    tickerData.stream().parallel().forEach(
        r -> {
            try {
                Instant i = r.getDateTime("DateTime")
                    .atZone(EST)
                    .toInstant();
                long ts = TimestampFormatUtils.parseTimestamp(i.toString());
                // The same writer instance is shared by every thread of the parallel stream
                TableWriter.Row row = writer.newRow(ts);
                row.putDouble(0, r.getDouble("x1"));
                row.putDouble(1, r.getDouble("x2"));
                row.putDouble(2, r.getDouble("y1"));
                row.putDouble(3, r.getDouble("y2"));
                row.putDouble(4, r.getDouble("z"));
                row.append();
                writer.commit();
            } catch (NumericException ex) {
                log.error("Cannot parse the date {}", r.getDateTime("DateTime"));
            } catch (Exception ex) {
                log.error("Cannot write to table {}!", name, ex);
            }
        });
}
This throws all sorts of errors. Is there a way to make the storage process parallel?
Thanks,
Juan

The short answer is that TableWriter is not thread-safe. It is your responsibility to make sure it is never used from several threads at once.
The slightly longer answer is that even in standalone QuestDB, parallel writing is restricted. At the moment it is only possible through multiple ILP (InfluxDB Line Protocol) connections.
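One way to keep some parallelism while respecting that restriction is to parallelize only the per-record conversion and let a single thread do all the writing. The sketch below illustrates the idea with plain JDK classes; SingleThreadSink and SingleWriterSketch are hypothetical stand-ins made up for this example and are not part of the QuestDB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for a writer that must only be used by one thread.
final class SingleThreadSink {
    private final List<long[]> rows = new ArrayList<>();
    private final Thread owner = Thread.currentThread();

    void append(long ts, long value) {
        // Fail fast if a second thread ever touches the sink.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("sink used from a second thread");
        }
        rows.add(new long[] {ts, value});
    }

    int size() {
        return rows.size();
    }
}

final class SingleWriterSketch {
    // Parallel part: only the conversion of each record, which shares no state.
    static List<long[]> prepare(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(i -> new long[] {1_000L * i, (long) i * i})
                .collect(Collectors.toList()); // encounter order is preserved
    }

    // Sequential part: one thread owns the sink for the whole batch.
    static int writeAll(List<long[]> prepared) {
        SingleThreadSink sink = new SingleThreadSink();
        for (long[] row : prepared) {
            sink.append(row[0], row[1]);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        List<long[]> prepared = prepare(1_000);
        System.out.println(writeAll(prepared) + " rows written by one thread");
    }
}
```

Applied to the code in the question, the sequential loop is where writer.newRow(ts), the putDouble calls and row.append() would go, presumably with a single writer.commit() after the loop instead of one per row.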

Related

Unit testing suspend coroutine

I'm a bit new to Kotlin and to testing it. I am trying to test a DAO object wrapper that has a suspend method, which uses awaitFirst() on the object returned by an SQL query. However, the unit test I wrote for it just hangs. I think this is because awaitFirst() is not in the same scope as the test.
Implementation:
suspend fun queryExecution(querySpec: DatabaseClient.GenericExecuteSpec): OrderDomain {
var result: Map<String, Any>?
try {
result = querySpec.fetch().first().awaitFirst()
} catch (e: Exception) {
if (e is DataAccessResourceFailureException)
throw CommunicationException(
"Cannot connect to " + DatabaseConstants.DB_NAME +
DatabaseConstants.ORDERS_TABLE + " when executing querySelect",
"querySelect",
e
)
throw InternalException("Encountered R2dbcException when executing SQL querySelect", e)
}
if (result == null)
throw ResourceNotFoundException("Resource not found in Aurora DB")
try {
return OrderDomain(result)
} catch (e: Exception) {
throw InternalException("Exception when parsing to OrderDomain entity", e)
} finally {
logger.info("querySelect;stage=end")
}
}
Unit Test:
@Test
fun `get by orderid id, null`() = runBlocking {
// Assign
Mockito.`when`(fetchSpecMock.first()).thenReturn(monoMapMock)
Mockito.`when`(monoMapMock.awaitFirst()).thenReturn(null)
// Act & Assert
val exception = assertThrows<ResourceNotFoundException> {
auroraClientWrapper.queryExecution(
databaseClient.sql("SELECT * FROM orderTable WHERE orderId=:1").bind("1", "123") // orderId
)
}
assertEquals("Resource not found in Aurora DB", exception.message)
}
I noticed this issue at https://github.com/Kotlin/kotlinx.coroutines/issues/1204, but none of the workarounds has worked for me.
Using runBlocking in the unit test just causes my tests to never complete. Using runBlockingTest explicitly throws an error saying "Job never completed". Does anyone have any idea? Any hack at this point?
I also understand that you should not block inside a suspend function, because that defeats the purpose of suspend: suspending releases the thread to continue later, whereas blocking forces the thread to wait for a result. But then how does this work?
private suspend fun queryExecution(querySpec: DatabaseClient.GenericExecuteSpec): Map<String, Any>? {
    // The blocking call is executed inside withContext(Dispatchers.Default)
    return withContext(Dispatchers.Default) {
        querySpec.fetch().first().block()
    }
}
Does this mean withContext will use a new thread and the old thread gets reused elsewhere? That doesn't really optimize anything, since one thread is still blocked regardless of switching to a new context.

Found the solution.
monoMapMock is a mock created by Mockito. It seems that the kotlinx coroutines test utilities can't intercept an async call on a mocked Mono. So, as suggested by Louis, I stopped mocking the Mono and made the method that I can mock return a real Mono value instead:
@Test
fun `get by orderid id, null`() = runBlocking {
// Assign
Mockito.`when`(fetchSpecMock.first()).thenReturn(Mono.empty())
Mockito.`when`(monoMapMock.awaitFirst()).thenReturn(null)
// Act & Assert
val exception = assertThrows<ResourceNotFoundException> {
auroraClientWrapper.queryExecution(
databaseClient.sql("SELECT * FROM orderTable WHERE orderId=:1").bind("1", "123") // orderId
)
}
assertEquals("Resource not found in Aurora DB", exception.message)
}

How to execute multiple sql query parallel in java

I have 3 methods, each of which executes an SQL query and returns a List of results. I want to execute all 3 methods in parallel so that one does not have to wait for another to complete. I saw a Stack Overflow post about this, but it's not working for me.
The posts are "How to execute multiple queries in parallel instead of sequentially?" and
"Execute multiple queries in parallel via Streams".
I want to solve this using Java 8 features.
Please tell me how I can call multiple methods with the approach from the posts above.

"Execute multiple queries in parallel via Streams" works for your task. Here is sample code that demonstrates it:
public static void main(String[] args) {
// Create Stream of tasks:
Stream<Supplier<List<String>>> tasks = Stream.of(
() -> getServerListFromDB(),
() -> getAppListFromDB(),
() -> getUserFromDB());
List<List<String>> lists = tasks
// Supply all the tasks for execution and collect CompletableFutures
.map(CompletableFuture::supplyAsync).collect(Collectors.toList())
// Join all the CompletableFutures to gather the results
.stream()
.map(CompletableFuture::join).collect(Collectors.toList());
System.out.println(lists);
}
private static List<String> getUserFromDB() {
try {
TimeUnit.SECONDS.sleep((long) (Math.random() * 3));
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " getUser");
return Arrays.asList("User1", "User2", "User3");
}
private static List<String> getAppListFromDB() {
try {
TimeUnit.SECONDS.sleep((long) (Math.random() * 3));
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " getAppList");
return Arrays.asList("App1", "App2", "App3");
}
private static List<String> getServerListFromDB() {
try {
TimeUnit.SECONDS.sleep((long) (Math.random() * 3));
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " getServer");
return Arrays.asList("Server1", "Server2", "Server3");
}
The output is:
ForkJoinPool.commonPool-worker-1 getServer
ForkJoinPool.commonPool-worker-3 getUser
ForkJoinPool.commonPool-worker-2 getAppList
[[Server1, Server2, Server3], [App1, App2, App3], [User1, User2, User3]]
You can see that the default ForkJoinPool.commonPool is used and that each get* method is executed on a separate thread of this pool. You just need to run your SQL queries inside these get* methods.
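Since SQL calls block, it may be preferable to run them on a dedicated pool rather than on the common pool. The following is a minimal sketch of the same start-all-then-join pattern using the supplyAsync overload that takes an Executor. ParallelQueriesSketch, runAll and slowQuery are hypothetical names chosen for this example, and slowQuery only simulates a blocking query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class ParallelQueriesSketch {
    // Starts every task before waiting for any of them, then joins in order.
    static <T> List<T> runAll(List<Supplier<T>> tasks, ExecutorService pool) {
        List<CompletableFuture<T>> futures = tasks.stream()
                .map(task -> CompletableFuture.supplyAsync(task, pool))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for a blocking SQL call.
    static String slowQuery(String name) {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Supplier<String>> tasks = Arrays.asList(
                    () -> slowQuery("servers"),
                    () -> slowQuery("apps"),
                    () -> slowQuery("users"));
            long start = System.nanoTime();
            List<String> results = runAll(tasks, pool);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(results + " in about " + elapsedMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

The results come back in the order the tasks were listed, because the futures are joined in that order. Collecting the futures into a list before joining is what lets the tasks overlap; joining inside the same lazy stream pipeline that creates them would run them one at a time.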

The solution from Alex won't work as expected. CompletableFuture executes effects one after another. You will have to run parallel streams for this, or use CompletableFuture with parallelStream.

Boost::Thread: Removing a thread from a dynamic group?

Consider this context:
A group of threads is doing some work (the work is an infinite loop; this is an embedded project), and the number of threads (and some of their parameters) depends on a database query result.
What I need is to remove threads from, or add threads to, that group when there's a change in the database.
Here is the code:
for (result::const_iterator pin = pinesBBB.begin(); pin != pinesBBB.end(); ++pin)
{
    string pinStr = pin["pin"].as<string>();
    // thread_group takes ownership of the pointer, so the thread must be heap-allocated
    Worker.add_thread(new boost::thread(boost::bind(WorkPin, pinStr)));
}
Here result is pqxx::result from the pqxx library.
This piece of code iterates over the rows of an SQL query result and creates a thread for every record found.
After that, there's this code, which checks the same table every couple of minutes:
void ThreadWorker(boost::thread_group *worker, string *pinesLocales)
{
    int threadsVivosInt = worker->size();
    std::vector<string> pinesDB; // pins currently present in the database
    for (;;)
    {
        sleep(60);
        try
        {
            pinesDB.clear();
            result pinesBBB = TraerPines();
            for (result::const_iterator pin = pinesBBB.begin(); pin != pinesBBB.end(); ++pin)
            {
                pinesDB.push_back(pin["pin"].as<string>());
            }
            // This is where I am stuck: I need to get hold of the thread to remove
            // thread hiloMuerto
        }
        catch (...)
        {
            sleep(360);
        }
    }
}
What I want to do is access this thread_group worker and remove one of those threads.
I've tried using an int index, like worker[0], and the thread's ID, boost::thread::id.
I can stop a thread using its native_handle and a platform-specific call like pthread_cancel, but I can't get the thread from the thread group.
Any ideas? Thanks!

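boost::thread_group::remove_thread() removes the specified thread from a given thread_group. It takes a boost::thread* rather than an index or an ID, so keep the pointers you pass to add_thread() (or get back from create_thread()). Once you've removed a thread, you are responsible for managing it, including joining and deleting it.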

MySQL Asynchronous?

I'm basically facing a blocking problem.
My server is written in C++ with Boost.Asio and uses 8 threads, since the machine has 8 logical cores.
My problem is that a thread may block for 0.2~1.5 seconds on a MySQL query, and I honestly don't know how to work around that, since MySQL Connector/C++ does not support asynchronous queries and I don't know how to design the server "correctly" to use multiple threads for the queries.
This is where I'm asking for opinions on what to do in this case.
Create 100 threads for asynchronous SQL queries?
Could I have an opinion from experts about this?

Okay, the proper solution to this would be to extend Asio and write a mysql_service implementation to integrate this. I was almost going to find out how this is done right away, but I wanted to get started using an "emulation".
The idea is to have
your business processes using an io_service (as you are already doing)
a database "facade" interface that dispatches async queries into a different queue (io_service) and posts the completion handler back onto the business_process io_service
A subtle tweak is needed here: you need to keep the io_service on the business process side from shutting down as soon as its job queue is empty, since it might still be awaiting a response from the database layer.
So, modeling this into a quick demo:
namespace database
{
// data types
struct sql_statement { std::string dml; };
struct sql_response { std::string echo_dml; }; // TODO cover response codes, resultset data etc.
I hope you will forgive my gross simplifications :/
struct service
{
service(unsigned max_concurrent_requests = 10)
: work(io_service::work(service_)),
latency(mt19937(), uniform_int<int>(200, 1500)) // random 0.2 ~ 1.5s
{
for (unsigned i = 0; i < max_concurrent_requests; ++i)
svc_threads.create_thread(boost::bind(&io_service::run, &service_));
}
friend struct connection;
private:
void async_query(io_service& external, sql_statement query, boost::function<void(sql_response response)> completion_handler)
{
service_.post(bind(&service::do_async_query, this, ref(external), std::move(query), completion_handler));
}
void do_async_query(io_service& external, sql_statement q, boost::function<void(sql_response response)> completion_handler)
{
this_thread::sleep_for(chrono::milliseconds(latency())); // simulate the latency of a db-roundtrip
external.post(bind(completion_handler, sql_response { q.dml }));
}
io_service service_;
thread_group svc_threads; // note the order of declaration
optional<io_service::work> work;
// for random delay
random::variate_generator<mt19937, uniform_int<int> > latency;
};
The service is what coordinates a maximum number of concurrent requests (on the "database io_service" side) and ping/pongs the completion back onto another io_service (the async_query/do_async_query combo). This stub implementation emulates latencies of 0.2~1.5s in the obvious way :)
Now comes the client "facade"
struct connection
{
connection(int connection_id, io_service& external, service& svc)
: connection_id(connection_id),
external_(external),
db_service_(svc)
{ }
void async_query(sql_statement query, boost::function<void(sql_response response)> completion_handler)
{
db_service_.async_query(external_, std::move(query), completion_handler);
}
private:
int connection_id;
io_service& external_;
service& db_service_;
};
} // namespace database
connection is really only a convenience so we don't have to explicitly deal with various queues on the calling site.
Now, let's implement a demo business process in good old Asio style:
namespace domain
{
struct business_process : id_generator
{
business_process(io_service& app_service, database::service& db_service_)
: id(generate_id()), phase(0),
in_progress(io_service::work(app_service)),
db(id, app_service, db_service_)
{
app_service.post([=] { start_select(); });
}
private:
int id, phase;
optional<io_service::work> in_progress;
database::connection db;
void start_select() {
db.async_query({ "select * from tasks where completed = false" }, [=] (database::sql_response r) { handle_db_response(r); });
}
void handle_db_response(database::sql_response r) {
if (phase++ < 4)
{
if ((id + phase) % 3 == 0) // vary the behaviour slightly
{
db.async_query({ "insert into tasks (text, completed) values ('hello', false)" }, [=] (database::sql_response r) { handle_db_response(r); });
} else
{
db.async_query({ "update tasks set text = 'update' where id = 123" }, [=] (database::sql_response r) { handle_db_response(r); });
}
} else
{
in_progress.reset();
lock_guard<mutex> lk(console_mx);
std::cout << "business_process " << id << " has completed its work\n";
}
}
};
}
This business process starts by posting itself on the app service. It then does a number of db queries in succession, and eventually exits (by doing in_progress.reset() the app service is made aware of this).
A demonstration main, starting 10 business processes on a single thread:
int main()
{
io_service app;
database::service db;
ptr_vector<domain::business_process> bps;
for (int i = 0; i < 10; ++i)
{
bps.push_back(new domain::business_process(app, db));
}
app.run();
}
In my sample, the business_processes don't do any CPU-intensive work, so there's no use in scheduling them across CPUs, but if you wanted to, you could easily achieve this by replacing the app.run() line with:
thread_group g;
for (unsigned i = 0; i < thread::hardware_concurrency(); ++i)
g.create_thread(boost::bind(&io_service::run, &app));
g.join_all();
See the demo running Live On Coliru

I'm not a MySQL guru, but the following is generic multithreading advice.
Having NumberOfThreads == NumberOfCores is appropriate when none of the threads ever block and you are just splitting the load over all CPUs.
A common pattern is to have multiple threads per CPU, so one is executing while another is waiting on something.
In your case, I'd be inclined to set NumberOfThreads = n * NumberOfCores, where 'n' is read from a config file, a registry entry or some other user-settable value. You can test the system with different values of 'n' to find the optimum. I'd suggest somewhere around 3 for a first guess.

MongoDB C++ driver handling replica set connection failures

So the mongo c++ documentation says
On a failover situation, expect at least one operation to return an
error (throw an exception) before the failover is complete. Operations
are not retried
Kind of annoying, but that leaves it up to me to handle a failed operation. Ideally I would just like the application to sleep for a few seconds (the app is single-threaded) and retry, in the hope that a new primary mongod has been established. In the case of a second failure, I take it the connection is truly messed up and I just want to throw an exception.
Within my MongodbManager class this means all operations have this kind of double try/catch block set up. I was wondering if there is a more elegant solution?
Example method:
template <typename T>
std::string
MongoManager::insert(std::string ns, T object)
{
mongo::BSONObj oo = convertToBson(object);
std::string result;
try {
connection_->insert(ns, oo); //connection_ = shared_ptr<DBClientReplicaSet>
result = connection_->getLastError();
lastOpSucceeded_ = true;
}
catch (mongo::SocketException& ex)
{
lastOpSucceeded_ = false;
boost::this_thread::sleep( boost::posix_time::seconds(5) );
}
// try again?
if (!lastOpSucceeded_) {
try {
connection_->insert(ns, oo);
result = connection_->getLastError();
lastOpSucceeded_ = true;
}
catch (mongo::SocketException& ex)
{
//do some clean up, throw exception
}
}
return result;
}

That's indeed more or less how you need to handle it. Instead of having two try/catch blocks, though, I would use the following strategy:
keep a count of how many times you have tried
create a while loop with the condition (count < 5 && !lastOpSucceeded)
and sleep for a duration of pow(2, count), so that the wait grows with every iteration.
And then, when all else fails, bail out.
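The strategy above does not depend on the driver or the language. As an illustration only, here is a small Java sketch of a bounded retry loop with exponential backoff; RetrySketch, retry and the simulated failing operation are hypothetical names for this example, and the same loop structure carries over to the C++ insert method in the question.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
    // Runs op up to maxAttempts times, doubling the delay after each failure.
    static <T> T retry(Callable<T> op, int maxAttempts, long baseDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt will follow: base, 2*base, 4*base, ...
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(baseDelayMs << attempt);
                }
            }
        }
        // All attempts failed: bail out with the last error.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that fails twice, as during a failover, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("primary not elected yet");
            }
            return "inserted";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Wrapping the operation in a callable keeps the retry logic in one place, so each manager method only has to pass in the operation instead of repeating the try/catch blocks.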
