What does "max txn-txn-inflight limit reached" mean in QuestDB, and how do I avoid it?

I occasionally get "txn-txn-inflight limit reached [txn=251584, min=240384]" on servers when attempting to read data from (embedded) QuestDB.
It self-corrects after some time (minutes). What does it mean, and what can I do to avoid it?

Try removing the _txn_scoreboard file in the table's directory. This file has no meaning unless the process is running.
The contents of this file are used to indicate whether there is an active TableReader holding a view on a particular data transaction. When the Java process exits, TableReader instances returned to the pool clock down their transaction numbers to prevent a false-positive "reader holding transaction X".
If the Java process crashed or did not return a TableReader to the pool, the transaction numbers can appear to be in use the next time the application starts. The only workaround so far is to remove the _txn_scoreboard file.
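For example, with the server process stopped, and assuming a table named trades under an illustrative QuestDB root of /var/lib/questdb (adjust both to your setup):
rm /var/lib/questdb/db/trades/_txn_scoreboard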

Related

How to prevent procmail from crashing the platform and make it run one process at a time?

I have the problem that I capture emails and they arrive in bulk, and every time they arrive in bulk the platform crashes. The question is how to make procmail run the process one at a time; is that possible? Currently I filled up the entire procmail server with multiple processes running at once, plus the agents who were working on it, and the server died; we had to reboot and delete data from procmail to get it working again.
Once an email is captured, the process keeps working and spawning subprocesses.
This is the code:
SHELL = /bin/sh
LOGFILE = /var/log/procmail.log
LOGABSTRACT = "all"
VERBOSE = "on"
:0c
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
:0:
/var/log/platform_catchemail
If by "platform" you mean the PHP script, you can serialize access to it by using a lock file.
:0c:.catchemail.lock
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
This means: if the file .catchemail.lock does not exist in your $MAILDIR, go ahead and create it, and hold it for the duration of this recipe.
If it does exist, sleep and try again.
There is a failure scenario if the lock is held for too long. Procmail's default behavior in this case is to bounce the message (i.e. cause the delivering MTA to regard it as undeliverable and return an error message to the sender). You probably want to avoid that, ideally by telling the MTA to attempt delivery again at a later time. The precise mechanism depends on your MTA, but basically it comes down to setting a suitable exit code. What's feasible and scalable ultimately depends on how many messages you receive versus how many you can process under this constraint.
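As a hedged sketch of that last point, the following procmailrc additions use the standard LOCKSLEEP, LOCKTIMEOUT and EXITCODE variables (the values are illustrative, and the EXITCODE/HOST combination is a commonly cited procmail idiom for deferring a message):
# Retry the lock every 8 seconds; force-break a stale lock after 5 minutes
LOCKSLEEP=8
LOCKTIMEOUT=300
# Placed immediately after the catchemail recipe: the e flag fires only if
# that recipe failed. 75 is EX_TEMPFAIL from sysexits.h, which most MTAs
# (sendmail, Postfix) treat as "requeue and retry later" instead of a bounce.
:0 e
{
  EXITCODE=75
  HOST=bail.out  # any value that is not this host aborts the rcfile
}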

log4cpp stops working properly after sometime

I have a log4cpp implementation in a multi-process environment. The logger is configured once during initialization and then shared among forked processes which serve HTTP requests.
During the first minute or so, I see the log roll perfectly fine at the query-per-second load (say it runs at 100 qps).
After that, logging slows down dramatically. So I logged the pid as well and noticed that only one process gets to write to the log for a while (around 10-15 seconds), then another process starts writing, and so on and so forth. The processes don't die; they just don't get a chance to write.
This is different from what happens when the server starts. At that time, every other log line is written by a different process. (Also, I write one log line per process at the end of serving each request.)
At this point, I can't think of what could be going wrong.
This is how my log4cpp conf file looks:
log4cpp.rootCategory=DEBUG,rootAppender
log4cpp.appender.rootAppender=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.rootAppender.fileName=/tmp/mylogfile.log
log4cpp.appender.rootAppender.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.rootAppender.layout.ConversionPattern=%d|%p|%m%n
log4cpp.category.http.server.main=INFO,MAIN
log4cpp.additivity.http.server.main=false
log4cpp.appender.MAIN=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.MAIN.maxBackupIndex=10
log4cpp.appender.MAIN.maxFileAge=1
log4cpp.appender.MAIN.append=true
log4cpp.appender.MAIN.fileName=/tmp/mylogfile.log
log4cpp.appender.MAIN.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.MAIN.layout.ConversionPattern=%d|%p|%m%n
Edit: more updates. Thanks @Botje for your time.
I see that whenever a new child process is created, it is only that process that gets to write to the log. That tells me that all the references the other processes were holding become invalid.
I also tried setting the additivity property to true. With that, the server starts out properly writing into /tmp/mylogfile.log, then switches to writing into /tmp/mylogfile.log.1 within a minute, and then stops writing after another minute.
At that point the logs get redirected to stderr, which is directed to another log file.
Also, I did notice that the log4cpp FileAppender uses seek to determine the file size before writing log entries. If the file handle is shared between processes, that will cause writes to end up at the start of the file instead of the end. Even if you fix that, you still have multiple processes that each think they are in charge of log file rotation.
I suggest you have all processes write to a common UDP/TCP/Unix socket and designate one process that collects all log entries and actually writes them to a file. You don't have to reinvent the wheel: you can use the syslog protocol and either the system syslog or a copy running in userspace.
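A minimal sketch of that suggestion, assuming a POSIX system and the stock syslog daemon (the ident string "httpserver" and the message are illustrative):
// Each forked worker logs through the local syslog socket instead of
// appending to a shared file; only the syslog daemon touches the file,
// so rotation happens in exactly one place.
#include <syslog.h>

int main() {
    openlog("httpserver", LOG_PID, LOG_LOCAL0); // one connection per process

    // Safe from any number of workers: datagrams to /dev/log are atomic,
    // so lines from different pids never interleave mid-message.
    syslog(LOG_INFO, "request served in %d ms", 42);

    closelog();
    return 0;
}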

Oracle 12. Maximum duration for "select for update" for occi c++

We are using OCCI in order to access Oracle 12 from a C++ process. One of the operations has to ensure that the client picks the latest data in the database and operates according to the latest value. The statement is
std::string sqlStmt = "SELECT REF(a) FROM O_RECORD a WHERE G_ID= :1 AND P_STATUS IN (:2, :3) FOR UPDATE OF PL_STATUS";
(we are using TYPES). For some reason this command did not go through and the database table is LOCKED. All other operations are waiting for the first thread to finish; however, that thread was killed and we have reached a dead end.
What is the optimal solution to avoid this catastrophic scenario? Can I set a timeout on the statement in order to be 100% sure that a thread can hold the "select for update" lock for, say, a maximum of 10 seconds? In other words, the thread of execution can lock the database table/row, but for no more than a predefined time.
Is this possible?
There is a session parameter ddl_lock_timeout but no dml_lock_timeout, so you cannot go that way. Either you have to use
SELECT REF(a)
FROM O_RECORD a
WHERE G_ID= :1 AND P_STATUS IN (:2, :3)
FOR UPDATE OF PL_STATUS SKIP LOCKED
and modify the application logic, or you can implement your own interruption mechanism: simply fire a parallel thread and after some time execute OCIBreak. It is a documented and supported solution, and calling OCIBreak is thread safe. The blocked SELECT ... FOR UPDATE statement will be released and you will get the error ORA-01013: user requested cancel of current operation.
So on the OCCI level you will have to handle this error.
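A minimal sketch of that watchdog, assuming an already-created OCCI Environment and Connection (the helper name runWithTimeout and the plain sleep are illustrative; production code would wake the watchdog early with a condition variable instead of letting it sleep out the full timeout):
#include <occi.h>
#include <oci.h>
#include <atomic>
#include <chrono>
#include <thread>

// Executes stmt, cancelling it via OCIBreak if it is still blocked on the
// row lock when the timeout expires.
void runWithTimeout(oracle::occi::Environment *env,
                    oracle::occi::Connection *conn,
                    oracle::occi::Statement *stmt,
                    std::chrono::seconds timeout) {
    std::atomic<bool> done{false};
    std::thread watchdog([&] {
        std::this_thread::sleep_for(timeout);
        if (!done) {
            // OCIBreak is documented as thread safe, so it may be called
            // while the main thread is blocked inside executeQuery().
            OCIError *err = nullptr;
            OCIHandleAlloc(env->getOCIEnvironment(), (void **)&err,
                           OCI_HTYPE_ERROR, 0, nullptr);
            OCIBreak(conn->getOCIServiceContext(), err);
            OCIHandleFree(err, OCI_HTYPE_ERROR);
        }
    });
    try {
        stmt->executeQuery();          // blocks while the row is locked
        done = true;
    } catch (oracle::occi::SQLException &e) {
        done = true;
        watchdog.join();
        if (e.getErrorCode() != 1013)  // ORA-01013: cancelled by the watchdog
            throw;                     // anything else is a real error
        return;
    }
    watchdog.join();
}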
Edit: added the Resource Manager, which can impose an even more precise limitation, focused just on those sessions that are blocking others...
By means of the Resource Manager:
The Resource Manager allows the definition of more complex policies than those available with profiles, and in your case it is more suitable than the latter.
You have to define a plan and the groups of users associated with the plan, specify the policies associated with the plan/groups, and finally attach the users to the groups. To get an idea of how to do this, you can reuse the example on support.oracle.com (it is a bit too long to post here), replacing MAX_IDLE_TIME with MAX_IDLE_BLOCKER_TIME.
The core line would be
dbms_resource_manager.create_plan_directive(
  plan                  => 'TEST_PLAN',
  group_or_subplan      => 'my_limited_throttled_group',
  comment               => 'Limit blocking idle time to 300 seconds',
  max_idle_blocker_time => 300
);
By means of profiles:
You can limit the inactivity period of those sessions by specifying an IDLE_TIME.
CREATE PROFILE:
If a user exceeds the CONNECT_TIME or IDLE_TIME session resource limit, then the database rolls back the current transaction and ends the session. When the user process next issues a call, the database returns an error.
To do so, specify a profile with a maximum idle time, and apply it to just the relevant users (so you won't affect all users or applications):
CREATE PROFILE o_record_consumer
LIMIT IDLE_TIME 2; -- 2-minute timeout
ALTER USER the_record_consumer PROFILE o_record_consumer;
The drawback is that this setting is session-wide, so if the same session should be able to stay idle in the course of other operations, this policy will be enforced anyway.
Of interest...
Maybe you already know that the other sessions may coordinate their access to the same record in several ways:
FOR UPDATE WAIT x: if you append the WAIT x clause to your select for update statement, the waiting session will give up after "x" seconds have elapsed (the integer "x" must be hardcoded there, for instance the value "3"; a bind variable won't do, at least in Oracle 11gR2). See the example after this list.
SKIP LOCKED: if you append the SKIP LOCKED clause to your select for update statement, the select won't return the records that are locked (as ibre5041 already pointed out).
You may signal an additional session (a sort of watchdog) that your session is about to start the query and, upon successful execution, alert it about the completion. The watchdog session may implement its "kill-the-session-after-timeout" logic. You pay in added complexity but get the benefit of having the timeout applied to that specific statement, not to the session. To do so, see ORACLE-BASE - DBMS_PIPE or 3.2 DBMS_ALERT: Broadcasting Alerts to Users, by Steven Feuerstein, 1998.
Finally, it may be that you are attempting to implement a homemade queue infrastructure. In this case, bear in mind that Oracle already has its own queueing mechanics, called Advanced Queuing, and you may get a lot for very little by simply using it; see ORACLE-BASE - Oracle Advanced Queuing.
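For the first option, the statement from the question would become the following; when the three seconds elapse, the session gets ORA-30006: resource busy; acquire with WAIT timeout expired:
SELECT REF(a)
FROM O_RECORD a
WHERE G_ID= :1 AND P_STATUS IN (:2, :3)
FOR UPDATE OF PL_STATUS WAIT 3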

Safe way to cache PID to Port Mapping Windows

I'm using WinDivert to pipe connections (TCP and UDP) through a transparent proxy on Windows. This works by doing a port-to-PID lookup using functions like GetTcpTable2, then checking whether the PID matches the PID of the proxy or any of its child processes. If they don't match, the packets get forwarded through the proxy; if they do, the packets are left untouched.
My question is: is there a safe way, or a safe duration, for which I can cache the results of that port-to-PID lookup? Whenever a lot of packets flow through, say when watching a video on YouTube, the code using WinDivert suddenly chomps up all of my CPU, and I'm assuming this is from doing a TcpTable2 lookup on every packet received. I can see that with UDP there isn't really a safe duration for which I can assume it's the same process bound to a port, but is this possible with TCP?
As a complement to Luis's comment, I think that the application that caches the port-to-PID lookup could also keep a handle to each process (obtained through OpenProcess). The catch is that the resources associated with a process are not freed until all handles to it are closed. That is by design: as long as you hold a valid handle to a process, you can query the system for various information about it, such as memory usage or times. So you should periodically check whether the cached processes have terminated, purge their entries from the cache, and close the handles.
As an alternative, you could keep another piece of information, such as the start time of the process, which is accessible through GetProcessTimes. When a cache lookup finds a process id, open the process and check its start time: if it matches, it is the right process; if not, the process id has been reused and you should purge the entry from the cache.
The first way should be more efficient because you do not have to re-open the process for each packet, but you have to be stricter about identifying terminated processes to release their resources, maybe with a thread that uses WaitForMultipleObjectsEx on all the process handles to be alerted as soon as one terminates.
The second way should be simpler to implement.
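A minimal sketch of the second approach (the function names are illustrative; OpenProcess and GetProcessTimes are the real Win32 calls):
#include <windows.h>

// Returns the process creation time as a 64-bit FILETIME value,
// or 0 if the process cannot be opened (gone, or access denied).
ULONGLONG GetStartTime(DWORD pid) {
    HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!h) return 0;
    FILETIME created, exited, kernel, user;
    ULONGLONG t = 0;
    if (GetProcessTimes(h, &created, &exited, &kernel, &user))
        t = (ULONGLONG(created.dwHighDateTime) << 32) | created.dwLowDateTime;
    CloseHandle(h);
    return t;
}

// A cached (pid, startTime) pair is stale once the pid has been reused.
bool IsCachedPidStillValid(DWORD pid, ULONGLONG cachedStartTime) {
    return cachedStartTime != 0 && GetStartTime(pid) == cachedStartTime;
}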
So, all I ended up doing here was using two std::unordered_maps. One map stores the port number (as the key) and the last system time, in milliseconds, at which the TCP table was queried to find the process ID bound to that port. If the key doesn't exist, or more than 2 seconds have elapsed since the stored time, then a fresh query to the TCP table is needed to re-check the PID bound to the port. After that check, we update the second map, which also uses the port number as the key and returns the PID found for that port on the last query. This gives us a 2-second cache on lookups, which dropped peak CPU usage from well over 50% down to a max of 3%.
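A condensed sketch of that scheme (LookupPidFromTcpTable stands in for the GetTcpTable2-based scan and is assumed, not shown; a single-threaded packet loop is assumed, so no locking around the maps):
#include <cstdint>
#include <unordered_map>
#include <windows.h>

DWORD LookupPidFromTcpTable(uint16_t port); // assumed: walks MIB_TCPTABLE2

static std::unordered_map<uint16_t, ULONGLONG> lastQueryMs; // port -> last lookup time
static std::unordered_map<uint16_t, DWORD>     portToPid;   // port -> cached PID

DWORD CachedPortToPid(uint16_t port) {
    const ULONGLONG now = GetTickCount64();
    auto it = lastQueryMs.find(port);
    if (it == lastQueryMs.end() || now - it->second > 2000) {
        // Cache miss, or the entry is older than 2 seconds: requery the table.
        portToPid[port]   = LookupPidFromTcpTable(port);
        lastQueryMs[port] = now;
    }
    return portToPid[port];
}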

sqlite in c++ - parallel inserts from different applications, what will happen?

I'm opening the SQLite database file with sqlite3_open and inserting data with sqlite3_exec.
The file is a global log file, and many users are writing to it.
Now I wonder what happens if two different users with two different program instances try to insert data at the same time. Does opening fail for the second user? Or the inserting?
What will happen in this case?
Is there a way to handle this, if this scenario does not work, without a server-side database?
In most cases, yes. SQLite uses file locking, but it is broken on some systems; see http://www.sqlite.org/faq.html#q5
In short, the lock is created when you start a transaction and released immediately after it ends. While the database is locked, other instances can neither read nor write to it (in a "big" client/server database, they could still read). You can also connect to SQLite in exclusive mode.
When you want to write to a database which is locked by another process, execution can wait for a configurable busy timeout before giving up: if the lock is released in time, the write proceeds; if not, you get SQLITE_BUSY. Note that with the raw C API the default timeout is zero, i.e. the call fails immediately unless you set one via sqlite3_busy_timeout (many wrappers set it to around 5 seconds for you).
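A minimal sketch of the safe pattern in C++ (the file name and the log table are illustrative, and the table is assumed to already exist):
#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open("global_log.db", &db) != SQLITE_OK) return 1;

    // Without this, an INSERT into a database locked by another instance
    // fails immediately with SQLITE_BUSY; with it, SQLite retries for up
    // to 5 seconds before giving up.
    sqlite3_busy_timeout(db, 5000);

    char *errmsg = nullptr;
    int rc = sqlite3_exec(db,
        "INSERT INTO log(msg) VALUES ('hello from instance A');",
        nullptr, nullptr, &errmsg);
    if (rc == SQLITE_BUSY)
        std::fprintf(stderr, "still locked after 5 s: %s\n", errmsg);

    sqlite3_free(errmsg);
    sqlite3_close(db);
    return rc == SQLITE_OK ? 0 : 1;
}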