Failed to do checkpoint - C++

int iReturn = sqlite3_wal_checkpoint_v2(m_poDB, NULL, SQLITE_CHECKPOINT_FULL, &iSizeOfWalLog, &iNumOfCheckpointedFrames);
The call returns with iReturn = 5 (SQLITE_BUSY). The writer wakes up now and then, adds or deletes a number of rows in the database, does a checkpoint and goes to sleep again.
Question 1: How is that possible if I use WAL mode and have 4 readers and one writer?
Question 2: In the log messages I can see that checkpointing usually works and only sometimes reports SQLITE_BUSY. Should I be concerned if it works sometimes but not always? Can this corrupt the database?
Question 3: Should I not use sqlite3_wal_checkpoint_v2 or SQLITE_CHECKPOINT_FULL?

A full checkpoint requires that there are no concurrent readers or writers, so it returns SQLITE_BUSY whenever one of them happens to be active at that moment.
You could try increasing your busy timeout, but since you run the checkpoint regularly anyway, you can get away with ignoring individual failures; a checkpoint that comes back with SQLITE_BUSY simply leaves the WAL in place for the next attempt and does not corrupt the database.
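For what it's worth, here is a minimal sketch of that approach, assuming the connection handle from the question is passed in as poDB and an occasional SQLITE_BUSY is treated as a non-fatal, retry-next-time condition:

#include <sqlite3.h>
#include <cstdio>

// Sketch: let the checkpoint wait up to 2 seconds for readers/writers to clear,
// and log anything other than SQLITE_OK / SQLITE_BUSY.
void TryCheckpoint(sqlite3 *poDB)
{
    sqlite3_busy_timeout(poDB, 2000);  // busy handler waits up to 2000 ms

    int iSizeOfWalLog = 0;
    int iNumOfCheckpointedFrames = 0;
    int iReturn = sqlite3_wal_checkpoint_v2(poDB, NULL, SQLITE_CHECKPOINT_FULL,
                                            &iSizeOfWalLog, &iNumOfCheckpointedFrames);
    if (iReturn == SQLITE_BUSY) {
        // A reader or writer was still active; the WAL file stays in place and
        // the next scheduled checkpoint will catch up. Nothing is corrupted.
    } else if (iReturn != SQLITE_OK) {
        fprintf(stderr, "checkpoint failed: %s\n", sqlite3_errmsg(poDB));
    }
}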

Related

ESP32/FreeRTOS: how to stop the currently running task when a new one is triggered (avoiding overlap)

I'm working on code that controls a 2-module relay for door access. I'm looking for a way to stop the currently running task before starting a new instance of the same task. All I want is to avoid overlap.
void TaskOpenManRoom(void *parameter) {
    Serial.println("Opening men room");
    digitalWrite(manRelay, LOW);
    vTaskDelay(6000 / portTICK_PERIOD_MS);
    digitalWrite(manRelay, HIGH);
    Serial.println("Closing men room");
    vTaskDelete(NULL);
}

xTaskCreate(
    TaskOpenManRoom,
    "TaskOpenManRoom",
    1000,
    (void *) &man,
    1,
    &TaskMen
);
My goal is to extend the time the door stays open. So basically, when the first task has been triggered and then, some time later, the second one, the door should stay open for another 6000 ms.
In my current code, when the second task is triggered somewhere in the middle of the first one, the door gets closed because the first task calls digitalWrite(manRelay, HIGH);
I would appreciate a hint on how I can kill the first task when the second one is triggered.
Tasks are meant to be long-running, because they are relatively heavyweight. Don't start and end a task for each user action, and don't delay tasks for extended periods of time.
You don't need a task at all for this functionality; you just need a timer that performs the closing action after 6000 ms. You can then reset it whenever you need to.
TimerHandle_t closeManRoomTimer;

void OpenManRoom() {
    xTimerReset(closeManRoomTimer, 100); // <------ (re)arm the close timer
    Serial.println("Opening men room");
    digitalWrite(manRelay, LOW);
}

void CloseManRoom(TimerHandle_t) {
    Serial.println("Closing men room");
    digitalWrite(manRelay, HIGH);
}

// during program startup, setup a one-shot close timer
closeManRoomTimer = xTimerCreate("closeManRoomTimer", pdMS_TO_TICKS(6000), pdFALSE, 0, &CloseManRoom);
I would not kill the first task when the second starts.
If you use a task at all, I'd rewrite the task along these general lines:
cast parameter to pointer to uint32
atomic increment open count, and if it was zero {
    open the door
    repeat {
        sleep six seconds
    } atomic decrement count, and exit loop if it was 1
    close the door
}
exit the task
...and when you create the task, pass a pointer to a uint32_t for it to use to store the open count.
So the task starts by atomically incrementing the open count, which returns the value that was previously in the open count. If that was zero, it means the door is currently closed. In that case, we open it and go to sleep.
If the task runs again while the first instance is sleeping, the open count will now be one. We immediately increment it, but when we check the previous value, it was 1, so we don't try to open the door again--we just skip all the stuff in the if statement and exit the task.
When the first instance of the task wakes up, it decrements the count, and if it was 1, it exits the loop, closes the door, and exits the task. But if the task ran again while it was sleeping, the count will still be greater than 1, so it will stay in the loop and sleep some more.
This is open to a little bit of optimization. As it stands right now, it sleeps a fixed period of time (six seconds) even if the current open count is greater than 1. If the task is expensive enough to justify a little extra work, we could do an atomic exchange to retrieve the current open count and set it to 0, multiply the retrieved value by 6000, then sleep for that long. That adds quite a bit of extra complexity though, and in this case the benefit would be much too small to justify it.
This does depend on our not running the task more than 4 billion times while the door is open. If we did, our atomic increment would overflow, and the code would misbehave. For the case at hand (and most others) this is unlikely to be a problem. In the rare situation where it might be, the obvious fix is a 64-bit variable (and 64-bit atomic increment and decrement). Incrementing the variable until a 64-bit variable overflows is generally not a realistic possibility (e.g., if you incremented at 1 GHz, it would take centuries).
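If it helps, here is a rough C++ sketch of that pseudocode for the ESP32 Arduino environment. It uses std::atomic<uint32_t> for the shared open count instead of a plain uint32_t, and the relay pin and stack size are placeholders, not values from the question:

#include <Arduino.h>
#include <atomic>

const int manRelay = 5;                        // placeholder pin; use the one from your sketch
static std::atomic<uint32_t> manOpenCount{0};  // shared open count passed to every task instance

void TaskOpenManRoom(void *parameter) {
    auto *count = static_cast<std::atomic<uint32_t> *>(parameter);

    // fetch_add returns the value *before* the increment; zero means the door is closed.
    if (count->fetch_add(1) == 0) {
        Serial.println("Opening men room");
        digitalWrite(manRelay, LOW);

        // Sleep six seconds at a time until this is the last pending "open".
        do {
            vTaskDelay(pdMS_TO_TICKS(6000));
        } while (count->fetch_sub(1) != 1);

        Serial.println("Closing men room");
        digitalWrite(manRelay, HIGH);
    }
    vTaskDelete(NULL);  // every instance deletes itself on the way out
}

// From the trigger handler, start another instance each time the door is requested:
// xTaskCreate(TaskOpenManRoom, "TaskOpenManRoom", 2048, &manOpenCount, 1, NULL);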
There are many ways:
Use vTaskDelay, which puts the task into the not-running (Blocked) state without busy-waiting.
Wait for a mutex, semaphore, queue, or task notification from another task.
"I would appreciate a hint on how I can kill the first task when the second one is triggered."
This will kill the current task:
vTaskDelete(NULL);

Is there a way to set an expiration time for a Django cache lock?

I have a Django 3.1.3 server that uses Redis for its cache via django-redis 4.12.1. I know that cache locks can generally be set via the following:
with cache.lock('my_cache_lock_key'):
    # Execute some logic here, such as:
    cache.set('some_key', 'Hello world', 3000)
Generally, the cache lock releases when the with block completes execution. However, I have some custom logic in my code that sometimes does not release the cache lock (which is fine for my own reasons).
My question: is there a way to set a timeout value for Django cache locks, much in the same way as you can set timeouts for setting cache values (cache.set('some_key', 'Hello world', 3000))?
I've answered my own question. The following arguments are available for cache.lock():
def lock(
    self,
    key,
    version=None,
    timeout=None,
    sleep=0.1,
    blocking_timeout=None,
    client=None,
    thread_local=True,
):
Cross-referencing that with the comments in the Python Redis source, which uses the same arguments:
``timeout`` indicates a maximum life for the lock.
By default, it will remain locked until release() is called.
``timeout`` can be specified as a float or integer, both representing
the number of seconds to wait.
``sleep`` indicates the amount of time to sleep per loop iteration
when the lock is in blocking mode and another client is currently
holding the lock.
``blocking`` indicates whether calling ``acquire`` should block until
the lock has been acquired or to fail immediately, causing ``acquire``
to return False and the lock not being acquired. Defaults to True.
Note this value can be overridden by passing a ``blocking``
argument to ``acquire``.
``blocking_timeout`` indicates the maximum amount of time in seconds to
spend trying to acquire the lock. A value of ``None`` indicates
continue trying forever. ``blocking_timeout`` can be specified as a
float or integer, both representing the number of seconds to wait.
Therefore, to give a cache lock a maximum lifetime of 2 seconds, do something like this:
with cache.lock(key='my_cache_lock_key', timeout=2):
    # Execute some logic here, such as:
    cache.set('some_key', 'Hello world', 3000)

Query on boost interprocess::file_lock on NFS

We have an application which users can run to generate some data at a user-specified path. This unique output data is generated with respect to one unique input data-set; the input data is provided by the user.
When we initially developed the application, we never anticipated that the number of unique input data-sets would be large (due to the nature of the application). Our expectation was that the number of unique input data-sets would be of the order of 10, whereas one user has 1000. So that particular user started 1000 jobs of our application on the grid, all writing data to the same path. Note: these 1000 jobs are not fired from within our application; rather, he spawned 1000 processes of our application on different machines.
This led to some collisions and data loss.
To guard against this, I am planning to add synchronization using boost::interprocess. This is what I am planning:
// usual processing of input data ...

boost::filesystem::path reportLockFilePath(boost::filesystem::system_complete(userDir));
reportLockFilePath.append("report.lock");

// if the lock file does not exist, create it
if (!boost::filesystem::exists(reportLockFilePath)) {
    boost::interprocess::named_mutex reportLockMutex(boost::interprocess::open_or_create, "report_mutex");
    boost::interprocess::scoped_lock< boost::interprocess::named_mutex > lock(reportLockMutex);
    std::ofstream lockStrm(reportLockFilePath.string().c_str());
    lockStrm << "## report lock file ##" << std::endl;
    lockStrm.flush();
}

boost::interprocess::file_lock reportFileLock(reportLockFilePath.string().c_str());
boost::interprocess::scoped_lock< boost::interprocess::file_lock > lock(reportFileLock);

// usual reporting code that we already have ...
Now, the questions are:
Is this the correct synchronization for the problem at hand?
Will this synchronization scheme work when the jobs run on different machines and the path is on NFS?
If this is not going to work on NFS etc., what are the C++ alternatives? I would prefer to avoid lower-level C functions, so as to avoid race conditions due to a lock being held when one instance of the application crashes, etc.
I just removed the named mutex part (as it was causing problems on a few machines due to a permission issue - probably related to the umask issue discussed in this context in some other post) and replaced it with
std::ofstream lockStrm(reportLockFilePath.string().c_str(), std::ios_base::app);
And it worked, at least in our internal testing.
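For reference, a condensed sketch of what that final arrangement might look like; the writeReport wrapper and the userDir parameter are just for illustration:

#include <fstream>
#include <boost/filesystem.hpp>
#include <boost/interprocess/sync/file_lock.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

void writeReport(const boost::filesystem::path &userDir)
{
    boost::filesystem::path reportLockFilePath(boost::filesystem::system_complete(userDir));
    reportLockFilePath /= "report.lock";

    // Opening in append mode creates the file if it is missing and never truncates
    // it, so the separate existence check and the named mutex are no longer needed.
    std::ofstream lockStrm(reportLockFilePath.string().c_str(), std::ios_base::app);
    lockStrm << "## report lock file ##" << std::endl;
    lockStrm.flush();

    boost::interprocess::file_lock reportFileLock(reportLockFilePath.string().c_str());
    boost::interprocess::scoped_lock<boost::interprocess::file_lock> lock(reportFileLock);

    // usual reporting code runs here while the lock is held ...
}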

While loop implementation in Pentaho Kettle

I need guidance on implementing a WHILE loop with Kettle/PDI. The scenario is:
(1) I have some data (maybe thousands, or thousands of thousands, of rows) in a table, to be validated against a remote server.
(2) Read them and look them up on the remote server; I use the Modified Java Script step for this, as the remote-server lookup validation is defined in an external Java JAR file (I can use the "Change number of copies to start..." option on the Modified Java Script step and set it to 5 or 10).
(3) Update the result in the database table. There will be 50 to 60% connection-failure cases in each session.
(4) Repeat steps 1 to 3 until everything is updated to success.
(5) Stop looping on the Nth cycle; this is to avoid very long or infinite looping. N may be 5 or 10.
How do I design such a WHILE loop in Pentaho Kettle?
Have you seen this link? It gives a pretty detailed explanation of how to implement a while loop.
You need a parent job with a sub-transformation that checks the condition and returns a variable to the job indicating whether to abort or to continue.

Debugging livelock in Django/Postgresql

I run a moderately popular web app on Django with Apache2, mod_python, and PostgreSQL 8.3 with the postgresql_psycopg2 database backend. I'm experiencing occasional livelock, identifiable when an apache2 process continually consumes 99% of CPU for several minutes or more.
I did an strace -ppid on the apache2 process, and found that it was continually repeating these system calls:
sendto(25, "Q\0\0\0SSELECT (1) AS \"a\" FROM \"account_profile\" WHERE \"account_profile\".\"id\" = 66201 \0", 84, 0, NULL, 0) = 84
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=25, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recvfrom(25, "E\0\0\0\210SERROR\0C25P02\0Mcurrent transaction is aborted, commands ignored until end of transaction block\0Fpostgres.c\0L906\0Rexec_simple_query\0\0Z\0\0\0\5E", 16384, 0, NULL, NULL) = 143
This exact fragment repeats continually in the trace, and had been running for over 10 minutes before I finally killed the apache2 process. (Note: I edited this to replace my previous strace fragment with a new one that shows the full string contents rather than truncated ones.)
My interpretation of the above is that django is attempting to do an existence check on my table account_profile, but at some earlier point (before I started the trace) something went wrong (SQL parse error? referential integrity or uniqueness constraint violation? who knows?), and now Postgresql is returning the error "current transaction is aborted". For some reason, instead of raising an Exception and giving up, it just keeps retrying.
One possibility is that this is being triggered in a call to Profile.objects.get_or_create. This is the model class that maps to the account_profile table. Perhaps there is something in get_or_create that is designed to catch too broad a set of exceptions and retry? From the web server logs, it appears that this livelock might have occurred as a result of a double-click on the POST button in my site's registration form.
This condition has occurred a couple of times over the past few days on the live site, and results in a significant slowdown until I intervene, so pretty much anything other than infinite deadlock would be an improvement! :)
This turned out to be entirely my fault. I found the spot where the select (1) as 'a' statement seemed to originate (in django/models/base.py) and hacked it to log a traceback, which pointed clearly at my code.
I had some code that makes up a unique email "key" for each Profile. These keys are randomly generated, so because there is some possibility of overlap, I run it in a try/except within a while loop. My assumption was that the database's unique constraint would cause the save to fail if the key was not unique, and I'd be able to try again.
Unfortunately, in Postgresql you cannot simply try again after an integrity error. You have to issue a COMMIT or ROLLBACK command (even if you're in autocommit mode, apparently) before you can try again. So I had an infinite loop of failing save attempts where I was ignoring the error message.
Now I look for a more specific exception (django.db.IntegrityError) and run a limited number of attempts so that the loop is not infinite.
Thanks to everyone for viewing/answering.
Your analysis sounds pretty good. Clearly it's not picking up the fact that the transaction is aborted. I suggest you report this as a bug to the django project...