Pin Allocation in Job Scheduling example in OptaPlanner - scheduling

I'm trying to pin an Allocation's start date in the Job Scheduling example of OptaPlanner. The issue comes from the fact that Allocations are defined by their delay from the previous Allocation. If I add the @PlanningPin annotation, it locks the delay but not the start date of the Allocation, because the start dates of the previous ones can still change. Is there an elegant way of doing this?
For now I added a boolean pinned to the Allocation class which reduces the hard score by as much as the allocation is out of sync with the pinned start date (for example, -5hard if the allocation's start date is 5 time units later than the pinned start date). It works well for simple examples, but I don't know whether it's a proper solution, and it doesn't guarantee that the allocation won't move, since the pinning is only driven by score calculation.
I hope I made myself understandable. Thank you.

Related

CPLEX: freeing model (resources) takes immense time

I am solving a MIP and have built a corresponding CPLEX IloModel. My implementation follows this pseudo-code:
model = IloModel( env );
//Build optimization model
//Configure CPLEX Solver
//Solve model
//Do some solution-statistics
model.end();
Everything works fine, I get correct solutions, et cetera. Now, I would like to automate solving a lot of different instances sequentially.
However, here I ran into a problem: the bigger my instances, the longer freeing resources with model.end() takes. For my small instances (using up to 500 MB of RAM) it already takes dozens of minutes, for medium-sized instances (using up to 2 GB of RAM) it takes hours, and I never measured how long it takes for my large instances (using up to 32 GB of RAM), as I always killed the process manually after it had not finished over a whole night's wait. So freeing resources takes significantly longer than building the model or solving it within my specified time limits. While model.end() runs, the CPU usage stays at roughly 100%.
Is this expected behaviour? Have I missed something in how I implement my model or free its resources, such that it takes this excessive amount of time?
I really want to avoid automating the sequential solves by killing the CPLEX solve process after a specified time threshold.
Thank you!
EDIT:
I can circumvent the problem by calling env.end() (which takes <1 s even for large models) instead of model.end(). As I do not reuse the environment for now, that is OK for me. However, I wonder what is happening here; from what I gathered from the docs, freeing the resources allocated for the model is a subtask of freeing the whole environment.
I'm guessing, but did you terminate the solver before terminating the model? The solver is using the model, so it is notified about its changes. It could be that model.end() is not optimized: as it frees constraints one by one, the solver is notified about each individual change, updates its own data structures, and so on.
In other words, I think that calling cplex.end() before model.end() may solve the issue.
If you can, then it is always best to call env.end() after each solve. As you noticed, it is faster: it is easier to free all the resources at once, since there is no need to check whether a particular resource is still needed (e.g. a variable could be used by multiple models). It is also safer, since a new model starts from scratch and the risk of a memory leak is minimized.
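For illustration, a minimal sketch of that teardown order with the Concert Technology C++ API; the time-limit value and the omitted model-building code are placeholders, and the exact parameter name may differ between CPLEX versions:
#include <ilcplex/ilocplex.h>
#include <iostream>

int main() {
    IloEnv env;
    try {
        IloModel model(env);
        // ... build the optimization model (omitted) ...
        IloCplex cplex(model);
        // Time limit in seconds; the parameter name may vary between CPLEX versions.
        cplex.setParam(IloCplex::Param::TimeLimit, 3600.0);
        cplex.solve();
        // ... collect solution statistics ...
        cplex.end();   // end the solver before the model, so that model.end() is not
        model.end();   // notifying a live solver about every constraint it removes
    } catch (IloException &e) {
        std::cerr << "CPLEX error: " << e.getMessage() << std::endl;
    }
    env.end();         // or simply rely on env.end() alone to free everything at once
    return 0;
}
When solving many instances in sequence, this whole block would sit inside a per-instance loop, with a fresh IloEnv per instance so env.end() can release everything cheaply.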

SQLite C++ API Transactions slow

I have a problem with SQLite. It seems that every call takes ~300 ms to execute. After some testing I noticed that the delay is caused by transactions: 8 normal inserts with implicit transactions take about 2 seconds, but if I start a transaction before the inserts and commit it after, I can do almost a million inserts in the same time. Affected calls include DROP TABLE, CREATE TABLE, INSERT, and I assume others too (probably all that implicitly begin a transaction).
Some more info:
Downloaded the source amalgamation from the SQLite website (3200100, i.e. version 3.20.1)
Compiled it using Visual Studio into a static library (Not using any compiler flags, although I have been playing around with them without luck)
I am using sqlite3_open16 followed by sqlite3_prepare16_v3 and then sqlite3_step to start execution and/or receive the first result
No multithreading, no access from multiple processes, database file is exclusively opened by this program
If I create the file on my SSD (960 EVO) instead, the "transaction delay" goes from 300 ms down to 10 ms. That is still an absurdly high value, though, and I feel like the speed of my disk shouldn't influence whatever is slowing the transactions down.
The function that is blocking is sqlite3_step. (It also annoys me that I have to call a function with that name just to execute a DROP TABLE, for example, but that's beside the point.)
Edit: During the transaction, the CPU usage is 100%.
On a side note, is it possible to "help" SQLite with organizing data if you know that every single row of your table will be exactly, say, 64 bytes?
I hope you can help me with this or possibly recommend an alternative (relational, C++ API, file-based, highly performant).
Thank you very much!
SQLite makes lots of effort to ensure it doesn't suffer data corruption, so with an implicit transaction, you are limited by your hard disk speed.
With an explicit transaction, the data is written to other locations and only committed to disk once, which is much faster.
From the SQLite documentation on speed:
With synchronization turned on, SQLite executes an fsync() system call (or the equivalent) at key points to make certain that critical data has actually been written to the disk drive surface.
When using a transaction, the data is written to other files, and only when all the data is committed is the fsync cost paid, once for the whole batch. That is the price of this part of the configuration. On the positive side, I have never suffered SQLite data loss through corruption.
I feel like the speed of my disk shouldn't influence whatever is slowing the transactions down?
This is an important trade-off. If you want improved data integrity, then the speed of your disk is relevant.
How long does committing a transaction take?
From the SQLite FAQ (#19, on why transactions seem slow):
SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.
You can:
Use transactions to batch more work: the cost is per transaction, so many statements can share it (see the sketch after this list).
Use temporary tables. Temporary tables do not suffer this penalty and run at full speed.
NOT RECOMMENDED: use PRAGMA synchronous=OFF to disable synchronous writes.
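To make the first suggestion concrete, here is a minimal sketch of batching inserts inside one explicit transaction with the C API; error handling is trimmed and the table name is made up:
#include <sqlite3.h>

// Sketch: wrap many inserts in one explicit transaction so the fsync() cost is paid once.
bool bulkInsert(sqlite3 *db, int count) {
    if (sqlite3_exec(db, "BEGIN TRANSACTION;", nullptr, nullptr, nullptr) != SQLITE_OK)
        return false;

    sqlite3_stmt *stmt = nullptr;
    // "samples(value)" is an illustrative table, not from the question.
    if (sqlite3_prepare_v2(db, "INSERT INTO samples(value) VALUES (?);", -1, &stmt, nullptr) != SQLITE_OK)
        return false;

    for (int i = 0; i < count; ++i) {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_step(stmt);        // executes one insert; no per-row durable write here
        sqlite3_reset(stmt);       // reuse the prepared statement
    }
    sqlite3_finalize(stmt);

    // The single COMMIT is where the durable write (and fsync) happens.
    return sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr) == SQLITE_OK;
}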

Profiling algorithm

I need to implement execution time measuring functionality. I thought about two possibilities.
The first is a regular time() call: just remember the time when each execution step starts and when it completes. The Unix time shell command works this way.
The second method is sampling. Each execution step sets some sort of flag before execution begins (for example, it creates an object in the stack frame) and destroys it when it completes. A timer periodically scans all flags and generates an execution-time profile. If some execution step takes more time than the others, it will be scanned more often. Many profilers work like this.
I need to add some profiling functionality to my server application; which method is better and why? I think the second method is less accurate, and the first method adds a dependency on profiling-library code.
The second method is essentially stack sampling.
You can try to do it yourself, by means of some kind of entry/exit event capture, but it is better if there is a utility that actually reads the stack.
The latter has the advantage that you get line-of-code resolution, rather than just method-level resolution.
There's something that a lot of people don't get about this, which is that precision of timing measurement is far less important than precision of problem identification.
It is important to take samples even during I/O or other blocking, so you are not blind to needless I/O. If you are worried that competition with other processes could inflate the time, don't be, because what really matters is not absolute time measurements, but percentages.
For example, if one line of code is on the stack 50% of the wall-clock time, and thus responsible for that much, getting rid of it would double the speed of the app, regardless of whatever else is going on.
Profiling is about more than just getting samples.
Often people are pretty casual about what they do with them, but that's where the money is.
First, inclusive time is the fraction of time a method or line of code is on the stack. Forget "self" time - it's included in inclusive time.
Forget invocation counting - its relation to inclusive percent is, at best, very indirect.
If you are summarizing, the best way to do it is to have a "butterfly view" whose focus is on a single line of code.
To its left and right are the lines of code appearing immediately above it and below it on the stack samples.
Next to each line of code is a percent - the percent of stack samples containing that line of code.
(And don't worry about recursion. It's simply not an issue.)
Even better than any kind of summary is to just let the user see a small random selection of the stack samples themselves.
That way, the user can get the whole picture of why the time captured in each snapshot was being spent.
Any avoidable activity appearing on more than one sample is a chance for some serious speedup, guaranteed.
People often think "Well, that could just be a fluke, not a real bottleneck".
Not so. Fixing it will pay off, maybe a little, maybe a lot, but on average - significant.
People should not be ruled by risk-aversion.
More on all that.
When Boost is an option, you can use its timer library.
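For example, a minimal sketch with Boost.Timer (assuming a Boost version that ships boost/timer/timer.hpp; the measured work is a dummy loop):
#include <boost/timer/timer.hpp>
#include <cmath>
#include <iostream>

// Dummy workload standing in for the code being measured.
static double workToMeasure() {
    double acc = 0.0;
    for (int i = 1; i < 1000000; ++i)
        acc += std::sqrt(static_cast<double>(i));
    return acc;
}

int main() {
    boost::timer::cpu_timer timer;      // starts timing on construction
    volatile double result = workToMeasure();
    (void)result;
    timer.stop();
    // format() reports wall-clock time plus user and system CPU time.
    std::cout << timer.format() << std::endl;
    return 0;
}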
Make sure that you really know what you're looking for in the profiler you're writing. Whenever you collect the total execution time of a certain piece of code, it will include time spent in all its children, and it may be hard to find the real bottleneck in your system, since the most top-level function will always bubble up as the most expensive one - for instance main().
What I would suggest is to hook into every function's prologue and epilogue (if your application is a CLR application, you can use ICorProfilerInfo::SetEnterLeaveFunctionHooks to do that; you can also use macros at the beginning of every method, or any other mechanism that injects your code at the beginning and end of every function) and collect your times in the form of a tree for each thread that you're profiling.
The algorithm would look something like this:
For each thread that you're monitoring create a stack-like data structure.
Whenever you're notified about a function that began execution, push something that identifies the function onto that stack.
If that function is not the only function on the stack, then you know that the previous function that did not return yet was the one that called your function.
Keep track of those caller-callee relationships in your favorite data structure.
Whenever a method returns, its identifier will always be on top of its thread's stack. Its total execution time is (current time - time when its identifier was pushed onto the stack). Pop that identifier off the stack.
This way you'll have a tree-like breakdown of what eats up your execution time, where you can see which child calls account for the total execution time of a function (a minimal sketch of the bookkeeping follows below).
Have fun!
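For illustration, a minimal sketch of the per-thread bookkeeping described above, using std::chrono and thread_local; the onEnter/onLeave hooks are placeholders for whatever enter/leave injection mechanism you choose:
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

using ProfClock = std::chrono::steady_clock;

struct Frame {
    std::string name;            // identifies the function
    ProfClock::time_point start; // when it was pushed
};

// Per-thread call stack and accumulated inclusive times.
thread_local std::vector<Frame> g_callStack;
thread_local std::unordered_map<std::string, ProfClock::duration> g_inclusiveTime;
thread_local std::unordered_map<std::string, std::string> g_lastCaller;  // callee -> most recent caller

// Call on function entry (from a prologue hook, a macro, etc.).
void onEnter(const std::string &name) {
    if (!g_callStack.empty())
        g_lastCaller[name] = g_callStack.back().name;  // record the caller/callee relationship
    g_callStack.push_back({name, ProfClock::now()});
}

// Call on function exit; the exiting function is always on top of the stack.
void onLeave() {
    const Frame frame = g_callStack.back();
    g_callStack.pop_back();
    g_inclusiveTime[frame.name] += ProfClock::now() - frame.start;
}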
In my profiler I used an extended version of the first approach you mentioned.
I have a class which provides context objects. You can define them in your code as automatic objects, freed as soon as execution flow leaves the context where they were defined (for example, a function or a loop). The constructor calls GetTickCount (it was a Windows project; you can choose an analogous function for your target platform) and stores this value, while the destructor calls GetTickCount again and calculates the difference between that moment and the start. Each object has a unique context ID (it can be autogenerated as a static object inside the same context), so the profiler can sum up all timings with the same ID, which means the same context has been passed through several times. The number of executions is counted as well.
Here is a macro for preprocessor, which helps to profile a function:
#define _PROFILEFUNC_ static ProfilerLocator locator(__FUNCTION__); ProfilerObject obj(locator);
When I want to profile a function, I just insert _PROFILEFUNC_ at the beginning of the function. This generates a static locator object which identifies the context and stores its name, taken from the function name (you may decide on another naming scheme). Then an automatic ProfilerObject is created on the stack; it "traces" its own creation and deletion, reporting them to the profiler.
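The ProfilerLocator / ProfilerObject classes themselves are not shown in the post, so the following is only a guess at how such a pair could look, with std::chrono standing in for GetTickCount:
#include <chrono>
#include <cstdint>
#include <string>

// One static instance per profiled context; accumulates totals for that context.
class ProfilerLocator {
public:
    explicit ProfilerLocator(const char *name) : name_(name) {}
    void report(std::chrono::milliseconds elapsed) {
        total_ += elapsed;        // sum of all timings for this context ID
        ++hits_;                  // number of executions
        // A real profiler would also register itself with a central report generator.
    }
    const std::string &name() const { return name_; }
private:
    std::string name_;
    std::chrono::milliseconds total_{0};
    std::uint64_t hits_ = 0;
};

// Automatic object: measures the time between its construction and destruction.
class ProfilerObject {
public:
    explicit ProfilerObject(ProfilerLocator &locator)
        : locator_(locator), start_(std::chrono::steady_clock::now()) {}
    ~ProfilerObject() {
        locator_.report(std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start_));
    }
private:
    ProfilerLocator &locator_;
    std::chrono::steady_clock::time_point start_;
};

// Same macro as in the post, built on the pair above.
#define _PROFILEFUNC_ static ProfilerLocator locator(__FUNCTION__); ProfilerObject obj(locator);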

Processing instrument capture data

I have an instrument that produces a stream of data; my code accesses this data through a callback onDataAcquisitionEvent(const InstrumentOutput &data). The data-processing algorithm is potentially much slower than the rate of data arrival, so I cannot hope to process every single piece of data (and I don't have to), but I would like to process as many as possible. Think of the instrument as an environmental sensor with a data-acquisition rate that I don't control. InstrumentOutput could, for example, be a class that contains three simultaneous pressure measurements from different locations.
I also need to keep a short history of the data. Assume, for example, that I can reasonably hope to process a sample of data every 200 ms or so. Most of the time I would be happy processing just the latest sample, but occasionally I would need to look at a couple of seconds' worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in it.
The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.
The data acquisition library (third party) collects the instrument data on a separate thread.
I thought of the following design: have a single-producer/single-consumer queue and push data tokens into the synchronized queue in the onDataAcquisitionEvent() callback.
On the receiving end, there is a loop that pops the data from the queue. The loop will almost never sleep because of the high rate of data arrival. On each iteration, the following happens (a minimal sketch follows the list):
Pop all the available data from the queue,
The popped data is copied into a circular buffer (I use boost::circular_buffer), so some history is always available,
Process the last element in the buffer (and potentially look at the prior ones),
Repeat the loop.
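For concreteness, here is a minimal sketch of the design just described, assuming a mutex-guarded std::deque as the synchronized queue and boost::circular_buffer for the history; processLatest() is a hypothetical placeholder for the analysis step:
#include <boost/circular_buffer.hpp>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

struct InstrumentOutput { double p1, p2, p3; };   // e.g. three simultaneous pressure readings

std::mutex g_mutex;
std::condition_variable g_cv;
std::deque<InstrumentOutput> g_queue;             // filled by the acquisition thread

// Hypothetical analysis step: looks at history.back(), and older samples if needed.
void processLatest(const boost::circular_buffer<InstrumentOutput> &history);

// Producer side: keep the callback as short as possible.
void onDataAcquisitionEvent(const InstrumentOutput &data) {
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_queue.push_back(data);
    }
    g_cv.notify_one();
}

// Consumer side: drain the queue, append to the history, process the latest sample.
void consumerLoop() {
    boost::circular_buffer<InstrumentOutput> history(1000);  // a few seconds of samples
    std::vector<InstrumentOutput> batch;
    for (;;) {
        {
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [] { return !g_queue.empty(); });
            batch.assign(g_queue.begin(), g_queue.end());     // pop everything available
            g_queue.clear();
        }
        for (const InstrumentOutput &sample : batch)
            history.push_back(sample);                        // old samples fall off the front
        processLatest(history);
    }
}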
Questions:
Is this design sound, and what are the pitfalls? and
What could be a better design?
Edit: One problem I thought of is when the size of the circular buffer is not large enough to hold the needed history; currently I simply reallocate the circular buffer, doubling its size. I hope I would only need to do that once or twice.
I have a bit of experience with data acquisition, and I can tell you a lot of developers have problems with premature feature creep. Because it sounds easy to simply capture data from the instrument into a log, folks tend to add unessential components to the system before verifying that logging is actually robust. This is a big mistake.
The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.
That's the only requirement until that part of the product is working 110% under all field conditions.
Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in the last sample.
"Most of the time" doesn't matter. Code for the worst case, because onDataAcquisitionEvent() can't be spending its time thinking about contingencies.
It sounds like you're falling into the pitfall of designing it to work with the best data that might be available, and leaving open what might happen if it's not available or if providing the best data to the monitor is ultimately too expensive.
Decimate the data at the source (see the sketch below). Specify how many samples will be needed for the abnormal-case processing, and attempt to provide that many, at a constant sample rate, plus a margin of maybe 20%.
There should certainly be no loops that never sleep. A circular buffer is fine, but just populate it with whatever minimum you need, and analyze it only as frequently as necessary.
The quality of the system is determined by its stability and determinism, not trying to go an extra mile and provide as much as possible.
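As an illustration of decimating at the source, a minimal sketch that forwards only every Nth sample from the callback; the sample type, the decimation factor, and enqueueForProcessing() are all placeholders:
#include <atomic>

struct InstrumentOutput { double p1, p2, p3; };   // placeholder sample type

// Hypothetical hand-off to the consumer thread (e.g. push to a queue).
void enqueueForProcessing(const InstrumentOutput &data) { (void)data; }

constexpr int kDecimation = 10;                   // tune so the consumer keeps up, plus ~20% margin
std::atomic<unsigned> g_sampleCounter{0};

// Forward only every Nth sample straight from the acquisition callback.
void onDataAcquisitionEvent(const InstrumentOutput &data) {
    if (g_sampleCounter.fetch_add(1) % kDecimation == 0)
        enqueueForProcessing(data);
}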
Your producer/consumer design is exactly the right design. In real-time systems we often also give different run-time priorities to the consuming threads, not sure this applies in your case.
Use a data structure that's basically a doubly-linked-list, so that if it grows you don't need to re-allocate everything, and you also have O(1) access to the samples you need.
If your memory isn't large enough to hold several seconds' worth of data (and it should be: one sample every 200 ms is 5 samples per second), then you need to see whether you can stand reading from auxiliary storage, but that is a throughput issue and, in your case, has nothing to do with your design and the requirement of "getting out of the callback as soon as possible".
Consider an implementation of the queue that does not need locking (remember: single reader and single writer only!), so that your callback doesn't stall (a minimal sketch follows below).
If your callback is really quick, consider disabling interrupts/giving it a high priority. May not be necessary if it can never block and has the right priority set.
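A minimal sketch of such a lock-free single-producer/single-consumer ring buffer (a common fixed-capacity pattern, not any particular library):
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer; holds at most Capacity - 1 elements.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    bool push(const T &item) {                // called only by the producer thread
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                     // queue full: drop or handle upstream
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T &out) {                        // called only by the consumer thread
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                     // queue empty
        out = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};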
Questions: (1) is this design sound, and what are the pitfalls, and (2) what could be a better design? Thanks.
Yes, it is sound. But for performance reasons, you should design the code so that it processes an array of input samples at each processing stage, instead of just a single sample at a time. This results in much more efficient code on current state-of-the-art CPUs.
The length of such an array (a chunk of data) is either fixed (simpler code) or variable (more flexible, but some processing may become more complicated).
As a second design choice, you should probably ignore the history at this architectural level and relegate that feature...
Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data [...]
Maybe tracking a history should be implemented only in that special part of the code that occasionally requires access to it, rather than being part of the "overall architecture". If so, it simplifies the overall processing.

Generate log after every 2 secs

Hi,
I have developed a library in C++ which keeps track of the new and delete operators and generates logs for them. Now I have to add one more piece of functionality: the library should write out the new/delete logs every 2 seconds, refreshing the log file each time, so that if the main program core dumps we still have some logs to track memory allocation. Any help would be appreciated.
Thanks in advance.
Just write to some buffer and store the timestamp of the last dump to disk; if it is more than 2 seconds old, dump the buffer again and reset the timestamp. But if you want this log for debugging even when a crash occurs, I assume you can lose vital information in those 2 seconds; maybe you could write every new/delete without the 2-second delay when running in debug mode.
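A minimal sketch of that idea, using std::chrono for the timestamps; the file name and record format are illustrative, and a real new/delete hook would have to avoid allocating inside the logger itself:
#include <chrono>
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

using LogClock = std::chrono::steady_clock;

std::mutex g_logMutex;
std::vector<std::string> g_pending;              // pending log lines
LogClock::time_point g_lastDump = LogClock::now();

// Called by the new/delete tracking code for every event.
// NOTE: a real hook must not re-enter operator new from here; this sketch ignores that.
void logAllocationEvent(const std::string &line) {
    std::lock_guard<std::mutex> lock(g_logMutex);
    g_pending.push_back(line);
    if (LogClock::now() - g_lastDump >= std::chrono::seconds(2)) {
        std::ofstream out("alloc.log", std::ios::app);   // illustrative file name
        for (const std::string &entry : g_pending)
            out << entry << '\n';
        out.flush();                                     // push the buffer to the OS before a possible crash
        g_pending.clear();
        g_lastDump = LogClock::now();
    }
}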
For a computer program, 2 seconds is an awfully long time and a lot of allocations/deallocations can happen in that time that don't get logged if the main program crashes.
A better alternative would be to log information about every allocation and deallocation to some persistent storage (a file for example). This might result in a huge amount of data being logged, so you should only enable/activate the feature when debugging a potentially memory-related problem, but it has the advantage that a core dump does not cause you to lose that much information (at most one buffer worth, if you are using buffered IO) and you can let some off-line analysis tools loose on the logs to locate potential problems for you (or just to filter out the majority of obviously correct allocation/deallocation pairs).
There are tools for verifying memory-access correctness, such as Valgrind. Have you looked at them yet? If not, you should: if you plan to do this sort of logging for every single allocation, it's going to slow your program down a lot, just like those already-written tools do. If you need something much more precisely targeted, then maybe writing your own is a good idea.