In my code I apply the same filtering to multiple input files. In the first version of the code I created an AVFilterGraph for every input, but I suspect this is excessive.
However, when I try to reuse the same graph, I get an error when sending a frame to the abuffer filter. On the previous iteration over the input files I passed EOF to it for flushing, and the av_buffersrc_add_frame function has a check for this:
BufferSourceContext *s = ctx->priv;
...
if (s->eof)
return AVERROR(EINVAL);
which makes the call fail on the second iteration.
Unfortunately, I couldn't find any function that resets the buffer filter or anything like that.
I would like to know whether libavfilter supports reusing a filter graph once it has been created, or whether feeding input after EOF reflects a fundamental misunderstanding of FFmpeg's design.
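For reference, the first version of my code followed roughly this pattern - a minimal sketch, with hypothetical stream parameters, a hypothetical next_frame() callback supplying each file's decoded frames, and the actual filters elided:
#include <libavfilter/avfilter.h>
#include <libavfilter/buffersrc.h>
#include <libavfilter/buffersink.h>
#include <libavutil/frame.h>

/* One graph per input file: build, run, flush with EOF, then free.
   next_frame() is a hypothetical callback returning the file's decoded
   frames (NULL once the file is exhausted). */
static int filter_one_file(AVFrame *(*next_frame)(void *), void *opaque)
{
    AVFilterGraph *graph = avfilter_graph_alloc();
    AVFilterContext *src = NULL, *sink = NULL;
    AVFrame *out = av_frame_alloc();
    AVFrame *in;
    int ret;

    ret = avfilter_graph_create_filter(&src, avfilter_get_by_name("abuffer"), "in",
            "time_base=1/44100:sample_rate=44100:sample_fmt=fltp:channel_layout=stereo",
            NULL, graph);
    if (ret >= 0)
        ret = avfilter_graph_create_filter(&sink, avfilter_get_by_name("abuffersink"),
                                           "out", NULL, NULL, graph);
    if (ret >= 0)
        ret = avfilter_link(src, 0, sink, 0); /* the real filters go between these */
    if (ret >= 0)
        ret = avfilter_graph_config(graph, NULL);

    while (ret >= 0 && (in = next_frame(opaque)) != NULL) {
        ret = av_buffersrc_add_frame(src, in);
        while (ret >= 0 && av_buffersink_get_frame(sink, out) >= 0)
            av_frame_unref(out); /* consume the filtered frame here */
    }
    if (ret >= 0)
        ret = av_buffersrc_add_frame(src, NULL); /* EOF flushes the graph... */
    while (ret >= 0 && av_buffersink_get_frame(sink, out) >= 0)
        av_frame_unref(out); /* ...and the remaining frames are drained */

    av_frame_free(&out);
    avfilter_graph_free(&graph); /* the source rejects any input after EOF */
    return ret < 0 ? ret : 0;
}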
Thank you!
UPDATE: I had each node write to a separate file, and when those separate files were concatenated together the result was correct. I also updated the code to attempt a channel flush and file sync after each write of a single record, but there are still issues, now between nodes 0 and 1. If I make Node 0 sleep for a few seconds before it starts its iteration of the coforall loop, the records come out correct. If not, the last few hundred bytes of Node 0's records seem to be reliably overwritten with NULL bytes, up to the start of Node 1's records. The issues between Nodes 1 and 2, and Nodes 2 and 3, no longer show up.
Additionally, if I suppress either Node 0 or Node 1 from writing, I see the fully-formed records from the un-suppressed node written correctly to the file. When Node 1 is suppressed, I see 9,997 100-byte records (999,700 correct bytes) followed by NULL bytes where Node 1's suppressed records would go. When Node 0 is suppressed, I see exactly 999,700 NULL bytes, after which Node 1's records begin.
Original Post:
I'm trying to troubleshoot an issue with parallel writes from different nodes to a shared NFS-backed file on disk. At the moment, I suspect that something is wrong with the way writes to the disk happen on the NFS server.
I'm working on adapting MPI+C code that uses pwrite to write coordinated chunks of a file. If I try to have the equivalent locales in Chapel write to the file inside a coforall loop, I end up with the parts of the file around the node boundaries messed up - usually the final few hundred bytes of each node's data are garbled. However, if I have just one locale iterate through the data on all locales and write it, the data comes out correctly. That is, I use the same data structures to calculate the offsets, but only Locale 0 seeks to each offset and performs the writes.
I've verified that the offsets into the file that each locale writes to do not overlap, and I'm using a single channel per task, defined within the on loc do block, so that tasks don't share a channel.
Are there known issues with writing to a file from different locales? A lot of the documentation makes it seem like this is safe, but my unsubstantiated guess is that there is an issue with caching of file contents: when examining the incorrect data, the incorrect bytes appear to be the file's original contents at that location from the beginning of the program.
I've included the relevant routine below, in case you easily spot something I missed. To make this serial, I convert the coforall loc in Locales and on loc do block into a for j in 0..numLocales-1 loop, and replace here.id with j. Please let me know what else would help get to the bottom of this. Thanks!
proc write_share_of_data(data_filename: string, ref record_blocks) throws {
  coforall loc in Locales {
    on loc do {
      var data_file: file = open(data_filename, iomode.cwr);
      var data_writer = data_file.writer(kind=ionative, locking=false);
      var line: [1..100] uint(8);
      const indices = record_blocks[here.id].D;
      var local_record_offset = + reduce record_blocks[0..here.id-1].D.size;
      writeln("Loc ", here.id, ": record offset is ", local_record_offset);
      var local_bytes_offset = terarec.terarec_width_disk * local_record_offset;
      data_writer.seek(start=local_bytes_offset);
      for i in indices {
        var write_rec: terarec_t = record_blocks[here.id].records[i];
        line[1..10] = write_rec.key;
        line[11..98] = write_rec.value;
        line[99] = 13; // \r
        line[100] = 10; // \n
        data_writer.write(line);
        lines_written += 1;
      }
      data_file.fsync();
      data_writer.close();
      data_file.close();
    }
  }
  return;
}
Adding an answer here that solved my particular problem, though it doesn't explain the behavior seen. I ended up changing the outer loop from coforall loc in Locales to for loc in Locales. This isn't too big an issue since everything is writing to one file anyway - I doubt multiple locales can make much headway all attempting to write concurrently to a single file on an NFS server. The change still allows each node to write the data it has locally to NFS, rather than forcing Node 0 to collect and then write the data on behalf of all locales. It only adds idle time to the write operation commensurate with the time it takes Locale 0 to start the remote task on the next node once the previous node has finished writing, which for the application at hand is not a concern.
Have you tried specifying start/end in file.writer instead of using seek? Does that change anything? What about specifying the end offset for the channel.seek call? Does it matter if the file is created and has the appropriate size before you start?
Other than that, I wonder whether this issue would appear for both NFS and Lustre. If it appears for both, it might well be a Chapel bug. It sounds from your description that the C program was using this pattern successfully, which points to it being a bug. But have you run the C code doing this on your setup? If a Chapel bug seems most likely after further investigation, we would appreciate a bug report with a reproducer.
I know that NFS does not always do what one would like, in terms of data consistency. It's my understanding that it has "close to open" semantics but it's unclear to me what that means in the context of opening a file and writing to a particular region within it, in parallel from different locales.
From Why NFS Sucks by Olaf Kirch:
An NFS client is permitted to cache changes locally and send them to the server whenever it sees fit. This sort of lazy write-back greatly helps write performance, but the flip side is that everyone else will be blissfully unaware of these changes before they hit the server. To make things just a little harder, there is also no requirement for a client to transmit its cached writes in any particular fashion, so dirty pages can (and often will be) written out in random order.
I read two implications from this paragraph that are relevant to your situation here:
1. The writes you do on different locales can be observed by the NFS server in an arbitrary order. (However, as I understand it, the data should be sent to the server by the time your fsync call returns.)
2. These writes are done at OS page granularity (usually 4 KiB). (Note that this is more a hypothesis I am making than it is a fact. It should be tested or further investigated.)
It would be interesting to check whether 2 is a plausible explanation for the behavior you are seeing. For example, you could explore having each locale operate on a multiple of 4096 records (or potentially try writing records of 4096 bytes each) and see if that changes the behavior. If 2 is indeed the explanation, it should be possible to create a C program that demonstrates the behavior as well; a sketch of such a reproducer follows.
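A minimal sketch - hypothetical and untested: run the same binary concurrently on two different NFS client machines, one with offset 0 and one with offset 999700, then inspect the bytes around the boundary:
// Each NFS client writes one contiguous, non-page-aligned region of a
// shared file, mirroring the 9,997 x 100B records per locale above.
// Usage: ./pwrite_region <offset> <fill-char>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <offset> <fill-char>\n", argv[0]);
        return 1;
    }
    const off_t offset = std::atoll(argv[1]);
    const ssize_t len = 999700;               // 9,997 records x 100 bytes
    std::vector<char> buf(len, argv[2][0]);
    int fd = open("shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { std::perror("open"); return 1; }
    if (pwrite(fd, buf.data(), len, offset) != len)
        std::perror("pwrite");
    fsync(fd);   // force the client to push its cached pages to the server
    close(fd);
    return 0;
}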
I have a Halide pipeline which takes in an image and applies some filters to it. It works fine for a single pass: I pass in an image, and I get the processed image as the output. What I would like to do is implement multiple passes, i.e., keep passing the output image back into the pipeline until some condition on the image is met.
How do I do this in Halide? The only other way I could think of doing it was having an external C++ method that invokes the pipeline in a loop. I was hoping this is possible natively in Halide.
The code looks something like this. I want to consume the output as input in a loop until some condition is met.
int main(int argc, char **argv) {
    Buffer<uint8_t> input = Tools::load_and_convert_image("img.png");
    int W = input.width();
    int H = input.height();
    Buffer<uint8_t> output(W, H);
    // after some Funcs. res is the last Func of the pipeline
    res.realize(output);
    Tools::convert_and_save_image(output, "out.png");
    return 0;
}
You can run the pipeline a static number of times by unrolling it into a larger pipeline. If what you are computing is expressible as a single stage, you can run it a dynamic but unconditional number of times by using an RDom to iterate over successive applications. However, you cannot express any loops with a dynamic termination condition (what you might think of as a while loop). That you have to do in the calling C++ code.
You've hit on quite an insightful question. This is a conscious constraint of the language: it's the difference between loop bounds being decidable and becoming Turing complete. Future versions of the language may extend this in targeted ways, but it's currently quite fundamental to how the compiler can infer all the bounds in your program.
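For the while-loop case, the calling code might look roughly like this - a sketch that assumes the pipeline reads its input from an ImageParam named input, and condition_met is a hypothetical host-side predicate:
// Ping-pong between two buffers, feeding each output back in as the
// next input until the (hypothetical) condition is satisfied.
Buffer<uint8_t> current = Tools::load_and_convert_image("img.png");
Buffer<uint8_t> next(current.width(), current.height());
while (!condition_met(current)) {
    input.set(current);        // input is assumed to be an ImageParam
    res.realize(next);
    std::swap(current, next);  // the output becomes the next input
}
Tools::convert_and_save_image(current, "out.png");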
Using a printer that doesn't exist, I send different files to the spooler. In my software, I try to get all the files in the spooler's queue. For that, I tried the following instruction:
BOOL t = EnumJobs(hPrinter, 0, 1, 3, (LPBYTE)&h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned);
I get the job ID in the JobId field of the structure.
In the JOB_INFO_3 structure, the JobId field is filled in correctly, but the NextJobId field is not. Why?
It's the same problem when I execute the following instruction:
BOOL t = EnumJobs(hPrinter, 0, 3, 3, (LPBYTE)&h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned);
Moreover, in this case the JobId field is not filled either. Why?
Also, I don't know how to get info (filename, state, number of pages, etc.) about a particular job. I tried the following instruction, but it didn't work:
GetJobA(hPrinter, h.JobId, 1, (LPBYTE)&job_info_1, sizeof(JOB_INFO_1), &nbBytes);
And my last question is: Is it possible to get all the jobs from the spooler of the printer?
Do you have any solutions?
So, I'm not sure what the rest of your code looks like, but it looks like you may not be using the API quite correctly. The MSDN documentation suggests that you should call the EnumJobs API twice.
To determine the required buffer size, call EnumJobs with cbBuf set to zero. EnumJobs fails, GetLastError returns ERROR_INSUFFICIENT_BUFFER, and the pcbNeeded parameter returns the size, in bytes, of the buffer required to hold the array of structures and their data.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
The flow goes like this (a sketch follows the list):
Call EnumJobs for the first time to see how much memory needs to be allocated for your JOB_INFO_n array.
Allocate the memory required for your JOB_INFO_n array.
Call EnumJobs with your JOB_INFO_n array.
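A minimal sketch of that pattern, assuming an already-opened hPrinter, level 1, and a Unicode build (error handling trimmed; needs windows.h and winspool.h, linked against winspool.lib):
// First call with cbBuf = 0 to learn the required size, then allocate
// and call again to fetch up to all jobs in the queue.
DWORD needed = 0, returned = 0;
EnumJobs(hPrinter, 0, 0xFFFFFFFF, 1, nullptr, 0, &needed, &returned);
std::vector<BYTE> buffer(needed);
if (EnumJobs(hPrinter, 0, 0xFFFFFFFF, 1,
             buffer.data(), needed, &needed, &returned)) {
    const JOB_INFO_1 *jobs = reinterpret_cast<const JOB_INFO_1 *>(buffer.data());
    for (DWORD i = 0; i < returned; ++i)
        wprintf(L"Job %u: %s (%u pages)\n",
                jobs[i].JobId, jobs[i].pDocument, jobs[i].TotalPages);
}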
Looking at the call to EnumJobs where you attempt to get the first three jobs, the size of your pJob appears to be sizeof(JOB_INFO_3), where it should be three times that size in order to hold all three jobs. What does EnumJobs return for that call?
The reason why NextJobId is not filled in is likely a misunderstanding of the field. It is for print jobs that have been linked together, not for finding out which print job is next in the queue.
NextJobId - The print job identifier for the next print job in the linked set of print jobs.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd145021(v=vs.85).aspx
As for the information about the print job, this is going to be difficult. Unfortunately, there is no way I know of to get the name/path of the file printed. There's no concept of this in the spooler APIs. Consider a print job which isn't backed by a file for example. The best you get is the print job name, which is set by the printing application.
For pages, it looks like there is a TotalPages field in the JOB_INFO_1 structure, which may be of some use to you. It looks like you're already trying to get the JOB_INFO_1 structure but having some trouble. If the API is failing, you can use GetLastError() to identify what the issue is. Does the job ID you passed in exist?
https://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx
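The same two-call pattern works for GetJob - a sketch, with jobId standing in for an ID returned by EnumJobs:
// Query the required size, allocate, then fetch the JOB_INFO_1 record.
DWORD needed = 0;
GetJob(hPrinter, jobId, 1, nullptr, 0, &needed);
std::vector<BYTE> buf(needed);
if (GetJob(hPrinter, jobId, 1, buf.data(), needed, &needed)) {
    const JOB_INFO_1 *ji = reinterpret_cast<const JOB_INFO_1 *>(buf.data());
    wprintf(L"TotalPages: %u\n", ji->TotalPages);
} else {
    wprintf(L"GetJob failed: %lu\n", GetLastError());
}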
For the last question, about getting all print jobs from the queue, the MSDN documentation suggests the following:
To determine the number of print jobs in the printer queue, call the GetPrinter function with the Level parameter set to 2.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
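Something along these lines - again a sketch with error handling trimmed:
// Query PRINTER_INFO_2 and read its cJobs member (the queue length).
DWORD needed = 0;
GetPrinter(hPrinter, 2, nullptr, 0, &needed);
std::vector<BYTE> info(needed);
if (GetPrinter(hPrinter, 2, info.data(), needed, &needed)) {
    const PRINTER_INFO_2 *pi = reinterpret_cast<const PRINTER_INFO_2 *>(info.data());
    wprintf(L"%u job(s) currently in the queue\n", pi->cJobs);
}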
Hope this helps.
I'm using DirectSound's DirectSoundBuffer to play voice data that I stream over a network. The nature of voice/speech is that it doesn't necessarily stream constantly (it starts and stops), and it can come in varying packet sizes (which makes it difficult to predict where the buffer should stop). I keep a StoppingPoint variable that tracks where in my buffer I have written up to (taking into account the circular nature of the buffer). If the StoppingPoint is reached, I would like to stop playing the buffer. StoppingPoint also signifies the point from which I'd like to start writing next.
My buffer
|==============================|--------------------------|
^                              ^                          ^
|                              |                          |
Voice Data                     Stopping point             Old/Garbage
                                                          data
Also, I can't use notifications, because in order to use a notification the buffer must be stopped, and in my case it is very likely for more data to come in while the buffer is playing, pushing back my StoppingPoint value.
What I currently have is a function that is called every frame. Among other things, it checks where the Play cursor is. If the Play cursor has passed the StoppingPoint location, it stops the buffer and moves the Play cursor back to the StoppingPoint location. This seems to work OK so far, but as you'd expect, the Play cursor quite often overshoots the StoppingPoint, which means a little bit of old/garbage data is played every time the end of the streamed data is reached.
I'm just curious whether there is a way to stop playing a DirectSoundBuffer at a specific offset. What I would like is to write data to the buffer, play it, and have it stop precisely at the location described by my StoppingPoint variable, without overshooting.
Note: I haven't included any code because this is more of a high-level solution that I need. My implementation is straightforward and typical, and for the most part it DOES work. I just need a nudge in the right direction to remove the overshooting of my StoppingPoint. Perhaps there is a function I can use, or some other algorithm commonly used to achieve this?
I've come up with a solution, but I'm still interested in any feedback or alternative solutions. I'm not sure if the way I've done it is ok, but it seems to be producing the desired results. Although, my solution feels a little too convoluted...
1) Whenever I write data, I now write at either the buffer's Write cursor or the StoppingPoint - whichever is later. This avoids stopping and then later writing into the "untouchable" space between the Play cursor and the Write cursor, whose data has already been committed to playback.
DWORD writeCursorOffset = 0;
buffer->GetCurrentPosition(NULL, &writeCursorOffset);
//if write cursor has passed stopping point, then we need to write from there.
//So update m_StoppingPoint to reflect the new writing position.
m_StoppingPoint = m_StoppingPoint > writeCursorOffset ? m_StoppingPoint : writeCursorOffset;
2) I added some silence after every single write, but I left StoppingPoint pointing at the end of the actual voice data. E.g.
|==============================|*********|---------------------|
^                              ^         ^                     ^
|                              |         |                     |
Voice data                     Stopping  Silence               Old/Garbage
                               point                           data
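Filling that region with silence looks roughly like this - a sketch; note that 0 is silence for 16-bit PCM, while 8-bit PCM uses 0x80:
// Lock SILENCE_BUFFER_SIZE bytes starting at the stopping point and
// zero them; the second pointer covers the circular-buffer wrap-around.
void *part1, *part2;
DWORD size1, size2;
if (SUCCEEDED(buffer->Lock(m_StoppingPoint, SILENCE_BUFFER_SIZE,
                           &part1, &size1, &part2, &size2, 0))) {
    memset(part1, 0, size1);
    if (part2) memset(part2, 0, size2);
    buffer->Unlock(part1, size1, part2, size2);
}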
3) If the buffer's Play cursor passes the StoppingPoint, I stop playing the buffer. Even if the Play cursor overshoots here, all it will play is silence.
//error checking removed for demonstration purposes
buffer->Stop();
4) Immediately after stopping, I update StoppingPoint to the end of the silence. This ensures that when more speech data comes in, the buffer will not play any silence first.
//don't forget that modulo at the end - circular buffer!
m_StoppingPoint = (m_StoppingPoint + SILENCE_BUFFER_SIZE) % BufferSize;
|==============================|*********|-------------------|
                                         ^
                                         |
                                         Move StoppingPoint here
Again, if I've done anything glaringly evil here, please let me know!
I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about.
Here is an example of what I want to do:
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    //Method to be profiled with these data
    //Post operation on the data
}
My use-case is a regression test: I want to make sure that the method in question is still fast enough (something like less than 10% extra instructions since the last implementation).
This is why I'd like to have the cleanest possible output from Callgrind.
(I need the for loop in order to process a significant amount of data, to get a good estimate of the behavior of the method I want to profile.)
My first try was to change the code to:
#include <valgrind/callgrind.h>

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    //Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
adding the Callgrind macros to control the instrumentation. I also added the --instr-atstart=no option to be sure that I profile only the part of the code I want...
Unfortunately, with this configuration, when I launch my executable with Callgrind, it never ends... It is not just slowness, because a fully instrumented run lasts less than one minute.
I also tried
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
(or the --toggle-collect="myMethod" option)
But Callgrind returned a log without any calls (KCachegrind is white as snow :( and says zero instructions...)
Did I use the macros/options correctly? Any idea of what I need to change in order to get the expected result?
I finally managed to solve this issue... It was a configuration issue:
I kept the code
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
but ran Callgrind with --collect-atstart=no (and without --instr-atstart=no!!!), and it worked perfectly, in a reasonable time (~1 min).
The issue with START/STOP instrumentation was that Callgrind dumps a file (callgrind.out.#number) at each iteration (each STOP), which made it really, really slow (after 5 minutes I had only 5,000 runs of a 300,000-iteration benchmark... unsuitable for a regression test).
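So the working invocation is simply (binary name hypothetical):
$ valgrind --tool=callgrind --collect-atstart=no ./myBenchmark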
The --toggle-collect option is very picky about how you specify the method to use as the trigger. You actually need to specify its argument list as well, and even the whitespace needs to match! Use the method name exactly as it appears in the Callgrind output. For instance, I am using this invocation:
$ valgrind \
    --tool=callgrind \
    --collect-atstart=no \
    "--toggle-collect=ctrl_simulate(float, int)" \
    ./swaag
Please observe:
The double quotes around the option.
The argument list including parentheses.
The whitespace after the comma character.