My Issue is the following:
Why can my program not re-attach to the shared memory map?
I do the following in my program(it might be easier to use the example from the boost page for you while this is only a small fragment from my program):
First time, initialize it:
m_sharedMemory = new managed_shared_memory(create_only, segmentName.c_str() , 1000000);
m_hashMap = m_sharedMemory->construct<MyHashMap>(segmentName.c_str())( 3, boost::hash<std::string>(), std::equal_to<std::string>() , m_sharedMemory->get_allocator<ValueType>());
Second time "Re-Attach"
m_sharedMemory = new managed_shared_memory(open_only, segmentName.c_str());
m_hashMap = m_sharedMemory->find<MyHashMap>(segmentName.c_str()).first;
My Issue here is ,if 2 items get inserted the .second call on the returned object from find will show "1" which is actually wrong, its supposed to show 2, after this if my program tries to find anything in the stored map the program crashes. Has somebody been doing this already.
If i do the same thing in the initial program run it is no problem to lookup values from the hash. This only occurs if the program was initialized and later the program is restarted and does the attach and tries to retrieve values formerly inserted.
Thanks for the help.
Boost Quick Ref Map Example
While i was talking to the "maker" of this library he told me that it will only be possible to use the map within the same process.
Related
UPDATE: I had each node write to a separate file, and when the separate files were concatenated together the result was correct. I also updated the code to attempt a channel flush and file sync after each write of a single record, but there are still issues between nodes 0 and 1, now. If I make Node 0 sleep for a few seconds before it starts its iteration of the coforall loop, the records come out correct. If not, the last few hundred bytes of Node 0's records seem to be reliably overwritten with NULL bytes, up to the start of Node 1's records. The issues between Node 1 and Node 2, and Node 2 and Node 3, seem to not show up anymore.
Additionally, if I suppress either Node 0 or Node 1 from writing, I see the fully-formed records from the un-suppressed node written correctly to the file. In the case that Node 1 is suppressed, I see 9,997 100B records (or 999,700) correct bytes followed by NULL bytes in the file where Node 1's suppressed records would go. In the case that Node 0 is suppressed, I see exactly 999,700 NULL bytes in the file, after which Node 1's records begin.
Original Post:
I'm trying to troubleshoot an issue with parallel writes from different nodes to a shared NFS-backed file on disk. At the moment, I suspect that something is wrong with the way writes to the disk happen on the NFS server.
I'm working on adapting MPI+C code that uses pwrite to write to coordinated chunks of a file. If I try to have the equivalent locales in Chapel write to the file inside of a coforall loop, I end up with the bits of the file around the node boundaries messed up - usually the final few hundred bytes of each node's data are garbled. However, if I have just one locale iterate through the data on all locales and write it, the data comes out correctly. That is, I use the same data structures to calculate the offsets, but only Locale 0 seeks to that offset and performs the writes.
I've verified that the offsets into the file that each locale runs do not overlap, and I'm using a single channel per task, defined from within the on loc do block, so that tasks don't share a single channel.
Are there known issues with writing to a file from different locales? A lot of the documentation makes it seem like this is known to be safe, but an unsubstantiated guess seems to indicate that there are issues with caching of file contents; when examining the incorrect data, the bits that are incorrect seem to be the original data from the file in that location at the beginning of the program.
I've included the relevant routine below, in case you easily spot something I missed. To make this serial, I convert the coforall loc in Locales and on loc do block into a for j in 0..numLocales-1 loop, and replace here.id with j. Please let me know what else would help get to the bottom of this. Thanks!
proc write_share_of_data(data_filename: string, ref record_blocks) throws {
coforall loc in Locales {
on loc do {
var data_file: file = open(data_filename, iomode.cwr);
var data_writer = data_file.writer(kind=ionative, locking=false);
var line: [1..100] uint(8);
const indices = record_blocks[here.id].D;
var local_record_offset = + reduce record_blocks[0..here.id-1].D.size;
writeln("Loc ", here.id, ": record offset is ", local_record_offset);
var local_bytes_offset = terarec.terarec_width_disk * local_record_offset;
data_writer.seek(start=local_bytes_offset);
for i in indices {
var write_rec: terarec_t = record_blocks[here.id].records[i];
line[1..10] = write_rec.key;
line[11..98] = write_rec.value;
line[99] = 13; // \r
line[100] = 10; // \n
data_writer.write(line);
lines_written += 1;
}
data_file.fsync();
data_writer.close();
data_file.close();
}
}
return;
}
Adding an answer here that solved my particular problem, though it doesn't explain the behavior seen. I ended up changing the outer loop from coforall loc in Locales to for loc in Locales. This isn't too big of an issue since it's all writing to one file anyway - I doubt that multiple locales can actually make much headway in all attempting to write concurrently to a single file on an NFS server. As a result, the change still allows nodes to write the data they have locally to NFS, rather than forcing Node 0 to collect and then write the data on behalf of all locales. This amounts to only adding idle time to the write operation commensurate with the time it takes Locale 0 to start the remote task on other nodes when the previous node has finished writing, which for the application at hand is not a concern.
Have you tried specifying start/end in file.writer instead of using seek? Does that change anything? What about specifying the end offset for the channel.seek call? Does it matter if the file is created and has the appropriate size before you start?
Other than that, I wonder if this issue would appear for both NFS and Lustre. If it appears for both it might well be a Chapel bug. It sounds from your description that the C program was using this pattern, which points to it being a bug. But, have you run C code doing this on your setup? If it being a Chapel bug seems most likely after further investigation, we would appreciate a bug report issue with a reproducer.
I know that NFS does not always do what one would like, in terms of data consistency. It's my understanding that it has "close to open" semantics but it's unclear to me what that means in the context of opening a file and writing to a particular region within it, in parallel from different locales.
From Why NFS Sucks by Olaf Kirch:
An NFS client is permitted to cache changes locally and send them to
the server whenever it sees fit. This sort of lazy write-back greatly
helps write performance, but the flip side is that everyone else will
be blissfully unaware of these change before they hit the server. To
make things just a little harder, there is also no requirement for a
client to transmit its cached write in any particular fashion, so
dirty pages can (and often will be) written out in random order.
I read two implications from this paragraph that are relevant to your situation here:
The writes you do on different locales can be observed by the NFS server in an arbitrary order. (However as I understand it, the data should be sent to the server by the time your fsync call returns).
These writes are done at an OS page granularity (usually 4k). (Note that this is more a hypothesis I am making than it is a fact. It should be tested or further investigated).
It would be interesting to check if 2. is a plausible explanation for the behavior you are seeing. For example, you could explore having each locale operate on a multiple of 4096 records (or potentially try writing records of 4096 bytes each) and see if that changes the behavior. If 2 is indeed the explanation, it should be possible to create a C program that demonstrates the behavior as well.
Hello I have been having trouble with my program I've created a list and I have the following program
Its goal is to Read the 5 variables inside the text file 5 variables in 5 different inputs
so I manage to correctly show the list inside the while loop since every time a line is read it's printed untill .eof(End Of File)
My goal is that I am trying to print the list OUTSIDE of the while Loop that has already read and printed those 5 variables 10x the problem is that it repeats the last entered 5 Variables in the list
I and repeats that 10x (size of the list)
I've also tried something like this inside the for loop which I found in here:
int ID=it->ID;
string NAME=it->NAME;
int SEMESTER=it->SEMESTER;
string DIRECTION=it->DIRECTION;
double GRADE=it->GRADE;
As if those are nodes but I am a starter with nodes as well as lists and it seems to have failed
When I look at the code you've provided, I see that in your for loop the a_student is used, which is not updated by/with the iterator. Basically you're not looping over the created student list.
Maybe try using std::for_each (cppreference) it will save you a lot of time:)
Godbolt example: https://godbolt.org/z/fQUhcr
The issue on the left column of code is that you are not de-referencing the iterator. You never set the a_students variable inside the loop, so why do you expect it to change on each iteration?
Write something like:
a_students = *it;
inside the loop. However there is a simpler way. Instead of manually handling the iterators, use:
for (auto a_students: s_stl_list)
{
// do something with each "a_students" which will be AUTOmatically the right type
}
With a printer that doesn't exist, I send to the spooler different files. In my software, I try to get all files existing in the queue of the spooler. For that, I tried the following instruction:
bool t = EnumJobs(hPrinter, 0,1,3, (LPBYTE) &h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned)
I get jobId in the field 'JobId' of the structure.
In the structure type 'JOB_INFO_3', the field 'JobId' is well filled but the field 'nextJobId' is not filled. Why?
It's the same problem when I execute the following instruction:
bool t = EnumJobs(hPrinter, 0,3,3, (LPBYTE) &h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned)
Moreover, the field 'JobId' is not filled. Why ?
Then, I don't know how to get info(filename, state, number of pages, etc) of a particular job. I tried the following instruction but it didn't work:
GetJobA(hPrinter, h.JobId, 1, (LPBYTE) &job_info_1, sizeof(JOB_INFO_1), & nbBytes)
And my last question is: Is it possible to get all the jobs from the spooler of the printer?
Do you have any solutions?
So, I'm not sure what the rest of your code looks like, but it looks possible that you're not using the API quite correctly. The MSDN documentation suggests that you should call the EnumJobs API twice.
To determine the required buffer size, call EnumJobs with cbBuf set to zero. EnumJobs fails, GetLastError returns ERROR_INSUFFICIENT_BUFFER, and the pcbNeeded parameter returns the size, in bytes, of the buffer required to hold the array of structures and their data.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
The flow goes like this:
Call EnumJobs for the first time to see how much memory needs to be allocated for your JOB_INFO_n array.
Allocate the memory required for your JOB_INFO_n array.
Call EnumJobs with your JOB_INFO_n array.
Looking at the call to EnumJobs where you attempt to get the first three jobs, the size of your pJob appears to be sizeof(JOB_INFO_3), where it should be three times this size in order to hold all three jobs. What is the return from EnumJobs for that call?
The reason why nextJobId is not filled in is likely a misunderstanding of the field. This field is for print jobs that have been linked together, not to find out which print job is next in the queue.
NextJobId - The print job identifier for the next print job in the linked set of print jobs.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd145021(v=vs.85).aspx
As for the information about the print job, this is going to be difficult. Unfortunately, there is no way I know of to get the name/path of the file printed. There's no concept of this in the spooler APIs. Consider a print job which isn't backed by a file for example. The best you get is the print job name, which is set by the printing application.
For pages, it looks like there is a TotalPages field in the JOB_INFO_1 structure. That may be of some use to you. It looks like you're already trying to get the JOB_INFO_1 structure but having some troubles. If the API is failing, you can use GetLastError() to identify what the issue is. Does the job ID you're passed in exist?
https://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx
For the last question about getting all print jobs from the queue. It seems that the MSDN documentation suggests the following:
To determine the number of print jobs in the printer queue, call the GetPrinter function with the Level parameter set to 2.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
Hope this helps.
Sorry I'm new to Greenhill's. I'm using MULTI 6.1.6 and my language of choice is C++.
I have a problem when try to use simulator to initiate an object of a class bigger than 1M in size using new.
Class_Big* big_obj;
Class_Big = new Class_Big();
Class_Small* Small_obj;
Small_obj = new Class_Small();
if sizeOf(Class_Big) > 1MB it simply never call the class constructor, return NULL and go to the next instruction (Class_Small* Small_obj;) and creates the next object correctly. If I scope out some variables on the Class_Big to make its size < 1MB the code works fine and the object created.
I added both
MemoryPoolSize="0x200000"
HeapSize="0x200000"
to my xml file.
Another error I get in building phase If I used a lib have a big class:
intex: error: Not enough RAM for request.
intex: fatal: Integrate failed.
Error: build failed
Can you help with it?
Thanks
To specify memory sizes for the Heap and memory pool, in the MULTI GUI go to the .int file (it can be found under the .gpj dropdown when it is expanded) and double click on it to edit it. Then right-click inside the purple box and go to "Edit". Go to the "Attributes" tab and you can modify the memory pool size and heap size to be larger.
Alternatively you can just edit the .int file in a text editor, but if you want to use the gui to set these follow the above steps.
Also from their manual:
"Check the .bsp file in use. The memory declared with the
MinimumAddress/MaximumAddress keywords must match your board's memory.
If it does not, modify these keywords as needed. If the memory
declared in the .bsp file does match the board, you must modify your
application to use less memory."
Additionally, check the default.ld and you can set the values for the RAM limits there. Look at __INTEGRITY_RamLimit and the other values there. Hope this helps!
With INTEGRITY you are in total control of how much memory is used for each partition. It is a static configuration. Everything, code stack heap you name it, comes out of that. So if you have a bunch of code, automatics, etc in the partition then a memory allocation may fail if you ask for too much. Try increasing the size.
For the first part of the problem Basically I should have modified the "VirtualHeapSize" on the .ld component file.
Second part still try to figure it out.
I am working with dirEntries. Great function by the way. I wanted to test for the number of files before I used a foreach on it. I looked here on stack overflow and the suggested method was to use the walkLength function (Count files in directory with Dlang). I took that suggestion. The return from it was a non-zero value - great. However, the foreach will not iterate over the result of dirEntries. It acts like the walkLength has left the dirEntries at the end of the range. Is there a way to reset it so I can start at the beginning of the dirEntries list? Here is some example code:
auto dFiles = dirEntries("",filter, SpanMode.shallow);
if (walkLength(dFiles) > 0)
{
writeln("Passed walkLength function");
foreach (src; dFiles)
{
writeln("Inside foreach");
}
}
The output shows it gets past the walkLength function, but never gets inside the foreach iterator.
Am I using dirEntries wrong? I have looked at Ali Cehreli's great book Programming in D as someone suggest. Nothing stuck out at me. I have looked online and nothing points to my interpretation of the problem. Of course, I could be completely off base on what the problem really is.
dirEntries gives you a range. You consume that range with walkLength and reach its end. Then you attempt to read more entries from it with the foreach loop. However, you have already reached its end, so there is nothing more to read.
You can either repeat the dirEntries call, use the array function from std.array to convert the range to an array, or use the .save function to create a copy of the range that you pass to walkLength.
If you don't care about the length and only care about whether it's empty, you can use the .empty property of the range instead of walkLength.