In the GNU/Linux man pages, the read function is described with the following synopsis:
ssize_t read(int fd, void *buf, size_t count);
I would like to use this function to read data from a socket or a serial port. If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory so pointer decrement is necessary for bringing the pointer to the first byte of data. This is dangerous because using it in a language like C++ with its dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function. I thought of using a C-style array instead of a pointer. Is this the correct approach? If not, what is the correct way to do this? The programming language I'm using is C++.
EDIT:
The code that caused the described situation is as follows:
QSerialPort class was used to configure and open the port with following parameters:
Baudrate of 115200
8 data bits
No parity
One stop bit
No flow control
As far as this question is concerned, the read is performed exactly like this:
A std::vector containing a number of structs defined this way:
struct DataMember
{
QString name;
size_t count;
char *buff;
};
then within a while loop until the end of the mentioned std::vector is reached, a read() is performed based on count member variable of the said struct and the data is stored in the same struct's buff:
ssize_t nbytes = read(port->handle(), v.at(i).buff, v.at(i).count);
and then the data is printed on the console. In my test case as long as the data is one byte the value printed is correct but for more than one byte the value displayed is the last value that was read from the port plus some garbage values. I don't know why is this happening. Note that the correct result is obtained when the char *buff is changed to char buff[count].
If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory
No. The pointer is passed to the read() function by value, so it is completely and utterly impossible for its value to be any different after the call than it was before, regardless of the count.
so pointer decrement is necessary for bringing the pointer to the first byte of data.
The pointer already points to the first byte of data. No decrement is necessary.
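A minimal sketch of this point, assuming a POSIX system. The helper names below are hypothetical and only exist for this illustration; a pipe stands in for the serial port. It checks that the caller's pointer has the same value after read() returns and that the data starts at the first byte of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Reads up to `count` bytes from `fd` into `buf` and reports whether the
// caller's pointer still holds the address it held before the call.
bool pointer_unchanged_after_read(int fd, char *buf, std::size_t count)
{
    assert(buf != nullptr);
    char *const before = buf;              // remember the address we pass in
    ssize_t n = read(fd, buf, count);      // read() receives a copy of `buf`
    if (n < 0)
        return false;                      // read failed, nothing to compare
    return buf == before;                  // read() cannot modify our variable
}

// Hypothetical demo helper: pushes "abc" through a pipe and reads it back.
bool demo_first_byte_is_at_buf()
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    const char msg[] = "abc";
    bool ok = write(fds[1], msg, 3) == 3;
    char buf[8] = {};
    ok = ok && pointer_unchanged_after_read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    return ok && std::memcmp(buf, "abc", 3) == 0;   // data starts at buf[0]
}
```

Calling demo_first_byte_is_at_buf() should return true: the three bytes are found at buf[0] through buf[2] without any pointer arithmetic.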
This is dangerous because using it in a language like C++ with its dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function.
This is all nonsense based on an impossibility.
You are mistaken about all this.
In my test case as long as the data is one byte the value printed is correct but for more than one byte the value displayed is the last value that was read from the port plus some garbage values.
From the read(2) manpage:
On success, the number of bytes read is returned (zero indicates end of file),
and the file position is advanced by this number. It is not an error if this number is
smaller than the number of bytes requested; this may happen for example because fewer
bytes are actually available right now (maybe because we were close to end-of-file, or
because we are reading from a pipe, or from a terminal), or because read() was interrupted
by a signal. On error, -1 is returned, and errno is set appropriately. In this case it
is left unspecified whether the file position (if any) changes.
In the case of pipes, sockets and character devices (that includes serial ports) and a blocking file descriptor (default) read will, in practice, not wait for the full count. In your case read() blocks until a byte comes in on the serial port and returns. That is why in the output the first byte is correct and the rest is garbage (uninitialized memory). You have to add a loop around the read() that repeats until count bytes have been read if you need the full count.
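The short-count behaviour quoted from the man page can be observed with a pipe. This is only a sketch under POSIX assumptions; single_read_from_pipe is a hypothetical helper written for this illustration and the 64-byte limit is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Writes `available` bytes into a pipe, then asks read() for `requested`
// bytes in ONE call and returns whatever read() reports.
ssize_t single_read_from_pipe(std::size_t available, std::size_t requested)
{
    // at least one byte must be available, otherwise read() would block
    assert(available >= 1 && available <= 64 && requested <= 64);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    char out[64] = {};
    char in[64] = {};
    ssize_t n = -1;
    if (write(fds[1], out, available) == static_cast<ssize_t>(available))
        n = read(fds[0], in, requested);   // returns what is there, not what was asked
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With 3 bytes available and 10 requested, the single call returns 3 rather than waiting for 7 more, which is why the remainder of a larger buffer stays uninitialized unless the read is repeated.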
I don't know why is this happening.
But I know. char * is just a pointer, and that pointer needs to be initialized to point at valid memory before you can use it. Without doing so you're invoking undefined behavior and anything might happen.
Instead of the size_t count; and char *buff members you should just use a std::vector<char>. Before making the read call, resize it to the number of bytes you want to read, then take the address of the first element of that vector and pass that to read:
#include <cerrno>
#include <string>
#include <vector>
#include <unistd.h>
struct fnord {
    std::string name;
    std::vector<char> data;
};
and use it like this; note that using read requires some additional work to properly deal with signal and error conditions.
size_t readsomething(int fd, size_t count, fnord &f)
{
    // make room for count bytes; resize (not reserve) so the elements exist
    f.data.resize(count);
    size_t rbytes = 0;
    do {
        ssize_t rv = read(fd, f.data.data() + rbytes, count - rbytes);
        if( !rv ) {
            // End of File / Stream
            break;
        }
        if( 0 > rv ) {
            if( EINTR == errno ) {
                // signal interrupted read... restart
                continue;
            }
            if( EAGAIN == errno
             || EWOULDBLOCK == errno ) {
                // file / socket is in nonblocking mode and
                // no more data is available.
                break;
            }
            // some critical error happened. Deal with it!
            break;
        }
        rbytes += rv;
    } while(rbytes < count);
    // shrink to the number of bytes that actually arrived
    f.data.resize(rbytes);
    return rbytes;
}
Looking at your first paragraph of gibberish:
If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory
What makes you think so? This is not how it works. Most likely you passed some invalid pointer that wasn't properly initialized. Anything can happen.
so pointer decrement is necessary for bringing the pointer to the first byte of data.
Nope. That's not how it works.
This is dangerous because using it in a language like C++ with its dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function.
Nope. That's not how it works!
C and C++ are explicit languages. Everything happens in plain sight and nothing happens without you (the programmer) explicitly requesting it. No memory is allocated without you requesting this to happen. It can either be an explicit new, some RAII, automatic storage or the use of a container. But nothing happens "out of the blue" in C and C++. There's no built-in garbage collection[1] in C or C++. Objects don't move around in memory or resize without you explicitly coding something into your program that makes this happen.
[1]: There are GC libraries you can use, but those never will stomp onto anything that can be reached by code that's executing. Essentially garbage collector libraries for C and C++ are memory leak detectors, which will free memory that can no longer be reached by normal program flow.
I am trying to create a LabVIEW DLL and call it from a C++ program but I am facing a problem of data passing.
A scientific camera I recently bought comes with a LabVIEW SDK, and nothing else. The example program provided with the SDK is mainly a while loop around two functions, ReadData and DecodeData.
ReadData collects data from USB (using VISA read), the data obtained in one call contains several complete data blocks and an incomplete incoming block.
DecodeData is called multiple times to process all the complete blocks (it removes the processed data from the buffer). When all the complete blocks have been processed, the remaining data (the beginning of the incoming block) is passed to ReadData which will concatenate its new data at the end of the buffer.
Full example code:
Details of ReadData:
Details of DecodeData:
In the example program, written in LabVIEW, everything works fine. The problem is when I export these functions in a DLL. The memory buffers, inputs and outputs of both functions, are char arrays. After ReadData, my C++ program correctly obtains a buffer containing data, including null bytes.
The problem is when I inject this buffer in DecodeData, it seems that LabVIEW only takes into account the bytes before the first null byte... I guess that the char[] input is just processed as a null-terminated string and the rest of the data is just discarded.
I tried to add data converters ("string to byte array" at outputs and "byte array to string" at inputs) but the conversion function also discards the data after the first null character.
I could modify the .vi from the sdk to only handle byte arrays and not strings, but it uses lots of character processing functions and I would prefer leaving it as is.
How can I pass the data buffer from C++ to the LabVIEW DLL without losing part of my data?
Edit: here is the C++ code.
The header exported with the LabVIEW DLL:
int32_t __cdecl CORE_S_Read_data_from_USB(char VISARefIn[],
Enum1 blockToProcessPrevCycle, uint32_t bytesToProcessPrevCycle,
uint8_t inBytesRead[], uint32_t *BytesReceived, LVBoolean *DataReception,
uint8_t outBytesRead[], Enum1 *blockToProcess, uint32_t *bytesToProcess,
int32_t longueur, int32_t longueur2);
void __cdecl CORE_S_Decode_data(uint8_t inBytesRead[],
LVBoolean LUXELL256TypeB, uint32_t bytesToProcess, Enum1 blockToProcess,
Cluster2 *PrevHeader, LVBoolean *FrameCompleto,
uint32_t *bytesToProcessNextCycle, Enum1 *blockToProcessNextCycle,
Cluster2 *HeaderOut, uint8_t outBytesRead[], Int16Array *InfraredImage,
Cluster2 *Header, int32_t longueur, int32_t longueur2, int32_t longueur3);
Usage in my C++ source:
while (...)
{
// Append new data in uiBytesRead
ret = CORE_S_Read_data_from_USB(VISARef, blockToProcess, bytesToProcess, uiBytesRead, &BytesReceived,
&DataReception, uiBytesRead, &blockToProcess, &bytesToProcess, BUFFER_SIZE, BUFFER_SIZE);
if (DataReception == 0)
continue;
bool FrameCompleto = true;
while (FrameCompleto)
{
// Removes one frame of uiBytesRead per call
CORE_S_Decode_data(uiBytesRead, LUXELL256TypeB, 0, blockToProcess, &Header, &FrameCompleto, &bytesToProcess, &blockToProcess, &Header,
uiBytesRead, &InfraredImage, &Header, BUFFER_SIZE, BUFFER_SIZE, BUFFER_SIZE);
}
}
It is a little tricky to answer in this specific case, but assuming that the problem is that null bytes in the buffer data are causing issues, it might be worth looking at the option to use String Handle Pointers for the String-Type controls and indicators of the VIs you are exporting.
This option can be selected during the "Define VI Prototype" stage of configuring the DLL Build.
LabVIEW manages String Types internally as an integer of the string's length and an unsigned char array so it shouldn't matter what characters are used. For interfacing with external code, LabVIEW's extcode.h header defines an LStrHandle as follows:
typedef struct {
int32 cnt; /* number of bytes that follow */
uChar str[1]; /* cnt bytes */
} LStr, *LStrPtr, **LStrHandle;
So a String Handle Pointer is of type LStrHandle * (a pointer to an LStrHandle).
extcode.h provides the macros LHStrBuf(LStrHandle) and LHStrLen(LStrHandle) which can ease dereferencing for the String Handle Pointer when you want to read or update the string content and length. Also, note a NULL handle can be used to represent an empty string so don't assume that the handle will be valid without checking.
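To illustrate why a length-carrying string survives embedded null bytes while a C string does not, here is a small portable sketch. DemoLStr is a hypothetical stand-in written for this illustration; it is not LabVIEW's LStr and does not use extcode.h, it only mirrors the idea of a byte count followed by the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a length-prefixed string: NOT LabVIEW's LStr,
// just the same idea (a byte count plus the bytes themselves).
struct DemoLStr {
    int32_t cnt;                     // number of valid bytes in `str`
    std::vector<unsigned char> str;  // payload, may contain 0x00 bytes
};

// Builds a DemoLStr from `len` raw bytes; the length is taken from the
// caller, never from a terminator inside the data.
DemoLStr make_demo_lstr(const unsigned char *data, int32_t len)
{
    assert(data != nullptr && len >= 0);
    return DemoLStr{len, std::vector<unsigned char>(data, data + len)};
}

// Length a C-string routine would see: it stops at the first 0x00 byte.
std::size_t c_string_length(const unsigned char *data)
{
    return std::strlen(reinterpret_cast<const char *>(data));
}
```

For the bytes 0x10 0x20 0x00 0x30 0x40, c_string_length sees only 2 bytes, while the length-prefixed form keeps all 5, including the bytes after the embedded null.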
When creating or resizing String Handle Pointers to pass to a function, it is worth noting that an LStr has exactly the same in-memory representation as a LabVIEW-array so the function NumericArrayResize with typeCode uB can create/resize a large enough buffer to store the string and the length-integer.
An example of creating a new String Handle Pointer for a string of length required_string_length is achieved by passing NumericArrayResize a handle pointer where the handle is NULL.
LStrHandle new_string_handle = NULL;  // a NULL handle asks NumericArrayResize to allocate a new one
// pass the address of the handle (the "handle pointer"), never an uninitialized pointer
MgErr err = NumericArrayResize(uB, 1, (UHandle *)&new_string_handle, required_string_length);
// on success (err == noErr), new_string_handle references the new LStr buffer
When updating the String value in a String Handle remember to write the string's characters to the uChar array and to update the size integer. From a performance view, it might not be worth shrinking a String Handle when updating it to a shorter string but you will need to resize it if you know the string you are writing to it will be longer than what it can hold.
You should clean up any handle that is passed to you from LabVIEW or a LabVIEW-based DLL so once you have finished with it, call DSDisposeHandle on the handle that the handle-pointer references.
For more information on LabVIEW's memory manager function please read this guide.
I'm fairly new to C and I'm reading a book on software vulnerabilities. I came across this sample, which the book says can cause a buffer overflow. I am trying to determine how this is the case.
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char buf[1024];
if(!query_string) {
return 0;
}
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL) {
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}
}
I am paying close attention to this block of code because this appears to be where the buffer overflow is caused.
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}
I think this is it: buf is only 1024 bytes, and ent->value can be longer than that, so this call may overflow:
sprintf(buf, "MODE=%s", ent->value);
But it depends on the implementation of split_keyvalue_pairs(query_string), namely whether that function already checks the length of the value and handles it (which I doubt).
klutt provided a good fix for the problem in a previous answer, so I'll try to go a bit more specific and in-depth on the exact nature of the overflow in your code.
char buf[1024];
This line allocates 1024 bytes on the stack for the array named buf. The big problem here is that it is on the stack. If you dynamically allocate using malloc (or my favorite: calloc), it will be on the heap. The location doesn't necessarily prevent or fix an overflow, but it can change the effect. Right above this space on the stack (give or take some bytes) would be the return address from the function, and an overflow can change that, causing the program to redirect when it returns.
sprintf(buf, "MODE=%s", ent->value);
This line is what actually performs the overflow. sprintf = "string print format." This means that the destination is a string (char *), and you are printing a formatted string. It doesn't care about the length; it will just take the starting memory address of the destination string and keep writing until it has finished. If there are more than 1024 characters to be written (in this case), then it will go past the end of your buffer and overflow into other parts of memory. The solution is to use the function snprintf instead. The "n" tells you that it will limit the amount to be written to the destination, and avoid an overflow.
The ultimate thing to understand is that a "buffer" does not actually exist. It's simply not a thing. It is a concept we use to order the area in memory, but the computer has no idea what a buffer is, where it starts, or where it ends. So in writing, the computer doesn't really care if it is inside or outside of the buffer, and doesn't know where to stop writing. So, we need to tell it very explicitly how far it is allowed to write, or it will just keep writing.
A very big thing here is that you passed a pointer to a local variable to putenv. The buffer will cease to exist when handle_query_string returns. After that it will contain garbage values. Note that putenv requires that the string passed to it remain unchanged for the rest of the program. From the documentation for putenv (emphasis mine):
int putenv(char *string);
The putenv() function adds or changes the value of environment variables. The argument string is of the form name=value. If name does not already exist in the environment, then string is added to the environment. If name does exist, then the value of name in the environment is changed to value. The string pointed to by string becomes part of the environment, so altering the string changes the environment.
This can be corrected by using dynamic allocation. char *buf = malloc(1024) instead of char buf[1024]
Another thing is that sprintf(buf, "MODE=%s", ent->value); might overflow. That would happen if the string ent->value is too long. A solution there is to use snprintf instead.
snprintf(buf, sizeof buf, "MODE=%s", ent->value);
This prevents overflow, but it might still cause problems, because if ent->value is too big to fit in buf, then buf will for obvious reasons not contain the full string.
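Truncation can be detected through snprintf's return value, which is the length the full output would have needed (not counting the terminator). A minimal sketch; format_mode is a hypothetical helper name used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats "MODE=<value>" into buf. Returns true only if the whole string,
// including the terminating '\0', fit into `size` bytes.
bool format_mode(char *buf, std::size_t size, const char *value)
{
    assert(buf != nullptr && value != nullptr && size > 0);
    int needed = std::snprintf(buf, size, "MODE=%s", value);
    // snprintf returns the length it wanted to write (excluding '\0'),
    // so a result >= size means the output was cut short.
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

With an 8-byte buffer and the value "interactive", the function returns false and the buffer holds the truncated, still null-terminated text "MODE=in". The caller can then reject the input or allocate a larger buffer.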
Here is a way to rectify both issues:
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char *buf = NULL;
if(!query_string)
return 0;
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
// Make sure that the buffer is big enough instead of using
// a fixed size. The +5 on size is for "MODE=" and +1 is
// for the string terminator
const char format_string[] = "MODE=%s";
const size_t size = strlen(ent->value) + 5 + 1;
buf = malloc(size);
// Always check malloc for failure or chase nasty bugs
if(!buf) exit(EXIT_FAILURE);
sprintf(buf, format_string, ent->value);
putenv(buf);
}
}
Since we're using malloc the allocation will remain after the function exits. And for the same reason, we make sure that the buffer is big enough beforehand, and thus, using snprintf instead of sprintf is not necessary.
Theoretically, this has a memory leak unless you use free on all strings you have allocated, but in reality, not freeing before exiting main is very rarely a problem. Might be good to know though.
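A different technique from the one in this answer, sketched here as an alternative: on POSIX systems setenv() copies the name and value into the environment, so the caller's buffer may be a local variable and can be changed or destroyed afterwards. set_mode_env is a hypothetical helper name for this illustration.

```cpp
#include <cassert>
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <string.h>

// Sets MODE=<value> using POSIX setenv(), which stores its own copy of
// both strings. Returns true on success.
bool set_mode_env(const char *value)
{
    assert(value != nullptr);
    return setenv("MODE", value, 1) == 0;   // 1 = overwrite an existing MODE
}
```

Because the environment holds its own copy, modifying the source string after the call does not change what getenv("MODE") returns. setenv is not available in the Microsoft C runtime, so this sketch assumes a POSIX target.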
It can also be good to know that even though this code now is fairly protected, it's still not thread safe. The content of query_string and thus also ent->value may be altered. Your code does not show it, but it seems highly likely that find_entry returns a pointer that points somewhere in query_string. This can of course also be solved, but it can get complicated.
I'm using IOKit framework to communicate with my driver using IOConnectCallMethod from the user-space client and IOExternalMethodDispatch on the driver side.
So far I was able to send fixed length commands, and now I wish to send a varied size array of chars (i.e. fullpath).
However, it seems that the command lengths on the driver and client sides are coupled, which means that checkStructureInputSize from IOExternalMethodDispatch in the driver must be equal to inputStructCnt from IOConnectCallMethod on the client side.
Here are the struct contents on both sides :
DRIVER :
struct IOExternalMethodDispatch
{
IOExternalMethodAction function;
uint32_t checkScalarInputCount;
uint32_t checkStructureInputSize;
uint32_t checkScalarOutputCount;
uint32_t checkStructureOutputSize;
};
CLIENT:
kern_return_t IOConnectCallMethod(
mach_port_t connection, // In
uint32_t selector, // In
const uint64_t *input, // In
uint32_t inputCnt, // In
const void *inputStruct, // In
size_t inputStructCnt, // In
uint64_t *output, // Out
uint32_t *outputCnt, // In/Out
void *outputStruct, // Out
size_t *outputStructCnt) // In/Out
Here's my failed attempt to use a varied size command :
std::vector<char> rawData; //vector of chars
// filling the vector with filePath ...
kr = IOConnectCallMethod(_connection, kCommandIndex , 0, 0, rawData.data(), rawData.size(), 0, 0, 0, 0);
And from the driver command handler side, I'm calling IOUserClient::ExternalMethod with IOExternalMethodArguments *arguments and IOExternalMethodDispatch *dispatch but this requires the exact length of data I'm passing from the client which is dynamic.
This doesn't work unless I set up the dispatch entry with the exact length of data it should expect.
Any idea how to resolve this or perhaps there's a different API I should use in this case ?
As you have already discovered, the answer for accepting variable-length "struct" inputs and outputs is to specify the special kIOUCVariableStructureSize value for input or output struct size in the IOExternalMethodDispatch.
This will allow the method dispatch to succeed and call out to your method implementation. A nasty pitfall however is that structure inputs and outputs are not necessarily provided via the structureInput and structureOutput pointer fields in the IOExternalMethodArguments structure. In the struct definition (IOKit/IOUserClient.h), notice:
struct IOExternalMethodArguments
{
…
const void * structureInput;
uint32_t structureInputSize;
IOMemoryDescriptor * structureInputDescriptor;
…
void * structureOutput;
uint32_t structureOutputSize;
IOMemoryDescriptor * structureOutputDescriptor;
…
};
Depending on the actual size, the memory region might be referenced by structureInput or structureInputDescriptor (and structureOutput or structureOutputDescriptor) - the crossover point has typically been 8192 bytes, or 2 memory pages. Anything smaller will come in as a pointer, anything larger will be referenced by a memory descriptor. Don't count on a specific crossover point though, that's an implementation detail and could in principle change.
How you handle this situation depends on what you need to do with the input or output data. Usually though, you'll want to read it directly in your kext - so if it comes in as a memory descriptor, you need to map it into the kernel task's address space first. Something like this:
static IOReturn my_external_method_impl(OSObject* target, void* reference, IOExternalMethodArguments* arguments)
{
IOMemoryMap* map = nullptr;
const void* input;
size_t input_size;
if (arguments->structureInputDescriptor != nullptr)
{
map = arguments->structureInputDescriptor->createMappingInTask(kernel_task, 0, kIOMapAnywhere | kIOMapReadOnly);
if (map == nullptr)
{
// insert error handling here
return …;
}
input = reinterpret_cast<const void*>(map->getAddress());
input_size = map->getLength();
}
else
{
input = arguments->structureInput;
input_size = arguments->structureInputSize;
}
// …
// do stuff with input here
// …
OSSafeReleaseNULL(map); // make sure we unmap on all function return paths!
return …;
}
The output descriptor can be treated similarly, except without the kIOMapReadOnly option of course!
CAUTION: SUBTLE SECURITY RISK:
Interpreting user data in the kernel is generally a security-sensitive task. Until recently, the structure input mechanism was particularly vulnerable - because the input struct is memory-mapped from user space to kernel space, another userspace thread can still modify that memory while the kernel is reading it. You need to craft your kernel code very carefully to avoid introducing a vulnerability to malicious user clients. For example, bounds-checking a userspace-supplied value in mapped memory and then re-reading it under the assumption that it's still within the valid range is wrong.
The most straightforward way to avoid this is to make a copy of the memory once and then only use the copied version of the data. To take this approach, you don't even need to memory-map the descriptor: you can use the readBytes() member function. For large amounts of data, you might not want to do this for efficiency though.
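The copy-once pattern can be sketched in plain C++, outside of IOKit. Everything here is hypothetical and for illustration only: DemoRequest, kTableSize and lookup_from_shared are made-up names, and std::memcpy stands in for whatever copy primitive the kernel code would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical request layout, for illustration only.
struct DemoRequest {
    uint32_t index;   // caller-supplied index into a fixed-size table
};

constexpr uint32_t kTableSize = 4;

// Copies the request out of `shared` exactly once, validates the copy,
// and only ever uses the copy afterwards. Returns -1 on a bad request.
int lookup_from_shared(const void *shared, std::size_t shared_size,
                       const int (&table)[kTableSize])
{
    if (shared == nullptr || shared_size < sizeof(DemoRequest))
        return -1;
    DemoRequest local;
    std::memcpy(&local, shared, sizeof local);   // single fetch from shared memory
    if (local.index >= kTableSize)               // validate the private copy
        return -1;
    assert(local.index < kTableSize);            // invariant: nobody else can change `local`
    return table[local.index];                   // use the same copy that was checked
}
```

The unsafe variant would check the index through the shared pointer and then read it through the shared pointer again for the table access, leaving a window in which another thread can swap in an out-of-range value.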
Recently (during the 10.12.x cycle) Apple changed the structureInputDescriptor so it's created with the kIOMemoryMapCopyOnWrite option. (Which as far as I can tell was created specifically for this purpose.) The upshot of this is that if userspace modifies the memory range, it doesn't modify the kernel mapping but transparently creates copies of the pages it writes to. Relying on this assumes your user's system is fully patched up though. Even on a fully patched system, the structureOutputDescriptor suffers from the same issue, so treat it as write-only from the kernel's point of view. Never read back any data you wrote there. (Copy-on-write mapping makes no sense for the output struct.)
After going through the relevant manual again, I've found the relevant paragraph :
The checkScalarInputCount, checkStructureInputSize, checkScalarOutputCount, and checkStructureOutputSize fields allow for sanity-checking of the argument list before passing it along to the target object. The scalar counts should be set to the number of scalar (64-bit) values the target's method expects to read or write. The structure sizes should be set to the size of any structures the target's method expects to read or write. For either of the struct size fields, if the size of the struct can't be determined at compile time, specify kIOUCVariableStructureSize instead of the actual size.
So all I had to do in order to avoid the size verification was to set the field checkStructureInputSize to the value kIOUCVariableStructureSize in IOExternalMethodDispatch, and the command was passed to the driver properly.
if (m_Connections[t].socket != INVALID_SOCKET)
{
m_TCPResult = recv(m_Connections[t].socket, m_TCPRecvbuf, m_TCPRecvbuflen, 0);
if (m_TCPResult > 0)
{
printf("[TCPReceive] Bytes (m_TCPResult) received from %d: %d\n", m_Connections[t].socket, m_TCPResult);
// Deserialize the data
Packets::MainPacket receivedData;
memcpy(&receivedData, m_TCPRecvbuf, sizeof(receivedData));
// Check the type and do something with the data
CheckType(m_Connections[t].socket, receivedData);
}
else
{
if (WSAGetLastError() != WSAEWOULDBLOCK)
printf("TCPReceive error: %d\n", WSAGetLastError());
}
}
So I have this piece of code. I need to do a memcpy() to convert the incoming data from winsock to a struct that can be read by the application. However, after the CheckType() method is done the application crashes giving me an Access Violation Reading Location error. I removed the memcpy() method once to check and then it worked fine (no crashes).
I have no idea what the problem might be. I've been searching on Google but haven't found anything useful that seems to be a solution to my problem
EDIT:
Some more info:
// in the header
char m_TCPRecvbuf[DEFAULT_BUFLEN];
// receivedData struct
struct MainPacket
{
char type;
int id;
LoginData loginData;
vector<PlayerData> playerData;
};
You're writing over the vector when you do your memcpy. It's not a POD, so you can't initialize it via memcpy; instead you have to use its member functions to initialize it.
Think of it this way, the vector will have a pointer to the data it manages and a size_t indicating the size, at a minimum. You can't just initialize the pointer by memcpying a value you've received over the network. The pointer may make sense to the sender, but when you receive it all you've got is a pointer that's valid on the server, not in your application. Because of this the moment you try to use the vector you'll get undefined behaviour and will probably crash (if you're lucky).
Also, as a result of this, sizeof doesn't work in the way you'd expect when applied to classes. For example, if you've got a vector with 1,000 items in it, sizeof won't reflect this. What sizeof tells you is the combined size of all the member variables in the class definition (subject to padding). If our vector implementation is just a pointer and a size_t then it'll probably be around 8 bytes on a 32-bit platform, and 16 on a 64-bit platform, regardless of how many items are in the vector.
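A quick way to see this for yourself. sizeof_vector_with is a hypothetical helper written for this illustration; the exact number it returns depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns sizeof applied to a vector holding `n` ints. sizeof is a
// compile-time constant, so `n` cannot influence the result.
std::size_t sizeof_vector_with(std::size_t n)
{
    std::vector<int> v(n);
    assert(v.size() == n);
    return sizeof v;    // size of the vector object itself, not of its elements
}
```

sizeof_vector_with(0) and sizeof_vector_with(1000) return the same value, because the elements live in separately allocated storage that the vector object only points to.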
What you need to do is encode information in the packet so that you can decode it. For example, rather than send a vector your packet should contain a field indicating the number of PlayerData instances, followed by the data for each player.
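One possible shape for such an encoding, as a sketch only. DemoPlayer is a hypothetical fixed-size record (the real PlayerData is not shown in the question), and the code assumes sender and receiver share the same endianness and struct layout, which a real protocol would have to pin down explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size record standing in for PlayerData.
struct DemoPlayer {
    int32_t id;
    float   x;
    float   y;
};

// Wire format: uint32_t count, then `count` DemoPlayer records.
std::vector<char> encode_players(const std::vector<DemoPlayer> &players)
{
    assert(players.size() <= UINT32_MAX);
    const uint32_t count = static_cast<uint32_t>(players.size());
    std::vector<char> out(sizeof count + count * sizeof(DemoPlayer));
    std::memcpy(out.data(), &count, sizeof count);
    if (count != 0)
        std::memcpy(out.data() + sizeof count, players.data(),
                    count * sizeof(DemoPlayer));
    return out;
}

// Returns false if the buffer is too short for the count it announces.
bool decode_players(const char *buf, std::size_t len,
                    std::vector<DemoPlayer> &players)
{
    uint32_t count = 0;
    if (buf == nullptr || len < sizeof count)
        return false;
    std::memcpy(&count, buf, sizeof count);
    if ((len - sizeof count) / sizeof(DemoPlayer) < count)
        return false;               // never trust the announced count blindly
    players.resize(count);          // the vector allocates its own storage
    if (count != 0)
        std::memcpy(players.data(), buf + sizeof count,
                    count * sizeof(DemoPlayer));
    return true;
}
```

The receiver lets its own vector allocate memory via resize and copies only plain record data into it, so no pointer from the sender's address space ever reaches the receiver.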
I am trying to write a wrapper around Windows file functions; one function would read num_bytes bytes of data from the file and return it. For some reason I fail to allocate the memory properly, but I just can't find the reason why:
PBYTE Read(int num_bytes, HANDLE hFile){
PBYTE bBuffer;
DWORD new_size = sizeof(BYTE)*num_bytes;
//after the allocation the debugger already displays a 16 char wide placeholder
bBuffer = (PBYTE)malloc(new_size);
OVERLAPPED o = { 0 };
o.Offset = 0;
BOOL bReadDone = ReadFile(hFile, (LPVOID)bBuffer, sizeof(BYTE)*num_bytes, NULL, &o);
return bBuffer;
}
Data gets copied, but the allocated buffer is always too wide and contains extra weird filler characters. Can somebody please explain: what am I doing wrong?
"what am I doing wrong?"
sizeof(BYTE) is 1 so you can remove it everywhere and eliminate the redundant new_size variable.
You tagged your question C++ but used malloc to allocate the buffer. Your design makes the caller responsible for freeing the buffer, which is a poor design approach, and even more so when it relies on malloc/free in a C++ program. A good C++ solution to this quandary would be to return a std::vector.
It is vital that you provide the lpNumberOfBytesRead parameter to ReadFile. Without it you don't know how many bytes were read. And if you don't know how many bytes were read you can't tell the difference between "extra weird filler characters" and unused memory at the end of the buffer. If the data is characters then character-oriented output routines (and debugger tools) don't know the difference either, since there is no null terminator at the end of the data that was actually read. You could use NumberOfBytesRead to put in a null terminator so you and the debugger don't read beyond the real data.
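The two suggestions (return a std::vector, and keep track of how many bytes were really read) can be combined. The sketch below is a portable analogue that uses std::fread instead of ReadFile so that it runs anywhere; read_bytes is a hypothetical name, and a Windows version would pass a DWORD to ReadFile's lpNumberOfBytesRead and resize with that value instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads up to `num_bytes` from `file` and returns exactly the bytes read.
std::vector<unsigned char> read_bytes(std::FILE *file, std::size_t num_bytes)
{
    assert(file != nullptr);
    std::vector<unsigned char> buffer(num_bytes);
    std::size_t got = 0;
    if (num_bytes != 0)
        got = std::fread(buffer.data(), 1, num_bytes, file);
    buffer.resize(got);     // drop the part of the buffer that was never filled
    return buffer;
}
```

The caller gets ownership and the real length in one object: size() is the number of bytes read, and nothing has to be freed by hand.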