How to handle char * from packed struct in cgo? - c++

Since Go doesn't support packed structs, I found this great article, which explains with examples how to work with packed structs in Go: https://medium.com/#liamkelly17/working-with-packed-c-structs-in-cgo-224a0a3b708b
The problem is that when I use char * in place of char[10], it doesn't work. I'm not sure why the conversion works with char[10] but not with char *. Here is example code taken from the article above and modified to use char *.
package main
/*
#include "stdio.h"
#pragma pack(1)
typedef struct{
unsigned char a;
char b;
int c;
unsigned int d;
char *e; // changed from char[10] to char *
}packed;
void PrintPacked(packed p){
printf("\nFrom C\na:%d\nb:%d\nc:%d\nd:%d\ne:%s\n", p.a, p.b, p.c, p.d, p.e);
}
*/
import "C"
import (
"bytes"
"encoding/binary"
)
//GoPack is the go version of the c packed structure
type GoPack struct {
a uint8
b int8
c int32
d uint32
e [10]uint8
}
//Pack Produces a packed version of the go struct
func (g *GoPack) Pack(out *C.packed) {
buf := &bytes.Buffer{}
binary.Write(buf, binary.LittleEndian, g)
*out = *(*C.packed)(C.CBytes(buf.Bytes()))
}
func main() {
pack := &GoPack{1, 2, 3, 4, [10]byte{}}
copy(pack.e[:], "TEST123")
cpack := C.packed{} //just to allocate the memory, still under GC control
pack.Pack(&cpack)
C.PrintPacked(cpack)
}
I'm working with cgo for the first time, so correct me if I am wrong at any point.

You are writing ten (zero) bytes of GoPack.e into the packed.e which is of type char *. This won't work, because pointers will be 4 or 8 bytes depending on your system, so even if the bytes represented a valid pointer, you are overflowing the amount of memory allocated.
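The size mismatch described above can be checked at compile time. The following is a minimal C++ sketch, assuming a compiler that honours #pragma pack (GCC, Clang and MSVC do); the struct names PackedWithPointer and PackedWithArray are hypothetical and only mirror the two layouts under discussion.

```cpp
#include <cstddef>

#pragma pack(push, 1)
// The question's layout: the last member stores only an address.
struct PackedWithPointer {
    unsigned char a;
    char b;
    int c;
    unsigned int d;
    char *e;
};

// The article's original layout: ten characters stored inline.
struct PackedWithArray {
    unsigned char a;
    char b;
    int c;
    unsigned int d;
    char e[10];
};
#pragma pack(pop)

// With 1-byte packing there is no padding, so sizes are plain sums.
constexpr std::size_t kHeaderBytes =
    2 * sizeof(char) + sizeof(int) + sizeof(unsigned int);

static_assert(offsetof(PackedWithPointer, e) == kHeaderBytes,
              "e starts right after the four scalar members");
static_assert(sizeof(PackedWithPointer) == kHeaderBytes + sizeof(char *),
              "the pointer member occupies sizeof(char *) bytes, not 10");
static_assert(sizeof(PackedWithArray) == kHeaderBytes + 10,
              "the array member occupies exactly 10 bytes");
```

Serializing the Go struct with its ten-byte array therefore produces more bytes than the pointer variant of the C struct can hold on common 32-bit and 64-bit targets.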
If you want to create a valid structure with a valid packed.e field, you need to allocate 10 bytes of memory in the C heap, copy the bytes into that, and then point packed.e to this allocated memory. (You will also need to free this memory when you free the corresponding packed structure). You can't do this directly with binary.Write.
You can take this as a starting point:
buf := &bytes.Buffer{}
binary.Write(buf, binary.LittleEndian, g.a)
binary.Write(buf, binary.LittleEndian, g.b)
binary.Write(buf, binary.LittleEndian, g.c)
binary.Write(buf, binary.LittleEndian, g.d)
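// binary.Write only accepts fixed-size values, so the pointer is widened to uint64 (assumes 64-bit pointers).
binary.Write(buf, binary.LittleEndian, uint64(uintptr(C.CBytes(g.e[:]))))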
*out = *(*C.packed)(C.CBytes(buf.Bytes()))
The function C.CBytes(b) allocates len(b) bytes in the C heap, and copies the bytes from b into it, returning an unsafe.Pointer.
Note that I've copied your *out = *(*C.packed)... line from your code. This actually causes a memory leak and an unnecessary copy. Probably it would be better to use a writer that writes bytes directly to the memory pointed to by out.
Perhaps this?
const N = 10000 // should be sizeof(*out) or larger
// Length 0, capacity N: writes start at the first byte of *out.
buf := bytes.NewBuffer((*[N]byte)(unsafe.Pointer(out))[:0])
This makes a bytes.Buffer that directly writes to the out struct without going through any intermediate memory. Note that because of unsafe shenanigans, this is vulnerable to a buffer overflow if you write more bytes of data than is pointed to by out.
Words of warning: this is all pretty nasty, and prone to the same sorts of problems you'd find in C, and you'd need to check the cgo pointer rules to make sure that you're not vulnerable to garbage collection interactions. A point of advice: given that you say you "don't have much experience with pointers and memory allocation", you probably should avoid writing or including code like this because the problems it can introduce are nefarious and may not be immediately obvious.

Related

Can a stack allocated Rust buffer be accessed through C++?

In order to avoid heap allocations, and since I know the maximum MTU of an Ethernet packet, I created a small buffer, [u8; MAX_BYTES_TRANSPORT], in Rust that C++ should fill for me:
pub fn receive(&mut self, f: &dyn Fn(&[u8])) -> std::result::Result<(), &str> {
let mut buffer: [u8; MAX_BYTES_TRANSPORT] = [0; MAX_BYTES_TRANSPORT];
//number of bytes written in buffer in the C++ side
let written_size: *mut size_t = std::ptr::null_mut::<size_t>();
let r = unsafe{openvpn_client_receive_just(
buffer.as_mut_ptr(), buffer.len(), written_size, self.openvpn_client)};
So, the function openvpn_client_receive_just, which is a C++ function with C interface, should write to this buffer. Is this safe? I couldn't find information about a stack allocated Rust buffer being used in C++
This is the function:
uint8_t openvpn_client_receive_just(
uint8_t *buffer, size_t buffer_size, size_t *written_size, OpenVPNSocket *client)
Can a stack allocated buffer be accessed through C++?
Yes.
From the type-system perspective there is no difference between statically allocated, stack allocated, or heap allocated: the C signature only takes pointer and size, and cares little where that pointer points to.
Is this safe?
Most likely.
As long as the C function is correctly written and respects the bounds of the buffer, this will be safe. If it doesn't, well, that's a bug.
One could argue that it's better to have a heap-allocated buffer, but honestly once one starts writing out of bounds, overwriting arbitrary stack bytes or overwriting arbitrary heap bytes are both bad, and have undefined behavior.
For extra security, you could use a heap-allocated buffer nested between 2 guard pages. Using OS-specific facilities, you could allocate 3 contiguous OS pages (typically 4KB each on x86), then mark the first and last as read-only and put your buffer in the middle one. Then any (close) write before or after the buffer would be caught by the OS. Larger jumps, though, wouldn't be caught, so it's a lot of effort for a mitigation.
Is your code safe?
You most likely need to know how many bytes were written, so using a null pointer is strange.
I'd expect to see:
let mut written: size_t = 0;
let written_size = &mut written as *mut _;
And yes, that's once again a pointer to a stack variable, just like you would in C.
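On the C++ side, "respects the bounds of the buffer" could look like the sketch below. This is not OpenVPN's code: receive_into, its return codes and the fake payload are hypothetical, chosen only to show the bounds check and why a null written_size cannot work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a C-interface receive function.
// Returns 0 on success and 1 on invalid arguments.
extern "C" std::uint8_t receive_into(std::uint8_t *buffer,
                                     std::size_t buffer_size,
                                     std::size_t *written_size) {
    if (buffer == nullptr || written_size == nullptr) {
        return 1;  // a null written_size leaves no way to report the count
    }
    static const std::uint8_t packet[] = {0xde, 0xad, 0xbe, 0xef};  // pretend payload
    const std::size_t n =
        sizeof(packet) < buffer_size ? sizeof(packet) : buffer_size;
    std::memcpy(buffer, packet, n);  // never writes past buffer_size
    *written_size = n;
    return 0;
}

// Usage: both the buffer and the counter live on the caller's stack.
inline std::size_t receive_example() {
    std::uint8_t stack_buffer[2] = {0, 0};
    std::size_t written = 0;
    const std::uint8_t status =
        receive_into(stack_buffer, sizeof stack_buffer, &written);
    assert(status == 0);
    return written;  // clamped to the 2-byte buffer
}
```

The callee neither knows nor cares that the storage is on the stack; it only needs the pointer and size to be honest.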
A note on style: your Rust code is unusual in that you use fully typed variables and full paths. A more idiomatic style would be:
// Result is implicitly in scope.
pub fn receive(&mut self, f: &dyn Fn(&[u8])) -> Result<(), &str> {
let mut buffer = [0u8; MAX_BYTES_TRANSPORT];
let mut written: size_t = 0;
let written_size = &mut written as *mut _;
// Safety:
// <enumerate preconditions to safely call the function here, and why they are met>
let result = unsafe {
openvpn_client_receive_just(
buffer.as_mut_ptr(), buffer.len(), written_size, self.openvpn_client)
};
translate_openvpn_error(result)?;
let buffer = &buffer[0..written];
f(buffer);
Ok(())
}
I did annotate the type for written, to help along inference, but strictly speaking it should not be necessary.
Also, I like to preface every unsafe call I make with the list of pre-conditions that make it safe, and for each why they are met. It helps me audit my unsafe code later on.

How to fill buffers with mixed types conveniently in standard conformant way?

There are problems, where we need to fill buffers with mixed types. Two examples:
programming OpenGL/DirectX, we need to fill vertex buffers, which can have mixed types (basically an array of structs, but the struct may be described by run-time data)
creating a memory allocator: putting header/trailer information to the buffer (size, flags, next/prev pointer, sentinels, etc.)
The problem can be described like this:
there is an allocation function, which gives back some memory (new, malloc, OS dependent allocation function, like mmap or VirtualAlloc)
there is a need to put mixed types into an allocated buffer, at various offsets
A solution can be this, for example writing an int to an offset:
void *buffer = <allocate>;
int offset = <some_offset>;
char *ptr = static_cast<char*>(buffer);
*reinterpret_cast<int*>(ptr+offset) = int_value;
However, this is inconvenient, and has UB in at least two places:
ptr+offset is UB, as there is no char array at ptr
writing to the result of reinterpret_cast is UB, as there is no int there
To solve the inconvenience problem, this solution is often used:
union Pointer {
void *asVoid;
bool *asBool;
byte *asByte;
char *asChar;
short *asShort;
int *asInt;
Pointer(void *p) : asVoid(p) { }
};
So, with this union, we can do this:
Pointer p = <allocate>;
p.asChar += offset;
*p.asInt++ = int_value; // write an int to offset
*p.asShort++ = short_value; // then a short afterwards
// other writes here
This solution is convenient for filling buffers, but has further UB, as the solution uses non-active union members.
So, my question is: how can one solve this problem in a strictly standard conformant, and most convenient way? I mean, I'd like to have the functionality which the union solution gives me, but in a standard conformant way.
(Note: suppose, that we have no alignment issues here, alignment is taken care of by using proper offsets)
A simple (and conformant) way to handle these things is leveraging std::memcpy to move whatever values you need into the correct offsets in your storage area, e.g.
std::int32_t value;
char *ptr;
int offset;
// ...
std::memcpy(ptr+offset, &value, sizeof(value));
Do not worry about performance, since your compiler will not actually perform std::memcpy calls in many cases (e.g. small values). Of course, check the assembly output (and profile!), but it should be fine in general.
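To recover the convenience of the union approach on top of std::memcpy, a small cursor-style helper is one option. The sketch below is an illustration only: BufferWriter and read_at are hypothetical names, and the helper assumes the caller passes storage of at least capacity bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: appends trivially copyable values to raw storage.
class BufferWriter {
public:
    BufferWriter(void *storage, std::size_t capacity)
        : base_(static_cast<unsigned char *>(storage)), capacity_(capacity) {}

    // Copies the object representation of value at the current offset.
    template <typename T>
    void write(const T &value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "memcpy requires a trivially copyable type");
        assert(offset_ + sizeof(T) <= capacity_);
        std::memcpy(base_ + offset_, &value, sizeof(T));
        offset_ += sizeof(T);
    }

    // Moves the cursor forward without writing (for padding or gaps).
    void skip(std::size_t bytes) {
        assert(offset_ + bytes <= capacity_);
        offset_ += bytes;
    }

    std::size_t offset() const { return offset_; }

private:
    unsigned char *base_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};

// Reads a value back the same way; no T object needs to exist at that offset.
template <typename T>
T read_at(const void *storage, std::size_t offset) {
    T value;
    std::memcpy(&value, static_cast<const unsigned char *>(storage) + offset,
                sizeof(T));
    return value;
}
```

A call sequence such as w.write(int_value); w.write(short_value); then mirrors the *p.asInt++ = ...; *p.asShort++ = ...; pattern. Because the bytes are copied rather than accessed through a typed pointer, unaligned offsets are also fine.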

Array type required - Pointer to Record

I am trying to convert some C++ code into Delphi. Hopefully you can help.
This is the block of code that needs to be converted;
group = (inum - 1) / inodes_per_group;
index = ((inum - 1) % inodes_per_group) * inode_size;
inode_index = (index % blocksize);
desc[group].bg_block_bitmap;
blknum = desc[group].bg_inode_table + (index / blocksize); //Specifically this line
Now desc is declared as EXT2_GROUP_DESC *desc; which is defined elsewhere;
typedef struct tagEXT2_GROUP_DESC
{
uint32_t bg_block_bitmap; /* points to the blocks bitmap for the group */
uint32_t bg_inode_bitmap; /* points to the inodes bitmap for the group */
uint32_t bg_inode_table; /* points to the inode table first block */
uint16_t bg_free_blocks_count; /* number of free blocks in the group */
uint16_t bg_free_inodes_count; /* number of free inodes in the */
uint16_t bg_used_dirs_count; /* number of inodes allocated to directories */
uint16_t bg_pad; /* padding */
uint32_t bg_reserved[3]; /* reserved */
}__attribute__ ((__packed__)) EXT2_GROUP_DESC;
desc is initialised using calloc as follows;
desc = (EXT2_GROUP_DESC *) calloc(totalGroups, sizeof(EXT2_GROUP_DESC));
First question; In C++, how is it possible to access a pointer to a record as an array like this? Why is there no array type required in C++?
Second question: Below is my Delphi conversion, why can I not access desc as an array without giving it a type?
My way is obviously wrong. What is the correct way to go about it?
Type
PTExt2_Group_Desc = ^TExt2_Group_Desc;
TExt2_Group_Desc = packed Record
bg_block_bitmap : Cardinal;
bg_inode_bitmap : Cardinal;
bg_inode_table : Cardinal;
bg_free_blocks_count : Word;
bg_free_inodes_count : Word;
bg_used_dirs_count : Word;
bg_pad : Word;
bg_reserved : Array[0..2] of Cardinal;
end;
//Calloc function found from Google
function CAlloc(Items, Size: Cardinal): Pointer;
begin
try
GetMem(Result, Items * Size);
FillChar(PByte(Result)^, Items * Size, 0);
except
on EOutOfMemory do
Result := nil;
end;
end;
self.desc := PTExt2_Group_Desc(calloc(totalGroups, sizeof(TEXT2_GROUP_DESC)));
index := ((inum-1) MOD self.inodes_per_group) * self.inode_size;
inode_index := (index MOD self.block_size);
blknum := self.desc[group].bg_inode_table + (index div self.block_size); //Error - Array type required
To answer your question how is it possible (by which I take it you mean "sensible") to use a pointer as an array...
A pointer is simply a reference to an area of memory. A typed pointer ensures that the layout of that memory conforms to a particular specification. An array is merely a sequential arrangement of uniform memory areas, so an array reference applied to a base pointer is logically equivalent to simply taking an offset into the memory. Instead of a byte offset, you are indicating an offset as a multiple (the array index) of a number of bytes, being the number of bytes occupied by the type referenced by pointer.
e.g. if you have a pointer P referencing memory that is a notional array of 16-bit words (2 bytes) then:
P[4] = P + (4 * 2) = P + 8 bytes
i.e P[4] = the 16-bit word located at P + 8
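The equivalence can be verified directly in C++. This is a minimal sketch with a hypothetical function name; it checks that p[4] and *(p + 4) name the same element and that the element sits 4 * 2 = 8 bytes past the base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if indexing and explicit pointer arithmetic agree for p[4].
inline bool indexing_matches_offset() {
    std::uint16_t words[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    std::uint16_t *p = words;  // the array decays to a pointer to element 0

    // p[4] is defined as *(p + 4): same address, same value.
    const bool same_element = (&p[4] == p + 4) && (p[4] == *(p + 4));

    // Pointer arithmetic counts elements; in bytes the step is sizeof(element).
    const std::size_t byte_distance = static_cast<std::size_t>(
        reinterpret_cast<unsigned char *>(&p[4]) -
        reinterpret_cast<unsigned char *>(p));

    return same_element && p[4] == 14 &&
           byte_distance == 4 * sizeof(std::uint16_t);
}
```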
Now to the Delphi technique(s) involved:
If you are using Delphi 2009 or later, you can enable this syntax by enabling pointer arithmetic:
{$POINTERMATH ON}
If you are using an earlier version of Delphi, or prefer not to enable pointer arithmetic for some reason, you can achieve the same result by introducing an array type declaration:
TExt2_Group_Desc = packed record
..
end;
TExt2_Group_DescArray = array [0..255] of TExt2_Group_Desc;
PTExt2_Group_Desc = ^TExt2_Group_DescArray;
Doing this means that your pointer type must always be used as a reference to an array however, so you may prefer to create this array-form declaration as a separate type:
PTExt2_Group_Desc = ^TExt2_Group_Desc;
TExt2_Group_Desc = packed record
..
end;
TExt2_Group_DescArray = array [0..255] of TExt2_Group_Desc;
PTExt2_Group_DescAsArray = ^TExt2_Group_DescArray;
The bounds on the array are not strictly important (in terms of memory use) since you are not declaring a variable of that type, only using it as a way of coercing the array-form pointer type. However, if you have bounds checking enabled then you should ensure that the bounds on this array declaration are sufficient to accommodate your required indexing range.
The POINTERMATH directive approach is not affected by this since there are no explicit bounds involved in that case.
First question; In C++, how is it possible to access a pointer to a record as an array like this? Why is there no array type required in C++?
To answer your first question: standard C++ has no notion of Variable Length Arrays (VLAs). Some C++ compilers offer them as an extension, but they are not part of the ISO C++ standard.
So the code you're seeing is one of many ways of compensating for the lack of VLA's. Since arrays store data in contiguous blocks, all that needs to be done is declare a pointer, create the block dynamically (in the case above, using calloc), thus a pointer is returned that denotes the start of the array block.
Once the pointer points to this memory block, the array syntax of using [ ] can be used to access each element.
Note that the code you posted is mostly C in style. There is very little reason (unless you're writing an allocator class) to use calloc in a C++ program. In C++, you would usually use new[] to allocate memory for an array, or use a class such as std::vector to handle the memory management automatically.
Having said this, C99 has VLA's, so the code above need not be done with this version of C. For C++, there are no VLA's that are standard, so the code in your question would be used (albeit, very rarely).
As to your second question concerning Delphi, I guess the answer to the first question can be used as a guide. I am not a Delphi programmer, but what you should investigate is whether Delphi has some sort of dynamic array class or type. That would be the equivalent of the calloc call in C++.
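For comparison, the std::vector alternative mentioned in this answer might look like the following sketch. GroupDesc is a trimmed, hypothetical stand-in for EXT2_GROUP_DESC with only the field used here, and inode_block and make_descriptors are illustrative names, not part of any ext2 library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Trimmed, hypothetical stand-in for EXT2_GROUP_DESC.
struct GroupDesc {
    std::uint32_t bg_inode_table;
};

// Replaces calloc(totalGroups, sizeof(EXT2_GROUP_DESC)): elements start zeroed.
inline std::vector<GroupDesc> make_descriptors(std::size_t total_groups) {
    return std::vector<GroupDesc>(total_groups);
}

// Same arithmetic as the question's snippet, with bounds-checked indexing.
inline std::uint32_t inode_block(const std::vector<GroupDesc> &desc,
                                 std::uint32_t inum,
                                 std::uint32_t inodes_per_group,
                                 std::uint32_t inode_size,
                                 std::uint32_t blocksize) {
    assert(inum >= 1 && inodes_per_group > 0 && blocksize > 0);
    const std::uint32_t group = (inum - 1) / inodes_per_group;
    const std::uint32_t index = ((inum - 1) % inodes_per_group) * inode_size;
    // at() throws std::out_of_range instead of reading past the end.
    return desc.at(group).bg_inode_table + index / blocksize;
}
```

The vector carries its own length and frees its memory automatically, which is what the raw pointer returned by calloc cannot do.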

Memory Fragmentation in C++

I want to use malloc()/new to allocate 256KB memory to variable m.
Then, use m to store data such as strings and numbers.
My problem is how to save data to m and retrieve it.
For example, how do I store the int 123456 in offsets 0 to 3 and read it into variable x?
Or store the string "David" in offsets 4 to 8 (or 9 with \0) and then retrieve it into variable s?
You can store an integer by casting pointers.
unsigned char *p = new unsigned char[256 * 1024]; // 256KB
*(int *) p = 123456;
int x = *(int *) p;
This is a terrible idea. Don't work with untyped memory, and don't try to play fast and loose like you do in PHP, because C++ is less tolerant of sloppy programming.
I suggest reading an introductory C++ textbook, which will explain things like types and classes which you can use to avoid dealing with untyped memory.
Edit: From the comments above, it looks like you want to learn about pointer arithmetic.
Don't use pointer arithmetic*.
* unless you promise that you know what you are doing.
Please read my comment; I think you need to know more about C and low-level native programming.
Is there a specific application for that format?
To assign a structure to memory, you can do something like:
#include <stdlib.h>
struct my_format{
int first;
char second[5];
};
int main()
{
/* the cast is required in C++ and harmless in C */
struct my_format *mfp=
(struct my_format *)malloc(sizeof(struct my_format));
if (mfp == NULL)
return 1;
mfp->first=123456;
free(mfp);
return 0;
}
This doesn't deal with memory specifics (i.e. the exact positions of variables), but doing so is just plain bad in almost all ways.
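If exact offsets really are required, as in the question's example of an int at offsets 0 to 3 followed by a string at offset 4, copying bytes with std::memcpy avoids the pointer casts. This is a rough sketch with hypothetical helper names; it assumes a 4-byte int and uses a std::vector as the 256KB block so that the size is tracked and the memory is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

using Store = std::vector<unsigned char>;

// Allocates the 256KB block, zero-filled.
inline Store make_store() { return Store(256 * 1024); }

inline void put_int(Store &m, std::size_t offset, int value) {
    assert(offset + sizeof value <= m.size());
    std::memcpy(m.data() + offset, &value, sizeof value);
}

inline int get_int(const Store &m, std::size_t offset) {
    assert(offset + sizeof(int) <= m.size());
    int value;
    std::memcpy(&value, m.data() + offset, sizeof value);
    return value;
}

// Stores the characters plus the terminating '\0'.
inline void put_string(Store &m, std::size_t offset, const char *text) {
    const std::size_t bytes = std::strlen(text) + 1;
    assert(offset + bytes <= m.size());
    std::memcpy(m.data() + offset, text, bytes);
}

// Reads a '\0'-terminated string that starts at offset.
inline std::string get_string(const Store &m, std::size_t offset) {
    assert(offset < m.size());
    const unsigned char *begin = m.data() + offset;
    assert(std::memchr(begin, '\0', m.size() - offset) != nullptr);
    return std::string(reinterpret_cast<const char *>(begin));
}
```

With these, put_int(m, 0, 123456) followed by put_string(m, sizeof(int), "David") fills offsets 0 to 9 on a platform with 4-byte int, and get_int and get_string read the values back.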

Length of a BYTE array in C++

I have a program in C++ that has a BYTE array that stores some values. I need to find the length of that array i.e. number of bytes in that array. Please help me in this regard.
This is the code:
BYTE *res;
res = (BYTE *)realloc(res, (byte_len(res)+2));
byte_len is a fictitious function that returns the length of the BYTE array and I would like to know how to implement it.
Given your code:
BYTE *res;
res = (BYTE *)realloc(res, (byte_len(res)+2));
res is a pointer to type BYTE. It points to a contiguous sequence of n BYTEs only because you made it do so; the length is not part of the pointer. In other words, res points to a single BYTE, and if that BYTE lies inside a larger block that you are allowed to access, you can use res to reach the BYTE values before or after it.
BYTE data[10];
BYTE *res = &data[2];
/* Now you can access res[-2] to res[7] */
So, to answer your question: you definitely know how many BYTEs you allocated when you called malloc() or realloc(), so you should keep track of the number.
Finally, your use of realloc() is wrong, because if realloc() fails, you leak memory. The standard way to use realloc() is to use a temporary:
BYTE *tmp;
tmp = (BYTE *)realloc(res, n*2);
if (tmp == NULL) {
/* realloc failed, res is still valid */
} else {
/* You can't use res now, but tmp is valid. Reassign */
res = tmp;
}
If the array is a fixed size array, such as:
BYTE Data[200];
You can find the length (in elements) with the commonly used macro:
#define ARRAY_LENGTH(array) (sizeof(array)/sizeof((array)[0]))
However, in C++ I prefer to use the following where possible:
template<typename T, size_t N>
inline size_t array_length(T (&data)[N]) // reference to array, so N is deduced and pointers are rejected
{
return N;
}
Because it prevents this from occurring:
// array is now dynamically allocated
BYTE* data = new BYTE[200];
// oops! this is now 4 (or 8 on 64bit)!
size_t length = ARRAY_LENGTH(data);
// this on the other hand becomes a compile error
length = array_length(data);
If the array is not a fixed size array:
In C++, raw pointers (like byte*) are not bounded. If you need the length, which you always do when working with arrays, you have to keep track of the length separately. Classes like std::vector help with this because they store the length of the array along with the data.
In the C way of doing things (which is also relevant to C++) you generally need to keep a record of how long your array is:
BYTE *res = NULL;
size_t len = 100;
res = (BYTE *)realloc(res, len);
len += 2;
res = (BYTE *)realloc(res, len);
An alternative in the C++ way of doing things is to use the std::vector container class; a vector manages the length of the array by itself, and also deals with the issues of memory management.
EDIT: as others have pointed out the use of realloc here is incorrect as it will lead to memory leaks, this just deals with keeping track of the length. You should probably accept one of the other replies as the best answer
Given the information you seem to have available, there is no way to do what you want. When you are working with arrays allocated on the heap, you need to save the size somewhere if you need to work with it again. Neither new nor malloc will do this for you.
Now, if you have the number of items in the array saved somewhere, you can do this to get the total size in characters, which is the unit that realloc works with. The code would look like this:
size_t array_memsize = elems_in_array * sizeof(BYTE);
If you are really working with C++ and not C I would strongly suggest that you use the vector template for this instead of going to malloc and realloc. The vector template is fast and not anywhere near as error prone as rolling your own memory management. In addition, it tracks the size for you.
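As a rough illustration of that suggestion, the question's "grow the array by two bytes" step could be written with std::vector as below. The BYTE alias is an assumption (Windows headers define it as unsigned char), and grow_by_two is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using BYTE = unsigned char;  // assumption: matches the Windows typedef

// Grows the array by two bytes and returns the new length.
// Existing contents are preserved; the new bytes are zero-initialized.
inline std::size_t grow_by_two(std::vector<BYTE> &res) {
    const std::size_t old_size = res.size();
    res.resize(old_size + 2);
    assert(res.size() == old_size + 2);
    return res.size();  // the vector always knows its own length
}
```

There is no separate length variable to keep in sync and no realloc failure path to handle by hand; if the allocation fails, resize throws std::bad_alloc and the vector is left unchanged.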
When you allocate the pointer initially you also need to keep track of the length:
size_t bufSize = 100;
BYTE* buf = (BYTE*)malloc(sizeof(BYTE) * bufSize);
When you re-allocate, you should be careful with realloc:
BYTE* temp = realloc(buf,sizeof(BYTE ) * (bufSize+2));
if (temp != NULL)
{
bufSize += 2;
buf = temp;
}
If it is a local variable allocated on the stack you can calculate it like this:
BYTE array[] = { 10, 20, 30, ... };
size_t length = sizeof(array) / sizeof(BYTE);
If you receive a pointer to the beginning of the array or you allocate it dynamically(on the heap), you need to keep the length as well as the pointer.
EDIT: I also advise you use STL vector for such needs because it already implements dynamic array semantics.