Moving data from multiple unique_ptr into contiguous C-style array - c++

Over time I am receiving parts (RdKafka::Message) of a larger message and storing them in
std::vector<std::unique_ptr<RdKafka::Message>> messageParts;
Each RdKafka::Message has a raw-pointer payload that I want to move into a single shared_ptr. This is possible because each RdKafka::Message carries its size and its index into the original larger message.
Once I have received all parts of the larger message I want to reassemble all of the messages into a single shared_ptr.
My first strategy has been to allocate the total amount of memory through a C-style pointer and then std::move each individual unique_ptr part of the data into the correct section of that memory. Something like this:
uint8_t* payload = new uint8_t[totalBytes];
for(auto& m : messageParts){
    int32_t partition = m->partition();  // used here as the byte offset of this part
    size_t len = m->len();
    std::memcpy(&payload[partition], m->payload(), len);  // memcpy takes const void*, no cast needed
}
std::shared_ptr<uint8_t[]>(payload);  // parsed as a declaration of a second variable named payload
This obviously doesn't compile, but I am looking for something like this so that I can return a single pointer to all of the original memory.
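A minimal sketch of the reassembly, assuming C++17 (for std::shared_ptr<T[]>). Because RdKafka::Message can't be constructed here, it uses a hypothetical Part struct exposing payload() and len() plus an assumed offset field; with the real type you would substitute wherever your protocol records each part's byte offset. Note that the bytes have to be copied: memory that was allocated separately for each part cannot be moved into one contiguous block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for RdKafka::Message, reduced to what reassembly needs.
// `offset` is assumed to be this part's byte position in the full message.
struct Part {
    std::size_t offset;
    std::vector<std::uint8_t> bytes;

    const void* payload() const { return bytes.data(); }
    std::size_t len() const { return bytes.size(); }
};

// Copies every part into one contiguous buffer owned by a shared_ptr.
// The bytes are copied: separately allocated buffers cannot be moved into one block.
std::shared_ptr<std::uint8_t[]> reassemble(const std::vector<std::unique_ptr<Part>>& parts,
                                           std::size_t totalBytes) {
    // C++17: shared_ptr<T[]> releases the buffer with delete[].
    std::shared_ptr<std::uint8_t[]> result(new std::uint8_t[totalBytes]);
    for (const auto& part : parts) {
        if (part->len() == 0) continue;                    // nothing to copy
        assert(part->offset + part->len() <= totalBytes);  // part must fit in the buffer
        std::memcpy(result.get() + part->offset, part->payload(), part->len());
    }
    return result;
}
```

After this, messageParts can be cleared, which releases the original per-part buffers. If the offsets really come from m->partition(), be aware that in librdkafka that value is the Kafka partition number, so it only works as an offset if your producer deliberately assigns partitions that way.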

Related

dynamic memory allocation using new operator, issue in type cast

uint8_t hello = 50;
uint8_t *data;
data = new uint8_t(hello); //issue when using "new"
data = (uint8_t*)malloc(hello); //worked fine
I want to allocate memory as in the code above. If I delete this data pointer at the end of the scope, there is some sort of memory leak. Also, have I allocated the memory correctly? Is any cast needed, like the one I used for malloc?
If you're looking to make a buffer of some particular size:
uint8_t *data = new uint8_t[hello];
Where hello is your size argument. Note the use of [...] instead of (...): the (...) form passes constructor arguments when building a single object.
The original form creates an allocation for one uint8_t and populates that particular one with the value 50. That's presumably not what you want, but it does work:
data = new uint8_t(hello);
std::cout << *data << std::endl;
This outputs 2, which is correct: uint8_t is a character type, so << prints the character with code 50, and in ASCII that is '2'.
All that being said, for character buffers steer towards std::string and far, far away from C-style buffers if you can.
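To address the leak and casting questions directly, here is a hedged sketch (the helper names are hypothetical) of the allocation styles from this thread, each released with its matching call: new with delete, new[] with delete[], malloc with free. Mixing them, for example calling delete on memory from new[] or malloc, is undefined behavior. std::vector is shown as an owning alternative for raw bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// new T(x) builds ONE object holding x. Release it with plain delete.
std::uint8_t singleByte() {
    std::uint8_t* p = new std::uint8_t(50);
    std::uint8_t v = *p;
    delete p;
    return v;
}

// new T[n] allocates an array of n elements; no cast is needed. Release with delete[].
std::size_t countZeroBytes(std::size_t n) {
    std::uint8_t* buf = new std::uint8_t[n]();  // () value-initializes every byte to 0
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i) zeros += (buf[i] == 0);
    delete[] buf;
    return zeros;
}

// malloc returns void*, so C++ requires a cast. Release with free(), never delete.
bool mallocBuffer(std::size_t n) {
    auto* buf = static_cast<std::uint8_t*>(std::malloc(n));
    bool ok = (buf != nullptr);
    std::free(buf);  // free(nullptr) is a no-op
    return ok;
}

// Owning alternative: the vector frees its bytes automatically.
std::vector<std::uint8_t> vectorBuffer(std::size_t n) {
    return std::vector<std::uint8_t>(n);  // n zero-initialized bytes
}
```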

Passing vector to enclave in Intel SGX

I have a vector<vector<string>> a; How can I pass it to the enclave, and how do I declare the EDL function?
A sample function declaration for the app, edl and enclave is much appreciated.
I am aware of this: C++ Arguments to SGX Enclave Edge Functions.
A sample to pass even a vector<string> is ok for me.
update1:
I came up with this:
App.cpp
const char *convert(const std::string & s)
{
return s.c_str();
}
vector<string> members_data;
members_data.push_back("apple");
members_data.push_back("orange"); //just for sample
std::vector<const char*> vc;
std::transform(members_data.begin(), members_data.end(), std::back_inserter(vc), convert);
edl:
trusted {
public void ecall_receive_vector([in, size=len] const char **arr, size_t len);
};
enclave
void ecall_receive_vector(const char *arr[], size_t len)
{
vector<string> v(arr, arr+len);
printf("%s\n", v[1].c_str());  // only two strings were pushed, so 1 is the last valid index
}
But the enclave does not receive any data, even though the program compiles with no errors. Could anyone help? The printf is the sample ocall.
In the EDL use count instead of size.
trusted {
public void ecall_receive_vector([in, count=len] const char **arr, size_t len);
};
You are passing a double pointer, that is, a pointer to a pointer to char (char **).
When marshaling/unmarshaling pointers, the EDL processor handles (copies and validates input and output) only the first level of indirection; it is up to the developer to handle any additional levels. Hence, for an array of pointers it copies only the array of pointers itself, not the pointed-to values; copying those is the developer's responsibility.
If not specified, count and size default to 1 and sizeof(<pointed-type>) respectively. In your case the pointed-to type is itself a pointer, so the default size is sizeof(<pointer>), which is 4 on 32-bit platforms and 8 on 64-bit ones.
In your case, you provided only size. As you didn't show the caller code, I assume you're passing the length of the array (the number of strings) as len, and as count was not specified it defaults to 1. Then the total number of bytes, based on Total number of bytes = count * size, will be 1 * len, which is wrong.
Using only count will let size default to sizeof(<pointed-type>), then Total number of bytes = count * size will be count * sizeof(<pointed-type>), which is right because you're passing an array of pointers.
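The arithmetic can be checked with a tiny sketch; edlBytes and the two wrappers are hypothetical helpers that only mirror the documented formula, not SGX SDK functions. With two strings on a 64-bit platform, the question's EDL copies 2 bytes, while the count-based EDL copies 16, i.e. both pointers.

```cpp
#include <cstddef>

// Mirrors "Total number of bytes = count * size" (hypothetical helper, not SGX SDK API).
constexpr std::size_t edlBytes(std::size_t count, std::size_t size) {
    return count * size;
}

// Question's EDL: [in, size=len] const char **arr, so count defaults to 1.
constexpr std::size_t sizeOnly(std::size_t len) {
    return edlBytes(1, len);
}

// Answer's EDL: [in, count=len] const char **arr, so size defaults to sizeof(const char*).
constexpr std::size_t countOnly(std::size_t len) {
    return edlBytes(len, sizeof(const char*));
}
```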
Finally, once inside the enclave you need to copy the pointed-to data, because those pointers point to memory outside the enclave; constructing a std::string from each pointer does that copy for you.
From Intel SGX SDK Documentation:
Pointer Handling (the last paragraph)
You may use the direction attribute to trade protection for performance. Otherwise, you must use the user_check attribute described below and validate the data obtained from untrusted memory via pointers before using it, since the memory a pointer points to could change unexpectedly because it is stored in untrusted memory. However, the direction attribute does not help with structures that contain pointers. In this scenario, developers have to validate and copy the buffer contents, recursively if needed, themselves.
And,
Buffer Size Calculation
The generalized formula for calculating the buffer size using these attributes:
Total number of bytes = count * size
The above formula holds when both count and size/sizefunc are specified.
size can be specified by either size or sizefunc attribute.
If count is not specified for the pointer parameter, then it is assumed to be equal to 1, i.e., count=1. Then total number of bytes equals to size/sizefunc.
If size is not specified, then the buffer size is calculated using the above formula where size is sizeof (element pointed by the pointer).
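An alternative that avoids the second level of indirection entirely (a swapped-in technique, not what the answer above describes) is to pack the strings into one contiguous buffer, pass it as a single char buffer, and rebuild the strings inside the enclave. The sketch below is plain C++ with no SGX calls; the EDL line in the comment and the function names are hypothetical. A vector<vector<string>> could be handled the same way by adding a second separator or a per-row count.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical EDL for the packed buffer (count defaults to 1, so len bytes are copied):
//   public void ecall_receive_strings([in, size=len] const char *buf, size_t len);

// Untrusted side: pack strings into one buffer, each terminated by '\0'.
std::vector<char> flatten(const std::vector<std::string>& items) {
    std::vector<char> buf;
    for (const auto& s : items) {
        buf.insert(buf.end(), s.begin(), s.end());
        buf.push_back('\0');
    }
    return buf;
}

// Enclave side: rebuild the strings from the already-copied buffer.
// Reads never go past len; trailing bytes without a terminator are ignored.
// Assumes the strings themselves contain no embedded '\0'.
std::vector<std::string> unflatten(const char* buf, std::size_t len) {
    std::vector<std::string> out;
    std::size_t start = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == '\0') {
            out.emplace_back(buf + start, i - start);
            start = i + 1;
        }
    }
    return out;
}
```

Because the SDK copies the whole buffer into enclave memory, the enclave never dereferences untrusted pointers, which sidesteps the validation burden described in the quoted documentation.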

Sending dynamic arrays in MPI, C++

My task is to measure the communication time between two processes.
I want to send 4, 8, ..., 1000, ..., 10000 bytes of data and measure the time it takes to send the message and receive it back.
So I decided to send an array of shorts.
When I send an array initialized like this:
mpi::communicator world;
short message[100000];
....
world.send(1,0, message);
the time seems OK, and I can see a time difference between message[100000] and message[1000].
But I want to allocate the array dynamically, like this:
short *message = new short[100000];
...
world.send(1,0, *message);
It seems like the second send always sends the same amount of data, no matter what size the array is.
So my question is: how do I send a dynamically allocated array?
In the second case message is of type short * and *message dereferences to a scalar short, i.e. to the first element of the array only. Use
world.send(1, 0, message, n);
instead and vary the value of n. It should also (probably) work if you cast the pointer to a pointer to an array and then dereference it:
world.send(1, 0, *reinterpret_cast<short(*)[100]>(message));
The short(*)[100] type is a pointer to an array of 100 shorts; that length must be a compile-time constant.
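Since the task is phrased in bytes, it helps to convert each byte size to an element count before calling send. The sketch below is hypothetical (shortsForBytes is not a Boost.MPI function) and deliberately avoids MPI headers; it also shows, via sizeof, why dereferencing a short* yields only one element.

```cpp
#include <cstddef>

// Hypothetical helper: number of shorts needed to carry `bytes` bytes (rounded up).
// Boost.MPI's send(dest, tag, values, n) takes an element count n, not a byte count.
constexpr std::size_t shortsForBytes(std::size_t bytes) {
    return (bytes + sizeof(short) - 1) / sizeof(short);
}

// Why world.send(1, 0, *message) sends a single value: an array type keeps its
// length, but dereferencing a short* yields exactly one short.
short fixedMessage[1000];
short* dynamicMessage = fixedMessage;  // same type as the result of new short[...]
static_assert(sizeof(fixedMessage) == 1000 * sizeof(short), "array type includes its length");
static_assert(sizeof(*dynamicMessage) == sizeof(short), "through a pointer, only one element");
```

In the timing loop this would look like n = shortsForBytes(bytes); short* message = new short[n]; world.send(1, 0, message, static_cast<int>(n)); followed by delete[] message once the round trip is done.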

Copy unsigned char * to unsigned char*

I need to save packet state for a while.
So I read the packet data, which is represented as unsigned char*, and then I create a record with this data and save the record in a list for a while.
Which is the better way to represent the packet in the record: as char* or as char[]?
How do I copy the read data (unsigned char*) to both options:
To unsigned char[] and to unsigned char*
I need to copy the data because every packet is read into the same char* buffer, so to keep one for a while I have to copy its data first.
If the packet data is binary, I'd prefer using std::vector to store the data rather than one of the C strXXX functions, to avoid issues with NUL characters in the data stream: most strXXX functions stop at the first NUL character. Since the data is not a string, I'd also avoid std::string for this task.
std::vector<unsigned char> v( buf, buf + datalen );
The vector constructor will copy all the data from buf[0] to buf[datalen - 1] and will deallocate the memory when the vector goes out of scope. You can get a pointer to the underlying buffer using v.data() or &v[0].
So, it sounds like you need to save the data from multiple packets in a list until some point in the future.
If it was me, I'd use std::string or std::vector normally because that removes allocation issues and is generally plenty fast.
If you do intend to use char* or char[], then you'd want to use char*. Declaring a variable like "char buf[1024];" allocates it on the stack, which means that when that function returns it goes away. To save it in a list, you'd need to dynamically allocate it, so you would do something like "char *buf = new char[packet.size];" and then copy the data and store the pointer and the length of the data in your list (or, as I said before, use std::string which avoids keeping the length separately).
How do you copy the data?
Probably memcpy. The strcpy function would have problems with data which can have nul characters in it, which is common in networking situations. So, something like:
char *buf = new char[packet_length];
memcpy(buf, packet_data, packet_length);
// Put buf and packet_length into a structure in your list.
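A sketch combining both answers (the PacketRecord type and the function names are hypothetical): a std::vector copy that owns its bytes, and a new[] + memcpy copy held in std::unique_ptr<unsigned char[]> so the matching delete[] cannot be forgotten. Both copy raw bytes, so zero bytes inside the packet are preserved.

```cpp
#include <cstddef>
#include <cstring>
#include <list>
#include <memory>
#include <vector>

// Hypothetical record type: owns a private copy of one packet's bytes.
struct PacketRecord {
    std::vector<unsigned char> data;
};

// Copies `len` bytes out of the reused read buffer before it is overwritten.
// The (first, last) constructor copies raw bytes, so embedded zeros survive.
void savePacket(std::list<PacketRecord>& saved, const unsigned char* buf, std::size_t len) {
    saved.push_back(PacketRecord{std::vector<unsigned char>(buf, buf + len)});
}

// Raw-array alternative: new[] + memcpy. Holding the result in
// unique_ptr<unsigned char[]> means delete[] runs automatically.
std::unique_ptr<unsigned char[]> copyPacket(const unsigned char* buf, std::size_t len) {
    std::unique_ptr<unsigned char[]> copy(new unsigned char[len]);
    if (len != 0) std::memcpy(copy.get(), buf, len);
    return copy;
}
```

With the unique_ptr version you still need to store len alongside the pointer, which is exactly the bookkeeping the std::vector version avoids.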

C++ Pointer question

I'm new to pointers in C++. I'm not sure why I need pointers like char * something[20] as opposed to just char something[20][100]. I realize that the second method would mean that a block of 100 chars will be allocated for each element in the array, but wouldn't the first method introduce memory leak issues?
If someone could explain to me how char * something[20] allocates memory, that would be great.
Edit:
My C++ Primer Plus book is doing:
const char * cities[5] = {
"City 1",
"City 2",
"City 3",
"City 4",
"City 5"
};
Isn't this the opposite of what people just said?
You allocate 20 pointers in memory; then you need to go through each of them and allocate memory dynamically:
something[0] = new char[100];
something[1] = new char[20]; // they can differ in size
And delete them all separately:
delete [] something[0];
delete [] something[1];
EDIT:
const char* text[] = {"These", "are", "string", "literals"};
Strings specified directly in the source code ("string literals", which have type const char[N] and decay to const char *) are quite different from char *, mainly because you don't have to worry about allocating or freeing them. They usually live in static, often read-only, storage, but the details depend on your compiler's implementation.
You're right.
You'd need to go through each element of that array and allocate a character buffer for each one.
Then, later, you'd need to go through each element of that array and free the memory again.
Why you would want to faff about with this in C++ is anyone's guess.
What's wrong with std::vector<std::string> myStrings(20)?
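To make the contrast concrete, here is a hedged sketch (function names are hypothetical) that stores 20 strings of different lengths both ways. The manual version needs one new[] and one delete[] per element; the std::vector<std::string> version does the same job with no explicit memory management.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Manual version: 20 pointers, each aimed at its own new[] buffer of a different size.
// Every new[] needs a matching delete[], otherwise the memory leaks.
std::size_t manualTotalLength() {
    char* something[20];
    for (int i = 0; i < 20; ++i) {
        something[i] = new char[i + 1];  // room for i chars plus '\0'
        std::memset(something[i], 'x', i);
        something[i][i] = '\0';
    }
    std::size_t total = 0;
    for (int i = 0; i < 20; ++i) total += std::strlen(something[i]);
    for (int i = 0; i < 20; ++i) delete[] something[i];  // free each buffer separately
    return total;
}

// C++ version: each std::string owns its buffer and frees it automatically.
std::size_t vectorTotalLength() {
    std::vector<std::string> myStrings(20);
    for (std::size_t i = 0; i < myStrings.size(); ++i) myStrings[i] = std::string(i, 'x');
    std::size_t total = 0;
    for (const auto& s : myStrings) total += s.size();
    return total;
}
```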
It will allocate space for twenty char-pointers.
They will not be initialized, so typical usage looks like
char * something[20];
for (int i=0; i<20; i++)
something[i] = strdup("something of a content");
and later
for (int i=0; i<20; i++)
if (something[i])
free(something[i]);
You're right - the first method may introduce memory leak issues and the overhead of doing dynamic allocations, plus more reads. I think the second method is usually preferable, unless it wastes too much RAM or you may need the strings to grow longer than 99 chars.
How the first method works:
char* something[20]; // Stores 20 pointers.
something[0] = static_cast<char*>(malloc(100)); // Make something[0] point to a new buffer of 100 bytes (C++ needs the cast from void*).
snprintf(something[0], 100, "hai"); // Make the new buffer contain "hai", going through the pointer in something[0]
free(something[0]); // Release the buffer.
char* smth[20] does not allocate any memory on the heap. As a local variable, it allocates just enough space on the stack to store 20 pointers. The values of those pointers are indeterminate, so before using them you have to initialize them, like this:
char* smth[20];
smth[0] = new char[100]; // allocate memory for 100 chars, store the address of the first one in smth[0]
//..some code..
delete[] smth[0];
First of all, this is almost inapplicable in C++. The normal equivalent in C++ would be something like: std::vector<std::string> something;
In C, the primary difference is that you can allocate each string separately from the others. With char something[M][N], you always allocate exactly the same number of strings, and the same space for each string. This will frequently waste space (when the strings are shorter than you've made space for), and won't allow you to handle more strings, or longer strings, than you've made space for initially.
char *something[20] lets you deal with longer or shorter strings more efficiently, but still only makes space for 20 strings.
The next step (if you're feeling adventurous) is to use something like:
char **something;
and allocate the strings individually, and allocate space for the pointers dynamically as well, so if you get more than 20 strings you can deal with that as well.
I'll repeat, however, that for most practical purposes, this is restricted to C. In C++, the standard library already has data structures for situations like these.
C++ has pointers because C has pointers.
Why do we use pointers?
To track dynamically-allocated memory. The memory allocation functions in C (malloc, calloc, realloc) and the new operator in C++ all return pointer values.
To mimic pass-by-reference semantics (C only). In C, all function arguments are passed by value; the formal parameter and the actual parameter are distinct objects, and modifying a formal parameter doesn't affect the actual parameter. We get around this by passing pointers to the function. C++ introduced reference types, which serve the same purpose, but are a bit cleaner and safer than using pointers.
To build dynamic, self-referential data structures. A struct cannot contain an instance of itself, but it can contain a pointer to an instance. For example, the following code
typedef int data_t; /* placeholder element type so the example compiles */
struct node
{
data_t data;
struct node *next;
};
creates a data type for a simple linked-list node; the next member explicitly points to the next element in the list. Note that in C++, the STL containers for stacks and queues and vectors all use pointers under the hood, isolating you from the bookkeeping.
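A short sketch of the node type in use, assuming int as the element type (sumList is a hypothetical helper). Traversal just follows each next pointer until it reaches the null pointer that marks the end of the list.

```cpp
#include <cstddef>

typedef int data_t;  // assumed element type for this sketch

struct node {
    data_t data;
    struct node* next;  // next node in the list, or nullptr at the end
};

// Walks the list by following next pointers until nullptr.
data_t sumList(const node* head) {
    data_t total = 0;
    for (const node* p = head; p != nullptr; p = p->next) total += p->data;
    return total;
}
```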
There are literally dozens of other places where pointers come up, but those are the main reasons you use them.
Your array of pointers could be used to store strings of varying length by allocating just enough memory for each, rather than relying on some maximum size (which will eventually be exceeded, leading to a buffer overflow error, and in any case will lead to internal memory fragmentation). Naturally, in C++ you'd use the string data type (which hides all the pointer and memory management behind the class API) instead of pointers to char, but someone has decided to confuse you by starting with low-level details instead of the big picture.
I'm not sure why I need pointers like char * something[20] as opposed to just char something[20][100]. I realize that the second method would mean that a block of 100 chars will be allocated for each element in the array, but wouldn't the first method introduce memory leak issues?
The second method will suffice if you're only referencing your buffer(s) locally.
The problem comes when you pass the array name to another function. When you pass char something[10] to another function, you're actually passing char* something because the array length doesn't go along for the ride.
For multidimensional arrays, you can declare a function that takes an array whose length is fixed in every dimension but the first, e.g. foo(char something[][100]).
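A sketch of the two parameter forms (totalLength is a hypothetical name): a 2-D array parameter must spell out every dimension except the first so the compiler can locate each row, whereas an array of pointers is passed as char** and each row can have its own length. In both cases the number of rows has to be passed separately.

```cpp
#include <cstddef>
#include <cstring>

// 2-D array: char rows[][100] is really char (*rows)[100]; the 100 is required.
std::size_t totalLength(char rows[][100], std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) total += std::strlen(rows[i]);
    return total;
}

// Array of pointers: char* rows[] is really char** rows; rows may differ in size.
std::size_t totalLength(char* rows[], std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) total += std::strlen(rows[i]);
    return total;
}
```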
So why use the first form rather than the second? I can think of a few reasons:
You don't want the restriction that the entire buffer must reside in contiguous memory.
You don't know at compile-time that you'll need each buffer, or that the length of each buffer will need to be the same size, and you want the flexibility to determine that at run-time.
This is an array declaration:
char * something[20]
Assuming a 32-bit platform, as a local variable this takes 80 bytes on the stack:
4 bytes for each pointer, 20 pointers total = 4 x 20 = 80 bytes.
The pointers are all uninitialized, so you need to write additional code to allocate and free the buffers they point to.
It roughly looks like:
[0] [4 Bytes of Uninitialized data to hold a pointer/memory address...]
[1] [4 Bytes of ... ]
...
[19]
char something[20][100]
Allocates 2000 bytes on the stack.
100 Bytes for each something, 20 somethings total = 100 x 20 = 2000 bytes.
[0] [100 bytes to hold characters]
[1] [100 bytes to hold characters]
...
[19]
The char* approach has a smaller memory overhead, but you have to manage the memory.
The char[][] approach has a bigger memory overhead, but needs no additional memory management.
With either approach, you have to be careful when writing to the buffer allocated not to exceed/overwrite the memory alloc'd for it.
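The size arithmetic above can be confirmed with sizeof. In this sketch the arrays are declared at namespace scope only so static_assert can inspect them (namespace-scope arrays are also zero-initialized, unlike locals); a local array of the same type has the same size on the stack. pointerTableBytes is a hypothetical helper.

```cpp
#include <cstddef>

char* pointerTable[20];  // room for 20 pointers, no characters yet
char charGrid[20][100];  // 20 rows x 100 chars stored inline

static_assert(sizeof(pointerTable) == 20 * sizeof(char*), "only the pointers");
static_assert(sizeof(charGrid) == 20 * 100, "all 2000 chars");

// 20 * 4 = 80 bytes on a 32-bit platform; 20 * 8 = 160 on a typical 64-bit one.
constexpr std::size_t pointerTableBytes() { return sizeof(pointerTable); }
```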