How to track memory allocation via DPDK API rte_malloc? - dpdk

According to the DPDK API documentation, rte_malloc_dump_stats dumps statistics for the specified type to a file. But neither my custom application nor app/test/test_malloc.c produces that result.
Expectation: memory allocated with the DPDK APIs rte_malloc, rte_calloc, and rte_zmalloc is tagged with a const char *type, and the stats for a specific const char *type can be queried with rte_malloc_dump_stats.
Current result: DPDK API rte_malloc_dump_stats gives overall heap usage, without any information about which "type" uses how much.
Question: Is there any other API available in DPDK to track huge page malloc usage? A pointer to any DPDK patch that adds this would also be useful.
Sample code flow:
/* with huge page */
rte_eal_init
/* create memory location for various object type */
rte_malloc ("objecttype-1")
rte_malloc ("objecttype-2")
rte_malloc ("objecttype-3")
rte_malloc ("objecttype-4")
/* dump stats for object type - 1 */
rte_malloc_dump_stats ("objecttype-1")
My application calls rte_calloc, which calls a few other internal subroutines and finally calls heap_alloc (in which "type" seems to be unused).
heap_alloc(struct malloc_heap *heap, const char *type __rte_unused, size_t size, unsigned int flags, size_t align, size_t bound, bool contig). heap_alloc currently seems to ignore the "type" argument. How can I track the heap memory usage?

The observation is correct: currently the DPDK APIs rte_malloc, rte_zmalloc, and rte_calloc discard the name passed as the first argument, const char *type. Hence a dump or query with rte_malloc_dump_stats does not return results for a specific const char *type.
There are a couple of workarounds for this problem (without modifying the DPDK library), such as
rte_memzone_reserve creates a named area (the name can be the desired type) as a replacement.
rte_mempool_create_empty to create a named empty mempool.
Both approaches have their disadvantages:
memzone: normally targeted at larger areas and called once, hence frequent calls to rte_memzone_reserve for small objects will have a performance impact. One also has to use rte_memzone_walk to iterate to the desired const char *type to get the details.
empty_mempool: expects all objects put into it to be of the same size, hence it is useful in places like creating a pool of reusable counters or lists. On the other hand, rte_mempool_lookup gives easy access to the pool's details, such as the number of elements, element size, and cached elements per core.
There is a much easier alternative: create a container that holds the malloc|calloc|zmalloc name, pointer, and size using rte_fbarray. On every successful allocation, fetch a free element from the fbarray and fill in the details. The major advantage of this approach is that the rte_fbarray can be backed by huge-page memory via rte_memzone_reserve, which allows monitoring per const char *type with almost no modification to the application.
Code Snippet:
const struct rte_memzone *mptr = NULL;
struct elements {
char name[256];
void *ptr;
};
/* right after rte_eal_init */
mptr = rte_memzone_reserve_aligned("MALLOC_REGIONS",
sizeof(struct rte_fbarray), SOCKET_ID_ANY,
RTE_MEMZONE_2MB, 8);
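/* create a wrapper function or MACRO to record the details of each alloc with its type field */
do
{
ptr = rte_malloc(NULL, size_dataobject, 0);
if (ptr == NULL)
return NULL;
/* find a free slot in the fbarray that lives in the reserved memzone */
int index = rte_fbarray_find_next_free((struct rte_fbarray *)mptr->addr, 0);
if (index < 0) {
printf("fail in rte_fbarray_find_next_free %d\n", index);
rte_free(ptr);
return NULL;
}
/* fill the struct elements slot at index with the type name and ptr, then mark it used */
return ptr;
} while(0);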
/* in the data processing thread, invoke the MACRO/function*/
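The bookkeeping idea (remember name, pointer and size for every successful allocation) can be tried out without DPDK. The following is only a minimal sketch under stated assumptions: TypedAllocTracker is a hypothetical class, std::malloc/std::free stand in for rte_malloc/rte_free, and a std::unordered_map stands in for the rte_fbarray container. It is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical tracker: remembers which "type" label owns each live pointer.
// std::malloc/std::free are stand-ins for rte_malloc/rte_free so that the
// sketch builds without DPDK.
class TypedAllocTracker {
public:
    // Allocates size bytes and records them under the given type label.
    void *alloc(const std::string &type, std::size_t size) {
        void *ptr = std::malloc(size);
        if (ptr == nullptr)
            return nullptr;
        live_[ptr] = Record{type, size};
        bytes_by_type_[type] += size;
        return ptr;
    }

    // Frees a tracked pointer and updates the per-type total.
    void release(void *ptr) {
        auto it = live_.find(ptr);
        if (it == live_.end())
            return;  // pointer was not allocated through this tracker
        assert(bytes_by_type_[it->second.type] >= it->second.size);
        bytes_by_type_[it->second.type] -= it->second.size;
        live_.erase(it);
        std::free(ptr);
    }

    // Bytes currently allocated under the given type label.
    std::size_t bytes_in_use(const std::string &type) const {
        auto it = bytes_by_type_.find(type);
        return it == bytes_by_type_.end() ? 0 : it->second;
    }

private:
    struct Record {
        std::string type;
        std::size_t size;
    };
    std::unordered_map<void *, Record> live_;
    std::unordered_map<std::string, std::size_t> bytes_by_type_;
};
```

With this in place, a call such as bytes_in_use("objecttype-1") answers the per-type question that rte_malloc_dump_stats currently cannot.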
references:
DPDK debug guide
malloc monitor

Related

How to allocate memory in a DriverKit system extension and map it to another process?

I have allocated memory in my application and passed its pointer and size to IOConnectCallStructMethod. Using IOMemoryDescriptor::CreateMapping I have then mapped this memory to the DriverKit system extension process, and it is possible to write to this mapped memory location and read the data from my application.
I would now like to do something similar for memory that is allocated in the system extension, and then map it to the application that is using the system extension. I would like to create a set of memory buffers in the system extension, and then write to it from the application and then signal to the system extension with IOConnectCallScalarMethod that a given buffer should be sent to the USB device, using IOUSBHostPipe::AsyncIO. When the CompleteAsyncIO callback then comes as a result of the sending completing, I would notify back to the application that it is now possible to copy data to the first buffer that was sent. The mechanism for this could probably be done using IOConnectCallAsyncStructMethod, and the OSAction object that is created in the system extension. What I don't understand is how to map memory allocated in the system extension to the application.
This is what IOUserClient::CopyClientMemoryForType in DriverKit is for, which gets invoked when your user process calls IOConnectMapMemory64 from IOKit.framework. The kext equivalent, incidentally, is IOUserClient::clientMemoryForType and essentially works exactly the same.
To make it work, you need to override the CopyClientMemoryForType virtual function in your user client subclass.
In the class definition in .iig:
virtual kern_return_t CopyClientMemoryForType(
uint64_t type, uint64_t *options, IOMemoryDescriptor **memory) override;
In the implementation .cpp, something along these lines:
kern_return_t IMPL(MyUserClient, CopyClientMemoryForType) //(uint64_t type, uint64_t *options, IOMemoryDescriptor **memory)
{
kern_return_t res;
if (type == 0)
{
IOBufferMemoryDescriptor* buffer = nullptr;
res = IOBufferMemoryDescriptor::Create(kIOMemoryDirectionInOut, 128 /* capacity */, 8 /* alignment */, &buffer);
if (res != kIOReturnSuccess)
{
os_log(OS_LOG_DEFAULT, "MyUserClient::CopyClientMemoryForType(): IOBufferMemoryDescriptor::Create failed: 0x%x", res);
}
else
{
*memory = buffer; // returned with refcount 1
}
}
else
{
res = this->CopyClientMemoryForType(type, options, memory, SUPERDISPATCH);
}
return res;
}
In user space, you would call:
mach_vm_address_t address = 0;
mach_vm_size_t size = 0;
IOReturn res = IOConnectMapMemory64(connection, 0 /*memoryType*/, mach_task_self(), &address, &size, kIOMapAnywhere);
Some notes on this:
The value in the type parameter comes from the memoryType parameter to the IOConnectMapMemory64 call that caused this function to be called. Your driver therefore can have some kind of numbering convention; in the simplest case you can treat it similarly to the selector in external methods.
memory is effectively an output parameter and this is where you're expected to return the memory descriptor you want to map into user space when your function returns kIOReturnSuccess. The function has copy semantics, i.e. the caller expects to take ownership of the memory descriptor, i.e. it will eventually drop the reference count by 1 when it is no longer needed. The returned memory descriptor need not be an IOBufferMemoryDescriptor as I've used in the example, it can also be a PCI BAR or whatever.
The kIOMapAnywhere option in the IOConnectMapMemory64 call is important and normally what you want: if you don't specify this, the atAddress parameter becomes an in-out parameter, and the caller is expected to select a location in the address space where the driver memory should be mapped. Normally you don't care where this is, and indeed specifying an explicit location can be dangerous if there's already something mapped there.
If user space must not write to the mapped memory, set the options parameter to CopyClientMemoryForType accordingly: *options = kIOUserClientMemoryReadOnly;
To destroy the mapping, the user space process must call IOConnectUnmapMemory64().

BluetoothGATTSetCharacteristicValue returns E_INVALIDARG or ERROR_INVALID_FUNCTION

I have built a set of C++ classes on top of the BluetoothAPIs APIs.
I can enumerate open handles to services, characteristics and descriptors. I can read characteristic values. The issue that I have is that I cannot write to a characteristic value.
Below is the code used to write the characteristic value:
void BleGattCharacteristic::setValue(UCHAR * data, ULONG size){
if (pGattCharacteristic->IsSignedWritable || pGattCharacteristic->IsWritable || pGattCharacteristic->IsWritableWithoutResponse)
{
size_t required_size = sizeof(BTH_LE_GATT_CHARACTERISTIC_VALUE) + size;
PBTH_LE_GATT_CHARACTERISTIC_VALUE gatt_value = (PBTH_LE_GATT_CHARACTERISTIC_VALUE)malloc(required_size);
ZeroMemory(gatt_value, required_size);
gatt_value->DataSize = (ULONG)size;
memcpy(gatt_value->Data, data, size);
HRESULT hr = BluetoothGATTSetCharacteristicValue(bleDeviceContext.getBleServiceHandle(), pGattCharacteristic, gatt_value, NULL, BLUETOOTH_GATT_FLAG_NONE);
free(gatt_value);
if (HRESULT_FROM_WIN32(S_OK) != hr)
{
stringstream msg;
msg << "Unable to write the characteristic value. Reason: ["
<< Util.getLastError(hr) << "]";
throw BleException(msg.str());
}
}
else
{
throw BleException("characteristic is not writable");
}}
The call to bleDeviceContext.getBleServiceHandle() returns the open handle to the device info service.
pGattCharacteristic is the pointer to the characteristic to write to. It was obtained with a call to BluetoothGATTGetCharacteristics.
I have tried different combinations of the flags with no difference in the return code.
I have also tried using the handle to the device not to the service. In that case I get an ERROR_INVALID_FUNCTION return error code.
I would appreciate any pointers as to what I am doing wrong or what other possible options I could try.
1- You have to use the Service Handle, right.
2- I don't know how you designed your class, and then how you allocate some memory for the Characteristic's Value itself.
What I do (to be sure to have enough and proper memory for Value's data):
a) When initializing the Value object, call ::BluetoothGATTGetCharacteristicValue twice: once to get the needed size, and then again after allocating internal memory of that size.
b) When using it, set the internal memory to the desired value, then call ::BluetoothGATTSetCharacteristicValue
hr=::BluetoothGATTSetCharacteristicValue(
handle,
(PBTH_LE_GATT_CHARACTERISTIC)Characteristic,
value,//actually a (PBTH_LE_GATT_CHARACTERISTIC_VALUE) to allocated memory
0,//BTH_LE_GATT_RELIABLE_WRITE_CONTEXT ReliableWriteContext,
BLUETOOTH_GATT_FLAG_NONE)
So a few things:
typedef struct _BTH_LE_GATT_CHARACTERISTIC_VALUE {
ULONG DataSize;
UCHAR Data[];
} BTH_LE_GATT_CHARACTERISTIC_VALUE, *PBTH_LE_GATT_CHARACTERISTIC_VALUE;
is how the data structure used in the parameter CharacteristicValue is defined. Please note that Data is a trailing array member, not a pointer: the value bytes are stored inline, directly after DataSize. Accessing Data[0] is therefore only valid if the structure was allocated with room for the payload (at least sizeof(BTH_LE_GATT_CHARACTERISTIC_VALUE) + DataSize bytes); otherwise it is undefined behavior and could be accessing anywhere in memory. Make sure the buffer is allocated that way, that DataSize matches the number of bytes copied into Data, and that the payload is copied in with memcpy rather than by assigning a pointer.
Secondly, the documentation is quite clear as to why you might get ERROR_INVALID_FUNCTION: if another reliable write is already pending, then this write will fail. You should consider retry logic in that case.
As for E_INVALIDARG, I'd assume it's related to the undefined behavior, but I'd check after fixing the other issues previously mentioned.
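Because Data is declared as a trailing array in the typedef above, header and payload are normally allocated as one block. Here is a minimal sketch of that pattern, assuming a hypothetical stand-in struct ValueBlob with the same shape (it is not the Windows type) and hypothetical helpers payload and make_value:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in with the same shape as the structure above:
// a length field followed by inline payload bytes.
struct ValueBlob {
    std::uint32_t DataSize;
    unsigned char Data[1];  // first byte of the inline payload
};

// Address of the inline payload inside an allocated blob.
unsigned char *payload(ValueBlob *v) {
    return reinterpret_cast<unsigned char *>(v) + offsetof(ValueBlob, Data);
}

// Allocates header and payload as one zeroed block and copies the payload in.
// The caller releases the result with std::free().
ValueBlob *make_value(const unsigned char *data, std::uint32_t size) {
    assert(data != nullptr || size == 0);
    const std::size_t total = sizeof(ValueBlob) + size;
    auto *v = static_cast<ValueBlob *>(std::calloc(1, total));
    if (v == nullptr)
        return nullptr;
    v->DataSize = size;
    if (size > 0)
        std::memcpy(payload(v), data, size);
    return v;
}
```

This mirrors the malloc(sizeof(...) + size) plus memcpy sequence from the question's setValue function.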

How does one properly deserialize a byte array back into an object in C++?

My team has been having this issue for a few weeks now, and we're a bit stumped. Kindness and knowledge would be gracefully received!
Working with an embedded system, we are attempting to serialize an object, send it through a Linux socket, receive it in another process, and deserialize it back into the original object. We have the following deserialization function:
/*! Takes a byte array and populates the object's data members */
std::shared_ptr<Foo> Foo::unmarshal(uint8_t *serialized, uint32_t size)
{
auto msg = reinterpret_cast<Foo *>(serialized);
return std::shared_ptr<ChildOfFoo>(
reinterpret_cast<ChildOfFoo *>(serialized));
}
The object is successfully deserialized and can be read from. However, when the destructor for the returned std::shared_ptr<Foo> is called, the program segfaults. Valgrind gives the following output:
==1664== Process terminating with default action of signal 11 (SIGSEGV)
==1664== Bad permissions for mapped region at address 0xFFFF603800003C88
==1664== at 0xFFFF603800003C88: ???
==1664== by 0x42C7C3: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:149)
==1664== by 0x42BC00: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:666)
==1664== by 0x435999: std::__shared_ptr<ChildOfFoo, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:914)
==1664== by 0x4359B3: std::shared_ptr<ChildOfFoo>::~shared_ptr() (shared_ptr.h:93)
We're open to any suggestions at all! Thank you for your time :)
In general, this won't work:
auto msg = reinterpret_cast<Foo *>(serialized);
You can't just take an arbitrary array of bytes and pretend it's a valid C++ object (even if reinterpret_cast<> allows you to compile code that attempts to do so). For one thing, any C++ object that contains at least one virtual method will contain a vtable pointer, which points to the virtual-methods table for that object's class, and is used whenever a virtual method is called. But if you serialize that pointer on computer A, then send it across the network and deserialize and then try to use the reconstituted object on computer B, you'll invoke undefined behavior because there is no guarantee that that class's vtable will exist at the same memory location on computer B that it did on computer A. Also, any class that does any kind of dynamic memory allocation (e.g. any string class or container class) will contain pointers to other objects that it allocated, and that will lead you to the same invalid-pointer problem.
But let's say you've limited your serializations to only POD (Plain Old Data) objects that contain no pointers. Will it work then? The answer is: possibly, in very specific cases, but it will be very fragile. The reason for that is that the compiler is free to lay out the class's member variables in memory in different ways, and it will insert padding differently on different hardware (or even with different optimization settings, sometimes), leading to a situation where the bytes that represent a particular Foo object on computer A are different from the bytes that would represent that same object on computer B. On top of that you may have to worry about different word-lengths on different computers (e.g. long is 32-bit on some architectures and 64-bit on others), and different endian-ness (e.g. Intel CPUs represent values in little-endian form while PowerPC CPUs typically represent them in big-endian). Any one of these differences will cause your receiving computer to misinterpret the bytes it received and thereby corrupt your data badly.
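To make the padding and byte-order points concrete, here is a small sketch. The names Sample, kMemberBytes, put_be32 and get_be32 are made up for illustration. The struct shows that sizeof can exceed the sum of the members, and the two helpers show a byte layout that does not depend on the host's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The members add up to 5 bytes, but most ABIs pad the struct so that b is
// aligned, which makes sizeof(Sample) larger than 5.
struct Sample {
    std::uint8_t a;
    std::uint32_t b;
};

// Size of the members alone, ignoring any padding the compiler adds.
constexpr std::size_t kMemberBytes = sizeof(std::uint8_t) + sizeof(std::uint32_t);
static_assert(sizeof(Sample) >= kMemberBytes, "padding can only add bytes");

// Writes v into out[0..3], most significant byte first, whatever the host order.
inline void put_be32(std::uint32_t v, std::uint8_t *out) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Reads four bytes written by put_be32 back into a host-order value.
inline std::uint32_t get_be32(const std::uint8_t *in) {
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

Sending the four bytes produced by put_be32 gives the receiver the same value on little-endian and big-endian machines, which a raw copy of the struct does not guarantee.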
So the remaining part of the question is, what is the proper way to serialize/deserialize a C++ object? And the answer is: you have to do it the hard way, by writing a routine for each class that does the serialization member-variable by member-variable, taking the class's particular semantics into account. For example, here are some methods that you might have your serializable classes define:
// Serialize this object's state out into (buffer)
// (buffer) must point to at least FlattenedSize() bytes of writeable space
void Flatten(uint8_t *buffer) const;
// Return the number of bytes this object will require to serialize
size_t FlattenedSize() const;
// Set this object's state from the bytes in (buffer)
// Returns true on success, or false on failure
bool Unflatten(const uint8_t *buffer, size_t size);
... and here's an example of a simple x/y point class that implements the methods:
class Point
{
public:
Point() : m_x(0), m_y(0) {/* empty */}
Point(int32_t x, int32_t y) : m_x(x), m_y(y) {/* empty */}
void Flatten(uint8_t *buffer) const
{
const int32_t beX = htonl(m_x);
memcpy(buffer, &beX, sizeof(beX));
buffer += sizeof(beX);
const int32_t beY = htonl(m_y);
memcpy(buffer, &beY, sizeof(beY));
}
size_t FlattenedSize() const {return sizeof(m_x) + sizeof(m_y);}
bool Unflatten(const uint8_t *buffer, size_t size)
{
if (size < FlattenedSize()) return false;
int32_t beX;
memcpy(&beX, buffer, sizeof(beX));
m_x = ntohl(beX);
buffer += sizeof(beX);
int32_t beY;
memcpy(&beY, buffer, sizeof(beY));
m_y = ntohl(beY);
return true;
}
int32_t m_x;
int32_t m_y;
};
... then your unmarshal function could look like this (note I've made it templated so that it will work for any class that implements the above methods):
/*! Takes a byte array and populates the object's data members */
template<class T> std::shared_ptr<T> unmarshal(const uint8_t *serialized, size_t size)
{
auto sp = std::make_shared<T>();
if (sp->Unflatten(serialized, size) == true) return sp;
// Oops, Unflatten() failed! handle the error somehow here
[...]
}
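For reference, a complete round trip through this interface might look like the sketch below. It makes a few assumptions that are not in the answer: the byte order is handled with explicit shifts instead of htonl/ntohl so that no socket headers are needed, and a failed Unflatten() is reported by returning a null pointer, which is only one possible error policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Illustrative message type following the Flatten/Unflatten convention.
class Point {
public:
    Point() = default;
    Point(std::int32_t x, std::int32_t y) : m_x(x), m_y(y) {}

    std::size_t FlattenedSize() const { return 8; }

    // Writes x then y, each as 4 bytes, most significant byte first.
    void Flatten(std::uint8_t *buffer) const {
        assert(buffer != nullptr);
        PutBE32(buffer, static_cast<std::uint32_t>(m_x));
        PutBE32(buffer + 4, static_cast<std::uint32_t>(m_y));
    }

    bool Unflatten(const std::uint8_t *buffer, std::size_t size) {
        if (buffer == nullptr || size < FlattenedSize()) return false;
        m_x = static_cast<std::int32_t>(GetBE32(buffer));
        m_y = static_cast<std::int32_t>(GetBE32(buffer + 4));
        return true;
    }

    std::int32_t m_x = 0;
    std::int32_t m_y = 0;

private:
    static void PutBE32(std::uint8_t *out, std::uint32_t v) {
        out[0] = static_cast<std::uint8_t>(v >> 24);
        out[1] = static_cast<std::uint8_t>(v >> 16);
        out[2] = static_cast<std::uint8_t>(v >> 8);
        out[3] = static_cast<std::uint8_t>(v);
    }
    static std::uint32_t GetBE32(const std::uint8_t *in) {
        return (static_cast<std::uint32_t>(in[0]) << 24) |
               (static_cast<std::uint32_t>(in[1]) << 16) |
               (static_cast<std::uint32_t>(in[2]) << 8) |
               static_cast<std::uint32_t>(in[3]);
    }
};

// Builds a fresh T and fills it from the wire bytes; null on failure
// (an assumed policy for this sketch).
template <class T>
std::shared_ptr<T> unmarshal(const std::uint8_t *serialized, std::size_t size) {
    auto sp = std::make_shared<T>();
    if (sp->Unflatten(serialized, size)) return sp;
    return nullptr;
}
```

A sender would size a buffer with FlattenedSize(), call Flatten() on it, and write those bytes to the socket. The receiver passes the received bytes to unmarshal<Point>() and checks the result for null.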
If this seems like a lot of work compared to just grabbing the raw memory bytes of your class object and sending them verbatim across the wire, you're right -- it is. But this is what you have to do if you want the serialization to work reliably and not break every time you upgrade your compiler, or change your optimization flags, or want to communicate between computers with different CPU architectures. If you'd rather not do this sort of thing by hand, there are pre-packaged libraries to assist with (partially) automating the process, such as Google's Protocol Buffers library, or even good old XML.
The segfault during destruction occurs because the shared_ptr is constructed from a pointer that was merely reinterpret_cast from the uint8_t buffer. When the returned shared_ptr is destroyed, it tries to delete that buffer as if it were a ChildOfFoo allocated with new, and hence the segfault occurs.
Update your unmarshal as given below and try it.
std::shared_ptr<Foo> Foo::unmarshal(uint8_t *&serialized, uint32_t size)
{
ChildOfFoo* ptrChildOfFoo = new ChildOfFoo();
memcpy(ptrChildOfFoo, serialized, size);
return std::shared_ptr<ChildOfFoo>(ptrChildOfFoo);
}
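Here the ownership of the ChildOfFoo object created by the statement ChildOfFoo* ptrChildOfFoo = new ChildOfFoo(); is transferred to the shared_ptr object returned by the unmarshal function. So when the returned shared_ptr object's destructor is called, the object will be properly deallocated and no segfault occurs. Note that the memcpy is only well-defined if ChildOfFoo is trivially copyable (no virtual functions, no members that own memory) and size does not exceed sizeof(ChildOfFoo).
Hope this helps!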
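If the memcpy approach is used at all, the compiler can enforce the conditions under which a byte-wise copy is defined. This is a sketch with hypothetical names (unmarshal_pod, Reading). It still assumes sender and receiver share the same ABI and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical helper: copies raw bytes into a freshly allocated T, but only
// compiles for types where a byte-wise copy is well-defined.
template <class T>
std::shared_ptr<T> unmarshal_pod(const std::uint8_t *serialized, std::size_t size) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy into T is only defined for trivially copyable types");
    if (serialized == nullptr || size != sizeof(T))
        return nullptr;  // refuse missing, short, or oversized input
    auto sp = std::make_shared<T>();
    std::memcpy(sp.get(), serialized, sizeof(T));
    return sp;
}

// Example payload with no pointers and no virtual functions.
struct Reading {
    std::uint32_t sensor_id;
    std::int32_t value;
};
```

Instantiating unmarshal_pod with a class that has virtual functions or owns a std::string fails at compile time instead of crashing at run time.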

Proper memory control in gSoap

I'm currently developing an application using the gSOAP library and have some misunderstanding about its proper usage. I have generated a proxy object (-j flag), which I wrapped in my own classes, as you can see below. The application must work 24/7 and connect simultaneously to many cameras (~50 cameras), so after every request I need to clear all temporary data. Is it normal usage to call soap_destroy() and soap_end() after every request? It seems like overkill to do it after each request. Is there perhaps another option for proper usage?
DeviceBindingProxy::destroy()
{
soap_destroy(this->soap);
soap_end(this->soap);
}
class OnvifDeviceService : public Domain::IDeviceService
{
public:
OnvifDeviceService()
: m_deviceProxy(new DeviceBindingProxy)
{
soap_register_plugin(m_deviceProxy->soap, soap_wsse);
}
int OnvifDeviceService::getDeviceInformation(const Access::Domain::Endpoint &endpoint, Domain::DeviceInformation *information)
{
_tds__GetDeviceInformation tds__GetDeviceInformation;
_tds__GetDeviceInformationResponse tds__GetDeviceInformationResponse;
setupUserPasswordToProxy(endpoint);
m_deviceProxy->soap_endpoint = endpoint.endpoint().c_str();
int result = m_deviceProxy->GetDeviceInformation(&tds__GetDeviceInformation, tds__GetDeviceInformationResponse);
m_deviceProxy->soap_endpoint = NULL;
if (result != SOAP_OK) {
Common::Infrastructure::printSoapError("Fail to get device information.", m_deviceProxy->soap);
m_deviceProxy->destroy();
return -1;
}
*information = Domain::DeviceInformation(tds__GetDeviceInformationResponse.Manufacturer,
tds__GetDeviceInformationResponse.Model,
tds__GetDeviceInformationResponse.FirmwareVersion);
m_deviceProxy->destroy();
return 0;
}
}
To ensure proper allocation and deallocation of managed data:
soap_destroy(soap);
soap_end(soap);
You want to do this often to keep memory from filling up with old data. These calls remove all deserialized data and data you allocated with the soap_new_X() and soap_malloc() functions.
All managed allocations are deleted with soap_destroy() followed by soap_end(). After that, you can start allocating again and delete again, etc.
To allocate managed data:
SomeClass *obj = soap_new_SomeClass(soap);
You can use soap_malloc for raw managed allocation, or to allocate an array of pointers, or a C string:
char *s = (char *)soap_malloc(soap, 100);
Remember that malloc-style allocation is not safe for C++ objects, because no constructors are run. It is better to allocate a std::string with:
std::string *s = soap_new_std__string(soap);
Arrays can be allocated with the second parameter, e.g. an array of 10 strings:
std::string *s = soap_new_std__string(soap, 10);
If you want to preserve data that otherwise gets deleted with these calls, use:
soap_unlink(soap, obj);
Now obj can be removed later with delete obj. But be aware that all pointer members in obj that point to managed data have become invalid after soap_destroy() and soap_end(). So you may have to invoke soap_unlink() on these members or risk dangling pointers.
A new cool feature of gSOAP is to generate deep copy and delete function for any data structures automatically, which saves a HUGE amount of coding time:
SomeClass *otherobj = soap_dup_SomeClass(NULL, obj);
This duplicates obj to unmanaged heap space. This is a deep copy that checks for cycles in the object graph and removes such cycles to avoid deletion issues. You can also duplicate the whole (cyclic) managed object to another context by using soap instead of NULL for the first argument of soap_dup_SomeClass.
To deep delete:
soap_del_SomeClass(obj);
This deletes obj but also the data pointed to by its members, and so on.
To use the soap_dup_X and soap_del_X functions use soapcpp2 with options -Ec and -Ed, respectively.
In principle, static and stack-allocated data can be serialized just as well. But consider using the managed heap instead.
See https://www.genivia.com/doc/databinding/html/index.html#memory2 for more details and examples.
Hope this helps.
The way memory has to be handled is described in Section 9.3 of the gSOAP documentation.
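Since the cleanup is meant to run after every request, one way to avoid repeating the destroy() call on each return path is a small scope guard. This is only a sketch with hypothetical names (ScopeCleanup, handle_request), and a counter stands in for the real cleanup. In the question's code the lambda would call m_deviceProxy->destroy().

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the stored cleanup once when it goes out
// of scope, on every return path.
class ScopeCleanup {
public:
    explicit ScopeCleanup(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeCleanup() {
        if (fn_) fn_();
    }
    ScopeCleanup(const ScopeCleanup &) = delete;
    ScopeCleanup &operator=(const ScopeCleanup &) = delete;

private:
    std::function<void()> fn_;
};

// Stand-in for a request handler; cleanups counts how often the cleanup ran.
int handle_request(bool fail, int &cleanups) {
    ScopeCleanup guard([&cleanups] { ++cleanups; });
    if (fail)
        return -1;  // the cleanup still runs on this path
    return 0;       // and on this one
}
```

Any result that has to outlive the request must be copied out of the managed data before the guard fires, which the original getDeviceInformation already does by copying into *information first.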

Lua RPC and userdata

I'm currently using luarpc in my program for interprocess communication. The problem is that my tolua++ binding stores class instances as userdata, so I'm unable to use any of those functions because luarpc can't handle userdata. My question is whether it would be possible (and how) to transmit userdata if you know that it is always only a pointer (4 bytes) and has a metatable attached for call and indexing operations.
You can't.
It doesn't matter if the userdata is a pointer or an object. The reason you can't arbitrarily RPC through them is because the data is not stored in Lua. And therefore LuaRPC cannot transmit it properly.
A pointer into your address space is absolutely worthless for some other process; even moreso if it's running on another machine. You have to actually transmit the data itself to make the RPC work. LuaRPC can do this transmission, but only for data that it can understand. And the only data it understands is data stored in Lua.
OK, I got it working now. For userdata arguments/returns, I send the actual pointer plus the metatable name (typename) to the client. The client then attaches a metatable with an __index method that creates a new helper with the typename and appends a helper with the field you want to access. When you then call or read a field of that userdata, the client sends the data for calling a field of the type table, along with the userdata, to the server.
ReadVariable:
lua_pushlightuserdata(L,msg.read<void*>());
#ifndef RPC_SERVER
luaL_getmetatable(L,"rpc.userdata");
int len = msg.read<int>();
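char* s = new char[len + 1]; // +1 for the terminating '\0'
msg.read((uint8*)s,len);
s[len] = '\0';
lua_pushlstring(L,s,len);
delete[] s; // lua_pushlstring copies the bytes, so the buffer can be freed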
lua_setfield(L,-2,"__name");
lua_pushlightuserdata(L,TlsGetValue(transporttls));
lua_setfield(L,-2,"__transport");
lua_setmetatable(L,-2);
#endif
Write Variable:
else
{
msg.append<RPCType>(RPC_USERDATA);
msg.append<void*>(lua_touserdata(L,idx));
#ifdef RPC_SERVER
lua_getmetatable(L,idx);
lua_rawget(L,LUA_REGISTRYINDEX);
const char* s = lua_tostring(L,-1);
int len = lua_strlen(L,-1);
msg.append<int>(len);
msg.append(s,len);
#endif
lua_settop(L,stack_at_start);
}
userdata indexing:
checkNumArgs(L,2);
ASSERT(lua_isuserdata(L,1) && isMetatableType(L,1,"rpc.userdata"));
if(lua_type(L,2) != LUA_TSTRING)
return luaL_error( L, "can't index a handle with a non-string" );
const char* s = lua_tostring(L,2);
if(strlen(s) > MAX_PATH - 1)
return luaL_error(L,"string too long");
int stack = lua_gettop(L);
lua_getmetatable(L,1);
lua_getfield(L,-1,"__name");
const char* name = lua_tostring(L,-1);
if(strlen(name) > MAX_PATH - 1)
return luaL_error(L,"string too long");
lua_pop(L,1); // remove name
lua_getfield(L,-1,"__transport");
Transport* t = reinterpret_cast<Transport*>(lua_touserdata(L,-1));
lua_pop(L,1);
Helper* h = Helper::create(L,t,name);
Helper::append(L,h,s);
return 1;
Well, I more or less rewrote the complete RPC library to work with named pipes and Windows, but I think the code should give anyone enough information to implement it.
This allows code like:
local remote = rpc.remoteobj:getinstance()
remote:dosmthn()
on the client side. It currently doesn't allow adding new fields, but this is all I need for now :D
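One variation on this scheme, which is not what the code above does, is to hand the client an opaque integer handle instead of the raw pointer and have the server validate every handle it gets back. A minimal sketch, assuming a hypothetical HandleRegistry class on the server side:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical server-side registry: clients receive opaque integer handles
// instead of raw pointers, and every incoming handle is checked before use.
class HandleRegistry {
public:
    // Registers an object under its type name and returns a new handle.
    std::uint32_t add(void *object, const std::string &type_name) {
        assert(object != nullptr);
        const std::uint32_t h = next_++;
        entries_[h] = Entry{object, type_name};
        return h;
    }

    // Returns the object only if the handle is known and the type matches.
    void *resolve(std::uint32_t handle, const std::string &type_name) const {
        auto it = entries_.find(handle);
        if (it == entries_.end() || it->second.type_name != type_name)
            return nullptr;
        return it->second.object;
    }

    // Forgets a handle, for example when the object is destroyed.
    void remove(std::uint32_t handle) { entries_.erase(handle); }

private:
    struct Entry {
        void *object;
        std::string type_name;
    };
    std::unordered_map<std::uint32_t, Entry> entries_;
    std::uint32_t next_ = 1;  // 0 is reserved as "no handle"
};
```

The trade-off is one extra lookup per call. In exchange, a stale or corrupted value sent by a client is rejected rather than dereferenced in the server process.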