How to cast a string to uint32 with LuaJIT FFI

Consider that str is a binary string which contains an unsigned 32-bit integer at position 13.
I tried this:
local value = ffi.cast("uint32_t", ffi.new("char[4]", str:sub(13,16)))
However, the value returned is a "cdata" of type unsigned int, and I don't know how to get the actual value (the integer).

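Cast to a pointer instead, then index it with [0]: indexing reads the uint32_t and converts it to a plain Lua number.
local bytes = str:sub(13,16) -- keep a reference so the string stays alive while the pointer is used
local value = ffi.cast("const uint32_t*", bytes)[0] -- read in native byte order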

In general I agree with Egor Skriptunoff's answer. For a more general approach (and maybe somewhat overkill for this particular case), one could use a union type:
local ffi = require 'ffi'
local union_type = ffi.typeof [[
union {
char bytes[4];
uint32_t integer;
}
]]
local union = union_type { bytes = 'abcd' }
print(string.format('0x%x', union.integer))
Note that you need to worry about endianness here; you can check your system's endianness with ffi.abi('le') or ffi.abi('be'). If you're getting your string from somewhere else (for example, over the network), its endianness is most likely documented somewhere.
Suppose you want to interpret the string from the above example (abcd) as big-endian; then you could do this:
local union do
if ffi.abi('le') then
union = union_type { bytes = ('abcd'):reverse() }
else
union = union_type { bytes = 'abcd' }
end
end
If the system is little endian, reverse the string. Otherwise leave it as is.
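For comparison, here is a sketch of the same big-endian interpretation in C++, the language used in the rest of this page. It builds the value with shifts, which spell out the byte order explicitly, so the result is the same on little- and big-endian hosts and no ffi.abi-style check is needed. Lua's 1-based position 13 corresponds to offset 12 here; read_be32 is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Interpret 4 bytes of a binary string as a big-endian uint32.
// The shifts define the byte order, so the host's endianness does not matter.
// Hypothetical helper; the caller must ensure offset + 4 <= bytes.size().
std::uint32_t read_be32(const std::string& bytes, std::size_t offset) {
    auto b = [&](std::size_t i) {
        // go through unsigned char so bytes >= 0x80 are not sign-extended
        return static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[offset + i]));
    };
    return (b(0) << 24) | (b(1) << 16) | (b(2) << 8) | b(3);
}
```

With this definition read_be32("abcd", 0) yields 0x61626364, the same value the union example produces after reversing the string on a little-endian machine.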

Related

memcpy with initialized variable and negative numbers with cast

I have
QByteArray bytes; // filled earlier
char id_c = bytes[7];
int _id;
_id = 0; // If I comment this result would be different
memcpy(&_id, &id_c, 1);
int result = _id;
I have an _id variable, and if I comment out "_id = 0", result is different: it becomes a negative number. Why? Why does initializing _id with 0 make a difference?
How can I get the same result as with "_id = 0", but without memcpy and unwanted casts?
This is not my code. I am interested in how to get the same result correctly, without stupid casts.
Correct.
Because this statement:
memcpy(&_id, &id_c, 1);
Is only copying a single byte from &id_c into an address representing a 4-byte integer, &_id. Only the first byte of memory occupied by _id gets anything copied into it. Without the zero init of _id first, the remaining three bytes of that value are left undefined (presumably random garbage values off the stack).
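What's wrong with an "unwanted cast"? A cast is perfectly fine here, and the compiler generates efficient code for it. To get the same result as zero-initializing _id and copying one byte into it (on a little-endian machine), convert through unsigned char so the byte is zero-extended:
QByteArray bytes; // filled earlier
int _id = (unsigned char)(bytes[7]); // 0..255, same as "_id = 0" followed by the memcpy
int result = _id;
If you want the sign-extended value of the byte instead (-128..127), then use this:
int _id = (signed char)(bytes[7]);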
_id = 0 assigns the value 0 to the variable _id. If you comment that out, we cannot be sure what is stored in _id, and you are updating only one byte of it; since it is an int, it is more than one byte in size.
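A small sketch of the difference between the three approaches, assuming 8-bit bytes and a byte value of 0xF0 (as bytes[7] might hold). Converting through unsigned char reproduces the zero-init + memcpy result on a little-endian machine, while signed char gives the sign-extended value. The helper names are hypothetical.

```cpp
#include <cstring>

// Zero-extend: same value as "int x = 0; memcpy(&x, &c, 1);" on a little-endian machine.
int zero_extended(char c) {
    return static_cast<unsigned char>(c);   // 0 .. 255
}

// Sign-extend: the byte is treated as a signed 8-bit value.
int sign_extended(char c) {
    return static_cast<signed char>(c);     // -128 .. 127
}

// The original memcpy approach, with the zero initialization that makes the result defined.
int via_memcpy(char c) {
    int x = 0;                 // without this, the other bytes of x are indeterminate
    std::memcpy(&x, &c, 1);    // copies into the first byte of x's storage only
    return x;                  // equals zero_extended(c) only on little-endian machines
}
```

On a big-endian machine the copied byte lands in the most significant position instead, which is one more reason to prefer the explicit conversion.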
You might try the net/host byte order conversion functions, which are available on both Linux and Windows; the only difference is the header file to use. You can use preprocessor checks to detect the platform and choose the proper header if cross-platform code is intended. A better approach is to use the C++20 feature std::endian, but then you need to handle the conversion yourself:
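#include <bit>
#include <climits>
#include <cstdint>
// Convert a 32-bit value between big-endian (network) order and native order.
std::uint32_t int_cvt(std::uint32_t x){
if constexpr (std::endian::native==std::endian::big)
return x; // already big-endian: nothing to do
std::uint32_t y=0;
for (unsigned i=0; i<sizeof x; ++i){ // every byte, including zero bytes
y=(y<<CHAR_BIT) | static_cast<unsigned char>(x); // append the lowest byte of x
x>>=CHAR_BIT;
}
return y;
}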

c++ convert memory into data structure

When debugging an application, I found a structure in memory that I am 100% certain consists of only 4 strings. However, I am not quite sure how to convert it to a data structure so that I can use the structure's pointer address to access the values. For example, here is what the data struct looks like in memory (as an example, let's say it is CONSISTENTLY located at the memory address 0x123456).
The data structure consists of 4 separate strings:
string 1 = ad
string 2 = dgdhkkkkkkhkk
string 3 = ggghhjk
string 4 = dgcfoh
And I have tried creating a data struct like
struct reversedConnectionDat_t
{
char * data1;
char * data2;
char * data3;
char * data4;
};
and this is how I tried accessing the data
reversedConnectionDat_t * storeDat = (reversedConnectionDat_t*)0x123456;
print(storeDat->data3);
But it does not seem to work. Am I not reading the strings from memory properly?
(Oh, and the strings will sometimes differ from the example above; for example, string 1 might be 7 characters long and string 3 only 2.)
You have a pointer to a structure of pointers so even if you point the structure to the correct memory address, you still have uninitialized pointers inside the structure. You need to provide them with actual memory. I would try setting up your structure like this ...
struct reversedConnectionDat_t
{
char data1 [3];
char data2 [50];
char data3 [50];
char data4 [50];
};
BTW, I didn't count the spaces. I just kind of guessed at it but you get the idea.
I think you've mis-identified that data structure. I suspect that what you have is three independent buffers, each of which can hold one or more null-terminated strings.
The first buffer is 68 bytes long and contains "ad\0dgdhkkkkkkhkk\0" (followed by enough \0 characters to fill the buffer).
It's possible that this buffer is really only 64 bytes long, and that the four bytes after it are used for some other data element.
The second buffer looks to be 64 bytes long, containing a single string and padded with \0 characters to fill out the 64 bytes.
It's impossible to say how long the third buffer is. All we know is that it's long enough to hold the string "dgcfoh\0". I'd guess that the buffer is 64 bytes long, but be willing to revise that opinion if I get more data.
I think the structure you want is:
struct s
{
char data1[68]; // buffer holds one or more null-terminated strings
char data2[64];
char data3[64];
};
Based on the scant information you've given us, that's what I'd start with. Then you need a way to parse a buffer of null-terminated strings. That is, get the two individual strings from the first buffer. That's a pretty easy bit of C code.
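That parsing step might look like the following sketch. It assumes fixed-size buffers like data1[68] above, where strings are stored back to back and the unused tail is filled with '\0', so the first empty string is taken as the end of the data. split_strings is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a fixed-size buffer of consecutive NUL-terminated strings.
// Assumes the rest of the buffer is '\0' padding, so an empty string marks the end.
std::vector<std::string> split_strings(const char* buf, std::size_t size) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < size && buf[pos] != '\0') {
        // bounded scan: never reads past the end of the buffer
        std::size_t len = 0;
        while (pos + len < size && buf[pos + len] != '\0')
            ++len;
        out.emplace_back(buf + pos, len);
        pos += len + 1;   // skip the string and its terminator
    }
    return out;
}
```

For the first buffer in the question, this would return the two strings "ad" and "dgdhkkkkkkhkk".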
I couldn't see anything wrong with your code except the magic number 0x123456: casting it might not suit your structure. Are you sure the memory at that address actually matches the struct you defined? If you try to access storeDat->data3, it will most likely segfault, unless you do something like the following or you are very lucky.
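#include <cstdint>
#include <cstdlib>
#include <iostream>
struct R{
const char *a;
const char *b;
};
int main(void)
{
struct R *r1 = (struct R*) std::malloc(sizeof(struct R));
r1->a = "12333"; //Pointing to a string literal
r1->b = "12331"; //Pointing to a string literal
std::uintptr_t address = (std::uintptr_t)r1; // the struct's address as an integer (an int may be too small for a pointer)
struct R *r2 = (struct R*) address; // same object, reached through the "magic number"
std::cout<<r2->b;
std::free(r1);
return 0;
}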
P.S. - I'm not a good programmer. But was just curious to answer, as I thought it might be of some help. Sorry, if I couldn't understand your problem properly.
You are using pointers (char*), so the size of your structure is the size of 4 pointers. If you want to get the strings, you should use arrays (char[]) with a fixed size.
This will only work if your string sizes are equal to the buffer sizes.
IMO the best way is to get the data in a char array, then find the null terminators ('\0') and set your pointers to point to the start of each string (at the start, and right after each of the first 3 null terminators).
char* pointerToMem = something; //your strings data
yourStruct.str1 = pointerToMem;
while(*pointerToMem != '\0')
{
pointerToMem++;
}
yourStruct.str2 = pointerToMem + 1;
This is how you can make the struct of pointers work. This code is not optimal and you should not use it as is, but it shows how you can get the strings from memory. To have a C string, you only need the address of the first character and a null terminator at the end.
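Extending that loop to all four strings could look like this sketch. It assumes exactly four NUL-terminated strings stored back to back starting at mem; the struct and function names are hypothetical, and nothing here checks that the memory really has that layout.

```cpp
#include <cstddef>

// Hypothetical struct of pointers into an existing block of memory.
struct FourStrings {
    const char* str[4];
};

// Point each entry at the start of the next NUL-terminated string.
// Assumes four strings stored back to back; no bounds checking is done.
FourStrings locate_strings(const char* mem) {
    FourStrings s{};
    for (std::size_t i = 0; i < 4; ++i) {
        s.str[i] = mem;        // start of string i
        while (*mem != '\0')   // walk to its terminator
            ++mem;
        ++mem;                 // step past the '\0' to the next string
    }
    return s;
}
```

No characters are copied: the pointers refer into the original memory, so they are only valid while that memory is.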

wchar_t* to short int conversion

One of the functions in a 3rd-party class returns a wchar_t* that holds a resource ID (I don't know why it uses the wchar_t* type). I need to convert this pointer to a short int.
This method, using the AND operator, works for me, but it doesn't seem like the correct way. Is there a proper way to do this?
wchar_t* s;
short int b = (unsigned long)(s) & 0xFFFF;
wchar_t* s; // I assume this is what you meant
short int b = static_cast<short int>(reinterpret_cast<intptr_t>(s)); // intptr_t comes from <cstdint>
You could also replace short int b with auto b, and it will be deduced as short int from the type of the right-hand expression.
It returns the resource ID as a wchar_t* because that is the data type that Windows uses to carry resource identifiers. Resources can be identified by either numeric ID or by name. If numeric, the pointer itself contains the actual ID number encoded in its lower 16 bits. Otherwise it is a normal pointer to a null-terminated string elsewhere in memory. There is an IS_INTRESOURCE() macro to differentiate which is the actual case, eg:
wchar_t *s = ...;
if (IS_INTRESOURCE(s))
{
// s is a numeric ID...
WORD b = (WORD)(ULONG_PTR) s; // the low 16 bits hold the ID
...
}
else
{
// s is a null-terminated name string
...
}
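For illustration only, a portable sketch that mimics what IS_INTRESOURCE checks (that every bit above the low 16 is zero) using std::uintptr_t, plus a MAKEINTRESOURCE-style helper to build such a value. These helpers are hypothetical stand-ins, not the Windows headers; in real Windows code, use IS_INTRESOURCE and MAKEINTRESOURCE directly.

```cpp
#include <cstdint>

// Mimics IS_INTRESOURCE: an integer ID when everything above the low 16 bits is zero.
bool is_int_resource(const wchar_t* p) {
    return (reinterpret_cast<std::uintptr_t>(p) >> 16) == 0;
}

// Extract the 16-bit ID; only meaningful when is_int_resource(p) is true.
std::uint16_t resource_id(const wchar_t* p) {
    return static_cast<std::uint16_t>(reinterpret_cast<std::uintptr_t>(p) & 0xFFFF);
}

// Mimics MAKEINTRESOURCE: carry a 16-bit ID in a pointer-typed value.
const wchar_t* make_int_resource(std::uint16_t id) {
    return reinterpret_cast<const wchar_t*>(static_cast<std::uintptr_t>(id));
}
```

Going through std::uintptr_t first avoids casting a pointer directly to an integer type that may be too small to hold it.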
Did you mean in your code wchar_t *s;?
I'd make the conversion more explicit. Note that reinterpret_cast cannot convert a pointer directly to an integer type that is too small to hold it, so go through std::uintptr_t (from <cstdint>) first:
short int b = static_cast<short int>(reinterpret_cast<std::uintptr_t>(s));
If it fits your application's needs, I suggest using a data type with a fixed number of bits, e.g. uint16_t. Using short int means you only know for sure that your variable has at least 16 bits. An additional question: why don't you use unsigned short int instead of (signed) short int?
In general, knowing the exact number of bits makes things a little more predictable and makes it easier to know exactly what happens when you cast or use bitmasks.

Python's struct.pack/unpack equivalence in C++

I used struct.pack in Python to transform a data into serialized byte stream.
>>> import struct
>>> struct.pack('i', 1234)
'\xd2\x04\x00\x00'
What is the equivalence in C++?
You'll probably be better off in the long run using a third party library (e.g. Google Protocol Buffers), but if you insist on rolling your own, the C++ version of your example might be something like this:
#include <arpa/inet.h> // htonl()/ntohl() on POSIX systems; on Windows they come from <winsock2.h>
#include <stdint.h>
#include <string.h>
int32_t myValueToPack = 1234; // or whatever
uint8_t myByteArray[sizeof(myValueToPack)];
int32_t bigEndianValue = htonl(myValueToPack); // convert the value to big-endian for cross-platform compatibility
memcpy(&myByteArray[0], &bigEndianValue, sizeof(bigEndianValue));
// At this point, myByteArray contains the "packed" data in network-endian (aka big-endian) format
The corresponding 'unpack' code would look like this:
// Assume at this point we have the packed array myByteArray, from before
int32_t bigEndianValue;
memcpy(&bigEndianValue, &myByteArray[0], sizeof(bigEndianValue));
int32_t theUnpackedValue = ntohl(bigEndianValue);
In real life you'd probably be packing more than one value, which is easy enough to do (by making the array size larger and calling htonl() and memcpy() in a loop -- don't forget to increase memcpy()'s first argument as you go, so that your second value doesn't overwrite the first value's location in the array, and so on).
You'd also probably want to pack (aka serialize) different data types as well. uint8_t's (aka chars) and booleans are simple enough, as no endian handling is necessary for them: you can just copy each of them into the array verbatim as a single byte. uint16_t's you can convert to big-endian via htons(), and convert back to native-endian via ntohs(). Floating-point values are a bit tricky, since there is no built-in htonf(), but you can roll your own that will work on IEEE754-compliant machines:
uint32_t htonf(float f)
{
uint32_t x;
memcpy(&x, &f, sizeof(float));
return htonl(x);
}
.... and the corresponding ntohf() to unpack them:
float ntohf(uint32_t nf)
{
float x;
nf = ntohl(nf);
memcpy(&x, &nf, sizeof(float));
return x;
}
Lastly for strings you can just add the bytes of the string to the buffer (including the NUL terminator) via memcpy:
const char * s = "hello";
size_t slen = strlen(s);
memcpy(myByteArray, s, slen+1); // +1 for the NUL byte; myByteArray must have room for slen+1 bytes
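If you'd rather not depend on htonl()/ntohl() and their platform headers, the byte order can also be spelled out with shifts. This sketch packs a 32-bit integer little-endian, which matches the struct.pack('i', 1234) output in the question as produced on an x86 machine (Python's 'i' uses native byte order; '<i' forces little-endian). pack_i32_le and unpack_i32_le are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append v to out as 4 little-endian bytes (like Python's struct.pack('<i', v)).
void pack_i32_le(std::string& out, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);   // well defined for negative values too
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((u >> (8 * i)) & 0xFF));
}

// Read 4 little-endian bytes starting at in[pos] (like struct.unpack_from('<i', in, pos)).
std::int32_t unpack_i32_le(const std::string& in, std::size_t pos) {
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    return static_cast<std::int32_t>(u);   // modular conversion, guaranteed since C++20
}
```

Appending to a std::string sidesteps the fixed-size-buffer bookkeeping; for big-endian output ('>i'), emit the shifts from 24 down to 0 instead.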
There isn't one. C++ doesn't have built-in serialization.
You would have to write individual objects to a byte array/vector, being careful about endianness (if you want your code to be portable).
https://github.com/karkason/cppystruct
#include "cppystruct.h"
// icmp_header can be any type that supports std::size and std::data and holds bytes
auto [type, code, checksum, p_id, sequence] = pystruct::unpack(PY_STRING("bbHHh"), icmp_header);
int leet = 1337;
auto runtimePacked = pystruct::pack(PY_STRING(">2i10s"), leet, 20, "String!");
// runtimePacked is an std::array filled with "\x00\x00\x059\x00\x00\x00\x14String!\x00\x00\x00"
// The format is "compiled" and has zero overhead in runtime
constexpr auto packed = pystruct::pack(PY_STRING("<2i10s"), 10, 20, "String!");
// packed is an std::array filled with "\x0a\x00\x00\x00\x14\x00\x00\x00String!\x00\x00\x00"
You could check out Boost.Serialization, but I doubt you can get it to use the same format as Python's pack.
I was also looking for the same thing. Luckily I found https://github.com/mpapierski/struct
With a few additions you can add missing types to struct.hpp; I think it's the best option so far.
To use it, just define your params like this:
DEFINE_STRUCT(test,
((2, TYPE_UNSIGNED_INT))
((20, TYPE_CHAR))
((20, TYPE_CHAR))
)
Then just call this function, which will be generated at compile time:
pack(unsigned int p1, unsigned int p2, const char * p3, const char * p4)
The number and type of parameters will depend on what you defined above.
The return type is a char* which contains your packed data.
There is also another unpack() function which you can use to read the buffer
You can use union to get different view into the same memory.
For example:
union Pack{
int i;
char c[sizeof(int)];
};
Pack p = {};
p.i = 1234;
std::string packed(p.c, sizeof(int)); // "\xd2\x04\x00\0"
As mentioned in the other answers, you have to pay attention to endianness.
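Reading a union member other than the one last written is allowed in C but is technically undefined behaviour in C++. A sketch of the same "view the bytes of an int" idea using std::memcpy, which is well defined in both languages; the bytes still come out in the machine's native order. to_bytes and from_bytes are hypothetical names.

```cpp
#include <cstring>
#include <string>

// Copy the object representation of an int into a std::string (native byte order).
std::string to_bytes(int i) {
    char c[sizeof(int)];
    std::memcpy(c, &i, sizeof(int));
    return std::string(c, sizeof(int));
}

// Reverse operation: rebuild the int from its bytes.
int from_bytes(const std::string& s) {
    int i = 0;
    std::memcpy(&i, s.data(), sizeof(int));   // assumes s.size() >= sizeof(int)
    return i;
}
```

On a little-endian machine with a 4-byte int, to_bytes(1234) gives the same "\xd2\x04\x00\x00" as the union version.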

How to read in specific sizes and store data of an unknown type in c++?

I'm trying to read data in from a binary file and then store in a data structure for later use. The issue is I don't want to have to identify exactly what type it is when I'm just reading it in and storing it. I just want to store the information regarding what type of data it is and how much data of this certain type there is (information easily obtained in the first couple bytes of this data)
But how can I read in just a certain amount of data, disregarding what type it is and still easily be able to cast (or something similar) that data into a readable form later?
My first idea would be to use characters, since all the data I will be looking at will be in byte units.
But if I did something like this:
ifstream fileStream;
fileStream.open("fileName.tiff", ios::binary);
//if I had to read in 4 bytes of data
char memory[4];
fileStream.read((char *)&memory, 4);
But how could I cast these 4 bytes if I later wanted to read them and knew the data was a double?
What's the best way to read in data of an unknown type but known size for later use?
I think a reinterpret_cast will give you what you need. If you have a char * to the bytes you can do the following:
double * x = reinterpret_cast<double *>(dataPtr);
Check out Type Casting on cplusplus.com for a more detailed description of reinterpret_cast.
You could copy it to the known data structure which makes life easier later on:
double x;
memcpy (&x,memory,sizeof(double));
or you could just refer to it as a cast value:
if (*((double*)(memory)) == 4.0) {
// blah blah blah
}
I believe a char* is the best way to read it in, since the size of a char is guaranteed to be 1 (one byte, though a byte is not necessarily 8 bits). All other data types are measured in that unit, so if sizeof(double) == 27, you know that it will fit into a char[27]. So, if you have a known size, that's the easiest way to do it.
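Putting the char-buffer approach together: a sketch that reads n raw bytes from a binary stream and only later turns them into a T with std::memcpy, which avoids the alignment and aliasing problems of casting the buffer pointer. It assumes the file uses the machine's native byte order and floating-point format; read_raw and as_type are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>   // std::istringstream, handy for testing without a file
#include <stdexcept>
#include <type_traits>
#include <vector>

// Read exactly n bytes without caring what type they represent.
std::vector<char> read_raw(std::istream& in, std::size_t n) {
    std::vector<char> buf(n);
    if (!in.read(buf.data(), static_cast<std::streamsize>(n)))
        throw std::runtime_error("short read");
    return buf;
}

// Later, once the type is known, reinterpret the bytes by copying them.
template <typename T>
T as_type(const std::vector<char>& buf) {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable type");
    if (buf.size() < sizeof(T))
        throw std::runtime_error("buffer too small");
    T value;
    std::memcpy(&value, buf.data(), sizeof(T));
    return value;
}

// Usage with a std::ifstream opened with ios::binary (or a std::istringstream):
//   std::vector<char> raw = read_raw(fileStream, sizeof(double));
//   double d = as_type<double>(raw);
```

Storing the raw bytes in a std::vector<char> keeps ownership simple, and the size check turns a wrong-size read into an exception instead of reading past the buffer.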
You could store the data in a class that provides functions to cast it to the possible result types, like this:
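#include <cassert>
#include <cstddef>
#include <cstring>
enum data_type {
TYPE_DOUBLE,
TYPE_INT
};
class data {
public:
data_type type;
size_t len;
char *buffer;
data(data_type a_type, const char *a_buffer, size_t a_len)
: type(a_type), len(a_len), buffer(new char[a_len]) { // same order as the declarations
memcpy(buffer, a_buffer, a_len);
}
~data() {
delete[] buffer;
}
data(const data&) = delete; // owns buffer, so forbid shallow copies (double delete[])
data& operator=(const data&) = delete;
double as_double() const {
assert(TYPE_DOUBLE == type);
assert(len >= sizeof(double));
double d;
memcpy(&d, buffer, sizeof d); // copy instead of casting: no alignment/aliasing issues
return d;
}
int as_int() const {
assert(TYPE_INT == type);
assert(len >= sizeof(int));
int i;
memcpy(&i, buffer, sizeof i);
return i;
}
};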
Later you would do something like this:
data d = ...;
switch (d.type) {
case TYPE_DOUBLE:
something(d.as_double());
break;
case TYPE_INT:
something_else(d.as_int());
break;
...
}
That's at least how I'm doing this kind of thing :)
You can use structures and anonymous unions:
struct Variant
{
size_t size;
enum
{
TYPE_DOUBLE,
TYPE_INT,
} type;
union
{
char raw[0]; // Copy to here. *
double asDouble;
int asInt;
};
};
Optional: Create a table of type => size, so you can find the size given the type at runtime. This is only needed when reading.
static unsigned char typeSizes[2] =
{
sizeof(double),
sizeof(int),
};
Usage:
Variant v;
v.type = Variant::TYPE_DOUBLE;
v.size = typeSizes[v.type];
fileStream.read(v.raw, v.size);
printf("%f\n", v.asDouble);
You will probably receive warnings about type punning. Read: Doing this is not portable and against the standard! Then again, so is reinterpret_cast, C-style casting, etc.
Note: First edit, I did not read your original question. I only had the union, not the size or type part.
*This is a neat trick I learned a long time ago. Basically, raw doesn't take up any bytes (thus doesn't increase the size of the union), but provides a pointer to a position in the union (in this case, the beginning). Note that zero-length arrays are a compiler extension (accepted by GCC and Clang, for example), not standard C or C++. It's very useful when describing file structures:
struct Bitmap
{
// Header stuff.
uint32_t dataSize;
RGBPixel data[0];
};
Then you can just fread the data into a Bitmap. =]
Be careful. In most environments I'm aware of, doubles are 8 bytes, not 4; reinterpret_casting memory to a double will result in junk, based on whatever the four bytes following memory contain. If you want a 32-bit floating-point value, you probably want a float (though I should note that the C++ standard does not require float and double to be represented in any particular way, and in particular they need not be IEEE-754 compliant).
Also, your code will not be portable unless you take endianness into account in your code. I see that the TIFF format has an endianness marker in its first two bytes that should tell you whether you're reading in big-endian or little-endian values.
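A sketch of honouring that marker: a TIFF file starts with "II" (little-endian) or "MM" (big-endian), followed by the 16-bit value 42 in that byte order, and every later multi-byte field is decoded with the same order. The struct and function names are hypothetical; only the header layout comes from the TIFF format.

```cpp
#include <cstdint>
#include <stdexcept>

// Byte order declared by a TIFF header.
struct TiffOrder {
    bool little_endian;
};

// Inspect the first 4 bytes: "II" or "MM", then 42 in that byte order.
TiffOrder parse_tiff_order(const unsigned char* h) {
    TiffOrder o{};
    if (h[0] == 'I' && h[1] == 'I') o.little_endian = true;
    else if (h[0] == 'M' && h[1] == 'M') o.little_endian = false;
    else throw std::runtime_error("not a TIFF header");
    unsigned magic = o.little_endian ? (h[2] | (h[3] << 8)) : ((h[2] << 8) | h[3]);
    if (magic != 42) throw std::runtime_error("bad TIFF magic number");
    return o;
}

// Decode a 32-bit field in the file's byte order, independent of the host's order.
std::uint32_t read_u32(const unsigned char* p, TiffOrder o) {
    if (o.little_endian)
        return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}
```

Because the shifts name the byte order explicitly, the same code reads both kinds of file correctly on any host.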
So I would write a function with the following prototype:
template<typename VALUE_TYPE> VALUE_TYPE convert(char* input);
If you want full portability, specialize the template and have it actually interpret the bits in input. Otherwise, you can probably get away with e.g.
template<typename VALUE_TYPE> VALUE_TYPE convert(char* input) {
return *reinterpret_cast<VALUE_TYPE*>(input); // assumes native byte order and suitable alignment
}