I have a bitfield struct called DescriptorByte aligned to 1 byte and a struct for holding lots of DescriptorByte like this:
struct DescriptorByte
{
unsigned char IsImmedCalc : 1;
unsigned char IsPrefix : 1;
unsigned char NoMemOp : 1;
unsigned char Size : 5;
};
struct OpcodeList
{
DescriptorByte ADD_EB_GB;
DescriptorByte ADD_EV_GV;
DescriptorByte ADD_GB_EB;
DescriptorByte ADD_GV_EV;
DescriptorByte ADD_8_OI = { TRUE, FALSE, TRUE, OPBASE + IMMED_8 };
DescriptorByte ADD_32_OI = { TRUE, FALSE, TRUE, OPBASE + IMMED_32 };
DescriptorByte PUSH_ES = { TRUE, FALSE, TRUE, OPBASE };
DescriptorByte POP_ES = { TRUE, FALSE, TRUE, OPBASE };
DescriptorByte OR_EB_GB;
//ETC
};
What I want to do is index into the struct by a number (a byte), like this:
OpcodeList opcodelist;
BYTE count = 5;
DescriptorByte = opcodelist + count;
Since the bitfield struct is aligned to 1 byte, I should get the element at offset 5 in the OpcodeList table, but I don't know how to do this in C++. I only know how to do it in ASM :/
LEA EAX, OPCODELIST
MOV ECX, COUNT
MOV EAX, DWORD PTR [EAX+ECX];
AND EAX, 0FF;
Thanks.
You are trying to access this "list" multiple ways: By name (like ADD_EB_GB), and by index. You can't do this with a struct.
I would suggest using a std::vector<DescriptorByte> (or std::array). This way you can access by index. And if you still need to access by name, make your names constant indices into this vector.
A bit like this:
typedef std::array<DescriptorByte, 9> OpcodeList;
enum OpcodeIndex
{
ADD_EB_GB = 0,
ADD_EV_GV,
ADD_GB_EB,
ADD_GV_EV,
ADD_8_OI,
ADD_32_OI,
PUSH_ES,
POP_ES,
OR_EB_GB
};
...
// instantiate & initialize an opcode list.
OpcodeList opcodeList =
{
{ ... },// ADD_EB_GB,
{ ... },// ADD_EV_GV,
{ ... },// ADD_GB_EB,
{ ... },// ADD_GV_EV,
{ TRUE, FALSE, TRUE, OPBASE + IMMED_8 },// ADD_8_OI,
{ TRUE, FALSE, TRUE, OPBASE + IMMED_32 },// ADD_32_OI,
{ TRUE, FALSE, TRUE, OPBASE },// PUSH_ES,
{ TRUE, FALSE, TRUE, OPBASE },// POP_ES,
{ ... }// OR_EB_GB
};
// to access by symbol:
DescriptorByte someOpcode = opcodeList[ADD_GV_EV];
// to access by symbol + offset:
DescriptorByte anotherOpcode = opcodeList[ADD_GV_EV + 5];
Note about performance. At the machine code level, a struct and a statically sized array will perform exactly the same. They are ptr+offset fields. This is exactly what std::array will compile into. std::vector will have more overhead because it supports varying sizes, so only use it if you will have OpcodeList objects that have different sizes.
Regarding initialization, it's true this is more verbose / uglier than what you have, but it's a worthy effort to keep things manageable.
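For reference, here is a minimal compilable sketch of the same idea, trimmed to four entries. The OPCODE_COUNT sentinel, the makeDescriptor and makeOpcodeList helpers, and the values given to OPBASE and IMMED_8 are assumptions made for illustration; they are not from the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One descriptor packed into bitfields (exact layout is implementation-defined).
struct DescriptorByte
{
    unsigned char IsImmedCalc : 1;
    unsigned char IsPrefix : 1;
    unsigned char NoMemOp : 1;
    unsigned char Size : 5;
};

// Symbolic names double as indices; OPCODE_COUNT is a hypothetical sentinel
// that always equals the number of entries.
enum OpcodeIndex
{
    ADD_EB_GB = 0,
    ADD_EV_GV,
    ADD_8_OI,
    PUSH_ES,
    OPCODE_COUNT
};

// Placeholder values, chosen only for this example.
const unsigned OPBASE = 1;
const unsigned IMMED_8 = 1;

typedef std::array<DescriptorByte, OPCODE_COUNT> OpcodeList;

// Hypothetical helper: fills the four fields explicitly.
inline DescriptorByte makeDescriptor(bool immed, bool prefix, bool noMem, unsigned size)
{
    DescriptorByte d = {};
    d.IsImmedCalc = immed;
    d.IsPrefix = prefix;
    d.NoMemOp = noMem;
    d.Size = size & 0x1F; // keep only the 5 bits the field can hold
    return d;
}

// Builds the table; entries that are not assigned stay zero-initialized.
inline OpcodeList makeOpcodeList()
{
    OpcodeList list = {};
    list[ADD_8_OI] = makeDescriptor(true, false, true, OPBASE + IMMED_8);
    list[PUSH_ES] = makeDescriptor(true, false, true, OPBASE);
    return list;
}
```

With this, makeOpcodeList()[count] works for any runtime count below OPCODE_COUNT, and makeOpcodeList()[ADD_8_OI] still reads like a named member.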
You can also do this using your struct, provided the compiler lays the members out contiguously with no padding between them:
OpcodeList opcodelist;
BYTE count = 5;
void* p1 = &opcodelist;
DescriptorByte* p2 = (DescriptorByte*)p1;
DescriptorByte d = *(p2 + count);
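If you go the struct route, it may be worth letting the compiler check the layout assumption. The sketch below is one way to do that, not the only one: kOpcodeCount and descriptorAt are hypothetical names, the struct is trimmed to six members, and the lookup copies the bytes with std::memcpy instead of doing pointer arithmetic across members that are not part of an array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct DescriptorByte
{
    unsigned char IsImmedCalc : 1;
    unsigned char IsPrefix : 1;
    unsigned char NoMemOp : 1;
    unsigned char Size : 5;
};

// Trimmed version of the question's table.
struct OpcodeList
{
    DescriptorByte ADD_EB_GB;
    DescriptorByte ADD_EV_GV;
    DescriptorByte ADD_GB_EB;
    DescriptorByte ADD_GV_EV;
    DescriptorByte ADD_8_OI;
    DescriptorByte ADD_32_OI;
};

// Hypothetical constant: number of members declared above.
const std::size_t kOpcodeCount = 6;

// Fail the build if the compiler inserted padding anywhere.
static_assert(sizeof(DescriptorByte) == 1, "DescriptorByte must be 1 byte");
static_assert(sizeof(OpcodeList) == kOpcodeCount * sizeof(DescriptorByte),
              "OpcodeList must be tightly packed");

// Returns a copy of the descriptor stored at the given member position.
inline DescriptorByte descriptorAt(const OpcodeList& list, std::size_t index)
{
    assert(index < kOpcodeCount);
    DescriptorByte d;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&list);
    std::memcpy(&d, bytes + index * sizeof(DescriptorByte), sizeof(d));
    return d;
}
```

The static_asserts turn a silent layout mismatch into a compile error, which is the main risk of indexing a struct as if it were an array.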
I've looked through the Apache Arrow docs but I can't find a clean way of converting equal length std::vectors into an arrow::Array and then an arrow::Table. Here's the code in question.
#include <vector>
#include <arrow/array.h>
#include <arrow/table.h>
const std::vector<double> a = {1,2,3,4,5};
const std::vector<bool> b = {true, false, false, true, true};
auto schema = arrow::schema({arrow::field("a", arrow::float64()), arrow::field("b", arrow::boolean())});
std::shared_ptr<arrow::Array> array_a(N, arrow::float64());
std::shared_ptr<arrow::Array> array_b(N, arrow::boolean());
// how to store the contents of the vectors a and b into array_a and array_b, resp.
// ...?
std::shared_ptr<arrow::Table> table = arrow::Table::Make(schema, {array_a, array_b});
As mentioned in the documentation, you can use a builder for this:
const std::vector<double> a = {1,2,3,4,5};
const std::vector<bool> b = {true, false, false, true, true};
auto schema = arrow::schema({arrow::field("a", arrow::float64()), arrow::field("b", arrow::boolean())});
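// The ARROW_* macros return early on error, so this code must live in a function returning arrow::Status or arrow::Result.
arrow::DoubleBuilder aBuilder;
ARROW_RETURN_NOT_OK(aBuilder.AppendValues(a));
arrow::BooleanBuilder bBuilder;
ARROW_RETURN_NOT_OK(bBuilder.AppendValues(b));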
std::shared_ptr<arrow::Array> array_a, array_b;
ARROW_ASSIGN_OR_RAISE(array_a, aBuilder.Finish());
ARROW_ASSIGN_OR_RAISE(array_b, bBuilder.Finish());
std::shared_ptr<arrow::Table> table = arrow::Table::Make(schema, {array_a, array_b});
Templates generally are inline - you have to supply the definition with the declaration.
Global (static) data requires that there be exactly one definition of the data (but it can be declared multiple times).
So, for a class with static data, one normally declares the static in the class definition (header), and the storage as static in the implementation file (.cpp).
But what does one do for a template that needs to refer to static / global data?
Here's a bit of code to give you something somewhat concrete to consider:
// we represent in a formal manner anything that can be encoded in a MSVS format specification
// A format specification, which consists of optional and required fields, has the following form:
// %[flags][width][.precision][{h | l | ll | w | I | I32 | I64}] type
// based on https://msdn.microsoft.com/en-us/library/56e442dc.aspx
struct FormatSpec
{
enum Size {
normal,
h,
l,
ll,
w,
I,
I32,
I64
};
enum Type {
invalid,
character,
signed_integer,
unsigned_integer,
unsigned_octal,
unsigned_hex,
floating_point,
expontential_floating_point,
engineering_floating_point,
hex_double_floating_point,
pointer,
string,
z_string
};
unsigned fLeftAlign : 1;
unsigned fAlwaysSigned : 1;
unsigned fLeadingZeros : 1;
unsigned fBlankPadding : 1;
unsigned fBasePrefix : 1;
unsigned width;
unsigned precision;
Size size_;
Type type_;
};
struct FormatSpecTypeDatum
{
FormatSpec::Type id; // id
const TCHAR * symbol; // text symbol
};
FormatSpecTypeDatum kTypeSpecs[] =
{
{ FormatSpec::character, _T("c") },
{ FormatSpec::character, _T("C") },
{ FormatSpec::signed_integer, _T("d") },
{ FormatSpec::signed_integer, _T("i") },
{ FormatSpec::unsigned_octal, _T("o") },
{ FormatSpec::unsigned_integer, _T("u") },
{ FormatSpec::unsigned_hex, _T("x") },
{ FormatSpec::unsigned_hex, _T("X") },
{ FormatSpec::expontential_floating_point, _T("e") },
{ FormatSpec::expontential_floating_point, _T("E") },
{ FormatSpec::floating_point, _T("f") },
{ FormatSpec::floating_point, _T("F") },
{ FormatSpec::engineering_floating_point, _T("g") },
{ FormatSpec::engineering_floating_point, _T("G") },
{ FormatSpec::hex_double_floating_point, _T("a") },
{ FormatSpec::hex_double_floating_point, _T("A") },
{ FormatSpec::pointer, _T("p") },
{ FormatSpec::string, _T("s") },
{ FormatSpec::string, _T("S") },
{ FormatSpec::z_string, _T("Z") },
};
template <typename ctype>
bool DecodeFormatSpecType(const ctype * & format, FormatSpec & spec)
{
for (unsigned i = 0; i < countof(kTypeSpecs); ++i)
if (format[0] == kTypeSpecs[i].symbol[0])
{
spec.type_ = kTypeSpecs[i].id;
++format;
return true;
}
return false;
}
It's relatively simple - a symbolic ID to character representation lookup table.
I want to be able to use DecodeFormatSpecType<>() for char, unsigned char, wchar_t, etc.
I could remove the template from DecodeFormatSpecType() and just supply overloaded interfaces for various character types.
The main thing is that the data isn't really changing: an unsigned char 'c', a wchar_t 'c' and a legacy char 'c' all have the exact same value, regardless of the character's storage size. This holds for core ASCII characters; there are undoubtedly other encodings, such as EBCDIC, where it doesn't, but that's not the problem I'm attempting to solve here.
I just want to understand "how do I construct my C++ libraries so that I can access global data defined in exactly one location - which is stored as an array - and I want the accessing templated code to know the length of the global data, just like I can with normal non-templated code have a global symbol table like what I've shown in my example code by having the table and the implementation that needs its size both exist in the appropriate .cpp file"
Does that make sense?
In short: global data, plus functions that need to know its exact definition but can still be presented through an interface that is generic (over a valid domain).
A function template can use global functions and global data without any problem.
If you want to encapsulate the definition of kTypeSpecs and not have it defined in a header file, you can use a couple of functions to provide access to the data.
size_t getNumberOfTypeSpecs();
// Provide read only access to the data.
FormatSpecTypeDatum const* getTypeSpecs();
and then implement DecodeFormatSpecType as
template <typename ctype>
bool DecodeFormatSpecType(const ctype * & format, FormatSpec & spec)
{
size_t num = getNumberOfTypeSpecs();
FormatSpecTypeDatum const* typeSpecs = getTypeSpecs();
for (unsigned i = 0; i < num; ++i)
if (format[0] == typeSpecs[i].symbol[0])
{
spec.type_ = typeSpecs[i].id;
++format;
return true;
}
return false;
}
The functions getNumberOfTypeSpecs and getTypeSpecs can be implemented in a .cpp file as:
// Make the data file scoped global variable.
static FormatSpecTypeDatum kTypeSpecs[] =
{
{ FormatSpec::character, _T("c") },
{ FormatSpec::character, _T("C") },
{ FormatSpec::signed_integer, _T("d") },
{ FormatSpec::signed_integer, _T("i") },
{ FormatSpec::unsigned_octal, _T("o") },
{ FormatSpec::unsigned_integer, _T("u") },
{ FormatSpec::unsigned_hex, _T("x") },
{ FormatSpec::unsigned_hex, _T("X") },
{ FormatSpec::expontential_floating_point, _T("e") },
{ FormatSpec::expontential_floating_point, _T("E") },
{ FormatSpec::floating_point, _T("f") },
{ FormatSpec::floating_point, _T("F") },
{ FormatSpec::engineering_floating_point, _T("g") },
{ FormatSpec::engineering_floating_point, _T("G") },
{ FormatSpec::hex_double_floating_point, _T("a") },
{ FormatSpec::hex_double_floating_point, _T("A") },
{ FormatSpec::pointer, _T("p") },
{ FormatSpec::string, _T("s") },
{ FormatSpec::string, _T("S") },
{ FormatSpec::z_string, _T("Z") },
};
size_t getNumberOfTypeSpecs()
{
return sizeof(kTypeSpecs)/sizeof(kTypeSpecs[0]);
}
FormatSpecTypeDatum const* getTypeSpecs()
{
return kTypeSpecs;
}
Update, in response to comment by OP
Yes, you can. Either of the following definitions is valid (they are alternatives, not meant to coexist in the same file):
size_t getNumberOfTypeSpecs()
{
static constexpr size_t num = sizeof(kTypeSpecs)/sizeof(kTypeSpecs[0]);
return num;
}
constexpr size_t getNumberOfTypeSpecs()
{
return sizeof(kTypeSpecs)/sizeof(kTypeSpecs[0]);
}
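To see the accessor approach end to end, here is a single-file sketch with the table cut down to four rows. The names SpecType, TypeDatum and decodeType are simplified stand-ins, and plain char replaces TCHAR; those are assumptions for the example, not the original declarations. In a real project the first part would sit in the header and the last part in the .cpp file.

```cpp
#include <cassert>
#include <cstddef>

// ---- header part: types, accessor declarations, and the template ----
enum SpecType { kInvalid, kCharacter, kSignedInteger, kString };

struct TypeDatum
{
    SpecType id;  // symbolic id
    char symbol;  // single-character type symbol
};

std::size_t getNumberOfTypeSpecs();
const TypeDatum* getTypeSpecs();

// Works for char, unsigned char, wchar_t, ...; it only sees the accessors,
// never the table itself.
template <typename ctype>
bool decodeType(const ctype*& format, SpecType& out)
{
    const std::size_t count = getNumberOfTypeSpecs();
    const TypeDatum* specs = getTypeSpecs();
    for (std::size_t i = 0; i < count; ++i)
    {
        if (format[0] == static_cast<ctype>(specs[i].symbol))
        {
            out = specs[i].id;
            ++format; // consume the matched character
            return true;
        }
    }
    return false;
}

// ---- .cpp part: the one and only definition of the data ----
namespace
{
const TypeDatum kTypeSpecs[] =
{
    { kCharacter, 'c' },
    { kSignedInteger, 'd' },
    { kSignedInteger, 'i' },
    { kString, 's' },
};
}

std::size_t getNumberOfTypeSpecs()
{
    return sizeof(kTypeSpecs) / sizeof(kTypeSpecs[0]);
}

const TypeDatum* getTypeSpecs()
{
    return kTypeSpecs;
}
```

The template can be instantiated in any translation unit, while the array and its length stay private to the one file that defines them.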
Why is direct assignment during the declaration of a struct not possible in C++?
I have the following C code that won't compile with a C++ compiler:
static const struct {
struct structtype1 header;
struct {
struct structtype2 intf;
struct structtype3 src;
} foo1, foo2;
} bar = {
.header = {
.byte1 = 0x23,
.length = 12,
.count1 = 4,
.count2 = 4,
},
.foo1= {
.intf = {
.data = 0x23,
.len = 2,
.ep = 3,
},
.src= {
.data = 0x21,
.len = 4,
.ep = 2,
},
},
.foo2= {
.intf = {
.data = 0x17,
.len = 11,
.ep = 2,
},
.src= {
.data = 0x20,
.len = 2,
.ep = 1,
},
},
};
You have two choices:
1. Use constructors.
This defines values for the members, but every instance starts out with the same values:
struct _pnt { int x,y,z; _pnt() { x=0; y=0; z=0; } };
_pnt p0,p1,p2; // all points are initialized to (0,0,0)
2. Use a plain brace initializer, which has worked in every C++ compiler I have used.
I think this is what you want.
Do not forget that the values must follow the order in which the members are declared!
struct _pnt { int x,y,z; };
_pnt p0={0,0,0},
p1={1,0,0},
p2={0,1,0};
The limitation is that you cannot skip a member: trailing members may be omitted (they are zero-initialized), but every member before the last one you set must be given a value.
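Applied to the struct from the question, positional initialization with nested braces could look like the sketch below. The member types of structtype1, structtype2 and structtype3 are not shown in the question, so the definitions here are assumptions, as are the names Pair and Bar given to the anonymous structs. C++20 also accepts the .member = value form when the designators appear in declaration order, but the version below needs no newer standard.

```cpp
#include <cassert>

// Assumed definitions; the real ones are not shown in the question.
struct structtype1 { unsigned char byte1, length, count1, count2; };
struct structtype2 { unsigned char data, len, ep; };
struct structtype3 { unsigned char data, len, ep; };

// Hypothetical names for the anonymous structs in the C code.
struct Pair
{
    structtype2 intf;
    structtype3 src;
};

struct Bar
{
    structtype1 header;
    Pair foo1;
    Pair foo2;
};

// Values are listed in declaration order; comments stand in for designators.
static const Bar bar =
{
    /* header */ { 0x23, 12, 4, 4 },
    /* foo1   */ { /* intf */ { 0x23, 2, 3 },  /* src */ { 0x21, 4, 2 } },
    /* foo2   */ { /* intf */ { 0x17, 11, 2 }, /* src */ { 0x20, 2, 1 } },
};
```

The comments cost nothing at run time and make it harder to put a value in the wrong slot.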
I have the following bitfield struct:
struct DescriptorByte
{
unsigned short IsImmedCalc : 1;
unsigned short IsPrefix : 1;
unsigned short NoMemOp : 1;
unsigned short Size : 5;
};
I want to create a table for holding many DescriptorByte structs, so I created this:
struct OpcodeList
{
DescriptorByte ADD_8_MO;
DescriptorByte ADD_32_MO;
DescriptorByte ADD_8_OM;
DescriptorByte ADD_32_OM;
DescriptorByte ADD_8_OI = { TRUE, FALSE, TRUE, 1 + 1 };
DescriptorByte ADD_32_OI = { TRUE, FALSE, TRUE, 1 + 4 };
DescriptorByte PUSH_ES = { TRUE, FALSE, TRUE, 1 };
};
So is this the same as having a struct with each member being 1 byte long? Also, I want to be able to name the members in the initializer, like this:
DescriptorByte ADD_8_OI = { IsImmedCalc = true, Size = 1 };
but Visual Studio is not letting me. The idea behind all of this is to have a table of DescriptorByte. Is this the best approach? Also, what is the best initialization method? Thanks.
"is this the same as having a struct with each member being 1 byte long?"
Your compiler might add padding if you do not use #pragma pack or something similar.
No padding between the members is required in this specific case, but with unsigned short as the underlying type each DescriptorByte typically occupies 2 bytes, not 1.
Just change the unsigned short to unsigned char and each member will be 1 byte long on mainstream compilers.
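Add '.' on the left side of each field (these are designated initializers; in C++ they require C++20, for example /std:c++20 or /std:c++latest in Visual Studio):
DescriptorByte ADD_8_OI = { .IsImmedCalc = true, .Size = 1 };
Alternatively, write the values positionally in declaration order. Only trailing members can be omitted (they are set to 0), so the members before Size must be spelled out:
DescriptorByte ADD_8_OI = { true, false, false, 1 };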
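A quick way to check both points (size and positional initialization) on your own compiler is something like the sketch below. DescriptorShort and makeAdd8OI are hypothetical names introduced for the comparison, and bitfield layout is implementation-defined, so treat the sizes as something to verify rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// unsigned char as the underlying type: 1 byte on mainstream compilers.
struct DescriptorByte
{
    unsigned char IsImmedCalc : 1;
    unsigned char IsPrefix : 1;
    unsigned char NoMemOp : 1;
    unsigned char Size : 5;
};

// Same fields with unsigned short, as in the question: typically 2 bytes.
struct DescriptorShort
{
    unsigned short IsImmedCalc : 1;
    unsigned short IsPrefix : 1;
    unsigned short NoMemOp : 1;
    unsigned short Size : 5;
};

// Positional initialization: values map to members in declaration order,
// so the two flags before Size have to be written out.
inline DescriptorByte makeAdd8OI()
{
    DescriptorByte d = { true, false, false, 1 };
    return d;
}
```

Comparing sizeof(DescriptorByte) with sizeof(DescriptorShort) in a static_assert or a debug print shows directly what your compiler does with each underlying type.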
EDIT: Posted this thinking it was a C# question, sorry! Leaving it here for others.
C# does not support bit-fields. However, you can still 'emulate' that behavior using a single member variable of the appropriate size along with various getter properties.
In your example, you want to use an unsigned 8-bit integer value (byte) and encapsulate those bitfields. Have no fear, you can still use a struct to do all this to make marshaling and interop easier.
So let's take your DescriptorByte and recreate what you are looking to do:
struct DescriptorByte
{
static readonly byte IsImmedCalcFlag = 0x80; // 1000 0000
static readonly byte IsPrefixFlag = 0x40; // 0100 0000
static readonly byte NoMemOpFlag = 0x20; // 0010 0000
static readonly byte FlagsBitMask = 0xE0; // 1110 0000
static readonly byte SizeBitMask = 0x1F; // 0001 1111
byte field;
public bool IsImmedCalc
{
get { return (field & IsImmedCalcFlag) > 0; }
set
{
if (value)
field = (byte)(field | IsImmedCalcFlag); // Set the bit
else
field = (byte)(field & ~IsImmedCalcFlag); // Clear the bit
}
}
public bool IsPrefix
{
get { return (field & IsPrefixFlag) > 0; }
set
{
if (value)
field = (byte)(field | IsPrefixFlag); // Set the bit
else
field = (byte)(field & ~IsPrefixFlag); // Clear the bit
}
}
public bool NoMemOp
{
get { return (field & NoMemOpFlag) > 0; }
set
{
if (value)
field = (byte)(field | NoMemOpFlag); // Set the bit
else
field = (byte)(field & ~NoMemOpFlag); // Clear the bit
}
}
public byte Size
{
get { return (byte)(field & SizeBitMask); }
set { field = (byte)((field & FlagsBitMask) | (value & SizeBitMask)); }
}
}
In C, you sometimes see something like:
struct foobar
{
int size;
int data[1];
};
where the data member doesn't really have just one element; rather it's meant to be variable length.
If you do something like that in D, is it going to let you, for example, read myfoobar.data[4]?
I know D has variable length arrays, e.g. int[] myvarlenintarray;, but what if you're trying to interface with some code that already puts out a data structure in memory like the one above, and possibly much more complex than that? Let's say it's in the first portion of int[3000] buffer;. Is there an easy way to cast it to a usable struct without moving it in memory? If not, is there an easy way to get the data into a similar struct without having to manually parse out each member of the struct?
edit:
I think I need to give a practical example so you see where I'm coming from.
import std.c.windows.windows;
import std.utf;
import std.stdio;
public struct REPARSE_DATA_BUFFER
{
ULONG ReparseTag;
USHORT ReparseDataLength;
USHORT Reserved;
union
{
struct SymbolicLinkReparseBuffer
{
USHORT SubstituteNameOffset;
USHORT SubstituteNameLength;
USHORT PrintNameOffset;
USHORT PrintNameLength;
ULONG Flags;
WCHAR[1] PathBuffer;
}
SymbolicLinkReparseBuffer mySymbolicLinkReparseBuffer;
struct MountPointReparseBuffer
{
USHORT SubstituteNameOffset;
USHORT SubstituteNameLength;
USHORT PrintNameOffset;
USHORT PrintNameLength;
WCHAR[1] PathBuffer;
}
MountPointReparseBuffer myMountPointReparseBuffer;
struct GenericReparseBuffer
{
UCHAR[1] DataBuffer;
}
GenericReparseBuffer myGenericReparseBuffer;
}
}
alias REPARSE_DATA_BUFFER* PREPARSE_DATA_BUFFER;
enum MAXIMUM_REPARSE_DATA_BUFFER_SIZE = 16*1024;
// Values for 'ReparseTag' member of REPARSE_DATA_BUFFER:
enum : DWORD {
IO_REPARSE_TAG_SYMLINK = 0xA000000C,
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003 // which also defines a Junction Point
}
enum DWORD FSCTL_GET_REPARSE_POINT = 0x000900a8;
enum FILE_FLAG_OPEN_REPARSE_POINT = 0x00200000;
public extern(Windows) BOOL function(HANDLE, DWORD, LPVOID, DWORD, LPVOID, DWORD, LPVOID, OVERLAPPED*) DeviceIoControl;
void main()
{
DeviceIoControl = cast(BOOL function(HANDLE, DWORD, LPVOID, DWORD, LPVOID, DWORD, LPVOID, OVERLAPPED*))GetProcAddress(LoadLibraryA("kernel32.dll"), "DeviceIoControl");
auto RPHandle = CreateFileW((r"J:\Documents and Settings").toUTF16z(), 0, FILE_SHARE_READ, null, OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT + FILE_FLAG_BACKUP_SEMANTICS, null);
if (RPHandle == INVALID_HANDLE_VALUE)
{
printf("CreateFileW failed with error code %d.", GetLastError());
return;
}
BYTE[MAXIMUM_REPARSE_DATA_BUFFER_SIZE] reparsebuffer;
uint reparsedatasize;
auto getreparsepointresult = DeviceIoControl(RPHandle, FSCTL_GET_REPARSE_POINT, null, 0, cast(void*) reparsebuffer.ptr, MAXIMUM_REPARSE_DATA_BUFFER_SIZE, &reparsedatasize, null);
if (getreparsepointresult == 0)
{
printf("DeviceIoControl with FSCTL_GET_REPARSE_POINT failed with error code %d.", GetLastError());
return;
}
// Now what?
// If I do this:
auto ReparseDataPtr = cast(REPARSE_DATA_BUFFER*) reparsebuffer.ptr;
printf("%d == %d\n", reparsebuffer.ptr, ReparseDataPtr); // Alright, data hasn't been copied.
// But what good is a pointer? Can I use a pointer to a struct to access one of its members apart from dereferencing?
printf("%d == %d\n", &reparsebuffer[0], &(*ReparseDataPtr)); // Here, I dereference ReparseDataPtr, but nothing moves.
printf("%d == %d\n", &reparsebuffer[0], &((*ReparseDataPtr).ReparseTag)); // Same here, so I can access members in a roundabout way.
printf("%d == %d\n", &reparsebuffer[0], &(ReparseDataPtr.ReparseTag)); // And thanks to Jim's comment, here's a less roundabout way.
auto ReparseData = *ReparseDataPtr; // But if I assign a name to the dereferenced ReparseDataPtr,
printf("%d != %d\n", &reparsebuffer[0], &(ReparseData.ReparseTag)); // the data is copied to a new location, leaving most of PathBuffer behind.
REPARSE_DATA_BUFFER ReparseDataFn() {return *ReparseDataPtr;} // Similarly, this way
printf("%d != %d\n", &reparsebuffer[0], &(ReparseDataFn().ReparseTag)); // copies stuff to a new location.
}
Firstly, I don't understand why it's different for the case in which I don't give *ReparseDataPtr a name.
Secondly, is there no way to have a symbol whose type is REPARSE_DATA_BUFFER and whose data is located at reparsebuffer.ptr?
Have you tried doing the exact same thing in D as in C?
struct foobar { int size; int[1] data; }
It works... just use data.ptr instead of data to access the elements, because otherwise it will perform bounds checking with a length of 1.
You could access it via a helper method:
struct foobar
{
public:
int[] Data() { return data.ptr[0..size]; }
private:
int size;
int[1] data;
}
You might also want to put in a static foreach over the members of foobar that uses static assert to make sure that the offset of each is less than the offset of data.