Convert wchar_t to char with boost locale - c++

My goal is to convert wchar_t to char, my approach was to use boost::locale (using boost 1.60). For example,
wchar_t * myWcharString = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
to
char * myCharString = "0000000002" (Memory 0x 30 30 ... 32)
I wrote a function:
inline char* newCharFromWchar(wchar_t* utf16String) {
    char* cResult = NULL;
    try {
        std::string szResult = boost::locale::conv::from_utf(utf16String, "UTF-8");
        cResult = new char[szResult.size() + 1];
        memset(reinterpret_cast<void*>(cResult), 0, szResult.size() + 1);
        memcpy(reinterpret_cast<void*>(cResult),
               reinterpret_cast<const void*>(szResult.c_str()),
               szResult.size());
    }
    catch (...) {
        // boost::locale::conv might throw
    }
    return cResult;
}
Now the problem is that with VS2013 it behaves differently to gcc and clang, i.e.
// VS 2013 behaves as expected
wchar_t * utf16String = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
char * cResult = "0000000002" (Memory 0x 30 30 ... 32)
// both gcc and clang NOT as expected:
wchar_t * utf16String = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
char * cResult = "2" (Memory 0x 32)
The boost implementations built with gcc and clang seem to use only the last 2 bytes of my input wchar_t string, even though the start and end addresses of the input are parsed correctly.
What am I missing?

VS2013 treats wchar_t as a 16-bit character, while both gcc and clang treat it as a 32-bit character (on my machine).
So if I store 0x 30 00 30 00 ... 32 00 as wchar_t data, it only works as expected with VS2013. On gcc and clang, boost::locale assumes that 0x 30 00 30 00 is a single 32-bit character instead of the two 16-bit characters I expected. The resulting output therefore is completely different between these platforms.
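A sketch of one portable alternative, assuming the source data really is UTF-16 code units: keep it in a char16_t buffer (which is 16 bits on every platform) and let boost::locale::conv::utf_to_utf do the conversion. Whether char16_t conversions are enabled in a given Boost build is an assumption to verify.

#include <boost/locale/encoding_utf.hpp>
#include <string>

// Sketch only: assumes the input really is UTF-16 code units. char16_t is
// 16 bits on every platform, unlike wchar_t (16 bits with MSVC, 32 bits with
// gcc/clang), so the conversion behaves the same everywhere.
std::string utf16ToUtf8(const char16_t* utf16String) {
    // With the default conversion method, invalid code units are skipped
    // rather than causing an exception.
    return boost::locale::conv::utf_to_utf<char>(utf16String);
}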

Related

converting a string read from binary file to integer

I have a binary file. I am reading 16 bytes at a time from it using fstream.
I want to convert it to an integer. I tried atoi, but it didn't work.
In Python we can do that by converting to a byte stream using stringobtained.encode('utf-8') and then converting it to an int using int(bytestring.hex(), 16). Should we follow such elaborate steps as in Python, or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
    size = file.tellg();
    memblock = new char[size];
    file.seekg(0, ios::beg);
    while (!file.eof())
    {
        file.read(memblock, 16);
        int a = atoi(memblock); // doesn't work, always 0
        cout << a << "\n";
        memset(memblock, 0, sizeof(memblock));
    }
    file.close();
}
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it as 16 bytes, i.e. 32 hex digits at a time (one row in the sample file content), and convert it to an integer.
So when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, I should get 110748049513798795666017677735771517696.
But I couldn't do it. I always get 0, even after trying strtoull. Am I reading the file wrong, or what am I missing?
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, thus it returns zero. If you're very unlucky you will read 16 valid digits and atoi will start reading uninitialized memory, leading to undefined behavior.
You don't need atoi anyway; your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with the shift and bitwise-or operators. The only complication is that read wants a char buffer, which will probably be signed, while you need unsigned bytes.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
while (file.read(memblock, 16))
{
    // uint128_t stands in for a 128-bit type (not standard C++; see the links above)
    uint128_t a = 0;
    for (int i = 0; i < 16; ++i)
    {
        a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
    }
    cout << a << "\n";
}
file.close();
If the number is binary, what you want is:
short value;
file.read(reinterpret_cast<char*>(&value), sizeof(value));
Depending upon how the file was written and your processor, you may have to reverse the bytes in value using bit operations.
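For example, a 16-bit byte swap with bit operations might look like this (a minimal sketch matching the short read above; the function name is mine):

#include <cstdint>

// Reverse the byte order of a 16-bit value read from the file.
uint16_t swapBytes16(uint16_t value) {
    return static_cast<uint16_t>((value >> 8) | (value << 8));
}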

How to convert ECDSA DER encoded signature data to microsoft CNG supported format?

I am preparing a minidriver to perform signing in a smartcard using the NCryptSignHash function of Microsoft CNG.
When I sign with a SECP521R1 EC key in the smartcard, it generates signature data with a length of 139 bytes in the ECC signed data format:
ECDSASignature ::= SEQUENCE {
    r INTEGER,
    s INTEGER
}
Sample signed data is
308188024201A2001E9C0151C55BCA188F201020A84180B339E61EDE61F6EAD0B277321CAB81C87DAFC2AC65D542D0D0B01C3C5E25E9209C47CFDDFD5BBCAFA0D2AF2E7FD86701024200C103E534BD1378D8B6F5652FB058F7D5045615DCD940462ED0F923073076EF581210D0DD95BF2891358F5F743DB2EC009A0608CEFAA9A40AF41718881D0A26A7F4
But when I sign using MS_KEY_STORAGE_PROVIDER, it generates a signature with a length of 132 bytes.
What is the procedure to reduce the signature data size from 139 to 132 bytes?
Your input is an X9.62 signature format, which is a SEQUENCE containing two ASN.1 / DER encoded INTEGERs, r and s. These integers are variable sized, signed, big endian numbers. They are encoded in the minimum number of bytes. This means that the size of the encoding can vary.
The 139 bytes is common because it assumes the maximum size of the encoding for r and s. These values are computed using modular arithmetic and they can therefore contain any number of bits, up to the number of bits of order n, which is the same as the key size, 521 bits.
The 132 bytes are specified by ISO/IEC 7816-8 / IEEE P1363, a standard that deals with signatures for smart cards. The signature consists of the concatenation of r and s, where r and s are each encoded in the fixed number of bytes needed to represent a value as large as the order. Here r and s are statically sized, unsigned, big endian numbers.
The calculation of the number of bytes of r or s is ceil((double) n / 8) or (n + 8 - 1) / 8 where 8 is the number of bits in a byte. So if the elliptic curve is 521 bits then the resulting size is 66 bytes, and together they therefore consume 132 bytes.
Now on to the decoding. There are multiple ways of handling this; the most logical is to perform a full ASN.1 parse, obtain the integers, and then encode them again in the ISO 7816-8 form.
However, you can also see that you could simply copy bytes as r and s will always be non-negative (and thus unsigned) and big endian. So you just need to compensate for the size. Otherwise the only hard part is to be able to decode the length of the components within the X9.62 structure.
Warning: code in C# instead of C++ as I expected the main .NET language; language not indicated in question when I wrote the main part of the answer.
class ConvertECDSASignature
{
    private static int BYTE_SIZE_BITS = 8;
    private static byte ASN1_SEQUENCE = 0x30;
    private static byte ASN1_INTEGER = 0x02;

    public static byte[] lightweightConvertSignatureFromX9_62ToISO7816_8(int orderInBits, byte[] x9_62)
    {
        int offset = 0;
        if (x9_62[offset++] != ASN1_SEQUENCE)
        {
            throw new IllegalSignatureFormatException("Input is not a SEQUENCE");
        }

        int sequenceSize = parseLength(x9_62, offset, out offset);
        int sequenceValueOffset = offset;

        int nBytes = (orderInBits + BYTE_SIZE_BITS - 1) / BYTE_SIZE_BITS;
        byte[] iso7816_8 = new byte[2 * nBytes];

        // --- retrieve and copy r
        if (x9_62[offset++] != ASN1_INTEGER)
        {
            throw new IllegalSignatureFormatException("Input is not an INTEGER");
        }

        int rSize = parseLength(x9_62, offset, out offset);
        copyToStatic(x9_62, offset, rSize, iso7816_8, 0, nBytes);
        offset += rSize;

        // --- retrieve and copy s
        if (x9_62[offset++] != ASN1_INTEGER)
        {
            throw new IllegalSignatureFormatException("Input is not an INTEGER");
        }

        int sSize = parseLength(x9_62, offset, out offset);
        copyToStatic(x9_62, offset, sSize, iso7816_8, nBytes, nBytes);
        offset += sSize;

        if (offset != sequenceValueOffset + sequenceSize)
        {
            throw new IllegalSignatureFormatException("SEQUENCE is either too small or too large for the encoding of r and s");
        }

        return iso7816_8;
    }

    /**
     * Copies a variable sized, signed, big endian number to an array as a statically sized, unsigned, big endian number.
     * Assumes that the iso7816_8 buffer is zeroized from the iso7816_8Offset for nBytes.
     */
    private static void copyToStatic(byte[] sint, int sintOffset, int sintSize, byte[] iso7816_8, int iso7816_8Offset, int nBytes)
    {
        // if the integer starts with zero, then skip it
        if (sint[sintOffset] == 0x00)
        {
            sintOffset++;
            sintSize--;
        }

        // after skipping the zero byte then the integer must fit
        if (sintSize > nBytes)
        {
            throw new IllegalSignatureFormatException("Number format of r or s too large");
        }

        // copy it into the right place
        Array.Copy(sint, sintOffset, iso7816_8, iso7816_8Offset + nBytes - sintSize, sintSize);
    }

    /*
     * Standalone BER decoding of length value, up to 2^31 - 1.
     */
    private static int parseLength(byte[] input, int startOffset, out int offset)
    {
        offset = startOffset;
        byte l1 = input[offset++];

        // --- return value of single byte length encoding
        if (l1 < 0x80)
        {
            return l1;
        }

        // otherwise the first byte of the length specifies the number of encoding bytes that follows
        int end = offset + (l1 & 0x7F);

        uint result = 0;

        // --- skip leftmost zero bytes (for BER)
        while (offset < end)
        {
            if (input[offset] != 0x00)
            {
                break;
            }
            offset++;
        }

        // --- test against maximum value
        if (end - offset > sizeof(uint))
        {
            throw new IllegalSignatureFormatException("Length of TLV is too large");
        }

        // --- parse multi byte length encoding
        while (offset < end)
        {
            result = (result << BYTE_SIZE_BITS) ^ input[offset++];
        }

        // --- make sure that the uint isn't larger than an int can handle
        if (result > Int32.MaxValue)
        {
            throw new IllegalSignatureFormatException("Length of TLV is too large");
        }

        // --- return multi byte length encoding
        return (int) result;
    }
}
Note that the code is somewhat permissive in that it doesn't require the minimum length encoding for the SEQUENCE and INTEGER length encodings (which it should).
It also allows wrongly encoded INTEGER values that are unnecessarily left-padded with zero bytes.
Neither of these issues should break the security of the algorithm but other libraries may and should be less permissive.
What is the procedure to reduce the signature data size from 139 to 132 bytes?
You have an ASN.1 encoded signature (shown below). It is used by Java, OpenSSL and some other libraries. You need the signature in P1363 format, which is a concatenation of r || s, without the ASN.1 encoding. P1363 is used by Crypto++ and a few other libraries. (There's another common signature format, and that is OpenPGP).
For the concatenation of r || s, both r and s must be 66 bytes because of the secp521r1 field element size rounded up to an octet boundary. That means the procedure is: strip the outer SEQUENCE, then strip the two INTEGERs, and concatenate the values of the two integers.
Your formatted r || s signature using your sample data will be:
01 A2 00 1E ... 7F D8 67 01 || 00 C1 03 E5 ... 0A 26 A7 F4
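If you are not using .NET's ASN.1 support, the strip-and-pad step is small enough to write by hand. Here is a hedged C++ sketch; the function name and the one-byte long-form length assumption are mine rather than from any library, but P-521 signatures fit within that assumption.

#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch: convert an X9.62 / DER signature (SEQUENCE of two INTEGERs) into
// P1363 r || s, each value right-aligned in fieldBytes bytes (66 for P-521).
// Assumes well-formed DER whose length fields use at most one extra byte;
// a production parser should be stricter.
std::vector<uint8_t> derToP1363(const std::vector<uint8_t>& der, size_t fieldBytes)
{
    size_t pos = 0;
    auto readLength = [&]() -> size_t {
        size_t len = der.at(pos++);
        if (len == 0x81)                      // long form, one length byte
            len = der.at(pos++);
        else if (len >= 0x80)
            throw std::runtime_error("unsupported length encoding");
        return len;
    };

    if (der.at(pos++) != 0x30)
        throw std::runtime_error("expected SEQUENCE");
    readLength();                             // SEQUENCE length, not checked here

    std::vector<uint8_t> out(2 * fieldBytes, 0);
    for (size_t part = 0; part < 2; ++part)   // part 0 is r, part 1 is s
    {
        if (der.at(pos++) != 0x02)
            throw std::runtime_error("expected INTEGER");
        size_t len = readLength();
        size_t skip = (len > 0 && der.at(pos) == 0x00) ? 1 : 0;  // drop sign byte
        if (len - skip > fieldBytes)
            throw std::runtime_error("integer too large for field size");
        // right-align the magnitude in its fieldBytes-wide slot
        std::copy(der.begin() + pos + skip, der.begin() + pos + len,
                  out.begin() + part * fieldBytes + (fieldBytes - (len - skip)));
        pos += len;
    }
    return out;
}

Calling derToP1363(signatureBytes, 66) on the sample signature above should yield the 132-byte concatenation shown.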
Microsoft .Net 2.0 has ASN.1 classes that allow you to manipulate ASN.1 encoded data. See AsnEncodedData class.
$ echo 308188024201A2001E9C0151C55BCA188F201020A84180B339E61EDE61F6EAD0B277321CAB
81C87DAFC2AC65D542D0D0B01C3C5E25E9209C47CFDDFD5BBCAFA0D2AF2E7FD86701024200C103E5
34BD1378D8B6F5652FB058F7D5045615DCD940462ED0F923073076EF581210D0DD95BF2891358F5F
743DB2EC009A0608CEFAA9A40AF41718881D0A26A7F4 | xxd -r -p > signature.bin
$ dumpasn1 signature.bin
0 136: SEQUENCE {
3 66: INTEGER
: 01 A2 00 1E 9C 01 51 C5 5B CA 18 8F 20 10 20 A8
: 41 80 B3 39 E6 1E DE 61 F6 EA D0 B2 77 32 1C AB
: 81 C8 7D AF C2 AC 65 D5 42 D0 D0 B0 1C 3C 5E 25
: E9 20 9C 47 CF DD FD 5B BC AF A0 D2 AF 2E 7F D8
: 67 01
71 66: INTEGER
: 00 C1 03 E5 34 BD 13 78 D8 B6 F5 65 2F B0 58 F7
: D5 04 56 15 DC D9 40 46 2E D0 F9 23 07 30 76 EF
: 58 12 10 D0 DD 95 BF 28 91 35 8F 5F 74 3D B2 EC
: 00 9A 06 08 CE FA A9 A4 0A F4 17 18 88 1D 0A 26
: A7 F4
: }
0 warnings, 0 errors.
Another noteworthy item is, .Net uses the XML format detailed in RFC 3275, XML-Signature Syntax and Processing. It is a different format than ASN.1, P1363, OpenPGP, CNG and other libraries.
The ASN.1 to P1363 conversion is rather trivial. You can see an example using the Crypto++ library at ECDSA sign with BouncyCastle and verify with Crypto++.
You might find Cryptographic Interoperability: Digital Signatures on Code Project helpful.

Disable alignment on a 64-bit structure

I'm trying to align my structure and make it as small as possible using bit fields. I have to send this data back to a client, which will examine the fields to set a few data members.
The size of the structure is indeed the same, but when I set members it does not work at all.
Here's some example code:
#pragma pack(push, 1)
struct PW_INFO
{
    char hash[16];          //Does not matter
    uint32_t number;        //Does not matter
    uint32_t salt_id : 30;  //Position: 0 bits
    uint32_t enc_level : 7; //Position: 30 bits
    uint32_t delta : 27;    //Position: 37 bits
};                          //Total size: 28 bytes
#pragma pack(pop)
void int64shrl(uint64_t& base, uint32_t to_shift, uint32_t position)
{
    uint64_t res = static_cast<uint64_t>(to_shift);
    res = Int64ShllMod32(res, position);
    base |= res;
}

int32_t main()
{
    std::cout << "Size of PW_INFO: " << sizeof(PW_INFO) << "\n"; //Returns 28 as expected (16 + sizeof(uint32_t) + 8)

    PW_INFO pw = { "abc123", 0, 0, 0, 0 };
    pw.enc_level = 105;

    uint64_t base{ 0 };
    &base; //debug purposes

    int64shrl(base, 103, 30);

    return 0;
}
Here's where it gets weird: setting the "salt_id" field (which is 30 bits into the bitfield) will yield the following result in memory:
0x003FFB8C 61 62 63 31 32 33 00 00 abc123..
0x003FFB94 00 00 00 00 00 00 00 00 ........
0x003FFB9C 00 00 00 00 00 00 00 00 ........
0x003FFBA4 69 00 00 00 i...
(Only the last 8 bytes are of concern since they represent the bit field.)
But Int64ShllMod32 returns a correct result (the remote client understands it perfectly):
0x003FFB7C 00 00 00 c0 19 00 00 00 ...À....
I'm guessing it has to do with alignment; if so, how would I completely get rid of it? It seems that even if the size is correct, it will still try to align it (to a 1-byte boundary, as the #pragma directive suggests).
More information:
I use Visual Studio 2015 and its compiler.
I am not trying to write those in a different format; the reason I'm asking this is that I do NOT want to use my own format. They are reading from 64-bit bit fields everywhere. I don't have access to the source code, but I see a lot of calls to Int64ShrlMod32 (from what I read, this is what the compiler produces when dealing with 8-byte structures).
The actual bitfield starts at "salt_id". 30 + 7 + 27 = 64 bits, I hope it is clearer now.
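One layout-independent way to produce the 64-bit value at the bit positions stated above is to assemble it with explicit shifts instead of relying on the compiler's bit-field allocation. This is only a sketch based on those stated positions; whether the client actually expects this exact packing is an assumption.

#include <cstdint>

// Pack salt_id (30 bits), enc_level (7 bits) and delta (27 bits) into one
// 64-bit value at fixed bit positions 0, 30 and 37, independent of how any
// particular compiler lays out bit-fields.
uint64_t packFields(uint32_t salt_id, uint32_t enc_level, uint32_t delta) {
    return (static_cast<uint64_t>(salt_id)    & 0x3FFFFFFFu)        |
           ((static_cast<uint64_t>(enc_level) & 0x7Fu)       << 30) |
           ((static_cast<uint64_t>(delta)     & 0x7FFFFFFu)  << 37);
}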

Portable executable DOS header length

I've been studying this image for building a portable executable: https://i.imgur.com/LIImg.jpg
The image/walkthrough says the PE header starts at 0x40 (64 in decimal). However, the hexadecimal dump says the DOS header is 32 bytes long. Is it being packed at 4 bytes for each field?
Looking at the IMAGE_DOS_HEADER in WinNT.h, it doesn't seem to fit either. It has 16 2-byte fields, one 4-length 2-byte array, one 10-length 2-byte array, and the 4-byte pointer to the PE location. Any way you look at that it doesn't add up to 64...
However, the hexadecimal dump says the DOS header is 32 bytes long.
Offset:0x30
00 00 00 00-00 00 00 00-00 00 00 00-40 00 00 00
0x30 + 16 = 0x40 (64).
typedef struct _IMAGE_DOS_HEADER
{
    // Cumulative size:
    WORD e_magic;     // 2
    WORD e_cblp;      // 4
    WORD e_cp;        // 6
    WORD e_crlc;      // 8
    WORD e_cparhdr;   // 10
    WORD e_minalloc;  // 12
    WORD e_maxalloc;  // 14
    WORD e_ss;        // 16
    WORD e_sp;        // 18
    WORD e_csum;      // 20
    WORD e_ip;        // 22
    WORD e_cs;        // 24
    WORD e_lfarlc;    // 26
    WORD e_ovno;      // 28
    WORD e_res[4];    // 36
    WORD e_oemid;     // 38
    WORD e_oeminfo;   // 40
    WORD e_res2[10];  // 60
    LONG e_lfanew;    // 64
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
It has 16 2-byte fields, one 4-length 2-byte array, one 10-length 2-byte array, and the 4-byte pointer to the PE location. Any way you look at that it doesn't add up to 64...
(16 * 2) = 32
(4 * 2) = 8
(10 * 2) = 20
+ 4
------------------
64
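A quick way to confirm those numbers with the compiler itself (a sketch; IMAGE_DOS_HEADER comes from <windows.h>, and static_assert/offsetof need C++11 and <cstddef>):

#include <windows.h>
#include <cstddef>

// The DOS header is 64 bytes, and e_lfanew (the offset of the PE header)
// is its last 4 bytes, starting at offset 60 (0x3C).
static_assert(sizeof(IMAGE_DOS_HEADER) == 64, "DOS header is 64 bytes");
static_assert(offsetof(IMAGE_DOS_HEADER, e_lfanew) == 60, "e_lfanew is at 0x3C");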

What does the symbol \0 mean in a string-literal?

Consider following code:
char str[] = "Hello\0";
What is the length of the str array, and how many 0s does it end with?
sizeof str is 7 - five bytes for the "Hello" text, plus the explicit NUL terminator, plus the implicit NUL terminator.
strlen(str) is 5 - the five "Hello" bytes only.
The key here is that the implicit nul terminator is always added - even if the string literal just happens to end with \0. Of course, strlen just stops at the first \0 - it can't tell the difference.
There is one exception to the implicit NUL terminator rule - if you explicitly specify the array size, the string will be truncated to fit:
char str[6] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 6 (with one NUL)
char str[7] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 7 (with two NULs)
char str[8] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 8 (with three NULs per C99 6.7.8.21)
This is, however, rarely useful, and prone to miscalculating the string length and ending up with an unterminated string. It is also forbidden in C++.
The length of the array is 7; the NUL character \0 still counts as a character, and the string is still terminated with an implicit \0.
Note that had you declared str as char str[6]= "Hello\0"; the length would be 6 because the implicit NUL is only added if it can fit (which it can't in this example.)
§ 6.7.8/p14: An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
Examples
char str[] = "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[5]= "Hello\0"; /* sizeof == 5, str is "Hello" with no NUL (no longer a C-string, just an array of char). This may trigger compiler warning */
char str[6]= "Hello\0"; /* sizeof == 6, Explicit NUL only */
char str[7]= "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[8]= "Hello\0"; /* sizeof == 8, Explicit + two Implicit NUL */
Specifically, I want to mention one situation that may confuse you.
What is the difference between "\0" and ""?
The answer is that "\0" is represented in an array as {0, 0}, while "" is {0}.
This is because "\0" is still a string literal with one explicit NUL, and the implicit "\0" is also added at the end of it; "" is empty but also gets the implicit "\0".
Understanding this will help you understand "\0" deeply.
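A two-line check of that difference (sizeof on a string literal counts the implicit terminator):

#include <cstdio>

int main() {
    // "\0" is {'\0', '\0'}: the explicit zero plus the implicit terminator.
    // ""   is {'\0'}: just the implicit terminator.
    std::printf("sizeof(\"\\0\") = %zu\n", sizeof("\0"));  // prints 2
    std::printf("sizeof(\"\")   = %zu\n", sizeof(""));     // prints 1
    return 0;
}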
Banging my usual drum solo of JUST TRY IT, here's how you can answer questions like that in the future:
$ cat junk.c
#include <stdio.h>

char* string = "Hello\0";

int main(int argc, char** argv)
{
    printf("-->%s<--\n", string);
}
$ gcc -S junk.c
$ cat junk.s
... eliding the unnecessary parts ...
.LC0:
        .string "Hello"
        .string ""
...
.LC1:
        .string "-->%s<--\n"
...
Note here how the string I used for printf is just "-->%s<--\n" while the global string is in two parts: "Hello" and "". The GNU assembler also terminates strings with an implicit NUL character, so the fact that the first string (.LC0) is in those two parts indicates that there are two NULs. The string is thus 7 bytes long. Generally, if you really want to know what your compiler is doing with a certain hunk of code, isolate it in a dummy example like this and see what it's doing using -S (for GNU -- MSVC has a flag for assembler output too, but I don't know it off-hand). You'll learn a lot about how your code works (or fails to work, as the case may be), and you'll get an answer quickly that is 100% guaranteed to match the tools and environment you're working in.
What is the length of the str array, and how many 0s does it end with?
Let's find out:
#include <stdio.h>

int main() {
    char str[] = "Hello\0";

    int length = sizeof str / sizeof str[0];
    // "sizeof array" is the bytes for the whole array (must use a real array, not
    // a pointer), divide by "sizeof array[0]" (sometimes sizeof *array is used)
    // to get the number of items in the array
    printf("array length: %d\n", length);
    printf("last 3 bytes: %02x %02x %02x\n",
           str[length - 3], str[length - 2], str[length - 1]);
    return 0;
}
char str[]= "Hello\0";
That would be 7 bytes.
In memory it'd be:
48 65 6C 6C 6F 00 00
H e l l o \0 \0
Edit:
What does the \0 symbol mean in a C string?
It's the "end" of a string. A null character. In memory, it's actually a Zero. Usually functions that handle char arrays look for this character, as this is the end of the message. I'll put an example at the end.
What is the length of str array? (Answered before the edit part)
7
and with how much 0s it is ending?
Your array has two "spaces" with zero: str[5] = str[6] = '\0' = 0
Extra example:
Let's assume you have a function that prints the content of that text array.
You could define it as:
char str[40];
Now, you could change the content of that array (I won't get into details on how to), so that it contains the message: "This is just a printing test"
In memory, you should have something like:
54 68 69 73 20 69 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
So you print that char array. And then you want a new message. Let's say just "Hello"
48 65 6c 6c 6f 00 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
Notice the 00 at str[5]. That's how the print function will know how much it actually needs to send, regardless of the actual length of the array and the rest of its contents.
'\0' is referred to as the NUL character or null terminator.
It is the character equivalent of the integer 0 (zero), as it refers to nothing.
In the C language it is generally used to mark the end of a string.
Example: char a[] = "Arsenic";
Every character is stored in the array:
a[0] = 'A'
a[1] = 'r'
a[2] = 's'
a[3] = 'e'
a[4] = 'n'
a[5] = 'i'
a[6] = 'c'
The end of the array, a[7], holds '\0' to mark the end of the string a.
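For example, this is how a typical string routine walks the array until it hits that terminator (a small sketch using the same "Arsenic" example):

#include <cstdio>

int main() {
    char a[] = "Arsenic";                 // a[7] holds the implicit '\0'
    for (int i = 0; a[i] != '\0'; ++i) {  // stop at the null terminator
        std::printf("a[%d] = %c\n", i, a[i]);
    }
    return 0;
}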