UTF-16 to UTF8 with WideCharToMultiByte problems - c++

#include <windows.h>

int main() {
    // "Chào" in Vietnamese
    wchar_t utf16[] = L"\x00ff\x00fe\x0043\x0000\x0068\x0000\x00EO\x0000\x006F";
    // Dump of utf16: FF FE 43 0 68 0 E 4F 0 6F (right)
    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    char *utf8 = new char[size];
    int k = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, utf8, size, NULL, NULL);
    // Dump of utf8: ffffffc3 fffffbf ffffc3 ffffbe 43 0
}
Here is my code. When I convert this string into UTF-8, it shows a wrong result. What is wrong with my code?

#include <cstdio>

int main() {
    wchar_t utf16[] = L"\uFEFFChào";
    int size = 5;
    for (int i = 0; i < size; ++i) {
        std::printf("%X ", utf16[i]);
    }
}
This program prints out: FEFF 43 68 E0 6F
If printing out each wchar_t you've read from a file prints out FF FE 43 0 68 0 E 4F 0 6F, then the UTF-16 data is not being read from the file correctly. Those values represent the UTF-16 string `L"ÿþC\0h\0à\0o"`.
You don't show your code for reading from the file, but here's one way to do it correctly:
https://stackoverflow.com/a/10504278/365496
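In the spirit of that linked answer, here is a minimal sketch of reading a UTF-16LE file with a BOM into a std::wstring on Windows. The helper name is made up for this example, and std::codecvt_utf16 is deprecated since C++17 but still widely available:

#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Illustrative helper: read a UTF-16LE file (with BOM) into a std::wstring.
std::wstring read_utf16le_file(const char *path)
{
    std::wifstream in(path, std::ios::binary);
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF,
            std::codecvt_mode(std::consume_header | std::little_endian)>));
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}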

You're reading the file incorrectly. Your dump of the input shows single bytes stored in wide characters. Your dump of the output is the byte sequence that results from encoding L"\xff\xfe\x43" to UTF-8; the string is being truncated at the first \x0000 in the input.
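For comparison, here is a minimal sketch of just the conversion step, given a correctly read wide string (a literal stands in for the file input here); it produces the expected UTF-8 bytes:

#include <windows.h>
#include <cstdio>
#include <string>

int main()
{
    const wchar_t *utf16 = L"Chào";   // stand-in for correctly read file data

    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    std::string utf8(size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, utf16, -1, &utf8[0], size, NULL, NULL);

    // Expected dump: 43 68 C3 A0 6F 00 ("Chào" in UTF-8 plus the terminator)
    for (unsigned char c : utf8)
        std::printf("%02X ", c);
}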

Related

fgets() doesn't correctly parse the data as I want

I'm currently parsing a '.dat' file using fgets() as below.
#include <cstdio>

int main(int argc, char **argv)
{
    FILE *fp = fopen(argv[1], "r+");
    unsigned char pu8bufsend[256];
    printf("%s\n", argv[1]);
    fseek(fp, 58, SEEK_SET);
    while (fgets((char *)pu8bufsend, 128 * 2, fp) != NULL)
    {
        for (size_t j = 0; j < 256; j++)
        {
            printf("%02x , %zu\n", pu8bufsend[j], j);
        }
    }
    fclose(fp);
}
When I parse a certain '.dat' file, the first 20 outputs look as below.
c0 , 0
a8 , 1
0a , 2
00 , 3
00 , 4
00 , 5
00 , 6
00 , 7
86 , 8
aa , 9
2b , 10
1d , 11
ff , 12
7f , 13
00 , 14
00 , 15
e8 , 16
7a , 17
40 , 18
00 , 19
00 , 20
However, I'm supposed to get this highlighted part from the '.dat' file.
It's supposed to be really simple, but I don't get why I'm struggling so much with this.
Can anyone figure out my problem?
Two things I noticed:

1. The file contains a 0d 0a sequence; that's a \r\n pair, a Windows-style line ending. Your output just contains 0a, which means that line-ending translation is in effect. Add "b" to the file open mode that you pass to fopen.
2. After the line ending, your buffer contains random garbage. This is because fgets stops reading at a line break and leaves the rest of the buffer as-is. You need to use fread instead, as sketched below. Now, in the comments you said you tried this and it "didn't work"; you'll need to elaborate on that.
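A minimal sketch of those two fixes together, assuming the same 58-byte offset and a single 256-byte block read for brevity:

#include <cstdio>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    // "rb": binary mode, so no line-ending translation takes place
    FILE *fp = std::fopen(argv[1], "rb");
    if (!fp)
        return 1;

    unsigned char buf[256];
    std::fseek(fp, 58, SEEK_SET);

    // fread reads a fixed number of bytes and reports how many it actually read
    std::size_t n = std::fread(buf, 1, sizeof buf, fp);
    for (std::size_t j = 0; j < n; j++)
        std::printf("%02x , %zu\n", buf[j], j);

    std::fclose(fp);
    return 0;
}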

converting a string read from binary file to integer

I have a binary file. I am reading 16 bytes at a time from it using fstream.
I want to convert it to an integer. I tried atoi, but it didn't work.
In Python we can do that by converting to a byte stream using stringobtained.encode('utf-8') and then converting it to an int using int(bytestring.hex(), 16). Should we follow such elaborate steps as done in Python, or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
    size = file.tellg();
    memblock = new char[size];
    file.seekg(0, ios::beg);
    while (!file.eof())
    {
        file.read(memblock, 16);
        int a = atoi(memblock); // doesn't work, always 0
        cout << a << "\n";
        memset(memblock, 0, sizeof(memblock));
    }
    file.close();
}
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it 16 bytes at a time, i.e. 32 hex digits (one row in the sample file content above), and convert that to an integer.
So when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, I should get 110748049513798795666017677735771517696.
But I couldn't do it. I always get 0, even after trying strtoull. Am I reading the file wrong, or what am I missing?
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, thus it returns zero. If you're very unlucky you will read 16 valid digits and atoi will start reading uninitialized memory, leading to undefined behavior.
You don't need atoi anyway; your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with shift and OR operators. The only complication is that read wants a char buffer, which will probably be signed, while you need unsigned byte values.
// Note: uint128_t is a placeholder for whatever 128-bit unsigned type your
// compiler provides (e.g. unsigned __int128 on GCC/Clang); see the links above.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
while (file.read(memblock, 16))
{
    uint128_t a = 0;
    for (int i = 0; i < 16; ++i)
    {
        // Shift the previous bytes up and OR in the next byte, masked to 0..255.
        a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
    }
    cout << a << "\n";
}
file.close();
If the number is binary, what you want is:
short value;
file.read(reinterpret_cast<char *>(&value), sizeof(value));
Depending upon how the file was written and your processor, you may have to reverse the bytes in value using bit operations.
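For example, a minimal sketch of reading a big-endian 16-bit value and reversing its bytes with shifts (the file name here is just a placeholder):

#include <cstdint>
#include <fstream>

int main()
{
    std::ifstream file("data.bin", std::ios::binary);   // placeholder file name
    std::uint16_t value = 0;
    file.read(reinterpret_cast<char *>(&value), sizeof(value));

    // If the file is big-endian and the host is little-endian, swap the bytes.
    value = static_cast<std::uint16_t>((value << 8) | (value >> 8));
}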

How to convert ECDSA DER encoded signature data to microsoft CNG supported format?

I am preparing a minidriver to perform signing in a smart card using the NCryptSignHash function of Microsoft CNG.
When I sign with a SECP521R1 EC key in the smart card, it generates signed data 139 bytes long in the ECC signed-data format:
ECDSASignature ::= SEQUENCE {
r INTEGER,
s INTEGER
}
Sample signed data is
308188024201A2001E9C0151C55BCA188F201020A84180B339E61EDE61F6EAD0B277321CAB81C87DAFC2AC65D542D0D0B01C3C5E25E9209C47CFDDFD5BBCAFA0D2AF2E7FD86701024200C103E534BD1378D8B6F5652FB058F7D5045615DCD940462ED0F923073076EF581210D0DD95BF2891358F5F743DB2EC009A0608CEFAA9A40AF41718881D0A26A7F4
But when I sign using MS_KEY_STORAGE_PROVIDER, it generates a signature 132 bytes long.
What is the procedure to reduce the sign data size from 139 to 132?
Your input is in the X9.62 signature format, which is a SEQUENCE containing two ASN.1 / DER encoded INTEGERs. These integers are variable sized, signed, big endian numbers. They are encoded in the minimum number of bytes. This means that the size of the encoding can vary.
The 139 bytes is common because it assumes the maximum size of the encoding for r and s. These values are computed using modular arithmetic and they can therefore contain any number of bits, up to the number of bits of order n, which is the same as the key size, 521 bits.
The 132 bytes are specified by ISO/IEC 7816-8 / IEEE P1363, which is a standard that deals with signatures for smart cards. The signature consists of the concatenation of r and s, where r and s are each encoded in the minimum number of bytes needed to represent any value up to the order n, i.e. the order size in bytes. Here r and s are statically sized, unsigned, big endian numbers.
The calculation of the number of bytes of r or s is ceil((double) n / 8) or (n + 8 - 1) / 8 where 8 is the number of bits in a byte. So if the elliptic curve is 521 bits then the resulting size is 66 bytes, and together they therefore consume 132 bytes.
Now on to the decoding. There are multiple ways of handling this: perform a full ASN.1 parse, obtain the integers and then encode them back again in the ISO 7816-8 form is the most logical one.
However, you can also see that you could simply copy bytes as r and s will always be non-negative (and thus unsigned) and big endian. So you just need to compensate for the size. Otherwise the only hard part is to be able to decode the length of the components within the X9.62 structure.
Warning: the code below is in C# rather than C++, as I assumed the main .NET language; the language was not indicated in the question when I wrote the main part of the answer.
// IllegalSignatureFormatException is assumed to be a custom exception type defined elsewhere.
class ConvertECDSASignature
{
    private static int BYTE_SIZE_BITS = 8;
    private static byte ASN1_SEQUENCE = 0x30;
    private static byte ASN1_INTEGER = 0x02;

    public static byte[] lightweightConvertSignatureFromX9_62ToISO7816_8(int orderInBits, byte[] x9_62)
    {
        int offset = 0;
        if (x9_62[offset++] != ASN1_SEQUENCE)
        {
            throw new IllegalSignatureFormatException("Input is not a SEQUENCE");
        }

        int sequenceSize = parseLength(x9_62, offset, out offset);
        int sequenceValueOffset = offset;

        int nBytes = (orderInBits + BYTE_SIZE_BITS - 1) / BYTE_SIZE_BITS;
        byte[] iso7816_8 = new byte[2 * nBytes];

        // retrieve and copy r
        if (x9_62[offset++] != ASN1_INTEGER)
        {
            throw new IllegalSignatureFormatException("Input is not an INTEGER");
        }

        int rSize = parseLength(x9_62, offset, out offset);
        copyToStatic(x9_62, offset, rSize, iso7816_8, 0, nBytes);
        offset += rSize;

        // --- retrieve and copy s
        if (x9_62[offset++] != ASN1_INTEGER)
        {
            throw new IllegalSignatureFormatException("Input is not an INTEGER");
        }

        int sSize = parseLength(x9_62, offset, out offset);
        copyToStatic(x9_62, offset, sSize, iso7816_8, nBytes, nBytes);
        offset += sSize;

        if (offset != sequenceValueOffset + sequenceSize)
        {
            throw new IllegalSignatureFormatException("SEQUENCE is either too small or too large for the encoding of r and s");
        }

        return iso7816_8;
    }

    /**
     * Copies a variable sized, signed, big endian number to an array as a static sized, unsigned, big endian number.
     * Assumes that the iso7816_8 buffer is zeroized from the iso7816_8Offset for nBytes.
     */
    private static void copyToStatic(byte[] sint, int sintOffset, int sintSize, byte[] iso7816_8, int iso7816_8Offset, int nBytes)
    {
        // if the integer starts with zero, then skip it
        if (sint[sintOffset] == 0x00)
        {
            sintOffset++;
            sintSize--;
        }

        // after skipping the zero byte then the integer must fit
        if (sintSize > nBytes)
        {
            throw new IllegalSignatureFormatException("Number format of r or s too large");
        }

        // copy it into the right place
        Array.Copy(sint, sintOffset, iso7816_8, iso7816_8Offset + nBytes - sintSize, sintSize);
    }

    /*
     * Standalone BER decoding of length value, up to 2^31 - 1.
     */
    private static int parseLength(byte[] input, int startOffset, out int offset)
    {
        offset = startOffset;
        byte l1 = input[offset++];

        // --- return value of single byte length encoding
        if (l1 < 0x80)
        {
            return l1;
        }

        // otherwise the first byte of the length specifies the number of encoding bytes that follows
        int end = offset + (l1 & 0x7F);

        uint result = 0;

        // --- skip leftmost zero bytes (for BER)
        while (offset < end)
        {
            if (input[offset] != 0x00)
            {
                break;
            }
            offset++;
        }

        // --- test against maximum value
        if (end - offset > sizeof(uint))
        {
            throw new IllegalSignatureFormatException("Length of TLV is too large");
        }

        // --- parse multi byte length encoding
        while (offset < end)
        {
            result = (result << BYTE_SIZE_BITS) ^ input[offset++];
        }

        // --- make sure that the uint isn't larger than an int can handle
        if (result > Int32.MaxValue)
        {
            throw new IllegalSignatureFormatException("Length of TLV is too large");
        }

        // --- return multi byte length encoding
        return (int) result;
    }
}
Note that the code is somewhat permissive in that it doesn't require the minimum length encoding for the SEQUENCE and INTEGER lengths (which it should).
It also allows wrongly encoded INTEGER values that are unnecessarily left-padded with zero bytes.
Neither of these issues should break the security of the algorithm but other libraries may and should be less permissive.
What is the procedure to reduce the sign data size from 139 to 132?
You have an ASN.1 encoded signature (shown below). It is used by Java, OpenSSL and some other libraries. You need the signature in P1363 format, which is a concatenation of r || s, without the ASN.1 encoding. P1363 is used by Crypto++ and a few other libraries. (There's another common signature format, and that is OpenPGP).
For the concatenation of r || s, both r and s must be 66 bytes, because that is the secp521r1 field element size rounded up to an octet boundary. That means the procedure is: strip the outer SEQUENCE, strip the two INTEGERs, and concatenate the values of the two integers, each left-padded to 66 bytes.
Your formatted r || s signature using your sample data will be:
01 A2 00 1E ... 7F D8 67 01 || 00 C1 03 E5 ... 0A 26 A7 F4
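As a C++ counterpart to the C# routine in the other answer, a minimal sketch of that strip-and-concatenate procedure might look like this (the helper names are made up for this example, and it deliberately performs only lightweight checks rather than strict DER validation):

#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Parse a BER/DER length at in[pos], advancing pos (short and long form).
static std::size_t parse_length(const std::vector<unsigned char> &in, std::size_t &pos)
{
    unsigned char first = in.at(pos++);
    if (first < 0x80)
        return first;                       // short form
    std::size_t len = 0;
    for (int i = 0; i < (first & 0x7F); ++i)
        len = (len << 8) | in.at(pos++);    // long form: next (first & 0x7F) bytes
    return len;
}

// Convert an X9.62/DER ECDSA signature to P1363 (r || s), with r and s each
// padded to n bytes (n = 66 for secp521r1).
std::vector<unsigned char> der_to_p1363(const std::vector<unsigned char> &der, std::size_t n)
{
    std::size_t pos = 0;
    if (der.at(pos++) != 0x30)
        throw std::runtime_error("not a SEQUENCE");
    parse_length(der, pos);                           // skip the SEQUENCE length

    std::vector<unsigned char> out(2 * n, 0x00);
    for (std::size_t part = 0; part < 2; ++part)      // part 0 is r, part 1 is s
    {
        if (der.at(pos++) != 0x02)
            throw std::runtime_error("not an INTEGER");
        std::size_t len = parse_length(der, pos);
        while (len > 0 && der[pos] == 0x00) { ++pos; --len; }   // drop sign padding
        if (len > n)
            throw std::runtime_error("integer too large for the curve order");
        std::memcpy(out.data() + part * n + (n - len), der.data() + pos, len);
        pos += len;
    }
    return out;
}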
Microsoft .Net 2.0 has ASN.1 classes that allow you to manipulate ASN.1 encoded data. See AsnEncodedData class.
$ echo 308188024201A2001E9C0151C55BCA188F201020A84180B339E61EDE61F6EAD0B277321CAB
81C87DAFC2AC65D542D0D0B01C3C5E25E9209C47CFDDFD5BBCAFA0D2AF2E7FD86701024200C103E5
34BD1378D8B6F5652FB058F7D5045615DCD940462ED0F923073076EF581210D0DD95BF2891358F5F
743DB2EC009A0608CEFAA9A40AF41718881D0A26A7F4 | xxd -r -p > signature.bin
$ dumpasn1 signature.bin
0 136: SEQUENCE {
3 66: INTEGER
: 01 A2 00 1E 9C 01 51 C5 5B CA 18 8F 20 10 20 A8
: 41 80 B3 39 E6 1E DE 61 F6 EA D0 B2 77 32 1C AB
: 81 C8 7D AF C2 AC 65 D5 42 D0 D0 B0 1C 3C 5E 25
: E9 20 9C 47 CF DD FD 5B BC AF A0 D2 AF 2E 7F D8
: 67 01
71 66: INTEGER
: 00 C1 03 E5 34 BD 13 78 D8 B6 F5 65 2F B0 58 F7
: D5 04 56 15 DC D9 40 46 2E D0 F9 23 07 30 76 EF
: 58 12 10 D0 DD 95 BF 28 91 35 8F 5F 74 3D B2 EC
: 00 9A 06 08 CE FA A9 A4 0A F4 17 18 88 1D 0A 26
: A7 F4
: }
0 warnings, 0 errors.
Another noteworthy item is, .Net uses the XML format detailed in RFC 3275, XML-Signature Syntax and Processing. It is a different format than ASN.1, P1363, OpenPGP, CNG and other libraries.
The ASN.1 to P1363 conversion is rather trivial. You can see an example using the Crypto++ library at ECDSA sign with BouncyCastle and verify with Crypto++.
You might find Cryptographic Interoperability: Digital Signatures on Code Project helpful.

C++ Read Binary File with "ieee-be" machinefmt

I am basically trying to convert Matlab code to C++, reading a binary file whose layout I do not really know.
The Matlab Code is simplified as follows:
x=zeros(48,32);
fid=fopen('pres_00.bin','r','ieee-be');
fseek(fid,ipos,'bof');
x(1:4:48,:)=fread(fid,[12,32],'single');
In the end we basically get double-precision numbers in the x array (rows 1, 5, ...).
How can I read the *.bin file in C++? I tried:
file1.seekg(0, ios::end);
int length = file1.tellg();
file1.seekg(ipos, ios_base::beg);
length = length - ipos;
char *buffer = new char[length];
file1.read(buffer, length);
double *double_values = (double *)buffer;
double test = double_values[0];
file1.close();
Sadly "test" is not similar to the number matlab is encoding out of the binary file. How can I implement the information with the ieee-be encoding into c++?
Unfortunately I'm not that familiar with binary files...
Cheers and thanks for your help!
//edit:
Maybe it helps:
In my case
ipos = 0
the first hex row (offset 0, 32 bytes):
44 7C CD 35 44 7C AD 89 44 7C E9 F2 44 7D F7 10 44 7D 9C F9 44 7B F9 E4 44 7B 3E 1D 44 7B 6C CE
ANSI: D|Í5D|.‰D|éòD}÷.D}œùD{ùäD{>.D{lÎ
First value in Matlab: 1.011206359863281e+03
What my Code reads in buffer: D|Í5D|-‰.D|éòD}÷.\x10D}œùD{ùäD{>\x1dD{lÎ......
double test = -4.6818882332480884e-262
There are two parts to this problem. First, the representation is IEEE 32 bit floating point; since most processors use IEEE floating point, all you need is a simple cast to do the conversion. This won't be portable to all processors though. The second part is the be in the ieee-be specification, it means that the bytes are stored big-endian. Since many processors (i.e. Intel/AMD) are little-endian, you need to do a byte swap before the conversion.
#include <utility>   // std::swap (use <algorithm> before C++11)

void byteswap4(char *p)
{
    std::swap(p[0], p[3]);
    std::swap(p[1], p[2]);
}

float to_float(char *p)
{
    return *((float *)p);
}
See it in action: https://ideone.com/IrDEJF
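Tying it together, here is a minimal end-to-end sketch of the approach described above: read the raw bytes, swap each 4-byte group from big-endian to host order, then reinterpret the bytes as float (memcpy is used to avoid aliasing issues). The file name and ipos are taken from the question; the 12 x 32 block size matches the Matlab fread call:

#include <cstring>
#include <fstream>
#include <utility>
#include <vector>

int main()
{
    std::ifstream file1("pres_00.bin", std::ios::binary);
    const std::streamoff ipos = 0;                  // offset from the question's edit
    file1.seekg(ipos, std::ios::beg);

    std::vector<char> raw(12 * 32 * 4);             // 12 x 32 singles, 4 bytes each
    file1.read(raw.data(), static_cast<std::streamsize>(raw.size()));

    std::vector<float> values(12 * 32);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        char *p = &raw[i * 4];
        std::swap(p[0], p[3]);                      // big-endian -> little-endian
        std::swap(p[1], p[2]);
        std::memcpy(&values[i], p, 4);              // reinterpret the 4 bytes as float
    }
    // values[0] should be roughly 1.011206e+03 for the sample dump shown above.
}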

Odd result when reading first byte of a PNG header

I'm trying to read the header from a PNG file.
The result should be
Dec: 137 80 78 71 13 10 26 10
Hex: 89 50 4E 47 0D 0A 1A 0A
However, I get
Dec: 4294967 80 78 71 13 10 26 10
What am I doing wrong?
Code:
char T;
pngFile = fopen(Filename, "rb");
if (pngFile)
{
    fread(&T, 1, 1, pngFile);
    fclose(pngFile);
    printf("T: %u\n", T);
}
137 is too big for a signed char; use unsigned char instead.
See this link for the limits of the data types.
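A minimal sketch of that fix, reading into an unsigned char so the value 137 (0x89) is not sign-extended when printed (the file name is just a placeholder):

#include <cstdio>

int main()
{
    unsigned char T = 0;
    std::FILE *pngFile = std::fopen("image.png", "rb");   // placeholder file name
    if (pngFile)
    {
        std::fread(&T, 1, 1, pngFile);
        std::fclose(pngFile);
        std::printf("T: %u\n", T);                         // prints 137 for a PNG
    }
}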