My Protobuf message consists of three doubles:
syntax = "proto3";
message TestMessage{
double input = 1;
double output = 2;
double info = 3;
}
When I set these values to
test.set_input(2.3456);
test.set_output(5.4321);
test.set_info(5.0);
the serialized message looks like
00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
00000010 15 40 19 |.@.|
00000013
when using test.SerializeToArray, and it could not be deserialized successfully by a Go program using the same protobuf message. When trying to read it with a C++ program, I got 0 as info, so the message seems to be corrupted.
When using test.SerializeToOstream I got this message, which could be deserialized successfully by both the Go and C++ programs.
00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
00000010 15 40 19 00 00 00 00 00 00 14 40 |.@........@|
0000001b
When setting the values to
test.set_input(2.3456);
test.set_output(5.4321);
test.set_info(5.5678);
the serialized messages produced by both test.SerializeToArray and test.SerializeToOstream look like
00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
00000010 15 40 19 da ac fa 5c 6d 45 16 40 |.@....\mE.@|
0000001b
and could be successfully read by my Go and C++ programs.
What am I missing here? Why is SerializeToArray not working in the first case?
EDIT:
As it turns out, SerializeToString works fine, too.
Here is the code I used for the comparison:
file_a.open(FILEPATH_A);
file_b.open(FILEPATH_B);
test.set_input(2.3456);
test.set_output(5.4321);
test.set_info(5.0);
//serializeToArray
int size = test.ByteSize();
char *buffer = (char*) malloc(size);
test.SerializeToArray(buffer, size);
file_a << buffer;
//serializeToString
std::string buf;
test.SerializeToString(&buf);
file_b << buf;
file_a.close();
file_b.close();
Why does SerializeToArray not work as expected?
EDIT2:
When using file_b << buf.data() instead of file_b << buf, the data gets corrupted as well, but why?
I think the error you're making is treating binary data as character data and using character-string APIs. Many of those APIs stop at the first NUL byte (0), but that is a totally valid value in protobuf binary.
You basically need to make sure you don't use any such APIs: stick purely to binary-safe APIs.
Since you indicate that size is 27, this all fits.
Basically, the binary representation of 5.0 includes zero bytes, but you could easily have seen the same problem with other values in time.
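For illustration, a minimal sketch of a binary-safe version of the comparison code from the question (it assumes the same test object and FILEPATH_A/FILEPATH_B as above); the key change is using ostream::write() with an explicit length instead of operator<<, which stops at the first NUL when given a char*:
// Minimal sketch, assuming the TestMessage object and file paths
// from the question; streams opened in binary mode.
#include <fstream>
#include <string>
#include <vector>

std::ofstream file_a(FILEPATH_A, std::ios::binary);
std::ofstream file_b(FILEPATH_B, std::ios::binary);

// SerializeToArray: write exactly `size` bytes, embedded zeros included.
int size = test.ByteSize();
std::vector<char> buffer(size);
test.SerializeToArray(buffer.data(), size);
file_a.write(buffer.data(), size);

// SerializeToString: write the string's full length, not up to the first NUL.
std::string buf;
test.SerializeToString(&buf);
file_b.write(buf.data(), static_cast<std::streamsize>(buf.size()));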
I am using a Raspberry Pi Pico with the Arduino IDE. I am using this library githublink for it. There are three examples in this link; ArduinoUniqueID and ArduinoUniqueID8 don't print anything. The IDE says
WARNING: library ArduinoUniqueID claims to run on avr, esp8266, esp32, sam, samd, stm32 architecture(s) and may be incompatible with your current board which runs on mbed_rp2040 architecture(s).
(but the GitHub page says RP2040 support was added)
When I try to use the last example, ArduinoUniqueIDSerialUSB, it prints something, but the values are not correct. It prints these:
UniqueID: 30 00 33 00 39 00 31 00 36 00 30 00 45 00 36 00 32 00 41 00 38 00 32 00 34 00 38 00 43 00 33 00
UniqueID: 34 00 38 00 43 00 33 00
The correct unique ID values are here (I printed these with MicroPython):
hex value of s = e660a4931754432c
type s = <class 'bytes'>
s = b'\xe6`\xa4\x93\x17TC,'
I don't even know what values like 34 00 38 00 43 00 33 00 are; I tried converting them as hex but it prints the same thing.
How can I find the Pico's unique ID with Arduino code?
A unique ID for the Pico (and most RP2040 boards) is derived from the serial number of the flash chip. The Pico SDK has functions to get that ID. You can retrieve it directly from the flash using flash_get_unique_id(uint8_t* id_out), which is what the library linked above does. The documentation for that is here.
Alternatively, you can get the unique ID from the MCU. The two functions for retrieving it are pico_get_unique_board_id(pico_unique_board_id_t* id_out), which returns the ID as a byte array, and pico_get_unique_board_id_string(char* id_out, uint len), which returns it as a string. The documentation for that is here. (A short sketch using the string variant appears after the answers below.)
Those values are hex, and they are coming from the library's UniqueID buffer, which it looks like is being improperly filled with the unique ID. The code below should instead do what you need.
uint8_t UniqueID[8];

void UniqueIDdump(Stream &stream)
{
  flash_get_unique_id(UniqueID);
  stream.print("UniqueID: ");
  for (size_t i = 0; i < 8; i++)
  {
    if (UniqueID[i] < 0x10)
      stream.print("0");  // zero-pad single hex digits
    stream.print(UniqueID[i], HEX);
    stream.print(" ");
  }
  stream.println();
}
"UniqueID: 30 00 33 " etc is a unicode string "039160E62A8248C348C3" not hex.
Also, for Earle Philhower's Pico core, just add extern "C" void flash_get_unique_id(uint8_t *p); to your sketch and you can access the function required by the UniqueIDdump example above.
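For completeness, a minimal sketch of the SDK route mentioned earlier, using pico_get_unique_board_id_string(); it assumes the Pico SDK's pico/unique_id.h header is reachable from your sketch (it ships with cores built on the SDK):
// Minimal sketch, assuming the Pico SDK's unique_id API is available.
// PICO_UNIQUE_BOARD_ID_SIZE_BYTES is 8, so the string buffer needs two
// hex digits per byte plus a terminating NUL.
#include <Arduino.h>
#include "pico/unique_id.h"

void setup() {
  Serial.begin(115200);
  while (!Serial) { }  // wait for the USB serial port to come up

  char id_str[2 * PICO_UNIQUE_BOARD_ID_SIZE_BYTES + 1];
  pico_get_unique_board_id_string(id_str, sizeof id_str);
  Serial.print("UniqueID: ");
  Serial.println(id_str);
}

void loop() { }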
My Visual Studio project, which uses the gRPC library, has memory leaks. After some R&D I made a little project to reproduce the problem and found that I don't even need to call any gRPC object in my code.
My steps:
1) Get helloworld.proto from the examples
2) Generate the C++ files
3) Create a C++ project with the following code:
#include "helloworld.grpc.pb.h"
void f(){
helloworld::HelloRequest request;
}
int main(){
_CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
return 0;
}
Part of the output (the full dump has 240 lines):
Detected memory leaks!
Dumping objects ->
{1450} normal block at 0x00FD77A0, 16 bytes long.
Data: <`{ t C | > 60 7B FD 00 20 74 FD 00 84 43 CA 00 88 7C CA 00
{1449} normal block at 0x00FECA30, 48 bytes long.
Data: <google.protobuf.> 67 6F 6F 67 6C 65 2E 70 72 6F 74 6F 62 75 66 2E
{1448} normal block at 0x00FEA048, 8 bytes long.
Data: < > 20 C6 FE 00 00 00 00 00
{1447} normal block at 0x00FEC610, 52 bytes long.
Data: < v p" v > B8 76 FC 00 70 22 FE 00 B8 76 FC 00 00 00 CD CD
{1441} normal block at 0x00FEA610, 32 bytes long.
Data: <google.protobuf.> 67 6F 6F 67 6C 65 2E 70 72 6F 74 6F 62 75 66 2E
{1440} normal block at 0x00FE9B78, 8 bytes long.
If I add a google::protobuf::ShutdownProtobufLibrary(); line before return 0;, I get much less output. Only this:
Detected memory leaks!
Dumping objects ->
{160} normal block at 0x00FCD858, 4 bytes long.
Data: < > 18 D6 B9 00
{159} normal block at 0x00FCD618, 4 bytes long.
Data: < > > C8 3E B9 00
{158} normal block at 0x00FCD678, 4 bytes long.
Data: < ? > D0 3F B9 00
Object dump complete.
But if I include some additional generated sources with many big services and messages, the memory dump will be much bigger.
Since I really don't use any gRPC objects directly, the only thing I can imagine is that some static objects are still alive when the VS memory dumper starts its work.
Is there a way to fix this, or a suggestion for what I can do about it?
UPD:
I did some additional work on this problem and opened a new issue on the grpc repository bug tracker: https://github.com/grpc/grpc/issues/22506
The problem description in that issue contains screenshots of the leaked allocations' call stacks and gRPC debug traces.
UPD2:
I found all of them (version 1.23.0). I left a detailed comment there: https://github.com/grpc/grpc/issues/22506#issuecomment-618406755
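For reference, a minimal sketch of the repro with the teardown call from the first UPD in place; this only illustrates the ShutdownProtobufLibrary() placement the question describes, not whatever the issue thread ultimately concluded:
// Minimal sketch, assuming a Windows debug-CRT build and the generated
// helloworld sources from the question.
#include <crtdbg.h>
#include "helloworld.grpc.pb.h"

void f() {
    helloworld::HelloRequest request;  // touches protobuf's static descriptor state
}

int main() {
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);  // dump leaks at exit
    f();
    // Free protobuf's internal singletons before the exit-time leak dump,
    // so they are not reported as leaks.
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}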
I'm generating a big char array for future passing to a thread with strcpy and strcat. It was all going OK until I needed to substitute all occurrences of the space character with a comma in one of the strings. I searched for the solution to this here.
Problem is, now I have a memory leak and the program exits with this message:
Dumping objects ->
{473} normal block at 0x0091E0C0, 32 bytes long.
Data: <AMLUH UL619 BKD > 41 4D 4C 55 48 20 55 4C 36 31 39 20 42 4B 44 20
{472} normal block at 0x049CCD20, 8 bytes long.
Data: < > BC ED 18 00 F0 EC 18 00
{416} normal block at 0x082B5158, 1000 bytes long.
Data: <Number of Aircra> 4E 75 6D 62 65 72 20 6F 66 20 41 69 72 63 72 61
{415} normal block at 0x04A0E200, 20 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{185} normal block at 0x049DA998, 64 bytes long.
Data: < O X8 8 > DC 4F BB 58 38 C5 9A 06 38 D3 88 00 00 00 00 00
PythonPlugin.cpp(76) : {172} normal block at 0x0088D338, 72 bytes long.
Data: < a X F <) > DC 61 BB 58 18 BB 46 06 3C 29 8A 06 CD CD CD CD
Object dump complete.
Here's the code so you can tell me what I'm doing wrong:
Code of the problem:
char* loop_planes(ac){
char *char1=new char[1000];
for(...){
strcpy(char1,"Number of Aircrafts\nHour of simulation\n\n");
string tmp2=fp.GetRoute();
tmp2.replace(tmp2.begin(),tmp2.end()," ",","); // PROBLEM IS IN THIS LINE
const char *tmp3=tmp2.c_str();
strcat(char1,tmp3);
}
return char1;
}
The fp.GetRoute() is a string like this: AMLUH UL619 BKD UM748 RUTOL
Also, now that I'm talking about memory allocation, I don't want any future problems with memory leaks, so when should I delete char1, knowing that the thread is going to call this function?
When you call std::string::replace, the best match is a function template whose third and fourth parameters are input iterators. So the string literals you are passing are interpreted as the start and end of a range, which they are not. This leads to undefined behaviour.
You can fix this easily by using the algorithm std::replace instead:
std::replace(tmp2.begin(),tmp2.end(),' ',',');
Note that here the third and fourth parameters are single chars.
The answer from #juanchopanza correctly identifies and fixes the original question, but since you've asked about memory leaks in general, I'd like to additionally suggest that you replace your function with something that doesn't use new or delete or strcpy or strcat.
std::string loop_planes() {
std::string res("Number of Aircrafts\nHour of simulation\n\n");
for (...) {
std::string route = fp.GetRoute();
std::replace(route.begin(), route.end(), ' ',',');
res += route;
}
return res;
}
This doesn't require any explicit memory allocation or deletion and does not leak memory. I also took the liberty of changing the return type from char * to std::string to eliminate messy conversions.
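As a usage note on the deletion question: with the std::string version there is nothing to delete. A minimal sketch, assuming the loop_planes() above, of handing the result to a thread:
// Minimal sketch: the string is passed by value into the thread, so its
// memory is released automatically when the thread is done with it.
#include <string>
#include <thread>

void consume(std::string report) {
    // ... hand the report to whatever needs it ...
}

int main() {
    std::thread worker(consume, loop_planes());  // no new/delete anywhere
    worker.join();
    return 0;
}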
I apologise in advance for the n00bishness of asking this question, but I've been stuck for ages and I'm struggling to figure out what to do next. Essentially, I am trying to perform ElGamal encryption on some data. I have been given the public part of an ephemeral key pair and a second static key, as well as some data. If my understanding is correct, this is all I need to perform the encryption, but I'm struggling to figure out how using Crypto++.
I've looked endlessly for examples, but I can find literally zero on Google. Ohloh is less than helpful as I just get back endless pages of the cryptopp ElGamal source files, which I can't seem to be able to figure out (I'm relatively new to using Crypto++ and until about 3 days ago hadn't even heard of ElGamal).
The closest I've been able to find as an example comes from the CryptoPP package itself, which is as follows:
bool ValidateElGamal()
{
cout << "\nElGamal validation suite running...\n\n";
bool pass = true;
{
FileSource fc("TestData/elgc1024.dat", true, new HexDecoder);
ElGamalDecryptor privC(fc);
ElGamalEncryptor pubC(privC);
privC.AccessKey().Precompute();
ByteQueue queue;
privC.AccessKey().SavePrecomputation(queue);
privC.AccessKey().LoadPrecomputation(queue);
pass = CryptoSystemValidate(privC, pubC) && pass;
}
return pass;
}
However, this doesn't really seem to help me much as I'm unaware of how to plug in my already computed values. I am not sure if I'm struggling with my understanding of how Elgamal works (entirely possible) or if I'm just being an idiot when it comes to using what I've got with CryptoPP. Can anyone help point me in the right direction?
I have been given the public part of an ephemeral key pair and a second static key, as well as some data.
We can't really help you here because we know nothing about what is supposed to be done.
The ephemeral key pair is probably for simulating key exchange, and the static key is a long-term key for signing the ephemeral exchange. Other than that, it's anybody's guess as to what's going on.
Would you happen to know what the keys are? Is the ephemeral key a Diffie-Hellman key and the static key an ElGamal signing key?
If my understanding is correct, this is all I need to perform the encryption, but I'm struggling to figure out how using Crypto++.
For the encryption example, I'm going to cheat a bit and use an RSA encryption example and port it to ElGamal. This is about as difficult as copy and paste, because both RSA encryption and ElGamal encryption adhere to the PK_Encryptor and PK_Decryptor interfaces. See the PK_Encryptor and PK_Decryptor classes for details. (And keep in mind, you might need an ElGamal or Nyberg-Rueppel (NR) signing example.)
Crypto++ has a cryptosystem built on ElGamal. The cryptosystem will encrypt a large block of plain text under a symmetric key, and then encrypt the symmetric key under the ElGamal key. I'm not sure what standard it follows, though (likely IEEE's P1363). See SymmetricEncrypt and SymmetricDecrypt in elgamal.h.
The key size is artificially small so the program runs quickly. ElGamal is a discrete log problem, so its key size should be 2048-bits or higher in practice. 2048-bits is blessed by ECRYPT (Asia), ISO/IEC (Worldwide), NESSIE (Europe), and NIST (US).
If you need to save/persist/load the keys you generate, then see Keys and Formats on the Crypto++ wiki. The short answer is to call decryptor.Save() and decryptor.Load(); and stay away from the {BER|DER} encodings.
If you want, you can use a standard string rather than a SecByteBlock. The string will be easier if you are interested in printing stuff to the terminal via cout and friends.
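For example, here is a minimal sketch of the std::string route using Crypto++'s pipelining (StringSource plus PK_EncryptorFilter/PK_DecryptorFilter from filters.h); it assumes the rng, encryptor, and decryptor objects set up as in the program below:
// Minimal sketch, assuming rng/encryptor/decryptor from the full program.
#include <cryptopp/filters.h>
#include <string>

std::string plain = "ElGamal test", cipher, recoveredText;

// Encrypt: the StringSource owns and deletes the filter chain.
CryptoPP::StringSource es(plain, true,
    new CryptoPP::PK_EncryptorFilter(rng, encryptor,
        new CryptoPP::StringSink(cipher)));

// Decrypt back into a printable std::string.
CryptoPP::StringSource ds(cipher, true,
    new CryptoPP::PK_DecryptorFilter(rng, decryptor,
        new CryptoPP::StringSink(recoveredText)));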
Finally, there's now a page on the Crypto++ Wiki covering the topic with the source code for the program below. See Crypto++'s ElGamal Encryption.
#include <iostream>
using std::cout;
using std::cerr;
using std::endl;
#include <cryptopp/osrng.h>
using CryptoPP::AutoSeededRandomPool;
#include <cryptopp/secblock.h>
using CryptoPP::SecByteBlock;
#include <cryptopp/elgamal.h>
using CryptoPP::ElGamal;
using CryptoPP::ElGamalKeys;
#include <cryptopp/cryptlib.h>
using CryptoPP::DecodingResult;
#include <cassert>
#include <cstring>
int main(int argc, char* argv[])
{
////////////////////////////////////////////////
// Generate keys
AutoSeededRandomPool rng;
cout << "Generating private key. This may take some time..." << endl;
ElGamal::Decryptor decryptor;
decryptor.AccessKey().GenerateRandomWithKeySize(rng, 512);
const ElGamalKeys::PrivateKey& privateKey = decryptor.AccessKey();
ElGamal::Encryptor encryptor(decryptor);
const ElGamalKeys::PublicKey& publicKey = encryptor.AccessKey();
////////////////////////////////////////////////
// Secret to protect
static const int SECRET_SIZE = 16;
SecByteBlock plaintext( SECRET_SIZE );
memset( plaintext, 'A', SECRET_SIZE );
////////////////////////////////////////////////
// Encrypt
// Now that there is a concrete object, we can validate
assert( 0 != encryptor.FixedMaxPlaintextLength() );
assert( plaintext.size() <= encryptor.FixedMaxPlaintextLength() );
// Create cipher text space
size_t ecl = encryptor.CiphertextLength( plaintext.size() );
assert( 0 != ecl );
SecByteBlock ciphertext( ecl );
encryptor.Encrypt( rng, plaintext, plaintext.size(), ciphertext );
////////////////////////////////////////////////
// Decrypt
// Now that there is a concrete object, we can check sizes
assert( 0 != decryptor.FixedCiphertextLength() );
assert( ciphertext.size() <= decryptor.FixedCiphertextLength() );
// Create recovered text space
size_t dpl = decryptor.MaxPlaintextLength( ciphertext.size() );
assert( 0 != dpl );
SecByteBlock recovered( dpl );
DecodingResult result = decryptor.Decrypt( rng, ciphertext, ciphertext.size(), recovered );
// More sanity checks
assert( result.isValidCoding );
assert( result.messageLength <= decryptor.MaxPlaintextLength( ciphertext.size() ) );
// At this point, we can set the size of the recovered
// data. Until decryption occurs (successfully), we
// only know its maximum size
recovered.resize( result.messageLength );
// SecByteBlock is overloaded for proper results below
assert( plaintext == recovered );
// If the assert fires, we won't get this far.
if(plaintext == recovered)
cout << "Recovered plain text" << endl;
else
cout << "Failed to recover plain text" << endl;
return !(plaintext == recovered);
}
You can also create the Decryptor from a PrivateKey like so:
ElGamalKeys::PrivateKey k;
k.GenerateRandomWithKeySize(rng, 512);
ElGamal::Decryptor d(k);
...
And an Encryptor from a PublicKey:
ElGamalKeys::PublicKey pk;
privateKey.MakePublicKey(pk);
ElGamal::Encryptor e(pk);
You can save and load keys to and from disk as follows:
ElGamalKeys::PrivateKey privateKey1;
privateKey1.GenerateRandomWithKeySize(prng, 2048);
privateKey1.Save(FileSink("elgamal.der", true /*binary*/).Ref());
ElGamalKeys::PrivateKey privateKey2;
privateKey2.Load(FileSource("elgamal.der", true /*pump*/).Ref());
privateKey2.Validate(prng, 3);
ElGamal::Decryptor decryptor(privateKey2);
// ...
The keys are ASN.1 encoded, so you can dump them with something like Peter Gutmann's dumpasn1:
$ ./cryptopp-elgamal-keys.exe
Generating private key. This may take some time...
$ dumpasn1 elgamal.der
0 556: SEQUENCE {
4 257: INTEGER
: 00 C0 8F 5A 29 88 82 8C 88 7D 00 AE 08 F0 37 AC
: FA F3 6B FC 4D B2 EF 5D 65 92 FD 39 98 04 C7 6D
: 6D 74 F5 FA 84 8F 56 0C DD B4 96 B2 51 81 E3 A1
: 75 F6 BE 82 46 67 92 F2 B3 EC 41 00 70 5C 45 BF
: 40 A0 2C EC 15 49 AD 92 F1 3E 4D 06 E2 89 C6 5F
: 0A 5A 88 32 3D BD 66 59 12 A1 CB 15 B1 72 FE F3
: 2D 19 DD 07 DF A8 D6 4C B8 D0 AB 22 7C F2 79 4B
: 6D 23 CE 40 EC FB DF B8 68 A4 8E 52 A9 9B 22 F1
: [ Another 129 bytes skipped ]
265 1: INTEGER 3
268 257: INTEGER
: 00 BA 4D ED 20 E8 36 AC 01 F6 5C 9C DA 62 11 BB
: E9 71 D0 AB B7 E2 D3 61 37 E2 7B 5C B3 77 2C C9
: FC DE 43 70 AE AA 5A 3C 80 0A 2E B0 FA C9 18 E5
: 1C 72 86 46 96 E9 9A 44 08 FF 43 62 95 BE D7 37
: F8 99 16 59 7D FA 3A 73 DD 0D C8 CA 19 B8 6D CA
: 8D 8E 89 52 50 4E 3A 84 B3 17 BD 71 1A 1D 38 9E
: 4A C4 04 F3 A2 1A F7 1F 34 F0 5A B9 CD B4 E2 7F
: 8C 40 18 22 58 85 14 40 E0 BF 01 2D 52 B7 69 7B
: [ Another 129 bytes skipped ]
529 29: INTEGER
: 01 61 40 24 1F 48 00 4C 35 86 0B 9D 02 8C B8 90
: B1 56 CF BD A4 75 FE E2 8E 0B B3 66 08
: }
0 warnings, 0 errors.
I've been working with a legacy application and I'm trying to work out the difference between applications compiled with Multi byte character set and Not Set under the Character Set option.
I understand that compiling with Multi byte character set defines _MBCS which allows multi byte character set code pages to be used, and using Not set doesn't define _MBCS, in which case only single byte character set code pages are allowed.
In the case that Not Set is used, I'm assuming then that we can only use the single byte character set code pages found on this page: http://msdn.microsoft.com/en-gb/goglobal/bb964654.aspx
Therefore, am I correct in thinking that if Not Set is used, the application won't be able to encode and write or read Far Eastern languages, since they are defined in double-byte character set code pages (and of course Unicode)?
Following on from this, if Multi byte character set is defined, are both single and multi byte character set code pages available, or only multi byte character set code pages? I'm guessing it must be both for European languages to be supported.
Thanks,
Andy
Further Reading
The answers on these pages didn't answer my question, but helped in my understanding:
About the "Character set" option in visual studio 2010
Research
So, just as working research... with my locale set to Japanese:
Effect on hard coded strings
char *foo = "Jap text: テスト";
wchar_t *bar = L"Jap text: テスト";
Compiling with Unicode
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Compiling with Multi byte character set
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Compiling with Not Set
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Conclusion:
The character set option doesn't have any effect on hard-coded strings: char strings seem to use the locale-defined code page, and wchar_t strings seem to use either UCS-2 or UTF-16.
Using encoded strings in W/A versions of Win32 APIs
So, using the following code:
char *foo = "C:\\Temp\\テスト\\テa.txt";
wchar_t *bar = L"C:\\Temp\\テスト\\テw.txt";
CreateFileA(foo, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
CreateFileW(bar, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
Compiling with Unicode
Result: Both files are created
Compiling with Multi byte character set
Result: Both files are created
Compiling with Not set
Result: Both files are created
Conclusion:
Both the A and W versions of the API expect the same encoding regardless of the character set chosen. From this, perhaps we can assume that all the Character Set option does is switch between the versions of the API. So the A version always expects strings in the encoding of the current code page and the W version always expects UTF-16 or UCS-2.
Opening files using W and A Win32 APIs
So using the following code:
char filea[MAX_PATH] = {0};
OPENFILENAMEA ofna = {0};
ofna.lStructSize = sizeof ( ofna );
ofna.hwndOwner = NULL ;
ofna.lpstrFile = filea ;
ofna.nMaxFile = MAX_PATH;
ofna.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
ofna.nFilterIndex =1;
ofna.lpstrFileTitle = NULL ;
ofna.nMaxFileTitle = 0 ;
ofna.lpstrInitialDir=NULL ;
ofna.Flags = OFN_PATHMUSTEXIST|OFN_FILEMUSTEXIST ;
wchar_t filew[MAX_PATH] = {0};
OPENFILENAMEW ofnw = {0};
ofnw.lStructSize = sizeof ( ofnw );
ofnw.hwndOwner = NULL ;
ofnw.lpstrFile = filew ;
ofnw.nMaxFile = MAX_PATH;
ofnw.lpstrFilter = L"All\0*.*\0Text\0*.TXT\0";
ofnw.nFilterIndex =1;
ofnw.lpstrFileTitle = NULL;
ofnw.nMaxFileTitle = 0 ;
ofnw.lpstrInitialDir=NULL ;
ofnw.Flags = OFN_PATHMUSTEXIST|OFN_FILEMUSTEXIST ;
GetOpenFileNameA(&ofna);
GetOpenFileNameW(&ofnw);
and selecting either:
C:\Temp\テスト\テopena.txt
C:\Temp\テスト\テopenw.txt
Yields:
When compiled with Unicode
*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74
00 == UTF-16 or UCS-2
When compiled with Multi byte character set
*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74
00 == UTF-16 or UCS-2
When compiled with Not Set
*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74
00 == UTF-16 or UCS-2
Conclusion:
Again, the Character Set setting doesn't have a bearing on the behaviour of the Win32 API. The A version always seems to return a string with the encoding of the active code page and the W one always returns UTF-16 or UCS-2. I can actually see this explained a bit in this great answer: https://stackoverflow.com/a/3299860/187100.
Ultimate Conclusion
Hans appears to be correct when he says that the define doesn't really have any magic to it, beyond changing the Win32 APIs to use either W or A. Therefore, I can't really see any difference between Not Set and Multi byte character set.
No, that's not really the way it works. The only thing that happens is that the macro gets defined, it doesn't otherwise have a magic effect on the compiler. It is very rare to actually write code that uses #ifdef _MBCS to test this macro.
You almost always leave it up to a helper function to make the conversion, like WideCharToMultiByte(), OLE2A() or wcstombs(). These are conversion functions that always consider multi-byte encodings, as guided by the code page. _MBCS is a historical accident, relevant only 25+ years ago when multi-byte encodings were not common yet. Much like using a non-Unicode encoding is a historical artifact these days as well.
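To make that concrete, a minimal sketch of such a helper built on WideCharToMultiByte(); CP_ACP selects the active code page, which is what the A-suffixed APIs expect, and error handling is trimmed for brevity:
// Minimal sketch: UTF-16 to active-code-page conversion.
#include <windows.h>
#include <string>

std::string narrow_from_wide(const wchar_t* wide) {
    // First call with a null buffer returns the required size (incl. NUL).
    int len = WideCharToMultiByte(CP_ACP, 0, wide, -1, nullptr, 0, nullptr, nullptr);
    if (len <= 0) return std::string();
    std::string out(static_cast<size_t>(len), '\0');
    WideCharToMultiByte(CP_ACP, 0, wide, -1, &out[0], len, nullptr, nullptr);
    out.resize(static_cast<size_t>(len) - 1);  // drop the terminating NUL
    return out;
}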
In the reference it is stated that:
By definition, the ASCII character set is a subset of all
multibyte-character sets. In many multibyte character sets, each
character in the range 0x00 – 0x7F is identical to the character that
has the same value in the ASCII character set. For example, in both
ASCII and MBCS character strings, the 1-byte NULL character ('\0') has
value 0x00 and indicates the terminating null character.
As you guessed, by enabling _MBCS Visual Studio also supports the single-byte ASCII character set.
In a second reference, single-byte character sets seem to be supported even if we enable _MBCS:
MBCS/Unicode portability: Using the Tchar.h header file, you can build
single-byte, MBCS, and Unicode applications from the same sources.
Tchar.h defines macros prefixed with _tcs , which map to str, _mbs, or
wcs functions, as appropriate. To build MBCS, define the symbol _MBCS.
To build Unicode, define the symbol _UNICODE. By default, _MBCS is
defined for MFC applications. For more information, see Generic-Text
Mappings in Tchar.h.
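To illustrate the quoted mapping, a minimal sketch; the only moving part is which of _MBCS or _UNICODE is defined when it is compiled:
// Minimal sketch: the same source builds as SBCS, MBCS, or Unicode.
// _tcslen maps to strlen in SBCS/MBCS builds and to wcslen in Unicode
// builds; _tprintf maps to printf or wprintf accordingly.
#include <tchar.h>
#include <stdio.h>

int main() {
    const _TCHAR* s = _T("hello");
    _tprintf(_T("length: %u\n"), static_cast<unsigned>(_tcslen(s)));
    return 0;
}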