What does the symbol \0 mean in a string-literal? - c++

Consider the following code:
char str[] = "Hello\0";
What is the length of the str array, and how many 0s does it end with?

sizeof str is 7 - five bytes for the "Hello" text, plus the explicit NUL terminator, plus the implicit NUL terminator.
strlen(str) is 5 - the five "Hello" bytes only.
The key here is that the implicit NUL terminator is always added - even if the string literal just happens to end with \0. Of course, strlen just stops at the first \0 - it can't tell the difference.
There is one exception to the implicit NUL terminator rule - if you explicitly specify the array size, the string will be truncated to fit:
char str[6] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 6 (with one NUL)
char str[7] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 7 (with two NULs)
char str[8] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 8 (with three NULs per C99 6.7.8.21)
This is, however, rarely useful, and prone to miscalculating the string length and ending up with an unterminated string. Dropping the terminator this way is also forbidden in C++.

The length of the array is 7: the NUL character \0 still counts as a character, and the string is still terminated with an implicit \0.
Note that had you declared str as char str[6] = "Hello\0";, the length would be 6, because the implicit NUL is only added if it can fit (which it can't in this example).
§ 6.7.8/p14: An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
Examples
char str[] = "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[5]= "Hello\0"; /* sizeof == 5, str is "Hello" with no NUL (no longer a C-string, just an array of char). This may trigger compiler warning */
char str[6]= "Hello\0"; /* sizeof == 6, Explicit NUL only */
char str[7]= "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[8]= "Hello\0"; /* sizeof == 8, Explicit + two Implicit NUL */

Specifically, I want to mention one situation that may confuse you.
What is the difference between "\0" and ""?
The answer is that "\0" is the array {0, 0} in memory, while "" is {0}: "\0" is still a string literal, so an implicit "\0" is appended after the explicit one, while "" is empty but still gets the implicit "\0".
Understanding this will help you understand "\0" more deeply.
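A quick way to see this for yourself - a minimal sketch using sizeof, which counts every byte of the literal including the implicit terminator:
#include <cstdio>
int main() {
    std::printf("%zu\n", sizeof("\0")); // 2: the explicit \0 plus the implicit one
    std::printf("%zu\n", sizeof(""));   // 1: just the implicit \0
}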

Banging my usual drum solo of JUST TRY IT, here's how you can answer questions like that in the future:
$ cat junk.c
#include <stdio.h>
char* string = "Hello\0";
int main(int argc, char** argv)
{
    printf("-->%s<--\n", string);
}
$ gcc -S junk.c
$ cat junk.s
... eliding the unnecessary parts ...
.LC0:
.string "Hello"
.string ""
...
.LC1:
.string "-->%s<--\n"
...
Note here how the string I used for printf is just "-->%s<--\n" while the global string is in two parts: "Hello" and "". The GNU assembler also terminates strings with an implicit NUL character, so the fact that the first string (.LC0) is in those two parts indicates that there are two NULs. The string is thus 7 bytes long.
Generally, if you really want to know what your compiler is doing with a certain hunk of code, isolate it in a dummy example like this and see what it's doing using -S (for GNU -- MSVC has a flag for assembler output too, but I don't know it off-hand). You'll learn a lot about how your code works (or fails to work, as the case may be), and you'll get an answer quickly that is 100% guaranteed to match the tools and environment you're working in.

What is the length of the str array, and how many 0s does it end with?
Let's find out:
#include <stdio.h>

int main() {
    char str[] = "Hello\0";
    int length = sizeof str / sizeof str[0];
    // "sizeof array" is the bytes for the whole array (must use a real array, not
    // a pointer); divide by "sizeof array[0]" (sometimes sizeof *array is used)
    // to get the number of items in the array
    printf("array length: %d\n", length);
    printf("last 3 bytes: %02x %02x %02x\n",
           str[length - 3], str[length - 2], str[length - 1]);
    return 0;
}
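Run it, and it should print the following - the 'o' (0x6f) followed by the explicit and the implicit terminator:
array length: 7
last 3 bytes: 6f 00 00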

char str[]= "Hello\0";
That would be 7 bytes.
In memory it'd be:
48 65 6C 6C 6F 00 00
H e l l o \0 \0
Edit:
What does the \0 symbol mean in a C string?
It's the "end" of a string. A null character. In memory, it's actually a Zero. Usually functions that handle char arrays look for this character, as this is the end of the message. I'll put an example at the end.
What is the length of str array? (Answered before the edit part)
7
And how many 0s does it end with?
Your array has two "spaces" holding zero: str[5] == str[6] == '\0' == 0.
Extra example:
Let's assume you have a function that prints the content of that text array.
You could define it as:
char str[40];
Now, you could change the content of that array (I won't get into details on how to), so that it contains the message: "This is just a printing test"
In memory, you should have something like:
54 68 69 73 20 69 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
So you print that char array. And then you want a new message. Let's say just "Hello"
48 65 6c 6c 6f 00 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
Notice the 00 at str[5]. That's how the print function knows how much it actually needs to send, regardless of the full length of the array and whatever content remains after the terminator.
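A minimal sketch of such a print function (a hypothetical name; real code would just call printf or puts), showing how it walks the array until the terminator:
#include <cstdio>

void print_message(const char* s) {
    while (*s != '\0') { // stop at the terminator, ignore whatever follows
        std::putchar(*s);
        ++s;
    }
    std::putchar('\n');
}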

'\0' is referred to as the NUL character or NUL terminator.
It is the character equivalent of the integer 0 (zero), as it refers to nothing.
In the C language it is generally used to mark the end of a string.
Example:
char a[] = "Arsenic";
Every character is stored in the array:
a[0] = 'A'
a[1] = 'r'
a[2] = 's'
a[3] = 'e'
a[4] = 'n'
a[5] = 'i'
a[6] = 'c'
The end of the array, a[7], contains '\0' to mark the end of the string stored in 'a'.

Related

Put a string of hexadecimal values directly into memory

I am working on a project in which I take the hexadecimal memory values of a variable/struct and print them into a file.
My goal is to get the hexadecimal memory values from that file and place them back in a pointer pointing to an "empty" variable. The part in which I get the hexadecimal memory elements works like this:
#include <fstream>
#include <iomanip>

template <typename T>
void encode(T n, std::ofstream& file) {
    char *ptr = reinterpret_cast<char*>(&n);
    for (int i = 0; i < sizeof(T); i++) {
        unsigned int byte = static_cast<unsigned int>(ptr[i]);
        file << std::setw(2) << std::setfill('0') << std::hex << (byte & 0xff) << " ";
    }
}
This piece of code results in creating the following hexadecimal string:
7b 00 00 00 33 33 47 41 d9 22 00 00 01 ff 02 00 03 14 00 00 c6 1f 00 00
Now I want to place these hexadecimal values directly back into memory, but at a different location. (See it as a client receiving this string and having to decode it.)
My problem now is that I don't know how to put it directly into memory. I've tried the following and unfortunately failed:
template<typename T>
void decode(T* ptr, std::ifstream& file){
    // Line of hex values
    std::string line;
    std::getline(file, line);
    // Size of string
    int n = line.length();
    // Converting the string into char *
    char * array = new char[n];
    strcpy(array, line.c_str());
    // Copying the char * into the pointer given to the function
    memcpy(ptr, array, n);
}
The item to be encoded has the same memory pattern as in the outputted file. The result I'm getting, however, stores the char * text itself into memory, not the raw bytes (the original question showed both as debugger screenshots).
The expected result is that the decoded variable has the same memory pattern as the encoded variable. How can I do this?
std::getline(file,line);
This reads exactly what's in the file, character by character.
You indicate that your file contains this hexadecimal string:
7b 00 00 00 33 33 47 41 d9 22 00 00 01 ff 02 00 03 14 00 00 c6 1f 00 00
That is: the first character in the file is '7'. The next one is 'b', then a space character. And so on.
That's what you will get in line after std::getline() returns. That is, the first character of line will be '7', the next one will be 'b', the next one will be a space, and so on.
My problem now is that I don't know how to put it directly into memory.
No, your problem seems to be that you need to convert the read line of text back into actual, binary, raw bytes. You will first need to write some additional code that does the exact opposite of what you did here:
file << std::setw(2) << std::setfill('0') << std::hex << (byte & 0xff) << " ";
Once that's done, the first byte in your read buffer will be 0x7B instead of the three characters "7b " (two hex digits and a space), and so on.
There are many ways to do it, ranging from using istringstream to writing a very simple hex-to-byte conversion function. If you flip through the pages of your C++ textbook you are likely to find some sample code; this is a fairly common algorithm, offered as an example of basic logic in most introductory textbooks.
And once you do that, you can copy the bytes into your pointer. You cannot use strcpy() for that, of course, because it copies only up until the first 00 byte. You'll need std::copy, or maybe even your own manual copy loop.
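For example, here is a minimal sketch of the missing conversion step, assuming the file holds space-separated two-digit hex values exactly as encode() above wrote them:
#include <cstddef>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

template <typename T>
void decode(T* ptr, std::ifstream& file) {
    std::string line;
    std::getline(file, line);
    std::istringstream hexStream(line);
    unsigned char buffer[sizeof(T)] = {};
    unsigned int byte = 0;
    // Re-parse each "7b"-style token back into one raw byte.
    for (std::size_t i = 0; i < sizeof(T) && hexStream >> std::hex >> byte; ++i) {
        buffer[i] = static_cast<unsigned char>(byte);
    }
    std::memcpy(ptr, buffer, sizeof(T)); // raw bytes this time, not text
}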

converting a string read from binary file to integer

I have a binary file. I am reading 16 bytes at a time using fstream.
I want to convert them to an integer. I tried atoi, but it didn't work.
In Python we can do that by converting to a byte stream using stringobtained.encode('utf-8') and then converting it to an int using int(bytestring.hex(), 16). Should we follow such elaborate steps as in Python, or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
    size = file.tellg();
    memblock = new char[size];
    file.seekg(0, ios::beg);
    while (!file.eof())
    {
        file.read(memblock, 16);
        int a = atoi(memblock); // doesn't work, 0 always
        cout << a << "\n";
        memset(memblock, 0, sizeof(memblock));
    }
    file.close();
}
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it 16 bytes (i.e. 32 hex digits, one row of the sample file content) at a time and convert them to an integer.
So when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, I should get 110748049513798795666017677735771517696.
But I couldn't do it. I always get 0, even after trying strtoull. Am I reading the file wrong, or what am I missing?
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, so it returns zero. If you're very unlucky, you will read 16 valid digits and atoi will keep reading into uninitialized memory, leading to undefined behavior.
You don't need atoi anyway; your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with shift and OR operators. The only complication is that read wants a char type, which will probably be signed, while you need unsigned bytes.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
while (file.read(memblock, 16))
{
    // uint128_t is assumed to be a 128-bit unsigned type with iostream
    // support; with the raw unsigned __int128 GCC/Clang extension you'd
    // need to write your own printing code.
    uint128_t a = 0;
    for (int i = 0; i < 16; ++i)
    {
        // Shift the accumulated value up and OR in the next unsigned byte.
        a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
    }
    cout << a << "\n";
}
file.close();
If the number is binary, what you want is:
short value;
file.read(reinterpret_cast<char*>(&value), sizeof(value));
Depending upon how the file was written and your processor's byte order, you may have to reverse the bytes in value using bit operations.
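For instance, a minimal sketch of reversing the bytes of that 16-bit value, for when the file's byte order and the processor's disagree:
#include <cstdint>

std::uint16_t swap_bytes(std::uint16_t v) {
    // Move the low byte up and the high byte down.
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}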

Convert wchar_t to char with boost locale

My goal is to convert wchar_t to char; my approach was to use boost::locale (with Boost 1.60). For example, turning
wchar_t * myWcharString = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
to
char * myCharString = "0000000002" (Memory 0x 30 30 ... 32)
I wrote a function:
inline char* newCharFromWchar(wchar_t * utf16String) {
    char * cResult = NULL;
    try {
        std::string szResult = boost::locale::conv::from_utf(utf16String, "UTF-8");
        cResult = new char[szResult.size() + 1];
        memset(reinterpret_cast<void*>(cResult), 0, szResult.size() + 1);
        memcpy(reinterpret_cast<void*>(cResult),
               reinterpret_cast<const void*>(szResult.c_str()),
               szResult.size());
    }
    catch (...) {
        // boost::locale::conv might throw
    }
    return cResult;
}
Now the problem is that with VS2013 it behaves differently than with gcc and clang, i.e.
// VS 2013 behaves as expected
wchar_t * utf16String = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
char * cResult = "0000000002" (Memory 0x 30 30 ... 32)
// both gcc and clang NOT as expected:
wchar_t * utf16String = "0000000002" (Memory 0x 30 00 30 00 ... 32 00)
char * cResult = "2" (Memory 0x 32)
The Boost implementations used by both gcc and clang seem to use only the last 2 bytes of my input wchar_t, though the start and end addresses of the input are parsed correctly.
What am I missing?
VS2013 treats wchar_t as a 16-bit character, while both gcc and clang treat it as a 32-bit character (on my machine).
So storing 0x 30 00 30 00 ... 32 00 in a wchar_t array only works as expected with VS2013: on the other platforms, boost::locale will assume that 0x 30 00 30 00 is a single 32-bit character instead of the two 16-bit characters I expected. The resulting output therefore is completely different between these platforms.
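A minimal sketch of a portable alternative, assuming the input really is UTF-16 data: store it in char16_t (which is 16-bit everywhere) instead of wchar_t, and let boost::locale convert between the UTF encodings:
#include <boost/locale/encoding_utf.hpp>
#include <string>

std::string utf16ToUtf8(const std::u16string& utf16) {
    // char16_t input is always treated as UTF-16, regardless of platform.
    return boost::locale::conv::utf_to_utf<char>(utf16);
}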

UTF-16 to UTF8 with WideCharToMultiByte problems

int main() {
    // "Chào" in Vietnamese
    wchar_t utf16[] = L"\x00ff\x00fe\x0043\x0000\x0068\x0000\x00EO\x0000\x006F";
    // Dump utf16: FF FE 43 0 68 0 E 4F 0 6F (right)
    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    char *utf8 = new char[size];
    int k = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, utf8, size, NULL, NULL);
    // Dump utf8: ffffffc3 fffffbf ffffc3 ffffbe 43 0
}
Here is my code. When I convert this string into UTF-8, it shows a wrong result, so what is wrong with my code?
wchar_t utf16[] = L"\uFEFFChào";
int size = 5;
for (int i = 0; i < size; ++i) {
    std::printf("%X ", utf16[i]);
}
This program prints out: FEFF 43 68 E0 6F
If printing out each wchar_t you've read from a file prints out FF FE 43 0 68 0 E 4F 0 6F, then the UTF-16 data is not being read from the file correctly. Those values represent the UTF-16 string L"ÿþC\0h\0à\0o".
You don't show your code for reading from the file, but here's one way to do it correctly:
https://stackoverflow.com/a/10504278/365496
You're reading the file incorrectly. Your dump of the input shows single bytes stored in wide characters. Your dump of the output is the byte sequence that results from encoding L"\xff\xfe\x43" to UTF-8; the string is being truncated at the first \x0000 in the input.
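Once the wide string itself holds correct UTF-16 (no stray BOM bytes spliced into it), the conversion call is straightforward. A minimal sketch wrapping it in a function that returns a std::string:
#include <windows.h>
#include <string>

std::string toUtf8(const wchar_t* utf16) {
    // First call: ask for the required buffer size (includes the NUL,
    // since we pass -1 for the source length).
    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    if (size <= 0) return std::string();
    std::string utf8(size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, utf16, -1, &utf8[0], size, NULL, NULL);
    utf8.resize(size - 1); // drop the terminating NUL stored by the API
    return utf8;
}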

Why does RegSetValueEx work even when I break the rule about accounting for NUL termination in the length?

I've got a simple program that adds calc.exe to startup:
#include <windows.h>
#include <tchar.h>
int main(){
_tprintf(TEXT("Adding calc.exe to SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run...\n"));
HKEY hRegRunKey;
LPCTSTR lpKeyName = TEXT("SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run");
LPCTSTR lpKeyValue = TEXT("Calculator");
LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe");
DWORD cchProgram = _tcslen(lpProgram);
_tprintf(TEXT("Path: %s. \n"), lpProgram);
_tprintf(TEXT("Length: %d. \n"), cchProgram);
if(RegOpenKeyEx( HKEY_LOCAL_MACHINE, lpKeyName, 0, KEY_SET_VALUE, &hRegRunKey) == ERROR_SUCCESS){
if(RegSetValueEx(hRegRunKey, lpKeyValue, 0, REG_SZ, (const BYTE *)lpProgram, cchProgram * sizeof(TCHAR)) != ERROR_SUCCESS){
_tprintf(TEXT("ERROR: Can't set key value.\n"));
exit(1);
}else{
_tprintf(TEXT("Key has been added sucessfully.\n"));
}
}
Sleep(5000);
RegCloseKey(hRegRunKey);
}
For me the world of C/C++/Win32 API is still full of mysteries... so I have a few questions.
1. When I define a string, is it automatically null terminated?
LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe");
or should it be:
LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe\0");
2. In my code, is the final argument to RegSetValueEx set to the correct value?
From MSDN - RegSetValueEx function page:
cbData [in] The size of the information pointed to by the lpData parameter, in bytes. If the data is of type REG_SZ, REG_EXPAND_SZ, or REG_MULTI_SZ, cbData must include the size of the terminating null character or characters.
cchProgram is set to 28 characters, without null termination. On my system (because of UNICODE, I think?) cchProgram * sizeof(TCHAR) = 56.
Shouldn't I set it to 58 to account for null termination?
When I run this program as it is above, without any modifications, and check the Calculator value in the registry via Modify Binary Data, I get:
43 00 3A 00 5C 00 57 00 C.:.\.W.
49 00 4E 00 44 00 4F 00 I.N.D.O.
57 00 53 00 5C 00 73 00 W.S.\.s.
79 00 73 00 74 00 65 00 y.s.t.e.
6D 00 33 00 32 00 5C 00 m.3.2.\.
63 00 61 00 6C 00 63 00 c.a.l.c.
2E 00 65 00 78 00 65 00 ..e.x.e.
00 00 ..
It's 58 bytes, including null termination. I'm confused :/
UPDATE
Accounting for a NUL character by adding 1 to the string length when calculating cbData yields exactly the same result as not adding it:
cchProgram * sizeof(TCHAR) produces the same data entry as (cchProgram + 1) * sizeof(TCHAR).
Providing a value smaller than the string length doesn't add a NUL byte; it copies only the given number of bytes.
27 * sizeof(TCHAR) as cbData produces:
43 00 3A 00 5C 00 57 00 C.:.\.W.
49 00 4E 00 44 00 4F 00 I.N.D.O.
57 00 53 00 5C 00 73 00 W.S.\.s.
79 00 73 00 74 00 65 00 y.s.t.e.
6D 00 33 00 32 00 5C 00 m.3.2.\.
63 00 61 00 6C 00 63 00 c.a.l.c.
2E 00 65 00 78 00 ..e.x.
I am on some old XP, service pack god knows what; I don't know how other versions of Windows would handle it.
1: Yes, it will be null terminated without the need for \0.
Double quoted strings (") are literal constants whose type is in fact a null-terminated array of characters. So string literals enclosed between double quotes always have a null character ('\0') automatically appended at the end.
2: _tcslen() doesn't include the null terminator. You can add sizeof(TCHAR) to the byte count to include it.
The reason the value still works is probably that Windows tries to be robust even when given incorrect input, automatically appending the null terminator for you. However, because the documentation says you must include the null terminator, it may not always append it.
When I define string is it automatically null terminated?
String literals are null-terminated, yes. "Hello" is actually {'H', 'e', 'l', 'l', 'o', '\0'}.
In my code is final argument to RegSetValueEx set to correct value?
You're right that you need the null terminator. An easier way would be sizeof(TEXT("C:\\WINDOWS\\system32\\calc.exe")) if the string literal is short, since sizeof includes the null terminator - sizeof("Hello") is 6. In most cases, though, you'll need a variable and will have to add one to the length you get from string character-counting functions, since they don't include the null terminator.
Ben Voigt made an excellent point below: a const TCHAR[] array, e.g. const TCHAR program[] = TEXT("text");, can be used the same way as a literal in the call (sizeof(program)), but is a lot more maintainable - when the string changes there's one less place in the code to update, which is a must for any actual project rather than a really small test, and even a small test can grow.
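A minimal sketch of that suggestion, reusing the names from the question's snippet:
const TCHAR program[] = TEXT("C:\\WINDOWS\\system32\\calc.exe");
// sizeof(program) is the byte count of the whole array, *including* the
// terminating NUL, and stays correct if the string is edited later.
RegSetValueEx(hRegRunKey, lpKeyValue, 0, REG_SZ,
              reinterpret_cast<const BYTE*>(program), sizeof(program));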
Finally, there are two things you should get out of your head early:
Hungarian notation: Don't do it. It's outdated and rather pointless.
TCHAR: Just use wide strings with any Windows API functions you can.
What you're doing absolutely right is checking function calls for errors. You wouldn't believe how many problems asked about can be solved by checking for failure and using GetLastError when the documentation says to.
Since you asked how you're supposed to use C++ facilities, here's one way, with a couple changes that make more sense for using C++:
#include <windows.h>
#include <iostream>
#include <string>

int main() {
    // R means raw string literal. Note one backslash
    std::cout << R"(Adding calc.exe to SOFTWARE\Microsoft\Windows\CurrentVersion\Run...)" << '\n';
    const WCHAR keyName[] = LR"(SOFTWARE\Microsoft\Windows\CurrentVersion\Run)";
    std::cout << "Enter program name: ";
    std::wstring keyValue;
    if (!std::getline(std::wcin, keyValue)) {/*error*/}
    std::cout << "Enter full program path: ";
    std::wstring program;
    if (!std::getline(std::wcin, program)) {/*error*/}
    std::wcout << "Path: " << program << ".\n";
    std::cout << "Length: " << program.size() << ".\n";
    HKEY runKey;
    if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, keyName, 0, KEY_SET_VALUE, &runKey)) {/*error*/}
    if (RegSetValueExW(runKey, keyValue.c_str(), 0, REG_SZ, reinterpret_cast<const BYTE *>(program.c_str()), (program.size() + 1) * sizeof(wchar_t))) {
        std::cout << "ERROR: Can't set key value.\n";
        return 1;
    }
    if (RegCloseKey(runKey)) {/*error*/}
    std::cout << "Key has been added successfully.\n";
    std::cout << "Press enter to continue...";
    std::cin.get();
}
A better way to do this with C++ idioms would be to have a RegKey RAII class that calls RegCloseKey in its destructor and saves you the work. At the very least, it could be used like this:
RegKey key(HKEY_LOCAL_MACHINE, keyName, KEY_SET_VALUE);
RegSetValueExW(key, ...); //could have implicit or explicit conversion, fill in the ...
//RegCloseKey called when key goes out of scope
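A minimal sketch of such a RegKey wrapper (a hypothetical class, not a library type), opening the key in the constructor and guaranteeing the close in the destructor:
#include <windows.h>
#include <stdexcept>

class RegKey {
public:
    RegKey(HKEY root, const wchar_t* subKey, REGSAM access) {
        if (RegOpenKeyExW(root, subKey, 0, access, &handle_) != ERROR_SUCCESS)
            throw std::runtime_error("RegOpenKeyExW failed");
    }
    ~RegKey() { RegCloseKey(handle_); }
    RegKey(const RegKey&) = delete;            // owns the handle: non-copyable
    RegKey& operator=(const RegKey&) = delete;
    operator HKEY() const { return handle_; }  // lets RegSetValueExW(key, ...) compile
private:
    HKEY handle_ = nullptr;
};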