Retrieve CString file path from XML file - c++

I have an XML file with many values and a working C++ function that can retrieve these values.
Two of these values are:
A file path such as: "C:\foo1\foo2" and
A file name: "foo3.txt"
Combining these together, they would become "C:\foo1\foo2\foo3.txt".
However, when I try to set a CString to hold a file path, I get an error, because the \ character has a special meaning in string notation and cannot be used on its own in a string.
I am using MFC, and I know Win32 allows you to write a file path with / instead of \, so "C:/foo1/foo2/foo3.txt" would work. I tested this in Windows Explorer and it worked.
I would like to collect the file path from the XML file, but when it comes in it will contain \ instead of /, so I think it will not be possible to replace the character: the incoming string will already cause an error, since XML has no problem with the \ character.
How do I safely retrieve the path as a CString, ideally converting every \ character to a / character?

Now, I'm not familiar with the "CString" class you are referring to. Googling the API documentation only turns up the standard C-style char array formatting functions, so I'm going to assume, rightly or wrongly, that a CString is a char array.
Because we are going to use an object that is not resizable, we either
Need to use the heap, which will be slower and can leak memory if the memory isn't deleted later, or
Allow a maximum string length and accept that the string will be truncated if it is longer than this.
Heap example (NOTE: I'm not using smart pointers, as I assume you don't have access to them; otherwise you would just use std::string and not do this.)
#include <cstring>
#include <iostream>

char* escapeString(const char* data, unsigned int length){
    //multiplying by 1.5 means this could still truncate,
    //but I'm making an educated guess it's not all bad characters.
    const unsigned int newLen = (length + 1) * 3 / 2;
    char* escaped = new char[newLen + 1];
    unsigned int index = 0;
    //stop early (truncate) when another escaped pair might not fit in the buffer
    for(unsigned int i = 0; i < length && index + 2 <= newLen; i++){
        if(data[i] == '\\' || data[i] == '\"'){
            escaped[index++] = '\\';
        }
        else if(data[i] == '%'){
            escaped[index++] = '%';
        }
        //else anything else you want to escape
        escaped[index++] = data[i];
    }
    //Make sure the result is null terminated
    escaped[index] = '\0';
    return escaped;
}
int main() {
    const char* stringWithBadChars = "I\"m not a %%good \\string";
    char* escapedString = escapeString(stringWithBadChars, std::strlen(stringWithBadChars));
    std::cout << escapedString;
    delete [] escapedString;
    return 0;
}
If we do this on the stack instead, it would be a lot faster, but we are limited by the size of the buffer we are given and by the size of the buffer inside the function. We return false if either limit is exceeded.
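#include <cstring>

//data must point to a writable buffer of capacity bytes that holds length chars
bool escapeString(char* data, unsigned int length, unsigned int capacity){
    const unsigned int newLen = 1000;
    char escaped[newLen + 1];
    unsigned int index = 0;
    unsigned int i = 0;
    for(; i < length && index + 2 <= newLen; i++){
        if(data[i] == '\\' || data[i] == '\"'){
            escaped[index++] = '\\';
        }
        else if(data[i] == '%'){
            escaped[index++] = '%';
        }
        escaped[index++] = data[i];
    }
    //fail if the input did not fit in our buffer or the result won't fit in the caller's
    if(i < length || index + 1 > capacity){
        return false;
    }
    //null terminate, then copy the result (including '\0') back into data
    escaped[index] = '\0';
    std::memcpy(data, escaped, index + 1);
    return true;
}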
You could probably gain more efficiency by shifting characters within data using memmove instead of building the result character by character; done that way, you also wouldn't need the second char array.

CString's Format function, like printf, reserves some special characters, and C++ string literals reserve others. Have a look at the Format command as an example. The linked documentation refers you to: Format specification syntax: printf and wprintf functions.
The \ is used, as mentioned in the comments, to indicate a special character in a string literal. For example:
\t inserts a tab character.
\" inserts a double quote character.
So when the compiler hits a \ inside a literal, it expects the next character to be one of the special ones. Therefore, when you actually need a backslash, you write \\. This only applies to text typed into your source code: a path read from the XML file at run time already contains real backslash characters and needs no escaping.
The linked article explains % but not the backslash. However, % works exactly the same way in Format strings because it too has special meaning, so you use %% when you want a literal percent sign.
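A minimal sketch of both points, using std::string and std::replace as a portable stand-in for CString. This is an assumption: with MFC, CString's two-character Replace overload, for example path.Replace(_T('\\'), _T('/')), should do the same job. The helper name join_with_forward_slashes is hypothetical. The literal needs \\ because it is typed into source code; the directory and file name passed to the helper at run time are used as they are.

```cpp
#include <algorithm>
#include <string>

// In source code the backslash must be escaped: "\\" is ONE backslash character.
const std::string kLiteralPath = "C:\\foo1\\foo2";

// Joins a directory and file name obtained at run time (e.g. read from XML) and
// converts every '\' to '/'. No escaping happens here: the characters are already in memory.
std::string join_with_forward_slashes(std::string dir, const std::string& file)
{
    if (!dir.empty() && dir.back() != '\\' && dir.back() != '/')
        dir += '\\';                                   // add the missing separator
    std::string full = dir + file;
    std::replace(full.begin(), full.end(), '\\', '/');  // swap every backslash for a slash
    return full;
}
```

Converting to forward slashes is optional: since the backslashes in a run-time string are ordinary characters, the joined path works as-is too.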

Related

How to pad char array with empty spaces on left and right hand side of the text

I am fairly new to C++, so for some people the answer to my question might seem quite obvious.
What I want to achieve is a method that returns the given char array filled with spaces before and after it to reach a certain length, so that in the end the given char array sits in the middle of another, bigger char array.
Let's say we have a char array containing HelloWorld.
I want the method to return a new char array with a length specified beforehand, with the given char array "positioned" in the middle of it.
char ch[] = "HelloWorld";
char ret[20]; // let's say we want the resulting char array to be 20 chars long
// char ret[20] = "    HelloWorld     "; // this is the result expected to be returned by the method
If the number of padding spaces is odd, I would like the text to be offset one space to the left of the middle.
I would also like to avoid memory-consuming strings or any methods that are not in the standard library; I want to keep it as plain as possible.
What would be the best way to tackle this issue? Thanks!
There are mainly two ways of doing this: using char arrays (C-style strings), as you would in C, or using the std::string type (or similar types), which is the usual choice when programming in C++, although there are exceptions.
I'm providing you one example for each.
First, using arrays, you will need to include the <cstring> header to use the standard C string manipulation functions. Keep in mind that a C string always ends with the null terminator character '\0' (its ASCII code is 0), which occupies one element of the array; therefore a DIM-element array can store only DIM - 1 visible characters. Here is the code with comments.
#include <cstring>
#include <iostream>

int main() {
    constexpr int DIM = 20;
    char ch[] = "HelloWorld";
    char ret[DIM] = ""; // all elements start as '\0', so ret[DIM - 1] remains the terminator
    auto len_ch = std::strlen(ch); // length of ch without '\0' char
    auto n_blanks = DIM - len_ch - 1; // number of blank chars needed (assumes len_ch < DIM)
    auto half_n_blanks = n_blanks / 2; // half of that
    // fill in from begin and end of ret with blanks
    for (auto i = 0u; i < half_n_blanks; i++)
        ret[i] = ret[DIM - i - 2] = ' ';
    // copy ch content into ret starting from half_n_blanks position
    // (std::memcpy is standard C++; memcpy_s is not available everywhere)
    std::memcpy(
        ret + half_n_blanks, // start inserting from here
        ch,                  // string we need to copy
        len_ch);             // length of ch
    // if odd, after ch copied chars
    // there will be a space left to insert a blank in
    if (n_blanks % 2 == 1)
        *(ret + half_n_blanks + len_ch) = ' ';
    std::cout << '[' << ret << "]\n";
}
I chose first to insert blank spaces both to the begin and to the end of the string and then to copy the content of ch.
The second approach is far easier (to code and to understand). The maximum number of characters a std::string (defined in the <string> header) can hold is given by its max_size() member; it is implementation-defined but enormous in practice. (std::string::npos, the largest std::size_t value, is used as a "not found" marker rather than as a length limit.) Basically, you don't have to worry about a std::string's maximum length.
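#include <iostream>
#include <string>

int main() {
    std::string ch = "HelloWorld", ret;
    std::string::size_type ret_max_length = 20;
    // assumes ch.size() <= ret_max_length; otherwise the unsigned subtraction wraps around
    auto n_blanks = ret_max_length - ch.size();
    // insert blanks at the beginning
    ret.append(n_blanks / 2, ' ');
    // append ch
    ret += ch;
    // insert blanks after ch
    // if odd, simply add 1 to the number of blanks
    ret.append(n_blanks / 2 + n_blanks % 2, ' ');
    std::cout << '[' << ret << "]\n";
}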
The approach I took here is different, as you can see.
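Notice that, because of '\0', the results of these two methods are NOT the same: the array version holds 19 visible characters plus the terminator, while the std::string version holds 20 characters. If you want the same behaviour, you can either add 1 to DIM or subtract 1 from ret_max_length.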
Assuming that we know the size of the array ret (stored in size below), and knowing that every C string ends with '\0', we first find the length, l, of the input char array, ch.
int l = 0;
int i;
for(i=0; ch[i]!='\0'; i++){
l++;
}
Then we compute how many spaces we need on either side. If total_spaces is even, both sides get the same number of spaces. Otherwise, we can choose which side gets the extra space; here it is the right side, so the text sits one position left of the middle, as asked.
int total_spaces = size-l-1; // subtract by 1 to adjust for '\0' character
int spaces_right = 0, spaces_left = 0;
if((total_spaces%2) == 0){
spaces_left = total_spaces/2;
spaces_right = total_spaces/2;
}
else{
spaces_left = total_spaces/2;
spaces_right = (total_spaces/2)+1;
}
Then we add spaces_left spaces, then the input array, ch, and then spaces_right spaces to ret.
i=0;
while(spaces_left > 0){
ret[i] = ' ';
spaces_left--;
i++;
} // add spaces
ret[i] = '\0';
strcat(ret, ch); // concatenate ch to ret
i += l; // move past the copied ch, otherwise the loop below would overwrite it
while(spaces_right){
ret[i] = ' ';
spaces_right--;
i++;
}
ret[i] = '\0';
Make sure to include <cstring> to use strcat().
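The same logic can be wrapped in a reusable function, as the question asked for. This is a sketch with a hypothetical name, center_pad, that uses only <cstring> and puts the extra space on the right when the padding count is odd, so the text sits one position left of the middle.

```cpp
#include <cstddef>
#include <cstring>

// Writes src centred in dest, where dest_size counts the terminating '\0'.
// With an odd number of padding spaces, the extra one goes on the right.
// Returns false (leaving dest untouched) if src does not fit.
bool center_pad(const char* src, char* dest, std::size_t dest_size)
{
    const std::size_t len = std::strlen(src);
    if (dest_size == 0 || len > dest_size - 1)
        return false;
    const std::size_t blanks = dest_size - 1 - len;
    const std::size_t left = blanks / 2;
    const std::size_t right = blanks - left;       // receives the extra space when odd
    std::memset(dest, ' ', left);
    std::memcpy(dest + left, src, len);
    std::memset(dest + left + len, ' ', right);
    dest[left + len + right] = '\0';               // index dest_size - 1
    return true;
}
```

For example, with char ret[20], center_pad("HelloWorld", ret, sizeof ret) fills ret with 4 spaces, HelloWorld and 5 spaces.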

prepending and appending to a cstring

I have the following string, tok_str, which contains something like "default.png". I would like to prepend the character ' and append the character ' too.
This is what I have done, but the characters are appended and prepended in the wrong places:
char *tok_str = const_cast<char*>(mReader->getAttributeValue(pAttrIdx));
char * mod_tok = new char[tok_str_len+2];
mod_tok[0] = '\'';
size_t len = strlen(tok_str);
size_t i;
memmove(mod_tok + len, mod_tok, strlen(mod_tok) + 1);
for (i = 0; i < len; ++i)
{
mod_tok[i] = tok_str[i];
}
char *dup;
char *cstr="'";
sprintf(mod_tok,"%s%s",cstr,(dup=strdup(mod_tok)));
free(dup);
If you want to continue using null-terminated byte strings there are a few things you need to think of and do.
The first is of course the null-terminated part. A string of X characters needs space for X+1 to include the terminator.
The second is that all you need is really a single sprintf (or better yet snprintf) call (once you allocated memory):
char* mod_tok = new char[strlen(tok_str) + 3]; // +2 for the extra characters, +1 for terminator
snprintf(mod_tok, strlen(tok_str) + 3, "'%s'", tok_str);
That is it, now you have added the single quotes in front and at the end of the original string.
There are a couple of things to improve:
usage of const when possible
len vs tok_str_len: use only one.
the memmove done in the middle seems to have no effect on the final result
pay attention to the indexes in the for loop
be aware that strlen doesn't count the null terminator
if your code starts to mix new/delete with free, try to refactor it
That's my proposal:
//keep it const and protect your data
const char *tok_str = mReader->getAttributeValue(pAttrIdx);
//retrieve the len once and for all (const, no one is supposed to change it)
const size_t len = strlen(tok_str);
char * mod_tok = new char[len+3]; // 2 "'" + '\0'
mod_tok[0] = '\'';
for (size_t i = 0; i < len; ++i)
{
mod_tok[i+1] =tok_str[i];
}
mod_tok[len+1] = '\'';
mod_tok[len+2] = '\0';
//done.
//later...
delete[] mod_tok;
Enjoy your coding!
Stefano
PS: I agree, though, that using std::string is recommended.
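Since std::string is recommended, here is a minimal sketch of the same task with it (the function name quoted is hypothetical). The const char* returned by getAttributeValue could be passed straight in, and if an API later needs a C string, .c_str() provides one that stays valid while the std::string is alive and unmodified.

```cpp
#include <string>

// Returns text wrapped in single quotes. std::string owns the memory,
// so there is no new[]/delete[] (or malloc/free) to pair up by hand.
std::string quoted(const char* text)
{
    return "'" + std::string(text) + "'";
}
```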

Why am I getting two different strings?

I wrote a very simple encryption program to practice C++, and I came across this weird behavior. When I convert my char* array to a string by setting the string equal to the array, I get the wrong string; however, when I create an empty string and append the chars in the array individually, it creates the correct string. Could someone please explain why this is happening? I just started programming in C++ last week and I cannot figure out why this is not working.
By the way, I checked online, and these are apparently both valid ways of converting a char array to a string.
void expandPassword(string* pass)
{
int pHash = hashCode(pass);
int pLen = pass->size();
char* expPass = new char[264];
for (int i = 0; i < 264; i++)
{
expPass[i] = (*pass)[i % pLen] * (char) rand();
}
string str;
for (int i = 0; i < 264; i++)
{
str += expPass[i];// This creates the string version correctly
}
string str2 = expPass;// This creates much shorter string
cout <<str<<"\n--------------\n"<<str2<<"\n---------------\n";
delete[] expPass;
}
EDIT: I removed all of the zeros from the array and it did not change anything
When copying from a char* to a std::string, the assignment operator stops when it reaches the first null ('\0') character. This points to a problem with your "encryption", which is producing embedded null characters.
This is one of the main reasons why encoding is used with encrypted data. After encryption, the resulting data should be encoded using Hex/base16 or base64 algorithms.
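As a sketch of that encoding step, here is a hypothetical to_hex helper that turns every byte, including embedded zeros, into two lowercase hex digits. Because the output contains only the characters 0-9 and a-f, it can safely be stored in a std::string, printed, or treated as a C string.

```cpp
#include <cstddef>
#include <string>

// Encodes len raw bytes (embedded '\0' included) as lowercase hexadecimal.
std::string to_hex(const char* data, std::size_t len)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        // go through unsigned char so bytes >= 0x80 are not sign-extended
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        out += digits[byte >> 4];     // high nibble
        out += digits[byte & 0x0F];   // low nibble
    }
    return out;
}
```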
A C string, which is what you are constructing, is a series of characters ending with a \0 (zero) byte.
In the case of
expPass[i] = (*pass)[i % pLen] * (char) rand();
you may be inserting a \0 into the array whenever the expression evaluates to 0. You also do not append a \0 at the end of the array to make it a valid C string, so if no zero byte happens to occur, the conversion reads past the end of the buffer, which is undefined behavior.
When you do
string str2 = expPass;
the string can easily end up shorter, since it is truncated at the first \0 it finds.
This is because str2 = expPass interprets expPass as a C-style string, meaning that a zero-valued ("null") byte '\0' indicates the end of the string. So, for example, this:
char p[2];
p[0] = 'a';
p[1] = '\0';
std::string s = p;
will cause s to have length 1, since p has only one nonzero byte before its terminating '\0'. But this:
char p[2];
p[0] = 'a';
p[1] = '\0';
std::string s;
s += p[0];
s += p[1];
will cause s to have length 2, because it explicitly adds both bytes to s. (A std::string, unlike a C-style string, can contain actual null bytes — though it's not always a good idea to take advantage of that.)
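If you do want to keep the null bytes, pass the length explicitly: the std::string constructor that takes a pointer and a count copies exactly that many bytes and does not stop at '\0'. In the question's code that would be string str2(expPass, 264);. The small wrapper below (from_buffer is a hypothetical name) just demonstrates the behaviour.

```cpp
#include <cstddef>
#include <string>

// Copies exactly len bytes from buffer, embedded '\0' characters included.
std::string from_buffer(const char* buffer, std::size_t len)
{
    return std::string(buffer, len);
}
```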
I guess the following line cuts your string:
expPass[i] = (*pass)[i % pLen] * (char) rand();
If (char) rand() is 0, or more generally if the low 8 bits of the product are all zero, you typically get a string terminator at position i.

Convert wchar_t to char

I was wondering: is it safe to do this?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
That is, assuming I am pretty sure the wide char will fall within the ASCII range.
Why not just use the library routine wcstombs?
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
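Following that advice, a minimal sketch of a checked conversion: the hypothetical helper to_ascii_char converts only values 0-127 and reports failure otherwise, so the caller can pick a fallback such as '?'. It assumes an ASCII-based platform, where those values denote the same character as char and as wchar_t.

```cpp
// Converts wc to char only when it is in the ASCII range (0-127).
// Returns false and leaves out unchanged otherwise.
bool to_ascii_char(wchar_t wc, char& out)
{
    // widen first so the range check works whether wchar_t is signed or unsigned
    const long code = static_cast<long>(wc);
    if (code < 0 || code > 127)
        return false;
    out = static_cast<char>(code);
    return true;
}
```

Unlike assert, this check still runs in a release build.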
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is already part of ANSI C, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
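An easy way is:
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str(); // c_str() returns const char*
I have been using this method for years :) Note that the iterator constructor simply narrows each wchar_t to char, so it is only correct for characters that have the same value in both encodings, such as ASCII.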
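A short function I wrote a while back to pack a wchar_t array into a char array. Characters outside the ASCII range (0-127) are replaced by '?' characters, and when wchar_t holds UTF-16 (as on Windows) a surrogate pair becomes a single '?'. It returns the number of characters written, not counting the terminator.
#include <cstddef>

std::size_t to_narrow(const wchar_t * src, char * dest, std::size_t dest_len){
    std::size_t s = 0; // index into src
    std::size_t d = 0; // index into dest (separate, so skipped code units leave no gaps)
    if (dest_len == 0)
        return 0;
    while (src[s] != L'\0' && d < dest_len - 1){
        wchar_t code = src[s];
        if (code < 128)
            dest[d] = char(code);
        else{
            dest[d] = '?';
            // lead surrogates are 0xD800-0xDBFF; skip the trail code unit that follows
            if (code >= 0xD800 && code <= 0xDBFF && src[s + 1] != L'\0')
                s++;
        }
        s++;
        d++;
    }
    dest[d] = '\0';
    return d;
}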
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it; remember to use free() on the result.
#include <cstdlib>

// Narrows each wchar_t to char by truncation, so it is only correct for
// characters both encodings share (e.g. ASCII). Returns NULL if malloc fails.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string (compare as wchar_t, so that
    // a value such as 0x100 is not mistaken for the terminator)
    int currentCharIndex = 0;
    while (pwchar[currentCharIndex] != L'\0')
    {
        currentCharIndex++;
    }
    const int charCount = currentCharIndex + 1; // include the terminator
    // one char (1 byte) per character; wchar_t is 2 bytes on Windows, often 4 elsewhere
    char* filePathC = (char*)malloc(sizeof(char) * charCount);
    if (filePathC == NULL)
        return NULL;
    for (int i = 0; i < charCount; i++)
    {
        // convert to char (1 byte); the last iteration copies the '\0'
        filePathC[i] = (char)pwchar[i];
    }
    return filePathC;
}
One could also convert wchar_t --> wstring --> string --> char:
#include <string>
wchar_t wide = L'A';
std::wstring wstrValue(1, wide); // one-character wstring (indexing an empty wstring is undefined)
std::string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string (narrows each element)
char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int((unsigned char)255) of course (a plain char(255) may be -1, since char is signed on many platforms), but that just means they have the same integer value. They may not represent the same characters.
You would see such a discrepancy even on most Windows PCs. For instance, in Windows code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, since C++ doesn't require ASCII. On IBM systems using EBCDIC, for example, you may see that 'A' != 65.

C++, Get text from a website, part 3

So, thanks for all the help, guys. I just have one last problem: I am putting the website source in a char variable and then reading the product title (I have gotten that far). However, it only works if I take part of the source, or only the HTML of one of the featured products on Newegg's page. I think the program is crashing because it doesn't know which title to pick, when I need to get all three titles and put them into an array. Any ideas? Thanks. Here is the parser code:
http://paste2.org/p/809045
Any solution is greatly appreciated.
/**
* num_to_next -
* takes in a pointer to a string and then counts how many
* characters are until the next occurance of the specified character
* #ptr: the pointer to a string in which to search
* #c: char delimiter to search until
**/
int num_to_next(char *ptr, char c)
{
unsigned int i = 0;
for (i = 0; i < strlen(ptr); i++) {
if (ptr[i] == c) {
return i;
}
}
return -1;
}
/**
* space_to_underscore -
* this should help to alleviate some problems when dealing with
* filepaths that have spaces in them (basically just changes all
* spaces in a string to underscores)
* #string: the string to convert, yo
**/
int space_to_underscore(char *string)
{
for (unsigned int i = 0; i < strlen(string); i++) {
if (string[i] == ' ') {
string[i] = '_';
}
}
return 0;
}
char *file_name = (char *)malloc(sizeof(char *)); // allocate memory for where the app name will be stored
memset(file_name, 0, sizeof(file_name)); // zero the memory
char td_one[] = "<ul class=\"featureCells\"><li id=\"ItemCell\" class=\"cell\">";
char *pstr = strstr(buffer, td_one) + strlen(td_one) + 6; // buffer is the source
char *poop = pstr + num_to_next(pstr, '>') + 1;
int blah = num_to_next(poop, '<');
strncpy(file_name, poop, blah);
// null terminate the string //
file_name[blah] = '\0';
space_to_underscore(file_name);
MessageBox(NULL, file_name, "Product Name", MB_OK);
free(file_name);
I'm not sure if these are your only problems, but...
First, you can't do char* filename = (char*)malloc(sizeof(char*)) (well, you can, but that's not what you actually want from your app).
What you want is char* filename = (char*)malloc(SIZE_OF_YOUR_STRING * sizeof(char));. In other words, you can't allocate an abstract buffer for your string; you have to know its expected size. Strictly speaking you don't need to write sizeof(char), because it always equals 1, but writing it this way can help you (or somebody else) see that this block stores a string as an array of chars.
Another example of the same thing: char* filename = (char*)malloc(65); is fine and allocates a block of memory that can hold 65 chars (64 characters plus the terminating '\0').
Going further (where you do the memset): filename is a plain pointer, so sizeof(filename) returns the size of the pointer, not of your buffer. What you should pass here is the size you allocated (for example SIZE_OF_YOUR_STRING). strlen(filename) would not work either, because the memory is not initialized yet and strlen needs a terminating '\0' to find.
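To get all three titles, one option is to keep searching after each match instead of stopping at the first one, which is all a single strstr call does. The sketch below uses std::string::find, so there is no fixed-size malloc buffer to overflow. The function name extract_all and the marker in the checks are hypothetical stand-ins, since the real Newegg markup isn't shown here; for real pages an HTML parser is more robust.

```cpp
#include <string>
#include <vector>

// For every occurrence of marker in html, collects the text between the next
// '>' and the following '<' (e.g. the visible text of a tag that starts with marker).
std::vector<std::string> extract_all(const std::string& html, const std::string& marker)
{
    std::vector<std::string> titles;
    std::string::size_type pos = html.find(marker);
    while (pos != std::string::npos) {
        const std::string::size_type start = html.find('>', pos + marker.size());
        if (start == std::string::npos)
            break;
        const std::string::size_type end = html.find('<', start + 1);
        if (end == std::string::npos)
            break;
        titles.push_back(html.substr(start + 1, end - start - 1));
        pos = html.find(marker, end);   // continue searching after this match
    }
    return titles;
}
```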