C++ new line not translating

First off, I'm a complete beginner at C++.
I'm coding something using an API, and would like to pass text containing new lines to it, and have it print out the new lines at the other end.
If I hardcode whatever I want it to print out, like so
printInApp("Hello\nWorld");
it does come out as separate lines at the other end. But if I retrieve the text from the app using a method that returns a const char*, then pass it straight to printInApp (which takes a const char* as its argument), it comes out as a single line.
Why is this, and how would I go about fixing it?

It is the compiler that processes escape codes in string literals, not the runtime functions. This is why you can, for example, write "char c = '\n';": the compiler simply compiles it as "char c = 10".
If your text instead contains the escape code as two separate characters, a '\' followed by an 'n' (e.g. because it was read as such from a file or returned by the app), you will need to write (or use an existing) string function that finds the escape codes and converts them to the values they stand for, e.g. converting a '\' followed by an 'n' into a newline (ASCII value 10).
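A minimal sketch of such a function follows. It assumes the text uses C-style escapes and that only \n, \t and \\ need handling; the name unescape is hypothetical and not part of any library or of the API in the question. Any other backslash sequence is copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts two-character escape sequences found in text
// obtained at run time (a backslash followed by a letter) into the single
// characters they stand for. Only \n, \t and \\ are handled; any other
// backslash sequence is copied through unchanged.
std::string unescape(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\\' && i + 1 < in.size())
        {
            const char next = in[i + 1];
            char replacement = '\0';
            if (next == 'n')
                replacement = '\n';   // newline, ASCII 10
            else if (next == 't')
                replacement = '\t';   // tab, ASCII 9
            else if (next == '\\')
                replacement = '\\';   // escaped backslash

            if (replacement != '\0')
            {
                out.push_back(replacement);
                ++i;                  // skip the letter after the backslash
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

With the question's printInApp, the call could then look like printInApp(unescape(text).c_str()). The pointer returned by c_str() stays valid until the end of that full expression, which is enough as long as printInApp does not keep the pointer for later use.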

Related

How to convert accented character to Hexadecimal Unicode in VBScript? [duplicate]

I'd like to create a .properties file from a VBScript, to be used in a Java program. I'm going to use some strings in languages that use characters outside the ASCII range, so I need to replace these characters with their Unicode escape codes. This would be \u0061 for a, \u0062 for b, and so on.
Is there a way to get the Unicode code for a char in VBScript?
VBScript has the AscW function that returns the Unicode (wide) code of the first character in the specified string.
Note that AscW returns the character code as a decimal number, so if you need it in a specific format, you'll have to write some additional code for that (and the problem is, VBScript doesn't have decent string formatting functions). For example, if you need the code formatted as \unnnn, you could use a function like this:
WScript.Echo ToUnicodeChar("✈") ' \u2708

' Formats the code of the first character of Char as \unnnn
Function ToUnicodeChar(Char)
  Dim str
  str = Hex(AscW(Char))
  ToUnicodeChar = "\u" & String(4 - Len(str), "0") & str
End Function

Decoding %E6%B0%94%E6%97%8B%E5%93%88%E5%88%A9.txt to a valid string

I am trying to decode the filename*= field of a Content-Disposition header. I get a string like this:
%E6%B0%94%E6%97%8B%E5%93%88%E5%88%A9.txt
I have figured out that replacing % with \x works fine, and I get the correct file name:
气旋哈利.txt
Is there a standard way of doing this in C++? Is there any library available to decode this?
I tried
boost::replace_all(name, "%", "\\x");
boost::locale::generator gen;
std::locale locl = gen.generate("en_US.utf-8");
decoded_data = boost::locale::conv::from_utf( encoded_data, locl);
But it prints the replaced string instead of Chinese characters:
\xE6\xB0\x94\xE6\x97\x8B\xE5\x93\x88\xE5\x88\xA9.txt
Any idea where I am going wrong?
Escape codes like "\xE6" only work in string and character literals, not in strings in general. That's because they are handled by the compiler when it compiles the program.
However, it's not very hard to do the conversion yourself, using a simple loop that checks for the '%' character, takes the next two characters, converts them from hex to a number, and uses that number as a "character" (byte).
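Here is a sketch of that loop. It assumes the input is only the percent-encoded part; for a filename*= value the charset prefix (such as UTF-8'') is assumed to have been stripped already. The names percent_decode and hex_value are hypothetical. The result is the raw decoded bytes, which for this header are UTF-8, so no \x replacement or locale conversion is needed; the bytes display as 气旋哈利.txt wherever the output is treated as UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX sequences into the raw bytes they encode. The bytes are not
// re-encoded in any way. Malformed sequences such as "%zz" or a trailing
// "%" are copied through unchanged.
std::string percent_decode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```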

Convert path to \\

Okay, after two days of searching the web and MSDN, I didn't find any real solution to this problem, so I'm gonna ask here in the hope that I've overlooked something.
I have an Open dialog window, and after I get the location of the selected file, it gives me the string in the following form: C:\file.exe. For the next part of my program I need C:\\file.exe. Is there any Microsoft function that can solve this problem, or some workaround?
ofn.lpstrFile = fileName;
char fileNameStr[sizeof(fileName)+1] = "";
if (GetOpenFileName(&ofn))
strcpy(fileNameStr, fileName);
DeleteFile(fileName); // doesn't work, invalid path
I've posted only this part of the code, because everything else works fine and isn't relevant to this problem. Any assistance is greatly appreciated, as I've been going mad for the last two days.
You are confusing the requirement in C and C++ to escape backslash characters in string literals with what Windows requires.
Windows allows double backslashes in paths in only two circumstances:
Paths that begin with "\\?\"
Paths that refer to share names such as "\\myserver\foo"
Therefore, "C:\\file.exe" is never a valid path.
The problem here is that Microsoft made the (disastrous) decision decades ago to use backslashes as path separators rather than forward slashes like UNIX uses. That decision has been haunting Windows programmers since the early 1980s because C and C++ use the backslash as an escape character in string literals (and only in literals).
So in C or C++, if you type something like DeleteFile("c:\file.exe"), what DeleteFile sees is not c:\file.exe. The compiler sees the backslash and interprets it to mean that the next character isn't what it appears to be. In this case, \f is the escape sequence for the form-feed character (ASCII 12, 0x0C), so DeleteFile receives "c:" followed by an unprintable form feed followed by "ile.exe", and a form feed isn't valid in a file name.
So how do you create the path "c:\file.exe" in a C/C++ program? You have two choices:
"c:/file.exe"
"c:\\file.exe"
The first choice works because in the Win32 API (and only the API, not the command line), forward slashes in paths are accepted as path separators. The second choice works because the first backslash starts an escape sequence, and the escape sequence \\ stands for exactly one backslash character, so the string the program actually contains is c:\file.exe.
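The compile-time checks below illustrate the difference, assuming a C++11 or later compiler (the variable names are only for illustration). The last literal is a raw string literal, a C++11 feature not mentioned above, in which a backslash is not an escape character at all.

```cpp
// "\f" is the form-feed escape (ASCII 12): 10 characters, no backslash.
constexpr char broken[] = "c:\file.exe";
// "\\" is the escape for one backslash: 11 characters, exactly c:\file.exe.
constexpr char escaped[] = "c:\\file.exe";
// Forward slashes need no escaping and are accepted by the Win32 API.
constexpr char forward[] = "c:/file.exe";
// C++11 raw string literal: the backslash is taken literally.
constexpr char raw[] = R"(c:\file.exe)";

static_assert(sizeof(broken) == 11, "10 characters + terminating NUL");
static_assert(broken[2] == '\f', "backslash and f became one form feed");
static_assert(sizeof(escaped) == 12, "11 characters + terminating NUL");
static_assert(escaped[2] == '\\', "a single backslash character");
static_assert(sizeof(raw) == sizeof(escaped), "same length as escaped");
static_assert(forward[2] == '/', "a plain forward slash");
```

The same rules apply to wide literals such as L"c:\\file.exe".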
The library Boost.Filesystem "provides portable facilities to query and manipulate paths, files, and directories".
In short, you should not use plain strings as file or path names. Use boost::filesystem::path instead. You can still initialize it from a std::string or char*, and you can convert it back to std::string, but all manipulations and decorations will be done correctly by the class.
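Boost.Filesystem was later adopted, with some differences, into the standard library as std::filesystem in C++17. The sketch below uses that standard version rather than Boost (a substitution, since the answer names Boost), and the helper name make_path is hypothetical. operator/ inserts the platform's preferred separator, so the code never builds or doubles backslashes by hand.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Joins a directory and a file name. operator/ adds the separator itself,
// so the caller never has to escape or double any backslashes.
fs::path make_path(const std::string& dir, const std::string& file)
{
    return fs::path(dir) / file;
}
```

The pieces can then be queried through the class, e.g. make_path("C:/temp", "file.exe").filename() yields file.exe, and std::filesystem::remove accepts such a path directly.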
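I'm guessing you mean converting "C:\file.exe" to "C:\\file.exe":
#include <string>

// Doubles every backslash: C:\file.exe becomes C:\\file.exe
std::string double_backslashes(const std::string& input_string)
{
    std::string output_string;
    for (auto character : input_string)
    {
        if (character == '\\')  // one backslash character
        {
            output_string.push_back(character);  // emit an extra copy
        }
        output_string.push_back(character);
    }
    return output_string;
}
Note that the code actually looks for a single backslash to duplicate; the double backslash written in the code ('\\') is just the escape sequence for that one backslash character.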

find if string starts with \U in Python 3.3

I have a string and I want to find out if it starts with \U.
Here is an example
myStr = '\U0001f64c\U0001f60d\U0001f4a6\U0001f445\U0001f4af'
I was trying this:
myStr.startswith('\\U')
but I get False.
How can I detect \U in a string?
The larger picture:
I have a list of strings. Most of them are normal English words, but a few are similar to what I have shown in myStr. How can I distinguish them?
The original string does not contain the characters \U. It contains the Unicode escape sequence \U0001f64c, which stands for a single Unicode character.
Therefore, it does not make sense to try to detect \U in the string you have given.
Trying to detect the \U in that string is similar to trying to detect \x in the C string "\x90".
It makes no sense because the interpreter has already read the escape sequence and converted it. Of course, if you want to detect the first Unicode character in that string, that works fine:
myStr.startswith('\U0001f64c')
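Note that if you define the string with a literal backslash followed by U, for example as a raw string like this, you can detect it just fine. Based on some experimentation, I believe Python 2.7.6 defaults to this behavior (in Python 2, \U is only recognized as an escape sequence in unicode literals such as u'...').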
myStr = r'\U0001f64c\U0001f60d\U0001f4a6\U0001f445\U0001f4af'
myStr.startswith('\\U') # Returns True.
Update: The OP requested a way to convert from the Unicode string into the raw string above.
I will show the solution in two steps.
First observe that we can view the raw hex for each character like this.
>>> [hex(ord(x)) for x in myStr]
['0x1f64c', '0x1f60d', '0x1f4a6', '0x1f445', '0x1f4af']
Next, we format it by using a format string.
formatString = "".join(r'\U%08x' for x in myStr)
output = formatString % tuple(ord(x) for x in myStr)
output.startswith("\\U")  # Returns True.
Note, of course, that since we are converting a Unicode string and formatting it this way deliberately, it is guaranteed to start with \U. However, I assume your actual application is not just to detect whether it starts with \U.
Update2: If the OP is trying to differentiate between "normal English" strings and "Unicode Strings", the above approach will not work, because all characters have a corresponding Unicode representation.
However, one heuristic you might use to check whether a string looks like plain ASCII is to check whether any of its characters fall outside the normal ASCII range. Assuming that you consider the normal range to be 32 to 127 (look at an ASCII table and decide what you want to include), you can do something like the following.
def isNormal(myStr):
    myChars = [ord(x) for x in myStr]
    return all(31 < x < 128 for x in myChars)
This can be done in one line, but I separated it to make it more readable.
Your string:
myStr = '\U0001f64c\U0001f60d\U0001f4a6\U0001f445\U0001f4af'
is not foreign-language text. It is five Unicode characters, which are (in order):
PERSON RAISING BOTH HANDS IN CELEBRATION
SMILING FACE WITH HEART-SHAPED EYES
SPLASHING SWEAT SYMBOL
TONGUE
HUNDRED POINTS SYMBOL
If you want to get strings that only contain 'normal' characters, you can use something like this:
import re

if re.search(r'[^A-Za-z0-9\s]', myStr):
    # String contained 'weird' characters.
    pass
Note that this will also trip on characters like é, which are sometimes used in English words of French origin, and on punctuation such as apostrophes and periods.

Unrecognizable character in C++

I'm programming an application that converts .txt files to bags of words for text mining. However, I keep getting non-alphabetic characters (like ¾ and =) even though my application filters out non-alphabetic characters:
My vector passes through a loop which erases strings that begin with a char whose ASCII value is outside [65,90] (A to Z). These characters also pass the isalpha test. It seems like these characters can't be distinguished from alphabetic characters.
I don't see how I can remove these weird strings dynamically from my vector of strings. I need help.
My full code is not included here because it is quite long for a forum post.
This part of my code fails to get rid of the strings beginning with non-alphabetic characters:
for (unsigned int i=0; i<token24.size();i++){
    string temp = token24[i];
    char c = temp[0];
    if(c>90||c<65){
        token24.erase(token24.begin()+i);
        i--;
    }
}
I also tried with the condition
(c>'Z'||c<'A')
You could always replace those characters with whitespace, but that only handles specific characters, not the larger problem.
I don't think we can do anything for you until we see the code.
The most important part of programs like yours is handling the content of the .txt file. Such a file can contain Unicode text, which in turn can be encoded, for example, as UTF-8. In that case, a single byte may be only part of a character, not a character by itself. Are you sure you load (and, if necessary, decode) the file properly?
Also, don't you think that lowercase letters are valid alphabetic characters too?
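A sketch of a more robust version of the question's loop follows. It assumes the file is UTF-8 and, like the original loop, checks only the first byte of each token. It accepts both upper- and lowercase ASCII letters, converts the byte to unsigned char before comparing (on platforms where char is signed, bytes of 0x80 and above are negative values), and replaces the manual i-- bookkeeping with the standard erase-remove idiom. The helper names are hypothetical. It avoids std::isalpha because that function's result depends on the current locale, and passing it a negative char value is undefined behavior.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if the first byte is an ASCII letter (A-Z or a-z). Every byte of a
// UTF-8 multi-byte character such as "¾" (0xC2 0xBE) is >= 0x80 and fails.
bool starts_with_ascii_letter(const std::string& s)
{
    if (s.empty())
        return false;
    // Compare as unsigned char so bytes >= 0x80 stay positive values.
    const unsigned char c = static_cast<unsigned char>(s[0]);
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Erase-remove idiom: drops every token that does not start with an ASCII
// letter, keeping the remaining tokens in their original order.
void drop_non_alpha_tokens(std::vector<std::string>& tokens)
{
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) {
                                    return !starts_with_ascii_letter(t);
                                }),
                 tokens.end());
}
```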