Create string with ESC characters - c++

I can initialise a string with an escape sequence such as std::string s="\065", and this creates the "A" character. But what if I need ASCII character 200? std::string s="\200" does not work. Why?

"\065" does not create "A" but "5". "\065" is interpreted as an octal number, which is decimal 53, which is character '5'.
std::string s = "\xc8" ; (hex) gives me the character 200.

Because \065 is actually in octal form; to specify character 200, try \310 or \xc8. And BTW, \065 is not the character A but 5 (A would be \101 or \x41).
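To make the octal/hex point concrete, here is a minimal sketch. The helper firstByte and the variable names are hypothetical, introduced only for illustration, and the sketch assumes a common compiler (GCC, Clang, MSVC) where a narrow char holds the byte value 200 without complaint.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: numeric value (0-255) of the first byte of s.
// Going through unsigned char avoids negative values where char is signed.
int firstByte(const std::string& s) {
    assert(!s.empty());  // precondition: there must be a byte to inspect
    return static_cast<unsigned char>(s[0]);
}

// The literals discussed above.
const std::string octal065 = "\065";  // octal 65  = decimal 53  = '5'
const std::string octal101 = "\101";  // octal 101 = decimal 65  = 'A'
const std::string octal310 = "\310";  // octal 310 = decimal 200
const std::string hexC8    = "\xc8";  // hex C8    = decimal 200
```

Calling firstByte(hexC8) or firstByte(octal310) yields 200 in both cases. How that byte is displayed when printed depends on the encoding your terminal uses.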

Related

How do I mimic a Unicode JS regular expression in Lucee

I am trying to write a regular expression in Lucee to mimic the JS on the front end. Since Lucee's regex doesn't seem to support Unicode, how do I do it?
This is the JS
function charTest(k){
var regexp = /^[\u00C0-\u00ff\s -\~]+$/;
return regexp.test(k)
}
if(!charTest(thisKey)){
alert("Please Use Latin Characters Only");
return false;
}
This is what I have tried in Lucee
regexp = '[\u00C0-\u00ff\s -\~]+/';
writeDump(reFind(regexp,"测"));
writeDump(reFind(regexp,"test"));
I have also tried
regexp = "[\\p{L}]";
but the dump is always 0
EDIT: Give me one second. I think I interpreted your initial JS regex incorrectly. Fixing it.
EDIT 2: It was more than a second. Your original JS regex was:
"/^[\u00C0-\u00ff\s -\~]+$/". This is:
Basic parts of regex:
"/..../" == signifies the start and stop of the regex.
"^" == signifies the start of the string
"[...]" == signifies a character class: any one character from this group
"+" == signifies at least one of the previous
"$" == signifies the end of the string
Identifiers in the regex:
"\u00c0-\u00ff" == Unicode character range from Character 192 (À)
to Character 255 (ÿ). This is part of the Latin-1
Supplement block of the Unicode character set.
"\s" == signifies any whitespace character (space, tab, line break)
" -\~" == signifies the range from the space character to the
(escaped) tilde character (~). This is ASCII 32-126, which
covers the printable characters of ASCII (the DEL
character (127) is excluded). This includes alphanumerics and most punctuation.
I missed the second half of your printable Latin basic character set. I've updated my regex and tests to include it. There are ways to shorthand some of these identifiers, but I wanted it to be explicit.
You can try this:
<cfscript>
//http://www.asciitable.com/
//https://en.wikipedia.org/wiki/List_of_Unicode_characters
//https://en.wikipedia.org/wiki/Latin_script_in_Unicode
function charTest(k) {
return
REfind("[^"
& chr(32) & "-" & chr(126)
& chr(192) & "-" & chr(255)
& "]",arguments.k)
? "Please Use Latin Characters Only"
: ""
;
}
// TESTS
writeDump(charTest("测")); // Not Latin
writeDump(charTest("test")); // All characters between 32 & 126
writeDump(charTest("À")); // Character 192 (in range)
writeDump(charTest("À ")); // Character 192 and Space
writeDump(charTest(" ")); // Space Characters
writeDump(charTest("12345")); // Digits ( character 48-57 )
writeDump(charTest("ð")); // Character 240 (in range)
writeDump(charTest("ℿ")); // Character 8511 (outside range)
writeDump(charTest(chr(199))); // CF Character (in range)
writeDump(charTest(chr(10))); // CF Line Feed Character (outside range)
writeDump(charTest(chr(1000))); // CF Character (outside range)
writeDump(charTest("
")); // CRLF (outside range)
writeDump(charTest(URLDecode("%00", "utf-8"))); // CF Null character (outside range)
//writeDump(asc("测"));
//writeDump(asc("test"));
//writeDump(asc("À"));
//writeDump(asc("ð"));
//writeDump(asc("ℿ"));
</cfscript>
https://trycf.com/gist/05d27baaed2b8fc269f90c7c80a1aa82/lucee5?theme=monokai
All the regex does is look at your input string: if it finds any character outside the ranges chr(32)-chr(126) and chr(192)-chr(255), the function returns your chosen string, else it returns an empty string.
I think you can access the UNICODE characters below 255 directly. I'll have to test it.
Does this function need to show an alert, like the JavaScript? If you need to, you can just output a 1 or 0 to indicate whether the function actually found a character outside the allowed ranges.

Translate \n new line from Char to String in SML/NJ

I am trying to convert #"\n", a Char, to "\n", a String. I used
Char.toString(#"\n");
and it gives
val it = "\\n" : string
Why doesn't it return "\n"?
From the documentation, Char.toString:
returns a printable string representation of the character, using, if
necessary, SML escape sequences.
It also specifies that some control characters are converted to two-character escape sequences, and \n is one of them.
To return a string of size one, use String.str.
- String.str(#"\n");
val it = "\n" : string

Replace all non-ASCII characters in a string by their ASCII equivalent

Using Qt/C++, I need to generate a string with only a subset of ASCII characters : letters, digits, hyphen, underscore, period, or colon.
As input, I can have anything.
So I try to apply some rules :
every QChar::isSpace will be replaced with an underscore
every non-ASCII letter will be replaced with an ASCII equivalent (example : "é" will be replaced with "e")
every other non-ASCII character will be removed
Is there any simple way with Qt/C++ to apply the 2nd and the 3rd rule ?
Thanks
Yes, there is a way.
First, apply Unicode normalization to your string with
QString::normalized. Normalization is needed to separate diacritical marks from letters and to replace some fancy symbols with ASCII equivalents. Here you can read about normalization forms.
Then keep only the chars which can be encoded in Latin-1. This can be tested with the
toLatin1 method of QChar.
char QChar::toLatin1() const
Returns the Latin-1 character equivalent to the QChar, or 0. This is mainly useful for non-internationalized software.
...
QString testString = QString::fromUtf8("Ceñía-üÏÖ马克ñ");
QString normalized = testString.normalized(QString::NormalizationForm_KD);
QString result;
std::copy_if(normalized.begin(), normalized.end(), std::back_inserter(result), [](const QChar& c) {
return c.toLatin1() != 0;
});
qDebug() << result; // Cenia-uIOn

How to find the character "\" in a string?

I am trying to manipulate a string by finding the \ character in the string Find\inHere. However, I can't put that as an input in test.find('\', 0). It won't work and gives me the error "missing terminating character." Is there a way to fix test.find('\', 0)?
string test = "Find\inHere";
int x = test.find('\', 0); // error on this line
cout << x; // x should equal 4
\ is the character used to introduce escape sequences, for example \n for a newline, or \xDB for the character with hexadecimal code DB.
So, in order to search for this special character, you have to escape it by adding another \. Use:
test.find("\\",0);
EDIT : Also, your first string literal does not actually contain Find\inHere, because \i is not a recognized escape sequence, so compilers typically warn about it or reject it. To avoid this in the same way, write "Find\\inHere".
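Here is a small sketch of the corrected version. The function name backslashIndex and the variable names are hypothetical, and the raw string literal is an alternative spelling (C++11 and later) that the answer above does not use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first backslash in s.
// This sketch assumes a backslash is present and asserts on that.
std::size_t backslashIndex(const std::string& s) {
    const std::string::size_type pos = s.find('\\');  // '\\' is ONE backslash char
    assert(pos != std::string::npos);
    return pos;
}

// Two spellings of the same 11-character text Find\inHere.
const std::string escaped = "Find\\inHere";    // backslash doubled in a normal literal
const std::string raw     = R"(Find\inHere)";  // raw literal: no escape processing
```

With either spelling, backslashIndex returns 4, which is the value the question expected for x. Note that find returns std::string::npos when nothing matches, so storing the result in an int, as in the question, can hide a failed search.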

Checking if a string is a hexadecimal value

I have a char whose value is 183, which I got while doing RTF parsing. This is a special character.
When I create a string out of it, I get the hexadecimal string \xb7. This is a string of length one.
How can I determine whether the string is prepended with \x, or whether it is a hexadecimal string?
string substr(1,char);
cout<<substr<<substr.length();
Regards