I have a server written in C++, and when receiving a chat string, I'd like to remove weird special characters like the one created by Ctrl + Backspace (though not other symbols like :)]>_ etc.)
I'm using Boost, too.
edit: Why'd this get -1'd? It's a legit question.
Sounds like isprint might help. It returns true for any printable character, ie. not for control characters and whitespaces. For a list of what is considered printable and what not, take a look at this table.
I haven't used it, and this probably isn't the best way to do it, but have you considered trying the boost regex library (i.e., regex_replace)?
Related
I find it hard to explain but I will try my best. Some times in Linux- in the Terminal- things get printed but you can still write over them. eg when using wget you get a progress bar like this:
[===================> ]
Now if you type something while it is doing this it will 'overwrite' it. My question is how to recreate this in c++.
Will you use something like
cout <<
or something else?
I hope you understand what I am getting at...
btw I am using the most recent version of Arch with xfce4
Printing a carriage return character \r is typically interpreted in Linux as returning you to the beginning of the line. Try this, for example:
std::cout << "Hello\rJ";
The output will be:
Jello
This does depend on your terminal, however, so you should look up the meaning of particular control characters for your terminal.
For a more cross-platform solution and the ability to do more complex text-based user interfaces, take a look at ncurses.
You can print the special character \b to go back one space. Then you can print a space to blank it out, or another character to overwrite what was there. You can also use \r to return to the beginning of the current output line and write again from there.
Controlling the terminal involved sending various escape sequences to it, in order to move the cursor around and such.
http://www.ibiblio.org/pub/historic-linux/ftp-archives/tsx-11.mit.edu/Oct-07-1996/info/vt102.codes
You could also use ncurses to do this.
I want to detect if the user has inputted a non-ASCII (otherwise incorrectly known as Unicode) character (for example, り) in a file save dialog box. As I am using Qt, any non-ASCII characters are properly saved in a QString, but I can't figure out how to determine if any of the characters in that string are non-ASCII before converting the string to ASCII. That character above ends up getting written to the filesystem as ã‚Š.
There is no such a built-in feature in my understanding.
About 1-2 years ago, I was proposing an isAscii() method for QString/QChar to wrap the low-level Unix isacii() and the corresponding Windows function, but it was rejected. You could have written then something like this:
bool isUnicode = !myString.at(3).isAcii();
I still think this would be a handy feature if you can convince the maintainer. :-)
Other than that, you would need to check against the ascii boundary yourself, I am afraid. You can do this yourself as follows:
bool isUnicode = myChar.unicode() > 127;
See the documentation for details:
ushort QChar::unicode () const
This is an overloaded function.
The simplest way is to check every charachter's code (QChar::unicode()) to be below 128 if you need pure 7-bit ASCII.
To write it in compact way without loop, you can use regular expression:
bool containsNonASCII = myString.contains(QRegularExpression(QStringLiteral("[^\\x{0000}-\\x{007F}]")));
this works for me :
isLetterOrNumber()
ot_id += QChar((short) b.to_ulong()).isLetterOrNumber() ? QChar((short) b.to_ulong()) : QString("");
I am using tinyxml to save input from a text ctrl. The user can copy whatever they like into the text box and it gets written to an xml file. I'm finding that the new lines don't get saved and neither do & characters. The weird part is that tinyxml just discards them completely without any warning. If I put a & into the textbox and save, the tag will look like:
<textboxtext></textboxtext>
newlines completely disappear as well. No characters whatsoever are stored. What's going on? Even if I need to escape them with & or something, why does it just discard everything? Also, I can't find anything on google regarding this topic. Any help?
EDIT:
I found this topic which suggest the discarding of these characters may be a bug.
TinyXML and preserving HTML Entities
It is, apparently, a bug in TinyXml.
The simple workaround is to escape anything that it might not like:
&, ", ', < and > got their regular xml entities encoding
strange characters (read non-alphanumerical / regular punctuation) are best translated to their unicode codepoint: &#....;
Remember that TinyXml is before all a lightweight xml library, not a full-fledged beast.
I am trying to create a generic stylesheet that can convert all Latin characters in Unicode to uppercase ASCII characters. Using <xsl:character-map> works well except for one thing: namespaces. The character map converts all of my namespaces to upper case, which I do not want.
Is there a way to utilize a character map to do what I want to all the other nodes while leaving the namespaces untouched? I see the disable-output-escaping attribute might be an option, but I haven't been able to make it work.
Looks like this is an Oracle-specific issue. I'll probably post this on the Oracle Forums then.
Thanks for all the feedback!
I'm writing some autosuggest functionality which suggests page names that relate to the terms entered in the search box on our website.
For example typing in "rubbish" would suggest "Rubbish & Recycling", "Rubbish Collection Centres" etc.
I am running into a problem that some of our page names include macrons - specifically the macron used to correctly spell "Māori" (the indigenous people of New Zealand).
Users are going to type "maori" into the search box and I want to be able to return pages such as "Māori History".
The autosuggestion is sourced from a cached array built from all the pages and keywords. To try and locate Māori I've been trying various regex expressions like:
preg_match('/\m(.{1})ori/i',$page_title)
Which also returns page titles containing "Moorings" but not "Māori". How does preg_match/ preg_replace see characters like "ā" and how should I construct the regex to pick them up?
Cheers
Tama
Use the /u modifier for utf-8 mode in regexes,
You're better of on a whole with doing an iconv('utf-8','ascii//TRANSLIT',$string) on both name & search and comparing those.
One thing you need to remember is that UTF-8 gives you multi-byte characters for anything outside of ASCII. I don't know if the string $page_title is being treated as a Unicode object or a dumb byte string. If it's the byte string option, you're going to have to do double dots there to catch it instead, or {1,4}. And even then you're going to have to verify the up to four bytes you grab between the M and the o form a singular valid UTF-8 character. This is all moot if PHP does unicode right, I haven't used it in years so I can't vouch for it.
The other issue to consider is that ā can be constructed in two ways; one as a single character (U+0101) and one as TWO unicode characters ('a' plus a combining diacritic in the U+0300 range). You're likely just only going to ever get the former, but be aware that the latter is also possible.
The only language I know of that does this stuff reliably well is Perl 6, which has all kinds on insane modifiers for internationalized text in regexps.