I want to print the first letter of a string.
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "다람쥐 헌 쳇바퀴 돌고파.";
    cout << str.at(0) << endl;
}
I want '다' to be printed, as it would be in Java, but '?' is printed instead.
How can I fix it?
That text you have in str -- how is it encoded?
Unfortunately, you need to know that to get the first "character". The std::string class only deals with bytes. How bytes turn into characters is a rather large topic.
The magic word you are probably looking for is UTF-8. See here for more information: How do I properly use std::string on UTF-8 in C++?
If you want to go down this road yourself, look here: Extract (first) UTF-8 character from a std::string
And if you're really interested, here's an hour-long video that is actually a great explanation of text encoding: https://www.youtube.com/watch?v=_mZBa3sqTrI
Rule I must abide by
Do not use loops or character arrays to process strings for any of the questions below. Use member functions of the string class. You can use a loop to read the file and to count the number of processors.
Some Tips
Here are some functions that you might find useful:
File class: getline
String class: find, rfind, substr, length, c_str, constant npos
Misc. functions: atoi, atof
(may require the C standard library for C++, i.e., )
istringstream
(Both of the above are ways to convert a string to a number.)
Here is an example string I would need to extract:
"46 bits physical, 48 bits virtual"
I can go through the same string twice. I'd want to grab 46 and store it and then do the same for 48.
I'm not sure the best way to go about this. Is it possible to do something like this:
string.find_first_of(integer);
string.find_last_not_of(integer);
Or possibly regex? I think I can use that as long as I don't need to use a 3rd party library or anything like that.
The following ended up working for me.
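#include <iostream>
#include <sstream>
#include <string>
using namespace std;
string myString = "hello 47";
string word;
int val;
istringstream iss (myString);
// Extracting straight into an int would fail on "hello",
// so read the leading word first and then the number.
iss >> word >> val;
cout << val << endl;
// The output of val will be 47.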
Since you indicated in the comments that STL is allowed, you can use a generic programming approach relying on STL algorithms. For example,
#include <iostream>
#include <algorithm>
#include <cctype>   // isspace, isdigit
#include <iterator>
#include <string>
int main()
{
    using namespace std;
    string haystack = "46 bits physical, 48 bits virtual";
    string result;
    // Copy every character except those that are neither whitespace nor a digit.
    remove_copy_if(begin(haystack), end(haystack),
                   back_inserter(result),
                   [](unsigned char c) { return !isspace(c) && !isdigit(c); } );
    cout << result;
}
You basically treat the characters in the string as a stream of inputs and filter out all non-digit characters, keeping whatever delimiter character you want to use. My example keeps whitespace as the delimiter.
The above gives the output
46 48
I am trying to print Unicode characters in C++. My Unicode characters are Old Turkic, and I have the font. When I use a letter's code, it gives me a different character. For example:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string str = "\u10C00"; // My character's unicode code.
    cout << str << endl;
    return 0;
}
This snippet prints a different letter followed by a 0.
For example, it gives me this (let's assume that I want to print the letter 'Ö'):
A0
But when I copy and paste the actual letter into my source from the character-map application in Ubuntu, I get what I want. What is the problem here? I want to use the character-code form "\u10C00", but it doesn't work properly. I think the escape is too long, so the compiler uses only the first 6 characters and leaves the 0 at the end. How can I fix this?
After the \u escape there must be exactly 4 hexadecimal digits. If you need more, use \U, which takes exactly 8 hexadecimal digits.
Example:
"\u00D6" // 'Ö' letter
"\u10C00" // incorrect escape code!
"\U00010C00" // your character
std::string does not really support Unicode; use std::wstring instead.
But even std::wstring can have problems, since wchar_t is not the same size on every platform (it is 16 bits on Windows, for example, where a character such as U+10C00 needs two code units).
An alternative would be to use an external string class such as Glib::ustring if you use gtkmm, or QString in the case of Qt.
Almost every GUI toolkit, and many other libraries, provide their own string class to handle Unicode.
I am reading an XML file into a stringstream buffer in order to parse it using RapidXML. RapidXML is only parsing the names of the XML nodes, but none of their attribute names or values. After some experimentation, I discovered that the problem is not likely to be with RapidXML, but with conversion of the stringstream buffer to a string using std::string content(buffer.str());. The '=' characters that are so important to XML parsing are converted to ' ' (space characters), prior to any RapidXML processing.
The character replacement is evident in the console window when the cout << calls are made in the code below, which is before RapidXML gets its hands on the string.
My code is as follows:
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <conio.h>
#include <string>
#include <stdlib.h>
#include <rapidxml.hpp>
#include <vector>
#include <sstream>
using namespace std;
using namespace rapidxml;
//... main() and so forth, all works fine...
ifstream file(names.at(i)); // names.at(i) works fine...
//...
file.read(fileData, fileSize); // works fine...
//...
// Create XML document object using RapidXML:
xml_document<> doc;
//...
std::stringstream buffer;
buffer << file.rdbuf();
// This is where everything looks okay (i.e., '=' shows up properly):
cout << "\n" << buffer.str() << "\n\nPress a key to continue...";
getchar();
file.close();
std::string content(buffer.str());
// This is where the '=' are replaced by ' ' (space characters):
cout << "\n" << content << "\n\nPress a key to continue...";
getchar();
// Parse XML:
doc.parse<0>(&content[0]);
// Presumably the lack of '=' is preventing RapidXML from parsing attribute
// names and values, which always follow '='...
Thanks in advance for your help.
p.s. I followed advice on using this technique for reading an entire XML file into a stringstream, converting it to a string, and then feeding the string to RapidXML from the following links (thanks to contributors of these pieces of advice, sorry I can't make them work yet...):
Automation Software's RapidXML mini-tutorial
...this method was seen many other places, I won't list them here. Seems sensible enough. My errors seem to be unique. Could this be an ASCII vs. UNICODE issue?
I also tried code from here:
Thomas Whitton's example converting a string buffer to a dynamic cstring
code snippet from the above:
// string to dynamic cstring
std::vector<char> stringCopy(xml.length(), '\0');
std::copy(xml.begin(), xml.end(), stringCopy.begin());
char *cstr = &stringCopy[0];
rapidxml::xml_document<> parsedFromFile;
parsedFromFile.parse<0>(cstr);
...with similar RapidXML failure to parse node attribute names and values. Note that I didn't dump the character vector stringCopy to the console to inspect it, but I am getting the same problem, which for review is:
I am seeing correctly parsed names of XML tags after RapidXML parsing of the string fed to it for analysis.
There are no correctly parsed tag attribute names or values. These are dependent upon the '=' character showing up in the string to be parsed.
If you look closely, the = characters probably aren't being replaced by spaces but by zero bytes. If you look at the RapidXML documentation here:
http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1differences
It specifically states that it modifies the source text. This way it can avoid allocating any new strings and instead uses pointers into the original source.
This part seems to work correctly; maybe the problem is with the rest of your code that's trying to read the attributes?
I have the following string, which I get from SharePoint:
\u002fsites\u002fblabla\u002fShared Documents\u002fkittens.xml
and I'm trying to convert it to:
/sites/blabla/Shared Documents/kittens.xml
I googled it and found that it is Unicode-escaped, but I couldn't find anything that converts it. Technically I could write a small function that converts every "\u002f" to "/", but I don't think that is the right thing to do.
If anyone can shed some light on this matter, it would be very helpful.
Thanks
I found a library that solved my problem, and I want to share the solution in case someone else stumbles upon this question.
The library is called ICU.
Its use is very simple:
icu::UnicodeString converter = icu::UnicodeString(in.c_str(),in.length());
icu::UnicodeString newUstring = converter.unescape();
newUstring.toUTF8String(out);
where in and out are std::string objects.
I don't know whether there's a ready-made function in C++ to do that, but you can do it with this algorithm:
1- Find each "\" character in the string using a scan (or the find function of the string class) and emit "/" in its place.
2- Skip the 5 characters that follow it (u002f), then continue with the rest of the string.
#include <iostream>
#include <string>
using namespace std;
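// Assumes every backslash starts a "\u002f" escape, as in the sample input.
string process(string str){
    string result="";
    for (size_t i=0;i<str.size();i++){
        if(str[i]=='\\' ) {
            result.push_back('/'); // a single '/', not the multi-character '//'
            i+=5;                  // skip "u002f"
        }
        else
            result.push_back(str[i]);
    }
    return result;
}
int main(){
    string path;
    getline(cin,path);
    cout<<process(path)<<endl;
}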
If the string is a string literal, the compiler should take
care of this translation; a sequence \uxxxx
is a "universal character name". If it's data you're reading
from an outside source, I'm not aware of any standard function
which would convert it; you'll have to write your own. (And
getting it right isn't that trivial, once you add all of the
necessary error checking.)
So...
when I go:
cout<<stringName<<endl;
I get:
NT
But when I go:
cout<<stringName.c_str()<<endl;
I get:
NTNT
Why?
A quick test with the following code:
#include <string>
#include <iostream>
using namespace std;
int main(void) {
    string str = "NT";
    cout << str.c_str() << endl;
    return 0;
}
produces one instance of NT so it looks like you probably have another output call somewhere.
A traditional C string (accessed through a char const*) is a sequence of characters terminated by a character 0. (Not the numeral 0, but an actual zero value, which we write as '\0'.) There's no explicit length, so the various string operations just read one character at a time until they hit the '\0'.
A C++ std::string has an explicit length in its structure.
Is it possible that the memory layout of your string's characters looks like this:
'NTNT\0'
but the string's length is set to 2?
That would result in exactly this behavior — manipulating the std::string directly will act like it's just two characters long, but if you do traditional C operations using s.c_str(), it will look like "NTNT".
I'm not sure what shenanigans would get you into this state, but it would certainly match the symptoms.
One way you could get into this state would be to actually write to the string's characters (which is undefined behavior), something like: strcat((char *)s.c_str(), "NT")
Show more code. It seems like you wrote to cout earlier and forgot that you did. What does it print if you do cout << "mofo" << stringName.c_str() << "|||" << endl;? Does it say NTmofoNT|||? If so, that may well be what happened ;)
This is not a problem with c_str(), but probably related to some other anomaly in the rest of the program.
Make a "hello world" application that does these same operations and you'll see it works fine there.