I have the following string, which I get from SharePoint:
\u002fsites\u002fblabla\u002fShared Documents\u002fkittens.xml
and I'm trying to convert it to:
/sites/blabla/Shared Documents/kittens.xml
I googled it and found that it is Unicode-escaped, but I couldn't find anything that converts it. Technically, I could write a small function that converts every "\u002f" to "/", but I don't think that is the right thing to do.
If anyone can shed some light on this matter, it would be very helpful.
Thanks
I found a library that solved my problem, and I want to share the solution in case someone else stumbles upon this question.
The library is called ICU (International Components for Unicode).
And its use is very simple:
icu::UnicodeString converter = icu::UnicodeString(in.c_str(),in.length());
icu::UnicodeString newUstring = converter.unescape();
newUstring.toUTF8String(out);
where in and out are both std::string.
I don't know whether there's a ready-made function in C++ to do that, but you can do it with this algorithm:
1- Find the "\" character in your string using the find function of the string class, and change it to "/".
2- Skip the 5 characters that follow it (the "u002f"), then repeat for the rest of the string.
#include <iostream>
#include <string>
using namespace std;
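// Replace every literal "\u002f" escape sequence in str with '/'.
string process(const string& str){
    const string escape = "\\u002f";
    string result;
    for (string::size_type i = 0; i < str.size(); i++){
        if (str.compare(i, escape.size(), escape) == 0) {
            result.push_back('/');
            i += escape.size() - 1; // skip the rest of the escape sequence
        }
        else
            result.push_back(str[i]);
    }
    return result;
}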
int main(){
string path;
getline(cin,path);
cout<<process(path)<<endl;
}
If the string is a string literal, the compiler should take
care of this translation; a sequence \uxxxx
is a "universal character name". If it's data you're reading
from an outside source, I'm not aware of any standard function
which would convert it; you'll have to write your own. (And
getting it right isn't that trivial, once you add all of the
necessary error checking.)
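To make the "write your own" route a bit more concrete, here is a minimal sketch of such a converter. It assumes the input contains only four-digit \uXXXX escapes and that UTF-8 output is wanted. The names unescape_unicode and append_utf8 are made up for this illustration; they are not standard functions. Surrogate code points are rejected rather than combined into pairs, so treat this as a starting point, not a complete implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of a code point in U+0000..U+FFFF to out.
void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Replace each "\uXXXX" escape with the UTF-8 bytes of that code point.
// Assumption: surrogates (U+D800..U+DFFF) are rejected, not paired up.
std::string unescape_unicode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 >= in.size() || in[i + 1] != 'u') {
            out.push_back(in[i]);   // ordinary character: copy unchanged
            continue;
        }
        if (i + 6 > in.size())
            throw std::invalid_argument("truncated \\u escape");
        unsigned cp = 0;
        for (std::size_t k = i + 2; k < i + 6; ++k) {
            unsigned char c = static_cast<unsigned char>(in[k]);
            if (!std::isxdigit(c))
                throw std::invalid_argument("non-hex digit in \\u escape");
            cp = cp * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point in \\u escape");
        append_utf8(out, cp);
        i += 5; // skip "uXXXX"; the loop increment skips the backslash
    }
    return out;
}
```

Calling unescape_unicode on the SharePoint string from the question should give back the path with plain slashes. A backslash that is not followed by u is copied through untouched.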
I want to print the first letter of a string.
#include <iostream>
#include <string>
using namespace std;
int main() {
string str = "다람쥐 헌 쳇바퀴 돌고파.";
cout << str.at(0) << endl;
}
I want '다' to be printed, as it would be in Java, but '?' is printed instead.
How can I fix it?
That text you have in str -- how is it encoded?
Unfortunately, you need to know that to get the first "character". The std::string class only deals with bytes. How bytes turn into characters is a rather large topic.
The magic word you are probably looking for is UTF-8. See here for more information: How do I properly use std::string on UTF-8 in C++?
If you want to go down this road yourself, look here: Extract (first) UTF-8 character from a std::string
And if you're really interested, here's an hour-long video that is actually a great explanation of text encoding: https://www.youtube.com/watch?v=_mZBa3sqTrI
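As a rough illustration of what "extract the first UTF-8 character" involves, here is a minimal sketch. It assumes the string really is well-formed UTF-8 and only inspects the lead byte to decide how many bytes belong to the first character. The function name first_utf8_char is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Return the bytes that make up the first UTF-8 encoded character of s.
// Assumption: s is well-formed UTF-8; only the lead byte is inspected.
std::string first_utf8_char(const std::string& s) {
    if (s.empty()) return std::string();
    unsigned char lead = static_cast<unsigned char>(s[0]);
    std::size_t len = 1;                          // 0xxxxxxx: one byte (ASCII)
    if ((lead & 0xE0) == 0xC0) len = 2;           // 110xxxxx: two bytes
    else if ((lead & 0xF0) == 0xE0) len = 3;      // 1110xxxx: three bytes
    else if ((lead & 0xF8) == 0xF0) len = 4;      // 11110xxx: four bytes
    return s.substr(0, len);                      // substr clamps to the string size
}
```

With the string from the question, this returns the three bytes that encode '다' instead of only the first byte, which is what str.at(0) yields. Whether those bytes then show up correctly still depends on the source file encoding and on the terminal.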
I'm having trouble figuring out how to convert a System::String into its ASCII code values. That is, take each character in the string and create a new string containing each character's integer ASCII value.
I know I can marshal a System::String into a std::string and then do it that way (though I don't know exactly how to do that either; I'm not up to date on my C++ string operations, because my teacher makes us use character arrays). But there has to be a purely .NET way of achieving this.
I'm very, very new to .NET. This is a homework assignment.
Any help you can give on how to accomplish this would be much appreciated.
Sorry if this is a stupid question =/
Okay, I've actually had to do this for homework as well. Using C++ in a .NET environment, I had a string whose characters can be accessed like those of an array: stringname[i]. Use a loop of your choice to iterate through all of the positions. To convert each position, all you need to do is:
int asciicode = stringname[i];
stringname[i] = asciicode;
Of course, when you then cout the stringname, you will have a bunch of numbers all run together, which you may not like, but that's cosmetic.
This worked for me.
#include <sstream>
std::stringstream convert;
std::string stdString = "whatever the string is";
std::string buffer;
int charBuff;
for(std::size_t i = 0; i < stdString.length(); i++)
{
    // Promote each character to its integer code before streaming it
    charBuff = (int)stdString.at(i);
    convert << charBuff;
}
buffer = convert.str();
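If the numbers running together is a problem, one possible variation is to put a separator between the codes. The following is only a sketch using std::string rather than System::String, and the function name to_code_string is made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Build a string of the integer code of each character, separated by spaces.
std::string to_code_string(const std::string& s) {
    std::ostringstream out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char so bytes above 127 do not come out negative
        out << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

For example, to_code_string("Hi") gives "72 105".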
I'm new to c++ (I'm a c# developer).
I have an SQLite wrapper class that requires you to pass in a database name as a const char*; however, I only have it as a Platform::String (after doing a file search).
I can't seem to find a way to convert the Platform::String to const char*.
I've seen another question on Stack Overflow that explains why it isn't straightforward, but it has no sample code or end-to-end solution.
Can anyone help me ?
Thanks
Disclaimer: I know little about C++/CX, and I'm basing the answer on the documentation here.
The String class contains 16-bit Unicode characters, so you can't directly get a pointer to 8-bit char-typed characters; you'll need to convert the contents.
If the string is known to only contain ASCII characters, then you can convert it directly:
String^ s = whatever();
std::string narrow(s->Begin(), s->End());
function_requiring_cstring(narrow.c_str());
Otherwise, the string will need translating, which gets rather hairy. The following might do the right thing, converting the wide characters to multi-byte sequences of narrow characters:
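// Requires <cstdlib>, <stdexcept>, <string> and <vector>
String^ s = whatever();
std::wstring wide(s->Begin(), s->End());
std::vector<char> buffer(s->Length()+1); // We'll need at least that much
for (;;) {
    size_t length = std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (length == static_cast<size_t>(-1)) {
        // A character could not be converted in the current locale
        throw std::runtime_error("wcstombs failed");
    } else if (length == buffer.size()) {
        buffer.resize(buffer.size()*2);
    } else {
        buffer.resize(length+1);
        break;
    }
}
function_requiring_cstring(buffer.data());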
Alternatively, you may find it easier to ignore Microsoft's ideas about how strings should be handled, and use std::string instead.
Is there any way to find out how many times a word is repeated in a text?
The text is in character arrays (char[]).
text = this is a book,and this book
is about book.
word = book
result = 3
Because this is clearly homework and not tagged as such, I'll give you a solution you clearly can't submit as your assignment because your teacher will know you got it on the internet.
There were no requirements such as ignoring punctuation, so I've allowed myself to write a version that only works for clearly separated words and thus inserted spaces in your sample text string.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
// Count clearly separated occurrences of `word` in `text`.
std::size_t count ( const std::string& text, const std::string& word )
{
    std::istringstream input(text);
    return (std::count(std::istream_iterator<std::string>(input),
        std::istream_iterator<std::string>(), word));
}
int main ( int, char ** )
{
    const char text[] = "this is a book , and this book is about book .";
    const char word[] = "book";
    std::cout << count(text, word) << std::endl;
}
Output:
3
You might want to implement this using std::string and here is a sample for you to start from.
The simplest way would be to loop through the string, counting the number of times that you find the word that you're looking for. I'm sure that you could use a function in <algorithm> to do it fairly easily, but if you have to ask whether it's possible to do this in C++, I wouldn't think that you're advanced enough to try using the algorithm library, and doing it yourself would be more instructional anyway.
I would suggest using std::string though if you're allowed to (since this question does sound like homework, which could carry additional restrictions). Using std::string is easier and less error-prone than char arrays. It can be done with both though.
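A minimal sketch of the "loop through the string and count" approach, using std::string::find, could look like this. The function name count_occurrences is made up for this example, and the sketch counts plain substrings, not whole words.

```cpp
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of word in text, as a plain substring.
std::size_t count_occurrences(const std::string& text, const std::string& word) {
    if (word.empty()) return 0;   // an empty pattern would never advance
    std::size_t count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(word, pos + word.size());   // continue after this match
    }
    return count;
}
```

With the sample text from the question (char arrays convert to std::string implicitly), searching for "book" gives 3. Because this is substring matching, searching for "is" gives 4, since it also matches inside "this"; matching whole words only would need an extra check on the neighbouring characters.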
It is possible.
You have an array of characters. Try to do the search on a piece of paper, character by character:
First character is a t. This is not a b, so it can't be the first character of "book"
Second character is an h, so again, it is not b...
[...]
The next character is a b... Aha, this could be it. Is the next character an o? YES!!! And then the next, another o???... etc. etc..
When you can do it this way, you will be able to use C++ to do it.
Remember that you can access the n-th character in an array by using the [] operator:
char c = array[5] ; // c is the 6th character in the array
Now, going toward the C++ way would be, at first, to use a std::string instead of an array of chars, and use the string's methods. Google for std::string methods, and I guess you should find some that you could use...
So you should manage to write some code that will iterate each character until the end
I guess this should be more than enough.
The point of your homework (because everyone here knows this is a homework question) is more about searching for the solution than finding it: This is not rote learning.
I doubt anyone on Stack Overflow remembers the solution to this classical problem. But I guess most will know how to find one solution. You need to learn the "how to find" mindset, so get your compiler and try again...
P.S.: Of course, if you know little or nothing of C++, then you're screwed, and you could start by Googling some C++ Tutorials.
I have a problem with a string in C++ that contains several words in Spanish, so it has a lot of characters with accents and tildes. I want to replace them with their unaccented counterparts. Example: I want to replace the word "había" with "habia". I tried to replace them directly with the replace method of the string class, but I could not get that to work.
I'm using this code:
for (it= dictionary.begin(); it != dictionary.end(); it++)
{
strMine=(it->first);
found=toReplace.find_first_of(strMine);
while (found!=std::string::npos)
{
strAux=(it->second);
toReplace.erase(found,strMine.length());
toReplace.insert(found,strAux);
found=toReplace.find_first_of(strMine,found+1);
}
}
Where dictionary is a map like this (with more entries):
dictionary.insert ( std::pair<std::string,std::string>("á","a") );
dictionary.insert ( std::pair<std::string,std::string>("é","e") );
dictionary.insert ( std::pair<std::string,std::string>("í","i") );
dictionary.insert ( std::pair<std::string,std::string>("ó","o") );
dictionary.insert ( std::pair<std::string,std::string>("ú","u") );
dictionary.insert ( std::pair<std::string,std::string>("ñ","n") );
and the toReplace string is:
std::string toReplace="á-é-í-ó-ú-ñ-á-é-í-ó-ú-ñ";
I obviously must be missing something. I can't figure it out.
Is there any library I can use?
Thanks,
I disagree with the currently "approved" answer. The question makes perfect sense when you are indexing text. Like case-insensitive search, accent-insensitive search is a good idea. "naïve" matches "Naïve" matches "naive" matches "NAİVE" (you do know that an uppercase i is İ in Turkish? That's why you ignore accents)
Now, the best algorithm is hinted at in the approved answer: use NFKD (compatibility decomposition) to decompose accented letters into the base letter and a separate accent, and then remove all accents.
There is little point in the re-composition afterwards, though. You removed most sequences which would change, and the others are for all intents and purposes identical anyway. What's the difference between æ in NFKC and æ in NFKD?
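The "remove all accents" half of that can be sketched without any library, under some loud assumptions: the text is already decomposed (for example by ICU) and encoded as UTF-8, and only the Combining Diacritical Marks block U+0300..U+036F is removed, not every mark in Unicode. The decomposition step itself is not shown, and strip_combining_marks is a name made up for this example.

```cpp
#include <cstddef>
#include <string>

// Remove code points U+0300..U+036F (Combining Diacritical Marks) from UTF-8 text.
// In UTF-8 these are the two-byte sequences CC 80..CC BF and CD 80..CD AF.
// Assumption: the input is well-formed UTF-8 that has already been decomposed.
std::string strip_combining_marks(const std::string& decomposed_utf8) {
    std::string out;
    for (std::size_t i = 0; i < decomposed_utf8.size(); ++i) {
        unsigned char b0 = static_cast<unsigned char>(decomposed_utf8[i]);
        if ((b0 == 0xCC || b0 == 0xCD) && i + 1 < decomposed_utf8.size()) {
            unsigned char b1 = static_cast<unsigned char>(decomposed_utf8[i + 1]);
            bool is_mark = (b0 == 0xCC && b1 >= 0x80 && b1 <= 0xBF) ||
                           (b0 == 0xCD && b1 >= 0x80 && b1 <= 0xAF);
            if (is_mark) {
                ++i;        // skip both bytes of the combining mark
                continue;
            }
        }
        out.push_back(decomposed_utf8[i]);
    }
    return out;
}
```

A decomposed "había" (an i followed by U+0301) comes out as "habia". Precomposed letters such as a single-code-point í pass through unchanged, which is why the decomposition has to happen first.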
First, this is a really bad idea: you’re mangling somebody’s language by removing letters. Although the extra dots in words like “naïve” seem superfluous to people who only speak English, there are literally thousands of writing systems in the world in which such distinctions are very important. Writing software to mutilate someone’s speech puts you squarely on the wrong side of the tension between using computers as means to broaden the realm of human expression vs. tools of oppression.
What is the reason you’re trying to do this? Is something further down the line choking on the accents? Many people would love to help you solve that.
That said, libicu can do this for you. Open the transform demo; copy and paste your Spanish text into the “Input” box; enter
NFD; [:M:] remove; NFC
as “Compound 1” and click transform.
(With help from slide 9 of Unicode Transforms in ICU. Slides 29-30 show how to use the API.)
I definitely think you should look into the root of the problem. That is, look for a solution that will allow you to support characters encoded in Unicode or for the user's locale.
That being said, your problem is that you're dealing with multi-byte characters: in an encoding such as UTF-8, each accented letter occupies more than one char in a std::string. There is std::wstring, but I'm not sure I'd use that. For one thing, wide characters aren't meant to handle variable-width encodings. This hole goes deep, so I'll leave it at that.
Now, as for the rest of your code, it is error prone because you mix the looping logic with translation logic. Thus, at least two kinds of bugs can occur: translation bugs and looping bugs. Do use the STL, it can help you a lot with the looping part.
The following is a rough solution for replacing characters in a string.
main.cpp:
#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include "translate_characters.h"
using namespace std;
int main()
{
string text;
cin.unsetf(ios::skipws);
transform(istream_iterator<char>(cin), istream_iterator<char>(),
inserter(text, text.end()), translate_characters());
cout << text << endl;
return 0;
}
translate_characters.h:
#ifndef TRANSLATE_CHARACTERS_H
#define TRANSLATE_CHARACTERS_H
#include <functional>
#include <map>
// A plain function object; std::unary_function was removed in C++17 and is not needed here.
class translate_characters {
public:
    translate_characters();
    char operator()(const char c);
private:
    std::map<char, char> characters_map;
};
#endif // TRANSLATE_CHARACTERS_H
translate_characters.cpp:
#include "translate_characters.h"
using namespace std;
translate_characters::translate_characters()
{
characters_map.insert(make_pair('e', 'a'));
}
char translate_characters::operator()(const char c)
{
map<char, char>::const_iterator translation_pos(characters_map.find(c));
if( translation_pos == characters_map.end() )
return c;
return translation_pos->second;
}
You might want to check out the boost (http://www.boost.org/) library.
It has a regexp library, which you could use.
In addition it has a specific library that has some functions for string manipulation (link) including replace.
Try using std::wstring instead of std::string. UTF-16 should work (as opposed to ASCII).
I could not link the ICU libraries, but I still think it's the best solution. As I need this program to be functional as soon as possible, I made a little program (that I have to improve) and I'm going to use that. Thank you all for your suggestions and answers.
Here's the code I'm gonna use:
for (it= dictionary.begin(); it != dictionary.end(); it++)
{
strMine=(it->first);
found=toReplace.find(strMine);
while (found != std::string::npos)
{
strAux=(it->second);
toReplace.erase(found,2);
toReplace.insert(found,strAux);
found=toReplace.find(strMine,found+1);
}
}
I will change it next time I have to turn my program in for correction (in about 6 weeks).
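For comparison, the same loop can be sketched as a standalone function. The relevant difference from the code in the question is that find_first_of matches any single byte of the pattern, whereas find matches the whole byte sequence, which matters when each accented letter is stored as several bytes (assumed here to be UTF-8). The function name replace_all is made up for this example, and it uses the key's own length instead of a hard-coded 2.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replace every occurrence of each key with its mapped value, matching whole
// byte sequences (std::string::find) rather than single bytes (find_first_of).
std::string replace_all(std::string text,
                        const std::map<std::string, std::string>& dictionary) {
    for (const auto& entry : dictionary) {
        const std::string& from = entry.first;
        const std::string& to = entry.second;
        if (from.empty()) continue;               // nothing sensible to search for
        std::size_t pos = text.find(from);
        while (pos != std::string::npos) {
            text.replace(pos, from.size(), to);
            pos = text.find(from, pos + to.size());   // resume after the replacement
        }
    }
    return text;
}
```

Passing the dictionary and the toReplace string from the question should turn every accented letter into its plain counterpart while leaving the hyphens alone.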
If you can (if you're running Unix), I suggest using the tr facility for this: it's custom-built for this purpose. Remember, no code == no buggy code. :-)
Edit: Sorry, you're right, tr doesn't seem to work. How about sed? It's a pretty stupid script I've written, but it works for me.
#!/bin/sed -f
s/á/a/g;
s/é/e/g;
s/í/i/g;
s/ó/o/g;
s/ú/u/g;
s/ñ/n/g;
/// <summary>
///
/// Replace any accented or foreign character with its ASCII equivalent.
/// In other words, convert a string to an ASCII-compliant string.
///
/// This also gets rid of special hidden characters, like EOF, NUL, TAB and other '\0', except \n\r
///
/// Tests with accents and foreign characters:
/// Before: "äæǽaeöœoeüueÄAeÜUeÖOeÀÁÂÃÄÅǺĀĂĄǍΑΆẢẠẦẪẨẬẰẮẴẲẶАAàáâãåǻāăąǎªαάảạầấẫẩậằắẵẳặаaБBбbÇĆĈĊČCçćĉċčcДDдdÐĎĐΔDjðďđδdjÈÉÊËĒĔĖĘĚΕΈẼẺẸỀẾỄỂỆЕЭEèéêëēĕėęěέεẽẻẹềếễểệеэeФFфfĜĞĠĢΓГҐGĝğġģγгґgĤĦHĥħhÌÍÎÏĨĪĬǏĮİΗΉΊΙΪỈỊИЫIìíîïĩīĭǐįıηήίιϊỉịиыїiĴJĵjĶΚКKķκкkĹĻĽĿŁΛЛLĺļľŀłλлlМMмmÑŃŅŇΝНNñńņňʼnνнnÒÓÔÕŌŎǑŐƠØǾΟΌΩΏỎỌỒỐỖỔỘỜỚỠỞỢОOòóôõōŏǒőơøǿºοόωώỏọồốỗổộờớỡởợоoПPпpŔŖŘΡРRŕŗřρрrŚŜŞȘŠΣСSśŝşșšſσςсsȚŢŤŦτТTțţťŧтtÙÚÛŨŪŬŮŰŲƯǓǕǗǙǛŨỦỤỪỨỮỬỰУUùúûũūŭůűųưǔǖǘǚǜυύϋủụừứữửựуuÝŸŶΥΎΫỲỸỶỴЙYýÿŷỳỹỷỵйyВVвvŴWŵwŹŻŽΖЗZźżžζзzÆǼAEßssIJIJijijŒOEƒf'ξksπpβvμmψpsЁYoёyoЄYeєyeЇYiЖZhжzhХKhхkhЦTsцtsЧChчchШShшshЩShchщshchЪъЬьЮYuюyuЯYaяya"
/// After: "aaeooeuueAAeUUeOOeAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaaaaaaaaaaaaaaaaBbCCCCCCccccccDdDDjddjEEEEEEEEEEEEEEEEEEeeeeeeeeeeeeeeeeeeFfGGGGGgggggHHhhIIIIIIIIIIIIIiiiiiiiiiiiiJJjjKKkkLLLLllllMmNNNNNnnnnnOOOOOOOOOOOOOOOOOOOOOOooooooooooooooooooooooPpRRRRrrrrSSSSSSssssssTTTTttttUUUUUUUUUUUUUUUUUUUUUUUUuuuuuuuuuuuuuuuuuuuuuuuYYYYYYYYyyyyyyyyVvWWwwZZZZzzzzAEssIJijOEf'kspvmpsYoyoYeyeYiZhzhKhkhTstsChchShshShchshchYuyuYaya"
///
/// Tests with invalid 'special hidden characters':
/// Before: "\0\0\000\0000Bj��rk�\'\"\\\0\a\b\f\n\r\t\v\u0020���oacu\'\\\'te�"
/// After: "00000Bjrk'\"\\\n\r oacu'\\'te"
///
/// </summary>
private string Normalize(string StringToClean)
{
string normalizedString = StringToClean.Normalize(NormalizationForm.FormD);
StringBuilder Buffer = new StringBuilder(StringToClean.Length);
for (int i = 0; i < normalizedString.Length; i++)
{
if (CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]) != UnicodeCategory.NonSpacingMark)
{
Buffer.Append(normalizedString[i]);
}
}
string PreAsciiCompliant = Buffer.ToString().Normalize(NormalizationForm.FormC);
StringBuilder AsciiComplient = new StringBuilder(PreAsciiCompliant.Length);
foreach (char character in PreAsciiCompliant)
{
//Reject all special characters except \n\r (Carriage-Return and Line-Feed).
//Get rid of special hidden character, like EOF, NUL, TAB and other '\0'
if (((int)character >= 32 && (int)character < 127) || ((int)character == 10 || (int)character == 13))
{
AsciiComplient.Append(character);
}
}
return AsciiComplient.ToString().Trim(); // Remove spaces at start and end of string if any
}