How to parse UTF-8 Chinese string - c++

I am trying to parse a std::string that might contain Chinese characters. For example, given a string containing
哈囉hi你好hello
I want to separate it into 6 strings: 哈, 囉, hi, 你, 好, hello. Right now the string is read from a text file with getline(). Following the post "How to use boost::spirit to parse UTF-8?", here is my current code:
#include <boost/regex/pending/unicode_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/range.hpp>
#include <iterator>
#include <iostream>
#include <ostream>
#include <cstdint>
#include <string>
using namespace boost;
using namespace std;
using namespace std::string_literals;
int main()
{
string str = u8"哈囉hi你好hello"; //actually got from getline()
auto &&utf8_text = str;
u8_to_u32_iterator<const char*>
tbegin(begin(utf8_text)), tend(end(utf8_text));
vector<uint32_t> result;
spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result);
for(auto &&code_point : result) {
cout << code_point << ";";
}
}
But I get the error: call to 'begin' and 'end' is ambiguous.
It works when I declare auto &&utf8_text = u8"哈囉hi你好hello" directly, but I cannot write it this way because the content of the string comes from getline().
I also tried this:
auto str = u8"你好,世界!";
auto &&utf8_text = str;
but I still get an error: no matching function for call to 'begin' and 'end'.

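auto with a string literal deduces a pointer to the first character (const char*), not a std::string, and begin/end have no overload for a raw pointer. If you want a std::string, you have to write the type out, e.g. std::string str = u8"你好,世界!";.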
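If Boost is not a hard requirement, the split described in the question can also be sketched with the standard library alone. This is a swapped-in technique, not the boost::spirit approach from the question: it reads the UTF-8 lead byte to find each character's length. The names utf8_length and split_mixed are made up for this illustration, and the sketch assumes the input is valid UTF-8 (invalid bytes are simply treated as single bytes).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 1 for ASCII and also for an invalid lead byte, so callers always advance.
std::size_t utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx (most CJK characters)
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Hypothetical helper: every multi-byte character becomes its own token,
// while consecutive single bytes (ASCII, or invalid bytes) are grouped into one token.
std::vector<std::string> split_mixed(const std::string& text) {
    std::vector<std::string> tokens;
    std::string single_byte_run;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t len = utf8_length(static_cast<unsigned char>(text[i]));
        if (len > text.size() - i) len = text.size() - i;  // truncated input
        if (len == 1) {
            single_byte_run += text[i];
        } else {
            if (!single_byte_run.empty()) {
                tokens.push_back(single_byte_run);
                single_byte_run.clear();
            }
            tokens.push_back(text.substr(i, len));
        }
        i += len;
    }
    if (!single_byte_run.empty()) tokens.push_back(single_byte_run);
    return tokens;
}
```

Because it takes a const std::string&, it can be called directly on the line returned by getline().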

Related

I can't sort my list of names from a text file

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using std::cout;
using std::endl;
using std::ifstream;
using std::string;
int main()
{
ifstream NameList("LineUp.txt");
string List = "LineUp.txt";
while (getline(NameList, List))
{
std::vector<string> names = {List};
std::sort(names.begin(), names.end());
}
NameList.close();
return 0;
}
I know that I am supposed to put "[] (string a, string b)" at the end of the sort command but I am unable to. My IDE keeps telling me to remove the "string" identifier, or any identifier I have, and then it throws a fit because it can't identify a or b. I just want to sort this shit by alphabet.
std::vector<string> names = {List};
This vector only lives in the scope of the while loop. That means, you are creating a new vector for each single line that is read.
You then sort this vector, which is quite useless, since
a) it contains only one line and
b) you do nothing else with it and it gets destroyed at the closing }
Solution:
move the vector to before the while loop
move the sort() call to after the while loop
inside the loop, call names.push_back() in order to add the current line to the list
Things will go much smoother if your variables have the correct names as well. List should not be named like that, because it's used in getline(), so it's just one line of the list. NameList should be named file, because that's what you access. The list with the names is the vector.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using std::cout;
using std::endl;
using std::ifstream;
using std::string;
int main()
{
    ifstream file("LineUp.txt");
    std::vector<string> names;
    string line;
    while (getline(file, line))
    {
        names.push_back(line);
    }
    std::sort(names.begin(), names.end());
    file.close();
    for (auto& name : names)
    {
        std::cout << name << '\n';
    }
    return 0;
}
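The question also mentions passing a lambda to std::sort. The default comparison above is enough for a plain alphabetical sort, but it orders all uppercase letters before lowercase ones. As a minimal sketch of the lambda form, assuming ASCII names and that case should be ignored (sort_names_ignore_case is a made-up name):

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: sorts names alphabetically, ignoring ASCII case.
void sort_names_ignore_case(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end(),
              // The comparator takes its parameters with full types, by const reference.
              [](const std::string& a, const std::string& b) {
                  return std::lexicographical_compare(
                      a.begin(), a.end(), b.begin(), b.end(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) < std::tolower(y);
                      });
              });
}
```

Lambdas require C++11 or later, so an IDE complaining about the parameter types may be configured for an older standard.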

vector return type for function in C++ throwing error

I built a simple split function in C++ which will split a string using delimiters. I put the function in GP.cpp and declared the function in GP.h.
GP.cpp:
#include "GP.h"
#include <vector>
#include <string>
using namespace std;
vector<string> GP::split(string text, string delimiter) {
vector<string> result;
size_t pos;
string token;
while( (pos = text.find(delimiter)) != string::npos ) {
token = text.substr(0, pos);
result.push_back(token);
text.erase(0, pos + delimiter.length());
}
return result;
}
GP.h:
#ifndef GP
#define GP
#include <vector>
#include <string>
using namespace std;
class GP {
public:
static vector<string> split(string text, string delimiter);
};
#endif
My editor flags vector<string> in the .cpp file with the following message:
explicit type is missing ('int' assumed)
And when I try to build, I get this error:
'split': is not a member of 'std::vector<std::string,std::allocator<_Ty>>'
#define GP means that every later occurrence of the token GP in the program is replaced by nothing. So this transforms the code:
class GP {
into
class {
and other such cases leading to your errors.
To fix this, make your include guards use tokens that are less likely to collide with other tokens in your program.
Also it is considered bad practice to put using namespace std; in the header, since anybody else using your header cannot undo it. It would be better to use std:: qualification in the header instead.
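A sketch of both suggestions, shown as a single file for brevity. The guard name GP_H_INCLUDED is just one possible choice. Two things here go beyond the original code and are my assumptions about the intended behaviour: the text remaining after the last delimiter is appended as a final token (the original loop drops it), and an empty delimiter returns the whole text instead of looping forever.

```cpp
// GP.h (sketch): the guard macro no longer shares a name with the class.
#ifndef GP_H_INCLUDED
#define GP_H_INCLUDED

#include <string>
#include <vector>

class GP {
public:
    // std:: qualification instead of "using namespace std;" in the header.
    static std::vector<std::string> split(std::string text,
                                          const std::string& delimiter);
};

#endif // GP_H_INCLUDED

// GP.cpp (sketch): in a real project this part starts with #include "GP.h".
std::vector<std::string> GP::split(std::string text,
                                   const std::string& delimiter) {
    std::vector<std::string> result;
    if (delimiter.empty()) {      // assumption: avoid an endless loop
        result.push_back(text);
        return result;
    }
    std::size_t pos;
    while ((pos = text.find(delimiter)) != std::string::npos) {
        result.push_back(text.substr(0, pos));
        text.erase(0, pos + delimiter.length());
    }
    result.push_back(text);       // assumption: keep the last token too
    return result;
}
```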
Exhibit one for not using #ifndef include guards in modern C++ code. That's what #pragma once is for (it is non-standard, but supported by all major compilers).
You're defining GP to be empty, so after preprocessing your header actually looks like this:
#include <vector>
#include <string>
using namespace std;
class {
public:
static vector<string> split(string text, string delimiter);
};
I hope it's immediately obvious what the problem is now.

How to store a file name ext into a map using the experimental c++ 2017 lib

So I've got a simple little program in which I want to store file extensions in a map. The error comes when I try to store d->path().extension() in my map with data.insert(d->path().extension(),15);. The 15 is just a placeholder for now; I want to store the file extension as the key. I think the error is that string is unknown to std::experimental::filesystem.
This is the current error it throws:
"std::map<_Kty, _Ty, _Pr, _Alloc>::insert [with _Kty=std::string, _Ty=int, _Pr=std::less, _Alloc=std::allocator>]" matches the argument list
#include <map>
#include <regex>
#include <locale>
#include <iostream>
#include <fstream>
using namespace std;
#include <filesystem>
using namespace std::experimental::filesystem;
map<string,int> rscan2(path const& f,map<string, int> const& data, unsigned i = 0)
{
//string indet(i, ' ');
for (recursive_directory_iterator d(f), e; d != e; ++d)
{
data.insert(d->path().extension(),15);
if (is_directory(d->status()))
rscan2(d->path(), data, i + 1);
}
return data;
}
int main(int argc, char *argv[])
{
map<string, int> holdTheInfo;
rscan2(".", holdTheInfo);
}
path has many helper methods to convert to strings. Listing a few:
.string()
.generic_string()
.c_str()
operator string_type()
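But the actual problem is that std::map::insert does not take a key and a value as two separate arguments; it takes a single value_type, i.e. a std::pair of key and value. Note also that data is declared as a const reference in rscan2, so no modifying member function can be called on it until the const is dropped. With a non-const map you can use insert_or_assign (since C++17), which does take the key and the value separately: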
data.insert_or_assign(d->path().extension(), 15);
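The path returned by extension() is then converted to a string through operator string_type(). This implicit conversion only yields a std::string where path::string_type is std::string (POSIX systems); on Windows string_type is std::wstring, so call .string() explicitly there.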
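Putting these points together, a minimal sketch using std::filesystem from C++17 rather than the experimental namespace. The helper names are made up, and counting occurrences instead of storing the placeholder 15 is my assumption about where the program is heading.

```cpp
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: adds one to the counter for the extension of `p`.
// .string() converts explicitly, so this does not depend on path::string_type.
void record_extension(std::map<std::string, int>& counts, const fs::path& p) {
    ++counts[p.extension().string()];  // files without an extension count under ""
}

// Hypothetical helper: counts the extensions of all regular files below `root`.
// The map is taken by non-const reference so that it can be modified.
void count_extensions(const fs::path& root, std::map<std::string, int>& counts) {
    // recursive_directory_iterator already descends into subdirectories,
    // so no explicit recursive call is needed.
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) {
            record_extension(counts, entry.path());
        }
    }
}
```

Called as count_extensions(".", holdTheInfo); the map then holds one entry per extension, such as ".txt" or ".cpp".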

c++ iterate through a vector of strings

So I recently discovered maps and vectors; however, I'm having trouble figuring out a way to loop through a vector containing strings.
Here's what I've tried:
#include <string>
#include <vector>
#include <stdio>
using namespace std;
void main() {
vector<string> data={"Hello World!","Goodbye World!"};
for (vector<string>::iterator t=data.begin(); t!=data.end(); ++t) {
cout<<*t<<endl;
}
}
and when I try to compile it, I get this error:
cd C:\Users\Jason\Desktop\EXB\Win32
wmake -f C:\Users\Jason\Desktop\EXB\Win32\exbint.mk -h -e
wpp386 ..\Source\exbint.cpp -i="C:\WATCOM/h;C:\WATCOM/h/nt" -w4 -e25 -zq -od -d2 -6r -bt=nt -fo=.obj -mf -xs -xr
..\Source\exbint.cpp(59): Error! E157: col(21) left expression must be integral
..\Source\exbint.cpp(59): Note! N717: col(21) left operand type is 'std::ostream watcall (lvalue)'
..\Source\exbint.cpp(59): Note! N718: col(21) right operand type is 'std::basic_string<char,std::char_traits<char>,std::allocator<char>> (lvalue)'
Error(E42): Last command making (C:\Users\Jason\Desktop\EXB\Win32\exbint.obj) returned a bad status
Error(E02): Make execution terminated
Execution complete
I tried the same method using map and it worked. The only difference was I changed the cout line to:
cout<<t->first<<" => "<<t->second<<endl;
Add iostream header file and change stdio to cstdio.
#include <iostream>
#include <string>
#include <vector>
#include <cstdio>
using namespace std;
int main()
{
vector<string> data={"Hello World!","Goodbye World!"};
for (vector<string>::iterator t=data.begin(); t!=data.end(); ++t)
{
cout<<*t<<endl;
}
return 0;
}
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::vector<std::string> data = {"Hello World!", "Goodbye World!"};
for (std::vector<std::string>::iterator t = data.begin(); t != data.end(); t++) {
std::cout << *t << std::endl;
}
return 0;
}
Or with C++11 (or higher):
#include <iostream>
#include <vector>
#include <string>
typedef std::vector<std::string> STRVEC;
int main()
{
STRVEC data = {"Hello World!", "Goodbye World!"};
for (auto &s: data) {
std::cout << s << std::endl;
}
return 0;
}
From the Open Watcom V2 Fork-Wiki on the C++ Library Status page:
<string>
Mostly complete. Although there are no I/O operators, all other member functions and string operations are available.
A workaround (besides implementing the << operator) would be asking the string instances for the C string:
for (vector<string>::iterator t = data.begin(); t != data.end(); ++t) {
    cout << t->c_str() << endl;
}
This of course only works as long as the strings don't contain zero byte values.
When I compile your code, I get:
40234801.cpp:3:17: fatal error: stdio: No such file or directory
#include <stdio>
^
You clearly have a header called "stdio" in your include path that you haven't shown us.
If you change that line to the standard #include <iostream>, then the only reported error is that you wrote void main() instead of int main(). Fix that, and it will build and run.
In passing, note also that using namespace should be avoided.
I found a solution to my own issue. Instead of using a C string (char*), I used std::string, and I switched to the G++ compiler instead of Open Watcom.
Instead of having:
char *someString="Blah blah blah";
I instead replaced it with:
string someString="Blah blah blah";
This way is much more efficient and easier.

Error trying to convert from string to int using boost regex match in c++

I am trying to convert the matched string into an int using Boost regex.
I used the post "C++ to convert Boost Regex match result to other format" as a reference. However, when I tried it I got the errors expected primary-expression before ‘int’ and Symbol 'lexical_cast' could not be resolved.
This is my code:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main(){
string a = "123";
boost::regex e("123");
boost::smatch match;
if (boost::regex_search(a, match, e))
{
int number = boost::lexical_cast<int>(match[0]);
cout << number << endl;
}
return 0;
}
Why am I getting these errors?
You forgot this line:
#include <boost/lexical_cast.hpp>
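If Boost is not otherwise needed, the same conversion can be sketched with the standard library, using std::regex and std::stoi in place of boost::regex and boost::lexical_cast. This is an alternative, not the fix above. The helper name first_number, the pattern [0-9]+ and the fallback parameter are assumptions made for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of digits in `text` as an int,
// or `fallback` when the text contains no digits.
int first_number(const std::string& text, int fallback) {
    static const std::regex digits("[0-9]+");
    std::smatch match;
    if (std::regex_search(text, match, digits)) {
        // std::stoi throws std::out_of_range if the value does not fit in an int.
        return std::stoi(match.str(0));
    }
    return fallback;
}
```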