How to split a string from a vector [duplicate] - c++

My homework is as follows:
Step Two - Create a file called connections.txt with a format like:
Kelp-SeaUrchins
Kelp-SmallFishes
Read these names in from a file and split each string into two (org1,org2). Test your work just through printing for now. For example:
cout << "pair = " << org1 << " , " << org2 << endl;
I am not sure how to split each string, which is stored in a vector, using the hyphen as the delimiter. I was instructed either to write my own function, something like int ind(vector<string> orgs, string animal) { /* return index of animal in orgs */ }, or to use the find function.

Here is one approach...
Open the file:
ifstream file{ "connections.txt", ios_base::in };
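// std::exception has no standard constructor taking a message; std::runtime_error (from <stdexcept>) does
if (!file) throw std::runtime_error("failed to open file");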
Read all the lines:
vector<string> lines;
for (string line; file >> line;)
lines.push_back(line);
You can use the regex library from C++11:
regex pat{ R"(([A-Za-z0-9]+)-([A-Za-z0-9]+))" };
for (auto& line : lines) {
smatch matches;
if (regex_match(line, matches, pat))
cout << "pair = " << matches[1] << ", " << matches[2] << endl;
}
You will have to come up with the pattern according to your needs.
Here it will try to match for "at least one alphanumeric" then a - then "at least one alphanumeric".
matches[0] will contain the whole matched string.
matches[1] will contain first alphanumeric part, i.e. your org1
matches[2] will contain second alphanumeric part, i.e. your org2
(You can take them in variables org1 and org2 if you want.)
If org1 and org2 don't contain any white spaces, then you can use another trick.
In every line, you can replace the - with a blank (using std::replace).
Then simply use stringstreams to get your tokens.
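The replace-then-stream trick could be sketched roughly as follows. The function name split_by_replace is made up for illustration, and the sketch assumes neither name contains whitespace or a second hyphen:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: turns "Kelp-SeaUrchins" into the pair (Kelp, SeaUrchins).
// Assumes the names themselves contain no spaces and no extra hyphens.
std::pair<std::string, std::string> split_by_replace(std::string line) {
    // Turn every hyphen into a blank so the stream can split on whitespace.
    std::replace(line.begin(), line.end(), '-', ' ');

    std::istringstream tokens(line);
    std::pair<std::string, std::string> orgs;
    tokens >> orgs.first >> orgs.second;  // missing tokens stay empty
    return orgs;
}
```

The two members of the returned pair play the role of org1 and org2 and can be printed with cout as in the assignment.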
Side note: This is just to help you. You should do your homework on your own.
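Since the assignment mentions the find function, here is one possible sketch using std::string::find and substr instead of a regex. The name split_at_hyphen and the choice to return an empty second part when no hyphen is present are assumptions made for this example:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits a line at the first hyphen.
// If there is no hyphen, the whole line is returned as the first part.
std::pair<std::string, std::string> split_at_hyphen(const std::string& line) {
    const std::string::size_type pos = line.find('-');
    if (pos == std::string::npos) {
        return std::make_pair(line, std::string());
    }
    // Everything before the hyphen, and everything after it.
    return std::make_pair(line.substr(0, pos), line.substr(pos + 1));
}
```

Calling it once per element of the lines vector gives org1 and org2 for each connection.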

Related

How can I replace all words in a string except one

So, I would like to change all words in a string except one, that stays in the middle.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string test = "You want to join player group";
string find = "You want to join group";
string replace = "This is a test about group";
boost::replace_all(test, find, replace);
cout << test << endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is in finding the words, since they are treated as one single string.
Is there a function that finds all the words, no matter their position, and changes just what I want?
EDIT2:
This is the best example of what I want to happen:
const char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
const char* b = "This is line in the void Translate"; // This is what needs to be found in the main line
const char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words in b, preserving the MYYYYYYYYY
// The output is expected to be:
Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
// Needs <regex>, <string>, <iostream>, <cstdlib> and <boost/format.hpp>
std::string test = "You want to join player group";
static const std::regex find{R"(You want to join (\S+) group)"};
std::smatch search_result;
if (!std::regex_search(test, search_result, find))
{
std::cerr << "Could not match the string\n";
exit(1);
}
else
{
std::string found_group_name = search_result[1];
auto replace = boost::format("This is a test about %1% group") % found_group_name;
std::cout << replace;
}
To match the word "player" I used a pretty simple regular expression, (\S+), which means "match one or more non-whitespace characters (greedily) and put that into a group".
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntax R"(...)". This is called a raw string literal and was also standardized in C++11. It was basically made for describing regular expressions without needing to escape backslashes. If you've used Python, it's the same as r'...'. If you've used C#, it's similar to @"...".
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
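If you would rather not pull in boost::format, the same capture can be reused directly by std::regex_replace, where $1 in the format string refers to group 1. This is only a sketch; the function name swap_sentence is invented for the example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites the sentence but keeps the captured word.
// Text that does not match the pattern is returned unchanged.
std::string swap_sentence(const std::string& text) {
    static const std::regex find{R"(You want to join (\S+) group)"};
    // $1 is replaced by whatever (\S+) captured, e.g. "player".
    return std::regex_replace(text, find, "This is a test about $1 group");
}
```

With the string from the question, swap_sentence("You want to join player group") yields "This is a test about player group".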
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing group from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join";
std::string replace = "This is a test about";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
Output: This is a test about player group.
In this case only the beginning of the string needs to be replaced, because the end of the string is already correct. You could add another call to replace_all to change the end if needed.
Some other options:
One is the regex approach in the other answer.
Another is to split the strings into a vector (or array) of words, insert the desired word (player) at the right spot of the replace vector, and then build your output string from it.
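The word-vector option could look roughly like the sketch below. All three function names are hypothetical. It assumes the text equals the pattern plus exactly one extra word, and it places that word in the replacement at the same distance from the end as in the original, which is a design choice of this example and not something the question specifies:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a sentence into whitespace-separated words.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string word; in >> word;) words.push_back(word);
    return words;
}

// Join words back together with single spaces.
std::string join_words(const std::vector<std::string>& words) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) out += ' ';
        out += words[i];
    }
    return out;
}

// If `text` is `pattern` plus exactly one extra word, return `replacement`
// with that word inserted; otherwise return `text` unchanged.
std::string replace_keeping_extra_word(const std::string& text,
                                       const std::string& pattern,
                                       const std::string& replacement) {
    const std::vector<std::string> t = split_words(text);
    const std::vector<std::string> p = split_words(pattern);
    std::vector<std::string> r = split_words(replacement);
    if (t.size() != p.size() + 1) return text;

    // Index of the first word where text and pattern differ: the extra word.
    std::size_t extra = 0;
    while (extra < p.size() && t[extra] == p[extra]) ++extra;

    // Everything after the extra word must match the rest of the pattern.
    for (std::size_t j = extra; j < p.size(); ++j) {
        if (t[j + 1] != p[j]) return text;
    }

    // Keep the same number of words after the extra word as in the original.
    const std::size_t words_after = p.size() - extra;
    const std::size_t pos = words_after > r.size() ? 0 : r.size() - words_after;
    r.insert(r.begin() + static_cast<std::ptrdiff_t>(pos), t[extra]);
    return join_words(r);
}
```

Called with the three strings from the question (test, find, replace), this returns "This is a test about player group".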

How to assign string a char array that starts from the middle of the array?

For example in the following code:
char name[20] = "James Johnson";
I want to assign all the characters after the white space, through the end of the char array, to a string, so that the string ends up like the following (this is not meant as the actual initialization, it just shows the idea):
string s = "Johnson";
Therefore, essentially, the string will only accept the last name. How can I do this?
I think you want something like this:
string s="";
for(int i=strlen(name)-1;i>=0;i--)
{
if(name[i]==' ')break;
else s+=name[i];
}
reverse(s.begin(),s.end());
This needs #include <algorithm> for reverse and #include <cstring> for strlen.
There's always more than one way to do it - it depends on exactly what you're asking.
You could either:
search for the position of the first space, and then point a char* at one-past-that position (look up strchr in <cstring>)
split the string into a list of sub-strings, where your split character is a space (look up strtok or boost split)
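The first option (strchr plus a pointer one past the space) might be sketched like this. The function name after_first_space is made up, and returning the whole input when there is no space is an assumption of this example:

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns everything after the first space in a C string.
// If there is no space at all, the whole input is returned.
std::string after_first_space(const char* name) {
    const char* space = std::strchr(name, ' ');  // nullptr when no space exists
    return space ? std::string(space + 1) : std::string(name);
}
```

For char name[20] = "James Johnson"; the call after_first_space(name) gives "Johnson".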
std::string has a whole arsenal of functions for string manipulation, and I recommend you use those.
You can find the first whitespace character using std::string::find_first_of, and split the string from there:
char name[20] = "James Johnson";
// Convert whole name to string
std::string wholeName(name);
// Create a new string from the whole name starting from one character past the first whitespace
std::string lastName(wholeName, wholeName.find_first_of(' ') + 1);
std::cout << lastName << std::endl;
If you're worried about multiple names, you can also use std::string::find_last_of
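For names with more than two parts, a find_last_of version could look like the following sketch. The name last_word is hypothetical, and the explicit npos check is there so that a name without any space is returned as-is:

```cpp
#include <string>

// Hypothetical helper: returns the text after the last space,
// or the whole string when it contains no space.
std::string last_word(const std::string& whole_name) {
    const std::string::size_type pos = whole_name.find_last_of(' ');
    if (pos == std::string::npos) return whole_name;
    return whole_name.substr(pos + 1);
}
```

With "James Earl Johnson" this picks "Johnson" and skips the middle name.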
If you're worried about the names not being separated by a space, you could use std::string::find_first_not_of and search for letters of the alphabet. The example given in the link is:
std::string str ("look for non-alphabetic characters...");
std::size_t found = str.find_first_not_of("abcdefghijklmnopqrstuvwxyz ");
if (found!=std::string::npos)
{
std::cout << "The first non-alphabetic character is " << str[found];
std::cout << " at position " << found << '\n';
}

regex to find anything except "this String"

I need a regex to find anything except this String
example data is:
this is one line with this string only this should not match
this is another line but has no string in
this String is another
and then this line should match also
I want to find and highlight the entire line, so the line
this is one line with this string is the only one that will be selected.
I tried ^(?!(this String)$) but this only finds a zero-length match, so it is not much help. I tried adding .* in various places, but I don't understand how to do this.
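Chances are that you don't need regex for this task at all. Compare these two pieces of code, both of which print the lines that do not contain the phrase:
std::regex pattern("^((?!this string).)*$");
// infile is an open std::ifstream and chkme a std::string, both declared elsewhere
while (std::getline(infile, chkme)) {
    if (std::regex_match(chkme, pattern)) {
        std::cout << "Not found! " << chkme << '\n';
    }
}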
and
std::string pattern = "this string";
// infile is an open std::ifstream and chkme a std::string, both declared elsewhere
while (std::getline(infile, chkme)) {
    if (chkme.find(pattern) == std::string::npos) {
        std::cout << "Not found! " << chkme << '\n';
    }
}
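To see that the two checks agree, here is a small self-contained sketch with both wrapped in functions. The function names are invented, and the phrase is spelled "this String" as in the question; matching is case-sensitive in both versions:

```cpp
#include <regex>
#include <string>

// Plain substring search: true when the line does NOT contain the phrase.
bool lacks_phrase_find(const std::string& line) {
    return line.find("this String") == std::string::npos;
}

// Regex version: (?!this String). consumes one character only if the phrase
// does not start at that position, so the whole line matches only when the
// phrase starts nowhere in it.
bool lacks_phrase_regex(const std::string& line) {
    static const std::regex pattern("^((?!this String).)*$");
    return std::regex_match(line, pattern);
}
```

Both functions return false for "this String is another" and true for "and then this line should match also". The find version is shorter and avoids the regex engine entirely.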

Why can't regex find the "(" in a Japanese string in C++?

I have a huge file of Japanese example sentences. It's set up so that one line is the sentence, and then the next line is comprised of the words used in the sentence separated by {}, () and []. Basically, I want to read a line from the file, find only the words in the (), store them in a separate file, and then remove them from the string.
I'm trying to do this with regexp. Here is the text I'm working with:
は 二十歳(はたち){20歳} になる[01]{になりました}
And here's the code I'm using to find the stuff between ():
std::smatch m;
std::regex e ("\(([^)]+)\)"); // matches things between ( and )
if (std::regex_search (components,m,e)) {
printToTest(m[0].str(), "what we got"); //Prints to a test file "what we got: " << m[0].str()
components = m.prefix().str().append(m.suffix().str());
// components is a std::string
printToTest(components, "[COMP_AFTER_REMOVAL]");
//Prints to test file "[COMP_AFTER_REMOVAL]: " << components
}
Here's what should get printed:
what we got:はたち
[COMP_AFTER_REMOVAL]:は 二十歳(){20歳} になる[01]{になりました}
Here's what gets printed:
what we got:は 二十歳(はたち
[COMP_AFTER_REMOVAL]:){20歳} になる[01]{になりました}
It seems like somehow the は is being confused for a (, which makes the regexp go from は to ). I believe it's a problem with the way the line is being read in from the file. Maybe it's not being read in as utf8 somehow. Here's what I do:
xml_document finalDoc;
string sentence;
string components;
ifstream infile;
infile.open("examples.utf");
unsigned int line = 0;
string linePos;
bool eof = infile.eof();
while (!eof && line < 1){
getline(infile, sentence);
getline(infile, components);
MakeSentences(sentence, components, finalDoc);
line++;
}
Is something wrong? Any tips? Need more code? Please help. Thanks.
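You forgot to escape your backslashes. In an ordinary string literal, \( is not a recognized escape sequence, so the compiler (typically with a warning) turns "\(([^)]+)\)" into (([^)]+)), which is not the regex you wanted. That pattern simply matches everything up to the first ), which is exactly the output you are seeing.
You need to type "\\(([^)]+)\\)"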
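A raw string literal avoids the double backslashes altogether. The sketch below is one way to get the output described in the question; the struct and function names are hypothetical. It uses m[1] (the capture group) for the text between the parentheses, because m[0] would include the parentheses themselves, and it puts an empty "()" back so the remainder looks like the expected output:

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the text found between ( and ), plus the input
// with that text removed (the empty parentheses are kept).
struct Extracted {
    std::string reading;
    std::string remainder;
};

Extracted take_first_reading(const std::string& components) {
    // Raw string literal: the backslashes reach the regex engine untouched.
    static const std::regex paren(R"re(\(([^)]+)\))re");

    Extracted result;
    std::smatch m;
    if (!std::regex_search(components, m, paren)) {
        result.remainder = components;  // nothing to extract
        return result;
    }
    result.reading = m[1].str();  // group 1: contents without the parentheses
    result.remainder = m.prefix().str() + "()" + m.suffix().str();
    return result;
}
```

Because ( and ) are single ASCII bytes that never occur inside a multi-byte UTF-8 sequence, this byte-wise matching should also be safe on UTF-8 input such as the Japanese line in the question.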

C++ - Remove or skip quote char in reading a file line by tokenizer

I have a csv file that has records like:
837478739*"EP"1"3FB2B464BD5003B55CA6065E8E040A2A"*"F"*21*15*"NH"*"N"0*-1*"-1"*0*0**-1*223944*-1*"23"1"-1""-1""78909""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""74425""26""-1"*"-1"*1*1*69*23.58*0*0*0*0*"MC"
The file has lots of records, so I need a fast method to breakdown the line and push_back each of those parts into a vector. The main reason I choose tokenizer is that I heard a lot about its performance. I have a function:
void break(){
//using namespace boost;
string s = "This is a , test '' file";
boost::tokenizer<> tok(s);
vector<string> line;
for(boost::tokenizer<>::iterator beg=tok.begin();beg!=tok.end();++beg){
line.push_back(*beg);
}
cout << line[3] << " and " << line[5] << endl;
}
By that I can get each part of the sentence and ignore everything that is not a letter. Does the tokenizer have the ability to read the record that I have and parse them by "*" delimiter and remove the quotes from the string? There won't be any kind of special character between quotes, I just need to remove the quote marks. I tried to read the tokenizer document, but nothing came out.
You need to give your tokenizer a different TokenizerFunc to parse the string differently; the default one splits on spaces and punctuation.
http://www.boost.org/doc/libs/1_37_0/libs/tokenizer/tokenizerfunction.htm
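For comparison, here is a standard-library-only sketch of the same job: split on * and drop the quote marks. It does not use Boost.Tokenizer at all (with Boost, a boost::char_separator<char> constructed with "*" would play the role of the custom TokenizerFunc). The function name split_record is made up for the example:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a record on '*' and strips every '"' from
// each field. Empty fields between two consecutive '*' are kept; an empty
// field after a trailing '*' is not produced by std::getline.
std::vector<std::string> split_record(const std::string& record) {
    std::vector<std::string> fields;
    std::istringstream in(record);
    for (std::string field; std::getline(in, field, '*');) {
        // Erase-remove idiom: delete all quote characters from the field.
        field.erase(std::remove(field.begin(), field.end(), '"'), field.end());
        fields.push_back(field);
    }
    return fields;
}
```

Whether this is fast enough for a large file is something to measure; it is meant only to show the splitting and quote-stripping logic.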
You can use regex_replace to strip the quote marks.
break is a keyword, so you can't use it as a function name.