Text Parser c++ code - c++

I need C++ code for the following problem:
I have a text file that I want to start reading from a specific line, then I need to print the output located between the characters --- <\s>
Example: hello<\s>
I want the output to be hello.
I think I should use a text parser, but I'm not sure how!
#include <iostream>
#include <cstdlib>
#include <cctype>
#include <cstring>
#include <fstream>
#include <string>
using namespace std;
int main(int argc, char* argv[])
{
std::string line_;
ifstream file_("tty.txt");
if (file_.is_open())
{
while (getline(file_, line_))
{
std::cout << line_ << '\n';
}
file_.close();
}
else
std::cout << "error" << '\n';
std::cin.get();
system("PAUSE");
return 0;
}

You can load all the text into one variable and then use a regex to search for all occurrences of your desired pattern (in your case <sth>(any_alphanumeric_character)*</sth>, where * means zero or more occurrences; you can read about it in any std::regex tutorial).
Example:
std::smatch m;
std::string text = "<a>adsd</a> <a>esd</a>";
std::string::const_iterator searchStart(text.cbegin());
std::regex rgx("<a>[A-Za-z0-9\\s]*</a>");
while (std::regex_search(searchStart, text.cend(), m, rgx))
{
cout << m[0] << endl;
searchStart += m.position() + m.length();
}
This gives <a>adsd</a> and <a>esd</a> as results, from which it is very easy to extract the inner string.

Related

How do I search a string from a file and return the line location using functions in C++?

I am trying to make a program that lets me search for groups of words in a file and then it would return the line locations where they are found. I made it work for a little bit but for some reason, the value of int FileLine (line location) keeps on stacking up whenever a new word search is introduced.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
using namespace std;
string S1, S2, S, Line;
int FileLine = 0;
int CountInFile(string S) {
ifstream in("DataFile.txt");
while (getline(in, Line)) {
FileLine++;
if (Line.find(S, 0) != string::npos) {
cout << "Found " << S << " at line " << FileLine << "\n";
}
}
return 0;
in.close();
}
int main()
{
// Words to search
CountInFile("Computer Science");
CountInFile("Programming");
CountInFile("C++");
CountInFile("COSC");
CountInFile("computer");
}
This is the output:
Is there a way that I can stop the FileLine value from stacking?

a "?" before the string

I want to use strings to input the path of files:
char** argv;
char* mytarget[2]={ (char*)"‪D:\\testlas\\BigOne.pcd",(char*)"‪‪D:\\testlas\\SmallOne.pcd" };
argv = mytarget;
for(int i=0;i<2;i++)
{
std::cout << "m.name: " << argv[i] <<std::endl;
}
However, cout outputs:
m.name: ?‪D:\\testlas\\BigOne.pcd
m.name: ?‪D:\\testlas\\SmallOne.pcd
Why is there a ? before the strings?
I use VS2017 C++11.
I created a new program and used the code:
#include <iostream>
#include <string>
#include <cstring>
using namespace std;
int main()
{
std::string test = "‪abc789";
cout << test << endl;
return 0;
}
It also outputs "?abc789". Why?
std::string test = "‪abc789";
There is a hidden LEFT-TO-RIGHT EMBEDDING character between the opening quote " and the first letter a (Unicode character U+202A, or UTF-8 E2 80 AA). Remove it, for example by deleting and retyping the line, then the ? will go away.

How to split string using CRLF delimiter in cpp?

I have a string:
testing testing
test2test2
These lines are divided by CRLF. I saw that the bytes 0d0a0d0a separate them.
How can I split the string using this information?
I wanted to use str.find(CRLF-DELIMITER) but can't seem to figure out how.
Edit:
I already used str.find("textDelimiter"), but now I need it to look for the hex bytes rather than search for the string "0d0a0d0a".
Use boost::split to do that. Please also take a look at Boost.Tokenizer.
Here is another way of doing it using regex:
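#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
#include <boost/algorithm/string/regex.hpp>
using std::endl;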
using std::cout;
using std::string;
using std::vector;
using boost::algorithm::split_regex;
int main()
{
vector<string> res;
string input = "test1\r\ntest2\r\ntest3";
split_regex(res, input, boost::regex("(\r\n)+"));
for (auto& tok : res)
{
std::cout << "Token: " << tok << std::endl;
}
return 0;
}
Here is the way of doing it without Boost:
#include <string>
#include <sstream>
#include <istream>
#include <vector>
#include <iostream>
int main()
{
std::string strlist("line1\r\nLine2\r\nLine3\r\n");
std::istringstream MyStream(strlist);
std::vector<std::string> v;
std::string s;
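while (std::getline(MyStream, s))
{
// getline splits on '\n' only, so drop the '\r' left at the end of each line
if (!s.empty() && s.back() == '\r')
s.pop_back();
v.push_back(s);
std::cout << s << std::endl;
}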
return 0;
}

getting a sub string of a std::wstring

How can I get a substring of a std::wstring which includes some non-ASCII characters?
The following code does not output anything:
(The text is an Arabic word containing 4 characters, each of which takes two bytes, plus the word "hello".)
#include <iostream>
#include <string>
using namespace std;
int main()
{
wstring s = L"سلام hello";
wcout << s.substr(0,3) << endl;
wcout << s.substr(4,5) << endl;
return 0;
}
This should work:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex/pending/unicode_iterator.hpp>
using namespace std;
template <typename C>
std::string to_utf8(C const& in)
{
std::string result;
auto out = std::back_inserter(result);
auto utf8out = boost::utf8_output_iterator<decltype(out)>(out);
std::copy(begin(in), end(in), utf8out);
return result;
}
int main()
{
wstring s = L"سلام hello";
auto first = s.substr(0,3);
auto second = s.substr(4,5);
cout << to_utf8(first) << endl;
cout << to_utf8(second) << endl;
}
Prints
سلا
hell
Frankly though, I think your substring calls are making weird assumptions. Let me suggest a fix for that in a minute:

Case Sensitive Partial Match with Boost's Regex

From the following code, I expect to get this output from the corresponding input:
Input: FOO Output: Match
Input: FOOBAR Output: Match
Input: BAR Output: No Match
Input: fOOBar Output: No Match
But why does it give "No Match" for the input FOOBAR?
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string InputString = arg_vec[1];
string toMatch = "FOO";
const regex e(toMatch);
if (regex_match(InputString, e,match_partial)) {
cout << "Match" << endl;
} else {
cout << "No Match" << endl;
}
return 0;
}
Update:
Finally it works with the following approach:
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
bool testSearchBool(const boost::regex &ex, const string st) {
cout << "Searching " << st << endl;
string::const_iterator start, end;
start = st.begin();
end = st.end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
return boost::regex_search(start, end, what, ex, flags);
}
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string InputString = arg_vec[1];
string toMatch = "FOO*";
static const regex e(toMatch);
if ( testSearchBool(e,InputString) ) {
cout << "MATCH" << endl;
}
else {
cout << "NOMATCH" << endl;
}
return 0;
}
Use regex_search instead of regex_match.
With regex_match, your regular expression has to account for any characters before and after the sub-string "FOO".
I'm not sure, but "FOO.*" might do the trick (in "FOO*" the * applies only to the last O).
match_partial would only return true if the partial string was found at the end of the text input, not the beginning.
"A partial match is one that matched one or more characters at the end of the text input, but did not match all of the regular expression (although it may have done so had more input been available)"
So FOOBAR matched with "FOO" would return false.
As the other answer suggests, using regex_search would allow you to search for sub-strings more effectively.