Splitting text into a list of words with ICU

I'm working on a text tokenizer. ICU is one of very few C++ libraries that have this feature, and probably the best maintained one, so I'd like to use it.
I've found the docs about BreakIterator, but there's one problem with it: how do I leave the punctuation out?
#include "unicode/brkiter.h"
#include <QFile>
#include <vector>
std::vector<QString> listWordBoundaries(const UnicodeString& s)
{
UErrorCode status = U_ZERO_ERROR;
BreakIterator* bi = BreakIterator::createWordInstance(Locale::getUS(), status);
std::vector<QString> words;
bi->setText(s);
for (int32_t p = bi->first(), prevBoundary = 0; p != BreakIterator::DONE; prevBoundary = p, p = bi->next())
{
const auto word = s.tempSubStringBetween(prevBoundary, p);
char buffer [16384];
word.toUTF8(CheckedArrayByteSink(buffer, 16384));
words.emplace_back(QString::fromUtf8(buffer));
}
delete bi;
return words;
}
int main(int /*argc*/, char * /*argv*/ [])
{
QFile f("E:\\words.TXT");
f.open(QFile::ReadOnly);
QFile result("E:\\words.TXT");
result.open(QFile::WriteOnly);
const QByteArray strData = f.readAll();
for (const QString& word: listWordBoundaries(UnicodeString::fromUTF8(StringPiece(strData.data(), strData.size()))))
{
result.write(word.toUtf8());
result.write("\n");
}
return 0;
}
Naturally, the resulting file looks like this:
“
Come
outside
.
Best
if
we
do
not
wake
him
.
”
What I need is just the words. How can this be done?

The Qt library provides several useful methods for checking a character's properties; see QChar.
You can create a QString from the buffer and check whatever properties you need before inserting it into the output vector.
For example:
auto token = QString::fromUtf8(buffer);
if (token.length() > 0 && token.data()[0].isPunct() == false) {
words.push_back(std::move(token));
}
This code looks at the first character of the token and checks whether it is a punctuation mark.
For something more robust, the check can be written as a function:
// Returns true if the token should be skipped: it is empty or contains
// at least one punctuation or whitespace character.
bool isInBlackList(const QString& str) {
    if (str.isEmpty()) return true;
    for (const QChar c : str) {
        if (c.isPunct() || c.isSpace()) {
            return true;
        }
    }
    return false;
}
If that function returns true, the token should not be inserted into the vector.
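One limitation of rejecting every token that contains punctuation is that words such as "don't" are dropped as well. A minimal sketch of a more lenient filter follows. It assumes the tokens are plain std::string values in UTF-8 and treats only ASCII letters and digits as word characters; the names looksLikeWord and keepWords are hypothetical and belong to neither Qt nor ICU. ICU's BreakIterator also offers getRuleStatus(), which for word iterators tells word-like segments apart from spaces and punctuation (UBRK_WORD_NONE), and that may be the more direct route; it is not used here so that the sketch stays free of dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true if the token has at least one ASCII letter or digit.
// Assumption: bytes outside ASCII (for example the bytes of a curly quote)
// never count as word characters, so a token made only of such bytes is dropped.
bool looksLikeWord(const std::string& token)
{
    return std::any_of(token.begin(), token.end(), [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    });
}

// Keeps only the tokens that look like words, preserving their order.
std::vector<std::string> keepWords(const std::vector<std::string>& tokens)
{
    std::vector<std::string> words;
    std::copy_if(tokens.begin(), tokens.end(), std::back_inserter(words), looksLikeWord);
    return words;
}
```

With this filter "don't" is kept, while ".", whitespace and the curly quotes from the sample output are skipped. Words written entirely in non-ASCII scripts would need a Unicode-aware test such as the QChar methods above.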

Related

Check if two given strings are isomorphic to each other c++, not sure why it's wrong

class Solution {
public:
bool isIsomorphic(string s, string t) {
vector <int> sfreq (26,0);
vector <int> tfreq (26,0);
for (int i=0; i<s.size(); i++) {
sfreq[s[i]-'a']++;
tfreq[t[i]-'a']++;
}
if (sfreq != tfreq) {
return false;
}
return true;
}
};
Hi, this is my code in c++, I saw something similar from https://www.geeksforgeeks.org/check-if-two-given-strings-are-isomorphic-to-each-other/ but my answer shows it's wrong. Can anyone please tell me why it's wrong?

You have misunderstood the problem description.
Your code assumes that permuting the characters of either input does not change the answer, and that isomorphic strings must have equal histograms. Neither is true.
The position of each character matters: every index pairs one character of s with one character of t, and those pairs must form a consistent one-to-one mapping.
Here is my code, which passed:
class Solution {
public:
static bool canMapInOneDirection(std::string_view s, std::string_view t)
{
const auto n = s.size();
std::array<char, 128> mapping{};
for(size_t i = 0; i < n; ++i) {
if (mapping[s[i]] == 0) mapping[s[i]] = t[i];
else if (mapping[s[i]] != t[i]) return false;
}
return true;
}
bool isIsomorphic(string s, string t)
{
return s.size() == t.size() && canMapInOneDirection(s, t) && canMapInOneDirection(t, s);
}
};
And some test cases you can use to test your code:
s         t         answer
"a"       "b"       true
"aa"      "bb"      true
"ab"      "aa"      false
"aabbcc"  "aabcbc"  false
https://godbolt.org/z/61EcTK5fq
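To see concretely why equal histograms and isomorphism are different properties, here is a small sketch that puts the two checks side by side. The function names sameHistogram and isomorphic are made up for this illustration, and sameHistogram assumes lowercase a-z input just like the code in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// The idea from the question: compare letter histograms (lowercase a-z only).
bool sameHistogram(const std::string& s, const std::string& t)
{
    std::array<int, 26> sfreq{}, tfreq{};
    for (char c : s) ++sfreq[c - 'a'];
    for (char c : t) ++tfreq[c - 'a'];
    return sfreq == tfreq;
}

// Position-aware check: builds the mapping in both directions at once.
// An entry of 0 means "no mapping yet"; otherwise it stores the character code + 1.
bool isomorphic(const std::string& s, const std::string& t)
{
    if (s.size() != t.size()) return false;
    std::array<int, 256> st{}, ts{};
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char a = s[i], b = t[i];
        if (st[a] == 0 && ts[b] == 0) {
            st[a] = b + 1;
            ts[b] = a + 1;
        } else if (st[a] != b + 1 || ts[b] != a + 1) {
            return false;
        }
    }
    return true;
}
```

"aabb" and "abab" have the same histogram but are not isomorphic, while "egg" and "add" are isomorphic although their histograms differ, so the histogram test is wrong in both directions.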

This is not a question about anagrams, or directly about character frequencies. It is about pattern: having a character-by-character mapping that turns one string into the other. AABC is isomorphic to XXYZ but not to BCAA.
When we talk about isomorphism (same form), it's often a good idea to look for a signature representation.
So instead of determining directly whether two strings are isomorphic, I've decided to define a unique signature representation and to call two strings isomorphic if they map to the same signature.
I've used std::vector<char> for the signature, such that the first character (if any) is assigned 0, the next previously unseen character 1, and so on.
So a string like MOON has signature {0,1,1,2}, because the middle characters are the only repeats. MOON is isomorphic to BOOK but not to NOON.
The advantage of this strategy is that if many strings are to be compared to find groups of mutually isomorphic strings, each string need only be converted to its signature once.
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
std::vector<char> get_signature(const std::string& str){
std::vector<char> result;
std::unordered_map<char,char> map;
char curr{1};
for(auto cchar : str){
char& c{map[cchar]};
if(c==0){
c=curr++;
}
result.emplace_back(c-1);
}
return result;
}
int check_signature(const std::string& str, const std::vector<char>& expect ){
const auto result{get_signature(str)};
return result==expect?0:1;
}
int main() {
int errors{0};
{
const std::string str{"ABCDE"};
const std::vector<char> signat{0,1,2,3,4};
errors+=check_signature(str,signat);
}
{
const std::string str{"BABY"};
const std::vector<char> signat{0,1,0,2};
errors+=check_signature(str,signat);
}
{
const std::string str{"XXYZX"};
const std::vector<char> signat{0,0,1,2,0};
errors+=check_signature(str,signat);
}
{
const std::string str{"AABCA"};
const std::vector<char> signat{0,0,1,2,0};
errors+=check_signature(str,signat);
}
{
const std::string str{""};
const std::vector<char> signat{};
errors+=check_signature(str,signat);
}
{
const std::string str{"Z"};
const std::vector<char> signat{0};
errors+=check_signature(str,signat);
}
if(get_signature("XXYZX")!=get_signature("AABCA")){
++errors;
}
if(get_signature("MOON")==get_signature("AABCA")){
++errors;
}
if(get_signature("MOON")!=get_signature("BOOK")){
++errors;
}
if(get_signature("MOON")==get_signature("NOON")){
++errors;
}
if(errors!=0){
std::cout << "ERRORS\n";
}else{
std::cout << "SUCCESS\n";
}
return 0;
}
Expected Output: SUCCESS
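The grouping use case mentioned above could look roughly like the following sketch. It repeats the signature idea in a compact form so that it is self-contained; signature_of and group_isomorphic are hypothetical names, and the signature uses int rather than char purely for convenience.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Same idea as get_signature above: each character is replaced by the order
// in which it was first seen (0 for the first distinct character, then 1, ...).
std::vector<int> signature_of(const std::string& str)
{
    std::vector<int> result;
    std::unordered_map<char, int> first_seen;
    for (char ch : str) {
        // emplace inserts ch -> next index only if ch has not been seen before;
        // either way it returns an iterator to the entry for ch.
        auto it = first_seen.emplace(ch, static_cast<int>(first_seen.size())).first;
        result.push_back(it->second);
    }
    return result;
}

// Hypothetical helper: buckets the words so that each bucket holds mutually
// isomorphic strings. Every word is converted to its signature exactly once.
std::map<std::vector<int>, std::vector<std::string>>
group_isomorphic(const std::vector<std::string>& words)
{
    std::map<std::vector<int>, std::vector<std::string>> groups;
    for (const auto& word : words) {
        groups[signature_of(word)].push_back(word);
    }
    return groups;
}
```

For the words MOON, BOOK, NOON, XXYZX and AABCA this yields three groups: MOON with BOOK, XXYZX with AABCA, and NOON on its own.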

Because you are missing a second loop.
Note that even with that loop added, the first approach below still needs more corner-case checking to work fully. The second approach handles all cases properly.
class Solution {
public:
bool isIsomorphic(string s, string t) {
vector <int> sfreq (26,0);
vector <int> tfreq (26,0);
for (int i=0; i < s.size(); i++) {
sfreq[s[i]-'a']++;
tfreq[t[i]-'a']++;
}
// character at the same index (can be different character) should have the same count.
for(int i= 0; i < s.size(); i++)
if (sfreq[s[i]-'a'] != tfreq[t[i]-'a']) return false;
return true;
}
};
But the above solution only works when the characters map directly index by index, as in AAABBCA and XXXYYZX. It fails for bbbaaaba and aaabbbba: every character occurs four times, so the counts match at every index, yet the strings are not isomorphic. It also handles only lowercase letters. The link you shared contains the wrong implementation, as noted in the comment.
The solution below worked when I tested it.
class Solution {
public:
bool isIsomorphic(string s, string t) {
vector<int> scount(128, -1), tcount(128, -1);
for (int i = 0; i < s.size(); ++i) {
auto schar = s[i], tchar = t[i];
if (scount[schar] == -1) {
scount[schar] = tchar;
if (tcount[tchar] != -1) return false;
else tcount[tchar] = schar;
} else if (scount[schar] != tchar) return false;
}
return true;
}
};

How can I speed up parsing of large strings?

So I've made a program that reads in various config files. Some of these config files can be small, some can be semi-large (largest one is 3,844 KB).
The read in file is stored in a string (in the program below it's called sample).
I then have the program extract information from the string based on various formatting rules. This works well, the only issue is that when reading larger files it is very slow....
I was wondering if there is anything I could do to speed up the parsing, or if there is an existing library that does what I need (extract a string up to a delimiter, and extract the string between two delimiters on the same level). Any assistance would be great.
Here's my code & a sample of how it should work...
#include "stdafx.h"
#include <string>
#include <vector>
std::string ExtractStringUntilDelimiter(
std::string& original_string,
const std::string& delimiter,
const int delimiters_to_skip = 1)
{
std::string needle = "";
if (original_string.find(delimiter) != std::string::npos)
{
int total_found = 0;
auto occurance_index = static_cast<size_t>(-1);
while (total_found != delimiters_to_skip)
{
occurance_index = original_string.find(delimiter);
if (occurance_index != std::string::npos)
{
needle = original_string.substr(0, occurance_index);
total_found++;
}
else
{
break;
}
}
// Remove the found string from the original string...
original_string.erase(0, occurance_index + 1);
}
else
{
needle = original_string;
original_string.clear();
}
if (!needle.empty() && needle[0] == '\"')
{
needle = needle.substr(1);
}
if (!needle.empty() && needle[needle.length() - 1] == '\"')
{
needle.pop_back();
}
return needle;
}
void ExtractInitialDelimiter(
std::string& original_string,
const char delimiter)
{
// Remove extra new line characters
while (!original_string.empty() && original_string[0] == delimiter)
{
original_string.erase(0, 1);
}
}
void ExtractInitialAndFinalDelimiters(
std::string& original_string,
const char delimiter)
{
ExtractInitialDelimiter(original_string, delimiter);
while (!original_string.empty() && original_string[original_string.size() - 1] == delimiter)
{
original_string.erase(original_string.size() - 1, 1);
}
}
std::string ExtractStringBetweenDelimiters(
std::string& original_string,
const std::string& opening_delimiter,
const std::string& closing_delimiter)
{
const size_t first_delimiter = original_string.find(opening_delimiter);
if (first_delimiter != std::string::npos)
{
int total_open = 1;
const size_t opening_index = first_delimiter + opening_delimiter.size();
for (size_t i = opening_index; i < original_string.size(); i++)
{
// Check if we have room for opening_delimiter...
if (i + opening_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < opening_delimiter.size(); j++)
{
if (original_string[i + j] != opening_delimiter[j])
{
break;
}
else if (j == opening_delimiter.size() - 1)
{
total_open++;
}
}
}
// Check if we have room for closing_delimiter...
if (i + closing_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < closing_delimiter.size(); j++)
{
if (original_string[i + j] != closing_delimiter[j])
{
break;
}
else if (j == closing_delimiter.size() - 1)
{
total_open--;
}
}
}
if (total_open == 0)
{
// Extract result, and return it...
std::string needle = original_string.substr(opening_index, i - opening_index);
original_string.erase(first_delimiter, i + closing_delimiter.size());
// Remove new line symbols
ExtractInitialAndFinalDelimiters(needle, '\n');
ExtractInitialAndFinalDelimiters(original_string, '\n');
return needle;
}
}
}
return "";
}
int main()
{
std::string sample = "{\n"
"Line1\n"
"Line2\n"
"{\n"
"SubLine1\n"
"SubLine2\n"
"}\n"
"}";
std::string result = ExtractStringBetweenDelimiters(sample, "{", "}");
std::string LineOne = ExtractStringUntilDelimiter(result, "\n");
std::string LineTwo = ExtractStringUntilDelimiter(result, "\n");
std::string SerializedVector = ExtractStringBetweenDelimiters(result, "{", "}");
std::string SubLineOne = ExtractStringUntilDelimiter(SerializedVector, "\n");
std::string SubLineTwo = ExtractStringUntilDelimiter(SerializedVector, "\n");
// Just for testing...
printf("LineOne: %s\n", LineOne.c_str());
printf("LineTwo: %s\n", LineTwo.c_str());
printf("\tSubLineOne: %s\n", SubLineOne.c_str());
printf("\tSubLineTwo: %s\n", SubLineTwo.c_str());
system("pause");
}

Use string_view or a hand-rolled equivalent.
Don't modify the loaded string.
original_string.erase(0, occurance_index + 1);
is a code smell and is going to be expensive with a large original string.
If you are going to modify something, do it in one pass. Don't repeatedly delete from the front of the string: each erase shifts all the remaining characters, so the whole thing is O(n^2). Instead, proceed along it and push the "finished" pieces into an output accumulator.
This will involve changing how your code works.
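A minimal sketch of that idea, assuming C++17 and a single-character delimiter: the loaded std::string is never touched, and a std::string_view "cursor" is advanced over it. The name take_until is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns the text before the next `delimiter` and
// advances `rest` past that delimiter. Nothing is copied or erased; both the
// result and `rest` are views into the original buffer, which must outlive them.
std::string_view take_until(std::string_view& rest, char delimiter)
{
    const std::size_t pos = rest.find(delimiter);
    if (pos == std::string_view::npos) {
        // No delimiter left: hand out everything that remains.
        const std::string_view token = rest;
        rest = std::string_view{};
        return token;
    }
    const std::string_view token = rest.substr(0, pos);
    rest.remove_prefix(pos + 1);  // O(1): only the view's start moves
    return token;
}
```

Usage would be to create `std::string_view rest = sample;` once and call `take_until(rest, '\n')` repeatedly. Each call costs time proportional to the token it returns, so a full pass over the file is linear.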

You're reading your data into a string. "Length of string" should not be a problem. So far, so good...
You're using "string.find().". That's not necessarily a bad choice.
You're using "string.erase()". That's probably the main source of your problem.
SUGGESTIONS:
Treat the original string as "read-only". Don't call erase(), don't modify it.
Personally, I'd consider reading your text into a C string (a text buffer), then parsing the text buffer, using strstr().

Here is a more efficient version of ExtractStringBetweenDelimiters. Note that this version does not mutate the original buffer. You would perform subsequent queries on the returned string.
#include <algorithm>
#include <iterator>
#include <string>
std::string trim(std::string buffer, char what)
{
auto not_what = [&what](char ch)
{
return ch != what;
};
auto first = std::find_if(buffer.begin(), buffer.end(), not_what);
auto last = std::find_if(buffer.rbegin(), std::make_reverse_iterator(first), not_what).base();
return std::string(first, last);
}
std::string ExtractStringBetweenDelimiters(
std::string const& buffer,
const char opening_delimiter,
const char closing_delimiter)
{
std::string result;
auto first = std::find(buffer.begin(), buffer.end(), opening_delimiter);
if (first != buffer.end())
{
auto last = std::find(buffer.rbegin(), std::make_reverse_iterator(first),
closing_delimiter).base();
if(last > first)
{
result.assign(first + 1, last);
result = trim(std::move(result), '\n');
}
}
return result;
}
If you have access to string_view (c++17 for std::string_view or boost::string_view) you could return one of these from both functions for extra efficiency.
It's worth mentioning that this method of parsing a structured file is going to cause you problems down the line if any of the serialised strings contains a delimiter, such as a '{'.
In the end you'll want to write or use someone else's parser.
The boost::spirit library is a little complicated to learn, but creates very efficient parsers for this kind of thing.
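The string_view variant mentioned above might be sketched as follows, assuming C++17. Unlike the version above, this sketch counts nested delimiter pairs, so it returns the text up to the matching closing delimiter, as the code in the question does. The names trim_view and between_matching are hypothetical, and the opening and closing delimiters are assumed to be different characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: strips leading and trailing `what` characters without copying.
std::string_view trim_view(std::string_view text, char what)
{
    const std::size_t first = text.find_first_not_of(what);
    if (first == std::string_view::npos) return {};
    const std::size_t last = text.find_last_not_of(what);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: returns the text between the first `open` and its
// matching `close`, keeping track of nesting. Returns an empty view if there
// is no balanced pair. Assumes open != close.
std::string_view between_matching(std::string_view buffer, char open, char close)
{
    const std::size_t start = buffer.find(open);
    if (start == std::string_view::npos) return {};
    int depth = 0;
    for (std::size_t i = start; i < buffer.size(); ++i) {
        if (buffer[i] == open) {
            ++depth;
        } else if (buffer[i] == close && --depth == 0) {
            return trim_view(buffer.substr(start + 1, i - start - 1), '\n');
        }
    }
    return {};
}
```

The returned views point into the original buffer, so the buffer has to stay alive and unmodified while they are in use. The caveat about delimiters inside serialised strings applies to this sketch as well.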

C++ efficient parse

I am programming some automated test equipment (ATE) and I'm trying to extract the following values out of an example response from the ATE:
DCRE? 1,
DCRE P, 10.3, (pin1)
DCRE F, 200.1, (pin2)
DCRE P, 20.4, (pin3)
From each line, I only care about the pin and the measured result value. So for the case above, I want to store the following pieces of information in a map<std::string, double> results;
results["pin1"] = 10.3;
results["pin2"] = 200.1;
results["pin3"] = 20.4;
I made the following code to parse the response:
void parseResultData(map<Pin*, double> &pinnametoresult, string &datatoparse) {
char *p = strtok((char*) datatoparse.c_str(), " \n");
string lastread;
string current;
while (p) {
current = p;
if(current.find('(') != string::npos) {
string substring = lastread.substr(1);
const char* last = substring.c_str();
double value = strtod(last, NULL);
unsigned short number = atoi(current.substr(4, current.size()-2).c_str());
pinnametoresult[&pinlookupmap[number]] = value;
}
lastread = p;
p = strtok(NULL, " \n");
}
}
It works, but it's not very efficient. Is there a way to make the function more efficient for this specific case? I don't care about the DCRE or P/F value on each line. I thought about using Boost regex library, but not sure if that would be more efficient.

To make this a bit more efficient, try to avoid copying. In particular, calls to substr(), string assignments and the like can wreak havoc on performance. If you look at your code, you will see that the content of datatoparse is repeatedly assigned to lastread and current, each time with one line less at the beginning. So, on average, you copy half of the original string times the number of lines, making just that part an O(n^2) algorithm. This isn't relevant if you have three or four lines (or even 100 lines), but with more input, performance degrades rapidly.
Try this approach instead:
string::size_type p0 = 0;
string::size_type p1 = input.find('\n', p0);
while (p1 != string::npos) {
// extract the line
string line = input.substr(p0, p1 - p0);
// move to the next line
p0 = p1 + 1;
p1 = input.find('\n', p0);
}
Notes:
The algorithm still copies all of the input once, but each line only once, making it O(n).
Since you have a copy of the line, you can insert '\0' as an artificial separator in order to pass a substring to e.g. atoi() or strtod().
string::find() takes the character (or string) to search for first and the position to start searching from second, as used above. The other find-like functions have similar overloads.
When handling a line, search for the indices of the parts you need and then extract and parse them.
If the input ends with a line fragment (i.e. a partial line without a trailing newline), you will have to modify the loop slightly. Create tests!
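Putting the notes above together, the per-line handling could look like this sketch. It assumes the line format from the question ("DCRE P, 10.3, (pin1)") and stores results keyed by pin name; parse_line and parse_results are hypothetical names, and the loop also accepts a final line without a trailing newline.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: parses one line such as "DCRE P, 10.3, (pin1)".
// Returns false for lines without a "(name)" part, e.g. the "DCRE? 1," header.
bool parse_line(const std::string& line, std::string& pin, double& value)
{
    const std::string::size_type open = line.find('(');
    if (open == std::string::npos) return false;
    const std::string::size_type close = line.find(')', open);
    const std::string::size_type comma = line.find(',');
    if (close == std::string::npos || comma == std::string::npos || comma > open) return false;
    // strtod skips the space after the first comma and stops at the second comma.
    value = std::strtod(line.c_str() + comma + 1, nullptr);
    pin = line.substr(open + 1, close - open - 1);
    return true;
}

// Walks the input line by line; each line is copied exactly once.
std::map<std::string, double> parse_results(const std::string& input)
{
    std::map<std::string, double> results;
    std::string::size_type p0 = 0;
    while (p0 < input.size()) {
        std::string::size_type p1 = input.find('\n', p0);
        if (p1 == std::string::npos) p1 = input.size();  // trailing fragment without newline
        const std::string line = input.substr(p0, p1 - p0);
        std::string pin;
        double value = 0.0;
        if (parse_line(line, pin, value)) results[pin] = value;
        p0 = p1 + 1;
    }
    return results;
}
```

For the sample response this stores 10.3, 200.1 and 20.4 under pin1, pin2 and pin3. There is no validation of the number itself; a malformed value would silently be read as 0.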

This is what I did:
#include <cstdlib>
#include <string>
#include <vector>
#include <unordered_map>
#include <sstream>
#include <iostream>
using namespace std;
struct Pin {
string something;
Pin() {}
};
vector<Pin*> pins = { new Pin(), new Pin(), new Pin() };
typedef unordered_map<Pin*, double> CONT_T;
inline bool OfInterest(const string& line) {
return line.find("(") != string::npos;
}
void parseResultData(CONT_T& pinnametoresult, const string& datatoparse)
{
istringstream is(datatoparse);
string line;
while (getline(is, line)) {
if (OfInterest(line)) {
double d = 0.0;
unsigned int pinid;
size_t firstComma = line.find(",")+2; // skip space
size_t secondComma = line.find(",", firstComma);
istringstream is2(line.substr(firstComma, secondComma-firstComma));
is2 >> d;
size_t paren = line.find("(")+4; // skip pin
istringstream is3(line.substr(paren, (line.length()-paren)-1));
is3 >> pinid;
--pinid;
Pin* pin = pins[pinid];
pinnametoresult[pin] = d;
}
}
}
/*
*
*/
int main(int argc, char** argv) {
string datatoparse = "DCRE? 1, \n"
"DCRE P, 10.3, (pin1)\n"
"DCRE F, 200.1, (pin2)\n"
"DCRE P, 20.4, (pin3)\n";
CONT_T results;
parseResultData(results, datatoparse);
return 0;
}

Here's my final result. Does not involve any copying, but it will destroy the string.
void parseResultData3(map<std::string, double> &pinnametoresult, std::string &datatoparse) {
char* str = (char*) datatoparse.c_str();
int length = datatoparse.size();
double lastdouble = 0.0;
char* startmarker = NULL; //beginning of next pin to parse
for(int pos = 0; pos < length; pos++, str++) {
if(str[0] == '(') {
startmarker = str + 1;
//get previous value
bool triggered = false;
for(char* lookback = str - 1; ; lookback--) {
if(!triggered && (isdigit(lookback[0]) || lookback[0] == '.')) {
triggered = true;
*(lookback + 1) = '\0';
}
else if(triggered && (!isdigit(lookback[0]) && lookback[0] != '.')) {
lastdouble = strtod(lookback, NULL);
break;
}
}
}
else if(startmarker != NULL) {
if(str[0] == ')') {
str[0] = '\0';
pinnametoresult[startmarker] = lastdouble;
startmarker = NULL;
}
if(str[0] == ',') {
str[0] = '\0';
pinnametoresult[startmarker] = lastdouble;
startmarker = str + 1;
}
}
}
}

how to get filenames from a folder and keep filenames order unchanged?

I have got filenames from a folder and sent names into vector<string>, but when I printed the vector<string>, I found that the order was not the same order as files in the folder.
My code is shown as follows:
#include <windows.h>
#include <iostream>
#include <vector>
using namespace std;
void searchFileInDirectroy( const string& dir, vector<string>& outList );
void searchFileInDirectroy( const string& dir, vector<string>& outList )
{
WIN32_FIND_DATA findData;
HANDLE hHandle;
string filePathName;
string fullPathName;
filePathName = dir;
filePathName += "\\*.*";
hHandle = FindFirstFile( filePathName.c_str(), &findData );
if( INVALID_HANDLE_VALUE == hHandle )
{
cout<<"Error"<<endl;
return ;
}
do
{
if( strcmp(".", findData.cFileName) == 0 || strcmp("..", findData.cFileName) == 0 )
{
continue;
}
fullPathName = dir;
fullPathName += "\\";
fullPathName += findData.cFileName;
if( findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY )
{
searchFileInDirectroy( fullPathName, outList );
}
else
{
outList.push_back(fullPathName);
}
} while( FindNextFile( hHandle, &findData ) );
FindClose( hHandle );
}
int main()
{
///get filenames from folder;
vector<string> pathList;
searchFileInDirectroy("D:/OpenCV/calculate laef area--cui.ver2.0/source", pathList);
for(unsigned int i=0;i<pathList.size();i++)
{
cout<<pathList[i]<<endl;
}
return 0;
}
The result is like that:
What I really want is that the order is from 1 to 12.

"I found that the order was not the same order as files in the folder."
You probably mean that they are not ordered naturally, i.e. the numbers in the filenames are seemingly not respected.
This is because lexicographical comparison does not respect maths. "12" is less than "2" because the strings are compared character by character, and the comparison is already decided at the first character: '1' is less than '2'.
So you first need an algorithm for natural ordering. C++ does not provide one, but it does provide a way to sort ranges with any given ordering relation, using std::sort:
#include <algorithm>
// ...
struct NaturalOrdering
{
bool operator()(std::string const &lhs, std::string const &rhs) const
{
// ...
}
};
// ...
vector<string> pathList;
// ...
std::sort(pathList.begin(), pathList.end(), NaturalOrdering());
The goal thus becomes to find an algorithm which defines a natural less-than relationship between two strings. This is not a trivial task if you want to cover each and every corner case. If you search on Google for "string natural order", you will find countless algorithms to use.
Here's a quick self-made one. Its idea is to divide strings into tokens, each containing only digits (like "123") or no digits at all (like "file"). The tokens are then compared individually. If both are numbers, they are converted to ints and compared mathematically, otherwise they are compared lexicographically.
Feel free to take this and improve it if it's too slow or has other problems. It is meant to be educational rather than production code:
#include <iostream>
#include <string>
#include <vector>
#include <ctype.h>
#include <algorithm>
#include <sstream>
struct Token
{
bool is_number;
std::string string;
};
std::vector<Token> Tokenize(std::string const &input)
{
std::string const digits = "0123456789";
std::vector<Token> result;
if (!input.empty())
{
bool inside_number_token = isdigit(static_cast<unsigned char>(input[0])) != 0;
std::string::size_type start_current_token = 0;
std::string::size_type start_next_token = 0;
do
{
if (inside_number_token)
{
start_next_token = input.find_first_not_of(digits, start_current_token);
}
else
{
start_next_token = input.find_first_of(digits, start_current_token);
}
std::string const string = input.substr(start_current_token, start_next_token - start_current_token);
Token token;
token.is_number = inside_number_token;
token.string = string;
result.push_back(token);
start_current_token = start_next_token;
inside_number_token = !inside_number_token;
}
while (start_current_token != std::string::npos);
}
return result;
}
int ToInteger(std::string const &number_as_string)
{
std::istringstream converter(number_as_string);
int integer = 0;
converter >> integer;
return integer;
}
struct NaturalOrder
{
bool operator()(std::string const &lhs, std::string const &rhs) const
{
std::vector<Token> const tokens_lhs = Tokenize(lhs);
std::vector<Token> const tokens_rhs = Tokenize(rhs);
for (std::vector<Token>::size_type index = 0; index < tokens_lhs.size() && index < tokens_rhs.size(); ++index)
{
Token const &token_lhs = tokens_lhs[index];
Token const &token_rhs = tokens_rhs[index];
if (token_lhs.is_number && token_rhs.is_number)
{
int const number_lhs = ToInteger(token_lhs.string);
int const number_rhs = ToInteger(token_rhs.string);
if (number_lhs != number_rhs)
{
return number_lhs < number_rhs;
}
}
else
{
if (token_lhs.string != token_rhs.string)
{
return token_lhs.string < token_rhs.string;
}
}
}
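// All shared tokens compare equal, so the string with fewer tokens sorts first.
// (Returning false here would make "file" equivalent to both "file1" and
// "file2", which breaks the strict weak ordering std::sort requires.)
return tokens_lhs.size() < tokens_rhs.size();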
}
};
int main()
{
std::vector<std::string> filenames;
filenames.push_back("file-10.txt");
filenames.push_back("file-2.txt");
filenames.push_back("file.txt");
filenames.push_back("100.txt");
filenames.push_back("100.txt");
filenames.push_back("file-23.txt");
filenames.push_back("file-11.txt");
filenames.push_back("test-01-a.txt");
filenames.push_back("test-022-b.txt");
filenames.push_back("test-03-c.txt");
filenames.push_back("aaa-10-2");
filenames.push_back("aaa-10-1");
std::sort(filenames.begin(), filenames.end(), NaturalOrder());
for (std::vector<std::string>::const_iterator iter = filenames.begin(); iter != filenames.end(); ++iter)
{
std::cout << *iter << "\n";
}
}
Output:
100.txt
100.txt
aaa-10-1
aaa-10-2
file-2.txt
file-10.txt
file-11.txt
file-23.txt
file.txt
test-01-a.txt
test-03-c.txt
test-022-b.txt
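One limitation of the version above is that ToInteger converts each digit run to an int, which cannot represent very long runs, and that every comparison allocates two token vectors. A possible alternative, sketched here under the same tokenizing idea, walks both strings in place and compares digit runs by length and then digit by digit. The names natural_less and sort_naturally are hypothetical; leading zeros are ignored, so "01" and "1" count as equal numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: natural "less than" without tokens or integer conversion.
bool natural_less(const std::string& a, const std::string& b)
{
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_digit(a[i]) && is_digit(b[j])) {
            // Skip leading zeros, then find the end of each digit run.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            std::size_t ie = i, je = j;
            while (ie < a.size() && is_digit(a[ie])) ++ie;
            while (je < b.size() && is_digit(b[je])) ++je;
            // More significant digits means a larger number.
            if (ie - i != je - j) return (ie - i) < (je - j);
            // Same length: plain character comparison equals numeric comparison.
            const int cmp = a.compare(i, ie - i, b, j, je - j);
            if (cmp != 0) return cmp < 0;
            i = ie;
            j = je;
        } else {
            if (a[i] != b[j]) {
                return static_cast<unsigned char>(a[i]) < static_cast<unsigned char>(b[j]);
            }
            ++i;
            ++j;
        }
    }
    // One string is a prefix of the other: the shorter remainder sorts first.
    return (a.size() - i) < (b.size() - j);
}

// Sorts file names in natural order using the comparison above.
void sort_naturally(std::vector<std::string>& names)
{
    std::sort(names.begin(), names.end(), natural_less);
}
```

Calling `sort_naturally(pathList)` after collecting the file names would then put "file-2.txt" before "file-10.txt". Like the tokenizing version, this is a sketch and ignores locale-specific collation and case-insensitive ordering.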

Finding adjacent chars in a string

How to find two adjacent characters in a string? My search for adjacent characters should only consider a set of characters defined by me.
I solved my problem using this function:
unsigned checkField(const std::string& myset, char mychar)
{
    // Returns 1 if mychar is one of the characters in myset, 0 otherwise.
    for (std::string::size_type counter = 0; counter < myset.length(); counter++)
        if (myset[counter] == mychar)
            return 1;
    return 0; /* NOT FOUND */
}
It may be useful to someone in the future.

If it's ok to use boost, and you don't need the ultimate in efficiency, then the easiest way may be to use a regular expression such as "([abcd])\\1". For details on matching strings with boost regexps, see the boost regex docs.
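If Boost is not available, the same pattern should also work with the standard library's std::regex (C++11), whose default ECMAScript grammar supports back-references. The sketch below is an assumption about how that could be wired up rather than a translation of any Boost code; find_doubled is a hypothetical name, and the set {a, b, c, d} is taken from the pattern above. Note that the pattern finds the same character twice in a row, not two different characters from the set.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the index of the first place where one of the
// characters a, b, c or d appears twice in a row, or std::string::npos.
std::string::size_type find_doubled(const std::string& text)
{
    static const std::regex doubled("([abcd])\\1");
    std::smatch match;
    if (std::regex_search(text, match, doubled)) {
        return static_cast<std::string::size_type>(match.position(0));
    }
    return std::string::npos;
}
```

For "xabbcy" this reports index 2, while "abcd" and "xxyy" produce no match.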

I imagine you are storing each part of your equation separately at some point? Eg. "55" "+" "hh" "+" "bc" ?
In this case would it not just be enough to check that the sizeof is 1, and send an error if not? Sorry if I am missing something! Otherwise regular expressions as Edward suggested (+1) seem most appropriate.
Edit: also, of course it would be easy to check that the chars are/are not the ones you specified at the top.

Some quick code:
#include <cstdio>
#include <cstdlib>
#include <cstring>
const char* find_adjacent_string(const char* str, const char* set)
{
const char* loc = NULL;
if(set != NULL)
{
int size = strlen(set);
char adj[3];
adj[2] = '\0';
for(int i = 0; i + 1 < size; i++)
{
adj[0] = set[i];
adj[1] = set[i + 1];
loc = strstr(str, adj);
if(loc != NULL)
{
break;
}
}
}
return loc;
}
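int main()
{
    const char* myset = "pl";
    const char* mystr = "apple";
    // Check for NULL before doing pointer arithmetic on the result.
    const char* found = find_adjacent_string(mystr, myset);
    if (found != NULL)
        printf("found at %d\n", static_cast<int>(found - mystr));
    else
        printf("not found\n");
    return 0;
}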
