I have the following code (C++0x):
const set<char> s_special_characters = { '(', ')', '{', '}', ':' };
void nectar_loader::tokenize( string &line, const set<char> &special_characters )
{
auto it = line.begin();
const auto not_found = special_characters.end();
// first character special case
if( it != line.end() && special_characters.find( *it ) != not_found )
it = line.insert( it+1, ' ' ) + 1;
while( it != line.end() )
{
// check if we're dealing with a special character
if( special_characters.find(*it) != not_found ) // <----------
{
// ensure a space before
if( *(it-1) != ' ' )
it = line.insert( it, ' ' ) + 1;
// ensure a space after
if( (it+1) != line.end() && *(it+1) != ' ' )
it = line.insert( it+1, ' ');
else
line.append(" ");
}
++it;
}
}
This results in a segfault at the indicated line, with this gdb backtrace:
#0 0x000000000040f043 in std::less<char>::operator() (this=0x622a40, __x=#0x623610, __y=#0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_function.h:230
#1 0x000000000040efa6 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::_M_lower_bound (this=0x622a40, __x=0x6235f0, __y=0x622a48, __k=#0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1020
#2 0x000000000040e840 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::find (this=0x622a40, __k=#0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1532
#3 0x000000000040e4fd in std::set<char, std::less<char>, std::allocator<char> >::find (this=0x622a40, __x=#0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_set.h:589
#4 0x000000000040de51 in ambrosia::nectar_loader::tokenize (this=0x7fffffffe3b0, line=..., special_characters=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:146
#5 0x000000000040dbf5 in ambrosia::nectar_loader::fetch_line (this=0x7fffffffe3b0)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:112
#6 0x000000000040dd11 in ambrosia::nectar_loader::fetch_token (this=0x7fffffffe3b0, token=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:121
#7 0x000000000040d9c4 in ambrosia::nectar_loader::next_token (this=0x7fffffffe3b0)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:72
#8 0x000000000040e472 in ambrosia::nectar_loader::extract_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (this=0x7fffffffe3b0, it=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:43
#9 0x000000000040d46d in ambrosia::drink_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (filename=..., it=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar.cpp:75
#10 0x00000000004072ae in ambrosia::reader::event (this=0x623770)
I'm at a loss and have no idea what I'm doing wrong. Any help is much appreciated.
EDIT: the string at the moment of the crash is
sub Ambrosia : lib libAmbrosia
UPDATE:
I replaced the above function following suggestions in comments/answers. Below is the result.
const string tokenize( const string &line, const set<char> &special_characters )
{
const auto not_found = special_characters.end();
const auto end = line.end();
string result;
if( !line.empty() )
{
// copy first character
result += line[0];
char previous = line[0];
for( auto it = line.begin()+1; it != end; ++it )
{
const char current = *it;
if( special_characters.find(previous) != not_found )
result += ' ';
result += current;
previous = current;
}
}
return result;
}
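Another guess is that line.append(" ") sometimes invalidates it, depending on the line's capacity: if the append forces a reallocation, every iterator into line becomes invalid, and the ++it that follows uses a dangling iterator. Note that the else branch also runs when the next character is already a space (as after the ':' in "sub Ambrosia : lib libAmbrosia"), so append is called even when no extra space is needed.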
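If you want to keep modifying the line in place, one way to avoid iterators that insert() or append() may invalidate is to walk the line by index: a position stays meaningful after an insertion, whereas an iterator may not. The following is a minimal sketch, assuming the goal is one space on each side of every special character (and no space added before one at the very start of the line); tokenize_in_place is a hypothetical name, not part of the original code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical index-based variant of tokenize(): indices remain valid
// across insert() even when the string reallocates, unlike iterators.
void tokenize_in_place(std::string& line, const std::set<char>& special)
{
    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        if (special.count(line[i]) == 0)
            continue;
        // ensure a space before (except at the very start of the line)
        if (i > 0 && line[i - 1] != ' ')
        {
            line.insert(i, 1, ' ');
            ++i; // i points at the special character again
        }
        // ensure a space after, unless one is already there
        if (i + 1 == line.size() || line[i + 1] != ' ')
            line.insert(i + 1, 1, ' ');
        ++i; // step onto the space that now follows the special character
    }
}
```

For example, "a(b)" becomes "a ( b ) ", while "sub Ambrosia : lib libAmbrosia" is left unchanged because the ':' is already surrounded by spaces.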
You don't check that it != line.end() before the first time you dereference it.
I could not spot the error; I would suggest stepping through slowly with the debugger, since you have already identified the line where it crashes.
I'll just say that, in general, modifying what you are iterating over is extremely prone to failure.
I'd recommend using Boost Tokenizer, more precisely boost::token_iterator combined with boost::char_separator (its kept_delims argument returns the delimiters as tokens of their own).
You could then simply build a new string from the original one and return it from the function. The time saved by not shifting characters on every insert should more than cover the extra memory allocation.
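A minimal sketch of the copy-based approach, assuming the same goal of one space on each side of every special character; tokenize_copy is a hypothetical name. Because the input string is only read, there is never an iterator into a string that is being modified.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical copy-based tokenizer: reads `line` without modifying it and
// builds a result in which every special character has a space on each side.
std::string tokenize_copy(const std::string& line, const std::set<char>& special)
{
    std::string result;
    result.reserve(line.size() * 2); // rough upper bound, avoids most reallocations
    bool added_space = false;        // true if result's last ' ' was inserted by us
    for (char c : line)
    {
        if (special.count(c) != 0)
        {
            // space before, unless at the start or one is already there
            if (!result.empty() && result.back() != ' ')
                result += ' ';
            result += c;
            result += ' '; // always follow a special character with a space
            added_space = true;
        }
        else
        {
            // skip an input space that would duplicate the one we just added
            if (!(c == ' ' && added_space))
                result += c;
            added_space = false;
        }
    }
    return result;
}
```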
There are already many string-related questions, yet some doubt remains, since every string is different and every requirement is different too.
I have a single string in this form:
Random1A:Random1B::String1 Random2A:Random2B::String2 ... RandomNA:RandomNB::StringN
And I want to get back a single string in this form:
String1 String2 ... StringN
In short, the input string looks like A:B::Val1 P:Q::Val2, and the output string should look like "Val1 Val2".
PS: Randoms and Strings are small (variable) length alphanumeric strings.
std::string GetCoreStr ( std::string inputStr, int & vSeqLen )
{
std::string seqStr;
std::string strNew;
seqStr = inputStr;
size_t firstFind = 0;
while ( !seqStr.empty() )
{
firstFind = inputStr.find("::");
size_t lastFind = (inputStr.find(" ") < inputStr.length())? inputStr.find(" ") : inputStr.length();
strNew += inputStr.substr(firstFind+2, lastFind-firstFind-1);
seqStr = inputStr.erase( 0, lastFind+1 );
}
vSeqLen = strNew.length();
return strNew;
}
I want to get back a single string String1 String2 ... StringN.
My code works and I get the result I want, but it is not in an optimal form. I would like help improving the code quality.
I ended up doing it the C-way.
std::string GetCoreStr ( const std::string & inputStr )
{
std::string strNew;
for ( int i = 0; i < inputStr.length(); ++i )
{
if ( inputStr[i] == ':' && inputStr[i + 1] == ':' )
{
i += 2;
while ( ( inputStr[i] != ' ' && inputStr[i] != '\0' ) )
{
strNew += inputStr[i++];
}
if ( inputStr[i] == ' ' )
{
strNew += ' ';
}
}
}
return strNew;
}
I am having trouble deciding on how to adjust the offset. [...]
std::string getCoreString(std::string const& input)
{
std::string result;
// optional: avoid reallocations:
result.reserve(input.length());
// (we likely reserved too much – if you have some reliable hint how many
// input parts we have, you might subtract appropriate number)
size_t end = 0;
do
{
size_t begin = input.find("::", end);
// added check: double colon not found at all...
if(begin == std::string::npos)
break;
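// single character variant is more efficient, if you need to find just such one:
size_t space = input.find(' ', begin);
// check npos explicitly: npos + 1 would wrap around to 0 and break the loop
end = (space == std::string::npos) ? input.length() : space + 1;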
result.append(input.begin() + begin + 2, input.begin() + end);
}
while(end < input.length());
return result;
}
Side note: you do not need the additional 'length' output parameter; it's redundant, since the returned string's length() gives the same value.
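If the parts are always separated by whitespace (the PS says they are short alphanumeric strings), a string stream can do the word splitting. This is a sketch under that assumption; getCoreStringStream is a hypothetical name. Unlike the find-based version, it normalizes the separators in the output to single spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sketch: split the input on whitespace, keep whatever follows the first "::"
// in each word, and join the kept parts with single spaces.
std::string getCoreStringStream(const std::string& input)
{
    std::istringstream words(input);
    std::string word, result;
    while (words >> word)
    {
        std::string::size_type pos = word.find("::");
        if (pos == std::string::npos)
            continue; // no "::" in this word, so it contributes nothing
        if (!result.empty())
            result += ' ';
        result += word.substr(pos + 2);
    }
    return result;
}
```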
I have this predicate function that compares two strings alphabetically. The strings being compared are human names, so they are of unequal length; to get round this, the shorter string is padded with whitespace.
Problem:
I've tracked the bug down to the string padding...which appears to randomly break the string iterator:
ls += std::string( maxlen - ls.size(), ' ' ) ;
rs += std::string( maxlen - rs.size(), ' ' ) ;
After successful padding, the two string iterators both point to their respective strings, as they should.
Further down the list of names being compared, however, riter points to 'ar5' instead of "Aaron Tasso", which I'm guessing is the cause of the error.
I've tried removing the name "Abraham Possinger" from the input but it throws the same error further down the list on another name.
Input:
Aaron Tasso
Aaron Tier
Abbey Wren
Abbie Rubloff
Abby Tomopoulos
Abdul Veith
Abe Lamkin
Abel Kepley
Abigail Stocker
Abraham Possinger
bool
alphanum_string_compare( const std::string& s, const std::string& s2 )
#pragma region CODE
{
// copy strings: ...for padding to equal length..if required?
std::string ls = s ;
std::string rs = s2 ;
// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;
// find length of longest string
std::string::size_type maxlen = 0 ;
maxlen = std::max( ls.size(), rs.size() ) ;
// pad shorter of the 2 strings by attempting to pad both ;)
// ..only shorter string will be padded!..as the other string == maxlen
// ..possibly more efficient than finding and padding ONLY the shorter string
ls += std::string( maxlen - ls.size(), ' ' ) ;
rs += std::string( maxlen - rs.size(), ' ' ) ;
// init alphabet order map
static std::map<char, int> m = alphabet() ;
//std::map<char, int> m = alphabet();
while( liter != ls.end() && riter != rs.end() )
{
if ( m[ *liter ] < m[ *riter ] ) return true ;
if ( m[ *liter ] > m[ *riter ] ) return false ;
// move to next char
++liter ;
++riter ;
}
return false ;
}
#pragma endregion CODE
The problem is that you pad the strings after you assign the iterators.
// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;
ls += std::string( maxlen - ls.size(), ' ' ) ; // <-- potentially invalidates liter
rs += std::string( maxlen - rs.size(), ' ' ) ; // <-- potentially invalidates riter
while( liter != ls.end() && riter != rs.end() ) { // <-- using invalid iterators
if ( m[ *liter ] < m[ *riter ] ) return true ;
if ( m[ *liter ] > m[ *riter ] ) return false ;
// move to next char
++liter ;
++riter ;
}
return false ;
}
The padding is unnecessary if, after the loop, you check which iterator reached its end and return true or false accordingly.
The padding may invalidate the iterator when the underlying storage is reallocated on expansion.
You could fix this by retrieving the iterators after the padding, but the padding is unnecessary.
You just need to check where the iterators ended up - s is less than s2 if its iterator reached the end but the other one didn't.
bool
alphanum_string_compare( const std::string& s, const std::string& s2 )
{
static std::map<char, int> m = alphabet();
std::string::const_iterator left = s.begin();
std::string::const_iterator right = s2.begin();
while (left != s.end() && right != s2.end())
{
if (m[*left] < m[*right])
return true;
if (m[*left] > m[*right])
return false;
++left;
++right;
}
return left == s.end() && right != s2.end();
}
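The loop above, including the final "a proper prefix sorts first" test, is what std::lexicographical_compare does with a custom comparator. A sketch, assuming a hypothetical char_rank() in place of the poster's alphabet() map (here simply case-insensitive ordering via std::tolower); alphanum_less is also a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the alphabet() map: case-insensitive rank.
int char_rank(char c)
{
    return std::tolower(static_cast<unsigned char>(c));
}

// True if s sorts before s2; a proper prefix sorts first, exactly like
// the hand-written loop's final check.
bool alphanum_less(const std::string& s, const std::string& s2)
{
    return std::lexicographical_compare(
        s.begin(), s.end(), s2.begin(), s2.end(),
        [](char a, char b) { return char_rank(a) < char_rank(b); });
}
```

No copies are made and nothing is padded, so there are no iterators to invalidate.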
std::string is a dynamic object: when you modify it, its internal memory buffer may well be reallocated. At that point your "old" iterators point to memory that has been returned to the heap (deleted). It's the same with most containers, for example std::vector: you can copy an iterator to an arbitrary element, but once you add anything to the vector, that iterator may no longer be valid. Most modifying operations can invalidate iterators into such objects.
I don't think it's necessary to pad with whitespace if you just want to see which name comes first in alphabetical order. One idea: compare the characters at each position, and as soon as one is smaller than the other, return the string it belongs to. Example:
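#include <iostream>
#include <string>
using namespace std;

// Returns whichever of s1 and s2 comes first in (case-sensitive) char order
string StrCompare(const string& s1, const string& s2)
{
    string::size_type len = (s1.length() < s2.length() ? s1.length() : s2.length());
    for (string::size_type i = 0; i != len; ++i) {
        if (s1[i] < s2[i])
            return s1;
        else if (s2[i] < s1[i])
            return s2;
        // equal characters: keep going
    }
    // all compared characters are equal: the shorter string comes first
    return s1.length() <= s2.length() ? s1 : s2;
}

int main()
{
    string str = StrCompare("Aaron Tasso", "Aaron Tier");
    cout << str;
}
Output: Aaron Tasso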
In C/C++, how can I extract the substrings AAA and BBB from c:\Blabla - dsf\blup\AAA - BBB\blabla.bmp?
That is, extract the parts before and after the '-' in the last folder of the path.
Thanks in advance.
(PS: if possible, without .NET or similar frameworks, in which I could easily get lost)
#include <iostream>
#include <sstream> // stringstream
#include <string>
using namespace std;
#include <windows.h>
#include <Shlwapi.h> // link with shlwapi.lib
int main()
{
char buffer_1[ ] = "c:\\Blabla - dsf\\blup\\AAA - BBB\\blabla.bmp";
char *lpStr1 = buffer_1;
// Remove the file name from the string
PathRemoveFileSpecA(lpStr1); // char-based (ANSI) version, matching the char buffer
string s(lpStr1);
// Find the last directory name
stringstream ss(s.substr(s.rfind('\\') + 1));
// Split the last directory name into tokens separated by '-'
while (getline(ss, s, '-'))
cout << s << endl;
}
Explanation in comments.
This doesn't trim the spaces around each token in the output ("AAA " and " BBB"); if you want that too, trim each token before printing it.
This can relatively easily be done with regular expressions:
std::regex if you have C++11; boost::regex if you don't:
static std::regex const regex( R"(.*\\(\w+)\s*-\s*(\w+)\\[^\\]*$)" );
std::smatch results;
if ( std::regex_match( path, results, regex ) ) {
std::string firstMatch = results[1];
std::string secondMatch = results[2];
// ...
}
Also, you definitely should have the functions split and
trim in your toolkit:
template <std::ctype_base::mask test>
class IsNot
{
std::locale ensureLifetime;
std::ctype<char> const* ctype; // Pointer to allow assignment
public:
IsNot( std::locale const& loc = std::locale() )
: ensureLifetime( loc )
, ctype( &std::use_facet<std::ctype<char>>( loc ) )
{
}
bool operator()( char ch ) const
{
return !ctype->is( test, ch );
}
};
typedef IsNot<std::ctype_base::space> IsNotSpace;
std::vector<std::string>
split( std::string const& original, char separator )
{
std::vector<std::string> results;
std::string::const_iterator current = original.begin();
std::string::const_iterator end = original.end();
std::string::const_iterator next = std::find( current, end, separator );
while ( next != end ) {
results.push_back( std::string( current, next ) );
current = next + 1;
next = std::find( current, end, separator );
}
results.push_back( std::string( current, next ) );
return results;
}
std::string
trim( std::string const& original )
{
std::string::const_iterator end
= std::find_if( original.rbegin(), original.rend(), IsNotSpace() ).base();
std::string::const_iterator begin
= std::find_if( original.begin(), end, IsNotSpace() );
return std::string( begin, end );
}
(These are just the ones you need here. You'll obviously want
the full complement of IsXxx and IsNotXxx predicates, a split
which can split according to a regular expression, a trim which
can be passed a predicate object specifying what is to be
trimmed, etc.)
Anyway, it should be obvious how to apply split and trim
to get what you want.
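For the question's path, applying them might look like the sketch below. To keep it self-contained it uses simplified stand-ins for split() and trim() (this trim() uses std::isspace rather than the IsNotSpace predicate), and lastFolderParts is a hypothetical name; it assumes the path ends with a file name, so the folder of interest is the second-to-last component.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Simplified stand-in for split(): every separator produces a boundary.
std::vector<std::string> split(const std::string& s, char sep)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0, pos;
    while ((pos = s.find(sep, start)) != std::string::npos)
    {
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    parts.push_back(s.substr(start));
    return parts;
}

// Simplified stand-in for trim(): strips leading and trailing whitespace.
std::string trim(const std::string& s)
{
    std::string::size_type b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b])))
        ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1])))
        --e;
    return s.substr(b, e - b);
}

// Splits the last directory name on '-' and trims each piece.
std::vector<std::string> lastFolderParts(const std::string& path)
{
    std::vector<std::string> components = split(path, '\\');
    std::vector<std::string> result;
    if (components.size() < 2)
        return result; // no directory component at all
    for (const std::string& piece : split(components[components.size() - 2], '-'))
        result.push_back(trim(piece));
    return result;
}
```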
This does all the work and validation in plain C:
#include <stdlib.h>
#include <string.h>
int FindParts(const char* source, char** firstOut, char** secondOut)
{
const char* last = NULL;
const char* previous = NULL;
const char* middle = NULL;
const char* middle1 = NULL;
const char* middle2 = NULL;
char* first;
char* second;
last = strrchr(source, '\\');
if (!last || (last == source))
return -1;
--last;
if (last == source)
return -1;
previous = last;
for (; (previous != source) && (*previous != '\\'); --previous);
if (*previous == '\\') // don't skip the first character when no earlier '\\' exists
++previous;
{
middle = strchr(previous, '-');
if (!middle || (middle > last))
return -1;
middle1 = middle-1;
middle2 = middle+1;
}
// now skip spaces
for (; (previous != middle1) && (*previous == ' '); ++previous);
if (previous == middle1)
return -1;
for (; (middle1 != previous) && (*middle1 == ' '); --middle1);
if (middle1 == previous)
return -1;
for (; (middle2 != last) && (*middle2 == ' '); ++middle2);
if (middle2 == last)
return -1;
for (; (middle2 != last) && (*last == ' '); --last);
if (middle2 == last)
return -1;
first = (char*)malloc(middle1-previous+1 + 1);
second = (char*)malloc(last-middle2+1 + 1);
if (!first || !second)
{
free(first);
free(second);
return -1;
}
strncpy(first, previous, middle1-previous+1);
first[middle1-previous+1] = '\0';
strncpy(second, middle2, last-middle2+1);
second[last-middle2+1] = '\0';
*firstOut = first;
*secondOut = second;
return 1;
}
Here is a plain C++ solution (no Boost, no C++11), although the regex solution by James Kanze (https://stackoverflow.com/a/16605408/1032277) is still the most generic and elegant:
inline void Trim(std::string& source)
{
size_t position = source.find_first_not_of(" ");
if (std::string::npos != position)
source = source.substr(position);
position = source.find_last_not_of(" ");
if (std::string::npos != position)
source = source.substr(0, position+1);
}
inline bool FindParts(const std::string& source, std::string& first, std::string& second)
{
size_t last = source.find_last_of('\\');
if ((std::string::npos == last) || !last)
return false;
size_t previous = source.find_last_of('\\', last-1);
if (std::string::npos == previous)
previous = -1;
size_t middle = source.find_first_of('-',1+previous);
if ((std::string::npos == middle) || (middle > last))
return false;
first = source.substr(1+previous, (middle-1)-(1+previous)+1);
second = source.substr(1+middle, (last-1)-(1+middle)+1);
Trim(first);
Trim(second);
return true;
}
Use std::string::rfind (size_t rfind(char c, size_t pos = npos) const):
Find the last '\' using rfind (pos1).
Find the '\' before it using rfind starting at pos1 - 1 (pos2).
Get the substring between pos2 and pos1 (the last folder name) using substr.
Find the '-' in it (pos3).
Extract 2 substrings: between pos2 and pos3, and between pos3 and pos1.
Remove the spaces around each substring.
The resulting substrings will be AAA and BBB.
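A sketch of those steps, assuming backslash separators and a last folder containing a '-'; extractParts and stripSpaces are hypothetical names. Here pos3 is searched within the extracted folder name, so it is relative to that substring rather than to the whole path.

```cpp
#include <cassert>
#include <string>

// Removes leading and trailing spaces from a substring.
static std::string stripSpaces(const std::string& s)
{
    std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos)
        return "";
    std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Follows the rfind-based steps; returns false if the path doesn't have
// the expected "...\AAA - BBB\file" shape.
bool extractParts(const std::string& path, std::string& first, std::string& second)
{
    std::string::size_type pos1 = path.rfind('\\');           // last '\'
    if (pos1 == std::string::npos || pos1 == 0)
        return false;
    std::string::size_type pos2 = path.rfind('\\', pos1 - 1); // previous '\'
    if (pos2 == std::string::npos)
        return false;
    std::string folder = path.substr(pos2 + 1, pos1 - pos2 - 1);
    std::string::size_type pos3 = folder.find('-');           // relative to folder
    if (pos3 == std::string::npos)
        return false;
    first = stripSpaces(folder.substr(0, pos3));
    second = stripSpaces(folder.substr(pos3 + 1));
    return true;
}
```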
I wrote the code below to check whether a certain string exists in a text or not. The problem is that match() always returns false, even when the pattern exists in the text.
int main(){
char *text="hello my name is plapla";
char *patt="my";
cout<<match(patt,text);
system("pause");
return 0;
}
bool match(char* patt,char* text){
int textLoc=0, pattLoc=0, textStart=0;
while(textLoc <= (int) strlen(text) && pattLoc <= (int)strlen(patt)){
if( *(patt+pattLoc) == *(text+textLoc) ){
textLoc= textLoc+1;
pattLoc= pattLoc+1;
}
else{
textStart=textStart+1;
textLoc=textStart;
pattLoc=0;
}
}
if(pattLoc > (int) strlen(patt))
return true;
else return false;
}
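Try pattLoc < (int)strlen(patt) in your while loop.
The loop will then stop when pattLoc == 2, so you avoid comparing the '\0' of "my" with the ' ' of "hello my name is plapla", which would reset pattLoc to 0 and make match() return false. The final test then has to become pattLoc == (int)strlen(patt), since pattLoc can no longer exceed the pattern length.
Or better, use std::string and its substr() member.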
The obvious solution is:
bool
match( std::string const& pattern, std::string const& text )
{
return std::search( text.begin(), text.end(),
pattern.begin(), pattern.end() )
!= text.end();
}
This is idiomatic C++, and the way I would expect any C++ programmer to
write it, at least in a professional environment.
If the goal is to learn how to write such a function, then of course,
the above isn't much of a solution. The solution should then be more
divide and conquer; there's much too much in match for you to put it
all in one function. I'd recommend something like:
bool
startsWith( std::string::const_iterator begin,
std::string::const_iterator end,
std::string const& pattern )
{
return end - begin >= static_cast<std::string::difference_type>( pattern.size() )
&& std::equal( pattern.begin(), pattern.end(), begin );
}
bool
match( std::string const& pattern, std::string const& text )
{
std::string::const_iterator current = text.begin();
while ( current != text.end()
&& !startsWith( current, text.end(), pattern ) ) {
++ current;
}
return current != text.end();
}
This can obviously be improved; for example, there's no point in
continuing in the while loop when the length of the remaining text is
less than the length of the pattern.
And if your professor insists on your using char const* (if he insists
on char*, then he's totally incompetent, and should be fired), this
can easily be rewritten to do so: just replace all calls to begin with
the pointer, and all calls to end with pointer + strlen(pointer).
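A sketch of that rewrite for char const*, applying the replacement rule literally (begin becomes the pointer, end becomes pointer + strlen(pointer)); the four-argument startsWith signature is an assumption, since the pattern now needs an end pointer as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// True if [begin, end) starts with [pattern, patternEnd).
bool startsWith(char const* begin, char const* end,
                char const* pattern, char const* patternEnd)
{
    return end - begin >= patternEnd - pattern
        && std::equal(pattern, patternEnd, begin);
}

// True if pattern occurs somewhere in text.
bool match(char const* pattern, char const* text)
{
    char const* patternEnd = pattern + std::strlen(pattern);
    char const* textEnd = text + std::strlen(text);
    char const* current = text;
    while (current != textEnd
           && !startsWith(current, textEnd, pattern, patternEnd)) {
        ++current;
    }
    return current != textEnd;
}
```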
I have solved the problem:
while(textLoc <= (int) strlen(text) && pattLoc <= (int)strlen(patt))
should be:
while(textLoc < (int) strlen(text) && pattLoc < (int)strlen(patt))
and
if(pattLoc > (int) strlen(patt))
to
if(pattLoc >= (int) strlen(patt))
Well, is there a built-in way to split a string? By string I mean std::string.
Here's a perl-style split function I use:
void split(const string& str, const string& delimiters , vector<string>& tokens)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
There's no built-in way to split a string in C++, but boost provides the string algo library to do all sort of string manipulation, including string splitting.
Yup, stringstream.
std::istringstream oss(std::string("This is a test string"));
std::string word;
while(oss >> word) {
std::cout << "[" << word << "] ";
}
STL strings
You can use string iterators to do your dirty work.
const std::string str = "hello world"; // const, so begin()/end() both yield const_iterator
std::string::const_iterator pos = std::find(str.begin(), str.end(), ' '); // Split at ' '.
std::string left(str.begin(), pos);
std::string right(pos + 1, str.end());
// Echoes "hello|world".
std::cout << left << "|" << right << std::endl;
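// Splits at the first separator found; Part1 and Part2 receive the two halves
void split(const string& StringToSplit, const string& Separators, string& Part1, string& Part2)
{
    size_t EndPart1 = StringToSplit.find_first_of(Separators);
    Part1 = StringToSplit.substr(0, EndPart1);
    Part2 = (EndPart1 == string::npos) ? string() : StringToSplit.substr(EndPart1 + 1);
}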
The answer is no. You have to break them up using one of the library functions.
Something I use:
std::vector<std::string> parse(std::string l, char delim)
{
std::replace(l.begin(), l.end(), delim, ' ');
std::istringstream stm(l);
std::vector<std::string> tokens;
for (;;) {
std::string word;
if (!(stm >> word)) break;
tokens.push_back(word);
}
return tokens;
}
You can also take a look at the basic_streambuf<T>::underflow() method and write a filter.
What the heck... Here's my version...
Note: splitting ("XZaaaXZ", "XZ") gives you 3 strings; 2 of them are empty and won't be added to theStringVector if theIncludeEmptyStrings is false.
The delimiter is not treated as a set of characters; it must match that exact string.
inline void
StringSplit( vector<string> * theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
assert( theStringVector != NULL ); // assert() from <cassert>
assert( theDelimiter.size() > 0 );
size_t start = 0, end = 0, length = 0;
while ( end != string::npos )
{
end = theString.find( theDelimiter, start );
// If at end, use length=maxLength. Else use length=end-start.
length = (end == string::npos) ? string::npos : end - start;
if ( theIncludeEmptyStrings
|| ( ( length > 0 ) /* At end, end == length == string::npos */
&& ( start < theString.size() ) ) )
theStringVector -> push_back( theString.substr( start, length ) );
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size() );
}
}
inline vector<string>
StringSplit( const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
vector<string> v;
StringSplit( & v, theString, theDelimiter, theIncludeEmptyStrings );
return v;
}
There is no standard way of doing this.
I prefer boost::tokenizer; it's header-only and easy to use.
C strings
Simply insert a \0 where you wish to split. This is about as built-in as you can get with standard C functions.
This function splits on the first occurrence of a char separator, returning a pointer to the second part (or NULL if the separator isn't found).
char *split_string(char *str, char separator) {
char *second = strchr(str, separator);
if(second == NULL)
return NULL;
*second = '\0';
++second;
return second;
}
A fairly simple method is to copy the std::string into a writable char buffer and then use strtok() to tokenize it. (You can't pass c_str() directly: it returns a const char*, and strtok() writes '\0' characters into the string it tokenizes.) Not quite as elegant as some of the other solutions listed here, but it's easy and works.
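A sketch of that approach, copying into a std::vector<char> so that strtok() gets a writable, NUL-terminated buffer; splitWithStrtok is a hypothetical name. Note that strtok() keeps hidden state between calls, so it is not safe to use from several threads at once or in nested tokenizing loops.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a writable copy of str; strtok() must not be given str.c_str().
std::vector<std::string> splitWithStrtok(const std::string& str, const char* delimiters)
{
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0'); // strtok needs a NUL-terminated string
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters); tok != nullptr;
         tok = std::strtok(nullptr, delimiters))
    {
        tokens.push_back(tok); // runs of delimiters produce no empty tokens
    }
    return tokens;
}
```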