Padding string with whitespace sometimes breaks string iterator

I have this predicate function that compares two strings alphabetically. The strings being compared are human names, so they are of unequal length; to get round this, the shorter string is padded with whitespace.
Problem:
I've tracked the bug down to the string padding...which appears to randomly break the string iterator:
ls += std::string( maxlen - ls.size(), ' ' ) ;
rs += std::string( maxlen - rs.size(), ' ' ) ;
Here is what the two string iterators look like after successful padding; both point to their respective strings, as they should:
And here are the same string iterators further down the list of names being compared; riter is now pointing to 'ar5', not "Aaron Tasso", which I'm guessing is the cause of the error:
I've tried removing the name "Abraham Possinger" from the input but it throws the same error further down the list on another name.
Input:
Aaron Tasso
Aaron Tier
Abbey Wren
Abbie Rubloff
Abby Tomopoulos
Abdul Veith
Abe Lamkin
Abel Kepley
Abigail Stocker
Abraham Possinger
bool
alphanum_string_compare( const std::string& s, const std::string& s2 )
#pragma region CODE
{
// copy strings: ...for padding to equal length..if required?
std::string ls = s ;
std::string rs = s2 ;
// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;
// find length of longest string
std::string::size_type maxlen = 0 ;
maxlen = std::max( ls.size(), rs.size() ) ;
// pad shorter of the 2 strings by attempting to pad both ;)
// ..only shorter string will be padded!..as the other string == maxlen
// ..possibly more efficient than finding and padding ONLY the shorter string
ls += std::string( maxlen - ls.size(), ' ' ) ;
rs += std::string( maxlen - rs.size(), ' ' ) ;
// init alphabet order map
static std::map<char, int> m = alphabet() ;
//std::map<char, int> m = alphabet();
while( liter != ls.end() && riter != rs.end() )
{
if ( m[ *liter ] < m[ *riter ] ) return true ;
if ( m[ *liter ] > m[ *riter ] ) return false ;
// move to next char
++liter ;
++riter ;
}
return false ;
}
#pragma endregion CODE

The problem is that you pad the strings after you assign the iterators.
// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;
ls += std::string( maxlen - ls.size(), ' ' ) ; // <-- potentially invalidates iterator
rs += std::string( maxlen - rs.size(), ' ' ) ; // <-- potentially invalidates iterator
while( liter != ls.end() && riter != rs.end() ) { // <-- using invalid iterator
if ( m[ *liter ] < m[ *riter ] ) return true ;
if ( m[ *liter ] > m[ *riter ] ) return false ;
// move to next char
++liter ;
++riter ;
}
return false ;
}
Your padding is unneeded: after the loop has ended, check which iterator reached the end of its string and return true or false accordingly.

The padding may invalidate the iterator when the underlying storage is reallocated on expansion.
You could fix this by retrieving the iterators after the padding, but the padding is unnecessary.
You just need to check where the iterators ended up - s is less than s2 if its iterator reached the end but the other one didn't.
bool
alphanum_string_compare( const std::string& s, const std::string& s2 )
{
static std::map<char, int> m = alphabet();
std::string::const_iterator left = s.begin();
std::string::const_iterator right = s2.begin();
while (left != s.end() && right != s2.end())
{
if (m[*left] < m[*right])
return true;
if (m[*left] > m[*right])
return false;
++left;
++right;
}
return left == s.end() && right != s2.end();
}
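As a possible alternative to the hand-written loop, std::lexicographical_compare already treats a range that is a proper prefix of the other as the smaller one, so no padding is needed. The sketch below is illustrative only: rank_of and rank_less are hypothetical stand-ins for the question's alphabet() map (which is not shown), and here they simply compare case-insensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical replacement for the alphabet() lookup: a case-insensitive rank.
int rank_of(char c)
{
    return std::tolower(static_cast<unsigned char>(c));
}

// Character-level ordering used by the string comparison below.
bool rank_less(char a, char b)
{
    return rank_of(a) < rank_of(b);
}

// Returns true if a sorts before b; a shorter prefix sorts first.
bool name_less(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        rank_less);
}
```

A function with this signature can be passed directly to std::sort as the comparison predicate.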

std::string is a dynamic object: when you modify it, it is quite possible that its internal memory buffer will be reallocated. At that point your "old" iterators point to memory that has been returned to the heap (deleted). It's the same as with most containers, for example std::vector: you can keep an iterator to an arbitrary element, but once you add anything to the vector, your iterator may no longer be valid. Most "modifying" operations invalidate iterators to such objects.
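One way to stay safe, sketched here under the assumption that the position has to survive a modification, is to remember it as an index and rebuild the iterator afterwards. The helper name pad_then_read is hypothetical, and it assumes index is smaller than the original length of s.

```cpp
#include <string>

// Hypothetical helper: pads s to at least `width` characters and then reads
// the character at `index`. Precondition: index < s.size().
char pad_then_read(std::string s, std::string::size_type index,
                   std::string::size_type width)
{
    std::string::const_iterator it = s.begin() + index;
    // An index is just a number, so it survives any reallocation.
    std::string::size_type saved =
        static_cast<std::string::size_type>(it - s.begin());

    if (s.size() < width)
        s.append(width - s.size(), ' ');   // may reallocate: `it` is now unusable

    it = s.begin() + saved;                // rebuilt from the saved index
    return *it;
}
```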

I don't think it's necessary to pad with whitespace if you just want to see which name comes first in alphabetical order. One idea: compare the characters pairwise in a loop, and as soon as one character is smaller than the other, return the string it belongs to. Example:
string StrCompare(const string& s1, const string& s2)
{
string::size_type len = (s1.length() < s2.length() ? s1.length() : s2.length());
for (string::size_type i = 0; i != len; ++i) {
if (s1[i] < s2[i])
return s1;
else if (s2[i] < s1[i])
return s2;
}
// the common prefix is identical: the shorter string sorts first
return (s1.length() <= s2.length() ? s1 : s2);
}
int main()
{
string str = StrCompare("Aaron Tasso", "Aaron Tier");
cout << str;
}
Output: Aaron Tasso

Related

Extract information from std::string

Too many string related queries yet some doubt remains, for each string is different and each requirement is different too.
I have a single string in this form:
Random1A:Random1B::String1 Random2A:Random2B::String2 ... RandomNA:RandomNB::StringN
And I want to get back a single string in this form:
String1 String2 ... StringN
In short, the input string would look like A:B::Val1 P:Q::Val2, and the output string would look like "Val1 Val2".
PS: Randoms and Strings are small (variable) length alphanumeric strings.
std::string GetCoreStr ( std::string inputStr, int & vSeqLen )
{
std::string seqStr;
std::string strNew;
seqStr = inputStr;
size_t firstFind = 0;
while ( !seqStr.empty() )
{
firstFind = inputStr.find("::");
size_t lastFind = (inputStr.find(" ") < inputStr.length())? inputStr.find(" ") : inputStr.length();
strNew += inputStr.substr(firstFind+2, lastFind-firstFind-1);
seqStr = inputStr.erase( 0, lastFind+1 );
}
vSeqLen = strNew.length();
return strNew;
}
I want to get back a single string String1 String2 ... StringN.
My code works and I get result of my choice, but it is not an optimal form. I want help in improving the code quality.
I ended up doing it the C-way.
std::string GetCoreStr ( const std::string & inputStr )
{
std::string strNew;
for ( int i = 0; i < inputStr.length(); ++i )
{
if ( inputStr[i] == ':' && inputStr[i + 1] == ':' )
{
i += 2;
while ( ( inputStr[i] != ' ' && inputStr[i] != '\0' ) )
{
strNew += inputStr[i++];
}
if ( inputStr[i] == ' ' )
{
strNew += ' ';
}
}
}
return strNew;
}

I am having trouble deciding on how to adjust the offset. [...]
std::string getCoreString(std::string const& input)
{
std::string result;
// optional: avoid reallocations:
result.reserve(input.length());
// (we likely reserved too much – if you have some reliable hint how many
// input parts we have, you might subtract appropriate number)
size_t end = 0;
do
{
size_t begin = input.find("::", end);
// added check: double colon not found at all...
if(begin == std::string::npos)
break;
// single character variant is more efficient, if you need to find just such one:
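end = input.find(' ', begin);
// find returns npos if no space follows; npos + 1 would wrap around to 0
end = (end == std::string::npos) ? input.length() : end + 1;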
result.append(input.begin() + begin + 2, input.begin() + end);
}
while(end < input.length());
return result;
}
Side note: you do not need the additional 'length' output parameter; it's redundant, as the returned string contains the same value...
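If the offset arithmetic is the main source of trouble, another option is to let a stream do the whitespace splitting and only search for "::" inside each token. This is a rough sketch, not a drop-in replacement: the function name core_values is hypothetical, tokens without "::" are skipped, and values are joined with single spaces.

```cpp
#include <sstream>
#include <string>

// Hypothetical alternative: collects the part after "::" of every
// whitespace-separated token and joins the parts with single spaces.
std::string core_values(const std::string& input)
{
    std::istringstream tokens(input);
    std::string token;
    std::string result;
    while (tokens >> token) {
        std::string::size_type sep = token.find("::");
        if (sep == std::string::npos)
            continue;                      // token has no "::", ignore it
        if (!result.empty())
            result += ' ';
        // append everything after the double colon
        result.append(token, sep + 2, std::string::npos);
    }
    return result;
}
```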

Vector's of unsigned char iterators not working

I want to cut the CRLFs at the end of the vector, but my code is not working (on the first iteration of the while loop, equal is called and returns false). In debug mode "i" == 0 and "ptr" has the value "0x002e4cfe".
string testS = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
vector<uint8> _data; _data.clear();
_data.insert(_data.end(), testS.begin(), testS.end());
vector<uint8>::iterator i = _data.end();
uint32 bytesToCut = 0;
while(i != _data.begin()) {
if(equal(i - 1, i, "\r\n")) {
bytesToCut += 2;
--i; if(i == _data.begin()) return; else --i;
} else {
if(bytesToCut) _data.erase(_data.end() - bytesToCut, _data.end());
return;
}
}
Thanks a lot for your answers. But I need a version with iterators, because my code is used when parsing chunked HTTP transfer data that is written to a vector, and I need a function that takes a pointer to a vector and an iterator defining the position from which to remove CRLF backwards. I think all my problems come down to the iterators.

Your code is invalid at least due to setting incorrect range in algorithm std::equal
if(equal(i - 1, i, "\r\n")) {
In this expression you compare only one element of the vector, the one pointed to by iterator i - 1, with '\r'. You have to write something like
if(equal(i - 2, i, "\r\n")) {
If you need to remove pairs "\r\n" from the vector then I can suggest the following approach (I used my own variable names and included testing output):
std::string s = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
std::vector<unsigned char> v( s.begin(), s.end() );
std::cout << v.size() << std::endl;
auto last = v.end();
auto prev = v.end();
while ( prev != v.begin() && *--prev == '\n' && prev != v.begin() && *--prev == '\r' )
{
last = prev;
}
v.erase( last, v.end() );
std::cout << v.size() << std::endl;
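For the case described in the question, where the function receives the vector plus an iterator marking the position to trim backwards from, something along these lines might work. It is a sketch under assumptions: the names Bytes and erase_crlf_before are made up for illustration, and only complete "\r\n" pairs ending exactly at pos are removed.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical helper: erases every "\r\n" pair that ends exactly at `pos`,
// scanning backwards, and returns a valid iterator to the element that
// follows the erased run (or end()).
Bytes::iterator erase_crlf_before(Bytes& v, Bytes::iterator pos)
{
    static const unsigned char crlf[2] = { '\r', '\n' };
    Bytes::iterator first = pos;
    // The distance check guarantees that first - 2 is a valid iterator.
    while (first - v.begin() >= 2 && std::equal(first - 2, first, crlf))
        first -= 2;
    // erase invalidates iterators at and after `first`; use its return value.
    return v.erase(first, pos);
}
```

Using the iterator returned by erase, rather than any iterator obtained before the call, avoids the invalidation problem.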

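Instead of reinventing the wheel, you can use the existing standard library functionality, with something like:
std::string s;
// + 1 keeps the last wanted character; if nothing is wanted, npos + 1 == 0 yields an empty string
s = s.substr(0, s.find_last_not_of(" \r\n") + 1);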

If you need to just trim '\r' & '\n' from the end then simple substr will do:
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
size_t newLength = str.length();
while (newLength > 0 && (str[newLength - 1] == '\r' || str[newLength - 1] == '\n')) newLength--;
str = str.substr(0, newLength);
std::cout << str;
Don't sweat small stuff :)
Removing all '\r' and '\n' could be simple as (C++03):
#include <iostream>
#include <string>
#include <algorithm>
int main() {
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
str.erase(std::remove(str.begin(), str.end(), '\r'), str.end());
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
std::cout << str;
}
or:
bool isUnwantedChar(char c) {
return (c == '\r' || c == '\n');
}
int main() {
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
str.erase(std::remove_if(str.begin(), str.end(), isUnwantedChar), str.end());
std::cout << str;
}

First of all, your vector initialization is ... non-optimal. All you needed to do is:
string testS = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
vector<uint8> _data(testS.begin(), testS.end());
Second, if you wanted to remove the \r and \n characters, you could have done it in the string:
testS.erase(std::remove_if(testS.begin(), testS.end(), [](char c)
{
return c == '\r' || c == '\n';
}), testS.end());
If you wanted to do it in the vector, it is the same basic process:
_data.erase(std::remove_if(_data.begin(), _data.end(), [](uint8 ui)
{
return ui == static_cast<uint8>('\r') || ui == static_cast<uint8>('\n');
}), _data.end());
Your problem is likely due to the usage of invalidated iterators in your loop (that has several other logical issues, but since it shouldn't exist anyway, I won't touch on) that removes elements 1-by-1.
If you wanted to remove the items just from the end of the string/vector, it would be slightly different, but still the same basic pattern:
std::string::size_type start = testS.find_first_not_of("\r\n", 0); // finds the first non-\r\n character in the string
std::string::size_type end = testS.find_first_of("\r\n", start); // find the first \r\n character after real characters
// assuming neither start nor end are equal to std::string::npos - this should be checked
testS.erase(testS.begin() + end, testS.end()); // erase the `\r\n`s at the end of the string.
or alternatively (if \r\n can be in the middle of the string as well):
std::string::reverse_iterator rit = std::find_if_not(testS.rbegin(), testS.rend(), [](char c)
{
return c == '\r' || c == '\n';
});
testS.erase(rit.base(), testS.end());

Extract substrings of a filename

In C/C++, how can I extract from c:\Blabla - dsf\blup\AAA - BBB\blabla.bmp the substrings AAA and BBB ?
i.e. extract the parts before and after - in the last folder of a filename.
Thanks in advance.
(PS: if possible, with no Framework .net or such things, in which I could easily get lost)

#include <iostream>
#include <sstream>
#include <string>
using namespace std;
#include <windows.h>
#include <Shlwapi.h> // link with shlwapi.lib
int main()
{
char buffer_1[ ] = "c:\\Blabla - dsf\\blup\\AAA - BBB\\blabla.bmp";
char *lpStr1 = buffer_1;
// Remove the file name from the string
PathRemoveFileSpec(lpStr1);
string s(lpStr1);
// Find the last directory name
stringstream ss(s.substr(s.rfind('\\') + 1));
// Split the last directory name into tokens separated by '-'
while (getline(ss, s, '-'))
cout << s << endl;
}
Explanation in comments.
This doesn't trim the spaces around each token in the output; if you also want that, trim each token after splitting.

This can relatively easily be done with regular expressions:
std::regex if you have C++11; boost::regex if you don't:
static std::regex const regex( R"(.*\\(\w+)\s*-\s*(\w+)\\[^\\]*$)" );
std::smatch results;
if ( std::regex_match( path, results, regex ) ) {
std::string firstMatch = results[1];
std::string secondMatch = results[2];
// ...
}
Also, you definitely should have the functions split and trim in your toolkit:
template <std::ctype_base::mask test>
class IsNot
{
std::locale ensureLifetime;
std::ctype<char> const* ctype; // Pointer to allow assignment
public:
IsNot( std::locale const& loc = std::locale() )
: ensureLifetime( loc )
, ctype( &std::use_facet<std::ctype<char>>( loc ) )
{
}
bool operator()( char ch ) const
{
return !ctype->is( test, ch );
}
};
typedef IsNot<std::ctype_base::space> IsNotSpace;
std::vector<std::string>
split( std::string const& original, char separator )
{
std::vector<std::string> results;
std::string::const_iterator current = original.begin();
std::string::const_iterator end = original.end();
std::string::const_iterator next = std::find( current, end, separator );
while ( next != end ) {
results.push_back( std::string( current, next ) );
current = next + 1;
next = std::find( current, end, separator );
}
results.push_back( std::string( current, next ) );
return results;
}
std::string
trim( std::string const& original )
{
std::string::const_iterator end
= std::find_if( original.rbegin(), original.rend(), IsNotSpace() ).base();
std::string::const_iterator begin
= std::find_if( original.begin(), end, IsNotSpace() );
return std::string( begin, end );
}
(These are just the ones you need here. You'll obviously want
the full complement of IsXxx and IsNotXxx predicates, a split
which can split according to a regular expression, a trim which
can be passed a predicate object specifying what is to be
trimmed, etc.)
Anyway, the application of split and trim should be obvious
to give you what you want.

This does all the work and validations in plain C:
int FindParts(const char* source, char** firstOut, char** secondOut)
{
const char* last = NULL;
const char* previous = NULL;
const char* middle = NULL;
const char* middle1 = NULL;
const char* middle2 = NULL;
char* first;
char* second;
last = strrchr(source, '\\');
if (!last || (last == source))
return -1;
--last;
if (last == source)
return -1;
previous = last;
for (; (previous != source) && (*previous != '\\'); --previous);
++previous;
{
middle = strchr(previous, '-');
if (!middle || (middle > last))
return -1;
middle1 = middle-1;
middle2 = middle+1;
}
// now skip spaces
for (; (previous != middle1) && (*previous == ' '); ++previous);
if (previous == middle1)
return -1;
for (; (middle1 != previous) && (*middle1 == ' '); --middle1);
if (middle1 == previous)
return -1;
for (; (middle2 != last) && (*middle2 == ' '); ++middle2);
if (middle2 == last)
return -1;
for (; (middle2 != last) && (*last == ' '); --last);
if (middle2 == last)
return -1;
first = (char*)malloc(middle1-previous+1 + 1);
second = (char*)malloc(last-middle2+1 + 1);
if (!first || !second)
{
free(first);
free(second);
return -1;
}
strncpy(first, previous, middle1-previous+1);
first[middle1-previous+1] = '\0';
strncpy(second, middle2, last-middle2+1);
second[last-middle2+1] = '\0';
*firstOut = first;
*secondOut = second;
return 1;
}

Here is a plain C++ solution (no Boost, no C++11). That said, the regex solution by James Kanze (https://stackoverflow.com/a/16605408/1032277) is still the most generic and elegant:
inline void Trim(std::string& source)
{
size_t position = source.find_first_not_of(" ");
if (std::string::npos != position)
source = source.substr(position);
position = source.find_last_not_of(" ");
if (std::string::npos != position)
source = source.substr(0, position+1);
}
inline bool FindParts(const std::string& source, std::string& first, std::string& second)
{
size_t last = source.find_last_of('\\');
if ((std::string::npos == last) || !last)
return false;
size_t previous = source.find_last_of('\\', last-1);
if (std::string::npos == previous)
previous = -1; // no earlier backslash: 1+previous wraps to 0, the start of the string
size_t middle = source.find_first_of('-',1+previous);
if ((std::string::npos == middle) || (middle > last))
return false;
first = source.substr(1+previous, (middle-1)-(1+previous)+1);
second = source.substr(1+middle, (last-1)-(1+middle)+1);
Trim(first);
Trim(second);
return true;
}

Use std::string::rfind, i.e. rfind (char c, size_t pos = npos):
Find character '\' from the end using rfind (pos1)
Find next character '\' using rfind (pos2)
Get the substring between the positions pos2 and pos1. Use substring function for that.
Find character '-' (pos3)
Extract 2 substrings between pos3 and pos1, pos3 and pos2
Remove the spaces in the substrings.
Resulting substrings will be AAA and BBB
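The steps above can be sketched roughly as follows. The function names strip_spaces and last_folder_parts are hypothetical, only spaces are stripped, and the first '-' in the last folder name is taken as the separator.

```cpp
#include <string>

// Hypothetical helper: removes leading and trailing spaces.
std::string strip_spaces(const std::string& s)
{
    std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos)
        return std::string();
    std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Splits the last folder of `path` at its first '-'.
// Returns false if the path has no folder part or the folder has no '-'.
bool last_folder_parts(const std::string& path,
                       std::string& first, std::string& second)
{
    std::string::size_type pos1 = path.rfind('\\');            // before the file name
    if (pos1 == std::string::npos || pos1 == 0)
        return false;
    std::string::size_type pos2 = path.rfind('\\', pos1 - 1);  // before the last folder
    std::string::size_type start = (pos2 == std::string::npos) ? 0 : pos2 + 1;
    std::string folder = path.substr(start, pos1 - start);

    std::string::size_type pos3 = folder.find('-');
    if (pos3 == std::string::npos)
        return false;
    first = strip_spaces(folder.substr(0, pos3));
    second = strip_spaces(folder.substr(pos3 + 1));
    return true;
}
```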

Finding all wanted words in a string

I have a very long string, and I want to find and locate all occurrences of the wanted words. For example, I want to find the locations of all "apple"s in the string. Can you tell me how to do that?
Thanks

Apply repeatedly std::string::find if you are using C++ strings, or std::strstr if you are using C strings; in both cases, at each iteration start to search n characters after the last match, where n is the length of your word.
std::string str="one apple two apples three apples";
std::string search="apple";
for(std::string::size_type pos=0; pos<str.size(); pos+=search.size())
{
pos=str.find(search, pos);
if(pos==std::string::npos)
break;
std::cout<<"Match found at: "<<pos<<std::endl;
}

Use a loop which repeatedly calls std::string::find; on each iteration, you start finding beyond your last hit:
std::vector<std::string::size_type> indicesOf( const std::string &s,
const std::string &needle )
{
std::vector<std::string::size_type> indices;
std::string::size_type p = 0;
while ( p < s.size() ) {
std::string::size_type q = s.find( needle, p );
if ( q == std::string::npos ) {
break;
}
indices.push_back( q );
p = q + needle.size(); // change needle.size() to 1 for overlapping matches
}
return indices;
}
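To make the difference between overlapping and non-overlapping matches concrete, here is a small variation that takes the step as a flag. It is only a sketch: find_all and its overlapping parameter are hypothetical names, and an empty needle is rejected to avoid an endless loop.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: returns the start index of every match of `needle`.
// With overlapping == true the search resumes one character after each hit,
// otherwise it resumes right after the end of the hit.
std::vector<std::string::size_type> find_all(const std::string& s,
                                             const std::string& needle,
                                             bool overlapping)
{
    std::vector<std::string::size_type> hits;
    if (needle.empty())
        return hits;                       // an empty needle would never advance
    const std::string::size_type step = overlapping ? 1 : needle.size();
    for (std::string::size_type pos = s.find(needle);
         pos != std::string::npos;
         pos = s.find(needle, pos + step))
    {
        hits.push_back(pos);
    }
    return hits;
}
```

For example, searching "aaaa" for "aa" gives positions 0 and 2 without overlap, and 0, 1 and 2 with overlap.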

void findApples(const char* someString)
{
const char* loc = NULL;
while ((loc = strstr(someString, "apple")) != NULL) {
// do something
someString = loc + strlen("apple");
}
}

Preventing code duplication in and outside of a loop

I have a problem rewriting a loop:
else if( "d" == option || "debug" == option )
{
debug(debug::always) << "commandline::set_internal_option::setting debug options: "
<< value << ".\n";
string::size_type index = 0;
do
{
const string::size_type previous_index = index+1;
index=value.find( ',', index );
const string item = value.substr(previous_index, index);
debug::type item_enum;
if( !map_value(lib::debug_map, item, item_enum) )
throw lib::commandline_error( "Unknown debug type: " + item, argument_number );
debug(debug::always) << "commandline::set_internal_option::enabling " << item
<< " debug output.\n";
debug(debug::always) << "\n-->s_level=" << debug::s_level << "\n";
debug::s_level = static_cast<debug::type>(debug::s_level ^ item_enum);
debug(debug::always) << "\n-->s_level=" << debug::s_level << "\n";
} while( index != string::npos );
}
value is something like string("commandline,parser") and the problem is that in the first run, I need substr(previous_index, index), but in every subsequent iteration I need substr(previous_index+1, index) to skip over the comma. Is there some easy way I'm overlooking or will I have to repeat the call to find outside the loop for the initial iteration?

Since your goal is to prevent code duplication:
std::vector<std::string> v;
boost::split(v, value, [](char c) { return c == ','; });
If you want to create your own split function, you can do something like this:
template<typename PredicateT>
std::vector<std::string> Split(const std::string & in, PredicateT p)
{
std::vector<std::string> v;
auto b = in.begin();
auto e = b;
do {
e = std::find_if(b, in.end(), p);
v.emplace_back(b,e);
if (e != in.end())
b = e + 1; // step past the separator, but never past end()
} while (e != in.end());
return v;
}

Why not update previous_index after taking the substr?
string::size_type index = 0;
string::size_type previous_index = 0;
do {
index=value.find( ',', previous_index );
const string item = value.substr(previous_index, index - previous_index); // second argument is a length, not an end index
previous_index = index+1;
} while( index != string::npos );
Unchecked, but this should do the trick (with only one more word of memory).

Start at -1?
string::size_type index = -1;
do
{
const string::size_type previous_index = index + 1;
index=value.find(',', previous_index);
const string item = value.substr(previous_index, index - previous_index);
} while( index != string::npos );

A stupid (and somewhat unreadable) solution would be something like this:
string::size_type once = 0;
/* ... */
const string::size_type previous_index = index+1 + (once++ != 0); // or !!once

First, I think there's a small error:
In your code, the expression index=value.find( ',', index ); doesn't change the value of index if it already is the index of a comma character within the string (which is always the case except for the first loop iteration).
So you might want to replace while( index != string::npos ); with while( index++ != string::npos ); and previous_index = index+1 with previous_index = index.
This should also solve your original problem.
To clarify:
string::size_type index = 0;
do
{
const string::size_type previous_index = index;
index = value.find( ',', index );
const string item = value.substr(previous_index, index - previous_index);
} while( index++ != string::npos );
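If the index bookkeeping is the real nuisance, one more option is to let std::getline do the splitting, which removes the first-iteration special case entirely. This is a sketch with a hypothetical function name, split_on_comma; note that an empty input yields no items and a trailing comma does not produce a final empty item.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `value` at every comma using std::getline.
std::vector<std::string> split_on_comma(const std::string& value)
{
    std::vector<std::string> items;
    std::istringstream stream(value);
    std::string item;
    // getline consumes the delimiter, so no manual offset adjustment is needed.
    while (std::getline(stream, item, ','))
        items.push_back(item);
    return items;
}
```

The loop body from the question (the map_value lookup and the s_level update) could then run once per element of the returned vector.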
