Removing strings that contain a substring from a string vector - C++

I am implementing the unit clause propagation algorithm in C++. I have read the CNF file into a vector, with each clause in an individual element of the vector, for example:
1 2 0
1 2 3 0
1 0
3 4 0
So far I am able to isolate individual elements and set them as a string, so in this example I would set the string to be "1".
The next step would be to remove all elements in the vector that contain a 1, so in this example the 1st, 2nd and 3rd elements would be removed. However, when I run the vector remove command
clauses.erase(std::remove(clauses.begin(), clauses.end(), "1"), clauses.end());
It will only remove elements that are exactly "1", not the elements that contain a 1 along with other characters. Is there any way to remove every element of a vector that contains the string?
(I hope this makes sense, thank you for your time)

Use std::remove_if and search for a 1 in each string:
clauses.erase(
std::remove_if(clauses.begin(), clauses.end(),
[](const std::string &s) {return s.find('1') != std::string::npos;}
),
clauses.end()
);
If you don't have C++11 for the lambda, a normal function, or functor, or Boost lambda, or whatever floats your boat, will work as well.

Related

Using remove and then erase to delete elements of container

I am writing this because I am very confused. Am I dreaming, or have I always been wrong about what remove does?
Suppose we have a vector of integers:
std::vector<int> v {1, 2, 3, 1, 4, 1};
We want to remove every 1 from the vector, so:
auto it {std::remove(v.begin(), v.end(), 1)};
Now our container will look like this:
2 3 4 1 1 1, with the iterator it pointing at index 3.
After that we use this:
v.erase(it, v.end()); to remove all the elements from the iterator to the end of the vector, corresponding to all the 1s. Isn't this how you remove the elements, or am I delusional?
What I was surprised about is that we don't need to add erase. Remove will remove all elements. My professor told me to use erase from the iterator to the end of the vector to delete all the elements after performing remove.
Isn't this how you remove the elements
Yes, remove and erase are used as you described in paragraphs preceding the last one.
Now our container will look like this: 2 3 4 1 1 1
Not quite. The container will look like this: 2 3 4 X X X. Where values X are unspecified. The algorithm doesn't need to bother overwriting the elements in the "removed" partition.
What I was surprised about is that we don't need to add erase.
You don't need to erase if it's fine for the residual elements to remain in the vector. If you do need to erase the elements, you can achieve that by erasing them.
Remove will remove all elements.
As per your earlier description, remove will only move the non-removed elements to the left partition. It will not erase any elements from the container. The size of the container won't change.
P.S. Since C++20, instead of the remove-erase idiom, you can use std::erase:
std::erase(v, 1);

Don't understand results of std::remove in C++ STL

I was reading Josuttis, "The C++ Standard Library, 2nd ed.". In section 6.7.1 the author explains that the code given below will give unexpected results. I still don't understand how std::remove() works, and why I am getting this strange result. (Though I understood that you need to call erase() in order to actually remove elements, and that for a list it is better to use the member function list::remove() rather than the combination of std::remove() and erase().)
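#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
using namespace std;

int main()
{
    list<int> coll;
    // insert elements from 6 to 1 and 1 to 6
    for (int i=1; i<=6; ++i) {
        coll.push_front(i);
        coll.push_back(i);
    }
    // print
    cout << "pre:  ";
    copy (coll.cbegin(), coll.cend(),         // source
          ostream_iterator<int>(cout," "));   // destination
    cout << endl;
    // remove all elements with value 3 (return value deliberately ignored)
    remove (coll.begin(), coll.end(),         // range
            3);                               // value
    // print (same as above)
    cout << "post: ";
    copy (coll.cbegin(), coll.cend(),
          ostream_iterator<int>(cout," "));
    cout << endl;
}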
and the results are
pre: 6 5 4 3 2 1 1 2 3 4 5 6
post: 6 5 4 2 1 1 2 4 5 6 5 6 (???)
This explanation should help:
Removing is done by shifting the elements in the range in such a way that elements to be erased are overwritten. Relative order of the elements that remain is preserved and the physical size of the container is unchanged. Iterators pointing to an element between the new logical end and the physical end of the range are still dereferenceable, but the elements themselves have unspecified values. A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
Note that the return value from std::remove() is the iterator that represents the new end. Therefore, calling the container's erase() with this new end and the old end will remove the leftover elements.
std::remove doesn't actually shorten the list. It can't - as it only gets iterators and not the container itself.
What it does is copy the remaining values so that they end up at the beginning of the container. But the final elements of the container (in your case, the last two: '5' and '6') are actually still there.
After using std::remove you have to shorten the container yourself to remove the remaining "junk" copies.
You asked the algorithm to remove the element "3". While traversing the container, the algorithm shifts the content left whenever something is removed from the middle. Such a shift happens twice in your case (there are two 3s), which is why you see the "5 6" elements at the end: the logical end has moved two items back from the actual end. Calling erase() then fixes the issue with these tail zombies.
To quote from everyone's favorite C++ website:
The function cannot alter the properties of the object containing the range of elements (i.e., it cannot alter the size of an array or a container): The removal is done by replacing the elements that compare equal to val by the next element that does not, and signaling the new size of the shortened range by returning an iterator to the element that should be considered its new past-the-end element.
So std::remove doesn't change the size of the list. It removes the matching elements and returns you an iterator that represents the new end of the list. To actually erase the extraneous elements, you then need to do:
auto it = remove(coll.begin(), coll.end(), 3);
coll.erase(it, coll.end());

Iterator arithmetic

I'm currently working with iterator arithmetic, and I'm stuck on a small problem.
I need to compute the sum of the first and last elements of a vector<int>, followed by the sum of the second and last elements, then the third and last elements, and so on.
Example:
Numbers input by the user:
1 2 3 4 5 6 7 8 9
The output should be:
10 11 12 13 14 15 16 17
In general, the code should do additions like this:
1+9 2+9 3+9 4+9 5+9 6+9 7+9 ......
So basically I need the actual code for this arithmetic operation using iterators with the member functions .begin() and .end() only! I've tried many ways, but nothing comes to mind for how to do this operation with only .begin() and .end(). I found other member functions, but those are explained in the standard library reference, not at a basic knowledge level. So I need help writing the code with only the begin() and end() member functions, if possible.
Code I have so far:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> numset;
    int num_input;
    // note: beg and end are taken while numset is still empty,
    // so the push_back calls below invalidate them
    auto beg=numset.begin(), end=numset.end();
    while (cin>>num_input)
    {
        numset.push_back(num_input);
    }
    for (auto it = numset.begin()+1; it !=numset.end(); ++it)
    {
        // *it=*it+1+numset.end(); -- Wrong X
        // *it+=(end-beg)/2; -- Totally wrong (and totally stupid) X
        // *it + numset.back() -- can't use other member functions X
        //////// I'm stuck here, don't know what code is needed //////
        cout<<*it<<endl;
    }
}
Thank you for your time.
The operation you perform is *it+*(it-1). (It might help to add more parentheses and spaces in your code.) That adds two adjacent elements from the sequence.
The last element in the sequence is numset.back(). So try *it + numset.back() instead; if you may only use begin() and end(), *(numset.end() - 1) refers to the same element of a non-empty vector. And there's no need to start with the second element, since you do want to print the sum of the first and last elements. If you don't want to print the sum of the last element with itself, you should stop at end() - 1, though.

Searching For Elements With Multiple Matches

I have a vector of Key-Value pairs, where each Key-Value pair is also tagged with an Entry Type code. The possible Entry Type codes are:
enum Type
{
tData = 0,
tSeqBegin = 1, // the beginning of a sequence
tSeqEnd = 2 // the end of a sequence
};
So the Key-Value pair itself looks like this:
struct KeyVal
{
int key_;
string val_;
Type type_;
};
Within the vector are sub-arrays of additional Key-Value pairs. These sub-arrays are called 'sequences'. Sequences can be nested to any level. So sequences can themselves have (optional) sub-sequences of varying lengths. The combination of a Key and Type is unique within a sequence element. That is, within a single sequence element there can only be one 269 data row, but other sequence elements can have their own 269 data rows.
Here is a graphical representation of some sample data, grossly oversimplified (If the 'Type' column is blank, it is of type tData):
Row#  Type           Key    Value
----  -------------  -----  --------
1                    35     "W"
2                    1181   "IBM"
3     tSeqBegin      268    "3"
4                    269    "0"
5                    270    "160.3"
6     tSeqEnd        0
7                    269    "0"
8                    290    "0"
9     tSeqBegin      453    "1"      <-- subsequence
10    tSeqEnd        0        <-- end of subsequence
11    tSeqEnd        0
12                   269    "0"
13                   290    "1"
14                   270    "160.4"
15    tSeqEnd        0
16                   1759   "ABC"
[EDIT: A note on the above. There is one tSeqBegin that marks the beginning of the whole sequence. The end of each sequence element is marked by a tSeqEnd. But there is no special tSeqEnd that also marks the end of the whole sequence. So for a sequence you will see 1 tSeqBegin and n tSeqEnds, where n is the number of elements within the sequence.
Another note, in the above sequence beginning at row #3 and ending at row #15, there is one subsequence in the 2nd element (rows 7-11). The subsequence is empty, and occupies rows 9 and 10.]
What I'm trying to do is find a sequence element which has multiple Key-Value matches to certain criteria. For example, suppose I want to find the sequence element that has both 269="0" and 290="0". In this case, it should not find element #0 (starting at row 3) because that element doesn't have a 290=... row at all. It should find the element starting at row #7 instead. Ultimately I will extract other fields from this element, but that's beyond the scope of this problem, so I haven't included that data above.
I can't use std::find_if() because find_if() will evaluate each row individually, not the whole sequence element as a unit. So I can't construct a functor that evaluates something like if 269=="0" && 290=="0" because no single row will ever evaluate this to true.
I had thought to implement my own find_sequence_element(...) function. But this would involve some fairly complex logic. First I would have to identify the begin() and end() of the entire sequence, noting where each element begins and ends. Then I would have to construct some kind of evaluation structure that I could string together like this pseudocode:
Condition cond = KeyValueMatch(269, "0") + KeyValueMatch(290, "0");
But this is also complex. I can't just construct a find_sequence_element() that takes exactly 2 parameters, one for the 269 match and another for the 290 match, because I want to use this algorithm for other sequences as well, with more or fewer conditions.
Moreover, it seems like I should be able to use the STL <algorithm>'s that already exist. While I know the STL rather well, I can't figure out a way to use find_if() in any straightforward way.
So, finally, here's the question. If you were faced with the above problem, how would you solve it? I know the question is vague. I'm hoping that with some discussion we can narrow the problem domain down until we have an answer.
Some conditions:
I cannot change the single flat vector to a vector of vectors or anything of the like. The reasons for this are complex.
(Placeholder for more conditions :) )
(If consensus is that this should be CW, I will mark it as such)
I would want to process in an online fashion. Have a type which tracks:
where the current sequence started
a count of how many requirements have been met so far by the current sequence.
In your example requirements could be represented as a map<int,string>. In general they could be a sequence of binary predicates, or something polymorphic if you need to use different functors for different conditions in the same set, and for efficiency progress could be represented as a sequence of booleans, "has this predicate been met yet?"
When you see a tSeqEnd you clear the set of met requirements and start again. If your count hits the number of requirements, you're done.
The simplest case is that all predicates specify the key value, and hence only match once. It might look something like:
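template<typename DataIterator, typename PredIterator>
DataIterator find_matching_sequence(
    DataIterator dataFirst,
    DataIterator dataLast,
    PredIterator predFirst,
    PredIterator predLast) {
    DataIterator sequence_start = dataFirst;
    auto required = std::distance(predFirst, predLast);
    decltype(required) sofar = 0;
    while (dataFirst != dataLast) {
        if (dataFirst->type_ == tSeqEnd) {
            // end of a sequence element: forget progress, start a new element
            sofar = 0;
            ++dataFirst;
            sequence_start = dataFirst;
            continue;
        }
        // count the predicates (each callable on a KeyVal row) this row satisfies
        const auto& row = *dataFirst;
        sofar += std::count_if(predFirst, predLast,
                               [&row](const auto& pred) { return pred(row); });
        if (sofar == required) return sequence_start;
        ++dataFirst;
    }
    return dataLast; // no sequence element satisfies all predicates
}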
If the same predicate could match multiple rows in a subsequence, then you can use a vector<bool> instead of a count, or possibly a valarray<bool>.
To cope with multiply-nested sub-sequences, you actually need a stack of "how am I doing" records, and you might be able to implement that by the function recursively calling itself, and returning early if it sees enough "end" records to know that it has reached the end of its outermost sequence. But I don't really understand that part of the data format.
So no serious use of STL algorithms, unless you want to std::copy your initial range into an output iterator that performs the online processing ;-)
Hoping I understand your setup correctly, I would proceed in a two-step fashion, nesting search algorithms along the lines of:
template<typename It, typename Pr>
It find_sequence_element ( It begin, It end, Pr predicate );
except that Pr here is a predicate that takes a sequence and returns whether that sequence matches. An example for a single match could be:
class HasPair
{
    int key_; string value_;
public:
    HasPair ( int key, string value ) : key_(key), value_(value) {}
    template<typename It>
    bool operator() ( It begin, It end ) const {
        return std::find_if(begin, end, item_predicate(key_, value_)) != end;
    }
};
Where item_predicate() is suitable to find the (key_,value_) pair in [begin,end).
If you're interested in finding a sequence with two pairs, write a HasPairs predicate that invokes std::find_if twice, or some more optimized version of a search for two elements.

Weird behaviour with vector::erase and std::remove_if with end range different from vector.end()

I need to remove elements from the middle of a std::vector.
So I tried:
struct IsEven {
bool operator()(int ele)
{
return ele % 2 == 0;
}
};
int elements[] = {1, 2, 3, 4, 5, 6};
std::vector<int> ints(elements, elements+6);
std::vector<int>::iterator it = std::remove_if(ints.begin() + 2, ints.begin() + 4, IsEven());
ints.erase(it, ints.end());
After this I would expect the ints vector to contain: [1, 2, 3, 5, 6].
In the Visual Studio 2008 debugger, after the std::remove_if line the elements of ints are modified; I'm guessing I've run into some sort of undefined behaviour here.
So, how do I remove elements from a Range of a vector?
Edit: Sorry, the original version of this was incorrect. Fixed.
Here's what's going on. Your input to remove_if is:
1 2 3 4 5 6
    ^   ^
begin   end
And the remove_if algorithm looks at all numbers between begin and end (including begin, but excluding end), and removes all elements in that range that match your predicate. So after remove_if runs, your vector looks like this:
1 2 3 ? 5 6
    ^ ^
begin new_end
Where ? is a value that I don't think is deterministic, although if it's guaranteed to be anything it would be 4. And new_end, which points to the new end of the input sequence you gave it, with the matching elements now removed, is what is returned by std::remove_if. Note that std::remove_if doesn't touch anything beyond the subsequence that you gave it. This might make more sense with a more extended example.
Say that this is your input:
1 2 3 4 5 6 7 8 9 10
    ^         ^
begin         end
After std::remove_if, you get:
1 2 3 5 7 ? ? 8 9 10
    ^     ^
begin     new_end
Think about this for a moment. What it has done is remove the 4 and the 6 from the subsequence, shift the remaining elements of the subsequence down to fill the gaps, and return an iterator to the new end of that same subsequence. The goal is to satisfy the requirement that the [begin, new_end) sequence it produces is the same as the [begin, end) subsequence you passed in, but with certain elements removed. Anything at or beyond the end that you passed in is left untouched.
What you want to get rid of, then, is everything between the end iterator that was returned, and the original end iterator that you gave it. These are the ? "garbage" values. So your erase call should actually be:
ints.erase(it, ints.begin()+4);
The call to erase that you have just erases everything beyond the end of the subsequence that you performed the removal on, which isn't what you want here.
What makes this complicated is that the remove_if algorithm doesn't actually call erase() on the vector, or change the size of the vector at any point. It just shifts elements around and leaves some "garbage" elements after the end of the subsequence that you asked it to process. This seems silly, but the whole reason that the STL does it this way is to avoid the problem with invalidated iterators that doublep brought up (and to be able to run on things that aren't STL containers, like raw arrays).
Erasing elements in std::vector invalidates iterators past the removed element, so you cannot use "foreign" functions that accept ranges. You need to do that in a different way.
EDIT:
In general, you can use the fact that erasing one element "shifts" all elements at further positions one back. Something like this:
for (size_t scan = 2, end = 4; scan != end; )
{
if (/* some predicate on ints[scan] */)
{
ints.erase (ints.begin () + scan);
--end;
}
else
++scan;
}
Note that std::vector isn't suited for erasing elements in the middle. You should consider something else (std::list?) if you do that often.
EDIT 2:
As clarified by comments, first paragraph is not true. In such case std::remove_if should be more efficient than what I suggested in the first edit, so disregard this answer. (Keeping it for the comments.)
The behavior isn't weird - you're erasing the wrong range. std::remove_if shifts the elements it keeps to the front of the input range and leaves unspecified values in the rest of that range; it does not touch anything past the range. In this case, what you're looking for would be:
ints.erase(it, ints.begin() + 4 /* your end of range */);
From C++ in a Nutshell:
The remove_if function template "removes" items for which pred returns false from the range [first, last). The return value is one past the new end of the range. The relative order of items that are not removed is stable.
Nothing is actually erased from the underlying container; instead, items to the right are assigned to new positions so they overwrite the elements for which pred returns false. See Figure 13-13 (under remove_copy) for an example of the removal process.
(Note that the quoted text says "false" where it should say "true": remove_if removes the items for which pred returns true.)