I'd like to rangify the following code, which checks for the first occurrence of a sequence of unique characters:
bool hasOnlyUniqueElements( auto& data ) {
std::unordered_set<char> set;
for( auto& value : data )
set.emplace( value );
return set.size() == data.size();
}
int64_t getStartPacketMarker( const std::string& data, int64_t markerSize ) {
for( int64_t i = 0; i + markerSize <= static_cast<int64_t>( data.size() ); i++ )
{
std::string_view packet( data.begin() + i, data.begin() + i + markerSize );
if( hasOnlyUniqueElements( packet ) )
return i + markerSize;
}
return -1;
}
I came up with the following, which uses ranges but is only marginally better:
int64_t getStartPacketMarker( const std::string& data, int64_t markerSize ) {
int64_t idx = 0;
for( auto packet : data | ranges::views::sliding( markerSize ) ) {
if( hasOnlyUniqueElements( packet ) )
return idx + markerSize;
idx++;
}
return -1;
}
This should be a simple find operation, but I couldn't make it work and couldn't find any examples of find being used on views. Is it possible to use find on views?
Yes, you can use find on views. In your case, though, you should use find_if, since you are checking against a predicate function (std::views::slide is C++23; range-v3 calls it views::sliding):
auto view = data | std::views::slide(markerSize);
auto it = std::ranges::find_if(
view, somePredicate
);
return it == view.end() ? -1 : (it - view.begin()) + markerSize;
However, since your predicate function has an auto-deduced parameter, it is really a function template, so you can't take its address directly; you need to wrap it in a lambda instead:
auto view = data | std::views::slide(markerSize);
auto it = std::ranges::find_if(
view, [](const auto& v) { return hasOnlyUniqueElements(v); }
);
return it == view.end() ? -1 : (it - view.begin()) + markerSize;
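Besides using std::ranges::find_if on the range, you could skip the for loop that builds the set in hasOnlyUniqueElements by sorting a copy and using std::ranges::unique:
std::string sorted(data.begin(), data.end()); // copy: the window itself is read-only
std::ranges::sort(sorted);
return std::ranges::unique(sorted).empty(); // nothing to remove => all elements unique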
I declared a class like this:
class myclass{
array myarr;
pair<PT3D,PT3D> mypair;
ID myid;
CString tag;
};
After filling in the data, I have a vector<myclass> myvec;
How can I sort this vector based on the CString value of each class?
bool SortCompare(const wchar_t* a, const wchar_t* b)
{
if (wcslen(a) == wcslen(b))
{
int k = (int)wcslen(a);
return std::wcsncmp(a, b, k);
}
else
return wcslen(a) < wcslen(b);
}
sort(myvec.begin(), myvec.end(), [h](myclass& a,
myclass& b) {
return SortCompare(a.tag, b.tag);
});
However, this doesn't work properly; my desired result is a list sorted in "human" (natural) order, as in this post.
But I don't know how to use the CSortStringArray::CompareAndSwap function. How can I use that code with std::sort?
How can I correct my codes?
Note that CString provides operator<, so you do not have to fall back to a C API like wcsncmp.
You are making this overcomplicated.
Using C++11:
std::sort(myvec.begin(), myvec.end(), [](const myclass& a, const myclass& b) {
return a.tag < b.tag;
});
C++20 has something even simpler, called a projection:
std::ranges::sort(myvec, {}, &myclass::tag);
Now to the issue of alphanumeric comparison. It could look like this:
#include <string_view>
using namespace std::literals; // for the "..."sv literals below
struct AlphanumericSplitResult {
std::string_view prefix;
std::string_view digits;
std::string_view suffix;
};
AlphanumericSplitResult alphanumericSplit(std::string_view s)
{
auto i = s.find_first_of("0123456789"sv);
i = i == std::string_view::npos ? s.size() : i;
auto j = s.find_first_not_of("0123456789"sv, i);
j = j == std::string_view::npos ? s.size() : j;
return { s.substr(0, i), s.substr(i, j - i), s.substr(j) };
}
bool asNumberLess(std::string_view l, std::string_view r)
{
return l.size() != r.size() ? l.size() < r.size() : l < r;
}
bool alphanumeric_less(std::string_view l, std::string_view r)
{
if (l.empty())
return !r.empty();
auto a = alphanumericSplit(l);
auto b = alphanumericSplit(r);
if (a.prefix != b.prefix)
return a.prefix < b.prefix;
if (a.digits != b.digits)
return asNumberLess(a.digits, b.digits);
return alphanumeric_less(a.suffix, b.suffix);
}
It passes all the tests I wrote: https://godbolt.org/z/hdG6fq9sa
and it is easy to pass to both sorting algorithms.
Disclaimer: there is a small bug in the implementation - can you find it? Depending on requirements, there are more small issues.
std::wcsncmp returns 0, a value > 0, or a value < 0. Both > 0 and < 0 convert to true, and 0 converts to false. Thus comparing "a" with "b" and "b" with "a" both return true, whereas std::sort requires a strict weak ordering, in which comp(a, b) and comp(b, a) can never both be true. Perhaps you want return std::wcsncmp(a, b, k) < 0.
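Putting the < 0 fix together with the question's length-first check gives a comparator std::sort can safely use. This is a sketch under the assumption that length-first ordering is what's wanted (it yields a1, a2, a10 for strings sharing a prefix, but it is not a full natural sort); because both strings have the same length at that point, wcscmp is equivalent to wcsncmp with k = wcslen(a).

```cpp
#include <cstddef>
#include <cwchar>

// Length first, then lexicographic among equal lengths: a strict weak
// ordering, so it is safe to hand to std::sort.
bool SortCompare(const wchar_t* a, const wchar_t* b)
{
    const std::size_t la = std::wcslen(a);
    const std::size_t lb = std::wcslen(b);
    if (la != lb)
        return la < lb;            // shorter strings sort first
    return std::wcscmp(a, b) < 0;  // turn the three-way result into "less than"
}
```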
In your else branch, you return Len(a) < Len(b), but this is not the same as Len(a) != Len(b). I would just return false in the else block since you know they aren't the same at that point.
I have a queue of audio elements, one for each frame of a movie. I want to remove them from last seen to first seen, up to a certain maximum number of elements. I use timeval to determine the time value. I have problems with the iterators caused by calling erase on the elements.
I tried creating a std::multimap to store all the iterators keyed by their timeval. Then I stored the iterators to remove (those beyond max_frames) in a vector, sorted that vector of iterators from greatest to least, and erased them all.
// C++
void CMedia::timed_limit_audio_store(const int64_t frame)
{
unsigned max_frames = max_audio_frames();
struct customMore {
inline bool operator()( const timeval& a,
const timeval& b ) const
{
return (((a).tv_sec == (b).tv_sec) ?
((a).tv_usec > (b).tv_usec) :
(a).tv_sec > (b).tv_sec);
}
};
typedef std::multimap< timeval, audio_cache_t::iterator,
customMore > TimedSeqMap;
TimedSeqMap tmp;
{
audio_cache_t::iterator it = _audio.begin();
audio_cache_t::iterator end = _audio.end();
for ( ; it != end; ++it )
{
if ( (*it)->frame() - max_frames < frame )
tmp.insert( std::make_pair( (*it)->ptime(), it ) );
}
}
unsigned count = 0;
TimedSeqMap::iterator it = tmp.begin();
typedef std::vector< audio_cache_t::iterator > IteratorList;
IteratorList iters;
for ( ; it != tmp.end(); ++it )
{
++count;
if ( count > max_frames )
{
// Store this iterator to remove it later
iters.push_back( it->second );
}
}
IteratorList::iterator i = iters.begin();
IteratorList::iterator e = iters.end();
// We sort the iterators from bigger to smaller to avoid erase
// trashing our iterator. However, this is not working properly.
std::sort( i, e, std::greater<audio_cache_t::iterator>() );
i = iters.begin();
e = iters.end();
for ( ; i != e; ++i )
{
_audio.erase( *i );
}
}
The expected result is that the audio elements are removed according to their timeval. Instead, I get memory corruption and sometimes crashes.
I have a finite array whose elements are only -1, 0, or 1. I want to find the index of the Nth occurrence of a number (say 0).
I can iterate through the entire array, but I'm looking for a faster approach. I thought of using binary search, but I'm having trouble modeling the algorithm. How do I proceed with binary search in this case?
You cannot do this without at least one O(N) pre-processing pass. From an information-theory standpoint alone, you must know elements [0:k-1] to tell whether element [k] is the one you want.
If you're going to run this search many times, make a single linear pass over the array, recording where each value occurs as you go. Store the indices in a 2-D array, so you can directly index whichever occurrence you want.
For instance, given [-1 0 1 1 -1 -1 0 0 0 -1 1], you can convert this to a 3xN array idx (one row per value; the rows can have different lengths):
[[0 4 5 9]
 [1 6 7 8]
 [2 3 10]]
The Nth occurrence of element I is idx[I+1][N-1].
After that initial O(N) pass, your look-up is O(1) time, using O(N) space.
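The pre-processing pass can be sketched in C++ like this, assuming the values are ints in {-1, 0, 1}; buildIndex is a hypothetical name. Row I + 1 holds the positions of value I in ascending order, so the Nth occurrence is idx[I + 1][N - 1] as described above.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One linear pass: idx[v + 1] collects, in order, every position k with a[k] == v.
// Assumes every element is -1, 0 or 1 (anything else would index out of range).
std::array<std::vector<std::size_t>, 3> buildIndex(const std::vector<int>& a) {
    std::array<std::vector<std::size_t>, 3> idx;
    for (std::size_t k = 0; k < a.size(); ++k)
        idx[a[k] + 1].push_back(k);
    return idx;
}
```

Each lookup afterwards is a plain vector index; check N <= idx[I + 1].size() first if N may exceed the number of occurrences.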
The OP stated that the order of the data matters and that the vector or array is unsorted. To the best of my knowledge, there is no search algorithm faster than linear for unsorted data. Here are a few links for reference:
gamedev.net
quora.com
discuss.codechef.com
ubuntuforums.org
Based on the links above, if the data in the array or vector is unsorted and must keep its structure, there is no choice but linear iteration. A hashing technique might be possible, but it can still be tricky, and binary search only works on sorted data in most cases.
- Here is a good linear algorithm to find the Nth occurrence of T in data.
To solve your problem of finding the Nth occurrence of element T in a given unsorted array, vector or container you can use this simple function template:
It takes 3 parameters:
a const reference to the container that is populated with data,
a const unsigned value N, where N is the Nth occurrence,
and a const value of template type T that you are searching for.
It returns an unsigned value: the index location within the container of the Nth occurrence of element T.
template<class T>
unsigned RepititionSearch( const std::vector<T>& data, const unsigned N, const T element ) {
// N is a 1-based occurrence count: N == 1 asks for the first occurrence
if ( data.empty() || N == 0 || N > data.size() ) {
return -1; // converts to UINT_MAX, used here as "not found"
}
unsigned counter = 0;
unsigned i = 0;
for ( auto e : data ) {
if ( element == e ) {
++counter;
i++;
} else {
i++;
}
if ( counter == N ) {
return i - 1;
}
}
return -1;
}
Breakdown of the algorithm
It first does some sanity checks:
It checks whether the container is empty
It checks that N is within [1, container.size()], since N counts occurrences starting at 1
If any of these fail, it returns -1 (which becomes UINT_MAX for an unsigned return type); in production code this might throw an exception or report an error
We then need 2 incrementing counters:
1 for the current index location
1 for the number of occurrences of element T
We then use a range-based for loop (C++11 or higher)
We go through each e in data
We check whether the element passed into the function is equal to the current e in data
If the check passes, we pre-increment counter and post-increment i; otherwise we only post-increment i
After incrementing the counters, we check whether the current counter equals the N passed into the function
If the check passes, we return i - 1, since containers are 0-based
If the check fails, we continue to the next iteration of the loop and repeat the process
If every e in data has been checked and counter never reached N (including the case where T never occurs), we leave the for loop and the function returns -1; in production code this might throw an exception or return an error.
The worst case is either no match at all, or the Nth occurrence of T being the very last e in data; both are O(N), which is linear and should be efficient enough for basic containers. If a container supports indexing, element access is O(1) once you know which index location you want.
Note: This is the answer that I feel should solve the problem. If you are interested in a breakdown of how the process of designing or modeling such an algorithm works, you can refer to my reference answer here.
As far as I know, there is no better way to do this with unsorted array data, but don't quote me on it.
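For comparison, the same linear scan can be written with std::find, jumping from one match to the next. This is a sketch of an alternative, not the answerer's function: nthOccurrence is a hypothetical name, N is 1-based, and std::optional replaces the -1/UINT_MAX sentinel.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Index of the n-th (1-based) occurrence of value in data, or std::nullopt.
template <class T>
std::optional<std::size_t> nthOccurrence(const std::vector<T>& data, std::size_t n, const T& value) {
    if (n == 0) return std::nullopt;
    auto it = data.begin();
    for (std::size_t found = 0;; ++it) {
        it = std::find(it, data.end(), value);  // jump to the next match
        if (it == data.end()) return std::nullopt;
        if (++found == n) return static_cast<std::size_t>(it - data.begin());
    }
}
```

It is still O(N) in the worst case; it only moves the per-element comparison into std::find.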
Since you want to search an array, a vector, or some other container for the index location of the Nth occurrence of some element T, this post may be of some help to you.
In your question and comments, you explicitly stated that your container is unsorted, while you were thinking of using a binary search and having trouble modeling an algorithm.
This post walks through an example of the development process of designing an algorithm that may help you achieve what you are looking for.
The search algorithm here is a linear one; a binary search is not suitable for your current needs.
The same process of building an algorithm can be applied to other types of algorithms, including binary searches, hash tables, etc.
- 1st Build
#include <iostream>
#include <vector>
struct Index {
static unsigned counter; // Static Counter
unsigned location; // index location of Nth element
unsigned count; // How many of this element up to this point
Index() : location( 0 ), count( 0 ) {}
};
unsigned Index::counter = 0;
// These typedefs are not necessarily needed;
// just used to make reading of code easier.
typedef Index IndexZero;
typedef Index IndexPos1;
typedef Index IndexNeg1;
template<class T>
class RepititionSearch {
public:
// Some Constants to compare against: don't like "magic numbers"
const T NEG { -1 };
const T ZERO { 0 };
const T POS { 1 };
private:
std::vector<T> data_; // The actual array or vector of data to be searched
std::vector<Index> indices_; // A vector of Indexes - record keeping to prevent multiple full searches.
public:
// Instantiating a search object requires an already populated container
explicit RepititionSearch ( const std::vector<T>& data ) : data_( data ) {
// make sure indices_ is empty upon construction.
indices_.clear();
}
// method to find the Nth occurrence of object A
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
// Simple bounds checking: NthOccurrence is 1-based
if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
// Can throw error or print message...;
return -1;
}
IndexZero zeros;
IndexPos1 ones;
IndexNeg1 negOnes;
// Clear out the indices_ so that each consecutive call is correct
indices_.clear();
unsigned idx = 0;
for ( auto e : data_ ) {
if ( element == e && element == NEG ) {
++negOnes.counter;
negOnes.location = idx;
negOnes.count = negOnes.counter;
indices_.push_back( negOnes );
}
if ( element == e && element == ZERO ) {
++zeros.counter;
zeros.location = idx;
zeros.count = zeros.counter;
indices_.push_back( zeros );
}
if ( element == e && element == POS ) {
++ones.counter;
ones.location = idx;
ones.count = ones.counter;
indices_.push_back( ones );
}
idx++;
} // for each T in data_
// Reset static counters
negOnes.counter = 0;
zeros.counter = 0;
ones.counter = 0;
// Now that we saved a record: find the nth occurrence
// This will not search the full vector unless it is last element
// This has early termination. Also this vector should only be
// a percentage of the original data vector's size in elements.
for ( auto index : indices_ ) {
if ( index.count == NthOccurrence) {
// We found a match
return index.location;
}
}
// Not Found
return -1;
}
};
int main() {
// using the sample array or vector from User: Prune's answer!
std::vector<char> vec{ -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 };
RepititionSearch <char> search( vec );
unsigned idx = search.getNthOccurrence( 3, 1 );
std::cout << idx << std::endl;
std::cout << "\nPress any key and enter to quit." << std::endl;
char q;
std::cin >> q;
return 0;
}
// output:
10
The value 10 is the correct answer: the 3rd occurrence of the value 1 is at location 10 in the original vector, since vectors are 0-based. The vector of indices is only used as bookkeeping for a faster search.
As you may have noticed, I made this a class template so that it accepts any basic type T stored in a std::vector<T>, as long as T is comparable with == (or has that operator defined) and can be initialized from -1, 0, and 1.
As far as I know, there is no faster search method for the type of search you are after, but don't quote me on it. However, I think I can optimize this code a little more; I just need some time to look at it more closely.
This may appear a bit crazy, but it does work; it's just a bit of fun playing around with the code:
int main() {
std::cout <<
RepititionSearch<char>( std::vector<char>( { -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 } ) ).getNthOccurrence( 3, 1 )
<< std::endl;
}
It can be done on a single line and printed to the console without creating a named instance of the class (a temporary is created and discarded).
- 2nd Build
This may not necessarily make the algorithm faster, but it cleans up the code for readability. Here I removed the typedefs and used a single Index object; with that, the 3 if statements contain duplicate code, so I moved it into a private helper function. This is how simple the algorithm looks now:
struct Index {
unsigned location;
unsigned count;
static unsigned counter;
Index() : location(0), count(0) {}
};
unsigned Index::counter = 0;
template<class T>
class RepititionSearch {
public:
const T NEG { -1 };
const T ZERO { 0 };
const T POS { 1 };
private:
std::vector<T> data_;
std::vector<Index> indices_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {
indices_.clear();
}
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
return -1;
}
indices_.clear();
Index index;
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e && element == NEG ) {
addIndex( index, i );
}
if ( element == e && element == ZERO ) {
addIndex( index, i );
}
if ( element == e && element == POS ) {
addIndex( index, i );
}
i++;
}
index.counter = 0;
for ( auto idx : indices_ ) {
if ( idx.count == NthOccurrence ) {
return idx.location;
}
}
return -1;
}
private:
void addIndex( Index& index, unsigned inc ) {
++index.counter;
index.location = inc;
index.count = index.counter;
indices_.push_back( index );
}
};
- 3rd Build
To make this completely generic, so that it finds the Nth occurrence of any element T, the above can be simplified and reduced to the following. I also removed the static counter from Index and moved it into the private section of RepititionSearch; it just made more sense to place it there.
struct Index {
unsigned location;
unsigned count;
Index() : location(0), count(0) {}
};
template<class T>
class RepititionSearch {
private:
static unsigned counter_;
std::vector<T> data_;
std::vector<Index> indices_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {
indices_.clear();
}
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
return -1;
}
indices_.clear();
Index index;
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e ) {
addIndex( index, i );
}
i++;
}
counter_ = 0;
for ( auto idx : indices_ ) {
if ( idx.count == NthOccurrence ) {
return idx.location;
}
}
return -1;
}
private:
void addIndex( Index& index, unsigned inc ) {
++counter_;
index.location = inc;
index.count = counter_;
indices_.push_back( index );
}
};
template<class T>
unsigned RepititionSearch<T>::counter_ = 0;
- 4th Build
I have also written the same algorithm without depending on a vector just to hold index information. This version doesn't need the Index struct or a helper function at all. It looks like this:
template<class T>
class RepititionSearch {
private:
static unsigned counter_;
std::vector<T> data_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {}
unsigned getNthOcc( unsigned N, T element ) {
if ( N == 0 || N > data_.size() ) {
return -1;
}
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e ) {
++counter_;
i++;
} else {
i++;
}
if ( counter_ == N ) {
counter_ = 0;
return i-1;
}
}
counter_ = 0;
return -1;
}
};
template<class T>
unsigned RepititionSearch<T>::counter_ = 0;
Since we removed the dependency on the secondary vector and the need for a helper function, we don't even need a class to hold the container; we can just write a function template that takes a vector and applies the same algorithm. There is also no need for a static counter in this version.
- 5th Build
template<class T>
unsigned RepititionSearch( const std::vector<T>& data, unsigned N, T element ) {
if ( data.empty() || N == 0 || N > data.size() ) {
return -1;
}
unsigned counter = 0;
unsigned i = 0;
for ( auto e : data ) {
if ( element == e ) {
++counter;
i++;
} else {
i++;
}
if ( counter == N ) {
return i - 1;
}
}
return -1;
}
Yes, this is a lot to take in, but these are the steps involved in writing and designing an algorithm and refining it into simpler code. As you have seen, I refined this code about 5 times: I went from using a struct, a class, typedefs, and a static member with multiple stored containers; to removing the typedefs and moving the repeated code into a helper function; to removing the dependency on a secondary container and the helper function; down to not needing a class at all and just writing a function that does what it is supposed to do.
You can apply a similar sequence of steps to build a function that does what you need. The same process works for writing a binary search, a hash table, etc.
I have some code that looks roughly like this: given two maps, for every key that exists in both maps, multiply the two mapped values together, then sum all the products. For example:
s1 = {{1, 2.5}, {2, 10.0}, {3, 0.5}};
s2 = {{1, 10.0}, {3, 20.0}, {4, 3.33}};
The answer should be 2.5*10.0 + 0.5*20.0, the sum of the products of the matching keys.
double calcProduct(std::map<int, double> const &s1, std::map<int, double> const &s2)
{
auto s1_it = s1.begin();
auto s2_it = s2.begin();
double result = 0;
while (s1_it != s1.end() && s2_it != s2.end())
{
if (s1_it->first == s2_it->first)
{
result += s1_it->second * s2_it->second;
s1_it++;
s2_it++;
}
else if (s1_it->first < s2_it->first)
{
s1_it = s1.lower_bound(s2_it->first);
}
else
{
s2_it = s2.lower_bound(s1_it->first);
}
}
return result;
}
I would like to refactor this and std::set_intersection seems to be close to what I want as the documentation has an example using std::back_inserter, but is there a way to get this to work on maps and avoid the intermediate array?
The code you're using is already very close to the way std::set_intersection would be implemented. I can't see any advantage to creating a new map and iterating over it.
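For comparison, here is a sketch of the merge loop that a typical std::set_intersection implementation runs, adapted to accumulate the products directly instead of writing matches to an output iterator; calcProductMerge is a hypothetical name. It steps one element at a time instead of calling lower_bound, so which version is faster depends on how much the key sets overlap.

```cpp
#include <map>

// Sum of s1[k] * s2[k] over keys present in both maps, walking both maps in
// key order like a set_intersection-style merge (no intermediate container).
double calcProductMerge(const std::map<int, double>& s1, const std::map<int, double>& s2)
{
    double result = 0;
    auto a = s1.begin();
    auto b = s2.begin();
    while (a != s1.end() && b != s2.end()) {
        if (a->first < b->first) {
            ++a;                               // key only in s1
        } else if (b->first < a->first) {
            ++b;                               // key only in s2
        } else {
            result += a->second * b->second;   // key in both maps
            ++a;
            ++b;
        }
    }
    return result;
}
```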
However there were a couple of things with your code I wanted to mention.
If you're going to increment your iterators you shouldn't make them constant.
I would expect more misses than hits when looking for equivalent keys, so I would suggest putting the less-than comparisons first:
double calcProduct( std::map<int , double> const &s1 , std::map<int , double> const &s2 )
{
auto s1_it = s1.begin();
auto s2_it = s2.begin();
double result = 0;
while ( s1_it != s1.end() && s2_it != s2.end() )
{
if ( s1_it->first < s2_it->first )
{
s1_it = s1.lower_bound( s2_it->first );
}
else if(s2_it->first < s1_it->first )
{
s2_it = s2.lower_bound( s1_it->first );
}
else
{
result += s1_it->second * s2_it->second;
s1_it++;
s2_it++;
}
}
return result;
}
Consider this unit test:
std::bitset<8> temp( "11010100" );
reverseBitSet( temp );
CPPUNIT_ASSERT( temp == std::bitset<8>( "00101011" ) );
This implementation works:
template<size_t _Count> static inline void reverseBitSet( std::bitset<_Count>& bitset )
{
bool val;
for ( size_t pos = 0; pos < _Count/2; ++pos )
{
val = bitset[pos];
bitset[pos] = bitset[_Count-pos-1];
bitset[_Count-pos-1] = val;
}
}
While this one does not:
template<size_t _Count> static inline void reverseBitSet( std::bitset<_Count>& bitset )
{
for ( size_t pos = 0; pos < _Count/2; ++pos )
{
std::swap( bitset[pos], bitset[_Count-pos-1] );
}
}
The result is "11011011" instead of "00101011".
Why is swap doing it wrong?
This:
std::swap( bitset[pos], bitset[_Count-pos-1] );
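should actually fail to compile. operator[] for a non-const std::bitset doesn't return a reference; it returns a proxy object (std::bitset::reference). That proxy is a temporary, not an lvalue, so it cannot bind to the T& parameters of std::swap. I'm assuming the fact that it compiles at all means you're using MSVC, which has an extension that allows binding temporaries to non-const references. At that point std::swap is swapping proxies rather than bits: its temporary is just another proxy to the first bit, and proxy assignment writes through to the bitset, so after the swap both positions hold the second bit's original value. That is exactly how "11010100" turns into "11011011".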
Side note: the name _Count is reserved for the implementation, as is any other name that begins with an underscore followed by an uppercase letter.
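If you don't need to swap bits in place, one alternative (a sketch, not from the answer above) avoids the proxies entirely by going through to_string(), which writes bit N-1 first and bit 0 last; it also uses N as the template parameter name, per the side note. reversedBitSet is a hypothetical name.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <string>

// Returns a copy of bits with the bit order reversed.
template <std::size_t N>
std::bitset<N> reversedBitSet(const std::bitset<N>& bits) {
    std::string s = bits.to_string();  // character 0 is bit N-1
    std::reverse(s.begin(), s.end());  // reversing the text reverses the bits
    return std::bitset<N>(s);
}
```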