Bubble-Sort: Ascending Lists vs. Descending Lists

L1 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
L2 = [8, 1, 3, 6, 9, 7, 4, 2, 5]
Would L1 cause bubble sort to do more swaps, since its elements are in non-ascending order? I don't really understand what determines whether bubble sort does more or fewer swaps.

Actually, L1 would make bubble sort do more swaps. The total number of swaps bubble sort performs equals the number of inversions in the input (pairs of elements that are relatively out of order), and a fully reversed list like L1 has the maximum possible, n(n-1)/2 = 36 for nine elements, while L2 has only 19. What turtles (i.e., small numbers near the end of the list) and rabbits (i.e., large numbers near the beginning of the list) affect is the number of passes: rabbits get carried to the back quickly, but a turtle moves forward only one position per pass, so a single turtle near the end forces many extra passes.
That's why you almost never use bubble sort in any heavy-duty sorting code. Cocktail sort and comb sort are variations of bubble sort designed specifically to deal with turtles; for serious work, use a different algorithm entirely, such as introsort.
I don't get why you're asking this question here, or why you've tagged it with python (though that tag is the only reason I saw the question in the first place).

Related

How to store repeated data in an efficient way?

I am trying to find an efficient structure to store data in C++. Efficiency in both time and storage is important.
I have a set S = {0, 1, 2, 3, ..., N}, and multiple levels, say L levels. For each level l ∈ {0, 1, ..., L}, I need a structure, say M, to store 2 subsets of S, which will be the rows and columns of a matrix in the next steps. For example:
S = {0, 1, 2, 3, ..., 15}
L = 2
L1_row = {0, 1, 2, 3, 4, 5}
L1_col = {3, 4, 5, 6, 9}
L2_row = {0, 2, 4, 5, 12, 14}
L2_col = {3, 6, 10, 11, 14, 15}
I have used an unordered_map with integer keys as levels and a pair of unordered_sets for rows and columns, as follows:
unordered_map<int, pair<unordered_set<int>, unordered_set<int>>> M;
However, this is not efficient; for example, {3, 4, 5} is recorded three times. As S is a large set, M will contain many repeated numbers.
In the next step, I will extract the rows and columns per level from M and create a matrix, so fast access is important.
M may or may not contain all items in S.
M will be filled in two stages: first the rows for all levels, and then the columns for all levels.
That is a tough one. Memory and efficiency really depend on your concrete data set. If you don't want to store {3, 4, 5} three times, you'll have to create a "token" for it and use that instead.
There are patterns for that, such as flyweight, run-length encoding, or dictionary coding (boost-flyweight, or the dictionaries used by ZIP/7z and other compression algorithms).
However under some circumstances this can actually use more memory than just repeating the numbers.
Without further knowledge about the concrete use case it is very hard to suggest anything.
Example: run-length encoding is an efficient way to store {3, 4, 5, 6, 7, ...}. Basically you just store the first index and a length, so {3, 4, ..., 12} becomes {{3, 10}}. That's easy to 'decode' and uses a lot less memory than the original sequence, but only if you have many consecutive runs; if you have many short runs it will be worse.
Another example: if you have lots of recurring patterns, say {2, 6, 11} appears 23 times, you could store the pattern as a flyweight or put it in a dictionary, replacing each occurrence of {2, 6, 11} with a token such as #4. The problem here is identifying the patterns and optimizing your dictionary; that is one of the reasons why compressing files (7-Zip, bzip2, etc.) takes longer than uncompressing them.
Or you could even store the info as a bit pattern. If your matrix is 1024 columns wide, you could record the used columns in 128 bytes, with "bit set" meaning "use this column", like a bitmask in image manipulation.
So I guess my answer really is: "it depends".

python reordering of list elements with conversion to set

I understand that Python sets are not ordered.
And the ordering of a list will not be preserved when converted to a set.
My question is: why are the elements of a list reordered when converted (in some implementations)?
It would seem an extra action to reorder the elements during conversion.
If nothing else, it does not seem there would be much overhead in preserving the order (for convenience).
That's because sets are based on the hash table data structure: items are stored in buckets chosen by their hash values, and the order you see when a set is printed or converted to a list reflects the table's internal bucket order, not the insertion order. The set doesn't go out of its way to reorder anything; it simply places each item wherever its hash says it belongs. In CPython, small integers hash to themselves (hash(5) == 5), which is why a set of small ints often comes out in ascending order.
As you can see from the following, a list created from a set comes out in the set's internal, hash-determined order:
>>> s=set([5,4,3,7,6,2,1,0])
>>> s
{0, 1, 2, 3, 4, 5, 6, 7}
>>> list(s)
[0, 1, 2, 3, 4, 5, 6, 7]

Use c++ gslice to hide specific elements in valarray<int>

I want to hide multiple elements in a valarray<int> that holds consecutive integers starting from 0, e.g., going from {0, 1, 2, 3, 4, 5} to {0, 2, 3, 5}. I have found that I can use an indirect array to specify element indices with a valarray<size_t>, but I don't know how to generate the valarray<size_t> of indices I want in O(1) complexity. O(1), or at most O(log n), complexity is very important to me. So I think gslice may be able to solve the problem, but I still can't figure out how to implement it.
Note: I use C++11.

Can I sort a vector to match the sorting of an unordered_map?

Can I sort a vector so that it will match the ordering of an unordered_map? I want to iterate over the unordered_map and the vector together, touching each container only once to find their intersection, rather than having to search the map for each key.
So for example, given an unordered_map containing:
1, 2, 3, 4, 5, 6, 7, 8, 9
Which is hashed into this order:
1, 3, 4, 2, 5, 7, 8, 6, 9
I'd like if given a vector of:
1, 2, 3, 4
I could somehow distill the sorting of the unordered_map for use in sorting the vector so it would sort into:
1, 3, 4, 2
Is there a way to accomplish this? I notice that unordered_map does provide its hash_function; can I use this?
As the comments correctly state, there is no even remotely portable way of matching the ordering of an unordered_map: its iteration order is unspecified.
However, in the land of unspecified behavior, sometimes for various reasons we can be fine with whatever our implementation does, even if it is unspecified and non-portable. So, could someone look into your map implementation and use whatever determinism it has there to order the vector?
The problem with unordered_map is that it's a hash table. Every element inserted into it is hashed, and the hash (mapped onto the bucket space) is used as an index into an internal array. That looks promising, and it would be, if not for collisions. When keys collide, the colliding elements are put into a collision chain, and that chain is not sorted at all; the order within it is determined by the order of inserts (direct or reversed). Because of that, without knowing the order of inserts, you cannot mimic the order of the unordered_map, even for a specific implementation.

how to check whether a set has element(s) in certain range in C++

I need to check if a std::set contains any element in a range. For example, if the set is a set<int> {1, 2, 4, 7, 8}, and given an int interval [3, 5] (inclusive at both endpoints), I need to know whether the set has any elements in that interval. In this case, return true. But if the interval is [5, 6], return false. The interval may be [4, 4], but not [5, 3].
It looks like I can use set::lower_bound, but I am not sure whether this is the correct approach. I also want to keep the complexity as low as possible; I believe lower_bound is logarithmic, correct?
You can use lower_bound and upper_bound together. Your example of testing for elements between 3 and 5, inclusive, could be written as follows:
bool contains_elements_in_range = s.lower_bound(3) != s.upper_bound(5);
You can make the range inclusive or exclusive on either end by switching which function you are using (upper_bound or lower_bound):
s.upper_bound(2) != s.upper_bound(5); // Tests (2, 5]
s.lower_bound(3) != s.lower_bound(6); // Tests [3, 6)
s.upper_bound(2) != s.lower_bound(6); // Tests (2, 6)
Logarithmic time is the best you can achieve for this: the set is sorted, and locating an element within a sorted range requires a binary (dichotomic) search.
If you're certain that you're going to use a std::set, then I agree that its lower_bound method is the way to go. As you say, it will have logarithmic time complexity.
But depending what you're trying to do, your program's overall performance might be better if you use a sorted std::vector and the standalone std::lower_bound algorithm (std::lower_bound(v.begin(), v.end(), 3)). This is also logarithmic, but with a lower constant. (The downside, of course, is that inserting elements into a std::vector, and keeping it sorted, is usually much more expensive than inserting elements into a std::set.)