Merge Sort With String Vectors C++ - c++

Hello all. I am new to recursion and feel like banging my head against the wall. I watched some videos, read the chapter, and have been trying to figure out the answer to this problem for over 6 hours now with no luck. My professor gave us the following code and we have to modify it from there. Note: we are reading 52k words from a file and then sorting them using this algorithm. Not sure if that matters, but I thought I would add the info just in case.
#include <vector>
using namespace std;
vector<int> MergeUsingArrayIndices(const vector<int> & LHS,
const vector<int> & RHS)
{
vector<int> ToReturn;
int i = 0; // LHS index
int j = 0; // RHS index
while ((i < LHS.size()) && (j < RHS.size()))
{
if (LHS[i] < RHS[j])
{
ToReturn.push_back(LHS[i]);
++i;
}
else
{
ToReturn.push_back(RHS[j]);
++j;
}
}
while (i < LHS.size())
{
ToReturn.push_back(LHS[i]);
++i;
}
while (j < RHS.size())
{
ToReturn.push_back(RHS[j]);
++j;
}
return ToReturn;
}
Except now we have to make this work from just a single vector. This is what I have so far.
vector<string> MergeUsingArrayIndices(vector<string> & LHS,
int START, int MID, int MIDPLUSONE, int END)
{
vector<string> ToReturn;
int i = 0; // LHS index
int j = MIDPLUSONE; // RHS index
while ((i <= MID) && (j <= END))
{
if (LHS[i] < LHS[j])
{
ToReturn.push_back(LHS[i]);
++i;
}
else
{
ToReturn.push_back(LHS[j]);
++j;
}
}
while (i <= MID)
{
ToReturn.push_back(LHS[i]);
++i;
}
while (j <= END)
{
ToReturn.push_back(LHS[j]);
++j;
}
for (int k = 0; k < ToReturn.size(); ++k)
{
LHS[k] = ToReturn[k];
}
return ToReturn;
}
Plus this is the call prior to the function.
void MergeSort(vector<string> & VECTOR, int START, int END)
{
if (END > START)
{
int MID = (START + END) / 2;
MergeSort(VECTOR, START, MID);
MergeSort(VECTOR, MID + 1, END);
MergeUsingArrayIndices(VECTOR, START, MID, (MID+1), END);
}
}
void Merge(std::vector<string> & VECTOR)
{
MergeSort(VECTOR, 0, VECTOR.size()-1);
}
Console Screen Shot
Basically it is sorting but not very well since not everything is in alphabetical order. That was just a small sample of words from the list.
Thank you and best regards,
DON'T GET MARRIED.
UPDATE FOR: PNKFELIX
I tried the following:
vector<string> ToReturn;
int i = START; // LHS index
int j = MIDPLUSONE; // RHS index
while (i <= MID && j <= END)
{
if (LHS[i] <= LHS[j])
{
ToReturn[START] = LHS[i];
//ToReturn.push_back(LHS[i]);
++START;
++i;
}
and so on, but this made the code worse, so I am sure that is not what you were referring to. I have been up for days trying to figure this out and I cannot sleep...
The one thing you pointed to that is bothering me because I see why it's not happening but cannot fix is the call
I'm guessing that is why you used the apple, pear, orange, banana example. (very clever by the way). You can lead a horse to water but cannot make it drink. However, I still do not see how to fix this? I tried replacing my i = 0; with i = START as I now see this is probably the culprit when comparing the right side since it should start at that position but it actually made my code worse? What else am I missing here?
I have so much going on and cannot stand it when professors do stuff like this (my community college isn't great for CIS and my professor has never taught this class before). I cannot rest until I figure it out, but the textbook is so far above my head (the professor even apologized for the textbook at the beginning of the semester, saying it was too advanced for us but it is what they gave him) and uses a totally different approach (two separate arrays instead of one vector). What am I supposed to do with START? I have spent so much time on this and am dying to know the answer. Maybe that makes me lazy, but there is a point where you can only think about something so much. I love to learn, but this is not learning, as I've hit my limit. I am missing something and don't know how to begin desk checking what it is. I am assuming the right-hand side of each vector comparison is not sorted, but how do I fix that? Is it because START is not always zero (example: for the right hand side)? I am not good at sorting algorithms as it is (I am not very bright, although I study a lot), and this is a new twist. It's like handing someone a broken bubble sort and asking them to desk check it, fix what's wrong with it, and make it more efficient, when they've never seen a working one before.

The nice thing about a problem like this is that there's nothing specific to C++ here. One can take the proposed code and port it to pretty much any other reasonable language (e.g. JavaScript) and then debug it there to determine what is going wrong.
A good practice in any program is to document the assumptions and invariants of the code. If these invariants are simple enough, you can even check that they hold within the code itself, via assert statements.
So, let's see: from looking at how MergeUsingArrayIndices is invoked by MergeSort, it seems like your approach is a recursive divide-and-conquer: you first divide the input in two at a midpoint element, sort each side of the divided input, and then merge the two parts.
From that high-level description, we can identify a couple of invariants that must hold on entry to MergeUsingArrayIndices: the left half of LHS must be sorted, and the right half of LHS must also be sorted. We can check that these two conditions hold as we merge the vector, which may help us identify the spot where things are going wrong.
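The two invariants above can be written down directly in C++. The following is only a sketch: the helper names IsSortedRange and CheckMergePreconditions are hypothetical (they do not appear in the question), and the bounds are assumed to be inclusive, matching how MergeSort passes START, MID and END.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if v[first..last] (inclusive bounds) is in
// non-decreasing order.
bool IsSortedRange(const std::vector<std::string>& v, int first, int last)
{
    return std::is_sorted(v.begin() + first, v.begin() + last + 1);
}

// Hypothetical precondition check, meant to be called at the top of the
// merge function. It aborts (in builds with asserts enabled) if either
// half is not sorted or the indices are out of range.
void CheckMergePreconditions(const std::vector<std::string>& v,
                             int start, int mid, int end)
{
    assert(0 <= start && start <= mid && mid < end);
    assert(end < static_cast<int>(v.size()));
    assert(IsSortedRange(v, start, mid));     // left half is sorted
    assert(IsSortedRange(v, mid + 1, end));   // right half is sorted
}
```

Calling CheckMergePreconditions(VECTOR, START, MID, END) as the first statement of the merge makes a violated invariant fail loudly at the exact call where it first happens, instead of showing up later as a partly sorted word list.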
I took the original code and ported it as faithfully as I could to Rust (my preferred programming language), then added some assertions, and some print statements so that we can see where the assertions fail.
(There was one other change I forgot to mention above: I also got rid of the unused return value from MergeUsingArrayIndices. The array you're building up is solely used as temporary storage that is later copied into LHS; you never use the return value and therefore we can just remove that from the function's type entirely.)
Here is that code in a running playpen:
https://play.rust-lang.org/?gist=bd61b9572ea45b7139bf081cb51dc491&version=stable&backtrace=0
Some leading questions:
What indices is the assertion comparing when it reports that LHS[i] is in fact not less than LHS[i+1]?
The printouts report when the vector should be sorted at certain subranges: 0...0, 1...1, 0...1, et cetera. The indices you found above (assuming they are the same as the ones I found) are not within one of these subranges; so we in fact do not have a justification for trying to claim that LHS[i] is less than LHS[i+1]! So what happened, why does the code think that they should fall into a sorted subrange of the vector?
Strong hint number one: I left on a warning that the compiler issues about the code.
Strong hint number two: Try doing the exercise I left in the comment above the MergeUsingArrayIndices function.
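Spoiler for anyone who wants to check their answer to the hints: in the C++ code from the question, START is never used inside MergeUsingArrayIndices, so the left index always begins at 0 and the merged result is always copied back to the front of the vector. Below is a minimal sketch of one possible fix, assuming only those two spots need to change. The lowercase parameter names, the dropped MIDPLUSONE parameter and the void return type are simplifications of mine, not the professor's required signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Merges the sorted subranges v[start..mid] and v[mid+1..end]
// (inclusive bounds) back into v[start..end].
void MergeUsingArrayIndices(std::vector<std::string>& v,
                            int start, int mid, int end)
{
    assert(0 <= start && start <= mid && mid < end);
    std::vector<std::string> merged;
    merged.reserve(end - start + 1);
    int i = start;    // left index begins at start, not at 0
    int j = mid + 1;  // right index
    while (i <= mid && j <= end) {
        if (v[i] <= v[j]) merged.push_back(v[i++]);
        else              merged.push_back(v[j++]);
    }
    while (i <= mid) merged.push_back(v[i++]);
    while (j <= end) merged.push_back(v[j++]);
    // Copy back into the same window the merge read from.
    for (std::size_t k = 0; k < merged.size(); ++k) {
        v[start + k] = merged[k];
    }
}

void MergeSort(std::vector<std::string>& v, int start, int end)
{
    if (end > start) {
        int mid = start + (end - start) / 2;
        MergeSort(v, start, mid);
        MergeSort(v, mid + 1, end);
        MergeUsingArrayIndices(v, start, mid, end);
    }
}
```

With the apple, pear, orange, banana example, the second recursive call merges indices 2..3; in the original code that call compared and overwrote elements starting at index 0 instead.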

Use strcmp(LHS[i].c_str(), LHS[j].c_str()) < 0 in the if condition (strcmp takes const char*, so a std::string has to be passed via c_str()).

Related

Efficiency of an algorithm for scrambled input

I am currently writing a program in C++ (it's done for the most part) that takes in a file with numbered indices and then produces a scrambled quiz based on the initial input, so that no two are, theoretically, the same.
This is the code
// There has to be a more efficient way of doing this...
for (int tempCounter(inputCounter);
inputCounter != 0;
/* Blank on Purpose*/) {
randInput = (rand() % tempCounter) + 1;
inputIter = find (scrambledArray.begin(),
scrambledArray.end(),
randInput);
// Checks if the value passed in is within the given vector, no duplicates.
if (inputIter == scrambledArray.end()) {
--inputCounter;
scrambledArray.push_back(randInput);
}
}
The first comment states my problem. It will not matter under normal circumstances, but what if this were applied in a larger application? This works, but it would be highly inefficient should the user want to scramble 10000 or so results.
I'm not talking about tidying the code, as in shortening some sequences and compacting it to make it a bit prettier. I was teaching someone and, upon getting to this point, came to the conclusion that this could be done in a much better way; I just don't know what that way would be...
So you want just the numbers 1..N shuffled? Yes, there is a more efficient way of doing that. You can use std::iota (declared in <numeric>) to construct your vector:
// first, construct your vector:
std::vector<int> scrambled(N);
std::iota(scrambled.begin(), scrambled.end(), 1);
And then std::shuffle it:
std::shuffle(scrambled.begin(), scrambled.end(),
std::mt19937{std::random_device{}()});
If you don't have C++11, the above would look like the following (note that std::random_shuffle was deprecated in C++14 and removed in C++17):
std::vector<int> scrambled;
scrambled.reserve(N);
for (int i = 1; i <= N; ++i) {
scrambled.push_back(i);
}
std::random_shuffle(scrambled.begin(), scrambled.end());
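Putting the C++11 pieces together with the headers they need, a self-contained sketch could look like this. The function name ScrambledIndices and the choice to pass the generator in by reference are assumptions of mine, made so the same engine can be reused across calls.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Returns the numbers 1..n in random order. Building the vector and
// shuffling it are both linear in n, with no retries on duplicates.
std::vector<int> ScrambledIndices(int n, std::mt19937& rng)
{
    assert(n >= 0);
    std::vector<int> scrambled(n);
    std::iota(scrambled.begin(), scrambled.end(), 1);      // 1, 2, ..., n
    std::shuffle(scrambled.begin(), scrambled.end(), rng); // random permutation
    return scrambled;
}
```

A caller would typically seed the engine once, for example with std::mt19937 rng{std::random_device{}()};, and then call ScrambledIndices(N, rng) for each quiz.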

Is there only one way to implement a bubble sort algorithm?

I was trying to implement my own bubble sort algorithm without looking at any pseudo-code online, but now that I've successfully done it, my code looks really different from the examples I see online. They all involve a swapped variable that is either true or false. My implementation does not include that at all, so did I NOT make a bubble sort?
Here is an example I see online:
for i = 1:n,
swapped = false
for j = n:i+1,
if a[j] < a[j-1],
swap a[j,j-1]
swapped = true
→ invariant: a[1..i] in final position
break if not swapped
end
Here is my implementation of it:
void BubbleSort(int* a, int size)
{
while (!arraySorted(a, size))
{
int i = 0;
while (i < (size-1))
{
if (a[i] < a[i+1])
{
i++;
}
else
{
int tmp = 0;
tmp = a[i+1];
a[i+1] = a[i];
a[i] = tmp;
i++;
}
}
}
}
It does the same job, but does it do it any differently?
As some people noted, your version without the flag works, but is needlessly slow.
However, if you take the original version and just throw away the flag (together with the break), it will still work. It's easy to see from the invariant that you conveniently posted.
The version without the break has roughly the same worst-case performance as with the break (worst case is for an array sorted in reverse order). It's better than the original one if you want an algorithm that is guaranteed to finish in a pre-defined time.
Wikipedia describes another idea for optimization of the bubble-sort, which includes throwing away the break.
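To compare the two variants discussed above, here is a C++ sketch of the pseudocode from the question with the break made optional. The earlyExit parameter and the returned pass count are additions of mine for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Bubble sort following the pseudocode in the question (0-based indices).
// If earlyExit is true, stop after the first pass that swaps nothing.
// Returns the number of passes over the array that were made.
int BubbleSort(std::vector<int>& a, bool earlyExit)
{
    const int n = static_cast<int>(a.size());
    int passes = 0;
    for (int i = 0; i + 1 < n; ++i) {
        bool swapped = false;
        ++passes;
        for (int j = n - 1; j > i; --j) {
            if (a[j] < a[j - 1]) {
                std::swap(a[j], a[j - 1]);
                swapped = true;
            }
        }
        // Invariant: a[0..i] now hold their final values.
        if (earlyExit && !swapped) break;
    }
    assert(std::is_sorted(a.begin(), a.end())); // postcondition
    return passes;
}
```

On an already sorted array of five elements the version with the break makes one pass and the version without it makes four; on a reversed array both make four, which matches the point about the worst case.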

C++ do while loop

I have a vector holding 10 items (all of the same class; for simplicity call it 'A'). What I want to do is check that 'A' isn't either a) hitting the walls or b) hitting another 'A'. I have a collisions function that does this.
The idea is simply to have this looping function go through and move 'A' to the next position; if that position causes a collision, then it needs to give itself a new random position on the screen. Because the screen is small, there is a good chance that the element will be put on top of another one (or on top of the wall, etc.). The logic of the code works well in my head, but when debugging the code the object just gets stuck in the loop and stays in the same position. 'A' is supposed to move about the screen, but it stays still!
When I comment out the do-while loop and move the 'moveObject()' call up, the code works perfectly: the 'A's move about the screen. It is only when I try to add the extra functionality that it doesn't work.
void Board::Loop(void){
//Display the postion of that Element.
for (unsigned int i = 0; i <= 10; ++i){
do {
if (checkCollisions(i)==true){
moveObject(i);
}
else{
objects[i]->ResetPostion();
}
}
while (checkCollisions(i) == false);
objects[i]->SetPosition(objects[i]->getXDir(),objects[i]->getYDir());
}
}
The function below is the collision detection. I will expand this later.
bool Board::checkCollisions(int index){
char boundry = map[objects[index]->getXDir()][objects[index]->getYDir()];
//There has been no collisions - therefore don't change anything
if(boundry == SYMBOL_EMPTY){
return false;
}
else{
return true;
}
}
Any help would be much appreciated. I will buy you a virtual beer :-)
Thanks
Edit:
ResetPostion -> gives the element 'A' a random position on the screen
moveObject -> looks at the direction of the object and adjusts the x and y coordinates appropriately.
I guess you need: do { ...
... } while (checkCollisions(i));
Also, if you have 10 elements, then i = 0; i < 10; i++
And btw. don't write if (something == true), simply if (something) or if (!something)
for (unsigned int i = 0; i <= 10; ++i){
is wrong because that's a loop for eleven items, use
for (unsigned int i = 0; i < 10; ++i){
instead.
You don't define what 'doesn't work' means, so that's all the help I can give for now.
There seems to be a lot of confusion here over basic language structure and logic flow. Writing a few very simple test apps that exercise different language features will probably help you a lot. (So will a step-thru debugger, if you have one)
do/while() is a fairly advanced feature that some people spend whole careers never using, see: do...while vs while
I recommend getting a solid foundation with while and if/else before even using for. Your first look at do should be when you've just finished a while or for loop and realize you could save a mountain of duplicate initialization code if you just changed the order of execution a bit. (Personally I don't even use do for that any more, I just use an iterator with while(true)/break since it lets me pre and post code all within a single loop)
I think this simplifies what you're trying to accomplish:
void Board::Loop(void) {
//Display the postion of that Element.
for (unsigned int i = 0; i < 10; ++i) {
while(IsGoingToCollide(i)) //check is first, do while doesn't make sense
objects[i]->ResetPosition();
moveObject(i); //same as ->SetPosition(XDir, YDir)?
//either explain difference or remove one or the other
}
}
This function name seems ambiguous to me:
bool Board::checkCollisions(int index) {
I'd recommend changing it to:
// returns true if moving to next position (based on inertia) will
// cause overlap with any other object's or structure's current location
bool Board::IsGoingToCollide(int index) {
In contrast checkCollisions() could also mean:
// returns true if there is no overlap between this object's
// current location and any other object's or structure's current location
bool Board::DidntCollide(int index) {
Final note: Double check that ->ResetPosition() puts things inside the boundaries.

How to efficiently sort an array of doubles without libraries?

I'm looking for an efficient way to sort an array of doubles. I know bubble sort and selection sort, neither of them seems to be fast enough. I read about quick sort, but I don't understand how it works. There are a lots of example source codes, but all of them are poorly commented. Can someone explain it to me?
I wrote this after getting an idea of how qsort works. I do think qsort is not that easy to understand. It would probably need some optimization, and is probably nowhere near the original qsort, but here it is. Thanks to the people who tried to help with this.
/*recursive sorting, throws smaller values to left,
bigger to right side, then recursively sorts the two sides.*/
void sort(double szam[], int eleje, int vege){
if (vege > eleje + 1){ //if I have at least two numbers
double kuszob = szam[eleje]; //compare values to this.
int l = eleje + 1; //biggest index that is on the left.
int r = vege; //smallest index that is on the right side.
while (l < r){ //if I haven't processed everything.
if (szam[l] <= kuszob) l++; //good, this remains on the left.
else
swap(&szam[l], &szam[--r]); //swap it with the farthest value we haven't checked.
}
swap(&szam[--l], &szam[eleje]); //make sure we don't compare to this again, that could cause STACK OVERFLOW
sort(szam, eleje, l); //sort left side
sort(szam, r, vege); //sort right side
}
return; //if I have 1 number break recursion.
}
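For readers who do not read Hungarian identifiers (szam = numbers, eleje = start, vege = end, kuszob = threshold), here is a sketch of the same partition scheme with English names. The names QuickSort, first and last are mine, and std::swap replaces the pointer-taking swap helper that the code above assumes but does not show.

```cpp
#include <cassert>
#include <utility>

// Sorts a[first..last-1]; `last` is one past the final element.
void QuickSort(double a[], int first, int last)
{
    assert(first <= last);
    if (last - first < 2) return;   // 0 or 1 element: already sorted
    const double pivot = a[first];  // compare every other value to this
    int l = first + 1;  // everything in a[first+1..l-1] is <= pivot
    int r = last;       // everything in a[r..last-1] is > pivot
    while (l < r) {
        if (a[l] <= pivot) {
            ++l;                      // belongs on the left, leave it there
        } else {
            std::swap(a[l], a[--r]);  // move it to the right-hand block
        }
    }
    // Put the pivot between the two blocks; it is now in its final place
    // and is excluded from both recursive calls.
    std::swap(a[first], a[l - 1]);
    QuickSort(a, first, l - 1);  // values <= pivot
    QuickSort(a, r, last);       // values > pivot
}
```

For an array double data[6], the whole array is sorted with QuickSort(data, 0, 6). Always taking the first element as the pivot is simple but degrades to quadratic time on already sorted input.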

Stack versus Integer

I've created a program to solve Cryptarithmetics for a class on Data Structures. The professor recommended that we utilize a stack consisting of linked nodes to keep track of which letters we replaced with which numbers, but I realized an integer could do the same trick. Instead of a stack {A, 1, B, 2, C, 3, D, 4} I could hold the same info in 1234.
My program, though, seems to run much more slowly than the estimation he gave us. Could someone explain why a stack would behave much more efficiently? I had assumed that, since I wouldn't be calling methods over and over again (push, pop, top, etc) and instead just add one to the 'solution' that mine would be faster.
This is not an open ended question, so do not close it. Although you can implement things different ways, I want to know why, at the heart of C++, accessing data via a Stack has performance benefits over storing in ints and extracting by moding.
Although this is homework, I don't actually need help, just very intrigued and curious.
Thanks and can't wait to learn something new!
EDIT (Adding some code)
letterAssignments is an int array of size 26. for a problem like SEND + MORE = MONEY, A isn't used so letterAssignments[0] is set to 11. All chars that are used are initialized to 10.
answerNum is a number with as many digits as there are unique characters (in this case, 8 digits).
int Cryptarithmetic::solve(){
while(!solved()){
for(size_t z = 0; z < 26; z++){
if(letterAssignments[z] != 11) letterAssignments[z] = 10;
}
if(answerNum < 1) return NULL;
size_t curAns = answerNum;
for(int i = 0; i < numDigits; i++){
if(nextUnassigned() != '$') {
size_t nextAssign = curAns % 10;
if(isAssigned(nextAssign)){
answerNum--;
continue;
}
assign(nextUnassigned(), nextAssign);
curAns /= 10;
}
}
answerNum--;
}
return answerNum;
}
Two helper methods in case you'd like to see them:
char Cryptarithmetic::nextUnassigned(){
char nextUnassigned = '$';
for(int i = 0; i < 26; i++) {
if(letterAssignments[i] == 10) return ('A' + i);
}
return nextUnassigned; // '$' means no unassigned letter is left
}
void Cryptarithmetic::assign(char letter, size_t val){
assert('A' <= letter && letter <= 'Z'); // valid letter
assert(letterAssignments[letter-'A'] != 11); // has this letter
assert(!isAssigned(val)); // not already assigned.
letterAssignments[letter-'A'] = val;
}
From the looks of things, the way you are doing things here is quite inefficient.
As a general rule, try to keep the number of nested for loops as small as possible, since each level multiplies the work done per iteration.
For instance, if we strip all other code away, your program looks like
while(thing) {
for(z < 26) {
}
for(i < numDigits) {
for(i < 26) {
}
for(i < 26) {
}
}
}
This means that for each iteration of the while loop you are doing up to ((26+26)*numDigits)+26 loop iterations, and that's assuming isAssigned() does not use a loop.
Ideally you want:
while(thing) {
for(i < numDigits) {
}
}
which I'm sure is possible with changes to your code.
This is why your implementation with the integer array is much slower than an implementation using the stack, which does not use the for(i < 26) loops (I assume).
In answer to your original question, however, storing an array of integers will generally be at least as fast as a stack of linked nodes, simply because the linked version has more overhead for allocating memory, calling functions, etc.
But as with everything, implementation is the key difference between a slow program and a fast program.
The problem is that by counting you are also considering repetitions, whereas the problem presumably asks you to assign a different number to each different letter so that the numeric equation holds.
For example, for four letters you are testing 10*10*10*10=10000 letter->number mappings instead of 10*9*8*7=5040 of them (and the more letters there are, the bigger the ratio between the two numbers becomes).
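The 5040 figure can be checked with a small backtracking sketch that only ever extends an assignment with a digit that is not already in use. The function name CountDistinctAssignments is hypothetical, and the code counts candidate mappings only; it does not test whether the puzzle's equation holds.

```cpp
#include <cassert>
#include <vector>

// Counts the ways to give `letters` different letters pairwise distinct
// digits 0..9, trying only digits that are still unused.
long CountDistinctAssignments(int letters, std::vector<bool>& used)
{
    if (letters == 0) return 1;  // every letter has a digit
    long count = 0;
    for (int d = 0; d < 10; ++d) {
        if (used[d]) continue;   // skip repetitions up front
        used[d] = true;
        count += CountDistinctAssignments(letters - 1, used);
        used[d] = false;         // backtrack
    }
    return count;
}

long CountDistinctAssignments(int letters)
{
    assert(0 <= letters && letters <= 10);
    std::vector<bool> used(10, false);
    return CountDistinctAssignments(letters, used);
}
```

For the 8 letters of SEND + MORE = MONEY this visits 1,814,400 candidate mappings, compared with the 100,000,000 values a plain 8-digit counter steps through.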
The div instruction used by the mod function is quite expensive. Using it for your purpose can easily be less efficient than a good stack implementation. Here is an instruction timings table: http://gmplib.org/~tege/x86-timing.pdf
You should also write unit tests for your int-based stack to make sure that it works as intended.
Programming often means trading memory for time and vice versa.
Here you are packing data into an integer: you save memory but lose time.
Speed of course depends on the implementation of the stack. C++ is C with classes; if you are not using classes, it's basically C (and as fast as C).
#include <cassert>

const int stack_size = 26;
struct Stack
{
    int _data[stack_size];
    int _stack_p; // index of the next free slot, which is also the current size
    Stack()
        : _stack_p(0)
    {}
    inline void push(int val)
    {
        assert(_stack_p < stack_size); // no overhead in release builds,
                                       // where -DNDEBUG disables asserts
        _data[_stack_p++] = val;       // store, then advance the stack pointer
    }
    inline int pop()
    {
        assert(_stack_p > 0); // same thing. assert is very useful for tracing bugs
        return _data[--_stack_p];
    }
    inline int size() const
    {
        return _stack_p;
    }
    inline int val(int i) const
    {
        assert(i >= 0 && i < _stack_p); // index 0 is the bottom of the stack
        return _data[i];
    }
};
There is no overhead like a vtable pointer, since the struct has no virtual functions. Also, pop() and push() are very simple, so they will most likely be inlined, leaving no function-call overhead. Using int as the stack element is also good for speed, because int is typically the processor's natural word size.