C++ source code bug -- Computing Differences in Distance and Total Sums

The purpose of this program is to read a sequence of double values and output the total distance as a sum. It's also meant to find the smallest and largest distances, as well as calculate the mean of two or more distances.
I would also like to be able to remove the repetitive block of code in my program, which I've literally copied to get the second part of the source code working. Apparently there's a way to remove the replication -- but I don't know how.
Here's the source:
/* These includes are all part of a custom header designed
   by Bjarne Stroustrup as part of Programming: Principles and Practice
   Using C++
*/
#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <cmath>
#include <cstdlib>
#include <string>
#include <list>
#include <forward_list>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <array>
#include <regex>
#include <random>
#include <stdexcept>
// I am also using the "stdafx.h" header.
// Reading a sequence of doubles into a vector.
// This could be the distance between two areas along different paths.
int main()
{
    vector<double> dist;   // distances as doubles
    double sum = 0;        // running total of the distances
    double min = 0;        // min dist
    double max = 0;        // max dist
    cout << "Please enter a sequence of doubles (representing distances): \n";
    double val = 0;
    while (cin >> val)
    {
        if (val <= 0)
        {
            if (dist.size() == 0)
                error("no distances");
            cout << "The total distance is: " << sum << "\n";
            cout << "The smallest distance is: " << min << "\n";
            cout << "The greatest distance is: " << max << "\n";
            cout << "The average (mean) distance is: " << sum / dist.size() << "\n";
            keep_window_open();
            return 0;
        }
        dist.push_back(val);   // store the value in the vector
        // update the running values
        sum += val;
        if (val > min)
            min = val;
        if (max < val)
            max = val;
    }
    if (dist.size() == 0)
        error("no distances");
    cout << "The total distance is: " << sum << "\n";
    cout << "The smallest distance is: " << min << "\n";
    cout << "The greatest distance is: " << max << "\n";
    cout << "The average (mean) distance is: " << sum / dist.size() << "\n";
    keep_window_open();
}
Additionally, I have been trying to add a small block of code along the lines of catch (runtime_error e), but the compiler expects a declaration of some sort, and I don't know how to get it to compile without errors.
Help with removing the replicated/repeating block of code to reduce bloat would be great -- on top of everything else.

Instead of having the if statement inside the while, you should combine the two conditions to avoid duplicating that code:
while ( (cin >> val) && (val > 0) )
Also, you need to initialize min to a large value, rather than zero, if you want the first comparison to capture the first possible value for min (and note that the update for min should test val < min; your current test val > min points the wrong way).
Making a function out of duplicated code is a general-purpose solution, but it isn't a good choice in your case, for two reasons. First, it isn't necessary, since it is easier and better to combine the flow of control so there is no need to invoke that code in two places. Second, there are too many local variables used in the duplicated code, so if there were a reason to make the duplicated code into a function, good design would also demand collecting some or all of those local variables into an object.
If it had not been cleaner and easier to merge the two conditions, it still would have been better to merge the flow of control than to invent a function to call from two places. You could have used:
if (val <= 0)
{
break;
}
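Putting those two fixes together, here is a minimal sketch of the merged loop. It assumes the book's "std_lib_facilities.h"-style helpers (error(), keep_window_open()) that your includes stand in for, and it initializes min via <limits> rather than a magic number:

#include <limits>
// plus the book's header providing vector, cin, cout, error(), keep_window_open()

int main()
{
    vector<double> dist;
    double sum = 0;
    double min = std::numeric_limits<double>::max(); // start high so the first value wins
    double max = 0;
    double val = 0;

    cout << "Please enter a sequence of doubles (representing distances): \n";
    while ((cin >> val) && (val > 0)) // ends on bad input, EOF, or a non-positive value
    {
        dist.push_back(val);
        sum += val;
        if (val < min) min = val;     // note: val < min, not val > min
        if (val > max) max = val;
    }

    if (dist.size() == 0)
        error("no distances");

    cout << "The total distance is: " << sum << "\n";
    cout << "The smallest distance is: " << min << "\n";
    cout << "The greatest distance is: " << max << "\n";
    cout << "The average (mean) distance is: " << sum / dist.size() << "\n";
    keep_window_open();
}

With the merged condition there is exactly one output block, so the duplication disappears without introducing a function.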


Usage of arrays to generate random numbers

I'm trying to generate N random floats between 0 and 1, where N is specified by the user. Then I need to find the mean and the variance of the generated numbers. I'm struggling with finding the variance.
I already tried using individual variables instead of an array, but have since changed my code to use an array.
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
using namespace std;

int main(){
    int N, i;
    float random_numbers[i], sum, mean, variance, r;
    cout << "Enter an N value" << endl;
    cin >> N;
    sum = 0;
    variance = 0;
    for (i = 0; i < N; i++) {
        srand(i + 1);
        random_numbers[i] = ((float) rand() / float(RAND_MAX));
        sum += random_numbers[i];
        cout << random_numbers[i] << endl;
        mean = sum / N;
        variance += pow(random_numbers[i] - mean, 2);
    }
    variance = variance / N;
    cout << " The sum of random numbers is " << sum << endl;
    cout << " The mean is " << mean << endl;
    cout << " The variance is " << variance << endl;
}
The mean and sum are currently correct; however, the variance is not.
The mean you calculate inside the loop is a "running mean", i.e. for each new incoming number you calculate the mean up to that point. Your formula for the variance, however, is incorrect. This:
variance += pow(random_numbers[i]-mean,2);
would be correct if mean were the final value, but since it is the running mean, the result for variance is incorrect. You basically have two options: either you use the correct single-pass formula (search for "variance single pass algorithm" or "running variance"), or you first calculate the mean and then set up a second loop to calculate the variance (in that case your formula is correct).
Note that the single-pass algorithm for variance is numerically not as stable as using two loops, so if you can afford it memory- and performance-wise, you should prefer the algorithm using two passes.
PS: there are other issues with your code, but I concentrated on your main question.
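For reference, a minimal two-pass sketch of the second option (note: it uses <random> instead of srand/rand, which is my substitution, not part of the original code):

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
using namespace std;

int main(){
    int N;
    cout << "Enter an N value" << endl;
    cin >> N;
    mt19937 gen(random_device{}());
    uniform_real_distribution<float> dist(0.0f, 1.0f);
    vector<float> random_numbers(N);
    float sum = 0;
    for (int i = 0; i < N; i++) {   // pass 1: generate and accumulate the sum
        random_numbers[i] = dist(gen);
        sum += random_numbers[i];
    }
    float mean = sum / N;           // the final mean, known before the variance loop
    float variance = 0;
    for (int i = 0; i < N; i++)     // pass 2: mean is final, so the formula is now correct
        variance += pow(random_numbers[i] - mean, 2);
    variance = variance / N;
    cout << " The mean is " << mean << endl;
    cout << " The variance is " << variance << endl;
}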
The mean that you use inside the variance computation is only the mean of the first i elements. You should compute the mean of the whole sample first, then do another loop to compute the variance.
Enjoy

c++ - Solution to 2-sum using unordered_map

Okay so I am trying to solve the 2-SUM problem in C++. Given a file of 1,000,000 numbers in arbitrary order, I need to determine whether there exist pairs of integers whose sum is t, for each t in [-10000, 10000]. So this is basically the 2-SUM problem.
So, I coded up my solution in C++, using unordered_map as my hash table. I am ensuring a low load on the hash table. But this still takes around 1 hr 15 mins to finish (successfully). Now I am wondering whether it should be that slow. Further reducing the load factor did not give any considerable performance boost.
I have no idea where I can optimise the code. I tried different load factors; it doesn't help. This is a question from a MOOC, and people have been able to get this done in around 30 mins using the same hash-table approach. Can anybody help me make this code faster, or at least give a hint as to where the code might be slowing down?
Here is the code -
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <unordered_map>

int main(int argc, char *argv[]){
    if(argc != 2){
        std::cerr << "Usage: ./2sum <filename>" << std::endl;
        exit(1);
    }
    std::ifstream input(argv[1]);
    std::ofstream output("log.txt");
    std::unordered_map<long, int> data_map;
    data_map.max_load_factor(0.05);
    long tmp;
    while(input >> tmp){
        data_map[tmp] += 1;
    }
    std::cerr << "input done!" << std::endl;
    std::cerr << "load factor " << data_map.load_factor() << std::endl;
    //debug print.
    for(auto iter = data_map.begin(); iter != data_map.end(); ++iter){
        output << iter->first << " " << iter->second << std::endl;
    }
    std::cerr << "debug print done!" << std::endl;
    //solve
    long ans = 0;
    for(long i = -10000; i <= 10000; ++i){
        //try to find a pair whose sum = i.
        //debug print.
        if(i % 100 == 0)
            std::cerr << i << std::endl;
        for(auto iter = data_map.begin(); iter != data_map.end(); ++iter){
            long x = iter->first;
            long y = i - x;
            if(x == y)
                continue;
            auto search_y = data_map.find(y);
            if(search_y != data_map.end()){
                ++ans;
                break;
            }
        }
    }
    std::cout << ans << std::endl;
    return 0;
}
On a uniform set with all sums equally probable, the code below will finish in seconds. Otherwise, checking for each missing sum takes about 0.75 s on my laptop.
The solution has a minor improvement in comparison with the OP's code: it checks for duplicates and eliminates them.
Then it proceeds with a Monte Carlo heuristic: for a fraction of the total numbers, it randomly picks one from the set and collects all the sums in the [minSum, maxSum] range that can be made with the picked number as one term and each of the others as the second. This pre-populates the sums set with, say, 'sums that can be found trivially'. In my tests, using 1M numbers generated randomly between -10M and 10M, this is the only step necessary, and it takes a couple of seconds.
For pathological number distributions, in which some of the sum values are missing (or have not been found through the random heuristic), the second part uses a targeted exhaustive search over the not-yet-found sum values, very much along the same lines as the solution in the OP.
Extra explanation of the random/Monte Carlo heuristic, to address @AneeshDandime's comment ('Though i do not fully understand it at the moment'):
Well, it's simple. Think of it like this: the naive approach is to take all the input values and add them in pairs, but retain only the sums in [-10k, 10k]. That is, however, terribly expensive (O(N^2)). An immediate refinement: pick a value v0, then determine which other values v1 stand a chance of giving a sum in the [-10k, 10k] range. If the input values are sorted, it's easier: you only need to select the v1-s in [-10k - v0, 10k - v0]. A good improvement, but if you keep this as the only approach, an exhaustive search is still expensive, on the order of O(N * log2(N)) binary searches plus the range scans for every v0.
However, this approach still has its value: if the input values are uniformly distributed, it will quickly populate the known-sums set with the most common values (and spend the rest of the time trying to find infrequent or missing sum values).
To capitalize on this, instead of using it until the very end, one can run a limited number of such steps, hoping to populate the majority of the sums. After that, we switch focus and enter the targeted search for sum values, but only for the sums not found in the first step.
[Edited: previous bug corrected. The algorithm is now stable with respect to values appearing multiple times or only once in the input]
#include <algorithm>
#include <iostream>
#include <vector>
#include <random>
#include <unordered_set>
#include <unordered_map>

int main() {
    typedef long long value_type;
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++
    // substitute this with your input sequence from the file
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<value_type> initRnd(-5500, 10000000);
    std::vector<value_type> sorted_vals;
    for(size_t i = 0; i < 1000000; i++) {
        sorted_vals.push_back(initRnd(gen));
    }
    std::cout << "Initialization end" << std::endl;
    // end of input
    // +++++++++++++++++++++++++++++++++++++++++++++++++++++++
    // use some constants instead of magic values
    const value_type sumMin = -10000, sumMax = 10000;
    // Mapping val -> number of occurrences
    std::unordered_map<value_type, size_t> hashed_vals;
    for(auto val : sorted_vals) {
        ++hashed_vals[val];
    }
    // retain only the unique values and sort them
    sorted_vals.clear();
    for(auto it = hashed_vals.begin(); it != hashed_vals.end(); ++it) {
        sorted_vals.push_back(it->first);
    }
    std::sort(sorted_vals.begin(), sorted_vals.end());
    // Store the encountered sums here
    std::unordered_set<int> sums;
    // A limited number of random picks, looking for pairs of numbers which contribute
    // a sum in the [-10000, 10000] range; we collect those sums.
    // We use the sorted vector of values for this purpose.
    // If we are lucky, most of the sums (if not all) will already be filled in.
    std::uniform_int_distribution<size_t> rndPick(0, sorted_vals.size() - 1);
    size_t numRandomPicks = size_t(sorted_vals.size() * 0.1);
    if(numRandomPicks > 75000) {
        numRandomPicks = 75000;
    }
    for(size_t i = 0; i < numRandomPicks; i++) {
        // pick a value at random
        value_type val = sorted_vals[rndPick(gen)];
        // now search for partners in [sumMin - val, sumMax - val]
        auto low = std::lower_bound(sorted_vals.begin(), sorted_vals.end(), sumMin - val);
        if(low == sorted_vals.end()) {
            continue;
        }
        auto high = std::upper_bound(sorted_vals.begin(), sorted_vals.end(), sumMax - val);
        if(high == sorted_vals.begin()) {
            continue;
        }
        for(auto rangeIt = low; rangeIt != high; rangeIt++) {
            if(*rangeIt != val || hashed_vals[val] > 1) {
                // either not the same as the randomly picked value,
                // or the same value but it occurred more than once in the input
                auto sum = val + *rangeIt;
                sums.insert(sum);
            }
        }
        if(sums.size() == size_t(sumMax - sumMin + 1)) {
            // lucky us, we found them all
            break;
        }
    }
    // after which, if some sums are not present, we'll search for them specifically
    if(sums.size() != size_t(sumMax - sumMin + 1)) {
        std::cout << "Number of sums still missing: "
                  << size_t(sumMax - sumMin + 1) - sums.size()
                  << std::endl;
        for(int sum = sumMin; sum <= sumMax; sum++) {
            if(sums.find(sum) == sums.end()) {
                std::cout << "looking for sum: " << sum;
                // we couldn't find the sum, so we'll need to search for it.
                // We use the hashed_vals map this time to look up the other term.
                bool found = false;
                for(auto i = sorted_vals.begin(); !found && i != sorted_vals.end(); ++i) {
                    value_type v = *i;
                    value_type other_val = sum - v;
                    if( // v---- either two unequal terms to be summed or...
                        (other_val != v || hashed_vals[v] > 1) // ...the value occurred more than once
                        && hashed_vals.find(other_val) != hashed_vals.end() // and the other term exists
                    ) {
                        // found. Record it as such and break
                        sums.insert(sum);
                        found = true;
                    }
                }
                std::cout << (found ? " found" : " not found") << std::endl;
            }
        }
    }
    std::cout << "Total number of distinct sums found: " << sums.size() << std::endl;
}
You can reserve space in the unordered map up front. It should increase performance a bit.
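A small sketch of that suggestion against the OP's code (the 1000000 count is a stand-in for whatever input size you expect):

std::unordered_map<long, int> data_map;
data_map.max_load_factor(0.05);
data_map.reserve(1000000); // pre-allocates buckets for the expected element count,
                           // so insertion does not trigger repeated rehashes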
What about sorting the array first, and then, for each element, using binary search to find the smallest partner that brings the sum near -10000, and keep going "right" until the sum exceeds +10000?
This way you avoid going through the array 20000 times.
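A sketch of that idea, assuming the numbers have already been read into a vector (the handling of equal pairs is kept as minimal as in the OP's code):

#include <algorithm>
#include <unordered_set>
#include <vector>

// Counts the targets t in [-10000, 10000] for which some pair of
// distinct values x + y == t exists in vals.
int twoSumRange(std::vector<long> vals) {
    std::sort(vals.begin(), vals.end());
    std::unordered_set<long> found;
    for (long x : vals) {
        // first partner that could still reach a sum >= -10000
        auto it = std::lower_bound(vals.begin(), vals.end(), -10000 - x);
        // keep going "right" while the sum stays <= +10000
        for (; it != vals.end() && x + *it <= 10000; ++it)
            if (*it != x) // distinct values only, as in the OP
                found.insert(x + *it);
    }
    return int(found.size());
}

Each element then only scans the partners that can actually land in the range, instead of walking the whole map once for each of the 20001 targets.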

Print "*" up to n terms in pattern and its reverse

I have a problem asking me to write a C++ program, using for loops and fewer than 3 cout statements, to print the following pattern (ignore the pipes; the asterisks wouldn't appear without them):
|*
|***
|*****
|*******
|*********
|*********
|*******
|*****
|***
|*
This is the code I used for a Fibonacci generator, and I feel like it might be similar. I am able to print the * symbol, but not in horizontal lines. What I need the most help with is reversing the output: given a number n, I want the series to go n numbers in and then back down to 0.
#include <iostream>
using namespace std;

int main()
{
    int y = 1, sum = 1, n;
    cout << "Enter the number of terms you want" << endl;
    cin >> n;
    cout << "First " << n << " terms are :- " << endl;
    for (int x = 0; x < n; x++) {
        cout << "\n" << endl;
        for (int i = 0; i < sum; i++) {
            cout << "*" << endl;
        }
        sum = y + 2;
        y = sum;
    }
}
It seems this is homework, so I'll give some hints instead of a full solution.
For printing the *s on one line, please note that << endl ends the current line in the output, i.e. it prints a line break. (<< "\n" does the same, by the way.) Not every cout statement has to end with << endl.
For reversing the sequence, once you have the last number in the variable sum, just do the reverse computation (i.e. subtraction). This could be done in a second set of loops. However, since you should not use cout statements too often, you had better reuse the same loop: keep an additional variable holding the current state (i.e. whether you are counting up or down) and use an if to decide which computation to do. (I read the requirements such that only the cout statements that print the pattern count toward the "less than three", i.e. at most two.)
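To illustrate only the first hint (and deliberately not the full solution), here is how one row of stars stays on a single line; width is a hypothetical row length:

for (int i = 0; i < width; i++)
    cout << "*";   // no endl here, so the stars stay on one line
cout << "\n";      // a single line break after the whole row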

Program will not output data to console when using a data input size greater than 30 million

I'm trying to make a program that will eventually show the runtime differences on large data inputs between a binary search tree and a vector. But before I get to that, I'm testing whether the insertion and search functions are working properly. It seems fine, but whenever I set SIZE to 30 million or more, after about 10-20 seconds it will only display Press any key to continue... with no output. However, if I set SIZE to 20 million or less, it outputs the search results as I programmed it. So what do you think is causing this problem?
Some side notes:
I'm storing unique (no duplicates) randomly generated values into the tree as well as the vector. So at the end, the tree and the vector will both hold exactly the same values. When the program runs the search portion, if a value is found in the BST, then it should be found in the vector as well. So far this has worked with no problems when using 20 million values or less.
Also, I'm using randValue = rand() * rand(); to generate the random values, because I know the maximum value of rand() is 32767. So multiplying it by itself gives a range of numbers from 0 to 1,073,676,289 (32767^2). I know the insertion and searching methods I'm using are inefficient, because I'm making sure there are no duplicates, but that's not my concern right now. This is just for my own practice.
I'm only posting up my main.cpp for the sake of simplicity. If you think the problem lies in one of my other files, I'll post the rest up.
Here's my main.cpp:
#include <iostream>
#include <time.h>
#include <vector>
#include "BSTTemplate.h"
#include "functions.h"
using namespace std;

int main()
{
    const long long SIZE = 30000000;
    vector<long long> vector1(SIZE);
    long long randNum;
    binarySearchTree<long long> bst1;
    srand(time(NULL));
    //inserts data into the BST and into the vector AND makes sure there are no duplicates
    for(long long i = 0; i < SIZE; i++)
    {
        randNum = randLLNum();
        bst1.insert(randNum);
        if(bst1.numDups == 1)//if the random number generated is a duplicate, don't count it and redo the iteration
        {
            i--;
            bst1.numDups = 0;
            continue;
        }
        vector1[i] = randNum;
    }
    //search for a random value in both the BST and the vector
    for(int i = 0; i < 5; i++)
    {
        randNum = randLLNum();
        cout << endl << "The random number chosen is: " << randNum << endl << endl;
        //searching with the BST
        cout << "Searching for " << randNum << " in BST..." << endl;
        if(bst1.search(randNum))
            cout << randNum << " = found" << endl;
        else
            cout << randNum << " = not found" << endl;
        //searching with linear search using the vector
        cout << endl << "Searching for " << randNum << " in vector..." << endl;
        if(containsInVector(vector1, SIZE, randNum))
            cout << randNum << " = found" << endl;
        else
            cout << randNum << " = not found" << endl;
    }
    cout << endl;
    return 0;
}
(Comments reposted as answer at OP's request)
Options include: compile 64-bit (if you're not already; this may make things better or worse depending on whether RAM or address space is the issue); buy more memory; adjust your operating system's swap-memory settings (letting it use more disk); design a more memory-efficient tree (but at best you'll probably get an order-of-magnitude improvement, maybe less, and it could affect other things, such as performance characteristics); or redesign your tree so it manually saves data out to disk and reads it back (e.g. with an LRU cache).
Here's a how-to for compiling 64 bit on VC++: msdn.microsoft.com/en-us/library/9yb4317s.aspx
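As rough, back-of-the-envelope arithmetic (the per-node layout here is an assumption, since BSTTemplate.h isn't shown): a node holding an 8-byte long long plus two 8-byte child pointers is at least 24 bytes, so 30,000,000 nodes need roughly 30,000,000 × 24 B ≈ 720 MB before allocator overhead, and the vector adds another 30,000,000 × 8 B ≈ 240 MB. That is close to 1 GB of live data, which a 32-bit process (about 2 GB of usable address space, and fragmented at that) can easily fail to allocate even when the machine has free RAM.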

Finding the smallest number and largest number from a list of random numbers generated

I'm new here, and this forum has been a great help! Unfortunately, I'm not able to find the answer to my issue here or anywhere else on the web. I was hoping some of you could give me some help or tips on how to go about this.
The program generates random numbers based on the upper limit and the count of random numbers to generate.
I'm also required to find the smallest number, the largest number, and the average of all the numbers generated in the loop. The average I can find using sum/MAX_COUNT_NUM. Unfortunately, I am stuck finding the smallest and largest numbers. Been at this for the past 6 hours. Please help any way you can. Thank you.
#include <iostream>
#include <cmath>
#include <stdlib.h>
#include <iomanip>
using namespace std;

int main(){
    int UP_MAX, MAX_COUNT_NUM, RAND_NUM, MIN_COUNT_NUM;
    cout << "This program creates random numbers" << "\n" << "\n";
    cout << "Enter the upper limit of all generated random numbers: ";
    cin >> UP_MAX;
    cout << "\n" << "\n";
    cout << "Enter the count of random numbers: ";
    cin >> MAX_COUNT_NUM;
    cout << "\n" << "\n";
    cout << "Creating " << MAX_COUNT_NUM << " random numbers from 1 to " << UP_MAX << ": " << "\n" << "\n";
    MIN_COUNT_NUM = 1;
    int LARGE = 0;
    int SMALL = 0;
    for (; MAX_COUNT_NUM >= MIN_COUNT_NUM; MIN_COUNT_NUM++)
    {
        RAND_NUM = rand() % UP_MAX + 1;
        cout << setw(8) << RAND_NUM;
        if (RAND_NUM < SMALL)
        {
            SMALL = UP_MAX + 1;
        }
        if (RAND_NUM > LARGE)
        {
            LARGE = RAND_NUM;
        }
    }
}
Unfortunately, I need to do this without arrays or vectors. In my head it seems like it should work, but it doesn't: the largest number comes out fine, but the smallest comes out as 0, which makes me scratch my head.
I'm taking a beginners' course and this has me stumped, so my way of thinking may be off beat. If there are any other tips you can provide, I'd definitely appreciate them.
The problem is with the initial values that you picked for LARGE and SMALL: you set both of them to 0, which is incorrect: you should set them both to the first random number that you generate.
Alternatively, you can set SMALL to the largest possible int, and LARGE to the smallest possible int. Use <limits> header and std::numeric_limits<int> class.
Note: SMALL = UP_MAX + 1; should be SMALL = RAND_NUM;
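A minimal runnable sketch of the <limits> alternative, keeping the question's variable names (the hard-coded UP_MAX and MAX_COUNT_NUM are just placeholders for the values you read in):

#include <iostream>
#include <cstdlib>
#include <limits>
using namespace std;

int main(){
    int UP_MAX = 100, MAX_COUNT_NUM = 10;
    int SMALL = numeric_limits<int>::max(); // every generated number will be smaller
    int LARGE = numeric_limits<int>::min(); // every generated number will be larger
    for (int i = 1; i <= MAX_COUNT_NUM; i++)
    {
        int RAND_NUM = rand() % UP_MAX + 1;
        if (RAND_NUM < SMALL) SMALL = RAND_NUM; // per the note: assign RAND_NUM, not UP_MAX + 1
        if (RAND_NUM > LARGE) LARGE = RAND_NUM;
    }
    cout << "smallest: " << SMALL << ", largest: " << LARGE << endl;
}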