Comparing lines of data in the same file

Comparing lines of data in the same file - c++

I’m currently working on a project for my intro CS class. We are still pretty new to C++ and working with rudimentary concepts like while and for loops as well as file streams. The below problem is supposed to be resolved without resort to advanced features like arrays, vectors or functions.
Basically, I take a text file (FILE ONE) that has student and course data and create a new file. File one (where I’m inputting the data from) has 6k lines. Here’s an example below:
20424297 1139 CSCI 16000 W -1 3.00 RNL
20424297 1142 PSYCH 18000 W -1 3.00 RLA
20424297 1142 PSYCH 22000 W -1 3.00 RLA
20608974 1082 ENGL 12000 A- 3.7 3.00 RECR
20608974 1082 HIST 15200 B+ 3.3 3.00 FUSR
20608974 1082 PHILO 10100 A+ 4 3.00 FISR
See that very first column? Each unique set of numbers represents a student (also known as an eiD). File one is a giant list of every class a student took, and includes the subject, courses and grades they got.
The point of this project is to create a new text file that summarizes the GPAs of each student. That part I’m fairly confident I could figure out (taking cumulative GPA data). What confuses me is how I’m supposed to compare lines within the file to one another.
My professor did make things easy by having all the data grouped together by student. That lightens my load a little bit. I basically have to go through this file, line by line, and compare it with the next line to see if it has the same student ID number.
My first inclination was to create a series of nested while loops. The first loop would be active as long as data was being read, and a second loop inside it would read in the same way. I would then create variables to hold the previous line’s student ID number and the current line’s student ID number, with conditions that depend on whether or not the two are the same:
while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) // This loop will keep running until there's no data left
{
    string eiD_base = eiD_2; // eiD_base was the variable I made to hold the "previous" student's ID, for comparison to the next line
    while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) // This loop unfortunately reads the entire file, defeating its intent
    {
        string eiD_temp = eiD_2; // eiD_temp was the variable I made to hold the current student ID, for comparison
        if (eiD_base == eiD_temp)
        {
            outputStream2 << "Same line :( " << endl;
        }
        else
        {
            outputStream2 << eiD_2 << endl; // this is where you post the student data from the previous line!
        }
    }
}
After compiling and running the above, I came to the realization that this approach would not work because the second, nested loop, would run through every line in the FILE ONE without touching the first loop. I eventually figured out another method that used a counter instead:
// NOTE: The logic of the below code is as follows:
// Create a counter to note what the first student ID is.
// Store that value in eiD_Base when counter = 0. Increment counter.
// Now change eiD_Base everytime you find a line where eiD_temp
// differs from eiD_base.
string eiD_base;
string eiD_temp;
int counter = 0; // counter to help figure out what the first student ID was
while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2)
{
eiD_temp = eiD_2;
if (counter == 0)
{
eiD_base = eiD_2; // basically, set the first student ID to eiD_base when counter is 0. This counter is incremented only once.
counter++;
}
if (eiD_base == eiD_temp)
{
outputStream2 << "Same ID: " << eiD_2 << endl;
// NOTE: This is my first instinct as to where the code for calculating GPAs should go.
// The problem is that if that if the code is here, how do I factor in GPA data
// from a line that doesn't meet (eiD_base == eiD_temp)? I feel like that data would
// be jettisoned from calculations.
}
else
{
outputStream2 << "Previous ID: " << eiD_base << " and this is what eiD is now is now: " << eiD_temp << endl; // This is my first instict for
eiD_base = eiD_2; // if eiD_base !== eiD_temp, have eiD_base reset here.
}
}
That seemed closer to what I needed. However, I noticed another issue. With this method, when the variables I created to track changes in student ID (eiD_base & eiD_temp) are not equal on a line of data, it seems like that line is jettisoned. Given that I need to calculate a number of things like GPA data for each student, a method that doesn’t let me accumulate data for the first line of a different student isn’t a good solution.
I don't know if I should dispense with the counter method entirely (in which case I would welcome recommendations of how best to replace it) or if my counter method is workable by placing the code for calculating GPAs more strategically. Any insight or help would be most welcome!

My answer style is an attempt to follow: https://meta.stackexchange.com/questions/10811/how-do-i-ask-and-answer-homework-questions
You ask whether you should dispense with the counter method entirely (in which case you would welcome recommendations on how best to replace it), or whether the counter method is workable if the code for calculating GPAs is placed more strategically.
For the former, LiMuBei mentioned the method already. When you calculate more than one GPA (major GPA, GPA for just comp sci classes), you accumulate each one in its own variable.
For the latter, consider which scenario each if/while condition covers. (counter == 0) covers the first line. (eiD_base == eiD_temp) covers the first line and also the case where there are at least 2 lines and the current line has the same ID as the previous line. (eiD_base != eiD_temp) covers the case where there are at least 2 lines and the current line has a different ID from the previous line. So the unknowns are {1 line, at least 2 lines} and {same ID, different ID}. For {1 line}, you have to modify both (counter == 0) and (eiD_base == eiD_temp): in (counter == 0) you put the code that applies only to the very first line, and the code in (eiD_base == eiD_temp), which runs for {1 line} as well as for {at least 2 lines, same ID}, has to work for both scenarios.
For the complete solution, declare the variables before the while loop, accumulate into them in (eiD_base == eiD_temp), print the GPA values of the previous ID and set the variables from the first line of the new student in (eiD_base != eiD_temp), and print the GPA values of the last ID after the while loop.
double csci_Grape_Point;
// more variables for doing the calculation
while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) {
    eiD_temp = eiD_2;
    if (counter == 0)
    {
        eiD_base = eiD_2;
        counter++;
        csci_Grape_Point = 0.0;
        // more initialization of variables for doing the calculation
    }
    if (eiD_base == eiD_temp)
    {
        csci_Grape_Point = csci_Grape_Point + (gpa_2 * courseHours2);
        // more sum calculation, such as total csci credit hours
    }
    else
    {
        outputStream2 << "Previous ID: " << eiD_base << " and this is what eiD is now is now: " << eiD_temp << endl;
        eiD_base = eiD_2;
        // for the previous ID, calculate gpa for just comp sci classes
        // for the previous ID, calculate more gpa's
        // set the variable to include the first line of data of a new student
        csci_Grape_Point = (gpa_2 * courseHours2);
        // set more variables for doing the calculation
    }
}
// for the last ID, calculate gpa for just comp sci classes
// for the last ID, calculate more gpa's
Another question you have is about data calculation in (eiD_base == eiD_temp).
When a line doesn't meet (eiD_base == eiD_temp), the current line belongs to a different student than the previous line. That line is not lost: the GPA data comes from what you accumulate in (eiD_base == eiD_temp) plus what you set for the first line of a new student in (eiD_base != eiD_temp).
If the problem still feels hard, try solving a simpler version first, with a file of 1 line and then a file of 2 lines.
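To make the flow described in this answer concrete, here is a minimal sketch of the same "report when the ID changes, then report once more after the loop" idea. It is wrapped in a hypothetical function, summarizeGpa, that takes streams only so it can be tried on in-memory text; in the assignment the same loop body would sit directly in main with sdStream2 and outputStream2. It also assumes that a negative grade-point value (the -1 on the W rows) means the course should not count, which the question does not actually state.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): writes one "<id> <gpa>" line
// per student, relying on the rows being grouped by student ID.
void summarizeGpa(std::istream& in, std::ostream& out) {
    std::string id, semester, subject, course, grade, code;
    double points = 0.0, hours = 0.0;

    std::string currentId;     // ID of the student being accumulated
    bool haveStudent = false;  // false until the first row has been read
    double sumPoints = 0.0;    // running sum of points * hours
    double sumHours = 0.0;     // running sum of counted credit hours

    while (in >> id >> semester >> subject >> course >> grade
              >> points >> hours >> code) {
        if (haveStudent && id != currentId) {
            // The ID changed: report the finished student, then reset.
            out << currentId << ' '
                << (sumHours > 0.0 ? sumPoints / sumHours : 0.0) << '\n';
            sumPoints = 0.0;
            sumHours = 0.0;
        }
        currentId = id;
        haveStudent = true;

        // Every row, including the first row of a new student, reaches
        // this point, so no row is dropped from the totals.
        // Assumption: negative points (e.g. -1 for W) are excluded.
        if (points >= 0.0) {
            sumPoints += points * hours;
            sumHours += hours;
        }
    }
    if (haveStudent) {
        // The last student has no following row to trigger the report.
        out << currentId << ' '
            << (sumHours > 0.0 ? sumPoints / sumHours : 0.0) << '\n';
    }
}
```

Feeding it a std::istringstream with one or two rows is a quick way to try the "1 line" and "2 lines" cases suggested above.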

Related

Problem reading a formatted text file in C++

Officially my first post. I'm sure the Stack is full of answers, but the problem that I need help with is a little bit specific. So here goes nothing...
The Task:
I'm doing a small school project and in one part of my program I need to read the temperature measurements at different locations, all from a single formatted text file. The data inside the file is written as follows:
23/5/2016
Location 1
-7,12,-16,20,18,13,6
9/11/2014
Location 2
−1,3,6,10,8
9/11/2014
Location 3
−5,−2,0,3,1,2,−1,−4
The first row represents the date, the second row the location, and the third row all the measurements that were taken on that day (degrees Celsius).
The code that I wrote for this part of the program looks something like this:
tok.seekg(0, std::ios::beg);
int i = 0;
double element;
char sign = ',';
while (!tok.eof()) {
vector_measurements.resize(vector_measurements.size() + 1);
tok >> vector_measurements.at(i).day >> sign >> vector_measurements.at(i).month >> sign >> vector_measurements.at(i).year >> std::ws;
std::getline(tok, vector_measurements.at(i).location);
sign = ',';
while (tok && sign == ',') {
tok >> element;
vector_measurements.at(i).measurements.push_back(element);
sign = tok.get();
}
if (!tok.eof() && !tok) {
tok.clear();
break;
}
vector_measurements.at(i).SetAverage();
i++;
}
The code that I'm presenting is linked to a class:
struct Data {
    std::string location;
    std::vector<int> measurements;
    int day, month, year;
    double average = 0;
    void SetAverage();
    int GetMinimalTemperature();
    int GetMaximalTemperature();
};
I've already checked and confirmed that the file exists and the stream is opened in the correct mode without any errors; all class methods working as intended. But here's the problem. Later on, after the data is sorted (the part of data that has been successfully read), it fails to correctly print the data on the screen. I get something like:
Location 2
Date: 9/11/2014
Minimal temperature: 0
Maximal temperature: 0
Average temperature: 0
Location 1
Date: 23/5/2016
Minimal temperature: -16
Maximal temperature: 20
Average temperature: 6.57143
but I expect:
Location 3
----------
Date: 9/11/2014
Minimal temperature: -5
Maximal temperature: 3
Average temperature: -0.75
Location 2
----------
Date: 9/11/2014
Minimal temperature: -1
Maximal temperature: 10
Average temperature: 5.20
Location 1
----------
Date: 23/5/2016
Minimal temperature: -16
Maximal temperature: 20
Average temperature: 6.57143
The Problem:
The order of the locations is good, since I'm sorting from the lowest to the highest average temperature. But no matter the number of locations, the first location is always correct, the second one only has zeros, and every other location isn't even printed on the screen.
What do I need to change in order for my program to read the data properly? Or am I just missing something? Forgive me for any spelling mistakes I made since English isn't my native language. Thank you all in advance, any help is appreciated!

So the issue is that there is some garbage in your text file. I believe these are \0 characters, but I am not sure. They show up as ? characters in the Atom text editor.
You're quite lucky StackOverflow didn't sanitize them; otherwise, nobody would be able to help you.
After I cleaned up the text file, your code works. You just need to also end the loop and drop the last item when the file ends. I did it like this; it's not optimal, but it works.
while (!tok.eof())
{
    vector_measurements.resize(vector_measurements.size() + 1);
    Data& currentItem = vector_measurements[i];
    tok >> currentItem.day >> sign >> currentItem.month >> sign >> currentItem.year >> std::ws;
    // If the file ends, the data is invalid and the last item can be thrown away
    if (tok.eof())
    {
        vector_measurements.pop_back();
        break;
    }
    std::getline(tok, currentItem.location);
    sign = ',';
    while (tok && sign == ',')
    {
        tok >> element;
        currentItem.measurements.push_back(element);
        sign = tok.get();
    }
    if (!tok.eof() && !tok)
    {
        tok.clear();
        break;
    }
    currentItem.SetAverage();
    i++;
}
Please inspect your file with a hex editor, look for the weird characters, and then figure out how to get rid of them.
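If a hex editor is not at hand, a small check in C++ can point at the suspicious bytes. The sketch below uses a hypothetical helper, findSuspectBytes, that returns the positions of every byte in a line that is not printable ASCII (tab and carriage return are tolerated). What the garbage actually is remains a guess; one possibility is a multi-byte character such as a typographic minus sign in place of the ASCII hyphen, which a formatted numeric read would not accept.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every byte in `line` that is
// neither printable ASCII nor a tab / carriage return.
std::vector<std::size_t> findSuspectBytes(const std::string& line) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < line.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not seen as negative.
        const unsigned char c = static_cast<unsigned char>(line[i]);
        const bool printable = (c >= 0x20 && c < 0x7F);
        if (!printable && c != '\t' && c != '\r') {
            positions.push_back(i);
        }
    }
    return positions;
}
```

Running it on each line obtained with std::getline and printing the line number together with the reported positions should show where the file needs cleaning.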

Time limit exceeded on test 10 code forces

Hello, I am a beginner in programming and am currently on the array lessons. I only know the very basics, like if conditions, loops and data types. I got stuck when I tried to solve this problem.
Problem Description
When Serezha was three years old, he was given a set of cards with letters for his birthday. They were arranged into words in the way which formed the boy's mother favorite number in binary notation. Serezha started playing with them immediately and shuffled them because he wasn't yet able to read. His father decided to rearrange them. Help him restore the original number, on condition that it was the maximum possible one.
Input Specification
The first line contains a single integer n (1 ≤ n ≤ 10^5), the length of the string. The second line contains a string consisting of English lowercase letters: 'z', 'e', 'r', 'o' and 'n'.
It is guaranteed that it is possible to rearrange the letters in such a way that they form a sequence of words, each being either "zero", which corresponds to the digit 0, or "one", which corresponds to the digit 1.
Output Specification
Print the maximum possible number in binary notation. Print binary digits separated by a space. The leading zeroes are allowed.
Sample input:
4
ezor
Output:
0
Sample Input:
10
nznooeeoer
Output:
1 1 0
I got "Time limit exceeded on test 10" on Codeforces, and this is my code:
#include <iostream>
using namespace std;
int main()
{
    int n;
    char arr[10000];
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    for (int i = 0; i < n; i++) {
        if (arr[i] == 'n') {
            cout << "1"
                 << " ";
        }
    }
    for (int i = 0; i < n; i++) {
        if (arr[i] == 'z') {
            cout << "0"
                 << " ";
        }
    }
}

Your problem is a buffer overrun. You put an awful 10K array on the stack, but the problem description says you can have up to 100K characters.
After your array fills up, you start overwriting the stack, including the variable n. This makes you try to read too many characters. When your program gets to the end of the input, it waits forever for more.
Instead of putting an even more awful 100K array on the stack, just count the number of z's and n's as you're reading the input, and don't bother storing the string at all.
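A minimal sketch of the counting approach suggested above, with no array at all. The function name restoreNumber and the stream parameters are illustrative assumptions so the logic can be tried on in-memory input; in a submission the same body would read from cin and write to cout. It relies on 'n' appearing only in "one" and 'z' only in "zero".

```cpp
#include <iostream>
#include <sstream>

// Hypothetical helper: reads n and then n letters, counting as it goes,
// and prints the largest binary number (all ones first, then zeros).
void restoreNumber(std::istream& in, std::ostream& out) {
    int n = 0;
    in >> n;

    int ones = 0;   // number of 'n' letters, i.e. number of "one" words
    int zeros = 0;  // number of 'z' letters, i.e. number of "zero" words
    char c = '\0';
    for (int i = 0; i < n && (in >> c); ++i) {
        if (c == 'n') {
            ++ones;
        } else if (c == 'z') {
            ++zeros;
        }
    }

    bool first = true;  // used to put spaces between digits only
    for (int i = 0; i < ones; ++i) {
        out << (first ? "" : " ") << 1;
        first = false;
    }
    for (int i = 0; i < zeros; ++i) {
        out << (first ? "" : " ") << 0;
        first = false;
    }
    out << '\n';
}
```

Because nothing is stored, the input length no longer matters for memory, and the loop condition also stops cleanly if the input ends early.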

According to the compromise (applicable to homework and challenge questions) described here
How do I ask and answer homework questions?
I will hint, without giving a code solution.
In order to fix TLEs you need to be more efficient.
In this case I'd start by getting rid of one of the three loops and of all of the array accesses.
You only need to count two things during input and one output loop.

Reading from a text file properly

I am reading strings from a line in a text file, and for some reason the code will not read the whole text file. It reads to some random point and then stops, leaving out several words from a line or a few lines. Here is my code.
string total;
while(file >> word){
if(total.size() <= 40){
total += ' ' + word;
}
else{
my_vector.push_back(total);
total.clear();
}
Here is an example of a file
The programme certifies that all nutritional supplements and/or ingredients that bear the Informed-Sport logo have been tested for banned substances by the world class sports anti-doping lab, LGC. Athletes choosing to use supplements can use the search function above to find products that have been through this rigorous certification process.
It reads until "through" and leaves out the last four words.
I expected the output to be the whole file. not just part of it.
This is how I printed the vector.
for(int x = 0; x< my_vector.size(); ++x){
cout << my_vector[x];
}

You missed two things here:
First: when total.size() is not <= 40 (i.e. > 40), control moves to the else branch, where you push total into my_vector but ignore the word you have just read from the file. You need to add that word to total after total.clear().
Second: when the loop terminates, whatever is still in total is ignored as well. You need to push_back() it into the vector too (if required; it depends on your program logic).
So overall your code is going to look like this.
string total;
while(file >> word)
{
    if(total.size() <= 40)
    {
        total += ' ' + word;
    }
    else
    {
        my_vector.push_back(total);
        total.clear();
        total += ' ' + word;
    }
}
my_vector.push_back(total); // this step depends on your logic,
                            // i.e. on what you actually want to do

Your loop finishes when the end of file is read. However at this point you still have data in total. Add something like this after the loop:
if(!total.empty()) {
my_vector.push_back(total);
}
to add the last bit to the vector.

There are two problems:
When 40 < total.size() only total is pushed to my_vector but the current word is not. You should probably unconditionally append the word to total and then my_vector.push_back(total) if 40 < total.size().
When the loop terminates you still need to push_back() the content of total, as it may not have reached a size of more than 40. That is, if total is non-empty after the loop terminates, you still need to append it to my_vector.
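The two points above can be sketched as follows: the word is always appended, the chunk is pushed once it grows past the limit, and any remainder is pushed after the loop. The function name collectChunks and the limit parameter are illustrative assumptions (the question hard-codes 40), and unlike the original loop this version does not start each chunk with a space.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: groups whitespace-separated words into chunks,
// closing a chunk as soon as it is longer than `limit` characters.
std::vector<std::string> collectChunks(std::istream& file, std::size_t limit) {
    std::vector<std::string> chunks;
    std::string total;
    std::string word;
    while (file >> word) {
        if (!total.empty()) {
            total += ' ';      // separate words inside a chunk
        }
        total += word;         // the current word is never skipped
        if (total.size() > limit) {
            chunks.push_back(total);
            total.clear();
        }
    }
    if (!total.empty()) {
        chunks.push_back(total);  // leftover words after the last full chunk
    }
    return chunks;
}
```

With this shape every word read from the stream ends up in exactly one chunk, so joining the chunks with spaces gives back the original word sequence.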

Attempting to create a queue - issue with crashing C++

For my data structures course I have to create a queue that takes input from a .dat file, and organizes it based on high priority (ONLY if it's 1) and low priority (2 3 4 or 5). There must be two queues, * indicates how many to service (or remove). The .dat file looks like:
R 3
T 5
W 1
A 4
* 3
M 5
B 1
E 1
F 2
C 4
H 2
J 1
* 4
* 1
D 3
L 1
G 5
* 9
=
Here's the main.cpp
int main ()
{
arrayQueue myHigh; //creates object of arrayQueue
arrayQueue myLow; //creates another object of arrayQueue
while(previousLine != "=") //gets all the lines of file, ends program when it gets the line "="
{
getline(datfile, StringToChar);
if (StringToChar != previousLine)
{
previousLine=StringToChar; //sets previousline equal to a string
number = StringToChar[2]; //the number of the data is the third line in the string
istringstream ( number ) >> number1; //converts the string to int
character = StringToChar[0]; //the character is the first line in the string
}
if (number1 == 1) //if number is 1, sends to high priority queue
myHigh.addToQueue(number1);
else if (number1 == 2 || number1 == 3 || number1 == 4 || number1 == 5) //if number is 2 3 4 or 5 sends to low priority queue
myLow.addToQueue(number1);
}
datfile.close();
system ("pause");
}
And here's the array class:
void arrayQueue::addToQueue(int x)
{
    if (full() == true)
        cout << "Error, queue full \n";
    else {
        fill = (fill+1)%maxSize;
        queueArray[fill] = x;
        cout << x << endl; //testing that number is actually being passed through
        count++;
        size++;
    }
}
However, the output that I get is just:
3
5
and then it crashes with no error.
I'm not sure where I should go, I haven't created two objects of a class OR used a file to read data before in C++. Did I do that correctly? I think it's just feeding 3 and 5 into the high priority queue, even though it's not supposed to do that.

Because output is typically buffered you may not be seeing all of the output before your program crashes. From my examination of your code, I would expect it to crash when it reaches the last line of the input file, because StringToChar is of length 1 and you are accessing the StringToChar[2]. Well, maybe not crash, but certainly get garbage. I'm not sure if string would raise an exception.

Your processing of the read lines is certainly not quite right. First of all, you don't check whether you could successfully read a line but input should always be checked after you attempted to read it. Also, if the input is = you actually treat the value as if it is a normal line. Your basic input should probably look something like this:
while (std::getline(datfile, StringToChar) && StringToChar != "=") {
...
}
Given that your "string" number actually contains exactly one character, it is a bit of overkill to create an std::istringstream (creating these objects is relatively expensive) and decode a char converted to an std::string. Also, you actually need to check whether this operation was successful (for your last line, for example, it fails).
Converting a single char representing a digit to an int can be done using something like this:
if (3 <= StringToChar.size()
    && std::isdigit(static_cast<unsigned char>(StringToChar[2]))) {
    number1 = StringToChar[2] - '0';
}
else {
    std::cout << "the string '" << StringToChar << "' doesn't have a digit at position 2\n";
    continue;
}
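As an alternative to indexing position 2, a line can be split with stream extraction, which also copes with extra spaces and multi-digit numbers. This is a different technique from the digit check above and pays the std::istringstream cost that this answer mentions; the helper name parseEntry is a hypothetical one introduced only for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: splits a line such as "R 3" or "* 4" into its
// character and its number. Returns false when the line does not contain
// both parts (for example the terminating "=" line or an empty line).
bool parseEntry(const std::string& line, char& tag, int& value) {
    std::istringstream fields(line);
    // The extraction fails, and the stream converts to false, as soon as
    // either the character or the integer is missing.
    return static_cast<bool>(fields >> tag >> value);
}
```

The caller would then branch on the result: if parseEntry fails the line is skipped or ends the loop, if tag is '*' the value says how many entries to service, and otherwise the value selects the high or low priority queue.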

I think "adipy" is close, but...
getline(datfile, StringToChar);
First, you should check the return value to make sure a line was actually read.
Second, if we assume that StringToChar equals =, then
(StringToChar != previousLine) is true.
Then StringToChar[2] is an out-of-bounds access (likely an access violation): the string holds only one character (two with the terminating null).
Also, you might be trying to enter the last previousLine twice.

Check for every rugby score the recursive way without repetitions

Just for fun I created an algorithm that computes every possible combination for a given rugby score (made up of 3, 5 or 7 points). I found two methods: the first one is brute force, with 3 nested for loops. The other one is recursion.
The problem is that some combinations appear multiple times. How can I avoid that?
My code:
#include <iostream>
using namespace std;
void computeScore( int score, int nbTryC, int nbTryNC, int nbPenalties );
int main()
{
int score = 0;
while (true)
{
cout << "Enter score : ";
cin >> score;
cout << "---------------" << endl << "SCORE = " << score << endl
<< "---------------" << endl;
// Recursive call
computeScore(score, 0, 0, 0);
}
return 0;
}
void computeScore( int score, int nbTryC, int nbTryNC, int nbPenalties )
{
const int tryC = 7;
const int tryNC = 5;
const int penalty = 3;
if (score == 0)
{
cout << "* Tries: " << nbTryC << " | Tries NT: " << nbTryNC
<< " | Penal/Drops: " << nbPenalties << endl;
cout << "---------------" << endl;
}
else if (score < penalty)
{
// Invalid combination
}
else
{
computeScore(score - tryC, nbTryC+1, nbTryNC, nbPenalties);
computeScore(score - tryNC, nbTryC, nbTryNC+1, nbPenalties);
computeScore(score - penalty, nbTryC, nbTryNC, nbPenalties+1);
}
}

One way to think about this is to realize that any time you have a sum, you can put it into some "canonical" form by sorting all the values. For example, given
20 = 5 + 7 + 3 + 5
You could also write this as
20 = 7 + 5 + 5 + 3
This gives a few different options for how to solve your problem. First, you could always sort and record all of the sums that you make, never outputting the same sum twice. This has the problem that you're going to end up repeatedly generating the same sums multiple different times, which is extremely inefficient.
The other (and much better) way to do this is to update the recursion to work in a slightly different way. Right now, your recursion works by always adding 3, 5, and 7 at each step. This is what gets everything out of order in the first place. An alternative approach would be to think about adding in all the 7s you're going to add, then all the 5's, then all the 3's. In other words, your recursion would work something like this:
Let kValues = {7, 5, 3}
function RecursivelyMakeTarget(target, values, index) {
    // Here, target is the target to make, values are the number of 7's,
    // 5's, and 3's you've used, and index is the index of the number you're
    // allowed to add.
    // Base case: If we overshot the target, we're done.
    if (target < 0) return;
    // Base case: If we've used each number but didn't make it, we're done.
    if (index == length(kValues)) return;
    // Base case: If we made the target, we're done.
    if (target == 0) { print values; return; }
    // Otherwise, we have two options:
    // 1. Add the current number into the target.
    // 2. Say that we're done using the current number.
    // Case one
    values[index]++;
    RecursivelyMakeTarget(target - kValues[index], values, index);
    values[index]--;
    // Case two
    RecursivelyMakeTarget(target, values, index + 1);
}
function MakeTarget(target) {
    RecursivelyMakeTarget(target, [0, 0, 0], 0);
}
The idea here is to add in all of the 7's you're going to use before you add in any 5's, and to add in any 5's before you add in any 3's. If you look at the shape of the recursion tree that's made this way, you will find that no two paths end up trying out the same sum, because when the path branches either a different number was added in or the recursion chose to start using the next number in the series. Consequently, each sum is generated exactly once, and no duplicates will be used.
Moreover, this above approach scales to work with any number of possible values to add, so if rugby introduces a new SUPER GOAL that's worth 15 points, you could just update the kValues array and everything would work out just fine.
Hope this helps!
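For reference, the pseudocode in this answer could be translated to C++ roughly as follows. This is a sketch, not the answerer's code: the names Counts, makeTarget and allCombinations are made up here, and the combinations are collected into a vector instead of being printed so that the result can be inspected. The check for target == 0 is done first, which does not change which combinations are found.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// How many 7-point, 5-point and 3-point scores a combination uses.
using Counts = std::array<int, 3>;

const std::array<int, 3> kValues = {7, 5, 3};

// Recursive worker: `index` is the position in kValues of the only value
// that may still be added at this level (or skipped for the next one).
void makeTarget(int target, Counts& counts, std::size_t index,
                std::vector<Counts>& results) {
    if (target == 0) {            // exact hit: record this combination
        results.push_back(counts);
        return;
    }
    if (target < 0 || index == kValues.size()) {
        return;                   // overshot, or no values left to try
    }
    // Case one: use the current value once more, staying on the same index.
    ++counts[index];
    makeTarget(target - kValues[index], counts, index, results);
    --counts[index];
    // Case two: stop using the current value and move on to the next one.
    makeTarget(target, counts, index + 1, results);
}

// Entry point: every way to reach `score`, each listed exactly once.
std::vector<Counts> allCombinations(int score) {
    std::vector<Counts> results;
    Counts counts = {0, 0, 0};
    makeTarget(score, counts, 0, results);
    return results;
}
```

For a score of 10, for example, this yields two combinations: one converted try plus one penalty, and two unconverted tries.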

Each time you find a solution you could store it in a dictionary ( a set of strings for example, with strings looking like "TC-TNT-P" )
Before printing a solution you verify it was not in the dictionary.

A nested for-loop is the natural way to do this. Using recursion is just silly (as you seem to have discovered).
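A minimal sketch of the loop-based variant, under the assumption that "nested for-loop" means iterating over the counts of each score type rather than over sequences of scores. Two loops pick the number of 7s and 5s, and the number of 3s follows from the remainder, so each combination can only appear once. The function name combinationsByLoops is illustrative.

```cpp
#include <array>
#include <vector>

// Hypothetical helper: returns {sevens, fives, threes} for every way to
// reach `score` with 7-, 5- and 3-point scores.
std::vector<std::array<int, 3>> combinationsByLoops(int score) {
    std::vector<std::array<int, 3>> results;
    for (int sevens = 0; sevens * 7 <= score; ++sevens) {
        for (int fives = 0; sevens * 7 + fives * 5 <= score; ++fives) {
            // Whatever is left must be made of 3-point scores only.
            const int rest = score - sevens * 7 - fives * 5;
            if (rest % 3 == 0) {
                const std::array<int, 3> combo = {sevens, fives, rest / 3};
                results.push_back(combo);
            }
        }
    }
    return results;
}
```

Since the loops enumerate counts instead of orderings, no deduplication step is needed.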
