Strange Behavior in C++ Input/Output When Parsing Large Integer Input

I have the following piece of code:
#include <cstdio>
#include <iostream>
using namespace std;
int main() {
// Number of inputs
int N;
scanf("%d", &N);
// Takes in input and simply outputs the step it's on
for (int i = 0; i < N; i++) {
int Temp;
scanf("%d", &Temp);
printf("%d ", i);
}
}
When taking in a large amount of integer input, C++ stops at a certain point in printing output, seemingly waiting for more input to come.
Given an input of 2049 1's, the program stops after printing 2048 integers (0 up to 2047), and does not print the final 2048 (the 2049th integer). 2048 looks suspicious, being a power of 2.
It seems to be the case that the larger the input values, the quicker the program decides to stop, and in this case after what looks like a random number of steps. For example, I gave it 991 integers (up to the ten thousands), and the program stopped outputting after iteration 724.
Note that I copied and pasted the numbers as a whole block, rather than typing and entering them one by one, but I doubt this plays a role. I also tried cin and cout, but they did not help.
Could someone please explain the reasons behind this phenomenon?

I have found the answer to my question. The failure is indeed caused by copying and pasting large chunks of input, as many have suggested, and I thank everyone for their help. There were no incorrect characters, though; the cause of the problem is instead the 4096-character limit imposed by canonical mode.
In canonical mode, the terminal lets the user navigate the input, using arrow keys, backspace, etc. It sends the text to the reading program only when there is a newline or the buffer is full. Since this buffer holds 4096 characters, it becomes clear why the code fails to parse more input than that: 2049 copies of "1 " come to 4098 characters. One can switch to noncanonical mode, which allows larger input at the expense of not being able to navigate it, using stty -icanon. Entering stty icanon switches back to canonical mode.
Practically speaking, entering the input with newlines separating the numbers seems like the easiest fix.
This source was quite helpful to me: http://blog.chaitanya.im/4096-limit.
This post on unix stack exchange is similar to my problem: https://unix.stackexchange.com/questions/131105/how-to-read-over-4k-input-without-new-lines-on-a-terminal.
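The same switch that stty -icanon performs can also be made from inside the program. The following is only a minimal sketch, assuming a POSIX system: tcgetattr, tcsetattr, isatty and the ICANON flag are standard POSIX, while the helper names withoutCanonicalMode, enterNoncanonicalMode and restoreTerminal are hypothetical and not part of the original question.

```cpp
#include <termios.h>
#include <unistd.h>

// Returns a copy of `settings` with canonical (line-edited) mode turned off.
termios withoutCanonicalMode(termios settings) {
    settings.c_lflag &= ~static_cast<tcflag_t>(ICANON);
    settings.c_cc[VMIN] = 1;   // read() returns as soon as one byte is available
    settings.c_cc[VTIME] = 0;  // no inter-byte timeout
    return settings;
}

// Switches the terminal behind `fd` to noncanonical mode and stores the old
// settings in `saved`. Returns false if `fd` is not a terminal or a call fails.
bool enterNoncanonicalMode(int fd, termios& saved) {
    if (!isatty(fd) || tcgetattr(fd, &saved) != 0) {
        return false;
    }
    const termios changed = withoutCanonicalMode(saved);
    return tcsetattr(fd, TCSANOW, &changed) == 0;
}

// Restores settings previously captured by enterNoncanonicalMode.
bool restoreTerminal(int fd, const termios& saved) {
    return tcsetattr(fd, TCSANOW, &saved) == 0;
}
```

One would call enterNoncanonicalMode(STDIN_FILENO, saved) before the reading loop and restoreTerminal(STDIN_FILENO, saved) after it. When the input is redirected from a file, isatty fails and nothing is changed, which is fine because the 4096-character limit only applies to terminals.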

My first thought was that you're reaching some sort of barrier on the input side...
The limit for the length of a command line is not [typically] imposed by the shell, but by the operating system.
Bash Command Line and Input Limit - SO
However, this is probably not the case.
First, focus on your data: make sure it is what you think it is (no unexpected characters), then try to debug your reads and make sure the values are making it into memory as you intend.
Try separating your read and write into two loops. Depending on your skill level, this might make debugging a little easier, and it again helps confirm that nothing funky is going on with your reads. Suspicion is high with the reads on this one...
Here's a couple of cracks at it below... haven't tested. Hope this helps!
#include <iostream>
int main() {
int N;
std::cin >> N;
// Read N integers, storing to array
int* numbers = new int[N];
for (int i = 0; i < N; i++) {
std::cin >> numbers[i];
}
// Print
for (int i = 0; i < N; i++) {
std::cout << numbers[i] << " ";
}
std::cout << std::endl;
// Free the dynamically allocated memory
delete[] numbers;
return 0;
}
Okay... maybe a little more optimized...
#include <iostream>
#include <vector>
int main() {
    // cin.tie(nullptr) and ios::sync_with_stdio(false) might improve perf.
    // They are called before any input or output takes place.
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
    int N;
    if (!(std::cin >> N) || N < 0) {
        return 1;
    }
    // std::vector instead of int numbers[N]: variable-length arrays are not standard C++
    std::vector<int> numbers(N);
    // Read N integers, storing to vector
    for (int i = 0; i < N; i++) {
        std::cin >> numbers[i];
    }
    // Print
    for (int i = 0; i < N; i++) {
        std::cout << numbers[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}

Related

Segmentation fault when reading text file

I am trying to run a code sorting the ages of Titanic survivors from a text file. It compiles fine, but the program terminates simply saying "segmentation fault" when I choose option B (option A is not written yet.)
Here is a small sample of the text file for reference.
29 1stClass TRUE
0.9 1stClass TRUE
2 1stClass FALSE
30 1stClass FALSE
I've isolated the error to the chunk where the file is processed (//actual processing), but I'm not sure what exactly is wrong.
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <fstream>
#include <ctype.h>
void sortSurvivors();
void sortAgesLiving();
int main()
{
char options;
std::cout << "Titanic Data \nOptions \nA) Display count of people who lived and died... \nB) Display count of people who lived by age... \nPlease select option (A-B)...";
std::cin >> options;
switch (options)
{
case 'A':
sortSurvivors();
break;
case 'B':
sortAgesLiving();
break;
}
}
void sortSurvivors()
{
}
void sortAgesLiving()
{
std::ifstream inputFile;
std::string filename = "TitanicData.txt";
std::string age;
std::string classBoat;
std::string survival;
bool survived;
int eldest = 0;
//pre-sort processing
while (inputFile >> age >> classBoat >> survival)
{
int ageConv = stoi(age);
//G is for the ghetto fix I am pulling here, because I recieve an error when using "TRUE" as a string
char gchar = 'G';
survival += gchar;
if (survival == "TRUEG")
{
survived = true;
}
else
{
survived = false;
}
if (eldest < ageConv)
{
eldest = ageConv;
}
}
//initialize vector
std::vector<int> survivorVector;
for (int i = 0; i < eldest; i++)
{
survivorVector.push_back(0);
}
inputFile.open(filename);
//actual processing (ERROR HERE)
if (inputFile)
{
while (inputFile >> age >> classBoat >> survival)
{
int ageConv = stoi(age);
if (survived = true)
{
survivorVector[ageConv] = survivorVector[ageConv] + 1;
}
for (int j = 0; j <= eldest; j++)
{
std::cout << j << "\t" << survivorVector[j] << "\n";
}
}
// Close the file.
inputFile.close();
}
else
{
std::cout << "I don't know what broke, but uhhhhhhhhhh oops.";
}
}
As per usual I'm sure it's something dumb I overlooked.

In sortAgesLiving(), you have forgotten to open the file before starting your pre-sort processing. As a consequence, your first reading loop will fail to read anything at all. Therefore eldest will stay 0.
You then construct a vector and populate it. But since the loop is based on eldest, the vector survivorVector will stay empty.
When you finally open the file and read it, the first line will be considered a survivor, since you accidentally overwrite the boolean with true (i.e. if (survived = true) instead of if (survived == true) or simply if (survived)). You'll then try to access the vector out of bounds.
Even if you correct this error, you'll again go out of bounds at the first survivor. Accessing a vector out of bounds is UB, and one of the many possible symptoms is a segmentation fault.
Miscellaneous advice (not related to your issues):
You have an ambiguous age of 0.9. Converting it to an int will cause it to be 0. Is this ok, or do you need to round this up?
If it's rounding up, you could make the age variable a double and read it directly without conversion. You could then convert it mathematically to an integer age rounding it up or truncating it, as needed. If you're sure to have only integers, you could make the variable an int and not worry at all.
It is unsafe to trust a value in a file to directly index a vector. What if between the two reading phases, an additional line would have been added by someone else to the file with a value higher than eldest ? What if the value read would be negative? Better always check that it's in an acceptable range before using a value as an index. It can save you hours of debugging and your customers some nightmares.
Finally, the two-phase read is not necessary: you could just read the age, and after having checked that it's positive and smaller than 150 years (quite optimistic), you could, if needed, resize your vector if the age is equal or larger than the current vector size. Why? Imagine you work one day for US census with files having millions of lines: the fewer passes over the file, the better ;-)

C++ Creating a variable sized array from ifstream

Just a heads up: My C++ programming skills and terminology are intermediate at best. So please be gentle ;).
I am working on a multi-sort algorithm for a college class. Originally, I built the program to take in an array of 20 integers, since that was as big as the .txt files were. The final lab is now asking to take in files that have 10, 100, 1000, 10000, 100000 and 1000000 different numbers. I originally used an ifstream inside a for loop to read in the ints. Now that I need to read a variable amount of ints from a file, I have run into issues with this code. I have extensively searched this site and Google to find an answer to this problem. I have tried dozens of different code snippets, to no avail. Here is the code I am currently running that works for 20 ints.
int i;
int A[20];
int length;
char unsortedFilename[200];
ifstream unsorted;
cout << "Please type the full name of the file you would like sorted.\n* ";
cin >> unsortedFilename;
unsorted.open(unsortedFilename);
length = (sizeof(A) / sizeof(*A));
for( i = 0; i < length; i++ )
{
unsorted >> A[i];
cout << A[i] << "\n";
}
insertionSort();
I do have other code mixed in there, but it's error checking, selection of duplicate number removal, etc. I would like it so that code like this would run "i" number of times, where "i" is actually the number of ints in the file. Also, as I mentioned earlier, I will need to input a file that has 1,000,000 numbers in it. I don't believe that an int array will be able to hold that many numbers. Is it going to be as easy as swapping all my ints over to longs?
Thanks for any help you could provide.

As suggested in the comments, use std::vector<int> instead of an array.
Instead of a for loop, use a while loop. Break out of the while loop when there are no numbers to read.
The while loop:
std::vector<int> A;
int item;
while ( unsorted >> item )
{
A.push_back(item);
}
You can sort the std::vector by using std::vector::iterator or simply access the data through the int* returned by A.data().

You can simply read all the numbers into a vector. Then use the vector as you would have used the array.
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
int main()
{
std::string unsortedFilename;
std::cout << "Please type the full name of the file you would like sorted.\n";
std::cin >> unsortedFilename;
std::ifstream is(unsortedFilename);
std::istream_iterator<int> start(is), end;
std::vector<int> A(start, end);
std::cout << "Read " << A.size() << " numbers" << std::endl;
}

What you want is a vector.
try this,
int i;
vector<int> A;
int length;
string unsortedFilename;
ifstream unsorted;
cout << "Please type the full name of the file you would like sorted.\n* ";
cin >> unsortedFilename;
unsorted.open(unsortedFilename);
int temp;
for( i = 0; unsorted >> temp; i++ )
{
A.push_back(temp);
cout << A[i] << "\n";
}
insertionSort();
A vector is basically a dynamic array. It automatically grows as more space is needed. That way it doesn't matter if you have 10, 100, or even 100000 items, it'll automatically grow for you.
Also use a string for your file name; some file names are longer than 200 characters.
Good Luck!

will need to input a file that has 1,000,000 numbers in it. I don't believe that an int array will be able to hold that many numbers.
Sure it can. One million ints is ~4 MB of memory, which is a trivial amount. You can even declare it with a fixed size just as you do now: int A[1000000];.
But the real problem is that you're assuming a fixed length in your code, rather than determining the length from the input. I guess this is what your assignment is trying to teach you, so I won't show you the solution. But consider using ifstream::eof and make your sort accept the length as an argument...

Buffered input versus standard input

I was trying to read a long list of numbers (Around 10^7) from input file. Through some searching I found that reading the contents using buffer gives more performance when compared to reading the number one by one.
My second program is performing better than the first program. I am using a cin stream object in the first program and stringstream object in the second program. What is the difference between these two in terms of I/O performance?
#include <iostream>
using namespace std;
int main()
{
int n,k;
cin >> n >> k;
int count = 0;
while ( n-- > 0 )
{
int num;
cin >> num;
if( num % k == 0 )
count++;
}
cout << count << endl;
return 0;
}
This program is taking a longer time when compared to the following code using buffered input.
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
cin.seekg(0, cin.end);
int length = cin.tellg();
cin.seekg(0, cin.beg);
char *buffer = new char[length];
cin.read(buffer,length);
stringstream ss(buffer);
int n,k;
ss >> n >> k;
int result = 0;
while( n-- )
{
int num;
ss >> num;
if( num % k == 0 )
result++;
}
cout << result << endl;
return 0;
}

The second one will require roughly twice the file's size in memory. Otherwise, since it reads the entire file in one call, it will likely read data into memory as fast as the underlying storage can feed it, and then process it as fast as the CPU can do so.
It'd be good to avoid the memory cost, and in that respect, your first program is better. On my system, using an input called test.txt that looks like:
10000000 2
13
13
< 10000000-2 more "13"s. >
With your first program called a and your second called b, I get:
% time ./a <test.txt
0
./a < test.txt 1.70s user 0.01s system 99% cpu 1.709 total
% time ./b <test.txt
0
./b < test.txt 0.76s user 0.04s system 100% cpu 0.806 total
cin is not buffered by default, to keep "synchronized" with stdio. See this excellent answer for a good explanation. To make it buffered, I added cin.sync_with_stdio(false) to the top of your first program, and called the result c, which runs perhaps slightly faster:
% time ./c <test.txt
0
./c < test.txt 0.72s user 0.01s system 100% cpu 0.722 total
(Note: the times waffle around a bit, and I only ran a few tests, but c seems to be at least as fast as b.)
Your second program runs quickly because, while not buffered, it can issue just one read call. The first program must issue a read call for each cin >>, whereas the third program can buffer (issuing a read call only every now and then).
Note that adding this line means you can't read from stdin using the C FILE * by that name, or call any library methods that would do so. In practice, this is likely to not be an issue.

Getting multiple lines of input in C++

The first line contains an integer n (1 ≤ n ≤ 100). Each of the following n lines contains one word. All the words consist of lowercase Latin letters and possess the lengths of from 1 to 100 characters.
(Source: http://codeforces.com/problemset/problem/71/A)
How would you get input from the user given n? I tried using a while loop but it doesn't work:
#include <iostream>
using namespace std;
int main()
{
int n;
cin>>n;
int i;
while (i<=n) {
cin>>i ;
i++;
}
}

You probably meant to have something like:
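#include <iostream>
#include <vector>
int main() {
    int n;
    std::cin >> n;
    // std::vector instead of int theInputNumbers[n]: variable-length arrays are not standard C++
    std::vector<int> theInputNumbers(n);
    for(int i = 0; i<n; ++i) {
        std::cin >> theInputNumbers[i];
    }
}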

Your loop is really quite far from what you need. What you wrote is wrong to the point that I cannot provide advice other than to learn the basics of loops, variables, and input. The assistance you need is beyond the scope of a simple question/answer; you should consider buying a book and working through it cover to cover. Consider reading Programming Principles and Practice Using C++.
Here is a working example of something approximating your question's requirements. I leave file input and output as an exercise up to you. I also make use of C++11's front and back std::string members. You would have to access via array index in older versions.
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main(){
int totalWords;
cin >> totalWords;
stringstream finalOutput;
for (int i = 0; i < totalWords; ++i){
string word;
cin >> word;
if (word.length() > 10){
finalOutput << word.front() << (word.length() - 2) << word.back();
}else{
finalOutput << word;
}
finalOutput << endl;
}
cout << endl << "_____________" << endl << "Output:" << endl;
cout << finalOutput.str() << endl;
}
With that said, let me give you some advice:
Name your variables meaningfully. "int i" in a for loop like I have above is a common idiom, the "i" stands for index. But typically you want to avoid using i for anything else. Instead of n, call it totalWords or something similar.
Also, ensure all variables are initialized before accessing them. When you first enter your while loop i has no defined value. This means it could contain anything, and, indeed, your program could do anything as it is undefined behavior.
And as an aside: Why are you reading into an integer i in your example? Why are you then incrementing it? What is the purpose of that? If you read in input from the user, they could type 0, then you increment by 1 setting it to 1... The next iteration maybe they'll type -1 and you'll increment it by 1 and set it to 0... Then they could type in 10001451 and you increment by 1 and set it to 10001452... Do you see the problem with the logic here?
It seems like you are trying to use i as a counter for the total number of iterations. If you are doing this, do not also read input into i from the user. That completely undermines the purpose. Use a separate variable as in my example.

infinite loop in c++ [duplicate]

I'm learning C++ and writing little programs as I go along. The following is one such program:
// This program is intended to take any integer and convert to the
// corresponding signed char.
#include <iostream>
int main()
{
signed char sch = 0;
int n = 0;
while(true){
std::cin >> n;
sch = n;
std::cout << n << " --> " << sch << std::endl;
}
}
When I run this program and keep inputs at reasonably small absolute values, it behaves as expected. But when I enter larger inputs, e.g., 10000000000, the program repetitively spits out the same output. Some combinations of input cause erratic behavior. For example:
#: ./int2ch
10
10 -->
10000000000
10 -->
10 -->
10 -->
10 -->
The program spits out "10 --> " until it's killed. (With this particular sequence of inputs, the program's output changes speed erratically.) I also noticed that the output of large values is determined by the previous legal input as well as the value of the current illegal input.
What's going on? (I don't care about fixing the program, that's easy. I want to understand it.)

Basically your cin stream is in a fail state and thus returns immediately when you try to read it. Rewrite your example like this:
#include <iostream>
int main()
{
signed char sch = 0;
int n = 0;
while(std::cin >> n){
sch = n;
std::cout << n << " --> " << sch << std::endl;
}
}
cin >> n returns a reference to cin, which you can test for "good-ness" in a conditional. So basically the "while(std::cin >> n)" is saying "while I could still read from standard input successfully, do the following".
EDIT: the reason it repeatedly output the last good value entered is that it was the last value successfully read into n; the failed reads don't change the value of n. (Since C++11 a failed read does write to n: 0 for unparsable input, or the largest or smallest representable value on overflow.)
EDIT: as noted in a comment, you can clear the error state and try again. Something like this would probably work and just ignore bad numbers:
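#include <iostream>
#include <limits>
int main() {
    signed char sch = 0;
    int n = 0;
    while(true) {
        if(std::cin >> n) {
            sch = n;
            std::cout << n << " --> " << sch << std::endl;
        } else if(std::cin.eof()) {
            break; // end of input; without this check the loop would spin forever
        } else {
            std::cin.clear(); // clear error state
            // ignore the rest of this line, since we couldn't read it
            std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
    }
}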

Yes, Evan Teran pointed out most things already. One thing i want to add (since i cannot comment his comment yet :)) is that you must put the call to istream::clear before the call to istream::ignore. The reason is that istream::ignore likewise will just refuse to do anything if the stream is still in the fail state.

Given that you are on a 32 bit machine, 10000000000 is too big a number to be represented by an int. Also converting an int to a char will only give you from 0..255 or -128..127 depending on the compiler.

One problem here is that a char has a size of one byte, and thus a signed char can only hold a number between -128 and 127. An int, on the other hand, is typically 4 bytes and can take on much larger values. The second problem is that you are inputting a value that is too large even for an int.
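Both limits can be checked in code with std::numeric_limits rather than remembered. A small sketch follows; the helper names fitsInInt and fitsInSignedChar are hypothetical, and the remark about 10000000000 assumes the usual 32-bit int.

```cpp
#include <limits>

// True if `value` can be stored in an int without overflow.
bool fitsInInt(long long value) {
    return value >= std::numeric_limits<int>::min() &&
           value <= std::numeric_limits<int>::max();
}

// True if `value` can be stored in a signed char without changing.
bool fitsInSignedChar(int value) {
    return value >= std::numeric_limits<signed char>::min() &&
           value <= std::numeric_limits<signed char>::max();
}
```

With a 32-bit int, fitsInInt(10000000000LL) is false, which is why the read in the question fails. fitsInSignedChar(128) is also false, so even values that do fit in an int may not survive the assignment to sch.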
