Reading large strings in c++ [duplicate] - c++

I am trying to read a string whose length is on the order of 10^5. I get an incorrect string once its size grows beyond 4096.
I am using the following code
string a;
cin>>a;
This didn't work, so I then tried reading character by character with the following code
unsigned char c;
vector<unsigned char> a;
while(count>0){
c = getchar();
a.push_back(c);
count--;
}
I have done the necessary escaping for using getchar, but this also had the 4096-byte problem. Can someone suggest a workaround or point to the correct way of reading it?

It is because your terminal inputs are buffered in the I/O queue of the kernel.
Input and output queues of a terminal device implement a form of buffering within the kernel independent of the buffering implemented by I/O streams.
The terminal input queue is also sometimes referred to as its typeahead buffer. It holds the characters that have been received from the terminal but not yet read by any process.
The size of the input queue is described by the MAX_INPUT and _POSIX_MAX_INPUT parameters.
By default, your terminal is in canonical mode.
In canonical mode, all input stays in the queue until a newline character is received, so the terminal input queue can fill up when you type a very long line.
You can switch the terminal's input mode from canonical to non-canonical.
You can do it from the terminal:
$ stty -icanon (change the input mode to non-canonical)
$ ./a.out (run your program)
$ stty icanon (change it back to canonical)
Or you can do it programmatically.
To change the input mode programmatically, you have to use the low-level terminal interface.
So you can do something like:
#include <iostream>
#include <string>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>
int clear_icanon(void)
{
struct termios settings;
int result;
result = tcgetattr (STDIN_FILENO, &settings);
if (result < 0)
{
perror ("error in tcgetattr");
return 0;
}
settings.c_lflag &= ~ICANON;
result = tcsetattr (STDIN_FILENO, TCSANOW, &settings);
if (result < 0)
{
perror ("error in tcsetattr");
return 0;
}
return 1;
}
int main()
{
clear_icanon(); // Changes terminal from canonical mode to non canonical mode.
std::string a;
std::cin >> a;
std::cout << a.length() << std::endl;
}

Using this test-program based on what you posted:
#include <iostream>
#include <string>
int main()
{
std::string a;
std::cin >> a;
std::cout << a.length() << std::endl;
}
I can do:
./a.out < fact100000.txt
and get the output:
456574
However, if I copy'n'paste from an editor to the console, it stops at 4095. I expect that's a limit somewhere in the console's copy'n'paste handling. The easy solution to that is of course to not use copy'n'paste, but to redirect from a file. On some other systems, the restriction to 4KB of input may of course reside somewhere else. (Note that, at least on my system, I can happily copy and paste the 450KB of factorial result to another editor window, so on my system it's simply the console buffer that is the problem.)

This is much more likely to be a platform/OS problem than a C++ problem. What OS are you using, and what method are you using to get the string fed to stdin? It's pretty common for command-line arguments to be capped at a certain size.
In particular, given that you've tried reading one character at a time, and it still didn't work, this seems like a problem with getting the string to the program, rather than a C++ issue.

Related

Should I handle multiple instances of cin / stdin?

Below is a little program in C++ which is supposed to act like the Linux cat utility: it gets one or several inputs as given in the command-line arguments (possibly specifying stdin via '-') and copies them onto the standard output. Unfortunately, it shows an unintended behaviour whose root cause I cannot understand...
Upon the following command
./ccat - test.txt
I hit CTRL-D directly without passing any character. I would expect the program to display anyway the content of test.txt, but instead, the program exits without passing any more characters onto the standard output stream.
Any idea on how I should correct my code below to have the correct behaviour in this situation? Should I handle multiple instances of the standard streams (cin, cout...)? If so, do you know how this can be achieved in C++?
Thank you in advance.
/**** ccat.cpp ****/
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main(int argc, char **argv) {
if (argc <= 1) {
cout << cin.rdbuf();
} else {
vector<string> inputs;
for (int i=1; i<argc; ++i) inputs.push_back(argv[i]);
for (auto &in: inputs) {
if (in == "-" || in == "--") {
cout << cin.rdbuf();
}
else {
ifstream *fd = new ifstream(in);
if (!fd->is_open()) cerr << "Cannot open file \'" << in << "\'\n";
else cout << fd->rdbuf();
delete fd;
}
}
}
return 0;
}
I tried the following commands in sequence:
$ ./ccat > test.txt
Let's try this text.
I would expect a correct behaviour.
$ ./ccat - test.txt # I hit CTRL-D directly without passing any character first
$ ./ccat - test.txt
But when I add some characters before hitting CTRL-D... This works fine.
But when I add some characters before hitting CTRL-D... This works fine.
Let's try this text.
I would expect a correct behaviour.
As the example shows, I would expect in any of the two cases (last two shell prompts) that test.txt gets displayed onto the standard output, but this occurs only if I inject characters through the standard input first. Hitting CTRL-D straight away makes the program exit prematurely.
That's overload 10 here;
basic_ostream& operator<<( std::basic_streambuf<CharT, Traits>* sb );
and it says
If no characters were inserted, executes setstate(failbit).
In other words, cout is now in an error state and will not output anything.
Doing
cout.clear();
first of all in the else branch, or last of all in the if branch, should do it.
Note that sending end-of-file to standard input is usually not something you can recover or "restart" from, so you might only be able to use one standard input "section".

Why does the output come after sleep without a newline?

I'm using gcc 7.3 and g++ 7.3. Both behave unexpectedly. For example,
#include <stdio.h>
#include <unistd.h>
int main() {
printf("a");
sleep(1);
return 0;
}
'a' is printed only after waiting 1 second, but when I use printf("a\n"); it works correctly. It's the same in C++. For example,
#include <iostream>
#include <unistd.h>
int main() {
std::cout << "a";
sleep(1);
return 0;
}
'a' is printed only after waiting 1 second, too. However, when I use std::cout << "a" << std::endl; it works correctly. What's the problem and how do I fix it?
sleep() suspends your process; it does not flush any output. printf() puts the data into the stdout stream's buffer, not directly on the screen.
printf("a"); /* data is in the stdout buffer, not yet flushed */
sleep(1); /* the process (a.out) waits here, so the data is still not on screen */
So you should either call fflush(stdout) or print \n to flush the stdout stream.
You see this behaviour because stdout is usually line buffered when connected to a terminal and fully buffered when connected to a file. Output is stored in a buffer and flushed when a newline is written (in line-buffered mode), when the buffer fills, or when the program terminates.
You can also override the buffering mode by using setvbuf as below
setvbuf(stdout, NULL, _IONBF, 0);
printf("a");
This prints a without buffering. Have a look at https://www.tutorialspoint.com/c_standard_library/c_function_setvbuf.htm for details on setvbuf.
Also have a look at different types of buffering with streams.
Hope this helps you.
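The effect of a flush can also be made visible with a file stream, which buffers in the same way. The sketch below is only an illustration: the names file_size, Sizes and flush_demo are made up, and the "before" value depends on the library's buffer size (it is typically 0 but not guaranteed).

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Size of the file at `path` as seen by a separate reader, or -1 on error.
long long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long long>(in.tellg()) : -1;
}

struct Sizes { long long before_flush; long long after_flush; };

// Writes one character and reports the on-disk size before and after flushing.
Sizes flush_demo(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out << 'a';                        // usually still sitting in the stream's buffer
    Sizes s;
    s.before_flush = file_size(path);  // typically 0 here, but implementation-dependent
    out << std::flush;                 // hand the buffered data to the OS
    s.after_flush = file_size(path);   // the character is now visible to other readers
    out.close();
    std::remove(path.c_str());         // clean up the scratch file
    return s;
}
```

For the program in the question the equivalent is std::cout << "a" << std::flush; (or fflush(stdout) after printf) before calling sleep(1).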

CMD Prompt C++: Limiting literals entered on screen

I hope the question isn't too ambiguous.
When I ask:
int main()
{
string name = {""};
cout << "Please enter a name: " << endl;
getline(cin, name);
//user enters 12 characters stop displaying next literal keypresses.
}
I would like to limit the number of characters the user can enter on screen. For example, can the screen stop displaying characters after length 12?
If so, what would be the library and function for doing something like this?
I want this because I have ASCII art drawn on the CMD, and when I cout the statement at x,y, any input over 12 characters long draws over the ASCII art.
I hope this makes sense :'{ Thank you!
By default the console is in cooked mode (canonical mode, line mode, ...). This means that
the console driver buffers data before it hands it to your application, and
characters are automatically echoed back to the console by the console driver.
Normally, this means that your program only ever gets hold of the input after a line ends, i.e. when enter is pressed. Because of the auto-echo, those characters are then already on screen.
Both settings can be changed independently; however, the mechanism is, unfortunately, an OS-specific call:
For Windows it's SetConsoleMode():
HANDLE hStdin = GetStdHandle(STD_INPUT_HANDLE);
DWORD mode = 0;
// get chars immediately
GetConsoleMode(hStdin, &mode);
SetConsoleMode(hStdin, mode & ~ENABLE_LINE_INPUT);
// disable input echo, set after 12th char.
GetConsoleMode(hStdin, &mode);
SetConsoleMode(hStdin, mode & ~ENABLE_ECHO_INPUT);
As noted by yourself, Windows still provides conio.h including a non-echoing _getch() (with underscore, nowadays). You can always use that and manually echo the characters. _getch() simply wraps the console line mode on/off, echo on/off switch into a function.
Edit: There is meant to be an example of the use of _getch() here. I'm a little too busy to get it done properly, so I refrained from posting potentially buggy code.
Under *nix you will most likely want to use curses/termcap/terminfo. If you want a leaner approach, the low level routines are documented in termios/tty_ioctl:
#include <sys/types.h>
#include <termios.h>
#include <unistd.h> // STDIN_FILENO
struct termios tcattr;
// enable non-canonical mode, get raw chars as they are generated
tcgetattr(STDIN_FILENO, &tcattr);
tcattr.c_lflag &= ~ICANON;
tcsetattr(STDIN_FILENO, TCSAFLUSH, &tcattr);
// disable echo
tcgetattr(STDIN_FILENO, &tcattr);
tcattr.c_lflag &= ~ECHO;
tcsetattr(STDIN_FILENO, TCSAFLUSH, &tcattr);
You can use scanf("%c",&character) on a loop from 1 to 12 and append them to a pre-allocated buffer.
As in my comments, I mentioned a method I figured out using _getch(); and
displaying each char manually.
simplified version:
#include <iostream>
#include <string>
#include <conio.h>
using namespace std;
string name = "";
int main()
{
char temp;
cout << "Enter a string: ";
for (int i = 0; i < 12; i++) { //Replace 12 with character limit you want
temp = _getch();
name += temp;
cout << temp;
}
system("PAUSE");
}
This lets you cout each key-press as it's pressed,
while concatenating each character pressed to a string called name.
Then later on, in whatever program you use this in, you can display the full name as a single string.

visual studio c++ cin big string from command line

When I run the following program and paste 50000 symbols into the command line, the program gets only 4096 symbols. Could you please suggest what to do in order to get the full list of symbols?
#include <iostream>
#include <string>
using namespace std;
int main()
{
char temp[50001];
while (cin.getline(temp, 50001, '\n'))
{
string s(temp);
cout << s.size() << endl;
}
return 0;
}
P.S.
When I read the symbols from file using fstream, it's OK
I'm taking a leap here, but since many PowerShell terminals have 4096-character truncation limits (take a look at the Out-File documentation), this is likely a Windows command-line limitation rather than a getline limitation.
The same problem has been encountered previously by others: https://github.com/Discordia/large-std-input/blob/master/LargeStdInput/Main.cpp
I don't understand why you are reading into a character array, then transferring it into a string.
In any case, your issue may be with repeated allocations.
Reading into std::string directly
Two simple lines:
std::string s;
getline(cin, s, '\n');
Reading into an array first
Yes, there is a simpler method:
#define BUFFER_SIZE 8196 // Very important, named constant
char temp[BUFFER_SIZE];
cin.getline(temp, BUFFER_SIZE, '\n');
// Get the number of characters actually read
std::streamsize chars_read = cin.gcount();
// gcount() also counts the extracted '\n', which is not stored in temp
if (cin.good() && chars_read > 0) --chars_read;
std::string s(temp, chars_read); // Here's how to transfer the characters.
Using a debugger, view the value in chars_read to verify that the number of characters read is valid.
Binary reading
Some platforms provide translations between the data read and your program. For example, Windows uses Ctrl-Z as an EOF character; Linux uses Ctrl-D.
The input data may use UTF encoding and contain values outside the range of ASCII printable set.
So, the preferred method is to read from a stream opened in binary mode. Unfortunately, cin cannot be opened easily in binary mode.
See Open cin in binary
The preferred method, if possible, is to put the text into a file and read from the file.

How do I run a program from another program and pass data to it via stdin in c or c++?

Say I have an .exe, lets say sum.exe. Now say the code for sum.exe is
void main ()
{
int a,b;
scanf ("%d%d", &a, &b);
printf ("%d", a+b);
}
I wanted to know how I could run this program from another c/c++ program and pass input via stdin like they do in online compiler sites like ideone where I type the code in and provide the stdin data in a textbox and that data is accepted by the program using scanf or cin. Also, I wanted to know if there was any way to read the output of this program from the original program that started it.
The easiest way I know of is the popen() function. It is available on UNIX, and on Windows as _popen(). On the other hand, popen() only allows unidirectional communication.
For example, to pass information to sum.exe (although you won't be able to read back the result), you can do this:
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *f;
f = popen ("sum.exe", "w");
if (!f)
{
perror ("popen");
exit(1);
}
printf ("Sending 3 and 4 to sum.exe...\n");
fprintf (f, "%d\n%d\n", 3, 4);
pclose (f);
return 0;
}
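For the other direction (reading the child's output instead of writing its input), popen() can be opened in "r" mode. The sketch below assumes a POSIX system; capture_output is a made-up name, and the command string is run through the shell, so it should not be built from untrusted input.

```cpp
#include <stdio.h>   // popen, pclose (POSIX)
#include <string>

// Hypothetical helper: runs `command` through the shell and returns
// everything it wrote to its standard output.
std::string capture_output(const std::string& command) {
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) {
        return output;                  // could not start the command
    }
    char buffer[256];
    size_t n;
    while ((n = fread(buffer, 1, sizeof(buffer), pipe)) > 0) {
        output.append(buffer, n);       // collect the output chunk by chunk
    }
    pclose(pipe);
    return output;
}
```

Since the command goes through the shell, input can still be supplied with shell syntax, for example capture_output("echo 3 4 | ./sum.exe").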
In C on platforms whose name end with X (i.e. not Windows), the key components are:
pipe - Returns a pair of file descriptors, so that what's written to one can be read from the other.
fork - Forks the process to two, both keep running the same code.
dup2 - Renumbers file descriptors. With this, you can take one end of a pipe and turn it into stdin or stdout.
exec - Stop running the current program, start running another, in the same process.
Combine them all, and you can get what you asked for.
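Combined, the four calls could look roughly like the sketch below. It assumes a POSIX system; run_with_input is a made-up name, error handling is minimal, and it writes all input before reading any output, so it is only suitable for small amounts of data (large transfers in both directions could block).

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `program` (looked up via PATH), sends `input`
// to its stdin and returns what it wrote to stdout. Returns "" on setup errors.
std::string run_with_input(const char* program, const std::string& input) {
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0) return "";
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return "";
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return "";
    }
    if (pid == 0) {
        // Child: make the pipe ends its stdin/stdout, then become `program`.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp(program, program, (char*)nullptr);
        _exit(127);                      // only reached if exec failed
    }
    // Parent: close the ends that belong to the child.
    close(to_child[0]);
    close(from_child[1]);
    const char* p = input.data();
    size_t left = input.size();
    while (left > 0) {
        ssize_t w = write(to_child[1], p, left);
        if (w <= 0) break;
        p += w;
        left -= static_cast<size_t>(w);
    }
    close(to_child[1]);                  // the child now sees end-of-file on stdin
    std::string output;
    char buf[256];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof(buf))) > 0) {
        output.append(buf, static_cast<size_t>(n));
    }
    close(from_child[0]);
    waitpid(pid, nullptr, 0);            // reap the child
    return output;
}
```

For the program in the question, run_with_input("./sum.exe", "3 4\n") would be the corresponding call.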
This is my solution and it worked:
sum.cpp
#include "stdio.h"
int main (){
int a,b;
scanf ("%d%d", &a, &b);
printf ("%d", a+b);
return 0;
}
test.cpp
#include <stdio.h>
#include <stdlib.h>
int main(){
system("./sum.exe < data.txt");
return 0;
}
data.txt
3 4
Try this solution :)
How to do so is platform dependent.
Under windows, Use CreatePipe and CreateProcess. You can find example from MSDN :
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682499(v=vs.85).aspx
Under Linux/Unix, you can use dup() / dup2()
One simple way to do so is to use a Terminal (like command prompt in windows) and use | to redirect input/output.
Example:
program1 | program2
This will redirect program1's output to program2's input.
To retrieve/input data, you can use temporary files. If you don't want to use temporary files, you will have to use a pipe.
For Windows (use the command prompt):
program1 <input >output
For Linux, you can use the tee utility; you can find detailed instructions by typing man tee in a Linux terminal.
It sounds like you're coming from a Windows environment, so this might not be the answer you are looking for, but from the command line you can use the pipe redirection operator '|' to redirect the stdout of one program to the stdin of another. http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/redirection.mspx?mfr=true
You're probably better off working in a bash shell, which you can get on Windows with cygwin http://cygwin.com/
Also, your example looks like a mix of C++ and C, and the declaration of main isn't exactly an accepted standard for either.
How to do this (you have to check for errors, i.e. pipe()==-1, dup()!=0, etc.; I'm not doing this in the following snippet).
This code runs your program "sum", writes "2 3" to it, and then reads sum's output. Next, it writes the output to stdout.
#include <cstdio>  // printf, fileno
#include <cstdlib> // exit
#include <sys/wait.h>
#include <unistd.h>
int main() {
int parent_to_child[2], child_to_parent[2];
pipe(parent_to_child);
pipe(child_to_parent);
char name[] = "sum";
char *args[] = {name, NULL};
switch (fork()) {
case 0:
// replace stdin with reading from parent
close(fileno(stdin));
dup(parent_to_child[0]);
close(parent_to_child[0]);
// replace stdout with writing to parent
close(fileno(stdout));
dup(child_to_parent[1]);
close(child_to_parent[1]);
close(parent_to_child[1]); // dont write on this pipe
close(child_to_parent[0]); // dont read from this pipe
execvp("./sum", args);
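_exit(127); // only reached if execvp fails
default:
char msg[] = "2 3\n";
close(parent_to_child[0]); // dont read from this pipe
close(child_to_parent[1]); // dont write on this pipe
write(parent_to_child[1], msg, sizeof(msg) - 1); // do not send the trailing '\0'
close(parent_to_child[1]);
char res[64];
ssize_t n = read(child_to_parent[0], res, sizeof(res) - 1); // read before waiting
res[n > 0 ? n : 0] = '\0'; // read() does not null-terminate
wait(0);
printf("%s", res);
exit(0);
}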
}
I'm doing what @ugoren suggested in their answer:
Create two pipes for communication between processes
Fork
Replace stdin, and stdout with pipes' ends using dup
Send the data through the pipe
Based on a few answers posted above and various tutorials/manuals, I just did this in Linux using pipe() and shell redirection. The strategy is to first create a pipe, call another program and redirect the output of the callee from stdout to one end of the pipe, and then read the other end of the pipe. As long as the callee writes to stdout there's no need to modify it.
In my application, I needed to read a math expression input from the user, call a standalone calculator and retrieve its answer. Here's my simplified solution to demonstrate the redirection:
#include <string>
#include <unistd.h>
#include <sstream>
#include <iostream>
#include <cstdlib> // exit, system
// reads once from the pipe; read() blocks until data is available
std::string pipeRead(int fd) {
char data[100];
ssize_t size = read(fd, data, sizeof(data));
if (size <= 0) {
return ""; // error or end of file
}
// data is not null-terminated, so pass the length explicitly
return std::string(data, size);
}
int main() {
// create pipe
int calculatorPipe[2];
if(pipe(calculatorPipe) < 0) {
exit(1);
}
std::string answer = "";
std::stringstream call;
// redirect calculator's output from stdout to one end of the pipe and execute
// e.g. ./myCalculator 1+1 >&8
call << "./myCalculator 1+1 >&" << calculatorPipe[1];
system(call.str().c_str());
// now read the other end of the pipe
answer = pipeRead(calculatorPipe[0]);
std::cout << "pipe data " << answer << "\n";
return 0;
}
Obviously there are other solutions out there but this is what I can think of without modifying the callee program. Things might be different in Windows though.
Some useful links:
https://www.geeksforgeeks.org/pipe-system-call/
https://www.gnu.org/software/bash/manual/html_node/Redirections.html