Illegal Seek with really_input_string on stdin - ocaml

I am retrofitting some code to accept input from stdin (in addition to files).
print_string (really_input_string stdin (in_channel_length stdin))
This works when I redirect stdin:
$ ./a.out < /tmp/lorem.txt
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
But otherwise it fails without waiting for input from me:
$ ./a.out
Fatal error: exception Sys_error("Illegal seek")
$
Or:
$ cat /tmp/lorem.txt | ./a.out
Fatal error: exception Sys_error("Illegal seek")
How do I get the latter also to work?

The Unix seek operation is meaningful only for regular files, i.e., files stored on disk (or similar randomly addressable media). On a pipe or a terminal a seek can't do anything useful, and POSIX specifies that lseek() on a pipe fails with ESPIPE, whose error string is exactly the "Illegal seek" you're seeing.
At any rate, the problem is that in_channel_length seeks to the end of the file to determine how big it is, and that fails when the input comes from a terminal or a pipe.
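You can reproduce the failing operation directly with the Unix module (a sketch, assuming the unix library is linked; in_channel_length performs essentially this seek internally):

let () =
  (* On Linux this raises Unix.Unix_error (Unix.ESPIPE, "lseek", "")
     when stdin is a pipe or a terminal -- "Illegal seek". *)
  let len = Unix.lseek Unix.stdin 0 Unix.SEEK_END in
  Printf.printf "input length: %d\n" len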
It's hard to see how the code could ever work as expected when input comes from a pipe or terminal: the data has no length that can be known up front.
I suggest you write your own loop to read until you see EOF.
Here's a crude implementation that's probably good enough for a text file:
let my_really_read_string in_chan =
  let res = Buffer.create 1024 in
  (* Accumulate lines until End_of_file; input_line works on any kind
     of channel, seekable or not. *)
  let rec loop () =
    match input_line in_chan with
    | line ->
      Buffer.add_string res line;
      Buffer.add_string res "\n";
      loop ()
    | exception End_of_file -> Buffer.contents res
  in
  loop ()
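With that function in place, the original one-liner becomes:

print_string (my_really_read_string stdin)

This reads until end-of-file, so it behaves the same whether stdin is a regular file, a pipe, or a terminal (where you signal EOF with Ctrl-D). One caveat: it appends "\n" after every line, so input that doesn't end with a newline gets one added.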

Related

Double echo when running commands under a pty

I'm writing a program to create a pty, then fork and execute an ssh command with the slave side of the pty as its stdin. The full source code is here.
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <fcntl.h>
using namespace std;

int main() {
    int fd = posix_openpt(O_RDWR);
    grantpt(fd);
    unlockpt(fd);
    pid_t pid = fork();
    if (pid == 0) { // slave: the slave side of the pty becomes ssh's stdin
        freopen(ptsname(fd), "r", stdin);
        execlp("ssh", "ssh", "user@192.168.11.40", NULL);
    } else { // master: forward our own stdin to the master side
        FILE *f = fdopen(fd, "w");
        string buf;
        while (true) {
            getline(cin, buf);
            if (!cin) {
                break;
            }
            fprintf(f, "%s\n", buf.c_str());
        }
    }
}
After executing this program and inputting just echo hello (and a newline), the child command re-sends my input before its own output, thus duplicating my input line:
~ $ echo hello
echo hello #duplication
hello
~ $
I think this is due to the fact that a pty behaves almost the same as a normal terminal. If I add freopen("log.txt", "w", stdout); and input the same command, I get just
echo hello #This is printed because I typed it.
and the contents of log.txt is this:
~ $ echo hello #I think this is printed because a pty simulates input.
hello
~ $
How can I avoid the duplication?
Is that realizable?
I know it is somehow realizable, but I don't know how. In fact, the rlwrap command behaves the same as my program, except that it doesn't have any duplication:
~/somedir $ rlwrap ssh user@192.168.11.40
~ $ echo hello
hello
~ $
I'm reading the source code of rlwrap now, but haven't yet understood its implementation.
Supplement
As suggested in this question (to me, the OP was more helpful than the answer), unsetting the ECHO terminal flag disables the double echoing. In my case, adding this snippet to the slave block solved the problem:
termios terminal_attribute;
int fd_slave = fileno(fopen(ptsname(fd_master), "r"));
tcgetattr(fd_slave, &terminal_attribute);
terminal_attribute.c_lflag &= ~ECHO;  // stop the slave side from echoing input
tcsetattr(fd_slave, TCSANOW, &terminal_attribute);
It should be noted that this is not what rlwrap does. As far as I tested, rlwrap <command> never duplicates its input line for any <command>, whereas my program still echoes twice for some <command>s. For example,
~ $ echo hello
hello #no duplication
~ $ /usr/bin/wolfram
Mathematica 12.0.1 Kernel for Linux ARM (32-bit)
Copyright 1988-2019 Wolfram Research, Inc.
In[1]:= 3 + 4
3 + 4 #duplication (my program makes this while `rlwrap` doesn't)
Out[1]= 7
In[2]:=
Is this because the <command> (ssh when I run wolfram remotely) re-enables echoing? Anyway, I should keep reading the source code of rlwrap.
As you already observed, after the child has called exec() the terminal flags of the slave side are not under your control anymore, and the child may (and often will) re-enable echo. This means that it is not of much use to change the terminal flags in the child before calling exec().
Both rlwrap and rlfe solve the problem in their own (different) ways:
rlfe keeps the entered line, but removes the echo'ed input from the child's output before displaying it
rlwrap removes the entered line and lets it be replaced by the echo
Whatever approach you use, you have to know whether your input has been (in rlfe's case) or will be (in rlwrap's case) echoed back. rlwrap, at least, does this by not closing the pty's slave end in the parent process, and then watching its terminal settings (in this case, the ECHO bit in its c_lflag) to know whether the slave will echo or not.
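A minimal sketch of that check, assuming the parent kept the slave descriptor open in a variable fd_slave (a hypothetical name):

#include <termios.h>

/* Nonzero if the slave side currently has ECHO enabled, i.e. the
   child's terminal will echo back what we write to the master side. */
int slave_will_echo(int fd_slave)
{
    struct termios attrs;
    if (tcgetattr(fd_slave, &attrs) != 0)
        return 1;  /* can't tell; assume it echoes */
    return (attrs.c_lflag & ECHO) != 0;
}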
All this is rather cumbersome, of course. The rlfe approach is probably easier, as it doesn't require the use of the readline library, and you could simply strcmp() the received output with the input you just sent (which will only go wrong in the improbable case of a cat command that disables echo on its input).
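A sketch of that comparison in the master's read loop; sent and received are hypothetical buffers holding the last line written to the child and the next line read back from it:

#include <stdio.h>
#include <string.h>

/* Suppress the child's echo of our own input; print everything else. */
void filter_echo(const char *sent, const char *received)
{
    if (strcmp(sent, received) == 0)
        return;  /* our own input echoed back: drop it */
    fputs(received, stdout);
}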

Confused by pipes. 'cat -A' seems to filter out part of output

I am seeing this.
cat test.hs |./TonHospel
25 x 25 matrix, 8 threads
permanent=-5258461839360 (0.213000 s)
user#user-desktop:~/python$ cat test.hs |./TonHospel |cat -A
25 x 25 matrix, 8 threads$
For some reason cat -A is filtering out part of the output. I guessed it might somehow be related to stderr and stdout, so I tried redirecting stderr into stdout. This didn't help.
user#user-desktop:~/python$ cat test.hs |./TonHospel 2>&1 |cat -A
25 x 25 matrix, 8 threads$
Lastly, I just randomly tried this:
user#user-desktop:~/python$ cat test.hs |./TonHospel 3>&1 1>&2 2>&3 |cat -A
25 x 25 matrix, 8 threads
permanent=-5258461839360 (0.236000 s)
What is going on? The C++ source code is at https://bpaste.net/show/ce5ca8643ba5 .
You call quick_exit at the end of main, rather than simply returning an exit code. That is extremely dangerous, as quick_exit does not bother to clean up the execution environment. In particular, it does not flush the buffer associated with stdout.
That won't be a problem if stdout is line-buffered, as it will be on most systems if it is attached to a terminal. But if it is fully buffered, as it will be if it is attached to a pipe, then output may be lost, which is what you are seeing.
That's not the only questionable programming practice in your code, but I believe it is the immediate problem.
(By the way, the first line is correctly printed because std::cout << std::endl; explicitly flushes the cout buffer. Mixing C++ and C output functions is also a bad idea, though.)
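If you want to keep quick_exit, one portable fix is to register a flush handler with at_quick_exit, which runs before the process terminates. A minimal sketch:

#include <stdio.h>
#include <stdlib.h>

static void flush_stdout(void)
{
    fflush(stdout);
}

int main(void)
{
    at_quick_exit(flush_stdout);  /* handlers run when quick_exit is called */
    printf("some fully buffered output\n");
    quick_exit(0);                /* stdout is flushed by the handler */
}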
When TonHospel detects that its standard output is a terminal, it writes an additional line (such as permanent=-5258461839360 (0.213000 s)) to standard error. Nothing is written to standard error when standard output is something else, such as a pipe or a file.
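That is, the extra line is guarded by something like this (a sketch of the described behavior, not TonHospel's actual source):

#include <stdio.h>
#include <unistd.h>

/* Print timing information only when a person is watching stdout. */
void report_timing(double seconds)
{
    if (isatty(fileno(stdout)))
        fprintf(stderr, "permanent=... (%f s)\n", seconds);
}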

How to get the full cmdline that is used to call a program, if the cmdline has multiple pipes

I'm writing a framework to track how people use my utilities, for example the utility 'result'.
So I want to put a piece of code into result.cxx's main() that will log things like:
1. what arguments were given to result (argc, argv)
2. what other programs were used to process data and pipe it to my utility 'result'
eg:
I'm trying to run a program 'result' which is provided input from pipes like
abc -e earg -t targ | process_stuff_out_data | result -s sarg
Now, in result.cxx, this is what I use to capture the piped input:
std::string piped_data;
std::getline(std::cin, piped_data);
This works in cases like:
echo "1 2 3 " | result -i in_number
// here the piped input is just "1 2 3" so i am able to log it from result
but it won't work in cases where the output from the previous program is a stream of binary data:
abc -e earg -t targ | out_bin_data | result -s sarg
In this case I just want to log:
LOG_PIPED_STUFF: abc -e earg -t targ | process_stuff_out_data
std::getline won't return until it reads a newline; see here: http://www.cplusplus.com/reference/string/string/getline
Use another separator token, or use another function that reads from standard input as the data becomes available.
You can, for example, loop over fread() on stdin and use feof(stdin) to detect when the input is exhausted.
If you are on Linux you can use select(2) to wait for input on file descriptor 0.
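A sketch combining both suggestions: wait for data on descriptor 0 with select(2), then read whatever arrived with fread(), which is binary-safe:

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(0, &readfds);  /* descriptor 0 is stdin */
    /* Block until at least one byte is readable (or EOF arrives). */
    if (select(1, &readfds, NULL, NULL, NULL) > 0) {
        char buf[4096];
        size_t n = fread(buf, 1, sizeof buf, stdin);
        fprintf(stderr, "read %zu bytes\n", n);  /* log them as needed */
    }
    return 0;
}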
Possible reasons you are not getting data:
the data is binary and doesn't have newlines. You need to use binary I/O calls.
the data is buffered. You need to either flush the stdout of the middle process process_stuff_out_data if you can rewrite it, or just wait until abc and process_stuff_out_data exit.
process_stuff_out_data is writing to stderr, not stdout. Or it is writing to the console (ugh), i.e. /dev/console.

OCaml - Fatal error: exception Sys_error("Broken pipe") when using `| head` on output containing many lines

I have text file with many lines. I want to write a simple OCaml program that will process this file line by line and maybe print the line.
For writing this program, I first created a smaller file, with fewer lines - so that program will finish executing faster.
$ wc -l input/master
214745 input/master
$ head -50 input/master > input/small-master
Here is the simple boilerplate filter.ml program I wrote:
open Core.Std;;
open Printf;;
open Core.In_channel;;

if Array.length Sys.argv >= 2 then begin
  let rec process_lines ?ix master_file =
    let ix = match ix with
      | None -> 0
      | Some x -> x
    in
    match input_line master_file with
    | Some line -> (
        if ix > 9 then printf "%d == %s\n" ix line;
        process_lines ~ix:(ix+1) master_file
      )
    | None -> close master_file
  in
  let master_file = create Sys.argv.(1) in
  process_lines master_file
end
It takes the input file's location as a command line argument, creates a file-handle for reading this file and calls the recursive function process_lines with this file-handle as an argument.
process_lines uses the optional argument ix to count line numbers as it reads from the file-handle line by line, and prints each line it reads (skipping the first ten) to standard output.
Then, when, I execute the program on the smaller input file and pipe the output to the Linux head command everything works fine:
$ ./filter.native input/small-master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt
And, when, I execute the program on the larger file I see a broken-pipe error:
$ ./filter.native input/master |head -2
10 == 1000032|BINCH JAMES G|4|2012-11-13|edgar/data/1000032/0001181431-12-058269.txt
11 == 1000032|BINCH JAMES G|4|2012-12-03|edgar/data/1000032/0001181431-12-061825.txt
Fatal error: exception Sys_error("Broken pipe")
Raised by primitive operation at file "pervasives.ml", line 264, characters 2-40
Called from file "printf.ml", line 615, characters 15-25
Called from file "find.ml", line 13, characters 21-48
Called from file "find.ml", line 19, characters 2-27
I learnt that such broken-pipe errors occur when the reader of a pipe (the head command in this case) exits before the writer of the pipe (my OCaml program in this case) has finished writing. This is why I would never get such an error if I used the tail command as the reader.
However, why didn't the broken-pipe error occur when the file had fewer lines?
The broken pipe signal is a basic part of the Unix design. When you have a pipeline a | b where b reads only a small amount of data, you don't want a to waste its time writing after b has read all it needs. To make this happen, Unix sends the broken pipe signal to a process that writes to a pipe that nobody is reading. In the usual case, this causes the program to exit silently (i.e., it kills the program), which is just what you want.
In this hypothetical example, b exits after reading a few lines, which means nobody is reading the pipe. The next time a tries to write more output, it gets sent the broken pipe signal and exits.
In your case a is your program and b is head.
It appears that the OCaml runtime is noticing the signal and is not exiting silently. You could consider this a flaw, or maybe it's good to know whenever a signal has terminated your program. The best way to fix it would be to catch the signal yourself and exit silently.
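In OCaml both fixes are short; a sketch, where run_program stands for your program's entry point (a hypothetical name) and the details depend on how Core sets up signals:

(* Option 1: restore the default SIGPIPE behavior, so the process is
   killed silently, the way head expects its upstream to behave. *)
let () = Sys.set_signal Sys.sigpipe Sys.Signal_default

(* Option 2: catch the exception that an ignored SIGPIPE turns into. *)
let () =
  try run_program () with
  | Sys_error "Broken pipe" -> exit 0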
The reason it doesn't happen for the small file is that the whole output fits into the pipe. (A pipe represents a buffer of 64K bytes or so.) Your program just writes its data and exits; there's not enough time for your program to try to write to a pipe with no reader.

How to read an input file with both argv and redirection from an input file

My program needs to accept three kinds of input commands below:
./Myprogram input.txt
./Myprogram < input.txt
./Myprogram
I'm thinking about using argc to check the number of arguments to resolve the first two situations (since redirection doesn't count as an argument). But then I got stuck on the last case, which simply waits for user input.
I'm wondering if there is a way to tell if a redirection is present in the shell command?
For a more complicated scenario, such as a mix of redirection and argv forms (see below), is there a way to do it, or is it simply a bad design for taking user commands?
./Myprogram input1.txt input2.txt input3.txt
./Myprogram input1.txt < input2.txt input3.txt
./Myprogram
Any help will be much appreciated!
Z.Zen
Redirection will never be seen by your program as an argument. So in:
./Myprogram input.txt
./Myprogram < input.txt
./Myprogram
the second and third forms are identical. As for your second set of possibilities:
./Myprogram input1.txt input2.txt input3.txt
./Myprogram input1.txt < input2.txt input3.txt
./Myprogram
the second line is equivalent to:
./Myprogram input1.txt input3.txt < input2.txt
and it's also indistinguishable from:
./Myprogram input1.txt input3.txt
(the only difference being where standard input actually comes from).
A typical way some programs handle mixed input from stdin and files specified on the command line is to accept "-" as a special filename meaning "use stdin as the input file at this position in the argument list". Many such programs will default to processing a singleton-list of "-" if the argument list is empty.
The basic algorithm is:
if (there are no arguments left after parsing options)
    call_function(stdin);
else
{
    foreach remaining argument
    {
        FILE *fp;
        if (strcmp(argument, "-") == 0)
            call_function(stdin);
        else if ((fp = fopen(argument, "r")) == 0)
            ...error handling...
        else
        {
            call_function(fp);
            fclose(fp);
        }
    }
}
You could pass the file name to call_function() too, and sometimes I write the code with the output file stream specified as well. That function (call_function()) is what processes one file, reading to the end of the file. It does not close the file; it was given an open file and should not close it.
The first 'if' deals with the I/O redirection case, of course.
I wrote, many years ago, a function to handle this loop. It simplifies my life whenever I need to write a command in this 'UNIX filter' idiom, which is quite often. Along with a standardized error-reporting package, it greatly simplifies life. Having it as a function also permits me to use variants of it, such as one that creates a backup of the file before it is overwritten, or one that safely overwrites the file if the function completes successfully.
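A minimal runnable version of the idiom, with a stand-in process() that just copies its stream to stdout (a hypothetical name; substitute your real per-file function):

#include <stdio.h>
#include <string.h>

static void process(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF)
        putchar(c);
}

int main(int argc, char **argv)
{
    if (argc == 1)
        process(stdin);             /* covers both redirection and a terminal */
    else {
        for (int i = 1; i < argc; i++) {
            if (strcmp(argv[i], "-") == 0)
                process(stdin);     /* "-" means stdin at this position */
            else {
                FILE *fp = fopen(argv[i], "r");
                if (fp == NULL) {
                    perror(argv[i]);
                    continue;
                }
                process(fp);
                fclose(fp);
            }
        }
    }
    return 0;
}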
@R.. is correct for the usual cases.
If you want to have interactive behavior in case #3 but not #2, beyond letting the terminal buffer the user's input by line, you can use isatty (specifically isatty(0)) to determine whether there's a person on the other end.
This is not standard C, but neither is the notion of a terminal or a shell!
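For example (a sketch):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (isatty(0))  /* a person is typing: case 3 */
        fprintf(stderr, "enter input, Ctrl-D to finish:\n");
    /* ... read stdin the same way in every case ... */
    return 0;
}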
I'm wondering if there is a way to tell if a redirection is present in the shell command?
No. From your program's point of view there is no difference between these two cases:
./Myprogram < input.txt
./Myprogram
In both cases the program takes no command-line arguments and gets its input from standard input. In the first case it is the shell that connects the contents of the file input.txt to your program's stdin; your program knows nothing about this.
It is possible to tell whether there is data to read by using select() on stdin. This will not tell you whether there is a redirection (you won't be able to tell when the file is empty, or when for some reason the user managed to put something on stdin before your program got a chance to test for it). Whether it works for your case or not depends on what you want to do in borderline cases.
Alternatively, you can use isatty() on stdin to find out if it's a tty or not. It's what most programs will do to find out whether they are interactive or not, and probably better in your case.
Now, you may notice that blocking to wait for user input in the third case is what all standard tools do, and it is probably the behavior most users expect of your program too.