I would like to set up a script that continuously parses a specific marker in an XML file.
The script contains the following while loop:
function scan_t()
{
    INPUT_FILE=${1}
    while : ; do
        if [[ -f "$INPUT_FILE" ]]
        then
            ret=`cat ${INPUT_FILE} | grep "<data>" | awk -F"=|>" '{print $2}' | awk -F"=|<" '{print $1}'`
            if [[ "$ret" -ne 0 ]] && [[ -n "$ret" ]]
            then
                ...
            fi
        fi
    done
}
scan_t "/tmp/test.xml"
The line format is:
<data>0</data> or <data>1</data> <data>2</data> ...
Even if the condition if [[ -f "$INPUT_FILE" ]] has been added to the script, sometimes I get:
cat: /tmp/test.xml: No such file or directory
Indeed, $INPUT_FILE is normally consumed by another process which is charged to suppress the file after reading it.
This while loop is only used for testing; the cat error doesn't matter, but I would like to hide this output because it pollutes the terminal a lot.
If some other process can also read and remove the file before this script sees it, you've designed your system with a race condition. (I assume that "charged to suppress" means "designed to unlink"...)
If it's optional for this script to see every input file, then just redirect stderr to /dev/null (i.e. ignore errors when the race condition bites). If it's not optional, then have this script rename the input file to something else, and have the other process watch for that. Check for that file existing before you do the rename, to make sure you don't overwrite a file the other process hasn't read yet.
Your loop has a horrible design. First, you're busy-waiting (with no sleep at all) on the file coming into existence. Second, you're running 4 programs when the input exists, instead of 1.
The busy-wait can be avoided by using inotifywait to watch the directory for changes. So the if [[ -f $INPUT_FILE ]] loop body only runs after a modification to the directory, rather than as fast as a CPU core can run it.
The second is simpler to address: never write cat file | something. Use something file, or something < file if something doesn't take filenames on its command line (or behaves differently when given one). cat is only useful when you have multiple files to concatenate. For reading a file into a shell variable, use foo=$(<file).
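For instance, each of these avoids the extra processes (a sketch, reusing $INPUT_FILE from the question):
grep "<data>" "$INPUT_FILE"                  # 'something file' instead of 'cat file | something'
awk -F'=|>' '{print $2}' < "$INPUT_FILE"     # 'something < file' for a program reading stdin
ret=$(<"$INPUT_FILE")                        # whole file into a variable, no cat needed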
I see from comments you've already managed to turn your whole pipeline into a single command. So write
INPUT_FILE=foo
inotifywait -m -e close_write -e moved_to --format %f . |
while IFS= read -r event_file; do
    [[ $event_file == $INPUT_FILE ]] &&
        awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
    # echo "$event_file" &&
    # date
done
# tested and working with the commented-out echo/date commands
Note that I'm waiting for close_write and moved_to, rather than other events, to avoid jumping the gun and reading a file that's not finished being written. Put $INPUT_FILE in its own directory, so you don't get false-positive events waking up your loop for other filenames.
To also implement the rename-to-input-for-next-stage suggestion, you'd put a while [[ -e $INPUT2 ]]; do sleep 0.2; done; mv -n "$INPUT_FILE" "$INPUT2" busy-wait loop after the awk.
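Spelled out, that handoff could look like this sketch, where $INPUT2 stands in for whatever name the next stage watches:
while [[ -e $INPUT2 ]]; do sleep 0.2; done   # wait until the next stage has consumed the previous file
mv -n "$INPUT_FILE" "$INPUT2"                # -n: never overwrite a file that hasn't been read yet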
An alternative would be to run inotifywait once per loop iteration, but that has the potential for you to get stuck with $INPUT_FILE created before inotifywait started watching. So the producer would be waiting for the consumer to consume, and the consumer wouldn't see the event.
# Race condition with an asynchronous producer, DON'T USE
while inotifywait -qq -e close_write -e moved_to .; do
    [[ -e "$INPUT_FILE" ]] &&
        awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
done
There doesn't seem to be a way to tell inotifywait the name of a file that doesn't exist yet, even as a filter, so the loop body needs to test for the specific file existing in the directory before using it.
If you don't have inotifywait available, you could just put a sleep into the loop. GNU sleep supports fractional seconds, like sleep 0.5. Busybox probably doesn't. You might want to write a tiny trivial C program anyway, which keeps trying to open(2) the file in a loop that includes a usleep or nanosleep. When open succeeds, redirect stdin from that, and exec your awk program. That way, there's no race possible between a stat and an open.
#include <unistd.h>    // for usleep/dup2
#include <sys/types.h> // for open
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>     // for perror

void waitloop(const char *path)
{
    // no filename argument: awk reads the already-opened stdin, so there's
    // no window for the file to disappear between our open and awk's read.
    // execv needs argv[0] and a terminating NULL in the array.
    const char *const awk_args[] = { "awk", "-F", "[<,>]",
        "/data/ {printf \"%s \",$3} END {print \"\"}",
        NULL
    };
    while (42) {
        int fd = open(path, O_RDONLY);
        if (-1 != fd) {
            // if you fork() here, you can avoid the shell loop too.
            dup2(fd, 0); // redirect stdin from fd. In theory should check for error here, too.
            close(fd);   // and do this in the parent after fork
            execv("/usr/bin/awk", (char *const *)awk_args); // execv's prototype doesn't promise not to modify the strings
            perror("execv"); // only reached if execv itself fails
            _exit(1);
        } else if (errno != ENOENT) {
            perror("opening the file");
        } // else ignore ENOENT
        usleep(10000); // 10 milliseconds
    }
}
// optional TODO: error-check *all* the system calls.
This compiles, but I haven't tested it. Looping inside a single process doing open / usleep is much lighter weight than running a whole process to do sleep 0.01 from a shell.
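For comparison, the sleep-in-a-shell-loop fallback mentioned above might look like this (a sketch; the interval is arbitrary):
while :; do
    [[ -f $INPUT_FILE ]] &&
        awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
    sleep 0.5   # GNU sleep accepts fractions; use 1 on busybox
done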
Even better would be to use inotify to watch for directory events to detect the file appearing, instead of usleep. To avoid a race, after setting up the inotify watch, do another check for the file existing, in case it got created after your last check, but before the inotify watch became active.
I need to go to another server and perform a word count. Based on the count variable I will perform if/else logic.
However I am unable to do the word count, and further unable to compare the variable value in the if condition.
Error:
wc: cannot open the file v.txt
Script:
#!/bin/bash
ssh u1@s1 "cd ~/path1/ | fgrep-f abc.csv xyz.csv > par.csv | a=$(wc -l par.csv)| if ["$a" == "0"];
then echo "success"
fi"
First, although the wc program is named for 'word count', wc -l actually counts lines, not words. I assume that is what you want even though it isn't what you said.
A shell pipeline one | two | three runs things in parallel, with (only) their stdout and stdin connected. Thus your command runs one subshell that changes directory to ~/path1 and immediately exits with no effect on anything else; at the same time it tries to run fgrep-f (see below) in a different subshell, which has not changed directory and thus probably can't find any file; and in a third subshell it does the assignment a= (see below), which also immediately exits, so it cannot be used for anything.
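You can see the parallelism with a trivial example (the session below is hypothetical):
$ pwd
/home/me
$ cd /tmp | pwd    # cd runs in its own subshell, so pwd is unaffected
/home/me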
You want to do things sequentially:
ssh u@h 'cd path1; fgrep -f abc.csv xyz.csv >par.csv; a=$(wc -l par.csv); if [ "$a" == "0" ] ...'
# you _might_ want to use && instead of ; so that if one command fails
# the subsequent ones aren't attempted (and possibly go further wrong)
Note several other important changes I made:
the command you give ssh to send to the remote must be in single quotes ' not double quotes " if it contains any dollar (or backtick), as yours does; with " the $(wc ...) is done in the local shell before sending the command to the remote (see the example after this list)
you don't need ~/ in ~/path1 because ssh (or really sshd) always starts in your home directory
there is no common command or program fgrep-f; I assume you meant the program fgrep with the flag -f, which must be separated by a space. Also, fgrep, although traditional, is not standard (POSIX); grep -F is preferred
you must have a space after [ and before ]
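To illustrate the quoting point from the first item above (hostnames here are made up):
$ ssh u@h "echo $(hostname)"    # $(hostname) expands locally, before ssh runs
mylocalhost
$ ssh u@h 'echo $(hostname)'    # expands on the remote
remotehost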
However, this won't do what you probably want. The value of $a will be something like 0 par.csv or 1 par.csv or 999 par.csv; it will never equal 0, so your "success" branch will never happen. In addition, there's no need to do these in separate commands: if your actual goal is to check that there are no occurrences in xyz.csv of the (exact/non-regexp) strings in abc.csv, both in path1, you can just do
ssh u@h 'if ! grep -qFf path1/abc.csv path1/xyz.csv; then echo success; fi'
# _this_ case would work with " instead of ' but easier to be consistent
grep (always) sets its exit status to indicate whether it found anything or not; flag -q tells it not to output any matches. So grep -q ... just sets the status to true if it matched and false otherwise; using ! inverts this so that if grep does not match anything, the then clause is executed.
If you want the line count for something else as well, you can do it with a pipe:
'a=$( grep -Ff path1/abc.csv path1/xyz.csv | wc -l ); if [ $a == 0 ] ...'
Not only does this avoid the temp file; when the input to wc is stdin (here the pipe) rather than a named file, it outputs only the number and no filename -- 999 rather than 999 par.csv -- thus making the comparison work right.
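The difference is easy to demonstrate (assuming a par.csv of, say, 3 lines):
$ wc -l par.csv
3 par.csv
$ wc -l < par.csv
3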
I'm a bit new to bash scripting. I have a C++ program communicating back and forth with this bash script through some named pipes. I use inotifywait to watch a folder for new files, and when a new file is added (ending in .job), I send it through the pipe.
The C++ program pipes back the result, and if the result is 'quit', I want the bash script to quit execution.
I was trying to accomplish this with exit 1 as seen below, but that doesn't seem to exit the entire script. Instead, after that exit has run, the script only ends when I drop another file in the watch folder.
I read a bit about subshells, and am wondering if this has something to do with them, and would appreciate suggestions on how to exit the entire script.
DROP_FOLDER="$1"
DATA_FOLDER="$2"
OUTPUT_FOLDER="$3"
PATH_TO_EXECS="./tmp/"
PATH_TO_COMPLETED="../completed/"

# create pipes
read_pipe=/tmp/c_to_bash
write_pipe=/tmp/bash_to_c
if [[ ! -p $read_pipe ]]; then
    mkfifo $read_pipe
fi
if [[ ! -p $write_pipe ]]; then
    mkfifo $write_pipe
fi

# start c++ program
./tmp/v2 $DATA_FOLDER $OUTPUT_FOLDER $PATH_TO_EXECS "${write_pipe}" "${read_pipe}" &

# watch drop folder
inotifywait -m $DROP_FOLDER -e create -e moved_to |
while read path action file; do
    # ends in .tga
    if [[ "$file" =~ .*tga$ ]]; then
        # move to image dir
        mv "${DROP_FOLDER}${file}" "${DATA_FOLDER}${file}"
    fi
    # ends in .job
    if [[ "$file" =~ .*job$ ]]; then
        # pipe to dispatcher
        echo "${DROP_FOLDER}${file}" > $write_pipe
        # wait for result from pipe
        if read line <$read_pipe; then
            echo $line
            # check for quit result
            if [[ "$line" == 'quit' ]]; then
                # move job file to completed
                mv "${DROP_FOLDER}${file}" "${PATH_TO_COMPLETED}${file}"
                # exit
                exit 1
            fi
            # check for continue result
            if [[ "$line" == 'continue' ]]; then
                # move job file to completed
                mv "${DROP_FOLDER}${file}" "${PATH_TO_COMPLETED}${file}"
            fi
        fi
    fi
done
The problem is that exit only exits the current subshell, which in your case is your while loop due to the pipeline.
Bash still waits for inotifywait to exit, which it won't do until it tries to write another value and detects that the pipe is broken.
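A quick way to see this behaviour:
true | { exit 1; }                     # the exit only leaves the right-hand subshell
echo "still running, status was $?"    # prints: still running, status was 1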
To work around it, you can use process substitution instead of a pipe:
while read path action file; do
...
done < <(inotifywait -m $DROP_FOLDER -e create -e moved_to)
This works because the loop is not executed in a subshell, and therefore an exit statement will exit the whole script. Additionally, bash doesn't wait for process substitutions to exit, so while it may hang around until the next time it tries to write, it won't stop the script from exiting.
In general, you can use kill "$$" from a subshell in order to terminate the main script ($$ will expand to the pid of the main shell even in subshells, and you can set a TERM trap in order to catch that signal).
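A minimal sketch of that pattern (producer_command is hypothetical):
producer_command |
while IFS= read -r line; do
    [[ $line == quit ]] && kill "$$"   # $$ is the main shell's PID, even in the subshell
done
(If you set a TERM trap for cleanup, be aware that bash only runs it once the current foreground pipeline has finished.)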
But it looks like you actually want to terminate the left side of a pipeline from its right side -- i.e. cause inotifywait to terminate without waiting until it writes something to the orphaned pipe and gets killed by SIGPIPE. For that you can kill just the inotifywait process explicitly with pkill:
inotifywait -m /some/dir -e create,modify |
while read path action file; do
    pkill -PIPE -P "$$" -x inotifywait
done
pkill -P selects by parent process; $$ should be the PID of your script. This solution is, of course, not fool-proof. Also have a look at this.
I'm writing a program to create a pty, then fork and execute an ssh command with the slave side of the pty as its stdin. The full source code is here.
#include <iostream>
#include <string>
#include <cstdio>   // freopen, fdopen, fprintf
#include <cstdlib>  // posix_openpt, grantpt, unlockpt, ptsname
#include <unistd.h>
#include <fcntl.h>
using namespace std;

int main() {
    int fd = posix_openpt(O_RDWR);
    grantpt(fd);
    unlockpt(fd);
    pid_t pid = fork();
    if (pid == 0) { // slave
        freopen(ptsname(fd), "r", stdin);
        execlp("ssh", "ssh", "user@192.168.11.40", NULL);
    } else { // master
        FILE *f = fdopen(fd, "w");
        string buf;
        while (true) {
            getline(cin, buf);
            if (!cin) {
                break;
            }
            fprintf(f, "%s\n", buf.c_str());
        }
    }
}
After executing this program and inputting just echo hello (and a newline), the child command re-sends my input before its own output, thus duplicating my input line:
~ $ echo hello
echo hello #duplication
hello
~ $
I think this is due to the fact that a pty behaves almost the same as a normal terminal. If I add freopen("log.txt", "w", stdout); and input the same command, I get just
echo hello #This is printed because I typed it.
and the contents of log.txt is this:
~ $ echo hello #I think this is printed because a pty simulates input.
hello
~ $
How can I avoid the duplication?
Is that realizable?
I know it is somehow realizable, but I don't know how. In fact, the rlwrap command behaves the same as my program, except that it doesn't have any duplication:
~/somedir $ rlwrap ssh user@192.168.11.40
~ $ echo hello
hello
~ $
I'm reading the source code of rlwrap now, but haven't yet understood its implementation.
Supplement
As suggested in this question (to me it was the OP's question, not the answer, that was helpful), unsetting the ECHO terminal flag disables the double echoing. In my case, adding this snippet to the slave block solved the problem.
// needs <termios.h> for tcgetattr/tcsetattr
termios terminal_attribute;
int fd_slave = fileno(fopen(ptsname(fd_master), "r"));
tcgetattr(fd_slave, &terminal_attribute);
terminal_attribute.c_lflag &= ~ECHO;
tcsetattr(fd_slave, TCSANOW, &terminal_attribute);
It should be noted that this is not what rlwrap does. As far as I tested, rlwrap <command> never duplicates its input line for any <command>. However, my program echoes twice for some <command>s. For example,
~ $ echo hello
hello #no duplication
~ $ /usr/bin/wolfram
Mathematica 12.0.1 Kernel for Linux ARM (32-bit)
Copyright 1988-2019 Wolfram Research, Inc.
In[1]:= 3 + 4
3 + 4 #duplication (my program makes this while `rlwrap` doesn't)
Out[1]= 7
In[2]:=
Is this because the <command> (ssh when I run wolfram remotely) re-enables echoing? Anyway, I should keep reading the source code of rlwrap.
As you already observed, after the child has called exec() the terminal flags of the slave side are not under your control anymore, and the child may (and often will) re-enable echo. This means that it is not of much use to change the terminal flags in the child before calling exec.
Both rlwrap and rlfe solve the problem in their own (different) ways:
rlfe keeps the entered line, but removes the echoed input from the child's output before displaying it
rlwrap removes the entered line and lets it be replaced by the echo
Whatever approach you use, you have to know whether your input has been (in rlfe's case) or will be (in rlwrap's case) echoed back. rlwrap, at least, does this by not closing the pty's slave end in the parent process, and then watching its terminal settings (in this case, the ECHO bit in its c_lflag) to know whether the slave will echo or not.
All this is rather cumbersome, of course. The rlfe approach is probably easier, as it doesn't require the use of the readline library, and you could simply strcmp() the received output with the input you just sent (which will only go wrong in the improbable case of a cat command that disables echo on its input).
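Incidentally, you can also watch the slave's echo state from outside the program, e.g. with GNU stty (the pts path here is made up):
stty -F /dev/pts/3 -a | tr ' ;' '\n\n' | grep -x -e echo -e -echo    # prints 'echo' or '-echo'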
I am working on a C++ project. To fulfill one of the requirements, I need to be able to check at any time whether a port is available for use in my application. I have come to the following solution:
#include <iostream>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <stdio.h>
std::string _executeShellCommand(std::string command) {
    char buffer[256];
    std::string result = "";
    const char *cmd = command.c_str();
    FILE *pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    try {
        while (!feof(pipe))
            if (fgets(buffer, 128, pipe) != NULL)
                result += buffer;
    } catch (...) {
        pclose(pipe);
        throw;
    }
    pclose(pipe);
    return result;
}

bool _isAvailablePort(unsigned short usPort) {
    char shellCommand[256], pcPort[6];
    sprintf(shellCommand, "netstat -lntu | awk '{print $4}' | grep ':' | cut -d \":\" -f 2 | sort | uniq | grep %hu", usPort);
    sprintf(pcPort, "%hu", usPort);
    std::string output = _executeShellCommand(std::string(shellCommand));
    if (output.find(std::string(pcPort)) != std::string::npos)
        return false;
    else
        return true;
}

int main() {
    bool res = _isAvailablePort(5678);
    return 0;
}
Here, basically, the _executeShellCommand function can execute any shell command and return its stdout output as a string.
And I am executing the following shell command in that function.
netstat -lntu | awk '{print $4}' | grep ':' | cut -d \":\" -f 2 | sort | uniq | grep portToCheck
So, if the port is already in use, _executeShellCommand will return the port value itself; otherwise it will return blank. So, by checking the returned string, I can decide.
So far so good.
Now, I want to make my project completely crash-proof. So, before firing the netstat command, I want to make sure it really exists, and I would like help with that. I know it's kind of stupid to doubt the availability of the netstat command on a Linux machine; I am just thinking of some user who removed the netstat binary from his machine for some reason.
N.B.: I don't want to make a bind() call to check if the port is available or not. Also, it would be best if I could check that the netstat command is available without calling _executeShellCommand another time (i.e. without executing another shell command).
An even better idea is to make your code work completely without netstat altogether.
On Linux, all that netstat does (for your use case) is read the contents of /proc/net/tcp, which enumerates all ports in use.
All you have to do is open /proc/net/tcp yourself, and parse it. This becomes just an ordinary, boring, file parsing code. Can't get much more "crash-proof" than that.
You will find the documentation of the format of /proc/net/tcp in Linux manual pages.
In the unlikely event that you need to check UDP ports, this would be /proc/net/udp.
Of course, there is a race window between the time you check /proc/net/tcp and the time you actually grab the port, where someone else can take it. But that's true with netstat as well, and since netstat is a much slower process, this will actually be an improvement and reduce the race window significantly.
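For illustration, here is the same check done by hand from a shell; ports in /proc/net/tcp are hexadecimal, in the second (local_address) field, and the parse is just as boring from C++ (5678 is an example port):
port_hex=$(printf '%04X' 5678)
if awk -v p=":$port_hex" '$2 ~ p"$" { found = 1 } END { exit !found }' /proc/net/tcp
then echo "port in use"
else echo "port available"
fi
# netstat -l would additionally filter on the state column ($4 == "0A", i.e. listening);
# check /proc/net/tcp6 too if IPv6 matters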
Since you're asking for a way to check if the netstat command is available, I won't try to suggest other ways in C++. The shell way is checking the return code of the following command:
command -v netstat
If the netstat binary is available in $PATH, the command returns 0. In Bash it usually looks like this:
command -v netstat
if [ $? -eq 0 ]; then
netstat # ...
else
echo >&2 "Error: netstat is not available"
fi
Or simply
command -v netstat >/dev/null && netstat # ...
As I'm writing a test, there is a special case of a file stream which I handle this way:
// first we open the file
bool FileOutput::open(std::string const& filename)
{
    if(f_file.is_open())
    {
        throw std::runtime_error("already open");
    }
    f_file.open(filename.c_str());
    if(!f_file.is_open())
    {
        return false;
    }
    return true;
}

// later we write to it
void FileOutput::internal_write(std::string const& data)
{
    f_file << data;
    if(!f_file)
    {
        throw std::runtime_error("I/O error: could not write to output.");
    }
}
As I'm writing my test, I would like to induce an I/O error in internal_write(). In other words, I want the << operator (and whatever underlying I/O functions it uses) to generate an error so that !f_file becomes true.
This is for a test to make sure that errors do indeed end up throwing. So I'm not looking to write the code differently.
Note that closing the file is not a good idea, the f_file is not accessible from the outside, and there are no close() functions (it closes when the object gets destroyed).
I looked into locks, but it does not look like that would work; it would just block here, without a timeout, while a thread blocks the file for a little while. What else could be done?
Note that closing the file is not a good idea, the f_file is not accessible from the outside, and there are no close() functions (it closes when the object gets destroyed).
Is closing bad just because f_file is not accessible? If so, then write a shell script that, while your test program is running, gets the PID of the test program and closes the required file handle.
The script does two steps:
1) Run lsof to get the file handle of your file:
lsof -p PID | grep yourfile | awk '{print $4}' | tr -d wru
In the 4th field there will be the number of the file handle for your file. Assign it to the environment variable HANDLE_ID_FROM_LSOF.
2) Run gdb to close the handle that needs to become bad:
gdb --batch-silent your-program -p PID -ex "call close($HANDLE_ID_FROM_LSOF)" -ex "detach"
After running gdb the file handle will be closed, and hopefully you will get the error you need. At least with strace it looks like this:
write(1, "10606216\n10606217\n10606218\n10606"..., 65536) = 65536
write(1, "7\n10613498\n10613499\n10613500\n106"..., 65536) = 65536
write(1, "779\n10620780\n10620781\n10620782\n1"..., 65536) = 65536
write(1, "\n15827311\n15827312\n15827313\n1582"..., 65536) = -1 EBADF (Bad file descriptor)
write(1, "\n15834593\n15834594\n15834595\n1583"..., 65536) = -1 EBADF (Bad file descriptor)
write(1, "\n15841875\n15841876\n15841877\n1584"..., 65536) = -1 EBADF (Bad file descriptor)
This is an example of this script:
$ cat close_file.sh
#!/bin/sh
TARGET_PROG=$1
TARGET_PID=$2
TARGET_FILE=$3
TARGET_FILE_FD=$(lsof -p $TARGET_PID | grep $TARGET_FILE | awk '{print $4}' | tr -d wur)
gdb --batch-silent $TARGET_PROG -p $TARGET_PID -ex "call close($TARGET_FILE_FD)" -ex "detach"
The first parameter is the full path to your program, the second is the process ID, and the third is the name of the file that needs to be closed.
First run your test program, then run the script. This is an example of running it:
$ ./close_file.sh /usr/bin/seq 16640 seq.txt