shell script pattern matching?

I think I've written maybe one shell script in my entire life, and I'm not even sure if it's possible to do this, but I'm trying to write a script that will ftp the contents of a directory one at a time. That is, it'll ftp one file and then close the connection, then ftp the second and close that, etc. This is because there may be up to five files in a directory, all of which are a minimum of 2GB each. FTPing them all at once always results in a reset connection. I thought that if I could match by partial filename, then perhaps that would help, as they are all named the same way.
So, in a directory, it'll have:
SampleFileA_20100322_1.txt
SampleFileA_20100322_2.txt
SampleFileB_20100322_1.txt
SampleFileC_20100322_1.txt
I'd like to ftp SampleFileA_xxxx_1 first, then SampleFileA_xxxx_2, etc. This is the current ftp script, which tries to download everything all at once...
#!/bin/bash
REMOTE='ftp.EXAMPLE.com'
USER='USERNAME'
PASSWORD='PASSWORD'
FTPLOG='/tmp/ftplog'
date >> $FTPLOG
ftp -in $REMOTE <<_FTP >> $FTPLOG
quote USER $USER
quote PASS $PASSWORD
bin
cd download
mget *
quit
_FTP

based on your question I think you need something like
for file in Sample*.txt
do
    run_ftp_function "$file"
done
you'll need to set up "run_ftp_function" to do the transfer (like the script you already have), using $1 as the file to transfer
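For example, a hypothetical run_ftp_function might look like the sketch below, reusing the REMOTE, USER, PASSWORD and FTPLOG settings from the script in the question. Each call opens a fresh connection, fetches a single file, and closes the connection again (this assumes the local glob matches the remote file names):

run_ftp_function() {
    # one connection per file: log in, fetch a single file, quit
    ftp -in "$REMOTE" <<_FTP >> "$FTPLOG"
quote USER $USER
quote PASS $PASSWORD
bin
cd download
get $1
quit
_FTP
}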


Using regex and pscp on AIX servers

I've been having a lot of trouble using pscp and regex on an AIX server.
As you can see, I am trying to use PuTTY's pscp to transfer the file "stdout" to my local folder.
This usually works fine, but my problem is that I won't know the exact folder name, so I need to use a REGEX.
I've been told that my regex was possibly written for grep and that it isn't supported by pscp.
What would be the alternative, then, to write a regex for pscp?
The error message is: "multiple-level wildcards unsupported"
pscp.exe -P 22 -pw krt_345 testuser@testserver5:"/app/log/s500/20201023/.\*/20201023-02\.2[0-9]\.[0-5][0-9]_s500_testuser.\*"/stdout C:\logs
regex only:
"/app/log/s500/20201023/.\*/20201023-02\.2[0-9]\.[0-5][0-9]_s500_testuser.\*"/stdout
With the SCP protocol, the filemask in the path is resolved by the server. With a typical OpenSSH scp "server", you can use standard Linux glob masks, definitely not regex. Your mask is simple enough that a plain glob mask 20201023-02.2[0-9].[0-5][0-9]_s500_testuser* would do. But you can use a glob mask for the last path component only, not for the parent directory. That is what the "multiple-level wildcards unsupported" error message is trying to tell you.
So what you are doing is not doable with SCP. You would have to obtain the folder name by other means, such as running shell commands over SSH.
And I believe you have asked about this already:
Finding folder name on a server using batch file
And based on your comments on the answer there, you already know that you need to combine a shell command like find with scp. So I do not understand why you don't ask about that.
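As a rough illustration (a sketch, not a tested solution), the unknown folder level could be resolved server-side with find over SSH, and each matching file then copied individually; the host and paths are the ones from the question, and the same approach works with plink and pscp on Windows:

# Resolve the unknown directory level on the server with find, which accepts
# glob classes like [0-9] in -path patterns, then copy each hit to the
# current local directory. Assumes the remote paths contain no spaces.
files=$(ssh testuser@testserver5 \
    "find /app/log/s500/20201023 -type f -path '*/20201023-02.2[0-9].[0-5][0-9]_s500_testuser*/stdout'")
for f in $files; do
    scp "testuser@testserver5:$f" .
done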

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working on a directory with many .txt files in it and have a file with a long list of regexes like perl -p -i -e 's/\n\n/\n/g' *.xml. They all work if I copy them into the terminal, but is there a possibility to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the one-liner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended). You could expand on it by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) {                # if no files on command line
    @ARGV = glob('*.xml');   # get a default list of files
}
local $^I = '';              # enable inplace editing (like perl -i)
while (<>) {                 # read each line of each file into $_
    s/\n\n/\n/g;             # modify $_ with a regex
    # more regexes here...
    print;                   # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions; there are lots of Perl modules for XML processing - I wrote some more about that here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line by line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as what I've shown above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
If you got "no such file or directory", the likely problem is that unicode.sh isn't in the directory you're running it from, or that its #! line points at a missing interpreter; a script that merely isn't executable gives "Permission denied" instead, which you fix with chmod +x unicode.sh, assuming that's a script that you wrote.
Of course, the normal way to run multiple perl commands is to put them in a file you write, say runme.pl - i.e., a perl script.
That said, yes, everything will work from the terminal; you just need to be careful about the escaping that bash performs.
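For completeness, here is a minimal sketch of what a runnable unicode.sh could look like: just a shebang line followed by the one-liners from your list, one per line (the single quotes already protect the regexes from bash):

#!/bin/bash
# each line below is one of the one-liners from the list file, run in order
perl -p -i -e 's/\n\n/\n/g' *.xml
# ...paste the remaining one-liners here...

Then chmod +x unicode.sh and run it with ./unicode.sh, or skip the chmod and run bash unicode.sh.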

rsync --exclude-from 'list' file not working

I am trying to use rsync to complete an unfinished transfer from a remote server to a local machine using
rsync -a user@domain.com:~/source/ /dest/
where /dest/ is the location of the partially completed transfer. However, due to bandwidth concerns, I need to run rsync to a /tmp_dest/ on a different machine that does not have a copy of /dest/, from where I can later move /tmp_dest/ to /dest/
The solution I have come up with thus far is to use rsync's --exclude-from option, using a file containing a complete list of files from /dest/.
The command would look something like this
rsync -a --exclude-from 'list.txt' user@domain.com:~/source/ /tmp_dest/
At this point I feel as though I have scoured everywhere for a solution and tried every variant I came across.
This included relative and absolute paths for the 'list.txt'
relative:
path 1/file 1
path 2/file 2
--or--
absolute:
/absolute/source/path 1/file 1
/absolute/source/path 2/file 2
I have tried the above with combinations that include a leading - to explicitly exclude each line (where I have seen examples from people who also wanted to + include other files)
- /absolute/source/path 1/file 1
- /absolute/source/path 2/file 2
I have tried putting leading **/ in front of the file paths to rectify the relative path problem
**/path 1/file 1
**/path 2/file 2
I have also tried navigating to the directory containing 'list' and executing rsync from there, to avoid the issue where rsync looks for
/path/to/the/list/something1/to.exclude
/path/to/the/list/something2/to.exclude
/path/to/the/list/something3/to.exclude
and undoubtedly finding nothing
I have also ensured that the correct line breaks are being used in the 'list' file, i.e. LF (Unix) line breaks.
I have tried to create the 'list' with the following command
find . -type f | tee list.txt
this initially created a file looking something like this
./yyyy-mm-dd folder 1/sub folder [foo]/file.a
./(yyyy) folder 2 {foo2}/file.b
./folder, 3/sub-folder 3/file.c
as you can see, there are spaces and other characters in the file paths, but from my current understanding this shouldn't matter. Perhaps I am mistaken, though, and will need to escape any characters with special meaning, which I may then need help with.
I then perform a replace on ./ in Notepad++ or some other text editor that preserves the LF (Unix) line breaks to get the desired result
(e.g. as above, I've tried replacing ./ with nothing, with /absolute/path/for/source/ noting the leading slash, or even with double wildcards to match any parent tree structure containing the files).
The only thing I feel that I haven't tried is escaping the spaces in the file names and paths, but I have read that this shouldn't be an issue.
Perhaps I am overlooking something and any help would be appreciated.
Here is how the rsync man page describes --exclude-from:
--exclude-from=FILE read exclude patterns from FILE
Use the following command:
rsync -a --exclude-from=list.txt user@domain.com:~/source/ /tmp_dest/
It is also better to use the full path name of the list.txt file.
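Building on the find approach already described in the question, a minimal sketch (assuming the partial copy lives in /dest/ on the machine that has it) would generate the list relative to the transfer root, escape rsync's wildcard characters, and anchor each pattern with a leading slash:

# run on the machine that already has the partial copy
cd /dest/
# escape [, * and ? (they are wildcards in rsync patterns, so names like
# "sub folder [foo]" would otherwise misbehave), then turn the leading "./"
# into "/" so each pattern is anchored at the transfer root; spaces in
# names need no escaping in an exclude-from file.
find . -type f | sed -e 's/[[*?]/\\&/g' -e 's|^\./|/|' > /tmp/list.txt
# copy /tmp/list.txt to the machine that will run rsync, then:
rsync -a --exclude-from=/tmp/list.txt user@domain.com:~/source/ /tmp_dest/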

Bash, Netcat, Pipes, perl

Background: I have a fairly simple bash script that I'm using to generate a CSV log file. As part of that bash script I poll other devices on my network using netcat. The netcat command returns a stream of information that I can pipe into a grep command to get to certain values I need in the CSV file. I save that return value from grep into a bash variable and then, at the end of the script, I write out all saved bash variables to a CSV file. (Simple enough.)
The change I'd like to make is to reduce the number of netcat commands I have to issue for each piece of information I want to save. Each issued netcat command returns ALL possible values (so each call returns the same data and is burdensome on the network). So I'd like to use netcat only once and parse the return value as many times as needed to create the bash variables that can later be concatenated into a single record in the CSV file I'm creating.
Specific Question: Using bash syntax, if I pass the output of the netcat command to a file using > (versus the current grep method), I get a file with each entry on its own line (presumably separated with \n as the EOL record separator -- easy for a perl regex). However, if I save the output of netcat directly to a bash variable and echo that variable, all of the data is jumbled together, so it is cumbersome to parse (not so easy).
I have played with two options. First, I think a perl one-liner may be a good solution here, but I'm not sure how best to execute it. Pseudo-code might be to save the netcat output to a bash variable and then somehow figure out how to parse it with perl (not straightforward, though).
The second option would be to use bash's > and send netcat's output to a file. This would be easy to process with perl and Regex given the \n EOL, but that would require opening an external file and passing it to a perl script for processing AND then somehow passing its return value back into the bash script as a bash variable for entry into the CSV file.
I know I'm missing something simple here. Is there a way I can force a newline entry into the bash variable from netcat and then repeatedly run a perl one-liner against that variable to create each of the CSV variables I need -- all within the same bash script? Sorry for the long question.
The second option would be to use bash's > and send netcat's output to a file. This would be easy to process with perl and Regex given the \n EOL, but that would require opening an external file and passing it to a perl script for processing AND then somehow passing its return value back into the bash script as a bash variable for entry into the CSV file.

This is actually a fairly common idiom: save the output from netcat in a temporary file, then use grep or awk or perl or what-have-you as many times as necessary to extract data from that file:
# create a temporary file and arrange to have it
# deleted when the script exits.
tmpfile=$(mktemp tmpXXXXXX)
trap "rm -f $tmpfile" EXIT
# dump data from netcat into the
# temporary file.
nc somehost someport > "$tmpfile"
# extract some information into variable `myvar`
myvar=$(awk '/something/ {print $4}' "$tmpfile")
That last line demonstrates how to get the output of something (in this case, an awk script) into a variable. If you were using perl to extract some information you could do the same thing.
You could also just write the whole script in perl, which might make your life easier.
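Incidentally, the newlines are not actually lost when you capture netcat's output in a variable; they only disappear when you expand the variable unquoted. A minimal sketch, avoiding the temporary file entirely (the pattern /something:\s*(\S+)/ is made up for illustration):

data=$(nc somehost someport)   # newlines are preserved inside the variable
echo "$data"                   # quoted expansion prints them back intact
# run as many one-liners against it as you need, one per CSV field:
myvar=$(printf '%s\n' "$data" | perl -ne 'print $1 if /something:\s*(\S+)/')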

How do I use Procmail with PHP?

I'm trying to use procmail to send emails to a PHP script so the script will check a MySQL database and edit the subject line based on the sender's email address. I believe I've got a working procmail recipe to do this:
:0:
* ^To:.*@barrett.com
! '/usr/local/bin/php-5.2 -f $HOME/ticket/emailcustcheck.php'
However, I'm not sure exactly how procmail executes the command. How does the email get passed to the PHP script, and therefore, how do I refer to it inside the script?
The correct syntax for piping to a script is
:0 # no lock file
* ^To:.*@barrett\.com
| /usr/local/bin/php-5.2 -f $HOME/ticket/emailcustcheck.php # no quotes, use pipe
The ! action would attempt to forward to an email address, but of course, the long quoted string with the path to your PHP interpreter is not a valid email address.
If you need locking (i.e. no two instances of this PHP script are allowed to run at the same time), you need to name a lock file; Procmail cannot infer a lock file name here, so the lock action you had would only produce an error message anyway. If you are uncertain, adding a named lock file is the safer bet; but if you don't have concurrency issues (such as the script needing to write to a database while no other process is using it), it should not be necessary, and it could potentially slow down processing.
The condition regex also looks somewhat imprecise, but I can only speculate that you might want to trigger on Cc: mail as well as direct To:. Look up the ^TO_ macro in the documentation if so.
The script gets the message on its standard input; it should probably read all input lines into an array, or split them into two arrays so that everything before the first empty line goes into the "headers" array and the rest goes into the "body" array. Or perhaps PHP has a class which can read an email message into an object from standard input.
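To see exactly what your script will receive, you can feed it a message by hand, the same way procmail will, since the recipe simply pipes the full message to the script's standard input (a sketch with a made-up test message):

# simulate procmail's delivery: the whole message arrives on stdin
printf 'To: user@barrett.com\nSubject: test\n\nhello\n' |
    /usr/local/bin/php-5.2 -f "$HOME/ticket/emailcustcheck.php"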
:0 wf
* ^To:.*@barrett\.com
| /usr/local/bin/php-5.2 -f $HOME/ticket/emailcustcheck.php
The f tells procmail that you are going to filter the message, i.e. change it.
The w tells procmail to wait for the filter or program to finish and check its exit code.
If you want to work only on the body of the message, you must add the flag b;
if you want to work only on the header of the message, you must add the flag h.