Perl - Read input pipe as single variable - regex

I have a very simple perl script which I perform some regex on the value of the data piped to my perl command. ex:
cat /tmp/myfile.txt | perl -wnE"say for /my_pattern/gi"
Note, I do realize in my current setup that the -n option wraps my command ex:
while(<>){
say for /my_pattern/gi
}
...thus iterating over each line of input.
I'd like to change this where I can perform a regex against the final output of my cat command. All examples I find show processing input line by line.
Any help would be appreciated.
Update:
To be clear, I'm referring to any pipe, not only reading from a file as in my example (think curl, wget, echo, etc...) I'm not sure it is even possible given the fact that the originating command could be long-lived or run for an "indefinite" period of time.

To answer your question directly:
cat aaaa.txt | perl -ne 'BEGIN{local $/};print for /a/gi'
To get what your stuff work:
cat aaaa.txt | perl -ne ';print if /aaaa/gi'

Related

Extracting group from regex in shell script using grep

I want to extract the output of a command run through shell script in a variable but I am not able to do it. I am using grep command for the same. Please help me in getting the desired output in a variable.
x=$(pwd)
pw=$(grep '\(.*\)/bin' $x)
echo "extracted is:"
echo $pw
The output of the pwd command is /opt/abc/bin/ and I want only /root/abc part of it. Thanks in advance.
Use dirname to get the path and not the last segment of the path.
You can use:
x=$(pwd)
pw=`dirname $x`
echo $pw
Or simply:
pw=`dirname $(pwd)`
echo $pw
All of what you're doing can be done in a single echo:
echo "${PWD%/*}"
$PWD variable represents current directory and %/* removes last / and part after last /.
For your case it will output: /root/abc
The second (and any subsequent) argument to grep is the name of a file to search, not a string to perform matching against.
Furthermore, grep prints the matching line or (with -o) the matching string, not whatever the parentheses captured. For that, you want a different tool.
Minimally fixing your code would be
x=$(pwd)
pw=$(printf '%s\n' "$x" | sed 's%\(.*\)/bin.*%\1%')
(If you only care about Bash, not other shells, you could do sed ... <<<"$x" without the explicit pipe; the syntax is also somewhat more satisfying.)
But of course, the shell has basic string manipulation functions built in.
pw=${x%/bin*}

How to replace using sed command in shell scripting to replace a string from a txt file present in one directory by another?

I am very new to shell scripting and trying to learn the "sed" command functionality.
I have a file called configurations.txt with some variables defined in it with some string values initialised to each of them.
I am trying to replace a string in a file (values.txt) which is present in some other directory by the values of the variables defined. The name of the file is values.txt.
Data present in configurations.txt:-
mem="cpu.memory=4G"
proc="cpu.processor=Intel"
Data present in the values.txt (present in /home/cpu/script):-
cpu.memory=1G
cpu.processor=Dell
I am trying to make a shell script called repl.sh and I dont have alot of code in it for now but here is what I got:-
#!/bin/bash
source /home/configurations.txt
sed <need some help here>
Expected output is after an appropriate regex applied, when I run script sh repl.sh, in my values.txt , It must have the following data present:-
cpu.memory=4G
cpu.processor=Intell
Originally which was 1G and Dell.
Would highly appreciate some quick help. Thanks
This question lacks some sort of abstract routine and looks like "help me do something concrete please". Thus it's very unlikely that anyone would provide a full solution for that problem.
What you should do try to split this task into number of small pieces.
1) Iterate over configuration.txt and get values from each line. To do that you need to get X and Y from a value="X=Y" string.
This regex could be helpful here - ([^=]+)=\"([^=]+)=([^=]+)\". It contains 3 matching groups separated by ". For example,
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\1/' configurations.txt
mem
proc
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\2/' configurations.txt
cpu.memory
cpu.processor
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\3/' configurations.txt
4G
Intel
2) For each X and Y find X=Z in values.txt and substitute it with a X=Y.
For example, let's change cpu.memory value in values.txt with 4G:
>> X=cpu.memory; Y=4G; sed -r "s/(${X}=).*/\1${Y}/" values.txt
cpu.memory=4G
cpu.processor=Dell
Use -i flag to do changes in place.
Here is an awk based answer:
$ cat config.txt
cpu.memory=4G
cpu.processor=Intel
$ cat values.txt
cpu.memory=1G
cpu.processor=Dell
cpu.speed=4GHz
$ awk -F= 'FNR==NR{a[$1]=$2; next;}; {if($1 in a){$2=a[$1]}}1' OFS== config.txt values.txt
cpu.memory=4G
cpu.processor=Intel
cpu.speed=4GHz
Explanation: First read config.txt & save in memory. Then read values.txt. If a particular value was defined in config.txt, use the saved value from memory (config.txt).

Converting LaTeX pmatrix command to amsmath pmatrix environment using sed

I have an old LaTeX document (with a lot of formatting commands) that I want to convert to the more modern LaTeX (I want to do the update for several reasons, not the least of which is to reduce the coupling between content and formatting). At any rate, the document has a lot of calls to the deprecated command \pmatrix{ .... } which I would like to replace with the new amsmath command \begin{pmatrix} ... \end{pmatrix}. I have been trying to use sed to do this conversion but I have never used it before and I am having trouble.
Here is a MWE
LaTeX input string
\pmatrix{0&0\cr \frac{1}{2}&0\cr 0&0\cr}\pmatrix{1&1\cr 1&1\cr 1&1\cr}
with the expected output
\begin{pmatrix}0&0\\ \frac{1}{2}&0\\ 0&0\end{pmatrix}\begin{pmatrix}1&1\\ 1&1\\ 1&1\end{pmatrix}
The commands that I have been trying to use are variants of the following
sed 's/\\pmatrix{\(.*\cr[ ]*\)}/\\begin{pmatrix}\1 \\end{pmatrix}/g' <$WORKING_FILE >$OUTPUT_FILE
but the closest output that I have been able to achieve is
\begin{pmatrix}0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
I am pretty sure that the problem is related to having two calls to pmatrix side by side, but I am not sure how to modify the regex to make this work.
I have searched google, but being so new to regex, I just got confused by all of the variations out there and which to use, and how to properly format such a thing.
The following might work for you:
sed -re 's/(\\pmatrix)\{([^}]*)}/\\begin{pmatrix}\2\\end{pmatrix}/g' -e 's/\\cr/\\\\/g' -e 's/\\\\\\end/\\end/g' inputfile
This works by:
substituting \pmatrix{...} with `\begin{matrix}...\end{matrix}
substituting \cr with \\
handling \\\end to make it \end
EDIT: As per your update, you might be better off splitting the relevant parts using grep before piping to sed:
grep -oP '\\pmatrix.*?\\cr}' inputfile | sed -re 's/\\pmatrix\{(.*)}/\\begin{pmatrix}\1\\end{pmatrix}/g;s/\\cr/\\\\/g;s/\\\\\\end/\\end/g'
This might work for you (GNU sed):
sed -r 's/\\cr/\n/g;s/\\(pmatrix)\{([^\n]*)\n([^\n]*)\n([^\n]*)\n\}/\\begin{\1}\2\\\\ \3\\\\ \4\\end{\1}/g;s/\n/\\cr/g' file
Convert \\cr to newlines. Do a global substitution command. Then convert those newlines left back to \\cr's.

Complex changes to a URL with sed

I am trying to parse an RSS feed on the Linux command line which involves formatting the raw output from the feed with sed.
I currently use this command:
feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}" | sed 's/^\(.\{3\}\)\(.\{13\}\)\(.\{6\}\)\(.\{3\}\)\(.*\)/\1\3\5/'
This gives me a number of feed items per line that look like this:
Sat 20:33 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/newsticker/meldung/WhatsApp-Ausfall-Server-Probleme-blockieren-Messaging-Dienst-2121664.html/from/atom10?wt_mc=rss.ho.beitrag.atom
Notice the long URL at the end. I want to shorten this to better fit on the command line. Therefore, I want to change my sed command to produce the following:
Sat 20:33 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/-2121664
That means cutting everything out of the URL except a dash and that seven digit number preceeding the ".html/blablabla" bit.
Currently my sed command only changes stuff in the date bit. It would have to leave the title and start or the URL alone and then cut stuff out of it until it reaches the seven digit number. It needs to preserve that and then cut everything after it out. Oh yeah, and we need to leave a dash right in front of that number too.
I have no idea how to do that and can't find the answer after hours of googling. Help?
EDIT:
This is the raw output of a line of feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}", in case it helps:
Sat, 22 Feb 2014 20:33:00 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/newsticker/meldung/WhatsApp-Ausfall-Server-Probleme-blockieren-Messaging-Dienst-2121664.html/from/atom10?wt_mc=rss.ho.beitrag.atom
EDIT 2:
It seems I can only pipe that output into one command. Piping it through multiple ones seems to break things. I don't understand why ATM.
Unfortunately (for me), I could only think of solving this with extended regexp syntax (either -E or -r flag on different systems):
... | sed -E 's|(://[^/]+/).*(-[0-9]+)\.html/.*|\1\2|'
UPDATE: In basic regexp syntax, the best I can do is
... | sed 's|\(://[^/]*/\).*\(-[0-9][0-9]*\)\.html/.*|\1\2|'
The key to writing this sort of regular expression is to be very careful about what the boundaries of what you expect are, so as to avoid the random gunk that you want to get rid of causing you problems. Also, you should bear in mind that you can use characters other than / as part of a s operation's delimiters.
sed 's!\(http://www\.heise\.de/\)newsticker/meldung/[^./]*\(-[0-9]+\)\.html[^ ]*!\1\2!'
Be aware that getting the RE right can be quite tricky; assume you'll need to test it! (This is a key part of the “now you have two problems” quote; REs very easily become horrendous.)
Something like this maybe?
... | awk -F'[^0-9]*' '{print "http://www.heise.de/-"$2}'
This might work for you (GNU sed):
sed 's|\(//[^/]*/\).*\(-[0-9]\{7\}\).*|\1\2|' file
You can place the first sed command so:
feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}" |
sed 's/^\(.\{3\}\)\(.\{13\}\)\(.\{6\}\)\(.\{3\}\)\(.*\)/\1\3\5/;s|\(//[^/]*/\).*\(-[0-9]\{7\}\).*|\1\2|'

extract and replace a token in shell script

In a source code file, I need to change all the instances like
foo_bar('param')
to become
booch['param']=$param
Here, param is a token and it can vary. It can be anything.
i.e., in a file,
all the
foo_bar('xxx')
have to become
booch['xxx']=$xxx
and,
all the
foo_bar('yyy')
have to become
booch['yyy']=$yyy
and so on. I'm trying to do it by writing a shell script and using sed. But couldnt figure out how to do it. Hope my query is understandable. Thanks
sed "s/foo_bar('\([^']*\)')/booch['\1']=$\1/g" infile > outfile
Test
$ echo "foo_bar('xxx')" | sed "s/foo_bar('\(.*\)')/booch['\1']=$\1/"
booch['xxx']=$xxx
Basically the same thing, using perl
cat yourfile.txt | perl -pe "s/(foo_bar)\(\s*('(\w*)')\s*\)/booch\[\$2\]=\\$\$3/" > youralteredfile.txt