Why are the following variations of s/g from the command line wrong? - regex

I have a small file as follows:
$ cat text.txt
vacation
cat
This is a test
This command substitutes all occurrences of cat to CAT correctly as I wanted:
$ perl -p -i -e '
s/cat/CAT/g;
' text.txt
But why do the following two mess the file up?
The following deletes the contents of the file
$ perl -n -i -e '
$var = $_;
$var =~ s/cat/CAT/g;
' text.txt
And this one just does not do the substitution correctly
$ perl -p -i -e '
$var = $_;
$var =~ s/cat/CAT/g;
' text.txt
$ cat text.txt
cation
cat
This is a test
Why? What am I messing up here?

-p automatically prints each line (the contents of $_, which holds the current line) after your code has run, and because the -i flag is in use, that output is what re-populates the file. -n loops over the file just like -p does, but it doesn't print automatically. You have to do that yourself, otherwise the file is simply overwritten with nothing. The -n flag lets you skip lines that you don't want written back to the file (amongst other things). With -p the print always happens (it lives in a continue block, so even next does not skip it), so you'd have to empty $_ conditionally to achieve the same result.
perl -n -i -e '
$var = $_;
$var =~ s/cat/CAT/g;
print $var;
' text.txt
See perlrun.
In your last example, -p will only automatically print $_ (the original, unmodified line). It doesn't auto-print $var at all, so in that case, you'd have to print $var like in the example above, but then you'd get both the original line, and the modified one printed to the file.
You're better off not assigning $_ to anything if all you're doing is overwriting a file. Just use it as is, e.g. (same as your first example):
perl -p -i -e '
s/cat/CAT/g;
' text.txt

Related

How can I format this data with bash script

I want to format data from this
header1|header2|header3
"ID001"|"""TEST"""|"
TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
into
header1|header2|header3
"ID001"|"TEST"|"TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
So the logic is:
keep the header as it is
for every other line, if it does not start with ", append it to the end of the previous line
replace """ with "
I want to format this with a bash script.
I've written this script, but it is still not working:
#!/bin/bash
if [ $# -eq 0 ]
then
echo "No arguments supplied"
exit;
fi
FOLD=$1"*"
CHECK=$1"/bix.done"
if test -f $CHECK; then
date > /result.txt
echo "starting Covert.... "
echo "from folder : " $1
for file in $FOLD
do
if [[ $file != *History* ]]; then
if [[ $file == *.csv ]]; then
FILETEMP=$file".temp"
mv $file $FILETEMP
awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' $FILETEMP > $file
#rm $FILETEMP
fi
fi
done
date > /home/result.txt
fi
#ls $1 -l
This might work for you (GNU sed):
sed '1b;:a;N;/\n"/!s/\n//;ta;s/"""/"/g;P;D' file
Always print the first header line. Append the next line to the current line and if that line does not begin with a " remove the newline and repeat until there is such a line. Now substitute a single " for """ globally, print the first line and repeat.
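The same three rules (keep the header, glue continuation lines onto the previous row, collapse """ to ") can also be sketched in awk for systems without GNU sed. This is a minimal sketch, not the answerer's method: the function name join_broken_rows is hypothetical, and it assumes a continuation line never starts with a double quote.

```shell
#!/usr/bin/env bash
# join_broken_rows is a hypothetical helper name for this sketch.
# Reads the CSV-like text on stdin and writes the repaired text to stdout.
# Assumption: a continuation line never begins with a double quote.
join_broken_rows() {
  awk '
    NR == 1 { print; next }              # keep the header line as is
    /^"/ {                               # a leading quote starts a new row
      if (have) { gsub(/"""/, "\"", row); print row }
      row = $0; have = 1; next
    }
    { row = row $0; have = 1 }           # glue a continuation onto the row
    END { if (have) { gsub(/"""/, "\"", row); print row } }
  '
}

printf '%s\n' \
  'header1|header2|header3' \
  '"ID001"|"""TEST"""|"' \
  'TEST TEST TEST"|"TEST 4"' \
  '"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"' | join_broken_rows
```

Buffering one row at a time keeps memory use flat, which matters less for this sample than for large exports.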
Specific to joining the 2nd line with the line after it and condensing the multiple double quotes to a single double quote, you could do:
sed '2{s/""*/"/g;h;N;s/\n//}' file
print all lines by default, except for
the second line, where
s/""*/"/g substitutes a single double quote for each run of double quotes,
h copies the pattern space to the hold space (the hold space is not used afterwards),
N appends the next line to the pattern space, and
s/\n// replaces the embedded '\n' with nothing, joining the two lines.
Example Use/Output
With your data in file you could do:
$ sed '2{s/""*/"/g;h;N;s/\n//}' file
header1|header2|header3
"ID001"|"TEST"|"TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
(note: if you need to condense multiple double quotes to single double quotes in all lines, you can turn the command around and use sed 's/""*/"/g;2{h;N;s/\n//}')
It has been resolved with the code below:
if test -f $CHECK; then
date > /home/startconvert.txt
echo "starting Convert.... "
echo "from folder : " $1
for file in $FOLD
do
if [[ $file != *History* ]]; then
if [[ $file == *.csv ]]; then
#FILETEMP=$file".temp"
#mv $file $FILETEMP
#awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' $FILETEMP > $file
#rm $FILETEMP
perl -i -0777pe 's/\r\n([^"])/ $1/g' $file;
perl -i -0777pe 's/\n"""/"/' $file;
perl -i -0777pe 's/\r("\|)/ $1/g' $file;
sed -i -e 's/"""/"/g' $file;
perl -i -0777pe 's/\n([^"])/ $1/g' $file;
perl -i -0777pe 's/\n("\|)/ $1/g' $file;
sed -i -e 's/""-/-/g' $file;
perl -i -0777pe 's/\n([^"])/ $1/g' $file;
perl -i -0777pe 's/\r([^"])/ $1/g' $file;
perl -i -0777pe 's/\r\n([^"])/ $1/g' $file;
fi
fi
done
date > /home/endconvert.txt
fi
Not sure about the bash part, this expression though,
[\r\n]^([^"])
with a replacement of $1 might be somewhat close.
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.

Bash replace '\n\n}' string in file

I've got files repeatedly containing the string \n\n} and I need to replace such string with \n} (removing one of the two newlines).
Since such files are dynamically generated through a bash script, I need to embed replacing code inside the script.
I tried with the following commands, but it doesn't work:
cat file.tex | sed -e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | perl -p00e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | awk -v RS="" '{gsub (/\n\n}/, "\nb")}1' # it does work, but not for large files
You didn't provide any sample input and expected output so it's a guess but maybe this is what you're looking for:
$ cat file
a
b
c
}
d
$ awk '/^$/{f=1;next} f{if(!/^}/)print "";f=0} 1' file
a
b
c
}
d
a way with sed:
sed -i -n ':a;N;$!ba;s/\n\n}/\n}/g;p' file.tex
details:
:a # defines the label "a"
N # append the next line to the pattern space
$!ba # if it is not the last line, go to label a
s/\n\n}/\n}/g # replace all \n\n} with \n}
p # print
The -i option changes the file in place.
The -n option suppresses the automatic printing of the pattern space.
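As a quick check of the slurp-then-substitute idea, here is a small sketch that runs the same sed program as a filter on sample input, without -i, so no file is modified. It assumes GNU sed, and the function name squeeze_before_brace is made up for the demo.

```shell
#!/usr/bin/env bash
# Assumes GNU sed (newline escapes, ;-separated commands after a label).
# squeeze_before_brace is a hypothetical name; it filters stdin to stdout
# and leaves any file on disk untouched (no -i here).
squeeze_before_brace() {
  sed -n ':a;N;$!ba;s/\n\n}/\n}/g;p'
}

# Two blocks, each with an empty line before the closing brace.
printf 'a\nb\n\n}\nc\n\n}\n' | squeeze_before_brace
```

Because the whole input is accumulated in the pattern space before the substitution runs, memory use grows with the file size.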
This Perl command will do as you ask
perl -i -0777 -pe's/\n(?=\n})//g' file.tex
This should work:
cat file.tex | sed -e 's/\\n\\n}/\\n}/g'
if \n\n} appears in the file as literal backslash-n characters.
Or, if they are real newlines:
cat file.tex | sed -e ':a;N;$!ba;s/\n\n}/\n}/g'
Another method:
if the first \n is any new line:
text=$(< file.tex)
text=${text//$'\n\n}'/$'\n}'}
printf "%s\n" "$text" #> file
If the first \n is an empty line:
text=$(< file.tex)
text=${text//$'\n\n\n}'/$'\n\n}'}
printf "%s\n" "$text" #> file
*nix-style line filters process the file line by line. Thus, you have to do something extra to process an expression which spans lines.
As mentioned by others, '\n\n' is simply an empty line and matches the regular expression /^$/. Perhaps the most efficient thing to do is to hold back each empty line until you know whether or not the next line starts with a closing brace.
cat file.tex | perl -ne 'if ( $b ) { print $b unless m/^\}/; undef $b; } if ( m/^$/ ) { $b=$_; } else { print; } END { print $b if $b; }'
And to clean it all up we add an END block, to process the case that the last line in the file is blank (and we want to keep it).
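The same hold-one-blank-line idea can be sketched in awk, which may be easier to read than the one-liner. This is an illustrative equivalent under the same assumptions, not a replacement for the Perl version; the function name drop_blank_before_brace is hypothetical.

```shell
#!/usr/bin/env bash
# drop_blank_before_brace is a hypothetical helper name for this sketch.
# It streams stdin to stdout and holds back at most one blank line.
drop_blank_before_brace() {
  awk '
    held { if ($0 !~ /^[}]/) print ""; held = 0 }  # release or drop the held blank line
    /^$/ { held = 1; next }                        # hold a blank line, decide later
    { print }
    END  { if (held) print "" }                    # keep a blank line at end of file
  '
}

printf 'a\n\n}\nb\n\nc\n' | drop_blank_before_brace
```

Since only one line is ever buffered, this approach stays cheap on large files, unlike the variants that read the whole file first.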
If you have access to node you can use rexreplace
npm install -g rexreplace
and then run
rexreplace '\n\n\}' '\n\}' myfile.txt
Or if you have more files in a dir data you can do
rexreplace '\n\n\}' '\n\}' data/*.txt

Get all variables in bash from text line

Suppose I have a text line like
echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and this number is $((i+1))"
I need to get all variables (for example, using sed) so that in the end I have a list containing: $text, ${items[$i]}, $i, ${#items[@]}, $((i+1)).
I am writing a script that runs some complex commands, and before executing each command it shows the command to the user. So when my script shows a command like "pacman -S ${softtitles[$i]}", you can't guess what the command actually does. I just want to add, below the command, a list of the variables used in it and their values. So I decided to do it via regex and sed, but I can't get it right :/
UPD: It can be just a string like echo "$test is 'ololo', ${items[$i]} is 'today', $i is 3"; it doesn't need to be a list at all, and it can include any temporary variables and multiple lines of code. Also, it doesn't have to be sed :)
SOLUTION:
echo $m | grep -oP '(?<!\[)\$[{(]?[^"\s\/\047.\\]+[})]?' | uniq > vars
$m - our line of code with several bash variables, like "This is $string with ${some[$i]} variables"
uniq - if the string contains the same variable several times in a row, this removes the duplicates (uniq only collapses adjacent repeats; sort -u would remove all of them)
vars - temporary file to hold all variables found in text string
The next piece of code shows all the variables and their values in a fancy style:
if [ ! "`cat vars`" == "" ]; then
while read -r p; do
value=`eval echo $p`
Style=`echo -e "$Style\n\t$Green$p = $value$Def"`
done < vars
fi
$Style - predefined variable with some text (title of the command)
$Green, $Def - just tput settings of color (green -> text -> default)
Green=`tput setaf 2`
Def=`tput sgr0`
$p - each line of vars file (all variables one by one) looped by while read -r p loop.
You could simply use the below grep command,
$ grep -oP '(?<!\[)(\$[^"\s]+)' file
$text
${items[$i]}
${#items[@]}
$((i+1))
I'm not sure it's perfect, but it should help you:
sed -r 's/(\$[^ "]+)/\n\1\n/g' filename | sed -n '/^\$/p'
Explanation :
(\$[^ "]+) - matches the character $ followed by any characters up to the next space or double quote.
\n\1\n - puts a newline before and after each matched word (so that every variable ends up on its own line).
/^\$/p - prints only the lines that start with $, i.e. the variables.
A few approaches, I tested each of them on file which contains
echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and this number is $((i+1))"
grep
$ grep -oP '\$[^ "]*' file
$text
${items[$i]}
${#items[@]}
$((i+1))
perl
$ perl -ne '@f=(/\$[^ "]*/g); print "@f"' file
$text ${items[$i]} ${#items[@]} $((i+1))
or
$ perl -ne '@f=(/\$[^ "]*/g); print join "\n",@f' file
$text
${items[$i]}
${#items[@]}
$((i+1))
The idea is the same in all of them. They will collect the list of strings that start with a $ and as many subsequent characters as possible that are neither spaces nor ".
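A minimal sketch of that idea as a reusable filter, using plain grep -o (no -P needed for this pattern) and an awk step that drops repeated variables while keeping first-seen order. The function name list_vars is hypothetical, and like the commands above it does not list variables nested inside another expansion (such as $i inside ${items[$i]}) separately.

```shell
#!/usr/bin/env bash
# list_vars is a hypothetical helper name for this sketch.
# It prints each $-token once, in order of first appearance.
list_vars() {
  grep -o '\$[^ "]*' | awk '!seen[$0]++'
}

# Single quotes keep the shell from expanding the variables in the sample.
printf '%s\n' 'echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and $text again, number $((i+1))"' | list_vars
```

Unlike uniq, the awk filter removes repeats even when they are not adjacent, and unlike sort -u it preserves the original order.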

bash regex multiple match in one line

I'm trying to process my text.
For example, I have:
asdf asdf get.this random random get.that
get.it this.no also.this.no
My desired output is:
get.this get.that
get.it
So the regexp should catch only this pattern (get.\w), but it has to match repeatedly because there can be multiple occurrences in one line, so the easiest way with sed
sed 's/.*(REGEX).*/\1/'
does not work (it shows only one occurrence per line).
Probably the best way is to use grep -o, but I have an old version of grep and the -o flag is not available.
This grep may give what you need:
grep -o "get[^ ]*" file
Try awk:
awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
You might need to tweak the regex between the slashes for your specific issue. Sample output:
$ awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
get.this
get.that
get.it
With awk:
awk -v patt="^get" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
printf "%s%s", $i, OFS;
print ""
}' <<< "$text"
bash
while read -a words; do
for word in "${words[@]}"; do
if [[ $word == get* ]]; then
echo -n "$word "
fi
done
echo
done <<< "$text"
perl
perl -lane 'print join " ", grep {$_ =~ /^get/} @F' <<< "$text"
This might work for you (GNU sed):
sed -r '/\bget\.\S+/{s//\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1 /g;s/ $//}' file
or if you want one per line:
sed -r '/\n/!s/\bget\.\S+/\n&\n/g;/^get/P;D' file
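For a grep without -o, another option is an awk loop over match(), which is available in POSIX awk. The sketch below prints all matches of a line on one output line, as in the desired output; the function name matches_per_line is hypothetical, and the pattern has no word-boundary check, so a word such as target.x would also yield get.x.

```shell
#!/usr/bin/env bash
# matches_per_line is a hypothetical helper name for this sketch.
# For every input line it prints all get.<word> matches, space-separated,
# and prints nothing for lines without a match.
matches_per_line() {
  awk '{
    rest = $0; out = ""
    # match() sets RSTART and RLENGTH; keep scanning the text after each hit
    while (match(rest, /get\.[A-Za-z0-9_]+/)) {
      out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (out != "") print out
  }' "$@"
}

printf '%s\n' \
  'asdf asdf get.this random random get.that' \
  'get.it this.no also.this.no' | matches_per_line
```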

Extract all numbers from a text file and store them in another file

I have a text file which has lots of lines. I want to extract all the numbers from that file.
The file contains text and numbers, and each line contains only one number.
How can I do it using sed or awk in a bash script?
I tried
#! /bin/bash
sed 's/\([0-9.0-9]*\).*/\1/' <myfile.txt >output.txt
but this didn't work.
grep can handle this:
grep -Eo '[0-9\.]+' myfile.txt
-o tells grep to print only the matches, and [0-9\.]+ is a regular expression that matches runs of digits and periods.
To put all numbers on one line and save them in output.txt:
echo $(grep -Eo '[0-9\.]+' myfile.txt) >output.txt
Text files should normally end with a newline character. The use of echo above ensures that this happens.
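Note that a character class such as [0-9\.]+ also matches a period standing on its own, for example the full stop at the end of a sentence. If that matters for your data, a slightly stricter pattern is one option. The sketch below is an assumption about what counts as a number (unsigned integers and simple decimals only), and the function name extract_numbers is hypothetical.

```shell
#!/usr/bin/env bash
# extract_numbers is a hypothetical helper name for this sketch.
# Matches digits, optionally followed by a period and more digits,
# so a lone "." is not reported as a number.
# Assumption: no signs, exponents, or thousands separators are needed.
extract_numbers() {
  grep -Eo '[0-9]+(\.[0-9]+)?' "$@"
}

printf 'price is 12.50 dollars.\nroom 7\nno number here.\n' | extract_numbers
```

The function reads stdin when called without arguments and accepts file names otherwise, e.g. extract_numbers myfile.txt >output.txt.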
Non-GNU grep:
If your grep does not support the -o flag, try:
echo $(tr ' ' '\n' <myfile.txt | grep -E '[0-9\.]+') >output.txt
This uses tr to replace all spaces with newlines (so each number appears separately on a line) and then uses grep to search for numbers.
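tr -sc '0-9.' ' ' <"$file"
This transforms every run of characters that are neither digits nor periods into a single space. Note that tr reads only from standard input, so the file has to be redirected into it.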
You can also use Bash:
while IFS= read -r line; do
if [[ $line =~ [0-9.]+ ]]; then
echo "${BASH_REMATCH[0]}"
fi
done <myfile.txt >output.txt