Get all variables in bash from text line - regex

Suppose I have a text line like
echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and this number is $((i+1))"
I need to extract all the variables (for example, using sed) so that I end up with a list containing: $text, ${items[$i]}, $i, ${#items[@]}, $((i+1)).
I am writing a script that runs some complex commands, and before executing each command it shows the command to the user. So when my script shows a command like "pacman -S ${softtitles[$i]}", you can't guess what the command actually does. I just want to add, below the command, a list of the variables it uses and their values. So I decided to do it with regex and sed, but I can't get it right :/
UPD: It can be just a string like echo "$test is 'ololo', ${items[$i]} is 'today', $i is 3"; it doesn't need to be a list at all, and it can include any temporary variables and multiple lines of code. Also, it doesn't have to be sed :)
SOLUTION:
echo $m | grep -oP '(?<!\[)\$[{(]?[^"\s\/\047.\\]+[})]?' | uniq > vars
$m - our line of code with several bash variables, like "This is $string with ${some[$i]} variables"
uniq - if the string contains the same variable several times, this removes the duplicates (uniq only drops adjacent repeats, so use sort -u if the repeats may be far apart)
vars - temporary file to hold all variables found in the text string
The next piece of code shows all variables and their values in a fancy style:
if [ ! "`cat vars`" == "" ]; then
while read -r p; do
value=`eval echo $p`
Style=`echo -e "$Style\n\t$Green$p = $value$Def"`
done < vars
fi
$Style - predefined variable with some text (title of the command)
$Green, $Def - just tput settings of color (green -> text -> default)
Green=`tput setaf 2`
Def=`tput sgr0`
$p - each line of vars file (all variables one by one) looped by while read -r p loop.
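A side note on the display step: eval runs whatever text ends up in vars, so for plain $name tokens a lookup through bash indirect expansion (${!name}) may be a safer sketch. show_simple_vars is a hypothetical helper, not part of the script above, and it deliberately skips anything that is not a bare variable name (array elements, arithmetic and so on would still need separate handling).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): print "token = value"
# for tokens of the form $name, using indirect expansion instead of eval.
show_simple_vars() {
  local _tok _nm   # odd names to reduce the chance of shadowing a caller variable
  for _tok in "$@"; do
    _nm=${_tok#\$}                      # strip the leading $
    if [[ $_nm =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
      printf '%s = %s\n' "$_tok" "${!_nm}"
    else
      printf '%s = (skipped: not a plain variable name)\n' "$_tok"
    fi
  done
}

text=hello
show_simple_vars '$text' '${items[$i]}'
```

With the assignments above this should print the value of $text and a "skipped" line for the array element.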

You could simply use the below grep command,
$ grep -oP '(?<!\[)(\$[^"\s]+)' file
$text
${items[$i]}
${#items[@]}
$((i+1))

I'm not sure it's perfect, but it should help you:
sed -r 's/(\$[^ "]+)/\n\1\n/g' filename | sed -n '/^\$/p'
Explanation:
(\$[^ "]+) - Matches the character $ followed by any characters up to the next space or double quote.
\n\1\n - Puts a newline before and after each match (so each variable ends up on its own line).
/^\$/p - Prints only the lines that start with $, i.e. the variables.

A few approaches; I tested each of them on a file that contains
echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and this number is $((i+1))"
grep
$ grep -oP '\$[^ "]*' file
$text
${items[$i]}
${#items[@]}
$((i+1))
perl
$ perl -ne '@f=(/\$[^ "]*/g); print "@f"' file
$text ${items[$i]} ${#items[@]} $((i+1))
or
$ perl -ne '@f=(/\$[^ "]*/g); print join "\n",@f' file
$text
${items[$i]}
${#items[@]}
$((i+1))
$((i+1))
The idea is the same in all of them. They will collect the list of strings that start with a $ and as many subsequent characters as possible that are neither spaces nor ".
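The same idea can also be sketched in plain bash, without grep or perl, by matching repeatedly with =~ and cutting off what has already been matched. list_vars is a hypothetical name used only for this illustration, and the pattern is the same rough one as above ($ followed by anything that is not a space or a double quote), so it inherits the same limitations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every "$..." token found in the string $1,
# one per line, using the same pattern as the grep/perl commands above.
list_vars() {
  local rest=$1
  local re='\$[^ "]*'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[0]}"
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

line='echo -e "$text is now set for ${items[$i]} and count is ${#items[@]} and this number is $((i+1))"'
list_vars "$line"
```

Because each pass removes at least the matched $ from the remaining text, the loop terminates even when a lone $ is followed by a space.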


Find multi-line text & replace it, using regex, in shell script

I am trying to find a pattern of two consecutive lines, where the first line is a fixed string and the second contains a substring I'd like to replace.
This is to be done in sh or bash on macOS.
If I had a regex tool at hand that would operate on the entire text, this would be easy for me. However, all I find is bash's simple text replacement - which doesn't work with regex, and sed, which is line oriented.
I suspect that I can use sed in a way where it first finds a matching first line, and only then looks to replace the following line if its pattern also matches, but I cannot figure this out.
Or are there other tools present on macOS that would let me do a regex-based search-and-replace over an entire file or a string? Maybe with Python (v2.7 and v3 is installed)?
Here's a sample text and how I'd like it modified:
keyA
value:474
keyB
value:474 <-- only this shall be replaced (follows "keyB")
keyC
value:474
keyB
value:474
Now, I want to find all occurrences where the first line is "keyB" and the following one is "value:474", and then replace that second line with another value, e.g. "value:888".
As a regex that ignores line separators, I'd write this:
Search: (\bkeyB\n\s*value):474
Replace: $1:888
So, basically, I find the pattern before the 474, and then replace it with the same pattern plus the new number 888, thereby preserving the original indentation (which is variable).
You can use
sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
# Or, to replace the contents of the file inline in FreeBSD sed:
sed -i '' -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
Details:
/keyB$/ - finds all lines that end with keyB
n - prints the current pattern space (unless -n is in effect) and then replaces it with the next line of input
s/\(.*\):[0-9]*/\1:888/ - find any text up to the last : + zero or more digits capturing that text into Group 1, and replaces with the contents of the group and :888.
The {...} create a block that is executed only once the /keyB$/ condition is met.
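To see the block at work, here is a small sketch that feeds sample lines through the same sed program on stdin. replace_after_key is a hypothetical wrapper name, and the key (keyB) and the new value (888) are hard-coded exactly as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed program above; reads stdin, writes stdout.
replace_after_key() {
  sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}'
}

# Only the value line that directly follows keyB should change;
# the indentation in front of "value" is kept by the capture group.
printf '%s\n' 'keyA' '  value:474' 'keyB' '  value:474' 'keyC' '  value:474' |
  replace_after_key
```

The line after keyA and the line after keyC are left alone because the substitution only runs inside the block guarded by /keyB$/.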
Use a perl one-liner with -0777 to scan over multiple lines:
$ # inline edit:
$ perl -0777 -i -pe 's/(\bkeyB\s*value):\d*/$1:888/g' file.txt
$ # to stdout:
$ cat file.txt | perl -0777 -pe 's/(\bkeyB\s*value):\d*/$1:888/g'
In plain bash:
#!/bin/bash
keypattern='^[[:blank:]]*keyB$'
valpattern='(.*):'
replacement=888
while read -r; do
printf '%s\n' "$REPLY"
if [[ $REPLY =~ $keypattern ]]; then
read -r
if [[ $REPLY =~ $valpattern ]]; then
printf '%s%s\n' "${BASH_REMATCH[0]}" "$replacement"
else
printf '%s\n' "$REPLY"
fi
fi
done < file

Pattern matching in if statement in bash

I'm trying to count the words with at least two vowels in all the .txt files in the directory. Here's my code so far:
#!/bin/bash
wordcount=0
for i in $HOME/*.txt
do
cat $i |
while read line
do
for w in $line
do
if [[ $w == .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
then
wordcount=`expr $wordcount + 1`
echo $w ':' $wordcount
else
echo "In else"
fi
done
done
echo $i ':' $wordcount
wordcount=0
done
Here is my sample from a txt file
Last modified: Sun Aug 20 18:18:27 IST 2017
To remove PPAs
sudo apt-get install ppa-purge
sudo ppa-purge ppa:
The problem is it doesn't match the pattern in the if statement for all the words in the text file. It goes directly to the else statement. And secondly, the wordcount in echo $i ':' $wordcount is equal to 0 which should be some value.
Immediate Issue: Glob vs Regex
[[ $string = $pattern ]] doesn't perform regex matching; instead, it's a glob-style pattern match. While . means "any character" in regex, it matches only itself in glob.
You have a few options here:
Use =~ instead to perform regular expression matching:
[[ $w =~ .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
Use a glob-style expression instead of a regex:
[[ $w = *[aeiouAEIOU]*[aeiouAEIOU]* ]]
Note the use of = rather than == here; while either is technically valid, the former avoids building finger memory that would lead to bugs when writing code for a POSIX implementation of test / [, as = is the only valid string comparison operator there.
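A quick way to convince yourself of the difference is to wrap both forms in small test functions and run a few words through them. The function names below are made up for this sketch; the patterns are the ones from the two options above, except that the regex drops the leading and trailing .* because =~ is not anchored anyway.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the names are not from the answer.

# Glob match: * means "any string", [...] is a character set.
has_two_vowels_glob() {
  [[ $1 = *[aeiouAEIOU]*[aeiouAEIOU]* ]]
}

# Regex match: =~ matches anywhere in the string, so no surrounding .* is needed.
has_two_vowels_regex() {
  local re='[aeiouAEIOU].*[aeiouAEIOU]'
  [[ $1 =~ $re ]]
}

for w in sudo ppa apt-get; do
  if has_two_vowels_glob "$w" && has_two_vowels_regex "$w"; then
    echo "$w: at least two vowels"
  else
    echo "$w: fewer than two vowels"
  fi
done
```

Keeping the regex in a variable avoids quoting surprises, since a quoted right-hand side of =~ is treated as a literal string.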
Larger Issue: Properly Reading Word-By-Word
Using for w in $line is innately unsafe. Use read -a to read a line into an array of words:
#!/usr/bin/env bash
wordcount=0
for i in "$HOME"/*.txt; do
while read -r -a words; do
for word in "${words[@]}"; do
if [[ $word = *[aeiouAEIOU]*[aeiouAEIOU]* ]]; then
(( ++wordcount ))
fi
done
done <"$i"
printf '%s: %s\n' "$i" "$wordcount"
wordcount=0
done
Try:
awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
Sample output looks like:
$ awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
one.txt:1
sample.txt:9
How it works:
/[aeiouAEIOU].*[AEIOUaeiou]/{n++}
Every time we find a word with two vowels, we increment variable n.
ENDFILE{print FILENAME":"n; n=0}
At the end of each file, we print the name of the file and the 2-vowel word count n. We then reset n to zero.
RS='[[:space:]]'
This tells awk to use any whitespace as a word separator. This makes each word into a record. Awk reads the input one record at a time.
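Note that ENDFILE is a GNU awk extension. If gawk is not available, roughly the same per-file count can be sketched with tr and grep. count_two_vowel_words is a hypothetical name, and the sketch assumes words are separated by whitespace only, as in the awk version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each file argument, print "file:count", where count
# is the number of whitespace-separated words containing at least two vowels.
count_two_vowel_words() {
  local f n
  for f in "$@"; do
    # tr puts every word on its own line; grep -c counts the matching lines.
    n=$(tr -s '[:space:]' '\n' < "$f" | grep -c '[aeiouAEIOU].*[aeiouAEIOU]')
    printf '%s:%s\n' "$f" "$n"
  done
}

# Small demonstration on a temporary file.
sample=$(mktemp)
printf 'To remove PPAs\nsudo apt-get install ppa-purge\n' > "$sample"
count_two_vowel_words "$sample"
rm -f "$sample"
```

In a real script it would be called as count_two_vowel_words "$HOME"/*.txt.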
Shell issues
The use of awk avoids a multitude of shell issues. For example, consider the line for w in $line. This will not work the way you hope. Consider a directory with these files:
$ ls
one.txt sample.txt
Now, let's take line='* Item One' and see what happens:
$ line='* Item One'
$ for w in $line; do echo "w=$w"; done
w=one.txt
w=sample.txt
w=Item
w=One
The shell treats the * in line as a wildcard and expands it into a list of files. Odds are you didn't want this. The awk solution avoids a variety of issues like this.
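If you do stay in bash, reading the line into an array with read -a splits it into words without filename expansion. A minimal sketch, with split_words as a made-up name for the illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a line into words with read -a, which performs
# word splitting but no wildcard (glob) expansion.
split_words() {
  local -a words
  read -r -a words <<< "$1"
  printf 'w=%s\n' "${words[@]}"
}

split_words '* Item One'
```

Here the * stays a literal asterisk no matter which files are in the current directory.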
Using grep - this is pretty simple to do.
#!/bin/bash
wordcount=0
for file in ./*.txt
do
count=`cat $file | xargs -n1 | grep -ie "[aeiou].*[aeiou]" | wc -l`
wordcount=`expr $wordcount + $count`
done
echo $wordcount

Bash replace '\n\n}' string in file

I've got files that repeatedly contain the string \n\n} and I need to replace each occurrence with \n} (removing one of the two newlines).
Since the files are generated dynamically by a bash script, I need to embed the replacement code inside the script.
I tried the following commands, but none of them does the job:
cat file.tex | sed -e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | perl -p00e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | awk -v RS="" '{gsub (/\n\n}/, "\nb")}1' # it does work, but not for large files
You didn't provide any sample input and expected output so it's a guess but maybe this is what you're looking for:
$ cat file
a
b
c
}
d
$ awk '/^$/{f=1;next} f{if(!/^}/)print "";f=0} 1' file
a
b
c
}
d
a way with sed:
sed -i -n ':a;N;$!ba;s/\n\n}/\n}/g;p' file.tex
details:
:a # defines the label "a"
N # append the next line to the pattern space
$!ba # if it is not the last line, go to label a
s/\n\n}/\n}/g # replace all \n\n} with \n}
p # print
The -i option edits the file in place.
The -n option suppresses the automatic printing of the pattern space.
This Perl command will do as you ask
perl -i -0777 -pe's/\n(?=\n})//g' file.tex
This should work:
cat file.tex | sed -e 's/\\n\\n}/\\n}/g'
if \n\n} is written as raw string.
Or if it's new line:
cat file.tex | sed -e ':a;N;$!ba;s/\n\n}/\n}/g'
Another method:
if the first \n is any new line:
text=$(< file.tex)
text=${text//$'\n\n}'/$'\n}'}
printf "%s\n" "$text" #> file
If the first \n is an empty line:
text=$(< file.tex)
text=${text//$'\n\n\n}'/$'\n\n}'}
printf "%s\n" "$text" #> file
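The first variant can be wrapped in a filter that reads stdin, which makes it easy to try on a small sample. strip_blank_before_brace is a hypothetical name. The pattern and replacement are kept in variables here only to keep the quoting inside ${...} simple. Like the snippets above, it loads the whole input into memory, and $(...) drops trailing newlines before printf adds a single one back.

```shell
#!/usr/bin/env bash
# Hypothetical filter: remove one newline from every "\n\n}" sequence on stdin.
strip_blank_before_brace() {
  local text pat=$'\n\n}' rep=$'\n}'
  text=$(cat)                      # slurp stdin (trailing newlines are dropped)
  text=${text//"$pat"/$rep}        # literal, non-regex replacement of all matches
  printf '%s\n' "$text"
}

printf 'a\nb\n\n}\nc\n' | strip_blank_before_brace
```

For the sample input this should print a, b, } and c on consecutive lines, with the blank line before the brace gone.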
*nix-style line filters process the file line by line. Thus, you have to do something extra to process an expression which spans lines.
As mentioned by others, '\n\n' is simply an empty line and matches the regular expression /^$/. Perhaps the most efficient thing to do is to save each empty line until you know whether or not the next one will contain a close bracket at the beginning of the line.
cat file.tex | perl -ne 'if ( $b ) { print $b unless m/^\}/; undef $b; } if ( m/^$/ ) { $b=$_; } else { print; } END { print $b if $b; }'
And to clean it all up we add an END block, to process the case that the last line in the file is blank (and we want to keep it).
If you have access to node you can use rexreplace
npm install -g rexreplace
and then run
rexreplace '\n\n\}' '\n\}' myfile.txt
Or, if you have more files in a directory named data, you can do
rexreplace '\n\n\}' '\n\}' data/*.txt

Extract all numbers from a text file and store them in another file

I have a text file which has lots of lines. I want to extract all the numbers from that file.
The file contains text and numbers, and each line contains only one number.
How can I do it using sed or awk in a bash script?
I tried
#! /bin/bash
sed 's/\([0-9.0-9]*\).*/\1/' <myfile.txt >output.txt
but this didn't work.
grep can handle this:
grep -Eo '[0-9\.]+' myfile.txt
-o tells to print only the matches and [0-9\.]+ is a regular expression to match numbers.
To put all numbers on one line and save them in output.txt:
echo $(grep -Eo '[0-9\.]+' myfile.txt) >output.txt
Text files should normally end with a newline character. The use of echo above ensures that this happens.
Non-GNU grep:
If your grep does not support the -o flag, try:
echo $(tr ' ' '\n' <myfile.txt | grep -E '[0-9\.]+') >output.txt
This uses tr to replace all spaces with newlines (so each number appears separately on a line) and then uses grep to search for numbers.
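One caveat: [0-9\.]+ also matches a lone period, such as the full stop at the end of a sentence. If that matters for your data, a slightly stricter pattern may help. The sketch below assumes numbers look like 12 or 12.50 (no sign, no exponent), and extract_numbers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print each number found on stdin on its own line.
# Assumes unsigned integers or decimals such as 7 or 12.50.
extract_numbers() {
  grep -Eo '[0-9]+(\.[0-9]+)?'
}

printf 'price is 12.50 today.\nweight: 7 kg\n' | extract_numbers
```

The trailing period after "today" is not reported, because the pattern requires digits on both sides of a dot.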
tr -sc '0-9.' ' ' < "$file"
This transforms every run of characters that are not digits or periods into a single space. Note that tr reads only from standard input, hence the redirection.
You can also use Bash:
while read line; do
if [[ $line =~ [0-9\.]+ ]]; then
echo $BASH_REMATCH
fi
done <myfile.txt >output.txt

How can I assign the match of my regular expression to a variable?

I have a text file with various entries in it. Each entry ends with a line consisting entirely of asterisks.
I'd like to use shell commands to parse this file and assign each entry to a variable. How can I do this?
Here's an example input file:
***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********
Here is what my solution looks like so far:
#!/bin/bash
for error in `python example.py | sed -n '/.*/,/^\**$/p'`
do
echo -e $error
echo -e "\n"
done
However, this just assigns each word in the matched text to $error, rather than a whole block.
I'm surprised not to see a native bash solution here. Yes, bash has regular expressions. You can find plenty of documentation online, particularly if you include "bash_rematch" in your query, or just look at the man pages. Here's a silly example, taken from elsewhere and slightly modified, which prints the whole match and each of the captured groups for a regular expression.
if [[ $str =~ $regex ]]; then
echo "$str matches"
echo "matching substring: ${BASH_REMATCH[0]}"
i=1
n=${#BASH_REMATCH[*]}
while [[ $i -lt $n ]]
do
echo " capture[$i]: ${BASH_REMATCH[$i]}"
let i++
done
else
echo "$str does not match"
fi
The important bit is that the extended test [[ ... ]], using its regex comparison operator =~, stores the entire match in ${BASH_REMATCH[0]} and the captured groups in ${BASH_REMATCH[i]} for i >= 1.
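As a concrete instance, here is a small sketch with two capture groups. The function name parse_kv and the key:number input format are assumptions made up for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical example: split "key:number" with two capture groups.
parse_kv() {
  local re='^[[:blank:]]*([A-Za-z]+):([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    # [0] is the whole match, [1] and [2] are the parenthesized groups.
    printf 'whole=%s key=%s number=%s\n' \
      "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

parse_kv 'value:474'
```

The function returns a non-zero status when the input does not match, so it can be used directly in an if.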
If you want to do it in Bash, you could do something like the following. It uses globbing instead of regexps (the extglob shell option enables extended pattern matching, so that we can match a line consisting only of asterisks).
#!/bin/bash
shopt -s extglob
entry=""
while read line
do
case $line in
+(\*))
# do something with $entry here
entry=""
;;
*)
entry="$entry$line
"
;;
esac
done
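If the goal is to keep every entry for later use, the same loop can collect them into an array. This is only a sketch: read_entries and the global array entries are hypothetical names, and it uses a regex for the separator line instead of extglob.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin and store each block between asterisk-only
# lines as one element of the global array "entries".
read_entries() {
  local line entry='' nl=$'\n' sep='^\*+$'
  entries=()
  while IFS= read -r line; do
    if [[ $line =~ $sep ]]; then
      # Separator line: store the entry collected so far, if any.
      if [[ -n $entry ]]; then entries+=("$entry"); fi
      entry=''
    else
      # Join the lines of one entry with a newline.
      entry+=${entry:+$nl}$line
    fi
  done
  # Keep a trailing entry that has no closing separator.
  if [[ -n $entry ]]; then entries+=("$entry"); fi
}

read_entries < <(printf '%s\n' '***********' 'Field1' '***********' \
  'Lorem ipsum' 'Data to match' '***********')
printf 'entry: [%s]\n' "${entries[@]}"
```

Feed it with a redirection such as read_entries < <(python example.py) rather than a pipe, because a pipe would run the function in a subshell and the array would be lost.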
Try putting double quotes around the command.
#!/bin/bash
for error in "`python example.py | sed -n '/.*/,/^\**$/p'`"
do
echo -e $error
echo -e "\n"
done
depending on what you want to do with the variables
awk '
f && /\*/{print "variable:"s;f=0}
/\*/{ f=1 ;s="";next}
f{
s=s" "$0
}' file
output:
# ./test.sh
variable: Field1
variable: Lorem ipsum Data to match
variable: More data Still more data
the above just prints them out. if you want, store in array for later use...eg array[++d]=s
Splitting records in (ba)sh is not so easy, but can be done using IFS to split on single characters (simply set IFS='*' before your for loop, but this generates multiple empty records and is problematic if any record contains a '*'). The obvious solution is to use perl or awk and use RS to split your records, since those tools provide better mechanisms for splitting records. A hybrid solution is to use perl to do the record splitting, and have perl call your bash function with the record you want. For example:
#!/bin/bash
foo() {
echo record start:
echo "$@"
echo record end
}
export -f foo
perl -e "$/='********'; while(<>){chomp;system( \"foo '\$_'\" )}" << 'EOF'
this is a 2-line
record
********
the 2nd record
is 3 lines
long
********
a 3rd * record
EOF
This gives the following output:
record start:
this is a 2-line
record
record end
record start:
the 2nd record
is 3 lines
long
record end
record start:
a 3rd * record
record end