I want to pre- and postfix an array in bash similar to brace expansion.
Say I have a bash array
ARRAY=( one two three )
I want to be able to pre- and postfix it like the following brace expansion
echo prefix_{one,two,three}_suffix
The best I've been able to find uses bash regex to either add a prefix or a suffix
echo ${ARRAY[@]/#/prefix_}
echo ${ARRAY[@]/%/_suffix}
but I can't find anything on how to do both at once. Potentially I could use regex captures and do something like
echo ${ARRAY[@]/.*/prefix_$1_suffix}
but it doesn't seem like captures are supported in bash variable regex substitution. I could also store a temporary array variable like
PRE=(${ARRAY[@]/#/prefix_})
echo ${PRE[@]/%/_suffix}
This is probably the best I can think of, but it still seems subpar. A final alternative is to use a for loop akin to
EXPANDED=""
for E in ${ARRAY[@]}; do
EXPANDED="prefix_${E}_suffix $EXPANDED"
done
echo $EXPANDED
but that is super ugly. I also don't know how I would get it to work if I wanted spaces anywhere in the prefix, the suffix, or the array elements.
Bash pattern substitution (${var/pattern/replacement}) doesn't use regexes. The pattern is just a shell glob, as described in section 3.5.8.1, Pattern Matching, of the bash manual.
Your two-step solution is cool, but it needs some quotes for whitespace safety:
ARR_PRE=("${ARRAY[@]/#/prefix_}")
echo "${ARR_PRE[@]/%/_suffix}"
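A quick way to confirm that the quoted two-step form is whitespace-safe is to feed it an element containing a space and print one element per line. The sample array and the ARR_BOTH name are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample data (assumption): the middle element contains a space.
ARRAY=( "one" "two words" "three" )

# Step 1: "/#/" anchors an empty pattern at the start, so this prepends.
ARR_PRE=( "${ARRAY[@]/#/prefix_}" )
# Step 2: "/%/" anchors an empty pattern at the end, so this appends.
ARR_BOTH=( "${ARR_PRE[@]/%/_suffix}" )

# One element per line, bracketed, so embedded spaces are visible.
printf '[%s]\n' "${ARR_BOTH[@]}"
```

Because every expansion is quoted, the array keeps three elements and "two words" is never split.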
You can also do it in some evil way:
eval "something $(printf 'pre_%q_suf ' "${ARRAY[@]}")"
Your last loop could be done in a whitespace-friendly way with:
EXPANDED=()
for E in "${ARRAY[@]}"; do
EXPANDED+=("prefix_${E}_suffix")
done
echo "${EXPANDED[@]}"
Prettier but essentially the same as the loop solution:
$ ARRAY=(A B C)
$ mapfile -t -d $'\0' EXPANDED < <(printf "prefix_%s_postfix\0" "${ARRAY[@]}")
$ echo "${EXPANDED[@]}"
prefix_A_postfix prefix_B_postfix prefix_C_postfix
mapfile reads lines into elements of an array. With -d $'\0' it instead reads NUL-delimited strings, and -t strips the delimiter from each element. See help mapfile. (The -d option requires bash 4.4 or later.)
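The same pipeline can be written with -d '', shown here as a minimal sketch assuming bash 4.4 or later: bash strings cannot hold a NUL byte, so $'\0' expands to the empty string anyway, and mapfile treats an empty delimiter as NUL. The sample data is hypothetical.

```bash
#!/usr/bin/env bash
# Sample data (assumption): one element contains a space.
ARRAY=( "A" "B C" )

# printf reuses its format for every argument and ends each result with a NUL byte.
# -d '' tells mapfile to split on NUL; -t drops the NUL from each element.
mapfile -t -d '' EXPANDED < <(printf 'prefix_%s_postfix\0' "${ARRAY[@]}")

declare -p EXPANDED
```

Since NUL can never occur inside a bash string, this split is safe for any element content, including newlines.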
For arrays:
ARRAY=( one two three )
(IFS=,; eval echo prefix_\{"${ARRAY[*]}"\}_suffix)
For strings:
STRING="one two three"
eval echo prefix_\{${STRING// /,}\}_suffix
eval causes its arguments to be evaluated twice. In both cases the first evaluation results in
echo prefix_{one,two,three}_suffix
and the second executes it.
In the array case, a subshell is used to avoid overwriting IFS.
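To see the two evaluation steps separately, you can replace eval with a second echo and look at the command line that would be evaluated. This sketch assumes no element contains spaces, commas or other characters the shell would reinterpret; CMD and RESULT are illustrative names.

```bash
#!/usr/bin/env bash
ARRAY=( one two three )

# First evaluation only: the subshell joins the elements with "," and the
# escaped braces survive as literal text.
CMD=$(IFS=,; echo echo prefix_\{"${ARRAY[*]}"\}_suffix)
echo "$CMD"        # echo prefix_{one,two,three}_suffix

# Second evaluation: eval parses CMD again, so the braces now expand.
RESULT=$(eval "$CMD")
echo "$RESULT"     # prefix_one_suffix prefix_two_suffix prefix_three_suffix
```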
You can also do this in zsh:
echo ${${ARRAY[@]/#/prefix_}/%/_suffix}
Perhaps this would be the most elegant solution:
$ declare -a ARRAY=( one two three )
$ declare -p ARRAY
declare -a ARRAY=([0]="one" [1]="two" [2]="three")
$
$ IFS=$'\n' ARRAY=( $(printf 'prefix %s_suffix\n' "${ARRAY[@]}") )
$
$ declare -p ARRAY
declare -a ARRAY=([0]="prefix one_suffix" [1]="prefix two_suffix" [2]="prefix three_suffix")
$
$ printf '%s\n' "${ARRAY[@]}"
prefix one_suffix
prefix two_suffix
prefix three_suffix
$
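By putting IFS=$'\n' in front of the array reassignment, the output of the command substitution is split only at newlines, so spaces in the prefix, the suffix and the array elements are all preserved. Be aware that because this line consists only of assignments, the IFS change is not limited to it: IFS stays set to a newline in the current shell afterwards, so save and restore it (or use a subshell) if later code relies on the default. The unquoted $(...) is also subject to pathname expansion, so elements containing *, ? or [ may be expanded as globs.
Using printf is rather handy here, because it reapplies the format string (its first argument) to each of the remaining arguments in turn.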
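One way to check how far the IFS change reaches is to inspect IFS after an assignment-only line, and to use an explicit save and restore when the change should stay local. This is a sketch; SAVED_IFS, LEAK and LEAKED are names invented for the demo.

```bash
#!/usr/bin/env bash
ARRAY=( one two three )

# Explicit save/restore keeps the newline IFS confined to one assignment.
SAVED_IFS=$IFS
IFS=$'\n'
ARRAY=( $(printf 'prefix %s_suffix\n' "${ARRAY[@]}") )
IFS=$SAVED_IFS
declare -p ARRAY

# The one-line form has no command name, so its IFS assignment persists.
IFS=$'\n' LEAK=( $(printf '%s\n' 'a b' c) )
LEAKED=no
if [[ $IFS == $'\n' ]]; then
  LEAKED=yes
fi
echo "IFS still a newline after the one-liner: $LEAKED"
IFS=$SAVED_IFS   # undo the leak
```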
I had exactly the same question, and I came up with the following solution using sed's word-boundary matching:
myarray=( one two three )
newarray=( $(echo ${myarray[*]}|sed "s/\(\b[^ ]\+\)/pre-\1-post/g") )
echo ${newarray[@]}
> pre-one-post pre-two-post pre-three-post
echo ${#newarray[@]}
> 3
Waiting for more elegant solutions...
Related
I'm parsing a document with a bash script and outputting different parts of it. At one point I need to find and reformat text in the form of:
(foo)[X]
[Y]
(bar)[Z]
to something like:
X->foo
Y
Z->bar
Now, I'm able to grep the parts I want with RegEx, but I'm having trouble swapping the two elements in one line and handling the fact that the text in parentheses is optional. Is this even possible with a combination of sed and grep?
Thank you for your time.
You can use sed:
sed -e 's/(\([^)]*\))\[\([^]]*\)]/\2->\1/' -e 's/\[\([^]]*\)]/\1/' file
This works for your given input example:
X->foo
Y
Z->bar
You might need to make the patterns more strict if you have more kinds of input to handle.
You can use awk:
awk -F '[][()]+' '{print (NF>3 ? $3 "->" $2 : $2)}' file
X->foo
Y
Z->bar
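To see why NF>3 separates the two cases, it helps to print the fields awk produces with that separator: a leading or trailing bracket yields an empty field. A small illustrative sketch on two of the sample lines (OUT and LINES are illustrative names):

```bash
#!/usr/bin/env bash
# Dump NF and every field for each input line.
OUT=$(printf '%s\n' '(foo)[X]' '[Y]' |
  awk -F '[][()]+' '{
    printf "NF=%d", NF
    for (i = 1; i <= NF; i++) printf " $%d=\"%s\"", i, $i
    print ""
  }')
printf '%s\n' "$OUT"

# Keep the lines in an array for inspection.
mapfile -t LINES <<< "$OUT"
```

With parentheses present there are four fields (name in $2, bracket text in $3); without them only three, with the bracket text in $2.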
You can even do it in bash itself, although it's not pretty.
# Three capture groups:
# 1. The optional parenthesized text
# 2. The contents of the parentheses
# 3. The contents of the square brackets
regex="(\((.*)\))?\[(.*)\]"
while IFS= read -r str; do
[[ "$str" =~ $regex ]]
# If the 2nd array element is not empty, print -> followed by the
# non-empty value.
echo "${BASH_REMATCH[3]}${BASH_REMATCH[2]:+->${BASH_REMATCH[2]}}"
done < file.txt
I'm having problems with sed and back-referencing when using variables that contain regexes.
It is a parser written in bash. At a very early point, I want to use sed to clean every line into the needed data: the indentation, a key and a value (colon-separated). The data is similar to YAML but uses an equals sign.
A basic example of the data:
overview = peparing 2016-10-22
license= sorted 2015-11-01
The function I'm having problems with does the logic in a while loop:
function prepare_parsing () {
local file=$1
# regex components:
local s='[[:space:]]*' \
w='[a-zA-Z0-9_]*' \
fs=':'
# regexes(NoQuotes, SingleQuotes, DoubleQuotes):
local searchNQ='^('$s')('$w')'$s'='$s'(.*)'$s'$' \
searchSQ='^('$s')('$w')'$s'='$s\''(.*)'\'$s'\$' \
searchDQ='^('$s')('$w')'$s'='$s'"(.*)"'$s'\$' \
replace="\1$fs\2$fs\3"
while IFS="$fs" read -r indentation key value; do
...
SOME CUSTOM LOGIC
...
done < <(sed -n "s/${searchNQ}/${replace}/p" $file)
}
When I call the function, I get the well-known invalid reference error for \3: invalid reference \3 on `s' command's RHS
To debug this, after the vars definition, I've printed their values using the printf and the %q option.
printf "%q\n" $searchNQ $searchSQ $searchDQ $replace
Getting these values:
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\(.\*\)\[\[:space:\]\]\*\$
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\'\(.\*\)\'\[\[:space:\]\]\*\\\$
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\"\(.\*\)\"\[\[:space:\]\]\*\\\$
$'\\1\034\\2\034\\3'
And maybe that's the problem: the excessive escaping when the shell (bash) expands the variables (for example, it seems to escape the *, the [], ...).
If I pass the -r option to sed, it works perfectly, but I have to avoid it, since the system that will run the script doesn't have that sed implementation: I have to use basic sed.
Do you have any idea on how to store the regex into variables and make them usable for the backreferencing on the RHS?
It works in these two cases:
When using a plain regex string:
sed -n "s/^\([[:space:]]*\)\([a-zA-Z0-9_]*\)[[:space:]]*=[[:space:]]*\(.*\)[[:space:]]*\$/\1:\2:\3/p" $file
And when I use just the vars s, w and fs:
sed -n "s/^\($s\)\($w\)$s=$s\(.*\)$s\$/\1$fs\2$fs\3/p" $file
Many thanks for the help!
perl, which supports extended regexps, may be used instead of sed; to print only the lines where the substitution succeeded (like sed -n ... p), use
perl -n -e "print if s/${searchNQ}/${replace}/" "$file"
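Staying with basic sed is also possible. In a basic regular expression a bare ( only matches a literal parenthesis and grouping is written \( \), so the variables above define no groups at all, which is why sed complains about \3; the backslashes in the printf %q output are only how %q quotes the string for display. A minimal sketch, assuming a POSIX sed used without -r and a made-up sample file, with the groups escaped inside the variable (KEYS and VALUES are illustrative names):

```bash
#!/usr/bin/env bash
# Hypothetical input file built from the question's sample data.
file=$(mktemp)
printf '%s\n' 'overview = peparing 2016-10-22' 'license= sorted 2015-11-01' > "$file"

s='[[:space:]]*'
w='[a-zA-Z0-9_]*'
fs=':'
# Basic-regex groups: \( \) instead of ( ).
searchNQ='^\('$s'\)\('$w'\)'$s'='$s'\(.*\)'$s'$'
replace="\1$fs\2$fs\3"

KEYS=() VALUES=()
while IFS="$fs" read -r indentation key value; do
  KEYS+=("$key")
  VALUES+=("$value")
  printf 'key=%s value=%s\n' "$key" "$value"
done < <(sed -n "s/${searchNQ}/${replace}/p" "$file")

rm -f -- "$file"
```

The expanded searchNQ is the same string as the plain regex that the question reports as working, just assembled from variables.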
I am very new to regex, so I imagine this is quite a simple question that must have been asked several times already, but unfortunately I can't find any of those answers.
Given a directory, I need the list of all of its subdirectories whose names match the pattern "nw=[number].a=[number]", and for every directory I need to retrieve those numbers and do a few things based on them. Some of these directories are nw=82.a=40, nw=100.a=9, etc.
My guess to accomplish this would be
#! /bin/bash
cd $mydir
for dir in `ls | grep nw=[:digit:]+.a=[:digit:]`: do
# retrieve the numbers
# a few things
done
Why doesn't it work, and how could I retrieve the numbers?
Thank you in advance,
Ferdinando
Some corrections on your grep command:
grep -E 'nw=[[:digit:]]+\.a=[[:digit:]]+'
Use the "-E" flag so you can use an extended regex, which includes the '+' operator, for example.
Use double square brackets for the character class: [[:digit:]], not [:digit:]
Escape the period, otherwise it will be used as an operator to match any character
A final '+' was missing from the end. It's not strictly necessary, since grep also matches substrings, but it probably represents your path names better
It is probably good practice to place your regex between quotes (in this case, single quotes will do)
Hope this helps =)
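To also retrieve the numbers without parsing ls output, here is a sketch that lists the directories with a glob and extracts the numbers with bash's own =~ matching. The scratch directory and its sample subdirectories exist only for the demo; FOUND and re are hypothetical names.

```bash
#!/usr/bin/env bash
# Demo setup (assumption): a scratch directory with sample subdirectories.
mydir=$(mktemp -d)
mkdir "$mydir/nw=82.a=40" "$mydir/nw=100.a=9" "$mydir/unrelated"
cd "$mydir" || exit 1

# Keeping the regex in a variable avoids quoting pitfalls inside [[ ]].
re='^nw=([0-9]+)\.a=([0-9]+)/$'
FOUND=()
# The trailing / makes the glob match directories only.
for dir in nw=*.a=*/; do
  if [[ $dir =~ $re ]]; then
    nw=${BASH_REMATCH[1]}
    a=${BASH_REMATCH[2]}
    echo "nw=$nw a=$a"
    FOUND+=("$nw $a")
  fi
done

cd / && rm -rf -- "$mydir"
```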
perl -e '@a=`ls`;m/nw=(\d+)\.a=(\d+)(?{print"$1\t$2\n"})/ for@a'
Enjoy.
Run ls via backticks and store its output, one filename per element, in the array @a.
@a=`ls`;
looking for match
m/
nw=(digits that I capture in $1).a=(digits that I capture in $2)
nw=(\d+)\.a=(\d+)
start evaluation of code from within a pattern
(?{
print first number, tab, second number, newline
print"$1\t$2\n"})
end matching pattern group
/
perform this match attempt with embedded code on each filename (with newlines still appended) in array @a
for@a
Yes, that was cryptic.
Don't parse ls. Use find instead:
find . -maxdepth 1 -type d -regex '.*nw=[0-9]+\.a=[0-9]+.*' | while IFS= read -r dir
do
echo "Found directory: $dir"
if [[ "$dir" =~ nw=([0-9]+)\.a=([0-9]+) ]]
then
echo "numbers are ${BASH_REMATCH[1]} and ${BASH_REMATCH[2]}"
fi
done
Consider the following:
var="text more text and yet more text"
echo $var | egrep "yet more (text)"
It should be possible to get the result of the regex as the string: text
However, I don't see any way to do this in bash with grep or its siblings at the moment.
In perl, php or similar regex engines:
preg_match('/yet more (text)/', 'text more text yet more text', $output);
$output[1] == "text";
Edit: To elaborate on why I can't just use multiple regexes: in the end I will have a regex with several of these groups (shown below), so I need to be able to get all of them. This also rules out lookahead/lookbehind (as they are all variable length).
egrep -i "([0-9]+) +$USER +([0-9]+).+?(/tmp/Flash[0-9a-z]+) "
Example input as requested, straight from lsof (Replace $USER with "j" for this input data):
npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXu8pvMg (deleted)
npviewer. 17875 j 17u REG 8,8 16037387 524273 /tmp/FlashXXIBH29F (deleted)
The end goal is to cp /proc/$var1/fd/$var2 ~/$var3 for every line, which ends up "downloading" Flash files (Flash used to store them in /tmp, but they DRM'd it up).
So far I've got:
#!/bin/bash
regex="([0-9]+) +j +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+)"
echo "npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXYOvS8S (deleted)" |
sed -r -n -e " s%^.*?$regex.*?\$%\1 \2 \3%p " |
while read -a array
do
echo /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
It cuts off the first digits of the first value to return, and I'm not familiar enough with sed to see what's wrong.
End result for downloading flash 10.2+ videos (Including, perhaps, encrypted ones):
#!/bin/bash
lsof | grep "/tmp/Flash" | sed -r -n -e " s%^.+? ([0-9]+) +$USER +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+).*?\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
Edit: look at my other answer for a simpler bash-only solution.
So, here is the solution using sed to fetch the right groups and split them up. You still have to use bash to read them afterwards. (This only works if the groups themselves contain no spaces; otherwise we would have to use a different separator character and set $IFS to it for read.)
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
sed -r -n -e " s%^.*$regex.*\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
Note that I had to adapt your last regex group to allow uppercase letters, and I added a space at the beginning to make sure the whole block of digits is captured. Alternatively, a \b (word boundary) would have worked here, too.
Ah, I forgot to mention that you should pipe the text into this script, like this:
./grep-result.sh < grep-result-test.txt
(provided your files are named like this). Alternatively, you can add < grep-result-test.txt after the sed call (before the |), or prepend the line with cat grep-result-test.txt |.
How does it work?
sed -r -n calls sed in extended-regexp mode, without printing anything automatically.
-e " s%^.*$regex.*\$%\1 \2 \3%p " gives the sed program, which consists of a single s command.
I'm using % instead of the usual / as the delimiter, since / appears inside the regex and I don't want to escape it.
The regex to search for is prefixed by ^.* and suffixed by .*$ so that it matches the whole line (and no leftover parts of the line get printed).
Note that this .* is greedy, so we have to put a space into our regex to keep it from also grabbing the start of the first group of digits.
The replacement text consists of the three parenthesized groups, separated by spaces.
The p flag at the end of the command prints the pattern space after the replacement. Since we matched the whole line, the pattern space consists only of the replacement text.
So, the output of sed for your example input is this:
17875 11 /tmp/FlashXXu8pvMg
17875 17 /tmp/FlashXXIBH29F
This is much more friendly for reuse, obviously.
Now we pipe this output as input to the while loop.
read -a array reads a line from standard input (which is the output from sed, due to our pipe), splits it into words (at spaces, tabs and newlines), and puts the words into an array variable.
We could also have written read var1 var2 var3 instead (preferably with better variable names); then the first two words would go into $var1 and $var2, and $var3 would get the rest.
If read succeeded reading a line (i.e. not end-of-file), the body of the loop is executed:
${array[0]} expands to the first element of the array, and ${array[1]} and ${array[2]} to the second and third.
When the input ends, the loop ends, too.
This isn't possible using grep or another tool called from a shell prompt/script, because a child process can't modify the environment of its parent process. If you're using bash 3.0 or later, you can use in-process regular expressions. The syntax is perl-ish (=~), and the match groups are available via ${BASH_REMATCH[x]}, where x is the group number.
After writing my sed solution, I also wanted to try the pure-bash approach suggested by Mark. It works quite well for me.
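As a minimal sketch of the =~ approach on the question's own example (the variable names whole and first are only illustrative):

```bash
#!/usr/bin/env bash
var="text more text and yet more text"
# Keep the regex in a variable and leave it unquoted after =~;
# a quoted right-hand side is matched as a literal string.
regex='yet more (text)'

if [[ $var =~ $regex ]]; then
  whole=${BASH_REMATCH[0]}   # entire match: "yet more text"
  first=${BASH_REMATCH[1]}   # first capture group: "text"
  echo "$first"
fi
```

Each additional parenthesized group in the regex lands in the next BASH_REMATCH index, which covers the multi-group case from the question.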
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
while read -r
do
if [[ $REPLY =~ $regex ]]
then
echo cp /proc/${BASH_REMATCH[1]}/fd/${BASH_REMATCH[2]} ~/${BASH_REMATCH[3]}
fi
done
(If you upvote this, you should think about also upvoting Mark's answer, since it is essentially his idea.)
As before, pipe the text to be filtered into this script.
How does it work?
As Mark said, the [[ ... ]] conditional construct supports the binary operator =~, which interprets its right operand (after parameter expansion) as an extended regular expression (just as we want) and matches the left operand against it. (Again, we have added a space at the front to avoid matching only the last digit.)
When the regex matches, [[ ... ]] returns 0 (true) and also puts the parts matched by the individual groups (and by the whole expression) into the array variable BASH_REMATCH.
Thus, when the regex matches, we enter the then block and execute the commands there.
Here again ${BASH_REMATCH[1]} accesses an element of the array, namely the one corresponding to the first matched group. ([0] would be the whole matched string.)
Another note: both my scripts accept multi-line input and work on every line that matches. Non-matching lines are simply ignored. If you are inputting only one line, you don't need the loop; a simple if read; then ... or even read && [[ $REPLY =~ $regex ]] && ... would be enough.
echo "$var" | pcregrep -o "(?<=yet more )text"
Well, for your simple example, you can do this:
var="text more text and yet more text"
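echo $var | grep -e "yet more text" | grep -o "text"
(grep -o prints every match on its own line, so this prints text once for each of the three occurrences in the matching line.)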
How can I find the index of a substring that matches a regular expression on Solaris 10?
Assuming that what you want is to find the location of the first match of a wildcard in a string using bash, the following bash function returns just that, or empty if the wildcard doesn't match:
function match_index()
{
local pattern=$1
local string=$2
local result=${string/${pattern}*/}
[ ${#result} = ${#string} ] || echo ${#result}
}
For example:
$ echo $(match_index "a[0-9][0-9]" "This is a a123 test")
10
If you want to allow full-blown regular expressions instead of just wildcards, replace the "local result=" line with
local result=$(echo "$string" | sed 's/'"$pattern"'.*$//')
but then you're exposed to the usual shell quoting issues.
The go-to options for me are bash, awk and perl. I'm not sure what you're trying to do, but any of the three would likely work well. For example:
f=somestring
string=$(expr match "$f" '.*\(expression\).*')
echo $string
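expr match is a GNU extension rather than POSIX; the STRING : REGEX form below is the portable spelling of the same operation. It anchors a basic regular expression at the start of the string and, when the pattern contains a \( \) group, prints what that group matched. The sample string is assumed for illustration.

```bash
#!/usr/bin/env bash
f="text more text and yet more text"
# Leading .* lets the anchored basic regex reach the part we want;
# expr prints the text matched by the \( \) group.
string=$(expr "$f" : '.*yet more \(text\)')
echo "$string"   # text
```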
You tagged the question as bash, so I'm going to assume you're asking how to do this in a bash script. Unfortunately, the built-in regular expression matching doesn't save string indices. However, if you're asking this in order to extract the match substring, you're in luck:
# Leave $regex unquoted: a quoted right-hand side is matched as a literal string (bash 3.2+)
if [[ $var =~ $regex ]]; then
n=${#BASH_REMATCH[*]}
i=0
while [[ $i -lt $n ]]
do
echo "capture[$i]: ${BASH_REMATCH[$i]}"
i=$((i + 1))
done
fi
This snippet will output in turn all of the submatches. The first one (index 0) will be the entire match.
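Although =~ doesn't report where the match starts, the offset can often be recovered afterwards by measuring the text in front of the first occurrence of ${BASH_REMATCH[0]}. This is a hedged sketch with a hypothetical regex_index helper; it assumes the regex has no ^ or $ anchors or word-boundary assertions, because with those an earlier copy of the matched text might not be a real match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the 0-based offset of the first match of an ERE,
# or return 1 if there is no match.
regex_index() {
  local string=$1 regex=$2 before
  [[ $string =~ $regex ]] || return 1
  # Quoted, so the matched text is removed literally, not as a glob.
  before=${string%%"${BASH_REMATCH[0]}"*}
  echo "${#before}"
}

regex_index "This is a a123 test" 'a[0-9][0-9]'   # prints 10
```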
You might like your awk options better, though. awk's match function gives you the index you want (see the awk documentation), and it also stores the length of the match in RLENGTH, if you need that. To implement this in a bash script, you could do something like:
match_index=$(echo "$var_to_search" | \
awk '{
where = match($0, /'"$regex_to_find"'/)
if (where)
print where
else
print -1
}')
There are a lot of ways to pass variables into awk. This combination of piping the input and embedding the regex directly in the awk one-liner is fairly common. You can also give awk variable values with the -v option (see man awk).
Obviously you can modify this to get the length, the match string, whatever it is you need. You can capture multiple things into an array variable if necessary:
match_data=($( ... awk '{ ... print where,RLENGTH,match_string ... }'))
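A variant using the -v option instead of embedding the regex in the program text, as a sketch with made-up sample values: match() returns the 1-based start (0 when there is no match) and sets RLENGTH. One caveat: awk processes backslash escapes in -v values, so a regex containing backslashes needs them doubled.

```bash
#!/usr/bin/env bash
var_to_search="text more text and yet more text"
regex_to_find='yet more (text)'

# The regex arrives as the awk variable re; match() accepts it as a dynamic regex.
match_index=$(printf '%s\n' "$var_to_search" |
  awk -v re="$regex_to_find" '{
    where = match($0, re)        # 1-based start, or 0 if no match
    if (where)
      print where, RLENGTH       # RLENGTH: length of the matched text
    else
      print -1
  }')
echo "$match_index"   # 20 13
```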
If you use bash 4.x, you can source oobash, a string library written in bash with an OO style:
http://sourceforge.net/projects/oobash/
String is the constructor function:
String a abcda
a.indexOf a
0
a.lastIndexOf a
4
a.indexOf da
3
There are many more "methods" for working with strings in your scripts:
-base64Decode -base64Encode -capitalize -center
-charAt -concat -contains -count
-endsWith -equals -equalsIgnoreCase -reverse
-hashCode -indexOf -isAlnum -isAlpha
-isAscii -isDigit -isEmpty -isHexDigit
-isLowerCase -isSpace -isPrintable -isUpperCase
-isVisible -lastIndexOf -length -matches
-replaceAll -replaceFirst -startsWith -substring
-swapCase -toLowerCase -toString -toUpperCase
-trim -zfill