How to extract two numbers from a word and store them in two separate variables in bash? - regex

I believe my question is very simple for someone who knows how to use regular expressions, but I am very new at it and I can't figure out a way to do it. I found many questions similar to this, but none could solve my problem.
In bash, I have a few variables that are of the form
nw=[:digit:]+.a=[:digit:]+
for example, some of these are nw=323.a=42 and nw=90.a=5
I want to retrieve these two numbers and put them in the variables $n and $a.
I tried several tools, including perl, sed, tr and awk, but couldn't get any of them to work, even though I've been googling and trying to fix it for an hour now. tr seems to be the best fit, though.
I'd like a piece of code which would achieve the following:
#!/bin/bash
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for dir in $ldir; do
retrieve the number following nw and place it in $n
retrieve the number following a and place it in $a
done
... more things...

If you trust your input, you can use eval:
for dir in $ldir ; do
dir=${dir/w=/=} # remove 'w' before '='
eval ${dir/./ } # replace '.' by ' ', evaluate the result
echo $n, $a # show the result so we can check the correctness
done

if you do not trust your input :) use this:
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for v in $ldir; do
[[ "$v" =~ ([^\.]*)\.(.*) ]]
declare "n=$(echo ${BASH_REMATCH[1]}|cut -d'=' -f2)"
declare "a=$(echo ${BASH_REMATCH[2]}|cut -d'=' -f2)"
echo "n=$n; a=$a"
done
result in:
n=64; a=2
n=132; a=3
n=4949; a=30
for sure there are more elegant ways, this is just a quick working hack
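One tidier variant, as a sketch: let the regex capture both numbers directly, so neither cut nor eval is needed. The helper name parse_pair is made up for illustration, and the anchored pattern assumes each word is exactly of the form nw=<digits>.a=<digits>.

```shell
#!/usr/bin/env bash
# parse_pair is a hypothetical helper name, not taken from the answers above.
# It sets the globals n and a, and returns non-zero if the word does not match.
parse_pair() {
  local re='^nw=([0-9]+)\.a=([0-9]+)$'
  [[ $1 =~ $re ]] || return 1
  n=${BASH_REMATCH[1]}
  a=${BASH_REMATCH[2]}
}

ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for dir in $ldir; do
  parse_pair "$dir" && echo "n=$n; a=$a"
done
```

Keeping the regex in a variable and expanding it unquoted avoids the quoting pitfalls of writing the pattern inline in [[ ... ]].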

ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for dir in $ldir; do
#echo --- line: $dir
for item in $(echo $dir | sed 's/\./ /'); do
val=${item#*=}
name=${item%=*}
#echo ff: $name $val
let "$name=$val"
done
echo retrieve the number following nw and place it in $nw
echo retrieve the number following a and place it in $a
done

Related

Extract value of get parameter in shell

I have an input that could either be dn3321 or
https://domaincom/file?tag=dn3321 and I'm trying to parse the value of tag using shell.
It looks like a regex could do the trick. How would I write a one-liner that detects whether the input is a URL, applies the regex to extract the value if it is, and otherwise uses the value directly?
It's unclear from the question what the full space of possible inputs looks like, but, for the simple cases you gave, you can use parameter expansion:
#!/usr/bin/env bash
in1='dn3321'
in2='https://domaincom/file?tag=dn3321'
echo "${in1#*=}"
echo "${in2#*=}"
# prints "dn3321" twice
This works by removing the first = and any text preceding it.
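A small sketch of the same idea wrapped in a function; tag_of is a name invented here. It uses ## rather than # so that, if the input happens to contain several = signs, everything up to the last one is removed. That choice is an assumption about the input, not something the question specifies.

```shell
#!/usr/bin/env bash
# tag_of is a hypothetical helper: it prints the text after the last '=',
# or the whole argument when there is no '=' at all.
tag_of() {
  printf '%s\n' "${1##*=}"
}

tag_of 'dn3321'                              # prints dn3321
tag_of 'https://domaincom/file?tag=dn3321'   # prints dn3321
```

Because the expansion leaves a string without = untouched, no separate "is this a URL?" test is needed for these two input shapes.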
If you just need to print out a very specific part of the string that is a url you can do it like this:
#!/bin/bash
url="https://domaincom/file?tag=dn3321"
if [[ "${url,,}" == http* ]] ; then
tag=$(echo $url | cut -d'=' -f2)
fi
if you need something more elaborate I can post an example.

Bash: Replace array value with curl result

I have a text file named raw.txt with something like the following:
T DOTTY CRONO 52/50 53/40 54/30 55/20 RESNO NETKI
U CYMON DENDU 51/50 52/40 53/30 54/20 DOGAL BEXET
V YQX KOBEV 50/50 51/40 52/30 53/20 MALOT GISTI
W VIXUN LOGSU 49/50 50/40 51/30 52/20 LIMRI XETBO
X YYT NOVEP 48/50 49/40 50/30 51/20 DINIM ELSOX
Y DOVEY 42/60 44/50 47/40 49/30 50/20 SOMAX ATSUR
Z SOORY 43/50 46/40 48/30 49/20 BEDRA NERTU
A DINIM 51/20 52/30 50/40 47/50 RONPO COLOR
B SOMAX 50/20 51/30 49/40 46/50 URTAK BANCS
C BEDRA 49/20 50/30 48/40 45/50 VODOR RAFIN
D ETIKI 48/15 48/20 49/30 47/40 44/50 BOBTU JAROM
E 46/40 43/50 42/60 DOVEY
F 45/40 42/50 41/60 JOBOC
G 43/40 41/50 40/60 SLATN
I'm reading it into an array:
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
done < raw.txt
I'm trying to replace all occurrences of [A-Z]{5} with a curl result where the match of [A-Z]{5} is fed as a variable into the curl call.
First match to be replaced would be DOTTY. The call looks similar to curl -s http://example.com/api_call/DOTTY and the result is something like -55.5833 50.6333 which should replace DOTTY in the array.
I was so far unable to correctly match the desired string and feed the match into curl.
Your help is greatly appreciated.
All the best,
Chris
EDIT:
Solution
Working solution based on @Kevin's extensive answer and @Floris's hint about a possible carriage return in the curl result. This was indeed the case. Thank you! Combined with some tinkering on my side, I now got it to work.
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
i=0
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
curl_tmp=$(curl -s http://example.com/api_call/$str)
# cut off line break
curl=${curl_tmp/$'\r'}
# insert at given index
declare array[$i]="$curl"
fi
let i++
done
# write to file
for index in "${array[@]}"; do
echo $index
done >> $WORK_DIR/nats.txt
done < raw.txt
I didn't change anything about your script except add the matching part, since it seems that's what you're needing help on:
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
echo curl "http://example.com/api_call/$str"
fi
done
done < raw.txt
EDIT: added in the url example you provided with the variable in the URI. You can do whatever you need with the fetched output by changing it to do_something "$(curl ...)"
EDIT2: Since you're wanting to maintain the bash array you create from each line, how about this:
I'm not great at bash when it comes to arrays, so I expect someone to call me out on it, but this should work.
I've left some echos there so you can see what it's doing. The shift commands are to push the array index from the current location when the regex matches. The tmp variable to hold your curl output could probably be improved, but this should get you started, I hope.
removed temporarily to avoid confusion
EDIT3: Oops the above didn't actually work. My mistake. Let me try again here.
EDIT4:
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
i=0
# echo ${array[@]} below is just so you can see it before processing. You can remove this
echo "Array before processing: ${array[@]}"
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
# replace the echo command below with your curl command
# ie - curl="$(curl http://example.com/api_call/$str)"
curl="$(echo 1234 -1234)"
if [[ "$flag" = "1" ]]; then
array=( ${adjustedArray[@]} )
push=$(( $push + 2 ));
let i++
else
push=1
fi
adjustedArray=( ${array[@]:0:$i} ${curl[@]} ${array[@]:$(( $i + $push)):${#array[@]}} )
#echo "DEBUG adjustedArray in loop: ${adjustedArray[@]}"
flag=1;
fi
let i++
done
unset flag
echo "final: ${adjustedArray[@]}"
# do further processing here
done < raw.txt
I know there's a smarter way to do this than the above, but we're getting into areas in bash where I'm not really suited to give advice. The above should work, but I'm hoping someone can do better.
Hope it helps, anyway
ps - You should probably not use a shell script for this unless you really need to. Perl, php, or python would make the code simple and readable
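One possibly simpler approach, sketched under assumptions: instead of splicing into the array in place, build a second array and append either the original word or the words of the lookup result. The lookup function below is a hypothetical stand-in for the curl call so the sketch runs offline, and expand_line is likewise an invented name. The anchored pattern ^[A-Z]{5}$ is stricter than the one in the question and is my assumption about what counts as a match.

```shell
#!/usr/bin/env bash
# Hypothetical stub standing in for:
#   curl -s "http://example.com/api_call/$1"
# It returns a fixed answer with a trailing carriage return, as described above.
lookup() {
  printf '%s\r\n' "-55.5833 50.6333"
}

# expand_line (invented name) prints the line with every five-letter
# upper-case word replaced by the words of its lookup result.
expand_line() {
  local -a words=() out=()
  local word res
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    if [[ $word =~ ^[A-Z]{5}$ ]]; then
      res=$(lookup "$word")
      res=${res//$'\r'/}   # drop any carriage returns
      out+=($res)          # unquoted on purpose: one array element per number
    else
      out+=("$word")
    fi
  done
  printf '%s\n' "${out[*]}"
}

expand_line 'T DOTTY CRONO 52/50'
```

Appending to a fresh array avoids the index bookkeeping (i, push, flag) that in-place insertion needs when one word expands into two.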
Since I misread the first time:
How about just using sed?
sed "s/\([A-Z]\{5\}\)/$(echo curl http:\\/\\/example.com\\/api_call\\/\\1)/g" /tmp/raw.txt
Try that, then try removing the echo. I'm not 100% on this since I can't run it on the real domain
EDIT: And just so I'm clear, the echo is just there so you can see what it will do with the echo removed
create a file cmatch:
#!/bin/bash
while read line
do
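a=`echo "$line" | egrep -o '\b[A-Z]{5}\b'`
for v in $a
do
# progress message goes to stderr so it stays out of the output file
echo "doing curl to replace $v in $line" >&2
r=`curl -s http://example.com/api_call/$v`
# collapse a multi-line result into a single line
r1=`echo $r | xargs echo`
line=`echo "$line" | sed "s/$v/$r1/"`
done
# print the line after all replacements have been applied
echo "$line"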
done
then call it with
chmod 755 cmatch
./cmatch < inputfile.txt > outputfile.txt
It will do what you asked
Notes:
the \b before and after the [A-Z]{5} ensures that ABCDEFG (which is not a five letter word) will not match.
using egrep -o produces an array of matches
I loop over this array to allow the replacement of multiple matches in a line
I update the line for each match found using the result of the curl call
to keep code clean, I assign the result of the curl to an intermediate variable
edit Just saw the comments about arrays. I suggest to take the output of this script and convert it to an array if you want to do further manipulation...
more edits If your curl command returns a multi-line string (which would explain the error you see), you can use the new line I introduced in the script to remove the newlines (essentially stringing all the arguments together):
echo $r | xargs echo
calls echo with one line at a time as argument, and without the carriage returns. It's a fun way of getting rid of carriage returns.
#!/bin/bash
while read line;do
set -- $line
echo "second parm is $2"
echo "do your curl here"
done < afile.txt

How to list files with numbers in their name and retrieve the numbers?

I am very new to regex, so I imagine this is quite a simple question that must have been asked several times already, but unfortunately I can't find any of those answers.
Given a directory, I need the list of all of its subdirectories whose names match the pattern "nw=[number].a=[number]", and for every directory I need to retrieve those numbers and do a few things based on those. Some of these directories are nw=82.a=40, nw=100.a=9, etc.
My guess to accomplish this would be
#! /bin/bash
cd $mydir
for dir in `ls | grep nw=[:digit:]+.a=[:digit:]`: do
retrieve the numbers
a few things
done
Why doesn't it work, and how could I retrieve the numbers?
Thank you in advance,
Ferdinando
Some corrections on your grep command:
grep -E 'nw=[[:digit:]]+\.a=[[:digit:]]+'
Use the "-E" flag so you can use an extended regex, which includes the '+' operator, for example.
Use double square brackets
Escape the period, otherwise it will be used as an operator to match any character
A final '+' was missing from the end, not entirely necessary since grep will match more general cases, but it probably represents better your path names
It is probably good practice to place your regex between quotes (in this case, single quotes will do)
Hope this helps =)
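As a quick check of the corrected pattern, here is a sketch that runs it over a few made-up names. The function name match_dirs and the sample names are assumptions for illustration, and the ^ and $ anchors are my addition so that only whole names match.

```shell
#!/usr/bin/env bash
# match_dirs is a hypothetical wrapper: it reads names on stdin and keeps
# only those of the exact form nw=<digits>.a=<digits>.
match_dirs() {
  grep -E '^nw=[[:digit:]]+\.a=[[:digit:]]+$'
}

# Made-up sample names; the first two are the ones that should pass.
printf '%s\n' 'nw=82.a=40' 'nw=100.a=9' 'nw=abc.a=1' 'nw=5xa=7' 'notes.txt' | match_dirs
```

The name nw=5xa=7 is included because an unescaped period in the pattern would have let it through.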
perl -e '@a=`ls`;m/nw=(\d+)\.a=(\d+)(?{print"$1\t$2\n"})/ for@a'
Enjoy.
Call the terminal's ls command and store the list in the array @a.
@a=`ls`;
looking for match
m/
nw=(digits that I capture in $1).a=(digits that I capture in $2)
nw=(\d+)\.a=(\d+)
start evaluation of code from within a pattern
(?{
print first number,tab, second number, newline
print"$1\t$2\n"})
end matching pattern group
/
perform this match attempt with embedded code on each filename (with newlines still appended) in array @a
for@a
Yes, that was cryptic.
Don't parse ls. Use find instead:
find . -maxdepth 1 -type d -regex '.*nw=[0-9]+\.a=[0-9]+.*' | while IFS= read -r dir
do
echo "Found directory: $dir"
if [[ "$dir" =~ nw=([0-9]+)\.a=([0-9]+) ]]
then
echo "numbers are ${BASH_REMATCH[1]} and ${BASH_REMATCH[2]}"
fi
done

Getting the index of the substring on solaris

How can I find the index of a substring which matches a regular expression on solaris10?
Assuming that what you want is to find the location of the first match of a wildcard in a string using bash, the following bash function returns just that, or empty if the wildcard doesn't match:
function match_index()
{
local pattern=$1
local string=$2
local result=${string/${pattern}*/}
[ ${#result} = ${#string} ] || echo ${#result}
}
For example:
$ echo $(match_index "a[0-9][0-9]" "This is a a123 test")
10
If you want to allow full-blown regular expressions instead of just wildcards, replace the "local result=" line with
local result=$(echo "$string" | sed 's/'"$pattern"'.*$//')
but then you're exposed to the usual shell quoting issues.
My go-to options are bash, awk and perl. I'm not sure what you're trying to do, but any of the three would likely work well. For example:
f=somestring
string=$(expr match "$f" '.*\(expression\).*')
echo $string
You tagged the question as bash, so I'm going to assume you're asking how to do this in a bash script. Unfortunately, the built-in regular expression matching doesn't save string indices. However, if you're asking this in order to extract the match substring, you're in luck:
if [[ "$var" =~ $regex ]]; then
i=0
n=${#BASH_REMATCH[*]}
while [[ $i -lt $n ]]
do
echo "capture[$i]: ${BASH_REMATCH[$i]}"
let i++
done
fi
This snippet will output in turn all of the submatches. The first one (index 0) will be the entire match.
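If only the position of the whole match is needed, one workaround is to measure the text in front of the matched substring. The sketch below uses an invented helper name, regex_index, and rests on one assumption: that the first literal occurrence of the matched text is where the regex matched. That can be wrong for anchored patterns such as foo$, where the same text may also appear earlier in the string.

```shell
#!/usr/bin/env bash
# regex_index is a hypothetical helper: it prints the 0-based offset of the
# first regex match in a string, or prints nothing and returns 1 if there
# is no match.
regex_index() {
  local re=$1 str=$2 prefix
  [[ $str =~ $re ]] || return 1
  # Strip the matched text and everything after its first literal occurrence;
  # the quotes make the matched text literal rather than a glob pattern.
  prefix=${str%%"${BASH_REMATCH[0]}"*}
  echo "${#prefix}"
}

regex_index 'a[0-9]+' 'This is a a123 test'   # prints 10
```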
You might like your awk options better, though. There's a function match which gives you the index you want. Documentation can be found here. It'll also store the length of the match in RLENGTH, if you need that. To implement this in a bash script, you could do something like:
match_index=$(echo "$var_to_search" | \
awk '{
where = match($0, '"$regex_to_find"')
if (where)
print where
else
print -1
}')
There are a lot of ways to deal with passing the variables in to awk. This combination of piping output and directly embedding one into the awk one-liner is fairly common. You can also give awk variable values with the -v option (see man awk).
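A sketch of the -v variant mentioned above, which keeps the regex out of the awk program text entirely. The wrapper name awk_index is made up. Note that awk processes backslash escapes in a -v value, so patterns containing backslashes may need them doubled.

```shell
#!/usr/bin/env bash
# awk_index is a hypothetical wrapper: it prints the 1-based position of the
# first match of regex $1 in string $2, or -1 when there is no match.
awk_index() {
  printf '%s\n' "$2" | awk -v re="$1" '{
    where = match($0, re)
    if (where)
      print where
    else
      print -1
  }'
}

awk_index 'a[0-9][0-9]' 'This is a a123 test'   # prints 11
```

Unlike the 0-based bash function earlier, awk's match counts positions from 1.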
Obviously you can modify this to get the length, the match string, whatever it is you need. You can capture multiple things into an array variable if necessary:
match_data=($( ... awk '{ ... print where,RLENGTH,match_string ... }'))
If you use bash 4.x you can source oobash, a string library written in bash in an OO style:
http://sourceforge.net/projects/oobash/
String is the constructor function:
String a abcda
a.indexOf a
0
a.lastIndexOf a
4
a.indexOf da
3
There are many "methods" more to work with strings in your scripts:
-base64Decode -base64Encode -capitalize -center
-charAt -concat -contains -count
-endsWith -equals -equalsIgnoreCase -reverse
-hashCode -indexOf -isAlnum -isAlpha
-isAscii -isDigit -isEmpty -isHexDigit
-isLowerCase -isSpace -isPrintable -isUpperCase
-isVisible -lastIndexOf -length -matches
-replaceAll -replaceFirst -startsWith -substring
-swapCase -toLowerCase -toString -toUpperCase
-trim -zfill

vim & csv file: put header info into a new column

I have a large number of csv files that look like this below:
xxxxxxxx
xxxxx
Shipment,YD564n
xxxxxxxxx
xxxxx
1,RR1760
2,HI3503
3,HI4084
4,HI1824
I need to make them look like the following:
xxxxxxxx
xxxxx
Shipment,YD564n
xxxxxxxxx
xxxxx
YD564n,1,RR1760
YD564n,2,HI3503
YD564n,3,HI4084
YD564n,4,HI1824
YD564n is a shipment number and will be different for every csv file. But it always comes right after "Shipment,".
What vim command(s) can I use?
In one file type the following in normal mode:
qqgg/^Shipment,<CR>ww"ay$}j:.,$s/^/<C-R>a,<CR>q
Note that <CR> is the ENTER key, and <C-R> is CTRL-R.
This will update that file and record the commands in register q.
Then in each other file type @q (also in normal mode). This will play back register q.
You can do this using a macro, and applying it over several files.
Here's one example. Type the following in as is:
3gg$"ayiw:6,$s/^/<C-R>a,/<CR>:w<CR>:bn<CR>
Now that looks horrendous. Let me see if I can explain that a bit better.
3gg$ : Go to the end of the third line.
"ayiw : Copy the last word into the register a.
:6,$s/^/<C-R>a,/<CR> : In every line from the 6th onwards, insert the contents of register a, followed by a comma, at the beginning of the line.
:w<CR>:bn<CR> : Save and go to the next buffer.
Now you can map this to a key, by
:nnoremap <C-A> 3gg$"ayiw:6,$s/^/<C-R>a,/<CR>:w<CR>:bn<CR>
Then if you have say 200 csv files, you open vim as
vim *.csv
and then
200<C-A>
Where you type Ctrl-A there, and it should be all done.
That said, I'd definitely be more comfortable doing this in a proper scripting language, it'd be much more straightforward.
This could be done as a Perl one-liner:
perl -i.bak -e' $c = do {local $/; <>};
($n) = ($c =~ /Shipment,(\w+)/);
$c =~ s/^(\d+,)/$n,$1/gm;
print $c' shipment.csv
This will read contents of shipment.csv into $c, extract the shipment ID into $n, and prepend every CSV line with the shipment number. The file will be modified in-place with a backup saved to shipment.csv.bak.
To do this from within Vim, adapt it as a filter:
:%!perl -e' $c = do {local $/; <>}; ($n) = ($c =~ /Shipment,(\w+)/); $c =~ s/^(\d+,)/$n,$1/gm; print $c'
Well, don't bash me, but... you could consider: Don't do this in vim!!
This is a classic usage example for scripting languages.
Take a basic python, perl or ruby tutorial. The solution for this would
be in it.
The regex for this might not be too difficult and it is doable in vim.
But there are much easier alternatives out there.
And much more flexible ones.
Why vim?
Try this shell script:
#!/bin/sh
input=$1
shipment=`grep Shipment $input|awk -F, '{print $2}'`
mv $input $input.orig
sed -e "s/^\([0-9]\)/$shipment,\1/" $input.orig > $input
You could iterate through specific files:
for input in *.txt
do
script.sh "$input"
done
I also think this isn't well suited for vim, how about in Bash instead?
FILENAME='filename.csv' && SHIPMENT=`grep Shipment $FILENAME | sed 's/^Shipment,//'` && sed "s/^[0-9]/$SHIPMENT,&/" $FILENAME > $FILENAME.tmp && mv $FILENAME.tmp $FILENAME
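Another option, sketched with awk: a single pass that remembers the shipment number when it sees the Shipment line and prefixes every later row that starts with digits and a comma. The function name prefix_shipment is invented here, and the sketch assumes the Shipment line always comes before the data rows, as in the sample.

```shell
#!/usr/bin/env bash
# prefix_shipment is a hypothetical filter: it reads a CSV on stdin and
# writes it to stdout with the shipment number prepended to the data rows.
prefix_shipment() {
  awk -F, '
    $1 == "Shipment" { id = $2 }                          # remember the shipment number
    id != "" && /^[0-9]+,/ { print id "," $0; next }      # prefix data rows
    { print }                                             # everything else unchanged
  '
}

printf '%s\n' 'xxxxxxxx' 'Shipment,YD564n' 'xxxxx' '1,RR1760' '2,HI3503' | prefix_shipment
```

To apply it to a file, write to a temporary file first and then move it over the original (for example prefix_shipment < in.csv > in.csv.tmp && mv in.csv.tmp in.csv), since redirecting straight onto the input would truncate it before it is read.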