RegEx Confusion in linux shell script - regex

Can someone explain what this does in a linux shell.....
port=$((${devpath##*[-.]} - 1))
I have a variable named $devpath, and one possible value is /sys/bus/usb/devices/usb2/2-1.
I'm assuming that ${devpath##*[-.]} performs some sort of regex on $devpath, but it makes no sense to me. Nor does *[-.], which I understand to mean "one or more of any one of the character '-' or any other character except newline".
When running through a script (this is from usb-devices.sh), it seems that the value of port is always the first numeric digit. Something else that confuses me is the '-1' at the end, shouldn't that reduce whatever ${devpath##*[-.]} does by one?
I tried looking up regex in shell expressions but nothing made any sense, and nowhere could I find an explanation for ##.

Given the variable:
r="/sys/bus/usb/devices/usb2/2-123.45"
echo ${r##*-} returns 123.45 and echo ${r##*[-.]} returns 45. Do you see the pattern here?
Let's go a bit further: the expression ${string##substring} strips the longest match of $substring from the front of $string.
So with ${r##*[-.]} we are stripping everything in $r until the last - or . is found.
Then, $(( )) is used for arithmetic expressions. Thus, with $(( $var - 1 )) you are subtracting 1 from the value coming from ${r##*[-.]}.
All together, port=$((${devpath##*[-.]} - 1)) means: take the number that follows the last - or . in $devpath, subtract 1, and store the result in $port.
Following the example above, echo $((${r##*[-.]} - 1)) returns 44 (45 - 1).
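The difference between the shortest-match form (#) and the longest-match form (##) is easiest to see side by side. The following is a minimal sketch using the example values from this thread; the variable names shortest, longest, port and port_q are only illustrative.

```shell
r="/sys/bus/usb/devices/usb2/2-123.45"

# '#' removes the SHORTEST prefix matching the glob *[-.]
# (everything up to and including the first '-' or '.').
shortest=${r#*[-.]}

# '##' removes the LONGEST such prefix
# (everything up to and including the last '-' or '.').
longest=${r##*[-.]}

# Arithmetic expansion then subtracts one from the remaining number.
port=$((${r##*[-.]} - 1))

echo "shortest=$shortest longest=$longest port=$port"

# The value from the question: the text after the last '-' is 1, so port is 0.
devpath="/sys/bus/usb/devices/usb2/2-1"
port_q=$((${devpath##*[-.]} - 1))
echo "port for $devpath: $port_q"
```

Both expansions and $(( )) are standard POSIX shell features, so this should behave the same in sh and bash.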

There is no regex here. ${var##pattern} returns the value of var with any match on pattern removed from the prefix (but this is a glob pattern, not a regex); $((value - 1)) subtracts one from value. So the expression takes the number after the last dash or dot and reduces it by one.
See Shell Parameter Expansion and Arithmetic Expansion in the Bash manual.

Related

regex, period allowed, not comma

Hi, I'm looking for a regex for
Valid:
20000
20.000
If a comma is used, it should not match the comma and what comes after it.
Not valid
20.000,12
Right now I'm using:
([0-9]+([.][0-9]+)*)+?
But this one also takes the last 2 digits after comma.
You could use
^\d+(?:\.\d+)*
# start of line, 1+ digits, optionally followed by groups like .1234
See a demo on regex101.com.
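As a rough shell check of the anchored-prefix idea, the sketch below assumes a grep that supports -o and -E (GNU and BSD grep both do). grep -E does not understand \d, so [0-9] is used instead. The helper name leading_number is made up for this example.

```shell
# Print only the leading number, dropping a comma and anything after it.
leading_number() {
    printf '%s\n' "$1" | grep -oE '^[0-9]+(\.[0-9]+)*'
}

leading_number '20000'      # prints 20000
leading_number '20.000'     # prints 20.000
leading_number '20.000,12'  # prints 20.000
```

Because the pattern has no $ anchor, it matches the longest valid prefix instead of rejecting the whole input.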
If you add a ^ to the beginning of the regex, only the part from the start of the string will match
^([0-9]+([.][0-9]+)*)+?
But i think
^\d+(\.\d+)*
is the better solution to match numbers
If you want to match floating point numbers then use (^\d*\.?\d+); note that the dot has to be escaped, otherwise it matches any character.
There is nothing in the question that suggests that the outcome needs to be a valid number. Therefore it looks like the expression needs to accept any number or a period in which case the regex is quite straightforward and the following should work:
^([0-9\.]+)$
Here are my tests to demonstrate the outcome
20000 - pass
20.000 - pass
20.000,12 - fail
1.000.000 - pass
23000,000 - fail
For info, I used the php code below for my test:
$testdata = array('20000', '20.000', '20.000,12', '1.000.000', '23000,000');
$pattern = "/^([0-9\.]+)$/";
foreach ($testdata as $k => $v) {
$result = preg_match($pattern, $v)? 'pass': 'fail';
echo "".$v." - ".$result."<br />";
}
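One caveat with the permissive character-class pattern is that it also accepts strings such as 1..2, which the stricter grouped pattern rejects. The comparison below is a sketch using grep -E; the helper name check and the extra test value 1..2 are assumptions added for illustration.

```shell
# check PATTERN VALUE -> prints "pass" or "fail"
check() {
    if printf '%s\n' "$2" | grep -Eq "$1"; then
        echo pass
    else
        echo fail
    fi
}

permissive='^[0-9.]+$'          # any mix of digits and dots
strict='^[0-9]+(\.[0-9]+)*$'    # digit groups separated by single dots

for v in 20000 20.000 20.000,12 1.000.000 23000,000 1..2; do
    echo "$v: permissive=$(check "$permissive" "$v") strict=$(check "$strict" "$v")"
done
```

For the five values tested in the answer both patterns agree; they differ only on inputs such as 1..2.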

Regular Expressions with multiple dots in Linux bash shell give strange results

I tried to match a substring containing a lot of dots, and it failed in the Debian Linux shell. I wrote a simple script to see how dots are processed and found the behavior completely at odds with the rules. I retried it in Bash, Perl, and the Ubuntu shell, and the result is always the same. The script and output are below.
#!/bin/sh
my_regex=u2734523abcABCB.C123.ABC.abc.1..2.34.2
Numbering=123456789_123456789_123456789_123456789
echo "$my_regex"
echo "$Numbering"
echo `expr index "$my_regex" '(ABC)'`
echo `expr index "$my_regex" '(ABC\.)'`
echo `expr index "$my_regex" '(\.\.)'`
echo `expr index "$my_regex" '(.)'`
echo `expr index "$my_regex" '(\.1)'`
Output:
u2734523abcABCB.C123.ABC.abc.1..2.34.2
123456789_123456789_123456789_123456789
12
12
16
16
16
The first regex should match ABC and return number-position of first character. It works.
The second one should find ABC followed by dot, it looks like it ignores dot.
The third one should find two dots but it finds first occurrence of one dot. Ignores again?
The fourth should find first any character, but it still finds the dot on position 16.
The fifth should find a dot followed by 1, it still finds the first occurrence of dot.
It seems like neither \ nor [ ] (I tried it too), nor the dot itself works as in common regular expression.
Why?
expr index has nothing to do with regular expressions.
expr index STRING CHARS outputs the index of the first occurrence of any of the CHARS in STRING. So your first search for '(ABC)' finds the first left parenthesis, A, B, C, or right parenthesis in your string. The first one is the A at position 12.
'(ABC\.)' does the same thing, except it's now also looking for a backslash or period. But the A is still the first match at position 12.
'(\.\.)' looks only for a parenthesis, backslash, or period. The first match is the period at position 16.
Likewise, all your other searches find the period at position 16, because none of the other characters you're listing come before that.
(On a side note, it's silly to capture the output with backticks only to immediately echo it. You'd get the same result by omitting the echo and backticks.)
You are incorrectly using index function of expr. As per man expr:
index STRING CHARS - index in STRING where any CHARS is found, or 0
So 2 things to note here:
index doesn't do any regex matching
index finds the position of the first character in STRING that is any one of the CHARS
If you want regex matching then use:
STRING : REGEXP
like this:
my_regex='u2734523abcABCB.C123.ABC.abc.1..2.34.2'
expr u2734523abcABCB.C123.ABC.abc.1..2.34.2 : '.*ABC'
24
expr u2734523abcABCB.C123.ABC.abc.1..2.34.2 : '.*ABC\.'
25
expr u2734523abcABCB.C123.ABC.abc.1..2.34.2 : '.*\.\.'
32
expr u2734523abcABCB.C123.ABC.abc.1..2.34.2 : '.*.'
38
expr u2734523abcABCB.C123.ABC.abc.1..2.34.2 : '.*\.1'
30
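The number printed after each expr command is the length of the match. Because .* is greedy and the match is anchored at the start of the string, that length is also the position where the last occurrence of the pattern ends.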
There is no need to use echo here as expr anyway writes output on stdout.
You might want to take a look at BASH built-in =~ operator for regex matching.
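If the goal is the position of the first occurrence of a literal substring, which is what the question seems to be after, plain parameter expansion may be enough and needs no regex at all. This is only a sketch; the function name first_pos is hypothetical, and it treats its second argument as literal text, not as a pattern.

```shell
s='u2734523abcABCB.C123.ABC.abc.1..2.34.2'

# first_pos STRING SUBSTRING
# Prints the 1-based position of the first occurrence, or 0 if absent.
first_pos() {
    case "$1" in
        *"$2"*)
            # Strip the longest suffix starting at SUBSTRING; what is
            # left is the text before the first occurrence.
            prefix=${1%%"$2"*}
            echo $(( ${#prefix} + 1 ))
            ;;
        *)
            echo 0
            ;;
    esac
}

first_pos "$s" 'ABC'    # prints 12
first_pos "$s" 'ABC.'   # prints 22
first_pos "$s" '..'     # prints 31
first_pos "$s" '.1'     # prints 29
```

Quoting "$2" inside the expansion keeps characters such as . and * from being treated as glob characters.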

vim search regular expression replace with register

I'd like to search a regex pattern with vim and replace the matches with a paste from a register. In detail that means:
acb123acb
asokqwdad
def442ads
asduiosdf
daf567hjk
should finish with
acbXYZacb
asokqwdad
defPOWads
asduiosdf
dafMANhjk
where I had
XYZ
POW
MAN
in a register A (:g/pattern/y A)
A regex pattern to search for might be [0-9]{3} to match the 3 numbers from the text block.
Block mode would help if there were no lines between the matches...
I could of course use a Perl script for that. However, I assume that if this is possible in vim, it would be a lot faster, right?
Thank you in advance
If you want to replace all strings matching [0-9]{3} with the same value, which happens to be the contents of register a:
:%s/\v\d{3}/\=@a/g
In detail:
:% - apply to all lines in buffer
s/.../.../g - replace all occurrences
\v - what follows is a "very magic" regular expression
\d{3} - match 3 digits
\= - replace with the value of...
@a - register a
If on the other hand you want to read replacement values from register a:
:let a=getreg('a', 1, 1)
:%s/\v\d{3}/\=remove(a, 0)/g
In detail:
let a=getreg('a', 1, 1) - transfer the contents of register a to a list, imaginatively also named a
then same as above, except...
remove(a, 0) - deletes the first element in list a and returns it.
Also, VimL is, sadly, nowhere near as fast as Perl. :)

Shell script variable assignment with two values (regular expression)

I'm trying to set a variable with two values. Here is an example:
letter='[[:alpha:]]'
digit='[[:digit:]]'
integer='$digit'
float='$digit.$digit'
The integer pattern must match one or more digits. The float pattern should allow the first field (before the dot) to appear zero or more times. How can I do this?
Thanks for help!
-- UPDATE --
It's very good to have the support of all of you. Below the solution that has served me:
letter='[[:alpha:]]'
digit='[[:digit:]]'
integer="${digit}+"
float="[0-9]*\\.[0-9]+"
Thank you guys! :D
I haven't checked which flavor of regex the expr command (which I assume you are using) supports on your system, so you may need to write something like [a-zA-Z] instead of [[:alpha:]] and make similar substitutions. expr uses basic regular expressions, where a capture group is written \( \) and a literal dot must be escaped. Assuming you have chosen the right values for letter and digit, this should work:
expr match "$string" "\(${digit}*\.${digit}*\)"
or, using your float variable:
float="\(${digit}*\.${digit}*\)"
expr match "$string" "$float"
Remove the \( and \) if you only want the length of the match rather than the matched text itself.
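The effect of the capture group on what expr prints can be checked with a small sketch. It uses the portable STRING : REGEX form instead of the GNU-specific match keyword, and the sample value of string is made up for the example.

```shell
digit='[[:digit:]]'
string='3.14abc'

# With \( \), expr prints the text captured by the group.
matched=$(expr "$string" : "\(${digit}*\.${digit}*\)")

# Without the group, expr prints the number of characters matched.
length=$(expr "$string" : "${digit}*\.${digit}*")

echo "matched=$matched length=$length"   # prints matched=3.14 length=4
```

In both cases the match is anchored at the start of the string, so leading non-digit text would make it fail.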
Any of the following would be equivalent regexes for the integer in extended regex syntax (as used by grep -E, for example):
integer="(${digit}+)"
integer="(${digit}{1,})"
integer="(${digit}${digit}*)"
Do be aware that there are different "flavors" of regex and in different contexts things need to be escaped where in another context they don't need it.
for egrep and grep -E on the bash command line:
float: [0-9]*\\.[0-9]+
integer: [0-9]+
see the chart of egrep regexes at http://www.cyberciti.biz/faq/grep-regular-expressions/ for some hints, but test them for your specific situation
for perl and java:
float: [0-9]*?\.[0-9]+?
integer: [0-9]+?
+ matches preceding char or char class >= 1 times
* matches preceding char or char class >= 0 times
. matches any char
\. matches an uninterpreted period
[0-9] matches the class of any digit
? after a quantifier forces reluctant (non-greedy) matching
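The quoting difference between the question's first attempt and the working update can be checked directly. The sketch below assumes grep -E as the regex engine; the names literal and is_match and the sample values are illustrative only.

```shell
digit='[[:digit:]]'

literal='$digit'                # single quotes: no expansion, value is the text $digit
integer="${digit}+"             # double quotes: expands to [[:digit:]]+
float="${digit}*\\.${digit}+"   # \\ inside double quotes yields one backslash

# is_match ERE VALUE -> succeeds if VALUE matches ERE in full
is_match() {
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

for v in 42 3.14 .5 7. abc; do
    if is_match "$integer" "$v"; then
        kind=integer
    elif is_match "$float" "$v"; then
        kind=float
    else
        kind=neither
    fi
    echo "$v -> $kind"
done
```

Here 42 is classified as integer, 3.14 and .5 as float, and 7. and abc as neither, since the float pattern requires at least one digit after the dot.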

TCL_REGEXP:: How to grep a line from variable that looks similar in TCL

My TCL script:
set test {
a for apple
b for ball
c for cat
number n1
numbers 2,3,4,5,6
d for doctor
e for egg
number n2
numbers 56,4,5,5
}
set lines [split $test \n]
set data [join $lines :]
if { [regexp {number n1.*(numbers .*)} $data x y]} {
puts "numbers are : $y"
}
Current output if I run the above script:
C:\Documents and Settings\Owner\Desktop>tclsh stack.tcl
numbers are : numbers 56,4,5,5:
C:\Documents and Settings\Owner\Desktop>
Expected output:
If I specify "number n1" in the regexp, it should print "numbers are : numbers 2,3,4,5,6".
If I specify "number n2", it should print "numbers are : numbers 56,4,5,5:".
Currently it always prints the last line (numbers 56,4,5,5:) as output. How can I resolve this issue?
Thanks,
Kumar
Try using
regexp {number n1.*?(numbers .*)\n} $test x y
(note that I'm matching against test. There is no need to replace the newlines.)
There are two differences from your pattern.
The question mark behind the first star makes the match non-greedy.
There is a newline character behind the capturing parentheses.
Your pattern told regexp to match from the first occurrence of number n1 up to the last occurrence of numbers, and it did. This is because the .* match between them was greedy, i.e. it matched as many characters as it could, which meant it went past the first numbers.
Making the match non-greedy means that the pattern will match from the first occurrence of number n1 up to the following occurrence of numbers, which was what you wanted.
After numbers, there is another .* match which is a bit troublesome. If it were greedy, it would match everything up to the end of the variable content. If it were non-greedy, it wouldn't match any characters, since matching a zero-length string satisfies the match. Another problem is that the Tcl RE engine doesn't really allow for switching back from non-greedy mode.
You can fix this by forcing the pattern to match one character past the text that you want the .* to match, making the zero-length match invalid. Matching a newline (\n) or space (\s) character should work. (This of course means that there must be a newline / other space character after every data field: if a numbers field is the last character range in the variable that field can't be located.)
Documentation: regular expression syntax, regexp
To use a Tcl variable in a regular expression is easy. On one level anyway: you put the regular expression in double quotes so that you have standard Tcl variable substitution inside it prior to it being passed to the RE engine:
# ...
set target "n1"
if { [regexp "number $target.*(numbers .*)" $data x y]} {
# ...
The hard part is that you've got to remember that switching to "…" from {…} will affect the whole of that word, and that the substitutions are of regular expression fragments. We usually recommend using {…} because that's easier to get consistently and unconfusingly right in the majority of cases.
Let's illustrate how this can get annoying. In your specific case, you may want to actually use this:
if { [regexp "number $target\[^:\]*:(numbers \[^:\]*)" $data x y]} {
The character sets here exclude the : (which you've — unnecessarily — used as a newline replacement) but because […] is also standard Tcl metasyntax, you have to backslash-quote it. (Things get even more annoying when you want to always use the contents of the variable as a literal even though they might include RE metasyntax characters; you need a regsub call to tidy things up. And you start to potentially make Tcl's RE cache less efficient too.)