regex - multiply $1 by 10

I want to replace the results of this:
(something=)([\-\d\.]*)
with this:
nowitis=($2*10)
but instead of getting
nowitis=(80)
I get
nowitis=(8*10)
How can I solve this?

In sed, for example:
echo "something=123" | sed -r 's/(something=)([-0-9.]*)/\1\2*10)/'
something=123*10)
echo "something=123" | sed -r 's/(something=)([-0-9.]*)/\1\20/'
something=1230
For an integer, multiplying by 10 is just appending a zero to the number. sed doesn't calculate results. (For a decimal such as 10.2 this gives 10.20, so there the decimal point has to be moved instead.)
Note that the character class is written [-0-9.] rather than [\-\d\.]. sed does not understand \d as "digit", so with \d the group matches nothing and the zero ends up in front of the number:
echo "something=123" | sed -r 's/(something=)([-\d.]*)/\1\20/'
something=0123
In the class [-0-9.], the - sign comes first, so it can't be part of a range like A-Z. (It could in principle mean "from \0 to something", but it doesn't.) As the first or last character, it doesn't need escaping.
Similarly, a dot inside a character class is always literal: if it were interpreted as a wildcard, every class containing it could be reduced to just that wildcard, so a wildcard there would be pointless. You don't have to escape it either.

Let's suppose you are on a POSIX system with Perl available.
echo "something= 8" | perl -pe 's/\w\s*=\s*\K-?\d+(\.\d+)?/$&*10/ge'
something= 80
What you want is not possible with a regex alone, because regular expressions cannot do arithmetic, e.g. compute 8*10. One way around this is to use an interpreter that can.
Perl has a nice feature for this: the /e modifier. It evaluates the replacement as Perl code, in which I compute $& * 10, where $& is the matched text. The \K keeps everything before it out of $&, so only the number is replaced.

The input string can look like:
something=10.2
something=-3.15
So there can be negative numbers and floating-point numbers.
I'm using the find & replace function of the PhpStorm IDE with regex.
So the replacement itself works, but without the multiplication.
I think I could do it in a couple of runs.
For example, in the next run I would find my results and then move the dot by one place.
I read the PCRE docs and didn't find a multiplication option.
It would be easier to write a script, even in PHP, to do it right.
But I thought it could be done more simply.
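The "move the dot" idea can be sketched purely textually with sed, without any arithmetic. This is only an illustration under stated assumptions: the function name shift_dot is made up, the key is the literal something=, only the first occurrence on each line is handled, and a value such as 0.5 comes out as 05 because nothing strips leading zeros:

```shell
# Hypothetical helper: "multiply by 10" by editing the text of the number.
#   line 1: a decimal gets its dot moved one digit to the right
#   line 2: if that happened, jump to the tidy step
#   line 3: otherwise it is an integer, so append a zero
#   line 6: drop a dot that is no longer followed by any digit
shift_dot() {
  sed -E '
    s/(something=-?[0-9]+)\.([0-9])/\1\2./
    t tidy
    s/(something=-?[0-9]+)/\10/
    b
    :tidy
    s/(something=-?[0-9]+)\.([^0-9]|$)/\1\2/
  '
}

printf '%s\n' 'something=123' 'something=10.2' 'something=-3.15' | shift_dot
# prints:
# something=1230
# something=102
# something=-31.5
```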

Related

Regex whitespace before character [duplicate]

I am attempting to grep for all instances of Ui\. not followed by Line or even just the letter L
What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?
Using lookaheads
grep "Ui\.(?!L)" *
bash: !L: event not found
grep "Ui\.(?!(Line))" *
nothing
Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.
If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.
If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.
The answer to part of your problem is here, and ack would behave the same way:
Ack & negative lookahead giving errors
You are using double-quotes for grep, which permits bash to "interpret ! as history expand command."
You need to wrap your pattern in SINGLE-QUOTES:
grep 'Ui\.(?!L)' *
However, see @JonathanLeffler's answer, which addresses the issues with negative lookaheads in standard grep!
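Putting the two points together (PCRE mode plus single quotes), a minimal sketch might look like the following. It assumes a GNU grep built with PCRE support; the function name ui_not_line is made up for illustration:

```shell
# Hypothetical helper: print lines where "Ui." is not followed by "Line".
# Single quotes stop bash from treating "!" as history expansion;
# -P switches GNU grep to Perl-compatible syntax, which enables (?!...).
ui_not_line() {
  grep -P 'Ui\.(?!Line)' "$@"
}

# Usage: ui_not_line *
```

Given the lines Ui.Line, Ui.Label and Ui.Foo, this keeps the last two.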
You probably can't perform standard negative lookaheads using grep, but usually you should be able to get equivalent behaviour using the "inverse" switch '-v'. Using that, you can construct a regex for the complement of what you want to match and then pipe the output through two greps.
For the regex in question you might do something like
grep 'Ui\.' * | grep -v 'Ui\.L'
(Edit: this is not as strong as a true lookahead, but can often be used to work around the problem.)
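To illustrate where the two-grep pipeline is weaker, here is a small sketch using a made-up input line that contains both a wanted and an unwanted occurrence. The function names are hypothetical, and the single-pass pattern uses a negated character class instead of a lookahead:

```shell
# Two passes: keep lines with "Ui.", then drop every line that
# contains "Ui.L" anywhere, even if it also has a wanted "Ui.".
two_pass() { grep 'Ui\.' | grep -v 'Ui\.L'; }

# One pass: keep lines where some "Ui." is followed by a character
# other than L, or by the end of the line.
one_pass() { grep -E 'Ui\.($|[^L])'; }

sample='a = Ui.Line + Ui.Foo'
printf '%s\n' "$sample" | two_pass || echo 'two_pass: no match'
printf '%s\n' "$sample" | one_pass
# prints:
# two_pass: no match
# a = Ui.Line + Ui.Foo
```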
If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.
In your case grep 'Ui\.\([^L]\|$\)' * does the job.
Ui\. matches the string you're interested in
\([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.
If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:
grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *
Which is either (a followed by not b or by the end of the line: a, then [^b] or $) or (a followed by b, which is in turn followed by not c or by the end of the line: a, then b, then [^c] or $).
This kind of expression gets to be pretty unwieldy and error prone with even a short string. You could write something to generate the expressions for you, but it'd probably be easier to just use a regex implementation that supports negative lookaheads.
*If your implementation supports non-capturing groups then you can avoid capturing extra characters.
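The "extra character" caveat is easy to see with -o, which prints only the matched text. A small sketch, assuming a grep that supports -o (GNU and BSD grep do) and using the ERE spelling of the same pattern; the function name is made up:

```shell
# Hypothetical helper: show exactly what the pattern consumes.
matched_text() { grep -oE 'Ui\.([^L]|$)'; }

printf 'Ui.Foo\n'   | matched_text   # prints: Ui.F  (the F is part of the match)
printf 'call Ui.\n' | matched_text   # prints: Ui.   (end-of-line branch, nothing extra)
```

For plain line filtering this does not matter, but it does as soon as the match is used in a substitution or extracted on its own.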
If your grep doesn't support -P or --perl-regexp and you can install a PCRE-enabled grep, e.g. pcregrep, then you don't need any command-line option, as you do with GNU grep, to make it accept Perl-compatible regular expressions. You just run
pcregrep 'Ui\.(?!Line)' *
(The single quotes keep bash from treating the ! as history expansion.)
You don't need another nested group for "Line" as in your example "Ui\.(?!(Line))"; the outer group is sufficient, as shown above.
Here is another example of a negative lookahead assertion: when you have a list of lines returned by "ipset", each showing a packet count in the middle of the line, and you don't want the lines with zero packets, you just run:
ipset list | pcregrep 'packets(?! 0 )'
If you like Perl-compatible regular expressions and have perl, but don't have pcregrep and your grep doesn't support --perl-regexp, you can use one-line perl scripts that work the same way as grep:
perl -e 'while (<>) {if (/Ui\.(?!Line)/){print;};}' *
Perl reads stdin the same way grep does, e.g.
ipset list | perl -e 'while (<>) {if (/packets(?! 0 )/){print;};}'
At least for the case of not wanting an 'L' character after the "Ui." you don't really need PCRE.
grep -E 'Ui\.($|[^L])' *
Here I've made sure to match the special case of the "Ui." at the end of the line.

Named capture groups with grep

I use Unix grep. I would like to know how can I handle named capture groups with it.
Currently this is what I have:
echo "foobar" | grep -P "(?<q>.)ooba(?<w>.)"
So in theory, I have q=f and w=r, however I don't know how can I use these variables or hand them over to the next command (for example awk) via the pipeline.
In the end, I would like to have the following result:
f r
The above string is just an example. The capture groups could be anywhere, could be in any number, and printing could also be in any order. I'm saying this because I'm not specifically looking for a way to extract the last and the first character of a string, but rather an approach to extract as many variables as I want from a string. I know tricks like using -o, \K or (?<=some text).*?(?=some other text), but these only extract one portion of the string and not multiple.
sed is limited to 9 capture groups. gawk has no such limit.
In your question you mention wanting "rather an approach to extract as many variables as I want from a string".
sed is the best tool for the job if you are working with 1-9 groups. Beyond that, gawk's match function helps; its three-argument form, which fills an array with the groups, is a gawk extension. (Using the same regex as Inian.)
echo "foobar" | gawk '{match($0,/^(.)(.+)(.)$/,a);print a[1],a[3]}'
f r
PS: This alternative can be really helpful when dealing with more than 9 groups, and it works just as well for fewer. The captures also sit alongside awk's variables such as NR, OFS and FS, so formatting the output is easier.
grep does not have the capabilities to print the captured groups alone, but sed can with your given example,
echo "foobar" | sed 's/^\(.\)\(.\+\)\(.\)$/\1 \3/'
f r
which literally means: match the first character, the rest of the string, and the last character. You can then access the individual captured groups with the \1 ... \9 notation.
RegEx Demo
The reason for the \ before the parentheses is that sed uses BRE (Basic RegEx) by default rather than ERE (Extended RegEx), which can be enabled with the -E or -r flag. ERE has not traditionally been part of POSIX sed, so the answer sticks to BRE, where grouping is written \( \). Note that \+ is itself a GNU extension to BRE.

Is there an alternative to negative look ahead in sed

In sed I would like to be able to match /js/ but not /js/m I cannot do /js/[^m] because that would match /js/ plus whatever character comes after. Negative look ahead does not work in sed. Or I would have done /js/(?!m) and called it a day. Is there a way to achieve this with sed that would work for most similar situations where you want a section of text that does not end in another section of text?
Is there a better tool for what I am trying to do than sed? Possibly one that allows look ahead. awk seems a bit too much with its own language.
Well you could just do this:
$ echo 'I would like to be able to match /js/ but not /js/m' |
sed 's:#:#A:g; s:/js/m:#B:g; s:/js/:<&>:g; s:#B:/js/m:g; s:#A:#:g'
I would like to be able to match </js/> but not /js/m
You didn't say what you wanted to do with /js/ when you found it so I just put <> around it. That will work on all UNIX systems, unlike a perl solution since perl isn't guaranteed to be available and you're not guaranteed to be allowed to install it.
The approach I use above is a common idiom in sed, awk, etc. to create strings that can't be present in the input. It doesn't matter what character you use for # as long as it's not present in the string or regexp you're really interested in, which in the above is /js/. s/#/#A/g ensures that every occurrence of # in the input is followed by A. So now when I do s/foobar/#B/g I have replaced every occurrence of foobar with #B and I KNOW that every #B represents foobar because all other #s are followed by A. So now I can do s/foo/whatever/ without tripping over foo appearing within foobar. Then I just unwind the initial substitutions with s/#B/foobar/g; s/#A/#/g.
In this case though since you aren't using multi-line hold-spaces you can do it more simply with:
sed 's:/js/m:\n:g; s:/js/:<&>:g; s:\n:/js/m:g'
since there can't be newlines in a newline-separated string. The above will only work in seds that support use of \n to represent a newline (e.g. GNU sed) but for portability to all seds it should be:
sed 's:/js/m:\
:g; s:/js/:<&>:g; s:\
:/js/m:g'

Grep Search Specific Character Trouble

I have searched extensively and cannot figure out what I am doing wrong here. I have a text file that may contain a string similar to the following:
/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4
I generally know what the string will look like up until the disk percentage indicator (i.e. 11%), but in the final part of the string I need to figure out if it ends in the usr (or sub) directories.
I want to use grep to do this search but am having problems. For example, the following command gives me output, but once I replace any of the "." characters where the "G" or "%" would be, or if I try to add "/usr/.*" at the end, it refuses to return anything.
$ egrep ^/dev/dir1/dir2\s*\d*.\s*\d*.\s*\d*.\s*\d*.\s*.*$ testfile
/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4
grep's extended regular expressions do not support using \d to match digits. Instead, use [0-9] or [[:digit:]]. You can use the following grep command:
egrep '^/dev/dir1/dir2\s*[0-9]*G\s*[0-9]*G\s*[0-9]*G\s*[0-9]*%\s*.*$' testfile
You can also pass grep the -P option to enable Perl-compatible regular expressions, which do support \d:
grep -P '^/dev/dir1/dir2\s*\d*G\s*\d*G\s*\d*G\s*\d*%\s*.*$' testfile
Note the use of grep instead of egrep in the above command; -P is incompatible with egrep.
As a side note, I prefer to use + instead of * when I can, because it is stricter and can cause errors to become apparent sooner. For example, I assume there will always be at least one space and one digit in each place in the input, so you can use \s+ and [0-9]+ (or \d+). If your original pattern had used +, it would not have matched at all in the first place (whether it was quoted or not), and you would have known you had a problem even before adding the G or % to it. A working example is
egrep '^/dev/dir1/dir2\s+[0-9]+.\s+[0-9]+.\s+[0-9]+.\s+[0-9]+.\s+.+$'
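The effect of leaving the pattern unquoted can be seen without grep at all, and the final /usr check can then be added to a quoted pattern. The following is a sketch: the function name usr_mounts is made up, and [[:space:]] is used instead of \s only to stay within POSIX ERE:

```shell
# Unquoted, the shell strips the backslashes before grep ever runs.
printf '%s\n' \s\d      # prints: sd
# Quoted, the pattern reaches grep unchanged.
printf '%s\n' '\s\d'    # prints: \s\d

# Hypothetical helper: keep only /dev/dir1/dir2 lines whose mount point
# is /usr or a directory below it.
usr_mounts() {
  grep -E '^/dev/dir1/dir2[[:space:]]+[0-9]+G[[:space:]]+[0-9]+G[[:space:]]+[0-9]+G[[:space:]]+[0-9]+%[[:space:]]+/usr(/|$)'
}

printf '%s\n' '/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4' | usr_mounts
# prints: /dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4
```

So the original unquoted command was really searching for s* and d*, which match the empty string, and the single "." characters absorbed whatever happened to be there.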

Cleanup file of phone numbers that are not properly formatted

I have a file with nearly 10,000 phone numbers in it, and many were not formatted properly, i.e. not as 123-456-7890. Although I've cleaned up most of them, I still have one pattern I'm not sure how to handle. I used sed to clean up most of it and don't mind using either sed or awk, although I use sed more often than awk, to get one of the last groups (2306 lines) formatted properly.
Example: 123 4567890 (3 tab 7) needs to be 123-456-7890 (3 dash 3 dash 4).
I know I can find the pattern and replace the tab easily enough using:
sed "/^[0-9][0-9][0-9]\t[0-9][0-9][0-9][0-9][0-9][0-9][0-9]/s/\t/-/" infile.txt > outfile.txt
However, if I could extend the instruction to also split the 7 digits that are grouped together, it would be easier for me to clean up what's left after this round. I've done a fair amount of searching, but I couldn't get anything from the suggested list that appeared when I typed in the subject to work, so I went ahead and posted the question.
Use extended regular expressions and capturing groups:
sed -E 's/^([0-9]{3})\t([0-9]{3})([0-9]{4})$/\1-\2-\3/' infile.txt > outfile.txt
Basically, something like this will work for a phone number on its own:
sed 's/\([0-9]\)[^0-9]*/\1/g;s/\(...\)\(...\)\(....\)/\1-\2-\3/' YourFile
In practice your phone numbers are probably stored alongside other information, so the extraction and filtering will need to be more specific.
An awk version:
echo "123 4567890" | awk '{gsub(/[^0-9]/,"");print substr($0,1,3)"-"substr($0,4,3)"-"substr($0,7,4)}'
123-456-7890
It just removes all non-digits, then prints the digits in groups of three, three and four.
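The sed -E answer above relies on \t, which GNU sed understands but not every sed does. A sketch of a more portable variant puts a literal tab into the pattern through a shell variable; the names tab and fix_phone are made up for illustration:

```shell
# A literal tab character, produced portably by printf.
tab=$(printf '\t')

# Hypothetical helper: rewrite "NNN<tab>NNNNNNN" lines as NNN-NNN-NNNN.
# Lines that do not match the whole pattern pass through unchanged.
fix_phone() {
  sed -E "s/^([0-9]{3})${tab}([0-9]{3})([0-9]{4})\$/\1-\2-\3/"
}

printf '123\t4567890\n' | fix_phone   # prints: 123-456-7890
```

Because the pattern is anchored at both ends, already-clean lines such as 123-456-7890 are left alone, so the command can be run over the whole file.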