Unix grep regex containing 'x' but not containing 'y' - regex

I need a single-pass regex for unix grep that matches lines containing, say, alpha but not containing beta.
grep 'alpha' <> | grep -v 'beta'

The other answers here show some ways you can contort different varieties of regex to do this, although I think it does turn out that the answer is, in general, “don’t do that”. Such regular expressions are much harder to read and probably slower to execute than just combining two regular expressions using the boolean logic of whatever language you are using. If you’re using the grep command at a unix shell prompt, just pipe the results of one to the other:
grep "alpha" | grep -v "beta"
I use this kind of construct all the time to winnow down excessive results from grep. If you have an idea of which result set will be smaller, put that one first in the pipeline to get the best performance, as the second command only has to process the output from the first, and not the entire input.
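A minimal sketch of that pipeline on some made-up input (the sample lines and the variable names are assumptions for illustration only):

```shell
# Hypothetical sample input; only the second line has alpha without beta.
sample='alpha and beta
alpha only
beta only
neither'

# Stage 1 keeps lines containing alpha, stage 2 drops those containing beta.
result=$(printf '%s\n' "$sample" | grep 'alpha' | grep -v 'beta')
printf '%s\n' "$result"
```

The second grep only ever sees the lines that survived the first one, which is why the order of the two stages can matter for speed.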

Well as we're all posting answers, here it is in awk ;-)
awk '/x/ && !/y/' infile
I hope this helps.

^((?!beta).)*alpha((?!beta).)*$ would do the trick I think, given a regex engine that supports lookahead (for example GNU grep with -P).

I'm pretty sure this isn't possible with true regular expressions. The [^y]*x[^y]* example would match yxy, since the * allows zero or more non-y matches.
EDIT:
Actually, this seems to work: ^[^y]*x[^y]*$. It basically means "match any line that starts with zero or more non-y characters, then has an x, then ends with zero or more non-y characters".
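To see the difference the anchors make, here is a small sketch with invented test lines (the lines and variable names are assumptions, not from the question):

```shell
# Hypothetical test lines: one with y on both sides of x, one clean, one without x.
lines='yxy
axb
abc'

# Without anchors the pattern can match just the "x" in the middle of "yxy".
unanchored=$(printf '%s\n' "$lines" | grep '[^y]*x[^y]*')

# With ^ and $ the whole line has to be free of y.
anchored=$(printf '%s\n' "$lines" | grep '^[^y]*x[^y]*$')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Note this only works because x and y are single characters; a character class cannot exclude a whole word like beta.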

Try using the excludes operator: [^y]*x[^y]*

Q: How to match x but not y in grep without pipe if y is a directory
A: grep -r x --exclude-dir='y'
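A rough sketch of how that behaves, assuming a grep that supports --exclude-dir (GNU grep does) and using a throwaway directory; the file names and contents are made up:

```shell
# Build a temporary tree with a directory named y and another named z.
tmp=$(mktemp -d)
mkdir -p "$tmp/y" "$tmp/z"
echo 'x here' > "$tmp/y/a.txt"
echo 'x there' > "$tmp/z/b.txt"

# --exclude-dir only matters for a recursive search, so -r is used here.
found=$(cd "$tmp" && grep -r x --exclude-dir='y' .)
printf '%s\n' "$found"

rm -rf "$tmp"
```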

Simplest solution:
grep "alpha" * | grep -v "beta"
Mind the spaces and the double quotes.

Related

bash regex for word with some suffixes but not one specific

I need (case-insensitive) all matches of several variations on a word--except one--including unknowns.
I want
accept
acceptance
acceptable
accepting
...but not "acception." A coworker used it when he meant "exception." A lot.
Since I can't anticipate the variations (or typos), I need to allow things like "acceptjunk" and "acceptMacarena"
I thought I could accomplish this with a negative lookahead, but this didn't give the results I needed
grep -iE '(?!acception)(accept[a-zA-Z]*)[[:space:]]' file
The trick is that I can accept (har) lines that contain "acception," provided that the other words match. For example this line is okay to match:
The acceptance of the inevitable is the acception.
...otherwise by now I'd have piped grep through grep -v and been done with it:
grep -iE '(accept)[a-zA-Z]*[[:space:]]' | grep -vi 'acception'
I've found some questions that are similar and many that are not quite so. Using a-zA-Z is likely unnecessary in grep -i but I'm flailing. I'm probably missing something small or basic...but I'm missing it nonetheless. What is it?
Thanks for reading.
PS: I'm not married to grep--but I am operating in bash--so if there's a magic awk command that would do this I'm all ears (eyes).
PPS: forgot to mention that on https://regex101.com/ the above lookahead seemed to work, but it doesn't with my full grep command.
To use lookarounds, you need GNU grep with PCRE available:
grep -iP '(?!acception)(accept[a-z]*)[[:space:]]'
With awk, this might work:
awk '{ip=$0; $0=tolower($0); gsub(/acception/, ""); if(/accept[a-z]*[[:space:]]/) print ip}'
ip=$0 saves the input line
$0=tolower($0) lowercases the working copy so the match is case-insensitive
gsub(/acception/, "") removes every occurrence of the unwanted word; other unwanted words can be added with alternation
if(/accept[a-z]*[[:space:]]/) print ip then prints the original line if it still contains one of the words being searched for
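Here is a sketch of the same remove-then-test idea run on a few invented lines. It is a variant, not the exact command above: it lowercases a working copy and uses gsub so every "acception" is removed, and it writes the whitespace class as [ \t] on the assumption that some older awks lack [[:space:]].

```shell
# Hypothetical input: line 1 has a good word plus "acception", line 2 only "acception".
text='The acceptance of the inevitable is the acception.
That is an acception to the rule.
We ACCEPT this offer.'

# Keep the original line in ip, strip the unwanted word from a lowercased copy,
# then print the original if an accept* word followed by whitespace is still there.
kept=$(printf '%s\n' "$text" | awk '{
    ip = $0
    $0 = tolower($0)
    gsub(/acception/, "")
    if (/accept[a-z]*[ \t]/) print ip
}')
printf '%s\n' "$kept"
```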

regex - multiply $1 by 10

I want to replace the results of this:
(something=)([\-\d\.]*)
with this:
nowitis=($2*10)
but instead of getting
nowitis=(80)
I get
nowitis=(8*10)
How to solve it?
In sed, for example:
echo "something=123" | sed -r 's/(something=)([-0-9.]*)/\1\2*10)/'
something=123*10)
echo "something=123" | sed -r 's/(something=)([-0-9.]*)/\1\20/'
something=1230
For integers, multiplication by 10 is just appending a zero to the number. Sed doesn't calculate results. (For a decimal such as 10.2 this gives 10.20, which is not a multiplication.)
Note that the class is written [-0-9.] rather than [\-\d\.]. Inside a bracket expression sed treats a backslash literally and does not know \d, so a class like [-\d.] matches only -, \, d and ., the group stays empty, and the zero lands in front of the digits:
echo "something=123" | sed -r 's/(something=)([-\d.]*)/\1\20/'
something=0123
In the class [-0-9.], the - sign comes first, so it can't be part of a range like A-Z. As the first or last character, it doesn't need escaping.
Similarly, a dot inside a bracket expression is always a literal dot, never the wildcard, so you don't have to escape it either.
Let's suppose you are on a POSIX system with Perl available.
echo "something= 8" | perl -pe 's/\w\s*=\s*\K-?\d+(\.\d+)?/$&*10/ge'
something= 80
What you want to do is not possible with a plain regex, because regexes cannot do arithmetic, e.g. compute 8*10. One way is to use an interpreter that can.
Perl has a nice feature for this, the /e modifier. It evaluates the replacement as Perl code, in which I compute $& * 10, where $& is the text matched by the pattern (here, everything after the \K).
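If Perl is not at hand, awk can do the arithmetic as well. A minimal sketch, assuming each line has the exact form something=NUMBER; the function name times_ten is made up for illustration:

```shell
# Hypothetical helper: split on "=", multiply the value by 10, rebuild the line.
times_ten() {
    awk -F= '$1 == "something" { print "nowitis=(" ($2 * 10) ")" }'
}

# Integer, decimal and negative inputs.
scaled=$(printf 'something=8\nsomething=10.2\nsomething=-3.15\n' | times_ten)
printf '%s\n' "$scaled"
```

Unlike appending a zero, this handles decimals and negative numbers, because awk converts the field to a number before multiplying.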
The input string can be like:
something=10.2
something=-3.15
So there can be negative numbers and float numbers.
I have the PhpStorm IDE and I'm using its find & replace function with regex.
So the match is fine, but there is no multiplication.
So I think I could do it in a couple of runs.
For example, in the next run I would find my results and then move the dot by one place.
I read the PCRE docs and didn't find a multiplication option.
It would be easier to write a script, even in PHP, to do it right.
But I thought it could be done more easily.

Find a string after a certain character

An example will explain it better:
structure_1/structure_2/<I NEED WHAT'S HERE>/structure_3
Structure_1 is always the same value
Structure_2 is a string that can be of any size, sometimes with _ or -
What I need is behind the second forward slash
I don't care what comes after
Other example:
order/shirt/blue_stripes/America
order/pants_ripped/green/Europe
order/skirts/yellow-folded/Asia
order/socks/orange/Africa
Results that I want to get after the regex
blue_stripes
green
yellow-folded
orange
I'm writing a BASH script for my Unix machine
UPDATE
I first used a regex in order to do this but I was informed by Flying that it would be better to use the command 'awk' and this did the trick with ease!
This one will do the trick: ^(?:[^\/]+\/){2}([^\/]+). You basically need to skip the first 2 groups of chars; the part you want ends up in the first capture group.
UPDATE: Since, as clarified in the comments, the actual task is not about finding the correct regular expression but about extracting information from a file on Unix, it is much better to use awk instead:
awk -F"/" '{print $3}' orders.txt
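A quick sketch of that on a few of the sample lines, fed from a variable instead of orders.txt. The cut variant is an alternative I'm adding for comparison, not part of the answer above:

```shell
# A few of the sample lines from the question.
orders='order/shirt/blue_stripes/America
order/skirts/yellow-folded/Asia
order/socks/orange/Africa'

# Split on "/" and print the third field.
third=$(printf '%s\n' "$orders" | awk -F/ '{ print $3 }')

# Same result with cut: -d sets the delimiter, -f picks the field.
third_cut=$(printf '%s\n' "$orders" | cut -d/ -f3)

printf '%s\n' "$third"
```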

Using sed to fix format of date string

The question specifically involves modifying a string of form
abc_MM-DD-YY_XX.jpg
(where XX can be comprised of two or three digits) to
xyz_YYYY-MM-DD_XXX.jpg
I was able to do this using:
sed 's/\(.*_\)\(.\{5\}\)-\([0-9][0-9]\)_\([0-9][0-9]\.\)/xyz_20\3-\2_0\4/'
I was wondering, though, if there are any better, perhaps more concise alternatives. Also, is using TRE (tagged regular expression) the only way sed can accomplish such a task? Thanks!
EDIT: Sorry, to clarify, the original string can be in either the format "abc_MM-DD-YY_XX.jpg" or "abc_MM-DD-YY_XXX.jpg", but the output format must be "xyz_YYYY-MM-DD_XXX.jpg". So in the first case I would want to pad "XX" with a 0 and in the second case I would want to leave it as is. I also realized that my expression doesn't work for the second case...
This will work only in this century!
Using awk
I would use awk for that. It is simpler to use:
awk -F'[-_.]' '$0="xyz_20"$4"-"$2"-"$3"_"sprintf("%03d",$5)"."$6' <<<'abc_03-24-15_11.jpg'
will give you:
xyz_2015-03-24_011.jpg
while:
awk -F'[-_.]' '$0="xyz_20"$4"-"$2"-"$3"_"sprintf("%03d",$5)"."$6' <<<'abc_03-24-15_111.jpg'
will give you:
xyz_2015-03-24_111.jpg
which should be what you want.
Explanation:
I'm using -, _ or . as the field delimiter and simply reorganize the fields. To pad an XX value to XXX I'm using sprintf(), and the extension ($6) is put back after a dot, since passing "11.jpg" through %03d would drop it. (Thanks Amadan)
Using sed
Btw, you can simplify the sed command a lot if you use the -r option and simply match sequences of characters that are not the next delimiter:
sed -r 's/([^_]+)_([^-]+)-([^-]+)-([^_]+)_([^.]+)/xyz_20\4-\2-\3_0\5/;' <<<'abc_03-24-15_12.jpg'
(This doesn't solve the XX to XXX problem properly yet: a three-digit value also gets a leading 0.)
To solve that you can simply append another s command:
s/0([0-9]{3})\./\1./
which will replace a sequence like 0123. with 123. The final command looks like this:
sed -r 's/([^_]+)_([^-]+)-([^-]+)-([^_]+)_([^.]+)/xyz_20\4-\2-\3_0\5/;s/0([0-9]{3})\./\1./' <<<'abc_03-24-15_12.jpg'
Doesn't it look simpler using -r ;) (hihi)
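For renaming more than one name it may help to wrap the conversion in a function. A sketch, assuming names always look like abc_MM-DD-YY_XX.jpg or abc_MM-DD-YY_XXX.jpg and all years are 20YY; fix_name is a made-up name:

```shell
# Hypothetical helper: split on -, _ and ., reorder the date, zero-pad the counter.
fix_name() {
    printf '%s\n' "$1" |
        awk -F'[-_.]' '{ printf "xyz_20%s-%s-%s_%03d.%s\n", $4, $2, $3, $5, $6 }'
}

two_digit=$(fix_name 'abc_03-24-15_11.jpg')
three_digit=$(fix_name 'abc_03-24-15_111.jpg')
printf '%s\n%s\n' "$two_digit" "$three_digit"
```

Using printf with %03d does the padding in one step, so no second substitution is needed to undo an extra zero.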

grep egrep multiple-strings

Suppose I have several strings: str1 and str2 and str3.
How to find lines that have all the strings?
How to find lines that can have any of them?
And how to find lines that have str1 and either of str2 and str3 [but not both?]?
This looks like three questions. The easiest way to put these sorts of expressions together is with multiple pipes. There's no shame in that, particularly because a regular expression (using egrep) would be ungainly since you seem to imply you want order independence.
So, in order,
grep str1 | grep str2 | grep str3
egrep '(str1|str2|str3)'
grep str1 | egrep '(str2|str3)'
You can do the "and" form in an order-independent way using egrep, but I think you'll find it easier to remember to do order-independent ands using piped greps and order-independent ors using regular expressions.
You can't reasonably do the "all" or "this plus either of those" cases in a single pattern because standard grep (without PCRE support) doesn't support lookahead. Use Perl. For the "any" case, it's egrep '(str1|str2|str3)' file.
The unreasonable way to do the "all" case is:
egrep '(str1.*str2.*str3|str1.*str3.*str2|str2.*str1.*str3|str2.*str3.*str1|str3.*str1.*str2|str3.*str2.*str1)' file
i.e. you build out all six permutations. This is, of course, a ridiculous thing to do.
For the "this plus either of those", similarly:
egrep '(str1.*(str2|str3)|(str2|str3).*str1)' file
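To check the three cases side by side, here is a sketch on invented sample lines. The last command is my own addition for the "but not both" reading of the third question, using awk because it combines pattern tests with && and || directly:

```shell
# Hypothetical sample data covering the interesting combinations.
data='str1 str2 str3
str1 str2
str1 str3
str2 only
nothing'

# All three strings, in any order.
all=$(printf '%s\n' "$data" | grep str1 | grep str2 | grep str3)

# Any of the three strings.
any=$(printf '%s\n' "$data" | grep -E 'str1|str2|str3')

# str1 plus at least one of str2 and str3.
one_plus=$(printf '%s\n' "$data" | grep str1 | grep -E 'str2|str3')

# str1 plus exactly one of str2 and str3 (not both).
exactly_one=$(printf '%s\n' "$data" |
    awk '/str1/ && ((/str2/ && !/str3/) || (!/str2/ && /str3/))')

printf '%s\n' "$exactly_one"
```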
grep -E --color "string1|string2|string3...."
For example, to find out whether the system's processor has AMD (svm) or Intel (vmx) virtualization support and whether it is 64-bit (lm stands for long mode, which means 64-bit):
Command example:
grep -E --color "lm|svm|vmx" /proc/cpuinfo
-E is needed so that | is treated as alternation between the strings.
Personally, I do this in perl rather than trying to cobble together something with grep.
For instance, for the first one:
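# Read lines from the files named on the command line (or STDIN if none).
while (<>)
{
    # Skip the line as soon as one of the patterns is missing.
    next if ! m/pattern1/;
    next if ! m/pattern2/;
    next if ! m/pattern3/;
    # All three patterns matched, in any order.
    print $_;
}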