Simple but elusive expression

I feel a bit shy asking this here, but it has already taken more time than it should.
Say I have these four files:
IPCDR_ARB06067956VPLUS_T_201103
IPCDR_ARB06067957VPLUS_T_201103
IPCDR_MOV_ARB06067959VPLUS_T_20110
MOV_CDRARB06067959VPLUS_T_201103
I want to grep for only those starting with IPCDR_MOV or MOV_CDR.
My first attempt was:
ls -1 | grep "^IPCDR_MOV|^MOV_CDR"
but it didn't work.
I've done plenty of dumb tests (which I won't bother you with) and nothing comes out. Can someone please put me out of my misery?
Thanks!

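Add the -E switch to use extended regular expressions. Without it, grep uses basic regular expressions, in which an unescaped | is an ordinary character, so your pattern only matches lines that literally begin with the text IPCDR_MOV|^MOV_CDR.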
$ ls -1 | grep -E "^IPCDR_MOV|^MOV_CDR"
IPCDR_MOV_ARB06067959VPLUS_T_20110
MOV_CDRARB06067959VPLUS_T_201103
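A small self-contained sketch of the same filter, assuming the four names from the question arrive on standard input rather than from ls (the helper names match_ere, match_multi_e and count_bre_literal, and the variable names, are hypothetical). It also shows a portable alternative that needs no extended syntax: one -e option per anchored prefix. GNU grep additionally accepts \| as alternation in basic regexes, but that is an extension.

```shell
# Sample input: the four file names from the question, one per line.
names='IPCDR_ARB06067956VPLUS_T_201103
IPCDR_ARB06067957VPLUS_T_201103
IPCDR_MOV_ARB06067959VPLUS_T_20110
MOV_CDRARB06067959VPLUS_T_201103'

# Extended regex: | is alternation, and one ^ anchors both alternatives.
match_ere() {
  grep -E '^(IPCDR_MOV|MOV_CDR)'
}

# Basic regex with one -e per pattern: a line matching any of them is printed.
match_multi_e() {
  grep -e '^IPCDR_MOV' -e '^MOV_CDR'
}

# Without -E, | is an ordinary character, so this count is 0 for the sample.
count_bre_literal() {
  grep -c '^IPCDR_MOV|^MOV_CDR'
}

printf '%s\n' "$names" | match_ere
printf '%s\n' "$names" | match_multi_e
```

Both calls print the same two names; the -e form is handy when the alternatives come from separate variables.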

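Use egrep instead of grep; egrep is equivalent to grep -E. With plain grep, the | is matched as a literal character instead of acting as alternation. (Recent GNU grep versions warn that egrep is obsolescent, so grep -E is the safer choice.)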

Related

sed with capturing group

I have strings like the following:
VIN_oFDCAN8_8d836e25_In_data
IPC_FD_1_oFDCAN8_8d836e25_In_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_data
I want to insert _Moto between _In and _data, as below:
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data
But when I used sed with a capturing group as below
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_*\(_data\)/_Moto_\1/'
I get this output:
VIN_oFDCAN8_8d836e25_Moto__data
Can you please point me in the right direction?

You could simply substitute the _In string (assuming it appears only once per line of your Input_file), but since you asked specifically for a capturing-group approach in sed, you could try the following:
sed 's/\(.*_In\)\(.*\)/\1_Moto\2/g' Input_file
Note that this adds _Moto with only a leading underscore, because the second group already starts with _; that avoids a doubled underscore after Moto. Thanks to @Bodo for mentioning this in the comments.
Issue with the OP's attempt: _In_* (which means _In followed by zero or more underscores) is matched but not captured, so it is replaced rather than kept, and only \(_data\) is stored as \1. The replacement _Moto_\1 therefore drops _In and adds a second underscore, which gives _Moto__data. The fix above also captures everything up to and including _In, so it is written back unchanged.
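For comparison, a minimal sketch that keeps the OP's idea of capturing around a fixed anchor but fixes both problems: _In is captured too, so \1 writes it back, and _data is captured as \2, so the replacement supplies exactly one underscore before Moto. It assumes each line contains _In_data once; the function name add_moto is hypothetical.

```shell
# Hypothetical helper: insert _Moto between _In and _data on each input line.
# \(_In\) is group 1 and \(_data\) is group 2; both are written back unchanged.
add_moto() {
  sed 's/\(_In\)\(_data\)/\1_Moto\2/'
}

printf '%s\n' \
  VIN_oFDCAN8_8d836e25_In_data \
  IPC_FD_1_oFDCAN8_8d836e25_In_data \
  BRAKE_FD_2_oFDCAN8_8d836e25_In_data | add_moto
```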

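This inserts _Moto before the last underscore-separated field: _[^_]*$ matches the final _ and everything after it, and & in the replacement stands for the whole match.
$ sed 's/_[^_]*$/_Moto&/' file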
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data

In your case, you can directly replace the matching string with the following command:
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_data/_In_Moto_data/'

Get all Commands without arguments from history (with Regex)

I have just started learning shell commands and how to script in bash.
Now I'd like to solve the task mentioned in the title.
What I get from the history command (without line numbers):
ls [options/arguments] | grep [options/arguments]
find [...] -exec sed [...]
du [...]; awk [...] file
And what my output should look like:
ls
grep
find
sed
du
awk
I already found a solution, but it doesn't really satisfy me. So far I declared three arrays and used the readarray -t < <(...) command twice: once to save the contents of my history and once, in combination with compgen -ac, to get all commands that I can possibly run. Then I compared the contents of both with loops, and saved the command every time it matched a line in the "history" array. A lot of effort for a simple exercise, I guess.
Another solution I thought of is to do it with regex pattern matching.
A command usually starts at the beginning of the line, after a pipe, after -exec, or after a semicolon, and maybe in other places I don't know about yet.
So I need a regex that gives me only the next word after one of these conditions matches. This is the command I found, and it seems to work:
grep -oP '(?<=|\s/)\w+'
Here it uses the pipe | as a condition, but I need to add the others too. So I put the pattern in double quotes, created an array with all the conditions and tried it as recommended:
grep -oP "(?<=$condition\s/)\w+"
But no matter how I insert the variable, it fails. In short, I couldn't figure out how the command works, especially the regex part.
So, how can I solve it using regular expressions? Or is there a better approach than mine?
Thank you in advance! :-)

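This is simple and works well, although it only picks up the first command on each line (commands after a pipe or semicolon are not extracted):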
history -w /dev/stdout | cut -f1 -d ' '

You can use this awk with the fc command:
awk '{print $1}' <(fc -nl)
find
mkdir
find
touch
tty
printf
find
ls
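fc -nl lists entries from history without the line numbers. In bash, fc -l without a range lists only the last 16 commands, so pass an explicit first and last entry (see help fc) to list more.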
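Both answers above print only the first word of each history line, so commands after a pipe, a semicolon or find's -exec are missed. Below is a rough awk sketch of the splitting idea from the question, using a hypothetical function name first_words: it splits each line at runs of |, ; and & and at -exec, then prints the first word of each piece. It is an approximation only; it does not understand quoting, subshells or redirections such as 2>&1.

```shell
# Hypothetical helper: print the first word of every command segment.
# Segments are split at runs of |, ; or & (covering |, ||, ;, && and &)
# and at "-exec " so the command that find runs is reported too.
first_words() {
  awk '{
    n = split($0, seg, /[|;&]+|-exec[ \t]+/)
    for (i = 1; i <= n; i++)
      if (split(seg[i], w, " ") > 0)   # " " = default whitespace splitting
        print w[1]
  }'
}

# Usage in an interactive bash session:
#   history -w /dev/stdout | first_words
```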

Strange regex behavior with grep

grep '[:digit:]{1,}-{1,}' *.txt| wc -l
This command outputs: 0
grep '1-' *.txt| wc -l
However, this command outputs: 10598
Both commands are run from the same directory. The first command should have returned a count greater than or equal to that of the second command. Can anyone shed some light on what is going on here?

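echo 1 | grep '[:digit:]'
# prints nothing
Character classes such as [:digit:] are only valid inside a bracket expression, so you need [[:digit:]] (or [0-9]). On its own, [:digit:] is a bracket expression matching one of the characters :, d, i, g or t (GNU grep rejects it with an error instead).
In basic regular expressions, which grep uses by default, an unescaped {1,} is matched literally; the interval form there is \{1,\}. Alternatively, use extended mode with -E, where {1,} works as written. Note: normally one would use + to match one or more of the preceding item.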
General note: always test regexes in small parts to see that each part really does what you thought it does. Once the expression gets complicated, it's really hard to tell what went wrong.
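A minimal sketch of the corrected pattern in both syntaxes, reading from standard input instead of *.txt and counting with grep -c instead of wc -l (the helper names are hypothetical). Because a line only needs one digit followed by one hyphen to match, [[:digit:]]- alone selects the same lines; the quantifiers only make a difference with options like -o that print the matched text.

```shell
# Hypothetical helpers: count input lines containing a digit followed by a hyphen.

# Extended regex: + means "one or more" and needs no escaping.
count_ere() {
  grep -cE '[[:digit:]]+-+'
}

# Basic regex: the interval expression must be written \{1,\}.
count_bre() {
  grep -c '[[:digit:]]\{1,\}-\{1,\}'
}

printf '%s\n' 'a1-b' 'no digits' '12--x' '-1' | count_ere   # prints 2
printf '%s\n' 'a1-b' 'no digits' '12--x' '-1' | count_bre   # prints 2
```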

Unix grep regex containing 'x' but not containing 'y'

I need a single-pass regex for Unix grep that matches lines containing, say, alpha but not beta.
grep 'alpha' <> | grep -v 'beta'

The other answers here show some ways you can contort different varieties of regex to do this, although I think it does turn out that the answer is, in general, “don’t do that”. Such regular expressions are much harder to read and probably slower to execute than just combining two regular expressions using the boolean logic of whatever language you are using. If you’re using the grep command at a unix shell prompt, just pipe the results of one to the other:
grep "alpha" | grep -v "beta"
I use this kind of construct all the time to winnow down excessive results from grep. If you have an idea of which result set will be smaller, put that one first in the pipeline to get the best performance, as the second command only has to process the output from the first, and not the entire input.

Well, as we're all posting answers, here it is in awk ;-)
awk '/x/ && !/y/' infile
I hope this helps.

With grep -P (Perl-compatible regular expressions, where supported, e.g. by GNU grep), ^((?!beta).)*alpha((?!beta).)*$ would do the trick, I think.

I'm pretty sure this isn't possible with true regular expressions. The [^y]*x[^y]* example would match yxy: the pattern is not anchored, and * allows zero non-y characters on either side of the x.
EDIT:
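Actually, this seems to work: ^[^y]*x[^y]*$. It basically means "match any line that starts with zero or more non-y characters, then has an x, then ends with zero or more non-y characters". This works because x and y are single characters here; a bracket expression such as [^y] excludes individual characters, so it cannot express "does not contain the word beta".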

Try using a negated bracket expression: [^y]*x[^y]*
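A quick check of the single-character case discussed above (helper names hypothetical): the unanchored pattern accepts yxy, while the anchored one rejects it.

```shell
# Without anchors, [^y]* may match zero characters, so any line with an x matches.
count_unanchored() {
  grep -c '[^y]*x[^y]*'
}

# Anchored at both ends, the whole line must be non-y characters with at least one x.
count_anchored() {
  grep -c '^[^y]*x[^y]*$'
}

printf '%s\n' axb yxy xx | count_unanchored   # prints 3
printf '%s\n' axb yxy xx | count_anchored     # prints 2
```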

Q: How to match x but not y in grep without a pipe, if y is a directory?
A: With GNU grep's recursive search: grep -r x --exclude-dir='y'

Simplest solution:
grep "alpha" * | grep -v "beta"
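Mind the spaces and the double quotes. With several files, the first grep prefixes each line with its filename, so lines from files whose names contain beta are dropped as well; add -h to the first grep to avoid that.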
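A small sketch comparing the single-pass awk answer with the two-stage grep pipeline on the same sample lines (function names hypothetical). The PCRE lookahead answer is left out because it requires a grep with -P support.

```shell
# Single pass: awk keeps lines that match alpha and do not match beta.
alpha_not_beta_awk() {
  awk '/alpha/ && !/beta/'
}

# Two processes: the grep pipeline from the answers above.
alpha_not_beta_pipe() {
  grep 'alpha' | grep -v 'beta'
}

printf '%s\n' 'alpha only' 'alpha and beta' 'beta only' 'neither' | alpha_not_beta_awk
printf '%s\n' 'alpha only' 'alpha and beta' 'beta only' 'neither' | alpha_not_beta_pipe
```

Both print only the line "alpha only".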

Sed substitution not doing what I want and think it should do

I am trying to use sed to extract some information that is encoded in the path of a file passed as a parameter to my script (Bourne sh, if it matters).
From this example path, I'd like the result to be 8
PATH=/foo/bar/baz/1-1.8/sing/song
I first got the regex close by using sed as grep:
echo $PATH | sed -n -e "/^.*\/1-1\.\([0-9][0-9]*\).*/p"
This properly recognized the string, so I edited it to make a substitution out of it:
echo $PATH | sed -n -e "s/^.*\/1-1\.\([0-9][0-9]*\).*/\1/"
But this doesn't produce any output. I know I'm just not seeing something simple, but would really appreciate any ideas about what I'm doing wrong or about other ways to debug sed regular expressions.
(edit)
In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1. and see what some-number is.
It is also possible to have an input string that the regular expression should not match, which should produce no output.

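The -n option to sed suppresses normal output, and since your second command doesn't have a p flag, nothing is output. Either get rid of the -n or put the p back at the end, as in s/.../\1/p. Keeping -n together with p prints only lines where the substitution succeeded, so non-matching input produces no output.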
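A minimal sketch of that fix, keeping -n and adding the p flag, wrapped in a hypothetical function that takes the path as its first argument. It also avoids using PATH as a variable name.

```shell
# Hypothetical helper: print the number after "/1-1." in the path given as $1.
# -n suppresses automatic printing; the p flag prints only if s/// succeeded,
# so a path without the component produces no output at all.
minor_from_path() {
  printf '%s\n' "$1" | sed -n 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p'
}

minor_from_path /foo/bar/baz/1-1.8/sing/song   # prints 8
minor_from_path /foo/bar/baz/sing/song         # prints nothing
```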

It looks like you're trying to get the 8 from 1-1.8 (where 8 is any sequence of digits), yes? If so, I would just use:
echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
No doubt you could get it working with one sed "instruction" (-e) but sometimes it's easier just to break it down.
The first strips out everything from the start up to and including 1-1., the second strips from the first non-numeric after that to the end.
$ echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
8
$ echo /foo/bar/baz/1-1.752/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
752
And, as an aside, this is actually how I debug sed regular expressions. I put simple ones in independent instructions (or independent part of a pipeline for other filtering commands) so I can see what each does.
Following your edit, this also works:
$ echo /foo/bar/baz/1-1.962/sing/song | sed -e "s/.*\/1-1\.\([0-9][0-9]*\).*/\1/"
962
As to your comment:
In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1. and see what some-number is.
The two-part sed command I gave you should work with digits anywhere in the string (as long as there's no 1-1. after the one you're interested in). That's because it deletes everything up to and including the specific 1-1. string, and then everything from the first non-digit onward. If you have some examples that don't work as expected, add them to the question as an update and I'll adjust the answer.

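You can shorten your command by using + (one or more) instead of * (zero or more). Note that \+ in a basic regex is a GNU sed extension, and the p flag is still needed because of -n:
sed -n -e "s/^.*\/1-1\.\([0-9]\+\).*/\1/p"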

Don't use PATH as your variable name; it clashes with the PATH environment variable, which the shell uses to find commands.
echo "$path" | sed -e 's/.*1-1\.//;s/\/.*//'

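You don't have to delimit your patterns with / (as in s/a/b/g); you can choose almost any other character (not backslash or newline). When you're dealing with paths, # is more convenient than /, because the slashes in the pattern then need no escaping: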
echo /foo/1-1.962/sing | sed -e "s#.*/1-1\.\([0-9]\+\).*#\1#"
