How to shorten a regular expression?

First off, I'm relatively new to regular expressions. I've built a regex that I use with sed, and it works fine for me:
sed 's/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9] | info | tst.33.12.carmen | !: //g'
but I'm pretty sure all the repeated character classes could be simplified. How would I do this?
I want to remove:
20180630 180212.407107 | info | tst.33.12.carmen | !:
from a line of text (the timestamp at the front can be any digits; the strings after the first '|' are constant).

EDIT: Since the OP has now posted sample input, I'm adding this solution.
sed -E 's/^[0-9]{8} [0-9]{6}\.[0-9]{6} \| info \| tst\.[0-9]{2}\.[0-9]{2}\.carmen \| \!:$//' Input_file
Testing the code:
Let's say the following is the Input_file:
cat Input_file
20180630 180212.407107 | info | tst.33.12.carmen | !:
fdfjwhfwifrwvf
vwkdnvkwkvwnvwv
20180630 180212.407107 | info | tst.33.12.carmen | !:
dwbvwbvwvbb
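After running the above code, the output is as follows (each matching line is emptied rather than deleted, so sed prints it as a blank line; those blank lines are not shown here):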
sed -E 's/^[0-9]{8} [0-9]{6}\.[0-9]{6} \| info \| tst\.[0-9]{2}\.[0-9]{2}\.carmen \| \!:$//' Input_file
fdfjwhfwifrwvf
vwkdnvkwkvwnvwv
dwbvwbvwvbb
With sed's -E option you could use something like the following. Fair warning: it is adapted from your own command and was never tested, since no samples were provided in your post. Note that with -E the pipes and dots must be escaped to match literally.
sed -E 's/^[0-9]{8} [0-9]{6}\.[0-9]{6} \| info \| tst\.33\.12\.carmen \| !: //'
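One way to check that the interval form is a faithful shortening is to run both forms over the same line and compare the results. The sketch below assumes the input line carries some text after the prefix; long_form and short_form are hypothetical helper names, and the dots are escaped in both, which the command in the question did not do.

```shell
# Long form: every digit spelled out as its own [0-9].
# Basic regex (no -E), so an unescaped | is a literal pipe.
long_form() {
  sed 's/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9][0-9][0-9]\.[0-9][0-9][0-9][0-9][0-9][0-9] | info | tst\.33\.12\.carmen | !: //'
}

# Short form: {n} intervals with -E.
# Extended regex, so | must be escaped to stay literal.
short_form() {
  sed -E 's/^[0-9]{8} [0-9]{6}\.[0-9]{6} \| info \| tst\.33\.12\.carmen \| !: //'
}

# Made-up sample line: the prefix from the question plus a trailing message.
line='20180630 180212.407107 | info | tst.33.12.carmen | !: disk check passed'
printf '%s\n' "$line" | long_form
printf '%s\n' "$line" | short_form
```

Both commands should print only the trailing message, which shows that {8} and {6} stand in for the repeated classes.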

If you don't care about matching the exact format of your prefix, but just want to accept some combination of digits, dots and spaces, you can simplify the first part to:
[ .0-9]*
The complete sed expression then looks like:
sed 's/^[ .0-9]*| info | tst\.[0-9]*\.[0-9]*\.carmen | !:$//' file
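The loose character class can be tried on a line that carries a message after the prefix. This is a sketch rather than the exact command above: the trailing $ is swapped for " *" so that text after the prefix survives, which is an assumption about the input, and strip_loose is a hypothetical name. Without -E, sed uses basic regular expressions, where an unescaped | is a literal pipe.

```shell
# Remove a loosely matched timestamp prefix plus the constant part.
# [ .0-9]* accepts any run of spaces, dots and digits at the start of the line.
strip_loose() {
  sed 's/^[ .0-9]*| info | tst\.[0-9]*\.[0-9]*\.carmen | !: *//'
}

printf '%s\n' '20180630 180212.407107 | info | tst.33.12.carmen | !: hello world' | strip_loose
printf '%s\n' 'plain line' | strip_loose   # no prefix, so the line passes through unchanged
```

The trade-off is that malformed timestamps such as "1.2 3 " are accepted too, so this form suits cases where the prefix only needs to be stripped, not validated.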

Related

Filtering matched content

I want to filter the matched content and keep only the leading number and the first value after the ".".
I have output like this:
Output:
product: 13.6.0.35_0
More specifically, I need only the first two digits and the first digit after the dot. Please don't rely on the specific values in this example; I'm after a general method of filtering the content.
Expected:
13.6
I tried something like:
echo "product: 13.6.0.35_0" | grep -ow '\w*13\w*'

If you need to use grep with the current logic, you can use
echo "product: 13.6.0.35_0" | grep -ow '13\.[0-9]*' | head -1
where 13\.[0-9]* matches 13, a literal dot and zero or more digits (as a whole word, due to the -w option), and head -1 keeps only the first match.
You may also use sed or awk:
sed -En 's/.* ([0-9]+\.[0-9]+).*/\1/p' <<< "product: 13.6.0.35_0"
awk -F'[[:space:].]' '{print $2"."$3}' <<< "product: 13.6.0.35_0"
The sed command matches any text up to a space, then matches the space and captures the two subsequent dot-separated numbers into Group 1 (\1). The rest of the line is then matched, and the whole line is replaced with the Group 1 value, which the p flag prints (the default line output is suppressed with -n).
In the awk command, the field separator is set to whitespace or a dot with -F'[[:space:].]', and the {print $2"."$3} part prints the second and third field values joined with a dot.
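Since the question asks for a method that does not depend on the literal value 13, here is a small sketch that wraps the sed approach in a function and runs it on two different versions. The name major_minor and the second sample line are made up for illustration, and the pattern assumes the version follows a "label:" prefix.

```shell
# Print the first two dot-separated numbers that follow "label:".
major_minor() {
  sed -En 's/^[^:]*:[[:space:]]*([0-9]+\.[0-9]+).*/\1/p'
}

echo "product: 13.6.0.35_0" | major_minor
echo "product: 7.12.1" | major_minor
```

Lines that do not contain "label: number.number" produce no output at all, because -n suppresses the default printing and p fires only on a successful substitution.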

A pure shell solution using the builtin read, parameter expansion, and curly braces for command grouping:
echo "product: 13.6.0.35_0" | { read -r _ value; echo "${value%.*.*}" ; }
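Note that ${value%.*.*} drops the last two dot-separated components, so it yields 13.6 only when the version has exactly four components. A sketch that keeps the first two components instead, using only POSIX parameter expansion (major_minor_sh and its variable names are hypothetical, and the input is assumed to contain at least one dot):

```shell
# Keep the first two dot-separated components of the version.
major_minor_sh() {
  read -r label value      # label receives "product:", value the version string
  major=${value%%.*}       # text before the first dot
  rest=${value#*.}         # text after the first dot
  minor=${rest%%.*}        # text before the next dot (or all of rest)
  printf '%s.%s\n' "$major" "$minor"
}

echo "product: 13.6.0.35_0" | major_minor_sh
echo "product: 1.2.3.4.5" | major_minor_sh
```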

You can also use cut:
echo 'product: 13.6.0.35_0' | cut -d ' ' -f2 | cut -d '.' -f1-2
13.6

I reached the expected output. It's simple, but it works:
var=$(echo "product: 13.6.0.35_0" | grep -Eo '[[:digit:]]+' | sed -n 1,2p)
echo ${var} | sed 's/ /./g'

Regex replacing hyphens in attributes name only

I have a string that looks something like this.
<tag-name i-am-an-attribute="123" and-me-too="321">
All I want to do is replace the dashes with underscores in the attribute names, but the tag name should stay as it is.
I hope there are some regex gurus who can help me out.
[solution]
In case someone needs this:
I ended up with a Perl one-liner:
echo '<tag-name i-am-an-attribute="123" and-me-too="321">' | perl -pe 's/( \K[^*"]*)-/$1_/g;' | perl -pe 's/ / /g;'
results in
<tag-name i_am_an_attribute="123" and_me_too="321">

Using sed:
sed ':l;s/-\([^- ]*\)\( *=\)/_\1\2/g;tl' input
Gives:
<tag-name i_am_an_attribute="123" and_me_too="321">
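The loop is easier to follow when each part is its own -e expression, which also avoids relying on a semicolon directly after the label. This is a sketch under the assumption that attribute names contain no spaces; underscore_attrs is a hypothetical name. Each pass rewrites one hyphen that is followed by non-hyphen, non-space characters and then "=", and t jumps back to the label for as long as a substitution succeeds.

```shell
# Replace hyphens in attribute names, one hyphen per pass.
underscore_attrs() {
  sed -e ':l' \
      -e 's/-\([^- ]*\)\( *=\)/_\1\2/' \
      -e 'tl'
}

echo '<tag-name i-am-an-attribute="123" and-me-too="321">' | underscore_attrs
echo '<a-b c-d="x-y">' | underscore_attrs   # the hyphen inside the value is not followed by =
```

The hyphen in the tag name survives because nothing between it and the next space is followed by "=".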

With <tag-name i-am-an-attribute="123" and-me-too="321"> as a line in a file:
read -r < file
fullstring=$(echo "${REPLY}" | sed s'#-name #-name:#')
field1=$(echo "${fullstring}" | cut -d':' -f1)
field2=$(echo "${fullstring}" | cut -d':' -f2)
fixedfield=$(echo "${field2}" | sed s'#-#_#'g)
echo "${field1} ${fixedfield}"
I'm discovering that the most important thing in scripting is to give yourself anchors within the text that you can use to cut it into segments, which you can then operate on. Try to format your text as actual fields with separators; it makes life a lot easier.
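The same "find an anchor, split, fix one segment" idea can be sketched with parameter expansion, using the first space as the anchor instead of inserting a temporary colon. The function name underscore_after_tag is hypothetical, and the sketch assumes every line has at least one space after the tag name. Like the script above, it also changes hyphens inside attribute values.

```shell
# Split each line at the first space and replace hyphens only in the second part.
underscore_after_tag() {
  while IFS= read -r line; do
    tag=${line%% *}      # everything before the first space, e.g. "<tag-name"
    attrs=${line#* }     # everything after the first space: the attributes
    printf '%s %s\n' "$tag" "$(printf '%s\n' "$attrs" | sed 's/-/_/g')"
  done
}

echo '<tag-name i-am-an-attribute="123" and-me-too="321">' | underscore_after_tag
```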

awk sed perl, replacing specific pattern within a range of lines

I'm working in Verilog and need to edit a specific line within one particular block, but I'm unsure how to proceed.
file.v
...
block1 block1(
.port1(port1),
.port2(port2),
);
block2 block2(
.(port2)(port2),
.(port3)(port3)
);
....
I need to remove the "," after port2 in block1 without modifying block2. Multiple blocks elsewhere in the file also contain port2. The desired result:
block1 block1(
.port1(port1),
.port2(port2)
);
I've been trying awk and sed line ranges, but haven't managed to modify the file successfully. Any suggestions or solutions are much appreciated.

This will remove any comma that occurs just before the end of a block (whitespace then );):
perl -0777 -pe 's/,(?=\s*\);)//g'
Notes:
* -0777 causes perl to slurp all the input in as a single string. This is required because
    * we know there are newlines in between, so we don't want to read line by line;
    * there might be empty lines between the comma and the parentheses, so reading by "paragraph" won't work either.
* -p causes perl to print the input after modifications.
* The regex is the trickiest part:
    * it finds a comma and then looks ahead to match zero or more whitespace characters (spaces, tabs, newlines, etc.) followed by a close parenthesis and a semicolon;
    * the lookahead text is not part of the matched text (lookaheads are known as "zero-width assertions"), so the matched text is just the comma;
    * if there's a match, the comma is replaced with an empty string.
* The g flag says to do this globally in the string.

This might do the job for you
sed '/block1 block1/,/);/{s/\((port2)\),/\1/}' file.v

how about:
awk -v RS="" '/block1/{sub("port2),","port2)")}7' file

I guess you want to remove commas located after a closing paren ()) followed by a newline and a closing paren and a semicolon ();)?
In this case this might work for you:
sed -r ':a;N;s/\),\n\s*\);/)\n);/;P;D;ba'
:a   -- branch label "a"
N    -- read next line into pattern space (append)
s    -- substitute
\),\n\s*\);   -- search pattern
)\n);   -- replace pattern
P    -- print up to first newline of pattern space
D    -- delete up to first newline of pattern space
ba   -- branch to label "a"
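If the change must be limited to block1, one option is to hold each line back by one and strip the comma only when the closing ");" of block1 arrives. This is a sketch, not one of the answers above: fix_block1 is a hypothetical name, and it assumes the block opens with "block1 block1(" and closes with ");" at the start of a line, optionally indented.

```shell
# Drop the trailing comma on the last port line of block1 only.
fix_block1() {
  awk '
    /block1[ \t]+block1[ \t]*\(/ { in_block = 1 }   # entering block1
    in_block && /^[ \t]*\);/ {                       # closing line of block1
      sub(/,[ \t]*$/, "", prev)                      # fix the held previous line
      in_block = 0
    }
    NR > 1 { print prev }                            # emit the held line
    { prev = $0 }                                    # hold the current line
    END { if (NR > 0) print prev }                   # flush the last line
  ' "$@"
}

printf 'block1 block1(\n.port1(port1),\n.port2(port2),\n);\nblock2 block2(\n.port2(port2),\n);\n' \
  | fix_block1
```

In this sample, block2 keeps its trailing comma because in_block is only set by the block1 header.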

regex mixed case excluding specific case

I need a regex able to match:
a) All combinations of lower-/upper-cases of a certain word
b) Except a couple of certain case-combinations.
I must search, from bash, through thousands of source-code files for occurrences of misspelled variables.
Specifically, the word I'm searching for is FrontEnd, which our coding-style guide allows to be written in exactly 2 ways, depending on the context:
FrontEnd (F and E upper)
frontend (all lower)
So I need to "catch" any occurrences that do not follow our coding standards, such as:
frontEnd
FRONTEND
fRonTenD
I have been reading many regex tutorials looking for this specific case, and I cannot find a way to say "match this pattern BUT do not match if it is exactly this one or this other one".
I guess it would be similar to trying to match "any number between 000000 and 999999, except exactly the number 555555 or the number 123456"; I suppose the logic is similar (of course, I don't know how to do this either :) ).
Thanks
Additional comment:
I cannot use grep piped to grep -v because I could miss lines; for example if I do:
grep -i frontend | grep -v FrontEnd | grep -v frontend
would miss a line like this:
if( frontEnd.name == 'hello' || FrontEnd.value == 3 )
because the second occurrence would hide the whole line. Therefore I'm looking for a regex to use with egrep that is capable of doing the exact match I need.

You won't be able to do this easily with egrep because it doesn't support lookaheads. It's probably easiest to do this with perl.
perl -ne 'print if /(?!frontend|FrontEnd)(?i)frontend/;'
To use it, just pipe the text through stdin.
How this works:
perl -ne 'print if /(?!frontend|FrontEnd)(?i)frontend/;'
perl: the Perl command, love it or hate it.
-n: wrap the code in while (<>) { }, which takes each line from stdin and puts it in $_.
-e: evaluate Perl code from the command line.
print: print the content of $_, which is filled in by the -n flag.
if: begin the if statement; this expression uses Perl's reverse if semantics (expression1 if expression2;).
/: begin the regular expression match, which returns true on a match.
(?!...): negative lookahead, ensures that the good stuff won't be matched.
frontend|FrontEnd: the pattern that matches the correct versions.
(?i): this switch turns on case-insensitive matching for the rest of the regular expression (use (?-i) to turn it off) (Perl specific).
frontend: the pattern that matches both the correct and incorrect versions.

This really should be a comment, but is there any reason you cannot use sed? I'm thinking something like
sed 's/frontend/FrontEnd/ig' input.txt
That is, of course, assuming you want to correct the deviant versions...

grep to select strings that contains certain words

I have a list:
/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
How can I grep so that only the strings ending in "Field" followed by digits, or simply in NRepeatLeft, are returned (in my example, the last three strings)?
Expected output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft

Try this:
grep -E "(Field[0-9]*|NRepeatLeft$)" file.txt
-E: extended grep
(: opening of the choice
|: OR
$: end of line
): closing of the choice
If you don't have the -E switch (it stands for ERE, Extended Regular Expression):
grep "\(Field[0-9]*\|NRepeatLeft$\)" file.txt
OUTPUT
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
That will grep for lines containing Field followed by zero or more digits, or lines with NRepeatLeft at the end. Is that what you expect?

I'm not sure how to use grep for your purpose. You might prefer perl for this:
perl -lne 'if(/Field[\d]+/ or /NRepeatLeft/){print}' your_file

$ grep -E '(Field[0-9]*|NRepeatLeft)$' file.txt
Output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Explanation:
Field # Match the literal word
[0-9]* # Followed by any number of digits
| # Or
NRepeatLeft # Match the literal word
$ # Match the end of the string
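Because [0-9]* also accepts zero digits, a path ending in a bare "Field" would be selected as well. If at least one digit is required, as the question's wording suggests, + can be used instead. A sketch with two made-up decoy paths (select_fields is a hypothetical name):

```shell
# Keep only paths that end in Field<digits> or NRepeatLeft.
select_fields() {
  grep -E '(Field[0-9]+|NRepeatLeft)$' "$@"
}

printf '%s\n' \
  /device1/element1/CmdDiscovery \
  /device1/element1/MS-E2E003-COM14/Field2 \
  /device1/element1/MS-E2E003-COM14/NRepeatLeft \
  /device1/element1/Field \
  /device1/element1/Field2/extra \
  | select_fields
```

The last two paths are rejected: the first has no digit after Field, and the second does not end with the match because the $ applies to both alternatives.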

We Keep Coding
