How to delete two groups of characters with regex?

I have this type of string:
First part: [[archive 726|The Archive]] is a great start
And I want to print:
First part: The Archive is a great start
Here is what I've come up with so far:
input.gsub!(/\[\[(.*?)\|/,"")
print input
> "First part: The Archive]] is a great start"
How can I also match the ]]?

You may use
input.gsub!(/\[\[[^\]\[]*\|(.*?)\]\]/, '\1')
See the Rubular demo and a Ruby demo.
Details
\[\[ - a [[ substring
[^\]\[]* - any 0 or more chars other than [ and ], as many as possible (if there are multiple | chars inside [[...]], replace * with *? to match as few as possible)
\| - a | char
(.*?) - Group 1 (the group value is referred to with \1 from the replacement pattern, mind the single quotes around \1): any 0 or more chars other than line break chars, as few as possible
\]\] - a ]] substring.
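As a quick cross-check of the pattern above, here is a minimal sketch in Python rather than Ruby. It assumes Python's re module treats this particular pattern the same way Ruby's engine does; the function name strip_wiki_links is made up for illustration.

```python
import re

# Same pattern as in the answer: [[, non-bracket chars, |, lazy label, ]]
WIKI_LINK = re.compile(r"\[\[[^\]\[]*\|(.*?)\]\]")


def strip_wiki_links(text: str) -> str:
    """Replace every [[target|label]] with just the label (Group 1)."""
    return WIKI_LINK.sub(r"\1", text)


print(strip_wiki_links("First part: [[archive 726|The Archive]] is a great start"))
# prints: First part: The Archive is a great start
```

Text without any [[...|...]] link is returned unchanged, since the substitution only fires on a full match.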

Related

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output: BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (or the many others I tried) seem to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, you just want the entire long name, such as N_BBP_..., to be replaced by only BBP), you might use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
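The capture-group replacement can be tried outside Sublime as well. This is a rough sketch in Python, assuming its re module is an acceptable stand-in for Sublime's engine for this pattern; note that Python writes the group reference as \1 rather than $1, and the function name keep_middle_token is hypothetical.

```python
import re

# [A-Z]+ is the generalised form suggested in the answer in place of a literal BBP
LONG_NAME = re.compile(r"\bN_([A-Z]+)_\S*")


def keep_middle_token(text: str) -> str:
    """Replace each N_<TOKEN>_... name with just <TOKEN> (Group 1)."""
    return LONG_NAME.sub(r"\1", text)


print(keep_middle_token("N_BBP_c_46137_n"))
# prints: BBP
```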
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
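The reason for writing ${1} instead of $1 shows up when the new value starts with a digit. A small sketch in Python, where the unambiguous form is \g<1> (the Python counterpart of ${1}); the function name and the sample value 42 are assumptions for illustration only.

```python
import re

NAME = re.compile(r"(N_)[^_]*(_c_\d+_n)")


def replace_middle(text: str, new_value: str) -> str:
    """Keep Group 1 and Group 2, swap whatever sits between them."""
    # A function replacement inserts new_value literally, so digits or
    # backslashes inside it can never be mistaken for group references.
    return NAME.sub(lambda m: m.group(1) + new_value + m.group(2), text)


# Equivalent template form: \g<1> stops "\142" from being read as an octal escape
print(NAME.sub(r"\g<1>42\2", "N_BBP_c_46137_n"))
# prints: N_42_c_46137_n
```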

Options matching in a command

I'm currently creating a Discord bot and I'm trying to match some command options, but I have a problem getting the value between the square brackets (if there is one).
I've already tried adding a ? to make that part optional, but it's not working. I also searched for how to match between two characters, but found nothing that helped.
Here is the pattern I've got so far : https://regexr.com/4icgi
and here it is in text : /[+|-](.+)(\[(.+)\])?/g
What I expect it to do is from an option like that : +user[someRandomPeople]
to extract the parameter user and the value someRandomPeople and if there is no square brackets, it will only extract the parameter.
You may use
^[+-](.*?)(?:\[(.*?)\])?$
Or, if there should be no square brackets inside the optional [...] substring at the end:
^[+-](.*?)(?:\[([^\][]*)\])?$
Or, if the matches are searched for on different lines:
^[+-](.*?)(?:\[([^\][\r\n]*)\])?$
See the regex demo.
Details
^ - start of string
[+-] - + or - (note that | inside square brackets matches a literal | char)
(.*?) - Group 1: any 0 or more chars other than line break chars as few as possible
(?:\[(.*?)\])? - an optional sequence of
\[ - a [ char
(.*?) - Group 2: any 0 or more chars other than line break chars as few as possible ([^\][]* matches 0 or more chars other than [ and ])
\] - a ] char
$ - end of string.
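To see how the two groups come out, here is a minimal sketch in Python using the second variant (no brackets allowed inside [...]). The bot itself is presumably not written in Python, so treat this only as a check of the pattern; parse_option is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

OPTION = re.compile(r"^[+-](.*?)(?:\[([^\][]*)\])?$")


def parse_option(option: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (parameter, value); value is None when there is no [...] part."""
    m = OPTION.match(option)
    if m is None:
        return None  # does not start with + or -
    return m.group(1), m.group(2)


print(parse_option("+user[someRandomPeople]"))
# prints: ('user', 'someRandomPeople')
print(parse_option("-verbose"))
# prints: ('verbose', None)
```

The lazy (.*?) in Group 1 is what lets the optional bracket group claim the trailing [...] when it is present.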

Match a string that doesn't contain another string in Bash?

I want to match a string that contains some text in the beginning and end but doesn't contain a different text in the middle. For example: starts with a word (\w+) and ends with another one but doesn't contain NOT in between:
some_YES_text // ok
other_COOL_string // also ok
some_NOT_string // don't want to match this
Normally, I could do that with negative lookahead:
\w+_(?!NOT)\w+_\w+
But I'm writing a script in Bash which doesn't support it. What is the easiest way to achieve the same effect?
Edit: I wasn't precise before - I still need to use regex, not just plain text matching.
You may match abc_NOT_def or abc_anywordhere_def and capture one of them, or part of them, and upon a match, check if that capture is not empty. Then, just implement the logic you need:
s="other_NOT_string"
rx='^([[:alnum:]_]+_(NOT)_[[:alnum:]_]+|[[:alnum:]_]+_[[:alnum:]_]+_[[:alnum:]_]+)$'
if [[ "$s" =~ $rx ]]; then
  # Group 2 is non-empty only when the NOT alternative matched
  if [ -z "${BASH_REMATCH[2]}" ]; then
    echo "MATCH: ${BASH_REMATCH[0]}"
  else
    echo "No match"
  fi
else
  echo "No match"
fi
Details
^ - start of string
( - Start of Group 1:
[[:alnum:]_]+_ - 1+ word chars (POSIX ERE \w equivalent) and a _
(NOT) - Group 2: NOT
_[[:alnum:]_]+ - _ and 1+ word chars
| - or
[[:alnum:]_]+_[[:alnum:]_]+_[[:alnum:]_]+ - 1+ word chars, _, 1+ word chars, _ and again 1+ word chars
) - end of Group 1.
$ - end of string
With the -z test on BASH_REMATCH[2], we check whether NOT was matched. If it was, there is no valid match; otherwise, there is one.
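The same "capture the unwanted alternative, then inspect the group" idea can be sketched in Python to make the logic easier to follow. Python does support lookaheads, so this is purely illustrative; \w with re.ASCII stands in for [[:alnum:]_], and the function name is made up.

```python
import re

# Alternative 1 captures NOT in Group 2; alternative 2 is the general shape.
RX = re.compile(r"^(\w+_(NOT)_\w+|\w+_\w+_\w+)$", re.ASCII)


def matches_without_not(s: str) -> bool:
    """True when s has the word_word_word shape and Group 2 (NOT) is unset."""
    m = RX.match(s)
    return m is not None and m.group(2) is None


for candidate in ("some_YES_text", "other_COOL_string", "some_NOT_string"):
    print(candidate, matches_without_not(candidate))
```

A string that matches neither alternative and a string that matches through the NOT alternative are both rejected, which mirrors the two "No match" branches in the Bash version.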

sed replacement dependent on surroundings

How do I do a sed replacement on a file containing things like
f[3][2,3][4] which should be replaced by f[3][2,3]["string"][4]. Of course, the numbers could differ, and I want the same replacement to happen. Here we need to give the full pattern, something like
f\[[0-9]*\]\[[0-9]*,[0-9]*\]
to pick out the right part, but then I need the replacement to give the same numbers that were matched earlier (as well as inserting ["string"]). What is the best way to do this?
You may use
sed 's/\(f\[[0-9]*]\[[0-9]*,[0-9]*]\)\[\([0-9]*\)]/\1["string"][\2]/g'
Or
sed -E 's/(f\[[0-9]*]\[[0-9]*,[0-9]*])\[([0-9]*)]/\1["string"][\2]/g'
See the online sed demo.
Basically, capture the first part, from f through the second index, and capture the number in the third index, too. Then, reference these values with the \1 and \2 placeholders in the RHS.
Details
(f\[[0-9]*]\[[0-9]*,[0-9]*]) (ERE; in BRE, the (...) must be escaped) - Group 1 matching f[, 0+ digits, ][, 0+ digits, a comma, and 0+ digits followed by ]
\[ - a [
([0-9]*) - Group 2: 0+ digits
] - a ] char.
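For readers who want to test the two backreferences without sed, here is a rough Python equivalent. It is a sketch only: the closing brackets are escaped explicitly here, and insert_string_index is a hypothetical name.

```python
import re

# Group 1: f[n][n,n]   Group 2: the digits inside the third index
INDEXED = re.compile(r"(f\[[0-9]*\]\[[0-9]*,[0-9]*\])\[([0-9]*)\]")


def insert_string_index(text: str) -> str:
    """Insert ["string"] between the second and third index of each f[..][..,..][..]."""
    return INDEXED.sub(r'\1["string"][\2]', text)


print(insert_string_index("f[3][2,3][4]"))
# prints: f[3][2,3]["string"][4]
```

Because the substitution replaces every non-overlapping match, it behaves like the g flag in the sed command.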

Looking for regex to match before and after a number

Given the string
170905-CBM-238.pdf
I'm trying to match 170905-CBM and .pdf so that I can replace/remove them and be left with 238.
I've searched and found pieces that work but can't put it all together.
This-> (.*-) will match the first section and
This-> (.[^/.]+$) will match the last section
But I can't figure out how to tie them together so that it matches everything before, including the second dash and everything after, including the period (or the extension) but does not match the numbers between.
help :) and thank you for your kind consideration.
There are several options to achieve what you need in Nintex.
If you use Extract operation, use (?<=^.*-)\d+(?=\.[^.]*$) as Pattern.
See the regex demo.
Details
(?<=^.*-) - a positive lookbehind requiring, immediately to the left of the current location, the start of string (^), then any 0+ chars other than LF as many as possible up to the last occurrence of - and the subsequent subpatterns
\d+ - 1 or more digits
(?=\.[^.]*$) - a positive lookahead requiring, immediately to the right of the current location, the presence of a . and 0+ chars other than . up to the end of the string.
If you use Replace text operation, use
Pattern: ^.*-([0-9]+)\.[^.]+$
Replacement text: $1
See another regex demo (the Context tab shows the result of the replacement).
Details
^ - a start of string anchor
.* - any 0+ chars other than LF up to the last occurrence of the subsequent subpatterns...
- - a hyphen
([0-9]+) - Group 1: one or more ASCII digits
\. - a literal .
[^.]+ - 1 or more chars other than .
$ - end of string.
The replacement $1 references the value stored in Group 1.
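The Replace text variant carries over to other engines almost unchanged. Below is a minimal Python sketch of it (Nintex itself is not involved, and number_from_filename is a made-up name). The lookbehind variant is not ported, because Python's re module rejects variable-length lookbehinds such as (?<=^.*-).

```python
import re

PATTERN = re.compile(r"^.*-([0-9]+)\.[^.]+$")


def number_from_filename(name: str) -> str:
    """Replace the whole name with Group 1; names that do not match are returned as-is."""
    return PATTERN.sub(r"\1", name)


print(number_from_filename("170905-CBM-238.pdf"))
# prints: 238
```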
I don't know Nintex regex, but here is a sed-style regex:
$ echo "170905-CBM-238.pdf" | sed -E 's/^.*-([0-9]*)\.[^.]*$/\1/'
238
Same works in Perl:
$ echo "170905-CBM-238.pdf" | perl -pe 's/^.*-([0-9]*)\.[^.]*$/$1/'
238