Regular expression to escape any number of # except exactly 2 #

I'm trying to override Parsedown's markup to only allow <h2> headings.
What regex would escape all heading types except <h2>?
#Heading -> \#Heading
##Heading -> ##Heading
###Heading -> \###Heading
####Heading -> \####Heading
#####Heading -> \#####Heading
######Heading -> \######Heading

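You can use this regex (with the m flag, so that ^ matches at the start of every line) and replace the zero-width match with a backslash: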
^(?!##\w)(?=#)
Regex Breakdown
^ #Start of the line (with the m flag; otherwise the start of the string)
(?! #Negative lookahead (the match fails if what follows matches its contents)
##\w #Assert that it's impossible to match two # followed by a word character
)
(?= #Positive lookahead
# #Check that there is at least one #
)
NOTE
\w denotes any character from [A-Za-z0-9_].
[...] denotes a character class: any single character (not a string) listed inside it is matched.
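A minimal sketch of the lookahead approach in Java, chosen to match the Java snippet elsewhere on this page. Parsedown itself is PHP, and its preg_* functions use PCRE, which should treat these constructs the same way. The class and method names are hypothetical, and the (?m) flag is an assumption so that ^ applies to every line of a multi-line document. Because the pattern only matches an empty position, replacing it with a backslash inserts the escape without touching the heading text.

```java
import java.util.regex.Pattern;

class LookaheadHeadingEscaper {
    // (?m): ^ matches at the start of every line, not just the start of the input.
    // (?!##\w): the line must not start with exactly two '#' followed by a word character.
    // (?=#): the line must start with at least one '#'.
    private static final Pattern NON_H2_HEADING = Pattern.compile("(?m)^(?!##\\w)(?=#)");

    // Inserts a backslash at each zero-width match, i.e. in front of every non-h2 heading.
    static String escapeNonH2(String markdown) {
        // "\\\\" in Java source is the replacement string \\, which emits one literal backslash.
        return NON_H2_HEADING.matcher(markdown).replaceAll("\\\\");
    }

    public static void main(String[] args) {
        String input = "#Heading\n##Heading\n###Heading\n######Heading";
        System.out.println(escapeNonH2(input));
    }
}
```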

Use lookaheads to match headings, but not double hashes:
^(?!##\w)(?=#+)

Description
^((?:#|#{3,})[^#])
Replace with: \$1
This regular expression will do the following:
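match exactly one hash, or
match 3 or more hashes,
followed by one character that is not a hash (captured together with the hashes in group 1 and written back by $1)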
Example
Live Demo
https://regex101.com/r/kE4oK6/1
Sample text
#Heading
##Heading
###Heading
####Heading
#####Heading
######Heading
Sample output after replacement
\#Heading
##Heading
\###Heading
\####Heading
\#####Heading
\######Heading
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
# '#'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
#{3,} '#' (at least 3 times (matching the
most amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
[^#] any character except: '#'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
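The same transformation with the capture-and-replace pattern, again as a hedged Java sketch with hypothetical names and an assumed (?m) flag. Here the replacement has to write group 1 back, because the match consumes the hashes and the first non-hash character.

```java
import java.util.regex.Pattern;

class CaptureHeadingEscaper {
    // (?m) anchors ^ at every line; group 1 captures either exactly one '#'
    // or three or more '#', plus the first non-'#' character after them.
    private static final Pattern NON_H2_HEADING =
            Pattern.compile("(?m)^((?:#|#{3,})[^#])");

    static String escapeNonH2(String markdown) {
        // "\\\\$1" in Java source is \\$1 at runtime: a literal backslash, then group 1.
        return NON_H2_HEADING.matcher(markdown).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "#Heading", "##Heading", "###Heading",
                "####Heading", "#####Heading", "######Heading");
        System.out.println(escapeNonH2(input));
    }
}
```

One difference from the lookahead version: [^#] must consume a character, so a line consisting only of hashes at the very end of the input (for example a bare ###) is left unescaped.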

Related

RegEx for removing everything before and after a delimiter

I am trying to remove everything before and after two | delimiters using regex.
An example being:
EM|CX-001|Test Campaign Name
I want to grab everything except CX-001. I can't use a substring, as the number of characters before and after the pipes may change.
I tried using the regex (?<=\|)(.*?)(?=\-), but while this selects CX-001, I need to select everything else but this.
How do I solve this problem?
You can try the following regular expression:
(^[^|]*\|)|(\|[^|]*$)
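public class Main {
    public static void main(String[] args) {
        String input = "EM|CX-001|Test Campaign Name";
        // Remove "EM|" at the start and "|Test Campaign Name" at the end
        System.out.println(input.replaceAll("(^[^|]*\\|)|(\\|[^|]*$)", "")); // prints "CX-001"
    }
}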
Explanation of the regular expression:
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of \2
If you have only 2 pipes in your string, you could either match from the start up to the first pipe, or match from the last pipe until the end of the string:
^.*?\||\|.*$
Explanation
^.*?\| Match non-greedily from the start of the string until the first pipe
| Or
\|.*$ Match from last pipe until end of string
Or you could also use a negated character class [^|]*, without the need for capturing groups:
^[^|]*\||\|[^|]*$
Note
In your pattern (?<=\|)(.*?)(?=\-), I think you meant the last positive lookahead to be (?=\|) instead of (?=\-) if you want to select the text between the 2 pipes.
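A Java sketch (hypothetical class and method names) that runs the two removal patterns from this answer, plus the asker's lookaround pattern with the lookahead changed to (?=\|) as the note suggests. The removal patterns assume exactly two pipes; the lookaround version selects CX-001 directly instead of removing everything around it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeFieldDemo {
    static final String INPUT = "EM|CX-001|Test Campaign Name";

    // Removes "start .. first pipe" and "last pipe .. end" (lazy quantifier version).
    static String stripLazy(String s) {
        return s.replaceAll("^.*?\\||\\|.*$", "");
    }

    // Same idea with a negated character class instead of a lazy quantifier.
    static String stripNegated(String s) {
        return s.replaceAll("^[^|]*\\||\\|[^|]*$", "");
    }

    // The asker's lookaround pattern with the lookahead changed to (?=\|):
    // it selects the text between the two pipes; returns null if there is none.
    static String selectBetweenPipes(String s) {
        Matcher m = Pattern.compile("(?<=\\|)(.*?)(?=\\|)").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(stripLazy(INPUT));
        System.out.println(stripNegated(INPUT));
        System.out.println(selectBetweenPipes(INPUT));
    }
}
```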
Find: ^[^|]*\|([^|]+).+$
Replace: $1
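The find/replace answer expressed with String.replaceAll in Java, where $1 in the replacement refers to group 1. The class and method names are hypothetical.

```java
class CaptureMiddleField {
    // ^[^|]*\|   skips the first field and its pipe
    // ([^|]+)    captures the second field (group 1)
    // .+$        consumes the rest of the string, starting at the second pipe
    static String keepSecondField(String s) {
        return s.replaceAll("^[^|]*\\|([^|]+).+$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepSecondField("EM|CX-001|Test Campaign Name"));
    }
}
```

Because .+ must match at least one character, the pattern relies on a third field being present: on EM|CX-001 (no second pipe), [^|]+ backtracks by one character so that .+ can match, and the result is CX-00 rather than CX-001.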

Sublime SQL REGEX highlighting

I'm trying to modify an existing SQL language definition in Sublime Text.
Currently (\#|##)(\w)* is being used to match both locally declared parameters (e.g. #MyParam) and system parameters (e.g. ##Identity or ##FETCH_STATUS).
I'm trying to split these into 2 groups.
System parameters I can get like (\#\#)(\w)* but I'm having problems with the local parameters.
(\#)(\w)* matches both #MyParam and ##Identity
(^\#)(\w)* only highlights the first match (i.e. #MyParam but not #MyParam2).
Can someone help me with the correct regex?
Try the below regex to capture local parameters and system parameters into two separate groups.
(?<=^| )(#\w*)(?= |$)|(?<=^| )(##\w*)(?= |$)
Update:
Sublime Text 2 supports \K (which discards the text matched so far from the reported match), so you can use:
(?m)(?:^| )\K(#\w*)(?= |$)|(?:^| )\K(##\w*)(?= |$)
Explanation:
(?m) set flags for this block (with ^ and $
matching start and end of line) (case-
sensitive) (with . not matching \n)
(matching whitespace and # normally)
(?: group, but do not capture:
^ the beginning of a "line"
| OR
' '
) end of grouping
\K '\K' (resets the starting point of the
reported match)
( group and capture to \1:
# '#'
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times)
) end of \1
(?= look ahead to see if there is:
' '
| OR
$ before an optional \n, and the end of a
"line"
) end of look-ahead
| OR
(?: group, but do not capture:
^ the beginning of a "line"
| OR
' '
) end of grouping
\K '\K' (resets the starting point of the
reported match)
( group and capture to \2:
## '##'
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times)
) end of \2
(?= look ahead to see if there is:
' '
| OR
$ before an optional \n, and the end of a
"line"
) end of look-ahead
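java.util.regex does not support \K, so this hedged Java sketch uses the lookbehind form of the pattern to show how the two capture groups separate local and system parameters. The class name and the local:/system: labels are hypothetical, and (?m) mirrors the \K version so that ^ and $ apply per line. A parameter is only matched when it is surrounded by spaces or line boundaries, so #MyParam followed by a comma is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlParamClassifier {
    // Group 1: local parameter (#Name). Group 2: system parameter (##Name).
    private static final Pattern PARAM = Pattern.compile(
            "(?m)(?<=^| )(#\\w*)(?= |$)|(?<=^| )(##\\w*)(?= |$)");

    // Returns entries like "local:#MyParam" or "system:##Identity" in input order.
    static List<String> classify(String sql) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAM.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add("local:" + m.group(1));
            } else {
                result.add("system:" + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("#MyParam ##Identity #MyParam2 ##FETCH_STATUS"));
    }
}
```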

How to express this in regular expression?

I need to capture all strings of upper- and lower-case letters other than "FOO" and "BAR". How can I do this?
I tried [^(^FOO$)(^BAR$)] but it doesn't work.
Update:
Actually I'm using this in a context where I concatenate it with another regex:
["(\w)+": _this_regex_ ]
For example, ["abc":FOO] shouldn't be matched.
All other values, say ["abc":BAZ], should match.
You want a negative look ahead:
\["(\w+)"\s*:\s*(?!FOO\b|BAR\b)(\w+)]
The two (\w+) are capturing groups; they store the key and the value in variables (I guess that's what you want to do?)
(?!...) is a negative lookahead: it will cause the regex to fail if what's inside matches.
\b is a word boundary: here it makes the lookahead match (and so the regex fail) only if FOO is followed by a non-word character or the end of the string (so ["foo": FOOLISH] will be accepted by the regex)
\s is shorthand for any whitespace character (spaces, tabs, newlines, etc.)
Demo: http://regex101.com/r/fM3uZ7
What you tried, [^...], is a negated character class: it matches any single character that is not listed inside the brackets. And keep in mind that inside a character class only ], \, ^ and - are special characters (so $ means a literal $, and so on).
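A hedged Java sketch of the negative-lookahead answer, with hypothetical names. The closing bracket is escaped as \] for clarity. Groups 1 and 2 hold the key and the value; the lookahead is case-sensitive, so only the exact values FOO and BAR are rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueFilter {
    // (?!FOO\b|BAR\b) rejects a value that is exactly FOO or BAR,
    // but still accepts values that merely start with them, such as FOOLISH.
    private static final Pattern ENTRY =
            Pattern.compile("\\[\"(\\w+)\"\\s*:\\s*(?!FOO\\b|BAR\\b)(\\w+)\\]");

    // Returns "key=value" for an accepted entry, or null if the pattern does not match.
    static String parse(String entry) {
        Matcher m = ENTRY.matcher(entry);
        return m.find() ? m.group(1) + "=" + m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[\"abc\":FOO]", "[\"abc\":BAZ]", "[\"foo\": FOOLISH]"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```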
Have a try with:
(?i)^(?!.*foo)(?!.*bar)
In action within a Perl script:
use strict;
use warnings;
use feature 'say';
my $re = qr~(?i)^(?!.*foo)(?!.*bar)~;
while(<DATA>) {
chomp;
say (/$re/ ? "OK : $_" : "KO : $_");
}
__DATA__
["abc":FOO]
["abc":BAZ]
output:
KO : ["abc":FOO]
OK : ["abc":BAZ]
Explanation:
The regular expression:
(?i)^(?!.*foo)(?!.*bar)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?i) set flags for this block (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally)
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
foo 'foo'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
bar 'bar'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
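For comparison, the same whole-line check as a Java sketch (hypothetical class name). Because (?i) makes it case-insensitive and .* scans the whole line, it rejects any line containing foo or bar anywhere, including inside the key, as the third sample shows.

```java
import java.util.regex.Pattern;

class LineFilter {
    // (?i): case-insensitive. ^ anchors at the start of the string, and each
    // lookahead fails if "foo" / "bar" appears anywhere later on the line.
    private static final Pattern NO_FOO_OR_BAR = Pattern.compile("(?i)^(?!.*foo)(?!.*bar)");

    static boolean accepts(String line) {
        // find() suffices: the pattern is zero-width and anchored with ^.
        return NO_FOO_OR_BAR.matcher(line).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[\"abc\":FOO]", "[\"abc\":BAZ]", "[\"food\":BAZ]"}) {
            System.out.println((accepts(s) ? "OK : " : "KO : ") + s);
        }
    }
}
```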

regex: symbols can't repeat next to each other

I'm trying to write a regex that picks all words made of a-z, with or without the ' symbol, where:
the word needs to be at least 2 characters
it can't start with the ' symbol
two ' symbols can't be next to each other
two-character words can't end with the ' symbol
I have been working on this regex for hours and I can't make it work:
/\b[a-z]([a-z(\')](?!\1))+\b/
It does not work for the "two ' symbols next to each other" rule, and I don't know why!
Any ideas?
([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})
You probably won't need \b, as the greedy quantifiers will consume each word as a whole.
This version can't be tested with RegexPal (it does not recognize lookbehind), but it has custom word borders:
(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
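A hedged Java sketch of the lookbehind version above (java.util.regex supports the bounded lookbehind (?<![a-z'])). The class name and the sample sentence are made up for illustration; scanning a space-separated sample shows which words are accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheWords {
    // A match may not be preceded or followed by a letter or an apostrophe,
    // so fragments of rejected words (e.g. "ab'" inside "ab''") are not returned.
    private static final Pattern WORD = Pattern.compile(
            "(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])");

    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("ab abc a' ab'' abc' a''b don't"));
        // prints [ab, abc, abc', don't]
    }
}
```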
This should work (disclaimer: untested)
/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/
Yours doesn't work because you are trying to nest a parenthesized expression inside a character class. That only adds ( and ) to the class as literal characters; it does not set the value of your subsequent \1 backreference.
(Edit) Added the constraint on aa'.
Assuming words are delimited by spaces:
(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)
In action in a Perl script:
use strict;
use warnings;
use feature 'say';
my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while(<DATA>) {
chomp;
say (/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù
output:
OK: ab
OK: abc
KO: a'
KO: ab''
OK: abc'
KO: a''b
KO: :!ù
Explanation:
The regular expression (explained here in its \b-delimited form, with \b in place of (?:^|\s) and (?:$|\s)):
(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[a-z]{2} any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
'' '\'\''
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
[a-z']{2,} any character of: 'a' to 'z', ''' (at
least 2 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
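A Java rendering of the space-delimited pattern from the Perl script (hypothetical class name), run on the same samples. Keep in mind that (?!.*'') looks ahead to the end of the line, not just to the end of the current word, so a '' later on the same line can also reject a longer word that is itself valid.

```java
import java.util.regex.Pattern;

class SpaceDelimitedWords {
    // Same pattern as the Perl script: words are delimited by whitespace
    // or by the start/end of the string.
    private static final Pattern WORD = Pattern.compile(
            "(?:^|\\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\\s)");

    static boolean accepts(String line) {
        return WORD.matcher(line).find();
    }

    public static void main(String[] args) {
        // \u00f9 is the non-ASCII letter from the Perl sample data
        String[] samples = {"ab", "abc", "a'", "ab''", "abc'", "a''b", ":!\u00f9"};
        for (String s : samples) {
            System.out.println((accepts(s) ? "OK: " : "KO: ") + s);
        }
    }
}
```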

Perl regular expression in Perl/Curl script

I'm not all that sure how this works/what it means...
my ($value) = ($out =~ /currentvalue[^>]*>([^<]+)/);
So basically, that's part of a cURL/Perl script: it goes to www.example.com and finds <span id="currentvalue"> GETS THIS VALUE </span> in the page's HTML.
What exactly does the [^>]*>([^<]+)/) part of the script do? Does it specify that it's looking for span id=".."?
Where can I learn more about this [^>]*>([^<]+) syntax?
/.../ (aka m/.../) is the match operator. It checks whether its operand (on the LHS of =~) matches the regular expression within the literal. Operators are documented in perlop (go down to "m/PATTERN/"). Regular expressions are documented in perlre.
As for the regular expression used here,
$ perl -MYAPE::Regex::Explain \
-e'print YAPE::Regex::Explain->new($ARGV[0])->explain' \
'currentvalue[^>]*>([^<]+)'
The regular expression:
(?-imsx:currentvalue[^>]*>([^<]+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
currentvalue 'currentvalue'
----------------------------------------------------------------------
[^>]* any character except: '>' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^<]+ any character except: '<' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
This is a plain vanilla Perl regexp. See this tutorial
/ # Start of regexp
currentvalue # Matches the string 'currentvalue'
[^>]* # Matches 0 or more characters that are not '>'
> # Matches >
( # Captures match enclosed in () to Perl built-in variable $1
[^<]+ # Matches 1 or more characters that are not '<'
) # End of group $1
/ # End of regexp
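A hedged Java equivalent (hypothetical class name and sample HTML) showing what the pattern captures. In Perl, my ($value) = ($out =~ /.../) evaluates the match in list context, so $value receives $1, or undef when there is no match; the sketch returns null in that case. Note that [^<]+ keeps the spaces around the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentValueExtractor {
    // currentvalue : literal text (here inside id="currentvalue")
    // [^>]*        : the rest of the opening tag, e.g. the closing quote
    // >            : end of the opening tag
    // ([^<]+)      : group 1, everything up to the next '<'
    private static final Pattern CURRENT_VALUE = Pattern.compile("currentvalue[^>]*>([^<]+)");

    // Roughly mirrors the Perl list assignment: group 1 on success, null otherwise.
    static String extract(String html) {
        Matcher m = CURRENT_VALUE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p><span id=\"currentvalue\"> GETS THIS VALUE </span></p>";
        System.out.println("[" + extract(html) + "]");
    }
}
```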