Printing in patterns in perl - regex

I am having a great trouble to remove the errors in unicode encoded corpus.
In following form
രണവര്‍ഗ്ഗത്തിനകത്തു=ഭരണവര്‍ഗ്ഗത്തിന്:stemഅകത്തു|:suffix
ഭസ്മമാക്കിക്കളയുകയും=ഭസ്മം:stemആക്കിക്കളയുകയും|:suffix
ഭസ്മമാക്കി=ഭസ്മം:stemആക്കി|:suffix
ഭാഗത്തുനിന്നുണ്ടാകണം=ഭാഗത്ത്:stemനിന്ന്:stemഉണ്ടാകണം|:suffix,:
ഭാഗമായ=ഭാഗം:stemആയ|:suffix
ഭാര്യമാരില്‍നിന്നും=ഭാര്യമാരില്‍:stemനിന്നും|:suffix:suffix
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix
ഭാര്യയായി=ഭാര്യ:stemആയി|:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix
ഭിത്തികളൊക്കെ=ഭിത്തികള്‍:stemഒക്കെ|:suffix
ഭിന്നതയില്ലെന്നും=ഭിന്നത:stemഇല്ല:stemഎന്നും|:suffix,:suffix0
ഭൂപ്രഭുക്കളെന്ന്=ഭൂപ്രഭുക്കള്‍:stemഎന്ന്|:suffix0
ഭൂമിയില്‍നിന്ന്=ഭൂമിയില്‍:stemനിന്ന്|:suffix
ഭൂമിയിലുള്ള=ഭൂമിയില്‍:stemഉള്ള|:suffix
ഭൂമിയെപ്പോലൊരു=ഭൂമിയെ:stemപോലെ:stemഒരു|:suffix,:suffix0
ഭൂമുഖവീക്ഷണനായി=ഭൂമുഖവീക്ഷണന്‍:stemആയി|:suffix:suffix
ഭൂസഞ്ചാരംപോലെ=ഭൂസഞ്ചാരം:stemപോലെ|:suffix
ഭേദിക്കേണ്ടതായി=ഭേദിക്കേണ്ടതാ്:stemആയി|:suffix:suffix
ഭൗതികവാദികളാണ്=ഭൗതികവാദികള്‍:stemആണ്|:suffix0
മക്കളയച്ചു=മക്കള്‍:stemഅയച്ചു|:suffix
മക്കള്‍ക്കാണ്=മക്കള്‍ക്ക്:stemആണ്|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
മഞ്ചേശ്വരത്താണ്=മഞ്ചേശ്വരത്ത്:stemആണ്|:suffix:suffix
മഞ്ഞുവെള്ളത്തിലാഴ്ത്തി=മഞ്ഞുവെള്ളത്തില്‍:stemആഴ്ത്തി|:suffix:suffix
മടങ്ങാണിതിന്=മടങ്ങ്:stemആണ്:stemഇതിന്|:suffix,:suffix
മടിയനായിരുന്നു=മടിയന്‍:stemആയിരുന്നു|:suffix
Where I need to remove two stem together and two suffixes together. In the case of two stems I need keep first stem and convert the second into suffix. In the case of two suffixes like this :suffix:suffix, :suffix,:suffix0 I need to keep only one suffix
use strict;
use warnings qw/ all FATAL /;
use List::Util 'reduce';
while ( <> ) {
my ($word, $ss) = / \( ( /[^()]* ) \) /gx;
my #ss = split ' ', $ss;
my $str = reduce { sprintf 'S (%s) (%s)', $a, $b } #ss;
printf "%s (%s)\n", $word, $str;
}
This is the perl code I am trying to change but that code is not sufficient to handle the complexities. Is there any way to handle the kinds of errors.
**Expected output**
`ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix` to
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:suffixനിന്നു|:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix to
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix to
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Any one interested in helping me?

Description
^([^:]+:stem[^:]+)(?::stem(?=.*?(:suffix))|)([^:]+?\|:suffix[^:]*)(?::suffix[^:]*)*$
Replace with: \1\2\3
This regular expression will do the following:
Assumes that each line will have a suffix string this is then pattern matched and pulled into the capture group 2
If there is a second stem it is replaced with suffix
Removes all but the first suffix entries
Example
Live Demo
https://regex101.com/r/rJ9gW3/2
Sample text
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
Sample Matches
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:suffixനിന്നു|:suffix,
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[^:]+? any character except: ':' (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------

Related

Regex to remove all instance of letter outside of quotes

I have a string of text:
\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`
My goal is to match all \n's outside of backticks (`), quotes ("), or single quotes ('). Based off another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead used to a negative one, which now matches all newlines outside of quotes ("). However, it doesn't work when I attempted to ignore single and back ticks.
What am I doing wrong?
Working quotes:
https://regex101.com/r/ooqz5d/1/
If you're using PCRE, you can use a control verb to skip everything inside of a quote closure:
(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`]) any type of quote, put it in group 1
.*? any characters, non greedy
\1 the quote that captured in group 1
(*SKIP)(*F) skip the current match, which is a quote closure
|\\n match a \n
See the test cases
Also, if you need to ignore escaped quotes(\", \' etc), you may try
(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n
Check the test cases
Using JavaScript
For JavaScript, you can't use control verbs. But you can use group capture to replace outbound \n
Regex
((['"`])[\s\S]*?\2)|\\n
Substitution
$1
const regex = /((['"`])[\s\S]*?\2)|\\n/g;
const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n', text);
const result = text.replace(regex, '$1');
console.log('after\n', result);
Real line breaks
const regex = /((['"`])[\s\S]*?\2)|\n/g;
const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n----\n', text);
const result = text.replace(regex, '$1');
console.log('after\n----\n', result);
Use
text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
JavaScript code:
const text = String.raw`\nnew"test\naaaa\\\n"\nta\n\`this is a \nnewline that should be kept\`\n'this is a \nnew test'\n`
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))

Capture words on the right side of | (OR) in regex expression that are not in the left

I am trying to capture words on the right side of this regex expression that are not captured on the left.
In the code below, the left side captures "17 inch" in this string: "this 235/45R17 is a 17 inch tyre"
(?<=([-.0-9]+(\s)(inches|inch)))|???????
However, anything I put in the right side, such as a simple +w is interfering with the left side
How can I tell the RegEx to capture any word, unless it is a digit followed by inch - in which case capture both 17 and inch?
Description
((?:(?![0-9.-]+\s*inch(?:es)?).)+)|([0-9.-]+\s*inch(?:es)?)
** To see the image better, simply right click the image and select view in new window
Example
Live Demo
https://regex101.com/r/fY9jU5/2
Sample text
this 235/45R17 is a 17 inch tyre
Sample Matches
Capture group 1 will be the values that didn't match the 17 inch
Capture Group 2 will be the number of inches
MATCH 1
1. [0-20] `this 235/45R17 is a `
MATCH 2
2. [20-27] `17 inch`
MATCH 3
1. [27-32] ` tyre`
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture (1 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
[0-9.-]+ any character of: '0' to '9', '.',
'-' (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
----------------------------------------------------------------------
inch 'inch'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount
possible)):
----------------------------------------------------------------------
es 'es'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[0-9.-]+ any character of: '0' to '9', '.', '-'
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
inch 'inch'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
es 'es'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------

Return specific matched value in a regex query

I have a string that looks like this:
5 Secs ( 14.2725% ) 60 Secs ( 12.630% ) 300 Secs ( 15.5993% )
Using (\d{2}[.]\d{3}), I can match the values I want; but, I just need value 1 on one query, value 2 on another query and value 3 for the third. This is part of a monitoring system, so it has to be done with a single line of regex, I don't have access to other shell tools that would make this easy.
Description
^(?:[^(]*\([^)]*\)){2}[^(]*\(\s*\K[0-9]{2}[.][0-9]{3}
^^^
The number in the {2} can be changed on demand and the expression will then select the N+1 percent value in the string. If you select 0 then the expression will match the first percent, if you select 1 then the expression will match the second percent... and so on.
This expression assumes the language supports the \K regex command that forces the engine to drop everything that it has matched uptil the \K.
Example
Live Demo
https://regex101.com/r/oX0mJ4/1
Sample text
5 Secs ( 14.2725% ) 60 Secs ( 12.630% ) 300 Secs ( 15.5993% )
Sample Matches
Using the expression as written it returns the 3 third entry.
15.599
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
(?: group, but do not capture (2 times):
----------------------------------------------------------------------
[^(]* any character except: '(' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
[^)]* any character except: ')' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
){2} end of grouping
----------------------------------------------------------------------
[^(]* any character except: '(' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\K 'K'
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
[.] any character of: '.'
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------

How to match minimize length of strings around keyword by regex

I have below string (one long string).
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1111","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1112","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
Please note:
The string is in one line and you could notice that in each {} pair, the content is very similar.
I could only do it with regex and cannot do any split by any functions.
I want to use regular expression to filter out the one containing CustomerID with minimum complete length. For example, I want to filter out as below.
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"}
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
But I'm not sure how to do this. I tried many times with zero width assertion but still cannot figure it out. Could you please enlighten me? Thanks!
Forward
I don't recommend using Regex to parse JSON because of all the possible edge cases. But it appears you have some control over the data and can therefore limit the edge cases.
Description
Based on your source text, this regex will do the following:
Find all the JSON entries that have a CustomerID field in nested inside the Params array, and embeded inside the href string
Validates that both the CustomerID located in Params and href are identical
work with both compressed and expanded JSON
Avoids some obvious edge cases that the regex police complain about
Note: running this regex I used the Case insensitive flag.
\{(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"params":\{(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"CustomerID":"([^"]*)")(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"href":"[^"]*&CustomerID=\1)(?:"[^"]*"|[^{}"]*|\{[^{}]*})*}
To view the image better, right click the image and select open in new window.
Examples
Source Text
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1111","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"}
,{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1112","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"44444"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=44444&Locale=en_US","selector":"ExecutionEnd"},
Matches
[0][0] = {"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"}
[0][1] = 2222
[1][0] = {"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"44444"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=44444&Locale=en_US","selector":"ExecutionEnd"}
[1][1] = 44444
Explained
Capture Groups
group 0 gets the entire matching JSON block
group 1 gets the value associated with CustomerID
Expanded
NODE EXPLANATION
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"params": '"params":'
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"CustomerID":" '"CustomerID":"'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"href":" '"href":"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
&CustomerID= '&CustomerID='
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
} '}'

Regex for valid sharepoint folder name

I am trying to make regular expression for Valid sharepoint folder name, which have conditions:
Cannot begin or end with a dot,
Cannot contain consecutive dots and
Cannot contain any of the following characters: ~ " # % & * : < > ? / \ { | }.
Wrote Regex for 1st and 3rd point:
[^\.]([^~ " # % & * : < > ? / \ { | }]+) [^\.]$
and for third (?!.*\.\.).*)$ but they are not working properly and have to integrate them into one expression.
Please help.
What about just
^\w(?:\w+\.?)*\w+$
I made a small test here
EDIT
This also works
^\w(?:\w\.?)*\w+$
How about:
/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/
explanation:
The regular expression:
(?-imsx:^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[~"#%&*:<>?/\\{|}] any character of: '~', '"', '#', '%',
+ '&', '*', ':', '<', '>', '?', '/', '\\',
'{', '|', '}' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In action (perl script):
my $re = qr/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;
while(<DATA>) {
chomp;
say /$re/ ? "OK : $_" : "KO : $_";
}
__DATA__
.abc
abc.
a..b
abc
output:
KO : .abc
KO : abc.
KO : a..b
OK : abc