I'd like to match every dot or comma, but not inside an href attribute. So I have this regular expression:
^(?!.*?href=)(.*?)([.,])(\S+)
But it matches only the first occurrence. I think it's because of the non-greedy .*?, but I can't come up with anything else. Can you help me, please?
To match every dot or comma, assuming the attribute value is enclosed in single or double quotes, you can match what you don't want and capture in a group what you want to keep.
To skip dots and commas inside the href, match href= followed by either "[^"]*" or '[^']*'. Then use an alternation | to capture a dot or a comma in group 1 using ([.,]):
href=(?:"[^"]*"|'[^']*')|([.,])
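As a rough illustration of that "match what you don't want, capture what you want" idea, here is a minimal Python sketch. The sample string is hypothetical and not from the question; only matches where group 1 took part are kept.

```python
import re

# Match a quoted href value (and discard it) OR capture a dot/comma in group 1.
PATTERN = re.compile(r"""href=(?:"[^"]*"|'[^']*')|([.,])""")

# Hypothetical sample input, made up for this sketch.
sample = '<a href="example.com/a,b">Hi, there.</a>'

# Group 1 is None for the href alternative, so filter those matches out.
hits = [m.group(1) for m in PATTERN.finditer(sample) if m.group(1)]
print(hits)
```

The dot and comma inside the href value are consumed by the first alternative, so they never reach the capturing group.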
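If you want to match every occurrence, you will need to run the regex with the global (g) flag. Keep in mind that the ^ anchor still limits this pattern to one match per string (or one per line with the m flag).
E.g.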
/^(?!.*?href=)(.*?)([.,])(\S+)/g
I suggest you use a tool such as https://regex101.com/ to test and debug your regular expressions, it's super handy!
I am trying to use regex to match anything but "id":digits part
I have come up with this "(\b(id":)(\d+)\b)" to find the id:byDigits pattern, but I need to negate it and haven't been able to work out how.
[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]
Any help is appreciated.
The simplest option is to capture the rest of the string into groups and use them in the substitution, as below:
Demo: https://regex101.com/r/cRVA5C/2/
Pattern: ^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$
Breakdown:
([\s\S]*?): match any number of any characters before and after "id":. Capture them into groups \1 and \2.
\s*"id":\d+,?\s*: match "id": followed by digits, optionally preceded by whitespace and optionally followed by a comma and whitespace.
In the substitution, use \1\2 to get the desired output.
Note: Regex may not be the ideal tool for parsing JSON.
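To make both routes concrete, here is a small Python sketch. It applies the pattern above with re.MULTILINE (an assumption on my part, so that ^ and $ anchor at each line and every line is handled) and, for comparison, removes the key with the json module instead.

```python
import json
import re

text = '''[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]'''

# Regex route: the pattern from the answer, substituting \1\2.
# re.MULTILINE is assumed here so that each line is matched separately.
pattern = re.compile(r'^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$', re.MULTILINE)
via_regex = pattern.sub(r"\1\2", text)

# JSON route: parse, drop the "id" key, and keep everything else.
records = json.loads(text)
via_json = [{k: v for k, v in rec.items() if k != "id"} for rec in records]

print(via_regex)
print(json.dumps(via_json))
```

The JSON route does not depend on key order or spacing, which is why it is usually the safer choice when the input really is JSON.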
I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can be a word, so I would like to use .*, but if I use that it matches most of the string.
This is the pattern I'm using with the whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This is what I'm trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds, but without success.
Any idea?
thanks
The single space can be replaced by a character class matching either a space or a word character, repeated one or more times so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the two capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
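A quick way to check this pattern against the string from the question is a short Python sketch like the one below. The second input, with a word after PAI:, is a hypothetical variation added for illustration.

```python
import re

header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
    "To: <sip:4168755400#1.1.1.238>"
)

pattern = re.compile(
    r"(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#"
)

m = pattern.search(header)
print(m.group(0))  # the whole match
print(m.group(2))  # the phone number only

# Hypothetical variation: a word between "PAI:" and "<sip:".
with_word = pattern.search("PAI: foo <sip:4168755400#1.1.1.238>")
print(with_word.group(0))
```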
The regex could look like this:
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal text PAI: followed by anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the expression.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by anything (.*?) up to the first #.
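The difference the lazy quantifier makes can be seen in a small Python sketch. The greedy variant is included only for contrast and is not part of the answer above.

```python
import re

header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
    "To: <sip:4168755400#1.1.1.238>"
)

# Lazy: stop at the first "<sip:" after "PAI:" and at the first "#" after that.
lazy = re.search(r"PAI:.*?(<sip:.*?#)", header)

# Greedy (for contrast only): .* runs as far as it can, up to the last "<sip:".
greedy = re.search(r"PAI:.*(<sip:.*#)", header)

print(lazy.group(0))
print(lazy.group(1))
print(len(greedy.group(0)), "characters in the greedy match")
```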
I am trying to create a regular expression that parses up to a \. Can you tell me how to write it?
The code I created was
/[^\]*/
I find regex101.com really useful for testing regex.
I think you just need an extra backslash...
/[^\\]*/
If you want to get everything up to a backslash, just use:
/(.*?)\\/
(.*?) Capture group containing the text up to the backslash (not included).
.* Match any character 0 or more times.
? Makes the quantifier (*) lazy, so it matches only up to the first backslash if there is more than one.
Check this: http://regexr.com/3cnld
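For a quick local check of both suggestions, here is a minimal Python sketch. The input string is hypothetical, and the greedy variant is shown only to contrast with the lazy one.

```python
import re

path = r"abc\def\ghi"  # hypothetical input containing two backslashes

# [^\\]* : zero or more characters that are not a backslash.
before_first = re.match(r"[^\\]*", path).group(0)

# (.*?)\\ : lazily capture everything up to the first backslash.
lazy = re.search(r"(.*?)\\", path).group(1)

# (.*)\\ : greedy variant, which runs up to the last backslash instead.
greedy = re.search(r"(.*)\\", path).group(1)

print(before_first, lazy, greedy)
```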
This text
"dhdhd89(dd)"
Matched against this regex
.+?(?:\()
..returns "dhdhd89(".
Why is the start parenthesis included in the capture?
Two different tools, as well as the .NET Regex class, return the same result. So I gather there is something I don't understand about this.
The way I read my regex is:
Match any character, at least one occurrence. As few as possible.
The matched string should be followed by a start parenthesis, but not to be included in the capture.
I can find a workaround, but I still want to know what is going on.
Just turn the non-capturing group into a positive lookahead assertion.
.+?(?=\()
.+? is a non-greedy match of one or more characters, followed by an opening parenthesis. A lookahead assertion doesn't consume any characters; it only asserts whether a match is possible at that position. A non-capturing group, by contrast, does match and consume its characters, so they become part of the overall match.
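The question used .NET, but the same behaviour can be sketched in Python, whose re module treats non-capturing groups and lookaheads the same way for this case:

```python
import re

text = "dhdhd89(dd)"

# Non-capturing group: the "(" is consumed, so it is part of the overall match.
consuming = re.search(r".+?(?:\()", text).group(0)

# Positive lookahead: the "(" is only checked, not consumed.
lookahead = re.search(r".+?(?=\()", text).group(0)

print(repr(consuming), repr(lookahead))
```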
You can just use this negation-based regex to capture only the text before a literal (:
^([^(]+)
When you use:
.+?(?:\()
The regex engine does match the ( after the initial text; it just doesn't return it in a captured group of its own.
You haven't defined any capture groups, so I guess you are displaying the whole match (group 0). You can do:
(.+?)(?:\()
and the string you want is in group 1,
or use a lookahead as @AvinashRaj said.
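A short Python sketch (again standing in for the .NET code in the question) shows the whole match next to group 1, plus the negation-based alternative:

```python
import re

text = "dhdhd89(dd)"

m = re.search(r"(.+?)(?:\()", text)
whole = m.group(0)   # group 0: everything the pattern consumed
group1 = m.group(1)  # group 1: only the text before the parenthesis

# Negation-based alternative: one or more non-"(" characters from the start.
negated = re.match(r"^([^(]+)", text).group(1)

print(repr(whole), repr(group1), repr(negated))
```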
I've been trying for hours to get this negative lookahead to work. It should match my string only if it's NOT followed by '/CCC'.
http://refiddle.com/1xb
/(^[\w]+)(?!./CCC$)/mg
Test string:
BBB/CCC
AAA/DDD/CCC
Could someone point out why my pattern still matches the 'BBB' of the first line?
Firstly, you have to escape the / inside the regular expression.
You also have a dot that shouldn't be there and are missing a word boundary:
/(^\w+)\b(?!\/CCC$)/mg
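To see why both fixes matter, here is a minimal Python sketch comparing three variants on the test string. Python's re does not need the / escaped, and the middle variant (dot removed but no \b) is my own addition for illustration.

```python
import re

lines = "BBB/CCC\nAAA/DDD/CCC"

# Original pattern: the stray dot consumes the "/", so "/CCC" never matches
# inside the lookahead and the negative assertion always succeeds.
original = re.findall(r"(^[\w]+)(?!./CCC$)", lines, re.MULTILINE)

# Dot removed, no word boundary: \w+ can backtrack to a shorter prefix
# that is not directly followed by "/CCC".
no_boundary = re.findall(r"(^\w+)(?!/CCC$)", lines, re.MULTILINE)

# Suggested fix: \b forces the whole word to be matched before the lookahead.
fixed = re.findall(r"(^\w+)\b(?!/CCC$)", lines, re.MULTILINE)

print(original, no_boundary, fixed)
```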