Regex to match the value optionally enclosed by double quotes

I have three columns delimited by whitespace, but the second field is optionally enclosed in double quotes.
I want to extract the first field, the second field (the value inside the double quotes) and the third field. Sometimes the second field is not enclosed in double quotes; in that case, just return the existing value.
Sample Input
1a "2a 2.1a 2.2a" 3a
4b "5.5b 5.6b 5.7b" 6b
7c 8c 9c
Final output
The matching information is:
1st row match
\1 1a
\2 2a 2.1a 2.2a
\3 3a
2nd row match
\1 4b
\2 5.5b 5.6b 5.7b
\3 6b
3rd row match
\1 7c
\2 8c
\3 9c
I tried the regex below. It works for the first two inputs, but the third line is not matched. Can someone help me solve this?
Regex I tried:
([a-z0-9]+)\s+"([a-z0-9\s.]+)"\s+([a-z0-9]+)
Link:
https://regex101.com/r/rN4uB4/1

You could simply make the quotation marks optional in your pattern. By following a token with ?, you tell the regular expression engine to match that token "zero or one" time.
([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)
If your language supports it, you could use the branch reset feature. By using this feature, both capturing groups in the alternatives are considered as one capturing group.
([a-z0-9]+)\s+(?|"([^"]+)"|([a-z0-9]+))\s+([a-z0-9]+)
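As a rough illustration, the optional-quote pattern can be tried against the sample input. This is a minimal sketch in Python (an assumption, since the question names no language), and the helper name split_fields is made up for the example. Python's built-in re module does not support the branch reset group (?|...), so only the first pattern is shown. Each line is matched on its own because \s would otherwise also match the newline between rows.

```python
import re

# Optional-quote pattern from the answer above.
PATTERN = re.compile(r'([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)')

def split_fields(line):
    """Return the three fields of one input line, or None if it does not match."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None

sample = ['1a "2a 2.1a 2.2a" 3a', '4b "5.5b 5.6b 5.7b" 6b', '7c 8c 9c']
for line in sample:
    print(split_fields(line))
```

The quotes are consumed outside the second capturing group, so the group holds the bare value whether or not the field was quoted.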

The problem with your regex is that it requires the quotes, whereas in your input they are optional.
You can parse this using:
([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)
The ? means the group (or character " in this case) is optional.
I do wonder, however, what you want to do. This looks a lot like bash argument parsing. Sometimes you can take advantage of libraries for this...
EDIT
@PetSerAl brings up a valid point: the two quotes " are independent of each other, so:
4b "5.5b 5.6b 5.7b 6b
4b 5.5b 5.6b 5.7b" 6b
will match as well. You can solve this by introducing additional capture groups:
([a-z0-9]+)\s+("([a-z0-9\s.]+)"|([a-z0-9\s.]+))\s+([a-z0-9]+)
In that case the old capture groups map onto the new ones as follows:
\1 -> \1
\2 -> \3 (with quotes) or \4 (without quotes)
\3 -> \5
One can also use \2 for the old \2, but the new \2 will include the quotes " as well if they are part of the string.
It will thus cost more postprocessing to handle them correctly.
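The group mapping above can be sketched in Python (an assumed language; parse_row is a hypothetical helper name). The second field is read from group 3 when the value was quoted and from group 4 otherwise. fullmatch is used on purpose: a plain search could still match a fragment in the middle of a line with unbalanced quotes.

```python
import re

# Alternation pattern from the edit above: group 3 holds a quoted value,
# group 4 an unquoted one; only one of them takes part in any match.
PATTERN = re.compile(
    r'([a-z0-9]+)\s+("([a-z0-9\s.]+)"|([a-z0-9\s.]+))\s+([a-z0-9]+)'
)

def parse_row(line):
    """Return (first, second, third) or None when the whole line does not match."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    first, _, quoted, bare, third = m.groups()
    second = quoted if quoted is not None else bare
    return first, second, third

print(parse_row('4b "5.5b 5.6b 5.7b" 6b'))
print(parse_row('7c 8c 9c'))
print(parse_row('4b "5.5b 5.6b 5.7b 6b'))  # unbalanced quote
```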

Related

Regex with Named Capture Group

I am trying to update a regex pattern to include a Named Capture Group. Currently, this regex pattern:
\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b
correctly returns 4 matches from this sample text:
AAA
43 42 040 012 036 00
43 42 090 037 124 00
53 07 010 005 124 00
06-14 301-830-081-49
BBB
When I revised the pattern to add a Named Capture Group it only returns 3 matches and misses the last one.
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
How can I keep the Named Capture Group but still return 4 matches ?
See example here.
Thanks.

Named capturing groups are still assigned numeric IDs. That is, (?<myPattern>[a-z]+)(\d+) contains two groups, the first one - with ID 1 - is "myPattern" group matching one or more lowercase letters, and the second group is \d+, with ID 2.
In your case, the problem arises due to the use of the \1 backreference later in the pattern. It refers to "myPattern" group value now, so the matching is incorrect.
To fix the issue, you need to replace \1 with a backreference to the corresponding group, \2:
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\2\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
See the regex demo.
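The renumbering can be seen in a small Python sketch. Python is an assumption here and spells named groups (?P<name>...) rather than (?<name>...). The date-like pattern and the sample text are made up for illustration and are not the pattern from the question.

```python
import re

# Hypothetical date-like pattern: the separator is captured by group 1
# and reused through the backreference \1.
plain = re.compile(r"\d{2}([-/.])\d{2}\1\d{4}")

# Wrapping everything in a named group makes that group number 1, so the
# separator group becomes number 2 and the backreference has to become \2.
named = re.compile(r"(?P<date>\d{2}([-/.])\d{2}\2\d{4})")

text = "due 06-14-2021, not 06-14/2021"
m = named.search(text)
print(dict(named.groupindex))  # name -> number mapping
print(m.group("date"), m.group(1), m.group(2))
```

Both the name and the number 1 refer to the same outer group, which is why an untouched \1 inside it no longer points at the separator.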

hours:minutes:seconds Timestamp regex

I'm trying to validate a user-entered video timestamp in the format hours:minutes:seconds using a regex. I'm assuming the hours component can be arbitrarily long, so the final format is basically hhhh:mm:ss, where there can be any number of h. This is what I have so far:
(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])
where
(
([0-9]+:)? # hh: optionally with an arbitrary number of h
([0-5][0-9]:) # mm: , with mm from 00 to 59
)? # hh:mm optionally
([0-5][0-9]) # ss , with ss from 00 to 59
which I believe is almost there, but doesn't handle cases like 1:31 or just 1. So to account for this if I add the first digit inside the mm and ss blocks as optional,
(([0-9]+:)?([0-5]?[0-9]:))?([0-5]?[0-9])
First, the last seconds block starts matching values like 111. Also, values like 1:1:12 are matched, which I don't want (it should be 1:01:12). So how can I modify this so that m:ss and s are valid whereas h:m:ss, m:s and sss are not?
I am new to regular expressions, so apologies in advance if I'm doing something stupid. Any help is appreciated. Thanks.

You can match either 1 or more digits followed by an optional :mm:ss part, or match mm:ss.
To also match 6:12 and not 1:1:12 make only the first digit optional in the second part of the pattern.
^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$
^ Start of string
(?: Non capture group
\d+ Match 1+ digits
(?::[0-5][0-9]:[0-5][0-9])? Match an optional :mm:ss part, both in range 00 - 59
| Or
[0-5]?[0-9]:[0-5][0-9] Match m:ss or mm:ss both in range 00-59 where the first m is optional
) Close non capture group
$ End of string
Regex demo
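A quick way to check the pattern is to run it over a few candidate values. The following is a minimal Python sketch (an assumed language; is_valid and the sample list are made up for the example).

```python
import re

# Pattern from the answer above, unchanged.
TIMESTAMP = re.compile(
    r"^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$"
)

def is_valid(ts):
    """True when the whole string is accepted by the timestamp pattern."""
    return TIMESTAMP.fullmatch(ts) is not None

for ts in ["1", "1:31", "6:12", "1:01:12", "123:59:59", "1:1:12", "1:60", "61:00"]:
    print(ts, is_valid(ts))
```

Note that a bare run of digits such as 111 is also accepted, because the \d+ branch on its own matches any number of digits.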

Doesn't adding the positional anchors (^ and $) solve your problem?
^(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])$
Check here: https://regex101.com/r/fRZf2R/1

How to replace “35yrs” with “35 yrs” using regular expressions?

This question is about Search & Replace.
I have a list that looks like this:
35yrs
74 yrs
40yrs
24yrs
36 yrs
I want to use regular expressions to make the list look like this:
35 yrs
74 yrs
40 yrs
24 yrs
36 yrs
I have this for the search:
\d+[y][r][s]
What should the replace string look like? Textmate's search engine requires numbers in the result, e.g., $1, that represent the regex in the search field.

Capture with:
 *(yrs)
and replace with:
 \1
Note the leading space in both the pattern and the replacement: the pattern is a space followed by *(yrs), and the replacement is a space followed by \1. Demo here.

There’s no reason for […] around individual letters — ever.
For your expression, just capture the digits:
(\d+)\s?yrs
And replace them:
\1 yrs
Strictly speaking matching the space (\s?) is unnecessary: if you do not include it, those entries that already include a space will not be matched, which is fine: theyʼre already correct, after all.

You could use the following regex:
(\d+)\s*(yrs)
And replace it with:
$1 $2
This matches the number in the first capturing group (indicated with ()) and the yrs in the second. The replacement is then both capturing groups with a space between them. Any whitespace between the two groups is matched but not captured, and it is optional.
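For readers outside TextMate, the same two-group replacement can be sketched in Python (an assumed language). Python's re.sub writes group references as \1 and \2 instead of $1 and $2.

```python
import re

text = "35yrs\n74 yrs\n40yrs\n24yrs\n36 yrs"

# Group 1 is the number, group 2 is the literal "yrs"; any whitespace
# between them is matched but dropped, then exactly one space is put back.
fixed = re.sub(r"(\d+)\s*(yrs)", r"\1 \2", text)
print(fixed)
```

Entries that already contain a single space are rewritten to themselves, so the substitution is safe to run more than once.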

If you're using Java, you can do it like so.
String str = "35yrs";
str = str.replaceAll("(\\d+)\\s*yrs", "$1 yrs");
This changes a number followed by zero or more spaces and yrs to the same number followed by one space and yrs.

Notepad ++ Regular Expressions

Notepad ++ Replacing Multiple Words
Okay, so here's what I need to know. Currently I am searching for multiple words at once; here's some sample data:
(\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b)
What I want to do is add a ":" to the end of every word that is found, like this:
41 dwadadad Rd:
93 awdawdadawd Terrace:
4/100 awdadawdwad St:
32 awdawdawdawd Ave:
59 awdawdawd Street: Ferny Grove
Is there a regular expression for only getting the end of the matched word?

I suggest using an alternation list with just two word boundaries - at the start and end of the pattern, and just one group:
\b(?:Rd|Terrace|St|Ave|Street)\b
And replace with $0: (where the $0 backreference refers to the whole match; if the pattern matched Rd, then Rd is inserted in the resulting string).
Note that we can use just two \b because they enclose the non-capturing alternation group (?:...) and thus apply to each alternative. This shortens the regex and speeds it up.
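Outside Notepad++, the same idea can be sketched in Python (an assumed language; mark_suffixes is a hypothetical helper name). Python spells the whole-match reference \g<0> rather than $0.

```python
import re

# One pair of word boundaries around a non-capturing alternation.
SUFFIXES = re.compile(r"\b(?:Rd|Terrace|St|Ave|Street)\b")

def mark_suffixes(text):
    # \g<0> inserts the entire match, followed by a literal colon.
    return SUFFIXES.sub(r"\g<0>:", text)

print(mark_suffixes("59 awdawdawd Street Ferny Grove"))
print(mark_suffixes("41 dwadadad Rd"))
```

The trailing \b keeps the shorter alternative St from matching the start of Street, so the order of the alternatives does not matter here.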

All you have to do is change your regex to:
((\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b))
And then replace with: \1:

Regular expression for changing spaces to tabs in Notepad++

I'm trying to use regular expression in Notepad++ to change spaces to tabs in something like this
01 fsdfsd
01 01 fsdfsd
01 01* fsdfsd
01 01 01 fsdfsd
01 01 01* fsdfsd
How can I keep spaces between numbers and change only the last space?
Thanks.

Search for:
[ ]([a-zA-Z])
(Note that there is a space in front of the character class.) And replace with:
\t$1
An alternative that might be better suited if you also have lines that are of a different format, or if fsdfsd may contain spaces, is this:
^((?:\d+\*?)(?:[ ]\d+\*?)*)[ ]
Now replace with
$1\t
This matches any space after the longest possible string of digits with optional asterisks separated by spaces.
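The second pattern can also be checked outside Notepad++. The following is a minimal Python sketch (an assumed language; last_space_to_tab is a made-up name). Python writes the group reference as \1 instead of $1, and re.MULTILINE is needed so that ^ matches at the start of every line.

```python
import re

# Leading run of numbers (each optionally followed by *), separated by
# single spaces, then the one space that should become a tab.
LEADING_NUMBERS = re.compile(r"^((?:\d+\*?)(?:[ ]\d+\*?)*)[ ]", re.MULTILINE)

def last_space_to_tab(text):
    # Keep the captured numbers and replace the following space with a tab.
    return LEADING_NUMBERS.sub(r"\1\t", text)

sample = "01 fsdfsd\n01 01* fsdfsd\n01 01 01 fsd fsd"
print(last_space_to_tab(sample))
```

Only the space directly after the numeric prefix is replaced, so spaces inside the trailing text are left alone.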

You could use a lookahead to match only a space followed by something other than a digit. If your version of Notepad++ doesn't support lookarounds, you must resort to a capture-and-release approach that looks for a letter:
search: " +([a-zA-Z])" (don't include the quotes - there to show the space)
replace: \t$1
