Visual Studio Code - regex - edit multiple lines?

I want to use a Visual Studio Code regex to insert commas and single quotes at different points in a line. Can you help, please?
I want to transform
(1 ant 18 0 test abacus123 789 pass),
(2 dog 26 67 exp b+45 456 fail),
(3 tiger 5 2 'reg e-t' 126 fail),
To
(1, 'ant', 18, 0, 'test abacus123', 789, 'pass'),
(2, 'dog', 26, 67, 'exp b+45', 456, 'fail'),
(3, 'tiger', 5, 2, 'reg e-t', 126, 'fail'),
I have many lines of data to transform like this, and I'm not sure how to do it.
Any help is much appreciated.

You need to provide more examples or describe the rules better.
According to the examples you have provided so far, you may try the following regex:
Regex
\((\d+) '?(.+?)'? (\d+) (\d+) '?(.+?)'? (\d+) '?(\w+)'?\)
Substitution
($1, '$2', $3, $4, '$5', $6, '$7')
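
To try the same substitution outside the editor, here is a minimal Python sketch. It assumes Python's `re` module, where backreferences are written `\1` rather than the `$1` used in VS Code; the names `ROW`, `REPLACEMENT` and `quote_row` are made up for illustration.

```python
import re

# Pattern from the answer above. The optional quotes around the text fields
# are consumed by '? so values that are already quoted are not quoted twice.
ROW = re.compile(r"\((\d+) '?(.+?)'? (\d+) (\d+) '?(.+?)'? (\d+) '?(\w+)'?\)")

# Same substitution, with Python-style backreferences.
REPLACEMENT = r"(\1, '\2', \3, \4, '\5', \6, '\7')"


def quote_row(line: str) -> str:
    """Insert commas and single quotes into one row (hypothetical helper)."""
    return ROW.sub(REPLACEMENT, line)


for row in [
    "(1 ant 18 0 test abacus123 789 pass),",
    "(2 dog 26 67 exp b+45 456 fail),",
    "(3 tiger 5 2 'reg e-t' 126 fail),",
]:
    print(quote_row(row))
```

Like the regex itself, this only covers rows shaped like the three examples in the question.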

Related

Regex for month with optional leading 0

I am trying to match various months, that may be in the form of:
01
1
12
13
09
All of the above inputs are valid except for 13.
The current regex I have for this is:
0?(?#optional leading 0, for example 04)
\d(?#followed by any number, 01, 2, 09, etc.)
|(?#or 10,11,12)
1[012]
What's wrong with the above regex? Here's an example link: https://regex101.com/r/cujCmD/1
I would phrase the regex as:
^(?:0?[1-9]|1[012])$
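The parentheses and anchors are needed to ensure that whichever alternative is chosen has to match the entire input. Without them, the unanchored 0?\d can match a single digit inside 13. Using [1-9] instead of \d also rejects 00.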
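
A quick way to check the anchored pattern against the inputs from the question is a small Python sketch. It assumes Python's `re` module; `MONTH` and `is_valid_month` are hypothetical names.

```python
import re

# Anchored pattern from the answer: 1-9 with an optional leading 0, or 10-12.
MONTH = re.compile(r"^(?:0?[1-9]|1[012])$")


def is_valid_month(text: str) -> bool:
    """Return True when text is a month number from 1 to 12 (hypothetical helper)."""
    return MONTH.match(text) is not None


for candidate in ["01", "1", "12", "13", "09"]:
    print(candidate, is_valid_month(candidate))
```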

R regex: How to extract string with one or two digit number within title?

I have a bunch of filenames that are numbered that I would like to be able to extract based on a regex statement.
For example, say I have the following filenames:
file.names <- paste0("run", 0:99, ".dat.gz")
If I wanted to extract files 5 through 8, I would need a regex that returns the following:
grep("correct_regex", file.names, value=TRUE)
"run5.dat.gz" "run6.dat.gz" "run7.dat.gz" "run8.dat.gz"
Or if I wanted to return files 9 through 21, it would return the following:
grep("correct_regex", file.names, value=TRUE)
"run9.dat.gz" "run10.dat.gz" "run11.dat.gz" "run12.dat.gz" "run13.dat.gz" "run14.dat.gz" "run15.dat.gz" "run16.dat.gz" "run17.dat.gz" "run18.dat.gz" "run19.dat.gz" "run20.dat.gz" "run21.dat.gz"
The tricky part is developing a regex that extracts the number as opposed to the digits (e.g. [0-9]). Any tips to help with this?
I also think that Sam's answer is the correct one, but in case you also need to quickly extract non-sequential items, here is how you can build the regex you need (these subpatterns go between ^run and [.]dat[.]gz$):
Use [5-8] to match all digits from 5 to 8 (as in the current example)
For non-sequential one-digit values, add the ranges separately ([1-37-9] will match 1, 2, 3, 7, 8, 9)
When you need to combine numbers of different length, use alternations with (...|...):
(1[2-4]|2[89]) - will match 12, 13, 14, 28 and 29
(2[3-5]|[0-2]) - will match 23, 24, 25, 0, 1, and 2
In your case, you can use
> file.names <- paste0("run", 0:99, ".dat.gz")
> grep("^run[5-8][.]dat[.]gz$", file.names, value=TRUE)
[1] "run5.dat.gz" "run6.dat.gz" "run7.dat.gz" "run8.dat.gz"
Note that ^ matches the start of string and $ matches the end of string (so, this regex ensures a full string match).
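
The second range in the question (files 9 through 21) mixes one-digit and two-digit numbers, so it needs the alternation approach described above. A minimal sketch in Python rather than R, assuming Python's `re` module; `file_names`, `RANGE_9_TO_21` and `matches` are illustrative names.

```python
import re

# Same file list as in the question: run0.dat.gz ... run99.dat.gz
file_names = [f"run{i}.dat.gz" for i in range(100)]

# 9, or 10-19, or 20-21, placed between the fixed prefix and suffix.
RANGE_9_TO_21 = re.compile(r"^run(9|1[0-9]|2[01])[.]dat[.]gz$")

matches = [name for name in file_names if RANGE_9_TO_21.match(name)]
print(matches)
```

The same pattern string should work unchanged as the first argument to grep() in R, since it uses only basic character classes, alternation and anchors.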
You could accomplish this with a simple function and avoid regexes:
get_numbered_filenames <- function(num_vec){
  target <- paste0("run", num_vec, ".dat.gz")
  file.names[file.names %in% target]
}
get_numbered_filenames(9:21)
[1] "run9.dat.gz" "run10.dat.gz" "run11.dat.gz" "run12.dat.gz" "run13.dat.gz" "run14.dat.gz"
[7] "run15.dat.gz" "run16.dat.gz" "run17.dat.gz" "run18.dat.gz" "run19.dat.gz" "run20.dat.gz"
[13] "run21.dat.gz"

Regex leading zeros from string in Hive

I have a 19-character string in Hive that I need to split up, removing any leading zeros from each section.
Example:
7212092180052740029
I need it to be split like this
721 20 9218 00527 40029
So there are no leading zeros in 1st, 2nd, or 3rd section, and 00 would be removed from the 4th section; section 5 will be disregarded. My desired result would be
721209218527
My first-pass solution is
trim(concat_ws('', regexp_replace(substr(some_string, 1, 3), '^0*', '')
, regexp_replace(substr(some_string, 4, 2), '^0*', '')
, regexp_replace(substr(some_string, 6, 4), '^0*', '')
, regexp_replace(substr(some_string, 10, 5), '^0*', '')))
but this seems like extreme overkill. Any ideas how to do this with one line of regex?
Also note that none of the 5 sections, when split, will ever be all zeros (i.e. section one will never be 000). If one were, my 'solution' wouldn't work, because all of its zeros would be leading ones and '^0*' would strip the whole section.
^0*|(?<=^.{3})0*|(?<=^.{5})0*|(?<=^.{9})0*|(?<=^.{14}).*$
You can use this regex and replace matches with an empty string. See the demo:
https://regex101.com/r/rO0yD8/15
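
To see what the single regex does to the example value, here is a minimal sketch in Python rather than Hive. It assumes Python's `re` module, which accepts these lookbehinds because each one has a fixed width; `STRIP` and `compact` are hypothetical names.

```python
import re

# Each alternative removes leading zeros at the start of one section
# (offsets 0, 3, 5 and 9); the last one drops everything from offset 14 on.
STRIP = re.compile(r"^0*|(?<=^.{3})0*|(?<=^.{5})0*|(?<=^.{9})0*|(?<=^.{14}).*$")


def compact(code: str) -> str:
    """Strip per-section leading zeros and discard section 5 (hypothetical helper)."""
    return STRIP.sub("", code)


print(compact("7212092180052740029"))
```

In Hive the equivalent call would presumably be regexp_replace(some_string, '<pattern>', ''), with whatever backslash escaping the query string requires. As the question notes, this relies on no section being all zeros.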

What's the best Regular Expression to use for returning some phone numbers, but not all?

I'm new to regular expressions and working on something that will return all UK phone numbers with an area code beginning 01, 02, 03 or 07 only. It must not match 08 or 09. It also has to take into account the different grouping styles. But here's the kicker... it's got to be 80 characters or fewer.
This was my best shot:
(01|02|03|07|44\D*1|44\D*2|44\D*3|44\D*7|)(\d\D*){9}
The problem is that it's returning any 9 digit or less number and I can't figure out why.
Any help would be grand!
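The trailing | inside your first group adds an empty alternative, so the prefix becomes optional and any run of nine digits matches. Without it, (01|02|03|07|44\D*1|44\D*2|44\D*3|44\D*7) matches either 0 or 44\D* followed by 1, 2, 3 or 7, which simplifies to: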
(?:44\D*|0)[1237]
Putting that with the rest gives:
(?:44\D*|0)[1237](\D*\d\D*){9}
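
A small Python sketch for trying the simplified pattern on a few made-up numbers. It assumes Python's `re` module and uses `fullmatch` so the whole string has to fit, which is a choice made for this demo only; `UK_NUMBER` and `is_wanted_number` are hypothetical names.

```python
import re

# Prefix 0 or 44 (plus optional separators), then 1, 2, 3 or 7,
# then nine more digits with any non-digit separators around them.
UK_NUMBER = re.compile(r"(?:44\D*|0)[1237](\D*\d\D*){9}")


def is_wanted_number(text: str) -> bool:
    """True when the whole string is an 01/02/03/07 style number (hypothetical helper)."""
    return UK_NUMBER.fullmatch(text) is not None


for sample in ["01234 567890", "44 20 7946 0958", "07123456789", "0800 123 4567"]:
    print(sample, is_wanted_number(sample))
```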

Regex to Add a Character in a Space Pattern

I have something like-
[[59],
[73 41],
[52 40 09],
[26 53 06 34],
[10 51 87 86 81],
[61 95 66 57 25 68]]
I need to add a comma before every space to be like -
[[59],
[73, 41],
[52, 40, 09],
[26, 53, 06, 34],
[10, 51, 87, 86, 81],
[61, 95, 66, 57, 25, 68]]
What would be regex string for that?
Judging from your data, you may just replace a space ' ' by a comma followed by a space ', '. You do not need a regex for that.
This depends on which regex flavor you are using, but in general the pattern to match would be
(\d+)\s
and the replacement would be \1 followed by a comma and a space:
\1, 
In Notepad++, open up the find control window with Ctrl+H.
In Find What put a single space character
In Replace With put a comma followed by a space character
This gives the expected output, but isn't very interesting as far as Regexes go.
s/\( \)/,\1/g
And, as an afterthought:
s/ /, /g
Why bother with substitution replacement? :)
Replace (\d)\s(\d) with
\1, \2
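
The last suggestion can be tried on the sample data with a minimal Python sketch. It assumes Python's `re` module; `add_commas` and `sample` are illustrative names.

```python
import re


def add_commas(text: str) -> str:
    """Put a comma before each whitespace that sits between two digits (hypothetical helper)."""
    return re.sub(r"(\d)\s(\d)", r"\1, \2", text)


sample = "[[59],\n[73 41],\n[52 40 09],\n[26 53 06 34]]"
print(add_commas(sample))
```

One caveat: because each match consumes the digit on both sides, this pattern would skip every other gap if the numbers had a single digit (for example 1 2 3). A lookaround form such as (?<=\d)\s(?=\d) replaced with a comma and a space avoids that, where the regex flavor supports it.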