Regex get every string from start until new line? - regex

I have a string like this:
Name: Yoza Jr
Address: Street 123, Canada
Email: yoza#gmail.com
I need to get the data with a regex, up to the end of each line. For example,
for the line starting with Name:, get Yoza Jr (up to the newline) as the name data,
so that I end up with three values: Name, Address, and Email.
How can I use a regex to get everything from the start of a line until the newline?
By the way, I will use it in Go: https://regex-golang.appspot.com/assets/html/index.html

The pattern ^.*$ should work. This assumes that . is not running in dot-all mode, so that .* will not extend past the newline at the end of each line. It also assumes multiline mode, in which ^ and $ match at the start and end of every line rather than only of the whole string; in Go, enable it with the (?m) flag: (?m)^.*$.
If you want to capture the field value, use:
^[^:]+:\s*(.+)$
The value you want will be in the first capture group.

I would suggest you use the pattern ^(.+):\s*(.*)$
Demo: https://regex101.com/r/Q9D4RM/1
Not only will it give 3 distinct matches for the string you gave, but the field name (before the ":") will also be captured as group 1 of each match, and the value (after the ":") as group 2. So, if you want the key-value pairs, just read groups 1 and 2 of each match.
Please let me know if it's unclear so I can elaborate.

Related

regex extract exact one string

I have two strings:
"John Johnson Phone Number"
"John Johnson Alternate Phone Number"
I need to extract only the first one; the first and last name might change.
Since the first and last name might change, I was matching the first string with this regex:
^\w+ \w+( \w+)? Phone Number$
It seems pretty easy, but I've had a brain freeze and haven't been able to solve it for a few hours.
The issue now is that the same regex also picks up the second string, which I do not want.
Could someone give me a hint on how to match only the first string and skip strings that contain the word Alternate? Thanks
If I understand correctly, you want to match the whole string and extract the words before "Phone Number". You can do this with capture groups. You can also name your capture groups so that you do not have to worry about which index the group is at (if you add or remove groups later).
The syntax is (?P<name>...).
So for your situation, I put the first two \w+ into the capture group name. FindStringSubmatch returns the full match at index 0; the following indices hold the subgroups. You can use re.SubexpIndex("name") to find the index of the named subgroup name.
https://goplay.tools/snippet/dcwWg3FBWUd
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`^(?P<name>\w+ \w+)( \w+)? Phone Number$`)
	str := "John Johnson Alternate Phone Number" // matches too: ( \w+)? accepts "Alternate"
	index := re.SubexpIndex("name")             // index of the "name" group (Go 1.15+)
	matches := re.FindStringSubmatch(str)
	if len(matches) > 0 {
		fmt.Printf("Name: %s\n", matches[index])
	} else {
		fmt.Println("No Match")
	}
}
EDIT: I had assumed this was a Go question.
The same approach, using capture groups to extract the relevant submatches, still works in other regex flavors, although the named-group syntax may differ (for example (?<name>...)).
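Go's regexp package (RE2) has no lookahead, so a pattern such as (?!.*Alternate) is not available there. A sketch, assuming the optional third word is the only thing that distinguishes the unwanted strings: capture that word in its own named group and reject the match in code when it is "Alternate". The helper primaryName and the group name extra are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// The optional third word gets its own named group so it can be inspected.
var phoneRe = regexp.MustCompile(`^(?P<name>\w+ \w+)(?: (?P<extra>\w+))? Phone Number$`)

// primaryName returns the captured name and true, unless the string does
// not match at all or its extra word is "Alternate".
func primaryName(s string) (string, bool) {
	m := phoneRe.FindStringSubmatch(s)
	if m == nil || m[phoneRe.SubexpIndex("extra")] == "Alternate" {
		return "", false
	}
	return m[phoneRe.SubexpIndex("name")], true
}

func main() {
	for _, s := range []string{"John Johnson Phone Number", "John Johnson Alternate Phone Number"} {
		if name, ok := primaryName(s); ok {
			fmt.Printf("%q -> Name: %s\n", s, name)
		} else {
			fmt.Printf("%q -> No Match\n", s)
		}
	}
}
```

Dropping the optional group entirely (^(?P<name>\w+ \w+) Phone Number$) also rejects the second string, at the cost of no longer accepting any three-word variant.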

Regex for SQL Query

Hello everyone, I have the following problem:
I have a long list of SQL queries that I would like to adapt to one of my changes. Essentially, it is a renaming problem, and I'm afraid it is more complicated to solve than I expected.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
In the INSERT column list there was another column, which I removed simply by searching for "example, " in Notepad++ and replacing it with nothing. However, I can't remove the corresponding entry in VALUES this way, because its text varies from query to query. So far I have only worked with the text file in which I adjusted the list of queries.
So, as you can see, there is one more entry in VALUES than there are columns in the column list (there was another column here, but my change removed it).
It is the entry after the email address. I would like to remove it, including its comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th comma-separated value should be removed. However, even after some research I do not know how to do this.
I thought it could look like this (tried on https://regex101.com/):
VALUES\s?\((,) something here
Is this even the right approach, or is there another method? I could only think of regex for this problem, because of course the values differ from query to query.
And how can I then apply the regex to adapt the queries (they are stored locally on my computer and not yet included in the code)?
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)
As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
( - Open the 1st capture group.
\bVALUES +\( - Match a word boundary, the literal 'VALUES', at least one space, and a literal opening parenthesis.
(?: - Open a non-capturing group.
[^,]+, - Match one or more characters other than a comma, followed by a comma.
){13} - Close the non-capturing group and repeat it 13 times.
) - Close the 1st capture group.
[^,]+, - Match one or more characters other than a comma, followed by a comma (the 14th value). This assumes none of the first 14 values contains a comma.
You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
Details
\bVALUES - whole word VALUES
\s* - zero or more whitespace characters
\( - a literal (
(\s*(?:N'[^']*'|\w+)) - Group 1: zero or more whitespace characters, then either N' followed by zero or more characters other than ' and a closing ', or one or more word characters
(?:,(?1)){12} - twelve repetitions of a comma followed by the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.
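Go's RE2 engine supports neither the (?1) subroutine call nor \K, so the pattern above cannot be used there as-is. A sketch of an equivalent in Go, assuming string values are N'...' literals without escaped '' quotes: the value pattern is spelled out by string concatenation, the 13 kept values go into group 1, and ${1} writes them back. As with the original, a comma inside a quoted value does not shift the count. The query in sample is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// value mirrors Group 1 of the answer: optional whitespace, then an N'...'
// literal (assumed to contain no escaped '' quotes) or a run of word
// characters such as 2005 or NULL.
const value = `\s*(?:N'[^']*'|\w+)`

// Without (?1) and \K, group 1 captures "VALUES (" plus the 13 values to
// keep; the comma and 14th value after it are matched but not captured.
var dropFourteenth = regexp.MustCompile(`(\bVALUES\s*\(` + value + `(?:,` + value + `){12}),` + value)

// sample is a made-up query whose 4th value contains a comma.
const sample = `INSERT t VALUES (2005, N'John', N'Doe', N'Street 4711, Apt 2', N'1234', N'Town', N'1', N'2', N'3', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'delete me', N'Name2', NULL);`

func fixQuery(q string) string {
	return dropFourteenth.ReplaceAllString(q, "${1}")
}

func main() {
	fmt.Println(fixQuery(sample))
}
```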

Extract everything between pipes in key value pair

I have the following sourceString:
|User=gmailUser1|login with password=false|addition information=|source IP location=DE|
I want to extract everything between the pipes as key-value pairs. In this case:
User=gmailUser1
login with password=false
addition information=
source IP location=DE
My regex pattern gives me the entire string:
\|(\b+)=(\b+)\|
Try this expression:
/\|([^=|]+)=([^|]*)/g
or if you just want the pattern:
\|([^=|]+)=([^|]*)
Depending on your environment you will be able to get captures of group 1 and 2 for each key-value pair.
(I'm not able to test it out right now.)
Update 1: I ran a short test and adapted the expression with the optimization suggested by Wiktor Stribizew.
Update 2: Short explanation of the regex used:
The \b in your pattern means word boundary and does not represent a character. You cannot combine it with +. See also What is a word boundary.
The first group ([^=|]+) matches one or more characters that are neither = nor |.
The second group ([^|]*) matches zero or more characters other than | (addition information has an empty value).
Try this:
\w+(=|\s|\w+)
This matches:
\w+ = one or more word characters (letters, digits, underscore), followed by a capturing group
(=|\s|\w+) = an = sign, a whitespace character, or another run of word characters

Go ReplaceAllString

I read the example code on the golang.org website. Essentially, the code looks like this:
re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
The output is like this:
-T-T-
--xx-
---
-W-xxW-
I understand the first output, but I don't understand the other three. Can someone explain results 2, 3, and 4 to me? Thanks.
The most intriguing is the fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W")) line. The docs say:
Inside repl, $ signs are interpreted as in Expand
And Expand says:
In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores.
A reference to an out of range or unmatched index or a name that is not present in the regular expression is replaced with an empty slice.
In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.
So, in the 3rd replacement, $1W is treated as ${1W}, and since this group is not initialized, an empty string is used as the replacement.
When I say "the group is not initialized", I mean that the group is not defined in the regex pattern, so it was not populated during the match operation. Replacing means finding all matches and then replacing each of them with the replacement pattern. Backreferences ($xx constructs) are populated during the matching phase. Since the pattern contains no group named 1W, nothing was captured for it during matching, and an empty string is used in the replacing phase.
The 2nd and 4th replacements are easy to understand and have been described in the other answers. $1 simply refers to the characters captured by the first capturing group (the subpattern enclosed in a pair of unescaped parentheses), and the same goes for example 4.
You can think of {} as a means to disambiguate the replacement pattern.
Now, if you need to make the results consistent, use a named capture (?P<1W>....):
re := regexp.MustCompile("a(?P<1W>x*)b") // <= See here, pattern updated
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
Results:
-T-T-
--xx-
--xx-
-W-xxW-
The 2nd and 3rd lines now produce the same output, since the named group 1W is also the first group: the numbered backreference $1 points to the same text as the named reference $1W.
$number or $name refers to a subgroup of the regex, by index or by name.
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
$1 is subgroup 1 in the regex, i.e. (x*).
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
$1W: there is no subgroup named 1W, so every match is replaced with an empty string.
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
${1}W: ${1} is the same as $1, so every match is replaced with the contents of subgroup 1 followed by W.
For more information: https://golang.org/pkg/regexp/
$1 is shorthand for ${1}.
${1} is the value of the first (1) group, i.e. the content of the first pair of (). This group is (x*), i.e. any number of x.
ReplaceAllString replaces every match. There are two matches: the first is ab, the second is axxb.
No. 2 replaces each match with the content of the group: this is "" in the first match and "xx" in the second.
No. 4 appends a "W" after the content of the group.
No. 3 is left as an exercise. Hint: the twelfth capturing group would be $12.

Regular Expression: Extract the lines

I am trying to extract name1 (first row), name2 (second row), name3 (third row), and the street name (last row) with a regex:
Company Inc.
JohnDoe
Foobar
Industrieterrein 13
The very last row is the street name, and this part already works (the text is stored in the variable "S2"):
REGEXREPLACE(S2, "(.*\n)+(?!(.*\n))", "")
This expression returns the very last line. I am also able to extract the first row:
REGEXREPLACE(S2, "(\n.*)", "")
My problem is that I do not know how to extract the second and third rows.
Also, how do I test whether the text contains one, two, three, or more rows?
Update:
The regex is used in the context of Scribe (an ETL tool). The problem is that I cannot execute source code; I only have the following functions:
REGEXMATCH(input, pattern)
REGEXREPLACE(input, pattern, replacement)
If the regex flavor supports lookaheads, you can count rows backwards and thus get the following (assuming . does not match a newline):
(.*)$ # matching the last line
(.*)(?=(\n.*){1}$) # matching the second last line (excl. newline)
(.*)(?=(\n.*){2}$) # matching the third last line (excl. newline)
Just use this regex:
(.+)+
Explanation:
.
Wildcard: Matches any single character except \n.
+
Matches the previous element one or more times.
As for a regular expression that will match each of four rows, how about this:
(.*?)\n(.*?)\n(.*?)\n(.*)
Each pair of parentheses captures one row, and \n matches a newline. Note: depending on the line endings, you may have to use \r\n instead of just \n; try both.
You can try the following:
((.*?)\n){3}