Regex to extract names from a string into a list - regex

How can I use a regex to extract just the names from the following string:
Liam got 6,andy got 6
and add them to a list? I've tried using regex, but I can't find the correct expression to extract just the names, and I'm still a bit shaky in this area.
Any help would be appreciated.

For simple cases, I recommend not using Regex. You could do it like this using String.Split, String.Replace, and LINQ Where:
Dim names As String() = sentence.Replace("got ", "").Split(" ").Where(Function(t) Char.IsLetter(t(0))).ToArray()
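The version above splits on spaces only, so a name that directly follows a comma (as in "6,andy") is lost. Suppose you have a sentence with commas and uneven spacing; then split on both separators and drop the empty entries:
Dim separators As Char() = {","c, " "c}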
Dim names As String() = sentence.Replace("got ", "").Split(separators, System.StringSplitOptions.RemoveEmptyEntries).Where(Function(t) Char.IsLetter(t(0))).ToArray()
What happens, step by step:
"Andy got 6,may got 10, blue got 9, hERald got 0"
"Andy 6,may 10, blue 9, hERald 0" 'After replace
"Andy" "6" "may" "10" "blue" "9" "hERald" "0" 'After split
"Andy" "may" "blue" "hERald" 'After where

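For comparison, the same replace / split / filter steps can be sketched in Python. This is only an illustration of the approach above, not a translation of the VB.NET API; the function name names_by_split is made up for the example.

```python
def names_by_split(sentence):
    # Step 1: drop the word "got " so only names and scores remain.
    stripped = sentence.replace("got ", "")
    # Step 2: treat commas as spaces, then split on whitespace
    # (str.split() with no argument discards empty pieces).
    tokens = stripped.replace(",", " ").split()
    # Step 3: keep only tokens whose first character is a letter.
    return [t for t in tokens if t[0].isalpha()]


print(names_by_split("Andy got 6,may got 10, blue got 9, hERald got 0"))
# ['Andy', 'may', 'blue', 'hERald']
```
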
This should work in vb.net.
(?<=^|,)\w+
https://regex101.com/r/wT8rE9/1
And if it can have a space after the comma:
(?<=^|,|,\s)\w+
If you're comfortable with capture groups, you could do the following, which should be more efficient:
(?:^|,\s*)(\w+)
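As a quick check of the capture-group pattern, here is a minimal sketch in Python rather than VB.NET (the pattern itself is unchanged; extract_names is a hypothetical helper name). With a single capture group, re.findall returns just the captured names.

```python
import re


def extract_names(text):
    # (?:^|,\s*) anchors each match at the start of the string or right after
    # a comma (plus optional whitespace); (\w+) captures the name itself.
    return re.findall(r"(?:^|,\s*)(\w+)", text)


print(extract_names("Liam got 6,andy got 6"))  # ['Liam', 'andy']
```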

Related

String excerpts

I would like to copy a certain string (out of a longer range of strings in one cell) and show it in a different cell with Google Sheets. This is what is in the initial cell A1:A :
"String 1","String 2","String 3"
In B1:B I'd like ONLY String 3, so without the "" and the other strings.
Is this possible with spreadsheets?
Or is there any other way of doing so?
Update
So the task is to get the text inside double quotes, where the matching string is at the end of the text.
You may use regular expressions to deal with that, the basic formula is:
=REGEXEXTRACT(A1,"([^""]+)""$")
This returns the text inside the last pair of double quotes, at the end of the text in cell A1.
For example:
some text...,"Thisthat","https://www.url.com/de/Thisthat"
gives https://www.url.com/de/Thisthat
You may also use arrayformula:
=ArrayFormula(REGEXEXTRACT(A1:A3,"([^""]+)""$"))
Read more about these functions in the Google Sheets documentation for REGEXEXTRACT and ARRAYFORMULA.
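To see what the pattern captures outside of Sheets, here is a small Python sketch. It assumes Python's re module treats this particular pattern the same way as the Sheets regex engine; last_quoted is a made-up helper name.

```python
import re


def last_quoted(cell):
    # ([^"]+)"$ : a run of non-quote characters followed by the closing
    # double quote at the very end of the text.
    match = re.search(r'([^"]+)"$', cell)
    return match.group(1) if match else None


print(last_quoted('some text...,"Thisthat","https://www.url.com/de/Thisthat"'))
# https://www.url.com/de/Thisthat
```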
Old answer
If you want the strings to stay on their own rows, use this formula in B1:
=ArrayFormula(if(A1:A = "String 3";A1:A;""))
If you have cells in A1:A, which contain 'string 3', and you want to match them too, use this:
=ArrayFormula(if(REGEXMATCH(A1:A , "String 3"),"String 3",""))

Split line at commas, only if commas not contained between quotes

Is there any way to use the split function in scala so that it splits a line at commas but doesn't at commas contained within 2 double quotes?
For example, I have the following:
x: String = """"??", "hamburger", "ketchup, mayo, mustard", "pizza""""
and I tried this:
x.split(',') but it didn't work. I then thought about removing all double quotes but that still doesn't solve my problem.
Any help would be greatly appreciated!
EDIT:
Here's a snippet of my code to see how I can incorporate this:
val data1 = noheader1.map { line =>
val values = line._1.split(',') //This is what I am trying to change
val name = values(2).replaceAll("\"", "")
I am a bit new to scala and even more so to regex, so could someone clarify how to write that weird regex expression in my code so that I can obtain an ARRAY of the comma separated words of the line?
Try this!
(?>"(?>\\.|[^"])*?"|(,))
Regex101
Instead of split() you can use a regular expression and findAllIn(), like such:
val x = """"??", "hamburger", "ketchup, mayo, mustard", "pizza""""
""""[^"]+"""".r.findAllIn(x).toList
This results in List("??", "hamburger", "ketchup, mayo, mustard", "pizza") as printed by the REPL; note that each element still includes its surrounding double quotes.
Note: I am using triple-quotes (""") in the example.
Perhaps not as elegant as the regexes already suggested: treat the separator between items as a quote, a comma, whitespace, and another quote, and split on that:
x.split("\",\\s+\"")
Array("??, hamburger, ketchup, mayo, mustard, pizza")
Then, in the resulting array, apply stripPrefix("\"") to the first element ("??) and stripSuffix("\"") to the last one (pizza").
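For readers who want to try the idea outside Scala, here is a rough Python sketch of two ways to get the fields without the quotes: a capture group between the quotes, and the standard csv module (skipinitialspace is needed because each comma is followed by a space). The variable names are only for illustration.

```python
import csv
import re

line = '"??", "hamburger", "ketchup, mayo, mustard", "pizza"'

# Option 1: capture the text between each pair of double quotes.
fields_regex = re.findall(r'"([^"]*)"', line)

# Option 2: let the csv module handle the quoting rules;
# skipinitialspace ignores the blank that follows each comma.
fields_csv = next(csv.reader([line], skipinitialspace=True))

print(fields_regex)  # ['??', 'hamburger', 'ketchup, mayo, mustard', 'pizza']
print(fields_csv)    # ['??', 'hamburger', 'ketchup, mayo, mustard', 'pizza']
```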

RegExReplace - A few examples to get me started, please

I'm trying to use RegExReplace to pre-process some text before it gets parsed for use in an Access database. Currently I have been defining a growing number of string patterns into a table, then use the stock Replace() function in VBA using that table. Works OK, but misses the mark in a few areas; I am pretty sure regular expressions will be a better long-term solution for me, but I am completely clueless how to construct them.
I'd like to see if the smart folks here can give me a leg up on the task using a few actual examples from my data, by illustrating the regex strings that will produce the desired result:
1. 6 IN => 6IN
2. 12.3 IN X 2 YD => 12.3IN_X_2YD
3. 6IN X 4IN => 6IN_X_4IN
4. 8X120MM => 8_X_120MM
5. 1 1/2" => 1.5IN
6. CAT, DOG => CAT DOG
7. CAT,DOG => CAT DOG
8. CAT ,DOG => CAT DOG
9. CAT , DOG => CAT DOG
My patterns fail in ways like: CATHETER INFUSION => CATHETERINFUSION
I will be using a multi-pass approach rather than attempting to come up with some terribly complex expressions.
Can anyone offer some initial guidance on any of these samples? I'm confident I will be able to leverage these samples to extend as needed.
[Edit:] I did just find a few helpful examples:
NewStr := RegExReplace("abc123123", "123$", "xyz") ; Returns "abc123xyz" because the $ allows a match only at the end.
NewStr := RegExReplace("abc123", "i)^ABC") ; Returns "123" because a match was achieved via the case-insensitive option.
NewStr := RegExReplace("abcXYZ123", "abc(.*)123", "aaa$1zzz") ; Returns "aaaXYZzzz" by means of the $1 backreference.
NewStr := RegExReplace("abc123abc456", "abc\d+", "", ReplacementCount) ; Returns "" and stores 2 in ReplacementCount.
[Edit 2]: Making good progress!
strText = "BANDAGE, ADHESIVE, 2 FT X 3.5 IN X 0.25MM, LATEX-FREE"
strResult = RegExReplace(strText, "[,\s]+", " ", True)
strResult = RegExReplace(strResult, "\s+(IN|FT|YD)\s+", "$1 ", True)
strResult = RegExReplace(strResult, "\s+X\s+", "_X_", True)
Produces:
BANDAGE ADHESIVE 2FT_X_3.5IN_X_0.25MM LATEX-FREE
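Some regexps that might be useful, written as /pattern/replacement/. Requiring a digit before the unit keeps words such as CATHETER INFUSION intact, and capturing the digits around X keeps them in the result:
/(\d)\s+IN\b/$1IN/
/\s+X\s+/_X_/
/(\d)X(\d)/$1_X_$2/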
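The multi-pass idea can be sketched in Python as below. This is an illustration only, not VBA: the unit list (IN, FT, YD, MM) and the function name normalize are assumptions, and the fraction case (1 1/2" to 1.5IN) is deliberately left out.

```python
import re


def normalize(text):
    # Pass 1: a comma with any surrounding whitespace becomes one space.
    text = re.sub(r"\s*,\s*", " ", text)
    # Pass 2: glue a unit to the number before it; requiring a digit and a
    # word boundary keeps "CATHETER INFUSION" untouched.
    text = re.sub(r"(\d)\s+(IN|FT|YD|MM)\b", r"\1\2", text)
    # Pass 3: a free-standing X between dimensions becomes _X_.
    text = re.sub(r"\s+X\s+", "_X_", text)
    # Pass 4: an X squeezed between two digits (8X120MM) also becomes _X_.
    text = re.sub(r"(?<=\d)X(?=\d)", "_X_", text)
    return text


print(normalize("12.3 IN X 2 YD"))     # 12.3IN_X_2YD
print(normalize("8X120MM"))            # 8_X_120MM
print(normalize("CAT , DOG"))          # CAT DOG
print(normalize("CATHETER INFUSION"))  # CATHETER INFUSION
```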

Splitting a string based on positions with regex

I need to convert this (date) String "12112014" to "12.11.2014"
What I would like to do is:
Take the first 2 characters, "12", and add ".",
then take characters 3-4 to get "11" and add ".",
and finally take the last 4 characters (5-8) to get "2014".
I already found out how to get the first 2 characters ( "^\d{2}" ), but I failed to get characters based on a position.
Whatever the programming language, you should extract the digit groups from the string and then join them with a ".".
In Perl, it can be done as:
$_ = '12112014';
s/(\d{2})(\d{2})(\d{4})/$1.$2.$3/;
print "$_";
Without you specifying the language you're after, I've picked javascript:
var s = '12012011';
var s2 = s.replace(/(\d{2})(\d{2})(\d{4})/,'$1.$2.$3');
console.log(s2); // prints "12.01.2011"
The gist of it is that you use () to specify groups inside your regular expression and then can use the groups in your replace expression.
Same in Java:
String s = "12012011";
String s2 = s.replaceAll("(\\d{2})(\\d{2})(\\d{4})", "$1.$2.$3");
System.out.println(s2);
I don't think you can do that with split alone.
You could expand your expression to:
"(^(\d{2})(\d{2})(\d{4}))"
Then access the groups with the Regex language of your choice and build the string you want.
Note that - besides all regex learning - alternatively you could always parse the original string into strongly typed Date or DateTime variables and output the value using the appropriate locales.
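Both routes mentioned in this thread can be sketched in Python: the group-based replace, and parsing into a real date before formatting. The day-month-year order is an assumption here, since the question does not say which field comes first.

```python
import re
from datetime import datetime

raw = "12112014"

# Regex route: three groups (2, 2 and 4 digits) rejoined with dots.
dotted = re.sub(r"^(\d{2})(\d{2})(\d{4})$", r"\1.\2.\3", raw)

# Date route: parse as day, month, year (assumed order), then format.
# This also rejects impossible dates such as month 13.
parsed = datetime.strptime(raw, "%d%m%Y")
formatted = parsed.strftime("%d.%m.%Y")

print(dotted)     # 12.11.2014
print(formatted)  # 12.11.2014
```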

Find & Replace with a wildcard in Xcode 4

I'm trying to find all instances of text inside "" marks with a semi-colon directly after them, and replace the text inside the "" marks. So, for example:
"FirstKey" = "First value";
"SecondKey" = "Second value";
"ThirdKey" = "Third value";
Would find only those values after the equals signs, and could replace them all (with a single string) at once, like so:
"FirstKey" = "BLAH";
"SecondKey" = "BLAH";
"ThirdKey" = "BLAH";
How can I do this? I found some stuff referring to regular expressions in Xcode 3, but such functionality seems either gone or hidden in Xcode 4.
Regular expression replace is still available in Xcode 4. Use "Replace" and set style to "Regular Expression", use "([^"]*)"; as pattern and replace with "BLAH";.
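If you would rather run the replacement outside Xcode, the same pattern can be tried in a short Python sketch. The sample text is the one from the question; reading and writing the actual .strings file is left out.

```python
import re

strings_file = '''"FirstKey" = "First value";
"SecondKey" = "Second value";
"ThirdKey" = "Third value";
'''

# A quoted run immediately followed by a semicolon is a value; keys are
# followed by " = " and therefore never match.
replaced = re.sub(r'"[^"]*";', '"BLAH";', strings_file)

print(replaced)
```

Because the pattern requires the semicolon right after the closing quote, only the values are touched.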