I am pretty bad at regex and need some help implementing my idea with the already complicated if-else syntax being used for user-defined snippets in VS Code.
I want to achieve the following:
Whenever I enter a number for variable $1, I want the snippet to create the
text "MAXVALUE $1" at the placeholder position.
If anything else is entered, nothing should be printed.
My current line for this with $1 being the variable I enter is:
"\t ${1/([0-9])|([a-zA-Z])/${1:+MAXVALUE }${2:+ }/}"
In this state I can capture the entire number EXCEPT the FIRST CHARACTER being entered, and I can print MAXVALUE _mynumber_minus_char_at_index_0, LOL?!
If I enter a text, MAXVALUE won't be printed, but again the value from $1 minus the character at index 0 is being printed on screen.
Any help would be highly appreciated. If you got some useful links that explain advanced snippet creation for those kinda cases, I would be thankful as well.
For RegEx, well, time to learn them, so why not starting with a crazy-ass example like this - at least for me it is like rocket-science atm :D
Thanks in advance and best regards.
Using this snippet:
"Maxvalue": {
"prefix": "cll",
"body": [
"\t ${1/([0-9]+)|([a-zA-Z]+)/${1:+MAXVALUE }$1${2:+ }/}",
],
"description": "maxvalue"
},
([0-9]+) captures all the numbers you type; or
([a-zA-Z]+) captures all the letters you type
You were using ([0-9]), which captures, but more importantly matches, only a single digit: the first one you typed. Anything the regex does not match is not touched by the snippet transform; it simply stays as typed. That is why you were seeing everything but the first character in the output.
You weren't actually outputting $1 anywhere - you see I added it to the transform after the MAXVALUE conditional.
${1:+MAXVALUE } is a conditional which means if there is a capture group 1, do something, in this case output MAXVALUE. That 1 in ${1:+MAXVALUE } is not a reference to your $1 tabstop. It is only a reference to the first capture group of your regex.
So you correctly outputted MAXVALUE when you had a capture group 1, but you didn't follow that up by outputting capture group 1 anywhere.
${2:+ } is another conditional where the 2 refers to a capture group 2, if any, here ([a-zA-Z]+). So if there is a capture group 2, a space will be output. If there is no capture group 2, the conditional will fail and provide no output of its own. If you want nothing printed if you type letters, then match them and do nothing with them, as in the following:
"\t ${1/([0-9]+)|[a-zA-Z]+/${1:+MAXVALUE }$1/}"
This will match all the letters you type (before tabbing to complete the transform), and they will disappear because you matched them and then didn't output them anywhere in the transform part.
If you simply want those letters to remain, don't match them as in
"\t ${1/([0-9]+)/${1:+MAXVALUE }$1/}"
If there is something you don't understand let me know.
[By the way, your question title mentions if/else conditions but you are using only if conditionals.]
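The difference between ([0-9]) and ([0-9]+) explained above can be tried outside VS Code with a small Python sketch. It assumes that a transform without the g flag replaces only the first match and leaves the rest of the typed text as it is; the helper names transform, original and fixed are made up for this illustration and are not part of any VS Code API.

```python
import re

def transform(pattern, build, typed):
    # Assumption: like a snippet transform without the "g" flag,
    # only the first match is replaced; unmatched text stays as typed.
    return re.sub(pattern, build, typed, count=1)

def original(m):
    # Mirrors ${1:+MAXVALUE }${2:+ } : group 1 is tested but never output
    return ("MAXVALUE " if m.group(1) else "") + (" " if m.group(2) else "")

def fixed(m):
    # Mirrors ${1:+MAXVALUE }$1 : output MAXVALUE and the digits themselves
    return "MAXVALUE " + m.group(1) if m.group(1) else ""

# Single-digit pattern: "1" is matched and replaced, "23" is left over
before = transform(r"([0-9])|([a-zA-Z])", original, "123")   # "MAXVALUE 23"
# Quantified pattern: the whole number is matched and re-emitted
after = transform(r"([0-9]+)|[a-zA-Z]+", fixed, "123")       # "MAXVALUE 123"
# Letters are matched but nothing is output for them
letters = transform(r"([0-9]+)|[a-zA-Z]+", fixed, "abc")     # ""
```

The first call reproduces the "number minus the character at index 0" effect from the question.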
At the moment I am busy with a spreadsheet to analyse results per url. The problem is that when I want to make a list of unique urls the urls with a parameter behind it (for example '?fbads') will be seen as unique, instead of that I need these results to be blended together with the main url. See example below:
https://www.holidayguru.nl/deal/accommodatie/luxe-strandvakantie-in-ijmuiden-5e25ba62-e001-4072-8eb5-b6c3b0e7e66f/?fbclid=IwA
&
https://www.holidayguru.nl/deal/accommodatie/luxe-strandvakantie-in-ijmuiden-5e25ba62-e001-4072-8eb5-b6c3b0e7e66f/
Should both be: https://www.holidayguru.nl/deal/accommodatie/luxe-strandvakantie-in-ijmuiden-5e25ba62-e001-4072-8eb5-b6c3b0e7e66f/
I already fixed this with a formula, but I need one list with all URLs. So I'm looking for one of two options. Either, in the
=LEFT(A11,FIND("?",A11)-1)
that I use right now, I need a way to say: if you don't find a '?', then just copy cell A11.
Or...
I have to work with an IF function to say: if A11 contains '?', then execute the LEFT function, otherwise use A11.
I can't manage to get the formula working. Demo sheet is down below :). Thanks!
Example spreadsheet
Delete everything from Sheet1!A:A (including the header) and place the following in Sheet1!A1:
=ArrayFormula({"UNIQUE URLS"; UNIQUE(FILTER(REGEXEXTRACT(URLs!A2:A,"[^\?]+"),URLs!A2:A<>""))})
This will create the header (which you can change as you like within the formula itself) and a unique list of URLs as determined only by the portion before a question mark (if a question mark exists) or to the end of the original URL.
For your reference, the expression [^\?]+ means "a string of the greatest length that can be extracted without containing a literal question mark."
[ ] = "any of the characters contained herein"
[^ ] = "not any of these characters"
\ = literal marker (i.e., whatever is next will be treated as a literal character)
\? = literal question mark (using the literal marker before the ? is necessary, since alone, the ? has a separate special meaning in REGEX-type expressions)
+ = "one or more of the preceding character or group of characters"
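For anyone who wants to check the [^\?]+ logic outside Sheets, here is a rough Python equivalent. It is only a sketch: strip_query and unique_urls are hypothetical helper names, and Python's re module stands in for the RE2 engine that Sheets uses.

```python
import re

def strip_query(url):
    # [^?]+ : the longest run of characters from the start that contains no "?"
    match = re.match(r"[^?]+", url)
    return match.group(0) if match else ""

def unique_urls(urls):
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # blank cells are skipped, like the FILTER(... <> "") part of the formula
    return list(dict.fromkeys(strip_query(u) for u in urls if u))

base = "https://www.holidayguru.nl/deal/accommodatie/luxe-strandvakantie-in-ijmuiden-5e25ba62-e001-4072-8eb5-b6c3b0e7e66f/"
cells = [base + "?fbclid=IwA", base, ""]
result = unique_urls(cells)   # a single entry: the URL without the parameter
```

A URL without a question mark passes through unchanged, which is the "otherwise just copy the cell" case from the question.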
I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula :
=REGEXEXTRACT(A1, "\d-")
My problem is that it only returns the first match... how can I get all matches?
Example text:
"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).
I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula.
Something like the "global - Don't return after first match" option on regex101.com
I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).
Any help appreciated!
Thanks :)
You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
Basically, it surrounds every instance of \d- with a capture group; REGEXEXTRACT then neatly returns all the captures. If you want them back in a single string, JOIN packs them into one cell, as in the formula above.
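The two steps of that formula can be followed in a short Python sketch. This is an illustration only: extract_all is a made-up name, and Python's re is used in place of the Sheets functions.

```python
import re

def extract_all(text, inner=r"\d-"):
    # Step 1 (the REGEXREPLACE part): wrap every match in parentheses.
    # The result is the text itself, reused as a pattern, with one
    # capture group per match.
    pattern = re.sub("(" + inner + ")", r"(\1)", text)
    # Step 2 (the REGEXEXTRACT part): match that pattern against the text;
    # its groups are the individual matches.
    found = re.search(pattern, text)
    return list(found.groups()) if found else []

sample = "A1-Nutrition;A2-ActPhysiq;H3-Endocrino"
# The generated pattern is "A(1-)Nutrition;A(2-)ActPhysiq;H(3-)Endocrino"
matches = extract_all(sample)    # ["1-", "2-", "3-"]
joined = "".join(matches)        # "1-2-3-"
```

Because the text itself becomes the pattern, this only works while the text contains no characters that are special in a regex.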
You may create your own custom function in the Script Editor:
function ExtractAllRegex(input, pattern,groupId) {
return [Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId])];
}
Or, if you need to return all matches in a single cell joined with some separator:
function ExtractAllRegex(input, pattern,groupId,separator) {
return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]).join(separator);
}
Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").
Description:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
Edit
I came up with a more general solution:
=regexreplace(A1,"(.)?(\d-)|(.)","$2")
It replaces any text except the second group match (\d-) with just the second group $2.
"(.)?(\d-)|(.)"
 1   2     3
Groups are in ()
---------------------------------------
"$2" -- means return the group number 2
Learn regular expressions: https://regexone.com
Try this formula:
=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")
It will handle string like this:
"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
with output:
1-2-2-2-3-
I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:
Input:
1111 days, 123 hours 1234 minutes and 121 seconds
Expected output:
1111 123 1234 121
Formula:
=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
The shortest possible regex:
=regexreplace(A1,".?(\d-)|.", "$1")
Which returns 1-2-2-2-2-2-2-2-2-2-3-3- for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".
Explanation of regex:
.? -- optional character
(\d-) -- capture group 1 with a digit followed by a dash (use (\d+-) to allow multiple digits)
| -- logical or
. -- any character
the replacement "$1" uses just the capture group 1, and discards anything else
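The same "consume everything, keep only group 1" idea can be checked in Python. This is a sketch with Python's re rather than RE2, and keep_digit_hyphen is just an illustrative name. For comparison it also shows re.findall, which is Python's direct way of asking for all matches.

```python
import re

TEXT = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

def keep_digit_hyphen(text):
    # Every character is consumed by one of the two alternatives:
    #   .?(\d-)  an optional junk character, then a digit plus hyphen (group 1)
    #   .        any other single character (group 1 stays unset)
    # Only group 1 is written back, so all junk disappears.
    return re.sub(r".?(\d-)|.", lambda m: m.group(1) or "", text)

via_replace = keep_digit_hyphen(TEXT)            # "1-2-2-2-2-2-2-2-2-2-3-3-"
via_findall = "".join(re.findall(r"\d-", TEXT))  # same string
```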
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
This seems to work and I have tried to verify it.
The logic is
(1) Replace letter followed by hyphen with nothing
(2) Replace any digit not followed by a hyphen with nothing
(3) Replace everything which is not a digit or hyphen with nothing
=regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")
Result
1-2-2-2-2-2-2-2-2-2-3-3-
Analysis
I had to step through these procedurally to convince myself that this was correct. According to this reference when there are alternatives separated by the pipe symbol, regex should match them in order left-to-right. The above formula doesn't work properly unless rule 1 comes first (otherwise it reduces all characters except a digit or hyphen to null before rule (1) can come into play and you get an extra hyphen from "Patho-jour").
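The point about the order of the alternatives can be verified with a quick Python sketch. Python's re also tries alternatives left to right, but it is a different engine from the one in Sheets, so take this as an illustration of the reasoning rather than proof of what Sheets does.

```python
import re

TEXT = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# Rule 1 first: "o-" in "Patho-jour" is removed as a letter plus hyphen
in_order = re.sub(r"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]", "", TEXT)

# Rule 3 first: the "o" is removed on its own, so its hyphen is left behind
reordered = re.sub(r"[a-zA-Z;/é]|[a-zA-Z]-|[0-9][^-]", "", TEXT)

# in_order  is "1-2-2-2-2-2-2-2-2-2-3-3-"
# reordered contains a doubled hyphen where "Patho-jour" used to be
```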
Here are some examples of how I think it must deal with the text
The solution that wraps the matches in capture groups with REGEXREPLACE and then runs REGEXEXTRACT works here too, but there is a catch.
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
If the cell you are extracting from contains special characters such as a parenthesis "(" or a question mark "?", the solution provided won't work.
In my case, I was trying to list all the "variable texts" contained in the cell, which were written like this: "{example_name}". But the full content of the cell had special characters that made the regex formula break. Once I removed those special characters, I could list all the captured groups as the solution does.
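Here is a small Python sketch of that catch. The cell content and the helper names are invented for the example. The second helper shows one possible way around it in Python, using re.escape on the junk parts; Sheets has no built-in equivalent that I know of, so that part is an assumption about how one could fix it elsewhere, not a Sheets recipe.

```python
import re

def naive_pattern(text):
    # The trick from the formula: the text itself becomes the pattern
    return re.sub(r"(\d-)", r"(\1)", text)

def escaped_pattern(text):
    # re.split with one capture group puts junk at even indexes and
    # matches at odd indexes; escape everything, group only the matches
    pieces = re.split(r"(\d-)", text)
    return "".join(
        "(" + re.escape(piece) + ")" if index % 2 else re.escape(piece)
        for index, piece in enumerate(pieces)
    )

cell = "Total (net): 1-2-"

# "(net)" is read as a capture group, so the pattern no longer matches the text
broken = re.search(naive_pattern(cell), cell)             # None
# With the junk escaped, the groups come back as expected
repaired = re.search(escaped_pattern(cell), cell).groups()  # ("1-", "2-")
```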
There are two general ('Excel' / 'native' / non-Apps Script) solutions to return an array of regex matches in the style of REGEXEXTRACT:
Method 1)
insert a delimiter around matches, remove junk, and call SPLIT
Regexes work by iterating over the string from left to right, and 'consuming'. If we are careful to consume junk values, we can throw them away.
(This gets around the problem faced by the currently accepted solution, which is that as Carlos Eduardo Oliveira mentions, it will obviously fail if the corpus text contains special regex characters.)
First we pick a delimiter, which must not already exist in the text. The proper way to do this is to parse the text to temporarily replace our delimiter with a "temporary delimiter", like if we were going to use commas "," we'd first replace all existing commas with something like "<<QUOTED-COMMA>>" then un-replace them later. BUT, for simplicity's sake, we'll just grab a random character such as from the private-use unicode blocks and use it as our special delimiter (note that it is 2 bytes... google spreadsheets might not count bytes in graphemes in a consistent way, but we'll be careful later).
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
"xyzSixSpaces:[ ]123ThreeSpaces:[ ]aaaa 12345",".*?( |$)",
"$1"
)
),
""
)
We just use a lambda to define temp="match1match2match3", then use that to remove the last delimiter into "match1match2match3", then SPLIT it.
Taking COLUMNS of the result will prove that the correct result is returned, i.e. {" ", " ", " "}.
This is a particularly good function to turn into a Named Function, and call it something like REGEXGLOBALEXTRACT(text,regex) or REGEXALLEXTRACT(text,regex), e.g.:
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
text,
".*?("&regex&"|$)",
"$1"
)
),
""
)
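For readers who find the formula hard to follow, here is Method 1 as a Python sketch. Assumptions: the delimiter is U+E000 (an arbitrary private-use code point picked for this example), regex_all_extract is an illustrative name, and Python's re handles empty matches at the end of the string differently from RE2, so the sketch drops empty pieces instead of trimming exactly one trailing delimiter.

```python
import re

DELIM = "\uE000"  # assumption: a private-use character that is not in the text

def regex_all_extract(text, regex):
    # .*? lazily consumes junk up to the next match (or the end of the text);
    # only the match itself (group 1) is kept, followed by the delimiter.
    marked = re.sub(
        r".*?(" + regex + r"|$)",
        lambda m: m.group(1) + DELIM,
        text,
        flags=re.S,  # let "." cross newlines so junk lines are consumed too
    )
    # Split on the delimiter and drop the empty pieces left by the junk tail
    return [piece for piece in marked.split(DELIM) if piece]

found = regex_all_extract("A1-Nutrition;A2-ActPhysiq;H3-x", r"\d-")
# Special characters in the text are harmless here, since the text is
# never used as a pattern
with_specials = regex_all_extract("f(x)? 1-", r"\d-")
```

Note that a regex containing its own capture groups would need non-capturing groups (?:...) so that group 1 stays the whole match.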
Method 2)
use recursion
With LAMBDA (i.e. lets you define a function like any other programming language), you can use some tricks from the well-studied lambda calculus and functional programming: you have access to recursion. Defining a recursive function is confusing because there's no easy way for it to refer to itself, so you have to use a trick/convention:
trick for recursive functions: to actually define a function f which needs to refer to itself, instead define a function that takes a parameter of itself and returns the function you actually want; pass in this 'convention' to the Y-combinator to turn it into an actual recursive function
The plumbing that makes such a function work is called the Y-combinator. Here is a good article to understand it if you have some programming background.
For example to get the result of 5! (5 factorial, i.e. implement our own FACT(5)), we could define:
Named Function Y(f)=LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) ) (this is the Y-combinator and is magic; you don't have to understand it to use it)
Named Function MY_FACTORIAL(n)=
Y(LAMBDA(self,
LAMBDA(n,
IF(n=0, 1, n*self(n-1))
)
))
result of MY_FACTORIAL(5): 120
The Y-combinator makes writing recursive functions look relatively easy, like an introduction to programming class. I'm using Named Functions for clarity, but you could just dump it all together at the expense of sanity...
=LAMBDA(Y,
Y(LAMBDA(self, LAMBDA(n, IF(n=0,1,n*self(n-1))) ))(5)
)(
LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) )
)
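If the nested LAMBDAs are hard to read, the same construction can be written in Python, which evaluates arguments eagerly just like the formula language appears to. This is a sketch for intuition only; the names Y and my_factorial simply mirror the Named Functions above.

```python
# The same combinator as the Named Function Y(f): the inner "lambda y"
# delays the self-application x(x) until the function is actually called,
# which is what keeps an eagerly evaluated language from looping forever.
Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

# "self" is the recursive function handed back in by Y
my_factorial = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

result = my_factorial(5)  # 120
```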
How does this apply to the problem at hand? Well a recursive solution is as follows:
in pseudocode below, I use 'function' instead of LAMBDA, but it's the same thing:
// code to get around the fact that you can't have 0-length arrays
function emptyList() {
return {"ignore this value"}
}
function listToArray(myList) {
return OFFSET(myList,0,1)
}
function allMatches(text, regex) {
return allMatchesHelper(emptyList(), text, regex)
}
function allMatchesHelper(resultsToReturn, text, regex) {
currentMatch = REGEXEXTRACT(...)
if (currentMatch succeeds) {
textWithoutMatch = SUBSTITUTE(text, currentMatch, "", 1)
return allMatchesHelper(
{resultsToReturn,currentMatch},
textWithoutMatch,
regex
)
} else {
return listToArray(resultsToReturn)
}
}
Unfortunately, the recursive approach is quadratic order of growth (because it's appending the results over and over to itself, while recreating the giant search string with smaller and smaller bites taken out of it, so 1+2+3+4+5+... = big^2, which can add up to a lot of time), so may be slow if you have many many matches. It's better to stay inside the regex engine for speed, since it's probably highly optimized.
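The recursive pseudocode above translates to Python roughly as follows. It is a sketch, not a Sheets formula: all_matches is an illustrative name, a tuple replaces the dummy-element array trick, and the guard against empty matches is my addition, because removing an empty string would never shrink the text.

```python
import re

def all_matches(text, regex, results=()):
    match = re.search(regex, text)
    # Stop when nothing matches (the ISNA case); also stop on an empty
    # match, which would otherwise recurse forever.
    if match is None or match.group(0) == "":
        return list(results)
    current = match.group(0)
    # Like SUBSTITUTE(text, current, "", 1): remove the first occurrence
    remaining = text.replace(current, "", 1)
    return all_matches(remaining, regex, results + (current,))

found = all_matches("A1-Nutrition;A2-ActPhysiq;H3-x", r"\d-")
```

One caveat of this approach in any language: cutting a match out joins its neighbours, so a text like "12--" yields "2-" and then "1-", whereas a true global match would only find "2-".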
You could of course avoid using Named Functions by doing temporary bindings with LAMBDA(varName, expr)(varValue) if you want to use varName in an expression. (You can define this pattern as a Named Function =cont(varValue) to invert the order of the parameters to keep code cleaner, or not.)
Whenever I use varName = varValue, write that instead.
to see if a match succeeds, use ISNA(...)
It would look something like:
Named Function allMatches(text, regex):
UNTESTED:
LAMBDA(helper,
OFFSET(
helper({"ignore"}, text),
0,1)
)(
Y(LAMBDA(helperItself,
LAMBDA(results, partialText,
LAMBDA(currentMatch,
IF(ISNA(currentMatch),
results,
LAMBDA(textWithoutMatch,
helperItself({results,currentMatch}, textWithoutMatch)
)(
SUBSTITUTE(partialText, currentMatch, "", 1)
)
)
)(
REGEXEXTRACT(partialText, regex)
)
)
))
)
I've been trying to match the following:
First Group:Line1,
Line2,
..
LineX
Second Group:Some_Sample_text
With this query:
First Group:(?<first_group>.+\n*\n)Second Group:(?<second_group>.*)
My main goal is to capture any amount of lines between Line1 and LineX (because I can't anticipate how many there'll be), but since there's no option to match the end of files I'll probably need to use the "\n" tokens. I've also tried with IF and THEN statements but I just can't get it to work.
Any ideas appreciated.
Here, we might want to design an expression that'd just pass newlines, such as
First Group:([\s\S]*)Second Group:(.*)
First Group:([\d\D]*)Second Group:(.*)
First Group:([\w\W]*)Second Group:(.*)
Demo 1
and we'd expand it to,
First Group:([\s\S]*)Second Group:([\s\S]*)
First Group:([\d\D]*)Second Group:([\d\D]*)
First Group:([\w\W]*)Second Group:([\w\W]*)
if our second group could span multiple lines.
Demo 2
Advice
The fourth bird advises that:
You could make the character class non-greedy to prevent overmatching: ([\s\S]*?)
with which the expression would become,
First Group:([\s\S]*?)Second Group:([\s\S]*)
for instance.
Demo 3
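To see the effect on the sample from the question, here is a quick check in Python. This is a sketch in one particular regex flavor; named groups are left out to match the expressions above.

```python
import re

text = "First Group:Line1,\nLine2,\n..\nLineX\nSecond Group:Some_Sample_text"

# "." does not match a newline by default, so this cannot span the lines
dot_only = re.search(r"First Group:(.+)Second Group:(.*)", text)

# [\s\S] matches any character, newlines included;
# *? keeps the first group as short as possible
any_char = re.search(r"First Group:([\s\S]*?)Second Group:([\s\S]*)", text)
first_group, second_group = any_char.groups()
# first_group  is "Line1,\nLine2,\n..\nLineX\n"
# second_group is "Some_Sample_text"
```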
I'm working on a Haxe project that will, when finished, allow users to code in Haxe by typing in plain English. Example input:
Create a number called "Foo".
Set Foo to 100.
Print Foo to the console.
Right now, I'm trying to use an EReg (also known as RegExp or Regex) object to convert most of the common point syntaxes into x,y, which is something the main conversion function can easily understand. Here's all of the syntax features I'd like to take into consideration, in some combination or another:
100,200
100, 200
(100,200)
[100, 200]
x:100,y:200
(X:100, Y:200)
[ X: 100, y: 200 ]
Each of these strings a should be evaluated by a Regex object r with r.replace(a, "$1,$2") to get 100,200. Basically, this includes:
Any amount of whitespace
An optional pair of brackets or parentheses
An optional upper- or lower-case x: and y:
But always two numbers separated by a comma.
I've gotten each of these features correct with some Regex or another, and I had everything except the "x:"/"y:" in the same one. But because I don't have much experience with Regex, I can't figure out how to evaluate all of these conditions at the same time. Is this possible within the bounds of Regex, and if so, how could I do this?
Thanks in advance!
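Just try with:
[\(\[]? *(?:[xX]: *)?(\d+), *(?:[yY]: *)?(\d+) *[\)\]]?
(the x: prefix is in a non-capturing group, so the two numbers are $1 and $2).
But I guess that it's not necessary to match the whole pattern. Just try to match the digits and read the two groups, like: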
(\d+), *(?:[yY]: *)?(\d+)
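As a quick check against all the inputs from the question, here is the full pattern in a Python sketch, with the x: prefix in a non-capturing group so that the two numbers are groups 1 and 2. Haxe's EReg is a different engine, so treat this as a test of the pattern itself rather than of Haxe behavior; normalize is an illustrative name.

```python
import re

POINT = re.compile(r"[\(\[]? *(?:[xX]: *)?(\d+), *(?:[yY]: *)?(\d+) *[\)\]]?")

def normalize(point):
    # Replace the whole matched point syntax with "x,y"
    return POINT.sub(r"\1,\2", point)

samples = [
    "100,200",
    "100, 200",
    "(100,200)",
    "[100, 200]",
    "x:100,y:200",
    "(X:100, Y:200)",
    "[ X: 100, y: 200 ]",
]
normalized = [normalize(s) for s in samples]  # every entry is "100,200"
```

With the shorter digits-only pattern, a replace would leave the brackets and the x: prefix in place, so that one is better suited to reading the two groups after a match.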