How to generate a regex to capture everything but curly brace contents

I'm trying to match strings of any length that are NOT surrounded by { and }. For example, in foo{bar} the regex should match foo but not {bar}.
The regexes I originally came up with were ^([^${].*[}$]) and ^(?=[{]).+(?<=[}]), but they don't do what I expected.

If you want to fetch all the characters that are not within {}, you can split the string with a regex instead of matching.
Split the string on this regex in your preferred language (the braces are escaped so that engines such as Java's don't treat them as a quantifier):
\{.*?\}
The returned array should consist of the segments found outside each {} pair.
The following Java example returns an array (arr):
String abc="19{22}33{44}55{66}7";
String[] arr=abc.split("\\{.*?\\}");
which contains:
["19","33","55","7"]
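A match-based alternative to splitting, sketched in Java under the assumption that braces are never nested: one alternative consumes a whole {...} block without capturing it, the other captures a run of text containing no opening brace, and only the captured runs are kept. Unlike split, this does not leave an empty first element when the string starts with a {...} block. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideBraces {
    // Alternative 1 consumes a whole {...} block without capturing it;
    // alternative 2 captures a run of characters that contains no opening brace.
    private static final Pattern TOKEN = Pattern.compile("\\{[^}]*\\}|([^{]+)");

    static List<String> outside(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            // group(1) is null when the brace alternative matched
            if (m.group(1) != null) {
                parts.add(m.group(1));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(outside("19{22}33{44}55{66}7")); // [19, 33, 55, 7]
        System.out.println(outside("foo{bar}"));            // [foo]
    }
}
```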

Related

Using regular expressions in a Google Cloud query

I currently have a regular expression: REGEXP_EXTRACT_ALL(data, r'\"createdAt\"\:(.*?)\}')
It finds "createdAt": and outputs everything after that text up to (but not including) the next }.
Example output: {"_seconds":1620327345,"_nanoseconds":155071000
This works BUT I need the last } to be included in the output.
Preferred Output: {"_seconds":1620327345,"_nanoseconds":155071000}
How will I need to change my regular expression so that the } is included in the output?
You need to include the } into the capturing group:
REGEXP_EXTRACT_ALL(data, r'"createdAt":(.*?})')
Besides, you can make it a bit more efficient with a negated character class:
REGEXP_EXTRACT_ALL(data, r'"createdAt":([^}]*})')
[^}]* matches zero or more characters other than }, as many as possible; since it cannot run past the first }, no lazy quantifier is needed.
Also, since you use single quotation marks as the string literal delimiter, you need not escape the double quotation marks (they are not regex metacharacters either), and the colon needs no escaping. Note that } is not a special character unless it closes a quantifier such as {<number>} or {<number>,<number>}.
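To check the fixed pattern outside BigQuery, here is a small Java sketch that applies the same expression with java.util.regex and collects group 1 from every match, much as REGEXP_EXTRACT_ALL returns the single capturing group. The sample row is made up (only the createdAt shape comes from the question), the class name is hypothetical, and the closing brace is escaped here purely for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CreatedAtExtract {
    // Same idea as the BigQuery answer: the closing } sits inside group 1.
    private static final Pattern CREATED_AT = Pattern.compile("\"createdAt\":([^}]*\\})");

    static List<String> extractAll(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = CREATED_AT.matcher(data);
        while (m.find()) {
            out.add(m.group(1)); // only the captured part, like REGEXP_EXTRACT_ALL
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample row
        String data = "{\"id\":1,\"createdAt\":{\"_seconds\":1620327345,\"_nanoseconds\":155071000},\"x\":2}";
        System.out.println(extractAll(data));
        // [{"_seconds":1620327345,"_nanoseconds":155071000}]
    }
}
```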

Extracting a word between brackets using a regex pattern

I'm using a tutorial to create a regular expression for one of my tasks. The input string is:
[Begin] { (GetLatestCode)
I'm trying to extract the string between the brackets, i.e. GetLatestCode, for which I wrote the following:
(?<=\[Begin\]\s{\s\()\w+(?=\)) //returns GetLatestCode
But this solution does not seem to work when I have multiple spaces around the curly brace.
[Begin]   {   (GetLatestCode) //does not work
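If you need to account for 0 or more whitespace characters, add a * after each \s:
(?<=\[Begin\]\s*{\s*\()\w+(?=\))
If you need to account for 1 or more, use a +:
(?<=\[Begin\]\s+{\s+\()\w+(?=\))
Both versions need an engine that supports variable-length lookbehind (.NET and modern JavaScript do; Python's re module does not).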
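If your engine rejects variable-length lookbehind, another route is to match the whole prefix and capture only the name in a group. A minimal Java sketch (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeginExtract {
    // \s* allows zero or more whitespace characters around the brace;
    // the name between the parentheses is captured in group 1.
    private static final Pattern BEGIN = Pattern.compile("\\[Begin\\]\\s*\\{\\s*\\((\\w+)\\)");

    static String extract(String line) {
        Matcher m = BEGIN.matcher(line);
        return m.find() ? m.group(1) : null; // null when the line has no [Begin] { (...) part
    }

    public static void main(String[] args) {
        System.out.println(extract("[Begin] { (GetLatestCode)"));       // GetLatestCode
        System.out.println(extract("[Begin]    {   (GetLatestCode)")); // GetLatestCode
    }
}
```

Use \s+ instead of \s* if at least one space is required.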

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++, so I just need a regex; code will not work.
Thank you in advance.
Roman
If I understand correctly, here is C# code that does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
// Top-level statements (C# 9 or later)
var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
// A named group that captures each literal "bit"
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only lines that start with string1 and end with string2 are searched
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count); // All matches 6
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
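You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large or many strings to check. The outer parentheses allow multiple occurrences of "bit". Note that in most engines a repeated capture group keeps only its last match, so the group reports just the last "bit" (.NET additionally exposes every repetition through Group.Captures).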
If you provide us with the language you want to use, we could give more specific solutions.
Edit: since you added that you're doing a replacement in Notepad++, I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then run "Replace All" repeatedly until N++ doesn't find any more matches. The drawback is that this regex matches only one bit per line per pass (the last one, since (.*) is greedy), so it has to be applied repeatedly.
By the way, even if a regex matches the whole line, you can still replace just parts of it by using capturing groups.
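Outside Notepad++, the "Replace All until nothing changes" procedure can be scripted as a loop. A minimal Java sketch, assuming the replacement text does not itself contain bit (otherwise the loop would never settle); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedReplace {
    // Greedy (.*) makes each pass rewrite only the LAST "bit" between string1_ and _string2.
    private static final Pattern ONE_BIT =
        Pattern.compile("(?<=string1_)(.*)bit(.*)(?=_string2)");

    // Assumption: the replacement does not contain "bit", or the loop would not terminate.
    static String replaceAllBits(String line, String replacement) {
        String safe = Matcher.quoteReplacement(replacement); // keep $ and \ literal
        String previous;
        String current = line;
        do {
            previous = current;
            // One pass corresponds to one "Replace All" click in Notepad++.
            current = ONE_BIT.matcher(previous).replaceAll("$1" + safe + "$2");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(replaceAllBits("string1_dog_bit_johny_bit_string2", "xyz"));
        // string1_dog_xyz_johny_xyz_string2
    }
}
```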
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
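I tested it on regex101 and in Notepad++ as well, and it works.
Briefly: (?:string1|\G) starts a match either at string1 or exactly where the previous match ended; (?:(?!string2).)*? lazily consumes characters without running into string2; and \K drops everything matched so far from the reported match, so only bit itself is replaced. If you want more explanation, let me know and I'll elaborate!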
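Java's java.util.regex has \G but no \K, so a port of the \G approach has to capture the text before bit and write it back with $1. The sketch below also adds (?!^) after \G: in Java, \G matches at the very start of the input before any match has been made, so without that guard a string3 line would be rewritten too when each line is processed on its own. This is an adaptation rather than the answer's exact pattern, and the names are illustrative.

```java
import java.util.regex.Matcher;

class BitBetween {
    // Group 1 = everything from string1 (or from the end of the previous match) up to "bit".
    // \G(?!^) continues right after the previous match but never at the very start of the input.
    // (?:(?!string2).)*? lazily consumes characters without crossing into string2.
    private static final String PATTERN = "((?:string1|\\G(?!^))(?:(?!string2).)*?)bit";

    static String replaceBits(String line, String replacement) {
        // $1 writes the consumed prefix back, standing in for \K.
        return line.replaceAll(PATTERN, "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String[] lines = {
            "string1_dog_bit_johny_bit_string2",
            "string3_crocodile_bit_johny_bit_string4"
        };
        for (String line : lines) {
            System.out.println(replaceBits(line, "xyz"));
        }
        // string1_dog_xyz_johny_xyz_string2
        // string3_crocodile_bit_johny_bit_string4
    }
}
```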

Parsing variables within a string using a Regular Expression

I've got a bit of a problem with regular expressions in ColdFusion.
I have a string:
Hi my name is {firstname}. and i live in {towncity} my email address is {email}
How would I go about finding all the substrings of my string that are enclosed in a pair of {} braces? I would like to put the matching strings into an array so I can use them with query data.
Also, is this a commonly used pattern for merging variable data into a template string?
Any help greatly appreciated.
Simple Answer
To find all the brace-encased strings, you can use rematch and the simple expression \{[^{}]+\}
Explanation
The backslashes \ before each brace are to escape them, and have them act as literal braces (they carry special meaning otherwise).
The [^...] is a negated character class, meaning match any single character that is NOT one of those listed inside, and the greedy + quantifier tells it to match as many as possible, but at least one, of the preceding item.
Thus using [^{}]+ between the braces means the pattern will not match nested or unmatched braces. (By contrast, \{.*?\} could match two opening braces, as in {{firstname}. Note: *? is a lazy quantifier; it matches as few characters as possible, zero if it can, and only as many as required.)
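The difference between [^{}]+ and .*? is easy to see with a rough Java analogue of reMatch that returns every full match in order. This is a sketch: Java's engine is not ColdFusion's, and the two are only assumed to behave the same for these simple patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracePlaceholders {
    // Rough analogue of CF's reMatch: every full match, in order.
    static List<String> reMatch(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "Hi my name is {firstname}. and i live in {towncity} my email address is {email}";
        System.out.println(reMatch("\\{[^{}]+\\}", str));           // [{firstname}, {towncity}, {email}]
        System.out.println(reMatch("\\{[^{}]+\\}", "{{firstname}")); // [{firstname}]
        System.out.println(reMatch("\\{.*?\\}", "{{firstname}"));    // [{{firstname}]
    }
}
```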
Extended Answer
However, since you say that the results come from a query, a way to only match the values you're dealing with is to use the query's ColumnList to form an expression:
`\{(#ListChangeDelims(QueryName.ColumnList,'|')#)\}`
This changes ColumnList into a pipe-delimited list - a set of alternatives, grouped by the parentheses - i.e. the generated pattern will be like:
\{(firstname|towncity|email)\}
(with the contents of that group going into capture group 1).
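The same alternation trick in Java, as a sketch: join the column names with | and wrap them in \{(...)\}. Pattern.quote is an addition not in the answer above; it keeps a column name containing regex metacharacters from changing the pattern. The column list is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ColumnPlaceholders {
    // Builds \{(col1|col2|...)\} from a list of column names.
    static Pattern forColumns(List<String> columns) {
        String alternatives = columns.stream()
                .map(Pattern::quote) // each name is matched literally
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\{(" + alternatives + ")\\}");
    }

    static List<String> namesIn(String text, List<String> columns) {
        List<String> names = new ArrayList<>();
        Matcher m = forColumns(columns).matcher(text);
        while (m.find()) {
            names.add(m.group(1)); // capture group 1 holds the column name
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("firstname", "towncity", "email");
        System.out.println(namesIn("Hi {firstname}, {unknown} in {towncity} at {email}", columns));
        // [firstname, towncity, email]
    }
}
```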
To actually populate the text (rather than just matching) you could do something similar, except there is no need for a regex here, just a straight replace whilst looping through columns:
<cfloop index="CurColumn" list=#QueryName.ColumnList#>
<cfset text = replace( text , '{#CurColumn#}' , QueryName[CurColumn][CurrentRow] , 'all' ) />
</cfloop>
(Since this is a standard replace, there's no need to escape the braces with backslashes; they have no special meaning here.)
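For comparison, a Java sketch of the plain-replace loop, with a Map standing in for one query row (the values are made up). String.replace treats both arguments literally and replaces every occurrence, so, as in the cfloop version, the braces need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeFields {
    // Literal (non-regex) replacement of each {column} with the row's value.
    static String merge(String template, Map<String, String> row) {
        String text = template;
        for (Map.Entry<String, String> col : row.entrySet()) {
            text = text.replace("{" + col.getKey() + "}", col.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("firstname", "Jane"); // hypothetical row values
        row.put("towncity", "Leeds");
        row.put("email", "jane@example.com");
        String template = "Hi my name is {firstname}. and i live in {towncity} my email address is {email}";
        System.out.println(merge(template, row));
        // Hi my name is Jane. and i live in Leeds my email address is jane@example.com
    }
}
```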
Use the reMatch(reg_expression, string_to_search) function.
Details on regular expressions in ColdFusion 10 are in Adobe's documentation. (I believe regular expressions in CF8 work roughly the same way.)
Use the following code.
<cfset str = "Hi my name is {firstname}. And I live in {towncity} my email address is {email}.">
<cfoutput>Search string: <b>#str#</b><br />Search result:<br /></cfoutput>
<!--- word characters, whitespace, ( ) + . @ and a literal - at the end of the set --->
<cfset ret = reMatch("\{[\w\s\(\)\+\.@-]+\}", str)>
<cfdump var="#ret#">
This returns an array with the following entries.
{firstname}
{towncity}
{email}
The [] brackets in CF regular expressions define a character set to match a single character. You put + after the brackets to match one or more characters from the character set defined inside the []. For example, to match one or more upper case letters you could write [A-Z]+.
As detailed in the documentation, CF defines shortcuts to match various characters. The ones I used in the code are \w, which matches an alphanumeric character or an underscore, and \s, which matches a whitespace character (space, tab, newline, etc.).
To match the following special characters +*?.[^$({|\ you escape them by writing backslash \ before them.
An exception to this is the dash - character, which cannot be escaped with a backslash. So, to use it as a literal simply place it at the very end of the character set, like I did above.
Using the above regular expression you can extract characters from the following string, for example.
<cfset str = "Hi my name is { John Galt}. And I live in {St. Peters-burg } my email address is {john@exam_ple.com}.">
The result would be an array with the following entries.
{ John Galt}
{St. Peters-burg }
{john@exam_ple.com}
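A Java sketch of the whitelisted character set, showing that it accepts the sample placeholders but skips one containing a character outside the set (here /). The pattern is the one from the answer; Java's \w and \s are assumed close enough to CF's for this ASCII input, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllowedCharsPlaceholders {
    // Word chars, whitespace, ( ) + . @ and a trailing literal - inside the set.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{[\\w\\s\\(\\)\\+\\.@-]+\\}");

    static List<String> find(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "Hi my name is { John Galt}. And I live in {St. Peters-burg } my email address is {john@exam_ple.com}.";
        System.out.println(find(str));     // [{ John Galt}, {St. Peters-burg }, {john@exam_ple.com}]
        System.out.println(find("{a/b}")); // [] because / is not in the set
    }
}
```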
There may be much better ways to do this, but using something like rematch( '{.*?}', yourstring ) would give you an array of all the matches.
For future reference, I did this with the excellent RegExr, a really nice online regex checker. Full disclosure, it's not specifically for ColdFusion, but it's a great way to test things out.

C# String Format Placeholders in Regular Expressions

I have a regular expression, defined in a verbatim C# string type, like so:
private static string importFileRegex = @"^{0}(\d{4})[W|S]\.(csv|cur)";
The first 3 letters, right after the regex line start (^), can be any of many possible combinations of alphabetic characters.
I want to do an elegant String.Format using the above, placing my 3-letter combination of choice at the start and using this in my matching algorithm, like this:
string regex = String.Format(importFileRegex, "ABC");
This should give me the regex ^ABC(\d{4})[W|S]\.(csv|cur)
The problem is that the string contains other curly braces (e.g. the {4} in \d{4}), so String.Format treats them as placeholders, can't find a matching argument, and throws:
System.FormatException : Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Does anyone know how, short of splitting the string up, I can escape the other curly braces to avoid this error?
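Try this (notice the doubled curly braces; in a .NET format string, {{ and }} produce a literal { and }, so only {0} is treated as a placeholder):
@"^{0}(\d{{4}})[W|S]\.(csv|cur)"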
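Java has no String.Format, but java.text.MessageFormat uses the same {0} placeholders and escapes literal braces differently: by quoting them with apostrophes ('{4}') rather than doubling them. Below is a sketch of that, plus the alternative of plain concatenation with Pattern.quote, which avoids format escaping altogether. The character class is written [WS] here because [W|S] also matches a literal |; the class and method names are illustrative.

```java
import java.text.MessageFormat;
import java.util.regex.Pattern;

class FormatRegex {
    // In a MessageFormat pattern, '{4}' is literal text, not a format element.
    static String withMessageFormat(String prefix) {
        return MessageFormat.format("^{0}(\\d'{4}')[WS]\\.(csv|cur)", prefix);
    }

    // Concatenation needs no brace escaping; Pattern.quote keeps the prefix literal.
    static String withConcatenation(String prefix) {
        return "^" + Pattern.quote(prefix) + "(\\d{4})[WS]\\.(csv|cur)";
    }

    public static void main(String[] args) {
        System.out.println(withMessageFormat("ABC")); // ^ABC(\d{4})[WS]\.(csv|cur)
        System.out.println(Pattern.matches(withConcatenation("ABC"), "ABC2021W.csv")); // true
    }
}
```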