C# String Format Placeholders in Regular Expressions

I have a regular expression, defined in a verbatim C# string type, like so:
private static string importFileRegex = @"^{0}(\d{4})[W|S]\.(csv|cur)";
The first 3 letters, right after the start-of-line anchor (^), can be any of many possible combinations of alphabetic characters.
I want to do an elegant String.Format using the above, placing my 3-letter combination of choice at the start and using this in my matching algorithm, like this:
string regex = String.Format(importFileRegex, "ABC");
Which will give me a regex of ^ABC(\d{4})[W|S]\.(csv|cur)
The problem is that the string contains other curly braces (e.g. \d{4}). String.Format tries to treat them as placeholders too, can't find a matching argument and throws an error:
System.FormatException : Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Does anyone know how, short of splitting the string up, I can escape the other curly braces or otherwise avoid this error?

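Try this (notice the double curly braces):
@"^{0}(\d{{4}})[W|S]\.(csv|cur)"
String.Format treats {{ and }} as escaped, literal braces, so only {0} is substituted and String.Format(importFileRegex, "ABC") yields ^ABC(\d{4})[W|S]\.(csv|cur).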
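Python's str.format follows the same rule (doubled braces come out as single literal braces), so the fix can be checked with a short sketch. This is an illustration, not the C# code: the name build_pattern and the file name ABC2023W.csv are made up, and re.escape is added in case a prefix ever contains regex metacharacters. The sketch also writes [W|S] as [WS], because a | inside a character class is a literal pipe, so [W|S] would also accept a name like ABC2023|.csv.

```python
import re

# {0} is the placeholder; {{4}} is an escaped pair, so format() emits a
# literal {4} that the regex engine then reads as a quantifier.
IMPORT_FILE_REGEX = r"^{0}(\d{{4}})[WS]\.(csv|cur)"

def build_pattern(prefix: str) -> str:
    # Hypothetical helper: escape the prefix so it is matched literally.
    return IMPORT_FILE_REGEX.format(re.escape(prefix))

pattern = build_pattern("ABC")
print(pattern)  # ^ABC(\d{4})[WS]\.(csv|cur)

match = re.match(pattern, "ABC2023W.csv")
print(match.groups())  # ('2023', 'csv')
```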

Related

How to use regex to format input number as string?

I want to take a number input as a string and then strip redundant zeroes. I've done this using substring, but I was wondering whether there's a way to do it with a regex, for example a regex replace.
I only use a regex to check whether the string is a valid number; the stripping itself is done with repeated substring calls and if conditions. I want to be able to convert, say, "0012340.3200E6" to "12340.32E6" using a regex. The validation regex is:
rr("((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?");
You would be much better off converting the number to a floating-point value and then using string formatting, e.g.
char buffer[50];
sprintf(buffer, "%E", ...);
or sprintf(buffer, "%fE%d", ..., ...) if that works better for whatever you're doing.
Anyway, the regex should look something like this:
([+-]?)0*(\d+)(?:(\.\d*[1-9])|\.)0*(?:([Ee][+-]?)0*(\d+))?
(see regex101.com) and the substitution pattern would then be
$1$2$3$4$5
Everything except the redundant zeros ends up in capture groups, so replacing the match with the contents of all the groups drops exactly those zeros.
What's left is to escape the above expression for your string literal and to deal with \d for digits; I see you're using [[:digit:]] instead.
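The pattern and substitution can be tried out in a minimal Python sketch. Python's re module writes group references in the replacement as \g<n> rather than $n, and (since Python 3.5) substitutes an empty string for groups that did not take part in the match. The function name strip_redundant_zeros is hypothetical. Note that, as written, the pattern requires a decimal point, so an input such as 0012 contains no match and is returned unchanged.

```python
import re

# Pattern from the answer: the significant parts are captured, while the
# 0* runs outside the groups are the zeros that get dropped.
STRIP_ZEROS = re.compile(r"([+-]?)0*(\d+)(?:(\.\d*[1-9])|\.)0*(?:([Ee][+-]?)0*(\d+))?")

def strip_redundant_zeros(number: str) -> str:
    # Equivalent of the $1$2$3$4$5 substitution; unmatched groups become ""
    return STRIP_ZEROS.sub(r"\g<1>\g<2>\g<3>\g<4>\g<5>", number)

print(strip_redundant_zeros("0012340.3200E6"))  # 12340.32E6
print(strip_redundant_zeros("-007.500e-003"))   # -7.5e-3
print(strip_redundant_zeros("100.0"))           # 100
```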

Regular expression for matching the sentences with a specific symbol like %

I want a regular expression for matching sentences like the following:
myfunc(L"try my number 8 and value%s ",value);
myfunc(L"but %s is not true",word);
myfunc(L"his name is %s ",name);
and so on.
But I don't want to match sentences without %, like the one below:
myfunc(L"It is raining");
i.e. only sentences containing % should be matched. I tried the following patterns, but they match sentences without % too:
myfunc[(L"(A-Z)*(a-z)*(0-9)*(%)+(a-z)+(A-Z)*(a-z)*(0-9)*(,)+(A-Z)*(a-z)*(0-9))]
myfunc[(%)+]
and
myfunc[(+(%)+)+]
You don't need a regular expression for this... or if you really feel you need to use one, all it needs to be, quite literally, is "%".
Why not try the following instead?
if '%' in myString:
    print("We have a match!")
Edit for DSM's comment (and now that you have actually said that this question has nothing to do with Python): From your updates it looks like you actually want to match the whole thing, i.e. "func(...," with the percent sign in the first argument, which is a string. Try the following regex:
myfunc\(L\".*?%.*?\",[a-zA-Z]*\)
Or, to restrict the other characters in the first parameter string to alphanumerics and spaces, you could try this, which is probably a little more robust than the above:
myfunc\(L\"[a-zA-Z0-9\s]*%[a-zA-Z0-9\s]*\",[a-zA-Z]*\)
This will ensure that the whole string matches the shape of your function call, including the "L" before the string, the "%" in the string, and the alphabetic second argument before the closing parenthesis.
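To check the stricter pattern against the sample lines from the question, here is a minimal Python sketch (the names PATTERN, lines and matching are just for illustration). The \" escapes are dropped because " is not special in Python's regex syntax, and re.search is used so the trailing semicolon after each call does not matter.

```python
import re

# Letters, digits and whitespace around the %, then one alphabetic argument.
PATTERN = re.compile(r'myfunc\(L"[a-zA-Z0-9\s]*%[a-zA-Z0-9\s]*",[a-zA-Z]*\)')

lines = [
    'myfunc(L"try my number 8 and value%s ",value);',
    'myfunc(L"but %s is not true",word);',
    'myfunc(L"his name is %s ",name);',
    'myfunc(L"It is raining");',
]

# Keep only the calls whose string argument contains a %
matching = [line for line in lines if PATTERN.search(line)]
for line in matching:
    print(line)
```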

Parsing variables within a string using a Regular Expression

I've got a bit of a problem with regular expressions in ColdFusion.
I have a string:
Hi my name is {firstname}. and i live in {towncity} my email address is {email}
What I would like to know is how I would go about finding all substrings within my string that are enclosed in a set of {} braces. I would like to put all the matching strings into an array so I can use them with query data.
Also, is this a commonly used pattern for merging variable data into strings?
Any help greatly appreciated.
Simple Answer
To find all the brace-encased strings, you can use rematch and the simple expression \{[^{}]+\}
Explanation
The backslashes \ before each brace are to escape them, and have them act as literal braces (they carry special meaning otherwise).
The [^...] is a negated character class: it matches any single character that is NOT one of those listed inside it, and the greedy + quantifier makes it match as many as possible, but at least one, of the preceding item.
Thus, using [^{}]+ between the braces means it will not match nested or unmatched braces. (By contrast, \{.*?\} could match two opening braces. Note: *? is a lazy quantifier; it matches as few characters as possible, zero if it can, and only as many as are required for the overall match.)
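Python's re.findall returns all non-overlapping matches, much like rematch does in ColdFusion, so the difference between the two patterns can be seen in a short sketch. The string "x {a{b} y" is a made-up example with a stray opening brace.

```python
import re

text = "Hi my name is {firstname}. and i live in {towncity} my email address is {email}"

# Negated class between escaped braces: no { or } allowed inside a match
placeholders = re.findall(r"\{[^{}]+\}", text)
print(placeholders)  # ['{firstname}', '{towncity}', '{email}']

# With a stray brace the two patterns behave differently
tricky = "x {a{b} y"
negated = re.findall(r"\{[^{}]+\}", tricky)  # only the innermost '{b}'
lazy = re.findall(r"\{.*?\}", tricky)        # spans both opening braces
print(negated, lazy)  # ['{b}'] ['{a{b}']
```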
Extended Answer
However, since you say that the results come from a query, a way to only match the values you're dealing with is to use the query's ColumnList to form an expression:
`\{(#ListChangeDelims(QueryName.ColumnList,'|')#)\}`
This changes ColumnList into a pipe-delimited list - a set of alternatives, grouped by the parentheses - i.e. the generated pattern will be like:
\{(firstname|towncity|email)\}
(with the contents of that group going into capture group 1).
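The same idea, building an alternation from a list of column names, can be sketched in Python. The list columns is a hypothetical stand-in for QueryName.ColumnList, and re.escape is an extra precaution in case a name ever contains regex metacharacters.

```python
import re

# Hypothetical column list standing in for QueryName.ColumnList
columns = ["firstname", "towncity", "email"]

# Join the names with "|" (like ListChangeDelims(..., '|')) to form a
# group of alternatives between literal braces.
pattern = re.compile(r"\{(" + "|".join(map(re.escape, columns)) + r")\}")

text = "Hi my name is {firstname}. and i live in {towncity} my email address is {email} {unknown}"

# With one capture group, findall returns just the group contents
names = pattern.findall(text)
print(names)  # ['firstname', 'towncity', 'email']
```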
To actually populate the text (rather than just matching) you could do something similar, except there is no need for a regex here, just a straight replace whilst looping through columns:
<cfloop index="CurColumn" list=#QueryName.ColumnList#>
<cfset text = replace( text , '{#CurColumn#}' , QueryName[CurColumn][CurrentRow] , 'all' ) />
</cfloop>
(Since this is a standard replace, there's no need to escape the braces with backslashes; they have no special meaning here.)
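The loop above can be mirrored in Python with plain str.replace, which likewise needs no escaping. The dictionary row is a hypothetical stand-in for one query row, and its values are invented for illustration.

```python
# Hypothetical query row: column name -> value (values are made up)
row = {"firstname": "John", "towncity": "Leeds", "email": "john@example.com"}

text = "Hi my name is {firstname}. and i live in {towncity} my email address is {email}"

for column, value in row.items():
    # Plain substring replacement: braces are ordinary characters here
    text = text.replace("{" + column + "}", value)

print(text)
```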
Use the reMatch(reg_expression, string_to_search) function.
The details on regular expressions in ColdFusion 10 are here. (I believe regular expressions in CF8 work roughly the same way.)
Use the following code.
<cfset str = "Hi my name is {firstname}. And I live in {towncity} my email address is {email}.">
<cfoutput>Search string: <b>#str#</b><br />Search result:<br /></cfoutput>
<cfset ret = reMatch("\{[\w\s\(\)\+\.@-]+\}", str)>
<cfdump var="#ret#">
This returns an array with the following entries.
{firstname}
{towncity}
{email}
The [] brackets in CF regular expressions define a character set to match a single character. You put + after the brackets to match one or more characters from the character set defined inside the []. For example, to match one or more upper case letters you could write [A-Z]+.
As detailed in the link above, CF defines shortcuts for matching various classes of characters. The ones I used in the code are \w, which matches an alphanumeric character or an underscore, and \s, which matches a whitespace character (space, tab, newline, etc.).
To match any of the special characters +*?.[^$({|\ literally, escape it by writing a backslash \ before it.
An exception to this is the dash - character, which cannot be escaped with a backslash. So, to use it as a literal simply place it at the very end of the character set, like I did above.
For example, using the above regular expression you can extract the placeholders from the following string:
<cfset str = "Hi my name is { John Galt}. And I live in {St. Peters-burg } my email address is {john@exam_ple.com}.">
The result would be an array with the following entries.
{ John Galt}
{St. Peters-burg }
{john@exam_ple.com}
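The character-set approach can be checked in Python, whose re syntax agrees with this pattern. The sketch assumes the set is meant to contain @ so that the e-mail placeholder matches; the trailing - is literal because it is the last character in the set.

```python
import re

# Word characters, whitespace, ( ) + . @ and a trailing literal dash,
# one or more times between literal braces.
PLACEHOLDER = re.compile(r"\{[\w\s\(\)\+\.@-]+\}")

s = ("Hi my name is { John Galt}. And I live in {St. Peters-burg } "
     "my email address is {john@exam_ple.com}.")

found = PLACEHOLDER.findall(s)
for item in found:
    print(item)
```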
There may be much better ways to do this, but using something like rematch( '{.*?}', yourstring ) would give you an array of all the matches.
For future reference, I did this with the excellent RegExr, a really nice online regex checker. Full disclosure, it's not specifically for ColdFusion, but it's a great way to test things out.

How to avoid matching the last letter in this regexp?

I have a question about regexp in Tcl:
first output: TIP_12.3.4 %
second output: TIP_12.3.4 %
and sometimes the output may look like:
first output: TIP_12 %
second output: TIP_12 %
I want to get the number 12.3.4 or 12 using the following regexp:
output: TIP_(/[0-9].*/[0-9])
but why doesn't it match 12.3.4 or 12?
You need to escape the dot; otherwise it means "match any character". Also, I'm not sure about the slashes in your regexp. A better solution:
/TIP_(\d+\.?)+/
Your problem is that / is not special in Tcl's regular expression language at all. It's just an ordinary printable non-letter character. (Other languages are a little different, as it is quite common to enclose regular expressions in / characters; this is not the case in Tcl.) Because it is a simple literal, using it in your RE makes it expect it in the input (despite it not being there); unsurprisingly, that makes the RE not match.
Fixing things: I'd use a regular expression like this: output: TIP_([\d.]+) under the assumption that the data is reasonably well formatted. That would lead to code like this:
regexp {output: TIP_([0-9.]+)} $input -> dottedDigits
Everything not in parentheses is a literal here, so that the code is able to find what to match. Inside the parentheses (the bit we're saving for later) we want one or more digits or periods; putting them inside a square-bracketed-set is perfect and simple. The net effect is to store the 12.3.4 in the variable dottedDigits (if found) and to yield a boolean result that says whether it matched (i.e., you can put it in an if condition usefully).
NB: the regular expression is enclosed in braces because square brackets are also Tcl language metacharacters; putting the RE in braces avoids trouble with misinterpretation of your script. (You could use backslashes instead, but they're ugly…)
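For comparison, here is the same capture in Python's re module (an illustration, not Tcl code): the pattern is identical, re.search plays the role of regexp, and a None result corresponds to regexp returning 0. The list outputs simply holds the sample lines from the question.

```python
import re

outputs = [
    "first output: TIP_12.3.4 %",
    "second output: TIP_12.3.4 %",
    "first output: TIP_12 %",
    "second output: TIP_12 %",
]

# Literal prefix, then a captured run of digits and dots
pattern = re.compile(r"output: TIP_([0-9.]+)")

results = []
for line in outputs:
    m = pattern.search(line)
    # None means no match, like regexp returning 0
    results.append(m.group(1) if m else None)

print(results)  # ['12.3.4', '12.3.4', '12', '12']
```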
Try this:
output: TIP_(/([0-9\.^%]*)/[0-9])
Capture group 1.
Demo here :
http://regexr.com?31f6g
The following expression works for me:
{TIP_((\d+\.?)+)}

Can I shorten this regular expression?

I have the need to check whether strings adhere to a particular ID format.
The format of the ID is as follows:
aBcDe-fghIj-KLmno-pQRsT-uVWxy
Five blocks of five letters, upper or lower case, separated by single dashes.
I have the following regular expression that works:
string idFormat = "[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}";
Note that there is no trailing dash, but all of the blocks within the ID follow the same format. Therefore, I would like to represent the sequence of four blocks with a trailing dash only once inside the regular expression and avoid the duplication.
I tried the following, but it doesn't work:
string idFormat = "[[a-zA-Z]{5}[-]{1}]{4}[a-zA-Z]{5}";
How do I shorten this regular expression and get rid of the duplicated parts?
What is the best way to ensure that each block also does not contain any numbers?
Edit:
Thanks for the replies, I now understand the grouping in regular expressions.
I'm running a few tests against the regular expression, the following are relevant:
Test 1: aBcDe-fghIj-KLmno-pQRsT-uVWxy
Test 2: abcde-fghij-klmno-pqrst-uvwxy
With the following regular expression, both tests pass:
^([a-zA-Z]{5}-){4}[a-zA-Z]{5}$
With the next regular expression, test 1 fails:
^([a-z]{5}-){4}[a-z]{5}$
Several answers have said that it is OK to omit the A-Z when using a-z, but in this case it doesn't seem to be working.
You can try:
([a-z]{5}-){4}[a-z]{5}
and make it case insensitive.
If you can set regex options to be case insensitive, you could replace all [a-zA-Z] with just plain [a-z]. Furthermore, [-]{1} can be written as -.
Your grouping should be done with ( and ), not with [ and ] (although you're correctly using the latter to specify character sets).
Depending on context, you probably want to throw in ^...$ which matches start and end of string, respectively, to verify that the entire string is a match (i.e. that there are no extra characters).
In javascript, something like this:
/^([a-z]{5}-){4}[a-z]{5}$/i
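The /i flag is what lets [a-z] alone accept upper-case letters; without a case-insensitive option (re.IGNORECASE in Python, RegexOptions.IgnoreCase in .NET) the class only covers lower case, which is why test 1 in the question fails. A minimal Python sketch, where is_valid is a hypothetical helper and re.fullmatch does the job of ^...$. Because the class contains only letters, a block with a digit is rejected as well.

```python
import re

# Four "five letters + dash" blocks, then a final block without a dash
BLOCKS = r"([a-z]{5}-){4}[a-z]{5}"

test1 = "aBcDe-fghIj-KLmno-pQRsT-uVWxy"
test2 = "abcde-fghij-klmno-pqrst-uvwxy"

def is_valid(candidate: str, ignore_case: bool) -> bool:
    # fullmatch anchors both ends of the string, like ^...$
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(BLOCKS, candidate, flags) is not None

print(is_valid(test1, True), is_valid(test2, True))     # True True
print(is_valid(test1, False), is_valid(test2, False))   # False True
print(is_valid("aBcDe-fghIj-KLmno-pQRsT-uVWx1", True))  # False
```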
This works for me, though you might want to check it:
[a-zA-Z]{5}(-[a-zA-Z]{5}){4}
(One group of five letters, followed by [dash+group of five letters] four times)
([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}
Try
string idFormat = "([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}";
I.e. you basically replace your brackets with parentheses. Brackets are not meant for grouping but for defining a class of accepted characters.
However, be aware that with shortened versions, you can use the expression for validating the string, but not for analyzing it. If you want to process the 5 groups of characters, you will want to put them in 5 groups:
string idFormat =
"([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})";
so you can address each group and process it.
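The difference between validating and analyzing shows up directly in Python: a quantified group such as (...){4} only keeps its last repetition, while five explicit groups expose every block. The variable names below are illustrative.

```python
import re

ID = "aBcDe-fghIj-KLmno-pQRsT-uVWxy"

# Shortened form: the repeated group remembers only its last repetition
short = re.fullmatch(r"([a-zA-Z]{5}-){4}[a-zA-Z]{5}", ID)
print(short.group(1))  # pQRsT-

# Five explicit groups: every block can be addressed separately
full = re.fullmatch(
    r"([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})", ID
)
print(full.groups())  # ('aBcDe', 'fghIj', 'KLmno', 'pQRsT', 'uVWxy')
```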