Helping users write regular expressions [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I am working on a project where there are formatting rules for certain text fields.
Users can edit these rules, which are currently regular expressions. The users find regular expressions very intimidating and would like an easier, more user-friendly way to write them.
I have in mind a simple translation tool where users could enter # for a digit, X for a letter, and so on. But I know the day will come when they need more than letters and digits.
I wonder if a simple translation tool exists or if there is a better way to do this?
Thank you for reading, all suggestions and ideas are welcome.

Split the string into single characters and apply these rules to each one:
Replace each X with [a-zA-Z]
Replace each # with \d
Replace each ^ with \^
Replace every other character with [*], where * is the character itself (a backslash, and in some engines ], has to be escaped with a backslash instead)
Join all the resulting patterns into one final regex pattern
If you want the pattern to match the entire string, add ^ at the beginning and $ at the end.
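The steps above can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: the placeholder table and the name maskToRegExp are hypothetical, and literal characters are backslash-escaped rather than wrapped in [ ], which sidesteps the special cases inside a character class.

```javascript
// Hypothetical placeholder table: X = one letter, # = one digit.
const PLACEHOLDERS = new Map([
  ["X", "[a-zA-Z]"],
  ["#", "\\d"],
]);

// Translate a user-facing mask into an anchored RegExp, one character at a time.
function maskToRegExp(mask) {
  const body = Array.from(mask, (ch) => {
    const expansion = PLACEHOLDERS.get(ch);
    if (expansion !== undefined) return expansion;
    // Any other character must match literally, so escape regex metacharacters.
    return ch.replace(/[.*+?^${}()|[\]\\]/, "\\$&");
  }).join("");
  // Anchor the pattern so it has to match the entire field value.
  return new RegExp("^" + body + "$");
}

console.log(maskToRegExp("XX-##").test("AB-12")); // true
console.log(maskToRegExp("XX-##").test("A1-12")); // false
```

Adding a new placeholder then only means adding one entry to the table.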

It should be fairly straightforward to create a tool such as you are proposing. Simply expand your placeholders into corresponding regex values:
var input= "XX##*"
var regex = input.replace('#', '[0-9]')
regex = regex.replace('X', '[a-zA-Z]')
regex = regex.replace('*', '.*')
# etc
Of course, you will have to define your placeholders and any other options you want to provide your users. You could also get creative and allow power users to enter a regex directly by surrounding input with / (or some other identifier).
var input= '/\d\d\w\w.*/'
var regex = undefined;
if (input.matches('^/(.*)/$')) {
regex = $1 (group 1 from regex)
} else {
regex = input.replace('#', '[0-9]')
regex = regex.replace('X', '[a-zA-Z]')
regex = regex.replace('*', '.*')
}
Of course, this is entirely a made-up language to demonstrate the solution... If you find a way to compile it, I would be most interested. :)
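For what it's worth, the same idea can be written as real JavaScript. The sketch below is not the answer's own code: compileRule and the EXPANSIONS table are hypothetical names, and anchoring the expanded pattern with ^ and $ is an assumption about how the rules are applied. Expanding every character in a single pass avoids re-substituting text that an earlier replacement produced.

```javascript
// Hypothetical placeholder table, mirroring the pseudocode above.
const EXPANSIONS = new Map([
  ["#", "[0-9]"],
  ["X", "[a-zA-Z]"],
  ["*", ".*"],
]);

function compileRule(input) {
  // Power users: input wrapped in /.../ is taken as a raw regular expression.
  const raw = /^\/(.*)\/$/.exec(input);
  if (raw) return new RegExp(raw[1]);

  // Everyone else: expand placeholders and escape all other characters.
  const body = input.replace(/[\s\S]/g, (ch) =>
    EXPANSIONS.get(ch) ?? ch.replace(/[.*+?^${}()|[\]\\]/, "\\$&")
  );
  return new RegExp("^" + body + "$"); // assumption: rules match the whole value
}

console.log(compileRule("XX##*").test("ab12anything")); // true
console.log(compileRule("/\\d\\d\\w\\w.*/").test("12ab")); // true
```

Note that new RegExp throws a SyntaxError for an invalid raw pattern, so a real tool would want to catch that and report it to the user.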

Related

Extract dynamic string with regex (PowerShell) [closed]

I have a long output string in PowerShell that contains many special characters.
This is part of it:
{host-up|rp-web1|/images/logos/Generic_Host.gif|0|276|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden|26186|0|0|1|0 2023/02/06 ....
"service-critical" is a fixed string and appears several more times in the string
"rp-web1|ssl_expiration_bitwarden" - this is a dynamic string which comes right after "service-critical"
I was not able to write a regex that extracts all the dynamic strings in the text.
Of course, I tried to use the 3 pipes around the dynamic string, but without success.
I expect to get every dynamic string that follows "service-critical", like:
rp-web1|ssl_expiration_bitwarden
Use the .NET [regex]::Matches() method to extract all desired matches of a given regex.
# Sample input string with 2 "service-critical" groups.
$str = @'
{host-up|...|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden1|26186|0|0|1|0 2023/02/06 ....
{host-up|...|0 CRITICAL service-critical|rp-web2|ssl_expiration_bitwarden2|26186|0|0|1|0 2023/02/06 ....
'@
# Define the regex.
$regex = '(?<=\bservice-critical\|)[^|]+\|[^|]+'
# Find and report all matches
# -> 'rp-web1|ssl_expiration_bitwarden1', 'rp-web2|ssl_expiration_bitwarden2'
[regex]::Matches($str, $regex).Value
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Unfortunately, use of PowerShell's -match operator (with subsequent inspection of the automatic $Matches variable) is not an option in this case, because it invariably looks for only one match.
GitHub issue #7867 is a green-lit proposal to introduce a -matchall operator in order to support finding all matches.
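The same lookbehind pattern can also be tried outside PowerShell. Here is a minimal JavaScript sketch, assuming an engine with lookbehind support (for example a current Node.js); the sample text is the two-line input from above with the trailing dots left out, and matchAll plays the role of [regex]::Matches().

```javascript
// Sample input with two "service-critical" groups (shortened from the question).
const sample = [
  "{host-up|...|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden1|26186|0|0|1|0 2023/02/06",
  "{host-up|...|0 CRITICAL service-critical|rp-web2|ssl_expiration_bitwarden2|26186|0|0|1|0 2023/02/06",
].join("\n");

// Lookbehind: match two pipe-separated fields that directly follow "service-critical|".
const pattern = /(?<=\bservice-critical\|)[^|]+\|[^|]+/g;

// matchAll yields every match; m[0] is the full matched text.
const found = Array.from(sample.matchAll(pattern), (m) => m[0]);
console.log(found);
// [ 'rp-web1|ssl_expiration_bitwarden1', 'rp-web2|ssl_expiration_bitwarden2' ]
```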
You can try this one:
(?:service-critical\|)\K(.*?\|.*?)(?=\|)
Here is a demo
(?:service-critical\|) - looks for the service-critical| phrase, but without capturing it
\K - drops the already matched service-critical| from the result, as that is the part you do not want to see in the results
NOTE: as @mklement0 pointed out, \K won't work in the .NET regex engine (which is used by PowerShell). You can skip it in this case and get a match that includes the service-critical| phrase, or use the positive lookbehind shown in @mklement0's answer
(.*?\|.*?) - captures a group of any characters (or no characters at all), separated by a single |, in a non-greedy way
(?=\|) - asserts that it is followed by | (this is called a positive lookahead)
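In an engine without \K, the wanted text can be read from the capture group instead of the whole match. A small JavaScript sketch of that variant (JavaScript has no \K either; the sample line is shortened from the question):

```javascript
// Shortened sample line from the question.
const line =
  "0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden|26186|0|0|1|0";

// Same pattern without \K: the prefix is part of the match,
// but group 1 holds only the two fields we are after.
const withoutK = /service-critical\|(.*?\|.*?)(?=\|)/g;

const names = Array.from(line.matchAll(withoutK), (m) => m[1]);
console.log(names); // [ 'rp-web1|ssl_expiration_bitwarden' ]
```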

How to get text out of a delimited string [closed]

I'm trying to extract a portion of a delimited string. The string is something like this:
272;#This is the text i want
I'd like to get everything after the "#". Anybody have any suggestions?
TL;DR
Language implementations matter. Not all languages support every regular expression operator or feature. There are some general approaches, though, such as zero-width assertions and capture groups.
Positive Lookbehind
Use a zero-width assertion to find the character preceding your string. For example, to capture just the text of interest using Ruby 2.0:
'272;#This is the text i want'.match /(?<=#).*/
pp $&
#=> "This is the text i want"
Capture Groups
Use capture and non-capture groups to match text, then extract the group you're interested in. For example, to capture your desired match in the first capturing group with Ruby 2.0:
'272;#This is the text i want'.match /(?:#)(.*)/
pp $1
#=> "This is the text i want"
You can use the regex #(.*) and extract the first capturing group - btw, what language are you trying to do this in?
edit: if you can't access the capturing groups you can try lookbehind if it's supported by the engine:
(?<=#).*
Consider the following Regex...
(?<=#).*
Good Luck!
Try:
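// Take everything after the first "#". If "#" is absent, IndexOf returns -1
// and Substring(0) returns the whole string.
string text = "272;#This is the text i want";
int index = text.IndexOf("#");
string sub = text.Substring(index + 1);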
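For comparison, here are the plain string search and the capture-group approach side by side in JavaScript. This is just a sketch; returning an empty string when there is no # is an assumption about the desired behaviour.

```javascript
const field = "272;#This is the text i want";

// Plain string search: everything after the first "#".
const hashIndex = field.indexOf("#");
const viaIndex = hashIndex === -1 ? "" : field.slice(hashIndex + 1);

// Capture group: group 1 is whatever follows the first "#".
const m = /#(.*)/.exec(field);
const viaGroup = m ? m[1] : "";

console.log(viaIndex); // This is the text i want
console.log(viaGroup); // This is the text i want
```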

Match everything in a string after the 3rd '/' [closed]

I have two different forms of a string:
https://anexample.com/things/stuff
and
https:///things/stuff
I need a regular expression that will match everything in the string after the 3rd slash, no matter the syntax. Once it hits the 3rd slash, the rest of the string is matched. I have found a bunch of examples, but I can't seem to tweak the right way to get it to work. Thanks in advance.
You can use this
^[^/]*/[^/]*/[^/]*/(.*)$
You can use this regex:
^(?:[^\/]*\/){3}(.*)$
And use matched group #1
In javascript:
var s = 'https:///things/stuff';
var m = s.match(/^(?:[^\/]*\/){3}(.*)$/);
// m[1] => things/stuff
Assuming PCRE, and that you won't have newlines in your string:
If the 3 slashes can be at any position (like your first example):
^[^/]*/[^/]*/[^/]*/(.*)$
This could also be expressed as
^(?:[^/]*/){3}(.*)$
Using positive lookbehind, you could use the following, which matches only what you want instead of putting it into a capturing group. Note that this lookbehind is variable-length, which PCRE itself rejects, so it needs an engine that allows it (such as .NET or JavaScript); in PCRE, ^(?:[^/]*/){3}\K.*$ has the same effect:
(?<=^(?:[^/]*/){3}).*$
Any escaping needed because of the delimiters you use is left as an exercise to the reader, of course (if you use / as a delimiter, you have to escape every / in the expression as \/).
And there's probably a million other alternatives, depending on what exact needs you have besides the ones you mentioned.
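To check the capture-group and lookbehind variants against both inputs from the question, a quick JavaScript sketch (assuming an engine with lookbehind support, such as a current Node.js; the function name afterThirdSlash is made up for the example):

```javascript
const urls = ["https://anexample.com/things/stuff", "https:///things/stuff"];

// Variant 1: skip three "segment + slash" groups, capture the rest in group 1.
const capture = /^(?:[^\/]*\/){3}(.*)$/;
// Variant 2: variable-length lookbehind, so the match itself is the rest.
const lookbehind = /(?<=^(?:[^\/]*\/){3}).*$/;

function afterThirdSlash(url) {
  const m = capture.exec(url);
  return m ? m[1] : null; // null when there are fewer than three slashes
}

for (const url of urls) {
  const viaLookbehind = lookbehind.exec(url);
  console.log(afterThirdSlash(url), viaLookbehind && viaLookbehind[0]);
}
// things/stuff things/stuff
// things/stuff things/stuff
```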
Something like this should work, although I'm writing it without any testing. It looks for three sections of any characters, each followed by a slash, and then captures the last section, which is everything until the end of the line. You can of course change the delimiter to whitespace or whatever.
^(?:.*?/){3}(.*)$

Regex expression to replace word before search pattern [closed]

I am very confused about how to replace the word before the pattern ".ext".
Example:
Before replace: abcd.ext.com
After replace: customer.ext.com
You can use something like [^.]+(?=\.ext) as the match and replace it with customer.
(?=\.ext) is a positive lookahead: it succeeds only when .ext follows the preceding part, but it does not consume any characters itself. (A bare (?=\.) would also match ext, because ext is followed by .com.)
E.g. in C# you can use
Regex.Replace(foo, @"[^.]+(?=\.ext)", "customer");
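A short JavaScript sketch of the lookahead replacement, which also shows why the lookahead should name .ext rather than just any dot. The variable names are only for illustration.

```javascript
const host = "abcd.ext.com";

// Lookahead tied to ".ext": only the label right before ".ext" is replaced.
const replaced = host.replace(/[^.]+(?=\.ext)/, "customer");
console.log(replaced); // customer.ext.com

// Lookahead on any dot, applied globally: "ext" is followed by ".com",
// so it gets replaced as well.
const tooBroad = host.replace(/[^.]+(?=\.)/g, "customer");
console.log(tooBroad); // customer.customer.com
```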
If you're doing this in C#, then I would recommend just doing something like this:
var newFileName = fileName.Replace(Path.GetFileName(fileName), "newFileNameValue");
If it's in VB.NET, it would look almost exactly the same:
Dim newFileName As String = fileName.Replace(Path.GetFileName(fileName), "newFileNameValue")
You can use a Regex, but it's probably a little overkill and less stable. See, when building a Regex you have to break it down to a really abstract level. You need to handle every extension that's in your domain and that list can grow pretty quickly. So then it's generally not feasible to include those extensions in the Regex itself.
To further add to the problem, a valid file name might be something like this, MyFile.v1.l1.ext1.txt. The extension of that file is .txt, but grabbing that with a Regex is tough.
On Unix you can use sed like this:
echo "$str"|sed 's/abcd\(\.ext\)/customer\1/'
i.e. look for abcd immediately followed by .ext (captured in a group), then replace the match with customer followed by group #1 (.ext).
If you're using any other platform or language, the approach should be the same.
perl
$x =~ s/(.*)(\.ext\.com)/customer$2/;

How to write this Regular Expression for this situation? [closed]

I want to split the input String on a colon. For example, a:int. I can use [^:]* to get the a and int.
However, I don't want the String to be split on a colon that is part of a larger combination, such as in A:=3:command. What I want is A:=3 and command, not A, =3 and command.
Could someone tell me how to write the regular expression?
I'm going to assume, pending an edit by the OP, that the only colons that should appear in a split are those followed by simple ASCII letters or numbers. The solution can easily be generalized.
Here is a concrete example in JavaScript:
s = "x:=3:comment"
s.split(/:(?=[\s\w])/)
The result is
['x:=3','comment']
The split function says "split on colons that are followed by spaces or word characters (ASCII letters or numbers or underscores)".
Other languages have more powerful forms of lookaround (in particular negative lookarounds), but the basic idea is to construct a regex where the split value is a colon in a particular context.
ADDENDUM
Another example:
"this:has:(some%: 7colons:$:6)".split(/:(?=[\s\w])/)
produces:
['this','has:(some%',' 7colons:$','6)']
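A negative lookahead gives a slightly different rule that may also fit: split on every colon that is not the start of :=. This sketch assumes := is the only operator to protect, which the question does not confirm.

```javascript
// Split on ":" unless it is immediately followed by "=".
const notAssign = /:(?!=)/;

console.log("x:=3:comment".split(notAssign)); // [ 'x:=3', 'comment' ]
console.log("a:int".split(notAssign)); // [ 'a', 'int' ]
```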
On the face of it, you want to split on the last colon in the string, so you want the trailing material to be a string of non-colons, and the preceding material to be anything. You also didn't specify (at the time I answered the question) which sub-species of regex you want (which language you are writing in), so you get Perl for my answer.
#!/usr/bin/env perl
use strict;
use warnings;
my @array = ( "a:int", "A:=3:comment" );
foreach my $item (@array)
{
my($prefix, $suffix) = $item =~ m/^(.*):([^:]+)$/;
print "$prefix and $suffix\n";
}
The output from that script is:
a and int
A:=3 and comment
Clearly, if the rule for the split is different (it isn't simply 'the last colon'), then the pattern will have to change. But this achieves the stated requirements reasonably cleanly.
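The same "split on the last colon" pattern carries over to other regex flavours unchanged. A JavaScript sketch for comparison (the function name splitOnLastColon is made up, and returning null for input without a usable colon is an assumption):

```javascript
// Greedy (.*) runs as far as possible, so the colon that separates
// prefix and suffix is the last one in the string.
function splitOnLastColon(item) {
  const m = /^(.*):([^:]+)$/.exec(item);
  return m ? [m[1], m[2]] : null;
}

for (const item of ["a:int", "A:=3:comment"]) {
  const [prefix, suffix] = splitOnLastColon(item);
  console.log(`${prefix} and ${suffix}`);
}
// a and int
// A:=3 and comment
```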
In addition to Ray's answer, another option is to white-list the operators you support, for example, to support := (JavaScript example):
var s = "hello:world:=5:and:r";
var tokens = s.match(/(?:[^:]|:=)+/g);
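For example, if you want the operators :=, =:, :=: and ::, you could write the following. The longer operators are listed first and [^:] last, because alternation takes the first alternative that matches at each position; otherwise := would win over :=: and [^:] would consume the = of =:.
/(?::=:|:=|=:|::|[^:])+/g
(this can be simplified, but I think the explicit list is easier to maintain).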
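The order of the alternatives matters in a backtracking engine, which a quick JavaScript experiment makes visible. The input strings here are made up for the demonstration.

```javascript
// [^:] first and := before :=: means the longer operators never get a chance.
const shortestFirst = /(?:[^:]|:=|=:|:=:|::)+/g;
// Longest operators first, single non-colon characters last.
const longestFirst = /(?::=:|:=|=:|::|[^:])+/g;

console.log("a:=:b:c".match(shortestFirst)); // [ 'a:=', 'b', 'c' ]
console.log("a:=:b:c".match(longestFirst)); // [ 'a:=:b', 'c' ]

// The simple := case tokenizes the same way with either ordering.
console.log("hello:world:=5:and:r".match(longestFirst));
// [ 'hello', 'world:=5', 'and', 'r' ]
```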