How to get text out of a delimited string [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm trying to extract a portion of a delimited string. The string is something like this:
272;#This is the text i want
I'd like to get everything after the "#". Anybody have any suggestions?

TL;DR
Language implementations matter. Not all languages support every regular expression operator or feature. There are some general approaches, though, such as zero-width assertions and capture groups.
Positive Lookbehind
Use a zero-width assertion to find the character preceding your string. For example, to capture just the text of interest using Ruby 2.0:
'272;#This is the text i want'.match /(?<=#).*/
pp $&
#=> "This is the text i want"
Capture Groups
Use capture and non-capture groups to match text, then extract the group you're interested in. For example, to capture your desired match in the first capturing group with Ruby 2.0:
'272;#This is the text i want'.match /(?:#)(.*)/
pp $1
#=> "This is the text i want"
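For comparison, here is a minimal sketch of the same two approaches in Python, assuming the standard `re` module. The variable names are illustrative only and do not come from the answer above.

```python
import re

text = "272;#This is the text i want"

# Lookbehind: the '#' must precede the match but is not part of it,
# so the whole match (group 0) is already the wanted text.
lookbehind = re.search(r"(?<=#).*", text)

# Capture group: the '#' is matched as well, and the wanted text
# is read from group 1.
captured = re.search(r"#(.*)", text)

print(lookbehind.group(0))
print(captured.group(1))
```

Both calls return `None` when the string contains no `#`, so real code would check the result before calling `group`.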

You can use the regex #(.*) and extract the first capturing group. By the way, which language are you doing this in?
edit: if you can't access the capturing groups you can try lookbehind if it's supported by the engine:
(?<=#).*

Consider the following Regex...
(?<=#).*
Good Luck!

Try:
string text = "272;#This is the text i want";
int index = text.IndexOf("#");
string sub = text.Substring(index + 1);
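The same non-regex idea can be sketched in Python with `str.partition`, which splits at the first occurrence of a separator. The names `before`, `sep`, `after` and `result` are illustrative, and returning `None` for a missing `#` is an assumption about the desired behavior.

```python
text = "272;#This is the text i want"

# partition splits at the first '#' and returns (before, separator, after).
before, sep, after = text.partition("#")

# sep is an empty string when no '#' is present; fall back to None then.
result = after if sep else None

print(result)
```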

Related

Extract dynamic string with regex (PowerShell) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 11 days ago.
I have a long output string in PowerShell with all complex characters
this is part of it:
{host-up|rp-web1|/images/logos/Generic_Host.gif|0|276|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden|26186|0|0|1|0 2023/02/06 ....
"service-critical" is a fixed string and appears several more times in the string
"rp-web1|ssl_expiration_bitwarden" - this is a dynamic string which comes right after "service-critical"
I was not able to write a regex that managed to extract all the dynamic strings in the text
Of course I tried to use 3 pipes between the dynamic string but without success
I expect to get all dynamic string after "service-critical" like:
rp-web1|ssl_expiration_bitwarden
Use the .NET [regex]::Matches() method to extract all desired matches of a given regex.
# Sample input string with 2 "service-critical" groups.
$str = @'
{host-up|...|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden1|26186|0|0|1|0 2023/02/06 ....
{host-up|...|0 CRITICAL service-critical|rp-web2|ssl_expiration_bitwarden2|26186|0|0|1|0 2023/02/06 ....
'@
# Define the regex.
$regex = '(?<=\bservice-critical\|)[^|]+\|[^|]+'
# Find and report all matches
# -> 'rp-web1|ssl_expiration_bitwarden1', 'rp-web2|ssl_expiration_bitwarden2'
[regex]::Matches($str, $regex).Value
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Unfortunately, PowerShell's -match operator (with subsequent inspection of the automatic $Matches variable) is not an option in this case, because it only ever looks for one match.
GitHub issue #7867 is a green-lit proposal to introduce a -matchall operator in order to support finding all matches.
You can try this one:
(?:service-critical\|)\K(.*?\|.*?)(?=\|)
Here is a demo
(?:service-critical\|) - matches the literal text service-critical| in a non-capturing group
\K - drops the already matched service-critical| from the reported match, as that is the part you do not want to see in the results
NOTE: as @mklement0 pointed out, \K does not work in the .NET regex engine (which PowerShell uses). In that case you can drop it and accept that the match includes the service-critical| prefix, or use the positive lookbehind shown in @mklement0's answer
(.*?\|.*?) - captures two runs of any characters (possibly empty), separated by a single |, matched non-greedily
(?=\|) - asserts that the match is followed by | (this is called a positive lookahead)
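As a cross-check outside PowerShell, the lookbehind pattern from the first answer can be tried in Python. This is a sketch that assumes the two-line sample input used above; `re.findall` plays the role of `[regex]::Matches()` here, and the variable names are illustrative.

```python
import re

text = (
    "{host-up|...|0 CRITICAL service-critical|rp-web1|ssl_expiration_bitwarden1|26186|0|0|1|0 2023/02/06 ....\n"
    "{host-up|...|0 CRITICAL service-critical|rp-web2|ssl_expiration_bitwarden2|26186|0|0|1|0 2023/02/06 ....\n"
)

# Fixed-width lookbehind for 'service-critical|', then two pipe-free
# fields separated by a single literal pipe.
pattern = re.compile(r"(?<=\bservice-critical\|)[^|]+\|[^|]+")

# findall returns every non-overlapping match, not just the first one.
matches = pattern.findall(text)
print(matches)
```

Python's `re` does not support `\K` either, so the lookbehind form is the one that carries over.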

$1 through $9 only in perl, can we go further? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
I need to match 12 numbers in a sequence, but I seem to be limited to the 9th one. Is there any way to go beyond 9 matches?
my string is something like
{"Column5": "Null", "Column4": "Null", "Column6": "Null", "Column1": "END", "Column3": "Null", "Column2": "Null"}
where columns are fixed but in place of Null there can be any sequence/characters.
I tried matching the columns and the strings that follow them, but I have 12 matches and can only refer to them up to $9.
Any suggestions ?
You can easily put your matches into an actual array rather than relying on $1 and friends:
my @matches = $some_string =~ /(some) (regex) with (m)(a)(n)(y) (c)(a)(p)(t)(u)r(e)(s)/;
Or, as suggested in a comment, use a JSON parser if you're parsing JSON data. It will be more reliable than a quick regex-based solution.
Please use Dave Sherohman's suggestion about using a JSON parser, or at least use an actual array to store the matches.
Perl imposes no hard limit on the number of captures (or the limit is so high that no reasonable script would run into it). The code in this answer, and even the script in the question, shows that you can refer to the text matched by capturing groups beyond 9 as usual, i.e. group 10 with $10, group 100 with $100.
(In case anyone is confused, $1, $10, ... are variables used outside the regex to refer to content of the capturing group. It's not syntax for backreference (e.g. \1, \10, ... or \g{1}, \g{10}, ...), which is used in the regex to match the same text captured by the capturing groups).
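The points made in these answers (use a JSON parser, collect matches in a list, and groups past 9 are ordinary) can be sketched in Python as follows. This is an illustration only; the twelve-letter input and all variable names are made up for the example.

```python
import json
import re

raw = ('{"Column5": "Null", "Column4": "Null", "Column6": "Null", '
       '"Column1": "END", "Column3": "Null", "Column2": "Null"}')

# Preferred route: parse the JSON and look values up by key.
record = json.loads(raw)

# Regex route: collect every (name, value) pair in a list instead of
# relying on individually numbered groups.
pairs = re.findall(r'"(Column\d+)": "([^"]*)"', raw)

# Numbered groups past 9 behave like any other group.
m = re.match(r"(\w)" * 12, "abcdefghijkl")

print(record["Column1"], len(pairs), m.group(10), m.group(12))
```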

Match everything in a string after the 3rd '/' [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have two different forms of a string:
https://anexample.com/things/stuff
and
https:///things/stuff
I need a regular expression that will match everything in the string after the 3rd slash, no matter the syntax. Once it hits the 3rd slash, the rest of the string is matched. I have found a bunch of examples, but I can't seem to tweak the right way to get it to work. Thanks in advance.
You can use this
^[^/]*/[^/]*/[^/]*/(.*)$
You can use this regex:
^(?:[^\/]*\/){3}(.*)$
And use matched group #1
In javascript:
var s = 'https:///things/stuff';
m = s.match(/^(?:[^\/]*\/){3}(.*)$/);
// m[1] => things/stuff
Assuming PCRE, and that you won't have newlines in your string:
If the 3 slashes can be at any position (like your first example):
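^[^/]*/[^/]*/[^/]*/(.*)$
This could also be expressed as
^(?:[^/]*/){3}(.*)$
Using positive lookbehind, you could use the following, which matches only the part you want instead of putting it into a capturing group. Note that this lookbehind is variable-length, which PCRE rejects, so it needs an engine that allows it (such as .NET):
(?<=^(?:[^/]*/){3}).*$
In PCRE itself, \K gives the same effect:
^(?:[^/]*/){3}\K.*$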
Any escaping needed because of the delimiters you use is left as an exercise to the reader, of course (if you use / as a delimiter, you have to escape every / in the expression, like \/).
And there's probably a million other alternatives, depending on what exact needs you have besides the ones you mentioned.
Something like this should work, though I'm writing it without any testing. It looks for three sections of any characters, each followed by a slash, and then captures the last section, which is everything up to the end of the line. You can of course change the delimiter to whitespace or whatever.
^(?:.*?/){3}(.*)$
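To check the capture-group approach against both strings from the question, here is a small Python sketch. The `split` variant is an alternative that is not mentioned in the answers above, and the variable names are illustrative.

```python
import re

urls = [
    "https://anexample.com/things/stuff",
    "https:///things/stuff",
]

# Three repetitions of "anything but a slash, then a slash",
# followed by a group that captures the rest of the string.
pattern = re.compile(r"^(?:[^/]*/){3}(.*)$")
tails = [pattern.match(u).group(1) for u in urls]

# Non-regex alternative: split at most three times and keep the remainder.
split_tails = [u.split("/", 3)[3] for u in urls]

print(tails)
print(split_tails)
```

Both variants assume the string really contains at least three slashes; otherwise `match` returns `None` and the index lookup raises `IndexError`.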

Regex expression to replace word before search pattern [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
I am very confused how to replace a word before pattern ".ext".
example :
Before Replace : abcd.ext.com
After Replace : customer.ext.com
You can use something like [^.]+(?=\.ext) as the match and replace it with customer.
(?=\.ext) is a positive lookahead: it succeeds only when .ext follows the part before it, but it does not match any characters on its own. Looking ahead for just a dot would also match ext (which is followed by .com) and replace it too.
E.g. in C# you can use
Regex.Replace(foo, @"[^.]+(?=\.ext)", "customer");
If you're doing this in C# then I would recommend you just doing something like this:
var newFileName = fileName.Replace(Path.GetFileName(fileName), "newFileNameValue");
If it's in VB.NET, it would look almost exactly the same:
Dim newFileName As String = fileName.Replace(Path.GetFileName(fileName), "newFileNameValue")
You can use a Regex, but it's probably a little overkill and less stable. See, when building a Regex you have to break it down to a really abstract level. You need to handle every extension that's in your domain and that list can grow pretty quickly. So then it's generally not feasible to include those extensions in the Regex itself.
To further add to the problem, a valid file name might be something like this, MyFile.v1.l1.ext1.txt. The extension of that file is .txt, but grabbing that with a Regex is tough.
On Unix you can use sed like this:
echo "$str"|sed 's/abcd\(\.ext\)/customer\1/'
i.e. look for abcd immediately followed by .ext (captured in a group), then replace the match with customer followed by group #1 (.ext).
If you're using any other platform or language, the approach should be the same.
In Perl:
$x =~ s/(.*)(\.ext\.com)/customer$2/;
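The lookahead idea from the first answer can be sketched in Python as well. The helper name `replace_before_ext` and its default argument are hypothetical and chosen only for this illustration.

```python
import re


def replace_before_ext(name, replacement="customer"):
    """Replace the dot-free word that directly precedes '.ext'."""
    # The lookahead checks for '.ext' without consuming it, so only
    # the word in front of it is replaced. count=1 limits the change
    # to the first such word.
    return re.sub(r"[^.]+(?=\.ext)", replacement, name, count=1)


print(replace_before_ext("abcd.ext.com"))
```

A string without `.ext` is returned unchanged, because the pattern never matches.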

Helping users write regular expressions [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I am working on a project where there are formatting rules for certain text fields.
Users are able to edit these rules. Currently, the rules are regular expressions. The users see regular expressions as very intimidating and would like an easier more user friendly way to write them.
I have in mind some simple translation tool... where users could enter # for a digit... X for a letter, etc.. But I know that the day will come when they need more than letters and digits.
I wonder if a simple translation tool exists or if there is a better way to do this?
Thank you for reading, all suggestions and ideas are welcome.
Split string into single characters and apply
Replace each X with [a-zA-Z]
Replace each # with \d
Replace each ^ with \^
All others replace with [*], where * is the character you are replacing
Join all patterns into one final regex pattern
If you want to apply regex pattern to the entire string, add ^ at the beginning and $ at the end.
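The steps above can be sketched in Python as follows. One deliberate difference: instead of wrapping every other character in [ ], this sketch uses `re.escape`, which also handles characters such as `\` and `]` that are awkward inside a character class. The names `PLACEHOLDERS` and `template_to_regex` are made up for the example.

```python
import re

# Placeholder characters and the regex fragments they expand to.
PLACEHOLDERS = {"X": "[a-zA-Z]", "#": r"\d"}


def template_to_regex(template, anchored=True):
    """Translate a user template such as 'XX-##' into a regex string."""
    # Handle the template one character at a time: known placeholders
    # are expanded, everything else is escaped to match literally.
    parts = [PLACEHOLDERS.get(ch, re.escape(ch)) for ch in template]
    body = "".join(parts)
    # Anchoring makes the pattern apply to the entire string.
    return f"^{body}$" if anchored else body


pattern = re.compile(template_to_regex("XX-##"))
print(bool(pattern.match("AB-12")), bool(pattern.match("A1-12")))
```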
It should be fairly straightforward to create a tool such as you are proposing. Simply expand your placeholders into corresponding regex values:
var input= "XX##*"
var regex = input.replace('#', '[0-9]')
regex = regex.replace('X', '[a-zA-Z]')
regex = regex.replace('*', '.*')
# etc
Of course, you will have to define your placeholders and any other options you want to provide your users. You could also get creative and allow power users to enter a regex directly by surrounding input with / (or some other identifier).
var input= '/\d\d\w\w.*/'
var regex = undefined;
if (input.matches('^/(.*)/$')) {
regex = $1 (group 1 from regex)
} else {
regex = input.replace('#', '[0-9]')
regex = regex.replace('X', '[a-zA-Z]')
regex = regex.replace('*', '.*')
}
Of course, this is an entirely made-up language to demonstrate the solution... If you find a way to compile it, I would be most interested. :)
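For anyone who does want something that runs, here is one possible Python rendering of the idea, offered as a sketch rather than a finished tool. The names `PLACEHOLDERS` and `rule_to_regex` are invented for this example, and escaping the non-placeholder characters is an added assumption that the pseudocode does not spell out.

```python
import re

# Placeholder characters taken from the pseudocode above.
PLACEHOLDERS = {"#": "[0-9]", "X": "[a-zA-Z]", "*": ".*"}


def rule_to_regex(rule):
    """Return a regex string for a simple rule or a /raw regex/."""
    # Power users: anything wrapped in slashes is passed through as-is.
    raw = re.fullmatch(r"/(.*)/", rule)
    if raw:
        return raw.group(1)
    # Everyone else: expand placeholders in a single pass, so text
    # produced by one expansion is never expanded again.
    return "".join(PLACEHOLDERS.get(ch, re.escape(ch)) for ch in rule)


print(rule_to_regex("XX##*"))
print(rule_to_regex(r"/\d\d\w\w.*/"))
```

Translating character by character sidesteps an ordering problem with chained replacements, where a later replacement could rewrite characters introduced by an earlier one.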