I am having trouble writing a regex to pull the following value out of the text below:
23d63443-47d5-4b19-9fce-5a0b151526a0
The output will always look like the line below, but the value I'm looking to match (above) varies slightly.
"C:\Program Files\ScreenConnect Client (bd5ecacad274bdc6)\Elsinore.ScreenConnect.ClientService.exe" "?e=Access&y=Guest&h=screenconnect.com&p=8041&s=23d63443-47d5-4b19-9fce-5a0b151526a0&k=BgIAAACkAABSU0ExAAgAAAEAAQDvDCdQGcu%2fuKP5cPvdclGMBYhhdI0zIC3oNwkJnNmUCbrd%2bAgugzNThBGHoR8mu30zR6nYVJbqYrtjMgxvhC7b2MJptUanf5mLh%2fMpmdQE1rGMtTqCWDH%2fpXQa4DN5QUbz66UcJ%2bdpCQ5TUax8oSw%2fX1I2x1llgax4jCk%2fWc6%2fpcj3JQIODej0z85X%2f1LJhELki2eNcD1QMMN0t%2fR7GZICw7HlL%2ftqOnZyF%2fnr9d62LQQ37n4L5Ra9S5VDk1B9V8umOx9aTkeXuhcRE88e6uGXkuNSQfXjqaAlwSV1xNkPJA8aJvS%2bkkMSNCWfi5chKhGyU4CXaldWPDcsPpA05XKw&t=&c=&c=&c=&c=&c=&c=&c=&c="
How can I achieve this?
You can use the following regex to capture what you want:
&s=([^&]+)&
&s= matches the literal characters &s=. [^&]+ (a negated character class) matches one or more characters other than &. It is enclosed in a pair of parentheses (a capturing group), so the matched text is captured to group 1 (it's the first pair of parentheses in the regex). The final & matches the & that starts the next parameter.
Group 1 will contain the string you're looking for.
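For illustration, here is a minimal Java sketch (java.util.regex) of reading group 1; the class and method names are hypothetical. It uses find() rather than matches() because the pattern only has to occur somewhere in the line, and the k= value in the sample is shortened.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionIdExtractor {
    // Literal "&s=", then group 1 = one or more characters other than '&', then the next '&'
    private static final Pattern SESSION_ID = Pattern.compile("&s=([^&]+)&");

    // Returns the text captured by group 1, or null when the pattern does not occur
    static String extract(String text) {
        Matcher m = SESSION_ID.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Query part of the command line from the question; the k= value is shortened here
        String sample = "?e=Access&y=Guest&h=screenconnect.com&p=8041"
                + "&s=23d63443-47d5-4b19-9fce-5a0b151526a0&k=BgIAAACkAABSU0ExAAgAAAEAAQ&t=&c=&c=";
        System.out.println(extract(sample)); // prints 23d63443-47d5-4b19-9fce-5a0b151526a0
    }
}
```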
Related
I have strings which look like this:
/xxxxx/xxxxx-xxxx-xxxx-338200.html
With my regex:
(?<=-)(\d+)(?=\.html)
It matches just the numbers before .html.
Is it possible to write a regex that matches everything that surrounds the numbers (matches the .html part and the part before the numbers)?
In your current pattern you already use a capturing group, so you can also match what comes before and after the digits instead of using lookarounds:
-(\d+)\.html
To get what comes before and after the digits, you could use 2 capturing groups:
^(.*-)\d+(\.html)$
In the replacement, refer to the 2 groups (written $1 and $2 in many engines).
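A minimal Java sketch of using the 2 groups, assuming the goal is to put a different number between them; the class name, the method name and the replacement value 42 are hypothetical. With String.replaceAll the groups would be written $1 and $2, but appending a digit straight after $1 makes the reference hard to read, so this sketch concatenates group(1) and group(2) instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlNumberReplacer {
    // Group 1: everything up to and including the '-' before the digits; group 2: ".html"
    private static final Pattern PATH = Pattern.compile("^(.*-)\\d+(\\.html)$");

    // Puts newNumber between the two groups; returns the input unchanged if it does not match
    static String replaceNumber(String path, String newNumber) {
        Matcher m = PATH.matcher(path);
        if (!m.matches()) {
            return path;
        }
        return m.group(1) + newNumber + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(replaceNumber("/xxxxx/xxxxx-xxxx-xxxx-338200.html", "42"));
        // prints /xxxxx/xxxxx-xxxx-xxxx-42.html
    }
}
```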
This should do the job:
.*-\d+\.html
Explanation: .* matches anything, and -\d+ says it must be followed by a - and a sequence of digits, which in turn must come before \.html (where \. matches a literal . character).
To capture the parts, use (.*-)(\d+)(\.html). This puts everything before the number in group 1, the number in group 2, and the .html suffix in group 3.
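A short Java sketch of the three-group version; the class and method names are hypothetical, and matches() is used so that the pattern has to cover the whole path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlPathSplitter {
    // Group 1: text before the number (ending in '-'), group 2: the digits, group 3: ".html"
    private static final Pattern PARTS = Pattern.compile("(.*-)(\\d+)(\\.html)");

    // Returns {before, number, after}, or null if the whole string does not match
    static String[] split(String path) {
        Matcher m = PARTS.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("/xxxxx/xxxxx-xxxx-xxxx-338200.html");
        System.out.println(String.join(" | ", parts));
        // prints /xxxxx/xxxxx-xxxx-xxxx- | 338200 | .html
    }
}
```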
I have this regex pattern, which I wrote myself (I'm a noob, though, and put it together by following tutorials):
^([a-z0-9\p{Greek}].*)\s(Ε[0-9\p{Greek}]+|Θ)\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+$)
And I'm trying to match the following sentences:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ
ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ Θ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ
And so on.
This pattern splits the string into 4 parts.
For example, for the string:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
The first match is: ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ (Subject's Name)
Second match is: Ε2 (Class)
Third match is: Ε.Β.Δ. (Room)
And the fourth match is: ΔΗΜΗΤΡΙΟΥ (Teacher)
Now in some entries E*/Θ is not defined, and I want to get the 3 matches without the E*/Θ. How should I modify my pattern so that (Ε[0-9\p{Greek}]+|Θ) is an optional match?
I tried adding ?, but because my pattern has a \s before and a \s after that group, it still requires 2 whitespace characters to get the 3 matches, and my string has only one at that point.
I think you need to do two things:
Make the first .* lazy (i.e. .*?).
Enclose \s(Ε[0-9\p{Greek}]+|Θ) in an optional non-capturing group: (?:\s(Ε[0-9\p{Greek}]+|Θ))?
The regex will look like this:
^([a-z0-9\p{Greek}].*?)(?:\s(Ε[0-9\p{Greek}]+|Θ))?\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+)$
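The changed parts are the lazy .*? in the first group and the optional non-capturing group (?: ... )? around \s(Ε[0-9\p{Greek}]+|Θ).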
If you do not make the first .* lazy, it consumes the text that the optional second group should match, so that group stays empty. Making it lazy ensures that, whenever there is text the second capturing group can match, the group is set.
Note that you are calling capture groups "matches", which is not correct. A match is the whole text matched by the entire regular expression, while captures are the substrings matched by the parts of the regex enclosed in unescaped round brackets. See more on capture groups at regular-expressions.info.
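A Java sketch of the corrected pattern, assuming Java is the target engine. Java does not accept the bare \p{Greek} form and spells the Greek script class \p{IsGreek}, so that is the only change to the regex. The third sample line is a hypothetical entry with the Ε*/Θ part removed, and the class and method names are also hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimetableEntryParser {
    // The answer's regex, with \p{Greek} written as \p{IsGreek} for java.util.regex
    private static final Pattern ENTRY = Pattern.compile(
            "^([a-z0-9\\p{IsGreek}].*?)(?:\\s(Ε[0-9\\p{IsGreek}]+|Θ))?"
            + "\\s[\\(]([a-z1-9\\p{IsGreek}]+.*)[\\)]\\s-\\s([a-z0-9\\p{IsGreek}]+)$");

    // Returns {subject, class, room, teacher}; class is null when the entry has no Ε*/Θ part.
    // Returns null if the line does not match at all.
    static String[] parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ",
            "ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ",
            // Hypothetical entry without the Ε*/Θ part
            "ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parse(line)));
        }
    }
}
```

Because the outer group is non-capturing, the four capture groups keep their numbers whether or not the class code is present.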
You can use something like:
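(Ε[0-9\p{Greek}]+|Θ)?
The ? makes the whole group optional. Note that the \s before and the \s after the group are both still required, so one of them also has to go inside the optional part.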
Disclaimer: I'm new to writing regular expressions, so the only problem may be my lack of experience.
I'm trying to write a regular expression that will find numbers inside of parentheses, and I want both the numbers and the parentheses to be included in the selection. However, I only want it to match if it's at the beginning of a string. So in the text below, I would want it to get (10), but not (2) or (Figure 50).
(10) Joystick Switch - Contains control switches (Figure 50)
Two (2) heavy lifting straps
So far, I have (\(\d+\)), which gets (10) but also (2). I know ^ is supposed to match the beginning of a string (or line), but I haven't been able to get it to work. I've looked at a lot of similar questions, both here and on other sites, but have only found parts of solutions (finding things inside parentheses, finding just numbers at the beginning of a string, etc.) and haven't quite been able to put them together.
I'm using this to create a filter in a CAT tool (for those of you in translation), which means that no other programming languages are involved; essentially, I've been using RegExr to test all of the other expressions I've written, and that's worked fine.
The regex should be
^\(\d+\)
^ Anchors the regex at the start of the string.
\( Matches (. It must be escaped because ( has a special meaning in regex.
\d+ Matches one or more digits
\) Matches the )
Capturing brackets like (\(\d+\)) are not necessary here, since the group would capture exactly the same text as the whole match. They are needed only when you want to extract part of the matched text.
For example, if you want to match (50) but extract only the digits 50, you can use
\((\d+)\)
Here the \d+ part is inside capture group 1, so group 1 will be 50, whereas the entire match is (50).
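The question targets a CAT tool rather than code, but for illustration here is the same anchored pattern in Java; the class and method names are hypothetical. Pattern.MULTILINE assumes the text is checked as several lines at once: without it, ^ only matches at the very start of the whole text (RegExr's m flag plays the same role).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumberFinder {
    // ^\(\d+\) with MULTILINE, so ^ matches at the start of every line of the text
    static final Pattern WHOLE = Pattern.compile("^\\(\\d+\\)", Pattern.MULTILINE);
    // Same anchor, but with the digits in capture group 1
    static final Pattern DIGITS = Pattern.compile("^\\((\\d+)\\)", Pattern.MULTILINE);

    // Collects the given group of every match (group 0 is the entire match)
    static List<String> findAll(Pattern p, String text, int group) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group(group));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "(10) Joystick Switch - Contains control switches (Figure 50)\n"
                + "Two (2) heavy lifting straps";
        System.out.println(findAll(WHOLE, text, 0));  // prints [(10)]
        System.out.println(findAll(DIGITS, text, 1)); // prints [10]
    }
}
```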
Like so:
^\(\d+\)
^ anchor
Each of ( and ) is a regex metacharacter, so it needs to be escaped with \
So \( and \) match literal parentheses.
Unescaped ( and ) create a capture group.
\d+ match 1 or more digits
I am trying to match with the following regex.
\d{11}(.*)
That is, any 11 digits followed by a string. I want to extract the trailing string, whatever it is.
I used RE2::FullMatch, but it gives me the first part (the 11 digits). How do I get the substring matched by (.*)?
std::string subStr;
RE2::FullMatch("<sip:+19073381121#216.67.108.201:5060;user=phone>;npi=ISDN", "(<sip:\\+(\\d{11}))(.*)", &subStr);
I am trying to extract everything from the # onward in the string above. Basically I want what (.*) matches, but the call above returns <sip:+19073381121.
I am not very familiar with regex, but I looked at different APIs for extracting substrings and found this one useful.
Remove the extra capturing groups from your regular expression.
<sip:\+\d{11}(.*)
The text matched by (.*) is then the first capturing group ($1 in many tools). RE2::FullMatch stores the captures in its pointer arguments in order, so &subStr receives it.
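RE2 is a C++ library, so as a rough Java analogue (not RE2's API) the sketch below applies the corrected pattern with java.util.regex; the class and method names are hypothetical. matches() plays the role of FullMatch by requiring the whole string to match, and group(1) is the text matched by (.*).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SipSuffixExtractor {
    // Only one capturing group, so group 1 is whatever (.*) matched
    private static final Pattern SIP = Pattern.compile("<sip:\\+\\d{11}(.*)");

    // Returns the text after the 11 digits, or null if the whole string does not match
    static String suffix(String header) {
        Matcher m = SIP.matcher(header);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(suffix("<sip:+19073381121#216.67.108.201:5060;user=phone>;npi=ISDN"));
        // prints #216.67.108.201:5060;user=phone>;npi=ISDN
    }
}
```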
How do I find multiple matches that are (and can only be) separated from each other by whitespaces?
I have this regular expression:
/([0-9]+)\s*([A-Za-z]+)/
And I want each match (not group) to be surrounded by whitespace or another match. If that condition is not fulfilled, the match should not be returned.
This is valid: 1min 2hours 3days
This is not: 1min, 2hours 3days (1min and 2hours should not be returned)
Is there a simpler way of finding a continuous sequence of matches (in Java preferably) than repeating the whole regex before and after the main one, checking if there is a whitespace, start/end of the string or another match?
I believe this pattern will meet your requirements (provided that only a single space character separates your alphanumeric tokens):
(?<=^|[\w\d]\s)([\w\d]+)(?=\s|$)
The pattern has three parts:
(1) ([\w\d]+): a capture group that contains an alphanumeric string.
(2) (?<=^|[\w\d]\s): a look-behind assertion. To the left of the capture group there must be a) the beginning of the line or b) an alphanumeric character followed by a single space character.
(3) (?=\s|$): a look-ahead assertion. To the right of the capture group there must be a) a space character or b) the end of the line.
Here is some sample data. In the first and third lines every token is captured; in the second line only 3days is captured:
1min 2hours 3days
1min, 2hours 3days
42min 4hours 2days
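A minimal Java sketch of the look-around pattern, collecting group 1 of every find(); the class and method names are hypothetical. It assumes each line is checked on its own, so ^ and $ refer to the start and end of that line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceSeparatedTokens {
    // Look-behind: line start, or a word character plus one whitespace character.
    // Look-ahead: whitespace or line end. Group 1 is the token itself.
    private static final Pattern TOKEN =
            Pattern.compile("(?<=^|[\\w\\d]\\s)([\\w\\d]+)(?=\\s|$)");

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1min 2hours 3days"));  // prints [1min, 2hours, 3days]
        System.out.println(tokens("1min, 2hours 3days")); // prints [3days]
        System.out.println(tokens("42min 4hours 2days")); // prints [42min, 4hours, 2days]
    }
}
```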
String text = "1min 2hours 3days";
boolean match = text.matches("(?:\\s*[0-9]+\\s*[A-Za-z]+\\s*)*");
This checks whether the whole text consists of the pattern from your example: the * after the group allows zero or more repetitions of it, and ?: makes the group non-capturing.
This also returns true for an empty string. If you don't want that, change the final * to +.
I've managed to solve my problem by splitting the string using string.split("\\s+") and then matching the results against the pattern /([0-9]+)\s*([A-Za-z]+)/.
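A sketch of the split-then-match approach in Java, assuming that "matching the results" means keeping each token that matches the pattern in full; the class and method names are hypothetical. Because each token is checked on its own, in 1min, 2hours 3days only "1min," is rejected, which is looser than the requirement stated in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitThenMatch {
    // The question's pattern, without the /.../ delimiters
    private static final Pattern DURATION = Pattern.compile("([0-9]+)\\s*([A-Za-z]+)");

    // Splits on runs of whitespace and keeps the tokens that match the pattern in full
    static List<String> matchingTokens(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (DURATION.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchingTokens("1min 2hours 3days"));  // prints [1min, 2hours, 3days]
        System.out.println(matchingTokens("1min, 2hours 3days")); // prints [2hours, 3days]
    }
}
```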
There is an error here: the * in \s* allows zero whitespace characters, so it does not enforce any separation:
/([0-9]+)\s*([A-Za-z]+)/
Change to
/(\d+)\s+(\w+)/g
With the g flag, this returns an array of all matches. There is no need to write [0-9] every time: \d matches any digit from 0 to 9. Note that \w is broader than [A-Za-z], since it also matches digits and the underscore. More shorthand classes are listed in a regular expression cheat sheet.