Can't use regular expression to match exact string - regex

Given a string below:
String s = "sschk##123456sschk##123456gme##100&200&300&1,2,3,4,5$6,7,8,9,0sschk##123456";
I apply a pattern, sschk##\\d+? or sschk##.+?, to find every sschk##123456 and replace it with an empty string. Please note that the number after sschk## might be different each time I get it, for example sschk##321321.
But I only got
[sschk##1, sschk##1, sschk##1]
What pattern should I apply to match each sschk##123456 exactly, so that I can do a find and replace later?
Thanks a lot.

The problem with your regex is the "?" after the "+": it switches the quantifier from greedy to lazy. So your regex "sschk##\d+?" means "the string sschk## followed by 1 or more digits, but match as few digits as possible", which is why you only get one digit. Removing the "?" gives "the string sschk## followed by 1 or more digits (match as many digits as possible)".
Your regex might look like this: sschk##\\d{6}, which matches the string "sschk##" followed by exactly 6 digits. If you want to match "sschk##" followed by a variable number of digits, but no more than 6, you might use sschk##\\d{1,6}. If you need to match any number of digits after "sschk##", use sschk##\\d+
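To make the lazy/greedy difference concrete, here is a minimal sketch. It assumes Python's re module rather than the Java used in the question (so the backslash is not doubled inside a raw string); the variable names are only illustrative.

```python
import re

s = "sschk##123456sschk##123456gme##100&200&300&1,2,3,4,5$6,7,8,9,0sschk##123456"

# Lazy quantifier: \d+? stops after the first digit it is allowed to stop at.
lazy = re.findall(r"sschk##\d+?", s)

# Greedy quantifier: \d+ consumes every consecutive digit.
greedy = re.findall(r"sschk##\d+", s)

# Find and replace with an empty string, as the question wants.
cleaned = re.sub(r"sschk##\d+", "", s)

print(lazy)     # ['sschk##1', 'sschk##1', 'sschk##1']
print(greedy)   # ['sschk##123456', 'sschk##123456', 'sschk##123456']
print(cleaned)  # gme##100&200&300&1,2,3,4,5$6,7,8,9,0
```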

I think I got it done.
Just apply the pattern like this
(sschk##\\d+)

Related

How to regex for alphanumeric pattern with specific string

Needle in haystack: Specific string followed by exact set of numbers
How can I search for ABC???? where ABC should be exactly that, but the ???? must be exactly four numbers, ideally followed by whitespace.
Illustrative examples:
LHRJFKABC1234 233 <-- Has needle
EABC123 LHRJFK <-- Does not have needle as only 3 numbers following ABC
Something tells me I need to search for string + something like (\d{4}) for the 4 numbers. But not sure quite how to puzzle it all together.
What I've found so far:
Regular expression to match standard 10 digit phone number
Regular Expression to match specific string followed by number?
For things like this I find an online checker like Rubular very handy.
Unless I'm misunderstanding, the regex ABC\d{4}\s should work for you. Do you need groupings (i.e. to match the 4-digit part)?
Try it out on Rubular here
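If you do want the 4-digit part as a group, a small sketch of that could look as follows. It assumes Python's re module (the thread itself is flavour-neutral), and find_needle is a hypothetical helper name.

```python
import re

# ABC, then exactly four digits (captured), then one whitespace character.
NEEDLE = re.compile(r"ABC(\d{4})\s")

def find_needle(text):
    """Return the four digits after ABC, or None if the needle is absent."""
    m = NEEDLE.search(text)
    return m.group(1) if m else None

print(find_needle("LHRJFKABC1234 233"))  # 1234
print(find_needle("EABC123 LHRJFK"))     # None
```

Note that the trailing \s means a needle at the very end of the string (with nothing after it) would not match.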

Regex not select word with character at the end

I have a simple question.
I need a regular expression to match a hexadecimal number that has no colon at the end.
For example:
0x85af6b9d: 0x00256f8a ;some more interesting code
// don't match 0x85af6b9d: at all, but match 0x00256f8a
My expression for a hexadecimal number is 0[xX][0-9A-Fa-f]{1,8}
A version with (?!:) doesn't work, because it just matches 0x85af6b9 (the {1,8} quantifier backs off by one digit)
Using $ isn't possible either, since a line can contain more than one number
Thanks!
Here is one way to do so:
0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])
See the online demo.
We use a negative lookahead to match all hexadecimal numbers without : at the end. Because of {1,8}, it is also necessary to ensure that the entire hexadecimal number is correctly matched. We therefore reuse the character set ([0-9A-Fa-f]) to ensure that the number does not continue.
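A quick way to check the behaviour described above, sketched with Python's re module (an assumption; the question does not name a flavour). The names naive and strict are only illustrative.

```python
import re

line = "0x85af6b9d: 0x00256f8a ;some more interesting code"

# Lookahead for ':' only: the quantifier backtracks one digit and still matches.
naive = re.findall(r"0[xX][0-9A-Fa-f]{1,8}(?!:)", line)

# Lookahead that also forbids a further hex digit, so backtracking cannot help.
strict = re.findall(r"0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])", line)

print(naive)   # ['0x85af6b9', '0x00256f8a']
print(strict)  # ['0x00256f8a']
```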

regular expression for decimal with fixed total number of digits

Is there a way to write regular expression that will match strings like
(0|[1-9][0-9]*)\.[0-9]+
but with a specified total number of numeric characters? For example, for 3 numeric characters it should match "0.12" and "12.3" but not "1.234" or "1.2". I know I can write it something like
(?<![0-9])(([0-9]{1}\.[0-9]{2})|([1-9][0-9]{1})\.[0-9]{1})(?![0-9])
but that becomes quite tedious for large number of digits.
(I know I don't need {1} but it better explains what I'm doing)
^(?=[\d.]{4}$)\d+\.\d+$
You can try this for 3 digits. It can be extended for more. See demo.
https://regex101.com/r/bN8dL3/4
or
\b(?=[\d.]{4}\b)\d+\.\d+\b
If you don't want anchors.
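The lookahead version extends to any digit count because the lookahead only has to check the total length: n digits plus one dot. A rough sketch of that idea, assuming Python's re module; fixed_digit_decimal is a hypothetical helper, not something from the answer.

```python
import re

def fixed_digit_decimal(total_digits):
    """Build ^(?=[\\d.]{N+1}$)\\d+\\.\\d+$ for N total digits (plus the dot)."""
    length = total_digits + 1  # the digits and the single decimal point
    return re.compile(rf"^(?=[\d.]{{{length}}}$)\d+\.\d+$")

three = fixed_digit_decimal(3)
for candidate in ["0.12", "12.3", "1.234", "1.2"]:
    print(candidate, bool(three.match(candidate)))
# 0.12 True
# 12.3 True
# 1.234 False
# 1.2 False
```

Like the pattern in the answer, this sketch does not enforce the (0|[1-9][0-9]*) rule for the integer part, so a string such as "01.2" would also be accepted.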
You can match them by adding alternation:
\b(?:[0-9]\.[0-9]{2}|[1-9][0-9]\.[0-9])\b
Then, you won't need any start/end string/line anchors.
See demo

C# Regex match + next n characters

I'm new to Regex and I need to parse source code from a website. Can anyone tell me what the syntax would be to match a word followed by the next n characters in the string?
Let's say I want to match the word "country" followed by the next 15 chars in the string.
If the string is "...<tr class="hover"><td>country</td><td>RO</td></t......" I need to get "country</td><td>RO". I can deal with the string like this; ideally it would be only "country RO", but I don't want to ask for too much.
Something like: (country)<\/td><td>(..)
Using $1 $2 as your output should give you what you need.
Explanation:
Putting () brackets around something lets you back-reference it with $1, etc.
Everything outside the brackets matches those exact characters.
Remember to escape special regex characters with a backslash (/ needs it in flavors that use / as the pattern delimiter).
The second group in brackets just matches the next two characters, no matter what they are. If you know which characters these can be (e.g. [A-Za-z]), it is better to use that set.
With that assumption I would use something like: (country)<\/td><td>([A-Za-z]{2})
Also helps to find a good reference: http://www.regular-expressions.info/reference.html
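For illustration, the capture-group approach can be sketched like this. The question is about C#, so treat this as an assumption-laden translation to Python's re module, where groups are read with m.group(n) instead of $1 and $2, and / needs no escaping.

```python
import re

html = '...<tr class="hover"><td>country</td><td>RO</td></t......'

# Group 1 is the label, group 2 the two letters in the next table cell.
m = re.search(r"(country)</td><td>([A-Za-z]{2})", html)

result = f"{m.group(1)} {m.group(2)}" if m else None
print(result)  # country RO
```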
Depending on your flavor of Regex engine:
"country.{15}"
Should match "country" exactly, followed by 15 characters of any kind (by default "." does not match a newline).
It's worth noting that this needs all 15 characters. If fewer than 15 characters follow the word "country", the match will fail. That could be problematic for you.
"country.{1,15}"
This will match "country" exactly, followed by between 1 and 15 characters of any kind. Again, this could also be problematic depending on your use case.
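Here is a small sketch of the fixed-count behaviour, again assuming Python's re module instead of .NET. It also shows that getting exactly "country</td><td>RO" from the sample string takes 11 characters, not 15.

```python
import re

html = '...<tr class="hover"><td>country</td><td>RO</td></t......'

# 15 characters after "country" runs past "RO" into the closing tag.
fifteen = re.search(r"country.{15}", html).group(0)

# 11 characters after "country" ends exactly at "RO".
eleven = re.search(r"country.{11}", html).group(0)

# With too little text after the word, the fixed-count pattern finds nothing.
too_short = re.search(r"country.{15}", "country</td>")

print(fifteen)    # country</td><td>RO</td
print(eleven)     # country</td><td>RO
print(too_short)  # None
```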

Regular expression to match last number in a string

I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:
\d+(?!\d+)
And these are some strings, just to give you an idea, and what the regex should match:
ARRAY[123] matches 123
ARRAY[123].ITEM[4] matches 4
B:1000 matches 1000
B:1000.10 matches 10
And so on. The regex matches the numbers, but all of them. I don't get why the negative lookahead is not working. Anyone care to explain?
Your regex \d+(?!\d+) says
match any number if it is not immediately followed by a number.
which is incorrect. A number is last if it is not followed (following it anywhere, not just immediately) by any other number.
When translated to regex we have:
(\d+)(?!.*\d)
Rubular Link
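The difference between the two lookaheads can be checked with a short sketch. It assumes Python's re module (the answer links to Rubular, i.e. Ruby); last_number is a hypothetical helper name.

```python
import re

def last_number(text):
    """Return the last run of digits: digits with no digit anywhere after them."""
    m = re.search(r"(\d+)(?!.*\d)", text)
    return m.group(1) if m else None

# The original pattern only checks the character right after the digits.
print(re.findall(r"\d+(?!\d+)", "ARRAY[123].ITEM[4]"))  # ['123', '4']

for sample in ["ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10"]:
    print(sample, "matches", last_number(sample))
# ARRAY[123] matches 123
# ARRAY[123].ITEM[4] matches 4
# B:1000 matches 1000
# B:1000.10 matches 10
```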
I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:
/(\d+)\D*\z/
\z at the end means that that is the end of the string.
\D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
(\d+) is the matching part. It is in parentheses so that you can pick it up, as was pointed out by Cameron.
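A sketch of the same end-anchored idea, assuming Python's re module. One flavour difference to be aware of: the answer uses Ruby's \z, while in Python the end-of-string anchor that works in all versions is \Z, so that is what the sketch uses.

```python
import re

def last_number_anchored(text):
    """Digits followed only by non-digits up to the very end of the string."""
    m = re.search(r"(\d+)\D*\Z", text)  # \Z here plays the role of Ruby's \z
    return m.group(1) if m else None

print(last_number_anchored("ARRAY[123].ITEM[4]"))  # 4
print(last_number_anchored("B:1000.10"))           # 10
```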
You can use
.*(?:\D|^)(\d+)
to get the last number; this is because the matcher will gobble up all the characters with .*, then backtrack to the first non-digit character or the start of the string, then match the final group of digits.
Your negative lookahead isn't working because on the string "1 3", for example, the 1 is matched by the \d+, then the negative lookahead succeeds at the space (since what follows is not a sequence of one or more digits). The 3 is never even looked at.
Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.
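The greedy-then-backtrack pattern can be tried out as below; the number ends up in group 1. This is a sketch assuming Python's re module, and last_number_greedy is a hypothetical helper name.

```python
import re

def last_number_greedy(text):
    """Let .* swallow everything, then backtrack to the final digit run."""
    m = re.search(r".*(?:\D|^)(\d+)", text)
    return m.group(1) if m else None

print(last_number_greedy("ARRAY[123].ITEM[4]"))  # 4
print(last_number_greedy("B:1000.10"))           # 10
print(last_number_greedy("123"))                 # 123
```

The (?:\D|^) part is what keeps the digit run whole: without it, the greedy .* would leave only the final digit for the group.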
I still had issues with managing the capture groups
(for example, if using Inline Modifiers (?imsxXU)).
This worked for my purposes -
.(?:\D|^)\d(\D)