"This is a piece of 432432\5321 text".
The numbers can be of any length and may also contain letters. How do I get only the 432432\5321 part of this?
Here is a sample:
(\d+\\\d+)
A group of digits, followed by a backslash, followed by another group of digits. The surrounding parentheses form a capturing group.
Here is the fiddle: https://regex101.com/r/gI5rG4/2
EDIT:
I missed that you also want letters. In that case, use \w instead of \d (note that \w also matches the underscore).
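A minimal Python sketch of both variants, using re.findall. The second sample token (AB12\C3D) is an illustrative addition, not part of the question:

```python
import re

text = r"This is a piece of 432432\5321 text and also AB12\C3D"

# Digits only: one or more digits, a literal backslash (\\), more digits
digits_only = re.findall(r"(\d+\\\d+)", text)

# Letters allowed: \w matches letters, digits and the underscore
alphanumeric = re.findall(r"(\w+\\\w+)", text)

print(digits_only)    # ['432432\\5321']
print(alphanumeric)   # ['432432\\5321', 'AB12\\C3D']
```

The printed lists show each backslash doubled because Python prints the repr of the strings; each match contains a single backslash.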
You can use the following example:
import re
text = 'This is a piece of 432432\\5321 text'
print(re.findall(r'(\d+(?:\\\d+)+)', text))  # one or more digit groups joined by backslashes
It handles inputs such as 111\222, 111\222\333, and so on.
Use \w to match alphanumeric characters and \\ to match the backslash:
(\w+\\\w+)
This matches inputs like 32432\5321 as well as those with letters in them, e.g. 32A1\BB1.
Fiddle: https://regex101.com/r/yF2aX1/2
Any ideas on how to remove the periods from a large text document using a regex in a text editor? Given the following example:
J. don't match
F.C. don't match
word. match
Word. match
WORD. match
The regex below matches either two or more word characters or a single non-capital character, followed by a period at the end of the line:
((\w{2,})|([^A-Z]))\.$
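Because this pattern also consumes the characters in front of the period, replacing the whole match with nothing would delete the word too. A sketch in Python (an assumption: one token per line, as in the example, with re.MULTILINE so that $ matches at every line end) that writes group 1 back so only the period disappears:

```python
import re

text = "J.\nF.C.\nword.\nWord.\nWORD."

# Group 1 holds the word characters (or the single non-capital character)
# in front of the period; writing it back with \1 deletes only the period.
pattern = re.compile(r"((\w{2,})|([^A-Z]))\.$", re.MULTILINE)
result = pattern.sub(r"\1", text)

print(result.splitlines())  # ['J.', 'F.C.', 'word', 'Word', 'WORD']
```

In a text editor the same idea usually means using $1 (or \1, depending on the editor) as the replacement string.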
You can also try this:
(?<!(?<=^|[^A-Z])[A-Z])\.
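Python's built-in re module requires every lookbehind to be fixed-width, so the nested (?<=^|[^A-Z]) (whose alternatives have widths 0 and 1) is rejected there. A possible rewrite for such engines (my own reformulation, not the original answer's pattern) splits the condition into two fixed-width negative lookbehinds:

```python
import re

text = "J.\nF.C.\nword.\nWord.\nWORD."

# A period is kept only when it follows a "lone" capital letter, i.e. a capital
# at the start of a line (?<!^[A-Z]) or one preceded by a non-capital
# (?<![^A-Z][A-Z]). Every other period is removed.
pattern = re.compile(r"(?<!^[A-Z])(?<![^A-Z][A-Z])\.", re.MULTILINE)
result = pattern.sub("", text)

print(result.splitlines())  # ['J.', 'F.C.', 'word', 'Word', 'WORD']
```

Here only the period itself is matched, so the replacement string can be empty.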
You can try something like this: \w{2,}?\.
You can go to Regex101 and try it yourself with more test strings until you get the one you want. If you want to exclude the periods from the result, you can use a capturing group like so: (\w{2,}?)\.
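A short Python illustration of what the capturing group changes: re.findall returns the whole match when the pattern has no group, and only the group's text when it has exactly one. The sample input is the example from the question, one token per line:

```python
import re

text = "J.\nF.C.\nword.\nWord.\nWORD."

# Without a group, each match includes the trailing period.
with_period = re.findall(r"\w{2,}?\.", text)

# With a capturing group, findall returns only the group's text.
without_period = re.findall(r"(\w{2,}?)\.", text)

print(with_period)     # ['word.', 'Word.', 'WORD.']
print(without_period)  # ['word', 'Word', 'WORD']
```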
I am quite stuck with a regex I can't get to work. It should capture everything except digits and the word fiktiv (not single characters of it!). The objective is to get rid of this content.
I have tried something like (?!\d|fiktiv).* on my sample string 123456788daswqrt fiktiv:
https://regex101.com/r/kU8mF3/1
However, this also matches the fiktiv at the end.
One possibility is to use a negated character class, written by putting ^ at the start of the [] brackets. In effect, you say: match as many non-digits as possible, up to whitespace followed by the word fiktiv.
The matched non-digits are "saved" in capturing group 1 for later use.
([^\d]+)\s+fiktiv
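A minimal Python check of this pattern on the sample string from the question, reading capturing group 1 with re.search:

```python
import re

sample = "123456788daswqrt fiktiv"

# [^\d]+ : one or more non-digits (greedy), captured as group 1
# \s+    : the whitespace before the word
# fiktiv : the literal word, which stays outside the group
match = re.search(r"([^\d]+)\s+fiktiv", sample)

print(match.group(1))  # daswqrt
```

Note that the group stops before the whitespace only because the greedy [^\d]+ backtracks until \s+fiktiv can match.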
You can test it here:
https://regex101.com/
It should capture everything except digits and the word fiktiv (not single characters of it!). The objective is to get rid of this content.
So, you want to remove any character that is not a digit (the \D or [^0-9] pattern) and is not part of the fiktiv character sequence.
You may use a regex with a capturing group and alternation:
(fiktiv)|[^0-9]
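and replace each match with the contents of Group 1 using the $1 backreference. When fiktiv matches, it is written back into the result; when [^0-9] matches, Group 1 is empty, so that character is simply removed.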
See the regex demo
C# implementation:
var result = Regex.Replace(input, "(fiktiv)|[^0-9]", "$1"); // requires: using System.Text.RegularExpressions;
Also, see Use RegEx in SQL with CLR Procs.
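For comparison, a Python sketch of the same replacement (a translation for illustration, not part of the original C# answer). Since Python 3.5, re.sub substitutes an empty string for a group that did not participate in the match, so \1 behaves like $1 here:

```python
import re

sample = "123456788daswqrt fiktiv"

# Alternative 1 captures "fiktiv" so it can be written back;
# alternative 2 matches any single non-digit, which is dropped because
# group 1 is empty for that match.
result = re.sub(r"(fiktiv)|[^0-9]", r"\1", sample)

print(result)  # 123456788fiktiv
```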
Which regex lets me match the letters and digits in the string GIVEN_CHAR_VAL":"AKRONIS387226279863_NXUS0000000016092126"?
I tried
GIVEN_CHAR_VAL":"(.*)"
but it doesn't work correctly.
Any ideas?
If you only want to match alphanumeric characters, use \w rather than .:
GIVEN_CHAR_VAL":"(\w*)"
Your suggested regex works for me, but have you tried:
GIVEN_CHAR_VAL":"(.*?)"
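The difference between .* and .*? only shows up when the line contains more quoted fields after the value. A short Python illustration; the second field (OTHER_VAL) is a hypothetical addition, not part of the question's string:

```python
import re

line = 'GIVEN_CHAR_VAL":"AKRONIS387226279863_NXUS0000000016092126","OTHER_VAL":"x"'

# Greedy .* runs to the LAST quote on the line.
greedy = re.search(r'GIVEN_CHAR_VAL":"(.*)"', line).group(1)

# Lazy .*? stops at the FIRST closing quote.
lazy = re.search(r'GIVEN_CHAR_VAL":"(.*?)"', line).group(1)

print(greedy)  # AKRONIS387226279863_NXUS0000000016092126","OTHER_VAL":"x
print(lazy)    # AKRONIS387226279863_NXUS0000000016092126
```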
What do you actually want to match?
.* gives you the entire value, AKRONIS387226279863_NXUS0000000016092126.
\w+, as suggested above, does the same because \w also matches '_'.
If you are trying to match everything except the underscore, try something more specific like [A-Z0-9]+, though you will end up with two matches because of the intervening underscore.
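A quick Python comparison of \w+ and [A-Z0-9]+ on the value from the question, using re.findall to list every match:

```python
import re

value = "AKRONIS387226279863_NXUS0000000016092126"

# \w includes the underscore, so the whole value is one match.
word_matches = re.findall(r"\w+", value)

# [A-Z0-9] excludes the underscore, which splits the value in two.
alnum_matches = re.findall(r"[A-Z0-9]+", value)

print(word_matches)   # ['AKRONIS387226279863_NXUS0000000016092126']
print(alnum_matches)  # ['AKRONIS387226279863', 'NXUS0000000016092126']
```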
I have this code in VB.NET:
MessageBox.Show(Regex.Replace("Example 4.12.0.12", "\d", ""))
It removes the numbers.
I also want to remove the dots,
so I tried
MessageBox.Show(Regex.Replace("Example 4.12.0.12", "\d\.", ""))
but it keeps the numbers.
How can I remove both (numbers and dots) from the string?
Thanks.
Try using a character group:
MessageBox.Show(Regex.Replace("Example 4.12.0.12", "[\d\.]", ""))
I'll elaborate since I inadvertently posted essentially the same answer as Steven.
Given the input "Example 4.12.0.12"
"\d" matches digits, so the replacement gives "Example ..."
"\d\." matches a digit followed by a dot, so the replacement gives "Example 112"
"[\d.]" matches anything that is a digit or a dot. As Steven said, it's not necessary to escape the dot inside the character group.
You need to create a character group using square brackets, like this:
MessageBox.Show(Regex.Replace("Example 4.12.0.12", "[\d.]", ""))
A character group means that any one of the characters listed in the group is considered a valid match. Notice that, within the character group, you don't need to escape the . character.
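The three patterns discussed above can be checked side by side. This sketch uses Python's re.sub as a stand-in for VB.NET's Regex.Replace (an illustration in another language); for these simple patterns both engines behave the same:

```python
import re

s = "Example 4.12.0.12"

# \d    : every digit is removed, the dots stay
# \d\.  : only a digit immediately followed by a dot is removed (as a pair)
# [\d.] : any single digit OR dot is removed; "." needs no escape inside []
results = {p: re.sub(p, "", s) for p in (r"\d", r"\d\.", r"[\d.]")}

for pattern, replaced in results.items():
    print(repr(pattern), "->", repr(replaced))
```

The last result keeps the trailing space after "Example"; trim it afterwards if needed.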
Can someone show me a regex to select #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70 from the string below? It's okay to assume the #OnlinePopup prefix is fixed.
~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70_11472026_1214836152225_6455280574472127786
NB: The following is .NET Regex syntax, modify for your flavour.
The following:
#[^_]+_[^_]+
will match:
Hash
One or more characters until an underscore
Underscore
One or more characters until an underscore
If the first bit is constant, and you want to be more specific you could use:
#OnlinePopup_[A-F0-9]+
This will match
#OnlinePopup_ (exactly)
One or more hex characters, up to the first non-hex character
Simply matching anything between the first '#' and the first or last '_' will not work for your example, since the string you want returned contains an underscore. If all the text that you want to match has only one underscore in it, you could use this regex:
/(#[^_]+_[^_]+)/
This matches an octothorpe (#), followed by two strings that do not contain an underscore, separated by a single underscore.
Something a little simpler:
(\#OnlinePopup_.*?)_
This assumes your text starts with # and ends with _ (the underscore itself is left out of capture group 1).
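A quick Python check of the three patterns from these answers against the question's string. Python's re is used only for illustration (the first answer targets .NET), but these particular patterns behave the same in both:

```python
import re

s = ("~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID"
     "#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70"
     "_11472026_1214836152225_6455280574472127786")

# '#', non-underscores, '_', non-underscores
generic = re.search(r"#[^_]+_[^_]+", s).group(0)

# Fixed prefix, then a run of uppercase hex characters
hex_only = re.search(r"#OnlinePopup_[A-F0-9]+", s).group(0)

# Lazy match up to the next '_'; group 1 excludes that underscore
lazy = re.search(r"(\#OnlinePopup_.*?)_", s).group(1)

print(generic == hex_only == lazy)  # True
print(generic)  # #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70
```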