RegEx for capturing everything except numbers and one word - regex

I am quite stuck with a regex I can't get to work. It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
I have tried something like (?!\d|fiktiv).* on my sample string 123456788daswqrt fiktiv
https://regex101.com/r/kU8mF3/1
However this does match the fiktiv at the end as well.

One possibility would be to use a negated character class, which you get by putting a ^ at the start of a [] bracket expression. So you basically say: don't match digits, but match as many non-digits as you can get until whitespace occurs and the word fiktiv appears.
The matched text is saved in capturing group 1 for later use.
([^\d]+)\s+fiktiv
Testing could be done here:
https://regex101.com/
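As a rough illustration, here is the pattern applied to the sample string from the question. Python's re module is an assumption here, since the question does not name a language.

```python
import re

sample = "123456788daswqrt fiktiv"

# [^\d]+ takes the run of non-digits; \s+fiktiv must follow it,
# so the greedy class backtracks and gives the space back to \s+.
match = re.search(r"([^\d]+)\s+fiktiv", sample)
captured = match.group(1) if match else None
print(captured)
```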

It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
So, you want to remove any character that is not a digit (that is, \D or [^0-9] pattern) and not a fiktiv char sequence.
You may use a regex with a capturing group and alternation:
(fiktiv)|[^0-9]
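and replace with a $1 backreference, which holds the contents of Group 1. When fiktiv was matched, $1 restores it in the result; when a single non-digit was matched, Group 1 is empty and the character is removed.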
See the regex demo
C# implementation:
Regex.Replace(input, "(fiktiv)|[^0-9]", "$1")
Also, see Use RegEx in SQL with CLR Procs.
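The same capture-and-restore idea can be sketched in Python (an assumption; the answer itself only shows C#). The helper name keep_digits_and_word is hypothetical, and a replacement function is used instead of a backreference so that an unmatched Group 1 is explicitly turned into an empty string.

```python
import re

def keep_digits_and_word(text, word="fiktiv"):
    # Group 1 matches the protected word; the other branch matches any non-digit.
    pattern = re.compile("(" + re.escape(word) + ")|[^0-9]")
    # Put group 1 back when it took part in the match, otherwise delete the char.
    return pattern.sub(lambda m: m.group(1) or "", text)

cleaned = keep_digits_and_word("123456788daswqrt fiktiv")
print(cleaned)
```

The word must come first in the alternation; otherwise [^0-9] would consume its letters one at a time.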

Related

How to format allowance of multiple whitespaces between characters in Regex more compactly?

I came up with this regEx to check whether an IBAN is entered correctly into a field. It also lets the user enter up to 4 whitespaces between characters without causing an error.
^\s?\s?\s?\s?N\s?\s?\s?\s?\s?O\s?\s?\s?\s?([0-9a-zA-Z]\s?\s?\s?\s?){13}$
It works perfectly, but I want to get rid of the "\s?\s?\s?\s?" and format it more compact, I've tried [\s?]{4} but that doesn't work.
What's the correct way to shorten this up?
The system I work with doesn't allow me to use any Javascript, I can only put pure regEx definitions to control entry into the field.
thank you
You can shorten the repeating \s parts using a quantifier {0,4} to match 0-4 times a whitespace char and add an anchor $ to assert the end of the string to prevent a partial match.
If you don't need that value of the capturing group afterwards, you could make it non capturing (?: instead.
^\s{0,4}N\s{0,4}O\s{0,4}(?:[0-9a-zA-Z]\s{0,4}){13}$
Regex demo
If you don't want to match a newline, you could use [^\S\r\n]{0,4} instead of \s{0,4} but that would defeat the purpose of making the pattern smaller.
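A quick way to check the shortened pattern is a small Python sketch. Python is an assumption (the asker can only use a bare regex definition), the function name looks_like_no_iban is hypothetical, and the IBAN strings are sample values only.

```python
import re

# Same pattern as in the answer; {0,4} replaces the repeated \s? tokens.
IBAN_NO = re.compile(r"^\s{0,4}N\s{0,4}O\s{0,4}(?:[0-9a-zA-Z]\s{0,4}){13}$")

def looks_like_no_iban(value):
    # True when the value is N, O and 13 alphanumerics,
    # each optionally followed by up to four whitespace characters.
    return IBAN_NO.match(value) is not None

print(looks_like_no_iban("NO93 8601 1117 947"))      # single spaces between groups
print(looks_like_no_iban("NO93     8601 1117 947"))  # five spaces in a row
```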

How can I match all instances of the first letter?

For example, for this string I want to match all A and a:
"All the apples make good cake."
Here's what I did: /(.)[^.]*\1*/ig
I started by getting the first character in the group, which can be any character: (.) Then I added [^.]* because I don't want to match any other character that isn't the first one. Finally I added \1* because I wanted to match the first character again. All other similar variations that I've tried don't seem to work.
The regex you are trying to build captures the very first character, then matches anything up to the next occurrence of that same character, using a negative lookahead (tempered dot):
(?i)(\w)(?:(?!\1).)*
Capturing group 1 holds the character you need. Try it on a live demo.
If the regex engine supports the \K match-reset token, you can append it to the regex above to match only the desired part:
(?i)(\w)(?:(?!\1).)*\K
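To see what the first pattern does on the example sentence, here is a minimal Python sketch. Python is an assumption; its built-in re module does not support \K, so the sketch reads Group 1 from each match instead.

```python
import re

text = "All the apples make good cake."

# (\w) grabs the letter at the start of each match; the tempered dot then
# consumes everything up to the next case-insensitive occurrence of it,
# so every following match starts on that same letter again.
pattern = re.compile(r"(?i)(\w)(?:(?!\1).)*")

letters = [m.group(1) for m in pattern.finditer(text)]
positions = [m.start(1) for m in pattern.finditer(text)]
print(letters, positions)
```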

Adjustment to this code to stop after finding two words

In my haste to get this working I failed to ask how to stop after the second word in my original post. Grab first 4 characters of two words RegEx
If I have Awesome Sauce Today I would like to have AwesSauc
The code in my first post will capture the first 4 characters of any word and combine them, so Awesome Sauce Today will become AwesSaucToda. I want it to stop capturing after the second word. So in my example Today would be ignored, but it will still capture 4 characters of the first two words it encounters to create the new word AwesSauc.
You may still use the Replace Text action and use
Pattern: (?s)^\P{L}*(\p{L}{1,4})\p{L}*\P{L}+(\p{L}{1,4}).*
Replacement text: $1$2
See the regex demo.
The differences between this solution and the previous one are as follows. The pattern is anchored at the start with ^. Instead of \W (which matches any non-word char) I am using \P{L}, which matches any non-letter char (adjust as you see fit). To match the beginnings of the first and second words, I am now using 2 capturing groups ((\p{L}{1,4})...(\p{L}{1,4})), hence the two backreferences in the replacement pattern. The (?s) modifier makes . match any char, including a newline. The .* at the end is necessary to remove the rest of the string after the necessary text is captured into the 2 capturing groups.
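The replacement can be tried out in Python, with one loud assumption: Python's built-in re module has no \p{L}, so this sketch substitutes [^\W\d_] for "letter" and [\W\d_] for "non-letter". The function name first_two_words_prefix is hypothetical, and the Replace Text action from the answer may behave differently in edge cases.

```python
import re

LETTER = r"[^\W\d_]"     # stand-in for \p{L}: a word char that is not a digit or _
NON_LETTER = r"[\W\d_]"  # stand-in for \P{L}

# Anchored at the start: up to 4 letters of word one, the rest of word one,
# a separator, up to 4 letters of word two, then everything else.
pattern = re.compile(
    r"(?s)^" + NON_LETTER + r"*(" + LETTER + r"{1,4})" + LETTER + r"*"
    + NON_LETTER + r"+(" + LETTER + r"{1,4}).*"
)

def first_two_words_prefix(text):
    # Replace the whole string with the two captured prefixes.
    return pattern.sub(r"\1\2", text)

result = first_two_words_prefix("Awesome Sauce Today")
print(result)
```

A string with only one word does not match and is returned unchanged.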

Regular expression to match non-integer values in a string

I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern. I'm matching the inverse so that I can throw an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside to this pattern is that it works for all cases except a hyphen in the middle of the String will still pass. For example:
"-2304923" is allowed correctly but "9234-342" is also allowed and shouldn't be.
Please let me know what I can do to specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!
This regex will work for you:
^-?\d+$
Explanation: start of the string (^), then an optional - (?), then the digit \d repeated one or more times (+), and the string must end here ($).
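A minimal validation sketch in Python (an assumption; the question does not name its language). It uses [0-9] rather than \d because Python's \d also accepts non-ASCII digits in str patterns, and fullmatch plays the role of the ^ and $ anchors. The function name is_integer_string is hypothetical.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")

def is_integer_string(value):
    # fullmatch requires the whole string to match, like ^-?[0-9]+$.
    return INTEGER.fullmatch(value) is not None

for candidate in ["-2304923", "9234-342", "42", "-"]:
    print(candidate, is_integer_string(candidate))
```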
You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
^^^^^^^^                     non-capturing group for start of line or space
        ^^^^^^^              capture number
               ^^^^^^^^^^^^  non-capturing group for end of line, space or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
-2304923" "9234-342" 1234 -1234
++++++++                         captured
           ^^^^^^^^              NOT captured
                     ++++        captured
                          +++++  captured
I don't understand how your pattern - [^-0-9] is matching those strings you are talking about. That pattern is just the opposite of what you want. You have simply negated the character class by using caret(^) at the beginning. So, this pattern would match anything except the hyphen and the digits.
Anyway, for your requirement, you first need to match one optional hyphen at the beginning. So keep it outside the character class and follow it with ?. Then, to match the digits that come after it, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-?[0-9]+ // or -?\d+
The above regex finds the pattern inside a larger string. If you want the entire string to match this pattern, add anchors at both ends of the regex:
^-?[0-9]+$
For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])
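Since the asker wants to throw on a match, the two-case pattern can be sketched in Python like this (the language and the function name require_integer_chars are assumptions). Note that it only looks for bad characters, so an empty string or a lone "-" is not flagged.

```python
import re

# Case 1: the first character is neither '-' nor a digit.
# Case 2: some later character is not a digit.
INVALID = re.compile(r"(^[^-0-9]|^.+?[^0-9])")

def require_integer_chars(value):
    # Raise when the inverse pattern finds an offending character.
    if INVALID.search(value):
        raise ValueError("not an integer string: " + repr(value))
    return value

print(require_integer_chars("-2304923"))
```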

Why do I get successful but empty regex matches?

I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
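So if your regex was /f(.*)\1/ and the text was "fab", it would match just the 'f', with the group holding the empty string right after it. (On "foo" the greedy .* can do better: the group takes the first 'o' and \1 matches the second, so the whole of "foo" matches.)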
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
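The difference between .* and .+ can be observed with a short Python sketch. This is an assumption: the question uses POSIX regexec(), which is a different engine, but for this simple case it should find the same empty match at the start of the text.

```python
import re

# (.*) is allowed to capture nothing, and \1 then repeats that nothing:
# a successful but empty match at position 0.
m_star = re.search(r"(.*)\1", "blabl")
print(m_star.span(), repr(m_star.group(1)))

# (.+) must capture at least one character, so "blabl" has no match at all,
# while "foo" matches on the doubled 'o'.
m_plus_blabl = re.search(r"(.+)\1", "blabl")
m_plus_foo = re.search(r"(.+)\1", "foo")
print(m_plus_blabl, m_plus_foo.group(0))
```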
\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
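Both variants can be compared in a small Python sketch (Python's re is an assumption; the question itself uses regexec()):

```python
import re

# Immediate re-match: the captured text must repeat right away.
immediate = re.search(r"(.+)\1", "BLABLA")
print(immediate.group(1))

# Later re-match: .* allows other text between the two copies.
later = re.search(r"(.+).*\1", "BLAahemBLA")
print(later.group(1))

# The immediate form finds nothing when the copies are separated.
separated = re.search(r"(.+)\1", "BLAahemBLA")
print(separated)
```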