MSBUILD RegexReplace get all text till 2nd last dot from end

I am working with ToolsVersion="3.5".
I want to match from the end of the string back to the second-to-last dot (.).
For example, given the value 123.456.78.910.abcdefgh, I want to get only 910.abcdefgh.
I tried:
<RegexReplace Input="$(big_number)" Expression="/(\w+\.\w+)$/gm" Replacement="$1" count="1">
<Output ItemName ="big_number_tail" TaskParameter="Output"/>
</RegexReplace>
But it returns the entire string.
Any idea what went wrong?

First of all, do not use regex literal notation in a text attribute. When a regex is defined as a string rather than in code, literal notation (like /.../gm) is usually not supported: the / delimiters and the g, m, etc. flags are treated as part of the pattern, so it never matches.
Besides, when you extract by replacing, as here, the pattern must match the whole string and capture only the part you want to keep. You may have more than one capturing group, and then you can use $2, $3, etc. in the replacement.
You can use
<RegexReplace Input="$(big_number)" Expression=".*\.([^.]*\.[^.]*)$" Replacement="$1" count="1">
See the regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
\. - a . char
([^.]*\.[^.]*) - Group 1 ($1 refers to this part): zero or more non-dot chars, a . char, and again zero or more chars other than dots
$ - end of string.
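
The same idea can be tried outside MSBuild. The sketch below uses Python's re module purely as a stand-in for the .NET engine (an assumption for illustration; the helper name is hypothetical). It shows the corrected pattern extracting the tail, and the original attribute value failing to match because the slashes and flags are read literally.

```python
import re

# Pattern from the answer: consume everything up to the second-to-last dot,
# then capture the last two dot-separated fields in group 1.
TAIL = re.compile(r".*\.([^.]*\.[^.]*)$")

def tail_after_second_last_dot(value: str) -> str:
    # Python writes the group reference as \1 where .NET uses $1.
    return TAIL.sub(r"\1", value, count=1)

print(tail_after_second_last_dot("123.456.78.910.abcdefgh"))  # 910.abcdefgh

# The original attribute value, slashes and flags included, is treated as a
# plain pattern: after "$" it still demands the literal text "/gm", so it can
# never match and the input comes back unchanged.
literal_style = re.compile(r"/(\w+\.\w+)$/gm")
print(literal_style.sub(r"\1", "123.456.78.910.abcdefgh"))  # 123.456.78.910.abcdefgh
```

A value with fewer than two dots does not match at all, so it is returned as-is.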

Related

How to match everything except strings between brackets? [duplicate]

I have a text in which I want to get only the hexadecimal codes.
Like: "thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
It's possible to match the hex codes with \x..
But it doesn't seem that I can use something like (^\x..) to select everything but the hex codes.
Any workarounds?

You may use a (?s)((?:\\x[a-fA-F0-9]{2})+)|. regex (that will match and capture into Group 1 any 1+ sequences of hex values OR will just match any other char including a line break char) and replace with a conditional replacement pattern (?{1}$1\n:) (that will reinsert the hex value chain or will replace the match with an empty string):
Find What: (?s)((?:\\x[a-fA-F0-9]{2})+)|.
Replace With: (?{1}$1\n:)
Regex Details:
(?s) - same as turning the ". matches newline" option ON
((?:\\x[a-fA-F0-9]{2})+) - Group 1, capturing one or more sequences of:
\\x - a literal \x
[a-fA-F0-9]{2} - two hex digits (letters a to f in either case, or digits)
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matches:
$1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
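
The conditional replacement syntax above is specific to the editor's regex engine. As a rough equivalent, here is a minimal Python sketch (the function name is hypothetical, and Python is only a stand-in) that uses a replacement function to play the role of (?{1}$1\n:): keep Group 1 plus a newline when it matched, otherwise drop the character.

```python
import re

# Same find pattern as above: a run of \xhh codes in group 1, or any one char.
HEX_OR_ANY = re.compile(r"(?s)((?:\\x[a-fA-F0-9]{2})+)|.")

def keep_hex_runs(text: str) -> str:
    # Group 1 matched -> reinsert the run followed by a newline;
    # otherwise the single matched character is replaced with nothing.
    return HEX_OR_ANY.sub(
        lambda m: m.group(1) + "\n" if m.group(1) else "", text
    )

sample = (
    r"thisissometextthisistext"
    r"\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65"
    r"somemoretextoverhere"
)
print(keep_hex_runs(sample))
```

Each run of hex codes ends up on its own line, and all other text is removed.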

Try ^.*?((\\x[a-f0-9]{2})+).*$ and replace with $1.
After the replacement, only the hex codes should be left.

If you are already able to find the hexcodes with your regex, couldn't you just use that information to delete all of the hexcodes from the string (or from a clone of the string if you need to preserve the original) and you would be left with all text except for hexcodes.
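
A minimal sketch of that suggestion, in Python for illustration (the function name is hypothetical): delete every \xhh code and keep whatever remains. Python strings are immutable, so the original string is preserved automatically.

```python
import re

# One \xhh code: a backslash, an x, and two hex digits.
HEX_CODE = re.compile(r"\\x[a-fA-F0-9]{2}")

def strip_hex_codes(text: str) -> str:
    # Returns a new string with every hex code removed.
    return HEX_CODE.sub("", text)

sample = r"thisissometext\x64\x6fmoretext"
print(strip_hex_codes(sample))  # thisissometextmoretext
```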

^ acts as a negation token only inside (and at the beginning) of a character class, you can't use it to negate substrings of several characters.
To select all that isn't \xhh you can use this pattern:
\G(?:\\x[a-f0-9]{2})*+\K(?=.|\n)[^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*
It matches the \xhh sequences first and drops them from the match result using the \K feature (which discards everything matched to its left). The rest of the pattern, [^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*, matches everything that isn't a \xhh. Since this subpattern can match the empty string at the end of the string, I added the lookahead (?=.|\n) to ensure there is at least one character.
\G forces all matches to be contiguous. In other words, it matches the position at the end of the previous match.
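
Not every engine supports \G and \K; Python's built-in re module, for instance, does not. Where they are unavailable, a different technique gives a similar result: split the text on runs of hex codes and keep the pieces in between. The sketch below is that substitute, not the pattern above, and its function name is hypothetical.

```python
import re

# A run of one or more \xhh codes, used here as the separator.
HEX_RUN = re.compile(r"(?:\\x[a-f0-9]{2})+")

def non_hex_parts(text: str) -> list:
    # Splitting on the hex runs leaves the text between them;
    # empty pieces (e.g. when the text starts with a hex code) are dropped.
    return [part for part in HEX_RUN.split(text) if part]

sample = r"abc\x64\x6fdef\x74ghi"
print(non_hex_parts(sample))  # ['abc', 'def', 'ghi']
```

A backslash that is not followed by x and two hex digits is not a separator, so it stays inside its piece.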

Regex about url encoded string

I would like to write a regex that gets the URL-encoded string in the line below:
<topicref href="%E4%BA%B0.txt"/>
When I used a regex like (%[A-Z][0-9])+\.txt, it only got %B0.txt. What can I do to get the whole URL-encoded string, such as %E4%BA%B0.txt?
Thanks a lot.

Proper URL encoding uses hex digits only: A-F, not A-Z. The encoded URL could also contain non-encoded characters anywhere. Also, you should escape the full stop.
((%[0-9A-F]{2}|[^<>'" %])+)\.txt
is a quick ad-hoc fix for your regex. For production code, though, probably don't use a regex for this at all, or at the very least try a well-defined and properly tested URL regex like the one you can find in the HTTP RFC.
Putting the + quantifier outside the capturing parentheses will only return the last repetition. I added a second set of parentheses so that the quantifier sits inside the first capture group, which assumes you are extracting the first capture group in particular. (If your regex dialect has non-capturing groups, you could make the inner group non-capturing by opening it with (?: instead of a plain parenthesis.)
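
To see the difference between the two patterns, here is a small Python sketch run against the line from the question (Python is used only for illustration; the variable names are hypothetical).

```python
import re

# The pattern from the question: each repetition needs a letter then a digit.
ORIGINAL = re.compile(r"(%[A-Z][0-9])+\.txt")

# The ad-hoc fix: two hex digits after %, or any char other than <>'" % and space.
FIXED = re.compile(r"""((%[0-9A-F]{2}|[^<>'" %])+)\.txt""")

line = '<topicref href="%E4%BA%B0.txt"/>'

# %E4 fits letter+digit, but %BA does not, so only the last code matches.
print(ORIGINAL.search(line).group(0))  # %B0.txt

# With the quantifier inside group 1, the whole encoded name is captured.
print(FIXED.search(line).group(0))     # %E4%BA%B0.txt
print(FIXED.search(line).group(1))     # %E4%BA%B0
```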

You need to change your regex to
([%\dA-Z]+)\.txt
([%\dA-Z]+) - Match %, digits and uppercase letters one or more times
\.txt - Match .txt
whereas your regex
(%[A-Z][0-9])+.txt
means:
(%[A-Z][0-9])+
% - Match %
[A-Z] - Match one letter from A to Z
[0-9] - Match one digit
+ - Match the captured group one or more times
.txt - Match any single character (except a newline) followed by txt

RegEx for capturing everything except numbers and one word

I am quite stuck with a regex I can't get to work. It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
I have tried something like (?!\d|fiktiv).* on my sample string 123456788daswqrt fiktiv
https://regex101.com/r/kU8mF3/1
However, this matches the fiktiv at the end as well.

One possibility would be to use a negated character class, which you get by putting a ^ at the start of the [] brackets. So you basically say: match as many non-digits as you can get, until whitespace occurs and the word fiktiv appears.
The matched text is "saved" in capturing group 1 for later use.
([^\d]+)\s+fiktiv
Testing could be done here:
https://regex101.com/
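
As a quick check of this pattern against the sample string from the question, here is a minimal Python sketch (Python is an assumption for illustration; the helper name is hypothetical).

```python
import re

# Non-digits captured in group 1, followed by whitespace and the word fiktiv.
BEFORE_FIKTIV = re.compile(r"([^\d]+)\s+fiktiv")

def text_before_fiktiv(text: str):
    # Returns the captured non-digit text, or None when nothing matches.
    match = BEFORE_FIKTIV.search(text)
    return match.group(1) if match else None

print(text_before_fiktiv("123456788daswqrt fiktiv"))  # daswqrt
```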

It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
So, you want to remove any character that is not a digit (that is, one matching the \D or [^0-9] pattern) and is not part of a fiktiv char sequence.
You may use a regex with a capturing group and alternation:
(fiktiv)|[^0-9]
and replace with a $1 backreference, which holds the contents of Group 1 (fiktiv), to restore that word in the result.
See the regex demo
C# implementation:
Regex.Replace(input, "(fiktiv)|[^0-9]", "$1")
Also, see Use RegEx in SQL with CLR Procs.
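
For comparison, the same capture-and-restore trick as a minimal Python sketch (a stand-in for the C# call; the function name is hypothetical). A replacement function returns Group 1 when it matched and an empty string otherwise, which mirrors what $1 does here.

```python
import re

# Either the whole word fiktiv (captured) or any single non-digit character.
KEEP = re.compile(r"(fiktiv)|[^0-9]")

def keep_digits_and_word(text: str) -> str:
    # Matched word -> put it back; any other non-digit -> remove it.
    return KEEP.sub(lambda m: m.group(1) or "", text)

print(keep_digits_and_word("123456788daswqrt fiktiv"))  # 123456788fiktiv
```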

Regular expression to match non-integer values in a string

I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern. I'm matching the inverse so that I can throw an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside is that this pattern works for all cases except one: a hyphen in the middle of the string still passes. For example:
"-2304923" is allowed correctly, but "9234-342" is also allowed and shouldn't be.
Please let me know how I can specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!

This regex will work for you:
^-?\d+$
Explanation: start of the string (^), then an optional hyphen (-?), then the digit class \d repeated one or more times (+), and the string must end here ($).
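
A minimal Python sketch of this check, using the two strings from the question (Python and the function name are assumptions for illustration):

```python
import re

# Optional leading hyphen, then digits only, anchored at both ends.
INTEGER = re.compile(r"^-?\d+$")

def is_integer(text: str) -> bool:
    return INTEGER.search(text) is not None

print(is_integer("-2304923"))  # True
print(is_integer("9234-342"))  # False
```

In Python specifically, $ also matches just before a trailing newline, so re.fullmatch with -?\d+ is the stricter choice if that matters.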

You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
(?:^|\s) - non-capturing group for start of line or space
(-?\d+) - captures the number
(?:["'\s]|$) - non-capturing group for end of line, space or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
Sample input: -2304923" "9234-342" 1234 -1234
-2304923 - captured
"9234-342" - NOT captured
1234 - captured
-1234 - captured

I don't understand how your pattern, [^-0-9], is matching the strings you are talking about. That pattern is just the opposite of what you want: you have negated the character class by putting a caret (^) at its beginning, so it matches any character except a hyphen or a digit.
Anyway, for your requirement, you first need to match an optional hyphen at the beginning, so keep it outside the character class. Then, to match the digits that follow, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-?[0-9]+ // or -?\d+
The above regex finds the pattern inside some larger string. If you want the entire string to match this pattern, add anchors at both ends of the regex:
^-?[0-9]+$

For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])
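
This pattern keeps the inverse approach from the question: a match means the string breaks the rules. A minimal Python sketch (the function name is hypothetical, and Python is only used for illustration):

```python
import re

# First alternative: a bad first character (neither hyphen nor digit).
# Second alternative: at least one char, then a non-digit anywhere after it.
INVALID = re.compile(r"(^[^-0-9]|^.+?[^0-9])")

def is_invalid(text: str) -> bool:
    return INVALID.search(text) is not None

print(is_invalid("-2304923"))  # False
print(is_invalid("9234-342"))  # True
```

Because it only looks for offending characters, a string with nothing to object to, such as an empty one, is not flagged; a separate length check may be needed.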
