Regex to extract multiple substrings - regex

Example string containing one or more variables comma separated: TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE .....
I need to use Regex to extract the characters before the . and end up with a string like this: TR, TU, XP
thanks in advance!

This regex works for what you need:
\..+?(?=,|$)
You need to substitute with nothing (an empty string).
This matches a ., then anything up to a comma or string end.
Example of it working: https://regex101.com/r/cV5hS2/1
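A minimal Python sketch of that substitution, assuming Python's `re` module as the engine (the question does not name one). The helper name `strip_suffixes` is made up for illustration:

```python
import re

# Pattern from the answer: a literal dot, then as few characters as possible
# up to (but not including) a comma or the end of the string.
PATTERN = re.compile(r"\..+?(?=,|$)")

def strip_suffixes(text):
    """Remove everything from each '.' up to the next comma or string end."""
    return PATTERN.sub("", text)

print(strip_suffixes("TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE"))  # TR, TU, XP
```

Because the lookahead does not consume the comma, the separators stay in place and only the suffixes disappear.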

Related

REGEX string extraction between underscore and file extension

I have a series of string like the following one:
abc_8g_1980_312.tif
from which I would like to extract the string '312', i.e. everything between the 3rd underscore and the file extension '.tif'.
I'm testing on https://regex101.com/
with this regular expression: (\d{3})(\.tif$)
but I'm not getting what I would like to have.
Any suggestions would be appreciated.
To get the last 3 digits after an underscore with extension .tif you can also use lookarounds asserting _ to the left, and .tif to the right at the end of the string.
(?<=_)\d{3}(?=\.tif$)
Regex demo
Assuming what you want to capture would always be the last underscore-separated term in your file name, you could use:
(?<=_)[^_]+(?=\.)
Demo
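Both lookaround patterns can be tried side by side. This is only a sketch in Python (the question does not say which language is in use), and the variable names are illustrative:

```python
import re

filename = "abc_8g_1980_312.tif"

# Exactly three digits, with "_" to the left and ".tif" at the end to the right.
digits = re.search(r"(?<=_)\d{3}(?=\.tif$)", filename)

# Any run of non-underscore characters after "_" and directly before a dot.
last_term = re.search(r"(?<=_)[^_]+(?=\.)", filename)

print(digits.group())     # 312
print(last_term.group())  # 312
```

The lookarounds only assert their context, so the match itself is just `312` and no capture group is needed.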

Lua pattern similar to regex positive lookahead?

I have a string which can contain any number of the delimiter §\n. I would like to remove all delimiters from a string, except the last occurrence which should be left as-is. The last delimiter can be in three states: \n, §\n or §§\n. There will never be any characters after the last variable delimiter.
Here are 3 examples with the different state delimiters:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
I would like to remove all delimiters except the last occurrence.
So the result of gsub for the three examples above should be:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
Using regular expressions, one could use §\\n(?=.), which matches properly for all three cases using positive lookahead, as there will never be any characters after the last variable delimiter.
I know I could check if the string has the delimiter at the end, and then after a substitution using the Lua pattern §\n I could add the delimiter back onto the string. That is however a very inelegant solution to a problem which should be possible to solve using a Lua pattern alone.
So how could this be done using a Lua pattern?
str:gsub( '§\\n(.)', '%1' ) should do what you want. This deletes the delimiter when it is followed by another character, and puts that character back into the string.
Test code
local str = {
'abc§\\ndef§\\nghi\\n',
'abc§\\ndef§\\nghi§\\n',
'abc§\\ndef§\\nghi§§\\n',
}
for i = 1, #str do
print( ( str[ i ]:gsub( '§\\n(.)', '%1' ) ) )
end
yields
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
EDIT: This answer doesn't work in Lua specifically, but if you have a similar problem and are not constrained to Lua, you might be able to use it.
So if I understand correctly, you want a regex replace to make the first example look like the second. This:
/(.*?)§\\n(?=.*\\n)/g
will eliminate the non-last delimiters when replaced with
$1
in PCRE, at least. I'm not sure what flavor Lua follows, but you can see the example in action here.
REGEX:
/(.*?)§\\n(?=.*\\n)/g
TEST STRING:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
SUBSTITUTION:
$1
RESULT:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
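For readers outside Lua, here is a rough Python sketch of the two lookahead-based regexes quoted in this thread. It assumes, as in the Lua test code, that the delimiter is the literal two characters backslash and n rather than a real newline; the function names are hypothetical:

```python
import re

# Raw strings: each "\n" below is a literal backslash followed by "n".
samples = [
    r"abc§\ndef§\nghi\n",
    r"abc§\ndef§\nghi§\n",
    r"abc§\ndef§\nghi§§\n",
]

# Regex from the question: a delimiter that is followed by at least one character.
LOOKAHEAD = re.compile(r"§\\n(?=.)")

# Regex from the PCRE answer: keep the text before a delimiter, drop the
# delimiter, as long as another "\n" still follows somewhere later.
PCRE_STYLE = re.compile(r"(.*?)§\\n(?=.*\\n)")

def drop_with_lookahead(text):
    return LOOKAHEAD.sub("", text)

def drop_pcre_style(text):
    return PCRE_STYLE.sub(r"\1", text)

for s in samples:
    print(drop_with_lookahead(s), drop_pcre_style(s))
```

Both variants leave the final delimiter untouched because nothing follows it, which is the property the Lua pattern emulates by capturing and re-inserting the next character.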

RegEx to match string between delimiters or at the beginning or end

I am processing a CSV file and want to search and replace strings as long as it is an exact match in the column. For example:
xxx,Apple,Green Apple,xxx,xxx
Apple,xxx,xxx,Apple,xxx
xxx,xxx,Fruit/Apple,xxx,Apple
I want to replace 'Apple' if it is the EXACT value in the column (if it is contained in text within another column, I do not want to replace). I cannot see how to do this with a single expression (maybe not possible?).
The desired output is:
xxx,GRAPE,Green Apple,xxx,xxx
GRAPE,xxx,xxx,GRAPE,xxx
xxx,xxx,Fruit/Apple,xxx,GRAPE
So the expression I want is: match the beginning of input OR a comma, followed by desired string, followed by a comma OR the end of input.
You cannot put ^ or $ in character classes, so I tried \A and \Z but that didn't work.
([\A,])Apple([\Z,])
This didn't work, sadly. Can I do this with one regular expression? Seems like this would be a common enough problem.
It will depend on your language, but if the one you use supports lookarounds, then you would use something like this:
(?<=,|^)Apple(?=,|$)
Replace with GRAPE.
Otherwise, you will have to put back the commas:
(^|,)Apple(,|$)
Or
(\A|,)Apple(,|\Z)
And replace with:
\1GRAPE\2
Or
$1GRAPE$2
Depending on what's supported.
The above are raw regex (and replacement) strings. Escape as necessary.
Note: The disadvantage of the latter solution is that it will not work on strings like:
xxx,Apple,Apple,xxx,xxx
Since the comma after the first Apple got consumed. You'd have to call the regex replacement at most twice if you have such cases.
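The consumed-comma problem is easy to reproduce. The following sketch uses Python's `re` module purely as an example engine; `replace_once` is an illustrative name:

```python
import re

# Capturing version: the surrounding commas are consumed and put back.
PATTERN = re.compile(r"(^|,)Apple(,|$)")

def replace_once(line):
    """Run a single global replacement pass over the line."""
    return PATTERN.sub(r"\1GRAPE\2", line)

line = "xxx,Apple,Apple,xxx,xxx"
first_pass = replace_once(line)
second_pass = replace_once(first_pass)

print(first_pass)   # xxx,GRAPE,Apple,xxx,xxx
print(second_pass)  # xxx,GRAPE,GRAPE,xxx,xxx
```

The first pass skips the second Apple because its leading comma was already part of the previous match; the second pass picks it up.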
Oh, and I forgot to mention, you can have some 'hybrids', since languages have different levels of support for lookbehinds (in all of the below, ^ and \A, $ and \Z, \1 and $1 are interchangeable, just so I don't make this longer than it already is):
(?:(?<=,)|(?<=^))Apple(?=,|$)
For those where lookbehinds cannot be of variable width, replace with GRAPE.
(^|,)Apple(?=,|$)
And the above one for where lookaheads are supported but not lookbehinds. Replace with \1GRAPE.
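As one concrete case of these hybrids, here is a sketch for Python's built-in `re` module, which requires fixed-width lookbehinds and therefore rejects `(?<=,|^)`. The rewritten pattern and the `replace_exact` helper are assumptions made for this example, not part of the answer above:

```python
import re

# "^" is tried as a plain alternative, and the lookbehind only checks for a
# comma, so each branch of the lookbehind has a fixed width.
# re.MULTILINE lets ^ and $ match at every line of the CSV text.
PATTERN = re.compile(r"(?:^|(?<=,))Apple(?=,|$)", re.MULTILINE)

def replace_exact(csv_text):
    """Replace Apple only where it is the entire value of a column."""
    return PATTERN.sub("GRAPE", csv_text)

csv_text = (
    "xxx,Apple,Green Apple,xxx,xxx\n"
    "Apple,xxx,xxx,Apple,xxx\n"
    "xxx,xxx,Fruit/Apple,xxx,Apple"
)
print(replace_exact(csv_text))
```

Since no comma is consumed, adjacent columns such as `xxx,Apple,Apple,xxx` are handled in a single pass.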
This does as you wish:
Find what: (^|,)(?:Apple)(,|$)
Replace with: $1GRAPE$2
This works on regex101, in all flavors.
http://regex101.com/r/iP6dZ8
I wanted to share my original work-around (before the other answers), though it feels like more of a hack.
I simply prepend and append a comma on the string before doing the simpler:
/,Apple,/,GRAPE,/g
then cut off the first and last character.
PHP looks like:
$line = substr(preg_replace($search, $replace, ','.$line.','), 1, -1);
This still suffers from the problem of consecutive columns (e.g. ",Apple,Apple,").
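The same padding trick, sketched in Python rather than PHP for comparison (`replace_padded` is a hypothetical name):

```python
import re

def replace_padded(line):
    """Pad the line with commas, replace ',Apple,', then strip the padding."""
    padded = "," + line + ","
    replaced = re.sub(r",Apple,", ",GRAPE,", padded)
    return replaced[1:-1]  # cut off the first and last character

print(replace_padded("Apple,xxx,xxx,Apple,xxx"))  # GRAPE,xxx,xxx,GRAPE,xxx
print(replace_padded("xxx,Apple,Apple,xxx"))      # xxx,GRAPE,Apple,xxx
```

The second call shows the consecutive-column limitation: the comma shared by the two Apple columns can only belong to one match.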

Regular Expression to split a sentence into hyphenated words

I'm looking for a regular expression that will split a sentence into words, by using both spaces and hyphens as the character to split at. i.e. "This is over-done" should return 4 words (this, is, over, done)
I have the RegEx to do these separately but can't get it to work together:
To split on spaces:
\b(\S)(\S*)\b
and to split on hyphens:
\b([^-])([^-]*)\b
I have tried various ways to put these together but can't get it working. Any help appreciated.
This should work:
\b([^-\s]+)\b
What about:
(?:^|[\s-])?(\w+)(?:$|[\s-])?
Demo: http://rubular.com/r/WiRSwFPTXa
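A quick way to check the first pattern, sketched in Python (the question does not name a language). The `re.split` variant is an extra alternative not taken from the answers, and both function names are illustrative:

```python
import re

sentence = "This is over-done"

def words_by_matching(text):
    """Collect runs of characters that are neither hyphens nor whitespace."""
    return re.findall(r"\b([^-\s]+)\b", text)

def words_by_splitting(text):
    """Alternative: split directly on runs of hyphens and whitespace."""
    return re.split(r"[-\s]+", text)

print(words_by_matching(sentence))   # ['This', 'is', 'over', 'done']
print(words_by_splitting(sentence))  # ['This', 'is', 'over', 'done']
```

Splitting is often simpler when the separators are known, while matching avoids empty strings if the text starts or ends with a separator.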

match chars, numeric and special chars within a pattern

within a string i could have the following:
this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::
how can i get all parts which start with :: and end with :: and match what is in between?
within those colons there are key, value pairs i need to filter out of the string.
if there weren't any special chars, the regex would look like this:
r'::([a-z0-9]+):([a-z0-9]+)::'
i could list those special chars manually but i don't think that's the right way to do this.
thx
With not-colon:
r'::([^:]+):([^:]+)::'
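Since the patterns use Python-style raw strings, here is a small Python sketch of the not-colon version applied to the sample text (the `extract_pairs` name is made up for this example):

```python
import re

text = "this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::"

# Key and value are each "one or more characters that are not a colon",
# so umlauts, symbols and slashes are all accepted without listing them.
PAIR = re.compile(r"::([^:]+):([^:]+)::")

def extract_pairs(s):
    """Return a list of (key, value) tuples found between '::' markers."""
    return PAIR.findall(s)

for key, value in extract_pairs(text):
    print(key, "=", value)
```

A negated character class sidesteps the need to enumerate special characters; the only character it rules out is the colon itself.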
First you should mention the regex flavor/tool you'd like to use, but generally:
r'::([^:]+)::'
Should capture the special chars as well.
HTH