Regex to extract multiple substrings

Example string containing one or more comma-separated variables: TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE .....
I need to use a regex to extract the characters before each . and end up with a string like this: TR, TU, XP
Thanks in advance!

This regex works for what you need:
\..+?(?=,|$)
Replace each match with nothing (an empty string).
This matches a literal ., then as few characters as possible (at least one) up to, but not including, the next comma or the end of the string.
Example of it working: https://regex101.com/r/cV5hS2/1
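A minimal sketch of the same substitution in Python 3's re module (an assumption; the answer does not name a language). The helper name strip_suffixes is hypothetical:

```python
import re

# Literal "." plus at least one character, lazily, stopping just before the
# next comma or the end of the string (the lookahead itself is not consumed).
SUFFIX = re.compile(r'\..+?(?=,|$)')


def strip_suffixes(s: str) -> str:
    """Hypothetical helper: drop the ".XXX" part of every comma-separated variable."""
    return SUFFIX.sub('', s)


print(strip_suffixes('TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE'))  # TR, TU, XP
```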

Related

REGEX string extraction between underscore and file extension

I have a series of strings like the following one:
abc_8g_1980_312.tif
from which I would like to extract the string '312', i.e. everything between the third underscore and the file extension '.tif'.
I'm testing on this website: https://regex101.com/
with this regular expression: (\d{3})(\.tif$)
but I'm not getting what I want.
Any suggestions would be appreciated.

To get the last three digits after an underscore, followed by the extension .tif, you can also use lookarounds: assert _ to the left, and .tif at the end of the string to the right. Lookarounds are not part of the match, so the match is just the digits.
(?<=_)\d{3}(?=\.tif$)
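A short Python 3 sketch (assuming the re module) of the lookaround pattern. It also shows why the question's pattern seemed not to work: its whole match includes .tif, and the digits alone sit in capture group 1.

```python
import re

name = 'abc_8g_1980_312.tif'

# "_" must precede and ".tif" must end the string, but neither is matched.
m = re.search(r'(?<=_)\d{3}(?=\.tif$)', name)
print(m.group())  # 312

# The question's pattern: the full match (group 0) includes ".tif".
q = re.search(r'(\d{3})(\.tif$)', name)
print(q.group(0), q.group(1))  # 312.tif 312
```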

Assuming what you want to capture would always be the last underscore-separated term in your file name, you could use:
(?<=_)[^_]+(?=\.)
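A Python 3 sketch of this variant. Like the answer, it assumes the only . after an underscore is the one starting the extension; the second file name is a hypothetical example.

```python
import re

# Non-underscore characters right after an "_", ending just before a ".".
# [^_]+ is greedy, then backtracks until a "." follows; it never crosses "_".
LAST_TERM = re.compile(r'(?<=_)[^_]+(?=\.)')

for name in ['abc_8g_1980_312.tif', 'scan_2020_final.png']:
    print(LAST_TERM.search(name).group())
```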

Lua pattern similar to regex positive lookahead?

I have a string which can contain any number of occurrences of the delimiter §\n. I would like to remove all delimiters from a string, except the last occurrence which should be left as-is. The last delimiter can be in three states: \n, §\n or §§\n. There will never be any characters after the last variable delimiter.
Here are 3 examples with the different state delimiters:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
I would like to remove all delimiters except the last occurrence.
So the result of gsub for the three examples above should be:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
Using regular expressions, one could use §\\n(?=.), which matches properly for all three cases using positive lookahead, as there will never be any characters after the last variable delimiter.
I know I could check if the string has the delimiter at the end, and then after a substitution using the Lua pattern §\n I could add the delimiter back onto the string. That is however a very inelegant solution to a problem which should be possible to solve using a Lua pattern alone.
So how could this be done using a Lua pattern?

str:gsub( '§\\n(.)', '%1' ) should do what you want. It deletes the delimiter whenever it is followed by another character, putting that character back into the string.
Test code
local str = {
'abc§\\ndef§\\nghi\\n',
'abc§\\ndef§\\nghi§\\n',
'abc§\\ndef§\\nghi§§\\n',
}
for i = 1, #str do
print( ( str[ i ]:gsub( '§\\n(.)', '%1' ) ) )
end
yields
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
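For comparison outside Lua, a Python 3 sketch (an assumption; not Lua code) of the same capture-and-restore idea next to the lookahead form from the question. Both give the same results on the three samples:

```python
import re

# As in the Lua test code, "\\n" here is a literal backslash followed by "n".
samples = [
    'abc§\\ndef§\\nghi\\n',
    'abc§\\ndef§\\nghi§\\n',
    'abc§\\ndef§\\nghi§§\\n',
]

results = []
for s in samples:
    # Capture-and-restore: consume the delimiter plus the next character and
    # write that character back (Lua's '%1' becomes r'\1' in Python).
    restored = re.sub(r'§\\n(.)', r'\1', s)
    # Lookahead form from the question: drop a delimiter only if something follows.
    lookahead = re.sub(r'§\\n(?=.)', '', s)
    assert restored == lookahead
    results.append(restored)
    print(restored)
```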

EDIT: This answer doesn't work in Lua specifically, but if you have a similar problem and are not constrained to Lua, you might be able to use it.
So if I understand correctly, you want a regex replacement that turns each input above into the corresponding result. This:
/(.*?)§\\n(?=.*\\n)/g
will eliminate the non-last delimiters when replaced with
$1
in PCRE, at least. Lua patterns are not a regex flavor and do not support lookahead, which is why this does not carry over to Lua. You can see the example in action below.
REGEX:
/(.*?)§\\n(?=.*\\n)/g
TEST STRING:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
SUBSTITUTION:
$1
RESULT:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
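The same replacement sketched with Python 3's re module (an assumption; the answer only claims PCRE), where $1 is written \1:

```python
import re

# Three test lines separated by real newlines; "\\n" is a literal backslash-n.
text = 'abc§\\ndef§\\nghi\\n\nabc§\\ndef§\\nghi§\\n\nabc§\\ndef§\\nghi§§\\n'

# "." does not match a real newline, so the lookahead only looks for a later
# literal "\n" on the same line; the last delimiter of each line is kept.
result = re.sub(r'(.*?)§\\n(?=.*\\n)', r'\1', text)
print(result)
```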

RegEx to match string between delimiters or at the beginning or end

I am processing a CSV file and want to search and replace a string only when it is an exact match for the whole column value. For example:
xxx,Apple,Green Apple,xxx,xxx
Apple,xxx,xxx,Apple,xxx
xxx,xxx,Fruit/Apple,xxx,Apple
I want to replace 'Apple' only if it is the EXACT value of the column (if it is only part of the text in a column, I do not want to replace it). I cannot see how to do this with a single expression (maybe it is not possible?).
The desired output is:
xxx,GRAPE,Green Apple,xxx,xxx
GRAPE,xxx,xxx,GRAPE,xxx
xxx,xxx,Fruit/Apple,xxx,GRAPE
So the expression I want is: match the beginning of input OR a comma, followed by the desired string, followed by a comma OR the end of input.
You cannot put ^ or $ in character classes, so I tried \A and \Z but that didn't work.
([\A,])Apple([\Z,])
This didn't work, sadly. Can I do this with one regular expression? Seems like this would be a common enough problem.

It will depend on your language, but if the one you use supports lookarounds, then you would use something like this:
(?<=,|^)Apple(?=,|$)
Replace with GRAPE.
Otherwise, you will have to put back the commas:
(^|,)Apple(,|$)
Or
(\A|,)Apple(,|\Z)
And replace with:
\1GRAPE\2
Or
$1GRAPE$2
Depending on what's supported.
The above are raw regex (and replacement) strings. Escape as necessary.
Note: The disadvantage of the latter solution is that it will not work on strings like:
xxx,Apple,Apple,xxx,xxx
because the comma after the first Apple is consumed by the first match, leaving no leading comma for the second Apple. If you have such cases, you'd have to run the regex replacement a second time.
Oh, and I forgot to mention: you can have some 'hybrids', since some languages have different levels of support for lookbehinds (in all of the patterns below, ^ and \A, $ and \Z, and \1 and $1 are interchangeable; I just don't want to make this longer than it already is):
(?:(?<=,)|(?<=^))Apple(?=,|$)
For those where lookbehinds cannot be of variable width, replace with GRAPE.
(^|,)Apple(?=,|$)
And the above one is for flavors where lookaheads are supported but lookbehinds are not. Replace with \1GRAPE.
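A Python 3 sketch of the two lookaround variants, assuming the re module. re only accepts fixed-width lookbehinds, so (?<=,|^) is rejected there and the hybrid form is the one to use. The last row is a hypothetical example with two adjacent Apple columns:

```python
import re

# Two fixed-width lookbehinds instead of one variable-width (?<=,|^).
HYBRID = re.compile(r'(?:(?<=,)|(?<=^))Apple(?=,|$)')
# Lookahead only: the leading comma (or empty start) is captured and restored.
LOOKAHEAD_ONLY = re.compile(r'(^|,)Apple(?=,|$)')

rows = [
    'xxx,Apple,Green Apple,xxx,xxx',
    'Apple,xxx,xxx,Apple,xxx',
    'xxx,xxx,Fruit/Apple,xxx,Apple',
    'xxx,Apple,Apple,xxx,xxx',  # trailing comma is never consumed, so both match
]

hybrid_out = [HYBRID.sub('GRAPE', r) for r in rows]
lookahead_out = [LOOKAHEAD_ONLY.sub(r'\1GRAPE', r) for r in rows]

for line in hybrid_out:
    print(line)
```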

This does as you wish:
Find what: (^|,)(?:Apple)(,|$)
Replace with: $1GRAPE$2
This works on regex101, in all flavors.
http://regex101.com/r/iP6dZ8
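A Python 3 sketch of this pattern (assuming the re module, where $1/$2 are written \1/\2; the helper name replace_column is hypothetical). It also shows the adjacent-column limitation noted earlier and that a second pass fixes it:

```python
import re

# Both separators are captured (consumed) and written back around GRAPE.
PATTERN = re.compile(r'(^|,)(?:Apple)(,|$)')


def replace_column(line: str) -> str:
    return PATTERN.sub(r'\1GRAPE\2', line)


print(replace_column('xxx,xxx,Fruit/Apple,xxx,Apple'))  # xxx,xxx,Fruit/Apple,xxx,GRAPE
# The comma between the two Apples is consumed by the first match,
# so one pass leaves the second Apple; a second pass catches it.
once = replace_column('xxx,Apple,Apple,xxx,xxx')
print(once)                  # xxx,GRAPE,Apple,xxx,xxx
print(replace_column(once))  # xxx,GRAPE,GRAPE,xxx,xxx
```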

I wanted to share my original work-around (before the other answers), though it feels like more of a hack.
I simply prepend and append a comma to the string before doing the simpler:
/,Apple,/,GRAPE,/g
and then cut off the first and last characters.
PHP looks like:
$line = substr(preg_replace($search, $replace, ','.$line.','), 1, -1);
This still suffers from the problem of consecutive columns (e.g. ",Apple,Apple,").
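The same work-around sketched in Python 3. Because the search text is a fixed string, plain str.replace stands in for preg_replace here (a swapped-in call); the helper name replace_exact is hypothetical:

```python
def replace_exact(line: str, old: str, new: str) -> str:
    # Pad with commas so the first and last columns look like inner ones,
    # replace ",old," with ",new,", then strip the padding again.
    padded = ',' + line + ','
    return padded.replace(',' + old + ',', ',' + new + ',')[1:-1]


print(replace_exact('Apple,xxx,xxx,Apple,xxx', 'Apple', 'GRAPE'))  # GRAPE,xxx,xxx,GRAPE,xxx
# Same limitation: the shared comma is used up by the first replacement.
print(replace_exact('xxx,Apple,Apple,xxx,xxx', 'Apple', 'GRAPE'))  # xxx,GRAPE,Apple,xxx,xxx
```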

Regular Expression to split a sentence into hyphenated words

I'm looking for a regular expression that will split a sentence into words, by using both spaces and hyphens as the character to split at. i.e. "This is over-done" should return 4 words (this, is, over, done)
I have the RegEx to do these separately but can't get it to work together:
To split on spaces:
\b(\S)(\S*)\b
and to split on hyphens:
\b([^-])([^-]*)\b
I have tried various ways to put these together but can't get it working. Any help appreciated.

This should work:
\b([^-\s]+)\b
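A quick check of this pattern with Python 3's re.findall (an assumption about the environment). A plain split on whitespace or hyphens, a different technique not from the answer, gives the same words here:

```python
import re

sentence = 'This is over-done'

# [^-\s]+ is a run of characters that are neither a hyphen nor whitespace.
words = re.findall(r'\b([^-\s]+)\b', sentence)
print(words)  # ['This', 'is', 'over', 'done']

# Alternative (not from the answer): split on runs of whitespace or hyphens.
print(re.split(r'[\s-]+', sentence))  # ['This', 'is', 'over', 'done']
```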

What about:
(?:^|[\s-])?(\w+)(?:$|[\s-])?
Demo: http://rubular.com/r/WiRSwFPTXa
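The demo above uses Ruby; a Python 3 sketch (an assumption) shows the same pattern. With one capturing group, re.findall returns only the (\w+) part of each match:

```python
import re

# Optional non-capturing groups absorb a leading/trailing space or hyphen.
words2 = re.findall(r'(?:^|[\s-])?(\w+)(?:$|[\s-])?', 'This is over-done')
print(words2)  # ['This', 'is', 'over', 'done']
```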

Match letters, digits and special characters within a pattern

Within a string I could have the following:
this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::
How can I get all parts that start with :: and end with ::, and match what is in between?
Within those colons there are key:value pairs I need to pull out of the string.
If there were no special characters, the regex would look like this:
r'::([a-z0-9]+):([a-z0-9]+)::'
I could list those special characters manually, but I don't think that's the right way to do this.
Thanks

Use a negated character class (anything but a colon):
r'::([^:]+):([^:]+)::'
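A Python 3 sketch (assuming the question's r'' strings mean Python's re) that applies this pattern to the sample string and collects the key:value pairs into a dict:

```python
import re

text = 'this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::'

# [^:]+ is "one or more characters that are not a colon", so letters, digits,
# accented characters and symbols are all accepted as keys and values.
pairs = re.findall(r'::([^:]+):([^:]+)::', text)
print(dict(pairs))
```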

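First you should mention the regex flavor/tool you'd like to use, but generally:
r'::([^:]+:[^:]+)::'
should capture each whole key:value pair, special characters included (a single [^:]+ cannot span the colon inside the pair, so that colon is matched explicitly).
HTH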
