Regex split by delimiter if no char before delimiter - regex

test;136;1234567890;Som/;e;Test;test /;123;qwertyuio;dfghjk
I need to split it to:
test, 136, 1234567890, Som/;e, Test, test /;123, qwertyuio, dfghjk
The delimiter is ";", but this character can also appear in the text, so in that case I add "/" before ";" in my code. However, I don't know how to exclude it from the regex search.
Thanks for help !

Use a negative look behind:
String[] parts = str.split("(?<!/);");
If you want to cater for a term that ends with a /, you could escape that too:
String[] parts = str.split("(?<!/);|(?<=//);");
If you want to allow terms to end with "/;", use an AST language parser.
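For readers who want to try the two patterns outside Java, here is a minimal Python sketch. It assumes Python's re module, which accepts the same fixed-width lookbehinds; the second input string is a made-up example, not taken from the question.

```python
import re

line = "test;136;1234567890;Som/;e;Test;test /;123;qwertyuio;dfghjk"

# Split on ';' only when it is not preceded by the escape character '/'.
parts = re.split(r"(?<!/);", line)
print(parts)

# Variant: also split when ';' follows an escaped slash ('//').
sample = "a//;b/;c;d"  # hypothetical input for illustration
parts_escaped = re.split(r"(?<!/);|(?<=//);", sample)
print(parts_escaped)
```

The escape characters stay in the resulting fields, so a later step would still have to turn "/;" back into ";" if the plain text is needed.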

I wrote regex to find everything excluding those delimiters:
[^((?<!/);|(?<=//);)]*
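Note that inside square brackets the lookbehind syntax is read as a list of literal characters, so the pattern above is only a negated character class. If the goal is to match the fields instead of splitting on the delimiter, one possible sketch in Python is shown below. It assumes "/" escapes whatever single character follows it, which is a convention chosen for this example and goes slightly beyond what the question states.

```python
import re

line = "test;136;1234567890;Som/;e;Test;test /;123;qwertyuio;dfghjk"

# A field is a run of either:
#   - any character that is neither '/' nor ';', or
#   - a '/' together with the character it escapes.
field = re.compile(r"(?:[^/;]|/.)+")
fields = field.findall(line)
print(fields)
```

Unlike a split, this approach silently drops empty fields, so it only fits data where every field has at least one character.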

Related

How do I only find newlines in a Regex?

I have a regex which finds newlines encoded in strings as \n (not the actual newline character), but it also finds encoded escaped newlines (\\n), like before the word "anything" in the string below.
var rg = '/(\\n)/g'
var str = 'so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has'
How can I find all of the newlines and none of the escaped newlines?
Here is a link with an example. https://regexr.com/4fna7
You probably want a negative lookbehind. It checks the characters that come before the current position without including them in the match.
Your rewritten regex would look like:
(?<!\\)(\\n)
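As a rough check of this pattern, here is a small Python sketch (Python's re supports the same lookbehind). The text is written as a raw string so that every backslash is a literal character, which is my reading of what the question describes.

```python
import re

# Raw string: "\\n" is two backslashes plus 'n', "\n" is one backslash plus 'n'.
text = r"so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has"

# A backslash followed by 'n', but only if no backslash comes right before it.
encoded_newline = re.compile(r"(?<!\\)\\n")
matches = [m.start() for m in encoded_newline.finditer(text)]
print(len(matches))
```

Only the two encoded newlines before "At runtime" are found; the escaped one before "anything" is skipped. A sequence such as an escaped backslash followed by a real encoded newline would also be skipped, so this is not a full escape parser.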
One way to do this is to begin with a negated character class to ensure you do not pick up the double backslash:
var rg = '/[^\\](\\n)/g'
var str = 'so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has'
I'm assuming you're using JavaScript. In which case you can simply use the Regex literals like so:
/\n/
That would match all newline characters. If you can't use a Regex literal, JS also offers a constructor which takes a string
new RegExp('\\n')
In order to match the \\n you will need to escape the backslash:
/\\n|\n/
with constructor:
new RegExp('\\\\n|\\n')
Hope that helps.

Lua pattern similar to regex positive lookahead?

I have a string which can contain any number of the delimiter §\n. I would like to remove all delimiters from a string, except the last occurrence which should be left as-is. The last delimiter can be in three states: \n, §\n or §§\n. There will never be any characters after the last variable delimiter.
Here are 3 examples with the different state delimiters:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
I would like to remove all delimiters except the last occurrence.
So the result of gsub for the three examples above should be:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
Using regular expressions, one could use §\\n(?=.), which matches properly for all three cases using positive lookahead, as there will never be any characters after the last variable delimiter.
I know I could check if the string has the delimiter at the end, and then after a substitution using the Lua pattern §\n I could add the delimiter back onto the string. That is however a very inelegant solution to a problem which should be possible to solve using a Lua pattern alone.
So how could this be done using a Lua pattern?
str:gsub( '§\\n(.)', '%1' ) should do what you want. This deletes the delimiter provided that it is followed by another character, and puts that character back into the string.
Test code
local str = {
'abc§\\ndef§\\nghi\\n',
'abc§\\ndef§\\nghi§\\n',
'abc§\\ndef§\\nghi§§\\n',
}
for i = 1, #str do
print( ( str[ i ]:gsub( '§\\n(.)', '%1' ) ) )
end
yields
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
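For comparison, the same "capture the next character and put it back" idea can be sketched in Python next to the lookahead version from the question. This is only an illustration in another regex flavor, not Lua; the last input with two adjacent delimiters is a made-up case.

```python
import re

# Raw strings: the data contains a literal backslash followed by 'n'.
samples = [r"abc§\ndef§\nghi\n", r"abc§\ndef§\nghi§\n", r"abc§\ndef§\nghi§§\n"]

# Capture approach: consume the following character and re-insert it.
capture = [re.sub(r"§\\n(.)", r"\1", s) for s in samples]

# Lookahead approach: only assert that some character follows.
lookahead = [re.sub(r"§\\n(?=.)", "", s) for s in samples]

# Hypothetical edge case: two delimiters directly after each other.
adjacent = r"a§\n§\nb\n"
adjacent_capture = re.sub(r"§\\n(.)", r"\1", adjacent)
adjacent_lookahead = re.sub(r"§\\n(?=.)", "", adjacent)
print(capture, lookahead, adjacent_capture, adjacent_lookahead)
```

Both approaches agree on the three examples. They differ when delimiters are adjacent, because the capture approach consumes the first character of the next delimiter and therefore leaves that delimiter in place.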
EDIT: This answer doesn't work specifically for Lua, but if you have a similar problem and are not constrained to Lua you might be able to use it.
So if I understand correctly, you want a regex replace to make the first example look like the second. This:
/(.*?)§\\n(?=.*\\n)/g
will eliminate the non-last delimiters when replaced with
$1
in PCRE, at least. I'm not sure what flavor Lua follows, but you can see the example in action here.
REGEX:
/(.*?)§\\n(?=.*\\n)/g
TEST STRING:
abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n
SUBSTITUTION:
$1
RESULT:
abcdefghi\n
abcdefghi§\n
abcdefghi§§\n

How to match a specific character only if it is followed by string containing specific characters

I'm trying to replace slashes in a string, but not all of them - only the ones before first comma. To do that, I probably have to find a way to match only slashes being followed by string containing a comma.
Is it possible to do this using one regexp, i.e. without first splitting the string by commas?
Example input string:
Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4
What I want to get:
Abc1.Def2.Ghi3,/Dore1/Mifa2/Solla3,Sido4
I've tried some lookahead and lookbehind techniques with no effect, so currently to do this in e.g. Python I first split the data:
test = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
strlist = re.split(r',', test)
result = ','.join([re.sub(r'\/', r'.', strlist[0])] + strlist[1:])
What I would prefer is to use a specific regexp pattern instead of Python-oriented solution though, so essentially I could have a pattern and replacement such that the following code would give me the same result:
result = re.sub(pattern, replacement, test)
Thanks for all regex-avoiding answers - I was wondering if I could do this using only regex (so e.g. I could use sed instead of Python).
item = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
print(item.replace("/", ".", item.count("/", 0, item.index(","))))
This will print what you need. Try to avoid regex wherever you can because they are slow.
You could do this with lookbehind expressions that look for both the beginning of the string and no comma. Or don't use re entirely.
s = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
left,sep,right = s.partition(',')
sep.join((left.replace('/','.'),right))
Out[24]: 'Abc1.Def2.Ghi3,/Dore1/Mifa2/Solla3,Sido4'
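If a single re.sub call is still preferred in Python, one option is to match everything before the first comma and do the replacement in a callback. This is a sketch, not a pattern-only answer: Python's re requires fixed-width lookbehinds, so "no comma anywhere before this slash" cannot be written as a lookbehind there. The helper name dots_for_slashes is made up for this example.

```python
import re

test = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'

def dots_for_slashes(match):
    # Replace the slashes inside the matched prefix only.
    return match.group(0).replace('/', '.')

# '^[^,]*' matches the text from the start up to (not including) the first comma.
result = re.sub(r'^[^,]*', dots_for_slashes, test)
print(result)
```

If the string contains no comma at all, the prefix is the whole string and every slash is replaced, which is the same behavior as the partition version.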

REGEX how to split on: * , and the phrase "D/ST"?

I've used regex before and am familiar with string.split but I can't figure out how to split on the delimiters: * and , and the phrase, "D/ST".
When I do string.split("[,*|D/ST]+") with a pipe, it just splits on the letter D.
Anyone do something like this before?
The reason your previous regex didn't work is because you're using a character class, which will match a single character of those. Instead, you should probably use grouping, which is separated by vertical bars:
(\*|\,|D\/ST)
Have you an example of the input string?
Try something like :
"(\,|\*|D\/ST)+"
There is no "OR" inside a character class ("interval"): a | between square brackets is just a literal pipe.
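To see the difference between the character class and the alternation, here is a short Python sketch. The input string is invented, since the question does not give one.

```python
import re

text = "Dallas D/ST,Denver*Detroit"  # hypothetical input

# Character class: any single one of , * | D / S T is a delimiter.
by_class = re.split(r"[,*|D/ST]+", text)

# Alternation: '*', ',' or the whole phrase 'D/ST'; '+' merges adjacent delimiters.
by_alternation = re.split(r"(?:\*|,|D/ST)+", text)

print(by_class)
print(by_alternation)
```

With the character class, every capital D is treated as a delimiter, which matches the behavior described in the question. The alternation only splits where the full phrase D/ST appears.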

How to ignore whitespace in a regular expression subject string?

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s*/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the JavaScript flavor of regex.
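The newline-only variant can also be generated instead of typed by hand. A minimal Python sketch, where the helper name newline_tolerant is an assumption made for this example:

```python
import re

def newline_tolerant(word):
    # Between letters, allow an optional newline followed by any whitespace.
    gap = r"(?:\n\s*)?"
    return re.compile(gap.join(re.escape(ch) for ch in word))

pattern = newline_tolerant("cats")
print(bool(pattern.search("c\n ats")))  # newline followed by a space
print(bool(pattern.search("c ats")))    # space without a newline
```

The first search succeeds and the second does not, because whitespace is only tolerated when it follows a newline.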
You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
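Building the pattern dynamically could look like the following Python sketch. The function name whitespace_tolerant is made up; re.escape is used so that characters with a special meaning in the search word are matched literally.

```python
import re

def whitespace_tolerant(word):
    # Join the escaped characters with optional whitespace in between.
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))

pattern = whitespace_tolerant("cats")
m = pattern.search("my c ats here")
print(m.span(), repr(m.group()))
```

The span refers to the original string, so it can be used directly for highlighting, whitespace included.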
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be automated (the example below is in Python, but it can be ported to any language): strip the whitespace beforehand and save the positions of the non-whitespace characters, so that you can later map the match boundaries back to positions in the original string:
import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # uppercase \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match:
        return match
    # match.start() and match.end() are indices into the spaceless string
    # (as we have searched in it); match.end() is exclusive.
    start = char_positions[match.start()]  # in the original string
    # Map the last matched character and add 1, so that a match at the very
    # end stays in range and trailing whitespace is not included.
    end = char_positions[match.end() - 1] + 1  # in the original string
    matched_string = string[start:end]
    # the match WITH spaces is returned.
    return matched_string

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further you can construct the match object and return it instead, so the use of this helper will be more handy.
And the performance of this function can of course also be optimized, this example is just to show the path to a solution.
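As one way of going a step further, the helper could return the boundaries in the original string instead of the matched text, which is what highlighting needs. The sketch below is a variation written for illustration; the name search_ignoring_whitespace and the decision to treat empty matches as "not found" are assumptions of this example.

```python
import re

def search_ignoring_whitespace(pattern, text):
    """Return (start, end) of the match in the original text, or None."""
    # Positions of all non-whitespace characters in the original text.
    positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    squeezed = ''.join(text[i] for i in positions)
    match = re.search(pattern, squeezed)
    if match is None or match.start() == match.end():
        return None  # no match, or an empty match with nothing to highlight
    start = positions[match.start()]
    end = positions[match.end() - 1] + 1  # exclusive end in the original text
    return start, end

with_spaces = 'a li on and a cat'
span = search_ignoring_whitespace('lion', with_spaces)
print(span, repr(with_spaces[span[0]:span[1]]))
```

Collecting the positions in a list and joining once avoids the repeated string concatenation in the loop.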
The accepted answer will not work if and when you are passing a dynamic value (such as "current value" in an array loop) as the regex test value. You would not be able to input the optional white spaces without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither contains any whitespace.