I'm working on a Dart program that will parse data from an XML file. Unfortunately, the file is a bit of a mess, so I'm doing a ton of regex on it to get it into decent shape. Since Dart, like JavaScript, doesn't have lookbehind functionality, I've been trying to use capturing parentheses along with the method to mimic lookbehind, to no avail. Dart doesn't like the syntax of it at all, just telling me that function is not defined for myClass.
My XML file consists of strings in the following format:
<items>Chicken Sauces/ Creamy Curried Chicken Salad w/ Wild Rice</items>
When finished, I want that string to look like:
<items>Chicken Sauces| Creamy Curried Chicken Salad w/ Wild Rice</items>
I've tried calling this on the string: .replaceAll(new RegExp(r'(\w\s*)/(?=\s*\w)'),"$0|$1")
but that gives me errors saying "Expected an identifier" for $0 and $1. If anyone could offer me some pointers on how to properly use capturing parentheses, or a method to mimic lookbehind in Dart, along with the current lookahead, I'd be very grateful!
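Assuming your regexp is correct, you should use String.replaceAllMapped, because replaceAll inserts its replacement string literally and "$0" inside a Dart string literal is parsed as string interpolation: .replaceAllMapped(new RegExp(r'(\w\s*)/(?=\s*\w)'), (match) => "${match[1]}|"). Note that match[0] is the whole match (including the slash), while match[1] is the first capture group.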
You might also be interested in String.splitMapJoin
Compeek already pointed out that the regexp probably won't do what you want, though.
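The same capture-and-reinsert idea can be sketched in Python, where re.sub accepts a callback much like replaceAllMapped does. This is only an illustration, not Dart code: the helper name slash_to_pipe and the count parameter are assumptions made for the example. It also shows why the regexp is suspect, since it rewrites the slash in "w/" as well.

```python
import re

# Same pattern as in the question: a word character (plus optional spaces)
# before the slash is captured, and the lookahead checks what follows.
PATTERN = re.compile(r"(\w\s*)/(?=\s*\w)")

def slash_to_pipe(text: str, count: int = 0) -> str:
    # Hypothetical helper: keep group 1 and emit "|" in place of the slash.
    # count=0 means "replace every match", as with replaceAllMapped.
    return PATTERN.sub(lambda m: m.group(1) + "|", text, count=count)

sample = "<items>Chicken Sauces/ Creamy Curried Chicken Salad w/ Wild Rice</items>"
print(slash_to_pipe(sample))           # the slash in "w/" is rewritten too
print(slash_to_pipe(sample, count=1))  # only the first slash is rewritten
```

Limiting the replacement to the first match is just one possible workaround and assumes the category separator always comes first.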
Related
Regex noob here struggling with this, which I know it will be easy for some of you regex gods out there!
Given the following:
title: Some title
date: 2022-08-15
tags: <value to extract>
identifier: 1234567
---------------------------
Some text
some more text
I would like a regex to match everything except the value of tags (ie the "<value to extract>" text).
For context, this is supposed to run on emacs (in case it matters).
EDIT: Just to clarify as per @phils' question, all I care about is extracting the tags value. However, this is via a package setting that asks for a regex string and I don't have much control over how it gets used. It seems to expect a regex to strip what I don't need from the string rather than matching what I do want, which is slightly annoying. Also, since it seems to match everything with \\(.\\), I'm guessing it's using the global flag?
Please let me know if any of this isn't clear.
Emacs regular expressions can't trivially express "not foo" for arbitrary values of foo. (The likes of PCRE have non-regular extensions for zero-width negative look-ahead/behind assertions, but in Emacs that sort of functionality is generally done with the support of lisp code [1].)
You can still do it purely with regexp matching, but it's simply very cumbersome. An Emacs regexp which matches any line which does not begin with tags: is:
^\(?:$\|[^t]\|t[^a]\|ta[^g]\|tag[^s]\|tags[^:]\).*
or if you need to enter it in the elisp double-quoted read syntax for strings:
"^\\(?:$\\|[^t]\\|t[^a]\\|ta[^g]\\|tag[^s]\\|tags[^:]\\).*"
[1] In lisp code you would instead simply check each line to see whether it does start with tags: and, if so, skip it (which is why Emacs generally gets away without the feature you're looking for, but of course that doesn't help you here).
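For longer prefixes, writing that alternation by hand gets error-prone. Here is a rough sketch that generates it, written in Python and emitting Python regex syntax (for Emacs you would still need to write the groups as \(?: ... \) and \| yourself). The function name not_starting_with is made up for this example.

```python
import re

def not_starting_with(prefix: str) -> str:
    """Build a lookahead-free pattern for lines that do not begin with prefix.

    Hypothetical helper: for each position i it emits "the first i characters
    of the prefix, followed by any character other than the next one".
    """
    alternatives = ["$"]  # an empty line trivially does not start with prefix
    for i, ch in enumerate(prefix):
        alternatives.append(re.escape(prefix[:i]) + "[^" + re.escape(ch) + "]")
    return "^(?:" + "|".join(alternatives) + ").*"

pattern = not_starting_with("tags:")
print(pattern)

for line in ["title: Some title", "date: 2022-08-15", "tags: emacs", ""]:
    print(repr(line), bool(re.match(pattern, line)))
```

Note that a line which is a strict prefix of tags: (for example just "ta") is not matched when each line is tested on its own like this, so treat it as a sketch of the idea rather than a complete solution.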
After playing around with it for a bit and taking inspiration from @phils' answer, I've come up with the following:
"^\\(?:\\(#\\+\\)?\\(?:filetags:\s+\\|tags:\s+\\|title:.*\\|identifier:.*\\|date:.*\\)\\|.*\\)"
I've also added an extra \\(#\\+\\)? to account for org meta keys which would usually have the format #+key: value.
I would like to write this regular expression pattern:
(?<=bring a ).*?(?=,)
in Lua but have no idea how one would do so. If someone could point me in the right direction, that'd be appreciated!
Regular expression pattern explanation: it's supposed to grab anything in between bring a and ,
Lua patterns aren't as advanced as regex, so you can't write a pattern that does exactly what the regex you posted does. However, you can write a pattern that does what this regex does:
bring a (.*?),
The difference being that you get the result you want in a subgroup, instead of as the whole match. Here's how you do that:
bring a (.-),
And you can use it like this:
str = 'I will bring a ball, so if you bring a bat, then we can have batting practice.'
for s in str:gmatch('bring a (.-),') do
    print(s)
end
That prints ball and bat.
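To see that the capture-group form really does extract the same text as the lookaround regex from the question, here is a small comparison in Python, whose re module supports both forms. It is only an illustration of the equivalence and says nothing about Lua itself.

```python
import re

text = "I will bring a ball, so if you bring a bat, then we can have batting practice."

# Lookaround version from the question: the whole match is the wanted text.
with_lookaround = re.findall(r"(?<=bring a ).*?(?=,)", text)

# Capture-group version: findall returns the contents of the single group.
with_group = re.findall(r"bring a (.*?),", text)

print(with_lookaround)
print(with_group)
```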
I have a list of the following numbers and want a Regular expression that matches when a number is not in the list.
0,1,2,3,4,9,11,12,13,14,15,16,18,19,250
I have written the following REGEX statement.
^(?!.*(0|1|2|3|4|9|11|12|13|14|15|16|18|19|250)).*$
The problem is that it correctly gives a match for 5,6,7,8 etc but not for 17 or 251 for example.
I have been testing this on the online REGEX simulators.
This should resolve your issue:
^(?!\D*(0|1|2|3|4|9|11|12|13|14|15|16|18|19|250)\b).*$
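Your earlier regex was basically saying "reject every number that contains any of the digits 0, 1, 2, 3, 4 or 9 anywhere", because the .* lets the alternation match at any position.
So your original regex would match numbers like 5, 68 or 877, but not 17 or 251. It also made the 11-19 and 250 entries in the list redundant, since their first digit already triggers the rejection.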
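A quick way to compare the two patterns is to run both over a few sample values. The sketch below uses Python's re module, which is an assumption since the question does not name an engine; the variable names are invented for the example.

```python
import re

LISTED = "0|1|2|3|4|9|11|12|13|14|15|16|18|19|250"

# Pattern from the question: the .* allows a listed digit anywhere.
original = re.compile(r"^(?!.*(" + LISTED + r")).*$")
# Pattern from this answer: \D* pins the check to the start of the number
# and \b requires the listed value to be the complete number.
fixed = re.compile(r"^(?!\D*(" + LISTED + r")\b).*$")

for value in ["5", "11", "17", "250", "251"]:
    print(value, bool(original.match(value)), bool(fixed.match(value)))
```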
That said, like others here, I would recommend not using regex for this, as I believe it is overkill and a maintenance nightmare!
As an extra note, variable-length lookarounds are also very inefficient compared with regular checks.
I would do \b\d+\b to get each number in the string and check if they are in your list. It would be way faster.
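A minimal sketch of that approach in Python, assuming the numbers arrive embedded in a string; the names EXCLUDED and numbers_not_in_list are made up for the example.

```python
import re

# The list from the question, stored as a set for fast membership tests.
EXCLUDED = {0, 1, 2, 3, 4, 9, 11, 12, 13, 14, 15, 16, 18, 19, 250}

def numbers_not_in_list(text: str) -> list[int]:
    # Pull out each whole number, then keep only those outside the list.
    found = (int(token) for token in re.findall(r"\b\d+\b", text))
    return [number for number in found if number not in EXCLUDED]

print(numbers_not_in_list("5 11 17 250 251"))
```

Changing the list then only means editing the set, not the regex.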
You can use the discard technique by matching what you do not want and capturing what you really want.
You can use a regex like this:
\b(?:[0-49]|1[1-689]|250)\b|(\d+)
Here you can check a working demo where the blue matches are what you don't want and the green ones are the content you want. You then have to grab the content from the capturing group.
Working demo
Not sure what regex engine you are using, but here I created a sample using java:
https://ideone.com/B7kLe0
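For comparison, the same discard technique might look like this in Python (a sketch only; the linked sample is Java and the function name wanted_numbers is invented here). Matches from the left-hand alternative leave group 1 unset, so they are simply skipped.

```python
import re

# Left side consumes the listed numbers without capturing them;
# right side captures any other run of digits in group 1.
DISCARD = re.compile(r"\b(?:[0-49]|1[1-689]|250)\b|(\d+)")

def wanted_numbers(text: str) -> list[str]:
    # Keep only the matches where the capturing group took part.
    return [m.group(1) for m in DISCARD.finditer(text) if m.group(1) is not None]

print(wanted_numbers("0 5 11 17 250 251"))
```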
Is there a regular expression to match the some.prefix part of both of the following filenames?
xyz can be any character of [a-z0-9-_\ ]
some.prefix part can be any character in [a-zA-Z0-9-_\.\ ].
I intentionally included a . in some.prefix.
some.prefix.xyz.xyz
some.prefix.xyz
I have tried many combinations. For example:
(?P<prefix>[a-zA-Z0-9-_\.]+)(?:\.[a-z0-9]+\.gz|\.[a-z0-9]+)
It works with abc.def.csv by catching abc.def, but fails to catch it in abc.def.csv.gz.
I primarily use Python, but I thought the regex itself should apply to many languages.
Update: It's not possible, see discussion with @nowox below.
I think your regex works pretty well. I recommend trying it on regex101 with your example:
https://regex101.com/r/dV6cE8/3
The expression
^(?i)[ \w-]+\.[ \w-]+
Should work in your case:
som e.prefix.xyz.xyz
^^^^^^^^^^^^
some.prefix.xyz
^^^^^^^^^^^
abc.def.csv.gz
^^^^^^^
And in Python you can use:
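import re

# The first line uses the "som e.prefix" example shown above
text = """som e.prefix.xyz.xyz
some.prefix.xyz
abc.def.csv.gz"""
# In Python 3.11+ the inline (?i) flag must come first in the pattern;
# the raw string keeps \w and \. from being treated as string escapes
print(re.findall(r'(?i)^[ \w-]+\.[ \w-]+', text, re.MULTILINE))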
Which will display:
['som e.prefix', 'some.prefix', 'abc.def']
I think you may be a bit confused about your requirement. To summarize, you have a pathname made of characters and dots such as:
foo.bar.baz.0
foobar.tar.gz
f.o.o.b.a.r
How would you separate these strings into a basename and an extension? Here we recognize some known patterns: .tar.gz is definitely an extension, but is .bar.baz.0 the extension, or is it only .0?
The answer is not easy, and no regex in the world would be able to guess the correct answer 100% of the time without some hints.
For example you can list the acceptable extensions and make some criteria:
An extension matches the regex \.\w{1,4}$
Several extensions may be concatenated together: (\.\w{1,4}){1,4}$
The remainder is called the basename
From this you can build this regular expression:
(?P<basename>.*?)(?P<extension>(?:\.\w{1,4}){1,4})$
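As a rough illustration of how these criteria behave, here is the expression applied in Python. The helper name split_name is invented for this example, and the results depend entirely on the 1-to-4-character assumption above.

```python
import re

SPLIT = re.compile(r"(?P<basename>.*?)(?P<extension>(?:\.\w{1,4}){1,4})$")

def split_name(filename: str):
    # Returns (basename, extension), or None when no extension is recognized.
    m = SPLIT.match(filename)
    return (m.group("basename"), m.group("extension")) if m else None

for name in ["foobar.tar.gz", "foo.bar.baz.0", "f.o.o.b.a.r"]:
    print(name, split_name(name))
```

Because the basename is matched lazily, the extension group grabs as many short dotted parts as the limits allow, which is exactly the ambiguity described above.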
Try this: [a-z0-9-_\\]+\.[a-z0-9-_\\]+[a-zA-Z0-9-_\.\\]+
I'm really hoping I'm doing something silly and just can't see the problem... this would be trivial in Perl or other languages. Apparently backreferences are supported in grok https://grokconstructor.appspot.com/RegularExpressionSyntax.txt, but I can't make them work. I need to match on something basic:
identifier - Static Text identifier Rest Of Line
So my grok expression would be something like:
%{DATA:id_name} - Static Text \1 %{GREEDYDATA:rest_of_line}
But using http://grokdebug.herokuapp.com/ always produces a compile error. If I use any of the \k notation, same thing. I've tried wrapping the first variable in parentheses, double backslashes, random permutations, can't make it work.
Any help would be much appreciated. Thanks!
I don't think that the %{DATA:id_name} produces a named capture that you can use with custom regex back references. Instead, you could wrap %{DATA} in a named capture and then back reference to it, like so:
(?<id_name>%{DATA}) - Static Text \k<id_name> %{GREEDYDATA:rest_of_line}
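If it helps to see the named-backreference idea in isolation, here is a sketch in Python. Note the syntax differs from grok: Python writes the group as (?P<name>...) and the backreference as (?P=name), and the .*? and .* pieces are only my assumption of what %{DATA} and %{GREEDYDATA} expand to.

```python
import re

# Capture the identifier once, then require the same text again after
# the static part of the line.
LINE = re.compile(r"(?P<id_name>.*?) - Static Text (?P=id_name) (?P<rest_of_line>.*)")

ok = LINE.match("foo - Static Text foo Rest Of Line")
bad = LINE.match("foo - Static Text bar Rest Of Line")

print(ok.groupdict() if ok else None)
print(bad)
```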