I currently have a regular expression: REGEXP_EXTRACT_ALL(data, r'\"createdAt\"\:(.*?)\}')
Which finds "createdAt": and outputs everything after that text up to the next "}".
Example output: {"_seconds":1620327345,"_nanoseconds":155071000
This works BUT I need the last } to be included in the output.
Preferred Output: {"_seconds":1620327345,"_nanoseconds":155071000}
How will I need to change my regular expression so that the } is included in the output?
You need to include the } into the capturing group:
REGEXP_EXTRACT_ALL(data, r'"createdAt":(.*?})')
Besides, you can make it a bit more efficient with a negated character class:
REGEXP_EXTRACT_ALL(data, r'"createdAt":([^}]*})')
With [^}]*, you match any zero or more chars other than } as many times as possible.
Also, if you use single quotation marks as the string literal delimiter, you need not escape double quotation marks (they are not special regex metacharacters). Note that } is not a special character unless it is paired with a preceding { that contains a number ({<number>} or {<number>,<number>}).
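As a rough illustration of the patterns above, here is a JavaScript sketch rather than BigQuery SQL. The sample `data` string and the helper name `extractAll` are made up for the example, and the } is escaped here only to be safe in JavaScript; the capture groups are otherwise the ones from the answer.

```javascript
// Hypothetical sample input; the real `data` column is not shown in the question.
const data =
  '{"id":1,"createdAt":{"_seconds":1620327345,"_nanoseconds":155071000},"name":"a"}';

// Collect capture group 1 of every match, similar in spirit to REGEXP_EXTRACT_ALL.
function extractAll(text, pattern) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Original pattern: the closing brace sits outside the group, so it is dropped.
const withoutBrace = extractAll(data, /"createdAt":(.*?)\}/g);
// Brace moved inside the group (lazy version).
const lazyWithBrace = extractAll(data, /"createdAt":(.*?\})/g);
// Brace inside the group, using a negated character class instead of a lazy dot.
const negatedClass = extractAll(data, /"createdAt":([^}]*\})/g);

console.log(withoutBrace);
console.log(lazyWithBrace);
console.log(negatedClass);
```

Both corrected patterns should return the same text here; the negated class simply avoids the step-by-step expansion that the lazy quantifier performs.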
I'm trying to capture a group from a string with ~, ~~ and ~~~ symbols. I was successful with extracting single symbols but it doesn't ignore the other occurrences in the string.
This is my code I tried experimenting with:
String f = '~the calculator is on and working~I entered 50 into the calculator'+
'~~I press add button~~holding equal button ~~~The result should be 50';
List<String>givens = f.split(RegExp(r'~+'));
List<String>whens = f.split(RegExp(r'~~+'));
List<String>thens = f.split(RegExp(r'~~~+'));
for(String ss in givens){
print(ss);
}
print('xxxxxxxxxxxx');
for(String ss in whens){
print(ss);
}
print('xxxxxxxxxxxx');
for(String ss in thens){
print(ss);
}
Which will result with:
The givens capture group also captured the ones with ~~ and ~~~ which is not intended.
The whens capture group also captured the ones with a single ~, which made it very confusing.
Lastly, the thens capture group also captured the others which is also not intended.
I only need to capture the strings starting with the specific pattern but will stop when they see a different one.
Example: givens should capture only 'the calculator is on and working' and 'I entered 50 into the calculator'.
Any hints or help is greatly appreciated!
I think the problem is that you started off by splitting the string into pieces. It might be easier to search for the elements with a pattern that looks for text preceded by exactly one, two or three ~ chars.
This can be done with regex positive lookbehind patterns.
Typically, if you want to find a string preceded by a single tilde, you have to make sure it does not match when there are other tildes before that one.
Find givens
(?<=(?:[^~]|^)~)[^~]+ would be the pattern to find only givens.
Test it here: https://regex101.com/r/9WLbM3/2
Explanation
[^~] means search for any character which is not a ~. This is because [abc] means any char which is in the list, so a, b or c. If you add the ^ char at the beginning of the list then it means "not these chars".
[^~]+ means search for one or more characters which are not ~. This will capture the phrases between the tildes.
A positive lookbehind is written (?<=something present). We want to search for a tilde, so we would put (?<=~) as the positive lookbehind. But the problem is that it will also match the phrases with several tildes in front. To avoid that we can say that the tilde should be preceded either by ^ (meaning the beginning of the string) or by [^~] (meaning not a tilde). To say "either this or that", we use the syntax (this|that|or even that). But parentheses capture their content and we don't need that. To disable group capturing we can add ?: at the beginning of the group, leading finally to (?:[^~]|^), meaning either a non-tilde char or the beginning of the string, without capturing it.
Find whens and thens
The regular expression is almost the same. It's just that we replace ~ by ~{2} or ~{3}.
Pattern for whens: (?<=(?:[^~]|^)~{2})[^~]+
Pattern for thens: (?<=(?:[^~]|^)~{3})[^~]+
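The three patterns can be tried out with a short script. The question uses Dart, whose RegExp is documented as following JavaScript regex syntax and semantics, so this sketch is written in JavaScript; the helper name `phrasesAfter` is an assumption for the example, and the trimming is an extra step that is not part of the patterns themselves.

```javascript
const f =
  "~the calculator is on and working~I entered 50 into the calculator" +
  "~~I press add button~~holding equal button ~~~The result should be 50";

// Return every phrase that is preceded by exactly `count` tildes.
function phrasesAfter(text, count) {
  // Same shape as the patterns above: (?<=(?:[^~]|^)~{count})[^~]+
  const pattern = new RegExp(`(?<=(?:[^~]|^)~{${count}})[^~]+`, "g");
  // match() returns null when nothing matches, hence the fallback to [].
  return (text.match(pattern) || []).map((s) => s.trim());
}

const givens = phrasesAfter(f, 1);
const whens = phrasesAfter(f, 2);
const thens = phrasesAfter(f, 3);

console.log(givens);
console.log(whens);
console.log(thens);
```

Each list should contain only the phrases introduced by its own marker, because the lookbehind rejects any position where a further tilde precedes the run.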
I have data like this:
~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~
I'm trying to use Notepad++ to find and replace the values within tag 10 (682423, 68276127, 6813) with zeroes. I thought the syntax below would work, but it selects the first occurrence of the text I want and the rest of the line, instead of just the text I want (~10~682423~, for example). I also tried dozens of variations from searching online, but they also either did the same thing or wouldn't return any results.
~10~.*~
You can use: (?<=~10~)\d+(?=~) and replace with 0. This uses lookarounds to check that ~10~ precedes the digit sequence and the (?=~) ensures a ~ follows the digit sequence. If any character could be after the ~10~ field, use (?<=~10~)[^~]+(?=~).
The problem with ~10~.*~ is that the * is greedy, so .* consumes as many characters as it can, including ~, and the match runs all the way to the last ~ on the line.
Use
\b10~\d+
Replace with 10~0. See proof. \b10~ matches 10 only as an entire number (no match inside 210 is allowed) and \d+ matches one or more digits.
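To see how the greedy pattern and the two suggested patterns differ, here is a small JavaScript sketch. Notepad++ uses its own regex engine, so this is only an approximation of what the Find and Replace dialog does; the variable names are made up for the example.

```javascript
const line =
  "~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~";

// Greedy attempt from the question: .* runs on to the last ~ of the line.
const greedyMatch = line.match(/~10~.*~/)[0];

// Lookaround version: only the digits between "~10~" and the next "~" are replaced.
const viaLookarounds = line.replace(/(?<=~10~)\d+(?=~)/g, "0");

// Word-boundary version: "10~" is part of the match, so it is written back.
const viaWordBoundary = line.replace(/\b10~\d+/g, "10~0");

console.log(greedyMatch === line);
console.log(viaLookarounds);
console.log(viaWordBoundary);
```

On this sample both replacements should produce the same line, with every tag 10 value reduced to a single 0.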
I am working with government measures and am required to parse a string that contains variable information based on delimiters that come from issuing bodies associated with the FDA.
I am trying to retrieve the delimiter and the value after the delimiter. I have searched for hours to find a regex solution that retrieves both the delimiter and the value that follows it and, though there seem to be posts that handle this, the code found in those posts hasn't worked.
One of the major issues in this task is that the delimiters often have repeated characters. For instance: delimiters are used such as "=", "=,", "/=". In this case I would need to tell the difference between "=" and "=,".
Is there a regex that would handle all of this?
Here is an example of the string :
=/A9999XYZ=>100T0479&,1Blah
Notice the delimiters are:
"=/"
"=>"
"&,1"
Any help would be appreciated.
You can use a regex like this
(=/|=>|&,1)|(\w+)
Working demo
The idea is that the first group contains the delimiters and the 2nd group the content. I assume the content can be word characters (a to z and digits with underscore). You have then to grab the content of every capturing group.
You need to capture both the delimiter and the value as group 1 and 2 respectively.
If your values are all alphanumeric, use this:
(&,1|\W+)(\w+)
See live demo.
If your values can contain non-alphanumeric characters, it gets complicated:
(=/|=>|=,|=|&,1)((?:.(?!=/|=>|=,|=|&,1))+.)
See live demo.
List the delimiters longest first, e.g. "=," before "=". Otherwise the alternation, which tries its alternatives left to right, will match "=" and the comma will become part of the value.
This uses a negative lookahead so that the value does not run past the next delimiter.
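One possible way to apply the delimiter-then-value pattern is sketched below in JavaScript. The delimiter list, the `escapeRegex` helper and the `parseFields` function are assumptions for illustration; the pattern is built from the list so that the longest-first ordering is enforced by a sort rather than by hand.

```javascript
// Delimiters taken from the question and answer; extend this list as needed.
const delimiters = ["=", "=,", "=/", "=>", "&,1"];

// Escape regex metacharacters so each delimiter is matched literally.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Longest first, so "=," is tried before "=".
const alternation = [...delimiters]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegex)
  .join("|");

// Group 1: the delimiter. Group 2: the value, stopping before the next delimiter.
const pattern = new RegExp(`(${alternation})((?:.(?!${alternation}))+.)`, "g");

// Return [delimiter, value] pairs in the order they appear.
function parseFields(input) {
  return Array.from(input.matchAll(pattern), (m) => [m[1], m[2]]);
}

console.log(parseFields("=/A9999XYZ=>100T0479&,1Blah"));
console.log(parseFields("=,AB=CD"));
```

Note that the value group, as written in the answer, needs at least two characters, so a one-character value would not be picked up by this sketch.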
Using regexp_replace within PostgreSQL, I've developed (with a lot of help from SO) a pattern to match the first n characters, if the last character is not in a list of characters I don't want the string to end in.
regexp_replace(pf.long_description, '(^.{1,150}[^ -:])', '\1...')::varchar(2000)
I would expect that to simply end the string in an ellipsis. What I get instead is the first 150 characters plus the ellipsis, but then the string continues all the way to the end.
Why is all that content not being eliminated?
Why is all that content not being eliminated?
Because you haven't requested that. You've asked to have the first 2-151 characters replaced with those same characters and an ellipsis. If you modify the pattern to be (^.{1,150}[^ -:]).* (notice the trailing .*, which makes regexp_replace work on the complete string, not just the prefix), you should get the desired effect.
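The effect of the trailing .* can be illustrated outside PostgreSQL. The JavaScript sketch below is only an analogy: "$1" stands in for '\1', the `truncate` helper and its parameters are hypothetical, and the limit is shortened so the behaviour is visible on a small string.

```javascript
// JavaScript stand-in for regexp_replace; "$1" plays the role of '\1'.
// `limit` replaces the hard-coded 150 so a short sample is enough.
function truncate(text, limit, consumeRest) {
  const tail = consumeRest ? ".*" : "";
  // The "s" flag lets "." match newlines as well.
  const pattern = new RegExp(`(^.{1,${limit}}[^ -:])${tail}`, "s");
  return text.replace(pattern, "$1...");
}

const sample = "Hello world: this is long";

// Without .* only the prefix is replaced, so the rest of the string stays.
const prefixOnly = truncate(sample, 11, false);
// With .* the whole string is matched, so the rest is dropped.
const wholeString = truncate(sample, 11, true);

console.log(prefixOnly);
console.log(wholeString);
```

A replace only rewrites the part of the input that the pattern matched; anything the pattern did not consume is carried over unchanged, which is why the first call keeps the tail.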
Do you really want the range of characters between the space character and the colon: [^ -:]?
To include a literal - in a character class, put it first or last. Looks like you might actually want [^ :-] - that's just excluding the three characters listed.
Details about bracket expressions in the manual.
That would be (building on what #just already provided):
SELECT regexp_replace(pf.long_description, '(^.{1,150}[^ :-]).*$', '\1...');
But it should be cheaper to use substring() instead:
SELECT substring(pf.long_description, '^.{1,150}[^ :-]') || '...';
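The difference between the range [^ -:] and the list [^ :-] is easy to check with a few sample endings. This is a JavaScript sketch, used here only because bracket expressions treat a hyphen between two characters as a range in the same way; the sample strings and names are made up.

```javascript
// Range: excludes every character from " " (0x20) through ":" (0x3A),
// which includes digits and most ASCII punctuation.
const rangeClass = /[^ -:]$/;
// List: excludes only the space, the colon and the hyphen.
const listClass = /[^ :-]$/;

const endings = ["version 5", "done.", "word-", "word"];

// For each sample, record whether its last character is allowed by each class.
const results = endings.map((s) => [s, rangeClass.test(s), listClass.test(s)]);

console.log(results);
```

With the range, a string ending in a digit or a full stop is rejected as a cut-off point, which is probably not what was intended.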
Can anyone please help me find a suitable regular expression to validate a string of comma-separated numbers, e.g. '1,2,3' or '111,234234,-09'? Anything else should be considered invalid, e.g. '121as23' or '123-123'.
I suppose this must be possible in Flex using regular expression but I can not find the correct regular expression.
#Justin, I tried your suggestion /(?=^)(?:[,^]([-+]?(?:\d*\.)?\d+))*$/ but I am facing two issues:
It will invalidate '123,12' which should be true.
It won't invalidate '123,123,aasd' which is invalid.
I tried another regex - [0-9]+(,[0-9]+)* - which works quite well except for one issue: it validates '12,12asd'. I need something that will only allow numbers separated by commas.
Your example data consists of three decimal integers, each having an optional leading plus or minus sign, separated by commas with no whitespace. Assuming this describes your requirements, the Javascript/ActionScript/Flex regex is simple:
var re_valid = /^[-+]?\d+(?:,[-+]?\d+){2}$/;
if (re_valid.test(data_string)) {
// data_string is valid
} else {
// data_string is NOT valid
}
However, if your data can contain any number of integers and may have whitespace the regex becomes a bit longer:
var re_valid = /^[ \t]*[-+]?\d+[ \t]*(,[ \t]*[-+]?\d+[ \t]*)*$/;
If your data can be even more complex (i.e. the numbers may be floating point, the values may be enclosed in quotes, etc.), then you may be better off parsing the string as a CSV record and then check each value individually.
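A quick way to check the more permissive pattern against the strings from the question is sketched below. The `samples` array mixes the question's examples with two extra cases (one with whitespace, one empty) that are assumptions added for illustration.

```javascript
// Any number of signed integers, separated by commas, with optional spaces or tabs.
const reValid = /^[ \t]*[-+]?\d+[ \t]*(,[ \t]*[-+]?\d+[ \t]*)*$/;

const samples = [
  "1,2,3",
  "111,234234,-09",
  " 7 , +8 ",
  "121as23",
  "123-123",
  "12,12asd",
  "",
];

// Map each sample to the validation result.
const verdicts = samples.map((s) => reValid.test(s));

samples.forEach((s, i) => console.log(JSON.stringify(s), verdicts[i]));
```

The ^ and $ anchors are what reject '12,12asd': without them the pattern would be satisfied by the valid prefix alone.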
Looks like what you want is this:
/(?!,)(?:(?:,|^)([-+]?(?:\d*\.)?\d+))*$/
I don't know Flex, so replace the / at the beginning and end with whatever's appropriate in Flex regex syntax. Your numbers will be in match set 1. Get rid of the (?:\d*\.)? if you only want to allow integers.
Explanation:
(?!,) #Don't allow a comma at the beginning of the string.
(?:,|^) #Your groups are going to be preceded by ',' unless they're the very first group in the string. The '(?:blah)' means we don't want to include the ',' in our match groups.
[-+]? #Allow an optional plus or minus sign.
(?:\d*\.)?\d+ #The meat of the pattern, this matches '123', '123.456', or '.456'.
* #Means we're matching zero or more groups. Change this to '+' if you don't want to match empty strings.
$ #Don't stop matching until you reach the end of the string.
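One caveat when using this pattern for validation: it has no anchor at the start, and the group may repeat zero times, so a search can always succeed at the very end of the input. The JavaScript sketch below shows that, together with an anchored variant. The variant is an assumption of mine, not part of the answer above; it adds ^, drops the capture group and uses + so that empty strings are rejected.

```javascript
// Pattern as given above: unanchored at the start, zero or more groups.
const original = /(?!,)(?:(?:,|^)([-+]?(?:\d*\.)?\d+))*$/;

// Hypothetical anchored variant: must match from the start, at least one number.
const anchored = /^(?!,)(?:(?:,|^)[-+]?(?:\d*\.)?\d+)+$/;

const samples = ["123,12", "-1.5,.25,+3", "123,123,aasd", ",1", "1,,2", ""];

const originalVerdicts = samples.map((s) => original.test(s));
const anchoredVerdicts = samples.map((s) => anchored.test(s));

console.log(originalVerdicts);
console.log(anchoredVerdicts);
```

If only integers should be accepted, the (?:\d*\.)? part can be removed from the anchored variant in the same way as described above.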