Postgres invalid regular expression: invalid character range - regex

I'm using the following line in a postgres function:
regexp_replace(input, '[^a-z0-9\-_]+', sep, 'gi');
But I'm getting ERROR: invalid regular expression: invalid character range when I try to use it. The regex works fine in Ruby, is there a reason it'd be different in postgres?

Some regexp parsers accept a dash (-) in the middle of a character class when it follows a range, as in your pattern, but others won't. I suspect the postgres regexp parser is in the latter class. The canonical way to include a literal dash in a character class is to put it first, i.e. change the regexp to '[^-a-z0-9_]+', which might get it past the parser. Some regexp parsers, however, can be really fussy and not accept that, either.
I don't have a postgres to test with, but I expect they'll accept the regexp above and deal correctly. Otherwise you have to find the regexp portion of their manual and understand what it says about this.
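As a rough illustration of the hyphen-first form, here is a minimal sketch in Python rather than Postgres. The function name slugify and the sample input are made up for the example, and Python's re module is more lenient than the Postgres parser, so this only shows that the rewritten class keeps the intended meaning, not that Postgres accepts it.

```python
import re


def slugify(text: str, sep: str = "-") -> str:
    """Replace each run of disallowed characters with sep.

    The hyphen is the first character after ^ in the class, so it is
    read as a literal hyphen and never as a range operator.
    """
    return re.sub(r"[^-a-z0-9_]+", sep, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    # Hypothetical input, chosen only for illustration.
    print(slugify("Hello, World! foo-bar_baz", "_"))
```

The equivalent of the 'gi' flags in regexp_replace is re.sub (which replaces every occurrence) combined with re.IGNORECASE.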

I had the same problem
using
\-
instead of only
-
worked for me.

For me, moving the dash (-) to the end of the list worked:
replacing [A-Za-z0-9-_.+=] with [A-Za-z0-9_.+=-] seems to work.

[^[:digit:]\-.]
The above code will work.


Lucene regex v4

I am trying to query on Kibana version 7.9.1 for a UUID v4. I disabled KQL, and now it looks like it is using Lucene.
Example of a uuid v4:
2334e133-37a6-4039-8acd-b0a561b961b2
Now if I input :
/[0-9a-fA-F]{8}/
in the search bar I get hits, but as soon as I try to escape the hyphen like
/[0-9a-fA-F]{8}\-/
nothing shows up. I would like to use the full regular expression:
[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}
But I can't because of the hyphens.
Is there any other way to escape that pesky hyphen?
I am using Elasticsearch 7.9.1, by the way.
I'm not sure why that regex above won't work for you, but this was the best I could come up with given the context: ^[0-9a-fA-F]{8}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{12}$
It basically just replaces your "-" with a negated character class "[^...]" that I filled with almost everything except -, and adds a start anchor "^" and an end anchor "$".
Again, I'm not sure whether Lucene simply doesn't support certain parts of regex, but try not escaping the hyphens. I know some programs automatically escape symbols for you when using regex.
I ended up using the following regex on lucene in the kibana discover option:
/[0-9a-fA-F]{8}/ AND /[0-9a-fA-F]{4}/ AND /[0-9a-fA-F]{12}/
Not pretty, but it works.
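To check the full pattern itself outside Kibana, a minimal Python sketch follows. It uses Python's re module, not Lucene's regex engine, so it says nothing definitive about how Kibana behaves; the names UUID_RE and is_uuid are made up for the example. One possible explanation for the workaround above, which is an assumption and not confirmed in this thread, is that Lucene regexp queries run against individual indexed terms, and an analyzed text field may split the UUID at its hyphens.

```python
import re

# A hyphen outside a character class is an ordinary literal in Python's
# regex syntax, so it needs no escaping here.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(text: str) -> bool:
    """Return True when the whole string has the 8-4-4-4-12 hex layout."""
    return UUID_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_uuid("2334e133-37a6-4039-8acd-b0a561b961b2"))
    print(is_uuid("2334e133"))
```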

Why /^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i does not work as expected

I have this regex for email validation (assume only x#y.com, abc#defghi.org, something#anotherhting.edu are valid)
/^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i
But #abc.edu and abc#xyz.eduorg are both valid as to the regex above. Can anyone explain why that is?
My approach:
there should be at least one character or number before #
then there comes #
there should be at least one character or number after # and before .
the string should end with either edu, com, or org.
Try this
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
and it should become clear - you need to group those alternatives, otherwise you can match any string that has 'edu' in it, or any string that ends with org. To put it another way, your version matches any of these patterns
^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)
(edu)
(org)$
It's worth pointing out that the original poster is using this as a regex learning exercise. This would be a terrible regex for actual production use! It's a thorny problem - see Using a regular expression to validate an email address for a lot more depth.
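The difference between the two patterns can be checked with a short Python sketch. The names BROKEN, FIXED, and check are made up for the example, and the samples are the strings from the question; search is used so that the pattern's own anchors decide where a match may start.

```python
import re

# Original pattern: the top-level | splits it into three separate
# alternatives, and only the first and last carry an anchor.
BROKEN = re.compile(r"^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$", re.IGNORECASE)

# Corrected pattern: the alternatives are grouped, so ^ and $ apply to
# the whole expression, and the second class has its + quantifier.
FIXED = re.compile(r"^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$", re.IGNORECASE)


def check(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern matches anywhere it is allowed to."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["x#y.com", "abc#defghi.org", "#abc.edu", "abc#xyz.eduorg"]:
        print(sample, check(BROKEN, sample), check(FIXED, sample))
```

With the original pattern all four samples match, because "edu" anywhere in the string or "org" at the end is enough; with the grouped pattern only the first two do.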
Your grouping parentheses are incorrect:
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
Can also just use one case as you're using the i modifier:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
N.B. you were also missing a + from the second set, I assume this was just a typo...
What you have written is the equivalent of matching something that:
Begins with [a-zA-Z0-9]+#[a-zA-Z0-9].com
contains edu
or ends with org
What you were looking for was:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
Your regex looks ok.
I guess you are using a find function instead of a match function.
Without knowing what you use it is a bit difficult to say, but in Python you would write
import re
pattern = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$')
pattern.match('#abc.edu')   # returns None; use match to validate an input
pattern.search('#abc.edu')  # returns a match object; it finds the 'edu'
Try this:
[a-zA-Z0-9]+#[a-zA-Z0-9]+.(com|edu|org)+$
You forgot the + quantifier, which you need if you want to catch any combination of (com|edu|org).
Update: as I see it, you also missed the + on the second [a-zA-Z0-9].

Parsing words separated with hyphen

I need to parse the string below using regular expressions. I came up with two variants, both of which seem a bit ugly to me. Please advise which would be better suited for the job.
The main task is to parse the url in scrapy.
Sample expression -
/article/2014/01/16/hcl-tech-earnings-shares-idINDEEA0F02920140116
Regex -
/article/(\d+)/(\d+)/(\d+)/([0-9A-Za-z-]+)
/article/(\d+)/(\d+)/(\d+)/\w+(-\w+)*
And yes, I need to capture the whole ending expression, so 1st regex has handled that perfectly. I verified both the regex using https://pythex.org/.
Edit -
Expected Format -
/article/(yyyy)/(mm)/(dd)/(words-separated-by-hyphen)
I want to capture all the stuff separated by / after /article
Simply use:
/article/(\d+)/(\d+)/(\d+)/(.*)
The hyphens don't seem to have anything to do with what's in the URL, so...
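For comparison, here is a small Python sketch that runs the question's two variants against the sample URL. The names FIRST, SECOND, and SAMPLE are made up for the example. It shows why the first variant captures the whole ending: a repeated capturing group such as (-\w+)* keeps only its last repetition.

```python
import re

SAMPLE = "/article/2014/01/16/hcl-tech-earnings-shares-idINDEEA0F02920140116"

# Variant 1: the whole slug sits inside a single capturing group.
FIRST = re.compile(r"/article/(\d+)/(\d+)/(\d+)/([0-9A-Za-z-]+)")

# Variant 2: the repeated group (-\w+)* retains only its final repetition.
SECOND = re.compile(r"/article/(\d+)/(\d+)/(\d+)/\w+(-\w+)*")

first_groups = FIRST.match(SAMPLE).groups()
second_groups = SECOND.match(SAMPLE).groups()

if __name__ == "__main__":
    print(first_groups)
    print(second_groups)
```

The (.*) form suggested above captures the same slug as the first variant for this sample, but it would also accept further slashes or any other characters after the date.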

need a regular expression that copes with this URL:

I have a URL from google circles that doesn't get validated by normal regular expressions. For instance, ASP.NET provides a standard regular expression to cope with URLs, which is:
"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?"
But when you get a google circles URL:
https://plus.google.com/photos/114197249914471021468/albums/5845982797151575009/5845982803176407170?authkey=CKfNzLrhmenraA#photos/114197249914471021468/albums/5845982797151575009/5845982803176407170?authkey=CKfNzLrhmenraA
it can't cope.
I thought of appending to the end the following expression: (\?.+)?
which basically means the URL can have a question mark after it and then any number of characters of any type, but that doesn't work.
The whole expression would be:
"[Hh][Tt][Tt][Pp]([Ss])?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*(\?.+)?)?"
For some reason, that doesn't work with complicated URLs either.
Help is appreciated.
I added the anchors ^ and $ for the purposes of this test, escaped the / because the following is a javascript regex literal, changed the &, which had no business being there, to &;, removed the space and added # to the third character set, and it seems to work okay:
/^http(s)?:\/\/([\w-]+\.)+[\w-]+(\/[\w.\/?%&;#=-]*)?$/.test(
'https://plus.google.com/photos/114197249914471021468/albums/5845982797151575009/5845982803176407170?authkey=CKfNzLrhmenraA#photos/114197249914471021468/albums/5845982797151575009/5845982803176407170?authkey=CKfNzLrhmenraA' )
// true
I also moved the - to the end in the third character set, as it should be at the start or end of the set if not specifying a range.
Disclaimer: I do not propose this as good way of validating urls in general, it is just an edited version of the original regex which now works in this specific case.
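The same edited pattern can be tried in Python. This is a sketch under the assumption that a Python port is acceptable for testing; the names URL_RE, looks_like_url, and CIRCLES_URL are made up for the example. The / needs no escaping here because Python has no regex literal syntax, and the same disclaimer applies: this is not a general URL validator.

```python
import re

# Port of the edited JavaScript pattern above; the hyphen sits at the
# end of each character class so it is read as a literal.
URL_RE = re.compile(r"^http(s)?://([\w-]+\.)+[\w-]+(/[\w./?%&;#=-]*)?$")

CIRCLES_URL = (
    "https://plus.google.com/photos/114197249914471021468/albums/"
    "5845982797151575009/5845982803176407170?authkey=CKfNzLrhmenraA"
    "#photos/114197249914471021468/albums/5845982797151575009/"
    "5845982803176407170?authkey=CKfNzLrhmenraA"
)


def looks_like_url(text: str) -> bool:
    """Return True when the text matches the edited pattern."""
    return URL_RE.match(text) is not None


if __name__ == "__main__":
    print(looks_like_url(CIRCLES_URL))
```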

Regular Expression to find multiple instances of %%{ANYTHING}%%

SomeRandomText=%EXAMPLE1%,MoreRandomText=%%ONE%%!!%%TWO%%,YetMoreRandomText=%%THREE%%%FOUR%!!%FIVE%\%%SIX%%
I'm in need of a regular expression which can pull out anything which is wrapped in '%%'- so this regular expression would match only the following:
%%ONE%%
%%TWO%%
%%THREE%%
%%SIX%%
I've tried lots of different methods, and am sure there is a way to achieve this, but I'm struggling so far. I mainly end up with a pattern that matches everything from the first %% to the last %% in the string, which is not what I want. I think I need something like lookaheads, but I'm struggling to implement them.
You need a non-greedy match, using the ? modifier:
%%.*?%%
See it working online: rubular
This can also be done by restricting what is allowed between the %s.
%%[^%]*%%
This is more widely supported than non-greedy matching; note, however, that it won't match %%A%B%%. If necessary, that case can be handled with some modifications:
%%([^%]|%[^%])*%%
Or equivalently
%%(%?[^%])*%%
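The three approaches above can be compared with a short Python sketch run on the sample string from the question. The names LAZY, NEGATED, EXTENDED, and TEXT are made up for the example, and the group in the third pattern is written as non-capturing, (?:...), only so that findall returns whole matches.

```python
import re

TEXT = (
    r"SomeRandomText=%EXAMPLE1%,MoreRandomText=%%ONE%%!!%%TWO%%,"
    r"YetMoreRandomText=%%THREE%%%FOUR%!!%FIVE%\%%SIX%%"
)

LAZY = re.compile(r"%%.*?%%")                  # non-greedy match
NEGATED = re.compile(r"%%[^%]*%%")             # no % allowed inside
EXTENDED = re.compile(r"%%(?:[^%]|%[^%])*%%")  # single % allowed inside

lazy_hits = LAZY.findall(TEXT)
negated_hits = NEGATED.findall(TEXT)
extended_hits = EXTENDED.findall(TEXT)

if __name__ == "__main__":
    print(lazy_hits)
    print(negated_hits)
    print(extended_hits)
    # Only the extended pattern handles a single % between the markers.
    print(NEGATED.findall("%%A%B%%"), EXTENDED.findall("%%A%B%%"))
```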