Trying to write an SQL query with regexp_matches() and a positive lookbehind in PostgreSQL

From a PostgreSQL database, I'm trying to match 6 or more digits that come after a string that looks like "(OCoLC)" and I thought I had a working regular expression that would fit that description:
(?<=\(ocolc\))[0-9]{6,}
Here are some strings that it should return the digits for:
|a(OCoLC)08507541 will return 08507541
|a(OCoLC)174097142 will return 174097142
etc...
This seems to work to match strings when I test it on regex101.com, but when I incorporate it into my query:
SELECT
regexp_matches(v.field_content, '(?<=\(ocolc\))[0-9]{6,}', 'gi')
FROM
varfield as v
LIMIT
1;
I get this message:
ERROR: invalid regular expression: quantifier operand invalid
I'm not sure why it doesn't seem to like that expression.
UPDATE
I ended up just resorting to using a case statement, as that seemed to be the best way to work around this...
SELECT
CASE
WHEN v.field_content ~* '\(ocolc\)[0-9]{6,}'
THEN (regexp_matches(v.field_content, '[0-9]{6,}', 'gi'))[1]
ELSE v.field_content
END
FROM
varfield as v
As electricjelly noted, I'm really after just the numeric characters, but they have to be preceded by the "(OCoLC)" string, or they're not what I'm after. This is part of a larger query, so I'm running a second case statement to set a boolean flag in cases where the start of the string wasn't "(OCoLC)". This seems to be more helpful anyway, as I'm probably going to want to preserve those other values somehow.

After looking over your question, it seems your error is caused by a syntax problem rather than by the function being unavailable in your version of PostgreSQL: I tested it on 9.6 and received the same error.
However, what you seem to want is to pull the numbers from a given field, as in
|a(OCoLC)08507541 becomes 08507541
An easy way to accomplish this would be to use regexp_replace.
The function call would be:
regexp_replace(table.field, '\D', '', 'g')
The \D finds every non-digit character and replaces it with nothing (hence the ''), so only the digits are returned. Note that the column reference must not be quoted; 'table.field' in quotes would be treated as a literal string.
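To see what stripping non-digits does in practice, here is a minimal sketch using Python's re module as a stand-in for regexp_replace. Python is an assumption made for illustration and is not PostgreSQL's regex engine; the helper name digits_only and the second sample string are hypothetical. It also shows a caveat: every digit in the field is kept, not only those that follow "(OCoLC)".

```python
import re


def digits_only(value: str) -> str:
    """Remove every non-digit character, like replacing \\D with '' globally."""
    return re.sub(r"\D", "", value)


# Sample from the question: only the identifier digits are present.
clean = digits_only("|a(OCoLC)08507541")

# Hypothetical field with extra digits elsewhere: they get merged in too.
merged = digits_only("|a(OCoLC)08507541 v.2")
```

If fields can contain other digits, a pattern that ties the digits to the "(OCoLC)" prefix is likely the safer choice.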

After some more searching, it looks like lookbehind constraints are only a feature of PostgreSQL server versions >= 9.6:
https://www.postgresql.org/docs/9.6/static/functions-matching.html#POSIX-CONSTRAINTS-TABLE
The version I am running is version 9.4.6
https://www.postgresql.org/message-id/E1ZsIsY-0006z6-6T#gemulon.postgresql.org
So, the answer is it's not available for this version of PostgreSQL, but presumably this would work just fine in the latest version of the server.
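One way to sidestep lookbehind entirely is to match the "(OCoLC)" prefix and capture only the digits. The sketch below illustrates the idea with Python's re module, which is an assumption for illustration; oclc_number is a hypothetical helper. Since regexp_matches() returns the captured groups, the same pattern shape may carry over to PostgreSQL versions without lookbehind, though that is not tested here.

```python
import re
from typing import Optional

# The prefix is matched but not captured; only the digits are in group 1.
OCLC_PATTERN = re.compile(r"\(ocolc\)([0-9]{6,})", re.IGNORECASE)


def oclc_number(field_content: str) -> Optional[str]:
    """Return the 6+ digits that directly follow "(OCoLC)", or None."""
    match = OCLC_PATTERN.search(field_content)
    return match.group(1) if match else None


first = oclc_number("|a(OCoLC)08507541")
second = oclc_number("|a(OCoLC)174097142")
missing = oclc_number("|a08507541")  # no "(OCoLC)" prefix, so no match
```

Returning None for fields without the prefix keeps those rows distinguishable, which fits the asker's wish to flag them separately.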

Related

How do I make a Gerrit query that spans a few specific projects?

I tried for a few hours to find the right syntax for a regex query that returns reviews from 2-3 different projects, but I failed and decided to crowdsource the task ;)
The search is documented at https://review.openstack.org/Documentation/user-search.html and mentions possible use of regex, but it just didn't work.
Task: return all CRs from openstack-infra/gerritlib and openstack-infra/git-review projects from https://review.openstack.org
Doing it for one project works well project:openstack-infra/gerritlib
Ideally I would like to search for something like ^openstack-infra\/(gerritlib|git-review), or at least this is the standard regex syntax.
Still, I have found it impossible to use parentheses so far: every time I used them, the search stopped returning any results.
1) You don't need to escape the "/" character.
2) You need to use double quotes to make the parentheses work.
So the following search should work for you:
project:"^openstack-infra/(gerritlib|git-review)"
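As a quick sanity check of the alternation itself, the following sketch runs the same pattern over a few project names using Python's re module. This is an assumption for illustration: Gerrit uses its own regex engine, and the extra project names in the list are hypothetical. It only shows that the pattern, once quoted as a whole, selects the two intended projects.

```python
import re

# Pattern from the answer, without escaping "/".
PROJECT_PATTERN = r"^openstack-infra/(gerritlib|git-review)"

# The search string as typed into Gerrit: the regex sits inside double quotes.
query = f'project:"{PROJECT_PATTERN}"'

# Hypothetical project list used only to exercise the pattern locally.
projects = [
    "openstack-infra/gerritlib",
    "openstack-infra/git-review",
    "openstack-infra/zuul",
    "openstack/nova",
]

selected = [name for name in projects if re.search(PROJECT_PATTERN, name)]
```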

Matching multiple hex characters in a PGSQL Regex

I'm trying to find some very specific multibyte characters in PostgreSQL using Regex. I know I have the option of writing a long CASE WHEN, but I decided to check whether there is a different way of finding these.
My current Regex looks like this E'\xf0\x9f\x98\x83'
This works pretty well, except that I would need to find all from \xf0\x9f\x98\x80 to \xf0\x9f\x98\x99.
In JS I would just be able to write something like \xf0\x9f\x98[\x80-\x89] but for whatever reason this returns an error in PGSQL. Is there a shortcut like this, or am I doomed to writing 20 CASE WHEN-s?
I have realized my mistake. The PGSQL error occurred because I'm looking for 4-byte characters and I only tried to vary the last byte. I realized I'd have to write it like this: E'[\xf0\x9f\x98\x80-\xf0\x9f\x98\x90]'
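The reason a range over the last byte fails becomes clearer once the bytes are decoded: each four-byte sequence is a single character, so the range has to span whole characters. A rough Python sketch follows; Python's re module stands in for the PostgreSQL engine here, which is an assumption, and the endpoints are the ones named in the question.

```python
import re

# Each 4-byte UTF-8 sequence decodes to exactly one code point.
first = b"\xf0\x9f\x98\x80".decode("utf-8")  # U+1F600
last = b"\xf0\x9f\x98\x99".decode("utf-8")   # U+1F619

# A character class whose endpoints are whole characters, not single bytes.
EMOJI_RANGE = re.compile(f"[{first}-{last}]")

inside = b"\xf0\x9f\x98\x83".decode("utf-8")   # the character from the question
outside = b"\xf0\x9f\x98\xa0".decode("utf-8")  # U+1F620, just past the range

found_inside = EMOJI_RANGE.search("hello " + inside) is not None
found_outside = EMOJI_RANGE.search("hello " + outside) is not None
```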

Solr escapes double quotes/exact match (Django via Scorched/Sunburnt)

I'm querying a Solr 5.3 instance with Django through Scorched. It all works great as long as I don't ask for an exact-match query. In other words,
q=something something else
returns exactly the same result as:
q="something something else"
The culprit, as far as I can see, is the actual query which Django throws at Solr. In fact, for the second case this is:
q=\"something\+something\+else\"
So, in other words, the " character is escaped. Am I right? How do I tell Solr that when I query something between double quotes I want an exact match?
In the Solr admin webpage it all works well, i.e. if I search for "something something else" I get the correct result.
I'm not sure this is a Scorched/Sunburnt problem or not. Does it have something to do with filters/tokenizers (e.g. solr.MappingCharFilterFactory)?
Thanks
I have received this from Scorched's people on Github:
from scorched.strings import DismaxString
...
solr.query(q=DismaxString('"something something else"'))
Scorched will not escape any characters inside a DismaxString....
Hopefully it can help other people.
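For comparison, this sketch shows what an unescaped phrase query looks like once it is URL-encoded, using only Python's standard library. It does not use Scorched and does not contact Solr; it is only meant to illustrate the difference from the backslash-escaped form quoted in the question.

```python
from urllib.parse import parse_qs, urlencode

# A phrase query: the double quotes are part of the query itself.
phrase = '"something something else"'

# URL-encoding turns quotes into %22 and spaces into "+", with no backslashes.
encoded = urlencode({"q": phrase})

# Decoding on the receiving side recovers the quoted phrase unchanged.
decoded = parse_qs(encoded)["q"][0]
```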

Oracle INSTR backward in Oracle SQL Developer does not work.

I am trying to use regexp_substr and instr in Oracle SQL Developer to take a value from my database and go from right to left to the first "/" and then use the value to the right.
table name: access_log
col name: download
value: Download file:/webdocs/data/groupXXX/case/03_28_54_9_0000011856.pdf
I am trying to end up with just the 03_28_54_9_0000011856.pdf part of the value. I have the following SQL:
select regexp_substr(download, '(.*)/', instr(download,'/',1,4)+1,1,'i',1)
from access_log;
But I am getting the following error in SQL Developer:
ORA-00939: too many arguments for function
00939. 00000 - "too many arguments for function"
*Cause:
*Action:
Can someone please tell me why I am getting this error and how I can make this work?
That instr and regexp_substr combination looks pretty complicated. You want everything after the last slash? The key to finding the simpler answer is a technique that often helps with regular expressions: If the problem seems hard, invert it. Instead of thinking about what you want to keep, think about what you want to get rid of.
In this case you want to get rid of everything up to and including the last slash, and that's a really easy regular expression: .*/
So just match that and replace it with the empty string.
regexp_replace(download, '.*/', '')
With the goal posts moved:
SELECT REGEXP_REPLACE('file:/webdocs/data/groupXXX/case/03_28_54_9_0000011856.pdf', '^.*/(.*)/.*$', '\1') FROM DUAL;
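The "invert it" idea can be tried outside the database as well. The sketch below applies the same two patterns with Python's re.sub, which is an assumption for illustration since Oracle's regex engine is a different implementation; the sample value is the one from the question.

```python
import re

download = "Download file:/webdocs/data/groupXXX/case/03_28_54_9_0000011856.pdf"

# Greedy ".*/" consumes everything up to and including the last slash,
# so deleting the match leaves only the file name.
file_name = re.sub(r".*/", "", download)

# Second variant: capture the path segment between the last two slashes
# and replace the whole string with that capture.
parent_dir = re.sub(r"^.*/(.*)/.*$", r"\1", download)
```

Both results rely on the greedy behavior of ".*", which backs off only as far as needed to let the rest of the pattern match.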

Convert hex to utf8 in greenplum in regexp_replace

I have strings in a table that contain hex values such as \ffffffc4. An example is the following:
Urz\ffffffc4\ffffff85dzenie zgodne ze standardem High Definition Audio
The following code can convert the hex into UTF8:
select chr(x'c4'::int)
which returns Ä, but when I try to use regexp_replace I run into problems. I have tried the following:
select regexp_replace(sal_input, E'\\f{6}(..)',convert(E'\\1','xyz','UTF8'),'g')
where 'xyz' stands for each of the various source encodings offered in 8.2, but all I get back is the hex value.
Any idea on how I could use the chr function inside regexp_replace?
Version used: PostgreSQL 8.2.15 (Greenplum Database 4.1.1.1 build 1) on x86_64-unknown-linux-gnu
Thanks in advance for the help
You are misunderstanding the order of evaluation. The 2nd argument to regexp_replace isn't a callback invoked for every substitution of '\1'.
What happens is that your convert call is evaluated first, on the literal value \1, and that result is passed to regexp_replace.
In any case, the SQL doesn't even evaluate on a modern PostgreSQL because of stricter casting rules, as '\1' isn't a valid bytea literal.
In a less ancient Pg version it might be possible to do something with regexp_split_to_table, chr and string_agg. In 8.2, I think you're going to be using a PL. I'd load PL/Perl and write a simple Perl function to do it. It's likely possible to implement in PL/PgSQL, but I suspect any implementation with the functionality available in 8.2 will be verbose and slow. I'd love to be proved wrong.
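To illustrate the per-match callback that regexp_replace does not provide, here is a minimal Python sketch of the kind of function one might write in a procedural language. The answer suggests PL/Perl; Python is used here only as an assumption for illustration, and decode_hex_escapes is a hypothetical name. It assumes each \ffffffXX escape stands for one byte of a UTF-8 sequence.

```python
import re

# Matches a literal backslash, six "f" characters, then two hex digits.
HEX_ESCAPE = re.compile(rb"\\ffffff([0-9a-fA-F]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\ffffffXX escape with byte 0xXX, then decode as UTF-8."""
    raw = text.encode("utf-8")
    # The callback runs once per match, unlike a plain replacement string.
    replaced = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return replaced.decode("utf-8")


sample = r"Urz\ffffffc4\ffffff85dzenie zgodne ze standardem High Definition Audio"
fixed = decode_hex_escapes(sample)
```

Working on bytes matters here: the two escapes c4 and 85 together form a single UTF-8 character, so converting each escape to a character on its own would give the wrong result.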