substring match in redshift database - regex

I have a redshift table "person" in which a particular column has data something like this
[{"attributeName":"name","attributeMetadata":null,"attributeValue":"KitchenAid - 7-Speed Hand Mixer - White","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"title","attributeMetadata":null,"attributeValue":"KitchenAid","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"address","attributeMetadata":null,"attributeValue":"address","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"PIN CODE","attributeMetadata":null,"attributeValue":"32110","attributeImageType":"IMG","attributeStatusCodes":[]}]
I would like to extract only the dictionary/json/substring containing PIN CODE (see below)
{"attributeName":"PIN CODE","attributeMetadata":null,"attributeValue":"32110","attributeImageType":"IMG","attributeStatusCodes":[]}
I tried the following query and it is giving the following error
select distinct regexp_substr(attributes,'.*({.*?"attributeName":"PIN CODE".*?}).*') from person ;
ERROR: Invalid content of repeat range
DETAIL:
-----------------------------------------------
error: Invalid content of repeat range
code: 8002
context: T_regexp_init
query: 528401
location: funcs_expr.cpp:130
process: query2_40 [pid=12603]
-----------------------------------------------
I guess the problem is occurring because of multiple attributeName in a single column. Is their a way to achieve the desired result.

I am not sure if I understood you correctly, but you can try to use LIKE:
select * from person where attributes LIKE '%"attributeName":"PIN CODE"%';

Related

Column does not exist AWS Timestream Query error

I am trying to apply WHERE clause on DIMENSION of the AWS Timestream records. However, I got the error: Column does not exist
Here is my table schema:
The table schema
The table measure
First, I will show all the sample data I put in the table
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result:
Result
What I wanted to do is to query and filter the records by the Dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE measure_name = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the Error: Column 'OnceADay' does not exist
I tried to search for any quotas for Dimensions name and check for error in my schema:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" for the dimension violate any of the above rules.
I checked for some other queries by AWS Blog, the author used the WHERE clause for the Dimension filter normally:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
I figured it out after I tried with the sample code. Turn out it was a silly mistake I believe.
Using apostrophe (') instead of single quotation marks ("") solved my problem.
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10

Is it possible to update a piece of data in a pre-existing table using the output of data from a CTE?

I need to correct a datapoint in a pre-existing table. I am using multiple CTEs to find the bad value and the corresponding good value. I am having trouble working out how to overwrite the value in the table using the output of the CTE. Here is what I am trying:
with [extra CTEs here]....
,CTE3 AS (
SELECT c1.FIELD_1, c1.FIELD_2 AS GOOD, c2.FIELD_3 AS BAD
FROM CTE1 c1
JOIN CTE2 c2 ON c1.FIELD_1 = c2.FIELD_1
)
update TABLE1
set TABLE1.FIELD_3 = CTE3.GOOD
from CTE3
INNER JOIN TABLE1 ON CTE3.BAD = TABLE1.FIELD_3
Is it even possible to achieve this?
If so, how should I change my logic to get it to work?
Trying the above logic is throwing the following error:
SQL Error [42601]: An unexpected token "WITH CTE1 AS ( SELECT
FIELD_1" was found following "BEGIN-OF-STATEMENT". Expected tokens
may include: "<update>".. SQLCODE=-104, SQLSTATE=42601,
DRIVER=4.27.25
Table designs and expected output:

redshift User Defined Lambda Function returns error

I followed all the steps mentioned in this official tutorial to create a redshift lambda function.
https://aws.amazon.com/blogs/big-data/accessing-external-components-using-amazon-redshift-lambda-udfs/
I am able to use my own code instead of the code provided in that example.
It works as expected.
# select 123456 as input_number, mycircle('123456');
input_number | mycircle
--------------+--------------------
123456 | Mumbai
(1 row)
But the same function does not work when used in a table like this...
# select input_number, mycircle(input_number) from mytable limit 1;
ERROR: Invalid Lambda Response
DETAIL:
-----------------------------------------------
error: Invalid Lambda Response
code: 8001
context: Missing rows in lambda response
query: 2983079
location: exfunc_data.cpp:288
process: query0_121_2983079 [pid=29202]
-----------------------------------------------
It seems the UDF does not work like any other python UDF's those are already supported by redshift.
How do I use my lambda function as user defined function in this query?
Update:
My function is written in such a way that it will work only if I create a new table with only 1 row.
# create table todel as select * from mytable limit 1;
and then run the UDF on the newly created table, it works:
# select input_number, mycircle(input_number) from todel;
This is not expected and my other python UDF work correctly as expected.
You need to be sure to return the right number of output rows which matches your input rows. Remember the function does not expect just one input row, but instead processes in "batches".

Is it possible to create download link for Left Joined classic report?

I have classic report in the page, which has the following SQL query:
SELECT CELLS.ID, CELLS.NAME, CELLS.NUM, CELLS.AC_ID, AC.SERIAL,AC.FILE_NAME, AC.FILE_DATA
FROM CELLS
LEFT JOIN AC
ON CELLS.AC_ID = AC.ID
WHERE CELLS.AC_ID IS NOT NULL
ORDER BY CELLS.NUM
I would like to have download link for AC.FILE_DATA, which is BLOB. So in Attributes FILE_DATA column I set the following:
Type: Download BLOB
Table Name: AC
BLOB Column: FILE_DATA
Primary Key Column 1: ID
The page then is generating error in the place of classic report region:
Error: ORA-06502: PL/SQL: numeric or value error: character to number
conversion error
Looking in Debug log shows some more:
Exception in "AC Region": Sqlerrm: ORA-06502: PL/SQL: numeric or value
error: character to number conversion error Backtrace: ORA-06512: at
"APEX_050100.WWV_RENDER_REPORT3", line 7965
Without AC.FILE_DATA in left join there is no exception. So can I actually have blob download column when using joins in report query?
As far as I can tell, it has nothing to do with (left) join but the way you create the download link. It should NOT be the BLOB column name, but this:
dbms_lob.getlength(ac.file_data) download
Or, applied to your query,
SELECT CELLS.ID,
CELLS.NAME,
CELLS.NUM,
CELLS.AC_ID,
AC.SERIAL,
AC.FILE_NAME,
--
dbms_lob.getlength(AC.FILE_DATA) download --> this
FROM CELLS
LEFT JOIN AC
ON CELLS.AC_ID = AC.ID
WHERE CELLS.AC_ID IS NOT NULL
ORDER BY CELLS.NUM
"Download" column settings:
Type: Download BLOB
Table name: AC
BLOB Column: FILE_DATA
Primary Key Column 1: ID
Save, Run - should be just fine.

Regex QueryString Parsing for a specific in BigQuery

So last week I was able to begin to stream my Appengine logs into BigQuery and am now attempting to pull some data out of the log entries into a table.
The data in protoPayload.resource is the page requested with the querystring paramters included.
The contents of protoPayload.resource looks like the following examples:
/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56
I am getting close, but when there is another entry before device_ID, I am not getting it. As you can see I am not great with Regex, but it is the only way I think I can parse the data in the query. To get just the device ID from the first example, I was able to use the following example. Works great. My next challenge is to the data when the second parameter exists. The device IDs can vary in length from about 10 to 26 characters.
SELECT
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs
What I would like is just the values from the querystring device_ID such as:
123456
78ec9b4a56
Assuming you have just 1 query string per record then you can do this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable
The part within the parentheses will be captured and returned in the result.
If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable
One approach is to split protoPayload.resource into multiple service entries, and then apply regexp - this way it will support arbitrary number of device_id, i.e.
select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select
'/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
as protoPayload.resource))