BigQuery Error - Scalar Subquery produced more than one element - google-cloud-platform

I am trying to update a column in BigQuery with the query below, but it fails with the error "Scalar subquery produced more than one element".
The error points specifically at this sub-query: (Select * from unnest(array(Select to_json_String(JSON_EXTRACT_SCALAR(c.raw,'$.C360ServiceError.serviceName'),true)
from tableabc as c where c.trace_id=a.trace_id))). Can you please help?
The complete query is:
UPDATE tableabc a
SET
a.Link=CONCAT("a",(Select split(TopicName, '/')[OFFSET(1)] from tableabc As b where b.trace_id=a.trace_id ),
'String abc',
(Select split(TopicName, '/')[OFFSET(1)] from tableabc As b where b.trace_id=a.trace_id ),'""',
'String def',
((Select * from unnest(array(Select to_json_String(JSON_EXTRACT_SCALAR(c.raw,'$.C360ServiceError.serviceName'),true)
from tableabc as c where c.trace_id=a.trace_id)))), '"%0A"')
WHERE
DATE(a.logDate) Between CURRENT_DATE("Asia/Kolkata")-3 And CURRENT_DATE("Asia/Kolkata")

The subquery built on UNNEST is returning more than one value, which is not allowed where a single scalar is expected. I believe this happens because you are unnesting the array; in my experience, applying UNNEST to the table and then ARRAY avoids the error.
Try the query below, and please let me know the outcome.
UPDATE tableabc a
SET a.link=concat("a",
(
SELECT Split(topicname, '/')[OFFSET(1)]
FROM tableabc AS b
WHERE b.trace_id=a.trace_id ), 'String abc',
(
SELECT Split(topicname, '/')[OFFSET(1)]
FROM tableabc AS b
WHERE b.trace_id=a.trace_id ),'""', 'String def', (
(
SELECT *
FROM array
(
select to_json_string(json_extract_scalar(c.raw,'$.C360ServiceError.serviceName'),true)
FROM (unnest tableabc AS c WHERE c.trace_id=a.trace_id)))), '"%0A"')
WHERE date(a.logdate) BETWEEN CURRENT_DATE("Asia/Kolkata")-3 AND CURRENT_DATE("Asia/Kolkata")
Regards :)
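To make the error itself easier to picture, here is a minimal Python sketch of scalar-subquery semantics. It assumes, as the error message suggests, that several rows in tableabc share the same trace_id. The sample rows and the helper names scalar_subquery and aggregated_subquery are hypothetical and are not part of BigQuery.

```python
def scalar_subquery(rows, trace_id):
    """Mimic a scalar subquery: at most one matching value is allowed."""
    matches = [r["service"] for r in rows if r["trace_id"] == trace_id]
    if len(matches) > 1:
        raise ValueError("Scalar subquery produced more than one element")
    return matches[0] if matches else None


def aggregated_subquery(rows, trace_id, sep=","):
    """Collapse every matching value into one string, so a single value comes back."""
    return sep.join(r["service"] for r in rows if r["trace_id"] == trace_id)


# Hypothetical sample data: trace "t1" appears twice, trace "t2" once.
rows = [
    {"trace_id": "t1", "service": "auth"},
    {"trace_id": "t1", "service": "orders"},
    {"trace_id": "t2", "service": "billing"},
]
```

In BigQuery the analogous idea would be to aggregate inside the subquery (for example with STRING_AGG or ARRAY_TO_STRING) or to restrict it with LIMIT 1, depending on which value is actually wanted.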

FLATTEN results using MAX value in BigQuery

I need to flatten the probabilities column in my results, keeping only the entry with the highest probability:
original   predicted   probabilities (label, prob)
<=50K      >50K        >50K    0.5377828170971353
                       <=50K   0.46221718290286473
<=50K      <=50K       >50K    0.05434716579642335
                       <=50K   0.9456528342035766
I would like to flatten my result, but the query below just returns the table above, and with the BigQuery Python client I get: [object Object],[object Object]
SELECT
original,
predicted,
probabilities
FROM
ML.PREDICT(MODEL `my_dataset.my_model`,
(
SELECT
*
FROM  
`bigquery-public-data.ml_datasets.census_adult_income`
))
Your probabilities field is a REPEATED RECORD, i.e., an array of structs. You can use a subquery to iterate over the array and select the max probability, like this:
SELECT
original,
predicted,
(SELECT p
-- Iterate over the array
FROM UNNEST(probabilities) as p
-- Order by probability and get the first result
ORDER BY p.prob DESC
LIMIT 1) AS probabilities
FROM
ML.PREDICT(MODEL `my_dataset.my_model`,
(
SELECT
*
FROM
`bigquery-public-data.ml_datasets.census_adult_income`
))
The Python result you got looks more like a JavaScript representation of an object. Here's how I did it in Python:
from google.cloud import bigquery
client = bigquery.Client()
# Perform a query.
sql = ''' SELECT ... ''' # Your query
query_job = client.query(sql)
rows = query_job.result() # Waits for query to finish
for row in rows:
    print(row.values())
Output:
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.5218586871072727})
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.5907989087876587})
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.734145221825564})
Note that probabilities is a struct data type in BigQuery SQL, so it is mapped to a Python dict.
Check the BigQuery quickstart for more information on client libraries.
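If the unflattened rows have already been fetched, the same "keep the highest probability" step can be done on the client side. The following is only a sketch: it assumes each row is available as a plain dict whose probabilities field is a list of {'label', 'prob'} dicts, and flatten_max_probability is a hypothetical helper name.

```python
def flatten_max_probability(row):
    """Return (original, predicted, label, prob) for the most probable class."""
    best = max(row["probabilities"], key=lambda p: p["prob"])
    return row["original"], row["predicted"], best["label"], best["prob"]


# Sample row shaped like the first record in the question.
sample = {
    "original": " <=50K",
    "predicted": " >50K",
    "probabilities": [
        {"label": " >50K", "prob": 0.5377828170971353},
        {"label": " <=50K", "prob": 0.46221718290286473},
    ],
}

print(flatten_max_probability(sample))
```

Doing the selection in SQL, as in the query above, is usually preferable because less data leaves BigQuery; the client-side version is mainly useful when the full array is needed elsewhere too.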

Create index if it does not exist

I know this question was asked before, but unfortunately I can't get that solution to run.
I am trying to execute the following query on MySQL:
IF( SELECT 1 FROM information_schema.statistics WHERE index_name='abcattbl_tnam_ownr' AND table_name='abcattbl') <= 0 CREATE INDEX abcattbl_tnam_ownr ON abcattbl(abt_tnam ASC, abt_ownr ASC);
Both MySQL Workbench (WB) and unixODBC isql give an error. WB indicates that the error is in the IF command/statement.
And of course trying to execute this statement from my C/C++ code also fails. That's why I tried WB and isql.
I don't really need the WB/isql execution; I want it to run inside my C/C++ program.
What am I missing?
TIA!
Here is the solution I will be using:
SELECT( IF( ( SELECT 1 FROM information_schema.statistics WHERE index_name=\'abcattbl_tnam_ownr\' AND table_name=\'abcattbl\' ) > 0, \"SELECT 0\", \"CREATE INDEX abcattbl_tnam_ownr ON abcattbl(abt_tnam ASC, abt_ownr ASC)\"));
This query works fine inside WB and from my software.
Thank you.
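An alternative is to split the work into two statements driven by the calling program: one that checks information_schema, and one that creates the index only when the check finds nothing. The sketch below uses Python rather than C/C++ purely for brevity. It assumes a DB-API style cursor with the %s parameter style used by common MySQL drivers, and create_index_if_missing is a hypothetical helper name.

```python
CHECK_SQL = (
    "SELECT COUNT(*) FROM information_schema.statistics "
    "WHERE table_schema = DATABASE() AND table_name = %s AND index_name = %s"
)


def create_index_if_missing(cursor, table, index, columns):
    """Create the index only when information_schema does not list it yet.

    Returns True if a CREATE INDEX statement was issued, False otherwise.
    """
    cursor.execute(CHECK_SQL, (table, index))
    (count,) = cursor.fetchone()
    if count > 0:
        return False
    # Identifiers cannot be bound as parameters, so table, index and
    # column names must come from trusted code, never from user input.
    cursor.execute(f"CREATE INDEX {index} ON {table} ({', '.join(columns)})")
    return True
```

A call would look like create_index_if_missing(cursor, "abcattbl", "abcattbl_tnam_ownr", ["abt_tnam", "abt_ownr"]). The same two-step pattern carries over to a C/C++ program: run the SELECT, inspect the count, and send the CREATE INDEX only if it is zero.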

Can Redshift SQL perform a case insensitive regular expression evaluation?

The documentation says regexp_instr() and ~ are a case-sensitive POSIX function and operator, respectively.
Is there a POSIX syntax for case-insensitive matching, or a plug-in for a PCRE-based function or operator?
Here is an example of PCRE syntax tried in a Redshift query that doesn't work as desired because of the POSIX behavior:
select
A.target
, B.pattern
, regexp_instr(A.target, B.pattern) as rx_instr_position
, A.target ~ B.pattern as tilde_operator
, regexp_instr(A.target
, 'm/'||B.pattern||'/i') as rx_instr_position_icase
from
( select 'AbCdEfffghi' as target
union select 'Chocolate' as target
union select 'Cocoa Latte' as target
union select 'coca puffs, delivered late' as target
) A
,
( select 'choc.*late' as pattern
union select 'coca.*late' as pattern
union select 'choc\w+late' as pattern
union select 'choc\\w+late' as pattern
) B
To answer your question: No Redshift-compatible syntax or plugins that I know of. In case you could live with a workaround: We ended up using lower() around the strings to match:
select
A.target
, B.pattern
, regexp_instr(A.target, B.pattern) as rx_instr_position
, A.target ~ B.pattern as tilde_operator
, regexp_instr(A.target, 'm/'||B.pattern||'/i') as rx_instr_position_icase
, regexp_instr(lower(A.target), B.pattern) as rx_instr_position_icase_by_lower
from
( select 'AbCdEfffghi' as target
union select 'Chocolate' as target
union select 'Cocoa Latte' as target
union select 'coca puffs, delivered late' as target
) A
,
( select 'choc.*late' as pattern
union select 'coca.*late' as pattern
union select 'choc\w+late' as pattern
union select 'choc\\w+late' as pattern
) B
select 'HELLO' ~* 'el' = true
The ~* operator matches case-insensitively; this is currently undocumented (2020-11-05).
Redshift now provides a direct solution for case-insensitive regular expression flags via added function parameters: Amazon Redshift - REGEXP_INSTR
The syntax using the provided query example would be:
select
A.target
, B.pattern
, regexp_instr(A.target, B.pattern) as rx_instr_position
, A.target ~ B.pattern as tilde_operator
, regexp_instr(A.target, B.pattern, 1, 1, 0, 'i') AS rx_instr_position_icase
from
( select 'AbCdEfffghi' as target
union select 'Chocolate' as target
union select 'Cocoa Latte' as target
union select 'coca puffs, delivered late' as target
) A
,
( select 'choc.*late' as pattern
union select 'coca.*late' as pattern
union select 'choc\w+late' as pattern
union select 'choc\\w+late' as pattern
) B
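For comparison, the two approaches discussed in the answers (a case-insensitive flag versus lowercasing the target) can be sketched in Python. This is only an illustration: Python's re module is not the POSIX engine Redshift uses, and regexp_instr_icase and regexp_instr_lower are hypothetical helper names that imitate the 1-based position returned by regexp_instr.

```python
import re


def regexp_instr_icase(target, pattern):
    """Return the 1-based position of the first case-insensitive match, or 0."""
    match = re.search(pattern, target, flags=re.IGNORECASE)
    return match.start() + 1 if match else 0


def regexp_instr_lower(target, pattern):
    """Workaround variant: lowercase the target and match case-sensitively.

    This only behaves like a case-insensitive match when the pattern
    itself contains no uppercase literals.
    """
    match = re.search(pattern, target.lower())
    return match.start() + 1 if match else 0


for target in ["AbCdEfffghi", "Chocolate", "Cocoa Latte", "coca puffs, delivered late"]:
    print(target, regexp_instr_icase(target, r"choc.*late"))
```

The lower() workaround is simple but silently fails for patterns with uppercase characters, which is one reason to prefer an explicit case-insensitive flag where the engine offers one.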

Oracle How do I transform this string field into structured data using regular expressions?

I did start at this answer:
Oracle 11g get all matched occurrences by a regular expression
But it didn't get me far enough. I have a string field that looks like this:
A=&token1&token2&token3,B=&token2&token3&token5
It could have any number of tokens and any number of keys. The desired output is a set of rows looking like this:
Key | Token
A | &token1
A | &token2
A | &token3
B | &token2
B | &token3
B | &token5
This is proving rather difficult to do.
I started here:
SELECT token from
(SELECT REGEXP_SUBSTR(str, '[A-Z=&]+', 1, LEVEL) AS token
FROM (SELECT 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[A-Z=&]+', ',')))
Where token is not null
But that yields:
A=&
&
&
B=&
&
&
which is getting me nowhere. I'm thinking I need to do a nested clever select where the first one gets me
A=&token1&token2&token3
B=&token2&token3&token5
And a subsequent select might be able to do a clever extract to get the final result.
Stumped. I'm trying to do this without using procedural or function code -- I would like the set to be something I can union with other queries so if it's possible to do this with nested selects that would be great.
UPDATE:
SET DEFINE OFF
SELECT SUBSTR(token,1,1) as Key, REGEXP_SUBSTR(token, '&\w+', 1, LEVEL) AS token2
FROM
(
-- 1 row per key/value pair
SELECT token from
(SELECT REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) AS token
FROM (SELECT 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^,]+', ',')))
Where token is not null
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(token, '&\w+'))
This gets me
A | &token1
A | &token2
B | &token3
B | &token2
A | &token2
B | &token3
Which is fantastic formatting except for the small problem that it's wrong (A should have a token3, and token4 and token5 are nowhere to be seen).
Great question! Thanks for it!
select distinct k, regexp_substr(v, '[^&]+', 1, level) t
from (
select substr(regexp_substr(val,'^[^=]+=&'),1,length(regexp_substr(val,'^[^=]+=&'))-2) k, substr(regexp_substr(val,'=&.*'),3) v
from (
select regexp_substr(str, '[^,]+', 1, level) val
from (select 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
connect by level <= length(str) - length(replace(str,','))+1
)
) connect by level <= length(v) - length(replace(v,'&'))+1
It is an answer, and one that seems to work... But I don't like the middle part that splits val into k and v; there must be a better way (if the key is always one character, that makes it easy, though). And having to add a DISTINCT to get rid of duplicates is horrible... Maybe with further playing you can clean it up (or someone else might).
EDIT based on keeping the leading & and the key being a single character:
select distinct k, regexp_substr(v, '&[^&]+', 1, level) t
from (
select substr(val,1,1) k
, substr(regexp_substr(val,'=&.*'),1) v
from (
select regexp_substr(str, '[^,]+', 1, level) val
from (select 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
connect by level <= length(str) - length(replace(str,','))+1
)
) connect by level < length(v) - length(replace(v,'&'))+1
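The two-level split these queries perform (first on commas into key/value pairs, then on & into tokens) may be easier to follow in procedural form. The sketch below is only a reference for the expected output, not a replacement for the SQL, since the question asks for a pure-query solution; split_key_tokens is a hypothetical helper name.

```python
def split_key_tokens(text):
    """Turn 'A=&t1&t2,B=&t3' into [('A', '&t1'), ('A', '&t2'), ('B', '&t3')]."""
    rows = []
    for pair in text.split(","):              # outer split: one pair per key
        key, _, tokens = pair.partition("=")  # key is everything before '='
        # inner split: one row per token, restoring the leading '&'
        rows.extend((key, "&" + t) for t in tokens.split("&") if t)
    return rows


for key, token in split_key_tokens("A=&token1&token2&token3,B=&token2&token3&token5"):
    print(key, "|", token)
```

Unlike the single-character substr approach, this version keeps working when a key is longer than one character.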

SQL Comparison to a value in the next row

I have been a long-time reader of this forum and it has helped me a lot. However, I have a question for which I can't find a solution specific to my requirements, so this is the first time I have had to ask anything.
I have a select statement which returns meter readings sorted by date (the newest readings at the top). In 99.9% of cases the meter readings go up as the date moves on, but due to system errors some occasionally go down. I need to identify instances where the reading in the row below (the previous reading) is GREATER than the latest reading (the current row).
I have come across the LEAD function, but it is only available in Oracle or SQL Server 2012, and I'm using SQL Server 2008.
Here is a simplified version of my select statement:
SELECT Devices.SerialNumber,
MeterReadings.ScanDateTime,
MeterReadings.TotalMono,
MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN DBO.Devices AS Devices
ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.serialnumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
AND Meterreadings.Scandatetime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC
This is the code I used in the end
WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
WHERE m.ScanDateTime > '2012-01-01'
)
SELECT top 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
and p.ScanDateTime < r.ScanDateTime
and p.TotalMono > r.TotalMono
order by r.serialnumber, p.TotalMono desc, r.TotalMono asc
Try something like this.
;WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
AND p.ScanDateTime < r.ScanDateTime
WHERE p.TotalMono > r.TotalMono
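The comparison LEAD/LAG would perform, each reading against the one immediately before it, can be sketched in Python as follows. This is an illustration under assumptions: readings for a single device are given as (scan_datetime, total_mono) tuples, the sample values are made up, and find_decreases is a hypothetical helper name.

```python
from datetime import datetime


def find_decreases(readings):
    """Return (previous, current) pairs where the counter went down.

    readings: iterable of (scan_datetime, total_mono) tuples for one device.
    """
    ordered = sorted(readings, key=lambda r: r[0])  # oldest first
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if prev[1] > cur[1]
    ]


# Made-up sample: the reading on 3 January is lower than the one before it.
sample = [
    (datetime(2012, 1, 1), 100),
    (datetime(2012, 1, 2), 150),
    (datetime(2012, 1, 3), 140),
    (datetime(2012, 1, 4), 160),
]

for prev, cur in find_decreases(sample):
    print("previous:", prev, "current:", cur)
```

Note the difference from the self-join in the queries above: the join matches a reading against every earlier reading with a higher value, while this sketch only looks at the immediately preceding one.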