"Operation must use an updateable query" in MS Access when the Updated Table is the same as the source - sql-update

The challenge is to update a table by scanning that same table for information. In this case, I want to find how many entries I received in an Upload dataset that have the same key (effectively duplicate instructions).
I tried the obvious code:
UPDATE Base AS TAR
INNER JOIN (select cdKey, count(*) as ct FROM Base GROUP BY cdKey) AS CHK
ON TAR.cdKey = CHK.cdKey
SET ctReferences = CHK.ct
This resulted in a non-updateable complaint. Some workarounds talked about adding DISTINCTROW, but that made no difference.
I tried creating a view (query in Ms/Access parlance); same failure.
Then I projected the set (SELECT cdKey, count(*) INTO TEMP FROM Base GROUP BY cdKey), and substituted TEMP for the INNER JOIN which worked.
Conclusion: reflexive updates are also non-updateable.

An initial thought was to embed a sub-select in the update, for example:
UPDATE Base TAR SET TAR.ctReferences = (select count(*) from Base CHK where CHK.cd = TAR.cd)
This also failed.
As this is part of a job I am calling, this SQL (like the other statements) are all strings executed by CurrentDb.Execute statements. I thought maybe I could make this a DLookup, I found that as cd is a string, I had a gaggle of double- and triple-quoted elements that was too messy to read (and maintain).
Best solution was to write a function so I could avoid having to do any sort of string manipulation. Hence, in a module there's a function:
Public Function PassHelperCtOccurs(ByRef cdX As String) As Long
PassHelperCtOccurs = DLookup("count(*)", "Base", "cd='" & cdX & "'")
End Function
And the call is:
CurrentDb().Execute ("UPDATE Base SET ctOccursCd =PassHelperCtOccurs(cd)")

Related

BigQuery MERGE statement billing more bytes than editor shows

I have a very large (3.5B records) table that I want to update/insert (upsert) using the MERGE statement in BigQuery. The source table is a staging table that contains only the new data, and I need to check if the record with a corresponding ID is in the target table, updating the row if so or inserting if not.
The target table is partitioned by an integer field called IdParent, and the matching is done on IdParent and another integer field called IdChild. My merge statement/script looks like this:
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched and t.IdParent in unnest(parentList) then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched and IdParent in unnest(parentList) then
insert (<all the fields>)
values (<all the fields)
;
So I:
Pull the IdParent list from the staging table to know which partitions to prune
limit the partitions of the target table in the join predicate
also limit the partitions of the target table in the match/not matched conditions
The total size of dataset.Target is ~250GB. If I put this script in my BQ editor and remove all the IdParent in unnest(parentList) then it shows ~250GB to bill in the editor (as expected since there's no partition pruning). If I add the IdParent in unnest(parentList) back in so the script is exactly like you see it above i.e. attempting to partition prune, the editor shows ~97MB to bill. However, when I look at the query results, I see that it actually billed ~180GB:
The target table is also clustered on the two fields being matched, and I'm aware that the benefits of clustering are typically not shown in the editor's estimate. However, my understanding is that that should only make the bytes billed smaller... I can't think of any reason why this would happen.
Is this a BQ bug, or am I just missing something? BigQuery doesn't even say "the script is estimated to process XX MB", it says "This will process XX MB" and then it processes way more.
That's very interesting. What you did seems totally correct.
It seems BQ query planner could interpret your SQL correctly and know the partition pruning is provided, but when it executes. it failed to do so.
try removing t.IdParent in unnest(parentList) from both when matched clauses to see if the issue still happens, that is,
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched then
insert (<all the fields>)
values (<all the fields)
;
It would be a good idea to submit a bug to BigQuery if it couldn't be resolved.

How to avoid CTE or subquery in SQL?

Question
Say we have 1 as foo, and we want foo+1 as bar in SQL.
With CTE or subquery, like:-
select foo+1 as bar from (select 1 as foo) as abc;
We would get (in postgre which is what I am using):-
bar
-----
2
However, when I tried the following:-
select 1 as foo, foo+1 as bar;
The following error occurs:-
ERROR: column "foo" does not exist
LINE 1: select 1 as foo, foo+1 as bar;
^
Is there any way around this without the use of CTE or subquery?
Why do I ask?
I am using Django for a web service, to order and paginate objects in the database, I have to grab the count of the upvotes and downvotes and do some extra mathematical manipulation on those two values (ie. calculating the wilson score interval), where those two values are used multiple times.
All I can work with that I know of right now is the extra() function without breaking the ORM(?) [for example lazy queryset and prefetch_related() function].
Therefore I need a way to call those two values from somewhere instead of doing a SELECT multiple times when I calculate the score. (Or that's not the case in reality anyway?)
PS. Currently I am storing the vote count as database field and update them, but I already have a model of a vote, so it seems redundant and slow to update vote count and insert vote to database
No, you need the sub-query or CTE to do that. There is one alternative though: create a stored procedure.
CREATE FUNCTION wilson(upvote integer, downvote integer) RETURNS float8 AS $$
DECLARE
score float8;
BEGIN
-- Calculate the score
RETURN score;
END; $$ LANGUAGE plpgsql STRICT;
In your ORM you now call the function as part of your SELECT statement:
SELECT id, upvotes, downvotes, wilson(upvotes, downvotes) FROM mytable;
Also makes for cleaner code.

Update a table by using multiple joins

I am using Java DB (Java DB is Oracle's supported version of Apache Derby and contains the same binaries as Apache Derby. source: http://www.oracle.com/technetwork/java/javadb/overview/faqs-jsp-156714.html#1q2).
I am trying to update a column in one table, however I need to join that table with 2 other tables within the same database to get accurate results (not my design, nor my choice).
Below are my three tables, ADSID is a key linking Vehicles and Customers and ADDRESS and ZIP in Salesresp are used to link it to Customers. (Other fields left out for the sake of brevity.)
Salesresp(address, zip, prevsale)
Customers(adsid, address, zipcode)
Vehicles(adsid, selldate)
The goal is to find customers in the SalesResp table that have previously purchased a vehicle before the given date. They are identified by address and adsid in Customers and Vechiles respectively.
I have seen updates to a column with a single join and in fact asked a question about one of my own update/joins here (UPDATE with INNER JOIN). But now I need to take it that one step further and use both tables to get all the information.
I can get a multi-JOIN SELECT statement to work:
SELECT * FROM salesresp
INNER JOIN customers ON (SALESRESP.ZIP = customers.ZIPCODE) AND
(SALESRESP.ADDRESS = customers.ADDRESS)
INNER JOIN vehicles ON (Vehicles.ADSId =Customers.ADSId )
WHERE (VEHICLES.SELLDATE<'2013-09-24');
However I cannot get a multi-JOIN UPDATE statement to work.
I have attempted to try the update like this:
UPDATE salesresp SET PREVSALE = (SELECT SALESRESP.address FROM SALESRESP
WHERE SALESRESP.address IN (SELECT customers.address FROM customers
WHERE customers.adsid IN (SELECT vehicles.adsid FROM vehicles
WHERE vehicles.SELLDATE < '2013-09-24')));
And I am given this error: "Error code 30000, SQL state 21000: Scalar subquery is only allowed to return a single row".
But if I change that first "=" to a "IN" it gives me a syntax error for having encountered "IN" (Error code 30000, SQL state 42X01).
I also attempted to do more blatant inner joins, but upon attempting to execute this code I got the the same error as above: "Error code 30000, SQL state 42X01" with it complaining about my use of the "FROM" keyword.
update salesresp set prevsale = vehicles.selldate
from salesresp sr
inner join vehicles v
on sr.prevsale = v.selldate
inner join customers c
on v.adsid = c.adsid
where v.selldate < '2013-09-24';
And in a different configuration:
update salesresp
inner join customer on salesresp.address = customer.address
inner join vehicles on customer.adsid = vehicles.ADSID
set salesresp.SELLDATE = vehicles.selldate where vehicles.selldate < '2013-09-24';
Where it finds the "INNER" distasteful: Error code 30000, SQL state 42X01: Syntax error: Encountered "inner" at line 3, column 1.
What do I need to do to get this multi-join update query to work? Or is it simply not possible with this database?
Any advice is appreciated.
If I were you I would:
1) Turn off autocommit (if you haven't already)
2) Craft a select/join which returns a set of columns that identifies the record you want to update E.g. select c1, c2, ... from A join B join C... WHERE ...
3) Issue the update. E.g. update salesrep SET CX = cx where C1 = c1 AND C2 = c2 AND...
(Having an index on C1, C2, ... will boost performance)
4) Commit.
That way you don't have worry about mixing the update and the join, and doing it within a txn ensures that nothing can change the result of the join before your update goes through.

How can I SELECT records using a select list made of foreign keys?

I have a table, DEBTOR, with a structure like this:
and a second table, DEBTOR.INFO structured like this:
I have a select list made of record IDs from the DEBTOR.INFO table. How can I
select * from DEBTOR WHERE 53 IN (name of select list)?
Is this even possible?
I realize this query looks more like SQL than RetrieVe but I wrote it that way for an easier understanding of what I'm trying to accomplish.
Currently, I accomplish this query by writing
SELECT DEBTOR WITH 53 EQ [paste list of DEBTOR.INFO record IDs]
but obviously this is unwieldy for large lists.
It looks to me that you cant do that. Even if you use and i-descriptor, It only works in one direction. TRANS("DEBTOR.INFO",53,0,"X") works from the DEBTOR file but not the other way. So TRANS("DEBTOR",#ID,53,"X") from DEBTOR.INFO will return nothing.
See this article on U2's site for a possible solution.
Would something like this work (two steps):
SELECT DEBTOR.INFO SAVING PACKET
LIST DEBTOR ....
This creates a select list of the data in the PACKET field in the DEBTOR.INFO file and makes it active. (If you have duplicate values that way you can add the keyword UNIQUE after SAVING).
Then the subsequent LIST command uses that active select list which contains values found in the #ID field of the file DEBTOR.
Not sure if you are still looking at this, but there is a simple option that will not require a lot of programming.
I did it with a program, a subroutine and a dictionary item.
First I set a named common variable to contain the list of DEBTOR.INFO ids:
SETLIST
*
* Use named common to hold list of keys
COMMON /MYKEYS/ KEYLIST
*
* Note for this example I am reading the list from SAVEDLISTS
OPEN "SAVEDLISTS" TO FILE ELSE STOP "CAN NOT OPEN SAVEDLISTS"
READ KEYLIST FROM FILE, "MIKE000" ELSE STOP "NO MIKE000 ITEM"
Now, I can create a subroutine that checks for a value in that list
CHECKLIST
SUBROUTINE CHECKLIST( RVAL, IVAL)
COMMON /MYKEYS/ KEYLIST
LOCATE IVAL IN KEYLIST <1> SETTING POS THEN
RVAL = 1
END ELSE RVAL = 0
RETURN
Lastly, I use a dictionary item to call the subroutine with the field I am looking for:
INLIST:
I
SUBR("CHECKLIST", FK)
IN LIST
10R
S
Now all I have to do is put the correct criteria on my list statement:
LIST DEBTOR WITH INLIST = 1 ACCOUNT STATUS FK
Id use the very powerfull EVAL with an XLATE ;
SELECT DEBTOR WITH EVAL \XLATE('DEBTOR.INFO',#RECORD<53>,'-1','X')\ NE ""

What's the right pattern for unique data in columns?

I've a table [File] that has the following schema
CREATE TABLE [dbo].[File]
(
[FileID] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](256) NOT NULL,
CONSTRAINT [PK_File] PRIMARY KEY CLUSTERED
(
[FileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The idea is that the FileID is used as the key for the table and the Name is the fully qualified path that represents a file.
What I've been trying to do is create a Stored Procedure that will check to see if the Name is already in use if so then use that record else create a new record.
But when I stress test the code with many threads executing the stored procedure at once I get different errors.
This version of the code will create a deadlock and throw a deadlock exception on the client.
CREATE PROCEDURE [dbo].[File_Create]
#Name varchar(256)
AS
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION xact_File_Create
SET XACT_ABORT ON
SET NOCOUNT ON
DECLARE #FileID int
SELECT #FileID = [FileID] FROM [dbo].[File] WHERE [Name] = #Name
IF ##ROWCOUNT=0
BEGIN
INSERT INTO [dbo].[File]([Name])
VALUES (#Name)
SELECT #FileID = [FileID] FROM [dbo].[File] WHERE [Name] = #Name
END
SELECT * FROM [dbo].[File]
WHERE [FileID] = #FileID
COMMIT TRANSACTION xact_File_Create
GO
This version of the code I end up getting rows with the same data in the Name column.
CREATE PROCEDURE [dbo].[File_Create]
#Name varchar(256)
AS
BEGIN TRANSACTION xact_File_Create
SET NOCOUNT ON
DECLARE #FileID int
SELECT #FileID = [FileID] FROM [dbo].[File] WHERE [Name] = #Name
IF ##ROWCOUNT=0
BEGIN
INSERT INTO [dbo].[File]([Name])
VALUES (#Name)
SELECT #FileID = [FileID] FROM [dbo].[File] WHERE [Name] = #Name
END
SELECT * FROM [dbo].[File]
WHERE [FileID] = #FileID
COMMIT TRANSACTION xact_File_Create
GO
I'm wondering what the right way to do this type of action is? In general this is a pattern I'd like to use where the column data is unique in either a single column or multiple columns and another column is used as the key.
Thanks
If you are searching heavily on the Name field, you will probably want it indexed (as unique, and maybe even clustered if this is the primary search field). As you don't use the #FileID from the first select, I would just select count(*) from file where Name = #Name and see if it is greater than zero (this will prevent SQL from retaining any locks on the table from the search phase, as no columns are selected).
You are on the right course with the SERIALIZABLE level, as your action will impact subsequent queries success or failure with the Name being present. The reason the version without that set causes duplicates is that two selects ran concurrently and found there was no record, so both went ahead with the inserts (which creates the duplicate).
The deadlock with the prior version is most likely due to the lack of an index making the search process take a long time. When you load the server down in a SERIALIZABLE transaction, everything else will have to wait for the operation to complete. The index should make the operation fast, but only testing will indicate if it is fast enough. Note that you can respond to the failed transaction by resubmitting: in real world situations hopefully the load will be transient.
EDIT: By making your table indexed, but not using SERIALIZABLE, you end up with three cases:
Name is found, ID is captured and used. Common
Name is not found, inserts as expected. Common
Name is not found, insert fails because another exact match was posted within milliseconds of the first. Very Rare
I would expect this last case to be truly exceptional, so using an exception to capture this very rare case would be preferable to engaging SERIALIZABLE, which has serious performance consequences.
If you do really have an expectation that it will be common to have posts within milliseconds of one another of the same new name, then use a SERIALIZABLE transaction in conjunction with the index. It will be slower in the general case, but faster when these posts are found.
First, create a unique index on the Name column. Then from your client code first check if the Name exists by selecting the FileID and putting the Name in the where clause - if it does, use the FileID. If not, insert a new one.
Using the Exists function might clean things up a little.
if (Exists(select * from table_name where column_name = #param)
begin
//use existing file name
end
else
//use new file name