Select nth to nth row while table still has unselected values with Python and pyodbc - python-2.7

I have a table with 10,000 rows and I want to select the first 1000 rows, and then select again, this time the next set of rows, which is 1001-2001.
I am using the BETWEEN clause in order to select the range of values. I can also increment the values. Here is my code:
from time import sleep

count = cursor.execute("select count(*) from casa4").fetchone()[0]
ctr = 1
ctr1 = 1000
str1 = ''
while ctr1 <= count:
    sql = "SELECT AccountNo FROM ( \
           SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum \
           FROM casa4 ) seq \
           WHERE seq.rownum BETWEEN " + str(ctr) + " AND " + str(ctr1)
    ctr = ctr1 + 1
    ctr1 = ctr1 + 1000
    cursor.execute(sql)
    sleep(2)  # interval between printing batches of rows
    for row in cursor:
        str1 = str1 + '|'.join(map(str, row)) + '\n'
    print "Records:" + str1  # accumulates the fetched rows from the database
    print sql  # the generated SQL; ctr and ctr1 increment correctly, the way I want it
What I want to achieve is this: using a messaging queue (RabbitMQ), I will send these rows to another database, and I want to speed up the process. Selecting everything at once and sending it to the queue returns an error.
The output of the code returns rows 1-1000 correctly on the first loop but, on the second loop, instead of rows 1001-2001, it returns rows 1-2001, then 1-3001, and so on. It always starts at 1.

I was able to recreate your issue with both pyodbc and pypyodbc. I also tried using
WITH seq (AccountNo, rownum) AS
(
SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum
FROM casa4
)
SELECT AccountNo FROM seq
WHERE rownum BETWEEN 11 AND 20
When I run that in SSMS I just get rows 11 through 20, but when I run it from Python I get all the rows (starting from 1).
The following code does work using pyodbc. It uses a temporary table named #numbered, and might be helpful in your situation since your process looks like it would do all of its work using the same database connection:
import pyodbc
cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS")
crsr = cnxn.cursor()
sql = """\
CREATE TABLE #numbered (rownum INT PRIMARY KEY, AccountNo VARCHAR(10))
"""
crsr.execute(sql)
cnxn.commit()
sql = """\
INSERT INTO #numbered (rownum, AccountNo)
SELECT
ROW_NUMBER() OVER (ORDER BY Accountno) AS rownum,
AccountNo
FROM casa4
"""
crsr.execute(sql)
cnxn.commit()
sql = "SELECT AccountNo FROM #numbered WHERE rownum BETWEEN ? AND ? ORDER BY rownum"
batchsize = 1000
ctr = 1
while True:
    crsr.execute(sql, [ctr, ctr + batchsize - 1])
    rows = crsr.fetchall()
    if len(rows) == 0:
        break
    print("-----")
    for row in rows:
        print(row)
    ctr += batchsize
cnxn.close()
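Since the stated goal is to push each batch to RabbitMQ, here is a minimal sketch of how the batches from the #numbered table could be published instead of printed. It assumes the pika library and a broker on localhost, and reuses the crsr cursor from above; the queue name and one-account-per-line message format are illustrative, not from the question:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="casa4_accounts")  # illustrative queue name

sql = "SELECT AccountNo FROM #numbered WHERE rownum BETWEEN ? AND ? ORDER BY rownum"
batchsize = 1000
ctr = 1
while True:
    crsr.execute(sql, [ctr, ctr + batchsize - 1])
    rows = crsr.fetchall()
    if len(rows) == 0:
        break
    # One message per batch: one AccountNo per line
    body = "\n".join(str(row.AccountNo) for row in rows)
    channel.basic_publish(exchange="", routing_key="casa4_accounts", body=body)
    ctr += batchsize

connection.close()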

Related

How to update columns in existing table by using temporary table columns in Amazon Redshift?

The code below was developed in SQL to update the target table's columns. Can someone help me rewrite it for Redshift? When I try to execute the same query on Amazon Redshift it gives this error:
Amazon Invalid operation: relation "c" does not exist;
With TempTable As
(
SELECT Left('abcdefghijk',len(TerritoryName)/3) + Substring(TerritoryName,len(TerritoryName)-len(TerritoryName)/3-len(TerritoryName)/3+1,len(TerritoryName)-len(TerritoryName)/3-len(TerritoryName)/3) + Right('ijklmnopqrstuv',len(TerritoryName)/3) As Masked_TerritoryName
,Left('abcdefghijk',len(DistrictName)/3) + Substring(DistrictName,len(DistrictName)-len(DistrictName)/3-len(DistrictName)/3+1,len(DistrictName)-len(DistrictName)/3-len(DistrictName)/3) + Right('ijklmnopqrstuv',len(DistrictName)/3) As Masked_DistrictName
,Left('abcdefghijk',len(RegionName)/3) + Substring(RegionName,len(RegionName)-len(RegionName)/3-len(RegionName)/3+1,len(RegionName)-len(RegionName)/3-len(RegionName)/3) + Right('ijklmnopqrstuv',len(RegionName)/3) As Masked_RegionName
,Left('abcdefghijk',len(RSMTerritoryName)/3) + Substring(RSMTerritoryName,len(RSMTerritoryName)-len(RSMTerritoryName)/3-len(RSMTerritoryName)/3+1,len(RSMTerritoryName)-len(RSMTerritoryName)/3-len(RSMTerritoryName)/3) + Right('ijklmnopqrstuv',len(RSMTerritoryName)/3) As Masked_RSMTerritoryName
,Left('abcdefghijk',len(CCAName)/3) + Substring(CCAName,len(CCAName)-len(CCAName)/3-len(CCAName)/3+1,len(CCAName)-len(CCAName)/3-len(CCAName)/3) + Right('ijklmnopqrstuv',len(CCAName)/3) As Masked_CCAName
,Left('abcdefghijk',len(LCAName)/3) + Substring(LCAName,len(LCAName)-len(LCAName)/3-len(LCAName)/3+1,len(LCAName)-len(LCAName)/3-len(LCAName)/3) + Right('ijklmnopqrstuv',len(LCAName)/3) As Masked_LCAName
,Left('abcdefghijk',len(TMComp)/3) + Substring(TMComp,len(TMComp)-len(TMComp)/3-len(TMComp)/3+1,len(TMComp)-len(TMComp)/3-len(TMComp)/3) + Right('ijklmnopqrstuv',len(TMComp)/3) As Masked_TMComp
,Left('abcdefghijk',len(ASMTerritoryName)/3) + Substring(ASMTerritoryName,len(ASMTerritoryName)-len(ASMTerritoryName)/3-len(ASMTerritoryName)/3+1,len(ASMTerritoryName)-len(ASMTerritoryName)/3-len(ASMTerritoryName)/3) + Right('ijklmnopqrstuv',len(ASMTerritoryName)/3) As Masked_ASMTerritoryName
,TerritoryCode
FROM TargetTable
)
Update C
Set C.TerritoryName = N.Masked_TerritoryName
,C.DistrictName = N.Masked_DistrictName
,C.RegionName = N.Masked_RegionName
,C.RSMTerritoryName = N.Masked_RSMTerritoryName
,C.CCAName = N.Masked_CCAName
,C.LCAName = N.Masked_LCAName
,C.TMComp = N.Masked_TMComp
,C.ASMTerritoryName = N.Masked_ASMTerritoryName
From TargetTable C
Inner Join TempTable N ON C.TerritoryCode = N.TerritoryCode
I don't believe you can use just an alias for the target table. You have "... Update C ...", I expect you need "... Update TargetTable ..." or "... Update TargetTable C ...".
Also, you don't need to list TargetTable in the FROM clause, as it is assumed. Your join ON conditions become WHERE conditions. So your query will look like this:
With TempTable As
(
SELECT ...
FROM TargetTable
)
Update TargetTable C
Set ...
From TempTable N
Where C.TerritoryCode = N.TerritoryCode

BigQuery analytical function not giving expected results

I am trying to write a SQL query in BigQuery, and I need to filter records based on a GROUP BY column and another column in the table.
What I mean is: if the grouping column (column name: mnt) has more than one row per value, then I have to check the value of col2 (column name: zel) and apply a filter saying col2 = 'X', passing only that record; otherwise, i.e. if the group has only one distinct value (a single row), don't filter the records at all.
So I have written SQL to do this using row_number as well as rank and dense_rank, but I noticed that rank, dense_rank, and row_number all return the same value for a group.
Please see the code below:
#standardsql
with t1 as (SELECT mnt,
case when rank() over (partition by ltrim(rtrim(mnt)) order by
ltrim(rtrim(mnt)) asc) >1 then 'Y' else 'N' end
as flag,
rank() over (partition by mnt order by mnt) as rn,
dense_rank() over (partition by mnt order by mnt) as drn FROM
projectname.datasetname.tablename1),
t2 as ( SELECT
mnt,
rel,
lif,
lts,
lokez FROM projectname.datasetname.tablename2
WHERE lts <> "" AND _PARTITIONTIME = TIMESTAMP(CURRENT_DATE()) ) ,
t3 as (SELECT
lif,
lifn,
lts,
par FROM `projectname.datasetname.tablename3`)
,t4 as (SELECT rcv FROM `projectname.datasetname.tablename4` WHERE mes
= 'PRO')
select * from (
SELECT t1.mnt as mnt,
t1.flag,
t1.rn,
t1.drn,
t2.rel as zel,
t2.lokez as ZLOEKZ,
t4.rcv as Zrcv
FROM t1 left join t2 on replace(t1.mnt, '00000000', '') =
REPLACE(t2.mnt, '00000000', '') AND t1.lif = t2.lif and t2.lts <> ""
and
case when t1.flag = 'Y' and t2.rel ='X' then 1
when (t1.flag ='N' and t2.rel=t2.rel) or (t1.flag ='N' and t2.rel
is null) then 1
when t1.flag = 'Y' and t2.rel <>'X' then 2
else 3
end = 1
left join t3 ON t1.lif = t3.lif AND t2.lts = t3.lts AND
t3.par = 'BA' left join t4 on t4.rcv = t3.lifn and t2.lokez is null )
where ZLOEKZ is null order by mnt
As you can see I am using a CASE statement, and even that does not seem to be working. I am pasting the CASE condition below again:
case when t1.flag = 'Y' and t2.rel = 'X' then 1
when (t1.flag = 'N' and t2.rel = t2.rel) or (t1.flag = 'N' and t2.rel is null) then 1
when t1.flag = 'Y' and t2.rel <> 'X' then 2
else 3
end = 1
But the expected record count did not match, so I added the SQL lines below to check whether my analytic functions were giving me the result I wanted:
rank() over (partition by mnt order by mnt) as rn,
dense_rank() over (partition by mnt order by mnt) as drn
Strangely, for the same mnt number the rank, dense_rank, and row_number functions all assign the same value. What am I doing wrong here?
This is my output:
mnt flag rn drn rel lokez rcv
100 N 1 1 X abc 123
100 N 1 1 null xyz 123
100 N 1 1 null def 234
What I mean is that, as per my code, for the same mnt number I am seeing flag set to N instead of Y, and rank and dense_rank give the same number for all 3 mnt rows: they generate 1 instead of 1, 2, 3 (for the rank function I understand why, but dense_rank should not do that).
I tried to convey the issue as clearly as I could; please let me know if there are any clarifications I can provide.
Any help appreciated, thanks.
SELECT * EXCEPT(ct) FROM (
  SELECT *, COUNT(*) OVER(PARTITION BY mnt) AS ct
) WHERE ct = 1 OR zel = 'X'
This is the code snippet for the problem you mentioned. Use it in your code according to your logic.
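If it helps to see that logic in isolation, here is a small pandas sketch (my own illustration, not part of the answer) that applies the same filter to the sample output rows above: multi-row mnt groups keep only the zel = 'X' row, while single-row groups pass through unchanged:
import pandas as pd

# Sample rows from the output shown in the question (rel aliased as zel)
df = pd.DataFrame({
    "mnt": [100, 100, 100],
    "zel": ["X", None, None],
    "lokez": ["abc", "xyz", "def"],
})

# Equivalent of COUNT(*) OVER (PARTITION BY mnt)
df["ct"] = df.groupby("mnt")["mnt"].transform("size")

# Keep single-row groups as-is; for larger groups keep only zel = 'X'
filtered = df[(df["ct"] == 1) | (df["zel"] == "X")].drop(columns="ct")
print(filtered)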

Parameterized SQL query in a loop not updating correctly

I have an SQL query running in a loop. There are two values, FINGER and index_str, that both need to be updated in parallel.
FINGER: (numpy array)
[['1012_8']
['10214_5']
['10409_9']
index_str: (pandas dataframe)
0 14,38,51,65,84,85
1 3,34,58,65,66,75
2 3,15,68,70,80,82
Above are the first 3 examples. There are over 1000 of each in reality.
for i in range(len(FINGER)):
    print i
    print FINGER[i]
    for x in index_str[i]:
        yy = FINGER[i][0]
        #print range(len(FINGER))
        index_str = str(x)
        query = "SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '" + yy + "' and ind IN (" + index_str + ") order by ind "
        print query
        c.execute(query)
        rows = c.fetchall()
        print rows
Above is the loop and query in question.
So far the loop runs through all values of index_str for only the first FINGER value. To elaborate, the query updates for the first 3 examples as follows.
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (14,38,51,65,84,85) order by ind
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (3,34,58,65,66,75) order by ind
SELECT finger, ind, x,y, CAST( (direction*180/3.142)as INT),CAST(quality*100 as INT) from UNIL_fingerprints where finger = '1012_8' and ind IN (3,15,68,70,80,82) order by ind
Whereas '1012_8' should be '10214_5' and '10409_9' respectively in the 2nd and 3rd query above.
Any ideas on how to get this to update properly would be helpful.
You want zip():
for finger, indexes in zip(FINGER, index_str):
    print("finger: {} - indexes: {}".format(finger, indexes))
Also, you REALLY want to learn and use the DB-API properly (well, unless you don't mind being hacked, that is).
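For example, the loop could be rewritten with zip() and bound parameters rather than string concatenation. This is only a sketch under a couple of assumptions: the cursor c comes from a qmark-style driver (e.g. sqlite3 or pyodbc; use %s placeholders for MySQLdb/psycopg2), and iterating index_str yields the comma-separated strings (true for a pandas Series; for a single-column DataFrame iterate the column instead):
for finger_row, idx in zip(FINGER, index_str):
    yy = finger_row[0]                            # e.g. '1012_8'
    ids = [int(v) for v in str(idx).split(",")]   # e.g. [14, 38, 51, 65, 84, 85]
    placeholders = ",".join(["?"] * len(ids))     # one bound parameter per id
    query = ("SELECT finger, ind, x, y, "
             "CAST((direction*180/3.142) AS INT), CAST(quality*100 AS INT) "
             "FROM UNIL_fingerprints "
             "WHERE finger = ? AND ind IN (" + placeholders + ") ORDER BY ind")
    c.execute(query, [yy] + ids)
    rows = c.fetchall()
    print(rows)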

Raw query must include the primary key

I got a raw SQL statement in my views.py
Message.objects.raw('''
SELECT s1.ID, s1.CHARACTER_ID, MAX(s1.MESSAGE) MESSAGE, MAX(s1.c) occurrences
FROM
(SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s1
LEFT JOIN
(SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s2
ON s1.CHARACTER_ID=s2.CHARACTER_ID
AND s1.c < s2.c
WHERE s2.c IS NULL
GROUP BY CHARACTER_ID
ORDER BY occurrences DESC''', [days, days])
The result of this SQL statement (tested on database directly) is:
ID | CHARACTER_ID | MESSAGE | OCCURENCES
----+--------------+---------+--------------
148 | 10 | test | 133
But all I got is an InvalidQuery exception with the message "Raw query must include the primary key".
Then I double-checked the docs and read:
There is only one field that you can’t leave out - the primary key
field....An InvalidQuery exception will be raised if you forget to include the primary key.
As you can see I got the requested primary key added in my statement. What's wrong?
class Message(models.Model):
    character = models.ForeignKey('Character')
    message = models.TextField()
    location = models.ForeignKey('Location')
    ts = models.DateTimeField()

    class Meta:
        pass

    def __unicode__(self):
        return u'%s: %s...' % (self.character, self.message[0:20])
Include 1 as id in your query:
Message.objects.raw('''
SELECT 1 as id , s1.ID, s1.CHARACTER_ID, MAX(s1.MESSAGE) MESSAGE, MAX(s1.c) occurrences
FROM
(SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s1
LEFT JOIN
(SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s2
ON s1.CHARACTER_ID=s2.CHARACTER_ID
AND s1.c < s2.c
WHERE s2.c IS NULL
GROUP BY CHARACTER_ID
ORDER BY occurrences DESC''', [days, days])
I reproduced the same problem using Python 2.7.5, Django 1.5.1 and Mysql 5.5.
I've saved the result of the raw call to the results variable, so I can check what columns it contains:
>>> results.columns
['ID', 'CHARACTER_ID', 'MESSAGE', 'occurrences']
ID is in uppercase, so in your query I changed s1.ID to s1.id and it works:
>>> results = Message.objects.raw('''
... SELECT s1.id, s1.CHARACTER_ID, MAX(s1.MESSAGE) MESSAGE, MAX(s1.c) occurrences
... FROM
... (SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
... FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s1
... LEFT JOIN
... (SELECT ID, CHARACTER_ID, MESSAGE, COUNT(*) c
... FROM tbl_message WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY) GROUP BY CHARACTER_ID,MESSAGE) s2
... ON s1.CHARACTER_ID=s2.CHARACTER_ID
... AND s1.c < s2.c
... WHERE s2.c IS NULL
... GROUP BY CHARACTER_ID
... ORDER BY occurrences DESC''', [days, days])
>>> results.columns
['id', 'CHARACTER_ID', 'MESSAGE', 'occurrences']
>>> results[0]
<Message_Deferred_character_id_location_id_message_ts: Character object: hello...>
Make sure the primary key is part of the SELECT statement.
Example:
This will not work:
`Model.objects.raw("Select Min(id), rider_id from Table_Name group by rider_id")`
But this will work:
`Model.objects.raw("Select id, Min(id), rider_id from Table_Name group by rider_id")`
For those also stuck with this problem, perhaps like me, wondering why Django needs a pk when you don't have a pk for the query (e.g. you want multiple rows): Django just needs an id field returned; the pk does not need to be part of a WHERE clause. That is:
select * from table where foo = 'bar';
or
select id, description from table where foo = 'bar';
Both of these work, if there is a field id in the table. But this throws the error described by Thomas Schwärzl, because no id field is returned:
select description from table where foo = 'bar';
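In other words, as long as the SELECT list returns an id column that Django can map to the primary key, the raw queryset behaves like any other. A minimal sketch (my own, not from the answers above) using the Message model and tbl_message table from the question; the days filter simply mirrors the original query:
rows = Message.objects.raw(
    "SELECT id, character_id, message FROM tbl_message "
    "WHERE ts > DATE_SUB(NOW(), INTERVAL %s DAY)",
    [days],
)
for msg in rows:
    # msg is a normal Message instance; fields not in the SELECT are loaded lazily
    print(msg.pk, msg.message[:20])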

Select the record of a given id with the highest timestamp using DAO

How do I do the following using DAO on a recordset?
SELECT TOP 1 * FROM foo WHERE id = 10 ORDER BY timestamp DESC
Using SetCurrentIndex you can only use one index, it seems; otherwise, using id and timestamp and selecting the first record would work.
I am by no means sure of what you want.
Dim rs As DAO.Recordset
Dim db As Database
Set db = CurrentDB
sSQL = "SELECT TOP 1 * FROM foo WHERE id = 10 ORDER BY timestamp DESC"
Set rs = db.OpenRecordset(sSQL)
Find does not work with all recordsets. This will work:
Set rs = CurrentDb.OpenRecordset("select * from table1")
rs.FindFirst "akey=1 and atext='b'"
If Not rs.EOF Then Debug.Print rs!AKey
This will not:
Set rs = CurrentDb.OpenRecordset("table1")
rs.FindFirst "akey=1 and atext='b'"