Datastore ListProperty query inconsistency - google-cloud-platform

I'm trying to run a query on a ListProperty field named "numericRange". There is a row that has value ["3","5"] for this field. I want to verify that value "4" belongs to this range.
If I run the next query on the GQL console, Datastore returns results (because the first value, "3", matches):
select * from example where numericRange<=4
If I run the next query, Datastore also returns results (because the second value, "5", matches):
select * from example where numericRange>=4
However, if I run the next query, Datastore doesn't return results:
select * from example where numericRange<=4 and numericRange>=4
Why does it work on the first and second queries, but not on the third query?
Thank you in advance.

Cloud Datastore flattens your list for the indexes, so each value gets its own index entry. Your query numericRange<=4 and numericRange>=4 therefore checks each entry on its own: (3<=4 and 3>=4), then (5<=4 and 5>=4). Neither is true. With flattened values in the index, your third query only returns results when the list contains a value of exactly 4.
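To make the flattened-index behaviour concrete, here is a minimal sketch in plain Python. It does not call Datastore. The helper `matches` is a hypothetical name, and the values are treated as numbers (an assumption, since the question shows them as strings). It only models the rule from the answer: every predicate must hold for the same single value.

```python
def matches(values, predicates):
    """Return True if any single value satisfies every predicate.

    Simplified model of a flattened list index: each list element is
    tested on its own, never in combination with its neighbours.
    """
    return any(all(p(v) for p in predicates) for v in values)


numeric_range = [3, 5]  # the stored list from the question, as numbers


def le_4(v):
    return v <= 4


def ge_4(v):
    return v >= 4


only_le = matches(numeric_range, [le_4])        # 3 satisfies <= 4
only_ge = matches(numeric_range, [ge_4])        # 5 satisfies >= 4
both = matches(numeric_range, [le_4, ge_4])     # no single value satisfies both
both_with_4 = matches([3, 4, 5], [le_4, ge_4])  # 4 satisfies both

print(only_le, only_ge, both, both_with_4)
```

Under this model, the combined filter only succeeds once the list itself contains 4.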

Related

Get latest 3 entries from DynamoDb

I have a dynamo-db table with following schema
{
"id": String [hash key]
"type": String [range key]
}
I have a usecase where I need to fetch last 3 rows for a given id when type is unknown.
Your items need a timestamp attribute. Without that they can’t be sorted or filtered by time. Once you have that, you can define a local secondary index with the id as partition key and the timestamp as the sort key. You can then get the top three items from the index.
Find more information about DynamoDb’s Local Secondary Index here.
1. Add a field to the schema to store the timestamp.
2. Use Query to fetch all the records for the given key.
3. Query always returns records sorted by range key, and you cannot choose a different sort attribute without changing the table's schema, so sort the records by timestamp in your code.
4. Take the top 3 records.
If you have a lot of records, use filter expressions to drop extra results. E.g. if you know that the latest records will never have a timestamp older than an hour (day, week or so), you could filter out older records.
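The client-side step (sort by timestamp, keep the newest three) could look like the sketch below. It works on plain dictionaries rather than a real DynamoDB response. The `timestamp` attribute, the helper name `latest_items`, and the sample items are all assumptions made for illustration.

```python
import heapq


def latest_items(items, n=3, key="timestamp"):
    """Return the n items with the largest timestamp, newest first."""
    return heapq.nlargest(n, items, key=lambda item: item[key])


# Hypothetical items, shaped like the result of a Query for one id.
items = [
    {"id": "a", "type": "t1", "timestamp": 100},
    {"id": "a", "type": "t2", "timestamp": 400},
    {"id": "a", "type": "t3", "timestamp": 200},
    {"id": "a", "type": "t4", "timestamp": 500},
    {"id": "a", "type": "t5", "timestamp": 300},
]

latest = latest_items(items)
latest_types = [item["type"] for item in latest]
print(latest_types)
```

`heapq.nlargest` avoids sorting the whole list when only a few items are needed, although a plain `sorted(..., reverse=True)[:3]` would work just as well for small result sets.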

Python Cx_Oracle; How Can I Execute a SQL Insert using a list as a parameter

I generate a list of ID numbers. I want to execute an insert statement that grabs all records from one table where the ID value is in my list and insert those records into another table.
Instead of running through multiple execute statements (as I know is possible), I found this cx_Oracle function, that supposedly can execute everything with a single statement and list parameter. (It also avoids the clunky formatting of the SQL statement before passing in the parameters) But I think I need to alter my list before passing it in as a parameter. Just not sure how.
I referenced this web page:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-executemany.html
ids = getIDs()
print(ids)
[('12345',),('24567',),('78945',),('65423',)]
sql = """insert into scheme.newtable
select id, data1, data2, data3
from scheme.oldtable
where id in (%s)"""
cursor.prepare(sql)
cursor.executemany(None, ids)
I expected the SQL statement to execute as follows:
Insert into scheme.newtable
select id, data1, data2, data3 from scheme.oldtable where id in ('12345','24567','78945','65423')
Instead I get the following error:
ORA-01036: illegal variable name/number
Edit:
I found this StackOverflow: How can I do a batch insert into an Oracle database using Python?
I updated my code to prepare the statement beforehand and updated the list items to tuples, and I'm still getting the same error.
You use executemany() for batch DML, e.g. when you want to insert a large number of values into a table as an efficient equivalent of running multiple insert statements. There are cx_Oracle examples discussed in https://blogs.oracle.com/opal/efficient-and-scalable-batch-statement-execution-in-python-cx_oracle
However what you are doing with
insert into scheme.newtable
select id, data1, data2, data3
from scheme.oldtable
where id in (%s)
is a different thing - you are trying to execute one INSERT statement using multiple values in an IN clause. You would use a normal execute() for this.
Since Oracle keeps bind data distinct from SQL, you can't pass multiple values to a single bind parameter: the data is treated as a single SQL entity, not a list of values. You could use the %s string substitution syntax you have, but this is open to SQL injection attacks.
There are various generic techniques that are common to Oracle language interfaces, see https://oracle.github.io/node-oracledb/doc/api.html#sqlwherein for solutions that you can rewrite to Python syntax.
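One of those generic techniques is to generate one bind placeholder per value and pass the values to a single execute() call. The sketch below only builds the SQL text and the flat parameter list, so it runs without a database. The helper name `build_in_clause_sql` is hypothetical, and the table and column names are copied from the question.

```python
def build_in_clause_sql(ids):
    """Build the INSERT ... SELECT with one positional bind per id."""
    if not ids:
        raise ValueError("ids must not be empty")
    # Produces ":1, :2, ..." so every value is still sent as bind data.
    placeholders = ", ".join(f":{i}" for i in range(1, len(ids) + 1))
    return (
        "insert into scheme.newtable "
        "select id, data1, data2, data3 "
        "from scheme.oldtable "
        f"where id in ({placeholders})"
    )


ids = [('12345',), ('24567',), ('78945',), ('65423',)]
flat_ids = [row[0] for row in ids]  # unwrap the 1-tuples from the question
sql = build_in_clause_sql(flat_ids)
print(sql)
# With an open cursor, this would be run once as: cursor.execute(sql, flat_ids)
```

Only the number of placeholders is interpolated into the statement. The id values never become part of the SQL text, which is what keeps this safe from injection.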
Use a temporary table to hold the ids (batch insert):
# ids is already a list of 1-tuples, which is what positional binds expect
cursor.executemany('insert into temp_table values (:1)', ids)
Then insert the selected rows into newtable:
sql = """insert into scheme.newtable
select o.id, o.data1, o.data2, o.data3
from scheme.oldtable o
inner join temp_table t on o.id = t.id"""
cursor.execute(sql)
# commit only after both statements; by default a global temporary table is emptied on commit
connection.commit()
The script to create the temporary table in Oracle:
CREATE GLOBAL TEMPORARY TABLE temp_table
(
ID number
);
commit
I hope this is useful.

DynamoDB QuerySpec {MaxResultSize + filter expression}

From the DynamoDB documentation
The Query operation allows you to limit the number of items that it
returns in the result. To do this, set the Limit parameter to the
maximum number of items that you want.
For example, suppose you Query a table, with a Limit value of 6, and
without a filter expression. The Query result will contain the first
six items from the table that match the key condition expression from
the request.
Now suppose you add a filter expression to the Query. In this case,
DynamoDB will apply the filter expression to the six items that were
returned, discarding those that do not match. The final Query result
will contain 6 items or fewer, depending on the number of items that
were filtered.
Looks like the following query should return (at least sometimes) 0 records.
In summary, I have a UserLogins table. A simplified version is:
1. UserId - HashKey
2. DeviceId - RangeKey
3. ActiveLogin - Boolean
4. TimeToLive - ...
Now, let's say UserId = X has 10,000 inactive logins in different DeviceIds and 1 active login.
However, when I run this query against my DynamoDB table:
QuerySpec{
hashKey: null,
rangeKeyCondition: null,
queryFilters: null,
nameMap: {"#0" -> "UserId"}, {"#1" -> "ActiveLogin"}
valueMap: {":0" -> "X"}, {":1" -> "true"}
exclusiveStartKey: null,
maxPageSize: null,
maxResultSize: 10,
req: {TableName: UserLogins,ConsistentRead: true,ReturnConsumedCapacity: TOTAL,FilterExpression: #1 = :1,KeyConditionExpression: #0 = :0,ExpressionAttributeNames: {#0=UserId, #1=ActiveLogin},ExpressionAttributeValues: {:0={S: X,}, :1={BOOL: true}}}
I always get 1 row. The 1 active login for UserId=X. And it's not happening just for 1 user, it's happening for multiple users in a similar situation.
Are my results contradicting the DynamoDB documentation?
It looks like a contradiction because maxResultSize=10 should mean that DynamoDB reads only the first 10 items (out of 10,001) and then applies the filter active=true to them (which might return 0 results). It seems very unlikely that the record with active=true happened to be among the first 10 records that DynamoDB read.
This is happening to hundreds of customers that are running similar queries. It works great, when according to the documentation it shouldn't be working.
I can't see any obvious problem with the Query. Are you sure about your premise that users have 10,000 items each?
Your keys are UserId and DeviceId. That seems to mean that if your user logs in with the same device, it would overwrite the existing item. Put another way, I think you are saying your users have 10,000 different devices each (unless the DeviceId rotates in some way).
In your shoes I would just remove the filter expression and print the results to the log to see what you're getting in your 10 results. Then remove the limit too and see what results you get with that.
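To reason about what to expect from that experiment, here is a pure-Python model of the documented behaviour: a single request reads up to Limit items and filters afterwards. It does not use the DynamoDB API, and the data is made up. The second helper models a client that keeps requesting pages until it has enough matches. Whether the client in the question behaves that way is an assumption to verify, not something the question confirms.

```python
def query_page(items, limit, predicate, start=0):
    """Simulate one request: read up to `limit` items, then filter them."""
    page = items[start:start + limit]
    matched = [item for item in page if predicate(item)]
    next_start = start + limit if start + limit < len(items) else None
    return matched, next_start


def query_all(items, limit, predicate, max_results):
    """Keep requesting pages until max_results matches or the data runs out."""
    results, start = [], 0
    while start is not None and len(results) < max_results:
        matched, start = query_page(items, limit, predicate, start)
        results.extend(matched)
    return results[:max_results]


def is_active(item):
    return item["ActiveLogin"]


# Hypothetical data: 10,000 inactive logins and one active login far from the start.
logins = [{"DeviceId": i, "ActiveLogin": False} for i in range(10_000)]
logins.insert(9_000, {"DeviceId": "active", "ActiveLogin": True})

first_page, _ = query_page(logins, 10, is_active)
paged = query_all(logins, 10, is_active, max_results=10)

print(len(first_page), len(paged))
```

In this model a single request returns nothing, while a paging client still finds the one active login. Comparing the raw first page with the final result, as suggested above, would show which case applies.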

How to query sharepoint search api where column value contains space

I am trying to run a SharePoint api query to match against a column with a specific value.
The column value contains a space, which results in the query not working as expected: it returns anything with 'value' in the column rather than just items where the column = 'value 2'.
My current URL looks like this, where $listId is a list GUID:
https://mysite.sharepoint.com/_api/search/query?querytext='(customColumn:value 2)+AND+(ListID:$listId)'&selectproperties='Name,Title,Description,Author,LastModifiedTime,Path'
What is the syntax for
(customColumn:value 2)
that allows me to return only results where customColumn = "value 2"?
Try to encode the endpoint as
_api/search/query?querytext='customColumn:value%202'
My test Sample:
/_api/search/query?querytext='Title:Developer%20-%20Wiki1 Author:Lee'
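If the URL is built in code, the encoding can be done with the standard library rather than by hand. This is a small sketch. The helper name `build_search_path` is hypothetical, and the colon is left unencoded only so that the output matches the form shown above.

```python
from urllib.parse import quote


def build_search_path(query_text):
    """Return the search endpoint path with the query text percent-encoded."""
    # safe=":" keeps the property separator readable; spaces become %20.
    encoded = quote(query_text, safe=":")
    return f"/_api/search/query?querytext='{encoded}'"


path = build_search_path("customColumn:value 2")
sample = build_search_path("Title:Developer - Wiki1")
print(path)
print(sample)
```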

How do I change the individual value to be summed during annotation in Django?

I am a front-end developer new to Django. There is a column (server_reach) in our Postgres DB which has the values 1 and 2 (1 = not reachable, 2 = reachable). I need to write a query which tells me whether at least one of the filtered rows is reachable.
I was initially told that the values of the column would be (0,1) based on which I wrote this:
ServerAgent.objects.values('server').filter(
app_uuid_url=app.uuid_url,
trash=False
).annotate(serverreach=Sum('server_reach'))
The logic is simple that I fetch all the filtered rows and annotate them with the sum of the server_reaches. If this is more than zero then at least one entry is non-zero.
But the issue is that the actual DB has values (1,2). And this logic will not work anymore. I want to subtract the server_reach of each row by '1' before summing. I have tried F expressions as below
ServerAgent.objects.values('server').filter(
app_uuid_url=app.uuid_url,
trash=False
).annotate(serverreach=Sum(F('server_reach')-1))
But it throws the following error. Please help me get this to work.
AttributeError: 'ExpressionNode' object has no attribute 'split'
Use Avg instead of Sum. If the average value is greater than 1, then at least one row contains a value of 2.
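The arithmetic behind this can be checked in plain Python, without a database. The helper name `any_reachable` is hypothetical. In the queryset from the question, the corresponding change would presumably be to annotate with Avg('server_reach') and compare the result with 1.

```python
from statistics import mean


def any_reachable(server_reach_values):
    """True if at least one value is 2, judged only from the average.

    Valid because every value is either 1 or 2: the average is exactly 1
    when all rows are 1, and strictly greater as soon as one row is 2.
    """
    return mean(server_reach_values) > 1


all_unreachable = any_reachable([1, 1, 1])  # average is exactly 1
one_reachable = any_reachable([1, 2, 1])    # average is above 1
all_reachable = any_reachable([2, 2])       # average is 2

print(all_unreachable, one_reachable, all_reachable)
```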