So I have written my own lookups for the PostgreSQL INET field in my DRF app, since Django does not natively support these PostgreSQL-specific fields.
I use my own IP field on my models to support these lookups.
All my lookups work fine, except for these two:
from django.db.models import Lookup

class HostGreaterOrEqual(Lookup):
    """Lookup to check if an IP address is greater than or equal. Used for net ranges."""
    lookup_name = "host_greater_or_equal"

    def as_sql(self, qn, connection):
        lhs, lhs_params = self.process_lhs(qn, connection)
        rhs, rhs_params = self.process_rhs(qn, connection)
        params = lhs_params + rhs_params
        return "%s >= %s" % (lhs, rhs), params

class HostLessOrEqual(Lookup):
    """Lookup to check if an IP address is less than or equal. Used for net ranges."""
    lookup_name = "host_less_or_equal"

    def as_sql(self, qn, connection):
        lhs, lhs_params = self.process_lhs(qn, connection)
        rhs, rhs_params = self.process_rhs(qn, connection)
        params = lhs_params + rhs_params
        return "%s <= %s" % (lhs, rhs), params
In my models I have address and address_end fields to store IP ranges. I wanted to use the lookups above to build queries like this:
items = queryset.filter(
    Q(address__host_greater_or_equal=value)
    & Q(address_end__host_less_or_equal=value)
)
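For reference, the lookups are registered on my custom field in the usual way. A minimal sketch (InetAddressField is a stand-in for my actual field class):
from django.db import models

class InetAddressField(models.Field):
    """Stand-in custom field mapped to the PostgreSQL inet type."""

    def db_type(self, connection):
        return "inet"

# Make the custom lookups available on the field.
InetAddressField.register_lookup(HostGreaterOrEqual)
InetAddressField.register_lookup(HostLessOrEqual)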
In the documentation for PostgreSQL version 9 I can see the operators for "is less or equal" and "is greater or equal".
But if I check the documentation for version 15, it seems like those have been removed.
Was that functionality simply removed or is there something similar that I have not found yet?
From the v13.0 release notes:
Reformat tables containing function and operator information for better clarity (Tom Lane)
So they aren't in their usual place in the documentation since then, but they're still available.
select inet '192.168.1.5' < inet '192.168.1.6' as "less than",
       inet '192.168.1.5' > inet '192.168.1.6' as "greater than",
       inet '192.168.1.5' <= inet '192.168.1.6' as "less than or equal",
       inet '192.168.1.5' >= inet '192.168.1.6' as "greater than or equal";
-- less than | greater than | less than or equal | greater than or equal
-------------+--------------+--------------------+-----------------------
-- t | f | t | f
show server_version;
-- server_version
-------------------------------------
-- 15rc2 (Debian 15~rc2-1.pgdg110+1)
Which you can test on different versions here: online demo.
You can always check the system catalogs to make sure what's really available:
select *
from pg_catalog.pg_operator
where oprname in ('<=','>=')
and oprcode::text ilike '%network%';
-- oid | oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode | oprrest | oprjoin
--------+---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+------------+-------------+-----------------
-- 1204 | <= | 11 | 10 | b | f | f | 869 | 869 | 16 | 1206 | 1205 | network_le | scalarlesel | scalarlejoinsel
-- 1206 | >= | 11 | 10 | b | f | f | 869 | 869 | 16 | 1204 | 1203 | network_ge | scalargesel | scalargejoinsel
Or look them up in psql with \do+
\do+ <= inet inet
-- List of operators
-- Schema | Name | Left arg type | Right arg type | Result type | Function | Description
--------------+------+---------------+----------------+-------------+------------+--------------------
-- pg_catalog | <= | inet | inet | boolean | network_le | less than or equal
\do+ >= inet inet
-- List of operators
-- Schema | Name | Left arg type | Right arg type | Result type | Function | Description
--------------+------+---------------+----------------+-------------+------------+-----------------------
-- pg_catalog | >= | inet | inet | boolean | network_ge | greater than or equal
Running psql with -E shows that \do+ finds these using the following query, which you can issue from any other client:
SELECT n.nspname as "Schema",
o.oprname AS "Name",
CASE WHEN o.oprkind='l' THEN NULL ELSE pg_catalog.format_type(o.oprleft, NULL) END AS "Left arg type",
CASE WHEN o.oprkind='r' THEN NULL ELSE pg_catalog.format_type(o.oprright, NULL) END AS "Right arg type",
pg_catalog.format_type(o.oprresult, NULL) AS "Result type",
o.oprcode AS "Function",
coalesce(pg_catalog.obj_description(o.oid, 'pg_operator'),
pg_catalog.obj_description(o.oprcode, 'pg_proc')) AS "Description"
FROM pg_catalog.pg_operator o
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
LEFT JOIN pg_catalog.pg_type t0 ON t0.oid = o.oprleft
LEFT JOIN pg_catalog.pg_namespace nt0 ON nt0.oid = t0.typnamespace
LEFT JOIN pg_catalog.pg_type t1 ON t1.oid = o.oprright
LEFT JOIN pg_catalog.pg_namespace nt1 ON nt1.oid = t1.typnamespace
WHERE o.oprname OPERATOR(pg_catalog.~) '^(>=)$' COLLATE pg_catalog.default
AND pg_catalog.pg_operator_is_visible(o.oid)
AND (t0.typname OPERATOR(pg_catalog.~) '^(inet)$' COLLATE pg_catalog.default
OR pg_catalog.format_type(t0.oid, NULL) OPERATOR(pg_catalog.~) '^(inet)$' COLLATE pg_catalog.default)
AND pg_catalog.pg_type_is_visible(t0.oid)
AND (t1.typname OPERATOR(pg_catalog.~) '^(inet)$' COLLATE pg_catalog.default
OR pg_catalog.format_type(t1.oid, NULL) OPERATOR(pg_catalog.~) '^(inet)$' COLLATE pg_catalog.default)
AND pg_catalog.pg_type_is_visible(t1.oid)
ORDER BY 1, 2, 3, 4;
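You can also confirm the operators from application code. A minimal sketch using psycopg2 (the connection string is a placeholder, adjust for your environment):
import psycopg2

# Placeholder connection string.
conn = psycopg2.connect("dbname=test user=postgres")
with conn, conn.cursor() as cur:
    # The inet comparison operators still work on recent server versions.
    cur.execute("SELECT inet '192.168.1.5' <= inet '192.168.1.6'")
    print(cur.fetchone()[0])  # True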
Related
I have an import query (table a) and an imported Excel file (table b) with records I am trying to match up.
I am looking for a method to replicate this type of SQL in M:
SELECT a.loc_id, a.other_data, b.stk
FROM a INNER JOIN b on a.loc_id BETWEEN b.from_loc AND b.to_loc
Table A
| loc_id | other data |
-------------------------
| 34A032B1 | ... |
| 34A3Z011 | ... |
| 3DD23A41 | ... |
Table B
| stk | from_loc | to_loc |
--------------------------------
| STKA01 | 34A01 | 34A30ZZZ |
| STKA02 | 34A31 | 34A50ZZZ |
| ... | ... | ... |
Goal
| loc_id | other data | stk |
----------------------------------
| 34A032B1 | ... | STKA01 |
| 34A3Z011 | ... | STKA02 |
| 3DD23A41 | ... | STKD01 |
All of the other queries I can find along these lines use numbers, dates, or times in the BETWEEN clause, and seem to work by exploding the (from, to) range into all possible values and then filtering out the extra rows. However, I need to use string comparisons, and exploding those into all possible values would be unfeasible.
Among all the various solutions I could find, the closest I've come is adding a custom column on table a:
Table.SelectRows(
    table_b,
    (a) => Value.Compare([loc_id], table_b[from_loc]) = 1
        and Value.Compare([loc_id], table_b[to_loc]) = -1
)
This does return all the columns from table_b, however, when expanding the column, the values are all null.
This is not very specific ("after 34A01 could be any string...") for figuring out how your series progresses.
But maybe you can just test for how a value "sorts" using the native sorting in Power Query.
Add a custom column with Table.SelectRows:
= try Table.SelectRows(TableB, (t)=> t[from_loc]<=[loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null
To reproduce with your examples:
let
TableB=Table.FromColumns(
{{"STKA01","STKA02"},
{"34A01","34A31"},
{"34A30ZZZ","34A50ZZZ"}},
type table[stk=text,from_loc=text,to_loc=text]),
TableA=Table.FromColumns(
{{"34A032B1","34A3Z011","3DD23A41"},
{"...","...","..."}},
type table[loc_id=text, other data=text]),
//determine where it sorts and return the stk
#"Added Custom" = Table.AddColumn(#"TableA", "stk", each
try Table.SelectRows(TableB, (t)=> t[from_loc]<=[loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null)
in
#"Added Custom"
Note: if the above algorithm is too slow, there may be faster methods of obtaining these results
I am trying to use Django's StrIndex to find all rows whose value is a substring of a given string.
E.g., my table contains:
+----------+------------------+
| user | domain |
+----------+------------------+
| spam1 | spam.com |
| badguy+ | |
| | protonmail.com |
| spammer | |
| | spamdomain.co.uk |
+----------+------------------+
but the query
SpamWord.objects.annotate(idx=StrIndex(models.Value('xxxx'), 'user')).filter(models.Q(idx__gt=0) | models.Q(domain='spamdomain.co.uk')).first()
matches <SpamWord: *#protonmail.com>
The generated query is:
SELECT `spamwords`.`id`, `spamwords`.`user`, `spamwords`.`domain`, INSTR('xxxx', `spamwords`.`user`) AS `idx` FROM `spamwords` WHERE (INSTR('xxxx', `spamwords`.`user`) > 0 OR `spamwords`.`domain` = 'spamdomain.co.uk')
It should be <SpamWord: *#spamdomain.co.uk>
This is happening because
INSTR('xxxx', '') => 1
(and also INSTR('xxxxasd', 'xxxx') => 1, which is correct).
How can I write this query in order to get entry #5 (spamdomain.co.uk)?
The order of the parameters of StrIndex [Django-doc] is swapped: the first parameter is the haystack, the string in which you search, and the second one is the needle, the substring you are looking for.
You thus can annotate with:
from django.db.models import Q, Value
from django.db.models.functions import StrIndex

SpamWord.objects.annotate(
    idx=StrIndex('user', Value('xxxx'))
).filter(
    Q(idx__gt=0) | Q(domain='spamdomain.co.uk')
).first()
Additionally, filter out rows where user is empty:
(~models.Q(user='') & models.Q(idx__gt=0)) | models.Q(domain='spamdomain.co.uk')
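Putting both corrections together, a sketch of the full queryset:
from django.db.models import Q, Value
from django.db.models.functions import StrIndex

SpamWord.objects.annotate(
    idx=StrIndex('user', Value('xxxx')),
).filter(
    # Swapped argument order, plus the guard against empty user values.
    (~Q(user='') & Q(idx__gt=0)) | Q(domain='spamdomain.co.uk')
).first()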
Not sure why I'm having a difficult time with this; it seems so simple considering it's fairly easy to do in R or pandas. I wanted to avoid using pandas, though, since I'm dealing with a lot of data, and I believe toPandas() loads all the data into the driver's memory in PySpark.
I have 2 dataframes: df1 and df2. I want to filter df1 (remove all rows) where df1.userid = df2.userid AND df1.group = df2.group. I wasn't sure if I should use filter(), join(), or sql. For example:
df1:
+------+----------+--------------------+
|userid| group | all_picks |
+------+----------+--------------------+
| 348| 2|[225, 2235, 2225] |
| 567| 1|[1110, 1150] |
| 595| 1|[1150, 1150, 1150] |
| 580| 2|[2240, 2225] |
| 448| 1|[1130] |
+------+----------+--------------------+
df2:
+------+----------+---------+
|userid| group | pick |
+------+----------+---------+
| 348| 2| 2270|
| 595| 1| 2125|
+------+----------+---------+
Result I want:
+------+----------+--------------------+
|userid| group | all_picks |
+------+----------+--------------------+
| 567| 1|[1110, 1150] |
| 580| 2|[2240, 2225] |
| 448| 1|[1130] |
+------+----------+--------------------+
EDIT:
I've tried many join() and filter() functions, I believe the closest I got was:
cond = [df1.userid == df2.userid, df2.group == df2.group]
df1.join(df2, cond, 'left_outer').select(df1.userid, df1.group, df1.all_picks) # Result has 7 rows
I tried a bunch of different join types, and I also tried different
cond values:
cond = ((df1.userid == df2.userid) & (df2.group == df2.group)) # result has 7 rows
cond = ((df1.userid != df2.userid) & (df2.group != df2.group)) # result has 2 rows
However, it seems like the joins are adding additional rows, rather than deleting.
I'm using python 2.7 and spark 2.1.0
Left anti join is what you're looking for:
df1.join(df2, ["userid", "group"], "leftanti")
but the same thing can be done with a left outer join:
(df1
    .join(df2, ["userid", "group"], "leftouter")
    .where(df2["pick"].isNull())
    .drop(df2["pick"]))
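To verify against the example data, here is a minimal, self-contained sketch (the SparkSession setup is assumed):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame(
    [(348, 2, [225, 2235, 2225]),
     (567, 1, [1110, 1150]),
     (595, 1, [1150, 1150, 1150]),
     (580, 2, [2240, 2225]),
     (448, 1, [1130])],
    ["userid", "group", "all_picks"])

df2 = spark.createDataFrame(
    [(348, 2, 2270), (595, 1, 2125)],
    ["userid", "group", "pick"])

# Keep only df1 rows whose (userid, group) pair has no match in df2.
df1.join(df2, ["userid", "group"], "leftanti").show()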
I have this table (refuels_flow).
With a SQL query I can get aggregated information per car: the total "amount", the number of "checked" and "unchecked" rows, and a flag indicating that checking is finished for all of a car's rows:
SELECT
    car_id
    , SUM(amount) AS total_amount
    , SUM(IF(checked=1,1,0)) AS already_checked
    , SUM(IF(checked=0,1,0)) AS not_checked
    , IF(SUM(IF(checked=0,1,0))=0,1,0) AS check_finished
FROM
    refuels_flow
GROUP BY car_id
Result:
+--------+--------------+-----------------+-------------+----------------+
| car_id | total_amount | already_checked | not_checked | check_finished |
+--------+--------------+-----------------+-------------+----------------+
|      1 |         1300 |               1 |          12 |              0 |
|      2 |          300 |               3 |           0 |              1 |
+--------+--------------+-----------------+-------------+----------------+
The question is: how can I do this with the Django ORM (without using a raw query)?
To obtain the same SQL output, you may use the following queryset:
from django.db.models import Func, Sum

already_checked = Sum(Func('checked', function='IF', template='%(function)s(%(expressions)s=0, 0, 1)'))
not_checked = Sum(Func('checked', function='IF', template='%(function)s(%(expressions)s=0, 1, 0)'))
check_finished = Func(
    not_checked,
    function='IF', template='%(function)s(%(expressions)s=0, 1, 0)'
)

Refuels.objects.values('car_id').annotate(
    total_amount=Sum('amount'),
    already_checked=already_checked,
    not_checked=not_checked,
    check_finished=check_finished
)
Check the doc on query expressions for more information.
Now, already_checked could be simplified with:
already_checked = Sum('checked')
And instead of having the not_checked and check_finished annotations, you could annotate the count and easily compute them in Python, for example:
from django.db.models import Count, Sum

qs = Refuels.objects.values('car_id').annotate(
    count_for_car=Count('car_id'),
    total_amount=Sum('amount'),
    already_checked=Sum('checked'),
)

for entry in qs:
    not_checked = entry['count_for_car'] - entry['already_checked']
    check_finished = not_checked == 0
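As a side note, the database-specific IF can be avoided entirely with Django's Case/When conditional expressions. A sketch of the equivalent annotations, assuming checked is stored as 0/1:
from django.db.models import Case, IntegerField, Sum, When

Refuels.objects.values('car_id').annotate(
    total_amount=Sum('amount'),
    # Count rows with checked=1 and checked=0 respectively.
    already_checked=Sum(Case(When(checked=1, then=1), default=0,
                             output_field=IntegerField())),
    not_checked=Sum(Case(When(checked=0, then=1), default=0,
                         output_field=IntegerField())),
)
check_finished can then be derived in Python as above (not_checked == 0).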
I have a DataFrame inside of a function:
using DataFrames
myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
myservs
5x2 DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
How can I write the function to take a single parameter called server, case-insensitively match it against the myservs[:serverName] DataArray, and return the matching row's corresponding ipAddress?
In R this can be done by using
myservs$ipAddress[grep("server", myservs$serverName, ignore.case = T)]
I don't want it to matter if someone uses ElMo or Elmo as the server, or if the serverName is saved as elmo or ELMO.
I looked up how to accomplish the task in R and tried to do it using the DataFrames pkg, but only because I'm coming from R and am just learning Julia. I asked my coworkers a lot of questions, and the following is what we came up with:
This task is much cleaner if I stop thinking in terms of
vectors as in R. Julia runs plenty fast iterating through a loop.
Even still, looping wouldn't be the best solution here. I was told to look into
Dicts (check here for an example). Dict(), zip(), haskey(), and
get() blew my mind. These have many applications.
My solution doesn't even need to use the DataFrames pkg, but instead
uses Julia's Matrix and Array data representations. By using let
we keep the global environment clutter free and the server name/ip
list stays hidden from view to those who are only running the
function.
In the sample code, I'm recreating the server matrix every time, but in reality/practice I'll have a permission restricted delimited file that gets read every time. This is OK for now since the delimited files are small, but this may not be efficient or the best way to do it.
# ONLY ALLOW THE FUNCTION TO BE SEEN IN THE GLOBAL ENVIRONMENT
let global myIP
    # SERVER MATRIX
    myservers = ["elmo" "12.345.6.7"; "bigBird" "12.345.6.8";
                 "Oscar" "12.345.6.9"; "gRover" "12.345.6.10";
                 "BERT" "12.345.6.11"]
    # SERVER DICT
    servDict = Dict(zip(pmap(lowercase, myservers[:, 1]), myservers[:, 2]))
    # GET SERVER IP FUNCTION: INPUT = SERVER NAME; OUTPUT = IP ADDRESS
    function myIP(servername)
        sn = lowercase(servername)
        get(servDict, sn, "That name isn't in the server list.")
    end
end
# Test it out
myIP("SLIMEY")
#>"That name isn't in the server list."
myIP("elMo")
#>"12.345.6.7"
Here's one way:
julia> using DataFrames
julia> myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
5x2 DataFrames.DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
julia> grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "") = Bool[isna(d) ? false : ismatch(Regex(pat, opts), d) for d in dat]
grep (generic function with 2 methods)
julia> myservs[:ipAddress][grep("bigbird", myservs[:serverName], "i")]
1-element DataArrays.DataArray{ASCIIString,1}:
"12.345.6.8"
EDIT
This grep works faster on my platform.
julia> function grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "")
           myreg = Regex(pat, opts)
           return convert(Array{Bool}, map(d -> isna(d) ? false : ismatch(myreg, d), dat))
       end