Airflow bigquery_to_gcs operator changing field_delimiter - google-cloud-platform

I am trying to use the Airflow operator BigQueryToGCSOperator and force field_delimiter to be a pipe (|), but the output file always comes out comma (,) delimited.
I have also tried the BigQueryToCloudStorageOperator operator, which behaves the same way.
Any idea what I am doing wrong here?
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="csv",
    field_delimiter="|",
)
Thanks in advance for your reply.

Normally, if you set export_format to CSV (uppercase rather than lowercase) together with field_delimiter, it should work:
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="CSV",
    field_delimiter="|",
)
I saw this snippet in the Airflow source code, and I think it causes the issue when export_format is set to the lowercase value csv:
if self.export_format == 'CSV':
    # Only set fieldDelimiter and printHeader fields if using CSV.
    # Google does not like it if you set these fields for other export
    # formats.
    configuration['extract']['fieldDelimiter'] = self.field_delimiter
    configuration['extract']['printHeader'] = self.print_header
In your case the comparison against 'CSV' is false, so this block is skipped: your field_delimiter is never added to the extract configuration, and the export falls back to the default delimiter, which is a comma.
Here you can see the default values used in the constructor of this operator in the Airflow code:
def __init__(
    self,
    *,
    source_project_dataset_table: str,
    destination_cloud_storage_uris: List[str],
    compression: str = 'NONE',
    export_format: str = 'CSV',
    field_delimiter: str = ',',
    print_header: bool = True,
    gcp_conn_id: str = 'google_cloud_default',
    bigquery_conn_id: Optional[str] = None,
    delegate_to: Optional[str] = None,
    labels: Optional[Dict] = None,
    location: Optional[str] = None,
    impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
    **kwargs,
)
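The effect of that case-sensitive check can be reproduced outside Airflow. The sketch below is a simplified stand-in for the operator's logic, not its actual code: build_extract_config is a hypothetical helper, and only the comparison against 'CSV' and the two configuration keys are taken from the snippet quoted above.

```python
def build_extract_config(export_format, field_delimiter=",", print_header=True):
    """Hypothetical stand-in that mimics the check quoted above."""
    configuration = {"extract": {}}
    # Same case-sensitive comparison as in the quoted snippet.
    if export_format == "CSV":
        configuration["extract"]["fieldDelimiter"] = field_delimiter
        configuration["extract"]["printHeader"] = print_header
    return configuration


lower = build_extract_config("csv", field_delimiter="|")
upper = build_extract_config("CSV", field_delimiter="|")

print("fieldDelimiter" in lower["extract"])  # False: the pipe is silently dropped
print(upper["extract"]["fieldDelimiter"])    # |
```

Passing export_format="CSV" directly, or a variable normalised with .upper(), avoids the silent fallback.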

Problem when updating a table using celery task: OperationalError

EDIT 2022-10-04 18:40
I've tried using bulk_update and bulk_create, as these methods only query the database once, but I still have the same issue.
I would appreciate any help or explanation on this issue.
'''
Task to edit data correction forms (DCF) online
'''
@shared_task(bind=True)
def DCF_edition(self):
print(timezone.now())
DCF_BEFORE_UPDATE = pd.DataFrame.from_records(DataCorrectionForm.objects.all().values())
if not DCF_BEFORE_UPDATE.empty :
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.rename(columns={"patient": "pat"})
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.astype({'record_date': str,'created_date': str})
DCF_BEFORE_UPDATE['dcf_status'] = DCF_BEFORE_UPDATE.apply(lambda status: 0, axis=1)
# list of dataframe to concat
data = []
# load queries definition
queries = queries_definitions()
# print(queries)
if not queries.empty:
for index, row in queries.iterrows():
print('Query ide',row['ide'])
# print(row['ide'],row['query_type'],row['crf_name'].lower(),row['crf identification date'],row['variable_name'],row['variable_label'],row['query_condition'],row['fields_to_display'])
try:
missing_or_inconsistent = missing_or_inconsistent_data(row['ide'],row['query_type'],row['crf_name'].lower(),row['crf identification date'],row['variable_name'],row['variable_label'],row['query_condition'],row['fields_to_display']) #.iloc[:10] #to limit rows
missing_or_inconsistent.columns.values[2] = 'record_date' # rename the date column (that have database name)
missing_or_inconsistent['dcf_ide'] = str(row['ide']) + '_' + row['variable_name'] + '_' + missing_or_inconsistent[row['crf primary key']].astype(str)
missing_or_inconsistent['category'] = row['query_type']
missing_or_inconsistent['crf'] = row['crf_name']
missing_or_inconsistent['crf_ide'] = missing_or_inconsistent[row['crf primary key']]
missing_or_inconsistent['field_name'] = row['variable_name']
missing_or_inconsistent['field_label'] = row['variable_label']
missing_or_inconsistent['field_value'] = missing_or_inconsistent[row['variable_name']]
missing_or_inconsistent['message'] = row['query_message']
missing_or_inconsistent['query_id'] = 'Query ide ' + str(row['ide'])
missing_or_inconsistent['dcf_status'] = 1
missing_or_inconsistent['created_date'] = timezone.now()
missing_or_inconsistent['deactivated'] = False
missing_or_inconsistent['comments'] = None
data.append(missing_or_inconsistent[['ide','dcf_ide','category','crf','crf_ide','pat','record_date','field_name','field_label','message','field_value','dcf_status','created_date','query_id','deactivated','comments']])
dcf = pd.concat(data)
except Exception as e:
Log.objects.create(dcf_edition_status=0,dcf_edition_exception=str(e)[:200])
continue
DCF_AFTER_UPDATE = pd.concat([DCF_BEFORE_UPDATE,dcf])
DCF_AFTER_UPDATE['duplicate'] = DCF_AFTER_UPDATE.duplicated(subset=['dcf_ide'],keep='last')
DCF_AFTER_UPDATE['dcf_status'] = DCF_AFTER_UPDATE.apply(lambda row: 2 if row['duplicate'] else row['dcf_status'],axis=1)
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.drop_duplicates(subset=['dcf_ide'],keep='first').drop(columns=['duplicate'])
DCF_AFTER_UPDATE.rename(columns = {'pat':'patient',}, inplace = True)
# Cast date into string format to be able to dumps data
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.astype({'record_date': str}) if not DCF_AFTER_UPDATE.empty else DCF_AFTER_UPDATE
records_to_update = [
DataCorrectionForm(
ide=record['ide'],
dcf_ide=record['dcf_ide'],
category=record['category'],
crf=record['crf'],
crf_ide=record['crf_ide'],
patient=record['patient'],
record_date=record['record_date'],
field_name=record['field_name'],
field_label=record['field_label'],
message=record['message'],
field_value=record['field_value'],
dcf_status=record['dcf_status'],
created_date=record['created_date'],
query_id=record['query_id'],
deactivated=record['deactivated'],
comments=record['comments']
) for i, record in DCF_AFTER_UPDATE[(DCF_AFTER_UPDATE['dcf_status'] != 1)].iterrows()
]
if records_to_update:
DataCorrectionForm.objects.bulk_update(records_to_update,['dcf_status'])
records_to_create = [
DataCorrectionForm(
dcf_ide=record['dcf_ide'],
category=record['category'],
crf=record['crf'],
crf_ide=record['crf_ide'],
patient=record['patient'],
record_date=record['record_date'],
field_name=record['field_name'],
field_label=record['field_label'],
message=record['message'],
field_value=record['field_value'],
dcf_status=record['dcf_status'],
created_date=record['created_date'],
query_id=record['query_id'],
deactivated=record['deactivated'],
comments=record['comments']
) for i, record in DCF_AFTER_UPDATE[(DCF_AFTER_UPDATE['dcf_status'] == 1)].iterrows()
]
if records_to_create:
DataCorrectionForm.objects.bulk_create(records_to_create)
EDIT 2022-10-04 13:40
I've tried to "optimize" the code using the update_or_create() method, but it doesn't change anything.
I still get an OperationalError on the line DataCorrectionForm.objects.update_or_create(...).
How can I update my database?
'''
Task to edit data correction forms (DCF) online
'''
@shared_task(bind=True)
def DCF_edition(self):
DCF_BEFORE_UPDATE = pd.DataFrame.from_records(DataCorrectionForm.objects.all().values())
if not DCF_BEFORE_UPDATE.empty :
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.drop(columns=['ide'])  # drop() returns a new DataFrame
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.rename(columns={"patient": "pat"})
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.astype({'record_date': str,'created_date': str})
DCF_BEFORE_UPDATE['dcf_status'] = DCF_BEFORE_UPDATE.apply(lambda status: 0, axis=1)
# list of dataframe to concat
data = []
# load queries definition
queries = queries_definitions()
if not queries.empty:
for index, row in queries.iterrows():
try:
missing_or_inconsistent = missing_or_inconsistent_data(row['ide'],row['query_type'],row['crf_name'].lower(),row['crf identification date'],row['variable_name'],row['variable_label'],row['query_condition'],row['fields_to_display']) #.iloc[:10] #to limit rows
missing_or_inconsistent.columns.values[2] = 'record_date' # rename the date column (that have database name)
missing_or_inconsistent['dcf_ide'] = str(row['ide']) + '_' + row['variable_name'] + '_' + missing_or_inconsistent[row['crf primary key']].astype(str)
missing_or_inconsistent['category'] = row['query_type']
missing_or_inconsistent['crf'] = row['crf_name']
missing_or_inconsistent['crf_ide'] = missing_or_inconsistent[row['crf primary key']]
missing_or_inconsistent['field_name'] = row['variable_name']
missing_or_inconsistent['field_label'] = row['variable_label']
missing_or_inconsistent['field_value'] = missing_or_inconsistent[row['variable_name']]
missing_or_inconsistent['message'] = row['query_message']
missing_or_inconsistent['DEF'] = 'Query ide ' + str(row['ide'])
missing_or_inconsistent['dcf_status'] = 1
missing_or_inconsistent['created_date'] = timezone.now()
missing_or_inconsistent['deactivated'] = False
missing_or_inconsistent['comments'] = None
data.append(missing_or_inconsistent[['dcf_ide','category','crf','crf_ide','pat','record_date','field_name','field_label','message','field_value','dcf_status','created_date','DEF','deactivated','comments']])
dcf = pd.concat(data)
except Exception as e:
Log.objects.create(dcf_edition_status=0,dcf_edition_exception=str(e)[:200])
continue
DCF_AFTER_UPDATE = pd.concat([DCF_BEFORE_UPDATE,dcf])
DCF_AFTER_UPDATE['duplicate'] = DCF_AFTER_UPDATE.duplicated(subset=['dcf_ide'],keep='last')
DCF_AFTER_UPDATE['dcf_status'] = DCF_AFTER_UPDATE.apply(lambda row: 2 if row['duplicate'] else row['dcf_status'],axis=1)
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.drop_duplicates(subset=['dcf_ide'],keep='first').drop(columns=['duplicate'])
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.drop(['DEF'], axis=1)
DCF_AFTER_UPDATE.rename(columns = {'pat':'patient',}, inplace = True)
# Cast date into string format to be able to dumps data
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.astype({'record_date': str}) if not DCF_AFTER_UPDATE.empty else DCF_AFTER_UPDATE
records = DCF_AFTER_UPDATE.to_dict(orient='records')
for record in records:
DataCorrectionForm.objects.update_or_create(
dcf_ide=record['dcf_ide'], # lookup used to find an existing object => must not also be passed in defaults (otherwise IntegrityError)
defaults = {
'category':record['category'],
'crf':record['crf'],
'crf_ide':record['crf_ide'],
'patient':record['patient'],
'record_date':record['record_date'],
'field_name':record['field_name'],
'field_label':record['field_label'],
'message':record['message'],
'field_value':record['field_value'],
'dcf_status':record['dcf_status'],
'created_date':record['created_date'],
# 'DEF':record['DEF'],
'deactivated':record['deactivated'],
'comments':record['comments']
}
)
Log.objects.create(dcf_edition_status=1)
return True
EDIT 2022-10-03 17:00
In fact, reading the CAVEATS section of the documentation:
The development server creates a new thread for each request it
handles, negating the effect of persistent connections. Don’t enable
them during development.
EDIT 2022-10-03 16:00
Django 2.2.5
I have tried to set the DATABASES parameter CONN_MAX_AGE as per the Django documentation, but it doesn't change anything:
Default: 0
The lifetime of a database connection, as an integer of seconds. Use 0
to close database connections at the end of each request — Django’s
historical behavior — and None for unlimited persistent connections.
I use a Celery task and get an error I do not understand.
I loop over a table (that contains query definitions) to edit missing/inconsistent data in a database (using an API) and register the discrepancies in another table.
If I run the queries one at a time, it works, but when I try to loop over the queries, I get this error:
OperationalError('server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n')
def DCF_edition(self):
DCF_BEFORE_UPDATE = pd.DataFrame.from_records(DataCorrectionForm.objects.all().values())
DCF_BEFORE_UPDATE = DCF_BEFORE_UPDATE.astype({'record_date': str,'created_date': str}) if not DCF_BEFORE_UPDATE.empty else DCF_BEFORE_UPDATE
data = []
# load queries definition
queries = queries_definitions()
if not queries.empty:
for index, row in queries.iterrows():
try:
missing_or_inconsistent = missing_or_inconsistent_data(row['ide'],row['query_type'],row['crf_name'].lower(),row['crf identification
data.append(missing_or_inconsistent[['dcf_ide','category','crf','crf_ide','pat','record_date','field_name','field_label','message','field_value','dcf_status','DEF','deactivated']])
DCF_AFTER_UPDATE = pd.concat(data)
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.drop_duplicates(keep='last')
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.drop(['DEF'], axis=1)
DCF_AFTER_UPDATE.rename(columns = {'pat':'patient',}, inplace = True)
except Exception as e:
Log.objects.create(dcf_edition_status=0,dcf_edition_exception=str(e)[:200])
continue
# Cast date into string format to be able to dumps data
DCF_AFTER_UPDATE = DCF_AFTER_UPDATE.astype({'record_date': str}) if not DCF_AFTER_UPDATE.empty else DCF_AFTER_UPDATE
records = json.loads(json.dumps(list(DCF_AFTER_UPDATE.T.to_dict().values())))
for record in records:
if not DCF_BEFORE_UPDATE.empty and record['dcf_ide'] in DCF_BEFORE_UPDATE.values:
DataCorrectionForm.objects.filter(dcf_ide=record['dcf_ide']).update(dcf_status=2)
else:
DataCorrectionForm.objects.get_or_create(**record)
# resolved dcf => status=0
if not DCF_BEFORE_UPDATE.empty:
records = json.loads(json.dumps(list(DCF_BEFORE_UPDATE.T.to_dict().values())))
for record in records:
if record['dcf_ide'] not in DCF_AFTER_UPDATE.values:
DataCorrectionForm.objects.filter(dcf_ide=record['dcf_ide']).update(dcf_status=0)
Log.objects.create(dcf_edition_status=1)
return True
The lifetime of a database connection, as an integer of seconds. Use 0 to close database connections at the end of each request — Django’s historical behavior — and None for unlimited persistent connections.
It seems that your task is long-running and needs to hold the database connection for a long period. Did you try setting it to None?
DATABASES = {
    'default': env.db(),
}
# https://docs.djangoproject.com/en/3.1/ref/settings/#conn-max-age
DATABASES['default']['CONN_MAX_AGE'] = None
How long does your task need to finish? It could also be a problem with the database server settings, e.g. tcp_keepalives_idle.
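If idle timeouts on the server or the network are the cause, one thing to try is enabling TCP keepalives on the client side as well. This is only a sketch, assuming a PostgreSQL backend accessed through psycopg2 (the error text in the question looks like a libpq message). The connection values and timings are placeholders, and nothing in the question confirms that this resolves the error.

```python
# settings.py fragment (illustrative values only)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",        # placeholder
        "USER": "myuser",      # placeholder
        "HOST": "localhost",   # placeholder
        "CONN_MAX_AGE": None,
        # OPTIONS are forwarded to the database driver;
        # these four are libpq keepalive connection parameters.
        "OPTIONS": {
            "keepalives": 1,            # enable client-side TCP keepalives
            "keepalives_idle": 60,      # seconds of inactivity before the first probe
            "keepalives_interval": 10,  # seconds between probes
            "keepalives_count": 5,      # failed probes before the connection is dropped
        },
    }
}
```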

Generate AWS `Logs Insights` URL with query and search criteria

I'd like to generate a URL for the AWS CloudWatch Logs Insights page where I can customize the following parameters:
The query string
The log groups I'd like to search within
The time range
I couldn't find any official documentation for the structure of the URL.
There also doesn't seem to be an API (at least not in boto3) that would help me with this.
Here is an example:
https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logsV2:logs-insights$3FqueryDetail$3D$257E$2528end$257E$25272021-07-18T20*3a59*3a59.000Z$257Estart$257E$25272021-07-15T21*3a00*3a00.000Z$257EtimeType$257E$2527ABSOLUTE$257Etz$257E$2527Local$257EeditorString$257E$2527fields*20*40timestamp*2c*20*40message*0a*7c*20sort*20*40timestamp*20desc*0a*7c*20filter*20*40message*20*3d*7e*20*22Exception*22*0a*7c*20limit*20200$257EisLiveTail$257Efalse$257EqueryId$257E$2527####$257Esource$257E$2528$257E$2527*2faws*2flambda*2f######$2529$2529$26tab$3Dlogs
What encoding is used to generate the URL above?
I'm thinking about simply replacing the strings above with the desired params. Is there a better way to achieve this?
Replacing the strings in the URL isn't that straightforward. I wrote the following in Python, which generates a link to the Logs Insights page:
def generate_cloudwatch_url(log_groups, query):
    def escape(s):
        # Keep letters, digits, "-" and "."; write every other character as "*" + two hex digits.
        # Building a new string avoids re-escaping the "*" produced by earlier replacements.
        return "".join(
            c if c.isalpha() or c.isdigit() or c in ["-", "."] else "*{0:02x}".format(ord(c))
            for c in s
        )

    def gen_log_insights_url(params):
        S1 = "$257E"  # "~"
        S2 = "$2528"  # "("
        S3 = "$2527"  # "'"
        S4 = "$2529"  # ")"
        res = f"{S1}{S2}"
        for k in params:
            value = params[k]
            if isinstance(value, str):
                value = escape(value)
            elif isinstance(value, list):
                for i in range(len(value)):
                    value[i] = escape(value[i])
            # Every key except the first is preceded by a separator.
            prefix = S1 if list(params.items())[0][0] != k else ""
            suffix = f"{S1}{S3}"
            if isinstance(value, list):
                value = "".join([f"{S1}{S3}{n}" for n in value])
                suffix = f"{S1}{S2}"
            elif isinstance(value, int) or isinstance(value, bool):
                value = str(value).lower()
                suffix = S1
            res += f"{prefix}{k}{suffix}{value}"
        res += f"{S4}{S4}"
        QUERY = f"logsV2:logs-insights$3Ftab$3Dlogs$26queryDetail$3D{res}"
        return f"https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#{QUERY}"

    query_vars = {"sample": "value"}
    query = "\n".join([f'| filter {k}="{v}"' for (k, v) in query_vars.items()])
    params = {
        "end": 0,  # "2021-07-18T20:00:00.000Z",
        "start": -60 * 60 * 24,  # "2021-07-15T21:00:00.000Z",
        "unit": "seconds",
        "timeType": "RELATIVE",  # "ABSOLUTE", # OR RELATIVE and end = 0 and start is negative seconds
        "tz": "Local",  # OR "UTC"
        "editorString": f"fields @timestamp, @message\n| sort @timestamp desc\n{query}\n| limit 200",
        "isLiveTail": False,
        "source": [f"/aws/lambda/{lg}" for lg in log_groups],
    }
    return gen_log_insights_url(params)
I know it's an awful implementation, but it works!
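As for the encoding itself, the example URL in the question appears to use three nested layers: special characters inside values are written as * plus two hex digits, the resulting queryDetail value is percent-encoded, and the whole fragment is percent-encoded once more with $ standing in for %. That reading is inferred from the example URL alone, not from AWS documentation. A minimal decoding sketch follows, where decode_insights_fragment is a hypothetical helper name and the sample is a shortened piece of the URL above.

```python
import re
from urllib.parse import unquote


def decode_insights_fragment(fragment):
    """Undo the three layers observed in the example URL (an inference, not an AWS spec)."""
    # Layer 1: "$" is used where "%" would normally appear.
    once = unquote(fragment.replace("$", "%"))
    # Layer 2: the queryDetail value was percent-encoded a second time.
    twice = unquote(once)
    # Layer 3: "*xx" is a single character written as two hex digits.
    return re.sub(r"\*([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), twice)


sample = (
    "logsV2:logs-insights$3FqueryDetail$3D$257E$2528end$257E$2527"
    "2021-07-18T20*3a59*3a59.000Z$2529"
)
print(decode_insights_fragment(sample))
# logsV2:logs-insights?queryDetail=~(end~'2021-07-18T20:59:59.000Z)
```

Decoding a URL copied from the console this way is a quick check on what a generator should produce for each parameter.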

How to override Django admin `change_list.html` to provide formatting on the go

In Django admin, if I want to display a list of Iron objects and their respective formatted weights, I have to do this:
class IronAdmin(admin.ModelAdmin):
    model = Iron
    fields = ('weight_formatted',)

    def weight_formatted(self, object):
        return '{0:.2f} Kg'.format(object.weight)
    weight_formatted.short_description = 'Weight'
E.g.: 500.00 Kg
The problem with this, however, is that I would have to write a method for every field that I want to format, which becomes redundant when I have 10 or more objects to format.
Is there a method that I could override to "catch" these values and specify formatting before they get rendered into the HTML? I.e. instead of having to write a method for each Admin class, I could just write the following and have it be formatted:
class IronAdmin(admin.ModelAdmin):
    model = Iron
    fields = ('weight__kg',)

    def overriden_method(field):
        if '__kg' in field.name:
            field.value = '{0:.2f} Kg'.format(field.value)
E.g.: 500.00 Kg
After hours of scouring the source, I finally figured it out! I realize this isn't the most efficient code, and it's probably more trouble than it's worth in most use cases, but it's enough for me. In case anyone else needs a quick and dirty way to do it:
In order to automate it, I had to override django.contrib.admin.templatetags.admin_list.result_list with the following:
def result_list_larz(cl):
    """
    Displays the headers and data list together
    """
    resultz = list(results(cl))  # Where we override
    # Overriding starts here
    # Have to scrub the __kg's as result_header(cl) will error out
    for k in cl.list_display:
        cl.list_display[cl.list_display.index(k)] = k.replace('__kg', '').replace('__c', '')
    headers = list(result_headers(cl))
    num_sorted_fields = 0
    for h in headers:
        if h['sortable'] and h['sorted']:
            num_sorted_fields += 1
    return {'cl': cl,
            'result_hidden_fields': list(result_hidden_fields(cl)),
            'result_headers': headers,
            'num_sorted_fields': num_sorted_fields,
            'results': resultz}
Then I overrode results(cl)'s call to items_for_result(), and within that its call to lookup_field(), as follows:
def lookup_field(name, obj, model_admin=None):
    opts = obj._meta
    try:
        f = _get_non_gfk_field(opts, name)
    except (FieldDoesNotExist, FieldIsAForeignKeyColumnName):
        # For non-field values, the value is either a method, property or
        # returned via a callable.
        if callable(name):
            attr = name
            value = attr(obj)
        elif (model_admin is not None and
                hasattr(model_admin, name) and
                not name == '__str__' and
                not name == '__unicode__'):
            attr = getattr(model_admin, name)
            value = attr(obj)
        # Formatting code here
        elif '__kg' in name or '__c' in name:  # THE INSERT FOR FORMATTING!
            actual_name = name.replace('__kg', '').replace('__c', '')
            value = getattr(obj, actual_name)
            value = '{0:,.2f}'.format(value)
            prefix = ''
            postfix = ''
            if '__kg' in name:
                postfix = ' Kg'
            elif '__c' in name:
                prefix = 'P'
            value = '{}{}{}'.format(prefix, value, postfix)
            attr = value
        else:
            attr = getattr(obj, name)
            if callable(attr):
                value = attr()
            else:
                value = attr
        f = None
        # Overriding code END
    else:
        attr = None
        value = getattr(obj, name)
    return f, attr, value
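For comparison, here is a less invasive way to cut the per-field boilerplate, which is not what the answer above does: list_display also accepts plain callables that take the model instance, so a small factory can generate them. The names formatted and the Iron stand-in class are hypothetical and only there so the sketch runs without Django.

```python
def formatted(field_name, template, description=None):
    """Build a display callable for one model field (hypothetical helper)."""
    def display(obj):
        return template.format(getattr(obj, field_name))
    # The admin reads short_description to label the column.
    display.short_description = description or field_name.replace("_", " ").title()
    return display


class Iron:
    """Stand-in for the Django model so the sketch runs on its own."""
    def __init__(self, weight):
        self.weight = weight


weight_kg = formatted("weight", "{0:,.2f} Kg", "Weight")
print(weight_kg(Iron(500)))         # 500.00 Kg
print(weight_kg.short_description)  # Weight
```

In an admin class this would be used as list_display = (formatted("weight", "{0:,.2f} Kg", "Weight"),), one line per formatted column.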

Q object parameters

from django.db.models import Q

MODULES_USERS_PERMS = {
    MODULE_METHOD: [],
    MODULE_NEWS: [],
    MODULE_PROJECT: ['created_by', 'leader'],
    MODULE_TASK: [],
    MODULE_TICKET: [],
    MODULE_TODO: []
}
filter_fields = MODULES_USERS_PERMS[MODULE_PROJECT]
perm_q = map(lambda x: Q(x=user), filter_fields)
if perm_q:  # sum(perm_q)
    if len(perm_q) == 1:
        return perm_q[0]
    elif len(perm_q) == 2:
        return perm_q[0] | perm_q[1]
    elif len(perm_q) == 3:
        return perm_q[0] | perm_q[1] | perm_q[2]
I don't know how to describe in words what the code needs to do, so I hope it speaks for itself.
I need to build a filter from the list of objects.
Needless to say, the code is not working.
UPDATE:
Code that looks better, but does not work either:
filters = ['created_by', 'leader']
filter_params = Q()
for filter_obj in filters:
    filter_params = filter_params | Q(filter_obj=user)
FieldError at /projects/
Cannot resolve keyword 'filter_obj' into field. Choices are:
begin_time, comment, created_at, created_by, created_by_id, end_time,
id, leader, leader_id, name, project_task, status, ticket_project
If you're looking to combine an unknown number of Q objects:
import operator
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin
perm_q = reduce(operator.or_, perm_q)
Or:
summed_q = perm_q[0]
for new_term in perm_q[1:]:
    summed_q = summed_q | new_term
Which does the same thing, just more explicitly.
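The folding step can be tried without Django. In this sketch sets stand in for Q objects, since both support the | operator. The initial value passed to reduce is an assumption added here so that an empty list does not raise; in the Django case an empty Q(), as used in the question's update, would play that role.

```python
import operator
from functools import reduce

# Sets stand in for Q objects here: both support the | operator.
terms = [{"created_by"}, {"leader"}, {"reviewer"}]

# Fold the list pairwise: ((set() | terms[0]) | terms[1]) | terms[2]
combined = reduce(operator.or_, terms, set())
print(sorted(combined))  # ['created_by', 'leader', 'reviewer']

# With an initial value, an empty list is handled too.
empty = reduce(operator.or_, [], set())
print(empty)  # set()
```

Note that on Python 3, map() returns an iterator, so the perm_q from the question would need to be wrapped in list() before it can be indexed or passed to len().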
Based on your edit - you need to turn the string contained in your filter_obj variable into a keyword argument. You can do this by creating a dictionary to use as the keyword arguments for the Q constructor:
filters = ['created_by', 'leader']
filter_params = Q()
for filter_obj in filters:
    kwargs = {filter_obj: user}
    filter_params = filter_params | Q(**kwargs)
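The difference between the failing and the working call is plain Python keyword-argument behaviour and can be seen without Django. In this sketch show_kwargs is a hypothetical stand-in for the Q constructor that simply returns the keyword arguments it received.

```python
def show_kwargs(**kwargs):
    """Hypothetical stand-in for Q(): returns the keyword arguments it received."""
    return kwargs


filter_obj = "created_by"
user = "alice"  # placeholder value

# The keyword is taken literally; the variable's value is never consulted.
literal = show_kwargs(filter_obj=user)
print(literal)   # {'filter_obj': 'alice'}

# Unpacking a dict uses the string stored in the variable as the keyword.
expanded = show_kwargs(**{filter_obj: user})
print(expanded)  # {'created_by': 'alice'}
```

This is why the FieldError in the question complains about a field named filter_obj rather than created_by.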

How to include "None" in lte/gte comparisons?

I've got this complex filtering mechanism...
d = copy(request.GET)
d.setdefault('sort_by', 'created')
d.setdefault('sort_dir', 'desc')
form = FilterShipmentForm(d)
filter = {
'status': ShipmentStatuses.ACTIVE
}
exclude = {}
if not request.user.is_staff:
filter['user__is_staff'] = False
if request.user.is_authenticated():
exclude['user__blocked_by__blocked'] = request.user
if form.is_valid():
d = form.cleaned_data
if d.get('pickup_city'): filter['pickup_address__city__icontains'] = d['pickup_city']
if d.get('dropoff_city'): filter['dropoff_address__city__icontains'] = d['dropoff_city']
if d.get('pickup_province'): filter['pickup_address__province__exact'] = d['pickup_province']
if d.get('dropoff_province'): filter['dropoff_address__province__exact'] = d['dropoff_province']
if d.get('pickup_country'): filter['pickup_address__country__exact'] = d['pickup_country']
if d.get('dropoff_country'): filter['dropoff_address__country__exact'] = d['dropoff_country']
if d.get('min_price'): filter['target_price__gte'] = d['min_price']
if d.get('max_price'): filter['target_price__lte'] = d['max_price']
if d.get('min_distance'): filter['distance__gte'] = d['min_distance'] * 1000
if d.get('max_distance'): filter['distance__lte'] = d['max_distance'] * 1000
if d.get('available_on'): # <--- RELEVANT BIT HERE ---
filter['pickup_earliest__lte'] = d['available_on'] # basically I want "lte OR none"
filter['pickup_latest__gte'] = d['available_on']
if d.get('shipper'): filter['user__username__iexact'] = d['shipper']
order = ife(d['sort_dir'] == 'desc', '-') + d['sort_by']
shipments = Shipment.objects.filter(**filter).exclude(**exclude).order_by(order) \
.annotate(num_bids=Count('bids'), min_bid=Min('bids__amount'), max_bid=Max('bids__amount'))
And now my client tells me he wants pickup/drop-off dates to be 'flexible' as an option. So I've updated the DB to allow dates to be NULL for this purpose, but now the "available for pickup on" filter won't work as expected. It should include NULL/None dates. Is there an easy fix for this?
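Flip the logic and use exclude(). What you really want to do is exclude any data that specifies a date that doesn't fit. If pickup_latest and pickup_earliest are NULL, the row shouldn't match the exclude query and won't be removed. E.g.:
exclude['pickup_latest__lt'] = d['available_on']
exclude['pickup_earliest__gt'] = d['available_on']
Keep in mind that lookups passed to a single exclude() call are ANDed together before being negated, so to drop rows that fail either condition, apply each one in its own exclude() call (or OR them with Q objects) rather than unpacking one shared dict.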
Most database engines don't like relational comparisons with NULL values. Use <field>__isnull to explicitly check if a value is NULL in the database, but you'll need to use Q objects to OR the conditions together.
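The NULL behaviour is easy to see at the SQL level with Python's built-in sqlite3 module. This is a sketch with made-up rows; only the column names come from the question, and the table name shipment is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipment (id INTEGER PRIMARY KEY, pickup_earliest TEXT, pickup_latest TEXT)"
)
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "2024-01-10"),  # fixed window that contains the date
        (2, "2024-02-01", "2024-02-10"),  # fixed window after the date
        (3, None, None),                  # fully flexible
        (4, None, "2024-01-03"),          # open start, but ends before the date
    ],
)
available_on = "2024-01-05"


def ids(sql):
    return [row[0] for row in conn.execute(sql, {"d": available_on})]


# Plain comparisons: any comparison with NULL is not true, so flexible rows vanish.
plain = ids(
    "SELECT id FROM shipment "
    "WHERE pickup_earliest <= :d AND pickup_latest >= :d ORDER BY id"
)

# Each comparison ORed with an explicit IS NULL check.
with_null = ids(
    "SELECT id FROM shipment "
    "WHERE (pickup_earliest <= :d OR pickup_earliest IS NULL) "
    "AND (pickup_latest >= :d OR pickup_latest IS NULL) ORDER BY id"
)

print(plain)      # [1]
print(with_null)  # [1, 3]
```

In Django terms the second query corresponds to filtering with Q(pickup_earliest__lte=d) | Q(pickup_earliest__isnull=True) and the matching pair for pickup_latest, passed as positional arguments to filter().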
I don't think that's actually a Django-specific question. The variable d is a Python dictionary, no? If so, you can use this:
filter['pickup_latest__gte'] = d.get('available_on', None)