What is the most efficient way to count all distinct values for a column that is an ArrayField?
Let's suppose I have a model named MyModel with a cities field, which is a postgres ArrayField.
# models.py
class MyModel(models.Model):
    ...
    cities = ArrayField(models.TextField(blank=True), blank=True, null=True, default=list)  # e.g. ['mumbai', 'london']
Let's suppose MyModel has the following 3 objects, with these values for the cities field:
1. ['london','newyork']
2. ['mumbai']
3. ['london','chennai','mumbai']
Doing a count on distinct values for the cities field groups on the entire list instead of on each element.
## Query
MyModel.objects.values('cities').annotate(Count('id')).order_by().filter(id__count__gt=0)
Instead, I would like to count distinct values for the cities field over each element of the list, which should give the following final output:
[{'london':2},{'newyork':1},{'chennai':1},{'mumbai':2}]
You can perform the GROUP BY operation at the database level itself, using PostgreSQL's unnest() function:
from django.db import connection

cursor = connection.cursor()
raw_query = """
select unnest(subquery_alias.cities) as distinct_cities, count(*) as cities_group_by_count
from (select cities from sample_mymodel) as subquery_alias
group by distinct_cities;
"""
cursor.execute(raw_query)
result = [{"city": row[0], "count": row[1]} for row in cursor]
print(result)
References
unnest() - postgres array function
Django: Executing custom SQL directly
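If you prefer to stay in the ORM, the same unnest-and-group-by can be sketched with an annotation. This is a common pattern rather than part of the answer above, so treat it as an untested sketch (it assumes PostgreSQL and the MyModel defined earlier):

from django.db.models import Count, F, Func

# Unnest each row's cities array into one row per element, then group by it.
counts = (
    MyModel.objects
    .annotate(city=Func(F('cities'), function='unnest'))
    .values('city')
    .annotate(count=Count('id'))
    .order_by()
)
print(list(counts))  # e.g. [{'city': 'london', 'count': 2}, ...]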
Doing it an inefficient way, in plain Python rather than in the database:
import itertools

unique_cities = list(MyModel.objects.values_list('cities', flat=True))
unique_cities_compiled = list(itertools.chain.from_iterable(unique_cities))
unique_cities_final = {i: unique_cities_compiled.count(i) for i in unique_cities_compiled}
print(unique_cities_final)
# {'london': 2, 'newyork': 1, 'chennai': 1, 'mumbai': 2}
If anyone can do this more efficiently, please drop an answer with an improved version of the solution.
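One more efficient Python-side option is a single pass with collections.Counter; a minimal sketch, assuming the MyModel above:

import itertools
from collections import Counter

# Counter tallies every element in one pass, instead of calling
# list.count() once per element (which is quadratic).
cities = MyModel.objects.values_list('cities', flat=True)
counts = Counter(itertools.chain.from_iterable(cities))
print(dict(counts))  # {'london': 2, 'newyork': 1, 'chennai': 1, 'mumbai': 2}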
I have 3 tables:
Truck with the fields: id, name....
Menu with the fields: id, itemname, id_foodtype, id_truck...
Foodtype with the fields: id, type...
I want to get a summary like:
id  name             total
10  Alcoholic drink      0
 5  Appetizer           11
My problem is to return the results with 0 elements.
I tried an SQL query like this:
SELECT ft.id, ft.name, COUNT(me.id) AS total
FROM foodtype ft
LEFT JOIN menu me ON ft.id = me.id_foodtype
LEFT JOIN truck tr ON tr.id = me.id_truck AND tr.id = 3
GROUP BY ft.id, ft.name
ORDER BY ft.name
or a query in Django
Menu.objects.filter(id_truck=3).values("id_foodtype").annotate(cnt=Count("id_foodtype"))
But neither displays the results with zero elements.
When I convert this query to Python code, none of my queries return the exact result that I expected.
How can I return results with the LEFT JOIN, including the foodtypes with zero elements in the menu?
The direction of the LEFT JOIN depends on the object where you start the query. If it starts on Menu, you will never see a FoodType unused by the selected Menu items. It is also important to filter (by Truck, in your case) in such a way that a null Menu.id is allowed, so that you can get Count == 0.
from django.db.models import Count, Q

qs = (
    FoodType.objects
    .filter(Q(menu_set__id_truck=3) | Q(menu_set__id__isnull=True))
    .values()  # not necessary, but useful if you want a dict, not a Model object
    .annotate(cnt=Count("menu_set__id"))
)
Verify:
>>> print(str(qs.query))
SELECT foodtype.id, foodtype..., COUNT(menu.id) AS cnt
FROM foodtype
LEFT OUTER JOIN menu ON (foodtype.id = menu.id_foodtype)
WHERE (menu.id_truck = 3 OR menu.id IS NULL)
GROUP BY foodtype.id
It works with both the newest and the oldest currently supported Django (2.0b1 and 1.8).
The query is the same with or without the .values() line. The results are dictionaries or FoodType objects, each with a cnt attribute.
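A usage sketch of both shapes (qs_without_values is a hypothetical name for the same queryset built without the .values() call):

# With .values(): each row is a dict.
for row in qs:
    print(row['id'], row['cnt'])

# Without .values(): each row is a FoodType instance with a cnt attribute.
for ft in qs_without_values:
    print(ft.id, ft.cnt)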
Footnotes:
The name menu_set should be replaced by the real related_name of the foreign key id_foodtype, if you have defined one:
class Menu(models.Model):
    id_foodtype = models.ForeignKey('FoodType', on_delete=models.DO_NOTHING,
                                    db_column='id_foodtype', related_name='menu_set')
    ...
If you start a new project, I recommend renaming the foreign key to a name without "id", while db_column keeps the "id". Then menu_item.foodtype is a FoodType object and menu_item.foodtype_id is its id, as sketched below.
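A sketch of that renaming, keeping the existing database column:

class Menu(models.Model):
    # Field name without the "id" prefix; db_column keeps the existing column.
    foodtype = models.ForeignKey('FoodType', on_delete=models.DO_NOTHING,
                                 db_column='id_foodtype',
                                 related_name='menu_set')
    ...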
I need to select entries which match a specific pattern, count them, and then pick the latest. How can I do this via the Django ORM?
Tried:
Entry.objects.filter(A=B).annotate(Count("something")).latest("date")
This counts only 1 item for each B. If I remove latest("date"), it counts correctly but gives only the count and nothing else. How do I perform this task correctly?
UPD: Actual code
def render_entries(request):
    ids = Entry.objects.values("entry_token").distinct()
    entries = [Entry.objects.filter(entry_token=x["entry_token"]).annotate(count=Count("id")).latest("date_time")
               for x in ids]
    return render(request, "entries_list.html", {'entries': entries})
Updated: probably not the most optimal, but this should work.
def render_entries(request):
    # get list of all entry_tokens with count
    tokens = Entry.objects.values('entry_token').annotate(count=Count('entry_token'))

    # create list of entries
    entries = []
    for token in tokens:
        entry = Entry.objects.filter(entry_token=token['entry_token']).latest('date_time')
        entry.count = token['count']
        entries.append(entry)

    return render(request, "entries_list.html", {'entries': entries})
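If you want to avoid issuing one query per token, the same result can be expressed in a single queryset with Subquery/OuterRef (Django 1.11+); a sketch, assuming the Entry model from the question:

from django.db.models import Count, OuterRef, Subquery

# id of the latest entry for each token
latest_per_token = (
    Entry.objects
    .filter(entry_token=OuterRef('entry_token'))
    .order_by('-date_time')
    .values('id')[:1]
)
# number of entries sharing the token
token_count = (
    Entry.objects
    .filter(entry_token=OuterRef('entry_token'))
    .values('entry_token')
    .annotate(c=Count('id'))
    .values('c')
)
entries = (
    Entry.objects
    .filter(id=Subquery(latest_per_token))
    .annotate(count=Subquery(token_count))
)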
I am new to SFDC. I have a report already created by a user, and I would like to use Python to dump the data of the report into a CSV/Excel file.
I see there are a couple of Python packages for that, but my code gives an error:
from simple_salesforce import Salesforce
sf = Salesforce(instance_url='https://cs1.salesforce.com', session_id='')
sf = Salesforce(password='xxxxxx', username='xxxxx', organizationId='xxxxx')
Can I have the basic steps for setting up the API, and some example code?
This worked for me:
import requests
import csv
from simple_salesforce import Salesforce
import pandas as pd

sf = Salesforce(username=your_username, password=your_password, security_token=your_token)
login_data = {'username': your_username, 'password': your_password_plus_your_token}
with requests.Session() as s:
    d = s.get("https://your_instance.salesforce.com/{}?export=1&enc=UTF-8&xf=csv".format(reportid),
              headers=sf.headers, cookies={'sid': sf.session_id})
d.content will contain a string of comma-separated values which you can read with the csv module.
I take the data into pandas from there, hence the function name and the pandas import. I removed the rest of the function, where it puts the data into a DataFrame, but if you're interested in how that's done, let me know.
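For anyone interested, a minimal sketch of that DataFrame step (assuming the response d from above; Salesforce may append footer rows to the CSV that you'll want to trim):

from io import StringIO
import pandas as pd

# The export endpoint returns CSV bytes; decode and parse them.
df = pd.read_csv(StringIO(d.content.decode('utf-8')))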
In case it is helpful, I wanted to write out the steps I used to answer this question now (Aug-2018), based on Obol's comment. For reference, I followed the README instructions at https://github.com/cghall/force-retrieve/blob/master/README.md for the salesforce_reporting package.
To connect to Salesforce:
from salesforce_reporting import Connection, ReportParser
sf = Connection(username='your_username', password='your_password', security_token='your_token')
Then, to get the report I wanted into a Pandas DataFrame:
import pandas as pd

report = sf.get_report(your_reports_id)
parser = ReportParser(report)
report = parser.records_dict()
report = pd.DataFrame(report)
If you were so inclined, you could also simplify the four lines above into one, like so:
report = pd.DataFrame(ReportParser(sf.get_report(your_reports_id)).records_dict())
One difference I ran into from the README is that sf.get_report('report_id', includeDetails=True) threw an error stating get_report() got an unexpected keyword argument 'includeDetails'. Simply removing it seemed to result in the code working fine.
report can now be exported via report.to_csv('report.csv', index=False), or manipulated directly.
EDIT: parser.records() changed to parser.records_dict(), as this allows the DataFrame to have the columns already listed, rather than indexing them numerically.
The code below is rather long and might be just for our use case, but the basic idea is the following:
Find out the date interval length and the additional filtering needed to never run into the "more than 2,000 rows" limit. In my case I could use a weekly date-range filter, but would need to apply some additional filters.
Then run it like this:
report_id = '00O4…'
sf = SalesforceReport(username, password, token, report_id)
it = sf.iterate_over_dates_and_filters(datetime.date(2020, 2, 1),
                                       'Invoice__c.InvoiceDate__c', 'Opportunity.CustomField__c',
                                       [('a', 'startsWith'), ('b', 'startsWith'), …])
for row in it:
    # do something with the dict
    ...
The iterator goes through every week since 2020-02-01 (if you need daily or monthly iteration you'd need to change the code, but the change should be minimal) and applies the filter CustomField__c startsWith 'a', then startsWith 'b', and so on. It acts as a generator, so you don't need to mess with the filter cycling yourself.
The iterator throws an Exception if a query returns more than 2,000 rows, just to be sure that the data is not incomplete.
One warning here: SF has a limit of max 500 queries per hour. Say you have one year of 52 weeks and 10 additional filters: you'd already run into that limit.
Here's the class (it relies on simple_salesforce):
import simple_salesforce
import json
import datetime
"""
helper class to iterate over salesforce report data
and manouvering around the 2000 max limit
"""
class SalesforceReport(simple_salesforce.Salesforce):
    def __init__(self, username, password, security_token, report_id):
        super(SalesforceReport, self).__init__(username=username, password=password,
                                               security_token=security_token)
        self.report_id = report_id
        self._fetch_describe()

    def _fetch_describe(self):
        url = f'{self.base_url}analytics/reports/{self.report_id}/describe'
        result = self._call_salesforce('GET', url)
        self.filters = dict(result.json()['reportMetadata'])

    def apply_report_filter(self, column, operator, value, replace=True):
        """
        Adds/replaces a filter, for example:
        apply_report_filter('Opportunity.InsertionId__c', 'startsWith', 'hbob').
        For date filters use apply_standard_date_filter.

        column: needs to correspond to a column in your report, AND the report
            needs to have this filter configured (so in the UI the filter
            can be applied)
        operator: equals, notEqual, lessThan, greaterThan, lessOrEqual,
            greaterOrEqual, contains, notContain, startsWith, includes;
            see https://sforce.co/2Tb5SrS for an up-to-date list
        value: value as a string
        replace: if set to True and there's already a restriction on column,
            that restriction is replaced; otherwise the filter is added on top
        """
        filters = self.filters['reportFilters']
        if replace:
            filters = [f for f in filters if not f['column'] == column]
        filters.append(dict(
            column=column,
            isRunPageEditable=True,
            operator=operator,
            value=value))
        self.filters['reportFilters'] = filters

    def apply_standard_date_filter(self, column, startDate, endDate):
        """
        Replaces the date filter. The date filter needs to be available as a
        filter in the UI already.
        Example: apply_standard_date_filter('Invoice__c.InvoiceDate__c', d_from, d_to)

        column: needs to correspond to a column in your report
        startDate, endDate: instances of datetime.date
        """
        self.filters['standardDateFilter'] = dict(
            column=column,
            durationValue='CUSTOM',
            startDate=startDate.strftime('%Y-%m-%d'),
            endDate=endDate.strftime('%Y-%m-%d')
        )

    def query_report(self):
        """
        Returns a generator which yields one report row as a dict at a time.
        """
        url = f'{self.base_url}analytics/reports/query'
        result = self._call_salesforce('POST', url, data=json.dumps(dict(reportMetadata=self.filters)))
        r = result.json()
        columns = r['reportMetadata']['detailColumns']
        if not r['allData']:
            raise Exception('got more than 2000 rows! Quitting as data would be incomplete')
        for row in r['factMap']['T!T']['rows']:
            values = []
            for c in row['dataCells']:
                t = type(c['value'])
                if t == str or t == type(None) or t == int:
                    values.append(c['value'])
                elif t == dict and 'amount' in c['value']:
                    values.append(c['value']['amount'])
                else:
                    print(f"don't know how to handle {c}")
                    values.append(c['value'])
            yield dict(zip(columns, values))

    def iterate_over_dates_and_filters(self, startDate, date_column, filter_column, filter_tuples):
        """
        Returns a generator which iterates over every week and applies each
        filter in turn for the given column.
        """
        date_runner = startDate
        while True:
            print(date_runner)
            self.apply_standard_date_filter(date_column, date_runner,
                                            date_runner + datetime.timedelta(days=6))
            for val, op in filter_tuples:
                print(val)
                self.apply_report_filter(filter_column, op, val)
                for row in self.query_report():
                    yield row
            date_runner += datetime.timedelta(days=7)
            if date_runner > datetime.date.today():
                break
For anyone just trying to download a report into a DataFrame, this is how you do it (I added some notes and links for clarification):
import pandas as pd
import csv
import requests
from io import StringIO
from simple_salesforce import Salesforce
# Input Salesforce credentials:
sf = Salesforce(
    username='johndoe@mail.com',
    password='<password>',
    security_token='<security_token>')  # See below for help with finding your token
# Basic report URL structure:
orgParams = 'https://<INSERT_YOUR_COMPANY_NAME_HERE>.my.salesforce.com/' # you can see this in your Salesforce URL
exportParams = '?isdtp=p1&export=1&enc=UTF-8&xf=csv'
# Downloading the report:
reportId = 'reportId' # You find this in the URL of the report in question between "Report/" and "/view"
reportUrl = orgParams + reportId + exportParams
reportReq = requests.get(reportUrl, headers=sf.headers, cookies={'sid': sf.session_id})
reportData = reportReq.content.decode('utf-8')
reportDf = pd.read_csv(StringIO(reportData))
You can get your token by following the instructions at the bottom of this page
views.py:
q3 = KEBReading.objects.filter(datetime_reading__month=a).filter(datetime_reading__year=selected_year).values("signed")
for item in q3:
    item["signed"] = "signed"
    print item["signed"]
q3.save()
How do I save a field into the database? I'm trying to save the field called "signed" with a value. If I do q3.save() it gives an error, as it is a queryset. I'm doing a query on the database and then, based on the result, I want to set a value on a field and save it.
prevdate = KEBReading.objects.filter(datetime_reading__lt=date)
I am getting all the rows from the database less than the current date, but I want only the latest record. If I enter 2012-06-03, when I query I want the date less than this date, i.e. the date just previous to it. Can somebody help?
q3 = KEBReading.objects.filter(datetime_reading__month=a,
                               datetime_reading__year=selected_year)
for item in q3:
    item.signed = True
    item.save()
q3 = KEBReading.objects.filter(...)
will return you a queryset of objects. Any instance of a Django model is an object, and all fields of the instance are attributes of that object. That means you must access them using dot (.) notation,
like:
item.signed = "signed"
If your object were a dictionary, or a class derived from dictionary, then you could use named indexing, like:
item["signed"] = "signed"
In your situation that usage is invalid (because your object's type is not dictionary-based).
You can either run an update query:
KEBReading.objects.filter(...).update(signed="signed")
or set the new value in a loop and then save each item:
for item in q3:
    item.signed = "signed"
    item.save()
In your situation, though, the update query is the better approach, since it executes fewer database calls.
Try using an update query:
If signed is a BooleanField:
q3 = KEBReading.objects.filter(datetime_reading__month=a).filter(datetime_reading__year=selected_year).update(signed=True)
If it is a CharField:
q3 = KEBReading.objects.filter(datetime_reading__month=a).filter(datetime_reading__year=selected_year).update(signed="True")
Update for the comments:
If you want to fetch records based on the month of datetime_reading, you can do so by providing the month as a number. For example, 2 for February:
q3 = KEBReading.objects.filter(datetime_reading__month=2).order_by('datetime_reading')
And if you want to fetch records with signed = True, you can do it with:
q3 = KEBReading.objects.filter(signed=True)
If you want to fetch only the records of the previous day, given a date, you can use:
prevdate = KEBReading.objects.filter(datetime_reading=(date - datetime.timedelta(days=1)))
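If what you actually want is the single latest reading strictly before a given date (rather than the record of exactly one day earlier), a sketch:

# Order the earlier readings newest-first and take the first one.
prevdate = (
    KEBReading.objects
    .filter(datetime_reading__lt=date)
    .order_by('-datetime_reading')
    .first()
)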