python select key in dict - python-2.7

I have a simple request to Solr, shown below:
response = requests.get("http://localhost:8080/solr/select?fl=id,title,description,keywords,author&q=domain:mydomain.com&rows=10&indent=on&wt=json&omitHeader=true")
result = response.json()
print type(result)
print result
print response.text
The above code gives me the following output:
the type
<type 'dict'>
the dict contents
{u'response': {u'start': 0, u'numFound': 1, u'docs': [{u'keywords': u'my keywords listed here', u'id': u'project.websiteindex.abe919664893e46eef21aacb1360948ae836a756faa28e2063a4a92ef4273630', u'description': u'my site description isted here.', u'title': u'my site title'}]}}
and below is the more readable version
{
"response":{"numFound":1,"start":0,"docs":[
{
"description":"my site description isted here.",
"title":"my site title",
"keywords":"my keywords listed here",
"id":"project.websiteindex.abe919664893e46eef21aacb1360948ae836a756faa28e2063a4a92ef4273630"}]
}}
executing:
print(result['response']['numFound'])
will give me the very useful output of
1
Very nice and all, but obviously I would like to get something like
print(result['response']['title'])
my site title
So my question: how do I get the values of title, description and keywords by key?
How do I select them using this format, or how do I get my desired values in a better dict format?
P.S.
I'm stuck with an old setup here, with Python 2.7 and Solr 3.6.

Since the values you are addressing are fields of an array element, you have to address the element first.
Try something like:
result['response']['docs'][0]['title']
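To make the indexing concrete, here is a minimal sketch that reads the first document and also walks every document in the list. It uses the sample response from the question as a literal dict, so no Solr call is made; the variable names first_title and titles are only illustrative.

```python
# Sample response copied from the question; in practice this would be
# the value returned by response.json().
result = {
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {
                "description": "my site description isted here.",
                "title": "my site title",
                "keywords": "my keywords listed here",
                "id": "project.websiteindex.abe919664893e46eef21aacb1360948ae836a756faa28e2063a4a92ef4273630",
            }
        ],
    }
}

# 'docs' is a list, so pick an element before asking for a field.
docs = result["response"]["docs"]
first_title = docs[0]["title"]

# Collect the title of every returned document; .get() returns None
# for a document that has no title field.
titles = [doc.get("title") for doc in docs]

print(first_title)
```

With rows=10 the list can hold up to ten documents, so looping over docs is usually safer than hard-coding index 0.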

Related

How to run a combination of query and filter in elasticsearch?

I am experimenting using elasticsearch in a dummy project in django. I am attempting to make a search page using django-elasticsearch-dsl. The user may provide a title, summary and a score to search for. The search should match all the information given by the user, but if the user does not provide any info about something, this should be skipped.
I am running the following code to search for all the values.
client = Elasticsearch()
s = Search().using(client).query("match", title=title_value)\
    .query("match", summary=summary_value)\
    .filter('range', score={'gt': scorefrom_value, 'lte': scoreto_value})
When I have a value for every field, the search works correctly. But if, for example, I do not provide a value for summary_value, I expect the search to carry on with the remaining values; instead it returns nothing.
Is there some value that the fields should have by default in case the user does not provide a value? Or how should I approach this?
UPDATE 1
I tried the following, but it returns the same results every time, no matter what input I give.
s = Search(using=client)
if title:
    s.query("match", title=title)
if summary:
    s.query("match", summary=summary)
response = s.execute()
UPDATE 2
I can print the query using to_dict().
If it is written like the following, then s is empty:
s = Search(using=client)
s.query("match", title=title)
If it is written like this:
s = Search(using=client).query("match", title=title)
then it works properly, but if I then add s.query("match", summary=summary) it does nothing.
You need to assign back into s:
if title:
    s = s.query("match", title=title)
if summary:
    s = s.query("match", summary=summary)
I can see in the Search example that django-elasticsearch-dsl lets you apply aggregations after a search so...
How about "staging" your search? I can think of the following:
# first, declare the Search object
s = Search(using=client, index="my-index")
# if parameter1 exists
if parameter1:
    # filter() returns a new Search object, so assign it back to s
    s = s.filter("term", field1=parameter1)
# if parameter2 exists
if parameter2:
    s = s.query("match", field=parameter2)
Do the same for all your parameters (with the needed method for each) so only the ones that exist will appear in your query. At the end just run
response = s.execute()
and everything should work as you want :D
I would recommend you to use the Python ES Client. It lets you manage multiple things related to your cluster: set mappings, health checks, do queries, etc.
In its method .search(), the body parameter is where you send your query as you normally would run it ({"query"...}). Check the Usage example.
Now, for your particular case, you can have a template of your query stored in a variable. You first start with, let's say, an "empty query" only with filter, just like:
query = {
    "query": {
        "bool": {
            "filter": [
            ]
        }
    }
}
From here, you now can build your query from the parameters you have.
This is:
# This looks a little messy, but it's useful ;)
# if parameter1 is not None or empty
# (change the if statement for your particular case)
if parameter1:
    query["query"]["bool"]["filter"].append({"term": {"field1": parameter1}})
Do the same for all your parameters (for strings, use "term", for ranges use "range" as usual) and send the query in the .search()'s body parameter and it should work as you want.
Hope this is helpful! :D
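The template approach can be sketched as one small function that adds only the clauses the user actually supplied. This is an illustration under assumptions, not code from either library: build_query is a hypothetical helper, the field names title, summary and score are taken from the question, and the resulting dict is meant to be passed as the body of a search call.

```python
def build_query(title=None, summary=None, score_from=None, score_to=None):
    """Build a bool query dict that contains only the supplied criteria."""
    must = []      # full-text clauses that affect scoring
    filters = []   # exact or range clauses that only include/exclude

    if title:
        must.append({"match": {"title": title}})
    if summary:
        must.append({"match": {"summary": summary}})

    # Add each bound of the range only if it was given.
    score_range = {}
    if score_from is not None:
        score_range["gt"] = score_from
    if score_to is not None:
        score_range["lte"] = score_to
    if score_range:
        filters.append({"range": {"score": score_range}})

    return {"query": {"bool": {"must": must, "filter": filters}}}


# Only the title was provided, so no summary or score clause is emitted.
query = build_query(title="django")
print(query)
```

Because skipped parameters never reach the query, an empty summary no longer forces a match against an empty string.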

Using Python to retrieve website table data after filtering a specific date

I am trying to build a python script to retrieve historic wind power data from this site
I have done something similar before. In that case the date and relevant parameters were entered explicitly in the URL.
In this case, as you can see at the link above, the date is selected from a calendar and is not displayed as part of the web address.
How can I use Python to select a specific date and type in an Id for the fields Settlement Date and NGC BM Unit Id respectively?
For example:
Settlement Date = 2017-08-01
NGC BM Unit Id = ANSUW-1
I don't have an MWE because I have no clue how to proceed. I was trying to reuse code from another script I'd used to get weather data:
from lxml import html
from lxml import etree
import urllib
def gettabledata():
    web = urllib.urlopen("https://www.bmreports.com/bmrs/?q=actgenration/actualgeneration")
    s = web.read()
    html = etree.HTML(s)
but in this case it's not that simple, since the filter parameters are not passed through the URL.
Thanks.
I think the below script will fetch you the desired response:
import requests
payload = {"flowid":"b1610","start_date":"2017-08-01","period":"*","bmu_id":"ANSUW-1"}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
page = requests.get("https://www.bmreports.com/bmrs/?", params=payload, headers=headers).text
print(page)
Yes, the advice I offered in the comment was genuinely awful. Shahin is right. What I would add is that you can get the result in json which is relatively easy to process. It has taken me this long to get to this point.
>>> import requests
>>> parameter={"flowid":"b1610","start_date":"2017-08-02","period":"*","bmu_id":"ANSUW-1"}
>>> arg = 'https://www.bmreports.com/bmrs/?q=tablegen&parameter=%s' % str(parameter).replace("'",'"').replace(' ','')
>>> r = requests.get(arg)
>>> r
<Response [200]>
The result in r is json which admittedly looks horrible. However, on inspection it proves to be a series of nested dictionaries. Eventually, if you burrow in you find that 'item' is a list of 48 dictionaries from which you can easily extract whatever you might want.
>>> r.json()['responseBody']['responseList']['item'][0]
{'quantity': '1.414', 'marketGenerationBMUId': 'T_ANSUW-1', 'timeSeriesID': 'ELX-EMFIP-AGOG-TS-14842', 'powerSystemResourceType': 'Generation', 'resolution': 'PT30M', 'documentRevNum': '1', 'bMUnitID': 'T_ANSUW-1', 'registeredResourceEICCode': '48W00000ANSUW-1E', 'businessType': 'Production', 'settlementPeriod': '48', 'curveType': 'Sequential fixed size block', 'marketGenerationUnitEICCode': '48W00000ANSUW-1E', 'activeFlag': 'Y', 'nGCBMUnitID': 'ANSUW-1', 'processType': 'Realised', 'documentID': 'ELX-EMFIP-AGOG-17134615', 'marketGenerationNGCBMUId': 'ANSUW-1', 'settlementDate': '2017-08-02', 'documentType': 'Actual generation'}
>>> r.json()['responseBody']['responseList']['item'][47]
{'quantity': '1.088', 'marketGenerationBMUId': 'T_ANSUW-1', 'timeSeriesID': 'ELX-EMFIP-AGOG-TS-172', 'powerSystemResourceType': 'Generation', 'resolution': 'PT30M', 'documentRevNum': '1', 'bMUnitID': 'T_ANSUW-1', 'registeredResourceEICCode': '48W00000ANSUW-1E', 'businessType': 'Production', 'settlementPeriod': '1', 'curveType': 'Sequential fixed size block', 'marketGenerationUnitEICCode': '48W00000ANSUW-1E', 'activeFlag': 'Y', 'nGCBMUnitID': 'ANSUW-1', 'processType': 'Realised', 'documentID': 'ELX-EMFIP-AGOG-17134615', 'marketGenerationNGCBMUId': 'ANSUW-1', 'settlementDate': '2017-08-02', 'documentType': 'Actual generation'}
You can assign the 'item' list to a variable called items and then go from there.
>>> items = r.json()['responseBody']['responseList']['item']
>>> items[0]['settlementPeriod']
'48'
>>> items[47]['quantity']
'1.088'
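Since the items arrive as strings and in descending period order, a small post-processing step helps. The sketch below uses two abbreviated records from the output above instead of a live request, and converts them to (period, quantity) pairs sorted by settlement period; the name readings is only illustrative.

```python
# Two abbreviated records copied from the output above; a real response
# would contain 48 of them, one per half-hour settlement period.
items = [
    {"settlementPeriod": "48", "quantity": "1.414", "nGCBMUnitID": "ANSUW-1"},
    {"settlementPeriod": "1", "quantity": "1.088", "nGCBMUnitID": "ANSUW-1"},
]

# Convert the string fields to numbers and sort by settlement period.
readings = sorted(
    (int(item["settlementPeriod"]), float(item["quantity"])) for item in items
)

for period, quantity in readings:
    print(period, quantity)
```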
Addendum: In case you don't know how I was able to get that url this is it. I used the Chrome browser. I right-clicked on any element and then on 'Inspect'. Then I clicked on the 'Network' tab in the right-hand pane, then on 'XHR'. Now I clicked on the 'View' button. As you see in the small screen view below I could just about see '?q=tablegen' in the table. I right-clicked and copied that into an editor for study.
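As a side note on building that URL: the string replacement used earlier to turn the dict into JSON text can also be written with the standard json module, which keeps working if a value ever contains a quote or a space. A minimal sketch that only builds the URL and sends nothing; the endpoint and parameter names are the ones used in this answer.

```python
import json

parameter = {
    "flowid": "b1610",
    "start_date": "2017-08-02",
    "period": "*",
    "bmu_id": "ANSUW-1",
}

# Compact separators give the same space-free JSON text as the
# str(...).replace(...) chain, with proper escaping of values.
encoded = json.dumps(parameter, separators=(",", ":"))
url = "https://www.bmreports.com/bmrs/?q=tablegen&parameter=%s" % encoded

print(url)
```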

django retrieve specific data from a dictionary database field

I have a table that contains values saved as a dictionary.
FIELD_NAME: extra_data
VALUE:
{"code": null, "user_id": "103713616419757182414", "access_token": "ya29.IwBloLKFALsddhsAAADlliOoDeE-PD_--yz1i_BZvujw8ixGPh4zH-teMNgkIA", "expires": 3599}
I need to retrieve only the user_id value from the field "extra_data", not the whole dictionary as in the query below.
event_list = Event.objects.filter(season_id=season_id, event_status_id=2).value('extra_data')
If you are storing a dictionary as text, you can convert it to a Python dictionary using eval, although I don't know why you'd want to, as it opens you up to all sorts of malicious code injection.
event_list = eval(Event.objects.filter(season_id=season_id, event_status_id=2).value('extra_data'))
user_id = event_list['user_id']
print user_id
Would give:
"103713616419757182414"
Edit:
On deeper inspection, that's not a Python dictionary. You could import a JSON library to parse it, or declare what null is, like so:
null = None
event_list = eval(Event.objects.filter(season_id=season_id, event_status_id=2).value('extra_data'))
user_id = event_list['user_id']
Either way, the idea of storing any structured data in a django textfield is fraught with danger that will come back to bite you. The best solution is to rethink your data structures.
This method worked for me. However, it only works with a JSON-compliant string.
import json
json_obj = json.loads(event_list)
dict1 = dict(json_obj)
print dict1['user_id']
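For comparison with the eval approach, here is a self-contained sketch of the json.loads route using the stored value from the question as a plain string. It assumes the field really holds JSON text; the variable names are illustrative, and no Django query is involved.

```python
import json

# The stored text from the question. Note the JSON literal null,
# which eval() would not understand without a workaround.
extra_data = (
    '{"code": null, "user_id": "103713616419757182414", '
    '"access_token": "ya29.IwBloLKFALsddhsAAADlliOoDeE-PD_--yz1i_BZvujw8ixGPh4zH-teMNgkIA", '
    '"expires": 3599}'
)

# json.loads parses the text without executing it and maps null to None.
data = json.loads(extra_data)
user_id = data["user_id"]

print(user_id)
```

json.loads already returns a dict for a JSON object, so the extra dict(...) call is optional.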

Django-haystack (xapian) autocomplete giving incomplete results

I have a django site running django-haystack with xapian as a back end. I got my autocomplete working, but it's giving back weird results. The results coming back from the searchqueryset are incomplete.
For example, I have the following data...
['test', 'test 1', 'test 2']
And if I type in 't', 'te', or 'tes' I get nothing back. However, if I type in 'test' I get back all of the results, as would be expected.
I have something looking like this...
results = SearchQuerySet().autocomplete(auto=q).values('auto')
And my search index looks like this...
class FacilityIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    created = DateTimeField(model_attr='created')
    auto = EdgeNgramField(model_attr='name')

    def get_model(self):
        return Facility

    def index_queryset(self):
        return self.get_model().objects.filter(created__lte=datetime.datetime.now())
Any tips are appreciated. Thanks.
A bit late, but you need to check the min ngram size that is being indexed. It is most likely 4 chars, so it won't match on anything with fewer chars than that. I am not a Xapian user though, so I don't know how to change this configuration option for that backend.
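To illustrate why a minimum n-gram size would produce exactly this behaviour, here is a toy sketch of edge n-gram generation. It is not haystack's or Xapian's implementation; edge_ngrams is a hypothetical helper, and the sizes 4 and 15 are assumptions chosen to match the guess above.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Return the prefixes of word with length between min_gram and max_gram."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


# With an assumed minimum of 4, the only indexed prefix of 'test' is 'test'.
indexed = set(edge_ngrams("test", 4, 15))

print("tes" in indexed)   # shorter than the minimum, so never indexed
print("test" in indexed)
```

If the minimum were lowered to 1 in this toy model, 't', 'te' and 'tes' would all be indexed and would match.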

how to write a query to get find value in a json field in django

I have a json field in my database which is like
jsonfield = {'username':'chingo','reputation':'5'}
How can I write a query to find out whether a username exists? Something like:
username = 'chingo'
query = User.objects.get(jsonfield['username']=username)
I know the above query is wrong, but I wanted to know if there is a way to access it.
If you are using the django-jsonfield package, then this is simple. Say you have a model like this:
from django.db import models
from jsonfield import JSONField

class User(models.Model):
    jsonfield = JSONField()
Then to search for records with a specific username, you can just do this:
User.objects.get(jsonfield__contains={'username':username})
Since Django 1.9, you have been able to use PostgreSQL's native JSONField. This makes searching JSON very simple. In your example, this query would work:
User.objects.get(jsonfield__username='chingo')
If you have an older version of Django, or you are using the Django JSONField library for compatibility with MySQL or something similar, you can still perform your query.
In the latter situation, jsonfield will be stored as a text field and mapped to a dict when brought into Django. In the database, your data will be stored like this
{"username":"chingo","reputation":"5"}
Therefore, you can simply search the text. Your query in this situation would be:
User.objects.get(jsonfield__contains='"username":"chingo"')
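The text fragment in that query can be generated rather than typed by hand. The sketch below is an illustration only: username_fragment is a hypothetical helper, and it assumes the field is stored in the compact form shown above, with no spaces after ':' or ','. If the stored text is formatted differently, the fragment will not match.

```python
import json


def username_fragment(username):
    """Return the compact JSON text for the username key/value pair."""
    # Serialise a one-key dict without spaces, then drop the braces.
    return json.dumps({"username": username}, separators=(",", ":"))[1:-1]


# Stored text as shown above.
stored = '{"username":"chingo","reputation":"5"}'
fragment = username_fragment("chingo")

print(fragment)
print(fragment in stored)
```

The fragment would then be passed as the value of jsonfield__contains. Using json.dumps also escapes quotes inside the username, which plain string formatting would not.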
2019: As @freethebees points out, it's now as simple as:
User.objects.get(jsonfield__username='chingo')
But as the doc examples mention you can query deeply, and if the json is an array you can use an integer to index it:
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
>>> Dog.objects.create(name='Rufus', data={
...     'breed': 'labrador',
...     'owner': {
...         'name': 'Bob',
...         'other_pets': [{
...             'name': 'Fishy',
...         }],
...     },
... })
>>> Dog.objects.create(name='Meg', data={'breed': 'collie', 'owner': None})
>>> Dog.objects.filter(data__breed='collie')
<QuerySet [<Dog: Meg>]>
>>> Dog.objects.filter(data__owner__name='Bob')
<QuerySet [<Dog: Rufus>]>
>>> Dog.objects.filter(data__owner__other_pets__0__name='Fishy')
<QuerySet [<Dog: Rufus>]>
Although this is for postgres, I believe it works the same in other DBs like MySQL
Postgres: https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
MySQL: https://django-mysql.readthedocs.io/en/latest/model_fields/json_field.html#querying-jsonfield
This usage is somewhat of an anti-pattern. Its performance is also unlikely to be consistent, and it may be error-prone.
Normally, don't use a jsonfield when you need to look records up by its fields. Use what the RDBMS provides, or MongoDB (which internally operates on the faster BSON), as Daniel pointed out.
Because the JSON format is deterministic, you could achieve it by using contains (regex has issues when dealing with multiple '\' and is even slower). I don't think it's good to use username in this way, so the example uses name instead:
def make_cond(name, value):
    from django.utils import simplejson
    cond = simplejson.dumps({name:value})[1:-1] # remove '{' and '}'
    return ' ' + cond # avoid '\"'
User.objects.get(jsonfield__contains=make_cond(name, value))
It works as long as:
the jsonfield uses the same dump utility (simplejson here)
name and value are not too special (I don't know of any edge case so far, maybe someone could point one out)
your jsonfield data is not corrupt (unlikely though)
Actually, I'm working on an editable jsonfield and thinking about whether to support such operations. The argument against, as said above, is that it feels like black magic.
If you use PostgreSQL, you can use raw SQL to solve the problem.
username = 'chingo'
# Pass the value through params so the database driver escapes it.
User.objects.extra(where=["jsonfield::json->>'username' = %s"], params=[username]).get()
Here jsonfield is the name of the column in your table.
Any method that treats JSON as plain text looks like a very bad approach.
So I also think that you need a better database schema.
Here is the way I have found out that will solve your problem:
search_filter = '"username":"{0}"'.format(username)
query = User.objects.get(jsonfield__contains=search_filter)
Hope this helps.
You can't do that. Use normal database fields for structured data, not JSON blobs.
If you need to search on JSON data, consider using a noSQL database like MongoDB.