Hashed or random values getting inserted in google cloud datastore - python-2.7

from google.cloud import datastore
client = datastore.Client()
task = datastore.Entity(client.key('ModelDataTest', prod_id))
task.update({
    'ProductId': '1234',
    'ListOfRankedRelevantItems.ProductId': ['345', '456', '567'],
    'ListOfRankedRelevantItems.SimilarityScore': ['0.98', '0.89', '0.77']
})
client.put(task)
Using the above code I'm creating an entity in GC-datastore.
However, prod_id is stored as a blob like 'MTIzNDU=' instead of the string '1234', ProductId as ["MzIx","MzQ1","NDU2"] instead of ['345', '456','567'], and SimilarityScore as ["MC45OA==","MC44Nw==","MC43Nw=="] instead of ['0.98', '0.89','0.77']. Does anyone have an idea why this happens and how to get the proper values?

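The issue is that in Python 2 a plain string literal such as '1234' is a byte string (str), and the Datastore client stores byte strings as blobs, which the console displays base64-encoded. You have to convert the values to unicode for them to be stored and shown as proper strings in gcloud datastore.
Simply use the unicode() function (or u'...' literals) to solve this issue.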
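To make the conversion explicit for every property, you can decode all byte strings before building the entity. This is a minimal sketch, assuming the values are UTF-8 encoded; to_text is a hypothetical helper, not part of the google-cloud-datastore library. It works on Python 2 (where bytes is str) and Python 3.

```python
import base64


def to_text(value, encoding="utf-8"):
    """Recursively decode byte strings (Python 2 str, Python 3 bytes) to text.

    Lists and dicts are walked so that list-valued properties are converted too;
    any other value (numbers, None, text) is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, list):
        return [to_text(v, encoding) for v in value]
    if isinstance(value, dict):
        return {k: to_text(v, encoding) for k, v in value.items()}
    return value


# A byte string is stored as a blob and the console shows blobs base64-encoded:
blob_display = base64.b64encode(b"0.98")  # b"MC45OA==", as seen in the question

props = to_text({
    "ProductId": b"1234",
    "ListOfRankedRelevantItems.SimilarityScore": [b"0.98", b"0.77"],
})
```

You would then pass the converted dict to task.update(...), and convert the key name the same way, e.g. client.key('ModelDataTest', to_text(prod_id)).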

Related

Query Logs for filter non-empty strings in Google Cloud Logs Explorer

I am trying to query for all logs that meet a simple condition:
The jsonPayload of some DEFAULT log entries have the following structure:
response: {
  Values: [
    [ ]
  ]
}
where each item in Values is an array. In most cases, Values has a single item, the empty string "". I want to write a query that returns all log entries whose Values contain something other than an empty string (in fact, an array).
Here's the query I tried to run:
severity="DEFAULT" AND
jsonPayload.response.Values != ''
This did not return any result. There are thousands of entries, most of which are empty. Can this be done? If so, what am I missing in this case?
Edit
I am checking to see if the first value inside Values is something other than an empty string. In entries I am looking for, the value of the first item will be an array.
Edit 2
Following the reference suggested, I tried looking for the opposite:
severity="DEFAULT" AND
jsonPayload.response.Values = ''
This shows me all the results with an empty Values array, as expected. What's confusing me is why the != query isn't working. The logs are generated by a cloud function that serves as a webhook for event processing. The jsonPayload represents the body of the request from the event source.
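To filter for non-empty strings in Google Cloud Logs Explorer, you can use the regular-expression "does not match" operator (!~) described in the official query-language documentation. Anchor the pattern so it only matches the empty string; an empty pattern would match every value:
severity="DEFAULT" AND
jsonPayload.response.Values !~ '^$'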
Another way would be:
severity="DEFAULT" AND
jsonPayload.response.Values:* AND
NOT jsonPayload.response.Values = ''

How to run a combination of query and filter in elasticsearch?

I am experimenting using elasticsearch in a dummy project in django. I am attempting to make a search page using django-elasticsearch-dsl. The user may provide a title, summary and a score to search for. The search should match all the information given by the user, but if the user does not provide any info about something, this should be skipped.
I am running the following code to search for all the values.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch()
s = Search().using(client).query("match", title=title_value)\
    .query("match", summary=summary_value)\
    .filter('range', score={'gt': scorefrom_value, 'lte': scoreto_value})
When I have a value for all the fields, the search works correctly. But if, for example, I do not provide a value for summary_value, I expect the search to still match on the remaining values; instead, it returns nothing.
Is there some value that the fields should have by default in case the user does not provide a value? Or how should I approach this?
UPDATE 1
I tried using the following, but it returns the same results every time, no matter what input I give.
s = Search(using=client)
if title:
    s.query("match", title=title)
if summary:
    s.query("match", summary=summary)
response = s.execute()
UPDATE 2
I can print the query using to_dict().
If it looks like the following, s is empty:
s = Search(using=client)
s.query("match", title=title)
If it looks like this:
s = Search(using=client).query("match", title=title)
then it works properly, but if I then add s.query("match", summary=summary), it still does nothing.
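Search methods such as query() and filter() do not modify s in place; they return a new, modified copy. You need to assign that copy back into s:
if title:
    s = s.query("match", title=title)
if summary:
    s = s.query("match", summary=summary)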
I can see in the Search example that django-elasticsearch-dsl lets you apply aggregations after a search so...
How about "staging" your search? I can think of the following:
# first, declare the Search object
s = Search(using=client, index="my-index")
# if parameter1 exists, add a filter (assign the returned copy back to s)
if parameter1:
    s = s.filter("term", field1=parameter1)
# if parameter2 exists, add a query
if parameter2:
    s = s.query("match", field=parameter2)
Do the same for all your parameters (with the needed method for each) so only the ones that exist will appear in your query. At the end just run
response = s.execute()
and everything should work as you want :D
I would recommend using the Python ES client. It lets you manage many things related to your cluster: setting mappings, health checks, running queries, etc.
In its .search() method, the body parameter is where you send your query, exactly as you would normally write it ({"query": ...}). Check the Usage example.
Now, for your particular case, you can keep a template of your query in a variable. Start with, let's say, an "empty query" that only has a filter list:
query = {
    "query": {
        "bool": {
            "filter": [
            ]
        }
    }
}
From here, you can build up your query from the parameters you actually have:
# This looks a little messy, but it's useful ;)
# if parameter1 is not None or empty
# (change the if statement for your particular case)
if parameter1:
    query["query"]["bool"]["filter"].append({"term": {"field1": parameter1}})
Do the same for all your parameters (use "term" for exact string values and "range" for ranges, as usual), then send the query in the body parameter of .search() and it should work as you want.
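Applied to the fields in the question (title, summary and score), the same idea can be wrapped in a small function. This is a sketch under a few assumptions: the name build_query is hypothetical, the full-text fields use match clauses under must (rather than term), and the score bounds mirror the gt/lte range from the question. A bool query with no clauses matches all documents, so parameters that are left out simply don't restrict the results.

```python
def build_query(title=None, summary=None, score_from=None, score_to=None):
    """Build an Elasticsearch bool query from whichever parameters were given.

    Empty or None text parameters are skipped; score bounds are checked with
    `is not None` so that a bound of 0 is still applied.
    """
    query = {"query": {"bool": {"must": [], "filter": []}}}
    clauses = query["query"]["bool"]

    # Full-text conditions: only added when the user supplied a value.
    if title:
        clauses["must"].append({"match": {"title": title}})
    if summary:
        clauses["must"].append({"match": {"summary": summary}})

    # Range condition: include only the bounds that were supplied.
    score_range = {}
    if score_from is not None:
        score_range["gt"] = score_from
    if score_to is not None:
        score_range["lte"] = score_to
    if score_range:
        clauses["filter"].append({"range": {"score": score_range}})

    return query


example = build_query(title="django", score_to=5)
```

The returned dict is what you would pass as the body of .search(), for example client.search(index="my-index", body=example), where "my-index" is a placeholder index name.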
Hope this is helpful! :D

Django unable to save a text value in DB

I'm reading an e-mail content through IMAP in my Django app.
When I try to assign some of the parsed content to the object and call .save(), it raises:
ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
When I print the variable type: . Field in the DB is defined as CharField. I tried TextField as well, but the result is the same.
How can I solve that?
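If your mail text is in mail_text, decode it to a unicode string before saving. Because the text contains 8-bit (non-ASCII) bytes, pass the message's charset (UTF-8 is assumed here); a bare unicode(mail_text) would try ASCII and raise UnicodeDecodeError:
mail_text = unicode(mail_text, 'utf-8')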
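If mail_text comes from a message parsed with Python's email package, the declared charset is usually available, so you don't have to guess it. A minimal sketch, written for Python 3 (where email.message_from_bytes exists; in Python 2 the counterpart is email.message_from_string); message_text is a hypothetical helper that falls back to UTF-8 when no charset is declared:

```python
import email


def message_text(raw_bytes):
    """Return the first text/plain body of a raw message as a (unicode) str."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes the transfer encoding and returns bytes
            payload = part.get_payload(decode=True) or b""
            # use the declared charset; assume UTF-8 if none is declared
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""
```

In the Django code this would become something like obj.body = message_text(raw_message) before obj.save(), where obj, body and raw_message are placeholder names.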

How to match strings in MongoDb and ignore any whitespace

Is it possible to ignore all whitespace using regex in MongoDB queries?
My Node.js program uses Cheerio to pull data from a number of websites, parses and then stores the data in MongoDB. My database has a People collection that keys on the string field Name.
The problem occurs where one website (site-A) shows the name HTML text as John&nbsp;Smith, whereas another website (site-B) shows the name as John Smith. My program has two scripts, one that scrapes site-A and another that scrapes site-B; both use the following to scrape the Name data:
var $ = cheerio.load(htmlrow);
var personobj = { name: $('td.person a').text().trim() }
Each script then uses the following MongoDb command (using the native driver) to upsert the scraped data, keying on the Name field. However, this results in two records in the People collection -
db.collection('people').update(
{ Name: personobj.name },
{ $set: { LastScan: new Date() }},
{ upsert: true },
function(){} );
Now, I tried using the regex "extended" 'x' option to query in MongoDB, but it's not working. In fact, I tried testing the 'x' option via the find operator in Robomongo, and it returns zero records. I also noticed that when testing find in Robomongo and simply typing Name: "John Smith", it only returns the site-B record, the one without the &nbsp; whitespace, even though the name strings appear identical when I view the details of both records. (I suppose the difference is introduced somewhere in all the encoding/decoding that goes on to scrape, parse, store and retrieve... but I'm not sure where or why.)
Is it possible to ignore all whitespace when querying MongoDb using regex?
Or, is it easier to handle this in my javascript parse line, to somehow replace and 'standardize' all possible whitespace characters? (Any recommended library to do so?)
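The second option (standardizing whitespace before storing the name) could look like the sketch below. It is written in Python to match the other examples on this page rather than as the asker's Node.js code, and normalize_name is a hypothetical helper. The same regex idea ports directly to JavaScript, whose \s class also matches the non-breaking space (U+00A0). Applying it to every scraped name before the upsert means both sites produce the same Name key.

```python
import re

# In Python 3, \s on a str pattern matches Unicode whitespace, including U+00A0 (NBSP).
_WHITESPACE = re.compile(r"\s+")


def normalize_name(name):
    """Collapse each run of whitespace (NBSP, tabs, newlines) to one ASCII space and trim."""
    return _WHITESPACE.sub(" ", name).strip()
```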

Get value in a post request, Django

I am getting the following POST data in my Django app:
POST
Variable Value
csrfmiddlewaretoken u'LHM3nkrrrrrrrrrrrrrrrrrrrrrrrrrdd'
id u'{"docs":[],"dr":1, "id":4, "name":"Group", "proj":"/al/p1/proj/2/", "resource":"/al/p1/dgroup/4/","route":"group", "parent":null'
I am trying to get the id value into a variable, i.e. the value 4 from "id":4. When I do request.POST.get('id') I get the whole JSON string: u'{"docs":[],"dr":1, "id":4, "name":"Group", "proj":"/al/p1/proj/2/", "resource":"/al/p1/dgroup/4/","route":"group", "parent":null' How can I get the "id" out of that string?
The data you are sending is simply a JSON string.
You have to parse that string before you can access the data within it. For this you can use Python's json module (part of the standard library since Python 2.6).
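import json
# request.POST.get('id') is a JSON string; parse it into a dict first
data = json.loads(request.POST.get('id'))
obj_id = data["id"]  # avoid shadowing the built-in id()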
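If the field may be missing or malformed, it helps to guard the parsing. A minimal sketch; extract_id is a hypothetical helper, and the sample string is a shortened, complete version of the posted data (the one in the question appears truncated before its closing brace):

```python
import json


def extract_id(raw_json):
    """Parse the JSON string posted in the 'id' field and return its "id" member.

    Returns None when the field is empty, is not valid JSON, or has no "id" key.
    """
    if not raw_json:
        return None
    try:
        data = json.loads(raw_json)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return data.get("id") if isinstance(data, dict) else None


sample = '{"docs": [], "dr": 1, "id": 4, "name": "Group", "parent": null}'
```

In the view this would be extract_id(request.POST.get('id')), which gives the integer 4 for the data shown in the question.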
If you somehow don't have the json module, you can get the simplejson module.
For more details, refer to this question: best way to deal with JSON in django
That's happening because id is a string, not a dict as it should be. Please provide your template and view code so we can find the source of the problem.