I'm trying to understand a rebuild_index error that occurs when using a very basic SearchIndex, very similar to the one in the Haystack documentation. In the docs, the model's id is excluded, which makes sense, since there seems to be little point in it influencing search results.
I've tried something like this:
from haystack import indexes
from .models import Collection  # assumes Collection is defined in this app's models.py
class CollectionIndex(indexes.SearchIndex, indexes.Indexable):
    # id = indexes.IntegerField(model_attr='pk')
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    def get_model(self):
        return Collection
    def index_queryset(self, using=None):
        # Force just one item to be indexed to save waiting.
        return self.get_model().objects.filter(name="Test")
...but I get this error (I've reformatted it for legibility):
elasticsearch.helpers.BulkIndexError: ('1 document(s) failed to index.', [
    {
        "index": {
            "_index": "haystack",
            "_type": "modelresult",
            "_id": "my_project.collection.1",
            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse [id]",
                "caused_by": {
                    "type": "number_format_exception",
                    "reason": 'For input string: "my_project.collection.1"',
                },
            },
        }
    }
])
The only way to get indexing to work is to uncomment the id line. Is this just an Elasticsearch thing, which would explain why the examples in the official docs don't have an equivalent, or am I misinterpreting things?
Does it make sense to search for a number and have a result whose id matches that number show up? That seems a bit odd.
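To see why the parse fails, it helps to look at the value in the error. A minimal sketch, assuming (as the `_id` in the error suggests) that Haystack builds its document identifier as `app_label.model_name.pk` and writes it into the document's `id` field. If the index mapping already types `id` as a number (one possible cause is an earlier run that indexed the integer pk under `id`), Elasticsearch cannot parse that string, much as Python's `int()` cannot. Both helper names are hypothetical.

```python
def make_identifier(app_label: str, model_name: str, pk: int) -> str:
    """Hypothetical helper mirroring the app_label.model_name.pk shape seen in the error."""
    return f"{app_label}.{model_name}.{pk}"


def parses_as_long(value: str) -> bool:
    """Rough stand-in for what a numeric (long) mapping needs: an integer-looking string."""
    try:
        int(value)
    except ValueError:
        return False
    return True


doc_id = make_identifier("my_project", "collection", 1)
print(doc_id)                  # my_project.collection.1
print(parses_as_long(doc_id))  # False: the identifier cannot go into a numeric field
print(parses_as_long("1"))     # True: a bare pk, as with the uncommented IntegerField
```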
Let's say I have a SaaS product with a Django backend that processes users' data and writes everything to Elasticsearch. Now I would like to give users access to search and query their data stored in ES, using all the search requests ES supports. Obviously each user should only have access to their own data, not other users' data. I know this can be done in many different ways, but I wonder what the safest and best solution is. At the moment I store everything in one index and type, as shown below, but I can change that.
{
    "_index": "example_index",
    "_type": "example_type",
    "_id": "H2s-lGsdshEzmewdKtL",
    "_score": 1,
    "_source": {
        "user_id": 1,
        "field1": "example1",
        "field2": "example2",
        "field3": "example3"
    }
}
I think the best way would be to associate every document with a user_id. The user would send, for example, a GET request with a body and an Authorization header containing a token. I would use the token to extract the user's id, for example like this:
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
After this I would forward the request to ES, and only the data that matches the query and belongs to this user would be returned. Of course, I could do this as shown above, where I also add a user_id field. For example, I could use post_filter by adding something like this to every request:
"post_filter": {
    "match": {
        "user_id": 1
    }
}
For example, the user sends a GET request with the body:
{
    "query": {
        "regexp": {
            "tag": ".*example.*"
        }
    }
}
and my backend modifies it and forwards the request to ES with the body:
{
    "query": {
        "regexp": {
            "tag": ".*example.*"
        }
    },
    "post_filter": {
        "match": {
            "user_id": 1
        }
    }
}
But I don't think including this field in _source is a good idea. I am almost sure this can be solved in a more optimal way than post-filtering. I see a lot of information about authorization in ES, but I can't find how to associate a document with a user_id and then search only that user's documents without post-filtering. Any ideas?
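One commonly used alternative to `post_filter` (not what the question currently does) is to wrap the client's query in a `bool` query and put the ownership check in its `filter` clause, so the restriction is applied during the search itself. Unlike `post_filter`, this also restricts any aggregations in the request. A minimal sketch, assuming `user_id` is indexed as a numeric or keyword field; the function name is hypothetical.

```python
import copy
from typing import Any, Dict


def restrict_to_user(body: Dict[str, Any], user_id: int) -> Dict[str, Any]:
    """Return a copy of a client-supplied search body limited to one user's documents.

    The client's query (or match_all if none was sent) becomes the `must` part of a
    bool query, and a `term` filter on user_id is added in filter context, which
    does not affect scoring.
    """
    restricted = copy.deepcopy(body)
    user_query = restricted.pop("query", {"match_all": {}})
    restricted["query"] = {
        "bool": {
            "must": [user_query],
            "filter": [{"term": {"user_id": user_id}}],
        }
    }
    return restricted


client_body = {"query": {"regexp": {"tag": ".*example.*"}}}
print(restrict_to_user(client_body, 1))
```

The backend would then pass the returned dict to `es.search` in place of the post-filtered body. A production version would probably also whitelist which top-level keys clients are allowed to send.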
UPDATE
My current solution is shown below, but as I mentioned, I believe it is not the optimal way. If anyone has an idea how to solve this in the way described above, I would be grateful for help.
For example, I send:
{
    "query": {
        "regexp": {
            "tag": ".*test.*"
        }
    }
}
and in the Django backend I just do:
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
body = json.loads(request.body)
body['post_filter'] = {"match": {"user_id": user_id}}
res = es.search(index="pictures", doc_type="picture", body=body)
output = []
for hit in res['hits']['hits']:
    output.append(hit["_source"])
return Response(
    {'output': output},
    status=status.HTTP_200_OK)
Since Elasticsearch 7.1, basic security features are included in the free version of Elasticsearch. Thanks to that, you can control your users' access per index.
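With index-level security, one possible layout (an assumption, not something this answer spells out) is one index per user plus a role that can only read that index. A minimal sketch of the request body for Elasticsearch's create-role API (`PUT _security/role/<name>`); the `pictures-user-<id>` naming scheme and both helper names are hypothetical.

```python
from typing import Any, Dict


def user_index_name(user_id: int) -> str:
    """Hypothetical naming scheme: one index per user."""
    return f"pictures-user-{user_id}"


def read_only_role_body(user_id: int) -> Dict[str, Any]:
    """Body for PUT _security/role/<name> granting read access to a single index."""
    return {
        "indices": [
            {
                "names": [user_index_name(user_id)],
                "privileges": ["read"],
            }
        ]
    }


print(read_only_role_body(1))
```

Each user's Elasticsearch account would then be assigned only its own role. Many small indices carry per-shard overhead, so this layout suits a moderate number of users.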
I've got a model with translated fields:
from django.db import models
from parler.models import TranslatableModel, TranslatedFields
class Device(TranslatableModel):
    translations = TranslatedFields(name=models.CharField(max_length=100))
I made a serializer like this:
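from parler_rest.serializers import TranslatableModelSerializer, TranslatedFieldsField
from .models import Device  # assumes Device is defined in this app's models.py
class DeviceSerializer(TranslatableModelSerializer):
    translations = TranslatedFieldsField(shared_model=Device)
    class Meta:
        model = Device
        fields = ('translations',)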
It gives me the JSON I'd expect:
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "device": {
                "translations": {
                    "en": {
                        "name": "Sample Device"
                    }
                }
            }
        }
    ]
}
Now I want to display it with DataTables through django-rest-framework. In my template I've written a script like this:
$('#devices').DataTable({
    'serverSide': true,
    'ajax': 'api/devices/?format=datatables',
    'columns': [
        {'data': 'device.translations.en'}
    ]
});
It doesn't work: I get django.core.exceptions.FieldError: Unsupported lookup 'en' for AutoField or join on the field not permitted.
If I don't append .en to 'data', the column shows [object Object], of course.
The issue is in the template file.
Pass the name and data values separately for each column in the DataTables configuration.
Replace field_name with your model's field name:
$('#devices').DataTable({
    'ajax': 'api/devices/?format=datatables',
    'columns': [
        {"data": "translations.en.field_name", "name": "translations.field_name"},
    ]
});
For more details, refer to the django-rest-framework-datatables and django-parler-rest documentation.
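The actual problem is that when DataTables makes its GET request to the server, it sends each column's name value as a column parameter, and the backend uses that name as a field lookup (hence the "Unsupported lookup 'en'" error). So instead of writing
"name": "translations.en.field_name"
write:
"name": "translations.field_name"
i.e. remove the language code.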
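To see why the language code triggers the error, here is a minimal sketch assuming the backend turns a column's `name` into a Django ORM lookup by replacing dots with Django's `__` separator, which the "Unsupported lookup" message suggests. The helper is hypothetical, not the library's own code. With the language code, `en` ends up as a lookup on the translations relation, which Django rejects; without it, the path ends at the real `name` field of the translation model.

```python
def column_name_to_lookup(name: str) -> str:
    """Hypothetical: convert a DataTables column name into a Django field lookup."""
    return name.replace(".", "__")


# 'en' would be treated as a lookup on the translations relation -> FieldError
print(column_name_to_lookup("translations.en.name"))  # translations__en__name
# Without the language code the path ends at the translated `name` field
print(column_name_to_lookup("translations.name"))     # translations__name
```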
I have a strange problem.
I have an API built with the Django REST framework.
I'm making a call and getting the following JSON back:
{
    "success": true,
    "result": {
        "user_type": "ta",
        "email": "myemail#gmail.com",
        "first_name": "John",
        "last_name": "Smith",
        "mobile_phone": "555-555-5555",
        "id": "0f165a85-2da6-4dcb-97cb-bf04900a942b"
    }
}
I've tried adding a logging middleware that takes the same output from response.data and writes it into a text field in my database.
For the very same request, response.data is this (and this is what gets written into my DB instead of the desired JSON string above):
{'success': True, 'result': OrderedDict([('user_type', 'ta'), ('email', 'myemail#gmail.com'), ('first_name', 'John'), ('last_name', 'Smith'), ('mobile_phone', '555-555-5555'), ('id', UUID('0f165a85-2da6-4dcb-97cb-bf04900a942b'))])}
Why is that? How can I get rid of that OrderedDict and get a proper JSON string from response.data?
Please note: json.dumps doesn't work. I get TypeError: Object of type 'UUID' is not JSON serializable. My entire ID system in the models is based on UUIDs. However, Django REST framework is capable of serializing it just fine in the example above... how is that done?
You are hitting this problem because you're trying to dump the internal representation using json.dumps, which doesn't know how to handle UUID objects.
I can see two options. One: teach dumps how to serialize a UUID. This can be done by subclassing json.JSONEncoder, e.g. as in this SO answer.
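A minimal sketch of the first option using only the standard library: a `json.JSONEncoder` subclass whose `default()` turns `UUID` objects into strings. DRF's JSON renderer does something similar with its own encoder class (`rest_framework.utils.encoders.JSONEncoder`), which also covers types such as dates and decimals. The class name below is hypothetical. Note that `OrderedDict` itself needs no special handling, since it is a `dict`.

```python
import json
import uuid
from collections import OrderedDict


class UUIDEncoder(json.JSONEncoder):
    """JSON encoder that serializes uuid.UUID values as their canonical string form."""

    def default(self, obj):
        if isinstance(obj, uuid.UUID):
            return str(obj)
        # Defer to the base class, which raises TypeError for unsupported types
        return super().default(obj)


data = {
    "success": True,
    "result": OrderedDict([
        ("user_type", "ta"),
        ("id", uuid.UUID("0f165a85-2da6-4dcb-97cb-bf04900a942b")),
    ]),
}
print(json.dumps(data, cls=UUIDEncoder))
# {"success": true, "result": {"user_type": "ta", "id": "0f165a85-2da6-4dcb-97cb-bf04900a942b"}}
```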
However, DRF already knows how to serialize these fields. Poking around in the debugger, it looks like the response text is stashed in response.rendered_content. I'd check if that's populated by the time your middleware is run.
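A minimal sketch of that idea as new-style Django middleware, assuming the view returns a DRF `Response`: Django renders such responses before the middleware's post-processing runs, so `response.content` already holds the JSON bytes produced by DRF's renderer, UUIDs included. The class name and the `save` callback are hypothetical; swap in whatever writes your log row.

```python
class ResponseBodyLoggingMiddleware:
    """Hypothetical middleware that hands each rendered response body to a callback."""

    def __init__(self, get_response, save=print):
        self.get_response = get_response
        # Hypothetical sink: replace with a function that stores the text in your DB
        self.save = save

    def __call__(self, request):
        response = self.get_response(request)
        # Streaming responses have no .content; getattr falls back to None for them
        content = getattr(response, "content", None)
        if content is not None:
            charset = getattr(response, "charset", None) or "utf-8"
            self.save(content.decode(charset))
        return response
```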
json.dumps(log_data, indent=2)
This formats your dict into the string you want. For example:
import json
import logging
logger = logging.getLogger(__name__)
# Save log_data in some way
log_data = {
    "success": True,
    "result": {
        "user_type": "ta",
        "email": "myemail#gmail.com",
        "first_name": "John",
        "last_name": "Smith",
        "mobile_phone": "555-555-5555",
        "id": "0f165a85-2da6-4dcb-97cb-bf04900a942b"
    }
}
logger.info(json.dumps(log_data, indent=2))
print(json.dumps(log_data, indent=2), type(json.dumps(log_data, indent=2)))
The logger then saves the formatted JSON to django.log.
I managed to find this posted solution.
https://arthurpemberton.com/2015/04/fixing-uuid-is-not-json-serializable
I added the code to my models and now I do not get TypeError: Object of type 'UUID' is not JSON serializable when I serialize my UUID fields. This allowed me to call json.dumps on my response.data and serialize it to text perfectly.
I'm fetching my page's wall through the Graph API.
When someone posts a photo, I get it in the JSON:
{
    "id": "27888702146_10150369820322147",
    "from": {
        "name": "brocoli",
        "category": "Record label",
        "id": "27888702146"
    },
    "message": "Vincent Epplay / David Fenech / Jac Berrocal \u00e0 Beaubourg ce soir, 19h, gratos.",
    "picture": "http://photos-f.ak.fbcdn.net/hphotos-ak-snc7/305819_10150369820292147_27888702146_8255527_583491475_s.jpg",
    "link": "https://www.facebook.com/photo.php?fbid=10150369820292147&set=a.386279807146.165840.27888702146&type=1",
    "icon": "http://static.ak.fbcdn.net/rsrc.php/v1/yz/r/StEh3RhPvjk.gif",
    "type": "photo",
    "object_id": "10150369820292147",
    "created_time": "2011-10-16T08:22:21+0000",
    "updated_time": "2011-10-16T08:22:21+0000",
    "likes": {
        "data": [
            {
                "name": "brocoli",
                "category": "Record label",
                "id": "27888702146"
            },
            {
                "name": "Agathe Morier",
                "id": "601668526"
            }
        ],
        "count": 2
    },
    "comments": {
        "count": 0
    },
    "is_published": true
}
The problem is that the picture link is a low resolution copy of the picture.
How can I get the URL of the full picture ?
You can get different versions of the photo by querying the Graph API with its object_id (not the post id, which is the id in the results you provided).
When you request the photo by its object_id, you'll get an array of images with URLs and dimensions:
http://graph.facebook.com/10150369820292147?fields=images
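Picking the largest entry from that response might look like this minimal sketch, assuming the response has been parsed from JSON and each item in `images` carries `width`, `height` and `source` (the image URL). The sample data and the function name are made up for illustration.

```python
from typing import Any, Dict, Optional


def largest_image_url(photo: Dict[str, Any]) -> Optional[str]:
    """Return the `source` URL of the image with the most pixels, or None if there are none."""
    images = photo.get("images", [])
    if not images:
        return None
    best = max(images, key=lambda img: img.get("width", 0) * img.get("height", 0))
    return best.get("source")


# Illustrative response shape (values are invented)
sample = {
    "images": [
        {"width": 130, "height": 97, "source": "https://example.com/small.jpg"},
        {"width": 960, "height": 720, "source": "https://example.com/large.jpg"},
    ]
}
print(largest_image_url(sample))  # https://example.com/large.jpg
```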
If you're trying to access posts on a Facebook Page (such as a company's) rather than a typical user profile, you first need to fetch the feed like this:
https://graph.facebook.com/v15.0/YOUR_PAGE_ID_HERE/feed?fields=attachments&access_token=...
Then read data[0].attachments.data[0].subattachments.data[0].target.id to get the object ID (the "target ID" in this case), which you can use in an additional query to obtain the higher-resolution image. Increment the indices to get additional posts and the images inside each post.
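Rather than incrementing the indices by hand, you can walk every post and sub-attachment in a loop. A minimal sketch over the parsed JSON of the feed response, following the `attachments.data[].subattachments.data[].target.id` path described above and skipping entries that lack it; the function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def subattachment_target_ids(feed: Dict[str, Any]) -> List[str]:
    """Collect target IDs from data[i].attachments.data[j].subattachments.data[k].target.id."""
    ids = []
    for post in feed.get("data", []):
        for attachment in post.get("attachments", {}).get("data", []):
            for sub in attachment.get("subattachments", {}).get("data", []):
                target_id = sub.get("target", {}).get("id")
                if target_id is not None:
                    ids.append(target_id)
    return ids


feed = {
    "data": [
        {"attachments": {"data": [{"subattachments": {"data": [
            {"target": {"id": "111"}}, {"target": {"id": "222"}},
        ]}}]}},
        {"attachments": {"data": [{"type": "photo"}]}},  # no subattachments
        {},  # post without attachments
    ]
}
print(subattachment_target_ids(feed))  # ['111', '222']
```

Each collected ID can then be used in its own `?fields=images` request.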
All you need to do is:
http://graph.facebook.com/me?fields=picture.height(961)
(replace 961 with the height you want)
You can now do this from the main posts list using:
/v2.3/105753476132681/posts?limit=5&fields=likes.summary(true),comments.summary(true),attachments
If attachments doesn't work, try full_picture, but that just gave me the 100x100 image as well.
The attachments field returns a data hash with at least a 640x480 version of the image (I'm not sure what my original photo's size was).
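Use this code. It works for me and gets a clear image:
// Assumes imports of java.net.URL, android.graphics.Bitmap,
// android.graphics.BitmapFactory and android.widget.ImageView. openConnection()
// can throw IOException, and Android forbids network calls on the main thread,
// so run this from a background thread.
String getPicture = hashMap.get("picture");
// Start from the original URL so PICTURE_URL is always initialized
String PICTURE_URL = getPicture;
// Swap the small-size suffix for "_n."; replace() matches literally,
// whereas replaceAll() would treat "." as a regex wildcard
if (getPicture.contains("_t.")) {
    PICTURE_URL = getPicture.replace("_t.", "_n.");
} else if (getPicture.contains("_a.")) {
    PICTURE_URL = getPicture.replace("_a.", "_n.");
} else if (getPicture.contains("_s.")) {
    PICTURE_URL = getPicture.replace("_s.", "_n.");
} else if (getPicture.contains("_q.")) {
    PICTURE_URL = getPicture.replace("_q.", "_n.");
}
URL url = new URL(PICTURE_URL);
Bitmap bitmap = BitmapFactory.decodeStream(url.openConnection().getInputStream());
((ImageView) view.findViewById(R.id.imageView_FullImage)).setImageBitmap(bitmap);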
Though requesting a photo by its object_id will return an array of images with different dimensions, in some cases this approach would require an additional call to the Facebook API.
A simpler approach is to add full_picture to your list of parameters, which will extract the highest resolution image associated with the post.
/v2.2/6275848869/posts?fields=full_picture
For example, if you want to extract all the posts from a Facebook page in the past X days, with the object_id approach you'd need three rounds of API calls:
1. To get the page info.
2. To extract the list of posts and obtain the object_id for each post.
3. For each object_id, to retrieve the list of higher-resolution images.
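With `full_picture`, a single posts call is enough, because the image URL comes back on each post. A minimal sketch of pulling those URLs out of the parsed response, assuming posts without an image simply omit the field; the function name and the sample data are invented for illustration.

```python
from typing import Any, Dict, List


def full_picture_urls(posts_response: Dict[str, Any]) -> List[str]:
    """Return the full_picture URL of every post that has one."""
    return [
        post["full_picture"]
        for post in posts_response.get("data", [])
        if "full_picture" in post
    ]


sample = {
    "data": [
        {"id": "1_2", "full_picture": "https://example.com/a.jpg"},
        {"id": "1_3"},  # text-only post, no picture
    ]
}
print(full_picture_urls(sample))  # ['https://example.com/a.jpg']
```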
I'm trying to create a REST web service that exposes the following Django model:
import datetime
from django.db import models
class Person(models.Model):
    uid = models.AutoField(primary_key=True)
    name = models.CharField(max_length=40)
    latitude = models.CharField(max_length=20)
    longitude = models.CharField(max_length=20)
    speed = models.CharField(max_length=10)
    date = models.DateTimeField(default=datetime.datetime.now)
    def __unicode__(self):
        return self.name
Here's how I thought about it so far:
Get all Persons
URL: http://localhost/api/persons/
Method: GET
Querystring:
    startlat=
    endlat=
    startlng=
    endlng=
        Used for getting the Persons that are within the specified coordinate range.
    page=
        Used for getting the specified page of the response (if the response contains multiple pages).
Returns:
200 OK & JSON
404 Not Found
Example:
Request:
GET http://localhost/api/persons/?startlat=10&endlat=15&startlng=30&endlng=60
Response:
{
    "persons": [
        { "href": "1" },
        { "href": "2" },
        { "href": "3" },
        ...
        { "href": "100" }
    ],
    "next": "http://localhost/api/persons/?startlat=10&endlat=15&startlng=30&endlng=60&page=2"
}
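Building the `next` link from the current filters might look like this minimal sketch using the standard library's `urlencode`; the function name is hypothetical, and the parameters mirror the querystring described above.

```python
from typing import Dict, Union
from urllib.parse import urlencode


def next_page_url(base: str, filters: Dict[str, Union[int, float, str]], page: int) -> str:
    """Return the URL of `page`, keeping the current filters in their original order."""
    params = dict(filters)
    params["page"] = page
    return f"{base}?{urlencode(params)}"


url = next_page_url(
    "http://localhost/api/persons/",
    {"startlat": 10, "endlat": 15, "startlng": 30, "endlng": 60},
    page=2,
)
print(url)
# http://localhost/api/persons/?startlat=10&endlat=15&startlng=30&endlng=60&page=2
```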
Get info on a specified Person
URL: http://localhost/api/persons/[id]
Method: GET
Returns:
200 OK & JSON
404 Not Found
Example:
Request:
http://localhost/api/persons/5/
Response:
{
    "uid": "5",
    "name": "John Smith",
    "coordinates": {
        "latitude": "14.43432",
        "longitude": "56.4322"
    },
    "speed": "12.6",
    "updated": "July 17, 2009, 8:46 a.m."
}
How correct is my attempt so far? Any suggestions are highly appreciated.
{ "href": "1" },
"1" is hardly a valid URL. You should use full URLs; look up HATEOAS.
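Turning bare IDs into full URLs on the server takes one line per item. A minimal sketch with `urllib.parse.urljoin`, assuming the collection URL from the question; inside Django you would more likely combine `reverse()` with `request.build_absolute_uri()`. The function name is hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urljoin

BASE_URL = "http://localhost/api/persons/"  # collection URL from the question


def person_links(ids: Iterable[int]) -> List[Dict[str, str]]:
    """Return HATEOAS-style link objects with absolute hrefs instead of bare IDs."""
    return [{"href": urljoin(BASE_URL, f"{pk}/")} for pk in ids]


print(person_links([1, 2]))
# [{'href': 'http://localhost/api/persons/1/'}, {'href': 'http://localhost/api/persons/2/'}]
```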
Also, remember to send a relevant Content-Type header. You may want to define your own MIME type to describe the format. This gives you the option to change the content type later (e.g. to change the format after publishing). See "Versioning REST Web Services".
I think query parameters could be simpler and clearer. This would make the URI more readable and would allow more flexibility for future extensions:
GET http://localhost/api/persons/?latitude=10:15&longitude=30:60
You may want to enable these in the future:
GET http://localhost/api/persons/?latitude=10&longitude=60&within=5km
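Parsing the proposed `latitude=10:15` range syntax could look like this minimal sketch. The `low:high` format comes from this answer; the function name and the choice to treat a single value as a point (low equals high) are assumptions.

```python
from typing import Tuple


def parse_range(value: str) -> Tuple[float, float]:
    """Parse 'low:high' (or a single number) into an ordered (low, high) pair of floats."""
    parts = value.split(":")
    if len(parts) == 1:
        point = float(parts[0])
        return point, point
    if len(parts) == 2:
        low, high = float(parts[0]), float(parts[1])
        if low > high:
            raise ValueError(f"range start {low} is greater than end {high}")
        return low, high
    raise ValueError(f"expected 'low:high' or a single number, got {value!r}")


print(parse_range("10:15"))  # (10.0, 15.0)
print(parse_range("60"))     # (60.0, 60.0)
```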
Seems REST-cool. I worked on the same kind of thing a few days ago.
The only change I would make is to link directly to each person's details, and to include some details (like the name here) that identify the person and help the client decide whether to navigate further. Like this:
{
    "persons": [
        { "name": "John Smith", "href": "http://localhost/api/persons/1/" },
        { "name": "Mark Henry", "href": "http://localhost/api/persons/2/" },
        { "name": "Bruce Wayne", "href": "http://localhost/api/persons/3/" },
        ...
        { "name": "Karl Lewis", "href": "http://localhost/api/persons/100/" }
    ],
    "next": "http://localhost/api/persons/?startlat=10&endlat=15&startlng=30&endlng=60&page=2"
}
This way, I'm providing everything needed to present the data as:
John Smith
Mark Henry
Bruce Wayne
...
Karl Lewis
Next Page
It's ok to provide shorthand URIs in your JSON responses if you provide some templating system. Like giving a base URI as something like http://whatever.com/persons/{id}/ and then providing IDs. Then with python you can just do a format call on the string. You don't ever want to make the programmer actually look at and understand the meaning of the URIs, which isn't necessary when you use templates.
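A minimal sketch of that templating idea: the response carries a URI template once, and the client fills in each ID with `str.format`. The `person_template` key name is hypothetical.

```python
response = {
    # Hypothetical key: one URI template instead of a full URL per item
    "person_template": "http://whatever.com/persons/{id}/",
    "persons": [{"id": 1}, {"id": 2}],
}

# Client side: expand the template for each item without parsing URIs by hand
urls = [response["person_template"].format(id=p["id"]) for p in response["persons"]]
print(urls)  # ['http://whatever.com/persons/1/', 'http://whatever.com/persons/2/']
```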
You might want to take a look at existing REST middleware; it saved me a lot of time. I'm using http://code.google.com/p/django-rest-interface/. Here's a snippet of my urls.py:
json_achievement_resource = Collection(
    queryset=Achievement.objects.all(),
    permitted_methods=('GET',),
    responder=JSONResponder(paginate_by=10)
)
urlpatterns += patterns('',
    url(r'^api/ach(?:ievement)?/(.*?)/json$', json_achievement_resource),
)