I've made a simple JSON API using django-rest-framework, and I'd like to be able to download the 5 collections as one zipfile containing the collections as JSON files. (my app requires this offline dump of data).
I'm thinking of writing a view, served at, say, /download/ that bundles up the output from all my MyModelList.as_view()s in a zip and serves it.
What is the best way of doing this? I could use urllib/Requests to query my API directly, but it seems a long way round to invoke the whole HTTP stack...
Many thanks!
You could use the MyModelList.as_view()(request, format="json").rendered_content you mention. Depending on your setup, you could also call a serializer directly instead of going all the way through your model. This, of course, depends on whether you do anything special in your view or whether you just pass along serialized model data.
For zipping up those files, use Python's zipfile module. See https://stackoverflow.com/a/9644831/27401 and https://stackoverflow.com/a/908293/27401 for some additional hints. A summary:
Use StringIO to create a file-like object (instead of a temporary file).
Use zipfile's .writestr('filename-in-zip.json', your_json_string) on that StringIO to add content.
When returning the zipfile as a Django response, don't forget to set the Content-Disposition header as shown in https://stackoverflow.com/a/909088/27401.
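Putting those pieces together, a rough sketch of such a /download/ view (MyModelList is from the question; the second view class, the filenames, and the zip name are invented, and on Python 2 you would use StringIO where this uses BytesIO):

import zipfile
from io import BytesIO

from django.http import HttpResponse

from myapp.views import MyModelList, OtherModelList  # hypothetical DRF list views


def download(request):
    # Map each filename inside the zip to the view that produces its JSON.
    views = {
        'mymodel.json': MyModelList,
        'othermodel.json': OtherModelList,
    }
    buf = BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        for filename, view_class in views.items():
            # Call the DRF view directly and grab its rendered JSON bytes.
            drf_response = view_class.as_view()(request, format='json')
            zf.writestr(filename, drf_response.rendered_content)
    response = HttpResponse(buf.getvalue(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=dump.zip'
    return response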
Can anyone guide me in the right direction as to where I should place a script solely for loading data into ndb? I wish to upload all the data into the GAE ndb so that the application can perform queries on it.
Right now, the loading of data is done in my application. I wish to place it separately from the main application.
Should it be added to the yaml file?
EDITED
This is a snippet of the entity and the handler used to upload the data into the GAE ndb.
I wish to place this chunk of code separately from my main application .py. The reason is that the uploading of this data won't be done frequently, and it keeps the code in the main application "cleaner".
class TagTrend_refine(ndb.Model):
    tag = ndb.StringProperty()
    trendData = ndb.BlobProperty(compressed=True)

class MigrateData(webapp2.RequestHandler):
    def get(self):
        listOfEntities = []
        f = open("tagTrend_refine.txt")
        lines = f.readlines()
        f.close()
        for line in lines:
            temp = line.strip().split("\t")
            data = TagTrend_refine(
                tag=temp[0],
                trendData=temp[1]
            )
            listOfEntities.append(data)
        ndb.put_multi(listOfEntities)
For example, if I placed the above code in a file called dataLoader.py, where should I invoke this script from?
In app.yaml, alongside my main application (knowledgeGraph.application)?
- url: /.*
  script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledgeGraph.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
So the class TagTrend_refine should be in a separate file (usually called models.py) so that both your dataloader.py and your knowledgeGraph.py can import models to access it (and models.py will need its own import of ndb, of course). Then access to the entity class will be as models.TagTrend_refine -- very basic Python.
Next, you'll complete dataloader.py by defining a WSGI app, e.g., at the end of the file:
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
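Putting those pieces together, a sketch of what the completed dataloader.py could look like (assuming TagTrend_refine has indeed been moved to models.py as suggested):

# dataloader.py -- a sketch, not a drop-in replacement
import webapp2
from google.appengine.ext import ndb

import models  # provides models.TagTrend_refine


class MigrateData(webapp2.RequestHandler):
    def get(self):
        entities = []
        with open("tagTrend_refine.txt") as f:
            for line in f:
                tag, trend = line.strip().split("\t")
                entities.append(models.TagTrend_refine(tag=tag, trendData=trend))
        ndb.put_multi(entities)


app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])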
In app.yaml, as the first URL, before that /.*, you'll have:
- url: /migratedata
  script: dataloader.app
Given all this, when you visit /migratedata, your handler will read the tagTrend_refine.txt file that you uploaded together with the .py, .yaml, and other files in your overall GAE application, and will unconditionally create one entity per line of that file (assuming the indentation in your actual code is correct -- mixing tabs and spaces may look fine in your editor but not once posted here on SO; I recommend you use strictly, only, spaces, never tabs, in Python code).
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do, and why you'd like to do something different. For POST vs GET in particular, just study "When should I use GET or POST method? What's the difference between them?"...
Alex's solution will work, as long as all your data can be loaded in under 1 minute, as that's the timeout for an App Engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.
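For instance, a rough sketch of loading the same tab-separated file from your own machine with the Cloud Datastore client library (pip install google-cloud-datastore; the project id is a placeholder and the kind/field names are carried over from the question):

from google.cloud import datastore

client = datastore.Client(project="your-project-id")  # placeholder project id

entities = []
with open("tagTrend_refine.txt") as f:
    for line in f:
        tag, trend = line.strip().split("\t")
        entity = datastore.Entity(key=client.key("TagTrend_refine"))
        entity.update({"tag": tag, "trendData": trend})
        entities.append(entity)

# The API caps each commit at 500 entities, so write in chunks.
for i in range(0, len(entities), 500):
    client.put_multi(entities[i:i + 500])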
I've run into a snag in my views.
Here "filtered_posts" is array of Django objects coming back from the model.
I am having a little trouble figuring out how to get as text data that I can
later pack into json instead of using serializers.serialize...
What results is that the data comes double-escaped (escaped once by serializers.serialize and a second time by json.dumps).
I can't figure out how to return the data from the db in the same way it would come back if I were using the MySQLdb lib directly -- in other words, as strings instead of references to objects. As it stands, if I take out the serializers.serialize, I get a list of these Django objects, and it doesn't even list them all (it abbreviates them with '...(remaining elements truncated)...').
I don't think I should, but should I be using the __unicode__() method for this? (And if so, how should I be invoking it?)
JSONtoReturn = json.dumps({
    'lowest_id': user_posts[limit - 1].id,
    'user_posts': serializers.serialize("json", list(filtered_posts)),
})
The Django Rest Framework looks pretty neat. I've used Tastypie before, too.
I've also done RESTful APIs that don't include a framework. When I do that, I define toJSON methods on my objects, that return dictionaries, and cascade the call to related elements. Then I call json.dumps() on that. It's a lot of work, which is why the frameworks are worth looking at.
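A minimal sketch of that toJSON-style approach, with invented Post and Author models:

import json

from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)

    def to_json(self):
        return {'id': self.id, 'name': self.name}


class Post(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    body = models.TextField()

    def to_json(self):
        # Cascade to related objects so the nesting falls out naturally.
        return {'id': self.id, 'body': self.body, 'author': self.author.to_json()}


# Somewhere in a view:
# payload = json.dumps([post.to_json() for post in filtered_posts])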
What you're looking for is Django Rest Framework. It handles related objects in exactly the way you're expecting it to (you can include a nested object, like in your example, or simply have it output the PK of the related object for the key).
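As a sketch of what that looks like with DRF serializers (assuming hypothetical Post/Author models where Post has a ForeignKey to Author):

from rest_framework import serializers

from myapp.models import Author, Post  # hypothetical models


class AuthorSerializer(serializers.ModelSerializer):
    class Meta:
        model = Author
        fields = ('id', 'name')


class PostSerializer(serializers.ModelSerializer):
    # Nest the full related object instead of emitting just its primary key.
    author = AuthorSerializer(read_only=True)

    class Meta:
        model = Post
        fields = ('id', 'body', 'author')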
I have a Django server responsible for serving JSON formatted files to a controller machine. My server gets data from POST requests and serves JSON files on GET requests. I'm wondering what the most RESTful and efficient way of creating this bridge on the server is.
I'm considering creating a model, storing instances in my database for each POST request, and packing the instances into JSON dynamically to serve on GET requests. The alternative would be creating JSON files on POST requests, saving them in a folder on the server, and serving these files on GET requests.
Which way is better and why? Or am I failing to see a completely better method?
Why are you creating files? You can let a Django view return a JSON response instead of an HTML response:
import json

from django.http import HttpResponse

# in your view:
data = {}
return HttpResponse(json.dumps(data), mimetype="application/json")  # content_type= on Django 1.7+
Construct the JSON data dynamically from the available data, and if the JSON responses are big, add caching (e.g., memcached or Varnish).
This will prevent certain problems (what to do if a GET request is done but there is no JSON file yet?). This way you can generate the JSON based on the data that you have or return JSON with an error message instead.
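As a rough sketch of that approach (the Reading model, field names, and view are invented; it uses JsonResponse, available on newer Django versions, and disables CSRF checks for the controller machine for brevity):

import json

from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt

from myapp.models import Reading  # hypothetical model storing posted data


@csrf_exempt
def readings(request):
    if request.method == 'POST':
        payload = json.loads(request.body)
        Reading.objects.create(value=payload['value'])
        return HttpResponse(status=201)
    # GET: build the JSON dynamically from whatever is in the database.
    return JsonResponse({'readings': list(Reading.objects.values('id', 'value'))})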
I looked into JSON serialization with natural keys, dependencies, etc. to control the fields that are serialized. I also tried using the wadofstuff serializers to allow for deeper foreign key serialization. However, I decided to use templates to render the JSON response.
Some pitfalls are
I am responsible for formatting the JSON (more prone to mistakes like a missing comma)
I am responsible for escaping characters
Renders slower than built-in serialization?
Some advantages are
I have control over what is serialized (even if the underlying model is changed)
I can format many to many or foreign key relationships on the JSON file however I like
TL;DR: In my case, the format of the JSON file I needed was very custom. It had a dictionary in a list in a dictionary. Some fields are iterative, so I have for loops in the templates to render them. However, the format requires some of the fields in the iterated object to be encapsulated in a list as opposed to a dictionary.
This was an obstacle I ran into while considering simplejson. By using something like
import simplejson as json

def encode_complex(obj):
    if isinstance(obj, complex):
        return [obj.real, obj.imag]
    raise TypeError(repr(obj) + " is not JSON serializable")

json.dumps(2 + 1j, default=encode_complex)  # returns '[2.0, 1.0]'
I could manage to return iterated data; however, I needed iteration within iteration, and custom object types (list or dict) to encapsulate certain iterations. Finally (be it lack of knowledge or lack of patience), I decided to just do it in the templates.
I feel like rendering it through templates is not the most scalable or 'smartest' way of doing it; could it possibly be done in a better way? Please feel free to prove me right or wrong.
I'm working with SWFUpload and Django, and I've noticed that authentication tends to break.
There is one part that is holding me up, and I'm looking for direction more than a solution, as I think the solution is not yet available. (So I'm making it.)
I need to know how Django creates the WSGI request object and how it's handled.
After looking at the Django source, it seems that CSRF handling is done via the WSGI request object, which has the appropriate cookies appended to it. Naturally, Flash POSTs do not support this unless specified. SWFUpload offers the ability to send cookie data in the POST params via a plugin, but I'd like to send the values via headers on the URLRequest object (so that the auth middleware and CSRF middleware can see them).
My goal is to upgrade SWFUpload to send headers containing the values for whatever objects I pass it. The hard part for me is figuring out how those headers will be interpreted.
How does Django create the request.META object? | Where is the request.session object created?
I'm reading up on the WSGI interface now, but I'd like to accelerate this research. Thanks!
I believe what you're looking for is django.core.handlers.wsgi.
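To orient yourself in that module: WSGIRequest builds request.META straight from the WSGI environ, so every HTTP header arrives as an HTTP_-prefixed key (an X-Session-Id header becomes request.META['HTTP_X_SESSION_ID']), and request.session is attached later by SessionMiddleware based on the session cookie. A sketch of an old-style middleware that copies such header values back to where the session/CSRF machinery expect them (the header names are invented, and it must be listed before SessionMiddleware and CsrfViewMiddleware):

from django.conf import settings


class HeaderAuthShimMiddleware(object):
    """Sketch: map custom Flash-friendly headers onto the cookies that the
    standard session/CSRF middleware read (old-style middleware API)."""

    def process_request(self, request):
        session_id = request.META.get("HTTP_X_SESSION_ID")
        if session_id:
            request.COOKIES.setdefault(settings.SESSION_COOKIE_NAME, session_id)
        csrf_token = request.META.get("HTTP_X_CSRFTOKEN")
        if csrf_token:
            request.COOKIES.setdefault("csrftoken", csrf_token)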
I've found myself unsatisfied with Django's ability to render JSON data. If I use the built-in serializers, then database foreign key relationships are not included in the data (only the keys). Also, it seems to be impossible to include custom data in the JSON feed that isn't part of the model being serialized.
As a test I implemented a template that rendered some JSON for the resultset of a particular model. I was able to include/exclude whatever parts of the model I wanted and was able to include custom data as well.
The test seemed to work well and wasn't slower than the recommended serialization methods.
Are there any pitfalls to using this method of serialization?
While it's hard to say definitively whether this method has any pitfalls, it's the method we use in production, as you control everything that is serialized, even if the underlying model is changed. We've been running a high-traffic application this way for almost two years.
Hope this helps.
One problem might be escaping metacharacters like the double quote ("). Django's template system automatically escapes dangerous characters, but it's set up to do that for HTML. You should look up exactly what the template escaping does, and compare that to what's dangerous in JSON. Otherwise, you could cause XSS problems.
You could think about constructing a data structure of dicts and lists, and then running a JSON serializer on that, rather than directly on your database model.
I don't understand why you see the choice as being either 'use Django serializers' or 'write JSON in templates'. The middle way, which to my mind is much more robust and fits your use case well, is to build up your data as Python lists/dictionaries and then simply use simplejson.dumps() to convert it to a JSON string.
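For example (model and field names are invented):

import simplejson


def posts_payload(posts):
    # Build plain dicts/lists in exactly the shape you want...
    return [
        {
            'id': post.id,
            'body': post.body,
            'author': {'id': post.author.id, 'name': post.author.name},
        }
        for post in posts
    ]

# ...then serialize once; quoting and escaping are handled for you.
# json_string = simplejson.dumps({'user_posts': posts_payload(filtered_posts)})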
We use this method to get custom JSON consumed by datatables.net.
It was the easiest method we found to accomplish this task, and it has worked fine with no problems so far.
You can find details here: http://datatables.net/development/server-side/django
So far, generating JSON from templates, we've run into the need to escape newlines. Looking at doing simplejson.dumps() next.