I'm getting data from a web service (a one-time, time-limited password is used for login).
Data only needs to be read, no updates.
I'm still looking for the best framework to put this in without making the small-medium site too heavy.
If I only fetch my data from the web service once and put it into several objects, would it make sense to store those objects in a cache and reuse them on other information pages?
Using MVC 2, would it be sensible to put the entire entity model in HttpRuntime.Cache?
(I guess session is out of the question..)
thanks,
nakori
You can use the EF caching provider, which will make your entities come from Velocity. That said, it's a lot easier to put stuff into a cache than to know when to expire it. Look into the CQRS approach for overall architecture.
I have been working with Ember Data and I'm trying to understand some concepts. I have quite a data-intensive app; my back end has endpoints that return a lot of records.
So, basically, I have routes that call something like this.store.findAll('places'), which can return thousands of places, each with several text-heavy fields like services or description.
This is only one of the resources; there are a few more that handle that amount of data as well.
My main concern is that the app will hit some kind of limit or become unresponsive. So my questions are: How does Ember Data manage a large number of records? Are there any best practices for handling these kinds of scenarios?
How does Ember Data manage a large number of records?
The same way it handles a small number of records. It's not going to do anything special for performance if you try to load or fetch a large number of records; you need to handle that yourself.
Are there any best practices for handling these kinds of scenarios?
Unfortunately, no. Pagination of some sort is really the only way to accomplish this. But as you can see in this thread, there's quite a bit of discussion about the "best" way to do it. There are adapters and plugins made to handle this scenario, as well as server-side boilerplate designed to make it easy. But there really is no canonical way of doing pagination with Ember Data.
In my opinion, the best way to handle large amounts of data is to design a query endpoint and implement it on your server, handling everything yourself. This will be the most tailored to your application and the easiest to understand. If it sounds complicated, that's because it is. Data set segmentation/pagination is not a simple problem to solve, you will definitely run into issues along the way. That's why there's no agreed-upon best practice yet.
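For illustration only, here is a minimal sketch of what such a server-side query endpoint might look like. It uses Python/Django purely as an example back end; the Place model, the page/page_size parameters, and the URL are all assumptions, since the question doesn't say what the server runs:

```python
# Sketch of a hypothetical paginated endpoint; everything here is illustrative.
from django.core.paginator import Paginator
from django.http import JsonResponse

from myapp.models import Place  # assumed model


def places(request):
    page_number = int(request.GET.get("page", 1))
    page_size = min(int(request.GET.get("page_size", 50)), 200)  # cap the page size

    queryset = Place.objects.order_by("id").values("id", "name", "services", "description")
    page = Paginator(queryset, page_size).get_page(page_number)

    return JsonResponse({
        "places": list(page.object_list),
        "meta": {"total": page.paginator.count, "page": page_number},
    })
```

On the Ember side you would then request one page at a time with something like this.store.query('place', { page: 2 }) instead of findAll, and read the total from the meta payload.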
Update: Javier Cadiz mentioned the JSON API in the comments, so I thought I would mention it. JSON API does seem to be the new de facto standard for Ember Data, and it does specify a pagination method. However, JSON API is fairly new and isn't widely adopted yet. I believe it wasn't until very recently that Ember Data switched to the JSON API adapter as its default. Using this pagination would most likely require you to conform to the entire API, not just the pagination aspect. (Although you can always steal certain ideas from it.) Because of that, I'm not sure I'd call it a best practice just yet.
Bottom line: the JSON API way of pagination may be the way of the future, but it's not currently very popular. (Although that's just my opinion based on what I see/read. There's no saying how many people are using it privately.)
We are developing an online school diary application using django. The prototype is ready and the project will go live next year with about 500 students.
Initially we used sqlite and hoped that for the initial implementation this would perform well enough.
The data tables are such that, to obtain the details of a school day (periods, classes, teachers, classrooms), many tables are used, and the database access takes 67 ms on a reasonably fast PC.
Most of the data is static once the year starts, with perhaps minor changes to classrooms. I thought of extracting the timetable for each student for each term day so that no table joins would be needed. I put this data into a text file for one student; the file is 100 KB in size. The time taken to read this data and process it for a day's timetable is about 8 ms. If I pre-load the data on login and store it in the session, it takes 7 ms at login and 2 ms for each query.
With 500 students, what would be the impact on the web server using this approach, and what other options are there (for example, putting the student text files into a sort of memory cache rather than the session)?
There will not be a great deal of data entry, students adding notes, teachers likewise, so it will mostly be checking the timetable status and looking to see what events exist for that day or week.
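For concreteness, a minimal sketch of the memory-cache idea mentioned above, using Django's low-level cache API; build_timetable and the key format are invented names. Unlike the session approach, the parsed data would be shared across requests and would not need to be rebuilt on every login:

```python
# Sketch only: build_timetable() and the cache key format are hypothetical.
from django.core.cache import cache

TIMETABLE_TTL = 60 * 60  # keep a parsed timetable for an hour; tune as needed


def get_timetable(student_id):
    """Return the pre-extracted timetable, re-parsing the text file only on a cache miss."""
    key = f"timetable:{student_id}"
    timetable = cache.get(key)
    if timetable is None:
        timetable = build_timetable(student_id)  # the ~8 ms parse step from above
        cache.set(key, timetable, TIMETABLE_TTL)
    return timetable
```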
What is your expected response time, and what is your expected number of requests per minute? A fifteenth of a second for the database access (which is likely to be the slow part) of a request doesn't sound like a problem to me. SQLite should perform fine in a read-mostly situation like this, so I'm not convinced you even have a performance problem.
If you want a faster response, you could consider:
First, ensuring that you have the best response time by checking your indexes and profiling individual retrievals to look for performance bottlenecks.
Pre-computing the static parts of the system and storing the HTML. You can put the HTML right back into the database or store it as disk files.
Using the database as a backing store only (to preserve state of the system when the server is down) and reading the entire thing into in-memory structures at system start-up. This eliminates disk access for the data, although it limits you to one physical server.
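As a rough illustration of that last option (the module, function, and field names are hypothetical), the data could be read once into module-level dictionaries when the process starts and queried from memory afterwards:

```python
# timetable_store.py - hypothetical in-memory backing store, loaded once at start-up.
from collections import defaultdict

_TIMETABLES = defaultdict(list)  # student_id -> list of period dicts


def load(rows):
    """Populate the in-memory structures once, e.g. from the ORM in an AppConfig.ready() hook."""
    for row in rows:
        _TIMETABLES[row["student_id"]].append(row)


def periods_for(student_id, day):
    # Pure in-memory lookup: no database or disk access per request.
    return [p for p in _TIMETABLES[student_id] if p["day"] == day]
```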
This sounds like premature optimization. 67 ms is scarcely longer than the ~50 ms threshold at which we humans can perceive a delay.
SQLite's representation of your data is going to be more efficient than a text format, and unlike a text file that you have to parse, the operating system can efficiently cache just the portions of your database that you're actually using in RAM.
You can lock down ~50MB of RAM to cache a parsed representation of the data for all the students, but you'll probably get better performance using that RAM for something else, like the OS disk cache.
I agree with some of the other answers which suggest using MySQL or PostgreSQL instead of SQLite. SQLite is not designed to be used as a production database. It is great for storing data for single-user applications such as mobile apps or even a desktop application, but it falls short very quickly in server applications. With Django it is trivial to switch to any other fully fledged database backend.
If you switch to one of those, you should not really have any performance issues, especially if you will do all the necessary joins using select_related and prefetch_related.
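For example (the model and field names are guesses, since the schema isn't shown), the joins for a day's timetable could be collapsed into a couple of queries roughly like this:

```python
# Hypothetical models: Period has foreign keys to Teacher and Classroom and an M2M to Student.
from myapp.models import Period

periods = (
    Period.objects
    .filter(students__id=student_id, day=day)
    .select_related("teacher", "classroom")   # follow the FK joins in the same query
    .prefetch_related("students")             # batch the many-to-many lookup
)
```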
If you still need more performance, considering that "most of the data is static", you might actually want to convert the Django site into a static site (a collection of HTML files) and then serve those using nginx or something similar. The simplest way I can think of doing that is to write a cron job which loops over all the needed URL configs, requests each page from Django, and saves it as an HTML file. If you want to go in that direction, you might also want to take a look at Python's static site generators: Hyde and Pelican.
This approach will certainly be much faster than any caching system; however, you will lose any dynamic components of the site. If you need them, then caching seems like the best and fastest solution.
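A minimal sketch of the cron-job idea described above; the base URL, page list, and output directory are made up, and the script simply requests each page from the running site and writes the response to disk for nginx to serve:

```python
#!/usr/bin/env python
"""Hypothetical snapshot script: fetch selected pages and save them as static HTML."""
import pathlib

import requests

BASE_URL = "http://localhost:8000"               # the running Django site
OUTPUT_DIR = pathlib.Path("/var/www/static_site")
PAGES = ["/timetable/", "/events/"]              # whichever URLs should be frozen

for path in PAGES:
    response = requests.get(BASE_URL + path, timeout=10)
    response.raise_for_status()
    target = OUTPUT_DIR / path.strip("/") / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(response.text)
```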
You should use MySQL or PostgreSQL for your production database; sqlite3 isn't a good idea.
You should also avoid pre-loading data on login. Since your records can be inserted in advance, write Django management commands and run the import into your chosen database beforehand, and design your models such that when a user logs in, they can already access and view/edit their related data (which was pre-inserted before the application even went live). Hard-coding data operations at login does not smell right at all from an application-design point of view.
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
The benefit of designing your Django models and using custom management commands to insert the records before your application goes live is that you can use the Django ORM to create the appropriate relationships between users and their records.
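A skeleton of such a command might look roughly like this (the app name, models, CSV format, and command name are all placeholders); it would live in myapp/management/commands/import_timetable.py and be run once before go-live with python manage.py import_timetable data.csv:

```python
import csv

from django.core.management.base import BaseCommand

from myapp.models import Student, Period  # hypothetical models


class Command(BaseCommand):
    help = "Pre-load timetable records before the site goes live."

    def add_arguments(self, parser):
        parser.add_argument("csv_path")

    def handle(self, *args, **options):
        with open(options["csv_path"], newline="") as fh:
            for row in csv.DictReader(fh):
                student, _ = Student.objects.get_or_create(username=row["username"])
                Period.objects.create(
                    student=student,
                    day=row["day"],
                    subject=row["subject"],
                    classroom=row["classroom"],
                )
        self.stdout.write(self.style.SUCCESS("Import finished."))
```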
I suspect, based on your description of what you need above, that you should rethink the approach you are taking with this application.
With 500 students, we shouldn't even be talking about caching. If you want response speed, you should deal with the following issues in order of priority:
Use a production quality database
Design your application use case correctly and design your application model right
Pre-load any data you need to the production database
Front-end optimization comes first (CSS/JS compression, etc.)
Use the Django Debug Toolbar to figure out if any of your SQL is slow and optimize those queries specifically
Implement caching (memcached, etc.) as needed
As a general guideline.
I hope the title is chosen well enough to ask this question.
Feel free to edit if not and please accept my apologies.
I am currently laying out an application that is interacting with the web.
Explanation of the basic flow of the program:
The user enters a UserID into my program, which is then used to access multiple XML files over the web:
http://example.org/user/userid/?xml=1
This file contains several IDs of products the user owns in a DRM system. This list is then used to access stats and information about the user's interaction with the product:
http://example.org/user/appid/stats/?xml=1
This also contains links to various images that are specific to that application; those may change at any time and need to be downloaded for display in the app.
This is where the horror starts, at least for me :D.
1.) How do I store that information on the PC of the user?
I thought about using a directory for the UserID, then subfolders with the AppID to cache the images and XML files and load them on demand. I also thought about using a zip file with the same structure.
Or would one rather use a local db like sqlite for that?
The average number of applications might be around 100-300, and the stats and images per app range from roughly 5 to 700.
2.) When should I refresh the content?
The bad thing is that the website this data is downloaded from, or rather the XML files, do not contain any timestamps saying when they were last refreshed or changed. So I would need to hash all the files and compare them at the moment the user accesses that data, which can take an arbitrarily long time because it is web-based. Okay, there are timeouts, but I would need to block access to the content until the data is either downloaded and processed or the timeout occurs. In both cases, the application would not be usable for a short or maybe even a long time, and I want to avoid that. I could let the user trigger the refresh manually when he needs it, but I was hoping there are better methods for that.
Especially with the above mentioned numbers of apps and stuff.
Thanks for reading all of that, and please feel free to ask if I forgot to explain something.
It's probably worth using a DB since it saves you messing around with file formats for structured data. Remember to delete and rebuild it from time to time (or make sure old stuff is thoroughly removed and compact it from time to time, but it's probably easier to start again, since it's just a cache).
If the web service gives you no clues about when to reload, then you'll just have to decide for yourself, but do be sure to check the HTTP headers for any caching instructions as well as the XML data[*]. Decide on a reasonable staleness for the data (the amount of time a user spends staring at the results is an absolute minimum, since they'll see results that stale no matter what you do). Whenever you download anything, record the date/time you downloaded it. Flush old data from the cache.
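As a rough sketch of that bookkeeping (the table layout and staleness window are invented), the local database suggested above could store the download time next to each document and only re-fetch once it is older than the chosen limit:

```python
import sqlite3
import time

import requests

MAX_AGE = 15 * 60  # chosen staleness window in seconds; tune to taste

db = sqlite3.connect("cache.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS documents (url TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
)


def get_document(url):
    row = db.execute(
        "SELECT body, fetched_at FROM documents WHERE url = ?", (url,)
    ).fetchone()
    if row and time.time() - row[1] < MAX_AGE:
        return row[0]  # fresh enough: serve from the cache

    body = requests.get(url, timeout=10).text  # re-download and record when we got it
    db.execute(
        "INSERT OR REPLACE INTO documents (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, time.time()),
    )
    db.commit()
    return body
```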
To prevent long delays refreshing data, you could:
visually indicate that the data is stale, but display it anyway and replace it once you've refreshed.
allow staler data when the user has a lot of stuff visible than when they're just looking at a small amount of stuff. That way you'll "do nothing" while waiting for a small amount of stuff, but not while waiting for a large amount of stuff.
run a background task that does nothing other than expiring old stuff out of the cache and reloading it. The main app always displays the best available, however old that is.
Or some combination of tactics.
[*] Come to think of it, if the web server is providing reasonable caching instructions, then it might be simplest to forget about any sort of storage or caching in your app. Just grab the XML files and display them, but grab them via a caching web proxy that you've integrated into your app. I don't know what proxies make this easy - you can compile Squid yourself (of course), but I don't know whether you can link it into another app without modifying it yourself.
Do you fire ajax requests through the MVC framework of choice, or directly to the CFC?
I'm leaning towards bypassing the MVC, since I need no 'View' from the ajax request.
What are the pros of routing Ajax calls through an MVC framework like ColdBox?
Update: I found this page http://ortus.svnrepository.com/coldbox/trac.cgi/wiki/cbAjaxHints but I am still trying to wrap my mind around what benefits it brings over the complexity it introduces...
Henry, I make my Ajax requests to proxy objects of my model. Typically, I am outside of a 'framework' when doing so. That being said, it may be (very) necessary to utilize your framework, such as working within a set security model.
I can't really see any benefit of bypassing the MVC framework - in combination, those three elements are your application.
Your ajax elements are really part of the view. As Luca says, the view outputs the results of the model and controller.
Look at it this way - if you made an iPhone-friendly web interface (that is, a new View), would you bypass the model and controller?
Luis Majano, the creator of ColdBox, said:
These are the two schools of Ajax interaction, Henry.
I prefer the proxy approach because it adds the following:
Debugging
Tracing in the debugger
AOP interception points
Security
Setting availability
The proxy will relay to the event model, so I can use local interception points, local AOP, plugins, etc. In other words, it can be a highly monitored call instead of a simple service CFC call, which you can still do.
I, for one, love to have my execution profiler running (part of the ColdBox debugger), so I can see when Ajax requests come in and when they come out. I can see the data requested and the data sent back. I don't have to look in log files or try to imagine results or problems. It really helps out in debugging.
However, it would be a developer choice as to which way you decide to go. My personal preference is to always use my proxy to event delegation because it gives me much more flexibility, debugging, and peace of mind.
The purpose of the "view" in MVC frameworks is to show the data after the "model" and "controller" have generated it. If you don't need the "view", then what's the point of using such a design pattern?
I agree with Luca. It also bypasses any kind of sanitization and filtering logic you have in your MVC stack. It basically negates any kind of query processing that you may or may not have in place.
Yeah, I wouldn't bypass your framework. Figure out what's causing you grief and hunt down the offending pieces, adding logic to exclude common components such as headers or footers, and looking for methods injecting whitespace that, while fine for HTML, is annoying or downright problematic when parsing JSON.
Adding output="false", especially in your Application.cfc and its methods, would be the first thing I cleaned up.
I am a strong believer in never accessing the CFCs directly. I find it creates long-term problems when a major refactor wants to consolidate or eliminate components; the direct accesses potentially make this harder than it should be, especially if a third party is hitting your Ajax from another domain (e.g. Flash remoting).
+1 to Steve's answer.
What are the best practices for dealing with binaries in a domain model? I frequently must associate images and other files with business objects, and a simple byte[] is not adequate even for the simplest of cases.
The files:
Do not have a fixed size and can be quite large, and thus:
Have to be streamed or buffered, preferably in an asynchronous manner;
Must be cached both on the server and the client to avoid redundant transfers;
On unreliable connections the data transfer can easily be interrupted and has to be resumed; therefore the transfer could start not from the beginning of the file but from an arbitrary position.
Are handled differently than the rest of the data:
In web applications they are not part of the page content but are downloaded by the browser separately;
Might be a black box that is handled by third-party software;
For performance reasons might not even be stored in the database.
How do we go about expressing such files in the domain model (or, more specifically, in model classes)? If the rest of the model is transferred via DTOs and WCF web services and persisted with NHibernate in the database, but the files not necessarily so, how do we make file handling transparent and part of the overall transaction where applicable, yet support everything necessary for the files to be consumed not only in web applications but also in ordinary desktop applications?
For WPF and ASP.NET the file object must expose some form of Url property that can be data-bound to WPF controls or used in IMG or HTML tags. Uploading a file is a lot more complicated. Preferably, proper presentation and content practices such as MVVM must be maintained there.
I am really lost here as I am not satisfied with any of my previous solutions. What would you advice?
You have to be careful not to try to shoehorn too much functionality into a single class here. Your wording sounds a bit like you want a single "File" object that will do everything; this is not a good idea.
You will need a concept of a File representation that can be passed around everywhere, as you have identified, but this needs to be little more than an identifier and possibly a name. It is then up to individual components to decide how they treat it; for example, the HTML page may use a File JSON object and infer that jsFile.Id needs to be retrieved using ftp://xxx/uploads/{id} or something, while in order to display additional associated information a WCF service might receive the file id and look up info in a database.
It probably makes sense to have a FileAttributesDTO class or some such just to distinguish it from when you are dealing with the physical file. You need to consider separation of concerns and nail down as many use cases as you can before you proceed. For example, will you really need additional information, or would a simple wrapper around an FTP service get you all you need?