Search result pagination, best practice - web-services

I have some results obtained through WS requests from a couple of different providers; I then gather and order the results and show them to the user.
The number of results is somewhere between 0 and 60-70, with an average of 10-20.
My problem is: how to handle pagination?
I'm trying to figure out which is the best solution for my situation, because I have found several ways to do this... and I am sure I am missing other good (probably better) solutions. The solutions I have thought of so far:
1) Making a new aggregated search through the web services for each page (15 results). This is wasteful, but since the average number of results is 10-20, the pagination won't be used often.
2) Saving all the results in the database as a temporary cache, then showing 15 results at a time.
3) Loading all the results in a single page but showing only 15 per page using a jQuery pagination plugin (client side).

It depends on how big one result is, but I'd prefer no pagination at all if you have at most 60-70 results, especially if larger sets are rare. Better user experience.

Are you really sure that someday the web services aren't going to start returning a lot more results? What if someday there is a bug in one of them where it accidentally returns 50,000 copies of the same result to you? In each of your solutions:
1) A larger than expected number of results would cause you to spam the web services with repeated requests for the same results, as users page through them.
2) A larger than expected number of results will end up temporarily taking up space in your database. Also, in a web app, how will you know when to clear the cache?
3) A larger than expected number of results will end up as a huge page in the user's browser, possibly not rendering correctly until the whole thing is downloaded.
I really like option 3. The caching is done at the place where the data is wanted, there are no redundant hits to the web services, and paging will be super fast for the users.
If you're really certain no more than 60-70 results will ever be returned, and/or that your users will never want a really large number of results, you could combine option 3 with a cap on the number of results you will return.
Even in the worst case where the web services return erroneous/unexpected results, you could trim it to the first so many, send them down to the browser, and paginate them there with JavaScript.
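For illustration, here is a minimal Python sketch of that server-side aggregate-and-cap step (the question doesn't name a stack, so the provider.search() call and the "score" field are invented for the example):

    MAX_RESULTS = 70  # hard cap, in case a provider ever misbehaves

    def aggregated_search(query, providers):
        """Query every provider, merge and rank the results, trim to the cap."""
        results = []
        for provider in providers:
            results.extend(provider.search(query))  # hypothetical provider API
        results.sort(key=lambda r: r["score"], reverse=True)  # assumed ranking field
        return results[:MAX_RESULTS]  # the browser paginates whatever we send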

Related

Django Debug Toolbar Target?

I've got a web page loading pretty slowly, so I installed the Django Debug Toolbar. I'm pretty new at this, so I'm trying to figure out what I can do with it.
I can see the database did 264 queries in 205 ms. That looks kind of high. I'm pretty sure I can cut down on that by adding some indexes and just writing better queries. But my question is: what is a "good" number I should be trying to hit here? What is generally accepted as "fast enough", beyond which further optimization isn't really worth it? 50 ms? 20 ms?
Also, the same page is showing 2500 ms of user CPU. That sounds terrible to me, and I'm surprised it's so much higher than the database time, since I assumed the database was the bottleneck. Is this maybe an indication that I'm trying to do too much in Python code instead of at the database layer? Would reducing the number of SQL queries help with CPU (waiting between queries)? Again, is there some well-known target response time I should be aiming for?
I'm looking for a snappy response from my clients. Right now when I click around I can feel a "pregnant pause" before the pages load.
By default, accessing related model fields results in one extra query per model per row. Look into select_related() and prefetch_related(); this usually cuts down the number of queries and speeds things up by a lot. I think the debug toolbar shows you the actual queries; if not, enable SQL logging before doing any query optimization. Once you have cut the number of queries down to a minimum (no extra queries per row), look for the slowest query and use the SQL EXPLAIN syntax to see whether indexes are being used - this is another area where things can get slow, especially with big data.
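For example, with a hypothetical Book model holding a foreign key to Author (a sketch, not your actual models), the difference looks like this:

    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=100)

    class Book(models.Model):
        title = models.CharField(max_length=200)
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    # N+1 pattern: one query for the books, plus one more per book for its author.
    for book in Book.objects.all():
        print(book.author.name)

    # select_related joins the author table in: one query in total.
    for book in Book.objects.select_related("author"):
        print(book.author.name)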
Usually the database is the bottleneck, unless you are doing some major looping in your code. If you believe the Python code is slow, then you need to profile it; otherwise it's just guessing.
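A minimal profiling sketch using Python's built-in cProfile (suspect_code is a stand-in for whatever part of the view you want to measure):

    import cProfile
    import pstats

    def suspect_code():
        ...  # the code you think is slow

    cProfile.run("suspect_code()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)  # top 10 calls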

Determine unique visitors to site

I'm creating a Django website with Apache2 as the server. I need a way to determine the number of unique visitors to my website (specifically, to each individual page) in a foolproof way. Unfortunately, users will have a high incentive to try to "game" the tracking system, so I'm trying to make it foolproof.
Is there any way of doing this?
Currently I'm trying to use IP addresses and cookies to determine unique visitors, but this system can be easily fooled with a headless browser.
Unless it's necessary that the data be integrated into your Django database, I'd strongly recommend "outsourcing" your traffic analysis to another provider. I'm very happy with Google Analytics.
Failing that, there's really little you can do to keep someone from gaming the system. You could limit based on IP address, but then you run into the problem that many unique visitors often share IPs (say, via a university, organization, or work site). Cookies are very easy to clear out, so if you go that route then it's very easy to game.
One thing that's harder to get rid of is files stored in the appcache, so one possible solution that would work on modern browsers is to store a file in the appcache. You'd count the first time it was loaded as the unique visit, and after that, since it's cached, the visitor doesn't get counted again.
Of course, since you presumably need this to be backwards compatible, it leaves you open to exactly the sorts of tools most likely to be used for gaming the system, such as curl.
You can certainly block non-browser-like user agents, which makes it slightly more difficult for gamers who don't know about spoofing user-agent strings (though most will quickly learn).
Really, the best solution might be to ask: what is the desired outcome of a visit to a page? If it is, for example, selling a product, then don't reward the people who have the most page views; reward the people whose hits generate the most sales. Or whatever other time-consuming action someone might take on the page.
Possible solution:
If you're willing to ignore people with JavaScript disabled, you could choose to count only people who access the page and then stay on it for a given window of time (say, one minute), firing an Ajax request back to the server after that period. If someone tried to game this by changing their cookie and loading multiple tabs at once, it wouldn't work, because they'd need the same cookie in order to register that they'd been on the page long enough. I actually think this might work; I honestly can't see a way to game it. On the server side, you store a dictionary called stay_until in request.session, with a key for each unique page, and after a minute or so the page runs an Ajax call back to the server. If the value of stay_until[page_id] is less than or equal to the current time, they count as an active visitor; otherwise they don't. This means it would take someone at least 20 minutes to generate 20 unique visits, and as long as you make the payoff worth less than the time consumed, that is a strong disincentive.
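A rough Python/Django sketch of the server side of that idea (the view names, URL wiring, and the visit-recording step are hypothetical placeholders):

    import time
    from django.http import JsonResponse

    MIN_STAY = 60  # seconds a visitor must stay before they count

    def page_view(request, page_id):
        # On page load, record the earliest moment this visit may be counted.
        stay_until = request.session.setdefault("stay_until", {})
        stay_until[page_id] = time.time() + MIN_STAY
        request.session.modified = True
        ...  # render and return the page as usual

    def count_visit(request, page_id):
        # Hit by the Ajax call the page fires roughly a minute after loading.
        deadline = request.session.get("stay_until", {}).get(page_id)
        if deadline is not None and time.time() >= deadline:
            ...  # record one unique visit for page_id here
            return JsonResponse({"counted": True})
        return JsonResponse({"counted": False})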
I'd even make it more explicit: at the bottom of the page, in a noscript tag, put "Your access was not counted. Turn on JavaScript to be counted", linking to a page that explains the tracking process.
As HTTP requests are stateless and you have no control over the user's behavior on the client side, there is no bulletproof way.
The only way you're going to be able to track "unique" visitors in a foolproof way is to make it contingent on some controlled factor, such as a login. Anything else can and will fail to be completely accurate.

Windows Phone 7 - Best Practices for Speeding up Data Fetch

I have a Windows Phone 7 app that (currently) calls an OData service to get data and throws the data into a listbox. It is horribly slow right now. The first thing I can think of is that OData returns way more data than I actually need.
What are some suggestions/best practices for speeding up the fetching of data in a Windows Phone 7 app? Is there anything I could be doing in the app to speed up the retrieval of data and put it in front of the user faster?
Sounds like you've already got some clues about what to chase.
Some basic things I'd try are:
Make your HTTP requests as small as possible - if possible, only fetch the entities and fields you absolutely need.
Consider using multiple HTTP requests to fetch the data incrementally instead of fetching everything in one go (this can, of course, actually make the app slower, but generally makes the app feel faster)
For large text transfers, make sure that the content is compressed (gzipped) for transfer (this should happen at the HTTP level)
Be careful that the XAML rendering the data isn't too bloated - a large XAML structure repeated in a list can cause slowness.
When optimising, never assume you know where the speed problem is - always measure first!
Be careful when inserting images into a list - the MS MarketPlace app often seems to stutter on my phone - and I think this is caused by the image fetch and render process.
In addition to Stuart's great list, also consider the format of the data that's sent.
Check out this blog post by Rob Tiffany. It discusses performance based on data formats. It was written specifically with WCF in mind but the points still apply.
As an extension to Stuart's list:
In fact there are three areas - communication, parsing, and UI. Measure them separately:
Do just the communication, with the processing switched off.
Measure the parsing of a fixed OData-formatted string on its own.
Believe it or not, the UI can also be the culprit.
For example, bad usage of a ProgressBar can result in a dramatic decrease in processing speed. (In general you should not use any UI animations, as explained here.)
Also, make sure that the UI processing does not block the data communication.

How to convert concurrent users into hits per second?

The SRS for the system I'm currently working on includes the following non-functional requirement: "the SuD shall be scalable to 200 concurrent users". How can I convert this statement into a more measurable characteristic: "hits per second"?
Assuming you're talking about a web application (based on your desire to estimate "hits" per second), you have to work on a number of assumptions.
- How long will a user spend between interactions? For typical content pages, that might be 10 seconds; for interactive web apps, perhaps only 5 seconds.
- Divide the number of users by the "think time" to get hits per second - 200 concurrent users with a think time of 10 seconds gives you 20 page requests per second on average.
- Then multiply by a "peak multiplier" - most web sites are relatively quiet during the night but really busy around 7 PM, so your average number needs to take account of that. Typically, I recommend a peak multiplier of between 4 and 10.
This gives you peak page requests per second - usually the limiting factor for web applications (though by no means always - streaming video is often constrained by bandwidth, for instance).
If you really want to know "hits", you then need to work through the following:
- How many assets are on your page? Images, stylesheets, JavaScript files, etc. - a "hit" typically refers to any kind of request, not just the HTML page (or ASPX or PHP or whatever). Most modern web apps include dozens of assets.
- How cacheable are your pages and/or assets? Most images, CSS, and JS files should be set as cacheable by the browser.
Multiply the page requests by the number of non-cacheable assets. Add to this the number of first-time visitors multiplied by the total number of assets (their caches are cold) if you want to be super precise.
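As a back-of-the-envelope illustration in Python, using the numbers from above (the asset count is purely an assumption):

    concurrent_users = 200  # from the SRS
    think_time       = 10   # seconds between interactions (content-page assumption)
    peak_multiplier  = 4    # low end of the suggested 4-10x range
    uncacheable      = 3    # non-cacheable assets per page (pure assumption)

    avg_pages  = concurrent_users / think_time    # 20 page requests/second
    peak_pages = avg_pages * peak_multiplier      # 80 page requests/second
    peak_hits  = peak_pages * (1 + uncacheable)   # 320 hits/second at peak
    print(avg_pages, peak_pages, peak_hits)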
All of this usually means you have to make lots and lots of assumptions - so the final number is an indicator at best. For scalability measurements, I usually spend more time trying to understand the bottlenecks in the system and observing the system under load.
Well, that's impossible to answer without knowing anything about your app or what it does. You need to figure out how many hits per second one user is likely to make when using the app, and multiply by 200.
Incidentally, hits per second is not the only metric you need to be concerned with. With 200 concurrent users, how much memory overhead will there be? How much disk access, and how many open file handles? How many DB reads/writes? How much bandwidth (does the app involve streaming media)? Can it all be handled by one machine? Etc.

Instant search considerations

I've started working on a basic instant search tool.
This is a workflow draft.
User presses a key
The current value gets passed to the function, which will make an Ajax call to a web service.
The web service will run a select on a database through LINQ-to-SQL and retrieve a list of values that match my value. I will achieve this using the SQL LIKE clause.
The web service will return the data to the function.
The function will populate the relevant controls through jQuery.
I have the following concerns/considerations:
Problem: fast typists. I have typed this sentence within a few seconds. This means that on each key press I will send a request to the database, and I may have 10 people doing the same thing. The server may return a list of 5 records, or it may return a list of 1000 records. Also, I can hold down a key and send a few hundred requests to the database - this can potentially slow the whole system down.
Possible solutions:
A timer, so that a request is sent to the database at most once every 2-4 seconds
Do not return any data unless the value is at least 3 characters long
Return a limited number of rows?
Problem: I'm not sure whether LINQ-to-SQL will cope with the potential load.
Solution: I can use stored procedures, but are there any other feasible alternatives?
I'm interested to hear if anybody else is working on a similar project and what things you have considered before implementing it.
Thank you
When to call the web service
You should only call the web service when the user is interested in suggestions. The user will only type fast if he knows what to type. So while he's typing fast, you don't have to provide suggestions to the user.
When a fast typist pauses for a short time, then he's probably interested in search suggestions. That's when you call the web service to retrieve suggestions.
Slow typists will always benefit from search suggestions, because they can save time typing the query. In this case you will always have short pauses between keystrokes. Again, these short pauses are your cue to retrieve suggestions from the web service.
You can use the setTimeout function to call your web service 500 milliseconds after the user has pressed a key. If the user presses another key, you reset the timeout using clearTimeout. This results in a call to the web service only when the user has been idle for half a second.
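In the browser this is plain setTimeout/clearTimeout; here is the same reset-the-timer ("debounce") pattern, sketched in Python with threading.Timer for illustration:

    import threading

    def debounce(wait):
        """Delay calls to fn until `wait` seconds pass without a new call;
        each new call cancels and restarts the timer (like clearTimeout)."""
        def decorator(fn):
            timer = None
            def debounced(*args, **kwargs):
                nonlocal timer
                if timer is not None:
                    timer.cancel()  # key pressed again: reset the timeout
                timer = threading.Timer(wait, fn, args, kwargs)
                timer.start()
            return debounced
        return decorator

    @debounce(0.5)
    def fetch_suggestions(term):
        print("querying the web service for:", term)  # stand-in for the Ajax call

    # Simulated fast typing: only the final call actually fires.
    for prefix in ("d", "dj", "dja", "djan", "django"):
        fetch_suggestions(prefix)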
Performance of LINQ-to-SQL
If your query isn't too complex, LINQ-to-SQL will probably perform just fine.
To improve performance, you can limit the number of suggestions to about twenty. Most users aren't interested in thousands of suggestions anyway.
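You're on LINQ-to-SQL, but the idea is stack-neutral. Here is a minimal Python sketch (using sqlite3 and a made-up products table) that combines the minimum-length and row-limit safeguards mentioned in the question:

    import sqlite3

    MIN_CHARS = 3   # skip queries for very short prefixes
    MAX_ROWS  = 20  # cap the suggestion list

    def suggestions(conn, term):
        """Return up to MAX_ROWS name suggestions for term."""
        if len(term) < MIN_CHARS:
            return []
        pattern = term.replace("%", "").replace("_", "") + "%"  # prefix match
        rows = conn.execute(
            "SELECT name FROM products WHERE name LIKE ? LIMIT ?",
            (pattern, MAX_ROWS))
        return [name for (name,) in rows]

    # Tiny demo against an in-memory table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany("INSERT INTO products VALUES (?)",
                     [("mouse",), ("mouse pad",), ("monitor",)])
    print(suggestions(conn, "mo"))   # [] - below the length threshold
    print(suggestions(conn, "mou"))  # ['mouse', 'mouse pad']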
Consider using a full-text catalog instead of the LIKE clause if you are searching through blocks of text for specific keywords. Besides being much faster, it can be configured to recognize multiple forms of the same word (like mouse and mice, or leaf and leaves).
To really make your search shine, you can correct many common misspellings by using the Levenshtein distance to compare the search term against a list of similar terms when no matches are found.
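A compact sketch of the Levenshtein distance and one way to use it, with an invented term list (when a search comes back empty, suggest the closest known term):

    def levenshtein(a, b):
        """Minimum number of single-character edits turning a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # delete ca
                                curr[j - 1] + 1,            # insert cb
                                prev[j - 1] + (ca != cb)))  # substitute
            prev = curr
        return prev[-1]

    # When a search returns nothing, suggest the closest known term.
    known_terms = ["django", "jquery", "pagination"]
    query = "jqeury"
    print(min(known_terms, key=lambda t: levenshtein(query, t)))  # jquery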