Faster twitter ID stream - python-2.7

My project is to download an extremely big number of IDs from Twitter.
It is also known that the average user has a small number of followers (100-200).
I use the Twython package for this streaming, and here is the main part of my program:
    import time

    next_cursor = -1  # -1 asks for the first page
    while next_cursor:
        # each call returns up to 5000 follower IDs plus the next cursor
        follower_id = twitter.get_followers_ids(user_id=ids, cursor=next_cursor)
        time.sleep(60)  # stay under the rate limit
        next_cursor = follower_id['next_cursor']
This is really simple code, and it works, but it is really slow for a big number of IDs, because the rate limit of get_followers_ids() is 5000 IDs per minute; that's why the time.sleep call is in the code.
My question: is there any possibility of speeding this code up?
Perhaps so that the program does not pause after each query, only when it really needs to.
Could somebody help with this?

Twitter provides rate-limit info in the headers sent with every API response. So you could check that and call at the maximum rate allowed. You can also request your rate-limit status from Twitter via a specific rate-limit API call, and it doesn't reduce your rate limit to check. I don't use Twython myself, so I can't advise on how to do this within Twython.
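For illustration, here is a sketch of the header-checking approach; it assumes Twython's get_lastfunction_header helper (check your client's docs for the equivalent), and sleeps only when the current window is actually exhausted:

    import time

    def fetch_all_follower_ids(twitter, user_id):
        all_ids, cursor = [], -1
        while cursor:
            chunk = twitter.get_followers_ids(user_id=user_id, cursor=cursor)
            all_ids.extend(chunk['ids'])
            cursor = chunk['next_cursor']
            # x-rate-limit-remaining / x-rate-limit-reset come back on every response
            remaining = int(twitter.get_lastfunction_header('x-rate-limit-remaining'))
            if remaining == 0:
                reset = int(twitter.get_lastfunction_header('x-rate-limit-reset'))
                time.sleep(max(reset - time.time(), 0) + 1)  # wait out the window
        return all_ids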
It won't gain you much extra -- maybe a few percent.
Alternatively, it doesn't hurt to bump into the rate limit occasionally -- you'll just get an error message. As long as it isn't too frequent, Twitter won't mind.
There is no way around the basic rate-limit speed cap itself. Perhaps Gnip has a paid service that will let you download this data faster?


GCP Functions Node.js Huge Latency

We are running simple GCP Functions (pure, no Firebase or any other layer added) that just handle HTTP requests using the Node.js engine (previously version 8, now 10) and return a simple JSON response. What we see is that sometimes (not rarely) there is a huge latency between the request being "accepted by GCP" and it reaching our function code. And by huge I'm not talking milliseconds but whole seconds! It is not a cold start either (we have separate log messages in the global scope, so we know when a cold start occurs). The functions currently have 256 or 512 MB and run in a nearby region.
We log at the very first line of the GCP function, so the delay shows up as the gap between the request being accepted and that first log entry.
Does anyone else experience this? And is it normal that this delay sometimes takes up to 5s (or rarely even more)?
By the way, sometimes the same thing happens on the output side as well, so if we are unlucky it can take up to 10s in total. Thanks in advance for any reply, whether or not you have had a similar experience.
All such problems I have seen turned out to be related to cold starts, or it was not possible to prove that they were not.
This question may even be too broad for Stack Overflow. We have no chance of reproducing it without at least example functions and the number of executions; however, I will try to answer.
It seems that the latency analysis is done mainly on logs. I think you should try the "Trace" functionality that is available in GCP (direct link and documentation). This should give you the data you need to track down the issue.
For example, I used it on a helloworld Cloud Function that I curled from a bash script. Over a few hundred invocations there was one execution with latency 10 times greater than usual.
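If you want to reproduce that kind of measurement yourself, a minimal Python harness along these lines works (the URL is a placeholder for your own function):

    import statistics
    import time

    import requests

    URL = "https://REGION-PROJECT.cloudfunctions.net/helloworld"  # placeholder

    def measure(n=300):
        # time n sequential invocations and flag the outliers
        samples = []
        for _ in range(n):
            start = time.monotonic()
            requests.get(URL, timeout=30)
            samples.append(time.monotonic() - start)
        median = statistics.median(samples)
        print("median %.3fs, max %.3fs" % (median, max(samples)))
        print("outliers >10x median:", [round(s, 3) for s in samples if s > 10 * median])

    if __name__ == "__main__":
        measure()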
I hope this helps somehow :)

pytest-timeit or pytest-benchmark, which one is better in terms of accuracy?

I am testing the response status codes and data of a flask-restful API in pytest. Now I would like to test the time these endpoints are taking. I am considering pytest-timeit and the pytest-benchmark plugin; does anyone know which is more accurate?
pytest-benchmark is the clear choice.
It runs the benchmarked function many times and provides a detailed statistical summary of its running time.
You could obviously do that manually with timeit, but it is not worth the effort given how feature-rich and well documented pytest-benchmark is.
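For example, a minimal sketch using pytest-benchmark's benchmark fixture against a Flask test client (myapp, create_app and the route are placeholders for your own app):

    # test_api_speed.py
    import pytest
    from myapp import create_app  # hypothetical application factory

    @pytest.fixture
    def client():
        return create_app().test_client()

    def test_status_endpoint_speed(client, benchmark):
        # benchmark() calls the function repeatedly and records min/mean/stddev
        response = benchmark(client.get, "/api/status")
        assert response.status_code == 200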
Read more in the pytest-benchmark documentation.

What is the maximum time a web application (or website) should respond to a request?

I'm aware that a web application should render its pages as fast as possible, with only a few database requests taking milliseconds each. What are the guidelines for this response time (like the Microsoft guidelines for UIs, or something similar)?
What is the absolute maximum time within which a webpage should respond?
Are there any "limits" or general guidelines for this?
When should I put jobs into task queues (like Python celery for example)?
My concrete problem is that I have to parse a bunch of text files which users submit. The average time to parse them is 2-3 seconds (response times are 3-4 s with database inserts), but if a file is very big, it takes 8 s to parse (10 s to respond).
Is it okay to leave the user without feedback for this long? If not, what is the best way to handle these kinds of situations?
Is it even okay to put these in the request-response cycle?
Is there any difference if I provide a REST API vs. a website form? Are those "allowed" to respond more slowly?
I think this is really hard to answer, as different guidelines exist.
When I was at university, in interface/interaction design courses, I learned that no user should be left waiting more than 50 ms without a response.
If that is exceeded, something like a loading icon should be displayed.
Users are also conditioned to expect different loading times from different websites, so a user will accept 2 seconds of loading time on a ticket-booking page but not more than 300 ms from a search engine.
The limits I hear about these days are 0.1 s, 1 s and 10 s.
0.1 s feels instantaneous to the user on websites.
1 s is slow but does not interrupt the user's flow of thought.
10 s is the maximum a user will endure before losing attention (lighting a cigarette, checking the Facebook feed in the meantime, etc.).
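For the file-parsing case in the question, work that can take around 10 seconds is exactly what a task queue is for: accept the upload, respond immediately, and parse in the background. A minimal Celery sketch (the broker URL and all names are placeholders):

    from celery import Celery

    app = Celery("parser", broker="redis://localhost:6379/0")  # placeholder broker

    @app.task
    def parse_file(path):
        # the heavy parsing and DB inserts run here, outside the request cycle
        with open(path) as f:
            return sum(1 for _ in f)  # stand-in for the real parser

    # in the web view: enqueue and return immediately with a job id to poll
    # job = parse_file.delay(saved_path)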
There is a nice article, along with a lot of useful comments, which I read lately and would like to point you to:
http://www.nngroup.com/articles/response-times-3-important-limits/
I think it answers your questions well.
Please understand that this is all purely subjective, but then again, I think this is a very subjective topic...

Porting REST API connection code to Android NDK in c++

I have an Android application on the market which connects to a REST API, sends POST and GET requests, and stores the results in a DB, from which they are then queried and displayed appropriately in the application.
I'm interested in speeding up the application and have noticed quite a lot of lag between receiving the data back from the API and the data being ready to use. I'd like to investigate whether and how I can write similar code in C++ using the NDK to connect to the REST API, process the results, and store them in a DB or raise an error. I have no previous C++ experience, so I need to know firstly whether I can access the same DB from the C++ as from the Java, and secondly whether there are any other caveats I should be aware of.
Also, I guess I should ask: is it worth doing this? Will I notice any difference?
Any links to similar code, or an overview of where I should look to get started in C++, would be greatly appreciated.
I'm doing the EXACT same thing, and trust me: if you have no previous C++ experience, this may cost too much for too little benefit.
In my case, after some profiling, I reordered a few things and got an initial jump in performance just by dropping DOM parsing and using SAX. Everything else only made things marginally better, like processing the response while packets were still being transmitted (i.e. not waiting for the full response before starting to process) and multiplexing requests on the same thread instead of starting a new thread for each one.
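To illustrate the DOM-vs-SAX point (sketched in Python for brevity; an NDK port would use a C++ streaming parser such as Expat), the key idea is feeding chunks to the parser as they arrive instead of buffering the whole response:

    import xml.sax

    class ItemCounter(xml.sax.ContentHandler):
        def __init__(self):
            self.count = 0

        def startElement(self, name, attrs):
            if name == "item":
                self.count += 1

    parser = xml.sax.make_parser()
    handler = ItemCounter()
    parser.setContentHandler(handler)
    for chunk in (b"<feed><item/>", b"<item/></feed>"):  # stand-ins for network packets
        parser.feed(chunk)
    parser.close()
    print(handler.count)  # -> 2, without ever holding the full document in memory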
What you should be searching for is POSIX sockets and HTTP/REST client code, if you wish to do it all by hand. A better option might be to use libcurl or something similar for the socket/HTTP part. I did it all myself, but only because I had already done this a few times before.

Instant search considerations

I've started working on a basic instant search tool.
This is a workflow draft.
User presses a key
Current value gets passed to the function, which will make an Ajax call to a web service.
Web service will run a select on the database through LINQ-to-SQL and retrieve a list of values that match the typed value. I will achieve this by using the SQL LIKE clause.
Web service will return data to the function.
Function will populate relative controls through jQuery.
I have the following concerns/considerations:
Problem: fast typists. I typed this sentence within a few seconds, which means that on each key press a request would be sent to the database. With 10 people doing the same thing, and the server returning anywhere from 5 to 1000 records per query, the load adds up. Also, holding down a key sends a few hundred requests to the database; this can potentially slow the whole system down.
Possible solutions:
A timer, so that a request is sent to the database only once every 2-4 seconds.
Do not return any data unless the value is at least 3 characters long.
Return a limited number of rows?
Problem: I'm not sure whether LINQ-to-SQL will cope with the potential load.
Solution: I can use stored procedures, but are there any other feasible alternatives?
I'm interested to hear if anybody else is working on a similar project and what things you have considered before implementing it.
Thank you
When to call the web service
You should only call the web service when the user is interested in suggestions. The user will only type fast if he knows what to type. So while he's typing fast, you don't have to provide suggestions to the user.
When a fast typist pauses for a short time, then he's probably interested in search suggestions. That's when you call the web service to retrieve suggestions.
Slow typists will always benefit from search suggestions, because they can save time typing in the query. In this case there will always be short pauses between the keystrokes. Again, these short pauses are your cue to retrieve suggestions from the web service.
You can use the setTimeout function to call your web service 500 milliseconds after the user has pressed a key. If the user presses a key, you can reset the timeout using clearTimeout. This will result in a call to the web service only when the user is idle for half a second.
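A sketch of that debounce pattern (shown here in Python; in the browser you would use setTimeout/clearTimeout exactly as described): each keystroke cancels the pending call and schedules a new one half a second out.

    import threading

    class Debouncer:
        """Run fn only after `delay` seconds with no new trigger."""
        def __init__(self, delay, fn):
            self.delay, self.fn = delay, fn
            self._timer = None

        def trigger(self, *args):
            if self._timer is not None:
                self._timer.cancel()  # a new keystroke resets the countdown
            self._timer = threading.Timer(self.delay, self.fn, args)
            self._timer.start()

    fetch = Debouncer(0.5, lambda q: print("call web service for:", q))
    for prefix in ("c", "ca", "cat"):  # simulated fast keystrokes
        fetch.trigger(prefix)
    # only one request fires, for "cat", about 500 ms after the last keystroke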
Performance of LINQ-to-SQL
If your query isn't too complex, LINQ-to-SQL will probably perform just fine.
To improve performance, you can limit the number of suggestions to about twenty. Most users aren't interested in thousands of suggestions anyway.
Consider using a full-text catalog instead of the LIKE clause if you are searching through blocks of text for specific keywords. Besides being much faster, it can be configured to recognize multiple forms of the same word (like mouse and mice, or leaf and leaves).
To really make your search shine, you can correct many common misspellings by using the Levenshtein distance to compare the search term to a list of similar terms when no matches are found.
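For illustration, a minimal Levenshtein distance in Python, used to pick the closest known term when a search comes back empty (a sketch, not tuned for production):

    def levenshtein(a, b):
        # classic dynamic-programming edit distance, one row at a time
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def closest(term, vocabulary):
        return min(vocabulary, key=lambda w: levenshtein(term, w))

    print(closest("micee", ["mouse", "mice", "leaf", "leaves"]))  # -> "mice"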