I am planning to write my scraper with V and i need to send estimatedly ~2500 request per second but can't figure out what am i doing wrong, it should be sending concurrently but it is deadly slow right now. Feels like i'm doing something really wrong but i can't figure it out.
import net.http
import sync
import time
fn send_request(mut wg sync.WaitGroup) ?string {
start := time.ticks()
data := http.get('https://google.com')?
finish := time.ticks()
println('Finish getting time ${finish - start} ms')
wg.done()
return data.text
}
fn main() {
mut wg := sync.new_waitgroup()
for i := 0; i < 50; i++ {
wg.add(1)
go send_request(mut wg)
}
wg.wait()
}
Output:
...
Finish getting time 2157 ms
Finish getting time 2173 ms
Finish getting time 2174 ms
Finish getting time 2200 ms
Finish getting time 2225 ms
Finish getting time 2380 ms
Finish getting time 2678 ms
Finish getting time 2770 ms
V Version: 0.1.29
System: Ubuntu 20.04
You're not doing anything wrong. I'm getting similar results in multiple languages in multiple ways. Many sites have rate limiting software that prevent repeated reads like this, that's what you're running up against.
You could try using channels now that they're in, but you'll still run up against the rate limiter.
Best way to send that many get requests it too use what is called a Head request, it relies on status code rather than a response since it doesn't return any. Which is what makes the http requests faster.
Related
As we know, the advantage of BigQuery Storage Write API, one month ago, we replace insertAll with managedwriter API on our server. It seems to work well for one month, however, we met the following errors recently
rpc error: code = Unavailable desc = closing transport due to: connection error:
desc = "error reading from server: EOF", received prior goaway: code: NO_ERROR,
debug data: "server_shutting_down"
The version of managedwriter API are:
cloud.google.com/go/bigquery v1.25.0
google.golang.org/protobuf v1.27.1
There is a piece of retrying logic for storage write API that detects error messages on our server-side. We notice the response time of storage write API becomes longer after retrying, as a result, OOM is happening on our server. We also tried to increase the request timeout to 30 seconds, and most of those requests could not be completed within it.
How to handle the error server_shutting_down correctly?
Update 02/08/2022
The default stream of managedwrite API is used in our server. And server_shutting_down error comes up periodically. And this issue happened on 02/04/2022 12:00 PM UTC and the default stream of managedwrite API works well for over one month.
Here is one wrapper function of appendRow and we log the cost time of this function.
func (cl *GBOutput) appendRows(ctx context.Context,datas [][]byte, schema *gbSchema) error {
var result *managedwriter.AppendResult
var err error
if cl.schema != schema {
cl.schema = schema
result, err = cl.managedStream.AppendRows(ctx, datas, managedwriter.UpdateSchemaDescriptor(schema.descriptorProto))
} else {
result, err = cl.managedStream.AppendRows(ctx, datas)
}
if err != nil {
return err
}
_, err = result.GetResult(ctx)
return err
}
When the error server_shutting_down comes up, the cost time of this function could be several hundred seconds. It is so weird, and it seems to there is no way to handle the timeout of appendRow.
Are you using the "raw" v1 storage API, or the managedwriter? I ask because managedwriter should handle stream reconnection automatically. Are you simply observing connection closes periodically, or something about your retry traffic induces the closes?
The interesting question is how to deal with in-flight appends for which you haven't yet received an acknowledgement back (or the ack ended in failure). If you're using offsets, you should be able to re-send the append without risk of duplication.
Per the GCP support guy,
The issue is hit once 10MB has been sent over the connection, regardless of how long it takes or how much is inflight at that time. The BigQuery Engineering team has identified the root cause and the fix would be rolled out by Friday, Feb 11th, 2022.
I am using phpseclib to conect with cisco switch.
$ssh = new SSH2('169.254.170.30',22);
$ssh->login('lemi', 'a');
Sometimes it connects very fast few times in a row (when i reload a page), but sometimes i get error message "Maximum execution time of 60 seconds exceeded" also few times in a row. I am doing that in laravel. I do not want to prolong execution time. I can't figure out where is the problem. I think that it has some problems with sockets... I am getting error on this line (3031) of code in SSH2.php :
$raw = stream_get_contents($this->fsock, $this->decrypt_block_size);
Any sugestions?
I've been searching the web for this for two days and I found nothing. Maybe I'm looking in the wrong way — I don't know...
So here it is: what are the times on my console when running a Karma+Jasmine+phantomJs unit test?
... Executed 1 of 1 SUCCESS (0.878 secs / 0.112 secs)
First, I though that the second time is the total unit test time (for example, when running multiple tasks), however, sometimes the first time gets to be 'bigger', sometimes not...
Anyone?
total time / net time
net time = only test execution (in the browser)
total time = how long it took since Karma noticed the file change till the final result was printed (net time + communication with the browser + loading the files in the browser)
See karma/lib/reporters/base.js
Doing simple GET requests over a high speed internet connection (within an AWS EC2 instance), the throughput seems to be really low. I've tried this with multiple servers.
Here's the code that I'm using:
HTTP2Client http2Client = new HTTP2Client();
http2Client.start();
SslContextFactory ssl = new SslContextFactory(true);
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
client.start();
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
System.out.println("i" + i);
ContentResponse response = client.GET("https://http2.golang.org/");
System.out.println(response.getStatus());
}
System.out.println(System.currentTimeMillis() - start);
The throughput I get is about 8 requests per second.
This seems to be pretty low (as compared to curl on the command line).
Is there anything that I'm missing? Any way to turbo-charge this?
EDIT: How do I get Jetty to use multiple streams?
That is not the right way to measure throughput.
Depending where you are geographically, you will be dominated by the latency between your client and the server. If the latency is 125 ms, you can only make 8 requests/s.
For example, from my location, the ping to http2.golang.org is 136 ms.
Even if your latency is less, there are good chances that the server is throttling you: I don't think that http2.golang.org will be happy to see you making 10k requests in a tight loop.
I'll be curious to know what is the curl or nghttp latency in the same test, but I guess won't be much different (or probably worse if they close the connection after each request).
This test in the Jetty test suite, which is not a proper benchmark either, shows around 500 requests/s on my laptop; without TLS, goes to around 2500 requests/s.
I don't know exactly what you're trying to do, but your test does not tell you anything about the performance of Jetty's HttpClient.
I wrote two equal projects in Golang+Martini and Play Framework 2.2.x to compare it's performance. Both have 1 action that render 10K HTML View. Tested it with ab -n 10000 -c 1000 and monitored results via ab output and htop. Both uses production confs and compiled views. I wonder about results:
Play: ~17000 req/sec + constant 100% usage of all cores of my i7 = ~0.059 msec/req
Martini: ~4000 req/sec + constant 70% usage of all cores of my i7 = ~0.25 msec/req
...as I understand martini is not bloated, so why it 4.5 times slower? Any way to speedup?
Update: Added benchmark results
Golang + Martini:
./wrk -c1000 -t10 -d10 http://localhost:9875/
Running 10s test # http://localhost:9875/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 241.70ms 164.61ms 1.16s 71.06%
Req/Sec 393.42 75.79 716.00 83.26%
38554 requests in 10.00s, 91.33MB read
Socket errors: connect 0, read 0, write 0, timeout 108
Requests/sec: 3854.79
Transfer/sec: 9.13MB
Play!Framework 2:
./wrk -c1000 -t10 -d10 http://localhost:9000/
Running 10s test # http://localhost:9000/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 32.99ms 37.75ms 965.76ms 85.95%
Req/Sec 2.91k 657.65 7.61k 76.64%
276501 requests in 10.00s, 1.39GB read
Socket errors: connect 0, read 0, write 0, timeout 230
Requests/sec: 27645.91
Transfer/sec: 142.14MB
Martini running with runtime.GOMAXPROCS(runtime.NumCPU())
I want to use golang in production, but after this benchmark I don't know how can I make such decision...
Any way to speedup?
I made a simple martini app with rendering 1 html file and it's quite fast:
✗ wrk -t10 -c1000 -d10s http://localhost:3000/
Running 10s test # http://localhost:3000/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 31.94ms 38.80ms 109.56ms 83.25%
Req/Sec 3.97k 3.63k 11.03k 49.28%
235155 requests in 10.01s, 31.17MB read
Socket errors: connect 774, read 0, write 0, timeout 3496
Requests/sec: 23497.82
Transfer/sec: 3.11MB
Macbook pro i7.
This is app https://gist.github.com/zishe/9947025
It helps if you show your code, perhaps you didn't disable logs or missed something.
But timeout 3496 seems bad.
#Kr0e, right! I figured out that heavy use of reflection in DI of martini makes it perform slowly.
I moved to gorilla mux and wrote some martini-style helpers and got a wanted performance.
#Cory LaNou: I can't accept yours comment) Now I agree with you, no framework in prod is good idea. Thanks
#user3353963: See my question: Both uses production confs and compiled views
add this code:
func init(){
martini.Env = martini.Prod
}
sorry, your code have done.