Large(r) files with camel-ws - Memory - web-services

We use Camel with a web service endpoint starting a route. This has worked well so far. Now I tested a somewhat 'larger' file of just 15 MB and the route runs into an OutOfMemoryError. The heap size is 1280 MB (-Xms128m -Xmx1280m).
Is there a limit on the memory a route may use? I am surprised that a 15 MB file cannot be received when Tomcat has 1280 MB of memory available.
Did I miss some configuration?
Sorry for answering late, I had some time off.
The route looks like this; I don't think it really helps ;-):
this.from(this.createWsEndpoint(...))
    .routeId("some id")
    .process(new AuthentificationProcessor(...))
    .process(this.getAddMetadataProcessor())
    .process(this.getArchiveToDbProcessor())
    .to(this.getProducerJpaEndpoint())
Anyway, FTP seems to be the way forward for this case.
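For anyone hitting the same limit: the payload of a SOAP call is typically materialized in memory, often in more than one copy, while the exchange passes through the route, so a 15 MB upload can easily cost a multiple of that on the heap. If the endpoint hands the body to the route as a stream, turning on Camel's stream caching with a disk spool keeps large bodies off the heap. A minimal sketch, assuming Camel 2.12+ where StreamCachingStrategy is available; the spool threshold and directory are just example values:

import org.apache.camel.CamelContext;
import org.apache.camel.impl.DefaultCamelContext;

public class StreamCachingConfig {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();

        // Enable stream caching so stream-based bodies can be read more than once.
        context.setStreamCaching(true);

        // Spool cached bodies larger than 1 MB to disk instead of keeping them in memory
        // (threshold and directory are example values, not taken from the question).
        context.getStreamCachingStrategy().setSpoolThreshold(1024 * 1024);
        context.getStreamCachingStrategy().setSpoolDirectory("/tmp/camel/spool");

        // Add the route from above and start the context as usual.
        context.start();
    }
}

Whether this helps depends on how the web service endpoint hands the payload to the route; if it already builds the whole message as a byte array or DOM tree, the memory use has to be tackled at the endpoint (for example by enabling attachment streaming/MTOM) rather than in the route.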

Related

Is there a maximum concurrency for AWS S3 multipart uploads?

Referring to the docs, you can specify the number of concurrent connections when pushing large files to Amazon Web Services S3 using the multipart uploader. While it does say the concurrency defaults to 5, it does not specify a maximum, or whether the size of each chunk is derived from the total file size / concurrency.
I trawled the source code and the comment is pretty much the same as the docs:
Set the concurrency level to use when uploading parts. This affects
how many parts are uploaded in parallel. You must use a local file as
your data source when using a concurrency greater than 1
So my working build looks like this (the variables are defined, by the way; this is just condensed as an example):
use Aws\Common\Exception\MultipartUploadException;
use Aws\S3\Model\MultipartUpload\UploadBuilder;
$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($file)
    ->setBucket($bucket)
    ->setKey($file)
    ->setConcurrency(30)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();
It works great, except that a 200 MB file takes 9 minutes to upload... with 30 concurrent connections? That seemed suspicious to me, so I upped the concurrency to 100 and the upload time was 8.5 minutes. Such a small difference could just be the connection and not the code.
So my question is whether there is a concurrency maximum, what it is, and whether you can specify the size of the chunks or the chunk size is calculated automatically. My goal is to get a 500 MB file to transfer to AWS S3 within 5 minutes, so I have to optimize this if possible.
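As a rough sanity check on those numbers (plain arithmetic only, assuming the transfer is limited by upstream bandwidth rather than by the SDK):
200 MB in 9 minutes  ->  200 * 8 / 540  ≈  3 Mbit/s of sustained upstream throughput
500 MB in 5 minutes  ->  500 * 8 / 300  ≈  13.3 Mbit/s required
If the uplink cannot sustain roughly 13 Mbit/s, no concurrency setting will reach the 5-minute goal.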
Looking through the source code, it looks like 10,000 is the maximum number of concurrent connections. There is no automatic calculation of chunk size based on the number of concurrent connections, but you can set the chunk size yourself if needed.
I set the chunk size to 10 MB with 20 concurrent connections and it seems to work fine. On a real server, a 100 MB file transferred in 23 seconds, much better than the 3.5 to 4 minutes it was taking in the dev environments. Interesting, but those are the stats, should anyone else come across this same issue.
This is what my builder ended up being:
$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($file)
    ->setBucket($bucket)
    ->setKey($file)
    ->setConcurrency(20)
    ->setMinPartSize(10485760)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();
I may need to up that cache max-age, but so far this works acceptably. The key was moving the processing code to the server and not relying on my weak dev environments, no matter how powerful the machine or how good the internet connection.
We can halt all operations and abort the upload at any point during the transfer. We can also set the concurrency and the minimum part size.
$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource('/path/to/large/file.mov')
    ->setBucket('mybucket')
    ->setKey('my-object-key')
    ->setConcurrency(3)
    ->setMinPartSize(10485760)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();
try {
    $uploader->upload();
    echo "Upload complete.\n";
} catch (MultipartUploadException $e) {
    $uploader->abort();
    echo "Upload failed.\n";
}

Jetty's HTTP/2 client slow?

Doing simple GET requests over a high-speed internet connection (from within an AWS EC2 instance), the throughput seems to be really low. I've tried this with multiple servers.
Here's the code that I'm using:
HTTP2Client http2Client = new HTTP2Client();
http2Client.start();
SslContextFactory ssl = new SslContextFactory(true);
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
client.start();
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    System.out.println("i" + i);
    ContentResponse response = client.GET("https://http2.golang.org/");
    System.out.println(response.getStatus());
}
System.out.println(System.currentTimeMillis() - start);
The throughput I get is about 8 requests per second.
This seems to be pretty low (as compared to curl on the command line).
Is there anything that I'm missing? Any way to turbo-charge this?
EDIT: How do I get Jetty to use multiple streams?
That is not the right way to measure throughput.
Depending on where you are geographically, you will be dominated by the latency between your client and the server. Since the requests are made one after the other, if the latency is 125 ms you can only make 8 requests/s.
For example, from my location, the ping to http2.golang.org is 136 ms.
Even if your latency is lower, there is a good chance that the server is throttling you: I don't think http2.golang.org will be happy to see you making 10k requests in a tight loop.
I'd be curious to know what the curl or nghttp numbers are for the same test, but I guess they won't be much different (or probably worse, if they close the connection after each request).
This test in the Jetty test suite, which is not a proper benchmark either, shows around 500 requests/s on my laptop; without TLS, it goes to around 2500 requests/s.
I don't know exactly what you're trying to do, but your test does not tell you anything about the performance of Jetty's HttpClient.
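Regarding the EDIT about multiple streams: HTTP/2 multiplexing only pays off if several requests are actually in flight at the same time, and the loop above blocks on each GET before sending the next one. One way to keep several streams busy on the same connection is Jetty's asynchronous API. A rough sketch, assuming Jetty 9.4.x as in the question (the URL and the request count are only examples):

import java.util.concurrent.CountDownLatch;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Result;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class Http2ConcurrentGet {
    public static void main(String[] args) throws Exception {
        HTTP2Client http2Client = new HTTP2Client();
        SslContextFactory ssl = new SslContextFactory(true);
        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
        client.start();

        int requests = 100; // example count, far below the 10k of the original loop
        CountDownLatch latch = new CountDownLatch(requests);

        long start = System.currentTimeMillis();
        for (int i = 0; i < requests; i++) {
            // send(listener) is asynchronous, so these requests are multiplexed
            // as concurrent streams on the same HTTP/2 connection.
            client.newRequest("https://http2.golang.org/")
                    .send((Result result) -> {
                        System.out.println(result.getResponse().getStatus());
                        latch.countDown();
                    });
        }
        latch.await();
        System.out.println((System.currentTimeMillis() - start) + " ms");

        client.stop();
    }
}

How much this helps still depends on the server's limit on concurrent streams and on the throttling mentioned above, so treat it as a way to exercise multiple streams, not as a guaranteed speed-up.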

AWS - End of script output before headers: wsgi.py

I have a Django application that does some heavy computation. It works very well with smaller data, both on my machine and on AWS Elastic Beanstalk. But when the data becomes large, on AWS it gives an internal server error, and the logs show:
[core:error]End of script output before headers: wsgi.py
However, it works fine on my machine.
The code where it consistently gives this error is:
[my_big_lst[int(i[0][1])-1].appendleft((int(i[0][0]) - i[1])) for i in itertools.product(zipped_list,temp_list)]
where:
my_big_lst is a big list of deques
zipped_list is a large list of tuples
temp_list is a large list of numbers
Notably, the processing time also increases as the data grows. The problem only occurs on AWS when the data is large; on my machine it always works fine.
Update:
I worked out that this error happens when the processing time exceeds 60 seconds. I also changed the load balancer idle timeout to 3600, but it had no effect; the error is still there.
Can anyone please suggest a solution?
If you are using a C extension module, you could try setting
WSGIApplicationGroup %{GLOBAL}
in your virtualhost.
There is a known issue with Python sub-interpreters not working with C extension modules. However, since your code works for a smaller data set, your problem might instead be solved by setting memory-specific directives.
https://code.google.com/archive/p/modwsgi/wikis/ApplicationIssues.wiki#Python_Simplified_GIL_State_API

How to recover HDFS journal node?

I have configured 3 JournalNodes, let's say JN1, JN2, and JN3. Each of them saves the edit log under /tmp/hadoop/journalnode/mycluster...
Based on that, I started my NameNode, secondary NameNode, and a bunch of DataNodes. The system ran well until one day JN2 and JN3 died. Furthermore, their disks were corrupted.
Then I bought new disks and restarted JN2 and JN3. The bad thing is that they didn't work anymore.
They keep complaining:
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /tmp/hadoop/dfs/journalnode/mycluster not formatted
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:457)
at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:640)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:185)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Is there any way to recover JN2 and JN3 from the only living JournalNode, JN1?
Really appreciate all the possible solutions!
Thanks,
Miles
I was able to fix the issue by creating the missing directory on the JournalNode host where the NameNode writes its edit files.
Make sure the VERSION file is created, or copy the VERSION file into the directory; otherwise you will get org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException.
The issue went away after I copied the only existing /tmp/hadoop/journalnode/mycluster directory from JN1 to JN2 and JN3.

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 5 bytes)...in mysqli.php on line 483

I am trying to upgrade my Joomla website from 1.5 to 2.5 using jUpgrade, but I have been running into this problem for a long time. The jUpgrade component starts the upgrade, downloads and extracts the new Joomla, but while migrating the categories or the content it gets stuck and gives this error:
==========
[undefined] [undefined]
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 5 bytes) in /home/daneshna/public_html/up/libraries/joomla/database/database/mysqli.php on line 483
I tried allocating more memory in the server's php.ini file, even up to 1028M (128M was the default), but the problem persists and I can't get past it. I have tried everything I could find online, but it is still there.
Can anyone please help me out with this issue?
(P.S. My website is www.daneshnamah.com, a Persian educational website run from Afghanistan, which has worked perfectly for almost 4 years, with over 6000 articles and over 19M visitors so far. For security reasons I now want to upgrade it to 2.5.)
Thank You
Ebtihaj
Looks like something (you added recently?) is eating up massive amounts of memory.
Things like that happen often with PHP, which is why it shouldn't be used for more than hypertext.
Did you restart your web server after changing php.ini? Note that 33554432 bytes is only 32 MB, so the higher limit you set is apparently not being picked up.
To identify the memory-hungry parts of your website, you can use Xdebug to see where the memory is being eaten up.
If you recently added code or plugins, you can remove half of your new additions and see if it happens again; if it does, comment out half of what remains, and keep bisecting like this until you find the culprit.