Webpack 4 SplitChunks outputting different chunk files in production and development mode - webpack-4

Is it normal for splitChunks to output different files in Production and Development mode? Is it possible to make sure the number/name of outputted files is the same between modes?
This is my config:
splitChunks: {
  cacheGroups: {
    vendors: {
      test: /[\\/]node_modules[\\/]/i,
      chunks: "all",
      reuseExistingChunk: true
    }
  }
}

It's a feature of webpack's SplitChunksPlugin: by default, in production mode, any chunk smaller than 30kb is ignored.
You can enforce a consistent minimum chunk size by setting the optimization.splitChunks.minSize option to a value smaller than your smallest chunk in development mode, which ensures the same chunks are created in both development and production modes.
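For example, a minimal sketch of that change (the 10000-byte threshold is an illustrative value, not a recommendation; tune it to your smallest development-mode chunk):

// webpack.config.js (sketch)
module.exports = {
  // ...
  optimization: {
    splitChunks: {
      minSize: 10000, // same threshold in every mode, so the same chunks are emitted
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/i,
          chunks: "all",
          reuseExistingChunk: true
        }
      }
    }
  }
};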

Related

Leveldb limit testing - limit Memory used by a program

I'm currently benchmarking an application built on Leveldb. I want to configure it in such a way that the key-values are always read from disk and not from memory.
For that, I need to limit the memory consumed by the program.
I'm using 100,000 key-value pairs of 100 bytes each, which makes their total size about 10 MB. If I set the virtual memory limit to less than 10 MB using ulimit, I can't even run make.
1) How can I configure the application so that the key value pairs are always fetched from the disk?
2) What does ulimit -v mean? Does limiting the virtual memory translate to limiting the memory used by the program on RAM?
Perhaps there is no need to reduce the available memory; you can simply disable the read cache when iterating:
leveldb::ReadOptions options;
options.fill_cache = false;  // do not add blocks read by this iterator to the block cache
leveldb::Iterator* it = db->NewIterator(options);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // ... use it->key() and it->value() ...
}
delete it;

Is there a maximum concurrency for AWS s3 multipart uploads?

Referring to the docs, you can specify the number of concurrent connections when pushing large files to Amazon Web Services S3 using the multipart uploader. While it does say the concurrency defaults to 5, it does not specify a maximum, or whether the size of each chunk is derived from the total file size divided by the concurrency.
I trawled the source code and the comment is pretty much the same as the docs:
Set the concurrency level to use when uploading parts. This affects
how many parts are uploaded in parallel. You must use a local file as
your data source when using a concurrency greater than 1
So my functional build looks like this (the vars are defined, by the way; this is just condensed for the example):
use Aws\Common\Exception\MultipartUploadException;
use Aws\S3\Model\MultipartUpload\UploadBuilder;

$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($file)
    ->setBucket($bucket)
    ->setKey($file)
    ->setConcurrency(30)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();
Works great, except that a 200 MB file takes 9 minutes to upload... with 30 concurrent connections? That seemed suspicious, so I upped the concurrency to 100 and the upload time was 8.5 minutes. Such a small difference could just be the connection and not the code.
So my question is whether there's a concurrency maximum, what it is, and whether you can specify the size of the chunks or whether chunk size is calculated automatically. My goal is to get a 500 MB file to transfer to AWS S3 within 5 minutes, so I need to optimize that if possible.
Looking through the source code, it looks like 10,000 is the maximum number of concurrent connections. There is no automatic calculation of chunk sizes based on concurrent connections, but you can set those yourself if needed.
I set the chunk size to 10 MB with 20 concurrent connections and it seems to work fine. On a real server I got a 100 MB file to transfer in 23 seconds, much better than the 3.5 to 4 minutes it was getting in the dev environments. Interesting, but them's the stats, in case anyone else comes across this same issue.
This is what my builder ended up being:
$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource($file)
    ->setBucket($bucket)
    ->setKey($file)
    ->setConcurrency(20)
    ->setMinPartSize(10485760)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();
I may need to up that max cache setting, but as of yet this works acceptably. The key was moving the processing code to the server and not relying on my dev environments, no matter how powerful the machine or how good the internet connection is.
You can abort the upload at any point during the process, and you can also set the concurrency and the minimum part size:
$uploader = UploadBuilder::newInstance()
    ->setClient($client)
    ->setSource('/path/to/large/file.mov')
    ->setBucket('mybucket')
    ->setKey('my-object-key')
    ->setConcurrency(3)
    ->setMinPartSize(10485760)
    ->setOption('CacheControl', 'max-age=3600')
    ->build();

try {
    $uploader->upload();
    echo "Upload complete.\n";
} catch (MultipartUploadException $e) {
    $uploader->abort();
    echo "Upload failed.\n";
}

Aws - End of script output before headers: wsgi.py

I have a Django application that does some heavy computation. It works very well with smaller data, both on my machine and on AWS Elastic Beanstalk. But when the data becomes large, on AWS it gives an internal server error, and the logs show:
[core:error] End of script output before headers: wsgi.py
However, it works fine on my machine.
The code where it constantly gives this error is:
[my_big_lst[int(i[0][1])-1].appendleft((int(i[0][0]) - i[1])) for i in itertools.product(zipped_list,temp_list)]
where:
my_big_lst is a big list of deques
zipped_list is a large list of tuples
temp_list is a large list of numbers
Notably, as the data grows larger the processing time also increases, and this problem only occurs on AWS with large data; on my machine it always works fine.
Update:
I worked out that this error happens when the processing time exceeds 60 seconds. I also changed the load balancer idle timeout to 3600 seconds, but it had no effect; the error is still there.
Can anyone suggest a solution?
If you are using a C-extension module, you could try setting
WSGIApplicationGroup %{GLOBAL}
in your virtual host.
There is a known issue with Python sub-interpreters not working with C-extension modules. However, since your code works for a smaller data set, your problem might instead be solved by setting memory-specific directives.
https://code.google.com/archive/p/modwsgi/wikis/ApplicationIssues.wiki#Python_Simplified_GIL_State_API
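For reference, a minimal sketch of where that directive goes in an Apache/mod_wsgi virtual host (the server name, paths, and daemon process name below are placeholders, not taken from the question):

<VirtualHost *:80>
    ServerName example.com

    # Run the app in the main interpreter instead of a sub-interpreter,
    # which avoids the C-extension / Simplified GIL State API problem.
    WSGIApplicationGroup %{GLOBAL}

    # Placeholder daemon process and paths; adjust to your deployment.
    WSGIDaemonProcess myapp python-path=/opt/app
    WSGIProcessGroup myapp
    WSGIScriptAlias / /opt/app/myproject/wsgi.py
</VirtualHost>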

Symfony2 unit testing: PDOException: SQLSTATE[00000] [1040] Too many connections

When I run unit tests for my application, the first tests are successful, but at around 100 tests they start to fail due to a PDOException (Too many connections). I have already searched for this problem but was not able to solve it.
My config is as follows:
<phpunit
    backupGlobals = "false"
    backupStaticAttributes = "false"
    colors = "true"
    convertErrorsToExceptions = "true"
    convertNoticesToExceptions = "true"
    convertWarningsToExceptions = "true"
    processIsolation = "false"
    stopOnFailure = "false"
    syntaxCheck = "false"
    bootstrap = "bootstrap.php.cache" >
If I change processIsolation to "true", all tests generate an error (E):
Caused by ErrorException: unserialize(): Error at offset 0 of 79 bytes
To address that, I tried setting "detect_unicode = Off" in the php.ini file.
If I run tests in smaller batches, like with "--group something", all tests are successful.
Can someone help me solve the issue when running all the tests at once? I really want to get rid of the PDOException.
Thanks in advance!
You should increase the maximum number of concurrent connections in your DB server.
If you're using MySQL, edit /etc/mysql/my.cnf and set the max_connections parameter to the number of concurrent connections you need. Then restart the MySQL server.
Keep in mind: in theory the physical limits are very high, but if your queries cause high CPU load or memory consumption, your DB server could eat up the resources required for other processes. This means you could run out of memory, or your system could become overloaded.
For people who are having the same issue, here are more specific steps for configuring the my.cnf file.
If you are sure you are editing the right my.cnf, put max_connections = 500 (the default is 151) in the [mysqld] section. Don't put it in the [client] section.
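For example (assuming the default MySQL config location; the XAMPP path is covered just below):

# /etc/mysql/my.cnf
[mysqld]
# Default is 151; raise it under [mysqld], not under [client].
max_connections = 500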
To make sure you are editing the right my.cnf: if you have multiple mysqld installations, e.g. from Homebrew or XAMPP, find the right one. For XAMPP, run /Applications/XAMPP/xamppfiles/sbin/mysqld --verbose --help | grep -A 1 "Default options" and you will get something like this:
Default options are read from the following files in the given order:
/Applications/XAMPP/xamppfiles/etc/xampp/my.cnf /Applications/XAMPP/xamppfiles/etc/my.cnf ~/.my.cnf
Normally it's at /Applications/XAMPP/xamppfiles/etc/my.cnf.

Should we do nested goroutines?

I'm trying to build a parser for a large number of files, and I can't find information about what might possibly be called "nested goroutines" (maybe this is not the right name?).
Given a lot of files, each of them containing a lot of lines, should I do:
for file in folder:
    go do1

def do1:
    for line in file:
        go do2

def do2:
    do_something
Or should I use only "one level" of goroutines, and do the following:
for file in folder:
    for line in file:
        go do_something
My question is primarily about performance.
Thanks for reading this far!
If you go through with the architecture you've specified, you have a good chance of running out of CPU, memory, etc., because you're going to be creating an arbitrary number of workers. I suggest instead going with an architecture that allows you to throttle via channels. For example:
In your main process feed the files into a channel:
for _, file := range folder {
    fileChan <- file
}
then in another goroutine break the files into lines and feed those into a channel:
for {
    select {
    case file := <-fileChan:
        for _, line := range file {
            lineChan <- line
        }
    }
}
then in a 3rd goroutine pop out the lines and do what you will with them:
for {
    select {
    case line := <-lineChan:
        // process the line
    }
}
The main advantage of this is that you can create as many or as few goroutines as your system can handle, pass them all the same channels, and whichever goroutine gets to the channel first will handle the work, so you're able to throttle the amount of resources you're using.
Here is a working example: http://play.golang.org/p/-Qjd0sTtyP
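To make the pattern concrete, here is a self-contained sketch of the throttled version (the worker count, the in-memory "files", and processLine are placeholders, not part of the original answer):

package main

import (
    "fmt"
    "sync"
)

func processLine(line string) {
    fmt.Println("processed:", line) // stand-in for the real per-line work
}

func main() {
    // Placeholder input: file name -> lines.
    files := map[string][]string{
        "a.txt": {"line 1", "line 2"},
        "b.txt": {"line 3"},
    }

    lineChan := make(chan string)
    const workers = 4 // throttle: only this many goroutines process lines

    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for line := range lineChan {
                processLine(line)
            }
        }()
    }

    // Feed every line of every file into the channel; the fixed pool drains it.
    for name, lines := range files {
        for _, line := range lines {
            lineChan <- name + ": " + line
        }
    }
    close(lineChan)
    wg.Wait()
}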
The answer depends on how processor-intensive the operation on each line is.
If the line operation is short-lived, definitely don't bother to spawn a goroutine for each line.
If it's expensive (think ~5 secs or more), proceed with caution. You may run out of memory. As of Go 1.4, spawning a goroutine allocates a 2048-byte stack, so for 2 million lines you could allocate roughly 4 GB of RAM for the goroutine stacks alone. Consider whether it's worth allocating this memory.
In short, you will probably get the best results with the following setup:
for file in folder:
    go process_file(file)
If the number of files exceeds the number of CPUs, you're likely to have enough concurrency to mask the disk I/O latency involved in reading the files from disk.
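A minimal Go rendering of that setup, assuming the files are read line by line (the file names and the per-line work are placeholders):

package main

import (
    "bufio"
    "log"
    "os"
    "sync"
)

// processFile reads one file line by line; the per-line work stays in this
// goroutine instead of spawning another goroutine per line.
func processFile(name string) {
    f, err := os.Open(name)
    if err != nil {
        log.Println(err)
        return
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        _ = scanner.Text() // do_something with the line here
    }
}

func main() {
    files := []string{"a.txt", "b.txt", "c.txt"} // placeholder file names

    var wg sync.WaitGroup
    for _, name := range files {
        wg.Add(1)
        go func(name string) {
            defer wg.Done()
            processFile(name)
        }(name)
    }
    wg.Wait()
}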