What is the reason behind the following issue, and how do I increase the allowed input size in Lucee? I don't know what to do. Please help me.
I know how to increase it in ColdFusion: in ColdFusion we use Server Settings > Request Size Limits.
lucee.runtime.exp.NativeException: The input was too large. The specified input was 15,307 bytes and the maximum is 15,000 bytes.
Related
I am using AWS Comprehend for PII redaction; the idea is to detect entities and then redact the PII from the text.
The problem is that this API has an input text size limit. How can I increase the limit, maybe to 1 MB? Or is there another way to detect entities in large text?
ERROR: botocore.errorfactory.TextSizeLimitExceededException: An error occurred (TextSizeLimitExceededException) when calling the DetectPiiEntities operation: Input text size exceeds limit. Max length of request text allowed is 5000 bytes while in this request the text size is 7776 bytes
There's no way to increase this limit.
For input text larger than 5,000 bytes, you can split the text into multiple chunks of up to 5,000 bytes each, run detection on each chunk, and then aggregate the results.
Do keep some overlap between adjacent chunks, to carry over some context from the previous chunk.
For reference, you can use the similar solution published by the Comprehend team itself: https://github.com/aws-samples/amazon-comprehend-s3-object-lambda-functions/blob/main/src/processors.py#L172
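A minimal sketch of that chunk-and-aggregate approach using boto3 (this is not the Comprehend team's reference code; the chunk and overlap sizes are illustrative, and it assumes mostly single-byte characters so character counts approximate bytes):

```python
# Split text into chunks under the 5,000-byte DetectPiiEntities limit, keep a
# small overlap between chunks, and shift the returned offsets back into the
# coordinates of the full document.
import boto3

comprehend = boto3.client("comprehend")

CHUNK_CHARS = 4500   # stay safely below the 5,000-byte limit (illustrative)
OVERLAP_CHARS = 200  # context carried over from the previous chunk (illustrative)

def detect_pii_in_large_text(text, language_code="en"):
    entities = []
    start = 0
    while start < len(text):
        end = min(start + CHUNK_CHARS, len(text))
        chunk = text[start:end]
        response = comprehend.detect_pii_entities(
            Text=chunk, LanguageCode=language_code
        )
        for entity in response["Entities"]:
            entities.append({
                "Type": entity["Type"],
                "Score": entity["Score"],
                # offsets come back relative to the chunk, so shift them
                "BeginOffset": entity["BeginOffset"] + start,
                "EndOffset": entity["EndOffset"] + start,
            })
        if end == len(text):
            break
        # overlap so entities spanning a chunk boundary are still seen
        start = end - OVERLAP_CHARS
    return entities
```

Entities detected twice inside an overlap region can be de-duplicated afterwards by their shifted (BeginOffset, EndOffset, Type).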
What I'm trying to do is write data into HDFS in blocks of 128 MB. I've been trying several processors but can't find the right one, or I haven't identified the correct property.
This is pretty much what the flow looks like:
Right now I'm using PutParquet, but this processor doesn't have a property to do that.
The previous processor is a MergeContent, and this is its configuration:
On the SplitAvro I have the following configuration:
I hope someone can help; I'm really stuck trying to do this.
You shouldn't need the SplitAvro or ConvertAvroToJSON, if you use MergeRecord instead you can supply an AvroReader and JsonRecordSetWriter and it will do the conversion for you. If you know the approximate number of records that will fit in an HDFS block, you can set that as the Maximum Number of Entries and also the Max Group Size. Keep in mind those are soft limits though, so you might want to set it to something safer like 100MB.
When you tried with your flow from the description, what did you observe? Were the files still too big, or did it not seem to obey the min/max limits, etc.?
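For picking the record-count limit mentioned above, a rough back-of-the-envelope estimate like this can help (the 800-byte average record size is purely an assumption; measure it from your own data):

```python
# Estimate how many records fit in one HDFS block from an average serialized
# record size, aiming a bit below the block size since the limits are soft.
HDFS_BLOCK_BYTES = 128 * 1024 * 1024
TARGET_BYTES = 100 * 1024 * 1024   # safer soft target, as suggested above
AVG_RECORD_BYTES = 800             # assumption: replace with a measured value

max_records = TARGET_BYTES // AVG_RECORD_BYTES
print(f"Maximum Number of Entries ~= {max_records:,}")
print(f"Max Group Size ~= 100 MB (below the 128 MB HDFS block)")
```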
According to the official documentation: "A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB." (https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html)
But 25 put requests * 400KB per put request = 10MB. How then is the limit 16MB? Under what circumstances could the total ever exceed 10MB? Purely asking out of curiosity.
I have also had the same doubt. I searched for this a lot and found a decent explanation, which I am posting here (I don't know whether it is correct, but I hope it gives you some intuition):
The 16 MB limit applies to the request size, i.e. the raw data going over the network, which can be quite different from what is actually stored and metered as throughput. I was able to hit this 16 MB request-size cap with a BatchWriteItem containing 25 PutItems of around 224 kB.
Also, head over to this link; it might help.
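If it helps, here is a minimal boto3 sketch that batches PutRequests under both documented limits, 25 requests and 16 MB per call (the byte estimate is a rough JSON-serialization approximation of the wire size, not DynamoDB's exact request accounting):

```python
import json
import boto3

dynamodb = boto3.client("dynamodb")

MAX_REQUESTS_PER_BATCH = 25
MAX_REQUEST_BYTES = 16 * 1024 * 1024

def batch_put(table_name, items):
    """items: iterable of items already in DynamoDB attribute-value format."""
    batch, batch_bytes = [], 0
    for item in items:
        request = {"PutRequest": {"Item": item}}
        request_bytes = len(json.dumps(request).encode("utf-8"))
        # Flush the current batch if adding this request would break a limit.
        if batch and (len(batch) >= MAX_REQUESTS_PER_BATCH
                      or batch_bytes + request_bytes > MAX_REQUEST_BYTES):
            dynamodb.batch_write_item(RequestItems={table_name: batch})
            batch, batch_bytes = [], 0
        batch.append(request)
        batch_bytes += request_bytes
    if batch:
        dynamodb.batch_write_item(RequestItems={table_name: batch})
```

Production code should also retry any UnprocessedItems returned by batch_write_item.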
I have a couchdb with ~16,000 similar documents of about 500 bytes each. The stats for the db report (commas added):
"disk_size":73,134,193,"data_size":7,369,551
Why is the disk size 10x the data_size? I would expect, if anything, for the disk size to be smaller as I am using the default (snappy) compression and this data should be quite compressible.
I have no views on this DB, and each document has a single revision. Compaction has very little effect.
Here's the full output from hitting the DB URI:
{"db_name":"xxxx","doc_count":17193,"doc_del_count":2,"update_seq":17197,"purge_seq":0,"compact_running":false,"disk_size":78119025,"data_size":7871518,"instance_start_time":"1429132835572299","disk_format_version":6,"committed_update_seq":17197}
I think you are getting correct results. CouchDB stores documents in chunks of 4 KB each (I can't find a reference at the moment, but you can test it by storing an empty document). That is, the minimum on-disk size of a document is 4 KB.
This means that even if you store only 500 bytes of data per document, CouchDB is going to save it in chunks of 4 KB each. So, doing a rough calculation:
17193 * 4 * 1024 + (2 * 4 * 1024) = 70,430,720
That is in the range of the reported 78,119,025. It is still a little less, but the difference could be due to the way the files are stored on disk.
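A quick sanity check of that arithmetic (the 4 KB-per-document figure is my assumption above, not a documented CouchDB constant):

```python
# Estimated on-disk size under the 4 KB-per-document assumption.
doc_count = 17193
doc_del_count = 2
block_bytes = 4 * 1024

estimated_disk_size = (doc_count + doc_del_count) * block_bytes
print(estimated_disk_size)              # 70430720
print(78119025 - estimated_disk_size)   # ~7.7 MB still unaccounted for
```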
In my current genetic algorithm I'm iterating over a couple of rather large files. Right now I'm using boost::file_mapping to access this data.
I have 3 different test cases I can launch the program on (my computer has 8 GB RAM and runs Windows 8.1; for my different attempts at page file limits, read below):
a) 1000 files, about 4 MB each, so 4 GB total.
This case is a bit sluggish when first executed, but from the second iteration onwards the memory access is no longer the bottleneck, and the speed is entirely limited by my CPU.
b) 1000 files, about 6 MB each, so 6 GB total.
This is an entirely different scenario. The first iteration is proportionally slow, but even the following iterations do not speed up. I have considered loading 4 GB into memory and keeping 2 GB mapped; I'm not sure this would actually work, but it may be worth a test. Even if it worked, though, it would not help with case c).
c) 1000 files, about 13 MB each, so 13 GB total.
This is entirely hopeless. The first iteration is incredibly slow (which is understandable considering the amount of data), and even further iterations show no sign of speed improvement. Even a partial load into memory won't help much here.
Now, I have tried various settings for the page file limits:
1) Managed by Windows: the size of the page file stops at around 5-5.2 GB and never gets bigger. This obviously does not help with cases b) and c), and it actually causes the files to cycle through (it would be helpful if at least the first 4 GB stayed; as it is right now, basically nothing is reused from the page file).
2) Manual, min 1 GB, max 32 GB: the page file does not grow above 4.5 GB.
3) Manual, min 16 GB, max 32 GB: in case you haven't tried this yourself, don't. It makes booting almost impossible, and nothing runs smoothly anymore. I didn't test my program with this, as it was unacceptable.
So what I'm looking for is some way to tell Windows, when using page file settings 1) or 2), that I really want my program to use a very large page file, without my computer running entirely on the page file (as basically happens with 3)). Is there any way I could force this?
Or is there another way to load the data properly so that, at least from the second iteration onwards, access is quick? The data consist only of huge numbers of 64-bit integers that my algorithm bit-checks against (there are a bunch of formatting symbols in between every 200-300 ints), so I only need read access.
In case the info is needed: I'm using VS Pro 2013. Portability of the code isn't an issue; it only has to run on my notebook. And of course it is a 64-bit application, and my processor supports that ;)