DynamoDB 1MB limit on full item or only requested attributes?

Suppose I make a Query in which I use KeyConditionExpression and I only want to return a subset of all attributes with ProjectionExpression. Does the full item size count toward the 1MB limit, or only what I'm fetching?
There is a slight ambiguity I would like to clear up with respect to this question, because the answers there mention read capacity but not the size limit.

That is a good question, so I decided to test it.
It seems the projection doesn't change the page size, but it does help with performance. My objects have a large dummy data field, so much better response times are expected given the decrease in bytes that need to be transferred.
Here is my code:
// Without projection: scan the whole table
long start = System.currentTimeMillis();
double avgPageSize = 0;
for (var page : table.scan()) {
    avgPageSize = (avgPageSize + page.items().size()) / 2; // rough running estimate
}
System.out.println("Without Projection ms " + (System.currentTimeMillis() - start));
System.out.println("Average Page size " + avgPageSize);

// With projection: fetch only the PK attribute
start = System.currentTimeMillis();
avgPageSize = 0;
ScanEnhancedRequest scanEnhancedRequest = ScanEnhancedRequest.builder()
        .addAttributeToProject("PK")
        .build();
for (var page : table.scan(scanEnhancedRequest)) {
    avgPageSize = (avgPageSize + page.items().size()) / 2;
}
System.out.println("With Projection ms " + (System.currentTimeMillis() - start));
System.out.println("Average Page size " + avgPageSize);
And the results:
Without Projection ms 13862
Average Page size 3062.7655089609325
With Projection ms 2241
Average Page size 3062.7655089609325
So it seems the processing sequence is: query -> calculate capacity units -> filters -> pagination + projection (the last step is just an assumption on my part). This also matches the documented behavior: the 1MB limit and the consumed capacity are computed from the full items that are read, before any filter or projection is applied.

Related

How to set the frequency band of my array after fft

How can I set the frequency band for my array from KissFFT? The sampling frequency is 44100 Hz and I need to map frequencies onto my array realPartFFT. I have no idea how this works. I need to plot my spectrum chart to see whether it computes correctly. When I plot it now, the x axis still has only 513 numbers, without the corresponding frequencies.
int windowCount = 1024;
float floatArray[windowCount], realPartFFT[(windowCount / 2) + 1];
kiss_fftr_cfg cfg = kiss_fftr_alloc(windowCount, 0, NULL, NULL);
kiss_fft_cpx cpx[(windowCount / 2) + 1];
kiss_fftr(cfg, floatArray, cpx);
for (int i = 0; i < (windowCount / 2) + 1; ++i)
    realPartFFT[i] = sqrtf(powf(cpx[i].r, 2.0f) + powf(cpx[i].i, 2.0f));
First of all: KissFFT doesn't know anything about the source of the data. You pass it an array of real numbers of a given size N, and you get in return an array of complex values of size N/2+1. The input array may be the weather forecast for the next N hours or the number of sunspots over the past N days; KissFFT doesn't care.
The mapping back to the real world needs to be done by you, so you have to interpret the data. As for your code snippet, you are passing in 1024 floats (I assume floatArray contains the input data). You then get back an array of 513 (= 1024/2 + 1) pairs of floats.
If you are sampling at 44.1 kHz and pass KissFFT chunks of 1024 (your window size) samples, you will get 22.05 kHz as the highest frequency and about 43 Hz (44,100 / 1024) as the lowest non-zero frequency. You can get even lower by passing bigger chunks to KissFFT, but keep in mind that processing time will grow (as N log N for the FFT)!
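To make the mapping concrete, here is a minimal sketch (sampleRate is an assumed name for your 44100 Hz sampling frequency): output bin i corresponds to the frequency i * sampleRate / windowCount.
const float sampleRate = 44100.0f;
float freq[(windowCount / 2) + 1];
for (int i = 0; i <= windowCount / 2; ++i)
    freq[i] = (float) i * sampleRate / (float) windowCount; // bin spacing = fs / N
Plotting realPartFFT against freq gives the labelled x axis you are after.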
Btw: you may consider making your windowCount variable const, to allow the compiler to do some optimizations. Optimizations are very valuable when doing number crunching. In this case the effect may be negligible, but it's a good starting point.

Time complexity of algorithm with random component (Gillespie Algorithm)

I'm trying to find the time complexity of the Gillespie Algorithm.
The general algorithm can be found: Here
A more extended version: Here
The assumption is that the number of reactions and the number of proteins is constant. This might allow me to calculate the time complexity from the time variable alone.
But I get stuck, since the time increase in each iteration is based on a random value. Let me elaborate (non-relevant code removed):
So this is the general loop, each iteration the reactions are updated, then the currentTime is updated.
currentTime = 0.0;
while (currentTime < timeEnd)
{
    reactions->calcHazard();
    currentTime += this->getNextTime();
}
The function getNextTime calculates a new time.
double Gillespie::getNextTime()
{
    double randVal;
    randVal = ((double) rand() / (double) RAND_MAX);
    while (randVal == 0)
    {
        randVal = ((double) rand() / (double) RAND_MAX);
    }
    return (1.0 / reactions->getSum()) * log(1.0 / randVal);
}
The calculation of the new time step is based on a random value R. The other variable component here is the result of reactions->getSum(). The return value of this function is
sum(k * A(t))
where k and A(t) are both vectors: k holds the rate of each reaction and A(t) the number of proteins at time t.
A better explanation of the time increase is given on page 7 of the previous link.
Is it possible to say anything about the time complexity of this (iterating from tStart -> tEnd)? Or is this impossible without also including information about #proteins and #reactions?
It's O(n). You don't really need to calculate the expected return value of getNextTime(); it's enough to know that its return value doesn't systematically change as the simulation runs.
Let's assume your code iterates that loop N times for a 1 hour simulation.
It's pretty obvious that these are equivalent...
timeEnd = currentTime + 2hours;
while (currentTime < timeEnd) { ... } // ??? iterations
is equivalent to
timeMid = currentTime + 1hour;
timeEnd = currentTime + 2hours;
while (currentTime < timeMid) { ... } // N iterations
while (currentTime < timeEnd) { ... } // <=N iterations
So, it iterates the loop approx 2N times for a 2 hour simulation.
The assumption is that the number of reactions and the number of proteins is constant
That is a useful assumption. Basically, it means that getNextTime() will not systematically increase or decrease as the simulation runs. If the return value of getNextTime() decreased over the course of the simulation (meaning A(t) was increasing), then the second loop would take more iterations than the first one.
You can probably make that assumption anyway if the system hits equilibrium at some point (that's inevitable, right? I'm not a chemist), because then A(t) is constant; that's what equilibrium is.
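For completeness, the expected step size follows directly from getNextTime(). With R uniform on (0, 1], ln(1/R) is exponentially distributed with mean 1, so
E[tau] = E[ln(1/R)] / sum(k * A(t)) = 1 / sum(k * A(t))
With the rates held constant, covering a simulated interval of length T therefore takes about T * sum(k * A) iterations in expectation, i.e. a number of iterations linear in T.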

How to code the optimal page replacement algorithm?

I am sharing my logic; I need to know whether it is correct.
I created an array which stores the total number of occurrences of each page.
For example, if the sequence of page requests is {1, 2, 3, 1, 2} (let's call it the seq array),
then the occurrence counts are {2, 2, 1} (let's call it the count array).
Now I iterate through seq. While free frames remain, I assign each page a frame unless it is already in memory. I then push the page number and its remaining number of occurrences into a min priority queue.
for (int i = 1; i <= M; ++i)
{
    if (frameAssigned[seq[i]] != 0) // page already has a frame
    {
        count[seq[i]]--;
        PQ.push(ii(count[seq[i]], seq[i]));
        continue;
    }
    if (freeFrames >= 1) // a free frame is still available
    {
        frameAssigned[seq[i]] = presentFrame++; // presentFrame = 0 initially
        freeFrames--;
        noOfReplacements++;
        count[seq[i]]--;
        PQ.push(ii(count[seq[i]], seq[i]));
        continue;
    }
    // All frames are taken: replace the page with the fewest
    // remaining occurrences.
    ii temp = PQ.top(); // ii = pair<int,int>
    PQ.pop();
    int victimPage = temp.second;
    count[seq[i]]--;
    if (count[seq[i]] >= 0) PQ.push(ii(count[seq[i]], seq[i]));
    frameAssigned[seq[i]] = frameAssigned[victimPage];
    frameAssigned[victimPage] = 0;
    noOfReplacements++;
}
However, this algorithm seems to be incorrect. I don't understand why. I found the correct algorithm here, but I don't understand why mine doesn't work.
Let us look at the following sequence of page requests:
1,2,3,2,3,2,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Let us assume that 2 pages can be held in memory. According to your algorithm, when 3 arrives for the first time, 2 will be replaced, because the remaining number of occurrences of 1 is quite high while that of 2 is low; this is not optimal.
In the optimal page replacement algorithm, the criterion for replacement is the time after which a page will be referenced again: the page whose next reference lies farthest in the future is the one to replace.
I recommend going through the editorial of this problem, http://www.codechef.com/AUG14/problems/CLETAB, once it is out.
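For reference, here is a minimal sketch of the optimal (Belady) policy under that criterion, assuming the whole request sequence is known in advance. Names are illustrative, and the linear scan for each page's next use makes this O(n^2 * frames); it is written for clarity, not speed.
#include <vector>
#include <unordered_set>

int countFaults(const std::vector<int>& seq, std::size_t frames)
{
    std::unordered_set<int> memory;
    int faults = 0;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        if (memory.count(seq[i])) continue; // hit: nothing to do
        ++faults;
        if (memory.size() < frames) { memory.insert(seq[i]); continue; }
        // Evict the resident page whose next use is farthest in the
        // future (or that is never used again).
        int victim = -1;
        std::size_t farthest = 0;
        for (int page : memory)
        {
            std::size_t next = i + 1;
            while (next < seq.size() && seq[next] != page) ++next;
            if (next >= farthest) { farthest = next; victim = page; }
        }
        memory.erase(victim);
        memory.insert(seq[i]);
    }
    return faults;
}
On the sequence above with 2 frames, this evicts 1 when 3 first arrives, since 1 is the page referenced farthest in the future.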

Efficiently Building Summed Area Table

I am trying to construct a summed-area table for later use in an adaptive thresholding routine. Since this code is going to be used in time-critical software, I am trying to squeeze as many cycles out of it as possible.
For performance, the table stores an unsigned integer for every pixel.
When I attach my profiler, it shows that my largest performance bottleneck is the x-pass.
The simple math expression for the computation is:
sat_[y * width + x] = sat_[y * width + x - 1] + buff_[y * width + x]
where the running sum resets at every new y position.
In this case, sat_ is a 1-D pointer of unsigned integers representing the SAT, and buff_ is an 8-bit unsigned monochrome buffer.
My implementation looks like the following:
uint *pSat = sat_;
unsigned char *pBuff = buff_; // pixels are 8-bit unsigned; plain char may be signed
for (size_t y = 0; y < height; ++y, pSat += width, pBuff += width)
{
    uint curr = 0;
    for (uint x = 0; x < width; x += 4) // assumes width is a multiple of 4
    {
        pSat[x + 0] = curr += pBuff[x + 0];
        pSat[x + 1] = curr += pBuff[x + 1];
        pSat[x + 2] = curr += pBuff[x + 2];
        pSat[x + 3] = curr += pBuff[x + 3];
    }
}
The loop is unrolled manually because my compiler (VC11) didn't do it for me. The problem I have is that the entire segmentation routine spends an extraordinary amount of time just running through that loop, and I am wondering if anyone has any thoughts on what might speed it up. I have access to all of the SSE instruction sets, and to AVX, on any machine this routine will run on, so if something there helps, that would be extremely useful.
Also, once I have squeezed out the last cycles, I plan on extending this to multiple cores, but I want to get the single-threaded computation as tight as possible before making the model more complex.
You have a dependency chain running along each row; each result depends on the previous one. So you cannot vectorise/parallelise in that direction.
But, it sounds like each row is independent of all the others, so you can vectorise/parallelise by computing multiple rows simultaneously. You'd need to transpose your arrays, in order to allow the vector instructions to access neighbouring elements in memory.*
However, that creates a problem. Walking along rows would now be absolutely terrible from a cache point of view (every iteration would be a cache miss). The way to solve this is to interchange the loop order.
Note, though, that each element is read precisely once. And you're doing very little computation per element. So you'll basically be limited by main-memory bandwidth well before you hit 100% CPU usage.
* This restriction may be lifted in AVX2, I'm not sure...
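To make the transpose-plus-interchange idea concrete, here is a minimal sketch; satT and buffT are assumed names for the transposed (column-major) data, so element (x, y) lives at index x * height + y.
#include <cstddef>
#include <cstdint>

void horizontalPrefixSums(uint32_t *satT, const uint8_t *buffT,
                          std::size_t width, std::size_t height)
{
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            // The inner loop is contiguous in memory and carries no
            // dependency, so the compiler (or hand-written SSE/AVX)
            // can vectorize it; the running sum along x survives per row.
            satT[x * height + y] = buffT[x * height + y]
                + (x > 0 ? satT[(x - 1) * height + y] : 0u);
}
Each iteration of the inner loop now advances the running sum of a different row, which is exactly the cross-row parallelism described above.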
Algorithmically, I don't think there is anything you can do to optimize this further. Even though you didn't use the term in your description, you are basically building an OLAP cube, and the code you have is the standard approach to building one.
If you give details about the hardware you're working with, there might be some optimizations available. For example, there is a GPU programming approach that may or may not be faster. Note: another post on this thread mentioned that parallelization is not possible. That isn't necessarily true: your algorithm as written can't be run in parallel, but there are prefix-sum algorithms that preserve data-level parallelism, which a GPU approach could exploit.

Viola Jones AdaBoost running out of memory before it even starts

I'm implementing the Viola Jones algorithm for face detection. I'm having issues with the first stage of the AdaBoost learning part of the algorithm.
The original paper states
The weak classifier selection algorithm proceeds as follows. For each feature, the examples are sorted based on feature value.
I'm currently working with a relatively small training set of 2000 positive images and 1000 negative images; the paper describes data sets as large as 10,000.
The purpose of this AdaBoost stage is to select a few good features out of the 160,000+ possible features in a 24x24 window. The algorithm evaluates these features and selects the best ones.
The paper describes that for each feature, its value is calculated on each image, and the values are then sorted. This means I need a container for each feature in which to store the values of all the samples.
My problem is that my program runs out of memory after evaluating only 10,000 of the features (only 6% of them). The overall size of all the containers would end up being 160,000 * 3000 entries, which is in the billions of bytes. How am I supposed to implement this algorithm without running out of memory? I've increased the heap size, which got me from 3% to 6%, but I don't think increasing it much more will work.
The paper implies that these sorted values are needed throughout the algorithm, so I can't discard them after each feature.
Here's my code so far
public static List<WeakClassifier> train(List<Image> positiveSamples, List<Image> negativeSamples, List<Feature> allFeatures, int T) {
    List<WeakClassifier> solution = new LinkedList<WeakClassifier>();
    // Initialize weights for each sample, whether positive or negative
    float[] positiveWeights = new float[positiveSamples.size()];
    float[] negativeWeights = new float[negativeSamples.size()];
    float initialPositiveWeight = 0.5f / positiveWeights.length;
    float initialNegativeWeight = 0.5f / negativeWeights.length;
    for (int i = 0; i < positiveWeights.length; ++i) {
        positiveWeights[i] = initialPositiveWeight;
    }
    for (int i = 0; i < negativeWeights.length; ++i) {
        negativeWeights[i] = initialNegativeWeight;
    }
    // Each feature's value for each image
    List<List<FeatureValue>> featureValues = new LinkedList<List<FeatureValue>>();
    // For each feature, get the values for each image and sort them by value
    for (Feature feature : allFeatures) {
        List<FeatureValue> thisFeaturesValues = new LinkedList<FeatureValue>();
        int index = 0;
        for (Image positive : positiveSamples) {
            int value = positive.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, true));
            ++index;
        }
        index = 0;
        for (Image negative : negativeSamples) {
            int value = negative.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, false));
            ++index;
        }
        Collections.sort(thisFeaturesValues);
        // Add this feature's values to the list
        featureValues.add(thisFeaturesValues);
        ++currentFeature;
    }
... rest of code
This should be the pseudocode for the selection of one of the weak classifiers:
normalize the per-example weights // one float per example
for feature j from 1 to 45,396:
// Training a weak classifier based on feature j.
- Extract the feature's response from each training image (1 float per example)
// This threshold selection and error computation is where sorting the examples
// by feature response comes in.
- Choose a threshold to best separate the positive from negative examples
- Record the threshold and weighted error for this weak classifier
choose the best feature j and threshold (lowest error)
update the per-example weights
Nowhere do you need to store billions of feature values. Just extract the feature responses on the fly in each iteration. You're using integral images, so extraction is fast. The integral images are the main memory requirement, and that's not much: just one integer for every pixel in every image, basically the same amount of storage as your images required.
Even if you did just compute all the feature responses for all images and save them so you don't have to recompute them every iteration, that would still only take:
45396 * 3000 * 4 bytes =~ 520 MB, or if you're convinced there are 160000 possible features,
160000 * 3000 * 4 bytes =~ 1.78 GB, or if you use 10000 training images,
160000 * 10000 * 4 bytes =~ 5.96 GB
Basically, you shouldn't be running out of memory even if you do store all the feature values.
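To make that concrete, here is a minimal sketch of the per-iteration flow, written in C++ for brevity; Image, Feature, WeakClassifier, and chooseThreshold are hypothetical stand-ins for the question's types, not an actual API.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Feature { /* one rectangle feature */ };
struct Image { int applyFeature(const Feature &f) const; }; // integral-image lookup
struct Response { int value; std::size_t example; bool positive; };
struct WeakClassifier { Feature feature; int threshold = 0; float error = 1e9f; };

// Assumed helper: picks the best threshold for one feature from the
// responses (sorted by value) and the current per-example weights.
WeakClassifier chooseThreshold(const Feature &f, const std::vector<Response> &sorted,
                               const std::vector<float> &weights);

WeakClassifier selectWeakClassifier(const std::vector<Image> &examples,
                                    const std::vector<bool> &labels,
                                    const std::vector<Feature> &allFeatures,
                                    const std::vector<float> &weights)
{
    std::vector<Response> responses(examples.size()); // reused every iteration
    WeakClassifier best;
    for (const Feature &f : allFeatures)
    {
        for (std::size_t i = 0; i < examples.size(); ++i) // extract on the fly
            responses[i] = { examples[i].applyFeature(f), i, (bool) labels[i] };
        std::sort(responses.begin(), responses.end(),
                  [](const Response &a, const Response &b) { return a.value < b.value; });
        WeakClassifier c = chooseThreshold(f, responses, weights);
        if (c.error < best.error)
            best = c;
        // responses is overwritten on the next pass, so memory stays at
        // O(#examples) instead of O(#features * #examples).
    }
    return best;
}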