Poor performance with a large topn value for most_similar_approx - word2vec

I have an API that returns most_similar_approx results from a Magnitude model. The model is built from the native Word2Vec format with 50 dimensions and 50 trees. The Magnitude model is close to 350MB, with approximately 350,000 tokens.
While load testing this API, I observed that performance deteriorates as I increase the topn value for most_similar_approx. I need a high number of similar tokens for downstream activities:
with topn=150 I get a throughput of 500 transactions per second on the API,
while gradually reducing it I get 800 transactions per second with topn=50 and ~1300 with topn=10.
The server instance is not under any memory/CPU load; I am using a c5.xlarge AWS EC2 instance.
Is there any way I can tune the model to improve performance for a high topn value?
My aim is to obtain most_similar tokens from word embeddings, and pymagnitude was the most recommended option I found. Are there any similarly high-performing alternatives?
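For reference, this is roughly the call pattern I am using. pymagnitude also exposes an effort parameter on most_similar_approx (0.0 to 1.0) that appears to trade recall for speed, which is one knob I can tune. A minimal sketch, assuming a model file named vectors.magnitude and a placeholder token:

```python
# Minimal sketch of the call pattern; "vectors.magnitude" and "example" are placeholders.
from pymagnitude import Magnitude

vectors = Magnitude("vectors.magnitude")  # the ~350MB, 50-dimension model

# effort ranges from 0.0 to 1.0; lower values search fewer Annoy tree nodes,
# which should be faster at the cost of some recall.
results_accurate = vectors.most_similar_approx("example", topn=150, effort=1.0)
results_fast = vectors.most_similar_approx("example", topn=150, effort=0.3)
```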

Related

How does DynamoDB adaptive scaling rebalance partitions?

In the DynamoDB doc, it is written:
If your application drives disproportionately high traffic to one or
more items, adaptive capacity rebalances your partitions such that
frequently accessed items don't reside on the same partition.
My question is:
what exactly is meant by “rebalance”?
Are some items copied to a new partition and removed from the original one?
Does this process impact performance?
How long does it take?
Items are split across two new partitions. The split initiates when the database decides there's been enough sustained traffic in a spread pattern where a split would be beneficial, and then the split itself takes a few minutes. In testing with on-demand tables (where I created synthetic sustained traffic) I've seen the throughput double and then double again, repeating about every 15 minutes.

Is there a way to query more than 2,000 Intents in Dialogflow CX? I.e. Setting up parallel flows?

I'm creating an agent with more than 2,000 intents. Basically a FAQ bot that can answer thousands of questions.
In Dialogflow ES, there was the concept of MegaAgents and SubAgents. The maximum number of intents for a SubAgent was 2,000. Using a MegaAgent, I could put together up to 20 SubAgents (for a total of 20,000 intents). When the user queries the bot, each intent is weighted differently.
Now in Dialogflow CX, although the entire agent has a maximum of 10,000 intents, each flow has a 2,000-intent limit. I can't figure out a way to design the bot to have multiple "parallel" flows where equal weighting is given to each flow.
The only way I can see is to string together multiple flows in sequence and use a Fallback to transition from one flow to another. However, this doesn't put equal query weighting on my many thousands of intents.
Any suggestions?

BigQuery with BI Engine is slower than BigQuery with cache

I've read almost all the threads about how to improve BigQuery performance, to retrieve data in milliseconds or at least under a second.
I decided to use BI Engine for the purpose because it has seamless integration without code changes, it supports partitioning, smart offloading, real-time data, built-in compression, low latency, etc.
Unfortunately, for the same query I got a slower response time with BI Engine enabled than with just the query cache enabled.
BigQuery with cache hit
Average 691ms response time from BigQuery API
https://gist.github.com/bgizdov/b96c6c3d795f5f14e5e9a3e9d7091d85
BigQuery + BI Engine
Average 1605ms response time from BigQuery API.
finalExecutionDurationMs is about 200-300ms, but the total time to retrieve the data (just 8 rows) is 5-6 times more.
BigQuery UI: Elapsed 766ms, but the actual time for its call to the REST entity service is 1.50s, which explains why I get similar results.
https://gist.github.com/bgizdov/fcabcbce9f96cf7dc618298b2d69575d
I am using Quarkus with the BigQuery integration and measuring the query time with Guava's Stopwatch.
The table is about 350MB, the BI reservation is 1GB.
The returned rows are 8, aggregated from 300 rows. This is a very small data size with a simple query.
I know BigQuery does not perform especially well with small data sizes (or that the size hardly matters here), but I want to get the data in under a second; that's why I tried BI Engine, and this will not improve with bigger datasets.
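For context, here is a minimal latency-measurement sketch in Python using the google-cloud-bigquery client (my actual measurements come from the Quarkus/Java service with Guava's Stopwatch; the SQL and table name below are placeholders, not the real query):

```python
# Illustrative end-to-end timing with the google-cloud-bigquery client.
# The SQL and table name are placeholders, not the real query.
import time
from google.cloud import bigquery

client = bigquery.Client()
sql = "SELECT col, SUM(metric) AS total FROM `project.dataset.table` GROUP BY col"

for use_cache in (True, False):
    config = bigquery.QueryJobConfig(use_query_cache=use_cache)
    start = time.perf_counter()
    job = client.query(sql, job_config=config)
    rows = list(job.result())  # blocks until all (here, only a few) rows are returned
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"use_query_cache={use_cache} cache_hit={job.cache_hit} "
          f"rows={len(rows)} elapsed={elapsed_ms:.0f}ms")
```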
Could you please share the job ID?
BI Engine enables a number of optimizations, and for the vast majority of queries they allow significantly faster and more efficient processing.
However, there are corner cases when BI Engine optimizations are not as effective. One issue is initial loading of the data - we fetch data into RAM using optimal encoding, whereas BigQuery processes data directly. Subsequent queries should be faster. Another is - some operators are very easy to optimize to maximize CPU utilization (e.g. aggregations/filtering/compute), while others may be more tricky.

How to calculate the average user concurrency for the below task in a LoadRunner scenario? Can someone help me?

Total number of simultaneous users - 200,
Test duration - 2 hours
Load Profile:
Script 1: Browse Catalogue -> 10 steps > 2000 expected rate of business processes/hour > 100 users
Script 2: Search Product -> 6 steps > 1400 expected rate of business processes/hour > 60 users
Script 3: Buy Product -> 12 steps > 600 expected rate of business processes/hour > 40 users
With only this data, how do I find out the average user concurrency (per second)?
Concurrency is about collision in a given time frame. Simultaneity is about the same request at the same time.
Concurrency over the course of an hour is different from concurrency over a second. For each of your steps, it is also impossible to tell how many requests are made to the servers under test and what resources are in use. As an example, it is not uncommon for public-facing web pages to be made up of hundreds of individually requested elements.
Concurrency on which tier? Web? If web, then I can distort that heavily with a cache model, a cache appliance (Varnish), or a CDN. App server? If my requests each take under 100ms to be satisfied, I might never see a concurrency above a few in any given second. Database server? If queries are the same across users, then some of that might be distorted by caching of the results, either living in the query cache or in a front-end cache that takes the load off of the DB.
Run your test, report on it. That will be the easiest way.
You could refer to this resource describing Performance Test Workload Modeling; it could help you calculate the Percent Load Distribution.
Then you can measure the manual operation time of each transaction and the average time of each script. Once you have the Percent Load Distribution and the average times, you can calculate the minimum number of virtual users.
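As an illustration of that last calculation, here is a small sketch of the Little's Law arithmetic (average concurrency = arrival rate × average time in system), using the rates from the question and made-up average iteration times, since those are not given:

```python
# Little's Law sketch: average concurrency = arrival rate * average time in system.
# Rates come from the load profile above; the per-iteration times are placeholders.
scripts = {
    # name:             (business processes per hour, assumed avg seconds per iteration)
    "Browse Catalogue": (2000, 90),
    "Search Product":   (1400, 45),
    "Buy Product":      (600, 120),
}

total_concurrency = 0.0
for name, (rate_per_hour, avg_seconds) in scripts.items():
    arrival_rate = rate_per_hour / 3600.0     # business processes per second
    concurrency = arrival_rate * avg_seconds  # Little's Law
    total_concurrency += concurrency
    print(f"{name}: ~{concurrency:.1f} users busy on average")

print(f"Total average concurrency: ~{total_concurrency:.1f} users")
```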

Amazon Web Services (AWS) - Aggregating DynamoDB Data

We have a DynamoDB Database that is storing machine sensor information in the "structure" of :
HashKey: MachineNumber (Number)
SortKey: EntryDate (String)
Columns: SensorType (String), SensorValue (Number)
The sensors generate information almost every 3 seconds, and we're looking to measure a (near) real-time KPI that counts how many machines in a region were down for more than 10 minutes in the past hour. A region can have close to 10,000 machines, so iterating through DynamoDB takes 10+ minutes to return a response. What is the best way to do this?
Describing the answer as discussed in comments on the question.
Performing a table scan on a very large table is expensive and should be avoided. DynamoDB Streams provides the ability to process records using your own custom code after they are inserted. This allows for aggregations or other computations to be performed asynchronously in near real time. The result can then be written or updated in a separate DynamoDB table.
You can run the code that processes the DynamoDB Stream messages on your own server (for example, EC2), but it is likely easier to just use Lambda. Lambda lets you write Java or Node.js code that runs on fully managed AWS infrastructure, so all you need to worry about is the code.
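To make the idea concrete, here is an illustrative stream-processing handler (written in Python for brevity, even though the answer mentions Java and Node.js); the MachineStatusRollup table and its attributes are assumptions for the sketch, not part of the original design:

```python
# Illustrative Lambda handler for DynamoDB Streams. It keeps a per-machine rollup
# in a separate, hypothetical "MachineStatusRollup" table so the hourly KPI can be
# read directly instead of scanning the raw sensor table.
from decimal import Decimal
import boto3

rollup_table = boto3.resource("dynamodb").Table("MachineStatusRollup")  # assumed table name

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]  # stream records use raw DynamoDB JSON
        machine_number = int(image["MachineNumber"]["N"])
        entry_date = image["EntryDate"]["S"]
        sensor_value = Decimal(image["SensorValue"]["N"])

        # Store only the latest reading per machine here; a fuller implementation
        # would also track downtime windows to answer "down > 10 minutes in the past hour".
        rollup_table.update_item(
            Key={"MachineNumber": machine_number},
            UpdateExpression="SET LastSeen = :d, LastValue = :v",
            ExpressionAttributeValues={":d": entry_date, ":v": sensor_value},
        )
```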