How can we get the total number of items in CloudSearch? - amazon-web-services

I am working with customer documents (fields like fname, lname, orderAmount, etc.) that are indexed in AWS CloudSearch. I am displaying this data in a jQuery DataTable, and for pagination I need the total number of items available for search. Is there any way to get a count of all documents in CloudSearch?
I am getting the matching count in the search response, but not the total number of items in the CloudSearch domain.
I have searched under https://docs.aws.amazon.com/cloudsearch/latest/developerguide/what-is-cloudsearch.html but did not find anything useful.
Is there any trick to get the total document count for a particular CloudSearch domain?

Amazon does not provide an easy way to fetch the total number of records, so the workaround is to run a search whose condition matches every document and read the match count (hits.found) from the response, while returning as little data as possible.
Key points:
Request only a single item (size=1)
Return no fields (return=_no_fields)
For example:
http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search?q=(and+(id:0))&q.parser=structured&return=_no_fields&size=1
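For completeness, here is a minimal Python sketch of the same trick, assuming the example domain endpoint above. Instead of a field condition, it uses the structured matchall operator (which matches every document in the domain) and reads the total from hits.found in the JSON response:

import requests

# Example CloudSearch domain endpoint from above -- replace with your own.
ENDPOINT = "http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com"

def total_document_count():
    # matchall matches every document; size=1 and return=_no_fields keep
    # the response payload as small as possible.
    params = {
        "q": "matchall",
        "q.parser": "structured",
        "return": "_no_fields",
        "size": 1,
    }
    response = requests.get(ENDPOINT + "/2013-01-01/search", params=params)
    response.raise_for_status()
    return response.json()["hits"]["found"]

print(total_document_count())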

Related

Dynamodb One table design - scan and filter with limit approach?

So I'm following the one-table design, and the PK values have the below format
P#product1
P#product2
P#product3
W#warehouse1
W#warehouse2
P#product4
....
With the query pattern "get all products", I need to run a Scan with a begins_with = "P#" filter to get all matching records, and I'm not sure this is the ideal approach.
I understand Scan is resource-consuming (and I would love not to have to rely on it).
Not to mention that if I want to add a limit and pagination, the scenario becomes even more cumbersome (as the limit is applied before the filter). E.g. the first scan with a limit of 10 may return only 3 products, the next one may return only 2, etc.
Is there a more straightforward approach? With, say, 87 products among 1000 records, I was hoping to still be able to get 9 pages of 10 products each.
I've come across other forum topics and found a solution: we can utilise a DynamoDB Global Secondary Index.
Basically:
We'll set up an attribute, say entitytype (values can be product, warehouse, ...)
And create a Global Secondary Index with:
GSI PK: set to that entitytype
GSI SK: set to the original PK
We'll end up having the below in this GSI:
product P#product1
product P#product2
warehouse W#warehouse1
We can then run a Query against this GSI with entitytype = product, as in the sketch below.
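A minimal boto3 sketch of that GSI query. The table name MyTable and the index name entitytype-index are assumptions for illustration. Limit and ExclusiveStartKey give natural pagination here because every item in the product partition is a product, so no post-filter is needed and every page comes back full:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical table name

def get_products_page(page_size=10, start_key=None):
    # Query the GSI partition that holds only products; unlike a filtered
    # Scan, Limit here counts matching items, so each page is full.
    kwargs = {
        "IndexName": "entitytype-index",  # hypothetical GSI name
        "KeyConditionExpression": Key("entitytype").eq("product"),
        "Limit": page_size,
    }
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    response = table.query(**kwargs)
    # LastEvaluatedKey is absent on the final page.
    return response["Items"], response.get("LastEvaluatedKey")

items, next_key = get_products_page()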

Shopify API to get All Records for Customers, Orders & Products (Django)

I had been fetching customers one by one, but after some study I understood the whole way to solve it.
The Shopify API returns at most 250 records per request via the limit parameter, so to get all the data we need pagination and a few steps to synchronise the extraction.
import requests

shop_url = "https://%s:%s@%s.myshopify.com/admin/api/%s/" % (API_KEY, PASSWORD, SHOP_NAME, API_VERSION)
endpoint = 'customers.json?limit=250&fields=id,email&since_id=0'
r = requests.get(shop_url + endpoint)
Step 1: Start the extraction with since_id=0 and store the results in your DB:
customers.json?limit=250&fields=id,email&since_id=0
Step 2: For the next request, set since_id to the last id from the previous extraction.
last id = 5103249850543 (suppose)
Put whichever columns you need in the fields parameter:
customers.json?limit=250&fields=COLUMN_YOUNEED_FOR_CHK&since_id=5103249850543
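Putting those steps together, a rough sketch of the full extraction loop (credentials and shop details are placeholders): it repeats the request, advancing since_id to the last id seen, until a page comes back with fewer than 250 records.

import requests

API_KEY, PASSWORD = "<API_KEY>", "<PASSWORD>"        # placeholders
SHOP_NAME, API_VERSION = "<SHOP_NAME>", "2020-04"    # placeholders
shop_url = "https://%s:%s@%s.myshopify.com/admin/api/%s/" % (API_KEY, PASSWORD, SHOP_NAME, API_VERSION)

def fetch_all_customers():
    customers, since_id = [], 0
    while True:
        endpoint = "customers.json?limit=250&fields=id,email&since_id=%s" % since_id
        page = requests.get(shop_url + endpoint).json()["customers"]
        customers.extend(page)
        if len(page) < 250:           # last page reached
            return customers
        since_id = page[-1]["id"]     # continue after the last id we saw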

How to retrieve all the items from DynamoDB using boto3?

I want to retrieve all the items from my table without specifying any particular parameter. I can fetch a single item by key, but how do I get all items?
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Email')
response = table.get_item(
    Key={
        "id": "2"
    }
)
item = response['Item']
print(item)
This way I can get one item, but how do I retrieve all items? Is there any method?
If you want to retrieve all items you will need to use the Scan command.
You can do this by running
response = table.scan()
Be aware that running this will consume a large number of read capacity units (RCUs). With eventually consistent reads, one RCU covers two reads of items up to 4 KB each; with strongly consistent reads, one RCU covers one read of an item up to 4 KB.
Here is the consideration page for scans vs queries in AWS documentation.
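Note that a single scan() call returns at most 1 MB of data. Here is a sketch of the usual pagination loop that follows LastEvaluatedKey until the table is exhausted, using the Email table from the question:

import boto3

table = boto3.resource('dynamodb').Table('Email')

def scan_all_items():
    items = []
    response = table.scan()
    items.extend(response['Items'])
    # Each scan() call returns at most 1 MB; keep going while DynamoDB
    # reports there is more data to read.
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])
    return items

print(len(scan_all_items()))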

Fulltext Search DynamoDB

Following situation:
I'm storing elements in a DynamoDB table for my customers. The hash key is an element ID and the range key is the customer ID. In addition to these fields I'm storing an array of strings -> tags (e.g. ["Pets", "House"]) and a multiline text.
I want to provide a search function in my application where the user can type a free text or select tags and get all related elements.
In my opinion a plain DB query is not the correct solution. I was playing around with CloudSearch, but I'm not really sure whether this is the correct solution, because every time the user adds a tag the index must be updated...
I hope you have some hints for me.
DynamoDB is now integrated with Elasticsearch, enabling you to perform full-text queries on your data.
https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-dynamodb-elasticsearch-integration/
DynamoDB streams are used to keep the search index up-to-date.
You can use an instant-search engine like Typesense to search through data in your DynamoDB table:
https://github.com/typesense/typesense
There's also ElasticSearch, but it has a steep learning curve and can become a beast to manage, given the number of features and configuration options it supports.
At a high level:
Turn on DynamoDB streams
Set up an AWS Lambda trigger to listen to these change events
Write code inside your lambda function to index data into Typesense:
import typesense

def lambda_handler(event, context):
    client = typesense.Client({
        'nodes': [{
            'host': '<Endpoint URL>',
            'port': '<Port Number>',
            'protocol': 'https',
        }],
        'api_key': '<API Key>',
        'connection_timeout_seconds': 2
    })
    processed = 0
    for record in event['Records']:
        ddb_record = record['dynamodb']
        if record['eventName'] == 'REMOVE':
            res = client.collections['<collection-name>'].documents[str(ddb_record['OldImage']['id']['N'])].delete()
        else:
            document = ddb_record['NewImage']  # format your document here, then use the upsert function to index it
            res = client.collections['<collection-name>'].documents.upsert(document)
        print(res)
        processed = processed + 1
    print('Successfully processed {} records'.format(processed))
    return processed
Here's a detailed article from Typesense's docs on how to do this: https://typesense.org/docs/0.19.0/guide/dynamodb-full-text-search.html
DynamoDB just added PartiQL, a SQL-compatible language for querying data. You can use the contains() function to find a value within a set (or a substring): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-functions.contains.html
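As a hedged sketch of what that could look like for the tags described in the question (the Elements table name and tags attribute are assumptions), using PartiQL's contains() via boto3:

import boto3

client = boto3.client('dynamodb')

# Table name "Elements" and the "tags" set attribute are assumptions based
# on the question; contains() matches a value inside a set or a substring
# inside a string.
response = client.execute_statement(
    Statement="SELECT * FROM \"Elements\" WHERE contains(\"tags\", 'Pets')"
)
print(response['Items'])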
In your specific case you need Elasticsearch. But you can do limited wildcard-style text search on a sort key; the first SQL example below maps to DynamoDB's begins_with key condition, while the second ('%Today%') needs a contains filter expression.
/* Return all of the songs by an artist, matching first part of title */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE 'Call%';
/* Return all of the songs by an artist, with a particular word in the title...
...but only if the price is less than 1.00 */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE '%Today%'
AND Price < 1.00;
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Query.html
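For illustration, the DynamoDB equivalent of the first SQL example as a boto3 sketch, assuming the Music table from the AWS docs:

import boto3
from boto3.dynamodb.conditions import Key

music = boto3.resource('dynamodb').Table('Music')

# Equivalent of: SELECT * FROM Music
#                WHERE Artist='No One You Know' AND SongTitle LIKE 'Call%'
# begins_with is the only wildcard-like operator allowed in a key condition.
response = music.query(
    KeyConditionExpression=Key('Artist').eq('No One You Know')
                           & Key('SongTitle').begins_with('Call')
)
print(response['Items'])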
This is the advantage of using DynamoDB as a 'managed service' from AWS: you get multiple managed components on top of the managed NoSQL DB itself.
If you are using the 'downloaded' (local) version of DynamoDB, then you need to build your own Elasticsearch cluster and index the DynamoDB data yourself.

Boto scanning a dynamodb table from item 1000 to 2000

I have a DynamoDB table, and I want to build a page where I can see its items. But since there could be tens of thousands of items, I want to see them 10 per page. How do I do that? How would I scan items 1000 to 2000?
import boto

db = boto.connect_dynamodb()
table = db.get_table('MyTable')
res = table.scan(attributes_to_get=['id'], max_results=10)
for i in res:
    print i
What do you mean by items 1000~2000?
There is no global order over hash keys (primary or index), so it's hard to define items 1000~2000 in advance.
However, it makes perfect sense if you'd like to find the next page of items given the last returned item. To fetch the next page, you execute the scan method again, providing the primary key value of the last item in the previous page so that scan can return the next set of items. The parameter is named exclusive_start_key and its initial value is None.
See more details in official docs.
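A sketch of that pattern using the newer boto3 API, where the parameter is called ExclusiveStartKey and the key of the last item in a page is exposed as LastEvaluatedKey (the MyTable name is the one from the question):

import boto3

table = boto3.resource('dynamodb').Table('MyTable')

def scan_pages(page_size=10):
    # Yield items 10 at a time, resuming each scan where the last stopped.
    start_key = None  # initial value, as with boto's exclusive_start_key
    while True:
        kwargs = {'Limit': page_size}
        if start_key:
            kwargs['ExclusiveStartKey'] = start_key
        response = table.scan(**kwargs)
        yield response['Items']
        start_key = response.get('LastEvaluatedKey')
        if start_key is None:  # no more pages
            return

for page in scan_pages():
    print(page)
    break  # just show the first page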