How to filter items with a certain attribute empty - amazon-web-services

I'm using boto with DynamoDB and I want to count all the items whose Feature attribute is empty. I tried the following:

not_empty_ct = db.instance.query_count(
    Feature__eq=''
)

But it didn't work:
boto.dynamodb2.exceptions.ValidationException: ValidationException: 400 Bad Request
{'message': 'One or more parameter values were invalid: An AttributeValue may not contain an empty string', '__type':
'com.amazon.coral.validate#ValidationException'}
I didn't find too much information in boto's API Docs.

You probably need to do a Scan, which you should probably not be doing in your code, since a Scan reads every item in the table.
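For reference, a minimal sketch with boto3 (boto's successor); the table name is hypothetical. Because DynamoDB rejected empty string values outright at the time (which is exactly what the ValidationException is saying), an "empty" Feature is typically stored as a missing attribute, so the scan filters on absence:

import boto3

dynamodb = boto3.client('dynamodb')

# Count items where Feature is absent (DynamoDB's stand-in for "empty").
count = 0
paginator = dynamodb.get_paginator('scan')
for page in paginator.paginate(
        TableName='my_table',  # hypothetical table name
        FilterExpression='attribute_not_exists(Feature)',
        Select='COUNT'):
    count += page['Count']
print(count)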

Related

How to fetch a certain number of records from paginated dynamodb table?

I am trying to work with the first 50 records, i.e. the first scan page, returned from the get_paginator method.
This is how I scan through the table and get paginated results, over which I loop and do some post-processing:

dynamo_client = boto3.client('dynamodb')
paginator = dynamo_client.get_paginator("scan")
for page in paginator.paginate(TableName=table_name):
    yield from page["Items"]

Is it possible to work on only, say, the first scanned page and address the second page onwards explicitly? Summing it up, I am trying to process the first page of results in one Lambda function and specifically the second page in another Lambda function. How can I achieve this?
You need to pass the resume token to your other Lambda, somehow.
The page iterator keeps track of where the scan left off (this is what gets surfaced as NextToken when you build a full result). You can pass that token as StartingToken in the PaginationConfig of the next paginate() call.
Somewhat contrived example:

dynamo_client = boto3.client('dynamodb')
paginator = dynamo_client.get_paginator("scan")

# Grab the first page
pages = paginator.paginate(TableName=table_name)
token = None
for page in pages:
    # do some work
    dowork(page["Items"])
    # grab the resume token
    token = pages.resume_token
    # stop iterating after the first page for some reason
    break

# This will continue iterating where the last iterator left off
for page in paginator.paginate(TableName=table_name,
                               PaginationConfig={'StartingToken': token}):
    # do some work
    dowork(page["Items"])
Let's say you were trying to use a Lambda to iterate over all the DynamoDB items in a table. You could have the iterator run until a time limit, break, then queue up the next Lambda function, passing along the resume token for it to pick up where the first left off.
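A hedged sketch of that hand-off, assuming the work is split across invocations of one worker function; the function name and payload shape are made up for illustration:

import json
import boto3

lambda_client = boto3.client('lambda')

def queue_next_invocation(token):
    # Fire-and-forget async invoke of the next worker, carrying the token.
    lambda_client.invoke(
        FunctionName='my-scan-worker',  # hypothetical function name
        InvocationType='Event',
        Payload=json.dumps({'StartingToken': token}),
    )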
You can learn more via the API doc, which details what this does, or see some further examples on GitHub.
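As an aside, PaginationConfig can also cap the result size directly, which matches the "first 50 records" goal without a manual break; a sketch, assuming the same table_name and dowork as above:

result = paginator.paginate(
    TableName=table_name,
    PaginationConfig={'MaxItems': 50},
).build_full_result()

dowork(result['Items'])
token = result.get('NextToken')  # hand this to the second Lambda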

Query Logs for filter non-empty strings in Google Cloud Logs Explorer

I am trying to query for all logs that meet a simple condition:
The jsonPayload of some DEFAULT log entries has the following structure:

response: {
    Values: [
        [ ]
    ]
}

where each item in Values is an array. In most cases, Values holds a single empty item "". I want to write a query that filters for all log entries whose values are something other than an empty string (an array, in fact).
Here's the query I tried to run:
severity="DEFAULT" AND
jsonPayload.response.Values != ''
This did not return any results. There are thousands of entries, most of which are empty. Can this be done? If so, what am I missing in this case?
Edit
I am checking whether the first value inside Values is something other than an empty string. In the entries I am looking for, the first item will be an array.
Edit 2
Following the reference suggested, I tried looking for the opposite:
severity="DEFAULT" AND
jsonPayload.response.Values = ''
This shows me all the results with an empty Values array, as expected. What's confusing me is why the negated version is not working. The logs are generated by a Cloud Function that serves as a webhook for event processing. The jsonPayload represents the body of the request from the event source.
To filter non-empty strings in Google Cloud Logs Explorer, as described in the official documentation:
severity="DEFAULT" AND
jsonPayload.response.Values !~ ''
Another way would be:
severity="DEFAULT" AND
jsonPayload.response.Values:*
NOT jsonPayload.response.Values = ''
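If you need the same filter programmatically rather than in the Logs Explorer UI, a minimal sketch with the google-cloud-logging Python client; the project ID is a placeholder:

from google.cloud import logging

client = logging.Client(project='my-project')  # hypothetical project ID
log_filter = 'severity="DEFAULT" AND jsonPayload.response.Values !~ \'\''
for entry in client.list_entries(filter_=log_filter):
    # entry.payload holds the parsed jsonPayload
    print(entry.payload)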

How to change name of a table created by AWS Glue crawler using boto3

I'm trying to change the name of a table created by an AWS Glue crawler using boto3. Here is the code:
import boto3
database_name = "eventbus"
table_name = "enrollment_user_enroll_cancel_1_0_0"
new_table_name = "enrollment_user_enroll_cancel"
client = boto3.client("glue", region_name='us-west-1')
response = client.get_table(DatabaseName=database_name, Name=table_name)
table_input = response["Table"]
table_input["Name"] = new_table_name
print(table_input)
print(table_input["Name"])
table_input.pop("CreatedBy")
table_input.pop("CreateTime")
table_input.pop("UpdateTime")
client.create_table(DatabaseName=database_name, TableInput=table_input)
Getting the below error:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in TableInput: "DatabaseName", must be one of: Name, Description, Owner, LastAccessTime, LastAnalyzedTime, Retention, StorageDescriptor, PartitionKeys, ViewOriginalText, ViewExpandedText, TableType, Parameters
Unknown parameter in TableInput: "IsRegisteredWithLakeFormation", must be one of: Name, Description, Owner, LastAccessTime, LastAnalyzedTime, Retention, StorageDescriptor, PartitionKeys, ViewOriginalText, ViewExpandedText, TableType, Parameters
Could you please let me know the resolution for this issue? Thanks!
To get rid of the botocore.exceptions.ParamValidationError thrown by client.create_table, you need to delete the offending keys from table_input in the same way you did with CreatedBy etc.:
...
table_input.pop("DatabaseName")
table_input.pop("IsRegisteredWithLakeFormation")
client.create_table(DatabaseName=database_name, TableInput=table_input)
In case your original table had partitions which you want to add to the new table, you need to take a similar approach. First you need to retrieve the meta information about those partitions with one of:
batch_get_partition()
get_partition()
get_partitions()
Note: depending on which one you choose, you will need to pass different parameters. There are limits on how many partitions you can retrieve within a single request; if I remember correctly it is around 200 or so. On top of that, you might need to use a paginator to list all of the available partitions. This is the case when your table has more than 400 partitions.
In general, I would suggest something like:
paginator = client.get_paginator('get_partitions')
response = paginator.paginate(
    DatabaseName=database_name,
    TableName=table_name
)

partitions = list()
for page in response:
    for p in page['Partitions']:
        partition = p.copy()
        # Remove the keys that PartitionInput does not accept
        for key in ("DatabaseName", "TableName", "CreationTime"):
            partition.pop(key, None)
        partitions.append(partition)
Now you are ready to add those retrieved partitions to the new table with either:
batch_create_partition()
create_partition()
I'd suggest using batch_create_partition(); however, it limits how many partitions can be created in a single request, so you may need to send them in chunks.
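A minimal sketch of that last step, using the cleaned-up partitions list built above; the chunk size of 100 is an assumption about the per-request limit, not a value confirmed here:

def chunks(items, size):
    # Yield successive fixed-size slices of a list.
    for i in range(0, len(items), size):
        yield items[i:i + size]

for batch in chunks(partitions, 100):  # assumed per-request limit
    client.batch_create_partition(
        DatabaseName=database_name,
        TableName=new_table_name,
        PartitionInputList=batch,
    )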

Can creationTime and other Directory meta-fields be used in a query?

I'm trying to filter the list of users returned from Directory.Users.List, and want to use the creationTime value in the filter. According to this:
Search for users
...you can use Date fields in comparisons; however, when I try something like:
creationTime>=2016-06-30 (or the same value quoted)
...I get:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid Input: creationTime>=2016-06-30",
    "reason" : "invalid"
  } ],
  "message" : "Invalid Input: creationTime>=2016-06-30"
}
The message isn't particularly helpful - is this a syntax issue with the query, or is it the case that the field isn't available for query (it isn't listed in that article, but it is part of the JSON response)?
The article specifies the format for Date field queries; however, this field also includes a time component, so I tried replicating the exact format that appears in the JSON, with the same result.
Also, same result in the Try it! section of the Users reference page.
Also also, tried using a numeric value (Date.getTime(), in effect) - same error.
Thanks...
P.S. Interestingly, Google does document how to search on date fields for Chromebooks, and it's entirely different from what is implied for users. I still can't use creationTime with this, but it does work for Chromebooks. The other part of this is that it refers to fields that aren't documented, and when I try to use the documented field (e.g. lastSync vs. sync), it fails with the same message. This whole thing seems half-baked.
View Chrome device information
Only the fields listed in the table are searchable via the users.list search.
A workaround (I don't know if this is suitable for you) may be to push the data you want to use in a search into custom user fields. These fields are searchable.
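For what it's worth, a sketch of that workaround with the Admin SDK Directory API Python client; the service-account file, admin subject, and the mySchema.createdDate custom field are all hypothetical:

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/admin.directory.user.readonly']
creds = service_account.Credentials.from_service_account_file(
    'sa.json', scopes=SCOPES).with_subject('admin@example.com')
service = build('admin', 'directory_v1', credentials=creds)

# Query a custom (searchable) field instead of the unsupported creationTime.
results = service.users().list(
    domain='example.com',
    query='mySchema.createdDate>=2016-06-30',  # hypothetical custom field
).execute()
for user in results.get('users', []):
    print(user['primaryEmail'])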

Tweepy API search doesn't have keyword

I am working with Tweepy (Python's Twitter REST API client) and I'm trying to find tweets that mention several keywords and contain no URL.
But the search results are not up to our satisfaction. It looks like the query has errors and was stopped. Additionally, we observed that results were returned one by one, not (as previously) in bulk packs of 100.
Could you please tell me why this search does not work properly?
We wanted to get all tweets mentioning 'Amazon' without any URL links in the text.
We used the search shown below. The results still contained tweets with URLs or without the 'Amazon' keyword.
Could you please let us know what we are doing wrong?
import tweepy

auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

searchQuery = 'Amazon OR AMAZON OR amazon filter:-links'  # Keyword
new_tweets = api.search(q=searchQuery, count=100,
                        result_type="recent",
                        max_id=sinceId,
                        lang="en")
The minus sign should be put before "filter", not before "links", like this:
searchQuery = 'Amazon OR AMAZON OR amazon -filter:links'
Also, I doubt that the count = 100 option is a valid one, since it is not listed in the API documentation (which may not be very up to date, though). Try replacing it with rpp = 100 to get tweets in bulk packs.
I am not sure why some of the tweets you find do not contain the "Amazon" keyword, but a possibility is that "Amazon" appears in the username of the poster rather than in the text. I do not know if you can filter that directly in the query, or even if you would want to, since that would also reject tweets from the official Amazon accounts. I would suggest that, for each tweet the query returns, you check that it does contain "Amazon".
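A small sketch of that post-filtering step, assuming the corrected query and the api object from the question; process() is a hypothetical handler:

searchQuery = 'Amazon OR AMAZON OR amazon -filter:links'
for tweet in api.search(q=searchQuery, count=100, result_type="recent", lang="en"):
    # Keep only tweets whose text actually mentions the keyword.
    if 'amazon' in tweet.text.lower():
        process(tweet)  # hypothetical handler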