DynamoDB pagination using Boto3 - amazon-web-services

We are using boto3 for our DynamoDB and we need to do a full scan of our tables to enable to do that based on other post we need to do a pagination. However, we are unable to find a working sample of pagination. Here is what we did.
import boto3
client_setting = boto3.client('dynamodb', region_name='ap-southeast-2')
paginator = client_setting.get_paginator('scan')
esk = {}
data = []
unconverted_ga = ourQuery(params1, params2)
for page in unconverted_ga:
data.append(page)
esk = page['LastEvaluatedKey']
We dont know exactly how to make the esk as the ExclusiveStartKey of our next query. What should be the expected value of ExclusiveStartkey parameter? We are still new in DynamoDB and there's many things we need to learn including this. thanks!

From the answer by Tay B at https://stackoverflow.com/a/38619425/3176550
import boto3
dynamodb = boto3.resource('dynamodb',
aws_session_token=aws_session_token,
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key,
region_name=region
)
table = dynamodb.Table('widgetsTableName')
response = table.scan()
data = response['Items']
while 'LastEvaluatedKey' in response:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
data.update(response['Items'])

After hour of search, i've finally found a better solution. For those who are new to DynamoDB, we should'nt missed this - http://docs.aws.amazon.com/amazondynamodb/latest/gettingstartedguide/GettingStarted.Python.04.html
from __future__ import print_function # Python 2/3 compatibility
import boto3
import json
import decimal
from boto3.dynamodb.conditions import Key, Attr
# Helper class to convert a DynamoDB item to JSON.
class DecimalEncoder(json.JSONEncoder):
def default(self, o):
if isinstance(o, decimal.Decimal):
if o % 1 > 0:
return float(o)
else:
return int(o)
return super(DecimalEncoder, self).default(o)
dynamodb = boto3.resource('dynamodb', region_name='us-west-2', endpoint_url="http://localhost:8000")
table = dynamodb.Table('Movies')
fe = Key('year').between(1950, 1959)
pe = "#yr, title, info.rating"
# Expression Attribute Names for Projection Expression only.
ean = { "#yr": "year", }
esk = None
response = table.scan(
FilterExpression=fe,
ProjectionExpression=pe,
ExpressionAttributeNames=ean
)
for i in response['Items']:
print(json.dumps(i, cls=DecimalEncoder))
// As long as LastEvaluatedKey is in response it means there are still items from the query related to the data
while 'LastEvaluatedKey' in response:
response = table.scan(
ProjectionExpression=pe,
FilterExpression=fe,
ExpressionAttributeNames= ean,
ExclusiveStartKey=response['LastEvaluatedKey']
)
for i in response['Items']:
print(json.dumps(i, cls=DecimalEncoder))

You can try with following code:
esk = None
while True:
scan_generator = YourTableName.scan(max_results=10, exclusive_start_key=esk)
for item in scan_generator:
# your code for processing
# condition to check if entire table is scanned
else:
break;
# Load the last keys
esk = scan_generator.kwargs['exclusive_start_key'].values()
Here is the reference documentation link.
Hope that helps

Bit more verbose but I like it.
def fetch_from_table(last_key=None):
if last_key:
response = table.query(
IndexName='advertCatalogIdx',
KeyConditionExpression=Key('sk').eq('CATALOG'),
Limit=5,
ExclusiveStartKey=last_key
)
else:
response = table.query(
IndexName='advertCatalogIdx',
KeyConditionExpression=Key('sk').eq('CATALOG'),
Limit=5
)
# print(response)
for item in response['Items']:
print(item['address'])
print('***************************')
return response.get('LastEvaluatedKey')
last_key = fetch_from_table()
while last_key != None:
print("Running again : ")
last_key = fetch_from_table(last_key)

import sys
import boto3
client = boto3.client('dynamodb')
marker = None
while True:
paginator = client.get_paginator('list_tables')
page_iterator = paginator.paginate(
PaginationConfig={
'MaxItems': 1000,
'PageSize': 100,
'StartingToken': marker})
for page in page_iterator:
tables=page['TableNames']
for table in tables:
print (table)
try:
marker = page['NextToken']
except KeyError:
sys.exit()

Related

How can I save the output of an API call to DynamoDB?

I am calling a REST API from lambda function in AWS . I receive the response in the terminal. How can I store these API responses in DynamoDB. I am sorry if this may seem naive to some. I am a beginner and self learner.
Here is a full working example for a Python based Lambda function:
# Import AWS (Python version) SDK
import boto3
def lambda_handler(event, context):
# Construct the client to the DynamoDB service
client = boto3.client('dynamodb')
db = boto3.resource('dynamodb')
table_name = 'my-awesome-table'
# Table instance
table = db.Table(table_name)
# Example data to query an entry from the table
primary_column_name = 'email'
primary_key = 'john#gmail.com'
# We get an object for the retrieved table entry
table_entry = get_table_item(table, primary_column_name, primary_key)
# Example data to put into the table
new_entry = {
'email': 'jack#outlook.com',
'name': 'Jack Ma',
'age': 56
}
put_table_item(table, new_entry)
# Function to retrieve an entry from the DynamoDB table
def get_table_item(table, primary_column_name, primary_key):
response = table.get_item(
Key={
primary_column_name: primary_key
}
)
return response['Item']
# Function to put an entry into the DynamoDB table
def put_table_item(table, item):
response = table.put_item(
Item=item
)
# If there were errors, throw an exception
if (response['ResponseMetadata']['HTTPStatusCode'] != 200):
raise Exception('Failed to update table!')

I would like to export DynamoDB Table to S3 bucket in CSV format using Python (Boto3)

This question has been asked earlier in the following link:
How to write dynamodb scan data's in CSV and upload to s3 bucket using python?
I have amended the code as advised in the comments. The code looks like as follows:
import csv
import boto3
import json
dynamodb = boto3.resource('dynamodb')
db = dynamodb.Table('employee_details')
def lambda_handler(event, context):
AWS_BUCKET_NAME = 'session5cloudfront'
s3 = boto3.resource('s3')
bucket = s3.Bucket(AWS_BUCKET_NAME)
path = '/tmp/' + 'employees.csv'
try:
response = db.scan()
myFile = open(path, 'w')
for i in response['Items']:
csv.register_dialect('myDialect', delimiter=' ', quoting=csv.QUOTE_NONE)
with myFile:
writer = csv.writer(myFile, dialect='myDialect')
writer.writerows(i)
print(i)
except :
print("error")
bucket.put_object(
ACL='public-read',
ContentType='application/csv',
Key=path,
# Body=json.dumps(i),
)
# print("here")
body = {
"uploaded": "true",
"bucket": AWS_BUCKET_NAME,
"path": path,
}
# print("then here")
return {
"statusCode": 200,
"body": json.dumps(body)
}
I am a novice, please help me in fixing this code as it is having problem in inserting data in file created in S3 Bucket.
Thanks
I have revised the code to be simpler and to also handle paginated responses for tables with more than 1MB of data:
import csv
import boto3
import json
TABLE_NAME = 'employee_details'
OUTPUT_BUCKET = 'my-bucket'
TEMP_FILENAME = '/tmp/employees.csv'
OUTPUT_KEY = 'employees.csv'
s3_resource = boto3.resource('s3')
dynamodb_resource = boto3.resource('dynamodb')
table = dynamodb_resource.Table(TABLE_NAME)
def lambda_handler(event, context):
with open(TEMP_FILENAME, 'w') as output_file:
writer = csv.writer(output_file)
header = True
first_page = True
# Paginate results
while True:
# Scan DynamoDB table
if first_page:
response = table.scan()
first_page = False
else:
response = table.scan(ExclusiveStartKey = response['LastEvaluatedKey'])
for item in response['Items']:
# Write header row?
if header:
writer.writerow(item.keys())
header = False
writer.writerow(item.values())
# Last page?
if 'LastEvaluatedKey' not in response:
break
# Upload temp file to S3
s3_resource.Bucket(OUTPUT_BUCKET).upload_file(TEMP_FILENAME, OUTPUT_KEY)

Unable to lunch Multiple Streaming Pipeline ( N to N Pipeline) Dynamically (Using Runtime Value Provider) in Single Dataflow Job in Python

I am trying to launch a Streaming Dataflow Job which contains n number of pipelines.
Based on configured topic and corresponding BQ table for each Topic i want to launch a Pipeline inside a one Streaming Job.
My actual problem is i have to create and upload a template for each and every project. What i want is, i can reuse the uploaded template and only configuration files ihave to pass for launching new dataflow job by changing topic,subscription, dataset and bq table.
Which is i am unable to reuse the template.
Please help me on this and let me know if this is possible or not. Because Google has also provided one to one template. Not many to many Template (e.g Three topic - Three BQ Table (three data pipeleine) , n-n).
import logging
import os
import json
from google.cloud import storage
from apache_beam import Pipeline, ParDo, DoFn
from apache_beam.io import ReadFromPubSub, WriteToBigQuery, BigQueryDisposition
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions, WorkerOptions, GoogleCloudOptions, \
SetupOptions
def _get_storage_service():
storage_client = storage.Client \
.from_service_account_json(
json_credentials_path='C:\Users\dneema\PycharmProjects\iot_dataflow\df_stm_iot_pubsub_bq\service_account_credentials.json')
print('storage service fetched')
return storage_client
class RuntimeOptions(PipelineOptions):
def __init__(self, flags=None, **kwargs):
super(RuntimeOptions, self).__init__(flags, **kwargs)
#classmethod
def _add_argparse_args(cls, parser):
parser.add_value_provider_argument('--bucket_name', type=str)
parser.add_value_provider_argument('--config_json_path', type=str,)
class PipelineCreator:
def __init__(self):
self.options = PipelineOptions()
storage_client = storage.Client.from_service_account_json(
'service_account_credentials_updated.json')
runtime_options = self.options.view_as(RuntimeOptions)
bucket_name = str(runtime_options.bucket_name)
config_json_path = str(runtime_options.config_json_path)
# get the bucket with name
bucket = storage_client.get_bucket(bucket_name)
# get bucket file as blob
blob = bucket.get_blob(config_json_path)
# convert to string and load config
json_data = blob.download_as_string()
self.configData = json.loads(json_data)
dataflow_config = self.configData['dataflow_config']
self.options.view_as(StandardOptions).streaming = bool(dataflow_config['streaming'])
self.options.view_as(SetupOptions).save_main_session = True
worker_options = self.options.view_as(WorkerOptions)
worker_options.max_num_workers = int(dataflow_config['max_num_worker'])
worker_options.autoscaling_algorithm = str(dataflow_config['autoscaling_algorithm'])
#worker_options.machine_type = str(dataflow_config['machine_type'])
#worker_options.zone = str(dataflow_config['zone'])
#worker_options.network = str(dataflow_config['network'])
#worker_options.subnetwork = str(dataflow_config['subnetwork'])
def run(self):
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'dataflow-service-account.json'
project_id = self.configData['project_id']
dataset_id = self.configData['dataset_id']
topics = self.configData['topics']
table_ids = self.configData['bq_table_ids']
error_table_id = self.configData['error_table_id']
logger = logging.getLogger(project_id)
logger.info(self.options.display_data())
pipeline = Pipeline(options=self.options)
size = len(topics)
for index in range(size):
print(topics[index])
pipeline_name = "pipeline_"+str(index)
logger.info("Launch pipeline :: "+pipeline_name)
messages = pipeline | 'Read PubSub Message in ' + pipeline_name >> ReadFromPubSub(topic=topics[index])
logger.info("Read PubSub Message")
valid_messages, invalid_messages = messages | 'Convert Messages to TableRows in ' + pipeline_name >> ParDo(TransformMessageToTableRow()).with_outputs('invalid', main='valid')
valid_messages | 'Write Messages to BigQuery in ' + pipeline_name >> WriteToBigQuery(table=table_ids[index],
dataset=dataset_id,
project=project_id,
write_disposition=BigQueryDisposition.WRITE_APPEND)
pipeline.run().wait_until_finish()
class TransformMessageToTableRow(DoFn):
def process(self, element, *args, **kwargs):
logging.getLogger('dataflow').log(logging.INFO, element)
print element
print("element type ", type(element))
print("inside bq pardo")
import json
try:
message_rows = json.loads(element)
# if using emulator, uncomment below line
message_rows = json.loads(message_rows)
print 'loaded element'
except:
try:
element = "[" + element + "]"
message_rows = json.loads(element)
except Exception as e:
print(e)
from apache_beam import pvalue
yield [pvalue.TaggedOutput('invalid', [element, str(e)])]
print(message_rows)
print("message rows", type(message_rows))
if not isinstance(message_rows, list):
message_rows = [message_rows]
#rows = list()
if isinstance(message_rows, list):
for row in message_rows:
try:
new_row = dict()
for k, v in row.items():
new_row[str(k)] = v
#rows.append(new_row)
print(new_row)
yield new_row
except Exception as e:
print(e)
from apache_beam import pvalue
yield pvalue.TaggedOutput('invalid', [row, str(e)])
if __name__ == '__main__':
PipelineCreator().run()
Here Runtime argument as bucket_name and config_json_path for all the configuration related stuffs like Dataset, BQ table, Topics/ Subscription and all Workflow options.
This is possible or not ? Because Google has also provided one to one template. Not many to many Template (e.g Three topic - Three BQ Table (three data pipeleine) , n-n).
Regarding this previously answered thread Unable to run multiple Pipelines in desired order by creating template in Apache Beam, you can run only one pipeline inside a template at any time.
You'll have to delegate the template creation to another service and pass the configuration with it, just follow the link inside the thread and you'll have How To examples.

AWS Lambda error No module named 'StringIO' or No module named 'StringIO'

I try to use AWS Lambda for mass email sending, the code we use as the link below:
https://aws.amazon.com/cn/premiumsupport/knowledge-center/mass-email-ses-lambda/
from __future__ import print_function
import StringIO
import csv
import json
import os
import urllib
import zlib
from time import strftime, gmtime
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import boto3
import botocore
import concurrent.futures
__author__ = 'Said Ali Samed'
__date__ = '10/04/2016'
__version__ = '1.0'
# Get Lambda environment variables
region = os.environ['us-east-1']
max_threads = os.environ['10']
text_message_file = os.environ['email_body.txt']
html_message_file = os.environ['email_body.html']
# Initialize clients
s3 = boto3.client('s3', region_name=region)
ses = boto3.client('ses', region_name=region)
send_errors = []
mime_message_text = ''
mime_message_html = ''
def current_time():
return strftime("%Y-%m-%d %H:%M:%S UTC", gmtime())
def mime_email(subject, from_address, to_address, text_message=None, html_message=None):
msg = MIMEMultipart('alternative')
msg['Subject'] = subject
msg['From'] = from_address
msg['To'] = to_address
if text_message:
msg.attach(MIMEText(text_message, 'plain'))
if html_message:
msg.attach(MIMEText(html_message, 'html'))
return msg.as_string()
def send_mail(from_address, to_address, message):
global send_errors
try:
response = ses.send_raw_email(
Source=from_address,
Destinations=[
to_address,
],
RawMessage={
'Data': message
}
)
if not isinstance(response, dict): # log failed requests only
send_errors.append('%s, %s, %s' % (current_time(), to_address, response))
except botocore.exceptions.ClientError as e:
send_errors.append('%s, %s, %s, %s' %
(current_time(),
to_address,
', '.join("%s=%r" % (k, v) for (k, v) in e.response['ResponseMetadata'].iteritems()),
e.message))
def lambda_handler(event, context):
global send_errors
global mime_message_text
global mime_message_html
try:
# Read the uploaded csv file from the bucket into python dictionary list
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
response = s3.get_object(Bucket=bucket, Key=key)
body = zlib.decompress(response['Body'].read(), 16+zlib.MAX_WBITS)
reader = csv.DictReader(StringIO.StringIO(body),
fieldnames=['from_address', 'to_address', 'subject', 'message'])
# Read the message files
try:
response = s3.get_object(Bucket=bucket, Key=text_message_file)
mime_message_text = response['Body'].read()
except:
mime_message_text = None
print('Failed to read text message file. Did you upload %s?' % text_message_file)
try:
response = s3.get_object(Bucket=bucket, Key=html_message_file)
mime_message_html = response['Body'].read()
except:
mime_message_html = None
print('Failed to read html message file. Did you upload %s?' % html_message_file)
if not mime_message_text and not mime_message_html:
raise ValueError('Cannot continue without a text or html message file.')
# Send in parallel using several threads
e = concurrent.futures.ThreadPoolExecutor(max_workers=max_threads)
for row in reader:
from_address = row['from_address'].strip()
to_address = row['to_address'].strip()
subject = row['subject'].strip()
message = mime_email(subject, from_address, to_address, mime_message_text, mime_message_html)
e.submit(send_mail, from_address, to_address, message)
e.shutdown()
except Exception as e:
print(e.message + ' Aborting...')
raise e
print('Send email complete.')
# Remove the uploaded csv file
try:
response = s3.delete_object(Bucket=bucket, Key=key)
if 'ResponseMetadata' in response.keys() and response['ResponseMetadata']['HTTPStatusCode'] == 204:
print('Removed s3://%s/%s' % (bucket, key))
except Exception as e:
print(e)
# Upload errors if any to S3
if len(send_errors) > 0:
try:
result_data = '\n'.join(send_errors)
logfile_key = key.replace('.csv.gz', '') + '_error.log'
response = s3.put_object(Bucket=bucket, Key=logfile_key, Body=result_data)
if 'ResponseMetadata' in response.keys() and response['ResponseMetadata']['HTTPStatusCode'] == 200:
print('Send email errors saved in s3://%s/%s' % (bucket, logfile_key))
except Exception as e:
print(e)
raise e
# Reset publish error log
send_errors = []
if __name__ == "__main__":
json_content = json.loads(open('event.json', 'r').read())
lambda_handler(json_content, None)
but it has problem when i choose python 2.7.the error is
module initialization error 'us-east-1'
when i choose python 3.6 the error is
Unable to import module 'lambda_function': No module named 'StringIO'
anyone can tell me what is the problem it is ?
From Python v3, the StringIO module has gone. Instead, import the io module and use io.StringIO.
The problem with the v27 version is presumably that the following statement is failing:
region = os.environ['us-east-1']
This will result in a KeyError if us-east-1 is not an available environment variable. Instead use AWS_REGION or AWS_DEFAULT_REGION. See the full list of Lambda environment variables.
Please set the environment variables as described in step 4 of the article:
"Configure Lambda environment variables appropriate to your usage scenario. For example, the following variables would be valid for a given use case:
REGION=us-east-1, MAX_THREADS=10, TEXT_MESSAGE_FILE=email_body.txt, HTML_MESSAGE_FILE=email_body.html."
What was done (as per the code provided in the question) is replacing names of environment variables with their values, which means that python is looking for e.g. 'us-east-1' environment variable which isn't there...
This is the original code
# Get Lambda environment variables
region = os.environ['REGION']
max_threads = os.environ['MAX_THREADS']
text_message_file = os.environ['TEXT_MESSAGE_FILE']
html_message_file = os.environ['HTML_MESSAGE_FILE']
You can also hard-code the values, like below:
# Get Lambda environment variables
region = 'us-east-1'
max_threads = '10'
text_message_file = 'email_body.txt'
html_message_file = 'email_body.html'
but I'd suggest to set the environment variables instead (and use the version of script provided by the article author). When it comes to setting environment variables in Lambda, see this article :)

Automating Date Range while extracting

The below script I am using to extract data from Google Analytics. Here I am extracting data for last one week. I want to automate the date range so that i don't have to change date_range every week.
I also want to avoid sampling of data by GA. Please guide my the correct way to automate in details.
author = 'test#gmail.com (test)'
import argparse
import sys
import csv
import string
import datetime
import json
import time
from apiclient.errors import HttpError
from apiclient import sample_tools
from oauth2client.client import AccessTokenRefreshError
cam_name = sys.argv[1:]
class SampledDataError(Exception): pass
def main(argv):
# Authenticate and construct service.
service, flags = sample_tools.init(
argv[0], 'analytics', 'v3', __doc__, __file__,
scope='https://www.googleapis.com/analytics.readonly')
# Try to make a request to the API. Print the results or handle errors.
try:
profile_id = profile_ids[profile]
if not profile_id:
print ('Could not find a valid profile for this user.')
else:
metrics = argv[1]
dimensions = argv[2]
reportName = argv[3]
sort = argv[4]
filters = argv[5]
for start_date, end_date in date_ranges:
limit = ga_query(service, profile_id, 0,
start_date, end_date, metrics, dimensions, sort, filters).get('totalResults')
for pag_index in range(0, limit, 10000):
results = ga_query(service, profile_id, pag_index,
start_date, end_date, metrics, dimensions, sort, filters)
# if results.get('containsSampledData'):
# raise SampledDataError
print_results(results, pag_index, start_date, end_date, reportName)
except TypeError as error:
# Handle errors in constructing a query.
print ('There was an error in constructing your query : %s' % error)
except HttpError as error:
# Handle API errors.
print ('Arg, there was an API error : %s : %s' %
(error.resp.status, error._get_reason()))
except AccessTokenRefreshError:
# Handle Auth errors.
print ('The credentials have been revoked or expired, please re-run '
'the application to re-authorize')
except SampledDataError:
# force an error if ever a query returns data that is sampled!
print ('Error: Query contains sampled data!')
def ga_query(service, profile_id, pag_index, start_date, end_date, metrics, dimensions, sort, filters):
return service.data().ga().get(
ids='ga:' + profile_id,
start_date=start_date,
end_date=end_date,
metrics=metrics,
dimensions=dimensions,
sort=sort,
filters=filters,
samplingLevel='HIGHER_PRECISION',
start_index=str(pag_index+1),
max_results=str(pag_index+10000)).execute()
def print_results(results, pag_index, start_date, end_date, reportName):
"""Prints out the results.
This prints out the profile name, the column headers, and all the rows of
data.
Args:
results: The response returned from the Core Reporting API.
"""
# New write header
if pag_index == 0:
if (start_date, end_date) == date_ranges[0]:
print ('Profile Name: %s' % results.get('profileInfo').get('profileName'))
columnHeaders = results.get('columnHeaders')
cleanHeaders = [str(h['name']) for h in columnHeaders]
writer.writerow(cleanHeaders)
print (reportName,'Now pulling data from %s to %s.' %(start_date, end_date))
# Print data table.
if results.get('rows', []):
for row in results.get('rows'):
for i in range(len(row)):
old, new = row[i], str()
for s in old:
new += s if s in string.printable else ''
row[i] = new
writer.writerow(row)
else:
print ('No Rows Found')
limit = results.get('totalResults')
print (pag_index, 'of about', int(round(limit, -4)), 'rows.')
return None
# Uncomment this line & replace with 'profile name': 'id' to query a single profile
# Delete or comment out this line to loop over multiple profiles.
#Brands
profile_ids = {'abc-Mobile': '12345',
'abc-Desktop': '23456',
'pqr-Mobile': '34567',
'pqr-Desktop': '45678',
'xyz-Mobile': '56789',
'xyz-Desktop': '67890'}
date_ranges = [
('2017-01-24','2017-01-24'),
('2017-01-25','2017-01-25'),
('2017-01-26','2017-01-26'),
('2017-01-27','2017-01-27'),
('2017-01-28','2017-01-28'),
('2017-01-29','2017-01-29'),
('2017-01-30','2017-01-30')
]
for profile in sorted(profile_ids):
print("Sequence 1",profile)
with open('qwerty.json') as json_data:
d = json.load(json_data)
for getThisReport in d["Reports"]:
print("Sequence 2",getThisReport["ReportName"])
reportName = getThisReport["ReportName"]
metrics = getThisReport["Metrics"]
dimensions = getThisReport["Dimensions"]
sort = getThisReport["sort"]
filters = getThisReport["filter"]
path = 'C:\\Projects\\DataExport\\test\\' #replace with path to your folder where csv file with data will be written
today = time.strftime('%Y%m%d')
filename = profile+'_'+reportName+'_'+today+'.csv' #replace with your filename. Note %s is a placeholder variable and the profile name you specified on row 162 will be written here
with open(path + filename, 'wt') as f:
writer = csv.writer(f,delimiter = '|', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
args = [sys.argv,metrics,dimensions,reportName,sort,filters]
if __name__ == '__main__': main(args)
print ( "Profile done. Next profile...")
print ("All profiles done.")
The Core Reporting API supports some interesting things as far as dates goes.
All Analytics data requests must specify a date range. If you do not include start-date and end-date parameters in the request, the server returns an error. Date values can be for a specific date by using the pattern YYYY-MM-DD or relative by using today, yesterday, or the NdaysAgo pattern. Values must match [0-9]{4}-[0-9]{2}-[0-9]{2}|today|yesterday|[0-9]+(daysAgo).
so doing something like
start_date = '7daysAgo'
end_date = 'today'
Just remember that data hasn't completed processing for 24 - 48 hours so your data for today, yesterday and the day before that may not be 100% accurate.