AWS Mobile Analytics Data not exported to S3 - amazon-web-services

I have enabled export (S3 + Redshift) on my app, however I don't see data getting exported to S3.
The file jsonpaths/awseventexportjsonpaths.json has been created. I manually created the folder awsma/events since it wasn't there.
Checked the IAM role 'mobileanalytics-autoExportToS3', it points to the correct S3 bucket. The Redshift import routine seems to be functioning fine as well, I see following message in the log "data_export.etl: Skipping 2016/06/22...No Data exists for this date".
Is there any data size/time requirements for the S3 export to work, haven't come across any such restriction in the documentation. The export functionality was enabled atleast 12 hours ago.
Appreciate any help debugging the issue.

Related

Export DynamoDB table to S3 automatically

The scenario is the following: I have a lambda function that does an http request to get the data of today and the last 365 days and stores them in DynamoDB. The function is triggered every day at 8am, so the most recent data is always saved in the DynamoDB table.
Now my goal is to export the DynamoDB table to a S3 file automatically on an everyday basis as well, so I'm able to use services like QuickSight, Athena, Forecast on the data.
If possible and easily implementable, I'd like to only have one S3 file that gets added with the most recent data of the day, because an extra file everyday seems kinda pricey. If that's not possible, an extra file everyday would also be fine.
What's the best way to go about doing so without using CLI (because I'm not allowed to install programs to my laptop) and without using Lambda (because I wouldn't know how to write a function for that without any tutorials)?
Take a look at DataPipeline. This is a use case and most of the configuration is simple.
It will also not require any knowledge of Lambda and can be automated.
More info: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html
DynamoDB recently released a new, native feature to export your table's data to an S3 bucket. It supports exporting into DynamoDB JSON and Amazon Ion - see the documentation on how to use it at:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataExport.html
This will enable you to run whatever analytics tools you'd like (Athena, etc.) on the data exported in S3.

Splunk migration to S3 DataLake

We're looking at moving away from Splunk as our datastore and looking at AWS Data Lake backed by S3.
What would be the process of migrating data from Splunk to S3? I've read lots of documents talking about archiving data from Splunk to S3 but not sure if this archives the data as a usable format OR if its in some archive format that needs to be restored to splunk itself?
Check out Splunk's SmartStore feature. It moves your non-hot buckets to S3 so you save storage costs. Running SmartStore on AWS only makes sense, however, if you run Splunk on AWS. Otherwise, the data export charges will bankrupt you. Data export applies when Splunk needs to search a bucket that's stored in S3 and so copies that bucket to an indexer. See https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/AboutSmartStore for more information.
From what I've read there are a couple of ways to do it:
Export using the Web UI
Export using REST API Endpoint
Export using CLI
Copy certain files in the filesystem
So far I've tried using the CLI to export and I've managed to export around 500,000 events at a time using
splunk search "index=main earliest=11/11/2019:00:00:01 latest=11/15/2019:23:59:59" -output rawdata -maxout 500000 > output2.dmp
However - I'm not sure how I can accurately repeat this step to make sure I include all 100 million+ events. IE search from DATE A to DATE B for 500,000 records, then search from DATE B to DATE C for the next 500,000 - without missing any events inbetween.

Using S3 as target for AWS DMS: Uploaded File name doesn't change

We are using DMS to get data from SQL Server and load it in S3 bucket, after which the data is finally loaded into Snowflake DB using Snowpipe for Full Load.
Now, in order for Snowpipe to know there is new data in S3 bucket, the filename needs to be different than the last one. Have tried all the task setting options available (DROP_AND_CREATE, DO_NOTHING, TRUNCATE) to have the file name different, but still not working. It loads the file name as LOAD00000001.csv
In documentation it shows that file name will be incremental (eg. LOAD00000001.csv, LOAD00000002.csv .. and so on) but it's not happening. Which is why the Snowpipe is not able to register the changes.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html
Can someone please help?
For DMS the incremental counter is started over from 1 each time the task is run. It does not have a "Don't override existing objects" feature.
Your best bet may be to handle the load yourself by looking for updated object timestamps in your folder or setting up S3 event notifications.

AWS EMR - Hive creating new table in S3 results in AmazonS3Exception: Bad Request

I have a Hive script I'm running in EMR that is creating a partitioned Parquet table in S3 from a ~40GB gzipped CSV file also stored in S3.
The script runs fine for about 4 hours but reaches a point (pretty sure when it is just about done creating the Parquet table) where it errors out. The logs show that the error is:
HiveException: Hive Runtime Error while processing row
caused by:
AmazonS3Exception: Bad Request
There really isn't any more useful information in the logs that I can see. It is reading the CSV file fine from S3 and it creates a couple metadata files in S3 fine as well, so I've confirmed the instance has read/write permissions to the Bucket.
I really can't think of anything else that's going on and I wish there was more info in the logs about what "Bad Request" to S3 that Hive is making. Anyone have any ideas?
BadRequest is a fairly meaningless response from AWS which it sends if there is any reason why it doesn't like the caller. Nobody really knows what's happening.
The troubleshooting docs for the ASF S3A connector list some causes, but they aren't complete, and based on guesswork from what made the message go away.
If you have the request ID which failed, you can submit a support request for amazon to see what they saw on their side.
If it makes you feel any better, I'm seeing it when I try to list exactly one directory in an object store, and I'm co-author of the s3a connector. Like I said "guesswork". Once you find out, add a comment here or, if it's not in the troubleshooting doc, submit a patch to hadoop on the topic.

AWS S3 error with PFFiles after importing the exported Parse data

Looks like Parse.com stores the PFFile objects on AWS S3 and only stores a reference to the actual files on S3 in Parse for the PFFile object types.
So my problem here is I only get a link to AWS S3 link for my PFFile if I export the data using the out of the box Parse.com export functionality. After I import the same data to my Parse application, for some reason the security setting on those PFFiles on S3 is changed in a way that all PFFiles won't be accessible to me after an import due to security error.
My question is, does anyone know how the security is being set on the PFFiles? Here's a link to PFFile https://parse.com/docs/osx/api/Classes/PFFile.html but I guess this is rather an advanced topic and wasn't revealed on this page.
Also looking a solution for this, all I found is this from their forum:
In this case, the PFFiles are stored in a different app. You might
need to download these files and upload them again to the new app and
update the pointers. I know this is not a great answer but we're
working on making this process more straightforward.
https://www.parse.com/questions/import-pffile-object-not-working-in-iphone-application