Identify AWS service for fast retrival of data - amazon-web-services

I have one generic question, actually, I am hunting for a solution to a problem,
Currently, we are generating the reports directly from the oracle database, now from the performance perspective, we want to migrate data from oracle to any specific AWS service which could perform better. We will pass data from that AWS service to our reporting software.
Could you please help which service would be idle for this?
Thanks,
Vishwajeet

To answer well, additional info is needed:
How much data is needed to generate a report?
Are there any transformed/computed values needed?
What is good performance? 1 second? 30 seconds?
What is the current query time on Oracle and what kind of query? Joins, aggregations etc.

Related

General guidance around Bigtable performance

I'm using a single node Bigtable cluster for my sample application running on GKE. Autoscaling feature has been incorporated within the client code.
Sometimes I experience slowness (>80ms) for the GET calls. In order to investigate it further, I need some clarity around the following Bigtable behaviour.
I have cached the Bigtable table object to ensure faster GET calls. Is the table object persistent on GKE? I have learned that objects are not persistent on Cloud Function. Do we expect any similar behaviour on GKE?
I'm using service account authentication but how frequently auth tokens get refreshed? I have seen frequent refresh logs for gRPC Java client. I think Bigtable won't be able to serve the requests over this token refreshing period (4-5 seconds).
What if client machine/instance doesn't scale enough? Will it cause slowness for GET calls?
Bigtable client libraries use connection pooling. How frequently connections/channels close itself? I have learned that connections are closed after minutes of inactivity (>15 minutes or so).
I'm planning to read only needed columns instead of entire row. This can be achieved by specifying the rowkey as well as column qualifier filter. Can I expect some performance improvement by not reading the entire row?
According to GCP official docs you can get here the cause of slower performance of Bigtable. I would like to suggest you to go through the docs that might be helpful. Also you can see Troubleshooting performance issues.

Django and Amazon Lambda: Best solution for big data with Amazon RDS or GraphQL or Amazon AppSync

We have a system with large data (about 10 million rows in on a table). We developed it in Django framework and also we want to use Amazon Lambda for serving it. Now I have some question about it:
1- If we want to use Amazon RDS (MySql, PostgresSQL), which one is better? And relational database is a good solution for doing this?
2- I read somewhere, If we want to use a relational database in Amazon Lambda, Django for each instance, opens a new connection to the DB and it is awful. Is this correct?
3- If we want to use GraphQL and Graph database, Is that a good solution? Or we can combine Django Rest-API and GraphQL together?
4- If we don't use Django and use Amazon AppSync, Is better or not? What are our limitations for use this.
Please help me.
Thanks
GraphQL is very useful for graph data, not timeseries. Your choice will depend on the growth factor, not the actual rows. I currently run an RDS instance with 5 billion rows just fine, but the problem is how it will increase over time. I suggest looking into archival strategies using things like S3 or IoT-analytics (this one is really cool).
I wouldn't worry about concurrent connections until you have a proper reason too (+50's per second). Your DB will be the largest server you have anyway.

What is considered a concurrent query in BigQuery?

We are running into quota limits for a small data set which is less than 1Gb in Bigquery. Google cloud gives us no indication of what queries are running on the backend which isn't allowing us to tune the setup. We have a Bigquery dataset and a dashboard built in data studio which is querying on the data set.
I've used relational databases like Oracle in the past and they have excellent tooling to diagnose issues. But with Bigquery, I feel like I am staring into the dark.
I'd appreciate any help/pointers you can give.
The concurrent queries limit makes reference to the number of statements that are executed simultaneously in BigQuery. The quota limit for on-demand, interactive queries is 100 concurrent queries (updated).
Based on this, it is seems that your Data Studio is hitting this quota when running your reports in which case is suggested to re-design your dashboard build in order to avoid exceeding those limits.
Additionally, you can use the bq ls -j -a PROJECTNAME command to list the jobs that have been run in your project in order to identify the queries you need to work with, as well mentioned by Elliott Brossard

Online update spanner schema is extremely slow

Online updating spanner schema takes minutes even for very very small tables (10s of rows).
i.e. - adding/dropping/altering columns, adding tables, etc.
This can be quite frustrating for development processes and new version deployments.
Any plans for improvement?
Few more questions:
Anyone knows a 3rd party schema comparison tool for spanner? couldn't find any.
What about data backups? in order to save historical snapshots.
Thanks in advance
Schema Updates:
Since Cloud Spanner is a distributed database, it makes sure to update all moving parts of the system which takes the latency as described.
As a suggestion, you could batch the schema updates. This ensures the lower latencies (nearly equivalent to executing a single schema update) and can be executed using API / gcloud command-line tools.
Schema Comparison Tool:
You could use the getDatabaseDdl API to maintain history of your schema changes and use your tool of choice to diff them.

Sitecore Database Cleanup Fails

Sitecore 6.6
I'm speaking with Sitecore Support about this as well, but thought I'd reach out to the community too.
We have a custom agent that syncs media on the file system with the media library. It's a new agent and we made the mistake of not monitoring the database size. It should be importing about 8 gigs of data, but the database ballooned to 713 GB in a pretty short amount of time. Turns out the "Blobs" table in both "master" and "web" databases is holding pretty much all of this space.
I attempted to use the "Clean Up Databases" tool from the Control Panel. I only selected one of the databases. This ran for 6 hours before it bombed due to consuming all the available locks on the SQL Server:
Exception: System.Data.SqlClient.SqlException
Message: The instance of the SQL Server Database Engine cannot obtain a LOCK
resource at this time. Rerun your statement when there are fewer active users.
Ask the database administrator to check the lock and memory configuration for
this instance, or to check for long-running transactions.
It then rolled everything back. Note: I increased the SQL and DataProvider timeouts to infinity.
Anyone else deal with something like this? It would be good if I could 'clean up' the databases in smaller chunks to avoid overwhelming the SQL Server.
Thanks!
Thanks for the responses, guys.
I also spoke with support and they were able to provide a SQL script that will clean the Blobs table:
DECLARE #UsableBlobs table(
ID uniqueidentifier
);
INSERT INTO
#UsableBlobs
select convert(uniqueidentifier,[Value]) as EmpID from [Fields]
where [Value] != ''
and (FieldId='{40E50ED9-BA07-4702-992E-A912738D32DC}' or FieldId='{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}')
delete top (1000) from [Blobs]
where [BlobId] not in (select * from #UsableBlobs)
The only change I made to the script was to add the "top (1000)" so that it deleted in smaller chunks. I eventually upped that number to 200,000 and it would run for about an hour at a time.
Regarding cause, we're not quite sure yet. We believe our custom agent was running too frequently, causing the inserts to stack on top of each other.
Also note that there was a Sitecore update that apparently addressed a problem with the Blobs table getting out of control. The update was 6.6, Update 3.
I faced such a problem previously, and we had contacted Sitecore Support.
They gave us a Sitecore Support DLL, and suggessted a Web.Config change for Dataprovider -- from main type="Sitecore.Data.$(database).$(database)DataProvider, Sitecore.Kernel" to the new one.
The reason I am posting on this question of yours is that because the most of the time taken for us was in Cleaning up Blobs and and they gave us this DLL to increase Cleanup Blobs speed. So I think it might help you too.
Hence, I would like to suggest if you could please request Sitecore Support in this case, I am sure you might get the best solution to solve your case.
Hope this helps you!
Regards,
Varun Shringarpure
If you have a staging environment I would recommend taking a copy of database and try to shrink the database. Part of the database size might also be related to the transaction log.
if you have a DBA please have him (her) involved.