Running a Gremlin query with union takes more time than running the individual queries separately.
Does anybody know the reason behind this?
I have a generic question; I am looking for a solution to a problem.
Currently we generate reports directly from an Oracle database. For performance reasons, we want to migrate the data from Oracle to whichever AWS service would perform better, and then feed the data from that service into our reporting software.
Could you please suggest which service would be ideal for this?
Thanks,
Vishwajeet
To answer well, additional info is needed:
How much data is needed to generate a report?
Are there any transformed/computed values needed?
What is good performance? 1 second? 30 seconds?
What is the current query time on Oracle, and what kind of query is it? Joins, aggregations, etc.?
We've noticed that some of our queries have seen degraded performance in the last couple of weeks. We suspect this is due to some combination of:
Increased data in the tables
Increased data in some results
Inefficient or over-aggressive use of transactions
Any advice on how to diagnose the performance of a particular query?
When running an interactive query against your database in the Google Cloud Platform online management console, you can request a query plan explanation via the tab below the 'Run Query' button. This explanation may help you understand why your query is running slowly.
One common reason for performance regressions is that you have recently deleted or updated a lot of data. It can take several days for deleted/overwritten data to be garbage-collected, and in the interim it can slow down operations since this old data must still be scanned for queries over its key-range.
I have asked some questions before about increasing the performance of Hive queries. Some of the answers pertained to the number of mappers and reducers. I tried multiple mapper and reducer settings, but I didn't see any difference in execution time. I don't know why; maybe I did not do it the right way, or I missed something else.
I would like to know whether it is possible to execute Hive queries in parallel.
What I mean is that, normally, the queries get executed in a queue.
For instance:
query1
query2
query3
...
queryN
It takes too much time to execute and I want to reduce the execution time.
I need to know: if we use a MapReduce program through the Hive JDBC interface, is it possible to execute the queries in parallel?
I don't know whether that will work, but that is what I am aiming to achieve.
I am restating my questions below:
1) If it is possible to run multiple Hive queries in parallel, does it require multiple Hive Thrift Servers?
2) Is it possible to open multiple Hive Thrift Servers?
3) Am I right that it is not possible to open multiple Hive Thrift Servers on the same port?
4) Can we open multiple Hive Thrift Servers on different ports?
Please suggest a solution for this. If you have any other alternative, I will try that as well.
As you might already know, Hive is a SQL-like front-end to Hadoop and MapReduce. Any non-trivial query on Hive gets compiled to MapReduce and run on Hadoop. MapReduce is a parallel processing framework, so each of your Hive queries will run and process data in parallel.
Hive uses a FIFO scheduler by default to schedule jobs on Hadoop; therefore, only one Hive query can be executed at a given time, and the next query is executed when the first one is done. In most circumstances, I would suggest optimizing individual Hive queries instead of parallelizing multiple Hive queries. If you are inclined towards parallelizing Hive queries, it might be an indication that your cluster is being used inefficiently. To further analyze the performance and usage of your Hive queries, you can install a distributed monitoring system such as Ganglia to monitor the usage of your cluster (Amazon EMR supports it too).
Long story short, you don't have to write a MapReduce program; that's what you are using Hive for in the first place. However, if there is something you know about the data that Hive does not, it might result in sub-optimal performance of your Hive queries. For example, your data might be sorted by some column without Hive knowing about it. In such cases, if you can't set that additional meta-information in Hive, it might make sense to write a MapReduce job that takes it into account and potentially gives you better performance. In most cases, I have found Hive performance to be on par with the MapReduce jobs corresponding to the Hive query.
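If what you want is more parallelism within a single query, Hive exposes session settings for that. As a sketch (property names have shifted across Hive versions, so treat these as illustrative, not definitive):

```sql
-- hive.exec.parallel lets independent MapReduce stages of ONE query
-- run concurrently; it does not make separate queries run in parallel.
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;  -- max concurrent stages

-- Explicit reducer count for the session (older property name;
-- newer Hive versions use mapreduce.job.reduces)
SET mapred.reduce.tasks=16;
```

Note that these only help when a query's execution plan actually contains independent stages; a single linear chain of jobs gains nothing from them.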
If you are trying to diagnose slow queries in your MySQL backend and are using a Django frontend, how do you tie together the slow queries reported by the backend with specific querysets in the Django frontend code?
I think you have no alternative besides logging every Django query for the suspicious querysets.
See this answer on how to access the actual query for a given queryset.
If you install django-devserver, it will show you the queries that are being run and the time they take in your shell when using runserver.
Another alternative is django-debug-toolbar, which will do the same in a side-panel overlay on your site.
Either way, you'll need to test things in your development environment. Neither tool, however, pinpoints the offending queries directly; they work on a per-request basis. You'll therefore have to think about which of your views use the database most heavily and/or handle exceptionally large amounts of data; by cherry-picking likely candidate views and inspecting the query times on those pages, you should be able to get a handle on which particular queries are the worst.
One of our customers is complaining that our application is not working. Their reasoning is that our SQL function call to their Oracle database is not returning the "expected" result. Sometimes a call should fail, yet our application gets a success result from their database. It's really frustrating because it's their database and we cannot run any tests on it.
We are using the C++ Oracle OCCI API. Is there any way we can log the raw SQL from our end? That would be very helpful: we could ship the script to them and let them debug it on their system to figure out the problem.
Thanks in advance.
I assume that you are issuing just a SQL statement, since you say that you want to see the 'raw SQL from your end'. The best thing, then, is to get the database trace, as has been suggested.
What I want to point out is that even if your SQL returns the expected result in a test database, the same SQL may return an unexpected result in another database because the data may be different: the data may be corrupted, indexes may exist or may not exist, constraints may be defined or not, etc. Definitely, you need to get the trace from the database to be able to move forward.
Ideally you would turn on a trace at the database level which would generate a trace file containing all activity the database performed.
Another alternative would be to alter your application to log all SQL that it is about to execute against the database.
This post also goes into some other options (the authors approach it from the angle of detecting SQL injection) for sniffing database activity:
http://www.symantec.com/connect/articles/detecting-sql-injection-oracle
Though it must be set up on the database, a trace will give you the truest results. Oracle Fine-Grained Auditing is something else to look into if you are on Oracle 9i or higher.
Depending on the architecture, the statements sent across the network do not necessarily mirror the SQL that is executed. The obvious example is calling a stored procedure: the network traffic contains only the call, but the database executes all of the procedure's underlying SQL. Triggers, fine-grained access control, views, etc. can all have similar effects.
For the network transfer, you can look at SQL*Net traces.
For the database side, look at DBMS_MONITOR.
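For instance, a DBA can switch tracing on for your application's session with something like the following (a sketch assuming Oracle 10g or later; the SID and serial# values are placeholders you look up in V$SESSION first):

```sql
-- Find the target session, e.g.:
--   SELECT sid, serial# FROM v$session WHERE username = 'APP_USER';
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => 123,   -- placeholder SID
    serial_num => 456,   -- placeholder serial#
    waits      => TRUE,  -- include wait events in the trace
    binds      => TRUE); -- include bind variable values
END;
/
-- ...reproduce the problem, then switch tracing off:
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 456);
END;
/
```

The resulting trace file on the server contains every statement the session executed, including bind values, which is exactly what you would ship back and forth to debug the discrepancy.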