I am consuming data from the Active Directory connector. I start to get the data, but at around row 20,000 I get this error:
I am pretty much following this guide:
https://matt40k.uk/2016/06/getting-a-list-of-ad-groups-and-their-members-using-powerquery/
PowerQuery:
I also remove nulls before consuming:
I am consuming only 2 columns:
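The query itself is not reproduced here, but as a rough sketch of those two steps in M, loosely following the navigation pattern from the linked guide (the domain name and column names below are placeholders, not the real ones), it might look something like this:

let
    Source = ActiveDirectory.Domains("contoso.local"),
    Categories = Source{[Domain = "contoso.local"]}[#"Object Categories"],
    Groups = Categories{[Category = "group"]}[Objects],
    // drop rows where the key column is null before expanding anything
    NonNull = Table.SelectRows(Groups, each [displayName] <> null),
    // keep only the two columns of interest
    TwoColumns = Table.SelectColumns(NonNull, {"displayName", "distinguishedName"})
in
    TwoColumns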
Any idea how I can prevent it from failing?
I am using Power Query to fetch data from Zabbix using their API. It works fine when I fetch the data for a few days, but as I increase the period and the amount of data surpasses millions of rows, I just get the error below after some time waiting, and the query doesn't return anything.
I am using Web.Contents to get the data as follows:
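The original call is not preserved here; below is only a placeholder sketch of a Web.Contents request against the Zabbix JSON-RPC API with an explicit Timeout option (URL, method, parameters, and token are all made up):

let
    // placeholder URL, method, and token for the Zabbix JSON-RPC API
    url = "http://zabbix.example.com/zabbix/api_jsonrpc.php",
    body = Json.FromValue([
        jsonrpc = "2.0",
        method = "history.get",
        params = [history = 0, limit = 100000],
        auth = "YOUR_API_TOKEN",
        id = 1
    ]),
    response = Web.Contents(url, [
        Headers = [#"Content-Type" = "application/json"],
        Content = body,                  // providing Content makes this a POST
        Timeout = #duration(0, 0, 5, 0)  // 5 minutes
    ]),
    result = Json.Document(response)
in
    result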
I have added that timeout, as you can see above, but the error happens well before 5 minutes have passed. How should I solve this? Are there ways to fetch large amounts of data in Power Query in parts, rather than all at once? Or does this error happen because of connection parameters inherent to the Zabbix configuration?
My team changed all the possible parameters regarding server memory and nothing seemed to work. One thing to note: although Power Query seems to hit the same (500) Internal Server Error whether I fetch data for a period of 3 days or 30 days, in the first case the error shows up much faster, while in the latter it takes much longer and eventually ends with the same error.
Thanks!
It's a PHP memory limit being hit; you should raise the maximum memory.
For example, in a standard Apache setup you should edit /etc/httpd/conf.d/zabbix.conf and raise the php_value memory_limit to a larger value (and restart Apache!).
The default is 128M; the "right" setting depends on the memory available on the system and the maximum amount of data you want to fetch.
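For reference, the relevant line in /etc/httpd/conf.d/zabbix.conf would look something like this (512M is just an example value; pick one that fits your server's RAM):

# /etc/httpd/conf.d/zabbix.conf (inside the PHP settings block)
php_value memory_limit 512M

# then restart Apache, e.g. on a RHEL/CentOS system:
# systemctl restart httpd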
I have about 2 million nodes and their transaction connections stored in a Neptune database.
I am trying two different queries that hit similar issues, and I don't know how to solve either of them.
The first query tries to generate a 2-hop graph starting from one user. The query is g.V(source).outE('friend').otherV().outE('friend').toList(). For a 1-hop graph the query works fine, but for 2 hops or more I get the following error:
gremlin_python.driver.protocol.GremlinServerError: 598: {"detailedMessage":"A timeout occurred within the script or was otherwise cancelled directly during evaluation of [1e582e78-bab5-462c-9f24-5597d53ef02f]","code":"TimeLimitExceededException","requestId":"1e582e78-bab5-462c-9f24-5597d53ef02f"}
The second query finds a path (it does not need to be the shortest path, just any path) from a source node to a target node. The query to do this is the following: g.V().hasId(str(source)).repeat(__.out().simplePath()).until(__.hasId(str(target))).path().limit(1).toList()
The query works for pairs of nodes that are relatively close (at most 4 hops apart), but for pairs of nodes further apart I get the following error:
*** tornado.ioloop.TimeoutError: Operation timed out after 30 seconds
I was wondering if anyone has suggestions on how to solve these timeout errors. I would really appreciate any help with this, thanks!
This is a known bug in the TinkerPop 3.4.9 Python client. Please see the thread on the Gremlin users mailing list for details of the issue and the workaround:
https://groups.google.com/g/gremlin-users/c/K0EVG3T-UrM
You can change the 30-second timeout using the following code snippet.
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.tornado.transport import TornadoTransport

graph = Graph()
# endpoint is your Gremlin server / Neptune websocket URL; timeouts of None disable the 30s client-side limit
connection = DriverRemoteConnection(endpoint, 'g',
    transport_factory=lambda: TornadoTransport(read_timeout=None, write_timeout=None))
g = graph.traversal().withRemote(connection)
First of all, I'm very new to Informatica PowerCenter and PowerExchange.
We are using Informatica PowerCenter and PowerExchange to replicate CDC data from our DB2 source to a PostgreSQL database. We have one workflow in which 7 tables are mapped, and we get the results in our PostgreSQL. It works fine so far, but performance is lacking. The size of the data is not the problem; it's the delay before results appear in the target DB.
When I insert or delete some data on DB2 (just 10 rows or so in one database), I mostly see the results in our PostgreSQL after about 10-30 seconds (very rarely in less than 5 seconds).
My goal is to reduce this delay. Is this possible? What would I need for that?
I played a little with the commit interval and the DTM buffer size, but nothing helped much.
I also have the feeling that when I configure the workflow to run continuously, it's even slower compared to when I execute the workflow after making the inserts/deletes.
Thanks in advance
I am working in a production environment. Yesterday I accidentally made permanent changes to the Master dataset while trying to get a sample out of it into the work directory. Unfortunately, there is no backup of this data.
I wanted to execute this:
Data work.facttable;
Set Master.facttable(obs=10);
run;
Instead of this, I accidentally executed the following:
data Master.facttable;
set Master.facttable(obs=10);
run;
You can clearly see what sort of blunder it was!
The fact table had been built up over nearly 2 years; it was 250 GB and had millions of rows. Now it has 10 rows and is 128 KB :(
I am very worried about how to recover the data. It is crucial for the business teams, and I have no idea how to proceed to get it back.
I know that SAS doesn't support rollback or any recovery process, and we don't use the audit trail method either.
I am just wondering if there is any way we can still get the data back in spite of all this.
Details: the dataset is assigned on the SPDE engine. I checked the data files (.dpf), but they have all disappeared except yesterday's data file, which is 128 KB.
You appear to have exhausted most of the simple options already:
Restore from external/OS-level backup
Restore from previous generation via the gennum= data set option (only available if the genmax option was set to 1+ when creating the dataset); see the sketch after this list for what that would look like.
Restore from SAS audit trail
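If generations had been enabled on the library, the gennum= restore mentioned above would look roughly like this; it is only a hypothetical sketch and does not apply when genmax was never set:

/* hypothetical: only works if Master.facttable was created with genmax= > 0 */
data work.restored;
    set Master.facttable(gennum=-1);  /* -1 = the most recent historical generation */
run;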
I think that leaves you with just 2 options:
Rebuild the dataset from the underlying source(s), if you still have them.
Engage the services of a professional data recovery company, who might be able to recover some or all of the deleted files, depending on the complexity of your storage environment, and how much of the original 250GB has since been overwritten.
Either way, it sounds as though this may prove to have been an expensive mistake.
I'm trialing FluentMigrator as a way of keeping my database schema up to date with minimum effort.
For the release I'm currently building, I need to run a database script to make a simple change to a large number of rows of existing data (around 2% of 21,000,000 rows need to be updated).
There's too much data to be updated in a single transaction (the transaction log gets full and the script aborts), so I use a WHILE loop to iterate through the table, updating 10,000 rows at a time, each batch in a separate transaction. This works, and takes around 15 minutes to run to completion.
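For illustration, assuming SQL Server (which the transaction-log behaviour suggests), the batched-update pattern described above looks roughly like this sketch; the table name, column, and predicate are placeholders rather than the actual script:

-- hypothetical batched update: table and column names are placeholders
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (10000) dbo.BigTable
    SET    NeedsFix = 0
    WHERE  NeedsFix = 1;          -- only touch rows not yet updated

    SET @rows = @@ROWCOUNT;       -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END;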
Now that the script is complete, I'm trying to integrate it into FluentMigrator.
FluentMigrator seems to run all the migrations for a single batch in one transaction.
How do I get FM to run each migration in a separate transaction?
Can I tell FM to not use a transaction for a specific migration?
This is not possible as of now.
There are ongoing discussions and some work already in progress.
Check it out here: https://github.com/schambers/fluentmigrator/pull/178
But your use case will surely help push things in the right direction.
You are welcome to take part in the discussion!
Maybe someone will find a temporary workaround?