Informatica workflow polling for a file in an FTP location

I want to develop a workflow that runs continuously, looking for a file.
The source file data looks like this:
eno
10
20
30
40
Once the file is received in the FTP location, the workflow should automatically pick it up and load it into the target table.
The output of the target table <EMP_TGT> will be as below:
eno | Received
--- | -------
10 | Y
20 | Y
30 | Y
40 | Y
50 |
60 |
70 |
80 |
The condition to load the target table would be: Update EMP_TGT set Received='Y' where eno='<flat_file_Eno>'

You could use the EventWait task, but:
- You need an exact file name to wait for; a pattern cannot be used.
- That is no problem if you can access the path directly, but I'm not sure whether EventWait can monitor an FTP location.
The other way to implement it would be to have the workflow scheduled to run, e.g., every 10 minutes and try to fetch the file from the FTP location.
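Outside of Informatica itself, the polling half of that scheduled approach is easy to picture. Below is a minimal Python sketch, purely for illustration: the host, credentials, and file name are placeholder assumptions, and the actual load into EMP_TGT would still be done by the Informatica session.
```
import ftplib
import time

FTP_HOST = "ftp.example.com"    # placeholder host
FTP_USER = "user"               # placeholder credentials
FTP_PASS = "password"
REMOTE_FILE = "emp_source.txt"  # placeholder: the exact file name being waited for

def poll_once():
    """Return True if the file was found and downloaded, else False."""
    ftp = ftplib.FTP(FTP_HOST)
    ftp.login(FTP_USER, FTP_PASS)
    try:
        if REMOTE_FILE in ftp.nlst():
            with open(REMOTE_FILE, "wb") as fh:
                ftp.retrbinary("RETR " + REMOTE_FILE, fh.write)
            return True
        return False
    finally:
        ftp.quit()

# Check every 10 minutes until the file shows up, then hand off to the load step.
while not poll_once():
    time.sleep(600)
```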

Related

List of Powered off VMs with date of the event, beyond 30 days

I am trying to write a script to get the powered-off date and time of VMs beyond 30 days (i.e. for all time since the vSphere setup). I came to know that those timestamps are only available if I parse the latest vmware.log of each powered-off VM and check the last date string in it.
The script I have included below only gives me the output for the last 30 days:
```
# Collect all powered-off VMs and their most recent power-off event
$VMs = Get-VM | Where-Object {$_.PowerState -eq "PoweredOff"}
Get-VIEvent -Entity $VMs -MaxSamples ([int]::MaxValue) |
    Where-Object {$_ -is [VMware.Vim.VmPoweredOffEvent]} |
    Group-Object -Property {$_.Vm.Name} | ForEach-Object {
        $lastPO = $_.Group | Sort-Object -Property CreatedTime -Descending | Select-Object -First 1
        $vm = Get-VIObjectByVIView -MORef $_.Group[0].Vm.Vm
        $column = '' | Select-Object VMName, Powerstate, PowerOFF
        $column.VMName = $vm.Name
        $column.Powerstate = $vm.PowerState
        $column.PowerOFF = $lastPO.CreatedTime
        $column
    }
```
I expect results beyond 30 days with the help of the vmware.log files; currently I only get results for the last 30 days.
There are a couple of potential causes. First, you don't seem to be actually limiting the time range to 30 days or older; you're grabbing all of the events as a whole. Second, vCenter rolls its event logs up at certain intervals, which turns them into zip files that the Get-VIEvent cmdlet can no longer read. This is where a log parser comes into play, something like Log Insight or Splunk/SolarWinds/Nagios.

How can I visualize timeseries data aggregated by more than one dimension on AWS insights?

I'd like to use CloudWatch Insights to visualize a multi-line graph of average latency by host over time: one line for each host.
This stats query extracts the latency and aggregates it in 10-minute buckets by host, but it doesn't generate any visualization:
stats avg(latencyMS) by bin(10m), host
bin(10m) | host | avg(latencyMS)
-------- | ---- | --------------
0m | 1 | 120
0m | 2 | 220
10m | 1 | 130
10m | 2 | 230
The docs call this out as a common mistake but don't offer any alternative.
The following query does not generate a visualization, because it contains more than one grouping field.
stats avg(myfield1) by bin(5m), myfield4
(AWS docs)
Experimentally, CloudWatch will generate a multi-line graph if each record has multiple keys. A query that would generate a line graph must return results like this:
bin(10m) | host-1 avg(latencyMS) | host-2 avg(latencyMS)
-------- | --------------------- | ---------------------
0m | 120 | 220
10m | 130 | 230
I don't know how to write a query that would output that.
Parse the individual messages for each host, then compute their stats.
For example, to get the average latency of responses from processes with PID=11 and PID=13:
parse @message /\[PID:11\].* duration=(?<pid_11_latency>\S+)/
| parse @message /\[PID:13\].* duration=(?<pid_13_latency>\S+)/
| display @timestamp, pid_11_latency, pid_13_latency
| stats avg(pid_11_latency), avg(pid_13_latency) by bin(10m)
| sort @timestamp desc
| limit 20
The regular expressions extract the duration for processes with IDs 11 and 13 into the parameters pid_11_latency and pid_13_latency respectively, and fill in null where there is no match, series-wise.
You can build on this example by writing match expressions that extract the metrics from @message for the hosts you care about.

Making SQLite run SELECT faster

Situation: I have about 40 million rows, 3 columns of unorganised data in a table in my SQLite DB (~300MB). An example of my data is as follows:
| filehash | filename | filesize |
|------------|------------|------------|
| hash111 | fileA | 100 |
| hash222 | fileB | 250 |
| hash333 | fileC | 380 |
| hash111 | fileD | 250 | #Hash collision with fileA
| hash444 | fileE | 520 |
| ... | ... | ... |
Problem: A single SELECT statement could take between 3 to 5 seconds. The application I am running needs to be fast. A single query taking 3 to 5 seconds is too long.
#calculates hash
md5hash = hasher(filename)
#I need all 3 columns so that I do not need to parse through the DB a second time
cursor.execute('SELECT * FROM hashtable WHERE filehash = ?', (md5hash,))
returned = cursor.fetchall()
Question: How can I make the SELECT statement run faster (I know this sounds crazy but I am hoping for speeds of below 0.5s)?
Additional information 1: I am running a Python 2.7 program on an RPi 3B (1GB RAM, default 100MB swap). I am asking mainly because I am afraid that it will crash the RPi due to 'not enough RAM'.
For reference, when reading from the DB normally with my app running, we are looking at a maximum of 55MB of free RAM, with a few hundred MB of cached data; I am unsure if this is the SQLite cache (swap has not been touched).
Additional information 2: I am open to using other databases to store the table (I was looking at either PyTables or ZODB as a replacement - let's just say that I got a little desperate).
Additional information 3: There are NO unique keys, as the SELECT statement looks for a match in a column that contains just hash values, which apparently have collisions.
Currently, the database has to scan the entire table to find all matches. To speed up searches, use an index:
CREATE INDEX my_little_hash_index ON hashtable(filehash);
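In the Python program from the question, the index only needs to be created once; after that the same SELECT becomes an index seek instead of a full table scan. A minimal sketch (the database path and the example hash are placeholders; the table, column, and index names follow the question and answer above):
```
import sqlite3

conn = sqlite3.connect("files.db")   # placeholder path to the ~300MB database
cursor = conn.cursor()

# One-time, idempotent index creation on the lookup column.
cursor.execute("CREATE INDEX IF NOT EXISTS my_little_hash_index "
               "ON hashtable(filehash)")
conn.commit()

# The original lookup, now served by the index.
md5hash = "hash111"                  # placeholder hash from the example data
cursor.execute("SELECT * FROM hashtable WHERE filehash = ?", (md5hash,))
returned = cursor.fetchall()
print(returned)
```
Building the index on 40 million rows takes some time and disk space up front, but each subsequent lookup no longer has to scan the whole table.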

Compound Queries in Amazon CloudSearch

I want to get a GROUP BY-style result from AWS CloudSearch.
user | expense | status
---- | ------- | ------
1 | 1000 | 1
1 | 300 | 1
1 | 700 | 2
1 | 500 | 2
2 | 1000 | 1
2 | 1200 | 3
3 | 200 | 1
3 | 600 | 1
3 | 1000 | 2
Above is my table structure. I want the total expense for each user. The expected answer is:
{user:1, expense_count:2500}, {user:2, expense_count:2200}, {user:3, expense_count:1800}
I want to GROUP BY the user column and sum the total expenses of each respective user.
There is no (easy) way to do this in CloudSearch, which is understandable when you consider that your use case is more like a SQL query and is not really what I would consider a search. If what you want to do is look up users by userId and sum their expenses, then a search engine is the wrong tool to use.
CloudSearch isn't meant to be used as a datastore; it should return minimal information (ideally just IDs), which you then use to retrieve data. Here is a blurb about it from the docs:
You should only store document data in the search index by making
fields return enabled when it's difficult or costly to retrieve the
data using other means. Because it can take some time to apply
document updates across the domain, you should retrieve critical data
such as pricing information by using the returned document IDs instead
of returned from the index.
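To make the "this is really a SQL query" point concrete, here is a short Python sketch of the same aggregation done in a relational datastore. sqlite3 is used purely as a stand-in for whatever store holds the records; the data is the example from the question:
```
import sqlite3

# In-memory stand-in for the datastore that holds the expense rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (user INTEGER, expense INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [(1, 1000, 1), (1, 300, 1), (1, 700, 2), (1, 500, 2),
     (2, 1000, 1), (2, 1200, 3),
     (3, 200, 1), (3, 600, 1), (3, 1000, 2)],
)

# The GROUP BY the question is asking for.
for user, total in conn.execute(
        "SELECT user, SUM(expense) FROM expenses GROUP BY user"):
    print({"user": user, "expense_count": total})
# {'user': 1, 'expense_count': 2500}
# {'user': 2, 'expense_count': 2200}
# {'user': 3, 'expense_count': 1800}
```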

Is there any DAX expression for calculating the SUM of a few rows based on an index?

Suppose I have the following data:
MachineNumber | Duration
------------- | --------
01 | 234
01 | 200
01 | 150
02 | 320
02 | 120
02 | 100
I want to know a DAX query which can add 234 + 200 + 150, since those rows belong to machine 01, and give me the sum.
If you want to see the MachineNumber values displayed as in the table above, you have to avoid the automatic sum on MachineNumber. One way is a transformation in the Power Query editor: specify that MachineNumber is a string instead of a number.
To find the total duration machine-wise, I chose the SUM option on the Duration field and the machine-wise total sum was displayed.