Unix command in Informatica

My requirement is to get the source count, the target count, and the count of rows that did not change when data is loaded via Informatica (example: source: 100, target: 50, ignored/no change: 50).
I need to achieve this using pre-session and post-session command tasks. Can anyone help me with the scripts?
Thanks for the help in advance.

Try grepping the information from the session log, which gets created for every run. If you are still looking for the script, let me know.
Example:
If you go to the bottom of the session log, you will see a session load summary like the one below:
SESSION LOAD SUMMARY
Output Rows [44768], Affected Rows [44768], Applied Rows [44768], Rejected Rows [0]
So run grep 'Output Rows' <session log file> in the post-session command.
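A minimal post-session sketch along those lines; the log directory and session name below are assumptions, so adjust them to your environment:

#!/bin/sh
# Hypothetical post-session command script: pull row counts from the session log.
SESS_LOG=/infa_shared/SessLogs/s_m_my_session.log   # assumed log path and name

# Grab the last load-summary line, e.g.:
# Output Rows [44768], Affected Rows [44768], Applied Rows [44768], Rejected Rows [0]
summary=$(grep 'Output Rows' "$SESS_LOG" | tail -1)

applied=$(echo "$summary" | sed 's/.*Applied Rows \[\([0-9]*\)\].*/\1/')
rejected=$(echo "$summary" | sed 's/.*Rejected Rows \[\([0-9]*\)\].*/\1/')
echo "Target (applied): $applied, Rejected: $rejected"

If your log also prints source-side read statistics, grep those lines the same way; the ignored count is then source minus applied.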

Why don't you use the metadata tables? Write a Unix script on the Informatica server to get the SuccessfulRows count (and all the other counts you want) from them and call it using a post-session command; that will also help you build an alert mechanism around the counts using mailx. Just see the illustration below.
Let me know if you need more details on this.
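As a hedged illustration of that approach (the connection string, session name, and mail address are placeholders, and you should verify the REP_SESS_LOG view and its column names against your repository version):

#!/bin/sh
# Hypothetical post-session script: read row counts from the repository
# metadata view REP_SESS_LOG and mail them out as an alert.
COUNTS=$(sqlplus -s repo_user/repo_pwd@REPO_DB <<EOF
set heading off feedback off pagesize 0
SELECT successful_source_rows || ' ' || successful_rows
FROM rep_sess_log
WHERE session_name = 's_m_my_session'
ORDER BY session_timestamp DESC
FETCH FIRST 1 ROW ONLY;
EOF
)
# FETCH FIRST needs Oracle 12c+; use a ROWNUM subquery on older versions.
SRC=$(echo $COUNTS | awk '{print $1}')
TGT=$(echo $COUNTS | awk '{print $2}')
echo "Source: $SRC, Target: $TGT, Ignored: $((SRC - TGT))" | mailx -s "Session load counts" ops@example.com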

Short answer: refer to the REP_SESS_LOG view in the Informatica metadata repository database.

Related

How to store the result of a Google Dataproc query inside a variable (GCP)

I have a requirement wherein I need to count the number of records in a gcloud Hive table and store this result in a variable.
Below is the code for the same:
test=$(gcloud dataproc jobs submit hive --cluster=$CLUSTER --region=$REGION --execute="select count(*) from db.table;")
However, the above variable does not store the count of records; it stores some log output which is not useful for me.
Can someone please help me find out how we can redirect the output of the above query into a variable?
The output of the gcloud command actually consists of two parts: stderr and standard output. The part that includes the count is actually inside stderr. The following command does the trick:
cnt_output=$((gcloud dataproc jobs submit hive --cluster=$CLUSTER --region=$REGION --execute "select count(*) from db.table;" 1>/dev/null) 2>&1)
This discards standard output first and then redirects stderr to standard output so that it can be saved into a variable, i.e. cnt_output.
After that you can use the tools mentioned in the other answer to capture the number you want.
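For example, something like this could strip the count out of the captured text (a sketch only: the exact log format varies between gcloud versions, so test the pattern against your own output and tighten it if timestamps or job IDs also match):

# Pull the last bare number out of the captured stderr text.
cnt=$(echo "$cnt_output" | grep -Eo '[0-9]+' | tail -1)
echo "Row count: $cnt"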
My guess is the output you mention includes both the logs for the Hive command and the output you want. It sounds like you just want the latter.
I'd recommend using something like grep, sed, or Python to capture the output. If you know regular expressions (regex), this should be pretty easy; this is a good example of what you might want to do. If you have not used regex before, a regex builder like this one will be useful.

Having trouble scraping a specific site with Scrapy properly

I went over the tutorial for Scrapy, and I was able to understand how to scrape the site included in the tutorial. But I'm having a little trouble with some of the more complicated sites (at least to me).
I'm attempting to scrape the rows and columns of the insider transactions from this webpage:
http://finviz.com/insidertrading.ashx
I'm using command prompt commands with Scrapy to test whether I'm able to scrape the necessary information, so the following commands are what I've written in the command prompt.
scrapy shell "http://finviz.com/insidertrading.ashx"
I then used Firebug in Firefox to look at the HTML code of the page.
I'm able to get some of the information (Stock Name, Name of the Insider and Date) into a list via this code:
response.css('td a.tab-link::text').extract()
However, the rest of the info is missing.
I'm able to get some (maybe most) of the missing info (Cost, Shares, Value, etc.) via this code:
response.css('td::text').extract()
I can't figure out how to cleanly get all info together in one scrape.
Thanks.
EDIT: The other option would be to collect the data iteratively, one row at a time, so I can separate it as I like. I'm brooding over this as well.
Since the data is tabular, the position of table rows and columns is predictable and stable. You can simply extract all the text in a row and unpack it into variables:
for row in response.xpath("//tr[@class='insider-option-row']"):
    items = row.xpath('td/a/text() | td/text()').extract()
    ticker, owner, relationship, date, transaction, cost, shares, value, shares_total, sec_form_4 = items

How do I save the web service response to the same excel sheet I extracted the data from?

For example:
Using the given sample HP Flights SampleAppData.xls and the CreateFlightOrder operation, we can link the data to the test functions and get an OrderNumber and Price response from the web service. And in the SampleAppData.xls Input tab, we can see that there is an empty OrderNumber column.
So here is my question: is there any way I can take the OrderNumber response and fill the empty column in SampleAppData.xls?
My point in doing this is that, let's say, I have many test cases to run and they will take days; I run a certain test today and will need today's result for the next day's test.
Although I know that the responses are saved in the results, it defeats the point of automation if I am required to check the response for each and every test case.
Yes, of course you can. There are a number of ways to do this. The simplest is as follows:
'DataTable.Value("columnName","sheetName")="Value"
DataTable.Value("Result","Action1")="Pass"
Once you have recorded the results in the data sheet, you can export them using:
DataTable.ExportSheet("C:\SavePath\Results.xls")
You can write the response back programmatically if you have already imported the file manually.
You can use the GetDataSource class of the UFT API. It works like this: let's say you imported an Excel file FlightSampleData.xls and named it FlightSampleData; accessing its input sheet would look like this:
GetDataSource("FlightSampleData!input").Set(ROW, ColumnName, yourValue);
GetDataSource("FlightSampleData!input").Get(ROW, ColumnName);
For exporting, you can use the ExportToExcelFile method of the GetDataSource class after your test run. Please let me know if you have any further questions about this.

Trigger Informatica workflow based on the status column in an Oracle table

I want to implement the scenario below without using a PL/SQL procedure or trigger.
I have a table called emp_details with columns (empno, ename, salary, emp_status, flag, date1).
If someone updates the columns to emp_status='abc' and flag='y', Informatica WF1, which would be in continuous running status, checks for the emp_status value 'abc'.
If it finds any records, it queries all of them and invokes WF2.
WF1 will pass the values ename, salary, and date1 to WF2 (WF2 will insert the records into the table emp_details2).
How can I do this using an Informatica approach instead of PL/SQL or a trigger?
If you want to achieve this in real time, write the output of WF1 to a message queue and have the second workflow WF2 subscribe to the message queue produced by WF1.
If you have a batch process in place, produce an output file from WF1 and use this output file in WF2. You can easily set up this dependency using job schedulers.
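For the batch route, the dependency can be as simple as a scheduler script that chains the two workflows with pmcmd (a sketch only: the service, domain, credentials, folder, and workflow names are placeholders):

#!/bin/sh
# Run WF1, wait for it to finish successfully, then kick off WF2.
pmcmd startworkflow -sv INT_SVC -d DOMAIN -u user -p pwd -f MY_FOLDER -wait wf_WF1 &&
pmcmd startworkflow -sv INT_SVC -d DOMAIN -u user -p pwd -f MY_FOLDER -wait wf_WF2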
I don't understand why you need two workflows in the first place. Why not accomplish the emp_details2 table updates with the very same workflow that is looking for differences?
Anyway, this can be done using an indicator file:
WF1, running continuously, should create a file if any changes have been found.
WF2 should be running continuously with an Event Wait task set to wait for the indicator file specified above. Once the file is found, WF2 should use a Command task to rename/delete the file and then fetch the desired data from the source to populate the emp_details2 table.
If you need it this way, you can pass the data through the indicator file.
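A minimal sketch of the indicator-file mechanics (the path and file name are assumptions): the command WF1 runs on finding changes could be

touch /infa_shared/TgtFiles/emp_changes.ind    # signal WF2 that changes exist

and the cleanup command WF2 runs after its Event Wait fires could be

rm -f /infa_shared/TgtFiles/emp_changes.ind    # reset before loading emp_details2

so the next wait cycle starts from a clean state.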
You can do this in a single workflow. Create a dummy session which checks for the flag in the table, then divide the flow into two branches based on the link conditions below:
Flow one: link condition $YourDummySession.Status = SUCCEEDED AND $YourDummySession.SrcSuccessRows >= 1; this branch runs your actual session, which loads the data.
Flow two: link condition $YourDummySession.Status = SUCCEEDED AND $YourDummySession.SrcSuccessRows = 0; connect this branch to a Control task and mark the workflow as complete.
Make sure you schedule the workflow at the Informatica level to run continuously.
Cheers

<cfquery> not retrieving DATA

I am unable to retrieve any data from my cfquery. The same query, when I run it in SQL Developer, returns results.
Any reason why?
Hi all, thanks for the responses. Sorry, it was my fault.
It was a data issue: I was retrieving uncommitted data from CF.
You can also build the query in CFEclipse, test it, and then paste it into your cfquery tag.
Also check how you have written the query name in the cfoutput tag; many times I've put #queryname# instead of queryname in cfoutput.
Is the query actually being run?
If you turn debugging on, does the query show as being executed?
Also, when you say you run the same query, do you mean you copy/paste the query from the debugger into SQL Developer?
Perhaps the same variable values are not being included (if you are using variables in there).