Fixed-width source file: need only the header and footer in the target (Oracle) - Informatica

I have a scenario where the source is a fixed-width flat file, and I need to load only the header and footer into the target, not the detail records.
For the header, I need to trim the first column (PA22109) down to just PA, and turn the next two columns into rows as two different dates.
For the footer (PT000000000700000030620E00000055612I00000010277I), I need PA's counterpart, PT, on its own and the rest in a separate column of the target.
How can I achieve this logic? Any input is appreciated.
Source file:
PA22109 00153252015110905408179 2015110820151108PO ---header
DE0E9D TESTGROUPEXCH TESTINSEXCH TESTLOCEXCH ID014 LNAME014 FNAME014 14 MAIN ST ANYWHERE NJ011110000 195001012Z 01000000014 LNAME014 PATFIRST014 14 MAIN ST ANYWHERE NJ011110000 1955010110106000220 TESTGROUPEXCH 8179 TESTBENEXCH TESTCNTE53 0000000000 0000002643005 011234567890 011234567890 1234 TEST PHARMACY TEST PHARMACY LANE PHARMACYTOWN NJ09876 5555555555 11Y5 019876543210 019876543210 NJPRESCLAST PRESCFIRST 5555555551 DRLAST DRFIRST 110110000009770990300406048410 2015092720150927154401000000000000120150929 0000100000000000000000000000000
PT000000000700000030620E00000055612I00000010277I --Footer

As this is a fixed-width file, you can do the following to meet your requirement.
In your Informatica mapping, read each row as a single column.
In an Expression transformation, flag each record that does not start with PA or PT (assuming your detail records never start with PA or PT). Then drop the flagged detail records with a Filter transformation.
Now you have only the header and footer records.
From there, apply the respective logic for the PA and PT records in an Expression transformation.
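Outside Informatica, the same flag-filter-split logic can be sketched in a few lines of Python. This is only an illustration of the steps above, not the mapping itself. It assumes the two dates are the first 16 characters of the third whitespace-separated token, as in the sample header; in a real fixed-width file you would use the offsets from the file layout instead. The function name `parse_header_footer` and the shortened detail line are made up for the example.

```python
from datetime import datetime

def parse_header_footer(lines):
    """Keep only PA (header) and PT (footer) records and split them up."""
    rows = []
    for line in lines:
        record_type = line[:2]
        if record_type == "PA":
            # Assumption: the third token starts with two YYYYMMDD dates.
            dates = line.split()[2]
            for raw in (dates[0:8], dates[8:16]):
                parsed = datetime.strptime(raw, "%Y%m%d").date()
                rows.append(("PA", parsed.isoformat()))
        elif record_type == "PT":
            # Footer: the record type, then everything after it in one column.
            rows.append(("PT", line[2:].strip()))
        # Any other record (the detail lines) is filtered out.
    return rows

sample = [
    "PA22109 00153252015110905408179 2015110820151108PO",
    "DE0E9D TESTGROUPEXCH TESTINSEXCH TESTLOCEXCH ID014",  # shortened detail line
    "PT000000000700000030620E00000055612I00000010277I",
]
for row in parse_header_footer(sample):
    print(row)
```

The header yields two rows, one per date, and the footer yields one row with PT and the remainder of the record.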

Dynamic Google Sheets Column + Row formula

I have a Google Sheet where I want to take the header, which is a date-time stamp, match it against another sheet, find the entries with that date, suburb and type, and get an average cost.
My formula is =AVERAGEIFS(Sheet1!C:C,Sheet1!A:A, B11:B, Sheet1!F:F, C10), which gives me the average, but I've hard-coded the header date (C10).
What I want to do is dynamically pick up the date-time from the row above instead of manually adding it to the formula, something like this:
=AVERAGEIFS(Sheet1!C:C,Sheet1!A:A, B11:B, Sheet1!F:F, =CHAR(COLUMN()+64) & 10)
This would automatically build the column letter plus row 10, e.g. C10, D10, E10.
If I put =CHAR(COLUMN()+64) & 10 in its own cell it works, but when I add it as an AVERAGEIFS condition it gives me a parsing error.
I'm expecting C10, D10, E10 from =CHAR(COLUMN()+64) & 10, which should let me dynamically filter the data on the date in the header above it.
Drop the inner = and wrap the expression in INDIRECT, which turns the text "C10" into an actual cell reference. Try:
=AVERAGEIFS(Sheet1!C:C, Sheet1!A:A, B11:B, Sheet1!F:F, INDIRECT(CHAR(COLUMN()+64)&10))

How to format the first 7 rows of this txt file using regex

I have a text file with data formatted as below. I figured out how to format the second part of the file for upload into a db table, but I'm hitting a wall trying to get just the first 7 lines formatted in the same way.
If it wasn't obvious, I'm trying to get the file pipe-delimited with the exact same number of columns on every row, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part. Using the following regex and substitution, I can format that part to look like the output below:
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
Substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
I just need help formatting the top part to look like this, so the entire file has the exact same number of columns:
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I helped you with your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
Use | as the replacement string.
If you don't want to keep the space before Period, you can use this instead:
\n|\s(?=Period)
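As a rough sketch of how this could be applied in a script, the second regex can be run over only the header lines, with the three trailing pipes from the desired output appended afterwards. The function name `format_header` and the `header_lines` parameter are made up for this example, and it assumes the header is always the first 7 lines of the file.

```python
import re

def format_header(text, header_lines=7):
    """Join the first header_lines lines with pipes; leave the rest untouched."""
    lines = text.splitlines()
    header = "\n".join(lines[:header_lines])
    rest = lines[header_lines:]
    # Replace each newline, and the space in front of "Period", with a pipe.
    header = re.sub(r"\n|\s(?=Period)", "|", header)
    # Pad with the three empty trailing columns shown in the desired output.
    return "\n".join([header + "|||"] + rest)

sample = (
    "Year: 2019 Period: 03\n"
    "Office: NY\n"
    "Dept: Sales\n"
    "Acct: 111222333\n"
    "SubAcct: 11122234-8\n"
    "blahblahblahblahblahblahblah\n"
    "Status: Pending\n"
    "1000\n"
    "AAAAAAAAAA\n"
    "100,000.00"
)
print(format_header(sample))
```

Restricting the substitution to the header keeps the detail lines intact, so the three-line regex from the earlier answer can still be run on them in a second pass.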

Extracting the 'end' of a string, conditions or regular expression?

I have a data table, which can be extracted to text or a spreadsheet. One column has free text containing areas in square metres, which I want to copy to a new column in hectares (so parse the text and divide by 10,000).
e.g.
Deposited Plan 172499, 53,310 m2
Deposited Plan 166167, 853 m2
This plan has no area stated
Section 21 Block I Wellington District, 403,573 m2
Output column should have:
5.3310
0.0853
40.3573
Is there a way I can automate this in LibreOffice Calc, or with a regular expression editor like TextCrawler? Or perhaps using an AutoIt script?
Try this regex, which matches the area at the end of the line:
/([0-9]+,*[0-9]+\sm2)$/
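If scripting is an option, the extraction and the division by 10,000 can be sketched in Python. Note that this uses a slightly different pattern from the one above: it captures only the digits and commas (not the " m2" suffix) and allows any number of comma groups. The function name `area_in_hectares` is made up for the example, and rows without an area are returned as None.

```python
import re

# Digits with optional thousands separators, followed by "m2" at the end.
AREA_RE = re.compile(r"([0-9][0-9,]*)\s*m2$")

def area_in_hectares(text):
    """Return the trailing area in hectares, or None if no area is stated."""
    match = AREA_RE.search(text.strip())
    if match is None:
        return None
    square_metres = int(match.group(1).replace(",", ""))
    return square_metres / 10000

rows = [
    "Deposited Plan 172499, 53,310 m2",
    "Deposited Plan 166167, 853 m2",
    "This plan has no area stated",
    "Section 21 Block I Wellington District, 403,573 m2",
]
for row in rows:
    hectares = area_in_hectares(row)
    print("" if hectares is None else f"{hectares:.4f}")
```

Anchoring on `m2$` is what keeps the plan number (172499) from being picked up as the area.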

How to load specific columns with varying location from a text file in python?

I'm trying to read the discharge data of 346 US rivers stored online in text files. The files are more or less in this format:
Measurement_number Date Gage_height Discharge_value
1 2017-01-01 10 1000
2 2017-01-20 15 2000
# etc.
I only want to read the gage height and discharge value columns.
The problem is that in most files, additional columns with metadata are added in front of the 'Gage_height' column, so I cannot simply read the 3rd and 4th columns because their index varies.
I'm trying to find a way to say "read the columns named 'Gage_height' and 'Discharge_value'", but I haven't succeeded yet.
I hope someone can help. I'm currently trying to load the text files with numpy.genfromtxt, so it would be great to find a solution with that package, but other solutions are also more than welcome.
This is my code so far:
data_url = urllib2.urlopen(url)  # url: the address of this specific site
data = np.genfromtxt(data_url, skip_header=1, comments='#', usecols=[2, 3])
You can use the names=True option to genfromtxt, and then use the column names to select which columns you want to read with usecols.
For example, to read 'Gage_height' and 'Discharge_value' from your data file:
data = np.genfromtxt(filename, names=True, usecols=['Gage_height', 'Discharge_value'])
Note that you don't need to set skip_header=1 if you use names=True.
You can then access the columns using their names:
gage_height = data['Gage_height'] # == array([ 10., 15.])
discharge_value = data['Discharge_value'] # == array([ 1000., 2000.])
See the numpy.genfromtxt documentation for more information.
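Since other solutions were also welcome, here is a minimal sketch of the same select-by-name idea without NumPy, which makes the lookup step explicit. It assumes whitespace-separated columns, a header on the first non-comment line, and numeric values in the requested columns. The function name `read_named_columns` and the extra `Agency` column in the sample are made up for the example.

```python
def read_named_columns(lines, wanted):
    """Return {column name: list of floats} for the requested columns."""
    # Drop blank lines and '#' comment lines, then split on whitespace.
    rows = [ln.split() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]
    header, data = rows[0], rows[1:]
    # Look up each wanted column by name, wherever it sits in this file.
    positions = {name: header.index(name) for name in wanted}
    return {name: [float(row[pos]) for row in data]
            for name, pos in positions.items()}

sample = [
    "Agency Measurement_number Date Gage_height Discharge_value",
    "USGS 1 2017-01-01 10 1000",
    "USGS 2 2017-01-20 15 2000",
    "# etc.",
]
columns = read_named_columns(sample, ["Gage_height", "Discharge_value"])
print(columns["Gage_height"], columns["Discharge_value"])
```

Because the positions are looked up per file, it does not matter how many metadata columns come before 'Gage_height'.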

VBA Excel: read a folder and store image names in a column

I am new to VBA (I mean, REALLY new) and I would like some tips.
I have an Excel file with 2 columns: SKU and media_gallery.
I also have images stored in a folder (let's name it /imageFolder).
I need to scan imageFolder for ALL images whose names start with the SKU, and put their names into the media_gallery column, separated by semicolons ( ; ).
Example: if my SKU is "1001", I need to scan the image folder for all images whose names start with 1001 (all images follow this pattern: 1001-2.jpg, 1001-3.jpg, etc.).
I can do that in Java or C#, but I want to give VBA a chance. :)
How can I do that?
EDIT: Yes, I only need the file names! And I should have said that I have 20,000 images in my folder and 8,000 SKUs, so I don't know how well looping over 20,000 image names will work.
EDIT2: If a SKU contains a dash ( - ), I don't need to process it, so I can skip to the next SKU. Each SKU has a maximum of 5 images (....;SKU-5.jpg).
Thanks all.
For how to insert images when you have one image name per cell in a column, see the question "How to get images to appear in Excel given image url".
Take the loop from that answer and introduce an inner loop over the file names:
If InStr(url_column.Cells(i).Value, "-") = 0 Then
    Dim cur_file_name As String
    ' First file in the folder whose name starts with this SKU
    cur_file_name = Dir("imageFolder\" & url_column.Cells(i).Value & "*.jpg")
    Do Until Len(cur_file_name) = 0
        image_column.Cells(i).Value = image_column.Cells(i).Value & cur_file_name & ";"
        ' Next matching file; Dir returns "" when there are none left
        cur_file_name = Dir
    Loop
End If
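On the concern about 20,000 images and 8,000 SKUs: one alternative to querying the folder once per SKU is to list the folder a single time and group the file names by their SKU prefix. The sketch below shows that grouping logic in Python rather than VBA, purely as an illustration. The function name `gallery_by_sku` is made up, and it assumes image names look like SKU.jpg or SKU-N.jpg as described in the question. Unlike the `SKU*.jpg` wildcard above, it will not match 10011-2.jpg for SKU 1001.

```python
from collections import defaultdict

def gallery_by_sku(skus, file_names):
    """Map each SKU (without a dash) to its image names joined by ';'."""
    by_prefix = defaultdict(list)
    # Single pass over the folder listing: group names by the part before '-'.
    for name in sorted(file_names):
        if name.lower().endswith(".jpg"):
            stem = name[:-4]
            by_prefix[stem.split("-", 1)[0]].append(name)
    # SKUs containing a dash are skipped, as stated in the question.
    return {sku: ";".join(by_prefix.get(sku, []))
            for sku in skus if "-" not in sku}

files = ["1001-2.jpg", "1001-3.jpg", "10011-2.jpg", "1003.jpg", "notes.txt"]
print(gallery_by_sku(["1001", "1002-A", "1003"], files))
```

In a real run, `file_names` would come from something like `os.listdir` on the image folder.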