How to import source file structure in Informatica? - informatica

I have a source file like this:
Customer_ID Customer_Name City
7004 Oracle Mumbai
7001 Microsoft San Francisco
7002 IBM Toronto
7003 Red Hat New york
When I import it and choose SPACE as the delimiter type, values that contain a space are split as well, for example Red Hat, New york, and San Francisco.
Could you please help me keep each full value under a single header, so that City holds San Francisco and New york in full and Customer_Name holds Red Hat?

First things first: this is not a delimited file. It is a fixed-width file, so you have to import it as fixed width and define the column boundaries.
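To illustrate what "defining boundaries" means outside Informatica, here is a minimal Python sketch. The column offsets (12 and 26) and the padding applied to the sample rows are assumptions made for illustration; the real offsets have to be read from the actual file.

```python
# Hypothetical column boundaries as (start, end) character offsets.
# These offsets are assumptions, not values taken from the real file.
BOUNDARIES = {
    "Customer_ID": (0, 12),
    "Customer_Name": (12, 26),
    "City": (26, None),  # None means "to the end of the line"
}


def parse_fixed_width(line, boundaries=BOUNDARIES):
    """Slice one line at fixed character offsets and trim the padding."""
    return {name: line[start:end].strip() for name, (start, end) in boundaries.items()}


# Rows from the question, padded so every field starts at a fixed offset.
SOURCE_ROWS = [
    ("7004", "Oracle", "Mumbai"),
    ("7001", "Microsoft", "San Francisco"),
    ("7002", "IBM", "Toronto"),
    ("7003", "Red Hat", "New york"),
]
lines = [f"{cid:<12}{name:<14}{city}" for cid, name, city in SOURCE_ROWS]

records = [parse_fixed_width(line) for line in lines]
for record in records:
    print(record)
```

Because each field is cut by position rather than by delimiter, spaces inside Red Hat or San Francisco no longer split the value.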

Related

Power BI: How to remove duplicate rows?

In my report view, I have a table in which each employee appears twice, once for each position held. I want to show only one row per employee, with the latest position. How can I accomplish this?
Name        Project       Date        Position
John Smith  PowerProject  01-01-2021  Engineer
John Smith  PowerProject  01-01-2021  Senior Engineer
Sort by date, then group by name. Choose All Rows as the aggregation, change the generated code from _ to Table.Last(_), and then expand the resulting column.
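The same sort, group, take-last logic can be sketched outside Power Query. This is a minimal Python illustration, not Power BI code. The day-month-year date format is an assumption, and since both sample rows share the same date, the row that comes last in the input wins the tie.

```python
from datetime import datetime

rows = [
    {"Name": "John Smith", "Project": "PowerProject", "Date": "01-01-2021", "Position": "Engineer"},
    {"Name": "John Smith", "Project": "PowerProject", "Date": "01-01-2021", "Position": "Senior Engineer"},
]


def latest_per_name(rows, date_format="%d-%m-%Y"):
    """Keep one row per Name: the last one after sorting by Date."""
    # sorted() is stable, so rows with equal dates keep their input order.
    ordered = sorted(rows, key=lambda r: datetime.strptime(r["Date"], date_format))
    latest = {}
    for row in ordered:
        latest[row["Name"]] = row  # later rows overwrite earlier ones
    return list(latest.values())


result = latest_per_name(rows)
for row in result:
    print(row["Name"], row["Position"])
```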

How to group the following table in order to display top values (strings) per category in one column?

I have the following large table with over 13 million rows.
ProductCode  ProductName  valueUSD  ExportOrImport  Dest
100100       Fish         120K      Export          China
100100       Fish         122M      Export          Russia
100150       Oil          120B      Export          China
100150       Oil          122M      Export          US
I need to display the following summary table on a dashboard.
ProductCode  ProductName  valueUSD  % From total  TopDest
100150       Oil          120.122B  90%           China, US
100100       Fish         122.12M   10%           China, Russia
...          ...          ...       ...           ...
I have created a "button" that separates exports from imports. However, I do not know how to build the TopDest column, which should list the top 5 countries that each product was exported to (or imported from). Also, how should I phrase this question for a Google search? Is it "group by category and display top N"?
Any ideas on how to create this table?
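One way to think about the TopDest column is as a group-by followed by a top-N ranking within each group. The following is a minimal Python sketch of that logic, not a Power BI implementation. The suffix parsing (K, M, B) and the choice to rank destinations by summed value are assumptions, since the question does not say how "top" is defined.

```python
from collections import defaultdict

# Assumed meaning of the value suffixes used in the sample table.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_value(text):
    """Convert strings such as '120K' or '122M' to a number."""
    return float(text[:-1]) * SUFFIXES[text[-1]]


rows = [
    ("100100", "Fish", "120K", "Export", "China"),
    ("100100", "Fish", "122M", "Export", "Russia"),
    ("100150", "Oil", "120B", "Export", "China"),
    ("100150", "Oil", "122M", "Export", "US"),
]


def top_destinations(rows, flow="Export", n=5):
    """Return {ProductCode: 'Dest1, Dest2, ...'} ranked by summed value."""
    # product code -> destination -> summed value
    totals = defaultdict(lambda: defaultdict(float))
    for code, _name, value, direction, dest in rows:
        if direction == flow:
            totals[code][dest] += parse_value(value)
    summary = {}
    for code, by_dest in totals.items():
        ranked = sorted(by_dest, key=by_dest.get, reverse=True)
        summary[code] = ", ".join(ranked[:n])
    return summary


summary = top_destinations(rows)
print(summary)
```

With this ranking, Fish lists Russia before China because the Russian row has the larger value, so the order can differ from the hand-written summary table above.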

How can I solve the "FILTER has mismatched range sizes" error in the formula below?

I get the "FILTER has mismatched range sizes" error when I add this last part: Data!N3:N <> "", "No Market")
Formula in Google Sheets:
=FILTER(Data!B3:N,
Data!C3:C>=B1,
Data!C3:C<=D1,
Data!N3:N <> "", "No Market")
I'm trying to replace the blank values in Column N with the text "No Market".
Sample Table
Column B    Column N
03/07/2021  New York
03/07/2021
03/07/2021  Seattle
04/04/2021
04/04/2021  Boston
This formula also worked, but it excludes the blank values and I would like to include them.
=FILTER(Data!B3:N,
Data!C3:C>=B1,
Data!C3:C<=D1,
Data!N3:N <> "")
Column B    Column N
03/07/2021  New York
03/07/2021  Seattle
04/04/2021  Boston
Expected results: blank values are replaced with "No Market"
Column B    Column N
03/07/2021  New York
03/07/2021  No Market
03/07/2021  Seattle
04/04/2021  No Market
04/04/2021  Boston
I appreciate your help, thanks in advance!
Try the formula below:
=ArrayFormula({A2:A6,IF(B2:B6="","No Market",B2:B6)})
It's difficult to write full formulas without access to any actual data, but this should work for you:
=ArrayFormula(IFERROR(FILTER({Data!B3:M, IF(Data!N3:N="", "No Market", Data!N3:N)}, Data!C3:C<>"", Data!C3:C>=B1, Data!C3:C<=D1)))
If it does not work as expected, consider sharing a link to the spreadsheet.
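The idea behind both formulas is to fill the blanks in the data first and only then filter, instead of passing "No Market" to FILTER as an extra argument, where it is treated as one more condition whose size does not match the ranges. Below is a minimal Python sketch of that order of operations. The month/day/year date format and the start and end bounds standing in for B1 and D1 are assumptions.

```python
from datetime import date, datetime

# Sample rows from the question as (date, market); "" marks a blank cell.
rows = [
    ("03/07/2021", "New York"),
    ("03/07/2021", ""),
    ("03/07/2021", "Seattle"),
    ("04/04/2021", ""),
    ("04/04/2021", "Boston"),
]


def filter_and_fill(rows, start, end, placeholder="No Market"):
    """Keep rows whose date lies in [start, end]; replace blank markets."""
    out = []
    for day, market in rows:
        parsed = datetime.strptime(day, "%m/%d/%Y").date()  # assumed format
        if start <= parsed <= end:
            out.append((day, market if market else placeholder))
    return out


# Hypothetical bounds standing in for cells B1 and D1.
result = filter_and_fill(rows, date(2021, 1, 1), date(2021, 12, 31))
for row in result:
    print(row)
```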

SAS hierarchical raw file - no record type identifiers - multiple observations per record

I have a problem importing a hierarchical text file into SAS.
I've been searching for the past week and had no luck.
The problem is that this file does not contain anything that indicates that the detail records are linked with the header.
I have tried the various documented methods, such as "Input #1 Test #" combined with "then do" logic.
Extract of file (every new record starts with Hong Kong but each record has a variable number of lines):
HONG KONG
STEEL GROUP
Invoice Date
09.12.2015
Number
90035565
Delivery note no.
80006292
SAP Order number
18915
Customer number
105226
Order number
RCHEB 5114 1-1 24-11
Shipped from Saldanha bay, South Africa, per vessel
LAN MAY
Bill of lading date
14.11.2015
Port of discharge
ANY CHINESE PORT
Reference no.
Agreement/Contract/Order
OMl/24/ll
Port Wet Metric Tons Dry Metric Tons
ANY CHINESE PORT 202,079.000 199,957.171
Product % USD Value
Steel Ore 50%;29% 3,500.00
HONG KONG
TRADING CORP
Invoice Date
21.12.2015
Number
90035792
Provisional Invoice No
90033952
SAP Order number
50005313
Customer number
102872
Order number
KITST 5007 1-1 21-11
Shipped from Saldanha bay, South Africa, per vessel
HEBEI SUCCESS
Bill of lading date
15.06.2015
Port of discharge
BEILUN
Reference no.
WUGANG
Agreement/Contract/Order
OM6/21/ABG
Port Wet Metric Tons Dry Metric Tons
BEILUN 124,772.000 122,214.174
Product % USD Value Sishen 63.5%, 8 mm Fine Ore
Steel Ore 50%,10% 2,500.00
Iron Ore 20%,80% 1,500.00
Unfortunately, there is no easy way to do this in SAS (that I know of). I think you are on the right track reading the file in record by record and writing logic in a data step to parse it.
I would do it like this:
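data blah;
    /* truncover stops a short line from pulling in the next line */
    infile "stuff.txt" truncover;
    length inStr $2000;
    /* read the whole line, embedded blanks included, into one variable */
    input inStr $char2000.;
    if strip(inStr) = "HONG KONG" then do;
        /* first line of a new record */
        ...
    end;
    else if ... then do;
        ...
    end;
    ...
run;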
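The record-by-record logic described above can be illustrated outside SAS. The following is a minimal Python sketch, assuming that a line reading exactly HONG KONG always starts a new record and that each label line is followed by its value on the next line. The sample is an abbreviated subset of the extract, and the LABELS set is a hypothetical, incomplete list.

```python
def split_records(lines, marker="HONG KONG"):
    """Group lines into records; each record starts at a marker line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == marker:
            records.append([line])
        elif records:
            records[-1].append(line)
    return records


def label_values(record, labels):
    """Pair each known label with the next line, or '' if that is a label too."""
    fields = {}
    for i, line in enumerate(record[:-1]):
        if line in labels:
            nxt = record[i + 1]
            fields[line] = "" if nxt in labels else nxt
    return fields


# Abbreviated subset of the extract shown in the question.
sample = """HONG KONG
STEEL GROUP
Invoice Date
09.12.2015
Number
90035565
Reference no.
Agreement/Contract/Order
OMl/24/ll
HONG KONG
TRADING CORP
Invoice Date
21.12.2015
Number
90035792
Reference no.
WUGANG
Agreement/Contract/Order
OM6/21/ABG
""".splitlines()

# Hypothetical label list; a real one would cover every label in the file.
LABELS = {"Invoice Date", "Number", "Reference no.", "Agreement/Contract/Order"}

records = split_records(sample)
parsed = [label_values(record, LABELS) for record in records]
for fields in parsed:
    print(fields)
```

Checking whether the next line is itself a label handles the case in the first record, where "Reference no." has no value.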

How to parse a long dataframe of text using regular expressions into a dataframe [R]

I have a giant data frame that I would like to turn into a more usable format. It is based on a copy-pasted text file of a schedule, where the entries have a consistent format. Each day has the following format:
###
Title - Date
First event
Time: 11:00 AM
Location: Address
Address line 2
Second event description
Time: 12:00 AM
Location: Address
Address
###
What I am having trouble with is figuring out how to parse this. Basically, I want to store everything between the "###" markers as a single day, then add one event for each time the above format repeats, and create a string or datetime entry depending on whether the text follows "Time:" or "Location:".
However, I am really struggling with this. I have tried putting it all into a giant dataframe where each line is a row, and then adding dummy variables for location rows, time rows, etc. as separate columns, but I am not sure how to translate that into discrete days or events. Any help would be appreciated.
Data is public, so a sample is below -- it is a giant dataframe with one row for each row of text:
*Text*
###
The Public Schedule for Mayor Rahm Emanuel – January 5, 2014
There are no public events scheduled at this time.
###
The Public Schedule for Mayor Rahm Emanuel – January 6, 2014
Mayor Rahm Emanuel will greet and thank snow clearing teams from the Department of Streets and Sanitation.
Time: 11:30 AM
Location: Department of Streets and Sanitation
Salt Station
West Grand Avenue and North Rockwell Street
Chicago, IL*
*There will be no media availability.
Mayor Rahm Emanuel and City Officials will provide an update on the City’s efforts to provide services during the extreme weather.
Time: 2:00 PM
Location: Office of Emergency Management and Communications
Headquarters
1411 West Madison Street
Chicago, IL*
*There will be a media availability following.
Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.
Time: 3:00 PM
Location: Department of Family and Support Services
10 South Kedzie Avenue
Chicago, IL**
*There will be no media availability.
**Members of the media must respect the privacy of residents at the facility, and can only film City of Chicago employees.
###
Edit:
An example of the output I would like:
Date          Time      Description           Location
December 4th  9:00 AM   A housewarming party  1211 Main St.
December 5th  11:00 AM  Another big event     1234 Main St.
If at all possible.
EDIT 2:
Basically -- I know how to pull all this stuff out of the columns. I think my real issue is reshaping the data intelligently. I have split it into a giant dataframe with one column, where each row is a string that corresponds to a row of text in the original schedule. I then have a set of columns such as "is_time", "is_new_entry", and "is_location", which are 1 if the row is a time, the beginning of a new entry, or a location. I just don't know how to reshape this into the output above.
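One possible way to do the reshaping is a single pass over the lines that keeps track of the current day and the current event. The sketch below is in Python rather than R and relies on several assumptions: the line right after "###" is the title with the date after the last dash, the line immediately before a "Time:" line is the event description, lines starting with "*" are footnotes, and any other line inside an event continues the address. The sample is a subset of the data shown above.

```python
import re


def parse_schedule(lines):
    """Turn schedule lines into one dict per event (Date, Time, Description, Location)."""
    events, day, current, expect_title = [], None, None, False
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line == "###":
            expect_title, current = True, None
        elif expect_title:
            # Title line: the date follows the last " - " or " – ".
            day = re.split(r"\s[-–]\s", line)[-1]
            expect_title = False
        elif nxt.startswith("Time:"):
            # The line right before "Time:" is taken as the description.
            current = {"Date": day, "Time": "", "Description": line, "Location": ""}
            events.append(current)
        elif current is None or line.startswith("*"):
            continue  # footnotes and lines outside any event
        elif line.startswith("Time:"):
            current["Time"] = line[len("Time:"):].strip()
        elif line.startswith("Location:"):
            current["Location"] = line[len("Location:"):].strip()
        else:
            # Anything else inside an event is treated as an address line.
            current["Location"] = (current["Location"] + ", " + line.rstrip("*")).lstrip(", ")
    return events


sample = """
###
The Public Schedule for Mayor Rahm Emanuel – January 5, 2014
There are no public events scheduled at this time.
###
The Public Schedule for Mayor Rahm Emanuel – January 6, 2014
Mayor Rahm Emanuel will greet and thank snow clearing teams from the Department of Streets and Sanitation.
Time: 11:30 AM
Location: Department of Streets and Sanitation
Salt Station
West Grand Avenue and North Rockwell Street
Chicago, IL*
*There will be no media availability.
Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.
Time: 3:00 PM
Location: Department of Family and Support Services
10 South Kedzie Avenue
Chicago, IL**
*There will be no media availability.
###
""".strip().splitlines()

events = parse_schedule(sample)
for event in events:
    print(event["Date"], "|", event["Time"], "|", event["Location"])
```

Days without events, such as January 5 in the sample, produce no rows. The resulting list of dicts has one entry per event, which matches the Date, Time, Description, Location layout requested above.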