Merge multiple lines into one line using Informatica

I have a .txt file that contains multiple lines separated by ~.
The input below is just an example - the actual file will have many lines which will vary every time.
abcdefgh~
asdfghjkliuy~
qwertyuiopasdfgh~
..........
Every line ends with ~, and I would like to merge all the lines into one.
Desired output:
abcdefgh~asdfghjkliuy~qwertyuiopasdfgh~..................................
How can I merge all the lines into one line using Informatica and write the result to a .txt file?

This is a "concatenate multiple rows into one column" problem. If you have a key on which to group, it will make life easy; otherwise your concatenated string will get very long.
Here are the steps:
Sort the data based on the key. If you don't have one, skip this step.
Create an Expression transformation with the ports below. Variable ports are evaluated top to bottom, so when v_data references prev_key it still holds the previous row's key:
in_key
in_data
v_data = IIF( prev_key <> in_key,in_data, v_data || in_data)
prev_data = in_data
prev_key = in_key
out_key = in_key
out_data = v_data
If you do not have a key, create these ports instead:
in_data
v_data =v_data || in_data
prev_data = in_data
out_data = v_data
Link out_key and out_data to the next transformation, the Aggregator. Please note that the out_data and v_data ports should be of data type string, with enough precision to hold the full concatenated string.
Attach an Aggregator after this Expression. Group by the key if you have one. Create one output port like below, where data is the concatenated string coming in from the Expression:
out_data = MAX(data)
Link this port to the target.
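If it helps to see the accumulation logic outside Informatica, here is a minimal sketch of the same idea in Python, assuming the whole file is one group; the file names are made up for illustration:
# Sketch: merge every ~-terminated line of a text file into one line.
# "input.txt" and "output.txt" are hypothetical names.
with open("input.txt") as src:
    # Each line already ends with ~, so stripping the newline and
    # concatenating the lines gives the desired single-line output.
    merged = "".join(line.rstrip("\n") for line in src)
with open("output.txt", "w") as dst:
    dst.write(merged + "\n")
Back in the mapping, MAX(data) in the Aggregator works because each row's v_data extends the previous one, and a string compares less than any of its extensions, so the fully concatenated row per group wins.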

Related

PowerQuery: How to replace text with each column name for multiple columns

I'm trying to replace "x" in each column (except for the first 2 columns) with the column name, in a table with an unknown number of columns but at least 2 columns.
I found the code to change one column, but I want it to be dynamic:
#"Ersatt värde" = Table.ReplaceValue(Källa,"x", Table.ColumnNames(Källa){2},Replacer.ReplaceText,{Table.ColumnNames(Källa){2}})
Any ideas on how to solve it?
If I understand correctly, I think you can try either approach below:
#"Ersatt värde" =
    let
        columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
        accumulated = List.Accumulate(
            columnsToTransform,
            Källa,
            (tableState as table, columnName as text) =>
                Table.ReplaceValue(tableState, "x", columnName, Replacer.ReplaceText, {columnName})
        )
    in
        accumulated
or:
#"Ersatt värde" =
    let
        columnsToTransform = List.Skip(Table.ColumnNames(Källa), 2),
        transformations = List.Transform(
            columnsToTransform,
            (columnName) => {columnName, each Replacer.ReplaceText(Text.From(_), "x", columnName)}
        ),
        transformed = Table.TransformColumns(Källa, transformations)
    in
        transformed
Both ways follow a similar approach:
Figure out which columns to do replacements in (i.e. all except the first 2 columns).
Loop over the columns determined in the previous step and actually do the replacement.
I've used Replacer.ReplaceText since that's what you'd used in your question, but I believe this will replace both partial matches and full matches.
If you only want full matches to be replaced, I think you can use Replacer.ReplaceValue instead.
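For comparison, the same two-step pattern looks roughly like this in Python with pandas; the sample frame here is invented purely for illustration:
import pandas as pd

# Invented sample data standing in for the real table.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"],
                   "col3": ["x", ""], "col4": ["", "x"]})

# Step 1: all columns except the first two.
columns_to_transform = df.columns[2:]

# Step 2: replace "x" with the column's own name, column by column.
for column_name in columns_to_transform:
    df[column_name] = df[column_name].str.replace("x", column_name, regex=False)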

Looking for the proper way to format the text in a column and compare that with the value of a cell?

I am trying to format the information from a column that I am querying and compare that to information in a cell. I have tried to hack together various ways to do this, but I am not a proficient SQL/spreadsheet user.
In COLUMN I there is nothing.
In COLUMN K there is a match on A2.
In COLUMN N there is information formatted like 31'-40' and 41'+.
I would prefer to use = instead of contains.
The REPLACE function seems to work when I substitute N for a string and run it on the W3Schools website.
The REGEXREPLACE seems to work on D2. I would expect them to match, but they do not.
COUNT( QUERY( '2019'!A2:P, "select D where I='' and upper(K) contains '" & UPPER(A2) & "' and REPLACE(REPLACE(REPLACE(N, '-', ''), '''', ''), '+','') contains '"& Regexreplace(D2,"[[:punct:]]","") &"' ")
I get 0 matches.
You almost had it, but try like this (a sheet named 2019 needs quotes around its name):
=COUNTA(FILTER('2019'!D2:D, I2:I="",
    REGEXMATCH(UPPER(K2:K), UPPER(A2)),
    REGEXMATCH(UPPER(N2:N), UPPER(D2))))
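If it helps to sanity-check the logic outside Sheets, the same count can be sketched in Python; all the sample values below are invented:
import re

# Invented stand-ins for columns I, K, N and for cells A2 and D2.
col_i = ["", "", "x"]                      # must be empty to be counted
col_k = ["ACME Corp", "acme inc", "Other"]
col_n = ["31'-40'", "41'+", "31'-40'"]
a2 = "Acme"
d2 = "31'-40'"

count = sum(
    1
    for i, k, n in zip(col_i, col_k, col_n)
    if i == ""
    and re.search(re.escape(a2), k, re.IGNORECASE)   # K contains A2, any case
    and re.search(re.escape(d2), n, re.IGNORECASE)   # N contains D2, any case
)
print(count)  # 1 for this sample data
Note that re.escape guards against regex metacharacters such as the + in 41'+; REGEXMATCH in the formula above has the same caveat if D2 contains special characters.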

How to manipulate multiple csv files with raw data starting from a different row in each file?

I would like to format multiple csv files; some of them have summaries before the raw data. The raw data can start at any row, but if "colname" is found in any row then the raw data starts there. I am using the standard library csv module to read the files, check whether "colname" exists, and extract the data from there. With the code below, print(data) always gives me data from the first row of the file, but I want to pull the data starting from where "colname" is found. If "colname" is not found, I don't want to read the data.
root_dir = r"folder1"
for fname in os.listdir(root_dir):
    file_path = os.path.join(root_dir, fname)
    if fname.endswith(('.csv')):
        n = 0
        with open(file_path, 'rU') as fp:
            csv_reader = csv.reader(fp)
            while True:
                for line in csv_reader:
                    if line == " colname": continue
                    n = n + 1
                    data = line
                    print(data)
Your code's logic only skips the individual lines that are exactly " colname" and processes everything else, which has 2 problems:
You want the opposite: skip lines until AFTER you have seen "colname"; you could use a boolean variable to distinguish between these two situations, as in the sketch below.
It's also not clear that your test for "colname" is correct: csv.reader yields each line as a list of fields rather than a string, and even a string comparison would be tripped up if there isn't exactly one leading space or the line carries a trailing end-of-line character.
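Here is a minimal sketch of that boolean-flag approach; the exact header text, folder name, and matching rule are assumptions to adjust for your files:
import csv
import os

root_dir = r"folder1"  # hypothetical folder name, as in the question
for fname in os.listdir(root_dir):
    if not fname.endswith('.csv'):
        continue
    with open(os.path.join(root_dir, fname), newline='') as fp:
        seen_header = False
        for row in csv.reader(fp):
            if not seen_header:
                # csv.reader yields a list of fields, so strip each field
                # before comparing; " colname" then still matches.
                if any(field.strip() == "colname" for field in row):
                    seen_header = True  # raw data starts on the next row
                continue
            print(row)  # only reached after "colname" was found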

Parse a multiple-line query into a single-line query stored in a string in C++

I am very new to C++ and it is taking some time for me to implement things.
My program should parse the input batch file (Example: input.SQL) given by the user
using the command
./DBMS script=input.SQL (where DBMS is an executable file).
I am trying to store all the data in a string (SQL_Query). My code parses and executes the queries one at a time, and it expects each query to be on a single line with a semicolon separator at the end. I get a segmentation fault when I try to parse the input. The input file is in the following format:
CREATE TABLE T(K INT, A INT, C CHAR(1), PRIMARY KEY(K));
INSERT INTO T VALUES(1,1,'A');
SELECT K,C
FROM T
WHERE A=1;
SELECT *
FROM (SELECT I,D FROM S WHERE B=3)Z1;
SELECT X.K,Y.C,X.C
FROM (SELECT * FROM T WHERE C='A') X
JOIN (SELECT K, C FROM Z)Y ON Y.K=X.K;
SELECT X1.K, Y1.C, X1.C
FROM (SELECT T.K,T.C FROM T JOIN Z ON T.K=Z.K WHERE T.C='A') X1
JOIN (SELECT Z.K,Z.C FROM Z)Y1 ON Y1.K=X1.K;
I need to collapse each multi-line query onto a single line so that my program works. I have searched for this but did not find anything helpful.
I tried doing it this way but could not succeed:
for (int index = 0; index < SQL_Query.size(); index++) {
    if (SQL_Query.at(index) == ';') {
        SQL_Query += " " + SQL_Query;
        index++;
    }
}
Kindly help.
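One way to approach this is to read the file line by line, append each line (plus a space) to a buffer, and emit a complete query whenever a ';' appears. A minimal sketch of that logic, in Python for brevity (input.SQL is the file name from the question); the same loop ports to C++ with std::getline, a std::string buffer, and find(';')/substr:
# Sketch: collapse multi-line SQL statements onto single lines.
with open("input.SQL") as f:
    buffer = ""
    for line in f:
        # Join continuation lines with a single space between tokens.
        buffer += line.strip() + " "
        # Emit every complete statement accumulated so far.
        while ";" in buffer:
            statement, buffer = buffer.split(";", 1)
            print(statement.strip() + ";")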

Merge CSV row with a string match from a 2nd CSV file

I'm working with two large files, approximately 100K+ rows each. I want to search csv file#1 for a string contained in csv file#2, then join another string from csv file#1 to the row in csv file#2 based on the match criteria. Here's an example of the data I'm working with and my expected output:
File#1: the string to be matched in file#2 is the 2nd field; the 1st field (an integer) is to be appended to each matched row in file#2.
row 1:
3604430123,mta0000cadd503c.mta.net
row 2:
3604434567,mta0000CADD5638.MTA.NET
row 3:
3606304758,mta00069234e9a51.DT.COM
File#2:
row 1:
4246,211-015617,mta0000cadd503c.mta.net,old,NW MG2,BBand2 ESA,Active
row 2:
7251,ACCOUNT,mta0000CADD5638.MTA.NET,FQDN ,NW MG2,BBand2 ESA,Active
row 3:
536887946,874-22558501,mta00069234e9a51.DT.COM,"P",NW MG2,BBand2 ESA,Active
Desired output, joining the integer string from file#1 to the entire row in file#2 based on the string match between file#1 and file#2:
row 1:
4246,211-015617,mta0000cadd503c.mta.net,old,NW MG2,BBand2 ESA,Active,3604430123
row 2:
7251,ACCOUNT,mta0000CADD5638.MTA.NET,FQDN ,NW MG2,BBand2 ESA,Active,3604434567
row 3:
536887946,874-22558501,mta00069234e9a51.DT.COM,"P",NW MG2,BBand2 ESA,Active,3606304758
There are many instances where the case of the match string in file#1 doesn't match the case in file#2, but the characters match, so case can be ignored for the match criteria. The character case does need to be preserved in file#2 after the integer string from file#1 is appended.
I'm a Python newb and I've been at this for a while, and I've scoured posts on SE, but I can't seem to come up with working code that gets me to the point where I can just print out a line from file#2 that has been matched on the string in file#1. I've tried a few other methods, such as writing to a dictionary, using DictReader, etc., but haven't been able to clear what appear to be simple errors in those methods. So I tried to strip this down to simple lists, get to the point where I can use a list comprehension to combine the data, then write that back to a file named output, which will eventually be written back to a csv file. Any help or suggestions would be greatly appreciated.
import csv

sg = []
fqdn = []
output = []

with open(r'file2.csv', 'rb') as src:
    read = csv.reader(src, delimiter=',')
    for row in read:
        sg.append(row)

with open(r'file1.csv', 'rb') as src1:
    read1 = csv.reader(src1, delimiter=',')
    for row in read1:
        fqdn.append(row)

output = output.append([s[0] for s in sg if fqdn[1] in sg])
print output
Result after running this is:
None
Process finished with exit code 0
You should use a dictionary for file#1 rather than just a list, as matching is easier. Just make fqdn a dict, and in your loop reading file#1 set key-value pairs on the dict. I would use .lower() on the match key. This turns the key to lower case, so later you only have to check whether the lower-cased version of the field in file#2 is a key in the dictionary:
import csv

sg = []
fqdn = {}
output = []

with open(r'file2.csv', 'rb') as src:
    read = csv.reader(src, delimiter=',')
    for dataset in read:
        sg.append(dataset)

with open(r'file1.csv', 'rb') as src1:
    read1 = csv.reader(src1, delimiter=',')
    for to_append, to_match in read1:
        fqdn[to_match.lower()] = to_append

for dataset in sg:
    # If the key matched, to_append now contains the string to append, else it becomes None
    to_append = fqdn.get(dataset[2].lower())
    if to_append:
        dataset.append(to_append)   # Append the field
        output.append(dataset)      # Append the row to the result list

print(output)
You can then use csv.writer to create a csv file from the result.
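For instance, a short final step along those lines ('output.csv' is an assumed name; the 'wb' mode matches the Python 2 style of the code above):
import csv

with open('output.csv', 'wb') as out:  # on Python 3: open('output.csv', 'w', newline='')
    writer = csv.writer(out)
    writer.writerows(output)           # one csv row per matched dataset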
Here's a brute-force solution to this problem. For every line of the first file, you search through every line of the second file until you find a match. The matched lines are written out to the output.csv file in the format you specified, using the csv writer.
import csv

with open('file1.csv', 'r') as file1:
    with open('file2.csv', 'r') as file2:
        with open('output.csv', 'w') as outfile:
            writer = csv.writer(outfile)
            reader1 = csv.reader(file1)
            reader2 = csv.reader(file2)
            for row in reader1:
                if not row:
                    continue
                for other_row in reader2:
                    if not other_row:
                        continue
                    # if we found a match, let's write it to the csv file with the id appended
                    if row[1].lower() == other_row[2].lower():
                        new_row = other_row
                        new_row.append(row[0])
                        writer.writerow(new_row)
                        continue
                # reset file pointer to beginning of file
                file2.seek(0)
You might be tempted to store the information in a data structure before writing it out to a file. In my experience, you always end up getting larger files in the future and may run into memory issues. I like to write things out to file as I find the matches in order to avoid this problem.