I have a session that reads from a file list. The file list contains the path of ONE file.
That file has 4431 detail records, but I am loading 4432 records; the extra record is the path of the file as it appears in the list file.
Any idea what could be causing this?
I'm trying to find the best way to upload, parse and work with a text file in Oracle APEX (current version 20.1). Business case: I must upload a text file, and its first line will be saved to table A.
The remaining lines contain records (columns are pipe-delimited) that should be validated. After that, correct records should be saved to table B, and records with errors should be saved to table C (error log).
I tried to do something with the Data Loading wizard, but it doesn't fit my requirements.
Right now I have added a "File browse..." item to the page, and after page submit I can find the file in APEX_APPLICATION_TEMP_FILES in blob_content.
Is there any other option for working with that file than reading blob_content from APEX_APPLICATION_TEMP_FILES? I find it difficult to work with that type of data.
The text file looks something like this:
2020-06-05 info: header line
2020-06-05|columnAValue|columnBValue|
2020-06-05|columnAValue||columnCValue
2020-06-05|columnAValue|columnBValue|columnCValue
Have a look at the APEX_DATA_PARSER.PARSE table function. It parses the CSV file and returns the values as rows and columns. It's described in more detail in this blog post:
https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package
Simply pass "file.csv" (literally) as the p_file_name argument. APEX_DATA_PARSER does not care about the "real" file name.
The function uses the file extension only to differentiate between delimited, XLSX, XML or JSON files. So simply pass in a static file name like "file.csv". That should be enough.
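The routing described in the question (first line to table A, valid detail lines to table B, invalid ones to the error log in table C) can be sketched outside APEX, purely to illustrate the logic. This is a minimal Python sketch, not APEX or PL/SQL code; the function names and the validation rules (four columns, ISO date, no empty values) are assumptions made for illustration.

```python
from datetime import datetime


def validate(fields):
    """Return None if the row is acceptable, else a short error message.

    The rules below are hypothetical; replace them with the real ones.
    """
    if len(fields) != 4:
        return "expected 4 columns"
    try:
        datetime.strptime(fields[0], "%Y-%m-%d")
    except ValueError:
        return "bad date"
    if any(value == "" for value in fields[1:]):
        return "empty column"
    return None


def split_upload(text):
    """Split an uploaded file into (header, valid_rows, error_rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("file is empty")
    header, detail = lines[0], lines[1:]   # header would go to table A
    valid, errors = [], []
    for line_no, line in enumerate(detail, start=2):
        fields = line.split("|")
        problem = validate(fields)
        if problem is None:
            valid.append(fields)                      # would go to table B
        else:
            errors.append((line_no, line, problem))   # would go to table C
    return header, valid, errors
```

With the sample file from the question, the first two detail lines each contain an empty column and would be routed to the error log under these assumed rules, while the third would be accepted.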
I have a CSV file from which I need to extract two values, 'UutId' and 'Test', and place them in the last column (filename). How can I extract values from the CSV file and place them in the last column of the same file, called 'filename', which is currently blank?
If possible, please tell me which processor and configuration to use. TIA
For reference, I am attaching a snapshot of the CSV file showing the values I want to add to the last column, 'filename', repeated for every record in the CSV.
Values to be extracted
The column in which the values should be placed
List queue after GetFile
After running the UpdateRecord processor, the flow file in the queue is blank.
List queue after playing UpdateRecord
please help
Use the UpdateRecord processor with the configuration below.
In the CSVRecordSetWriter controller service, add a filename column of type string as the last field in the Avro schema.
UpdateRecord configs:
Add a new property to the UpdateRecord processor:
/filename
concat(/UutId,/Test)
Set Replacement Value Strategy to Record Path Value so that the property value is evaluated as a RecordPath expression.
The UpdateRecord processor will then concatenate the UutId and Test values into the filename column.
Refer to this link for more details regarding Update Record processor.
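To make the effect of that property concrete, here is a minimal Python sketch of the same transformation done outside NiFi: every record's filename column is filled with the UutId and Test values joined together. It is not NiFi code; the function name is hypothetical, and it assumes the CSV has a header row containing UutId and Test columns.

```python
import csv
import io


def fill_filename(csv_text):
    """Return the CSV with 'filename' set to UutId + Test on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "filename" not in fieldnames:
        fieldnames.append("filename")   # keep 'filename' as the last column
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Plain concatenation without a separator, like concat(/UutId,/Test)
        row["filename"] = row["UutId"] + row["Test"]
        writer.writerow(row)
    return out.getvalue()
```

If the output needs a separator between the two values, it would have to be added explicitly in the join, both here and in the RecordPath expression.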
I have started working with NiFi. I am working on a use case to load data into Hive. I get a CSV file and then use SplitText to split the incoming flow file into multiple flow files (one per record). Then I use ConvertToAvro to convert each split CSV file into an Avro file. After that, I put the Avro files into a directory in HDFS and trigger the "LOAD DATA" command using the ReplaceText and PutHiveQL processors.
I'm splitting the file record by record in order to get the partition value (since LOAD DATA doesn't support dynamic partitioning). The flow looks like this:
GetFile (CSV) --- SplitText (split line count :1 and header line count : 1) --- ExtractText (Use RegEx to get partition fields' values and assign to attribute) --- ConvertToAvro (Specifying the Schema) --- PutHDFS (Writing to a HDFS location) --- ReplaceText (LOAD DATA cmd with partition info) --- PutHiveQL
The problem is that, because I split the CSV file one record at a time, it generates too many Avro files. For example, if the CSV file has 100 records, it creates 100 Avro files. Since I need the partition values, I have to split one record at a time. Is there any way to achieve this without splitting record by record, for example by batching? I'm quite new to this, so I haven't been able to crack it yet. Any help is appreciated.
PS: Please suggest any alternative approach for this use case.
Are you looking to group the Avro records based on the partitions' values, one Avro file per unique value? Or do you only need the partitions' values for some number of LOAD DATA commands (and use a single Avro file with all the records)?
If the former, then you'd likely need a custom processor or ExecuteScript, since you'd need to parse, group/aggregate, and convert all in one step (i.e. for one CSV document). If the latter, then you can rearrange your flow into:
GetFile -> ConvertCSVToAvro -> PutHDFS -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> ReplaceText -> PutHiveQL
This flow puts the entire CSV file (as a single Avro file) into HDFS, then afterwards it does the split (after converting to JSON since we don't have an EvaluateAvroPath processor), gets the partition value(s), and generates the Hive LOAD DATA statements.
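For the first case (one file per unique partition value), the grouping step that a custom processor or ExecuteScript would have to perform can be sketched as follows. This is a minimal Python sketch of the idea only, not a NiFi script: the function names, table name, partition column and staging path are all hypothetical, and actually writing the Avro files is left out.

```python
import csv
import io
from collections import defaultdict


def group_by_partition(csv_text, partition_column):
    """Group CSV rows by partition value instead of splitting per record."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[partition_column]].append(row)
    return dict(groups)


def load_statements(groups, table, partition_column, base_path):
    """Build one LOAD DATA statement per partition value.

    Assumes each group was written to '<base_path>/<value>.avro' (hypothetical layout).
    """
    return [
        f"LOAD DATA INPATH '{base_path}/{value}.avro' "
        f"INTO TABLE {table} PARTITION ({partition_column}='{value}')"
        for value in sorted(groups)
    ]
```

With this grouping, a CSV of 100 records spread over three partition values would produce three files and three LOAD DATA statements rather than 100.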
If you've used the PutHDFS processor to place the file at the location the Hive table reads from, then you don't need to call the PutHiveQL processor. I am also new to this, but I think you should leverage Hive's schema-on-read capability.
How can I read a CSV file, parse the values, and then output it to a particular database table?
That's the basic problem.
Here is a 'bigger picture' of what I'm trying to do:
I'm trying to either read from multiple CSV files every minute, or read every minute from a single CSV file that keeps growing (new rows are appended with each update), or both.
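The question names no particular tool or database, so the following is only a minimal Python sketch of the basic read, parse and insert step, using the standard-library sqlite3 module as a stand-in for the target database. The table name, column names and the choice of the timestamp as primary key are assumptions for illustration.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text):
    """Parse CSV text and insert its rows; return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (ts TEXT PRIMARY KEY, value REAL)"
    )
    rows = [
        (record["ts"], float(record["value"]))
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    # INSERT OR IGNORE skips rows whose key already exists, so re-reading
    # a file that only grows does not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO readings (ts, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Calling a function like this once a minute on the whole file is the simplest option for an ever-growing CSV; remembering the last read position would be more efficient but needs extra bookkeeping.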
I have a workflow that writes data from a table into a flat file. It works just fine, but I want to insert a blank line between records. How can this be achieved? Any pointers?
You can create two target instances: one with the proper data, and another to which you pass a blank line. Set Merge Type to "Concurrent Merge" in the session properties.
There are multiple possibilities:
You can prepare the appropriate data set in a relational table, inserting the blank rows into that relational target, and afterwards dump the data from it into a flat file.
Send a blank line to a separate target file (based on some business condition, using a Router or something similar), and then use the merge files option (in the session config) to combine the data into a single file.
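If the file can be post-processed outside the workflow (an assumption; for example, in a script run after the session), the desired layout is simple to produce. A minimal Python sketch, with a hypothetical function name:

```python
def with_blank_lines(records):
    """Join records so that one empty line separates consecutive records."""
    if not records:
        return ""
    # "\n\n" ends one record and adds the blank line before the next one
    return "\n\n".join(records) + "\n"
```

No blank line is added after the last record; that is a design choice, and a trailing blank line could be added if the consumer of the file expects one.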