I'm trying to find the best way to upload, parse and work with a text file in Oracle APEX (current version 20.1). Business case: I must upload a text file; its first line will be saved to table A.
The remaining lines contain records (pipe-delimited columns) that should be validated. Valid records should then be saved to table B, and records with errors should be saved to table C (an error log).
I tried to do something with the Data Loading wizard, but it doesn't fit my requirements.
Right now I have added a "File browse..." item to the page, and after the page is submitted I can find the file in APEX_APPLICATION_TEMP_FILES, in blob_content.
Is there any option for working with that file other than reading blob_content from APEX_APPLICATION_TEMP_FILES? I find that type of data difficult to work with.
The text file looks something like this:
2020-06-05 info: header line
2020-06-05|columnAValue|columnBValue|
2020-06-05|columnAValue||columnCValue
2020-06-05|columnAValue|columnBValue|columnCValue
Have a look at the APEX_DATA_PARSER.PARSE table function. It parses the CSV file and returns the values as rows and columns. It's described in more detail in this blog post:
https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package
Simply pass "file.csv" (literally) as the p_file_name argument. APEX_DATA_PARSER does not care about the "real" file name....
The function uses the file extension only to differentiate between delimited, XLSX, XML or JSON files. So simply pass in a static file name like "file.csv". That should be enough.
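For illustration, here is a minimal PL/SQL sketch of that approach. The page item (PX_FILE) is a placeholder for your "File browse..." item, and the inserts are left as comments:

    DECLARE
        l_blob BLOB;
    BEGIN
        -- fetch the uploaded file from the temp files view
        SELECT blob_content
          INTO l_blob
          FROM apex_application_temp_files
         WHERE name = :PX_FILE;

        FOR r IN (
            SELECT line_number, col001, col002, col003, col004
              FROM TABLE( apex_data_parser.parse(
                              p_content           => l_blob,
                              p_file_name         => 'file.csv',  -- static name; only the extension matters
                              p_csv_col_delimiter => '|' ) )
        ) LOOP
            IF r.line_number = 1 THEN
                NULL;  -- save the header line to table A
            ELSE
                NULL;  -- validate; save to table B, or to table C (error log) on failure
            END IF;
        END LOOP;
    END;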
I performed data cleaning using pandas and created a CSV file. When I loaded the file in Power Query, I saw there were null values in a few columns. I am sure that all null values were removed when performing the cleaning task; I even re-verified it.
UPDATE: Instead of a CSV file, I tried creating an Excel file, and when I loaded it there were no null values.
Can anybody point out what the issue with the CSV file method is?
This is typically associated with the format of the CSV, such as single quotes inside text fields when single quotes are also used to start and stop the text field.
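For example, a made-up row like this will trip up a parser configured to treat single quotes as quote characters:

    id,name,city
    1,'O'Brien','Boston'

The apostrophe in O'Brien terminates the quoted field early, so the remaining characters shift into the wrong columns and can surface as nulls downstream. Quoting text fields with double quotes (the pandas to_csv default quote character) usually avoids this.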
I used Athena's CTAS and INSERT commands, and Avro files were created at the external_location.
But the file names are very strange, and the filename extension has also disappeared: the files have no extension at all, just odd names that look like hash codes.
How can I define a filename rule for Athena's output files?
Thank you.
As stated on page 20 of the AWS Athena manual: "This location in Amazon S3 comprises all of the files representing your table. For more information, see Using Folders in the Amazon Simple Storage Service Console User Guide."
Reference:
https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf
So, no, you can't define the name of the file (or files, because more than one may be needed to represent a table). But the right way to think about it is that the bucket/path is what represents the file name, or the output table.
We might get confused because you're generating an Avro file, which really is a single file, like Parquet, but remember that Athena can also output to other formats, which may be multi-file.
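For example, in a CTAS sketch like this (bucket, database and table names are made up), you control the S3 prefix but not the names of the files written under it:

    CREATE TABLE my_db.my_avro_table
    WITH (
        format            = 'AVRO',
        external_location = 's3://my-bucket/avro-output/'
    ) AS
    SELECT id, name
    FROM my_db.source_table;

Everything under s3://my-bucket/avro-output/ collectively represents the table; the hash-like file names are generated by Athena and are not configurable.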
I've been asked to create an app in Oracle APEX that will allow me to drop in a CSV file. The file contains a list of all active physicians and associated info in my area. I do not know where to begin! Requirements:
-after dropping the CSV file into APEX, remove unnecessary columns
-edit the data in each field, e.g. if a phone # is longer than 7 characters and begins with 1, remove the 1; or remove all special characters from a column
-the CSV contains physicians of every specialty, but I only want to upload specific specialties to the database table
I have a small amount of SQL experience from uni, and I know some HTML and CSS, but beyond that I am lost. Please help!
I began the Oracle APEX tutorial and created an upload wizard in a dev environment. The goal:
User drops a CSV file into APEX
APEX edits columns to remove unnecessary characters
Only specific columns from the CSV file are uploaded
Data is added only when the "Specialties" column matches specific specialties
Redundant data is not added (if a physician is already in the table, do nothing)
A report is produced showing all new physicians added to the table
Huh, you're in deep trouble, as you have to do a job using a tool you don't know at all, with limited knowledge of the SQL language. Yes, it is said that APEX is simple to use, but nonetheless ... you have to know at least something. Otherwise, as you said, you're lost.
See if the following helps.
there's the CSV file
create a table in your database; its description should match the CSV file. Include all the columns it contains, and pay attention to datatypes, column lengths and such
this table will be "temporary" - you'll use it every day to load data from CSV files: first you'll delete all it contains, then load new rows
using Apex "Create page" Wizard, create the "Data loading" process. Follow the instructions (and/or read documentation about it). Once you're done, you'll have 4 new pages in your Apex application
when you run it, you should be able to load CSV file into that temporary table
That's the first stage - successfully load data into the database. Now, the second stage: fix what's wrong.
create another table in the database; it will be the "target" table and is supposed to contain only data you need (i.e. the subset of the temporary table). If such a table already exists, you don't have to create a new one.
create a stored procedure. It will read data from the temporary table and fix everything you've mentioned (remove special characters, remove the leading "1", ...); see the sketch after this list
as you have to skip physicians that already exist in the target table, use NOT IN or NOT EXISTS
then insert "clean" data into the target table
That stored procedure will be executed after the APEX loading process is done; a simple way to do that is to create a button on the last page which, when pressed, calls the procedure.
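A rough sketch of such a procedure (table and column names like temp_physicians, physicians, phone and specialty are invented; adjust them to your CSV):

    CREATE OR REPLACE PROCEDURE load_physicians AS
    BEGIN
        INSERT INTO physicians (name, phone, specialty, loaded_at)
        SELECT t.name,
               -- keep digits only, then drop the leading "1" from long numbers
               CASE
                   WHEN LENGTH(REGEXP_REPLACE(t.phone, '[^0-9]')) > 7
                        AND REGEXP_REPLACE(t.phone, '[^0-9]') LIKE '1%'
                   THEN SUBSTR(REGEXP_REPLACE(t.phone, '[^0-9]'), 2)
                   ELSE REGEXP_REPLACE(t.phone, '[^0-9]')
               END,
               t.specialty,
               SYSDATE
          FROM temp_physicians t
         WHERE t.specialty IN ('CARDIOLOGY', 'ONCOLOGY')   -- only the specialties you want
           AND NOT EXISTS (SELECT NULL
                             FROM physicians p
                            WHERE p.name = t.name);        -- skip physicians already loaded
    END;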
The final stage is the report:
as you have to show new physicians, consider adding a column to the target table: a timestamp (perhaps DATE is enough, if you'll be doing this once a day) or a process_id (all rows inserted in the same run share the same value), so that you can distinguish newly added rows from the old ones
the report itself would be an Interactive Report. Why? Because it is easy to create and lets you (or end users) adjust it to their needs (filter data, sort rows in a different manner, ...); a sketch of its source query follows below
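The report's source query could then be as simple as this sketch (again, names are invented):

    SELECT name, phone, specialty
      FROM physicians
     WHERE loaded_at >= TRUNC(SYSDATE);  -- only rows added by today's load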
Good luck! You'll need it.
I want to extract the filename and store it in one of the existing columns in the CSV file. How do I do this? Which processor should I use, and with what configuration?
E.g. I have a file named 'FE_CHRGRSIM_20171207150616_CustRec.csv' and I want to extract 'FE_CHRGRSIM_20171207150616' and store this value under an existing column in the same CSV file. Please help. TIA
Usually the "real" file name is available as an attribute on the flow file called "filename". You can use UpdateRecord with a Replacement Strategy of "Literal Value"; add a user-defined property called /filename and set the value to ${filename:substringBeforeLast('.')}. You'll need to make sure that the "filename" field is added to your schema (either by UpdateRecord or manually). If you won't know your CSV schema ahead of time you can use InferAvroSchema and it will try to figure it out.
If UpdateRecord and the schema approach don't seem to be working for you, an alternative (since it's CSV) is to use ReplaceText: match the entire line, then replace it with that value followed by ,${filename:substringBeforeLast('.')}. That should add the filename (with the extension removed) as the last column in the outgoing CSV.
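A sketch of that ReplaceText configuration (values assumed; adjust the regex to your data):

    Search Value:          (.+)
    Replacement Value:     $1,${filename:substringBeforeLast('.')}
    Replacement Strategy:  Regex Replace
    Evaluation Mode:       Line-by-Line

Note that line-by-line evaluation also appends the value to the header line, so you may need to handle the header separately.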
I'm new to this, so I am sure it is a silly question, but I have read through every related question on the site and can't find anything!
I am exporting from pgadmin. A few of the columns have line breaks within the cells, so the exported data is very choppy. Does anyone know how to fix this? Is there a way to make it so the line breaks within cells are not read?
I know I am using the right settings for exporting, but basically what happens is that the header names are there, along with one row of content for each column, and then column A will have 20 more rows beneath it because of line breaks in the first cell of column E.
Any help would be much appreciated!
I assume that you're referring to the Query --> Execute to file command in the Query window. I don't think it's a bug that pgAdmin doesn't escape line breaks within strings in its CSV output; Excel can read it correctly anyway.
In the export options, please make sure that you use commas as column separators and double quotes as quote chars.
Additionally, when you load your CSV into Excel, please don't use Data -> From Text; that path doesn't parse CSV with line breaks correctly. Just open the file directly in Excel (via Open within Excel, or by right-clicking it in Windows Explorer and choosing Open With -> Microsoft Excel).
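If you'd rather bypass the GUI export, a COPY sketch like this (table name and path are placeholders) produces properly quoted CSV, with embedded line breaks kept inside a single quoted field:

    COPY my_table TO '/tmp/export.csv'
    WITH (FORMAT csv, HEADER true, QUOTE '"');

In psql, the client-side \copy variant of the same command works without requiring server-side file access.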