Read a CSV file in Django and write it to a PostgreSQL database

How can I read a CSV file, parse the values, and then output it to a particular database table?
That's the basic problem.
Here is a 'bigger picture' of what I'm trying to do:
I'm trying to either read from multiple CSV files (every minute) or read from an ever-updating CSV file (with additional row entries on every update) every minute.
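A minimal sketch of the core read-and-insert step, assuming a hypothetical app myapp with a SensorReading model (the model, field names, and CSV layout are made up for illustration):

    import csv
    from pathlib import Path

    from myapp.models import SensorReading  # hypothetical model with timestamp and value fields


    def load_csv(path):
        """Parse one CSV file and bulk-insert its rows into PostgreSQL via the ORM."""
        with Path(path).open(newline="") as f:
            reader = csv.DictReader(f)  # expects a header row: timestamp,value
            rows = [
                SensorReading(timestamp=row["timestamp"], value=row["value"])
                for row in reader
            ]
        # bulk_create issues a multi-row INSERT, which matters if this runs every minute.
        SensorReading.objects.bulk_create(rows, ignore_conflicts=True)

For the ever-updating-file case, one common approach is to remember the last row (or timestamp) already loaded and skip earlier lines on each run, with the whole thing driven by a management command scheduled via cron or Celery beat.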

Related

Irregularities when csv loaded in power query

I performed data cleaning using pandas and created a CSV file. When I loaded the file in Power Query, I saw there were null values in a few columns. I am sure that all null values were removed during the cleaning task; I even re-verified it.
UPDATE: Instead of a CSV file, I tried creating an Excel file, and when I loaded that there were no null values.
Can anybody point out what the issue is with the CSV file method?
This is typically associated with the format of the CSV, such as single quotes inside text fields when single quotes are also used to start and stop the text field.
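As a rough way to rule out quoting problems on the pandas side, every field can be quoted explicitly when writing the CSV, and the file re-read as a sanity check (the column names here are made up for illustration):

    import csv
    import pandas as pd

    # Hypothetical cleaned frame; note the embedded single quote in "O'Brien".
    df = pd.DataFrame({"name": ["O'Brien", "Smith"], "city": ["Cork", "Leeds"]})

    # Quote every field so embedded quotes or delimiters cannot shift columns
    # and show up as nulls in the consumer (Power Query here).
    df.to_csv("cleaned.csv", index=False, quoting=csv.QUOTE_ALL)

    # Re-reading the file is a quick check that no new nulls were introduced.
    print(pd.read_csv("cleaned.csv").isna().sum())

If the re-read frame shows no nulls but Power Query still does, the issue is likely in the Power Query import settings (delimiter, quote character, or encoding) rather than in the file itself.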

How to read, modify, and overwrite parquet files in S3 using Spark?

I am trying to read a bunch of parquet files from S3 into a Spark dataframe using df = spark.read.parquet("s3a://my-bucket/path1/path2/*.parquet").
Will this read all the Parquet files present at any level inside path2 (e.g. path2/path3/...file.parquet), or only the files present directly under path2 (e.g. path2/file1.parquet)?
Will df now contain the complete filenames/filepaths (object keys) of all these Parquet files?
While processing the contents of a single Parquet file as a dataframe, I want to modify the dataframe and overwrite it inside the same file. How can I do that? Even if it deletes the old version of the file and creates a new file (new filename), that's fine, but I don't want any files other than the one currently under consideration to be affected in any manner by this operation.
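A sketch of one way to keep each row's source object key available and rewrite a single file's rows without touching its neighbours (the bucket and input paths come from the question, the output prefix is made up, and this assumes PySpark):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import input_file_name

    spark = SparkSession.builder.getOrCreate()

    # Tag every row with the object key of the Parquet file it was read from.
    df = (spark.read.parquet("s3a://my-bucket/path1/path2/*.parquet")
              .withColumn("source_file", input_file_name()))

    # Isolate the rows of one file (path is illustrative) and modify them.
    one_file = "s3a://my-bucket/path1/path2/file1.parquet"
    subset = df.filter(df.source_file == one_file)
    # ... apply the modifications to `subset` here ...

    # Spark does not rewrite a single file in place; writing the modified rows
    # to a separate prefix (and deleting the original object afterwards) keeps
    # the other files under path2 untouched.
    subset.drop("source_file").write.mode("overwrite").parquet(
        "s3a://my-bucket/path1/path2_rewritten/file1"
    )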

Upload and parse file in Oracle APEX

I'm trying to find the best way to upload, parse, and work with a text file in Oracle APEX (current version 20.1). Business case: I must upload a text file; the first line will be saved to table A.
The remaining lines contain records (columns are pipe-delimited) that should be validated. Correct records should then be saved to table B, and records with errors should be saved to table C (an error log).
I tried to do something with the Data Loading wizard, but it doesn't fit my requirements.
Right now I have added a "File Browse..." item to the page, and after page submit I can find the file in APEX_APPLICATION_TEMP_FILES in blob_content.
Is there any option for working with that file other than the blob_content from APEX_APPLICATION_TEMP_FILES? I find it difficult to work with that type of data.
The text file looks something like this:
2020-06-05 info: header line
2020-06-05|columnAValue|columnBValue|
2020-06-05|columnAValue||columnCValue
2020-06-05|columnAValue|columnBValue|columnCValue
Have a look at the APEX_DATA_PARSER.PARSE table function. It parses the CSV file and returns the values as rows and columns. It's described in more detail in this blog posting:
https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package
Simply pass "file.csv" (literally) as the p_file_name argument; APEX_DATA_PARSER does not care about the "real" file name.
The function uses the file extension only to differentiate between delimited, XLSX, XML, and JSON files, so passing in a static file name like "file.csv" is enough.

Extract values from a CSV and place them in a new column within the same CSV file using NiFi

I have a CSV file from which I need to extract two values, 'UutId' and 'Test', and place them in the last column (filename). How can I pull these values from the CSV file and place them in the last column of the same CSV file, called 'filename', which is currently blank (without any values)?
If possible, please tell me what processor and configs to use. TIA.
For reference, I am attaching a snapshot of the CSV file showing the values I want to add in the last column called 'filename', repeated across all of the CSV records.
[Screenshots: the values to be extracted, the column in which the values should be placed, and the list queue after GetFile.]
After running the UpdateRecord processor, the list queue shows the file has become blank.
Please help.
Use the UpdateRecord processor with the configs below.
In the CSVRecordSetWriter controller service, add a filename column of string type as the last field in the Avro schema.
UpdateRecord configs:
Add a new property in the UpdateRecord processor:
/filename
concat(/UutId,/Test)
Since we are using Record Path Value as the Replacement Value Strategy,
the UpdateRecord processor will now concatenate the UutId and Test values into the filename column value.
Refer to this link for more details regarding the UpdateRecord processor.

What is the best approach to load data into Hive using NiFi?

I have started working with NiFi. I am working on a use case to load data into Hive. I get a CSV file and then I use SplitText to split the incoming flow-file into multiple flow-files (split record by record). Then I use ConvertToAvro to convert the split CSV file into an AVRO file. After that, I put the AVRO files into a directory in HDFS and I trigger the "LOAD DATA" command using ReplaceText + PutHiveQL processors.
I'm splitting the file record by record in order to get the partition value (since LOAD DATA doesn't support dynamic partitioning). The flow looks like this:
GetFile (CSV) --- SplitText (split line count :1 and header line count : 1) --- ExtractText (Use RegEx to get partition fields' values and assign to attribute) --- ConvertToAvro (Specifying the Schema) --- PutHDFS (Writing to a HDFS location) --- ReplaceText (LOAD DATA cmd with partition info) --- PutHiveQL
The thing is, since I'm splitting the CSV file one record at a time, it generates too many Avro files. For example, if the CSV file has 100 records, it creates 100 Avro files. Because I want to get the partition values, I have to split it one record at a time. Is there any way to achieve this without splitting record by record, i.e. by batching it? I'm quite new to this, so I haven't been able to crack it yet. Help me with this.
PS: Please also suggest any alternate approach to achieve this use case.
Are you looking to group the Avro records based on the partitions' values, one Avro file per unique value? Or do you only need the partitions' values for some number of LOAD DATA commands (and use a single Avro file with all the records)?
If the former, then you'd likely need a custom processor or ExecuteScript, since you'd need to parse, group/aggregate, and convert all in one step (i.e. for one CSV document). If the latter, then you can rearrange your flow into:
GetFile -> ConvertCSVToAvro -> PutHDFS -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> ReplaceText -> PutHiveQL
This flow puts the entire CSV file (as a single Avro file) into HDFS, then afterwards it does the split (after converting to JSON since we don't have an EvaluateAvroPath processor), gets the partition value(s), and generates the Hive DDL statements (LOAD DATA).
If you've placed the file (using the PutHDFS processor) at the location the Hive table reads its data from, then you don't need to call the PutHiveQL processor. I am also new to this, but I think you should leverage the schema-on-read capability of Hive.