I need to change the CSV file path dynamically every day on refresh. Today it would be path/filename_01_Dec_20.csv; tomorrow it should be path/filename_02_Dec_20.csv, and so on, changing daily. Please let me know if this can be done.
Write the query for one of the CSV tables, but modify the code where the file path is specified, from e.g.
.../filename_01_Dec_20.csv"...
To
.../filename_" & Date.ToText(Date.From(DateTime.LocalNow()), "dd_MMM_yy") & ".csv"...
This just inserts the current date (DateTime.LocalNow()) in the format you're looking for, using the Date.ToText formatting function.
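In context, the whole query might look roughly like this; a minimal sketch, assuming a local folder path, a comma delimiter, and a header row (all of which are assumptions):

let
    // Format today's date as e.g. "02_Dec_20"
    Today  = Date.ToText(Date.From(DateTime.LocalNow()), "dd_MMM_yy"),
    // The folder path and CSV options below are illustrative assumptions
    Source = Csv.Document(
                 File.Contents("C:\path\filename_" & Today & ".csv"),
                 [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source)
in
    Promoted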
I have a file structure such as:
gs://BUCKET/Name/YYYY/MM/DD/Filename.csv
Every day my cloud functions create another path with another file in it, corresponding to the date of the day (so for today, the 5th of August, we would have gs://BUCKET/Name/2022/08/05/Filename.csv).
I need to find a way to query this data in BigQuery automatically, so that if I want to query it for 'manual inspection' I can select, for example, data from all 3 months in one query by doing CREATE TABLE with gs://BUCKET/Name/2022/{06,07,08}/*/*.csv.
How can I replicate this? I know that BigQuery does not support more than one wildcard, but maybe there is a way around it.
To query data inside GCS from BigQuery you can use an external table. The problem is that the following will fail, because you cannot have a comma (,) as part of the URI list:
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = ['gs://bucket/2022/{1,2,3}/data.csv']
)
You have to specify the 3 CSV file locations like this:
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = [
'gs://inigo-test1/2022/1/data.csv',
'gs://inigo-test1/2022/2/data.csv',
'gs://inigo-test1/2022/3/data.csv']
)
Since you're using this sporadically, it probably makes more sense to create a temporary external table.
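From the bq CLI that can be done in one shot; a hedged sketch, where the table alias, inline schema, and URI are illustrative assumptions:

bq query \
  --use_legacy_sql=false \
  --external_table_definition=foobar::col1:STRING,col2:INT64@CSV=gs://inigo-test1/2022/1/data.csv \
  'SELECT * FROM foobar'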
I found a solution that works, at least for my use case, without using an external table.
When creating the table in the BigQuery dataset, use 'Create table from: Google Cloud Storage' and then, for the URI pattern, use gs://BUCKET/Name/2022/*. As long as the filename is the same in each subfolder and the schema is identical, BQ will load everything, and then you can perform date operations directly in BQ (I have a column with the ingestion date).
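The SQL equivalent of those console steps should be something like this; a hedged sketch, where the dataset and table names are assumptions:

LOAD DATA INTO `mydataset.mytable`
FROM FILES (
  format = 'CSV',
  uris = ['gs://BUCKET/Name/2022/*']
);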
I am using the BigQuery EXPORT DATA statement to create files in Cloud Storage for another team to extract for further reprocessing. I am using the statement below (not pasting the SELECT query, as it's huge).
EXPORT DATA OPTIONS(
uri='gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter='|') AS
SELECT ...
I see the below files getting created in my Cloud Storage bucket:
radhika_sharma_ibm@cloudshell:~ (whr-asia-datalake-nonprod)$ gsutil ls gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000000.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000001.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000002.csv
I cannot remove the suffix part, as BigQuery creates it, but I am wondering if I can create files with the date in the file name, so the other team can identify what date each file was created for.
That is, something like:
Customer_Master_04022021_000000000000_.csv
I need to have a date in my file name. Any help or inputs, please?
Is there a workaround, or will I have to use a Dataflow job here to extract the data from the table into a file?
You can use the uri value as:
'gs://bucket/folder/your_filename-'||current_datetime()||'-*.csv'
Either current_date() or current_datetime() can be used.
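Applied to the statement from the question, that would look roughly like this; a sketch based on the suggestion above, with an explicit CAST added as a precaution (the CAST and the trailing '_' are assumptions):

EXPORT DATA OPTIONS(
  -- Embed today's date in the file name; the '*' is still required,
  -- since BigQuery may shard the export into several files
  uri = 'gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_'
        || CAST(current_date() AS STRING) || '_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = '|') AS
SELECT ...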
I'm trying to find the best way to upload, parse, and work with a text file in Oracle APEX (current version 20.1). Business case: I must upload a text file; the first line will be saved to table A.
The rest of the lines contain records (columns are pipe-delimited) that should be validated. After that, correct records should be saved to table B, or, if there is some error, to table C (an error log).
I tried to do something with the Data Loading wizard, but it doesn't fit my requirements.
Right now I have added a "File Browse..." item to the page, and after page submit I can find the file in APEX_APPLICATION_TEMP_FILES, in blob_content.
Is there any option to work with that file other than the blob_content from APEX_APPLICATION_TEMP_FILES? I find it difficult to work with that type of data.
The text file looks something like this:
2020-06-05 info: header line
2020-06-05|columnAValue|columnBValue|
2020-06-05|columnAValue||columnCValue
2020-06-05|columnAValue|columnBValue|columnCValue
Have a look at the APEX_DATA_PARSER.PARSE table function. It parses the CSV file and returns the values as rows and columns. It's described in more detail in this blog post:
https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package
Simply pass "file.csv" (literally) as the p_file_name argument. APEX_DATA_PARSER does not care about the "real" file name....
The function uses the file extension only to differentiate between delimited, XLSX, XML or JSON files. So simply pass in a static file name like "file.csv". That should be enough.
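For a pipe-delimited file like yours, the query might look roughly like this; a minimal sketch, assuming the file was just submitted from the "File Browse..." item (the item name, WHERE clause, and column count are assumptions):

SELECT p.line_number, p.col001, p.col002, p.col003, p.col004
FROM   apex_application_temp_files f,
       TABLE( apex_data_parser.parse(
                  p_content           => f.blob_content,
                  p_file_name         => 'file.csv',
                  p_csv_col_delimiter => '|' ) ) p
WHERE  f.name = :P1_FILE;

From there you can route line_number = 1 to table A and validate the remaining rows into table B or C.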
I just got started in Power BI, and I am generating two report files every month from ServiceNow:
the SLA report and the Incident report. These files are named INC_MM_YY.xls or SLA_MM_YY.xls.
I am trying to add each previous month's files without needing to add new data sources or edit the queries. It seems this is possible using the M language in the Advanced Editor, but it looks complicated, since I have zero experience with Power Query M.
Are there other ways?
Or, in the case above: I can retrieve the folder data as a table and iterate over the files. But how do I do that in the M language?
Thank you.
EDIT: Just to make it clear, let's look at the table generated by the folder source.
We have the name of the file and its path in each row.
So in pseudocode it should be something like:
for each row n {
    if (n.folderpath ends with "sla") {
        tableSla += load source (n.folderpath & n.filename)
    } else {
        tableIncident += load source (n.folderpath & n.filename)
    }
}
It just seems impractical in Power Query :/ I could find how to make something similar to a for loop, but it is very confusing.
I figured it out.
You can actually create two different sources: one for the folder with the SLA files and another for the folder with the Incident files. Right after combining and transforming the data from one of the folders, still in the Query Editor, you just click New Source, and the other folder's data will be combined into a different table.
With that you have two distinct tables, and any time you put a new file in one of the folders and hit refresh, the data will be added to the correct table.
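For reference, each of the two folder queries ends up looking roughly like this; a minimal sketch, assuming the SLA files sit in their own local folder and the data is on the first sheet of each workbook (the path and sheet handling are assumptions):

let
    // All SLA_MM_YY.xls files live in one folder (the path is an assumption)
    Source   = Folder.Files("C:\Reports\SLA"),
    // Take the first sheet of each workbook as a table
    Sheets   = Table.AddColumn(Source, "Data",
                   each Excel.Workbook([Content], true){0}[Data]),
    // Stack all monthly tables into one
    Combined = Table.Combine(Sheets[Data])
in
    Combined

The Incident query is identical apart from the folder path.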
Thank you guys.
Try the load-from-folder option: you can place each month's files into their own folder, one for the SLAs and one for the Incidents. With load from folder, it will go through each file and load it. So the next month, you add in November's data, refresh the dataset(s), and it will be added automatically.
The files need to have the same structure for it to work effectively, and it will load whatever it sees in the folder; if you remove a file, Power BI will not retain its data, since it only loads what it can see.
Other examples
https://powerbi.tips/2016/06/loading-data-from-folder/
https://insightsoftware.com/blog/power-bi-load-data-from-folder/
Hope that helps
I want to extract the filename and store it in one of the existing columns in the CSV file. How do I do this? Which processor should I use, and with what configuration?
For example, I have a filename 'FE_CHRGRSIM_20171207150616_CustRec.csv' and I want to extract 'FE_CHRGRSIM_20171207150616' and store this value under an existing column in the same CSV file. Please help. TIA
Usually the "real" file name is available as an attribute on the flow file called "filename". You can use UpdateRecord with a Replacement Strategy of "Literal Value"; add a user-defined property called /filename and set the value to ${filename:substringBeforeLast('.')}. You'll need to make sure that the "filename" field is added to your schema (either by UpdateRecord or manually). If you won't know your CSV schema ahead of time you can use InferAvroSchema and it will try to figure it out.
If UpdateRecord and the schema stuff doesn't seem to be working for you, an alternative (since it's CSV) is to use ReplaceText, match the entire line, then replace with that value followed by ,${filename:substringBeforeLast('.')}. That should add the filename (with extension removed) as the last column in the outgoing CSV.
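And the ReplaceText alternative, roughly (a sketch; note that in Line-by-Line mode this appends to the header line as well):

ReplaceText
    Search Value:           (.+)
    Replacement Value:      $1,${filename:substringBeforeLast('.')}
    Replacement Strategy:   Regex Replace
    Evaluation Mode:        Line-by-Line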