Cloud Data Fusion - trim quotes for all columns - google-cloud-platform

I have a CSV file in GCS with hundreds of columns, each field enclosed in quotes, like below:
"John","Doe","5/15/2021 7:18:26 PM"
I need to load this into BigQuery using Data Fusion and have created a pipeline. My questions are:
How do I trim the quotes from these columns in the Wrangler? I can't find much documentation on this beyond the basics.
How do I apply this rule to all the columns in one shot?
Please guide me; any good reading on these kinds of operations would also be helpful.

For testing purposes I used your sample data and added a few more entries.
Remove quotes
If your data looks like this and your objective is just to remove the quotes, you can do the following:
Click the drop down arrow beside body
Select Find and replace
In Find, put " and leave Replace blank
Your output will look like this:
John,Doe,5/15/2021 7:18:26 PM
Parse CSV to split into columns
You can then convert your CSV to columns:
Click the drop down beside body
Select Parse -> CSV
A pop-up will appear; select "Comma" as the delimiter
This tells the Wrangler to read the data as CSV and split it on the comma into separate columns. However, the original data will remain in the body column.
To delete body:
Select body by ticking the check box at the right
Click the drop down beside body
Select Delete column
Your data should now be split into separate, unquoted columns. The equivalent Wrangler directives are sketched below.
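If you prefer typing a recipe over clicking through the UI, the same steps can be expressed as Wrangler directives. This is only a sketch and the exact syntax can vary slightly between Data Fusion / CDAP Wrangler versions, so verify it against your instance: the first directive strips every double quote from the raw body column (which also answers the "all columns in one shot" part, since it runs before the line is split), the second parses the comma-separated line into columns (false meaning the first row is not a header), and the third drops the original body column.
find-and-replace body s/"//g
parse-as-csv body , false
drop body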

Related

Is there a way to create a char(13) delimited string that can be used in Power BI?

I am creating a report in Power BI and would like to create a client address string that is formatted as a standard line-delimited postal address in the United States.
I tried creating a DAX measure, but could not get around the error:
A single value for column 'ServiceAddress1' in table 'pbiCoverPage' cannot be determined.
This error occurs if I try to use the DAX TRIM function on any TEXT column for a Measure.
As a workaround, I created a SQL View that returns a CHAR(13) delimited string with a postal address.
However, if I display the field in a card visual, the CHAR(13) characters do not create separate lines. The postal address is displayed on a single line, with each CHAR(13) interpreted as a space.
My questions are:
Can text fields be used at all in DAX Functions such as TRIM in a Measure?
Is there a Power BI Visual, other than 'Card' that can display text on a report?
Displaying a postal address should not be a difficult task. Is there a simple way to do this in Power BI?
Is there any way to use a Text Box in Power BI to show a field value? I think this would allow me to left-justify the address string.
If I try to add a Value to a text box, I get a message:
To turn on Q&A, we need to create a local index of your data. If you publish the report, we'll create one in the service as well.
I am researching this message. It seems like I am driving a tack with a sledgehammer. Is there an easier way to display a formatted text string in Power BI?
I tested the following approaches:
Use a Button. The button supports text alignment and has word-wrap. However, it does not support line delimiters such as CHAR(10) or CHAR(13).
Use the HTML Content Custom Visual from:
https://www.html-content.com/
The HTML Content control allows you to write SQL Queries that return HTML that includes the <br> line-break tag.
Thus far, the HTML Content visual is the only way (short of a custom R or Python visual) to display a string (such as a postal address) with multiple delimited lines in Power BI.
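For reference, a minimal sketch of the kind of view that feeds the HTML Content visual is below. The table and column names (dbo.ClientAddress, ServiceAddress1, and so on) are placeholders for illustration only; substitute your own schema:
CREATE VIEW dbo.vwClientAddressHtml AS
SELECT
    ClientId,
    ServiceAddress1 + '<br>' +
    City + ', ' + StateCode + ' ' + PostalCode AS AddressHtml
FROM dbo.ClientAddress;
Dropping the AddressHtml field into the HTML Content visual renders each <br> as a real line break, which the Card visual will not do.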

Filtering google sheets column to show cells ending with one of several values

I need to filter column A to see which cells end with values like .com/, .org/, .co.uk/, etc.
Instead of filtering by "text ends with" dozens or hundreds of times, is there a way to combine all of these into one custom formula?
try this custom formula for:
filter view
conditional formatting
data validation
=REGEXMATCH(A1, "\.com/$|\.org/$|\.co\.uk/$")
The dots are escaped so they match a literal "." rather than any character.
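If you also want to pull the matching rows out into a separate range, the same regex can be used inside FILTER; this is just a sketch assuming your URLs start in A2:
=FILTER(A2:A, REGEXMATCH(A2:A, "\.com/$|\.org/$|\.co\.uk/$"))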

Turn off Toad word wrapping in output table

I have a table with a field that contains lots of XML data, and that data comes back wrapped across multiple lines, making it difficult to scroll through.
I have only been able to figure out how to turn off the wrapping in the editor, but not the output table. Is this possible?

Toad Import - Change column value based on which excel worksheet the data came from

I am trying to import an Excel file with 3 worksheets into a table. For one of the columns I would like to vary the value populated based on which worksheet it came from. Is there a way to do this with Toad Import? The sheets look the same and have the same columns, but if it is sheet 1 I want a certain column to be ABC, if sheet 2 XYZ, if sheet 3 ETC. Is there a way?
I load Excel files frequently, but I didn't notice such an option. So I went to see the loader once again, step-by-step, checked carefully what it offers - nope, there's nothing like that.
A workaround, quite simple, is to edit the Excel file and create that column. Then
enter ABC into the first cell of that column
select the cell
double click the bottom right corner
it'll auto-populate all rows in that worksheet with ABC (in that column)
do the same for other sheets
Another option is to set the column's default value, e.g.
alter table test modify ident varchar2(3) default 'abc';
You'd have to do that before each sheet you load, changing 'abc' to 'def' etc.
Or, if you switch to SQL*Loader (there's that option in the same menu as "Import table data"), create a control file which utilizes the CONSTANT keyword, e.g.
load data
infile excel.csv
append
into table test
fields terminated by ',' optionally enclosed by '"'
(name,
value,
ident constant "abc"
)
Just like in previous case, you'd have to modify the constant value before loading each worksheet's data.
Also, note that SQL*Loader can't load XLS files - you'd have to save each sheet into its own .csv file first.
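Once you have the .csv files and the control file, running the load from the command line looks roughly like this (the username, password and file names are placeholders - adjust them to your environment):
sqlldr userid=scott/tiger control=excel.ctl log=excel.log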
Which option would I choose? Probably the first one; seems to be the simplest & the fastest.

Amazon Athena - Column cannot be resolved on basic SQL WHERE query

I am currently evaluating Amazon Athena and Amazon S3.
I have created a database (testdb) with one table (awsevaluationtable). The table has two columns, x (bigint) and y (bigint).
When I run:
SELECT *
FROM testdb."awsevaluationtable"
I get all of the test data.
However, when I try a basic WHERE query:
SELECT *
FROM testdb."awsevaluationtable"
WHERE x > 5
I get:
SYNTAX_ERROR: line 3:7: Column 'x' cannot be resolved
I have tried all sorts of variations:
SELECT * FROM testdb.awsevaluationtable WHERE x > 5
SELECT * FROM awsevaluationtable WHERE x > 5
SELECT * FROM testdb."awsevaluationtable" WHERE X > 5
SELECT * FROM testdb."awsevaluationtable" WHERE testdb."awsevaluationtable".x > 5
SELECT * FROM testdb.awsevaluationtable WHERE awsevaluationtable.x > 5
I have also confirmed that the x column exists with:
SHOW COLUMNS IN sctawsevaluation
This seems like an extremely simple query yet I can't figure out what is wrong. I don't see anything obvious in the documentation. Any suggestions would be appreciated.
In my case, changing double quotes to single quotes resolves this error.
Presto uses single quotes for string literals, and uses double quotes for identifiers.
https://trino.io/docs/current/migration/from-hive.html#use-ansi-sql-syntax-for-identifiers-and-strings
Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:
SELECT name AS "User Name"
FROM "7day_active"
WHERE name = 'foo'
I have edited my response to this issue based on my current findings and my contact with both the AWS Glue and Athena support teams.
We were having the same issue - an inability to query on the first column in our CSV files. The problem comes down to the encoding of the CSV file. In short, AWS Glue and Athena currently do not support CSVs encoded in UTF-8-BOM. If you open up a CSV encoded with a Byte Order Mark (BOM) in Excel or Notepad++, it looks like any comma-delimited text file. However, opening it up in a hex editor reveals the underlying issue: there are a bunch of special characters at the start of the file, i.e. the BOM.
When a UTF-8-BOM CSV file is processed in AWS Glue, it retains these special characters and associates them with the first column name. When you try to query the first column within Athena, you will get an error.
There are ways around this on AWS:
In AWS Glue, edit the table schema and delete the first column, then reinsert it back with the proper column name, OR
In AWS Athena, execute the SHOW CREATE TABLE DDL to script out the problematic table, remove the special character in the generated script, then run the script to create a new table which you can query on.
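As an illustration of the second approach, the DDL can be scripted out with (table name taken from the question):
SHOW CREATE TABLE testdb.awsevaluationtable;
Copy the generated statement, remove the stray BOM character from the first column name, and run it again with a new table name.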
To make your life simple, just make sure your CSVs are encoded as plain UTF-8 (without a BOM).
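If you already have UTF-8-BOM files, one common way to strip the marker before uploading to S3 - shown here with GNU sed and a placeholder file name - is:
sed -i '1s/^\xEF\xBB\xBF//' yourfile.csv
After stripping the BOM, re-crawl or recreate the table so Glue picks up the clean column name.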
I noticed that the csv source of the original table had column headers with capital letters (X and Y) unlike the column names that were being displayed in Athena.
So I removed the table, edited the csv file so that the headers were lowercase (x and y), then recreated the table and now it works!