Error Message: "Failed to create table: Error while reading data, error message: CSV table references column position"
I'm having issues loading data from a CSV in Google Cloud Storage into BigQuery and creating an associated table. I start in Cloud Storage, adding my raw CSV file there. Then, moving to BigQuery, I use Create Dataset > Create Table with the CSV in Cloud Storage.
My CSV format is:
[screenshot of the CSV contents]
The parameters in my BigQuery table are:
[screenshot of the table creation settings]
I can't get the data to load with this format and this setup. The original dataset runs to 10k+ rows, but I've reduced the scope to troubleshoot the format error.
Any response or guidance would be greatly appreciated.
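For reference, here is a minimal sketch of the same load done with the google-cloud-bigquery Python client, which often surfaces a more detailed row/column error than the Console wizard; the table ID, GCS URI, and load options below are placeholders, not the values in the screenshots.

```python
# Hypothetical sketch: load a CSV from GCS into BigQuery with the Python client.
# Table ID, URI, and load options are placeholders, not values from the question.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row, if the CSV has one
    autodetect=True,       # or pass an explicit schema=[bigquery.SchemaField(...), ...]
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/my-file.csv",       # placeholder URI
    "my-project.my_dataset.my_table",   # placeholder table ID
    job_config=job_config,
)
load_job.result()  # raises an exception with the full error message on failure
print(client.get_table("my-project.my_dataset.my_table").num_rows)
```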
Related
I am creating an application in AppSheet in which I am trying to bulk upload data to Google BigQuery by uploading a CSV file. The functionality works fine if there is only one row in the CSV file, but if there are more records the bulk upload fails.
[screenshots of the BigQuery and AppSheet errors]
The restriction I have is that I can't use Google Sheets for any workaround.
I directly connected AppSheet with Google BigQuery and tried to bulk upload data into BigQuery by uploading a CSV file in AppSheet, but it failed.
I want to load data from a Parquet file in Google Cloud Storage into BigQuery. While loading, I also want to add a couple of extra columns to the table that are not present in the source file, like insert_time_stamp and source_file_name. After doing some research I found these options:
1. Create a temporary table linked to the file in GCS, and then load the data from the temporary table, along with the additional columns, into the final BigQuery table.
2. Load the data from the Parquet file into a pandas DataFrame, add the two extra columns, and then use pandas.DataFrame.to_gbq or client.load_table_from_dataframe to load the data into the BigQuery table.
3. Load the data from the Parquet file into a staging table (by this I mean a normal table) in BigQuery, and then use this table to create the final table as: "insert into final_table select *, current_timestamp as insert_time_stamp, <file_name> as source_file_name from staging_table". Then finally drop the staging table.
If the number of rows in the source file is in the millions, what would be the best approach to take?
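Not an authoritative answer, but a minimal sketch of the third option (staging table plus INSERT ... SELECT) with the google-cloud-bigquery Python client may help make the trade-off concrete; the project, dataset, table, and file names are placeholders.

```python
# Hypothetical sketch of option 3: load Parquet into a staging table, then
# INSERT ... SELECT into the final table with the two derived columns added.
# All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
staging_table = "my-project.my_dataset.staging_table"
final_table = "my-project.my_dataset.final_table"
source_uri = "gs://my-bucket/data/file1.parquet"

# Load the Parquet file; BigQuery reads the schema from the file itself.
client.load_table_from_uri(
    source_uri,
    staging_table,
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET),
).result()

# Append into the final table, adding insert_time_stamp and source_file_name.
client.query(
    f"""
    INSERT INTO `{final_table}`
    SELECT *, CURRENT_TIMESTAMP() AS insert_time_stamp,
           '{source_uri}' AS source_file_name
    FROM `{staging_table}`
    """
).result()

# Drop the staging table once the insert has succeeded.
client.delete_table(staging_table)
```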
So the issue is, I am unable to load data into GCP BigQuery dataset 'dw', located in the US. However, I am able to load the data into an East Asia location.
I am trying to load data into partitioned tables in dataset 'dw' (US location) using the NiFi ingestion tool, but there is no error and no data is loaded. I even tried inserting manually from the BigQuery editor; unfortunately, still no error and no data inserted into dw.aes_mapdata2.
However, I am able to load data into dataset TEST.aes_mapdata2_copy, which is in location "asia-east1".
Any ideas on what I am doing wrong?
We figured it out. We had to create new tables. It seems we messed up the settings on the original tables we created.
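For what it's worth, here is a hedged sketch of recreating a partitioned table in the US-located dataset with the Python client; the schema and partitioning field are invented for illustration and are not the actual settings that were fixed.

```python
# Hypothetical sketch: recreate a time-partitioned table in the US-located
# dataset 'dw'. The schema and partitioning field are made-up examples.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.dw.aes_mapdata2",
    schema=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(field="event_ts")
client.create_table(table)  # the table inherits the dataset's US location
```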
I am trying to copy a table from Spanner to BigQuery. I have created two Dataflow jobs: one that copies from Spanner to a text file, and another that imports the text file into BigQuery.
The table has a column whose value is a JSON string. The issue appears when the Dataflow job runs to import from the text file into BigQuery. The job throws the error below:
INVALD JSON: :1:38 Expected eof but found, "602...
Is there any way I can exclude this column while copying, or any way I can copy the JSON object as-is? I tried excluding this column in the schema file but it did not help.
Thank you!
Looking at https://cloud.google.com/dataflow/docs/guides/templates/provided-batch#cloud-spanner-to-cloud-storage-text, there are no options on the BigQuery import job that would allow you to skip columns, nor Cloud Spanner options that would skip a column when extracting.
I think your best shot is to write a custom processor that will drop the column, similar to Cleaning data in CSV files using dataflow.
It's more complicated, but you can also try Dataprep: https://cloud.google.com/dataprep/docs/html/Drop-Transform_57344635. It should be possible to run Dataprep jobs as a Dataflow template.
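As a hedged illustration of the custom-processor suggestion above, here is a minimal Apache Beam (Python) sketch that reads the exported text files, drops one column by index, and writes cleaned files back to GCS before the BigQuery import; the bucket paths and the column index are placeholders.

```python
# Hypothetical sketch: drop one column from the exported text files with Beam.
# Paths and the column index are placeholders.
import csv
import io

import apache_beam as beam


def drop_column(line, index_to_drop):
    """Parse one CSV line, remove the column at index_to_drop, and re-serialize it."""
    fields = next(csv.reader(io.StringIO(line)))
    del fields[index_to_drop]
    out = io.StringIO()
    csv.writer(out).writerow(fields)
    return out.getvalue().rstrip("\r\n")


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read exported text" >> beam.io.ReadFromText("gs://my-bucket/spanner-export/*")
        | "Drop JSON column" >> beam.Map(drop_column, index_to_drop=3)  # e.g. the 4th column
        | "Write cleaned files" >> beam.io.WriteToText("gs://my-bucket/spanner-export-clean/part")
    )
```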
I created a permanent BigQuery table that reads some CSV files from a Cloud Storage bucket sharing the same prefix name (filename*.csv) and the same schema.
Some CSVs, however, make BigQuery queries fail with a message like the following one: "Error while reading table: xxxx.xxxx.xxx, error message: CSV table references column position 5, but line starting at position:10 contains only 2 columns."
By moving the CSVs out of the bucket one by one, I identified the one responsible for the error.
This CSV file doesn't have 10 lines...
I found this ticket, BigQuery error when loading csv file from Google Cloud Storage, so I thought the issue was an empty line at the end. But other CSVs in my bucket have one too, so this can't be the reason.
On the other hand, this CSV is the only one with content type text/csv; charset=utf-8; all the others are text/csv, application/vnd.ms-excel, or application/octet-stream.
Furthermore, after downloading this CSV to my local Windows machine and uploading it again to Cloud Storage, the content type is automatically converted to application/vnd.ms-excel.
And then, even with the missing line, BigQuery can query the permanent table based on filename*.csv.
Is it possible that BigQuery has issues querying CSVs with UTF-8 encoding, or is it just a coincidence?
Use Google Cloud Dataprep to load your CSV file. Once the file is loaded, analyze the data and clean it if required.
Once all the rows are cleaned, you can then sink that data into BQ.
Dataprep is a GUI-based ETL tool, and it runs a Dataflow job internally.
Do let me know if any more clarification is required.
Just to follow up on the issue: the CSV file had gzip set as its encoding, which was the reason BigQuery didn't interpret it as a CSV file.
According to the documentation, BigQuery expects CSV data to be UTF-8 encoded:
"encoding": "UTF-8"
In addition, since this issue is related to the metadata of the files in GCS, you can edit the metadata directly from the Console.
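For completeness, here is a small sketch of inspecting and resetting the object's Content-Type / Content-Encoding metadata with the google-cloud-storage Python client instead of the Console; the bucket and object names are placeholders, and clearing the gzip Content-Encoding only makes sense if the object itself is not actually gzip-compressed.

```python
# Hypothetical sketch: inspect and reset GCS object metadata.
# Bucket and object names are placeholders.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").get_blob("filename1.csv")

print(blob.content_type, blob.content_encoding)  # e.g. "text/csv; charset=utf-8", "gzip"

# Clear the gzip Content-Encoding so BigQuery treats the object as plain CSV.
blob.content_encoding = None
blob.content_type = "text/csv"
blob.patch()  # pushes the metadata changes to GCS
```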