I want to get only the list of tables in a dataset. The bq ls dataset command shows the table names along with extra columns: Type, Labels, Time Partitioning and Clustered Fields.
How can I only show the tableId column?
bq ls <DATASET> | tail -n +3 | tr -s ' ' | cut -d' ' -f2
Working in Cloud Shell and locally on macOS.
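If you'd rather not depend on the column layout, a JSON-based variant should also work; this is only a sketch, assuming the usual bq ls JSON output and that jq is installed:
# Print only the table IDs from the JSON listing
bq ls --format=json <DATASET> | jq -r '.[].tableReference.tableId'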
I'm trying to load S3 data in .csv format. The S3 bucket has many files, each with a different number of columns and a different column order, and when I use the COPY command the data gets stored in the wrong columns.
Example:
File1
client_id | event_timestamp | event_name
aaa1 | 2020-08-21 | app_launch
bbb2 | 2020-10-11 | first_launch
File2
a_sales | event_timestamp | client_id | event_name
2039 | 2020-08-27 | ccc1 | app_used
3123 | 2020-03-15 | aaa2 | app_uninstalled
Desired OUTPUT:
a_sales | client_id | event_name | event_timestamp
2039 | ccc1 | app_used | 2020-08-27
3123 | aaa2 | app_uninstalled | 2020-03-15
| aaa1 | app_launch | 2020-08-21
| bbb2 | first_launch | 2020-10-11
I have tried the SQL script below, which runs successfully but doesn't give the desired output. Can someone help me out with this issue?
COPY public.sample_table
FROM 's3://mybucket/file*'
iam_role 'arn:aws:iam::99999999999:role/RedShiftRole'
FILLRECORD DELIMITER ',' IGNOREHEADER 1;
You can COPY the data from the S3 bucket into staging tables whose structure matches each file layout.
Then you can either move the data from these two staging tables into a combined table, or create a view that reads them into a unified structure.
The COPY command does NOT align data to columns based on the text in the header row of the CSV file. You need to specify which columns of the table you want to populate, in the same order as the data appears in the CSV file.
See: Copy-command
Since your two types of files have different column orders (and columns) you will need to have a different column list for each type.
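For illustration, a rough sketch of that approach run through psql against the cluster might look like the following; the staging table names (stg_file1, stg_file2) and the connection details are assumptions, and each COPY column list must match the corresponding file layout exactly:
psql "host=<CLUSTER_ENDPOINT> port=5439 dbname=<DB> user=<USER>" <<'SQL'
-- File1 layout: client_id, event_timestamp, event_name
COPY public.stg_file1 (client_id, event_timestamp, event_name)
FROM 's3://mybucket/file1'
iam_role 'arn:aws:iam::99999999999:role/RedShiftRole'
DELIMITER ',' IGNOREHEADER 1;

-- File2 layout: a_sales, event_timestamp, client_id, event_name
COPY public.stg_file2 (a_sales, event_timestamp, client_id, event_name)
FROM 's3://mybucket/file2'
iam_role 'arn:aws:iam::99999999999:role/RedShiftRole'
DELIMITER ',' IGNOREHEADER 1;

-- Combine the staging tables into the target structure
INSERT INTO public.sample_table (a_sales, client_id, event_name, event_timestamp)
SELECT a_sales, client_id, event_name, event_timestamp FROM public.stg_file2
UNION ALL
SELECT NULL, client_id, event_name, event_timestamp FROM public.stg_file1;
SQL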
I have around 108 tables in a dataset. I am trying to extract all those tables using the following bash script:
# get list of tables
tables=$(bq ls "$project:$dataset" | awk '{print $1}' | tail -n +3)
# extract into storage
for table in $tables
do
bq extract --destination_format "NEWLINE_DELIMITED_JSON" --compression "GZIP" "$project:$dataset.$table" "gs://$bucket/$dataset/$table.json.gz"
done
But it seems that bq ls only shows around 50 tables at once, and as a result I cannot extract all of them to Cloud Storage.
Is there any way I can access all 108 tables using the bq ls command?
I tried with the CLI and this command worked for me:
bq ls --max_results 1000 'project_id:dataset'
Note: set --max_results based on the number of tables in the dataset.
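Applied to the extraction loop from the question, that flag would look something like this (a sketch; pick a limit at least as large as your table count):
# List up to 1000 tables (the default limit is much lower), then extract each one
tables=$(bq ls --max_results 1000 "$project:$dataset" | tail -n +3 | awk '{print $1}')
for table in $tables
do
  bq extract --destination_format "NEWLINE_DELIMITED_JSON" --compression "GZIP" "$project:$dataset.$table" "gs://$bucket/$dataset/$table.json.gz"
done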
By default, bq ls displays 100 rows when listing tables. You can change this with the command-line flag --max_results or -n.
You can also set the default values for bq in $HOME/.bigqueryrc.
Adding flags to .bigqueryrc
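For instance, to make the higher limit permanent, a stanza like the one below in $HOME/.bigqueryrc should do it; I'm going by the flag-defaults documentation here, so treat the exact section layout as an assumption:
# Global flags go at the top; command-specific flags go under a [command] section
[ls]
--max_results=1000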
This will take all the views and materialized views in your dataset and write each one's SQL into its own file.
You could add another loop to iterate over all datasets too.
#!/bin/bash
## THIS shell script will pull down every view's SQL in a dataset into its own file
# Set the project ID and dataset name
PROJECT_ID=<YOUR_PROJECT_ID>
DATASET=<YOUR_DATASET>
# for dataset in $(bq ls --format=json | jq -r '.[] | .dataset_id'); do
# Loop over each table in the dataset
for table in $(bq ls --max_results 1000 "$PROJECT_ID:$DATASET" | tail -n +3 | awk '{print $1}'); do
  # Only process views and materialized views
  if bq show --format=prettyjson "$PROJECT_ID:$DATASET.$table" | jq -r '.type' | grep -q -E "MATERIALIZED_VIEW|VIEW"; then
    file_extension="bqsql"
    # Output the table being processed
    echo "Extracting SQL for $DATASET.$table"
    # Get the view definition (materialized views keep their query under .materializedView.query)
    bq show --format=prettyjson "$PROJECT_ID:$DATASET.$table" | jq -r '.view.query // .materializedView.query' > "$DATASET-$table.$file_extension"
  else
    echo "Ignoring $table"
    continue
  fi
done
# done
I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep.
My first issue was that Dataprep would not correctly pick up DATE type columns, but converting them to TIMESTAMP works (thanks Elliot).
However, when using Dataprep and setting an output BigQuery table, you only have three options: Append, Truncate or Drop the existing table. If the table is date partitioned and you use Truncate, it removes all existing data, not just the data in that partition.
Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date partitioned table. Ideally, I'd want to do this just with Dataprep but that doesn't seem possible right now.
BigQuery table schema:
Partition details:
The data I'm ingesting is simple. In one flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-08 | Josh1 |
| 2018-08-08 | Josh2 |
+------------+--------+
In the other flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-09 | Josh1 |
| 2018-08-09 | Josh2 |
+------------+--------+
It overwrites the data in both cases.
You can create a partitioned table based on DATE. Data written to a partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.
Append the data to have the new rows added to the correct partitions instead of overwriting the whole table.
You can create the table using the bq command:
bq mk --table --expiration [INTEGER1] --schema [SCHEMA] --time_partitioning_field date [PROJECT]:[DATASET].[TABLE]
time_partitioning_field is what defines which field you will be using for the partitions.
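For example, for the simple date/name data shown above, a concrete version of that command could look like this (the dataset and table names are placeholders, and the expiration flag is omitted):
# Create a table partitioned on the date column
bq mk --table --schema 'date:DATE,name:STRING' --time_partitioning_field date mydataset.mytable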
I have a serverless project. I'm trying to export DynamoDB tables into a single CSV and then upload it to S3.
All the npm modules I checked export a single table. Is there a way to export data from multiple tables into one single CSV?
To export as a CSV, adding onto @dixon1e's post, use jq in the shell. With DynamoDB run:
aws dynamodb scan --table-name my-table --select ALL_ATTRIBUTES --page-size 500 --max-items 100000 --output json | jq -r '.Items' | jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ].S])[] | @csv' > export.my-table.csv
The AWS CLI can be used to download data from Dynamo DB:
aws dynamodb scan --table-name my-table --select ALL_ATTRIBUTES --page-size 500 --max-items 100000
The --page-size is important; there is a 1 MB limit on every query result.
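Since the question asks for several tables in one file, a rough sketch of a loop over tables could look like the following; the table names and output path are made up, each table contributes its own header row (the columns differ per table), and only string (S) attributes are handled, as in the command above:
#!/bin/bash
# Hypothetical table list and output file; adjust to your setup
tables=("table-a" "table-b" "table-c")
out="export.all-tables.csv"
: > "$out"
for t in "${tables[@]}"; do
  # Scan each table and convert the Items array to CSV, appending to one file
  aws dynamodb scan --table-name "$t" --select ALL_ATTRIBUTES --page-size 500 --max-items 100000 --output json \
    | jq -r '.Items | (.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ].S])[] | @csv' \
    >> "$out"
done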
Hopefully this makes sense; I will probably just keep mulling this over until I figure it out. I have a table that is formatted in such a way that a specific date may have more than one record assigned. Each record is a plant, so the structure of that table looks like the pinkish table in the image below. However, when using the Google Chart API the data needs to be in the format of the blue table for a line chart, which I have working.
I am looking to create a graph in the Google Chart API similar to the Excel graph, using the pink table, where for one date, e.g. 01/02/2003, there are three species recorded (A, B, C) with values 1, 2, 3. I thought about using a scatter chart, but that didn't work either.
What ties these together is the CenterID: all these records belong to the XXX CenterID. Each record with its species has a SheetID that groups them together; for example, for SheetID = 23 all those species were recorded on the same date.
Looking for suggestions, whether Google Chart API or PHP amendments. My PHP is below (I will switch to json_encode eventually).
$sql = "SELECT * FROM userrecords";
$stmt = $conn->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();
foreach ($data as $row)
{
    $dateArray = explode('-', $row['eventdate']);
    $year  = $dateArray[0];
    $month = $dateArray[1] - 1; // JavaScript months are zero-based
    $day   = $dateArray[2];
    // String values need quotes in the generated JavaScript row
    $dataArray[] = "[new Date($year, $month, $day), '{$row['scientificname']}', '{$row['category_of_taxom']}']";
}
To get that chart, where the dates are the series instead of the axis values, you need to change the way you are pulling your data. Assuming your database is structured like the pink table, you need to pivot the data on the date column instead of the species column to create a structure like this:
| Species | 01/02/2003 | 01/03/2003 | 01/04/2003 |
|---------|------------|------------|------------|
| A | 1 | 2 | 3 |
| B | 3 | 1 | 4 |
| C | 1 | 3 | 5 |