I use the command exp parfile=/opt/p.par to export the database, the content of the p.par file is
userid=system/sys123
file=/opt/exp.dat
log=/opt/exp.log
compress=no
direct=y
full=y
rows=no
This is the error Log
About to export the entire database ...
. exporting tablespace definitions
. exporting profiles
. exporting user definitions
. exporting roles
. exporting resource costs
. exporting rollback segment definitions
. exporting database links
. exporting sequence numbers
. exporting directory aliases
. exporting context namespaces
. exporting foreign function library names
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions
. exporting system procedural objects and actions
. exporting pre-schema procedural objects and actions
. exporting cluster definitions
. about to export AUDSYS's tables via Direct Path ...
. about to export SYSTEM's tables via Direct Path ...
. . exporting table AQ$_KEY_SHARD_MAP
...
. about to export SYS$UMF's tables via Direct Path ...
. about to export MDDATA's tables via Direct Path ...
. exporting synonyms
. exporting views
. exporting referential integrity constraints
. exporting stored procedures
. exporting operators
. exporting indextypes
. exporting bitmap, functional and extensible indexes
. exporting posttables actions
. exporting triggers
. exporting materialized views
. exporting snapshot logs
. exporting job queues
. exporting refresh groups and children
. exporting dimensions
. exporting post-schema procedural objects and actions
EXP-00061: unable to find the outer table name of a nested table
EXP-00000: Export terminated unsuccessfully
I queried the meaning of EXP-00061 is
Error code: EXP-00061
Description: unable to find the outer table name of a nested table
Cause: While exporting a bitmap index or posttable action on an inner nested table, the name of the outer table could not be located, using the NTAB$ table.
Action: Verify the table is properly defined.
but this information is beyond my comprehension.
I tried to change full=y to owner=system, the export works fine. If it is a full database export, there will be an EXP-00061 error.
How can I avoid this problem please? (try not to use expdp)
Related
I have a file structure such as:
gs://BUCKET/Name/YYYY/MM/DD/Filename.csv
Every day my cloud functions are creating another path with another file innit corresponding to the date of the day (so for today's 5th of August) we would have gs://BUCKET/Name/2022/08/05/Filename.csv
I need to find a way to query this data to Big Query automatically so that if I want to query it for 'manual inspection' I can select for example data from all 3 months in one query doing CREATE TABLE with gs://BUCKET/Name/2022/{06,07,08}/*/*.csv
How can I replicate this? I know that BigQuery does not support more than 1 wildcard, but maybe there is a way to do so.
To query data inside GCS from Big Query you can use an external table.
Problem is this will fail because you cannot have a comma (,)
as part of the URI list
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = ['gs://bucket/2022/{1,2,3}/data.csv']
)
You have to specify the 3 CSV file locations like this:
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = [
'gs://inigo-test1/2022/1/data.csv',
'gs://inigo-test1/2022/2/data.csv']
'gs://inigo-test1/2022/3/data.csv']
)
Since you're using this sporadically, probably makes more sense to create a temporal external table.
se I found a solution that works at least for my use case, without using the external table.
During the creation of table in dataset in BigQuery use create table from: GCS and then when using URI pattern I used gs://BUCKET/Name/2022/* ; As long as filename is the same in each subfolder and schema is identical, then BQ will load everything and then you can perform date operations directly in BQ (I have a column with ingestion date)
I am using BIG QUERY EXPORT DATA statement to create files in cloud storage for an another team to extract for further reprocessing. I am using below statement, not pasting the select query as its huge.
EXPORT DATA OPTIONS(
uri='gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter='|') AS
SELECT
I see below files getting created in my cloud storage bucket
radhika_sharma_ibm#cloudshell:~ (whr-asia-datalake-nonprod)$ gsutil ls gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000000.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000001.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000002.csv
I cannot remove the suffix part as BIG QUERY creates it, but I am wondering if I can create files with DATE in the file name for the other team to identify what date it is created for??
That is like
Customer_Master_04022021_000000000000_.csv
I need to have a date in my file. Any help or inputs please?
Is there a work around or I will have to go with a data flow here that is using a data flow job to extract data from table in a file.
You can use the uri value as:
'gs://bucket/folder/your_filename-'||current_datetime()||'-*.csv'
Either Current_date() or current_datetime() can be used.
Thanks
I have read similar issue here but not able to understand if this is fixed.
Google bigquery export table to multiple files in Google Cloud storage and sometimes one single file
I am using below big query EXPORT DATA OPTIONS to export the data from 2 tables in a file. I have written select query for the same.
EXPORT DATA OPTIONS(
uri='gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_'||CURRENT_DATE()||'*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter='|') AS
SELECT
I have only 2 rows returning from my select query and I assume that only one file should be getting created in google cloud storage. Multiple files are created only when data is more than 1 GB. thats what I understand.
However, 3 files got created in cloud storage where 2 files just had the header record and the third file has 3 records(one header and 2 actual data record)
radhika_sharma_ibm#cloudshell:~ (whr-asia-datalake-nonprod)$ gsutil ls gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_2021-02-04000000000000.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_2021-02-04000000000001.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_2021-02-04000000000002.csv
Why empty files are getting created?
Can anyone please help? We don't want to create empty files. I believe only one file should be created when it is 1 GB. more than 1 GB, we should have multiple files but NOT empty.
You have to force all data to be loaded into one worker. In this way you will be exporting only one file (if <1Gb).
My workaround: add a select distinct * on top of the Select statement.
Under the hood, BigQuery utilizes multiple workers to read and process different sections of data and when we use wildcards, each worker would create a separate output file.
Currently BigQuery produces empty files even if no data is returned and thus we get multiple empty files. The Bigquery product team is aware of this issue and they are working to fix this, however there is no ETA which can be shared.
There is a public issue tracker that will be updated with periodic progress. You can STAR the issue to receive automatic updates and give it traction by referring to this link.
However for the time being I would like to provide a workaround as follows:
If you know that the output will be less than 1GB, you can specify a single URI to get a single output file. However, the EXPORT DATA statement doesn’t support Single URI.
You can use the bq extract command to export the BQ table.
bq --location=location extract \
--destination_format format \
--compression compression_type \
--field_delimiter delimiter \
--print_header=boolean \
project_id:dataset.table \
gs://bucket/filename.ext
In fact bq extract should not have the empty file issue like the EXPORT DATA statement even when you use Wildcard URI.
I faced the same empty files issue when using EXPORT DATA.
After doing a bit of R&D found the solution. Put LIMIT xxx in your SELECT SQL and it will do the trick.
You can find the count, and put that as LIMIT value.
SELECT ....
FROM ...
WHERE ...
LIMIT xxx
It turns out you need to enforce multiple files, wildcard syntax. Either a file for CSV or folder for other like AVRO.
The uri option must be a single-wildcard URI as described
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
Specifying a wildcard seems to start several workers to work on the extract, and as per the documentation, size of the exported files will vary.
Zero-length files is unusual but technically possible if the first worker is done before any other really get started. Hence why the wildcard is expected to be used only when you think your exported data will be larger than the 1 GB
I have just faced the same with Parquet but found out that bq CLI works, which should do for any format.
See (and star for traction) https://issuetracker.google.com/u/1/issues/181016197
Due to a change in the business I need to copy a whole BigQuery project from one account to another, also, the accounts are not related and is not possible to link it in any way.
Throughout the CLI I was able to export a table to Cloud Storage in a dataset. Also, list tables in a dataset looks possible so loop over it shouldn't be a problem.
But I can't find any suitable way to manage the datasets neither for exporting or creating in the new account so it left a lot of manual task.
I'm missing something? There is a way to export the whole project with all datasets or a manual task will be always required?
The data structure is not complex at all:
Project -> dataset -> table
-> table
-> ...
-> dataset -> table
-> table
-> ...
-> ...
You can't copy the whole project at once but you can try to automate the copy using a script in Python like this:
from google.cloud import bigquery
import os
source_project = "<your source project>"
new_project = "<your new project>"
#I suppose that you have access to the source project in your new project
client = bigquery.Client(project=source_project)
datasets = []
#List all the datasets in the source project and save it in a list
for i in client.list_datasets():
datasets.append(i.dataset_id)
#For all the datasets, build the commands and then execute them
for i in datasets:
create_command = "bq mk -d " + i
copy_command = "bq mk --transfer_config --project_id=" + new_project + " --data_source=cross_region_copy --target_dataset=" + i + " --display_name='My Dataset Copy' --params='{\"source_dataset_id\":\"" + i + "\",\"source_project_id\":\"" + source_project + "\",\"overwrite_destination_table\":\"true\"}'"
os.system(create_command)
os.system(copy_command)
You can use the Bigquery Data Transfer service for this. You can't copy all your project, but dataset per dataset. You can script this if you have a lot of dataset.
Be careful, you don't export from the source project to a target project, you import into the target project from the source project (I mean you have to define the transfert in the destination project)
To copy the dataset from one project to another project then you can use the below command to make the transfer job:
bq mk --transfer_config --project_id=[PROJECT_ID] --data_source=[DATA_SOURCE] --target_dataset=[DATASET] --display_name=[NAME] --params='[PARAMETERS]'
where PROJECT_ID : The destination project_ID
DATA_SOURCE : cross_region_copy
DATASET : Target dataset
NAME : Display name of your job.
PARAMETERS : Source project ID, Source Dataset ID and other parameteres can be defined( overwrite destination table etc.)
You can go through this link for detailed explanation.
I need to merge two databases for two different apps. How can add prefix to all Django tables to avoid any conflict?
For example, option should look like:
DB_PREFIX = 'my_prefix_'
You can use meta options for model,
class ModelHere():
class Meta:
db_table = "tablenamehere"
Edit
If you want to add prefix to all of your tables including auth_user, auth_group, etc. Then you are looking for something like django-table-prefix. Just install and add some settings to settings file and you are done.
Add 'table_prefix', to installed apps,
Set the table prefix as DB_PREFIX = 'nifty_prefix'
Then run syncdb and the output will be,
Creating tables ...
Creating table nifty_prefix_auth_permission
Creating table nifty_prefix_auth_group_permissions
Creating table nifty_prefix_auth_group
Creating table nifty_prefix_auth_user_groups
Creating table nifty_prefix_auth_user_user_permissions
Creating table nifty_prefix_auth_user
Creating table nifty_prefix_django_content_type
Creating table nifty_prefix_django_session
Creating table nifty_prefix_django_site
An alternative to prefixing all the names is to put one of the two DBs into a different schema
(multiple schemas can coexist in the same database, even if the objects have the same names) This will also take care of objects other than tables, such as indexes, views, functions, ...
So on one of the databases, just do
ALTER SCHEMA public RENAME TO myname;
After that, you can dump it (pg_dump -n myname to dump only one schema), and import it into the other database, without the chance of collisions.
You refer to tables or other objects in the new schema by myname.tablname or by setting the search_path (this can be done on a per-user basis eg via ALTER USER SET search_path = myschema, pg_catalog;)
Note: there may be a problem with frameworks and clients not being schema-aware, so you might need some additional tweaking. YMMV.
http://www.postgresql.org/docs/9.4/static/sql-alterschema.html