Dataflow batch template provided by Google does not work - google-cloud-platform

I want to run the sample in [1].
However, when I do this, I get the following error:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 1990-01-01 (field=birthday)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish (DirectRunner.java:353)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish (DirectRunner.java:321)
at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:216)
at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run (Pipeline.java:317)
at org.apache.beam.sdk.Pipeline.run (Pipeline.java:303)
at org.apache.beam.examples.Test.run (Test.java:299)
at org.apache.beam.examples.Test.main (Test.java:232)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:834)
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 1990-01-01 (field=birthday)
at org.apache.avro.generic.GenericDatumWriter.writeField (GenericDatumWriter.java:223)
at org.apache.avro.generic.GenericDatumWriter.writeRecord (GenericDatumWriter.java:210)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion (GenericDatumWriter.java:131)
at org.apache.avro.generic.GenericDatumWriter.write (GenericDatumWriter.java:83)
at org.apache.avro.generic.GenericDatumWriter.write (GenericDatumWriter.java:73)
at org.apache.beam.sdk.coders.AvroCoder.encode (AvroCoder.java:317)
at org.apache.beam.sdk.coders.Coder.encode (Coder.java:136)
at org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream (CoderUtils.java:82)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray (CoderUtils.java:66)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray (CoderUtils.java:51)
at org.apache.beam.sdk.util.CoderUtils.clone (CoderUtils.java:141)
at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.<init> (MutationDetectors.java:115)
at org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder (MutationDetectors.java:46)
at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add (ImmutabilityCheckingBundleFactory.java:112)
at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output (ParDoEvaluator.java:301)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue (SimpleDoFnRunner.java:267)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.access$900 (SimpleDoFnRunner.java:79)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output (SimpleDoFnRunner.java:413)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output (SimpleDoFnRunner.java:401)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead$3.processElement (BigQueryIO.java:1139)
For reference, the Avro version is 1.10.1.
Is there any solution?
[1]https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/v2/bigquery-to-parquet/src/main/java/com/google/cloud/teleport/v2/templates/BigQueryToParquet.java

This looks like a bug.
The Avro schema is retrieved from BigQuery, and it indicates that the birthday field should be a nullable integer representing a date.
The actual value being written is a string or a date object (the output doesn't say which).
I browsed Avro's code and it seems that this exception is only thrown when writing data, not when reading. So it appears this is happening during the write to Parquet, perhaps because the GenericRecord loaded from the Avro files has converted the birthday field so that it is no longer an integer.
You may be able to work around this by avoiding the DATE type.
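For example, here is a minimal sketch of that idea. It is not the template's actual code: it assumes you can switch the read to a query and cast the DATE column to STRING, and the project, dataset, table, and schema below are made up for illustration.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class BigQueryDateWorkaround {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Avro schema for the query result; birthday is a plain nullable string here,
    // so no "date" logical type is involved.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":[\"null\",\"string\"]},"
            + "{\"name\":\"birthday\",\"type\":[\"null\",\"string\"]}]}");

    PCollection<GenericRecord> records = pipeline.apply(
        "ReadFromBigQuery",
        BigQueryIO.read(SchemaAndRecord::getRecord)
            .fromQuery(
                "SELECT name, CAST(birthday AS STRING) AS birthday "
                    + "FROM `my-project.my_dataset.people`") // hypothetical table
            .usingStandardSql()
            .withCoder(AvroCoder.of(schema)));

    // ... write `records` to Parquet with FileIO/ParquetIO as the template does ...
    pipeline.run();
  }
}
Because the cast removes the date logical type from the schema of the exported data, the birthday values arrive as plain strings and encode cleanly against the ["null","string"] union.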

Related

NetworkX len() command giving a 'str' object error

To start, I created a network using 2014 flight data as explained here: https://ipython-books.github.io/142-drawing-flight-routes-with-networkx/
I'm trying to create smaller graphs using airline-specific flight paths to test connectivity within each business. Starting with American Airlines, I did the following:
AASys = routes[
    routes['source'].isin(airports_us.index) &
    routes['dest'].isin(airports_us.index) &
    routes['airline'].str.contains('AA')]
AASys
AAedges = AASys[['source', 'dest']].values
AAedges
AAg = nx.from_edgelist(AAedges)
len(AAg.nodes()), len(AAg.edges())
This mirrors the tutorial input almost exactly, with the only change being the additional str.contains('AA') filter. But when I try to take the len() of my new graph, I get: TypeError: 'str' object is not callable.
Can anyone explain why my edge list is being read as a string? How do I fix the TypeError?

Can we change the location from US to another region while reading data from BigQuery using the BigQuery Java library?

I am trying to read data from BigQuery using the BigQuery Java library.
My dataset is not in the US location, so when I give my dataset name to the library, it throws an error saying the dataset was not found in the US location, because it searches in the US by default.
I have also tried setting the location using setLocation("asia-southeast1"), but it still searches in the US location.
This is my code snippet:
val bigquery: BigQuery = BigQueryOptions.newBuilder().setLocation("asia-southeast1").build().getService
val query = "SELECT TO_JSON_STRING(t, true) AS json_row FROM " + dbName + "." + tableName + " AS t"
logger.info("Query is " + query)
val queryResult: QueryJobConfiguration = QueryJobConfiguration.newBuilder(query).build
val result: TableResult = bigquery.query(queryResult)
I am writing the code in Scala. Since it uses the same libraries as Java, and Java is more popular, I am asking this for Java.
Please help me understand how I can change the location from US to asia-southeast1.
Can I change something inside QueryJobConfiguration? I have searched a lot but am unable to find anything.
My only requirement is that the final result be a TableResult.
This is the exception being thrown:
com.google.cloud.bigquery.BigQueryException: Not found: Dataset XXXXXXXX was not found in location US
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:106)
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:584)
at com.google.cloud.bigquery.BigQueryImpl$34.call(BigQueryImpl.java:1203)
at com.google.cloud.bigquery.BigQueryImpl$34.call(BigQueryImpl.java:1198)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1197)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1181)
at com.google.cloud.bigquery.Job$1.call(Job.java:329)
at com.google.cloud.bigquery.Job$1.call(Job.java:326)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.poll(RetryHelper.java:64)
at com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:325)
at com.google.cloud.bigquery.Job.getQueryResults(Job.java:291)
at com.google.cloud.bigquery.BigQueryImpl.query(BigQueryImpl.java:1168)
...
Thanks in advance.
You shouldn't actually need to specify the location because BigQuery will infer it from the dataset being referenced in your query. See here.
When loading data, querying data, or exporting data, BigQuery determines the location to run the job based on the datasets referenced in the request. For example, if a query references a table in a dataset stored in the asia-northeast1 region, the query job will run in that region.
I just tested using the Java SDK on a dataset/table I created in asia-southeast1, and it worked without needing to explicitly specify the location.
If it's still not working for you by default (check that the table you're referencing actually exists), then you can specify the location by setting it in the JobId and passing that to the overloaded method:
String query = "SELECT * FROM `grey-sort-challenge.asia_southeast1.a_table`;";

QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query)
        .setUseLegacySql(Boolean.FALSE)
        .build();

JobId id = JobId.newBuilder()
        .setLocation("asia-southeast1")
        .setRandomJob()
        .build();

try {
    for (FieldValueList row : BIGQUERY.query(queryConfig, id).iterateAll()) {
        for (FieldValue val : row) {
            System.out.printf("%s,", val.toString());
        }
        System.out.printf("\n");
    }
} catch (InterruptedException e) {
    e.printStackTrace();
}

Mismatched entity and/or field definitions following database update

I have a Drupal site that I'm trying to update to the latest version, 8.6.8. The problem is that when I run the command 'drush updatedb', I see this error in the console:
Failed: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1-0-0-en' for key 'PRIMARY': INSERT INTO {taxonomy_term__parent} (bundle, entity_id, revision_id, langcode, [error]
delta, parent_target_id) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4,
:db_insert_placeholder_5), (:db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11),
(:db_insert_placeholder_12, :db_insert_placeholder_13, :db_insert_placeholder_14, :db_insert_placeholder_15, :db_insert_placeholder_16, :db_insert_placeholder_17),
(:db_insert_placeholder_18, :db_insert_placeholder_19, :db_insert_placeholder_20, :db_insert_placeholder_21, :db_insert_placeholder_22, :db_insert_placeholder_23),
(:db_insert_placeholder_24, :db_insert_placeholder_25, :db_insert_placeholder_26, :db_insert_placeholder_27, :db_insert_placeholder_28, :db_insert_placeholder_29),
(:db_insert_placeholder_30, :db_insert_placeholder_31, :db_insert_placeholder_32, :db_insert_placeholder_33, :db_insert_placeholder_34, :db_insert_placeholder_35),
(:db_insert_placeholder_36, :db_insert_placeholder_37, :db_insert_placeholder_38, :db_insert_placeholder_39, :db_insert_placeholder_40, :db_insert_placeholder_41),
(:db_insert_placeholder_42, :db_insert_placeholder_43, :db_insert_placeholder_44, :db_insert_placeholder_45, :db_insert_placeholder_46, :db_insert_placeholder_47),
(:db_insert_placeholder_48, :db_insert_placeholder_49, :db_insert_placeholder_50, :db_insert_placeholder_51, :db_insert_placeholder_52, :db_insert_placeholder_53),
(:db_insert_placeholder_54, :db_insert_placeholder_55, :db_insert_placeholder_56, :db_insert_placeholder_57, :db_insert_placeholder_58, :db_insert_placeholder_59),
(:db_insert_placeholder_60, :db_insert_placeholder_61, :db_insert_placeholder_62, :db_insert_placeholder_63, :db_insert_placeholder_64, :db_insert_placeholder_65),
(:db_insert_placeholder_66, :db_insert_placeholder_67, :db_insert_placeholder_68, :db_insert_placeholder_69, :db_insert_placeholder_70, :db_insert_placeholder_71),
(:db_insert_placeholder_72, :db_insert_placeholder_73, :db_insert_placeholder_74, :db_insert_placeholder_75, :db_insert_placeholder_76, :db_insert_placeholder_77),
(:db_insert_placeholder_78, :db_insert_placeholder_79, :db_insert_placeholder_80, :db_insert_placeholder_81, :db_insert_placeholder_82, :db_insert_placeholder_83),
(:db_insert_placeholder_84, :db_insert_placeholder_85, :db_insert_placeholder_86, :db_insert_placeholder_87, :db_insert_placeholder_88, :db_insert_placeholder_89),
(:db_insert_placeholder_90, :db_insert_placeholder_91, :db_insert_placeholder_92, :db_insert_placeholder_93, :db_insert_placeholder_94, :db_insert_placeholder_95),
(:db_insert_placeholder_96, :db_insert_placeholder_97, :db_insert_placeholder_98, :db_insert_placeholder_99, :db_insert_placeholder_100, :db_insert_placeholder_101)
As a result of this, when I look at the Status Report in Drupal, it shows that there are still database updates outstanding:
Mismatched entity and/or field definitions
The following changes were detected in the entity type and field definitions.
Taxonomy term
The Taxonomy term entity type needs to be updated.
The Published field needs to be installed.
The issue is, I have no idea why this is happening as it's a site I have only recently taken control of from another developer.
Truncating the table 'taxonomy_term__parent' fixed this.
Then, to apply the pending schema updates, run "drush updb" followed by "drush entity-updates".

Search Informatica for text in SQL override

Is there a way to search all the mappings, sessions, etc. in Informatica for a text string contained within a SQL override?
For example, suppose I know a certain stored procedure (SP_FOO) is being called somewhere in an INFA process, but I don't know where exactly. Somewhere, I think, there is a Post SQL on a source or target calling it. Could I search all the sessions for Post SQL containing SP_FOO? (Similar to what I could do with grep on source code.)
You can use repository queries against the repository (REPO) tables (if you have enough access) to get data related to all the mappings, transformations, sessions, etc.
The link below covers almost every kind of repository query, and your answer can be found there:
https://uisapp2.iu.edu/confluence-prd/display/EDW/Querying+PowerCenter+data
select * --distinct sbj.SUBJECT_AREA, m.PARENT_MAPPING_NAME
from REP_SUBJECT sbj, REP_ALL_MAPPINGS m, REP_WIDGET_INST w, REP_WIDGET_ATTR wa
where sbj.SUBJECT_ID = m.SUBJECT_ID
  and m.MAPPING_ID = w.MAPPING_ID
  and w.WIDGET_ID = wa.WIDGET_ID
  and sbj.SUBJECT_AREA in ('TLR','PPM_PNLST_WEB','PPM_CURRENCY','OLA','ODS','MMS','IT_METRIC','E_CONSENT','EDW','EDD','EDC','ABS')
  and (UPPER(ATTR_VALUE) like '%PSA_CONTACT_EVENT%'
       -- or UPPER(ATTR_VALUE) like '%PSA_MEMBER_CHARACTERISTIC%'
       -- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_HH_CHRSTC%'
       -- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_MEMBER_CHRSTC%'
      )
  -- and m.PARENT_MAPPING_NAME like '%ARM%'
order by 1
Please let me know if you have any issues.
Another less scientific way to do this is to export the workflow(s) as XML and use a text editor to search through them for the stored procedure name.
If you have read access to the schema where the Informatica repository resides, try this:
SELECT DISTINCT f.subj_name folder, e.mapping_name, object_type_name,
b.instance_name, a.attr_value
FROM opb_widget_attr a,
opb_widget_inst b,
opb_object_type c,
opb_attr d,
opb_mapping e,
opb_subject f
WHERE a.widget_id = b.widget_id
AND b.widget_type = c.object_type_id
AND ( object_type_name = 'Source Qualifier'
OR object_type_name LIKE '%Lookup%'
)
AND a.widget_id = b.widget_id
AND a.attr_id = d.attr_id
AND c.object_type_id = d.object_type_id
AND attr_name IN ('Sql Query')--, 'Lookup Sql Override')
AND b.mapping_id = e.mapping_id
AND e.subject_id = f.subj_id
AND a.attr_value is not null
--AND UPPER (a.attr_value) LIKE UPPER ('%currency%')
Yes. There is a small Java-based tool called Informatica Meta Query.
Using that tool, you can search for any information present in the Informatica metadata tables.
If you cannot find that tool, you can write queries directly against the Informatica metadata tables to get the required information.
Adding a few more lines to the solutions provided by Data Origin and Sandeep.
It is highly advised not to query the repository tables directly. Rather, you can create synonyms or views and then query those objects to avoid any damage to the repository tables.
In our dev/prod environments, application programmers are not granted any direct access to the repository tables.
As querying the Informatica database isn't the best idea, I would suggest you export all the workflows in your folder to XML using Repository Manager. From Repository Manager you can select all of them and export them at once. Then write a Java program to search for the pattern in the XMLs you have.
I have written a sample program here; please modify it as per your requirements.
First, make a spec file listing the exported workflow XML file names (specFileName).
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class SearchWorkflowXml {
    public static void main(String[] args) {
        String specFileName = "workflows.txt"; // spec file listing the exported XML file names
        String textToSearch = "SP_FOO";        // string pattern to look for
        try (BufferedReader spec = new BufferedReader(new FileReader(new File(specFileName)))) {
            String xmlFileName;
            while ((xmlFileName = spec.readLine()) != null) {
                searchFile(xmlFileName.trim(), textToSearch);
            }
        } catch (IOException ex) {
            System.out.println("Error reading file '" + specFileName + "'");
        }
    }

    // Print the file name if any line of the file contains the search string.
    private static void searchFile(String fileName, String textToSearch) {
        try (BufferedReader reader = new BufferedReader(new FileReader(new File(fileName)))) {
            String currentLine;
            while ((currentLine = reader.readLine()) != null) {
                if (currentLine.contains(textToSearch)) {
                    System.out.println(fileName);
                    break;
                }
            }
        } catch (IOException ex) {
            System.out.println("Error reading file '" + fileName + "'");
        }
    }
}

Missing descriptor exception while using Eclipselink Expressionbuilder

I have an entity Person with the fields String name and String designation. When I tried to query it using the EclipseLink ExpressionBuilder, as follows:
Project project = new Project();
Login login = new DatabaseLogin();
login.setUserName("root");
login.setPassword("root");
project.setLogin(login);
DatabaseSession session = project.createDatabaseSession();
ExpressionBuilder expBuilder = new ExpressionBuilder();
Expression expression = expBuilder.get("name").equalsIgnoreCase("SomeName");
Vector readAllObjects = session.readAllObjects(Person.class, expression);
On executing the last statement, the following exception is thrown:
Exception [EclipseLink-6007] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.QueryException
Exception Description: Missing descriptor for [class com.mycompany.entity.Person].
Query: ReadAllQuery(referenceClass=Person)
What might be the reason? Thanks in advance...
At last I got it. Instead of DatabaseSession, org.eclipse.persistence.sessions.server.ClientSession had to be used.
JpaEntityManager jpaEntityManager = em.unwrap(JpaEntityManager.class);
ClientSession session = jpaEntityManager.getServerSession().acquireClientSession();
ExpressionBuilder expBuilder = new ExpressionBuilder();
Expression expression = expBuilder.get("name").equalsIgnoreCase("SomeName");
Vector readAllObjects = session.readAllObjects(Person.class, expression);
em is the EntityManager.
This solved the issue.
You did not add a descriptor for Person. You need to map it to be able to query it.
Also consider using JPA.
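If you do switch to plain JPA, here is a minimal sketch of the same case-insensitive name query. It assumes Person is annotated with @Entity and listed in a persistence unit; the unit name "my-persistence-unit" is a placeholder.
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.TypedQuery;
import com.mycompany.entity.Person;

public class PersonJpaQuery {
  public static void main(String[] args) {
    // The persistence unit must list Person so that EclipseLink builds a descriptor for it.
    EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-persistence-unit");
    EntityManager em = emf.createEntityManager();

    // Case-insensitive match on name, equivalent to equalsIgnoreCase in the question.
    TypedQuery<Person> query = em.createQuery(
        "SELECT p FROM Person p WHERE LOWER(p.name) = LOWER(:name)", Person.class);
    query.setParameter("name", "SomeName");
    List<Person> people = query.getResultList();
    people.forEach(System.out::println);

    em.close();
    emf.close();
  }
}
With JPA the descriptor comes from the entity's annotations and persistence.xml, so the missing-descriptor error doesn't come up as long as Person is registered in the persistence unit.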