Cloud Dataflow executed successfully but did not insert data into BigQuery - google-cloud-platform

I have a CSV file which contains a header row and data rows. I want to insert the file's data into BigQuery. I have written code that reads the file header and uses it for table/column mapping, so the import is dynamic (in BigQuery I have created one static empty table).
My Cloud Dataflow job executed successfully, but no data was inserted into my BigQuery table. I am not sure what the problem is.
I ran the code below in Eclipse:
package com.coe.cog;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.*;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableRow;
public class DemoPipeline_SameCodeProcess {
private static final Logger LOG = LoggerFactory.getLogger(DemoPipeline_SameCodeProcess.class);
//Get Project,dataset & table
public static TableReference getGCDSTableReference() {
TableReference ref = new TableReference();
ref.setProjectId("myownprojectbqs");
ref.setDatasetId("DS_Employee");
ref.setTableId("tLoadEmp");
return ref;
}
//Split the input file into header and data separately
static class TransformToTable extends DoFn<String, TableRow> {
@ProcessElement
public void processElement(ProcessContext c) throws IOException {
BufferedReader br = null;
String line = "";
String csvSplitBy = ",";
Integer incFlg = 0;
StringReader strdr = new StringReader(c.element().toString());
br = new BufferedReader(strdr);
line = br.readLine(); //Header as FirstLine
String[] colmnsHeader = line.split(csvSplitBy); //Only Header array
while ((line = br.readLine()) != null) {
// Content of the file excluding header
String[] colmnsList = line.split(csvSplitBy);
TableRow row = new TableRow();
for (int i = 0; i < colmnsList.length; i++) {
row.set(colmnsHeader[i], colmnsList[i]);
}
c.output(row);
}
}
}
public static void main(String[] args) {
MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
options.setTempLocation("gs://demo-bucket-data/temp");
Pipeline p = Pipeline.create(options);
PCollection<String> lines = p.apply("Read From Storage", TextIO.read().from("gs://demo-bucket-data/Demo/Test/MasterLoad_WithHeader.csv"));
PCollection<TableRow> rows = lines.apply("Transform To Table",ParDo.of(new TransformToTable()));
rows.apply("Write To Table",BigQueryIO.writeTableRows().to(getGCDSTableReference())
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
p.run();
}
}
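The MyOptions interface used in main() is not shown in the post; a minimal sketch of what it might look like, assuming it only needs to extend PipelineOptions (the inputFile option is purely illustrative):
// Hypothetical options interface; requires org.apache.beam.sdk.options.Description.
public interface MyOptions extends PipelineOptions {
    @Description("Path of the CSV file to load")
    String getInputFile();
    void setInputFile(String value);
}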
Input file format (MasterLoad_WithHeader.csv):
ID,NAME,AGE,SEX
1,John,25,M
2,Smith,28,M
3,Josephine,22,F
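Note that TextIO.read() emits one PCollection element per line of the input file, not the whole file as a single string, so a DoFn that expects to see the header and the data together in c.element() only ever receives a single line per call. A minimal per-line sketch, assuming the header names are known up front (they are hard-coded here because an individual element no longer carries the header):
static class LineToTableRow extends DoFn<String, TableRow> {
    // Hard-coded column names; an assumption, since each element is one CSV line.
    private static final String[] HEADER = {"ID", "NAME", "AGE", "SEX"};
    @ProcessElement
    public void processElement(ProcessContext c) {
        String line = c.element();
        if (line.startsWith("ID,")) {
            return; // skip the header row itself
        }
        String[] values = line.split(",");
        TableRow row = new TableRow();
        for (int i = 0; i < values.length && i < HEADER.length; i++) {
            row.set(HEADER[i], values[i]);
        }
        c.output(row);
    }
}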

Related

How to use a generic Hadoop cluster to make a word counter in AWS Elastic MapReduce (EMR)?

I am trying to use AWS EMR to build a word counter.
Currently what I have is WordCount.java code that takes my input text and runs MapReduce on AWS EMR. I want to know whether it is possible for the word count to output only specific words from a text file I stored in S3.
For example, I only want the words "the", "she", and "he", and I only want to output the total count of these three words instead of counts for every word in my input text file.
WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static class Map
extends Mapper<LongWritable, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1); // type of output value
private Text word = new Text(); // type of output key
public void map(LongWritable key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString()); // line to string token
while (itr.hasMoreTokens()) {
word.set(itr.nextToken()); // set word as each input keyword
context.write(word, one); // create a pair <keyword, 1>
}
}
}
public static class Reduce
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0; // initialize the sum for each keyword
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result); // create a pair <keyword, number of occurences>
}
}
// Driver program
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); // get all args
if (otherArgs.length != 2) {
System.err.println("Usage: WordCount <in> <out>");
System.exit(2);
}
// create a job with name "wordcount"
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
// add the Combiner (optional optimization)
job.setCombinerClass(Reduce.class);
// set output key type
job.setOutputKeyClass(Text.class);
// set output value type
job.setOutputValueClass(IntWritable.class);
//set the HDFS path of the input data
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
// set the HDFS path for the output
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
//Wait till job completion
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
This is what I intend to use to read my S3 text file containing the words I want to output. I have no idea how to continue from here. How can I output only the desired words?
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.core.ResponseInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
try {
Region region = Region.US_EAST_1;
S3Client s3 = S3Client.builder()
.region(region)
.build();
GetObjectRequest request = GetObjectRequest.builder()
.bucket(dictPath)
.key(DictFile)
.build();
ResponseInputStream<GetObjectResponse> s3objectResponse =
s3.getObject(request);
BufferedReader reader = new BufferedReader(new
InputStreamReader(s3objectResponse));
String line;
while ((line = reader.readLine()) != null) {
// System.out.println(line);
dict.add(line.toLowerCase());
}
reader.close();
s3.close();
} catch (Exception e) {
System.err.println(e.getMessage());
System.exit(1);
}
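One possible approach for emitting only specific words (a sketch, independent of the S3 snippet above): filter tokens in the mapper against a fixed set, so only the target words ever reach the reducer. The existing Reduce class can then be reused unchanged.
// Requires java.util.Arrays, java.util.HashSet and java.util.Set in addition to the imports above.
public static class FilteringMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Target words; hard-coded here as an assumption. They could instead be loaded
    // from the S3 file read above, e.g. in the mapper's setup() method.
    private static final Set<String> TARGET_WORDS =
            new HashSet<>(Arrays.asList("the", "she", "he"));
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken().toLowerCase();
            if (TARGET_WORDS.contains(token)) { // emit only the desired words
                word.set(token);
                context.write(word, one);
            }
        }
    }
}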

Add a unit test for Flink SQL

I am using Flink v1.7.1. I have finished a Flink streaming job with a table source, SQL, and a table sink, but I have no idea how to add a unit test for it.
I found a good example of how to test Flink SQL with the help of the user mailing list; here is the example.
package org.apache.flink.table.runtime.stream.sql;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.runtime.utils.JavaStreamTestData;
import org.apache.flink.table.runtime.utils.StreamITCase;
import org.apache.flink.test.util.AbstractTestBase;
import org.apache.flink.types.Row;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;
/**
* Integration tests for streaming SQL.
*/
public class JavaSqlITCase extends AbstractTestBase {
@Test
public void testRowRegisterRowWithNames() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
StreamITCase.clear();
List<Row> data = new ArrayList<>();
data.add(Row.of(1, 1L, "Hi"));
data.add(Row.of(2, 2L, "Hello"));
data.add(Row.of(3, 2L, "Hello world"));
TypeInformation<?>[] types = {
BasicTypeInfo.INT_TYPE_INFO,
BasicTypeInfo.LONG_TYPE_INFO,
BasicTypeInfo.STRING_TYPE_INFO};
String[] names = {"a", "b", "c"};
RowTypeInfo typeInfo = new RowTypeInfo(types, names);
DataStream<Row> ds = env.fromCollection(data).returns(typeInfo);
Table in = tableEnv.fromDataStream(ds, "a,b,c");
tableEnv.registerTable("MyTableRow", in);
String sqlQuery = "SELECT a,c FROM MyTableRow";
Table result = tableEnv.sqlQuery(sqlQuery);
DataStream<Row> resultSet = tableEnv.toAppendStream(result, Row.class);
resultSet.addSink(new StreamITCase.StringSink<Row>());
env.execute();
List<String> expected = new ArrayList<>();
expected.add("1,Hi");
expected.add("2,Hello");
expected.add("3,Hello world");
StreamITCase.compareWithList(expected);
}
}
The related code is here.
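Note that StreamITCase lives in Flink's internal test utilities; if you would rather not depend on it, a tiny collecting sink can serve the same purpose (a hypothetical helper, not part of the example above):
// Requires org.apache.flink.streaming.api.functions.sink.SinkFunction and java.util.Collections.
public static class CollectSink implements SinkFunction<Row> {
    // Stringified rows collected by the sink, so the test can assert on them.
    public static final List<String> VALUES =
            Collections.synchronizedList(new ArrayList<>());
    @Override
    public void invoke(Row value) {
        VALUES.add(value.toString());
    }
}
In the test you would clear VALUES before env.execute() and compare it to the expected list afterwards.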

HTTP GET request with proper content-type is not hitting the expected service method

I have two RESTful service methods, getCustomerJson and getCustomerXML, in a class CustomerResource, where I am using the Jersey API for RESTful web services. All the parameters of the two methods are the same, except that one produces XML and the other produces JSON.
When I send an HTTP GET request with the header Content-Type="application/json", it always invokes the getCustomerXML method, which returns XML.
Can someone explain how Jersey works in this kind of situation?
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.ws.rs.Consumes;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import domain.Customer;
#Path("/customers")
public class CustomerResource {
private static Map<Integer, Customer> customerDB = new ConcurrentHashMap<Integer, Customer>();
private static AtomicInteger idCounter = new AtomicInteger();
// Constructor
public CustomerResource() {
}
@GET
@Produces(MediaType.TEXT_PLAIN)
public String sayHello() {
return "Hello Kundan !!!";
}
@GET
@Path("{id}")
@Produces("application/xml")
public Customer getCustomerXML(@PathParam("id") int id, @Context HttpHeaders header) {
final Customer customer = customerDB.get(id);
List<String> contentList = header.getRequestHeader("Content-Type");
List<String> languageList = header.getRequestHeader("Accept-Language");
List<String> compressionFormatList = header.getRequestHeader("Content-Type");
if (customer == null) {
throw new WebApplicationException(Response.Status.NOT_FOUND);
}
return customer;
}
@GET
@Path("{id}")
@Produces("application/json")
public Customer getCustomerJson(@PathParam("id") int id) {
final Customer customer = customerDB.get(id);
if (customer == null) {
throw new WebApplicationException(Response.Status.NOT_FOUND);
}
return customer;
}
@POST
@Consumes("application/xml")
public Response createCustomer(Customer customer) {
customer.setId(idCounter.incrementAndGet());
customerDB.put(customer.getId(), customer);
System.out.println("Created customer " + customer.getId());
return Response.created(URI.create("/customers/" + customer.getId())).build();
}
@PUT
@Path("{id}")
@Consumes("application/xml")
public void updateCustomer(@PathParam("id") int id, Customer customer) {
Customer current = customerDB.get(id);
if (current == null)
throw new WebApplicationException(Response.Status.NOT_FOUND);
current.setFirstName(customer.getFirstName());
current.setLastName(customer.getLastName());
current.setStreet(customer.getStreet());
current.setCity(customer.getCity());
current.setState(customer.getState());
current.setZip(customer.getZip());
current.setCountry(customer.getCountry());
}
@DELETE
@Path("{id}")
public void deleteCustomer(@PathParam("id") int id) {
customerDB.remove(id);
System.out.println("Deleted !");
}
}
Use Accept: application/json. Accept tells the server what type you want back; Content-Type is for the type of data you are sending to the server, as with a POST request.
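For example, with the JAX-RS 2.0 client API (a minimal sketch; the host, port, and customer id are placeholders), the JSON variant is selected by the Accept header that request() sets:
// Requires javax.ws.rs.client.Client and javax.ws.rs.client.ClientBuilder.
Client client = ClientBuilder.newClient();
Customer customer = client.target("http://localhost:8080/customers/1")
        .request(MediaType.APPLICATION_JSON) // sends "Accept: application/json"
        .get(Customer.class);                // routed to getCustomerJson
client.close();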

Managing Google Cloud instances using the jclouds API

I want to add and list all the instances/VMs under my project in Google Cloud using the jclouds API. In this code, I am treating a node as an instance.
I have set all the variables as required and extracted the private key from the JSON key file. The context builds successfully.
images = compute.listImages() => lists all the images provided by Google.
nodes = compute.listNodes() => should list nodes, but instead gives a NullPointerException.
Output=>
No of images 246
Exception in thread "main" java.lang.NullPointerException: group
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at org.jclouds.compute.internal.FormatSharedNamesAndAppendUniqueStringToThoseWhichRepeat.checkGroup(FormatSharedNamesAndAppendUniqueStringToThoseWhichRepeat.java:124)
at org.jclouds.compute.internal.FormatSharedNamesAndAppendUniqueStringToThoseWhichRepeat.sharedNameForGroup(FormatSharedNamesAndAppendUniqueStringToThoseWhichRepeat.java:120)
at org.jclouds.googlecomputeengine.compute.functions.FirewallTagNamingConvention$Factory.get(FirewallTagNamingConvention.java:39)
at org.jclouds.googlecomputeengine.compute.functions.InstanceToNodeMetadata.apply(InstanceToNodeMetadata.java:68)
at org.jclouds.googlecomputeengine.compute.functions.InstanceToNodeMetadata.apply(InstanceToNodeMetadata.java:43)
at com.google.common.base.Functions$FunctionComposition.apply(Functions.java:211)
at com.google.common.collect.Iterators$8.transform(Iterators.java:794)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:646)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Iterators.addAll(Iterators.java:356)
at com.google.common.collect.Iterables.addAll(Iterables.java:350)
at com.google.common.collect.Sets.newLinkedHashSet(Sets.java:328)
at org.jclouds.compute.internal.BaseComputeService.listNodes(BaseComputeService.java:335)
at org.jclouds.examples.compute.basics.Example.main(Example.java:54)
package org.jclouds.examples.compute.basics;
import static com.google.common.base.Charsets.UTF_8;
import static org.jclouds.compute.config.ComputeServiceProperties.TIMEOUT_SCRIPT_COMPLETE;
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeService;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.compute.domain.ComputeMetadata;
import org.jclouds.compute.domain.Image;
import org.jclouds.domain.Credentials;
import org.jclouds.enterprise.config.EnterpriseConfigurationModule;
import org.jclouds.googlecloud.GoogleCredentialsFromJson;
import org.jclouds.logging.slf4j.config.SLF4JLoggingModule;
import org.jclouds.sshj.config.SshjSshClientModule;
import com.google.common.base.Supplier;
import com.google.common.collect.ImmutableSet;
import com.google.common.io.Files;
import com.google.inject.Module;
public class Example {
public static void main(String[] args)
{
String provider = "google-compute-engine";
String identity = "***#developer.gserviceaccount.com";
String credential = "path to private key file ";
credential = getCredentialFromJsonKeyFile(credential);
Properties properties = new Properties();
long scriptTimeout = TimeUnit.MILLISECONDS.convert(20, TimeUnit.MINUTES);
properties.setProperty(TIMEOUT_SCRIPT_COMPLETE, scriptTimeout + "");
Iterable<Module> modules = ImmutableSet.<Module> of(
new SshjSshClientModule(),new SLF4JLoggingModule(),
new EnterpriseConfigurationModule());
ContextBuilder builder = ContextBuilder.newBuilder(provider)
.credentials(identity, credential)
.modules(modules)
.overrides(properties);
ComputeService compute=builder.buildView(ComputeServiceContext.class).getComputeService();
Set<? extends Image> images = compute.listImages();
System.out.printf(">> No of images %d%n", images.size());
Set<? extends ComputeMetadata> nodes = compute.listNodes();
System.out.printf(">> No of nodes/instances %d%n", nodes.size());
compute.getContext().close();
}
private static String getCredentialFromJsonKeyFile(String filename) {
try {
String fileContents = Files.toString(new File(filename), UTF_8);
Supplier<Credentials> credentialSupplier = new GoogleCredentialsFromJson(fileContents);
String credential = credentialSupplier.get().credential;
return credential;
} catch (IOException e) {
System.err.println("Exception reading private key from '%s': " + filename);
e.printStackTrace();
System.exit(1);
return null;
}
}
}

Loading an XSLT file from a JAR file loads the JAR file itself instead of the XSLT

I have an XSLT file on the classpath inside a JAR file. I tried to load the XSLT file using an InputStream, but after debugging, the InputStream contains the JAR file itself instead of the XSLT file.
String xslPath = "/com/japi/application/templates/foo.xslt";
InputStream is = getClass().getResourceAsStream(xslPath);
...
Source xslt = new StreamSource(is);
trans = factory.newTransformer(xsltSource); //Fatal error. Error parsing XSLT {0}
I have double-checked that the path to the XSLT file is correct and that a physical file is included in the JAR file. Any ideas?
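For reference, the two common lookup styles interpret resource paths differently (a small sketch using the path from the question):
// Via the Class: a leading slash makes the path absolute within the classpath.
InputStream a = getClass().getResourceAsStream("/com/japi/application/templates/foo.xslt");
// Via the ClassLoader: paths are always treated as absolute and must not start with a slash.
InputStream b = getClass().getClassLoader()
        .getResourceAsStream("com/japi/application/templates/foo.xslt");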
Create a custom URIResolver to resolve from the classpath and set it on the Transformer.
To test, I had a JAR file on the classpath in an Eclipse project.
All the code is below.
----- run example -------------
public class RunTransform {
public static void main(String[] args) {
// SimpleTransform.transform("xsl/SentAdapter.xsl", "C:/Amin/AllWorkspaces/ProtoTypes/XsltDemo/xml/acc.xml");
SimpleTransform.transform("xslt/ibanvalidation/accuity-ibanvalidationresponse.xsl", "C:/Amin/AllWorkspaces/ProtoTypes/XsltDemo/xml/acc.xml");
}
}
-----------Sample transforming example ----------------
package com;
import java.io.File;
import java.io.FileOutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class SimpleTransform {
public static void transform(String xslName,String xmlName) {
try {
ResourceResolver resloader = new ResourceResolver();
TransformerFactory tFactory = TransformerFactory.newInstance();
tFactory.setURIResolver(resloader);
StreamSource xsltSRC = new StreamSource(resloader.resolve(xslName));
Transformer transformer = tFactory.newTransformer(xsltSRC);
StreamSource xmlSSRC = new StreamSource(xmlName);
System.out.println("Streamm sources created .....");
System.out.println("XSLT SET ....");
transformer.transform(xmlSSRC, new StreamResult(new FileOutputStream(new File("C:/Amin/AllWorkspaces/ProtoTypes/XsltDemo/xml/result.xml"))));
System.out.println("Finished transofrmation ..........");
System.out.println("************* The result is out in respoinse *************");
} catch (Throwable t) {
t.printStackTrace();
}
}
}
-----------Code for Custom resolver ---------------
package com;
import javax.xml.transform.URIResolver;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamSource;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Iterator;
public class ResourceResolver implements URIResolver {
/* (non-Javadoc)
* @see javax.xml.transform.URIResolver#resolve(java.lang.String, java.lang.String)
*/
public Source resolve(String href, String base) throws TransformerException {
try {
InputStream is = ClassLoader.getSystemResourceAsStream(href);
return new StreamSource(is, href);
} // try
catch (Exception ex) {
throw new TransformerException(ex);
} // catch
} // resolve
/**
* @param href
* @return
* @throws TransformerException
*/
public InputStream resolve(String href) throws TransformerException {
try {
InputStream is = ClassLoader.getSystemResourceAsStream(href);
return is;
} // try
catch (Exception ex) {
throw new TransformerException(ex);
} // catch
}
} // ResourceResolver
Try this:
String pathWithinJar = "com/example/xslt/dummy.xslt";
InputStream is = java.lang.ClassLoader.getSystemResourceAsStream(pathWithinJar);
Then you can use IOUtils (Apache Commons) or one of the suggestions here to convert the InputStream into a String, or just use the javax.xml...StreamSource constructor that accepts an input stream.
public static void transform(InputStream xslFileStream, File xmlSource, File xmlResult)
throws TransformerException, IOException {
// unknown if the factory is thread safe, always create new instance
TransformerFactory factory = TransformerFactory.newInstance();
StreamSource xslStreamSource = new StreamSource(xslFileStream);
Transformer transformer = factory.newTransformer(xslStreamSource);
StreamSource sourceDocument = new StreamSource(xmlSource);
StreamResult resultDocument = new StreamResult(xmlResult);
transformer.transform(sourceDocument, resultDocument);
// Note: StreamResult(File) exposes no OutputStream, so there is nothing to flush
// or close here; the transformer writes and closes the file itself.
}
InputStream contains jar file instead of xslt file
What makes you say that? Have you tried printing out the contents of the InputStream as text? In between creating the InputStream and using it, are you doing something else with it (the ... part)?
If the path provided to getResourceAsStream points to an XSLT file and the is variable is not null after the call, then it should contain the InputStream representing the XSLT resource. How about pasting the entire stack trace here?
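A quick way to check what the stream actually contains (a small sketch using the path from the question; it would live inside a method that declares throws IOException): a JAR/ZIP starts with the bytes "PK", while the stylesheet should start with "<?xml" or "<xsl:stylesheet".
try (InputStream is = getClass().getResourceAsStream("/com/japi/application/templates/foo.xslt")) {
    byte[] head = new byte[64];
    int n = (is == null) ? -1 : is.read(head);
    // Prints the first bytes so you can see whether they are ZIP data or XML text.
    System.out.println(n <= 0 ? "resource not found or empty"
            : new String(head, 0, n, java.nio.charset.StandardCharsets.UTF_8));
}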