Java SDK for copying to Redshift - amazon-web-services

Is it possible to fire a COPY command from S3 to Redshift through a Java JDBC connection?
Example:
copy test from 's3://' CREDENTIALS 'aws_access_key_id=xxxxxxx;aws_secret_access_key=xxxxxxxxx'

Yes, try code like the one below:
String dbURL = "jdbc:postgresql://x.y.us-east-1.redshift.amazonaws.com:5439/dev";
String masterUsername = "username";
String masterUserPassword = "password";
Connection conn = null;
Statement stmt = null;
try {
    // Dynamically load the PostgreSQL driver at runtime.
    Class.forName("org.postgresql.Driver");
    System.out.println("Connecting to database...");
    Properties props = new Properties();
    props.setProperty("user", masterUsername);
    props.setProperty("password", masterUserPassword);
    conn = DriverManager.getConnection(dbURL, props);
    stmt = conn.createStatement();
    String sql = "copy test from 's3://' CREDENTIALS 'aws_access_key_id=xxxxxxx;aws_secret_access_key=xxxxxxxxx'";
    int rowsAffected = stmt.executeUpdate(sql);
    stmt.close();
    conn.close();
} catch (Exception ex) {
    // For convenience, handle all errors here.
    ex.printStackTrace();
}

Sandesh's answer works perfectly fine, but it uses the PostgreSQL driver. AWS provides its own Redshift JDBC driver, which is preferable to the PostgreSQL driver.
The rest stays the same; I hope this information helps others.
1) The JDBC driver class changes from org.postgresql.Driver to com.amazon.redshift.jdbcXX.Driver, where XX is the Redshift driver version, e.g. 42.
2) The JDBC URL changes from jdbc:postgresql to jdbc:redshift.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;
public class RedShiftJDBC {
    public static void main(String[] args) {
        Connection conn = null;
        Statement statement = null;
        try {
            // Make sure the appropriate Redshift JDBC driver jar is on the classpath.
            Class.forName("com.amazon.redshift.jdbc42.Driver");
            Properties props = new Properties();
            props.setProperty("user", "username***");
            props.setProperty("password", "password****");
            System.out.println("\n\nconnecting to database...\n\n");
            // Note the jdbc:redshift URL; with the PostgreSQL driver this would be jdbc:postgresql.
            conn = DriverManager.getConnection("jdbc:redshift://********url-to-redshift.redshift.amazonaws.com:5439/example-database", props);
            System.out.println("\n\nConnection made!\n\n");
            statement = conn.createStatement();
            String command = "COPY my_table from 's3://path/to/csv/example.csv' CREDENTIALS 'aws_access_key_id=******;aws_secret_access_key=********' CSV DELIMITER ',' ignoreheader 1";
            System.out.println("\n\nExecuting...\n\n");
            statement.executeUpdate(command);
            // You must commit if you really want the copied data to be persisted.
            conn.commit();
            System.out.println("\n\nThat's all: COPY using simple JDBC.\n\n");
            statement.close();
            conn.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Related

Creating Internal Accounts in SAS Metadata Server programmatically with SAS Base

I'm trying to create Internal Accounts programmatically by using proc metadata.
The code section below creates a person with an External Login.
put"<Person Name=%str(%')&&PersonName&i.%str(%')>";
put"<Logins>";
put"<Login Name=%str(%')Login.&&PersonName&i.%str(%') Password=%str(%')&&word&i.%str(%')/>";
put"</Logins>";
put"</Person>";
To create an ExternalLogin we can set the Password attribute, and in SAS Metadata it will be encrypted automatically.
But to create an InternalLogin object it is necessary to produce the hash value of the password together with the salt. I know about the standard sas002 encryption method, but when using proc pwencode, how do I obtain the salt value?
Is it possible to create an InternalLogin using SAS Base?
Thanx.
I found an article that shows how to create a Stored Process for this problem. My answer is an addition to that article.
The approach is based on executing Java methods from a SAS program.
1. Prepare the setPasswd.java class
I've modified the class from the article, separating the code that connects to the metadata server from the code that creates the InternalLogin.
import java.rmi.RemoteException;
import com.sas.metadata.remote.AssociationList;
import com.sas.metadata.remote.CMetadata;
import com.sas.metadata.remote.Person;
import com.sas.metadata.remote.MdException;
import com.sas.metadata.remote.MdFactory;
import com.sas.metadata.remote.MdFactoryImpl;
import com.sas.metadata.remote.MdOMIUtil;
import com.sas.metadata.remote.MdOMRConnection;
import com.sas.metadata.remote.MdObjectStore;
import com.sas.metadata.remote.MetadataObjects;
import com.sas.metadata.remote.PrimaryType;
import com.sas.metadata.remote.Tree;
import com.sas.meta.SASOMI.ISecurity_1_1;
import com.sas.iom.SASIOMDefs.VariableArray2dOfStringHolder;
public class setPasswd {
    String serverName = null;
    String serverPort = null;
    String serverUser = null;
    String serverPass = null;
    MdOMRConnection connection = null;
    MdFactoryImpl _factory = null;
    ISecurity_1_1 iSecurity = null;
    MdObjectStore objectStore = null;
    Person person = null;

    public int connectToMetadata(String name, String port, String user, String pass) {
        try {
            serverName = name;
            serverPort = port;
            serverUser = user;
            serverPass = pass;
            _factory = new MdFactoryImpl(false);
            connection = _factory.getConnection();
            connection.makeOMRConnection(serverName, serverPort, serverUser, serverPass);
            iSecurity = connection.MakeISecurityConnection();
            return 0;
        } catch (Exception e) {
            return 1;
        }
    }

    public setPasswd() {}

    public int changePasswd(String IdentityName, String IdentityPassword) {
        try {
            //
            // This block obtains the person metadata ID that is needed to change the password.
            //
            // Defines the GetIdentityInfo 'ReturnUnrestrictedSource' option.
            final String[][] options = {{"ReturnUnrestrictedSource", ""}};
            // Defines a string holder for the info output parameter.
            VariableArray2dOfStringHolder info = new VariableArray2dOfStringHolder();
            // Issues the GetInfo method for the provided iSecurity connection user.
            iSecurity.GetInfo("GetIdentityInfo", "Person:" + IdentityName, options, info);
            String[][] returnArray = info.value;
            String personMetaID = new String();
            for (int i = 0; i < returnArray.length; i++) {
                System.out.println(returnArray[i][0] + "=" + returnArray[i][1]);
                if (returnArray[i][0].compareTo("IdentityObjectID") == 0) {
                    personMetaID = returnArray[i][1];
                }
            }
            objectStore = _factory.createObjectStore();
            person = (Person) _factory.createComplexMetadataObject(objectStore, IdentityName, MetadataObjects.PERSON, personMetaID);
            iSecurity.SetInternalPassword(IdentityName, IdentityPassword);
            person.updateMetadataAll();
            System.out.println("Password has been changed.");
            return 0; // success
        } catch (MdException e) {
            Throwable t = e.getCause();
            if (t != null) {
                String ErrorType = e.getSASMessageSeverity();
                String ErrorMsg = e.getSASMessage();
                if (ErrorType == null) {
                    // If there is no SAS server message, write a Java/CORBA message.
                } else {
                    // If there is a message from the server:
                    System.out.println(ErrorType + ": " + ErrorMsg);
                }
                if (t instanceof org.omg.CORBA.COMM_FAILURE) {
                    // If there is an invalid port number or host name:
                    System.out.println(e.getLocalizedMessage());
                } else if (t instanceof org.omg.CORBA.NO_PERMISSION) {
                    // If there is an invalid user ID or password:
                    System.out.println(e.getLocalizedMessage());
                }
            } else {
                // If we cannot find a nested exception, get the message and print it.
                System.out.println(e.getLocalizedMessage());
            }
            // If there is an error, print the entire stack trace.
            e.printStackTrace();
        } catch (RemoteException e) {
            // Unknown exception.
            e.printStackTrace();
        } catch (Exception e) {
            // Unknown exception.
            e.printStackTrace();
        }
        System.out.println("Failure: Password has NOT been changed.");
        return 1; // failure
    }
}
2. Resolve dependencies
Pay attention to the imports in the class. To be able to execute the code below, you need to set the CLASSPATH environment variable.
On Linux you can add the following command to %SASConfig%/Lev1/level_env_usermods.sh:
export CLASSPATH=$CLASSPATH:%pathToJar%
On Windows you can add/change the environment variable via Advanced system settings.
So where should you look for the jar files? They are in the folder:
%SASHome%/SASVersionedJarRepository/eclipse/plugins/
Which files should you include in the path?
I've included all that are used in OMI (Open Metadata Interface). I've also added log4j.jar (it does not work without this jar; hints on why are welcome):
sas.oma.joma.jar
sas.oma.joma.rmt.jar
sas.oma.omi.jar
sas.svc.connection.jar
sas.core.jar
sas.entities.jar
sas.security.sspi.jar
log4j.jar
setPasswd.jar (YOUR JAR FROM THE NEXT STEP!)
Choose the files from the nearest release. Example: here I take the files from v940m3f (a fix release).
Other ways are described here.
3. Compile setPasswd.jar
I tried to use the internal javac.exe that ships with SAS, but it did not work properly, so you need to download a JDK to compile the jar. I've created a bat file:
"C:\Program Files\Java\jdk1.8.0_121\bin\javac.exe" -source 1.7 -target 1.7 setPasswd.java
"C:\Program Files\Java\jdk1.8.0_121\bin\jar" -cf setPasswd.jar setPasswd.class
The -source and -target parameters are helpful if your JDK version is higher than the one used by SAS. You can check the version of the SAS Java runtime with:
PROC javainfo all;
run;
Search for the following string in the log:
java.vm.specification.version = 1.7
4. Finally: the SAS Base call
Now we can call the Java code like this (all available methods are described here):
data test;
dcl javaobj j ("setPasswd");
j.callIntMethod("connectToMetadata", "%SERVER%", "%PORT%", "%ADMIN%", "%{SAS002}HASHPASSORPASS%", rc1);
j.callIntMethod("changePasswd", "testPassLogin", "pass1", rc2);
j.delete();
run;
In the log:
UserClass=Normal
AuthenticatedUserid=Unknown
IdentityName=testPass
IdentityType=Person
IdentityObjectID=A56RQPC2.AP00000I
Password has been changed.
Now it's time to test. Create a new user with no passwords.
Execute the code:
data test;
dcl javaobj j ("setPasswd");
j.callIntMethod("connectToMetadata", "&server.", "&port.", "&adm", "&pass", rc1);
j.callIntMethod("changePasswd", "TestUserForStack", "Overflow", rc2);
j.delete();
run;
Now our user has an InternalLogin object.
Thanx.

What's the efficient way to call an HTTP request and read the InputStream in a Spark map task?

Please see the code sample below:
JavaRDD<String> mapRDD = filteredRecords
.map(new Function<String, String>() {
@Override
public String call(String url) throws Exception {
BufferedReader in = null;
URL formatURL = new URL((url.replaceAll("\"", ""))
.trim());
try {
HttpURLConnection con = (HttpURLConnection) formatURL
.openConnection();
in = new BufferedReader(new InputStreamReader(con
.getInputStream()));
return in.readLine();
} finally {
if (in != null) {
in.close();
}
}
}
});
Here url is an HTTP GET request, for example:
http://ip:port/cyb/test?event=movie&id=604568837&name=SID&timestamp_secs=1460494800&timestamp_millis=1461729600000&back_up_id=676700166
This piece of code is very slow. The IP and port are random and the load is distributed, so the IP can have 20 different values with the port, so I don't see a bottleneck there.
When I comment out
in = new BufferedReader(new InputStreamReader(con
.getInputStream()));
return in.readLine();
the code is very fast.
NOTE: the input data to process is 10 GB, read from S3 using Spark.
Is there anything wrong with how I am using BufferedReader or InputStreamReader? Is there an alternative?
I can't use foreach in Spark as I have to get the response back from the server and need to save the JavaRDD as a text file on HDFS.
If we use mapPartitions, the code looks something like below:
JavaRDD<String> mapRDD = filteredRecords.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
@Override
public Iterable<String> call(Iterator<String> tuple) throws Exception {
final List<String> rddList = new ArrayList<String>();
Iterable<String> iterable = new Iterable<String>() {
@Override
public Iterator<String> iterator() {
return rddList.iterator();
}
};
while(tuple.hasNext()) {
URL formatURL = new URL((tuple.next().replaceAll("\"", ""))
.trim());
HttpURLConnection con = (HttpURLConnection) formatURL
.openConnection();
try(BufferedReader br = new BufferedReader(new InputStreamReader(con
.getInputStream()))) {
rddList.add(br.readLine());
} catch (IOException ex) {
return rddList;
}
}
return iterable;
}
});
Here, too, we are doing the same thing for each record, aren't we?
Currently you are using the map function, which creates a URL request for each row in the partition.
You can use mapPartitions instead, which will make the code run faster because the expensive setup (the connection to the server) is created only once per partition instead of once per row; see the sketch below.
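Below is a minimal sketch of that idea, not the poster's code. It assumes the Spark 1.x Java API (FlatMapFunction.call() returning an Iterable) and Apache HttpClient 4.x on the classpath; the point is only that the client is created once per partition and reused for every record.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

JavaRDD<String> responses = filteredRecords.mapPartitions(
        new FlatMapFunction<Iterator<String>, String>() {
            @Override
            public Iterable<String> call(Iterator<String> urls) throws Exception {
                List<String> results = new ArrayList<String>();
                // Expensive setup happens once per partition: one HTTP client reused for all records.
                CloseableHttpClient client = HttpClients.createDefault();
                try {
                    while (urls.hasNext()) {
                        String url = urls.next().replaceAll("\"", "").trim();
                        CloseableHttpResponse response = client.execute(new HttpGet(url));
                        try {
                            // The endpoint returns a single short line, so reading the whole body is fine here.
                            results.add(EntityUtils.toString(response.getEntity()));
                        } finally {
                            response.close();
                        }
                    }
                } finally {
                    client.close();
                }
                return results;
            }
        });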
A big cost here is setting up TCP/HTTPS connections. This is exacerbated by the fact that, even if you only read the first (short) line of a large file, modern HTTP clients try to read() to the end of the file in an attempt to re-use HTTP/1.1 connections, so avoiding aborting the connection. This is a good strategy for small files, but not for files that are megabytes in size.
There is a solution for that: set the content length on the read, so that only a smaller block is read in, reducing the cost of the close(); the connection recycling then reduces HTTPS setup costs. This is what the latest Hadoop/Spark S3A client does if you set fadvise=random on the connection: it requests blocks rather than the entire multi-GB file. Be aware, though, that this design is actually really bad if you are going byte-by-byte through a file...
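For illustration only (this exact snippet is not from the answer above): the fadvise hint can be passed to the S3A client through the Spark configuration, assuming a Hadoop/S3A version that supports fs.s3a.experimental.input.fadvise (2.8 or later).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// spark.hadoop.* properties are copied into the Hadoop Configuration that S3A reads.
SparkConf sparkConf = new SparkConf()
        .setAppName("http-fetch")
        // Ask S3A for ranged block reads instead of streaming whole objects to the end.
        .set("spark.hadoop.fs.s3a.experimental.input.fadvise", "random");
JavaSparkContext sc = new JavaSparkContext(sparkConf);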

MapReduce and HCatalog integration fails to use MySQL MetaStore

Environment: HDP 2.3 Sandbox
Problem: I have created a table in Hive with just 2 columns. Now I want to read it in my MR code using HCatalog integration. The MR job fails to read the table from the MySQL metastore. It uses Derby for some reason and hence fails with a "table not found" message.
Job Client code:
public class HCatalogMRJob extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputTableName = args[0];
        String outputTableName = args[1];
        String dbName = null;
        Job job = new Job(conf, "HCatalogMRJob");
        HCatInputFormat.setInput(job, dbName, inputTableName);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setJarByClass(HCatalogMRJob.class);
        job.setMapperClass(HCatalogMapper.class);
        job.setReducerClass(HCatalogReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, outputTableName, null));
        HCatSchema s = HCatOutputFormat.getTableSchema(conf);
        System.err.println("INFO: output schema explicitly set for writing: " + s);
        HCatOutputFormat.setSchema(job, s);
        job.setOutputFormatClass(HCatOutputFormat.class);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HCatalogMRJob(), args);
        System.exit(exitCode);
    }
}
Job Run Command:
hadoop jar mr-hcat.jar input_table out_table
Before running this command, I set the necessary HCatalog and Hive jars on the classpath using the HADOOP_CLASSPATH variable.
Question:
Now, how do I make the job use hive-site.xml correctly?
I tried putting it on the classpath using the same HADOOP_CLASSPATH as mentioned above, but it still fails.
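One thing worth trying, sketched here only as an assumption rather than a verified fix: add hive-site.xml explicitly as a resource on the job Configuration before HCatInputFormat.setInput() runs, so the MySQL metastore settings are picked up instead of the Derby defaults. The path below is a typical location and may differ on your sandbox.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Inside run(), before HCatInputFormat.setInput(job, dbName, inputTableName):
Configuration conf = getConf();
// Hypothetical path; point this at wherever hive-site.xml lives on the cluster node.
conf.addResource(new Path("file:///etc/hive/conf/hive-site.xml"));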

org.apache.calcite.sql.validate.SqlValidatorException

I'm using Apache Calcite to parse a simple SQL statement and return its relational tree. I obtain a database schema using a JDBC connection to a simple SQLite database. The schema is then added using FrameworkConfig. The parser configuration is then modified to handle identifier quoting and case (not sensitive). However, the SQL validator is unable to find the quoted table identifier in the SQL statement. Somehow the parser ignores the configuration settings and converts the table name to UPPER CASE. A SqlValidatorException is raised, stating that the table name is not found. I suspect the configuration is not being updated correctly. I have already validated that the table name is correctly included in the schema's metadata.
public class ParseSQL {

    public static void main(String[] args) {
        try {
            // register the JDBC driver
            String sDriverName = "org.sqlite.JDBC";
            Class.forName(sDriverName);
            JsonObjectBuilder builder = Json.createObjectBuilder();
            builder.add("jdbcDriver", "org.sqlite.JDBC")
                   .add("jdbcUrl", "jdbc:sqlite://calcite/students.db")
                   .add("jdbcUser", "root")
                   .add("jdbcPassword", "root");
            Map<String, JsonValue> JsonObject = builder.build();
            // argument for JdbcSchema.Factory().create(....)
            Map<String, Object> operand = new HashMap<String, Object>();
            // explicitly extract JsonString(s) and load into operand map
            for (String key : JsonObject.keySet()) {
                JsonString value = (JsonString) JsonObject.get(key);
                operand.put(key, value.getString());
            }
            final SchemaPlus rootSchema = Frameworks.createRootSchema(true);
            Schema schema = new JdbcSchema.Factory().create(rootSchema, "students", operand);
            rootSchema.add("students", schema);
            // build a FrameworkConfig using defaults where values aren't required
            Frameworks.ConfigBuilder configBuilder = Frameworks.newConfigBuilder();
            // set defaultSchema
            configBuilder.defaultSchema(rootSchema);
            // build configuration
            FrameworkConfig frameworkdConfig = configBuilder.build();
            // use SQL parser config builder to ignore case of quoted identifier
            SqlParser.configBuilder(frameworkdConfig.getParserConfig()).setQuotedCasing(Casing.UNCHANGED).build();
            // use SQL parser config builder to set SQL case sensitive = false
            SqlParser.configBuilder(frameworkdConfig.getParserConfig()).setCaseSensitive(false).build();
            // get planner
            Planner planner = Frameworks.getPlanner(frameworkdConfig);
            // parse SQL statement
            SqlNode sql_node = planner.parse("SELECT * FROM \"Students\" WHERE age > 15.0");
            System.out.println("\n" + sql_node.toString());
            // validate SQL
            SqlNode sql_validated = planner.validate(sql_node);
            // get associated relational expression
            RelRoot relationalExpression = planner.rel(sql_validated);
            relationalExpression.toString();
        } catch (SqlParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (RelConversionException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ValidationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    } // end main
} // end class
***** ERROR MESSAGE ******
Jan 20, 2016 8:54:51 PM org.apache.calcite.sql.validate.SqlValidatorException
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'Students' not found
This is a case-sensitivity issue, similar to "table not found with apache calcite". Because you enclosed the table name in quotes in your SQL statement, the validator is looking for a table called "Students", and the error message attests to this. If your table is called "Students", I am surprised that Calcite can't find it.
There is a problem with how you are using the SqlParser.ConfigBuilder. When you call build(), you are not using the SqlParser.Config object that it creates. If you passed that object to Frameworks.ConfigBuilder.parserConfig, I think you would get the behavior you want, roughly as in the sketch below.
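A minimal sketch of that change, reusing the rootSchema from the question and assuming the rest of the setup stays as posted:
// Build the parser config first, then hand it to the Frameworks config builder,
// so the planner actually uses the quoting and case-sensitivity settings.
SqlParser.Config parserConfig = SqlParser.configBuilder()
        .setQuotedCasing(Casing.UNCHANGED)
        .setCaseSensitive(false)
        .build();
FrameworkConfig frameworkConfig = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        .parserConfig(parserConfig)
        .build();
Planner planner = Frameworks.getPlanner(frameworkConfig);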

Connect and query database in Managed C++ using SqlConnection

I'm building a project in C++ in Visual Studio 2012 and I've started by writing some classes for database access. Using SQL Server Data Tools I've managed to create a SQL project in my solution.
Now, my question is: how can I use the types in the System::Data::SqlClient namespace to connect to the database in my code? All the examples I find use the database as a reference.
Thanks in advance.
In case my answer helps someone: I have used the SqlDataReader and SqlCommand classes in order to select some data from the database. Note that I'm fetching the ConnectionString from an App.config I created earlier (see how that can be done).
SqlDataReader^ getSqlDataReader(String ^_sql)
{
    SqlDataReader ^_sqlDataReader = nullptr;
    SqlConnection ^_connection = gcnew SqlConnection();
    // Fetch the connection string from App.config.
    ConnectionStringSettings ^connectionSettings = ConfigurationManager::ConnectionStrings["AppDefaultConnection"];
    _connection->ConnectionString = connectionSettings->ConnectionString;
    try
    {
        _connection->Open();
    }
    catch (Exception ^_exception)
    {
        Console::WriteLine("Error : " + _exception->Message);
        return nullptr;
    }
    try
    {
        SqlCommand ^_sqlCommand = gcnew SqlCommand(_sql, _connection);
        _sqlDataReader = _sqlCommand->ExecuteReader();
    }
    catch (Exception ^_exception)
    {
        Console::WriteLine("Error : " + _exception->Message);
        return nullptr;
    }
    return _sqlDataReader;
}
To build the SQL properly we should be aware of the SqlParameter class (example for C#) and avoid SQL injection attacks.
To use the getSqlDataReader function:
SqlDataReader ^reader = getSqlDataReader(yourParameterizedQueryString);
List<TypeToFetch^>^ data = gcnew List<TypeToFetch^>();
if (reader != nullptr && reader->HasRows)
{
    while (reader->Read())
    {
        // example
        TypeToFetch^ typeToFetch = gcnew TypeToFetch();
        typeToFetch->id = (int) reader["Id"];
        typeToFetch->name = reader["Name"]->ToString();
        data->Add(typeToFetch);
    }
}
This question/answer can help with INSERT.