elastic map reduce "keep alive" specification in the java api - elastic-map-reduce

How do I set the job flow to "keep alive" in the Java API, as I do on the command line like this:
elastic-mapreduce --create --alive ...
I have tried adding withKeepJobFlowAliveWhenNoSteps(true), but the job flow still shuts down when a step fails (for example, if I submit a bad jar).

You need to set withActionOnFailure to tell the API what to do when a step fails, and it has to be set on a per-step basis.
Your StepConfigs are probably using withActionOnFailure("TERMINATE_JOB_FLOW").
Change them to withActionOnFailure("CANCEL_AND_WAIT").
The following is the full code to launch a cluster using the Java API, taken from here, with only the failure action changed:
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);
StepFactory stepFactory = new StepFactory();
StepConfig enabledebugging = new StepConfig()
.withName("Enable debugging")
.withActionOnFailure("CANCEL_AND_WAIT") //here is the change
.withHadoopJarStep(stepFactory.newEnableDebuggingStep());
StepConfig installHive = new StepConfig()
.withName("Install Hive")
.withActionOnFailure("CANCEL_AND_WAIT") //here is the change
.withHadoopJarStep(stepFactory.newInstallHiveStep());
RunJobFlowRequest request = new RunJobFlowRequest()
.withName("Hive Interactive")
.withSteps(enabledebugging, installHive)
.withLogUri("s3://myawsbucket/")
.withInstances(new JobFlowInstancesConfig()
.withEc2KeyName("keypair")
.withHadoopVersion("0.20")
.withInstanceCount(5)
.withKeepJobFlowAliveWhenNoSteps(true)
.withMasterInstanceType("m1.small")
.withSlaveInstanceType("m1.small"));
RunJobFlowResult result = emr.runJobFlow(request);
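To make the interaction between the keep-alive flag and the per-step failure action concrete, here is a small stand-alone model in plain Java. It does not call the EMR API: the class StepFailureModel, the method clusterStaysAlive, and the local enum are hypothetical names for illustration, and the rules encode my understanding of the documented behavior, so treat it as a sketch rather than a specification.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: a local enum stands in for the three failure actions.
class StepFailureModel {

    enum ActionOnFailure { TERMINATE_JOB_FLOW, CANCEL_AND_WAIT, CONTINUE }

    // One step: its failure policy and whether it succeeds when run.
    static final class Step {
        final ActionOnFailure onFailure;
        final boolean succeeds;

        Step(ActionOnFailure onFailure, boolean succeeds) {
            this.onFailure = onFailure;
            this.succeeds = succeeds;
        }
    }

    // Returns true if the job flow is still alive (waiting) once the steps are processed.
    static boolean clusterStaysAlive(boolean keepAliveWhenNoSteps, List<Step> steps) {
        for (Step step : steps) {
            if (step.succeeds) {
                continue;
            }
            switch (step.onFailure) {
                case TERMINATE_JOB_FLOW:
                    // Assumed rule: the failure shuts the job flow down, keep-alive or not.
                    return false;
                case CANCEL_AND_WAIT:
                    // Assumed rule: remaining steps are cancelled; keep-alive decides the rest.
                    return keepAliveWhenNoSteps;
                case CONTINUE:
                    // Assumed rule: the failure is ignored and the next step runs.
                    break;
            }
        }
        return keepAliveWhenNoSteps;
    }

    public static void main(String[] args) {
        List<Step> terminate = Arrays.asList(new Step(ActionOnFailure.TERMINATE_JOB_FLOW, false));
        List<Step> cancel = Arrays.asList(new Step(ActionOnFailure.CANCEL_AND_WAIT, false));
        System.out.println(clusterStaysAlive(true, terminate)); // false
        System.out.println(clusterStaysAlive(true, cancel));    // true
    }
}
```

In this model the keep-alive flag alone is not enough: a single failing step configured with TERMINATE_JOB_FLOW still ends the job flow, which matches the behavior described in the question.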

Related

Lettuce API getting into periodic TimeoutException Issues

We have a 10-node (5 masters, 5 read replicas) AWS Redis cluster. We use the Lettuce API with a non-pooled configuration and async calls. About once a week we hit an issue where we get continuous TimeoutExceptions for a few minutes. We suspect a network issue, but the networking team has found nothing wrong with the network. What could be a possible solution?
private LettuceConfigWithoutPool(RedisClusterConfig pool) {
if (lettuceConfigWithoutPoolInstance != null) {
throw new RuntimeException("Use getInstance() method to get the single instance of this class.");
}
List<RedisURI> redisURIS = new RedisURIBuilder.Builder(pool)
.withPassword()
.withTLS()
.build();
ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = new ClusterTopologyBuilder.Builder(pool)
.setAdaptiveRefreshTriggersTimeoutInMinutes()
.build();
ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
.topologyRefreshOptions(clusterTopologyRefreshOptions)
.build();
RedisClusterClient redisClusterClient = ClusterClientProvider.buildClusterClient(redisURIS, clusterClientOptions);
StatefulRedisClusterConnection<String, Object> statefulRedisClusterConnection = redisClusterClient.connect(new SerializedObjectCodec());
statefulRedisClusterConnection.setReadFrom(ReadFromArgumentProvider.getReadFromArgument(pool.getReadFrom()));
this.command = statefulRedisClusterConnection.async();
}

How to submit a new step to a running EMR cluster in java sdk v2

I am trying to submit a HadoopJarStep to a running EMR cluster with the java sdk v2. From reading the api docs / examples I can't seem to figure out how to reference a running cluster instead of spinning up a new one.
Can anyone point me to the correct builder method to specify an existing cluster to submit to? The Scala code I have so far:
val emr = EmrClient
.builder()
.build()
val stepArgs = Seq("foo", "bar", "baz")
val jarStepConfig = HadoopJarStepConfig.builder()
.jar("s3://reveal-ci/deploy/emr/visit-etl.jar")
.args(stepArgs: _*)
.mainClass("com.revealmobile.visit.etl.Application")
.build()
val stepConfig = Seq(
StepConfig.builder()
.hadoopJarStep(jarStepConfig)
.build()
).asJavaCollection
val stepRequest = AddJobFlowStepsRequest.builder()
.steps(stepConfig)
.jobFlowId("JOB FLOW ID")
.build()
val result = Try(emr.addJobFlowSteps(stepRequest)) // I never specified which cluster?
result match {
case Success(_) => info("The step was added successfully")
case Failure(exception) =>
error(exception.getMessage)
throw (exception)
}
I eventually figured out that the terminology differs slightly between the CLI and the SDK. A cluster is a job flow in this case, so I needed to use jobFlowId to point to the correct cluster:
val stepRequest = AddJobFlowStepsRequest.builder()
.steps(stepConfig)
.jobFlowId("JOB FLOW ID") //here
.build()

DynamoDB: getting table description null

I need to run a query on DynamoDB.
So far I have this code:
AWSCredentials creds = new DefaultAWSCredentialsProviderChain().getCredentials();
AmazonDynamoDBClient client = new AmazonDynamoDBClient(creds);
client.withRegion(Regions.US_WEST_2);
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient(creds));
Table table = dynamoDB.getTable("dev");
QuerySpec spec = new QuerySpec().withKeyConditionExpression("tableKey = :none.json");
ItemCollection<QueryOutcome> items = table.query(spec);
System.out.println(table);
The printed value of table is {dev: null}, which means that the description is null.
It's worth noting that when I run the AWS CLI command aws dynamodb list-tables I get all of my tables, but the equivalent call in my code, dynamoDB.listTables(), returns an empty list.
Is there something that I'm doing wrong?
Do I need to define some more credentials before using DDB API ?
I was getting the same problem and landed here looking for a solution. As mentioned in the javadoc of getDescription:
Returns the table description; or null if the table description has
not yet been described via {@link #describe()}. No network call.
Initially description is set to null. After the first call to describe(), which makes a network call, description gets set and getDescription can be used after that.
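The lazy-loading behavior described in that javadoc can be sketched with a small stand-alone class. This is not the SDK's Table class: LazyTable and the remoteDescribe supplier (a stand-in for the network call) are hypothetical names used only to show why the description prints as null until describe() has been called.

```java
import java.util.function.Supplier;

// Toy model of a table handle whose description is fetched lazily.
class LazyTable {
    private final String name;
    private final Supplier<String> remoteDescribe; // stand-in for the network call
    private String description;                    // stays null until describe() runs

    LazyTable(String name, Supplier<String> remoteDescribe) {
        this.name = name;
        this.remoteDescribe = remoteDescribe;
    }

    // No network call: returns whatever was cached by the last describe().
    String getDescription() {
        return description;
    }

    // Performs the (simulated) network call and caches the result.
    String describe() {
        description = remoteDescribe.get();
        return description;
    }

    @Override
    public String toString() {
        return "{" + name + ": " + description + "}";
    }

    public static void main(String[] args) {
        LazyTable table = new LazyTable("dev", () -> "ACTIVE");
        System.out.println(table); // {dev: null}
        table.describe();
        System.out.println(table); // {dev: ACTIVE}
    }
}
```

With the real SDK, the equivalent step would be calling table.describe() before printing the table. Separately, the question's code sets the region on one client but passes a second, freshly constructed client to DynamoDB; if that second client falls back to a default region, it could explain why listTables() comes back empty while the CLI shows the tables.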

How to poll in AWS SDK Java?

I am new to the AWS SDK for Java. I am trying to write code that controls instances and retrieves all AWS EC2 information.
I am able to start an instance and also stop it. Starting an instance takes some time, so I want to wait (without using Thread.sleep) until it is up; likewise, when I stop an instance, I want to wait until it has stopped before proceeding to the next step.
Here's the code:
AmazonEC2 ec2 = new AmazonEC2Client(credentialsProvider);
DescribeInstancesResult describeInstancesRequest = ec2.describeInstances();
List<Reservation> reservations = describeInstancesRequest.getReservations();
Set<Instance> instances = new HashSet<Instance>();
for (Reservation reservation : reservations) {
instances.addAll(reservation.getInstances());
}
for (Instance instance : instances) {
if ((instance.getInstanceId().equals("myimage"))) {
List<String> instancesToStart = new ArrayList<String>();
instancesToStart.add(instance.getInstanceId());
StartInstancesRequest startr = new StartInstancesRequest();
startr.setInstanceIds(instancesToStart);
ec2.startInstances(startr);
Thread.currentThread().sleep(60*1000);
}
if ("running".equals(instance.getState().getName())) {
List<String> instancesToStop = new ArrayList<String>();
instancesToStop.add(instance.getInstanceId());
StopInstancesRequest stoptr = new StopInstancesRequest();
stoptr.setInstanceIds(instancesToStop);
ec2.stopInstances(stoptr);
}
}
Also, whenever I try to get the list of images, it hangs on the code below.
DescribeImagesResult describeImagesResult = ec2.describeImages();
You can fetch a fresh "Instance" object every time you want to see the updated status, using the same instance ID. The object is a snapshot, so query EC2 again with the ID you saved earlier from describeInstances:
Instance instance = ec2.describeInstances(new DescribeInstancesRequest().withInstanceIds(instanceId)).getReservations().get(0).getInstances().get(0);
Then read the updated state with something like this:
InstanceState state = instance.getState();
I think the key here is saving the "instance id" of the instance that you care about.
boto in Python has a nice method, instance.update(), that can be called on an instance to refresh its status, but I can't find an equivalent in Java.
Hope this helps.
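Building on the idea of re-reading the state for a saved instance ID, a bounded polling loop might look like the sketch below. It is plain Java with no AWS calls: StatePoller, waitForState, and the parameter names are hypothetical, and the state source is passed in as a Supplier so that a real describeInstances lookup could be plugged in. It still sleeps between checks, but only for a short interval and only until the target state appears or the attempts run out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Generic "poll until state matches" helper; not part of the AWS SDK.
class StatePoller {

    // Returns true if the target state was observed within maxAttempts checks.
    static boolean waitForState(Supplier<String> stateSupplier, String target,
                                int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (target.equals(stateSupplier.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // short pause between checks
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake state source: "pending" twice, then "running".
        Iterator<String> states = Arrays.asList("pending", "pending", "running").iterator();
        boolean up = waitForState(states::next, "running", 5, 10);
        System.out.println(up); // true
    }
}
```

In real use the supplier would wrap a describeInstances call for the saved instance ID and return getState().getName(). Newer 1.x releases of the AWS SDK for Java also provide waiters (for example ec2.waiters().instanceRunning()), which may be a better fit if your SDK version includes them.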

How to request "Snapshot Log" through AWS Java SDK?

Is it possible to request "Snapshot Logs" through AWS SDK somehow?
It's possible to do it through AWS console:
Cross posted to Amazon forum.
Requesting a log snapshot is a three-step process. First, you have to make an environment information request:
elasticBeanstalk.requestEnvironmentInfo(
new RequestEnvironmentInfoRequest()
.withEnvironmentName(environmentName)
.withInfoType("tail"));
Then you have to retrieve the environment information:
final List<EnvironmentInfoDescription> envInfos =
elasticBeanstalk.retrieveEnvironmentInfo(
new RetrieveEnvironmentInfoRequest()
.withEnvironmentName(environmentName)
.withInfoType("tail")).getEnvironmentInfo();
This returns a list of environment info descriptions, each with the EC2 instance ID and the URL of an S3 object that contains the log snapshot. You can then retrieve the logs with:
DefaultHttpClient client = new DefaultHttpClient();
DefaultHttpRequestRetryHandler retryhandler =
new DefaultHttpRequestRetryHandler(3, true);
client.setHttpRequestRetryHandler(retryhandler);
for (EnvironmentInfoDescription environmentInfoDescription : envInfos) {
System.out.println(environmentInfoDescription.getEc2InstanceId());
HttpGet rq = new HttpGet(environmentInfoDescription.getMessage());
try {
HttpResponse response = client.execute(rq);
InputStream content = response.getEntity().getContent();
System.out.println(IOUtils.toString(content));
} catch ( Exception e ) {
System.out.println("Exception fetching " +
environmentInfoDescription.getMessage());
}
}
I hope this helps!