Unit testing AmazonS3ClientBuilder

How do we unit test this line of code?
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).build();
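One common way to make that line testable is not to unit test the builder itself, but to hide it behind a small factory so the code under test depends only on the AmazonS3 interface, which can then be mocked. Below is a minimal sketch, assuming JUnit 4 and Mockito on the test classpath; S3ClientFactory, S3FileService and "my-bucket" are made-up names for illustration, not part of the original question.

// A minimal sketch, assuming JUnit 4 and Mockito on the test classpath.
// S3ClientFactory, S3FileService and "my-bucket" are made-up names.
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.junit.Assert;
import org.junit.Test;

public class S3FileServiceTest {

    // Production code asks the factory for a client; the builder line lives
    // here and is not unit tested directly.
    static class S3ClientFactory {
        AmazonS3 create() {
            return AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).build();
        }
    }

    // The code under test depends only on the AmazonS3 interface.
    static class S3FileService {
        private final AmazonS3 s3Client;

        S3FileService(AmazonS3 s3Client) {
            this.s3Client = s3Client;
        }

        boolean bucketExists(String bucket) {
            return s3Client.doesBucketExistV2(bucket);
        }
    }

    @Test
    public void bucketExistsDelegatesToTheClient() {
        AmazonS3 mockS3 = mock(AmazonS3.class);
        when(mockS3.doesBucketExistV2("my-bucket")).thenReturn(true);

        S3FileService service = new S3FileService(mockS3);

        Assert.assertTrue(service.bucketExists("my-bucket"));
        verify(mockS3).doesBucketExistV2("my-bucket");
    }
}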

Related

File Download from Amazon S3 via REST (Jax-RS)

I am trying to download a file from Amazon S3.
I want the user to visit my app via a GET API.
The app in turn gets the content from S3 and gives it back to the user as a downloadable file.
Note: I don't want to store the file locally on my server; I want it streamed from Amazon S3 directly to the end user.
I tried with a file of around 300 MB. If I serve it from the local filesystem as below, the memory footprint is low:
@GET
@Path("/pdfdownload")
@Produces("application/pdf")
public Response getFile() {
    File file = new File("/pathToFile"); // local file
    ResponseBuilder response = Response.ok((Object) file);
    response.header("Content-Disposition", "attachment; filename=file.pdf");
    return response.build();
}
But when I download the same file from Amazon S3, my Tomcat server's memory quickly rises to around 600 MB. I think I am streaming the content, but looking at the memory usage I have my doubts.
Am I missing something?
@GET
@Path("/pdfdownload")
@Produces("application/pdf")
public Response getFile2() {
    final S3Object s3Object = getAmazonS3Object(); // AWS S3
    final S3ObjectInputStream s3is = s3Object.getObjectContent();
    final StreamingOutput stream = new StreamingOutput() {
        @Override
        public void write(OutputStream os) throws IOException, WebApplicationException {
            byte[] read_buf = new byte[1024];
            int read_len = 0;
            while ((read_len = s3is.read(read_buf)) > 0) {
                os.write(read_buf, 0, read_len);
            }
            os.close();
            s3is.close();
        }
    };
    ResponseBuilder response = Response.ok(stream);
    response.header("Content-Disposition", "attachment; filename=file.pdf");
    return response.build();
}
private S3Object getAmazonS3Object() {
    AWSCredentials credentials = new BasicAWSCredentials("accesskey", "secretkey");
    try {
        AmazonS3 s3 = new AmazonS3Client(credentials);
        S3Object s3object = s3.getObject(new GetObjectRequest("bucketName", "filename_WithExtension"));
        return s3object;
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    }
    System.out.println("Done!");
    return null;
}
Pom:
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-server</artifactId>
    <version>1.8</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.11.542</version>
</dependency>
Similar to this question: S3 download pdf - REST API
I don't want to use a pre-signed URL: https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURLJavaSDK.html
Please see this article on streaming: https://memorynotfound.com/low-level-streaming-with-jax-rs-streamingoutput/
Could someone please explain why the memory spikes up?
Thanks to all the posts on Stack Overflow and to one of my colleagues.
My colleague found the answer: the code above does not actually have a memory issue. When I was monitoring the JVM I saw a spike, but did not realize that garbage collection simply had not kicked in yet.
I tried downloading six files of 300 MB+ each, and the server holds its ground.
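For reference, a variant of getFile2 that wraps the S3 stream in try-with-resources, so it is released even if the client aborts the download halfway through. This is only a sketch, not a fix for the (non-)issue above; it assumes Java 8+ and the same class, imports and dependencies as getFile2, and the path "/pdfdownload2" and method name are illustrative only.

// Same class and imports as getFile2 above; assumes Java 8+ for the lambda.
// The path "/pdfdownload2" and the method name are illustrative only.
@GET
@Path("/pdfdownload2")
@Produces("application/pdf")
public Response getFileTryWithResources() {
    final S3Object s3Object = getAmazonS3Object();
    StreamingOutput stream = os -> {
        // try-with-resources releases the S3 stream even if the client
        // aborts the download halfway through
        try (S3ObjectInputStream s3is = s3Object.getObjectContent()) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = s3is.read(buffer)) != -1) {
                os.write(buffer, 0, len);
            }
            os.flush();
        }
    };
    return Response.ok(stream)
            .header("Content-Disposition", "attachment; filename=file.pdf")
            .build();
}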

Scala: Writing Data to S3 bucket

I am trying to write data to an S3 bucket but I am getting the errors below.
SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/11/18 23:32:14 ERROR Utils: Aborting task
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/11/18 23:32:14 WARN FileOutputCommitter: Could not delete s3a://Accesskey:SecretKey#test-bucket/Output/Check1Result/_temporary/0/_temporary/attempt_20181118233210_0004_m_000000_0
18/11/18 23:32:14 ERROR FileFormatWriter: Job job_20181118233210_0004 aborted.
18/11/18 23:32:14 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 209)
org.apache.spark.SparkException: Task failed while writing rows.
I have tried the code below and I am able to write the data to the local file system, but when I try to write it to the S3 bucket I get the errors above.
My code:
package Spark_package

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object dataload {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").appName("dataload").config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2").getOrCreate()
    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
    val conf = new SparkConf().setAppName("dataload").setMaster("local[*]").set("spark.speculation", "false")
    val sqlContext = spark.sqlContext
    val data = "C:\\docs\\Input_Market.csv"
    val ddf = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(data)
    ddf.createOrReplaceTempView("data")
    val res = spark.sql("select count(*), cust_id, sum_cnt from data group by cust_id, sum_cnt")
    res.write.option("header", "true").format("csv").save("s3a://Accesskey:SecretKey#test-bucket/Output/Check1Result1")
    spark.stop()
  }
}
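Two things stand out here. The UnsatisfiedLinkError on NativeIO$Windows.access0 usually points at missing or mismatched Hadoop Windows native binaries (winutils.exe / hadoop.dll), the same issue as in the Spark-on-Windows question further down. Separately, S3A credentials are normally supplied through the Hadoop configuration rather than embedded in the s3a:// URI. A rough Java sketch under those assumptions (the Scala equivalents are the same calls; C:\hadoop, the key values and the bucket path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3aWriteSketch {
    public static void main(String[] args) {
        // Must be set before any Hadoop class initializes; points at the
        // directory that contains bin\winutils.exe (placeholder path).
        System.setProperty("hadoop.home.dir", "C:\\hadoop");

        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("dataload")
                .getOrCreate();

        // Pass S3A credentials via configuration instead of the URI (placeholder values).
        Configuration hc = spark.sparkContext().hadoopConfiguration();
        hc.set("fs.s3a.access.key", "accessKey");
        hc.set("fs.s3a.secret.key", "secretKey");

        Dataset<Row> ddf = spark.read()
                .format("csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load("C:\\docs\\Input_Market.csv");

        ddf.write()
                .option("header", "true")
                .format("csv")
                .save("s3a://test-bucket/Output/Check1Result1");

        spark.stop();
    }
}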

AWS boto3 client Stubber: help stubbing unit tests

I'm trying to write some unit tests for AWS RDS. Currently, the start/stop RDS API calls have not yet been implemented in moto. I tried just mocking out boto3 but ran into all sorts of weird issues. I did some googling and found http://botocore.readthedocs.io/en/latest/reference/stubber.html
So I have tried to implement the example for RDS, but the code appears to behave like the normal client, even though I have stubbed it. I'm not sure what's going on or whether I am stubbing correctly.
import boto3
from botocore.stub import Stubber

from LambdaRdsStartStop.lambda_function import lambda_handler
from LambdaRdsStartStop.lambda_function import AWS_REGION

def tests_turn_db_on_when_cw_event_matches_tag_value(self, mock_boto):
    client = boto3.client('rds', AWS_REGION)
    stubber = Stubber(client)
    response = {u'DBInstances': [some copy pasted real data here], extra_info_about_call: extra_info}
    stubber.add_response('describe_db_instances', response, {})
    with stubber:
        r = client.describe_db_instances()
        lambda_handler({u'AutoStart': u'10:00:00+10:00/mon'}, 'context')
So the mocking WORKS for the first line inside the stubber, and the value of r is returned as my stubbed data. But when I go into my lambda_handler method inside lambda_function.py and expect it to still use the stubbed client, it behaves like a normal, unstubbed client:
lambda_function.py
def lambda_handler(event, context):
    rds_client = boto3.client('rds', region_name=AWS_REGION)
    rds_instances = rds_client.describe_db_instances()
error output:
File "D:\dev\projects\virtual_envs\rds_sloth\lib\site-packages\botocore\auth.py", line 340, in add_auth
raise NoCredentialsError
NoCredentialsError: Unable to locate credentials
You will need to patch boto3 in the module where it is called by the routine you are testing. Also, Stubber responses appear to be consumed on each call, so each stubbed call requires its own add_response, as below:
import boto3
from botocore.stub import Stubber
from unittest import mock

from LambdaRdsStartStop.lambda_function import lambda_handler, AWS_REGION

def tests_turn_db_on_when_cw_event_matches_tag_value(self, mock_boto):
    client = boto3.client('rds', AWS_REGION)
    stubber = Stubber(client)
    # Response data below should match the AWS documentation, otherwise
    # botocore's error handling will raise more errors.
    response = {u'DBInstances': [{'DBInstanceIdentifier': 'rds_response1'},
                                 {'DBInstanceIdentifier': 'rds_response2'}]}
    stubber.add_response('describe_db_instances', response, {})
    stubber.add_response('describe_db_instances', response, {})
    # Patch boto3 in the module where lambda_handler actually calls it.
    with mock.patch('LambdaRdsStartStop.lambda_function.boto3') as mock_boto3:
        with stubber:
            r = client.describe_db_instances()  # first add_response consumed here
            mock_boto3.client.return_value = client
            response = lambda_handler({u'AutoStart': u'10:00:00+10:00/mon'}, 'context')  # second add_response consumed here
            # assert r == response

AWS S3 upload of a big file on Xamarin fails with System.Net.Sockets.SocketException

I use AWS S3 to keep files uploaded from mobile. It works when uploading a small file but crashes when it's a big file (the file was around 5 MB).
This is my code:
TransferUtilityUploadRequest request = new TransferUtilityUploadRequest();
request.BucketName = bucketName;
request.StorageClass = S3StorageClass.Standard;
request.CannedACL = S3CannedACL.PublicRead;
request.FilePath = path;
request.Key = key;
TransferUtilityConfig config = new TransferUtilityConfig();
using (TransferUtility uploader = new TransferUtility(AccessKeyID, SecretAccessKey, Region))
{
await uploader.UploadAsync(request);
}
And this is the exception:
Unhandled Exception:
System.IO.IOException: Error writing request ---> System.Net.Sockets.SocketException: Connection reset by peer
at System.Net.WebConnection.EndWrite (System.Net.HttpWebRequest request, System.Boolean throwOnError, System.IAsyncResult result) [0x000a6] in /Users/builder/data/lanes/3511/77cb8568/source/mono/mcs/class/System/System.Net/WebConnection.cs:1028
at System.Net.WebConnectionStream.WriteAsyncCB (System.IAsyncResult r) [0x00013] in /Users/builder/data/lanes/3511/77cb8568/source/mono/mcs/class/System/System.Net/WebConnectionStream.cs:458
I have already tried assigning a stream to the request instead of a path, and changing the timeout, but the exception still occurs.
What's wrong with my code?
Thanks for your help.

Run spark unit test on Windows

I'm trying to run some transformations on Spark. They work fine on the cluster (YARN, Linux machines).
However, when I try to run them on my local machine (Windows 7) under a unit test, I get errors:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
My code is as follows:
@Test
def testETL() = {
  val conf = new SparkConf()
  val sc = new SparkContext("local", "test", conf)
  try {
    val etl = new IxtoolsDailyAgg() // empty constructor
    val data = sc.parallelize(List("in1", "in2", "in3"))
    etl.etl(data) // RDD transformation, no access to SparkContext or Hadoop
    Assert.assertTrue(true)
  } finally {
    if (sc != null)
      sc.stop()
  }
}
Why is it trying to access Hadoop at all, and how can I fix it?
Thank you in advance.
I've solved this issue on my own: http://simpletoad.blogspot.com/2014/07/runing-spark-unit-test-on-windows-7.html
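For anyone hitting the same error: the usual cause is that %HADOOP_HOME%\bin\winutils.exe cannot be found on Windows, so Hadoop's Shell class fails during static initialization. A minimal sketch of the common workaround, assuming winutils.exe for a matching Hadoop version has been placed under C:\hadoop\bin (a placeholder path), using Spark's Java API with JUnit 4:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

public class SparkOnWindowsTest {

    @BeforeClass
    public static void setHadoopHome() {
        // Hadoop's Shell class resolves %HADOOP_HOME%\bin\winutils.exe during
        // static initialization; pointing hadoop.home.dir at that directory
        // avoids "Could not locate executable null\bin\winutils.exe".
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
    }

    @Test
    public void runsLocallyOnWindows() {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("test");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            long count = sc.parallelize(Arrays.asList("in1", "in2", "in3")).count();
            Assert.assertEquals(3L, count);
        } finally {
            sc.stop();
        }
    }
}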