2022-09-08 18:57:53,375 [http-nio-8081-exec-9] ERROR c.b.t.w.e.RestResponseEntityExceptionHandler - 500 Status Code
c.a.SdkClientException: Unable to calculate MD5 hash: src/main/resources/maxmind/tech_fee_agreement.pdf (No such file or directory)
at c.a.s.s.AmazonS3Client.putObject(AmazonS3Client.java:1624)
at c.b.t.s.S3FileStorage.uploadPdfFile(S3FileStorage.java:100)
at c.b.t.s.S3FileStorage$$FastClassBySpringCGLIB$$7d744aa1.invoke(<generated>)
at o.s.c.p.MethodProxy.invoke(MethodProxy.java:218)
at o.s.a.f.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:685)
... 93 frames truncated
Caused by: j.i.FileNotFoundException: src/main/resources/maxmind/tech_fee_agreement.pdf (No such file or directory)
at j.i.FileInputStream.open0(FileInputStream.java)
at j.i.FileInputStream.open(FileInputStream.java:219)
at j.i.FileInputStream.<init>(FileInputStream.java:157)
at c.a.u.Md5Utils.computeMD5Hash(Md5Utils.java:97)
at c.a.u.Md5Utils.md5AsBase64(Md5Utils.java:104)
... 1 frames truncated
... 97 common frames omitted
I have created a method that generates a PDF file from an HTML template, saves it locally, uploads it to AWS S3, and then deletes the local file. It works fine on my local machine, so why can't it find the file or directory on the server?
Here is the method I created:
public void generatePdfFile(Map<String, Object> data, String pdfFileName) {
    Context context = new Context();
    context.setVariables(data);
    String htmlContent = templateEngine.process("pdf/tech_fee_agreement.html", context);
    String fileNameWithPath = pdfDirectory + pdfFileName;
    // try-with-resources ensures the output stream is closed even on failure
    try (FileOutputStream fileOutputStream = new FileOutputStream(fileNameWithPath)) {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(htmlContent);
        renderer.layout();
        renderer.createPDF(fileOutputStream, false);
        renderer.finishPDF();
    } catch (IOException | DocumentException e) {
        logger.error(e.getMessage(), e);
    }
}
The PDF directory strings come from the application.properties file:
pdf.directory=src/main/resources/maxmind/
pdf.directory2=src/main/resources/maxmind/tech_fee_agreement.pdf
This problem was solved by prepending the root directory to the file path, like this:
Path rootDir = Paths.get(".").normalize().toAbsolutePath();
FileOutputStream fileOutputStream = new FileOutputStream(new File(rootDir + "/" + pdfDirectory + pdfFileName));
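Worth noting: src/main/resources/ exists only in the source tree, so a relative path into it resolves on a developer machine but typically points nowhere once the application is packaged and deployed, which is why it worked locally and failed on the server. A more portable alternative (a sketch, not the original code; class and file names here are illustrative) is to stage the intermediate PDF under the system temp directory, which exists in any environment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfPathExample {
    public static void main(String[] args) throws IOException {
        // Write the intermediate PDF under the system temp directory instead of
        // src/main/resources, which does not exist in a packaged deployment.
        Path pdfDir = Files.createTempDirectory("pdf-staging");
        Path pdfFile = pdfDir.resolve("tech_fee_agreement.pdf");
        // Placeholder content ("%PDF" header bytes) standing in for the rendered PDF.
        Files.write(pdfFile, new byte[] {0x25, 0x50, 0x44, 0x46});
        System.out.println(Files.exists(pdfFile)); // true

        // Mirror the delete-after-upload step from the original flow.
        Files.delete(pdfFile);
        Files.delete(pdfDir);
        System.out.println(Files.exists(pdfFile)); // false
    }
}
```

The same absolute path is then handed to both the renderer and the S3 upload, so the two sides can never disagree about where the file lives.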
I am trying to download a file from Amazon S3. I want the user to visit my app via a GET API; the app in turn gets the content from S3 and gives it back to the user as a downloadable file.
Note: I don't want to store the file locally on my server; I want it streamed from Amazon S3 directly to the end user.
I tried with a file of around 300 MB. If I serve it locally, as below, the memory footprint is low, i.e. when the same file is present locally:
@GET
@Path("/pdfdownload")
@Produces("application/pdf")
public Response getFile() {
    File file = new File("/pathToFile"); // local file
    ResponseBuilder response = Response.ok((Object) file);
    response.header("Content-Disposition", "attachment; filename=file.pdf");
    return response.build();
}
But when I download the same file from Amazon S3, my Tomcat server's memory quickly rises to around 600 MB. I think I am streaming the content, but the memory usage makes me doubt it.
Am I missing something?
@GET
@Path("/pdfdownload")
@Produces("application/pdf")
public Response getFile2() {
    final S3Object s3Object = getAmazonS3Object(); // AWS S3
    final S3ObjectInputStream s3is = s3Object.getObjectContent();
    final StreamingOutput stream = new StreamingOutput() {
        @Override
        public void write(OutputStream os) throws IOException, WebApplicationException {
            try {
                byte[] readBuf = new byte[1024];
                int readLen;
                while ((readLen = s3is.read(readBuf)) != -1) {
                    os.write(readBuf, 0, readLen);
                }
            } finally {
                // close both streams even if the copy fails mid-way
                os.close();
                s3is.close();
            }
        }
    };
    ResponseBuilder response = Response.ok(stream);
    response.header("Content-Disposition", "attachment; filename=file.pdf");
    return response.build();
}
private S3Object getAmazonS3Object() {
    AWSCredentials credentials = new BasicAWSCredentials("accesskey", "secretkey");
    try {
        AmazonS3 s3 = new AmazonS3Client(credentials);
        S3Object s3object = s3.getObject(new GetObjectRequest("bucketName", "filename_WithExtension"));
        return s3object;
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    }
    System.out.println("Done!");
    return null;
}
pom.xml:
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-server</artifactId>
    <version>1.8</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.11.542</version>
</dependency>
Similar to this: S3 download pdf - REST API
I don't want to use a pre-signed URL: https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURLJavaSDK.html
Please see this article on streaming: https://memorynotfound.com/low-level-streaming-with-jax-rs-streamingoutput/
Could someone please help explain why the memory spikes up?
Thanks to all the posts on Stack Overflow and to one of my colleagues.
My colleague found the answer: the above code actually doesn't have a memory issue. When I was monitoring the JVM I saw a spike, but didn't realize that garbage collection simply hadn't kicked in yet.
I tried downloading six files of 300 MB+ each, and the server holds its ground.
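For reference, the fixed-size-buffer copy used in the StreamingOutput above can be sketched in isolation; with a small buffer, the heap held per request is bounded by the buffer, not by the size of the object being streamed (the class and method names here are illustrative, and in-memory streams stand in for the S3 object and the HTTP response):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copy in fixed-size chunks; only `buf` (1 KB) is held at any moment,
    // regardless of how large the payload is.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[5 * 1024 * 1024]; // stand-in for an S3 object
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println(copied == payload.length);
    }
}
```

A transient spike in JVM heap usage during such a copy is usually just garbage awaiting collection, which matches the observation above.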
I was able to load a text file from AWS S3, but I am facing a problem reading a ".conf" file. I am getting the error:
"Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'spark'"
Scala code:
val configFile1 = ConfigFactory.load( "s3n://<bucket_name>/aws.conf" )
configFile1.getString("spark.lineage.key")
Here is what I ended up doing: I created a wrapper utility, Config.scala:
import java.io.File
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.s3.{AmazonS3Client, AmazonS3URI}
import com.typesafe.config.{ConfigFactory, Config => TConfig}
import scala.io.Source

object Config {
  private def read(location: String): String = {
    val awsCredentials = new DefaultAWSCredentialsProviderChain()
    val s3Client = new AmazonS3Client(awsCredentials)
    val s3Uri = new AmazonS3URI(location)
    val fullObject = s3Client.getObject(s3Uri.getBucket, s3Uri.getKey)
    Source.fromInputStream(fullObject.getObjectContent).getLines.mkString("\n")
  }

  def apply(location: String): TConfig = {
    if (location.startsWith("s3")) {
      val content = read(location)
      ConfigFactory.parseString(content)
    } else {
      ConfigFactory.parseFile(new File(location))
    }
  }
}
Use the wrapper like this:
val conf: TConfig = Config("s3://config/path")
You can use provided scope for aws-java-sdk, since it will be available on the EMR cluster.
According to my research, we can only read delimited files from AWS S3 through Spark/Scala. Since .conf files consist of key = value pairs, it is not possible.
The only way would be to modify the format of the data in the file.
Typesafe Config does not support loading .conf files from S3, but you can read the S3 object as a string yourself and pass it to Typesafe Config, e.g. val conf = ConfigFactory.parseString(/* .conf file contents as a string */).
I use AWS S3 to keep files uploaded from mobile. It works when uploading a small file but crashes when the file is big (around 5 MB).
This is my code:
TransferUtilityUploadRequest request = new TransferUtilityUploadRequest();
request.BucketName = bucketName;
request.StorageClass = S3StorageClass.Standard;
request.CannedACL = S3CannedACL.PublicRead;
request.FilePath = path;
request.Key = key;

TransferUtilityConfig config = new TransferUtilityConfig();

using (TransferUtility uploader = new TransferUtility(AccessKeyID, SecretAccessKey, Region))
{
    await uploader.UploadAsync(request);
}
and this is the exception:
Unhandled Exception:
System.IO.IOException: Error writing request ---> System.Net.Sockets.SocketException: Connection reset by peer
at System.Net.WebConnection.EndWrite (System.Net.HttpWebRequest request, System.Boolean throwOnError, System.IAsyncResult result) [0x000a6] in /Users/builder/data/lanes/3511/77cb8568/source/mono/mcs/class/System/System.Net/WebConnection.cs:1028
at System.Net.WebConnectionStream.WriteAsyncCB (System.IAsyncResult r) [0x00013] in /Users/builder/data/lanes/3511/77cb8568/source/mono/mcs/class/System/System.Net/WebConnectionStream.cs:458
I have already tried assigning a stream to the request instead of a path, and changing the timeout, but the exception still occurs.
What's wrong with my code?
Thanks for your help.
I am trying to get some data out of a PDF document using scraperwiki for Python. It works beautifully if I download the file using urllib2, like so:
pdfdata = urllib2.urlopen(url).read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
pages = list(root)
But here comes the tricky part. As I would like to do this for a large number of PDF files that I have on disk, I would like to do away with the first line and pass the PDF file directly as an argument. However, if I try
pdfdata = open("filename.pdf","wb")
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
I get the following error:
xmldata = scraperwiki.pdftoxml(pdfdata)
File "/usr/local/lib/python2.7/dist-packages/scraperwiki/utils.py", line 44, in pdftoxml
pdffout.write(pdfdata)
TypeError: must be string or buffer, not file
I am guessing this occurs because I am not opening the PDF correctly?
If so, is there a way to open a PDF from disk just like urllib2.urlopen() does?
urllib2.urlopen(...).read() does just that: it reads the contents of the stream returned from the URL you passed as a parameter.
open(), on the other hand, returns a file handle. Just as urllib2 needed an open() call followed by a read() call, so do file handles. (Also note that mode "wb" opens the file for writing and truncates it; for reading you want "rb".)
Change your program to use the following lines:
with open("filename.pdf", "rb") as pdffile:
    pdfdata = pdffile.read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.html.fromstring(xmldata)
This will open your PDF and read its contents into a buffer named pdfdata. From there, your call to scraperwiki.pdftoxml() will work as expected.
I use the following code to download files in a folder from a website.
I want to download only the files "MOD09GA.A2008077.h22v05.005.2008080122814.hdf" and "MOD09GA.A2008077.h23v05.005.2008080122921.hdf" on the page, but I don't know how to select them. The code below downloads all the files, and I only need those two.
Does anyone have any ideas?
URL = 'http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2008.03.17/';
% Local path on your machine
localPath = 'E:/myfolder/';
% Read html contents and parse file names ending in .hdf.xml
urlContents = urlread(URL);
ret = regexp(urlContents, '"\S+.hdf.xml"', 'match');
% Loop over all files and download them
for k = 1:length(ret)
    filename = ret{k}(2:end-1);
    filepathOnline = strcat(URL, filename);
    filepathLocal = fullfile(localPath, filename);
    urlwrite(filepathOnline, filepathLocal);
end
Try the regexp with tokens instead:
localPath = 'E:/myfolder/';
urlContents = 'aaaa "MOD09GA.A2008077.h22v05.005.2008080122814.hdf.xml" and "MOD09GA.A2008077.h23v05.005.2008080122921.hdf.xml" aaaaa';
ret = regexp(urlContents, '"(\S+)(?:\.\d+){2}(\.hdf\.xml)"', 'tokens');
%// Loop over each file name
for k = 1:length(ret)
    filename = [ret{k}{:}];
    filepathLocal = fullfile(localPath, filename)
end