Create and read password-protected ZIP using streams (not a physical file) - zip4j

I need to create password-protected ZIP content as a stream and save it to a database as a BLOB, and later read that password-protected content back from the database as a stream. No physical file should be created. The standard Java SDK does not support creating or reading password-protected ZIPs, and almost all the solutions I have tried create a physical file.
I found examples of writing/reading password-protected ZIPs with Zip4j:
How to password protect a zipped Excel file in Java?
Write a password protected Zip file in Java
Is it possible to create and read password-protected ZIPs with the Zip4j library without creating physical files?
Applying a patch to the other available sources seems difficult for my requirement.

If your intention is protecting the BLOB data in the database, why not just use javax.crypto.CipherOutputStream/javax.crypto.CipherInputStream?
Reading ZIP content from a BLOB and turning it into a stream with Zip4j is quite complicated; you have to work on the file header first (check the source code of the net.lingala.zip4j.unzip.UnzipEngine class).
Writing ZIP content into memory is easier. Here is example code:
import java.io.ByteArrayOutputStream;

import net.lingala.zip4j.io.ZipOutputStream;
import net.lingala.zip4j.model.ZipModel;
import net.lingala.zip4j.model.ZipParameters;
import net.lingala.zip4j.util.Zip4jConstants;

ZipParameters zipParam = new ZipParameters();
zipParam.setSourceExternalStream(true);
// set parameters for encryption
zipParam.setEncryptFiles(true);
zipParam.setEncryptionMethod(Zip4jConstants.ENC_METHOD_AES);
zipParam.setAesKeyStrength(Zip4jConstants.AES_STRENGTH_256);
zipParam.setPassword("test123");

ByteArrayOutputStream bo = new ByteArrayOutputStream(256);
ZipOutputStream zout = new ZipOutputStream(bo, new ZipModel());
String[] filenames = new String[]{"1.txt"};
for (int i = 0; i < filenames.length; i++) {
    zipParam.setFileNameInZip(filenames[i]);
    zout.putNextEntry(null, zipParam);   // null File because the source is an external stream
    zout.write(filenames[i].getBytes()); // data waiting to be compressed...
    zout.closeEntry();
}
zout.finish();
zout.close();
byte[] zipBytes = bo.toByteArray();      // compressed, encrypted ZIP data ready for the BLOB
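
If the goal is simply to protect the BLOB rather than to produce a real password-protected ZIP archive, a minimal sketch of the javax.crypto approach mentioned above could look like the following. The key handling here is an assumption: in practice you would derive the AES key from the password (e.g. with PBKDF2) and store the IV alongside the BLOB. Note this produces encrypted, GZIP-compressed data, not a ZIP file:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

byte[] key = new byte[16]; // 128-bit AES key; assumed to be derived from the password elsewhere
byte[] iv  = new byte[16]; // initialization vector; assumed to be random and stored with the BLOB

Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

ByteArrayOutputStream bo = new ByteArrayOutputStream();
try (GZIPOutputStream gzip = new GZIPOutputStream(new CipherOutputStream(bo, cipher))) {
    gzip.write("data waiting to be compressed...".getBytes(StandardCharsets.UTF_8));
}
byte[] blobData = bo.toByteArray(); // encrypted, compressed data ready to store as a BLOB
Reading it back is the mirror image: wrap the BLOB's InputStream in a CipherInputStream (same key/IV, DECRYPT_MODE) and then in a GZIPInputStream.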

Related

Save a stream to file in vibe.d

I would like to save a vibe.d stream such as HTTPClientResponse.bodyReader (of type InterfaceProxy!InputStream), and potentially other vibe.d streams, to a file. What is the best way to do that memory-efficiently, without copying all the data into RAM?
In general, for downloading files with an HTTP client you can use the vibe.inet.urltransfer module, which offers a download convenience function that performs an HTTP request, handles redirects and stores the final output in a file:
download(url, file);
However, if you want to handle a raw input stream yourself (for example when not following redirects), you can use vibe.core.file : openFile to open or create a file as a file stream and then write to that.
To write to the file stream you have two options:
either call file.write(otherStream) directly,
or use vibe.core.stream : pipe.
Calling write directly on the FileStream object is what the vibe.d urltransfer module uses internally, and it is also the recommended approach for files: it reads from the source stream straight into the write buffer instead of going through the extra temporary buffer that pipe would use.
Sample:
// createTrunc creates a file if it doesn't exist and clears it if it does exist
// You might want to use readWrite or append instead.
auto fil = openFile(filename, FileMode.createTrunc);
scope(exit) fil.close();
fil.write(inputStream);

How to read a huge CSV file from Google Cloud Storage line by line using Java?

I'm new to Google Cloud Platform. I'm trying to read, line by line, a CSV file of around 1 GB stored in Google Cloud Storage (a non-public bucket accessed via a service account key).
I couldn't find any option to read a file in Google Cloud Storage (GCS) line by line; I only see read-by-chunk/byte-size options. Since I'm reading a CSV, I don't want to read by chunk size because that may split a record.
Solutions tried so far:
Copying the contents of the CSV file in GCS to a temporary local file and reading the temp file with the code below. This works as expected, but I don't want to copy a huge file to my local instance; I want to read line by line from GCS.
StorageOptions options = StorageOptions.newBuilder()
        .setProjectId(GCP_PROJECT_ID)
        .setCredentials(gcsConfig.getCredentials())
        .build();
Storage storage = options.getService();
Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
ReadChannel readChannel = blob.reader();
FileOutputStream fileOutputStream = new FileOutputStream(TEMP_FILE_NAME);
fileOutputStream.getChannel().transferFrom(readChannel, 0, Long.MAX_VALUE);
fileOutputStream.close();
Please suggest an approach.
Since I'm doing batch processing, I use the code below in my ItemReader's init() method, which is annotated with @PostConstruct. In my ItemReader's read() I build a List whose size equals the chunk size, so I can read lines chunk by chunk instead of all at once; a sketch of that read step follows the snippet below.
StorageOptions options = StorageOptions.newBuilder()
        .setProjectId(GCP_PROJECT_ID)
        .setCredentials(gcsConfig.getCredentials())
        .build();
Storage storage = options.getService();
Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
ReadChannel readChannel = blob.reader();
BufferedReader br = new BufferedReader(Channels.newReader(readChannel, "UTF-8"));
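A minimal sketch of the matching read() step, assuming chunkSize is configured and br is the reader built above (the helper name is illustrative):
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: pull at most chunkSize lines per read() call.
private List<String> readChunk(BufferedReader br, int chunkSize) throws IOException {
    List<String> lines = new ArrayList<>(chunkSize);
    String line;
    while (lines.size() < chunkSize && (line = br.readLine()) != null) {
        lines.add(line);
    }
    return lines.isEmpty() ? null : lines; // null signals that the input is exhausted
}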
One of the easiest ways might be to use the google-cloud-nio package, part of the google-cloud-java library that you're already using: https://github.com/googleapis/google-cloud-java/tree/v0.30.0/google-cloud-contrib/google-cloud-nio
It plugs Google Cloud Storage into Java's NIO, so once it's set up you can refer to GCS resources just as you would a file or URI. For example:
Path path = Paths.get(URI.create("gs://bucket/lolcat.csv"));
try (Stream<String> lines = Files.lines(path)) {
    lines.forEach(s -> System.out.println(s));
} catch (IOException ex) {
    // do something or re-throw...
}
Brandon Yarbrough is right, and to add to his answer:
If you use gcloud to log in with your credentials, then Brandon's code will work: google-cloud-nio will use your login to access the files (and that works even if they are not public).
If you prefer to do it all in software, you can use this code to read credentials from a local file and then access your file in Google Cloud Storage:
String myCredentials = "/path/to/my/key.json";
CloudStorageFileSystem fs = CloudStorageFileSystem.forBucket(
        "bucket",
        CloudStorageConfiguration.DEFAULT,
        StorageOptions.newBuilder()
                .setCredentials(ServiceAccountCredentials.fromStream(
                        new FileInputStream(myCredentials)))
                .build());
Path path = fs.getPath("/lolcat.csv");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
Edit: you don't want to read all the lines at once, so don't use readAllLines; but once you have the Path, you can use any of the other techniques discussed above to read just the part of the file you need: you can read one line at a time or get a Channel object.
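For example, a line-at-a-time sketch over that same Path, using only the standard java.nio.file API (which google-cloud-nio plugs into):
try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process one CSV line at a time without loading the whole file into memory
    }
}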

Is it possible to write to S3 via a stream using the S3 Java SDK?

Normally, when a file has to be uploaded to S3, it first has to be written to disk before something like the TransferManager API can upload it to the cloud. This can cause data loss if the upload does not finish in time (the application goes down and restarts on a different server, etc.). So I was wondering whether it's possible to write directly to a stream across the network, with the required cloud location as the sink.
You don't say what language you're using, but I'll assume Java based on your capitalization. In that case the answer is yes: TransferManager has an upload() method that takes a PutObjectRequest, and you can construct that object around a stream.
However, there are two important caveats. The first is in the documentation for PutObjectRequest:
When uploading directly from an input stream, content length must be specified before data can be uploaded to Amazon S3
So you have to know how much data you're uploading before you start. If you're receiving an upload from the web and have a Content-Length header, you can get the size from that. If you're just reading an arbitrarily long stream of data, then you have to write it to a file first (or the SDK will).
The second caveat is that this doesn't really prevent data loss: your program can still crash in the middle of reading data. One thing it will prevent is returning a success code to the user before storing the data in S3, but you could do that with a file anyway.
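As a rough sketch of that stream-based upload (bucketName, key, inputStream and contentLength are assumed to be available; this is not a drop-in implementation):
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength); // must be known up front, per the caveat above

TransferManager tm = TransferManagerBuilder.standard().build();
Upload upload = tm.upload(new PutObjectRequest(bucketName, key, inputStream, metadata));
upload.waitForCompletion(); // blocks until the transfer finishes or fails
tm.shutdownNow(false);      // keep the underlying S3 client alive if it is shared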
Surprisingly, this is not possible (at the time of writing this post) with the standard Java SDK. Anyhow, thanks to this 3rd-party library you can at least avoid buffering huge amounts of data in memory or on disk, since it internally buffers ~5 MB parts and uploads them automatically for you as a multipart upload.
There is also an open GitHub issue in the SDK repository that one can follow for updates.
It is possible:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .build();
s3Client.putObject("bucket", "key", yourInputStream, objectMetadata);
AmazonS3.putObject
public void saveS3Object(String key, InputStream inputStream) throws Exception {
    List<PartETag> partETags = new ArrayList<>();
    InitiateMultipartUploadRequest initRequest =
            new InitiateMultipartUploadRequest(bucketName, key);
    InitiateMultipartUploadResult initResponse = s3.initiateMultipartUpload(initRequest);
    int partSize = 5242880; // 5 MB, the minimum size for every part except the last
    try {
        byte[] b = new byte[partSize];
        int partNumber = 1;
        while (true) {
            // Fill the buffer completely so only the final part can be smaller than 5 MB
            int len = 0;
            int read;
            while (len < partSize && (read = inputStream.read(b, len, partSize - len)) != -1) {
                len += read;
            }
            if (len == 0) {
                break; // end of stream reached exactly on a part boundary
            }
            ByteArrayInputStream partInputStream = new ByteArrayInputStream(b, 0, len);
            UploadPartRequest uploadRequest = new UploadPartRequest()
                    .withBucketName(bucketName).withKey(key)
                    .withUploadId(initResponse.getUploadId()).withPartNumber(partNumber)
                    .withFileOffset(0)
                    .withInputStream(partInputStream)
                    .withPartSize(len);
            partETags.add(s3.uploadPart(uploadRequest).getPartETag());
            partNumber++;
            if (len < partSize) {
                break; // the short part was the last one
            }
        }
        CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId(), partETags);
        s3.completeMultipartUpload(compRequest);
    } catch (Exception e) {
        s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId()));
        throw e;
    }
}

How to write real-time data to HDFS with Avro/Parquet?

I have the following working in a unit test: it writes a single object in Avro/Parquet format to a file in my Cloudera/HDFS cluster.
That said, given that Parquet is a columnar format, it seems it can only write out an entire file in batch mode (updates are not supported).
So, what are the best practices for writing files for data ingested in real time (via ActiveMQ/Camel; small messages at 1k msg/sec, etc.)?
I suppose I could aggregate my messages (buffered in memory or in other temporary storage) and write them out in batch mode with a dynamic filename, but I feel like I'm missing something with the hand-rolled partitioning/file naming, etc.
Configuration conf = new Configuration(false);
conf.set("fs.defaultFS", "hdfs://cloudera-test:8020/cm/user/hive/warehouse");
conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, false);
AvroReadSupport.setAvroDataSupplier(conf, ReflectDataSupplier.class);
Path path = new Path("/cm/user/hive/warehouse/test1.data");
MyObject object = new MyObject("test");
Schema schema = ReflectData.get().getSchema(object.getClass());
ParquetWriter<MyObject> parquetWriter = AvroParquetWriter.<MyObject>builder(path)
.withSchema(schema)
.withCompressionCodec(CompressionCodecName.UNCOMPRESSED)
.withDataModel(ReflectData.get())
.withDictionaryEncoding(false)
.withConf(conf)
.withWriteMode(ParquetFileWriter.Mode.OVERWRITE) //required because the filename doesn't change for this test
.build();
parquetWriter.write(object);
parquetWriter.close();
Based on my (limited) research, I'm assuming that Parquet files can't be appended to (by design), so I simply have to batch the real-time data (in memory or otherwise) before writing out Parquet files, as sketched after the link below. See also:
How to append data to an existing parquet file
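A minimal sketch of that batching approach, reusing the schema/conf setup from the question (the buffer, flush threshold and file-name pattern are illustrative assumptions, not a recommendation):
private final List<MyObject> buffer = new ArrayList<>();

public void onMessage(MyObject msg) throws IOException {
    buffer.add(msg);
    if (buffer.size() >= 10_000) { // flush threshold, tune for your message rate
        flush();
    }
}

private void flush() throws IOException {
    // One new file per batch; a timestamped name avoids overwriting earlier batches
    Path path = new Path("/cm/user/hive/warehouse/batch-" + System.currentTimeMillis() + ".parquet");
    ParquetWriter<MyObject> writer = AvroParquetWriter.<MyObject>builder(path)
            .withSchema(schema)
            .withDataModel(ReflectData.get())
            .withConf(conf)
            .build();
    try {
        for (MyObject m : buffer) {
            writer.write(m);
        }
    } finally {
        writer.close();
    }
    buffer.clear();
}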

How do I load a QByteArray containing a zip file into QuaZip?

I'm using Qt Creator in a new project, so I don't know much about this... :(
I want to download a zip file containing a JSON file, read that file and use its information. I can download the zip, save it to disk and open it again to read the JSON and use it, but I want to open the zip in memory only, without actually saving it.
I have the zip data in a QByteArray and I need to pass this "file" to the QuaZip constructor/object.
How do I do that?
You can use QBuffer. It provides a QIODevice interface for a QByteArray.
Example:
QByteArray byteArray("abc");
QBuffer buffer(&byteArray);
buffer.open(QIODevice::WriteOnly);
buffer.seek(3);
buffer.write("def", 3);
buffer.close();
Then you can use the QuaZip::QuaZip(QIODevice *ioDevice) constructor to create the QuaZip object.