How to modify the filename of the S3 object uploaded using the Kafka Connect S3 Connector?

I've been using the S3 connector for a couple of weeks now, and I want to change the way the connector names each file. I am using the HourlyBasedPartition, so the path to each file is already enough for me to locate it, and I want the filename to be something generic across all files, like just 'Data.json.gzip' (with the respective path from the partitioner).
For example, I want to go from this:
<prefix>/<topic>/<HourlyBasedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
To this:
<prefix>/<topic>/<HourlyBasedPartition>/Data.<format>
The objective is to need only one call to S3 to download each file later, instead of having to look up the filename first and then download it.
Searching through the files in the 'kafka-connect-s3' folder, I found this file:
https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/TopicPartitionWriter.java, which, towards the end, contains the following functions:
private RecordWriter getWriter(SinkRecord record, String encodedPartition)
    throws ConnectException {
  if (writers.containsKey(encodedPartition)) {
    return writers.get(encodedPartition);
  }
  String commitFilename = getCommitFilename(encodedPartition);
  log.debug(
      "Creating new writer encodedPartition='{}' filename='{}'",
      encodedPartition,
      commitFilename
  );
  RecordWriter writer = writerProvider.getRecordWriter(connectorConfig, commitFilename);
  writers.put(encodedPartition, writer);
  return writer;
}

private String getCommitFilename(String encodedPartition) {
  String commitFile;
  if (commitFiles.containsKey(encodedPartition)) {
    commitFile = commitFiles.get(encodedPartition);
  } else {
    long startOffset = startOffsets.get(encodedPartition);
    String prefix = getDirectoryPrefix(encodedPartition);
    commitFile = fileKeyToCommit(prefix, startOffset);
    commitFiles.put(encodedPartition, commitFile);
  }
  return commitFile;
}

private String fileKey(String topicsPrefix, String keyPrefix, String name) {
  String suffix = keyPrefix + dirDelim + name;
  return StringUtils.isNotBlank(topicsPrefix)
      ? topicsPrefix + dirDelim + suffix
      : suffix;
}

private String fileKeyToCommit(String dirPrefix, long startOffset) {
  String name = tp.topic()
      + fileDelim
      + tp.partition()
      + fileDelim
      + String.format(zeroPadOffsetFormat, startOffset)
      + extension;
  return fileKey(topicsDir, dirPrefix, name);
}
I don't know whether this can be customised to do what I want, but it seems to be close to what I'm after. Hope it helps.
(Submitted an issue to Github as well: https://github.com/confluentinc/kafka-connect-storage-cloud/issues/369)
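For illustration only, here is a minimal sketch of how fileKeyToCommit could be changed in a fork of the connector to emit a fixed name. This is an assumed source-level change, not a supported configuration option; the fields topicsDir and extension and the fileKey helper come from the snippet above, and the fixed base name 'Data' is an assumption.

// Hypothetical change in a fork of TopicPartitionWriter: emit a fixed base name
// per partition directory instead of <topic>+<kafkaPartition>+<startOffset>.
// NOTE: the startOffset parameter is kept only so the existing call sites compile;
// dropping it from the key means each hourly directory can hold at most one
// committed file, otherwise later commits overwrite earlier ones.
private String fileKeyToCommit(String dirPrefix, long startOffset) {
  String name = "Data" + extension;   // e.g. "Data.json.gz" -- assumed fixed name
  return fileKey(topicsDir, dirPrefix, name);
}

Whether this is safe depends on the flush/rotation settings: if more than one file is ever committed for the same hourly partition, they would now collide on the same key.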

Related

Is there any way to get the Active Solution Configuration name in C# Code?

I have three solution configurations in my UWP solution in Visual Studio:
Development
Staging
Production
Each is associated with a different web service and auth provider in the configuration files. In my code, how do I tell which one is which? In the past I've explicitly provided DEFINE constants, but there must be a better way by now.
The active solution configuration is stored in the .suo file beneath the .vs directory at the root solution folder. The .suo file has a compound file binary format, which means you can't just parse it with text manipulation tools.
However, using OpenMcdf -- a library for reading and writing these compound files -- you can easily get the active solution configuration.
Here's a console app I wrote that works. Feel free to adapt the code to your situation:
using OpenMcdf;
using System;
using System.IO;
using System.Linq;
using System.Text;

namespace GetActiveBuildConfigFromSuo
{
    internal enum ProgramReturnCode
    {
        Success = 0,
        NoArg = -1,
        InvalidFileFormat = -2
    }

    internal class Program
    {
        private const string SolutionConfigStreamName = "SolutionConfiguration";
        private const string ActiveConfigTokenName = "ActiveCfg";

        internal static int Main(string[] args)
        {
            try
            {
                ValidateCommandLineArgs(args);
                string activeSolutionConfig = ExtractActiveSolutionConfig(
                    new FileInfo(args.First()));
                throw new ProgramResultException(
                    activeSolutionConfig, ProgramReturnCode.Success);
            }
            catch (ProgramResultException e)
            {
                Console.Write(e.Message);
                return (int)e.ReturnCode;
            }
        }

        private static void ValidateCommandLineArgs(string[] args)
        {
            if (args.Count() != 1) throw new ProgramResultException(
                "There must be exactly one command-line argument, which " +
                "is the path to an input Visual Studio Solution User " +
                "Options (SUO) file. The path should be enclosed in " +
                "quotes if it contains spaces.", ProgramReturnCode.NoArg);
        }

        private static string ExtractActiveSolutionConfig(FileInfo fromSuoFile)
        {
            CompoundFile compoundFile;
            try { compoundFile = new CompoundFile(fromSuoFile.FullName); }
            catch (CFFileFormatException)
            { throw CreateInvalidFileFormatProgramResultException(fromSuoFile); }

            if (compoundFile.RootStorage.TryGetStream(
                SolutionConfigStreamName, out CFStream compoundFileStream))
            {
                var data = compoundFileStream.GetData();
                string dataAsString = Encoding.GetEncoding("UTF-16").GetString(data);
                int activeConfigTokenIndex = dataAsString.LastIndexOf(ActiveConfigTokenName);
                if (activeConfigTokenIndex < 0)
                    throw CreateInvalidFileFormatProgramResultException(fromSuoFile);
                string afterActiveConfigToken =
                    dataAsString.Substring(activeConfigTokenIndex);
                int lastNullCharIdx = afterActiveConfigToken.LastIndexOf('\0');
                string ret = afterActiveConfigToken.Substring(lastNullCharIdx + 1);
                return ret.Replace(";", "");
            }
            else throw CreateInvalidFileFormatProgramResultException(fromSuoFile);
        }

        private static ProgramResultException CreateInvalidFileFormatProgramResultException(
            FileInfo invalidFile) => new ProgramResultException(
                $@"The provided file ""{invalidFile.FullName}"" is not a valid " +
                $@"SUO file with a ""{SolutionConfigStreamName}"" stream and an " +
                $@"""{ActiveConfigTokenName}"" token.", ProgramReturnCode.InvalidFileFormat);
    }

    internal class ProgramResultException : Exception
    {
        internal ProgramResultException(string message, ProgramReturnCode returnCode)
            : base(message) => ReturnCode = returnCode;

        internal ProgramReturnCode ReturnCode { get; }
    }
}
Alternatively, install the EnvDTE NuGet package (version 8.0.2) and add the code below:
// Requires a reference to EnvDTE and using System.Runtime.InteropServices for Marshal.
// "VisualStudio.DTE.15.0" targets Visual Studio 2017; adjust the version for other releases.
EnvDTE.DTE DTE = Marshal.GetActiveObject("VisualStudio.DTE.15.0") as EnvDTE.DTE;
var activeConfig = (string)DTE.Solution.Properties.Item("ActiveConfig").Value;

AWS S3 returns 404 for a file that definitely still exists there

We have some code that downloads a bunch of S3 files to a local directory. The list of files to retrieve is from a query we run. It only lists files that actually exist in our S3 bucket.
As we loop to retrieve these files, about 10% of them return a 404 error as if the file doesn't exist. I log the name/location of each failing file so I can go to S3 and check, and sure enough every single one of them IS on S3, in the location we went looking for it.
Why does S3 throw a 404 when the file exists?
Here is the Groovy code of the script.
class RetrieveS3FilesFromCSVLoader implements Loader {

    private static String missingFilesFile = "00-MISSED_FILES.csv"
    private static String csvFileName = "/csv/s3file2.csv"
    private static String saveFilesToLocation = "/tmp/retrieve/"
    public static final char SEPARATOR = ','

    @Autowired
    DocumentFileService documentFileService

    private void readWithCommaSeparatorSQL() {
        int counter = 0
        String fileName
        String fileLocation
        File missedFiles = new File(saveFilesToLocation + missingFilesFile)
        PrintWriter writer = new PrintWriter(missedFiles)
        File fileCSV = new File(getClass().getResource(csvFileName).toURI())

        fileCSV.splitEachLine(SEPARATOR as String) { nextLine ->
            //if (counter < 15) {
            if (nextLine != null && (nextLine[0] != 'FileLocation')) {
                counter++
                try {
                    //Remove 0, only if client number starts with "0".
                    fileLocation = nextLine[0].trim()
                    byte[] fileBytes = documentFileService.getFile(fileLocation)
                    if (fileBytes != null) {
                        fileName = fileLocation.substring(fileLocation.indexOf("/") + 1, fileLocation.length())
                        File file = new File(saveFilesToLocation + fileName)
                        file.withOutputStream {
                            it.write fileBytes
                        }
                        println "$counter) Wrote file ${fileLocation} to ${saveFilesToLocation + fileLocation}"
                    } else {
                        println "$counter) UNABLE TO RETRIEVE FILE ELSE: $fileLocation"
                        writer.println(fileLocation)
                    }
                } catch (Exception e) {
                    println "$counter) UNABLE TO RETRIEVE FILE: $fileLocation"
                    println(e.getMessage())
                    writer.println(fileLocation)
                }
            } else {
                counter++
            }
            //}
        }
        writer.close()
    }
}
Here is the code for getFile(fileLocation) and client creation.
public byte[] getFile(String filename) throws IOException {
    AmazonS3Client s3Client = connectToAmazonS3Service();
    S3Object object = s3Client.getObject(S3_BUCKET_NAME, filename);
    if (object == null) {
        return null;
    }
    byte[] fileAsArray = IOUtils.toByteArray(object.getObjectContent());
    object.close();
    return fileAsArray;
}

/**
 * Connects to Amazon S3
 *
 * @return instance of AmazonS3Client
 */
private AmazonS3Client connectToAmazonS3Service() {
    AWSCredentials credentials;
    try {
        credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY);
    } catch (Exception e) {
        throw new AmazonClientException(
                "Cannot load the credentials from the credential profiles file. " +
                "Please make sure that your credentials file is at the correct " +
                "location (~/.aws/credentials), and is in valid format.",
                e);
    }
    AmazonS3Client s3 = new AmazonS3Client(credentials);
    Region usEast1 = Region.getRegion(Regions.US_EAST_1);
    s3.setRegion(usEast1);
    return s3;
}
The code above works for 90% of the files in the list passed to the script, but we know for a fact that 100% of the files exist in S3 at the location String we are passing.
I am just an idiot. I thought the properties file had the production AWS credentials in it; instead it had the development credentials, so I was simply using the wrong credentials.
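One way to catch this kind of mix-up earlier is to log which identity and bucket/key the client is actually hitting and to inspect the error code on a failed lookup. Below is a small debugging sketch using standard AWS SDK for Java v1 calls that match the code above; the class name, bucket, and key are placeholders for illustration:

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.ObjectMetadata;

class S3LookupDebugger {
    // Prints which account the credentials belong to and whether a key is
    // visible to those credentials. Pass in the client built by the code above.
    static void checkObject(AmazonS3Client s3Client, String bucket, String key) {
        try {
            // Which AWS account do these credentials actually belong to?
            System.out.println("Account owner: "
                    + s3Client.getS3AccountOwner().getDisplayName());
            ObjectMetadata meta = s3Client.getObjectMetadata(bucket, key); // HEAD request
            System.out.println("Found " + key + ", size=" + meta.getContentLength());
        } catch (AmazonS3Exception e) {
            // A 404/NoSuchKey here means the key is not visible to *these*
            // credentials in *this* bucket; S3 also returns 404 instead of 403
            // when the caller lacks s3:ListBucket permission on the bucket.
            System.out.println("Status=" + e.getStatusCode()
                    + " ErrorCode=" + e.getErrorCode()
                    + " RequestId=" + e.getRequestId());
        }
    }
}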

Updating a file in Amazon S3 bucket

I am trying to append a string to the end of a text file stored in S3.
Currently I just read the contents of the file into a String, append my new text and resave the file back to S3.
Is there a better way to do this? I am thinking that once the file is much larger than 10 MB, reading the entire file would not be a good idea, so how should I do this correctly?
Current code
private void saveNoteToFile(String p_note) throws IOException, ServletException
{
    String str_infoFileName = "myfile.json";
    String existingNotes = s3Helper.getfileContentFromS3(str_infoFileName);
    existingNotes += p_note;
    writeStringToS3(str_infoFileName, existingNotes);
}

public void writeStringToS3(String p_fileName, String p_data) throws IOException
{
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(p_data.getBytes());
    try {
        streamFileToS3bucket(p_fileName, byteArrayInputStream, p_data.getBytes().length);
    }
    catch (AmazonServiceException e)
    {
        e.printStackTrace();
    }
    catch (AmazonClientException e)
    {
        e.printStackTrace();
    }
}

public void streamFileToS3bucket(String p_fileName, InputStream input, long size)
{
    // Create sub folders if there are any in the file name.
    p_fileName = p_fileName.replace("\\", "/");
    if (p_fileName.charAt(0) == '/')
    {
        p_fileName = p_fileName.substring(1, p_fileName.length());
    }
    String folder = getFolderName(p_fileName);
    if (folder.length() > 0)
    {
        if (!doesFolderExist(folder))
        {
            createFolder(folder);
        }
    }
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(size);
    AccessControlList acl = new AccessControlList();
    acl.grantPermission(GroupGrantee.AllUsers, Permission.Read);
    s3Client.putObject(new PutObjectRequest(bucket, p_fileName, input, metadata).withAccessControlList(acl));
}
It's not possible to append to an existing file on AWS S3. When you upload an object it creates a new version if it already exists:
"If you upload an object with a key name that already exists in the bucket, Amazon S3 creates another version of the object instead of replacing the existing object."
Source: http://docs.aws.amazon.com/AmazonS3/latest/UG/ObjectOperations.html
The objects are immutable.
It's also mentioned in these AWS Forum threads:
https://forums.aws.amazon.com/message.jspa?messageID=179375
https://forums.aws.amazon.com/message.jspa?messageID=540395
It's not possible to append to an existing file on AWS S3.
You can delete the existing file and upload a new file with the same name.
Configuration
private string bucketName = "my-bucket-name-123";
private static string awsAccessKey = "AKI............";
private static string awsSecretKey = "+8Bo..................................";
IAmazonS3 client = new AmazonS3Client(awsAccessKey, awsSecretKey,
RegionEndpoint.APSoutheast2);
string awsFile = "my-folder/sub-folder/textFile.txt";
string localFilePath = "my-folder/sub-folder/textFile.txt";
To Delete
public void DeleteRefreshTokenFile()
{
    try
    {
        var deleteFileRequest = new DeleteObjectRequest
        {
            BucketName = bucketName,
            Key = awsFile
        };
        DeleteObjectResponse fileDeleteResponse = client.DeleteObject(deleteFileRequest);
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }
}
To Upload
public void UploadRefreshTokenFile()
{
    FileInfo file = new FileInfo(localFilePath);
    try
    {
        PutObjectRequest request = new PutObjectRequest()
        {
            InputStream = file.OpenRead(),
            BucketName = bucketName,
            Key = awsFile
        };
        PutObjectResponse response = client.PutObject(request);
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }
}
One option is to write the new lines/information to a new version of the file. This would create a LARGE number of versions. But, essentially, whatever program you are using the file for could read ALL the versions and append them back together when reading it (this seems like a really bad idea as I write it out).
Another option would be to write a new object each time with a time stamp appended to the object name. my-log-file-date-time . Then whatever program is reading from it could append them all together after downloading my-log-file-*.
You would want to delete objects older than a certain time just like log rotation.
Depending on how busy your events are this might work. If you have thousands per second, I don't think this would work. But if you just have a few events per minute it may be reasonable.
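For what it's worth, here is a rough sketch of that timestamped-object approach with the AWS SDK for Java v1, as used in the question. The class name, bucket name, and key prefix are placeholders, and paging of the listing (beyond the first 1000 keys) is omitted for brevity:

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3ObjectSummary;

import java.util.List;

class TimestampedAppender {
    private static final String BUCKET = "my-bucket";      // placeholder
    private static final String PREFIX = "my-log-file-";   // placeholder

    // "Append" by writing each new piece of text as its own timestamped object.
    // Zero-padded epoch millis keep the keys in chronological order when sorted lexicographically.
    static void appendEvent(AmazonS3Client s3, String line) {
        String key = PREFIX + String.format("%013d", System.currentTimeMillis());
        s3.putObject(BUCKET, key, line);
    }

    // Reassemble the "file" by listing the prefix and concatenating the objects.
    // Note: listObjects returns at most one page here; truncation handling is omitted.
    static String readAll(AmazonS3Client s3) {
        StringBuilder sb = new StringBuilder();
        List<S3ObjectSummary> summaries = s3.listObjects(BUCKET, PREFIX).getObjectSummaries();
        for (S3ObjectSummary summary : summaries) {
            sb.append(s3.getObjectAsString(BUCKET, summary.getKey())).append('\n');
        }
        return sb.toString();
    }
}

A lifecycle rule or scheduled cleanup can then delete objects older than a certain age, much like log rotation.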
You can do it with s3api put-object.
First download the version you want, then use the command below; it will be uploaded as the latest version.
aws s3api put-object --bucket $BUCKET --key $FOLDER/$FILE --body $YOUR_LOCAL_DOWNLOADED_VERSION_FILE

Sitecore FileUtil.ZipFiles creating empty zip file

I am trying to use the ZipFiles() utility method and it's producing an empty zip file. I am using Sitecore 6.5. There are no errors, permissions or otherwise.
Any thoughts? Here is the code.
public void CreateZipFile(string zipfileName, List<string> files)
{
    var zipfile = string.Format("{0}/{1}/{2}", TempFolder.Folder, "myfolder", zipfileName);
    var fileArray = files.ToArray();
    var x = FileUtil.ZipFiles(zipfile, fileArray);
}
EDIT:
I am passing the files like this
var files = new List<string> { FileUtil.MapPath("/temp/sample.xlf") };
The proper usage of FileUtil.ZipFiles method is:
FileUtil.ZipFiles("/test.zip", new []{"/web.config", "/otherfile.txt"})
Sitecore automatically maps paths. The zip file will be created in your web app root.
EDIT AFTER COMMENT
If you want to create a zip file outside the web root and with a flat structure inside, you can use Sitecore ZipWriter class like this:
public static string ZipFiles(string absolutePathToZipfile, string[] files)
{
    using (ZipWriter zipWriter = new ZipWriter(absolutePathToZipfile))
    {
        foreach (string path in files)
        {
            using (FileStream fileStream = System.IO.File.OpenRead(path.StartsWith("/") ? FileUtil.MapPath(path) : path))
                zipWriter.AddEntry(FileUtil.GetFileName(path), fileStream);
        }
    }
    return absolutePathToZipfile;
}

SharpLibZip: Add file without path

I'm using the following code, with the SharpZipLib library, to add files to a .zip file, but each file is being stored with its full path. I need to store only the file name, in the 'root' of the .zip file.
string[] files = Directory.GetFiles(folderPath);

using (ZipFile zipFile = ZipFile.Create(zipFilePath))
{
    zipFile.BeginUpdate();
    foreach (string file in files)
    {
        zipFile.Add(file);
    }
    zipFile.CommitUpdate();
}
I can't find anything about an option for this in the supplied documentation. As this is a very popular library, I hope someone reading this may know something.
My solution was to set the NameTransform object property of the ZipFile to a ZipNameTransform with its TrimPrefix set to the directory of the file. This causes the directory part of the entry names, which are full file paths, to be removed.
public static void ZipFolderContents(string folderPath, string zipFilePath)
{
    string[] files = Directory.GetFiles(folderPath);
    using (ZipFile zipFile = ZipFile.Create(zipFilePath))
    {
        zipFile.NameTransform = new ZipNameTransform(folderPath);
        foreach (string file in files)
        {
            zipFile.BeginUpdate();
            zipFile.Add(file);
            zipFile.CommitUpdate();
        }
    }
}
What's cool is that the NameTransform property is of type INameTransform, allowing customisation of the name transforms.
How about using System.IO.Path.GetFileName() combined with the entryName parameter of ZipFile.Add()?
string[] files = Directory.GetFiles(folderPath);

using (ZipFile zipFile = ZipFile.Create(zipFilePath))
{
    zipFile.BeginUpdate();
    foreach (string file in files)
    {
        zipFile.Add(file, System.IO.Path.GetFileName(file));
    }
    zipFile.CommitUpdate();
}
The MSDN entry for Directory.GetFiles() states that "The returned file names are appended to the supplied path parameter" (http://msdn.microsoft.com/en-us/library/07wt70x2.aspx), so the strings you are passing to zipFile.Add() contain the path.
According to the SharpZipLib documentation, there is an overload of the Add method:
public void Add(string fileName, string entryName)
Parameters:
fileName (String): The name of the file to add.
entryName (String): The name to use for the ZipEntry in the Zip file created.
Try this approach:
string[] files = Directory.GetFiles(folderPath);

using (ZipFile zipFile = ZipFile.Create(zipFilePath))
{
    zipFile.BeginUpdate();
    foreach (string file in files)
    {
        zipFile.Add(file, Path.GetFileName(file));
    }
    zipFile.CommitUpdate();
}