How to maintain the appropriate file type (MIME) when moving documents (PDF, XLS) from one library to another - sharepoint-2013

I have a workflow that I created to move a document from one library to another, and it works fine for Word documents. When moving Excel or PDF files, however, the file is moved but its file type appears to be the Word file type.
I am using Create Item to create the item in the new library. There does not seem to be a way to update the file type or MIME type using this action.

Related

Quarto - How to embed files in output Word documents programmatically?

My Quarto document uses knitr as the backend, and I have the output set to create a Word document. I would like the output Word document to also contain an embedded object: the file used to create the Word document itself.
So create_word.qmd will create output_word_file.docx, and I would like output_word_file.docx to contain an embedded copy of create_word.qmd. Is this possible programmatically, or do I need to do this part manually?
If it's only possible using the .Rmd file type, then I can make that work.

Load files from a folder with a custom query in Power BI

I am trying to load CSV files from a folder, but I need to apply several custom steps to each file, including dropping the PromoteHeaders default.
I have a custom query that can load a single file successfully. How do I turn it into a query that loads all the files in a folder?
By default, File.folder's "promoteHeaders" messes up my data because of a missing column name (which my custom query fixes).
The easiest way to create a function that reads a specific file layout is to actually build it: create the M query that reads one such file, then right-click the query and convert it to a function.
After that, it is really simple to adjust your M so it uses parameters.
You can create a blank query and replace its code with the following example; customize it with more steps to handle your file requirements.
= (myFile) =>
let
    Source = Csv.Document(myFile, [Delimiter=",", Columns=33, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    Source
Then use Invoke Custom Function for each file in the folder, passing the file content as the parameter.

Open a symbolic link file as rb, not the file it points to (or generate it in a buffer)

How can I open a symbolic link and get the content of the link itself instead of the file it points to?
By doing:
with open('/home/symlink.txt', 'rb') as f:
    data = f.read()
If the symbolic link points to /foo/faa.txt, the variable data will contain the content of faa.txt. This is a big security and file-handling problem for my server because I'm generating zip archives.
If, for example, a folder contains multiple symbolic links with different names to avoid duplicating files, the zip archive will contain multiple full copies of the file instead of multiple symbolic links!
I hope that is clear enough!
An extra explanation:
The point of this is to allow downloading symlinks from a Django server. The way of returning files is the following:
response = HttpResponse()
response.write(data)
return response
This means that data must contain the content that the user will download; I cannot just give it a path. So what I need to do is give it a symbolic link. The problem is that reading a symbolic link makes Python read the content it points to instead of the link's own content. In a few words, the user downloads the real file instead of the symbolic link!
A possible solution would be to get the path the symlink points to and then generate the link in the buffer. Is this possible?
It looks like there are two questions here: how can you read a symlink from the filesystem, and how can you store it in a .zip file such that it will be recreated when you unzip it?
Reading a symlink
The contents of a symlink are defined here:
http://man7.org/linux/man-pages/man7/symlink.7.html
A symbolic link is a special type of file whose contents are a string that is the pathname of another file, the file to which the link refers
You can read that path by using os.readlink (https://docs.python.org/2/library/os.html#os.readlink) - this is analogous to C's readlink function.
It's also important to note that these symlinks aren't distinguished by their content or file attributes, but by the fact that the file entry on disk points to a string rather than a file object:
In other words, a symbolic link is a pointer to another name, and not to an underlying object.
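As a minimal sketch of the reading side (the path is the hypothetical one from the question):
import os

link_path = '/home/symlink.txt'
if os.path.islink(link_path):
    # os.readlink returns the stored target path, not the target's contents
    target = os.readlink(link_path)
    data = target.encode()  # bytes you could write into a response or archive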
This means that there isn't really a "file" you could store in the ZIP. So how do the existing zip & unzip utilities do it?
Storing a symlink in a zip file
The spec for the ZIP format is here: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Note that section 4.5.7 (defining UNIX Extra Field) says:
The variable length data field will contain file type specific data. Currently the only values allowed are the original "linked to" file names for hard or symbolic links, and the major and minor device node numbers for character and block device nodes. [...] Link files will have the name of the original file stored.
This means that to store a symlink, all you need to do is add the UNIX extra field block to the data you are writing (these appear to live immediately after the filename is written, and you need to set the extra field length accordingly), and populate its "Variable length data field" with the path you get from readlink. The content you store for the node will be empty.
If you're using a library to generate the zip data (recommended!), it will probably have an abstraction available for that. If not, I'd suggest you put in a feature request!
Of course, most existing zip and unzip utilities follow the same definition, which is why you are able to zip and unzip symbolic links as if they were regular files.
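For illustration, here is a hedged sketch using Python's standard zipfile module. Note that instead of writing the UNIX Extra Field described above, it marks the entry as a symlink via the Unix mode bits in external_attr, an encoding that Info-ZIP's zip and unzip also use and understand:
import os
import stat
import zipfile

def write_symlink(zf, link_path, arcname):
    # The entry's data is just the target path string the link stores.
    target = os.readlink(link_path)
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3                              # 3 = Unix, so the attributes are honoured
    info.external_attr = (stat.S_IFLNK | 0o777) << 16   # file type bits mark this entry as a symlink
    zf.writestr(info, target)

with zipfile.ZipFile('archive.zip', 'w') as zf:
    write_symlink(zf, '/home/symlink.txt', 'symlink.txt')  # hypothetical path from the question
Unzipping that archive with unzip on Linux recreates symlink.txt as a link rather than as a copy of the target's contents.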

Use tFileUnarchive on Amazon S3

I have a Talend job which is simple, like below:
tS3Connection -> tS3Get -> tFileInputDelimited -> tMap -> tAmazonMysqlOutput
Now the scenario here is that sometimes I get the file in .txt format and sometimes I get it in a zip file.
So I want to use tFileUnarchive to unzip the file if it's a zip, or bypass the tFileUnarchive component if the file arrives already unzipped, i.e. only in .txt format.
Any help on this is greatly appreciated.
The trick here is to break the file retrieval and potential unzipping into one subjob, and then the processing of the files into another subjob afterwards.
Here's a simple example job:
As normal, you connect to S3, and then you might list all the relevant objects in the bucket using tS3List and pass each key to tS3Get. Alternatively, you might have another way of passing the relevant object key that you want to download to tS3Get.
In the above job I set tS3Get up to fetch every object that is iterated on by the tS3List component by setting the key as:
((String)globalMap.get("tS3List_1_CURRENT_KEY"))
and then downloading it to:
"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))
The extra bit I've added starts with a Run If conditional link from tS3Get to tFileUnarchive with the condition:
((String)globalMap.get("tS3List_1_CURRENT_KEY")).endsWith(".zip")
This checks whether the file being downloaded from S3 is a .zip file.
The tFileUnarchive component then just needs to be told what to unzip, which will be the file we've just downloaded:
"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))
and where to extract it to:
"C:/Talend/5.6.1/studio/workspace/S3_downloads"
This then puts any extracted files in the same place as the ones that didn't need extracting.
From here we can now iterate through the downloads folder with a tFileList, looking for the file types we want, by setting the directory to "C:/Talend/5.6.1/studio/workspace/S3_downloads" and the global expression to "*.csv" (in my case, as I wanted to read in only the CSV files, including the zipped ones, that I had in S3).
Finally, we then read the delimited files by setting the file to be read by the tFileInputDelimited component as:
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))
In my case I simply printed this to the console, but obviously you would then want to perform some transformation before uploading to your AWS RDS instance.
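Outside Talend, the same branch-then-iterate logic can be sketched in Python (hypothetical local folder; Talend generates equivalent Java from the components above):
import glob
import os
import zipfile

download_dir = 'S3_downloads'  # hypothetical folder the S3 objects were downloaded to

# Unarchive anything that arrived zipped; plain files are left alone.
for path in glob.glob(os.path.join(download_dir, '*.zip')):
    with zipfile.ZipFile(path) as zf:
        zf.extractall(download_dir)  # extracted files land next to the others

# Now every file to process, zipped or not, sits in one folder.
for path in glob.glob(os.path.join(download_dir, '*.csv')):
    print(path)  # stand-in for the delimited read, transform and load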

Specifying the source file name using parameter variables in Informatica 9?

I have a mapping like:
SA --> SQ --> EXPR --> TGT
The source and the target have the same structure.
There are a bunch of files (all with the same structure) which will go through this mapping.
So I want to use a parameter file through which I will supply the file names manually for every run.
How do I use the parameter file in the session for the Source filename attribute?
Please suggest.
You could use the indirect source type, wherein your source file is basically a list of files, and in turn the session reads each of those files one by one.
The parameter file could then reference the source file name (the list) as:
$InputFile_myName=/a/b/c.list
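The list file itself is just plain text with one source file path per line, for example (hypothetical paths):
/a/b/file1.txt
/a/b/file2.txt
/a/b/file3.txt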
In line with what Raghav says, indicate the name of a file that will hold a list of input files in the 'Source filename' property box for the SQ in question on the Mapping tab, and set the 'Source filetype' property to 'Indirect' in the Session Properties.
If you already know the names of the input files ahead of time, you can specify them in that list file and deploy it with the workflow to the location you indicate in the 'Source file directory' property box.
However, if you won't know the names of the input files until run time but do know their naming standard (e.g. "Input_files_name_ABC_*", where "*" represents variable text, such as a numeric value incremented per input file generated by some other process), then one way to deal with that is to use a Pre-Session Command, specifiable on the 'Components' tab of the Session. Create one that builds a new file, at the location and with the name specified for the Indirect input file referenced above, by using the Unix shell (or, if running on Windows, the cmd shell) to list the files conforming to the naming standard and redirect the listing output to that file.
The tricky thing is that there must be one or more files listed in that Indirect input file. An Indirect file must list at least one file (even if that listed file is empty), and that file must exist; the workflow will fail (abend) if the indirect file reader gets no files to read, or if a file listed in it is not present on the server.
One way to get around this is to make sure an empty file conforming to the naming standard is present at all times. You can assure this by creating a "touchfile" before executing the listing command that builds the Indirect listing file. On Unix, you'd use the 'touch {path}/{filename}' command ({filename} could be, for example, "Input_files_name_ABC_TOUCHFILE"); on Windows, you'd likewise redirect an empty string to a file so named via the cmd shell.
Either way, that will help you avoid an abend. Cleaning up that file is easy: a Post-Session Command can delete the empty touchfile, and likewise the Indirect list file itself if desired.
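Sketched in Python for concreteness (hypothetical paths; in practice the Pre-Session Command box would usually hold the equivalent one-line shell command), the touchfile-plus-listing step looks like this:
import glob
import os

src_dir = '/informatica/srcfiles'  # hypothetical 'Source file directory'
pattern = os.path.join(src_dir, 'Input_files_name_ABC_*')
touchfile = os.path.join(src_dir, 'Input_files_name_ABC_TOUCHFILE')

# "touch" an empty file matching the naming standard so the listing is never empty
open(touchfile, 'a').close()

# build the Indirect list file the session reads ($InputFile_myName=/a/b/c.list above)
with open('/a/b/c.list', 'w') as listing:
    for path in sorted(glob.glob(pattern)):
        listing.write(path + '\n')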