How to remove all files matching specific file content in HDFS? - hdfs

By mistake, NiFi generated a huge number of files in an HDFS directory, each with the content "val3val2val1". I want to remove all files matching this content using an HDFS command. Please advise.
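One possible approach, sketched here under several assumptions, is to list the directory, read each file, and delete those whose content matches. The sketch below drives the standard `hdfs dfs -ls -C`, `-cat`, and `-rm` subcommands from Python. The helper names (`hdfs`, `find_matching`, `remove_matching`) are hypothetical, the directory is assumed to contain only small files (no subdirectories), and "matching" is assumed to mean that the whole file equals the given bytes once surrounding whitespace is ignored. Each call starts a new client process, so this will be slow for a very large number of files.

```python
import subprocess


def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its stdout as bytes."""
    result = subprocess.run(["hdfs", "dfs", *args], check=True, capture_output=True)
    return result.stdout


def find_matching(directory, content, run=hdfs):
    """Return the paths in `directory` whose whole content equals `content` (bytes)."""
    # `-ls -C` prints one path per line, without the permission/size columns.
    listing = run("-ls", "-C", directory).decode().splitlines()
    # Assumption: every listed path is a small regular file, so `-cat` is cheap.
    return [path for path in listing if run("-cat", path).strip() == content]


def remove_matching(directory, content, run=hdfs, dry_run=True):
    """Return the matching paths; delete them only when dry_run is False."""
    matches = find_matching(directory, content, run)
    if not dry_run:
        for path in matches:
            run("-rm", path)
    return matches
```

Calling `remove_matching("/path/to/dir", b"val3val2val1")` (the path is a placeholder) only reports the candidates; passing `dry_run=False` actually removes them, so it is worth reviewing the dry-run output first.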

Related

error in indirect file load in Informatica

I am trying an indirect file load in Informatica.
I put the files below in $PmSrcFilesDir (the workflow task picks up files from here):
-list.txt
-production_plan_20210906.csv
-production_plan_20210907.csv
The list.txt file contains only the CSV file names.
I configured the options below:
Source filetype: Indirect
Source filename: list.txt
Source file directory: $PMSourceFileDir
After running the workflow, it shows this error:
FR_3000 Error Opening File...No such file or directory
You can give absolute paths in list.txt:
/Path/to/file/production_plan_20210906.csv
/Path/to/file/production_plan_20210907.csv
You can use a command task or a shell script to get the absolute path and file name.
Please check the session log to see which file it can't read: the list file or a data file. If it is the list file, make sure $PMSourceFileDir is set correctly in the parameter file.
Also make sure the Informatica user (the user that runs the Informatica server) has read access to the data and list folders and files. An admin can help.
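As an illustration of generating the list file with absolute paths, here is a minimal Python sketch that could be run before the session (for example from a command task). The function name `write_indirect_list` and the default glob pattern are assumptions made for this example; only the file names come from the question.

```python
from pathlib import Path


def write_indirect_list(source_dir, pattern="production_plan_*.csv", list_name="list.txt"):
    """Write one absolute data-file path per line into the list file; return its path."""
    source = Path(source_dir).resolve()
    # Sort so the load order is stable between runs.
    data_files = sorted(p for p in source.glob(pattern) if p.is_file())
    list_path = source / list_name
    list_path.write_text("".join(f"{p}\n" for p in data_files))
    return list_path
```

The directory passed in would be whatever $PMSourceFileDir resolves to on the server; the script does not read that variable itself.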

Proper way to zip xlsx-file from directory using Info-Zip's Zip utility

I’m using the Zip utility from the Info-Zip library to compress a directory tree into an xlsx file.
For that I’m using the following command:
zip -r -D res.xlsx source
source - contains the correct directory tree of the xlsx file.
But if you then look at the resulting file structure, the source directory is included at the top level of the paths of all files and directories, and MS Office Excel cannot open the file. This is a well-known problem. To avoid it, zip.exe needs to be inside the dest directory.
The problem is that I want to use the source code of this utility in my own project, so I cannot use that workaround: my own process will be responsible for compressing the directories into xlsx files.
I’ve tried to find the place in the zip source code where the parent directory is prepended at the top level, but it seems to be done implicitly.
Can anyone suggest how this can be done?
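To make the required archive layout concrete, here is a small sketch using Python's standard `zipfile` module rather than the Info-Zip sources. It is not how Info-Zip does it internally; it only illustrates the idea that each entry name should be the file's path relative to the source directory, so the directory name itself never appears in the archive. The function name `zip_tree_without_root` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_tree_without_root(source_dir, archive_path):
    """Zip the contents of source_dir so entry names do not start with its name."""
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store the path relative to source_dir, e.g. "xl/workbook.xml".
                # Only files are written, so no directory entries are created.
                archive.write(path, arcname=path.relative_to(source).as_posix())
```

The archive should be written outside `source_dir`, otherwise the walk could pick up the partially written archive itself.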

Using regex in FTP for filenames for downloading files

Is it possible to use a regex to match file names in FTP when getting files from a server?
I need to connect to a server over FTP and download the files whose names end with the same value. In my case, it is 14_04_25_144238.
I am not sure if it is doable, but I am asking out of curiosity.
Can we use a regex like .*14_04_25_144238 in the ftp get command?
Thanks in advance.
Dinesh S
You want the mget command.
From the Unix man page:
mget remote-files
Expand the remote-files on the remote machine and do a get for each file name thus produced. See glob for details on the filename expansion. Resulting file names will then be processed according to case, ntrans, and nmap settings. Files are transferred into the local working directory, which can be changed with 'lcd directory'; new local directories can be created with '! mkdir directory'.
If you want to turn off the prompting for each file, you also need this:
prompt
Toggle interactive prompting. Interactive prompting occurs during multiple file transfers to allow the user to selectively retrieve or store files. If prompting is turned off (default is on), any mget or mput will transfer all files, and any mdelete will delete all files.
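Note that mget expands shell-style glob patterns, not regular expressions, so the pattern for this case would look like *14_04_25_144238 rather than .*14_04_25_144238. If the transfer needs to be scripted, the same idea can be sketched with Python's standard `ftplib` and `fnmatch` modules. The helper names, the host, and the credentials below are placeholders, and the sketch assumes the server's NLST reply returns bare file names for the current directory.

```python
import fnmatch
from ftplib import FTP
from pathlib import Path


def select_matching(names, pattern):
    """Return the names that match a shell-style glob pattern (not a regex)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def download_matching(ftp, pattern, dest_dir="."):
    """Fetch every file in the current remote directory whose name matches pattern."""
    dest = Path(dest_dir)
    downloaded = []
    for name in select_matching(ftp.nlst(), pattern):
        with open(dest / name, "wb") as handle:
            # RETR streams the remote file; each chunk is written to the local file.
            ftp.retrbinary(f"RETR {name}", handle.write)
        downloaded.append(name)
    return downloaded


# Usage (hypothetical host and credentials):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     download_matching(ftp, "*14_04_25_144238")
```

Filtering on the client side means a real regular expression could be used instead of `fnmatch` if a glob is not expressive enough.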

How can I rename files of which I don't know the extension?

I have a directory with these files:
one.txt
two.mp3
three.bmp
Is there any way to rename files using MoveFile() but without specifying the extension of the file (all the files have different filenames)?
If not, how can I rename these files to anything I want when I only know the names one, two, and three?
You'll have to use the underlying OS API to scan through the directory for files and compare each filename with your desired prefix. Here is another question that shows you how to list the contents of a directory in Windows.
Once you know the prefix, read the file list of the directory, find a valid filename with that prefix, and use that filename with the function.
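The scan-then-rename idea can be sketched as follows. This uses Python's `pathlib` instead of the Windows API and MoveFile() from the question, purely to show the logic; the function name `rename_by_stem` is hypothetical, and the sketch assumes exactly one file in the directory has the given base name.

```python
from pathlib import Path


def rename_by_stem(directory, stem, new_stem):
    """Rename the single file named stem.<any extension>, keeping its extension."""
    matches = [p for p in Path(directory).iterdir() if p.is_file() and p.stem == stem]
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one file named {stem}.*, found {len(matches)}"
        )
    source = matches[0]
    # Keep whatever extension was discovered during the directory scan.
    target = source.with_name(new_stem + source.suffix)
    source.rename(target)
    return target
```

For example, `rename_by_stem(folder, "two", "song")` would turn two.mp3 into song.mp3 without the caller knowing the extension in advance.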

having file names and delete them

I want to access all the files in a folder, get a list of them, and work with them.
For example: there is a folder named "new folder" that contains the files 1.txt and 2.txt.
I don't know in advance what is in "new folder", so I want a list of the files in it.
Hence the questions are:
1- How can I get such a list?
2- How can I delete a file (e.g. 2.txt) whether or not I know that a file with this name exists?
3- Is it possible to figure out whether a txt file has been used (that is, whether it is empty or not)?
Thanks.
I'd use Boost.Filesystem to analyze the folder content, and remove to delete the file. You will find some samples in the Filesystem tutorial that will ease your work.
Edit: remove(path) is available in Boost.Filesystem.
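The three steps asked about (list, delete if present, check for emptiness) can be sketched as below. This is a Python illustration using `pathlib`, not the Boost.Filesystem code the answer recommends; the function names are hypothetical, and "used" is assumed to mean a file size greater than zero.

```python
from pathlib import Path


def list_files(folder):
    """Return the names of the regular files directly inside folder, sorted."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


def delete_if_present(folder, name):
    """Delete folder/name if it exists; return True if a file was removed."""
    path = Path(folder) / name
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        # Nothing to delete: the caller did not need to check beforehand.
        return False


def is_empty(folder, name):
    """Return True if the file has a size of zero bytes."""
    return (Path(folder) / name).stat().st_size == 0
```

Returning a boolean from the delete helper lets the caller attempt the removal without checking for the file first.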