Read from text file, capture text, write to another text file in specific syntax - readfile

I have a text file A with the following syntax:
Attribute_Name, 'Path', 'Tutorial';
Attribute_Name2, 'Path2', 'Tutorial';
....
What I need to do is read from that file, capture those 3 values (Attribute Name, Path and Project Name, 'Tutorial' in this case) and write them to an output text file, B, with the following syntax:
DELETE ATTRIBUTE "Attribute_Name" IN FOLDER "Path" FROM PROJECT "Tutorial";
and repeat for as many iterations as there are lines in the input file.
What is the best (easiest) language to implement this? Can anyone provide example code?

I'd personally do something like that with Perl, because I'm familiar with Perl and it works great for these kinds of tasks. You can also write a sed one-liner to get that done.
If you're not a fan of Perl, any modern dynamic language should let you get the job done with minimal effort.
EDIT: An example Perl script (full file for readability) would look like this:
use warnings;
use strict;
while (my $line = <>) {
    $line =~ /^\s*(.+?), '(.+?)', '(.+?)';$/; # Doesn't handle internal escaping
    print "DELETE ATTRIBUTE \"$1\" IN FOLDER \"$2\" FROM PROJECT \"$3\";\n";
}
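If you save that script as, say, convert.pl (the name is just for illustration), you can run it over the input file and redirect the output:

perl convert.pl A.txt > B.txt

For the sample input, a line such as Attribute_Name, 'Path', 'Tutorial'; comes out as DELETE ATTRIBUTE "Attribute_Name" IN FOLDER "Path" FROM PROJECT "Tutorial";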

Related

How to update path in Perl script in multiple files

I am working on creating some training material where I am using perl. One of the things I want to do is have the scripts be set up correctly for the student, regardless of where they extract the compressed files. I am working on a Windows batch file that will copy the perl templates to the working location and then update the path in the copies of the perl template files to the correct location. The perl templates have this as the first line:
#!_BASE_software/perl/bin/perl.exe
The batch file looks like this:
SET TRAINING=%~dp0
copy %TRAINING%\template\*.pl %TRAINING%work
%TRAINING%software\perl\bin\perl -pi.bak -e 's/_BASE_/%TRAINING%/g' %TRAINING%work\*.pl
I have a few problems with this:
Perl doesn't seem to like the wildcard in the filename
It turns out that %TRAINING% is going to expand into a string with backslashes, which need to be converted into forward slashes and escaped within the regex.
How do I fix this?
First of all, Windows doesn't use the shebang line, so I'm not sure why you're doing any of this work in the first place.
Perl will read the shebang line and look for options if perl is found in the path, even on Windows, but that means that #!perl is sufficient if you want to pass options via the shebang line (e.g. #!perl -n).
Now, it's possible that you use Cygwin, MSYS or some other unix emulation instead of Windows to run the program, but you are placing a Windows path in the shebang line (C:...) rather than a unix path, so that doesn't make sense either.
There are three additional problems with the attempt:
cmd uses double-quotes for quoting.
cmd doesn't perform wildcard expansion like sh, so it's up to your program to do it.
You are trying to generate Perl code from cmd. Ouch.
If we go ahead, we get:
"%TRAINING%software\perl\bin\perl" -MFile::DosGlob=glob -pe"BEGIN { #ARGV = map glob, #ARGV; $base = $ENV{TRAINING} =~ s{\\}{/}rg } s/_BASE_/$base/g" -i.bak -- %TRAINING%work\*.pl
If we add line breaks for readability, we get the following (which cmd won't accept):
"%TRAINING%software\perl\bin\perl"
-MFile::DosGlob=glob
-pe"
BEGIN {
#ARGV = map glob, #ARGV;
$base = $ENV{TRAINING} =~ s{\\}{/}rg
}
s/_BASE_/$base/g
"
-i.bak -- %TRAINING%work\*.pl
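If the quoting becomes too painful, another option is to keep the Perl out of cmd entirely and put it in a small standalone script that the batch file calls. A minimal sketch (the file name fix_shebang.pl is made up, and this assumes TRAINING is still set in the environment, as the SET line above makes it):

use strict;
use warnings;
use File::DosGlob 'glob';

# Expand the wildcard ourselves, since cmd won't do it for us.
@ARGV = map { glob($_) } @ARGV;

# Turn the backslashes in %TRAINING% into forward slashes for the shebang path.
(my $base = $ENV{TRAINING}) =~ s{\\}{/}g;

local $^I = '.bak'; # in-place editing with a backup, like -i.bak
while (<>) {
    s/_BASE_/$base/g;
    print;
}

The batch file would then call it with no embedded Perl code to quote:

"%TRAINING%software\perl\bin\perl" fix_shebang.pl %TRAINING%work\*.pl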

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working on a directory with many .txt files in it and have a file with a looong list of regexes like "perl -p -i -e 's/\n\n/\n/g' *.xml". They all work if I copy them to the terminal. But is there a possibility to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the one-liner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended). You could expand on it by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) {               # if no files on command line
    @ARGV = glob('*.xml');  # get a default list of files
}
local $^I = '';  # enable inplace editing (like perl -i)
while (<>) {     # read each line of each file into $_
    s/\n\n/\n/g; # modify $_ with a regex
    # more regexes here...
    print;       # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions; there are lots of Perl modules for XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line-by-line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
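If you really do want a pattern to see more than one line at a time, one way is to read each file in one piece by clearing the input record separator. A minimal sketch (shown only to illustrate the mechanics, not as a recommendation for XML):

#!/usr/bin/env perl
use warnings;
use strict;
local $^I = ''; # enable inplace editing (like perl -i)
local $/;       # "slurp" mode: each file is read as a single string
while (<>) {    # $_ now holds a whole file, so multi-line patterns can match
    s/\n\n/\n/g;
    print;
}

You would run it as perl slurp.pl *.xml (the script name is just a placeholder).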
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as I've showed above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
It's likely that if you got "No such file or directory", your problem was that you forgot to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script that you wrote.
Of course, the normal way to run multiple perl commands is a perl script that you write, e.g. something that looks like runme.pl.
That said, yes, everything will work from the terminal; you just need to be careful about the escaping that bash performs.

Powershell: Read a section of a file into a variable

I'm trying to create a kind of a polyglot script. It's not a true polyglot because it actually requires multiple languages to perform, although it can be "bootstrapped" by either Shell or Batch. I've got this part down no problem.
The part I'm having trouble with is a bit of embedded Powershell code, which needs to be able to load the current file into memory and extract a certain section that is written in yet another language, store it in a variable, and finally pass it into an interpreter. I have an XML-like tagging system that I'm using to mark sections of the file in a way that will hopefully not conflict with any of the other languages. The markers look like this:
lang_a_code
# <{LANGB}>
... code in language B ...
... code in language B ...
... code in language B ...
# <{/LANGB}>
lang_c_code
The #'s are comment markers, but the comment markers can be different things depending on the language of the section.
The problem I have is that I can't seem to find a way to isolate just that section of the file. I can load the entire file into memory, but I can't get the stuff between the tags out. Here is my current code:
#ECHO OFF
SETLOCAL EnableDelayedExpansion
powershell -ExecutionPolicy unrestricted -Command ^
$re = '(?m)^<{LANGB}^>(.*)^<{/LANGB}^>';^
$lang_b_code = ([IO.File]::ReadAllText(^'%0^') -replace $re,'$1');^
echo "${re}";^
echo "Contents: ${lang_b_code}";
Everything I've tried so far results in the entire file being output in the Contents rather than just the code between the markers. I've tried different methods of escaping the symbols used in the markers, but it always results in the same thing.
NOTE: The use of the ^ is required because the top-level interpreter is Batch, which hangs up on the angle brackets and other random things.
Since there is just one block, you can use the regex
$re = '(?s)^<{LANGB}^>(.*)^^.*^<{/LANGB}^>';^
but with the -match operator, and then access the text using the $matches[1] variable that is set as a result of -match.
So, after the regex declaration, use
[IO.File]::ReadAllText(^'%0^') -match $re;^
echo $matches[1];

Reorganizing large amount of files with regex?

I have a large number of files organized in a hierarchy of folders, with particular file name notations and extensions. What I need to do is write a program to walk through the tree of files and basically rename and reorganize them. I also need to generate a report of the changes and information about the transformed organization, along with statistics.
The solution that I can see is to walk through the tree of files just like any other tree data structure, and use regular expressions on the path names of the files. This seems very doable and not a huge amount of work. My questions are: are there tools I should be using other than just C# and regex? Perl comes to mind, since I know it was originally designed for report generation, but I have no experience with the language. And also, is using regex viable in this situation, given that I have only used it on file CONTENTS, not file names and organization?
Yes, Perl can do this. Here's something pretty simple:
#! /usr/bin/env perl
use strict;
use warnings;
use File::Find;
my $directory = "."; #Or whatever directory tree you're looking for...
find (\&wanted, $directory);
sub wanted {
    print "Full File Name = <$File::Find::name>\n";
    print "Directory Name = <$File::Find::dir>\n";
    print "Basename = <$_>\n";

    # Using tests to see various things about the file
    if (-f $File::Find::name) {
        print "File <$File::Find::name> is a file\n";
    }
    if (-d $File::Find::name) {
        print "Directory <$File::Find::name> is a directory\n";
    }

    # Using regular expressions on the file name
    if ($File::Find::name =~ /beans/) {
        print "The file <$File::Find::name> contains the string <beans>\n";
    }
}
The find command takes the directory, and calls the wanted subroutine for each file and directory in the entire directory tree. It is up to that subroutine to figure out what to do with that file.
As you can see, you can do various tests on the file, and use regular expressions to parse the file's name. You can also move, rename, or delete the file to your heart's content.
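For example, a regex-driven rename could go inside wanted() along these lines. This is only a sketch; the pattern and replacement are made-up placeholders, and it would stand in for the demonstration wanted() above:

use File::Copy qw(move);

sub wanted {
    return unless -f $File::Find::name; # only rename plain files
    # By default find() chdir()s into each directory, so $_ is the bare file name.
    (my $new = $_) =~ s/^report_(\d+)\.dat$/$1_report.dat/
        or return;                      # skip names that don't match
    move($_, $new) or warn "Could not rename $_ to $new: $!";
}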
Perl will do exactly what you want. Now, all you have to do is learn it.
If you can live with glob patterns instead of regular expressions, mmv might be an option.
> ls
a1.txt a2.txt b34.txt
> mmv -v "?*.txt" "#2 - #1.txt"
a1.txt -> 1 - a.txt : done
a2.txt -> 2 - a.txt : done
b34.txt -> 34 - b.txt : done
Directories at any depth can be reorganized, too. Check out the manual. If you run Windows, you can find the tool in Cygwin.

extract audio from certain files in working dir in perl

Basically, what I'm trying to do is extract the audio from a set of downloaded YouTube videos, the names of which are (partially) identified in a file (mus.txt) that was opened with the handle TXTFILELIST. TXTFILELIST contains one 11-character identifier for the video on each line (for example, "dQw4w9WgXcQ") and the downloaded file is of the form [title]-[ID].mp4 (in the previous example, "Rick Astley - Never Gonna Give You Up-dQw4w9WgXcQ.mp4").
#snip...
if ($opt_extract_audio) {
    open(TXTFILELIST, "<", "mus.txt") or die $!;
    my @all_dir_files = `dir /b`;
    my $file_to_convert;
    foreach $file_to_convert (<TXTFILELIST>) {
        my @files = grep("/${file_to_convert}\.mp4$/", @all_dir_files); # the problem line!
        print "files: @files\n";
        foreach $file (@files) {
            system("ffmpeg.exe -i ${file} -vn -y -acodec pcm_s16le -ac 2 ${file}.wav");
        }
    }
#snip...
The rest of the snipped code works (I checked it with several videos, replacing vars, commenting, etc.), is legal (I used the strict and warnings pragmas) and, I believe, is irrelevant, because it has nothing to do with defining any vars (besides $opt_extract_audio) used in this snippet. However, this is the one bit of code that's giving me trouble; I can't seem to extract the files that are identified in TXTFILELIST from @all_dir_files. I got the code for 'the problem line' from other Stack Overflow answers, but it isn't working for some reason.
TL;DR What I want to do is this: list all files in the current dir (say the directory contains mus.txt, "Rick Astley - Never Gonna Give You Up-dQw4w9WgXcQ.mp4", and blah.mp4), choose only the identified file(s) (the Rick Astley video) using the 11-char ID in TXTFILELIST (dQw4w9WgXcQ) and extract the audio from it. And yes, I am running this script on Windows, so I can't use *nix utilities like ack or find.
Remove the line
my @all_dir_files = `dir /b`;
And use this loop instead:
for my $file (<*${file_to_convert}.mp4>) {
    say $file;
    system(...);
}
The <...> above is a glob; it can also be written glob "*${file_to_convert}.mp4". I think it is almost always better to use perl functions rather than rely on system calls.
As has been pointed out, "/${file...$/" is not a regex, but a string. And since you can use expressions with grep, and a non-empty string is always true, your grep will essentially do nothing, and pass all the values into your array.
Get rid of the double quotes around the regular expression in the grep function.
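Putting that together, the problem line could look something like this (a sketch only; note the added chomp, since each line read from TXTFILELIST still carries its trailing newline, which would otherwise end up in the middle of the pattern):

chomp $file_to_convert; # strip the newline read from mus.txt
my @files = grep { /\Q$file_to_convert\E\.mp4$/ } @all_dir_files;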