Create multiple output files and cut dna with enzymes - Perl - regex

I am a first year grad student who's relatively new in computational biology. I recently started using Perl and it's not the easiest language to learn, at least not for me.
I need help applying my idea/logic the right way to figure out the solution to my problem.
I have a dna string and I want to split it at specific sites to get multiple fragments using information from an enzyme file that contains lines of recognition sites. Once the fragments are obtained, I want to output the list of dna fragments in an output file. I want to create an output file for every line in the enzyme file I am going to extract the information from, to apply it to the dna string.
Here's what I mean exactly:
Hypothetical scenario:
Enzyme.File contains:
abc/at'gtct// (abc is the name of the enzyme. (atgtct) is the recognition site.)
def/cgg'ataaa// ........
Suppose the dna string is: $dna = "accggttatgtctaaacggataaagtctcggataaattt" (recognition sites are bolded)
For line 1
When I extract the info from the first line/enzyme(abc) from the enzyme file and apply it to this string, the output should be:
accggttat
gtctaaacggataaagtctcggataaattt
(split between cgg'ataaa) the apostrophe represents the cut point
(note: Even though there is another gtct in the string, it does not split it because at ought to precede it.)
For line 2
$dna = accggttatgtctaaacggataaagtctcggataaattt (Info is applied to same dna string)
Info from line/enzyme 2 (def) would split the dna as follow:
accggttatgtctaaacgg (split between cgg'ataaa)
ataaagtctcgg
ataaattt
I want to put each output from the different lines in separate file with distinct names. (I can take care of assigning the names)
So in conclusion, this example would create two new files, one name "abc_whatever" and "def_whatever". Important: If the enzyme file had 8 lines with different enzymes, I would get 8 new output files with their distinct dna fragments."
Here's what I've tried so far:
#!/usr/bin/perl;
use warnings;
use strict;
open(ENZ,$ARGV[0]) || die; # ENZ(file handle for enzyme file)
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
while (<ENZ>) {
if ( match pattern etc..) { # I took care of that and created captured groups of
$1 = holds "abc" # the info I needed from the line e.g. I captured
$2 = ..."at" # (abc)/(at)'(gtct)//, so they are stored in $1,$2,$3
$3 = ..."gtct" # respectively
}
while (<$dna>){
my #fragments_array = split(/$3/, $dna);
open (OutFile, ">$dna"."_"."$1")
print OutFile shift #fragments_array,"\n";
foreach (#fragments_array) {
print OutFile "$3$_\n";
close OutFile;
}
}
}
close ENZ;
FIRST
I can only create an output only for the 1st line in the Enzyme file. I want to create and output file for all the lines.
SECOND
I am not properly cutting the dna. From other examples I have seen online, it looks like I am gonna have to use the following functions to properly apply the enzyme information on the dna. The functions include:
the for loop, length and substr(),
If you can, please demonstrate your work in the simplest form (no extravagant, impressing codes lol :-) since I am just learning this language)
Thanks in advance!

FIRST I can only create an output only for the 1st line in the Enzyme file. I want to create and output file for all the lines.
That's simply because you put close OutFile; into the foreach (#fragments_array) loop, instead of placing the close after the loop body.
SECOND I am not properly cutting the dna.
That's because you forgot to include $2, the head of the recognition site (e. g. the at of atgtct) in the split pattern as well as in the output.
The problem is solved easier if we just insert the splitting new-line character everywhere between the head and the tail:
#!/usr/bin/perl
use warnings;
use strict;
open(ENZ, $ARGV[0]) || die; # ENZ (file handle for enzyme file)
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
while (<ENZ>)
{
if (m-(.*)/(.*)'(.*)//-)
{
my ($head, $tail) = ($2, $3); # $2$3 is the recognition site; save it
open(OutFile, ">${dna}_$1");
(my $fragments = $dna) =~ s/$head$tail/$head\n$tail/g; # insert NLs
print OutFile $fragments, "\n";
close OutFile;
}
}
close ENZ;

I changed your code a bit, hope it works now
#!/usr/bin/perl
use warnings;
use strict;
open(ENZ, $ARGV[0]);
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
my ($enzyme, $first, $second) = ("", "", "");
for my $line (<ENZ>) {
chomp($line); # remove \n at the end of string
my #elements = split(/\/|'/, $line); # split string into tokens (e.g. abc/at'gtct => array(abc, at, gtct))
$elements[2] = substr($elements[2], 0, -2); # remove the last "//"
my ($firstPart, $secondPart) = ($elements[1], $elements[2]);
if ($dna =~ /(.*)$firstPart$secondPart(.*)/) {
$first = $1 . $firstPart;
$second = $2 . $secondPart;
$enzyme = $elements[0];
open(OUTPUT, ">$enzyme" . "_something");
print OUTPUT "$first\n$second\n";
close(OUTPUT);
}
}
close ENZ;
EDIT: this is the working version. I suggest you learn how to use Regular Expression if you want to use Perl for your study. It is the strongest tool in Perl.

Related

Perl regex store matches in array

I have a file with strings in each row as follows
"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1
the next line could look like
84545,X,2
I'm trying to parse this text in Perl. Note: quotes are present in the strings when there are several of them in a row, but not present if there is only item
I would like to parse each item into an array. I tried the following regex
#fields = ($_ =~ /(\d+\_\d+),*/g);
but it is missing the last 2714. How do I capture that edge case? Any help appreciated. Thanks in advance
It looks like you have a CSV File, so use an actual CSV parser for it like Text::CSV.
After you parse the columns, you can separate your first field into the array:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
my $line = qq{"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1 the next line could look like 84545,X,2};
if ($csv->parse($line)) {
my #columns = $csv->fields();
my #nums = split ',', $columns[0];
print "#nums\n";
}
Outputs:
229269_2 190594_2 94552_2 266076_2 269628_2 165328_2 99319_2 263339_2 263300_2 99315_2 271509_2 2714
Why not a regex ?
Yes, of course it's possible to use a regex for practically anything. But what you need to understand is that this will make your code extremely fragile and difficult to maintain.
Even if you want to use a regular expression, you should STILL do this in two steps. First separate the initial column(s) of your CSV, and then process the specific column that you're worried about.
Because you're just working with the first column, you could use code like the following:
use strict;
use warnings;
my $line = qq{"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1 the next line could look like 84545,X,2};
if ($line =~ /^"(.*?)"|^([^,]*)/) {
my $column0 = $1 // $2;
my #nums = split ',', $column0;
print "#nums\n";
}
The above happens to accomplish the same thing as the previous code. However, it has one big flaw, it's not nearly as obvious to the maintaining programmer what's going on.
Whenever a new coder, or even yourself in 6 months, views the first set of code, it is extremely obvious what format your data is in. You're working with a CSV file, and the first column is a list separated by commas. The second code also works, but the new maintainer must actually read the regex and figure out what's going on to understand both what format the data is in, and whether the code is actually doing it correctly.
Anyway, do whatever you will, but I strongly advise you to use an actual CSV Parser for parsing csv files.
If all you want is all but the last two fields...
my $string = qq("229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1);
$string =~ s/"//g; # delete the quotes
my #f = split (/,/, $string); # split on the comma
pop #f; pop #f; # jettison the last two columns
# #f contains what you're looking for

Perl - How to Read, Filter & Output results

scenario: I am a Jr. C# developer, but recently (3 days) began learning Perl for batch files. I have a requirement to parse through a text file, extract some key data, then output the key data to a new text file. As seems to always be the case, there are butt loads of fragmented examples on the net regarding how to 'read' from a file, 'write' to a file, 'store' line by line into an array, 'filter' this and that, yadda yadda, but nothing discussing the entire process of read, filter, write. Trying to splice examples from the net together is no good, because none seem to work together as coherent code. Coming from C#, Perl's syntax structure is hella confusing. I just need some advice on this process.
My objective is to parse a text file, single out all lines similar to the one below, by date, and output only the first 8 digits of the 2nd number group and 5 digits from the 3rd number group to a new text file.
11122 20100223454345 ....random text..... [keyword that identifies all the
entries I need]... random text 0.0034543345
I know regex is likely the best option, and have most of the expression written, but it does not work in Perl!
Question: Could someone please show a simple (dummy) example of how to read from, filter (using dummy regex) the file, then output the (dummy) results to a new file? I'm not concerned with functional details, I can learn those, I just need the syntax structure Perl uses. For example:
open(FH, '<', 'dummy1.txt')
open(NFH, '>', 'dummy2.txt')
#array; or $dumb;
while(<FH>)
{
filter each line [REGEX] and shove it into [#array or $dumb scalar]
}
print(join(',', #array)) to dummy2.txt
close FH;
close NFH;
Note: For various reasons, I cannot paste my source code in here, sorry. Any help is appreciated.
UPDATE: ANSWER:
Much thanks to all those who provided insight into my issue. After reading through you replies, as well as conducting further research, I learned that there are dozens of ways to accomplish the same task in Perl(which I am not a fan of). In the end, this is how I solved the problem, and IMO it's the cleanest, and most succinct, solution for those having similar struggles. Thanks again for all the help.
#======================================================================
# 1. READ FILE: inputFile.txt
# 2. CREATE FILE: outputFile.txt
# 3. WRITE TO: outputFile.txt IF line matches REGEX constraints
# 4. CLOSE FILES: outputFile.txt & inputFile.txt
#==========================================================================
#1
$readFile = 'C:/.../.../inputFile.txt';
open(FH, '<', $readFile) or Error("Could not read file ($!)");
#2
$writeFile = 'C:/.../.../outputFile.txt';
open(NFH, '>', $writeFile) or Error("Cannot write to file ($!)");
#3
#lines = <FH>;
LINE: foreach $line (#lines)
{
if ($line =~ m/(201403\d\d).*KEYWORD.*time was (\d+\.\d+)/)
{
$date = $1;
$elapsedtime = $2;
print NFH "$date,$elapsedtime\n";
}
}
#4
close NFH;
close FH;
perlfaq5 - How do I change, delete, or insert a line in a file, or append to the beginning of a file? covers most of the different scenarios for how to use files.
However, I will add to that by saying that always start your scripts with use strict; and use warnings;, and because you're doing file processing, use autodie; will serve you as well.
With that in mind, a quick stub would be the following:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'dummy1.txt';
open my $outfh, '>', 'dummy2.txt';
while (my $line = <$infh>) {
chomp $line; # Remove \n
if (Whatever magically processing here) {
print $outfh, "your new data";
}
}
while(<FH>)
{
# variable $_ contains the current line
if(m/regex_goes_here/) #by default, the regex match operator m// attempts to match the default $_ variable
{
#do actions
}
}
Also note, m/regex/ is the same as /regex/
Refer to:
http://perldoc.perl.org/perlvar.html#General-Variables
http://perldoc.perl.org/perlre.html
For capturing variables from regex match, THIS might help
EDIT
If you want a different variable than the default $_, as #Miller suggested, use while($line = <FH>) followed by if($line =~ m/regex_goes_here/)
=~ is the Binding Operator
One tip. Don't explicitly open filehandles to your input and output files. Instead read from STDIN and write to STDOUT. Your program will be far more flexible and easier to use as you'll be able to treat it like a Unix filter.
$ your_filter_program < your_input.txt > your_output.txt
And doing this actually makes your program simpler to write too.
while (<>) { # <> reads from STDIN
# transform your data (which is in $_) in some way
...
print; # prints $_ to STDOUT
}
You might find the first few chapters of Data Munging with Perl are useful.
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
INPUT_FILE => "NAME_OF_INPUT_FILE",
OUTPUT_FILE => "NAME_OF_OUTPUT_FILE",
FILTER => qr/regex_for_line_to_filter/,
};
open my $in_fh, "<", INPUT_FILE;
open my $out_fh, ">", OUTPUT_FILE;
while ( my $line = <$in_fh> ) {
chomp $line;
next unless $line =~ FILTER;
$line =~ s/regular_expression/replacement/;
say {$out_fh} $line;
}
close $in_file;
close $out_file;
The $in_file is your input file, and $out_fh is your output file. I basically open both, and loop through the input. The chomp removes the \n from the end. I always recommend doing that.
The next goes to the next iteration of the loop unless I match FILTER which is a regular expression matching lines you want to keep. This is identical to:
if ( $line !~ FILTER ) {
next;
}
I then use the substitution command to get the parts of the line I want, and munge them into the output I want. I maybe better off expanding this a bit. Maybe using split to split up my line into various pieces, the only using the pieces I want. I could then use substr to pull out the substring from the select pieces.
The say command is like print except it automatically adds in a NL on the end. This is how you write a line to a file.
Now, get Learning Perl and read it. If you know any programming. it shouldn't take you more than a week to go through the first half of the book. That should be more than enough to be able to write a program like this. The more complex stuff like references and object orientation might take a bit longer.
On line documentation can be found at http://perldoc.perl.org. You can look up the use statements which are called pragmas over there. Documentation on the individual functions are also available.
If I understood well, this one liner will do the job:
perl -ane 'print substr($F[1],0,8),"\t",substr($F[-1],0,5),"\n" if /keyword/' in.txt
Assuming in.txt is:
11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345
11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345
11122 40100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.65487
11122 50100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.6215
output:
20100223 0.003
40100223 0.654

Regular expression statement inside a while loop only matching and printing one of several expected matches

I've been struggling with this for a while and I was wondering if there was something obvious I've missed.
As programming learning/practice, I'm trying to put together a simple script for calculating the components of a restriction enzyme digest mix. However, first I need to get a list of enzyme stock concentrations.
I pulled all the individual pages from the New England Biolabs enzyme page, and my goal with this current script is to pull out the name of the enzyme and the concentrations available from the company.
This example works with a local copy of EcoRI (link included at bottom of submission).
use warnings;
use strict;
open(FILE,'productR0101.asp');
my $line;
my $counter;
my $array1;
my $array2;
my $array3;
my $concentration;
my #array4;
$counter = 1;
while ($line = <FILE>) {
chomp($line);
if ($counter == 6 ){
$array1 = $line;
$counter++;
}
else{
$counter++;
}
if ($line =~ m/.{8}units.ml/g) {
(#array4) =$line =~ m/.{8}units.ml/g;
print #array4;
}
}
print "\n".$array1;
exit;
Every file has the enzyme name on the sixth line of the file, so I just pulled that whole line. However, the concentrations are in different locations, so my approach was to read in the file one line at a time, and match to the units/ml tag.
My thinking was that it should print out the match for each line, if there was one, every time the while loop runs, effectively resulting in a string of separate print statements.
This is where I get messed up. There are six different locations in this file with a units/ml tag: three for 20,000 and three for 100,000.
I was expecting six different results printed, but when I run this, only one 100,000 units/ml result is returned.
I've tried all sorts of fixes. I tried concatenating strings, I tried storing it as a string, I tried concatenating it onto another array that never gets touched by the (#array4) = $line =~ m/.{8}units.ml/g line, and it either breaks it or gives the same result.
And finally, I apologize for any weird conventions. I'm still learning Perl, and my first experience programming was with MATLAB.
Also, the $array1, $array2, etc. exist because I was trying to keep track of exactly what was getting put where; my intention is to clean it up once I get it functional.
So does anyone have any ideas about what I'm doing wrong?
EDIT: the data source is the source code to each individual enzyme page. For this example, if you view the page source you get the complete input file I gave to the script.
Are the 20,000 units/ml at the start of the line? Because in that case, .{8} would fail to match - the dot doesn't match newlines, and 20,000_ is only 7 characters.
We really need to see the data you are processing, but it looks like you are storing only the last occurrence of /units.ml/ in #array4 because you are reading the file line by line.
I will add to this answer if you supplement your question, but for now I need to know
What your data looks like
What the mysterious /.{8}/ is for
Are you aware that $array1, $array2, and $array3, are scalars, as well as being very bad names for variables?
For now, here is a rewrite of your code using idiomatic Perl, and the $. variable that evaluates to the line number of the file most recently read
use strict;
use warnings;
open my $file, '<', 'productR0101.asp' or die $!;
my $array1;
my #array4;
while (my $line = <$file>) {
chomp $line;
$array1 = $line if $. == 6;
if ($line =~ m/.{8}units.ml/) {
#array4 = $line =~ m/.{8}units.ml/g;
print "#array4\n";
}
}
print "\n".$array1;
I can't exactly reproduce the behavior you've reported of only getting one of the 100,000 units/ml results, as I'm not exactly sure what your input data is. However, I think the problem is with the regular expression not having any captures. You should put parenthesis around the part of the regex match that you want to be returned to #array4. So instead of this:
#array4 = $line =~ m/.{8}units.ml/g;
Try this:
#array4 = $line =~ m/(.{8})units.ml/g;
#array4 = $line =~ /(.{8})units.ml/;
EDIT:
You also don't want to use the m/ and /g modifiers.

Regex: How to remove extra spaces between strings in Perl

I am working on a program that take user input for two file names. Unfortunately, the program can easily break if the user does not follow the specified format of the input. I want to write code that improves its resiliency against these types of errors. You'll understand when you see my code:
# Ask the user for the filename of the qseq file and barcode.txt file
print "Please enter the name of the qseq file and the barcode file separated by a comma:";
# user should enter filenames like this: sample1.qseq, barcode.txt
# remove the newline from the qseq filename
chomp ($filenames = <STDIN>);
# an empty array
my #filenames;
# remove the ',' and put the files into an array separated by spaces; indexes the files
push #filename, join(' ', split(',', $filenames))
# the qseq file
my $qseq_filename = shift #filenames;
# the barcode file.
my barcode = shift #filenames;
Obviously this code runs can run into errors if the user enters the wrong type of filename (.tab file instead of .txt or .seq instead of .qseq). I want code that can do some sort of check to see that the user enters the appropriate file type.
Another error that could break the code is if the user enters too many spaces before the filenames. For example: sample1.qseq,(imagine 6 spaces here) barcode.txt (Notice the numerous spaces after the comma)
Another example: (imagine 6 spaces here) sample1.qseq,barcode.txt (This time notice the number of spaces before the first filename)
I also want lines of code that can remove extra spaces so that the program doesn't break. I think the user input has to be in the following kind of format: sample1.qseq, barcode.txt. The user input has to be in this format so that I can properly index the filenames into an array and shift them out later.
Thanks any help or suggestions are greatly appreciated!
The standard way to deal with this kind of problem is utilising command-line options, not gathering input from STDIN. Getopt::Long comes with Perl and is servicable:
use strict; use warnings FATAL => 'all';
use Getopt::Long qw(GetOptions);
my %opt;
GetOptions(\%opt, 'qseq=s', 'barcode=s') or die;
die <<"USAGE" unless exists $opt{qseq} and $opt{qseq} =~ /^sample\d[.]qseq$/ and exists $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
Usage: $0 --qseq sample1.qseq --barcode barcode.txt
$0 -q sample1.qseq -b barcode.txt
USAGE
printf "q==<%s> b==<%s>\n", $opt{qseq}, $opt{barcode};
The shell will deal with any extraneous whitespace, try it and see. You need to do the validation of the file names, I made up something with regex in the example. Employ Pod::Usage for a fancier way to output helpful documentation to your users who are likely to get the invocation wrong.
There are dozens of more advanced Getopt modules on CPAN.
First, put use strict; at the top of your code and declare your variables.
Second, this:
# remove the ',' and put the files into an array separated by spaces; indexes the files
push #filename, join(' ', split(',', $filenames))
Is not going to do what you want. split() takes a string and turns it into an array. Join takes a list of items and returns a string. You just want to split:
my #filenames = split(',', $filenames);
That will create an array like you expect.
This function will safely trim white space from the beginning and end of a string:
sub trim {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
Access it like this:
my $file = trim(shift #filenames);
Depending on your script, it might be easier to pass the strings as command line arguments. You can access them through the #ARGV array but I prefer to use GetOpt::Long:
use strict;
use Getopt::Long;
Getopt::Long::Configure("bundling");
my ($qseq_filename, $barcode);
GetOptions (
'q|qseq=s' => \$qseq_filename,
'b|bar=s' => \$barcode,
);
You can then call this as:
./script.pl -q sample1.qseq -b barcode.txt
And the variables will be properly populated without a need to worry about trimming white space.
You'll need to trim spaces before handling the filename data in your routine, you could check the file extension with yet another regular expression, as nicely described in Is there a regular expression in Perl to find a file's extension?. If it's the actual type of file that matters to you, then it might be more worthwile to check for that instead with File::LibMagicType.
While I think your design is a little iffy, the following will work?
my #fileNames = split(',', $filenames);
foreach my $fileName (#fileNames) {
if($fileName =~ /\s/) {
print STDERR "Invalid filename.";
exit -1;
}
}
my ($qsec, $barcode) = #fileNames;
And here is one more way you could do it with regex (if you are reading the input from STDIN):
# read a line from STDIN
my $filenames = <STDIN>;
# parse the line with a regex or die with an error message
my ($qseq_filename, $barcode) = $filenames =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/
or die "invalid input '$filenames'";

Inserting different text to another using Regular Expressions?

I've two text files. I want to take text from first one between </sup><sup> tags, and insert it to another text file between {}.
Better example (sth like a dictionary)
Text1:
<sup>1</sup>dog
<sup>2</sup>cat
<sup>3</sup>lion
<sup>1</sup>flower
<sup>2</sup>tree
.
.
Text2:
\chapter1
\pkt{1}{}{labrador retirever is..}
\pkt{2}{}{home pets..}
\pkt{3}{}{wild cats..}
\chapter2
\pkt{1}{}{red rose}
\pkt{2}{}{lemon tree}
.
.
What I want:
Text3:
\chapter1
\pkt{1}{dog}{labrador retirever is..}
\pkt{2}{cat}{home pets..}
\pkt{3}{lion}{wild cats..}
\chapter2
\pkt{1}{flower}{red rose}
\pkt{2}{tree}{lemon tree}
Text is random, but You can see what I want. Perl would be best.
So get
</sup>**text**<sup>
and paste it to
\pkt{nr}{**here**}{this is translation of this word already stored in text2}.
Text A and B are in order, so if I could read first </sup>text<sup> from Text A, save it in temp, delete this line from Text A, put it on first free {} slot in text B, and start over again it would be great. Numbers will match because order is saved.
Sorry for my English:)
Thanks!
This code puts all dict items in an array, in the order they appear. The tex file is then looped and each time \pkt{num}{} is hit an item from the array is inserted.
Newlines in dict are handled and replaced with spaces (Just remove this replace in the map if you don't want this behavior). \pkt should be found as long as the part \pkt{num}{} is not spanning multiple lines. Otherwise I think the easiest solution would be to undef $/ (the input record separator) and read the whole file into a string and just loop the replacement (could be a bit memory hungry though).
#!/usr/bin/perl -wT
use strict;
my $dict_filename = 'text1';
my $tex_filename = 'text2';
my $out_filename = 'text3';
open(DICT, $dict_filename);
my #dict;
{
# Set newline separator to <sup>
local $/ = '<sup>';
# Throw away first "line", it will be empty
<DICT>;
# Extract string and throw away newlines
#dict = map { $_ =~ m#</sup>\s*(.*?)\s*(?:<sup>|$)#s; $_ = $1; $_ =~ s/\n/ /g; $_; } <DICT>;
}
close(DICT);
open(TEX, $tex_filename);
open(OUT, ">$out_filename");
my $tex_line;
my $dict_pos = 0;
while($tex_line = <TEX>)
{
# Replace any \pkt{num}{} with \pkt{num}{text}
$tex_line =~ s|(\\pkt\{\d+\}\{)(\})|$1$dict[$dict_pos++]$2|g;
print OUT $tex_line;
}
close(TEX);
close(OUT);