Hello, I have a huge file containing thousands of BEGIN and END markers, and I am looking for logic that will tell me whether each BEGIN has a matching END.
For example:
# Begin Variable
Name = loopEndDriven
Decl_type = UInt8
Usage = Z
Value = CE_SELECT
# End Variable
# Begin Variable
Name = locationNeeded
Decl_type = Loop_Location_t
Usage = Z
Value = SHORT_LOCATION
# End Variable
perl -lne 'BEGIN{$t=0}
if(/Begin/ ||(eof && $t==1))
{print "No end at $." unless($t==0);$t=1}
$t=0 if(/End/);' your_file
The command above prints the line number whenever a Begin has no matching End.
I tested it as follows:
> cat temp
# Begin Variable
Name = loopEndDriven
Decl_type = UInt8
Usage = Z
Value = CE_SELECT
# End Variable
# Begin Variable
Name = locationNeeded
Decl_type = Loop_Location_t
Usage = Z
Value = SHORT_LOCATION
# Begin Variable
Name = locationNeeded
Decl_type = Loop_Location_t
Usage = Z
Value = SHORT_LOCATION
> perl -lne 'BEGIN{$t=0}if(/Begin/ ||(eof && $t==1)){print "No end at $." unless($t==0);$t=1}$t=0 if(/End/);' temp
No end at 17
No end at 23
>
Along the same lines, I hope you can also write the logic to check whether each End has a matching Begin.
Assuming nesting isn't allowed.
my $in_begin = 0;
while (<>) {
    if (/# Begin/) {
        warn(qq{Missing "End" at line $.\n}) if $in_begin;
        $in_begin = 1;
    }
    elsif (/# End/) {
        warn(qq{Missing "Begin" at line $.\n}) if !$in_begin;
        $in_begin = 0;
    }
}
warn(qq{Missing "End" at EOF\n}) if $in_begin;
Better diagnostics:
my $begin = 0;
while (<>) {
    if (/# Begin/) {
        warn(qq{Missing "End" for "Begin" at line $begin\n}) if $begin;
        $begin = $.;
    }
    elsif (/# End/) {
        warn(qq{Missing "Begin" for "End" at line $.\n}) if !$begin;
        $begin = 0;
    }
}
warn(qq{Missing "End" for "Begin" at line $begin\n}) if $begin;
Balanced Expressions
There are no BEGIN or END keywords in the corpus you posted. The following one-liner will check for balanced expressions using your block comments instead.
$ perl -ne '$pairs += 1 if /Begin Variable/;
$pairs -= 1 if /End Variable/;
END {
if ($pairs == 0) {print "balanced\n"} else {print "unbalanced\n"}
}' /tmp/foo
With your corpus as currently posted, the one-liner should print balanced.
As you ask for logic rather than implementation (a minimal code sketch follows this list):
Walk through the file line by line
Increment a variable each time you encounter # Begin Variable
Decrement this variable each time you encounter # End Variable
If the variable ever becomes 2, the previous block was unended. Record this somehow and decrement.
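A minimal awk sketch of those steps, assuming the same # Begin Variable / # End Variable markers as in your sample; the messages and the your_file name are just placeholders. It also flags an End with no Begin, in the spirit of the snippets above.
awk '
    /# Begin Variable/ {
        open++
        if (open == 2) {                   # a second Begin before any End
            print "Unended block before line " NR
            open--                         # treat the unended block as closed
        }
    }
    /# End Variable/ {
        open--
        if (open < 0) {                    # an End with no matching Begin
            print "End with no Begin at line " NR
            open = 0
        }
    }
    END { if (open) print "Unended block at end of file" }
' your_file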
Related
This question is somewhat similar to that one, but my task is to place something, in my case a dash, between repeating characters, for example question marks, using the gsub function.
Example:
"?" = "?"
"??" = "?-?"
"??? = "?-?-?"
Try this:
function test(s)
local t=s:gsub("%?%?","?-?"):gsub("%?%?","?-?")
print(#s,s,t)
end
for n=0,10 do
test(string.rep("?",n))
end
A possible solution using LPeg:
local lpeg = require 'lpeg'
local head = lpeg.C(lpeg.P'?')
local tail = (lpeg.P'?' / function() return '-?' end) ^ 0
local str = lpeg.Cs((head * tail + lpeg.P(1)) ^ 1)
for n=0,10 do
print(str:match(string.rep("?",n)))
end
print(str:match("?????foobar???foo?bar???"))
This is what I came up with, scanning the string letter by letter:
function test(str)
    local output = ""
    local tab = {}
    for let in string.gmatch(str, ".") do
        table.insert(tab, let)
    end
    local i = 1
    while i <= #tab do
        if tab[i - 1] == tab[i] then
            output = output.."-"..tab[i]
        else
            output = output..tab[i]
        end
        i = i + 1
    end
    return output
end
for n=0,10 do
    print(test(string.rep("?",n)))
end
By default, p variable-name displays $num = variable-value, where $num is the value history number. Is there a way to print the variable name along with the value, like $num = variable-name = variable-value?
I want this since I use
define p
    set $i = 0
    while $i < $argc
        eval "print $arg%d", $i
        set $i = $i + 1
    end
end
in my ~/.gdbinit to redefine the p command, so I can use p var1 var2 var3 ... to print multiple variables at once. However, the print command only outputs $num = variable-value, so I don't know which variable each line of output belongs to. The other situation is when I print the value history using just p $num: it is not very readable because I don't know the exact variable name.
NOTE: the variable may be int/char/pointer/array/vector/...
A solution could be to first add the wanted variables to the display list and then display all of them together. Note that the display list needs to be cleared first with undisplay, otherwise it also prints the variables from previous executions.
define p
    set confirm off
    eval "undisplay"
    set confirm on
    set $i = 0
    while $i < $argc
        eval "display $arg%d", $i
        set $i = $i + 1
    end
    display
end
The undisplay evaluation is enclosed between set confirm off/on to suppress the following message:
[answered Y; input not from terminal]
If you have already set the confirm off option in your ~/.gdbinit file, you will need to remove these two lines.
Edit: Honestly, I only came to know about the display command while finding a solution to this question. Although this answer might be useful for printing multiple variables with their respective names, after several days of using display in my workflow I discourage using this answer, since I have come to the conclusion that display itself better fits at least my needs (printing multiple variables at every stop). Here is the official doc:
If you find that you want to print the value of an expression frequently (to see how it changes), you might want to add it to the automatic display list so that GDB prints its value each time your program stops. Each expression added to the list is given a number to identify it; to remove an expression from the list, you specify that number. The automatic display looks like this:
2: foo = 38
3: bar[5] = (struct hack *) 0x3804
Basically, I have started using the command like this: I add a variable to the list with display $var, and every time a breakpoint is reached the listed variables are printed automatically. It makes sense to have a feature like this in gdb. Thanks to @CodyChan for the motivation.
In a nutshell, we want to output
$num = variable-name = variable-value
instead of
$num = variable-value
As far as I can tell, gdb adds to the value history in only three places: the print command, the call command, and the history-append! Scheme function. Since my Scheme is rusty, we'll need to use the CLI or Python to run print and modify its output.
Using the CLI
define pp
    set $i = 0
    while $i < $argc
        eval "pipe print $arg%d | awk -v name='$arg%d' '{ if (NR == 1 && $2 == \"=\") { $2 = \"= \" name \" =\" }; print }'", $i, $i
        set $i++
    end
end
The pipe command is new in gdb 10.
That awk command is, after unescaping,
awk -v name='$arg%d' '{ if (NR == 1 && $2 == "=") { $2 = "= " name " =" }; print }'
which changes the = (second field) in $num = variable-value to = variable-name = . If gdb's print command outputs more than one line, the NR == 1 in the awk command makes sure the substitution is only done on the first line.
Security note: gdb's pipe command appears to parse the shell_command into tokens and uses execve to run it, rather than passing it to an actual shell. This prevents some code injection attacks (if, for instance, the $arg%d in name='$arg%d' contains single quotes), but you should be careful of running any shell command comprised of text you haven't vetted.
Using Python
class PP(gdb.Command):
    """print value history index, name, and value of each arg"""
    def __init__(self):
        super(PP, self).__init__("pp", gdb.COMMAND_DATA, gdb.COMPLETE_EXPRESSION)
    def invoke(self, argstr, from_tty):
        for arg in gdb.string_to_argv(argstr):
            line = gdb.execute("print " + arg, from_tty=False, to_string=True)
            line = line.replace("=", "= " + arg + " =", 1)
            gdb.write(line)
PP()
Here, we're using a more sed-like approach, using string.replace.
Sample session:
(gdb) set args a b c
(gdb) start
Starting program: /home/mp/argprint a b c
Temporary breakpoint 2, main (argc=4, argv=0x7ffffffee278) at argprint.c:4
4 for(int i=0; i < argc; i++) {
(gdb) pp i argc argv argv[0]@argc
$1 = i = 0
$2 = argc = 4
$3 = argv = (char **) 0x7ffffffee278
$4 = argv[0]@argc = {0x7ffffffee49f "/home/mp/argprint", 0x7ffffffee4b1 "a", 0x7ffffffee4b3 "b", 0x7ffffffee4b5 "c"}
I'm working with some bioinformatics data, and I've got this sed expression:
sed -n 'N;/.*:\(.*\)\n.*\1/{p;n;p;n;p};D' file.txt
It currently takes a file that is structured such as:
#E00378:1485 1:N:0:ABC
ABCDEF ##should match, all characters present
+
#
#E00378:1485 1:N:1:ABC
XYZABX ##should match, with permutation
+
#
#E00378:1485 1:N:1:ABCDE
ZABCDXFGH ##should match, with permutation
+
#
#E00378:1485 1:N:1:CBA
ABC ##should not match, order not preserved
+
#
Then it returns 4 lines if the sequence after : is found in the second line, so in this case I would get:
#E00378:1485 1:N:0:ABC
ABCDEF
+
#
However, I am looking to expand my search a little, by adding the possibility of searching for any single permutation of the letters, while maintaining the order, such that ABX, ZBC, AHC, ABO would all match the search criteria ABC.
Is a search like this possible to construct as a one-liner? Or should I write a script?
I was thinking it should be possible to programmatically change one of the letters to a * in the pattern space.
I am trying to make something along the lines of an AWK pattern that has a match defined as:
p = "";
p = p "."a[2]a[3]a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]"."a[3]a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]"."a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]"."a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]"."a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]"."a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]a[6]"."a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]a[6]a[7]".";
m = p;
But I can't seem to figure out how to build the pattern programmatically for n characters.
Okay, check this out where fuzzy is your input above:
£ perl -0043 -MText::Fuzzy -ne 'if (/.*:(.*?)\n(.*?)\n/) {my ($offset, $edits, $distance) = Text::Fuzzy::fuzzy_index ($1, $2); print "$offset $edits $distance\n";}' fuzzy
3 kkk 0
5 kkd 1
5 kkkkd 1
Since you haven't been 100% clear on your "fuzziness" criteria (and can't be until you have a measurement tool), I'll explain this first. Reference here:
http://search.cpan.org/~bkb/Text-Fuzzy-0.27/lib/Text/Fuzzy.pod
Basically, for each record (which I've assumed are split on #, which is what the -0043 bit does), the output is an offset, how the 1st string can become the 2nd string, and lastly the "distance" (Levenshtein, I would assume) between the two strings.
So..
£ perl -0043 -MText::Fuzzy -ne 'if (/.*:(.*?)\n(.*?)\n/) {my ($offset, $edits, $distance) = Text::Fuzzy::fuzzy_index ($1, $2); print "$_\n" if $distance < 2;}' fuzzy
#E00378:1485 1:N:0:ABC
ABCDEF
+
#
#E00378:1485 1:N:1:ABC
XYZABX
+
#
#E00378:1485 1:N:1:ABCDE
ZABCDXFGH
+
#
See here for installing perl modules like Text::Fuzzy
https://www.thegeekstuff.com/2008/09/how-to-install-perl-modules-manually-and-using-cpan-command/
Example input/output for a record that wouldn't be printed (distance is 3):
#E00378:1485 1:N:1:ABCDE
ZDEFDXFGH
+
#
gives us this (or simply doesn't print with the second perl command)
3 dddkk 3
Awk doesn't have sed's back-references, but it has enough expressiveness to make up the difference. The following script composes the matching pattern from the final field of the lead line, then applies the pattern to the subsequent line.
#! /usr/bin/awk -f
BEGIN {
    FS = ":"
}
# Lead Line has 5 fields
NF == 5 {
    line0 = $0
    seq = $NF
    getline
    if (seq != "") {
        n = length(seq)
        if (n == 1) {
            pat = seq
        } else {
            # ABC -> /.BC|A.C|AB./
            pat = "." substr(seq, 2, n - 1)
            for (i = 2; i < n; ++i)
                pat = pat "|" substr(seq, 1, i - 1) "." substr(seq, i + 1, n - i)
            pat = pat "|" substr(seq, 1, n - 1) "."
        }
        if ($0 ~ pat) {
            print line0
            print
            getline; print
            getline; print
            next
        }
    }
    getline
    getline
}
If the above needs some work to form a different matching pattern, the modification is mostly limited to the lines that compose the pattern. By the way, I noticed that sequences repeat -- to make this faster we can implement caching:
#! /usr/bin/awk -f
BEGIN {
    FS = ":"
    # Noticed that sequences repeat
    # -- implement caching of patterns
    split("", cache)
}
# Lead Line has 5 fields
NF == 5 {
    line0 = $0
    seq = $NF
    getline
    if (seq != "") {
        if (seq in cache) {
            pat = cache[seq]
        } else {
            n = length(seq)
            if (n == 1) {
                pat = seq
            } else {
                # ABC -> /.BC|A.C|AB./
                pat = "." substr(seq, 2, n - 1)
                for (i = 2; i < n; ++i)
                    pat = pat "|" substr(seq, 1, i - 1) "." substr(seq, i + 1, n - i)
                pat = pat "|" substr(seq, 1, n - 1) "."
            }
            cache[seq] = pat
        }
        if ($0 ~ pat) {
            print line0
            print
            getline; print
            getline; print
            next
        }
    }
    getline
    getline
}
I have a grade book file that looks like
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:
0123:Smith:Jon:100:80:80:100:90:
0987:Williams:Pat:20:30:35:46:50:
0654:Bar:Foo:100:100:100:100:100:
I need to add up all the homeworks/quizzes/exams/projects for each student and append the totals to the end of the corresponding line.
An example output file could be
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:hT:qT:eT:pT
0123:Smith:Jon:100:80:80:100:90:100:170:80:100:
0987:Williams:Pat:20:30:35:46:50:20:80:35:46:
0654:Bar:Foo:100:100:100:100:100:100:200:100:100:
The output file doesn't have to be the same file, but keep in mind that the order of the grade columns in the header line (1st line) could be anything, so the assignments may appear in any order.
I'm assuming I must use grep to search the file for all fields containing "hw"/"quiz"/"exam"/"proj" and get the corresponding field. Then go through each line and add the totals for hw/quiz/exam/proj individually.
Maybe it's easier with awk?
$ cat tst.awk
BEGIN { FS=OFS=":" }
NR==1 {
    for (i=4;i<NF;i++) {
        name = substr($i,1,1) "T"
        nr2name[i] = name
        if (!seen[name]++) {
            names[++numNames] = name
        }
    }
    printf "%s", $0
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        printf "%s%s", names[nameNr], OFS
    }
    print ""
    next
}
{
    delete tot
    for (i=4;i<NF;i++) {
        name = nr2name[i]
        tot[name] += $i
    }
    printf "%s", $0
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        printf "%s%s", tot[names[nameNr]], OFS
    }
    print ""
}
$ awk -f tst.awk file
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:hT:qT:eT:pT:
0123:Smith:Jon:100:80:80:100:90:100:170:80:100:
0987:Williams:Pat:20:30:35:46:50:20:80:35:46:
0654:Bar:Foo:100:100:100:100:100:100:200:100:100:
This seems to do the job; it is intricate, though:
script.awk
BEGIN { FS = ":"; OFS = FS }
NR == 1 {
for (i = 4; i < NF; i++)
{
c = substr($i, 1, 1)
if (!(c in columns)) order[n++] = c
columns[c]++
letter[i] = c
}
nf = NF
for (i = 0; i < n; i++)
$(i+nf) = order[i] "T"
print $0 OFS
next
}
{
for (c in columns) total[c] = 0
for (i = 4; i < NF; i++)
{
c = letter[i]
total[c] += $i
}
nf = NF
for (i = 0; i < n; i++)
{
c = order[i]
$(i+nf) = total[c]
}
print $0 OFS
}
Explanation:
BEGIN:
Set the input and output field separators.
NR == 1:
Loop over the fields after the student ID and name fields.
Extract the first letter.
If the letter has not been seen before, note it in the order and increment the number of totals (n).
Increment the number of times the letter has been seen.
Record which letter goes with the current column.
Add the new columns after the existing columns in sequence.
Print the line plus a trailing output field separator (aka OFS or :).
Note that $NF is empty because of the trailing : in the data, hence (unusually for awk scripts) i < NF rather than i <= NF; a quick demonstration follows this list.
Each other line:
Reset the totals for each letter.
For each of the scoring fields, find the letter that the column belongs to (letter[i]).
Add the column to the total for the letter.
For each of the extra fields in order, set the value of the appropriate extra field to the total for that letter.
Print the record plus an extra colon (aka OFS).
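A quick way to see why $NF is empty when the data ends with the separator; this is a throwaway one-liner for illustration, not part of the script:
$ echo 'a:b:' | awk -F: '{ print NF, "[" $NF "]" }'
3 []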
data
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:
0123:Smith:Jon:100:80:80:100:90:
0987:Williams:Pat:20:30:35:46:50:
0654:Bar:Foo:100:100:100:100:100:
Sample output
$ awk -f script.awk data
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:hT:qT:eT:pT:
0123:Smith:Jon:100:80:80:100:90:100:170:80:100:
0987:Williams:Pat:20:30:35:46:50:20:80:35:46:
0654:Bar:Foo:100:100:100:100:100:100:200:100:100:
$
The only difference between this and the sample output in the question is a trailing colon on the title line, for consistency with the data lines (and input).
With a few adaptations:
- The order of the totals is not the same (it is determined dynamically).
- Each total's name uses the full column name without the last 2 digits.
- A parameter defines the first field with data to count (the 4th here, via -v 'St=4').
awk -v 'St=4' '
    BEGIN{FS=OFS=":"}
    NR==1 {
        printf "%s",$0
        for(i=St;i<=(nf=NF-1);i++){
            tn=$i;sub(/..$/,"T",tn)
            T[tn]=0;TN[i]=tn
        }
        Sep=""
        for(t in T){
            printf "%s%s",Sep,t;Sep=OFS
        }
        print Sep
        next
    }
    {
        for(i=St;i<=nf;i++){
            T[TN[i]]+=$i
        }
        for(i=1;i<=nf;i++)printf "%s%s",$i,OFS
        Sep=""
        for(t in T){
            printf "%s%s",Sep,T[t]
            T[t]=0;Sep=OFS
        }
        print Sep
    }' YourFile
StudentID:LastName:FirstName:hw01:quiz01:exam01:proj01:quiz02:examT:quizT:hwT:projT:
0123:Smith:Jon:100:80:80:100:90:80:170:100:100:
0987:Williams:Pat:20:30:35:46:50:35:80:20:46:
0654:Bar:Foo:100:100:100:100:100:100:200:100:100:
I have a given file:
application_1.pp
application_2.pp
    #application_2_version => '1.0.0.1-r1',
    application_2_version => '1.0.0.2-r3',
application_3.pp
    #application_3_version => '2.0.0.1-r4',
    application_3_version => '2.0.0.2-r7',
application_4.pp
application_5.pp
    #application_5_version => '3.0.0.1-r8',
    application_5_version => '3.0.0.2-r9',
I would like to be able to read this file and search for the string
".pp"
When that string is found, that line is stored in a variable.
It then reads the next line of the file. If it encounters a line starting with a #, it ignores it and moves on to the next line.
If it comes across a line that does not contain ".pp" and doesn't start with #, it should print that line next to the last stored variable in a new file.
The output would look like this:
application_1.pp
application_2.pp application_2_version => '1.0.0.2-r3',
application_3.pp application_3_version => '2.0.0.2-r7',
application_4.pp
application_5.pp application_5_version => '3.0.0.2-r9',
I would like to achieve this with awk. If somebody knows how to do this and it is a simple solution, I would be happy if they could share it with me. If it is more complex, it would be helpful to know what in awk I need to understand in order to do this (arrays, variables, etc.). Can it even be achieved with awk, or is another tool necessary?
Thanks,
I'd say
awk '/\.pp/ { if(NR != 1) print line; line = $0; next } NF != 0 && substr($1, 1, 1) != "#" { line = line $0 } END { print line }' filename
This works as follows:
/\.pp/ {                                 # if a line contains ".pp"
    if(NR != 1) {                        # unless we just started
        print line                       # print the last assembled line
    }
    line = $0                            # and remember this new one
    next                                 # and we're done here.
}
NF != 0 && substr($1, 1, 1) != "#" {     # otherwise, unless the line is empty
                                         # or a comment
    line = line $0                       # append it to the line we're building
}
END {                                    # in the end,
    print line                           # print the last line.
}
You can use sed:
#n
/\.pp/{
    h
    :loop
    n
    /[^#]application.*version/{
        H
        g
        s/\n[[:space:]]*/\t/
        p
        b
    }
    /\.pp/{
        x
        p
    }
    b loop
}
If you save this as s.sed and run
sed -f s.sed file
You will get this output
application_1.pp
application_2.pp application_2_version => '1.0.0.2-r3',
application_3.pp application_3_version => '2.0.0.2-r7',
application_4.pp
application_5.pp application_5_version => '3.0.0.2-r9',
Explanation
The #n suppresses normal output.
Once we match the /\.pp/, we store that line into the hold space with h, and start the loop.
We go to the next line with n
If it matches /[^#]application.*version/, meaning it doesn't start with a #, then we append the line to the hold space with H, then copy the hold space to the pattern space with g, and substitute the newline and any subsequent whitespace for a tab. Finally we print with p, and skip to the end of the script with b
If it matches /\.pp/, then we swap the pattern and hold spaces with x, and print with p.