Microsoft Word Performing computations - if-statement

I'm attempting to do a conversion from military time to regular time, however word is not evaluating anything. Honestly this is really counter intuitive.
Anyways The first thing I tried was:
{SET H { =INT({MERGEFIELD ARRIVAL_HOUR})}
{IF "{MERGEFIELD ARRIVAL_HOUR}" > 12 "{REF H}-12" "{MERGEFIELD ARRIVAL_HOUR}"}
and word gives me Invalid OP code : Ok, so then I tried:
{IF "{MERGEFIELD ARRIVAL_HOUR \* MERGEFORMAT }"
>12 "{={MERGEFIELD ARRIVAL_HOUR \*MERGEFORMAT } - 12}PM" "" }
and I get "15-12PM" displayed on the page. So why isn't it doing the computation?

Related

awk, skip current rule upon sanity check

How to skip current awk rule when its sanity check failed?
{
if (not_applicable) skip;
if (not_sanity_check2) skip;
if (not_sanity_check3) skip;
# the rest of the actions
}
IMHO, it's much cleaner to write code this way than,
{
if (!not_applicable) {
if (!not_sanity_check2) {
if (!not_sanity_check3) {
# the rest of the actions
}
}
}
}
1;
I need to skip the current rule because I have a catch all rule at the end.
UPDATE, the case I'm trying to solve.
There is multiple match point in a file that I want to match & alter, however, there's no other obvious sign for me to match what I want.
hmmm..., let me simplify it this way, I want to match & alter the first match and skip the rest of the matches and print them as-is.
As far as I understood your requirement, you are looking for if, else if here. Also you could use switch case available in newer version of gawk packages too.
Let's take an example of a Input_file here:
cat Input_file
9
29
Following is the awk code here:
awk -v var="10" '{if($0<var){print "Line " FNR " is less than var"} else if($0>var){print "Line " FNR " is greater than var"}}' Input_file
This will print as follows:
Line 1 is less than var
Line 2 isgreater than var
So if you see code carefully its checking:
First condition if current line is less than var then it will be executed in if block.
Second condition in else if block, if current line is greater than var then print it there.
I'm really not sure what you're trying to do but if I focus on just that last sentence in your question of I want to match & alter the first match and skip the rest of the matches and print them as-is. ... is this what you're trying to do?
{ s=1 }
s && /abc/ { $0="uvw"; s=0 }
s && /def/ { $0="xyz"; s=0 }
{ print }
e.g. to borrow #Ravinder's example:
$ cat Input_file
9
29
$ awk -v var='10' '
{ s=1 }
s && ($0<var) { $0="Line " FNR " is less than var"; s=0 }
s && ($0>var) { $0="Line " FNR " is greater than var"; s=0 }
{ print }
' Input_file
Line 1 is less than var
Line 2 is greater than var
I used the boolean flag variable name s for sane as you also mentioned something in your question about the conditions tested being sanity checks so each condition can be read as is the input sane so far and this next condition is true?.

Split string by . but ignore whats inside []

I have a string:
s='articles[zone.id=1].comments[user.status=active].user'
Looking to split (via split(some_regex_here)). The split needs to occur on every period other than those inside the bracketed substring.
Expected output:
["articles[zone.id=1]", "comments[user.status=active]", "user"]
How would I go about this? Or is there something else besides split(), I should be looking at?
Try this,
s.split(/\.(?![^\[]*\])/)
I got this result,
2.3.2 :061 > s.split(/\.(?![^\[]*\])/)
=> ["articles[zone.id=1]", "comments[user.status=active]", "user"]
You can also test it here:
https://rubular.com/r/LaxEFQZJ0ygA3j
I assume the problem is to split on periods that are not within matching brackets.
Here is a non-regex solution that works with any number of nested brackets. I've assumed the brackets are all matched, but it would not be difficult to check that.
def split_it(s)
left_brackets = 0
s.each_char.with_object(['']) do |c,a|
if c == '.' && left_brackets.zero?
a << '' unless a.last.empty?
else
case c
when ']' then left_brackets -= 1
when '[' then left_brackets += 1
end
a.last << c
end
end.tap { |a| a.pop if a.last.empty? }
end
split_it '.articles[zone.id=[user.loc=1]].comments[user.status=active].user'
#=> ["articles[zone.id=[user.loc=1]]", "comments[user.status=active]", "user"]

Need to know logic behind few regular expression

I am having a code as
$wrk = OC192-1-1-1;
#temp = split (/-/, $wrk);
if ($temp1[3] =~ /101 || 102 /)
{
print "yes";
} else {
print "no";
}
Output :
yes
Need to know why this is printing yes. I know for regular expression | is supported for OR operator. But need to know why || is giving as "yes" as output
It is because || will make regex match succeed by matching with nothing all the time.
So it is essentially matching $temp1[3] (which doesn't exist) with anyone of the following
"101 "
""
" 102 "
I added double quotes just for explanation.
/101 || 102 / regex tries to match '101 ', or '' (empty string), or ' 102 '.
Since empty string can always be matched, it always returns true in your condition.
In addition to the regex-relevant answer from #anubhava, note that: OC192-1-1-1 is same as 0-1-1-1, which is just "-3", therefore #temp evaluates to ( "", "3" )
And of course there's no such thing as $temp1

Find words, that are substrings of other words efficiently

I have an Ispell list of english words (nearly 50 000 words), my homework in Perl is to get quickly (like under one minute) list of all strings, that are substrings of some other word. I have tried solution with two foreach cycles comparing all words, but even with some optimalizations, its still too slow. I think, that right solution could be some clever use of regular expressions on array of words. Do you know how to solve this problem quicky (in Perl)?
I have found fast solution, which can find some all these substrings in about 15 seconds on my computer, using just one thread. Basically, for each word, I have created array of every possible substrings (eliminating substrings which differs only in "s" or "'s" endings):
#take word and return list of all valid substrings
sub split_to_all_valid_subwords {
my $word = $_[0];
my #split_list;
my ($i, $j);
for ($i = 0; $i < length($word); ++$i){
for ($j = 1; $j <= length($word) - $i; ++$j){
unless
(
($j == length($word)) or
($word =~ m/s$/ and $i == 0 and $j == length($word) - 1) or
($word =~ m/\'s$/ and $i == 0 and $j == length($word) - 2)
)
{
push(#split_list, substr($word, $i, $j));
}
}
}
return #split_list;
}
Then I just create list of all candidates for substrings and make intersection with words:
my #substring_candidates;
foreach my $word (#words) {
push( #substring_candidates, split_to_all_valid_subwords($word));
}
#make intersection between substring candidates and words
my %substring_candidates=map{$_ =>1} #substring_candidates;
my %words=map{$_=>1} #words;
my #substrings = grep( $substring_candidates{$_}, #words );
Now in substrings I have array of all words, that are substrings of some other words.
Perl regular expressions will optimize patterns like foo|bar|baz into an Aho-Corasick match - up to a certain limit of total compiled regex length. Your 50000 words will probably exceed that length, but could be broken into smaller groups. (Indeed, you probably want to break them up by length and only check words of length N for containing words of length 1 through N-1.)
Alternatively, you could just implement Aho-Corasick in your perl code - that's kind of fun to do.
update
Ondra supplied a beautiful solution in his answer; I leave my post here as an example of overthinking a problem and failed optimisation techniques.
My worst case kicks in for a word that doesn't match any other word in the input. In that case, it goes quadratic. The OPT_PRESORT was a try to advert the worst case for most words. The OPT_CONSECUTIVE was a linear-complexity filter that reduced the total number of items in the main part of the algorithm, but it is just a constant factor when considering the complexity. However, it is still useful with Ondras algorithm and saves a few seconds, as building his split list is more expensive than comparing two consecutive words.
I updated the code below to select ondras algorithm as a possible optimisation. Paired with zero threads and the presort optimisation, it yields maximum performance.
I would like to share a solution I coded. Given an input file, it outputs all those words that are a substring of any other word in the same input file. Therefore, it computes the opposite of ysth's ideas, but I took the idea of optimisation #2 from his answer. There are the following three main optimisations that can be deactivated if required.
Multithreading
The questions "Is word A in list L? Is word B in L?" can be easily parallelised.
Pre-sorting all the words for their length
I create an array that points to the list of all words that are longer than a certain length, for every possible length. For long words, this can cut down the number of possible words dramatically, but it trades quite a lot of space, as one word of length n appears in all lists from length 1 to length n.
Testing consecutive words
In my /usr/share/dict/words, most consecutive lines look quite similar:
Abby
Abby's
for example. As every word that would match the first word also matches the second one, I immediately add the first word to the list of matching words, and only keep the second word for further testing. This saved about 30% of words in my test cases. Because I do that before optimisation No 2, this also saves a lot of space. Another trade-off is that the output will not be sorted.
The script itself is ~120 lines long; I explain each sub before showing it.
head
This is just a standard script header for multithreading. Oh, and you need perl 5.10 or better to run this. The configuration constants define the optimisation behaviour. Add the number of processors of your machine in that field. The OPT_MAX variable can take the number of words you want to process, however this is evaluated after the optimisations have taken place, so the easy words will already have been caught by the OPT_CONSECUTIVE optimisation. Adding anything there will make the script seemingly slower. $|++ makes sure that the status updates are shown immediately. I exit after the main is executed.
#!/usr/bin/perl
use strict; use warnings; use feature qw(say); use threads;
$|=1;
use constant PROCESSORS => 0; # (false, n) number of threads
use constant OPT_MAX => 0; # (false, n) number of words to check
use constant OPT_PRESORT => 0; # (true / false) sorts words by length
use constant OPT_CONSECUTIVE => 1; # (true / false) prefilter data while loading
use constant OPT_ONDRA => 1; # select the awesome Ondra algorithm
use constant BLABBER_AT => 10; # (false, n) print progress at n percent
die q(The optimisations Ondra and Presort are mutually exclusive.)
if OPT_PRESORT and OPT_ONDRA;
exit main();
main
Encapsulates the main logic, and does multi-threading. The output of n words will be matched will be considerably smaller than the number of input words, if the input was sorted. After I have selected all matched words, I print them to STDOUT. All status updates etc. are printed to STDERR, so that they don't interfere with the output.
sub main {
my #matching; # the matching words.
my #words = load_words(\#matching); # the words to be searched
say STDERR 0+#words . " words to be matched";
my $prepared_words = prepare_words(#words);
# do the matching, possibly multithreading
if (PROCESSORS) {
my #threads =
map {threads->new(
\&test_range,
$prepared_words,
#words[$$_[0] .. $$_[1]] )
} divide(PROCESSORS, OPT_MAX || 0+#words);
push #matching, $_->join for #threads;
} else {
push #matching, test_range(
$prepared_words,
#words[0 .. (OPT_MAX || 0+#words)-1]);
}
say STDERR 0+#matching . " words matched";
say for #matching; # print out the matching words.
0;
}
load_words
This reads all the words from the input files which were supplied as command line arguments. Here the OPT_CONSECUTIVE optimisation takes place. The $last word is either put into the list of matching words, or into the list of words to be matched later. The -1 != index($a, $b) decides if the word $b is a substring of word $a.
sub load_words {
my $matching = shift;
my #words;
if (OPT_CONSECUTIVE) {
my $last;
while (<>) {
chomp;
if (defined $last) {
push #{-1 != index($_, $last) ? $matching : \#words}, $last;
}
$last = $_;
}
push #words, $last // ();
} else {
#words = map {chomp; $_} <>;
}
#words;
}
prepare_words
This "blows up" the input words, sorting them after their length into each slot, that has the words of larger or equal length. Therefore, slot 1 will contain all words. If this optimisation is deselected, it is a no-op and passes the input list right through.
sub prepare_words {
if (OPT_ONDRA) {
my $ondra_split = sub { # evil: using $_ as implicit argument
my #split_list;
for my $i (0 .. length $_) {
for my $j (1 .. length($_) - ($i || 1)) {
push #split_list, substr $_, $i, $j;
}
}
#split_list;
};
return +{map {$_ => 1} map &$ondra_split(), #_};
} elsif (OPT_PRESORT) {
my #prepared = ([]);
for my $w (#_) {
push #{$prepared[$_]}, $w for 1 .. length $w;
}
return \#prepared;
} else {
return [#_];
}
}
test
This tests if the word $w is a substring in any of the other words. $wbl points to the data structure that was created by the previous sub: Either a flat list of words, or the words sorted by length. The appropriate algorithm is then selected. Nearly all of the running time is spent in this loop. Using index is much faster than using a regex.
sub test {
my ($w, $wbl) = #_;
my $l = length $w;
if (OPT_PRESORT) {
for my $try (#{$$wbl[$l + 1]}) {
return 1 if -1 != index $try, $w;
}
} else {
for my $try (#$wbl) {
return 1 if $w ne $try and -1 != index $try, $w;
}
}
return 0;
}
divide
This just encapsulates an algorithm that guarantees a fair distribution of $items items into $parcels buckets. It outputs the bounds of a range of items.
sub divide {
my ($parcels, $items) = #_;
say STDERR "dividing $items items into $parcels parcels.";
my ($min_size, $rest) = (int($items / $parcels), $items % $parcels);
my #distributions =
map [
$_ * $min_size + ($_ < $rest ? $_ : $rest),
($_ + 1) * $min_size + ($_ < $rest ? $_ : $rest - 1)
], 0 .. $parcels - 1;
say STDERR "range division: #$_" for #distributions;
return #distributions;
}
test_range
This calls test for each word in the input list, and is the sub that is multithreaded. grep selects all those elements in the input list where the code (given as first argument) return true. It also regulary outputs a status message like thread 2 at 10% which makes waiting for completition much easier. This is a psychological optimisation ;-).
sub test_range {
my $wbl = shift;
if (BLABBER_AT) {
my $range = #_;
my $step = int($range / 100 * BLABBER_AT) || 1;
my $i = 0;
return
grep {
if (0 == ++$i % $step) {
printf STDERR "... thread %d at %2d%%\n",
threads->tid,
$i / $step * BLABBER_AT;
}
OPT_ONDRA ? $wbl->{$_} : test($_, $wbl)
} #_;
} else {
return grep {OPT_ONDRA ? $wbl->{$_} : test($_, $wbl)} #_;
}
}
invocation
Using bash, I invoked the script like
$ time (head -n 1000 /usr/share/dict/words | perl script.pl >/dev/null)
Where 1000 is the number of lines I wanted to input, dict/words was the word list I used, and /dev/null is the place I want to store the output list, in this case, throwing the output away. If the whole file should be read, it can be passed as an argument, like
$ perl script.pl input-file >output-file
time just tells us how long the script ran. Using 2 slow processors and 50000 words, it executed in just over two minutes in my case, which is actually quite good.
update: more like 6–7 seconds now, with the Ondra + Presort optimisation, and no threading.
further optimisations
update: overcome by better algorithm. This section is no longer completely valid.
The multithreading is awful. It allocates quite some memory and isn't exactly fast. This isn't suprising considering the amount of data. I considered using a Thread::Queue, but that thing is slow like $#*! and therefore is a complete no-go.
If the inner loop in test was coded in a lower-level language, some performance might be gained, as the index built-in wouldn't have to be called. If you can code C, take a look at the Inline::C module. If the whole script were coded in a lower language, array access would also be faster. A language like Java would also make the multithreading less painful (and less expensive).

Regex performance: validating alphanumeric characters

When trying to validate that a string is made up of alphabetic characters only, two possible regex solutions come to my mind.
The first one checks that every character in the string is alphanumeric:
/^[a-z]+$/
The second one tries to find a character somewhere in the string that is not alphanumeric:
/[^a-z]/
(Yes, I could use character classes here.)
Is there any significant performance difference for long strings?
(If anything, I'd guess the second variant is faster.)
Just by looking at it, I'd say the second method is faster.
However, I made a quick non-scientific test, and the results seem to be inconclusive:
Regex Match vs. Negation.
P.S. I removed the group capture from the first method. It's superfluous, and would only slow it down.
Wrote this quick Perl code:
#testStrings = qw(asdfasdf asdf as aa asdf as8up98;n;kjh8y puh89uasdf ;lkjoij44lj 'aks;nasf na ;aoij08u4 43[40tj340ij3 ;salkjaf; a;lkjaf0d8fua ;alsf;alkj
a a;lkf;alkfa as;ldnfa;ofn08h[ijo ok;ln n ;lasdfa9j34otj3;oijt 04j3ojr3;o4j ;oijr;o3n4f;o23n a;jfo;ie;o ;oaijfoia ;aosijf;oaij ;oijf;oiwj;
qoeij;qwj;ofqjf08jf0 ;jfqo;j;3oj4;oijt3ojtq;o4ijq;onnq;ou4f ;ojfoqn;aonfaoneo ;oef;oiaj;j a;oefij iiiii iiiiiiiii iiiiiiiiiii);
print "test 1: \n";
foreach my $i (1..1000000) {
foreach (#testStrings) {
if ($_ =~ /^([a-z])+$/) {
#print "match"
} else {
#print "not"
}
}
}
print `date` . "\n";
print "test 2: \n";
foreach my $j (1..1000000) {
foreach (#testStrings) {
if ($_ =~ /[^a-z]/) {
#print "match"
} else {
#print "not"
}
}
}
then ran it with:
date; <perl_file>; date
it isn't 100% scientific, but it gives us a good idea. The first Regex took 10 or 11 seconds to execute, the second Regex took 8 seconds.