decipher the regular expression - regex

Please help me decipher this regular expression:
'!_[$0]++'
It is being used to get an MSISDN (one at a time, from a file containing a list of MSISDNs starting with zero) with the following usage:
awk '!_[$0]++' file.txt

It's not a regular expression, it's an arithmetic and boolean expression.
$0 = The current input line
_[$0] = An associative array element whose key is the input line
_[$0]++ = increments that array element each time the line is encountered, but evaluates to the value it had before the increment
!_[$0]++ = boolean inverse, so it returns true if the value was originally 0 or the empty string, false otherwise
So this expression is true the first time a line is encountered, false every other time. Since there's no action block after the expression, the default is to print the line if the expression is true, skip it when false.
So this prints the input file with duplicates omitted.
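For readers more comfortable outside awk, the same logic can be sketched in Python. This is only an illustration of the idiom, not what awk does internally; the function name unique_lines and the sample numbers are made up for the example.

```python
from collections import defaultdict


def unique_lines(lines):
    """Yield each line the first time it is seen, mimicking awk '!_[$0]++'."""
    seen = defaultdict(int)  # missing keys read as 0, like an unset awk array element
    for line in lines:
        count_before = seen[line]  # the value that _[$0]++ evaluates to
        seen[line] += 1            # the side effect of the ++
        if not count_before:       # the leading ! : true only when the old value was 0
            yield line


if __name__ == "__main__":
    sample = ["0123", "0456", "0123", "0789", "0456"]  # hypothetical MSISDNs
    for line in unique_lines(sample):
        print(line)
```

The order of first occurrences is preserved, which is the same behaviour you get from the awk one-liner.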

When the expression is true, the line is printed.
'_[$0]++' - the associative array element is incremented every time $0 is seen, so it holds the number of times each line has occurred.
'!_[$0]++' - this is true only the first time a line is inserted into the associative array; every later time it evaluates to false, so the line is not printed.
So none of the duplicate lines are printed.

This is not a regular expression. This particular command prints unique lines the first time they are found.
_ is being used as an array here and $0 refers to the entire line. Given that the default numeric value for an array element is 0 (it's technically an empty string, but in numeric contexts it's treated as 0), the first time you see a line, you print the line (since _[$0] is falsy, !_[$0] will be true). The element is incremented every time the line is seen, as part of evaluating the pattern (awk's default action is to print), so the next time you see the line _[$0] will be 1 and the line will not be printed.

Related

what does if(index(i,$12)==1 indicate

Just came across an awk script:
awk 'BEGIN {OFS=FS} NR==FNR {a[$1]=($2" "$3);next} {for (i in a) if(index(i,$12)==1) print $0,a[$12]}'
in this script what does
if(index(i,$12)==1
mean? Does it indicate a true/false condition, or just numerical equality to 1?
Without sample input it is difficult to understand the full requirements, so this is based on reading your code.
BEGIN: this section executes before any input file is read.
OFS=FS: sets the output field separator to the input field separator. It has no visible effect here, since both default to a single space and no -F option is given.
NR==FNR: this condition is true only while the first Input_file is being read.
a[$1]: creates an element in array a whose index is $1 of the current line and whose value is the 2nd and 3rd columns joined by a space.
next: skips all further statements for the current line, so the rest of the script never runs for the 1st Input_file.
for(i in a): starts a for loop which traverses all indices of array a.
index(i,$12)==1: index(i,$12) returns the position at which the 12th column occurs inside the index i (a 1st column from the 1st Input_file), or 0 if it does not occur. Comparing with 1 checks that the match starts at the first character, i.e. that the index begins with $12. This is a prefix match, so it does not guarantee an exact match of the whole word.
If the above condition is TRUE, it prints the current line followed by the element of array a whose index is $12.
index() is a function. It gets the position of a string within another string. From man awk:
index(s, t) Return the index of the string t in the string s, or 0 if t is not present. (This implies that character indices start at
one.) It is a fatal error to use a regexp constant for t.
In your example you iterate over the keys of the array a and check whether the key starts with column 12 (index(i,$12) looks for $12 inside i, not the other way round).
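To make the argument order concrete, here is a rough Python equivalent of the check. The helper names awk_index and key_starts_with are invented for this sketch, and the behaviour for an empty search string is not modelled.

```python
def awk_index(s, t):
    """Return the 1-based position of t inside s, or 0 if t is absent.

    Mirrors the documented behaviour of awk's index(s, t) for non-empty t;
    an empty t is not handled specially here.
    """
    return s.find(t) + 1  # str.find is 0-based and returns -1 when absent


def key_starts_with(key, field):
    """Equivalent of the awk condition index(key, field) == 1."""
    return awk_index(key, field) == 1


if __name__ == "__main__":
    print(awk_index("abcdef", "cd"))         # position of "cd" inside "abcdef"
    print(key_starts_with("12345", "123"))   # key begins with the field
    print(key_starts_with("123", "12345"))   # field is longer than the key
```

Note that the first argument is the string being searched and the second is what you look for, so the test passes when the array key begins with the field, not when the field begins with the key.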

What does "$\=$/;" mean in perl?

I came across a perl program which counts the number of vowels in a string. But I'm not able to infer a single line how it is working. Anyone who can decode this program line by line?
$\=$/;map{
$_=<>;print 0+s/[aeiou]//gi
}1..<>
What does $\=$/; mean in perl?
Sets $\ to the value of $/.
$/ defines the line ending for readline (<>). Its default is a line feed (U+000A).
$\ is appended to the output of each print. Its default is the empty string.
So, assuming $/ hadn't been changed, it sets $\ to line feed, which makes print act like say.
Anyone who can decode this program line by line?
1. Globally make print act like say.
2. Read a line from ARGV.
3. For a number of times equal to the number read:
   a. Read a line from ARGV.
   b. Use s/[aeiou]//gi to count the number of vowels.
   c. Print the result.
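In scalar context, s///g returns the number of matches/replacements. 0+ forces scalar context; it also turns the empty string that s/// returns when nothing matched into 0, so a line without vowels prints 0.
By the way, tr/aeiouAEIOU// would be faster than 0+s/[aeiou]//gi, and it is no longer to type. It's also non-destructive.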
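If the Perl is hard to read, the same steps can be written out in Python. This is only a sketch of the logic described above; the names count_vowels and run are made up, and the input is passed in as a list of lines rather than read from ARGV.

```python
import re


def count_vowels(text):
    """Count vowels case-insensitively, like the match count of s/[aeiou]//gi."""
    return len(re.findall(r"[aeiou]", text, flags=re.IGNORECASE))


def run(lines):
    """First line holds N; return the vowel counts of the next N lines."""
    it = iter(lines)
    n = int(next(it))
    return [count_vowels(next(it)) for _ in range(n)]


if __name__ == "__main__":
    for result in run(["2", "Hello World", "xyz"]):
        print(result)  # print adds the newline, as $\ does in the Perl version
```

Unlike the s/// version, this does not modify the string, which is closer to the tr/// suggestion.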

Cutting a section out of a string and returning the opposite

Given a directory tree, how would I go about cutting the last field of the delimiter out of the string and returning the string without that delimiter, assuming I don't know where that string ends?
For instance, given
/1/2/3/4/5
I know I can return 5 with
cut -f 5 -d '/'
if I know the last field is the 5th one, or if a=/1/2/3/4/5
echo ${a##*/}
to pick the last field. But how would I go about returning the original string minus the last field? ie
/1/2/3/4
You can use the % or %% operator instead of ##. From the Bash man-page:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in pathname
expansion. If the pattern matches a trailing portion of the expanded
value of parameter, then the result of the expansion is the expanded
value of parameter with the shortest matching pattern (the ``%'' case)
or the longest matching pattern (the ``%%'' case) deleted. If
parameter is @ or *, the pattern removal operation is applied to each
positional parameter in turn, and the expansion is the resultant list.
If parameter is an array variable subscripted with @ or *, the pattern
removal operation is applied to each member of the array in turn, and
the expansion is the resultant list.
In short, ${a%/*} will do the trick.
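You can give cut a range of fields, if you know how many there are. Note that the leading / makes the first field empty, so /1/2/3/4 is fields 1 to 5:
cut -f 1-5 -d '/'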
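For comparison, the two parameter expansions can be mimicked in Python with str.rpartition. This is a sketch for illustration only; strip_last_field and last_field are hypothetical helper names, and only the simple case of a literal separator is covered (no glob patterns).

```python
def strip_last_field(path, sep="/"):
    """Rough equivalent of ${a%/*}: drop the last separator and what follows."""
    head, found, _tail = path.rpartition(sep)
    # If the separator never occurs, the shell pattern does not match and
    # the value is left unchanged, so do the same here.
    return head if found else path


def last_field(path, sep="/"):
    """Rough equivalent of ${a##*/}: keep only what follows the last separator."""
    return path.rpartition(sep)[2]


if __name__ == "__main__":
    a = "/1/2/3/4/5"
    print(strip_last_field(a))
    print(last_field(a))
```

Neither helper needs to know how many fields the string has, which is the point of using % instead of a fixed cut range.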

Python output of split function

I am trying to understand syntax of
attributeMap[tuple[0]] = tuple[1]
from
I have Python code for detecting input parameter - How to do similar in Powershell
It doesn't look correct because the brackets are uneven, but the program is interpreted without error. On the other hand if I change it to
attributeMap[tuple[0] = tuple[1]]
I get the error
File "lookup.py", line 15
attributeMap[tuple[0] = tuple[1]]
The brackets are not "uneven" at all:
attributeMap[tuple[0]] = tuple[1]
We have three expressions here:
tuple[0] # first element of tuple
tuple[1] # second element of tuple
attributeMap[tuple[0]] # value in attributeMap which has the key matching first element of tuple
As you can see, the third expression makes use of the first, and at the end all we do is assign the second to the third. The brackets are in the right places.
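A small runnable sketch may make the nesting easier to see. The original lookup.py is not shown, so everything here is an assumption: the function name parse_attributes, the "key=value" input format, and the use of split("=") to build the pair. The variable is called pair rather than tuple to avoid shadowing the built-in type.

```python
def parse_attributes(args):
    """Build a dict from "key=value" strings (a hypothetical input format)."""
    attribute_map = {}
    for arg in args:
        pair = arg.split("=", 1)          # e.g. "name=value" -> ["name", "value"]
        key = pair[0]                     # inner subscript: first element
        value = pair[1]                   # second element
        attribute_map[key] = value        # same as attribute_map[pair[0]] = pair[1]
    return attribute_map


if __name__ == "__main__":
    print(parse_attributes(["host=example", "port=8080"]))
```

Writing attribute_map[pair[0] = pair[1]] fails because an assignment statement cannot appear inside a subscript.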

regex and file read line in AutoHotkey

Well, I am currently writing a script that is meant to check the logs of another script I wrote, to see if it has had three or more unsuccessful pings in a row before a successful one. This is just bare-bones at the moment, but it should look something like this:
fileread,x,C:\Users\Michael\Desktop\ping.txt
result:=RegExMatch(%x% ,failure success)
msgbox,,, The file is = %x% `n the result is = %result%
Now, the file that it is trying to read is
success failure success
and for some reason, when it reads the file it says that the variable %x% 'contains illegal characters'.
When I copy and paste the contents of ping.txt into the script and save it as a variable, it works.
I have made sure that the file has Windows line endings (CR+LF).
I have assigned the variable generated by FileRead to another variable, thus stripping any leading or trailing whitespace characters.
The file is encoded in ANSI, and it still has the problem with UTF-8.
Function parameters take variable names without the % symbols; simply remove them.
I also want to point out that if the second parameter is meant to be a regular expression,
instead of a variable containing a regular expression, you will need quotes around it.
As it is, your script passes an empty string as the pattern, which will always return 1
(failure and success are each interpreted as a variable with an empty string associated with it).
To quote Lexikos:
"An empty string, when compiled as a regex pattern, will match exactly
zero characters at whatever position you attempt to match it. Think of
it this way: For any position n in any string, the next 0 characters
are always the same."
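The empty-pattern behaviour is not specific to AutoHotkey, and it is easy to check in Python. This is only an analogy: Python's re module is a different engine from the one AutoHotkey uses, and first_match_position is a made-up helper that imitates RegExMatch's 1-based return value.

```python
import re


def first_match_position(haystack, pattern):
    """Return the 1-based position of the first match, or 0 if none is found."""
    match = re.search(pattern, haystack)
    return match.start() + 1 if match else 0


if __name__ == "__main__":
    log = "success failure success"
    print(first_match_position(log, ""))                 # empty pattern
    print(first_match_position(log, "failure success"))  # quoted pattern
    print(first_match_position(log, "timeout"))          # no match
```

An empty pattern matches zero characters at the very first position, so the result is 1 regardless of what the file contains.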
Because you are simply truth testing, or finding the index,
I want to point out that AutoHotkey has a useful shorthand operator for this.
string := "this is a test"
f1::
result := RegExMatch(string, "\sis")
traytip,, %result%
Return
f2::
result := string ~= "\sis"
traytip,, % result
Return
These hotkeys both do the same thing; the second uses the shorthand operator ~=.
Notice how the traytip parameter in the second example has only one %.
When you start a command parameter with a %, that starts an expression,
and within an expression variables are not enclosed in %.
The ternary operator ?: is also very useful:
string := "this is a test"
f3::traytip,, % (result := string ~= "\sis") ? (result) : ("nothing")
It might look complicated but it's very simple.
Think of
% as if
? as then
: as else
If (true) then (a) else (b)
% (true) ? (a) : (b)
A variable will be evaluated as False if 0 (or nothing) is assigned to it.
But in this example "\sis" is matched and the index of the space is returned (5),
so it is evaluated as True.
You can read more about variables and operators here:
http://l.autohotkey.net/docs/Variables.htm