using awk in tcl script

I want to print a particular column of a file from within a Tcl script.
I tried exec awk '{print $4}' foo, where foo is the filename, but it fails with the error
can't read "4": no such variable
How can I run the above awk command from a Tcl script?

The problem is that single quotes have no special meaning in Tcl, they're just ordinary characters in a string. Thus the $4 is not hidden from Tcl and it tries to expand the variable.
The Tcl equivalent to shell single quotes are braces. This is what you need:
exec awk {{print $4}} foo
The double braces look funny, but the outer pair are for Tcl and the inner pair are for awk.
Btw, the Tcl translation of that awk program is:
set fid [open foo r]
while {[gets $fid line] != -1} {
    set fields [regexp -all -inline {\S+} $line]
    puts [lindex $fields 3]
}
close $fid

Related

Extracting part of path containing a number in bash

In bash, given a path such as:
mypath='my/path/to/version/5e/is/7/here'
I would like to extract the first part that contains a number. For the example I would want to extract: 5e
Is there a better way than looping over the parts using while and checking each part for a number?
while IFS= read -r -d / part
do
    if [[ $part =~ [0-9] ]]; then
        echo "$part"
    fi
done <<< "$mypath/"

Using Bash's regex:
[[ "$mypath" =~ [^/]*[0-9]+[^/]* ]] && echo "${BASH_REMATCH[0]}"
5e
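To reuse the one-liner above, it can be wrapped in a function. This is only a sketch: the name first_numeric_part is made up for illustration, and the regex is kept in a variable so that bash treats it as a pattern.

```shell
#!/usr/bin/env bash
# first_numeric_part is a hypothetical helper name, not part of the answer above.
first_numeric_part() {
    local path=$1
    # A run of non-slashes, one digit, another run of non-slashes.
    local re='[^/]*[0-9][^/]*'
    if [[ $path =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"
    else
        return 1    # no part contains a digit
    fi
}

first_numeric_part 'my/path/to/version/5e/is/7/here'    # prints 5e
```

The non-zero return status lets a caller tell "no match" apart from an empty part.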

Method using 'grep -o'.
echo "$mypath" | grep -o -E '\b[^/]*[0-9][^/]*\b' | head -1

Replace / by a newline
Filter the first match with a number
mypath='my/path/to/version/5e/is/7/here'
<<<"${mypath//\//$'\n'}" grep -m1 '[0-9]'
and a safer alternative that uses zero separated stream with GNU tools in case there are newlines in the path:
<<<"${mypath}" tr '/' '\0' | grep -z -m1 '[0-9]'
Is there a better way than looping over the parts using while and checking each part for a number?
No. One way or another you have to loop through the parts until the first one containing a number is found. The loop may be hidden behind other tools, but it is still a loop over the parts. Your solution is already pretty good; just break after the first match if you only want the first.
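A minimal sketch of that "loop and break" idea, assuming the path contains no newlines (the function name first_part_with_digit is hypothetical):

```shell
#!/usr/bin/env bash
# first_part_with_digit is a made-up name used only for this illustration.
first_part_with_digit() {
    local path=$1 part found=1
    local -a parts
    # Split on "/" into an array; IFS is changed only for this one read.
    IFS=/ read -r -a parts <<< "$path"
    for part in "${parts[@]}"; do
        if [[ $part =~ [0-9] ]]; then
            printf '%s\n' "$part"
            found=0
            break    # stop at the first part that contains a digit
        fi
    done
    return "$found"
}

first_part_with_digit 'my/path/to/version/5e/is/7/here'    # prints 5e
```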

Could you please try the following, written and tested with the shown samples. It prints every matching part when a line contains more than one. As for a better way, awk may be faster than a pure bash loop plus regex, IMHO, so I am adding it here. Note that it only matches parts in which a digit sits next to a letter, so a part such as 7 is skipped.
awk -F'/' '
{
    val=""
    for(i=1;i<=NF;i++){
        if($i~/[0-9][a-zA-Z]/ || $i~/[a-zA-Z][0-9]/){
            val=(val?val OFS:"")$i
        }
    }
    print val
}' Input_file
Explanation: Adding detailed explanation for above.
awk -F'/' '                 ##Starting awk program from here and setting field separator as / here.
{
    val=""                  ##Nullifying val here.
    for(i=1;i<=NF;i++){     ##Running for loop till value of NF.
        if($i~/[0-9][a-zA-Z]/ || $i~/[a-zA-Z][0-9]/){   ##Checking condition if field value is matching regex of digit alphabet then do following.
            val=(val?val OFS:"")$i                      ##Creating variable val where keep on adding current field value in it.
        }
    }
    print val               ##Printing val here.
}' Input_file               ##Mentioning Input_file name here.

Using Perl:
mypath='my/path/to/version/5e/is/7/here'
# Method 1 (using for loop):
echo "${mypath}" | perl -F'/' -lane 'for my $dir ( @F ) { next unless $dir =~ /\d/; print $dir; last; }'
# Method 2 (using grep):
echo "${mypath}" | perl -F'/' -lane 'my $dir = ( grep { /\d/ } @F )[0]; print $dir if defined $dir;'
# Prints:
# 5e
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'/' : Split into @F on /, rather than on whitespace.
next unless $dir =~ /\d/; : skip the rest of the loop if the current part of the path does not contain a digit (\d).
last; : exit the loop (here, it also exits the script), so that it prints only the first occurrence of the matching directory.
grep { ... } LIST : for the LIST argument, returns the list of elements for which the expression ... is true, here returns the list of all path elements that have a digit.
(LIST)[0] : returns the first element of the LIST, here, the first path element with a digit.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups

With awk, set RS to / and print the first record containing a number.
awk -v RS=/ '/[0-9]/{print;exit}' <<< "$mypath"
5e
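One detail worth hedging: a here-string appends a newline, so when the matching part is the last one, the record awk prints would, as far as I can tell, carry that extra newline. A sketch that feeds the path with printf instead (first_part_awk is a hypothetical name):

```shell
#!/usr/bin/env bash
# first_part_awk is a made-up wrapper name for this illustration.
first_part_awk() {
    # printf '%s' sends the path without the trailing newline a here-string would add.
    printf '%s' "$1" | awk -v RS=/ '/[0-9]/ { print; exit }'
}

first_part_awk 'my/path/to/version/5e/is/7/here'    # prints 5e
first_part_awk 'a/b/5e'                             # prints 5e on a single line
```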

Another bash variant. It assumes at least one part contains a digit; otherwise the loop never ends.
mypath='my/path/to/app version/5e/is/7/here'
until [[ ${mypath%%/*} =~ [0-9] ]]; do
    mypath=${mypath#*/}
done
echo "${mypath%%/*}"
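A loop-free take on the same parameter-expansion idea, as a sketch only (the function name first_part_pe and its variable names are assumptions). It returns a non-zero status instead of spinning when the path has no digit:

```shell
#!/usr/bin/env bash
# first_part_pe is a hypothetical name; the logic uses only parameter expansion.
first_part_pe() {
    local path=$1
    [[ $path == *[0-9]* ]] || return 1    # guard: nothing to find without a digit
    local before=${path%%[0-9]*}          # everything left of the first digit
    local rest=${path#"$before"}          # from the first digit to the end
    local head=${before##*/}              # start of the part that holds that digit
    printf '%s\n' "$head${rest%%/*}"
}

first_part_pe 'my/path/to/app version/5e/is/7/here'    # prints 5e
```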

RegEx - How to change two double quotes to one double quote?

I have a bunch of strings:
pipe 1/4"" square
3" bar
3/16"" spanner
nozzle 2""
1/2"" tube pipe with 6"" cut out
I want to replace each pair of consecutive double quotation marks in a string with a single one, using a regex. I've been trying some code with the aid of some references but cannot seem to get it right.
Ideally, I would like to store the result in a $var that I can use later in my script.
Q: What is the regex that will do this in Bash?

You can use sed:
sed 's/""/"/g' input_file > output_file
Or, process the input line by line and use parameter expansion:
while read -r line ; do
line=${line//\"\"/\"}
echo "$line"
done < input_file
/g in sed and // in the expansion serve the same purpose: they'll apply the substitution on all occurrences on a line.
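Since the question asks for the result in a variable, here is a small sketch of the parameter-expansion form doing that. The names collapse_quotes, raw and var are placeholders chosen for the example.

```shell
#!/usr/bin/env bash
# collapse_quotes is a hypothetical helper; it prints its argument with "" turned into ".
collapse_quotes() {
    local s=$1
    printf '%s\n' "${s//\"\"/\"}"
}

raw='1/2"" tube pipe with 6"" cut out'
var=$(collapse_quotes "$raw")    # capture the cleaned string for later use
printf '%s\n' "$var"             # prints 1/2" tube pipe with 6" cut out
```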

Using Bash parameter expansion:
echo "${var//\"\"/\"}"
sample output:
pipe 1/4" square

You can use gawk:
echo "$varName" | gawk '{ gsub(/""/,"\"") } 1'
or the sed command:
echo "$varName" | sed 's/""/"/g'
I assumed your variable is named varName.
If you instead need to do this for a file:
gawk '{ gsub(/""/,"\"") } 1' fileName
or
sed 's/""/"/g' fileName

SED replace expression "within" a regular expression

I have to change a CSV file column (the date) which is written in the following format:
YYYY-MM-DD
and I would like it to be
YYYY.MM.DD
I can write a succession of 2 sed rules piped one to the other like :
sed 's/-/./' file.csv | sed 's/-/./'
but this is not clean. My question is: is there a way to assign variables in sed and tell it that YYYY-MM-DD should be parsed as year=YYYY ; month=MM ; day=DD and then tell it to
write $year.$month.$day
or something similar? Maybe with awk?

You could use groups and access the year, month, and day directly via backreferences:
sed 's#\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)#\1.\2.\3#g' file.csv

Here's an alternative solution with awk:
awk 'BEGIN { FS=OFS="," } { gsub("-", ".", $1); print }' file.csv
BEGIN { FS=OFS="," } tells awk to break the input lines into fields by , (variable FS, the [input] Field Separator), as well as to also use , when outputting modified input lines (variable OFS, the Output Field Separator).
gsub("-", ".", $1) replaces all - instances with . in field 1
The assumption is that the data is in the 1st field, $1; if the field index is a different one, replace the 1 in $1 accordingly.
print simply outputs the modified input line, terminated with a newline.

What you are doing is equivalent to supplying the "global" replacement flag:
sed 's/-/./g' file.csv
sed has no variables, but it does have numbered groups:
sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\1.\2.\3/g' file.csv
or, if your sed has no -r:
sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g' file.csv

You may try this sed command also,
sed 's/\([0-9]\{4\}\)\-\([0-9]\{2\}\)\-\([0-9]\{2\}\)/\1.\2.\3/g' file
Example:
$ (echo '2056-05-15'; echo '2086-12-15'; echo 'foo-bar-go') | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g'
2056.05.15
2086.12.15
foo-bar-go
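If named variables are really wanted, as the question asks, bash itself can capture the groups. This is a sketch for a single value, not for a whole CSV file, and the function name reformat_date is made up:

```shell
#!/usr/bin/env bash
# reformat_date is a hypothetical helper: YYYY-MM-DD in, YYYY.MM.DD out.
reformat_date() {
    local d=$1
    local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
    if [[ $d =~ $re ]]; then
        # The capture groups play the role of the year/month/day variables.
        local year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}
        printf '%s.%s.%s\n' "$year" "$month" "$day"
    else
        printf '%s\n' "$d"    # not a date: pass it through unchanged
    fi
}

reformat_date '2056-05-15'    # prints 2056.05.15
reformat_date 'foo-bar-go'    # prints foo-bar-go
```

For a whole file, the sed commands above are likely the better fit, since calling a shell function per line is slow.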

How to match anything before and after a pattern in awk?

The following awk code searches for $find among the 2nd column of file.csv and outputs the data found in the 1st column of the first matching row:
awk -v pattern="$find" '$2 ~ pattern { print $1; exit }' file.csv
E.g., given file.csv:
1,panda
2,zebra
3,bobcat
4,lion
5,cat
If $find is set to "cat", it prints "5".
This appears to be only matching the entire contents of the cell, similar to ^cat$ in grep.
How can I adjust this such that it finds the first time the text appears somewhere within the cell, e.g., if $find is set to "cat", it prints "3", because "bobcat" contains the word "cat". In other words, rather than matching the entire cell in the CSV, if the match is found somewhere within the cell, it is sufficient.
Only the first match should be output.
I tried the following, but it does not work as expected:
awk -v pattern="*$find*" '$2 ~ pattern { print $1; exit }' file.csv
I could find no instructions at AWK Language Programming - Regular Expressions for matching anything before and after in awk.

It shouldn't. You are using a csv file and have not set the field separator to ,.
Here is the output you expect:
$ cat file.csv
1,panda
2,zebra
3,bobcat
4,lion
5,cat
$ find=cat
$ awk -F, -v pattern="$find" '$2 ~ pattern { print $1; exit }' file.csv
3
For exact match, use == instead of ~.
$ awk -F, -v pattern="$find" '$2==pattern { print $1; exit }' file.csv
5

In addition to what JS explained, you can perform this search without a regex by using the index function, which helps when the search string may contain special regex characters:
find='cat'
awk -F, -v pattern="$find" 'index($2, pattern) { print $1; exit }' file.csv
3
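To see where the two approaches differ, here is a small sketch with a search string that contains a regex metacharacter. The function names and the two-row sample data are invented for the example; both functions read the CSV from standard input.

```shell
#!/usr/bin/env bash
# find_regex and find_literal are hypothetical wrappers around the two awk commands.
find_regex()   { awk -F, -v pattern="$1" '$2 ~ pattern { print $1; exit }'; }
find_literal() { awk -F, -v needle="$1" 'index($2, needle) { print $1; exit }'; }

# As a regex, "a.b" also matches "axb" because "." stands for any character.
printf '1,axb\n2,a.b\n' | find_regex 'a.b'      # prints 1
# As a literal substring, only the row that really contains "a.b" matches.
printf '1,axb\n2,a.b\n' | find_literal 'a.b'    # prints 2
```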

why doesn't this Perl capture work

I expected this to capture and print just the group defined in parens, but instead it prints the whole line. How can I capture and print just the group in parens?
echo "abcdef" | perl -ne "print $1 if /(cd)/ "
What I want this to print: cd
What it actually prints: abcdef
How to fix?

In the perl command, you have to use single quotes or escape the variable:
echo "abcdef" | perl -ne "print \$1 if /(cd)/"
or
echo "abcdef" | perl -ne 'print $1 if /(cd)/'
Inside double quotes, the shell expands $1 before perl ever sees it.

The instant fix to your question is to change your double quotes to single quotes, like this:
$ echo abcdef | perl -ne 'print $1 if /(cd)/'
cd
With double quotes, the shell environment interprets your unprotected variable $1, which in your environment apparently evaluates to an empty string. So perl only receives the command print if /(cd)/ which is an implied command print $_ if /(cd)/ which prints the entire line.
You can also use a protected variable like this:
$ echo abcdef | perl -ne "print \$1 if /(cd)/"
cd
Note that matches using delimiters other than / must begin with the m keyword rather than using the shorthand form. It does not matter in your case, but it is worth knowing when working with matches: e.g., m|/| matches a / character using the pipe as the delimiter for the regular expression.
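To check what the shell actually hands to perl, the program text can be printed instead of executed. A sketch, assuming it runs in a shell where nounset (set -u) is off; show_arg and demo are made-up names:

```shell
#!/usr/bin/env bash
# show_arg prints its first argument in brackets so that empty expansions are visible.
show_arg() { printf '[%s]\n' "$1"; }

demo() {
    # demo is called without arguments, so $1 is unset here,
    # much as it usually is in an interactive shell.
    show_arg "print $1 if /(cd)/"     # prints [print  if /(cd)/]
    show_arg "print \$1 if /(cd)/"    # prints [print $1 if /(cd)/]
    show_arg 'print $1 if /(cd)/'     # prints [print $1 if /(cd)/]
}

demo
```

The first line shows the program with $1 already gone, which is why perl falls back to printing $_.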
