Parsing Values in columns using TCL

Parsing Values in columns using TCL - regex

I have an output which looks like below
A B C
0 1 2
I have lot of coloumns due to which the o/p looks to have wrapped around. I am wondering a way to get the respective value if I parse the keys (A B or C)

Considering each key (i.e. alphabet) will have one value (i.e. numeral), then we can use the following way. (This might be a workaround to get what we need)
set input "A B C
0 1 2"
set alpha [ regexp -all -inline {[a-zA-Z]} $input]; #Match all alphabets
set numeric [ regexp -all -inline {\d} $input]; #Match all numeric values
#Using 'foreach' to loop both list at a same time.
foreach a $alpha n $numeric {
puts "$a : $n"
}
If the pair is not equally distributed, (i.e either alphabet or numeric value is missing) then they will be assigned with empty string during the course of foreach loop execution.
If you want to get them in Key-Value pair then, we can make use of dict or array in tcl.
Dictionary Implementation
foreach a $alpha n $numeric {
dict append test $a $n
}
puts "Whole dictionary : [ dict get $test ]"
puts "Value of key A : [ dict get $test A ]"; #Getting value of key 'A'
dict for { a n } $test {
puts "Key : $a ==> Value : $n "
}
Array Implementation
foreach a $alpha n $numeric {
set result($a) $n
}
puts "Whole array : [ array get result ]"
puts "Value of key A : $result(A) "#Getting value of key 'A'
foreach index [array names result] {
puts "Key : $index ==> Value : $result($index)"
}
Reference : dict, array

Related

conceptual error with control flow in Tcl

I have a collection of things, some of them empty.
I want to form a collection of non empty things, separated by separator.
this is essentially what I do in c++, but fails with any and all combinations of $ signs etc. etc.
I've already got a work around thanks, I'd like to know why and how this fails.
set q1 "a"
set q2 ""
set q3 "c"
set q4 d
set q5 ""
set answer ""
set needSeparator 0
foreach { var } {
q1 q2 q3 q4 q5
} {
if { $var ne "" } {
if {$needSeparator} {
append answer " separator "
}
append answer $var
set needSeparator 1
}
}
# expecting answer to be "a separator c separator d"
puts $answer
edit 2021-09-14
Following on from #Shawn
< if { $var ne "" } {
---
> if { [set elem [set $var]] ne "" } {
< append answer $var
---
> append answer $elem
on my effort does the job.
not quite sure how set is doing the dereferencing there
but that's one for another day.
this was a minimal example so the rather more funky answers are too involved for someone trying to program in c++ :-). The qNs are
horrible and come from different places, but the final code example is
sweet and works translated back into my real problem - see below
# build compound SELECT
set q1 [select $mapText "final_text"]
set q2 [select $parish "parish"]
set q3 [select $la "local_authority"]
set q4 [sqSelect $five00]
set q5 ""
if {$nation ne "All"} {
set q5 {SELECT pin_id AS id FROM gazetteer WHERE nation = '}
append q5 $nation "'\n"
}
set compound {}
foreach clause {q1 q2 q3 q4 q5} {
if {[set q [set $clause]] ne ""} {
lappend compound $q
}
}
if {[llength compound] == 0} { return ""}
set res "WITH pinIds AS (\n"
append res [join $compound "INTERSECT\n "] ")\n"
Thanks for your help

You're better off using a list, dict or array to store related values instead of a bunch of different variables. But any way your data is stored, lappend the non-empty values to a list or otherwise filter out the empty ones and join the result:
#!/usr/bin/env tclsh
set data {a "" c d ""}
# Using foreach
set answer {}
foreach elem $data {
if {$elem ne ""} {
lappend answer $elem
}
}
puts [join $answer " separator "]
# Using lmap for a more functional style; note eq instead of of ne
set answer [lmap elem $data { if {$elem eq ""} continue; set elem }]
puts [join $answer " separator "]
# Using a dict
set data [dict create q1 a q2 "" q3 c q4 d q5 ""]
set answer {}
# Dict traversal happens in the same order keys were added
dict for {_ elem} $data {
if {$elem ne ""} {
lappend answer $elem
}
}
puts [join $answer " separator "]
When iterating through a list of variable names, you have to use set to get the value of the current name (In your code, $var is q1, q2, etc. which are always going to be not equal to an empty string):
set answer {}
foreach varname {q1 q2 q3 q4 q5} {
set elem [set $varname]
if {$elem ne ""} {
lappend answer $elem
}
}
puts [join $answer " separator "]

Not an answer, but a response to the array for comments on Shawn's answer
A Tcl implementation of array for
proc array_for {vars arrayName body} {
if {[llength $vars] != 2} {
error {array for: "vars" must be a 2 element list}
}
lassign $vars keyVar valueVar
# Using the complicated `upvar 1 $arrayName $arrayName` so that any
# error messages propagate up with the user's array name
upvar 1 $arrayName $arrayName \
$keyVar key \
$valueVar value
set sid [array startsearch $arrayName]
# If the array is modified while a search is ongoing, the searchID will
# be invalidated: wrap the commands that use $sid in a try block.
try {
while {[array anymore $arrayName $sid]} {
set key [array nextelement $arrayName $sid]
set value [set "${arrayName}($key)"]
uplevel 1 $body
}
} trap {TCL LOOKUP ARRAYSEARCH} {"" e} {
puts stderr [list $e]
dict set e -errorinfo "detected attempt to add/delete array keys while iterating"
return -options $e
} finally {
array donesearch $arrayName $sid
}
return
}
and add to the array ensemble:
set map [namespace ensemble configure array -map]
dict set map for ::array_for
namespace ensemble configure array -map $map
Given that, an array values subcommand can be easily created (to pair with array names)
proc array_values {arrayName} {
upvar 1 $arrayName ary
set values [list]
array for {name value} ary {lappend values $value}
return $values
}

How to extract only two fields using Regular Expression

In TCL I am writing the regular expression for below output:
Output args is
packet-filter 0
identifier 0
direction bidirectional
network-ip 10.7.98.231/32
ue-port-start 0
ue-port-end 0
nw-port-start 0
nw-port-end 0
protocol 1
precedence 0
packet-filter 1
identifier 1
direction uplink
network-ip 10.7.98.231/32
ue-port-start 0
ue-port-end 0
nw-port-start 0
nw-port-end 0
protocol 1
precedence 0
Output of my Regular Expression : regexp -all -inline {direction\s+(\S+)} $args is
{direction bidirectional} bidirectional {direction uplink} uplink
I need to extract the direction value which is bidirectional and uplink
Any suggestion ?

For the current case, where the captured substrings are chunks of non-whitespace text, you may re-build the output checking if each item has length set to 1:
set results [regexp -all -inline {direction\s+(\S+)} $args]
set res {}
foreach item $results {
if {[llength $item] == 1} {
lappend res $item
}
}
Then, $res will only hold bidirectional and uplink.
See the Tcl demo.
For a more generic case, you may use
set res {}
foreach {whole capture1} $results {
lappend res $capture1
}
See this Tcl demo
You may add more captureX arguments to accommodate all the capturing group values returned by your regex.

You simply need a loop or something equivalent. If you need to work on each direction individually, a foreach loop is appropriate:
set results [regexp -all -inline {direction\s+(\S+)} $args]
foreach {main sub} $results {
puts $sub
}
# bidirectional
# uplink
Or if you need the list of directions, then lmap sounds appropriate:
set directions [lmap {main sub} $results {set sub}]
# bidirectional uplink

The regexp is not absolutely necessary, you may process the value of args into a dictionary:
set d [dict create]
foreach {k v} $args {
dict lappend d $k $v
}
puts [dict get $d direction]

Perl - Using regex to match input in hash key or value

First, this is a homework assignment. I am having a tough time with regex, and I'm stuck.
This is the code I have so far, where I have the user designate a filename, and if it exists, populates a hash of the names as keys, and the phone numbers as the values.
#!/usr/bin/perl
use strict;
print "\nEnter Filename: ";
my $file = <STDIN>;
chomp $file;
if(!open(my $fileName, "<", "$file"))
{
print "Sorry, that file doesn't exist!", "\n";
}
else
{
my %phoneNums;
while (my $line=<$fileName>)
{
chomp($line);
(my $name,my $number) = split /:/, $line;
$phoneNums{$name} = $number;
}
print "Read in the file!", "\n\n";
print "Enter search: ";
my $input = <STDIN>;
chomp $input;
#HERE IS WHERE I'M LOST
}
print "\n";
This is the part I am stuck on:
Allow the user to enter a search string.
Look for matches using the same style as the phone. Any individual
character in the search string can match any other character from the
key, meaning a ‘2’ in the search string can match a ‘2’, ‘A’, ‘B’, or ‘C’ in the contact list. Matches can occur in the contact name or the phone number. For a match to occur, each character in the search string must appear, in order, in the contact info, but not necessarily next to each
other. For example, a search string of “86” (essentially the same as a search string of “TM” or “NU”) would match “TOM” but not “MOTHER”.
Characters on each phone keys:
0,
1,
2ABC,
3DEF,
4GHI,
5JKL,
6MNO,
7PQRS,
8TUV,
9WXYZ
I just am stuck on how exactly to make all those character classes, and any help at all is much appreciated.

The way to tackle this is by writing a function that reduces your 'things' to their common components. The best way to do this IMO is use a hash:
my %num_to_letter = (
0 => [],
1 => [],
2 => [ "A", "B", "C" ],
3 => [ "D", "E", "F" ],
4 => [ "G", "H", "I" ],
5 => [ "J", "K", "L" ],
## etc.
);
my %letter_to_num;
foreach my $key ( keys %num_to_letter ) {
foreach my $element ( #{$num_to_letter{$key}} ) {
$letter_to_num{lc($element)} = lc($key);
}
}
print Dumper \%letter_to_num;
This creates a map of which letters or numbers map to their original - a bit like this:
$VAR1 = {
'b' => '2',
'g' => '4',
'e' => '3',
'i' => '4',
'a' => '2',
'j' => '5',
...
Note - you can do this by hand, but I prefer to generate from the top map, because I think it looks neater. Note - we use lc to lower case everything, so this becomes case insensitive. It's probably worth looking at fc - which is a similar tool but handles international characters. (Not relevant in this example though)
You then 'reduce' both search and 'target' to their common values:
sub normalise {
my ( $input ) = #_;
#join with no delimiter.
return join ( '',
#look up $_ (each letter) in $letter_to_num
#if not present, use // operator to return original value.
#this means we get to turn numbers into letters,
#but leave things that are already numbers untouched.
map { $letter_to_num{lc($_)} // $_ }
#split the input line into characters.
split ( //, $input )
);
}
print normalise ("DAD"),"\n"; ## 323
And then compare one against the other:
my $search = "DAD";
my $normalised_search = normalise($search);
print "Searching for: \"$normalised_search\"\n";
my $number_to_match = '00533932388';
my $string_to_match = "daddyo";
print "Matches number\n"
if normalise($number_to_match) =~ m/$normalised_search/;
print "Matches string\n"
if normalise($string_to_match) =~ m/$normalised_search/;

Here's an almost procedural approach that cheats a bit by using Hash::MultiValue:
use Hash::MultiValue; # makes reversing and flattening easier
# build a hash from the phone_keypad array or do it manually!
my #phone_keypad = qw(0 1 2ABC 3DEF 4GHI 5JKL 6MNO 7PQRS 8TUV 9WXYZ);
my %num2let = map { /(\d{1})(\w{3,4})/;
if ($2) { $1 => [ split('',$2) ] } else { 0 => [] , 1 => [] }
} #phone_keypad ;
# Invert the hash using Hash::MultiValue
my $num2let_mv = Hash::MultiValue->from_mixed(\%num2let);
my %let2num = reverse $num2let_mv->flatten ;
# TOM in numbers - 866 in letters
my $letters = "TOM" ;
print join '', $let2num{$_} // $_ for (split('', $letters)), "\n";
my $phone_input = "866" ;
print join '', #{$num2let{$_}}," " for (split('', $phone_input)) , "\n";
Output:
866
TUV MNO MNO
So here "TOM" would overlap with "UNO" ... I like #Sobrique's answer :-)
To search an array/list of contact names using the phone keypad input we can create a hash containing keys and values of the names and their number equivalents and then match the "converted" name value against the input:
use Hash::MultiValue; # makes reversing and flattening easier
my #contacts = <DATA> ;
chomp #contacts;
# build a hash from the phone_keypad array or do it manually!
my #phone_keypad = qw(0 1 2ABC 3DEF 4GHI 5JKL 6MNO 7PQRS 8TUV 9WXYZ);
my %num2let = map { /(\d{1})(\w{3,4})/;
if ($2) { $1 => [ split('',$2) ] } else { 0 => [] , 1 => [] }
} #phone_keypad ;
# Invert the hash using Hasj::MultiValue
my $num2let_mv = Hash::MultiValue->from_mixed(\%num2let);
my %let2num = reverse $num2let_mv->flatten ;
# create key/value pairs for contact database
my %contacts2nums ;
for $contact (#contacts) {
$contacts2nums{$contact} = join "",
map { $let2num{$_} } split('', uc $contact);
}
my $phone_input = "866";
for my $contact (keys %contacts2nums) {
print "The text: \"$contact\" matches the input: \"$phone_input\" \n"
if $phone_input eq $contacts2nums{$contact};
}
__DATA__
Tom
Mother
TIMTOWDI
DAD
Gordon
Output:
The text: "Tom" matches the input: "866"
A more organized approach would wrap the conversion operation in a function.
Addendum:
With a real keypad you could probably come up with a simple algorithm that could be more deterministic regarding the letter you want to associate with the number on the keypad. You could iterate through the array based on number of presses of the key: e.g. two presses on "2" would be equal to "B", etc. You'd just have to figure out how/when to move to the next character with some kind of timeout/wait value. This way you would have a more exact string on which to base your search.

tcl check key value format

I would like to check if the line is in key-value format, so I do:
set index [string first "=" $line]
if { $index == -1 } {
#error
}
set text [string range $line [expr $index + 1] end]
if { [string first "=" $text ] != -1 } {
#error
}
How can I write this check as a regexp?

You could also split the string using = as a separator and check the number of resulting fields
set fields [split $line =]
switch [llength $fields] {
1 {error "no = sign"}
2 {lassign $fields key value}
default {error "too many = signs"}
}

Your code is a bit confusing for the last if statement.
Through regex, you can use:
% regexp {=(.*)$} $line - text
1 # If there's no "=", it will be zero and nothing will be stored in $text,
# as $text will not exist
In an if block, you can use:
if {[regexp {=(.*)$} $line - text]} {
puts $text
} else {
# error
}
EDIT: To check if the string contains only one = sign:
if {[regexp {^[^=]*=[^=]*$} $line]} {
return 1
} else {
return 0
}
^ means the beginning of a string.
[^=] means any character except the equal sign.
[^=]* means any character except the equal sign occurring 0 or more times.
= matches only one equal sign.
$ matches the end of the string.
So, it checks whether the string has only one equal sign.
1 means that the line contains only 1 equal sign, 0 means there are no equal sign, or more than 1 equal sign.

How do I get the text that present under the matched string in tcl

I have a string value in tcl as
set out " ABC CDE EFG
123 456"
I want to get the text that is present below text "EFG".
As right now it is "456", but it can be anything so I need a way though which I can grep for "EFG" and get the text below it.

This answer takes some inspiration from Johannes Kuhn's answer, but I use regexp to get the word indices from the "keys" line.
# this is as close as I can get to a here-doc in Tcl
set out [string trim {
ABC DEF GHI
123 456
}]
# map the words in the first line to the values in the 2nd line
lassign [split $out \n] keys values
foreach range [regexp -all -inline -indices {\S+} $keys] {
set data([string range $keys {*}$range]) [string range $values {*}$range]
}
parray data
outputs
data(ABC) = 123
data(DEF) =
data(GHI) = 456

I Suggest splitting the string into the keys and values with
lassign [split $out \n] keys values
and then look for the string position in the keys and get the same range in the values
set start [string first "EFG" $keys]
set value [string range $values $start [expr {${start}+[string length "EFG"]-1}]]
wraping it in a proc and we get
proc getValue {input lookFor} {
lassign [split $input \n] keys values
set start [string first $lookfor $keys]
set value [string range $values $start \
[expr {${start}+[string length $lookfor]-1}]]
}
invoke it like that:
getValue $out "EFG"
Edit: how is the 2nd line aligned? With a tabulator (\t), spaces?

In this case what you actually have is two lines with groups of 3 alphanumeric characters separated by spaces with a large amount of leading whitespace prefixing the second line ("\x20ABC\x20CDE\x20EFG\n[string repeat \x20 10]123[string repeat \x20 5]456" will reproduce what you posted). In your example [string range end-2 end] would give you what you need. I'd suggest reading the file line by line and each time you see the EFG, on the next line extract the part you need (maybe using string range) and emit it.
For example (untested):
set state 0
set f [open $inputfile r]
while {[gets $f line] != -1} {
if {$state} {
puts [string range $line end-2 end]
set state 0
} else {
if {[string match "*EFG" $line]} { set state 1 }
}
}
close $f

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Parsing Values in columns using TCL - regex

I have an output which looks like below A B C 0 1 2 I have lot of coloumns due to which the o/p looks to have wrapped around. I am wondering a way to get the respective value if I parse the keys (A B or C)

Related

conceptual error with control flow in Tcl

How to extract only two fields using Regular Expression

Perl - Using regex to match input in hash key or value

tcl check key value format

How do I get the text that present under the matched string in tcl

Categories

Resources