tcl: parse string into list - regex

I'm trying to parse a string into a flat list in Tcl.
The string has the format of
name1='value1',name2='value2',name3='value3'
I'm wondering if there's a way to capture names and values into a list that looks like this:
{name1 value1 name2 value2 name3 value3}
Note that a name or value may itself contain any characters, including ', = and ,

Well, it is possible:
set data {name1='value1',name2='value2',name3='value3'}
foreach {- key value -} [regexp -all -inline {(.*?)='(.*?)'(,|$)} $data] {
lappend result $key $value
}
Note: If each key only occurs once, I suggest using a dict instead (dict set result $key $value).
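The same non-greedy pattern carries over to other regex engines. As a rough sketch in Python rather than Tcl (the function name parse_pairs is made up for illustration), assuming each value ends at a ' that is followed by a comma or by the end of the string:

```python
import re

def parse_pairs(data):
    """Return a flat [name, value, name, value, ...] list."""
    flat = []
    # (.*?) is non-greedy, so each value stops at the first ' that is
    # followed by a comma or by the end of the string.
    for name, value in re.findall(r"(.*?)='(.*?)'(?:,|$)", data):
        flat.extend([name, value])
    return flat

print(parse_pairs("name1='value1',name2='value2',name3='value3'"))
# ['name1', 'value1', 'name2', 'value2', 'name3', 'value3']
```

Values containing = or , survive this (for example a='x,y'), but a value that contains the two-character sequence ', would still be cut short.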

The easiest way is to replace the symbols =, , and ' with spaces:
% set s {name1='value1',name2='value2',name3='value3'}
name1='value1',name2='value2',name3='value3'
% set new_s [string map {"='" " " "'," " " "'" " "} $s]
name1 value1 name2 value2 name3 value3
string map takes two parameters. The first is a list of old-new pairs of strings. The second is the string itself. Essentially, string map searches for each old string and replaces it with the corresponding new one. For example, it searches for =' and replaces it with a space.
Update
The above solution assumes no space in the names or values. The following solution will work for those values with embedded spaces:
% set s {name1='value1',name2='value 2',name3='value3'}
name1='value1',name2='value 2',name3='value3'
% string map {"=" " " "'" {"} "," " "} $s
name1 "value1" name2 "value 2" name3 "value3"

You can do it with a negated character class, provided the names and values themselves never contain =, ' or ,:
set s {name1='value1',name2='value2',name3='value3'}
set result [regexp -inline -all -- {[^=',]+} $s]


ahk - Get text after character (space)

I'm new to AutoHotkey. I'm trying to remove all the text up to the first space on each line, keeping everything else.
example:
txt1=something
txt2=other thing
var.="-1" " " txt1 " " txt2 "`n"
var.="2" " " txt1 " " txt2 "`n"
var.="4" " " txt1 " " txt2 "`n"
;; more add ...
FinalVar:=var
;...
msgbox % FinalVar
RETURN
Current output:
-1 something other thing
2 something other thing
4 something other thing
Desired output (for all lines of FinalVar, without needing a Loop):
something other thing
something other thing
something other thing
In bash I could use something like sed.
Is there a fast way to do the same thing in AHK?
Thanks for your attention, and sorry for my English!
You can use a combination of the InStr command
InStr()
Searches for a given occurrence of a string, from the left or the right.
FoundPos := InStr(Haystack, Needle [, CaseSensitive := false, StartingPos := 1, Occurrence := 1])
and SubStr command.
SubStr()
Retrieves one or more characters from the specified position in a string.
NewStr := SubStr(String, StartingPos [, Length])
With InStr you find the position of the first space in var.
With SubStr you extract everything after that position to the end of the string like this:
StartingPos := InStr(var, " ")
var := SubStr(var, StartingPos + 1)
Note the + 1, it is there because you need to start extracting the text 1 position after the space, otherwise the space will be the first character in the extracted text.
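For comparison, here is the same find-then-slice idea as a small Python sketch (not AutoHotkey; after_first_space is a hypothetical helper name). One difference to keep in mind: Python's str.find is 0-based and returns -1 when nothing is found, whereas InStr is 1-based and returns 0.

```python
def after_first_space(text):
    """Return everything after the first space, or the text unchanged if there is none."""
    pos = text.find(" ")      # 0-based index of the first space, -1 if absent
    if pos == -1:
        return text
    return text[pos + 1:]     # + 1 skips the space itself

print(after_first_space("-1 something other thing"))
# something other thing
```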
To replace the leading text in all lines you can use RegExReplace
RegExReplace()
Replaces occurrences of a pattern (regular expression) inside a string.
NewStr := RegExReplace(Haystack, NeedleRegEx [, Replacement := "", OutputVarCount := "", Limit := -1, StartingPosition := 1])
FinalVar := RegExReplace(var, "m`a)^(.*? )?(.*)$", "$2")
m`a) are the RegEx options; ^(.*? )?(.*)$ is the actual search pattern.
m: Multiline. Views Haystack as a collection of individual lines (if it contains newlines) rather than as a single continuous line.
`a: recognizes any type of newline, namely `r, `n, `r`n, `v/VT/vertical tab/chr(0xB), `f/FF/formfeed/chr(0xC), and NEL/next-line/chr(0x85).
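The per-line behaviour of the m option can be seen in other regex engines too. A minimal sketch in Python (not AutoHotkey), using a slightly simplified pattern and a made-up function name strip_first_field:

```python
import re

def strip_first_field(text):
    # re.MULTILINE makes ^ match at the start of every line, like the m option.
    # .*? is non-greedy, so only the text up to the first space is removed.
    return re.sub(r"^.*? ", "", text, flags=re.MULTILINE)

lines = "-1 something other thing\n2 something other thing\n4 something other thing\n"
print(strip_first_field(lines), end="")
```

Lines that contain no space are left untouched, because . does not match a newline and the pattern then fails on that line.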

R regex - extract strings between two characters for multiple instances

I am trying to extract some keywords from a string in R as follows.
I want to get the strings in between the first ":" after each "[" and ", " or "\b".
string <- c("[G1]3451:GHEIN, [G2]FR343:4453, [G05]RT3342:34:GR", "[L1]TTG4:4532, [L3]EK445:GHR[1C]", "[RT1]JGR:45,RE")
gsub('\\[\\S+:', '', string)
"GHEIN, 4453, GR" "4532, GHR[1C]" "45,RE"
The problem is when two ":" are there.
I should be getting the output as 34:GR instead of GR.
out <- c("GHEIN, 4453, 34:GR", "4532, GHR[1C]", "45,RE")
How to get the desired result using regex in R?
Make it non-greedy:
gsub('\\[\\S+?:', '', string, perl = TRUE)
[1] "GHEIN, 4453, 34:GR" "4532, GHR[1C]" "45,RE"
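To see why greediness matters here, the two variants can be compared side by side. This is only an illustrative sketch in Python (not R), using the strings from the question; the variable names are arbitrary.

```python
import re

strings = [
    "[G1]3451:GHEIN, [G2]FR343:4453, [G05]RT3342:34:GR",
    "[L1]TTG4:4532, [L3]EK445:GHR[1C]",
    "[RT1]JGR:45,RE",
]

# Greedy: \S+ runs to the LAST ":" in each run of non-space characters.
greedy = [re.sub(r"\[\S+:", "", s) for s in strings]

# Lazy: \S+? stops at the FIRST ":" after each "[".
lazy = [re.sub(r"\[\S+?:", "", s) for s in strings]

print(greedy)  # ['GHEIN, 4453, GR', '4532, GHR[1C]', '45,RE']
print(lazy)    # ['GHEIN, 4453, 34:GR', '4532, GHR[1C]', '45,RE']
```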

Split string by words in R

I would like to split a string by two words:
s <- "PCB153 treated HepG2 cells at T18"
strsplit(s, split = <treated><at>)
What should I write instead of <>?
I would like to get:
"PCB153" "HepG2 cells" "T18"
strsplit(s, split="treated|at")
#[[1]]
#[1] "PCB153 " " HepG2 cells " " T18"
You have to enter it as a string. To split on treated:
s <- "PCB153 treated HepG2 cells at T18"
s2 <- strsplit(s,split="treated")
unlist(s2)
To split on treated and at:
unlist(strsplit(unlist(s2),split="at"))
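Both approaches above leave the spaces around the separator words in the result. One way around that is to make the whitespace part of the separator. A minimal sketch of the idea in Python (not R), assuming the separator words are always surrounded by whitespace:

```python
import re

s = "PCB153 treated HepG2 cells at T18"

# The whitespace on both sides is consumed together with the separator word,
# so the pieces come back already trimmed. Requiring that whitespace also keeps
# the "at" inside "treated" from being treated as a separator.
parts = re.split(r"\s+(?:treated|at)\s+", s)

print(parts)  # ['PCB153', 'HepG2 cells', 'T18']
```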

How to convert a list of string to a string in racket?(leaving the spaces intact)

How do I convert a list of strings into a string in DrRacket? For example, if I have
'("44" "444") convert it into "44 444"?
I tried string-join, but it takes a delimiter: if I supply one, it replaces the space with that delimiter, and if I use "" as the delimiter, the space simply disappears.
In fact string-join is the right procedure to use in this case; simply use " " (a single space) as the delimiter:
(string-join '("44" "444") " ")
=> "44 444"
Just to clarify: in a list the spaces between elements are not considered part of the list, they're there to separate the elements. For example, all these lists are equal and evaluate to the same value:
'("44""444")
'("44" "444")
'("44"      "444")
If for some reason you want to consider the spaces as part of the list then you have to explicitly add them as elements in the list:
(define lst '("a" " " "b" " " "c" " " "d"))
(string-join lst "")
=> "a b c d"

Perl regular expression removing duplicate consecutive substrings in a string

I tried to do a search on this particular problem, but all I get is either removal of duplicate lines or removal of repeated strings where they are separated by a delimiter.
My problem is slightly different. I have a string such as
"comp name1 comp name2 comp name2 comp name3"
where I want to remove the repeated comp name2 and return only
"comp name1 comp name2 comp name3"
They are not consecutive duplicate words, but consecutive duplicate substrings. Is there a way to solve this using regular expressions?
s/(.*)\1/$1/g
Be warned that the running time of this regular expression is quadratic in the length of the string.
This works for me (MacOS X 10.6.7, Perl 5.13.4):
use strict;
use warnings;
my $input = "comp name1 comp name2 comp name2 comp name3" ;
my $output = "comp name1 comp name2 comp name3" ;
my $result = $input;
$result =~ s/(.*)\1/$1/g;
print "In: <<$input>>\n";
print "Want: <<$output>>\n";
print "Got: <<$result>>\n";
The key point is the '\1' in the matching.
To avoid removing duplicate characters within the terms (e.g. comm1 -> com1), bracket .* in the regular expression with \b:
s/(\b.*\b)\1/$1/g
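The effect of the \b anchors is easy to check in any engine with backreferences. A small sketch in Python (not Perl), where collapse_repeats and its whole_words flag are hypothetical names chosen for this example:

```python
import re

def collapse_repeats(text, whole_words=True):
    # (...)\1 matches a chunk immediately followed by a copy of itself;
    # the replacement keeps just one copy. With whole_words=True the chunk
    # must start and end on a word boundary.
    pattern = r"(\b.*\b)\1" if whole_words else r"(.*)\1"
    return re.sub(pattern, r"\1", text)

print(collapse_repeats("comp name1 comp name2 comp name2 comp name3"))
# comp name1 comp name2 comp name3
print(collapse_repeats("comm1", whole_words=False))  # com1
print(collapse_repeats("comm1"))                     # comm1
```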
I never work with languages that support this but since you are using Perl ...
Go here .. and see this section....
Useful Example: Checking for Doubled Words
When editing text, doubled words such as "the the" easily creep in. Using the regex \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply type in \1 as the replacement text and click the Replace button.
If you need something running in linear time, you could split the string and iterate through the list:
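#!/usr/bin/perl
use strict;
use warnings;
my $str = "comp name1 comp name2 comp name2 comp name3";
my @elems = split("\\s", $str);
my $prevComp;       # name from the previous "comp <name>" pair
my $prevFlag = -1;  # -1: nothing printed yet, 0: last name was printed, 1: last name was a duplicate
foreach my $elemIdx (0..(scalar @elems - 1)) {
    if ($elemIdx % 2 == 1) {
        # Odd positions hold the names.
        if (defined $prevComp) {
            if ($prevComp ne $elems[$elemIdx]) {
                print " $elems[$elemIdx]";
                $prevFlag = 0;
            }
            else {
                # Same name as before: skip it.
                $prevFlag = 1;
            }
        }
        else {
            print " $elems[$elemIdx]";
        }
        $prevComp = $elems[$elemIdx];
    }
    elsif ($prevFlag == -1) {
        # Very first word.
        print "$elems[$elemIdx]";
        $prevFlag = 0;
    }
    elsif ($prevFlag == 0) {
        # Even positions hold "comp"; it is suppressed right after a skipped duplicate.
        print " $elems[$elemIdx]";
    }
}
print "\n";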
Dirty, perhaps, but should run faster.
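The single-pass idea can also be written by grouping the words first and then dropping any group equal to the one before it. A minimal sketch in Python (not Perl), assuming, as the code above does, that the entries come in fixed-size groups of two words; the function name and the group_size parameter are made up for illustration:

```python
def drop_consecutive_duplicates(text, group_size=2):
    words = text.split()
    # Chop the word list into consecutive groups such as ("comp", "name1").
    groups = [tuple(words[i:i + group_size])
              for i in range(0, len(words), group_size)]
    kept = []
    for group in groups:
        # Keep a group only if it differs from the last one kept.
        if not kept or kept[-1] != group:
            kept.append(group)
    return " ".join(" ".join(group) for group in kept)

print(drop_consecutive_duplicates("comp name1 comp name2 comp name2 comp name3"))
# comp name1 comp name2 comp name3
```

Because each group is compared only with its predecessor, the work grows linearly with the number of words, and a duplicate at the very end of the string is dropped cleanly as well.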