TCL: Backslash issue (regsub) - regex

I have an issue while trying to read a member of a list like \\server\directory
The issue comes when I try to get this variable using the lindex command, that proceeds with TCL substitution, so the result is:
\serverdirectory
So I think I need to use a regsub command to avoid the backslash substitution, but I could not work out the correct procedure.
An example of what I want should be:
set mistring "\\server\directory"
regsub [appropriate regular expression here]
puts "mistring: '$mistring'" ==> "mistring: '\\server\directory'"
I have checked some posts about this. Keeping the \\ works, but I still have problems keeping a single \ followed by whatever character comes after it.
UPDATE: specific example. What I am actually trying to keep is the initial format of an element in a list. The list is received by an outer application. The original code is something like this:
set mytable $__outer_list_received
puts "Table: '$mytable'"
for { set i 0 } { $i < [llength $mytable] } { incr i } {
set row [lindex $mytable $i]
puts "Row: '$row'"
set elements [lindex $row 0]
puts "Elements: '$elements'"
}
The output of this, in this case is:
Table: '{{
address \\server\directory
filename foo.bar
}}'
Row: '{
address \\server\directory
filename foo.bar
}'
Elements: '
address \\server\directory
filename foo.bar
'
So I try to get the value of address (in this specific case, \\server\directory) in order to write it in a configuration file, keeping the original format and data.
I hope this clarifies the problem.

If you don't want substitutions, put the problematic string inside curly braces.
% puts "\\server\directory"
\serverdirectory
and it's not what you want. But
% puts {\\server\directory}
\\server\directory
as you need.

Since this is fundamentally a problem on Windows (and Tcl always treats backslashes in double-quotes as instructions to perform escaping substitutions) you should consider a different approach (otherwise you've got the problem that the backslashes are gone by the time you can apply code to “fix” them). Luckily, you've got two alternatives. The first is to put the string in {braces} to disable substitutions, just like a C# verbatim string literal (but that uses @"this" instead). The second is perhaps more suitable:
set mistring [file nativename "//server/directory"]
That ensures that the platform native directory separator is used on Windows (and nowadays does nothing on other platforms; back when old MacOS9 was supported it was much more magical). Normally, you only need this sort of thing if you are displaying full pathnames to users (usually a bad idea, GUI-wise) or if you are passing the name to some API that doesn't like forward slashes (notably when going as an argument to a program via exec but there are other places where the details leak through, such as if you're using the dde, tcom or twapi packages).

A third, although ugly, option is to double the backslashes: write \\\\ instead of \\, and \\ instead of \, while using double quotes. When the substitution occurs, it should give you what you want. Of course, this will not help much if the substitution happens a second time.

Related

Ruby Regex on Active Directory String

I have a string that represents multiple DNs for Active Directory but has been separated by commas instead of ;
The String:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
I am trying to write a regex that will match the entries with ou=App1 but not those with ou=App2, and also turn the , after dc=internal into a ;
Is this possible?
The result would be:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;
Using #strip and #sub to Clean Up Your LDIF Data
Really, the "correct" answer would be to get valid LDIF in the first place, and then parse it as such with a gem like Net::LDAP. However, the changes you want to your existing file are fairly trivial. For example, we'll start by assigning the String data from your question to a variable named ldif using a here-document literal:
ldif = <<~'LDIF'
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
LDIF
You can now pick out and normalize the lines you want: iterate with String#each_line, use #select with a Regexp lookahead assertion to keep only the matching lines, then clean each one up with String#strip and String#sub, storing the results in a matching_apps Array.
This all sounds much more complicated than it is. Consider the following method chain, which is really just a one-liner wrapped for readability:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, ";" }
#=>
["CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;",
"CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;"]
The use of String#strip and String#sub helps ensure that all lines are normalized the way you want, including the trailing semicolons. However, those trailing semicolons are likely to cause problems in subsequent steps, so I'd probably recommend removing them as well.
Note: You can stop reading here if you just want to solve your immediate question as originally posted. The rest of the answer covers additional considerations related to data normalization, and provides some examples on how and why you might want to strip the semicolons as well.
Why and How to Normalize without Semicolons
You can replace the final substitution from #sub with an empty String (e.g. "") to remove the trailing semicolons (if present). Normalizing without the semicolons now may save you the trouble of having to clean up those lines again later when you iterate over the Array of results stored in matching_apps from Array#select.
For example, if you need to rejoin lines with commas, interpolate the lines within other String objects in subsequent steps, or do anything where those stored semicolons may be an unexpected surprise it's better to deal with it sooner rather than later. If you really need the trailing semicolons, it's very easy to use String#concat or other forms of String interpolation to add them back, but having unexpected characters in a String can be a source of unexpected bugs that are best avoided unless you're sure you'll always need that semicolon at the end.
Example 1: Output Where Semicolons Might be Unexpected
For example, suppose you want to use the results to format output for a command-line client where a trailing semicolon wouldn't be expected. The following works nicely because the semicolons are already stripped:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, "" }
printf "Make the following calls:\n\n"
matching_apps.each_with_index do |dn, idx|
puts %(#{idx.succ}. ldapsearch -D '#{dn}' [opts])
end
This would print out:
Make the following calls:
1. ldapsearch -D 'CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
2. ldapsearch -D 'CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
without having to first strip any trailing semicolons that might not work with the printed command, tool, or other output.
Examples of Rejoining with Commas and Semicolons
On the other hand, you can just as easily rejoin the Array elements with a comma or semicolon if you want. Consider the following two examples:
matching_apps.join ", "
#=> "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal, CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal"
p format("(%s)", matching_apps.join("; "))
#=> "(CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal; CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal)"
Keep Flexibility in Mind
If the String objects in your Array still had the trailing semicolons, you'd have to do something about them. So, unless you already know what you plan to do with each String, and whether or not the semicolons will be needed, it's probably best to keep them out of matching_apps in the first place to optimize for flexibility. That's just an opinion, to be sure, but definitely one worth considering.
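For comparison, the filter-and-normalize step described in this answer can also be sketched in Python. This is only an illustration under stated assumptions: the function name matching_dns and its ou and terminator parameters are hypothetical, and the lookahead is written slightly differently from the Ruby one so that a value such as ou=App10 is not picked up by accident.

```python
import re

ldif = """\
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
"""


def matching_dns(text, ou="App1", terminator=";"):
    """Keep lines whose DN contains ou=<ou>; normalize the trailing separator.

    Hypothetical helper: the name and parameters are not from the question.
    """
    # The lookahead requires the OU value to end at a comma, semicolon, or line end.
    wanted = re.compile(r"ou=" + re.escape(ou) + r"(?=,|;|$)")
    result = []
    for line in text.splitlines():
        line = line.strip()
        if wanted.search(line):
            # Drop one trailing comma or semicolon, then add the chosen terminator.
            result.append(re.sub(r"[,;]$", "", line) + terminator)
    return result


matching_apps = matching_dns(ldif)
for dn in matching_apps:
    print(dn)
```

Passing terminator="" gives the semicolon-free form that the answer recommends keeping for flexibility.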

Bash replace substring after first colon

I am trying to build a connection string that requires pulling 3 IP addresses from another config file. When I get those values, I need to replace the port on each. I plan to replace each port using simple Bash find and replace ${string/pattern/replacement} but my problem is I'm stuck on the best way to parse the pattern out of the IP.
Here is what I have so far:
myFile.config:
ip.1=ip-ip-1-address:1234:5678
ip.2=ip-ip-2-address:1234:5678
ip.3=ip-ip-3-address:1234:5678
Copying some other simple process, I found I can pull the value of each IP like this:
IP1=`grep "ip.1=" /path/to/conf/myFile.config | awk -F "=" '{print $2}'`
which gives me ip-ip-1-address:1234:5678. However, I need to replace 1234:5678 with 6543, for example. I've been looking around and I found this awesome answer that detailed using Bash prefix substitution, but that relies on knowing the parameter. For example, I would have to do it this way:
test=${ip1##ip-ip-1-address:}
which results in $test being 1234:5678. That's fine but maybe I don't know the IP address as the parameter, so I'm back to considering regex unless there's a way for me to use * as the parameter or something, but I have been unsuccessful so far. For regex, I have tried a bunch such as test=${ip1/(?<=:).*/}.
Note that the ${ip1/(?<=:).*/} you tried uses Bash parameter expansion, which does not support regular expressions, only glob-style patterns.
You seem to want
x='ip.1=ip-ip-1-address:1234:5678'
echo "${x%%:*}:6543" # => ip.1=ip-ip-1-address:6543
The ${x%%:*} expansion takes the value of x and removes the longest suffix matching :*, that is, everything from the first : to the end of the string, including the colon itself. :6543 is then appended to the result, giving "${x%%:*}:6543".
To extract that value, you may also use
awk '/^ip\.1=/{sub("^[^:]+:", "");print}' myFile.config
The awk command finds lines starting with ip.1= and then removes all text from the start till the first colon including the colon and only prints these values.
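The same "keep everything before the first colon, then append the new port" idea can be sketched outside the shell as well. The following Python version is a minimal illustration, assuming the entry has the shape shown in the question; the function name replace_after_first_colon is hypothetical.

```python
def replace_after_first_colon(entry, new_port):
    """Replace everything after the first ':' in entry with new_port.

    Hypothetical helper mirroring "${x%%:*}:6543" from the answer above.
    """
    head, sep, _rest = entry.partition(":")
    if not sep:
        # No colon at all: leave the entry unchanged.
        return entry
    return f"{head}:{new_port}"


print(replace_after_first_colon("ip.1=ip-ip-1-address:1234:5678", "6543"))
```

Like the parameter expansion, this needs no knowledge of the address itself, only the position of the first colon.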

Issues while processing zeroes found in CSV input file with Perl

Friends:
I have to process a CSV file using Perl and produce an Excel file as output, using the Excel::Writer::XLSX module. This is not homework but a real-life problem, where I cannot download whichever Perl version (actually, I need to use Perl 5.6), or whichever Perl module (I have a limited set of them). My OS is UNIX. I can also use (embedded in Perl) ksh and csh (with some limitations, as I have found so far). Please limit your answers to the tools I have available. Thanks in advance!
Even though I am not a Perl developer, but coming from other languages, I have already done my work. However, the customer is asking for extra processing where I am getting stuck on.
1) The stones in the road I found are coming from two sides: from Perl and from Excel particular styles of processing data. I already found a workaround to handle the Excel, but -as mentioned in the subject- I have difficulties while processing zeroes found in CSV input file. To handle the Excel, I am using the '0 way which is the final way for data representation that Excel seems to have while using the # formatting style.
2) Scenario:
I need to catch standalone zeroes which might be present in whichever line / column / cell of the CSV input file and put them as such (as zeroes) in the Excel output file.
I will go directly to the point of my question to avoid losing your valuable time. I am providing more details after my question:
Research and question:
I tried to use Perl regex to find standalone "0" and replace them by whichever string, planning to replace them back to "0" at the end of processing.
perl -p -i -e 's/\b0\b/string/g' myfile.csv
and
perl -i -ple 's/\b0\b/string/g' myfile.csv
Are working; but only from command line. They aren't working when I call them from the Perl script as follows:
system("perl -i -ple 's/\b0\b/string/g' myfile.csv")
Do not know why... I have already tried using exec and eval, instead of system, with the same results.
Note that I have a ton of regex that work perfectly with the same structure, such as the following:
system("perl -i -ple 's/input/output/g' myfile.csv")
I have also tried using backticks and qx//, without success. Note that qx// and backticks have not the same behavior, since qx// is complaining about the boundaries \b because of the forward slash.
I have tried using sed -i, but my System is rejecting -i as invalid flag (do not know if this happens in all UNIX, but at least happens in the one at work. However is accepting perl -i).
I have tried embedding awk (which is working from command line), in this way:
system `awk -F ',' -v OFS=',' '$1 == \"0\" { $1 = "string" }1' myfile.csv > myfile_copy.csv
But this works only for the first column (in command line) and, other than having the disadvantage of having extra copy file, Perl is complaining for > redirection, assuming it as "greater than"...
system(q#awk 'BEGIN{FS=OFS=",";split("1 2 3 4 5",A," ") } { for(i in A)sub(0,"string",$A[i] ) }1' myfile.csv#);
This awk is working from command line, but only 5 columns. But not in Perl using #.
All the combinations of exec and eval have also been tested without success.
I have also tried passing to system each one of the awk components, as arguments, separated by commas, but did not find any valid way to pass the redirector (>), since Perl is rejecting it because of the mentioned reason.
Using another approach, I noticed that the "standalone zeroes" seem to be "swallowed" by the Text::CSV module, so I got rid of it and went back to a traditional loop over the CSV line by line, splitting on commas, which preserves the zeroes. However, I ran into the "mystery" of isdual in Perl, and because of the limited modules I have, I cannot use the Dumper. Then I also explored the guts of binaries in Perl and tried the $x ^ $x, which was deprecated since version 5.22 but valid till that version (I said mine is 5.6). This is useful to tell numbers from strings. However, while if( $x ^ $x ) returns TRUE for strings, if( !( $x ^ $x ) ) does not return TRUE when $x = 0. [UPDATE: I tried this in a dedicated Perl script, just for this purpose, and it is working. I believe that my probably wrong conclusion ("not returning TRUE") was reached before I realized that Text::CSV was swallowing my zeroes. Doing new tests...].
I will appreciate very much your help!
MORE DETAILS ON MY REQUIREMENTS:
1) This is a dynamic report coming from a database, which is handed over to me and which I pick up programmatically from a folder. Dynamic means that it might have any number of tables, any number of columns in each table, any names as column headers, and any number of rows in each table.
2) I do not know, and cannot know, the column names, because they vary from report to report. So, I cannot be guided by column names.
A sample input:
Alfa,Alfa1,Beta,Gamma,Delta,Delta1,Epsilon,Dseta,Heta,Zeta,Iota,Kappa
0,J5,alfa,0,111.33,124.45,0,0,456.85,234.56,798.43,330000.00
M1,0,X888,ZZ,222.44,111.33,12.24,45.67,0,234.56,0,975.33
3) Input Explanation
a) This is an example of a random report with 12 columns and 3 rows. The first row is the header.
b) I call "standalone zeroes" those "clean" zeroes which are coming in the CSV file, from second row onwards, between commas, like 0, (if the case is the first position in the row) or like ,0, in subsequent positions.
c) In the second row of the example you can read, from the beginning of the row: 0,J5,alfa,0, which in this particular case are "words" or "strings". In this case, 4 names (note that two of them are zeroes, which need to be treated as strings). Thus, we have an example with 4 name columns (Alfa,Alfa1,Beta,Gamma are the headers for those columns, but only in this scenario). From that point onwards, in the second row, you can see floating point (*.00) numbers and, among them, 2 zeroes, which are numbers. Finally, in the third line, you can read M1,0,X888,ZZ, which are the names for the first 4 columns. Note, please, that the 4th column in the second row has 0 as its name, while the 4th column in the third row has ZZ as its name.
Summary: as a general picture, I have a table-report divided in 2 parts, from left to right: 4 columns for names, and 8 columns for numbers.
Always the first M columns are names and the last N columns are numbers.
- It is unknown which number is M: which amount of columns devoted for words / strings I will receive.
- It is unknown which number is N: which amount of columns devoted for numbers I will receive.
- It is KNOWN that, after the M amount of columns ends, always starts N, and this is constant for all the rows.
I did some quick research on Perl's regex word boundary ( \b ), and I did not find any relevant information on whether it applies in Perl 5.6.
However, since you are using an old Perl version, try the traditional UNIX / Linux style (I mean, what Perl inherits from the shell), like this:
system("perl -i -ple 's/^0/string/g' myfile.csv");
The previous regex should do the job, making the change at the start of each line in your CSV file when it matches.
Or, maybe better (if you have those "standalone" zeroes, and want avoid any unwanted change in some "leading zeroes" string):
system("perl -i -ple 's/^0,/string,/g' myfile.csv");
[Note that I have added the comma, after the zero; and, of course, after the string].
Note that the first regex should work; the second one is just a "caveat", to be cautious.
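One likely explanation for the difference between the command line and system("..."), which the answer above does not state: inside a double-quoted Perl string, \b is the backspace escape, so the shell receives backspace characters rather than a regex word boundary, whereas s/input/output/g contains no backslashes and is unaffected. Python's regular (non-raw) string literals treat \b the same way, so the effect can be sketched as follows. This is an analogy under that assumption, not a test of Perl 5.6 itself.

```python
import re

line = "0,J5,alfa,0,111.33"

# In a regular string literal, "\b" is the backspace character (0x08),
# much like "\b" inside a double-quoted Perl string.
interpolated = "\b0\b"
# In a raw string the backslashes survive, so the regex engine
# receives the word-boundary assertion.
literal = r"\b0\b"

unchanged = re.sub(interpolated, "string", line)
replaced = re.sub(literal, "string", line)

print(len(interpolated), len(literal))
print(unchanged)
print(replaced)
```

If this is indeed the cause, writing \\b inside the double-quoted Perl string, or building the command with a non-interpolating quote such as q{...}, should pass the word boundary through intact.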

Is there a way to match strings:numbers with variable positioning within the string?

We are using a simple curl to get metrics via an API. The problem is, that the output is fixed in the amount of arguments but not their position within the output.
We need to do this with a "simple" regex since the tool only accepts this.
/"name":"(.*)".*?"memory":(\d+).*?"consumer_utilisation":(\w+|\d+).*?"messages_unacknowledged":(\d+).*?"messages_ready":(\d+).*?"messages":(\d+)/s
It works fine for:
{"name":"queue1","memory":89048,"consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0}
However if the output order is changed, then it doesn't match any more:
{"name":"queue2","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0,"memory":21944}
{"name":"queue3","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"memory":21944,"messages":0}
I need a relative definition of the strings to match, since I never know at which position they will appear. In total there are 9 different queue-metric groups.
The simple option is to use a regex for each key-value pair instead of one large regex.
/"name":"((?:[^\\"]|\\.)*)"/
/"memory":(\d+)/
This other option is not a regex, but might be sufficient. Instead of using regex, you could simply transform the resulting response before reading it. Since you say "We are using a simple curl" I'm guessing you're talking about the Curl command line tool. You could pipe the result into a simple Perl command.
perl -ne 'use JSON; use Text::CSV qw(csv); $hash = decode_json $_; csv (sep_char=> ";", out => *STDOUT, in => [[$hash->{name}, $hash->{memory}, $hash->{consumer_utilisation}, $hash->{messages_unacknowledged}, $hash->{messages_ready}, $hash->{messages}]]);'
This will keep the order the same, making it easier to use a regex to read out the data.
input
{"name":"queue1","memory":89048,"consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0}
{"name":"queue2","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0,"memory":21944}
{"name":"queue3","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"memory":21944,"messages":0}
output
queue1;89048;;0;0;0
queue2;21944;;0;0;0
queue3;21944;;0;0;0
For this to work you need Perl and the packages JSON and Text::CSV installed. On my system they are present in perl, libjson-perl and libtext-csv-perl.
Note: I'm currently using ; as the separator. If a value contains a ;, that value will be surrounded by double quotes in the output: "name":"que;ue1" => "que;ue1";89048;;0;0;0 If the value includes both a ; and a ", the " will be escaped by placing another one before it: "name":"q\"ue;ue1" => "q""ue;ue1";89048;;0;0;0
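The same transformation, parsing each JSON line and emitting the fields in a fixed order, can be sketched with Python's standard library alone. This is a minimal sketch assuming one JSON object per line as in the sample input; FIELDS and to_fixed_order are hypothetical names.

```python
import csv
import io
import json

# Fixed output order, independent of the key order in the input.
FIELDS = ["name", "memory", "consumer_utilisation",
          "messages_unacknowledged", "messages_ready", "messages"]


def to_fixed_order(json_lines):
    """Turn an iterable of JSON object strings into ';'-separated rows."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for line in json_lines:
        record = json.loads(line)
        # JSON null becomes None, which csv.writer emits as an empty field.
        writer.writerow([record.get(field) for field in FIELDS])
    return out.getvalue()


sample = [
    '{"name":"queue1","memory":89048,"consumer_utilisation":null,'
    '"messages_unacknowledged":0,"messages_ready":0,"messages":0}',
    '{"name":"queue2","consumer_utilisation":null,'
    '"messages_unacknowledged":0,"messages_ready":0,"messages":0,"memory":21944}',
]
print(to_fixed_order(sample), end="")
```

The csv module applies the same quoting rules described in the note above when a value contains the separator or a double quote.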

Perl replace every occurrence differently

In a perl script, I need to replace several strings. At the moment, I use:
$fasta =~ s/\>[^_]+_([^\/]+)[^\n]+/\>$1/g;
The aim is to format in a FASTA file every sequence name. It works well in my case so I don't need to touch this part. However, it happens that a sequence name appears several times in the file. I must not have at the end twice - or more - the same sequence name. I thus need to have for instance:
seqName1
seqName2
etc.
(instead of seqName, seqName, etc.)
Is it possible to somehow process every occurrence differently and automatically? I don't know how many sequences there are, whether there are similar names, etc. One idea would be to concatenate a random string at every occurrence, for instance, hence my question.
Many thanks.
John perfectly solved it and chepner helped with the smart idea to avoid conflicts, here is the final result:
$fasta =~ s/\>[^_]+_([^\/]+)[^\n]+/
sub {
return '>'.$1.$i++;
}->();
/eg;
Many many thanks.
I was actually trying to do something like this the other day. Here's what I came up with:
$fasta =~ s/\>[^_]+_([^\/]+)[^\n]+/
sub {
# return random string
}->();
/eg;
The /e modifier evaluates the replacement as code, not text. I use an anonymous code ref so that I can return at any point.
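The counter-based replacement from the final result above can be sketched in Python for comparison, where a function passed to re.sub plays the role of the /e modifier. The sample headers below are made up for illustration, number_headers is a hypothetical name, and the counter starts at 1 by assumption.

```python
import itertools
import re

# Same header pattern as in the question: '>', a prefix up to the first '_',
# the name up to the first '/', then the rest of the line.
HEADER = re.compile(r">[^_]+_([^/]+)[^\n]+")


def number_headers(fasta):
    """Rewrite each header as '>name<N>', with N increasing per occurrence."""
    counter = itertools.count(1)
    # The replacement function is called once per match, like Perl's s///e.
    return HEADER.sub(lambda m: f">{m.group(1)}{next(counter)}", fasta)


# Hypothetical input; real FASTA headers may look different.
sample = ">abc_seqName/1-50\nACGT\n>def_seqName/51-90\nTTGA\n"
print(number_headers(sample), end="")
```

A single running counter guarantees distinct names without tracking which names have already been seen, which is the same trade-off the Perl version makes with $i++.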