Tcl - How to Add Text after last character through regex?

I need a tip or suggestion, ideally with an example, on how to append a .txt extension after the last character of a variable's value.
For example:
set txt " ONLINE ENGLISH COURSE - LESSON 5 "
set result [concat "$txt" .txt]
Print:
Note that there are spaces at the start, in the middle, and at the end of the string in the variable (txt). The spaces at the start and in the middle must be kept, but the last space, after the end of the sentence, should be replaced with the extension [.txt].
Tcl's built-in concat command does not achieve the desired effect.
The expected result was something like this:
ONLINE ENGLISH COURSE - LESSON 5.txt
I know I could remove spaces with string map but I don't know how to remove just the last occurrence on the line.
And otherwise I don’t know how to remove the last space to add the text [.txt]
If anyone can point me to one or more solutions, thank you in advance.

set result "[string trimright $txt].txt"
or
set result [regsub {\s*$} $txt ".txt"]
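For readers comparing with another language, here is a minimal sketch of the same two approaches in Python. It is not part of the Tcl answer above; the variable names are illustrative. One difference worth noting: Tcl's regsub replaces only the first match by default, while Python's re.sub replaces all matches, so the sketch passes count=1.

```python
import re

txt = " ONLINE ENGLISH COURSE - LESSON 5 "

# Approach 1: drop trailing whitespace, then append the extension.
result_trim = txt.rstrip() + ".txt"

# Approach 2: replace the trailing whitespace run with the extension.
# count=1 limits the substitution to the first match, so an additional
# empty match at the very end of the string is not replaced as well.
result_regex = re.sub(r"\s*$", ".txt", txt, count=1)

print(repr(result_trim))
print(repr(result_regex))
```

Both variants keep the leading space and the spaces between words, and only the trailing whitespace is replaced.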

Related

Splitting name/value pairs with regex to ignore special characters based on surrounding characters

I have this regex that's worked well so far that splits 'name=value' pairs separated by a given character.
(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\Z)
I know the separator, but the problem is in the example below (tab separated):
usrName=Wilma sev=4 cat=Detection CommandLine="C:\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA= IOCValue= ProcessEndTime=2023-01-18 15:51:05
https://regex101.com/r/1wgVxs/5
Some names can have an empty value, as with 'IOCValue', and that works as expected. However, values like the CommandLine one are split: I get everything up to -Enc as one match, and the remainder up to the next pair as another.
What I'm hoping to get out from the above is:
usrName=Wilma
sev=4
cat=Detection
CommandLine="C:\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=
IOCValue=
ProcessEndTime=2023-01-18 15:51:05
But I'm getting:
usrName=Wilma
sev=4
cat=Detection
CommandLine="C:\powershell.exe" -Enc
0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=
IOCValue=
ProcessEndTime=2023-01-18 15:51:05
Given that I know the separator is a tab, I think what I need is to only look for name=value pairs when they are at the start of the line or preceded by the separator (tab). Is this possible?
Note: I can expect a space separator too, but I have a less performant and messy non-regex version I can send those to, so assume tab.

You may use this simplified regex:
(?s)([^\s=]+)=(.*?)(?=\t|\Z)
Here, the lookahead (?=\t|\Z) makes sure that the value part is followed by either a tab character or the end of the input.
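As a quick check, the pattern can be applied to the sample line from the question. This is a sketch in Python; the names PAIR_RE and line are illustrative, and the sample is rebuilt here with explicit tab separators.

```python
import re

# Pattern from the answer: a value runs lazily up to the next tab or the end.
PAIR_RE = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\t|\Z)")

# The sample from the question, joined with real tab characters.
line = "\t".join([
    "usrName=Wilma",
    "sev=4",
    "cat=Detection",
    'CommandLine="C:\\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=',
    "IOCValue=",
    "ProcessEndTime=2023-01-18 15:51:05",
])

pairs = PAIR_RE.findall(line)
for name, value in pairs:
    print(f"{name}={value}")
```

Because the lookahead no longer stops at "whitespace followed by something that looks like a name", the spaces and the trailing = inside the CommandLine value stay in one match.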

PowerShell - Problems at using match if the text I need is before the keyword

My problem is that if I write $var -match "(id="".*?$args)", I get everything from the first id of the text to $args, but I only need text starting from the id which is closest to $args.
Any help would be much appreciated.

If $var has properties you can grab, just use select/where. Assuming it does not, you can find the start index of $args in the string; then, if the IDs are always the same length, you can grab that many characters immediately before $args.
#So it would look something like: ID=xxxxxx$args
$argsIndex = $var.IndexOf($args)
#Lets say the ID is always 6 chars long, grab 6 characters prior using the substring method:
$var.substring($argsIndex-6,6)
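The same index-and-slice idea, sketched in Python for illustration. The function name id_before and the default length of 6 are assumptions taken from the example above, not anything from the question.

```python
from typing import Optional


def id_before(text: str, keyword: str, id_length: int = 6) -> Optional[str]:
    """Return the id_length characters immediately before keyword, if any."""
    index = text.find(keyword)
    # Not found, or not enough characters in front of the keyword.
    if index < id_length:
        return None
    return text[index - id_length:index]


print(id_before("ID=111111 foo ID=abc123KEY", "KEY"))
```

This only works when the ID has a fixed length and sits directly in front of the keyword, as the answer assumes.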

Advanced text replacement (cloze deletion)

Well, I'd like to replace specific texts based on text, yeah sounds funny, so here it is.
The problem is how to replace the tab-separated values. Essentially, what I'd like to do is replace the matching vocabulary string found on the sentence with {...}.
The value before the tab \t is the vocab, the value after the tab is the sentence. The value on the left of the \t is the first column, to its right is the second column
TL;DR Version (English Version)
Essentially, I want to replace the text on the second column based on the first Column.
Examples:
ABCD \t 19475ABCD_97jdhgbl
would turn into
ABCD \t 19475{...}_97jdhgbl
ABCD is the first column here and 19475ABCD_97jdhgbl is the second one.
If you don't get the context of the Long Version below, solving this ABCD problem would be fine by me. I think it's quite a simple code but given that it's been about 4 years since I last coded in C and I've only recently started learning python, I can't do it.
Long Version: (Japanese-specific text)
1. Case 1: (For pure Kanji)
全部 \t それ、全部ください。
would become
全部 \t それ、{...}ください。
2. Case 2: (For pure Kana)
ああ \t ああうるさい人は苦手です。
would become
ああ \t {...}うるさい人は苦手です。
あいづち \t 彼の話に私はあいづちを打ったの。
would become
あいづち \t 彼の話に私は{...}を打ったの。
For Case 1 and Case 2 it has to be exact matches, especially for kana because otherwise it might replace other kana in the sentence. The coding for Case 3 has to be different (see next).
3. Case 3: (for mixed Kana and Kanji)
This is the most complex one. Here I'd like the script/solution to change only the matching part, i.e., it should ignore what doesn't match and replace only what does. It should take the longest possible match and replace that.
上げる \t 彼は荷物をあみだなに上げた。
would become
上げる \t 彼は荷物をあみだなに{...}た。
Note here that the first column has 上げる but the second column has 上げた because it has changed in tense (First column has る while the second one has た).
So, Ideally the solution should take the longest string found in both columns, in this case it is 上げ, so this is the only string replaced with {...}, while it leaves た.
Another example
が増える \t 値段がが増える
would become
が増える \t 値段が{...}
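The "longest string found in both columns" rule from Case 3 can be sketched in Python with the standard library's difflib. This is only one possible approach, and the function name cloze is hypothetical.

```python
from difflib import SequenceMatcher


def cloze(vocab: str, sentence: str, placeholder: str = "{...}") -> str:
    """Replace the longest substring shared by vocab and sentence."""
    matcher = SequenceMatcher(None, vocab, sentence, autojunk=False)
    m = matcher.find_longest_match(0, len(vocab), 0, len(sentence))
    if m.size == 0:
        return sentence  # nothing in common: leave the sentence untouched
    # m.b is where the shared substring starts in the sentence.
    return sentence[:m.b] + placeholder + sentence[m.b + m.size:]


for vocab, sentence in [
    ("ABCD", "19475ABCD_97jdhgbl"),
    ("上げる", "彼は荷物をあみだなに上げた。"),
    ("が増える", "値段がが増える"),
]:
    print(f"{vocab}\t{cloze(vocab, sentence)}")
```

Note that this sketch will also replace a single shared character, so for Cases 1 and 2, where exact matches are required, a separate check (for example, requiring that the whole vocab string is found) would still be needed.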
More TL;DR
I'm actually using this for Anki.
I could use excel or notepad++ but I don't think they could replace text based on placeholders.
My goal here is to create pseudo-cloze sentences that I can use as hints hidden in a hint field only to be used for ridiculously hard synonyms or homonyms (I have an Auditory card).
I know I'm missing a fourth case, i.e., pure kana with the possibility of a sentence having changed its tense, hence its spelling. Well, that'd be really hard to code so I'd rather do it manually so as not to mess up the other kana in the sentence.
Update
I forgot to say that the text is contained in a .txt file in this format:
全部 \t それ、全部ください。
ああ \t ああうるさい人は苦手です。
あいづち \t 彼の話に私はあいづちを打ったの。
上げる \t 彼は荷物をあみだなに上げた。
There are about 7000 lines of those things so it has to check the replacements for every line.
Code works, thanks, just a minor bug with sentences including non-full replacements, it creates broken characters.
上げたxxxx 彼は荷物をあみだなに上げあ。
ABCD ABCD123
86876 xx86876h897
全部 それ、全部ください
ああ ああうるさい人は苦手です。
上げたxxxx 彼は荷物をあみだなに上げあ。
務める ああうるさい人は苦手で務めす。
務める ああうるさい務めす人は苦手で。
turns into:
I just edited James' code a bit for testing purposes (I'm using this edited version to check what kinds of strings throw off the code).
So far I've discovered that spaces in the vocabulary could cause some trouble.
This code prints the original line below the parsed line.
Just change this line:
fout.write(output)
to this
fout.write(output+str(line)+'\n')

This regex should deal with the cases you are looking for (including matching the longest possible pattern in the first column):
^(\S+)(\S*?)\s+?(\S*?(\1)\S*?)$
You can then go on to use the match groups to make the specific replacement you are looking for. Here is an example solution in python:
import re
regex = re.compile(r'^(\S+)(\S*?)\s+?(\S*?(\1)\S*?)$')
with open('output.txt', 'w', encoding='utf-8') as fout:
    with open('file.txt', 'r', encoding='utf-8') as fin:
        for line in fin:
            match = regex.match(line)
            if match:
                # group(1) is the longest part of the vocab found in the sentence
                hint = match.group(3).replace(match.group(1), '{...}')
                output = '{0}\t{1}\n'.format(match.group(1) + match.group(2), hint)
                fout.write(output)

find a pattern in string and remove that pattern of the string from excel cells without touching the pattern in the middle of the string

I have a column which has "--" pattern in the beginning, middle and end of the string. For example:
-- myString
my -- String
myString --
I want to find these two types of cells
-- myString
myString --
and remove the "--" pattern, so it will look fine! I am an amateur user of excel but can use functions if you suggest me. It should be possible with find and use the results of the Find in Replace functions, but I do not know how to pass the results to Replace.
Please note: The answer should take care all the cells in the column, which are hundreds. One solution for changing all, not one solution for one cell.

EDIT: Just reread the request, per instruction from Gary'sStudent. This will remove all instances of "--", not only those at the beginning/end.
If the data is in A1, use the following formula:
=SUBSTITUTE(A1,"--","")

With data in A1 in B1 enter:
=IF(LEFT(A1,2)="--",MID(A1,3,9999),IF(RIGHT(A1,2)="--",MID(A1,1,LEN(A1)-2),A1))
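For comparison outside Excel, the logic of this formula can be sketched in Python. The function name is hypothetical, and the behaviour deliberately mirrors the formula, including the leftover space next to the removed "--".

```python
def strip_edge_dashes(cell: str) -> str:
    """Remove a leading "--", else a trailing "--"; leave the middle alone."""
    if cell.startswith("--"):
        return cell[2:]   # like MID(A1,3,9999)
    if cell.endswith("--"):
        return cell[:-2]  # like MID(A1,1,LEN(A1)-2)
    return cell


for cell in ["-- myString", "my -- String", "myString --"]:
    print(repr(strip_edge_dashes(cell)))
```

As with the formula, a "--" in the middle of the string is never touched.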

OK, I found the answer. The answer from @Dubison helped me to find the right one.
If the first two characters of the cell are "--" or the last two characters are "--", then substitute the "--" with ""; otherwise do nothing.
=IF(LEFT(A1,2)="--",SUBSTITUTE(A1,"--",""),IF(RIGHT(A1,2)="--",SUBSTITUTE(A1,"--",""), A1))

This is pretty much the same as the previous answers, only using simpler logic. If your string's first or last character is "-", do nothing; otherwise replace "--" with "".
=IF(LEFT(A1,1)="-",A1,IF(RIGHT(A1,1)="-",A1, SUBSTITUTE(A1,"--","")))
UPDATE:
I noticed that I have misread the question. Above code will remove the "--" only if it is in the middle. However original question was to remove "--" only if it is at the beginning or at the end. So formula should be:
=IF(OR(LEFT(A1,2)="--",RIGHT(A1,2)="--"),SUBSTITUTE(A1,"--",""),A1)

Regular Expression to find string in Expect buffer

I'm trying to find a regex that works to match a string of escape characters (an Expect response, see this question) and a six digit number (with alpha-numeric first character).
Here's the whole string I need to identify:
\r\n\u001b[1;14HX76196
Ultimately I need to extract the string:
X76196
Here's what I have already:
interact {
    #...
    #...
    # this expression does not identify the screen location
    # I need to find "\r\n\u001b[1;14H" AND "([a-zA-Z0-9]{1})[0-9]{5}$"
    # This regex was what I was using before.
    -nobuffer -re {^([a-zA-Z0-9]{1})?[0-9]{5}$} {
        set number $interact_out(0,string)
    }
}
I need to identify the escape characters to verify that it is a field in that screen region. So I need a regex that includes that first portion, but the backslashes are confusing me...
Also once I have the full string in the $number variable, how do I isolate just the number in another variable in Tcl?

If you just want the number at the end, then this should be enough...
[0-9]{6}
Update with new information
Assuming \n is a newline character, rather than a literal \ followed by a literal n, you can do this...
\r\n\u001B\[1;14H(X[0-9]{5})
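To illustrate how this pattern behaves, here is a small sketch in Python rather than Expect. It assumes the buffer really contains a carriage return, a newline and an ESC character (U+001B), and the names buffer and FIELD_RE are illustrative.

```python
import re

# The buffer as it would arrive: CR, LF, ESC, then the cursor-position
# sequence "[1;14H" followed by the field value.
buffer = "\r\n\x1b[1;14HX76196"

# The literal "[" must be escaped; the capture group isolates the number.
FIELD_RE = re.compile(r"\r\n\x1b\[1;14H(X[0-9]{5})")

match = FIELD_RE.search(buffer)
number = match.group(1) if match else None
print(number)
```

The capture group is what separates the number from the escape sequence, so no fixed character offsets are needed.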

I found out a few things with some more digging. First of all I wasn't looking at the output of the program but the input of the user. I needed to add the "-o" flag to look at the program output. I also shortened the regex to just the necessary part.
The regex example from @rikh led me to look at why both his regex and my own were failing, and that was because I wasn't looking at the output but at the input. So the original regex that I tried wasn't at fault; the problem was the data being looked at (the "-o" flag was missing).
Here's the complete answer to my problem.
interact {
    #...
    -o -nobuffer -re {(\[1;14H[a-zA-Z0-9]{1})[0-9]{5}} {
        #get number in place
        set numraw $interact_out(0,string)
        #get just number out
        set num [string range $numraw 6 11]
        #switch to lowercase
        set num [string tolower $num]
        send_user " stored number: $num"
    }
}
I'm a noob with Expect and Tcl so if any of this doesn't make sense or if you have any more insights into the interact flags, please set me straight.
