Find and Replace first character after a certain pattern - regex

Current text
Variable length text = some string(some more text
Change to
Variable length text = some string(addition some more text
I need to add some text after the first parenthesis on a line, but only once an "=" character has been encountered. Another condition is to ignore patterns like "= (", i.e. cases where there is only whitespace between "=" and "(".
My Try:
sed -e "s#*=(\w\()#\1(addition#g"
Thanks in anticipation!

Tweak this for your needs:
$ echo 'Variable length text = some string(some more text' |\
sed 's/^[^=]*=[^(]*[[:alnum:]][^(]*(/&addition /'
That matches for:
Beginning of the string
Anything but = any number of times
=
Anything but ( any number of times
An alpha-numeric character
Anything but ( any number of times
(
... and substitutes it with the matched string (&), appending 'addition ' to it.
The output is
Variable length text = some string(addition some more text
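For readers who are not working in sed, the same pattern can be sketched in Python. This is only an illustration: the function name add_after_paren and its addition parameter are made up here, and [[:alnum:]] is approximated by the ASCII class [A-Za-z0-9].

```python
import re

# Mirrors the sed pattern above: everything up to the first "=", then text
# that contains at least one alphanumeric character, then the first "(".
PATTERN = re.compile(r'^[^=]*=[^(]*[A-Za-z0-9][^(]*\(')

def add_after_paren(line: str, addition: str = "addition ") -> str:
    # A callable replacement keeps `addition` literal (no backslash processing).
    return PATTERN.sub(lambda m: m.group(0) + addition, line, count=1)

print(add_after_paren("Variable length text = some string(some more text"))
```

A line such as "x = (foo" is left unchanged, because nothing alphanumeric sits between "=" and "(".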

In Perl:
s/(.*?=\s[^(]+?)\((.*)/$1(addition $2/

Replace a sequence of characters with a sequence of different characters of same length using regular expressions

I have a string which starts with spaces. I want to replace the leading spaces with equal number of dashes -. I don't want to replace any other spaces which may occur elsewhere in the string.
If I use /^\s*/-/, it replaces all the leading spaces with a single dash. If I use /^\s/-/, it only replaces the first space with a dash. If I remove the anchor and use /\s/-/, it replaces every occurrence of a space in the string, which is not acceptable.
My string looks like this in general:
<n-leading-spaces><a-non-space-character><remaining-characters>
Example (pipes added to show the boundary):
|   ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
After substitution (pipes added to show the boundary):
|---ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
NOTE: I cannot use any code snippet. I just want to know whether this can be done using just regex patterns. (Forgive my formatting as I'm new to markdown. I welcome formatting corrections)
You can use the following solution to replace a sequence of characters with a sequence of different characters of same length using regular expressions:
my $string = '    ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn ';
$string =~ s/^(\s+)/"-" x length($1)/eg;
print $string;
Returns '----ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn '
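The same idea (compute the replacement from the length of the match) can be sketched in Python. Note that this is not a pure-regex solution either: Python's re module needs a callable to build a replacement of matching length. The function name dash_leading_whitespace is just an illustrative choice.

```python
import re

def dash_leading_whitespace(text: str) -> str:
    # ^\s+ matches only the run of whitespace at the very start of the string;
    # the callable returns one dash per matched character.
    return re.sub(r'^\s+', lambda m: '-' * len(m.group(0)), text)

print(repr(dash_leading_whitespace('   ajfn ssfdjn ng jnv ')))
```

Spaces elsewhere in the string are untouched because the pattern is anchored at the start.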

How to filter a string for invalid filename characters using regex

I don't want the user to type in anything invalid, so I am trying to strip it out. My problem is that the regex I wrote removes everything except word characters, so it also removes . , and -, but I need those characters to keep the user happy :D
In a short summary: This Script removes bad characters in an input field using a regex.
Input field:
$CustomerInbox = New-Object System.Windows.Forms.TextBox #initialization -> initializes the input box
$CustomerInbox.Location = New-Object System.Drawing.Size(10,120) #Location -> where the label is located in the window
$CustomerInbox.Size = New-Object System.Drawing.Size(260,20) #Size -> defines the size of the inputbox
$CustomerInbox.MaxLength = 30 #sets max. length of the input box to 30
$CustomerInbox.add_TextChanged($CustomerInbox_OnTextEnter)
$objForm.Controls.Add($CustomerInbox) #adding -> adds the input box to the window
Function:
$ResearchGroupInbox_OnTextEnter = {
    if ($ResearchGroupInbox.Text -notmatch '^\w{1,6}$') { #regex (Regular Expression) to check whether the text consists of 1 to 6 word characters
        $ResearchGroupInbox.Text = $ResearchGroupInbox.Text -replace '\W' #removes all non-word characters!
    }
}
Bad Characters I don't want to appear:
~ " # % & * : < > ? / \ { | } #those are the 'bad characters'
Note that if you want to replace invalid file name chars, you could leverage the solution from How to strip illegal characters before trying to save filenames?
Answering your question, if you have specific characters, put them into a character class, do not use a generic \W that also matches a lot more characters.
Use
[~"#%&*:<>?/\\{|}]+
Note that none of these characters except \ needs escaping inside a character class. Also, adding the + quantifier (which matches one or more occurrences of the quantified subpattern) streamlines the replacement: whole runs of consecutive characters are matched and replaced at once with the replacement pattern (here, an empty string).
Note you may also need to account for filenames like con, lpt1, etc.
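To ensure the filename is valid, you should use the GetInvalidFileNameChars .NET method to retrieve all invalid characters and use a regex to check whether the filename contains any of them. The -join turns the char array into a single string before escaping; without it, PowerShell joins the characters with spaces, so a space would be treated as invalid too:
[regex]$containsInvalidCharacter = '[{0}]' -f ([regex]::Escape(-join [System.IO.Path]::GetInvalidFileNameChars()))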
if ($containsInvalidCharacter.IsMatch(($ResearchGroupInbox.Text)))
{
# filename is invalid...
}
$ResearchGroupInbox.Text -replace '~|"|#|%|\&|\*|:|<|>|\?|\/|\\|{|\||}'
Or, as @Wiketor suggests, you can simplify it to '[~"#%&*:<>?/\\{|}]+'
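As a language-neutral illustration of the character-class approach, here is a minimal Python sketch. It only covers the specific "bad characters" listed in the question, not every character the operating system rejects, and the name strip_bad_chars is made up for this example.

```python
import re

# The character class from the answer above; only the backslash needs escaping.
BAD_CHARS = re.compile(r'[~"#%&*:<>?/\\{|}]+')

def strip_bad_chars(text: str) -> str:
    # Each run of consecutive bad characters is removed in one replacement.
    return BAD_CHARS.sub('', text)

print(strip_bad_chars('re~port: v1.2, final-draft?'))
```

The characters . , and - survive because they are simply not in the class, unlike with \W.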

Insert space in string using powershell?

I have a number of files, each with a number of lines of plain text in them and I'd like to insert a space between certain "words".
I have no problem looping the files, or replacing some text with different text, but not sure how to keep the existing text when I do so!
$ln = "20142301 Starting_LOC1SVR14"
$newln = $ln -replace "_", " "
$newln = $newln -replace "LOC[0-9]","????"
Given this sample, I want to insert the space between LOC1 and SVR14 to give LOC1 SVR14
Note that the LOC number goes up to 16, but I can write the regex for one or more digits; it's keeping that LOC1 part that's giving me the headache!
Look up regex capture groups and substitution tokens. They save the matched text so it can be reused in the replacement, like so:
$newln -replace 'LOC[0-9]+', '$0 ' # $0 is the whole match, so this replaces the match with "match "; the single quotes stop PowerShell from expanding $0 as a variable
20142301 Starting LOC1 SVR14
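The "reuse the whole match in the replacement" idea is not specific to PowerShell. A minimal Python sketch, where \g<0> plays the role of $0 and the function name split_loc is only illustrative:

```python
import re

def split_loc(line: str) -> str:
    # First swap underscores for spaces, as in the question.
    line = line.replace("_", " ")
    # \g<0> is the entire match, so "LOC1" becomes "LOC1 ".
    return re.sub(r'LOC[0-9]+', r'\g<0> ', line)

print(split_loc("20142301 Starting_LOC1SVR14"))
```

Because the pattern uses [0-9]+, two-digit values such as LOC16 are kept whole before the space is inserted.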

Extract text between single quotes in MATLAB

I have multiple lines in some text files such as
.model sdata1 s tstonefile='../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p' passive=2
I want to extract the text between the single quotes in MATLAB.
Much help would be appreciated.
To get all of the text inside multiple '' blocks, regexp can be used as follows:
regexp(txt,'''(.[^'']*)''','tokens')
This says to get text surrounded by ' characters, where the captured text itself does not include a '. For example, consider this file with two lines (I made up a different file name for the second one),
txt = ['.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2 ', char(10), ...
'.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2']
>> stringCell = regexp(txt,'''(.[^'']*)''','tokens');
>> stringCell{:}
ans =
'../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'
ans =
'../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'
>>
Trivia:
char(10) gives a newline character because 10 is the ASCII code for newline.
The . character in a regexp (regex in the rest of the coding world) pattern usually does not match a newline, which would make this a safer pattern. In MATLAB, a dot in regexp does match a newline by default, so to disable this we could add 'dotexceptnewline' as the last input argument to regexp. This is a convenient way to ensure we don't capture the text outside of the quotes instead, but it is not needed here, since matching starts at the first quote.
Instead of excluding a ' from the match with [^''], the match can be made non-greedy with ? as follows, regexp(txt,'''(.*?)''','tokens').
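For comparison outside MATLAB, the "quote, anything but a quote, quote" pattern can be sketched in Python. The sample text and the function name quoted_fields are made up for illustration.

```python
import re

def quoted_fields(text: str) -> list:
    # '([^']*)' captures everything between a pair of single quotes;
    # excluding the quote inside the group keeps each match within one pair.
    return re.findall(r"'([^']*)'", text)

txt = (".model sdata1 s tstonefile='../data/a.s50p' passive=2\n"
       ".model sdata1 s tstonefile='../data/b.s00p' passive=2")
print(quoted_fields(txt))
```

With a single capture group, re.findall returns just the captured strings, which is roughly what the 'tokens' option does in MATLAB.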
If you plan to use textscan:
fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','''');
fclose(fid);
output = rawdata{:}(2)
As also used in other answers, the single apostrophe ' is represented by a doubled one, '', e.g. for delimiters.
Considering the comment:
fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','\n');
fclose(fid);
lines = rawdata{1,1};
L = size(lines,1);
output = cell(L,1);
for ii=1:L
temp = textscan(lines{ii},'%s','delimiter','''');
output{ii,1} = temp{:}(2);
end
One easy way is to split the string with single quote delimiter and take the even-numbered strings in the output:
str = fileread('test.txt');
out = regexp(str, '''', 'split');
out = out(2:2:end);
You can do this using regular expressions. Assuming that there is only one occurrence of text between quotation marks:
% select all chars between single quotation marks.
out = regexp(inputString,'''(.*)''','tokens','once');
After identifying which lines you want to extract info from, you could tokenize them or do something like this if they all have the same form:
test='.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2';
a=strfind(test,'''')
test=test(a(1)+1:a(2)-1) % take only the text between the two quotes, excluding the quotes themselves

help with regex - extracting text

Suppose I have some text files (f1.txt, f2.txt, ...) that looks something like
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}
I want to extract the content of title and store it in a bash variable (call it $title), that is, "some {T}itle" in the example. Notice that there may be curly braces in the first set of braces. Also, there might not be white space around "=", and there may be more white spaces before "title".
Thanks so much. I just need a working example of how to extract this and I can extract the other stuff.
Give this a try:
title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' inputfile)
Explanation:
/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - If a line matches this regex
s/// - delete the matched portion
s/}[^}]*$//p - delete the last closing curly brace and every character that's not a closing curly brace until the end of the line and print
} - end if
title=$(sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}' ./f1.txt)
/title *=/: Only act upon lines which have the word 'title' followed by a '=' after an arbitrary number of spaces
s/^[^{]*{\([^,]*\),.*$/\1/: From the beginning of the line look for the first '{' character. From that point save everything you find until you hit a comma ','. Replace the entire line with everything you saved
s/} *$//p: strip off the trailing brace '}' along with any spaces and print the result.
title=$(sed -n ... ): save the result of the above 3 steps in the bash variable named title
There are definitely more elegant ways, but at 2:40AM:
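title=$(grep '^\s*title\s*=' test | sed -E 's/^\s*title\s*=\s*\{?//; s/\}?\s*,\s*$//')
Grep for the line that interests us, strip everything up to and including the opening curly brace, then strip everything from the last curly brace to the end of the line. The -E flag is needed so that ? acts as a quantifier; in sed's default basic syntax it is a literal question mark. \s is a GNU extension.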
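If a scripting language is an option, the same extraction can be sketched in Python. This is a minimal sketch that assumes the title field sits on a single line, as in the example; TITLE_RE and extract_title are names chosen for illustration.

```python
import re
from typing import Optional

# Anchored at the start of a line so that e.g. "booktitle" is not matched.
# The greedy (.*) runs to the last "}" on the line, so nested braces survive.
TITLE_RE = re.compile(
    r'^[ \t]*title[ \t]*=[ \t]*\{(.*)\}[ \t]*,?[ \t]*$',
    re.MULTILINE,
)

def extract_title(bibtex: str) -> Optional[str]:
    m = TITLE_RE.search(bibtex)
    return m.group(1) if m else None

entry = "@article {paper1,\nauthor = {some author},\n   title = {some {T}itle} ,\njournal = {journal},\n}"
print(extract_title(entry))
```

Unlike the comma-based sed variant, this does not break when the title itself contains a comma, but it would need extending for titles that span several lines.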