I'm trying to replace all characters between two characters.
This is gonna be my input string:
And I am trying to get this output:
This regex should give you the results you want. It looks for a K or c that is preceded by a K, c or < and followed by a K, c or < or the end of line:
You can use this with the re.MULTILINE flag to re.sub:
import re
s = re.sub(r'(?<=[Kc<])[Kc](?=[Kc<]|$)', '<', s, flags=re.MULTILINE)
If the \n in your string is a literal \n rather than a newline, just replace $ in the regex with \\n:
s = r'P<HRVSPECIMEN<<SPECIMENC<<<<<<<K<K<K<K<KKKKKK\n10070070071HRVB212258F1407019<<<<<c<c<<<<<<06'
s = re.sub(r'(?<=[Kc<])[Kc](?=[Kc<]|\\n)', '<', s, 0)
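As a minimal sketch of the lookbehind/lookahead substitution described above, the following applies the same pattern to a short two-line string. The sample value is made up for illustration and is not the original input:

```python
import re

# Hypothetical sample: two lines separated by a real newline.
sample = "KA<<K<KK\n12<c<c<<06"

# A K or c is replaced only when it is preceded by K, c or <
# and followed by K, c, < or the end of a line (re.MULTILINE).
pattern = r'(?<=[Kc<])[Kc](?=[Kc<]|$)'
cleaned = re.sub(pattern, '<', sample, flags=re.MULTILINE)

print(cleaned)
```

The leading K on the first line is left alone because nothing precedes it, so the lookbehind fails there.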
I have a list of strings, e.g.:
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
I want to remove the '-' when it is the first character and is followed by letters rather than digits. If there is a digit or letter before the '-' and letters after it, the '-' should be replaced with a space.
So for the list slist I want the output as
["args", "-111111", "20 args", "20 - 20", "20-10", "args deep"]
I have tried
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
nlist = list()
for estr in slist:
    nlist.append(re.sub("((^-[a-zA-Z])|([0-9]*-[a-zA-Z]))", "", estr))
print(nlist)
and I get the output
['rgs', '-111111', 'rgs', '20 - 20', '20-10', 'argseep']
You may use
nlist.append(re.sub(r"-(?=[a-zA-Z])", " ", estr).lstrip())
or, to match any Unicode letter rather than only ASCII letters:
nlist.append(re.sub(r"-(?=[^\W\d_])", " ", estr).lstrip())
Result: ['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']
The -(?=[a-zA-Z]) pattern matches a hyphen before an ASCII letter (-(?=[^\W\d_]) matches a hyphen before any letter), and replaces the match with a space. Since - may be matched at the start of a string, the space may appear at that position, so .lstrip() is used to remove the space(s) there.
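A self-contained sketch of the two lookahead patterns explained above might look like this. The helper name `dehyphenate` and its `ascii_only` parameter are hypothetical, introduced only for this illustration:

```python
import re

slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]

def dehyphenate(text, ascii_only=True):
    """Replace a hyphen that directly precedes a letter with a space."""
    # [^\W\d_] means "a word character that is neither a digit nor _",
    # i.e. any letter, including non-ASCII ones.
    pattern = r"-(?=[a-zA-Z])" if ascii_only else r"-(?=[^\W\d_])"
    # lstrip() removes the space left behind by a leading hyphen.
    return re.sub(pattern, " ", text).lstrip()

nlist = [dehyphenate(s) for s in slist]
print(nlist)

# "\u00e9" is the letter e with an acute accent, which only the
# Unicode-aware pattern treats as a letter.
print(dehyphenate("-\u00e9lan", ascii_only=False))
```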
Here, we might just want to capture the first letter after a starting -, then replace it with that letter only, maybe with an i flag expression similar to:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"^-([a-z])"
test_str = ("-args\n"
    "20 - 20\n")
subst = "\\1"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE | re.IGNORECASE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
const regex = /^-([a-z])/gmi;
const str = `-args
20 - 20`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
If this expression wasn't desired, it can be modified or changed in regex101.com.
One option is to do the replacement in two passes. First, match the hyphen at the start of the string when only letters follow it:
^-(?=[a-zA-Z]+$)
In the replacement, use an empty string.
Then capture one or more letters or digits in group 1, match the -, and capture one or more letters in group 2:
^([a-zA-Z0-9]+)-([a-zA-Z]+)$
In the replacement, use r"\1 \2".
For example
import re
regex1 = r"^-(?=[a-zA-Z]+$)"
regex2 = r"^([a-zA-Z0-9]+)-([a-zA-Z]+)$"
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
slist = list(map(lambda s: re.sub(regex2, r"\1 \2", re.sub(regex1, "", s)), slist))
print(slist)
['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']
How do I replace the following using Python?
NUM4*41*2*My Break Room Place*****6*1133337
I want to replace all characters after the first occurrence of '*'. Every character must be replaced except '*'.
Example input:
NUM4*41*2*My Break Room Place*****6*1133337
example output:
NUM4*11*1*11 11111 1111 11111*****1*1111111
Fairly simple: use a callback that returns group 1 unaltered if it matched, and otherwise returns the replacement character 1.
Note - this also would work in multi-line strings.
If you need that, just add (?m) to the beginning of the regex. (?m)(?:(^[^*]*\*)|[^*\s])
You'd probably want to test the string for the * character first.
( ^ [^*]* \* ) # (1), BOS/BOL up to first *
| # or,
[^*\s] # Not a * nor whitespace
import re
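def repl(m):
    # Keep the prefix up to the first * (group 1) unchanged
    if m.group(1):
        return m.group(1)
    return "1"

s = 'NUM4*41*2*My Break Room Place*****6*1133337'
# str.find() returns -1 (which is truthy) when nothing is found, so test membership instead
if '*' in s:
    newstr = re.sub(r'(^[^*]*\*)|[^*\s]', repl, s)
    print(newstr)
else:
    print('* not found in string')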
NUM4*11*1*11 11111 1111 11111*****1*1111111
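To illustrate the multi-line note above, here is a small sketch comparing the pattern with and without the (?m) prefix. The two-line input is a made-up example:

```python
import re

def repl(m):
    # Group 1 is the prefix up to and including the first *; keep it as-is.
    return m.group(1) if m.group(1) else "1"

text = "A*bc*d\nX*yz"  # hypothetical two-line input

# Without (?m), ^ only matches at the very start of the string,
# so the prefix of the second line is masked as well.
single = re.sub(r'(^[^*]*\*)|[^*\s]', repl, text)

# With (?m), ^ also matches after each newline, so every line
# keeps its own prefix up to its first *.
multi = re.sub(r'(?m)(^[^*]*\*)|[^*\s]', repl, text)

print(repr(single))
print(repr(multi))
```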
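If you want to use regex, you can use this one with re.sub: (?<=\*)[^\*]+. Note that with the replacement string '1', each run of characters between asterisks becomes a single 1, not one 1 per character.
import re
inputs = ['GSA*HC*11177*NYSfH-EfC*23130303*0313*1*R*033330103298',
    'NUM4*41*2*My Break Room Place*****6*1133337']
outputs = [re.sub(r'(?<=\*)[^\*]+', '1', inputline) for inputline in inputs]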
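If the goal is one replacement character per original character, as in the example output of the question, one possible variation is to pass a function as the replacement so each matched run is masked character by character. This is only a sketch, and the helper name `mask_after_first_star` is hypothetical:

```python
import re

def mask_after_first_star(line):
    """Mask every non-whitespace character in each run that follows a *."""
    return re.sub(
        r'(?<=\*)[^*]+',                          # a run of non-* right after a *
        lambda m: re.sub(r'\S', '1', m.group()),  # keep spaces, mask the rest
        line,
    )

masked = mask_after_first_star('NUM4*41*2*My Break Room Place*****6*1133337')
print(masked)
```

The text before the first asterisk is never matched, because the lookbehind requires a * immediately before the run.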
I have a string that looks like this:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
And I need it to be converted into this:
new_str = "This sentence has a <b>bolded</b> word, and <b>another</b> one too!"
Is it possible to use Python's string.replace or re.sub method to do this intelligently?
Capture the characters before the | inside the [] into one group, and the part after the | into another. Then refer to the captured groups by back-reference in the replacement string to get the desired output.
Regex: \[([^\[\]|]*)\|([^\[\]]*)\]
Replacement string: <\1>\2</\1>
>>> import re
>>> s = "This sentence has a [b|bolded] word, and [b|another] one too!"
>>> m = re.sub(r'\[([^\[\]|]*)\|([^\[\]]*)\]', r'<\1>\2</\1>', s)
>>> m
'This sentence has a <b>bolded</b> word, and <b>another</b> one too!'
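Because the tag name is itself a captured group, the same substitution should handle tags other than b. A short sketch, where the function name `to_html` and the sample sentences are assumptions made for illustration:

```python
import re

# Group 1: tag name (no brackets or |); group 2: content (no brackets).
TAG_PATTERN = re.compile(r'\[([^\[\]|]*)\|([^\[\]]*)\]')

def to_html(text):
    """Turn every [tag|content] into <tag>content</tag>."""
    return TAG_PATTERN.sub(r'<\1>\2</\1>', text)

converted = to_html("Don't forget to [b|initialize] [code|void toUpper(char const *s)].")
print(converted)

# Brackets without a | separator are left untouched.
print(to_html("See [note] for details."))
```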
Try this expression: [[]b[|](\w+)[]]. A shorter version is \[b\|(\w+)\].
The expression looks for anything that starts with [b| and captures what lies between that and the closing ] using \w+, which matches word characters ([a-zA-Z0-9_] for ASCII text). To allow a wider range of characters, use .*? instead of \w+, which gives \[b\|(.*?)\].
Sample Demo:
import re
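# The escaped form avoids the FutureWarning Python 3.7+ raises for a set starting with '['
p = re.compile(r'\[b\|(\w+)\]')
test_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
subst = r"<bold>\1</bold>"  # Python back-references are written \1, not $1
result = re.sub(p, subst, test_str)
print(result)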
This sentence has a <bold>bolded</bold> word, and <bold>another</bold> one too!
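The difference between \w+ and .*? mentioned above shows up as soon as the bracketed text contains a space. A minimal sketch with a made-up sentence:

```python
import re

text = "A [b|single] word and [b|two words] here."  # hypothetical input

# \w+ stops at the space, so the second marker does not match at all.
word_only = re.sub(r'\[b\|(\w+)\]', r'<b>\1</b>', text)

# .*? lazily takes everything up to the nearest closing bracket.
lazy_any = re.sub(r'\[b\|(.*?)\]', r'<b>\1</b>', text)

print(word_only)
print(lazy_any)
```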
Just for reference, in case you don't want two problems:
Quick answer to your particular problem:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
print(my_str.replace("[b|", "<b>").replace("]", "</b>"))
# output:
# This sentence has a <b>bolded</b> word, and <b>another</b> one too!
This has the flaw that it replaces every ] with </b>, regardless of whether that is appropriate. So you might want to consider the following:
Generalize and wrap it in a function
def replace_stuff(s, char):
    begin = s.find("[{}|".format(char))
    while begin != -1:
        end = s.find("]", begin)
        if end == -1:  # no closing bracket: stop instead of looping forever
            break
        s = s[:begin] + s[begin:end+1].replace("[{}|".format(char),
            "<{}>".format(char)).replace("]", "</{}>".format(char)) + s[end+1:]
        begin = s.find("[{}|".format(char))
    return s
For example
s = "Don't forget to [b|initialize] [code|void toUpper(char const *s)]."
print(replace_stuff(s, "code"))
# output:
# "Don't forget to [b|initialize] <code>void toUpper(char const *s)</code>."
How can I grab a letter after ; using regular expressions? For example:
c ; d
e ; f ; m ; k ; s
import re
f = open('file.txt')
regex = re.compile(r"(?<=\; )\w+")
for line in f:
    match = regex.search(line)
    if match:
        print(match.group())
This code only grabs d and f. I need the outcome to look like:
Replace all occurrences of "; " with a newline character and trim all spaces from the ends of every line.
Use a regex similar to this if you want to "blacklist" the ";" character:
I don't know much about Python, but here is how you would use it in JavaScript:
var desired_chars = myString.replace(/[;]/gi, '')
Instead of regex.search use regex.findall. That'll give you a list of matches for each line which you can then manipulate and print on separate lines.
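A minimal sketch of the findall suggestion, assuming the two sample lines from the question are held in a list rather than read from file.txt:

```python
import re

regex = re.compile(r"(?<=; )\w+")

# Stand-in for the lines of file.txt (an assumption for this sketch).
lines = ["c ; d", "e ; f ; m ; k ; s"]

# findall returns every match on a line, not just the first one.
found = [regex.findall(line) for line in lines]

for letters in found:
    for letter in letters:
        print(letter)
```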
I have a situation where I need to remove the last n numeric characters after a / character.
For example:
After the last /, I need the number 61 stripped out of the line so that the output is,
I tried using chop, but it removes only the last character, i.e. 1 in the above example.
The last part (61 above) can be anything, such as 221, 2, or 100. I need to strip out the trailing numeric characters after the /. Is this possible in Perl?
A regex substitution for removing the last digits:
my $str = '/iwmout/sourcelayer/iwm_service/iwm_ear_layer/pomoeron.xml##/main/lsr_int_vnl46a/61';
$str =~ s/\d+$//;
\d+ matches a series of digits, and $ matches the end of the line. They are replaced with the empty string.
@Tim's answer of $str =~ s/\d+$// is right on; however, if you wanted to strip the last n digit characters of a string but not necessarily all of the trailing digit characters you could do something like this:
my $s = "abc123456";
my $n = 3; # Just the last 3 chars.
$s =~ s/\d{$n}$//; # $s == "abc123"
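For readers following the Python examples elsewhere on this page, the two Perl substitutions above could be sketched roughly as follows. This is an assumed translation, not part of the original answers, and the function name `strip_trailing_digits` is hypothetical:

```python
import re

def strip_trailing_digits(text, n=None):
    """Remove all trailing digits, or exactly the last n digits if n is given."""
    # \d+$ mirrors s/\d+$//, and \d{n}$ mirrors s/\d{$n}$//.
    pattern = r'\d+$' if n is None else r'\d{%d}$' % n
    return re.sub(pattern, '', text)

print(strip_trailing_digits('/main/lsr_int_vnl46a/61'))
print(strip_trailing_digits('abc123456', 3))
```

If the string ends in fewer than n digits, the second form matches nothing and the string is returned unchanged.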
// Code to remove last n number of strings from a string.
// Import common lang jar
import org.apache.commons.lang3.StringUtils;
public class Hello {
public static void main(String[] args) {
String str = "Hello World";
System.out.println(StringUtils.removeEnd(str, "ld"));