How to grab a letter after ';' with regular expressions? - regex

How can I grab a letter after ; using regular expressions? For example:
c ; d
e ; f ; m ; k ; s
import re
f = open('file.txt')
regex = re.compile(r"(?<=\; )\w+")
for line in f:
    match = regex.search(line)
    if match:
        print(match.group())
This code only grabs d and f. I need the output to look like:
d
f
m
k
s

Replace all occurrences of "; " with a newline character and trim all spaces from the ends of every line.

Use a regex similar to this if you want to "blacklist" the ";" character:
[;]
I don't know much about Python, but here is how you would use it in JavaScript:
var desired_chars = myString.replace(/[;]/gi, '')

Instead of regex.search use regex.findall. That'll give you a list of matches for each line which you can then manipulate and print on separate lines.
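
A minimal sketch of the findall approach, assuming the two sample lines from the question are held in a list rather than read from file.txt. The helper name letters_after_semicolon is made up for illustration; the pattern is the question's lookbehind without the unneeded backslash.

```python
import re

# Lookbehind for "; " followed by one or more word characters.
regex = re.compile(r"(?<=; )\w+")

def letters_after_semicolon(lines):
    """Collect every token that directly follows '; ' on any line."""
    found = []
    for line in lines:
        # findall returns all non-overlapping matches, not just the first.
        found.extend(regex.findall(line))
    return found

sample = ["c ; d", "e ; f ; m ; k ; s"]
for token in letters_after_semicolon(sample):
    print(token)
```

With the sample lines this prints d, f, m, k and s, one per line. To read from a file instead, pass the open file object in place of the list.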

Related

OCaml regex being buggy when trying to use escape characters

I'm trying to write a lexer for a variation on C using OCaml. For the lexer I need to match the strings "^" and "||" (as the exponentiation and or symbols respectively). Both of these are special characters in regex, and when I try to escape them using the backslash, nothing changes and the code runs as if "\^" was still beginning of line and "\|\|" was still "or or". What can I do to fix this?
Backslash characters in string literals have to be doubled to get them past the OCaml string parser:
# let r = Str.regexp "\\^" in
  Str.search_forward r "FOO^BAR" 0;;
- : int = 3
If you are using OCaml 4.02 or later, you can also use quoted strings ({| ... |}), which do not handle a backslash character specially. This may result in more readable code because backslash characters do not have to be doubled:
# let r = Str.regexp {|\^|} in
  Str.search_forward r "FOO^BAR" 0;;
- : int = 3
Or you may consider using Str.regexp_string (or Str.quote), which creates a regular expression that will match all characters in its argument literally:
# let r = Str.regexp_string "^" in
  Str.search_forward r "FOO^BAR" 0;;
- : int = 3
The Str module does not take | as a special regex character, so you do not have to worry about quoting when you want to use it literally:
# let r = Str.regexp "||" in
  Str.search_forward r "FOO||BAR" 0;;
- : int = 3
| has to be quoted only when you want to use it as the "or" construct:
# let r = Str.regexp "BAZ\\|BAR" in
  Str.search_forward r "FOOBAR" 0;;
- : int = 3
You might want to refer to Str.regexp for the full syntax of regular expressions.

Avoiding "double search" when using regexp matching groups

Is there a more efficient way of doing this:
if re.search("(?P<value>[0-9]*[.][0-9]*) (?P<units>KB|MB|GB|TB|PB)", line):
    m = re.search("(?P<value>[0-9]*[.][0-9]*) (?P<units>KB|MB|GB|TB|PB)", line)
    self.capacity = convert_to_bytes(m.group("units"), m.group("value"))
In C and other languages you can write something like this and avoid executing the search twice:
if m = re.search("(?P<value>[0-9]*[.][0-9]*) (?P<units>KB|MB|GB|TB|PB)", line):
    self.capacity = convert_to_bytes(m.group("units"), m.group("value"))
Is this a better way of doing this?
m = re.search("(?P<value>[0-9]*[.][0-9]*) (?P<units>KB|MB|GB|TB|PB)", line)
if m:
    self.capacity = convert_to_bytes(m.group("units"), m.group("value"))
As @Christian Aichinger says, this is the correct way, but I'd trim the regex a bit:
m = re.search("(?i)(?P<value>[0-9]+(?:\\.[0-9]+)?) (?P<units>KB|MB|GB|TB|PB)", line)
if m:
    self.capacity = convert_to_bytes(m.group("units"), m.group("value"))
Now, [0-9]+(?:\\.[0-9]+)? matches one or more digits, optionally followed by a decimal fraction. If your input uses another decimal separator, or if you want to allow a thousands separator, use a character class such as [0-9]+(?:[., ][0-9]+)? instead (in Russian or Polish, a space is a valid thousands separator).
Making the pattern case-insensitive is also a good idea, so that it matches 1,000 kb as well.
Sample code:
import re
line = "1.56 kb"
m = re.search("(?i)(?P<value>[0-9]+(?:\\.[0-9]+)?) (?P<units>KB|MB|GB|TB|PB)", line)
if m:
    print(m.group("units") + " - " + m.group("value"))
Output:
kb - 1.56
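
On Python 3.8 or later, an assignment expression (the := operator) gives the C-style "assign and test" form the question asks about. The sketch below is only an illustration: parse_capacity and CAPACITY_RE are hypothetical names, and it returns the parsed pieces instead of calling the question's convert_to_bytes, which is not shown.

```python
import re

# Compile once so the pattern is not re-parsed for every line.
CAPACITY_RE = re.compile(
    r"(?P<value>[0-9]+(?:\.[0-9]+)?) (?P<units>KB|MB|GB|TB|PB)",
    re.IGNORECASE,
)

def parse_capacity(line):
    """Return (value, units) if the line contains a capacity, else None."""
    # The walrus operator binds m and tests it in a single expression.
    if (m := CAPACITY_RE.search(line)) is not None:
        return float(m.group("value")), m.group("units").upper()
    return None

print(parse_capacity("1.56 kb"))
print(parse_capacity("no size here"))
```

On older Python versions the two-line form (assign m, then test it) remains the way to go.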

Python Replacement of Shortcodes using Regular Expressions

I have a string that looks like this:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
And I need it to be converted into this:
new_str = "This sentence has a <b>bolded</b> word, and <b>another</b> one too!"
Is it possible to use Python's string.replace or re.sub method to do this intelligently?
Capture all the characters before | inside [] into one group, and the part after | into another group. Then refer to the captured groups through backreferences in the replacement string to get the desired output.
Regex:
\[([^\[\]|]*)\|([^\[\]]*)\]
Replacement string:
<\1>\2</\1>
DEMO
>>> import re
>>> s = "This sentence has a [b|bolded] word, and [b|another] one too!"
>>> m = re.sub(r'\[([^\[\]|]*)\|([^\[\]]*)\]', r'<\1>\2</\1>', s)
>>> m
'This sentence has a <b>bolded</b> word, and <b>another</b> one too!'
Try this expression: [[]b[|](\w+)[]]. A shorter version is \[b\|(\w+)\].
The expression looks for anything that starts with [b| and captures what lies between it and the closing ] using \w+, which means [a-zA-Z0-9_]. To include a wider range of characters, you can use .*? instead of \w+, which gives \[b\|(.*?)\].
Sample Demo:
import re
p = re.compile(r'[[]b[|](\w+)[]]')
test_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
# Python backreferences are written \1, not $1
subst = r"<bold>\1</bold>"
result = re.sub(p, subst, test_str)
print(result)
Output:
This sentence has a <bold>bolded</bold> word, and <bold>another</bold> one too!
Just for reference, in case you don't want two problems:
Quick answer to your particular problem:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
print(my_str.replace("[b|", "<b>").replace("]", "</b>"))
# output:
# This sentence has a <b>bolded</b> word, and <b>another</b> one too!
This has the flaw that it replaces every ] with </b>, regardless of whether that is appropriate. So you might want to consider the following:
Generalize and wrap it in a function
def replace_stuff(s, char):
    begin = s.find("[{}|".format(char))
    while begin != -1:
        end = s.find("]", begin)
        if end == -1:
            break  # unterminated shortcode: stop instead of looping forever
        s = s[:begin] + s[begin:end+1].replace("[{}|".format(char),
            "<{}>".format(char)).replace("]", "</{}>".format(char)) + s[end+1:]
        begin = s.find("[{}|".format(char))
    return s
For example
s = "Don't forget to [b|initialize] [code|void toUpper(char const *s)]."
print(replace_stuff(s, "code"))
# output:
# Don't forget to [b|initialize] <code>void toUpper(char const *s)</code>.

Error with regex, match numbers

I have the string 00000001001300000708303939313833313932E2
and I want to match everything between 708 and E2.
So I wrote:
(?<=708)(.*\n?)(?=E2) - tested in RegExr (it works)
Now, from that result, 303939313833313932, I want to take
every second digit to get:
099183192
How?
To match everything between 708 and E2, use:
708(\d+)
if you are sure that there will be only digits. Otherwise try with:
708(.*?)E2
To match every second digit from 303939313833313932, use:
(?:\d(\d))+
Use a global replace:
find: \d(\d)
replace: $1
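
The question does not name a language, so here is a sketch of both steps in Python, purely as an assumption. The function name every_second_digit is made up for illustration. Note that Python writes the backreference as \1 where the find/replace notation above uses $1.

```python
import re

def every_second_digit(text):
    """Take what lies between 708 and E2, then keep every second digit."""
    m = re.search(r"708(.*?)E2", text)
    if m is None:
        return None
    # Each match consumes two digits and keeps only the second one.
    return re.sub(r"\d(\d)", r"\1", m.group(1))

s = "00000001001300000708303939313833313932E2"
print(every_second_digit(s))
```

For the string from the question this prints 099183192. If the captured part had an odd number of digits, the last digit would be left in place, because the pattern only matches digit pairs.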
Are you expecting a regular expression answer to this?
You are perhaps better off doing this using string operations in whatever programming language you're using. If you have text = "abcdefghi..." then do output = text[0] + text[2] + text[4]... in a loop, until you run out of characters.
You haven't specified a programming language, but in Python I would do something like:
>>> text = "abcdefghjiklmnop"
>>> for n, char in enumerate(text):
...     if n % 2 == 0:  # every second char
...         print(char)
...
a
c
e
g
j
k
m
o
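
In Python the same selection can also be written with slice notation, which avoids the explicit loop. This is a small sketch; the helper names are hypothetical. The start index decides which half you get: 0 reproduces the loop above, 1 gives the digits the question asks for.

```python
def even_positions(text):
    """Characters at indices 0, 2, 4, ... (same as the loop above)."""
    return text[::2]

def odd_positions(text):
    """Characters at indices 1, 3, 5, ... (every second one, starting with the second)."""
    return text[1::2]

print(even_positions("abcdefghjiklmnop"))
print(odd_positions("303939313833313932"))
```

The first call prints acegjkmo and the second prints 099183192.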

How to match word where count of characters same

Please help with the following.
I need to match only words in which the given characters all occur the same number of times,
for example the same count for a, b and c:
abc     match, 1 (abc)
aabbcc  match, 2 (abc)
adabb   no match, 2 (ab)
ttt     match, 0 (abc)
Why are you using regular expressions for this? Regular expressions are the right tool for some jobs but they are overused where plain old string processing would do the trick, possibly with greater clarity or efficiency. Here's a sample implemented in Python:
def matchCount(inputString, lettersToMatch, count):
    matches = []
    wordsArray = inputString.split()
    for word in wordsArray:
        # Tally how often each letter occurs in this word.
        letterCounts = {}
        for letter in word:
            if letter in letterCounts:
                letterCounts[letter] += 1
            else:
                letterCounts[letter] = 1
        allCorrect = True
        for letter in lettersToMatch:
            # A letter missing from the word cannot have the required count.
            if letter not in letterCounts:
                allCorrect = False
            elif letterCounts[letter] != count:
                allCorrect = False
            if not allCorrect:
                break
        if allCorrect:
            matches.append(word)
    return matches
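
If the count is not known in advance, a variant can simply check that all the letters of interest occur equally often. The sketch below uses collections.Counter; the function names are hypothetical, and it assumes, as the question's ttt example suggests, that a word containing none of the letters counts as a match with a count of 0.

```python
from collections import Counter

def has_equal_counts(word, letters="abc"):
    """True if every letter in `letters` occurs equally often in `word`."""
    counts = Counter(word)  # letters absent from the word count as 0
    return len({counts[letter] for letter in letters}) == 1

def match_equal_counts(text, letters="abc"):
    """Return the whitespace-separated words that pass has_equal_counts."""
    return [word for word in text.split() if has_equal_counts(word, letters)]

print(match_equal_counts("abc aabbcc adabb ttt"))
```

For the four sample words from the question this keeps abc, aabbcc and ttt and rejects adabb, where a and b occur twice but c does not occur at all.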
You should use a recursive regular expression.
Below is the Perl code for matching the same number of 0s and 1s:
$regex = qr/0(??{$regex})*1/;
NB: for more background, please refer to Recursive Regular Expressions on Peteris Krumins's blog.