I have been working on a Python project, but I don't have much experience. If I have the string Synset'dog.n.01' and want to extract only dog, what should I do?
I mean, how do I extract whatever string sits between Synset' and .n.01'?
I suggest using re (regex):
import re
s = "Synset'dog.n.01'"
# Escape the dots so they match literal periods rather than any character
result = re.search(r"Synset'(.*)\.n\.01'", s)
print(result.group(1))
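One case the snippet above doesn't cover is input that doesn't match: re.search then returns None, and calling .group(1) on it raises AttributeError. Here is a minimal sketch that guards against that. The helper name extract_name is made up for illustration, and the lazy (.*?) is my own choice rather than part of the answer.

```python
import re

# Dots are escaped so they match literal periods; (.*?) captures as little as possible
SYNSET_NAME = re.compile(r"Synset'(.*?)\.n\.01'")

def extract_name(text):
    """Return the text between Synset' and .n.01', or None if the pattern is absent."""
    match = SYNSET_NAME.search(text)
    return match.group(1) if match else None

print(extract_name("Synset'dog.n.01'"))  # dog
print(extract_name("no synset here"))    # None
```

Returning None keeps the caller in charge of what to do with strings that don't fit the expected shape.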
I have a log in which part of the text is a request ID that I need to extract.
Ex: RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book,
Can anyone please help me extract Book from it?
Consider string splitting instead
>>> s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
>>> s.split("_")[-1]
'Book'
String splitting seems more efficient here. If you must use regular expressions, here is an example:
#!/usr/bin/env python3
import re
print(
    re.findall(r"^\w+_\d+_(\w+)$", 'RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book')
)
# output: ['Book']
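If only the last field matters, there are a few ways to take it without scanning every underscore. This is a sketch using the request ID from the question; the variable names are just for illustration.

```python
import re

# The request ID from the question
request_id = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"

# Split once, from the right, and keep the last piece
last_by_rsplit = request_id.rsplit("_", 1)[-1]

# rpartition returns (head, separator, tail); the tail is the last field
last_by_rpartition = request_id.rpartition("_")[2]

# Regex alternative: everything after the final underscore
last_by_regex = re.search(r"[^_]+$", request_id).group()

print(last_by_rsplit, last_by_rpartition, last_by_regex)  # Book Book Book
```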
I'm trying to extract words from string content like
export {AbcClient} from ...
export {AdcClient} from ..
How can I use a regular expression to get an array of strings? In this example it is [AbcClient, AdcClient].
Thanks
Most programming languages have the ability to do a regex find all. In Python, we can try:
import re
inp = """export {AbcClient} from ...
export {AdcClient} from .."""
matches = re.findall(r'\bexport \{(.*?)\}', inp)
print(matches)
This prints:
['AbcClient', 'AdcClient']
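The ? in (.*?) matters as soon as two exports share a line. This small sketch compares the lazy and greedy forms on a made-up one-line input (the line itself is hypothetical, not from the question):

```python
import re

# Hypothetical input: both exports on a single line
line = "export {AbcClient} from a; export {AdcClient} from b"

# Lazy: stop at the first closing brace
lazy = re.findall(r"\bexport \{(.*?)\}", line)

# Greedy: run on to the last closing brace on the line
greedy = re.findall(r"\bexport \{(.*)\}", line)

print(lazy)    # ['AbcClient', 'AdcClient']
print(greedy)  # ['AbcClient} from a; export {AdcClient']
```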
I have stored a multiline string in Java, as shown in the code below. It prints:
aa
bb
hhh me $ hdddhd hhhdhhdhh
hrx
$
dddsss
I don't need the line starting with hhh me $, the lines in between, or anything up to the $.
I need the output to be
aa
bb
hrx
dddsss
I tried this in Eclipse:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class dummyFile {
    public static void main(String[] args) throws FileNotFoundException {
        String line = new StringBuilder()
            .append("aa\n\n")
            .append("bb\n\n")
            .append("hhh me $ hdddhd hhhdhhdhh\n\n")
            .append("hrx\n\n")
            .append("$\n\n")
            .append("dddsss")
            .toString();
        System.out.println(line);
        String pattern = "hhh me (.)";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);
        if (m.find())
        {
            System.out.println(m.group(1));
        }
        if (line.contains("hhh me "+ m.group(1)))
        {
            line.replace(
                line.substring(
                    line.indexOf("banner mod " +m.group(1)),
                    line.lastIndexOf(m.group(1))+1
                ),
                ""
            )
            .replace("\n\n", "\n");
        }
        System.out.println(line);
    }
}
Could someone please help?
Phew, that was a fun one (if you're insane like me!)
(?!.*?\$.*?)^.+?(?:\n\n|$).*?
You'll need the regex options global and multiline. For most regex instances that's just a matter of formatting it like:
/(?!.*?\$.*?)^.+?(?:\n\n|$).*?/gm
However, for Java there may be some options you need to supply; I'm not 100% sure.
That pattern will give you multiple matches, which you can glue back together with StringBuilder, for example.
If you REALLY want, I'll edit my answer and break down exactly what it's doing.
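As far as I can tell, the effect of that pattern is to keep only the non-empty lines that contain no $, which is what the expected output in the question shows. For comparison, here is that filter as a plain sketch in Python rather than Java, and without a regex. It is a different technique from the pattern above, not a translation of it.

```python
# The string built in the question
text = "aa\n\nbb\n\nhhh me $ hdddhd hhhdhhdhh\n\nhrx\n\n$\n\ndddsss"

# Keep non-empty lines that contain no dollar sign
kept = [line for line in text.splitlines() if line and "$" not in line]

result = "\n".join(kept)
print(result)
```

This prints aa, bb, hrx and dddsss on separate lines.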
This sounds a lot like homework that I don't want to do for you. But I'll throw some stuff up here that will hopefully help you figure it out.
Your regex isn't going to match what you want. (.) captures a single character, and it won't capture newline characters, so you'll have to fix that. + matches one or more of the previous character set, and * matches zero or more. It seems you also want to match from $ to $. Since $ is a regex metacharacter, you have to escape it, and because you're working inside a Java string, the backslash itself has to be doubled (\\$).
Try something like this for your regex:
final String pattern = "hhh me \\$([a-zA-Z\\s\n\r]*)\\$";
Then, in Eclipse or in the Java docs, look around the Matcher class for helpful methods to find or replace the matches you've got (a group is the stuff inside () in a regular expression).
Maybe something like Matcher.replaceFirst() will help.
I would like to extract "1381912680" from the following code:
[<abbr class="timestamp" data-utime="1381912680"></abbr>]
Using Python 2.7, this is what I currently have in my code to get to that stage:
s = soup.find_all("abbr", { "class" : "timestamp" })
print s
Should I use regex or can BS do it on its own?
EDIT
I tried using regex, but with no luck:
import re
regex = 'data-utime=\"(\d+)\"'
x = re.compile(regex)
x2 = re.findall(x, s)
print x2
I got: TypeError: expected string or buffer
Python reserves class, so you use the format:
s = soup.find("abbr", class_="timestamp")
but... <abbr> is empty, so use the above answers :)
You could use the below regex to extract the number within double quotes,
(?<=data-utime=\")[^\"]*
Python code would be,
>>> import re
>>> s = '[<abbr class="timestamp" data-utime="1381912680"></abbr>]'
>>> m = re.findall(r'(?<=data-utime=\")[^\"]*', s)
>>> m
['1381912680']
Explanation:
(?<=data-utime=\") A lookbehind: the match must start immediately after the string data-utime="
[^\"]* Matches any character other than ", zero or more times, so the match stops just before the closing quote
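On the TypeError in the question: find_all returns a list-like ResultSet of tags, while re.findall expects a string. Below is a small sketch of the failure and one workaround. A plain Python list stands in for the ResultSet, which is an assumption made so the example runs without BeautifulSoup installed.

```python
import re

pattern = re.compile(r'data-utime="(\d+)"')

# Stand-in for what soup.find_all(...) returns: a list, not a string
tags = ['<abbr class="timestamp" data-utime="1381912680"></abbr>']

try:
    pattern.findall(tags)
    raised = False
except TypeError:
    # re functions accept str (or bytes), never a list
    raised = True

# Converting to text first lets the regex run
timestamps = pattern.findall(str(tags))

print(raised, timestamps)  # True ['1381912680']
```

With BeautifulSoup itself, indexing a tag by attribute name (for example tag["data-utime"]) should give the same value with no regex at all.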
I'm trying to write a Python script to analyse a text data file. I want the script to do the following:
find all the time values in one line and compare them. This is my first time writing regex syntax, so I wrote a small script first.
My script is:
import sys
txt = open('1.txt','r')
a = []
for eachLine in txt:
    a.append(eachLine)
import re
pattern = re.compile('\d{2}:\d{2}:\d{2}')
for i in xrange(len(a)):
    print pattern.match(a[i])
#print a
and the output is always None.
My text file looks like the picture:
What's the problem? Please help me. Thanks a lot.
I'm using Python 2.7.2 on Windows XP SP3.
Didn't you miss one of the ":" in your regex? I think you meant
re.compile(r'\d{2}:\d{2}:\d{2}')
The other problems are:
First, if you want to search the whole text, use search instead of match: match only succeeds at the very start of the string. Second, to access your result you need to call group() on the match object returned by search.
Try it:
import re
import sys
txt = open('1.txt', 'r')
a = []
for eachLine in txt:
    a.append(eachLine)
pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
for line in a:
    match = pattern.search(line)
    if match:  # search returns None when the line has no time in it
        print(match.group())
#print a
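To see the difference between match, search and findall side by side, here is a short sketch. The sample line is invented, since the picture of the original file isn't available.

```python
import re

pattern = re.compile(r"\d{2}:\d{2}:\d{2}")

# Hypothetical log line; the real file contents are not shown in the question
line = "start 12:30:45 end 12:31:02"

at_start = pattern.match(line)         # only tries position 0, so no match here
first = pattern.search(line).group()   # first occurrence anywhere in the line
every = pattern.findall(line)          # all occurrences, as a list of strings

print(at_start)  # None
print(first)     # 12:30:45
print(every)     # ['12:30:45', '12:31:02']
```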
I think you're missing the colons and dots in your regex. Also try using re.search or re.findall on the entire text instead. Like this:
import re
with open("./1.txt", "r") as f:
    text = f.read()  # or f.readlines() to make a list of lines
pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
matches = pattern.findall(text)
for i in matches:
    print(i)
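Since the goal in the question is to compare the times found on one line, the strings from findall can be parsed into time objects first. This is a rough sketch assuming 24-hour HH:MM:SS values; the helper times_in_line and the sample line are made up for illustration.

```python
import re
from datetime import datetime

TIME = re.compile(r"\d{2}:\d{2}:\d{2}")

def times_in_line(line):
    """Return every HH:MM:SS value in the line as a datetime.time object."""
    return [datetime.strptime(t, "%H:%M:%S").time() for t in TIME.findall(line)]

# Hypothetical line; the real file layout is not shown in the question
line = "job started 09:15:00, finished 09:47:30"

start, end = times_in_line(line)
print(start < end)  # True
```

Comparing time objects avoids relying on string ordering, which only happens to work while every field is zero-padded.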