regular expression extract words from file - regex

I am trying to extract words from string content like this:
export {AbcClient} from ...
export {AdcClient} from ..
How can I use a regular expression to get an array of strings? In this example the result should be [AbcClient, AdcClient].
Thanks

Most programming languages have the ability to do a regex find all. In Python, we can try:
import re
inp = """export {AbcClient} from ...
export {AdcClient} from .."""
matches = re.findall(r'\bexport \{(.*?)\}', inp)
print(matches)
This prints:
['AbcClient', 'AdcClient']
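If an export line can list several names inside the braces (an assumption; the question only shows one name per line), one possible sketch is to capture the brace contents and then split on commas. The helper name `exported_names` and the extra `XyzClient` entry are hypothetical, added only for illustration.

```python
import re

def exported_names(text):
    """Return every identifier listed inside `export { ... }` braces."""
    names = []
    # Capture whatever sits between the braces of each export statement.
    for group in re.findall(r'\bexport\s*\{([^}]*)\}', text):
        # Split on commas and drop surrounding whitespace and empty pieces.
        names.extend(part.strip() for part in group.split(',') if part.strip())
    return names

# Hypothetical input: the second line lists two names.
inp = """export {AbcClient} from ...
export {AdcClient, XyzClient} from .."""
print(exported_names(inp))  # ['AbcClient', 'AdcClient', 'XyzClient']
```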

Related

regular expression search backwards, How to deal with words with and without?

I tested with https://regexr.com/
There are two sample file names.
BOND_aa_SB1_66-1.pdf
BOND_bb_SB2.pdf
I want to extract SB1 and SB2 from the samples, but my regular expression is not perfect.
This one works:
(?<=BOND_.*_).*
But the next step is difficult to write.
I tried:
(?<=BOND_.*_).*(?=(_|\.))
But the result for the first sample is 'SB1_66-1'.
I just want to extract SB1.
The part after SB1 may or may not exist. If it exists, it is separated from SB1 by a leading _.
How should I fix it?
To extract the third underscore-separated term, we can use re.search as follows:
import re
inp = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf"]
output = [re.search(r'^BOND_[^_]+_([^_.]+)', x).group(1) for x in inp]
print(output) # ['SB1', 'SB2']
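Alternatively, if the wanted term always has the form SB followed by digits, findall is enough:
import re
s = "BOND_aa_SB1_66-1.pdf BOND_bb_SB2.pdf"
print(re.findall(r'SB\d+', s))
['SB1', 'SB2']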
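Note that re.search returns None when a name does not match, so calling .group(1) directly would raise an AttributeError on unexpected input. A minimal sketch that guards against this, assuming the third underscore-separated field is what is wanted (the helper `third_field` and the `README.txt` entry are hypothetical):

```python
import re

# Anchored pattern: literal "BOND_", one underscore-free field, "_", then the
# wanted field, which stops at the next "_" or ".".
PATTERN = re.compile(r'^BOND_[^_]+_([^_.]+)')

def third_field(name):
    """Return the third underscore-separated field, or None if no match."""
    m = PATTERN.search(name)
    return m.group(1) if m else None

names = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf", "README.txt"]
print([third_field(n) for n in names])  # ['SB1', 'SB2', None]
```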

Regex to extract text from request id

I have a log in which part of each entry is a request ID, and I need to extract some text from that ID.
Ex: RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book,
Can anyone please help me extract Book from it?
Consider string splitting instead
>>> s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
>>> s.split("_")[-1]
'Book'
String splitting will probably be more efficient. If you must use regular expressions, here is an example:
#!/usr/bin/env python3
import re
print(
re.findall(r"^\w+_\d+\d+_(\w+)$",'RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book')
)
# output: ['Book']
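Two shorter variants of the same idea, given as a sketch and assuming the wanted text is always whatever follows the last underscore: str.rpartition splits only once from the right, and a regex anchored at the end of the string avoids describing the earlier fields at all.

```python
import re

s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"

# rpartition splits once at the last "_" and returns (head, separator, tail).
tail = s.rpartition("_")[2]
print(tail)  # Book

# Regex equivalent: one or more non-underscore characters at the very end.
m = re.search(r'[^_]+$', s)
print(m.group(0) if m else None)  # Book
```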

capture special string between 2 characters in python

I have been working on a Python project and I don't have much experience. If I have the string Synset'dog.n.01' and I want to extract only the string dog, what should I do?
In other words, I want to extract whatever string sits between Synset' and .n.01'.
I suggest using re (regex):
import re
s = "Synset'dog.n.01'"
# Escape the dots so they match literal "." rather than any character.
result = re.search(r"Synset'(.*)\.n\.01'", s)
print(result.group(1))
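If the part of speech and sense number vary between strings (an assumption; the question only shows .n.01), a sketch of a more general pattern might look like this. The name `SYNSET` and the second sample string are hypothetical.

```python
import re

# Assumed shape: Synset'<lemma>.<one lowercase letter>.<digits>'
SYNSET = re.compile(r"Synset'(.+?)\.[a-z]\.\d+'")

for s in ["Synset'dog.n.01'", "Synset'run.v.02'"]:
    m = SYNSET.search(s)
    # Print the captured lemma, or None when the string has another shape.
    print(m.group(1) if m else None)
```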

looking for a regular expression to extract all text outputs to user from js file

I have some huge JS files that contain texts and messages intended for a human being. The problem is that they are not all output through the same method.
I want to find them all in order to refactor the code.
So I am looking for a regular expression that finds those messages.
...his.submit_register = function(){
if(!this.agb_accept.checked) {
out_message("This is a Messge tot the User in English." , "And the Title of the Box. In English as well");
return fals;
}
this.valida...
What I want to find is all the strings that are not source code.
In this case I want the result to be:
This is a Messge tot the User in
English. And the Title of the Box. In
English as well
I tried something like /\"(\S+\s{1})+\S\"/, but this won't work ...
Thanks for your help.
It's not possible to fully parse JavaScript source code using regular expressions because JavaScript is not a regular language. You can, however, write a regular expression that works most of the time:
/"(.*?)"/
The ? means that the match is not greedy.
Note: this will not correctly handle strings that contain escaped quotes.
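One common way to cope with escaped quotes is to let the pattern consume a backslash together with the character that follows it. The sketch below uses Python rather than JavaScript, and the name `STRING_LITERAL` and the sample line are hypothetical. It still only handles double-quoted strings and will misfire on quotes inside comments or regex literals, so it remains a heuristic rather than a parser.

```python
import re

# A double-quoted literal: any run of characters that are neither a quote nor
# a backslash, or a backslash followed by any single character (an escape).
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Hypothetical JavaScript line containing escaped quotes.
js = r'''out_message("He said \"hi\".", "Title");'''

# Each match is the raw text between the quotes, escapes left untouched.
found = STRING_LITERAL.findall(js)
print(found)
```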
A simple Java regex that solves your problem (assuming that the message doesn't contain a " character):
Pattern p = Pattern.compile("\"(.+?)\"");
The extraction code:
Matcher m;
for(String line : lines) {
m = p.matcher(line);
while(m.find()) {
System.out.println(m.group(1));
}
}

regex to strip out image urls?

I need to separate out a bunch of image urls from a document in which the images are associated with names like this:
bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
lettuce = "http://images.com/lettuce.jpg"
pumpkin = "http://images.com/pumpkin.jpg"
I assume I can detect the start of a link with:
/http:[^ ,]+/i
But how can I get all of the links separated from the document?
EDIT: To clarify the question: I just want to strip out the URLs from the file minus the variable name, equals sign and double quotes so I have a new file that is just a list of URLs, one per line.
Try this...
(http://)([a-zA-Z0-9\/\\.])*
If the format is constant, then this should work (Python):
import re
s = """bellpepper = "http://images.com/bellpepper.jpg" (...) """
print(re.findall(r'"(http://.+?)"', s))
Note: this is not "find an image in a file" regexp, just an answer to the question :)
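To get from the list of matches to the requested output of one URL per line, the matches can be joined with newlines. This is a minimal sketch: the helper name `extract_urls` is hypothetical, and accepting https as well as http is an added assumption.

```python
import re

def extract_urls(text):
    """Return every double-quoted http:// or https:// URL found in text."""
    return re.findall(r'"(https?://[^"\s]+)"', text)

doc = '''bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
'''

# Join with newlines to get one URL per line, ready to write to a file.
print("\n".join(extract_urls(doc)))
```

The printed text could then be redirected to a file from the shell, or written out with open(...).write(...).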
Do you mean that your document has that kind of format and you just want the http part? You can simply split on the "=" delimiter without a regex:
$f = fopen("file","r");
if ($f){
while( !feof($f) ){
$line = fgets($f,4096);
$s = explode(" = ",$line);
$s = preg_replace("/\"/","",$s);
print $s[1];
}
fclose($f);
}
On the command line:
#php5 myscript.php > newfile.ext
If you are using a language other than PHP, there are similar string-splitting methods you can use, e.g. Python's or Perl's split(). Please check your documentation to find out.
You may try this, if your tool supports positive lookbehind:
/(?<=")[^"\n]+/