Regular Expression(^string\s\w+) in python - regex

I'd like to know how to search using a regular expression.
This is my code:
import re
data = "python one test code"
p = re.compile("^python\s\w+")
Result:
print(p.findall(data))
['python one']
The result I want to get is:
print(p.findall(data))
['python one test code']
I can get the above result with the following pattern:
p = re.compile("^python\s\w+\s\w+\s\w+")
but I don't want to repeat "\s\w+" as in "^python\s\w+\s\w+\s\w+".
How can I get the result by applying * or + to "\s\w+"?

You can try this:
^python(?:\s\w+)+
Explanation:
1. ^python means the string starts with python
2. ?: makes the () group non-capturing
3. \s\w+ matches one whitespace character and the word that immediately follows it
4. (?:\s\w+)+ the outer + repeats item 3 one or more times, so every following word is matched
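As a quick check of the pattern above, here is a minimal Python sketch using the data string from the question. It also shows why the group should be non-capturing: with a capturing group, findall returns only the last repetition of the group instead of the whole match.

```python
import re

data = "python one test code"

# Non-capturing group repeated one or more times: findall returns the full match.
non_capturing = re.compile(r"^python(?:\s\w+)+")
print(non_capturing.findall(data))

# With a capturing group, findall returns only the last repetition of the group.
capturing = re.compile(r"^python(\s\w+)+")
print(capturing.findall(data))
```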

You can try the following:
p = re.compile("^python[\w\s]*")

Related

Regex to find 4th value inside bracket

How can I read the 4th value (the one inside "", i.e. "vV0....") using a regex in the case below?
Update: is it possible to first find the word "LaunchFileUploader" and then select the 4th value? If there are multiple instances of LaunchFileUploader in the file, just select the 4th value of the first one found. I am attaching a screenshot of the file that needs to be searched (in the file, the word is "LaunchFileUploader").
I tried the following, but group 1 gives me the third value and I need the fourth:
\bLaunchFileUploader\b(\:?.*?,){3}.*?\)
Match 1
Full match 11030-11428 LaunchFileUploader("ERM-1BLX3D04R10-0001", 1662, "2ecbb644-34fa-4919-9809-a5ff47594c2d", "8dZOPyHKBK...
Group 1. n/a "2ecbb644-34fa-4919-9809-a5ff47594c2d",
I am still looking for a solution for this. Any help is appreciated.
Depending on what's available to you, there are a couple of ways to do it.
Either way, this would work better if there were no new lines in the string, just plain ("value1","value2","value3","value4") etc. It'll still work, but you may need to clean up some new lines from the resulting string.
The easy way - use code for the hard part. Grab the inner string with:
(?<=\().*?(?=\))
This will get everything that's between the 2 parentheses (using positive lookarounds). In code, you could then split/explode this string on , and take the 4th item.
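The "use code for the hard part" approach could be sketched in Python roughly as follows. The function name fourth_argument and the shortened sample string are hypothetical, made up for illustration, and the sketch assumes that no argument contains a comma or a closing parenthesis.

```python
import re

def fourth_argument(text, name="LaunchFileUploader"):
    """Return the 4th argument of the first call to `name`, or None if absent."""
    # Grab everything between the parentheses of the first call (lazy match);
    # DOTALL lets the arguments span several lines.
    m = re.search(re.escape(name) + r"\s*\((.*?)\)", text, re.DOTALL)
    if m is None:
        return None
    # Split on commas and trim whitespace and newlines around each argument.
    args = [a.strip() for a in m.group(1).split(",")]
    if len(args) < 4:
        return None
    # Drop the surrounding double quotes of the 4th argument.
    return args[3].strip('"')

# Hypothetical sample; the 4th value is a shortened placeholder.
sample = (
    'LaunchFileUploader("ERM-1BLX3D04R10-0001", 1662,\n'
    ' "2ecbb644-34fa-4919-9809-a5ff47594c2d", "vV0mX3%2f");'
)
print(fourth_argument(sample))
```

Searching for the function name first also covers the case where the file contains several calls: re.search stops at the first one.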
If you want to do it all in regex, you could use something along the lines of:
(?<=\()(?:.*?,){3}(.*?)(?=\))
This would a) match the entire contents of the parentheses and b) capture the 4th option in a capture group. To go even deeper:
(?<=\()(?:.*?,){3}\"(.*?)\"(?=\))
would capture the contents of the "" quotation marks only.
Some tools don't allow lookarounds. If that is the case, let me know and I'll see what other ways there are around it.
EDIT: I ran this in the JS console of a browser, and it does work.
EDIT 2: I see you've updated your question with the text you're actually searching in. This pattern allows for the space and the newline character after each comma, as they appear in the copy/paste of the above text.
(?<=\(\")(?:.*?,\s?\n?){3}\"(.*?)\"(?=\))
See my second image for the test in console
This works for Python and PHP:
(?<=\")(.*)(?:\"\);)\Z
Demo for Python and PHP
For JavaScript, replace \Z with $ as follows:
(?:")(.*)(?:\"\);)$
Demo for JavaScript
NOTE: Be sure to look at the captured group and not the full match.
UPDATE:
Try this for your updated request:
"(.*)"(?:[\\);\] \/>}]*)$
Demo for updated input string
All of the above regex patterns assume there is a line break after each comma.
Auto-generated Java Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\"(.*)\"(?:[\\\\);\\] \\/>\\}]*)$";
final String string = "\n"
+ "}$(document).ready( function(){ PathUploader\n"
+ " (\"ERM-1BLX3D04R10-0001\", \n"
+ " 1662, \n"
+ " \"1bff5c85-7a52-4cc5-86ef-a4ccbf14c5d5\", \n"
+ "\"vV0mX3VadCSPnN8FsAO7%2fysNbP5b3SnaWWHQETFy7ORSoz9QUQUwK7jqvCEr%2f8UnHkNNVLkJedu5l%2bA%2bne%2fD%2b2F5EWVlGox95BYDhl6EEkVAVFmMlRThh1sPzPU5LLylSsR9T7TAODjtaJ2wslruS5nW1A7%2fnLB%2bljZaQhaT9vZLcFkDqLjouf9vu08K9Gmiu6neRVSaISP3cEVAmSz5kxxhV2oiEF9Y0i6Y5%2f5ASaRiW21w3054SmRF0rq3IwZzBvLx0%2fAk1m6B0gs3841b%2fw%3d%3d\"); } );//]]>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println("Full match: " + matcher.group(0));
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println("Group " + i + ": " + matcher.group(i));
    }
}

parse csv using lua script

I have a csv file that has data like this:
+12345678901,08:00:00,12:00:00,1111100,35703,test.domain.net
+12345678901,,,0000000,212,test.domain.net
I'm trying to write Lua code that will loop through each line and create an array of values like this:
local mylist = {}
for line in io.lines("data/dd.csv") do
  local id, start, finish, dow, int, domain = line:match("(+%d+),(%d*:*),(%d*:*),(%d*),(%d*),(%a*.*)")
  mylist[#mylist + 1] = { id = id, start = start, finish = finish, dow = dow, int = int, domain = domain}
  print(mylist[#mylist]['id'])
end
The problem is that when the code hits a line that has empty values for start and finish, the regex fails and all fields are nil.
I thought using the * meant 0 or more...
I can't seem to find my error / typo.
Thanks.
This pattern works for me:
"(.-),(.-),(.-),(.-),(.-),(.-)$"
It seems that you just need to group the digits and : inside a [...]:
match("(%+%d+),([%d:]*),([%d:]*),(%d*),(%d*),(.*)")
        ^       ^^^^^^   ^^^^^^
Now, [%d:]* matches zero or more characters that are each either a digit or a : symbol. Your pattern did not find a match because %d*:* matches zero or more digits followed by zero or more : symbols, and a value such as 08:00:00 contains more than one such sequence.
Also, you need to escape the first + to make sure it matches a literal +.
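The difference between the two patterns can also be reproduced outside Lua. Below is a minimal sketch in Python, with the Lua patterns translated to Python regex syntax (%d becomes \d, %+ becomes \+). The names broken and fixed are only illustrative, and the rows are the two sample lines from the question.

```python
import re

# Direct translations of the two Lua patterns into Python regex syntax.
broken = re.compile(r"(\+\d+),(\d*:*),(\d*:*),(\d*),(\d*),(.*)")
fixed = re.compile(r"(\+\d+),([\d:]*),([\d:]*),(\d*),(\d*),(.*)")

rows = [
    "+12345678901,08:00:00,12:00:00,1111100,35703,test.domain.net",
    "+12345678901,,,0000000,212,test.domain.net",
]

for row in rows:
    b = broken.match(row)
    f = fixed.match(row)
    # \d*:* can consume "08:" but not "08:00:00", so the first row fails;
    # the character class [\d:]* accepts digits and colons in any order.
    print("broken:", b.groups() if b else None)
    print("fixed: ", f.groups() if f else None)
```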
See online Lua demo

regex to catch assembler C-command

I'm taking the Nand-2-Tetris course. We are asked to write an assembler. A C-command has the form dest=comp;jump, where the dest and jump parts are optional.
I was trying to write a regex to make everything easier - I want to be able to compile the expression on a given line, and just by the group number, know which part of the expression I'm using. For example, for the expression: A=M+1;JMP I want to get group(1) = A, group(2) = M and group(3) = JMP.
My problem is that each part is optional, so I don't know exactly how to write this regex. So far I have come up with:
(A?M?D?)\s=([^;\s]*)\s?(?=;[\s]*([a-zA-Z]{1,4})|$)
This works for most cases, but not all of them. For example, a command without a dest part (D;JGT) does not match. I have tried a positive lookahead, but it didn't work.
The RegEx that you are looking for is as follows:
(?P<dest>[AMD]{1,3}=)?(?P<comp>[01\-AMD!|+&><]{1,3})(?P<jump>;[JGTEQELNMP]{3})?
Let's break it down into parts:
(?P<dest>[AMD]{1,3}=)? - will search for optional destination to store the computation result in it.
(?P<comp>[01\-AMD!|+&><]{1,3}) - will search for computation instruction.
(?P<jump>;[JGTEQELNMP]{3})? - will search for optional jump directive.
Do note, that dest and jump parts of every C-Instruction are optional.
They only appear with postfix = and prefix ; respectively.
Hence, you will have to take care of these signs:
if dest is not None:
    dest = dest.rstrip("=")
if jump is not None:
    jump = jump.lstrip(";")
Finally, you will get the desired C-Instruction parsing:
For the line A=A+M;JMP you will get:
dest = 'A'
comp = 'A+M'
jump = 'JMP'
For the line D;JGT you will get:
dest = None
comp = 'D'
jump = 'JGT'
And for the line M=D you will get:
dest = 'M'
comp = 'D'
jump = None
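Putting the pieces of this answer together, a minimal runnable sketch might look like the following. The function name parse_c_command is hypothetical, and using fullmatch (so that trailing garbage is rejected) is a choice made for this sketch, not something stated in the answer.

```python
import re

# The regex from the answer, split across lines for readability.
C_COMMAND = re.compile(
    r"(?P<dest>[AMD]{1,3}=)?"
    r"(?P<comp>[01\-AMD!|+&><]{1,3})"
    r"(?P<jump>;[JGTEQELNMP]{3})?"
)

def parse_c_command(line):
    """Return (dest, comp, jump) for a C-command, or None if it does not match."""
    m = C_COMMAND.fullmatch(line.strip())
    if m is None:
        return None
    dest, comp, jump = m.group("dest", "comp", "jump")
    # Strip the "=" suffix and the ";" prefix that the groups include.
    if dest is not None:
        dest = dest.rstrip("=")
    if jump is not None:
        jump = jump.lstrip(";")
    return dest, comp, jump

for line in ["A=A+M;JMP", "D;JGT", "M=D"]:
    print(line, "->", parse_c_command(line))
```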
Not quite sure what you want to do, but based on your examples you can make a regular expression like this:
([\w]+)[=]?([\w])*[+-]*[\w]*;([\w]+)
Then for that line:
A=M+1;JMP
You'll get the following:
Full match A=M+1;JMP
Group 1 A
Group 2 M
Group 3 JMP
And for that line:
D;JGT
You'll get:
Full match D;JGT
Group 1 D
Group 3 JGT
See example here: https://regex101.com/r/v8t4Ma/1

Having trouble doing a search and replace in Ruby

I’m using Rails 4.2.3 and trying to do a regular expression search and replace. If my variable starts out like so …
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
and then I run that through
display_start = url.match(/iDisplayStart=(\d+)/).captures[0]
display_start = display_start.to_i + 1000
url = url.gsub(/iDisplayStart=(\d+)/) { display_start }
The result is
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&1001&iDisplayLength=100&mDataProp_0=
But what I want is to simply replace the value of the "iDisplayStart" parameter with my new value, so I would like the result to be
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1001&iDisplayLength=100&mDataProp_0=
How do I do this?
You can achieve what you want with
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/(?<=iDisplayStart=)\d+/) {|m| m.to_i+1000}
puts display_start
See the IDEONE demo
Since you replace only one substring, you do not need gsub; sub will do.
The block receives the whole match m (one or more digits located right after iDisplayStart=, which the lookbehind checks without consuming), converts it to an integer, and adds 1000 to it.
Another way is to use your regex (or add \b for a safer match) and access the captured value with Regexp.last_match[1] inside the block:
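For comparison, the same lookbehind technique can be sketched in Python, where re.sub accepts a function in place of Ruby's block. The helper name bump_display_start and the shortened URL are assumptions made for this illustration.

```python
import re

def bump_display_start(url, step=1000):
    """Add `step` to the value of the first iDisplayStart parameter in `url`."""
    return re.sub(
        r"(?<=iDisplayStart=)\d+",             # digits preceded by "iDisplayStart="
        lambda m: str(int(m.group()) + step),  # replacement computed from the match
        url,
        count=1,                               # like Ruby's sub: first match only
    )

# Shortened, hypothetical version of the URL from the question.
url = "http://results.mydomain.com/json/search?sColumns=&iDisplayStart=1&iDisplayLength=100"
print(bump_display_start(url))
```

Because the lookbehind is not part of the match, only the digits are replaced and the parameter name stays in place.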
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/\biDisplayStart=(\d+)/) {|m| "iDisplayStart=#{Regexp.last_match[1].to_i+1000}" }
puts display_start
See this IDEONE demo

Regular expression any character with dynamic size

I want to use a regular expression that does the following (I extracted the part I'm having trouble with in order to simplify):
any characters for the first 1 to 5 characters, then an underscore, then some digits, then an underscore, then some digits or dots.
With a restriction on the underscore, it should look something like this:
^([^_]{1,5})_([\\d]{2,3})_([\\d\\.]*)$
But I want to allow "_" in the first 1 to 5 characters as long as the rest of the string still matches the end of the regular expression, for example if I had something like:
to_to_123_12.56
I think this is linked to greedy matching in the regex engine. Nevertheless, I tried lazy quantifiers as explained here, but without success.
Any ideas?
I used the following regex and it appeared to work fine for your task. I've simply replaced your initial [^_] with . (any character).
^.{1,5}_\d{2,3}_[\d\.]*$
It's probably best to replace your final * with + too, unless you allow nothing after the final '_'. And note your final part allows multiple '.' (I don't know if that's what you want or not).
For the record, here's a quick Python script I used to verify the regex:
import re

# Sample inputs; every one of them should match the pattern.
strs = [
    "a_12_1",
    "abc_12_134",
    "abcd_123_1.",
    "abcde_12_1",
    "a_123_123.456.7890.",
    "a_12_1",
    "ab_de_12_1",
]
myre = r"^.{1,5}_\d{2,3}_[\d\.]+$"
for s in strs:
    m = re.match(myre, s)
    if m:
        print("Yes:", end=" ")
        # "ALL" means the match covers the entire string.
        if m.group(0) == s:
            print("ALL", end=" ")
    else:
        print("No:", end=" ")
    print(s)
Output is:
Yes: ALL a_12_1
Yes: ALL abc_12_134
Yes: ALL abcd_123_1.
Yes: ALL abcde_12_1
Yes: ALL a_123_123.456.7890.
Yes: ALL a_12_1
Yes: ALL ab_de_12_1
^(.{1,5})_(\d{2,3})_([\d.]*)$
works for your example. The result doesn't change whether you use a lazy quantifier or not.
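A minimal Python check of that claim, using the example string from the question (the names greedy and lazy are only illustrative): both the greedy and the lazy form of the first group give the same captures, because the anchors and the two required underscores leave only one way to split the string.

```python
import re

text = "to_to_123_12.56"

greedy = re.compile(r"^(.{1,5})_(\d{2,3})_([\d.]*)$")
lazy = re.compile(r"^(.{1,5}?)_(\d{2,3})_([\d.]*)$")

# The engine backtracks until the rest of the pattern fits, so the first
# group ends up as "to_to" whether it is greedy or lazy.
print(greedy.match(text).groups())
print(lazy.match(text).groups())
```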
While answering the comment (writing the lazy expression), I saw that I had made a mistake: if I simply use the following classical regex, it works:
^(.{1,5})_([\\d]{2,3})_([\\d\\.]*)$
Thank you.