How do I get past this error:
TypeError: cannot use a string pattern on a bytes-like object
when trying to match several regexes against a line read from a file?
The combined match I am attempting is:
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line):
This works when matching against plain-text (str) lines; I got it from an answer on Stack Overflow.
I have compiled the regexes like so:
regex1 = re.compile(b'http\:\/\/ipaddress\:port\/service\?')
regex2 = re.compile(b'\_event\=new?')
regex3 = re.compile(b'askment\:')
but this TypeError still appears.
Earlier in my script I can get away with this:
match = re.search(b'something-string:\s+111+\d{2,5}', line)
So I thought prefixing the regexes with 'b' in the multiple match was sufficient.
Please what am I doing wrong?
I had to decode the line, since it's coming in as a binary stream:
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line.decode("ascii or something else")):
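An alternative that keeps everything in bytes can be sketched as follows. Formatting a compiled pattern object into a str inserts its repr (text such as re.compile(b'...')), so the joined pattern is a str and also not the regex that was intended. The sketch below joins the raw pattern bytes instead. The three patterns are adapted from the question with the unnecessary escapes dropped, and the helper name line_matches is hypothetical.

```python
import re

# Patterns adapted from the question; raw bytes literals (rb'...') keep
# the backslashes exactly as written.
patterns = (
    rb'http://ipaddress:port/service\?',
    rb'_event=new?',
    rb'askment:',
)

# Join the pattern *sources* (bytes), not compiled pattern objects,
# using a bytes separator so the combined pattern is bytes as well.
combined = re.compile(b'|'.join(b'(?:' + p + b')' for p in patterns))


def line_matches(line: bytes) -> bool:
    """Return True if the line starts with any of the patterns."""
    return combined.match(line) is not None
```

Because the combined pattern is bytes, it can be applied directly to lines read from a file opened in binary mode, with no decode step.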
Related
I'm very new to regex, and I'm trying to analyse data that comes from a simple text file. Before I start the data analysis, I need to make sure the format and structure of the file's content are correct; only then can I continue. The content of the file looks like this:
,file_06,,
x data,y data
-969.0,-42.18187,
-958.0,-39.62946,
-948.0,-37.748737,
-938.0,-35.73368,
-929.0,-33.9873,
-919.0,-32.24092,
-910.0,-30.76321,
-899.0,-29.01683,
-891.0,-27.40478,
-878.0,-26.19575,
-872.0,-24.986712,
-864.0,-23.24033,
-853.0,-22.16563,
I'm looking for help in writing the regex.
I tried to write one, but it keeps matching only the first line. I can't match the whole content.
Regex pattern:
/(,file_[\d]*,,)\n(x data,y data)\n((-?[\d]*.[\d]*,-?[\d]*.[\d]*,?)\n)*(,,)?/g
This will work:
/(?=-)(.?[^\,]*)/gm
It uses a positive lookahead to start at each '-' and then captures everything up to the next ','.
Use
/(?=-)(.*)/gm
if you want to capture the pairs of data together.
Sample at https://regex101.com/r/a5Dk5Y/1/
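If the goal is to validate the whole file rather than extract values, a full-content match may be closer to what the question asks. The following is a minimal Python sketch under some assumptions: the header lines are exactly as in the sample, every data line has two numbers with an optional trailing comma, and a final ",," line is optional (as in the asker's pattern). Note that the dot is escaped here, since an unescaped . matches any character. The names NUMBER, FILE_FORMAT and is_valid are hypothetical.

```python
import re

# A signed number with an optional fractional part; the dot is escaped.
NUMBER = r'-?\d+(?:\.\d+)?'

FILE_FORMAT = re.compile(
    r',file_\d+,,\n'                 # first header line, e.g. ",file_06,,"
    r'x data,y data\n'               # second header line
    rf'(?:{NUMBER},{NUMBER},?\n)+'   # one or more data lines
    r'(?:,,\n?)?'                    # optional trailing ",," line (assumed)
)


def is_valid(content: str) -> bool:
    """Return True only if the entire content follows the expected layout."""
    return FILE_FORMAT.fullmatch(content) is not None


sample = (
    ",file_06,,\n"
    "x data,y data\n"
    "-969.0,-42.18187,\n"
    "-958.0,-39.62946,\n"
)
```

Using fullmatch means a single malformed line anywhere in the file makes the check fail, instead of the pattern silently matching only a prefix.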
I'm searching web pages for specific words. I use re.compile with bs4 to search for the word, but I run into problems when the word contains a backslash ('\'). I was hoping I could get some help on how to handle this. My code usually looks like this:
results = self.soup.find_all(string=re.compile('.*{0}.*'.format(searched_word), re.IGNORECASE), recursive=True)
This code raises re.error: bad escape \M at position 13 when searched_word = Software\Microsoft\Windows\CurrentVersion\Run
I read somewhere that in order to escape backslash, I should make it Software\\Microsoft\\Windows\\CurrentVersion\\Run which throws an error. Or Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run which doesn't throw an error but does not return the text.
It seems you are not escaping the string before passing it to re.compile(). To do that, use re.escape():
results = self.soup.find_all(string=re.compile('.*{0}.*'.format(re.escape(searched_word)), re.IGNORECASE), recursive=True)
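The effect of re.escape() can be checked without bs4. The sketch below is only an illustration: the sample text is made up, and it uses pattern.search, which makes the surrounding .* unnecessary.

```python
import re

# Raw string so the backslashes are literal characters in the search term.
searched_word = r'Software\Microsoft\Windows\CurrentVersion\Run'

# Unescaped, the backslashes are read as regex escapes and \M is invalid.
try:
    re.compile(searched_word)
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# re.escape() turns every special character into a literal match.
pattern = re.compile(re.escape(searched_word), re.IGNORECASE)

# Hypothetical page text, for illustration only.
text = r'Key: HKLM\software\microsoft\windows\currentversion\run is set'
found = pattern.search(text) is not None
```

Here unescaped_ok ends up False and found ends up True, since the escaped pattern matches the backslashes literally and IGNORECASE covers the different capitalisation.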
The file app_ids.txt has the following format:
app1 = "0123456789"
app2 = "1234567890"
app3 = "2345678901"
app4 = "3456789012"
app5 = "4567890123"
I am trying to print the lines that match a given regex with the following code in the file find_app_id.jl:
#! /opt/julia/julia-1.1.0/bin/julia
function find_app_id()
    app_pattern = "r\"app2.*\"i"
    open("/path/to/app_ids.txt", "r") do apps
        for app in eachline(apps)
            if occursin(app_pattern, app)
                println(app)
            end
        end
    end
end
find_app_id()
Running $ /home/julia/find_app_id.jl does not print the second line, even though it matches the regex!
How do I solve this problem?
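Your regular expression looks odd. The pattern is a plain String that merely looks like a regex literal, so occursin searches for that literal text (including the r, the quotes and the i) and never finds it. If you change the line which assigns to app_pattern to
app_pattern = r"app2.*"
it should work (write r"app2.*"i if you want to keep the case-insensitive flag).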
For example, the following prints "Found it" when run:
app_pattern = r"app2.*"
if occursin(app_pattern, "app2 = blah-blah-blah")
    println("Found it")
else
    println("Nothing there")
end
Best of luck.
I'm not sure how regex matching works in Julia; this post might help you figure it out.
However, in general, your pattern is quite simple, and you probably do not need regular expression matching to do this task.
This RegEx might help you to design your expression.
^app[0-9]+\s=\s\x22([0-9]+)\x22$
There is a simple ([0-9]+) capture group in the middle, where your desired app ids are, and you can refer to the captured value as $1.
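As a rough illustration of how the capture group behaves, here is the same expression applied in Python (the thread itself uses Julia; Python is used here only to show the pattern, and the names APP_LINE, lines and app_ids are hypothetical). The sample lines come from the app_ids.txt excerpt in the question.

```python
import re

# Same expression as above; \x22 is the double-quote character.
APP_LINE = re.compile(r'^app[0-9]+\s=\s\x22([0-9]+)\x22$')

lines = [
    'app1 = "0123456789"',
    'app2 = "1234567890"',
    'not an app line',
]

# Collect the first capture group from every line that matches.
app_ids = []
for line in lines:
    m = APP_LINE.match(line)
    if m:
        app_ids.append(m.group(1))
```

Lines that do not follow the appN = "digits" shape are skipped, so app_ids only holds the ids from the first two lines.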
I have a partial solution to convert this
USERNAME=CONSTANT[myUserName]
PASSWORD=CONSTANT[mypwd]
to
"USERNAME":"myUserName",
"PASSWORD":"mypwd"
I saw a similar solution in the question "properties file to json". Basically, I need to allow zero or more spaces (1) anywhere before or after a key and (2) before and after the = sign, as in:
USERNAME = CONSTANT[myUserName]
PASSWORD = CONSTANT[mypwd]
Find What: (^[^ \t]+)(\s.*=\s*CONSTANT\[)(.*[^\n])(\])
Replace: "$1":"$2",
"USERNAME":"myUserName",
"PASSWORD":"mypwd",
I also want to make sure the replacement is applied line by line; sometimes the pattern matches across multiple lines, which is wrong. I hope someone can find a solution that works in Eclipse on Windows.
Make sure to use ^ and $ in order to avoid your regex matching multiple lines. Try something like this:
^\s*(\w+)\s*?\=\s*?\w+\[(\w+)\]$
Replace with:
"$1":"$2",
Demo: https://regex101.com/r/mxF8lI/1/
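The same find-and-replace can be tried outside Eclipse. The following is a minimal Python sketch, not the Eclipse dialog itself: it swaps \s for [ \t] so that whitespace matching cannot cross a line break (an adjustment made here, not part of the answer above), and Python writes backreferences as \1 and \2 instead of $1 and $2. The names PROPERTY_LINE and to_json_pairs are hypothetical.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
# [ \t]* allows spaces or tabs but never a newline.
PROPERTY_LINE = re.compile(
    r'^[ \t]*(\w+)[ \t]*=[ \t]*\w+\[(\w+)\][ \t]*$',
    re.MULTILINE,
)


def to_json_pairs(text: str) -> str:
    """Rewrite each KEY = CONSTANT[value] line as "KEY":"value",."""
    return PROPERTY_LINE.sub(r'"\1":"\2",', text)


converted = to_json_pairs(
    "USERNAME = CONSTANT[myUserName]\n"
    "PASSWORD=CONSTANT[mypwd]"
)
```

Each line is rewritten independently, and lines that do not fit the KEY = WORD[value] shape are left unchanged.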
I'm trying to figure out the following case.
I have a txt file with parameters:
environment=trank
Browser=iexplore
id=1988
Url=www.google.com
maautomate=no
When I parse this txt file with a regex pattern like
/environment=([^\s]+)/
I get "trankBrow" as the result, and with
/Url=([^\s]+)/
I get www.google.commaautomate=no
Why is part of the next parameter appended? And how do I get only "trank"?
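environment=([^\\s]+)
You need to use this. If the pattern is written inside a string literal where the backslash is itself an escape character, \s is read as an escaped s, so the class becomes [^s] and the match runs across the line break until the next s. That is why the output is trankBrow: the next character after it is the s in Browser.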
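The difference between the two character classes can be reproduced directly. This is a small Python sketch using the parameter file from the question; raw strings are used so that \s reaches the regex engine intact, and the variable names are hypothetical.

```python
import re

text = (
    "environment=trank\n"
    "Browser=iexplore\n"
    "id=1988\n"
    "Url=www.google.com\n"
    "maautomate=no"
)

# What the engine effectively sees when the backslash is lost:
# "anything except the letter s", which happily crosses newlines.
broken = re.search(r'environment=([^s]+)', text).group(1)

# With the backslash preserved: "anything except whitespace".
fixed = re.search(r'environment=([^\s]+)', text).group(1)
url = re.search(r'Url=([^\s]+)', text).group(1)
```

With the sample above, broken contains trank, a line break and Brow, while fixed is trank and url is www.google.com.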