I'm trying to figure out the following case:
I have a txt file with parameters:
environment=trank
Browser=iexplore
id=1988
Url=www.google.com
maautomate=no
When I parse this txt file with a regex pattern like
/environment=([^\s]+)/
I get "trankBrow" as the result, or with
/Url=([^\s]+)/
I get www.google.commaautomate=no
So why is the next parameter appended? And how do I get "trank" only?
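environment=([^\\s]+)
You need to use this. In your case \s is being treated as an escaped s, so the class becomes [^s] and the match runs past the line break until it reaches the next s. That is why the output is trankBrow: it stops at the s in Browser.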
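To see the difference concretely, here is a minimal Python sketch, assuming the file contents have already been read into a string. The `text` variable and the `get_param` helper are made up for illustration. A Python raw string keeps `\s` intact, so the broken `[^s]` class is written out explicitly to reproduce the symptom.

```python
import re

# Hypothetical file contents, copied from the question.
text = "environment=trank\nBrowser=iexplore\nid=1988\nUrl=www.google.com\nmaautomate=no\n"

# What the engine effectively sees when the backslash is lost: "anything but s".
# The match crosses the line break and stops at the "s" in "Browser".
broken = re.search(r"environment=([^s]+)", text).group(1)

def get_param(name: str, source: str):
    """Return the value after 'name=' up to the next whitespace, or None."""
    match = re.search(rf"^{re.escape(name)}=(\S+)", source, flags=re.MULTILINE)
    return match.group(1) if match else None

print(repr(broken))
print(get_param("environment", text))
print(get_param("Url", text))
```

Anchoring with `^` and `re.MULTILINE` is a design choice in this sketch, so that a key is only matched at the start of a line.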
I'm very new to regex and I'm trying to analyse data that comes from a simple text file. Before I start the analysis, I need to make sure the format of the file's content is correct; only then can the process continue. The content of the file looks like this:
,file_06,,
x data,y data
-969.0,-42.18187,
-958.0,-39.62946,
-948.0,-37.748737,
-938.0,-35.73368,
-929.0,-33.9873,
-919.0,-32.24092,
-910.0,-30.76321,
-899.0,-29.01683,
-891.0,-27.40478,
-878.0,-26.19575,
-872.0,-24.986712,
-864.0,-23.24033,
-853.0,-22.16563,
I'm looking for help writing the regex.
I tried writing some regex, but it keeps matching only the first line. I can't match the whole content.
Regex pattern:
/(,file_[\d]*,,)\n(x data,y data)\n((-?[\d]*.[\d]*,-?[\d]*.[\d]*,?)\n)*(,,)?/g
This will work:
/(?=-)(.?[^\,]*)/gm
It uses a positive lookahead to start at the '-' and then treats ',' as the delimiter.
Use
/(?=-)(.*)/gm
if you want to capture each pair of data together.
Sample at https://regex101.com/r/a5Dk5Y/1/
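If the goal is to validate the whole file rather than extract values, one possible approach is a single pattern matched against the entire content. The following is a Python sketch under some assumptions: lines end with `\n`, every number has a decimal point, and the names `ROW`, `FILE_PATTERN` and `has_expected_format` are illustrative, not from the question.

```python
import re

# One data row: two signed decimals, with an optional trailing comma.
ROW = r"-?\d+\.\d+,-?\d+\.\d+,?"

FILE_PATTERN = re.compile(
    r",file_\d+,,\n"   # first line, e.g. ",file_06,,"
    r"x data,y data"   # column header line
    rf"(?:\n{ROW})+"   # one or more data rows
)

def has_expected_format(text: str) -> bool:
    """True only if the whole text (ignoring trailing whitespace) matches."""
    return FILE_PATTERN.fullmatch(text.rstrip()) is not None

sample = ",file_06,,\nx data,y data\n-969.0,-42.18187,\n-958.0,-39.62946,\n"
print(has_expected_format(sample))
```

Note that the dots are escaped here. In the question's pattern the unescaped `.` matches any character, not just a decimal point.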
I am trying to exclude delimiters within text qualifiers, and I am trying to use Regex for this. However, I am new to Regex and cannot fully accomplish what I need. I would be very grateful if someone could help me out.
In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:
"aabb"|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|gghh
"aa|bb"|ccdd|"ee|ff"|gghh
"aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gg|hh"
aabb|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|"gg|hh"
aabb|cc|dd|eeff|gghh
aabb|"cc||dd"|eeff|gghh
aabb|"c|c|dd"|eeff|gghh
"aa||bb"|ccdd|eeff|gghh
"a|a|b|b"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"g|g|hh"
"aabb"|ccdd|eeff|"gg||hh"
I want to exclude all delimiters that are in between text qualifiers.
I have tried to use Regex to replace the delimiters within text qualifiers with nothing.
So far, I have tried the following Regex code for my target:
(")(.*?[^"])\|+(.*?)(")
And I have used the following for my replace:
$1$2$3$4
However, this does not fix lines 11, 13, 14 and 15.
I wish to obtain the following results:
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|"eeff"|gghh
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
aabb|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
aabb|cc|dd|eeff|gghh
aabb|"ccdd"|eeff|gghh
aabb|"ccdd"|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
"aabb"|ccdd|eeff|"gghh"
Thank you in advance for helping me out!
With kind regards,
Robin
I can't think of the correct syntax in regex unless you put in each pattern that could be found.
However, an easier way (maybe not as performant) would be to use a Text to Columns tool with "Ignore delimiters in quotes" selected. If you need the data back together in one cell afterwards, you can Transpose, then remove the delimiters, followed by a Summarize to concatenate each RecordID group.
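For comparison, where a replacement callback is available the problem gets simpler: match each quoted field as a whole and strip the delimiter inside it. This is a Python sketch, not Alteryx syntax. It assumes qualifiers are plain double quotes with no escaped quotes inside a field, and the names `QUOTED_FIELD` and `strip_delimiters_in_quotes` are made up for illustration.

```python
import re

# A quoted field: an opening quote, any non-quote characters, a closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')

def strip_delimiters_in_quotes(line: str, delimiter: str = "|") -> str:
    """Remove every delimiter that sits between a pair of text qualifiers."""
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(delimiter, ""), line)

for line in ('"aa|bb"|ccdd|"ee|ff"|gghh', 'aabb|"c|c|dd"|eeff|gghh', 'aabb|cc|dd|eeff|gghh'):
    print(strip_delimiters_in_quotes(line))
```

Delimiters outside the qualifiers are left untouched, so the field boundaries survive.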
How do I overcome the problem of
TypeError: cannot use a string pattern on a bytes-like object
when trying to match multiple regexes against a line from the file?
The multiple match I am trying is:
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line):
which works in plain text file matches and which I attribute to StackOverflow assistance.
I have compiled the regexes like so:
regex1 = re.compile(b'http\:\/\/ipaddress\:port\/service\?')
regex2 = re.compile(b'\_event\=new?')
regex3 = re.compile(b'askment\:')
but this TypeError still appears.
Earlier in my script I can get away with this:
match = re.search(b'something-string:\s+111+\d{2,5}', line)
So I thought prefixing the regexes with 'b' in the multiple match would be sufficient.
What am I doing wrong?
I had to decode the line, since it's coming in as a binary stream.
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line.decode("ascii or something else")):
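An alternative to decoding, sketched here on the assumption that the lines stay as bytes, is to keep everything in bytes: write the patterns as bytes and join them with a bytes separator. Formatting a compiled pattern object into a str template inserts its repr (something like `re.compile(b'...')`) rather than the pattern text, so this sketch joins the raw pattern bytes instead. The names `raw_patterns`, `combined` and `matches_any` are illustrative.

```python
import re

# Placeholder patterns from the question, written as raw bytes literals.
raw_patterns = (
    rb"http://ipaddress:port/service\?",
    rb"_event=new?",
    rb"askment:",
)

# Wrap each alternative in a non-capturing group and join with a bytes "|".
combined = re.compile(b"|".join(b"(?:" + p + b")" for p in raw_patterns))

def matches_any(line: bytes) -> bool:
    """True if any of the patterns matches at the start of the bytes line."""
    return combined.match(line) is not None

print(matches_any(b"askment: something"))
print(matches_any(b"unrelated line"))
```

The rule in both approaches is the same: the pattern and the subject must both be str or both be bytes.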
I have a log file.
It has a lot of lines, and each line contains something like this:
<h4>adi</h4><small>08/02/2015 11:14:16</small>
The name between the h4 tags is different in every line, and so is the time.
I want to capture, using regex, the date and time from the line that contains the name "adi". As I said, only one line contains the name "adi".
Btw, the log is HTML.
This matches your target input:
(?<=^<h4>adi</h4><small>)[^<]+
See live demo.
Warning: Proceed with caution. Regex is not supposed to be used for HTML parsing. Use a parser instead!
(?<=adi</h4>\s*<small>)[^<]+
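Python's `re` module only accepts fixed-width lookbehinds, so the second pattern above would be rejected there. A capture group gets the same result. This is a minimal sketch: the `log` sample (including the "bob" line) and the `find_timestamp` helper are hypothetical, and the warning about parsing HTML with regex still applies.

```python
import re

# Hypothetical two-line log; only the "adi" line comes from the question.
log = (
    "<h4>bob</h4><small>07/02/2015 09:01:44</small>\n"
    "<h4>adi</h4><small>08/02/2015 11:14:16</small>\n"
)

def find_timestamp(name: str, html: str):
    """Return the text of the <small> tag that follows <h4>name</h4>, or None."""
    pattern = rf"<h4>{re.escape(name)}</h4>\s*<small>([^<]+)</small>"
    match = re.search(pattern, html)
    return match.group(1) if match else None

print(find_timestamp("adi", log))
```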
I simply can't figure this out and have been trying for a while. I need a regex that will parse data in the following manner:
Let's say I've got input in the following format:
www.google.com
www.google.com/
www.google.com/something
I need a regex that will parse the above three URLs (individually) to a final result of:
www.google.com
www.google.com
www.google.com
However, the way it needs to match them is based on the following:
Parse and return everything to the left of a "/" if one exists in the line
Parse and return the entire line if no "/" exists in the line
I'm new to regex, so while this may be simple, I can't figure it out.
Try the following regex:
[^/]*
Substitute /.* with nothing. It doesn't matter if there is no / at all, since in that case the regex will not match.
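Both suggestions can be tried side by side. This is a small Python sketch, assuming each URL is handled as a single line; the function names `host_by_match` and `host_by_substitution` are made up for illustration.

```python
import re

def host_by_match(line: str) -> str:
    """Take everything before the first '/', or the whole line if there is none."""
    return re.match(r"[^/]*", line).group(0)

def host_by_substitution(line: str) -> str:
    """Delete the first '/' and everything after it; no '/' means no change."""
    return re.sub(r"/.*", "", line)

for url in ("www.google.com", "www.google.com/", "www.google.com/something"):
    print(host_by_match(url), host_by_substitution(url))
```

`[^/]*` can match an empty string, so `re.match` always succeeds here and calling `.group(0)` is safe.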