I'm very new to regex and I'm trying to analyse data that comes from a simple text file. Before I start the analysis, I need to make sure the format of the file's content is correct; only then can the process continue. The content of the file looks like this:
,file_06,,
x data,y data
-969.0,-42.18187,
-958.0,-39.62946,
-948.0,-37.748737,
-938.0,-35.73368,
-929.0,-33.9873,
-919.0,-32.24092,
-910.0,-30.76321,
-899.0,-29.01683,
-891.0,-27.40478,
-878.0,-26.19575,
-872.0,-24.986712,
-864.0,-23.24033,
-853.0,-22.16563,
I'm looking for help writing the regex.
I tried writing one myself, but it keeps matching only the first line; I can't match the whole content.
Regex pattern:
/(,file_[\d]*,,)\n(x data,y data)\n((-?[\d]*.[\d]*,-?[\d]*.[\d]*,?)\n)*(,,)?/g
This will work:
/(?=-)(.?[^\,]*)/gm
It uses a positive lookahead to start at each '-' and then captures everything up to the next ','.
Use
/(?=-)(.*)/gm
if you want to capture each pair of values together.
Sample at https://regex101.com/r/a5Dk5Y/1/
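If the goal is to check the structure of the whole file before analysing it, a minimal Python sketch might look like the one below. It assumes the layout shown in the question (a `,file_NN,,` header, an `x data,y data` title line, then numeric rows with an optional trailing comma). The names `NUMBER`, `FILE_FORMAT`, `is_valid` and `extract_pairs` are made up for illustration.

```python
import re

# Assumed number shape: optional minus sign, digits, optional decimal part.
NUMBER = r"-?\d+(?:\.\d+)?"

# Header line, column-title line, one or more data rows, optional ",," trailer.
FILE_FORMAT = re.compile(
    r",file_\d+,,\n"
    r"x data,y data\n"
    rf"(?:{NUMBER},{NUMBER},?\n)+"
    r"(?:,,\n?)?"
)

def is_valid(text):
    """Return True only if the entire text matches the assumed layout."""
    return FILE_FORMAT.fullmatch(text) is not None

def extract_pairs(text):
    """Return the (x, y) rows as floats; non-data lines are ignored."""
    rows = re.findall(rf"^({NUMBER}),({NUMBER}),?$", text, flags=re.MULTILINE)
    return [(float(x), float(y)) for x, y in rows]
```

Note that the dot is escaped here (`\.`); in the pattern from the question the bare `.` matches any character.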
I need to extract website URLs from the text. Can you tell me what I am missing?
Data:
gmail.com
2.0
Dolphins.com.
B.TECH
62.1%.
github.com/XYZ
abcd.com
github.com/abcd
linkedin.com/in/abcd
abcd.wordpress.com/
https://xyz/stackoverflow.com
Regex pattern:
urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w+/\-?=%.]+\.[\w+/\-?=%.]+', text)
Expected Output:
github.com/XYZ
abcd.com
github.com/abcd
linkedin.com/in/abcd
abcd.wordpress.com/
https://xyz/stackoverflow.com
Current output:
It's extracting every item listed under Data. Can someone tell me what changes are required in my regex to get the expected output?
I used the regex below and it worked on regex101.com:
.*(?:https?:\/\/)?(?:www\.)?[a-z-]+\.(?:com|org)(?:\.[a-z]{2,3})?.*
But when I use it in my code with re.findall() it returns the entire text file, and when I use it with re.finditer() I get an error saying the result is not JSON serializable. I'm trying to return my output as JSON. What can be done here?
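Two things are probably at play here: the leading and trailing `.*` make each match swallow its whole line, and `re.finditer()` yields match objects, which `json.dumps` cannot serialize. A hedged sketch, assuming only `.com`/`.org` hosts are wanted (as in the pattern above) and using the hypothetical names `URL` and `urls_as_json`:

```python
import json
import re

# Assumption: only .com / .org hosts count, mirroring the pattern above.
# There is no leading or trailing ".*", so a match covers only the URL itself.
URL = re.compile(
    r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org)"
    r"(?:/[\w\-./?=%]*)?",
    re.IGNORECASE,
)

def urls_as_json(text):
    """Collect the matched strings (not match objects) and dump them as JSON."""
    matches = [m.group(0) for m in URL.finditer(text)]
    return json.dumps(matches)
```

This sketch does not try to reproduce the expected output exactly; for instance, it would still pick up `gmail.com`. Which hosts to exclude is a separate filtering decision.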
There is a text file containing multiple lines like this: Browser("something").page("something_else").webEdit("some").
I need to retrieve the names of the browser, page and field (the names surrounded by double quotes) and replace the line with "something_somethingelse_some" (concatenating the names of the browser, page and field respectively). Please help.
The names can be anything, so a regex is needed. Note that every line in the above format must be converted, through to the end of the file.
You may try this:
^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$
and replace by:
$1_$2_$3
Regex Demo
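For reference, the same replacement can be sketched in Python with `re.sub`. The names `LINE` and `rewrite` are hypothetical, and the pattern assumes each target line starts with `Browser(`, as the anchored regex above does.

```python
import re

# MULTILINE makes ^ and $ match at every line boundary, so each line
# in the file is handled independently.
LINE = re.compile(
    r'^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$',
    re.MULTILINE,
)

def rewrite(text):
    """Replace every matching line with browser_page_field."""
    return LINE.sub(r"\1_\2_\3", text)
```

Lines that do not match the pattern are left unchanged, so the function can be applied to the whole file contents at once.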
I am trying to extract email addresses from raw text using Pig.
Here's the sample data:
Sample data for email abc.123#gmail.com
Sample data for email xyz#abc.com
I am trying the regex approach, with a regular expression I took from: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
Here's the script:
A = Load '----' using PigStorage as (value: chararray);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(value, '^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9-]+(\\.[A-Za-z]{2,})$')) AS (f1: chararray);
dump B;
After dumping the output to the terminal, I get blank output:
()
()
Is there a problem with the script syntax?
Please also share some links on writing regular expressions; that would be very helpful.
Your help is appreciated, thank you.
For the following input data
abc.123#gmail.com
xyz#abc.com
the output of your code is
.123 .com
.com
So there are a couple of problems in your code:
You need to add parentheses around the whole regex to capture the complete email address. The code should then work if each line contains only one token (a word or an email ID).
If each input line can be a sentence, you have to tokenize first and then run the regex match on the tokens.
The regex works only on a single token and not on a whole line because "^" anchors the beginning of the string and "$" anchors the end, so the match succeeds only when the entire line is an email ID, which means you can have only one token per line.
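The tokenize-then-match idea can be illustrated outside Pig with a short Python sketch. It keeps `#` as the separator only because the sample data uses it, and the names `EMAIL` and `find_emails` are made up for this example.

```python
import re

# '#' stands in for the usual separator because the sample data uses it.
# Non-capturing groups (?:...) keep the inner parts from being returned
# on their own, which is what produced ".123 .com" above.
EMAIL = re.compile(
    r"[_A-Za-z0-9+-]+(?:\.[_A-Za-z0-9-]+)*#[A-Za-z0-9-]+(?:\.[A-Za-z]{2,})"
)

def find_emails(line):
    """Split the line on whitespace and keep tokens that are entirely an email ID."""
    return [tok for tok in line.split() if EMAIL.fullmatch(tok)]
```

`fullmatch` plays the role of the `^...$` anchors, but it is applied per token instead of per line.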
I have a string in which I can find the following:
Kbps
Duration
Mb
Song Title
Website
http://abmp3.com/
I can't seem to find the URL. I used Expresso to create the regex and used the page source to get matches, but for some reason when I add href="(.*.mp3)" to the end of the string it won't find anything. The Kbps, duration and Mb are all on the same line. The song title is on a different line, and so is the URL.
My question is: how would you add href="(.*.mp3)" to the end of the regex string?
Regex Code
":6px;"">(.* Kbps)<br>(.*)<br> (.* Mb)</div></td>\D+\S+<strong>(.*) mp3"
Need to add this to the end
href="(.*.mp3)"
Thanks in advance!
Looking at the website, it appears this would work for you:
href=\".*\.mp3\"
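As a rough Python illustration of the same idea, the sketch below keeps the capturing group from the question and uses `[^"]*` in place of `.*`, so that a match cannot run past the closing quote into a later attribute. The names `HREF_MP3` and `mp3_links` are hypothetical, and the HTML in the usage comment is invented, not taken from the site.

```python
import re

# [^"]* stops at the closing quote; \. matches a literal dot before "mp3".
HREF_MP3 = re.compile(r'href="([^"]*\.mp3)"')

def mp3_links(html):
    """Return the captured URL of every href that ends in .mp3."""
    return HREF_MP3.findall(html)

# Usage (invented markup):
# mp3_links('<a href="http://abmp3.com/song.mp3">x</a>')
```

Because the URL sits on a different line from the other fields, matching it with its own pattern like this may be simpler than appending it to the longer regex.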