Regex - How to create a regex to check two strings with different length but one depends on the other [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Suppose there is a string like "abcdefghijklm.com 80 /abcdefgh.php", where the domain name is followed by the HTTP port and then a file name made of the first 8 characters of the domain name followed by ".php". The length of that substring sometimes changes to 5 or 6 instead of 8, but in every case it consists of the leading characters of the domain name and ends with ".php".
More examples:
xyzklmopqr.com 80 xyzklm.php
lkjhgfdsaq.com 80 lkjhg.php
mjuyhnbgtr.com 80 mjuyhnbg.php

This works, and you can easily change the numbers:
(\w{5,6}|\w{8})\w*\.com 80 \1\.php
It's a little simpler than the other guy's solution
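A quick way to check that pattern against the examples from the question is a short Python sketch. The use of `re.fullmatch` (anchoring the pattern to the whole line), the helper name `prefix_of`, and the last, non-matching sample line are assumptions added for illustration; they are not part of the answer.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"(\w{5,6}|\w{8})\w*\.com 80 \1\.php")

samples = [
    "xyzklmopqr.com 80 xyzklm.php",    # 6-character prefix
    "lkjhgfdsaq.com 80 lkjhg.php",     # 5-character prefix
    "mjuyhnbgtr.com 80 mjuyhnbg.php",  # 8-character prefix
    "mjuyhnbgtr.com 80 zzzzzzzz.php",  # hypothetical line that should not match
]

def prefix_of(line):
    """Return the captured domain prefix, or None if the line does not match."""
    m = PATTERN.fullmatch(line)
    return m.group(1) if m else None

results = [prefix_of(s) for s in samples]
print(results)
```

The backreference `\1` is what ties the file name to the domain: the engine backtracks over the allowed prefix lengths until the captured text also appears before ".php".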

The following should work:
(((\w{5})\w?)\w{2}?)\w*\.com 80 (\1|\2|\3)\.php
Note that this works for the specific lengths you mentioned in your question (5, 6, and 8), not for any generic length substring.
Example: http://www.rubular.com/r/NwCcihN6o6

I would try ([a-z]{6})\S* 80 \1\.php
That would work for your 6-character case; you can change the number as needed for your other cases.

Related

Merging broken lines [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 14 days ago.
This post was edited and submitted for review 14 days ago.
In a text with many lines in Notepad++, some lines are unintentionally broken onto the next line without a terminating period. I want to use a regex to merge lines that are more than 10 characters long and do not end with a dot (.), putting a space between the merged lines.
For example, the following text:
tttttttttt
aaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc
dddddddddddddddddd.
Convert to:
tttttttttt
aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc dddddddddddddddddd.
I also tried the following regex code but it didn't work:
[^\.]\n
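The attempted pattern consumes the last character of the line together with the newline and never checks the line length. One possible approach is sketched below in Python rather than in Notepad++, so that it can be run and checked. It reads "more than 10 characters" as 11 or more characters with the last one not being a dot, which is an assumption, and it captures the whole line so the line can be written back followed by a space. It assumes plain `\n` line endings; files with `\r\n` endings would need the pattern adjusted.

```python
import re

text = (
    "tttttttttt\n"
    "aaaaaaaaaaaaaaaa\n"
    "bbbbbbbbbbbbbbbbbb.\n"
    "ccccccccccccccccc\n"
    "dddddddddddddddddd."
)

# ^        start of a line (re.MULTILINE)
# .{10,}   at least 10 characters ...
# [^.\n]   ... plus one more that is neither a dot nor a newline
# \n       the line break to be replaced by a space
BROKEN_LINE = re.compile(r"^(.{10,}[^.\n])\n", re.MULTILINE)

merged = BROKEN_LINE.sub(r"\1 ", text)
print(merged)
```

The 10-character line "tttttttttt" is left alone because the pattern needs an eleventh character before the line break.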

Conditional Regex for Percentage based values [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last year.
I've never been very good at regex, but I really need to grab the percentage information from these log entries; however, the warn/critical message moves around depending on where the warning was located in either the In or the Out utilization. I just can't figure out the regex. Here are two example entries that show both in and out issues:
["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, In: 0 Bit/s (0%), Out: 6.53 GBit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (326.45%)(!!)
["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, In: 0 Bit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (95.45%), Out: 6.53 GBit/s (32.00%)(!!)
Ultimately I need to use capture groups to capture both the in and out utilization percentage. But every regex I try only finds a single percentage. Help on this would be greatly appreciated. Thanks in advance.
EDIT SHOWING EXPECTED RESULT:
For each line, the regex capture groups should identify the In and Out values so the program can see both utilization percentages. The program expects a key-value pair from every log entry, like the following:
IN:0% OUT:326.45%
IN:95.45% OUT:32.00%
Do you need something like this?
In:.+\(([0-9\.]+%)\).+Out:.+\(([0-9\.]+%)
If you just need to pull out the percentage values, this may help:
https://regex101.com/r/B9pZeO/1
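As a rough check of that pattern, the Python sketch below runs it on the two log entries from the question and formats the result in the expected `IN:... OUT:...` shape. The function name `utilization` is made up for this example.

```python
import re

# Pattern from the answer above (the dot needs no escape inside a character class).
PATTERN = re.compile(r"In:.+\(([0-9.]+%)\).+Out:.+\(([0-9.]+%)")

log_lines = [
    '["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, In: 0 Bit/s (0%), '
    'Out: 6.53 GBit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (326.45%)(!!)',
    '["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, '
    'In: 0 Bit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (95.45%), Out: 6.53 GBit/s (32.00%)(!!)',
]

def utilization(line):
    """Return 'IN:<pct> OUT:<pct>' for a log entry, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return f"IN:{m.group(1)} OUT:{m.group(2)}"

results = [utilization(line) for line in log_lines]
print(results)
```

The greedy `.+` skips over the optional "warn/crit" parenthesis, because only a parenthesis that contains digits, dots and a percent sign can satisfy the capture group.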

How to match regex pattern multiple times in Pyspark? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
The email data below is stored in a single column.
The requirement is to extract only the text from "Call Example" through "additional details".
Input:
Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text.
Output:
Both call examples need to be populated in the new column 'Calldetails1' as two separate rows using PySpark.
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
The regexp_extract call I used to extract from Call Example to additional details:
result = df.withColumn('result',regexp_extract('comments','(?s)(?=Call Example)(.?additional details:\s[\w+])',1))
It works for one match only. Please suggest a way to match globally in Python.
As mentioned in the chat:
(?=Call Example)([\w\s:\*]+?[\S])$
(?=Call Example) asserts that the text at the current position starts with Call Example.
[\w\s:\*]+? lazily matches one or more word characters, whitespace, colons or asterisks, and [\S]$ then requires the match to end on a non-whitespace character at the end of a line.
For extracting multiple captured groups using PySpark, see:
https://stackoverflow.com/questions/58930893/extracting-several-regex-matches-in-pyspark
https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches
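To illustrate what "matching globally" could look like, here is a minimal sketch in plain Python using `re.findall` on the sample text from the question. The pattern is a simplified assumption, not the one from the answer, and the sketch does not use Spark at all; in PySpark the same function could presumably be wrapped in a UDF that returns an array, followed by `explode` to get one row per match.

```python
import re

# Sample column value from the question, one list item per line.
comments = "\n".join([
    "Summary:",
    "Below are the details:",
    "Call Example:",
    "dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx",
    "Please check out the call details.",
    "Second Call Example:",
    "dialFromNumber:*****",
    "dialToNumber:*****",
    "date:***",
    "time:***",
    "additional details:xxxx",
    "Some random text.",
])

# Lazy match from each "Call Example:" up to the value after "additional details:".
# re.DOTALL lets "." cross line breaks, like the (?s) flag in the question.
CALL_BLOCK = re.compile(r"Call Example:.*?additional details:\S*", re.DOTALL)

call_details = CALL_BLOCK.findall(comments)
for block in call_details:
    print(block)
    print("---")
```

`findall` returns every non-overlapping match, so both call blocks come back as separate list items.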

Regex (Bigquery) get specific values from STRING [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have the STRING - TX1234XT batch 44, 1111ABCDEF
TX1234XT (Can be different length)
batch 44 (number can be different length)
ABCDEF (can be a different length, but always have 1111 at the start)
What I need is to generate two columns:
BatchNumber Name
44 1111ABCDEF
1 1111SAMPLE
999 1111Example
Starting point:
First is done:
REGEXP_EXTRACT(reference, r'1111[a-zA-Z0-9_.+-]+') AS Name
Second is done too:
REGEXP_REPLACE(REGEXP_EXTRACT(reference, r'batch [0-9_.+-]+'),r'batch ','') AS BatchNumber
SORTED ^_^
I don't really know Google BigQuery, but if you want to extract the batch number and the value at the end, you could go with this regular expression:
/^.*?batch\s*(\d+),\s*(1111.+)$/
(\d+) will capture your batch id.
(1111.+) will capture the value starting with 1111.
Example here: https://regex101.com/r/SJXmIV/2
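The same expression can be tried outside BigQuery with a few lines of Python. Only the first sample string comes from the question; the prefixes of the other two are invented here so that they produce the batch numbers and names listed in the expected table. The function name `batch_and_name` is also made up.

```python
import re

# Regular expression from the answer above, without the surrounding slashes.
PATTERN = re.compile(r"^.*?batch\s*(\d+),\s*(1111.+)$")

references = [
    "TX1234XT batch 44, 1111ABCDEF",        # from the question
    "AB1 batch 1, 1111SAMPLE",              # hypothetical prefix
    "LONGPREFIX99 batch 999, 1111Example",  # hypothetical prefix
]

def batch_and_name(reference):
    """Return (BatchNumber, Name) as strings, or None if the text does not match."""
    m = PATTERN.match(reference)
    return (m.group(1), m.group(2)) if m else None

rows = [batch_and_name(r) for r in references]
print(rows)
```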

How to rename bunch of files via terminal, keeping the filenames prefix and suffix and removing wildcard in the middle? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Given 400+ files such as:
Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt
Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt
Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt
...
Given that this regex matches these titles:
(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)
I want to rename my files to filenames such as:
Remi_Brun-ASpi777XisA.en.vtt
Remi_Brun-ZIBcQ5tMB2U.en.vtt
Remi_Brun-hOxG4g05z4w.en.vtt
...
How can I keep the speaker-name prefix, remove the variable noise in the middle, and keep the final 11-character YouTube ID and the extension suffix?
If you want to remove everything between the first and last - before the youtube id, while allowing for any nonzero-sized language code, then this will work:
rename 's/-.*-([a-zA-Z0-9-_]{11}\..+\.vtt)/-\1/' Remi*
Or, for a more readable answer:
rename 's/(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)/$1-$3/' Remi*
Edit:
My earlier answer
rename 's/-.*-/-/' Remi* #didn't account for hyphens in youtube id
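Before renaming 400+ files, it can help to dry-run the substitution on a few names. The Python sketch below mirrors the second, "more readable" command but, as assumptions of this example, escapes the dots in `.en.vtt`, anchors the pattern at the end of the name, and only computes the new names without touching the file system. The function name `new_name` is made up.

```python
import re

# Speaker prefix, then the variable middle part, then the 11-character
# YouTube ID (letters, digits, "_" or "-") followed by ".en.vtt".
PATTERN = re.compile(r"(Remi_Brun)_.+([A-Za-z0-9_-]{11}\.en\.vtt)$")

def new_name(old_name):
    """Return the cleaned file name; names that do not match come back unchanged."""
    return PATTERN.sub(r"\1-\2", old_name)

old_names = [
    "Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt",
    "Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt",
    "Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt",
]

planned = {old: new_name(old) for old in old_names}
for old, new in planned.items():
    print(f"{old} -> {new}")
```

Once the planned names look right, each pair could be applied with something like `pathlib.Path(old).rename(new)`.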