How to match regex pattern multiple times in Pyspark? [closed] - regex

The email data below is stored in a single column.
The requirement is to extract only the text from "Call Example" through "additional details".
Input:
Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text.
Output:
Both call examples need to be written to a new column 'Calldetails1', as two separate rows, using PySpark.
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
This is the regexp_extract call I used to extract from "Call Example" to "additional details":
result = df.withColumn('result', regexp_extract('comments', '(?s)(?=Call Example)(.?additional details:\s[\w+])', 1))
It works for one group only. Please suggest a way to match globally, that is, to return every occurrence, in Python.

As mentioned in the chat:
(?=Call Example)([\w\s:\*]+?[\S])$
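(?=Call Example) is a lookahead: it asserts that the text at the current position starts with Call Example, without consuming it.
[\w\s:\*]+? lazily matches one or more word characters, whitespace characters, colons or asterisks. The trailing [\S]$ then requires the match to end on a non-whitespace character at the end of a line (assuming multiline matching is enabled).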
Extracting multiple captured groups using pySpark
https://stackoverflow.com/questions/58930893/extracting-several-regex-matches-in-pyspark
https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches
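regexp_extract returns only the first match, so one way to get every block is to find all matches and then put each one on its own row. Here is a minimal sketch of the matching step in plain Python, using the standard re module. The pattern and the function name extract_call_details are illustrative assumptions, not the asker's code. In PySpark, a function like this could be wrapped in a udf that returns an array of strings and combined with explode to produce one row per match.

```python
import re

# Assumed pattern: from "Call Example:" lazily up to "additional details:"
# and the single token that follows it. DOTALL lets "." cross line breaks.
CALL_BLOCK = re.compile(
    r"Call Example:.*?additional details:[^\S\n]*\S+",
    re.DOTALL,
)


def extract_call_details(comments):
    """Return every 'Call Example ... additional details' block in the text."""
    return CALL_BLOCK.findall(comments)


sample = """Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text."""

for block in extract_call_details(sample):
    print(block)
    print("---")
```

findall scans the whole string and resumes after each match, which is what makes the search global.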

Related

Conditional Regex for Percentage based values [closed]

I've never been very good at regex, but I really need to grab the percentage information from these log entries; however, the warn/critical message moves around depending on where the warning was located in either the In or the Out utilization. I just can't figure out the regex. Here are two example entries that show both in and out issues:
["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, In: 0 Bit/s (0%), Out: 6.53 GBit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (326.45%)(!!)
["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, In: 0 Bit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (95.45%), Out: 6.53 GBit/s (32.00%)(!!)
Ultimately I need to use capture groups to capture both the in and out utilization percentage. But every regex I try only finds a single percentage. Help on this would be greatly appreciated. Thanks in advance.
EDIT SHOWING EXPECTED RESULT:
For each line, the capture groups should identify the In and Out values so that the program can see both utilization percentages. The program expects a key-value pair from every log entry, like the following:
IN:0% OUT:326.45%
IN:95.45% OUT:32.00%
Do you need something like this?
In:.+\(([0-9\.]+%)\).+Out:.+\(([0-9\.]+%)
If you only need to pull out the two percentage values, this should help: group 1 captures the In percentage and group 2 captures the Out percentage.
https://regex101.com/r/B9pZeO/1
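As a quick check, the regex above can be run against the two sample log entries in Python. This is only a sketch: the function name parse_utilization and the dictionary output are assumptions made for illustration.

```python
import re

# The regex from the answer; the dot needs no escaping inside a character class.
UTILIZATION = re.compile(r"In:.+\(([0-9.]+%)\).+Out:.+\(([0-9.]+%)")


def parse_utilization(entry):
    """Return the In and Out percentages of a log entry, or None if absent."""
    match = UTILIZATION.search(entry)
    if match is None:
        return None
    return {"IN": match.group(1), "OUT": match.group(2)}


entries = [
    '["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, '
    "In: 0 Bit/s (0%), Out: 6.53 GBit/s "
    "(warn/crit at 1.6 GBit/s/1.8 GBit/s) (326.45%)(!!)",
    '["XXXXXXX"], (up), MAC: XX:XX:XX:XX:XX:XX, Speed: 2 GBit/s, '
    "In: 0 Bit/s (warn/crit at 1.6 GBit/s/1.8 GBit/s) (95.45%), "
    "Out: 6.53 GBit/s (32.00%)(!!)",
]

for entry in entries:
    result = parse_utilization(entry)
    print(f"IN:{result['IN']} OUT:{result['OUT']}")
```

The warn/crit text contains no parenthesized percentage, so the greedy .+ skips over it wherever it appears.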

How to rename bunch of files via terminal, keeping the filenames prefix and suffix and removing wildcard in the middle? [closed]

Given 400+ files such as:
Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt
Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt
Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt
...
Given that this regex matches these titles:
(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)
I want to rename my files to filenames such as:
Remi_Brun-ASpi777XisA.en.vtt
Remi_Brun-ZIBcQ5tMB2U.en.vtt
Remi_Brun-hOxG4g05z4w.en.vtt
...
How can I keep the speaker-name prefix, remove the variable noise in the middle, and keep the final 11-character YouTube id and the extension suffix?
If you want to remove everything from the underscore before the first - up to the last - before the YouTube id, while allowing for any non-empty language code, then this will work:
rename 's/_-.*-([a-zA-Z0-9_-]{11}\..+\.vtt)/-$1/' Remi*
or, for a more readable answer:
rename 's/(Remi_Brun)(_.+)([a-zA-Z0-9_-]{11}\.en\.vtt)/$1-$3/' Remi*
Edit:
My earlier answer did not account for hyphens in the YouTube id:
rename 's/-.*-/-/' Remi*
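If the Perl rename utility is not available, the same substitution can be sketched in Python. The function names shortened_name and rename_all, the anchored pattern, and the dry-run default are assumptions made for this example. It is worth checking the dry-run output before renaming anything.

```python
import re
from pathlib import Path

# Prefix, then the variable noise, then an 11-character id plus ".en.vtt".
TITLE = re.compile(r"^(Remi_Brun)_.+([A-Za-z0-9_-]{11}\.en\.vtt)$")


def shortened_name(filename):
    """Drop the noise between the speaker prefix and the YouTube id."""
    return TITLE.sub(r"\1-\2", filename)


def rename_all(directory, dry_run=True):
    """Rename matching files in a directory; only print the plan when dry_run."""
    for path in sorted(Path(directory).glob("Remi_Brun_*.en.vtt")):
        target = path.with_name(shortened_name(path.name))
        if target == path:
            continue  # name did not match the pattern, leave it alone
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)


print(shortened_name("Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt"))
```

Because .+ is greedy and the id has a fixed length of 11, hyphens or underscores inside the id stay part of the id.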

Replace the words "can't, don't" by "can not, do not" using python [closed]

I need to replace contractions such as "can't", "don't" and "won't" with "can not", "do not" and "will not" using Python.
The problem is:
"can't" can be detected by its suffix "n't", so we can replace "n't" with "not".
But splitting "can't" that way leaves "ca". How can "ca" be turned into "can" so that the result is "can not"?
Since the rules of English are large and sometimes inconsistent, your best bet is probably just to set up full word maps rather than trying to figure out on the fly which letters are represented by the apostrophe.
In other words, a dictionary with values like:
can't -> can not
don't -> do not
won't -> will not
:
oughtn't -> ought not
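A minimal sketch of that word-map approach in Python follows. The map holds only the entries listed above and would need to be extended. The function name expand_contractions and the capitalization handling are assumptions added for illustration.

```python
import re

# Full-word map; extend it with whatever contractions the text contains.
CONTRACTIONS = {
    "can't": "can not",
    "don't": "do not",
    "won't": "will not",
    "oughtn't": "ought not",
}

# One alternation over all keys, matched as whole words, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every known contraction with its expanded form."""
    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. "Don't" becomes "Do not".
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return PATTERN.sub(replace, text)


print(expand_contractions("I can't and won't go; Don't ask."))
```

Looking up whole words sidesteps the "ca" versus "can" problem entirely, since no suffix is ever split off.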

How to get filename structure in a folder in Matlab [closed]

I am designing a GUI in Matlab.
I have a folder called sth. It contains many files whose names share the same structure, like:
filename_1_something.mat
filename_2_something.mat
To loop over the filenames by index, I need to derive a format string like this:
filename_%d_something.mat
I don't need to read all the files in the directory. Two filenames are enough: compare the strings, find the part that differs, and replace it with %d.
Any other approach is also appreciated.
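Using the regex provided by @rock321987:
names = dir('*.mat');
num = length(names);
expression = '\w*_\d+_\w*\.mat';
nameList = cell(1, num);  % preallocate the result
for n = 1:num
    str = names(n).name;
    % regexp returns a cell array of the matching substrings
    nameList{n} = regexp(str, expression, 'match');
end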
works on:
test_1_something.mat
test_10_something.mat
Changing the regex to just \w*_\w*\.mat
works for:
test_1.mat
1_test.mat
test_1_something.mat
test_10_something.mat
but it also matches any .mat filename that contains an underscore.
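The regex above only filters filenames; it does not produce the filename_%d_something.mat template the question asks for. One way to derive it, sketched here in Python rather than Matlab, is to replace the first run of digits that sits between two underscores. The function name name_template and the assumption that the index is always delimited by underscores are illustrative.

```python
import re

# The index is assumed to be the first digit run between two underscores.
INDEX = re.compile(r"(?<=_)\d+(?=_)")


def name_template(filenames):
    """Return the shared printf-style template for a list of filenames."""
    templates = {INDEX.sub("%d", name, count=1) for name in filenames}
    if len(templates) != 1:
        raise ValueError(f"filenames do not share one structure: {templates}")
    return templates.pop()


template = name_template(["filename_1_something.mat", "filename_2_something.mat"])
print(template)
print(template % 3)
```

Comparing the two strings character by character would fail for pairs like test_1 and test_10, which share the digit 1, so the sketch replaces the whole digit run instead.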

Recognizing patterns given a set of sentences [closed]

I have a text file with lots of sentences. These sentences can occur in patterns. How do I recognize these patterns?
For example:
i woke up in the morning
i went to school
i played football
i came back home
i woke up in the morning
i went to school
i played basketball
At this point I want the program to say that "I played football" should have appeared.
This task seems a little complicated, but you can try this simple code to understand the idea, and build on it if you find it useful:
// The input text; sentences are separated by periods.
String sampleString1 = "xyz";
// split() takes a regex, so the period must be escaped.
String[] sampleString2 = sampleString1.split("\\.");
// The first sentence serves as the pattern to look for.
String pattern = sampleString2[0].trim();
for (int i = 1; i < sampleString2.length; i++) {
    // Compare each later sentence with the pattern.
    if (sampleString2[i].trim().equals(pattern)) {
        // Code to run for a matching sentence.
        System.out.println("Sentence matching with pattern :: " + sampleString2[i].trim());
    }
}
Try this code if the pattern to be matched is the first sentence of the sequence.
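The code above only finds repeats of the first sentence. To report that "i played football" should have appeared, one simple heuristic is to remember which sentence followed each sentence the first time, and flag any later deviation. The sketch below does this in Python. The function name find_deviations and the "first successor wins" rule are assumptions, not a general pattern-mining method.

```python
def find_deviations(sentences):
    """Return (previous, expected, actual) for each break in a learned sequence."""
    expected_next = {}  # sentence -> the sentence that first followed it
    deviations = []
    for previous, current in zip(sentences, sentences[1:]):
        if previous in expected_next and expected_next[previous] != current:
            deviations.append((previous, expected_next[previous], current))
        else:
            # First time this sentence is seen: remember what followed it.
            expected_next.setdefault(previous, current)
    return deviations


log = [
    "i woke up in the morning",
    "i went to school",
    "i played football",
    "i came back home",
    "i woke up in the morning",
    "i went to school",
    "i played basketball",
]

for previous, expected, actual in find_deviations(log):
    print(f'after "{previous}" expected "{expected}" but got "{actual}"')
```

This looks back only one sentence; longer contexts would need a different model.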