How to avoid unrelated data from postgresql search [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to retrieve the rows whose description contains both keywords, "LED" and "car":
select count ( * ) from test_eu where eng_discription ~ '.* led .* AND .* car .*';
When I run the query above in PostgreSQL, the results include unrelated rows such as
so-called cardboard
carefully installed
To avoid this, I thought that requiring a space on both sides of each keyword would solve the problem.
The regex for whitespace is
\s
so I wrote this query
select count ( * ) from test_eu where eng_discription ~ '\sled\s and \scar\s';
but it still does not work.
How should I modify my query?

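Inside a regular expression, AND is matched as literal text, so the two conditions have to be combined at the SQL level instead. Assuming you want to check for the presence of both LED and car as whole words, anywhere in the description column, you could try the following, where \y matches a word boundary and ~* makes the match case-insensitive: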
SELECT COUNT(*) AS cnt
FROM test_eu
WHERE eng_discription ~* '\yled\y' AND eng_discription ~* '\ycar\y';
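
For readers who want to try the word-boundary idea outside the database, here is a minimal Python sketch of the same check. It assumes Python's \b behaves like PostgreSQL's \y for these plain ASCII keywords; the function name mentions_led_and_car is made up for illustration.

```python
import re

# \b is Python's word-boundary assertion (PostgreSQL spells it \y).
# re.IGNORECASE plays the role of the ~* operator.
LED = re.compile(r"\bled\b", re.IGNORECASE)
CAR = re.compile(r"\bcar\b", re.IGNORECASE)


def mentions_led_and_car(description):
    """Return True only if both keywords occur as whole words."""
    return bool(LED.search(description)) and bool(CAR.search(description))


if __name__ == "__main__":
    for text in ["LED lamp for car", "so-called cardboard", "carefully installed LED"]:
        print(text, "->", mentions_led_and_car(text))
```

The two patterns are tested separately, mirroring the two conditions joined by AND in the SQL query, so the keywords may appear in either order.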


How to match regex pattern multiple times in Pyspark? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
The email data below is stored in a single column.
The requirement is to extract only the text from "Call Example" through "additional details".
Input:
Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text.
Output:
Both call examples need to be placed in a new column, 'Calldetails1', as two separate rows, using PySpark.
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
This is the regexp_extract call I used to extract from "Call Example" to "additional details":
result = df.withColumn('result', regexp_extract('comments', '(?s)(?=Call Example)(.?additional details:\s[\w+])', 1))
It works for one group only. Please suggest how to make it match every occurrence in Python.
As mentioned in the chat:
(?=Call Example)([\w\s:\*]+?[\S])$
(?=Call Example) is a lookahead asserting that the text at the current position starts with Call Example.
[\w\s:\*]+? lazily matches one or more word, whitespace, colon or asterisk characters, and [\S]$ then requires the match to end on a non-whitespace character at the end of a line.
Extracting multiple captured groups using pySpark
https://stackoverflow.com/questions/58930893/extracting-several-regex-matches-in-pyspark
https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches
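
As a rough illustration of matching every occurrence rather than only the first, the sketch below uses plain Python's re module, not PySpark. The pattern is a simplified stand-in for the ones discussed above, and the names CALL_BLOCK and extract_call_details are hypothetical. Wrapping such a function in a UDF and exploding the returned list into rows is one possible route in PySpark, though that part is not shown here.

```python
import re

# Lazily match from "Call Example:" up to the value that follows
# "additional details:". re.DOTALL lets "." cross line breaks.
CALL_BLOCK = re.compile(r"Call Example:.*?additional details:\s*\S+", re.DOTALL)


def extract_call_details(comments):
    """Return every 'Call Example ... additional details' block as a list."""
    return CALL_BLOCK.findall(comments)


sample = (
    "Summary:\n"
    "Below are the details:\n"
    "Call Example:\n"
    "dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx\n"
    "Please check out the call details.\n"
    "Second Call Example:\n"
    "dialFromNumber:*****\n"
    "dialToNumber:*****\n"
    "date:***\n"
    "time:***\n"
    "additional details:xxxx\n"
    "Some random text.\n"
)

if __name__ == "__main__":
    for block in extract_call_details(sample):
        print(block)
        print("---")
```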

How to rename bunch of files via terminal, keeping the filenames prefix and suffix and removing wildcard in the middle? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Given 400+ files such as:
Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt
Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt
Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt
...
Given that this regex matches these filenames:
(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)
I want to rename my files to names such as:
Remi_Brun-ASpi777XisA.en.vtt
Remi_Brun-ZIBcQ5tMB2U.en.vtt
Remi_Brun-hOxG4g05z4w.en.vtt
...
How can I keep the speaker-name prefix, remove the variable noise in the middle, and keep the final 11-character YouTube ID and the extension suffix?
If you want to remove everything from the _- after the speaker name through the last - before the YouTube ID, while allowing for any non-empty language code, this will work:
rename 's/_-.*-([a-zA-Z0-9_-]{11}\..+\.vtt)/-$1/' Remi*
Or, for a more readable version that reuses the regex from the question:
rename 's/(Remi_Brun)(_.+)([a-zA-Z0-9_-]{11}\.en\.vtt)/$1-$3/' Remi*
Edit:
My earlier answer, shown below, did not account for hyphens in the YouTube ID:
rename 's/-.*-/-/' Remi*
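
Where the Perl rename utility is not available, the same substitution can be sketched in Python. This is only an illustration under the assumptions of the question (an 11-character ID followed by .en.vtt); new_name and rename_all are hypothetical helper names, and rename_all defaults to a dry run so nothing is renamed until the output has been checked.

```python
import re
from pathlib import Path

# Keep the speaker prefix (group 1) and the 11-character ID plus
# extension (group 2); drop everything in between.
PATTERN = re.compile(r"^(Remi_Brun)_.+([A-Za-z0-9_-]{11}\.en\.vtt)$")


def new_name(name):
    """Return the cleaned filename, or the name unchanged if it does not match."""
    return PATTERN.sub(r"\1-\2", name)


def rename_all(directory=".", dry_run=True):
    """Rename matching files in `directory`; only print the plan when dry_run is True."""
    for path in sorted(Path(directory).glob("Remi_Brun_*.en.vtt")):
        target = path.with_name(new_name(path.name))
        if target == path:
            continue
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)
```

Calling rename_all() prints the planned renames; rename_all(dry_run=False) applies them.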

Extracting date from the format [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am struggling with this date extraction. I have a date in this format:
("D("yyyy-mm-dd"))
I want to get "yyyy-mm-dd". I cannot simply strip every ("D(", because the same format appears in other places, so I started by searching for the substring, but I am not sure I am on the right track:
intabc = istrdate.SearchSubString("D(");
Please suggest how I can get this value.
Input:
"(D(YYYY-MM-DD))"
Desired output:
(YYYY-MM-DD)
What I have tried so far (probably not the right way):
intabc = istrdate.SearchSubString("D(");
You can use substr() and string::erase() in C++98:
string str = "\"D(\"yyyy-mm-dd\")";   // holds "D("yyyy-mm-dd")
string result = str.substr(4);        // skip the four leading characters "D("
result.erase(result.end() - 1);       // drop the trailing )
result.erase(result.end() - 1);       // drop the trailing ", leaving yyyy-mm-dd
If you are using C++11, you can also use string::pop_back().
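
Since fixed offsets break as soon as the wrapper varies (for instance with or without the inner quotes), a pattern-based extraction may be more robust. The sketch below is in Python rather than C++ and only illustrates the idea; extract_date is a hypothetical name, and the pattern assumes the date is whatever sits between D( and the next closing parenthesis, optionally quoted.

```python
import re

# After "D(", skip an optional quote, capture everything up to the next
# quote or closing parenthesis, then allow an optional closing quote.
DATE_IN_WRAPPER = re.compile(r'D\("?([^")]+)"?\)')


def extract_date(text):
    """Return the text wrapped in D(...) without quotes, or None if absent."""
    match = DATE_IN_WRAPPER.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_date('("D("2020-01-31"))'))
    print(extract_date("(D(YYYY-MM-DD))"))
```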

Replace the words "can't, don't" by "can not, do not" using python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I need to replace words like "{can't, don't, won't}" with "{can not, do not, will not}" using Python.
The problem is:
"can't" can be detected by checking for the suffix "n't", so we can replace "n't" with "not".
But how can we transform "ca" into "can", given that splitting "can't" should produce "can not"?
Since the rules of English are numerous and sometimes inconsistent, your best bet is probably to set up full word maps rather than trying to work out on the fly which letters the apostrophe stands for.
In other words, a dictionary with entries like:
can't -> can not
don't -> do not
won't -> will not
:
oughtn't -> ought not
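
A minimal sketch of such a word map in Python follows. The dictionary holds only the entries listed above and would need to be extended for real text; the names CONTRACTIONS and expand_contractions are illustrative, and only the plain ASCII apostrophe is handled.

```python
import re

# Full-word map, as suggested above; extend it with further contractions.
CONTRACTIONS = {
    "can't": "can not",
    "don't": "do not",
    "won't": "will not",
    "oughtn't": "ought not",
}

# One alternation of all known contractions, matched as whole words.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expanded form."""

    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Preserve a leading capital, e.g. "Don't" -> "Do not".
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(expand_contractions("I can't and won't go. Don't ask."))
```

Looking up whole words avoids the "ca" versus "can" problem entirely, because no contraction is ever split at the apostrophe.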

How to get filename structure in a folder in Matlab [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am designing a GUI in Matlab.
I have a folder called sth. It contains many files whose names share the same structure, for example:
filename_1_something.mat
filename_2_something.mat
To loop over the files by index, I need to derive a format string like this:
filename_%d_something.mat
I should not need to read every file in the directory: two of the filenames are enough to compare the strings, find the part that differs, and replace it with %d.
Any other approach would also be appreciated.
Using the regex provided by @rock321987:
names = dir('*.mat');
num = length(names);
expression = '\w*_\d+_\w*\.mat';
for n = 1:num
str = names(n).name;
nameList{n} = regexp(str,expression,'match')
end
works on:
test_1_something.mat
test_10_something.mat
changing the regex to just \w*_\w*\.mat
works for
test_1.mat
1_test.mat
test_1_something.mat
test_10_something.mat
but it also matches any name made of strings joined by an underscore and ending in .mat
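
The question also asks for a format string such as filename_%d_something.mat. As a rough sketch of that step, written in Python rather than Matlab, the numeric field between two underscores can be replaced with %d. The function name to_template is hypothetical, and the approach assumes the index is the first run of digits enclosed by underscores.

```python
import re

# A run of digits with an underscore on each side, e.g. the "1" in
# "filename_1_something.mat". Lookarounds keep the underscores in place.
_NUMBER_FIELD = re.compile(r"(?<=_)\d+(?=_)")


def to_template(filename):
    """Replace the first underscore-delimited number with %d."""
    return _NUMBER_FIELD.sub("%d", filename, count=1)


if __name__ == "__main__":
    template = to_template("filename_1_something.mat")
    print(template)
    print(template % 2)
```

The resulting template uses the same %d conversion that Matlab's sprintf accepts, so it can be carried back into the GUI code.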