How to rename a bunch of files via terminal, keeping the filename prefix and suffix and removing the wildcard in the middle? [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Given 400+ files such as:
Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt
Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt
Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt
...
Given that this regex matches these titles:
(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)
I want to rename my files to filenames such as:
Remi_Brun-ASpi777XisA.en.vtt
Remi_Brun-ZIBcQ5tMB2U.en.vtt
Remi_Brun-hOxG4g05z4w.en.vtt
...
How can I keep the speaker-name prefix, remove the variable noise in the middle, and keep the final 11-character YouTube ID and the extension suffix?

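If you want to remove everything between the first and last - before the YouTube ID, while allowing for any non-empty language code, then this will work (with the Perl-based rename, which accepts an s/// expression; add -n to preview the changes without renaming anything):
rename 's/_?-.*-([a-zA-Z0-9_-]{11}\..+\.vtt)/-$1/' Remi*
The leading _? also drops the underscore in front of the first hyphen, so the result is Remi_Brun-... rather than Remi_Brun_-...
Or, for a more readable version:
rename 's/(Remi_Brun)(_.+)([a-zA-Z0-9_-]{11}\.en\.vtt)/$1-$3/' Remi*
Edit: my earlier answer, shown here, did not account for hyphens in the YouTube ID:
rename 's/-.*-/-/' Remi*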
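For readers without the Perl rename tool, the renaming rule from the question could also be sketched in Python. This is only an illustration under stated assumptions: the names `PATTERN`, `new_name`, and `rename_all` are made up here, the files are assumed to sit in one directory, and the ID is assumed to be preceded by a hyphen as in the sample filenames.

```python
import re
from pathlib import Path

# Speaker prefix, the noise, then "-" + 11-character ID + ".<lang>.vtt".
# The hyphen sits last in the character class so it is read literally.
PATTERN = re.compile(r"^(Remi_Brun)_.*-([A-Za-z0-9_-]{11}\.[^.]+\.vtt)$")


def new_name(filename):
    """Return the shortened filename, or None if the name does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


def rename_all(directory=".", dry_run=True):
    """Print each planned rename; only touch the disk when dry_run is False."""
    for path in sorted(Path(directory).glob("Remi*.vtt")):
        target = new_name(path.name)
        if target is None or target == path.name:
            continue  # leave files that do not follow the expected structure
        print(f"{path.name} -> {target}")
        if not dry_run:
            path.rename(path.with_name(target))
```

Calling `rename_all()` first with the default `dry_run=True` only lists the planned renames, which makes it easy to spot an ID that was cut in the wrong place before anything is changed.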

Related

Merging broken lines [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 14 days ago.
This post was edited and submitted for review 14 days ago.
In a text with many lines in Notepad++, some lines are unintentionally broken onto the next line without ending in a period. Using a regex, I want to merge every line that is more than 10 characters long and does not end with a dot (.) with the line that follows it, and put a space between the merged lines.
For example, the following text:
tttttttttt
aaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc
dddddddddddddddddd.
Convert to:
tttttttttt
aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc dddddddddddddddddd.
I also tried the following regex code but it didn't work:
[^\.]\n
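The attempted pattern consumes the last character of the line along with the line break, and it has no length check. One way to express the rule described above is sketched here in Python rather than in Notepad++; the names `BROKEN_LINE` and `merge_broken_lines` are illustrative, and the pattern assumes "more than 10 characters" means 11 or more.

```python
import re

# Start of a line, at least 10 characters, then one more character that is
# not a dot (so 11+ characters in total), then the line break (\r\n or \n).
BROKEN_LINE = re.compile(r"^(.{10,}[^.\r\n])\r?\n", re.MULTILINE)


def merge_broken_lines(text):
    """Join each long line that lacks a final dot to the line after it."""
    # The captured line content is kept; the line break becomes a space.
    return BROKEN_LINE.sub(r"\1 ", text)
```

Because the line content is captured and written back, only the line break itself is replaced, which is what the question's expected output requires.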

How to avoid unrelated data from postgresql search [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to get the rows whose description contains both keywords, "LED" and "car":
select count ( * ) from test_eu where eng_discription ~ '.* led .* AND .* car .*';
When I search PostgreSQL with the above query, the results include unrelated data such as:
so-called cardboard
carefully installed
To avoid this, I thought that requiring a space on both sides of each keyword would solve the problem.
The regex for whitespace is
\s
so I wrote this query:
select count ( * ) from test_eu where eng_discription ~ '\sled\s and \scar\s';
but it still does not work.
How should I modify my code?

Assuming you want to check for the presence of both LED and car as whole words, anywhere in the description column, you could combine two case-insensitive matches (~*) that use word boundaries (\y):
SELECT COUNT(*) AS cnt
FROM test_eu
WHERE eng_discription ~* '\yled\y' AND eng_discription ~* '\ycar\y';
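The whole-word logic of that query can be checked outside the database with a small Python sketch. Python writes the word boundary as `\b` where PostgreSQL uses `\y`; the function name `mentions_led_and_car` is made up for this illustration.

```python
import re

# \b is Python's word boundary, playing the role of \y in PostgreSQL.
LED = re.compile(r"\bled\b", re.IGNORECASE)
CAR = re.compile(r"\bcar\b", re.IGNORECASE)


def mentions_led_and_car(description):
    """True only if both 'led' and 'car' occur as whole words."""
    return bool(LED.search(description)) and bool(CAR.search(description))
```

Two separate tests joined by a logical AND mirror the SQL: a single regex cannot contain the keyword AND, and the keywords may appear in either order.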

Replace the words "can't, don't" by "can not, do not" using python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
I need to replace words like "can't", "don't", and "won't" with "can not", "do not", and "will not" using Python.
The problem is:
"can't" can be detected by checking for the suffix "n't", so we can replace "n't" with "not".
But splitting "can't" that way leaves "ca", so how can we turn "ca" into "can" to get "can not"?

Since the rules of English contractions are numerous and sometimes inconsistent, your best bet is probably to set up full word maps rather than trying to figure out on the fly which letters the apostrophe stands for.
In other words, a dictionary with values like:
can't -> can not
don't -> do not
won't -> will not
:
oughtn't -> ought not
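A minimal Python sketch of such a word map follows. The dictionary holds only the entries listed above and would need extending for real text; the names `CONTRACTIONS` and `expand_contractions` are illustrative, and only the straight apostrophe (') is handled.

```python
import re

# Full-word map; extend it with whatever contractions the text contains.
CONTRACTIONS = {
    "can't": "can not",
    "don't": "do not",
    "won't": "will not",
    "oughtn't": "ought not",
}

# One alternation of all keys, anchored on word boundaries so that a
# contraction is only replaced when it stands as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every known contraction with its expanded form."""
    def _replace(match):
        # Lookup is case-insensitive; the replacement is always lowercase.
        return CONTRACTIONS[match.group(0).lower()]
    return _PATTERN.sub(_replace, text)
```

Looking up whole words avoids the "ca" problem entirely, because "can't" and "won't" are never split into a stem and a suffix.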

Removing lines containing non-alphabetic characters. [Notepad++ / Regex] [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm compiling a dictionary file from a bunch of different smaller dictionary files. The compiled list is in the format:
apple
banana
carrot
But some of the files contain weird unicode characters, comments, and spaces. I want to completely remove any lines that contain any non-alphabetic characters. So for a list like this:
apple
Ϥ
#comment
banana carrot
zeta
Would become:
apple
zeta
What would be the best way to do this?
Edit: This includes removing blank lines.

You need to press Ctrl+H to open the Replace window and fill it in as follows:
You can then click Replace All.
If you want to remove blank lines, repeat the same steps: set Find what to ^\s* and leave Replace with empty.
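
# Python: copy only the lines made up entirely of ASCII letters.
# Blank lines fail the check too, so they are dropped as well.
# encoding="utf-8" is an assumption about how the files are stored.
with open("table.txt", "r", encoding="utf-8") as source, \
        open("newTable.txt", "a", encoding="utf-8") as write_to:
    for line in source:
        word = line.rstrip("\n")
        # isalpha() alone would accept letters such as Ϥ, so also require ASCII
        if word.isascii() and word.isalpha():
            write_to.write(line)
Clarification: This is Python code (3.7 or later, for str.isascii) that reads the input file table.txt and appends the lines it keeps to newTable.txt.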

How to get filename structure in a folder in Matlab [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am designing a GUI in MATLAB.
I have a folder called sth. It contains many files whose names share the same structure, such as:
filename_1_something.mat
filename_2_something.mat
To loop over the filenames by index, I need to derive a format string like this:
filename_%d_something.mat
That way I don't need to read all the files in the directory: two filenames are enough to compare the strings, find the part that differs, and replace it with %d.
Any different approach is also appreciated.

Using the regex provided by @rock321987:
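names = dir('*.mat');                % all .mat files in the current folder
num = numel(names);
expression = '\w*_\d+_\w*\.mat';     % word characters, _number_, word characters, .mat
nameList = cell(1, num);             % preallocate the results
for n = 1:num
    str = names(n).name;
    % no trailing semicolon, so each match is displayed as it is found
    nameList{n} = regexp(str, expression, 'match')
end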
This works on:
test_1_something.mat
test_10_something.mat
Changing the regex to just \w*_\w*\.mat
works for
test_1.mat
1_test.mat
test_1_something.mat
test_10_something.mat
but it also matches any .mat filename made of word characters joined by an underscore.
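The idea from the question, comparing two filenames and replacing the differing number with %d, can be sketched as follows. The sketch is in Python rather than MATLAB, the name `infer_template` is made up for illustration, and it assumes the two names differ only in their runs of digits.

```python
import re


def infer_template(name_a, name_b):
    """Build a printf-style template from two filenames that differ in a number."""
    # Split each name into alternating digit and non-digit chunks, e.g.
    # "filename_1_something.mat" -> ["filename_", "1", "_something.mat"].
    chunks_a = re.findall(r"\d+|\D+", name_a)
    chunks_b = re.findall(r"\d+|\D+", name_b)
    if len(chunks_a) != len(chunks_b):
        raise ValueError("the two names do not share the same structure")

    parts = []
    for a, b in zip(chunks_a, chunks_b):
        if a == b:
            # Identical chunk: keep it, escaping any literal percent sign.
            parts.append(a.replace("%", "%%"))
        elif a.isdecimal() and b.isdecimal():
            # Differing numbers become the placeholder.
            parts.append("%d")
        else:
            raise ValueError("the names differ outside of a number")
    return "".join(parts)
```

The returned string can be filled in with an index, for example `infer_template(first, second) % 3` in Python, or passed as the format argument to `sprintf` in MATLAB.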