Removing lines containing non-alphabetic characters [Notepad++ / Regex] [closed] - regex

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm compiling a dictionary file from a bunch of smaller dictionary files. The compiled list is in the format:
apple
banana
carrot
But some of the files contain odd Unicode characters, comments, and spaces. I want to completely remove any line that contains a non-alphabetic character. So a list like this:
apple
Ϥ
#comment
banana carrot
zeta
Would become:
apple
zeta
What would be the best way to do this?
Edit: This includes removing blank lines.

Press Ctrl+H to open the Replace window and fill it in as follows:
Then click Replace All.
To remove blank lines as well, repeat the same steps: enter ^\s* in Find what and leave Replace with empty.
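The same two-pass find-and-replace idea can be tried outside Notepad++. The Python snippet below is only a sketch and is not the pattern from the answer above: it assumes "alphabetic" means the ASCII letters A-Z and a-z, and the name keep_alpha_lines is invented for this illustration.

```python
import re


def keep_alpha_lines(text: str) -> str:
    """Drop every line that contains anything other than ASCII letters."""
    # Pass 1: remove each line that holds at least one non-letter character
    # (assumption: only A-Z and a-z count as alphabetic).
    text = re.sub(r"(?m)^.*[^A-Za-z\n].*\n?", "", text)
    # Pass 2: remove the lines that are completely empty.
    return re.sub(r"(?m)^\s*\n", "", text)


sample = "apple\nϤ\n#comment\nbanana carrot\n\nzeta\n"
print(keep_alpha_lines(sample), end="")
```

Running both passes in one function keeps the order explicit: lines with stray characters go first, then the blank lines that are left.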

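# Keep only lines made up entirely of ASCII letters; blank lines are dropped too.
with open("table.txt", "r", encoding="utf-8") as source, \
        open("newTable.txt", "w", encoding="utf-8") as writeTo:
    for line in source:
        word = line.rstrip("\n")
        # isalpha() alone accepts Unicode letters such as Ϥ, so also require ASCII.
        # An empty string fails isalpha(), which skips blank lines.
        if word.isascii() and word.isalpha():
            writeTo.write(word + "\n")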
Clarification: This is Python code. It expects the input file to be named table.txt and writes its output to newTable.txt.

Related

Merging broken lines [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 14 days ago.
This post was edited and submitted for review 14 days ago.
In a text with many lines in Notepad++, some lines are unintentionally broken onto the next line without ending in a period. I want to use a regex to merge each line that is more than 10 characters long and does not end with a dot (.) with the line that follows it, putting a space between the merged lines.
For example, the following text:
tttttttttt
aaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc
dddddddddddddddddd.
Convert to:
tttttttttt
aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbb.
ccccccccccccccccc dddddddddddddddddd.
I also tried the following regex code but it didn't work:
[^\.]\n
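One thing to note about the attempt above is that [^\.] consumes the last character of the line, so replacing the match with a space would also delete that character. The sketch below, in Python rather than Notepad++, keeps the line in a capture group and enforces the length rule. The function name merge_broken_lines is made up for this example, and plain \n line endings are assumed.

```python
import re


def merge_broken_lines(text: str) -> str:
    """Join a line to the next one when it is longer than 10 characters
    and does not end with a dot (assumes \\n line endings)."""
    # .{10,} plus one final non-dot character means at least 11 characters.
    # The captured line is written back, followed by a single space.
    return re.sub(r"(?m)^(.{10,}[^.\n])\n", r"\1 ", text)


sample = (
    "tttttttttt\n"
    "aaaaaaaaaaaaaaaa\n"
    "bbbbbbbbbbbbbbbbbb.\n"
    "ccccccccccccccccc\n"
    "dddddddddddddddddd."
)
print(merge_broken_lines(sample))
```

The first line has exactly 10 characters, so it is left alone, which matches the expected output in the question.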

How to match regex pattern multiple times in Pyspark? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
The data below is email text stored in a single column.
The requirement is to extract only the text from Call Example through additional details.
Input:
Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text.
Output:
Both of the call examples need to be populated in the new column 'Calldetails1' as two separate rows using PySpark.
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
The regexp_extract call I used to extract from Call Example to additional details:
result = df.withColumn('result',regexp_extract('comments','(?s)(?=Call Example)(.?additional details:\s[\w+])',1))
It works for one match only. Please suggest options for matching every occurrence in Python.
As mentioned in the chat:
(?=Call Example)([\w\s:\*]+?[\S])$
(?=Call Example) is a lookahead that asserts the text at the current position starts with Call Example.
[\w\s:\*]+? lazily matches one or more word characters, whitespace characters, colons, or asterisks, and [\S]$ makes the match end on the last non-whitespace character before the end of the line.
For extracting multiple captured groups using PySpark, see:
https://stackoverflow.com/questions/58930893/extracting-several-regex-matches-in-pyspark
https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches
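To illustrate what matching "globally" means here, the sketch below uses Python's standard re module on a plain string and does not use Spark at all. The pattern is a simplified variant of the one discussed above, not the exact regex from the answer, and the names CALL_BLOCK and extract_call_details are invented for this example.

```python
import re
from typing import List

# Assumed shape of a block: it starts at "Call Example:" and ends with the
# word that follows "additional details:".
CALL_BLOCK = re.compile(r"Call Example:[\w\s:*]*?additional details:\s*\w+")


def extract_call_details(comment: str) -> List[str]:
    """Return every call-example block found in one comment string."""
    return CALL_BLOCK.findall(comment)


comment = """Summary:
Below are the details:
Call Example:
dialFromNumber:***** dialToNumber:***** date:*** time:*** additional details:xxxx
Please check out the call details.
Second Call Example:
dialFromNumber:*****
dialToNumber:*****
date:***
time:***
additional details:xxxx
Some random text."""

for block in extract_call_details(comment):
    print(block)
    print("---")
```

Because findall returns a list, one possible route in PySpark is to wrap such a function in a UDF that returns an array of strings and then explode the array into rows. That wiring is not shown here and would need to be checked against the Spark version in use.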

How to rename a bunch of files via terminal, keeping the filename prefix and suffix and removing the wildcard in the middle? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Given 400+ files such as:
Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt
Remi_Brun_-not_important_but_here_to_nag-ZIBcQ5tMB2U.en.vtt
Remi_Brun_-still_some_wildcard_noise_here-hOxG4g05z4w.en.vtt
...
Given that this regex matches these titles:
(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)
I want to rename my files to filenames such as:
Remi_Brun-ASpi777XisA.en.vtt
Remi_Brun-ZIBcQ5tMB2U.en.vtt
Remi_Brun-hOxG4g05z4w.en.vtt
...
How can I keep the speaker name prefix, remove the variable noise in the middle, and keep the final 11-character YouTube ID and the extension suffix?
If you want to remove everything between the first and last - before the youtube id, while allowing for any nonzero-sized language code, then this will work:
rename 's/-.*-([a-zA-Z0-9-_]{11}\..+\.vtt)/-\1/' Remi*
Or, for a more readable answer:
rename 's/(Remi_Brun)(_.+)([a-zA-Z0-9-_]{11}.en.vtt)/$1-$3/' Remi*
Edit: my earlier answer did not account for hyphens in the YouTube ID:
rename 's/-.*-/-/' Remi*
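For readers without the Perl rename utility, the same substitution can be sketched in Python. This is an illustration only: the helper names new_name and rename_all are invented here, the prefix Remi_Brun and the .en.vtt suffix are taken from the question, and the dots are escaped, which the regex in the question does not do.

```python
import re
from pathlib import Path
from typing import Optional

# Prefix, then the noise to drop, then the 11-character ID plus ".en.vtt".
PATTERN = re.compile(r"(Remi_Brun)_.+([A-Za-z0-9_-]{11}\.en\.vtt)")


def new_name(filename: str) -> Optional[str]:
    """Return the shortened filename, or None if the name does not match."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


def rename_all(directory: str) -> None:
    """Rename every matching file in the given directory (hypothetical helper)."""
    for path in Path(directory).glob("Remi*"):
        target = new_name(path.name)
        if target is not None:
            path.rename(path.with_name(target))


print(new_name("Remi_Brun_-blablabla_blalala-ASpi777XisA.en.vtt"))
```

Keeping the name computation in its own function makes it possible to print and inspect the planned names before calling rename_all on real files.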

How would I go about merging two text files into a new text file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have two text files. First one looks like this.
00000000000000000000000000000000
11100000000000000000000000000000
00010000000000000000000000000000
10100000000000000000000000000000
10100000000000000000000000000000
(the empty spaces in this file are a ' ' space character)
and the other one looks like this
11100000000000000000000000000000
00010000000000000000000000000000
10100000000000000000000000000000
10100000000000000000000000000000
00010000000000000000000000000000
I'd like to fill in or replace the empty lines in the first text file with the lines from the second text file.
The algorithm is pretty straightforward - it follows the general approach to two-way merging that you see in all algorithms:
Open both input files, and the output file, as streams
Read lines from the first file one-by-one
If the line that you read is non-empty, copy it into the output
Otherwise, read the next line from the second file, and copy it into the output
Once the first file is exhausted, copy the rest of the second file into the output.
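The steps above can be sketched in Python as follows. This is a minimal illustration under a few assumptions: a line counts as empty when it contains only whitespace (the question says the gaps hold a ' ' character), a blank line is kept as it is if the second input has already run out, and the name merge_lines is made up for this example.

```python
from typing import Iterable, Iterator


def merge_lines(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Fill the blank lines of `first` with successive lines of `second`."""
    second_iter = iter(second)
    for line in first:
        if line.strip():
            # Non-empty line: copy it to the output unchanged.
            yield line
        else:
            # Blank line: take the next line of the second input instead.
            # Assumption: keep the blank line if the second input is exhausted.
            yield next(second_iter, line)
    # The first input is exhausted: copy whatever is left of the second.
    yield from second_iter


first = ["0000", " ", "1110", " "]
second = ["aaaa", "bbbb", "cccc"]
print(list(merge_lines(first, second)))
```

Since the function accepts any iterables of strings, open file objects can be passed in directly and the result written out line by line, so neither file has to be loaded fully into memory.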

How to get filename structure in a folder in Matlab [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am designing a GUI in Matlab,
I have a folder called sth. It contains many files with the same naming structure, such as:
filename_1_something.mat
filename_2_something.mat
To loop over the filenames by index, I need to derive a format string like this:
filename_%d_something.mat
That way I don't need to read all the files in the directory. Two of the filenames are enough: compare the strings, find the part that differs, and replace it with %d.
Any other approach would also be appreciated.
Using the regex provided by #rock321987:
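names = dir('*.mat');              % list every .mat file in the current folder
num = length(names);
expression = '\w*_\d+_\w*\.mat';   % text, underscore, digits, underscore, text, .mat
nameList = cell(1, num);           % preallocate the result
for n = 1:num
    str = names(n).name;
    nameList{n} = regexp(str, expression, 'match');
end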
This works on:
test_1_something.mat
test_10_something.mat
Changing the regex to just \w*_\w*\.mat
works for:
test_1.mat
1_test.mat
test_1_something.mat
test_10_something.mat
but it also matches any name made of strings joined by an underscore and ending in .mat.
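The answer above lists matching filenames but does not build the filename_%d_something.mat format string the question asks for. One possible way to derive it is sketched below in Python rather than Matlab. It is simpler than comparing two filenames: it assumes the index is the first run of digits enclosed by underscores, and the name to_format_string is invented for this example.

```python
import re


def to_format_string(filename: str) -> str:
    """Replace the first underscore-delimited run of digits with %d."""
    # Lookbehind and lookahead keep both underscores in the result.
    return re.sub(r"(?<=_)\d+(?=_)", "%d", filename, count=1)


template = to_format_string("filename_1_something.mat")
print(template)      # the derived format string
print(template % 2)  # the filename for index 2
```

The derived template can then be filled with any index, which avoids listing the whole directory.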