Select n-th line with first line condition - regex

I have a subtitle file that was auto-generated for a YouTube video.
The sample below has 4 entries. Every entry has a number, a time span, a first text line and a second text line.
I would like to delete the first text line in every time span. I need this because, when new text arrives, I currently see both the old line and the new one. In other words, the old text moves up and the new text comes in from the bottom. I would like to see only the new one.
1
00:00:02,880 --> 00:00:06,550
[empty]<--to be removed
[Music]
2
00:00:06,550 --> 00:00:06,560
[Music]<--to be removed
[empty]
3
00:00:06,560 --> 00:00:09,290
[Music]<--to be removed
my name is Maria and I'm a technical
4
00:00:09,290 --> 00:00:09,300
my name is Maria and I'm a technical<--to be removed
[empty]
What have I tried? I am only able to select the number line, the time line and the first text line. Somehow a lookahead, (?=regexp), doesn't work with my query. Here is my query:
(^\d+$\n.+$\n)
^\d+$ - starts and ends with digit elements
\n.+$ - select new line, select all elements till the end of the line
\n - select one more line but don't select elements

You may use the following regex:
^(\d+\r?\n.*?-->.*)\r?\n.+
Replace with $1. See the regex demo.
Details
^ - start of a line
(\d+\r?\n.*?-->.*) - Capturing group 1:
\d+ - 1+ digits
\r?\n - a CRLF or LF line break
.*?-->.* - a line that has --> (this is to make matching safer, your .+ can do, too, if you are sure there are no subtitle text lines that are only made up of digits)
\r?\n - CRLF or LF
.+ - 1 or more chars other than line break chars.
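For readers who want to try the pattern outside a text editor, here is a minimal Python sketch. It assumes Python's re module, where the MULTILINE flag is needed for ^ to match at each line start and the replacement $1 is written \1. The sample string is entries 3 and 4 from the question, with [empty] kept as literal placeholder text.

```python
import re

# Entries 3 and 4 from the question; "[empty]" is kept as literal text.
srt = (
    "3\n"
    "00:00:06,560 --> 00:00:09,290\n"
    "[Music]\n"
    "my name is Maria and I'm a technical\n"
    "4\n"
    "00:00:09,290 --> 00:00:09,300\n"
    "my name is Maria and I'm a technical\n"
    "[empty]\n"
)

# Group 1 keeps the number line and the time line;
# the text line that follows them is matched but not put back.
pattern = re.compile(r"^(\d+\r?\n.*?-->.*)\r?\n.+", re.MULTILINE)
cleaned = pattern.sub(r"\1", srt)
print(cleaned)
```

Only the first text line of each entry is removed; the second text line stays in place.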

Related

Get tag name of the first question by using regex

I have a problem with the following regex pattern:
m).*?^([^n]*)(modified)([^n]*)$.*
I want to replace the clipboard with
Clipboard := RegExReplace(Clipboard, "m).*?^([^n]*)(modified)([^n]*)$.*" ,"" )
Source looks like:
Ask Question Interesting 326 Featured
Hot Week Month 1 vote 0 answers 12 views
Type Guard for empty object
typescript modified 2 mins ago kremerd 312
0 votes
Expected result should be:
typescript modified 2 mins ago kremerd 312
But it replaces nothing. If this works, I want to extract the tag names, ^([^n]*), later by using RegExMatch.
I am scripting with AutoHotkey (an open-source tool for Windows) from https://autohotkey.com
You want to match a line that contains a modified substring. The dot in a regex does not match a newline by default, so you need to pass the s (DOTALL) modifier (you may combine it with the m (MULTILINE) modifier, which makes ^ match at the start of each line and $ match at the end of each line). Besides, to match non-newline characters you need [^\n] (not [^n], which matches any character other than the letter n).
To solve the issue you may use
RegExMatch(Clipboard, "s)^.*?(\n[^\n]*)(modified|asked|answered)", res)
res holds the whole match (everything from the start of the string through the keyword), res1 holds the text before the keyword on that line (with its leading newline), and res2 holds the keyword itself.
Details
s) - the . now matches any char including line break chars
^ - start of the string
.*? - any 0+ chars, as few as possible
(\n[^\n]*) - Group 1 (accessed via res1 later): a newline followed with 0+ chars other than newline chars
(modified|asked|answered) - any of the three alternatives: modified, asked or answered.
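The same idea can be checked in Python. This is only a sketch of the regex logic, not AutoHotkey code: re.DOTALL stands in for the s) option, and the variable names (clipboard, tag, keyword) are made up for the illustration.

```python
import re

# The sample source text from the question.
clipboard = (
    "Ask Question Interesting 326 Featured\n"
    "Hot Week Month 1 vote 0 answers 12 views\n"
    "Type Guard for empty object\n"
    "typescript modified 2 mins ago kremerd 312\n"
    "0 votes"
)

# re.DOTALL plays the role of s): "." also matches newlines,
# so the lazy .*? can skip over the leading lines.
match = re.search(r"^.*?(\n[^\n]*)(modified|asked|answered)", clipboard, re.DOTALL)

tag = match.group(1).strip()  # text before the keyword on that line
keyword = match.group(2)      # the keyword that was found
print(tag, keyword)
```

Group 1 starts with the newline that precedes the matching line, which is why the sketch strips it before using the tag name.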

Use RegEx to match data in cell in order to pull out to new rows in Excel

I have a spreadsheet containing numerous cells of data, but each cell contains numerous lines without carriage return or line feed. I want to create new rows by matching each occurrence of a ten digit number and grabbing the number and all text up until the next occurrence.
For example, this is one cell's text.
8770304350 PRINTER 4610-2CR W/IRON GRAY COVERS (2921) $750.75 2881057001 PAYMENT DEVICE - VERIFONE MX915 - WALMART CONSIGNE 8770242020 DISPLAY 4820-5GB USB W/ I/O SUPPORT IRON GRAY $907.27 8770242216 KEYPAD-MSR 3 TRACK IRON GREY $213.85 2881037020 CONSIGNED- SCANNER DS6878-SR20117WR IMAGER 2D BLUE
I want to split it into new rows each time there is a ten digit number so it would end up looking like this where each line is a new row.
8770304350 PRINTER 4610-2CR W/IRON GRAY COVERS (2921) $750.75
2881057001 PAYMENT DEVICE - VERIFONE MX915 - WALMART CONSIGNE
8770242020 DISPLAY 4820-5GB USB W/ I/O SUPPORT IRON GRAY $907.27
8770242216 KEYPAD-MSR 3 TRACK IRON GREY $213.85
2881037020 CONSIGNED- SCANNER DS6878-SR20117WR IMAGER 2D BLUE
I tried using RegEx on my own, but I was either matching just the number or the entire string, and I find it very complicated.
For example, I tried lookarounds, but ended up selecting all the text except the first number and the last entry.
(?<=[0-9]{10}).*(?=[0-9]{10})
You may use
\b\d{10}.*?(?=\s*\b\d{10}|$)
See the regex demo. If there can be line breaks, replace .*? with [\s\S]*?.
Details
\b - leading word boundary
\d{10} - 10 digits
.*? - any 0+ chars other than line break chars as few as possible
(?=\s*\b\d{10}|$) - a positive lookahead that, immediately to the right of the current location, requires
\s*\b\d{10} - 0+ whitespaces, word boundary and 10 digits
| - or
$ - end of string.
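As a quick check of the pattern, the following Python sketch rebuilds the cell text from the five expected rows in the question and splits it again with re.findall. The names items, cell and rows are illustrative only; how the rows are then written back to Excel is outside this sketch.

```python
import re

# The five rows the question expects, joined into one cell-like string.
items = [
    "8770304350 PRINTER 4610-2CR W/IRON GRAY COVERS (2921) $750.75",
    "2881057001 PAYMENT DEVICE - VERIFONE MX915 - WALMART CONSIGNE",
    "8770242020 DISPLAY 4820-5GB USB W/ I/O SUPPORT IRON GRAY $907.27",
    "8770242216 KEYPAD-MSR 3 TRACK IRON GREY $213.85",
    "2881037020 CONSIGNED- SCANNER DS6878-SR20117WR IMAGER 2D BLUE",
]
cell = " ".join(items)

# Each match starts at a ten-digit number and lazily extends until the
# lookahead sees the next ten-digit number or the end of the string.
rows = re.findall(r"\b\d{10}.*?(?=\s*\b\d{10}|$)", cell)

for row in rows:
    print(row)
```

Because the whitespace before the next number is consumed only inside the lookahead, the rows come back without trailing spaces.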

Use recognised data for replacing

I have a column of dates in Notepad++:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18
2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25
2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02
2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09
2017-07-10
I need to cut it into weeks by placing \r\n after each week, like this:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18

2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25

2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02

2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09

2017-07-10
I do the replacement with a regular expression. This matches 7 days:
\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n
And now I would like to append \r\n to the match.
But how do I reuse the matched text in the replacement, followed by \r\n?
If you are sure that the first date is a Monday, you could do this:
Ctrl+H
Find what: (?:\d{4}-\d\d-\d\d\R){7}
Replace with: $0\r\n
Replace all
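The same replacement can be sketched in Python for experimentation. Two assumptions apply: Python's re module has no \R, so \r?\n stands in for it, and the whole match ($0 in Notepad++) is read via m.group(0). The input here is 15 generated consecutive dates without duplicates, not the exact data from the question.

```python
import re
from datetime import date, timedelta

# 15 consecutive dates starting on Monday 2017-06-12, one per line.
start = date(2017, 6, 12)
text = "".join((start + timedelta(days=i)).isoformat() + "\n" for i in range(15))

# Seven date lines in a row form one week; \r?\n replaces \R here.
week = re.compile(r"(?:\d{4}-\d\d-\d\d\r?\n){7}")

# Put the matched week back and append one extra line break after it.
result = week.sub(lambda m: m.group(0) + "\n", text)
blocks = result.split("\n\n")
print(result)
```

A trailing group of fewer than seven lines is left untouched, because the quantifier {7} needs a full week to match.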
In your example input some lines are doubled, e.g. 2017-06-20. In your example output this line is also doubled, and that week block consists of eight lines: seven unique lines plus one duplicate of 2017-06-20. I assume that all lines in the input are sorted, so duplicate lines are adjacent. Additionally, I assume that the first line is the first day of a week.
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (((.*\R)\3*){7})
Replace With: \1\r\n
Check regular expression, do not check . matches newline
Click Replace or Replace All
Explanation
Let's explain (((.*\R)\3*){7}) from the inside out, starting at the third, innermost group. In the following, x and y stand for regex parts, not literal characters.
(.*\R) the third group is just one line from start to end, including its line break
(y\3*) a y followed by any number of repetitions of what the third group captured; since y is the third group itself, this matches a line followed by zero or more copies of that same line, which handles the 2017-06-20 case
(x{7}) seven repetitions of x, which here means seven unique rows, each of which may be repeated within the block, so 8 lines with one line doubled is fine
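To see the duplicate handling in action, here is a hedged Python sketch of the same idea. It is an adaptation, not the Notepad++ expression itself: Python's re lacks \R, so \n is used, and the group numbering shifts because the middle group is made non-capturing. The input reproduces the question's dates, including the doubled 2017-06-20.

```python
import re
from datetime import date, timedelta

# Rebuild the question's input: 2017-06-12 .. 2017-07-10 with 2017-06-20 doubled.
start = date(2017, 6, 12)
lines = []
for i in range(29):
    day = (start + timedelta(days=i)).isoformat()
    lines.append(day)
    if day == "2017-06-20":
        lines.append(day)
text = "".join(line + "\n" for line in lines)

# Group 2 is one line; \2* swallows adjacent copies of that line;
# seven such units make up one week (group 1).
week = re.compile(r"((?:(.*\n)\2*){7})")
result = week.sub(lambda m: m.group(1) + "\n", text)

# Count the lines in each block separated by an empty line.
sizes = [len(block.splitlines()) for block in result.split("\n\n")]
print(sizes)
```

The second block comes out one line longer than the others because the duplicate is absorbed into the same week instead of counting as an eighth day.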

Vim - how to join lines using matching pattern

I have a txt file that contains contact info for businesses. Currently, each line contains a different piece of data for the business. I'm attempting to construct a pipe-delimited file with all the info for each business on a single line. The catch is that there are a different number of lines for each business. So the file looks like this:
Awesome Company Inc|
Joe Smith, Owner|
Jack Smith, Manager|
Phone: (555)456-2349|
Fax: (555)456-9304|
Website: www.awesomecompanyinc.com [HYPERLINK: http://www.awesomecompanyinc.com]|
  * Really Cool Company|
  * Line of business: Awesomesauce|
Killer Products LLC|
Jack Black, Prop|
Phone: (555)234-4321|
Fax: (555)912-1234|
1234 Killer Street, 1st Floor|
Houston, TX 77081|
  * Apparel for the classy assassin|
  * Fearful Sunglasses|
  * Member of the National Guild of Killers since 2001|
  * Line of business: Fuhgettaboutit|
etc.
So I can use :g/<pattern>/j to join lines within a pattern, but I'm having trouble working out what the pattern should be. In the example above, lines 1-9 need to be joined, and then lines 10-19.
The key is the lines that begin with 2 spaces and an asterisk:
  * Line of business
The pattern should basically say: "Starting with the first line beginning with a letter, join all lines until the first line after the last line beginning with \ \ \*\, then repeat until the end of the file."
How would I write this? Should I maybe do it in two steps - i.e., is there a way to first join all the lines starting with letters, then all the lines starting with \ \ \*\, and then join each resulting pair?
"Starting with the first line beginning with a letter, join all lines until the first line after the last line beginning with \ \ \*\, then repeat until the end of the file."
You can actually translate that almost literally to Vimscript:
Starting with the first line beginning with a letter is /^\a/
until the first line after the last line beginning with * is /^  \* .*\n\a/: find a line starting with the bullet (^  \*, i.e. two spaces and an asterisk), match the rest of the line (.*), and assert that the next line isn't a bulleted one (\n\a)
then repeat until the end of the file. is done via :global
Taken together:
:global/^\a/,/^  \* .*\n\a/join
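For comparison, the grouping rule can be sketched outside Vim. The Python function below is a hypothetical helper, not part of the answer: it assumes bulleted lines start with two spaces and an asterisk, as the question states, and it joins with a single space after stripping leading whitespace, which only approximates what :join does. The sample is a shortened version of the question's data.

```python
def join_records(lines):
    """Group lines into records; a record ends after its last bulleted line."""
    records, current, prev_was_bullet = [], [], False
    for line in lines:
        is_bullet = line.startswith("  * ")
        # A non-bullet line right after a bullet line starts the next business.
        if prev_was_bullet and not is_bullet:
            records.append(" ".join(current))
            current = []
        current.append(line.strip())
        prev_was_bullet = is_bullet
    if current:
        records.append(" ".join(current))
    return records


# Shortened sample based on the question's data.
sample = [
    "Awesome Company Inc|",
    "Joe Smith, Owner|",
    "Phone: (555)456-2349|",
    "  * Really Cool Company|",
    "  * Line of business: Awesomesauce|",
    "Killer Products LLC|",
    "Jack Black, Prop|",
    "  * Fearful Sunglasses|",
    "  * Line of business: Fuhgettaboutit|",
]
records = join_records(sample)
for record in records:
    print(record)
```

Unlike the :global range, this sketch also closes the final record when no further business follows it.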
Edit: never mind, I just realized that a bunch of settings would have to be set for my solution to work. To make it work generally you would need
" each pass joins every line that lacks "business" with the line after it
for i in range(10)
  try
    v/business/join
  endtry
endfor
And even that assumes that no business block has more than 1024 lines (ten passes, each roughly doubling the joined span, cover about 2^10 = 1024 lines). At that point you might as well use ranges.

c# split text file by changing the line number

I'm trying to split a text file by line numbers.
For example, if I have a text file like:
1 ljhgk uygk uygghl \r\n
1 ljhg kjhg kjhg kjh gkj \r\n
1 kjhl kjhl kjhlkjhkjhlkjhlkjhl \r\n
2 ljkih lkjhl kjhlkjhlkjhlkjhl \r\n
2 lkjh lkjh lkjhljkhl \r\n
3 asdfghjkl \r\n
3 qweryuiop \r\n
I want to split it into 3 parts (1, 2, 3).
How can I do this? The text is very large (~20,000,000 characters) and I need an efficient way (like regex).
Another idea: you can use LINQ to get the groups you're after by grouping the lines on their first word. Note that this takes whatever the first word is, so make sure you only have numbers there. This uses the split/join antipattern, but it seems to work nicely here.
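// Split on CR and LF characters and drop the empty entries between them.
var lines = from line in s.Split("\r\n".ToCharArray(),
                                 StringSplitOptions.RemoveEmptyEntries)
            // The first space-separated token is the line number.
            let lineNumber = line.Split(" ".ToCharArray(), 2).FirstOrDefault()
            group line by lineNumber
            into g
            select String.Join("\n", g);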
Notes:
GroupBy is guaranteed to return the groups in the order in which their keys first appeared, and the lines within each group in their original order.
If a block appears more than once (e.g. "1 1 2 2 3 3 1"), all blocks with the same number will be merged.
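The grouping behaviour described in these notes can be illustrated with a small Python sketch. It is an analogue of the LINQ query, not a translation of it: a plain dict (which preserves insertion order) plays the role of GroupBy, and the names text, groups and parts are illustrative.

```python
# Sample lines from the question, using \r\n line endings.
text = (
    "1 ljhgk uygk uygghl\r\n"
    "1 ljhg kjhg kjhg kjh gkj\r\n"
    "1 kjhl kjhl kjhlkjhkjhlkjhlkjhl\r\n"
    "2 ljkih lkjhl kjhlkjhlkjhlkjhl\r\n"
    "2 lkjh lkjh lkjhljkhl\r\n"
    "3 asdfghjkl\r\n"
    "3 qweryuiop\r\n"
)

groups = {}
for line in text.splitlines():
    if not line:
        continue  # skip empty lines, like RemoveEmptyEntries
    key = line.split(" ", 1)[0]  # first word is the line number
    groups.setdefault(key, []).append(line)

# One string per line number, in order of first appearance.
parts = ["\n".join(lines) for lines in groups.values()]
for part in parts:
    print(part)
    print("---")
```

As with GroupBy, a number that reappears later in the file is merged into its existing group rather than starting a new part.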
You can use a regex, but Split will not work too well. You can Match for the following pattern:
^(\d).*$ # Match first line, capture number
([\r\n]+^\1.*$)* # Match additional lines that begin with the same number
I did try to split by $(?<=^(\d+).*)[\r\n]+^(?!\1), but it adds the line numbers as additional elements in the array.
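The match-based approach can be tried with the following Python sketch. It adjusts the pattern slightly, which is my assumption rather than part of the answer: (\d+)\b replaces (\d) so that multi-digit numbers are handled, and the sample uses plain \n line endings. Unlike the grouping approach, this keeps only consecutive lines together.

```python
import re

# Sample lines from the question, with \n line endings.
text = (
    "1 ljhgk uygk uygghl\n"
    "1 ljhg kjhg kjhg kjh gkj\n"
    "1 kjhl kjhl kjhlkjhkjhlkjhlkjhl\n"
    "2 ljkih lkjhl kjhlkjhlkjhlkjhl\n"
    "2 lkjh lkjh lkjhljkhl\n"
    "3 asdfghjkl\n"
    "3 qweryuiop\n"
)

# First line: capture the leading number. Then take every following
# line that starts with the same number (\1 followed by a word boundary).
block = re.compile(r"^(\d+)\b.*(?:\n\1\b.*)*", re.MULTILINE)
blocks = [m.group(0) for m in block.finditer(text)]

for b in blocks:
    print(b)
    print("---")
```

Using finditer avoids the issue mentioned above with splitting, since captured groups are not injected into the result list.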