get text in between multiple flags using regex

I have the following text:
The first Heading This is heading 1 data
The second header : this is heading 2 data
third header this is heading 3 data
I am trying to write a single regex for this. I know that the following regex extracts the data between heading 1 and heading 2:
The first Heading(.*?)The second header
The above gives the text "This is heading 1 data".
What I want instead is a single regex that covers all the headings and returns a list like this:
["This is heading 1 data","This is heading 2 data","This is heading 3 data"]
What I had in mind was the following:
The first Heading(.*?)The second header(.*?)third header (.*?)
But I am not getting any data from this regex. Can anyone help me with a solution?

This should do it:
import re
a = '''Heading 1 This is heading 1 data
Heading 2 This is heading 2 data
Heading 3 This is heading 3 data'''
# The lookbehind starts each match right after "Heading <digit> "; (.*) then captures the rest of that line.
print(re.findall(r'(?<=Heading \d\s)(.*)(?:Heading \d|$)?', a))
#['This is heading 1 data', 'This is heading 2 data', 'This is heading 3 data']
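For the headings exactly as written in the question, one likely reason the three-group attempt returns nothing is that . does not match a newline by default, so the pattern cannot cross from one line to the next. In addition, a trailing lazy (.*?) is satisfied by an empty string. The following is a minimal sketch, assuming the text spans three lines as shown in the question and that the re.DOTALL flag is acceptable. The characters passed to strip() are an assumption about what counts as separator noise.

```python
import re

# Sample text copied from the question, assumed to be split over three lines.
text = (
    "The first Heading This is heading 1 data\n"
    "The second header : this is heading 2 data\n"
    "third header this is heading 3 data"
)

# re.DOTALL lets "." match newlines; the last group is greedy so it
# captures everything after the final heading instead of an empty string.
pattern = re.compile(
    r"The first Heading(.*?)The second header(.*?)third header(.*)",
    re.DOTALL,
)

match = pattern.search(text)
# Trim spaces, colons and newlines left around each captured section.
sections = [group.strip(" :\n") for group in match.groups()] if match else []
print(sections)
```

The capitalisation of each section follows the input, so the second and third entries start with a lowercase "this".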

Related

Regex to get text between 2 large spaces

I want to use a regex on this text to get only "Second Baptist School" as the output, using Customer: as the fixed starting point. How can I make it recognize that starting point and capture all of the text between the large runs of blanks?
Customer: Second Baptist School Date of Sale: 9/26/2022
Right now I'm using Customer:\s*([^ -.]+) but it only gets "Second" as the output.

You can look for 2 or more whitespace characters with:
Customer:\s*(.*?)\s{2,}
This should work with your example above. The {2,} quantifier means "2 or more".
https://regex101.com/r/1HapOO/1
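A minimal sketch of the same pattern applied in Python. The amount of whitespace in the sample line is an assumption, since the original spacing is not preserved in the question as shown here.

```python
import re

# Sample line; the widths of the blank runs are assumed for illustration.
line = "Customer:      Second Baptist School          Date of Sale: 9/26/2022"

# \s* skips the blanks after "Customer:", (.*?) lazily captures the name,
# and \s{2,} stops the capture at the first run of two or more blanks.
match = re.search(r"Customer:\s*(.*?)\s{2,}", line)
customer = match.group(1) if match else None
print(customer)
```

Single spaces inside the name do not end the capture, because \s{2,} needs at least two whitespace characters in a row.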

Using awk, how do I match pattern and variants?

I've been struggling with this for a while in regex testers, but the pattern that came up as correct there actually failed. I've got a large tab-delimited file with numerous types of data. I want to print a specific column where it contains the characters XYZ, together with its subsequent values.
In the specific column I'm interested in I have values like:
XYZ
ABCDE
XYZ/WORDS
XYZ/ABCDE
ABFE
XYZ
The regex that was successful in the tester was something like:
XYZ(.....)*
It obviously fails when implemented as:
awk '{if ($1=="XYZ(......)*") print$0}'
What regex character do I use to denote that I want everything after the slash (/), including the original pattern (XYZ)?
I want to be able to capture all instances of XYZ and print the other columns that go along with them (hence the print$0). Specifically, I want to capture these values:
XYZ
XYZ/WORDS
XYZ/ABCDE
Thank you

Setup (assuming the actual data file does not include blank lines):
$ cat x
XYZ 1 2 3 4
ABCDE 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
ABFE 1 2 3 4
XYZ 1 2 3 4
If you merely want to print all rows where the first field starts with XYZ:
$ awk '$1 ~ /^XYZ/' x
XYZ 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
XYZ 1 2 3 4
If this doesn't provide the expected results then please update the question with more details (to include a more representative set of input data and the expected output).
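The attempt in the question fails because == compares the field with the literal string "XYZ(......)*" instead of treating it as a pattern; the ~ operator is what performs a regex match in awk. For comparison, here is a minimal sketch of the same filter in Python, assuming tab-delimited rows held in a list. The sample values are copied from the setup above, and first_field is a hypothetical helper name.

```python
import re

# Sample rows mirroring the setup above, assumed to be tab-delimited.
rows = [
    "XYZ\t1\t2\t3\t4",
    "ABCDE\t1\t2\t3\t4",
    "XYZ/WORDS\t1\t2\t3\t4",
    "XYZ/ABCDE\t1\t2\t3\t4",
    "ABFE\t1\t2\t3\t4",
    "XYZ\t1\t2\t3\t4",
]

# Same idea as awk's $1 ~ /^XYZ/: the first field must start with XYZ.
starts_with_xyz = re.compile(r"^XYZ")


def first_field(row):
    """Return the first tab-separated field of a row."""
    return row.split("\t", 1)[0]


matching_rows = [row for row in rows if starts_with_xyz.search(first_field(row))]
for row in matching_rows:
    print(row)
```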

Preserve line breaks when Removing Bookmarked Lines

I have a large text file with several lines and I want to replace several of those lines with a blank line. I used a regex to search for certain patterns, marked and bookmarked the matches, then used Search > Bookmark > Inverse Bookmark to hopefully highlight the lines I want to replace with blanks.
However, I find only Remove Bookmarked Lines and Remove Unmarked Lines, both of which strip line breaks in the text file.
Is there a way to preserve line breaks while replacing those inverse-bookmarked lines with a blank line?
Sample text (lines 1 and 6 are bookmarked for replacing with an empty/blank line):
1 Oroc-Osoc PS
2 Osiao Paglingap Elementary School
3 Osmena E/S
4 Osmena Elementary School
5 Osmena ES
6 Pablo .M. Conag CS
Expected output:
1
2 Osiao Paglingap Elementary School
3 Osmena E/S
4 Osmena Elementary School
5 Osmena ES
6

You can use either of these alternatives:
Alternative A)
Copy a single space to the clipboard (with Ctrl+C, for example).
Do: Search => Bookmark => Replace bookmarked lines
If you don't want to leave a space at the beginning of those lines, use Alternative B)
Copy a string that does not occur anywhere else in the file to the clipboard, <<<EOL>>> for example.
Do: Search => Bookmark => Replace bookmarked lines
Replace <<<EOL>>> with \r\n; be sure to select the Extended search mode.
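If the file can be processed outside the editor, the same result can be sketched in Python: blank out the matching lines while leaving every line break in place. This is only an illustration under stated assumptions. The pattern below (lines ending in PS or CS) is hypothetical, chosen so that it selects lines 1 and 6 of the sample, and the leading line numbers from the sample are left out.

```python
import re

# Sample lines from the question, without the leading line numbers.
text = "\n".join([
    "Oroc-Osoc PS",
    "Osiao Paglingap Elementary School",
    "Osmena E/S",
    "Osmena Elementary School",
    "Osmena ES",
    "Pablo .M. Conag CS",
])

# Hypothetical pattern: whole lines that end in the word PS or CS.
# With re.MULTILINE, ^ and $ match at each line; "." never matches "\n",
# so the replacement empties the line but keeps its line break.
blanked = re.sub(r"^.*\b(?:PS|CS)$", "", text, flags=re.MULTILINE)
print(blanked)
```

The line count is unchanged afterwards: the first and last lines are empty and the four lines in between are untouched.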

Count the occurrences of word by pattern in R

Perhaps an oft-asked question, but I am royally stuck here.
From an XML file, I'm trying to find every 12-character string containing only letters and digits (strictly alphanumeric), along with the lines where each one occurs and its total count.
For example, if my file is xmlInput, I want to extract all occurrences, positions, and total counts of each 12-character alphanumeric string.
Example output:
String        Total Count  Line-Num
CPXY180D2324  2            132,846
CPXY180D2131  1            372
CPCY180D2139  1            133
I know that I could use regmatches to get all occurrences of a string by pattern. I've been using the code below for that (thanks for your help on this):
ProNum12<-regmatches(xmlInput, regexpr("([A-Z0-9]{12})", xmlInput))
ProNum12
regmatches gives me all the matches that follow the pattern, but it doesn't give me the line numbers where the pattern appeared. grep gives me the line numbers of all occurrences.
I thought I could use textcnt from the tau package but couldn't get it to run correctly. Perhaps it is not the right package?
Is there a package/library in R that will search for all words matching the pattern and return the total count and the line numbers of each occurrence? If no such package exists, any idea how I can do this using any of the above, or something better?

Without seeing your data, it is hard to offer a suggestion on how to proceed. Here is an example with some plain character strings that might help you get started on finding a solution of your own.
First, some sample data (which probably looks nothing like your data):
x <- c("Some text with a strange CPXY180D2324 string stuck in it.",
"Some more text with CPXY180D2131 strange strings CPCY180D2139 stuck in it.",
"Even more text with strings that CPXY180D2131 don't make much sense.",
"I'm CPXY180D2324 tired CPXY180D2324 of CPXY180D2324 text with CPXY180D2131 strange strings CPCY180D2139 stuck in it.")
We can split it by spaces. This is another area where it might not fit your actual problem, but again, this is just to help you get started (or to help others provide a much better answer, as may be the case).
x2 <- strsplit(x, " ")
Search the split data for values matching your regex pattern. Create a data.frame that includes the line numbers and the matched string.
temp <- do.call(rbind, lapply(seq_along(x2), function(y) {
  data.frame(line = y,
             value = grep("([A-Z0-9]{12})", x2[[y]],
                          value = TRUE))
}))
temp
#   line        value
# 1    1 CPXY180D2324
# 2    2 CPXY180D2131
# 3    2 CPCY180D2139
# 4    3 CPXY180D2131
# 5    4 CPXY180D2324
# 6    4 CPXY180D2324
# 7    4 CPXY180D2324
# 8    4 CPXY180D2131
# 9    4 CPCY180D2139
Create your data.frame of line numbers and counts.
with(temp, data.frame(
  lines = tapply(line, value, paste, collapse = ", "),
  count = tapply(line, value, length)))
#                   lines count
# CPXY180D2324 1, 4, 4, 4     4
# CPCY180D2139       2, 4     2
# CPXY180D2131    2, 3, 4     3
Anyway, this is purely a guess (and me killing time....)
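For readers who want to compare approaches, the same counting idea can be sketched in Python. This is an illustration under assumptions, not a translation of the R code: it reuses the sample strings above, treats each list element as one line, and uses \b word boundaries so that only standalone 12-character tokens match.

```python
import re
from collections import defaultdict

# Sample data copied from the R example above; one element per "line".
lines = [
    "Some text with a strange CPXY180D2324 string stuck in it.",
    "Some more text with CPXY180D2131 strange strings CPCY180D2139 stuck in it.",
    "Even more text with strings that CPXY180D2131 don't make much sense.",
    "I'm CPXY180D2324 tired CPXY180D2324 of CPXY180D2324 text with "
    "CPXY180D2131 strange strings CPCY180D2139 stuck in it.",
]

# Exactly 12 uppercase letters or digits, delimited by word boundaries.
pattern = re.compile(r"\b[A-Z0-9]{12}\b")

# Map each matched string to the line numbers (1-based) where it occurs.
positions = defaultdict(list)
for line_number, line in enumerate(lines, start=1):
    for token in pattern.findall(line):
        positions[token].append(line_number)

# Summarise as {string: (total count, [line numbers])}.
summary = {token: (len(nums), nums) for token, nums in positions.items()}
for token, (count, nums) in summary.items():
    print(token, count, ",".join(map(str, nums)))
```

A line number is repeated when a string occurs more than once on the same line, which matches the "1, 4, 4, 4" style of the R output.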

redmine wiki - heading with auto numbering

In the Redmine wiki, is there any way to get automatic numbering in headings, something like:
# h1. Heading 1
## h2. Sub Heading 1
# h1. Heading 2
With an output like below.
1. Heading 1
1.1 Sub Heading 1
2. Heading 2

You could extend your Redmine with the macro linked here:
http://projects.andriylesyuk.com/boards/22/topics/153-hierarcial-numbered-headers
This implements exactly what you need.
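To make the requested numbering scheme concrete, here is a small Python sketch of the logic only. It is not the linked macro's implementation and does not use any Redmine API. The function name number_headings and the (level, title) input format are hypothetical.

```python
def number_headings(headings):
    """Prefix (level, title) pairs with hierarchical numbers.

    level 1 corresponds to h1, level 2 to h2, and so on.
    """
    counters = []
    numbered = []
    for level, title in headings:
        # Forget counters of deeper levels, then pad up to this level.
        del counters[level:]
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        label = ".".join(str(c) for c in counters)
        # Top-level numbers get a trailing dot, as in the desired output.
        if level == 1:
            label += "."
        numbered.append(f"{label} {title}")
    return numbered


result = number_headings([
    (1, "Heading 1"),
    (2, "Sub Heading 1"),
    (1, "Heading 2"),
])
for line in result:
    print(line)
```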
