Regex for remove citation - regex

I searched regex for removing citation from text ( they sound strange from voice reading software ).
I want to remove from text all citation in form
(Author, 2000), (Author, in press)
and
(Author something, something 2004, Author2 2005)
But in same time not remove normal text in braces, and for ex. (Figure 3) (which might be helpful for reader).
Example of text with citations: http://journal.frontiersin.org/article/10.3389/fnhum.2014.00114/full

I've better:
\([^\)]*,[^\)]*\)
See LiveDemo

I use the one I found here:
r"([A-Z][\w\-]+ )?\((\D*\d{4}(: ?[\d\-]*)*(, \d{4}(: ?[\d\-]*)*)*;?)*\)"

Best i found is
[\(].?[^\)]*?[\d\d\d\d]{1}.*?[\)]{1}
It might not be optimal, because it selects (1) which in some cases might not be what reader want, still it is close to optimum.

Related

Regex: How to extract dialogue tags from fiction, with speaker information

Totally stumped on this. I need help extracting dialogue from a story so I can hand it off for narration.
Basically, this is a problem where I have a big chunk of text (a novel), and I want to extract all the dialogue from the text in a format I can pipe into a spreadsheet.
But, I also want, if it exists, the speaker information as well. So, given a string like:
'"I'm really hungry," she said.'
I would like the values returned as:
[ "I'm really hungry", "she said" ]
If there is no dialogue, as in this example:
"I'm not hungry."
the result would just be:
["I'm really hungry."]
Is this madness? Is it even possible? I have fooled around with this regex (am not a regex guru, knowing only enough to be dangerous):
"([^"]*)"
Which seems to get the dialogue tags, but doesn't get the speaker info. Any advice in how to get the speaker info as well would be greatly appreciated. I've been wrestling with this for awhile now.
Maybe a better approach would be to get the dialogue in one field, and the entire paragraph it is found in as the second field. That could also work, but I have no idea where to start with this.
Basically I want to put these all into a spreadsheet so I can hand them off to a narrator with enough context that they know whose dialogue is who's in the story.
Any help is greatly appreciated!
It definitely is possible
Look at this regex: ^.*?'?(?P<line>\".*\")(?P<actor>[^'\n]*)'?.*?$
demo here: https://regex101.com/r/UCRZwY/5
It basically marks the outer quotes as optional, but if it does find them, stores whatever provided as '$actor' (and the line as '$line') these are of course just names i've given them, feel free to change
Note updated to include such text as part of regular sentence, see example in demo

how to match a sentence having particular word in different patterns

We have a problem here...
We have a text having different patterns of sentences.
We want to get the sentence having a particular word.
Eg:
One further point, by way of providing another model. The analysis in
the second paragraph could lead in the following direction. 'The
Destructors' deals with, obviously, destruction, whilst the book of
Genesis deals with creation. The vocabulary is similar: Blackie
notices that 'chaos had advanced', an ironic reversal of God's
imposing of form on a void. Furthermore, the phrase 'streaks of light
came in through the closed shutters where they worked with the
seriousness of creators', used in the context of destruction, also
parodies the creation of light and darkness in the early passages of
the Biblical book. Greene's ironic use of the vocabulary of the Bible
might be making the point that, for him, the Second World War
signalled the end of a particular Christian era. Now, it is perfectly
arguable that the rise of fascism is linked to this, or that it is the
cause. The cult of personality and secular leadership has, for Greene,
taken over from the key role of the church in Western societies. In
this way the two main themes identified above - the tension between
individual and community, and religion - are linked. In terms of essay
writing this link could well be made after the discussion of the theme
of the individual and the community, and its links with the theme of
leadership. This might be the general conclusion to the essay. After
thoughtful consideration and interpretation a student may well decide
that this is what the (destructors.)' boils down to: Greene is making a
clear link between the rise of fascism and the decline of the Church's
influence. Despite the fact that fascism has been recently defeated,
Greene sees the lack of any contemporary values which could provide
social cohesion as providing the potential for its reappearance.
In the above text, we have bold words (destructors). We want to get the sentences which are having the word "destructors".
The word "destructors" can be present in different formats. Eg: (destructors), (DesTrucTors), (Des.tructors), DESTRUCTORS, destructors, des-tructors.
When we tried writing a regex to match the sentences, we are failing to get the sentences at some conditions(like we are getting half sentences, etc.,).
Could you please help us with this.
If this information doesn't help you to solve, please let us know. Will update it.
Thank you...
I'm not too sure about Python, but I believe this might work:
for match in re.finditer(r"[^.]*destructors[^.]*\.[^\w\s]*", subject, re.IGNORECASE):
# match start: match.start()
# match end (exclusive): match.end()
# matched text: match.group()
In any case, I think the regex you want is:
[^.]*destructors[^.]*\.[^\w\s]*
with the case insensitive and global flags set.
It will be helpful if you could provide the regex pattern which you have tried with so far. The best I can come up with is,
str_text='your text here containing DESTRUCTORS'
match=re.search('pass all the destructors combination here', str_text, flags=re.IGNORECASE)
Try for more patterns available for string formatting with regex here,https://docs.python.org/3/library/re.html

notepad++ regex custom length word wrap macros

The time of day that you have missed. Recently I began to study the expressive expressions, but the task before me is too complicated.
I think many people need the ability to quickly format texts of almost any type. Problem is not easy and I hope to get a solution to this problem from professionals.
If you use a limited breakdown for each line, then only large lines are broken which is much better. That is, to break only those lines that are larger than a certain size and which are more or less evenly broken Correct formatting of the text is quite Difficult to take into account a lot, I will go out for a long time. As you can see, there are small lines and they have to be sent back by connecting the previous line, but the problem is. And again have to apply formatting. But it is unclear whether this new formatting will create new problems.
All this is write in the macro via notepad ++ one after another and use.
However, it is necessary to solve the most important problems:
It is important to
I want to immediately note: textFX does not offer. My attempts to write a macro with textFX (including attempts with 75 or a number from the clipboard failed, the text was written with unreadable code), I sent a corresponding message to the site notepad ++ a day ago
Use a regular point also do not offer. The fact is that notepad ++ macro does not understand the point of replacing the indentation with spaces by default 4pcs. (Yes, the macro must also indent the text, but this is all right)
About soft wordwrap (not formatted just click) is also not worth talking about.
inscrease line indent 2-4
split lines
descrease line indet 2-4
^(.?)$\s+?^(?=.^\1$) null
2017 (\d+:\d+) 17
subscribe .+$ null
(^.{1,30}$)\R \1
\r\r\n \n
^ spaces\1
i create macro and solved more problems

how to do vi search and replace within a range in sublime text

I enabled vintage mode on sublime text.. but there are some important vim commands that are lacking.. so let's say I want to do a search and replace like so
:10,25s/searchedText/toReplaceText/gc
so I wanna search searchedText and replace it with toReplaceText from lines 10 to 25 and be prompted every time (ie yes/no)..
how do I do this with Sublime Text? everytime I hit : it gives me this funny menu.. any way around that?
If you so much would like to see vim in action, try the other way around; ie enable sublime stuff in vim.
Here are 2 links that might come in handy:
subvim and vim multiple cursors (Which is one amazing feature in sublime that lacks in native vim).
Hope that gets you creative ;)
Unfortunately vintage mode does not understand ranges. The best way I know how to do this is with incremental search:
highlight the first occurrence of searchedText on line 10
hit cmnd/ctrl D to have Sublime find the next occurence
If you you want the next occurrence ignored, hit cmnd/ctrl K
Once you have highlighted all the occurrences, you can replace them all at once, as Sublime has left cursors behind on every occurrence you opted in on.
VintageEx gives you a Vim-like command-line where you can at least perform substitutions. Well, that's how far I went when trying it. I don't know how extended the subset of Vim commands it implements is but I'd guess that it's not as large as the original and, like with Vintage, probably different and unsettling enough to keep a relatively experienced Vimmer out.
Anyway, I just tried it again and indeed you can more or less do the kind of substitution you are looking for, which instantly makes ST a lot more useful:
:3,5s/foo/bar/g
:.,5s/bar/foo/g
:,5/foo/bar/g
:,+5/bar/foo/g
Unfortunately, it doesn't support the /c flag.
a plugin named vintageous offers more features including search function. It's available in package control
although this question is answered.. i figured this would add some value
the full functionality of vi search/replace is possible with the ruby mine IDE, once you install the ideavim plugin. The idea is perfect for ruby on rails by the way.

Controlling Word Wrap in a container

I have a peculiar problem. I have an email group that pipes emails to a message board. The word wrap of the emails varies. In yahoo, the messages tend to fill the entire container on the message board. But in all other mail clients, only part of the container width is filled, because the original mail was wrapped. I want all of the email messages to fill the entire width of the container. I've thought of two possible solutions: CSS, or a Regex that eliminates line breaks. Because I am only a garage mechanic (at these sorts of things), I simply cannot get the job done. Any help out there?
Here is a link that shows the issue: http://seanwilson.org/forum/index.php?t=msg&th=1729&start=0&S=171399e41f2c10c4357dd9b217caaa3f
(compare the message of "sean" with that of "rob." One fills the container, the other not).
Can any of you suggest how to get all the mail to fill the container?
You gave too little information - what programming language are you using - PHP/Javascript/anything different?
I think you only need to replace \n, \r and \r\n with whitespace. PHP code for that:
$nowrap = str_replace('\r\n', ' ', $nowrap);
$nowrap = str_replace('\r', ' ', $nowrap);
$nowrap = str_replace('\n', ' ', $nowrap);
You can do that analogically in other languages (for JS see string.replace method: http://www.tizag.com/javascriptT/javascript-string-replace.php).
Depending on the situation (people always seem to add 2 linebreaks between paragraphs), you could say the problem is: replace all newlines not directly preceded or followed by a newline with a space.
//just to be sure, remove \r's
$string = str_replace("\r",'',$string);
$string = preg_replace('/(?<!\n)\n(?!\n)/',' ',$string);
While allowing \r's:
$string = preg_replace('/(?<!\r|\n)\r?\n(?!\r|\n)/',' ',$string);
Edit: nevermind: do not use: while people tend to write their email text in paragraphs, you will break their signature / signoff with this regex. One could fiddle around with a minimum linelength before deeming it 'breakable' (i chose 63), but fiddly it will be:
$string = preg_replace('/([^\r\n]{63,})\r?\n(?!\r|\n)/','$1 ',$string);
The problem is: there are no assurance the linebreak wasn't intended. With a fiddleable line-length you could base it on average users, but the question is: what do they mind more: the differences between breaking & non-breaking paragraphs, or the breaking of their signatures?
Thanks for getting back so quickly!
The discussion board uses php (and also CSS). The only trouble is that I am somewhat limited in my ability to tinker with its programing. If I am to do this at my current level of skilty, I have only one of two options.
using a preg-replace in php. The discussion board allows us to do this from a control panel. So If I could do it with one preg-replace statement, it should work.
Would Wrikken's solution work if I do not remove \r's? Because that seems to be spot on. (could the \r's be added to the preg-replace?)
I had hoped the solution could come through a css property of some sort. I guess that isn't possible.
Thanks so much for your help!
[NOTE: thanks so much for your help! The solution worked!!! I changed the number to 53 or so. It needed to be a little smaller. I don't care that a rare, long signature lines may lose its carriage return. That's a small price to pay for a full message box! You easily saved me several days of learning something that was bound to be moderately frustrating, Thanks so much for that quick fix. I am joyous at the help I received here.]