Change font style for RegEx globally in LaTeX - regex

Although I know that LaTeX is a "semantic language", I'm trying to find a solution for the following problem.
My 60k-word document contains about 250 strings that I can find with a regex, e.g. something like [A-Z]{2}\d\d[A-Z]{2} for a string like SE25GH.
Is it possible to define a font style for every match of this regex globally in my document, so that I can change it easily without crawling through the document or wrapping each string in a command like \makemytextbold{SE25GH}?
Thanks.

Related

Regular expressions in ignored_words in Sublime Text 3 spell_check?

I'm trying to spellcheck a LaTeX file. I would like the spellchecker to ignore strings containing a number. In my settings file I have
"ignored_words":
[
"textbf",
"renewenvironment",
etc...
]
If I add something like ".*[0-9].*" to "ignored_words" it doesn't seem to do anything. Is there a way to accomplish this?
It is not possible to use regex in spell checking at this point.
ST uses Hunspell as its spell checker. Adding regex support to Hunspell is an open feature request. That it has not been closed gives some hope that it is on a long-term enhancement list.
Until Hunspell adds this capability, it seems impossible to achieve what you are seeking in ST.
It may be worth keeping an eye on the feature request to see if there is any progress.

Stripping superscript from plaintext

I often grab quotes from articles whose citations use superscripted footnote markers, which are a pain in the ass when copied. Because the text is pasted as plain text and not as HTML, the markers show up as ordinary letters in the text.
Is there a way I could run this through a regex to take out these superscripts?
For example
In the abeginning bGod ccreated the dheaven and the eearth.
Should become
In the beginning God created the heaven and the earth.
I can't think of a way to have regex search for misspellings and a corresponding sequential set of numbers and letters.
Any thoughts? I'm also using Sublime Text 3 for the majority of my writing, but I wouldn't mind outsourcing this to an AppleScript, or text replacement app (aText, textExpander, etc.).
Matching Code vs. Matching a Screen
It's hard to tell without seeing an example, but this should be doable if you copy the text from code view, as opposed to the regular browser view. (Ctrl or Cmd-J is your friend). Since writing the rules will take time, this will only be worthwhile for large chunks of text.
In code view, your superscript will be marked up in a way that can be targeted by regex. For instance:
and therefore bananas make you smartera
in the browser view (where the a at the end is a citation note) may look like this in code view:
and therefore bananas make you smarter<span class="mycitations">a</span>
In your editor, using regex, you can process the text to remove all tags, or just certain tags. The rules may not always be easy to write, and of course there are many disclaimers about using regex to parse HTML.
However, if your source is always the same (Wikipedia for instance), then you can create and save rules that should work across many pages.
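As a rough illustration of such a saved rule, here is a minimal Python sketch. It assumes the citation markers are wrapped exactly as in the hypothetical markup above (a span with class "mycitations"); the function name strip_citations is made up for this example, and real pages will use different tags and classes.

```python
import re

# Assumption: every citation marker is wrapped in this exact span, as in the
# hypothetical example above. Non-greedy so each span is removed separately.
CITATION = re.compile(r'<span class="mycitations">.*?</span>')


def strip_citations(html: str) -> str:
    """Remove the citation spans, keeping the surrounding text."""
    return CITATION.sub("", html)


source = 'and therefore bananas make you smarter<span class="mycitations">a</span>'
print(strip_citations(source))
```

The same pattern can be pasted into an editor's regex search-and-replace with an empty replacement.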

Export a specific line in Notepad++

I have a large XHTML file that contains a lot of code, see the below example:
<a:CreationDate>0</a:CreationDate>
<a:Creator/>
<a:ModificationDate>0</a:ModificationDate>
<a:Modifier/>
<a:name>stack</a:name>
<a:CreationDate>0</a:CreationDate>
<a:Creator/>
<a:ModificationDate>0</a:ModificationDate>
<a:Modifier/>
<a:name>user</a:name>
How can I export or select a specific line? In the example I want to have such result:
<a:name>stack</a:name>
<a:name>user</a:name>
and the rest of the code should be ignored.
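Okay, I found my desired result. This pattern matches every line that does not contain an <a:name> element, so those lines can be found and removed: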
^((?!<a:name>.*</a:name>).)*$
Since this appears to be a kind of XML document, if you want to find a line such as
<a:CreationDate>0</a:CreationDate>
or
<a:name>user</a:name>
you can search for its closing tag, such as </a:name> or </a:CreationDate>,
or you can use a scripting language such as PHP or JavaScript to select the lines.
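The scripting route could look something like the sketch below. It is written in Python rather than PHP or JavaScript, purely as an illustration; the function name keep_name_lines is made up, and it assumes each <a:name> element sits on a single line, as in the sample.

```python
import re

# Assumption: an <a:name> element never spans more than one line.
NAME_LINE = re.compile(r"<a:name>.*</a:name>")


def keep_name_lines(text: str) -> list[str]:
    """Return only the lines that contain an <a:name> element."""
    return [line.strip() for line in text.splitlines() if NAME_LINE.search(line)]


sample = """<a:CreationDate>0</a:CreationDate>
<a:Creator/>
<a:name>stack</a:name>
<a:ModificationDate>0</a:ModificationDate>
<a:name>user</a:name>"""

for line in keep_name_lines(sample):
    print(line)
```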

Is RTF text empty

Is there an easy way in C++ to tell whether an RTF text string has any content aside from pure formatting?
For example this text is only formatting, there is no real content here:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}
Loading RTF text in RichTextControl is not an option, I want something that will work fast and require minimum resources.
The only sure-fire way is to write your own RTF parser [spec], use a library like LibRTF, or you might consider keeping a RichTextControl open and updating it with new RTF documents rather than destroying the object every time.
I believe RTF is not a regular language, so cannot be properly parsed by RegEx (not unlike HTML, despite millions of attempts to do so), but you do not need to write a complete RTF parser.
I'd start with a simple string parser. Try:
Remove content between {\ and }
Remove tags. Tags begin with a backslash, \, and are followed by some text. If a backslash is followed by whitespace, it is not a tag.
The document should end with at least one closing curly brace, }
Any content left which isn't whitespace should be document content, though this may have some exceptions so you'll want to test on numerous samples of RTF.
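The steps above can be sketched roughly as follows. This is Python rather than C++, only to show the logic; the function name rtf_has_content is made up. One deviation is my own assumption: the leading brace of the document is dropped first, otherwise the first step would swallow the whole document. Note that a group such as {\b bold} is discarded together with its text, so this heuristic can report "empty" for documents that do have content.

```python
import re

# Step 1: an innermost group that starts with "{\" and has no nested braces.
GROUP = re.compile(r"\{\\[^{}]*\}")
# Step 2: a control word (letters, optional numeric argument, optional
# delimiting space) or a control symbol (backslash plus one other character).
CONTROL = re.compile(r"\\[A-Za-z]+-?\d* ?|\\[^A-Za-z\s]")


def rtf_has_content(rtf: str) -> bool:
    """Heuristic only: True if anything but formatting seems to remain."""
    text = rtf.strip()
    # Assumption (not in the steps above): drop the document's opening brace
    # so the outermost {\rtf1 ...} group is not removed as a whole.
    if text.startswith("{"):
        text = text[1:]
    # Step 1, repeated so nested groups are removed from the inside out.
    while True:
        text, removed = GROUP.subn("", text)
        if removed == 0:
            break
    # Step 2: remove the remaining tags.
    text = CONTROL.sub("", text)
    # Steps 3 and 4: ignore leftover braces and whitespace.
    text = text.replace("{", "").replace("}", "")
    return bool(text.strip())


formatting_only = r"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}"
with_text = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\pard Hello\par}"
print(rtf_has_content(formatting_only), rtf_has_content(with_text))
```

Both regular expressions are simple enough to carry over to std::regex or a hand-written scanner if the approach holds up on your samples.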

Find Lines with N occurrences of a char

I have a txt file that I'm trying to import as a flat file into SQL Server 2008. It looks like this:
“123456”,”some text”
“543210”,”some more text”
“111223”,”other text”
etc…
The file has more than 300.000 rows and the text is large (usually 200-500 chars), so scanning the file by hand is very time consuming and prone to error. Other similar (and even more complex) files were imported successfully.
The problem with this one is that some lines contain quotes in the text. This came from an export from an old SuperBase DB that didn't let you specify a text qualifier, so there's nothing I can do with the file other than clean it and try to import it.
So the “offending” lines look like this:
“123456”,”this text “contains” a quote”
“543210”,”And the “above” text is bad”
etc…
You can see the problem here.
Now, 300.000 lines is not too many: if I could search with a text editor that supports regex, I'd manually remove the quotes from each offending line. The problem is not the number of offending lines but the impossibility of finding them with a simple search. I'm sure there are fewer than 500, but spread those across a 300.000-line txt file and you know what I mean.
Based upon that, what would be the best regex I could use to identify these lines?
My first thought is: Tell me which lines contain more than 4 quotes (“).
But I couldn’t come up with anything (I’m not good at Regex beyond the basics).
This pattern, ^("[^"]+){4,}, will match lines containing more than 4 quotes.
You can experiment with replacing 4 with 5 or more, depending on your data.
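If a script is easier than an editor search, the pattern could be applied along these lines. This is only a sketch: it assumes the file uses plain ASCII double quotes, and the function name find_offending_lines is made up. The line ending is stripped before matching, because [^"] would otherwise accept the trailing newline as a non-quote character and flag every line.

```python
import re

# The pattern from above: at least four groups of a quote followed by non-quotes.
EXTRA_QUOTES = re.compile(r'^("[^"]+){4,}')


def find_offending_lines(lines):
    """Return (line_number, text) pairs for lines with quotes inside the text."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # keep the newline out of the match
        if EXTRA_QUOTES.match(text):
            hits.append((number, text))
    return hits


sample = [
    '"123456","some text"\n',
    '"543210","And the "above" text is bad"\n',
    '"111223","other text"\n',
]
print(find_offending_lines(sample))
```

For the real file, pass an open file object instead of the sample list, since iterating over it yields one line at a time.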
I think that you can be more direct with a Regex than you're planning to be. Depending on your dialect of Regex, something like this should do it:
^"\d+",".*".*"
You could also use a regex to remove the outside quotes and use a better delimiter instead. For example, search for ^"([0-9]+)","(.*)"$ and replace it with \1+++++DELIM+++++\2.
Of course, this doesn't directly answer your question, but it might solve the problem.
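A minimal Python sketch of that search and replace, assuming plain ASCII double quotes and a numeric first field; the function name requote is made up for the example. Lines that do not fit the pattern are returned unchanged, so they can be reviewed separately.

```python
import re

# The search pattern from above: numeric id, then everything between the
# first quote of the second field and the last quote on the line.
ROW = re.compile(r'^"([0-9]+)","(.*)"$')
DELIM = "+++++DELIM+++++"


def requote(line: str) -> str:
    """Drop the outer quotes and join the two fields with DELIM."""
    text = line.rstrip("\r\n")
    # \g<1> and \g<2> are the unambiguous forms of \1 and \2.
    return ROW.sub(rf"\g<1>{DELIM}\g<2>", text)


print(requote('"123456","this text "contains" a quote"'))
```

Quotes inside the text stay where they are, which is harmless once the import uses the new delimiter and no text qualifier.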