How can I make a roman numeral list in Markdown?

How do I create a roman numeral list in Markdown?
Like this...
i) Higher per acre field of sugarcane.
ii) Higher sucrose content of sugarcane.
iii) Lower Labor cost.
iv) Longer crushing period.

Pandoc Supports Roman Numerals for Ordered Lists
The original Markdown doesn't support Roman numerals, and currently (as of version 0.28) neither does CommonMark. However, some Markdown flavors and processors do support them, either natively or with certain extensions enabled.
Pandoc allows you to use two spaces after a letter or Roman numeral to indicate such a list. For example:
I.  Foo
    A.  Bar
    B.  Baz
II.  Quux
will render the following HTML:
<ol type="I">
<li>Foo
<ol type="A">
<li>Bar</li>
<li>Baz</li>
</ol></li>
<li>Quux</li>
</ol>
This specific example doesn't require CSS or non-default extensions, but does require that you render your markup with Pandoc rather than some other processor. If you're using anything else, your mileage will vary.

Roman numeral lists are not supported in pure Markdown.
You can use CSS to define a roman numeral list with lowercase letters, although it will look slightly different:
list-style-type:lower-roman;
Result:
i. Lorem ipsum
ii. Dolor sit amet
iii. Foobar

Related

Regex to Remove Empty Line and Number

I am having trouble removing the number following an empty line using Regex. Here's the sample paragraph that I have:
1
- Lorem Ipsum is simply dummy text of
2
the printing and typesetting industry.
49
and more recently with desktop publishing software like Aldus PageMaker.
I need to remove all the numbers from the beginning of the sentence as well as the empty lines:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. and more recently with desktop publishing software like Aldus PageMaker.
This is the regex I came up with: [\n](.) but it can only remove a single digit.
The difficult part is removing the number, because it is not necessarily 1 or 2 digits long. How do I tackle this problem?
Do a regex replace of the following regex with blank:
^\d*\n
See live demo.
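As a rough sketch of how this replacement might look in Python (the language is my choice; the question does not name a tool): the ^ anchor only matches at the start of every line when multiline mode is on, hence re.MULTILINE. The variable names are illustrative, and the sample assumes the blank lines sit before each number, as the question describes.

```python
import re

# Sample modeled on the question; the positions of the blank lines are assumed.
text = (
    "1\n"
    "- Lorem Ipsum is simply dummy text of\n"
    "\n"
    "2\n"
    "the printing and typesetting industry.\n"
    "\n"
    "49\n"
    "and more recently with desktop publishing software like Aldus PageMaker.\n"
)

# ^\d*\n matches a line that holds only digits (any count, including none)
# together with its trailing newline, so number lines and blank lines go away.
cleaned = re.sub(r"^\d*\n", "", text, flags=re.MULTILINE)
print(cleaned)
```

Note that this only drops the number lines and the blank lines. Joining the remaining lines into one paragraph, or stripping the leading "- ", would be a separate step.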

re.findall between two strings (but dismiss numeric digits)

I am trying to parse many txt files. The following text is just a part of a bigger txt file.
<P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0; text-align: justify">Prior to this primary offering, there has
been no public market for our common stock. We anticipate that the public offering price of the shares will be between $5.00 and
$6.00. We have applied to list our common stock on the Nasdaq Capital Market (“Nasdaq”) under the symbol “HYRE.”
If our application is not approved or we otherwise determine that we will not be able to secure the listing of our common stock
on the Nasdaq, we will not complete this primary offering.</P>
My desired output: be between $5.00 and $6.00. So I need to extract everything from be between up to the following . (ignoring the decimal point in 5.00). I tried the following (Python 3.7):
shareprice = re.findall(r"be between\s\$.+?\.", text, re.DOTALL)
But this code gives me: be between $5. (it stops at the decimal point). I initially added a \s at the end of the pattern to require whitespace after the ., which would keep the decimal point in 5.00, but many other txt files do not have whitespace right after the . that ends the sentence.
Is there any way I can specify in my pattern that a \. followed by numeric digits should be skipped?
Thank you very much. I hope it was clear.
Best
Once you have extracted the plain text from the HTML, you can match zero or more characters, as few as possible, followed by a . that is not followed by a digit:
r"be between\s*\$.*?\.(?!\d)"
See the regex demo.
Alternatively, if you only want to ignore the dot STRICTLY in between two digits you may use
r"be between\s*\$.*?\.(?!(?<=\d\.)\d)"
See this regex demo. The (?!(?<=\d\.)\d) makes sure the \d\.\d pattern is skipped up to the first matching ., and not just \.\d.
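A small sketch comparing the two patterns in Python, assuming the text has already been extracted from the HTML. The sample string is shortened from the question, and re.DOTALL is kept from the asker's attempt so the match can run across the line break.

```python
import re

# Shortened sample from the question, with the line break before $6.00 kept.
text = (
    "We anticipate that the public offering price of the shares will be between $5.00 and\n"
    "$6.00. We have applied to list our common stock on the Nasdaq Capital Market."
)

# (?!\d) rejects any dot that is immediately followed by a digit.
simple = re.findall(r"be between\s*\$.*?\.(?!\d)", text, re.DOTALL)

# Stricter variant: only a dot sitting between two digits is skipped.
strict = re.findall(r"be between\s*\$.*?\.(?!(?<=\d\.)\d)", text, re.DOTALL)

print(simple)
print(strict)
```

On this sample both patterns return the same single match, which still contains the original line break. They would differ only where a dot is followed by a digit without being preceded by one.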

Converting extracted text string to date where string varies in length in Postgres

I have a materialized view of a text column that extracts a string of numbers representing a date.
The materialized view is created using the following function:
(regexp_replace(left(substring(lower(replace(content,' ','_')) from 're-inspection_date:_(.*)_'),10),'\D','','g'))
It outputs a text string in MMDDYYYY format, except that it does not include leading zeroes for single-digit months and days.
When I try to use the "to_date" function specifying the format MMDDYYYY using the following:
(to_date(regexp_replace(left(substring(lower(replace(content,' ','_')) from 're-inspection_date:_(.*)_'),10),'\D','','g'),'MMDDYYYY'))
I get the error "date/time field value out of range: '12122018'".
I believe the issue is due to one or both of the following reasons:
The resulting strings from my current regexp in the materialized view vary in length (e.g. 12212018 8222018 962018) due to my regexp removing all non-integer characters. The dates are 6, 7 or 8 digits long.
As a result, I haven't yet been able to come up with a way of inserting a delimiter between the month/day/year values.
Is there a way to change these output strings to a date format without changing my regexp?
If not, how could I change my regexp for extracting these values?
Bear in mind that the date I'm after in the source text is formatted as 12/1/2018 and also doesn't account for leading 0's in days or months. Also, there is another date preceding the target date in the text formatted the same way.
Here is a sample of the source text:
PLACEHOLDER TEXT FOR REDACTED STUFF BLAH BLAH BLAH
**** Loremipsum
11/28/2018 4: 21: 37 PM ****1 of 2 Facility Information Permit
Number: 12-34-56789 Name of Facility: Dolor sit amet-consectetur
Address: 123 Fake Street City, Zip: adipiscing elit12345 RESULT: sed
Do Eiusmod tempor: by 8: 00 AM Re-Inspection Date: 12/4/2018 Type: Blah-Type Stuff Etc: Dolor sit amet-consectetur...
Where the "Re-Inspection Date: 12/4/2018" is what I'm after.
I'm on Postgres 11.
Kaushik Nayak is correct, I guess. I get the same result with this regex, which uses a positive lookbehind (?<=Re-Inspection Date: ) and allows any number of digits [0-9]* separated by exactly one slash /{1}:
SELECT to_date(substring('string'
from '(?<=Re-Inspection Date: )[0-9]*/{1}[0-9]*/{1}[0-9]*'), 'mm/dd/yyyy');
You may specify varying lengths of integers using the repetition {} pattern
select to_date(substring(lower(content)
from 're-inspection date:\s*(\d{1,2}/\d{1,2}/\d{4})' ),'mm/dd/yyyy') from t
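To check the pattern outside the database, a quick sketch in Python can help. This is only a stand-in for testing the regex and is not a replacement for the Postgres queries above. It assumes the source text looks like the sample in the question, and the variable names are made up for illustration.

```python
import re
from datetime import datetime

# Fragment of the sample text from the question.
content = (
    "Do Eiusmod tempor: by 8: 00 AM Re-Inspection Date: 12/4/2018 "
    "Type: Blah-Type Stuff Etc: Dolor sit amet-consectetur..."
)

# \d{1,2} accepts one- or two-digit months and days; \d{4} requires a full year.
match = re.search(r"re-inspection date:\s*(\d{1,2}/\d{1,2}/\d{4})", content.lower())

# strptime accepts month and day values without leading zeroes.
inspection_date = datetime.strptime(match.group(1), "%m/%d/%Y").date() if match else None
print(inspection_date)
```

Keeping the slashes in the extracted string is what makes the variable-length month and day unambiguous. Once they are stripped, 1112018 could mean 1/11/2018 or 11/1/2018.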

Regex to add periods after single capitals but only up to "|" in line

I'm working on a 3.75 million line text catalog of authors' names and titles in EditPad Pro. I need to standardize the authors' initials to have periods after them.
The catalog has the authors name and book titles separated by a vertical bar "|" character, like this:
A N Author|A Title
A. N. Name|A Blah
Some A Name|Blah A Lot
A Name|Blah I
Name A|I Blah
B O'Name|A Book
Normally in Calibre I use this regex to standardize the initials
\b([A-Z])\.?\s?(?!'|\-|\.)\b
Replace:"\1. "
but here I need it to only work up to the vertical bar "|" character, and not make any changes to the titles. I cannot seem to get anything to work on all the above authors names without it also changing the titles.
Results I'm looking for:
A. N. Author|A Title
A. N. Name|A Blah
Some A. Name|Blah A Lot
A. Name|Blah I
Name A.|I Blah
B. O'Name|A Book
Thanks.
Add to your regex a positive lookahead:
(?=.*\|)
It means: Somewhere later in the line there must be a |.
This works as long as each line contains a single |, a condition your sample text meets.
Single letters before it are matched, single letters after it aren't.
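Here is a sketch of the combined pattern in Python, under the assumption that Python's regex syntax is close enough to EditPad Pro's for this pattern. One caveat shows up when running it: the replacement "\1. " always appends a space, so an initial directly before the bar comes out as "A. |". The second substitution below, which trims that space, is my own addition and not part of the answer. The function name is hypothetical.

```python
import re

# The asker's Calibre pattern with the suggested lookahead appended.
INITIAL = re.compile(r"\b([A-Z])\.?\s?(?!'|\-|\.)\b(?=.*\|)")

def fix_initials(line: str) -> str:
    """Add periods after single-capital initials that appear before the bar."""
    fixed = INITIAL.sub(r"\1. ", line)
    # Assumed extra step: drop the space the replacement leaves before the bar.
    return re.sub(r"\s+\|", "|", fixed)

lines = [
    "A N Author|A Title",
    "A. N. Name|A Blah",
    "Some A Name|Blah A Lot",
    "A Name|Blah I",
    "Name A|I Blah",
    "B O'Name|A Book",
]

for line in lines:
    print(fix_initials(line))
```

The lookahead (?=.*\|) cannot cross a newline because . does not match one by default, so each line is judged on its own bar.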

Regex add tag to subtitles

I have a subtitle file of a movie, like below:
2
00:00:44,687 --> 00:00:46,513
Let's begin.
3
00:01:01,115 --> 00:01:02,975
Very good.
4
00:01:05,965 --> 00:01:08,110
What was your wife's name?
5
00:01:08,943 --> 00:01:12,366
- Mary.
- Mary, alright.
6
00:01:15,665 --> 00:01:18,938
He seeks the spirit
of Mary Browning.
7
00:01:20,446 --> 00:01:24,665
Mary, we invite you
into our circle.
8
00:01:28,776 --> 00:01:32,834
Mary Browning,
we invite you into our circle.
....
Now I want to match only the actual subtitle text content like,
- Mary.
- Mary, alright.
Or
He seeks the spirit
of Mary Browning.
including the special characters, numbers and/or newline characters they may contain. But I don't want to match the time string and serial numbers.
So basically, I want to match every line that contains letters (possibly along with numbers and special characters), but not the lines that consist only of numbers and special characters, such as the timestamps and serial numbers.
How can I use a regex to match each subtitle and wrap it in a <font color="#FFFF00">[subtitle text any...]</font> tag?
Like this:
<font color="#FFFF00">He seeks the spirit
of Mary Browning.</font>
After checking and analysing the file carefully, I figured out the key to matching all the subtitle text lines.
First, remove the unnecessary carriage-return characters (\r) from the subtitle (.srt) file.
Find: \r+
Replace with:
(nothing, i.e. an empty string)
Then match the lines that start with neither a digit nor a newline (i.e. lines that are not blank), and replace each one with its own text wrapped in a <font> tag with the color value, as below:
Find: ^([^\d\n].*)
Replace with: <font color="#FFFF00">\1</font>
(The space after each colon is only for presentation and is not part of the pattern.)
Hope this helps everyone head-banging with subtitles everyday.
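The two find-and-replace steps could be sketched in Python roughly as follows. The character class is written as [^\d\n], since a second ^ inside the brackets would be a literal caret rather than a negation. The sample is a fragment of the subtitle file from the question, with Windows line endings assumed.

```python
import re

# Fragment of the sample file; the \r\n line endings are an assumption.
srt = (
    "5\r\n"
    "00:01:08,943 --> 00:01:12,366\r\n"
    "- Mary.\r\n"
    "- Mary, alright.\r\n"
    "\r\n"
    "6\r\n"
    "00:01:15,665 --> 00:01:18,938\r\n"
    "He seeks the spirit\r\n"
    "of Mary Browning.\r\n"
)

# Step 1: strip carriage returns so every line ends in a plain \n.
text = re.sub(r"\r+", "", srt)

# Step 2: wrap each line that starts with neither a digit nor a newline.
tagged = re.sub(
    r"^([^\d\n].*)",
    r'<font color="#FFFF00">\1</font>',
    text,
    flags=re.MULTILINE,
)
print(tagged)
```

This wraps each line separately rather than each multi-line subtitle as a whole. A subtitle line that happens to start with a digit would be skipped, so files with such lines would need a stricter test for serial numbers and timestamps.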