Manual numbering in bookdown - r-markdown

Is it possible to control chapter/section numbering in bookdown?
e.g.
# Introduction {1}
# Theory {14}
# Methods {3}
would give the following in the output...
1. Introduction
14. Theory
3. Methods

Sorry for the late answer, but I've just bumped into the same issue.
If you turn off numbering globally by adding the line
number_sections: false
under bookdown::gitbook: to your _output.yml file, then you can number your chapters and sections in any way you like:
# 1. Introduction
# 14. Theory
# 3. Methods
To repeat: my _output.yml file contains the lines
bookdown::gitbook:
  number_sections: false

I'm not sure what you are trying to do. The curly brackets at the end of a header do not control section numbering. You can use them to assign an ID to a section header manually,
e.g. # Introduction {#intro}
to exclude the header from numbering,
e.g. # Introduction {-}
or to do both at the same time: # Introduction {- #intro}.

Related

Regex: Syntax pop on two linefeeds

I'm currently writing a Sublime syntax mode (YAML, for v3) for a language which has an unusual comment format.
Documentation comments:
start with the symbol # as the first character in a LOC, and
end with two newlines
A simple example is this:
# The following function returns
the opposite of what you think it does.
code...
and a worst-case example:
#
This is a comment,
this is still the same comment.
This, too. These don't matter: # foobar ##
code...
My current approach is to use the stack.
Push:
- match: '#'
  scope: punctuation.definition.comment.mona
  push: doc_comment
Pop:
line_comment:
  - meta_scope: comment.line.mona
  - match: '\n\n'
    pop: true
That doesn't work though. I tried to fix this by using s, thinking that it would produce behavior like this, but it produces a Sublime error (invalid option for capture group).
How can I match this comment format correctly with S3 YAML?
Apparently I'm a bit thick today. The answer was obvious, to match on an empty line:
- match: '^$'
  pop: true
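Sublime Text runs syntax regexes against one line at a time, which is why '\n\n' can never match while '^$' does. The following Python sketch only simulates that line-by-line behavior; it is not Sublime's engine. The function name doc_comment_lines is hypothetical, and the sample assumes an empty line separates the comment from the code.

```python
import re

def doc_comment_lines(source):
    """Return the indexes of the lines that fall inside a doc comment."""
    in_comment = False
    inside = []
    for index, line in enumerate(source.split("\n")):
        if not in_comment and line.startswith("#"):
            in_comment = True    # corresponds to "push: doc_comment"
        elif in_comment and re.match(r"^$", line):
            in_comment = False   # corresponds to "pop: true" on an empty line
        if in_comment:
            inside.append(index)
    return inside

sample = (
    "#\n"
    "This is a comment,\n"
    "this is still the same comment.\n"
    "This, too. These don't matter: # foobar ##\n"
    "\n"
    "code..."
)
print(doc_comment_lines(sample))
```

Because each line is inspected on its own, no single line ever contains two consecutive newlines; the empty line is the only visible signal that the comment has ended.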

How to exclude lines with ... in regular expression

I have the following table of contents and sections in my file:
1.2 Purpose .................... 8
1.3 System Overview ............ 8
1.4 Document Overview .......... 8
1.5 Definitions and Acronyms ......... 9
2.1.3.3.8 FOO
2.1.3.3.9 BAR
2.1.4 TEST
I'd like to extract the section names and ignore the lines that are part of the table of contents.
I've been trying this regular expression:
^((?:\d{1,2}\.)+(?:\d{1,2})+)\s.+(?!\.\.\.).*$
However, I keep capturing the table of contents lines.
How can I exclude the lines with the .... strings?
Thanks!
The problem here was that you were only excluding .s at a very specific place; your negative lookahead match didn't go beyond the position it was placed in. Consider instead:
^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$
#                                  ^^
...the characters with the caret below them are critical: they make the negative lookahead apply not only at that specific point, but anywhere after it as well.
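As a quick check, here is a minimal Python sketch that applies the regex above to the sample lines from the question. The helper name section_lines is hypothetical, and testing each line separately is my own choice rather than part of the answer.

```python
import re

# The regex from the answer; the ".*" inside the lookahead is the critical part.
SECTION = re.compile(r"^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$")

def section_lines(text):
    """Return the lines that look like section headings, skipping TOC entries."""
    return [line for line in text.splitlines() if SECTION.match(line)]

sample = """1.2 Purpose .................... 8
1.3 System Overview ............ 8
1.4 Document Overview .......... 8
1.5 Definitions and Acronyms ......... 9
2.1.3.3.8 FOO
2.1.3.3.9 BAR
2.1.4 TEST"""

for line in section_lines(sample):
    print(line)
```

Matching line by line matters here: \s and [^.] can both match a newline, so running the pattern over the whole text at once could let one match run on into the next line.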

Parsing whitespace-oriented conf file with Regex

I'm trying to parse a gitolite.conf file, which is a whitespace-oriented conf file with a few regexes. The worst problem is that some options might appear anywhere:
@staff = dilbert alice # line 1
@projects = foo bar # line 2
repo @projects baz # line 3
    RW+ = @staff # line 4
    - master = ashok # line 5
    RW = ashok # line 6
    R = wally # line 7
    config hooks.emailprefix = '[%GL_REPO] ' # line 8
Check the "master" attribute. Some repos have them, others do not. It's a real pain.
This answer assumes a goal of extracting key/value pairs into capturing groups, where key consists of contiguous non-whitespace before = and value includes everything after = but before #, trimmed of leading/trailing whitespace.
Basic version
([^\s]+)\s*=\s*((?:\s*[^\s#]+)*)
More advanced version
The regex above doesn't handle quoted strings very well (e.g. prefix = ' Quoted with # and leading/trailing whitespace '). Regex isn't great at this kind of thing but simple cases can be handled as follows:
([^\s]+)\s*=\s*('[^']*'|"[^"]*"|(?:(?:\s*[^\s#]+)*))
Here's the demo if you need to see what is captured and play around with it more: Debuggex Demo
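For readers without access to the demo, this is a minimal Python sketch of the more advanced pattern applied to a few gitolite-style lines. The helper name parse_line and the sample lines are illustrative assumptions; only the regex itself comes from the answer.

```python
import re

# The "more advanced" regex: key, "=", then a quoted string or unquoted words.
KEY_VALUE = re.compile(
    r"""([^\s]+)\s*=\s*('[^']*'|"[^"]*"|(?:(?:\s*[^\s#]+)*))"""
)

def parse_line(line):
    """Return (key, value) for a line containing "=", or None otherwise."""
    match = KEY_VALUE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_line("@staff = dilbert alice # line 1"))
print(parse_line("config hooks.emailprefix = '[%GL_REPO] ' # line 8"))
print(parse_line("repo @projects baz # line 3"))
```

Note that the key is only the non-whitespace run immediately before the =, so for the config line the key is hooks.emailprefix and the leading word config is dropped. Lines without an = (such as the repo line) do not match at all and need separate handling.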
First, you should know that this isn't entirely possible with Regex. Regex is a great tool for parsing regular languages (including some types of configuration files), but as soon as you get into "Well, this line is actually a header line and we need all lines under it, and some lines might have this token, and others might not", it gets quite messy. I'm not saying it's impossible, but you're going to waste a lot of time debugging your Regex pattern instead of just writing a parser in whatever language you're using this with.
Second, if you're going to ask a question about Regex, it is always helpful to know what you want out of the expression. Do you want to tokenize everything, do you only want the configuration keys, or do you only want the comments?
That being said, I took my best guess, here's an expression to get you started:
^(?:([^=#]+?)\s.?=?\s.?([^=#]+?)\s.?(?:#|$))
With this expression, please apply the g and m flags (global and multiline). In PCRE, this would look like:
/^(?:([^=#]+?)\s.?=?\s.?([^=#]+?)\s.?(?:#|$))/gm
There are two capture groups, one is whatever is before the = sign, and the other is whatever is after. If there is no = sign, the first capture group contains everything. Anything after "#" is ignored.
Here's a fiddle to demonstrate: http://www.rexfiddle.net/eQexbZU

regex: return ini section as string

Using regex, (I am using Autohotkey, which is PCRE) how can I match the section of an ini file? I don't need to get the individual keys - just the section block.
I've come up with this, which seems to match as long as there is a section after the sought section, but if it is the last section, it fails.
iniregex := "ms)(?<=^\[keys\]).*(?=^\[)"
For example, I want to get the entire contents of the [keys] section while excluding the comments and ignoring the empty lines. It should capture test=2 but exclude the comment on that line:
[settings]
settings=0
;settings=1
[keys]
test=0
;test=1
test=2 ;comment
test=3
[nextsection]
this section has an empty and should be caught.
there is an empty line after this line, and it should be caught, too.
eof
I found this, but I'm not sure where to put the sought section name.
You cannot achieve this with a single regexp.
What you can do is using this regexp based on your quote to extract the [keys] section without including the [keys] tag:
/^(?<=\[keys\]\r\n)(?:(?!^\[).)*(?=\r\n)/ms
Afterwards you can use this regexp for the extracted section to exclude comments/blank lines:
/^[^;\s][^;\r\n]*/gm
From your linked question, you would put the sought section name here:
(?ms)^\[keys](?:(?!^\[[^]\r\n]+]).)*
I don't think you'll be able to strip the comments out in the same regex as the capture, however. You'll have to do that in a secondary step.
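The two-step approach can be sketched in Python as follows. This is only an illustration, not AutoHotkey code: the function name ini_section is hypothetical, the closing brackets are escaped for clarity, and the comment stripping uses plain string handling rather than a second regex.

```python
import re

def ini_section(text, name):
    """Return the key lines of [name] without comments, or None if absent."""
    # Step 1: capture the header and everything up to the next "[section]"
    # line, or up to the end of the text if this is the last section.
    pattern = r"(?ms)^\[" + re.escape(name) + r"\](?:(?!^\[[^\]\r\n]+\]).)*"
    match = re.search(pattern, text)
    if match is None:
        return None
    # Step 2: drop the header line, comments, and empty lines.
    entries = []
    for line in match.group(0).splitlines()[1:]:
        line = line.split(";", 1)[0].strip()
        if line:
            entries.append(line)
    return entries

sample = """[settings]
settings=0
;settings=1
[keys]
test=0
;test=1
test=2 ;comment
test=3
"""

print(ini_section(sample, "keys"))
print(ini_section(sample + "[nextsection]\nother=1\n", "keys"))
```

The negative lookahead is tested before every character, so the match stops right before the next header when there is one and simply runs to the end of the text when there is not.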
Your regex fails if there is no section after [keys] because you need to put a "0 or more" type quantifier for the next section. Something like:
iniregex := "ms)(?<=^\[keys\]).*(?:(?=^\[))?"

Regular Expression break down URL into parts

I've just recently started learning regex, so I'm not sure yet about a couple of aspects of the whole thing.
Right now my web page reads in the URL, breaks it up into parts, and uses only certain parts for processing:
E.g. 1) http://mycontoso.com/products/luggage/selloBag
E.g. 2) http://mycontoso.com/products/luggage/selloBag.sf404.aspx
For some reason Sitefinity is giving us both possibilities, which is fine, but what I need from this is only the actual product details as in "luggage/selloBag"
My current Regex expression is: "(.*)(map-search)(\/)(.*)(\.sf404\.aspx)", I combine this with a replace statement and extract the contents of group 4 (or $4), which is fine, but it doesn't work for example 2.
So the question is: Is it possible to match 2 possibilities with regular expressions where a part of a string might or might not be there and then still reference a group whose value you actually want to use?
RFC-3986 is the authority regarding URIs. Appendix B provides this regex to break one down into its components:
re_3986 = r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
# Where:
# scheme = $2
# authority = $4
# path = $5
# query = $7
# fragment = $9
Here is an enhanced (and commented) regex (in Python syntax) which utilizes named capture groups:
import re

re_3986_enhanced = re.compile(r"""
    # Parse and capture RFC-3986 Generic URI components.
    ^                                    # anchor to beginning of string
    (?:  (?P<scheme>    [^:/?#\s]+): )?  # capture optional scheme
    (?://(?P<authority>  [^/?#\s]*)  )?  # capture optional authority
         (?P<path>        [^?#\s]*)      # capture required path
    (?:\?(?P<query>        [^#\s]*)  )?  # capture optional query
    (?:\#(?P<fragment>      [^\s]*)  )?  # capture optional fragment
    $                                    # anchor to end of string
    """, re.MULTILINE | re.VERBOSE)
For more information regarding the picking apart and validation of a URI according to RFC-3986, you may want to take a look at an article I've been working on: Regular Expression URI Validation
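To see what the Appendix B regex yields for the URL in the question, here is a small Python sketch. The helper name split_uri is hypothetical; the group numbers follow the mapping listed above.

```python
import re

# RFC 3986, Appendix B.
RE_3986 = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI into its five generic components (missing ones are None)."""
    match = RE_3986.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }

parts = split_uri("http://mycontoso.com/products/luggage/selloBag.sf404.aspx")
print(parts["scheme"], parts["authority"], parts["path"])
```

This isolates the path, but the optional .sf404.aspx suffix still has to be removed in a separate step.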
Depends on your regex implementation, but most support a syntax like
(\.sf404\.aspx|)
Assuming that's your group 4 (i.e. zero-indexed groups). The | lists two alternatives, one of which is the empty string.
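A minimal Python sketch of the empty-alternative idea, applied to both example URLs. The /products/ prefix and the name product_path are assumptions made for illustration; the original expression used a different prefix.

```python
import re

# Group 1 is the part we want; group 2 is the suffix or the empty string.
PRODUCT = re.compile(r"/products/(.+?)(\.sf404\.aspx|)$")

def product_path(url):
    """Return the product part of the URL, with or without the suffix."""
    match = PRODUCT.search(url)
    return match.group(1) if match else None

print(product_path("http://mycontoso.com/products/luggage/selloBag"))
print(product_path("http://mycontoso.com/products/luggage/selloBag.sf404.aspx"))
```

The lazy (.+?) is needed so that the suffix, when present, ends up in the second group instead of being swallowed by the first.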
You don't say if you're doing this in javascript, but if you are, the parseUri lib written by Steven Levithan does a pretty damn good job at parsing urls. You can get it from various places, including here (click on the "Source Code" tab) and here.