I have the following sample text file with all my references, which I use for citations in another program (LaTeX). I want to remove the "abstract" field and its contents to reduce the file size and keep the content relevant.
The sample text is given below:
doi = {10.3389/fsufs.2021.575056},
abstract = {Agriculture has come under pressure to meet global food demands, whilst having to meet economic and ecological targets. This has opened newer avenues for investigation in unconventional protein sources. Current agricultural practises manage marginal lands mostly through animal husbandry, which; although effective in land utilisation for food production, largely contributes to global green-house gas (GHG) emissions. Assessing the revalorisation potential of invasive plant species growing on these lands may help encourage their utilisation as an alternate protein source and partially shift the burden from livestock production; the current dominant source of dietary protein, and offer alternate means of income from such lands. Six globally recognised invasive plant species found extensively on marginal lands; Gorse (
Ulex europaeus
), Vetch (
Vicia sativa
), Broom (
Cytisus scoparius
), Fireweed (
Chamaenerion angustifolium
), Bracken (
Pteridium aquilinum
), and Buddleia (
Buddleja davidii
) were collected and characterised to assess their potential as alternate protein sources. Amino acid profiling revealed appreciable levels of essential amino acids totalling 33.05 ± 0.04 41.43 ± 0.05, 33.05 ± 0.11, 32.63 ± 0.04, 48.71 ± 0.02 and 21.48 ± 0.05 mg/g dry plant mass for Gorse, Vetch, Broom Fireweed, Bracken, and Buddleia, respectively. The availability of essential amino acids was limited by protein solubility, and Gorse was found to have the highest soluble protein content. It was also high in bioactive phenolic compounds including cinnamic- phenyl-, pyruvic-, and benzoic acid derivatives. Databases generated using satellite imagery were used to locate the spread of invasive plants. Total biomass was estimated to be roughly 52 Tg with a protein content of 5.2 Tg with a total essential amino acid content of 1.25 Tg ({\textasciitilde}24\%). Globally, Fabaceae was the second most abundant family of invasive plants. Much of the spread was found within marginal lands and shrublands. Analysis of intrinsic agricultural factors revealed economic status as the emergent factor, driven predominantly by land use allocation, with shrublands playing a pivotal role in the model. Diverting resources from invasive plant removal through herbicides and burning to leaf protein extraction may contribute toward sustainable protein, effective land use, and achieving emission targets, while simultaneously maintaining conservation of native plant species.},
doi = {10.1186/s12864-016-3367-x},
abstract = {Background: Propionibacterium freudenreichii is an Actinobacterium widely used in the dairy industry as a ripening culture for Swiss-type cheeses, for vitamin B12 production and some strains display probiotic properties. It is reportedly a hardy bacterium, able to survive the cheese-making process and digestive stresses.
Results: During this study, P. freudenreichii CIRM-BIA 138 (alias ITG P9), which has a generation time of five hours in Yeast Extract Lactate medium at 30 °C under microaerophilic conditions, was incubated for 11 days (9 days after entry into stationary phase) in a culture medium, without any adjunct during the incubation. The carbon and free amino acids sources available in the medium, and the organic acids produced by the strain, were monitored throughout growth and survival. Although lactate (the preferred carbon source for P. freudenreichii) was exhausted three days after inoculation, the strain sustained a high population level of 9.3 log10 CFU/mL. Its physiological adaptation was investigated by RNA-seq analysis and revealed a complete disruption of metabolism at the entry into stationary phase as compared to exponential phase.
Conclusions: P. freudenreichii adapts its metabolism during entry into stationary phase by down-regulating oxidative phosphorylation, glycolysis, and the Wood-Werkman cycle by exploiting new nitrogen (glutamate, glycine, alanine) sources, by down-regulating the transcription, translation and secretion of protein. Utilization of polyphosphates was suggested.},
language = {en},
I want to prune out the abstract and all its contents. So the corresponding output should look like:
doi = {10.3389/fsufs.2021.575056},
doi = {10.1186/s12864-016-3367-x},
language = {en},
I am trying to achieve this using the following 'sed' command: sed 's/\s*abstract.*(\n*.*)*.*[$}]// gm' Test.txt
But it does not seem to work. I have checked using online tools such as https://regex101.com/, and it seems to select the relevant text. But when I try to execute it on my laptop, it doesn't work properly.
I am running this on a Lenovo Thinkpad, MXLinux.
Using GNU sed
$ sed -Ez 's/abstract =[^}]*}([^}]*\.})?,\n +?//g' input_file
doi = {10.3389/fsufs.2021.575056},
doi = {10.1186/s12864-016-3367-x},
language = {en},
With extended regular expressions enabled (-E) and the input split on NUL characters instead of newlines (-z), so that the whole file is treated as one record and the pattern can span lines, the match starts at abstract =
[^}]*} - Match up to the next occurrence of } and include the curly brace.
([^}]*\.})? - An optional group: as above, match up to the next occurrence of a curly brace, but this time require a full stop immediately before the curly brace.
\n - Include the newline in the match to be removed.
 +? - Another optional condition: if there are one or more spaces after the newline, remove them too.
The g flag at the end will repeat the removal of the match as many times as it finds it.
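As a rough cross-check of the pattern explained above, here is a minimal Python sketch that applies the same regex to an abbreviated stand-in for the sample file. The `sample` text and the `strip_abstracts` helper are illustrative names, not part of the original answer, and Python's `re` module is assumed to treat this particular pattern the same way GNU sed does.

```python
import re

# Abbreviated stand-in for the sample file (assumption: the real abstracts
# are longer, but have the same shape, including the nested {\textasciitilde}).
sample = (
    "doi = {10.3389/fsufs.2021.575056},\n"
    "abstract = {Total content of 1.25 Tg ({\\textasciitilde}24\\%).\n"
    "Much of the spread was found within marginal lands.},\n"
    "doi = {10.1186/s12864-016-3367-x},\n"
    "abstract = {Background: a hardy bacterium.\n"
    "Conclusions: Utilization of polyphosphates was suggested.},\n"
    "language = {en},\n"
)

# The sed pattern, written in Python syntax. The optional group handles an
# abstract that contains one nested {...} before its closing ".}".
ABSTRACT = re.compile(r"abstract =[^}]*}([^}]*\.})?,\n *")


def strip_abstracts(text: str) -> str:
    """Remove every abstract field matched by the pattern above."""
    return ABSTRACT.sub("", text)


if __name__ == "__main__":
    print(strip_abstracts(sample), end="")
```

Like the sed command, this sketch only copes with a single nested pair of braces per abstract and relies on the abstract ending in a full stop when such a pair is present.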
This might work for you (GNU sed):
sed -n '/abstract = {/{:a;/},$/b;n;ba};p' file
Turn off implicit printing -n.
If a line contains abstract = {, start a loop: while the current line does not end in },, replace it with the next line; once a line does end in },, branch to the end of the script without printing, which effectively deletes the whole field.
Otherwise print all other lines.
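The same line-by-line logic can be sketched in Python. This is only an illustration of the loop described above; the `drop_abstracts` name is hypothetical, and, like the sed script, it assumes every abstract field ends on a line whose last characters are `},`.

```python
import sys


def drop_abstracts(lines):
    """Yield every line that is not part of an abstract field."""
    skipping = False
    for line in lines:
        # An abstract starts on any line containing "abstract = {".
        if not skipping and "abstract = {" in line:
            skipping = True
        if skipping:
            # The field ends on the first line that ends in "},".
            if line.rstrip("\n").endswith("},"):
                skipping = False
            continue  # never print lines that belong to the abstract
        yield line


if __name__ == "__main__":
    # Usage (hypothetical file name): python drop_abstracts.py < Test.txt
    sys.stdout.writelines(drop_abstracts(sys.stdin))
```

Because it works one line at a time, this approach does not care how many nested braces the abstract contains, only where it ends.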
With GNU awk, you could try the following code, written and tested in GNU awk. It sets the RS variable to a regex that matches the fields to keep, then prints each matched separator (RT) to produce the output the OP asked for.
awk -v RS='(^[[:space:]]*|\n[[:space:]]*)doi = {[^}]*},|[[:space:]]+language = {en},' '
RT{ print RT }
' Input_file
Here is the online demo for the above code (NOTE: the online regex uses a non-capturing group, which awk does not support; it appears there only to aid understanding).
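The awk answer inverts the problem: rather than deleting the abstract, it keeps only the fields that are wanted. A rough Python sketch of that idea follows. The `KEEP` regex is a simplified stand-in, not a translation of the awk RS pattern, and `keep_fields` and the sample text are illustrative names.

```python
import re

# Abbreviated stand-in for the sample file (assumption).
sample = (
    "doi = {10.3389/fsufs.2021.575056},\n"
    "abstract = {Background: a hardy bacterium.\n"
    "Conclusions: Utilization of polyphosphates was suggested.},\n"
    "doi = {10.1186/s12864-016-3367-x},\n"
    "language = {en},\n"
)

# Keep whole lines that consist of a doi field or the language field.
KEEP = re.compile(
    r"^[ \t]*(doi = \{[^}]*\},|language = \{en\},)[ \t]*$",
    re.MULTILINE,
)


def keep_fields(text: str) -> list[str]:
    """Return the wanted fields in the order they appear."""
    return [m.group(1) for m in KEEP.finditer(text)]


if __name__ == "__main__":
    print("\n".join(keep_fields(sample)))
```

As with the awk version, any field not named in the pattern is dropped, so this only suits files where the list of fields to keep is known in advance.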
I am trying to knit a .Rmd file to a word file, where I need to include equations and their numbers.
The code I tried looks like below:
$$
\begin{cases}
\tag{1}
\frac{dX}{dt}=a\\\frac{dY}{dt}=b
\end{cases}
$$
But this doesn't work.
There is a similar question
Equation Numbering in Rmarkdown - For Export to Word
Yet the answer doesn't work for me. I am wondering if anyone has a good solution to this.
Thanks a lot!
Welcome to SO, @user15578296.
Here is your equation ready for Word:
---
output: word_document
---
$$\begin{cases} \frac{dX}{dt}=a\\\frac{dY}{dt}=b \end{cases}$$
Regarding the numbering, follow the answer given in the question you linked.
I plan to use Google Sheets' conditional formatting to highlight cells where the text DOES NOT contain:
Retail
FinServ
Manufacturing
Field Service
Managed Services
Digital Transformation
Ecommerce
Data and Analytics
For the above phrases, I want to be able to add additional details, separated by an underscore (_), and have the row still NOT be highlighted. For ex: Retail_Blog should still NOT be highlighted because it begins with one of the phrases above.
To do this, I'm currently using the formula:
=regexmatch(F:F,"Retail|FinServ|Manufacturing|Field Service|Managed Services|Digital Transformation|Ecommerce|Data and Analytics")=FALSE
This formula works great for the specifications above, but I would also like the formula to adhere to another rule.
For the phrases below, I would like the formula to highlight cells if they DON'T EXACTLY match the phrases. For ex: "Meetings" should NOT be highlighted, but "meetings," "Meeting," and "Meetings_whatever" SHOULD be highlighted.
Meetings
Website Updates
Press Release and Distribution
Calendar Planning
Also, this formula would be for the range F:F.
Formula
=regexmatch(F:F,"Retail|FinServ|Manufacturing|Field Service|Managed Services|Digital Transformation|Ecommerce|Data and Analytics|^Meetings$|^Website Updates$|^Press Release and Distribution$|^Calendar Planning$")=FALSE
Explanation
^ means match start
$ means match end
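To see how the anchors change the result, here is a small Python sketch that mirrors the formula. It assumes Python's `re.search` behaves like REGEXMATCH for this pattern (both look for a match anywhere in the text); `is_highlighted` is a hypothetical helper name.

```python
import re

# Same alternation as in the formula above.
PATTERN = re.compile(
    r"Retail|FinServ|Manufacturing|Field Service|Managed Services|"
    r"Digital Transformation|Ecommerce|Data and Analytics|"
    r"^Meetings$|^Website Updates$|^Press Release and Distribution$|"
    r"^Calendar Planning$"
)


def is_highlighted(cell: str) -> bool:
    """Mirror of `=regexmatch(...)=FALSE`: highlight when nothing matches."""
    return PATTERN.search(cell) is None


if __name__ == "__main__":
    for value in ["Retail_Blog", "Meetings", "meetings", "Meeting", "Meetings_whatever"]:
        print(f"{value!r}: highlighted={is_highlighted(value)}")
```

Note that the first eight phrases are not anchored, so they match anywhere in the cell, not only at the start. If they must begin the cell, prefixing each with ^ (for example ^Retail) would enforce that.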
It's possibly a stupid question, but I need some help as I can't find the answer.
I indented some code in RoR, then later deleted the outer layer and wanted to move the remaining code back, but I can't find the shortcut, and it's quite a lot of lines of code...
example:
originally
if abc
  if def
    some code
    some code
  end
end
but now
  if def
    some code
    some code
  end
Select the text to outdent and press Shift-Tab
Info from keybindings page of the documentation: https://docs.c9.io/docs/keybindings
I am trying to write a regex that searches for a series of items. However, one of the projects I am searching for does not follow a standard naming convention throughout, so I am having trouble keeping my regex compact.
Here is an example.
I have
Master Project
Building Project
School Project Phase 1
School Project Phase 2
SchoolProject
Right now I have the following as my regex.
(^Master Project|^Building Project|^School Project ([]A-Z))
This works on everything EXCEPT the single word SchoolProject. I tried adding ? after the whitespace between School and Project, but that did not seem to help. I also tried a lookaround and a negative lookaround, but I was not able to get either method to work.
Assuming your "project names" are all on one line as shown in the example:
(Master Project|Building Project|School ?Project([ A-Za-z]* [0-9])?)
This also assumes that the "Phases" come after the rest of the name, e.g. School Project Phase 1, not Phase 1 School Project.
If each "project name" is in its own string, you can add back in the ^ before "Master", "Building", etc.
http://regexr.com/3ceno
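A quick way to check the suggested pattern against the example names is a short Python sketch like the one below. It treats each name as its own string and uses `fullmatch`, which is an assumption about how the search will be applied; `is_project` is a hypothetical helper name.

```python
import re

# The pattern from the answer, unchanged.
PROJECT = re.compile(
    r"(Master Project|Building Project|School ?Project([ A-Za-z]* [0-9])?)"
)

names = [
    "Master Project",
    "Building Project",
    "School Project Phase 1",
    "School Project Phase 2",
    "SchoolProject",
]


def is_project(name: str) -> bool:
    """True when the whole string is one of the expected project names."""
    return PROJECT.fullmatch(name) is not None


if __name__ == "__main__":
    for name in names:
        print(f"{name!r}: {is_project(name)}")
```

The optional space in `School ?Project` is what lets the single-word SchoolProject through, while the optional trailing group accepts a phase suffix that ends in a digit.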