Unix: replace every odd | with \left| and every even | with \right| - regex

I have an enormous equation. For each corresponding pair of |, the left | must be replaced with \left| and the right one with \right|.
Equation
\begin{equation}
| \Delta w_{0} | = \frac{|w_{0}|}{2} \left( |\frac{\Delta g}{g}|+|\frac{\Delta (\Delta r)}{\Delta r}| + |\frac{\Delta r}{r}| +|\frac{\Delta L}{L}| \right)
\end{equation}
[Premises]
The number of | characters is even.
There is no nesting, so a scenario such as M_OPEN|----X_OPEN|-----X_CLOSED|------M_CLOSED| is not possible; only M_OPEN|---M_CLOSED|---H_OPEN|----H_CLOSED| occurs.

sed -r -e 's/\|([^|]+)\|/\\left|\1\\right|/g'
But this works only if you do not have nested |...|.
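As an illustrative alternative (not the method of the answer above), the odd/even rule from the title can be applied literally with a counter instead of a paired pattern. This is a Python sketch; `pair_bars` is a hypothetical helper name, and it assumes the premises hold (an even number of bars, no nesting) and that the input does not already contain \left| or \right|.

```python
import re
from itertools import count


def pair_bars(tex: str) -> str:
    """Turn the 1st, 3rd, 5th... '|' into \\left| and the 2nd, 4th... into \\right|.

    Assumes an even number of bars, no nesting, and no existing \\left|/\\right|.
    """
    position = count()  # yields 0, 1, 2, ... once per '|' encountered

    def replace(_match):
        # Even index (1st, 3rd, ... bar) opens a pair, odd index closes it.
        return "\\left|" if next(position) % 2 == 0 else "\\right|"

    # re.sub calls replace() on matches from left to right.
    return re.sub(r"\|", replace, tex)
```

Unlike the sed one-liner, this also handles an empty pair ||, because it does not require [^|]+ between the bars.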

Related

How to remove a vertical bar or pipe | inside double vertical bar/pipe | using sed?

I've been trying for hours to do the following to a file that I'm converting from CSV to pipe-delimited. After it's been converted, I want to remove only the pipe that sits between two delimiter pipes. I don't know if that is possible.
Example:
Original input
X, Y, This is a | test for me, or
X, Y, This is a|test for me,
Original output:
| X | Y | This is a | test for me| or
|X|Y|This is a|test for me|
Desired output:
| X | Y | This is a test for me|
I have tried, but I just can't find the right regex or sed command; regular expressions have always been hard for me.
I'm new to C and scripting. I handled the conversion, and also the case where we get something like Street name, apt number: we remove the comma between the name and apt but keep the one after the number, which is the one to be converted to a pipe.
I run cat piped through several sed commands to handle other things. Do you think it is best to do it there, and will it work on the 1k-plus rows I have? Part of the script uses awk, which I'm also not familiar with.
Is my approach the best solution, or should I handle it before I even convert to pipes? I think the script also encloses cases like "street name, apt #" in double quotes, so that it can just remove the comma inside the quotes.
I have had no luck with several attempts, such as:
cat <input> | sed 's/ | / /g' | tr , '|'
or:
cat <input> | sed 's/ | / /g;s/,/\|/g'
This is the script that does what I described above for the commas. I need to add handling for a pipe like the one in my example, because otherwise it splits my string in two.
Can anyone help?
This should do it:
echo "X, Y, This is a | test for me" | sed 's/ |//;s/, /|/g'
X|Y|This is a test for me
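On the question of handling the pipe before converting: one possible approach, sketched here in Python rather than sed, is to let a real CSV parser split the fields (so a quoted "street name, apt #" keeps its comma), drop any literal | inside each field, and only then join the fields with |. `csv_to_pipes` is a hypothetical helper; it assumes fields are separated by ", " as in the sample and that a stray | inside a field should simply become a space.

```python
import csv
import io


def csv_to_pipes(text: str) -> list:
    """Convert comma-separated lines to pipe-delimited lines.

    A literal '|' inside a field is removed before joining, so it can never
    be mistaken for a delimiter. Quoted fields keep their inner commas.
    """
    rows = []
    # skipinitialspace=True ignores the space after each comma, which also
    # lets csv.reader recognise a quoted field written as , "a, b".
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Replace '|' with a space, then collapse runs of whitespace.
        cleaned = [" ".join(field.replace("|", " ").split()) for field in fields]
        rows.append("|".join(cleaned))
    return rows
```

Because the cleanup happens per field, the number of rows does not matter; it works the same for 1k rows as for two.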
Based solely on the limited set of input data, some assumptions:
ignore the trailing "or" on the first line of the sample input, since it does not show up in the expected output; otherwise OP needs to provide details on the logic for when to remove trailing strings
input data does not contain any embedded commas (,), i.e., all commas are delimiters
output lines have a space separating each field from the | delimiter, which means the last field should have a trailing space before the final |, just like the 1st/2nd fields show a trailing space (in the expected output)
all input/output lines end with a delimiter (, or |)
all output lines begin with a | delimiter
all whitespace consists of actual spaces, i.e., there is no need to deal with tabs
NOTE: if the question is updated with more details, some of these assumptions can be removed and the proposed code updated accordingly.
Sample input data:
$ cat raw.csv
X, Y, This is a | test for me,
X, Y, This is a|test | for me | ,
One sed idea:
sed -E 's/[ ]*\|[ ]*/ /g; s/^[ ]*/\| /g; s/[ ]*,[ ]*$/ \|/g; s/[ ]*,[ ]*/ | /g' raw.csv
Where:
1st sub replaces any number of spaces + | + any number of spaces with a single space [removes unwanted | before the | delimiters are added]
2nd sub replaces start of line + any number of spaces with "| " (a | followed by a single space)
3rd sub replaces any number of spaces + , + any number of spaces + end of line with " |" (a | preceded by a single space)
4th sub replaces any number of spaces + , + any number of spaces with " | " (a | with a single space on each side)
This generates:
| X | Y | This is a test for me |
| X | Y | This is a test for me |
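To try the four substitutions outside sed, here is a line-by-line transcription into Python's re.sub, under the same assumptions listed above. `normalise` is a hypothetical name, and the patterns use a plain space where the sed version used [ ].

```python
import re


def normalise(line: str) -> str:
    """Apply the four sed substitutions, in the same order, to one line."""
    line = re.sub(r" *\| *", " ", line)   # 1st: drop stray | and surrounding spaces
    line = re.sub(r"^ *", "| ", line)     # 2nd: leading "| " delimiter
    line = re.sub(r" *, *$", " |", line)  # 3rd: trailing comma becomes " |"
    line = re.sub(r" *, *", " | ", line)  # 4th: remaining commas become " | "
    return line
```

Order matters here just as in sed: the stray | characters must be removed by the 1st substitution before the later ones introduce the real delimiters.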

How do I select a substring using a regexp in robot framework

In the Robot Framework library called String, there are several keywords that allow us to use a regexp to manipulate a string, but these manipulations don't seem to include selecting a substring from a string.
To clarify, what I have is a price, e.g. € 1234,00, from which I would like to select only the integer part, leaving me with 1234 (which I will convert to an int for use in validation calculations). I have a regexp that allows me to do that, which is as follows:
(\d+)[\.\,]
If I use Remove String Using Regexp with this regexp, I am left with everything except the part I wanted. If I use Get Lines Matching Regexp, I get the entire line rather than just the result I want. If I use Get Regexp Matches, I get the right result, but inside a list, which I then have to manipulate again, so that doesn't seem optimal.
Did I simply miss the keyword that will allow me to do this or am I forced to write my own custom keyword that will let me do this? I am slightly amazed that this functionality doesn't seem to be available, as this is the first use case I would think of when I think of using a regexp with a string...
You can use the Evaluate keyword to run some Python code.
For example:
| Using 'Evaluate' to find a pattern in a string
| | ${string}= | set variable | € 1234,00
| | ${result}= | evaluate | re.search(r'\\d+', '''${string}''').group(0) | re
| | should be equal as strings | ${result} | 1234
Starting with robot framework 2.9 there is a keyword named Get regexp matches, which returns a list of all matches.
For example:
| Using 'Get regexp matches' to find a pattern in a string
| | ${string}= | set variable | € 1234,00
| | ${matches}= | get regexp matches | ${string} | \\d+
| | should be equal as strings | ${matches[0]} | 1234
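The Evaluate call above is ordinary Python. A minimal sketch of the same idea outside Robot, using the question's own pattern (\d+)[\.\,] and returning its capture group so the separator is not included, could look like this; `integer_part` is a hypothetical name.

```python
import re


def integer_part(price: str) -> int:
    """Return the digits before the first '.' or ',' as an int."""
    # The question's pattern; group(1) holds only the digits.
    match = re.search(r"(\d+)[\.\,]", price)
    if match is None:
        raise ValueError(f"no number followed by '.' or ',' in {price!r}")
    return int(match.group(1))
```

The same expression can go straight into an Evaluate call, using group(1) instead of group(0), which avoids the intermediate list returned by Get Regexp Matches.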

How to match sub pattern in Robot Framework?

I am doing the following in RFW (Robot Framework):
STEP 1: I need to match the "NUM_FLOWS" value from the following command output.
STEP 2: If it is zero (0), the test case should FAIL. If it is non-zero, the test case should PASS.
Sample command output:
router-7F2C13#show app stats gmail on TEST/switch1234-15E8CC
--------------------------------------------------------------------------------
APPLICATION BYTES_IN BYTES_OUT NUM_FLOWS
--------------------------------------------------------------------------------
gmail 0 0 4
--------------------------------------------------------------------------------
router-7F2C13#
How can I do this with the "Should Match Regexp" and "Should Match" keywords? How can I check only that number sub-pattern? (Example: in the above command output, NUM_FLOWS is non-zero, so the test case should PASS.)
Please help me to achieve this.
Thanks in advance.
My new robot file content:
| | Write | show dpi app stats BitTorrent_encrypted on AVC/ap7532-15E8CC
| | ${raw_text}= | Read Until Regexp | .*#
${data[0].num_flows} 0
| | ${data}= | parse output | ${raw_text}
| | Should not be equal as integers | ${data[0].num_flows} | 0
| | ... | Expected num_flows to be non-zero but it was zero | values=False
There are many ways to solve this. A simple way is to use Robot's regular expression keywords to look for "gmail" at the start of a line, followed by two numbers and then the number 0 (zero) at the end of the line. This assumes that a) NUM_FLOWS is always the last (fourth) column, and b) there is only one line that begins with "gmail". I don't know if those are valid assumptions or not.
Because the data spans multiple lines, the pattern includes (?m) (the multiline flag) so that $ means "end of line" in addition to "end of string".
| | Should not match regexp | ${raw_text} | (?m)\\s+gmail\\s+\\d+\\s+\\d+\\s+0\\s*$
| | ... | Expected non-zero value in the fourth column for gmail, but it was zero.
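To check the pattern itself before wiring it into a test, the same expression can be run in Python against the sample output from the question. Robot's \\s becomes \s in a Python raw string; `gmail_has_flows` and `SAMPLE` are hypothetical names used only for this sketch.

```python
import re

# The answer's pattern, unescaped from Robot's "\\s" to Python's "\s".
ZERO_FLOWS = re.compile(r"(?m)\s+gmail\s+\d+\s+\d+\s+0\s*$")

SAMPLE = """router-7F2C13#show app stats gmail on TEST/switch1234-15E8CC
--------------------------------------------------------------------------------
APPLICATION BYTES_IN BYTES_OUT NUM_FLOWS
--------------------------------------------------------------------------------
gmail 0 0 4
--------------------------------------------------------------------------------
router-7F2C13#"""


def gmail_has_flows(output: str) -> bool:
    """True unless the gmail row ends with a NUM_FLOWS value of 0."""
    # No match means the last column is not a lone 0, i.e. the test passes.
    return ZERO_FLOWS.search(output) is None
```

Note that \s+ before gmail consumes the preceding newline, so the pattern relies on the gmail row not being the very first line of the output.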
There are plenty of other ways to solve the problem. For example, if you need to check for other values in other columns, you might want to write a Python keyword that parses the data and returns some sort of data structure.
Here's a quick example. It's not bulletproof and makes some assumptions about the data passed in. I wouldn't use it in production, but it illustrates the technique. The keyword returns a list of items, and each item is a custom object with four attributes: name, bytes_in, bytes_out and num_flows:
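# python library
import re


def parse_output(data):
    class Data(object):
        def __init__(self, raw_text):
            # r'\s+' splits on runs of whitespace; r'\s*' can also match the
            # empty string and, on Python 3.7+, splits between every character.
            columns = re.split(r'\s+', raw_text.strip())
            self.name = columns[0]
            self.bytes_in = int(columns[1])
            self.bytes_out = int(columns[2])
            self.num_flows = int(columns[3])

    # strip() drops a trailing newline; splitlines() also copes with \r\n
    lines = data.strip().splitlines()
    result = []
    # skip the first four lines (command, separator, header, separator)
    # and the last two (separator, prompt)
    for line in lines[4:-2]:
        result.append(Data(line))
    return result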
Using it in a test:
*** Test Cases ***
| | # <put your code here to get the data from the >
| | # <router and store it in ${raw_text} >
| | ${raw_text}= | ...
| | ${data}= | parse output | ${raw_text}
| | Should not be equal as integers | ${data[0].num_flows} | 0
| | ... | Expected num_flows to be non-zero but it was zero | values=False

notepad++: keep regex (multi occurence per line) and line structure, remove other characters

I have a 130k line text file with patent information and I just want to keep the dates (regex "[0-9]{4}-[0-9]{2}-[0-9]{2} ") for subsequent work in Excel. For this purpose I need to keep the line structure intact (also blank lines). My main problem is that I can't seem to find a way to identify and keep multiple occurrences of date information in the same line while deleting all other information.
Original file structure:
US20110228428A1 | US | | 7 | 2010-03-19 | SEAGATE TECHNOLOGY LLC
US20120026629A1 | US | | 7 | 2010-07-28 | TDK CORP | US20120127612A1 | US | | EXAMINER | 2010-11-24 | | US20120147501A1 | US | | 2 | 2010-12-09 | SAE MAGNETICS HK LTD,HEADWAY TECHNOLOGIES INC
Desired file structure:
2010-03-19
2010-07-28 2010-11-24 2010-12-09
Thank you for your help!
Search for
.*?(?:([0-9]{4}-[0-9]{2}-[0-9]{2})|$)
And replace with
" $1"
Don't type the quotes; they are only there to show that there is a space before the $1. This also puts a space before the first match in a row.
The lazy .*? matches as little as possible before it finds either a date or the end of the row (the $). If a date is found, it is captured in $1 because of the surrounding parentheses. So the replacement is just a space, to separate the dates, followed by the captured date from $1.
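Outside Notepad++, the same result is easier to get by collecting the dates instead of deleting everything else. This Python sketch swaps in re.findall for the search-and-replace trick; it uses the question's date pattern without the trailing space, and `keep_dates` is a hypothetical helper. Joining the results per line keeps blank lines blank, so the line structure survives for Excel.

```python
import re

# The question's date pattern, minus the trailing space.
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def keep_dates(text: str) -> str:
    """Keep only the dates on each line, space-separated; blank lines stay blank."""
    return "\n".join(" ".join(DATE.findall(line)) for line in text.split("\n"))
```

Unlike the replacement approach, this produces no leading space before the first date on a line.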

How to replace CSV column separators with numbered labels in Vim?

I want to replace a series of pipe characters with different values. How would I do this with regular expressions?
Example:
This | is | a | sentence
And | this | is | the | second | one
Final result:
This new is new2 a new3 sentence
And new this new2 is new3 the new4 second new5 one
If substitution values differ only in the numbers at the ends, use the command
:let n=[0] | %s/|/\='new'.map(n,'v:val+1')[0]/g
(See my answer to the question "gVim find/replace with counter" for a detailed description of the technique.)
If the substitution values differ substantially from each other, change the command to substitute not the serial number of an occurrence, but an item of a replacement list, using that number as an index.
:let n=[-1] | %s/|/\=['one','two','three'][map(n,'v:val+1')[0]]/g
To perform the substitutions on every line independently of each other, use
the :global command to iterate one of the above commands through the lines
of a buffer.
:g/^/let n=[0] | s/|/\='new'.map(n,'v:val+1')[0]/g
Similarly,
:g/^/let n=[-1] | s/|/\=['one','two','three'][map(n,'v:val+1')[0]]/g
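The same counter idea, expressed in Python for readers outside Vim: a fresh counter per line mirrors the :g/^/let n=[0] variant, and indexing a list with that counter mirrors the replacement-list variant. `number_pipes` and `label_pipes` are hypothetical names. Like the Vim commands, this yields new1 for the first pipe, whereas the question's example shows plain new.

```python
import re
from itertools import count


def number_pipes(line: str, prefix: str = "new") -> str:
    """Replace the n-th '|' on the line with prefix + n, counting from 1."""
    counter = count(1)  # recreated on every call, i.e. restarted per line
    return re.sub(r"\|", lambda _m: f"{prefix}{next(counter)}", line)


def label_pipes(line: str, labels: list) -> str:
    """Replace the n-th '|' on the line with labels[n - 1].

    Raises IndexError if the line has more pipes than labels, much as the
    Vim list index would fail.
    """
    counter = count(0)
    return re.sub(r"\|", lambda _m: labels[next(counter)], line)


text = "This | is | a | sentence\nAnd | this | is | the | second | one"
numbered = "\n".join(number_pipes(line) for line in text.splitlines())
```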
Define a function:
fun CountUp()
let ret = g:i
let g:i = g:i + 1
return ret
endf
Now, use:
:let i = 1 | %s/|/\="new" . CountUp()/g