Swapping columns in vi with regex without using awk, read, etc.

I have a file of 1000 lines, with 5 to 8 columns in each line separated by :
1:2:3:4:5:6:7:8
4g10:8s:45:9u5b:a:z1
I want all lines reordered as 4:3:1:2:5:6:7...
How would I swap only the first 4 columns with a regex?

This would probably be easier with another approach, but you can do it with an ex command. From command (normal) mode, enter:
:%s/^\([^:]\+\):\([^:]\+\):\([^:]\+\):\([^:]\+\):/\4:\3:\1:\2:/
which will create capture groups for the first 4 colon delimited fields, then replace them in a different order than they were there originally.
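For comparison, one non-regex alternative is to split each line on the colon, reorder the first four fields, and join them again. This is an assumption; the answer does not say which "other approach" it means. Here is a minimal Python sketch. The function name reorder_first_four is illustrative and does not come from the original:

```python
def reorder_first_four(line, order=(4, 3, 1, 2), sep=":"):
    """Reorder the first four sep-separated fields; keep the rest as they are."""
    fields = line.split(sep)
    if len(fields) < 4:
        return line  # too few fields: leave the line unchanged
    head = [fields[i - 1] for i in order]  # order uses 1-based field numbers
    return sep.join(head + fields[4:])

for line in ["1:2:3:4:5:6:7:8", "4g10:8s:45:9u5b:a:z1"]:
    print(reorder_first_four(line))
```

Because it never builds a regex, this version is easy to adapt to other orders. Changing the order tuple is enough.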

Here is a regex that should do what you are looking for:
newtext = re.sub("([^:]+):([^:]+):([^:]+):([^:]+)(:)?(.*)?",r"\4:\3:\1:\2\5\6",text)
The takeaway is that you use parentheses for capturing and then reorder the groups in the replacement. Each capture group is one or more non-colon characters, and the groups are separated by colons. If fields can be empty, change each + to a *.
Here is a sample in Python for clarity:
import re
textlist = [
"1:2:3:4:5:6:7:8",
"1:2:3:4:5",
"1:2:3:4",
]
for text in textlist:
    newtext = re.sub("([^:]+):([^:]+):([^:]+):([^:]+)(:)?(.*)?", r"\4:\3:\1:\2\5\6", text)
    print(newtext)
output:
4:3:1:2:5:6:7:8
4:3:1:2:5
4:3:1:2
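The note about empty fields can be checked the same way. This sketch keeps the pattern from the answer but changes each + to *, so an empty field (such as the second one in "1::3:4:5") still counts as a column. The sketch assumes Python 3.5 or later. From that version on, a group that did not take part in the match, such as (:)? on a four-field line, is substituted as an empty string. Older versions raise an error for \5 in that case. The helper name swap_first_four is hypothetical.

```python
import re

# Pattern from the answer with each + changed to *, so empty fields are allowed.
PATTERN = r"([^:]*):([^:]*):([^:]*):([^:]*)(:)?(.*)?"

def swap_first_four(text):
    return re.sub(PATTERN, r"\4:\3:\1:\2\5\6", text)

print(swap_first_four("1::3:4:5"))  # 4:3:1::5
print(swap_first_four("::3:4"))     # 4:3::
```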


How can I separate a string by underscore (_) in Google Sheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
Help!
Thanks in advance (and sorry for my English)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
This replaces every _ with parentheses to create capture groups, excludes the digit string at the end, and then wraps the whole string in parentheses.
We then use REGEXEXTRACT to actually pull the pieces out. The groups automatically push them into their own cells/columns.
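To see what the generated pattern looks like, the same trick can be reproduced with Python's re module. This is an analogue, not Google Sheets itself. Python also reports the outer group and two empty groups that come from "()123421()". Whether REGEXEXTRACT shows those as extra cells has not been checked here. The trick also assumes that the fields contain no regex metacharacters, because the text is pasted into the pattern unescaped.

```python
import re

s = "campaign1_attribute1_whatever_yes_123421"

# Mirror the inner REGEXREPLACE: every "_" becomes ")_(" and the trailing
# digits become ")123421(".
inner = re.sub(r"((_)|(\d+$))", r")\1(", s)
pattern = "((" + inner + "))"
print(pattern)  # ((campaign1)_(attribute1)_(whatever)_(yes)_()123421())

# Mirror REGEXEXTRACT: match the original text against the generated pattern.
groups = re.match(pattern, s).groups()
print(groups[1:5])  # ('campaign1', 'attribute1', 'whatever', 'yes')
```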
To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 we use SPLIT(text, delimiter, [split_by_each]). The text is first cleaned with REGEXREPLACE(A1,"_+\d*$","") to remove _123421. SPLIT then gives you one column for each word delimited by "_".
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = =REGEXREPLACE(A1,"_+\d*$","")   //This gives you: campaign1_attribute1_whatever_yes
A3 = =SPLIT(A2, "_", TRUE)   //This gives you: campaign1 attribute1 whatever yes, each in a separate column.
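The same two steps can be written in Python as a rough analogue of the spreadsheet formulas. REGEXREPLACE corresponds to re.sub and SPLIT to str.split. This is an illustration outside Sheets, not a Sheets feature.

```python
import re

a1 = "campaign1_attribute1_whatever_yes_123421"
a2 = re.sub(r"_+\d*$", "", a1)  # like A2: strip the trailing "_123421"
a3 = a2.split("_")              # like A3: one item per "_"-separated word
print(a3)  # ['campaign1', 'attribute1', 'whatever', 'yes']
```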
I finally figured it out yesterday on Stack Overflow in Spanish: https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked for a regex-only solution in Google Sheets is that I need to use it in Google Data Studio, which has the same regex functions as Sheets.
To get each column, just use this REGEXP_EXTRACT function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing you need to change in the formula to switch columns is the number inside {}, which is the column number minus 1.
If you do not have the final number, just leave out the last "_".
Lastly, remember to recreate all calculated fields. Otherwise you get errors with CPC, CTR and other AdWords metrics that are calculated automatically.
Hope it helps!
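Here is a Python sketch of the same extraction, useful for testing the pattern outside Data Studio. nth_field is a hypothetical helper. The sketch assumes that Python's re engine agrees with Data Studio's regex engine for this simple pattern.

```python
import re

def nth_field(text, column):
    """Return the 1-based column, like
    REGEXP_EXTRACT(field, '^(?:[^_]*_){N}([^_]*)_') with N = column - 1.
    Returns None when the pattern does not match."""
    pattern = r"^(?:[^_]*_){%d}([^_]*)_" % (column - 1)
    m = re.search(pattern, text)
    return m.group(1) if m else None

s = "campaign1_attribute1_whatever_yes_123421"
print([nth_field(s, n) for n in range(1, 6)])
# ['campaign1', 'attribute1', 'whatever', 'yes', None]
```

The fifth column comes back as None. The pattern requires a "_" after the captured field, and the final number has none. That is why the trailing "_" has to be dropped when no final number follows.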

Have Tabularize ignore some lines and align the others

I want Tabularize to ignore lines that do not contain a particular character and then align the remaining lines:
text1_temp = text_temp;
temporary_line;
text2 = text_temp;
In the end I would like the following:
text1_temp = text_temp;
temporary_line;
text2      = text_temp;
// The 2nd "=" is spaced/tabbed with relation to the first "="
If I run :Tabularize /= on all three lines together, I get:
text1_temp      = text_temp;
temporary_line;
text2           = text_temp;
Here the two lines with = are aligned relative to the length of the middle line.
Any suggestions?
PS: I edited the post to explain the need better.
I am not sure how to do this with Tabular directly. You might be able to use Christian Brabandt's NrrwRgn plugin. Use :NRP to mark only the lines with =, then run :NRM. This gives you a new buffer containing only the lines with =, so you can run :Tabularize /=/ and then save the buffer (:w, :x, etc.).
:g/=/NRP
:NRM
:Tabularize /=/
:x
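Outside Vim, the filter-then-align idea can be sketched in Python. This hypothetical align_assignments function is not part of Tabular, NrrwRgn or EasyAlign. It computes the column width from the lines that contain = only and passes every other line through unchanged:

```python
def align_assignments(lines, sep="="):
    """Align sep on the lines that contain it; leave all other lines untouched."""
    parts = {}
    for i, line in enumerate(lines):
        if sep in line:
            left, right = line.split(sep, 1)  # split at the first sep only
            parts[i] = (left.rstrip(), right.strip())
    # Width comes from the matching lines only, so other lines cannot widen it.
    width = max((len(left) for left, _ in parts.values()), default=0)
    aligned = []
    for i, line in enumerate(lines):
        if i in parts:
            left, right = parts[i]
            aligned.append(f"{left.ljust(width)} {sep} {right}")
        else:
            aligned.append(line)
    return aligned

for line in align_assignments(["text1_temp = text_temp;", "temporary_line;", "text2 = text_temp;"]):
    print(line)
```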
The easiest option is probably vim-easy-align, which seems to support this behavior out of the box. Example of using EasyAlign (assuming ga is mapped to EasyAlign):
gaip=
What about a simple replace, like :g/=/s/\t/ /g ?
If that doesn't work, you can try this too: :g/=/s/ \+= \+/ = /g
Explanation:
The :g/=/ part finds every line that contains '=' and runs the substitution only on those lines.
The s/\t/ /g part replaces tabs with spaces. Combined, these two things do what you need.
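The way :g restricts the substitution to matching lines can be mimicked in Python. This is a rough sketch, and g_substitute is a hypothetical name. Vim's \+ becomes + in Python's regex syntax. The sketch uses the second command from the answer:

```python
import re

def g_substitute(lines, line_pattern, pattern, repl):
    """Rough analogue of Vim's :g/line_pattern/s/pattern/repl/g:
    substitute only on lines that match line_pattern."""
    return [re.sub(pattern, repl, line) if re.search(line_pattern, line) else line
            for line in lines]

lines = ["text1_temp   =   text_temp;", "temporary_line;", "text2 =    text_temp;"]
print(g_substitute(lines, "=", r" += +", " = "))
```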

R: replacing special character in multiple columns of a data frame

I am trying to replace the German special character "ö" in a data frame with "oe". The character occurs in multiple columns, so I would like to do this in one go without having to specify individual columns.
Here is a small example of the data frame
data <- data.frame(a=c("aö","ab","ac"),b=c("bö","bb","ab"),c=c("öc","öb","acö"))
I tried :
data[data=="ö"]<-"oe"
but this did not work since I would need to work with regular expressions here. However when I try :
data[grepl("ö",data)]<-"oe"
I do not get what I want.
The dataframe at the end should look like:
> data
    a   b    c
1 aoe boe  oec
2  ab  bb  oeb
3  ac  ab acoe
The file is a CSV that I import with read.csv. However, there seems to be no option in the import statement to fix this.
How do I get the desired outcome?
Here's one way to do it:
data <- apply(data,2,function(x) gsub("ö",'oe',x))
Explanation:
Your grepl doesn't work because grepl only returns logical values (TRUE/FALSE) indicating where the regex matches. The assignment then replaces not just the character you want replaced but the entire string. To replace part of a string, you need sub (to replace only the first occurrence in each string) or gsub (to replace all occurrences). To apply that to every column, loop over the columns with apply.
If you want to return a data frame, you can use:
data.frame(lapply(data, gsub, pattern = "ö", replacement = "oe"))
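The difference between the two attempts can be sketched in Python rather than R. Here the data frame is modelled as a plain dict of column lists, an assumption made only for illustration. A match flag followed by assignment overwrites the whole cell. A per-element substitution changes only the matched substring.

```python
import re

# Hypothetical stand-in for the R data frame: column name -> list of strings.
data = {"a": ["aö", "ab", "ac"], "b": ["bö", "bb", "ab"], "c": ["öc", "öb", "acö"]}

# Match flag per element, then assignment: the WHOLE cell becomes "oe".
masked = {col: ["oe" if re.search("ö", v) else v for v in vals] for col, vals in data.items()}

# gsub-style substitution on every column: only the matched substring changes.
fixed = {col: [re.sub("ö", "oe", v) for v in vals] for col, vals in data.items()}

print(masked["a"])  # ['oe', 'ab', 'ac']
print(fixed["c"])   # ['oec', 'oeb', 'acoe']
```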

How to split CSV line according to specific pattern

In a .csv file I have lines like the following :
10,"nikhil,khandare","sachin","rahul",viru
I want to split line using comma (,). However I don't want to split words between double quotes (" "). If I split using comma I will get array with the following items:
10
nikhil
khandare
sachin
rahul
viru
But I don't want the items between double-quotes to be split by comma. My desired result is:
10
nikhil,khandare
sachin
rahul
viru
Please help me to sort this out.
The character used for separating fields should not be present in the fields themselves. If possible, replace , with ; for separating fields in the csv file, it'll make your life easier. But if you're stuck with using , as separator, you can split each line using this regular expression:
/((?:[^,"]|"[^"]*")+)/
For example, in Python:
import re
s = '10,"nikhil,khandare","sachin","rahul",viru'
re.split(r'((?:[^,"]|"[^"]*")+)', s)[1::2]
=> ['10', '"nikhil,khandare"', '"sachin"', '"rahul"', 'viru']
Now to get the exact result shown in the question, we only need to remove those extra " characters:
[e.strip('" ') for e in re.split(r'((?:[^,"]|"[^"]*")+)', s)[1::2]]
=> ['10', 'nikhil,khandare', 'sachin', 'rahul', 'viru']
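If Python is available, its standard csv module already implements this quoting rule, so a hand-written regex may not be needed. Here is a sketch using csv.reader. This is a swapped-in library approach, not part of the answer above.

```python
import csv

s = '10,"nikhil,khandare","sachin","rahul",viru'
row = next(csv.reader([s]))  # csv.reader accepts any iterable of lines
print(row)  # ['10', 'nikhil,khandare', 'sachin', 'rahul', 'viru']
```

Unlike the regex, csv.reader also keeps empty fields (for example a,,b) and handles doubled quotes inside quoted fields.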
If your data always has such a simple structure, you can split on "," (yes, including the quotes) after discarding the first number and comma.
If not, you can use a very simple state machine that parses your input from left to right. It has two states: inside quotes and outside quotes. Regular expressions are also a good (and simpler) way if you already know them, since they are basically equivalent to a state machine, just in another form.
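Here is a minimal sketch of that two-state machine in Python. The name split_csv_line is hypothetical. The function toggles an in_quotes flag on each double quote and treats a comma as a separator only when it is outside quotes. It does not handle escaped quotes ("") inside a quoted field.

```python
def split_csv_line(line, sep=",", quote='"'):
    """Split one line with a two-state machine: inside or outside quotes."""
    fields = []
    current = []
    in_quotes = False  # the state
    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes  # switch state; the quote itself is dropped
        elif ch == sep and not in_quotes:
            fields.append("".join(current))  # separator outside quotes ends a field
            current = []
        else:
            current.append(ch)  # ordinary character, or a comma inside quotes
    fields.append("".join(current))  # the last field has no trailing separator
    return fields

print(split_csv_line('10,"nikhil,khandare","sachin","rahul",viru'))
```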

Search and replace in a range of line and column

I want to apply a regular-expression search and replace that works only within a given range of lines and columns in a text file like this:
AAABBBFFFFBBBAAABBB
AAABBBFFFFBBBAAABBB
GGGBBBFFFFBHHAAABBB
For example, I want to replace BBB with YYY in lines 1 to 2 and columns 4 to 6, giving this output:
AAAYYYFFFFBBBAAABBB
AAAYYYFFFFBBBAAABBB
GGGBBBFFFFBHHAAABBB
Is there a way to do it with Vim ?
:1,2 s/\%4cBBB/YYY/
\%4c means the match must start in column 4, where the BBB begins (see :help /\%c or, more generally, :help pattern).
If it is always the first occurrence on each line that you want to replace, simply leave out the g flag:
:1,2s/BBB/YYY/
would work fine.
Alternatively, if you need to specify exactly which column you want replaced, you can use the \%Nv syntax. Here N is the virtual column, meaning the column as it is displayed, so a tab counts as several columns. Use c instead of v for actual (byte) columns.
Replacing the second set of B's on lines 1 and 2 could be done with:
:1,2s/\%11vBBB/YYY/
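The line-and-column restriction can also be sketched in Python. The helper replace_in_block is hypothetical, not a Vim feature. Lines and columns are 1-based and inclusive as in Vim, but columns here are character positions, which match Vim's columns only when there are no tabs or multibyte characters. A match is replaced only if it lies entirely inside the given column range, so only the BBB in columns 4 to 6 of lines 1 and 2 changes:

```python
import re

def replace_in_block(lines, pattern, repl, first_line, last_line, first_col, last_col):
    """Replace matches of pattern that lie entirely inside lines
    first_line..last_line and columns first_col..last_col (1-based, inclusive)."""
    result = []
    for lineno, line in enumerate(lines, start=1):
        if first_line <= lineno <= last_line:
            def maybe_replace(m):
                start_col = m.start() + 1  # 0-based offset -> 1-based column
                end_col = m.end()          # exclusive 0-based end == last 1-based column
                if first_col <= start_col and end_col <= last_col:
                    return m.expand(repl)
                return m.group(0)          # outside the column range: keep as-is
            line = re.sub(pattern, maybe_replace, line)
        result.append(line)
    return result

text = ["AAABBBFFFFBBBAAABBB", "AAABBBFFFFBBBAAABBB", "GGGBBBFFFFBHHAAABBB"]
for line in replace_in_block(text, "BBB", "YYY", 1, 2, 4, 6):
    print(line)
```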