Easily aligning characters after whitespace in vim - regex

I would like to create a mapped vim command that helps me align assignments for variables across multiple lines. Imagine I have the following text in a file:
foo = 1;
barbar = 2;
asdfasd = 3;
jjkjfh = 4;
baz = 5;
If I select multiple lines and use the regex below, noting that column 10 is in the whitespace for all lines, stray whitespace after column 10 will be deleted up to the equals sign.
:'<,'>s/^\(.\{10}\)\s*\(=.*\)$/\1\2/g
Here's the result:
foo       = 1;
barbar    = 2;
asdfasd   = 3;
jjkjfh    = 4;
baz       = 5;
Is there a way to get the current cursor position (specifically the column position) while performing a visual block selection and use that column in the regular expression?
Alternatively, if it is possible to find the max column for any of the equals signs on the selected lines and insert whitespace so all equals signs are aligned by column, that is preferred to solving the previous problem. Imagine quickly converting:
foo = 1;
barbar = 2;
asdfasd = 3;
jjkjfh = 4;
baz = 5;
to:
foo     = 1;
barbar  = 2;
asdfasd = 3;
jjkjfh  = 4;
baz     = 5;
with a block selection and a key-combo.

Without plugins
In this case
foo = 1
fizzbuzz = 2
bar = 3
You can add many spaces with a macro:
0f=10iSPACEESCj
where 10 is an arbitrary number just to add enough space.
Apply the macro M times (for M lines) and get
foo           = 1
fizzbuzz           = 2
bar           = 3
Then remove excessive spaces with a macro that removes all characters till some column N:
0f=d12|j
where 12 is the column number you want to align along and | is a vertical bar (SHIFT + \). Together 12| is a "go to column 12" command.
Repeat for each line and get
foo        = 1
fizzbuzz   = 2
bar        = 3
You can combine the two macros into one:
0f=10iSPACEESCd11|j
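For readers who want to check the logic outside of Vim, here is a minimal Python sketch of the same idea: move each "=" to a fixed column, or to the smallest column that fits the longest left-hand side when no column is given. The function name align_equals and its column parameter are made up for this illustration; this is not a Vim feature.

```python
def align_equals(lines, column=None):
    """Pad the text before the first '=' so every '=' lands in one column.

    `column` is 1-based, like Vim's `N|` motion. When it is omitted, the
    smallest column that leaves one space after the longest left-hand
    side is used. Lines without '=' are returned unchanged.
    """
    parts = [line.partition("=") for line in lines]
    lefts = [left.rstrip() for left, sep, _ in parts if sep]
    if column is None:
        column = max(len(left) for left in lefts) + 2 if lefts else 1
    aligned = []
    for line, (left, sep, right) in zip(lines, parts):
        if not sep:
            aligned.append(line)
            continue
        # ljust never truncates: if `column` is too small for this line,
        # the '=' simply follows the left-hand side directly.
        aligned.append(left.rstrip().ljust(column - 1) + sep + right)
    return aligned


if __name__ == "__main__":
    for row in align_equals(["foo = 1", "fizzbuzz = 2", "bar = 3"], column=12):
        print(row)
```

With column=12 this puts every "=" in column 12, which is what the two macros do; leaving column out gives the "align to the longest name" behaviour asked for in the question.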

Not completely satisfied with Tabular and Align, I've recently built another similar, but simpler plugin called vim-easy-align.
Check out the demo screencast: https://vimeo.com/63506219
For the first case, simply visual-select the lines and enter the command :EasyAlign= to do the trick.
If you have defined a mapping such as,
vnoremap <silent> <Enter> :EasyAlign<cr>
you can do the same with just two keystrokes: Enter and =
The case you mentioned in the comment,
final int foo = 3;
public boolean bar = false;
can be easily aligned using ":EasyAlign*\ " command, or with the aforementioned mapping, Enter, *, and space key, yielding
final  int     foo = 3;
public boolean bar = false;

There are two plugins for that: Either the older Align - Help folks to align text, eqns, declarations, tables, etc, or Tabular.

Related

How to optimize the python code written using regex and for loops?

I have two lists and I need to perform a string match between them. I have used three for loops and a regex pattern to solve it. The existing code (part 1) gives the expected output, but I need to optimize it (part 2) because it takes a long time when I apply it to lengthy data.
part1
import re

texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45','fooz','bazzar','foo baz']
terms = ['foo','baz','apple']
output_list = []
for term in terms:
    pattern_term = r'\b(?:{})\b'.format(term)
    try:
        for i in range(len(texts)):
            line_text = texts[i]
            for match in re.finditer(pattern_term, line_text):
                start_index = match.start()
                output_list.append([i, start_index, line_text[start_index:], term])
    except:
        pass
output:
Explanation of column names:
Index = index of the text in which the pattern matches
Start_index = start index of the match inside the text
Match_text = the text from the start of the match to the end of the string
Match_term = the term that matched
pd.DataFrame(output_list, columns = ['Index', 'Start_index', 'Match_text', 'Match_term'])
   Index  Start_index Match_text Match_term
0      0            0    foo abc        foo
1      6            0    foo baz        foo
2      3            0     baz 45        baz
3      6            4        baz        baz
I have tried the following code (part2), but its output is partial:
part 2
df = pd.DataFrame({'Match_text': texts})
pat = r'\b(?:{})\b'.format('|'.join(terms))
df[df['Match_text'].str.contains(pat)]
output
  Match_text
0    foo abc
3     baz 45
6    foo baz
Your code is already good since you need to find occurrences of whole words inside longer strings, and you create the regex pattern before the loop where the texts are processed with the regex.
The regex is already good; the only redundancy is the non-capturing group, which you may discard because you check term by term and there is no alternation inside the group. You might also compile the regex:
pattern_term = re.compile(r'\b{}\b'.format(term))
Then, you may get rid of temporary variables in the for loop:
for i in range(len(texts)):
    for match in pattern_term.finditer(texts[i]):
        output_list.append([i, match.start(), texts[i][match.start():], term])
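Put together, the suggestions above might look like the following sketch. The function name find_terms is made up for the example, and the re.escape call is an extra assumption on my part (it only matters if a term contains regex metacharacters); the rest follows the question's data and logic.

```python
import re


def find_terms(texts, terms):
    """Return [index, start, text_from_match, term] for every whole-word match."""
    rows = []
    for term in terms:
        # Compile once per term; re.escape is an added safeguard, not part
        # of the original code.
        pattern = re.compile(r'\b{}\b'.format(re.escape(term)))
        for i, text in enumerate(texts):
            for match in pattern.finditer(text):
                rows.append([i, match.start(), text[match.start():], term])
    return rows


if __name__ == "__main__":
    texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45', 'fooz', 'bazzar', 'foo baz']
    terms = ['foo', 'baz', 'apple']
    for row in find_terms(texts, terms):
        print(row)
```

The rows come out in the same order as in the question's table, so they can be passed to pd.DataFrame with the same column names.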

How to split 1 long paragraph to 2 shorter paragraphs? Google Document

I want paragraphs to be up to 3 sentences only.
For that, my strategy is to loop on all paragraphs and find the 3rd sentence ending (see note). And then, to add a "\r" char after it.
This is the code I have:
for (var i = 1; i < paragraphs.length; i++) {
  ...
  sentEnds = paragraphs[i].getText().match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
  //this array is used to count sentences in Hebrew/English/digits that end with 1 or more of either ".","?" or "!"
  ...
  if ((sentEnds != null) && (sentEnds.length > 3)) {
    lineBreakAnchor = paragraphs[i].getText().match(/.{10}[.?!](\s)/g);
    paragraphs[i].replaceText(lineBreakAnchor[2],lineBreakAnchor[2] + "\r");
  }
}
This works fine on the first run. But if I run the code again, the text after the inserted "\r" character is not recognized as a new paragraph. Hence, more "\r" characters (new lines) are inserted each time the script runs.
How can I make the script "understand" that "\r" means new, separate paragraph?
OR
Is there another character/approach that will do the trick?
Thank you.
Note: I use the last 10 characters of the sentence assuming the match will be unique enough to make only 1 replacement.
Without modifying your own regex expression you can achieve this.
Try this approach to split the paragraphs:
Grab the whole content of the document and create an array of sentences.
Insert paragraphs with up to 3 sentences after original paragraphs.
Remove original paragraphs from hell.
function sentenceMe() {
  var doc = DocumentApp.getActiveDocument();
  var paragraphs = doc.getBody().getParagraphs();
  var sentences = [];
  // Split paragraphs into sentences
  for (var i = 0; i < paragraphs.length; i++) {
    var parText = paragraphs[i].getText();
    //Count sentences in Hebrew/English/digits that end with 1 or more of either ".","?" or "!"
    var sentEnds = parText.match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
    if (sentEnds){
      for (var j=0; j< sentEnds.length; j++){
        var initIdx = 0;
        var sentence = parText.substring(initIdx,parText.indexOf(sentEnds[j])+3);
        var parInitIdx = initIdx;
        initIdx = parText.indexOf(sentEnds[j])+3;
        parText = parText.substring(initIdx - parInitIdx);
        sentences.push(sentence);
      }
    }
    // console.log(sentences);
  }
  inThrees(doc, paragraphs, sentences);
}
function inThrees(doc, paragraphs, sentences) {
  // define offset
  var offset = paragraphs.length;
  // Create paragraphs with up to 3 sentences
  var k=0;
  do {
    var parText = sentences.splice(0,3).join(' ');
    doc.getBody().insertParagraph(k + offset , parText.concat('\n'));
    k++;
  }
  while (sentences.length > 0);
  // Remove paragraphs from hell
  for (var i = 0; i < offset; i++){
    doc.getBody().removeChild(paragraphs[i]);
  }
}
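The grouping step in inThrees is independent of the Docs API, so it can be checked in isolation. Here is a small Python sketch of that step only; the name in_threes and the size parameter are made up for the illustration.

```python
def in_threes(sentences, size=3):
    """Join a flat list of sentences into paragraph strings of up to `size` sentences."""
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, len(sentences), size)
    ]


if __name__ == "__main__":
    print(in_threes(["One.", "Two.", "Three.", "Four."]))
```

A trailing group with fewer than three sentences becomes a shorter last paragraph, which matches the "up to 3 sentences" requirement.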
In case you are wondering about the custom menu, here it is:
function onOpen() {
var ui = DocumentApp.getUi();
ui.createMenu('Custom Menu')
.addItem("3's the magic number", 'sentenceMe')
.addToUi();
}
References:
DocumentApp.Body.insertParagraph
Actually the detection of sentences is not an easy task.
A sentence does not always end with a dot, a question mark or an exclamation mark. If the sentence ends with a quote then punctuation rules in some countries force you to put the end of the sentence mark inside the quote:
John asked: "Who's there?"
Not every dot means an end of a sentence, usually the dot after an uppercase letter does not end the sentence, because it occurs after an initial. The sentence does not end after J. here:
The latest Star Wars movie has been directed by J.J. Abrams.
However, sometimes the sentence does end after a capital letter followed by a dot:
This project has been sponsored by NASA.
And abbreviations can make it very hard:
For more information check the article in Phys. Rev. Letters 66, 2697, 2013.
Having in mind these difficulties let's still try to get some expression which will work in "usual" cases.
Make a global match and substitution. Match
((?:[^.?!]+[.?!] +){3})
and substitute it with
\1\r
Demo
This looks for 3 sentences (a sentence is a sequence of not-dot, not-?, not-! characters followed by a dot, a ? or a ! and some spaces) and puts a \r after them.
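To try the expression outside of Google Docs, here is a minimal Python sketch of the same match-and-substitute. The function name split_every_three is made up, and a replacement function is used instead of the \1\r template only to keep the escaping obvious. It inherits all the limitations listed above (initials, abbreviations, quotes).

```python
import re

# Three "sentences": non-terminator characters, one terminator, then spaces.
THREE_SENTENCES = re.compile(r'((?:[^.?!]+[.?!] +){3})')


def split_every_three(text):
    """Insert a carriage return after every run of three sentences."""
    return THREE_SENTENCES.sub(lambda m: m.group(1) + "\r", text)


if __name__ == "__main__":
    print(repr(split_every_three("One. Two. Three. Four. Five.")))
```

Note that the pattern requires at least one space after the third sentence, so a paragraph that ends exactly on its third sentence is left untouched.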
UPDATED 2020-03-04
Try this:
var regex = new RegExp('((?:[a-zA-Z0-9_\\u0590-\\u05fe\\s]+[.?!]+\\s+){3})', 'gi');
for (var i = 1; i < paragraphs.length; i++) {
  paragraphs[i].replaceText(regex, '$1\\r');
}

How extract (changeable variable) word & number using regular expression matlab

I have more than 10k text files that look like this. They all share the same format but differ in size; some are bigger and some are smaller.
[{u'language': u'english', u'area': 3825.8953168044045, u'class': u'machine printed', u'utf8_string': u'troia', u'image_id': 428035, u'box': [426.42422762784093, 225.33333055900806, 75.15151515151516, 50.909090909090864], u'legibility': u'legible', u'id': 1056659}, {u'language': u'na', u'area': 24201.285583103767, u'id': 1056660, u'image_id': 428035, u'box': [223.99998520359847, 249.57575480143228, 172.12121212121215, 140.6060606060606], u'legibility': u'illegible', u'class': u'machine printed'}]
I want to extract two variable fields from every text file using regular expressions.
The output should look like this:
box = [223.99998520359847, 249.57575480143228, 172.12121212121215, 140.6060606060606]
box1 = .. (sometimes there is more than one)
and, as a second output:
word = troia
word1 = ... (sometimes there is more than one word)
My code 1: for the word extraction
fid = fopen('text1.txt','r');
C = textscan(fid, '%s','Delimiter','');
fclose(fid);
C = C{:};
Lia = ~cellfun(@isempty, strfind(C,'utf8_string'));
output = [C{find(Lia)}];
expression = 'u''utf8_string'': u+'
matchStr = regexp(output, expression,'match');
Code 1 gives me only the string
utf8_string
My code 2: for the box number extraction
s = sprintf('text_.txt');
fid = fopen(s);
tline = fgetl(fid);
C = regexp(tline,'u''box'': +\[([0-9\. ,]+)\]','tokens');
C = cellfun(@(x) x{1},C,'UniformOutput',false)';
M = cell2mat(cellfun(@(x) x', cat(1,C2{:}),'UniformOutput',false));
Code 2 runs, but not on every text file; sometimes I get this error:
Error using cat
Dimensions of matrices being concatenated are not consistent.
If you do not insist on regexp: the input string looks like JSON, so the following short code does even more than you want:
% Read the whole file
s = fileread('test.txt');
% Remove the odd u'
s = strrep(s, 'u''', '''');
% Replace ' by "
s = strrep(s, '''', '"');
% See http://www.mathworks.com/matlabcentral/fileexchange/20565
t = parse_json(s);
Now t is a cell array containing structs with the data. So
word = t{1}.utf8_string;
box = cell2mat(t{1}.box);
will give you the first word and box. If you have a newer Matlab version you can probably use jsondecode instead of parse_json.
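If a pre-processing step outside of MATLAB is acceptable, note that the sample is also a valid Python literal (the u'...' prefixes are Python unicode string literals, which Python 3 still accepts). A minimal sketch with the standard-library ast.literal_eval follows; the function name extract_words_and_boxes is made up, and the sample string is shortened from the one in the question.

```python
import ast


def extract_words_and_boxes(raw):
    """Parse the list-of-dicts text and collect every word and every box."""
    records = ast.literal_eval(raw)
    # Not every record has a 'utf8_string' key (illegible entries lack it).
    words = [r["utf8_string"] for r in records if "utf8_string" in r]
    boxes = [r["box"] for r in records if "box" in r]
    return words, boxes


if __name__ == "__main__":
    sample = (
        "[{u'language': u'english', u'utf8_string': u'troia', "
        "u'box': [426.42422762784093, 225.33333055900806, "
        "75.15151515151516, 50.909090909090864]}, "
        "{u'language': u'na', "
        "u'box': [223.99998520359847, 249.57575480143228, "
        "172.12121212121215, 140.6060606060606]}]"
    )
    words, boxes = extract_words_and_boxes(sample)
    print(words)
    print(boxes)
```

Collecting the fields per record, rather than concatenating everything into one matrix, also sidesteps the problem of records that have a box but no word.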

Regex pattern in Word 2013

I have a Word document which contains 6 series of numbers (plain text, not a numbered style) as follows:
1) blah blah blah
2) again blah blah blah
.
.
.
20) something
And this pattern is repeated six times. How can I use regex to renumber all the numbers before the parentheses so that they start at 1 and end at 120?
You can use VBA - add this to the ThisDocument module:
Public Sub FixNumbers()
    Dim p As Paragraph
    Dim i As Long
    Dim digitCount As Long
    Dim realCount As Long
    realCount = 1
    Set p = Application.ActiveDocument.Paragraphs.First
    'Iterate through paragraphs with Paragraph.Next - using For Each doesn't work and I wouldn't trust indexing since we're making changes
    Do While Not p Is Nothing
        digitCount = 0
        For i = 1 To Len(p.Range.Text)
            'Keep track of how many characters are in the number
            If IsNumeric(Mid(p.Range.Text, i, 1)) Then
                digitCount = digitCount + 1
            Else
                'We check the first non-number character we find to see if it is the list delimiter ")" and we make sure that there were some digits before it
                If Mid(p.Range.Text, i, 1) = ")" And digitCount > 0 Then
                    'If so, we get rid of the original number and put the correct one
                    p.Range.Text = realCount & Right(p.Range.Text, Len(p.Range.Text) - digitCount) 'It's important to note that a side effect of assigning the text is that p is set to p.Next
                    'realCount holds the current "real" line number - every time we assign a line, we increment it
                    realCount = realCount + 1
                    Exit For
                Else
                    'If not, we skip the line assuming it's not part of the list numbering
                    Set p = p.Next
                    Exit For
                End If
            End If
        Next
    Loop
End Sub
You can run it by clicking anywhere inside of the code and clicking the "play" button in the VBA IDE.
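Since the question asked about regex: the renumbering idea can be sketched with a regular expression and a running counter when the text is available as a plain string. This is a Python illustration on plain text, not something that runs inside Word; the function name renumber is made up.

```python
import re
from itertools import count


def renumber(text):
    """Replace the digits before ')' at the start of each line with 1, 2, 3, ..."""
    counter = count(1)
    # ^\d+ followed by ')' marks a list line; other lines are left alone.
    return re.sub(
        r"^\d+(?=\))",
        lambda m: str(next(counter)),
        text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    print(renumber("1) blah\n2) again blah\n1) something\nnot a list line\n2) more"))
```

As in the VBA version, lines that do not start with digits followed by ")" are skipped and do not consume a number.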

vim - code folding by expression

I have some sourcecode with curly brackets code blocks
I want to be able to fold the blocks having some if condition in front, and leave the other code blocks unfolded.
example input:
print "this is a test"
if a == b {
    { x = 1
      y = 2
      z = 3
    }
    k = [1, 2, 3]
}
{ l = 5 }
return "foo"
expected output:
print "this is a test"
if a == b {
+-- 6 lines:
}
{ l = 5 }
return "foo"
I've read this and this, but still no idea how to face the problem.
Any suggestions ?
Assuming that the if closing '}' brace is at the beginning of a line, you can use:
:g/if.*{/+,/^}/-fold
This folds the statements within the {} braces of the if, excluding the braces themselves.
This is achieved through the + and - offsets placed after the patterns that define the range (there's a comma between the patterns): + moves the start of the range one line down from the first matched pattern (/if.*{/), and - moves the end of the range one line up from the second matched pattern (/^}/).
If you have indented closing '}' braces or for any circumstance where the above command does not apply, you can try to look for other patterns that you can exploit and change the ex command above as needed.
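To see which lines the command above would select before trying it on real code, here is a rough Python model of the range calculation. It is only a sketch: the function name fold_ranges is made up, the search does not wrap around the end of the file as Vim's can, and it assumes the closing brace of the if sits in column 1 while inner braces are indented.

```python
import re


def fold_ranges(lines):
    """Return 1-based (first, last) line ranges between 'if ... {' and the next '^}'."""
    ranges = []
    for i, line in enumerate(lines):
        if not re.search(r"if.*\{", line):
            continue
        # Look for the next line that starts with '}', beginning below the if.
        for j in range(i + 1, len(lines)):
            if lines[j].startswith("}"):
                first = i + 2  # '+': one line below the if (1-based)
                last = j       # '-': one line above the closing brace (1-based)
                if first <= last:
                    ranges.append((first, last))
                break
    return ranges


if __name__ == "__main__":
    sample = [
        'print "this is a test"',
        "if a == b {",
        "    { x = 1",
        "      y = 2",
        "      z = 3",
        "    }",
        "    k = [1, 2, 3]",
        "}",
        "{ l = 5 }",
        'return "foo"',
    ]
    print(fold_ranges(sample))
```

If the inner closing brace were not indented, the search would stop there instead, which is the situation the last paragraph above warns about.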