J turns carriage return into newline - console-application

I'm trying to implement a progress bar for a command line application, e.g.
[##### ] 50% complete
I know I can just backspace to the start of the line and overwrite, but that seems so gross. I'd rather use the carriage return to put the cursor at the first column and then overwrite.
The problem is that the J engine appears to not render the carriage return character, instead rendering a newline+carriage return.
Here is what I have tried:
echo 'hi',(10{a.),'world' (where 10{a. is ASCII 10, i.e. line feed) which prints
hi
world
echo 'hi',(13{a.),'world' (where 13{a. is ASCII 13, i.e. carriage return) which prints
hi
world
hi
world
shell 'printf "%s\r%s" hi world' which prints
hi
world
shell 'printf "%s\n%s" hi world' which prints
hi
world
Finally, I tried all of the above in JHS instead of Jconsole, with identical results.
From this, three things are apparent:
The J front ends turn the carriage return into a carriage return + newline.
The J front end also processes carriage returns generated externally (for example by printf) into newlines.
J does recognize a newline by itself as shown in the last example.
Any help?

Ugly but works:
0$ stdout shell 'printf "99 problems\rno"'
no problems
UPDATE - 50% less ugly!
Nicer to avoid calling printf from the shell:
0$stdout 'hi world',(13{a.),'12'
12 world
UPDATE - 75% less ugly!
Thanks to a comment from @Eelvex
0$stdout 'hi world',CR,'12'
12 world
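For comparison, here is the same carriage-return technique as a minimal Python sketch. It is not J, and the names render_bar and show_progress are made up for illustration. The point is the same as in the stdout examples above: write the raw CR byte to standard output, then the new text, and flush so each frame appears immediately.

```python
import sys
import time


def render_bar(percent: int, width: int = 10) -> str:
    """Return a bar such as '[#####     ] 50% complete'."""
    filled = width * percent // 100
    return "[" + "#" * filled + " " * (width - filled) + f"] {percent}% complete"


def show_progress(steps: int = 4, delay: float = 0.0, out=sys.stdout) -> None:
    """Redraw the bar in place by prefixing every frame with a carriage return."""
    for step in range(steps + 1):
        out.write("\r" + render_bar(100 * step // steps))
        out.flush()  # without a flush the frame may sit in the output buffer
        time.sleep(delay)
    out.write("\n")  # move off the bar line when finished
```

Calling show_progress(delay=0.2) in a terminal should redraw a single line. If a later frame is shorter than an earlier one, pad it with spaces, otherwise leftovers remain (as in "12 world" above).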

Highlight line for specific commit in git log graph

I am trying to highlight the whole line for a specific commit in my git log graph. I already have a git log alias that formats the output of my logs, and I have tried to use it to highlight the line containing a specific commit ID.
Alias in ~/.gitconfig
# Base command for log formatting
lg-base = "log --graph --decorate=short --decorate-refs-exclude='refs/tags/*' --color=always"
# Version 1 log format
lg1 = !"git lg-base --format=format:'%C(#f0890c)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(dim white)- %an%C(reset)%C(#d10000)%d%C(reset)'"
As a test, I search for 6 months instead, since it should behave the same way and might showcase my issue a bit better.
git lg1 | grep --color=always -E '(6 months).*|$'
This matches the correct lines, but it doesn't highlight the whole line to the right of the match. When I try to highlight the left part of the line as well, it doesn't work as expected, probably because of my limited regex skills.
git lg1 | grep --color=always -E '.*(6 months).*|$'
Instead it marks the * in the beginning.
If you have a totally different approach, that is fine with me as long as I can keep using my formatted git log alias.
Thomas' comment is the key to the issue here: although grep is adding its own color (or colour) changing escape sequences to highlight the line, Git has already put in color changing directives. Each such directive, for one of the named colors, looks like this:
ESC[numberm
where the number part is 30 through 37 for a foreground color and 40 through 47 for a background color (plus some extra codes for bold or dim, which I won't include here). (%C(reset) sends ESC [ m and your orange selector uses a 24-bit color directive, which is less widely supported than the eight base colors, which go back to the 1990s). Hence the original output reads:
* <sp> <orange> <hash-ID> <reset> <sp> - <sp> <bold green> (n months ago) <reset> ...
The grep adds red, which is ESC [ 31 m, and a reset, around the matched expression—but the existing escapes within the expression remain.
The easiest way by far to avoid all this is to stop using color escape sequences at all, so that grep's added ones stick out like a sore red thumb. Of course that defeats your goal, which is to keep the color-changing escapes in lines that aren't highlighted. But you haven't explained what you'd like done with the color-changing escapes in lines, or parts of lines, that are highlighted. Answering that will determine what to do next.
There are any number of ways you could handle this. For instance, instead of %C(color)%<directive>%C(reset) you could use %x1b(name-of-color)%<directive>%x1b(reset) to insert the literal sequences ESC ( name of color or reset ), or assume that the terminal in question will use ANSI style escapes that end with the lowercase m character, and try to write something up in sed or awk (I'd use awk for something this complex, just because it's less like writing line noise) that does the match—awk supports regex matching—and if found, strips out the color sequences from the matched part and adds its own. Post-process this with something that inserts the appropriate terminal-dependent color-change sequences, or keep the original ESC [ ... m sequences on the assumption that you're in a window that uses that form, and you'll have the output you want (which you can now pipe through less -R if desired).
A skeleton awk program that does what you want is:
/<desired regex>/ { handle matched line; next; }
{ print }
The hard part is the "handle matched line". awk's match() function sets RSTART and RLENGTH, which help out a lot; see, e.g., this answer. The substring of the line from the beginning to RSTART-1 wasn't matched (this may be empty), and the substring from RSTART+RLENGTH to the end of the line (which may also be empty) was not matched either; the substring of $0 at RSTART for length RLENGTH was matched, and here's where you would strip out any color-changing sequences, if you want your basic red (or whatever) applied throughout.
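To make the split into "before", "matched" and "after" concrete, here is a small Python sketch of the same idea. It is only an illustration, not the awk program itself: the names SGR and highlight_match are hypothetical, and it assumes the terminal uses ANSI-style ESC [ ... m sequences.

```python
import re

# Matches ANSI color sequences such as ESC[31m, ESC[m or ESC[38;2;240;137;12m.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def highlight_match(line: str, pattern: str, color: str = "31") -> str:
    """Recolor the matched part of `line`; the unmatched parts stay as they are."""
    m = re.search(pattern, line)
    if m is None:
        return line  # not a matching line: print it unchanged
    before = line[:m.start()]       # like substr($0, 1, RSTART - 1)
    matched = m.group(0)            # like substr($0, RSTART, RLENGTH)
    after = line[m.end():]          # like substr($0, RSTART + RLENGTH)
    matched = SGR.sub("", matched)  # drop Git's escapes inside the match
    return f"{before}\x1b[{color}m{matched}\x1b[m{after}"
```

Escapes that Git emitted before the match are kept, so the added color simply overrides whatever color was active when the match starts.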
Sample script (by Robin Hellmers)
Create a script and place it wherever you please, e.g.
~/.local/bin/highlight-commit.awk
with the contents
#!/usr/bin/nawk -f
BEGIN {
    n = split(commits, arrayCommits, " ");
    background = "145;0;0"
    foreground = "255;255;255"
}
{
    # Compare with every given input, e.g. commit id
    for (i = 1; i <= n; i++) {
        if (match($0, arrayCommits[i])) {
            # Remove any ANSI color escape sequence for matching row
            gsub("\x1b\\[[0-9;]*m", "", $0)
            # Create ANSI color escape sequence for whole row
            $0 = sprintf("\x1b[48;2;%sm\x1b[38;2;%sm%s\x1b[0m\x1b[0m",
                         background,
                         foreground,
                         $0);
            break;
        }
    }
    printf("%s\n", $0);
}
In ~/.gitconfig, add the following alias:
[alias]
highlight-commit = "!f() { git lg | awk -v commits=\"$*\" -f ~/.local/bin/highlight-commit.awk | less -XR; }; f"
Call it with, e.g., two commits:
git highlight-commit 82451f8 310fca4

Regex Python concatenate lines if some text is found the line below

import re
output = open("teste-out.txt","w")
input = open("teste.txt")
for line in input:
    output.write(re.sub(r"\n\r03110", r"|03110", line))
input.close()
output.close()
Why isn't this code working? Can anyone help me fix it? I want to read from a text file and, if a line starts with 03110, merge only that line with the previous line, adding a | before the merged part.
I've tried \n03110, \r03110 and other options, but none of them works. In Notepad++ I can do this by searching for \R++03110 and replacing with |03110 using regular expressions, but I want a Python solution to automate the job.
Input
01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00
03110|||6073,00|0|0|0,00|8
99999|23
Output
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00|03110|||6073,00|0|0|0,00|8
99999|23
I'm using Python on Windows.
2nd EDIT: sorry - I guess I didn't read carefully enough...
Well, to merge lines with regards to the beginning of the second line is also possible, but perhaps not as beautifully clean:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    fout.write(next(fin)[:-1])
    for line in fin:
        if line.startswith('03110'):
            fout.write(f'|{line[:-1]}')
        else:
            fout.write(f'\n{line[:-1]}')
    fout.write('\n')
EDIT: solution working with files:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    for line in fin:
        if line.startswith('03100'):
            fout.write(line[:-1] + '|' + next(fin))
        else:
            fout.write(line)
Just for the case of interest - this is no re job imho:
s_in = '''01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489'''
from io import StringIO
with StringIO(s_in) as fin:
    for line in fin:
        if line.startswith('03100'):
            print(line[:-1] + '|' + next(fin), end='')
        else:
            print(line, end='')
results in requested
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
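For completeness, a regex can do the job if it sees the whole text at once. The loop in the question hands re.sub one line at a time, and each line contains only its own trailing newline, so a pattern that starts with a line break and continues with 03110 can never match. The sketch below is one possible way to do it; the function name merge_03110 is made up, and it assumes the file is small enough to read into memory.

```python
import re


def merge_03110(text: str) -> str:
    """Replace the line break in front of every line starting with 03110 by '|'."""
    # \r?\n covers both Unix and Windows line endings; the lookahead leaves
    # the 03110 itself in place.
    return re.sub(r"\r?\n(?=03110)", "|", text)
```

With the file names from the question, usage could look like open('teste-out.txt', 'w').write(merge_03110(open('teste.txt').read())).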
For those who like sed, this is a very short solution for GNU sed (not that efficient, though, as it reads all lines before printing anything):
< input_file sed ':a;N;$!ba;s/\n03110/|03110/g'
The following sed script is a more efficient solution:
#!/usr/bin/sed -f
:h
N
s/\n03110/|03110/
t h
h
s/\n.*//
p
g
D
For the casual reader who really likes sed like I do, here's a short explanation:
the 4 lines from :h to t h are essentially a "do-while" loop in which we append a new line to the pattern space (N), and we keep doing so (t h is a "goto"), as long as the substitution command (s) is successful in changing the embedded newline \n to a |;
as soon as the s command is unsuccessful, we "save" the multiline pattern space by copying it into the hold space (h), safely delete the \n and whatever is after it (s/\n.*//), and finally print what remains (p), which is the lines that we've been successfully joining;
it's now time to get back the last line we appended which did not start by 03110: we get (g) the multiline back from the hold space, delete \n together with whatever precedes it and go to the top without printing (D).
we are back to the top of the script with a line which is not printed yet, just like we started.
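The loop described above can be sketched in Python as a generator, which also prints each record as soon as it is complete instead of reading everything first. This is an illustrative translation, not part of the sed answer; join_continuations and its parameters are hypothetical names.

```python
from typing import Iterable, Iterator


def join_continuations(lines: Iterable[str], prefix: str = "03110",
                       sep: str = "|") -> Iterator[str]:
    """Yield records, gluing each line that starts with `prefix` onto the previous one."""
    pending = None  # plays the role of sed's pattern space
    for raw in lines:
        line = raw.rstrip("\r\n")
        if pending is not None and line.startswith(prefix):
            pending += sep + line  # like a successful s/\n03110/|03110/
        else:
            if pending is not None:
                yield pending      # like p: emit the joined record
            pending = line         # like g then D: restart with the unjoined line
    if pending is not None:
        yield pending              # the final record
```

Since the function accepts any iterable of lines, an open file object can be passed directly.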

Vim: How to delete repetition in a line

I have a log file to analyze in which a few of the lines repeat part of themselves, though not the complete line, say
Alex is here and Alex is here and we went out
We bothWe both went out
I want to remove the first occurrence and get
Alex is here and we went out
We both went out
Please share a regex to do in Vim in Windows.
I don't recommend trying to use regex magic to solve this problem. Just write an external filter and use that.
Here's an external filter written in Python. You can use this to pre-process the log file, like so:
python prefix_chop.py logfile.txt > chopped.txt
But it also works by standard input:
cat logfile.txt | prefix_chop.py > chopped.txt
This means you can use it in vim with the ! command. Try these commands: goto line 1, then pipe from current line through the last line through the external program prefix_chop.py:
1G
!Gprefix_chop.py<Enter>
Or you can do it from ex mode:
:1,$!prefix_chop.py<Enter>
Here's the program:
#!/usr/bin/python
import sys
infile = sys.stdin if len(sys.argv) < 2 else open(sys.argv[1])
def repeated_prefix_chop(line):
    """
    Check line for a repeated prefix string. If one is found,
    return the line with that string removed, else return the
    line unchanged.
    """
    # Repeated string cannot be more than half of the line.
    # So, start looking at mid-point of the line.
    i = len(line) // 2 + 1
    while True:
        # Look for longest prefix that is found in the string after pos 0.
        # The prefix starts at pos 0 and always matches itself, of course.
        pos = line.rfind(line[:i])
        if pos > 0:
            return line[pos:]
        i -= 1
        # Stop testing before we hit a length-1 prefix, in case a line
        # happens to start with a word like "oops" or a number like "77".
        if i < 2:
            return line
for line in infile:
    sys.stdout.write(repeated_prefix_chop(line))
I put a #! comment on the first line, so this will work as a stand-alone program on Linux, Mac OS X, or on Windows if you are using Cygwin. If you are just using Windows without Cygwin, you might need to make a batch file to run this, or just type the whole command python prefix_chop.py. If you make a macro to run this you don't have to do the typing yourself.
EDIT: This program is pretty simple. Maybe it could be done in "vimscript" and run purely inside vim. But the external filter program can be used outside of vim... you can set things up so that the log file is run through the filter once per day every day, if you like.
Regex: \b(.*)\1\b
Replace with: \1 or $1
If you want to deal with more than two repetitions, you can try this:
\b(.+?\b)\1+\b
The \b inside the group avoids matching repeated individual characters within a word, like xxx.
NOTE: in Vim, use \< and \> instead of \b.
You could do it by matching as much as possible at the beginning of the line and then using a backreference to match the repeated bit.
For example, this command solves the problem you describe:
:%s/^\(.*\)\(\1.*\)/\2
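To see what the backreference does, here is the same pattern tried out with Python's re module. This is a sketch for experimenting, not something to run inside Vim, and the names REPEATED_PREFIX and drop_repeated_prefix are invented for the example. The first group greedily takes the longest prefix that is immediately followed by a copy of itself; the second group keeps that copy plus the rest of the line.

```python
import re

# Same shape as the Vim pattern ^\(.*\)\(\1.*\): group 1 is the repeated
# prefix, group 2 is its second copy plus the remainder of the line.
REPEATED_PREFIX = re.compile(r"^(.*)(\1.*)$")


def drop_repeated_prefix(line: str) -> str:
    """Remove the first copy of a prefix that is repeated right after itself."""
    return REPEATED_PREFIX.sub(r"\2", line, count=1)
```

One caveat is that a single doubled character at the start of a line also counts as a repeated prefix, so a line starting with "oops" would lose its first letter.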

Regular Expression over multiple lines

I've been stuck on this for several hours now and have cycled through a wealth of different tools to get the job done, without success. It would be fantastic if someone could help me out with this.
Here is the problem:
I have a very large CSV file (400mb+) that is not formatted correctly. Right now it looks something like this:
This is a long abstract describing something. What follows is the tile for this sentence."
,Title1
This is another sentence that is running on one line. On the next line you can find the title.
,Title2
As you can probably see the titles ",Title1" and ",Title2" should actually be on the same line as the foregoing sentence. Then it would look something like this:
This is a long abstract describing something. What follows is the tile for this sentence.",Title1
This is another sentence that is running on one line. On the next line you can find the title.,Title2
Please note that the end of the sentence can contain quotes or not. In the end they should be replaced too.
Here is what I came up with so far:
sed -n '1h;1!H;${;g;s/\."?.*,//g;p;}' out.csv > out1.csv
This should actually get the job done of matching the expression over multiple lines. Unfortunately it doesn't :)
The expression is looking for the dot at the end of the sentence and the optional quotes plus a newline character that I'm trying to match with .*.
Help much appreciated. And it doesn't really matter what tool gets the job done (awk, perl, sed, tr, etc.).
Multiline in sed isn't necessarily tricky per se; it's just that it uses commands most people aren't familiar with, and those commands have certain side effects, such as delimiting the current line from the next line with a '\n' when you use 'N' to append the next line to the pattern space.
Anyway, it's much easier if you match on a line that starts with a comma to decide whether or not to remove the newline, so that's what I did here:
sed 'N;/\n,/s/"\? *\n//;P;D' title_csv
Input
$ cat title_csv
don't touch this line
don't touch this line either
This is a long abstract describing something. What follows is the tile for this sentence."
,Title1
seriously, don't touch this line
This is another sentence that is running on one line. On the next line you can find the title.
,Title2
also, don't touch this line
Output
$ sed 'N;/\n,/s/"\? *\n//;P;D' title_csv
don't touch this line
don't touch this line either
This is a long abstract describing something. What follows is the tile for this sentence.,Title1
seriously, don't touch this line
This is another sentence that is running on one line. On the next line you can find the title.,Title2
also, don't touch this line
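Since the question says any tool is fine, the same "look at the next line, join if it starts with a comma" logic can also be sketched in Python. It streams the input, so a 400 MB file does not need to fit in memory. The names TRAILER and attach_titles are hypothetical, and the sketch assumes, like the sed command, that an optional closing quote and trailing spaces should be dropped before the title.

```python
import re
from typing import Iterable, Iterator

# An optional closing quote plus trailing spaces at the end of a line.
TRAILER = re.compile(r'"? *$')


def attach_titles(lines: Iterable[str]) -> Iterator[str]:
    """Glue each line that starts with ',' onto the line before it."""
    previous = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if previous is not None and line.startswith(","):
            # Strip the quote and spaces, then append the title line.
            previous = TRAILER.sub("", previous, count=1) + line
        else:
            if previous is not None:
                yield previous
            previous = line
    if previous is not None:
        yield previous
```

A file object can be passed as lines, and each yielded string can be written out followed by a newline.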
Yours works with a couple of small changes:
sed -n '1h;1!H;${;g;s/\."\?\n,/.,/g;p;}' inputfile
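The ? needs to be escaped, and the newline has to be matched explicitly with \n: .* is greedy and would swallow everything up to the last comma in the buffer.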
Here's another way to do it which doesn't require using the hold space:
sed -n '${p;q};N;/\n,/{s/"\?\n//p;b};P;D' inputfile
Here is a commented version:
sed -n '
# for the last input line: print and quit
${
    p
    q
}
# otherwise, append the next line
N
# if the appended line starts with a comma
/\n,/{
    # delete an optional quote and the newline, then print the result
    s/"\?\n//p
    # branch to the end of the script to start the next cycle
    b
}
# the appended line does not start with a comma, so print the first line
P
# delete the first line of the pair (just printed) and loop to the top
D
' inputfile

What does this sed command do? Please explain its bits and pieces

Please explain this sed command?
sed -n "s/[^>]*>/ /gp"
What is gp?
It looks for non-greater-than characters preceding a greater-than symbol, and changes all of them to a single space. Thus, it will convert this input (where I've used _ to indicate a space):
foo>_bar>_b
x>>_a
to
___b
___a
As Mark notes, "g" means global, and "p" means "print the line".
g means global: i.e. replace all occurrences, not just the first.
p means to print the modified line. Otherwise due to the -n switch it would not be printed.
The command finds all lines containing at least one > and prints some spaces followed by the text after the final >. The number of spaces printed is the number of > in the line.
For example if this line is in the input file:
123>456>789
Then this is printed (with two leading spaces, one for each >):
  789
I was typing up a long explanation, but Brian beat me to it. To clarify a tiny bit, the "p" prints the modified / matching line. The "-n" in your command line tells sed to "not print the file". Combined with the "p", it works kind of like grep, but within the scope of the script (i.e., anything it changes/matches).
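The combined effect of -n, g and p can be mimicked in a few lines of Python, which may make the behaviour easier to experiment with. This is only a rough equivalent for illustration; the function name sed_n_s_gp is made up.

```python
import re
from typing import Iterable, Iterator


def sed_n_s_gp(lines: Iterable[str]) -> Iterator[str]:
    """Rough equivalent of: sed -n 's/[^>]*>/ /gp' (lines given without newlines)."""
    for line in lines:
        # subn replaces every match (the g flag) and reports how many it made.
        replaced, count = re.subn(r"[^>]*>", " ", line)
        if count:  # the p flag prints only lines where a substitution happened
            yield replaced
        # lines without any '>' are dropped, which is the effect of -n
```

Each > together with the text in front of it becomes one space, so the number of leading spaces equals the number of > characters in the line.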