gdb: print char* as string characters

char* buf;
...
(gdb) x/s buf
0x7fffef8f5f80: "35=DC\001\064\071=ABCD\001"
(gdb) x/14cb buf
0x7fffef8f5f80: 51 '3' 53 '5' 61 '=' 68 'D' 67 'C' 1 '\001' 52 '4' 57 '9'
0x7fffef8f5f88: 61 '=' 65 'A' 66 'B' 67 'C' 68 'D' 1 '\001'
Question> How can I make gdb print buf as follows:
"35=DC\00149=ABCD\001"?
Thank you

There is no way to do this right now. You could file a gdb bug report if you like.
What is going on here is that gdb's string-printing code has a special case: it escapes a digit that immediately follows a character emitted as an escape sequence. That is why you see \064 instead of 4 and, because that 4 was itself escaped, \071 instead of 9.


How to get the cout buffer to flush on Ubuntu

I recently submitted an assignment that I had started in VS Code with the Ubuntu WSL terminal and the g++ compiler, but I had to switch to Visual Studio 2019 because, whenever I output several strings on the same line, they wrote over each other. The assignment had me read data from a file, place each element into an "ArrayList" (our self-made vector class) of "Car" objects, and output each car's elements to the user. We also had to search through the list of cars to find cars of certain models and print all cars of that model. Not only could I not cout all of these elements on the same line, I could not compare elements (strings) with each other. Why would this only happen on Ubuntu? Why can't I clear the cout buffer with std::cout.flush(); or std::cout << std::flush;? Why can't I compare the elements to each other?
I have tried to flush the stream in numerous ways (as I found in other posts), such as std::cerr and std::cout << std::flush;. The only thing that seems to work is std::endl; however, I need these to be placed on the same line. I also cannot compare two strings using the == operator (or any other), although I can compare two int elements just fine.
Here is my (shortened) cars.data (.data was a requirement for the assignment), which holds all the cars' elements to be stored in an "ArrayList":
1
Tesla
Model 3
Black
2
Chevrolet
Volt
Grey
3
Tesla
Model S
White
4
Nissan
Leaf
White
5
Toyota
Prius
Red
My implementation for storing each element into the "ArrayList":
ArrayList cars_list(15);
std::fstream cars;
cars.open("cars.data");
int tempID;
std::string tempIDstr;
std::string tempMake;
std::string tempModel;
std::string tempColor;
if (cars.is_open())
{
    for (int i = 0; !cars.eof(); ++i)
    {
        std::getline(cars, tempIDstr);
        tempID = std::stoi( tempIDstr );
        std::getline(cars, tempMake);
        std::getline(cars, tempModel);
        std::getline(cars, tempColor);
        Car tempCar(tempID, tempMake, tempModel, tempColor);
        std::cout.flush();
        std::cout << tempIDstr << " ";
        std::cout.flush();
        std::cout << tempMake << " ";
        std::cout.flush();
        std::cout << tempModel << " ";
        std::cout.flush();
        std::cout << tempColor << " " << std::endl;
        cars_list.push_back(tempCar);
    }
}
cars.close();
And a function that I have used to compare strings to search the list:
void searchByMake(ArrayList list)
{
    std::string make;
    std::cout << "Enter the make you would like to search: ";
    std::cin >> make;
    std::cin.clear();
    std::cin.ignore(10000,'\n');
    // Searching through the cars_list for the Make
    for (int i = 0; i < list.size(); ++i)
    {
        Car tempCar = list.get(i);
        if (make.compare(tempCar.getMake()) == 0)
        {
            std::cout << "ID:\t" << tempCar.getID() << "\n"
                      << "Make:\t" << tempCar.getMake() << "\n"
                      << "Model:\t" << tempCar.getModel() << "\n"
                      << "Color:\t" << tempCar.getColor() << "\n\n";
        }
    }
}
The results of the first segment of code are (I noticed the spaces before each output):
Black 3
Greyvrolet
White S
Whitean
Redusta
The expected output should look like:
1 Tesla Model 3 Black
2 Chevrolet Volt Grey
3 Tesla Model S White
4 Nissan Leaf White
5 Toyota Prius Red
And whenever I try to compare strings, the search prints only a blank line:
Enter the make you would like to search: Tesla
Expected output would be:
Enter the make you would like to search: Tesla
id: 1
Make: Tesla
Model: Model 3
Color: Black
id: 3
Make: Tesla
Model: Model S
Color: White
My teacher mentioned that the issue may be with Ubuntu itself not being able to clear the buffer even when prompted to, but I still can't find a solution. FYI, this is a past assignment that I can no longer get credit for; this question is strictly out of curiosity and a desire to keep using Ubuntu WSL as my development terminal.
Do not control your read loop with !cars.eof(); see: Why !.eof() inside a loop condition is always wrong. The crux of the issue is that after your last successful read, with the file-position indicator sitting immediately before end-of-file, eofbit has not yet been set on your stream. You then check !cars.eof() (which tests true) and you proceed.
You then call std::getline four times without ever checking the return. The very first std::getline(cars, tempIDstr) fails, setting eofbit and failbit, but you have no way of detecting that, so you continue: the remaining reads fail as well, and you go on to use whatever is left in tempIDstr, etc. as if it contained valid data (here tempIDstr is left empty, so std::stoi(tempIDstr) throws std::invalid_argument).
Instead, either check the return of each input function as you go, or use the return of your read functions as the condition of your read loop. For example, you could do something similar to:
std::ifstream f(argv[1]); /* open file for reading */
...
while (getline (f,tmp.IDstr) && getline (f,tmp.Make) &&
       getline (f,tmp.Model) && getline (f,tmp.Color))
    cars_list[n++] = tmp;
Here the loop body runs only if ALL of your calls to getline succeed, and only then is your data stored in cars_list.
Now on to your classic carriage-return problem. It is clear the file you are reading from has DOS line endings. For example, if you look at the actual contents of your input file, you will see:
Example Input File with DOS "\r\n" Line-Endings
Note the DOS line-endings denoted 0d 0a (decimal 13 10) "\r\n":
$ hexdump -Cv dat/cars_dos.txt
00000000 31 0d 0a 54 65 73 6c 61 0d 0a 4d 6f 64 65 6c 20 |1..Tesla..Model |
00000010 33 0d 0a 42 6c 61 63 6b 0d 0a 32 0d 0a 43 68 65 |3..Black..2..Che|
00000020 76 72 6f 6c 65 74 0d 0a 56 6f 6c 74 0d 0a 47 72 |vrolet..Volt..Gr|
00000030 65 79 0d 0a 33 0d 0a 54 65 73 6c 61 0d 0a 4d 6f |ey..3..Tesla..Mo|
00000040 64 65 6c 20 53 0d 0a 57 68 69 74 65 0d 0a 34 0d |del S..White..4.|
00000050 0a 4e 69 73 73 61 6e 0d 0a 4c 65 61 66 0d 0a 57 |.Nissan..Leaf..W|
00000060 68 69 74 65 0d 0a 35 0d 0a 54 6f 79 6f 74 61 0d |hite..5..Toyota.|
00000070 0a 50 72 69 75 73 0d 0a 52 65 64 0d 0a |.Prius..Red..|
0000007d
Your file has DOS "\r\n" line endings. How does getline() work? By default, getline() reads up to the first '\n' character, extracting the '\n' from the input stream but not storing it as part of the string. This leaves an embedded '\r' at the end of each string you store. Why does this matter? The '\r' (carriage return) does just what its name says: acting like an old typewriter, it resets the cursor to the beginning of the line. You write text until a '\r' is encountered, the cursor is positioned back at the beginning of the line, and whatever is written next overwrites what you just output there (which explains why your output looks overwritten: it is). The same stray '\r' is why your string comparisons fail: the stored make is "Tesla\r", which never compares equal to the "Tesla" you type.
Instead, your file should have Unix/POSIX line endings:
Example Input File with Unix/POSIX '\n' Line-Endings
Note the Unix/POSIX line-endings are denoted as 0a (decimal 10) '\n':
$ hexdump -Cv dat/cars.txt
00000000 31 0a 54 65 73 6c 61 0a 4d 6f 64 65 6c 20 33 0a |1.Tesla.Model 3.|
00000010 42 6c 61 63 6b 0a 32 0a 43 68 65 76 72 6f 6c 65 |Black.2.Chevrole|
00000020 74 0a 56 6f 6c 74 0a 47 72 65 79 0a 33 0a 54 65 |t.Volt.Grey.3.Te|
00000030 73 6c 61 0a 4d 6f 64 65 6c 20 53 0a 57 68 69 74 |sla.Model S.Whit|
00000040 65 0a 34 0a 4e 69 73 73 61 6e 0a 4c 65 61 66 0a |e.4.Nissan.Leaf.|
00000050 57 68 69 74 65 0a 35 0a 54 6f 79 6f 74 61 0a 50 |White.5.Toyota.P|
00000060 72 69 75 73 0a 52 65 64 0a |rius.Red.|
00000069
To see the effect, let's look at a short example that reads your input file, first with Unix/POSIX line endings and then with DOS line endings, to see the difference in action. The short example could be:
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
struct ArrayList {
    std::string IDstr, Make, Model, Color;
};
int main (int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "error: insufficient input.\n" <<
                     "usage: " << argv[0] << " filename.\n";
        return 1;
    }
    std::ifstream f(argv[1]);
    ArrayList cars_list[15], tmp;
    std::size_t n = 0;
    /* store a record only if there is room and all four reads succeed */
    while (n < 15 && getline (f,tmp.IDstr) && getline (f,tmp.Make) &&
           getline (f,tmp.Model) && getline (f,tmp.Color))
        cars_list[n++] = tmp;
    for (std::size_t i = 0; i < n; i++)
        std::cout << " " << cars_list[i].IDstr
                  << " " << cars_list[i].Make
                  << " " << cars_list[i].Model
                  << " " << cars_list[i].Color << '\n';
}
Now let's look at the output if your file has Unix/POSIX line endings:
Example Use/Output
$ ./bin/arraylist_cars dat/cars.txt
1 Tesla Model 3 Black
2 Chevrolet Volt Grey
3 Tesla Model S White
4 Nissan Leaf White
5 Toyota Prius Red
Now let's look at the output after reading the file with DOS line endings:
$ ./bin/arraylist_cars dat/cars_dos.txt
Black 3
Greyrolet
White S
Whiten
Redusa
That looks curiously similar to the output you report. Hint: in WSL you can use a tool called dos2unix, which converts line endings from DOS to Unix. Use it on your input file, e.g. dos2unix filename. Now re-run your program (after fixing your read loop) using the file as input, and your problem should disappear.
(If you don't have dos2unix installed, install it, e.g. sudo apt-get install dos2unix.)
Look things over and let me know if you have further questions.

What does this function ft_isalnum do?

I am reading a program that contains the following function:
int ft_isalnum(int c)
{
    return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9'));
}
I don't quite understand what this function is intended to do.
As suggested by its name, the function checks if the given character is alphanumeric.
Assuming an encoding such as ASCII, where 'A' to 'Z' and 'a' to 'z' are stored consecutively (the C standard guarantees this only for the digits '0' to '9'), it checks whether the character is in the 'A' to 'Z' range, the 'a' to 'z' range, or the '0' to '9' range, and returns 1 (true) if any of those conditions is satisfied, 0 otherwise.
Write a program to figure it out:
#include <stdio.h>
#include <ctype.h>
int ft_isalnum(int c)
{
    return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'));
}
int main(void)
{
    for (int i = 0; i < 128; putchar(++i % 8 ? ' ' : '\n'))
        printf("%3d '%c' %c ", i, isprint((char unsigned)i) ? i : '?', ft_isalnum(i) ? 'X' : ' ');
    putchar('\n');
}
Output
0 '?' 1 '?' 2 '?' 3 '?' 4 '?' 5 '?' 6 '?' 7 '?'
8 '?' 9 '?' 10 '?' 11 '?' 12 '?' 13 '?' 14 '?' 15 '?'
16 '?' 17 '?' 18 '?' 19 '?' 20 '?' 21 '?' 22 '?' 23 '?'
24 '?' 25 '?' 26 '?' 27 '?' 28 '?' 29 '?' 30 '?' 31 '?'
32 ' ' 33 '!' 34 '"' 35 '#' 36 '$' 37 '%' 38 '&' 39 '''
40 '(' 41 ')' 42 '*' 43 '+' 44 ',' 45 '-' 46 '.' 47 '/'
48 '0' X 49 '1' X 50 '2' X 51 '3' X 52 '4' X 53 '5' X 54 '6' X 55 '7' X
56 '8' X 57 '9' X 58 ':' 59 ';' 60 '<' 61 '=' 62 '>' 63 '?'
64 '@' 65 'A' X 66 'B' X 67 'C' X 68 'D' X 69 'E' X 70 'F' X 71 'G' X
72 'H' X 73 'I' X 74 'J' X 75 'K' X 76 'L' X 77 'M' X 78 'N' X 79 'O' X
80 'P' X 81 'Q' X 82 'R' X 83 'S' X 84 'T' X 85 'U' X 86 'V' X 87 'W' X
88 'X' X 89 'Y' X 90 'Z' X 91 '[' 92 '\' 93 ']' 94 '^' 95 '_'
96 '`' 97 'a' X 98 'b' X 99 'c' X 100 'd' X 101 'e' X 102 'f' X 103 'g' X
104 'h' X 105 'i' X 106 'j' X 107 'k' X 108 'l' X 109 'm' X 110 'n' X 111 'o' X
112 'p' X 113 'q' X 114 'r' X 115 's' X 116 't' X 117 'u' X 118 'v' X 119 'w' X
120 'x' X 121 'y' X 122 'z' X 123 '{' 124 '|' 125 '}' 126 '~' 127 '?'
The output indicates, on my machine, that characters 0 to 9 and letters A to Z and a to z return a 1 while everything else returns a 0.
Note
Not all characters are printable; the program shows those as '?'.
Thanks to @Swordfish for making the output more attractive and readable.

Removing `^` from `s/^/1/;` causes my code to fail. Why?

I've been working on this problem over at the Code Golf Stack Exchange, which is why my code looks so funny.
Here's a program with use strict and use warnings that recreates the problem:
use strict;
use warnings;
$_ = "";
for my $i (1..33){
    s//1/; # Just prepends 1 to the string $_
}
print "$_\n";
for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s/^/1/; # Prepends 1 to the start of the string.
}
Here is the output:
111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq
This works as I would expect. However, when I take ^ out of the second substitution, it no longer prepends 1, so the string stops growing.
use strict;
use warnings;
$_ = "";
for my $i (1..33){
    s//1/;
}
print "$_\n";
for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s//1/; # No Longer matches!
}
Why does this happen? s//1/ works in the first loop, so why does changing it in the second one break everything?
For an additional point of confusion, if you put the if block in braces, the regex matches again:
for my $i (34..127) {
    {
        if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
            print chr y/1/1/;
        }
    }
    s//1/; # This prepends 1 to the string $_ again.
}
edit:
I wanted to edit my original code back into the question for reference:
use strict;
use warnings;
$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s/^/1/; # this is the line we remove ^ from
}
When we remove ^ from the line, the output changes from:
test1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1test111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq
to
hanging with no output
So in this case, the line change in the second loop changes the behavior of the first one it seems.
s//1/; does not match an empty string. It reuses the last successfully matched regex. So the first loop, where no match has succeeded yet, really uses an empty pattern, while the second one reuses the character class from the if above once that has matched.
Quote:
If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored
Please see "The empty pattern //" in perlop.
To expand on VladimirM's answer:
print "regex have dynamic scope\n";
$_ = 1;
{
    m/1/;
    s//2/;
    print "$_ one becomes two, s//2/ is really s/1/2/\n";
}
$_=1;
{
    m/1/;
    {
        s//2/;
    }
    print "$_ one still becomes two, s//2/ is really s/1/2/\n";
}
$_=1;
{
    {
        m/1/;
    }
    s//2/;
    print "$_ one becomes twentyone, s//2/; is really s/(?:)//2;\n";
}
__END__
regex have dynamic scope
2 one becomes two, s//2/ is really s/1/2/
2 one still becomes two, s//2/ is really s/1/2/
21 one becomes twentyone, s//2/; is really s/(?:)//2;
Since the last successful pattern is dynamically scoped, using the empty pattern // really means reusing the previous successful pattern from the same dynamic scope, so don't do that :)
If you add use re 'debug'; you can see the regex engine reuse the previous pattern (focus on the Matching REx lines: NOTHING(2) is the empty pattern when there is no previous match, EXACT <1>(3) is the previous pattern).
regex have dynamic scope
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
0 <> <1> | 1:EXACT <1>(3)
1 <1> <> | 3:END(0)
Match successful!
2 one becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
0 <> <1> | 1:EXACT <1>(3)
1 <1> <> | 3:END(0)
Match successful!
2 one still becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "" against "1"
0 <> <1> | 1:NOTHING(2)
0 <> <1> | 2:END(0)
Match successful!
21 one becomes twentyone, s//2/; is really s/(?:)//2;
Update: the edited code hangs because it loops forever. The last successful pattern is the character class, which contains 1, so on a string of 1s the substitution is effectively s/1/1/. Your string never grows; it always stays 33 characters long, as this instrumented version shows:
$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
my $max = 126;
my $count = 0;
my $reps = 0;
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    $reps =
        #~ s/^/1/; # win
        s//1/; # fail
    $count++;
    last if $count > $max;
}
print "m $max c $count r $reps l @{[ length $_ ]}\n";
__END__
win #$%&04689@ABDOPQRabdegopqm 126 c 94 r 1 l 127
fail m 126 c 127 r 1 l 33
Unless you're obfuscating, append is $_ .= 1; and prepend is $_ = 1 . $_;
To expand a second time on VladimirM's answer that the empty pattern // is the problem, the following is from perldoc:
The empty pattern //
If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored; the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).
Basically, if another regex in the same dynamic scope has matched, the regex with the empty pattern actually uses the pattern of that previous regex.
In the example below, inspired by the OP, I grow the string using the ones digit of the loop counter instead. However, once the other regex matches chr(33), which is an exclamation point, the pattern used by the empty regex changes: from then on it matches the first of the digits 1, 2, 3, 5 or 7 in the string and replaces it with the ones digit of the counter. Therefore the string stays the same length from then on.
use strict;
use warnings;
$_ = "";
for my $i (1..127) {
    my $chr = chr(length);
    if( $chr =~ m'(?![#$%&])[[:punct:]12357CE-NS-Zcfh-nr-z]' ) {
        print "'$chr'";
    } else {
        print " ";
    }
    s//$i % 10/e;
    printf "% 4d %s\n", $i, $_;
}
The following output clearly demonstrates this:
1 1
2 21
3 321
4 4321
5 54321
6 654321
7 7654321
8 87654321
9 987654321
10 0987654321
11 10987654321
12 210987654321
13 3210987654321
14 43210987654321
15 543210987654321
16 6543210987654321
17 76543210987654321
18 876543210987654321
19 9876543210987654321
20 09876543210987654321
21 109876543210987654321
22 2109876543210987654321
23 32109876543210987654321
24 432109876543210987654321
25 5432109876543210987654321
26 65432109876543210987654321
27 765432109876543210987654321
28 8765432109876543210987654321
29 98765432109876543210987654321
30 098765432109876543210987654321
31 1098765432109876543210987654321
32 21098765432109876543210987654321
33 321098765432109876543210987654321
'!' 34 421098765432109876543210987654321
'!' 35 451098765432109876543210987654321
'!' 36 461098765432109876543210987654321
'!' 37 467098765432109876543210987654321
'!' 38 468098765432109876543210987654321
'!' 39 468098965432109876543210987654321
'!' 40 468098960432109876543210987654321
'!' 41 468098960412109876543210987654321
'!' 42 468098960422109876543210987654321
'!' 43 468098960432109876543210987654321
'!' 44 468098960442109876543210987654321
'!' 45 468098960445109876543210987654321
'!' 46 468098960446109876543210987654321
'!' 47 468098960446709876543210987654321
'!' 48 468098960446809876543210987654321
'!' 49 468098960446809896543210987654321
'!' 50 468098960446809896043210987654321
'!' 51 468098960446809896041210987654321
'!' 52 468098960446809896042210987654321
'!' 53 468098960446809896043210987654321
'!' 54 468098960446809896044210987654321
'!' 55 468098960446809896044510987654321
'!' 56 468098960446809896044610987654321
'!' 57 468098960446809896044670987654321
'!' 58 468098960446809896044680987654321
'!' 59 468098960446809896044680989654321
'!' 60 468098960446809896044680989604321
'!' 61 468098960446809896044680989604121
'!' 62 468098960446809896044680989604221
'!' 63 468098960446809896044680989604321
'!' 64 468098960446809896044680989604421
'!' 65 468098960446809896044680989604451
'!' 66 468098960446809896044680989604461
'!' 67 468098960446809896044680989604467
'!' 68 468098960446809896044680989604468
'!' 69 468098960446809896044680989604468
'!' 70 468098960446809896044680989604468
'!' 71 468098960446809896044680989604468
'!' 72 468098960446809896044680989604468
'!' 73 468098960446809896044680989604468
'!' 74 468098960446809896044680989604468
'!' 75 468098960446809896044680989604468
'!' 76 468098960446809896044680989604468
'!' 77 468098960446809896044680989604468
'!' 78 468098960446809896044680989604468
'!' 79 468098960446809896044680989604468
'!' 80 468098960446809896044680989604468
'!' 81 468098960446809896044680989604468
'!' 82 468098960446809896044680989604468
'!' 83 468098960446809896044680989604468
'!' 84 468098960446809896044680989604468
'!' 85 468098960446809896044680989604468
'!' 86 468098960446809896044680989604468
'!' 87 468098960446809896044680989604468
'!' 88 468098960446809896044680989604468
'!' 89 468098960446809896044680989604468
'!' 90 468098960446809896044680989604468
'!' 91 468098960446809896044680989604468
'!' 92 468098960446809896044680989604468
'!' 93 468098960446809896044680989604468
'!' 94 468098960446809896044680989604468
'!' 95 468098960446809896044680989604468
'!' 96 468098960446809896044680989604468
'!' 97 468098960446809896044680989604468
'!' 98 468098960446809896044680989604468
'!' 99 468098960446809896044680989604468
'!' 100 468098960446809896044680989604468
'!' 101 468098960446809896044680989604468
'!' 102 468098960446809896044680989604468
'!' 103 468098960446809896044680989604468
'!' 104 468098960446809896044680989604468
'!' 105 468098960446809896044680989604468
'!' 106 468098960446809896044680989604468
'!' 107 468098960446809896044680989604468
'!' 108 468098960446809896044680989604468
'!' 109 468098960446809896044680989604468
'!' 110 468098960446809896044680989604468
'!' 111 468098960446809896044680989604468
'!' 112 468098960446809896044680989604468
'!' 113 468098960446809896044680989604468
'!' 114 468098960446809896044680989604468
'!' 115 468098960446809896044680989604468
'!' 116 468098960446809896044680989604468
'!' 117 468098960446809896044680989604468
'!' 118 468098960446809896044680989604468
'!' 119 468098960446809896044680989604468
'!' 120 468098960446809896044680989604468
'!' 121 468098960446809896044680989604468
'!' 122 468098960446809896044680989604468
'!' 123 468098960446809896044680989604468
'!' 124 468098960446809896044680989604468
'!' 125 468098960446809896044680989604468
'!' 126 468098960446809896044680989604468
'!' 127 468098960446809896044680989604468

Consecutively regex-replace separated values

Reading a raster grid file into @grid containing arbitrary numbers, like
82 8 98 98 42 12 3342 321 34 34 09434 9232
(and many more of those rows).
Here, I'd like to replace some numbers, e.g. 34 with 42.
But only single, separated numbers! E.g. I do not want to replace the 34 in 3342.
So for numbers $a (search, e.g. 34) and $b (replace, e.g. 42), my approach is
s/(^|\s)$a(\s|$)/$1$b$2/g for @grid;
But this only replaces every second one of consecutive occurrences (like 34 34 34 34 => 42 34 42 34), because the \s consumed as the suffix of one match cannot also serve as the prefix of the next.
Is there any solution for this problem, other than putting two of those commands back-to-back (which is slow for large arrays)?
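You're looking for \b: the boundary between a word character (\w) and something that is not a word character. Because \b is zero-width, it does not consume the separating whitespace, so consecutive occurrences are all replaced.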
s/\b$a\b/$b/g
You can set up a hash that contains your replacement pairs, and then capture each number on a line and do the replacement if that number's a hash key:
use strict;
use warnings;
my %replacements = ( 34 => 42, 8 => 100 );
while (<DATA>) {
    s/(\d+)/exists $replacements{$1} ? $replacements{$1} : $1/ge;
    print;
}
__DATA__
82 8 98 98 42 12 3342 321 34 34 09434 9232
97 8 8 8 27 37 34 55 19 100 8 34 07932 8
Output:
82 100 98 98 42 12 3342 321 42 42 09434 9232
97 100 100 100 27 37 42 55 19 100 100 42 07932 100
Hope this helps!

What does the symbol \0 mean in a string-literal?

Consider following code:
char str[] = "Hello\0";
What is the length of str array, and with how much 0s it is ending?
sizeof str is 7 - five bytes for the "Hello" text, plus the explicit NUL terminator, plus the implicit NUL terminator.
strlen(str) is 5 - the five "Hello" bytes only.
The key here is that the implicit NUL terminator is always added, even if the string literal just happens to end with \0. Of course, strlen just stops at the first \0; it can't tell the difference.
There is one exception to the implicit NUL terminator rule - if you explicitly specify the array size, the string will be truncated to fit:
char str[6] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 6 (with one NUL)
char str[7] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 7 (with two NULs)
char str[8] = "Hello\0"; // strlen(str) = 5, sizeof(str) = 8 (with three NULs per C99 6.7.8.21)
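This is, however, rarely useful, and prone to miscalculating the string length and ending up with an unterminated string. It is also ill-formed in C++, where the array must have room for every character of the literal, including the implicit terminating null.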
The length of the array is 7, the NUL character \0 still counts as a character and the string is still terminated with an implicit \0
Note that had you declared str as char str[6]= "Hello\0"; the length would be 6 because the implicit NUL is only added if it can fit (which it can't in this example.)
§ 6.7.8/p14: An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
Examples
char str[] = "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[5]= "Hello\0"; /* sizeof == 5, str is "Hello" with no NUL (no longer a C-string, just an array of char). This may trigger compiler warning */
char str[6]= "Hello\0"; /* sizeof == 6, Explicit NUL only */
char str[7]= "Hello\0"; /* sizeof == 7, Explicit + Implicit NUL */
char str[8]= "Hello\0"; /* sizeof == 8, Explicit + two Implicit NUL */
I want to mention one situation that may confuse you.
What is the difference between "\0" and ""?
As arrays, "\0" is {0, 0} and "" is {0}.
That is because "\0" is still a string literal, so an implicit '\0' is added after the explicit one; "" is empty, but it also gets the implicit '\0'.
Understanding this will help you understand "\0" more deeply.
Banging my usual drum solo of JUST TRY IT, here's how you can answer questions like that in the future:
$ cat junk.c
#include <stdio.h>
char* string = "Hello\0";
int main(int argc, char** argv)
{
    printf("-->%s<--\n", string);
}
$ gcc -S junk.c
$ cat junk.s
... eliding the unnecessary parts ...
.LC0:
.string "Hello"
.string ""
...
.LC1:
.string "-->%s<--\n"
...
Note here how the string I used for printf is just "-->%s<--\n" while the global string is in two parts: "Hello" and "". The GNU assembler also terminates strings with an implicit NUL character, so the fact that the first string (.LC0) is in those two parts indicates that there are two NULs. The string is thus 7 bytes long. Generally, if you really want to know what your compiler is doing with a certain piece of code, isolate it in a dummy example like this and look at the assembly with -S (for GCC; MSVC's equivalent is /FA). You'll learn a lot about how your code works (or fails to work, as the case may be), and you'll quickly get an answer that is guaranteed to match the tools and environment you're working in.
What is the length of str array, and with how much 0s it is ending?
Let's find out:
#include <stdio.h>
int main() {
    char str[] = "Hello\0";
    int length = sizeof str / sizeof str[0];
    // "sizeof array" is the bytes for the whole array (must use a real array, not
    // a pointer), divide by "sizeof array[0]" (sometimes sizeof *array is used)
    // to get the number of items in the array
    printf("array length: %d\n", length);
    printf("last 3 bytes: %02x %02x %02x\n",
           (unsigned char)str[length - 3], (unsigned char)str[length - 2],
           (unsigned char)str[length - 1]);
    return 0;
}
char str[]= "Hello\0";
That would be 7 bytes.
In memory it'd be:
48 65 6C 6C 6F 00 00
H e l l o \0 \0
Edit:
What does the \0 symbol mean in a C string?
It's the "end" of a string: a null character. In memory it's actually a zero byte. Functions that handle char arrays as strings look for this character, because it marks the end of the message. I'll put an example at the end.
What is the length of str array? (Answered before the edit part)
7
and with how much 0s it is ending?
Your array has two elements that are zero: str[5] == str[6] == '\0' == 0.
Extra example:
Let's assume you have a function that prints the content of that text array.
You could define it as:
char str[40];
Now, you could change the content of that array (I won't get into details on how to), so that it contains the message: "This is just a printing test"
In memory, you should have something like:
54 68 69 73 20 69 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
So you print that char array. And then you want a new message. Let's say just "Hello"
48 65 6c 6c 6f 00 73 20 6a 75 73 74 20 61 20 70 72 69 6e 74
69 6e 67 20 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00
Notice the 00 at str[5]. That's how the print function knows how much it actually needs to output, regardless of the actual length of the array and the rest of its contents.
'\0' is called the null character (NUL) or null terminator (not to be confused with NULL, the null pointer constant).
It is the character whose integer value is 0 (zero).
In C it is used to mark the end of a string.
Example: char a[] = "Arsenic";
Every character is stored in the array:
a[0]='A'
a[1]='r'
a[2]='s'
a[3]='e'
a[4]='n'
a[5]='i'
a[6]='c'
and the element after the last character, a[7], holds '\0', which marks where the string a ends.