Regex to match C integer literals


I would like to use egrep/grep -E to print out the lines in C source files that contain integer literals (as described here). The following works for the most part, except it matches floats too:
egrep '\b[0-9]+' *.c
Any suggestions for how to fix this?

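You can use negative lookarounds to make sure the number isn't preceded or followed by a .:
\b(?<!\.)[0-9]+(?!\.)\b
Lookarounds are Perl-compatible syntax that egrep/grep -E does not support, so run these patterns with grep -P (GNU grep) instead, e.g. grep -P '\b(?<!\.)[0-9]+(?!\.)\b' *.c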
Edit:
Since, as you mentioned in the comments, you want to match only the 0 of 0x in hex literals, use the following pattern instead. It works exactly like your original regex except that it doesn't match floating-point numbers.
\b(?<!\.)[0-9]+(?![\.\d])
Try it online.
References:
Regular expressions: Lookahead and Lookbehind
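The following is a minimal Perl sketch that applies the second lookaround pattern to a few sample lines. The helper name has_int_literal and the sample lines are hypothetical. It shows that 3.14 and .5 are rejected. As described above, the 0 of a hex literal such as 0x1F is still accepted.

```perl
use strict;
use warnings;

# Second pattern from the answer: digits not preceded by "." and not
# followed by "." or another digit.
my $int_re = qr/\b(?<!\.)[0-9]+(?![\.\d])/;

# Returns 1 if the line contains an integer literal by this rule, else 0.
sub has_int_literal {
    my ($line) = @_;
    return $line =~ $int_re ? 1 : 0;
}

# Hypothetical sample lines, for illustration only.
my @lines = ('int x = 42;', 'double d = 3.14;', 'unsigned m = 0x1F;', 'float f = .5;');
for my $line (@lines) {
    print "$line\n" if has_int_literal($line);
}
```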

I would not try to over-optimize a pattern like this. Instead, I would convert each integer literal type and its possible suffixes literally into a regex with alternations:
(?i)(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?
Only the digit separators require some more work: a separator cannot be followed by another separator and can only appear between digits.
Suffixes are allowed for hex and binary too, as tested with C++14 here.
Demo
Note: The pattern is designed to be case-insensitive.
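Run it like this. The (?:...) groups and \d are Perl-compatible syntax, so use grep -P rather than egrep; -i makes the match case-insensitive: grep -Pi "(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?" input.txt
PS: If you just want to extract the values, a Perl script could come in handy:
use strict;
use warnings;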
my $file = '/some/where/input.txt';
my $regex = qr/(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?/ip;
open my $input, '<', $file or die "can't open $file: $!";
while (<$input>) {
chomp;
while ($_ =~ /($regex)/g) {
print "${^MATCH}\n";
}
}
close $input or die "can't close $file: $!";
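The following is a minimal sketch of the same alternation pattern wrapped in a reusable function. The name int_literals and the sample line are made up for illustration. The pattern has only non-capturing groups, so a list-context //g match returns every whole match. Like the pattern itself, it has no \b anchors, so digits inside identifiers such as x1 would also be reported.

```perl
use strict;
use warnings;

# The alternation pattern from the answer, compiled case-insensitively.
my $c_int = qr/(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?/i;

# Return every literal found in one line. With no capturing groups,
# a list-context //g match yields the whole matched strings.
sub int_literals {
    my ($line) = @_;
    my @found = $line =~ /$c_int/g;
    return @found;
}

# Hypothetical input mixing hex, binary and a digit-separated decimal literal.
print "$_\n" for int_literals("x = 0xFFu + 0b1010 + 1'000'000ULL;");
```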

Related

Perl, use regex to find a match and replace just the last character of the match (in this case a line break)

I have to clean several CSV files before I put them in a database. Some of the files have an unexpected line break in the middle of a line. Since a line should always end with a number, I managed to fix the files with this one-liner:
perl -pe 's/[^0-9]\r?\n//g'
While it did work, it also removes the last character before the line break:
foob
ar
turns into
fooar
Is there a Perl one-liner I can call that follows the same rule without removing the last character before the line break?

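A negative lookbehind, which is an assertion and won't consume characters, can also be used:
(?<!\d)\R
\d is short for a digit
\R matches any line-break sequence
With CRLF files, the \n of a \r\n pair after a digit is preceded by \r rather than a digit, so it can still match on its own. (?<![\d\r])\R avoids that.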
See this demo at regex101
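The following is a minimal sketch that applies the lookbehind idea to a whole string rather than line by line. join_broken_lines is a hypothetical helper name, and the sample data is made up. It uses (?<![\d\r]) so that, with CRLF input, the lone \n after a digit-terminated line is left alone.

```perl
use strict;
use warnings;

# Remove every line break not preceded by a digit, joining the broken
# record with the next line. \R matches \n, \r\n and other breaks; \r is
# excluded in the lookbehind so the \n of a \r\n pair is never taken alone.
sub join_broken_lines {
    my ($text) = @_;
    $text =~ s/(?<![\d\r])\R//g;
    return $text;
}

# Hypothetical sample: the first record is split after "foob".
print join_broken_lines("1,foob\nar,2\n3,baz,4\n");
```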

One way is to use \K, which acts like a lookbehind:
perl -pe 's/[^0-9]\K\r?\n//g'
Everything matched before \K is kept, so only what follows it is replaced.
However, I'd rather recommend processing your CSV with a library, even though it takes a little more code. There has already been one problem, a linefeed inside a field; what else may be in there? A good library can handle a variety of irregularities.
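The following sketch applies the \K substitution from the one-liner to an in-memory string instead of a file. The sample data is hypothetical. It shows that the character before the line break survives.

```perl
use strict;
use warnings;

# Hypothetical CSV text: the first record is broken after "foob".
my $input = "1,foob\nar,2\n3,baz,4\n";

# [^0-9] must match, but \K drops it from the match, so only the
# line break after it is replaced with the empty string.
(my $fixed = $input) =~ s/[^0-9]\K\r?\n//g;

print $fixed;
```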
A simple example with Text::CSV
use warnings;
use strict;
use feature 'say';
use Text::CSV;
my $file = shift or die "Usage: $0 file.csv\n";
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $row = $csv->getline($fh)) {
s/\n+//g for @$row;
$csv->say(\*STDOUT, $row);
}
Consider other constructor options (also available via accessors) that help with all kinds of unexpected problems, like allow_whitespace.
This can be done as a command-line program ("one-liner") as well, if there is a reason for that. The library's functional interface, csv, is then convenient:
perl -MText::CSV=csv -we'
csv in => *ARGV, on_in => sub { s/\n+//g for @{$_[1]} }' filename
With *ARGV, the input is taken either from a file named on the command line or from STDIN.

Just capture the last char and put it back:
perl -pe 's/([^0-9])\r?\n/$1/g'

Wild card matching

I need to match sentences that contain both the characters / and . in the same sentence. How can I do it with Perl?
Suppose my file has the following sentences:
ttterfasghti.
ddseghies/affag
hhail/afgsh.
asfsdgagh/
adterhjc/sgsagh.
My expected output should be:
hhail/afgsh.
adterhjc/sgsagh.

Given a clarification from a comment:
Any order, but the matching line should contain both / and .
An easy way:
perl -wne'print if m{/} and m{\.}' filename
This is inefficient in the sense that it starts the regex engine twice and scans each string twice. However, in most cases that is unnoticeable, and this code is much clearer than a single regex for the task.
I use {} delimiters so as not to have to escape the /, in which case the m in front is compulsory. Then I use the same m{...} on the other pattern for consistency.
A most welcome request came in to do this in a script rather than a one-liner! Happy to oblige.
use warnings;
use strict;
my $file = shift || die "Usage: $0 file\n";
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>) {
print if m{/} and m{\.};
}
close $fh;
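For comparison, the following sketch puts the two-match test from this answer and a single-regex version into small functions. The single-regex version uses two lookaheads. Both function names are hypothetical. The sketch runs them on the sample lines from the question. The lookahead form gives the same result but is harder to read, which is why the two-match version above is preferred.

```perl
use strict;
use warnings;

# Two separate matches, as in the answer: the line needs both "/" and ".".
sub has_both_two_matches {
    my ($s) = @_;
    my $ok = $s =~ m{/} && $s =~ m{\.};
    return $ok ? 1 : 0;
}

# One regex: two zero-width lookaheads at the start of the string, each
# scanning ahead for one of the required characters.
sub has_both_lookahead {
    my ($s) = @_;
    return $s =~ m{^(?=.*/)(?=.*\.)}s ? 1 : 0;
}

# Sample lines from the question.
my @lines = qw(ttterfasghti. ddseghies/affag hhail/afgsh. asfsdgagh/ adterhjc/sgsagh.);
for my $line (@lines) {
    print "$line\n" if has_both_lookahead($line);
}
```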

This feels like a duplicate, but I just can't find a good previous question for this.
For / there are two ways:
use the m// operator with a different delimiter, e.g. m,<regex with />, or m{<regex with />}, or
escape it, i.e. /\//
For ., escape it.
Note that inside a character class ([...]) many special characters no longer need escaping.
Hence we get:
$ perl <dummy.txt -ne 'print "$1: $_" if m,(\w+/\w*\.),'
hhail/afgsh.: hhail/afgsh.
adterhjc/sgsagh.: adterhjc/sgsagh.
i.e. the line is printed if it contains one or more word characters, followed by a /, then zero or more word characters, then a literal period.
Recommended reading: perlrequick, perlretut and perlre.
UPDATE after OP clarified the requirement in a comment:
$ perl <dummy.txt -ne 'print if m,/, && m{\.}'
hhail/afgsh.
adterhjc/sgsagh.

Replacing a single character in a perl regex match

How can I replace the 6th "_" that appears in the regex match?
Here is the literal input to be searched. It is not representing a path to the input:
/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa
Here is my code, which parses out what I need. I just now need to replace the last matched "_" with a "/":
#!/usr/bin/perl
use strict;
use warnings;
open(IN, '<', '/Users/roblogan/Test_Database.txt') or die $!;
open(OUT, '>', '/Users/roblogan/Test_Output.txt') or die $!;
while (my $line = <IN>){
if ($line =~ m/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0_[0-9]*)/){
print OUT $1, "\n";
}
}
Current output:
m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30
Desired output:
m160505_031746_42156_c100980652550000001823221307061611_s1_p0/30
I have tried:
if ($line =~ s/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0_[0-9]*)/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0\/[0-9]*)/){
Any help would be appreciated.

This Perl code will do what I think you need, judging from your subject line and example output.
It finds the sixth occurrence of an underscore in the target string and, if that underscore is followed by decimal digits, changes the underscore to a slash and removes everything following the digits.
I have used the pipe character | as the delimiter for the substitution operator s/// to avoid the need to escape forward slashes.
use strict;
use warnings 'all';
my $path = q{/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa};
$path =~ s|^(?:[^_]*_){5}[^_]*\K_(\d+).*|/$1|s;
print $path, "\n";
Output:
/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0/30
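The same \K idea generalizes to "replace the n-th underscore". The following sketch uses replace_nth_underscore, a hypothetical helper. It only swaps that one underscore for a slash. Unlike the code above, it does not truncate the rest of the string, and it leaves the string unchanged if there are fewer than n underscores.

```perl
use strict;
use warnings;

# Replace the $n-th underscore (1-based) with "/": skip n-1 chunks that end
# in "_", keep them via \K, then replace only the next "_".
sub replace_nth_underscore {
    my ($s, $n) = @_;
    my $skip = $n - 1;
    $s =~ s|^(?:[^_]*_){$skip}[^_]*\K_|/|;
    return $s;
}

my $name = 'm160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa';
print replace_nth_underscore($name, 6), "\n";
```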

From your description, the easiest way is:
$line =~ s!(m160505_031746_42156_c100980652550000001823221307061611_s1_p0)_!$1/!
I've chosen ! as the delimiter because / is used in the replacement part.
$1 is a variable containing the text matched by the first ( ) group in the regex (I didn't want to repeat the whole thing twice).
The final _ is not included in $1 (it's outside of the parens); instead we put / in the replacement part.
See perldoc perlretut for more information.

regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me proper results in the Notepad++ editor, but with the Perl program below I get wrong results. Please give the right answer and an explanation.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches @ARGV;

A few thoughts:
You should not escape the dollar sign when the pattern is written directly in the match operator
The capturing group around the whitespace is useless
Same for the capturing group around .*
which leads to:
^Pre.*al\s*$
If you don't want words like "precious final" to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}

You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more s characters rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and don't escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
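The following small sketch shows why the dropped backslash matters. With (s*) in place of (\s*), the pattern still matches when only the final \n follows "al", because $ can match just before a trailing newline. It fails when other whitespace, such as the \r of a CRLF line ending, is present. Whether the linked dictionary file really uses CRLF endings is an assumption here, not something stated in the question.

```perl
use strict;
use warnings;

# The pattern as Perl sees it after double-quote processing has turned
# "\s" into a plain "s" (written single-quoted here to show that result).
my $broken = '^Pre(.*)al(s*)$';
# The intended pattern, kept intact by qr//.
my $fixed  = qr/^Pre(.*)al(\s*)$/;

# A hypothetical line read from a file with Windows (CRLF) line endings.
my $line = "Prenatal\r\n";

print "broken: ", ($line =~ /$broken/ ? "match" : "no match"), "\n";
print "fixed:  ", ($line =~ $fixed ? "match" : "no match"), "\n";
```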

Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches a word character, i.e. a letter, digit or underscore, not just any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
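The following minimal sketch puts the two suggestions together. pre_al_words is a hypothetical helper, and the input lines are made up.

```perl
use strict;
use warnings;

# Keep only words that start with "pre" and end with "al", ignoring case,
# after stripping trailing CR/LF characters from each line.
sub pre_al_words {
    my @out;
    for my $line (@_) {
        (my $word = $line) =~ s/[\r\n]+$//;
        push @out, $word if $word =~ /^Pre\w*al$/i;
    }
    return @out;
}

# Hypothetical input lines, for illustration only.
print "$_\n" for pre_al_words("Prenatal\r\n", "prefrontal\n", "precious final\n", "Preach\n");
```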

How do I substitute with an evaluated expression in Perl?

There's a file dummy.txt with the following contents:
9/0/2010
9/2/2010
10/11/2010
I have to increment the month portion (0, 2, 11) by 1, i.e. to (1, 3, 12).
I wrote the substitution regex as follows:
$line =~ s/\/(\d+)\//\/\1+1\//;
It's printing:
9/0+1/2010
9/2+1/2010
10/11+1/2010
How can I make it add numerically (giving 3) rather than performing string concatenation (2+1)?

Three changes:
You'll have to use the e modifier to allow an expression in the replacement part.
To make the replacement global, use the g modifier. This is not needed if you have one date per line.
Use $1 on the replacement side, not the backreference \1.
This should work:
$line =~ s{/(\d+)/}{'/'.($1+1).'/'}eg;
Also, if your regex contains the delimiter you're using (/ in your case), it's better to choose a different delimiter ({} above). That way you don't have to escape the delimiter in the regex, which keeps your regex clean.
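The following minimal sketch wraps the /e substitution in a function and runs it on the dates from the question. bump_month is a hypothetical name.

```perl
use strict;
use warnings;

# Increment the middle (month) field: /e makes the replacement part Perl
# code, so $1 + 1 is computed numerically before the slashes are rejoined.
sub bump_month {
    my ($date) = @_;
    $date =~ s{/(\d+)/}{'/' . ($1 + 1) . '/'}e;
    return $date;
}

print bump_month($_), "\n" for qw(9/0/2010 9/2/2010 10/11/2010);
```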

This works (e evaluates the replacement as an expression; see the perlrequick documentation):
$line = '8/10/2010';
$line =~ s!/(\d+)/!('/'.($1+1).'/')!e;
print $line;
It helps to use ! or some other character as the delimiter if your regular expression has / itself.
You can also use the @{[ ... ]} interpolation trick, from the question "Can Perl string interpolation perform any expression evaluation?":
$line = '8/10/2010';
$line =~ s!/(\d+)/!("/@{[$1+1]}/")!e;
print $line;
But if this is a homework question, be ready to explain how you reached this solution when the teacher asks.

How about this?
$ cat date.txt
9/0/2010
9/2/2010
10/11/2010
$ perl chdate.pl
9/1/2010
9/3/2010
10/12/2010
$ cat chdate.pl
use strict;
use warnings;
open my $fp, '<', "date.txt" or die $!;
while (<$fp>) {
chomp;
my @arr = split (/\//, $_);
my $temp = $arr[1]+1;
print "$arr[0]/$temp/$arr[2]\n";
}
close $fp;
$

We Keep Coding
