How do I convert 1024 KB to 1 MB with the output? - regex

I have this:
/([^\/\|\#\<\(\>\;\s][0-9]*[\s][KB]{2})/
To be specific I had to use [KB]{2}.
I get the value I need, but can I convert the final output to MB?
Example: a match of 2000KB printed as 2MB?
Thanks

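Sure you can; capture the number and the unit separately like this:
/[^\/\|\#\<\(\>\;\s]([0-9]*)[\s]([KB]{2})/
Assuming your original regex is correct, of course. The capture variables $1 and $2 are read-only, so copy them before changing anything:
my ($num, $unit) = ($1, $2);
if ($unit eq "KB" && $num >= 1024) {
    $num /= 1024;
    $unit = "MB";
}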

lol, on the string "2000 KK" your regex matches
$1 = 000
$2 = KK
because [KB]{2} accepts any two characters out of K and B, not just "KB". Better try this one ;)
/(\d+)\s*(KB)/
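For comparison, here is a minimal sketch of the same capture-then-convert idea in Python. The names SIZE_RE and humanize are made up for illustration, and the 1024 threshold is taken from the question title, so adjust it if you want decimal megabytes.

```python
import re

# Same pattern as the answer above: digits, optional whitespace, literal "KB".
SIZE_RE = re.compile(r"(\d+)\s*(KB)")

def humanize(text):
    """Rewrite every '<n> KB' in text as MB when n is at least 1024.

    Hypothetical helper; the name and the threshold are assumptions.
    """
    def repl(match):
        kb = int(match.group(1))
        if kb < 1024:
            return match.group(0)  # leave small values exactly as they were
        return f"{kb / 1024:g} MB"  # :g drops a trailing ".0"
    return SIZE_RE.sub(repl, text)

print(humanize("cache: 2048 KB, log: 512 KB"))
```

Doing the arithmetic in a replacement function keeps the regex itself simple; the pattern only has to find the number and the unit.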


Perl Replace 26 characters with numeric

I would like to replace each letter in a string with its numerical position in the alphabet.
For example (one-liner on Windows):
perl -e "$_ = \"abcdefghijklmnopqrstuvwxyz\"; tr/a-z/1-9/; print"
The result is:
12345678999999999999999999
This works up to 9, but how can I assign the numeric value to the letters after i?
I would like to know how to map one character to two characters,
for example,
j -> 10, k -> 11, etc.
To keep the numeric values identifiable, it would make sense to add a separator:
"1-", "2-", ... "25-", "26".
perl -E"$_ = 'abcdefghijklmnopqrstuvwxyz'; s/([a-z])/ord($1)-96/ge; say;"
or if you have 5.14+
perl -E"say 'abcdefghijklmnopqrstuvwxyz' =~ s/([a-z])/ord($1)-96/ger;"
You can substitute any rule instead of ord($1) - 96.
I don't believe tr/// can do that, unfortunately: it is a one-to-one character substitution. So you're going to have to go the long way round:
my %indices = map { $_ => (ord($_) - ord('a')) + 1 } ('a' .. 'z');
my $result = join '', map { $indices{$_} } split(//, $string);
Unfortunately that's not a one-liner.
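As a rough sketch of the same mapping outside Perl, here it is in Python, including the "-" separator the question asks for. The function name is made up, and I am assuming that anything other than a lowercase ASCII letter should simply be dropped.

```python
def letters_to_numbers(text, sep="-"):
    """Replace each lowercase ASCII letter with its 1-based alphabet position.

    Hypothetical helper: characters outside a-z are skipped (an assumption).
    """
    return sep.join(
        str(ord(ch) - ord("a") + 1)
        for ch in text
        if "a" <= ch <= "z"
    )

print(letters_to_numbers("abcxyz"))
```

The separator is what makes the result decodable again; without it "11" could be "aa" or "k".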

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason that I want to do this is because I want to determine the number of capturing parentheses in a regular expression after a successful match. According to the perlop documentation, a list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and a list (1), then I cannot tell whether the pattern has no parens or it has one paren and it matched a '1'. I can resolve that ambiguity if there is a difference between number 1 and string '1'.
You can tell how many capturing groups are in the last successful match by using the special @+ array. $#+ is the number of capturing groups. If that's 0, then there were no capturing parentheses.
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
#!/usr/bin/perl
($b) = ('1' =~ /(1)/);
print isstring($b) ? "string\n" : "int\n";
($b) = ('1' =~ /1/);
print isstring($b) ? "string\n" : "int\n";
sub isstring {
return ($_[0] & ~$_[0]);
}
isstring returns either 0 (the result of a numeric bitwise op), which is false, or "\0" (the result of a bitwise string op, see perldoc perlop), which is true because it is a non-empty string other than "0".
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem:
You can get the count by looking at the result of the list assignment, which returns the number of items on the right hand side of the list assignment:
my $count = my @array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my @array = $string =~ m/.../g;
my $count = @array;
You can also use the @+ or @- variables, using some of the tricks I show in the first pages of Mastering Perl. These arrays hold the starting (@-) and ending (@+) offsets of each of the capture buffers. The values in index 0 apply to the entire pattern, the values in index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
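For comparison only: other regex libraries expose the group count directly, which sidesteps the number-versus-string question altogether. A small sketch with Python's re module (not Perl; capture_info is a made-up name):

```python
import re

def capture_info(pattern, text):
    """Return (number of capture groups in pattern, captured values).

    Returns None when the pattern does not match. Hypothetical helper.
    """
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return None
    # compiled.groups counts the groups in the pattern itself,
    # regardless of whether each group took part in the match.
    return compiled.groups, match.groups()

print(capture_info(r"1", "1"))
print(capture_info(r"(1)", "1"))
```

As with $#+ in Perl, the count comes from the pattern rather than from inspecting the returned values.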
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = PV(0x30698cc) at 0x3073484
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x3079bb4 "1"\0
CUR = 1
LEN = 4
Note that the dump of $x has a value for the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or numification and populate the other slots. For example, if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 1
PV = 0x3079c5c "1"\0
CUR = 1
LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the captured substrings
Else the match was successful, but there were no captures
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
@f = $a =~ /foo/;
and
$a = "foo1";
@f = $a =~ /foo(1)?/;
since both return a one-element list containing 1, whether or not a capture was made.
The answer is: don't try to use the returned array. Check whether $1 is not equal to "".

Parse time string using regex

My time string may be in one of the following formats (x and y are integers, h and m are literal characters):
xh ym
xh
ym
y
Examples:
1h 20m
45m
2h
120
What regular expression should I write to get x and y numbers from such string?
(\d+)([mh]?)(?:\s+(\d+)m)?
You can then inspect groups 1-3. For your examples those would be
('1', 'h', '20')
('45', 'm', '')
('2', 'h', '')
('120', '', '')
As always, you might want to use some anchors ^, $, \b...
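Here is a small sketch of how that pattern might be used once it is anchored, written in Python for brevity. TIME_RE and parse_minutes are made-up names, and I am assuming a bare number means minutes, as the "y" form in the question suggests.

```python
import re

# The pattern from the answer above, anchored at both ends.
TIME_RE = re.compile(r"^(\d+)([mh]?)(?:\s+(\d+)m)?$")

def parse_minutes(text):
    """Return the total number of minutes, or None if text is not a valid time.

    Hypothetical helper; a bare number is treated as minutes (an assumption).
    """
    match = TIME_RE.match(text.strip())
    if match is None:
        return None
    first, unit, second = match.groups()
    if unit == "h":
        return int(first) * 60 + int(second or 0)
    if second is not None:
        # The bare pattern also accepts "45m 10m"; reject it here.
        return None
    return int(first)

for sample in ["1h 20m", "45m", "2h", "120"]:
    print(sample, "->", parse_minutes(sample))
```

The extra check after matching is needed because the short pattern is deliberately permissive about what may precede the trailing minutes part.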
I'm going to assume you're using .NET due to your username. :)
I think in this case, it's easier to use TimeSpan.ParseExact for this task.
You can specify a list of permitted formats (written as custom TimeSpan format strings), and ParseExact will read in the TimeSpan according to them.
Here is an example:
var formats = new[]{"h'h'", "h'h 'm'm'", "m'm'", "%m"};
// I have assumed that a single number means minutes
foreach (var item in new[]{"23","1h 45m","1h","45m"})
{
TimeSpan timespan;
if (TimeSpan.TryParseExact(item, formats, CultureInfo.InvariantCulture, out timespan))
{
// valid
Console.WriteLine(timespan);
}
}
Output:
00:23:00
01:45:00
01:00:00
00:45:00
The only problem with this is that it is rather inflexible. Additional whitespace in the middle will fail to validate. A more robust solution using Regex is:
var items = new[]{"23","1h 45m", "45m", "1h", "1h 45", "1h 45", "1h45m"};
foreach (var item in items)
{
var match = Regex.Match(item, @"^(?=\d)((?<hours>\d+)h)?\s*((?<minutes>\d+)m?)?$", RegexOptions.ExplicitCapture);
if (match.Success)
{
int hours;
int.TryParse(match.Groups["hours"].Value, out hours); // hours == 0 on failure
int minutes;
int.TryParse(match.Groups["minutes"].Value, out minutes);
Console.WriteLine(new TimeSpan(0, hours, minutes, 0));
}
}
Breakdown of the regex:
^ - start of string
(?=\d) - must start with a digit (do this because both parts are marked optional, but we want to make sure at least one is present)
((?<hours>\d+)h)? - hours (optional, capture into named group)
\s* - whitespace (optional)
((?<minutes>\d+)m?)? - minutes (optional, capture into named group, the 'm' is optional too)
$ - end of string
I would say that mhyfritz's solution is simple, efficient and good if your input is only what you have shown.
If you ever need to handle corner cases, you can use a more discriminative expression:
^(\d+)(?:(h)(?:\s+(\d+)(m))?|(m?))$
But it can be overkill...
(get rid of ^ and $ if you need to detect such pattern in a larger body of text, of course).
Try this one: ^(?:(\d+)h\s*)?(?:(\d+)m?)?$:
var s = new[] { "1h 20m", "45m", "2h", "120", "1m 20m" };
foreach (var ss in s)
{
var m = Regex.Match(ss, @"^(?:(\d+)h\s*)?(?:(\d+)m?)?$");
int hour = m.Groups[1].Value == "" ? 0 : int.Parse(m.Groups[1].Value);
int min = m.Groups[2].Value == "" ? 0 : int.Parse(m.Groups[2].Value);
if (hour != 0 || min != 0)
Console.WriteLine("Hours: " + hour + ", Mins: " + min);
else
Console.WriteLine("No match!");
}
In bash:
echo "$string" | awk '{for(i=1;i<=NF;i++) print $i}' | sed 's/[hm]//g'

I want to replace ',' on the 150th location in a String with a <br>

My String is : PI Last Name equal to one of
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI','ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA','AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE','ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN','ALVAREZ','AMARYAN','AMBESI-IMPIOMBATO','AMEGBETO','AMOWITZ', 'ANAGNOSTARAS','ANAND','ANDERSEN','ANDERSON', 'ANDRADE','ANDREEFF','ANDROPHY','ANGER','ANHOLT','ANTHONY','ANTLE','ANTONELLI','ANTONY', 'ANZULOVICH', 'APODACA','APOSHIAN','APPEL','APPLEBY','APRIL','ARAUJO','ARBIB','ARBOLEDA', 'ARCHAKOV','ARCHER', 'ARECHAVALETA-VELASCO','ARENS','ARGON','ARGYROKASTRITIS', 'ARIAS','ARIZAGA','ARMSTRONG','ARNON', 'ARSHAVSKY','ARVIN','ASATRYAN','ASCOLI','ASKENASE','ASSI','ATALAY','ATANASOVA','ATKINSON','ATTYGALLE','ATWEH','AU','AVETISYAN','AWE','AYOUB','AZAD','BACSO','BAGASRA','BAKER','BALAS', 'BALCAZAR','BALK','BALKAY','BALLOU','BALRAJ','BALSTER','BANERJEE','BANKOLE','BANTA','BARAL','BARANOWSKA','BARBAS', 'BARBER','BARILLAS-MURY','BARKHOLT','BARNES','BARNETT','BARRETT','BARRIA','BARROW','BARROWS','BARTKE','BARTLETT','BASSINGTHWAIGHTE','BASSIOUNY','BASU','BATES','BATTAGLIA','BATTERMAN','BAUER','BAUERLE','BAUM','BAUME', 'BAUMLER','BAVISTER','BAWA','BAYNE','BEASLEY','BEATTY','BEATY','BEBENEK','BECK','BECKER','BECKMAN','BECKMAN-SUURKULA' ,'BEDFORD','BEDOLLA','BEEBE','BEEMON','BEHETS','BEHRMAN','BEIER','BEKKER','BELL','BELLIDO','BELMAIN', 'BENATAR','BENBENISHTY','BENBROOK','BENDER','BENEDETTI','BENNETT','BENNISH','BENZ','BERG','BERGER','BERGEY','BERGGREN','BERK','BERKOWITZ','BERLIN','BERLINER','BERMAN','BERTINO','BERTOZZI','BERTRAND','BERWICK','BETHONY','BEYERS','BEYRER' 
,'BEZPROZVANNY','BHAGWAT','BHANDARI','BHARGAVA','BHARUCHA','BHUJWALLA','BIANCO','BIDLACK','BIELERT','BIER','BIESSMANN','BIGELOW' ,'BILLER','BILLINGS','BINDER','BINDMAN','BINUTU','BIRBECK','BIRGE','BIRNBAUM','BIRO','BIRT','BISHAI','BISHOP','BISSELL','BJORKEGREN','BJORNSTAD','BLACK','BLANCHARD','BLASS','BLATTNER','BLIGNAUT','BLOCH','BLOCK','BLOOM','BLOOM,','BLUM','BLUMBERG' ,'BLUMENTHAL','BLYUKHER','BODDULURI','BOFFETTA','BOGOLIUBOVA', 'BOLLINGER','BOLLS','BOMSZTYK','BONANNO','BONNER','BOOM','BOOTHROYD','BOPPANA','BORAWSKI','BORG','BORIS-LAWRIE','BORISY','BORLONGAN','BORNSTEIN','BORODOVSKY','BORST','BOS','BOTO','BOWDEN','BOWEN','BOYCE-JACINO','BRADEN','BRADY' ,'BRAITHWAITE','BRANN','BRASH','BRAUNSTEIN', 'BREMAN','BRENNAN','BRENNER','BRETSCHER','BREW','BREYSSE','BRIGGS','BRITES','BRITT','BRITTENHAM','BRODIE','BRODY','BROOK','BROOTEN','BROSCO','BROSNAN','BROWN','BROWNE','BRUCKNER','BRUNENGRABER','BRYL','BRYSON','BU','BUCHAN','BUDD','BUDNIK', 'BUEKENS','BUKRINSKY','BULLMORE','BULUN','BURBANO','BURGENER','BURGESS','BURKS','BURMEISTER','BURNETT','BURNHAM','BURNS','BURRIDGE','BURTON','BUSCIGLIO','BUSHEK','BUSIJA','BUZSAKI','BZYMEK','CABA')
I need a regex that greedily looks for up to 150 characters with the last character being a ',', and then replaces that last ',' of the 150 with a <br />.
Any suggestions, please?
I used ','(?=[^()]*\)) but this one replaces all the occurrences. I want only the ones at the 150th position to be replaced.
Thanks everyone for your suggestions. I managed to do it with Java code instead of regex.
StringBuilder sb = new StringBuilder(html);
int i = 0;
while ((i = sb.indexOf("','", i + 150)) != -1) {
int j = sb.lastIndexOf("','", i + 150);
sb.insert(i+1, "<BR>");
}
return sb.toString();
However, this breaks at the first occurrence of ',' in the 150 chars.
Can anyone help modify my code so that it breaks at the last occurrence of ',' within the 150 chars?
You'll want something like this:
Look for every occurrence of \([^)]*,[^)]*\) (that is, find a parenthesis-wrapped string with a comma in it) and then run the following regular expression on each of the matched elements:
(.{135,150}[^,]*?),
The first number is the minimum number of characters you want to match before you add a break tag -- the second is the maximum number of characters you would like to match before inserting a break tag. If there is no , between the characters in question then the regular expression will continue to consume characters until it finds a comma.
You could probably do it like this:
regex ~ /(^.{1,14}),/
replacement ~ '\1<replacement>' or "$1<insert your text>"
In Perl:
$target = ','x 22;
$target =~ s/(^ .{1,14}) , /$1<15th comma>/x;
print $target;
Output
,,,,,,,,,,,,,,<15th comma>,,,,,,,
Edit: As an alternative, if you want to break the string up into successive chunks of 150 characters or fewer, you could do it this way:
regex ~ /(.{1,150},)/sg
replacement ~ '\1<br/>' or "$1<br\/>"
// That is a regex of type global (/g) and include newlines (/s)
In Perl:
$target = "
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI','ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA','AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE','ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN', ... )
";
if ($target =~ s/( .{1,150} , )/$1<br\/>/sxg) {
print $target;
}
Output:
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI',<br/>'ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA',<br/>'AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE',<br/>'ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN',<br/> ... )
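The same greedy substitution can be sketched in Python as a quick way to try different widths. The function name and its parameters are made up; only the pattern comes from the answer above.

```python
import re

def break_at_commas(text, width=150, marker="<br/>"):
    """Insert marker after the last comma that ends a run of at most
    `width` characters, repeating along the whole string.

    Hypothetical helper built on the (.{1,150},) pattern from the answer.
    """
    # re.S lets "." cross newlines, like /s in the Perl version.
    pattern = re.compile(r"(.{1,%d},)" % width, re.S)
    # A function replacement avoids any escaping issues in marker.
    return pattern.sub(lambda m: m.group(1) + marker, text)

print(break_at_commas("'AA','BB','CC','DD'", width=10))
```

Because the quantifier is greedy, each chunk runs to the last comma that still fits, which is the behaviour the question asks for. Note that a marker is also added after the final chunk that ends in a comma, as in the Perl output above.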

How can I extract a substring after a match position?

I have a requirement to grep a string or pattern (say, around 200 characters before and after the string or pattern) from a file that consists of one extremely long line. The file contains streams of data (market trading data) coming from a remote server and getting appended onto this line of the file.
I know that I can match lines containing a specific pattern using grep (or other tools), but once I have such lines, how can I extract a portion of the line? I want to grab the part of the line with the pattern plus roughly 200 characters before and after the pattern. I would be especially interested in answers using...(supply tools or languages you're comfortable with here).
If what you need is the 200 characters before and after the expression plus the expression itself, then you are looking at:
/.{200}aaa.{200}/
If you need captures for each (allowing you to extract each part as a unit), then you use this regexp:
/(.{200})(aaa)(.{200})/
If your grep has -o then that will output only the matched part.
echo "abc def ghi jkl mno pqr" | egrep -o ".{4}ghi.{4}"
produces:
def ghi jkl
(.{0,200}(pattern).{0,200}), or something?
Is this what you want (in C)?
If it is, feel free to adapt to your specific needs.
#include <stdio.h>
#include <string.h>
void prt_grep(const char *haystack, const char *needle, int padding) {
    const char *ptr, *start, *finish, *end;
    ptr = strstr(haystack, needle);
    if (!ptr) return;
    /* clamp the window without forming a pointer outside the string */
    start = (ptr - haystack > padding) ? ptr - padding : haystack;
    end = haystack + strlen(haystack);
    finish = ptr + strlen(needle);
    finish = (end - finish > padding) ? finish + padding : end;
    for (ptr = start; ptr < finish; ptr++) putchar(*ptr);
}
int main(void) {
    const char *longline = "123456789 ASDF 123456789";
    const char *pattern = "ASDF";
    prt_grep(longline, pattern, 5); /* you want 200 */
    return 0;
}
I think I might approach the problem by matching the part of the string I need, then using the match position as the starting point for the substring extraction. In Perl, once your regex succeeds, the pos built-in tells you where you left off (note the binding operator =~ and the /g flag, which pos needs in scalar context):
if( $long_string =~ m/$regex/g ) {
$substring = substr( $long_string, pos( $long_string ), 200 );
}
I tend to write my programs in Perl instead of doing everything in the regular expression. There's nothing particularly special about Perl in this case.
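The match-then-slice approach is not specific to Perl. Here is a minimal sketch in Python that takes the window on both sides of the match; the function name and the default padding are assumptions for illustration.

```python
import re

def context(line, pattern, padding=200):
    """Return the first match of pattern plus up to `padding` characters
    on each side, or None if the pattern does not occur.

    Hypothetical helper; it only looks at the first match.
    """
    match = re.search(pattern, line)
    if match is None:
        return None
    # Clamp the window to the bounds of the line.
    start = max(match.start() - padding, 0)
    end = min(match.end() + padding, len(line))
    return line[start:end]

print(context("123456789 ASDF 123456789", "ASDF", padding=5))
```

Slicing by position keeps the regex small, which matters when the line is very long and the pattern itself is already complicated.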
I think this may be more basic than everybody is thinking; correct me if I'm wrong...
Do you want to print before and after the string excluding the string?
awk -F "ASDF" '{print "Before ASDF" $1 "\n" "After ASDF" $2}' $FILE
This will print something like:
Before ASDF blablabla
After ASDF blablablabla
Change it to match your needs: remove the "\n" and/or the "Before..." and "After..." labels.
Do you want to suppress the string from the file?
This will replace the string with a blank space, again, change it to whatever you need.
sed -i 's/ASDF/\ /' longstring.txt
HTH