Why does the function chunkBy parse strings into integers? - d

In the snippet below I want to split a string into runs of repeated characters. The chunkBy function seems to return parsed integer values instead of strings. Why does this happen?
import std.stdio, std.algorithm, std.array, std.conv : to;

void main() {
    //writeln("12236666".chunkBy!((a, b) => a == b).any!(a => a.to!string().length == 2));
    writeln("12236666".chunkBy!((a, b) => a == b)); // prints [1, 22, 3, 6666]
}

It looks like it is just printing strings without quotes. There's no parsing going on there at all.
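The same effect can be reproduced outside D. The following is a minimal Python sketch, not the D implementation: the helper name chunk_runs is hypothetical, and itertools.groupby stands in for chunkBy. It shows that the chunks stay strings and only look like integers when printed without quotes.

```python
from itertools import groupby


def chunk_runs(text):
    """Split text into runs of identical consecutive characters."""
    return ["".join(group) for _key, group in groupby(text)]


chunks = chunk_runs("12236666")
print(chunks)                         # ['1', '22', '3', '6666']
print("[" + ", ".join(chunks) + "]")  # [1, 22, 3, 6666]
```

Every element is still a string; the second print only drops the quotes.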

Related

TypeScript regular expression: extract x,y locations

I'm using TypeScript and was wondering whether I can use regular expressions to do the following.
The input comes in as a string:
"(0,1)(1,2)(3,1)"
There are a few validations I need to perform:
Every individual value should be a number.
Each number cannot be larger than a variable value x provided at runtime.
If all the validations pass, I want to get the matches out in an array and strip out the parentheses.
Is this possible with regular expressions?
This is more of a JS question. Anyway, you can achieve your desired solution with a regexp and some extra logic:
export const validate = (input: string, maxValue: number) => {
  // The pattern expects exactly three "(x,y)" pairs.
  const match = input.match(/\((\d+),(\d+)\)\((\d+),(\d+)\)\((\d+),(\d+)\)/);
  if (match) {
    const results = match.slice(1, 7);
    // Note: strict comparison, so a value equal to maxValue fails validation.
    const pass = results.every(item => Number(item) < maxValue);
    if (pass) {
      return results.map(n => Number(n));
    }
  }
  return false;
};

validate('(1,1)(1,2)(3,1)', 3); // false
validate('(1,1)(1,2)(3,1)', 5); // [ 1, 1, 1, 2, 3, 1 ]
validate('(1,1)(1,2)(3,1s)', 5); // false
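The pattern above is fixed at three pairs. As a rough sketch of how the same idea might extend to any number of pairs, here is a Python version. The names parse_points, PAIR and WHOLE are hypothetical, and it assumes "cannot be larger than x" means values equal to x are allowed, which differs from the strict comparison in the answer above.

```python
import re

PAIR = re.compile(r"\((\d+),(\d+)\)")    # one "(x,y)" pair
WHOLE = re.compile(r"(?:\(\d+,\d+\))+")  # one or more pairs, nothing else


def parse_points(text, max_value):
    """Return a list of (x, y) tuples, or None if any validation fails."""
    # Validate the overall shape first, so stray characters are rejected.
    if not WHOLE.fullmatch(text):
        return None
    points = [(int(x), int(y)) for x, y in PAIR.findall(text)]
    # Assumption: a value equal to max_value is still acceptable.
    if any(value > max_value for point in points for value in point):
        return None
    return points
```

Checking the whole string with fullmatch before extracting pairs keeps input such as "(3,1s)" from being partially accepted.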

Regex count number of replacements [duplicate]

Is there a way to count the number of replacements a Regex.Replace call makes?
E.g. for Regex.Replace("aaa", "a", "b"); I want to get the number 3 out (result is "bbb"); for Regex.Replace("aaa", "(?<test>aa?)", "${test}b"); I want to get the number 2 out (result is "aabab").
Ways I can think of to do this:
1. Use a MatchEvaluator that increments a captured variable, doing the replacement manually.
2. Get a MatchCollection and iterate over it, doing the replacement manually and keeping a count.
3. Search first and get a MatchCollection, get the count from that, then do a separate replace.
Methods 1 and 2 require manual parsing of $ replacements; method 3 requires regex-matching the string twice. Is there a better way?
Thanks to both Chevex and Guffa. I started looking for a better way to get the results and found that there is a Result method on the Match class that does the substitution. That's the missing piece of the jigsaw. Example code below:
using System.Text.RegularExpressions;

namespace regexrep
{
    class Program
    {
        static int Main(string[] args)
        {
            string fileText = System.IO.File.ReadAllText(args[0]);
            int matchCount = 0;
            string newText = Regex.Replace(fileText, args[1],
                (match) =>
                {
                    matchCount++;
                    return match.Result(args[2]);
                });
            System.IO.File.WriteAllText(args[0], newText);
            return matchCount;
        }
    }
}
With a file test.txt containing aaa, the command line regexrep test.txt "(?<test>aa?)" ${test}b will set %errorlevel% to 2 and change the text to aabab.
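For comparison, the same counting-callback idea can be sketched in Python. This is an illustration, not a port of the program above: replace_and_count is a hypothetical helper, match.expand plays the role that Match.Result plays in .NET, and Python's template syntax (\g<test>) replaces ${test}.

```python
import re


def replace_and_count(pattern, template, text):
    """Replace every match using a template and count the replacements."""
    count = 0

    def evaluator(match):
        nonlocal count
        count += 1
        # expand() applies the replacement template to this single match.
        return match.expand(template)

    result = re.sub(pattern, evaluator, text)
    return result, count


print(replace_and_count("a", "b", "aaa"))                       # ('bbb', 3)
print(replace_and_count(r"(?P<test>aa?)", r"\g<test>b", "aaa"))  # ('aabab', 2)
```

In Python specifically, re.subn returns the same (result, count) pair directly, so the callback is only needed when extra per-match logic is wanted.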
You can use a MatchEvaluator that runs for each replacement, that way you can count how many times it occurs:
int cnt = 0;
string result = Regex.Replace("aaa", "a", m => {
    cnt++;
    return "b";
});
The second case is trickier as you have to produce the same result as the replacement pattern would:
int cnt = 0;
string result = Regex.Replace("aaa", "(?<test>aa?)", m => {
    cnt++;
    return m.Groups["test"] + "b";
});
This should do it.
int count = 0;
// text is assumed to be an existing string variable holding the input.
text = Regex.Replace(text,
    @"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)", // Example expression. This one captures URLs.
    match =>
    {
        string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
        count++;
        return replacementValue;
    });
I am not on my dev computer so I can't do it right now, but I am going to experiment later and see if there is a way to do this with lambda expressions instead of declaring the method IncrementCount() just to increment an int.
EDIT modified to use a lambda expression instead of declaring another method.
EDIT2 If you don't know the pattern in advance, you can still get all the groupings (the $ groups you refer to) from the match object, since they are included as a GroupCollection. Like so:
int count = 0;
// text is assumed to be an existing string variable holding the input.
text = Regex.Replace(text,
    @"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)", // Example expression. This one captures URLs.
    match =>
    {
        string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
        count++;
        foreach (Group g in match.Groups)
        {
            string groupValue = g.Value; // Do stuff with groupValue
        }
        return replacementValue;
    });

Split a DNA sequence into a list of codons with D

DNA strings consist of an alphabet of four characters: A, C, G, and T.
Given a string,
ATGTTTAAA
I would like to split it into its constituent codons
ATG TTT AAA
codons = ["ATG","TTT","AAA"]
Codons encode proteins and they are redundant (http://en.wikipedia.org/wiki/DNA_codon_table).
I have a DNA string in D and would like to split it into a range
of codons and later translate/map the codons to amino acids.
std.algorithm has a splitter function which requires a delimiter, and the
std.regex splitter function requires a regex to split the string.
Is there an idiomatic approach to splitting a string without a delimiter?
Looks like you are looking for chunks:
import std.range : chunks;
import std.encoding : AsciiString;
import std.algorithm : map;

AsciiString ascii(string literal)
{
    return cast(AsciiString) literal;
}

void main()
{
    auto input = ascii("ATGTTTAAA");
    auto codons = input.chunks(3);
    auto aminoacids = codons.map!(
        (codon) {
            if (codon == ascii("ATG"))
                return "M";
            // ...
        }
    );
}
Please note that I am using http://dlang.org/phobos/std_encoding.html#.AsciiString here instead of plain string literals. This avoids the costly UTF-8 decoding that is done for string and is never needed for an actual DNA sequence. I remember it making a notable performance difference in similar bioinformatics code.
If you just want groups of 3 characters, you can use std.range.chunks.
import std.conv : to;
import std.range : chunks;
import std.algorithm : map, equal;

void main()
{
    enum seq = "ATGTTTAAA";
    auto codons = seq.chunks(3).map!(x => x.to!string);
    assert(codons.equal(["ATG", "TTT", "AAA"]));
}
The foreach type of the chunks is Take!string, so you may or may not need the map!(x => x.to!string), depending on how you want to use the results.
For example, if you just want to print them:
foreach(codon ; "ATGTTTAAA".chunks(3)) { writeln(codon); }
import std.algorithm;
import std.regex;
import std.stdio;

int main()
{
    auto seq = "ATGTTTAAA";
    // The character class must include all four bases: A, C, G and T.
    auto rex = regex(r"[ACGT]{3}");
    auto codons = matchAll(seq, rex).map!"a[0]";
    writeln(codons);
    return 0;
}
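To make the chunk-then-map idea concrete, here is a short Python sketch of the same two steps. The names split_codons, translate and CODON_TABLE are hypothetical, the table holds only three entries for illustration, and "?" is an assumed placeholder for codons missing from it.

```python
# Partial codon table for illustration only; a real one has 64 entries.
CODON_TABLE = {"ATG": "M", "TTT": "F", "AAA": "K"}


def split_codons(seq, size=3):
    """Split a sequence into consecutive chunks of `size` characters."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def translate(seq):
    """Map each codon to its amino acid, using '?' for unknown codons."""
    return "".join(CODON_TABLE.get(codon, "?") for codon in split_codons(seq))


print(split_codons("ATGTTTAAA"))  # ['ATG', 'TTT', 'AAA']
print(translate("ATGTTTAAA"))     # MFK
```

Fixed-width slicing needs no delimiter or regex, which is the same reason chunks fits the problem in D.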

Replace C++ function with Regular Expression

I would like to convert the following C++ method to a regular expression match/replace string pair. Is it possible to do this in a single pass, i.e. with a single call to a regex replace method? (such as this one)
std::string f(std::string value)
{
    if (value.length() < 3)
    {
        value = std::string("0") + value;
    }
    value = value.substr(0, value.length() - 2) + std::string(".") + value.substr(value.length() - 2, 2);
    return value;
}
The input is a string of one or more digits.
Some examples:
f("1234") = "12.34"
f("123") = "1.23"
f("12") = "0.12"
f("1") = ".01"
The only way I've been able to achieve this so far is by using 2 steps:
1. Apply a prefix of "00" to the input string.
2. Use the following regex match/replace pair:
Match: (0*)(\d+)(\d{2})
Replace: $2.$3
My question is, can this be done in a single "pass" by only calling the Regex replace method once and without prepending anything to the string beforehand.
I believe this isn't possible with a single expression/replacement, but I'd just like someone to confirm that (or otherwise provide a solution :) ).
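To make the comparison concrete, here is a small Python sketch of both approaches from the question: a direct port of f, and the two-step workaround (prefix "00", then one regex replace). The name two_step is hypothetical, and Python's \2.\3 replaces the $2.$3 replacement syntax.

```python
import re


def f(value):
    """Direct port of the C++ function from the question."""
    if len(value) < 3:
        value = "0" + value
    return value[:-2] + "." + value[-2:]


def two_step(value):
    """Prefix '00', then apply the match/replace pair from the question."""
    return re.sub(r"(0*)(\d+)(\d{2})", r"\2.\3", "00" + value)


for digits in ["1234", "123", "12", "1"]:
    print(digits, f(digits), two_step(digits))
```

Under these assumptions the two agree on inputs of two or more digits, while for the single digit "1" the port yields ".01" and the two-step version yields "0.01".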
I hope this helps. (Edited a few times.)
#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    string a_ = "123456";
    a_ = "14";
    a_ = "9";
    string a = regex_replace(a_, regex("(.*)(.{2})|()"), string("$1.$2."));
    //a = regex_replace(regex_replace(a,regex("^"),string("00$1$2")),regex("(.+)(.{2})"),string("$1.$2"));
    //a = regex_replace("00"+a,regex("(.+)(.{2})"),string("$1.$2"));
    float i = atof(a.c_str());
    if (!(i)) // just go here for 0-9
    {
        i = atof((string("0.0") + a_).c_str());
    }
    cout << i << endl;
    return 0;
}

Pythonic way to rewrite the following C++ string processing code

Previously, I had C++ string-processing code that does the following.
input -> Hello 12
output-> Hello
input -> Hello 12 World
output-> Hello World
input -> Hello12 World
output-> Hello World
input -> Hello12World
output-> HelloWorld
The following is the C++ code.
std::string Utils::toStringWithoutNumerical(const std::string& str) {
    std::string result;
    bool alreadyAppendSpace = false;
    for (int i = 0, length = str.length(); i < length; i++) {
        const char c = str.at(i);
        if (isdigit(c)) {
            continue;
        }
        if (isspace(c)) {
            if (false == alreadyAppendSpace) {
                result.append(1, c);
                alreadyAppendSpace = true;
            }
            continue;
        }
        result.append(1, c);
        alreadyAppendSpace = false;
    }
    return trim(result);
}
What is the Pythonic way to implement this functionality in Python? Can a regular expression achieve it?
Thanks.
Edit: This reproduces more accurately what the C++ code does than the previous version.
import re
s = re.sub(r"\d+", "", s)
s = re.sub(r"(\s)\s*", r"\1", s)  # raw string, so \1 is a backreference
In particular, if the first whitespace in a run of several whitespaces is a tab, it will preserve the tab.
Further Edit: To replace by a space anyway, this works:
s = re.sub(r"\d+", "", s)
s = re.sub(r"\s+", " ", s)
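As a minimal sketch, the two substitutions above can be wrapped in a function and followed by strip(), which is assumed here to be equivalent to the trim call at the end of the C++ code. The function name to_string_without_numerical is a hypothetical Python rendering of the C++ name.

```python
import re


def to_string_without_numerical(s):
    """Drop digits, collapse whitespace runs to one space, trim the ends."""
    s = re.sub(r"\d+", "", s)   # remove every run of digits
    s = re.sub(r"\s+", " ", s)  # collapse each whitespace run to one space
    return s.strip()            # assumed equivalent of the C++ trim()


for text in ["Hello 12", "Hello 12 World", "Hello12 World", "Hello12World"]:
    print(repr(to_string_without_numerical(text)))
```

The four inputs are the examples from the question; without the final strip(), "Hello 12" would keep a trailing space.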
Python has a lot of built-in functions that can be very powerful when used together.
def RemoveNumeric(s):
    # Python 3: the third argument of str.maketrans lists characters to delete.
    return ' '.join(s.translate(str.maketrans('', '', '0123456789')).split())
>>> RemoveNumeric('Hello 12')
'Hello'
>>> RemoveNumeric('Hello 12 World')
'Hello World'
>>> RemoveNumeric('Hello12 World')
'Hello World'
>>> RemoveNumeric('Hello12World')
'HelloWorld'
import re
re.sub(r'[0-9]+', "", string)
import re
re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), string)
Breakdown:
\s* matches zero or more whitespace.
\d+ matches one or more digits.
The parentheses are used to capture the whitespace.
The replacement parameter is normally a string, but it can alternatively be a function which constructs the replacement dynamically.
lambda is used to create an inline function which returns whichever of the two capture groups is non-empty. This preserves a space if there was whitespace and returns an empty string if there wasn't any.
The regular expression answers are clearly the right way to do this. But if you're interested in how to do it without a regex engine, here's how:
class filterstate(object):
    def __init__(self):
        self.seenspace = False

    def include(self, c):
        isspace = c.isspace()
        if (not c.isdigit()) and (not (self.seenspace and isspace)):
            self.seenspace = isspace
            return True
        else:
            return False


def toStringWithoutNumerical(s):
    fs = filterstate()
    return ''.join((c for c in s if fs.include(c)))