I am trying to write a regex to parse out seven match objects: four numbers and three operators.
Individual lines in the file look like this:
[ 9] -21 - ( 12) - ( -5) + ( -26) = ______
The number in brackets is the line number, which will be ignored. I want the four integer values (including the '-' if it is a negative integer), which in this case are -21, 12, -5 and -26. I also want the operators, which are -, - and +.
I will then take those values (match objects) and actually compute the answer:
-21 - 12 - -5 + -26 = -54
I have this:
[\s+0-9](-?[0-9]+)
In Pythex it grabs the [ 9] but it also then grabs every integer in separate match objects (four additional match objects). I don't know why it does that.
If I add a ? to the end: [\s+0-9](-?[0-9]+)? thinking it will only grab the first integer, it doesn't. I get seventeen matches?
I am trying to say, via the regex: grab the line number and its brackets (that part works), then grab the first integer including sign, then the operator, then the next integer including sign, then the next operator, etc.
It appears that I have failed to explain myself clearly.
The file has hundreds of lines. Here is a five line sample:
[ 1] 19 - ( 1) - ( 4) + ( 28) = ______
[ 2] -18 + ( 8) - ( 16) - ( 2) = ______
[ 3] -8 + ( 17) - ( 15) + ( -29) = ______
[ 4] -31 - ( -12) - ( -5) + ( -26) = ______
[ 5] -15 - ( 12) - ( 14) - ( 31) = ______
The operators are only '-' or '+', and any combination of them may appear in the three operator positions of a line. The integers will all be from -99 to 99, but that shouldn't matter if the regex works. The goal (as I see it) is to extract seven match objects, four integers and three operators, then combine the numbers exactly as they appear. The number in brackets is just the line number and plays no role in the computation.
Good luck with the regex. If you just need the result:
import re

s = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
s = s[s.find("]")+1:s.find("=")]  # cut away line nr and = ...
# weak attempt to prevent python code injection
# ('-' goes first in the class so it is a literal, not a range like +-0)
if not re.sub(r"[-+0-9() ]*", "", s):
    print(eval(s))
else:
    print("wonky chars inside, only numbers, +, - , space and () allowed.")
Output:
-54
Make sure to read up on eval()
and have a look into:
https://opensourcehacker.com/2014/10/29/safe-evaluation-of-math-expressions-in-pure-python/
https://softwareengineering.stackexchange.com/questions/311507/why-are-eval-like-features-considered-evil-in-contrast-to-other-possibly-harmfu/311510
https://www.kevinlondon.com/2015/07/26/dangerous-python-functions.html
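In the spirit of the linked articles, one way to avoid eval() altogether is to parse the expression with the standard ast module and only accept the node types you expect. The following is a minimal sketch, not a hardened implementation; the name safe_eval is hypothetical, and it assumes only integers, +, -, unary signs and parentheses should be allowed:

```python
import ast
import operator

# Whitelisted operators: anything else is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expr):
    """Evaluate an expression made of integers, +, - and parentheses."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression element: " + type(node).__name__)

    # strip() because the slice between "]" and "=" starts with a space
    return walk(ast.parse(expr.strip(), mode="eval"))


line = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
print(safe_eval(line[line.find("]") + 1:line.find("=")]))
```

Function calls, attribute access and names all raise ValueError here because they are not in the whitelist.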
Example for hundreds of lines:
import random
import re

def calcIt(line):
    s = line[line.find("]")+1:line.find("=")]
    # '-' goes first in the class so it is a literal, not a range
    if not re.sub(r"[-+0-9() ]*", "", s):
        return eval(s)
    else:
        print(line + " has wonky chars inside, only numbers, +, - , space and () allowed.")
        return None

random.seed(42)
pattern = "[ {}] -{} - ( {}) - ( -{}) + ( -{}) = "
for n in range(1000):
    nums = [n]
    nums.extend([random.randint(0, 100), random.randint(-100, 100),
                 random.randint(-100, 100), random.randint(-100, 100)])
    c = pattern.format(*nums)
    print(c, calcIt(c))
Ahh... I had a cup of coffee and sat down in front of Pythex again.
I figured out the correct regex:
[\s+0-9]\s+(-?[0-9]+)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)
Yields:
-21
-
12
-
-5
+
-26
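For completeness, here is a hedged sketch of how the seven captured groups could be turned into the answer in Python without eval(). The pattern is a whitespace-tolerant variant written for this sketch (it uses \s* because the exact spacing in the file is an assumption), and the names LINE_RE and evaluate_line are hypothetical. Note that inside a character class | is a literal pipe, so [-+] is enough to match either operator:

```python
import re

# Line number in brackets (ignored), then: int, op, (int), op, (int), op, (int)
LINE_RE = re.compile(
    r"\[\s*\d+\]\s*"
    r"(-?\d+)"
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"
)


def evaluate_line(line):
    """Return the value of one worksheet line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    first, *rest = match.groups()
    total = int(first)
    # rest alternates operator, number, operator, number, ...
    for op, num in zip(rest[0::2], rest[1::2]):
        total += int(num) if op == "+" else -int(num)
    return total


print(evaluate_line("[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"))
```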
I would like to see continued fractions with integers displayed in explicit continued fraction form with SymPy, but I cannot seem to make SymPy comply. I found this Stack Overflow question and answer very useful (see farther below), but cannot reach my target goal here:
$$\frac{13}{5} = \boxed{2} + \cfrac{1}{\boxed{1} + \cfrac{1}{\boxed{1} + \cfrac{1}{\boxed{2}}}}$$
This is the continued fraction expansion of $\frac{13}{5}$. A common notation for this expansion is to give only the boxed terms, as SymPy does below, i.e., $[2,1,1,2]$ from the SymPy continued_fraction_iterator:
from sympy import Rational
from sympy.ntheory.continued_fraction import continued_fraction_iterator

Rat_13_5 = list(continued_fraction_iterator(Rational(13, 5)))
print( Rat_13_5 )
With output [2, 1, 1, 2].
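To make the expansion concrete without SymPy, the terms can be reproduced with the standard library's Fraction type. This is only an illustrative sketch; the helper names continued_fraction_terms and nested_string are hypothetical and are not part of SymPy:

```python
from fractions import Fraction


def continued_fraction_terms(x):
    """Continued fraction terms of a rational number (Euclidean algorithm)."""
    x = Fraction(x)
    terms = []
    while True:
        whole = x.numerator // x.denominator  # integer part
        terms.append(whole)
        x -= whole
        if x == 0:
            return terms
        x = 1 / x  # continue with the reciprocal of the remainder


def nested_string(terms):
    """Render the terms as an explicit nested fraction, innermost term first."""
    text = str(terms[-1])
    for term in reversed(terms[:-1]):
        text = f"{term} + 1/({text})"
    return text


terms = continued_fraction_terms(Fraction(13, 5))
print(terms)
print(nested_string(terms))
```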
Page 37 of the SymPy manual (release 1.5, Dec 9, 2019) gives a code snippet that rebuilds the fraction from such an expansion list:
def list_to_frac(l):
    expr = Integer(0)
    for i in reversed(l[1:]):
        expr += i
        expr = 1/expr
    return l[0] + expr
If you invoke list_to_frac with the Rat_13_5 continued fraction expansion list, SymPy takes off and evaluates it:
print( list_to_frac( Rat_13_5 ) )
with output 13/5
If you use a list of symbols instead, then list_to_frac prints the desired continued fraction, e.g.,
n1, n2, n3, n4, n5, n6, n7, n8, n9 = symbols('n1:10')
cont_frac_list = [n2, n1, n1, n2]
contfrac12201015 = list_to_frac( cont_frac_list )
contfrac12201015
Which produces the desired form (I am working in a JupyterLab environment, so I am actually obtaining typeset LaTeX output throughout):
n2 + 1/(n1 + 1/(n1 + 1/n2))
I rewrote list_to_frac to use the UnevaluatedExpr facility presented by Francesco in the StackOverflow question I cited earlier:
def list_to_frac_noEval(l):
    expr = Integer(0)
    for i in reversed(l[1:]):
        expr = UnevaluatedExpr(expr + i)
        expr = UnevaluatedExpr( 1/expr )
    return l[0] + expr
Invoking list_to_frac_noEval on the $\frac{13}{5}$ expansion list:
list_to_frac_noEval( [2,1,1,2] )
I obtain output
2 + (1 + (1 + 2**(-1))**(-1))**(-1)
Some folks use that notation, for example Roger Penrose in §3.2 of The Road to Reality (2004), so I wanted to share list_to_frac_noEval in any case; it is better than ending up with a single evaluated rational if you want to see the continued fraction. I still find it annoying that I cannot obtain the explicit continued fraction format when using integers instead of symbols.
I experimented with substituting in integers for symbols with evaluate=False, using both the subs method and the Subs function, and looked at various combinations of sympify, srepr and parse_expr with evaluate=False, but cannot persuade SymPy 1.4 to print the explicit fraction form that I obtain with list_to_frac operating on symbol arguments. Is there a way to accomplish this short of modifying SymPy code or special casing a particular set of numbers?
You can construct the expression explicitly passing evaluate=False to each part of the expression tree:
def list_to_frac(l):
    expr = Integer(0)
    for i in reversed(l[1:]):
        expr = Add(i, expr, evaluate=False)
        expr = Pow(expr, -1, evaluate=False)
    return Add(l[0], expr, evaluate=False)
That gives:
In [2]: nums = list(continued_fraction_iterator(Rational(13, 5)))
In [3]: nums
Out[3]: [2, 1, 1, 2]
In [4]: list_to_frac(nums)
Out[4]:
      1
───────────── + 2
    1
───────── + 1
  1
───── + 1
0 + 2
It looks like it's the wrong way around but that's just the way the printing works with default settings:
In [5]: init_printing(order='old')
In [6]: list_to_frac(nums)
Out[6]:
          1
2 + ─────────────
            1
    1 + ─────────
              1
        1 + ─────
            0 + 2
You can trigger evaluation with doit:
In [7]: _.doit()
Out[7]: 13/5
I'm trying to do this: if the last column is a negative number from -1 to -5, write the second and last columns to the file "neg.txt". If the last column is a positive number, the second and last columns need to be written to "pos.txt". Both of my output files end up empty after execution. I don't know what's wrong with the code, since I think an if statement can handle multiple conditions. I also tried regular expressions, but that didn't work, so I made the code as simple as possible to see what is not working.
The input file looks like this:
abandon odustati -2
abandons napusta -2
abandoned napusten -2
absentee odsutne -1
absentees odsutni -1
aboard na brodu 1
abducted otet -2
accepted prihvaceno 1
My code is:
from urllib.request import urlopen
import re

pos=open('lek_pos.txt','w')
neg=open('lek_neg.txt','w')
allCondsAreOK1 = ( parts[2]=='1' and parts[2]=='2' and
                   parts[2]=='3' and parts[2]=='4' and parts[2]=='5' )
allCondsAreOK2 = ( parts[2]=='-1' and parts[2]=='-2' and
                   parts[2]=='-3' and parts[2]=='-4' and parts[2]=='-5' )
with open('leksicki_resursi.txt') as pos:
    for line in pos:
        parts=line.split() # split line into parts
        if len(parts) > 1: # if at least 2 columns (parts)
            if allCondsAreOK:
                pos.write(parts[1]+parts[2])
            elif allCondsAreOK2:
                neg.write(parts[1]+parts[2])
            else:
                print("nothing matches")
You don't need a regex. Cast the last value to int and use an if/elif: if the value falls between -5 and -1, write to the neg file; if it is any non-negative number, write to the pos file:
with open('leksicki_resursi.txt') as f, open('lek_pos.txt','w') as pos, open('lek_neg.txt','w') as neg:
    for row in map(str.split, f):
        a, b = row[1], int(row[-1])
        if b >= 0:
            pos.write("{},{}\n".format(a, b))
        elif -5 <= b <= -1:
            neg.write("{},{}\n".format(a, b))
If the positive numbers must also be between 1 and 5, you can mirror the negative condition (b is already an int at this point):
        if 1 <= b <= 5:
            pos.write("{},{}\n".format(a, b))
        elif -5 <= b <= -1:
            neg.write("{},{}\n".format(a, b))
Also if you have empty lines you can filter them out:
for row in filter(None,map(str.split, f)):
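As for why the original conditions never fire: parts[2]=='1' and parts[2]=='2' asks a single string to equal two different values at once, which is always False, and the conditions are also evaluated once before the loop rather than per line. A membership or range test per row avoids both problems. The sketch below works on an in-memory list of lines instead of files so it can be run as is; the name split_by_score is hypothetical, and joining row[1:-1] is an assumption made here so that multi-word translations such as "na brodu" stay intact:

```python
def split_by_score(lines):
    """Split rows into (positive, negative) lists of (translation, score)."""
    pos, neg = [], []
    for row in filter(None, map(str.split, lines)):  # skip empty lines
        if len(row) < 3:
            continue
        try:
            score = int(row[-1])
        except ValueError:
            continue  # last column is not a number
        entry = (" ".join(row[1:-1]), score)
        if score in range(1, 6):        # 1..5
            pos.append(entry)
        elif score in range(-5, 0):     # -5..-1
            neg.append(entry)
    return pos, neg


sample = [
    "abandon odustati -2",
    "absentee odsutne -1",
    "",
    "aboard na brodu 1",
    "accepted prihvaceno 1",
]
positive, negative = split_by_score(sample)
print(positive)
print(negative)
```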
I have a bunch of email subject lines and I'm trying to detect whether a value in a given range is present. This is how I'm trying to do it, but I am not getting the results I'd like:
library(stringi)
df1 <- data.frame(id = 1:5, string1 = NA)
df1$string1 <- c('15% off','25% off','35% off','45% off','55% off')
df1$pctOff10_20 <- stri_match_all_regex(df1$string1, '[10-20]%')
  id string1 pctOff10_20
1  1 15% off          NA
2  2 25% off          NA
3  3 35% off          NA
4  4 45% off          NA
5  5 55% off          NA
I'd like something like this:
  id string1 pctOff10_20
1  1 15% off           1
2  2 25% off           0
3  3 35% off           0
4  4 45% off           0
5  5 55% off           0
Here is one way to do it:
df1$pctOff10_20 <- stri_count_regex(df1$string1, '^(1\\d|20)%')
Explanation:
^        the beginning of the string
(        group and capture to \1:
  1        '1'
  \d       digits (0-9)
 |        OR
  20       '20'
)        end of \1
%        '%'
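For comparison, the same pattern can be tried in Python's re module. This is a sketch of the pattern logic only, not of the stringi call; the names PCT_10_20 and has_pct_10_20 are hypothetical. It also shows why a character class such as [10-20] cannot express a numeric range: a class matches a single character, so the range has to be spelled out as "1 followed by any digit, or 20":

```python
import re

# 10-19 is "1" plus one digit; 20 is matched literally.
PCT_10_20 = re.compile(r"^(1\d|20)%")


def has_pct_10_20(subject):
    """Return 1 if the subject starts with a 10-20% discount, else 0."""
    return int(PCT_10_20.search(subject) is not None)


subjects = ["15% off", "25% off", "35% off", "45% off", "55% off"]
print([has_pct_10_20(s) for s in subjects])
```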
1) strapply in gsubfn can do that by combining a regex (pattern= argument) and a function (FUN= argument). Below we use the formula representation of the function. Alternatively, we could make use of between from data.table (or a number of other packages). This extracts the matches to the pattern, applies the function to them and returns the result, simplifying it into a vector (rather than a list):
library(gsubfn)
btwn <- function(x, a, b) as.numeric(a <= as.numeric(x) & as.numeric(x) <= b)
transform(df1, pctOff10_20 =
  strapply(
    X = string1,
    pattern = "\\d+",
    FUN = ~ btwn(x, 10, 20),
    simplify = TRUE
  )
)
2) A base solution using the same btwn function defined above is:
transform(df1, pctOff10_20 = btwn(gsub("\\D", "", string1), 10, 20))
I'm trying to figure out how I would go about formatting a large number to the shorter version by appending 'k' or 'm' using Lua. Example:
17478 => 17.5k
2832 => 2.8k
1548034 => 1.55m
I would like to have the rounding in there as well as per the example. I'm not very good at Regex, so I'm not sure where I would begin. Any help would be appreciated. Thanks.
Pattern matching doesn't seem like the right direction for this problem.
Assuming 2 digits after decimal point are kept in the shorter version, try:
function foo(n)
    if n >= 10^6 then
        return string.format("%.2fm", n / 10^6)
    elseif n >= 10^3 then
        return string.format("%.2fk", n / 10^3)
    else
        return tostring(n)
    end
end
Test:
print(foo(17478))
print(foo(2832))
print(foo(1548034))
Output:
17.48k
2.83k
1.55m
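The question's examples use one decimal for thousands (17.5k, 2.8k) and two for millions (1.55m), which differs slightly from the output above. A Python rendering of the same threshold logic with those precisions might look like this; it is a sketch under that reading of the examples, and the name short_number is hypothetical:

```python
def short_number(n):
    """Abbreviate n with 'k' (one decimal) or 'm' (two decimals)."""
    if n >= 10**6:
        return f"{n / 10**6:.2f}m"
    if n >= 10**3:
        return f"{n / 10**3:.1f}k"
    return str(n)


for value in (17478, 2832, 1548034, 999):
    print(value, "=>", short_number(value))
```

Values just below a threshold can still round up to the next power of ten (for example to 1000.0k), so a production version would need an extra check for that case.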
Here is a longer form, which uses the hint from Tom Blodget.
Maybe it's not the perfect form, but it's a little more specific.
For Lua 5.0, replace #steps with table.getn(steps).
function shortnumberstring(number)
    local steps = {
        {1,""},
        {1e3,"k"},
        {1e6,"m"},
        {1e9,"g"},
        {1e12,"t"},
    }
    for _,b in ipairs(steps) do
        if b[1] <= number+1 then
            steps.use = _
        end
    end
    local result = string.format("%.1f", number / steps[steps.use][1])
    if tonumber(result) >= 1e3 and steps.use < #steps then
        steps.use = steps.use + 1
        result = string.format("%.1f", tonumber(result) / 1e3)
    end
    --result = string.sub(result,0,string.sub(result,-1) == "0" and -3 or -1) -- Remove .0 (just if it is zero!)
    return result .. steps[steps.use][2]
end
print(shortnumberstring(100))
print(shortnumberstring(200))
print(shortnumberstring(999))
print(shortnumberstring(1234567))
print(shortnumberstring(999999))
print(shortnumberstring(9999999))
print(shortnumberstring(1345123))
Result:
> dofile"test.lua"
100.0
200.0
1.0k
1.2m
1.0m
10.0m
1.3m
>
And if you want to get rid of the "XX.0", uncomment the line before the return.
Then our result is:
> dofile"test.lua"
100
200
1k
1.2m
1m
10m
1.3m
>
I am attempting to write elements from a nested list to individual lines in a file, with each element separated by tab characters. Each of the nested lists is of the following form:
('A', 'B', 'C', 'D')
The final output should be of the form:
A B C D
E F G H
. . . .
. . . .
However, my output seems to have reproducible inconsistencies such that the output is of the general form:
A B C D
E F G H
I J K L
M N O P
. . . .
. . . .
I've inspected the lists before writing and they seem identical in form. The code I'm using to write is:
with open("letters.txt", 'w') as outfile:
    outfile.writelines('\t'.join(line) + '\n' for line in letter_list)
Importantly, if I replace '\t' with, for example, '|', the file is created without such inconsistencies. I know whitespace parsing can become an issue for certain file I/O operations, but I don't know how to troubleshoot it here.
Thanks for the time.
EDIT: Here is some actual input data (in nested-list form) and output:
IN
[('5', '+', '5752624-5752673', 'alt_region_8161'), ('1', '+', '621461-622139', 'alt_region_67'), ('1', '+', '453907-454063', 'alt_region_60'), ('1', '+', '539611-539815', 'alt_region_61'), ('4', '+', '14610049-14610103', 'alt_region_6893'), ('4', '+', '14610049-14610144', 'alt_region_6895'), ('4', '+', '14610049-14610144', 'alt_region_6897'), ('4', '+', '14610049-14610144', 'alt_region_6896')]
OUT
4 + 12816011-12816087 alt_region_6808
1 + 21214720-21214747 alt_region_2377
4 + 9489968-9490833 alt_region_7382
1 + 12121545-12126263 alt_region_650
4 + 9489968-9490811 alt_region_7381
4 + 12816011-12816087 alt_region_6807
1 + 2032338-2032740 alt_region_157
5 + 4695084-4695628 alt_region_9316
1 + 22294677-22295134 alt_region_2424
1 + 22294677-22295139 alt_region_2425
1 + 22294677-22295139 alt_region_2426
1 + 22294677-22295139 alt_region_2427
1 + 22294677-22295134 alt_region_2422
1 + 22294677-22295134 alt_region_2423
1 + 22294384-22295198 alt_region_2428
1 + 22294384-22295198 alt_region_2429
5 + 20845105-20845211 alt_region_9784
5 + 20845105-20845206 alt_region_9783
3 + 2651447-2651889 alt_region_5562
EDIT: Thanks to everyone who commented. Sorry if the question was poorly phrased. I appreciate the help in clarifying the issue (or, apparently, non-issue).
There are no spaces (' ') in your output, only tabs ('\t').
>>> print(repr('1 + 21214720-21214747 alt_region_2377'))
'1\t+\t21214720-21214747\talt_region_2377'
  ^^ ^^                 ^^
Tabs are not equivalent to a fixed number of spaces (in most editors). Rather, they move the character following the tab to the next available multiple of x characters from the left margin, where x varies - x is most commonly 8, though it is 4 here on SO.
>>> for i in range(7):
        print('x'*i+'\tx')
    x
x   x
xx  x
xxx x
xxxx    x
xxxxx   x
xxxxxx  x
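The effect of different tab widths can be reproduced without relying on how an editor or web page renders tabs by using str.expandtabs, which replaces each tab with the spaces needed to reach the next tab stop. A small sketch (the helper name show_tab_stops is hypothetical):

```python
def show_tab_stops(tabsize):
    """Expand 'x' * i + TAB + 'x' for i in 0..6 using the given tab width."""
    return [("x" * i + "\tx").expandtabs(tabsize) for i in range(7)]


for size in (8, 4):
    print(f"tab width {size}:")
    for line in show_tab_stops(size):
        print(repr(line))
```

With a width of 8 the second x always lands in the same column; with a width of 4 it jumps to the next stop as soon as the prefix reaches four characters, which is the kind of apparent misalignment described in the question.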
If you want your output to appear aligned to the naked eye, you should use string formatting:
>>> for line in data:
        print('{:4} {:4} {:20} {:20}'.format(*line))
5    +    5752624-5752673      alt_region_8161
1    +    621461-622139        alt_region_67
1    +    453907-454063        alt_region_60
1    +    539611-539815        alt_region_61
4    +    14610049-14610103    alt_region_6893
4    +    14610049-14610144    alt_region_6895
4    +    14610049-14610144    alt_region_6897
4    +    14610049-14610144    alt_region_6896
Note, however, that this will not necessarily be readable by code that expects a tab-separated value file.
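If the file is meant to be consumed by other programs, keeping real tabs is usually the safer choice, and a round trip through the csv module is a quick way to confirm the fields survive intact. The sketch below uses an in-memory buffer instead of letters.txt so it can be run as is; to_tsv and from_tsv are hypothetical helper names:

```python
import csv
import io

rows = [
    ("5", "+", "5752624-5752673", "alt_region_8161"),
    ("1", "+", "621461-622139", "alt_region_67"),
]


def to_tsv(records):
    """Serialise records as tab-separated text, one record per line."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(records)
    return buffer.getvalue()


def from_tsv(text):
    """Parse tab-separated text back into a list of tuples."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return [tuple(record) for record in reader]


text = to_tsv(rows)
print(repr(text))
print(from_tsv(text) == rows)
```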
In some text editors, tabs are displayed like that. The contents of the file are correct, it's just a matter of how the file is displayed on screen. It happens with tabs but not with | which is why you don't see it happening when you use |.