dict to remove smart quotes [closed] - python-2.7

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
charmap = [
    (u"\u201c\u201d", "\""),
    (u"\u2018\u2019", "'")
]
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
print fixed
I was looking to write a script to replace smart quotes and curly apostrophes in text, similar to the one answered here. Would someone be kind enough to explain these two lines:
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
and possibly rewrite them in a longer-winded format, with comments that explain exactly what is going on? I'm a little confused about whether it's an inner/outer loop combination or a sequential check over items in a dictionary.

_map = dict((c, r) for chars, r in charmap for c in list(chars))
means:
_map = {}                  # an empty dictionary
for chars, r in charmap:   # chars - string of symbols to be replaced, r - replacement
    for c in chars:        # c - individual symbol from chars
        _map[c] = r        # adding entry replaced:replacement to the dictionary
and
fixed = "".join(_map.get(c, c) for c in s)
means
fixed = ""  # an empty string
for c in s:
    fixed = fixed + _map.get(c, c)  # first "c" is key, second is default for "not found"
since the .join method simply concatenates the elements of a sequence, using the given string as a separator between them (in this case "", i.e. no separator).

It's faster and more straightforward to use the built in string function translate:
#!python2
#coding: utf8
# Start with a Unicode string.
# Your codecs.open() will read the text in Unicode
text = u'''\
"Don't be dumb"
“You’re smart!”
'''
# Build a translation dictionary.
# Keys are Unicode ordinal numbers.
# Values can be ordinals, Unicode strings, or None (to delete)
charmap = { 0x201c : u'"',
            0x201d : u'"',
            0x2018 : u"'",
            0x2019 : u"'" }
print text.translate(charmap)
Output:
"Don't be dumb"
"You're smart!"

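The translate approach also works on Python 3, where every str is already Unicode. The sketch below assumes Python 3 and uses str.maketrans to build the table; the helper name straighten_quotes is hypothetical.

```python
# str.maketrans accepts a dict whose keys are single characters or ordinals.
table = str.maketrans({
    "\u201c": '"',
    "\u201d": '"',
    "\u2018": "'",
    "\u2019": "'",
})


def straighten_quotes(text):
    """Hypothetical helper: replace curly quotes with ASCII quotes."""
    return text.translate(table)


print(straighten_quotes("\u201cYou\u2019re smart!\u201d"))
```

Text that contains no curly quotes passes through unchanged, so the helper is safe to apply to already-clean input.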
Related

Splitting a string at every 2 newline characters in haskell [duplicate]

This question already has answers here:
What is the best way to split a string by a delimiter functionally?
(9 answers)
Closed 8 months ago.
My input looks like this, with the groups separated by an empty line:
abc

a
b
c

abc
abc
abc
abc
I need a function that would split it into something like
[ "abc"
, "a\nb\nc"
, "abc\nabc\nabc\nabc"]
I've tried using regexes, but
I can't import Text.Regex itself
Module Text.Regex.Base does not export splitStr

It's generally a bad idea to use regex in such cases, since it's less readable than the pure, concise code that can be used here.
For example, with foldr, the only case where we need to start a new string in the list of strings is when the last seen element and the current element are both newlines:
split :: FilePath -> IO [String]
split path = do
    text <- readFile path
    return $ foldr build [[]] (init text)
  where
    build c (s:ls) | s == [] || head s /= '\n' || c /= '\n' = (c:s) : ls
                   | otherwise = [] : tail s : ls
This code produces the aforementioned result when given a file with the aforementioned content.
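For comparison, the same grouping can be sketched in Python, where splitting on a two-newline delimiter is a single library call. This is an illustrative sketch rather than a translation of the fold: the function name split_paragraphs and the sample string are assumptions, and only one trailing newline is dropped, roughly as init text does in the Haskell version.

```python
def split_paragraphs(text):
    """Split text at every run of two newline characters."""
    # Drop one trailing newline so the last group has no dangling "\n".
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n\n")


# Hypothetical sample matching the shape of the input in the question.
sample = "abc\n\na\nb\nc\n\nabc\nabc\nabc\nabc\n"
print(split_paragraphs(sample))
```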

Regex find all first unique occurences of character in a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have following string
1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,
I would like to get only the first occurrence of any number without changing the order. This would be
1,2,3,a,b,c,
With this regex (found at https://stackoverflow.com/a/29480898/9307482) I can find them, but only the last occurrences. And this reverses the order.
(\w)(?!.*?\1) (https://regex101.com/r/3fqpu9/1)
It doesn't matter if the regex ignores the comma. The order is important.

Regular expressions are not meant for that purpose. You will need to use an index filter or a Set on an array of characters.
Since you didn't specify a language, I assume you are using JavaScript.
Example modified from: https://stackoverflow.com/a/14438954/1456201
String.prototype.uniqueChars = function() {
    return [...new Set(this)];
}
var unique = "1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,".split(",").join('').uniqueChars();
console.log(unique); // Array(6) [ "1", "2", "3", "a", "b", "c" ]
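The same idea can be sketched in Python, assuming the input is the comma-separated string from the question. The function name first_occurrences is hypothetical; the sketch relies on dict keys preserving insertion order, which is guaranteed from Python 3.7 on.

```python
def first_occurrences(csv_line):
    """Return the comma-separated tokens with later duplicates dropped."""
    # The trailing comma produces an empty token, which is skipped here.
    tokens = [t for t in csv_line.split(",") if t]
    # dict keys keep insertion order, so only the first hit of each token survives.
    return list(dict.fromkeys(tokens))


print(first_occurrences("1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,"))
```

Because it works on whole tokens rather than single characters, this version also handles multi-character items between the commas.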

I would use something like this:
// each index represents one digit: 0-9
const digits = new Array(10);
// make your string an array
const arr = '123abcabc123cba231'.split('');
// test for digit
var reg = new RegExp('^[0-9]$');
arr.forEach((val, index) => {
    // record only the first position at which each digit appears
    if (reg.test(val) && digits[val] === undefined) {
        digits[val] = index;
    }
});
console.log(`occurrences: ${digits}`); // [,0,1,2,,,,....]
To interpret the digits array: since there is nothing at index 0, you know the string contains no zeros. Since there is a 0 at index 1, you know that the first 1 appears at the first character of your string (index zero of the array). The first 2 appears at index 1, and so on.

A perl way to do the job:
use Modern::Perl;
my $in = '4,d,e,1,2,3,4,a,b,c,d,e,f,a,b,c,1,2,3,c,b,a,2,3,1,';
my (%h, @r);
for (split',',$in) {
    push @r, $_ unless exists $h{$_};
    $h{$_} = 1;
}
say join',',@r;
Output:
4,d,e,1,2,3,a,b,c,f

Check if two strings are anagrams in VB [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm making a little anagram solving game in VB and I'd like to verify that the given word fits the letters given. For example, the string eliter shouldn't match letter because t only appears once in the above string. I already have a check to determine if the submission is a word, but ideally, a check to make sure it fits the letters given would be wrapped around that statement.
Public Class Form1
    Public Function GenerateRandomString(ByRef iLength As Integer) As String
        Dim rdm As New Random()
        Dim allowChrs() As Char = "ABCDEFGHIJKLOMNOPQRSTUVWXYZ".ToCharArray()
        Dim sResult As String = ""
        For i As Integer = 0 To iLength - 1
            sResult += allowChrs(rdm.Next(0, allowChrs.Length))
        Next
        Return sResult
    End Function
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        countdown.Enabled = True
        lbl_countdown.Text = "10"
        lbl_anagram.Text = GenerateRandomString(9)
    End Sub
    Private Sub countdown_Tick(sender As Object, e As EventArgs) Handles countdown.Tick
        lbl_countdown.Text -= 1
        If lbl_countdown.Text < 1 Then
            countdown.Enabled = False
            MsgBox("End of game")
            Dim wordList As HashSet(Of String) = New HashSet(Of String)(File.ReadAllLines("words_alpha.txt"))
            If Not wordList.Contains(TextBox1.Text.ToString()) Then
                MsgBox("Not found")
            Else
                MsgBox("Found")
            End If
        End If
    End Sub
End Class
Note that this needs to be in VB and preferably requires no extra libraries, etc.
Also note that you don't have to use all the letters, but you can't use more than what's available!
Thanks in advance!

You just need to sort each string alphabetically and compare the sorted strings. Two anagrams produce the same string when their characters are sorted.
Sub Main()
    Dim s As String = "betjwepfw"
    Console.WriteLine(s)
    Console.WriteLine(New String(s.OrderBy(Function(c) c).ToArray()))
    Console.ReadLine()
End Sub
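Sorting only detects exact anagrams. Since the question allows a word that uses just some of the given letters, a per-letter count comparison may be closer to what is needed. The following is a sketch of that idea in Python rather than VB, under the assumption that matching is case-insensitive; the function name fits_letters is hypothetical.

```python
from collections import Counter


def fits_letters(word, letters):
    """True if `word` can be built from `letters`, using each letter at most once."""
    available = Counter(letters.lower())
    needed = Counter(word.lower())
    # Every letter must be needed no more often than it is available.
    return all(available[ch] >= n for ch, n in needed.items())


print(fits_letters("letter", "eliter"))  # "letter" needs two t's
print(fits_letters("tile", "eliter"))
```

The same logic can be written in VB with a Dictionary(Of Char, Integer) for each string, so no extra libraries would be required.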

def countchar() to find frequency of letters in one sentence [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
def countchar(str):
    list1 = [0]*26
    for i in range(0,len(str)):
        if (str[i] >= 'a' and str[i] <='z'):
            list1[ord(str[i])-0] += 1  # the line the asker highlighted
    print list1
if __name__ == "__main__":
    str = " GOOD morning and have a nice day"
    str = str.lower()
    print countchar(str)
There is an error in my code, so I can't achieve my goal.

Your main issue was that you needed to be subtracting ord("a") (which is 97) from each character to find its index in list1, not 0.
But I cleaned up the rest of your code too, since there were several other inefficiencies and bad practices.
def countchar(sentence):
    list1 = [0 for i in range(26)]
    for c in sentence:
        if 'a' <= c <= 'z':
            list1[ord(c) - ord("a")] += 1
    return list1
if __name__ == "__main__":
    string = " GOOD morning and have a nice day"
    string = string.lower()
    print countchar(string)
In particular, it's bad practice to use built-in names like str as variable names, because doing so shadows the built-in type.
Also, depending on what you're planning to do with this, it's likely that a dictionary would work better for your purposes than a list.
Here's a quick rewrite (with the additional functionality that it'll count all characters, not just lowercase letters) using a dictionary:
def countchar(sentence):
    char_dict = {}
    for c in sentence:
        if c in char_dict:
            char_dict[c] += 1
        else:
            char_dict[c] = 1
    return char_dict
if __name__ == "__main__":
    string = " GOOD morning and have a nice day!!"
    print "With uppercase (and punctuation, etc):"
    print countchar(string)
    print "All lowercase (and punctuation, etc):"
    string = string.lower()
    print countchar(string)
As requested in a comment, here is some clarification of the following line:
list1[ord(c) - ord("a")] += 1
First let's look at the inside, ord(c) - ord("a"). c is a string with only a single character in it, so ord(c) gives you the ASCII value for that string. Since you're mapping lowercase letters to the numbers 0, 1, ..., 25, we need to make sure that the letter "a" gets mapped to 0. Since, in ASCII, the letters are sequential (a=97, b=98, ..., z=122), then we can just subtract the smallest one from each one in order to map them:
a --> 97-97 = 0
b --> 98-97 = 1
c --> 99-97 = 2
...
z --> 122-97 = 25
So that's what ord(c) - ord("a") is doing. It's subtracting that 97 (which is the value of ord("a")) from each value, and giving a number between 0 and 25.
Then list1[ that number between 0 and 25 ] += 1 just increments the proper index in list1.
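On Python 3, the standard library can do the counting, and the index mapping described above falls out of the alphabet order. This is a sketch under that assumption, not a replacement for the explanation; it reuses the sentence from the question.

```python
from collections import Counter
import string


def countchar(sentence):
    """Return a 26-element list of counts for the letters a-z."""
    counts = Counter(sentence.lower())
    # ascii_lowercase is "abc...z", so position i corresponds to ord(c) - ord("a").
    return [counts[c] for c in string.ascii_lowercase]


result = countchar(" GOOD morning and have a nice day")
print(result)
```

Characters outside a-z, such as the spaces, are counted by Counter but never read back, so they do not affect the list.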

Find a substring between two optional markers [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I was trying to extract substrings from a string of the following form:
dest=comp;jump
I'm looking for a regexp to retrieve comp, but both dest and jump are optional, in which case = or ; is omitted. So these are all valid configurations:
dest=comp;jump
dest=comp
comp;jump
comp
dest, comp and jump are arbitrary strings, but do not contain equality signs nor semicolons.
What I managed to come up with is the following:
(?:=)([^;=]*)(?:;)
Unfortunately, it doesn't work when either dest or jump is omitted.

How about:
(?:.*=|^)([^;]+)(?:;|$)
The string you're searching is in group 1.
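A quick way to check this pattern against the four configurations from the question is a short script. The sketch below uses Python's re module as an assumption (the question does not name a language), and the helper name extract_comp is hypothetical.

```python
import re

# The pattern from the answer above; group 1 is expected to hold comp.
pattern = re.compile(r"(?:.*=|^)([^;]+)(?:;|$)")


def extract_comp(line):
    """Return group 1 of the first match, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None


for line in ["dest=comp;jump", "dest=comp", "comp;jump", "comp"]:
    print(line, "->", extract_comp(line))
```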

If the whole line has to have that form, this should do:
if line.chomp =~ /\A(?:[^;=]+=)?([^=;]+)(?:;[^;=]+)?\z/
  puts $1
end
This will skip ill-formed lines like
"dest=dest=comp;jump;jump"

I wouldn't try to make it all happen inside a single regular expression. That path makes it harder to read and maintain. Instead I'd break it into more atomic tests using case/when statements:
If you only want comp I'd use:
array = %w[
  dest=comp;jump
  dest=comp
  comp;jump
  comp
].map{ |str|
  case str
  when /.+=(.+);/, /=(.+)/, /(.+);/
    $1
  else
    str
  end
}
array
# => ["comp", "comp", "comp", "comp"]
The when clause breaks the complexity down into three small tests, each of them very easy to understand:
Does the string have both '=' and ';'? Then return the substring between those two characters.
Does the string have '='? Then return the last word.
Does the string have ';'? Then return the first word.
Return the word.
If you need to know which of your terms are being returned then a bit more code is needed:
%w[
  dest=comp;jump
  dest=comp
  comp;jump
  comp
].each{ |str|
  dest, comp, jump = case str
                     when /(.+)=(.+);(.+)/
                       [$1, $2, $3]
                     when /(.+)=(.+)/
                       [$1, $2, nil]
                     when /(.+);(.+)/
                       [nil, $1, $2]
                     else
                       [nil, str, nil]
                     end
  puts 'dest="%s" comp="%s" jump="%s"' % [dest, comp, jump]
}
# >> dest="dest" comp="comp" jump="jump"
# >> dest="dest" comp="comp" jump=""
# >> dest="" comp="comp" jump="jump"
# >> dest="" comp="comp" jump=""

I would just split the expression into two, which makes it easier to understand what is happening:
string = 'dest=comp;jump'
trimming_regexp = [/.*=/, /;.*/]
trimming_regexp.each{|exp| string.slice!(exp)}
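The same two-step trimming can be sketched in Python with re.sub, assuming each line holds a single instruction. The function name trim_to_comp is hypothetical.

```python
import re


def trim_to_comp(line):
    """Strip an optional 'dest=' prefix, then an optional ';jump' suffix."""
    line = re.sub(r"^.*=", "", line, count=1)  # drop everything up to and including '='
    line = re.sub(r";.*$", "", line, count=1)  # drop ';' and everything after it
    return line


for line in ["dest=comp;jump", "dest=comp", "comp;jump", "comp"]:
    print(line, "->", trim_to_comp(line))
```

Each substitution is a no-op when its marker is absent, which is what makes the optional parts easy to handle.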
