In a C++ project, I'm writing a parser to read from and write to .ics files. To do that, I test my parser against several files covering as many cases as possible, from several sources (Gmail, Yahoo, ...). Recently I found in a test file a situation that leaves me a little confused, and for which I could not find a satisfactory answer.
One of my test files failed to be imported by my parser. The VEVENT that causes the issue contains the following start date:
DTSTART;TZID=GMT+04:00:20120103T120000
This event matches a VTIMEZONE that declares its TZID as follows:
TZID:GMT+04:00
AFAIK the ":" char should instead be used as a separator, and I suspect that the VTIMEZONE itself is malformed in the sample described above, but I didn't find any document that explicitly specifies that this situation can never happen. I also noticed that several apps like Thunderbird also fail to import this file, apparently for the same reason.
So my question is: can a TZID parameter in a VTIMEZONE contain a ":" char?
Also, I don't know whether I should use the TZID content as a key to extract the ISO date from the DTSTART line, or simply reject such an event, tag it as corrupted, and show an error message after the import.
The authority on this is RFC5545 and the relevant subsections:
RFC5545 3.2.19. Time Zone Identifier
tzidparam = "TZID" "=" [tzidprefix] paramtext
which is complemented by
RFC5545 3.1. Content Lines
param = param-name "=" param-value *("," param-value)
param-value = paramtext / quoted-string
paramtext = *SAFE-CHAR
[..]
SAFE-CHAR = WSP / %x21 / %x23-2B / %x2D-39 / %x3C-7E
/ NON-US-ASCII
; Any character except CONTROL, DQUOTE, ";", ":", ","
From which we can conclude that a Time Zone ID parameter cannot contain a ":" char.
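The grammar above can be turned into a small check. This is only a sketch, written in Python for brevity rather than in the asker's C++, and the function name is_paramtext is made up for illustration. It tests each character against the SAFE-CHAR ranges quoted from section 3.1.

```python
def is_paramtext(value):
    """Return True if every character of value is a SAFE-CHAR (RFC 5545, 3.1)."""
    for ch in value:
        code = ord(ch)
        if ch in ' \t':                      # WSP
            continue
        if (code == 0x21 or 0x23 <= code <= 0x2B
                or 0x2D <= code <= 0x39 or 0x3C <= code <= 0x7E):
            continue                         # the ASCII ranges listed in SAFE-CHAR
        if code >= 0x80:                     # NON-US-ASCII
            continue
        return False                         # CONTROL, DQUOTE, ";", ":" or ","
    return True
```

With this check, "GMT+04:00" is rejected because ":" is %x3A, which falls in the gap between %x2D-39 and %x3C-7E, while an identifier such as "Europe/Paris" passes.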
Background
I've built and published a script to retrieve daily short volume data via the Quandl data connector:
Short Volume Script
I actually just discovered that this script doesn't correctly access the data for preferred or classed shares like BRK.A, BRK.B because of the literal period in the symbol / ticker code which on Quandl I believe is either a slash or an underscore.
Code
This is what I've currently used in my script so far:
quandl_ticker = "QUANDL:FINRA/FNSQ_" + syminfo.ticker
quandl_dly_sh_vol = security(quandl_ticker, "D", close )
What I'm looking for would be something to the effect of:
"QUANDL:FINRA/FNSQ_" + substitute( syminfo.ticker, ".", "_" )
Which transforms the BRK.B into BRK_B. I hope that's clear enough.
The str.replace_all built-in function sounds like what you are looking for:
quandl_ticker = "QUANDL:FINRA/FNSQ_" + str.replace_all(syminfo.ticker, ".", "_")
I'm brand new to using PBI but as far as I can tell, I should be able to substitute a parameter as part of a Direct Query in place of a hard-coded variable...ie
let
Source = Sql.Database("NAMEOFDB", "CMUtility", [Query="sp_get_residentsinfo "& home_name]),.....
instead of
let
Source = Sql.Database("NAMEOFDB", "CMUtility", [Query="sp_get_residentsinfo 'NAME OF HOME'"]),...
However, the parameter-included version just says
DataSource.Error: Microsoft SQL: Incorrect syntax near 'House'.
Details:
DataSourceKind=SQL
DataSourcePath=NAMEOFDB;CMUtility
Message=Incorrect syntax near 'House'.
Number=102
Class=15
"House" is the last word of the value currently assigned to the home_name parameter. What have I done wrong?
PS - I have surmised that I shouldn't need the extra & at the end of the parameter, as I'm not adding anything else to the query, but even with both &s it still doesn't work.
The type of your parameter is text. In SQL, text literals must be quoted, i.e. sp_get_residentsinfo 'NAME OF HOME', but the statement you built is sp_get_residentsinfo NAME OF HOME.
You should use Text.Replace to escape single quotes in the parameter's value and append a quote before and after it.
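To make the intended string explicit, here is a sketch of the same quoting logic. It is written in Python purely as an illustration, not as Power Query M, and build_statement is a hypothetical name. Each single quote inside the value is doubled, then the whole value is wrapped in single quotes.

```python
def build_statement(home_name):
    """Build the SQL text with home_name as a quoted, escaped literal."""
    escaped = home_name.replace("'", "''")   # double any embedded single quotes
    return "sp_get_residentsinfo '" + escaped + "'"
```

For the value NAME OF HOME this yields sp_get_residentsinfo 'NAME OF HOME', which is the hard-coded form that already worked in the question.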
I am currently learning Python 2.7 and am really impressed by how much it can do.
Right now, I'm working my way through basics such as functions and loops. I'd reckon a more 'real-world' problem would spur me on even further.
I use a satellite recording device to capture TV shows etc to hard drive.
The naming convention is set by the device itself. This makes the shows you want to watch harder to find after recording, as the show name is preceded by lots of redundant info...
The recordings (in .mts format) are dumped into a folder called "HBPVR" at the root of the drive. I'd be running the script on my Mac when the drive is connected to it.
Example.
"Channel_4_+1-15062015-2100-Exams__Cheating_the_....mts"
or
"BBC_Two_HD-19052015-2320-Newsnight.mts"
I included the double-quotes.
I'd like a Python script that (ideally) would remove the broadcaster name, reformat the date info, strip the time info and then put the show's name to the front of the file name.
E.g "BBC_Two_HD-19052015-2320-Newsnight.mts" ->> "Newsnight 19 May 2015.mts"
What may complicate matters is that the broadcaster names are not all of equal length.
The main pattern is that broadcaster name runs up until the first hyphen.
I'd like to be able to re-run this script at later points for newer recordings and not have already renamed recordings renamed further.
Thanks.
Try this:
import calendar
filename = "BBC_Two_HD-19052015-2320-Newsnight.mts"
# Remove broadcaster name (everything up to the first hyphen)
rest = '-'.join(filename.split("-")[1:])
# Get show name (everything after the date and time fields, minus the extension)
show = ''.join(' '.join(rest.split("-")[2:]).split(".mts")[:-1])
# Get date string (DDMMYYYY)
datestr = rest.split("-")[0]
day = int(datestr[0:2]) # The day is the first two digits
month = calendar.month_name[int(datestr[2:4])] # The month is the third and fourth digits
year = datestr[4:8] # The year is the fifth through eighth digits
# And the new string:
new = show + " " + str(day) + " " + month + " " + year + ".mts"
print(new) # "Newsnight 19 May 2015.mts"
I wasn't quite sure what the '2320' was, so I chose to ignore it.
Thanks Coder256.
That has given me a bit more insight into how Python can actually help solve real world (first world!) problems like mine.
I tried it out with some different combos of broadcaster and show names and it worked.
I would like though to use the script to rename a batch of recordings/files inside the folder from time to time.
The script did throw an error when processing an already renamed recording, which is to be expected I guess. Should the renamed file have a special character at the start of its name to help avoid this happening?
e.g "_Newsnight 19 May 2015.mts"
Or is there a more aesthetically pleasing way of doing this, without special chars being added on etc.?
Thanks.
One way to approach this, since you have a defined pattern, is to use regular expressions:
>>> import datetime
>>> import re
>>> s = "BBC_Two_HD-19052015-2320-Newsnight.mts"
>>> ts, name = re.findall(r'.*?-(\d{8}-\d{4})-(.*?)\.mts', s)[0]
>>> '{} {}.mts'.format(name, datetime.datetime.strptime(ts, '%d%m%Y-%H%M').strftime('%d %b %Y'))
'Newsnight 19 May 2015.mts'
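The same pattern can also cover the re-run requirement: a file that has already been renamed no longer matches the device's naming scheme, so it can simply be skipped and no marker character is needed. The following is a minimal sketch under that assumption. The names RECORDING, new_name and rename_all are made up for illustration, and the folder path is left as an argument because the full path to HBPVR on the Mac is not given.

```python
import datetime
import os
import re

# Device naming scheme: <broadcaster>-<DDMMYYYY>-<HHMM>-<show>.mts
RECORDING = re.compile(r'^.*?-(\d{8})-\d{4}-(.*)\.mts$')

def new_name(filename):
    """Return the tidied name, or None if filename is not in the device's format."""
    match = RECORDING.match(filename)
    if match is None:
        return None  # already renamed, or not a recording at all
    date, show = match.groups()
    when = datetime.datetime.strptime(date, '%d%m%Y')
    return '{} {}.mts'.format(show, when.strftime('%d %b %Y'))

def rename_all(folder):
    """Rename every matching recording in folder and leave other files alone."""
    for filename in os.listdir(folder):
        target = new_name(filename)
        if target is not None:
            os.rename(os.path.join(folder, filename),
                      os.path.join(folder, target))
```

Calling rename_all on the HBPVR folder repeatedly should be safe, because new_name returns None for anything that does not look like a fresh recording. Note that '%b' gives the abbreviated month name, so use '%B' if full month names are preferred.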
I'm trying to import some publicly available life outcomes data using the code below:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE)
Naturally, the imported data frame doesn't look good:
I would like to amend my column names using the code below:
# Clean column names
names(simd.sg.xls) <- make.names(names = as.character(simd.sg.xls[1,]),
unique = TRUE,allow_ = TRUE)
But it produces rather unpleasant results:
> names(simd.sg.xls)
[1] "X1" "X1.1" "X771" "X354" "X229" "X74" "X67" "X33" "X19" "X1.2"
[11] "X6" "X1.3" "X8" "X7" "X7.1" "X6506" "X21" "X1.4" "X6158" "X6506.1"
[21] "X6506.2" "X6506.3" "X6263" "X6506.4" "X6468" "X1010" "X815" "X99" "X58" "X65"
[31] "X60" "X6506.5" "X21.1" "X1.5" "X6173" "X5842" "X6506.6" "X6506.7" "X6263.1" "X6506.8"
[41] "X6481" "X883" "X728" "X112" "X69" "X56" "X54" "X6506.9" "X21.2" "X1.6"
[51] "X6143" "X5651" "X6506.10" "X6506.11" "X6263.2" "X6506.12" "X6480" "X777" "X647" "X434"
[61] "X518" "X246" "X436" "X6506.13" "X21.3" "X1.7" "X6136" "X5677" "X6506.14" "X6506.15"
[71] "X6263.3" "X6506.16" "X660" "X567" "X480" "X557" "X261" "X456"
My question is whether there is a way to neatly force the values from the first row into the column names. As I'm processing a lot of data, I'm looking for a solution that would be easily reproducible. I can accept a lot of mangling of the actual strings to get syntactically correct names, but ideally I would avoid faffing around with elaborate regular expressions, as I'm often reading files like the one linked here and don't want to be forced to adjust the rules for every single import.
It looks like the problem is that the header is on the second line, not the first. You could include a skip=1 argument but a more general way of dealing with this using read.xls seems to be to use the pattern and header arguments which force the first line which matches the pattern string to be treated as the header. Your code becomes:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE,
pattern="DATAZONE", header=TRUE)
UPDATE
I don't get the warning messages you do when I execute the code. The messages refer to an issue with locale. The locale settings on my system are:
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Yours are probably different. Locale data could be OS dependent. I'm using Windows 8.1. Also I'm using Strawberry Perl; you appear to be using something else. So there are some possible reasons for the discrepancy in warning messages, but I can't offer anything more specific.
On the second question in your comment, to read the entire file and convert a particular row (in this case, row 2) to column names, you could use the following code:
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE,
header=FALSE, stringsAsFactors=FALSE)
names(simd.sg.xls) <- make.names(names = simd.sg.xls[2,],
unique = TRUE,allow_ = TRUE)
simd.sg.xls <- simd.sg.xls[-(1:2),]
All data will be of character type so you'll need to convert to factor and numeric as necessary.
Currently I am working on a very basic game in C++. The game used to be a school project, but now that I am done with that programming class, I wanted to expand my skills and put some more flourish on this old assignment.
I have already made a lot of changes that I am pleased with. I have centralized all the data into folder hierarchies and I have gotten the code to read those locations.
However my problem stems from a very fundamental flaw that has been stumping me.
In order to access the image data that I am using I have used the code:
string imageLocation = "..\\DATA\\Images\\";
string bowImage = imageLocation + "bow.png";
The problem is that when the player picks up an item on the gameboard my code is supposed to use the code:
hud.addLine("You picked up a " + (*itt)->name() + "!");
to print to the command line, "You picked up a Bow!". But instead it shows "You picked up a ..\DATA\Images\!".
Before I centralized my data I used to use:
name_(item_name.substr(0, item_name.find('.')))
in my Item class constructor to chop the item name to just something like bow or candle. After I changed how my data was structured I realized that I would have to change how I chop the name down to the same simple 'bow' or 'candle'.
I have changed the above code to reflect my changes in data structure to be:
name_(item_name.substr(item_name.find("..\\DATA\\Images\\"), item_name.find(".png")))
but unfortunately as I alluded to earlier this change of code is not working as well as I planned it to be.
So now that I have given that real long winded introduction to what my problem is, here is my question.
How do you extract the middle of a string between two sections that you do not want? Also that middle part that is your target is of an unknown length.
Thank you so very much for any help you guys can give. If you need anymore information please ask; I will be more than happy to upload part or even my entire code for more help. Again thank you very much.
In all honesty, you're probably approaching this from the wrong end.
Your item class should have a string "bow", in a private member. The function Item::GetFilePath would then (at runtime) do "..\\DATA\\Images\\" + this->name + ".png".
The fundamental property of the "bow" item object isn't the filename bow.png, but the fact that it's a "bow". The filename is just a derived property.
Assuming I understand you correctly, the short version of your question is: how do I split a string containing a file path so I have removed the path and the extension, leaving just the "title"?
You need the find_last_of method. This gets rid of the path:
std::string::size_type lastSlash = filePath.find_last_of('\\');
if (lastSlash == std::string::npos)
fileName = filePath;
else
fileName = filePath.substr(lastSlash + 1);
Note that you might want to define a constant for the \\ separator in case you need to change it for other platforms. Not all OS file systems use \\ to separate path segments.
Also note that you also need to use find_last_of for the extension dot as well, because filenames in general can contain dots, throughout their paths. Only the very last one indicates the start of the extension:
std::string::size_type lastDot = fileName.find_last_of('.');
if (lastDot == std::string::npos)
{
title = fileName;
}
else
{
title = fileName.substr(0, lastDot);
extension = fileName.substr(lastDot + 1);
}
See http://msdn.microsoft.com/en-us/library/3y5atza0(VS.80).aspx
Using Boost.Filesystem:
#include "boost/filesystem.hpp"
namespace fs = boost::filesystem;
void some_function(void)
{
string imageLocation = "..\\DATA\\Images\\";
string bowImage = imageLocation + "bow.png";
fs::path image_path( bowImage );
hud.addLine("You picked up a " + image_path.stem().string() + "!"); //prints: You picked up a bow!
}
So combining Paul's and my thoughts, try something like this (broken down for readability):
string dir = "..\\DATA\\Images\\";
string::size_type start = item_name.find(dir) + dir.size(); // index just past the directory prefix
string extn = item_name.substr(item_name.find_last_of('.')); // ".png"
name_ = item_name.substr(start, item_name.size() - start - extn.size());
You could simplify it a bit if you know that item name always starts with "..DATA" etc (you could store it in a constant and not need to search for it in the string)
Edit: Changed extension finding part to use find_last_of, as suggested by EarWicker, (this avoids the case where your path includes '.png' somewhere before the extension)
item_name.find("..\\DATA\\Images\\") will return the index at which the substring "..\\DATA\\Images\\" starts, but it seems like you'd want the index where it ends, so you should add the length of "..\\DATA\\Images\\" to the index returned by find.
Also, as hamishmcn pointed out, the second argument to substr should be the number of chars to return, which would be the index where ".png" starts minus the index where "..\DATA\Images\" ends, I think.
One thing that looks wrong is that the second parameter to substr should be the number of chars to copy, not the position.