How can I convert an Excel table into Trac Wiki Table format?

I've seen copy & paste converters to MediaWiki or HTML format, but I could not find one that converts to the Trac wiki format (WikiFormatting), which uses pipes to separate cells, such as:
||Cell 1||Cell 2||Cell 3||
||Cell 4||Cell 5||Cell 6||

You could save your Excel sheet as a CSV file. Then, from a command prompt (assuming you are running Windows XP or newer), type this command:
for /f "tokens=1,2,3 delims=," %a in (mycsvfile.csv) do ((echo ^|^|%a^|^|%b^|^|%c^|^|) >> mywikifile.txt)
The number of tokens depends on how many columns you have. You could handle up to 26 columns this way in a single pass by increasing the number of tokens and adding the corresponding variable names (%d, %e, and so on).
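If you have Python available, a short script avoids the fixed column limit entirely. A minimal sketch (Python 3; the file names mycsvfile.csv and mywikifile.txt match the example above):
import csv

# read each CSV row and emit it as a Trac wiki table row: ||c1||c2||c3||
with open('mycsvfile.csv', newline='') as src, open('mywikifile.txt', 'w') as dst:
    for row in csv.reader(src):
        dst.write('||' + '||'.join(row) + '||\n')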

I made a jsFiddle to do just this: paste your CSV text into the HTML box and run the script. The content to paste into a TracWiki page will then be in the Result box.
In case something happens to the jsFiddle, here is the JavaScript I used (I probably didn't need jQuery, but it was faster than working out the plain-JavaScript way to do it):
var csv = $('body').html().trim();
// commas become cell separators
csv = csv.replace(/,/g, "||");
// close each line and open the next with pipes
csv = csv.replace(/$/gm, "||<br />");
csv = csv.replace(/^/gm, "||");
// set to false if you don't want empty cells
if (true) {
    // "||||" is an empty cell; pad it with a space so Trac renders it.
    // Loop because a single global replace misses overlapping runs.
    while (csv.indexOf("||||") > -1) {
        csv = csv.replace(/\|\|\|\|/g, "|| ||");
    }
}
$('body').html(csv);

My port of Shan Carter's Mr. Data Converter now supports Wiki output in the format you specified. You can copy & paste directly from Excel or from a CSV file.
http://thdoan.github.io/mr-data-converter/
UPDATE: I've added Trac-specific formatting under the "Trac" option.

Related

Formatting thousand separator for numbers in a pandas dataframe

I am trying to write a dataframe to a csv and I would like the .csv to be formatted with thousands separators. I don't see any way in the to_csv docs to pass a format or anything like this.
Does anyone know a good way to be able to format my output?
My csv output looks like this:
12172083.89 1341.4078 -9568703.592 10323.7222
21661725.86 -1770.2725 12669066.38 14669.7118
I would like it to look like this:
12,172,083.89 1,341.4078 -9,568,703.592 10,323.7222
21,661,725.86 -1,770.2725 12,669,066.38 14,669.7118
Comma is the default separator. If you want to choose your own separator you can do this by setting the sep parameter of pandas' to_csv() method.
df.to_csv(sep=',')
If your goal is to create thousands separators and export the result back into a csv, you can follow this example:
import pandas as pd

df = pd.DataFrame([[12172083.89, 1341.4078, -9568703.592, 10323.7222],
                   [21661725.86, -1770.2725, 12669066.38, 14669.7118]],
                  columns=['A', 'B', 'C', 'D'])

# format every column as a string with thousands separators
for c in df.columns:
    df[c] = df[c].apply(lambda x: '{0:,}'.format(x))

df.to_csv(sep='\t')
If you just want pandas to show separators when printed out:
pd.options.display.float_format = '{:,}'.format
print(df)
What you're looking to do has nothing to do with csv output but rather is related to the following:
print('{0:,}'.format(123456789000000.546776362))
produces
123,456,789,000,000.55
(note that a Python float cannot hold all of those digits, so the value is rounded before formatting).
See format string syntax.
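For instance, if you also want a fixed number of decimal places alongside the separators, the same format mini-language handles both:
print('{:,.2f}'.format(1234567.891))
# -> 1,234,567.89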
Also, you'd do well to pay heed to @Peter's comment above about compromising the structure of a csv in the first place.

Why must I run this code a few times before my entire .csv file is converted into a .yaml file?

I am trying to build a tool that can convert .csv files into .yaml files for further use. I found a handy bit of code at the link below that does the job nicely:
Convert CSV to YAML, with Unicode?
which states that the following line will take the dict created from a .csv file and dump it to a .yaml file:
out_file.write(ry.safe_dump(dict_example,allow_unicode=True))
However, one small kink I have noticed is that when it is run once, the generated .yaml file is typically incomplete by a line or two. In order to have the .csv file exhaustively read through to create a complete .yaml file, the code must be run two or even three times. Does anybody know why this could be?
UPDATE
Per request, here is the code I use to parse my .csv file, which is two columns long (with a string in the first column and a list of two strings in the second column) and will typically be 50 rows long (or more). Also note that it is designed to remove any '\n' or spaces that could potentially cause problems later on in the code.
import csv

csv_contents = {}
with open("example1.csv", "rU") as csvfile:
    green = csv.reader(csvfile, dialect='excel')
    for line in green:
        candidate_number = line[0]
        first_sequence = line[1].replace(' ', '').replace('\r', '').replace('\n', '')
        second_sequence = line[2].replace(' ', '').replace('\r', '').replace('\n', '')
        csv_contents[candidate_number] = [first_sequence, second_sequence]
csv_contents.pop('Header name', None)
Ultimately, it is not that important that I maintain the order of the rows from the original dict, just that all the information within the rows is properly structured.
I am not sure what the cause could be, but you might be running out of memory, as you create the YAML document in memory first and then write it out. It is much better to stream it out directly.
You should also note that the code in the question you link to doesn't preserve the order of the original columns, something easily circumvented by using round_trip_dump instead of safe_dump.
You probably want to make a top-level sequence (list) as in the desired output of the linked question, with each element being a mapping (dict).
The following parses the CSV, taking the first line as keys for mappings created for each following line:
import sys
import csv
import ruamel.yaml as ry
import dateutil.parser  # pip install python-dateutil


def process_line(line):
    """convert cells, trying int, float, date"""
    ret_val = []
    for elem in line:
        try:
            res = int(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = float(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = dateutil.parser.parse(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        ret_val.append(elem.strip())
    return ret_val


csv_file_name = 'xyz.csv'
data = []
header = None
with open(csv_file_name) as inf:
    for line in csv.reader(inf):
        d = process_line(line)
        if header is None:
            header = d
            continue
        data.append(ry.comments.CommentedMap(zip(header, d)))

ry.round_trip_dump(data, sys.stdout, allow_unicode=True)
with input xyz.csv:
id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название
this generates:
- id: 1
  title_english: A Title in English
  title_russian: Название на русском
- id: 2
  title_english: Another Title
  title_russian: Другой Название
The process_line function is just some sugar that tries to convert the strings in the CSV file to more useful types, stripping leading spaces from the rest (resulting in far fewer quotes in your output YAML file).
I have tested the above on files with 1000 rows, without any problems (I won't post the output though).
The above was done using Python 3 as well as Python 2.7, starting with a UTF-8 encoded file xyz.csv. If you are using Python 2 and need to handle Unicode input, you can try unicodecsv if things don't work out as well as they did for me.
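A minimal sketch of that fallback, assuming the unicodecsv package is installed (pip install unicodecsv); it mirrors the csv module's interface but decodes each cell for you:
import unicodecsv

with open('xyz.csv', 'rb') as inf:  # binary mode, as unicodecsv expects on Python 2
    for line in unicodecsv.reader(inf, encoding='utf-8'):
        print line  # each cell is already a unicode string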

python convert a string to an xls document for processing

new_list = eval(my_list[0])  # my_list contains dictionaries

def get_next_file():
    for key, value in new_list.iteritems():
        yield value

file = get_next_file
book = xlrd.open_workbook(file_contents=file)
for sheet in book.sheet_names():
    print sheet
I am trying to take a string from a dict and turn it back into an xls file so it can be processed. It was originally an xls file that I converted with str(list(xls_file)) so that it could be saved in my database.
Any thoughts?
The saved string prints out as hex with some words in it.
The xlrd library you are using is only for reading information from an Excel file.
If you want to write a new Excel file you need to use xlwt.
If you want to change some cells in an existing Excel file you should use xlutils.
Homepage of these three libraries: http://www.python-excel.org/
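Note that xlrd's file_contents parameter expects the raw bytes of the workbook, not a generator. A minimal sketch, assuming you still have (or can re-save) the original binary content; raw_bytes and example.xls are placeholders:
import xlrd

# stand-in for fetching the unmodified binary .xls content from a database
with open('example.xls', 'rb') as f:
    raw_bytes = f.read()

book = xlrd.open_workbook(file_contents=raw_bytes)
for sheet_name in book.sheet_names():
    print sheet_name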

Azure Data Warehouse PolyBase File format

We have a file that looks like this:
Col1,Col2,Col3,Col4,Col5
"Hello,",I,",am",some,data!
It therefore has the following 'properties':
Comma-separated
Double-quote column delimiter
Commas in some of the columns
Now, I am not sure if it's actually possible to ingest this with PolyBase, but wondered if there was a way?
The error we are seeing at present is "Could not find a delimiter after quote", which I guess is because of what it finds after the closing double quote, where it expects a delimiter.
Here is our current file format, for completeness:
CREATE EXTERNAL FILE FORMAT Comma
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS(
          FIELD_TERMINATOR = ',',
          STRING_DELIMITER = '"'
      )
)
Specify it in hex instead.
STRING_DELIMITER = '0x22'
(Based on the problem that someone described at the end of https://msdn.microsoft.com/en-au/library/dn935026.aspx)
Sorted this out in the end by adding an intermediary step to convert the file from CSV to ORC format.
It's a bit clunky (as it leaves a mess of a copy behind), but PolyBase then does work with the file format:
CREATE EXTERNAL FILE FORMAT Orc
WITH (FORMAT_TYPE = ORC)
This works for now, until it is addressed by the product team: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-field-row-terminators-within-strin

How to Alter returned data format called from csv using python

I am looking for guidance regarding the FORMAT of a result returned from a csv file. The code I have to date partially achieves my objective, but despite significant effort researching through this and many other sites/forums I cannot resolve the final step. I also posed this question on gis.stackexchange but was redirected to this forum with the comment "Questions relating to general Information Technology, with no clear GIS component, are off-topic here, but can be researched/asked at Stack Overflow".
My piece of Python code that successfully reads selected data from a csv and returns it in dict format is below. (Yes, I know the reason it returns as type dict is the way my code calls the reader, and that is the crux of the problem.)
import arcpy, csv

Att_Dict = {}
with open("C:/Data/Code/Python/Library/Peter/123.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['Status'] == 'Keep':
            Att_Dict.update({row['book_id']: row['book_ref']})
print Att_Dict
Att_Dict = {'7643': '7625', '9644': '2289', '4406': '4443', '7588': '9681', '2252': '7947'}
For the next part of my code to run, I need the result above but in the following format (this is part of a very lengthy piece of code, but the only showstopper is the returned format, so there is little value in posting the other 200 or so lines):
Att_Dict = [[7643, 7625], [9644, 2289], [4406, 4443], [7588, 9681], [2252, 7947]]
Although I have experimented endlessly and can achieve this by reverting to csv.reader rather than csv.DictReader, I then lose the ability to keep only the rows whose 'Status' column has the value 'Keep', and that is a requirement for the task at hand.
My sledgehammer approach to date has been to use 'search and replace' within IDLE to amend the returned set to meet the other requirement, but I'm sure it can be done programmatically rather than manually. Similar but not exactly what I need: https://docs.python.org/2/library/index.html, plus my starting-out questions at Returning values from multiple CSV columns to Python dictionary? and Using Python's csv.dictreader to search for specific key to then print its value, plus a multitude of csv-based questions at geonet.esri.
(Using Win 7, ArcGIS 10.2, Python 2.7.5)
Try this:
Att_Dict = {'7643': '7625', '9644': '2289', '4406': '4443', '7588': '9681', '2252': '7947'}

Att_List = []
# turn each key/value pair of strings into a two-element list of ints
for key, value in Att_Dict.items():
    Att_List.append([int(key), int(value)])
print Att_List

Out: [[7643, 7625], [9644, 2289], [4406, 4443], [7588, 9681], [2252, 7947]]
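Alternatively, a minimal sketch that builds the list directly while reading the CSV, so the intermediate dict is never needed (this assumes the same 123.csv layout and 'Status' column as in the question):
import csv

Att_List = []
with open("C:/Data/Code/Python/Library/Peter/123.csv") as f:
    for row in csv.DictReader(f):
        # keep only the rows flagged 'Keep', casting both ids to int
        if row['Status'] == 'Keep':
            Att_List.append([int(row['book_id']), int(row['book_ref'])])
print Att_List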