I am running this statement in a Django app:
c = connections['default'].cursor()
query = "copy (select * from analysis.\"{0}\") to STDOUT DELIMITER ',' CSV HEADER;".format(view_name)
with open(csvFile, 'w') as f:
    c.copy_expert(query, f)
# (the with block closes the file, so an explicit f.close() is not needed)
It does not create the correct CSV file; some of the values appear to be in the wrong columns. To troubleshoot, I am trying to test the SQL statement by running it directly in PostgreSQL:
copy (select * from analysis."S03_2005_activity_140807_153431_with_geom") to 'C:/djangoProjects/web_output/csvfiles/S03_2005_activity_140807_153431_with_geom.csv' DELIMITER ',' CSV HEADER;
It gives me: "ERROR: relative path not allowed for COPY to file". I have looked into the issue, and it typically seems to be one of two problems: 1. confusing '\' and '/' (my slashes should be correct), or 2. the server being on a different computer. I thought this might be my issue, since the database is located on an external computer, but I have the connection set up in PostgreSQL. It also runs from Django, so I'm not sure why it isn't working from pgAdmin.
If you want to store data / get data from your local machine and communicate with a Postgres server on a different, remote machine, you cannot simply use COPY.
Try the meta-command \copy in psql. It's a wrapper for the SQL COPY command and uses local files.
Your filename should work as is on a Windows machine, but Postgres interprets it as a path local to the server, which is probably a Unix derivative, and there an absolute filename would have to start with '/'.
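If you want to stay outside psql, the same idea works from any client that streams COPY ... TO STDOUT and writes the file on the client side, which is exactly what cursor.copy_expert() already does in the Django snippet above. A minimal standalone sketch with psycopg2 (the connection parameters, view name, and output path are placeholders):
import psycopg2
# Placeholder connection settings for the remote server
conn = psycopg2.connect(host='remote-db-host', dbname='mydb', user='me', password='secret')
query = 'COPY (SELECT * FROM analysis."my_view") TO STDOUT DELIMITER \',\' CSV HEADER;'
with conn.cursor() as cur, open('C:/path/to/my_view.csv', 'w') as f:
    # The server sends the rows to STDOUT; psycopg2 streams them into the local file
    cur.copy_expert(query, f)
conn.close()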
I would like to make an accent-insensitive lookup in a French database (with words containing accents):
>>> User.objects.filter(first_name="Jeremy")
['<User: Jéremy>', '<User: Jérémy>', '<User: Jeremy>']
After lots of research, I found that Django has an Unaccent lookup for PostgreSQL, but nothing for other databases (like MariaDB).
Is there a way to make this happen without changing the database to PostgreSQL?
Finally got it solved:
So first and foremost: get acquainted with SQL Collations! (Thanks to @MartinBurch.)
Collations set the searching rules so that SQL can match your given keywords.
So after lots of head-smashes, here is how I solved it:
1- Head to your "my.cnf" file (if you already know where your my.cnf is, you can skip to step 5).
2- Open your terminal and type:
$ which mysqld
That should return "/usr/sbin/mysqld" or another path depending on your configuration.
3- Use the path returned in step 2 as follows:
$ /your/path/to/mysqld --verbose --help | grep -A 1 "Default options"
That should return something that finishes like:
Default options are read from the following files in the given order:
/etc/mysql/my.cnf ~/.my.cnf /usr/etc/my.cnf
4- Find the file that has a [mysqld] section. if none of them has one, just add it.
Note: writing the section's name is very important; MariaDB won't start otherwise.
5- Add the following lines at the end of the file and Save:
[mysqld]
character_set_server = latin1
collation-server = latin1_general_ci
6- Restart your MariaDB server (if you have a brew version):
brew services stop mariadb
brew services start mariadb
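Optionally, before touching the Django side, you can confirm that the new defaults took effect. A quick check from the Django shell (just a sketch, assuming the 'default' connection points at this MariaDB instance):
from django.db import connections
with connections['default'].cursor() as cursor:
    # Both should now report the latin1 / latin1_general_ci defaults set in step 5
    cursor.execute("SHOW VARIABLES LIKE 'character_set_server'")
    print(cursor.fetchone())
    cursor.execute("SHOW VARIABLES LIKE 'collation_server'")
    print(cursor.fetchone())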
7- Head to your Django project and use '__icontains' to make case-insensitive and accent-insensitive queries:
User.objects.filter(first_name__icontains="jeremy")
>>> ['<User: Jéremy>', '<User: Jérémy>', '<User: Jeremy>']
I have an Apache NiFi job where I get a file from the filesystem using GetFile and then use PutHDFS. How can I rename the file in HDFS after putting it into Hadoop?
I tried to use the ExecuteScript processor but can't get it to work:
flowFile = session.get()
if flowFile != None:
    tempFileName = flowFile.getAttribute("filename")
    fileName = tempFileName.replace('._COPYING_', '')
    flowFile = session.putAttribute(flowFile, 'filename', fileName)
    session.transfer(flowFile, REL_SUCCESS)
The answer from Shu is correct for how to manipulate the filename attribute in NiFi, but if you have already written a file to HDFS and then use UpdateAttribute, it is not going to change the name of the file in HDFS; it will only change the value of the filename attribute in NiFi.
You could use the UpdateAttribute approach to create a new attribute called "final.filename" and then use MoveHDFS to move the original file to the final file.
Also of note, the PutHDFS processor already writes a temp file and moves it to the final file, so I'm not sure it is necessary for you to use the "._COPYING_" naming at all. For example, if you send a flow file to PutHDFS with a filename of "foo", it will first write ".foo" to the directory and, when done, move it to "foo".
The only case where you need to use MoveHDFS is if some other process is monitoring the directory and can't ignore the dot files; then you write the file somewhere else and use MoveHDFS once it is complete.
Instead of using the ExecuteScript processor (extra overhead), use an UpdateAttribute processor and feed it the success relationship from PutHDFS.
Add a new property in the UpdateAttribute processor as
filename
${filename:replaceAll('<regex_expression>','<replacement_value>')}
Use the replaceAll function from NiFi Expression Language.
(or)
Using the replace function
filename
${filename:replace('<search_string>','<replacement_value>')}
NiFi Expression Language offers different functions to manipulate strings; refer to the Expression Language documentation for more on them.
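For the '._COPYING_' suffix from the question, for example, the value could simply be ${filename:replace('._COPYING_','')} (replace does a literal substitution, so no regex escaping is needed).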
I have tried the exact same script from the question with the ExecuteScript processor (Script Engine set to python) and everything works as expected, since the script uses the .replace function to replace '._COPYING_' with an empty string.
Output: the filename fn._COPYING_ got changed to fn.
I'm using Knime 3.1.2 on OSX and Linux for OPENMS analysis (Mass Spectrometry).
Currently, it uses .mzML files with static filenames, manually put in a directory. It usually has more than one file passed in at a time (the 'Input FileS' module, not the 'Input File' module) using a ZipLoopStart.
I want these files to be downloaded dynamically and then passed into the workflow... but I'm not sure of the best way to do that.
Currently, I have a Python script that downloads .gz files (from AWS S3) and then unzips them. I already have variations that can unzip the files into memory using StringIO (and maybe pass them into the workflow from there as data?).
It can also download them to a directory... which maybe can then be used as the source? But I don't know how to tell the ZipLoop to wait and check the directory after the Python script is run.
I also could have the Python script run as a separate entity (outside of KNIME) and then, once the directory is populated, call KNIME... HOWEVER, there will always be a different number of files (maybe one, maybe three)... and I don't know how to make the 'Input Files' KNIME node handle an unknown number of input files.
I hope this makes sense.
Thanks!
Thanks to Gábor for getting me on the right track, although I ended up taking a slightly different route after much experimentation.
===
Being new to Knime, I don't know if this is an efficient use of Knime, or a complete Kluge...but it does work.
So, part of the problem is some of the Knime specific objects - One of which is called URIDataValue.
A Python pandas DataFrame is, apparently, interchangeable with the Knime tables. However, I don't know if there's a way to import one of these URIDataValue objects into Python. So here's what I did...
1. I wrote a Python script that creates a pandas DataFrame and populates it with one column. Everything is a string, including the column header:
from pandas import DataFrame
# Create a one-column table of file URIs (everything is still a plain string here)
T = DataFrame(
    [
        ['file:///Users/.../copy/lfq_spikein_dilution_1.mzML'],
        ['file:///Users/.../copy/lfq_spikein_dilution_2.mzML'],
    ],
)
T.columns = ['URIDataValue']
#print T
output_table = T
That creates this dataframe:
Note: The column name and values are just strings. But it is (apparently) important that the column header be 'URIDataValue'...even though HERE it's just text. If the column name is not 'URIDataValue' the next node doesn't know what to do.
NEXT, the 'output_table' from the 'Python Source' node is patched to a 'String to URI' node, which (apparently and magically) knows to change the entire column's string values to URIDataValues (presumably based on the name of the first column... I don't know that for sure).
Finally, the NEW table, with the correct data objects, goes to a 'URI to PORT' node... since apparently 'Port' objects and 'URI' objects are different.
This, then, matches the needed input to the ZipLoop... which is normally the output from a static (hard-coded) 'Input Files' node.
Now, to actually solve the question above, I just have to add the code to my 'Python Source' to download and unzip the S3 files, then annotate the dataframe with their locations, and go.
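For anyone following along, that download-and-unzip step in the 'Python Source' node might look roughly like this (just a sketch; it assumes boto3 is available in KNIME's Python environment, and the bucket and key names are made up):
import gzip
import os
import tempfile
import boto3
from pandas import DataFrame
# Hypothetical bucket and keys; replace with your own
BUCKET = 'my-ms-data'
KEYS = ['lfq_spikein_dilution_1.mzML.gz', 'lfq_spikein_dilution_2.mzML.gz']
s3 = boto3.client('s3')
out_dir = tempfile.mkdtemp()
uris = []
for key in KEYS:
    gz_path = os.path.join(out_dir, key)
    s3.download_file(BUCKET, key, gz_path)
    # Strip the .gz suffix and decompress next to the download
    mzml_path = gz_path[:-3]
    with gzip.open(gz_path, 'rb') as src, open(mzml_path, 'wb') as dst:
        dst.write(src.read())
    uris.append('file://' + mzml_path)
# One string column named 'URIDataValue', as the 'String to URI' node expects
output_table = DataFrame(uris, columns=['URIDataValue'])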
I have no idea what I'm doing, but it worked.
There are multiple options to make this work:
1. Convert the files in memory to Binary Object cells using Python; later you can use those in KNIME. (I am not sure this one is supported, but as I remember it was demoed in one of the last KNIME gatherings.)
2. Save the files to a temporary folder (Create Temp Dir) using Python and connect the Python node using a flow variable connection to a file reader node in KNIME (which should work in a loop: List Files, check the Iterate List of Files metanode).
3. Maybe there is already S3 Remote File Handling support in KNIME, so you can do the downloading and unzipping within KNIME. (Not that I know of, but it would be nice.)
I would go with option 2, but I am not so familiar with Python, so for you, probably option 1 is the best. (In case option 3 is supported, that is the best in my opinion.)
I have a dataset containing thousands of tweets. Some of them contain URLs, but most of those are in the shortened forms typically used on Twitter. I need something that resolves the full URLs so that I can check for the presence of some particular websites. I have solved the problem in Python like this:
import urllib2
# raw strings so the backslashes in the Windows paths are not treated as escapes
url_filename = r'C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2 = r'C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file = open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
    tco_url = line.strip('\n')
    req = urllib2.urlopen(tco_url)
    print >>out, req.url
url_file.close()
out.close()
This works, but it requires me to export my URLs from Stata to a .txt file and then re-import the full URLs. Is there some version of my Python script that would allow me to integrate the task into Stata using the shell command? I have quite a lot of different .dta files and I would ideally like to avoid appending them all just to execute this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv
data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
    wtr = csv.writer(wsock)
    for i in data:
        wtr.writerow(i)
# the with block closes the file, so no explicit wsock.close() is needed
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
     +-----------------------+
     |   name        message |
     |-----------------------|
  1. | Monica   Hello World! |
     +-----------------------+
This is a simple example. The full process for your case will be:
Write a file to disk with the URLs, using outsheet or some other command
Use ! to call the Python script
Read the output into Stata using insheet or infile or some other command
Clean up by deleting files with capture erase my_file_on_disk.csv
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
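For concreteness, a sketch of the Python side of steps 1-3 (the script name expand_urls.py, the file names, and the header handling are all hypothetical):
#!/usr/bin/env python
# expand_urls.py: read short URLs written by Stata's outsheet, write a CSV
# that Stata can read back with insheet
import csv
import urllib2
with open('urls.csv', 'r') as src, open('long_urls.csv', 'w') as dst:
    wtr = csv.writer(dst)
    wtr.writerow(['short_url', 'long_url'])
    for line in src:
        short = line.strip()
        if not short or short.lower() == 'url':
            # skip blank lines and a possible header row from outsheet
            continue
        try:
            long_url = urllib2.urlopen(short).url  # follows the redirect
        except IOError:
            long_url = ''
        wtr.writerow([short, long_url])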
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
    * pass short url as first argument (e.g. http://bit.ly/162VWRZ)
    capture erase temp_log.txt
    log using temp_log.txt
    copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
    * find the url in the log file created by geturl
    capture log close
    loc long_url = ""
    file open urlfile using temp_log.txt , read
    file read urlfile line
    while r(eof) == 0 {
        if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
            loc long_url = regexs(1)
        }
        file read urlfile line
    }
    file close urlfile
    return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&
* > utm_medium=twitter
You should run them one after the other. Things might get ugly using this solution, but it does find the URL you are looking for. May I suggest that another approach is to contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally, I would use Python entirely for this sort of thing and use Stata for the analysis of the data once I had everything I needed.
I want to generate a complete SQL file with Django that can be downloaded and executed to create a SQLite DB.
The problem is escaping the strings to insert them into the file to download. This is what I got so far:
name = MySQLdb.escape_string(self.name.encode("utf-8")).decode("utf-8")
return "INSERT INTO names VALUES(%d,'%s');" % (self.id, name)
But unfortunately MySQL escapes single quotes with a backslash \, which SQLite does not like. I would prefer to use something contained in Django to replace MySQLdb.escape_string(). I'm not sure if there are any other incompatibilities between MySQL escaping and SQLite escaping, so it would be good to avoid MySQLdb.escape_string() completely.
As a last recourse I would do this before returning:
name = name.replace("\\'", "''")
Any thoughts?
SQLite has the quote function for this.
It's also used by the sqlite3 command-line tool when you .dump an entire database.
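A sketch of how you could call that quote function from Python via the standard sqlite3 module instead of MySQLdb.escape_string() (the helper name sqlite_quote is made up):
import sqlite3
def sqlite_quote(value):
    # Let SQLite itself produce a properly quoted SQL string literal
    # (internal single quotes are doubled, no backslash escaping)
    con = sqlite3.connect(':memory:')
    try:
        (quoted,) = con.execute("SELECT quote(?)", (value,)).fetchone()
        return quoted
    finally:
        con.close()
# quote() already includes the surrounding single quotes, so the template
# would use %s rather than '%s':
print("INSERT INTO names VALUES(%d,%s);" % (1, sqlite_quote("O'Hara")))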