I use wget as part of a script to fetch a URL, e.g. "www.myremoteurl.com/subpage/index-X.run", but X is an integer that keeps incrementing/changing, which breaks the script because the hard-coded resource no longer exists.
OK then, it's easy, as you imagined:
#!/bin/bash
# note: {1..10} brace expansion needs bash rather than plain /bin/sh
for i in {1..10}
do
    wget "www.someurl/someindex-$i.run" &   # each download runs in the background
done
wait   # wait for all background downloads to finish
This loops from 1 to 10, ignoring any individual failure, which should be handy for typical cases.
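In case it helps, here is a variant (my own sketch, reusing the same placeholder URL pattern from above) that checks each download's exit status instead of backgrounding the jobs, so missing indices are simply reported and skipped:

#!/bin/bash
# Sketch only: the URL pattern is the placeholder from the answer above.
for i in $(seq 1 10); do
    if wget -q "www.someurl/someindex-$i.run"; then
        echo "fetched someindex-$i.run"
    else
        # a missing index is expected for this use case, so just note it and keep going
        echo "someindex-$i.run not found, skipping"
    fi
done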
I want to move all the files with the "gz" extension, together with their folders/subfolders, from the directory "C:\GCPUpload\Additional" to a folder in the bucket "gs://BucketName/Additional/".
I need to keep the folder structure, like this:
C:\GCPUpload\Additional\Example1.gz --> gs://BucketName/Additional/Example1.gz
C:\GCPUpload\Additional\Example2.gz --> gs://BucketName/Additional/Example2.gz
C:\GCPUpload\Additional\ExampleNot.txt --> (Ignore this file)
C:\GCPUpload\Additional\Subfolder2\Example3.gz --> gs://BucketName/Additional/Subfolder2/Example3.gz
C:\GCPUpload\Additional\Subfolder2\Example4.gz --> gs://BucketName/Additional/Subfolder2/Example4.gz
This is the command that I am using so far:
call gsutil mv -r -c "C:\GCPUpload\Additional\**\*.gz" "gs://BucketName/Additional/"
The trouble I'm having is that all the files are being moved to the root of the bucket (i.e. gs://BucketName/Additional/), ignoring their original folders/subfolders.
How should I write this? I've tried and googled, but can't find a way to make this work.
Thanks!!
The behavior you're seeing was implemented by gsutil to match the corresponding (older) behavior when you use a recursive wildcard (**) in the shell.
To do what you want you'll need to list all of the objects you want moved and create a shell script that individually runs gsutil mv commands that move them to the directories you want. You could probably use local editing tools to make that somewhat easier (like awk or sed).
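For example, a rough sketch of such a script (assuming a Unix-like shell such as Cloud Shell or Git Bash, where the source tree from the question would appear as /c/GCPUpload/Additional; the paths are illustrative only) could walk the local tree and rebuild each file's relative path in the bucket:

#!/bin/bash
# Sketch only: SRC/DST are assumptions based on the paths in the question.
SRC="/c/GCPUpload/Additional"
DST="gs://BucketName/Additional"

# For every local .gz file, compute its path relative to SRC and move it to the
# same relative path under DST, which preserves the subfolder layout.
find "$SRC" -type f -name '*.gz' | while read -r f; do
    rel="${f#$SRC/}"
    gsutil mv "$f" "$DST/$rel"
done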
When I run this COPY command:
COPY to_my_table (field1, field2, etc)
FROM 's3://my-service-f55b83j5vvkp/2018/09/03'
CREDENTIALS 'aws_iam_role=...'
JSON 'auto' TIMEFORMAT 'auto';
I get this error:
The specified S3 prefix '2018/09/03' does not exist
Which makes sense, because my S3 bucket does not have any file in that specific prefix. However, this is part of a daily job to load data, where sometimes there's something to load, but some other times there's nothing to load.
I checked the COPY documentation and there doesn't seem to be any way to avoid the error and simply do nothing if there are no objects under that prefix. Maybe I am missing something?
I'd like to share how we solved this problem in our case. It's a simple solution, but it may be helpful to others. Jon Scot suggested a good option in a comment that I liked, but unfortunately we couldn't use it because the system adding files to S3 was not under our control, so I'm not sure whether that applies to your case too.
I think you could solve your problem in multiple ways, but here are two options that I suggest.
1) Since you may be running a cron job to load data into Redshift, put a file-existence check before executing the COPY command, like below.
path=s3://my-service-f55b83j5vvkp/2018/09/03
count=$(s3cmd ls "$path" | wc -l)
if [[ $count -gt 0 ]]; then
    # Your Redshift COPY code goes here.
    :
else
    echo "Nothing to load"
fi
The advantage of this option is that you save some cost, though it may be completely negligible.
2) Upload a dummy file with no records, which will simply load no data into Redshift.
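For what it's worth, here is a minimal sketch of option 2 using s3cmd (the file name is made up; the prefix is the one from the question). An empty object under the prefix should keep COPY from failing while loading zero rows:

# Create an empty placeholder object so the daily prefix always exists.
touch empty.json
s3cmd put empty.json s3://my-service-f55b83j5vvkp/2018/09/03/empty.json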
I regularly use the following git-log command:
git log --oneline --graph --decorate --all
The command is perfect for me, with one exception. I maintain a set of refs in refs/arch/ that I want to keep around ("arch" stands for "archive"), but I do not want to see them every time I look at my git log. I don't mind them showing up if they are an ancestor of an existing branch or tag, but I really do not want to see series of commits that would not otherwise show up in the git log but for the fact that they are in the commit history of a given refs/arch/* ref.
For example, in the image below, the left-hand side is an illustration of what I see currently when I run git log --oneline --graph --decorate --all. As you can see, the commit referred to by refs/arch/2 would not show up in the log if that ref didn't exist. (Assume there are no refs that are not shown in the left-hand side image.) Now, the right-hand side is an illustration of two alternative log graphs, either of which would be perfectly fine. I don't mind seeing anything matching refs/arch/* so long as it is in the commit history of a branch or tag. But, in the image below, I definitely do not want to see the commit referred to by refs/arch/2.
How can my git-log command be modified to suppress refs/arch/* in either of the senses depicted in the illustration?
What you want is:
git log --oneline --graph --decorate --exclude 'refs/arch/*' --all
The --exclude option is new in git 1.9.0.
From the git-log manual page:
--exclude=<glob-pattern>
Do not include refs matching <glob-pattern> that the next --all, --branches, --tags, --remotes, or --glob would otherwise consider. Repetitions of this option accumulate exclusion patterns up to the next --all, --branches, --tags, --remotes, or --glob option (other options or arguments do not clear accumulated patterns).
The patterns given should not begin with refs/heads, refs/tags, or refs/remotes when applied to --branches, --tags, or --remotes, respectively, and they must begin with refs/ when applied to --glob or --all. If a trailing /* is intended, it must be given explicitly.
If you are on some flavor of Ubuntu, you can upgrade git from the Ubuntu Git Maintainers team PPA:
sudo add-apt-repository ppa:git-core/ppa
sudo apt-get update
sudo apt-get upgrade
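As an optional convenience (my own addition, not part of the answer above), once a new enough git is installed, the long command can be stored as a git alias so it doesn't have to be retyped:

# Assumes git >= 1.9.0, since the alias relies on --exclude.
git config --global alias.lg "log --oneline --graph --decorate --exclude=refs/arch/* --all"
# After that, the filtered graph is just:
git lg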
I have a dataset containing thousands of tweets. Some of those contain urls but most of them are in the classical shortened forms used in Twitter. I need something that gets the full urls so that I can check the presence of some particular websites. I have solved the problem in Python like this:
import urllib2
url_filename='C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2='C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file= open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
    tco_url = line.strip('\n')
    req = urllib2.urlopen(tco_url)
    print >>out, req.url
url_file.close()
out.close()
Which works but requires that I export my urls from Stata to a .txt file and then reimport the full urls. Is there some version of my Python script that would allow me to integrate the task in Stata using the shell command? I have quite a lot of different .dta files and I would ideally like to avoid appending them all just to execute this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv
data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
    wtr = csv.writer(wsock)
    for i in data:
        wtr.writerow(i)
wsock.close()
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
+-----------------------+
| name message |
|-----------------------|
1. | Monica Hello World! |
+-----------------------+
This is a simple example. The full process for your case will be:
Write file to disk with the URLs, using outsheet or some other command
Use ! to call the Python script
Read the output into Stata using insheet or infile or some other command
Cleanup by deleting files with capture erase my_file_on_disk.csv
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
    * pass short url as first argument (e.g. http://bit.ly/162VWRZ)
    capture erase temp_log.txt
    log using temp_log.txt
    copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
    * find the url in the log file created by geturl
    capture log close
    loc long_url = ""
    file open urlfile using temp_log.txt , read
    file read urlfile line
    while r(eof) == 0 {
        if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
            loc long_url = regexs(1)
        }
        file read urlfile line
    }
    file close urlfile
    return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&
* > utm_medium=twitter
You should run them one after the other. Things might get ugly using this solution, but it does find the URL you are looking for. May I suggest that another approach is to contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally I would use entirely Python for this sort of thing and use Stata for the analysis of data once I had everything I needed.
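As an aside (my own addition, not part of the original answer), if calling one more external tool through Stata's ! command is acceptable, curl can follow the redirect chain and print only the final URL, which avoids the log-scraping hack entirely:

# Follow redirects silently, discard the body, and print the effective (final) URL.
# The bit.ly link is the example short URL used above; from Stata, run it as ! curl ...
curl -Ls -o /dev/null -w '%{url_effective}\n' http://bit.ly/162VWRZ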
Using Vim, I want to execute the current SQL file and see the results. I've tried the following (./manage.py dbshell is a Django wrapper over psql):
nmap <silent> <Leader>r :make<CR>
autocmd FileType sql set makeprg=cat\ %\\\|./manage.py\ dbshell
It works fine, but after "Press ENTER or type command to continue" Vim always shows me an empty buffer (maybe it is the error list). How can I skip opening it?
If I run the same thing in command mode, it behaves as I expect (no annoying buffer):
:!cat %|./manage.py dbshell
My SQL script contains a single SELECT statement, and the mysterious buffer looks like this:
It is likely a wrong 'errorformat' option. Try doing
:make!
(with a bang) and see whether this window still appears. If it does not, the guess is correct and you should read :h 'errorformat' and set that option in addition to 'makeprg'. Or just never use plain :make without a bang and forget about jumping to errors (assuming that script is even able to output information about errors).
Another idea: Could you show the output of
:au ShellCmdPost,QuickFixCmdPre,QuickFixCmdPost
? It may also be a problem of some plugin or vimrc code that is launched on one of these three events.
By the way, there are two things in the commands you posted that could be improved. First, the mapping should be written with nnoremap. You don't need remapping here, and it may save you time later when you add other mappings to your vimrc.
Second, use setlocal in the autocmd, not set. With set you change the default 'makeprg' for all buffers opened after the sql one.