I am trying to create lists of the 99th and 1st percentiles, rather than a single percentile for today. I want a percentile for each of 500 days, each calculated using the prior 500 days. The functions I was using for this are the following:
swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
percentile:{[x;y] y (100 xrank y:asc y) bin x}
swin[percentile[99;];500;List]
The issue I come across is that the 99th percentile calculates perfectly, but the 1st percentile makes the entire list = 0. I'm a bit lost as to why it would do that. Suggestions appreciated!
What's causing the zeros is two-fold:
What behaviour do you want for the earliest 500 days, when there aren't 500 days of history to work with? On day 1 there's only 1 datapoint, on day 2 only 2, etc. Only on the 500th day is there 500 days of actual data to work with. By default that swin function fills the gaps with some seed value.
You're using zero as that seed value, aka w#0
For example a 5 day lookback on each date looks something like:
q)swin[::;5;1 2 3 4 5]
0 0 0 0 1
0 0 0 1 2
0 0 1 2 3
0 1 2 3 4
1 2 3 4 5
You have zeros until you have data, so naturally the 1st percentile will pick up the zeros for the first roughly 500 dates.
So then you can decide to seed with a different value, or else possibly exclude zeros from your percentile function:
q)List:1000?1000
q)percentile:{[x;y] y (100 xrank y:asc y except 0) bin x}
q)swin[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90...
If zeros are a legitimate value in your list and can't be excluded then maybe seed the swin with some other value that you know won't be in the list (negatives? infinity? null?) and then exclude that seed from the percentile function.
EDIT: A final alternative is to use a different sliding window function which doesn't fill gaps with a seed value, e.g.
q)swin2:{[f;w;s] f each(),/:{neg[x]sublist y,z}[w]\[s]}
q)swin2[::;5;1 2 3 4 5]
,1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
q)swin2[percentile[99;];500;List]
908 908 908 908 908 908 908 908 908 908 908 959 959..
q)swin2[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90 90..
If one can understand how to store data in memory, know where it is, and how to get it out again in an orderly way, it will go a long way towards achieving results in Perl (and probably all programming languages).
I am not a programmer.
I am trying to extract data from 'older program' output and import it into a SQL database. The extraction is the thing.
My previous question was largely incorrect, as I found out when importing data into the table, as I did not have enough data from the 'old program' output file.
I would like to learn from my mistake and re-ask my previous question, hopefully correctly this time.
I have included my poor effort at extracting the data, exactly as it was last time. It doesn't come anywhere near getting the correct data out.
I believe this is quite a complex question but maybe it isn't.
It is certainly above my level of Perl at present, and maybe ever.
Answers to my incorrectly phrased question have been partially understood. Thank you very much for them.
If I could summarize it, my main problem with this task is dealing with this type of question: 'If a line contains ..., get data from two lines up and insert it at the beginning of ...'. Seemingly impossible for me.
I tried regex searches that span line endings but was unable to get them to work.
I was unable also to arrange successive loops to insert data in lines as I wanted. If one loop worked, the next did not and so on. I was prepared to work on successive files in a step by step process but the 'two lines up' question stumped me completely.
I was able to extract other data from these output files relatively easily as they are very orderly files, but this particular question has me stumped.
My revised question is:
My input file consists of batches of data (roughly 50-70 lines long) in the following format:
1(P1) 3 P.ell 05/0120 W/P068819 0 12.0 98/99 380 380 C03 104 PROCESSED 21/02/16 TIME 22.16.52 KSINA=8
AGE SPH %THN %INC SV SI MAI20 HTPC VIPC AGE BA DBH HT SPH CIH% CIV% CVD BCON CMAI C0 C0CAL SI20
0 1100 .0 89.0%SPH 2 2 .00 .0 .0 20.00 1 .0 17.3 0 .0 .0 .0 .0 .0000 .000 0% .00
7 815 25.9 .0 2 2 9.90 75.5 47.2 20.00 1 26.6 17.3 330 .0 .0 .0 13.0 .2099 1.005 .000 17.30
13 550 32.5 .0
18 330 40.0 .0
45 0 100.0 .0
0SQ -4 -4 -4 = SI20 17 17 17 PLANTN---104 GREEN MEADOWS MODEL---P.ELLIOTTII MAC MAC SQ 10 SI20 22.90
HTPC 76 76 76 =MAI20 10 10 10 FROM HTPC HTPC 100 MAI20 20.71
VIPC 47 47 47 =MAI20 10 10 10 HTPC/VIPC REGRESSION---P.ELLIOTTII GENERAL 1/83 VIPC 100 MAI20 20.99
MAIDBH 0
INMAI==> 0
0INPUT FOR CALCULATING HTPC & VIPC = HT ---- ----
AGE DBH HT VTREE SPH BA TOTAL WS UTIL S A B C D TCAI CTCAI TMAI UCAI CUCAI UMAI SCAI CSCAI SMAI IAGE
1 .0 .2 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 1.0
2 .0 .9 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 2.0
3 3.9 2.0 .0007 979 1 1 1 0 0 0 0 0 0 .7 1 .2 .0 0 .0 .0 0 .0 3.0
4 7.1 3.4 .0041 979 4 4 3 1 1 0 0 0 0 3.4 4 1.0 .6 1 .2 .0 0 .0 4.0
5 9.4 4.6 .0102 979 7 10 5 5 5 0 0 0 0 5.9 10 2.0 4.1 5 .9 .0 0 .0 5.0
6 11.3 5.7 .0188 979 10 18 6 12 12 1 0 0 0 8.4 18 3.1 7.5 12 2.0 .0 0 .0 6.0
7 13.0 6.7 .0293 979 13 29 7 22 19 3 0 0 0 10.3 29 4.1 9.7 22 3.1 .0 0 .0 7.0
17%
THN 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
REM 13.4 6.7 .0315 815 12 26 6 20 17 3 0 0 0
8 15.0 7.6 .0453 815 14 37 6 31 21 10 0 0 0 11.2 40 5.0 10.9 33 4.1 .0 0 .0 7.6
9 16.4 8.5 .0607 815 17 49 6 43 23 20 0 0 0 12.5 52 5.8 12.2 45 5.0 .2 0 .0 8.6
10 17.4 9.4 .0771 815 19 63 7 56 24 30 2 0 0 13.4 66 6.6 13.1 58 5.8 1.3 2 .2 9.6
11 18.3 10.3 .0941 815 21 77 7 70 24 41 5 0 0 13.9 80 7.3 13.6 72 6.5 3.0 5 .4 10.6
12 19.0 11.3 .1118 815 23 91 7 84 24 50 10 0 0 14.4 94 7.8 14.1 86 7.2 5.4 10 .8 11.6
13 19.6 12.2 .1299 815 25 106 8 98 24 56 18 0 0 14.7 109 8.4 14.4 100 7.7 8.0 18 1.4 12.6
33%
THN 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
REM 20.6 12.2 .1421 550 18 78 5 73 16 42 15 0 0
14 21.3 13.0 .1636 550 20 90 6 84 16 44 25 0 0 11.8 121 8.6 11.6 112 8.0 10.0 28 2.0 10.4
15 22.0 13.7 .1864 550 21 103 6 97 16 45 36 0 0 12.5 133 8.9 12.3 124 8.3 11.0 39 2.6 11.2
16 22.7 14.5 .2100 550 22 116 6 109 15 46 48 0 0 13.0 146 9.1 12.7 137 8.6 12.0 51 3.2 12.0
17 23.3 15.3 .2345 550 23 129 6 123 15 46 61 0 0 13.5 160 9.4 13.2 150 8.8 12.9 64 3.8 12.8
18 23.9 15.9 .2598 550 25 143 7 136 15 46 74 1 0 13.9 174 9.6 13.6 164 9.1 13.8 78 4.3 13.6
40%
THN 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
REM 25.3 15.9 .2901 330 17 96 4 92 9 28 54 1 0
19 26.0 16.6 .3203 330 17 106 4 101 9 27 63 3 0 10.0 184 9.7 9.8 174 9.1 10.5 88 4.6 11.0
20 26.6 17.3 .3519 330 18 116 5 112 9 27 71 5 0 10.4 194 9.7 10.2 184 9.2 10.6 99 4.9 11.7
21 27.2 18.0 .3849 330 19 127 5 122 9 27 80 8 0 10.9 205 9.8 10.7 194 9.3 11.1 110 5.2 12.4
22 27.9 18.7 .4192 330 20 138 5 133 8 26 87 11 0 11.3 216 9.8 11.1 206 9.3 11.5 121 5.5 13.2
23 28.4 19.3 .4546 330 21 150 5 145 8 26 94 16 0 11.7 228 9.9 11.4 217 9.4 11.8 133 5.8 14.0
24 29.0 20.0 .4914 330 22 162 5 157 8 26 101 22 0 12.2 240 10.0 11.9 229 9.5 12.3 145 6.1 14.9
25 29.6 20.6 .5292 330 23 175 6 169 8 25 106 29 0 12.5 253 10.1 12.2 241 9.6 12.6 158 6.3 15.7
26 30.2 21.2 .5682 330 24 188 6 182 8 25 112 37 0 12.9 265 10.2 12.6 254 9.8 13.0 171 6.6 16.5
27 30.7 21.8 .6083 330 25 201 6 194 8 25 115 46 0 13.2 279 10.3 13.0 267 9.9 13.3 184 6.8 17.3
28 31.3 22.4 .6492 330 25 214 7 208 8 24 119 56 1 13.5 292 10.4 13.2 280 10.0 13.6 198 7.1 18.2
29 31.9 23.0 .6908 330 26 228 7 221 8 24 122 65 2 13.7 306 10.5 13.5 293 10.1 13.8 212 7.3 19.0
30 32.4 23.5 .7332 330 27 242 7 235 8 24 123 77 3 14.0 320 10.7 13.7 307 10.2 14.0 226 7.5 19.8
31 33.0 23.9 .7766 330 28 256 7 249 8 24 125 88 5 14.3 334 10.8 14.0 321 10.4 14.3 240 7.7 20.4
32 33.6 24.4 .8202 330 29 271 8 263 8 23 126 99 7 14.4 349 10.9 14.1 335 10.5 14.4 255 8.0 21.0
Firstly, the two variables in the first line (the one starting '1(P1'), in this case 'C03 104', need to be extracted from it and sent to OUTPUT. (Same as the previous question, but the output position changes.)
Secondly, all lines beginning with 'THN' need to be extracted as they are, except that the 'THN' itself can be dropped.
If there are two, three, four or even five 'THN' lines, they all need to be extracted from the batch and sent to OUTPUT. (Roughly the same as the previous question.)
Thirdly, although sequentially this is the second step, the last figure in the 'AGE' column of the main tabular data just before each 'THN' line needs to be attached to the extracted 'THN' line directly below it (in this case the figures 7, 13 and 18). These need to be added to their respective 'THN' lines. See the expected output below, where the ages have been inserted after the two 'C03 104' variables in each line.
If there are no 'THN' lines in a given batch, the entire batch should be ignored, with no output, and the next batch (starting with a '1(P1)' again) considered.
The correct output expected from the above batch is:
C03 104 7 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
C03 104 13 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
C03 104 18 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
As will be seen from this, the two variables from the top line are inserted at the start of the output THN data line. The age figure read from the input batch is then inserted into its respective THN line and thereafter the rest of the THN line data is attached.
My effort some time ago but not updated is as follows:
while ( my $line = <INPUT> ) {
    if ( $line =~ /\s{6,11}(\w{1}\d{1}\w{0,5})\s{0,5}(\d{3})/ ) {
        my @c_no = "$1,$2\n";
        foreach (@c_no) {
            print OUTPUT $_;
        }
        if ( $line =~ /^(\s{1}THN)(\s{1,3}\d{0,2}.\d)(\s{1,3}\d{0,2}.\d)(\s{1,2}\d{0,1}.\d{4})(\s{1,2}\d{2,4})
                       (\s{2,3}\d{1,2})(\s{1,6}\d{1,4})(\s{1,2}\d{1,2})(\s{1,5}\d{1,4})(\s{1,4}\d{1,4})
                       (\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})|(^1(P1))/x ) {
            print OUTPUT "$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14\n";
        }
    }
}
Advice, guidance and help would be greatly appreciated.
This is much more simply done using split to separate each line into space-delimited fields, and quite straightforward if you maintain state variables for the two fields from the header row, and the age from any row whose first field is entirely digits. Then all that is necessary is to print these three values before the numbers on any line that starts with THN
Note that it's simplest to pass the name of the input file as a parameter on the command line. Then all you have to do is read from <>. All the opening and error handling are already done for you
The output format you've asked for is rather esoteric. I can't see any pattern to the column widths and I've had to write a custom printf format to recreate it. If you need something else then all the values in each output line are in the @data array, which you can use as you wish
use strict;
use warnings 'all';
my ($c1, $c2, $age);
while ( <> ) {
    next unless /\S/;

    my @fields = split;

    if ( $fields[0] eq '1(P1)' ) {
        ($c1, $c2) = @fields[10,11];
    }
    elsif ( $fields[0] !~ /\D/ ) {
        $age = $fields[0];
    }
    elsif ( $fields[0] eq 'THN' ) {
        my @data = ( $c1, $c2, $age, @fields[1..13] );
        printf "%4s %5s %5d %5.1f%5.1f%7.4f%5d%4d%7d%3d%6d%5d%5d%5d%5d%5d\n", @data;
    }
}
output
C03 104 7 11.4 6.7 0.0230 164 2 4 1 3 2 0 0 0 0
C03 104 13 17.5 12.2 0.1044 265 6 28 2 25 8 15 3 0 0
C03 104 18 21.6 15.9 0.2142 220 8 47 2 45 6 19 20 0 0
I copied and modified your example data, so this hasn't had a really good test. And I'm printing to STDOUT for testing purposes, but that should be easy to change.
The trick is to recognize that you've got line matching to do, which is great with regexes, and other processing, which is probably better with plain old code. So build a little loop, and process the lines with equal precedence (this is important for detecting errors in the file - don't try to nest things too much). Put in some state variables to help keep track of what comes next, and be sure to reset them appropriately.
Also, one thing I noticed in your example code is that you spent a lot of time getting spacing and number-of-digits right for the fields. That was almost certainly wasted time in this context, since the key was the "THN" at the start of the line. One trick when processing text is to focus on the things you really need, and use .* for the other stuff. That way, line noise or a syntax error or some strange formatting glitch won't screw up your program. (Sometimes .* becomes [^"]* or whatever, but you take the point...)
my ($line_prefix, $have_age_col, $age_col);
while (<>) {
    if (/^1\(P1\).*\s(?P<two_vars>\w+\s+\w+)\s+PROCESSED .* TIME .* KSINA=.*$/) {
        # Start new section
        $line_prefix = $+{two_vars};
        $have_age_col = 0;
        $age_col = undef;
    }
    if (/^AGE /) {
        $have_age_col = 1;
    }
    if ($have_age_col && /^\s{0,5}(\d+)/) {
        $age_col = substr "    ".$1, -5;
    }
    if (/^THN /) {
        die "THN encountered without header"
            unless $line_prefix;
        die "THN encountered without age column"
            unless $have_age_col and $age_col;
        s/^THN \s*//;
        s/\s+$//;
        my $output = "$line_prefix $age_col $_\n";
        print STDOUT $output;
    }
}
I have been chosen by my Computer Science laboratory to develop a Graphical User Interface (GUI) whose main goal is to allow the user to open high-resolution images (TIFF format) and manipulate them (zoom in, zoom out, draw rectangles, create and edit annotations...).
I would like to build this GUI using Qt coupled with OpenCV. Nevertheless, I have some doubts about working with OpenCV for this project.
Therefore, my questions are:
Does Qt allow you to handle high-resolution TIFF images (2000x3000 pixels)?
Does OpenCV allow you to handle High-Resolution TIFF images?
Is it easy to convert an OpenCV TIFF image into a Qt TIFF image?
Create and edit annotations? That sounds like going much further than processing TIFF.
Supposing the original post was understood correctly in its motivation - to create a GUI capable of working with large TIFF image files which also allows users to create and (later) edit annotations and other graphical elements (rectangles et al.) -
the solution goes in a much different direction than just handling TIFF images in OpenCV and similar tools.
If the goal is not just to reinvent the wheel again, there is a lovely and powerful tool, xfig, used at CERN for ages, which provides a powerful framework for your idea.
Using xfig's syntax engine and rendering options allows you to keep objects composable and editable throughout the whole lifecycle of the work, and leaves you more time for developing your own concept of the GUI rather than spending man-years of effort on the internals of handling standard-format pixmaps and layered vector objects at the lowest level.
Meta-file looks like:
#FIG 3.2
Landscape
Center
Inches
Letter
200.00
Single
-3
1200 2
6 8625 825 11775 7275
6 8625 1125 9075 3975
6 8625 1125 9075 2175
6 8625 1125 9075 1575
2 1 0 3 0 7 0 0 -1 0.000 1 0 -1 0 0 2
8700 1200 9000 1200
2 1 0 3 0 7 0 0 -1 0.000 1 0 -1 0 0 2
8700 1500 9000 1500
-6
6 8625 1725 9075 2175
2 1 0 3 0 7 0 0 -1 0.000 1 0 -1 0 0 2
8700 1800 9000 1800
2 1 0 3 0 7 0 0 -1 0.000 1 0 -1 0 0 2
8700 2100 9000 2100
...
4 0 0 0 0 0 20 0.0000 4 195 1080 35100 19800 Z-80 PIO\001
4 0 0 0 0 0 15 0.0000 4 180 930 34575 21075 CTL/DAT\001
4 0 0 0 0 0 15 0.0000 4 180 795 34575 21375 B/A SEL\001
4 0 0 0 0 0 15 0.0000 4 150 270 34575 21975 A6\001
4 0 0 0 0 0 15 0.0000 4 150 270 34575 22275 A5\001
4 0 0 0 0 0 15 0.0000 4 150 480 34575 22875 GND\001
4 0 0 0 0 0 15 0.0000 4 150 270 34575 24075 A0\001
4 0 0 0 0 0 15 0.0000 4 150 600 34575 24375 A STB\001
4 0 0 0 0 0 15 0.0000 4 150 570 34575 24675 B STB\001
4 0 0 0 0 0 15 0.0000 4 150 240 36525 21975 B6\001
and the result may look like:
I am following a tutorial for a word processor for my Qt module at uni.
It has asked me to set this attribute:
MainWindow::setAttribute(Qt::WA_DeleteOnClose);
The problem comes when I run the application: it causes an error saying that the application has closed unexpectedly.
Also, it asked me to make an actionExit action and add it to the File toolbar, which doesn't show. I am guessing that this is because I am writing it on OS X and exit/quit is taken care of for you with the Cmd+Q shortcut.
I was wondering if anyone could shed some light on this problem for me so that I know for future reference. If needed I can post the tutorial + source code.
Thanks
Edit: backtrace from the debugger (hope this is correct):
0 __pthread_kill 0 0x7fff8eaff212
1 pthread_kill 0 0x7fff86f7eaf4
2 abort 0 0x7fff86fc2dce
3 free 0 0x7fff86f96959
4 MainWindow::~MainWindow mainwindow.cpp 22 0x100002cff
5 QObject::event 0 0x100e48906
6 QWidget::event 0 0x1000ecd5e
7 QMainWindow::event 0 0x10049cadb
8 QApplicationPrivate::notify_helper 0 0x10009593d
9 QApplication::notify 0 0x10009bdc4
10 QCoreApplication::notifyInternal 0 0x100e3417c
11 QCoreApplicationPrivate::sendPostedEvents 0 0x100e355a0
12 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ 0 0x7fff90925101
13 __CFRunLoopDoSources0 0 0x7fff90924a25
14 __CFRunLoopRun 0 0x7fff90947dc5
15 CFRunLoopRunSpecific 0 0x7fff909476b2
16 RunCurrentEventLoopInMode 0 0x7fff8d0f60a4
17 ReceiveNextEventCommon 0 0x7fff8d0f5d84
18 BlockUntilNextEventMatchingListInMode 0 0x7fff8d0f5cd3
19 _DPSNextEvent 0 0x7fff91a00613
20 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 0 0x7fff919ffed2
... <More>
Is your MainWindow object declared on the stack, by any chance? If so, then DeleteOnClose is not a good idea, simply because deleting an object that is on the stack is an error.
I have a Django view which creates 500-5000 new database INSERTS in a loop. Problem is, it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to use MySQL on lesser hardware (smaller EC2 instance) and never had these types of speed issues.
Details:
Postgres 8.3 on Ubuntu Server 9.04.
Server is a "large" Amazon EC2 with database on EBS (ext3) - 11GB/20GB.
Here is some of my postgresql.conf -- let me know if you need more
shared_buffers = 4000MB
effective_cache_size = 7128MB
My python:
for k in kw:
    k = k.lower()
    p = ProfileKeyword(profile=self)
    logging.debug(k)
    p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword':k,})
    if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
        #checking created is just a small optimization to save some database hits on new keywords
        pass #duplicate entry
    else:
        p.save()
Some output from top:
top - 16:56:22 up 21 days, 20:55, 4 users, load average: 0.99, 1.01, 0.94
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.2%sy, 0.0%ni, 90.5%id, 0.7%wa, 0.0%hi, 0.0%si, 2.8%st
Mem: 15736360k total, 12527788k used, 3208572k free, 332188k buffers
Swap: 0k total, 0k used, 0k free, 11322048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14767 postgres 25 0 4164m 117m 114m S 22 0.8 2:52.00 postgres
1 root 20 0 4024 700 592 S 0 0.0 0:01.09 init
2 root RT 0 0 0 0 S 0 0.0 0:11.76 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0 0.0 0:00.08 events/0
6 root 11 -5 0 0 0 S 0 0.0 0:00.00 khelper
7 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
9 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenbus
18 root RT -5 0 0 0 S 0 0.0 0:11.84 migration/1
19 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/1
Let me know if any other details would be helpful.
One common reason for slow bulk operations like this is each insert happening in its own transaction. If you can get all of them to happen in a single transaction, it could go much faster.
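For illustration, here is a minimal sketch of that idea using Django's transaction handling. The helper name add_keywords is hypothetical, the model names come from the question, and transaction.atomic assumes a reasonably modern Django (older releases used commit_on_success instead):

from django.db import transaction

@transaction.atomic  # assumes Django 1.6+; one transaction (and one commit) for the whole loop
def add_keywords(self, kw):
    # intended as a method on the profile model, like the loop in the question
    for k in kw:
        k = k.lower()
        keyword, created = Keyword.objects.get_or_create(keyword=k)
        # only link the keyword to this profile if it is not linked already
        if created or not ProfileKeyword.objects.filter(profile=self, keyword=keyword).exists():
            ProfileKeyword.objects.create(profile=self, keyword=keyword)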
Firstly, ORM operations are always going to be slower than pure SQL. I once wrote an update to a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL the whole thing ran in less than a minute.
Secondly, bear in mind that your code here is doing up to four separate database operations for every row in your data set - the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That's a lot of database access.
Bearing in mind that a maximum of 5000 objects is not huge, you should be able to read the whole dataset into memory at the start. Then you can do a single filter to get all the existing Keyword objects in one go, saving a huge number of queries in the Keyword get_or_create and also avoiding the need to instantiate duplicate ProfileKeywords in the first place.
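A rough sketch of what that could look like, reusing the model names from the question. The helper name add_keywords_bulk is made up, bulk_create assumes Django 1.4 or later, and the keyword__keyword lookup assumes ProfileKeyword.keyword is a foreign key to a Keyword model with a keyword field, as the original code implies:

def add_keywords_bulk(self, kw):
    wanted = set(k.lower() for k in kw)

    # one query: every Keyword row that already exists for these strings
    existing = dict((obj.keyword, obj)
                    for obj in Keyword.objects.filter(keyword__in=wanted))

    # create any Keyword rows that are still missing
    for k in wanted - set(existing):
        existing[k] = Keyword.objects.create(keyword=k)

    # one query: keyword strings already linked to this profile
    linked = set(ProfileKeyword.objects.filter(profile=self)
                 .values_list('keyword__keyword', flat=True))

    # insert only the missing links, in a single batch
    ProfileKeyword.objects.bulk_create(
        [ProfileKeyword(profile=self, keyword=existing[k]) for k in wanted - linked])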