Writing both characters and digits in an array

Writing both characters and digits in an array - fortran

I have a Fortran code which reads a txt file with seperate lines of characters and digits and then write them in a 1D array with 20 elements.
This code is not compatible with Fortran 77 compiler Force 2.0.9. My question is that how we can apply the aformenetioned procedure using a Fortran 77 compiler;i.e defining a 1D array nd then write the txt file line by line into elements of the array?
Thank you in advance.
The txt file follows:
Case 1:
10 0 1 2 0
1.104 1.008 0.6 5.0
25 125.0 175.0 0.7 1000.0
0.60
1 5
Advanced Case
15 53 0 10 0 1 0 0 1 0 0 0 0
0 0 0 0
0 0 1500.0 0 0 .03
0 0.001 0
0.1 0 0.125 0.08 0.46
0.1 5.0 0.04
# Jason:
I am a beginner and still learning Fortran. I guess Force 2 uses g77.
The followings are the correspond part of the original code. Force 2 editor returns an empty txt file as a result.
DIMENSION CARD(20)
CHARACTER*64 FILENAME
DATA XHEND / 4HEND /
OPEN(UNIT=3,FILE='CON')
OPEN(UNIT=4,FILE='CON')
OPEN(UNIT=7,STATUS='SCRATCH')
WRITE(3,9000) 'PLEASE ENTER THE INPUT FILE NAME : '
9000 FORMAT (A)
READ(4,9000) FILENAME
OPEN(UNIT=5,FILE=FILENAME,STATUS='OLD')
WRITE(3,9000) 'PLEASE ENTER THE OUTPUT FILE NAME : '
READ(4,9000) FILENAME
OPEN(UNIT=6,FILE=FILENAME,STATUS='NEW')
FILENAME = '...'
IR = 7
IW = 6
IP = 15
5 REWIND IR
I = 0
2 READ (5,7204,END=10000) CARD
IF (I .EQ. 0 ) WRITE (IW,7000)
7000 FORMAT (1H1 / 10X,15HINPUT DECK ECHO / 10X,15(1H-))
I= I + 1
WRITE (IW,9204) I,CARD
IF (CARD(1) .EQ. XHEND ) GO TO 7020
WRITE (IR,7204) CARD
7204 FORMAT (20A4)
9204 FORMAT (1X,I4,2X,20A4)
GO TO 2
7020 REWIND IR

It looks that CARD is being used as a to hold 20 4-character strings. I don't see the declaration as a character variable, only as an array, so perhaps in extremely old FORTRAN style a non-character variable is being used to hold characters? You are using a 20A4 format, so the values have to be positioned in the file precisely as 20 groups of 4 characters. You have to add blanks so that they are aligned into groups of 4 columns.
If you want to read numbers it would be much easier to read them into a numeric type and use list-directed IO:
real values (20)
read (5, *) values
Then you wouldn't have to worry about precision positioning of the values in the file.
This is really archaic FORTRAN ... even pre-FORTRAN-77 in style. I can't remember the last time that I saw Hollerith (H) formats! Where are you learning this from?
Edit: While I like Fortran for many programming tasks, I wouldn't use FORTRAN 66! Computers are supposed to make things easier ... there is no reason to have to count characters. Instead of
7000 FORMAT (1H1 / 10X,15HINPUT DECK ECHO / 10X,15(1H-))
You can use
7000 FORMAT ( / 10X, "INPUT DECK ECHO" / 10X, 15("-") )
I can think of only two reasons to use a Hollerith code: not bothering to change legacy source code (it is remarkable that a current Fortran compiler can process a feature that was obsolete 30 years ago! Fortran source code never dies!), or studying the history of computing languages. The name honors a great computing pioneer, whose invention accomplished the 1890 US Census in one year, when the 1880 Census took eight years: http://en.wikipedia.org/wiki/Herman_Hollerith
I much doubt that you will see the "1" in the first column performing "carriage control" today. I had to look up that "1" was the code for page eject. You are much more likely to see it in your output. See Are Fortran control characters (carriage control) still implemented in compilers?

Related

Reading and Writing from a csv file: want to only extract rows that do not contain zeros

I'm trying to write a program to pull information from a file, but i only want the rows that do not contain 0. I know how to get the file to print to screen but I can't seem to get the desired rows printed with out printing each individual line I want. I've tried putting the information i want into a list and writing it to a new CSV file but it just jumbles everything up into 2 rows instead of the 300+ I'm trying to get. Any help would be appreciated
Months A B C
Feb-49 0 0 0
Jan-50 95 378 3767
Feb-50 67 17 233
Jan-51 0 0 0
Feb-51 0 0 0
#This file has about 400 rows that look something like this

Scikit-learn labelencoder: how to preserve mappings between batches?

I have 185 million samples that will be about 3.8 MB per sample. To prepare my dataset, I will need to one-hot encode many of the features after which I end up with over 15,000 features.
But I need to prepare the dataset in batches since the memory footprint exceeds 100 GB for just the features alone when one hot encoding using only 3 million samples.
The question is how to preserve the encodings/mappings/labels between batches?
The batches are not going to have all the levels of a category necessarily. That is, batch #1 may have: Paris, Tokyo, Rome.
Batch #2 may have Paris, London.
But in the end I need to have Paris, Tokyo, Rome, London all mapped to one encoding all at once.
Assuming that I can not determine the levels of my Cities column of 185 million all at once since it won't fit in RAM, what should I do?
If I apply the same Labelencoder instance to different batches will the mappings remain the same?
I also will need to use one hot encoding either with scikitlearn or Keras' np_utilities_to_categorical in batches as well after this. So same question: how to basically use those three methods in batches or apply them at once to a file format stored on disk?

I suggest using Pandas' get_dummies() for this, since sklearn's OneHotEncoder() needs to see all possible categorical values when .fit(), otherwise it will throw an error when it encounters a new one during .transform().
# Create toy dataset and split to batches
data_column = pd.Series(['Paris', 'Tokyo', 'Rome', 'London', 'Chicago', 'Paris'])
batch_1 = data_column[:3]
batch_2 = data_column[3:]
# Convert categorical feature column to matrix of dummy variables
batch_1_encoded = pd.get_dummies(batch_1, prefix='City')
batch_2_encoded = pd.get_dummies(batch_2, prefix='City')
# Row-bind (append) Encoded Data Back Together
final_encoded = pd.concat([batch_1_encoded, batch_2_encoded], axis=0)
# Final wrap-up. Replace nans with 0, and convert flags from float to int
final_encoded = final_encoded.fillna(0)
final_encoded[final_encoded.columns] = final_encoded[final_encoded.columns].astype(int)
final_encoded
output
City_Chicago City_London City_Paris City_Rome City_Tokyo
0 0 0 1 0 0
1 0 0 0 0 1
2 0 0 0 1 0
3 0 1 0 0 0
4 1 0 0 0 0
5 0 0 1 0 0

formula for integer less than 15

In one field I want to accept numbers that could be decimal figure for weight but it should not be over 15. Previously I had the following regex code:
[1-9]\d*(\.\d+)?$
This is to be entered in Google Forms. In other words, all these numbers are OK:
0.05
1.5
2
3.56
But these are not ok:
2 kg
0
15.1
16

This should work for values 0 to 15
^((1[0-5])|([1-9]))?(\.\d*)?$

Reshaping Pandas data frame (a complex case!)

I want to reshape the following data frame:
index id numbers
1111 5 58.99
2222 5 75.65
1000 4 66.54
11 4 60.33
143 4 62.31
145 51 30.2
1 7 61.28
The reshaped data frame should be like the following:
id 1 2 3
5 58.99 75.65 nan
4 66.54 60.33 62.31
51 30.2 nan nan
7 61.28 nan nan
I use the following code to do this.
import pandas as pd
dtFrame = pd.read_csv("data.csv")
ids = dtFrame['id'].unique()
temp = dtFrame.groupby(['id'])
temp2 = {}
for i in ids:
temp2[i]= temp.get_group(i).reset_index()['numbers']
dtFrame = pd.DataFrame.from_dict(temp2)
dtFrame = dtFrame.T
Although the above code solve my problem but is there a more simple way to achieve this. I tried Pivot table but it does not solve the problem perhaps it requires to have same number of element in each group. Or may be there is another way which I am not aware of, please share your thoughts about it.

In [69]: df.groupby(df['id'])['numbers'].apply(lambda x: pd.Series(x.values)).unstack()
Out[69]:
0 1 2
id
4 66.54 60.33 62.31
5 58.99 75.65 NaN
7 61.28 NaN NaN
51 30.20 NaN NaN
This is really quite similar to what you are doing except that the loop is replaced by apply. The pd.Series(x.values) has an index which by default ranges over integers starting at 0. The index values become the column names (above). It doesn't matter that the various groups may have different lengths. The apply method aligns the various indices for you (and fills missing values with NaN). What a convenience!
I learned this trick here.

why has output data row shifted one cell down?

I am working with the simplest program of finding the average of all the scores eliminating the lowest two.
data new;
low1=smallest(1,score_1-score_6);
low2=smallest(2,score_1-score_6);
tot=score_1+score_2+score_3+score_4+score_5+score_6;
avg=(tot-low1-low2)/4;
set mydata;
run;
'mydata' does not contain any missing values yet in the output table the data is shifted by one cell down.
the output looks like dis
id low1 score_1 score_2 score_3 score_4 score_5 score_6 low2 tot avg
1 . 0 0 10 80 0 75 . . .
2 0 0 0 0 75 80 0 0 165 41.25
3 0 0 50 10 60 55 0 0 155 38.25
4...and so on
sas generates a note like this:
NOTE: Missing values were generated as a result of performing an operation on missing values
i cant figure why are the calculated values getting printed in line 2 instead starting in line 1?
help would be appreciated .thank you!

Put the SET statement right after the DATA statement (which is, I think, generally a good idea). As you have it, the first row is trying to create a variable but doesn't have any data; the data is only being read at the end.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js