How to reorder certain row index in pandas

I have a dataframe like this:
100MHz_Dif0 102MHz_Dif0 100MHz_Dif1 102MHz_Dif1
Frequency
9.000000e+07 -70.209000 -65.174004 -66.063004 -66.490997
9.003333e+07 -70.628998 -65.196999 -66.339996 -66.461998
9.006667e+07 -70.405998 -65.761002 -65.432999 -65.549004
9.010000e+07 -70.524002 -65.552002 -66.038002 -65.887001
9.013333e+07 -70.746002 -65.658997 -65.086998 -65.390999
9.016667e+07 -70.884003 -66.209999 -64.887001 -65.397003
9.020000e+07 -70.752998 -66.019997 -65.308998 -66.571999
9.023333e+07 -70.447998 -65.858002 -65.500000 -65.028999
9.026667e+07 -70.452003 -65.832001 -66.032997 -65.005997
9.030000e+07 -71.219002 -65.739998 -65.961998 -65.986000
9.033333e+07 -71.095001 -65.820999 -67.112999 -65.977997
9.036667e+07 -70.834000 -65.926003 -66.348000 -65.568001
as an example. If I want to move the third and fourth rows so that they become the first and second rows, which command should I use? I will also need to change the order of the rows depending on the frequency; how can I implement that? Thank you very much.

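Assuming your dataframe is named df, use np.r_ to build the array of row positions and pass it to iloc. np.r_ does not know the length of df, so the last slice needs an explicit stop (an open-ended 4: does not expand to the remaining rows):
df.iloc[np.r_[[2, 3], [0, 1], 4:len(df)]]
100MHz_Dif0 102MHz_Dif0 100MHz_Dif1 102MHz_Dif1
Frequency
90066670.0 -70.405998 -65.761002 -65.432999 -65.549004
90100000.0 -70.524002 -65.552002 -66.038002 -65.887001
90000000.0 -70.209000 -65.174004 -66.063004 -66.490997
90033330.0 -70.628998 -65.196999 -66.339996 -66.461998
90133330.0 -70.746002 -65.658997 -65.086998 -65.390999
90166670.0 -70.884003 -66.209999 -64.887001 -65.397003
90200000.0 -70.752998 -66.019997 -65.308998 -66.571999
90233330.0 -70.447998 -65.858002 -65.500000 -65.028999
90266670.0 -70.452003 -65.832001 -66.032997 -65.005997
90300000.0 -71.219002 -65.739998 -65.961998 -65.986000
90333330.0 -71.095001 -65.820999 -67.112999 -65.977997
90366670.0 -70.834000 -65.926003 -66.348000 -65.568001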
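
As a rough illustration of the np.r_ approach, the sketch below builds the position array for a small frame and applies it with iloc. The six-row frame and its rounded values are made up for the example (only two of the columns are reproduced), and the stop of the final slice is given explicitly as len(df).

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (values rounded, two columns only).
df = pd.DataFrame(
    {
        "100MHz_Dif0": [-70.209, -70.629, -70.406, -70.524, -70.746, -70.884],
        "102MHz_Dif0": [-65.174, -65.197, -65.761, -65.552, -65.659, -66.210],
    },
    index=pd.Index(
        [9.0e7, 9.003333e7, 9.006667e7, 9.01e7, 9.013333e7, 9.016667e7],
        name="Frequency",
    ),
)

# Rows 2 and 3 first, then rows 0 and 1, then everything from row 4 onwards.
order = np.r_[[2, 3], [0, 1], 4:len(df)]
reordered = df.iloc[order]

print(order.tolist())  # [2, 3, 0, 1, 4, 5]
print(reordered)
```

Because iloc works on positions, the same array can be reused on any frame with at least as many rows, whatever its index values are.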

To order the rows by the frequency index, use sort_index:
df = df.sort_index()
print (df)
100MHz_Dif0 102MHz_Dif0 100MHz_Dif1 102MHz_Dif1
Frequency
90000000.0 -70.209000 -65.174004 -66.063004 -66.490997
90033330.0 -70.628998 -65.196999 -66.339996 -66.461998
90066670.0 -70.405998 -65.761002 -65.432999 -65.549004
90100000.0 -70.524002 -65.552002 -66.038002 -65.887001
90133330.0 -70.746002 -65.658997 -65.086998 -65.390999
90166670.0 -70.884003 -66.209999 -64.887001 -65.397003
90200000.0 -70.752998 -66.019997 -65.308998 -66.571999
90233330.0 -70.447998 -65.858002 -65.500000 -65.028999
90266670.0 -70.452003 -65.832001 -66.032997 -65.005997
90300000.0 -71.219002 -65.739998 -65.961998 -65.986000
90333330.0 -71.095001 -65.820999 -67.112999 -65.977997
90366670.0 -70.834000 -65.926003 -66.348000 -65.568001
To sort the columns instead:
df = df.sort_index(axis=1)
print (df)
100MHz_Dif0 100MHz_Dif1 102MHz_Dif0 102MHz_Dif1
Frequency
90000000.0 -70.209000 -66.063004 -65.174004 -66.490997
90033330.0 -70.628998 -66.339996 -65.196999 -66.461998
90066670.0 -70.405998 -65.432999 -65.761002 -65.549004
90100000.0 -70.524002 -66.038002 -65.552002 -65.887001
90133330.0 -70.746002 -65.086998 -65.658997 -65.390999
90166670.0 -70.884003 -64.887001 -66.209999 -65.397003
90200000.0 -70.752998 -65.308998 -66.019997 -66.571999
90233330.0 -70.447998 -65.500000 -65.858002 -65.028999
90266670.0 -70.452003 -66.032997 -65.832001 -65.005997
90300000.0 -71.219002 -65.961998 -65.739998 -65.986000
90333330.0 -71.095001 -67.112999 -65.820999 -65.977997
90366670.0 -70.834000 -66.348000 -65.926003 -65.568001
To sort both the index and the columns:
df = df.sort_index(axis=1).sort_index()
print (df)
100MHz_Dif0 100MHz_Dif1 102MHz_Dif0 102MHz_Dif1
Frequency
90000000.0 -70.209000 -66.063004 -65.174004 -66.490997
90033330.0 -70.628998 -66.339996 -65.196999 -66.461998
90066670.0 -70.405998 -65.432999 -65.761002 -65.549004
90100000.0 -70.524002 -66.038002 -65.552002 -65.887001
90133330.0 -70.746002 -65.086998 -65.658997 -65.390999
90166670.0 -70.884003 -64.887001 -66.209999 -65.397003
90200000.0 -70.752998 -65.308998 -66.019997 -66.571999
90233330.0 -70.447998 -65.500000 -65.858002 -65.028999
90266670.0 -70.452003 -66.032997 -65.832001 -65.005997
90300000.0 -71.219002 -65.961998 -65.739998 -65.986000
90333330.0 -71.095001 -67.112999 -65.820999 -65.977997
90366670.0 -70.834000 -66.348000 -65.926003 -65.568001
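
If the rows need an order other than ascending frequency, a sketch along these lines may help. It uses a made-up three-row frame; sort_index(ascending=False) reverses the order, and reindex takes an explicit list of index labels (the list used here is only an example).

```python
import pandas as pd

# Tiny stand-in frame with an unsorted Frequency index (values rounded).
df = pd.DataFrame(
    {"100MHz_Dif0": [-70.406, -70.209, -70.629]},
    index=pd.Index([90066670.0, 90000000.0, 90033330.0], name="Frequency"),
)

ascending = df.sort_index()                  # lowest frequency first
descending = df.sort_index(ascending=False)  # highest frequency first

# Arbitrary order: list the index labels in the order you want them.
wanted = [90033330.0, 90000000.0, 90066670.0]
custom = df.reindex(wanted)

print(ascending.index.tolist())   # [90000000.0, 90033330.0, 90066670.0]
print(descending.index.tolist())  # [90066670.0, 90033330.0, 90000000.0]
print(custom.index.tolist())      # [90033330.0, 90000000.0, 90066670.0]
```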

To reorder the columns, it can be practical to list them explicitly, since you have only 4 of them.
Assuming the frame is named dataFrame, do:
dataFrame = dataFrame[['100MHz_Dif1','102MHz_Dif1','100MHz_Dif0', '102MHz_Dif0']]
This rebinds dataFrame to a new frame with the columns in the listed order.

Related

How to bind datetime function in sqlite3 c++

My SQLite insert statement is:
char *testSQL;
testSQL = "INSERT INTO Test (id, tx_time) VALUES ("+id+ ", datetime("+timestamp+",'unixepoch', 'localtime'));";
I'm trying to convert the above into a prepared statement using sqlite3_bind:
testSQL = "INSERT INTO Test (id, tx_time) VALUES (?, ?);";
I can bind id simply with sqlite3_bind_int(stmt, 1, id), but how can I bind the datetime function?

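Keep the datetime call in the SQL text and bind only its argument:
const char *testSQL;
testSQL = "INSERT INTO Test (id, tx_time) "
"VALUES (?, datetime(?,'unixepoch', 'localtime'));";
Then bind timestamp to the second parameter with sqlite3_bind_int (or sqlite3_bind_int64 if the timestamp is a 64-bit value).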
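
The same idea can be tried quickly from Python's built-in sqlite3 module, which prepares and binds the statement for you. This is only a sketch: the in-memory database, the column types and the sample timestamp are assumptions made for the example.

```python
import sqlite3

def insert_with_datetime(conn, row_id, unix_ts):
    # datetime() stays in the SQL text; only its argument is a bound parameter.
    conn.execute(
        "INSERT INTO Test (id, tx_time) "
        "VALUES (?, datetime(?, 'unixepoch', 'localtime'))",
        (row_id, unix_ts),
    )

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute("CREATE TABLE Test (id INTEGER, tx_time TEXT)")
insert_with_datetime(conn, 1, 1456214400)  # a Unix timestamp in February 2016

rows = conn.execute("SELECT id, tx_time FROM Test").fetchall()
print(rows)  # tx_time is stored as 'YYYY-MM-DD HH:MM:SS' in local time
```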

python list.append screws up the dictionary entries (only last entry is added in each index)

I have this piece of code. Individually each dictionary looks fine; however, once the dictionaries are appended to a list, every entry of the list shows only the last one.
def readExcel(fInputFile="", sheetname=""):
    mywb = xlrd.open_workbook(vInputFile, on_demand=True)
    sheet_names = mywb.sheet_names()
    mysheet = mywb.sheet_by_name(sheet_names[0])
    for row_idx in range(1,mysheet.nrows):
        for col_idx in range(mysheet.ncols):
            cell = mysheet.cell(row_idx,col_idx)
            hdr = mysheet.cell(0,col_idx)
            init_list(myl,str(hdr.value),str(cell.value))
        if testmethod==NOLOAD:
            mpt_noload_execute(myl)
        else:
            myll.append(myl)
            print("myll after each row", myll[row_idx-1]['uname'])
    for j in range(len(myll)):
        print("myll after file reading", myll[j]['uname'])
==============================
Execution and Results:
>python mpt_login_test_driver.py
('myll after each row', 'autotest01')
('myll after each row', 'autotest02')
('myll after each row', 'autotest03')
('myll after file reading', 'autotest03') <=== error
('myll after file reading', 'autotest03') <=== error
('myll after file reading', 'autotest03')

You need to create a new dictionary for myl on each iteration. Otherwise you keep modifying the same object, and every entry in the list refers to it. For example:
def readExcel(fInputFile="", sheetname=""):
    mywb = xlrd.open_workbook(vInputFile, on_demand=True)
    sheet_names = mywb.sheet_names()
    mysheet = mywb.sheet_by_name(sheet_names[0])
    for row_idx in range(1,mysheet.nrows):
        myl={}  # fresh dict for this row
        for col_idx in range(mysheet.ncols):
            cell = mysheet.cell(row_idx,col_idx)
            hdr = mysheet.cell(0,col_idx)
            init_list(myl,str(hdr.value),str(cell.value))
        if testmethod==NOLOAD:
            mpt_noload_execute(myl)
        else:
            myll.append(myl)
            print("myll after each row", myll[row_idx-1]['uname'])
    for j in range(len(myll)):
        print("myll after file reading", myll[j]['uname'])
Now each row appends a new dict to myll instead of modifying the one that is already in the list.
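
The effect can be reproduced without xlrd. In the sketch below (the function names and the sample rows are invented for the illustration), the first function appends the same dict object on every pass, while the second creates a new dict per row.

```python
def collect_shared(rows):
    shared = {}              # one dict, reused for every row
    out = []
    for row in rows:
        shared.update(row)   # overwrites the previous row's values
        out.append(shared)   # appends another reference to the same object
    return out

def collect_fresh(rows):
    out = []
    for row in rows:
        fresh = {}           # new dict for each row
        fresh.update(row)
        out.append(fresh)
    return out

rows = [{"uname": "autotest01"}, {"uname": "autotest02"}, {"uname": "autotest03"}]
shared_result = [d["uname"] for d in collect_shared(rows)]
fresh_result = [d["uname"] for d in collect_fresh(rows)]

print(shared_result)  # ['autotest03', 'autotest03', 'autotest03']
print(fresh_result)   # ['autotest01', 'autotest02', 'autotest03']
```

list.append stores a reference, not a copy, which is why the shared version ends up with three references to one dict.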

Why does this code produce 4 times nothing and the fifth time the correct data?

I got an XML-file:
<weatherdata>
<location>
<name>Vlaardingen</name>
<type/>
<country>NL</country>
<timezone/>
<location altitude="0"
latitude="51.912498"
longitude="4.34167"
geobase="geonames"
geobaseid="2745467"/>
</location>
<credit/>
<meta>
<lastupdate/>
<calctime>0.0152</calctime>
<nextupdate/>
</meta>
<sun rise="2016-02-23T06:40:58"
set="2016-02-23T17:11:47"/>
<forecast>
<time day="2016-02-23">
<symbol number="500"
name="lichte regen"
var="10d"/>
<precipitation/>
<windDirection deg="316"
code="NW"
name="Northwest"/>
<windSpeed mps="9.01"
name="Fresh Breeze"/>
<temperature day="6.06"
min="5.57"
max="6.06"
night="5.66"
eve="5.57"
morn="6.06"/>
<pressure unit="hPa"
value="1027.72"/>
<humidity value="96"
unit="%"/>
<clouds value="clear sky"
all="8"
unit="%"/>
</time>
<time day="2016-02-24">
<symbol number="501"
name="matige regen"
var="10d"/>
<precipitation value="3.15"
type="rain"/>
<windDirection deg="283"
code="WNW"
name="West-northwest"/>
<windSpeed mps="6.21"
name="Moderate breeze"/>
<temperature day="4.98"
min="4.17"
max="5.11"
night="4.17"
eve="4.85"
morn="4.32"/>
<pressure unit="hPa"
value="1030.97"/>
<humidity value="100"
unit="%"/>
<clouds value="scattered clouds"
all="48"
unit="%"/>
</time>
<time day="2016-02-25">
<symbol number="500"
name="lichte regen"
var="10d"/>
<precipitation value="1.23"
type="rain"/>
<windDirection deg="295"
code="WNW"
name="West-northwest"/>
<windSpeed mps="5.71"
name="Moderate breeze"/>
<temperature day="5.43"
min="4.92"
max="5.48"
night="5.34"
eve="5.48"
morn="4.92"/>
<pressure unit="hPa"
value="1026.18"/>
<humidity value="100"
unit="%"/>
<clouds value="broken clouds"
all="68"
unit="%"/>
</time>
</forecast>
</weatherdata>
This is my C++ code which reads the XML-file:
#include <iostream>
#include <string>
#include "tinyxml2.h"
using namespace std;
struct weatherData
{
// date of day
string time_day;
// symbol data for weathericon and display of weather type
string symbol_number;
string symbol_name;
string symbol_var;
// windspeed
string windSpeed_mps;
// min. and max. temperature
string temp_min;
string temp_max;
};
int main()
{
weatherData forecast[3];
int counter = 0;
tinyxml2::XMLDocument doc;
if(doc.LoadFile("daily.xml") == tinyxml2::XML_SUCCESS)
{
tinyxml2::XMLElement* root = doc.FirstChildElement();
for(tinyxml2::XMLElement* elem = root->FirstChildElement(); elem != NULL; elem = elem->NextSiblingElement())
{
std::string elemName = elem->Value();
for (tinyxml2::XMLElement* e = elem->FirstChildElement("time"); e != NULL; e = e->NextSiblingElement("time"))
{
if (e)
{
const char *time = e->Attribute("day");
forecast[counter].time_day = time;
counter++;
}
}
cout << "Time dates: " << endl;
for (int i = 0; i < 3;i++)
{
cout << forecast[i].time_day << endl;
}
counter = 0;
}
}
}
I am a novice in coding. I'm using the example code from a blog and adapted it for my needs. I know the for-loops just run across the elements in the XML-file.
And every time it finds the element 'time' it looks if it has an attribute 'day'. What I don't get is why it runs 4 times and the fifth time it produces the attributes of the three 'time' parts.
This is the output:
Time dates:
Time dates:
Time dates:
Time dates:
Time dates:
2016-02-23
2016-02-24
2016-02-25

This happens because your outer loop iterates over all direct children of the root element weatherdata, i.e. over the element nodes location, credit, meta, sun, and forecast. For each of these elements, the inner loop searches for the time elements you are actually interested in. The first 4 elements (location, credit, meta and sun) contain no time elements, so the first 4 iterations of the outer loop extract nothing but still print the "Time dates:" header. Only the 5th iteration reaches the forecast element, which holds the three time elements you are looking for.
I expect it to work if you change your code as follows (note the "forecast" argument in the call to FirstChildElement):
....
if(doc.LoadFile("daily.xml") == tinyxml2::XML_SUCCESS)
{
tinyxml2::XMLElement* root = doc.FirstChildElement();
for(tinyxml2::XMLElement* elem = root->FirstChildElement("forecast"); elem != NULL; elem = elem->NextSiblingElement())
{
....
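
The difference between visiting every child of the root and going straight to forecast can also be seen in a few lines of Python with the standard xml.etree.ElementTree module. This is only an illustration on a trimmed-down copy of the XML, not tinyxml2 code.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the weather file: same top-level layout, fewer details.
XML = """<weatherdata>
  <location><name>Vlaardingen</name></location>
  <credit/>
  <meta/>
  <sun rise="2016-02-23T06:40:58" set="2016-02-23T17:11:47"/>
  <forecast>
    <time day="2016-02-23"/>
    <time day="2016-02-24"/>
    <time day="2016-02-25"/>
  </forecast>
</weatherdata>"""

root = ET.fromstring(XML)

# What the original loops do: look for <time> under every child of the root.
per_child = [(child.tag, [t.get("day") for t in child.findall("time")])
             for child in root]
for tag, found in per_child:
    print(tag, found)  # the first four children yield an empty list

# What the fix does: look only under <forecast>.
days = [t.get("day") for t in root.findall("forecast/time")]
print(days)  # ['2016-02-23', '2016-02-24', '2016-02-25']
```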

Fastest Way to insert data( millions of data) into mysql in python

Insert data into CiqHistorical master table in Mysql
sql ="""INSERT INTO CiqHistorical(CiqRefID, CoID, GVKEY, IID, GRID, CreateDID, SectorID,
UserID, ClientID, MinPeriodID, MaxPeriodID, MaxPeriodDID, MinAnnualID, MaxAnnualID,
MaxAnnualDID) VALUES(%s,%s,'%s','%s',%s,GetDateID(now()),%s,%s,%s,%s,%s,
GetDateID('%s'),%s,%s,GetDateID('%s'));""" %(ciq_ref_id, coid, gvkey, iid,
grid, sector_id,user_id, client_id, min_period_id,
max_period_id, max_period_did, min_annual_id,
max_annual_id, max_annual_did)
ciq_hist = self.mysql_hermes.execute(sql)

You can insert many records with a single INSERT, but you should keep each query from growing too large. Here is a script outline that inserts the data in chunks:
CHUNK_SIZE = 1000
def insert_many(self, data):
    # Written as a method because it uses self.mysql_hermes, as in the question.
    index = 0
    while True:
        chunk = data[index : index + CHUNK_SIZE]
        if not chunk:
            break
        values_str = ", ".join(
            "('{0}', '{1}', '{2}', ...)".format(row['field1'], row['field2'], row['field3'], ...)
            for row in chunk
        )
        sql = "INSERT INTO `your_table` (field1, field2, field3, ...) VALUES {0}".format(values_str)
        self.mysql_hermes.execute(sql)
        index += CHUNK_SIZE
Note that this formats the values directly into the SQL string, so they must be trusted or escaped beforehand.
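
A variation on the chunking idea is to keep the SQL text fixed and let the driver bind the values through executemany. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; demo_table and its three fields are hypothetical, and MySQL drivers commonly expect %s placeholders rather than ?.

```python
import sqlite3

CHUNK_SIZE = 1000

def chunks(rows, size):
    # Yield consecutive slices of at most `size` rows.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def insert_many(conn, rows, chunk_size=CHUNK_SIZE):
    # Placeholders let the driver handle quoting and escaping of the values.
    sql = "INSERT INTO demo_table (field1, field2, field3) VALUES (?, ?, ?)"
    for chunk in chunks(rows, chunk_size):
        conn.executemany(sql, chunk)
        conn.commit()  # one commit per chunk keeps each transaction bounded

conn = sqlite3.connect(":memory:")  # stand-in database for the example
conn.execute("CREATE TABLE demo_table (field1 INTEGER, field2 TEXT, field3 REAL)")

rows = [(i, "name%d" % i, i * 0.5) for i in range(2500)]
insert_many(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM demo_table").fetchone()[0]
print(count)  # 2500
```

Whether this is faster than one multi-row INSERT depends on the driver and server settings, so it is worth measuring both on real data.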

MapReduce: Split a txt file into multiple files based on a pattern in a file

I have a tab separated .txt file like this:
05-12-2011 02:00:00 XYZZ
05-12-2011 02:01:00 XYZZ
05-12-2011 02:02:00 XYZZ
05-12-2011 02:03:00 XYZZ
05-12-2011 02:04:00 ABCD
05-12-2011 02:05:00 ABCD
05-12-2011 02:06:00 ABCD
05-12-2011 02:07:00 XYZZ
05-12-2011 02:08:00 ABCD
I want to write the data into different files such as those with the pattern "XYZZ" into one file and those with "ABCD" into another file.
file1.txt will contain:
05-12-2011 02:01:00 XYZZ
05-12-2011 02:02:00 XYZZ
05-12-2011 02:03:00 XYZZ
05-12-2011 02:07:00 XYZZ
And file2.txt will contain:
05-12-2011 02:04:00 ABCD
05-12-2011 02:05:00 ABCD
05-12-2011 02:06:00 ABCD
Here is the code I have so far:
public class WordCount2 {
public static class TokenizerMapper2
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer2
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
/* int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);*/
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String line;
String arguements[];
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
// calculating the total number of attributes in the file
FileReader infile = new FileReader(args[0]);
BufferedReader bufread = new BufferedReader(infile);
line = bufread.readLine();
arguements = line.split(","); //for spliting fields separated by comma
conf.setInt("argno", arguements.length); // saving that attribute value
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
