Python, calculating time difference - python-2.7

I'm parsing logs generated from multiple sources and joined together into a huge log file in the following format:
My_testNumber: 14, JobType = testx.
ABC 2234
**SR 111**
1483529571 1 1 Wed Jan 4 11:32:51 2017 0 4
datatype someRandomValue
SourceCode.Cpp 588
DBConnection failed
TB 132
**SR 284**
1483529572 0 1 Wed Jan 4 11:32:52 2017 5010400 4
datatype someRandomXX
SourceCode2.cpp 455
DBConnection Success
TB 102
**SR 299**
1483529572 0 1 **Wed Jan 4 11:32:54 2017** 5010400 4
datatype someRandomXX
SourceCode3.cpp 455
ConnectionManager Success
....
(there are dozens of SR Numbers here)
Now I'm looking for a smart way to parse the logs so that it calculates the time differences in seconds for each testNumber and SR number,
like this:
for My_testNumber: 14, it subtracts the SR 111 time from the SR 284 time (the difference would be 1 second here); for SR 284 and SR 299 it is 2 seconds, and so on.

You can parse your posted log file and save the corresponding data accordingly. Then, you can work with the data to get the time differences. The following should be a decent start:
from itertools import combinations
from itertools import permutations  # if order matters
from collections import OrderedDict
from datetime import datetime
import re

sr_numbers = []
dates = []
# Loop through the file and collect the SR numbers and times in two lists
pattern = re.compile(r"(.*)\*{2}(.*)\*{2}(.*)")
for line in open('/Path/to/log/file'):
    if '**' in line:
        # Get the data between the asterisks
        if 'SR' in line:
            sr_numbers.append(re.sub(pattern, "\\2", line.strip()))
        else:
            dates.append(datetime.strptime(re.sub(pattern, "\\2", line.strip()), '%a %b %d %H:%M:%S %Y'))
    else:
        continue
# Use a hashmap container (ordered dictionary) to make it easy to get the time differences
# OrderedDict maintains the order in which the SR numbers appear in the file
log_dict = OrderedDict((k, v) for k, v in zip(sr_numbers, dates))
# Use combinations to get the possible pairings (or permutations if order matters) of time differences
time_differences = {"{} - {}".format(*x): (log_dict[x[1]] - log_dict[x[0]]).seconds for x in combinations(log_dict, 2)}
print(time_differences)
# {'SR 284 - SR 299': 2, 'SR 111 - SR 284': 1, 'SR 111 - SR 299': 3}
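Note that timedelta.seconds only holds the seconds component of the difference, so if two entries can be more than a day apart, total_seconds() is the safer call. A quick illustration:
from datetime import datetime, timedelta

t1 = datetime(2017, 1, 4, 11, 32, 51)
t2 = t1 + timedelta(days=1, seconds=2)
print((t2 - t1).seconds)          # 2 -- the day component is dropped
print((t2 - t1).total_seconds())  # 86402.0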
Edit:
Parsing the file without relying on the asterisks around the dates:
from itertools import combinations
from itertools import permutations  # if order matters
from collections import OrderedDict
from datetime import datetime
import re

sr_numbers = []
dates = []
# Loop through the file and collect the SR numbers and times in two lists
pattern = re.compile(r"(.*)\*{2}(.*)\*{2}(.*)")
for line in open('/Path/to/log/file'):
    if 'SR' in line:
        current_sr_number = re.sub(pattern, "\\2", line.strip())
        sr_numbers.append(current_sr_number)
    elif line.strip().count(":") > 1:
        try:
            dates.append(datetime.strptime(re.split("\s{3,}", line)[2].strip("*"), '%a %b %d %H:%M:%S %Y'))
        except IndexError:
            # print(re.split("\s{3,}", line))
            dates.append(datetime.strptime(re.split("\t+", line)[2].strip("*"), '%a %b %d %H:%M:%S %Y'))
    else:
        continue
# Use a hashmap container (ordered dictionary) to make it easy to get the time differences
# OrderedDict maintains the order in which the SR numbers appear in the file
log_dict = OrderedDict((k, v) for k, v in zip(sr_numbers, dates))
# Use combinations to get the possible pairings (or permutations if order matters) of time differences
time_differences = {"{} - {}".format(*x): (log_dict[x[1]] - log_dict[x[0]]).seconds for x in combinations(log_dict, 2)}
print(time_differences)
# {'SR 284 - SR 299': 2, 'SR 111 - SR 284': 1, 'SR 111 - SR 299': 3}
I hope this proves useful.
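As a follow-up note: if only the differences between consecutive SR entries are needed (rather than every pair), you can zip the ordered dictionary's items against themselves shifted by one. A small sketch building on the log_dict from the snippets above:
items = list(log_dict.items())
consecutive_diffs = {
    "{} - {}".format(prev[0], curr[0]): (curr[1] - prev[1]).total_seconds()
    for prev, curr in zip(items, items[1:])
}
print(consecutive_diffs)
# e.g. {'SR 111 - SR 284': 1.0, 'SR 284 - SR 299': 2.0}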

Keras loss is nan when inputting data from CSV with numpy

I'm trying to use TensorFlow's Boston housing price example to learn how to use TensorFlow/Keras for regression, but I keep running into a problem using my own data, even when I keep my changes as small as possible. After giving up on writing everything myself, I simply changed the two lines of the code that input the data:
boston_housing = keras.datasets.boston_housing
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()
to something that, after looking online, should also create numpy arrays from my csv:
np_array = genfromtxt('trainingdata.csv', delimiter=',')
np_array = np.delete(np_array, (0), axis=0)  # Remove header
test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:]  # Separate out test and train data
train_labels = tr_np_array[:, 20]  # Get the last column for the labels
test_labels = test_np_array[:, 20]
train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1)  # Remove the last column so the data is only the features
Everything I can look at seems right – the shapes of the arrays are all correct, the arrays do seem to be correct-looking numpy arrays, the features do seem to become normalized, etc. and yet when I set verbose to 1 on model.fit(...), the very first lines of output show a problem with loss:
Epoch 1/500
32/2560 [..............................] - ETA: 18s - loss: nan - mean_absolute_error: nan
2016/2560 [======================>.......] - ETA: 0s - loss: nan - mean_absolute_error: nan
2560/2560 [==============================] - 0s 133us/step - loss: nan - mean_absolute_error: nan - val_loss: nan - val_mean_absolute_error: nan
I'm especially confused because everywhere else on Stack Overflow where I've seen the "TensorFlow loss is 'NaN'" error, it has generally a) involved a custom loss function, and b) appeared only once the model had trained for a while, not (as here) within the first 52 passes. Where that's not the case, it's because the data wasn't normalized, but I do that later in the code, and the normalization works for the housing price example and prints out numbers clustered around 0. At this point, my best guess is that it's a problem with the genfromtxt command, but if anyone can see what I'm doing wrong or where I might find my issue, I'd be incredibly appreciative.
Edit:
Here is the full code for the program. Commenting out lines 13 through 26 and uncommenting lines 10 and 11 makes the program work perfectly. Commenting out lines 13 and 14 and uncommenting 16 and 17 was my attempt at using pandas, but that led to the same errors.
import tensorflow as tf
from tensorflow import keras
import numpy as np
from numpy import genfromtxt
import pandas as pd

print(tf.__version__)

# boston_housing = keras.datasets.boston_housing  # Line 10
# (train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

np_array = genfromtxt('trainingdata.csv', delimiter=',')  # Line 13
np_array = np.delete(np_array, (0), axis=0)

# df = pd.read_csv('trainingdata.csv')  # Line 16
# np_array = df.get_values()

test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:]
train_labels = tr_np_array[:, 20]
test_labels = test_np_array[:, 20]
train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1)  # Line 26

order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std

labels_mean = train_labels.mean(axis=0)
labels_std = train_labels.std(axis=0)
train_labels = (train_labels - labels_mean) / labels_std
test_labels = (test_labels - labels_mean) / labels_std

def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])
    optimizer = tf.train.RMSPropOptimizer(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])
    return model

model = build_model()
model.summary()

EPOCHS = 500
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1,
                    callbacks=[early_stop])

[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
print("Testing set Mean Abs Error: ${:7.2f}".format(mae * 1000 * labels_std))

I am trying to run Dickey-Fuller test in statsmodels in Python but getting error

I am trying to run the Dickey-Fuller test from statsmodels in Python but am getting an error.
Running Python 2.7 and pandas version 0.19.2. The dataset is from GitHub and was imported as-is.
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    print 'Results of Dickey-Fuller Test:'
    dftest = ts.adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print dfoutput

test_stationarity(tr)
This gives me the following error:
Results of Dickey-Fuller Test:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-10ab4b87e558> in <module>()
----> 1 test_stationarity(tr)
<ipython-input-14-d779e1ed35b3> in test_stationarity(timeseries)
19 #Perform Dickey-Fuller test:
20 print 'Results of Dickey-Fuller Test:'
---> 21 dftest = ts.adfuller(timeseries, autolag='AIC' )
22 #dftest = adfuller(timeseries, autolag='AIC')
23 dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
209
210 xdiff = np.diff(x)
--> 211 xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')
212 nobs = xdall.shape[0] # pylint: disable=E1103
213
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\tsatools.pyc in lagmat(x, maxlag, trim, original)
322 if x.ndim == 1:
323 x = x[:,None]
--> 324 nobs, nvar = x.shape
325 if original in ['ex','sep']:
326 dropidx = nvar
ValueError: too many values to unpack
tr must be a 1-d array-like, as you can see here. I don't know what tr is in your case. Assuming that you defined tr as the DataFrame that contains the time series data, you should do something like this:
tr = tr.iloc[:,0].values
Then adfuller will be able to read the data.
Just change the line to:
dftest = adfuller(timeseries.iloc[:,0].values, autolag='AIC')
It will work. adfuller requires a 1-D array-like; in your case you have a DataFrame, so select the column or edit the line as shown above.
I am assuming that, since you are using the Dickey-Fuller test, you want to keep the time series, i.e. the datetime column, as the index. In order to do that:
tr = tr.set_index('Month')  # I am assuming here the time series column name is Month
ts = tr['othercolumnname']  # Just use the other column name here; it might be count or anything
I hope this helps.
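For completeness, a minimal self-contained sketch of calling adfuller on a 1-D array (the DataFrame and column name here are made up for illustration):
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical stand-in for 'tr': a one-column DataFrame of values
tr = pd.DataFrame({'value': np.random.randn(100).cumsum()})

# adfuller needs a 1-D array-like, so select the column first
dftest = adfuller(tr.iloc[:, 0].values, autolag='AIC')
print('ADF statistic: %s, p-value: %s' % (dftest[0], dftest[1]))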

Duplicate elements in Django Paginate after `order_by` call

I'm using Django 1.7.7.
I'm wondering if anyone has experienced this. This is my query:
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).filter(type__in=[...]).order_by('date')
When I try to then paginate it
p = Paginator(events, 10)
p.count  # Gives 91
event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]
print len(event_ids)       # Still 91
print len(set(event_ids))  # 75
I noticed that if I remove the .order_by, I don't get any duplicates. I then tried just .order_by with Event.objects.all().order_by('date'), which gave no duplicates.
Finally, I tried this:
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date')
p = Paginator(events, 10)
events.count()  # Gives 131
p.count  # Gives 131
event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]
len(event_ids)       # Gives 131
len(set(event_ids))  # Gives 118
... and there are duplicates. Can anyone explain what's going on?
I dug into the Django source (https://github.com/django/django/blob/master/django/core/paginator.py#L46-L55) and it seems to have something to do with how Django slices the object_list.
Any help is appreciated. Thanks.
Edit: distinct() has no effect on the duplicates. There aren't any duplicates in the database, and I don't think the query introduces any ([e for e in events.iterator()] doesn't produce any duplicates). It's just when the Paginator is slicing.
Edit2: Here's a more complete example
In [1]: from django.core.paginator import Paginator
In [2]: from datetime import datetime, timedelta
In [3]: my_date = timezone.now()
In [4]: events = Event.objects.filter(
   ...:     Q(date__gt=my_date) | Q(date__isnull=True)
   ...: ).order_by('date')
In [5]: events.count()
Out[5]: 134
In [6]: p = Paginator(events, 10)
In [7]: p.count
Out[7]: 134
In [8]: event_ids = []
In [9]: for i in xrange(1, p.num_pages + 1):
   ...:     event_ids += [j.id for j in p.page(i)]
In [10]: len(event_ids)
Out[10]: 134
In [11]: len(set(event_ids))
Out[11]: 115
Shot in the dark, but I think I might know what it is. I wasn't able to reproduce it with SQLite, but I could with MySQL. I think that when MySQL sorts on a column where many rows share the same value, it can return those rows in a different order from one query to the next, so the same rows show up again during slicing.
The pagination slicing basically runs a SQL statement of the form:
SELECT ... FROM ... WHERE (date > D OR date IS NULL) ORDER BY date ASC LIMIT X OFFSET Y
But when date is NULL I'm not sure how MySQL sorts it. When I tried two SQL queries, LIMIT 10 and LIMIT 10 OFFSET 10, they returned sets that had some of the same rows, while a single LIMIT 20 produced a unique set.
You can try updating your order_by to include a unique field, e.g. order_by('date', 'id') (with id as a tie-breaker), so the ordering is deterministic; that may fix it.
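For reference, a sketch of what that might look like (assuming the Event model and my_date from the question):
from django.core.paginator import Paginator
from django.db.models import Q

# 'id' is unique, so adding it after 'date' makes the ordering deterministic
# and the LIMIT/OFFSET slices no longer overlap.
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date', 'id')

p = Paginator(events, 10)
event_ids = []
for i in xrange(1, p.num_pages + 1):
    event_ids += [e.id for e in p.page(i)]
print len(event_ids) == len(set(event_ids))  # True: no duplicates across pages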
Try to use .distinct() on your query before passing it to Paginator.

'numpy.int64' error thwarts my automation of querying Google Distance Matrix API. Solutions?

Goal:
To automate obtaining drive durations by querying a list (see the CSV setup below) of zip codes ('Origin_Zip') against addresses ('Destination_BH') using the Google Distance Matrix API, writing the drive time (minutes) into the "time_to_BH" column. I am using pandas to move the data between the CSV and the Google Distance Matrix call. However, I am receiving the following error:
Error:
TypeError: argument of type 'numpy.int64' is not iterable
I am using this GitHub as a blueprint to structure the Google Distance Matrix portion. I am using Python 2.7.
Code:
from google import search
import pandas as pd
from pandas import DataFrame
import googlemaps
from googlemaps import convert
from googlemaps.convert import as_list
import datetime

# stores my API code as 'gmaps'
key = '(my API Key)'
client = googlemaps.Client(key)

# establishes: drive time (in minutes), English, non-metric measurements, trip occurs at 1:00pm PST
def distance_matrix(client, origins, destinations,
                    mode="driving", language="en", avoid=None, units="imperial",
                    departure_time=None, arrival_time=None, transit_mode=None,
                    transit_routing_preference=None):
    # establishes "origins" and "destinations" header format to direct pandas to begin
    params = {
        "origins": 'Origin_Zip',
        "destinations": 'Destination_BH'
    }
    # Reads the strings within the csv's ("drive_ca.csv") rows via the indicated column (usecols=)
    # to automate querying the Google Distance Matrix API
    df = pd.read_csv('C:\Users\Desktop\drive_ca.csv', usecols=['Origin_Zip'])
    # Number indicates outputs to result
    stop = 1
    # Assigns a column name to iterate
    urlcols = ['Destination_BH']
    # First, apply() to call the Google Distance Matrix for each 'row'
    # A list is built for the urls returned by search()
    df[urlcols] = df['Origin_Zip'].apply(lambda Origin_Zip: pd.Series([destinations for destinations in search(Origin_Zip, stop=stop, pause=5.0)][:stop]))
    departure_time = datetime.datetime.fromtimestamp(1428580693)
    if mode:
        # NOTE(broady): the mode parameter is not validated by the Maps API
        # server. Check here to prevent silent failures.
        if mode not in ["driving", "walking", "bicycling", "transit"]:
            raise ValueError("Invalid travel mode.")
        params["mode"] = mode
    if language:
        params["language"] = language
    if avoid:
        if avoid not in ["tolls", "highways", "ferries"]:
            raise ValueError("Invalid route restriction.")
        params["avoid"] = avoid
    if units:
        params["units"] = units
    if departure_time:
        params["departure_time"] = convert.time(departure_time)
    if arrival_time:
        params["arrival_time"] = convert.time(arrival_time)
    if departure_time and arrival_time:
        raise ValueError("Should not specify both departure_time and "
                         "arrival_time.")
    if transit_mode:
        params["transit_mode"] = convert.join_list("|", transit_mode)
    if transit_routing_preference:
        params["transit_routing_preference"] = transit_routing_preference
    print params
    return client._get("/maps/api/distancematrix/json", params)

# prints the corresponding duration to the indicated header row in "drive_ca.csv"
df.to_csv('C:\Users\Desktop\drive_ca.csv', usecols=['Destination_BH'])
Complete Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-1a75d4fe26fb> in <module>()
34 # First, apply() to call the google distance Matrix for each 'row'
35 # A list is built for the urls return by search()
---> 36 df[urlcols] = df['Origin_Zip'].apply(lambda Origin_Zip : pd.Series([destinations for destinations in search(Origin_Zip, stop=stop, pause=5.0)][:stop]))
37
38
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\series.pyc in apply(self, func, convert_dtype, args, **kwds)
2056 values = lib.map_infer(values, lib.Timestamp)
2057
-> 2058 mapped = lib.map_infer(values, f, convert=convert_dtype)
2059 if len(mapped) and isinstance(mapped[0], Series):
2060 from pandas.core.frame import DataFrame
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:56997)()
<ipython-input-4-1a75d4fe26fb> in <lambda>(Origin_Zip)
34 # First, apply() to call the google distance Matrix for each 'row'
35 # A list is built for the urls return by search()
---> 36 df[urlcols] = df['Origin_Zip'].apply(lambda Origin_Zip : pd.Series([destinations for destinations in search(Origin_Zip, stop=stop, pause=5.0)][:stop]))
37
38
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\google.pyc in search(query, tld, lang, num, start, stop, pause, only_standard)
174
175 # Prepare the search string.
--> 176 query = quote_plus(query)
177
178 # Grab the cookie from the home page.
C:\Users\AppData\Local\Continuum\Anaconda2\lib\urllib.pyc in quote_plus(s, safe)
1290 def quote_plus(s, safe=''):
1291 """Quote the query fragment of a URL; replacing ' ' with '+'"""
-> 1292 if ' ' in s:
1293 s = quote(s, safe + ' ')
1294 return s.replace(' ', '+')
TypeError: argument of type 'numpy.int64' is not iterable
CSV setup: (screenshot not reproduced here)
It's possible to diagnose the problem just by looking at the traceback. Working backwards from where the exception was raised:
C:\Users\AppData\Local\Continuum\Anaconda2\lib\urllib.pyc in quote_plus(s, safe)
1290 def quote_plus(s, safe=''):
1291 """Quote the query fragment of a URL; replacing ' ' with '+'"""
-> 1292 if ' ' in s:
1293 s = quote(s, safe + ' ')
1294 return s.replace(' ', '+')
TypeError: argument of type 'numpy.int64' is not iterable
This tells me that s is a numpy.int64 rather than a string. s is the query input to quote_plus(query):
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\google.pyc in search(query, tld, lang, num, start, stop, pause, only_standard)
174
175 # Prepare the search string.
--> 176 query = quote_plus(query)
177
178 # Grab the cookie from the home page.
From looking at the part after "in", which shows where these lines were executed, I can tell that query is the first argument to the google.search() function:
search(query, tld, lang, num, start, stop, pause, only_standard)
Without even looking at the documentation, I can therefore infer from the traceback that search expects its first argument to be a string, but it is currently getting a numpy.int64.
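If you want to confirm this reading of the traceback, the failure is easy to reproduce in isolation (Python 2, matching the question's environment):
import numpy as np
from urllib import quote_plus  # urllib.parse.quote_plus on Python 3

quote_plus('90278')          # fine, returns '90278'
quote_plus(np.int64(90278))  # TypeError: argument of type 'numpy.int64' is not iterable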
The input to google.search() is generated by this nasty-looking lambda function:
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:56997)()
<ipython-input-4-1a75d4fe26fb> in <lambda>(Origin_Zip)
34 # First, apply() to call the google distance Matrix for each 'row'
35 # A list is built for the urls return by search()
---> 36 df[urlcols] = df['Origin_Zip'].apply(lambda Origin_Zip : pd.Series([destinations for destinations in search(Origin_Zip, stop=stop, pause=5.0)][:stop]))
37
38
The relevant part is search(Origin_Zip, stop=stop, pause=5.0). Each Origin_Zip here will be a value taken from the 'Origin_Zip' column of df, which pinpoints the source of the problem: df['Origin_Zip'] should contain strings, but at the moment it contains numpy.int64s.
Based on your screenshot, I'm guessing that since the string values in the CSV file look like '90278', pandas is automatically converting them to integer values. If you convert that column to strings then the problem will probably go away, for example:
df['Origin_Zip'] = df['Origin_Zip'].astype(str)
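Alternatively, you can keep pandas from inferring integers in the first place by forcing the column to be read as strings via read_csv's dtype argument; a sketch using the same file and column as above:
import pandas as pd

# Read the zip-code column as strings so string-only code paths
# (quote_plus, google.search) keep working.
df = pd.read_csv('C:\Users\Desktop\drive_ca.csv',
                 usecols=['Origin_Zip'],
                 dtype={'Origin_Zip': str})
print(df['Origin_Zip'].dtype)  # object, i.e. Python strings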

Unhashable type error with sklearn and importing a CSV

I'm trying to execute the code below and I don't understand what I'm doing wrong. The purpose of the code is to use sklearn's train_test_split function to partition the data into training and testing chunks.
The data (downloadable here) is cost of rent data for various houses/condos, along with each house/condo's properties. Ultimately I'm trying to use predictive modeling to predict rent prices (so rent prices are the target). Here's the code:
import pandas as pd
rentdata = pd.read_csv('6000_clean.csv')
import sklearn as sk
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split
# Trying to make 'a' all rows of the first column and 'b' all rows of columns 2-46, i.e. 'a' will be only the target data (rent prices) and 'b' will be the features
a, b = rentdata[ : ,0], rentdata[ : ,1:46]
What results is the following error:
TypeError Traceback (most recent call last)
<ipython-input-24-789fb8e8c2f6> in <module>()
8 from sklearn.cross_validation import train_test_split
9
---> 10 a, b = rentdata[ : ,0], rentdata[ : ,1:46]
11
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
2001 # get column
2002 if self.columns.is_unique:
-> 2003 return self._get_item_cache(key)
2004
2005 # duplicate columns
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
665 return cache[item]
666 except Exception:
--> 667 values = self._data.get(item)
668 res = self._box_item_values(item, values)
669 cache[item] = res
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
1653 def get(self, item):
1654 if self.items.is_unique:
-> 1655 _, block = self._find_block(item)
1656 return block.get(item)
1657 else:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _find_block(self, item)
1933
1934 def _find_block(self, item):
-> 1935 self._check_have(item)
1936 for i, block in enumerate(self.blocks):
1937 if item in block:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _check_have(self, item)
1939
1940 def _check_have(self, item):
-> 1941 if item not in self.items:
1942 raise KeyError('no item named %s' % com.pprint_thing(item))
1943
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\index.pyc in __contains__(self, key)
317
318 def __contains__(self, key):
--> 319 hash(key)
320 # work around some kind of odd cython bug
321 try:
TypeError: unhashable type
You can download the CSV to get a look at the data here: http://wikisend.com/download/776790/6000_clean.csv
I downloaded your data and modified your problem line to this:
a, b = rentdata.iloc[0], rentdata.iloc[1:46]
iloc selects rows by position; see the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
This now selects the first row and rows 2-46 (remember that slicing is half-open: it includes the beginning of the range but not the end).
Note you can always select the first row using head:
a, b = rentdata.head(1), rentdata.iloc[1:46]
would also work.
In [5]: a
Out[5]:
Monthly $ rent 1150
Location alameda
# of bedrooms 1
# of bathrooms 1
# of square feet NaN
Latitude 37.77054
Longitude -122.2509
Street address 1500-1598 Lincoln Lane
# more rows so trimmed for brevity here
.......
In [9]: b
Out[9]:
# too large to paste here
.....
45 rows × 46 columns
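If the intent (as stated in the question) is the first column as the target and the remaining columns as the features, the same iloc idea applies column-wise; a sketch, assuming the 6000_clean.csv layout from the question:
import pandas as pd
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

rentdata = pd.read_csv('6000_clean.csv')

# All rows of the first column as the target, all rows of the remaining columns as features
a = rentdata.iloc[:, 0]
b = rentdata.iloc[:, 1:46]

b_train, b_test, a_train, a_test = train_test_split(b, a, test_size=0.2, random_state=0)
print(b_train.shape, b_test.shape)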