I am trying to iterate through a cursor over a web API. My code:
let
    FnGetOnePage = (cursor_ as text) => Json.Document(Web.Contents(URLEndPoint, [
        Headers = [#"Content-Type" = "application/json", #"Authorization" = token],
        Content = Text.ToBinary("{""kind"": """ & kind & """, ""limit"": """ & Number.ToText(100) & """, ""cursor"": """ & cursor_ & """}")
    ])),
    GeneratedList = List.Generate(
        () => [res = FnGetOnePage("")],
        each [res][cursor] <> null,
        each [res = FnGetOnePage([res][cursor])]
    )
in
    GeneratedList
This gives me the same output, the response for the first cursor, repeated for every item in the list.
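For comparison, the same cursor-following pattern written as a plain Python loop (a minimal sketch; kind, limit, and cursor mirror the request body above, while the endpoint, token, and response field names are assumptions):

import requests

# Hypothetical endpoint, token, and kind; the payload mirrors the M code above.
URL_END_POINT = "https://example.com/api/items"
TOKEN = "Bearer <token>"
KIND = "example-kind"

def fetch_all_pages():
    """Follow the cursor until the API stops returning one."""
    results, cursor = [], ""
    while True:
        resp = requests.post(
            URL_END_POINT,
            headers={"Content-Type": "application/json", "Authorization": TOKEN},
            json={"kind": KIND, "limit": "100", "cursor": cursor},
        )
        page = resp.json()
        results.append(page)
        cursor = page.get("cursor")   # assumed: the response carries the next cursor
        if not cursor:                # stop when no further cursor is returned
            break
    return results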
I have a Spark DataFrame like the one below and want to divide its first row by the next row using Scala only, not Spark functions. I am running into some issues and looking for suggestions.
+------+
|number|
+------+
|24 |
|12 |
+------+
The output here should be 2, since 24 / 12 = 2.
Below is the code I am trying:
val spark = SparkSession
.builder()
.appName("Spark Test")
.getOrCreate()
val testDF1 = spark.sql("""select string("24") as number""")
val testDF2 = spark.sql("""select string("12") as number""")
val testDF = testDF1.union(testDF2)
testDF.show(false)
import org.apache.spark.sql.functions._
def divideRows = udf((columnValue: String) => {
var result = ""
val dfAslist = testDF.collectAsList()
result = dfAslist.get(0)/dfAslist.get(1)
result
})
//Applying UDF Method on DataFrame Column
val finalDividedDF = testDF.withColumn("dividedValue", divideRows(testDF("columnValue")))
Output value of this should be 2. Please help.
Although this task can be solved with a simple Window operation, if you need to do this custom aggregation without using Spark SQL functions you must use a Spark UDAF. The problem here is that "in a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into". You can use your custom aggregation, extract the result into an array, and then explode that column.
Your UDAF code could be something like:
import scala.collection.mutable

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class GenericOperation(func: (Double, Double) => Double)
  extends UserDefinedAggregateFunction {

  // Pair every element with its successor: List(a, b, c) => List((a, b), (b, c))
  private def combinations2[A](input: List[A]): List[(A, A)] =
    input match {
      case h :: t if !t.isEmpty => (h, t.head) :: combinations2(t)
      case _ => Nil
    }

  /** Internal buffer to apply transformations **/
  private val bufferType = ArrayType(DoubleType)

  override def inputSchema: StructType =
    StructType(List(StructField("value", DoubleType)))

  override def bufferSchema: StructType =
    StructType(StructField("internal_buffer", bufferType) :: Nil)

  override def dataType: DataType = ArrayType(DoubleType)

  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = Array[Double]()
  }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getAs[mutable.WrappedArray[Double]](0).+:(input.getAs[Double](0))
  }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getAs[mutable.WrappedArray[Double]](0) ++ buffer2.getAs[mutable.WrappedArray[Double]](0)
  }

  override def evaluate(buffer: Row): Any = {
    val sequence = buffer.getAs[mutable.WrappedArray[Double]](0)
    val c = combinations2(sequence.toList)
    val combination = c.map(el => func(el._1, el._2)).toArray
    combination
  }
}
One example application:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}

implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

val data = Seq(Row(1, 24.0), Row(2, 12.0), Row(3, 48.0), Row(4, 56.0))
val df = spark
  .createDataFrame(spark.sparkContext.parallelize(data),
    StructType(List(StructField("order", IntegerType), StructField("value", DoubleType))))

def main(args: Array[String]): Unit = {
  import spark.implicits._
  val aggregation = new GenericOperation((a, b) => a / b)
  df.agg(aggregation('value))
    .flatMap(el => el.getAs[mutable.WrappedArray[Double]](0)).show()
}
the output is:
+------------------+
| value|
+------------------+
| 2.0|
| 0.25|
|0.8571428571428571|
+------------------+
I didn't check this with a structured streaming source. I think it will work, but setting aside that this is a slightly odd implementation (it should be done with a window, if possible), problems may arise: what happens between mini-batches? You can try this option and tell me what problems you find.
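For reference, the window-based alternative mentioned above could look roughly like this in PySpark (a minimal sketch, assuming an "order" column defines the row order as in the example data above; other names are illustrative):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Same example data as above: an explicit "order" column defines row order.
df = spark.createDataFrame(
    [(1, 24.0), (2, 12.0), (3, 48.0), (4, 56.0)],
    ["order", "value"],
)

# lead() looks one row ahead within the window, so value / lead(value)
# divides each row by the next one.
w = Window.orderBy("order")
df.withColumn("dividedValue", F.col("value") / F.lead("value").over(w)).show()
# 24/12 = 2.0, 12/48 = 0.25, 48/56 ~= 0.857; the last row has no successor, so null.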
The script below is what I have to compare Test1 vs Test2. The Test1 and Test2 data is shown at the bottom. I tried to make it generic so that it will also work for different devices.
import re

data_cleaned = {}
current_key = ''
action_flag = False
data_group = []
if_found_vlan = True
output = open('./output.txt','r').read()
switch_red = re.findall(r'(\w*-RED\d{0,1})', output)[0]
switch_blue = re.findall(r'(\w*-BLUE\d{0,1})', output)[0]
for line in open('./output.txt'):
    m = re.match(r'(\w*-RED\d{0,1}|\w*-BLUE\d{0,1})# sh run vlan \d+', line)
    if m:
        if not if_found_vlan:
            data_cleaned[current_key].append([])
        if_found_vlan = False
        current_key = m.group(1)
        if not data_cleaned.has_key(current_key):
            data_cleaned[current_key] = []
        continue
    mm = re.match(r'vlan \d+', line)
    if mm:
        if_found_vlan = True
        action_flag = True
        data_group = []
    if action_flag and '' == line.strip():
        action_flag = False
        data_cleaned[current_key].append(data_group)
    if action_flag:
        data_group.append(line.replace('\r', '').replace('\n', ''))
if not if_found_vlan:
    data_cleaned[current_key].append([])
#print ("+++++++++++++++++ The missing configuration ++++++++++++++\n")
print switch_blue + "#" + " has below missing VLAN config\n "
p = [item for index, item in enumerate(data_cleaned[switch_blue]) if [] != [it for it in item if it not in data_cleaned[switch_red][index]]]
print('\n'.join(['\n'.join(item) for item in p]))
print ("+++++++++++++++++++++++++++++++\n")
print switch_red + "#" + " has below missing VLAN config\n "
q = [item for index, item in enumerate(data_cleaned[switch_red]) if [] != [it for it in item if it not in data_cleaned[switch_blue][index]]]
print('\n'.join(['\n'.join(item) for item in q]))
Updated for your raw output: I think a one-dimensional list is not a good way to represent your output for further handling.
When we handle data, we first need to clean it and set up a model that is easy for the rest of the program to process, so I use a dict with a two-dimensional list inside it to model your output, which makes the later processing easier.
import re

data_cleaned = {}
current_key = ''
action_flag = False
data_group = []
if_found_vlan = True
for line in open('./output.txt'):
    m = re.match(r'(Test\d+)# sh run vlan \d+', line)
    if m:
        if not if_found_vlan:
            data_cleaned[current_key].append([])
        if_found_vlan = False
        current_key = m.group(1)
        if not data_cleaned.has_key(current_key):
            data_cleaned[current_key] = []
        continue
    mm = re.match(r'vlan \d+', line)
    if mm:
        if_found_vlan = True
        action_flag = True
        data_group = []
    if action_flag and '' == line.strip():
        action_flag = False
        data_cleaned[current_key].append(data_group)
    if action_flag:
        data_group.append(line.replace('\r', '').replace('\n', ''))
if not if_found_vlan:
    data_cleaned[current_key].append([])
print ("+++++++++++++++++ The missing configuration is++++++++++++++\n")
p = [item for index, item in enumerate(data_cleaned['Test2']) if [] != [it for it in item if it not in data_cleaned['Test1'][index]]]
print('\n'.join(['\n'.join(item) for item in p]))
print ("+++++++++++++++++ The missing configuration is++++++++++++++\n")
q = [item for index, item in enumerate(data_cleaned['Test1']) if [] != [it for it in item if it not in data_cleaned['Test2'][index]]]
print('\n'.join(['\n'.join(item) for item in q]))
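For illustration, the resulting data_cleaned model looks roughly like this (the VLAN lines here are hypothetical; the real contents depend on your output.txt):

# Hypothetical shape of data_cleaned after parsing: one key per device, and for
# each device a list of VLAN blocks, each block being a list of config lines.
data_cleaned = {
    'Test1': [
        ['vlan 10', '  name USERS'],
        ['vlan 20', '  name VOICE'],
    ],
    'Test2': [
        ['vlan 10', '  name USERS'],
        [],  # empty block kept so the indexes of the two devices still line up
    ],
}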
I am working with the ExactTarget FUEL SDK to retrieve data from the Salesforce Marketing Cloud. More specifically, I am calling "Unsub Events" (https://github.com/salesforce-marketingcloud/FuelSDK-Python/blob/master/objsamples/sample_unsubevent.py#L15), but the structure of the SOAP response has a couple of deeply nested dictionary objects, which I need to iterate over and place into dataframes. Here is what the response looks like; I need to place each of the variables into a separate dataframe.
(UnsubEvent){
Client =
(ClientID){
ID = 11111111
}
PartnerKey = None
CreatedDate = 2016-07-13 13:37:46.000663
ModifiedDate = 2016-07-13 13:37:46.000663
ID = 11111111
ObjectID = "11111111"
SendID = 11111111
SubscriberKey = "aaa#aaa.com"
EventDate = 2016-07-13 13:37:46.000663
EventType = "Unsubscribe"
TriggeredSendDefinitionObjectID = None
BatchID = 1
List =
(List){
PartnerKey = None
ID = 11111111
ObjectID = None
Type = "aaaa"
ListClassification = "aaa"
}
IsMasterUnsubscribed = False
}]
I have successfully placed all variables in dataframes except one, "ListClassification". I am getting the error "List instance has no attribute 'ListClassification'". My question is: why is this happening if I can see the attribute in the response, and is there a fix for the issue?
My Code:
import ET_Client
import pandas as pd

try:
    debug = False
    stubObj = ET_Client.ET_Client(False, debug)

    print '>>>UnsubEvents'
    getUnsubEvent = ET_Client.ET_UnsubEvent()
    getUnsubEvent.auth_stub = stubObj
    getResponse3 = getUnsubEvent.get()
    ResponseResultsUnsubEvent = getResponse3.results
    #print ResponseResultsUnsubEvent

    ClientIDUnsubEvents = []
    partner_keys3 = []
    created_dates3 = []
    modified_date3 = []
    ID3 = []
    ObjectID3 = []
    SendID3 = []
    SubscriberKey3 = []
    EventDate3 = []
    EventType3 = []
    TriggeredSendDefinitionObjectID3 = []
    BatchID3 = []
    IsMasterUnsubscribed = []
    ListPartnerKey = []
    ListID = []
    ListObjectID = []
    ListType = []
    ListClassification = []

    for UnsubEvent in ResponseResultsUnsubEvent:
        ClientIDUnsubEvents.append(str(UnsubEvent['Client']['ID']))
        partner_keys3.append(UnsubEvent['PartnerKey'])
        created_dates3.append(UnsubEvent['CreatedDate'])
        modified_date3.append(UnsubEvent['ModifiedDate'])
        ID3.append(UnsubEvent['ID'])
        ObjectID3.append(UnsubEvent['ObjectID'])
        SendID3.append(UnsubEvent['SendID'])
        SubscriberKey3.append(UnsubEvent['SubscriberKey'])
        EventDate3.append(UnsubEvent['EventDate'])
        EventType3.append(UnsubEvent['EventType'])
        TriggeredSendDefinitionObjectID3.append(UnsubEvent['TriggeredSendDefinitionObjectID'])
        BatchID3.append(UnsubEvent['BatchID'])
        IsMasterUnsubscribed.append(UnsubEvent['IsMasterUnsubscribed'])
        ListPartnerKey.append(str(UnsubEvent['List']['PartnerKey']))
        ListID.append(str(UnsubEvent['List']['ID']))
        ListObjectID.append(str(UnsubEvent['List']['ObjectID']))
        ListType.append(str(UnsubEvent['List']['Type']))
        ListClassification.append(str(UnsubEvent['List']['ListClassification']))

    df3 = pd.DataFrame({'ListPartnerKey':ListPartnerKey,'ListID':ListID,'ListObjectID':ListObjectID,'ListType':ListType,
                        'ClientID':ClientIDUnsubEvents,'PartnerKey':partner_keys3,'CreatedDate':created_dates3,
                        'ModifiedDate':modified_date3,'ID':ID3,'ObjectID':ObjectID3,'SendID':SendID3,'SubscriberKey':SubscriberKey3,
                        'EventDate':EventDate3,'EventType':EventType3,'TriggeredSendDefinitionObjectID':TriggeredSendDefinitionObjectID3,
                        'BatchID':BatchID3,'ListClassification':ListClassification,'IsMasterUnsubscribed':IsMasterUnsubscribed})
    print df3
except Exception as e:
    # Surface any API or parsing error instead of failing silently.
    print 'Caught exception:', str(e)
Literally every other attribute goes into the dataframe, but I am not sure why "ListClassification" is not being picked up.
Thank you in advance for your help!
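While debugging, one hedged sketch is to guard that single field with getattr, under the assumption that the suds List object simply lacks the ListClassification attribute for some records (the variable names come from the code above):

# Hedged sketch: fall back to None when a record's List element has no
# ListClassification attribute, instead of raising AttributeError.
for UnsubEvent in ResponseResultsUnsubEvent:
    list_obj = UnsubEvent['List']
    ListClassification.append(str(getattr(list_obj, 'ListClassification', None)))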
I am using the below code snippet to get the reduced (red) price from Amazon, but somehow I always get the old price and not the reduced one.
def getSignedUrlAmazon(searchvalue):
    params = {'ResponseGroup':'Medium',
              'AssociateTag':'',
              'Operation':'ItemSearch',
              'SearchIndex':'All',
              'Keywords':searchvalue}
    action = 'GET'
    server = "webservices.amazon.in"
    path = "/onca/xml"
    params['Version'] = '2011-08-01'
    params['AWSAccessKeyId'] = AWS_ACCESS_KEY_ID
    params['Service'] = 'AWSECommerceService'
    params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    # Now sort by keys and make the param string
    key_values = [(urllib.quote(k), urllib.quote(v)) for k,v in params.items()]
    key_values.sort()
    # Combine key value pairs into a string.
    paramstring = '&'.join(['%s=%s' % (k, v) for k, v in key_values])
    urlstring = "http://" + server + path + "?" + \
        ('&'.join(['%s=%s' % (k, v) for k, v in key_values]))
    # Add the method and path (always the same, how RESTy!) and get it ready to sign
    hmac.update(action + "\n" + server + "\n" + path + "\n" + paramstring)
    # Sign it up and make the url string
    urlstring = urlstring + "&Signature=" + \
        urllib.quote(base64.encodestring(hmac.digest()).strip())
    return urlstring
I forgot the price-grabbing part:
def getDataAmazon(searchvalue, PRICE, lock):
    try:
        searchvalue = removeStopWord(searchvalue)
        url = getSignedUrlAmazon(searchvalue)
        data = etree.parse(url)
        root = objectify.fromstring(etree.tostring(data, pretty_print=True))
        gd = []
        gd1 = []
        counter = 0
        if hasattr(root, 'Items') and hasattr(root.Items, 'Item'):
            for item in root.Items.Item:
                cd = {}
                priceCheckFlag = False
                try:
                    if hasattr(item.ItemAttributes, 'EAN'):
                        cd['EAN'] = str(item.ItemAttributes.EAN)
                    elif hasattr(item, 'ASIN'):
                        cd['ASIN'] = str(item.ASIN)
                    cd['productName'] = str(item.ItemAttributes.Title)
                    if hasattr(item, 'SmallImage'):
                        cd['imgLink'] = str(item.SmallImage.URL)
                    elif hasattr(item, 'MediumImage'):
                        cd['imgLink'] = str(item.MediumImage.URL)
                    cd['numstar'] = "0"
                    cd['productdiv'] = '1'
                    cd['sellername'] = 'Amazon'
                    if hasattr(item, 'DetailPageURL'):
                        cd['visitstore'] = str(item.DetailPageURL)
                    if hasattr(item.ItemAttributes, 'ListPrice') and hasattr(item.ItemAttributes.ListPrice, 'Amount'):
                        cd['price'] = item.ItemAttributes.ListPrice.Amount/100.0
                    elif hasattr(item, 'OfferSummary') and hasattr(item.OfferSummary, 'LowestNewPrice'):
                        cd['price'] = item.OfferSummary.LowestNewPrice.Amount/100.0
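Note that the snippet above prefers ItemAttributes.ListPrice, which is the regular (old) price; the discounted price, where the API returns one, typically lives under the offer data instead. A hedged sketch of reordering that check (assuming the OfferSummary/Offers elements are present in the response group you request; element names follow the Product Advertising API XML):

# Hedged sketch: prefer the offer (discounted) price and fall back to ListPrice.
# 'item' is one <Item> element parsed with lxml.objectify, as in the code above.
def extract_price(item):
    # Lowest current new-offer price, if present (usually the reduced price).
    if hasattr(item, 'OfferSummary') and hasattr(item.OfferSummary, 'LowestNewPrice'):
        return item.OfferSummary.LowestNewPrice.Amount / 100.0
    # Price of the returned offer listing, if the Offers response group was requested.
    if hasattr(item, 'Offers') and hasattr(item.Offers, 'Offer'):
        offer = item.Offers.Offer
        if hasattr(offer, 'OfferListing') and hasattr(offer.OfferListing, 'Price'):
            return offer.OfferListing.Price.Amount / 100.0
    # Fall back to the list (non-discounted) price.
    if hasattr(item.ItemAttributes, 'ListPrice') and hasattr(item.ItemAttributes.ListPrice, 'Amount'):
        return item.ItemAttributes.ListPrice.Amount / 100.0
    return None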
Does anyone know how to get a Django filter to build an OR statement? I'm not sure if I have to use the Q object or not; I thought I needed some type of OR pipe, but this doesn't seem right:
filter_mfr_method = request.GET.getlist('filter_mfr_method')
for m in filter_mfr_method:
    designs = designs.filter(Q(mfr_method = m) | m if m else '')
    # Or should I do it this way?
    #designs = designs.filter(mfr_method = m | m if m else '')
I would like this to be:
SELECT * FROM table WHERE mfr_method = 1 OR mfr_method = 2 OR mfr_method = 3
EDIT: Here is what Worked
import operator

from django.db.models import Q

filter_mfr_method = request.GET.getlist('filter_mfr_method')
q_list = []  # renamed to avoid shadowing the built-in list
for m in filter_mfr_method:
    q_list.append(Q(mfr_method=m))
designs = designs.filter(reduce(operator.or_, q_list))
What about:
import operator

from django.db.models import Q

filter_mfr_method = request.GET.getlist('filter_mfr_method')
# Build one Q per value and OR them together; Q() is the identity for an empty list.
filter_params = reduce(operator.or_, (Q(mfr_method=m) for m in filter_mfr_method), Q())
designs = designs.filter(filter_params)
Something I used before:
qry = None
for val in request.GET.getlist('filter_mfr_method'):
    v = {'mfr_method': val}
    q = Q(**v)
    if qry:
        qry = qry | q
    else:
        qry = q
designs = designs.filter(qry)
That is taken from one of my query builders.