Is there an open source provider of market data?

Basically I need test data for a trading engine, so I don't need real-time data, just something that gives me a reasonable frequency of intraday prices.

I've just uploaded the file (76 MB): EUR/USD data in a MySQL database. It contains 5 time intervals (1, 5, 15, 30 and 60 minutes).
These are MyISAM tables, so:
create a database,
stop the MySQL server,
copy the files into that database's folder,
start the server again.
It's hosted on a Polish free file-hosting service, so if the website won't translate into your language, just click the blue "Pobierz" button (bottom-left) to start the download.
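Once the tables are restored, a quick sanity check in Python could look like the sketch below. This is only a minimal illustration: the database, table and column names (eurusd, eurusd_1min, bar_time) are assumptions, since the post doesn't document the dump's actual schema.

import pandas as pd
import pymysql  # any MySQL driver would do

# Assumed connection details and schema; adjust to match the restored dump.
conn = pymysql.connect(host="localhost", user="root", password="", database="eurusd")
bars = pd.read_sql("SELECT * FROM eurusd_1min ORDER BY bar_time LIMIT 1000", conn)
print(bars.head())
conn.close()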

Related

Can Power BI dynamically load files?

I think I'm on a severe wishful-thinking trip here; please confirm if I am. If not, what would be the process to accomplish this?
Every day a team compiles an Excel file and sends it via email attachment. I have a Power Automate Flow that saves the attachment into a SharePoint space.
I can create a Power BI report and manually connect and load those files, but it seems the real "Power" would be not having to manually connect each new file (which has the creation date in the filename, e.g. 'Daily DRR 3-16-22.xlsx') every day, ergo:
What steps (using the Power Platform) would I take to have my Power BI report automatically (dynamically) refresh using the last 5 files (days) in the SharePoint drive? Is that possible?
Thanks in advance!
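No answer is recorded in this excerpt, but in Power BI this is usually handled with the SharePoint folder connector plus a sort-and-keep-top-N step in Power Query. Purely as an illustration of the "last 5 files by date in the filename" selection logic, here is a rough Python sketch; the folder path is hypothetical and the filename pattern from the question ('Daily DRR 3-16-22.xlsx') is assumed to be consistent.

import os
import re
from datetime import datetime

FOLDER = "daily_drr"  # hypothetical local copy of the SharePoint folder
PATTERN = re.compile(r"Daily DRR (\d{1,2}-\d{1,2}-\d{2})\.xlsx")

def file_date(name):
    # Pull the report date out of a filename like "Daily DRR 3-16-22.xlsx".
    match = PATTERN.match(name)
    return datetime.strptime(match.group(1), "%m-%d-%y") if match else None

dated = [(file_date(name), name) for name in os.listdir(FOLDER)]
dated = [pair for pair in dated if pair[0] is not None]
last_five = [name for _, name in sorted(dated)[-5:]]  # the five most recent days
print(last_five)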

Data Source Credentials and Scheduled Refresh greyed out in Power BI Service

I have a problem:
I have a PBI file containing three data sources: 2 SQL Server sources + 1 API call.
I have separate queries for each respective data source and an additional query that combines all three queries into a single table.
Both SQL Server sources have been added to a gateway and I can set scheduled refresh for each source, if I publish them in separate PBI files.
However, I cannot set scheduled refresh for the file that contains all three sources - both the data source credentials and scheduled refresh options are greyed out.
The manage gateway section of the settings page also shows no gateway options. If I publish the SQL Server data (with no API data) I can clearly see my data source and gateway under the gateway heading.
[screenshot of dataset settings]
Does anyone have any idea why this might be happening?
Thank you,
I had the same problem.
I have a PBI file with different data sources: SQL Server sources and APIs.
In the Power BI Service the Data Source Credentials option was greyed out, so here's what I did:
Downloaded the file
Refreshed the file locally and signed in to all the data sources (the DB server name had changed, but not for the APIs)
Published it back to the PBI Service
It worked for me.
Same problem here. After additional poking around I learned that the "Web Source" (API call) was the reason for the inability to refresh and can cause "Data Source Credentials" to be inaccessible. This was annoying to learn after diving down several rabbit holes.
Several (weak) workarounds:
Use Excel's Power Query to connect to the web source.
Make any needed transformations.
Put the Excel file in a SharePoint Online folder or another PBI-accessible location.
Connect to the Excel file using the appropriate data source (e.g. SharePoint Folder).
Alternatively, if the data is static, you can directly copy/paste values into PBI (if you just need to get this done and move on with your life):
Copy target values
Open Power Query Editor
Home tab -> Enter Data
Paste values into table
Hopefully this will save some poor soul a little of their life.

Timeout value in Power BI service

I have a dataset that I have not been able to refresh in the Power BI service for 3 days, even though on Desktop it normally refreshes in 30 minutes. The dataset is fed from a SQL Server database via a data gateway, and the gateway is up to date. Incremental refresh is enabled, and the dataset retrieves only the last 3 days of data on each refresh.
Here is the generated error message:
Data source error: The XML for Analysis request timed out before it was completed. Timeout value: 17998 sec.
Cluster URI: WABI-WEST-EUROPE-B-PRIMARY-redirect.analysis.windows.net
Activity ID: 680ec1d7-edea-4d7c-b87e-859ad2dee192
Application ID: fcde3b0f-874b-9321-6ee4-e506b0782dac
Time: 2020-12-24 19:03:30Z
What is the solution to this problem, please?
Thank you
What license are you using?
Without Premium capacity, the maximum dataset size is 1 GB. Maybe your dataset has crossed this mark? If you are using shared capacity, you can check the workspace's used storage by clicking the ellipsis in the top-right corner and then clicking Storage to see how much that workspace is using.
Also note that in shared capacity there is a 10 GB uncompressed dataset limit at the gateway (but this should not be an issue, since you only refresh 3 days of data).
Also check whether your Power Query query folds: on the final step you should be able to see the 'View Native Query' option. If it doesn't fold, incremental refresh does not work and ends up querying the entire data source.
Also, note that 17998 seconds is about 5 hours. What is your internet speed?

How to maximize DB upload rate with Azure Cosmos DB

Here is my problem. I am trying to upload a large CSV file (~14 GB) to Cosmos DB, but I am finding it difficult to maximize the throughput I am paying for. The metrics overview in the Azure portal says I am using 73 RU/s while I am paying for 16,600 RU/s. Right now I am using pymongo's bulk write function to upload to the DB, but I find that any bulk_write batch longer than 5 throws a hard "Request rate is large" exception. Am I doing this wrong? Is there a more efficient way to upload data in this scenario? Internet bandwidth is probably not a problem, because I am uploading from an Azure VM to Cosmos DB.
Structure of how I am uploading in Python now:
import csv
import pymongo

operations = []
for row in csv.reader(csv_file):  # csv_file is the open file handle for the CSV
    row[id_index_1] = convert_id_to_useful_id(row[id_index_1])
    find_criteria = {
        # find query
    }
    upsert_dict = {
        # update document for the row, e.g. {"$set": {...}}
    }
    operations.append(pymongo.UpdateOne(find_criteria, upsert_dict, upsert=True))
    if len(operations) > 5:
        results = collection.bulk_write(operations)
        operations = []
# flush whatever is left over after the loop
if operations:
    collection.bulk_write(operations)
Any suggestions would be greatly appreciated.
Aaron, yes, as you said in the comment, the Data Migration tool is not supported for the Azure Cosmos DB MongoDB API. You can find the statement below in the official doc.
The Data Migration tool does not currently support Azure Cosmos DB MongoDB API either as a source or as a target. If you want to migrate the data in or out of MongoDB API collections in Azure Cosmos DB, refer to Azure Cosmos DB: How to migrate data for the MongoDB API for instructions. You can still use the Data Migration tool to export data from MongoDB to Azure Cosmos DB SQL API collections for use with the SQL API.
As a workaround, you could use Azure Data Factory. Please refer to this doc to configure Cosmos DB as the sink, and to this doc to configure the CSV file in Azure Blob Storage as the source. In the pipeline, you can configure the batch size.
Of course, you could also do this programmatically. You didn't miss anything: the error "Request rate is large" just means you have exceeded the provisioned RU quota. You could raise the RU setting; please refer to this doc.
If you have any concerns, please feel free to let me know.
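To flesh out the "Request rate is large" point above, here is a minimal retry-with-backoff sketch for the poster's bulk upserts. It assumes the throttling surfaces as a pymongo BulkWriteError or OperationFailure and simply re-sends the whole batch; because the operations are upserts, the retry is idempotent. A production version would inspect the error code rather than treating every failure as throttling.

import time
from pymongo.errors import BulkWriteError, OperationFailure

def bulk_write_with_backoff(collection, operations, max_retries=8):
    # Retry the batch with exponential backoff when Cosmos DB throttles it.
    delay = 0.5
    for _ in range(max_retries):
        try:
            return collection.bulk_write(operations, ordered=False)
        except (BulkWriteError, OperationFailure):
            time.sleep(delay)  # assumed to be throttling; back off and retry
            delay = min(delay * 2, 30)
    raise RuntimeError("bulk_write still failing after retries")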
I'd take a look at the Cosmos DB Data Migration Tool. I haven't used it with the MongoDB API, but it is supported. I have used it to move lots of documents from my local machine to Azure with great success, and it will utilize the RU/s that are available.
If you need to do this programmatically, I suggest taking a look at the underlying source code for the Data Migration Tool, which is open source. You can find the code here.
I was able to improve the upload speed. I noticed that each physical partition has its own throughput limit (and, oddly, the number of physical partitions times the per-partition throughput still does not add up to the total throughput of the collection), so I split the data by partition and created a separate upload process for each partition key. This increased my upload speed by roughly a factor of the number of physical partitions.
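A rough sketch of that per-partition-key parallel upload, for anyone wanting to reproduce it. The partition-key column index and the upload_rows body are hypothetical placeholders standing in for the poster's own grouping and bulk-upsert logic.

import csv
from collections import defaultdict
from multiprocessing import Pool

PARTITION_KEY_INDEX = 0  # assumption: the partition key is the first CSV column

def upload_rows(rows):
    # In practice this would open its own MongoClient and bulk-upsert `rows`;
    # a print keeps the sketch self-contained.
    print(f"would upload {len(rows)} rows")

def main(path):
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            groups[row[PARTITION_KEY_INDEX]].append(row)
    # One worker per partition key, mirroring "a separate upload process per partition".
    with Pool(processes=len(groups)) as pool:
        pool.map(upload_rows, list(groups.values()))

if __name__ == "__main__":
    main("data.csv")  # hypothetical path to the CSV being uploaded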
I have used the Cosmos DB Data Migration Tool, which is great for sending data to Cosmos DB without much configuration. I assume it can also handle CSV files as large as 14 GB.
Below are the figures for the data we transferred:
[10000 records transferred | throughput 4000 | 500 parallel requests | 25 seconds]
[10000 records transferred | throughput 4000 | 100 parallel requests | 90 seconds]
[10000 records transferred | throughput 350 | 10 parallel requests | 300 seconds]

Framework selection for a new project?

Problem Context
We have a set of Excel reports which are generated from an Excel input provided by the user and then fed into SAS for further transformation. SAS pulls data from a Teradata database, and there is a lot of manipulation of the input data and of the data pulled from Teradata. Finally, a dataset is generated which can either be sent to the client as a report or be used to populate a Tableau dashboard. The database is also being migrated from Teradata to Google Cloud (BigQuery EDW), as the Teradata pulls from SAS used to take almost 6-7 hours.
Problem Statement
Now we need to automate this whole process by creating a front end for the user to upload the input files; from there the process should trigger, and in the end the user should receive the Excel file or Tableau dashboard as an attachment in an email.
Can you suggest what technologies should be used in the front end and middle tier to make this process feasible in the least possible time, with Google Cloud Platform as the backend?
Could an R Shiny front end be a solution, given that we need to communicate with a Google Cloud backend?
People have suggested that Django would be a good framework to accomplish this task. What are your views on this?