Odoo 14 CE: Restrict merging of pickings for same destination and same products - transfer

By default, Odoo 14 CE merges pickings for the same destination and the
same products. Is there a way (configuration, setting, or code) to
restrict this merging of pickings and products?
Example case:
I have warehouses A and B, and I want to send an internal transfer via
a transit warehouse in one step, with a pull rule from Transit/A to
B/IN/01 {Delivery Operation Receipt}. I create A/INT/01 to Transit/A,
and Odoo creates three transfers as follows:
A/INT/01 to Transit/A >> Ready
Transit/A to B/IN/01 >> Waiting another operation
B/IN/01 to B/Stock >> Waiting another operation
If I follow the operations step by step, everything works fine. But if
I create another transfer from A to B, say:
A/INT/02 to Transit/A >> Ready
Transit/A to B/IN/01 >> Waiting another operation
B/IN/01 to B/Stock >> Waiting another operation
Notice: transfer A/INT/02 is merged into the existing Transit/A picking,
which is also fine, but no second receipt is created; instead all
products are merged into B/IN/01 and Odoo never creates B/IN/02.
What am I doing wrong in the transit configuration, and how can I stop
the merging of products and pickings?
The reason I need to use Transit is that I need steps for the internal
transfer [Pick + Pack + Ship], because products will be scanned with a
barcode reader at every step, which is NOT AVAILABLE in a plain
internal transfer.
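As far as I know there is no stock setting that disables this merging: Odoo reuses any draft/waiting picking that matches the move's procurement group, source and destination locations, and operation type. One possible workaround is to override the method that looks that picking up. Below is a minimal, untested sketch assuming a small custom module; the condition on the internal operation type is purely illustrative and should be narrowed to the transit operations you actually want to keep separate.

from odoo import models


class StockMove(models.Model):
    _inherit = 'stock.move'

    def _search_picking_for_assignation(self):
        # Illustrative condition: never reuse an existing picking for
        # internal operation types, so each transfer chain gets its own
        # picking. Returning an empty recordset makes _assign_picking()
        # create a new picking instead of merging into an existing one.
        if self.picking_type_id.code == 'internal':
            return self.env['stock.picking']
        return super()._search_picking_for_assignation()

It is also worth checking how the procurement group is propagated on your pull rules, since keeping a distinct group per transfer chain normally prevents the chains from being merged into one picking.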

Related

Is it possible to write a single PCollection to different output sinks without using side inputs?

I have a specific use case for writing my pipeline data. I want to create a single Pub/Sub subscription, read from that single source, and write the resulting PCollection to multiple sinks without creating another Pub/Sub subscription. In other words, I want a single Dataflow job with multiple branches working in parallel that write the same pipeline data, first to Google Cloud Storage and second to BigQuery, using just one subscription.
Code or references would be helpful and would shed light on the direction I'm working in.
Thanks in advance!!
You only need multiple sinks in your Beam job to meet your need.
In Beam you can build a PCollection and then sink it to multiple places.
Example with Beam Python:
import apache_beam as beam
from apache_beam import WindowInto
from apache_beam.io import fileio
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.transforms.window import FixedWindows

result_pcollection = (
    inputs
    | 'Read from pub sub' >> ReadFromPubSub(subscription=subscription_path)
    | 'Map 1' >> beam.Map(your_map1)
    | 'Map 2' >> beam.Map(your_map2)
)

# Sink to BigQuery
(result_pcollection
    | 'Map 3' >> beam.Map(apply_transform_logic_bq)
    | 'Write to BQ' >> beam.io.WriteToBigQuery(
        project=project_id,
        dataset=dataset,
        table=table,
        method='YOUR_WRITE_METHOD',
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
)

# Sink to GCS (streaming writes to files require windowing)
(result_pcollection
    | 'Map 4' >> beam.Map(apply_transform_logic_gcs)
    | 'Windowing logic' >> WindowInto(FixedWindows(10 * 60))
    | 'Write to GCS' >> fileio.WriteToFiles(path=known_args.output)
)
To write a streaming flow to GCS, you need to apply windowing and generate a file per window.
Yes, this is definitely possible. In Java, you can do something like this:
PCollection<PubsubMessage> messages = p.apply(PubsubIO.read()...);
// Write messages to GCS
messages.apply(TextIO.write()...);
// Write messages to BQ
messages.apply(BigQueryIO.write()...);
The messages will only be consumed once from pubsub. You can define multiple branches of your pipeline that all read from the same PCollection.
The downside here is really around error handling. If your BigQuery sink has errors that cause the pipeline to fail, it will be taking down your GCS output as well. It's harder to reason about these failure scenarios when you have multiple sinks in one pipeline.
You mention "firstly in Google Cloud Storage and Secondly at Bigquery"; if the order of writes is important (you don't want data showing up in BQ if it isn't also in GCS), that's significantly more difficult to express, and you'd likely be better off creating a second pipeline that reads from the GCS output of the first pipeline and writes to BQ.

Number of items in PCollection is not affecting the allocated number of workers

I have a pipeline that comprises three steps. The first step is a ParDo that accepts 5 URLs in a PCollection, and each of the 5 items generates thousands of URLs and outputs them. So the input of the second step is another PCollection, which can contain 100-400k elements. In the last step the scraped output of each URL is saved to a storage service.
I have noticed that the first step, which generates the URL list from the 5 input URLs, got 5 workers allocated and generated the new set of URLs. But once the first step completed, the number of workers dropped to 1, and the second step runs on only 1 worker (my Dataflow job has been running with 1 worker for the last 2 days, so from the logs I am making the logical assumption that the first step is complete).
So my question is: even though the PCollection is large, why is it not split between workers, and why are more workers not allocated? Step 2 is a simple web scraper that scrapes the given URL and outputs a string, which is then saved to a storage service.
Dataflow tries to connect steps together to create fused steps. So even though you have a few ParDos in your pipeline, they will be fused together and executed as a single step.
Also, once fused, the scaling of Dataflow is limited by the step at the beginning of the fused step.
I suspect you have a Create transform that consists of a few elements at the top of your pipeline. In this case Dataflow can only scale up to the number of elements in this Create transform.
One way to prevent this behavior is to break fusion after one (or more) of your high-fanout ParDo transforms. This can be done by adding a Reshuffle.viaRandomKey() transform after it (which contains a GroupByKey). Given that Reshuffle is an identity transform, your pipeline should not require additional changes.
See here for more information regarding fusion and ways to prevent it.
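If the pipeline is written in Beam Python, the equivalent of Reshuffle.viaRandomKey() is beam.Reshuffle(). A minimal sketch, where seed_urls and the expand_urls, scrape_url and save_to_storage functions are hypothetical stand-ins for your own inputs and code:

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | 'Seed URLs' >> beam.Create(seed_urls)       # only 5 elements
        | 'Expand URLs' >> beam.FlatMap(expand_urls)  # high-fanout step
        | 'Break fusion' >> beam.Reshuffle()          # redistributes elements across workers
        | 'Scrape' >> beam.Map(scrape_url)            # can now scale past 1 worker
        | 'Save' >> beam.Map(save_to_storage)
    )

Without the Reshuffle, the Create, the fanout step and the scraper are fused into one step whose parallelism is bounded by the 5 seed elements.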

Azure Stream Analytics with Event Hub input stream position

Setup
I use Azure Stream Analytics to stream data into an Azure warehouse staging table.
The input source of the job is an Event Hub stream.
I notice that when I update the job, the job's input event backlog goes up massively after the start.
It looks like the job starts processing the complete Event Hub queue again from the beginning.
Questions
How is stream position management organised in Stream Analytics?
Is it possible to define a stream position where the job starts (for example, only events queued after a specific point in time)?
So far done
I noticed a similar question here on StackOverflow, which mentions a variable named "eventStartTime".
But since I use an "asaproj" project within Visual Studio to create, update and deploy the job, I don't know where to set this before deploying.
When you update the job without stopping it, it keeps the previous "Job output start time" setting, so it is possible for the job to start processing the data from the beginning.
You can stop the job first, then choose the "Job output start time" before you start the job again.
You can refer to this document, https://learn.microsoft.com/en-us/azure/stream-analytics/start-job, for detailed information on each mode. For your scenario, the "When last stopped" mode may be the one you need; it will not process data from the beginning of the Event Hub queue.

How to skip slave replication errors on Google Cloud SQL 2nd Gen

I am in the process of migrating a database from an external server to Cloud SQL 2nd gen. I have been following the recommended steps; the 2TB mysqldump process completed and replication started. However, I got an error:
'Error ''Access denied for user ''skip-grants user''#''skip-grants host'' (using password: NO)'' on query. Default database: ''mondovo_db''. Query: ''LOAD DATA INFILE ''/mysql/tmp/SQL_LOAD-0a868f6d-8681-11e9-b5d3-42010a8000a8-6498057-322806.data'' IGNORE INTO TABLE seoi_volume_update_tracker FIELDS TERMINATED BY ''^#^'' ENCLOSED BY '''' ESCAPED BY ''\'' LINES TERMINATED BY ''^|^'' (keyword_search_volume_id)'''
Two questions:
1) I'm guessing the error has come about because Cloud SQL requires LOAD DATA LOCAL INFILE instead of LOAD DATA INFILE? However, I am quite sure that on the master we only run LOAD DATA LOCAL INFILE, so I am not sure how LOCAL gets dropped during replication. Is that possible?
2) I can't stop the slave to skip the error and restart, since SUPER privileges aren't available, so I am not sure how to skip this error and also avoid it in the future while the final sync happens. Suggestions?
There was no way to work around the slave replication error in Google Cloud SQL, so I had to come up with another way.
Since replication wasn't going to work, I had to copy all the databases. However, because the aggregate size of all my DBs was 2TB, it was going to take a long time.
The final strategy that took the least amount of time:
1) Prerequisite: you need at least 1.5x the current database size in free disk space on your SQL drive. My 2TB DB was on a 2.7TB SSD, so I had to temporarily move everything to a 6TB SSD before I could proceed with the steps below. DO NOT proceed without sufficient disk space; you'll waste a lot of your time, as I did.
2) Install cloudsql-import on your server. Without this you can't proceed, and it took me a while to discover that. It facilitates the quick transfer of your SQL dumps to Google.
3) I had multiple databases to migrate, so if you are in a similar situation, pick one at a time and, for the sites that access that DB, prevent any further insertions/updates. I needed to put a "Website under Maintenance" notice on each site while I executed the operations outlined below.
4) Run the commands in the steps below in a separate screen session. I launched a few processes in parallel in different screens.
screen -S DB_NAME_import_process
5) Run mysqldump using the following command. Note that the output is an SQL file, not a compressed file:
mysqldump {DB_NAME} --hex-blob --default-character-set=utf8mb4 --skip-set-charset --skip-triggers --no-autocommit --single-transaction --set-gtid-purged=off > {DB_NAME}.sql
6) (Optional) For my largest DB of around 1.2TB, I also split the DB backup into individual table SQL files using the script mentioned here: https://stackoverflow.com/a/9949414/1396252
7) For each of the dumped files, I converted the INSERT commands into INSERT IGNORE because I didn't want any duplicate-key errors during the import process.
cat {DB_OR_TABLE_NAME}.sql | sed s/"^INSERT"/"INSERT IGNORE"/g > new_{DB_OR_TABLE_NAME}_ignore.sql
8) Create a database with the same name on Google Cloud SQL that you want to import into. Also create a global user that has permission to access all the databases.
9) Now we import the SQL files using the cloudsql-import plugin. If you split the larger DB into individual table files in step 6, use the cat command to combine a batch of them into a single file, and make as many batch files as you see appropriate.
Run the following command:
cloudsql-import --dump={DB_OR_TABLE_NAME}.sql --dsn='{DB_USER_ON_GCLOUD}:{DB_PASSWORD}@tcp({GCLOUD_SQL_PUBLIC_IP}:3306)/{DB_NAME_CREATED_ON_GOOGLE}'
10) While the process is running, you can detach from the screen session using Ctrl+a then Ctrl+d (or refer here) and reconnect to the screen later to check on progress. You can create another screen session and repeat the same steps for each of the DBs/batches of tables you need to import.
Because of the large sizes I had to import, I believe it took me a day or two (I don't remember exactly now since it's been a few months), but I know it was much faster than any other way. I had tried using Google's copy utility to copy the SQL files to Cloud Storage and then using Cloud SQL's built-in visual import tool, but that was slow and not as fast as cloudsql-import. I would recommend this method until Google fixes the ability to skip slave errors.

What is the best way to sequentially activate 2 or more data pipelines on AWS?

I have two distinct pipelines (A and B). When A has terminated, I would like to kick off the second one (B) immediately.
So far, to accomplish that, I have added a ShellCommandActivity with the following command:
aws datapipeline activate-pipeline --pipeline-id <my pipeline id>
Are there other better ways to do that?
You can use a combination of indicator files (zero-byte files) and Lambda to loosely couple the two data pipelines. You need to make the following changes:
Data Pipeline - using a shell command, touch a zero-byte file as the last step of pipeline A in a given S3 path.
Create a Lambda function that watches for the indicator file and activates data pipeline B (a sketch follows below).
Note - this may not be very helpful if you are looking at the simple scenario of just executing two data pipelines sequentially. However, it is useful when you want to create an intricate dependency between pipelines, e.g. you have a set of staging jobs (each corresponding to one pipeline) and you want to trigger your data-mart or derived-table jobs after all the staging jobs are completed.
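A minimal sketch of such a Lambda in Python, assuming it is subscribed to the bucket's object-created notifications; the _SUCCESS indicator suffix and the PIPELINE_B_ID environment variable are hypothetical names you would replace with your own:

import os

import boto3

datapipeline = boto3.client('datapipeline')

# Hypothetical names: adjust the indicator suffix and environment variable.
INDICATOR_SUFFIX = '_SUCCESS'
PIPELINE_B_ID = os.environ['PIPELINE_B_ID']


def lambda_handler(event, context):
    # Invoked by S3 object-created notifications.
    for record in event['Records']:
        key = record['s3']['object']['key']
        if key.endswith(INDICATOR_SUFFIX):
            # Pipeline A has finished: kick off pipeline B.
            datapipeline.activate_pipeline(pipelineId=PIPELINE_B_ID)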