How to find a pattern which is repeated n number of times in a column of a table in Informatica

I have a scenario in which a field of a particular record in my table looks like the below (array format).
The set of id, email and address can be repeated n number of times for each record. So I need to set up a mapping in Informatica which will give me output like the below:
Waiting for a solution, thanks.
I tried the SUBSTR and INSTR functions, but with those I need to know beforehand how many times the email id occurs in a particular record. Since the email can be repeated n number of times for each row, I am not able to find a way to dynamically tell my INSTR function to run n times.
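For illustration only (this is not an Informatica mapping; the sample value and delimiter below are assumptions, since the actual field layout is in the omitted example above), a short Python sketch of the "find every occurrence without knowing n in advance" logic looks like this:

import re

# Hypothetical field value; the real layout is whatever the question's sample shows.
field = "[1, a@x.com, addr1, 2, b@y.com, addr2, 3, c@z.com, addr3]"

# Rather than calling INSTR a fixed number of times, pull out every email at once.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", field)
print(emails)        # ['a@x.com', 'b@y.com', 'c@z.com']
print(len(emails))   # this is the "n" that is not known beforehand

In Informatica itself this kind of open-ended repetition typically ends up in a Java transformation or a similar procedural step, since plain expression-level SUBSTR/INSTR calls need the occurrence count up front.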

Related

Index Match with multiple results greater than zero

I am trying to simplify a table that shows the amount of time that people are working on certain jobs, and I want to present the dataset in a table that only shows the values greater than zero.
The image below shows how the table currently looks, where each person has a % of their time allocated to 1 of 5 jobs across columns.
I am trying to create a table that looks like the below, where it only shows the jobs that each person is working on, and excludes the ones where they have no % of their time allocated.
Wondering if I am going about this in the wrong fashion, any help greatly appreciated!
Thanks
I have been trying to use an INDEX MATCH function with some IF logic for values greater than zero, but have only been able to get the first value greater than zero to populate.
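This isn't a spreadsheet formula, but as an illustration of the reshaping being asked for, here is a small pandas sketch (the people, job names and percentages are made up) that unpivots the table and keeps only the nonzero allocations:

import pandas as pd

# Made-up data in the shape described: one row per person, one column per job.
df = pd.DataFrame({
    "Person": ["Alice", "Bob"],
    "Job1": [0.5, 0.0], "Job2": [0.0, 0.3],
    "Job3": [0.5, 0.0], "Job4": [0.0, 0.7], "Job5": [0.0, 0.0],
})

# Unpivot to (Person, Job, Allocation) and drop the zero rows: the result is
# the "only the jobs each person is working on" table from the question.
tidy = df.melt(id_vars="Person", var_name="Job", value_name="Allocation")
print(tidy[tidy["Allocation"] > 0].sort_values("Person"))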

Importrange + Query + Matches + Regexp

I am trying to filter data from a different sheet by a specific account number pattern. However, the formula below doesn't give any results:
=query(importrange(Setup!B1,"Sheet1!A2:F"),"Select * where Col4 matches '\d\d-\d\d\d\d-[14-7]\d\d\d'",1)
This is supposed to filter out all accounts where the 1st digit in the 3rd group of numbers is either 1, 4, 5, 6 or 7. The account numbers all have the same width, following the format xx-xxxx-xxxx.
Try using this as your match criteria:
\d{2}-\d{4}-[14-7]\d{3}
Also, while I can't see your data, make sure you actually have a header in the first row of your IMPORTRANGE results (which you've requested with the 1 at the end of the QUERY). If you don't actually have headers, the 1 will leave you with one more result than you want; if that is the case, just remove the ,1 from the end of the QUERY.
If this doesn't produce the results you want, it may be due to mixed data types in your raw data that are being filtered out by the QUERY. In that case, you can try using FILTER and REGEXMATCH instead:
=ArrayFormula(FILTER(IMPORTRANGE(Setup!B1,"Sheet1!A2:F"),REGEXMATCH(IMPORTRANGE(Setup!B1,"Sheet1!D2:D"),"\d{2}-\d{4}-[14-7]\d{3}")))
It is always hard to write complex formulas sight unseen. If none of these solutions (which work in my local sheet) produce the results you expect, I encourage you to share a link to both of your sheets. The raw data sheet being called by IMPORTRANGE can be "View Only"; but you'll want to set the Share permission on the second sheet (the one with the IMPORTRANGE formula itself) to "Anyone with the link..." and "Editor," so that those here can access it to test.
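As a quick sanity check outside Sheets (the sample account numbers below are invented), the same pattern can be exercised with Python's re module, since Sheets' regex functions use a very similar RE2 syntax:

import re

pattern = re.compile(r"\d{2}-\d{4}-[14-7]\d{3}")

samples = ["12-3456-4001", "12-3456-2001", "99-0000-7123", "12-3456-801"]
for acct in samples:
    # fullmatch mirrors QUERY's `matches`, which tests the whole cell value
    print(acct, bool(pattern.fullmatch(acct)))
# 12-3456-4001 True   (3rd group starts with 4)
# 12-3456-2001 False  (starts with 2)
# 99-0000-7123 True   (starts with 7)
# 12-3456-801  False  (wrong width)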

Algorithm to Group Selected Numbers In a List

Given a list of consecutive and unique numbers where some are selected and others are not, I need to create groups that contain all selected numbers. The number of groups should be kept to a minimum, and the number of non-required values in the groups should also be kept to a minimum. The max size of the groups is also a variable.
Example list, where * indicates selected number, and group size is limited to 5:
1*,2,3,4,5*,6*,7,8*,9
The most optimized groups would be [(1) and (5,6,7,8)].
[(1,2,3,4,5) and (6,7,8)] is another possible answer, but it contains more non-selected values, thus is not desirable.
Is there a name for this type of algorithm? I don't need someone to write the code for me, just looking for pointers if this problem is already well known.
For those curious what this is for, I am trying to optimize Modbus TCP register requests. A user may define a list of registers they need, and only continuous groups of registers may be requested at a time. Due to TCP latency, we want to make as few requests as possible, and only request the minimum number of non-required registers.
Try this:
numbers = [1, 2, 3, 4, 5, 4, 5, 6, 2, 4]
groups, current_group = [], []
max_group_size = 4  # here you put your max size
for n in numbers:
    is_valid = is_selected(n)
    if is_valid:
        current_group.append(n)
    # close the group on an unselected number, or once it reaches the max size
    if (not is_valid and current_group) or len(current_group) == max_group_size:
        groups.append(current_group)
        current_group = []
if current_group:  # flush the last group after the loop
    groups.append(current_group)
Assuming is_selected is a function that tells you whether a number is selected.
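Note that the snippet above only groups runs of consecutive selected numbers; it never spends non-selected values to merge runs, so for the example it returns [[1], [5, 6], [8]] rather than [1] and [5, 6, 7, 8]. As my own illustration (not code from the question or the answer above), here is a small dynamic-programming sketch that also handles the secondary objective: it covers the selected values with as few ranges as possible, each at most max_size wide, and among those groupings picks the one with the fewest unrequested values:

def group_registers(selected, max_size):
    # Cover the sorted selected addresses with contiguous ranges of at most
    # max_size registers, minimising first the number of ranges and then the
    # number of unrequested registers pulled in.
    s = sorted(selected)
    n = len(s)
    INF = (float("inf"), float("inf"), None)
    best = [INF] * (n + 1)      # best[i] = (ranges, extras, split) for s[:i]
    best[0] = (0, 0, None)
    for i in range(1, n + 1):
        for j in range(i - 1, -1, -1):
            span = s[i - 1] - s[j] + 1          # registers read by this range
            if span > max_size:
                break
            ranges, extras, _ = best[j]
            cand = (ranges + 1, extras + span - (i - j), j)
            if cand[:2] < best[i][:2]:
                best[i] = cand
    result, i = [], n
    while i > 0:                                 # walk the split points back
        j = best[i][2]
        result.append((s[j], s[i - 1]))
        i = j
    return result[::-1]

print(group_registers([1, 5, 6, 8], max_size=5))   # [(1, 1), (5, 8)]

Each (start, end) pair would then be one Modbus read request covering start through end inclusive.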

Application for filtering a database in a short period of time

I need to create an application that would allow me to get the phone numbers of users matching specific conditions as fast as possible. For example, we've got 4 columns in an SQL table (region, income, age, and a 4th with the phone number itself). I want to get phone numbers from the table with a specific region and income. Just making an SQL query won't help, because it takes a significant amount of time. The database updates once per day, and I have some time to prepare the data as I wish.
The question is: how would you make the process of getting phone numbers with specific conditions as fast as possible, O(1) in the best scenario? Consider storing values from the SQL table in RAM for the fastest access.
I came up with the following idea:
For each phone number, create something like a bitset: 0 if a particular condition is false and 1 if it is true. But I'm not sure I can implement it for columns with non-boolean values.
Create a vector with phone numbers.
Create a vector with phone numbers' bitsets.
To get phone numbers, iterate over the 2nd vector and compare bitsets with the required one.
It's not O(1) at all, and I still don't know what to do about non-boolean columns. I thought maybe it's possible to do something good with std::unordered_map (all phone numbers are unique), or improve my idea with vectors and masks.
P.S. The SQL table consumes 4 GB of memory and I can store up to 8 GB in RAM. There are 500 columns.
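As an aside on the non-boolean columns: the usual way to extend the bitset idea is to give each interesting (column, value) pair its own bit, i.e. a small bitmap index. A toy sketch of that, in Python for brevity and with invented column values (the question's actual data is not shown), could look like:

# One bit per (column, value) pair: a toy bitmap index over made-up data.
rows = [
    {"phone": "111", "region": "north", "income": "high"},
    {"phone": "222", "region": "south", "income": "high"},
    {"phone": "333", "region": "north", "income": "low"},
]

# assign a bit to every distinct (column, value) seen in the data
bit_of = {}
for row in rows:
    for col in ("region", "income"):
        bit_of.setdefault((col, row[col]), len(bit_of))

def mask(row):
    m = 0
    for col in ("region", "income"):
        m |= 1 << bit_of[(col, row[col])]
    return m

masks = [(row["phone"], mask(row)) for row in rows]

# query: region == north AND income == high
want = (1 << bit_of[("region", "north")]) | (1 << bit_of[("income", "high")])
print([phone for phone, m in masks if m & want == want])   # ['111']

A multi-condition query then reduces to one mask comparison per phone; a more serious variant would keep one bitmap per (column, value) across row positions, so the AND happens on whole bitmaps instead of a per-row scan.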
I want to get phone numbers from the table with specific region and income.
You would create indexes in the database on (region, income). Let the database do the work.
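As a minimal, self-contained sketch of that suggestion (SQLite and the users table/sample rows here are placeholders, not anything from the question):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (region TEXT, income INTEGER, age INTEGER, phone TEXT)")
con.execute("INSERT INTO users VALUES ('north', 50000, 30, '111'), ('south', 50000, 40, '222')")

# composite index on the two columns used in the WHERE clause
con.execute("CREATE INDEX idx_region_income ON users (region, income)")

rows = con.execute(
    "SELECT phone FROM users WHERE region = ? AND income = ?",
    ("north", 50000),
).fetchall()
print(rows)   # [('111',)]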
If you really want it to be fast, I think you should consider ElasticSearch. Think of every phone in the DB as a doc with properties (your columns).
You will need to reindex the table once a day (or in realtime), but when it's time to search you just use an ElasticSearch filter to find the results.
Another option is to have an index on every column. In this case the engine will do an Index Merge to increase performance. I would also consider using MEMORY tables. In case you write to this table, consider having a read replica just for reads.
To optimize your table, save your queries somewhere and add indexes (for multiple columns) just for the top X popular searches, depending on your memory limitations.
You can use NVMe as your DB disk (if you can't load it into memory).

Is there a way to temporarily change an int to a string in MongoDB db.find?

In the collection I have, there is a field that may contain numbers or strings depending on what they are trying to categorize. Each number is like a measurement plus an id, so the measurement would be 400 with the id .01, and it is stored as 400.01. I do not have the ability to change this database at all, but I need to change these doubles into strings so I can perform a regex search on them.
For example, I will be asked to find all ids with 400 measurements, so I would need to collect numbers like 400.01, 400.05, etc.
So far I have
db.collection.find({field: {$regex: /^400/m}})
Which works but not for the doubles and I tried:
db.collection.find{{$set:{DX:DX.toString()}:{$regex:/^400/m}}
but it doesn't work.
example:
{"date":99-99-9999,"DX":400.1}
{"date":99-99-9999,"DX":"400.056"}
{"date":00-00-0000,"DX":"5n.05"}
{"date":11-11-1111,"DX":400.03}