Is there a way to temporarily change an int to a string in MongoDB db.find()? - regex

In my collection there is a field that may contain numbers or strings, depending on what is being categorized. Each number is a measurement plus an id: the measurement would be 400 and the id .01, so it is stored as 400.01. I cannot change this database at all, but I need to treat these doubles as strings so I can perform a regex search on them.
For example, I will be asked to find all ids with measurement 400, so I would need to collect values like 400.01, 400.05, etc.
So far I have:
db.collection.find({field: {$regex: /^400/m}})
which works, but not for the doubles. I also tried:
db.collection.find{{$set:{DX:DX.toString()}:{$regex:/^400/m}}
but it doesn't work.
Example documents:
{"date": "99-99-9999", "DX": 400.1}
{"date": "99-99-9999", "DX": "400.056"}
{"date": "00-00-0000", "DX": "5n.05"}
{"date": "11-11-1111", "DX": 400.03}
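One way to do this without touching the stored data is to convert the field to a string inside the query itself. Below is a minimal sketch using the MongoDB Java driver, assuming MongoDB 4.2 or newer (for $toString and $regexMatch); the database and collection names are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class DxSearch {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create(); // assumes a local mongod
        MongoCollection<Document> coll =
                client.getDatabase("test").getCollection("collection"); // placeholder names

        // $toString converts doubles like 400.1 on the fly, so the regex
        // matches both the numeric and the string forms of DX.
        Document filter = Document.parse(
                "{ $expr: { $regexMatch: { input: { $toString: \"$DX\" }, regex: \"^400\" } } }");

        for (Document d : coll.find(filter)) {
            System.out.println(d.toJson());
        }
    }
}

Against the example documents above this returns the 400.1, "400.056", and 400.03 records, but not "5n.05".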

Related

Android/Java: Is there a fast way to filter large data saved in a list? And how to get high-quality pictures with small storage space on the server?

I have two questions.
The first one is:
I have a large amount of data coming from the server, which I save in a list. The customer can filter this data by 7 filters, two of them via a TextWatcher. This makes the filtering operation slow: it takes 4 seconds each time.
I tried to put the filter keywords (like length or width ...) in one if statement with && between them, but it didn't give me a result. I also tried to replace the TextWatcher with a Spinner, but that wasn't useful.
I'm using a single for loop.
So the question is: how can I apply multiple filters to a list of up to 2,000 rows with minimal or zero slowdown?
The second is:
I save from 2 to 8 pictures on the server in string form.
The question is: when I get these pictures back from the server, how can I show them in high quality?
When I display them I can see the pixels, and this is not good for the customer.
I don't want these pictures to take up much space on the server, but at the same time I want good quality when I retrieve them for display.
I'm using Android/Java.
Thank you
The answer to my first question: if you want to use a filter (like when you are using an online clothes shop and you want to filter by lowest price), you should use a hash map, not an ordinary list; it will be faster.
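As an illustration of that idea: if the rows are grouped by each filter key once, up front, a filter becomes a single map lookup instead of a scan over all 2,000 rows. A minimal sketch (the Item class and its fields are made up):

import java.util.*;

public class FilterDemo {
    // Hypothetical row type; in practice this would be your model class.
    static class Item {
        String size;
        double price;
        Item(String size, double price) { this.size = size; this.price = price; }
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("S", 9.99), new Item("M", 19.99), new Item("M", 14.50));

        // Build the index once, e.g. right after the server response arrives.
        Map<String, List<Item>> bySize = new HashMap<>();
        for (Item it : items) {
            bySize.computeIfAbsent(it.size, k -> new ArrayList<>()).add(it);
        }

        // Filtering by size is now a single O(1) map lookup.
        List<Item> mediums = bySize.getOrDefault("M", Collections.emptyList());
        System.out.println(mediums.size()); // 2
    }
}

The same pre-grouping can be repeated for each of the 7 filter keys, trading a little memory for much faster lookups.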
The answer to my second question: if you want to store images in a database, you should save each one as a link, not as a string or any other datatype.
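On the display side, a common approach is to let an image-loading library fetch, cache, and scale the linked image. A hypothetical sketch assuming the Glide library:

import android.widget.ImageView;
import com.bumptech.glide.Glide;

public class ImageBinder {
    static void bind(ImageView view, String imageUrl) {
        Glide.with(view.getContext())
             .load(imageUrl)        // the link saved on the server
             .override(800, 600)    // request a size that matches the view
             .into(view);
    }
}

Requesting a size close to the view's avoids both the pixelation of over-compressed thumbnails and the memory cost of full-size images.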

Application for filtering a database in a short period of time

I need to create an application that would allow me to get phone numbers of users matching specific conditions as fast as possible. For example, we've got 4 columns in an SQL table (region, income, age, and a 4th with the phone number itself). I want to get phone numbers from the table with a specific region and income. Just making an SQL query won't help because it takes a significant amount of time. The database updates once per day, and I have some time to prepare the data as I wish.
The question is: how would you make the process of getting phone numbers with specific conditions as fast as possible, O(1) in the best scenario? Consider storing values from the SQL table in RAM for the fastest access.
I came up with the following idea:
For each phone number, create something like a bitset: 0 if the particular condition is false and 1 if it is true. But I'm not sure I can implement it for columns with non-boolean values.
Create a vector with the phone numbers.
Create a vector with the phone numbers' bitsets.
To get phone numbers, iterate over the second vector and compare each bitset with the required one.
It's not O(1) at all, and I still don't know what to do about non-boolean columns. I thought maybe it's possible to do something good with std::unordered_map (all phone numbers are unique), or to improve my idea with the vector and masks.
P.S. The SQL table consumes 4 GB of memory and I can store up to 8 GB in RAM. There are 500 columns.
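For the non-boolean columns, one common trick is one-hot encoding: each categorical value gets its own bit, and numeric columns are bucketed into ranges with one bit per bucket. A minimal sketch of the idea in Java (std::bitset would play the same role in C++; with 500 columns you would use java.util.BitSet rather than a single long, and all names and bucket boundaries below are made up):

public class PhoneFilter {
    public static void main(String[] args) {
        // bits 0-1: region North/South; bits 2-3: income below/above 50k
        long NORTH = 1L, SOUTH = 1L << 1, LOW = 1L << 2, HIGH = 1L << 3;

        long[] masks = { NORTH | HIGH, SOUTH | LOW, NORTH | LOW };
        String[] phones = { "555-0001", "555-0002", "555-0003" };

        long wanted = NORTH | HIGH; // region = North AND income >= 50k
        for (int i = 0; i < masks.length; i++) {
            // A record matches if every wanted bit is set in its mask.
            if ((masks[i] & wanted) == wanted) {
                System.out.println(phones[i]); // prints 555-0001
            }
        }
    }
}

This is still a linear scan, but it is a scan over tightly packed bits with no branching per column, which is very cache-friendly.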
I want to get phone numbers from the table with specific region and income.
You would create indexes in the database on (region, income). Let the database do the work.
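For instance, a composite index on both columns lets the lookup be a single index scan (a hypothetical JDBC snippet; the connection string, table, and index names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; substitute your real driver/DSN.
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/phones_db");
             Statement stmt = conn.createStatement()) {
            // Composite index so a (region, income) lookup hits one index.
            stmt.execute("CREATE INDEX idx_region_income ON phones (region, income)");
        }
    }
}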
If you really want it to be fast, I think you should consider Elasticsearch. Think of every phone number in the DB as a document with properties (your columns).
You will need to reindex the table once a day (or in real time), but when it's time to search you just use an Elasticsearch filter to find the results.
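As a sketch, the search itself could be a filtered bool query, shown here with Java's built-in HTTP client; the local node URL and the phones index name are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EsFilterSearch {
    public static void main(String[] args) throws Exception {
        // bool/filter skips relevance scoring, so this is a pure yes/no match.
        String query = "{ \"query\": { \"bool\": { \"filter\": ["
                + "{ \"term\": { \"region\": \"north\" } },"
                + "{ \"range\": { \"income\": { \"gte\": 50000 } } }"
                + "] } } }";

        HttpRequest req = HttpRequest.newBuilder(
                        URI.create("http://localhost:9200/phones/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();

        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}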
Another option is to have an index on every column. In that case the engine will do an index merge to increase performance. I would also consider using MEMORY tables; if you write to this table, consider having a read replica just for reads.
To optimize your table, log your queries somewhere and add indexes (on multiple columns) just for the top X most popular searches, depending on your memory limitations.
You can also use an NVMe drive as your DB disk (if you can't load it all into memory).

Best way to use spreadsheet RegEx to extract text and numbers and replace them with new formatting?

I'm currently working on a non-profit project where I need to reformat the way the data in the rows displays.
At the moment, this is how the row data looks:
Save The Children (Donation)|10.00{0}{2}
And I need it to output like this instead:
donation_id:save_children|quantity:1|total:10.00
The first problem is that sometimes there are multiple items within the row:
Save The Children (Donation)|10.00{0}{2} / Save The Forrest|15.50{0}{2}
In which case it would need to be separated by a semicolon:
donation_id:save_children|quantity:1|total:10.00;donation_id:save_forrest|quantity:1|total:15.50
The second problem is that we have 9 donation variables/causes, each of which needs to map to a different "donation_id".
So every time it finds:
Save the Children, it needs to convert to: donation_id:save_children
Save the Forrest, to, donation_id:save_forrest
Save the Animals, to, donation_id:save_animals
And so forth.
And the third problem is that the donation amounts are variable (as people donate whatever they wish), so the "total:" dollar value that we output will often be different.
How would I go about doing this with regex?
Thank you
You can use the regex below:
(Save) The (Children|Forrest|Animals).*?\|([0-9]+\.[0-9]+)\{0\}\{2\}([\s\/]+)?
substitution/replace with
donation_id:$1_$2|quantity:1|total:$3;
When I test with
Save The Children (Donation)|10.00{0}{2} / Save The Forrest|15.50{0}{2}
Output is
donation_id:Save_Children|quantity:1|total:10.00;donation_id:Save_Forrest|quantity:1|total:15.50;
Test it online!
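Note that the replacement above keeps the original capitalization (Save_Children) while the question asks for lowercase ids (save_children). If the exact lowercase form matters, a small script can apply the same kind of pattern and lowercase the captured name; a sketch in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DonationFormat {
    public static void main(String[] args) {
        String row = "Save The Children (Donation)|10.00{0}{2} / Save The Forrest|15.50{0}{2}";

        // Same idea as the regex above: capture the cause name and the amount.
        Pattern p = Pattern.compile(
            "Save The (Children|Forrest|Animals)[^|]*\\|([0-9]+\\.[0-9]+)\\{0\\}\\{2\\}(\\s*/\\s*)?");

        Matcher m = p.matcher(row);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) out.append(';'); // semicolon between items
            out.append("donation_id:save_").append(m.group(1).toLowerCase())
               .append("|quantity:1|total:").append(m.group(2));
        }
        System.out.println(out);
        // donation_id:save_children|quantity:1|total:10.00;donation_id:save_forrest|quantity:1|total:15.50
    }
}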

Vtiger. Change query limit

The vtiger wiki says:
Query always limits its output to 100 records, client application can use limit operator to get different records.
This query does not work:
doQuery("select * from Leads limit='200';")
How to specify the operator in a query?
The "limit" clause only works if the number given is lower than 100. You can't get more than 100 records using "limit" in a single request.
To get more than 100 records from the vtiger web services you need to make multiple requests, using the offset in the "limit" clause.
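For example, a paging loop along those lines, assuming vtiger accepts the two-argument "limit offset, count" form (doQuery below is a stand-in for the real webservice call):

public class VtigerPaging {
    // Stand-in for vtiger's doQuery; a real client would call the
    // webservice endpoint here. Hypothetical signature.
    static java.util.List<Object> doQuery(String q) {
        return java.util.Collections.emptyList();
    }

    public static void main(String[] args) {
        int pageSize = 100;
        for (int offset = 0; ; offset += pageSize) {
            String q = "select * from Leads limit " + offset + ", " + pageSize + ";";
            java.util.List<Object> page = doQuery(q);
            // ... process the page here ...
            if (page.size() < pageSize) break; // short page means we reached the end
        }
    }
}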
If you really read the Wiki documentation, you'd see that you need to use:
select *
from Leads
limit 200;
Stop using unnecessary single quotes ('200') - the limit expects a numerical value, so there's no point in converting it to a string by quoting it - and drop the equals sign, too; it's not shown in the docs anywhere.

What are the steps of preprocessing anonymized data for predictive analysis?

Suppose we have a large dataset of anonymized data. The dataset consists of a certain number of variables and observations. All we can learn about the data is the type (numeric, char, date, etc.) of each variable. We can do that by looking at the data manually.
What are the best-practice steps for pre-processing the dataset for further analysis?
For instance, let this dataset be just one table, so we don't need to check any relations between tables.
This link gives the complete set of validations currently in practice. Still, to start with:
wherever possible, have your data written in such a way that you can parse it as fast and as easily as possible, using your preferred programming language's methods/constructors;
verify that all the data types match correctly - e.g. int fields do not contain string data;
verify that your values are in an acceptable range;
check whether any non-nullable field has null values;
check whether dates are in expected ranges;
check that data follows correct set-membership constraints wherever applicable;
if you have pattern-following data like phone numbers, make sure they are in the (XXX) XXX-XXXX format, if you prefer them that way;
are the zip codes at the correct accuracy level (in the US you may have 5 or 9 digits of accuracy)?
if your data is a time series, is it complete (i.e. you have values for all dates)?
is there any unwanted duplication?
Hope this is good enough to get you started...
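As a small illustration, a few of these checks (non-null, pattern, duplication) applied to a single column, with made-up data:

import java.util.*;
import java.util.regex.Pattern;

public class Validate {
    static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");

    public static void main(String[] args) {
        List<String> phones = Arrays.asList(
                "(555) 123-4567", null, "5551234567", "(555) 123-4567");

        Set<String> seen = new HashSet<>();
        for (String p : phones) {
            if (p == null) {
                System.out.println("null in non-nullable field");
            } else if (!PHONE.matcher(p).matches()) {
                System.out.println("bad format: " + p);
            } else if (!seen.add(p)) {
                System.out.println("duplicate: " + p);
            }
        }
    }
}

The same loop structure extends naturally to range checks, date checks, and set-membership checks on other columns.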