IP Address storing, sorting, and filtering in MongoDB - regex

I'm trying to figure out how to store IP addresses in a database, particularly MongoDB, so I can easily sort and filter on these addresses. I've looked over several questions:
Most efficient way to store IP Address in MySQL
save IP address in mongoDB
equivalent of INET_ATON() in mongodb
I've implemented an answer from the last question:
// ip example: 192.168.2.1
function inet_aton(ip){
    // split into octets
    var a = ip.split('.');
    var buffer = new ArrayBuffer(4);
    var dv = new DataView(buffer);
    for(var i = 0; i < 4; i++){
        dv.setUint8(i, a[i]);
    }
    return(dv.getUint32(0));
}
// num example: 3232236033
function inet_ntoa(num){
    var nbuffer = new ArrayBuffer(4);
    var ndv = new DataView(nbuffer);
    ndv.setUint32(0, num);
    var a = new Array();
    for(var i = 0; i < 4; i++){
        a[i] = ndv.getUint8(i);
    }
    return a.join('.');
}
This actually works wonderfully. I store my IPs as ints, and I convert them back to IPs before I send them to my UI, and because I'm storing them as ints, sorting is a freebie.
The problem becomes filtering. If a user wants to look for an IP that starts with 102.1*, there doesn't seem to be a reasonable way to do this, especially if the user wants to use regexes. If they search for a full IP, that's no problem, but partial matching is a nuisance.
Does anyone have any insight into this issue? I'd love to hear any thoughts.

I guess you can opt for duplication and have two fields in the same document, one with the integer value of your IP address and the other with its string form. Both are computed once, at write time. If you have to sort, use the integer form; if you have to do any regex searches, use the string form. The only cost is the extra space the second field occupies!
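A minimal sketch of the two-field idea, runnable in plain JavaScript without a database. The field names ip and ipInt and the helper names are assumptions made for illustration; only the $regex filter shape is MongoDB's own query syntax. The regex is anchored with ^ and the user's text is escaped so that "." matches a literal dot.

```javascript
// Convert a dotted-quad string to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Build the document to store: integer for sorting, string for pattern matching.
// (Field names "ip" and "ipInt" are hypothetical.)
function toIpDoc(ip) {
  return { ip: ip, ipInt: ipToInt(ip) };
}

// Escape regex metacharacters so "102.1" matches a literal dot.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a user prefix such as "102.1*" into an anchored regex filter on the string field.
function prefixFilter(userInput) {
  const prefix = userInput.replace(/\*$/, '');
  return { ip: { $regex: '^' + escapeRegex(prefix) } };
}

// Simulate the query locally: filter on the string field, sort on the integer field.
const docs = ['102.1.2.3', '102.10.0.1', '10.102.1.1', '102.2.0.1'].map(toIpDoc);
const filter = prefixFilter('102.1*');
const pattern = new RegExp(filter.ip.$regex);
const matches = docs
  .filter((d) => pattern.test(d.ip))
  .sort((a, b) => a.ipInt - b.ipInt)
  .map((d) => d.ip);
console.log(matches); // the addresses starting with "102.1", in numeric order
```

Against a real collection, the object returned by prefixFilter would be passed to find() and the sort would be done on the integer field.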

SuiteScript 2.0: Are there any search result limitations when executing a saved search via "getInputData" stage of map/reduce script?

I am currently building a map/reduce script in NetSuite which passes the results of a saved search from the getInputData stage to the map stage. This is being done by first running a WHILE loop in the getInputData stage to obtain the internal ids of each entry, inserting into an array, then passing over to the map stage. Like so:
// run saved search - unlimited rows from saved search.
do {
    var subresults = invoiceSearch.run().getRange({ start: start, end: start + pageSize });
    results = results.concat(subresults);
    count = subresults.length;
    start += pageSize; // getRange's end index is exclusive, so the next page starts here
} while (count == pageSize);
var invSearchArray = [];
if(invoiceSearch){
    //NOTE: .run().each has a limit of 4,000 results, hence the do-while loop above.
    for (var i = 0; i < results.length; i++){
        var invObj = new Object();
        invObj['invID'] = results[i].getValue({name: 'internalid'});
        invSearchArray.push(invObj);
    }
}
return invSearchArray;
I implemented it this way because I feared there would be result restrictions, just as the ".run().each" function has (limited to 4000 results).
I assumed that passing the search object directly from getInputData to map would be restricted to 4000 results as well. Can someone clarify whether there are such restrictions? Am I right to fear the script halting prematurely because search results cannot be processed beyond 4000 in the getInputData stage of a map/reduce script?
Any example to aid me in understanding how a search object is processed in a map/reduce script would be most appreciated.
Thanks
If you simply return the Search instance, all results will be passed along to map, beyond the 1000 or 4000 limits of the getRange and each methods.
If the Search has 8500 results, all 8500 will get passed to map.
function getInputData() {
return search.load(...); // alternatively search.create(...)
}
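To show what the map stage then sees, here is a rough sketch that runs outside NetSuite with a stand-in context object. It assumes each search result reaches map as a JSON string in context.value carrying id, recordType and values keys; treat that shape, and the mock itself, as assumptions to verify against your own account.

```javascript
// Hypothetical map entry point; in a real script it would be returned from define().
function map(context) {
  // Assumption: each search result arrives as a JSON string with "id", "recordType", "values".
  const result = JSON.parse(context.value);
  // Pass the internal id along as the key for the next stage.
  context.write({ key: result.id, value: result.recordType });
}

// Stand-in for the context NetSuite would supply, so the sketch runs on its own.
const written = [];
const mockContext = {
  value: JSON.stringify({ recordType: 'invoice', id: '1234', values: {} }),
  write: (entry) => written.push(entry),
};

map(mockContext);
console.log(written);
```

With this approach map is invoked once per search result, so there is no need to build the array of internal ids by hand in getInputData.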

Regex with SQL Server 2008 CLR performance issues

I am trying to understand why it is taking so long to execute a simple query.
On my local machine it takes 10 seconds, but in production it takes 1 minute.
(I imported the database from production into my local database)
select *
from JobHistory
where dbo.LikeInList(InstanceID, 'E218553D-AAD1-47A8-931C-87B52E98A494') = 1
The table DataHistory is not indexed and it has 217,302 rows
public partial class UserDefinedFunctions
{
    [SqlFunction]
    public static bool LikeInList([SqlFacet(MaxSize = -1)]SqlString value, [SqlFacet(MaxSize = -1)]SqlString list)
    {
        foreach (string val in list.Value.Split(new char[] { ',' }, StringSplitOptions.None))
        {
            Regex re = new Regex("^.*" + val.Trim() + ".*$", RegexOptions.IgnoreCase);
            if (re.IsMatch(value.Value))
            {
                return(true);
            }
        }
        return (false);
    }
};
The issue is that with 217k rows in the table, the function gets called 217,000 times! I'm not sure how I can rewrite this thing.
Thank you
There are several issues with this code:
Missing (IsDeterministic = true, IsPrecise = true) in [SqlFunction] attribute. Doing this (mainly just the IsDeterministic = true part) will allow the SQLCLR UDF to participate in parallel execution plans. Without setting IsDeterministic = true, this function will prevent parallel plans, just like T-SQL UDFs do.
Return type is bool instead of SqlBoolean
RegEx call is inefficient: constructing a Regex instance just to call its instance method once is expensive. Switch to the static Regex.IsMatch method instead
RegEx pattern is very inefficient: wrapping the search string in "^.*" and ".*$" will require the RegEx engine to parse and retain in memory as the "match", the entire contents of the value input parameter, for every single iteration of the foreach. Yet the behavior of Regular Expressions is such that simply using val.Trim() as the entire pattern would yield the exact same result.
(optional) If neither input parameter will ever be over 4000 characters, then specify a MaxSize of 4000 instead of -1 since NVARCHAR(4000) is much faster than NVARCHAR(MAX) for passing data into, and out of, SQLCLR objects.
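The point about the pattern can be checked with a quick experiment. The sketch below is JavaScript, not the C# from the question, and the helper names are made up for illustration; it only demonstrates that, for single-line input, an unanchored match gives the same yes/no answer as the "^.*" ... ".*$" wrapping.

```javascript
// Pattern as written in the question: the search term wrapped in ^.* and .*$
function matchesWrapped(value, term) {
  return new RegExp('^.*' + term + '.*$', 'i').test(value);
}

// Suggested alternative: the search term alone, unanchored
function matchesBare(value, term) {
  return new RegExp(term, 'i').test(value);
}

const term = 'E218553D-AAD1-47A8-931C-87B52E98A494';
const samples = [
  'e218553d-aad1-47a8-931c-87b52e98a494',          // same GUID, different case
  'prefix-E218553D-AAD1-47A8-931C-87B52E98A494-x', // GUID embedded in a longer string
  '00000000-0000-0000-0000-000000000000',          // different GUID
  '',                                              // empty value
];

for (const value of samples) {
  console.log(JSON.stringify(value), matchesWrapped(value, term), matchesBare(value, term));
}
```

Both columns agree for every sample. If the search terms can contain regex metacharacters, they would also need escaping, which the original function does not do either.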

Best way to compare phone numbers using Regex

I have two databases that store phone numbers. The first one stores them with a country code in the format 15555555555 (a US number), and the other can store them in many different formats (ex. (555) 555-5555, 5555555555, 555-555-5555, 555-5555, etc.). When a phone number unsubscribes in one database, I need to unsubscribe all references to it in the other database.
What is the best way to find all instances of phone numbers in the second database that match the number in the first database? I'm using the entity framework. My code right now looks like this:
using (FusionEntities db = new FusionEntities())
{
    var communications = db.Communications.Where(x => x.ValueType == 105);
    foreach (var com in communications)
    {
        string sRegexCompare = Regex.Replace(com.Value, "[^0-9]", "");
        if (sMobileNumber.Contains(sRegexCompare) && sRegexCompare.Length > 6)
        {
            var contact = db.Contacts.Where(x => x.ContactID == com.ContactID).FirstOrDefault();
            contact.SMSOptOutDate = DateTime.Now;
        }
    }
}
Right now, my comparison checks to see if the first database contains at least 7 digits from the second database after all non-numeric characters are removed.
Ideally, I want to be able to apply the regex formatting to the point in the code where I get the data from the database. Initially I tried this, but I can't use replace in a LINQ query:
var communications = db.Communications.Where(x => x.ValueType == 105 && sMobileNumber.Contains(Regex.Replace(x.Value, "[^0-9]", "")));
Comparing phone numbers is a bit beyond what regex is designed for. As you've discovered, there are many ways to represent a phone number, with and without things like area codes and formatting. Regex is for pattern matching, so using it to strip out all formatting and then comparing strings is doable, but putting the comparison logic into the regex itself is not what it's for.
I would suggest that the first and biggest thing to do is sort out the representation of phone numbers. Since you have database access, you might want to look at creating a new field or table to represent a phone number object. Then put your comparison logic in the model.
Yes, it's more work, but it keeps the code more understandable going forward and helps clean up bad data.
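As a rough illustration of "sort out the representation first", here is a small JavaScript sketch (not the C# or Entity Framework code from the question). It assumes that numbers stored without a country code are US numbers, and the function names are invented for the example. The normalized value is what would go into the new field, so the comparison becomes plain string equality.

```javascript
// Assumption: numbers stored without a country code are US numbers (country code "1").
const DEFAULT_COUNTRY_CODE = '1';

// Reduce any formatting to a canonical digits-only form, or null if that is not possible.
function normalizePhone(raw) {
  const digits = raw.replace(/[^0-9]/g, '');
  if (digits.length === 10) return DEFAULT_COUNTRY_CODE + digits;
  if (digits.length === 11 && digits.startsWith(DEFAULT_COUNTRY_CODE)) return digits;
  return null; // e.g. "555-5555": no area code, so it cannot be matched exactly
}

// Two numbers are the same only if both normalize to the same canonical string.
function samePhone(a, b) {
  const na = normalizePhone(a);
  return na !== null && na === normalizePhone(b);
}

console.log(normalizePhone('(555) 555-5555'));
console.log(samePhone('15555555555', '555-555-5555'));
console.log(samePhone('15555555555', '555-5555'));
```

Seven-digit entries are deliberately rejected here rather than guessed at; how to treat them is a data-cleanup decision.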

<Binary> in sql

I want to select all the binary data from a column of a SQL database (SQL Server Enterprise) using a C++ query. I'm not sure what is in the binary data, and all it says is <Binary>.
I tried this (it's been passed on to me to study from, and I honestly don't 100% understand some parts of the code, as I commented):
SqlConnection^ cn = gcnew SqlConnection();
SqlCommand^ cmd;
SqlDataAdapter^ da;
DataTable^ dt;
cn->ConnectionString = "Server = localhost; Database=portable; User ID = glitch; Pwd = 1234";
cn->Open();
cmd=gcnew SqlCommand("SELECT BinaryColumn FROM RawData", cn);
da = gcnew SqlDataAdapter(cmd);
dt = gcnew DataTable("BinaryTemp"); //I'm confused about this piece of code, is it supposed to create a new table in the database or a temp one in the code?
da->Fill(dt);
for(int i = 0; i < dt->Rows->Count-1; i++)
{
    String^ value_string;
    value_string=dt->Rows[i]->ToString();
    Console::WriteLine(value_string);
}
cn->Close();
Console::ReadLine();
but it only returns a lot of "System.Data.DataRow".
Can someone help me?
(I need to put it into a matrix form after I extract the binary data, so if anyone could provide help for that part as well, it'd be highly appreciated!)
dt->Rows[i] is indeed a DataRow ^. To extract a specific field from it, use its indexer:
array<Byte>^ blob = safe_cast<array<Byte>^>(dt->Rows[i][0]);
This extracts the first column (since you have only one). The indexer returns an Object^, so it has to be cast to a byte array, which is how binary columns come back.
To answer the question in your code, the way SqlDataAdapter works is like this:
you build a DataTable to hold the data to retrieve. You can fill in its columns, but you're not required to. Neither are you required to give it a name.
you build the adapter object, giving it a query and a connection object
you call the Fill method on the adapter, giving it the previously created DataTable to fill with whatever your query returns.
and you're done with the adapter. At this point you can dispose of it (for example inside a using statement if you're using C#).

Trying to create a system to merge two lists of "buzz words" together, forming every possibility for domain availability checking?

My problem originates from me trying to create names for all my crazy (brilliant?) ideas for business and products, which then need to have their purchasing availability checked for .com domain names.
So I have a pen and paper system where I create two lists of words... List A and List B for example.
I want to find or create a little app where I can create and store custom lists which takes each word from List A, appends each word from List B (to create a total of List A * List B results?)
After the list is compiled of "ListAListB" results, I want to check if the .com domain is available for purchase online via some other method...
And ultimately, create a new list of each combination, along with some sort of visual status like maybe a color or word representing if the combined word is available as a .com...
So I'm basically using a nested for loop structure to index each word in List A, Loop through each word in List B, and create List C?
Then when the list is fully completed, send a CSV? to somewhere online and then somehow get a new list back.
I guess that is my rough thought process.
Any advice in the algorithm to create the list from the two original lists is appreciated.
Any help in the process to check the available domain names online via godaddy, ICANN, etc is appreciated..
Any help as to where I might find this tool already is even more appreciated..
I could probably download a free sdk or tool and write this in a language I suppose, based on my c++ experience from a few years ago, but I am rusty for sure, and haven't actually created anything since college like 3 years ago.
Thank you.
Here's a quick shell script that leverages Chris's answer.
#!/bin/bash
ids_url="http://instantdomainsearch.com/services/quick/?name="
for a in $(< listA); do
    for b in $(< listB); do
        # sed blanks out the response when it reports 'com':'u' (unavailable)
        avail=$(wget -qO- "$ids_url$a$b" | sed -e "s/.*'com':'u'.*//g")
        if [ "$avail" == "" ]; then
            echo "$a$b.com unavailable"
        else
            echo "$a$b.com available"
        fi
    done
done
It iterates through both lists, hits the domain search service with wget, and treats any result that contains "'com':'u'" as unavailable. Supposing List A contains 'goo', 'foo', and 'arglbar' and List B contains 'gle', the output should look like this:
google.com unavailable
foogle.com unavailable
arglbargle.com available
Pipe it through grep -v unavail to see only the available names.
To check whether a domain is available, you can try this:
Request this page:
http://instantdomainsearch.com/services/quick/?name=example
It will return this JSON (u = unavailable, a = available):
{'name':'example','com':'a','net':'u','org':'a'}
Then you just need to parse it. You may get blocked if you are checking lots of domains, but I doubt it, since this site receives a ton of requests from a single session, so you should be good (I'd pause at least 800 milliseconds between requests).
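The parsing step might look something like the sketch below, written in JavaScript for brevity. It works on the sample response shown above rather than calling the service, and parseAvailability is a made-up helper name. Because the sample uses single quotes, which strict JSON parsers reject, the sketch swaps them for double quotes first; that assumes the values never contain quote characters.

```javascript
// Parse a response body like the sample above into { name, available: { tld: boolean } }.
function parseAvailability(body) {
  // Assumption: values never contain quotes, so swapping quote styles is safe.
  const data = JSON.parse(body.replace(/'/g, '"'));
  const available = {};
  for (const [key, value] of Object.entries(data)) {
    if (key !== 'name') {
      available[key] = value === 'a'; // 'a' = available, 'u' = unavailable
    }
  }
  return { name: data.name, available: available };
}

const sample = "{'name':'example','com':'a','net':'u','org':'a'}";
const parsed = parseAvailability(sample);
console.log(parsed.name, parsed.available);
```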
C# code for list creation:
// Load up all the lines of each list into string arrays
string[] listA = File.ReadAllLines("listA.txt");
string[] listB = File.ReadAllLines("listB.txt");
// Create a list to hold the combinations
List<string> listC = new List<string>();
// Loop through each line in listA
foreach(string buzzwordA in listA)
{
    // Now loop through each word in listB
    foreach(string buzzwordB in listB)
        listC.Add(buzzwordA + buzzwordB); // Combine them and add it to the listC
}
File.WriteAllLines("listC.txt", listC.ToArray()); // Save all the combos
I didn't check the code, but that's the general idea. This is a bad idea for huge lists, though, because it reads both lists completely into memory at the same time. A better solution is probably to read the files line by line using FileStream and StreamReader.
Edit (same idea, but this uses file streams):
// Open up file streams for the lists
using(FileStream _listA = new FileStream("listA.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using(FileStream _listB = new FileStream("listB.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using(StreamReader listA = new StreamReader(_listA))
using(StreamReader listB = new StreamReader(_listB))
using(StreamWriter listC = new StreamWriter("listC.txt"))
{
    string buzzwordA = listA.ReadLine();
    while(buzzwordA != null)
    {
        string buzzwordB = listB.ReadLine();
        while(buzzwordB != null)
        {
            listC.WriteLine(buzzwordA + buzzwordB);
            buzzwordB = listB.ReadLine();
        }
        buzzwordA = listA.ReadLine();
        // reset the listB stream to the beginning
        listB.BaseStream.Seek(0, SeekOrigin.Begin);
        // drop the reader's internal buffer so it re-reads from the new position
        listB.DiscardBufferedData();
    }
} // All streams and readers are disposed by using statement
For parsing the json try this out: C# Json Parser Library
check out www.bustaname.com
sounds exactly like what you're doing