Crunch - How to generate specific password list? - list

First Hello! I have a crunch question:
We need to generate some list to crack password which has lower, upper case letters and numbers.
exp: a2jXBv69
We can do it simple way by using "crunch 8 8 abcABC123..." but it will create a huge list with a lot of useless passwords containing only upper case latter or just lower case letters or just numbers which cost a lot of time, power, and its not effective.
Now I didn't find still or I didn't look on right place how to generate only passwords which have upper-lower cases and number without other useless passwords like 214364984941 AIDJFISDOGFJO ssasadasd...
My question need is how to generate a list with only what I need like in this example?
Sorry if you don't understand my English I don't speak or write it very well.

The best way to accomplish this would probably be to use a dictionary.
https://github.com/danielmiessler/SecLists/tree/master/Passwords/Common-Credentials
There is no real practical way to "generate" common passwords. Sure there are some tricks, like checking how spellable a word is or adding some numbers e.g. 123 to the end of the password, but they don't nearly produce the same quality passwords as the above-mentioned password list.

Related

Regex - How can you identify strings which are not words?

Got an interesting one, and can't come up with any solid ideas, so thought maybe someone else may have done something similar.
I want to be able to identify strings of letters in a longer sentence that are not words and remove them. Essentially things like kuashdixbkjshakd
Everything annoyingly is in lowercase which makes it more difficult, but since I only care about English, I'm essentially looking for the opposite of consonant clusters, groups of them that don't make phonetically pronounceable sounds.
Has anyone heard of/done something like this before?
EDIT: this is what ChatGpt tells me
It is difficult to provide a comprehensive list of combinations of consonants that have never appeared in a word in the English language. The English language is a dynamic and evolving language, and new words are being created all the time. Additionally, there are many regional and dialectal variations of the language, which can result in different sets of words being used in different parts of the world.
It is also worth noting that the frequency of use of a particular combination of consonants in the English language is difficult to quantify, as the existing literature on the subject is limited. The best way to determine the frequency of use of a particular combination of consonants would be to analyze a large corpus of written or spoken English.
In general, most combinations of consonants are used in some words in the English language, but some combinations of consonants may be relatively rare. Some examples of relatively rare combinations of consonants in English include "xh", "xw", "ckq", and "cqu". However, it is still possible that some words with these combinations of consonants exist.
You could try to pass every single word inside the sentence to a function that checks wether the word is listed inside a dictionary. There is a good number of dictionary text files on GitHub. To speed up the process: use a hash map :)
You could also use an auto-corretion API or a library.
Algorithm to combine both methods:
Run sentence through auto correction
Run every word through dictionary
Delete words that aren't listed in the dictionary
This could remove typos and words that are non-existent.
You could train a simple model on sequences of characters which are permitted in the language(s) you want to support, and then flag any which contain sequences which are not in the training data.
The LangId language detector in SpamAssassin implements the Cavnar & Trenkle language-identification algorithm which basically uses a sliding window over the text and examines the adjacent 1 to 5 characters at each position. So from the training data "abracadabra" you would get
a 5
ab 2
abr 2
abra 2
abrac 1
b 2
br 2
bra 2
brac 1
braca 1
:
With enough data, you could build a model which identifies unusual patterns (my suggestion would be to try a window size of 3 or smaller for a start, and train it on several human languages from, say, Wikipedia) but it's hard to predict how precise exactly this will be.
SpamAssassin is written in Perl and it should not be hard to extract the language identification module.
As an alternative, there is a library called libtextcat which you can run standalone from C code if you like. The language identification in LibreOffice uses a fork which they adapted to use Unicode specifically, I believe (though it's been a while since I last looked at that).
Following Cavnar & Trenkle, all of these truncate the collected data to a few hundred patterns; you would probably want to extend this to cover up to all the 3-grams you find in your training data at least.
Perhaps see also Gertjan van Noord's link collection: https://www.let.rug.nl/vannoord/TextCat/
Depending on your test data, you could still get false positives e.g. on peculiar Internet domain names and long abbreviations. Tweak the limits for what you want to flag - I would think that GmbH should be okay even if you didn't train on German, but something like 7 or more letters long should probably be flagged and manually inspected.
This will match words with more than 5 consonants (you probably want "y" to not be considered a consonant, but it's up to you):
\b[a-z]*[b-z&&[^aeiouy]]{6}[a-z]*\b
See live demo.
5 was chosen because I believe witchcraft has the longest chain of consonants of any English word. You could dial back "6" in the regex to say 5 or even 4 if you don't mind matching some outliers.

google-speech-api and overriding phone number recognition

Does anyone know if there is a way to manipulate the recognition of phone numbers when using the Google Speech API? I am trying to implement a transcription scenario where a caller will say a string of letters and numbers, but the logic out of the box seems to be to try to fit any sequence of numbers to a phone number scheme, even if it means rendering letters into numbers they may sound vaguely similar to (or not). I have tried using speech contexts to manipulate the values within the "phone number" by typing out and giving the entire thing as it should be as a speech context ("eight seven seven two bee three seven", for example), but it refuses to override the digits being interpreted as a phone number. Has anyone encountered this issue or is aware of any way in which this could be worked around?
Thanks!
I'm not aware of an easy way to do this. For the Web Speech API for JavaScript, doing the following seems to yield fewer results that are forced into phone number format.:
Set the maxAlternatives = 2, e.g.,
var recognition = new speechRecognition();
recognition.maxAlternatives = 2;
Then use the second result offered, e.g.,
constr speechToText = event.results[0][1].transcript
You can get pretty far by processing the result. A remaining challenge is that since the result often clumps digits together, you lose the distinction between a series of single digit numbers and one multi-digit number (e.g., '15' & '1', '5'). The utility of this approach depends on the specifics of the numbers your app is trying to capture.
In at least one case, setting the language to en-PH (English Philippines) seems to have fixed, or at least notably improved, this problem. Other English language options might work as well.
en-GB comes back as a UK formatted number where they put one digit first then the rest of the number.

Backend challenge in Python

For the past few days I have taken a challenge to write an algorithm in Python 2.7, to find a passphrase given a few hints.
Specifically:
An anagram of the passphrase is: "poultry outwits ants"
The MD5 hash of the secret phrase is "4624d200580677270a54ccff86b9610e"
A Wordlist
What approaches would you use to find the password? I know that brute-forcing all the possible combinations is the sure way but also the one taking way too long to complete (never, i think it is around 20^20 possible combinations).
What I have come up with is to filter the wordlist based on the letters that exist in the anagram and the words. Meaning, if I find a word that contains a letter that is not in the passphrase, I discard it. In addition, I wanted to take into consideration the frequencies of the characters. So I removed from the wordlist any word that has characters with frequency higher than the character frequency of the passphrase. The above was performed on word-level, meaning I checked each word individual. Eventually, from 90k+ words from the wordlist I narrowed them down to 1700 unique words that can be used for the passphrase.
Question 1: Is there anything more that can be done in order to reduce the number of possibilities? Is there a more clever way to take into account the frequencies of the letters?
Next, I thought that since I narrowed it down to 1.7k words maybe I could try permutations (itertools.permutations()) of these words that could possible match the md5 hash of passphrase. Since the anagram of the passphrase contains 2 space characters, I assumed that the passphrase is also three words and not just scrambled characters (maybe I am wrong, but at least I should try it first). As a result, I tried checking permutations of three words from the filtered wordlist. As it turned out, neither that kind of approach is fast enough for my laptop to get any results. The program reaches the memory limit, and the computer freezes.
I also thought about taking into consideration pairs of words instead of triplets and somehow match the letter frequencies in order to filter out some possibilities but I did not come up with a way to do it yet.
Question 2: Is there a way that I can get any more information about the passphrase without checking all the permutations, since this task is prohibitive for my laptop (at least for 1.7k words and above).
I tried using hashcat but I found it too complicated. I ran a couple mask attacks on the md5 hash but with no success. I tried brute-forcing it too, since it can use the GPU but it was still impossible. The main reason was that I could not understand the kind of arguments I needed to give. I know there is an extensive wiki, but as someone with close to no background about hash cracking it was not really helpful. In addition, I would prefer if there was a way to do it on my own and use other programs as less as possible.
If you have any suggestions about solving this please let me know. I am doing this for education purposes, so any input on this will be greatly appreciated.
Thanks

Validate Proper Names (with Perl)

I have a census list of 150k last names, and trying to use this to validate the spelling of person names in an existing database.
Obviously there are many ethnic names in my database that don't match the census list, but are clearly not misspelled (Italian names like "Petroni", Swedish names like "Magnusdotter").
I would like to create a function (in Perl) to detect slight variations - i.e. likely mis-spellings - between names in the database and other very popular names in the census list (a frequency number is available).
I can imagine the algorithm, but before I dive in - any suggestions to do this in a reliable way - i.e. one that doesn't throw too many false positives?
Thanks!!
Essentially, you're writing a spell checker. You may want to look into an Open Source, multi-lingual spell checker such as Aspell and see what they do. You might even be able to implement what you want as an aspell dictionary.
There are many algorithms for doing approximate string matching. The Levenshtein distance between words is one algorithm, and there are several Perl modules to calculate it, but Text::Fuzzy looks pretty good.
That's great for comparing a few words, but you have to choose between 150k. You could just see if it's fast enough. You could try caching the result. But it remains an O(n) algorithm. Instead (or in addition) you can create an index using a phonetic matching algorithm. Generally, these index words by what they sound like to allow matching on misspelled words. Once you've generated the index for each word, you can match a new word against the index very quickly. Obviously this is subject to cultural ideas of what words sound like which is why there are many algorithms each with different optimizations. You can create several indexes using different algorithms and try them all.
You can even combine the two and do approximate string matching on the phonetic indexes.

SQL Hash table for words

I'm trying to solve the "find all possible words for a set of letters" problems. There are some good answers out there, but I still can't figure it out.
In my first test, I put the whole dictionary in an array and then looped through each letter. This is super fast, but it takes forever to load the dictionary in the array, and requires huge amount of memory.
So I need to store the dictionary (750,000) letter is a sql database.
I guess there are two solutions to find all the possible words:
Make an advance query that returns all the possible words
Make a simple query that return a fraction of the database with words that might be possible, and then quickly loop through that array and valide the words.
The problem?:
It must be super fast. An iPhone 4 need to be able to get all possible words in under 5-6 seconds so it doesn't hinder the game.
Here's a similar questions:
IOS: Sqlite. Find record fast
Sulthans answer seems like a good idea. Create a hash table, and then:
Bitmask for ASCII letters (ignoring any non-ASCII alphabets). Bit at
position 0 means the word contains "a", at position 1 contains "b"
etc. If we create the same bitmask for our letters, we can select
words such as (wordMask & ~lettersMask) == 0
How do you make the bitmask, hash table, and how do you construct the sql query?
Thanks
sql is probably not the best option. The traditional data structure for storing a collection of words is called a Trie. I'm sure there implementations out there you can find. Someone else will have an answer to that.
The algorithm I envision is to permute the letters you are given, and check each permutation to see if it is in the Trie.