Extracting Passwords Using Regular Expressions in Python.

This is my second post on regular expressions. Here, I’m attempting to solve one of the practice tasks given at the end of Chapter 7, Automate The Boring Stuff. Instead of merely identifying whether or not a given string is a strong password, I will write a program that takes a list of strings, and returns a list containing those strings which will make valid ‘strong passwords.’

  • Extract potential strong passwords from text in Clipboard
  • Strong password must have both uppercase and lowercase characters, as well as digits. It must also be at least eight characters long, and must not have any spaces.

So for this piece of code, we will be using ‘lookaheads,’ a new concept for me pertaining to the usage of regexes. Firstly, here is some basic information about ‘lookaheads.’ A ‘lookahead’ is like an if condition. It follows the following format, (note that not many websites refer to it in this way, but I find this intuitive) (?=Regex1)(Regex2). Here, Regex2 is considered if and only if (iff) Regex1 is true. Now, this means that like in an if ... else conditional statement, we can write code that could mean: (?=if digit is present)(read string). Also, do note that Regex1 being true doesn’t affect how Regex2 is read. Another thing we need to do is ensure that the we scan only one password, and not the entire text.

We can do this by either specifying all possible characters except spaces inside square brackets as follows. [a-zA-Z0-9_+$%...]*. Or by simply specifying ‘all non-space characters,’ which is much simpler.[^\s]*. One could also use \w*, but this would not scan characters used commonly in passwords, like %, #, @, ! et cetera.

Then, we need three lookbacks. One to check for a lowercase character. One to check for an uppercase character. One to check for a digit. We also need to use {} brackets to ensure that the password is a least 8 characters long. Here is the code:

strongPassword = re.compile(r'''(
    (?=.*\d) 
    (?=.*[a-z])
    (?=.*[A-Z])
    [^\s]{8,}
    )''', re.VERBOSE) 

res = []                            
for potentialPassword in strongPassword.findall(text):
    res.append(potentialPassword)                        
    print(potentialPassword)
    

The easiest one is the last statement, which says accept all non-space containing strings that follow the lookaheads above and are at least 8 characters long. Each lookahead follows the same format.

(?=.*\d), which is equivalent to (?=.*[0-9]) says match any character that is a digit. Similarly, the other two regex components say match any character that is a lowercase alphabet and uppercase alphabet respectively.

We then just output our results.

You can find the full code here at Github.

Personally, I found this really difficult mainly because I didn’t know how to chain multiple lookaheads together. Simply putting them one after the other without the .* required the characters to be ordered (lowercase first etc.). Nesting them like a bad if..else statement provided similar results. I ultimately found the correct solution here on stackoverflow, through an implementation for jquery, although not without surfing for over 1.5 hours.

Moreover, the implementation in the official regex website was absolutely nightmarish. A long, single line that was overflowing onto the right side of their webpage. Turns out that their implementation doesn’t even work, for reasons I couldn’t comprehend.