Unshorten URLs

TL;DR


It is often necessary to inspect the contents of websites that are masked behind a URL-shortening service, such as those provided by goo.gl and bitly.com. When the website you want to inspect or traverse to is malicious, you'll often find that there are many hops in the path. When you have just one or two of these sites to examine, it's easy enough to keep a record of the URL hops by proxying your traffic through the likes of Burp and viewing the HTTP history. When you have a lot of URLs to examine, however, copying and pasting the output from Burp becomes time-consuming, so instead we can use a simple script to parse each URL and print each hop in the path.

Code


Python is the language of choice for this piece of code, simply because you can run it on almost any system and it provides a very quick and clean way of parsing the information we're interested in. The pseudocode of what we want to achieve is:

1. Pass a file into the program via the command line
2. Read each line in the file
3. Send a request to each URL, and follow any redirects which may occur
4. Print the output to the console in an easy-to-read fashion
5. Skip URLs which don't exist, or any other error which may occur

So, let's translate this to Python:

#!/usr/bin/env python

import fileinput
import sys

import requests


def main(argv):
	# ensure correct usage
	if len(argv) < 1:
		print("Usage: " + sys.argv[0] + " <inputfile>")
		sys.exit(1)
	inputfile = argv[0]

	# iterate over each line in the input file
	for line in fileinput.input([inputfile]):
		line = line.rstrip('\n')
		# skip over any request errors (bad hosts, timeouts, etc.)
		try:
			r = requests.get(line, allow_redirects=True, timeout=2.0)
		except requests.exceptions.RequestException:
			print("Skipping over " + line)
			continue
		# print each hop in the redirect history, indented one
		# "- " per hop, then the final destination URL last
		for idx, hop in enumerate(r.history):
			print("- " * idx + hop.url)
		print("- " * len(r.history) + r.url)


if __name__ == "__main__":
	main(sys.argv[1:])

As you can see from the code, this is nice and simple. Python has an excellent library called requests, which handles all the hard work under the hood. The code ensures an argument is passed at the command line, and leaves it up to the Python interpreter to spit out an error if the file does not exist. After printing the redirect history, it prints the final destination URL as the last hop. Feeding it a sample file containing two URLs:

$ cat ~/urls.txt
https://goo.gl/8VsrRt
http://www.colin.guru
$ python unshorten.py ~/urls.txt
https://goo.gl/8VsrRt
- http://www.colin.guru/
- - https://www.colin.guru/
http://www.colin.guru/
- https://www.colin.guru/
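
As a quick aside, if you want to poke at what requests actually exposes here, an interactive session makes the split between the redirect history and the final destination clear. This is just a sketch using the colin.guru URL from the sample file, with the values taken from the run above:

>>> import requests
>>> r = requests.get('http://www.colin.guru', allow_redirects=True, timeout=2.0)
>>> [hop.url for hop in r.history]  # the intermediate responses, in order
['http://www.colin.guru/']
>>> r.url  # the final landing page after all redirects
'https://www.colin.guru/'

Each entry in r.history is a full Response object, so you can also inspect things like hop.status_code or hop.headers['Location'] if you want to see exactly how each redirect in the chain was issued.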

And there you have it! A super simple way to unshorten multiple URLs and view each hop in the chain.