The Trick To Write A Fast (Universal) Java URL Expander

140 characters. Means something to you?

This is about how twitter (and micro-blogging) was born. Even if some profane firefox extensions try to work around this, when it comes to insert (long) urls you may be in trouble to stick to the rule.

And here comes URL shortening services.

Pretty simple: The long URL http://philippeadjiman.com/blog/2009/09/01/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends/ becomes http://bit.ly/miUkz that will nicely fit in your next tweet.

Now everyone wants to shorten URLs. Here is a list of 90 + URL shortening services (!!) without counting the ones that you can build by yourself.

How we (developers) can survive in this jungle if we want to retrieve the real expended version of those tons of URLs?

Well, a naive JAVA version would be:

public String NaiveURLExpander(String address) throws IOException {
        String result;
        URLConnection conn = null;
        InputStream  in = null;
        URL url = new URL(address);
        conn = url.openConnection();
        in = conn.getInputStream();
        result = conn.getURL().toString();
        in.close();
        return result;
    }

Nice. It works. But it is terribly slow.
Why?Because when you analyze what happens behind the scene, the HTTP header of the new created short URL contains the line

HTTP/1.1 301 Moved

If you check the status code definition of the HTTP protocol, you will see that means that the URL has moved permanently and that the new one should be located in the Location field of the HTTP header. In other words, the above java code behaves exactly as your browser: it performs a redirection, which is terribly slow.

So here is the trick:

  1. Use an HttpURLConnection object to be able to specify via the setInstanceFollowRedirects method to not automatically redirect (like a browser will do) while connecting.
  2. Extract the Location value in the HTTP header.

Here you go:

 public String expandShortURL(String address) throws IOException {
        URL url = new URL(address);

        HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY); //using proxy may increase latency
        connection.setInstanceFollowRedirects(false);
        connection.connect();
        String expandedURL = connection.getHeaderField("Location");
        connection.getInputStream().close();
        return expandedURL;
    }

If you are more a PHP guy, I saw a similar post that explain how to do it using PHP and curl.

Note that for sake of conciseness, I do not manage errors int the code. Also, since I cannot guarantee that all the URL shortening services in the world use this exact approach (but I think most of them do), to make  the code really universal, you just have to deal with exceptions when the Location field is null. Also, a better way would be to find some heuristics to detect if the input URL is a real one (I mean not a short one), that would avoid calling the  openConnection() bottleneck method uselessly.

Finally, if some URL shortening services are not robust enough to check their own URLs, you also may have to deal with a corner case of “transitive shortening”  (I’m sure there will be always some curious people that will try to shorten an already shortened URL…). Update: check this example: http://bit.ly/4XzVxm points to http://tcrn.ch/6c8AU4 which is itself another short url!

Also to achieve real performance, such code should be multithreaded. If you have to expand millions of URLs you would probably need to use many machines. Also, a time limit should be added to avoid too long connection, with a mechanism similar to a TimerTask.

Note that this trick makes the code 5 to 6 times faster. When it comes to deal with millions of short URLs, it makes a difference.