What Are The 10 Most Cited Websites On Twitter When Tweeting About Hot Trends?

Recently I wrote a post on how to build a relevant real-time search engine prototype in a few hundred lines of code. Using a tailored ranking algorithm based on link popularity on Twitter, I showed that the prototype was able to return very relevant answers to very hot queries like the ones found in the hourly updated list of Google Hot Trends.

I wrote a small program on top of this prototype to run an experiment: each hour, the program crawls the new list of hot queries from Google Hot Trends, runs the prototype on each of those queries, and keeps the hottest link found on Twitter for each hot query. I wanted to see which websites were cited most often in the tweets talking about hot trends.

So I let the program run for a week, collected the links (more than a thousand), expanded them all into their long URL versions (using an improved version of my Java universal URL expander), extracted the domain names, and compiled the whole into a top 10 list of the most cited websites. Here it is:

[Figure: The Most Cited Websites When Tweeting About Hot Trends.]

I was surprised to see some websites that I had never heard of before (like wpparty.com or actionnewsblast.com).

To give a better idea of the kinds of hot queries/topics for which those websites are most cited on Twitter, here is, for each of the top websites, a sample of 5 Google Hot Trends queries they covered last week.
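The last two steps of that pipeline (extracting the domain names and compiling the top 10) can be sketched in a few lines of Python. This is only an illustration, not the program used for the experiment: the function names `domain_of` and `top_domains` are hypothetical, the sample links are made up, and the URLs are assumed to be already expanded to their long form.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host name of an already-expanded URL ('' if none)."""
    # .hostname is lowercased and has any ":port" suffix removed.
    return urlparse(url).hostname or ""


def top_domains(urls, n=10):
    """Count host names and return the n most frequent as (domain, count) pairs."""
    counts = Counter()
    for url in urls:
        domain = domain_of(url)
        if domain:  # skip strings that do not parse as URLs with a host
            counts[domain] += 1
    return counts.most_common(n)


# Made-up sample links, for illustration only.
sample_links = [
    "http://www.cnn.com/story-a",
    "http://www.cnn.com/story-b",
    "http://www.cnn.com:80/story-c",
    "http://mashable.com/post-1",
    "http://mashable.com/post-2",
    "http://twitpic.com/abc123",
]
print(top_domains(sample_links, n=2))
```

Note that the host name is kept as-is (no stripping of a leading "www."), which matches the way the websites are listed in this post.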

| Website | Sample of 5 covered Google hot trends of this past week |
|---|---|
| www.cnn.com | 2011 budget; ipad tablet; cnn.com/haiti360; concorde crash; federer murray |
| sports.espn.go.com | federer tsonga australian open; aaron miles; tom brookshier; jackson jeffcoat; paul pierce |
| wpparty.com | jackson jeffcoat; leon russell; wanamaker mile; buffalo exchange; recalled toyotas |
| www.huffingtonpost.com | governor of virginia; obama republican retreat; obama gop; apple tablet announcement; groundhog prediction |
| twitpic.com | miss america 2010 winner; what celeb do i look like; footprints in the sand; apple itablet; itablet |
| www.youtube.com | i pad; grammy awards 2010; bob kellar; lakers celtics; ipad a disappointment |
| www.facebook.com | general beauregard lee; roberta flack; action express racing; slightly stoopid; rolex 24 hours daytona |
| www.actionnewsblast.com | codswallop meaning; jonathan antin; fred baron; codswallop definition; stevie nicks |
| www.netnewsticker.com | arc energy; ego ferguson; kim burrell; reserveamerica; ivan mccartney |
| mashable.com | national lady gaga day; ipad tablet; ipad thoughts; doppelganger week facebook; tebow super bowl ad |

A few remarks:

  • All the links spotted by my prototype that appear in the table come from real tweets about those Google Hot Trends queries.
  • You’ll notice that the Apple iPad announcement is a theme that was covered by 4 of those top 10 websites!
  • I recommend having a look at the YouTube video listed in the table for the Google hot trend “ipad a disappointment” :) .
  • I also recommend having a look at the Haiti 360 view covered by CNN.
  • For twitpic, it is only pics, so what you’ll find there is a sample of “trendy pics” (see below for more on that…)
  • Sometimes the hot query seems at first sight not to be connected with the related article (as with fred baron). But when you take a closer look, there is always a connection! It is not for nothing that people tweet a link with the text of the hot query in the tweet…

To finish, here is a Picasa collage that I built using the most cited twitpic pictures on Twitter for this past week of hot trends (not only the 5 cited in the table). You’ll easily identify some sarcastic pictures from before the iPad announcement, or pics around the election of Miss America.

[Figure: Collage of the most cited twitpic links on Twitter for a week of Google Hot Trends.]

If you’re curious to match some pictures with their related hot topics, try to guess which pics correspond to which Google hot query below :) .

miss america 2010 winner, what celeb do i look like, miss america 2010, roberta flack, lady gaga and elton john, addicted to love, jim florentine, apple itablet, lost season 6 premiere, candy crowley, to make you feel my love, swagger crew, footprints in the sand, gasparilla, miss virginia, duke georgetown, celebrity look alike, katherine putnam, itablet, andrea bocelli, monster diesel, peta ad.

Google Hot Trends Clustering: The 100 Hottest Queries Tell You About 67.76 Stories On Average

Did you notice that among the 100 (hourly updated) Google Hot Trends, there are always several hot queries that are related to one another?

Let’s take a look at the Hot Trends of the current hour as I’m writing this post: Hot Trends of September 24 at 11PM PST (clicking on the keywords won’t work, it is just a local copy of the file at that time). In a few seconds, we can spot some similar queries: for instance, Hot Trend #5 “sean salisbury” is clearly related to Hot Trend #45 “sean salisbury internet postings” and also to Hot Trend #57 “sean salisbury cell phone incident”.

SeanClust3

Now, a small quiz: is there a link between Hot Trend #48 “julia grovenburg” and Hot Trend #8 “superfetation”, and what the hell is “superfetation”?

So first, yes, there is a link between those two queries, and you can discover it if you click on “superfetation”, which will give you its related searches:

superfetationDetails

So if you had time to lose, you could click on each of the 100 queries and use this method to eventually build this cluster of 8 queries:

superfetationClust8

  • The words in the cluster give more insight into what this story is all about: Julia Grovenburg was pregnant and became pregnant again (apparently during the same pregnancy), a phenomenon called superfetation. You can verify it in a news article of the same day:

newsPregnancy

  • Looking at the cluster, you might also think that the baby was a “19 pound baby” at birth, but this is actually a completely different breaking news story, not linked at all with the previous one. This misleading link shows that related searches are a great feature but not an exact science: sometimes (though not often) errors arise in related searches:

wrongRelatedSearches

I have some intuitions about how those related searches are detected and how those errors happen. It’s beyond the scope of this post, but if you are interested in it, shoot me an email.

So I implemented a link-based clustering algorithm that plugs into Google Hot Trends data and builds all that stuff automatically. Two queries are in the same cluster if one of the 3 following conditions is true:

  • the queries themselves are similar
  • one of the queries is similar to one of the related searches of the other
  • one of the related searches of one query is similar to one of the related searches of the other

I used a similarity measure that works well for short texts like queries, along with a black list of words, so that words like “the” or “a” do not disturb the similarity. I also empirically determined different thresholds for the three cases described above. If you have more questions about that stuff, feel free to leave a comment or to contact me.
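To make the three conditions more concrete, here is a minimal Python sketch. It is not the actual implementation: the post does not name the similarity measure, so a Jaccard overlap of word sets is assumed as a stand-in; the black list, the three threshold values, the function names, and the sample related searches are all hypothetical. Clusters are merged with a union-find structure so that links are transitive.

```python
STOP_WORDS = {"the", "a", "an", "of", "in", "to"}  # illustrative black list


def tokens(query):
    """Lowercased word set of a query, minus black-listed words."""
    return {w for w in query.lower().split() if w not in STOP_WORDS}


def similarity(q1, q2):
    """Jaccard overlap of the two word sets (an assumed stand-in measure)."""
    a, b = tokens(q1), tokens(q2)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def linked(q1, q2, related, t_query=0.3, t_mixed=0.5, t_related=0.8):
    """Apply the three conditions, each with its own (hypothetical) threshold."""
    r1, r2 = related.get(q1, []), related.get(q2, [])
    # 1. The queries themselves are similar.
    if similarity(q1, q2) >= t_query:
        return True
    # 2. One query is similar to a related search of the other.
    if any(similarity(q1, r) >= t_mixed for r in r2):
        return True
    if any(similarity(q2, r) >= t_mixed for r in r1):
        return True
    # 3. A related search of one is similar to a related search of the other.
    return any(similarity(x, y) >= t_related for x in r1 for y in r2)


def cluster(queries, related):
    """Group queries with union-find so that links are transitive."""
    parent = {q: q for q in queries}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:
            if linked(q1, q2, related):
                parent[find(q1)] = find(q2)

    groups = {}
    for q in queries:
        groups.setdefault(find(q), []).append(q)
    return list(groups.values())


queries = [
    "sean salisbury",
    "sean salisbury internet postings",
    "sean salisbury cell phone incident",
    "julia grovenburg",
    "superfetation",
]
# Illustrative related searches, not real Google data.
related = {"superfetation": ["julia grovenburg"]}
clusters = cluster(queries, related)
print(len(clusters), clusters)
```

With these assumed thresholds, the three “sean salisbury” queries end up together through the first condition, and “julia grovenburg” joins “superfetation” through the second one. Stricter or looser thresholds would change the number of clusters.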

So How Many Clusters Can I Build Out Of The 100 Google Hot Trends Queries?

You got it from this post’s title: 67.76 clusters on average (based on crawled data that represents a few months of hot trends). Each cluster is supposed to represent one “story” or breaking news item. Note that this number also depends on my thresholds, and that other algorithms and/or thresholds (more or less strict) can yield slightly different numbers.

Of course, some errors can also arise, either because of misleading related searches (as shown above) or because in some cases two queries look very similar but are in reality about two different things.

As an example of output, see the file generated for the 100 keywords studied in this post.

What Is It Useful For?

First of all, it is fun :) . Second, in information retrieval, order is always better than the opposite. But much more than that: if you run a breaking news website or blog, you’d better use in your article all the keywords of the same cluster, since they represent the hottest searched queries for the story that cluster represents! From an SEO point of view, I think the interest is pretty clear.

BONUS

If you have read the post up to here, I’d like to offer you a small bonus :) . It is the BIGGEST cluster that I was able to observe running my program on the last few years of Google Hot Trends data. I think you have already guessed which breaking news it is related to. Check it out!

Update: By coincidence, the day after I wrote this post the hot trends list was reduced from 100 to 40, so the screenshots and data above are a souvenir of the older version :) .

Can You Guess What Is The Hottest Trend Of Google Hot Trends?

Whether you work in SEO, are a “trends hacker”, or love, like me, doing useless comparisons like hanukkah vs passover, you obviously know the fantastic Google Trends tool.

I’m even more fascinated by the Google Hot Trends functionality that shows the 100 hottest English queries typed in the world right now (actually the 100 fastest-rising ones in the current hour, otherwise you would always see generic terms like ‘weather’).

I asked myself a simple question: are there queries that keep appearing over and over in this top 100 list? Can we discover patterns of queries? To answer it, I wrote, for fun, a simple crawler to crawl the daily list since the service started (May 15, 2007), and I generated a list of the hottest phrases (meaning the hottest n-grams of words, not queries).
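Counting the hottest phrases can be sketched as follows in Python. This is a hedged illustration rather than the original crawler code: the function names, the maximum n-gram length of 3, and the sample daily lists are assumptions. The `top_k` parameter restricts the count to the first queries of each daily list (for example the top 10 or the top 5).

```python
from collections import Counter


def ngrams(query, max_n=3):
    """Yield every word n-gram of the query, for n = 1..max_n."""
    words = query.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def hottest_phrases(daily_lists, top_k=None, max_n=3):
    """Count phrases over the first top_k queries of each daily list.

    top_k=None means the whole list is used.
    """
    counts = Counter()
    for queries in daily_lists:
        for query in queries[:top_k]:
            counts.update(ngrams(query, max_n))
    return counts


# Made-up daily lists, ordered from hottest to least hot, for illustration only.
daily_lists = [
    ["hallelujah lyrics", "weather radar"],
    ["poker face lyrics", "hallelujah lyrics"],
]
print(hottest_phrases(daily_lists).most_common(1))
print(hottest_phrases(daily_lists, top_k=1).most_common(1))
```

A phrase is counted once per query it appears in, so it can be counted several times for a single day.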

Can you guess if there is a clear winner?

Actually there is one: the phrase “lyrics”. As of today (August 31 2009), it always comes out as the most frequent hot keyword in different settings:

  • 759 occurrences if you consider the whole daily top 100 list. Think about it: since May 15, 2007, it’s been 809 days (thanks Jeffrey). Even if it sometimes appears several times in a single day, it means that almost every day, the word lyrics appears in the 100 hottest English queries in the world!!!
  • 207 occurrences if you consider only the daily top 10 list.
  • 124 occurrences if you consider only the daily top 5 list.
  • 34 occurrences if you consider only the daily hottest keyword.

But again, ‘lyrics’ is always the top ranked phrase in all the lists I generated. It does seem to be a decreasing trend, however.

What about other phrases? Here are a few other examples of the top phrases appearing over and over in the daily top world queries. Note that you don’t necessarily want to build a business around one of those hot topics, since all of them are in general already overcrowded niches.

What about patterns? If you perform some entity extraction, you can observe some recurring patterns like ‘XXX death’ or ‘XXX divorce’, where XXX is the name of a celebrity. I also noticed that users are much more interested in celebrity divorces than marriages :) .

In summary, Google Hot Trends is fun. In the new real-time web buzz, this service is not really meant to be a competitor, but it is still my favorite way of feeling the pulse of the web.
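As a rough illustration of the ‘XXX death’ / ‘XXX divorce’ idea, here is a small Python sketch. It is a deliberate simplification: instead of real entity extraction, it uses a regular expression that treats whatever precedes a trailing event word as the XXX part, without checking that it is a celebrity name. The pattern, the list of event words, the function name, and the sample queries are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: some leading text followed by a trailing event word.
EVENT_PATTERN = re.compile(r"^(?P<name>.+?)\s+(?P<event>death|divorce|marriage)$")


def event_patterns(queries):
    """Group the leading text of each matching query under its event word."""
    found = defaultdict(Counter)
    for query in queries:
        match = EVENT_PATTERN.match(query.lower().strip())
        if match:
            found[match.group("event")][match.group("name")] += 1
    return found


# Placeholder queries, for illustration only.
sample_queries = [
    "some actor death",
    "some singer divorce",
    "some actor divorce",
    "weather radar",
]
found = event_patterns(sample_queries)
for event, names in found.items():
    print(event, sum(names.values()), sorted(names))
```

Comparing the totals per event word over the crawled lists is one simple way to check an observation like the divorce vs marriage one.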