A Generic Method For Sorting A (Google Collections) Multiset By Entry Count

I regularly use the excellent Google Collections library (now final and part of the broader Guava libraries). One of the data structures I use most is probably the multiset (a.k.a. bag). But whenever I use a multiset to track how many times particular entries occur, I almost always also need to know which entry occurs most often (or which N entries occur most often).

Let’s take a canonical example: while parsing a text, you insert each token into a multiset to track its number of occurrences, and you simply want to know which N tokens occur most often (OK, if you want to do that on terabytes of data, you might want to start learning Hadoop 🙂 ).
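As a rough illustration of the counting step, here is a minimal sketch that splits a text on whitespace and counts each lower-cased token. To keep it runnable on the bare JDK, it assumes a plain `HashMap<String, Integer>` standing in for the multiset; with Guava you would instead call `add(token)` on a `HashMultiset<String>` created with `HashMultiset.create()`. `TokenCounter` and `countTokens` are hypothetical names, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts whitespace-separated, lower-cased tokens.
// Assumption: a plain HashMap stands in for a Guava Multiset so this runs without Guava.
class TokenCounter {
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            counts.merge(token, 1, Integer::sum); // same effect as multiset.add(token)
        }
        return counts;
    }
}
```

For example, `countTokens("the cat and the hat")` maps `the` to 2 and each of the other three tokens to 1.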

I need this kind of statistic so often that I was surprised not to find an existing utility method for sorting the entries of a multiset by entry count (number of occurrences). Here is my attempt at doing it in a short, efficient and generic way:

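import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import com.google.common.collect.Multiset;

// Returns the entries of the multiset sorted from highest to lowest count.
public <T> List<Multiset.Entry<T>> sortMultisetPerEntryCount(Multiset<T> multiset) {
	Comparator<Multiset.Entry<T>> occurrenceComparator = new Comparator<Multiset.Entry<T>>() {
		public int compare(Multiset.Entry<T> e1, Multiset.Entry<T> e2) {
			// Descending order; counts are never negative, so the subtraction cannot overflow.
			return e2.getCount() - e1.getCount();
		}
	};
	List<Multiset.Entry<T>> sortedByCount = new ArrayList<Multiset.Entry<T>>(multiset.entrySet());
	Collections.sort(sortedByCount, occurrenceComparator);
	return sortedByCount;
}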
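On Java 8 or later, the same "highest count first" ordering can be written more compactly with `Map.Entry.comparingByValue()`. The sketch below assumes a plain `Map<T, Integer>` of counts standing in for the multiset (so it runs without Guava); `CountSorter` and `sortByCountDescending` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: sorts count entries from highest to lowest count.
// Assumption: a Map<T, Integer> stands in for a Guava Multiset.
class CountSorter {
    static <T> List<Map.Entry<T, Integer>> sortByCountDescending(Map<T, Integer> counts) {
        List<Map.Entry<T, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // comparingByValue() compares with Integer.compareTo (no subtraction);
        // reversed() puts the largest count first.
        sorted.sort(Map.Entry.<T, Integer>comparingByValue().reversed());
        return sorted;
    }
}
```

As with `Collections.sort`, the sort is stable, so entries with equal counts keep the map's (unspecified) iteration order.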

If you have a better or more efficient way to do this (or know of an existing utility method that does it), please share.
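One possibly more efficient option, when only the top N entries matter, is to swap the full sort for a bounded min-heap: keep at most N entries and evict the smallest whenever the heap grows past N. That costs O(m log N) for m distinct entries instead of O(m log m). This is a different technique from the method above, sketched here under the assumption of a plain `Map<T, Integer>` standing in for the multiset; `TopN.topN` is a hypothetical name. (More recent Guava releases also offer `Multisets.copyHighestCountFirst`, which returns an `ImmutableMultiset` iterated in descending count order; check the Javadoc of the version you use.)

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch: returns the n entries with the highest counts, highest first.
// Assumption: a Map<T, Integer> stands in for a Guava Multiset.
class TopN {
    static <T> List<Map.Entry<T, Integer>> topN(Map<T, Integer> counts, int n) {
        List<Map.Entry<T, Integer>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by count: the root is the smallest count kept so far.
        PriorityQueue<Map.Entry<T, Integer>> heap =
            new PriorityQueue<>(n, Map.Entry.<T, Integer>comparingByValue());
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the entry with the smallest count
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // drains in ascending count order
        }
        Collections.reverse(result); // highest count first
        return result;
    }
}
```

With counts `a=5, b=1, c=3, d=4`, `topN(counts, 2)` returns the entries for `a` and then `d`.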

If you have never used Google Collections, then in addition to the official website you might find these tutorials (1 and 2) a useful introduction.