All posts by adjiman@gmail.com

BeanShell Tutorial: Quick Start On Invoking Your Own Or External Java Code From The Shell

BeanShell is a lightweight scripting language that’s compatible with the Java language.
It provides a dynamic environment for executing Java code in its standard syntax, but also allows common scripting conveniences such as loose types, commands, and method closures like those in Perl and JavaScript. It is considered so useful that it should become part of the J2SE at some point in the future (the BeanShell Scripting Language JSR-274 has passed the voting process with flying colors).

Here I simply describe how to call your own code, or any existing external code, directly from BeanShell. First, download the latest BeanShell jar release. Let’s suppose that you put it in the directory “C:\libs” along with the famous Apache Commons Lang library. So “C:\libs” contains two jars called bsh-2.0b4.jar and commons-lang-2.4.jar.

Open a command prompt and type:

java -cp C:\libs\bsh-2.0b4.jar;C:\libs\commons-lang-2.4.jar bsh.Interpreter

You should see a “bsh %” prompt indicating that the BeanShell session has started. Here is an example session using the getLevenshteinDistance method of the StringUtils utility class from the Apache Commons Lang package:

bsh % import  org.apache.commons.lang.StringUtils;
bsh % d = StringUtils.getLevenshteinDistance("Louisville Slugger", "Lousiville Slugger");
bsh % print(d);
2
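Calling your own code works the same way: compile it and add the resulting directory (or jar) to the -cp list. As a minimal sketch, here is a hypothetical class MyStringUtils (the name is made up for this example) implementing the classic dynamic-programming Levenshtein distance, i.e. the same metric as getLevenshteinDistance in the session above.

```java
/** Hypothetical example of "your own code" to call from the BeanShell prompt. */
public class MyStringUtils {

    /** Levenshtein distance: minimum number of insertions, deletions and substitutions. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Compile it with javac -d C:\myclasses MyStringUtils.java (C:\myclasses is a hypothetical directory), start the shell with java -cp C:\libs\bsh-2.0b4.jar;C:\myclasses bsh.Interpreter, and print(MyStringUtils.levenshtein("Louisville Slugger", "Lousiville Slugger")); should print 2, just like the Commons Lang version.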

Note that instead of typing the precise import, you can simply type:

bsh % import *;

This creates a set of “mappings” between the shell and the external jars specified in your classpath. Just remember that you are then importing every class accessible from the classpath, so you may have to type the fully qualified name of a class when two classes with the same name exist in different packages (it happens more often than one may think).

A good middle ground is to create a file called .bshrc containing all the specific imports you usually need. Then, when invoking the interpreter, set the Java system property user.home to the directory containing the .bshrc file. For example, if it is located in “C:\app\bshconfig”, just type:

java -Duser.home=C:\app\bshconfig -cp C:\libs\bsh-2.0b4.jar;C:\libs\commons-lang-2.4.jar bsh.Interpreter

Note that you can add any other options you need to the java command (for instance -Xmx to raise the maximum heap size).

For complete documentation of BeanShell commands, consult the BeanShell documentation page.

For an Eclipse plugin that lets you use auto-completion from BeanShell, among other nice features, take a look at EclipseShell (I haven’t tested it yet, but the site contains nice screencasts and documentation).

5 Video Tutorials Of Small To Killer Eclipse Shortcuts

I believe that when you spend a significant percentage of your time in a specific piece of software, it is almost an obligation to become “mouse-less” with it. A few years ago, when I started using the powerful Eclipse shortcuts, I noticed that my productivity improved dramatically. You can find a lot of posts promoting lists of “Top 10 Eclipse shortcuts” (I like this one). I believe that short video tutorials can show the power some shortcuts unleash more easily than a bunch of screenshots.

So here are 5 short video tutorials of shortcuts, ranging from small ones to killer ones, which together make my days in Eclipse much easier and more productive. The first two are small but still nice and useful. The remaining ones are more advanced and have a real impact, since you can potentially use them every couple of lines of code.

  1. Ctrl + Alt + Arrow (up or down): duplicating lines.
     Impact on productivity: low to medium

     http://www.youtube.com/watch?v=U80IhJLLxE8

  2. Alt + Arrow (up or down): moving lines.
     Impact on productivity: low to medium

     http://www.youtube.com/watch?v=9N8HUiYPAe0

  3. Ctrl + 1: How To Directly or Indirectly Use The Power Of Quick Fixes.
     Impact on productivity: huge

     http://www.youtube.com/watch?v=rnixsV-pEYk

  4. Alt + Shift + L: Extract Local Variables.
     Impact on productivity: medium

     http://www.youtube.com/watch?v=6YkAKK5XQ5w

  5. Ctrl + Space: Beyond Auto Completion, The Template Assistant (+ customization).
     Impact on productivity: high if heavily customized

     http://www.youtube.com/watch?v=ZYwo6mTkT7A

Beyond those, I highly recommend making heavy use of these five (for which I think a video is less useful):

  • Ctrl + Shift + R (open resource).
  • Ctrl + O (quick outline). Pressing Ctrl + O again shows inherited members.
  • Ctrl + E (quick switch editor). Very handy to navigate between open files.
  • Alt + Shift + R (rename). A very powerful one, since it updates every reference to the renamed element (it also works on files).
  • Ctrl + T (quick type hierarchy).

Become as mouse-less as possible in Eclipse. Don’t try to start using all of these shortcuts in one day: integrate one per day, or even one per week. You’ll end up much more productive anyway.

Google Hot Trends Clustering: The 100 Hottest Queries Tell You About 67.76 Stories On Average

Did you notice that among the 100 (hourly updated) Google Hot Trends, there are always several hot queries that are related to one another?

Let’s take a look at the Hot Trends of the current hour at the time I’m writing this post: Hot Trends of September 24 at 11PM PST (clicking on the keywords won’t work; it is just a local copy of the page at that time). In a few seconds, we can spot some similar queries: for instance, Hot Trend #5 “sean salisbury” is clearly related to Hot Trend #45 “sean salisbury internet postings” and also to Hot Trend #57 “sean salisbury cell phone incident”.

[Screenshot: Hot Trends #5, #45 and #57, all about Sean Salisbury]

Now, a small quiz: is there a link between Hot Trend #48 “julia grovenburg” and Hot Trend #8 “superfetation”? And what the hell is “superfetation”?

First, yes, there is a link between those two queries, and you can discover it by clicking on “superfetation”, which shows its related searches:

[Screenshot: related searches for “superfetation”]

So if you had time to lose, you could click on each of the 100 queries and use this method to eventually build this cluster of 8 queries:

[Screenshot: the cluster of 8 queries around “superfetation”]

  • The words in the cluster give more insight into what this story is about: Julia Grovenburg, already pregnant, became pregnant again (apparently during the same pregnancy), a phenomenon called superfetation. You can verify it in a news article from the same day:


[Screenshot: news article about Julia Grovenburg’s pregnancy]

  • Looking at the cluster, you might also think that the baby was a “19 pound baby” at birth, but this is actually a completely different breaking news story, not linked at all to the previous one. This misleading link shows that related searches are a great feature but not an exact science: sometimes (though not often) errors arise in related searches:

[Screenshot: misleading related searches]

I have some intuitions about how those related searches are detected and how those errors happen. It’s beyond the scope of this post, but if you are interested, shoot me an email.

So I implemented a link-based clustering algorithm that plugs into the Google Hot Trends data and builds all of this automatically. Two queries are in the same cluster if one of the following 3 conditions is true:

  • the queries themselves are similar
  • one query is similar to one of the related searches of the other
  • one of the related searches of one query is similar to one of the related searches of the other

I used a similarity measure that works well for short texts like queries, along with a blacklist of words (like “the” or “a”) so that such words do not distort the similarity. I also empirically determined different thresholds for the three cases described above. If you have more questions about all this, feel free to leave a comment or contact me.
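The three linking conditions can be sketched as follows. This sketch is not the implementation used for this post: the Jaccard word-overlap similarity is a stand-in for the short-text measure mentioned above, and the blacklist and the three threshold values are hypothetical placeholders for the empirically tuned ones. Clusters are taken as the connected components of the “linked” relation, computed with a union-find structure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HotTrendsClusterer {
    // Hypothetical blacklist and thresholds; the real ones were tuned empirically.
    private static final Set<String> BLACK_LIST = Set.of("the", "a", "an", "of", "in", "and");
    private static final double QUERY_VS_QUERY = 0.4;
    private static final double QUERY_VS_RELATED = 0.6;
    private static final double RELATED_VS_RELATED = 0.7;

    /** Lower-cased words of a short text, minus blacklisted words. */
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty() && !BLACK_LIST.contains(w)) result.add(w);
        }
        return result;
    }

    /** Jaccard similarity of the word sets (assumed stand-in measure). */
    static double similarity(String a, String b) {
        Set<String> x = words(a), y = words(b);
        if (x.isEmpty() || y.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return (double) common.size() / (x.size() + y.size() - common.size());
    }

    /** True if some text of xs is similar enough to some text of ys. */
    static boolean anySimilar(List<String> xs, List<String> ys, double threshold) {
        for (String x : xs)
            for (String y : ys)
                if (similarity(x, y) >= threshold) return true;
        return false;
    }

    /** The three conditions: query/query, query/related searches, related/related. */
    static boolean linked(String q1, List<String> r1, String q2, List<String> r2) {
        return similarity(q1, q2) >= QUERY_VS_QUERY
                || anySimilar(List.of(q1), r2, QUERY_VS_RELATED)
                || anySimilar(List.of(q2), r1, QUERY_VS_RELATED)
                || anySimilar(r1, r2, RELATED_VS_RELATED);
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    /** Groups queries into connected components of the "linked" relation. */
    static List<List<String>> cluster(List<String> queries, Map<String, List<String>> related) {
        int n = queries.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                String qi = queries.get(i), qj = queries.get(j);
                if (linked(qi, related.getOrDefault(qi, List.of()), qj, related.getOrDefault(qj, List.of()))) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        Map<Integer, List<String>> clusters = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            clusters.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(queries.get(i));
        }
        return new ArrayList<>(clusters.values());
    }
}
```

Because links are merged transitively in this sketch, two queries can land in the same cluster without being directly similar, which is how a chain like the “superfetation” one forms; it is also how a single misleading related search can merge two unrelated stories.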

So How Many Clusters Can I Build Out Of The 100 Google Hot Trends Queries?

As the title says: 67.76 clusters on average (based on crawled data covering a few months of Hot Trends). Each cluster is supposed to represent a single “story” or breaking news item. Note that this number also depends on my thresholds, and that other algorithms and/or thresholds (more or less strict) may obtain slightly different numbers.

Of course, some errors can also arise, either because of misleading related searches (as shown above) or because in some cases two queries look very similar but actually refer to two different things.

As an example of output, see the file generated for the 100 keywords studied in this post.

What Is It Useful For?

First of all, it is fun :). Second, in information retrieval, order is always better than disorder. But much more than that: if you run a breaking news website or blog, you’d better use in your article all the keywords of the same cluster, since they are the hottest searched queries for that particular story! From an SEO point of view, I think the benefit is pretty clear.

BONUS

If you read the post up to here, I’d like to offer you a small bonus :). It is the BIGGEST cluster I was able to observe while running my program on the last few years of Google Hot Trends data. I think you have already guessed which breaking news it is related to. Check it out!

Update: by coincidence, the day after I wrote this post, the Hot Trends list was reduced from 100 to 40 queries, so the screenshots and data above are a souvenir of the older version :).

Open Calais From Java: Get Ready To Extract Entities, Facts And Events In 4 Minutes!

I’m a big fan of Open Calais, the well-known web service that performs Named Entity, Facts and Events Extraction on free English text (and now also on French text, since version 4.0).

In the video tutorial below, I show you how, in only 4 minutes, you can set up everything needed to call the Open Calais web service from a Java program and perform Entity, Facts and Events Extraction on a news article taken from CNN.

The tutorial assumes that you already have Java and Eclipse for Java EE developers installed, along with an Open Calais API developer key (otherwise get one here; obtaining the key is a very light process).

Note that you can watch the tutorial in HD.

Also, check the remarks below: they make the tutorial easier to reproduce and give more detailed explanations of what you’ll see in it.

To see the video in its best quality, just click here.

http://www.youtube.com/watch?v=zUAvGh42tw4

Remarks/Complementary information:

  • The Open Calais web service WSDL shown in the demo is: http://api.opencalais.com/enlighten/?wsdl
  • The enlighten method, which calls the Open Calais web service via SOAP, has three parameters:
    • licenseId. This is your API key, which you can get here.
    • paramsXML. These are the input parameters of the service, in XML format (documentation here). In the tutorial, for the sake of simplicity, I pass them as a raw String; of course, it is better to read them from a file. Here are the parameters I used: calaisParams.xml.
    • content. This is the content on which the extraction is performed. Again, for the sake of simplicity I pass it as a raw String, and again, it is better to read it from a file (put whatever free text you want there). Here is the content I used (from CNN).


  • Pasting a long text copied from the web into Java source code can be a nightmare because of the characters that need escaping. The workaround I used in the demo is this general converter, which knows (among other things) where to add the quotes automatically.
  • Here is the output of the tutorial.
  • Here is the list of Open Calais possible outputs.
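Since the remarks above recommend reading the parameters and the content from files (which also sidesteps the escaping problem of pasting long text into Java source), here is a minimal sketch of such a loader. The class name CalaisInputs and the file names are hypothetical; the loaded strings would then be passed to the enlighten method of the client generated from the WSDL, whose generated signature is not reproduced here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for the three inputs of the enlighten call, loaded from files. */
final class CalaisInputs {
    final String licenseId;
    final String paramsXml;
    final String content;

    private CalaisInputs(String licenseId, String paramsXml, String content) {
        this.licenseId = licenseId;
        this.paramsXml = paramsXml;
        this.content = content;
    }

    /**
     * Reads the XML parameters and the free text from UTF-8 files, so that quotes,
     * newlines and backslashes never have to be escaped inside Java source code.
     */
    static CalaisInputs load(String licenseId, Path paramsFile, Path contentFile) throws IOException {
        String params = new String(Files.readAllBytes(paramsFile), StandardCharsets.UTF_8);
        String text = new String(Files.readAllBytes(contentFile), StandardCharsets.UTF_8);
        return new CalaisInputs(licenseId, params, text);
    }
}
```

The license key is also better kept out of the source (for example in an environment variable); the three fields then correspond to the licenseId, paramsXML and content parameters described above.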

If you’re like me, you’re obviously more interested in the algorithms behind the scenes. To learn more about the methods and algorithms involved, you can read about morphological analysis, POS tagging and shallow parsing. On the Open Calais website, they also mention in a discussion that they have developed their own rule-based system with their own programming language. They also use lexicons.

The problems addressed by Open Calais are tough and it’s hard to be perfect, but I think they are doing a pretty good job. It would be interesting to compare the relevance of its results with AlchemyAPI, which offers pretty much the same service.

The Trick To Write A Fast (Universal) Java URL Expander

140 characters. Does that mean something to you?

This is how Twitter (and micro-blogging) was born. Even if some Firefox extensions try to work around the limit, when it comes to inserting (long) URLs, you may have trouble sticking to the rule.

And here come URL shortening services.

Pretty simple: the long URL http://philippeadjiman.com/blog/2009/09/01/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends/ becomes http://bit.ly/miUkz, which fits nicely in your next tweet.

Now everyone wants to shorten URLs. Here is a list of 90+ URL shortening services (!!), not counting the ones you can build yourself.

How can we (developers) survive in this jungle if we want to retrieve the real, expanded version of those tons of URLs?

Well, a naive Java version would be:

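import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public String naiveURLExpander(String address) throws IOException {
    URLConnection conn = new URL(address).openConnection();
    // Opening the stream sends the request; HTTP redirects are followed automatically.
    try (InputStream in = conn.getInputStream()) {
        // After the redirects, getURL() returns the final (expanded) address.
        return conn.getURL().toString();
    }
}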

Nice. It works. But it is terribly slow.
Why? Because when you look at what happens behind the scenes, the HTTP response for the short URL contains the status line

HTTP/1.1 301 Moved

If you check the status code definitions of the HTTP protocol, you will see that this means the resource has moved permanently and that the new URL is given in the Location field of the HTTP header. In other words, the above Java code behaves exactly like your browser: it follows the redirection and sends a second request to the target site, which is what makes it so slow.

So here is the trick:

  1. Use an HttpURLConnection object and call its setInstanceFollowRedirects method so that it does not automatically follow the redirection (as a browser would) when connecting.
  2. Extract the Location value from the HTTP header.

Here you go:

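import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;

public String expandShortURL(String address) throws IOException {
    URL url = new URL(address);
    // Bypass any configured proxy: going through a proxy may increase latency.
    HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    // Do not follow the 301 redirection: we only want the target address.
    connection.setInstanceFollowRedirects(false);
    connection.connect();
    // The expanded URL sits in the Location header (null if the response is not a redirect).
    String expandedURL = connection.getHeaderField("Location");
    connection.getInputStream().close();
    return expandedURL;
}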

If you are more of a PHP guy, I saw a similar post that explains how to do it using PHP and curl.

Note that, for the sake of conciseness, I do not handle errors in the code. Also, since I cannot guarantee that every URL shortening service in the world uses this exact approach (though I think most of them do), to make the code really universal you have to handle the case where the Location field is null. A better approach would also be to find some heuristics to detect whether the input URL is a real one (I mean, not a short one), which would avoid needlessly calling the openConnection() bottleneck.
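The heuristic mentioned above could look like the following sketch. Everything in it is an assumption: the host list is a small, hypothetical sample (bit.ly and tcrn.ch appear in this post), and the fallback rule (short host name plus one short alphanumeric path segment) is a guess that will produce both false positives and false negatives. Only URLs for which looksShort returns true would go through the openConnection() step.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class ShortUrlHeuristic {
    // Hypothetical, deliberately incomplete sample of shortener hosts.
    private static final Set<String> KNOWN_SHORTENERS = Set.of("bit.ly", "tinyurl.com", "is.gd", "tcrn.ch");
    // Hypothetical fallback rule: a single alphanumeric path segment of at most 12 characters.
    private static final Pattern SHORT_PATH = Pattern.compile("/[A-Za-z0-9]{1,12}");
    private static final int MAX_SHORT_HOST = 10; // hypothetical

    /** Guesses whether address is a short URL worth sending through openConnection(). */
    static boolean looksShort(String address) {
        URI uri;
        try {
            uri = URI.create(address);
        } catch (IllegalArgumentException e) {
            return false; // not even a valid URI
        }
        String host = uri.getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        if (KNOWN_SHORTENERS.contains(host)) {
            return true;
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        return host.length() <= MAX_SHORT_HOST
                && uri.getQuery() == null
                && SHORT_PATH.matcher(path).matches();
    }
}
```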

Finally, if some URL shortening services are not robust enough to check their own URLs, you may also have to deal with the corner case of “transitive shortening” (I’m sure there will always be some curious people who try to shorten an already shortened URL…). Update: check this example: http://bit.ly/4XzVxm points to http://tcrn.ch/6c8AU4, which is itself another short URL!
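Handling transitive shortening amounts to repeating the single-hop expansion until no Location header comes back. Below is a minimal sketch under stated assumptions: the class name TransitiveExpander and the MAX_HOPS limit are hypothetical, and the single-hop step is passed in as a function, so it can be expandShortURL or anything else.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

final class TransitiveExpander {
    // Hypothetical safety limit on the number of redirections followed.
    static final int MAX_HOPS = 5;

    /**
     * Expands address hop by hop. oneHop returns the Location target of a URL,
     * or null when the URL does not redirect (i.e. it is already the real one).
     */
    static String expandFully(String address, UnaryOperator<String> oneHop) {
        String current = address;
        Set<String> visited = new HashSet<>();
        // Stop on a cycle (a URL seen twice) or after MAX_HOPS redirections.
        for (int hops = 0; hops < MAX_HOPS && visited.add(current); hops++) {
            String next = oneHop.apply(current);
            if (next == null) {
                return current; // no Location header: current is the real URL
            }
            current = next;
        }
        return current;
    }
}
```

To plug in expandShortURL, wrap it in a lambda that rethrows its IOException as an UncheckedIOException, since a UnaryOperator cannot throw checked exceptions. The visited set also protects against two short URLs pointing at each other.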

Also, to achieve real performance, such code should be multithreaded. If you have to expand millions of URLs, you would probably need to use many machines. A time limit should also be added to avoid overly long connections, with a mechanism similar to a TimerTask.
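For the multithreading and time-limit points, here is a hedged single-machine sketch. Instead of a TimerTask, it swaps in the connect and read timeouts that URLConnection already provides (setConnectTimeout and setReadTimeout), and it runs the same no-redirect trick on a fixed-size ExecutorService. The class name, the timeout values and the pool size are assumptions to be tuned; spreading the work over many machines is out of its scope.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelExpander {
    // Hypothetical values; tune them for your own workload.
    static final int CONNECT_TIMEOUT_MS = 2000;
    static final int READ_TIMEOUT_MS = 2000;

    /** Same trick as expandShortURL, plus timeouts. Returns null if there is no Location header. */
    static String expandOnce(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection(Proxy.NO_PROXY);
        connection.setInstanceFollowRedirects(false);
        connection.setConnectTimeout(CONNECT_TIMEOUT_MS);
        connection.setReadTimeout(READ_TIMEOUT_MS);
        try {
            connection.connect();
            return connection.getHeaderField("Location");
        } finally {
            connection.disconnect();
        }
    }

    /** Expands all addresses on a fixed thread pool; a failed or timed-out URL maps to null. */
    static Map<String, String> expandAll(List<String> addresses, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String address : addresses) {
                futures.put(address, pool.submit(() -> expandOnce(address)));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : futures.entrySet()) {
                try {
                    result.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    result.put(entry.getKey(), null); // timeout or I/O error on this URL
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

With several threads, a slow or dead shortening service only blocks its own worker, and the timeouts bound how long it can do so.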

Note that this trick makes the code 5 to 6 times faster. When dealing with millions of short URLs, that makes a difference.

Can You Guess What Is The Hottest Trend Of Google Hot Trends?

Whether you work in SEO, are a “trends hacker”, or, like me, love making useless comparisons like hanukkah vs passover, you surely know the fantastic Google Trends tool.

I’m even more fascinated by the Google Hot Trends feature, which shows the 100 hottest English queries typed in the world right now (actually the 100 fastest-rising ones in the current hour; otherwise you would always see generic terms like ‘weather’).

I asked myself a simple question: are there queries that keep appearing over and over in this top 100 list? Can we discover patterns of queries? To answer it, I wrote, for fun, a simple crawler that fetched the daily list since the service started (May 15, 2007), and I generated a list of the hottest phrases (meaning the hottest word n-grams, not whole queries).

Can you guess if there is a clear winner?

Actually, there is one: the phrase “lyrics”. As of today (August 31, 2009), it comes out as the most frequent hot keyword in each of the following settings:

  • 759 occurrences if you consider the whole daily top 100 list. Think about it: since May 15, 2007, it’s been 839 days (thanks Jeffrey). Even though it sometimes appears several times in a single day, this means that almost every day, the word lyrics appears in the 100 hottest English queries in the world!!!
  • 207 occurrences if you consider only the daily top 10 list.
  • 124 occurrences if you consider only the daily top 5 list.
  • 34 occurrences if you consider only the daily hottest keyword.

Still, ‘lyrics’ is the top-ranked phrase in all the lists I generated, although its trend seems to be decreasing.
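The counting behind settings like “top 100”, “top 10” or “top 5” can be sketched as follows. This is a reconstruction under assumptions, not the crawler or code used for this post: each day is assumed to be a list of queries ordered by rank, and every word n-gram of every query is counted, so a phrase appearing in several queries of the same day is counted several times (consistent with counting occurrences rather than days).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HotPhraseCounter {
    /** Counts every word n-gram (n = 1..maxN) in the first topK queries of each daily list. */
    static Map<String, Integer> countPhrases(List<List<String>> dailyLists, int topK, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> day : dailyLists) {
            for (String query : day.subList(0, Math.min(topK, day.size()))) {
                String normalized = query.trim().toLowerCase();
                if (normalized.isEmpty()) continue;
                String[] words = normalized.split("\\s+");
                for (int n = 1; n <= maxN; n++) {
                    for (int start = 0; start + n <= words.length; start++) {
                        String phrase = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                        counts.merge(phrase, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    /** Phrases sorted by decreasing number of occurrences. */
    static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }
}
```

Calling countPhrases with topK set to 100, 10, 5 and 1 gives the four settings listed above.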

What about other phrases? Here are a few other examples of top phrases that appear over and over in the world’s daily top queries. Note that you don’t necessarily want to build a business around one of those hot topics, since they are generally already overcrowded niches.

What about patterns? If you perform some entity extraction, you can observe recurring patterns like ‘XXX death’ or ‘XXX divorce’, where XXX is the name of a celebrity. I also noticed that users are much more interested in celebrity divorces than in marriages :).

In summary, Google Hot Trends is fun. Amid the new real-time web buzz, this service is not really meant to be a competitor, but it is still my favorite way of feeling the pulse of the web.