All Posts

 

Deep Dive Into Logistic Regression: Part 1

Learn the fundamental theory behind logistic regression ...
Read More

Deep Dive Into Logistic Regression: Part 2

Want to know how to implement stochastic gradient descent for a logistic regression that can learn millions of parameters, using the hashing trick and per-coordinate adaptive learning rates, with a tiny memory footprint? This post is for you ...
Read More

Deep Dive Into Logistic Regression: Part 3

In this third and last post of the series, we show how to build logistic regression models (among others) in practice with a very effective and powerful library: Vowpal Wabbit ...
Read More

A Data Science Exploration From the Titanic in R

This year Kaggle offered a knowledge competition called "Titanic: Machine Learning from Disaster", featuring a popular "toy-yet-interesting" data set around the ...
Read More

How To Easily Build And Observe TF-IDF Weight Vectors With Lucene And Mahout

You have a collection of text documents, and you want to build their TF-IDF weight vectors, probably before doing some clustering ...
Read More

What Are The 10 Most Cited Websites On Twitter When Tweeting About Hot Trends?

I recently wrote a post on how to build a relevant real-time search engine prototype in a few hundred lines ...
Read More

Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner

Welcome to the fourth issue of the Hadoop Tutorial Series. Combiners are another important Hadoop feature that every Hadoop developer ...
Read More

Hadoop Tutorial Series, Issue #3: Counters In Action

Note: This post has been updated with code that works with Hadoop 0.20.1. In this 3rd issue of the Hadoop ...
Read More

How To Build A Relevant Real Time Search Engine Prototype In Few Hundreds Lines Of Code

By the end of the post you'll find the code along with a small command-line Java program to play ...
Read More

Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning

In Issue #1 of this series, we set up the "learning playground" (based on the Cloudera Virtual Machine) in ...
Read More

Hadoop Tutorial Series, Issue #1: Setting Up Your MapReduce Learning Playground

Update: Instructions updated for Hadoop 0.20.2. This is the first post in a series of small Hadoop tutorials progressively introducing ...
Read More

Flexible Collaborative Filtering In JAVA With Mahout Taste

I recently had to quickly build a prototype of a recommendation engine for a promising start-up company. I wanted to first ...
Read More

Writing A Token N-Grams Analyzer In Few Lines Of Code Using Lucene

If you need to parse the token n-grams of a string, you may use the facilities offered by Lucene analyzers ...
Read More

Drawing A Zipf Law Using Gnuplot, Java and Moby-Dick

There are many tools out there to build any kind of graph more or less quickly. Depending on your needs ...
Read More

Flexible Java Profiling And Monitoring Using The Netbeans Profiler

I have tested a lot of open source profilers. My preference definitely goes to the integrated NetBeans profiler. It ...
Read More

BeanShell Tutorial: Quick Start On Invoking Your Own Or External Java Code From The Shell

BeanShell is a lightweight scripting language that’s compatible with the Java language. It provides a dynamic environment for executing Java ...
Read More

5 Video Tutorials Of Small To Killer Eclipse Shortcuts

I believe that when you spend a significant percentage of your time on a specific piece of software, it is an obligation ...
Read More

Google Hot Trends Clustering: The 100 Hottest Queries Tell You About 67.76 Stories In Average

Did you notice that among the 100 (hourly updated) Google Hot Trends, there are always several hot queries that are ...
Read More

Open Calais From Java: Get Ready To Extract Entities, Facts And Events In 4 Minutes!

I'm a big fan of Open Calais, the well-known web service that allows you to perform Named Entity, Facts ...
Read More

The Trick To Write A Fast (Universal) Java URL Expander

140 characters. Does that mean something to you? This is about how Twitter (and micro-blogging) was born. Even if some profane Firefox ...
Read More

Can You Guess What Is The Hottest Trend Of Google Hot Trends ?

Whether you work in SEO, are a "trends hacker", or, like me, you love ...
Read More