Apache Mahout
Apache Mahout is a machine learning and data mining library built on top of Hadoop. It may help if you want to build a recommendation engine or a classifier. An introductory tutorial with examples is here:
http://www.ibm.com/developerworks/java/library/j-mahout/
A tutorial on collaborative filtering with Taste, Mahout's recommender framework, is also available.
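To give a rough feel for what user-based collaborative filtering does, here is a minimal sketch in plain Java. It does not use Mahout or Taste at all. The class name UserBasedCfSketch, its methods, and the sample ratings are all made up for illustration, and cosine similarity over co-rated items is just one common choice of similarity measure. A real recommender built with Mahout would use the library's own data model and similarity classes instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, illustration-only class; not part of Mahout or Taste.
public class UserBasedCfSketch {

    // Cosine similarity computed only over the items both users have rated.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Predicted rating = similarity-weighted average of the ratings that
    // other users gave to the item. Returns NaN when nothing can be predicted.
    static double predict(Map<Integer, Map<Integer, Double>> ratings, int user, int item) {
        Map<Integer, Double> mine = ratings.get(user);
        if (mine == null) {
            return Double.NaN;
        }
        double weighted = 0.0, totalSim = 0.0;
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) {
                continue; // skip the target user
            }
            Double r = e.getValue().get(item);
            if (r == null) {
                continue; // this neighbour has not rated the item
            }
            double sim = cosine(mine, e.getValue());
            if (sim <= 0.0) {
                continue; // ignore dissimilar neighbours
            }
            weighted += sim * r;
            totalSim += sim;
        }
        return totalSim == 0.0 ? Double.NaN : weighted / totalSim;
    }

    // Made-up ratings: userId -> (itemId -> rating).
    static Map<Integer, Map<Integer, Double>> sampleRatings() {
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        Map<Integer, Double> u1 = new HashMap<>();
        u1.put(101, 5.0);
        u1.put(102, 3.0);
        Map<Integer, Double> u2 = new HashMap<>();
        u2.put(101, 5.0);
        u2.put(102, 3.0);
        u2.put(103, 4.0);
        Map<Integer, Double> u3 = new HashMap<>();
        u3.put(101, 1.0);
        u3.put(102, 5.0);
        u3.put(103, 2.0);
        ratings.put(1, u1);
        ratings.put(2, u2);
        ratings.put(3, u3);
        return ratings;
    }

    public static void main(String[] args) {
        // User 1 has not rated item 103; estimate it from users 2 and 3.
        double estimate = predict(sampleRatings(), 1, 103);
        System.out.println("Estimated rating of item 103 for user 1: " + estimate);
    }
}
```

In this toy data, user 2 agrees with user 1 on every co-rated item, so user 2's rating of item 103 pulls the estimate harder than user 3's does. Everything is held in memory here, which is only workable for small data sets.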
If you are looking to become a data scientist, these posts may be of interest to you:
http://www.quora.com/How-do-I-become-a-data-scientist
http://www.quora.com/Whats-the-best-way-to-come-up-to-speed-on-MapReduce-Hadoop-and-Hive
