Hadoop

Archi & Techno

Using Hadoop for Value At Risk calculation Part 1

After introducing Value at Risk in my first article, I implemented it with GridGain in my second article. I concluded in the latter that relatively good performance had been reached through some optimizations. One of them relied on the hypothesis that the intermediate results - the prices for each draw - could be discarded. However, that is not always the case. Keeping the generated parameters and the call price for each draw can be very useful to the business in order to analyze…
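
To make the idea concrete, here is a minimal sketch of a Hadoop mapper that keeps the drawn parameters and the computed price for every draw instead of discarding them. The class name, the parameter ranges and the price() placeholder are assumptions for illustration, not the article's actual pricer.

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: each input line carries a draw id, used to seed the
// random generator so the draw is reproducible. The output keeps the drawn
// parameters alongside the computed price instead of discarding them.
public class VarDrawMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        long drawId = Long.parseLong(value.toString().trim());
        Random random = new Random(drawId);

        // Draw the market parameters for this scenario (illustrative ranges).
        double spot = 100.0 * (1.0 + 0.2 * random.nextGaussian());
        double volatility = 0.2 + 0.05 * random.nextGaussian();

        // price() stands in for the real pricer (e.g. a Black-Scholes formula).
        double price = price(spot, volatility);

        // Emit draw id, parameters and price so analysts can inspect every draw.
        context.write(NullWritable.get(),
                new Text(drawId + ";" + spot + ";" + volatility + ";" + price));
    }

    private double price(double spot, double volatility) {
        // Placeholder computation, only here to make the sketch self-contained.
        return Math.max(spot - 100.0, 0.0) * volatility;
    }
}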

Archi & Techno

How to “crunch” your data stored in HDFS?

HDFS stores huge amounts of data, but storing them is worthless if you cannot analyse them and extract information. Option #1: Hadoop, the Map/Reduce engine. Hadoop is a Map/Reduce framework that works on HDFS or on HBase. The main idea is to decompose a job into several identical tasks that can be executed close to the data (on the DataNodes). These tasks run in parallel: this is the Map phase. Then all the intermediate results are merged into one result…
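
As an illustration of this decomposition, here is the canonical word-count job expressed with the Hadoop MapReduce API. It is not specific to the article above, but it shows a Map phase producing intermediate results and a Reduce phase merging them into one result per key.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: runs in parallel, ideally on the DataNode holding the data block.
public class TokenCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);  // emit an intermediate result
        }
    }
}

// Reduce phase: merges all intermediate results for one key into one result.
class TokenCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}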

Archi & Techno

Hadoop Distributed File System: Overview & Configuration

The Hadoop Distributed File System can be thought of as a standard file system, except that it is distributed. From the client's point of view, it looks like an ordinary file system (the one you have on your laptop), but behind the scenes it actually runs on several machines. HDFS thus implements fail-over through data replication and has been designed to store and manipulate large data sets (in large files) with a write-once-read-many access model for files.
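
As a sketch of that client-side view, the snippet below writes a file to HDFS and reads it back through Hadoop's FileSystem API. The NameNode URI and the path are assumptions; in practice the address comes from core-site.xml, and the block replication happens transparently behind the create() call.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        // The NameNode URI below is an assumption for the sketch; it would
        // normally be picked up from the cluster's core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt");

        // Write once: behind this single call, the file's blocks are
        // replicated across several DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read many: the client reads it back as if it were a local file.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}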
