Olivier Mallassi posts

Archi & Techno

Data Grid or NoSQL ? same, same but different…

For three years now, NoSQL, as a set of technologies for Big Data, has been spreading across the world and challenging the centralized world of the RDBMS. Yet the space of distributed storage is not new: banks and online gaming platforms have been using technologies called "data grids" for several years to address latency and throughput issues. And to be completely frank, "Big Data" is not far from being the "new SOA": a radical paradigm shift lost in the middle of commercial buzzwords, but that's another…

Read more
Archi & Techno

Scribe : a way to aggregate data and why not, to directly fill the HDFS?

HDFS is a distributed file system, and it quickly raises an issue: how do I fill this file system with all my data? There are several options, ranging from batch import to Straight Through Processing, bulk-load style. The first one is to keep collecting data on the local file system and import it at scheduled intervals. The second one is to use an ETL; Pentaho has announced Hadoop support for its Data Integration product. The first tests we conducted lead us to think this works much…
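
As an illustration of the batch-import option, here is a minimal sketch using the Hadoop FileSystem client API; the NameNode URI and the file paths are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkLoadToHdfs {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode URI; in practice it comes from core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Copy a locally collected log file into HDFS in one shot (batch import).
        fs.copyFromLocalFile(new Path("/var/log/collected/app.log"),
                             new Path("/data/incoming/app.log"));
        fs.close();
    }
}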

Read more
Archi & Techno

Scribe installation

Installing Scribe is a little bit tricky (I must admit I am not what you would call a C++ compilation expert, and thanks to David for his help...). So here is how I installed Scribe on my Ubuntu machine (Ubuntu 10.04 LTS, the Lucid Lynx, released in April 2010).

Read more
Archi & Techno

How to “crunch” your data stored in HDFS?

HDFS stores huge amounts of data, but storing it is worthless if you cannot analyse it and extract information. Option #1: Hadoop, the Map/Reduce engine. Hadoop overview: Hadoop is a Map/Reduce framework that works on HDFS or on HBase. The main idea is to decompose a job into several identical tasks that can be executed close to the data (on the DataNodes). These tasks run in parallel: the Map phase. Then all the intermediate results are merged into one result…
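
To make Option #1 concrete, here is a minimal word-count style job written against the Hadoop Map/Reduce API: the map tasks run close to the data and the reduce phase merges the intermediate counts. The input and output HDFS paths are hypothetical.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs close to the data, emits (word, 1) for each token.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: merges the intermediate counts into one result per word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/incoming"));    // hypothetical input dir
        FileOutputFormat.setOutputPath(job, new Path("/data/wordcount")); // hypothetical output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}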

Read more
Archi & Techno

Hadoop Distributed File System : Overview & Configuration

The Hadoop Distributed File System can be seen as a standard file system, but a distributed one. From the client's point of view, it looks like a standard file system (the one you have on your laptop), yet behind the scenes the file system actually runs on several machines. HDFS implements fail-over using data replication and has been designed to store and manipulate large data sets (in large files) with a write-once-read-many access model for files.
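
As a small illustration of that "standard file system" point of view, here is a sketch that opens and reads a file through the HDFS client API and prints its replication factor; the NameNode URI and the path are hypothetical.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000"); // hypothetical NameNode URI

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/incoming/app.log");      // hypothetical file

        // Fail-over relies on data replication; each file carries a replication factor.
        System.out.println("replication = " + fs.getFileStatus(path).getReplication());

        // From the client side this is just "open a file and read it";
        // the blocks may actually live on several DataNodes.
        FSDataInputStream in = fs.open(path);
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
        fs.close();
    }
}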

Read more
Archi & Techno

Event Sourcing & noSQL

I have watched Greg Young's talks about CQRS, and especially "Event Sourcing", a couple of times, and each time I tell myself this pattern is just "génial" ("brilliant", as we say in French), even if Martin Fowler wrote about it in 2005 and dealt in detail with the implementation concerns and issues (especially in the case of integration with external systems). Event Sourcing: stop thinking of your data as a stock, but rather as a list of events...
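
To give an idea of the pattern, here is a tiny, hypothetical sketch of an event-sourced account: the current balance is never stored, only the list of events, and the state is rebuilt by replaying them.

import java.util.ArrayList;
import java.util.List;

public class EventSourcedAccount {

    interface AccountEvent { long amount(); }

    static class Deposited implements AccountEvent {
        private final long amount;
        Deposited(long amount) { this.amount = amount; }
        public long amount() { return amount; }
    }

    static class Withdrawn implements AccountEvent {
        private final long amount;
        Withdrawn(long amount) { this.amount = amount; }
        public long amount() { return -amount; }
    }

    // The event log is the source of truth; it is append-only.
    private final List<AccountEvent> events = new ArrayList<AccountEvent>();

    void deposit(long amount)  { events.add(new Deposited(amount)); }
    void withdraw(long amount) { events.add(new Withdrawn(amount)); }

    // The current state is derived by replaying the events, never stored directly.
    long balance() {
        long balance = 0;
        for (AccountEvent event : events) {
            balance += event.amount();
        }
        return balance;
    }

    public static void main(String[] args) {
        EventSourcedAccount account = new EventSourcedAccount();
        account.deposit(100);
        account.withdraw(30);
        System.out.println(account.balance()); // prints 70
    }
}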

Read more
Archi & Techno

Let’s play with Cassandra… (Part 3/3)

In this part, we will see a lot of Java code (the API exists in several other languages) and look at the client side of Cassandra. Use Case #0: open and close a connection to any node of your cluster. Cassandra is now accessed using Thrift. The following code opens a connection to the specified node: TTransport tr = new TSocket("192.168.216.128", 9160); TProtocol proto = new TBinaryProtocol(tr); tr.open(); Cassandra.Client cassandraClient = new Cassandra.Client(proto); ... tr.close(); As I said previously, the default API does not provide…
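
For readability, here is the same connection boilerplate written out as a self-contained class, with the socket released in a finally block; the node address is the one from the snippet above and the package names assume the Cassandra 0.6-era Thrift bindings.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CassandraConnection {
    public static void main(String[] args) throws Exception {
        // Any node of the cluster can be contacted on the default Thrift port 9160.
        TTransport tr = new TSocket("192.168.216.128", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        tr.open();
        try {
            Cassandra.Client cassandraClient = new Cassandra.Client(proto);
            // ... issue reads and writes through cassandraClient here ...
        } finally {
            // Always release the socket, even if a call above fails.
            tr.close();
        }
    }
}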

Read more
Archi & Techno

This is the story of a project…

This is the story of a project, neither more complex nor simpler than others: an application that communicates with a database and two other systems. Something quite mainstream from a technical and architectural point of view, something standard on the management side: everything must be done for yesterday and there is a lot to do… In short, "it's gonna be hard", as developers often say, but nobody screams it out too loud. So we build the team: 40 people are staffed, everyone is specialized. The teams are…

Read more
Archi & Techno

Let’s play with Cassandra…(Part 2/3)

In this part, we will get into more detail and closer to the code with Cassandra. The idea is to build a kind of simplified current account system where a user has an account and the account has a balance… This system will therefore manipulate the following concepts: a client has different kinds of properties defining his identity; a client has one account; the account has a list of operations (withdrawals and transfers are all kinds of operations). Here is the way it…
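
As a rough illustration (plain Java maps, not the Cassandra API), here is how the concepts above could be laid out in a column-oriented way; all row keys, column names and values are hypothetical.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CurrentAccountModelSketch {
    public static void main(String[] args) {
        // "Client" column family: row key -> (column name -> value), holding identity properties.
        Map<String, Map<String, String>> clients = new LinkedHashMap<String, Map<String, String>>();
        Map<String, String> identity = new LinkedHashMap<String, String>();
        identity.put("firstName", "Olivier");
        identity.put("account", "account-42"); // reference to the client's single account
        clients.put("client-1", identity);

        // "AccountOperations" column family: row key -> (timestamp -> operation).
        // A sorted map mimics columns ordered by name, handy for listing operations in order.
        Map<String, TreeMap<Long, String>> operations = new LinkedHashMap<String, TreeMap<Long, String>>();
        TreeMap<Long, String> ops = new TreeMap<Long, String>();
        ops.put(1L, "withdrawal:-50");
        ops.put(2L, "transfer:+200");
        operations.put("account-42", ops);

        System.out.println(clients);
        System.out.println(operations);
    }
}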

Read more
Archi & Techno

Let’s play with Cassandra… (Part 1/3)

I have already talked about it, but NoSQL is about diversity and includes various tools and even various kinds of tools. Cassandra is one of these tools and is currently one of the most popular in the NoSQL ecosystem. Built by Facebook and currently in production at web giants like Digg and Twitter, Cassandra is a hybrid between Dynamo and BigTable. Hybrid, firstly, because Cassandra uses a column-oriented way of modeling data (inspired by BigTable) and makes it possible to run Hadoop Map/Reduce jobs…

Read more