Unit of work, Transactions and Grails

Working with Groovy and Grails often gives you the feeling that things are magic, and when you dive in, you realize that they are more complex than expected. At the same time, you often find that the Groovy/Grails framework has chosen a reasonable default behavior. What about the magic of transactions in Grails? I found it hard to believe, so let's try to understand a little better how things work.

Presentation of KNAPP’s KiSoft VISION

OCTO has been designing state-of-the-art IT architectures for more than 12 years. Recently, we realized that user interfaces needed to be improved in order to bring more value to users. That's why we now also work on the usability of IT systems. Add to that an interest in innovation processes and the will to partner with our clients “from concept to cash”, and you have a pretty good picture of OCTO's DNA.

These traits explain why we were particularly interested in this presentation of KNAPP's KiSoft VISION, a system for order picking assisted by augmented reality.

We decided to invite KNAPP to OCTO to present their product, and the process that led to it, during one of our “Supply Chain Management School” sessions. We also invited some of our clients to join us and discover what we see as part of a larger trend: the emergence of new types of user interfaces.


Let’s play with Cassandra… (Part 3/3)

In this part, we will look at a lot of Java code (client APIs exist in several other languages) and focus on the client side of Cassandra.

Use Case #0: Open and close a connection to any node of your cluster

Cassandra is now accessed using Thrift. The following code opens a connection to the specified node.

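import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

TTransport tr = new TSocket("192.168.216.128", 9160); // node address and Thrift port
TProtocol proto = new TBinaryProtocol(tr);
tr.open(); // throws TTransportException if the node is unreachable
Cassandra.Client cassandraClient = new Cassandra.Client(proto);
// ... use cassandraClient here
tr.close();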

As I mentioned previously, the default API does not provide any connection-pooling mechanism that would (1) close and reopen connections when a node has failed, (2) load-balance requests across all the nodes of the cluster, and (3) automatically retry a request on another node when the first attempt fails.
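Since none of this comes out of the box, here is a minimal sketch of points (2) and (3) only. `RoundRobinFailover` and its `NodeCall` interface are hypothetical names, not part of Cassandra or Thrift; in real code, `NodeCall` would open a Thrift connection to the given host and issue the request. Point (1), reopening broken connections, is left out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Cassandra or Thrift): spreads calls over
 * the nodes of a cluster in round-robin order and, when a call fails,
 * retries it on the next node until every node has been tried once.
 */
class RoundRobinFailover {

    /** One request against one node; real code would wrap a Cassandra.Client call. */
    @FunctionalInterface
    public interface NodeCall<T> {
        T call(String host) throws Exception;
    }

    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinFailover(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = List.copyOf(hosts);
    }

    <T> T execute(NodeCall<T> request) throws Exception {
        // (2) load balancing: each request starts on the next node in turn
        int start = Math.floorMod(next.getAndIncrement(), hosts.size());
        Exception lastFailure = null;
        for (int attempt = 0; attempt < hosts.size(); attempt++) {
            String host = hosts.get((start + attempt) % hosts.size());
            try {
                return request.call(host);
            } catch (Exception e) {
                // (3) failover: remember the error and try the next node
                lastFailure = e;
            }
        }
        // every node failed: surface the last error
        throw lastFailure;
    }
}
```

For example, with hosts `a`, `b`, `c` and `a` down, the first request is served by `b`, and the following requests start on `b`, then `c`, so the load keeps rotating.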

Use Case #1: Insert a customer

The following code inserts a customer into the storage space (note that aCustomer is the object you want to persist):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;

// UTF8 and timestamp are assumed to be defined elsewhere (not shown in the article)
// ColumnFamily name -> columns to write for this row
Map<String, List<ColumnOrSuperColumn>> insertClientDataMap = new HashMap<String, List<ColumnOrSuperColumn>>();
List<ColumnOrSuperColumn> clientRowData = new ArrayList<ColumnOrSuperColumn>();

// column "fullName"
ColumnOrSuperColumn columnOrSuperColumn = new ColumnOrSuperColumn();
columnOrSuperColumn.setColumn(new Column("fullName".getBytes(UTF8),
        aCustomer.getName().getBytes(UTF8), timestamp));
clientRowData.add(columnOrSuperColumn);

// column "age"
columnOrSuperColumn = new ColumnOrSuperColumn();
columnOrSuperColumn.setColumn(new Column("age".getBytes(UTF8),
        aCustomer.getAge().toString().getBytes(UTF8), timestamp));
clientRowData.add(columnOrSuperColumn);

// column "accountIds"
columnOrSuperColumn = new ColumnOrSuperColumn();
columnOrSuperColumn.setColumn(new Column("accountIds".getBytes(UTF8),
        aCustomer.getAccountIds().getBytes(UTF8), timestamp));
clientRowData.add(columnOrSuperColumn);

As you can see, the first line declares the Java structure expected by batch_insert: a map whose key is a ColumnFamily name and whose value is the list of columns to write for the row. The rest of the code only creates and appends ColumnOrSuperColumn objects. Here, the columns are named fullName, age and accountIds. You will also notice that when you create a column, you specify its timestamp. Remember that this timestamp is used to reconcile conflicting versions of a column (during read repair, for instance): the most recent timestamp wins, so all your clients must have synchronized clocks (using NTP, for instance).
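To see why clock synchronization matters, here is a hypothetical illustration of timestamp-based ("last write wins") reconciliation. `VersionedColumn` is an illustrative class, not Cassandra's code; ties are resolved in favor of the first argument here only for simplicity.

```java
/**
 * Hypothetical illustration (not Cassandra's code): when two replicas hold
 * different values for the same column, the version with the highest
 * timestamp wins, whatever the real-world order of the writes was.
 */
final class VersionedColumn {
    final String name;
    final String value;
    final long timestamp;

    VersionedColumn(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Returns the surviving version: the one with the larger timestamp. */
    static VersionedColumn reconcile(VersionedColumn a, VersionedColumn b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}
```

If a client whose clock runs a few seconds fast writes age=41, and a correctly synchronized client writes age=42 slightly later in real time, `reconcile` keeps 41: the later write is silently lost.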

insertClientDataMap.put("customers", clientRowData);

The line above puts the list of columns into the map under the ColumnFamily named customers (so you can write to several ColumnFamilies at once with the batch_insert method). The following line then inserts the customer into Cassandra. You need to specify the keyspace, the row key (here the customer name), the map of ColumnFamilies to insert, and the consistency level you have chosen for this data.

cassandraClient.batch_insert("myBank", aCustomer.getName(), insertClientDataMap, ConsistencyLevel.DCQUORUM);
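The consistency level controls how many replicas must acknowledge a call before it succeeds. As a reminder of the arithmetic behind QUORUM (DCQUORUM applies the same majority within the local datacenter), here is a small hypothetical helper; `Quorum` and its methods are illustrative names, not part of the Thrift API.

```java
/**
 * Hypothetical helper: with a replication factor N, a quorum is
 * floor(N / 2) + 1 replicas. If reads touch R replicas and writes touch W
 * replicas with R + W > N, every read overlaps at least one replica that
 * holds the latest acknowledged write.
 */
final class Quorum {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("RF=" + n + " -> quorum=" + q);                                 // RF=3 -> quorum=2
        System.out.println("QUORUM/QUORUM overlaps: " + readsSeeLatestWrite(n, q, q));  // true
        System.out.println("ONE/ONE overlaps: " + readsSeeLatestWrite(n, 1, 1));        // false
    }
}
```

The last line is the trade-off behind ConsistencyLevel.ONE: it answers faster, but a read at ONE may miss the latest write until the replicas converge.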

Use Case #2: Insert operations for an account

Inserting an operation uses almost the same code, except that we use a SuperColumn.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.SuperColumn;

Map<String, List<ColumnOrSuperColumn>> insertOperationDataMap = new HashMap<String, List<ColumnOrSuperColumn>>();
List<ColumnOrSuperColumn> operationRowData = new ArrayList<ColumnOrSuperColumn>();
List<Column> columns = new ArrayList<Column>();

// THESE ARE THE SUPERCOLUMN COLUMNS
columns.add(new Column("amount".getBytes(UTF8),
        aBankOperation.getAmount().getBytes(UTF8), timestamp));
columns.add(new Column("label".getBytes(UTF8),
        aBankOperation.getLabel().getBytes(UTF8), timestamp));
if (aBankOperation.getType() != null) {
    // the type column is optional
    columns.add(new Column("type".getBytes(UTF8),
            aBankOperation.getType().getBytes(UTF8), timestamp));
}

For now, there is nothing new. A list of Columns is created with up to three columns: amount, label and type (withdrawal, transfer...).

// here is a superColumn, named with a time-based UUID
SuperColumn superColumn = new SuperColumn(
        CassandraUUIDHelper.asByteArray(CassandraUUIDHelper.getTimeUUID()),
        columns);
ColumnOrSuperColumn columnOrSuperColumn = new ColumnOrSuperColumn();
columnOrSuperColumn.setSuper_column(superColumn);
operationRowData.add(columnOrSuperColumn);

This case differs from the previous one. Instead of adding the columns directly to the row, we create a SuperColumn whose name is dynamic (a time-based UUID)… quite dynamic, isn't it? The columns are placed inside the super column itself, which is then added to the row.
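The super column name is built by CassandraUUIDHelper, a helper the article does not show. As a hedged sketch, the serialization half could look like the hypothetical `UuidBytes` class below (16 bytes, big-endian, most significant bits first). The generation half, `getTimeUUID()`, is assumed to come from a library: `java.util.UUID` can read the timestamp of a version 1 UUID but cannot create one. If the ColumnFamily uses the TimeUUIDType comparator (an assumption, since the configuration is not shown), the operations come back in chronological order.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/**
 * Hypothetical stand-in for CassandraUUIDHelper.asByteArray: serializes a
 * UUID into the 16-byte big-endian form used as a column name, and back.
 */
final class UuidBytes {
    static byte[] asByteArray(UUID uuid) {
        // ByteBuffer is big-endian by default
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    static UUID fromByteArray(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        // arguments are evaluated left to right: most significant bits first
        return new UUID(buffer.getLong(), buffer.getLong());
    }
}
```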
The end of the code is similar to the previous use case: the row data is put under the ColumnFamily named operations and inserted with the current customer's account id as the row key.

// put the row data into the "operations" ColumnFamily
insertOperationDataMap.put("operations", operationRowData);
cassandraClient.batch_insert("myBank", aCustomer.getAccountIds(),
        insertOperationDataMap, ConsistencyLevel.ONE);

Here is what you get when reading the operations for the account id:

Use Case #3: Removing an item

In terms of API, removing a complete row is as simple as the rest:

// a ColumnPath naming only the ColumnFamily targets the whole row in "operations"
cassandraClient.remove("myBank", myAccountId, new ColumnPath("operations"), timestamp, ConsistencyLevel.ONE);

Under the hood, however, it is a little more complex. In a distributed system where node failures will occur, you cannot simply delete the record physically: a replica that missed the delete could later bring the old data back. Instead, the record is replaced by a tombstone, and the data marked as deleted is effectively purged once the tombstone is old enough. You can also still use “logical deletion” and write code that ignores these flagged records.
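To make the tombstone idea concrete, here is a hypothetical single-node sketch. `TombstoneStore`, `Cell` and the grace-period parameter are illustrative names, not Cassandra classes (in Cassandra the grace period is a configurable setting). It shows the two properties described above: a stale write arriving after the delete cannot resurrect the data, and the marker is only purged after the grace period.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of tombstones (not Cassandra's implementation):
 * a delete writes a marker instead of removing the entry, and markers are
 * purged only once they are older than a grace period.
 */
final class TombstoneStore {
    static final class Cell {
        final String value; // null marks a tombstone
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();
    private final long gracePeriodMillis;

    TombstoneStore(long gracePeriodMillis) {
        this.gracePeriodMillis = gracePeriodMillis;
    }

    void put(String key, String value, long timestamp) {
        write(key, new Cell(value, timestamp));
    }

    void delete(String key, long timestamp) {
        write(key, new Cell(null, timestamp)); // mark as deleted, keep the entry
    }

    /** Readers never see deleted data. */
    String get(String key) {
        Cell cell = cells.get(key);
        return cell == null || cell.isTombstone() ? null : cell.value;
    }

    /** Physically removes tombstones older than the grace period; returns how many. */
    int purge(long now) {
        int before = cells.size();
        cells.values().removeIf(c -> c.isTombstone() && now - c.timestamp > gracePeriodMillis);
        return before - cells.size();
    }

    boolean hasEntry(String key) {
        return cells.containsKey(key);
    }

    private void write(String key, Cell cell) {
        // last write wins, as for any other column update
        cells.merge(key, cell,
                (current, incoming) -> incoming.timestamp >= current.timestamp ? incoming : current);
    }
}
```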

To (quickly) conclude this series of articles

I really like Cassandra, which looks like a ready-to-use tool (even if NoSQL is full of great tools) and a way to build high-performance systems at a “low” cost (or at least a lower cost than with commercial tools). There are still topics I hope to discuss: security (Cassandra provides authentication mechanisms…), searching (or at least getting ranges of data), monitoring (how to monitor all the nodes of your cluster in a single tool like Ganglia, Nagios or Graphite), or even how to use Hadoop on top of Cassandra.

To be continued…

This is the story of a project…

This is the story of a project, neither more complex nor simpler than others: an application that communicates with a database and two other systems. Something quite mainstream on the technical and architectural side, and something standard on the management side: everything must be done for yesterday and there is a lot to do… In short, “it's gonna be hard”, as developers often say, though nobody says it too loudly.
So we build the team: 40 people are staffed, each of them specialized. The team is organized in pools, and a kind of contract is set up between the different pools. Each pool is responsible for handling certain kinds of requests. A flow of requests appears. Some pools come under pressure and become bottlenecks: a backlog of requests builds up upstream while the downstream pools wait… For these overloaded pools, important things become urgent things, and choices must be made among the urgent things to handle the most immediate ones. Task switching becomes the way of working and, in the end, the flow slows down.

Then the go-live deadline comes into view: it is two months away. The user acceptance tests are just starting, having been delayed by the tedious and painful integration of the different components. Maybe the contracts built between the teams complicated the integration: some mandatory parameters are missing, dates do not follow the proper format, error codes are only partially interpreted…
In any case, the user acceptance tests detect more bugs than the development team can fix, and not everything has been tested yet.

Let’s play with Cassandra…(Part 2/3)

In this part, we will work in more detail, and closer to the code, with Cassandra. The idea is to build a kind of simplified current account system where a user has an account and the account has a balance…
This system will therefore manipulate the following concepts:
- A client has several properties defining their identity
- A client has one account
- The account has a list of operations (withdrawals and transfers are both kinds of operations)
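Before looking at any model, the three concepts above can be sketched as plain Java classes. These are hypothetical classes: the names are illustrative, signed amounts (negative for withdrawals) are an assumption, and deriving the balance from the operations is one possible design choice rather than the article's.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation: a signed amount, a label and an optional type. */
final class Operation {
    final BigDecimal amount; // assumption: negative for withdrawals
    final String label;
    final String type;       // e.g. "withdrawal", "transfer"; may be null

    Operation(BigDecimal amount, String label, String type) {
        this.amount = amount;
        this.label = label;
        this.type = type;
    }
}

/** Hypothetical account holding its list of operations. */
final class Account {
    final String id;
    final List<Operation> operations = new ArrayList<>();

    Account(String id) {
        this.id = id;
    }

    /** Here the balance is derived by summing the operations. */
    BigDecimal balance() {
        BigDecimal total = BigDecimal.ZERO;
        for (Operation op : operations) {
            total = total.add(op.amount);
        }
        return total;
    }
}

/** Hypothetical client with identity properties and exactly one account. */
final class Customer {
    final String fullName;
    final int age;
    final Account account;

    Customer(String fullName, int age, Account account) {
        this.fullName = fullName;
        this.age = age;
        this.account = account;
    }
}
```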
Here is how it would have been modeled in the relational world (or at least in the UML world):


Let’s play with Cassandra… (Part 1/3)

I have already said it, but NoSQL is about diversity: it includes many different tools, and even different kinds of tools. Cassandra is one of these tools and is currently one of the most popular in the NoSQL ecosystem. Built by Facebook and currently in production at web giants like Digg and Twitter, Cassandra is a hybrid between Dynamo and BigTable.

Hybrid, first, because Cassandra uses a column-oriented data model (inspired by BigTable) and can run Hadoop Map/Reduce jobs; and second, because it uses patterns inspired by Dynamo such as eventual consistency, gossip protocols, and a master-master way of serving both read and write requests.
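Gossip is how nodes learn about each other's state without any central coordinator. The toy simulation below is a hypothetical sketch of the idea, not Cassandra's gossiper (which exchanges richer state): each node keeps the highest heartbeat version it has heard for every node, and in each round every node merges its view with a randomly chosen peer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical gossip simulation: each node keeps node name -> highest
 * heartbeat version heard. Each round, every node picks a random peer and
 * both keep the maximum version per node.
 */
final class GossipSimulation {
    static final class Node {
        final String name;
        final Map<String, Long> heartbeats = new HashMap<>();

        Node(String name, long heartbeat) {
            this.name = name;
            heartbeats.put(name, heartbeat);
        }

        /** One exchange: afterwards both sides hold the newest version of every entry. */
        void gossipWith(Node peer) {
            for (Map.Entry<String, Long> e : peer.heartbeats.entrySet()) {
                heartbeats.merge(e.getKey(), e.getValue(), Long::max);
            }
            // our view now dominates the peer's, so copying it over is safe
            peer.heartbeats.putAll(heartbeats);
        }
    }

    /** Runs rounds until every node knows every node; returns the round count, or -1. */
    static int runUntilConverged(List<Node> nodes, Random random, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            for (Node node : nodes) {
                Node peer = nodes.get(random.nextInt(nodes.size()));
                if (peer != node) {
                    node.gossipWith(peer);
                }
            }
            if (nodes.stream().allMatch(n -> n.heartbeats.size() == nodes.size())) {
                return round;
            }
        }
        return -1; // did not converge within maxRounds
    }
}
```

Information spreads epidemically: with a handful of nodes, a few rounds are typically enough for every node to know about every other one.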

Another part of Cassandra's DNA (shared, in fact, by a lot of NoSQL solutions) is that it has been built to be fully decentralized, designed for failure, and datacenter-aware (you can configure Cassandra to replicate data across several datacenters…). For instance, Cassandra is currently used across Facebook's US West Coast and East Coast datacenters and, around two years ago, stored 50+ TB of data on a 150-node cluster.