Does Alfresco fit your needs?
This article is the English translation (human made, not automatic) of what I published on the French version of this blog a few months ago. It covers Alfresco 3.1.
These days, we hear a lot about collaboration, the 2.0 company, wikis, ... and also about Alfresco.
Alfresco is an Enterprise Content Management (ECM) system. It is free software, it has a large community, and its architecture is close to Documentum's but built on more recent technologies. At first glance, it is a very attractive and flexible solution. But as you know, perfect software doesn't exist, which is why I propose a review of Alfresco against some common expectations you might have.
The objective is to give you a first look at what Alfresco really is.
I wrote this article not to sell you Alfresco or to put you off using it, but to give you a pragmatic overview of the product. Feel free to contact me and/or leave a comment.
My data need to be stored in a well structured manner
If you also need some smart processing of your data, Alfresco could meet your needs: you can configure a sequence of actions triggered when data is added, deleted and/or modified in the repository. Possible actions include, but are not limited to, copying a file, sending an email to the owner of the file, or converting it to another format. Of course, you can also extend the mechanism with your own actions.
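To make the idea concrete, here is a minimal sketch of such a trigger mechanism in plain Python. It is not Alfresco's API: `RuleEngine`, the event names and the two actions are hypothetical names chosen for illustration, and the actions only record what they would do.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# An action receives the node (here a plain dict) that triggered the rule.
Action = Callable[[dict], None]


class RuleEngine:
    """Hypothetical stand-in for a repository rule mechanism."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, *actions: Action) -> None:
        # Register a sequence of actions for an event such as "added".
        self._rules[event].extend(actions)

    def fire(self, event: str, node: dict) -> None:
        # Run the configured actions in the order they were registered.
        for action in self._rules[event]:
            action(node)


log: List[str] = []


def copy_to_archive(node: dict) -> None:
    log.append(f"copy {node['name']} to /archive")


def notify_owner(node: dict) -> None:
    log.append(f"mail {node['owner']} about {node['name']}")


engine = RuleEngine()
engine.on("added", copy_to_archive, notify_owner)
engine.fire("added", {"name": "report.odt", "owner": "alice"})
```

Adding your own action then amounts to writing one more function and registering it for the events you care about.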
A knowledge base must be built upon my data
Sounds good, Alfresco is, after all, an Enterprise Content Management system!
All the data stored in Alfresco is uniquely identified in the physical storage. Deleting something in a repository doesn't physically destroy its content; the system moves it into quarantine. You can also configure the number of days that such data is kept, which helps prevent people from accidentally losing important information.
My data are handled by complex workflows
Alfresco uses jBPM, a mature workflow engine. This lets it support workflows ranging from a simple review-and-validate to complex ones designed with the jBPM Process Designer.
Last but not least, the trigger mechanism on the repository allows you to transform, move, copy, send by email, ... any data stored in Alfresco.
My data have a lot of metadata
Data stored in Alfresco is always governed by a data model. Everything stored (users, permissions, documents, directories, ...) is bound to a data model.
I wrote 'a' and not 'the' because data models can be extended to fit your needs. It is therefore possible, and simple, to add custom metadata to, say, pictures or office documents.
Also, out of the box, Alfresco provides several data models, such as:
- data types for directories, pictures, documents;
- data types for tags;
- a data model for permissions;
- a data model for workflows.
Adding metadata is at the root of Alfresco's content management architecture.
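The extension principle can be sketched in a few lines of Python. This is only an illustration of "a type inherits the metadata of its parent and adds its own"; `TypeDef`, `document` and `invoice` are hypothetical names, not Alfresco's content model syntax.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TypeDef:
    """Hypothetical data type: a name, its own properties, an optional parent."""

    name: str
    properties: Dict[str, type]
    parent: Optional["TypeDef"] = None

    def all_properties(self) -> Dict[str, type]:
        # A type exposes its parent's properties plus its own.
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}

    def validate(self, metadata: Dict[str, object]) -> bool:
        # Metadata is valid when every key is declared with a matching type.
        allowed = self.all_properties()
        return all(
            key in allowed and isinstance(value, allowed[key])
            for key, value in metadata.items()
        )


# A base type, and a custom type that extends it with extra metadata.
document = TypeDef("document", {"name": str, "author": str})
invoice = TypeDef("invoice", {"amount": float, "customer": str}, parent=document)
```

With these definitions, `invoice.validate({"name": "inv-1.pdf", "amount": 120.0})` is accepted, while the same `amount` on a plain `document` is rejected because the base type does not declare it.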
My access rights are handled in a fine grained way
If the default access rights don't suit you, you can create your own permission model or extend the existing one. You may also want to know that permissions are handled per file and per directory, and that quotas can be applied.
Also, Alfresco uses roles. But what is a role?
It is basically a set of permissions. You bind a role to a user or to a group of users, per file or per directory.
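As a rough sketch of that idea (roles as permission sets, bound to a user or a group on a given path), assuming hypothetical role names, groups and paths rather than Alfresco's own:

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical roles: each one is just a named set of permissions.
ROLES: Dict[str, FrozenSet[str]] = {
    "Reader": frozenset({"read"}),
    "Writer": frozenset({"read", "write"}),
    "Manager": frozenset({"read", "write", "delete", "change_permissions"}),
}

# Hypothetical group membership.
GROUPS: Dict[str, Set[str]] = {"alice": {"accounting"}, "bob": set()}

# A binding attaches a role to an authority (user or group) on one path.
BINDINGS: Dict[Tuple[str, str], str] = {
    ("/invoices", "accounting"): "Writer",
    ("/invoices/2009.pdf", "bob"): "Reader",
}


def permissions(user: str, path: str) -> Set[str]:
    """Union of the permissions granted to the user or their groups on path."""
    authorities = {user} | GROUPS.get(user, set())
    granted: Set[str] = set()
    for (bound_path, authority), role in BINDINGS.items():
        if bound_path == path and authority in authorities:
            granted |= ROLES[role]
    return granted
```

Here `alice` can read and write `/invoices` through her group, while `bob` can only read the single file he was given a role on. Inheritance from parent directories is deliberately left out of the sketch.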
By default, the following roles are provided; feel free to extend them:
We could talk more about it in another article.
My data must be accessed remotely
There are several possibilities.
The web-client UI
This is the product's first web UI, and it doesn't evolve much these days. Its characteristics are:
- It gives access to almost everything in the system, which means this UI is not suitable for every user;
- The administration panel has many buttons: you can handle groups and users, import/export your repository, browse your system with a low-level tool (the node browser), ...;
- It can be customized through XML configuration files. To extend it, you will most likely have some Java code to write;
- It runs in the same instance as the Alfresco core server. In other words, this UI runs on the backend and cannot be deployed remotely.
Personally, I recommend using this UI for administration purposes only. Indeed, there are far more interesting things to do if you want a great experience for your users.
The Share UI
Share is the web 2.0 UI of Alfresco. It is a user-centric collaborative website, although an administration panel has appeared in the recent 3.2 version of Alfresco. It aims to provide people with collaborative sites where they can share their work through wiki pages, forums, a document library, links, their calendar, ...
It is based on the Surf platform, a framework built by Alfresco to easily create web 2.0 websites, which communicates with the Alfresco core through REST.
Surf has recently been integrated into Spring.
REST API and SOAP
Alfresco provides a REST API, also called Webscripts, and a SOAP API.
These two APIs allow developers to create custom remote frontends, which is a great way to let the core server focus on its main task: content management.
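As an illustration of what a remote frontend does, the sketch below prepares an authenticated HTTP request to a Webscript using only the Python standard library. The host, the `sample/documents` script path, the `folder` parameter and the credentials are all assumptions made up for the example; the request is built but deliberately not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_webscript_request(base_url, script_path, params, user, password):
    """Prepare (without sending) a GET request to a Webscript URL."""
    url = f"{base_url}/service/{script_path.lstrip('/')}"
    query = urllib.parse.urlencode(params)
    if query:
        url += "?" + query
    request = urllib.request.Request(url, method="GET")
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical server, script and credentials.
request = build_webscript_request(
    "http://localhost:8080/alfresco",
    "sample/documents",
    {"folder": "/invoices"},
    "alice",
    "secret",
)
# urllib.request.urlopen(request) would actually send it to the server.
```

The frontend can live on another machine entirely; all it needs is network access to the core server and a script that returns the data it wants.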
The CMIS API
A new API available in Alfresco, CMIS has been implemented in Alfresco 3 to facilitate interoperability among ECM systems. This means that a frontend which uses the CMIS API should be able to switch from Alfresco to another backend implementing CMIS without any customization.
JCR and JCR over RMI
If you need it (Liferay integration for example), the Alfresco repository implements JCR (JSR-170). You can use it locally or remotely over RMI.
However, this method is not recommended since the emphasis is on REST and CMIS.
We are currently working on a collaborative platform project based on Alfresco, which our client uses as the foundation for its content management.
The expectations of our client can be classified in three main categories:
- administration of the platform;
- access to data with custom metadata through a user-friendly UI;
- a portal-like application to encourage collaborative work.
These three kinds of expectations are covered by Alfresco. Administration is done through the web-client UI; the user-friendly UI is a Flex-based website communicating with Alfresco through REST; and the portal-like expectations are covered by Share, the collaborative website solution provided by Alfresco, to which we can add new components to better fit the client's needs.
In the end, we have a consistent, performant and scalable solution, and we have saved time compared to developing from scratch.
Alfresco must be integrated with Liferay Portal
We had this kind of expectation: Alfresco as the backend and Liferay as the portal frontend.
Even though Alfresco claims that the integration with Liferay is very good, our experience showed that it is interoperability rather than real integration. The portlet standard allows aggregation of content retrieved as HTML code. This is exactly what Alfresco provides with a few portlets written with its Webscript API, but they are more a proof of concept than a real integration.
Therefore, it is up to your development team to write their own portlets and their REST services on the Alfresco side in order to achieve the integration. If you want Liferay to use Alfresco as its repository backend, it is possible since both implement JSR-170, but your development team will also have to write some glue code.
I have an AD and I want Alfresco to use it
No problem! Alfresco uses the JVM's JNDI LDAP context factory. Therefore, it integrates perfectly with an LDAP directory, whether it is AD, OpenLDAP or something else.
Do some of your directories have non-standard fields? No problem, everything is easily customizable through configuration files.
By default, new synchronized accounts are created in the root folder of the repository, but they can easily be moved to a specific /users subdirectory. You can also customize the template of a user account directory so that every new user directory is created with specific directories and documents in it.
By default, removing a user from the Alfresco user database doesn't remove their home folder. This behaviour can easily be changed so that the home folder is removed along with the account.
LDAP synchronization, like any other background task performed by Alfresco, is scheduled by Quartz. Quartz lets you configure when a synchronization should happen, just like in a standard crontab.
In addition, Alfresco works with several directories at the same time and distinguishes users from groups. This makes it easy to adapt the configuration to your context.
Lastly, if your LDAP directories limit the number of results returned per request, Alfresco has supported LDAP paged requests since version 3.2.
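One difference from a standard crontab is worth knowing: a Quartz cron expression starts with a seconds field and accepts an optional year at the end. The small Python helper below only labels the fields of an expression so the layout is easy to read; the function name and the example schedule are illustrative, and it does not validate the field values.

```python
# Field order of a Quartz cron expression (the year is optional).
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_quartz_cron(expression: str) -> dict:
    """Label each field of a Quartz cron expression (no value validation)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 fields plus an optional year")
    return dict(zip(QUARTZ_FIELDS, parts))


# "0 0 3 * * ?" fires every day at 03:00:00.
nightly = describe_quartz_cron("0 0 3 * * ?")
```

The `?` in the day-of-week position means "no specific value", which Quartz requires in one of the two day fields.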
Quartz? But I don't want internal schedulers in my IT applications
No problem, but it takes a bit of development. Quartz is provided by default with Alfresco and is configured through Spring beans. If you want to remove it from Alfresco, you can, but you will have to modify some configuration beans and perhaps write Webscript code to allow some jobs to be executed remotely, from your central scheduler for example. I assure you it is not impossible.
As an example, we were about to do it for a client who wanted LDAP synchronization to be triggered remotely by its central scheduler. In Alfresco, synchronization is a two-step process:
- extract the content to synchronize from the LDAP directory and create an XML file that Alfresco can read for users, groups, ...;
- inject this content into Alfresco.
So we had two tasks, two Java classes and one XML file with a well-known structure.
Therefore, to meet the expectation we just had to:
- write a small script for the central scheduler to query the LDAP directory and generate an XML file for Alfresco;
- insert it into Alfresco by executing the import tool provided by Alfresco.
But this solution has a small issue: the Java program requires the Alfresco server to be shut down during the update... No problem, we just have to write a small REST service to execute the import, so there is no longer any need to shut down the server. This REST service is pretty simple, since the only thing it has to do is already provided by Alfresco's internal API.
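The first of these two steps could look roughly like the Python sketch below. It is a simplification under stated assumptions: the directory entries are hard-coded instead of being read from LDAP, and the `people`/`person` element names are a hypothetical structure, not the actual format expected by Alfresco's import tool.

```python
import xml.etree.ElementTree as ET

# Attributes copied from each directory entry into the XML file.
PERSON_FIELDS = ("userName", "firstName", "lastName", "email")


def people_to_xml(entries) -> str:
    """Serialize directory entries into a simple (hypothetical) XML layout."""
    root = ET.Element("people")
    for entry in entries:
        person = ET.SubElement(root, "person")
        for key in PERSON_FIELDS:
            ET.SubElement(person, key).text = entry.get(key, "")
    return ET.tostring(root, encoding="unicode")


# Stand-in for the result of an LDAP query run by the central scheduler.
ldap_entries = [
    {"userName": "alice", "firstName": "Alice", "lastName": "Martin",
     "email": "alice@example.com"},
    {"userName": "bob", "firstName": "Bob", "lastName": "Durand",
     "email": "bob@example.com"},
]

xml_payload = people_to_xml(ldap_entries)
```

The resulting string is what the scheduler would write to a file and hand over to the import step.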
Security wants secured channels everywhere
No problem, LDAPS and HTTPS are perfectly supported.
SSO is the standard inside my company and LDAP connection for the outside
Kerberos, NTLM, CAS, JAAS: the choice is yours. Chained authentication is also supported for cases where, for example, the extranet is login-form based and the intranet is SSO based.
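The principle of a chain is simply "try each authentication method in order and stop at the first one that recognizes the user". A minimal Python sketch, with two hypothetical authenticators and an in-memory user table standing in for the real SSO and login-form back ends:

```python
from typing import Callable, List, Optional

# An authenticator returns the user name on success, None otherwise.
Authenticator = Callable[[dict], Optional[str]]

# Hypothetical credentials store for the login form.
USERS = {"carol": "s3cret"}


def sso_authenticator(request: dict) -> Optional[str]:
    # Intranet case: a trusted SSO layer has already identified the user.
    return request.get("sso_user")


def form_authenticator(request: dict) -> Optional[str]:
    # Extranet case: check the login and password typed in a form.
    user, password = request.get("username"), request.get("password")
    if user in USERS and USERS[user] == password:
        return user
    return None


def authenticate(request: dict, chain: List[Authenticator]) -> Optional[str]:
    """Return the first user name produced by the chain, or None."""
    for authenticator in chain:
        user = authenticator(request)
        if user is not None:
            return user
    return None


CHAIN = [sso_authenticator, form_authenticator]
```

An intranet request carrying `sso_user` is accepted by the first link, while an extranet request falls through to the form check.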
Cluster must be supported for high loads
Alfresco supports this kind of setup. The configuration effort is minimal here.
However, you may want to know that if the cluster mode is activated, every Alfresco instance must use the same database instance and the same data repository.
Database? But I thought that everything was in a physical storage!
Indeed, that is true for every piece of data that has content.
The users, directories (but not their content), metadata and workflows are stored in a database, which is under very little load. A simple MySQL is enough; there is no need for aggressive optimization on this side.
Which databases are supported?
Oracle, MSSQL, MySQL, PostgreSQL, ... The choice is wide enough. Besides, the load on the database is really small: Alfresco uses neither a complex schema nor high-frequency requests, since it indexes most of its content and makes heavy use of Hibernate and EHCache.
And what about storage? Any prerequisite?
The data repository needs storage capacity in line with your expected amount of data.
The index adds about 30% of the data repository size. In addition, the index, implemented with Lucene, must be located on a very fast physical drive which supports file locking.
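As a quick back-of-the-envelope calculation based on that 30% figure (a rule of thumb from our experience, not a guarantee), in Python:

```python
# Rule of thumb from the text: the index is about 30% of the content size.
INDEX_RATIO_PERCENT = 30


def estimate_storage_gb(content_gb: int) -> dict:
    """Estimate index and total storage for a given amount of content."""
    index_gb = content_gb * INDEX_RATIO_PERCENT // 100
    return {
        "content_gb": content_gb,
        "index_gb": index_gb,
        "total_gb": content_gb + index_gb,
    }


estimate = estimate_storage_gb(500)
```

For 500 GB of content, this gives about 150 GB of index, so roughly 650 GB in total, with the index portion placed on the fast drive.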
Data must be segmented on several repositories.
Alfresco also works with this kind of configuration, even though it is a bit complex to set up. However, you must keep only one database, and your backup procedure must still back up everything at the same time.
If you would like to set up an internal repository with all your data and an external one with a read-only subset of data for the internet, you may want to know that Alfresco supports it. It is a kind of cluster configuration.
In this case, the internet instance of Alfresco is read-only and shares the same repository and database as the internal instance, which has full access to the repository; each instance has its own index.
We recently ran into this case at Octo, and our solution was to avoid a new instance of Alfresco. We chose to create a REST Webscript on Alfresco to let the "read-only" side retrieve what it needs through it. Data to be exported was marked with a specific tag.
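The selection logic behind such a service is simple. The Python sketch below shows the idea only: the node list, the `public` tag and the JSON layout are hypothetical, and the real service was a Webscript running inside Alfresco rather than standalone code like this.

```python
import json

# Hypothetical repository content: each node carries a list of tags.
NODES = [
    {"name": "catalogue.pdf", "path": "/docs/catalogue.pdf", "tags": ["public"]},
    {"name": "salaries.xls", "path": "/hr/salaries.xls", "tags": ["internal"]},
    {"name": "press.odt", "path": "/docs/press.odt", "tags": ["public", "2009"]},
]


def export_tagged(nodes, tag: str) -> str:
    """Return, as JSON, only the nodes marked with the export tag."""
    selected = [
        {"name": node["name"], "path": node["path"]}
        for node in nodes
        if tag in node["tags"]
    ]
    return json.dumps(selected, sort_keys=True)


exported = export_tagged(NODES, "public")
```

Anything without the tag never leaves the internal instance, which is what makes the approach a reasonable substitute for a second, read-only Alfresco.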
The solution must be simple to deploy and to update
Alfresco works with Tomcat and JBoss: it is a WAR or an EAR that you have to deploy, and that's all.
I need a fine grained events tracking mechanism
Alfresco provides an audit mechanism which keeps track of everything that happens to every piece of data in the repository. Configure it cautiously in order to prevent a big increase in the space occupied by your database and a big slowdown of your server.
The solution must be scalable
Alfresco is Spring based, and its architecture is a good fit if you plan to use it as a foundation on which other specific projects will be built.
I need a wiki
Alfresco works with MediaWiki. However, if you only need a wiki, it may make more sense to use MediaWiki alone and, if necessary, link it to Alfresco or another ECM the day you need one.
I need collaborative software with a web based UI
Alfresco provides Share. However, it doesn't cover every need, and customizing it can be tedious.
The best option is to use an Alfresco facade API (REST, SOAP, ...) to build a custom user-friendly UI.
My teams must be quickly operational for custom developments
The scalable architecture of Alfresco fits very well, and you can quickly get to grips with it. Furthermore, there is an active community which can help you through forums and blogs.
Be advised that your developers should already know Spring, Java 6 and Tomcat in order to be operational quickly.
I develop in Agile
It is possible, and that's what Octo does. However, full test-driven development (TDD) with Alfresco can sometimes be tedious, since not all of the code is covered by tests and executing a single unit test takes about 40 seconds because a mini instance of Alfresco must be initialized. Having someone experienced with Alfresco on the team is definitely a good idea to save time. We encountered many issues setting up REST unit tests with Alfresco, mostly because of the amount of configuration and files to copy in order to start a single unit test. However, in 80% of cases, unit tests worked well, and since the product is free software, it made our life easier and we saved time when bugs happened. And I think that's an important point.