Untar on iOS, the pragmatic way

December 16, 2011, by Mathieu Hausherr
Tags: Software Engineering

The problem

Why untar?

Network connections add latency to your mobile app, and every request pays that latency again. Replacing ten 1 MB downloads with a single 10 MB download is a good improvement for your app.

There’s a well-known unix tool for that: tar.

What is tar? Wikipedia says: “Tar is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.” That’s exactly what we need, plus some extra features.

So how do you untar an archive file on an iOS device?

How to untar?

A Google search for untarring on iOS turns up two solutions:

  • BSD libarchive. This library is already on your iOS device, but without its header files: Apple calls that a “private API”, and you can’t use it in an App Store app. If you are building an app for jailbroken devices, that isn’t a problem, but I need to publish my app on the App Store.
  • Davepeck’s packaging of BSD libarchive: https://github.com/davepeck/iOS-libarchive. The libarchive code is open source, so davepeck has packaged it for iOS and you can add it to your project. It is the same libarchive that backs the tar command on your Mac: you can create or extract tar files, with gzip or bzip2 compression. But this library includes 617 files and 347,606 lines of code (without any Objective-C wrapper, which you have to write yourself), and it weighs 4.7 MB when built. Remember that users need Wi-Fi to download apps larger than 20 MB.

Why not use libarchive?

With libarchive:

  • We add code dependencies to our project. These dependencies need to be updated.
  • More lines of code means more bugs. According to Steve McConnell’s "Code Complete", there are on average between 15 and 50 bugs per KLOC (thousand lines of code). With a calculator and a back-of-the-envelope estimate, even the low end of that range gives more than 5,000 bugs in libarchive (347.6 KLOC × 15 ≈ 5,200).
  • If Apple changes an API and breaks libarchive compatibility, your code will no longer be reliable.

Is there another way? What if I must publish my app on the App Store but don’t want to link 4.7 MB of hard-to-maintain code? Let’s write my own light, pragmatic untar implementation.

What do we really need?

Libarchive weighs 4.7 MB, but it does a lot of things we don’t really need.

We need to untar files but we don’t need:

  • To create tar files.
  • To decompress files (gzip or bzip2). If we want a compressed archive, we can inflate it with zlib first and then untar the result. There is no need to untar while inflating.
  • To support old-fashioned tar formats (from the ’80s).
  • To handle Unix ownership and permissions: iOS will not let us set them even if we wanted to.
  • To handle symlinks, hard links, FIFOs or other special entries. We just want to handle files.

The solution

What’s a tar file?

A tar file is composed of 512 bytes data blocks.

The first block is the header of the first file. The next n blocks hold that file’s content, and then the next file starts with a new header block.

(Figure: layout of a tar file, where H is a header block, C a content block and / a partially filled content block.)

At the end of the tar file there are two empty (zero-filled) blocks. A new file always starts on a block boundary, after an integer number of blocks. Some entries, such as folders, have no content block at all.

A header block contains a lot of data. Only a few of these fields, and none of the newer UStar header fields, are useful to us.

We just need:

  • The file name: Encoded in ASCII, it’s a relative path like “mydir/myfile.pdf”
  • The file size: Encoded in ASCII, it’s an octal value in bytes.
  • The file type: we only need two file types, ‘0’ for regular files and ‘5’ for directories.

The file size tells us where the next file’s header block starts, and how much of the file’s last block is real content rather than padding to strip.

The result: Light-Untar for iOS

Not so complex, is it? I implemented this in Objective-C and pushed it to GitHub: https://github.com/mhausherr/Light-Untar-for-iOS

This code is under a BSD license, so you can use it in your own projects.

Is it better than libarchive? Light-Untar-for-iOS has only 2 files and 168 lines of code, including the comment lines that hold the license text.

To use it, include the .h file and load your tar file into an NSData:

NSData *tarData = [NSData dataWithContentsOfFile:@"/path/to/your/tar/file.tar"];

Then call this NSFileManager method:

NSError *error = nil;
[[NSFileManager defaultManager] createFilesAndDirectoriesAtPath:@"/path/to/your/extracted/files/" withTarData:tarData error:&error];

Limitations: Performance and Security

Performance

Libarchive is written in plain C and does the job faster than my Objective-C code.

The question is: what matters most to you?

  • Saving 40 ms on each untar?
  • Saving 4.7 MB of your app’s size?

If you choose the second one, Light-Untar-for-iOS is for you.

Security

What are the major security issues with tar?

  • Symbolic links: an attacker can add links that give access to folders outside the extraction directory.
  • Root ownership or execute permissions: an attacker can add an executable file owned by root and launch it.

Neither of these features is implemented in my code: links are never created and ownership and permission bits are ignored, so you can use it safely.

And you know what? It’s an iOS library. On iOS, applications are sandboxed: the system itself protects this code, so we don’t need any other protection.

Conclusion

Mobile development primarily targets low-capacity devices, so the size of the built app matters too.

If you have a simple problem to solve, ask yourself what is best. A huge framework can do it for you, but do you really need the whole framework? Couldn’t you just implement the feature you need?

To go further