


The ext3cow file system is a time-shifting file system implemented with copy-on-write (COW) and based on the very popular ext2/ext3 file system. Phew! Now what does all that mean? Basically, it's a functionally enhanced version of ext3 that allows one to take a snapshot of one's file system, with one-second granularity, freezing the way the file system looked at a given point in time. One may then travel back through time, if you will, and be presented with a read-only image of the file system. No special mounting or '.' directories are needed. Just type a file or directory name followed by the special '@' character and a time. For example:



The second cat reads the current version of the file, while the first cat reads the file foo.txt as it appeared at Thu Jul 10 09:58:04 2003. The syntax for the time variable is the number of seconds elapsed since the Epoch (00:00:00 UTC, January 1, 1970). This may seem a little weird to use, but we believe any fancier date syntax should be parsed by a shell or shell macro and not by the file system. This is on our ToDo List.
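Until a friendlier date syntax exists, a small helper can do the conversion outside the file system. The following is only a sketch of that idea, not part of ext3cow: the function names `to_epoch_seconds` and `versioned_name` are hypothetical, and because the time zone of the example timestamp above is not stated, UTC is assumed here.

```python
from datetime import datetime, timezone


def to_epoch_seconds(when: datetime) -> int:
    """Return whole seconds since the Epoch for a timezone-aware datetime."""
    if when.tzinfo is None:
        # A naive datetime would be interpreted in the local zone, which is ambiguous.
        raise ValueError("pass a timezone-aware datetime")
    return int(when.timestamp())


def versioned_name(path: str, when: datetime) -> str:
    """Build the 'name@seconds' form described above (hypothetical helper)."""
    return f"{path}@{to_epoch_seconds(when)}"


# Assumption: the timestamp is taken to be UTC; the page does not say.
when = datetime(2003, 7, 10, 9, 58, 4, tzinfo=timezone.utc)
print(versioned_name("foo.txt", when))  # prints foo.txt@1057831084
```

The resulting string can be passed to any ordinary command, such as cat, in place of a plain file name. If the timestamp is in a different time zone, the number shifts by the corresponding offset.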

Of course, this works for more than just text files. Images, databases, and binaries: ext3cow does it all.

There are quite a few advantages to a system like this, including data availability and reliability, a consistent image for backup, a checkpoint for restoring the state of a file system, a source for data mining, and a way to provide tamper-resistant storage.

So, what are the trade-offs for such a system? For every version of a file that exists, an inode must exist to reference it. Therefore, there's a slight increase in metadata overhead of about 5%. Of course, this percentage varies with snapshot frequency and the number of files modified between snapshots. If you never take a snapshot, then there's no increase in metadata. Further, no data is ever thrown away, so there's higher data block usage. Results on the average increase in data block usage have yet to be reported, but because of the copy-on-write policy, only the blocks that have changed between snapshots are written to disk. In this way, versions of the same file may share data blocks between snapshots, minimizing the data footprint.
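To make the block-sharing idea concrete, here is a toy in-memory model of copy-on-write. It is a sketch under simplifying assumptions and does not reflect ext3cow's on-disk layout or kernel code: the class `CowFile` and its methods are hypothetical, a file is just a list of block ids, and a past version can only be read at the exact snapshot time.

```python
class CowFile:
    """Toy model: a file is a list of block ids; a snapshot freezes that list."""

    def __init__(self, blocks):
        self._store = {}       # block id -> block contents
        self._next_id = 0
        self._frozen = set()   # block ids referenced by at least one snapshot
        self.snapshots = {}    # snapshot time -> tuple of block ids
        self.current = [self._alloc(data) for data in blocks]

    def _alloc(self, data):
        block_id = self._next_id
        self._next_id += 1
        self._store[block_id] = data
        return block_id

    def snapshot(self, epoch):
        # No data is copied; the snapshot shares every block with the current version.
        self.snapshots[epoch] = tuple(self.current)
        self._frozen.update(self.current)

    def write(self, index, data):
        block_id = self.current[index]
        if block_id in self._frozen:
            # Copy-on-write: the old block stays untouched for the snapshot.
            self.current[index] = self._alloc(data)
        else:
            # Block is not part of any snapshot, so it can be overwritten in place.
            self._store[block_id] = data

    def read(self, epoch=None):
        ids = self.current if epoch is None else self.snapshots[epoch]
        return [self._store[i] for i in ids]

    def blocks_used(self):
        return len(self._store)


f = CowFile(["a", "b", "c"])
f.snapshot(100)
f.write(1, "B")
print(f.read())         # ['a', 'B', 'c']
print(f.read(100))      # ['a', 'b', 'c']
print(f.blocks_used())  # 4, not 6: two blocks are shared by both versions
```

In this model, a second write to the same block before the next snapshot overwrites the new copy in place, so block usage grows only with the number of blocks changed between snapshots.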

Other features include being able to change into a directory in the past and read its contents as they appeared at a point in time, making symbolic links to files in the past, and the ability to diff versions of a file.
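Because a past version is addressed by an ordinary path name, diffing needs nothing beyond standard tools. As a rough illustration, the sketch below compares a file with one of its past versions using Python's difflib. The helper `diff_against_past` is hypothetical, and it assumes the file lives on a mounted ext3cow file system, holds text, and has a version at the given time.

```python
import difflib
from pathlib import Path


def diff_against_past(path: str, epoch: int) -> str:
    """Return a unified diff from 'path@epoch' to the current 'path'."""
    past_name = f"{path}@{epoch}"  # the name@seconds form described earlier
    old = Path(past_name).read_text().splitlines(keepends=True)
    new = Path(path).read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=past_name, tofile=path)
    )
```

Calling `diff_against_past("foo.txt", t)` with a number of seconds `t` would then show how foo.txt has changed since that time.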

 


The latest version of ext3cow is 0.1.2.

To successfully install and run ext3cow, you'll need both of the following:

    • ext3cow patches for 2.4.21. [README] [CHANGELOG]
    • ext3cow tools. [README]
Ext3cow was originally developed on the 2.4.19 kernel, but it has recently been upgraded to the 2.4.21 kernel, which picks up the ext3 fixes made between 2.4.19 and 2.4.21 and fixes a few ext3cow-specific bugs. The 2.4.21 patch is not compatible with kernels older than 2.4.21. Use the 2.4.19-ext3cow-0.1.1 patch, located in the archive, for kernels older than 2.4.21.

Please report bugs to: .

 


Right now, ext3cow is the research of young Zachary Peterson and his advisor Dr. Randal Burns. Both are members of the Hopkins Storage Systems Lab (HSSL) at the Johns Hopkins University. The project is funded by the Department of Energy and the National Science Foundation.

If you'd like to join in the development, report bugs, suggest new and better features, provide feedback, ask about the design, or just see how Zach's doing, please send email to:

NEW! Join the ext3cow mailing list! Go to http://hssl.cs.jhu.edu/mailman/listinfo/ext3cow-devel or email .

Publications

  • Z.N.J. Peterson and R.C. Burns. Ext3cow: The Design, Implementation, and Analysis of Metadata for a Time-Shifting File System. Technical Report HSSL-2003-03, Hopkins Storage Systems Lab, Department of Computer Science, Johns Hopkins University, 2003.


 


The following is a list of open issues in the file system and peripheral projects that still need to be done.
  • Update ext3cow for the 2.5/2.6 kernels.
  • An ext3cow fsck.
  • A time-traveling C shell (ttcsh).
  • A snapshot reclamation tool.
  • A snapshot diffing tool. (Useful for intrusion detection.)
  • An ext3 to ext3cow conversion tool.
  • Support for memory mapped files.
  • CVS-like tagging and branching features. (Or is this part of ttcsh?)

 


    • The ext2/3 file system. [Website] [Website] [Website]
      • Design and implementation of the second extended file system. Remy Card, Theodore Y. Ts'o, and Stephen Tweedie. The 1994 Amsterdam Linux Conference, 1994. [HTML]
      • Planned Extensions to the Linux Ext2/Ext3 Filesystem. Theodore Y. Ts'o and Stephen Tweedie. The 2002 USENIX Annual Technical Conference, Monterey, CA, June 2002. [HTML] [PDF]


    • CVFS. [Website]
      • Metadata Efficiency in a Comprehensive Versioning File System. Craig A. N. Soules, Garth R. Goodson, John D. Strunk, Gregory R. Ganger. 2nd USENIX Conference on File and Storage Technologies, San Francisco, CA, Mar 31 - Apr 2, 2003. Also available as CMU SCS Technical Report CMU-CS-02-145, May 2002. [Abstract] [PDF]

    • The Elephant file system.
      • Deciding when to forget in the Elephant file system. Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton and Jacob Ofir. The 17th ACM Symposium on Operating Systems Principles (SOSP), December 1999. [PDF]

    • SnapFS. [Website] [Website]

    • The Venti file system.
      • Venti: a new approach to archival storage. Sean Quinlan and Sean Dorward. First USENIX conference on File and Storage Technologies, Monterey, CA, January 2002. [HTML] [PDF]

     

     

     

    Last updated: 10/13/03