AppleWorks / ClarisWorks





ClarisWorks / AppleWorks documents use a closed file format. This page is an attempt to describe the format (or formats) so that documents can be imported into newer, more open formats for archiving.

After being frustrated that I wasn't able to normalize old files with Xena, I set out to figure out how to read this file format so that I could develop a plugin. I was sure that someone had already written a plugin for OpenOffice, KOffice, or something similar. This page is a collection of all the resources and info I have been able to find, as well as my own discoveries about this file format. I plan to continue working until I have enough knowledge to develop a reader that can at least extract text and some basic formatting from simple documents. From what I can tell, this first goal can be met. If anyone else out there finds anything on their own, please update this wiki or email me your thoughts.

code and test files can be found at:


  • discover how to determine the start of the content block
  • Figure out the format of DSET
  • discover how to read the style attributes to apply to the content
  • develop plugin for Xena
  • develop plugin for general use

Example Files

Please email me any examples you might have, especially if you have a version of ClarisWorks older than 5.0.

File Format


There appear to be several keywords

keyword type can contain description notes
CPRT variable first 4 bytes indicate length of block
v6 contains xml with printing information
DSET appears to have a format like:
4 byte Len
4 byte Len
continuing, not sure when it ends.
DSUM variable Document summary First 4 bytes indicate length of block
FNTM blocked something to do with fonts
HASH Appears in multiples of 2?
always preceded by: FF FF 00 00 00 06 00 04 00 01
HDNI variable First 4 bytes indicate length of block
KSEN preceded by?: FF FF 00 00 00 0E 00 0A 00 02
LKUP preceded by?: FF FF 00 00 00 02 00 04 00 02
LOM! don't know if this is a keyword but putting it here just in case
RULR probably page rulers
unable to determine the length
First 4 bytes indicate length of block
SNAP variable snapshot First 4 bytes indicate length of block
followed by 5 unknown bytes (probably a payload type), then a PICT file.

possibly v6 only.

First 4 bytes indicate length of block
TNAM Different on every save
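
To explore a file against the keyword list above, a small scanner can report where each tag occurs. This is only a sketch under assumptions: it treats the keywords as raw 4-byte ASCII tags, and it guesses that the 4 bytes right after a tag are a big-endian length, which the notes above only suggest for some keywords. The names `KNOWN_TAGS` and `find_tags` are made up for this example.

```python
import struct

# Tags taken from the keyword table; treating them as raw ASCII is an assumption.
KNOWN_TAGS = [b"CPRT", b"DSET", b"DSUM", b"FNTM", b"HASH", b"HDNI",
              b"KSEN", b"LKUP", b"RULR", b"SNAP", b"TNAM"]


def find_tags(data: bytes):
    """Return (offset, tag, length_guess) for every known tag, ordered by offset."""
    hits = []
    for tag in KNOWN_TAGS:
        pos = data.find(tag)
        while pos != -1:
            length = None
            if pos + 8 <= len(data):
                # Assumption: a big-endian 4-byte length follows the tag.
                (length,) = struct.unpack_from(">I", data, pos + 4)
            hits.append((pos, tag.decode("ascii"), length))
            pos = data.find(tag, pos + 1)
    return sorted(hits, key=lambda hit: hit[0])
```

The length guess is only meaningful for the keywords marked "variable"; for the others it is just the next 4 bytes and should be ignored.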


My guess is that the following values are markers; I am still trying to figure out the meaning of each.

marker type can contain description notes observed length v5 observed length v6
0x0101FFFF 68
0x0005FFFF 176 160
0x0E01FFFF 80

Document Header

Document Header
chunk id | position start | length (bytes) | description | example | ascii or int | comments
1 | 0 | 1 | major version | 05 | |
2 | 1 | 3 | additional version | 029900 | | appears somewhat random but is specific to the minor version, maybe the platform
3 | 4 | 4 | creator type | 424F424F | BOBO | always has the same value
4 | 8 | 4 | previous version | 029900 | | if the file was converted, this contains the previous major and additional version number; if not converted, it is the same as chunks 1 and 2
5 | 12 | 8 | | 0x00000000 0x00000000 | | seems to always be full of zeros
6 | 20 | 2 | | 0x0001 | | seems to always be 0x0001
7 | 22 | 2 | | 0x0194 | | some sort of marker; it will appear not too far ahead of this block
8 | 24 | 2 | | | | is usually the same after each instance of block, but sometimes different
9 | 26 | 4 | | 0x00000000 | |
10 | 30 | 2 | page height | | 792 | page size in pts, e.g. 792x612 (height x width) for portrait, 612x792 for landscape
11 | 32 | 2 | page width | | |
12 | 34 | 12 | margins | 0x0048 0x0048 0x0048 0x0048 0x0048 0x0048 | HHHHHH | margins (0x0048 is 72 decimal)
13 | 46 | 2 | inner height | | | will be equal to #10 minus either right or left, not sure which yet
14 | 48 | 2 | inner width | | | will be equal to #11 minus either top or bottom margin, not sure which yet
15 | 50 | 1 | | 0x01 | | same in all files tested, probably a flag
16 | 51 | 1 | | 0x00 | | same in all files tested, probably a flag
17 | 52 | 1 | | 0x01 | | same in all files tested, probably a flag
18 | 53 | 1 | | 0x00 | | same in all files tested, probably a flag
19 | 54 | 4 | | 0x00000000 | | unknown
20 | 58 | 4 | | 0x00000000 | | unknown
21 | 58 | 4 | | 0x00000000 | | unknown
22 | 62 | 4 | | 0x00000000 | | unknown
23 | 66 | 4 | | 0x00000000 | | unknown
24 | 70 | 4 | | 0x00000100 | | unknown
25 | 74 | 4 | | | | unknown
26 | 78 | 4 | | | | unknown
27 | 82 | 4 | | | | unknown
28 | 86 | 4 | | 0x00000005 | | unknown
29 | 90 | 2 | | 0xFFFF | |
30 | ? | 4 | end header??? | 7FFFFFFF | | appears in all files tested; position is 680 in 5.0v1 and 672 in 6.2.9
31 | after last block | 4 | length of next block after next | | |
32 | after last | 46 | | | | unknown
33 | after last | determined by number in #21 | | | | unknown
  • there is a 2 byte delimiter shortly after the header that is used throughout the document.
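
The better-understood header fields can be decoded with a few lines of Python. This is a minimal sketch, not a finished reader. It assumes all multi-byte values are big-endian (consistent with the 792 and 0x0048 examples), that the BOBO creator type sits at offset 4 right after the four version bytes, and that the page size and margins sit at the offsets listed in the table. The function name `parse_header` and the dictionary keys are made up for this example.

```python
import struct


def parse_header(data: bytes) -> dict:
    """Decode the header fields whose meaning is reasonably clear (offsets are provisional)."""
    if len(data) < 46:
        raise ValueError("too short to contain the fields decoded here")
    creator = data[4:8]  # assumption: creator type follows the 4 version bytes
    if creator != b"BOBO":
        raise ValueError("creator type is not BOBO")
    # Assumption: 16-bit big-endian values, in points.
    height, width = struct.unpack_from(">HH", data, 30)
    margins = struct.unpack_from(">6H", data, 34)
    return {
        "major_version": data[0],
        "additional_version": data[1:4].hex(),
        "previous_version": data[8:12].hex(),
        "page_height_pt": height,
        "page_width_pt": width,
        "margins_pt": margins,
    }
```

Calling `parse_header(open(path, "rb").read(128))` on a test file, where `path` is any .cwk document, is a quick way to check whether these offsets hold for a given version.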

Document Info

  • there is a summary stored after the main header but before the first DSET
desc | length (bytes) | notes
full length + 1 | 4 |
abbreviated length | 1 |
  • This is used to store an abbreviated table of properties for:
    • Title
    • Author
    • Version
    • Keywords
    • Category
    • Description
  • each field is allowed 255 bytes of content
  • full content is always available in the DSUM section

Document Content

Content appears to start right after the end of the first DSET block.

Each string in the document is prefixed with 4 bytes giving the length of the string.

The content area will have several strings in a row without any termination.

The last string appears to be null terminated.

  • footnotes show up in the text as 0x02
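
The string layout described above can be read with a short loop. This sketch assumes the 4-byte length is big-endian and that the text is encoded as Mac OS Roman; neither is confirmed yet. The caller has to supply the starting offset and the number of strings, since I don't yet know how to determine either from the file. The name `read_strings` and the `[footnote]` placeholder are made up for this example.

```python
import struct


def read_strings(data: bytes, offset: int, count: int):
    """Read `count` consecutive length-prefixed strings starting at `offset`.

    Returns the decoded strings and the offset just past the last one.
    """
    strings = []
    for _ in range(count):
        # Assumption: big-endian 4-byte length prefix.
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("string runs past the end of the data")
        # Assumption: Mac OS Roman text; 0x02 marks a footnote in the text.
        text = raw.decode("mac_roman").replace("\x02", "[footnote]")
        strings.append(text)
        offset += length
    return strings, offset
```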

Document TOC

The TOC can contain any number of markers in any order. The data area always starts and ends with ETBL.

Document TOC - at end of file
position start | length (bytes) | description | example | ascii | comments
start position determined by other ETBL | 4 | tag | 4554424C | ETBL | value indicates the total length of data in ETBL
anywhere | 4 | data | | oBIN | oBIN block offset from start of doc
anywhere | 4 | tag | 4453554D | DSUM | DSUM block offset
anywhere | 4 | data | | STYL | STYL block offset
anywhere | 4 | data | | BBAR |
anywhere | 4 | data | | MARK | MARK block offset
anywhere | 4 | data | | MRKS |
EOF - 24 | 4 | tag | 4554424C | ETBL | following value indicates start position


in both versions tested, document ends with:
FF FE FD FC FB FA F9 F8 F0 F1 F2 F3 F4 F5 F6 F7
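
Putting the trailer together: the 16-byte ending plus a 4-byte ETBL tag and a 4-byte value add up to exactly 24 bytes, which matches the "EOF - 24" position above. The sketch below checks the ending, reads the start position, and then walks the table. The walk rests on two unconfirmed guesses: that values are big-endian, and that each entry is a 4-byte tag followed by a 4-byte offset, with the length after the first ETBL counting the bytes that follow it. The names `locate_toc` and `read_toc` are made up for this example.

```python
import struct

# The 16 bytes that end the document in both versions tested.
EOF_SIGNATURE = bytes.fromhex("FFFEFDFCFBFAF9F8F0F1F2F3F4F5F6F7")


def locate_toc(data: bytes) -> int:
    """Return the TOC start position stored after the trailing ETBL tag."""
    if len(data) < 24 or not data.endswith(EOF_SIGNATURE):
        raise ValueError("missing end-of-file signature")
    tag, start = struct.unpack_from(">4sI", data, len(data) - 24)
    if tag != b"ETBL":
        raise ValueError("no ETBL tag 24 bytes before EOF")
    if data[start:start + 4] != b"ETBL":
        raise ValueError("start position does not point at an ETBL tag")
    return start


def read_toc(data: bytes) -> dict:
    """Map each TOC tag to its offset (assumes 8-byte tag/offset entries)."""
    start = locate_toc(data)
    (length,) = struct.unpack_from(">I", data, start + 4)
    pos = start + 8
    end = min(pos + length, len(data) - 24)  # never read into the trailer
    entries = {}
    while pos + 8 <= end:
        tag, offset = struct.unpack_from(">4sI", data, pos)
        entries[tag.decode("mac_roman")] = offset
        pos += 8
    return entries
```

If the entry layout guess is wrong, `locate_toc` is still useful on its own for finding where the table begins.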


  • Password-protected documents do not have their content protected.
  • The password itself is not stored in the file.
  • The file probably stores a checksum of the password, because the stored data changes very little with password length.

Other Elements

Other Efforts


OpenOffice / StarOffice

If you do a ton of Google searches, you find a lot of pages saying that StarOffice could open ClarisWorks documents. This was done with the W4W filter. After a lot of digging, I believe that these filters live in the Framework project of OpenOffice. After checking out the source for the Framework project, I believe that ClarisWorks import support was non-existent: if I am reading the source correctly, the filter simply opens the document as ASCII. If that is the case, I don't know why they even bothered to say they had a filter; if it is not, someone please correct me.


  • Old versions of DataViz can convert documents. The product appears to be dead but is still for sale.
  • MacText can convert older .cwk files to RTF, Word 2, and WordPerfect.
  • XTND: there is a lot of info out there about XTND filters as part of System 6 and 7. I would like to investigate whether copies of these filters could help this effort, but I haven't been able to find enough info yet.

