AppleWorks / ClarisWorks
From Wiki.wirelust.com
Overview
ClarisWorks / AppleWorks documents use a closed file format. This page is an attempt to describe the format (or formats) for the purpose of importing documents into newer, more open formats for archiving.
After being frustrated that I wasn't able to normalize old files with Xena, I set out to figure out how to read this format in order to develop a plugin. I was sure that someone had already written a plugin for OpenOffice or KOffice or something. This page is a collection of all of the resources and info I have been able to find, as well as my own discoveries about this file format. I plan on continuing work until I have enough knowledge to develop a reader that can at least extract text and some basic formatting from simple documents. From what I can tell, this first goal can be met. If anyone else out there finds anything on their own, please update this wiki or email me your thoughts.
Priorities
- Discover how to determine the start of the content block
- Figure out the format of DSET
- Discover how to read the style attributes to apply to the content
- Develop plugin for Xena
- Develop plugin for general use
Example Files
Please email me any examples you might have, especially if you have a version of ClarisWorks older than 5.0.
File Format
Keywords
There appear to be several keywords:
| keyword | type | can contain | description | notes |
|---|---|---|---|---|
| BBAR | | | | |
| CHAR | | | | |
| CELL | | | | |
| CPRT | variable | | | First 4 bytes indicate length of block. v6 contains XML with printing information. |
| DSET | | | | Appears to have a format like: 4 byte Len, value, 4 byte Len, value, continuing. Not sure when it ends. |
| DSUM | variable | | Document summary | First 4 bytes indicate length of block |
| ETBL | | | | |
| FNTM | blocked | | something to do with fonts | |
| GRPH | | | | |
| HASH | | | | Appears in multiples of 2? Always preceded by: FF FF 00 00 00 06 00 04 00 01 |
| HDNI | variable | | | First 4 bytes indicate length of block |
| KSEN | | | | Preceded by?: FF FF 00 00 00 0E 00 0A 00 02 |
| LKUP | | | | Preceded by?: FF FF 00 00 00 02 00 04 00 02 |
| LOM! | | | | Don't know if this is a keyword but putting it here just in case |
| NAME | | | | |
| RULR | | | probably page rulers | Unable to determine the length |
| MARK | | MRKS MOBJ | | First 4 bytes indicate length of block |
| MRKS | | | | |
| oBIN | | | | |
| SNAP | variable | | snapshot | First 4 bytes indicate length of block, then there are 5 bytes that are unknown (probably payload type), then a PICT file. Possibly v6 only. |
| STYL | | HASH NAME FNTM CELL GRPH RULR | | First 4 bytes indicate length of block |
| TNAM | | | | Different on every save |
| WMBT | | | | |
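To get a feel for where these keywords sit in a real file, a brute-force scan is a reasonable first step. The sketch below is only a guess at how to do that: it assumes the 4-byte length follows the keyword immediately and is big-endian (typical for classic Mac files, but not confirmed here), and `find_keywords` is a made-up helper name. For keywords that are not length-prefixed, the reported length will be meaningless.

```python
import struct

# Four-character keywords taken from the table above ("LOM!" is left out
# because it is not clear that it is a keyword at all).
KEYWORDS = [b"BBAR", b"CHAR", b"CELL", b"CPRT", b"DSET", b"DSUM", b"ETBL",
            b"FNTM", b"GRPH", b"HASH", b"HDNI", b"KSEN", b"LKUP", b"NAME",
            b"RULR", b"MARK", b"MRKS", b"oBIN", b"SNAP", b"STYL", b"TNAM",
            b"WMBT"]


def find_keywords(data):
    """Return (offset, keyword, length_guess) tuples sorted by offset.

    length_guess is the 4 bytes after the keyword read as a big-endian
    unsigned integer (an assumption), or None if the file ends too soon.
    """
    hits = []
    for keyword in KEYWORDS:
        pos = data.find(keyword)
        while pos != -1:
            length = None
            if pos + 8 <= len(data):
                (length,) = struct.unpack_from(">I", data, pos + 4)
            hits.append((pos, keyword.decode("ascii"), length))
            pos = data.find(keyword, pos + 1)
    return sorted(hits, key=lambda hit: hit[0])
```

Running this over a few documents and comparing the guessed lengths with the distance to the next keyword should show quickly which keywords really are length-prefixed.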
Markers
I am making a guess that these are markers, still trying to figure out the meaning of each.
| marker | type | can contain | description | notes | observed length v5 | observed length v6 |
|---|---|---|---|---|---|---|
| 0x0000FFFF | ||||||
| 0x0001FFFF | ||||||
| 0x0101FFFF | 68 | |||||
| 0x0003FFFF | ||||||
| 0x0005FFFF | 176 | 160 | ||||
| 0x0007FFFF | ||||||
| 0x7FFFFFFF | ||||||
| 0x000BFFFF | ||||||
| 0x000DFFFF | ||||||
| 0x0E01FFFF | 80 |
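A quick way to test the marker guess is to list every offset where one of these values occurs and look at the spacing between hits. This is a rough sketch only: it assumes the values appear in the file in the byte order they are written here (big-endian), and `find_markers` is a hypothetical helper name.

```python
# Marker values from the table above.
MARKERS = [0x0000FFFF, 0x0001FFFF, 0x0101FFFF, 0x0003FFFF, 0x0005FFFF,
           0x0007FFFF, 0x7FFFFFFF, 0x000BFFFF, 0x000DFFFF, 0x0E01FFFF]


def find_markers(data):
    """Return (offset, marker) tuples sorted by offset.

    Assumes each marker is stored big-endian, i.e. 0x0005FFFF is the
    byte sequence 00 05 FF FF.
    """
    hits = []
    for value in MARKERS:
        needle = value.to_bytes(4, "big")
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, f"0x{value:08X}"))
            pos = data.find(needle, pos + 1)
    return sorted(hits)
```

Short patterns like these will also match by accident inside unrelated data, so hits are only candidates.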
Document Header
| chunk id | position start | length (bytes) | description | example | ascii | comments |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | major version | 05 06 | confirmed | |
| 2 | 1 | 3 | additional version | 029900 07E100 | appears somewhat random but is specific to minor version, maybe platform | |
| 3 | 4 | 4 | creator type | 424F424F | BOBO | Always has the same value |
| 4 | 8 | 4 | previous version | 029900 07E100 | If the file was converted, this contains the previous major and additional version number. If not converted, it is the same as bytes 0-3 | |
| 5 | 12 | 8 | 0x00000000 00000000 | seems to always be full of zeros | ||
| 6 | 20 | 2 | 0x0001 | seems to always be 0x0001 | ||
| 7 | 22 | 2 | some sort of marker - will appear not too far ahead of this block. | |||
| 8 | 24 | 2 | is usually the same after each instance of block, but sometimes different. | |||
| 9 | 30 | 2 | page width | 0x02 - landscape 0x03 - portrait 0x04 - ?? appears in bloom_cube.cwk maybe not - this and the next number change with page scale. | ||
| 10 | 32 | 2 | page height | |||
| 11 | 33 | 1 | page size | page size 96 - tabloid extra 100 - letter 164 - A5 243 - B5 | ||
| 12 | 34 | 12 | margins | 0x0048 0x0048 0x0048 0x0048 0x0048 0x0048 | HHHHHH | margins |
| 13 | 46 | 2 | inner width | will be equal to #9 minus either right or left, not sure which yet | ||
| 14 | 48 | 2 | inner height | will be equal to #10 minus either top or bottom margin, not sure which yet | ||
| 15 | 50 | 2 | 0x01 | same in all files tested | ||
| 16 | 52 | 2 | 0x00 | same in all files tested | ||
| 17 | 54 | 2 | 0x01 | same in all files tested | ||
| 18 | 56 | 2 | 0x00 | same in all files tested | ||
| 19 | ? | 8 | 4 | 0x0005FFFF | ||
| 20 | ? | 4 | end header??? | 7FFFFFFF | appears in all files tested. position: 680 - 5.0v1 672 - 6.2.9 | |
| 21 | after last block | 4 | length of next block after next | |||
| 22 | after last | 46 | unknown | |||
| 23 | after last | determined by number in #21 | unknown |
- there is a 2 byte delimiter shortly after the header that is used throughout the document.
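The first few header fields can be read with a handful of slices. The sketch below assumes the fields are packed back to back (version byte at 0, three more version bytes, the creator code BOBO, then the previous version at offset 8) and that multi-byte values are big-endian. Both are assumptions, and `parse_header_start` and the dictionary keys are names invented for this example.

```python
import struct


def parse_header_start(data):
    """Parse the first 22 bytes of a .cwk file into a dict.

    The offsets are guesses based on the field lengths in the table
    above; they are not confirmed against any specification.
    """
    if len(data) < 22:
        raise ValueError("need at least 22 bytes of header")
    (field_20,) = struct.unpack_from(">H", data, 20)  # assumed big-endian
    return {
        "major_version": data[0],
        "additional_version": data[1:4].hex().upper(),
        "creator": data[4:8],
        "is_cwk": data[4:8] == b"BOBO",
        "previous_version": data[8:12].hex().upper(),
        "zeros_ok": data[12:20] == bytes(8),
        "field_20": field_20,  # seems to always be 0x0001
    }
```

Checking `is_cwk` before anything else is a cheap way to reject files that are not ClarisWorks / AppleWorks documents.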
Document Info
- there is a summary stored after the main header but before the first DSET
| desc | length (bytes) | notes |
|---|---|---|
| full length + 1 | 4 | |
| abbreviated length | 1 |
- This is used to store an abbreviated table of properties for:
- Title
- Author
- Version
- Keywords
- Category
- Description
- each field is allowed 255 bytes of content
- full content is always available in the DSUM section
Document Content
Content appears to start right after:
- FF FF FF FF FF FF FF FF
- within first DSET block
Each string in the document is prefixed with 4 bytes indicating the length of the string.
The content area will have several strings in a row without any termination.
The last string appears to be null terminated.
- footnotes show up in the text as 0x02
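The observations above are enough for a first, very rough text extractor. This is a sketch under several assumptions: the length prefix is big-endian, the text is Mac Roman, and a zero length or a length running past the end of the file means the run of strings is over. None of these are confirmed, and `read_strings` and `extract_text` are hypothetical names.

```python
import struct

CONTENT_MARKER = b"\xff" * 8  # FF FF FF FF FF FF FF FF


def read_strings(data, start):
    """Read consecutive length-prefixed strings beginning at `start`."""
    strings = []
    pos = start
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)  # assumed big-endian
        end = pos + 4 + length
        if length == 0 or end > len(data):
            break  # heuristic stop: not a plausible string
        strings.append(data[pos + 4:end].decode("mac_roman"))
        pos = end
    return strings


def extract_text(data):
    """Find the marker inside the first DSET and read the strings after it."""
    dset = data.find(b"DSET")
    if dset == -1:
        return []
    marker = data.find(CONTENT_MARKER, dset)
    if marker == -1:
        return []
    return read_strings(data, marker + len(CONTENT_MARKER))
```

Footnote references will come through as the control character 0x02 in the decoded text, so a caller may want to replace or strip those.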
Document TOC
The TOC can contain any number of markers in any order. The data area always starts and ends with ETBL.
| position start | length (bytes) | description | example | ascii | comments |
|---|---|---|---|---|---|
| start position determined by other ETBL | 4 | tag | 4554424C | ETBL | Value indicates the total length of data in ETBL |
| anywhere | 4 | data | oBIN | oBIN block offset from start of doc | |
| anywhere | 4 | tag | 4453554D | DSUM | DSUM block offset |
| anywhere | 4 | data | STYL | STYL block offset | |
| anywhere | 4 | data | BBAR | ||
| anywhere | 4 | data | MARK | MARK block offset | |
| anywhere | 4 | data | MRKS | ||
| EOF - 24 | 4 | tag | 4554424C | ETBL | Following Value indicates start position |
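Putting the table together, the TOC can be located by working backwards from the end of the file. The sketch below assumes that the value after the final ETBL is a big-endian offset to the first ETBL, that the length after the first ETBL counts the bytes that follow it, and that entries are packed as a 4-byte tag followed by a 4-byte value. These are all guesses, and `read_toc` is a made-up name.

```python
import struct


def read_toc(data):
    """Return {tag: [values]} from the ETBL table, or None if not found."""
    if len(data) < 24 or data[-24:-20] != b"ETBL":
        return None
    # Value after the trailing ETBL: assumed offset of the leading ETBL.
    (start,) = struct.unpack_from(">I", data, len(data) - 20)
    if start + 8 > len(data) or data[start:start + 4] != b"ETBL":
        return None
    (length,) = struct.unpack_from(">I", data, start + 4)
    entries = {}
    pos = start + 8
    end = min(pos + length, len(data) - 24)  # never read into the trailer
    while pos + 8 <= end:
        tag = data[pos:pos + 4].decode("ascii", "replace")
        (value,) = struct.unpack_from(">I", data, pos + 4)
        entries.setdefault(tag, []).append(value)
        pos += 8
    return entries
```

If the result contains DSUM or STYL, the values can be checked by looking for those keywords at the reported offsets.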
Misc
in both versions tested, document ends with:
FF FE FD FC FB FA F9 F8 F0 F1 F2 F3 F4 F5 F6 F7
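This trailer makes a handy sanity check before attempting to parse anything else. A minimal sketch, with `has_cwk_trailer` being a hypothetical helper name:

```python
# The 16 bytes observed at the end of every tested document.
TRAILER = bytes.fromhex("FFFEFDFCFBFAF9F8F0F1F2F3F4F5F6F7")


def has_cwk_trailer(data):
    """True if the data ends with the 16-byte trailer seen in tested files."""
    return data.endswith(TRAILER)
```

Only two versions were tested, so a missing trailer does not necessarily mean the file is damaged.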
Passwords
- Password-protected documents do not have their content protected.
- The password itself is not stored in the file.
- The file probably stores a checksum instead, because the stored data does not seem to vary much with password length.
Other Elements
Other Efforts
AbiWord
- ClarisWorks import for AbiWord (non-functional but has some info)
OpenOffice / StarOffice
If you do a ton of Google searches, you find a lot of pages that say StarOffice could open ClarisWorks documents. This was done with the W4W filter. After a lot of digging, I believe that these filters live in OpenOffice in the Framework project. After checking out the source for the Framework project, I believe that ClarisWorks import support was nonexistent. If I am reading the source correctly, it looks like this filter simply opens the document as ASCII. If this is the case, I don't know why they even bothered to say they had a filter. If this is not the case, someone please correct me.
Proprietary
- Old versions of DataViz can convert documents. The product appears dead but is still for sale.
- MacText can convert older .cwk files to RTF, Word 2, and WordPerfect.
- XTND - there is a lot of info out there about XTND filters as part of System 6 and 7. I would like to investigate whether copies of these filters could help this effort, but I haven't been able to find enough info yet.
Misc
- Forum message from someone looking into the format (from 2001, ha!)

