BCSV (File format)

From Luma's Workshop
Revision as of 21:21, 21 April 2022 by Aurum
The content described on this page is 100% documented.

BCSV stands for Binary Comma Separated Values and is the most common data format used in both Super Mario Galaxy games. Some older GameCube titles, such as Luigi's Mansion and Donkey Kong Jungle Beat, use this format as well. As the name suggests, BCSV is a binary variant of comma-separated values (CSV), so the data is laid out as a table. Column names are stored as hashes for faster lookup. Like FlatBuffers, the data is loaded directly into memory and does not have to be deserialized first. The game can read data as signed and unsigned integers (8, 16 and 32 bit), single-precision floats and strings.

All BCSV files are padded with '@' (0x40) to a multiple of 32 bytes. There is no consistent file extension for BCSV data; instead, the game contains files with the extensions BCSV, BANMT, BCAM, PA and TBL. Files that use the TBL extension are expected to be sorted in ascending order by a specific field. All strings are null-terminated and encoded in Shift-JIS (code page 932).
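The 32-byte padding rule can be sketched in Python as follows. The function name `pad_bcsv` is hypothetical and not taken from any existing tool, and the sketch assumes that a file whose length is already a multiple of 32 receives no padding at all.

```python
PAD_BYTE = b"@"  # 0x40, the padding character used by BCSV files
ALIGNMENT = 32   # files are padded to a multiple of 32 bytes


def pad_bcsv(data: bytes) -> bytes:
    """Append '@' bytes until len(data) is a multiple of 32.

    Assumption: nothing is appended when the length is already aligned.
    """
    remainder = len(data) % ALIGNMENT
    if remainder == 0:
        return data
    return data + PAD_BYTE * (ALIGNMENT - remainder)


padded = pad_bcsv(bytes(70))
print(len(padded))  # 96
```

A 70-byte file needs 26 '@' bytes to reach 96, the next multiple of 32.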

Header

Each BCSV file starts with a header:

Offset  Type  Description
0x00    u32   Entry count
0x04    u32   Field count
0x08    u32   Offset to the entry data section
0x0C    u32   Size of each entry in bytes
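A minimal sketch of reading this header with Python's `struct` module. It assumes big-endian byte order, which matches the PowerPC CPUs of the GameCube and Wii but is not stated on this page; the names `BcsvHeader` and `read_header` are hypothetical.

```python
import struct
from typing import NamedTuple


class BcsvHeader(NamedTuple):
    entry_count: int
    field_count: int
    data_offset: int
    entry_size: int


# ">" = big-endian (assumed); "4I" = four unsigned 32-bit integers
HEADER_FORMAT = ">4I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def read_header(data: bytes) -> BcsvHeader:
    """Unpack the 16-byte header at the start of a BCSV file."""
    return BcsvHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))


# 2 entries, 3 fields, entry data at 0x34 (0x10 header + 3 * 12-byte fields),
# 12 bytes per entry
sample = struct.pack(HEADER_FORMAT, 2, 3, 0x34, 12)
print(read_header(sample))
# BcsvHeader(entry_count=2, field_count=3, data_offset=52, entry_size=12)
```

Because the field list follows the header directly and each field descriptor is 12 bytes, the data offset in a typical file works out to 0x10 + field count × 12.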

Fields Section

Right after the header comes the list of fields. The structure of a single field is as follows:

Offset  Type  Description
0x00    u32   Name hash
0x04    u32   Bitmask
0x08    u16   Offset to this field's data within an individual entry
0x0A    u8    Data shift amount
0x0B    u8    Data type used by this field
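Each field descriptor is 12 bytes (4 + 4 + 2 + 1 + 1), so the whole list can be read in a loop. This Python sketch again assumes big-endian byte order; `BcsvField` and `read_fields` are hypothetical names, and the name hash is kept as a raw integer.

```python
import struct
from typing import List, NamedTuple


class BcsvField(NamedTuple):
    name_hash: int
    mask: int
    offset: int  # byte offset of this field's data within an entry
    shift: int
    type_id: int


# u32 name hash, u32 bitmask, u16 offset, u8 shift, u8 type (big-endian assumed)
FIELD_FORMAT = ">IIHBB"
FIELD_SIZE = struct.calcsize(FIELD_FORMAT)  # 12 bytes
HEADER_SIZE = 0x10  # the field list starts right after the 16-byte header


def read_fields(data: bytes, field_count: int) -> List[BcsvField]:
    """Unpack `field_count` consecutive field descriptors following the header."""
    return [
        BcsvField(*struct.unpack_from(FIELD_FORMAT, data, HEADER_SIZE + i * FIELD_SIZE))
        for i in range(field_count)
    ]


# Example: a placeholder header followed by one field with a made-up hash
sample = bytes(HEADER_SIZE) + struct.pack(FIELD_FORMAT, 0xDEADBEEF, 0xFFFFFFFF, 4, 0, 0)
print(read_fields(sample, 1)[0])
```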

Data types

Fields may cover one of the following data types:

Name           ID    Size (bytes)  Description
LONG           0x00  4             32-bit integer. Signedness is not specified. ANDed with the bitmask and shifted right by the field's shift amount.
STRING         0x01  32            Embedded string. Deprecated; use STRING_OFFSET instead.
FLOAT          0x02  4             Single-precision floating-point value.
LONG_2         0x03  4             32-bit integer. Signedness is not specified. ANDed with the bitmask and shifted right by the field's shift amount.
SHORT          0x04  2             16-bit integer. Signedness is not specified. ANDed with the bitmask and shifted right by the field's shift amount.
CHAR           0x05  1             8-bit integer. Signedness is not specified. ANDed with the bitmask and shifted right by the field's shift amount.
STRING_OFFSET  0x06  4             32-bit offset into the string pool.
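The masking and shifting rules above can be sketched as a single decoding function. This is an illustration under assumptions: big-endian byte order, integers returned unsigned (since the format does not record signedness), and embedded STRING values treated as null-terminated within their 32 bytes. `decode_value` and `to_signed` are hypothetical helpers.

```python
import struct
from typing import Union

# Type IDs from the table above
LONG, STRING, FLOAT, LONG_2, SHORT, CHAR, STRING_OFFSET = range(7)


def decode_value(entry: bytes, offset: int, type_id: int, mask: int, shift: int) -> Union[int, float, str]:
    """Decode one field from a single entry's bytes (big-endian assumed)."""
    if type_id in (LONG, LONG_2):
        raw = struct.unpack_from(">I", entry, offset)[0]
    elif type_id == SHORT:
        raw = struct.unpack_from(">H", entry, offset)[0]
    elif type_id == CHAR:
        raw = entry[offset]
    elif type_id == FLOAT:
        return struct.unpack_from(">f", entry, offset)[0]
    elif type_id == STRING_OFFSET:
        # Offset into the string pool, not the string itself
        return struct.unpack_from(">I", entry, offset)[0]
    elif type_id == STRING:
        # Assumes the 32-byte embedded string is null-terminated or fills all 32 bytes
        return entry[offset:offset + 32].split(b"\0", 1)[0].decode("cp932")
    else:
        raise ValueError(f"unknown BCSV field type {type_id:#x}")
    # Integer types: apply the field's bitmask, then its right shift
    return (raw & mask) >> shift


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned value as a two's-complement integer of `bits` bits."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


entry = struct.pack(">If", 0x00ABCD00, 1.5)
print(hex(decode_value(entry, 0, LONG, 0x00FFFF00, 8)))  # 0xabcd
print(decode_value(entry, 4, FLOAT, 0, 0))               # 1.5
```

Returning unsigned values and converting with `to_signed` only where a field is known to be signed keeps the decoder independent of per-field knowledge that the file itself does not carry.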

Field Order & Entry Size

For efficiency and because of hardware limitations, the field offsets and the total entry size are computed from a fixed ordering of the field types. This ordering only affects where each field's data is placed within an entry, not the order of the field descriptors in the fields section. When saving, a tool should assign field offsets and compute the total entry size in this order: STRING < FLOAT < LONG < LONG_2 < SHORT < CHAR < STRING_OFFSET. pyjmap, available on GitHub, contains a sample implementation that shows how to calculate these values correctly.
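The ordering rule can be illustrated with a short Python sketch. It is not pyjmap's code; it assumes that fields are placed back to back with no padding between type groups, that fields of the same type keep their declaration order, and that the total is rounded up to four bytes. Sharing one storage slot between several masked fields is not handled. `layout_entry` is a hypothetical name.

```python
from typing import List, Tuple

# Type IDs and sizes in bytes, from the data type table
LONG, STRING, FLOAT, LONG_2, SHORT, CHAR, STRING_OFFSET = range(7)
TYPE_SIZES = {LONG: 4, STRING: 32, FLOAT: 4, LONG_2: 4, SHORT: 2, CHAR: 1, STRING_OFFSET: 4}

# Order in which data is placed inside an entry
FIELD_ORDER = [STRING, FLOAT, LONG, LONG_2, SHORT, CHAR, STRING_OFFSET]


def layout_entry(field_types: List[int]) -> Tuple[List[int], int]:
    """Return (data offset per field in declaration order, total entry size)."""
    offsets = [0] * len(field_types)
    position = 0
    for type_id in FIELD_ORDER:
        # Visit every field of this type, in the order the fields are declared
        for index, field_type in enumerate(field_types):
            if field_type == type_id:
                offsets[index] = position
                position += TYPE_SIZES[type_id]
    entry_size = (position + 3) & ~3  # round up to a multiple of four bytes
    return offsets, entry_size


print(layout_entry([SHORT, FLOAT, SHORT, STRING_OFFSET]))  # ([4, 0, 6, 8], 12)
```

In the example, the FLOAT is placed first even though it is declared second, while the descriptor list itself keeps its original order.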

Data Section

This section contains the individual data entries and begins at the data offset given in the header. The layout of each entry is defined by the BCSV's fields. Each entry is aligned to four bytes.
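Since all entries have the same size and are stored one after another, entry *i* starts at data offset + *i* × entry size. A minimal Python sketch, using hypothetical helper names and assuming big-endian values in the example:

```python
import struct
from typing import Iterator


def entry_slice(data: bytes, data_offset: int, entry_size: int, index: int) -> bytes:
    """Return the raw bytes of entry `index` (0-based)."""
    start = data_offset + index * entry_size
    return data[start:start + entry_size]


def iter_entries(data: bytes, data_offset: int, entry_size: int, entry_count: int) -> Iterator[bytes]:
    """Yield the raw bytes of every entry in order."""
    for index in range(entry_count):
        yield entry_slice(data, data_offset, entry_size, index)


# Example: 16 zero bytes standing in for a header with no fields,
# then two 4-byte entries holding one u32 value each
data = bytes(0x10) + struct.pack(">2I", 7, 9)
second = entry_slice(data, 0x10, 4, 1)
print(struct.unpack_from(">I", second, 0)[0])  # 9
```

A field's value is then found at its field offset within the returned slice.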

String Pool

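Right after the data section comes the string pool, which contains all strings used within the BCSV. It therefore starts at the data offset plus the entry count multiplied by the entry size. STRING_OFFSET values are offsets relative to the start of this pool.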
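Resolving a STRING_OFFSET value can be sketched in Python as below. The helper names are hypothetical, the example assumes big-endian byte order, and decoding uses Python's built-in `cp932` codec for code page 932.

```python
import struct


def string_pool_start(data_offset: int, entry_count: int, entry_size: int) -> int:
    """The string pool begins immediately after the last entry."""
    return data_offset + entry_count * entry_size


def read_pool_string(data: bytes, pool_start: int, string_offset: int) -> str:
    """Read a null-terminated Shift-JIS (code page 932) string from the pool."""
    start = pool_start + string_offset
    end = data.index(b"\0", start)  # raises ValueError if no terminator exists
    return data[start:end].decode("cp932")


# Example: header placeholder, one 4-byte entry whose STRING_OFFSET field is 6,
# then a pool holding "Mario" at offset 0 and "Luigi" at offset 6
data = bytes(0x10) + struct.pack(">I", 6) + b"Mario\0Luigi\0"
pool = string_pool_start(0x10, 1, 4)
offset = struct.unpack_from(">I", data, 0x10)[0]
print(read_pool_string(data, pool, offset))  # Luigi
```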