We commonly use A4 to refer to our standard files, but the proper gauge for that is 4x20.
Gauge Declaration
Which is: [ref size] x [record length]
Fixed-length Records
Which indicates that we'll be supporting referential data [refs] and that we'll be using fixed-length [records].
RefSize, DataLen, RecLen
So a 4x20 gauge file has 20-byte records, uses 4-byte refs [refsize], and the first 4 bytes of each record [a ref] are the datatype identifier.
To find the datatype of a record, read its first 4 bytes, which form a ref; then read the record that ref points to, in which the datatype is declared.
That record, in our protocol, is a record of type GUID or UUID: 16 bytes. Whence our choice of DataLen: the 16 bytes remaining once the first 4 bytes are read, which hold the binary data stored in that record.
The record declaring the root {gUUID} datatype stores a {gUUID} and is itself of type {gUUID}, so its datatype ref points to itself. Whence we refer to A4 (Aurora) files as self-referential, which we cover here.
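As a sketch of how this resolves in practice (in Python; the big-endian byte order for refs is an assumption here, not something the gauge declaration dictates), reading a record's datatype ref and checking the self-referential root might look like:

```python
import struct

REC_LEN, REF_SIZE = 20, 4   # the 4x20 gauge

def read_record(data: bytes, rec_id: int) -> tuple[int, bytes]:
    """Return (datatype ref, 16-byte payload) for a 1-based record ID."""
    offset = (rec_id - 1) * REC_LEN
    rec = data[offset:offset + REC_LEN]
    (dtype_ref,) = struct.unpack(">i", rec[:REF_SIZE])  # byte order assumed
    return dtype_ref, rec[REF_SIZE:]

# Toy file: record 1 declares the root {gUUID} type and points at itself.
root_uuid = bytes(16)                    # placeholder 16-byte UUID value
data = struct.pack(">i", 1) + root_uuid

dtype, payload = read_record(data, 1)
assert dtype == 1                        # self-referential root
assert len(payload) == 16
```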
Record ID
Since we have fixed-length records, the relationship between record count and file length is exact.
For a file length of zero, we have zero records.
At 20 bytes, we've stored 1 record.
At 40 bytes, we've stored 2 records.
So we choose a 1-based index of record IDs, where the ID of the last record is file length / 20 (for this 4x20 gauge).
More conveniently, depending on your coding environment, you can use the seek position after writing, plus a little modular arithmetic, to infer the record ID; and if the record was appended to the end of the file, the ID is simply FileLen/20 (FileLen/Gauge.RecLen).
We'll not stress the issue here. It's a bit of coding common sense to figure it out.
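That said, the arithmetic above can be sketched in a few lines (Python here; the function names are illustrative, not part of the protocol):

```python
# Record-ID arithmetic for a 4x20 gauge: file length maps exactly to
# record count, and a 1-based ID falls out of either value.

def record_count(file_len: int, rec_len: int = 20) -> int:
    """Record count; also the 1-based ID of the last record."""
    assert file_len % rec_len == 0, "fixed-length records must divide evenly"
    return file_len // rec_len

def id_from_seek(pos_after_write: int, rec_len: int = 20) -> int:
    """The seek position after writing record N is N * rec_len."""
    return pos_after_write // rec_len

assert record_count(0) == 0
assert record_count(20) == 1
assert record_count(40) == 2
assert id_from_seek(540) == 27   # just wrote record 27
```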
Thus an A4 (A4x20) record looks like:
TTTT : DDDD DDDD DDDD DDDD
Four bytes declaring the datatype (as a ref, the record ID of the data-type-declaration record), and 16 bytes of data.
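Writing that layout is equally simple. A minimal sketch (big-endian ref assumed, and zero-padding of short data is an implementation choice, not something the gauge mandates):

```python
import struct

def pack_record(dtype_ref: int, data: bytes, data_len: int = 16) -> bytes:
    """Pack one TTTT : DDDD... record: 4-byte ref, then padded data."""
    assert len(data) <= data_len
    return struct.pack(">i", dtype_ref) + data.ljust(data_len, b"\x00")

rec = pack_record(7, b"hello")   # 7 is an illustrative datatype record ID
assert len(rec) == 20
```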
For entertainment, a 1x5 gauge file would have:
T : DDDD
The storage capacity of an Aurora datastore (datafile) has two aspects:
The maximum number of records
The maximum data capacity
We prefer a signed-int ref: signed ints tend to be the default in coding languages, and it's 'safer' to have the smaller capacity than an unsigned int would give.
We disregard negative numbers as record IDs, so we go from 1 to int.Max, however that's phrased in your language of choice: 1 to 2,147,483,647, or 1 to 2 billion for ease of remembering.
That's a lot of data items to reference for your personal jotter (but see extensions in a moment); for databases, though, especially modern ones (in the trillions of records, I believe), it's not enough.
Simple: use a bigger refsize.
An 8-byte Int64 goes from 1 to 9,223,372,036,854,775,807, or 9 million million million, aka 9 million trillion... so that should keep database enthusiasts happy for a while.
And that's just the number of refs (record IDs) available.
Data Capacity
The data capacity of an A4 (or Aurora) store is the (max) record count x the datalen, how many bytes of data per record.
So if we were restricted to 10 records of 16 bytes each, that would be a data capacity of 160 bytes.
Not all will be used, but we need to get into extensions for that.
For a standard A4 file, our capacity is of the order of 2 billion records x 16 bytes, or 32 billion bytes, aka 32 gig (loosely, depending on whether you think of a gig as exactly 1 billion bytes or use its proper binary definition).
The strict calculation would be Int32.MaxValue x 16 bytes = 34,359,738,352 bytes, approx 34 billion bytes.
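Checking that strict figure in Python also shows why "32 gig" is defensible under the binary definition:

```python
INT32_MAX = 2**31 - 1   # max record count with signed 4-byte refs
DATA_LEN = 16           # bytes of data per record in a 4x20 gauge

capacity = INT32_MAX * DATA_LEN
assert capacity == 34_359_738_352   # approx 34 billion bytes
assert capacity == 2**35 - 16      # ie: 16 bytes shy of exactly 32 GiB
```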
That should cover a few names, addresses, and a selfie or two.
Bulk Data
If you know you're going to be storing bulk data - lots of images, say, with just a simple declaration record occasionally - then it makes sense to use a larger gauge, while keeping to the convenient Int32 (4-byte) refs.
So you might go to 4x1000, 4x1024, 4x1,000,000, whatever suits, and the capacity will be 2 billion x the datalen you have chosen; for a 4x1000 gauge, the datalen is 1000 - 4 = 996 bytes.
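The capacity of any such 4xN gauge follows the same formula; a quick sketch (function name illustrative):

```python
INT32_MAX = 2**31 - 1   # max record count with signed 4-byte refs

def data_capacity(rec_len: int, ref_size: int = 4,
                  max_records: int = INT32_MAX) -> int:
    """Max data bytes: record count x datalen (rec_len minus the ref)."""
    data_len = rec_len - ref_size   # e.g. 1000 - 4 = 996 for 4x1000
    return max_records * data_len

assert data_capacity(20) == INT32_MAX * 16     # the standard A4 figure
assert data_capacity(1000) == INT32_MAX * 996
```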
Extension Records
Let's finally mention the obvious: if the data you want to store is longer than one record, you're going to want to wrap it across multiple records.
So the {gExtn} type is declared such that for data requiring four records, and storing for example a {gString}, you'll have four records in sequence, of type:
{gString}, {gExtn}, {gExtn}, {gExtn}
Reading Records
To read a record (say ID 27), you'll read the singleton record at the calculated offset for record 27 (ie: the 20 bytes), but you'll also keep reading to see whether the next record (28) is an extension.
This sounds inefficient, but in practice reading is carried out with 'buffers': even if you only want a single 20-byte record, the operating system will read eg: 1024 bytes, so it's then a simple matter to check the already-retrieved buffer for extension records.
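The read-then-check-for-extensions loop can be sketched like so (Python; the big-endian refs, the EXTN_TYPE_ID value, and the other record IDs are all illustrative assumptions, since where {gExtn} is declared depends on the file):

```python
import struct

REC_LEN, REF_SIZE = 20, 4
EXTN_TYPE_ID = 2   # hypothetical record ID of the {gExtn} declaration

def read_with_extensions(data: bytes, rec_id: int) -> tuple[int, bytes]:
    """Return (datatype ref, payload spanning any trailing {gExtn} records)."""
    payload, first_type = b"", None
    while True:
        offset = (rec_id - 1) * REC_LEN
        rec = data[offset:offset + REC_LEN]
        if len(rec) < REC_LEN:
            break                       # ran off the end of the file
        (dtype,) = struct.unpack(">i", rec[:REF_SIZE])
        if first_type is None:
            first_type = dtype          # the record we were asked for
        elif dtype != EXTN_TYPE_ID:
            break                       # next record starts a new item
        payload += rec[REF_SIZE:]
        rec_id += 1
    return first_type, payload

# Toy file: a {gString} spanning three records, then an unrelated record.
STRING_TYPE_ID = 5   # also hypothetical
records = [(STRING_TYPE_ID, b"A" * 16), (EXTN_TYPE_ID, b"B" * 16),
           (EXTN_TYPE_ID, b"C" * 16), (STRING_TYPE_ID, b"D" * 16)]
data = b"".join(struct.pack(">i", t) + d for t, d in records)

dtype, payload = read_with_extensions(data, 1)
assert dtype == STRING_TYPE_ID and len(payload) == 48
```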
We're not going to dive into actual implementation in this walkthrough of the concepts. It should be pretty clear.
It is, after all, by design, a very simple protocol.
Easy to get right. Easy to comply with.