AKFS - Arching Kaos File System
===============================

> (previously named SMFS - Sha(512) Merkle File System)

1. Introduction
---------------
This filesystem follows the merkle tree architecture. Given a SHA512 root hash,
we can climb the branches towards the leaves of the tree, keeping the order of
the leaves intact.

In our case, the leaves represent the chunks of a split file.

2. Analysis
-----------
Each file is encoded in base64 and split into chunks of 4KB-4MB (the chunk size
varies with the size of the original file); the SHA512 hash of each chunk is
then calculated.

Starting with the first chunk's hash, we store it together with the second
chunk's hash in a text file. We then move on to the third and fourth chunks and
repeat the process. The figure below will hopefully help in understanding this
process.

                          RUT HASH                (root)
                        /          \
               BR0 HASH              BR1 HASH     (branches)
                /     \               /     \
          CH1 HASH   CH2 HASH   CH3 HASH   CH4 HASH   (chunks / leaves)

Each such "packet" of two chunk hashes is called a "branch". We then repeat the
process for the branches themselves, pairing them as 1-2,3-4,..,(N-1)-N.

This is repeated until only one branch is left. That branch is called the
"root" (depicted as RUT in the figure). The only special case is when N is odd:
we simply duplicate the last hash, for example 1-2,3-3, and ignore the
duplicate when reconstructing the original file.

To "merge" the leaves back into a file, we follow the process in reverse order.
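As an illustration of the splitting and pairing described above, here is a
minimal sketch in bash. It is not the actual `ak-sm-merkle-tree` script: the
chunk size, temporary paths and file names are illustrative, and it only
computes the root hash without storing the branches anywhere.

    #!/bin/bash
    # Sketch: encode the file, split it into chunks, hash every chunk, then
    # fold pairs of hashes into branches until a single root hash remains.
    tmp="$(mktemp -d)"
    base64 "$1" > "$tmp/encoded"
    split -b 4K "$tmp/encoded" "$tmp/chunk-"

    hashes=()
    for chunk in "$tmp"/chunk-*; do
        hashes+=("$(sha512sum "$chunk" | cut -d' ' -f1)")
    done

    while [ "${#hashes[@]}" -gt 1 ]; do
        next=()
        for ((i = 0; i < ${#hashes[@]}; i += 2)); do
            a="${hashes[i]}"
            b="${hashes[i+1]:-$a}"   # odd number of hashes: duplicate the last one
            printf '%s\n%s\n' "$a" "$b" > "$tmp/branch"
            next+=("$(sha512sum "$tmp/branch" | cut -d' ' -f1)")
        done
        hashes=("${next[@]}")
    done
    echo "root: ${hashes[0]}"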
3. Benefits
-----------
Although storage is not a big concern nowadays, connectivity issues still are.
Having small pieces of information about a file, which let us fetch small and
verifiable pieces of that file, helps with low-bandwidth connections and
frequent disconnects.

4. Influence
------------
The structure is heavily influenced by torrents and bitcoin.

5. Implementation
-----------------
Currently, there are two bash scripts in the `bin` directory doing the
file-to-hash and the hash-to-file operations, respectively:
- `ak-sm-merkle-tree` -> `ak fs --add`
- `ak-sm-merkle-tree-to-file` -> `ak fs --get`

6. Specifications
-----------------
As part of arching-kaos, the tree is stored under the `~/.arching-kaos`
directory.

The initial file is converted to base64. The file is then split, depending on
its size, into chunks under the `ftr` directory. Branches consist of 2 hashes
as strings, separated by a '\n'. The same applies to the root.

7. Networking
-------------
Although networking capabilities are not part of the current implementation of
the system, it is fairly easy to exchange branch and root hashes as 'metadata'
and chunks/leaves as 'data'. A DHT-like implementation is in the works, so that
nodes requesting a hash can discover the nodes that have it.

8. Bouquets of leaves (maps)
----------------------------
Based on the structure of the merkle trees we are producing here, there is
another way to share information about the leaves (chunks) of a file. Instead
of packing them as two leaf hashes per branch, we can list all the leaf hashes
in order, as leaf01,leaf02,...,leafN.

For this kind of work see the following bash scripts:
./bin/ak-sm-filejoiner
./bin/ak-sm-filesplitter

9. Storage
----------
In the current implementation storage is laid out as follows:
- $AK_WORKDIR/fmrk  branches,
- $AK_WORKDIR/ftr   leaves and
- $AK_WORKDIR/fmp   maps

The files are NOT organized any further for now.

To turn this unstructured layout into something more handy, we can use each
digit of a given SHA512 hash as a directory name.

For example, given the hash:
0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e

we would search, store or read the following file:
0/d/1/e/5/c/d/1/3/6/e/0/0/4/b/e/8/c/9/6/2/5/b/1/0/3/7/e/2/e/9/0/8/3/0/4/f/0/3/9/9/8/b/9/4/c/d/2/0/1/f/6/c/a/8/a/1/2/5/b/a/b/0/3/3/8/5/f/3/c/9/c/1/1/b/3/c/7/c/b/2/8/0/f/b/6/b/6/f/1/f/c/b/f/9/b/2/8/7/7/e/4/8/d/a/d/0/9/c/8/1/f/a/0/f/f/6/e/5/e/7/4/1/2/a/d/0/e

This way we would be storing up to 16 files per directory.

Doubling the digits gets us up to 256 files per directory:
0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0e

Finally, an up-to-65536-files-per-directory approach would use 4 digits per
directory, like:
0d1e/5cd1/36e0/04be/8c96/25b1/037e/2e90/8304/f039/98b9/4cd2/01f6/ca8a/125b/ab03/385f/3c9c/11b3/c7cb/280f/b6b6/f1fc/bf9b/2877/e48d/ad09/c81f/a0ff/6e5e/7412/ad0e

But that is too many files per directory.

We can also reuse the full original hash as the final file name, like:
0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e

But, on the other hand, we already know the hash from the path we walked down,
and we can always recalculate the hash of the file we found if we want to
verify that path and file match.

To take care of all the above, a "driver" should be implemented that, given a
hash, returns the location of the file. We can go with the 2-digits-per-directory
approach.
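A minimal sketch of such a driver in bash, assuming the 2-digits-per-directory
layout is rooted at $AK_WORKDIR (the function name is illustrative and not part
of the current implementation):

    # Map a SHA512 hash to a storage path, one directory per pair of hex
    # digits; the final pair is the file name itself.
    akfs_hash_to_path() {
        local hash="$1" path="$AK_WORKDIR"
        for ((i = 0; i < ${#hash}; i += 2)); do
            path="$path/${hash:i:2}"
        done
        echo "$path"
    }

    # Usage:
    #   akfs_hash_to_path 0d1e5c...12ad0e
    #   -> $AK_WORKDIR/0d/1e/5c/ ... /12/ad/0e

With something like this in place, storing or looking up a chunk, branch or map
only requires knowing its hash.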