Author Topic: looking for a Btree not recursive implementation, string keys  (Read 6254 times)

0 Members and 1 Guest are viewing this topic.

Online westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: looking for a Btree not recursive implementation, string keys
« Reply #25 on: December 20, 2017, 08:47:10 am »
Quote
I need to search string-keys, that return uint32_t
Do you need to implement add and delete as well?  Implementing just search without recursion doesn't look too bad.
Hmm.  Do BTrees require a fixed number of keys at each node, or can you fix the size of a node (ie "a disk block") and just cram as many variable-length keys (in the case of a string) in there as will fit?
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #26 on: December 20, 2017, 11:35:34 am »
At the end, I am moving to B+Tree for convenience.

Quote
I need to search string-keys, that return uint32_t
Do you need to implement add and delete as well? 

Sure!

Keys (strings) need to
- being inserted
- being deleted
- being queried
- being of fixed size

leaves contain iblock pointers.

Implementing just search without recursion doesn't look too bad.

With B+tree only leaves can contain values, and they each leaf works like a list's item: it points to the parent, and to the next. This is also good when you have to list all the leaves's content, but ... in my case I also need to provide an interactive method to list the content!

Each time you call a function, it must return NULL if no more item in the list, or the next item. I have already designed the "core", but I still need to define details like ... how to do it?  :D

require a fixed number of keys at each node

yep. Fixed number of keys of fixed size at each node.
« Last Edit: December 20, 2017, 02:09:14 pm by legacy »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: looking for a Btree not recursive implementation, string keys
« Reply #27 on: December 20, 2017, 03:43:17 pm »
Sure!

Keys (strings) need to
- being inserted
- being deleted
- being queried
- being of fixed size

You can do all this without recursion or "fake stack".

What presents a problem is balancing. When you isert, you need to make sure that your tree maintains low depth and doesn't look like linked list instead. Depending on the application, you may not need balancing. If you do need balancing, storing pointers to parents may help to avoid "fake stack" or recursion.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #28 on: December 20, 2017, 03:52:12 pm »
balancing is needed to keep the load of searching as log(N)
 

Offline alank2

  • Super Contributor
  • ***
  • Posts: 2185
Re: looking for a Btree not recursive implementation, string keys
« Reply #29 on: December 20, 2017, 11:21:48 pm »
Why do B+trees have a pointer in all the leaf nodes stringing them together?
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: looking for a Btree not recursive implementation, string keys
« Reply #30 on: December 21, 2017, 04:58:23 am »
balancing is needed to keep the load of searching as log(N)

If you keep it always perfectly balanced, it is becoming less efficient than a binary search on a sorted array. Therefore, you need to give it some slack. Depending on the specific situation, it may be the best not to do balancing at all, do a little bit of a balancing, or do a lot of balancing. This is more art than science.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: looking for a Btree not recursive implementation, string keys
« Reply #31 on: December 21, 2017, 09:52:16 am »
Why do B+trees have a pointer in all the leaf nodes stringing them together?

So you can visit every data item sequentially without bothering your pretty head keeping a stack of index nodes.

Waste of disk space (and program code to maintain the pointer) if you ask me, unless you have so little RAM you can't keep one or two index nodes in memory at the same time as a leaf node.
 

Offline grumpydoc

  • Super Contributor
  • ***
  • Posts: 2905
  • Country: gb
Re: looking for a Btree not recursive implementation, string keys
« Reply #32 on: December 21, 2017, 10:39:57 am »
Sure, you could pre-allocate a bunch of nodes as global/static (or on disk), and then check for running out of them ...

malloc() is not allowed, but custom-made dynamic allocators as you have just described are Ok? This is certainly a very strange set of rules.

Exactly so.
It's not totally mad - code often gets written as though malloc() and friends cannot fail. On larger systems this might not be unreasonable (for various reasons) but in an embedded system with limited resources (and no swap file) it is more likely and the consequences of not handling it gracefully might be more serious.

Pre-allocating and drawing from a pool of fixed-size/fixed-use objects makes it more obvious to the programmer that the resource is finite and more likely that the code will handle running out of that resource gracefully.

It also compartmentalises the issue so one bit of functionality running out of resource does not necessarily stop other system functions.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #33 on: December 21, 2017, 02:42:32 pm »
Code: [Select]
p: populate the btree with {0..19} keys
i: insert a coin and value (value = coin)
f: find the value associated with a coin
d: delete a coin and its associated value
l: show coins of the leaves
> p
> l
0 1 2 3 4  | 5 6 7 8 9  | 10 11 12 13 14  | 15 16 17 18 19
> f 9
btree_coin_find_leaf: [[5 10 15 ]] 1 -> Leaf [5 6 7 8 9 -> Record
coin 9, value 9.
> d 9
btree_coin_find_leaf: [[5 10 15 ]] 1 -> Leaf [5 6 7 8 9 -> Record
> l
0 1 2 3 4 5 6 7 8  | 10 11 12 13 14  | 15 16 17 18 19

The first (demo) draft version  :D
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #34 on: December 21, 2017, 02:44:34 pm »
I am using integer-coins for simplicity, I will replace methods for string-coins  :D

First step: replace pointers to memory, with pointers to disk's block.
 

Offline alank2

  • Super Contributor
  • ***
  • Posts: 2185
Re: looking for a Btree not recursive implementation, string keys
« Reply #35 on: December 21, 2017, 02:50:16 pm »
So you can visit every data item sequentially without bothering your pretty head keeping a stack of index nodes.
Waste of disk space (and program code to maintain the pointer) if you ask me, unless you have so little RAM you can't keep one or two index nodes in memory at the same time as a leaf node.

I agree; I don't see the logic in it at all.  Just seems like another pointer to have to maintain with very little payoff.  You built the tree to navigate the list, just use it.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #36 on: December 21, 2017, 02:51:36 pm »
it may be the best not to do balancing at all, do a little bit of a balancing, or do a lot of balancing. This is more art than science.

Yup, I have designed the core of the library in a way there are chances to tune it on specific needs. I haven't already got any more constraints from my customer, so I have to be generic at this step, in fact in the above demo there is a sub-optimal balancing-algorithm, doing just a little bit of a balancing  :-//
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #37 on: December 21, 2017, 03:10:38 pm »
You built the tree to navigate the list, just use it.

the "next" field is useful for the iterator!

You call the item_get function, and this function read a leaf node, and get the next value available if it's available, if not ... it points to the NEXT leaf.

NODE0 (leaves) --next--> NODE1 (leaves) --next--> NULL

Navigating the whole tree by internal nodes ( != leaves ) requires MORE steps ( = more seeks ) and it's also more complex to be implemented in a sort of interactive way.


In my DEMO, every items listing is performed by a loop

Code: [Select]
done = btree_leaves_list_init(p_btree);
while (done isEqualTo False)
{
    is_done = btree_leaves_list_item_get(p_btree);
    btree_leaves_list_item_show(p_btree);
}

e.g. leaves list

Code: [Select]
> l
0 1 2 3 4 5 6 7 8 9 | 10 11 12 13 14  | 15 16 17 18 19
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: looking for a Btree not recursive implementation, string keys
« Reply #38 on: December 21, 2017, 04:06:51 pm »
You built the tree to navigate the list, just use it.

the "next" field is useful for the iterator!

You call the item_get function, and this function read a leaf node, and get the next value available if it's available, if not ... it points to the NEXT leaf.

NODE0 (leaves) --next--> NODE1 (leaves) --next--> NULL

Navigating the whole tree by internal nodes ( != leaves ) requires MORE steps ( = more seeks ) and it's also more complex to be implemented in a sort of interactive way.

Assuming your internal nodes have about 100 keys/pointers in them (e.g. 32 byte keys and 8 byte pointers in a 4 KB node), it's only about 1% MORE seeks. You'd struggle to notice it.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #39 on: December 21, 2017, 04:37:06 pm »
but nodes need to rewritten to have string-coins (keys) of 100 chars each, and a disk block size is 1Kbyte.

Not so many, in the final code  :-//
 

Offline rhb

  • Super Contributor
  • ***
  • Posts: 3483
  • Country: us
Re: looking for a Btree not recursive implementation, string keys
« Reply #40 on: December 22, 2017, 06:48:30 pm »
It sounds to me as if the actual requirement is fast table search in a provably bounded time.  This has then been translated into "build us a btree" by a PHB  who doesn't understand the impact of the other  rules but has heard of btrees.

Consider using a linked list implemented as a table  of compile time defined size and  use a binary search.  You can give a hard real time bound on access and an error return if data exceeds the size of the linked list.  This provides a provably correct mechanism that cannot fail without a clear error condition.

The requirement not to use dynamic memory makes a great deal of sense in an embedded environment.  Especially if the "btree" is being used for a "would be nice" feature and is part of a "must work correctly" system such as the control system for an airplane.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #41 on: December 22, 2017, 08:02:19 pm »
Especially if the "btree" is being used for a "would be nice" feature and is part of a "must work correctly" system such as the control system for an airplane.

must work correctly, but "must" is under the Level D definition, i.e. if it fails nobody dies, just the mission is failed  ;)

A level-A (usually involved with supersonic fly boards, i.e. Mach 1, Mach 2, etc military airplanes) would be much more critical, since if it fails people dies and the mission is of course failed and this level of failure may drag you under a kind of martial court, and being fired from the job is the last of your problems. But my user-level is not enough classified to give an eye, thus I don't worry too much. Oh, and a Level-A badge has access to every toilette in every floor of the building as well as a free unlimited access to the topgun coffee machines, whose quality is really the topgun and the choice include Decaffeinated, Not Decaffeinated, coffee + milk, coffee + chocolate, etc, and unlimited free access to the swimming pool; while with my shitty level-D ... the coffee is offered by a shitty coffee machine with a coin of 0.50E per cup, you have only two choices (Decaffeinated, Not Decaffeinated), and accessing the swimming pool is rather giving them an eye and a leg in daily payment, plus the only toilette I can open with the badge is located downstairs.

That means: I don't worry too much on the strategy/whatever, they (PHP-badged guys) can have what they ask and pay, it's not my responsibility, otherwise I want a level-B/level-A badge!  :D
« Last Edit: December 22, 2017, 08:15:59 pm by legacy »
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #42 on: December 22, 2017, 08:07:18 pm »
Consider using a linked list implemented as a table  of compile time defined size and  use a binary search.  You can give a hard real time bound on access and an error return if data exceeds the size of the linked list.  This provides a provably correct mechanism that cannot fail without a clear error condition.

I can try to offer them an alternative solution, but ... I am afraid of the their reaction, since they would be happy to make me also acquiring the responsibility of the design, which is ... huge an unpaid hazard, and I don't want to say that I am so good at software designing  :-//
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #43 on: December 22, 2017, 08:29:51 pm »
table  of compile time

ummm, there is an hidden constraint that sounds of run-time. May be they want to be able to update their mission's maps on demand during the mission(1). I haven't got told directly, but I see a note about a radio-link that can access the FeRAM disk to write new data chunk. Thus the B+tree request might make sense  :-//

(1) I have to sign the code, as well as I have to sign the test report, but if my algorithm fails then the map is corrupted, and this may compromise the mission. This map is not related to fly map, thus there is no probability to cause a disaster in term of an airplane that crashes due to a software bug. In the worst case the pilot has to turn the plane on its tail and return home. Of course someone may be angry with me and asking me to pay the fuel, which is not so cheap as car's petrol (but I have an insurance for that).
 

Offline rhb

  • Super Contributor
  • ***
  • Posts: 3483
  • Country: us
Re: looking for a Btree not recursive implementation, string keys
« Reply #44 on: December 22, 2017, 11:57:56 pm »
Mmmm. Forgive my saying this, but perhaps you should be doing something else.

The reason that recursive routines and dynamic memory allocation are proscribed is that the run time behavior is unpredictable.  An error in a calling routine can cause resource exhaustion.  A table based linked list of predetermined size either works or says to the caller, "table full".  Unit test validation is easy to write.

If you can't write a linked list table using bisection search you need to get a copy of Sedgewick or the MIT algorithm book and read it cover to cover before the start of the new year.  I can't tell you any details about B-trees.  I have never had an occasion to need to use one.  But I have the references and can come up to speed in a few days at most.  That is what professional programming is about.  Anything less is sub-standard.  You don't need to know anything off the top of your head.  But you do need to be sufficiently familiar with all aspects of the field and have the references so that you can come up to speed in a few days.

An important aspect of professional programming is being able to translate a vague or garbled requirement into a proper specification.  It is not the users responsibility to do this.  It is the programmer's.  "Oh, you want wheels with your car?  We can do that.  You want a steering wheel and brake?  We can do that too, but we'll have to slip the schedule a bit and it will cost extra."  The programmer's job is to translate between human beings and machines.

FWIW I did a port of 500,000 lines of VAX FORTRAN code from VMS to Unix in the early 90's.  At the suggestion of another contractor on the project, I built a regression test facility.  Every time we wrote a new module we wrote a basic test suite for it.  Every time we found a bug in the old code, we wrote a test for it.  Every build ran hundreds of test cases and compared thousands of output files.  This package was used by affiliates of a major oil company all over the world.  After the first year I was tasked with writing up the release notes for an enhanced version.  We had fewer than a dozen user submitted bug reports.  It went down from there.  There were lots of bugs, but they did not get out the door.  We found them first.  After a merger the code ran completely unsupported for another 10-15 years.  It finally went off line when there was no longer any need for the functionality.  There were a couple of 15,000 line libraries which I wrote  which never needed any maintenance except when Sun Microsystems broke getcwd(3c) and did not set errno correctly.  So the logic was changed to check if the string returned was null instead of checking errno.  The really cool code was a set of C routines called from FORTRAN which parsed user defined input.  It built a table of function pointers on the first call.  After that it simply executed the table of functions.  This got us a 6x speed up, which for input files with millions of lines of ASCII text was a huge improvement.   I sweated bullets over that code because if I got it wrong it would have been hell to debug.  It worked perfectly.  Of all the work I've done, that project is the thing I am most proud of.  Very few people can claim that level of reliability in their code.
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: looking for a Btree not recursive implementation, string keys
« Reply #45 on: December 23, 2017, 10:31:59 am »
Quote
If you can't write a linked list table using bisection search...
An important aspect of professional programming is being able to translate a vague or garbled requirement into a proper specification.
I don't think "can't" is the problem, and I don't think the requirement was either "vague" or "garbled."  It was "write some B-tree code that meets our coding standards."  Now, that may have been an inappropriate level of detail from the "user", but I think you're talking about an industry with many layers of bureaucracy and I doubt that the "programmer" gets to change the requirement.   It was probably set by some "systems engineer", "data analyst" or "scientist" and handed down, like in the old days...
 

Offline rhb

  • Super Contributor
  • ***
  • Posts: 3483
  • Country: us
Re: looking for a Btree not recursive implementation, string keys
« Reply #46 on: December 23, 2017, 03:26:51 pm »
Quote
If you can't write a linked list table using bisection search...
An important aspect of professional programming is being able to translate a vague or garbled requirement into a proper specification.
I don't think "can't" is the problem, and I don't think the requirement was either "vague" or "garbled."  It was "write some B-tree code that meets our coding standards."  Now, that may have been an inappropriate level of detail from the "user", but I think you're talking about an industry with many layers of bureaucracy and I doubt that the "programmer" gets to change the requirement.   It was probably set by some "systems engineer", "data analyst" or "scientist" and handed down, like in the old days...

I always managed to change the requirement if such an issue came up.  I simply quoted chapter and verse until everyone's  eyes glazed over. However, you *do* have to be able to explain the conflicting requirement in excruciating detail to at the level of a 5th grader.  I did have one fool who pounded on the table and screamed at me in a meeting when I did this.  Everyone was quite shocked.  I just sat there.  He did get his way, but the application was a failure and was discarded a few months later after a merger.  I learned from industry friends that the acquiring company  kept the scientific libraries I wrote, but junked the rest.

However, I am an outlier as my training was as a scientist.  Programming was just something I picked up because I needed to be able to do it.  So I was on the same level as the senior scientists in the application domain.  And much more knowledgeable in the CS domain.

It comes down to  simply this.  If someone requests dry water, you are obligated to explain why that's not possible.  And do so before you start work, not 6 months later.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: looking for a Btree not recursive implementation, string keys
« Reply #47 on: December 23, 2017, 04:16:27 pm »
I always managed to change the requirement if such an issue came up.

I just sat there.  He did get his way ...

These two statements contradict each other :)
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: looking for a Btree not recursive implementation, string keys
« Reply #48 on: December 23, 2017, 05:23:40 pm »
I doubt that the "programmer" gets to change the requirement.   It was probably set by some "systems engineer", "data analyst" or "scientist" and handed down, like in the old days...

Yes, as "programmer" with a Level-C badge I can neither do criticism. I am an executor. The only things they accept is a note for some requirements that are not clear (and I have to say "they are not clear TO me, can't say they are wrong or too vague).

I know that part of the design comes from someone with an higher user-level than me, i.g.  "systems engineer", "data analyst", "scientist" ... who knows? I can't interface directly with them, from my console their comments are not catch directly, I only get echos from the information propagation  - from my question, to the answer - through the rigid hierarchy, and it's always a one-way divine revelation  from outer space entities with a god certificate.

Anyway, my email can't physically reach their server because even the internal lan needs a user-level to operate. I can interface only to a guy in the middle, who will interface with someone higher than me in the hierarchy, who will interface to someone higher in the hierarchy, .... until someone who has a user-level badget that allows him to enter into the divine circle of those who do "the design"!

The development chain follows the command chain, thus it is unable to bend or be forced out of shape. Not flexible, and forget democracy. Someone gives orders, and you have to obey. Someone gives requirements, and you have to code.

I usually don't know more details than how many they want to give (usually ... a few, and it's not useful to do your job, but you can't question), I only know that they use the tool "Stood" as support for AADL design, and this tool uses Ada and artificial intelligence that implies tons of statistical data used in the process  :-//
« Last Edit: December 23, 2017, 05:26:13 pm by legacy »
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf