T-tree

In computer science a T-tree is a type of binary tree data structure that is used by main-memory databases, such as Datablitz, EXtremeDB, MySQLCluster, Oracle TimesTen and MobileLite.

A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on block oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them.

T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.

The ‘T’ in T-tree refers to the shape of the node data structures in the original paper which first described this type of index.

Performance

Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware.

The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access.

Node structures

A T-tree node usually consists of pointers to the parent node, the left and right child node, an ordered array of data pointers and some extra control data. Nodes with two subtrees are called internal nodes, nodes without subtrees are called leaf nodes and nodes with only one subtree are named half-leaf nodes. A node is called the bounding node for a value if the value is between the node’s current minimum and maximum value, inclusively.

For each internal node, leaf or half leaf nodes exist that contain the predecessor of its smallest data value (called the greatest lower bound) and one that contains the successor of its largest data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of data elements from one to the maximum size of the data array. Internal nodes keep their occupancy between predefined minimum and maximum numbers of elements.

1) T-Tree의 노드

내부 노드(Internal Node)
- 오른쪽, 왼쪽 서브 트리를 가짐. 내부 노드에 포함될 수 있는 데이터의 개수는 T트리 생성할 때 결정
하프리프 노드(Half-leaf Node)
- 방향과 상관 없이 한쪽 서브 트리만을 가지며 하나의 자식 포인터만 가짐
리프 노드(Leaf Node)
- 자식포인터가 하나도 없음

2) 각 내부노드 A(개념도 참조)에 대해, 그 노드의 최소값보다 작은 값을 가지는 서브트리는 트리의 좌측 서브트리가 되고, 반대로 그 노드의 최대값보다 큰 값을 가지는 서브트리는 우측 서브트리가 됨

3) GLB(Greatest Lower Bound)

내부 노드A의 좌측 서브트리 값 중에서 A의 최소값과 가장 근접한 값

4)LUB(Lowest Upper Bound)

A의 우측 서브트리 값 중에서 A의 최대값과 가장 근접한 값

T-Tree Algorithm

가. 검색 알고리즘

T-Tree에서의 검색은 이진 트리에서의 검색과 유사하나 이진 트리에서는 한 개의 값만을 비교하고, T-Tree에서는 그 노드의 가장 작은 값과 가장 큰 값을 비교

검색은 항상 트리의 루트부터 시작
만약 검색하려는 값이 그 노드의 가장 작은 값보다 작으면 왼쪽 서브 트리로 이동하여 계속해서 검색함.
그렇지 않고 만약 검색 값이 그 노드의 가장 큰 값보다 크면 오른쪽 트리로 이동하여 계속 검색함.
그렇지 않으면 현재 노드에서 찾음.

나. 삽입 알고리즘

새로운 값이 삽입되고 나면 균형(Balance)인지를 검사하여 만약 삽입 후 unbalance하면 적당한 rebalance연산 수행

삽입할 위치 검색
만약 삽입할 노드가 발견되면 삽입할 수 있는 여분의 방이 있는지를 검사하여 적당한 여분의 방이 있으면 삽입하고 종료.
그렇지 않으면 가장 작은 아이템을 제거하고 삽입하여는 값을 삽입한 후 그 노드 단말 노드에 제거한 값을 삽입
만약 트리가 비어 있고 삽입할 값의 경계가 없으면 가장 나중에 삽입. 만약 이 삽입된 값이 적당하면 그 값이 새로운 가장 작은 값이 되든지 아니면 가장 큰 값이 될 수 있음.
그렇지 많으면 새로운 노드를 생성
만약 새로운 노드가 첨가되면 단말 노드부터 루트까지의 경로에 대한 balancing검사

Stop Thinking, Just Do!

T tree

Tags

19 February 2018

T-tree

Performance

Node structures

T-Tree Algorithm