| 1 | |
|---|
| 2 | \documentclass[10pt]{article} % FORMAT CHANGE |
|---|
| 3 | \usepackage[dvips]{graphicx} |
|---|
| 4 | \usepackage{times} |
|---|
| 5 | |
|---|
| 6 | \graphicspath{{./}{figs/}} |
|---|
| 7 | |
|---|
| 8 | \pagestyle{plain} |
|---|
| 9 | |
|---|
| 10 | \addtolength{\hoffset}{-2cm} |
|---|
| 11 | \addtolength{\textwidth}{4cm} |
|---|
| 12 | |
|---|
| 13 | \addtolength{\voffset}{-1.5cm} |
|---|
| 14 | \addtolength{\textheight}{3cm} |
|---|
| 15 | |
|---|
| 16 | \setlength{\parindent}{0pt} |
|---|
| 17 | \setlength{\parskip}{11pt} |
|---|
| 18 | |
|---|
| 19 | \title{Trove DBPF Handle Allocator } |
|---|
| 20 | \author{PVFS Development Team} |
|---|
| 21 | |
|---|
| 22 | \begin{document} |
|---|
| 23 | \maketitle |
|---|
| 24 | \begin{verbatim}$Id: handle-allocator.tex,v 1.1.72.4 2010-06-19 00:01:12 nlmills Exp $\end{verbatim} |
|---|
| 25 | |
|---|
| 26 | \section{Introduction} |
|---|
| 27 | |
|---|
| 28 | The Trove interface gives out handles -- unique identifiers to trove |
|---|
| 29 | objects. In addition to being unique, handles will not be reused within |
|---|
| 30 | a configurable amount of time. These two constraints make for a handle |
|---|
| 31 | allocator that ends up being a bit more complicated than one might |
|---|
| 32 | expect. Add to that the fact that we want to serialize on disk all or |
|---|
| 33 | part of the handle allocator's state, and here we are with a document to |
|---|
| 34 | explain it all. |
|---|
| 35 | |
|---|
| 36 | \subsection{Data Structures} |
|---|
| 37 | \subsubsection{Extents} |
|---|
| 38 | We have a large handle space we need to represent efficiently. This |
|---|
| 39 | approach uses extents: |
|---|
| 40 | \begin{verbatim} |
|---|
| 41 | |
|---|
| 42 | struct extent { |
|---|
| 43 | int64_t first; |
|---|
| 44 | int64_t last; |
|---|
| 45 | }; |
|---|
| 46 | |
|---|
| 47 | \end{verbatim} |
|---|
| 48 | |
|---|
| 49 | \subsubsection{Extent List} |
|---|
| 50 | We keep the extents (not nescessarily sorted) in the \texttt{extents} |
|---|
| 51 | array. For faster searches, \texttt{index} keeps an index into |
|---|
| 52 | \texttt{extents} in an AVL tree. |
|---|
| 53 | In addition |
|---|
| 54 | to the extents themselves, some bookkeeping members are added. The most |
|---|
| 55 | important is the \texttt{timestamp} member, used to make sure no handle in |
|---|
| 56 | its list gets reused before it should. \texttt{\_\_size} is only used |
|---|
| 57 | internally, keeping track of how big \texttt{extents} is. |
|---|
| 58 | |
|---|
| 59 | \begin{verbatim} |
|---|
| 60 | struct extentlist { |
|---|
| 61 | int64_t __size; |
|---|
| 62 | int64_t num_extents; |
|---|
| 63 | int64_t num_handles; |
|---|
| 64 | struct timeval timestamp; |
|---|
| 65 | struct extent * extents; |
|---|
| 66 | }; |
|---|
| 67 | \end{verbatim} |
|---|
| 68 | |
|---|
| 69 | \subsubsection{Handle Ledger} |
|---|
| 70 | We manage several lists. The \texttt{free\_list} contains all the valid |
|---|
| 71 | handles. The \texttt{recently\_freed\_list} contains handles which have been |
|---|
| 72 | freed, but possibly before some expire time has passed. The |
|---|
| 73 | \texttt{overflow\_list} holds freed handles while items on the |
|---|
| 74 | \texttt{recently\_freed\_list} wait for the expire time to pass. |
|---|
| 75 | |
|---|
| 76 | We save our state by writing out and reading from the three |
|---|
| 77 | \texttt{TROVE\_handle} members, making use of the higher level trove |
|---|
| 78 | interface. |
|---|
| 79 | \begin{verbatim} |
|---|
| 80 | struct handle_ledger { |
|---|
| 81 | struct extentlist free_list; |
|---|
| 82 | struct extentlist recently_freed_list; |
|---|
| 83 | struct extentlist overflow_list; |
|---|
| 84 | FILE *backing_store; |
|---|
| 85 | TROVE_handle free_list_handle; |
|---|
| 86 | TROVE_handle recently_freed_list_handle; |
|---|
| 87 | TROVE_handle overflow_list_handle; |
|---|
| 88 | } |
|---|
| 89 | \end{verbatim} |
|---|
| 90 | |
|---|
| 91 | \section{Algorithm} |
|---|
| 92 | \subsection {Assigning handles} |
|---|
| 93 | Start off with a \texttt{free\_list} of one big extent encompassing the |
|---|
| 94 | entire handle space. |
|---|
| 95 | \begin{itemize} |
|---|
| 96 | \item Get the last extent from the \texttt{free\_list} (We hope getting |
|---|
| 97 | the last extent improves the effiency of the extent representation) |
|---|
| 98 | \item Save \texttt{last} for later return to the caller |
|---|
| 99 | \item Decrement \texttt{last} |
|---|
| 100 | \item if $ first > last $, mark the extent as empty. |
|---|
| 101 | \end{itemize} |
|---|
| 102 | |
|---|
| 103 | \subsection{returning handles} |
|---|
| 104 | \begin {itemize} |
|---|
| 105 | \item when the first handle is returned, it gets added to the |
|---|
| 106 | \texttt{recently\_freed} list. Because this is the first item on that |
|---|
| 107 | list, we check the time. |
|---|
| 108 | \item now we add more handles to the list. we check the time after $N$ handles are returned and update the timestamp. |
|---|
| 109 | \item Once we have added $H$ handles, we decide the \texttt{recently\_freed} |
|---|
| 110 | list has enough handles. We then start using the |
|---|
| 111 | \texttt{overflow\_list} to hold returned handles. |
|---|
| 112 | \item as with the \texttt{recently\_freed} list, we record the time that |
|---|
| 113 | this handle was added, updating the timestamp after every $N$ |
|---|
| 114 | additions. We also check how old the \texttt{recently\_freed} list is. |
|---|
| 115 | \item at some point in time, the whole \texttt{recently\_freed} list is ready |
|---|
| 116 | to be returned to the \texttt{free\_list}. The \texttt{recently\_freed} |
|---|
| 117 | list is merged into the \texttt{free\_list}, the \texttt{overflow\_list} |
|---|
| 118 | becomes the \texttt{recently\_freed} list and the \texttt{overflow\_list} |
|---|
| 119 | is empty. |
|---|
| 120 | \end{itemize} |
|---|
| 121 | |
|---|
| 122 | \subsection{I don't know what to call this section} |
|---|
| 123 | |
|---|
| 124 | Let $T_{r}$ be the minimum response time for an operation of any sort, |
|---|
| 125 | $T_{f}$ be the time a handle must sit before being moved back to the free list, and $N_{tot}$ be the total number of handles available on a server. |
|---|
| 126 | |
|---|
| 127 | The pathological case would be one where a caller |
|---|
| 128 | \begin{itemize} |
|---|
| 129 | \item fills up the \texttt{recently\_freed} list |
|---|
| 130 | \item immediately starts consuming handles as quickly as possible to make for |
|---|
| 131 | the largest possible \texttt{recently\_freed} list in the next pass |
|---|
| 132 | \end{itemize} |
|---|
| 133 | |
|---|
| 134 | This results in the largest number of handles being unavailable due to sitting |
|---|
| 135 | on the \texttt{overflow\_list}. Call $N_{purg}$ the number of handles waiting |
|---|
| 136 | in ``purgatory'' ( waiting for $T_{f}$ to pass) |
|---|
| 137 | \begin{equation} |
|---|
| 138 | N_{purg} = T_{f} / T_{r} |
|---|
| 139 | \end{equation} |
|---|
| 140 | |
|---|
| 141 | \begin{equation} |
|---|
| 142 | F_{purg} = N_{purg} / N_{tot} |
|---|
| 143 | \end{equation} |
|---|
| 144 | \begin{equation} |
|---|
| 145 | F_{purg} = T_{f} / (T_{r} * N_{tot}) |
|---|
| 146 | \end{equation} |
|---|
| 147 | |
|---|
| 148 | We should try to collect statistics and see what $T_{r}$ and $N_{purg}$ end up being for real and pathological workloads. |
|---|
| 149 | |
|---|
| 150 | |
|---|
| 151 | \end{document} |
|---|
| 152 | % vim: tw=72 |
|---|