Why Dictionary is Great - Dynamic List with Indices
I needed a list of objects that could be searched very quickly. I wanted to look up objects by ID, and also by property (e.g., person.City).
So I decided to write an enumerable container of objects with automatic indices. Out came the following code:
It's short and sweet (if you ignore the Blogger butchery). The idea is simple: we automatically create indexes for common queries. Those common queries are chosen to be the properties of the object.
The final magic is in the SelectEq. It takes the name of the property (as a string unfotunately) grabs the right index for that property (_indices[propName]), then grabs the index of the item whose property is equal to the queried value, then returns that item. Phew... Did you get all that?
At first, I didn't think I would be using dictionaries for the indices. I thought something like SortedDictionary would be better. I mean, indices are supposed to be b-trees, not has tables. But Dictionary drastically outperformed SortedList and SortedDictionary. But there's one small problem...
Dictionary only stores unique keys. That means that the indices require that their associated property be unique (in order for that index to work). This is fine if we only want one result. But imagine a query like table.SelectEq("LastName", "Smith"). Here we expect more than one answer. The dictionary can't help us (without making the value of the index a list). Here, the b-trees can. So I guess I need to write more performance tests to see which performs best: b-trees or dictionaries with lists.
There are more possibilities too. With only a little work, new (dynamic) indices can be created based upon a passed-in comparison predicate. That would just be wonderful.
[Update - Some more thoughts]
I don't like the fact that the SelectEq takes a string. Strings don't change when identifiers are renamed. If comparison predicates are used, then we can just pass in the delegate as the select field.
For example, let's pretend we had people and we want an index on their full name:
Now we can, hypothetically, create an index and begin querying:
Yeah, that looks almost right.
So I decided to write an enumerable container of objects with automatic indices. Out came the following code:
using Map = System.Collections.Generic.Dictionary<object,int>;
class Table<t> : IEnumerable<t>
{
List<t> _data;
Dictionary<string,Map> _indices;
PropertyInfo[] _pis;
public Table()
{
_data = new List<t>();
_indices = new Dictionary<string,Map>();
_pis = typeof(T).GetProperties();
foreach (PropertyInfo pi in _pis) {
_indices.Add(pi.Name, new Map());
}
}
public T Add(T value)
{
_data.Add(value);
int index = _data.Count - 1;
foreach (PropertyInfo pi in _pis) {
_indices[pi.Name][pi.GetValue(value, null)] = index;
}
return value;
}
public IEnumerator<t> GetEnumerator()
{
return _data.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
public T SelectEq(string propName, object v)
{
return _data[_indices[propName][v]];
}
}
It's short and sweet (if you ignore the Blogger butchery). The idea is simple: we automatically create indexes for common queries. Those common queries are chosen to be the properties of the object.
The final magic is in the SelectEq. It takes the name of the property (as a string unfotunately) grabs the right index for that property (_indices[propName]), then grabs the index of the item whose property is equal to the queried value, then returns that item. Phew... Did you get all that?
At first, I didn't think I would be using dictionaries for the indices. I thought something like SortedDictionary would be better. I mean, indices are supposed to be b-trees, not has tables. But Dictionary drastically outperformed SortedList and SortedDictionary. But there's one small problem...
Dictionary only stores unique keys. That means that the indices require that their associated property be unique (in order for that index to work). This is fine if we only want one result. But imagine a query like table.SelectEq("LastName", "Smith"). Here we expect more than one answer. The dictionary can't help us (without making the value of the index a list). Here, the b-trees can. So I guess I need to write more performance tests to see which performs best: b-trees or dictionaries with lists.
There are more possibilities too. With only a little work, new (dynamic) indices can be created based upon a passed-in comparison predicate. That would just be wonderful.
[Update - Some more thoughts]
I don't like the fact that the SelectEq takes a string. Strings don't change when identifiers are renamed. If comparison predicates are used, then we can just pass in the delegate as the select field.
For example, let's pretend we had people and we want an index on their full name:
string SortableFullName(this IPerson p) {
return p.LastName + ", " + p.FirstName;
}
Now we can, hypothetically, create an index and begin querying:
Table<IPerson> people = ...;
people.AddIndex(SortableFullName);
people.SelectEq(SortableFullName, "Doe, John");
Yeah, that looks almost right.

Reader Comments
Post a Comment