![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| Welcome to Windows Vista Forums. Our forum is dedicated to helping you find solutions with any problems, errors or issues you are experiencing with Windows Vista. The Vista forum also covers news and updates and has an extensive Windows Vista tutorial section that covers a wide range of tips and tricks. |
| |||||||
![]() |
| |
| | #1 (permalink) |
| | List<> versus HashSet<>: Which is more efficient for lookups on small sets? Assume I have a set of items, integers let's say, and I need to determine if a particular integer exists within the set. If the list contained 10,000 items, clearly I'd want to use some type of dictionary class (Dictionary<> or HashSet<> or SortedList<>) instead of a List<>. If the list only contained two or three items, a simple array or List<> would probably be more efficient. Is there some general rule that dictates the size at which one would be better off using a dictionary-based class as opposed to an array or List<> for lookups? Thanks, Jules |
My System Specs![]() |
| | #2 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookupson small sets? Jules Winfield wrote: Quote: > Assume I have a set of items, integers let's say, and I need to determine if > a particular integer exists within the set. > > If the list contained 10,000 items, clearly I'd want to use some type of > dictionary class (Dictionary<> or HashSet<> or SortedList<>) instead of a > List<>. If the list only contained two or three items, a simple array or > List<> would probably be more efficient. Is there some general rule that > dictates the size at which one would be better off using a dictionary-based > class as opposed to an array or List<> for lookups? > heavily dependent on specific use (not to mention processor architecture, cache sizes and all sorts of other things you really don't want to be bothering with in general). I usually put the cutoff at 100 items as an order of magnitude approximation (i.e. if I know it will never be more than 100 I use a list), but I don't kid myself thinking that this will always yield optimal results. The important thing to keep in mind is scalability. Assuming that a collection will always be small is a dangerous thing to do and requires that one can show why not only why this is so but also why it pays off to count on it. It's obviously much more costly to gamble wrong when you assume collections will be small than it is to gamble wrong when you assume they'll be big. -- J. |
My System Specs![]() |
| | #3 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Jules Winfield wrote: Quote: > Assume I have a set of items, integers let's say, and I need to determine if > a particular integer exists within the set. > > If the list contained 10,000 items, clearly I'd want to use some type of > dictionary class (Dictionary<> or HashSet<> or SortedList<>) instead of a > List<>. If the list only contained two or three items, a simple array or > List<> would probably be more efficient. Is there some general rule that > dictates the size at which one would be better off using a dictionary-based > class as opposed to an array or List<> for lookups? the right API and, if possible, performance curve (i.e. big-Oh) for your data set. The right API is more important: you can replace the implementation later if measurements show that it matters. For cases where it really does matter, you need to measure. From my measurements (microbenchmarks with a hot cache), the switchover to Dictionary<> or HashSet<> being more efficient than a simple array comes pretty quickly, and they're not particularly slow to start with (normally beating List<int> on almost anything): Total times for 1310720 iterations of 2 probes in collections of size 4: List<int> (linear scan) 0.186 sec List<int> (binary search) 0.231 sec HashSet<int> 0.140 sec Dictionary<int,int> 0.154 sec int[] (manual inline scan) 0.109 sec int[] (Array.IndexOf<int>) 0.154 sec int[] (Array.BinarySearch<int>) 0.190 sec Total times for 327680 iterations of 10 probes in collections of size 30: List<int> (linear scan) 0.412 sec List<int> (binary search) 0.330 sec HashSet<int> 0.165 sec Dictionary<int,int> 0.181 sec int[] (manual inline scan) 0.183 sec int[] (Array.IndexOf<int>) 0.250 sec int[] (Array.BinarySearch<int>) 0.285 sec Total times for 327680 iterations of 10 probes in collections of size 100: List<int> (linear scan) 0.903 sec List<int> (binary search) 0.392 sec HashSet<int> 0.170 sec Dictionary<int,int> 0.181 sec int[] (manual inline scan) 0.338 sec int[] (Array.IndexOf<int>) 0.382 sec int[] (Array.BinarySearch<int>) 0.353 sec Bear in mind that linear scans through an array are probably slightly kinder on the cache than HashSet, but linear scans only scale linearly. HashSet is probably quite a bit kinder on cache than (e.g.) SortedDictionary, but of course, it's not sorted. * List<int> tests done with IndexOf & BinarySearch * manual inline scan is a hand-coded for-loop * The numbers above were gotten by doubling the iteration count, (starting with 10), until the fastest time was greater than 1 ms. * The profiling infrastructure is a constant baked into every time - it includes calling through a delegate * Times are best run of 5 on a Q6600 @ 2.8GHz, .NET 3.5, release mode. -- Barry -- http://barrkel.blogspot.com/ |
My System Specs![]() |
| | #4 (permalink) |
| | RE: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Hi Jules, For such performance issue, it's hard to give a definite number or boundary based on appearnce, I agree other members's suggestion on profiling the application to get the actual result. For general rules, if the count of elements is a small fixed number(such as less than 100 and will not increase ), it's ok to use simple List or Array since it just take little constant time. However, if the count may vary(in large range) and the search is performing frequently against it, use a more search-ready structure(such as binary sort tree or hashtable) is preferred. I suggest you have a look at the book "Programming Pearls" which may give you some good ideas on this. Sincerely, Steven Cheng Microsoft MSDN Online Support Lead Delighting our customers is our #1 priority. We welcome your comments and suggestions about how we can improve the support we provide to you. Please feel free to let my manager know what you think of the level of service provided. You can send feedback directly to my manager at: msdnmg@xxxxxx. ================================================== Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscripti...ult.aspx#notif ications. ================================================== This posting is provided "AS IS" with no warranties, and confers no rights. -------------------- Quote: >NNTP-Posting-Date: Sun, 06 Apr 2008 10:53:04 -0500 >From: "Jules Winfield" <Jules.Winfield@xxxxxx> >Newsgroups: microsoft.public.dotnet.general >Subject: List<> versus HashSet<>: Which is more efficient for lookups on Quote: >Date: Sun, 6 Apr 2008 08:53:04 -0700 Quote: >Assume I have a set of items, integers let's say, and I need to determine Quote: >a particular integer exists within the set. > >If the list contained 10,000 items, clearly I'd want to use some type of >dictionary class (Dictionary<> or HashSet<> or SortedList<>) instead of a >List<>. If the list only contained two or three items, a simple array or >List<> would probably be more efficient. Is there some general rule that >dictates the size at which one would be better off using a Quote: >class as opposed to an array or List<> for lookups? > >Thanks, > >Jules > > > |
My System Specs![]() |
| | #5 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Barry Kelly wrote: Quote: > Jules Winfield wrote: > Quote: > > Assume I have a set of items, integers let's say, and I need to > > determine if a particular integer exists within the set. > > > > If the list contained 10,000 items, clearly I'd want to use some > > type of dictionary class (Dictionary<> or HashSet<> or > > SortedList<>) instead of a List<>. If the list only contained two > > or three items, a simple array or List<> would probably be more > > efficient. Is there some general rule that dictates the size at > > which one would be better off using a dictionary-based class as > > opposed to an array or List<> for lookups? > A general rule? Don't micro-optimize, and use a library collection > with the right API and, if possible, performance curve (i.e. big-Oh) > for your data set. The right API is more important: you can replace > the implementation later if measurements show that it matters. HashSet has O(1) and List<T> has O(n) for looking up an element, so that's pretty clear. What I find interesting is that your benchmarks don't show this. -> Quote: > For cases where it really does matter, you need to measure. From my > measurements (microbenchmarks with a hot cache), the switchover to > Dictionary<> or HashSet<> being more efficient than a simple array > comes pretty quickly, and they're not particularly slow to start with > (normally beating List<int> on almost anything): > > Total times for 1310720 iterations of 2 probes in collections of size > 4: List<int> (linear scan) 0.186 sec > List<int> (binary search) 0.231 sec > HashSet<int> 0.140 sec > Dictionary<int,int> 0.154 sec > int[] (manual inline scan) 0.109 sec > int[] (Array.IndexOf<int>) 0.154 sec > int[] (Array.BinarySearch<int>) 0.190 sec internally an int[] construct to store the elements. So my question then is: what exactly did you lookup and where were these values located in the set of 4 possible elements? O(1) is real time, not amortized, for HashSet, but your test suggests otherwise (amortized). If your tests in general had the element to find in the first half of the linear list for example, it's not a fair benchmark. FB -- ------------------------------------------------------------------------ Lead developer of LLBLGen Pro, the productive O/R mapper for .NET LLBLGen Pro website: http://www.llblgen.com My .NET blog: http://weblogs.asp.net/fbouma Microsoft MVP (C#) ------------------------------------------------------------------------ |
My System Specs![]() |
| | #6 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Frans Bouma [C# MVP] wrote: Quote: > Barry Kelly wrote: > Quote: > > For cases where it really does matter, you need to measure. From my > > measurements (microbenchmarks with a hot cache), the switchover to > > Dictionary<> or HashSet<> being more efficient than a simple array > > comes pretty quickly, and they're not particularly slow to start with > > (normally beating List<int> on almost anything): > > > > Total times for 1310720 iterations of 2 probes in collections of size > > 4: List<int> (linear scan) 0.186 sec > > List<int> (binary search) 0.231 sec > > HashSet<int> 0.140 sec > > Dictionary<int,int> 0.154 sec > > int[] (manual inline scan) 0.109 sec > > int[] (Array.IndexOf<int>) 0.154 sec > > int[] (Array.BinarySearch<int>) 0.190 sec > In a way this is a bit surprising. Especially since a List<int> uses > internally an int[] construct to store the elements. > > So my question then is: what exactly did you lookup and where were > these values located in the set of 4 possible elements? O(1) is real > time, not amortized, for HashSet, but your test suggests otherwise > (amortized). larger values of n, some cache effects should start coming in. Quote: > If your tests in general had the element to find in the > first half of the linear list for example, it's not a fair benchmark. probably be skewed - but the total time shouldn't be high in any case. We are talking >1 million iterations in a fifth of a second or so. The code was using new Random(42).Next(dataset_count) as its series for numbers to probe for. Here's the full code I used: ---8<--- using System; using System.Collections.Generic; using System.Linq; using System.Diagnostics; class App { const int ProfileIterCount = 10; const int ProfileTestCount = 5; const double MinTime = 0.1; static double Profile(Action code, int iterCount) { Stopwatch w = Stopwatch.StartNew(); for (int i = 0; i < iterCount; ++i) code(); return w.ElapsedTicks / (double) Stopwatch.Frequency; } static double Profile(Action setup, Action code, int iterCount) { double bestResult = Double.PositiveInfinity; for (int i = 0; i < ProfileTestCount; ++i) { if (setup != null) setup(); double result = Profile(code, iterCount); if (result < bestResult) bestResult = result; } return bestResult; } class ProfileItem { public string Name { get; set; } public Action Code { get; set; } public Action Setup { get; set; } public double BestTime { get; set; } // Time will be at least 0.1 seconds, therefore we can get away // with three digits. public override string ToString() { return string.Format("{0,-35} {1,7:F3} sec", Name, BestTime); } } static int Profile(params ProfileItem[] items) { int iterCount = ProfileIterCount; too_fast: foreach (var item in items) { item.BestTime = Profile(item.Setup, item.Code, iterCount); if (item.BestTime < MinTime) { iterCount *= 2; goto too_fast; } } return iterCount; } static void Main(string[] args) { try { int itemCount = args.Length > 0 ? int.Parse(args[0]) : 5; int probeCount = args.Length > 1 ? int.Parse(args[1]) : itemCount / 2; List<int> intList = null; HashSet<int> intHash = null; Dictionary<int,int> intDict = null; int[] intArray = null; Random r = null; ProfileItem[] items = new[] { new ProfileItem() { Name = "List<int> (linear scan)", Setup = delegate { intList = new List<int>( Enumerable.Range(0, itemCount)); r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intList.Count); if (!intList.Contains(test)) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "List<int> (binary search)", Setup = delegate { intList = new List<int>( Enumerable.Range(0, itemCount)); r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intList.Count); if (intList.BinarySearch(test) < 0) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "HashSet<int>", Setup = delegate { intHash = new HashSet<int>( Enumerable.Range(0, itemCount)); r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intHash.Count); if (!intHash.Contains(test)) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "Dictionary<int,int>", Setup = delegate { intDict = new Dictionary<int,int>(); for (int i = 0; i < itemCount; ++i) intDict.Add(i, i); r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intDict.Count); if (!intDict.ContainsKey(test)) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "int[] (manual inline scan)", Setup = delegate { intArray = new int[itemCount]; for (int i = 0; i < itemCount; ++i) intArray[i] = i; r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intArray.Length); int foundIndex = -1; for (int j = 0; j < intArray.Length; ++j) if (intArray[j] == test) { foundIndex = j; break; } if (foundIndex < 0) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "int[] (Array.IndexOf<int>)", Setup = delegate { intArray = new int[itemCount]; for (int i = 0; i < itemCount; ++i) intArray[i] = i; r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intArray.Length); int foundIndex = Array.IndexOf(intArray, test); if (foundIndex < 0) Console.WriteLine("bad news"); } }, }, new ProfileItem() { Name = "int[] (Array.BinarySearch<int>)", Setup = delegate { intArray = new int[itemCount]; for (int i = 0; i < itemCount; ++i) intArray[i] = i; r = new Random(42); }, Code = delegate { for (int i = 0; i < probeCount; ++i) { int test = r.Next(intArray.Length); int foundIndex = Array.BinarySearch(intArray, test); if (foundIndex < 0) Console.WriteLine("bad news"); } }, }, }; int count = Profile(items); Console.WriteLine( "Total times for {0} iterations of {1} probes in collections of" + " size {2}:", count, probeCount, itemCount); for (int i = 0; i < items.Length; ++i) Console.WriteLine(items[i]); } catch (Exception ex) { Console.Error.WriteLine(ex.Message); } } } --->8--- It's a micro-benchmark with an overhead in its measuring, so the usual caveats apply. Usage: <.exe> [item-count] [probe-count] -- Barry -- http://barrkel.blogspot.com/ |
My System Specs![]() |
| | #7 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Frans Bouma [C# MVP] wrote: Quote: > Barry Kelly wrote: Quote: > > Total times for 1310720 iterations of 2 probes in collections of size > > 4: ---8<--- Total times for 1310720 iterations of 2 probes in collections of size 4: List<int> (linear scan) 0.174 sec List<int> (binary search) 0.213 sec HashSet<int> 0.133 sec Dictionary<int,int> 0.142 sec int[] (manual inline scan) 0.102 sec int[] (Array.IndexOf<int>) 0.157 sec int[] (Array.BinarySearch<int>) 0.187 sec Ranking: int[] (manual inline scan) 0% slower than best HashSet<int> 30% slower than best Dictionary<int,int> 39% slower than best int[] (Array.IndexOf<int>) 54% slower than best List<int> (linear scan) 71% slower than best int[] (Array.BinarySearch<int>) 84% slower than best List<int> (binary search) 109% slower than best --->8--- Quote: > So my question then is: what exactly did you lookup and where were > these values located in the set of 4 possible elements? O(1) is real > time, not amortized, for HashSet, but your test suggests otherwise > (amortized). If your tests in general had the element to find in the > first half of the linear list for example, it's not a fair benchmark. does change things, but not by a whole lot: ---8<--- Total times for 40960 iterations of 100 probes in collections of size 4: List<int> (linear scan) 0.259 sec List<int> (binary search) 0.308 sec HashSet<int> 0.190 sec Dictionary<int,int> 0.203 sec int[] (manual inline scan) 0.141 sec int[] (Array.IndexOf<int>) 0.244 sec int[] (Array.BinarySearch<int>) 0.277 sec Ranking: int[] (manual inline scan) 0% slower than best HashSet<int> 35% slower than best Dictionary<int,int> 44% slower than best int[] (Array.IndexOf<int>) 73% slower than best List<int> (linear scan) 84% slower than best int[] (Array.BinarySearch<int>) 97% slower than best List<int> (binary search) 119% slower than best --->8--- The ranking stays the same, and the percentage differences aren't that much different either. Added ranking code (for the program I posted earlier): ---8<--- Console.WriteLine(); Console.WriteLine("Ranking:"); items = items.OrderBy(item => item.BestTime).ToArray(); foreach (var item in items) { Console.WriteLine("{0,-37} {1,5:F0}% slower than best", item.Name, (item.BestTime - items[0].BestTime) * 100 / items[0].BestTime); } --->8--- -- Barry -- http://barrkel.blogspot.com/ |
My System Specs![]() |
| | #8 (permalink) |
| | Re: List<> versus HashSet<>: Which is more efficient for lookups on small sets? Barry Kelly wrote: Quote: > int[] (manual inline scan) 0% slower than best -- Barry -- http://barrkel.blogspot.com/ |
My System Specs![]() |
![]() |
| Thread Tools | |
| |
Similar Threads | ||||
| Thread | Forum | |||
| Most efficient virtual machine? | Virtual PC | |||
| iTunes opening windows as 'Small Icons' but want 'List' - how ? | Vista file management | |||
| Erasing all saved names from lookups . . | Vista performance & maintenance | |||
| A more efficient way to copy files? | Vista file management | |||
| A small developer-oriented wish list for PowerShell | PowerShell | |||