| | #1 (permalink) |
| Guest | Much faster than Get-Content. Why?

The functions f1() and f2() below do the same common job: they read lines from a file. f1() uses the standard Get-Content; f2() uses an interesting feature of the switch statement with an empty regular expression. The code produces:

    ...
    TotalMilliseconds : 3.0332
    ...
    TotalMilliseconds : 1.5639

    PS> 3.0332/1.5639
    1.93951019886182

Thus such a peculiar way of reading a file as "switch -regex -file" looks almost twice as fast as Get-Content, which was designed for this task.

Is this reproducible? If yes:
*) Why is it so?
*) Is this going to change?

CODE:

    # an existing file
    $file = "$pshome\about_globbing.help.txt"

    # Get-Content
    function f1
    {
        Get-Content $file
    }

    # switch -regex -file
    function f2
    {
        switch -regex -file ($file) { '' {$_} }
    }

    # measure
    measure-command {f1}
    measure-command {f2}

    # to be sure that f1 and f2 do the same job
    Compare-Object (f1) (f2)

--
Thanks,
Roman |
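A minimal sketch of why f2() returns every line: the empty pattern '' matches any string, including an empty line, so the single clause fires once per line of the file. The temporary file and the variable names ($tmp, $viaSwitch, $viaDefault) are assumptions made for illustration and are not part of the original post.

```powershell
# Create a small throwaway file so the example does not depend on $pshome.
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($tmp, [string[]]('alpha', '', 'beta gamma'))

# The empty regular expression matches every line, even the empty one,
# so each line of the file is emitted by the single clause.
$viaSwitch = @(switch -regex -file $tmp { '' { $_ } })

# The same effect without any pattern matching: the default clause
# runs for every line because no other clause exists.
$viaDefault = @(switch -file $tmp { default { $_ } })

$viaSwitch.Count     # number of lines emitted by the regex form
$viaDefault.Count    # number of lines emitted by the default form

Remove-Item $tmp
```

Whether the default form is any faster than the empty-regex form is not measured here; the sketch only shows that both emit the same lines.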
| | #2 (permalink) |
| Guest | RE: Much faster than Get-Content. Why?

Just for fun, two more competitors are in the game: the ItemContent variable and .NET [System.IO.File]::ReadAllLines:

TotalMilliseconds:

    3.0423 # Get-Content
    1.5551 # switch -regex -file
    1.8145 # ItemContent variable
    1.0138 # ReadAllLines

(These are averaged results from Measure-CommandEx.ps1:
http://nightroman.spaces.live.com/blog/cns!F011223B604739FA!120.entry )

So far ReadAllLines is the fastest way (but it is .NET), and "switch -regex -file" is still the fastest native PowerShell way.

Are there any faster methods to get lines from a file?

CODE:

    # an existing file
    $file = "$pshome\about_globbing.help.txt"

    # Get-Content
    function f1
    {
        Get-Content $file
    }

    # switch -regex -file
    function f2
    {
        switch -regex -file ($file) { '' {$_} }
    }

    # ItemContent variable
    function f3
    {
        Invoke-Expression "`${$file}"
    }

    # ReadAllLines
    function f4
    {
        [System.IO.File]::ReadAllLines($file)
    }

    # measure
    (measure-command {f1}).TotalMilliseconds
    (measure-command {f2}).TotalMilliseconds
    (measure-command {f3}).TotalMilliseconds
    (measure-command {f4}).TotalMilliseconds

    # to be sure that f1, f2, f3, f4 do the same job
    Compare-Object (f1) (f2)
    Compare-Object (f1) (f3)
    Compare-Object (f1) (f4)

--
Thanks,
Roman |
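Single Measure-Command runs on a tiny file are noisy, which is why the figures above are averages. The following is only a rough sketch of such averaging; Measure-Average is a hypothetical helper written for this illustration and is not the Measure-CommandEx.ps1 script linked above. The generated test file and the repeat count are also assumptions.

```powershell
# Hypothetical helper: runs a script block several times and returns
# the mean TotalMilliseconds reported by Measure-Command.
function Measure-Average {
    param(
        [scriptblock] $Script,
        [int] $Count = 20
    )
    # One warm-up run, excluded from the average.
    & $Script | Out-Null

    $total = 0.0
    for ($i = 0; $i -lt $Count; $i++) {
        $total += (Measure-Command $Script).TotalMilliseconds
    }
    $total / $Count
}

# A generated test file (an assumption; the post used a help file under $pshome).
$file = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($file, [string[]](1..1000 | ForEach-Object { "line $_" }))

$avgGetContent   = Measure-Average { Get-Content $file }
$avgReadAllLines = Measure-Average { [System.IO.File]::ReadAllLines($file) }

"Get-Content  : $avgGetContent ms"
"ReadAllLines : $avgReadAllLines ms"

Remove-Item $file
```

The absolute numbers depend on the machine, the PowerShell version, and file caching, so only the ratio between the methods on the same system is meaningful.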
| | #3 (permalink) |
| Guest | Re: Much faster than Get-Content. Why?

Sadly, the slow speed of PowerShell is one of its weaknesses. I think that once PowerShell is released, becomes better known, and is adopted because of its use in Microsoft products like Exchange, people are going to benchmark it against things like Perl, Python and even Ruby. It will be found lacking, and many may write it off because of that.

However, I think PowerShell is first of all a shell language, and the majority of scripts are going to be small and rather immediate. Things the size of some of the larger Perl applications out there may be quite hard to do well in PowerShell, but that's not such a big problem to me. I also know that in future versions Microsoft will spend a lot of time making things go a lot faster.

Personally, I hope the benchmarks come out more favourable than I presume, mostly because I don't want PowerShell to be written off because of them and its benefits overlooked. Like Perl and other dynamic languages, it rests on a presumption. In the past, computer time cost more than the developer's time. With modern computers and hardware, the performance cost for most things is negligible, especially considering that over 90% of execution time is spent in OS API calls, drivers, etc. The modern reality is that the developer's or IT professional's time is more expensive than the computing hardware.

And PowerShell is very fast in this regard: its slight learning curve pays off again and again with its consistency, interactivity, reusability, composability, etc.

Karl |
| | #4 (permalink) |
| Guest | RE: Much faster than Get-Content. Why?

"Roman Kuzmin" wrote:
> TotalMilliseconds:
>
> a) 3.0423 # Get-Content
> b) 1.5551 # switch -regex -file
> c) 1.8145 # ItemContent variable
> d) 1.0138 # ReadAllLines
>
> Much faster than Get-Content. Why?

I'll try to answer your original question:

I think it is understandable why d) is the fastest way. PowerShell itself is a .NET application, therefore .NET calls from PowerShell just need to be translated to native .NET calls, which seems to work very efficiently.

Solutions b) and c) involve more complicated PowerShell language constructs, with a little more language overhead than d).

Finally, a) is the slowest solution, since it requires resolving the command to invoke (get-content could be an alias, for example), creating a new instance of the get-content cmdlet class, etc.

In summary, I'd say the higher the abstraction, the more overhead you'll get. So you basically have two ways to go: use cmdlets to write scripts in very little time, or use direct .NET calls to get scripts with better performance. However, if you're going to avoid cmdlets in PowerShell completely, because performance matters more to you than the time it takes to write such a script, then maybe IronPython or Ruby.NET is a better language for your needs.

--
greetings
dreeschkind |
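The "direct .NET calls" route described above does not have to load the whole file at once the way ReadAllLines does. A minimal sketch, assuming a generated temporary file and PowerShell 2.0 or later (try/finally is not available in 1.0): System.IO.StreamReader reads one line at a time. The variable names are illustrative only.

```powershell
# Throwaway input file; the absolute path matters because .NET methods
# resolve relative paths against the process working directory, not
# against the current PowerShell location.
$file = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($file, [string[]]('one', 'two', 'three'))

$lines  = New-Object System.Collections.Generic.List[string]
$reader = New-Object System.IO.StreamReader -ArgumentList $file
try {
    # ReadLine() returns $null at end of file.
    while ($null -ne ($line = $reader.ReadLine())) {
        $lines.Add($line)
    }
}
finally {
    # Always release the file handle, even if reading fails.
    $reader.Dispose()
}

$lines.Count    # number of lines read

Remove-Item $file
```

In a real script the loop body would process each line instead of collecting it, which keeps memory use flat for very large files.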
| | #5 (permalink) |
| Guest | RE: Much faster than Get-Content. Why?

"dreeschkind" wrote: ...
"Karl" wrote: ...

Yes, in general I agree with everything.

Meanwhile, it looks like my benchmark was not quite serious. The test file was too small: "about_globbing.help.txt" - 161 bytes, 12 lines. Now let's try a large file: "microsoft.powershell.commands.management.dll-help.xml" - 886281 bytes, 17420 lines:

TotalMilliseconds:

    a) 1710.3015 # Get-Content
    b) 715.2535  # switch -regex -file
    c) 25.0589   # ItemContent variable
    d) 25.097    # ReadAllLines

c) and d) are the fastest and actually almost the same. Now "switch -regex -file" does not look fast at all. But it is still much faster than Get-Content.

As for Get-Content, I can understand some overhead at startup, which should be insignificant for large files. But the difference is simply enormous (~70 times for this example). IMHO, taking into account the popularity and importance of file reading operations, Get-Content should use efficient file operations directly, avoiding the provider calls or whatever else makes it so slow. I believe an exception for files is necessary. A user does not care how a cmdlet works internally as long as it works well.

In my practice, parsing numerous large text files is quite an everyday task. It is a pity that I actually have to use alternatives to Get-Content, which is supposed to be the standard way.

> then maybe IronPython or Ruby.NET is a better language for your needs

Actually I am pretty happy with Perl, with its power and performance. C# is also my good old friend for more complex or performance-sensitive tasks. But like many of us I am already familiar with PPSS (Post PowerShell Syndrome)... So I would like to do more things in PowerShell, and preferably in its native ways.

--
Thanks,
Roman |
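For reference, the "~70 times" figure follows directly from the measurements quoted above. This small sketch only redoes that arithmetic; the variable names are made up for illustration.

```powershell
# Timings in milliseconds, copied from the large-file results above.
$getContent   = 1710.3015
$switchFile   = 715.2535
$readAllLines = 25.097

# How many times slower each method is than ReadAllLines.
$ratioGetContent = [math]::Round($getContent / $readAllLines, 1)
$ratioSwitch     = [math]::Round($switchFile / $readAllLines, 1)

"Get-Content vs ReadAllLines         : ${ratioGetContent}x"
"switch -regex -file vs ReadAllLines : ${ratioSwitch}x"
```

Get-Content comes out at roughly 68 times the ReadAllLines time, and "switch -regex -file" at roughly 28 to 29 times, for this one file on the poster's system.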
| | #6 (permalink) |
| Guest | Re: Much faster than Get-Content. Why?

This is a known issue with the way Get-Content works. For each object returned from the pipe, it adds a bunch of extra information to that object in the form of NoteProperties. You can see these properties using get-member:

    PS (37) > get-content file1.txt | gm -type noteproperty

       TypeName: System.String

    Name         MemberType   Definition
    ----         ----------   ----------
    PSChildName  NoteProperty System.String PSChildName=file1.txt
    PSDrive      NoteProperty System.Management.Automation.PSDriveInfo PSDrive=C
    PSParentPath NoteProperty System.String PSParentPath=C:\Temp\files
    PSPath       NoteProperty System.String PSPath=C:\Temp\files\file1.txt
    PSProvider   NoteProperty System.Management.Automation.ProviderInfo PSProvider=Mi
    ReadCount    NoteProperty System.Int64 ReadCount=1

These properties are being added for *every* object processed in the pipeline. We do this to allow cmdlets to work more effectively together. It's important because things like the Path property may vary across different object types. In effect, we're doing "property name normalization". Unfortunately, while this technique provides significant benefits by making the system more consistent, it isn't free. It adds significant overhead both in terms of processing time and memory space. We're investigating ways to reduce these costs without losing the benefits, but in the end we may need to add a way to suppress adding this extra information.

One trick to work around this is to use the -ReadCount parameter. This somewhat misnamed parameter controls the number of records Get-Content writes into the pipeline at a time. So - if you execute

    Get-Content -readcount 10 foo.txt | out-null

you'll see a significant perf improvement, because the extra information is being added to each collection of 10 records instead of to each record. Take a look at the performance impact -readcount has in some simple examples:

    PS (42) > (measure-command { get-content junk.txt | out-null }).TotalMilliseconds
    249.6448
    PS (43) > (measure-command { get-content -readcount 10 junk.txt | out-null }).TotalMilliseconds
    52.6695
    PS (44) > (measure-command { get-content -readcount 100 junk.txt | out-null }).TotalMilliseconds
    7.8794

-bruce

--
Bruce Payette [MSFT]
Windows PowerShell Technical Lead
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.

Visit the Windows PowerShell Team blog at: http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at: http://www.microsoft.com/technet/scr.../hubs/msh.mspx
My Book: http://manning.com/powershell |
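One practical consequence of the -ReadCount trick described above is that each pipeline object becomes an array of lines instead of a single line, so downstream code has to unroll the batches. A minimal sketch, assuming a generated 25-line temporary file; the variable names are illustrative only.

```powershell
# Throwaway 25-line input file.
$file = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($file, [string[]](1..25 | ForEach-Object { "line $_" }))

# Default behaviour: one pipeline object per line.
$perLine = @(Get-Content $file)

# -ReadCount 10: each pipeline object is an array of up to 10 lines.
$batches = @(Get-Content $file -ReadCount 10)

# Writing each batch back to the pipeline unrolls it into single lines again.
$flattened = @(Get-Content $file -ReadCount 10 | ForEach-Object { $_ })

$perLine.Count      # 25 lines
$batches.Count      # 3 batches (10 + 10 + 5)
$flattened.Count    # 25 lines

Remove-Item $file
```

The batch size is a trade-off: larger values mean fewer pipeline objects to decorate, at the cost of holding more lines in memory at once.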
| | #7 (permalink) |
| Guest | Re: Much faster than Get-Content. Why?

Bruce Payette [MSFT] wrote:
> …

Bruce,

Thank you for the detailed explanation of the issue and the very useful information. I see now that Get-Content is really more complex than I used to think, and its performance penalty is perhaps inevitable.

>We're investigating ways to reduce these costs without losing the benefits
>but in the end, we may need to add a way to suppress adding this extra
>information.

I wish you all success in making things that are already good even better. I am not sure, but here is a thought: perhaps this mechanism would be better disabled by default for some special cases, e.g. reading lines from a file. It could be enabled by the user only when it is really necessary.

--
Thanks,
Roman |