Windows Vista Forums

The best way to get number of string occurrences in a binary file
  1. #1


    Dainis Guest

    The best way to get number of string occurrences in a binary file

    Hello,

    I would like to know the best method for counting the occurrences of a
    given string in a binary file. After some pondering I came up with
    something like this:

    $sum = 0; (Select-String -AllMatches foo .\foobarbaz.txt) |% { $sum += $_.Matches.count }; $sum

    Where foobarbaz.txt is:
    foo bar foo
    baz
    foo foo bar baz
    bar foo bar foo baz
    foo

    But perhaps you can recommend some other method. It should be efficient for
    large binary files, or be a single command in its simplest form.

    Thanks in advance




  2. #2


    Kiron Guest

    Re: The best way to get number of string occurrences in a binary file

    You're on the right path: Select-String is the cmdlet for the task. However, wrapping the Select-String statement in parentheses collects all of its output objects before passing them on to ForEach-Object, which hurts performance, especially when processing large files or collections. Optionally, you can initialize $Sum in ForEach-Object's -Begin script block and output it in ForEach-Object's -End script block.
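
    To make the three script blocks explicit, the streaming pipeline can be spelled out with named parameters. This is only a minimal sketch: it assumes PowerShell v2.0 or later (for -AllMatches), it writes the sample text from the first post to a temporary file, and the $path and $count variable names are illustrative.

```powershell
# Sample data copied from the original post, written to a temporary file.
$path = Join-Path ([IO.Path]::GetTempPath()) 'foobarbaz.txt'
Set-Content -Path $path -Value @(
    'foo bar foo'
    'baz'
    'foo foo bar baz'
    'bar foo bar foo baz'
    'foo'
)

# No parentheses around Select-String: each MatchInfo object is handed to
# ForEach-Object as soon as it is produced instead of being collected first.
$count = Select-String -AllMatches -Pattern foo -Path $path |
    ForEach-Object -Begin { $sum = 0 } -Process { $sum += $_.Matches.Count } -End { $sum }

$count   # 7 for the sample file above
```

    -Begin runs once before the first object arrives, -Process runs once per matching line, and -End runs once after the last one, so only the running total is kept around.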

    Using a couple of .NET methods (IO.File's ReadAllText and Regex's Matches) can speed up the process dramatically, at the cost of RAM, since the whole file is read into memory. You can clean up afterwards with another .NET method, GC's Collect. You could write a function to make it simpler.

    # both one-liners, look out for word wrapping...
    # pure PowerShell
    Select-String -AllMatches foo .\foobarbaz.txt | % {$sum = 0} {$sum += $_.Matches.count} {$sum}

    # .Net through PowerShell, note the fullpath of the file
    [regex]::Matches([IO.File]::ReadAllText('c:\somedir\foobarbaz.txt'), 'foo', 'IgnoreCase').count; [gc]::Collect()

    IO.File's ReadAllText Method is fast. More about it here:
    http://msdn.microsoft.com/en-us/libr...adalltext.aspx

    Regex's Matches method is case-sensitive by default; pass the IgnoreCase option for case-insensitive matching. More here:
    http://msdn.microsoft.com/en-us/libr...x.matches.aspx

    GC's Collect:
    http://msdn.microsoft.com/en-us/libr...c.collect.aspx
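
    As for wrapping the .NET approach in a function, one possible sketch follows. Get-StringCount, its parameter names, and the $sample variable are hypothetical and not part of PowerShell. The function resolves the path first because .NET methods work from the process's working directory rather than the current PowerShell location, which is why the one-liner needs a full path. The explicit [gc]::Collect() call is left out here; the text becomes eligible for collection once the function returns.

```powershell
# Get-StringCount is a hypothetical helper name, not a built-in cmdlet.
function Get-StringCount {
    param(
        [string] $Path,
        [string] $Pattern,
        [switch] $CaseSensitive
    )

    # Expand to a full file-system path so [IO.File] finds the file
    # regardless of the process working directory.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Case-insensitive unless -CaseSensitive is given, like the one-liner.
    $options = [Text.RegularExpressions.RegexOptions]::IgnoreCase
    if ($CaseSensitive) {
        $options = [Text.RegularExpressions.RegexOptions]::None
    }

    # Reads the whole file into memory; $Pattern is treated as a regex.
    $text = [IO.File]::ReadAllText($fullPath)
    [regex]::Matches($text, $Pattern, $options).Count
}

# Usage with the sample text from the first post.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'foobarbaz.txt'
Set-Content -Path $sample -Value 'foo bar foo', 'baz', 'foo foo bar baz', 'bar foo bar foo baz', 'foo'
Get-StringCount -Path $sample -Pattern foo   # 7
```

    To search for a literal string that contains regex metacharacters, pass it through [regex]::Escape() before handing it to -Pattern.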

    --
    Kiron


  3. #3


    Kiron Guest

    Re: The best way to get number of string occurrences in a binary file

    # ...or
    ([IO.File]::ReadAllText('c:\somedir\foobarbaz.txt') | Select-String foo -all).matches.count
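
    One caveat for truly binary input: ReadAllText decodes the file as text (UTF-8 unless a byte order mark says otherwise), so bytes that are not valid text may be replaced during decoding. A possible workaround, offered as an assumption rather than something from this thread, is to read the raw bytes and decode them as ISO-8859-1, which maps every byte value to exactly one character. The file name and variable names below are illustrative.

```powershell
# Build a small binary test file: two "foo" runs surrounded by non-text bytes.
$bin = Join-Path ([IO.Path]::GetTempPath()) 'foobarbaz.bin'
[byte[]] $bytes = 0xFF, 0x00, 0x66, 0x6F, 0x6F, 0x80, 0x66, 0x6F, 0x6F, 0xFE
[IO.File]::WriteAllBytes($bin, $bytes)

# ISO-8859-1 maps byte values 0-255 one-to-one onto characters U+0000-U+00FF,
# so no byte is dropped or replaced during decoding.
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')
$text   = $latin1.GetString([IO.File]::ReadAllBytes($bin))

$binCount = [regex]::Matches($text, 'foo').Count
$binCount   # 2

# Raw byte sequences can be searched with \xNN escapes.
$byteCount = [regex]::Matches($text, '\xFF\x00').Count
```

    Like ReadAllText, this holds the whole file in memory (roughly twice its size, since .NET strings use two bytes per character), so it suits files that fit comfortably in RAM.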

    --
    Kiron


  4. #4


    Dainis Guest

    Re: The best way to get number of string occurrences in a binary file

    Kiron,

    Many thanks for your suggestions! I think I will stick with your pure
    PowerShell one-liner.

    Dainis


  5. #5


    Dainis Guest

    Re: The best way to get number of string occurrences in a binary file

    BTW, the -AllMatches switch is new in PowerShell v2.0. It seems that v1.0
    users should use the first .NET solution:

    [regex]::Matches([IO.File]::ReadAllText('c:\somedir\foobarbaz.txt'), 'foo', 'IgnoreCase').count; [gc]::Collect()
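
    If loading the whole file into memory is a concern, a line-by-line variant that does not rely on -AllMatches might look like the sketch below. It is an untested-on-v1.0 assumption that combines Get-Content with the same Regex Matches call; the $path and $total names are illustrative, and matches that span a line break are not counted.

```powershell
# Sample text from the first post, written to a temporary file.
$path = Join-Path ([IO.Path]::GetTempPath()) 'foobarbaz.txt'
Set-Content -Path $path -Value 'foo bar foo', 'baz', 'foo foo bar baz', 'bar foo bar foo baz', 'foo'

# Get-Content emits one line at a time, so only the current line
# and the running total are held in memory.
$total = 0
Get-Content -Path $path | ForEach-Object {
    $total += [regex]::Matches($_, 'foo', 'IgnoreCase').Count
}

$total   # 7 for the sample file above
```

    This trades speed for memory: it is usually slower than ReadAllText on small files, but memory use stays flat as long as the file contains reasonably frequent line breaks.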

