Windows Vista Forums

Compare 2 files with 6,000 entries each
  1. #1


    Marco Shaw Guest

    Compare 2 files with 6,000 entries each

    I've got one text file and one CSV file. As per a previous thread, I
    wasn't able to figure out how to use compare-object to get all the
    matches from each file (Keith pointed out that I'd need -SyncWindow to
    compare more than 100 lines, but I haven't tried it yet).

    The only way I've found to match up each file is to load each file:

    $txt=get-content users.txt
    $csv=import-csv users.csv

    Then do something *ugly* like this:

    $matches=$csv|%{$user=$_;$txt|%{if($_.contains($user)){$_}}}

    So I loop through each entry in $csv, assign the current value to $user,
    then loop through each entry in $txt looking for a match.

    It is taking a *long* time...



    Any performance improvement hints? If the above ever finishes, I'll
    give compare-object a spin.
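
For reference, a minimal sketch of how compare-object can be asked for matches only. The sample names are hypothetical stand-ins for the file contents, and the -SyncWindow value of 100 is illustrative, not a recommendation from the thread:

```powershell
# Hypothetical sample data standing in for the contents of the two files
$txt   = 'alice', 'bob', 'carol'
$names = 'carol', 'bob', 'erin'

# -IncludeEqual with -ExcludeDifferent keeps only entries present on both sides;
# -SyncWindow controls how far apart two equal entries may sit and still be paired
$common = Compare-Object $txt $names -IncludeEqual -ExcludeDifferent -SyncWindow 100 |
    ForEach-Object { $_.InputObject }
$common
```

For lists longer than the window, -SyncWindow would presumably need to be raised to cover the whole list.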

    Marco


  2. #2


    Kiron Guest

    Re: Compare 2 files with 6,000 entries each

    $csv = import-csv users.csv
    # if users list is not of unique users already
    # $csv | sort-object user -unique
    get-content users.txt | foreach-object {if($csv -contains $_) {$_}}


    --
    Kiron


  3. #3


    Kiron Guest

    Re: Compare 2 files with 6,000 entries each

    Excuse the typo...

    # if users list is not of unique users already
    # $csv = import-csv users.csv | sort-object <user_field> -unique


    --
    Kiron


  4. #4


    Jacques Barathon [MS] Guest

    Re: Compare 2 files with 6,000 entries each

    "Marco Shaw" <marco.shaw@_NO_SPAM_gmail.com> wrote in message
    news:eOmJPg$vHHA.1164@TK2MSFTNGP02.phx.gbl...
    > I've got one text file, and one csv file. As per a previous thread, I
    > wasn't able to figure out how to use compare-object to get all the matches
    > from each file (Keith pointed out that I'd need -SyncWindow to compare
    > more than 100 lines, but I haven't tried it yet).
    >
    > The only way I've found to match up each file is to load each file:
    >
    > $txt=get-content users.txt
    > $csv=import-csv users.csv
    >
    > Then do something *ugly* like this:
    >
    > $matches=$csv|%{$user=$_;$txt|%{if($_.contains($user)){$_}}}


    It will be difficult to avoid having 36,000,000 comparisons (6,000 lines
    compared to 6,000 lines each). The simplest way I can express it in
    PowerShell is this:

    PS> gc file1.txt | ? {(gc file2.txt) -contains $_}

    If the files have very different sizes, replace file1.txt with the larger
    file and file2.txt with the smaller one.
    If file2.txt is still too big to be read from disk every time, you
    can save some cycles by assigning it to a variable first:

    PS> $ref = gc file2.txt; gc file1.txt | ? {$ref -contains $_}
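
As an aside that is not from the thread: one possible way to cut down the number of comparisons is to load the smaller list into a hashtable once and test each line with a key lookup. This is only a sketch; the sample names below are hypothetical stand-ins for the two files' contents:

```powershell
# Hypothetical sample data standing in for file1.txt and file2.txt
$file1 = 'alice', 'bob', 'carol', 'dave'
$file2 = 'bob', 'dave', 'erin'

# Build the lookup table once; @{} compares string keys case-insensitively,
# which matches the default behavior of -contains
$lookup = @{}
foreach ($name in $file2) { $lookup[$name] = $true }

# Each line now costs one key lookup instead of a scan of the whole list
$common = $file1 | Where-Object { $lookup.ContainsKey($_) }
$common   # bob, dave
```

Like -contains, this reports each line of the first list at most once, however many times it appears in the second.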

    Hope that helps... Very late here, so I'd rather go to sleep...

    Jacques



  5. #5


    Marco Shaw Guest

    Re: Compare 2 files with 6,000 entries each

    Kiron wrote:
    > $csv = import-csv users.csv
    > # if users list is not of unique users already
    > # $csv | sort-object user -unique
    > get-content users.txt | foreach-object {if($csv -contains $_) {$_}}
    >
    >


    Wow! The old method took 1 hour 39 minutes, this way took 26 seconds.
    My only problem is that the old way returned 86 matches, but this faster
    way returned only 35.

    Something is up...


  6. #6


    Kiron Guest

    Re: Compare 2 files with 6,000 entries each

    Is the matching list (users.csv) made up of unique user names?
    The containment operators (-contains and -notcontains) stop checking as
    soon as the right operand is found in the left operand, then move on to
    compare the next object in the pipeline.

    0..7 | where-object { 1, 3, 5, 3 -contains $_ }
    1
    3
    5

    It only returns 3 once. The 'old way' matched against each element, so if
    there were repeated users in $csv you would get more results; hence the
    unique users list question. Note the use of where-object: Jacques is right,
    this way is more efficient than:

    0..7 | foreach-object { if(1, 3, 5, 3 -contains $_) {$_} }
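
The difference in result counts can be reproduced with a small sketch. The variable names are illustrative, and the nested loop is a simplified stand-in for the 'old way', not Marco's exact code:

```powershell
$reference = 1, 3, 5, 3   # note the repeated 3

# -contains yields one Boolean per input, so each input passes at most once
$viaContains = 0..7 | Where-Object { $reference -contains $_ }

# A nested loop emits the input once per matching reference element
$viaNested = 0..7 | ForEach-Object {
    $n = $_
    $reference | ForEach-Object { if ($_ -eq $n) { $n } }
}

"contains: $($viaContains.Count) results"   # contains: 3 results
"nested:   $($viaNested.Count) results"     # nested:   4 results
```

Another possible contributor, which the thread does not confirm: the original loop called the string .Contains() method, which is a case-sensitive substring test, whereas -contains checks whole elements for equality, so the two approaches can disagree even on a list without duplicates.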

    --
    Kiron


