Windows Vista Forums


Vista - Compare 2 files with 6,000 entries each

07-06-2007 #1
Marco Shaw


 
 

Compare 2 files with 6,000 entries each

I've got one text file and one csv file. As per a previous thread, I
wasn't able to figure out how to use compare-object to get all the
matches from each file (Keith pointed out that I'd need -SyncWindow to
compare more than 100 lines, but I haven't tried it yet).
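Following Keith's hint, here is a minimal sketch of the compare-object route. -IncludeEqual together with -ExcludeDifferent keeps only the items found in both lists (the "==" entries), and -SyncWindow is set to the length of the longer list so that matching items sitting far apart in the two lists can still be paired. The inline arrays and variable names are hypothetical stand-ins for users.txt and the user names in users.csv.

```powershell
# Hypothetical stand-ins for the contents of users.txt and the user names in users.csv
$txt      = 'alice', 'bob', 'carol', 'dave'
$csvUsers = 'carol', 'erin', 'alice'

# Let Compare-Object search across the whole of the longer list when pairing items
$window = [Math]::Max($txt.Count, $csvUsers.Count)

# -IncludeEqual -ExcludeDifferent: emit only items present in both lists ("==")
$common = Compare-Object -ReferenceObject $txt -DifferenceObject $csvUsers -SyncWindow $window -IncludeEqual -ExcludeDifferent |
    ForEach-Object { $_.InputObject }

$common
```

Compare-Object compares strings case-insensitively unless -CaseSensitive is given, so this behaves like the -contains approaches discussed below rather than like String.Contains.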

The only way I've found to match up each file is to load each file:

$txt=get-content users.txt
$csv=import-csv users.csv

Then do something *ugly* like this:

$matches=$csv|%{$user=$_;$txt|%{if($_.contains($user)){$_}}}

So I loop through each entry in $csv, assign the current value to $user,
then loop through each entry in $txt looking for a match.
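Note that $_.contains($user) calls the .NET String.Contains method: a case-sensitive substring test on a single line. That is not the same as the -contains operator suggested in the replies, which checks whether a collection holds an element equal to the value (case-insensitive by default). A small sketch with hypothetical values:

```powershell
# Hypothetical sample values, not taken from users.txt or users.csv
$line  = 'jdoe'
$names = 'jdoe', 'asmith'

$line.Contains('jd')         # True:  String.Contains matches a substring
$line.Contains('JDOE')       # False: String.Contains is case-sensitive
$names -contains 'JDOE'      # True:  -contains compares whole elements, ignoring case
$names -contains 'jd'        # False: a partial value is not an element of the list
```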

It is taking a *long* time...

Any performance improvement hints? If the above ever finishes, I'll
give compare-object a spin.

Marco

07-06-2007 #2
Kiron


 
 

Re: Compare 2 files with 6,000 entries each

$csv = import-csv users.csv
# if users list is not of unique users already
# $csv | sort-object user -unique
get-content users.txt | foreach-object {if($csv -contains $_) {$_}}
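Import-Csv returns one object per row rather than plain strings, so row objects may not compare equal to the strings coming from get-content. A sketch that pulls the user-name column out first, assuming the column is called User (Kiron's follow-up leaves the field name as <user_field>). It uses ConvertFrom-Csv (PowerShell 2.0 and later) with hypothetical inline rows in place of users.csv, and an inline array in place of users.txt.

```powershell
# Hypothetical CSV rows with an assumed 'User' column; 'alice' is listed twice
$rows = 'User,Dept', 'alice,IT', 'carol,HR', 'alice,IT' | ConvertFrom-Csv

# Keep only the user names, as strings, and drop duplicates
$users = $rows | ForEach-Object { $_.User } | Sort-Object -Unique

# Stand-in for: get-content users.txt
$txt = 'alice', 'bob', 'carol'

# Each line of the text file is emitted at most once
$found = $txt | Where-Object { $users -contains $_ }
$found
```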


--
Kiron
07-06-2007 #3
Kiron


 
 

Re: Compare 2 files with 6,000 entries each

Excuse the typo...

# if users list is not of unique users already
# $csv = import-csv users.csv | sort-object <user_field> -unique


--
Kiron
07-06-2007 #4
Jacques Barathon [MS]


 
 

Re: Compare 2 files with 6,000 entries each

"Marco Shaw" <marco.shaw@_NO_SPAM_gmail.com> wrote in message
news:eOmJPg$vHHA.1164@TK2MSFTNGP02.phx.gbl...
> I've got one text file, and one csv file. As per a previous thread, I
> wasn't able to figure out how to use compare-object to get all the matches
> from each file (Keith pointed out that I'd need -SyncWindow to compare >
> 100 lines, but I haven't tried it yet).
>
> The only way I've found to match up each file is to load each file:
>
> $txt=get-content users.txt
> $csv=import-csv users.csv
>
> Then do something *ugly* like this:
>
> $matches=$csv|%{$user=$_;$txt|%{if($_.contains($user)){$_}}}


It will be difficult to avoid having 36,000,000 comparisons (each of the 6,000
lines compared against all 6,000 lines of the other file). The simplest way I
can express it in PowerShell is this:

PS> gc file1.txt | ? {(gc file2.txt) -contains $_}

If the files have very different sizes, use the bigger file as file1.txt and
the smaller one as file2.txt.
If file2.txt is still quite big, too big to be read from disk every time, you
can save some cycles by assigning it to a variable first:

PS> $ref = gc file2.txt; gc file1.txt | ? {$ref -contains $_}
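For comparison, a hashtable lookup (a different technique from the ones proposed in this thread) avoids scanning the whole list for every line: building the table touches each line of file2.txt once, and each line of file1.txt then costs one hash lookup, so two 6,000-line files need roughly 12,000 operations instead of up to 36,000,000 comparisons. A minimal sketch, with hypothetical inline arrays standing in for the two files:

```powershell
# Hypothetical stand-ins for: $ref = gc file2.txt ; gc file1.txt
$file2Lines = 'carol', 'alice', 'erin'
$file1Lines = 'alice', 'bob', 'carol', 'dave', 'ALICE'

# Build the lookup table once. PowerShell hashtables compare string keys
# case-insensitively by default, like -contains.
$lookup = @{}
foreach ($line in $file2Lines) { $lookup[$line] = $true }

# Each check is a single hash lookup instead of a scan of the whole list
$common = $file1Lines | Where-Object { $lookup.ContainsKey($_) }
$common
```

Like Jacques's version, this emits each line of file1.txt at most once, in file1.txt order.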

Hope that helps... It's very late here, so I'd rather go to sleep...

Jacques

07-06-2007 #5
Marco Shaw


 
 

Re: Compare 2 files with 6,000 entries each

Kiron wrote:
> $csv = import-csv users.csv
> # if users list is not of unique users already
> # $csv | sort-object user -unique
> get-content users.txt | foreach-object {if($csv -contains $_) {$_}}
>
>


Wow! The old method took 1 hour 39 minutes, this way took 26 seconds.
My only problem is that the old way returned 86 matches, but this faster
way returned only 35.

Something is up...
07-06-2007 #6
Kiron


 
 

Re: Compare 2 files with 6,000 entries each

Is the matching list (users.csv) made up of unique user names?
The containment operators (-contains and -notcontains) stop checking as soon as
they find the right operand in the left operand, and then move on to the
next object in the pipeline.

0..7 | where-object { 1, 3, 5, 3 -contains $_ }
1
3
5

It only returns 3 once. The 'old way' was matching against each element, so if
there were repeated users in $csv you would get more results. Hence the question
about a unique users list. Note the use of where-object; Jacques is right,
this way is more efficient than:

0..7 | foreach-object { if(1, 3, 5, 3 -contains $_) {$_} }
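To illustrate the point about repeated users, here is a small sketch with hypothetical data in which 'bob' appears twice in the matching list. The nested loop from the first post emits a line once per matching entry, while where-object with -contains emits each line at most once.

```powershell
# Hypothetical data: 'bob' is listed twice in the matching list
$csvUsers = 'bob', 'carol', 'bob'
$txt      = 'alice', 'bob', 'carol'

# Nested-loop style from the first post: one output per matching (user, line) pair
$nested = $csvUsers | ForEach-Object { $user = $_; $txt | ForEach-Object { if ($_.Contains($user)) { $_ } } }

# where-object with -contains: at most one output per line of $txt
$filtered = $txt | Where-Object { $csvUsers -contains $_ }

$nested.Count     # 3: bob, carol, bob
$filtered.Count   # 2: bob, carol
```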

--
Kiron



