Duplicate Picture Finder

ByLine
Finding duplicate images has never been easier
Synopsis
Duplicate Picture Finder scans any given set of folders and attempts to locate duplicate images for you. What makes this tool different from the others out there is that it does exactly what you would do: it uses visual comparison to determine whether images are the same, which lets it find images that are different sizes but visually identical.
Duplicate Picture Finder
How many pictures do you have on your computer? And how often have you browsed your picture library only to discover duplicates? Needless to say, browsing image galleries and manually deleting doubles is problematic at best. The ability to embed meta-tag data into images does help somewhat, but eradicating duplicates remains an arduous task.

This is where Duplicate Picture Finder comes in handy. It scans any given set of folders and attempts to locate duplicate images for you. What makes this tool different from the others out there is that it does exactly what you would do: it uses visual comparison to determine whether images are the same, which lets it find images that are different sizes but visually identical.

Download
Duplicate Picture Finder is exclusively available at the Vista Forums, and can be downloaded here. Just download and unzip the file to a folder of your choice, and run the DPF.EXE file. No installation is required, and because no runtime support files are needed, it can also be run from a flash drive.

Download: View attachment dpf.zip


The Main Screen
When you run DPF, you will be presented with the main window, from which you can select various options prior to scanning an image library for duplicates.
Image1.jpg


Scanning a Single Folder
To get a basic idea of how DPF works, the following step-by-step guide will show you how to select and scan a single folder for duplicate images. Don't concern yourself right now with what each section of the program does, as these will be explained in more detail a little later in Advanced Scanning.

1) Start DPF by double-clicking its icon in the folder where you extracted the contents of the download.
2) Click the "Defaults" button in the lower-left corner of the main DPF window.
3) Remove the check from "Enable this list" in "Other Folders".
4) Remove the check from "Include this folder" for "Folder 2".
5) In "Folder 1", browse to and select a folder containing images, usually your "Pictures" folder or one of its subfolders. For the purposes of this demonstration, try to pick a folder containing approximately 2000 images or fewer.
6) Click "Begin". Depending on the number of images found in the selected folder, this procedure could take anything from a few minutes to several hours to complete. You can pause the scan at any time by clicking the Pause button.
7) When DPF has completed its scan, you will be presented with a message stating the number of files found and how many duplicates exist. Click OK to continue.
8) The DPF window will switch to the Results view, where you can browse and modify the results. "Original" images are marked with a green check ( Image3.jpg ), while possible duplicates are marked with a crossed-out exclamation point ( Image2.jpg ).
9) Selecting a different file in the "Original Images" list will present you with a list of "Possible Duplicates" for that file. You can use the "Preserve" and "Discard" buttons to change the status of an image.
10) When you have finished reviewing the results, click the "Finish" button. You will be asked whether you want to send the duplicate files to the Recycle Bin.


Advanced Scanning
In Scanning a Single Folder above, you followed the basic procedure needed to perform a duplicates scan on a single folder. In this section, I'll explain the advanced options of DPF, such as how to set up criteria to mark specific files as duplicates, or how to create a group of folders that you want to compare with others.

Scanning Two Folders
It's really easy to scan and compare the contents of two different folders. Just select the appropriate folders in the "Folder 1" and "Folder 2" trees, and enable them with their associated "Include this folder" and "Include Subfolders" options. Disable the "Other Folders" list by removing the check mark from "Enable this list".

Once you have selected the appropriate folders, click "Begin" to start scanning.

Creating a Group of Folders to Scan
Sometimes even scanning two folders is not enough. The "Other Folders" list represents all additional folders that you want to include in a duplicates scan, and can contain as many folders as you like. Using the "Other Folders" list, you can create a group of folders that can be included, regardless of which other folders are selected in "Folder 1" and "Folder 2":
Image4.jpg

I regularly download images from the internet, some of which are categorized, while others are not. I've long since gotten into the habit of saving these images into a specific folder, but this isn't always possible. When I cannot categorize an image, it usually goes into a particular "Miscellaneous" folder. In order to always include my "Miscellaneous" folder in duplicate scans, I selected it in "Folder 1" (or "Folder 2") and clicked the "Add Folders" button to add it to the "Other Folders" list, including subfolders. I did the same with several other folders.

Now, whenever I save a new image into an appropriately categorized folder, I can scan the category folder along with all "Other Folders" to try to find duplicates that were previously saved into a "Miscellaneous" folder.
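
To make the idea of a scan group concrete, here is a minimal Python sketch of gathering candidate files from one category folder plus a list of "Other Folders". It is only an illustration, not DPF's own (Delphi) code: the function name, the extension set, and the folder names in the usage note are all assumptions.

```python
import os

# Assumption: only the formats the article says DPF compares visually.
IMAGE_EXTENSIONS = {".jpg", ".bmp"}

def collect_images(folders, include_subfolders=True):
    """Return a sorted list of image paths found in the given folders."""
    found = set()  # a set, so a folder listed twice is not counted twice
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            if not include_subfolders:
                dirs[:] = []  # stop os.walk from descending any further
            for name in files:
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    found.add(os.path.join(root, name))
    return sorted(found)
```

With hypothetical folder names, a call such as collect_images(["Pictures/Cats", "Pictures/Miscellaneous"]) would return every JPG and BMP file under both folders, ready to be compared against each other.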

Filtering Scans
The whole point of DPF is to make it easier for you to locate duplicate images. But without some form of filtering, DPF will sometimes select a file as a duplicate when you want to keep it, while preserving the file you want to delete. By default, Duplicate Picture Finder will always tag smaller, low-resolution files to be deleted, allowing you to preserve the high quality images.
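
The default rule can be pictured with a small Python sketch. This assumes "smaller, low-resolution" means fewer total pixels; the ImageInfo record and the function name are hypothetical, and DPF itself may weigh file size or other details differently.

```python
from collections import namedtuple

# Hypothetical record type; DPF's internal data structures are not documented here.
ImageInfo = namedtuple("ImageInfo", "path width height")

def pick_original(a, b):
    """Return (original, duplicate), keeping the image with more pixels."""
    if a.width * a.height >= b.width * b.height:
        return a, b
    return b, a
```

For example, given an 800x600 copy and a 1600x1200 copy of the same picture, the 1600x1200 file is returned as the original and the smaller one as the duplicate to delete.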

But this isn't always enough. The "Discardable Folders and Files" lists allow you to specify a set of criteria under which files will always be tagged as a duplicate that can be deleted:
Image5.jpg

In the above screenshot, you can see the default words for discardable folders and files. Whenever DPF completes a scan and is preparing its results, any files that match the specified criteria will be tagged as duplicates. If a file path contains any of the words specified in the "Folders" list, then that file will always be tagged as the duplicate. The same rule applies to files: if a filename contains any of the words specified in "Files", then that file will be tagged as the duplicate.
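
As a rough illustration of that rule, the following Python sketch checks a path against two word lists. The case-insensitive substring match is an assumption, as is matching the "Folders" words against the folder part of the path only; the words used in the usage note are made up, since the real defaults appear only in the screenshot.

```python
from pathlib import PureWindowsPath

def is_discardable(path, folder_words, file_words):
    """True if the folder part or the filename contains a listed word."""
    p = PureWindowsPath(path)        # parses Windows paths on any platform
    folder = str(p.parent).lower()   # assumption: matching ignores case
    name = p.name.lower()
    return (any(w.lower() in folder for w in folder_words)
            or any(w.lower() in name for w in file_words))
```

With the hypothetical word lists ["temp"] and ["copy"], the file C:\Pictures\Temp\cat.jpg matches on its folder, C:\Pictures\Cats\Copy of cat.jpg matches on its filename, and C:\Pictures\Cats\cat.jpg matches neither.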

The "Discardable Folders & Files" lists can be edited using the appropriate "Edit" button, which will open the window shown in the following screenshot:
Image6.jpg

Advanced Options
The various command options of DPF allow you to fine-tune how it scans for duplicates, and the accuracy of the results.

Check Threshold
The Check Threshold determines the accuracy with which images are compared and tagged as duplicates. Any pair of images whose match falls below the specified threshold will not be tagged.
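
To show how a percentage threshold can work, here is a minimal Python sketch that compares two grayscale images of different sizes. It is not DPF's actual algorithm: the 8x8 sampling grid, the difference measure, and the 90% default are all assumptions chosen to keep the example short.

```python
def resample(pixels, size=8):
    """Nearest-neighbor resample of a 2-D grid of gray values to size x size."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

def similarity(a, b, size=8):
    """Percentage (0-100) of agreement between two grayscale images."""
    ga, gb = resample(a, size), resample(b, size)
    total_diff = sum(abs(pa - pb)
                     for ra, rb in zip(ga, gb)
                     for pa, pb in zip(ra, rb))
    return 100.0 * (1 - total_diff / (255 * size * size))

def is_duplicate(a, b, threshold=90.0):
    """Tag the pair only when the match reaches the threshold."""
    return similarity(a, b) >= threshold
```

Because both images are brought to the same grid before they are compared, a 2x2 checkerboard and a 4x4 enlargement of it score as a full match, while an inverted copy falls far below any sensible threshold.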

Scan All Files
If this option is turned on, then all files (including non-image files) will be scanned using a faster binary CRC scan. This is useful for locating duplicates for file types not supported by DPF. Currently, DPF only supports JPG and BMP images.
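
A binary CRC scan of this kind can be sketched in a few lines of Python using the standard zlib.crc32 function. This is only an illustration of the general technique; which CRC variant DPF uses, and how it handles the rare case of two different files sharing a checksum, is not stated here.

```python
import zlib
from collections import defaultdict

def file_crc32(path, chunk_size=65536):
    """Compute the CRC-32 of a file's bytes, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return crc

def group_by_crc(paths):
    """Return lists of paths that share a checksum (likely byte-identical)."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_crc32(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Since only the raw bytes are read, this works for any file type, which is why it is much faster than decoding and comparing images. A cautious implementation would confirm each group with a byte-for-byte comparison before deleting anything.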

Do Not Use CRC Scanning
If Check Threshold is set to 100%, then DPF will use the same binary CRC scanning method used by Scan All Files. The one side effect is that two files that are visually identical, but differ by even a single pixel or carry different meta tags, will not be flagged as duplicates by binary CRC scanning.

If you have set Check Threshold to 100%, turn on the "Do Not Use CRC Scanning" option to force DPF to use the slower, but more accurate, default Visual Scanning method.
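
The interplay between these two settings can be summarized as a small decision function. This Python sketch only restates the behavior described above; the function and parameter names are hypothetical.

```python
def choose_scan_method(check_threshold, do_not_use_crc=False):
    """Pick the comparison method for image files, as described above."""
    if check_threshold >= 100 and not do_not_use_crc:
        return "crc"     # fast, but only matches byte-identical files
    return "visual"      # slower, tolerant of metadata and size differences
```

So choose_scan_method(100) selects the CRC method, while choose_scan_method(100, do_not_use_crc=True) and choose_scan_method(90) both select visual scanning.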

Use Halftone Comparisons
When performing visual scans, Duplicate Picture Finder reduces the color depth of images to a standard 16-color palette. This overcomes subtle differences that may exist between the color palettes of two images, and removes otherwise redundant information that would skew the results. However, because a lot of image data (particularly image composition and layout) is erased in the process, files that are not the same will sometimes be tagged as duplicates because their basic composition is nearly the same. To maintain a high level of accuracy, the composition of images can be preserved by turning on the "Use Halftone Comparisons" option.

The following thumbnails show a standard high-color image and its conversion to a 16-color palette, both without and with halftoning enabled. The third image clearly still maintains a visual resemblance to the original:

Image8.jpg
Original Image

Image10.jpg
16 Colors Reduction

Image9.jpg
16 Colors Reduction with Half-Toning Enabled
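
The difference between the two reductions can be reproduced with a short Python sketch. Two assumptions are made: the palette is the classic 16-color Windows/VGA set, and halftoning is done with ordered (Bayer) dithering, which is one common halftoning technique and not necessarily the one DPF uses. Images are plain lists of rows of (r, g, b) tuples, and the spread value is arbitrary.

```python
# Assumption: the classic 16-color Windows/VGA palette.
PALETTE = [
    (0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0),
    (0, 0, 128), (128, 0, 128), (0, 128, 128), (192, 192, 192),
    (128, 128, 128), (255, 0, 0), (0, 255, 0), (255, 255, 0),
    (0, 0, 255), (255, 0, 255), (0, 255, 255), (255, 255, 255),
]

# Standard 4x4 Bayer threshold matrix used for ordered dithering.
BAYER_4X4 = [[0, 8, 2, 10], [12, 4, 14, 6], [3, 11, 1, 9], [15, 7, 13, 5]]

def nearest(color):
    """Return the palette entry closest to color (squared RGB distance)."""
    return min(PALETTE, key=lambda p: sum((c - q) ** 2 for c, q in zip(color, p)))

def reduce_colors(image, halftone=False, spread=64):
    """Map every pixel to the 16-color palette, optionally with dithering."""
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, color in enumerate(row):
            if halftone:
                # Shift the pixel by a position-dependent offset before matching,
                # so in-between shades become a pattern of two palette colors.
                offset = ((BAYER_4X4[y % 4][x % 4] + 0.5) / 16 - 0.5) * spread
                color = tuple(max(0, min(255, c + offset)) for c in color)
            new_row.append(nearest(color))
        out.append(new_row)
    return out
```

A flat patch of the gray (150, 150, 150) shows the effect: the plain reduction turns every pixel into (128, 128, 128), while the halftoned version mixes (128, 128, 128) and (192, 192, 192) so that the patch keeps roughly its original brightness.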

Remove Settings From Registry
If you are running Duplicate Picture Finder from a flash drive on someone else's computer, turning on this option will remove all associated registry entries when DPF is closed, leaving no trace of it ever having been run on the computer.


Fine-Tuning Results
Whatever options you have chosen to use when scanning for duplicates, the results view remains the same. When scanning is complete, you will be presented with a list of "Original" images (marked with a Image3.jpg symbol), each of which will have a unique list of "Possible Duplicates" (marked with a Image2.jpg symbol).

The various commands available to you are as follows:

Preserve
If a file has been tagged as a duplicate, the default action would be to delete the file when the Finish button is clicked. If you want to keep this file, click the "Preserve" button.

Discard
Likewise, a file that you want deleted may have been marked as an original. Select the file and click the "Discard" button to mark it as a file you want deleted.

Exchange Status
The "Exchange Status" button performs the same function as the "Preserve" and "Discard" buttons, but swaps the status of the two files with a single click.

Rename Files
If the duplicate status of the two files is as you want, but you want to keep the filename of the file that is to be deleted, click the "Rename Files" button to immediately rename the files.

Exchange & Rename
Using the "Exchange & Rename" button has the same effect as clicking on the "Exchange Status" button, followed by the "Rename Files" button.

Arrow Buttons
The arrow buttons provide a way for you to change the current selection using the mouse, without moving your attention away from the command buttons.

Preview
The Preview button opens a preview window where you can perform a manual comparison of two selected images yourself in order to verify the results produced by DPF. The command buttons in the Preview window perform the same functions as those of the main results window.

Save
Because scanning for duplicates can take a long time, you might not want to risk losing the results, which are discarded if DPF is closed for any reason. Click the Save button to save a reference file of the results so that you can reload and check them at a later time.

Load
Click the Load button to load a set of results that was previously saved using the Save command.

Finish
When you have finished checking the results of a duplicates scan, click the Finish button to send all marked files to the Recycle Bin.


Contributing To DPF Development
Duplicate Picture Finder is a 32-bit program and currently only supports visual scanning of JPG and BMP files. If you would like to contribute to the further development of DPF, the following would be most helpful:

1) Delphi code that allows standard TImage and TPicture components to read and manipulate other image formats, such as PNG or GIF.
2) Delphi code that supports reading and writing the same XMP meta tags used by Windows Vista and Photo Gallery.
3) If one exists, a 64-bit compiler that supports Delphi code files.

In all cases, open-source code would be preferable in order to keep DPF free. Also, whatever contributions you may wish to make should not rely on third-party support DLLs or ActiveX libraries. DPF must always be able to run as a standalone executable without requiring an installer.

You can reach me for comment or contribution by posting to these forums.
 