Windows Vista Forums

Reading PDF file

  1. #1


    Codeblack Guest

    Reading PDF file

    Does anyone know how to read a PDF file and search for text within it?
    Any input will be greatly appreciated.





  3. #2


    Codeblack Guest

    RE: Reading PDF file

    Anyone in this forum can help me.


  4. #3


    Al Dunbar Guest

    Re: Reading PDF file


    "Codeblack" <Codeblack@xxxxxx> wrote in message
    news:64301878-37E2-4857-9AE8-F5812DE672DA@xxxxxx

    > Anyone in this forum can help me.
    Judging from the responses to date, apparently, many of us cannot help you.

    VBScript's FileSystemObject has a difficult time with anything other than
    text files. You will either need to determine the details of the format and
    write your own interface, or find a document object model for PDFs.
    Unfortunately, googling ["document object model" "portable document format"]
    seems to find information about document object models for HTML, DHTML,
    Word, etc., all presented in PDF format. I checked the Adobe site and
    could not find anything helpful there, other than Adobe Acrobat itself. It
    could be that the full Acrobat package provides what you need, but possibly
    not.
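If you go the route of learning the format yourself, a reasonable first look is the `%PDF-` header and the stream dictionaries. Below is a rough Python sketch (the function name `describe_pdf` is made up for illustration; this is not a PDF parser) that reports the version and which `/Filter` names the file uses. It assumes each `/Filter` entry is a single name, so arrays such as `/Filter [/FlateDecode]` are not counted.

```python
import re


def describe_pdf(data: bytes) -> dict:
    """Report a few structural facts about raw PDF bytes (a rough sketch, not a parser)."""
    # Every PDF begins with a header line such as b"%PDF-1.4".
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("not a PDF: missing %PDF- header")
    # Stream bodies are introduced by the keyword 'stream'; the word
    # boundaries keep 'endstream' from being counted as well.
    streams = re.findall(rb"\bstream\b", data)
    # The dictionary before each stream may name a filter such as /FlateDecode.
    # Only single-name entries are found; /Filter [/A /B] arrays are skipped.
    filters = re.findall(rb"/Filter\s*/(\w+)", data)
    return {
        "version": header.group(1).decode("ascii"),
        "streams": len(streams),
        "filters": [name.decode("ascii") for name in filters],
    }
```

Call it as `describe_pdf(open(path, "rb").read())`. Seeing `FlateDecode` in the list is a strong hint that the page text is compressed, which would explain why the FileSystemObject or Notepad shows mostly binary.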

    /Al




  5. #4


    David Kerber Guest

    Re: Reading PDF file

    In article <195FFDF9-7A3E-4FCD-8D56-BC2F454975D4@xxxxxx>,
    Codeblack@xxxxxx says...

    > Does any one know how to read a pdf file and search for text within the pdf.
    > Any inputs will be greatly appreciated.
    A .pdf is just a text file with some mark-up elements, so you can search
    for contained text just like you would a .html or .txt file.
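Taken literally, this amounts to a byte search. A minimal Python sketch (the helper name `raw_offsets` is hypothetical) that returns the offset of every hit. It assumes the text is stored uncompressed, as literal strings in a single-byte encoding (latin-1 here); text that is compressed, stored in hex strings such as `<48656C6C6F>`, or kept under a custom font encoding will not be found.

```python
def raw_offsets(data: bytes, needle: str) -> list:
    """Return the byte offset of every occurrence of needle in the raw file bytes."""
    target = needle.encode("latin-1")  # assumes a simple single-byte text encoding
    hits = []
    start = data.find(target)
    while start != -1:
        hits.append(start)
        start = data.find(target, start + 1)  # allow overlapping matches
    return hits
```

An empty result does not show that the text is absent from the rendered pages, only that it is not stored in that plain form.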

    --
    /~\ The ASCII
    \ / Ribbon Campaign
    X Against HTML
    / \ Email!

    Remove the ns_ from if replying by e-mail (but keep posts in the
    newsgroups if possible).



  6. #5


    gimme_this_gimme_that Guest

    Re: Reading PDF file

    Long shot ....

    In Excel:

    Go into the VBA IDE (Alt-F11)
    Go into Tools->References
    Check all the Adobe Libraries

    I have:
    Adobe Acrobat 7.0 Browser Control Type Library 1.0
    Adobe Acrobat 7.0 Type Library


    Go into Object Browser

    See if you can get a VBA Sub going that looks like this:

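    Sub SearchPDF()
        Dim a As AcroAVDoc
        Dim b As Boolean
        Set a = New AcroAVDoc
        ' Object Browser shows Open(szFullPath, szTempTitle); "" means no custom window title
        a.Open "C:\mypdf.pdf", ""
        ' Object Browser shows FindText(szText, bCaseSensitive, bWholeWordsOnly, bReset);
        ' 0 = False, 1 = True: case-insensitive, partial words allowed, start at page 1
        b = a.FindText("SearchTextString", 0, 0, 1)
        MsgBox CStr(b)
    End Sub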

    *IF* you ever get that to work - the arguments to FindText are
    undocumented - the next step is to translate this into VBScript -

    Someone might be able to help you here with another post.
    You'd need to convert this VBA:

    Set a = New AcroAVDoc

    'into VBScript that might look like this:

    Set a = CreateObject("AcroExch.AVDoc")

    YMMV


  7. #6


    Al Dunbar Guest

    Re: Reading PDF file


    "David Kerber" <ns_dkerber@xxxxxx_WarrenRogersAssociates.com> wrote in message
    news:MPG.23f665d27256871989ce2@xxxxxx

    > In article <195FFDF9-7A3E-4FCD-8D56-BC2F454975D4@xxxxxx>,
    > Codeblack@xxxxxx says...

    >> Does any one know how to read a pdf file and search for text within the
    >> pdf.
    >> Any inputs will be greatly appreciated.
    >
    > A .pdf is just a text file with some mark-up elements
    Not the one I just renamed as .txt and opened in Notepad...

    /Al

    > so you can search
    > for contained text just like you would a .html or .txt file.



  8. #7


    gimme_this_gimme_that Guest

    Re: Reading PDF file

    Oops. I forgot to tell you what to do once you get to the Object Browser.

    You probably figured that out...

    In the VBA IDE select View->Object Browser

    In the drop-down in the middle of the page where it says <All
    Libraries>, select Acrobat.

    Peruse the objects.

    For example, click AcroAVDoc and you will see the FindText method.


  9. #8


    Tom Lavedas Guest

    Re: Reading PDF file

    On Feb 9, 1:25 pm, "Al Dunbar" <aland...@xxxxxx> wrote:

    > "David Kerber" <ns_dkerber@xxxxxx_WarrenRogersAssociates.com> wrote in message
    >
    > news:MPG.23f665d27256871989ce2@xxxxxx
    >

    > > In article <195FFDF9-7A3E-4FCD-8D56-BC2F45497...@xxxxxx>,
    > > Codebl...@xxxxxx says...

    > >> Does any one know how to read a pdf file and search for text within the
    > >> pdf.
    > >> Any inputs will be greatly appreciated.
    >

    > > A .pdf is just a text file with some mark-up elements
    >
    > Not the one I just renamed as .txt and opened in notepad...
    >
    > /Al
    >

    > > so you can search
    > > for contained text just like you would a .html or .txt file.
    >

    Later versions of PDF seem to be encoded in a way that keeps that from
    happening, but I think that's still at the discretion of the creator. That
    is, some are searchable and some aren't. Clearly, scanned documents in PDF
    format are unsearchable, since they are image-based.
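One common form of that encoding is stream compression: content streams are often stored with the `/FlateDecode` filter, which is zlib data, so nothing readable appears in the raw file. A rough Python sketch (Python because VBScript has no built-in way to inflate zlib data; the names `decoded_streams` and `search_pdf` are hypothetical) that inflates each stream before searching. It ignores `/Length`, other filters, and encryption, and it cannot help with scanned pages, where the streams hold images rather than text.

```python
import re
import zlib

# Lazily capture everything between 'stream' + end-of-line and the next 'endstream'.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def decoded_streams(data: bytes):
    """Yield each stream body, inflated when it holds zlib (/FlateDecode) data."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj leaves the end-of-line bytes before 'endstream'
            # in unused_data instead of raising an error.
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            # Not zlib data (uncompressed, or another filter): pass it through as-is.
            yield body


def search_pdf(data: bytes, needle: str) -> bool:
    """True if needle occurs in any (decompressed) stream; assumes latin-1 text."""
    target = needle.encode("latin-1")
    return any(target in body for body in decoded_streams(data))
```

Text split across several operators or stored under a custom font encoding can still be missed after decompression.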

    Tom Lavedas
    ***********
    http://there.is.no.more/tglbatch/



  10. #9


    Tom Lavedas Guest

    Re: Reading PDF file

    On Feb 9, 1:19 pm, "gimme_this_gimme_t...@xxxxxx"
    <gimme_this_gimme_t...@xxxxxx> wrote:

    > *IF* you ever get that to work - the arguments to FindText are
    > undocumented - the next step is to translate this into VBScript -
    The Acrobat controls do not provide a shell of their own, but must be
    hosted by an application, like IE. Gunter Born wrote about this years
    ago. His web site, WSH Bazaar, is no longer maintained, but is still
    out there. See: http://freenet-homepage.de/gborn/WSH.../WSHBazaar.htm.
    In Newsletter #5, he presents the basics of hosting the Acrobat
    Reader ActiveX control in IE and performs a lot of manipulations with it.
    Unfortunately, he does not cover the method you discuss, and some of the
    supporting files are missing. Further, if the input arguments must be
    typed as Long, they cannot be implemented in script, since all variables
    in VBScript are of type Variant.

    I looked at the methods exposed in all of the Acrobat ActiveX
    libraries on my machine, and I cannot find a reference to a FindText
    method. I did this with the option to show hidden objects selected. Where
    did you find a reference to this method?

    Tom Lavedas


  11. #10


    Paul Randall Guest

    Re: Reading PDF file


    "Tom Lavedas" <tglbatch@xxxxxx> wrote in message
    news:429bac15-3b35-46c4-b82b-15d2b0cc23ef@xxxxxx
    Later versions of pdf seem to be encoded to keep that from happening,
    but I think that's still at the discretion of the creator. That is,
    some are and some aren't searchable. Clearly, the scanner documents
    in pdf format are unsearchable, since they are image based.


    ---------------------------------------------
    I think it is way more complex than that.
    Try downloading http://www.sfmta.com/cms/mmaps/documents/47.pdf.

    Looking at the file with Notepad, you will find almost no text that looks
    like street names or Muni route numbers.
    Look at it with Acrobat Reader. The text zooms beautifully (like a change
    in font size, not a zoomed bitmap).
    Use Acrobat's binocular icon and search for some text, like 9x. It finds 3
    occurrences that are readable on the map. Notepad finds two occurrences,
    but I think these have nothing to do with the text '9x'.
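One reason a viewer can find "9x" when a byte search cannot, even after decompression: a text-showing operator often splits a word into pieces with kerning adjustments between them, e.g. `[(9)-20(x)] TJ`, so the characters "9x" are never adjacent in the file. The Python sketch below works on a made-up content-stream fragment, not data from the map above, and the name `tj_runs` is hypothetical. It handles only simple literal strings (backslash escapes are left undecoded, nested parentheses and hex strings are not supported), and fonts with custom encodings can store glyph codes that are not readable characters at all.

```python
import re
from typing import List

# A TJ operand is an array of string pieces and kerning numbers, e.g. [(9)-20(x)] TJ.
TJ_RE = re.compile(rb"\[(.*?)\]\s*TJ", re.DOTALL)
# A literal string: '(' then escaped characters or anything but '\' and ')', then ')'.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\)])*)\)")


def tj_runs(content: bytes) -> List[str]:
    """Join the string pieces of each TJ array, dropping the kerning numbers."""
    runs = []
    for match in TJ_RE.finditer(content):
        pieces = LITERAL_RE.findall(match.group(1))
        runs.append(b"".join(pieces).decode("latin-1"))
    return runs
```

Large negative offsets such as `-250` often stand in for a word space; this sketch simply drops them, so `[(Mar)10(ket) -250(St)] TJ` comes out as "MarketSt".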

    -Paul Randall



