| | #1 (permalink) |
| | Getting access to text in a web page

I have been able to get PowerShell to automate the logon to a web page. Now that I have done this, I have a valid variable holding its document object:

$document = $ie.Document

I would like to search for text in the page, say "Hello Daren Daigle". I know for a fact that the application uses frames of some sort, but I am unsure how to search for the expected text so that I get back either a string that I can test or a boolean that says whether it was found. |
| | #2 (permalink) |
| | Re: Getting access to text in a web page

Daren Daigle wrote:
> I have been able to get powershell to automate the logon to a web page.
>
> Now that i have done this, i have a valid variable with its document object.
>
> $document = $ie.Document
>
> I would like to search for text in the page, say "Hello Daren Daigle"
>
> I know for a fact that they are using frames of some sort in the application
> but i am unsure how to search for the expected text to either return a string
> that i can test or a boolean that says it has or has not been found.

Lee Holmes has an example on his blog that does "good ol’ screen scraping":

http://www.leeholmes.com/blog/PermaL...4e1d609f1.aspx

Greetings /\/\o\/\/ |
| | #3 (permalink) |
| | Re: Getting access to text in a web page

Great example. The issue I have is that I am attempting to create an email validator that tells me whether I can log in to a specific website and thereby get back personalized text in that browser. This may be the wrong place to ask, but I am trying to get at some COM method or property that either returns the string or does the search for me. If this is indeed the wrong group to use, please direct me to one that can help me. |
| | #4 (permalink) |
| | Re: Getting access to text in a web page

"Daren Daigle" <DarenDaigle@discussions.microsoft.com> wrote in message news:A16B63BD-0370-4A68-805C-44DA0A8C11F2@microsoft.com...
> I have been able to get powershell to automate the logon to a web page.
>
> Now that i have done this, i have a valid variable with its document
> object.
>
> $document = $ie.Document
> I would like to search for text in the page, say "Hello Daren Daigle"

COM HtmlDocument use has really been atrocious since late last year. I haven't yet gotten an answer for why so many COM calls fail, but based on the fact that the VB.NET GetObject fails in the same places, I believe it is due to some issues with referencing contained COM objects. When you use a document in IE, you're actually working through an IDispatch interface, which I think causes the problems - a qualified developer comment would be helpful.

Here's a quick walkthrough of what I've done in the past, and where it breaks now. You CAN do what you want to do, I think, but all I've been able to make work from PS is accessing literal values - not getting access to objects within a page, sad to say.

The basics work nicely - getting an InternetExplorer Application object, then navigating it and waiting for the page to settle down:

$ie = New-Object -ComObject InternetExplorer.Application
$url = 'http://www.google.com'
$ie.Navigate($url)
while ($ie.Document.ReadyState -ne 'complete') {
    Start-Sleep -Milliseconds 50
}

At this point, you CAN get $ie.Document, $ie.Document.body, and $ie.Document.body.innerText. You can even do this:

$doc = $ie.Document
$doc.body.innerText

However, I have trouble invoking any methods that actually query or filter the document. For example, the Document methods getElementsByTagName() and getElementsByName(), which DID work on $ie objects created in an earlier PS beta, don't work for me right now. |
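Since the post above reports that body.innerText can still be read from PowerShell, the boolean check asked for in the first post can be reduced to a substring test on that string. The following is only a minimal sketch under that assumption: the helper name Test-PageText is hypothetical (it is not from the thread or from any library), and it does not look inside frames.

```powershell
# Hypothetical helper, not part of the thread: returns $true when $Expected
# occurs somewhere in $Text, and $false otherwise (including for empty text).
function Test-PageText {
    param(
        [string]$Text,      # e.g. the value of $ie.Document.body.innerText
        [string]$Expected   # e.g. 'Hello Daren Daigle'
    )
    # A $null argument bound to [string] becomes an empty string.
    if ([string]::IsNullOrEmpty($Text)) { return $false }
    # String.Contains performs a case-sensitive substring test.
    return $Text.Contains($Expected)
}

# Assumed usage against an IE COM object that has finished loading:
# $found = Test-PageText -Text $ie.Document.body.innerText -Expected 'Hello Daren Daigle'
```

The COM call is left as a comment so the function itself can be tried on plain strings first; a case-insensitive match would need a different comparison than Contains.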
| | #5 (permalink) |
| | Re: Getting access to text in a web page

Well, it works differently when I do what you suggest, but I still cannot get access to any of the body text from any frame. Is there a program out there that can help me browse the active COM object and look at its properties and collections, so I can figure out where IE is actually storing what I need? |
| | #6 (permalink) |
| | Re: Getting access to text in a web page

"Daren Daigle" <DarenDaigle@discussions.microsoft.com> wrote in message news:2DBA7D5A-A307-4CFA-B3F6-905DF01094DF@microsoft.com...
> Well, it works differently when i do what you suggest, but i still cannot
> get access to any of the body of the text from any frame.

Nor I, from PS. The following DOES work as WSH-hosted VBScript:

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate2("file://C:\Documents and Settings\aka\Desktop\index.htm")
' Wait until the browser reports READYSTATE_COMPLETE (4)
Do While ie.ReadyState <> 4: WScript.Sleep 20: Loop
Set frames = ie.document.frames
length = frames.length
' Echo the visible text of every frame in the page
For i = 0 To length - 1
    WScript.Echo frames(i).document.body.innerText
Next

> ... Is there a program out
> there that can help me browse the active com object and look at the
> properties and collections so i figure out where IE is actually storing
> what i need?

You can use a type library viewer such as the one embedded in the VBA Editor for Office apps, or TLViewer. Hypothetically, the same process as shown above should work in PS, but it appears to be running afoul of some object typing issues between .NET and COM. |
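For reference, a direct PowerShell transliteration of the VBScript frame loop above might look like the sketch below. It is untested against a live browser, and this thread reports that exactly these COM calls can fail from PowerShell, so treat it as an illustration of the "same process" rather than a known-good fix. The function name Get-FrameText is hypothetical; frames, length, item, document, body and innerText are the same members the VBScript uses.

```powershell
# Hypothetical helper, not from the thread: collects body.innerText from each
# frame of an IE document object, mirroring the VBScript loop.
function Get-FrameText {
    param($Document)   # assumed to be $ie.Document after the page has loaded

    $texts = @()
    $frames = $Document.frames
    for ($i = 0; $i -lt $frames.length; $i++) {
        try {
            $texts += $frames.item($i).document.body.innerText
        }
        catch {
            # COM binding problems like those described in this thread,
            # or frames that refuse access, end up here.
            Write-Warning "Frame $i could not be read: $_"
        }
    }
    return $texts
}

# Assumed usage:
# $found = @(Get-FrameText -Document $ie.Document) -like '*Hello Daren Daigle*'
# [bool]$found   # $true if any frame contained the text
```

Wrapping each frame in try/catch keeps one unreadable frame from hiding the text of the others.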