Product Review: WebZinc
Cool import
By: Derek Ferguson
May. 11, 2005 09:00 AM
Shortly before this magazine was launched, I was sent a product announcement for something called WebZinc. The first thing I noticed was that the company producing it, White Cliff Computing Ltd., was in Yorkshire, England. "That can't be a very common place for software companies to be based," I thought to myself. "I'll have to work them into the queue once we get rolling."

Three years later, here they are! Part of the delay was the year we spent getting ready to produce this magazine. During that time, the CD for the software got buried under a ton of junk in my office, only to surface two years later while I was looking for something else. "Oh," I thought, "guess I'm a bit late on this one."

The good news is that the software fit a specific need I had at the moment I rediscovered it. I had a private Web site that I wanted to harvest large amounts of information from. No access to the back-end database was available, so I was faced with "screen scraping" the data off the site. Thankfully, WebZinc let me pull all the information I wanted off the site in fewer than 10 lines of code!

It started with the install, a standard "take all the defaults" installation of the kind I like. I noticed that the assembly for the library wasn't in the GAC, so I figured it must be in the installation directory, a folder named "WebZinc .NET" under "Program Files," and yes, the assembly was right there. This is a perfect installation default, as far as I'm concerned.

I created a normal C# Console Application, added a reference to the assembly, and tried to code against the library without looking at the documentation. The main object, as I quickly discovered, is called WebZinc, and it has a constructor that accepts the URL of the page you want to begin scraping - a very logical, easily discovered design! At this point, I had to start consulting the documentation a little. It was very straightforward and easy to read, though.
To fill out the login form on the first page, I just referenced the Form sub-property of the CurrentPage property on my WebZinc object and assigned values to the Value properties of each of the InputFields it contains. Finally, as I guessed without looking at the documentation, you call the Submit method on the Form and get back another Page. Easy!

I carried on in this fashion for another five lines of code or so and, at that point, I had written a single iteration through my target Web site to gather all the info I wanted. All I had to do now was wrap my code in a loop, gather up the rest of the pages on the target site - which were in the same format, just with different content - and write a database back end to store my results. None of this was directly related to WebZinc, though, so I'll spare you the details.

Bottom line: if you are looking to write a piece of software that interacts in any kind of automated fashion with a Web site designed for humans, you must check out WebZinc! You could do the same things with the low-level networking features built directly into .NET, but WebZinc provides a model that's much higher-level and better suited to dealing specifically with Web content. It'll save you days and days of programming if used properly.

Product Information
WebZinc
White Cliff Computing Ltd
The Grange
Tursdale, Durham, DH6 5NU
Phone: 07092 17 18 19 (UK) or +44 1666 511 527 (outside the UK)
Fax: 07092 131 141
Sales: support@webzinc.net
Price: $199 for the upgrade, $399 for the full license
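For a sense of what the review's "fill in the InputFields, then Submit" model saves you, here is a rough sketch of the same login step done by hand. It is not WebZinc code and it does not use the .NET networking classes; it is written in Python with only the standard library, purely to illustrate the steps. The page markup, the field names (`user`, `pass`, `token`), the URL, and the helper names `FormFieldParser` and `build_submission` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldParser(HTMLParser):
    """Collects the action URL and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            name = attrs.get("name")
            if name:
                # Keep any default value (e.g. hidden fields) from the page.
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Return (target_url, encoded_body) for submitting the page's first form."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    for name, value in values.items():
        if name not in fields:
            raise KeyError(f"form has no input named {name!r}")
        fields[name] = value
    # Resolve a relative action against the URL the page was loaded from.
    target = urljoin(page_url, parser.action or "")
    return target, urlencode(fields)


# Hypothetical login page, standing in for the first page of the target site.
LOGIN_PAGE = """
<html><body>
<form action="/login.do" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
</body></html>
"""

target, body = build_submission(
    "http://example.invalid/site/index.html",
    LOGIN_PAGE,
    {"user": "reader", "pass": "s3cret"},
)
print(target)  # http://example.invalid/login.do
print(body)    # token=abc123&user=reader&pass=s3cret
```

This only prepares the request. Actually sending it, keeping the session cookie for the pages behind the login, and parsing each result page would all be additional code, which is the kind of plumbing a higher-level page and form model is meant to hide.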