Add Google Search to Your Site
Create a GoogleBox ASP.NET user control that can add pizzazz to any site.
by Boris Feldman
Technology Toolbox: VB.NET, ASP.NET, XML, .NET Framework
Web services enable you to link disparate systems within an organization or across enterprises. Despite their benefits, the expected avalanche of publicly available Web services that provide functionality to the development community has been slow in coming. Google's recent release of a Web service API for automatic queries using its search engine is a big leap forward. I'll give you an overview of the functionality available through the Google Web service APIs, then show you how to build a GoogleBox ASP.NET user control that you can include on your own site.
Not everyone needs to run automated search engine queries. In fact, you've been able to let your users query Google from within your site for quite a while with nothing more than HTML. And if you did need automated queries, you could always write code that pulls down the HTML results page and parses it yourself (a technique known as Web-scraping). However, if the site you're scraping changes its format in any way, chances are your code will break. Google's Terms of Service also specifically forbid Web-scraping (perhaps because the company doesn't want others parsing its search results programmatically without its knowledge), so you shouldn't be doing it anyway.
Besides, with the release of Google's Web service APIs, you don't need to create or maintain such a hack. The Google Web APIs developer's kit is a free download from the Google site and contains everything you need to get started (see Resources). Included are a Web Services Description Language (WSDL) file and some C# sample code to help you on your way.
Web services are cross-platform, and the Google APIs are no exception. If you're using Visual Studio with a third-party language, such as Perl or Python, you can use the Google Web service too. Note that, as of this writing, the Google Web service is a beta product. As such, both the APIs and your access to them can change at any time and without notice.
In addition to downloading the developer's kit, you also need to register your e-mail address with Google. Google then e-mails you a license key, which you must include as a parameter in all your queries. The key lets Google know who's accessing its servers and limits each user to 1,000 queries per day. Both the developer's kit and access to the Google servers through the Web service APIs are free for noncommercial use. Make sure you read and abide by the company's Terms and Conditions when creating your programs.
Access Google Functionality
You can use the Google Web service to query the Google search engine, retrieve the contents of a Web page from Google's cache, or use the company's spell checker. Accessing this functionality is simple: add the GoogleSearchService.vb file generated by the WSDL utility to your project, add a reference to System.Web.Services.dll, and you're set (see the sidebar, "Convert WSDL to VB.NET"). From there it takes only two lines of code to run a Google query for "Visual Studio":
' create a Google Search object
Dim s As GoogleSearchService = _
    New GoogleSearchService()
' invoke the search method
Dim r As GoogleSearchResult = _
    s.doGoogleSearch( _
        "<key here>", "Visual Studio", _
        0, 10, False, "", False, "", "", "")
The first statement creates an instance of the GoogleSearchService proxy class, and the second runs the query. The doGoogleSearch method takes 10 parameters. That might seem overwhelming at first, but most of the time you'll only need to provide your Google license key and your query string. The remaining parameters let you refine the query by specifying where in the result set to start, the maximum number of results to return, and various filtering and encoding options. Once doGoogleSearch returns, you can iterate over the resultElements array of the GoogleSearchResult object to get the information for each search result.
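Here's a minimal sketch of that loop. So that it compiles on its own, it declares stand-in GoogleSearchResult and ResultElement classes that mirror only the fields this article relies on (resultElements, title, URL); those field names are assumed to match the generated proxy. In a real project you'd delete the stand-ins and use the classes in GoogleSearchService.vb instead. ListHits is a hypothetical helper, not part of the Google API.

```vbnet
Imports System
Imports System.Collections.Generic

' Stand-in for the WSDL-generated ResultElement class (assumption:
' only the fields used here are reproduced).
Public Class ResultElement
    Public title As String
    Public URL As String
End Class

' Stand-in for the WSDL-generated GoogleSearchResult class.
Public Class GoogleSearchResult
    Public resultElements As ResultElement()
End Class

Public Module ResultWalker
    ' Hypothetical helper: returns one "title - URL" line per search hit.
    Public Function ListHits(r As GoogleSearchResult) As List(Of String)
        Dim lines As New List(Of String)()
        ' Guard against a missing result or a missing array.
        If r Is Nothing OrElse r.resultElements Is Nothing Then
            Return lines
        End If
        For Each item As ResultElement In r.resultElements
            lines.Add(item.title & " - " & item.URL)
        Next
        Return lines
    End Function
End Module
```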
Here's how to get Web pages from the Google cache once you have a reference to the GoogleSearchService object:
Dim bytes() As Byte = _
    s.doGetCachedPage( _
        "<key here>", _
        "www.fawcette.com/vsm")
This returns a byte array holding the VSM home page as it was when the Google indexer last visited it. The page travels over the wire as base64-encoded text, but the generated proxy decodes it into raw bytes for you.
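The proxy reverses the base64 encoding used on the wire, so the array holds the page's raw bytes; to work with the page as HTML you still have to decode those bytes with a character encoding. This minimal sketch assumes UTF-8, which won't be right for every cached page, and DecodeCachedPage is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text

Public Module CachedPageHelper
    ' Hypothetical helper: turns the bytes returned by doGetCachedPage
    ' into a string. Assumes UTF-8; substitute the page's real encoding
    ' if you know it.
    Public Function DecodeCachedPage(bytes As Byte()) As String
        If bytes Is Nothing Then Return String.Empty
        Return Encoding.UTF8.GetString(bytes)
    End Function
End Module
```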
Finally, the Google Web service APIs also allow you to use the same spell checker Google uses. This call returns a simple string with a single spelling suggestion. For example, assuming you have a reference to the GoogleSearchService object, this code returns "visual":
Dim suggestion As String = _
    s.doSpellingSuggestion( _
        "<key here>", _
        "vusual")
If the word is spelled correctly, or Google doesn't have a suggestion, then it returns Nothing.
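Because doSpellingSuggestion can return Nothing, check for it before showing anything to the user. Here's a minimal sketch using a hypothetical DidYouMean helper that returns an empty string when there's nothing to suggest.

```vbnet
Imports System

Public Module SpellingHelper
    ' Hypothetical helper: builds a "Did you mean" prompt from the value
    ' returned by doSpellingSuggestion, which is Nothing when Google has
    ' no suggestion or the word is already spelled correctly.
    Public Function DidYouMean(suggestion As String) As String
        If String.IsNullOrEmpty(suggestion) Then Return String.Empty
        Return "Did you mean: " & suggestion & "?"
    End Function
End Module
```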
Of course, neither accessing the Google cache nor having it provide spelling suggestions is as useful, or as fun, as running queries. One way to take advantage of automated search engine queries is to create what's known as a GoogleBox. This is a small box on your Web page that shows the top few search results for a topic that's interesting to you.
You can use an ASP.NET user control to encapsulate the Web service functionality exposed by Google and to handle the practical details of running automated queries, such as caching and error handling, in your own code. I'll show you how to do this using VB.NET. (The code download for this article also includes a C# version of the same project.) Once you create GoogleBox.ascx and its code-behind file, GoogleBox.ascx.vb, using the control on a page is easy (see Listing 1).
Build a GoogleBox User Control
The heart of the user control is the doQuery helper function, which takes the license key and query string specified in the GoogleBox tag and calls the search engine, as I showed you earlier (see Listing 2 and Figure 1). The query can be anything you can type into the search box on the Google site: Boolean searches, phrase searches, searches limited to a particular site, and more. See the documentation in the developer's kit for more information on creating query strings.
Once the doGoogleSearch call returns, the code loops through the resultElements array and formats each item using String.Format and a format string provided in the tag. You can customize this format string using the GoogleBox tag's FormatString parameter: {0} is the number of the current search result, {1} is the item title, {2} is the item URL, and {3} is a brief "snippet" that describes the link. Because you're specifying the FormatString in HTML rather than in code, you need to use the &quot; HTML entity instead of VB's doubled quotes. As the doQuery function formats each result, it appends it to a StringBuilder. When it's done, the function returns a single string containing all the formatted results.
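The formatting loop can be sketched independently of the Web service. The Hit structure is a hypothetical stand-in for the title, URL, and snippet of each result, and numbering results from 1 is an assumption about what {0} should show; FormatResults is likewise a hypothetical name, not the control's actual doQuery code.

```vbnet
Imports System
Imports System.Text

' Hypothetical stand-in for one search hit; in the real control these
' values come from each item in resultElements.
Public Structure Hit
    Public Title As String
    Public Url As String
    Public Snippet As String
End Structure

Public Module GoogleBoxFormatting
    ' Formats every hit with the page author's format string:
    ' {0} = result number (assumed 1-based), {1} = title,
    ' {2} = URL, {3} = snippet.
    Public Function FormatResults(formatString As String, hits As Hit()) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To hits.Length - 1
            sb.Append(String.Format(formatString, i + 1, _
                hits(i).Title, hits(i).Url, hits(i).Snippet))
        Next
        Return sb.ToString()
    End Function
End Module
```

In code, a format string such as "<li><a href=""{2}"">{1}</a></li>" uses VB's doubled quotes; the same string written in the GoogleBox tag would use &quot; instead.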
However, calling Google every time your GoogleBox is shown is not only slow, but it's also a quick way to burn through your daily allotment of queries. Luckily, the ASP.NET Cache object makes it easy for the code to tuck away your query results and reuse them. The page author can choose how long to cache the already formatted result list using your GoogleBox tag's CacheMinutes parameter. Setting a value of zero disables caching. Adding the text to the Cache takes only one line of code:
Cache.Insert(_cacheKey, outText, _
    Nothing, DateTime.Now.AddMinutes(_cacheMinutes), _
    TimeSpan.Zero, CacheItemPriority.Low, Nothing)
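Storing the text is only half of the pattern; the other half is checking the cache before calling Google at all. The sketch below shows that cache-aside logic, including the CacheMinutes = 0 bypass. To keep it runnable outside a Web application, it swaps the ASP.NET Cache object for a hypothetical SimpleCache class backed by a Dictionary with absolute expiration; in the control itself you'd read Cache(_cacheKey) and call Cache.Insert as shown above.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the ASP.NET Cache object so this sketch runs
' outside a Web app. Each value is stored with an absolute expiration.
Public Class SimpleCache
    Private ReadOnly _items As New Dictionary(Of String, Tuple(Of String, DateTime))()

    ' Cache-aside lookup: return the cached text if it's still fresh;
    ' otherwise run the query, store the result, and return it.
    ' A cacheMinutes value of zero (or less) bypasses the cache entirely.
    Public Function GetOrQuery(key As String, cacheMinutes As Integer, _
                               runQuery As Func(Of String)) As String
        If cacheMinutes <= 0 Then Return runQuery()

        Dim entry As Tuple(Of String, DateTime) = Nothing
        If _items.TryGetValue(key, entry) AndAlso entry.Item2 > DateTime.Now Then
            Return entry.Item1
        End If

        Dim text As String = runQuery()
        _items(key) = Tuple.Create(text, DateTime.Now.AddMinutes(cacheMinutes))
        Return text
    End Function
End Class
```

Unlike ASP.NET's Cache, this stand-in never evicts expired entries, doesn't shed items under memory pressure, and isn't thread-safe, so treat it strictly as an illustration of the lookup logic.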
The code saves the cached text under a key, then uses the same key to retrieve it on subsequent requests for the page. The Page_Init event handler builds the cache key by joining the prefix "googlebox-", the ID the page's author gave the GoogleBox control, and the virtual path of the requested page (Request.Path). Because the key includes both the ID and the path, GoogleBox controls on different pages, or with different IDs on the same page, don't overwrite one another's saved search results:
_cacheKey = "googlebox-" & _
    ID & "-" & Page.Request.Path
However, if controls on different pages are running the same query with the same formatting, you want them to share the cached item. As a result, the code allows the page author to override the generated cache key and specify his or her own using the GoogleBox tag's CacheKey parameter.
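The key-selection rule can be sketched as a small function: a CacheKey supplied by the page author wins; otherwise the generated key is used. BuildCacheKey and its parameter names are hypothetical, and whether the real control uses a custom key verbatim or adds a prefix to it is an open assumption, so this sketch simply uses it as-is.

```vbnet
Imports System

Public Module CacheKeys
    ' Hypothetical helper: prefer the page author's CacheKey; otherwise
    ' combine a fixed prefix, the control ID, and the page's request path.
    Public Function BuildCacheKey(customKey As String, controlId As String, _
                                  requestPath As String) As String
        If Not String.IsNullOrEmpty(customKey) Then Return customKey
        Return "googlebox-" & controlId & "-" & requestPath
    End Function
End Module
```

Two pages that set the same CacheKey on GoogleBox controls running the same query and format string then share one cached entry.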
Web services can be unpredictable. The Web server on the remote machine can crash, there could be a connectivity problem, or the data you receive could be corrupted. Things can break down in many places, so it's important to handle errors gracefully (see Figure 2). .NET's structured exception handling makes this straightforward. For example, if you forget to include your Google license key in the GoogleBox tag, the control throws an ArgumentException with a message reminding you that it won't work without the key. In addition, the call to the doQuery function is wrapped in a Try...Catch block, so if doQuery throws an exception for any reason, the code displays the ErrorMessage string to the user instead.
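Here's a minimal sketch of that two-level strategy: a missing key is a page-authoring mistake, so it raises an ArgumentException that isn't caught, while any failure during the query itself is caught and replaced with the error message. RenderBox is a hypothetical name, and its doQuery delegate stands in for the control's real helper function.

```vbnet
Imports System

Public Module GoogleBoxErrors
    ' Hypothetical stand-in for the control's rendering path.
    Public Function RenderBox(licenseKey As String, errorMessage As String, _
                              doQuery As Func(Of String, String)) As String
        ' Checked outside the Try block so the page author sees the problem.
        If String.IsNullOrEmpty(licenseKey) Then
            Throw New ArgumentException( _
                "GoogleBox needs a Google license key.", "licenseKey")
        End If
        Try
            Return doQuery(licenseKey)
        Catch ex As Exception
            ' Network, SOAP, or data errors: show the friendly message.
            Return errorMessage
        End Try
    End Function
End Module
```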
You've seen how to create an ASP.NET user control that uses Web services and takes advantage of the Cache object. You can take this code further and allow users to modify additional parameters of the doGoogleSearch command through the GoogleBox tag. You can also make the GoogleBox interactive and let site visitors step through multiple pages of search results. Customizing the look and feel of the user control to match your site is doable too. Good luck Googling.
About the Author
Boris Feldman is a developer, author, and speaker. Boris' latest book is Microsoft Internet Explorer 5 Web Programming Unleashed (Sams). Boris is also the founder of feldmangroup, a consulting company specializing in Internet, wireless/mobile, and .NET applications. Reach Boris at .