James S. Huggins' Refrigerator Door
Searching on James S. Huggins' Refrigerator Door - - - Photo of a vertical file cabinet with an open drawer - - - original photo from an unknown source

Searching On My Site

Having trouble finding what you are looking for? Looking for something in particular? Trying to remember where that page was the last time you were here? Just curious if I have anything about a particular topic?

This is your answer. Just type your search words in one of the boxes below and click on the search button.

Why All The Options?

Two reasons:

  1. I want to experiment with different engines and services.
     

  2. I want visiting webmasters to be able to see how the different services work.

I've put a discussion of these various services, their options and my "reviews" at the bottom of this page.

NB: You will leave my domain when you run any of these searches. You will go to the search engine site.

 


Simple Search Page: A shorter version of this page. Includes the search forms, but not all the discussion.  (search)

Extended Search Page (Search Services Summary): Has a comparison table summarizing the features of each of these search engines.  (search_services_summary)


Atomz

NB: Atomz has not fully indexed my site pending a change to their system to address the Meta Robots (noindex) tag. It indexed 500 pages and stopped. See the discussion on the extended Search Page for more details. However, a sufficient quantity and diversity has been indexed to show the two different types of results pages.

The Fully Custom version shows what is possible with HTML editing. The Template Configured page was created only by completing forms.

 

Fully Custom Results Page

Search For:
 
Match:  Any word All words exact phrase
Sound-alike matching
Within: 
Show:   results   summaries
Sort by: 

 

Template Configured Results Page

Search For:
 
Match:  Any word All words exact phrase
Sound-alike matching
Within: 
Show:   results   summaries
Sort by: 

 


FreeFind

 


PicoSearch

PicoSearch

 


SearchButton

NB: This is still under development. The Simple Search seems to be working. I'm still experimenting with the Advanced Search and don't quite have it right yet, but they are helping me with it. See the discussion on the extended Search Page for more details. 

Simple Search

 
 
Click the little graphic "searchbutton" immediately above to search The Refrigerator Door using SearchButton.

 

Advanced Search

Click the little graphic "searchbutton" immediately above to search The Refrigerator Door using SearchButton.
Search Options
Search in title:
Date range: through
Query parser: Internet Boolean
Result Options
Show summaries:
Results per page: 5 10 20 50 100
Max docs retrieved: 200 500 1000
Sort on: Score Date

 


SiteMiner

 


Thunderstone

NB: Because of the total space limit, Thunderstone will not index my entire site. However, it is included to permit you to see how it works and how it can be configured.

Also, because Thunderstone uses the "values" of the two buttons, these cannot be changed as they have been for all other forms on this page.

Simple Search

Thunderstone
    

 

Advanced Search

Thunderstone
Search for this:
Without this:
    
 
RANKING FACTORS
Rank Factor Importance
Word ordering
Word proximity
Database Frequency
Document Frequency
Position in text
PROXIMITY
line
sentence
paragraph
page

 

WORD FORMS
exact match
Plural & possessives
Any word forms
 
SEARCH HELP
Using logic operators Regular expressions
Phrase Matching Quantity Searching
Wild Card Searches More Subjects...
Using the Thesaurus

 



whatUseek (IntraSearch)

Type your search words:

 


Overview

Before I discuss the specific features and options of each engine I am using, I need to present a few overviews.

Which engines

This review presents only free search engines that can be used to provide search services within a website. I am working with each one I can find. If you find one I don't know about, please email me.

Which Service Options

All of these free services also offer paid options. The paid options offer additional features not offered under the free options. At this time, this review does not include any of the paid options and does not explore the differences between the free option and the paid option.

Results Formatting

Free search services present their results in three basic formats:

  • Configuration  

  • Wrapper  

  • Complete Control

Results Configuration

Under the Configuration option, you can control general colors, layouts, fonts, and maybe include a logo of your site. You might be able to specify whether the links open in the same window or a new window. All the services offer this level of results formatting.

Results Wrapper

Under the Wrapper option, you get everything you got with Configuration, plus you can specify HTML to "wrap" around the results. This lets you construct a page that "looks like" your site and put the Configured results onto your page.

Note that sites offering a Results Wrapper do so as an option. You can always use the simpler Configuration.

Complete Results Control

Under the Complete Control option, you get the ability to completely control the presentation of results. This is typically done by specifying the HTML for the results page and including special tags to control the formatting of the results.

Note that sites offering Complete Control do so as an option. You can always use simpler solutions such as Configuration or Wrapper.
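To make the three levels concrete, here is a minimal Python sketch. It is not how any of these services actually works. The sample results, the function names and the `$results`, `$url`, `$title` and `$description` placeholders are all my own assumptions for illustration.

```python
from html import escape
from string import Template

# Hypothetical search hits; a real service would supply these.
results = [
    {"title": "Robots Exclusion", "url": "/robots.html", "description": "How to keep robots out."},
    {"title": "Search Services", "url": "/search.html", "description": "Free site search <reviews>."},
]

def configured_results(hits):
    """Configuration level: the service decides the markup of each entry."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(h["url"]), escape(h["title"]))
        for h in hits
    ]
    return "<ul>" + "".join(items) + "</ul>"

def wrapped_page(hits, wrapper):
    """Wrapper level: your HTML surrounds the service's configured results."""
    return wrapper.substitute(results=configured_results(hits))

def custom_page(hits, page, entry):
    """Complete Control level: you also supply the template for each entry."""
    body = "".join(
        entry.substitute({key: escape(value) for key, value in h.items()})
        for h in hits
    )
    return page.substitute(results=body)

# Assumed templates: $results marks where the results go.
wrapper = Template("<html><body><h1>My Site</h1>$results</body></html>")
entry = Template('<p><a href="$url">$title</a><br>$description</p>')

print(wrapped_page(results, wrapper))
print(custom_page(results, wrapper, entry))
```

The only difference between the last two calls is who writes the markup for each individual result: the service (Wrapper) or you (Complete Control).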

whatUseek offers a level of Configuration I have classified as Wrapper Plus. I could also have called it Complete Control Minus. It provides some scripting, but not complete control of the appearance of each result entry.

For the best example, so far, of Complete Control, see the Atomz discussion below.

Search & Display Options

In addition, the services take two different approaches to controlling the search and display options. These options include:

  • maximum number of results to return

  • number of results per page

  • sort order (score/relevancy or date)

  • show or hide the page description

  • show or hide the "context" (see below)

The two primary approaches to control of these options are:

  • Configuration 

  • User Control 

Search Options Configuration

Under the Configuration option, you can configure the options. But you can only configure them once. You can decide, for example, to show 10 results on a page, to show them in order of relevancy and to display the summaries. However, all results pages will adhere to these "rules" and the search user cannot change them.

User Control of Search Options

Under the User Control option, the search user can specify, either through keywords or through search box controls, how to conduct and show a particular search.
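The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `SearchOptions` fields, the default values and the `per_page` and `sort_by` form names are hypothetical, not taken from any of the services.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SearchOptions:
    max_results: int = 200
    per_page: int = 10
    sort_by: str = "score"        # "score" or "date"
    show_description: bool = True
    show_context: bool = False

# Configuration: the webmaster sets the options once, for every search.
SITE_DEFAULTS = SearchOptions(per_page=10, sort_by="score", show_description=True)

ALLOWED_PER_PAGE = {5, 10, 20, 50, 100}
ALLOWED_SORTS = {"score", "date"}

def options_for_request(form, user_control):
    """Return the options for one search.

    `form` holds the values submitted with the search box (all strings).
    Without User Control the submitted values are ignored.
    """
    if not user_control:
        return SITE_DEFAULTS
    changes = {}
    if form.get("per_page", "").isdigit() and int(form["per_page"]) in ALLOWED_PER_PAGE:
        changes["per_page"] = int(form["per_page"])
    if form.get("sort_by") in ALLOWED_SORTS:
        changes["sort_by"] = form["sort_by"]
    return replace(SITE_DEFAULTS, **changes)

form = {"per_page": "50", "sort_by": "date"}
print(options_for_request(form, user_control=False))
print(options_for_request(form, user_control=True))
```

Under Configuration the same submitted form has no effect. Under User Control it overrides the webmaster's defaults for that one search.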

Results Display

In displaying the results of a search there are several items that can be displayed:

  • Page Title

  • Page URL

  • Page Description

  • Results Context

  • Matching Score

  • Page Update Date

  • Page Size

Two of the most important are the Description and Context.

Description and Context

HTML pages can include a Meta Tag named Description. This tag is designed to provide a description of the page, particularly for search engines. Search engines can display this description in the search results to help describe the page. If there is no Description Meta Tag, the engine usually displays the beginning of the text on the page.

In addition to the description, Context can also be displayed. This information (also called Results Context) shows the search words that were found "in context". That is, it shows the portion of the page that caused the page to be selected, including the search words. The search words are often highlighted (e.g., bolded).
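Here is a rough Python sketch of both ideas, using only the standard library. The class and function names, the 60-character fallback length and the `<b>` highlighting are my own assumptions; real engines will differ in the details.

```python
import re
from html.parser import HTMLParser

class PageText(HTMLParser):
    """Collect the Description Meta Tag and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.text_parts = []
        self._skip = 0  # depth inside <script>, <style> or <title>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag in ("script", "style", "title"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)

    @property
    def text(self):
        return " ".join(" ".join(self.text_parts).split())

def description_for(html, length=60):
    """Use the Description Meta Tag, else the beginning of the page text."""
    parser = PageText()
    parser.feed(html)
    return parser.description or parser.text[:length]

def context_for(text, word, width=30):
    """Return the text around the first match, with the search word bolded."""
    match = re.search(re.escape(word), text, re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    return text[start:match.start()] + "<b>" + match.group(0) + "</b>" + text[match.end():end]

page = "<html><head><title>T</title></head><body><p>Spiders crawl the web.</p></body></html>"
print(description_for(page))
print(context_for("Spiders crawl the web.", "crawl", width=8))
```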

What To Index, What to Score and What To Search

Pages consist of many parts. For example, there are:

  • Body of text

  • Title

  • Description

  • Keywords

  • Alt descriptions for graphics

  • URL

The questions for an engine include:

  • Which of these should be indexed?

  • Which should "count" more?

  • Which should be searched?

Some engines let the webmaster control which parts will be indexed and how to score words in different parts. (For example, a word in the title might "count" more than a word in the body of text.)

And some engines let the user who is searching, search a particular "part". (For example, the user could search only titles or only URLs for a particular word.)
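A small Python sketch shows how weighting and searching by "part" might fit together. The weights and the sample pages are invented for illustration; I do not know the actual scoring rules of any of these engines.

```python
# Hypothetical weights: a word in the title "counts" more than one in the body.
FIELD_WEIGHTS = {"title": 5, "description": 3, "keywords": 2, "body": 1}

def score(page, word, fields=None):
    """Sum weighted occurrences of `word` over the chosen parts of a page.

    `fields` restricts the search to particular parts (for example only titles).
    """
    word = word.lower()
    total = 0
    for field in (fields or FIELD_WEIGHTS):
        occurrences = page.get(field, "").lower().split().count(word)
        total += FIELD_WEIGHTS[field] * occurrences
    return total

pages = [
    {"url": "/a.html", "title": "Robots", "body": "robots and spiders"},
    {"url": "/b.html", "title": "Spiders", "body": "robots robots robots"},
]

for page in pages:
    print(page["url"], score(page, "robots"), score(page, "robots", fields=["title"]))
```

With these made-up weights, one title match outranks three body matches, and a title-only search ignores the body entirely.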

Language Support

Searching a site is more than just doing an old-fashioned text match. Today, it involves understanding language subtleties. If you search for "hire" you'd also like to find "hiring". If you search for "tree" you'd also like to find "trees". The better an engine tries to do this, the more it needs to know about the language. If your site is written in a language other than English, the engine's support for that language might be important.
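The "hire"/"hiring" and "tree"/"trees" cases can be illustrated with a deliberately crude suffix stripper in Python. This is a toy under my own assumptions, not a real stemming algorithm and not what any of these services uses. It knows four English endings and nothing else, which is exactly why real language support takes more work.

```python
SUFFIXES = ("ing", "es", "s", "e")

def stem(word):
    """Strip one common English ending. A crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(query, text):
    """True if any word of `text` shares a stem with `query`."""
    target = stem(query)
    return any(stem(token.strip(".,;:!?\"'")) == target for token in text.split())

print(matches("hire", "We are hiring now."))
print(matches("tree", "Many trees grow here."))
```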

Page Counting Problems

One of the problems with the free engines is how they handle page limits. My site includes hundreds of links to pages that should not be indexed. These pages include the "noindex" meta tag. The site also includes tens of thousands of offsite links. So when an engine says it will handle up to 500 or 1000 or 2000 or 5000 pages, the question is "How does it count?" Does it count the pages it doesn't index?
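The following Python sketch shows why the counting rule matters. The crawl list and the function are hypothetical. The point is only that the same limit yields very different indexes depending on whether excluded pages use it up.

```python
def pages_indexed(pages, limit, count_excluded):
    """Return the URLs indexed before the page limit is reached.

    `pages` is a crawl-ordered list of (url, noindex) pairs.
    `count_excluded` selects whether noindex pages use up the limit.
    """
    indexed = []
    counted = 0
    for url, noindex in pages:
        if counted >= limit:
            break
        if noindex:
            if count_excluded:
                counted += 1
            continue
        indexed.append(url)
        counted += 1
    return indexed

crawl = [("/a", False), ("/b", True), ("/c", True), ("/d", False), ("/e", False)]
print(pages_indexed(crawl, limit=3, count_excluded=True))
print(pages_indexed(crawl, limit=3, count_excluded=False))
```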

Searchbutton might (and I emphasize might) be counting some of the pages that they do not (or should not) index. We are still investigating.

Atomz did count them when I first encountered them, but they fixed it. See the discussion below for more info on how they responded quickly and affirmatively to this issue. Kudos to them.

Robots.txt, Noindex and Partial Noindex

There are two "standard" ways on the web to tell a robot (also called a spider or crawler) not to index your page. These are both discussed on the Robots Exclusion Page.

The first is the robots.txt file. It can be used to specify directories that should not be indexed. It can also be used to apply these non-indexing requests to specific robots.
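As an illustration, Python's standard library includes a robots.txt parser, `urllib.robotparser`. The sample file below is made up: the `/private/` directory and the robot names `ExampleBot` and `AnyBot` are assumptions for the example only.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt; the directory and robot names are made up.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Any robot may fetch ordinary pages but not the excluded directory.
print(parser.can_fetch("AnyBot", "http://www.mysite.com/index.html"))
print(parser.can_fetch("AnyBot", "http://www.mysite.com/private/notes.html"))
# One specific robot is excluded from the whole site.
print(parser.can_fetch("ExampleBot", "http://www.mysite.com/index.html"))
```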

The second is the Robots Meta Tag. This appears in the head of a particular page that should not be indexed. It contains the noindex option. It may also contain the nofollow option. These tell the robot not to index the page and/or not to follow any links on the page.
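Here is a minimal Python sketch of how a robot might read this tag. It is my own illustration, not any service's code. Note that it reads the whole page before deciding, so the position of the tag within the head does not matter.

```python
from html.parser import HTMLParser

class RobotsMeta(HTMLParser):
    """Collect the directives of every Robots Meta Tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def robots_directives(html):
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives

def may_index(html):
    return "noindex" not in robots_directives(html)

def may_follow(html):
    return "nofollow" not in robots_directives(html)

page = ('<html><head><title>Private</title>'
        '<meta name="ROBOTS" content="NOINDEX, nofollow"></head><body>x</body></html>')
print(may_index(page), may_follow(page))
```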

In addition, some indexing services have added proprietary extensions to permit a webmaster to specify a portion of a page that should not be indexed. I call this Partial Noindex. Although it doesn't appear to be a formal standard, a common approach is to delimit the section with <noindex></noindex>. If I just indicate "Yes" (they support Partial Noindex) that means they use this "standard". A few have invented their own protocols. If they take a different approach, I'll tell you.

(I cannot find many links for the use of <noindex></noindex>. One I did find is at the site of the ht://Dig Open Source Search engine on their FAQ Page.)
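A Partial Noindex of the <noindex></noindex> kind could be honored with something as simple as the following Python sketch. The regular expression is my own assumption about a reasonable approach. It does not handle nested sections, and the proprietary protocols some services use would need different delimiters.

```python
import re

# Matches one <noindex>...</noindex> section, in any letter case.
NOINDEX_SECTION = re.compile(r"<noindex>.*?</noindex>", re.IGNORECASE | re.DOTALL)

def indexable_html(html):
    """Drop every <noindex>...</noindex> section before indexing."""
    return NOINDEX_SECTION.sub(" ", html)

page = "<p>Welcome</p><noindex><p>Menu: Home | Search</p></noindex><p>Article</p>"
print(indexable_html(page))
```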

Branding

These services are free. There are only a couple of ways for a site to offer free services.

  1. Offer them free and pay for them with advertisements. They might also offer a paid version without advertisements.
     

  2. Offer them free and use the free services as a way of branding: creating a market position to enable them to sell "bigger" services (e.g., with higher page limits).

I indicate in my summary and in the reviews which branding approach the service has chosen: Ads or Logos.

One insidious aspect of combining banner advertising with hosted search engines is that the ad company may use the search keywords to target advertising. This may lead to privacy issues, because ad banner companies can associate the searched words with cookie-based identification of the site's visitors. Webmasters need to carefully evaluate the ad policies of such services.

What I've Learned

This has been an exciting learning experience for me. I know these things. I preach them in my consulting and professional speaking, but this experience has proved that they are true.

  • You and I can make a difference. It happens all the time to me but I'm amazed every time. A comment or a suggestion becomes a change. The next time you think someone could do it better, tell them. And tell them how.
     

  • Our customers have good ideas. The firms that listen to their customers will have people who will tell them what they need to know and will tell them for free. The next time your customer calls with a complaint, help them make it an idea. Then recognize that the customer has given you a gift a consultant would have charged you for.
     

  • Customer service isn't dead. The firms on this page are examples. They have listened, they have worked to understand and they have acted. If you aren't doing it, you are losing.

The engines

I am using, or I am trying to use, several engines. These include:

  • Atomz

  • FreeFind

  • PicoSearch

  • Searchbutton

  • SiteMiner

  • Thunderstone

  • whatUseek

I have their complete reviews in alphabetical order below. I also have a table that summarizes the results.

Atomz

Results Formatting: Complete Control (the most complete control of any service tested so far)
Search Options: User Control (provided both through search syntax and also through buttons on the search form)
Page Limit: 500
robots.txt: Yes
Robots Meta (noindex): Yes
Partial Noindex: Yes
Ads or Logos: Logo
Results Context: Yes
Languages: English + 14 others
File Formats: html, txt, pdf, flash

So far, this is the most customizable service I have encountered. 

It provides both Complete Control of results formatting and full User Control of search options.

In addition to supporting one of my "standard" pages as the "shell", the details of the results are completely customizable using their scripting language. And, the search options are the most complete and flexible of any service I've encountered.

For example, I am able to:

  • Include my red triangle graphic before each result, and hyperlink it to the referenced page,
     

  • Use one font/color for the hyperlinked name of the page and another for the description
     

  • Include both the description and the context
     

  • At my option, show the score, the URL, the date changed and the size with complete ability to format the appearance.
     

  • Have absolute, complete control over placement.

In other words, they provide completely customizable results.

As for search options, they support:

  • Searching for any word, all words, phrases
     

  • Searching within specific "parts" of the pages (e.g., titles, description)
     

  • Sound-alike matching
     

  • Showing/Not Showing summary information
     

  • Controlling number of results displayed
     

  • Sorting by relevancy or by date

The most severe limitation of their free offering is that it will only index 500 pages. And for most sites, this is not a severe limitation at all. Consider all the other features.

When I began experimenting with them I had two issues with how they processed the Robots Meta Tag (noindex).

  1. They began indexing the page and "stopped" when they encountered the tag. This could cause the first part of the page to be indexed (e.g., the Title) if the Robots Meta Tag (noindex) was not the first tag in the Head.
     

  2. They counted such pages against the limit.

But the day after I wrote them they wrote me back and said:

The good news is that after receiving your email, we have discussed this topic in length and have decided to change how our search robot works in this area. Along with the page counting fix I mentioned above, we are also changing how our search robot processes the robots meta tag. We have decided that if our search robot encounters a robots noindex tag in the header, in any position, we will completely ignore that page. This fix is being reviewed by our QA team and should also be available in 1-2 days.

In other words, they listened and changed their routine to address both issues. Kudos to them. (www.atomz.com)

FreeFind

Results Formatting: Unknown -- Still Investigating
Search Options: Unknown -- Still Investigating
Page Limit: 32MB default (can be increased by request and justification)
robots.txt: Yes
Robots Meta (noindex): No
Partial Noindex: Yes (uses special protocol; see below)
Ads or Logos: Ads
Results Context: Unknown -- Still Investigating
Languages: Unknown -- Still Investigating
File Formats: Unknown -- Still Investigating

I am working with FreeFind. My first attempt failed because:

  • They have a limit on the total size of the site they will index (32MB), and
     

  • They do not honor the Robots Meta (noindex) meta tag (www.freefind.com/faq.html#faq9).

Their refusal to honor the "noindex" tag is interesting because they provide their own proprietary substitute. If, instead of using the web standard "noindex", you are willing to use their "No Index" comment (www.freefind.com/faq.html#faq10) you will get the same result. In fact, they even have a proprietary extension (www.freefind.com/faq.html#faq11) that will inhibit indexing "part" of a page. 

They do honor the robots.txt file. 

I am working both to extend the size permitted for my site and to add a robots.txt file as a further test. 

They wrote back to indicate that they have increased the permitted size for my site and would respider once the increase became effective. They also said:

Your suggestion for honoring the noindex and nofollow metatags is a very good one - I'll add it to our "idea box".

After they increased the size allowed, they still attempted to index over 1000 pages and failed. I'm writing them again to work through this.

I'll keep you posted. (www.FreeFind.com)

PicoSearch

Results Formatting: Configuration (your logo, their logo, background color, font face, size and colors, results language [Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, Estonian, French, German, Hungarian, Icelandic, Indonesian, Italian, Norwegian, Polish, Portuguese, Slovenian, Spanish, Swedish, Turkish], link [same page or new page], show/hide description, show/hide context)
Search Options: Configuration (exclude directories, index page bodies?, index page titles?, index meta descriptions?, index meta keywords?, index image alts?, remove duplicate pages?)
Page Limit: 1500
robots.txt: Yes
Robots Meta (noindex): Yes
Partial Noindex: Yes -- Proprietary extension (<nosearchstart><nosearchend>)
Ads or Logos: Logo
Results Context: Yes
Languages: English + 20 others
File Formats: html, txt, mp3 tags, flash tags

This service, being one of the last I added, went very smoothly. Part is the simplicity of their offering. Part is that I got better at it. It doesn't offer (in the free option) as much control over the results display. But it will index many pages. 

My inquiries about options and services were answered promptly and completely.

Searchbutton

Results Formatting: Wrapper (results inserted into your template page, font face, size and colors, show/hide description, show/hide context)
Search Options: Configuration (User Control being tested)
Page Limit: 1000
robots.txt: Yes
Robots Meta (noindex): Yes
Partial Noindex: Yes
Ads or Logos: Ads
Results Context: No
Languages: Unknown -- Still Investigating
File Formats: Unknown -- Still Investigating
Other: Page Count Issues

It seemed to be taking forever to index my site. When I wrote to ask why it was taking so long, they promptly wrote back to say that they only index every 12 hours. (To be fair, their "Getting Started" section says they will advise me "Within one business day (often sooner)". I just hadn't read that section of their site. I skipped it (hey, I'm a male; I don't have to read directions) and I went straight to the signup. I have recommended that they add their schedule more prominently on their site and also in their initial welcome letter.)

I have not yet tested robots.txt compliance but take their word for it.

For some reason the initial index included fewer than the expected number of pages on my site. They are working closely with me to identify and resolve this issue. They have been very responsive and helpful so far.

On the plus side, their support has been excellent. Throughout the Easter weekend (when I started this crazy idea) I received prompt answers to rather picky questions. Although all the issues aren't yet resolved, I'm impressed by their support.

I am still working to configure my results page and will report back as soon as I've completed that.  (www.Searchbutton.com)

SiteMiner

Results Formatting: Configuration
Search Options: Configuration
Page Limit: Unknown -- Still Investigating
robots.txt: Unknown -- Still Investigating
Robots Meta (noindex): Unknown -- Still Investigating
Partial Noindex: Unknown -- Still Investigating
Ads or Logos: Ads
Results Context: Unknown -- Still Investigating
Languages: English + some
File Formats: Unknown -- Still Investigating

Part of MyComputer.com.

(www.SiteMiner.com)

Thunderstone

Results Formatting: Configuration
Search Options: User Control
Page Limit: 5,000 pages; 10,000,000 bytes total; 100,000 bytes per page
robots.txt: Yes
Robots Meta (noindex): No
Partial Noindex: No
Ads or Logos: Logo
Results Context: Yes
Languages: English Only
File Formats: html, txt

Thunderstone is an independent R&D company that has focused on information retrieval and document management problems for over 19 years. They claim that more Internet searches are conducted by their software on a daily basis than by any other available package.

They primarily sell their software. But they provide a limited version of their Webinator service as an offsite search engine. They also offer a free version of the Webinator for download.

The limits on their offsite service include:

  • Only top level sites are indexed.
    Yes: http://www.mysite.com/
    No: http://www.myisp.com/mysite/
    No: http://www.myisp.com/~username/
     
  • Your EMAIL address must be in the same domain as the site you are indexing
     
  • It will not index more than 5,000 pages (or 10,000,000 bytes) of content, and it will truncate individual pages larger than 100,000 bytes.
     
  • Indexes that are unused for 5 days will be deleted.
     
  • You must wait 8 hours between reindexing.
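The site and size limits above can be sketched in Python as follows. The numbers come from the list. Everything else is my own assumption, including the function names and the choice to stop at the first page that would exceed the total byte limit, since I do not know how Thunderstone actually enforces these limits.

```python
from urllib.parse import urlparse

MAX_PAGES = 5_000
MAX_TOTAL_BYTES = 10_000_000
MAX_PAGE_BYTES = 100_000

def is_top_level_site(url):
    """True for http://www.mysite.com/, False for sites under a path or ~user."""
    return urlparse(url).path in ("", "/")

def apply_limits(pages):
    """Return the (url, content) pairs that fit within the limits.

    `pages` is an iterable of (url, bytes) pairs in crawl order.
    Assumption: indexing stops at the first page that would exceed a limit.
    """
    kept = []
    total = 0
    for url, content in pages:
        content = content[:MAX_PAGE_BYTES]  # truncate oversized pages
        if len(kept) >= MAX_PAGES or total + len(content) > MAX_TOTAL_BYTES:
            break
        kept.append((url, content))
        total += len(content)
    return kept

for site in ("http://www.mysite.com/", "http://www.myisp.com/mysite/", "http://www.myisp.com/~username/"):
    print(site, is_top_level_site(site))
```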

Results customization permits changes to:

  • The page "before" the results (including page title, and other Head information)
     
  • The page "after" the results

The 8-hour limit on reindexing is not a major issue, but it does make my "experimentation" a bit more tedious.

Their maintenance process is also interesting. You sign in with your email address and your website URL. Then they email you the password to continue. And you need to get a new password for each access. It is the only site I've ever encountered using this technique.

(www.Thunderstone.com)

whatUseek (IntraSearch)

Results Formatting: Wrapper Plus (more than a Wrapper, includes some scripting but not complete control; see below)
Search Options: Configuration
Page Limit: 1000
robots.txt: Unknown -- Still Investigating
Robots Meta (noindex): Yes
Partial Noindex: Unknown -- Still Investigating
Ads or Logos: Ads
Results Context: No
Languages: Unknown -- Still Investigating
File Formats: Unknown -- Still Investigating

This formatting is more than just a Wrapper. It provides some scripting. However, it is not Complete Control because it does not provide full scripting. I do not have control over the exact display of each result.

It also does not seem to offer any user control of search options. (intra.whatUseek.com)

This page created:
before
Wed, 16.Aug.2000

Last updated:
16:17, Sat, 10.May.2014
