Practical Ecommerce

Critique Part Four: Site Search

The Problem: The results aren’t always relevant.

The Fix: A search for “snowboard” returns wax as the top result — but no snowboard until the fourth page of results, even though the site has an entire section dedicated to snowboards. Similarly irrelevant results came back for “bindings” and “longboard.” Relevance is the most critical feature of site search. Daddies Board Shop could significantly improve the relevance of search results by adding weight to the keywords that appear in titles and categories.
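One way to picture the suggested fix is a scorer that counts a keyword match in the title or category for more than a match in the description. The sketch below is illustrative only: the field weights, the two catalog entries and the function names are assumptions, not anything taken from the Daddies Board Shop site or its search software.

```python
# Hypothetical weights: a hit in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "category": 2.0, "description": 1.0}
# Flat weights reproduce the unweighted behavior for comparison.
FLAT_WEIGHTS = {"title": 1.0, "category": 1.0, "description": 1.0}

# Made-up catalog entries, for illustration only.
PRODUCTS = [
    {
        "title": "All-Temperature Wax",
        "category": "Accessories",
        "description": "Rub-on wax that keeps a snowboard fast. One bar treats "
                       "a snowboard all season, whatever snowboard you ride.",
    },
    {
        "title": "Freestyle Snowboard 155",
        "category": "Snowboards",
        "description": "Twin-tip deck for park riding.",
    },
]


def score(product, query, weights):
    """Sum the weighted number of words in each field that start with the query."""
    term = query.lower()
    total = 0.0
    for field, weight in weights.items():
        words = (w.strip(".,;:!?") for w in product[field].lower().split())
        # Prefix match, so "snowboard" also matches "snowboards".
        total += weight * sum(1 for w in words if w.startswith(term))
    return total


def rank(products, query, weights):
    """Return the products ordered from highest to lowest score."""
    return sorted(products, key=lambda p: score(p, query, weights), reverse=True)


if __name__ == "__main__":
    for label, weights in (("flat", FLAT_WEIGHTS), ("weighted", FIELD_WEIGHTS)):
        print(label, [p["title"] for p in rank(PRODUCTS, "snowboard", weights)])
```

With flat weights the wax wins because its description repeats the word most often; with title and category weighted more heavily, the actual snowboard moves to the top.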

The Problem: The search yields results with a jumbled appearance.

The Fix: The results are ordered in a 2×10 grid format and the “add-to-cart” buttons don’t line up, giving the search a fairly jumbled appearance. It would be worthwhile to experiment with a list layout that presented longer descriptions beside each product with search terms in bold. This additional information can help users find what they want. It’s also nice to offer users the option to switch between grid and list views. The click-to-enlarge links bring up the product page, which does display a larger image; still, I doubt this is what the user would expect. Overall, if the images were smaller and arranged more efficiently, more products would appear above the fold.

Other important issues to address: The main features on this search page are the drop-down boxes at the top that allow the visitor to control the number of results shown as well as their order. Unfortunately, when the “sort by rating” option is selected, the ratings aren’t visible.
One standard feature that is missing is the ability to refine the results by category, brand, price range, truck size, wheel diameter, etc. Further, the search option doesn’t run a spell-checker. The “no results” page contains a link to an advanced search, but showing some popular search terms or additional links to major categories or best sellers would offer significant benefits for the customer. The advanced search feature allows sorting by part number, but again it doesn’t show the part number on the search results page. We also couldn’t see the part numbers on the product details page and they didn’t seem to be searchable.
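As a rough illustration of the missing spell-check, a "did you mean" suggestion can be sketched with Python's standard `difflib` module. The vocabulary list, the `suggest` name and the 0.8 cutoff are assumptions chosen for the example; a real site would more likely build the vocabulary from its own product titles and categories.

```python
import difflib

# Hypothetical vocabulary; in practice it might come from product titles and categories.
CATALOG_TERMS = ["snowboard", "longboard", "bindings", "trucks", "wheels", "wax"]


def suggest(query, vocabulary=CATALOG_TERMS, cutoff=0.8):
    """Return the closest known term for a misspelled query, or None if nothing is close."""
    matches = difflib.get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for typed in ("snowbord", "bindngs", "qqqq"):
        print(typed, "->", suggest(typed))
```

A suggestion like this could be shown on the "no results" page next to popular search terms and category links.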

General advice: One of the best starting points from which to improve your search is to look at your search logs. Focus on the most popular search terms. Hand-tailor the pages for those keywords to ensure relevant results and offers on those pages. This is something that should be done continually, because search terms are never-ending in their variety and change constantly.
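Finding the most popular search terms can be as simple as counting normalized queries. The sketch below assumes the log has already been reduced to a list of raw query strings; the sample entries and the `top_terms` name are made up for illustration.

```python
from collections import Counter

# Hypothetical raw queries as they might appear in a search log.
SEARCH_LOG = [
    "snowboard", "Snowboard ", "bindings", "longboard",
    "snowboard", "wax", "bindings",
]


def top_terms(log, n=3):
    """Count queries after lowercasing and collapsing whitespace; return the n most common."""
    normalized = (" ".join(q.lower().split()) for q in log)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(SEARCH_LOG):
        print(f"{count:3d}  {term}")
```

Rerunning a count like this on a schedule gives a current list of the terms whose result pages are worth hand-tailoring first.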

On a positive note: The layout of the search results on the Daddies Board Shop site is better than a lot of searches we see — but it could still benefit from significant improvement. The search page displays clear titles showing the search criteria. It also shows images of the products, prices and, where appropriate, “add-to-cart” buttons. Even on a small screen, there will be results above the fold.

The Critique Project

Melanie Loveland and her son Dan built a business together around a mutual passion — snowboarding. What started as a small, brick-and-mortar store in Portland, Ore., has evolved into a full-fledged multichannel merchant. It was a process the owners didn’t foresee when the business started in 1995.

Their business has seen dynamic change in the seven years since it launched a website. Daddies Board Shop now generates 80 percent of its sales through online channels and only 20 percent at the Portland store.

In February, when Practical eCommerce offered a once-in-a-lifetime complimentary website critique to one lucky recipient, Dan (along with many other website owners) petitioned for the critique — and his site was selected.

As part of the critique, five firms took an intensive look at Daddiesboardshop.com to analyze its problems, the opportunities for search engine optimization, its general Internet presence, site search, pay-per-click advertising and customer experience/usability. The firms were:

  • Search Engine Optimization: Netconcepts, Stephan Spencer, President.
  • General Internet Presence: Red Door Interactive, Reid Carr, President.
  • Site Search: SLI Systems, Shaun Ryan, CEO.
  • Pay-per-click Advertising: Key Relevance, Christine Churchill, President.
  • Customer Experience/Usability: Optimal Usability, Richard Kerr, Usability Consultant.

Comments ( 3 )

  1. Legacy User, June 6, 2007

    Google has integrated Latent Semantic Indexing (LSI) into its search engine only over the last two or three years or so, even though LSI has been published and available in the literature since the early 1990s.

    The adoption of LSI by Google is mentioned in this article:

    http://www.seobook.com/archives/000657.shtml

    I don't know how Google combines PageRank and LSI, but they must be computed separately and their indices then somehow merged into one. The input to PageRank is a matrix of links (which documents link to which), while the input to LSI is a term-by-document matrix, so the two are clearly computed separately and then combined.

    There are emerging algorithms, based on multilinear algebra, that address this problem. I haven't yet seen in the literature whether any commercial application has been developed. Even Google doesn't seem to know how yet, but I am sure they are working on it. The underlying mathematics, which mathematicians and physicists call "tensor calculus", has been around for over a hundred years, but its adoption in data analysis is recent. Tensor calculus is the type of mathematics that Albert Einstein used in formulating his General Theory of Relativity. Multilinear (tensor) methods combine the link-based analysis of PageRank and the content-based analysis of LSI into one algorithm that computes both simultaneously, rather than computing them separately, as Google currently does, and then somehow combining the results into one index for retrieval.

    The top researcher in this field is Dr. Tammy Kolda (and her team) of Sandia Corporation.

    I have seen probably around six papers on the application of multilinear algebra to search engines. The following are presentations by Dr. Tammy Kolda and colleagues from Sandia, from a conference on the subject last year (2006):

    "Multilinear algebra for analyzing data with multiple linkages"
    http://www.stanford.edu/group/mmds/slides/kolda-mmds.pdf

    "Analysis of Latent Relationships in Semantic Graphs using DEDICOM"
    http://www.stanford.edu/group/mmds/slides/bader-mmds.pdf

    The dynamics of search engines are going to change again in the near future, when Google adopts a tensor-based (multilinear) search engine in which PageRank and LSI are combined into one algorithm.

    — *Falafulu Fisi*

  2. Legacy User, June 6, 2007

    I would highly recommend these algorithms to developers who want to improve their content search engine. Google's PageRank algorithm is link-based, which is different from content-based search such as the following:

    #1) "Using Linear Algebra for Intelligent Information Retrieval"
    http://lsirwww.epfl.ch/courses/dis/2003ws/papers/ut-cs-94-270.pdf

    #2) "Probabilistic Latent Semantic Indexing"
    http://www.cs.brown.edu/people/th/papers/Hofmann-SIGIR99.pdf

    #3) "Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization"
    http://meyer.math.ncsu.edu/Meyer/PS_Files/NMFInitAlgConv.pdf

    #4) "Interactive Search Grouping – Search result grouping using Independent Component Analysis"
    http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/825/pdf/imm825.pdf

    The search algorithms described in the papers above are state-of-the-art content search algorithms. They differ from traditional keyword search in that they can retrieve documents that share a similar concept. This type of search is called Latent Semantic Indexing (LSI).

    eg:

    Doc #1 : "All the tyres of my car must be replaced"

    Doc #2 : "The truck veered to the left when the front tyres went completely flat"

    If someone searches for the term "car" using LSI, then both Doc #1 and Doc #2 are retrieved. Hang on, Doc #2 does not contain the term "car", does it? The two documents belong to the same concept: vehicles. How did the LSI search algorithm find that out? Note that the term "tyres" co-occurs in both documents. So LSI can reveal the hidden (latent) meaning of documents. With traditional keyword search, a search for "car" returns only Doc #1, since Doc #2 does not contain the term "car".

    So, if you are looking for a commercial content search engine, ask whether the vendor implements LSI algorithms. A good e-commerce site needs a robust search engine based on LSI rather than one based on Boolean (true-or-false) or keyword matching, which are inefficient. LSI will always return something to the user, while a keyword or Boolean search engine often returns "Not Found". Some users might then go away and never come back to the site, because they think that what they have in mind is not available in that online store.

    Fact: Most users sometimes have only a vague idea of the terms to type into the search box of an e-commerce website. This is where LSI has an advantage over the traditional Boolean and keyword search that most site search still uses today.

    — *Falafulu Fisi*
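The tyres example in the comment above can be reproduced with a very small latent-space sketch. This is not a production LSI implementation: it keeps only a single latent dimension, finds it with plain power iteration instead of a full singular value decomposition, and adds a third, unrelated document (`doc3`, invented here) so there is something the query should not match. The stopword list and all function names are likewise assumptions made for the example.

```python
import math

# Doc 1 and Doc 2 come from the comment above; doc3 is a hypothetical
# unrelated document added so there is something LSI should NOT match.
DOCS = {
    "doc1": "all the tyres of my car must be replaced",
    "doc2": "the truck veered to the left when the front tyres went completely flat",
    "doc3": "fresh bread and cheese for lunch",
}
STOPWORDS = {"all", "the", "of", "my", "must", "be", "to", "when", "and", "for"}


def tokens(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def unit(vec):
    """Scale a vector to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


VOCAB = sorted({w for text in DOCS.values() for w in tokens(text)})


def term_vector(text):
    """Length-normalized term-count vector over VOCAB."""
    words = tokens(text)
    return unit([float(words.count(term)) for term in VOCAB])


# Columns of the term-by-document matrix, one per document.
DOC_VECTORS = {name: term_vector(text) for name, text in DOCS.items()}


def dominant_concept(doc_vectors, iterations=300):
    """Power iteration for the top left singular vector of the term-by-document matrix."""
    columns = list(doc_vectors.values())
    u = unit([1.0] * len(VOCAB))
    for _ in range(iterations):
        weights = [dot(col, u) for col in columns]            # A^T u
        u = unit([sum(w * col[i] for w, col in zip(weights, columns))
                  for i in range(len(VOCAB))])                # A (A^T u)
    return u


CONCEPT = dominant_concept(DOC_VECTORS)


def lsi_scores(query):
    """Score each document by the product of query and document latent coordinates."""
    q = dot(term_vector(query), CONCEPT)
    return {name: q * dot(vec, CONCEPT) for name, vec in DOC_VECTORS.items()}


def keyword_hits(query):
    """Plain keyword search: documents that literally contain the query term."""
    return [name for name, text in DOCS.items() if query in tokens(text)]


if __name__ == "__main__":
    print("keyword:", keyword_hits("car"))
    print("latent: ", lsi_scores("car"))
```

In this toy setup the keyword search for "car" finds only `doc1`, while the latent score for `doc2` is well above zero because it shares "tyres" with `doc1`. The unrelated `doc3` scores essentially zero.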

  3. Legacy User, June 8, 2007

    I would also recommend that anyone seeking to buy an off-the-shelf content search engine (site search engine) for an e-commerce website ask vendors for the "Precision" and "Recall" figures of their search engine. They are given as percentages. The higher these numbers, the better the retrieval capability of the search engine.

    "Precision" is defined in information retrieval as: "The proportion of retrieved and relevant documents to all the documents retrieved".

    "Recall" is defined in information retrieval as: "The proportion of relevant documents that are retrieved, out of all relevant documents available".

    More on how to measure the retrieval effectiveness of a search engine can be found at the link below:

    "Information Retrieval"
    http://en.wikipedia.org/wiki/Information_retrieval

    It is important that you know these numbers from the vendor, so you have some understanding of the engine's ability to retrieve relevant information for a query. Say "vendor A" has Recall = 85% and "vendor B" has Recall = 80% for their search engines; then, all else being equal, I will obviously buy the product from "vendor A". Rating search engine algorithms is similar to the rating of finance companies by agencies such as Standard & Poor's, whose symbols run from the highest rating down to the lowest: AAA, AA, A, BBB, BB, B, CCC, CC, C, D. Investors will go for the provider with the higher rating.

    It is important to request the percentage for the search engine's "Recall" because even LSI algorithms come in different variants, each with a different "Recall" capability.

    The most important thing in selecting a good search engine is how well it retrieves information relevant to the query. Most other features of commercial search engines, such as "word-stemming" and "synonym tagging", are almost standard among vendors. The differentiation in search engine capability lies mainly in "Precision" and "Recall".

    So, don't be fooled by fancy words from a vendor's marketing department about what their product can do. Just ask straight out for the search engine's "Precision" and "Recall" figures. If the vendor gives you some figures, you can test the search engine independently using your own set of short documents (say 50 or more) for which you know how many are relevant to a pre-defined query phrase or keyword. Get the search engine to index your test set, then search the indexed documents with your pre-defined query. You can then measure "Precision" and "Recall" using the definitions given above. There are many document test sets available on the internet, used by researchers in information retrieval, machine learning, data mining and related fields, such as those at NIST or Berkeley.

    — *Falafulu Fisi*
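The precision and recall definitions quoted in the last comment translate directly into a few lines of code, which is also how the suggested do-it-yourself test could be scored. The document IDs below are invented, and `precision_recall` is simply a name chosen for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the search engine returned.
    relevant:  IDs of the documents known in advance to be relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical test: the engine returned four documents, two of them relevant,
    # out of five relevant documents in the whole test set.
    returned = ["d1", "d2", "d3", "d4"]
    known_relevant = ["d1", "d2", "d5", "d6", "d7"]
    p, r = precision_recall(returned, known_relevant)
    print(f"precision = {p:.0%}, recall = {r:.0%}")
```

Averaging these two figures over a set of representative queries gives numbers that can be compared with whatever a vendor quotes.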