Posts

Showing posts from October, 2005

Dashes, dots and an interesting observation

With our Mini, I have been toying with some ways to search for substrings. What I'm finding is that dashes, dots, etc. don't allow for a substring match. However, there is one interesting observation that I have to note here.

The first special character is dropped.

That's right, be it a dash or a dot, the first one is dropped and ignored. Even when searching, the same behaviour occurs. What does this mean? How can I use it to my advantage?

Well, let's take a look at the UB181 example. Suppose I make two variations of that SKU in our "Additional Keywords" field (only shown to our Mini) by adding a dash where a letter meets a number, which is a simple regular expression. The variations ub-181dz and ub181-dz will not match a search for 181dz or ub181, because the single dash in each is the one that gets dropped. Instead, I'd have to use ub-181-dz to match ub181 and u-b-181dz to match 181dz. That seems like an odd way to have to handle this, but again, simple regular expressions can make it happen.
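The dash variants described above can be generated mechanically. What follows is a minimal Python sketch, not the code running on the site. The names `boundary_dashes` and `simulated_tokens` are hypothetical, and `simulated_tokens` only models the behaviour observed in this post (the first dash or dot is dropped, the remaining ones split the keyword). That model is an assumption about the Mini, not documented behaviour.

```python
import re

# Zero-width positions where a letter meets a digit or a digit meets a letter.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", re.IGNORECASE)
# The "special characters" discussed in the post: dashes and dots.
_SPECIAL = re.compile(r"[-.]")


def boundary_dashes(sku):
    """Insert a dash at every letter/digit boundary in the SKU."""
    return _BOUNDARY.sub("-", sku)


def simulated_tokens(keyword):
    """Assumed model of the observation: drop the first special
    character, then split on whatever special characters remain."""
    without_first = _SPECIAL.sub("", keyword, count=1)
    return [token for token in _SPECIAL.split(without_first) if token]


print(boundary_dashes("ub181dz"))     # ub-181-dz
print(simulated_tokens("ub-181dz"))   # ['ub181dz']
print(simulated_tokens("ub-181-dz"))  # ['ub181', 'dz']
print(simulated_tokens("u-b-181dz"))  # ['ub', '181dz']
```

Under that assumed model, a keyword with a single dash collapses back into the full SKU, which is consistent with ub-181dz and ub181-dz failing to match the shorter searches.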

Substring limitations

So I have some major beefs with the Mini and how Google's algo handles stemming / substring matches. This just isn't working right.

For example, one of our products is the "UB181DZ". People commonly search for the "UB181", since this is the base model. Well, the default results were 0 matches. That's not good, especially when we have the UB181DZ in stock, along with the DZK and the DZK-2 (we "invented" these kits by customer demand).

So, a simple fix is to pre-fetch substring matches of any SKUs. I'm doing a "SELECT sku FROM products WHERE sku LIKE '%$searchstring%';" and sending the search as "ub181 OR ub181dz OR ub181dzk OR ub181dzk-2" for now, but that's just not a good long-term solution. It also doesn't work when they search for multiple words. I'd have to make some funky syntax to get that working, and I'm feeling lazy right now.
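The pre-fetch step above could look something like the following. This is a rough Python sketch rather than the site's actual code: it uses an in-memory SQLite table as a stand-in for the real products table, the function names and the XX100 row are hypothetical, and the SKUs are illustrative. Unlike the inline query quoted above, it passes the search string as a bound parameter and escapes LIKE wildcards, so user input is matched literally. Like the original, it only handles a single search word.

```python
import sqlite3


def expand_sku_query(conn, term):
    """Return 'term OR sku1 OR sku2 ...' for every SKU containing term."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT sku FROM products WHERE sku LIKE ? ESCAPE '\\' ORDER BY sku",
        ("%" + escaped + "%",),
    ).fetchall()
    terms = [term.lower()]
    for (sku,) in rows:
        if sku.lower() not in terms:  # skip the term itself and duplicates
            terms.append(sku.lower())
    return " OR ".join(terms)


def demo_connection():
    """Build a throwaway table with illustrative SKUs (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO products VALUES (?)",
        [("UB181DZ",), ("UB181DZK",), ("UB181DZK-2",), ("XX100",)],
    )
    return conn


print(expand_sku_query(demo_connection(), "ub181"))
# ub181 OR ub181dz OR ub181dzk OR ub181dzk-2
```

The expanded string is what would then be sent to the search box in place of the visitor's original query.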

As an experiment, I thought we could try dashes. The first attem…

Beginning insights from the Google Mini

We bought Google. OK, so it was just a Mini, but it's cleared up my understanding of Google a bit.

Upon setting it up, nothing really great came to mind. However, after doing some tweaking (it's powering the search on toolbarn.com right now), it has become clear that the more I learn about this box and its capabilities, the more I understand Google.

For starters, I ended up having to cloak some pages to our Mini to get our results to come out right. A search for makita drills gave me results for milwaukee drills as well, because of our breadcrumb navigation and the cross-linking. Every page on our site was returned for power tools because it's in the main navigation. Some searches return poor results, such as makita 5000, which several people have searched for. I have a temp solution in place for that.

So, after playing with it and then sitting back to think about how it works / serves results, I figured something out that may end up being priceless.

Searches done on our sit…

MP3 File Manipulation

So I started on a site that is going to do some manipulation of MP3 files dynamically, taking a preview out of the song that should be sort of representative of the song. I toyed with a few different ways of pulling out the section I wanted, looking for large dynamic changes, widest frequency range, flattest frequency pattern, breaks in the song, etc. I also tried a few different methods of making the snip. Here's what I found.

1) Regardless of how cool it is to be able to detect changes in dynamics, frequency ranges, patterns, breaks, or anything else, it just doesn't mean that the section will be representative.

2) MP3 players are fairly bulletproof and don't mind some abuse. There isn't any need to split properly across frames within the file: every MP3 player I've tested in Win and Linux handles improperly split files just fine.

3) Bitrates don't translate perfectly. Just because you know how many bits per second a file is doesn't mean you can clip …