Mahout – Taste :: Part Three – Estimators

by frank, July 8, 2010

In Taste, estimators are the bridge between the generic item- or user-based recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process; however, they are also used for evaluating recommenders. Additionally, the ‘recommended because’ feature is powered by an estimator. This blog post covers some Taste internals and uses a few code samples to show how estimators are used within Taste.

Estimators for recommendations

Let’s start with the main usage of estimators: providing recommendations. Suppose we create a GenericItemBasedRecommender and provide it with a DataModel and one of Taste’s ItemSimilarity implementations.

To fetch the items most similar to a given item, we call GenericItemBasedRecommender.mostSimilarItems(long itemID, int howMany), as shown in the snippet below:

  @Override
  public List<RecommendedItem> mostSimilarItems(long itemID, int howMany) throws TasteException {
    return mostSimilarItems(itemID, howMany, null);
  }

  @Override
  public List<RecommendedItem> mostSimilarItems(long itemID, int howMany,
                                                Rescorer<LongPair> rescorer) throws TasteException {
    TopItems.Estimator<Long> estimator = new MostSimilarEstimator(itemID, similarity, rescorer);
    return doMostSimilarItems(new long[] {itemID}, howMany, estimator);
  }

The first method delegates to the more general mostSimilarItems overload, passing a null Rescorer. That overload constructs a MostSimilarEstimator and passes it to the private method doMostSimilarItems. The work is thus split in two: algorithm-specific logic within the recommender selects the candidate items, and the estimator scores them.

Now let’s zoom in on the doMostSimilarItems method. See the snippet below:

  private List<RecommendedItem> doMostSimilarItems(long[] itemIDs,
                                                   int howMany,
                                                   TopItems.Estimator<Long> estimator) throws TasteException {
    DataModel model = getDataModel();
    FastIDSet possibleItemsIDs = new FastIDSet();
    for (long itemID : itemIDs) {
      PreferenceArray prefs = model.getPreferencesForItem(itemID);
      int size = prefs.length();
      for (int i = 0; i < size; i++) {
        long userID = prefs.get(i).getUserID();
        possibleItemsIDs.addAll(model.getItemIDsFromUser(userID));
      }
    }
    possibleItemsIDs.removeAll(itemIDs);
    return TopItems.getTopItems(howMany, possibleItemsIDs.iterator(), null, estimator);
  }

The snippet above implements the core logic for item-based recommendation. This process consists of three steps:

Fetch all preferences for the given item(s)
For each preference, get the corresponding user and collect the IDs of all items that user has expressed a preference for
Remove the given item(s) from this candidate set and determine the top items among the remaining candidates, ranked by the given estimator
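The three steps can be sketched in plain Java without Mahout. This is a simplified illustration under stated assumptions, not the actual Taste code: the class CandidateCollector and its two in-memory maps are hypothetical stand-ins for the DataModel calls getPreferencesForItem and getItemIDsFromUser, and ranking the candidates is left to the estimator.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a Taste DataModel (not Mahout's API).
class CandidateCollector {
  private final Map<Long, Set<Long>> usersByItem = new HashMap<>();
  private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();

  // Records that userID expressed a preference for itemID.
  void addPreference(long userID, long itemID) {
    usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(userID);
    itemsByUser.computeIfAbsent(userID, k -> new HashSet<>()).add(itemID);
  }

  // Mirrors doMostSimilarItems: find the users of the given items, collect every
  // item those users have preferences for, then drop the given items themselves.
  Set<Long> possibleItemIDs(long... itemIDs) {
    Set<Long> candidates = new HashSet<>();
    for (long itemID : itemIDs) {
      for (long userID : usersByItem.getOrDefault(itemID, Collections.emptySet())) {
        candidates.addAll(itemsByUser.get(userID));
      }
    }
    for (long itemID : itemIDs) {
      candidates.remove(itemID);
    }
    return candidates;
  }
}
```

For example, if users 1 and 2 rated item 10, user 1 also rated item 11, user 2 also rated item 12, and only user 3 rated item 13, then possibleItemIDs(10) returns {11, 12}: item 13 never becomes a candidate because nobody who rated item 10 also rated it.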

TopItems is a helper class that selects the top-ranked items from a set of candidate items, using the given estimator to score each one.

Now on to the estimator. All estimators implement the TopItems.Estimator<T> interface, which is really simple: it returns an estimate for a ‘thing’ as a double.

  public interface Estimator<T> {
    double estimate(T thing) throws TasteException;
  }
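To see how TopItems and an Estimator fit together, here is a simplified stand-in for TopItems.getTopItems that keeps a bounded min-heap of the best candidates seen so far. It is a sketch under assumptions, not Mahout's implementation: SimpleEstimator mirrors TopItems.Estimator<T> but drops TasteException, results are (itemID, estimate) pairs instead of RecommendedItem objects, and candidates whose estimate is NaN are skipped, NaN being the value MostSimilarEstimator returns for a filtered item.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Simplified, hypothetical counterpart of TopItems.Estimator<T> (no TasteException).
interface SimpleEstimator<T> {
  double estimate(T thing);
}

// Hypothetical stand-in for TopItems.getTopItems; not Mahout's implementation.
final class SimpleTopItems {
  private SimpleTopItems() {}

  // Returns up to howMany (itemID, estimate) pairs, highest estimate first.
  static List<Map.Entry<Long, Double>> getTopItems(int howMany,
                                                    Iterator<Long> candidates,
                                                    SimpleEstimator<Long> estimator) {
    Comparator<Map.Entry<Long, Double>> byEstimate =
        Comparator.comparingDouble((Map.Entry<Long, Double> e) -> e.getValue());
    List<Map.Entry<Long, Double>> result = new ArrayList<>();
    if (howMany < 1) {
      return result;
    }
    // Min-heap: the head is the weakest of the current top items.
    PriorityQueue<Map.Entry<Long, Double>> heap = new PriorityQueue<>(byEstimate);
    while (candidates.hasNext()) {
      long itemID = candidates.next();
      double value = estimator.estimate(itemID);
      if (Double.isNaN(value)) {
        continue; // NaN means no estimate, e.g. a filtered item
      }
      if (heap.size() < howMany) {
        heap.add(new AbstractMap.SimpleEntry<Long, Double>(itemID, value));
      } else if (value > heap.peek().getValue()) {
        heap.poll(); // evict the current weakest entry
        heap.add(new AbstractMap.SimpleEntry<Long, Double>(itemID, value));
      }
    }
    result.addAll(heap);
    result.sort(byEstimate.reversed());
    return result;
  }
}
```

Because the estimator is only an interface, the same ranking routine serves similar-item lookups, ‘recommended because’ explanations, or any other scoring rule; only the estimator passed in changes.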

Here is the MostSimilarEstimator:

  public static class MostSimilarEstimator implements TopItems.Estimator<Long> {

    private final long toItemID;
    private final ItemSimilarity similarity;
    private final Rescorer<LongPair> rescorer;

    public MostSimilarEstimator(long toItemID, ItemSimilarity similarity, Rescorer<LongPair> rescorer) {
      this.toItemID = toItemID;
      this.similarity = similarity;
      this.rescorer = rescorer;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      LongPair pair = new LongPair(toItemID, itemID);
      if ((rescorer != null) && rescorer.isFiltered(pair)) {
        return Double.NaN;
      }
      double originalEstimate = similarity.itemSimilarity(toItemID, itemID);
      return rescorer == null ? originalEstimate : rescorer.rescore(pair, originalEstimate);
    }
  }

This estimator does three things:

Use the Rescorer, if one is given, to filter items; a filtered item gets an estimate of Double.NaN. Rescorers can be used for domain-specific filtering of items
Use the ItemSimilarity to calculate the similarity between the given item and the candidate item
Optionally adjust (for example, boost) the similarity value with the Rescorer

This setup allows you to plug arbitrary ItemSimilarity algorithms into the recommender without changing the recommender itself.
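The filter, similarity and rescore sequence can be mirrored in plain Java to show why any similarity measure can be plugged in. Everything here is hypothetical: ItemSim and PairRescorer are simplified stand-ins for Taste's ItemSimilarity and Rescorer<LongPair> (without TasteException or LongPair), and the Tanimoto (Jaccard) coefficient over the sets of users who rated each item is just one example measure.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Taste's ItemSimilarity.
interface ItemSim {
  double itemSimilarity(long itemID1, long itemID2);
}

// Simplified stand-in for Taste's Rescorer<LongPair>.
interface PairRescorer {
  boolean isFiltered(long fromItemID, long toItemID);
  double rescore(long fromItemID, long toItemID, double originalScore);
}

// Example measure: Tanimoto (Jaccard) coefficient of the two items' user sets.
final class TanimotoSim implements ItemSim {
  private final Map<Long, Set<Long>> usersByItem;

  TanimotoSim(Map<Long, Set<Long>> usersByItem) {
    this.usersByItem = usersByItem;
  }

  @Override
  public double itemSimilarity(long itemID1, long itemID2) {
    Set<Long> a = usersByItem.getOrDefault(itemID1, new HashSet<>());
    Set<Long> b = usersByItem.getOrDefault(itemID2, new HashSet<>());
    Set<Long> union = new HashSet<>(a);
    union.addAll(b);
    if (union.isEmpty()) {
      return Double.NaN; // no data for either item
    }
    Set<Long> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    return (double) intersection.size() / union.size();
  }
}

// Same three steps as MostSimilarEstimator.estimate, with any ItemSim plugged in.
final class MostSimilarSketch {
  private final long toItemID;
  private final ItemSim similarity;
  private final PairRescorer rescorer; // may be null

  MostSimilarSketch(long toItemID, ItemSim similarity, PairRescorer rescorer) {
    this.toItemID = toItemID;
    this.similarity = similarity;
    this.rescorer = rescorer;
  }

  double estimate(long itemID) {
    if (rescorer != null && rescorer.isFiltered(toItemID, itemID)) {
      return Double.NaN; // 1. a filtered item gets no estimate
    }
    double original = similarity.itemSimilarity(toItemID, itemID); // 2. raw similarity
    // 3. optional rescoring, e.g. a domain-specific boost
    return rescorer == null ? original : rescorer.rescore(toItemID, itemID, original);
  }
}
```

Swapping TanimotoSim for any other ItemSim implementation changes the scores without touching MostSimilarSketch, which is the point of routing everything through the estimator.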

Recommended because…

Another interesting feature of the GenericItemBasedRecommender is the ‘Recommended because’ feature. With this feature you can determine why a certain item was recommended to you, i.e. which of your preferences were largely responsible for giving you this recommendation.

To use this feature, call recommendedBecause(long userID, long itemID, int howMany); see the snippet below:

  @Override
  public List<RecommendedItem> recommendedBecause(long userID, long itemID, int howMany) throws TasteException {
    if (howMany < 1) {
      throw new IllegalArgumentException("howMany must be at least 1");
    }

    DataModel model = getDataModel();
    TopItems.Estimator<Long> estimator = new RecommendedBecauseEstimator(userID, itemID, similarity);

    PreferenceArray prefs = model.getPreferencesFromUser(userID);
    int size = prefs.length();
    FastIDSet allUserItems = new FastIDSet(size);
    for (int i = 0; i < size; i++) {
      allUserItems.add(prefs.getItemID(i));
    }
    allUserItems.remove(itemID);

    return TopItems.getTopItems(howMany, allUserItems.iterator(), null, estimator);
  }

It takes all items the given user has preferences for, minus the given item, and passes them to TopItems along with a RecommendedBecauseEstimator, shown below:

  private class RecommendedBecauseEstimator implements TopItems.Estimator<Long> {

    private final long userID;
    private final long recommendedItemID;
    private final ItemSimilarity similarity;

    private RecommendedBecauseEstimator(long userID, long recommendedItemID, ItemSimilarity similarity) {
      this.userID = userID;
      this.recommendedItemID = recommendedItemID;
      this.similarity = similarity;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      Float pref = getDataModel().getPreferenceValue(userID, itemID);
      if (pref == null) {
        return Double.NaN; // no preference for this item, so no estimate
      }
      double similarityValue = similarity.itemSimilarity(recommendedItemID, itemID);
      return (1.0 + similarityValue) * pref;
    }
  }

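This RecommendedBecauseEstimator scores each of the user's items by multiplying the user's preference value for that item by one plus the similarity between that item and the recommended item. Taste similarity values are expected to lie in [-1, 1], so adding 1 shifts them into [0, 2]; for non-negative preference values, a negative similarity then lowers an item's score instead of flipping its sign. The top-ranked items are the user's items that contributed most to the recommendation of the given item.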
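The scoring rule and the candidate set (all of the user's items minus the recommended one) can be reproduced in a few lines of plain Java. This is a hypothetical helper for illustration, not Taste code: preferences and similarities are passed in as plain maps instead of being read from a DataModel and an ItemSimilarity, and items whose score is NaN (for example, no known similarity) are dropped, as TopItems would skip them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper mirroring RecommendedBecauseEstimator's scoring rule.
final class RecommendedBecauseSketch {
  private RecommendedBecauseSketch() {}

  // Same formula as RecommendedBecauseEstimator.estimate: (1 + similarity) * preference.
  static double score(double preference, double similarity) {
    return (1.0 + similarity) * preference;
  }

  // Ranks the user's other items (all rated items minus the recommended one)
  // by how much each one contributed to the recommendation, highest first.
  static List<Long> rankReasons(long recommendedItemID,
                                Map<Long, Double> prefsByItem,
                                Map<Long, Double> simToRecommended) {
    List<Long> reasons = new ArrayList<>();
    for (Map.Entry<Long, Double> e : prefsByItem.entrySet()) {
      long itemID = e.getKey();
      if (itemID == recommendedItemID) {
        continue;
      }
      double s = score(e.getValue(), simToRecommended.getOrDefault(itemID, Double.NaN));
      if (!Double.isNaN(s)) {
        reasons.add(itemID);
      }
    }
    reasons.sort(Comparator.comparingDouble(
        (Long id) -> score(prefsByItem.get(id), simToRecommended.get(id))).reversed());
    return reasons;
  }
}
```

With preferences 5, 3 and 4 for items 1, 2 and 3, and similarities 0.2, 0.9 and -0.5 to the recommended item, the scores are (1 + 0.2) × 5 = 6.0, (1 + 0.9) × 3 = 5.7 and (1 - 0.5) × 4 = 2.0. Item 1 therefore ranks first even though item 2 is more similar: a strong preference can outweigh a higher similarity.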

Conclusions

This concludes the overview of some Taste internals. I hope it has given you a clearer picture of how recommendations and estimators work inside Taste. In future posts I will probably expand on this topic, especially in the context of evaluating recommenders. If you have any questions about Taste in general or about estimators, feel free to leave a comment.