Serving a high-load RSS feed with Spring 3 and Ehcache

by Jettro Coenradie, December 17, 2009

For a project I am working on, the customer asked for an RSS component for their new website. This seems easy, but the number of possible feeds (100,000) and the potential for very high load made us consider a custom solution based on Ehcache, Spring 3 and ROME.

Some of the requirements for the solution have already been mentioned. The following list gives an overview of the things to consider.


  • More than 100,000 possible feeds.
  • Every search on the page can be used as a feed.
  • Certain feeds will be under very high load.
  • Content can easily be cached.

In this post I discuss the technical requirements for caching in this kind of solution. I also step through creating a feed with Spring 3, and finally I present the demo application: a web application that exposes news content both as a website and as RSS feeds.

Cache considerations

In general, caching can be applied at many different levels. Often it is done in a proxy server like Squid. The problem is that we need a more intelligent cache. The backend must be able to check whether the cache is still up to date, and when multiple requests for the same feed reach the server at the same time, we want to go to the backend only once. So the cache needs to track whether a request for a certain feed is currently being processed. If it is, the cache holds the other requests until the first one completes and then returns the cached value to all of them.
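The "go to the backend only once" behaviour is often called request coalescing. As far as I know, Ehcache's caching filters get it from the BlockingCache decorator; the sketch below is not that code. It is a minimal, hypothetical `CoalescingCache` built only on `java.util.concurrent`, assuming a `loader` function stands in for the expensive backend call. It has no expiry, which a real cache would of course need.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical illustration of request coalescing; not Ehcache code.
// No expiry or eviction: entries stay until the JVM goes away.
class CoalescingCache<K, V> {
    private final ConcurrentMap<K, FutureTask<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the expensive backend call

    CoalescingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) throws InterruptedException {
        FutureTask<V> task = entries.get(key);
        if (task == null) {
            FutureTask<V> created = new FutureTask<>(() -> loader.apply(key));
            task = entries.putIfAbsent(key, created);
            if (task == null) {
                // This thread won the race, so it alone calls the backend.
                task = created;
                created.run();
            }
        }
        try {
            // Every other thread blocks here until the first load finishes,
            // then receives the same value.
            return task.get();
        } catch (ExecutionException e) {
            // Do not cache failures: remove the entry so a later request can retry.
            entries.remove(key, task);
            throw new IllegalStateException("loading " + key + " failed", e.getCause());
        }
    }
}
```

Eight simultaneous requests for the same feed URL through this cache result in a single `loader` call; the other seven wait and share its result.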

Luckily, Ehcache comes with a filter that does exactly that:

net.sf.ehcache.constructs.web.filter.SimplePageCachingFilter
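A page cache needs a key per cached response. As far as I know, SimplePageCachingFilter derives its key from the HTTP method, the request URI and the query string, which is what makes "every search is a feed" work: each distinct search gets its own cache entry. The hypothetical `FeedCacheKey` helper below only mirrors that idea; it is not the filter's code, and the `q` search parameter in the usage note is an assumption.

```java
// Hypothetical helper, not part of Ehcache: builds a cache key from the parts
// of a request that distinguish one feed from another.
final class FeedCacheKey {
    private FeedCacheKey() {}

    static String of(String method, String requestUri, String queryString) {
        StringBuilder key = new StringBuilder(method).append(' ').append(requestUri);
        // The query string carries the search, so it must be part of the key.
        if (queryString != null && !queryString.isEmpty()) {
            key.append('?').append(queryString);
        }
        return key.toString();
    }
}
```

With such a key, `/news/feed.rss?q=spring` and `/news/feed.rss?q=ehcache` are cached separately, while repeated requests for the same search share one entry.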

Configure caching

I used a small extension of this class to configure the JMX capabilities of the CacheManager. My filter now looks like this:

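import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.servlet.FilterConfig;
import net.sf.ehcache.CacheException;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.constructs.web.filter.SimplePageCachingFilter;
import net.sf.ehcache.management.ManagementService;

public class CacheCompleteResponseMonitoringFilter extends SimplePageCachingFilter {
    @Override
    public void doInit(FilterConfig filterConfig) throws CacheException {
        super.doInit(filterConfig);
        CacheManager manager = getCacheManager();
        MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
        // Flags: CacheManager, caches, cache configurations, cache statistics; only statistics MBeans are registered.
        ManagementService.registerMBeans(manager, mBeanServer, false, false, false, true);
    }
}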

Of course this filter needs to be configured in the web.xml. The following snippet shows the important parts of the web.xml for this filter.

    <filter>
        <filter-name>RssFeedCachingFilter</filter-name>
        <filter-class>nl.gridshore.newsfeed.web.CacheCompleteResponseMonitoringFilter</filter-class>
        <init-param>
            <param-name>suppressStackTraces</param-name>
            <param-value>false</param-value>
        </init-param>
        <init-param>
            <param-name>cacheName</param-name>
            <param-value>RssFeedCachingFilter</param-value>
        </init-param>
    </filter>

    <filter-mapping>
        <filter-name>RssFeedCachingFilter</filter-name>
        <url-pattern>*.rss</url-pattern>
        <dispatcher>REQUEST</dispatcher>
        <dispatcher>INCLUDE</dispatcher>
        <dispatcher>FORWARD</dispatcher>
    </filter-mapping>

The important part of this configuration is the cacheName init parameter. It must correspond to a cache entry in the Ehcache configuration file, ehcache.xml. The following snippet shows the caching configuration.

    <cacheManagerEventListenerFactory
            class="nl.gridshore.newsfeed.web.LoggingCacheManagerListenerFactory"/>

    <cache name="RssFeedCachingFilter"
           maxElementsInMemory="50"
           eternal="false"
           timeToIdleSeconds="50"
           timeToLiveSeconds="50"
           overflowToDisk="true">
        <cacheEventListenerFactory class="nl.gridshore.newsfeed.web.LoggingCacheListenerFactory"/>
    </cache>

There are a few things I'd like to discuss in this file. First, the name of the configured cache, RssFeedCachingFilter, is the same as in the filter configuration. Most of the options are easy to understand, but I do want to mention the difference between timeToIdleSeconds and timeToLiveSeconds. Time to idle is measured from the last time the cache entry was requested; time to live is measured from the moment the entry was put in the cache, no matter how often it is read.

Two other elements in the configuration might be less obvious: cacheManagerEventListenerFactory and cacheEventListenerFactory. I use both of them to monitor what happens to the cache. If you are interested in these classes, check out the sources and switch logging to DEBUG; you then get more information about caches being added and items being put in the cache.
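To make the time-to-idle versus time-to-live difference concrete, here is a small hypothetical model of the two expiry rules. It is not Ehcache's implementation; it assumes times in seconds and treats a limit of 0 as "no limit".

```java
// Hypothetical model of TTL/TTI expiry; not Ehcache's code.
// All times are in seconds; a limit of 0 means "no limit".
final class ExpiryModel {
    private final long timeToLiveSeconds;
    private final long timeToIdleSeconds;

    ExpiryModel(long timeToLiveSeconds, long timeToIdleSeconds) {
        this.timeToLiveSeconds = timeToLiveSeconds;
        this.timeToIdleSeconds = timeToIdleSeconds;
    }

    /**
     * @param createdAt    when the entry was put in the cache
     * @param lastAccessAt when the entry was last read (createdAt if never read)
     * @param now          the current time
     */
    boolean isExpired(long createdAt, long lastAccessAt, long now) {
        // Time to live: counted from creation, however often the entry is read.
        boolean liveExpired = timeToLiveSeconds > 0 && now - createdAt > timeToLiveSeconds;
        // Time to idle: counted from the most recent read.
        boolean idleExpired = timeToIdleSeconds > 0 && now - lastAccessAt > timeToIdleSeconds;
        return liveExpired || idleExpired;
    }
}
```

Under this model, the configuration above (both limits 50 seconds) means time to idle can never fire before time to live, because the last access is never earlier than creation. A popular feed is therefore rebuilt at least every 50 seconds, and the idle setting only matters if it is shorter than the live setting.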

Monitor the cache

The final thing I want to mention about the caching solution is monitoring the cache. We want to know the number of items in the cache, the number of times items were obtained from the cache (hits) and the number of misses while trying to obtain something from it.

We use an additional filter to log statistics about the requests. The filter currently uses log4j to write this information to the logs. It would not be hard to write it to a database or the file system and build an analyser on top of it, but for now this is good enough. Each cache instance holds a Statistics object that we can use to report on the cache. We log these statistics after each request for an RSS feed, so we can monitor the hits, misses, and so on. The next snippet shows the code of the filter, followed by some example output.

    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain) throws ServletException, IOException {
        // Let the caching filter and the feed view produce the response first.
        chain.doFilter(req, resp);
        // Uses net.sf.ehcache.CacheManager, Ehcache and Statistics; 'log' is a logger field
        // with SLF4J-style {} placeholders.
        CacheManager cacheManager = CacheManager.getInstance();
        Ehcache cache = (cacheManager == null) ? null : cacheManager.getEhcache("RssFeedCachingFilter");
        if (cache != null) {
            Statistics statistics = cache.getStatistics();
            log.info("uri : {}, average get time : {} ms",
                ((HttpServletRequest) req).getRequestURI(), statistics.getAverageGetTime());
            log.info("Cache hits           : {}", statistics.getCacheHits());
            log.info("Cache misses         : {}", statistics.getCacheMisses());
            log.info("Cache object count   : {}", statistics.getObjectCount());
            log.info("Cache eviction count : {}", statistics.getEvictionCount());
        } else {
            log.debug("No RssFeedCachingFilter cache available");
        }
    }

uri : /news/feed.rss, average get time : 1.0 ms
Cache hits           : 2
Cache misses         : 2
Cache object count   : 2
Cache eviction count : 0
uri : /news/2/feed.rss, average get time : 0.6666667 ms
Cache hits           : 3
Cache misses         : 2
Cache object count   : 2
Cache eviction count : 0
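The raw counters are easier to judge as a ratio. Taking the second sample above (3 hits, 2 misses), 3 / (3 + 2) = 0.6, so 60% of lookups were answered from the cache. A minimal sketch of that calculation, using a hypothetical `CacheReport` helper that is not part of Ehcache:

```java
import java.util.Locale;

// Hypothetical helper, not part of Ehcache: summarizes the counters the filter logs.
final class CacheReport {
    private CacheReport() {}

    /** Fraction of lookups served from the cache, or 0.0 if nothing was looked up yet. */
    static double hitRatio(long hits, long misses) {
        long lookups = hits + misses;
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    static String summarize(long hits, long misses, long objectCount) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of server locale.
        return String.format(Locale.ROOT, "hits=%d misses=%d ratio=%.2f objects=%d",
                hits, misses, hitRatio(hits, misses), objectCount);
    }
}
```

Logging one summary line per request like this keeps the log compact, while the ratio shows at a glance whether the cache is actually protecting the backend.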

Introducing the sample

Enough about caching; let us focus on creating something that can be cached. First we create a website that enables users to create news items and to comment on them. The application supports multiple RSS feeds:

  • http://…/news/feed.rss – a feed containing all news items
  • http://…/news/{id}/feed.rss – a feed containing the comments on the news item with id {id}
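Both URL shapes end in `.rss`, so the `*.rss` filter mapping above puts both of them behind the cache. In Spring 3 the `{id}` part would typically be bound with a URI template and `@PathVariable`; the post does not show its controller, so the following is only a hypothetical, Spring-free `FeedRoute` that resolves the two shapes, assuming news item ids are numeric.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical router for the two feed URL shapes; in the application,
// Spring's request mapping does this work.
final class FeedRoute {
    private static final Pattern ALL_NEWS = Pattern.compile("/news/feed\\.rss");
    // Assumes ids are numeric.
    private static final Pattern COMMENTS = Pattern.compile("/news/(\\d+)/feed\\.rss");

    private FeedRoute() {}

    /** Returns "news" for the news feed, "comments:{id}" for a comment feed, or empty. */
    static Optional<String> resolve(String path) {
        if (ALL_NEWS.matcher(path).matches()) {
            return Optional.of("news");
        }
        Matcher m = COMMENTS.matcher(path);
        if (m.matches()) {
            return Optional.of("comments:" + m.group(1));
        }
        return Optional.empty();
    }
}
```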

The website itself consists of three screens: the list screen with all news items; the new news item screen, with a form to create a news item; and the news item details screen, where you can add a comment to a news item. The list screen gives access to the feed with all news items, and the details screen gives access to the feed with all comments for that news item.

[Screenshots of the three screens of the demo application]

Creating the feed with Spring

Spring 3 comes with an RSS component out of the box. The RSS view uses ROME to generate the feed. I am not going into a lot of detail in this post.

Each feed has its own class that extends:

org.springframework.web.servlet.view.feed.AbstractRssFeedView

We use annotations to configure the Spring component. Looking at the code, you can see the structure: one method fills the metadata of the feed, and another creates a list of Item objects that contain the actual feed items. The following code should be enough to understand how to create a feed.

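import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.view.feed.AbstractRssFeedView;
import com.sun.syndication.feed.rss.Channel;
import com.sun.syndication.feed.rss.Content;
import com.sun.syndication.feed.rss.Item;

// Bean named "newsRssFeed"; expects a List<NewsItem> (the demo's own domain class)
// in the model under the key "newsItems".
@Component("newsRssFeed")
public class NewsRssFeedView extends AbstractRssFeedView {

    @Override
    protected void buildFeedMetadata(Map<String, Object> model, Channel feed, HttpServletRequest request) {
        // Channel-level metadata shown by feed readers.
        feed.setTitle("Gridshore news items");
        feed.setDescription("All news items from the gridshore source");
        feed.setLink("http://localhost:8080/news");
    }

    @Override
    protected List<Item> buildFeedItems(Map<String, Object> model, HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        @SuppressWarnings({"unchecked"})
        List<NewsItem> newsItems = (List<NewsItem>) model.get("newsItems");
        List<Item> items = new ArrayList<Item>();
        for (NewsItem newsItem : newsItems) {
            Item item = new Item();
            item.setAuthor(newsItem.metaData().author());
            // The item body is HTML: the introduction followed by the full text.
            Content content = new Content();
            content.setType(Content.HTML);
            content.setValue("<p>" + newsItem.introduction() + "</p><p>" + newsItem.item() + "</p>");
            item.setContent(content);
            item.setTitle(newsItem.title());

            items.add(item);
        }
        return items;
    }
}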
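To get a feel for what the view sends to the client, here is a minimal, stdlib-only sketch that writes an RSS 2.0 document with the same channel metadata. It is a hypothetical `RssSketch`, not ROME's generator; ROME's real output will differ in details, for example in how it places the HTML content. It only shows the overall shape: one channel with metadata, one item per news item, and the HTML body escaped as text.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of the RSS 2.0 document shape; in the application ROME
// generates the real XML from the Channel and Item objects.
final class RssSketch {
    private RssSketch() {}

    /** Each entry is {title, htmlContent}. */
    static String render(String title, String link, String description,
                         List<String[]> entries) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element rss = doc.createElement("rss");
        rss.setAttribute("version", "2.0");
        doc.appendChild(rss);
        Element channel = child(doc, rss, "channel", null);
        // Channel metadata, the part buildFeedMetadata fills in.
        child(doc, channel, "title", title);
        child(doc, channel, "link", link);
        child(doc, channel, "description", description);
        // One <item> per entry, the part buildFeedItems produces.
        for (String[] entry : entries) {
            Element item = child(doc, channel, "item", null);
            child(doc, item, "title", entry[0]);
            // The HTML is stored as text, so the serializer escapes its markup.
            child(doc, item, "description", entry[1]);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static Element child(Document doc, Element parent, String name, String text) {
        Element element = doc.createElement(name);
        if (text != null) {
            element.setTextContent(text);
        }
        parent.appendChild(element);
        return element;
    }
}
```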

That’s about it. The following links give some references to the libraries I used and to where you can find the code.