
Managing Revisions and Duplicates

What are revisions?

As news stories evolve and AP's editorial teams gather additional information and multimedia resources, updates to a content item are published to the AP Media API throughout the day.

These updates are referred to as content item revisions. Some examples are:

  • Story revisions ("writethrus").
  • Additions or corrections to image captions; kills of previously delivered pictures. For more information, see Picture Notification Banners.
  • New linked curated media (for example, pictures or video) added to a story by AP editors.
  • New media renditions added to a content item. For example, video renditions in various formats, qualities and encodings are released into AP Media API feeds as soon as some of them are available, rather than when all of them have finished being produced, so that new content reaches you as fast as possible.

What are duplicates?

Duplicate content may be delivered for a variety of reasons; for example:

  • The same Top Headline story may be included in several Top Headline packages, which are filed multiple times during the day and often contain the same stories.
  • Stories may share linked media; for example, the same picture may be linked to several stories about the same news event.
  • AP editors may file the same story for print and online use.
  • The same story or media may appear in multiple entitlements; for example, multiple products specified in separate feed requests.

    Note

    If multiple products are specified in the same feed request, duplicate items are filtered out in the feed response.

Which metadata tags can I use to manage revisions and detect duplicates?

To enable tracking and management of content item revisions and duplicate detection, the AP Media API returns the following metadata values for each content item in the Search, Feed and Item Metadata responses: Item ID (altids.itemid), ETag (altids.etag) and Version (version):

// Example:
{
  "altids": {
    "itemid": "169bf4e1ed114d849bb4bc30b9377929",
    "etag": "169bf4e1ed114d849bb4bc30b9377929_0a1aza3c89898",
    ...
  },
  "version": 0
}

Note

Item ID, ETag and Version are also returned for each linked content item (for example, linked curated media and Top Headline stories) referenced in associations of the content items returned in the Search, Feed and Item Metadata responses.

Item ID

A unique ID that remains the same throughout all revisions of the content item; for example, all stories that have the same item ID are part of the same 'story chain'.

ETag

The ETag value is a unique token for each revision of a content item. It changes not only when the story body or item metadata is updated, but also when any item component changes; for example, when new linked curated media or media renditions are added to the content item. The include/exclude parameters have no impact on the ETag values.

Version

The content item version number: typically 0 for the initial version, 1 for the first revision, 2 for the second revision and so on. The higher the number, the more recent the content item's version.

  • For text stories, this is the version of the story revision, which indicates where the story is located in the 'story chain'.
  • For other media types (for example, pictures, graphics and video), this is the version of the item metadata; for example, an image caption. Typically, significant changes to the binary asset (such as a picture) are published as a new content item.

Note

The version number is incremented when the story body or item metadata (for example, an image caption) changes, or when new curated media is added to the story. Unlike ETag, the version number does not reflect other possible changes to the content item; for example, the addition of new video renditions that became available later because of their processing time. However, the version number may be useful for placing a story at its correct position in the 'story chain' if you are tracking all versions of the story for news management using the versions=all parameter in your feed request.
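The ordering described in this note could be sketched as follows. This is only an illustration, assuming each item is a dictionary shaped like the metadata example above; `build_story_chains` and the sample IDs are hypothetical and not part of the AP Media API.

```python
from collections import defaultdict


def build_story_chains(items):
    """Group items by Item ID and order each group by version number."""
    chains = defaultdict(list)
    for item in items:
        chains[item["altids"]["itemid"]].append(item)
    # Higher version numbers are more recent, so an ascending sort
    # yields each chain from oldest to newest.
    return {
        item_id: sorted(group, key=lambda it: it["version"])
        for item_id, group in chains.items()
    }


# Hypothetical sample data, arriving out of order as it might from a
# feed requested with versions=all.
items = [
    {"altids": {"itemid": "aaa", "etag": "aaa_2"}, "version": 2},
    {"altids": {"itemid": "bbb", "etag": "bbb_0"}, "version": 0},
    {"altids": {"itemid": "aaa", "etag": "aaa_0"}, "version": 0},
    {"altids": {"itemid": "aaa", "etag": "aaa_1"}, "version": 1},
]
chains = build_story_chains(items)
latest = chains["aaa"][-1]  # most recent version in the "aaa" chain
```

Because rendition-only updates change the ETag but not the version, a chain built this way reflects editorial revisions only.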

How can I determine if a content item is a new revision of one previously downloaded?

If the Item ID value matches one that you previously downloaded, but the ETag value for that item differs from the ETag of the previously downloaded item, the item is a new revision: download it and overwrite the previous revision. For more information, see Using ETags and client caching to manage revisions and duplicates.

How can I determine if a content item is a duplicate of one previously downloaded?

If the Item ID value matches one that you previously downloaded, and the ETag value for that item matches the ETag of the previously downloaded item, the item is a duplicate, and you do not need to download it again. For more information, see Using ETags and client caching to manage revisions and duplicates.
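The two rules above can be combined into a single check. The following is a minimal sketch, assuming items are dictionaries shaped like the metadata example and that the client keeps its own mapping of Item ID to last-seen ETag; `classify_and_record` is a hypothetical helper name.

```python
def classify_and_record(item, seen_etags):
    """Return 'new', 'revision' or 'duplicate' for an incoming item.

    seen_etags maps Item ID -> ETag of the revision already downloaded
    and is updated whenever the item should be downloaded.
    """
    item_id = item["altids"]["itemid"]
    etag = item["altids"]["etag"]
    previous = seen_etags.get(item_id)
    if previous is None:
        status = "new"          # Item ID never seen before
    elif previous == etag:
        status = "duplicate"    # same Item ID, same ETag: skip it
    else:
        status = "revision"     # same Item ID, different ETag: overwrite
    if status != "duplicate":
        seen_etags[item_id] = etag
    return status


# Hypothetical sample items.
seen = {}
incoming = [
    {"altids": {"itemid": "aaa", "etag": "aaa_0"}, "version": 0},
    {"altids": {"itemid": "aaa", "etag": "aaa_0"}, "version": 0},
    {"altids": {"itemid": "aaa", "etag": "aaa_1"}, "version": 1},
]
results = [classify_and_record(item, seen) for item in incoming]
# results == ["new", "duplicate", "revision"]
```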

Using ETags and client caching to manage revisions and duplicates

It is recommended that your client application cache the responses it receives from the AP Media API server and use the ETags returned by the API so that it downloads a content item again only when the cached copy has become outdated.

Since the ETag value serves as a unique token for each revision of a content item, it can be used by caching applications to determine if their cache is up-to-date.

Tip

Caching and reusing previously retrieved content also helps optimize performance.

Detecting duplicates using a cache key with ETag validation

When caching content items returned in the Search, Feed or Item Metadata responses, or linked items referenced in associations (for example, linked media and Top Headline stories), select a cache key based on Item ID (altids.itemid).

Tip

If you are requesting multiple versions of the content item using the versions=all parameter, select a cache key based on ItemID.Version (altids.itemid and version).

To determine whether the cached document is the latest, use the ETag value (altids.etag) as a validation token:

  • If the value you received from the API matches the value stored in the cache, use the cached document.
  • Otherwise, retrieve the item again and update the cached document and its ETag token.
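The cache-key and validation-token approach described in this section might look like the sketch below. It assumes an in-memory cache and a caller-supplied `fetch` callable that performs the actual download; `ItemCache`, `fetch_document` and the sample items are hypothetical names used for illustration only.

```python
class ItemCache:
    """In-memory cache keyed by Item ID (or Item ID plus version)."""

    def __init__(self, per_version=False):
        # per_version=True is intended for feeds requested with versions=all.
        self.per_version = per_version
        self._entries = {}  # cache key -> (etag, document)

    def cache_key(self, item):
        item_id = item["altids"]["itemid"]
        if self.per_version:
            return f"{item_id}.{item['version']}"
        return item_id

    def get_or_fetch(self, item, fetch):
        """Return the cached document if its ETag still matches."""
        key = self.cache_key(item)
        etag = item["altids"]["etag"]
        entry = self._entries.get(key)
        if entry is not None and entry[0] == etag:
            return entry[1]  # ETag matches: the cached copy is current
        document = fetch(item)  # new or outdated: retrieve again
        self._entries[key] = (etag, document)
        return document


fetch_calls = []


def fetch_document(item):
    # Stand-in for a real download; records each call for inspection.
    fetch_calls.append(item["altids"]["etag"])
    return {"body": f"content for {item['altids']['etag']}"}


cache = ItemCache()
v0 = {"altids": {"itemid": "aaa", "etag": "aaa_0"}, "version": 0}
v1 = {"altids": {"itemid": "aaa", "etag": "aaa_1"}, "version": 1}
cache.get_or_fetch(v0, fetch_document)  # not cached yet: downloads
cache.get_or_fetch(v0, fetch_document)  # same ETag: served from cache
cache.get_or_fetch(v1, fetch_document)  # ETag changed: downloads again
```

A production client would likely persist the cache and bound its size; both are left out here to keep the validation logic visible.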

Detecting duplicates using conditional Item Metadata requests

Note

This section is applicable only to processing responses returned by the Item Metadata method. It provides an alternative to using the cache key described in the previous section.

In addition to returning the ETag value in altids.etag, the Item Metadata method also returns the ETag in the standard HTTP ETag response header; for example:

HTTP/1.x 200 OK
... 
Etag: "d5787d1366ae4ca095b936d77c0864b4_0a1aza0c0"

Your client application can cache the ETag value and then send it in the standard HTTP "If-None-Match" header as a conditional request to the AP Media API server to ensure that the cached document is the latest version.

If the client's document is the latest (based on the ETag value), the server responds with the HTTP 304 Not Modified status and no response body. The client then reuses the cached document.

Alternatively, if the client's document is outdated, the server responds with the HTTP 200 OK status and the new response body. The client then uses the new response body and caches it, together with its new ETag, for later reuse.
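The conditional request flow can be sketched as follows. To keep the example runnable without network access, the HTTP call is abstracted behind a hypothetical `send(url, headers)` callable that returns `(status_code, response_headers, body)`; in a real client this would wrap whichever HTTP library you use. The URL and `fake_send` are stand-ins, not actual AP Media API endpoints or behavior.

```python
def conditional_get(url, cache, send):
    """Fetch url, revalidating any cached copy with If-None-Match.

    cache maps url -> (etag, body).
    """
    headers = {}
    cached = cache.get(url)
    if cached is not None:
        headers["If-None-Match"] = cached[0]
    status, response_headers, body = send(url, headers)
    if status == 304 and cached is not None:
        return cached[1]  # not modified: reuse the cached document
    if status == 200:
        # HTTP header names are case-insensitive ("Etag" vs "ETag").
        etag = next(
            (v for k, v in response_headers.items() if k.lower() == "etag"),
            None,
        )
        if etag is not None:
            cache[url] = (etag, body)
        return body
    raise RuntimeError(f"unexpected HTTP status {status}")


requests_seen = []


def fake_send(url, headers):
    # Stand-in server that always holds the same revision of the item.
    requests_seen.append(dict(headers))
    current = '"d5787d1366ae4ca095b936d77c0864b4_0a1aza0c0"'
    if headers.get("If-None-Match") == current:
        return 304, {"Etag": current}, None
    return 200, {"Etag": current}, {"item": "metadata"}


url = "https://api.example.invalid/item"  # hypothetical URL
cache = {}
first = conditional_get(url, cache, fake_send)   # 200: body is cached
second = conditional_get(url, cache, fake_send)  # 304: cached body reused
```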