Basics - NewsML-G2 Quick Start Guide

NOTE: This content is no longer maintained and may be out of date. The latest version of this documentation is at https://iptc.org/std/NewsML-G2/guidelines/#quick-start-guide-to-newsml-g2-basics, please consult that document instead.

1. About the Quick Start Guides

Quick Start Guides are intended to give implementers enough information about NewsML-G2 to begin creating a useful working set of model NewsML-G2 documents for their organisation, or to begin working with NewsML-G2 documents provided by another organisation. This Basics Guide covers the NewsML-G2 features that are common to most types of content. The other Quick Start Guides give more specific information about NewsML-G2 for text, pictures, video, and news packages.

2. Introduction

The basic structure of a NewsML-G2 Item document is common to all applications. The available types of Item are:

  • News Item: for all kinds of news content.

  • Package Item: for structured collections of news content.

  • Concept Item: for expressing knowledge about entities, abstract concepts, and events.

  • Knowledge Item: for collections of concepts, often grouped for a specific purpose such as Controlled Vocabularies.

  • Planning Item: for exchanging information about news coverage and fulfilment.

  • Catalog Item: for managing references to Controlled Vocabularies.

3. Item structure

The building blocks of a NewsML-G2 Item are shown in the diagram below:

Figure (BasicItemStructure): All NewsML-G2 Items share this basic container structure

Every Item has a root element that is specific to its Item type. The root element carries identification, version and some basic information to initiate the NewsML-G2 processor.

The Catalog information is required to resolve QCodes, a fundamental feature of NewsML-G2 that enables partners to guarantee that codes used within an Item are globally unique.

The Rights Information block allows publishers to assert fine-grained information about copyright and usage terms as human-readable statements, or by inclusion of a machine-readable rights expression language such as RightsML.

The Item Metadata wrapper <itemMeta> contains metadata about the Item as a whole. It is followed by metadata about the whole of the content, <contentMeta>, and optionally by the <partMeta> wrapper, which enables publishers to express metadata about specific parts of the content.

Optional "helper" structures are available for specialised processing needs.

Each type of NewsML-G2 Item has a specific wrapper element for content. The diagram below shows these wrappers together with the basic top-level elements common to all NewsML-G2 Items. The colours of the wrapper elements in the diagram are repeated in the code example in order to highlight the relevant sections of the News Item:

Figure (ExpandedItemStructure): The basic XML elements associated with each part of a NewsML-G2 Item

The example in this Quick Guide to NewsML-G2 Basics uses a News Item with text content to illustrate the basic principles in action. Read this guide first, and proceed to further Quick Guides specific to Text, Pictures, Video, and Packages, as needed.

LISTING 1: A NewsML-G2 News Item with Text

<?xml version="1.0" encoding="UTF-8"?>
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ ./NewsML-G2_2.30-spec-All-Power.xsd"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.30"
    conformance="power"
    xml:lang="en-GB">
  <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
  <catalogRef href="http://www.example.com/newsml-g2/catalog.enews_2.xml" />
  <rightsInfo>
    <copyrightHolder uri="http://www.example.com/about.html#copyright">
      <name>Example Enews LLP</name>
    </copyrightHolder>
    <copyrightNotice>Copyright 2021-22 Example Enews LLP, all rights reserved</copyrightNotice>
    <usageTerms>Not for use outside the United States</usageTerms>
  </rightsInfo>
  <itemMeta>
    <itemClass qcode="ninat:text" />
    <provider qcode="nprov:REUTERS" />
    <versionCreated>2021-10-21T16:25:32-05:00</versionCreated>
    <firstCreated>2016-10-18T13:12:21-05:00</firstCreated>
    <embargoed>2021-10-23T12:00:00Z</embargoed>
    <pubStatus qcode="stat:usable" />
    <service qcode="ex-svc:uknews">
      <name>UK News Service</name>
    </service>
    <edNote>Note to editors: STRICTLY EMBARGOED. Not for public release until 12noon on Friday, October 23, 2021.</edNote>
    <signal qcode="sig:update" />
    <link rel="irel:seeAlso" href="http://www.example.com/video/20081222-PNN-1517-407624/index.html" />
  </itemMeta>
  <contentMeta>
    <contentCreated>2016-10-18T11:12:00-05:00</contentCreated>
    <contentModified>2021-10-21T16:22:45-05:00</contentModified>
    <located type="ex-cptype:city" qcode="ex-geo:345678">
      <name>Berlin</name>
      <broader type="ex-cptype:statprov" qcode="ex-prov:2365">
        <name>Berlin</name>
      </broader>
      <broader type="ex-cptype:country" qcode="iso3166-1a2:DE">
        <name>Germany</name>
      </broader>
    </located>
    <creator uri="http://www.example.com/staff/mjameson">
      <name>Meredith Jameson</name>
    </creator>
    <infoSource uri="http://www.example.com" />
    <subject type="cpnat:abstract" qcode="medtop:04000000">
      <name xml:lang="en-GB">economy, business and finance</name>
    </subject>
    <subject type="cpnat:abstract" qcode="medtop:20000523">
      <name xml:lang="en-GB">labour market</name>
      <name xml:lang="de">Arbeitsmarkt</name>
      <broader qcode="medtop:04000000" />
    </subject>
    <genre qcode="genre:interview">
      <name xml:lang="en-GB">Interview</name>
    </genre>
    <slugline>US-Finance-Fed</slugline>
    <headline>Fed to halt QE to avert "bubble"</headline>
  </contentMeta>
  <contentSet>
    <inlineXML contenttype="application/nitf+xml">
      <!-- A VALID MIME TYPE -->
      <!-- Inline XML must contain well-formed XML such as NITF or XHTML -->
    </inlineXML>
  </contentSet>
</newsItem>

4. Root element <newsItem>

Each NewsML-G2 Item Type uses a specific root element name as shown in the diagram above. In the example News Item the root element is <newsItem> (note camel case spelling).

4.1. Root element attributes

Item Identifier

All NewsML-G2 Items must have a @guid, an identifier that should be globally unique for all time and independent of location. The IPTC has registered a URN namespace for the purpose of creating GUIDs for NewsML-G2 Items using a specification based on RFC3085. The syntax for a @guid using this scheme is:

guid="urn:newsml:[ProviderId]:[DateId]:[NewsItemId]"

Use an internet domain name owned by your organisation as a ProviderId, for example:

<newsItem guid= "urn:newsml:acmenews.com:20161018:US-FINANCE-FED"

Do not try to "reverse engineer" the DateId part of a GUID to create a time-stamp. This may have unintended consequences and result in errors. Use the appropriate NewsML-G2 timestamp property instead.

Version

A simple indicator of the version of the Item, for example version="14" in Listing 1.

Version numbers need not be consecutive, but must start at 1, and each new version must have a higher number than the previous version. If @version is missing, the value is assumed to be 1. See Hidden Default Values of NewsML-G2.

Standard

A string denoting the IPTC Standard, in this case "NewsML-G2".

Standard Version

A string denoting the major and minor version of the Standard being used, for example standardversion="2.30" in Listing 1.

Conformance

There are two levels of Conformance to the NewsML-G2 standard. The "Core" conformance level (CCL) represents a minimal sub-set of NewsML-G2 that can usefully get work done, and is the default if @conformance is omitted. In practice, many implementers use the properties of "Power" conformance (PCL). If an implementer stays within the CCL, the @conformance of the NewsML-G2 Item can be assumed and omitted, but it must be declared when using PCL, as conformance="power" in Listing 1.

Development of NewsML-G2 at Core Conformance has stopped at 2.24; versions beyond this are at Power Conformance only, but the @conformance property must continue to be explicitly stated.

Language

The @xml:lang attribute sets the default language of XML elements with text content in the Item. In Listing 1 this is UK English, xml:lang="en-GB".

IPTC namespace

Sets the default namespace for elements. In Listing 1 this is xmlns="http://iptc.org/std/nar/2006-10-01/".

Putting this together, the required root element attributes are:
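The root element of Listing 1, reduced to the attributes discussed above, looks like this. The values are those of the example listing; the comment marks where the child wrappers would go.

```xml
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.30"
    conformance="power"
    xml:lang="en-GB">
  <!-- catalogRef, rightsInfo, itemMeta, contentMeta and contentSet go here -->
</newsItem>
```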

4.2. Validating code

When developing a NewsML-G2 processing application, implementers will need to validate the generated NewsML-G2 code against the appropriate schema. To validate code at PCL against the latest version of NewsML-G2 covered by these Guidelines (2.30) add the following code to the <newsItem> element:
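In Listing 1 the validation hint is given by declaring the XML Schema instance namespace and an @xsi:schemaLocation that pairs the NewsML-G2 namespace with the schema file. The relative path ./ assumes that the schema file sits next to the document, as in the listing.

```xml
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ ./NewsML-G2_2.30-spec-All-Power.xsd"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.30"
    conformance="power"
    xml:lang="en-GB">
  <!-- Item content omitted -->
</newsItem>
```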

To validate at CCL, substitute the name of the "Core" schema and ensure that the LATEST version of NewsML-G2 that supports CCL (2.24) is set as the @standardversion:
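A sketch of the Core variant follows. The schema file name is an assumption formed by analogy with the Power file name in Listing 1, so check it against the schema files actually distributed with version 2.24.

```xml
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ ./NewsML-G2_2.24-spec-All-Core.xsd"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.24"
    conformance="core"
    xml:lang="en-GB">
  <!-- Item content omitted; assumed schema file name, see the note above -->
</newsItem>
```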

Advanced guidance on validation issues, including validation of SportsML-G2 and other XML content inside NewsML-G2, is covered in Validation of SportsML inside NewsML-G2.

4.3. Catalog wrapper <catalogRef>

Codes, short mnemonics used to express the value of properties such as Category, are a long-established feature of news exchange. QCodes are the NewsML-G2 mechanism that enables partners in news exchange to guarantee that codes are globally unique. Without going into details of the mechanism here, the News Item <catalog> enables a NewsML-G2 processor to resolve QCodes, and guarantee that uniqueness, by mapping the code to a globally-unique URI. It is recommended that this URI locates a web resource.

One of the few mandatory NewsML-G2 elements, <itemClass>, uses QCodes issued by the IPTC to identify the business intention of the Item. For a News Item, the scheme is News Item Nature, with a recommended Scheme Alias of "ninat". Values from the scheme include (not limited to) "ninat:text" and "ninat:picture". The scheme is resolved through the IPTC catalog, which is the first <catalogRef> in Listing 1.

Other types of NewsML-G2 Item use specific schemes for the <itemClass> property.

All Scheme Aliases used in the example listings indicate IPTC NewsCodes vocabularies, except for the following alias values: ex-svc, ex-cptype, ex-geo and ex-prov.

As the CVs used by a provider are usually quite consistent across the NewsML-G2 Items they publish, the IPTC recommends that the <catalog> references are aggregated into a stand-alone file which is made available as a web resource referenced by <catalogRef>. This is how the IPTC publishes its Catalogs:
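Listing 1 references the IPTC catalog as shown here. The _37 suffix is the catalog version used in the listing and is not necessarily the latest one.

```xml
<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
```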

The use of stand-alone web resources is preferable because all of the QCode mappings are shared across many NewsML-G2 Items; a local <catalog> can only be used by the single Item.

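It’s likely that provider-specific catalogs will also be needed to resolve QCodes used in the Item. In Listing 1 this is the catalog at http://www.example.com/newsml-g2/catalog.enews_2.xml, which resolves the ex- aliases.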

Adding the Catalog information to the example results in the following:
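In Listing 1 the two catalog references appear as the first children of the root element, with the IPTC catalog followed by the provider's own. Root element attributes are reduced to the namespace here for brevity.

```xml
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <!-- IPTC-managed schemes such as ninat, stat and medtop -->
  <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
  <!-- Provider-specific schemes such as ex-svc and ex-geo -->
  <catalogRef href="http://www.example.com/newsml-g2/catalog.enews_2.xml" />
</newsItem>
```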

See also Use of URIs in place of QCodes.

4.4. Rights information wrapper <rightsInfo>

The optional <rightsInfo> wrapper holds copyright information and usage terms, such as the following example:
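The rights block from Listing 1 combines a copyright holder identified by URI with a human-readable notice and usage terms:

```xml
<rightsInfo>
  <copyrightHolder uri="http://www.example.com/about.html#copyright">
    <name>Example Enews LLP</name>
  </copyrightHolder>
  <copyrightNotice>Copyright 2021-22 Example Enews LLP, all rights reserved</copyrightNotice>
  <usageTerms>Not for use outside the United States</usageTerms>
</rightsInfo>
```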

5. Item Metadata <itemMeta>

5.1. Mandatory Properties

The mandatory <itemMeta> section has four mandatory elements, present in the following order:

Item Class

As previously mentioned, the <itemClass> property describes the type of content conveyed by the Item. It is mandatory to use one of the IPTC News Item Nature NewsCodes (recommended Scheme Alias: "ninat") for the Item Class of News Items and Package Items, expressed as a QCode:
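For the text story in Listing 1 the Item Class is:

```xml
<itemClass qcode="ninat:text" />
```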

Other possible values from this scheme include (not limited to) "ninat:picture", "ninat:video" and "ninat:audio".

Provider

Can be represented by a QCode, or a URI. If the value of this property is NOT taken from a controlled vocabulary, the @qcode or @uri will be omitted and the child <name> element used to give a human-readable value for the property. The IPTC recommends using a QCode with the News Provider NewsCodes, a controlled vocabulary of providers registered with the IPTC with recommended Scheme Alias of "nprov":
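Listing 1 uses a QCode from the "nprov" scheme. The second form is a sketch of a provider that is not in any controlled vocabulary and is identified by name only; the name is taken from the listing's copyright holder purely for illustration.

```xml
<!-- Provider taken from the IPTC News Provider NewsCodes, as in Listing 1 -->
<provider qcode="nprov:REUTERS" />

<!-- Illustrative alternative: no controlled value, name only -->
<provider>
  <name>Example Enews LLP</name>
</provider>
```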

IPTC adds News Providers to the vocabulary free of charge. Contact the IPTC office to request adding a new news provider.

Version Created

This contains the date, time and time zone (or UTC) that this version of the NewsML-G2 Item was created. The value must be expressed as XML Schema datetime: YYYY-MM-DDThh:mm:ss±hh:mm
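The value used in Listing 1 follows this pattern:

```xml
<versionCreated>2021-10-21T16:25:32-05:00</versionCreated>
```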

In the Listing 1 value, the -05:00 suffix denotes the U.S. Eastern Standard Time offset from UTC.

Publication Status

Every NewsML-G2 Item must have a publication status. The value defaults to "usable", which permits the <pubStatus> property to be omitted; however, it is recommended that a value is explicitly included:
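The explicit form used in Listing 1 is:

```xml
<pubStatus qcode="stat:usable" />
```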

Publication status is highly likely to be used by most news agencies, because the ability to explicitly signal the status of news is essential. The use of the IPTC Publishing Status NewsCodes is mandatory. Its recommended alias is "stat". Other values permitted by the scheme are:

  • stat:canceled (note U.S. spelling) means that the content of the newsItem must not be used, ever.

  • stat:withheld means the content must not be used until further notice.

When an Item is cancelled, it must never be used, but an Item that has been withheld may subsequently have its status updated to "usable".

5.2. Use of URIs in place of QCodes

The original NewsML-G2 developers wanted to use URIs as the preferred way to identify concepts, as this would enable the resulting concept identifiers to be globally unique and optionally to be a reference to a web resource. The constraints on network bandwidth at the time led the developers to propose QCodes to represent URIs, because:

  • they are lightweight and economise file sizes,

  • although they are designed to reference web resources, the delivery of data in response to requesting QCodes that are resolved to full URIs is optional; the codes may be used "as is" without de-referencing to the full URI.

Later, some providers asked for the flexibility to use full URIs in property values, so from NewsML-G2 v2.11 @uri may be used in place of, or in parallel with, @qcode (where both are used, @qcode takes precedence). The <pubStatus> assertion:

could be expressed using a URI as:
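The two equivalent forms are sketched here. The URI assumes that the "stat" alias resolves to the Scheme URI http://cv.iptc.org/newscodes/pubstatusg2/ in the IPTC catalog; check the catalog file to confirm the mapping before relying on it.

```xml
<!-- QCode form, as in Listing 1 -->
<pubStatus qcode="stat:usable" />

<!-- URI form: Scheme URI (assumed as noted above) followed by the code -->
<pubStatus uri="http://cv.iptc.org/newscodes/pubstatusg2/usable" />
```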

Further, in NewsML-G2 v2.18 onwards, other properties with QCode Type values had "URI siblings" added, for example properties with @role may now have a value expressed as @roleuri. In NewsML-G2 v2.20, properties with a mandatory @qcode were changed so that @uri may be used instead.

 

The elements <catalog> and <catalogRef> are now OPTIONAL so that implementers who wish to use URI identifiers exclusively do not have to reference schemes that are unused.

5.3. Optional Properties

The following optional properties are frequently used by NewsML-G2 providers.

First Created

The <firstCreated> element indicates when the first version of the Item (not the content) was created.
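In Listing 1 the Item was first created in 2016, several years before the current version:

```xml
<firstCreated>2016-10-18T13:12:21-05:00</firstCreated>
```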

Embargoed

Business-to-business news organisations often use an embargo to release information in advance, on the strict understanding that it may not be released into the public domain until after the embargo time has expired, or until some other form of permission has been given.

Embargoed is NOT the same as the Publishing Status; embargoed content should have a publishing status of "usable".

It is not required to give any further information about the embargo conditions, but some providers may provide a natural language <edNote>, see below.
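The embargo in Listing 1 is a single timestamp, here expressed in UTC, alongside a publishing status that remains "usable":

```xml
<embargoed>2021-10-23T12:00:00Z</embargoed>
<pubStatus qcode="stat:usable" />
```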

Service

The <service> element allows the provider to declare which of its services delivered this package, using a Controlled Vocabulary:
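Listing 1 uses a provider-specific scheme (alias "ex-svc") with a human-readable name:

```xml
<service qcode="ex-svc:uknews">
  <name>UK News Service</name>
</service>
```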

Editorial Note

The <edNote> element contains a note that is intended to be read by internal staff at the receiving organisation, but not published to the end-user; in this example it conveys some optional information about the release condition imposed by the <embargoed> element:
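The note from Listing 1 reads:

```xml
<edNote>Note to editors: STRICTLY EMBARGOED. Not for public release until 12noon on Friday, October 23, 2021.</edNote>
```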

Signal

Additional processing instructions can be given using <signal> and its @qcode. This example uses the IPTC Signal NewsCodes (recommended Scheme Alias "sig") to advise the end-user that this Item updates a previous version of the Item:
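The signal in Listing 1 is:

```xml
<signal qcode="sig:update" />
```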

The other value in the Signal scheme is "correction". There is further NewsML-G2 functionality for expressing fine-grained information about the reason and impact of updates and also for applying signals to different parts of an Item’s content.

Link

The <link> element has two basic purposes:

  • To assert relationships to other Items, such as a previous version of an Item

  • To create a navigable link from an Item to some supporting or additional resource.

This example provides a "see also" link to a resource on the Web that end-users can view to get further information about the event. @rel is used to denote the reason that the link is provided. In this example, the QCode uses the recommended IPTC Item Relation NewsCodes with a recommended Scheme Alias of "irel" and the code value is "seeAlso":
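The link from Listing 1 combines @rel with an @href pointing at the supporting resource:

```xml
<link rel="irel:seeAlso"
      href="http://www.example.com/video/20081222-PNN-1517-407624/index.html" />
```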

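Completed Item Metadata

The completed <itemMeta> wrapper, with the mandatory and optional properties described above in place, appears in Listing 1.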

6. Content Metadata <contentMeta>

Conceptually, there are two kinds of content metadata: Administrative and Descriptive.

6.1. Administrative Metadata

This is information about the content that cannot necessarily be deduced by examining it, for example: when it was created and/or modified. Administrative properties that are widely used by implementers of NewsML-G2 for editorial content are:

  • Timestamps (Created, Modified)

  • Story location (Located)

  • Creator

  • Information Source

Timestamps

The <contentCreated> timestamp corresponds to a "Created on" field of the story. It is expressed in NewsML-G2 as a Truncated DateTime data type, meaning that the date-time components may optionally be omitted, starting from the right. If required, the <contentModified> property may also be used to contain the "Last Edit" timestamp. This must be later than the Created timestamp.
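The first two lines below are the timestamps from Listing 1. The third is an illustrative truncated value that is not in the listing and that keeps only the date.

```xml
<contentCreated>2016-10-18T11:12:00-05:00</contentCreated>
<contentModified>2021-10-21T16:22:45-05:00</contentModified>

<!-- Illustrative truncated form: time and offset dropped from the right -->
<contentCreated>2016-10-18</contentCreated>
```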

Located

The place that the content was created uses the <located> element. Note that this is not necessarily the place of the event or subject. For example, for a UK story written in the London office, <located> would be "London"; a picture of Mount Fuji taken from downtown Tokyo would have a <located> value of "Tokyo".

The semantics of <located> are similar to the natural-language location carried in the dateline that often prefaces news (such as "BERLIN, October 24") but can be conveyed more precisely, and in terms that may be more readily processed by software, using a @qcode or @uri:
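Reduced to its identifier and name, the location in Listing 1 is shown here; "ex-geo" and "ex-cptype" are provider-specific example schemes.

```xml
<located type="ex-cptype:city" qcode="ex-geo:345678">
  <name>Berlin</name>
</located>
```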

The optional @type uses a controlled vocabulary to indicate the nature of the location being expressed; in the example this indicates that <located> refers to a city.

Broader, Narrower

Both Located and Subject (below) contain child elements that express a specific relationship between entities or concepts. For example, the content originated in the city of Berlin and the <located> element shows that the city of Berlin has a "broader" relationship – that is a child-to-parent relationship – to Berlin the state, and to Germany the country:
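The full <located> property from Listing 1 expresses both parent relationships as <broader> children:

```xml
<located type="ex-cptype:city" qcode="ex-geo:345678">
  <name>Berlin</name>
  <!-- Berlin the city is part of Berlin the state -->
  <broader type="ex-cptype:statprov" qcode="ex-prov:2365">
    <name>Berlin</name>
  </broader>
  <!-- and part of Germany the country -->
  <broader type="ex-cptype:country" qcode="iso3166-1a2:DE">
    <name>Germany</name>
  </broader>
</located>
```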

Creator

The writer, photographer or other party (person or organisation) who created the content is expressed using the <creator> element:
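Listing 1 identifies the creator by URI and adds a human-readable name:

```xml
<creator uri="http://www.example.com/staff/mjameson">
  <name>Meredith Jameson</name>
</creator>
```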

Information Source

The <infoSource> element, together with its optional @role, enables finely-grained identification of the various parties who provided information used to create and develop an item of news. If absent, the default value of @role is the originator of the information used to create or enhance the content.

6.2. Descriptive Metadata

These properties set the context of news content in relation to other news by describing and classifying it. Information that has historically been carried within the content itself, such as the headline and by-line (for text) or embedded metadata (for pictures) may also be specified as metadata. The practical benefit is that the end user no longer needs to scan or retrieve the actual content in order to process it. None of the Descriptive Metadata elements are mandatory, but the following feature frequently in NewsML-G2 implementations.

Subject

The subject matter of content uses the <subject> element. When the value of the Subject is taken from a Controlled Vocabulary, this is identified using either a @qcode or @uri:
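The first subject in Listing 1 is identified by a QCode from the Media Topic scheme:

```xml
<subject type="cpnat:abstract" qcode="medtop:04000000">
  <name xml:lang="en-GB">economy, business and finance</name>
</subject>
```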

For concepts not taken from a CV, the identifier is omitted and the name of the concept is given in the child <name> element, for example:
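A minimal sketch of such a name-only subject is shown here. The name value is illustrative and does not appear in this form in Listing 1.

```xml
<subject>
  <name>labour market</name>
</subject>
```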

The optional @type uses the IPTC "nature of the concept" NewsCodes (recommended scheme alias "cpnat") to indicate the type of concept being expressed, for example, an abstract concept, that is a concept that does not represent a real-world entity, but something like an idea, or news category.

The above example uses a concept from the IPTC Media Topic NewsCodes. Also note the use of the W3C XML attribute xml:lang that expresses the language used for the element’s value. It is also possible to add relationships to related concepts, as shown above in <located>. For example:
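The second subject in Listing 1 shows both features: names in two languages and a <broader> relationship to its parent Media Topic.

```xml
<subject type="cpnat:abstract" qcode="medtop:20000523">
  <name xml:lang="en-GB">labour market</name>
  <name xml:lang="de">Arbeitsmarkt</name>
  <!-- parent topic: economy, business and finance -->
  <broader qcode="medtop:04000000" />
</subject>
```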

The IPTC highly recommends that providers use the Media Topic NewsCodes unless there is an over-riding requirement to use proprietary codes. This promotes inter-operability and standardisation in news exchange. It is also recommended that if a <name> is used in conjunction with a QCode, then the value of the <name> agrees with the value in the Scheme for the language specified in xml:lang. IPTC NewsCodes are provided in UK English (en-GB) and translations in French and German have been provided by IPTC members.

The "medtop" code prefix is the recommended Scheme Alias for the MediaTopic NewsCodes, which is resolved via the IPTC Catalog (see Catalog wrapper <catalogRef> above) to the Scheme URI http://cv.iptc.org/newscodes/mediatopic/. This is guaranteed to be globally unique because it is part of the Internet Domain controlled by the IPTC. Appending the code "04000000" to the Scheme URI forms the Concept URI http://cv.iptc.org/newscodes/mediatopic/04000000, which cannot be confused with a concept with the same code 04000000 from another source.
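The alias-to-URI mapping can be pictured as a catalog entry. This is a minimal sketch of a local <catalog> holding one <scheme> with @alias and @uri; in practice the IPTC publishes this mapping in its remote catalog file rather than inline in each Item.

```xml
<catalog>
  <!-- "medtop:04000000" resolves to
       http://cv.iptc.org/newscodes/mediatopic/04000000 -->
  <scheme alias="medtop" uri="http://cv.iptc.org/newscodes/mediatopic/" />
</catalog>
```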

When using @type to indicate the nature of the concept, the possible values from the IPTC Concept Nature NewsCodes are:

  • Abstract: a concept that does not represent a real-world entity

  • Event

  • geoArea: a geo-political area

  • Object: a real-world object, such as a painting or an aircraft

  • Organisation

  • Person

  • Point of Interest

Genre

The <genre> element indicates the style of the content, as distinct from <subject>, which indicates its subject matter. In the example, the IPTC Genre NewsCodes value "interview" is used:
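The genre in Listing 1 is:

```xml
<genre qcode="genre:interview">
  <name xml:lang="en-GB">Interview</name>
</genre>
```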

Slugline

Some news services implemented in NewsML-G2 retain the <slugline> property as a human-readable index for legacy reasons; therefore receivers may sometimes see this property. However, it has never been a completely reliable identifier, and it is recommended that more purposeful identifiers that are also machine-readable are implemented in its place.

Headline

Even if the Headline is carried inline in text content, it is useful also to place it explicitly in metadata so that it can more easily be identified and extracted by the end-user:
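In Listing 1 the slugline and headline sit side by side at the end of <contentMeta>:

```xml
<slugline>US-Finance-Fed</slugline>
<headline>Fed to halt QE to avert "bubble"</headline>
```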

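Completed Content Metadata

The completed <contentMeta> wrapper, with the administrative and descriptive properties described above in place, appears in Listing 1.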

7. Content <contentSet>

The content of a NewsML-G2 document varies according to its Item type: the example below shows a News Item with a <contentSet> wrapper containing a trivial text payload. Following this code example are skeletal examples showing the other options for conveying content in NewsML-G2 News Items and Package Items:
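The content wrapper from Listing 1 carries an inline XML payload, with the story markup itself left out:

```xml
<contentSet>
  <inlineXML contenttype="application/nitf+xml">
    <!-- Inline XML must contain well-formed XML such as NITF or XHTML -->
  </inlineXML>
</contentSet>
```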

7.1. News Content options for News Items and Package Items

The News Item <contentSet> contains a single logical piece of content, but allows alternative renditions of the SAME content to be carried in a single NewsML-G2 Item. The content is conveyed using <inlineXML>, <inlineData> or <remoteContent> child elements.
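The three child elements available inside <contentSet> can be sketched as follows. Only the first form appears in Listing 1; the text payload, the @href and the @contenttype values in the other two are hypothetical placeholders.

```xml
<!-- Option 1: XML content carried inline (as in Listing 1) -->
<contentSet>
  <inlineXML contenttype="application/nitf+xml">
    <!-- well-formed XML such as NITF or XHTML -->
  </inlineXML>
</contentSet>

<!-- Option 2: plain text or encoded data carried inline (illustrative) -->
<contentSet>
  <inlineData contenttype="text/plain">Text of the story goes here</inlineData>
</contentSet>

<!-- Option 3: a reference to content held elsewhere (hypothetical URL) -->
<contentSet>
  <remoteContent href="http://www.example.com/pictures/12345.jpg"
                 contenttype="image/jpeg" />
</contentSet>
```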

Package Item content is wrapped by the <groupSet> element, which can contain one or more <group> children. Each Group contains references to Items that make up the News Package, using the <itemRef> element. A Group can also reference other Groups via the <groupRef> element.
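A skeletal sketch of this structure follows. The group identifiers, the @role values and the second GUID are hypothetical; the Quick Start Guide for Packages has complete examples.

```xml
<groupSet root="G1">
  <group id="G1" role="group:main">
    <!-- reference to a member Item by its guid -->
    <itemRef residref="urn:newsml:acmenews.com:20161018:US-FINANCE-FED" />
    <!-- reference to another Group in the same Package -->
    <groupRef idref="G2" />
  </group>
  <group id="G2" role="group:sidebar">
    <itemRef residref="urn:newsml:acmenews.com:20161019:US-FINANCE-FED-SIDEBAR" />
  </group>
</groupSet>
```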

8. Summary and Next Steps

This section has covered the basic structure that is common to all NewsML-G2 Items, and also outlined properties that are commonly used for news content. Further Quick Start Guides show how to build upon this foundation:

Quick Start – Text takes an example news story and shows how the information on an editor’s screen would be implemented in NewsML-G2.

Quick Start – Pictures takes an example image and its embedded metadata and converts this to NewsML-G2 properties, with several image renditions carried in a single NewsML-G2 Item. The guide also shows how to express the various technical characteristics of images using NewsML-G2 properties.

Quick Start – Video is split into two sections: the first covers a simple case of a standalone video file with various technical renditions expressed in NewsML-G2; the second uses a more comprehensive structure that separates the metadata for multiple segments of a video, using the <partMeta> wrapper.

Quick Start – Packages shows how NewsML-G2 Items and other kinds of content can be assembled into Packages of managed objects with an explicit structure.

9. Hidden Default Values of NewsML-G2

There are some default values set by the specification which allow an element or attribute to be omitted and the default assumed. The list below shows NewsML-G2 elements and attributes which optionally appear in an Item but for which a usable value or status exists.

9.1. All NewsML-G2 items

  • @version of the root element = "1"

  • @conformance of the root element = "core"

  • <embargoed> = no embargo

  • <pubStatus> = "usable"

  • <catalog> and <catalogRef> = at least one is required; many of either, or a mix of both may be used

  • @scope of <hash> = "content". Hash value is a message digest included in the Item for security purposes. A hash scope of "content" indicates that the hash value was derived by hashing some/all of the content only.

  • @why (attribute of many elements) = "direct". The attribute value indicates that the value is directly related to the content.

  • @how (attribute of many elements) = "person". The attribute indicates how the value was extracted from the content: by a person. (See Why and How metadata has been added: @why and @how for essential guidance on the use of @why and @how.)

  • @custom (attribute of many elements) = "false". The attribute indicates whether the property was added specifically for a customer or group of customers; the default "false" means that it was not.

  • @dir (many elements) = "ltr". The directionality of the script of the language of the property is left to right.

9.2. News Items only

  • @timeunit (when @duration is used) = "seconds"

  • @dimensionunit (when @width/@height is used) = (for example) pixels for a still picture. See the Quick Start Guide – Pictures for details.

9.3. Package Item only

  • @mode of a <group> = "bag" – an unordered collection of complementary components