Text - NewsML-G2 Quick Start Guide

1. Introduction

One of the most fundamental needs of a news organisation is to handle text. This chapter covers the basics of a simple NewsML-G2 News Item containing text content.

We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide to Text.

2. Example

Below is an example story and supporting information as might be displayed on the journalist’s editing screen at a fictional news provider, Acme News and Media (ANM):

Acme News and Media - Content Editing System

Slugline: US-Finance-Fed
Created on: 2016-11-21 15:21:06
Source: ANM
Author: mjameson
Latest edit: 2021-11-21 16:22:45
Latest editor: moiras
Categories: economy, finance, business, central bank, monetary policy
Headline: Fed to halt QE to avert "bubble"
Byline: By Meredith Jameson
(Location) Date: (Washington) 21/11/2021
Body Text: Et, sent luptat luptat, commy nim zzriureet vendreetue modo dolenis ex euisis nosto et lan ullandit lum doloreet vulla feugiam coreet, cons eleniam il ute facin veril et aliquis ad minis et lor sum del iriure dit la feugiamcommy nostrud min ullapat velisl duisismodip ero dipit nit utpatum sandrer cipisim nit lortis augiat nulla faccum at am, quam velenis nulput la auguerostrud magna commolore eliquatie exerate facilis modiamconsed dion henisse quipit at.

This screen contains nearly all of the information needed to create the NewsML-G2 document below.

LISTING: NewsML-G2 Text Document

(All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: ex-geoloc, ex-is)

<?xml version="1.0" encoding="UTF-8" ?>
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ ./NewsML-G2_2.24-spec-All-Core.xsd"
    guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.30"
    conformance="power"
    xml:lang="en-US">
  <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
  <catalogRef href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
  <rightsInfo>
    <copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
      <name>Acme News and Media LLC</name>
    </copyrightHolder>
    <copyrightNotice>Copyright 2021-22 Acme News and Media LLC</copyrightNotice>
  </rightsInfo>
  <itemMeta>
    <itemClass qcode="ninat:text" />
    <provider uri="http://www.acmenews.com/about/" />
    <versionCreated>2021-11-21T16:25:32-05:00</versionCreated>
    <pubStatus qcode="stat:usable" />
  </itemMeta>
  <contentMeta>
    <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
    <contentModified>2021-11-21T16:22:45-05:00</contentModified>
    <located qcode="ex-geoloc:NYC">
      <name>New York, NY</name>
    </located>
    <creator uri="http://www.acmenews.com/staff/mjameson">
      <name>Meredith Jameson</name>
    </creator>
    <infoSource qcode="ex-is:AP">
      <name>Associated Press</name>
    </infoSource>
    <language tag="en-US" />
    <subject qcode="medtop:04000000">
      <name>economy, business and finance</name>
    </subject>
    <subject qcode="medtop:20000350">
      <name>central bank</name>
    </subject>
    <subject qcode="medtop:20000379">
      <name>money and monetary policy</name>
    </subject>
    <slugline>US-Finance-Fed</slugline>
    <headline>Fed to halt QE to avert "bubble"</headline>
  </contentMeta>
  <contentSet>
    <inlineXML contenttype="application/nitf+xml">
      <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
        <body>
          <body.head>
            <hedline>
              <hl1>Fed to halt QE to avert "bubble"</hl1>
            </hedline>
            <byline>By Meredith Jameson, <byttl>Staff Reporter</byttl></byline>
          </body.head>
          <body.content>
            <p>(New York, NY - October 21) Et, sent luptat luptat, commy Nim zzriureet vendreetue modo dolenis ex euisis nosto et lan ullandit lum doloreet vulla feugiam coreet, cons eleniam il ute facin veril et aliquis ad minis et lor sum del iriure dit la feugiamcommy nostrud min ulla autpat velisl duisismodip ero dipit nit utpatum sandrer cipisim nit lortis augiat nulla faccum at am, quam velenis nulput la auguerostrud magna commolore eliquatie exerate facilis modiamconsed dion henisse quipit at. Ut la feu facilla feu faccumsan ecte modoloreet ad ex el utat.</p>
            <p>Ugiating ea feugait utat, venim velent nim quis nulluptat num Volorem inci enim dolobor eetuer sendre ercin utpatio dolorpercing Et accum nullan voluptat wisis alit dolessim zzrilla commy nonulpu tpatinis exer sequatueros adit verit am nonse exerili quismodion esto cons dolutpat, si.</p>
          </body.content>
        </body>
      </nitf>
    </inlineXML>
  </contentSet>
</newsItem>

3. Document structure

The building blocks of the text document shown above are the <newsItem> root element and three wrapping elements: metadata about the News Item (<itemMeta>), metadata about the content (<contentMeta>) and the content itself (<contentSet>). The attributes of the top-level (root) element <newsItem> are:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
    version="14"
    standard="NewsML-G2"
    standardversion="2.30"
    conformance="power"
    xml:lang="en-US">

This is followed by references to the Catalogs used to resolve QCodes in the Item, and Rights information:

<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
<catalogRef href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
  <copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
    <name>Acme News and Media LLC</name>
  </copyrightHolder>
  <copyrightNotice>Copyright 2021-22 Acme News and Media LLC</copyrightNotice>
</rightsInfo>

3.1. Item Metadata <itemMeta>

Note the three mandatory child elements of the mandatory <itemMeta>:

  • Item Class

  • Provider

  • Version Created

A publication status is also mandatory, but the <pubStatus> element may be omitted, in which case the publication status defaults to "usable". However, we recommend giving the publication status explicitly, as in this example.

As Acme News and Media is fictional, the Provider property cannot use one of the IPTC Provider NewsCodes and is instead expressed by a URI.
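For reference, the <itemMeta> block of the example is reproduced here from the full listing, re-indented for readability. It shows the three mandatory child elements, the explicit publication status, and the provider given as a URI:

```xml
<itemMeta>
  <!-- Item Class: the News Item carries text content -->
  <itemClass qcode="ninat:text" />
  <!-- Provider: a URI, because the fictional ANM has no IPTC Provider NewsCode -->
  <provider uri="http://www.acmenews.com/about/" />
  <!-- Version Created: timestamp of this version of the Item -->
  <versionCreated>2021-11-21T16:25:32-05:00</versionCreated>
  <!-- Publication status: optional element, stated explicitly as recommended -->
  <pubStatus qcode="stat:usable" />
</itemMeta>
```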

3.2. Content Metadata <contentMeta>

3.2.1. Administrative Metadata

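The administrative properties of the example text story are the creation and modification timestamps, the place of creation, the creator, the information source and the language.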
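The timestamps, as they appear in the full listing, correspond to the "Created on" and "Latest edit" fields of the editing screen, with a UTC offset added:

```xml
<!-- "Created on" field of the editing screen -->
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<!-- "Latest edit" field of the editing screen -->
<contentModified>2021-11-21T16:22:45-05:00</contentModified>
```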

The place where the content was created is expressed using the <located> element. <located> denotes where the story was written, not where the events it describes took place; the latter would be expressed using <subject>, part of the Descriptive Metadata described below.
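In the full listing, the location is given as follows. The ex-geoloc alias is one of the two non-IPTC scheme aliases noted above the listing, so it would be resolved through the provider's own catalog:

```xml
<!-- Where the story was written, not where its subject took place -->
<located qcode="ex-geoloc:NYC">
  <name>New York, NY</name>
</located>
```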

The author of the article is expressed using the <creator> element.
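The corresponding fragment of the full listing identifies the author by a URI and carries a human-readable name:

```xml
<!-- Author: identified by URI, with a display name -->
<creator uri="http://www.acmenews.com/staff/mjameson">
  <name>Meredith Jameson</name>
</creator>
```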

The Information Source for the article is also given. When used without a @role, <infoSource> denotes the person or party that provided the original information on which the content is based. This is the relationship expressed in the example.
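In the full listing the information source appears without a @role, using the provider-specific ex-is scheme alias:

```xml
<!-- No @role: the party that supplied the original information -->
<infoSource qcode="ex-is:AP">
  <name>Associated Press</name>
</infoSource>
```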

The default language for the content is given as U.S. English.
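The full listing expresses this with a language tag:

```xml
<!-- Language of the content: U.S. English -->
<language tag="en-US" />
```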

3.2.2. Descriptive Metadata

In the example, the Subject properties use QCodes from the Media Topic NewsCodes, a Controlled Vocabulary owned and maintained by the IPTC.
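The three subjects in the full listing all use the medtop scheme alias, each with a human-readable name:

```xml
<!-- Each subject is a Media Topic QCode plus a display name -->
<subject qcode="medtop:04000000">
  <name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
  <name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
  <name>money and monetary policy</name>
</subject>
```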

The <slugline> property contains the value of the "Slugline" field of the story.
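As it appears in the full listing:

```xml
<!-- Value of the "Slugline" field on the editing screen -->
<slugline>US-Finance-Fed</slugline>
```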

In a similar fashion, the <headline> property contains the value of the "Headline" field.
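As it appears in the full listing. Because the value is element content rather than an attribute value, the double quotes around "bubble" need no escaping:

```xml
<!-- Value of the "Headline" field on the editing screen -->
<headline>Fed to halt QE to avert "bubble"</headline>
```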

3.2.3. Complete Content Metadata
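
Taken together, the administrative and descriptive properties form the <contentMeta> block of the example. It is reproduced here from the full listing, re-indented for readability:

```xml
<contentMeta>
  <!-- Administrative metadata -->
  <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
  <contentModified>2021-11-21T16:22:45-05:00</contentModified>
  <located qcode="ex-geoloc:NYC">
    <name>New York, NY</name>
  </located>
  <creator uri="http://www.acmenews.com/staff/mjameson">
    <name>Meredith Jameson</name>
  </creator>
  <infoSource qcode="ex-is:AP">
    <name>Associated Press</name>
  </infoSource>
  <language tag="en-US" />
  <!-- Descriptive metadata -->
  <subject qcode="medtop:04000000">
    <name>economy, business and finance</name>
  </subject>
  <subject qcode="medtop:20000350">
    <name>central bank</name>
  </subject>
  <subject qcode="medtop:20000379">
    <name>money and monetary policy</name>
  </subject>
  <slugline>US-Finance-Fed</slugline>
  <headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
```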

4. Text content choices

4.2. Inline XML

The content of the NewsML-G2 document is enclosed by the <contentSet> wrapper. In the example, IPTC's news text mark-up language NITF (News Industry Text Format) is used to format the text content. As an XML standard, it is contained in an <inlineXML> child element of <contentSet>, and uses @contenttype to denote the XML-based standard, using the IANA Media Type.
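The nesting described above can be seen in this skeleton, which is cut down from the full listing. The story paragraphs are replaced by a comment for brevity:

```xml
<contentSet>
  <!-- @contenttype carries the IANA Media Type of the embedded XML -->
  <inlineXML contenttype="application/nitf+xml">
    <!-- The NITF document declares its own namespace -->
    <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
      <body>
        <body.head>
          <hedline>
            <hl1>Fed to halt QE to avert "bubble"</hl1>
          </hedline>
          <byline>By Meredith Jameson, <byttl>Staff Reporter</byttl></byline>
        </body.head>
        <body.content>
          <!-- story paragraphs (p elements) go here -->
        </body.content>
      </body>
    </nitf>
  </inlineXML>
</contentSet>
```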

XHTML is also a popular text mark-up choice among NewsML-G2 providers. The contents of <inlineXML> may be any XML language that can express generic or specialised news information, including SportsML-G2. Other languages such as XBRL (eXtensible Business Reporting Language) may also be used. The content inside <inlineXML> must be valid XML; in other words, it must be able to stand alone as a valid XML document in its own namespace.
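As an illustration, the same story could be carried as XHTML. This is a minimal sketch rather than part of the example listing: the page structure and placeholder text are assumptions, and the content type is assumed to be the IANA Media Type application/xhtml+xml:

```xml
<contentSet>
  <!-- Sketch only: XHTML in place of NITF -->
  <inlineXML contenttype="application/xhtml+xml">
    <!-- The XHTML document stands alone in its own namespace -->
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Fed to halt QE to avert "bubble"</title>
      </head>
      <body>
        <h1>Fed to halt QE to avert "bubble"</h1>
        <p>By Meredith Jameson</p>
        <p>Story text goes here.</p>
      </body>
    </html>
  </inlineXML>
</contentSet>
```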

4.3. Inline data

The <inlineData> child element of <contentSet> holds data encoded as a string in the same encoding as the full XML document, for example utf-8. Data not covered by this encoding, such as binary data, must use a special encoding resulting in a text string. In the case of binary data (images, graphics, video, audio etc) we recommend that the encoding attribute is used to express the encoding used, and the media type of the data expressed by the contenttype attribute, for example “image/jpeg”, “video/quicktime”. We suggest that base64 encoding is used as it can be easily decoded on the receiver side. The encoding is expressed using a QCode or IRI, and it is recommended to use the IPTC Encoding NewsCodes (recommended Scheme Alias "encd").

Any characters that are not within the definition of xs:string, such as syntax characters used in HTML, must be escaped or placed within CDATA. For text data of this kind, we recommend that the contenttype attribute is used (for example “text/plain”, “text/markdown”, “text/html”), but NOT the encoding attribute. See below for examples.

When encoding binary assets, we recommend that only relatively small objects are conveyed using <inlineData>. Normally the <remoteContent> wrapper should be used to convey binary assets.
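A larger binary asset would typically be referenced rather than embedded. The following is a minimal sketch of that alternative. The URL and the byte size are hypothetical, and the attributes shown (href, contenttype, size) are only a small subset of what <remoteContent> can carry:

```xml
<contentSet>
  <!-- Hypothetical URL and size, for illustration only -->
  <remoteContent href="http://www.acmenews.com/media/example-photo.jpg"
                 contenttype="image/jpeg"
                 size="346071" />
</contentSet>
```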

Simplest example
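
A minimal sketch of plain text carried inline, following the recommendation above to state the content type and omit the encoding attribute. The text itself is a placeholder borrowed from the example headline:

```xml
<contentSet>
  <!-- Plain text: contenttype given, no encoding attribute -->
  <inlineData contenttype="text/plain">Fed to halt QE to avert "bubble"</inlineData>
</contentSet>
```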

Markdown
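
A sketch of Markdown carried inline. The Markdown text is illustrative only. It is wrapped in CDATA so that any characters with special meaning in XML, such as "<" or "&", would not need escaping:

```xml
<contentSet>
  <!-- Markdown text: contenttype given, no encoding attribute -->
  <inlineData contenttype="text/markdown"><![CDATA[
# Fed to halt QE to avert "bubble"

By **Meredith Jameson**

Story text goes here.
]]></inlineData>
</contentSet>
```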

Embedded GIF
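
A sketch of a small binary image carried inline. It assumes that the IPTC Encoding NewsCodes, under the recommended "encd" alias, include a code named base64. The payload is a short illustrative base64 string intended to represent a 1x1 pixel GIF; a real image would be considerably longer:

```xml
<contentSet>
  <!-- Binary data: both contenttype and encoding are given -->
  <inlineData contenttype="image/gif" encoding="encd:base64">R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</inlineData>
</contentSet>
```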

HTML5 embedded widget
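
A sketch of an HTML5 fragment carried inline. The widget URL and dimensions are hypothetical. Because the fragment contains mark-up characters, it is placed within CDATA, and only the content type is given:

```xml
<contentSet>
  <!-- HTML fragment: wrapped in CDATA, no encoding attribute -->
  <inlineData contenttype="text/html"><![CDATA[<iframe src="https://www.example.com/widget" width="400" height="300"></iframe>]]></inlineData>
</contentSet>
```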