Text - NewsML-G2 Quick Start Guide
1. Introduction
One of the most fundamental needs of a news organisation is to handle text. This chapter covers the basics of a simple NewsML-G2 News Item containing text content.
We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide to Text.
2. Example
Below is an example story and supporting information as might be displayed on the journalist’s editing screen at a fictional news provider, Acme News and Media (ANM):
This screen contains nearly all of the information needed to create the NewsML-G2 document below.
LISTING: NewsML-G2 Text Document
(All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: ex-geoloc, ex-is)
<?xml version="1.0" encoding="UTF-8" ?>
<newsItem
xmlns="http://iptc.org/std/nar/2006-10-01/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
./NewsML-G2_2.30-spec-All-Power.xsd"
guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
version="14"
standard="NewsML-G2"
standardversion="2.30"
conformance="power"
xml:lang="en-US">
<catalogRef
href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml" />
<catalogRef
href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
<copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
<name>Acme News and Media LLC</name>
</copyrightHolder>
<copyrightNotice>Copyright 2021-22 Acme News and Media LLC</copyrightNotice>
</rightsInfo>
<itemMeta>
<itemClass qcode="ninat:text" />
<provider uri="http://www.acmenews.com/about/" />
<versionCreated>2021-11-21T16:25:32-05:00</versionCreated>
<pubStatus qcode="stat:usable" />
</itemMeta>
<contentMeta>
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<contentModified>2021-11-21T16:22:45-05:00</contentModified>
<located qcode="ex-geoloc:NYC">
<name>New York, NY</name>
</located>
<creator uri="http://www.acmenews.com/staff/mjameson">
<name>Meredith Jameson</name>
</creator>
<infoSource qcode="ex-is:AP">
<name>Associated Press</name>
</infoSource>
<language tag="en-US" />
<subject qcode="medtop:04000000">
<name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
<name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
<name>money and monetary policy</name>
</subject>
<slugline>US-Finance-Fed</slugline>
<headline> Fed to halt QE to avert "bubble"</headline>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/nitf+xml">
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
<body>
<body.head>
<hedline>
<hl1>Fed to halt QE to avert "bubble"</hl1>
</hedline>
<byline>By Meredith Jameson, <byttl>Staff Reporter</byttl></byline>
</body.head>
<body.content>
<p>(New York, NY - October 21) Et, sent luptat luptat, commy
Nim zzriureet vendreetue modo
dolenis ex euisis nosto et lan ullandit lum doloreet vulla
feugiam coreet, cons eleniam il ute facin veril et aliquis ad
minis et lor sum del iriure dit la feugiamcommy nostrud min ulla
autpat velisl duisismodip ero dipit nit utpatum sandrer cipisim
nit lortis augiat nulla faccum at am, quam velenis nulput la
auguerostrud magna commolore eliquatie exerate facilis
modiamconsed dion henisse quipit at. Ut la feu facilla feu
faccumsan ecte modoloreet ad ex el utat.
</p>
<p>Ugiating ea feugait utat, venim velent nim quis nulluptat num
Volorem inci enim dolobor eetuer sendre ercin utpatio dolorpercing
Et accum nullan voluptat wisis alit dolessim zzrilla commy nonulpu
tpatinis exer sequatueros adit verit am nonse exerili quismodion
esto cons dolutpat, si.
</p>
</body.content>
</body>
</nitf>
</inlineXML>
</contentSet>
</newsItem>
3. Document structure
The building blocks of the text document shown above are the <newsItem> root element, with additional wrapping elements for metadata about the News Item (<itemMeta>), metadata about the content (<contentMeta>) and the content itself (<contentSet>). The attributes of the top-level (root) element <newsItem> are:
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
version="14"
standard="NewsML-G2"
standardversion="2.30"
conformance="power"
xml:lang="en-US">
This is followed by references to the Catalogs used to resolve QCodes in the Item, and Rights information:
<catalogRef
href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_37.xml"
/>
<catalogRef
href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
<copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
<name>Acme News and Media LLC</name>
</copyrightHolder>
<copyrightNotice>Copyright 2021-22 Acme News and Media LLC</copyrightNotice>
</rightsInfo>
3.1. Item Metadata <itemMeta>
Note the three mandatory child elements of the mandatory <itemMeta>:
Item Class
Provider
Version Created
Every Item also has a publication status, but the <pubStatus> element may be omitted, in which case the publication status defaults to "usable". However, it is recommended that the publication status is given explicitly, as in this example. As Acme News and Media is fictional, the Provider property does not use one of the IPTC Provider NewsCodes, and is instead expressed by a URI:
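For reference, the <itemMeta> block as it appears in the listing above (indentation added for readability):

```xml
<itemMeta>
  <!-- Item Class: the News Item conveys text -->
  <itemClass qcode="ninat:text" />
  <!-- Provider: identified by URI rather than an IPTC Provider NewsCode -->
  <provider uri="http://www.acmenews.com/about/" />
  <!-- Version Created: timestamp of this version of the Item -->
  <versionCreated>2021-11-21T16:25:32-05:00</versionCreated>
  <!-- Publication status: stated explicitly, although "usable" is the default -->
  <pubStatus qcode="stat:usable" />
</itemMeta>
```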
3.2. Content Metadata <contentMeta>
3.2.1. Administrative Metadata
The administrative properties of the example text story are:
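Among them are the creation and modification timestamps of the content, which appear in the listing above as:

```xml
<!-- When the content was first created -->
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<!-- When the content was last modified -->
<contentModified>2021-11-21T16:22:45-05:00</contentModified>
```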
The place where the content was created is expressed using the <located> element. <located> denotes where the story was written, not the place where the events it describes took place. That would be expressed using <subject>, part of the Descriptive Metadata described below.
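In the listing above, the location where the story was written appears as follows. The ex-geoloc alias is one of the provider's own schemes, not an IPTC NewsCodes vocabulary:

```xml
<located qcode="ex-geoloc:NYC">
  <name>New York, NY</name>
</located>
```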
The author of the article is expressed using the <creator> element:
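In the listing above, the creator is identified by a URI and given a human-readable name:

```xml
<creator uri="http://www.acmenews.com/staff/mjameson">
  <name>Meredith Jameson</name>
</creator>
```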
The Information Source for the article is also given. When used without a @role, <infoSource> denotes the person or party that provided the original information on which the content is based. This is the relationship to be expressed here:
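The corresponding property in the listing above carries no @role attribute. The ex-is alias is a provider-specific scheme:

```xml
<infoSource qcode="ex-is:AP">
  <name>Associated Press</name>
</infoSource>
```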
The default language for the content is given as U.S. English:
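In the listing above, this is a single empty element whose @tag attribute holds the language tag:

```xml
<language tag="en-US" />
```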
3.2.2. Descriptive Metadata
In the example, the Subject properties take their values from the Media Topic NewsCodes, a Controlled Vocabulary owned and maintained by the IPTC, and are expressed as QCodes. Thus:
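The three Subject properties in the listing above all use the medtop scheme alias, each with a human-readable name:

```xml
<subject qcode="medtop:04000000">
  <name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
  <name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
  <name>money and monetary policy</name>
</subject>
```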
The <slugline> property contains the value of the "Slugline" field of the story:
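In the listing above, the slugline reads:

```xml
<slugline>US-Finance-Fed</slugline>
```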
In a similar fashion, the <headline> property will contain the value of the "Headline" field:
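In the listing above, the headline reads as follows (shown here without the stray leading space inside the element):

```xml
<headline>Fed to halt QE to avert "bubble"</headline>
```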
3.2.3. Complete Content Metadata
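Putting the administrative and descriptive properties together, the <contentMeta> wrapper from the listing above reads as follows (indentation and comments added for readability):

```xml
<contentMeta>
  <!-- Administrative metadata -->
  <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
  <contentModified>2021-11-21T16:22:45-05:00</contentModified>
  <located qcode="ex-geoloc:NYC">
    <name>New York, NY</name>
  </located>
  <creator uri="http://www.acmenews.com/staff/mjameson">
    <name>Meredith Jameson</name>
  </creator>
  <infoSource qcode="ex-is:AP">
    <name>Associated Press</name>
  </infoSource>
  <language tag="en-US" />
  <!-- Descriptive metadata -->
  <subject qcode="medtop:04000000">
    <name>economy, business and finance</name>
  </subject>
  <subject qcode="medtop:20000350">
    <name>central bank</name>
  </subject>
  <subject qcode="medtop:20000379">
    <name>money and monetary policy</name>
  </subject>
  <slugline>US-Finance-Fed</slugline>
  <headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
```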
4. Text content choices
4.2. Inline XML
The content of the NewsML-G2 document is enclosed by the <contentSet> wrapper. In the example, IPTC's news text mark-up language NITF (News Industry Text Format) is used to format the text content. As NITF is an XML standard, it is contained in an <inlineXML> child element of <contentSet>, which uses @contenttype to identify the XML-based standard by its IANA Media Type.
XHTML is also a popular text mark-up choice among NewsML-G2 providers. The contents of <inlineXML> may be any XML language that can express generic or specialised news information, including SportsML-G2. Other languages such as XBRL (eXtensible Business Reporting Language) may also be used. The content inside <inlineXML> must be valid XML; in other words, it must be able to stand alone as a valid XML document in its own namespace.
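As an illustration of the XHTML option, a minimal sketch of an <inlineXML> wrapper carrying an XHTML document might look like the following. This is not taken from the listing: the body text is a placeholder, and the sketch assumes the IANA Media Type application/xhtml+xml for @contenttype.

```xml
<contentSet>
  <inlineXML contenttype="application/xhtml+xml">
    <!-- The embedded document declares its own namespace, so it can stand alone -->
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Fed to halt QE to avert "bubble"</title>
      </head>
      <body>
        <h1>Fed to halt QE to avert "bubble"</h1>
        <!-- Placeholder paragraph, for illustration only -->
        <p>Story text goes here.</p>
      </body>
    </html>
  </inlineXML>
</contentSet>
```

Because the <html> element sets a new default namespace, the XHTML elements do not collide with the NewsML-G2 elements that surround them.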
4.3. Inline data
The <inlineData> child element of <contentSet> holds data encoded as a string in the same encoding as the full XML document, for example UTF-8. Data not covered by this encoding, such as binary data, must use a special encoding that results in a text string. In the case of binary data (images, graphics, video, audio etc.) we recommend that the encoding attribute is used to express the encoding used, and that the media type of the data is expressed by the contenttype attribute, for example “image/jpeg”, “video/quicktime”. We suggest that base64 encoding is used, as it can be easily decoded on the receiver side. The encoding is expressed using a QCode or IRI, and it is recommended to use the IPTC Encoding NewsCodes (recommended Scheme Alias "encd").
Any characters that are not within the definition of xs:string, such as syntax characters used in HTML, must be escaped or placed within CDATA. For text content of this kind, we recommend that the contenttype attribute is used (for example “text/plain”, “text/markdown”, “text/html”), but NOT the encoding attribute. See below for examples.
When encoding binary assets, we recommend that only relatively small objects are conveyed using <inlineData>. Normally the <remoteContent> wrapper should be used to convey binary assets.
Name | Definition |
---|---|
Simplest example | |
Markdown | |
Embedded GIF | |
HTML5 embedded widget | |
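The four cases named in the table can be sketched as follows. These fragments are illustrative assumptions rather than the guide's original examples: the text values are placeholders, the widget URL is hypothetical, the base64 payload is a 1x1 pixel transparent GIF, and the encd:base64 QCode assumes the recommended "encd" Scheme Alias resolves to the IPTC Encoding NewsCodes. Each <inlineData> element would sit inside its own <contentSet>.

```xml
<!-- Simplest example: plain text, no attributes needed -->
<inlineData>The quick brown fox jumps over the lazy dog</inlineData>

<!-- Markdown: contenttype is given, encoding is NOT used -->
<inlineData contenttype="text/markdown">
# Fed to halt QE to avert "bubble"

By *Meredith Jameson*, Staff Reporter
</inlineData>

<!-- Embedded GIF: binary data, so both contenttype and encoding are given -->
<inlineData contenttype="image/gif"
  encoding="encd:base64">R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</inlineData>

<!-- HTML5 embedded widget: HTML syntax characters are protected by CDATA -->
<!-- (widgets.example.com is a hypothetical host) -->
<inlineData contenttype="text/html"><![CDATA[<iframe src="https://widgets.example.com/embed/12345" width="400" height="300"></iframe>]]></inlineData>
```

Only the GIF fragment carries the encoding attribute; the three text-based fragments rely on the encoding of the enclosing XML document.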