ISNI, the International Standard Name Identifier, has just released a new white paper: The Benefits of ISNI for Publishers. There's more information about it in this press release.

This is an important document worth a read by any decision maker, at any level, in publishing or retail. It’s not long so there’s no need for an executive summary, and you should read it first, but let us take this opportunity to share our current understanding of where the Canadian industry stands on this important topic.

ISNI and book data

Extensive book bibliographic information has existed for centuries and extensive digital datasets of it have existed for decades. They are held by libraries, retailers, publishers and other players in the business of “long-form” text publishing. During that extensive period, book authors have been subject to copyright laws and to pressure, both social and political, to be responsible and to be a business (or a brand, or commodified?), and they have even occasionally needed to hide or change aspects of their identity. Authors have a special place and enjoy the opportunity to be as celebrated and exploited as any other content producer.

Other industries, music for instance, have built purpose-driven datasets using ISNI as one of a set of identifiers to track complex relations among song, album, group, and individual musicians, song writers, and various other copyright holders. Not all of those are “content producers” who can be tracked by ISNI but most of them are. The music industry supports multiple understandings of content production and that creates additional benefits for how they use ISNI.

Book publishing data has used ISBNs to track published works. An ISBN represents a "book format plus published content" and publishers have a goal of supporting the ordering of and tracking individual sales of a specific saleable book by ISBN. The book publisher owns their individual relationship with the content source whose content was developed by their business into “published content.” If the music industry used complexity to drive development of datasets, book publishing has preserved a simple relationship of owning a format-plus-content with everything else as a contract held below the table by the publisher. That’s a gross simplification but it highlights a fundamental problem with integrating ISNI into the book publishing industry:

Contracts held by book publishers involve multiple entities. A publisher is a content producer and some of the contracts represent other content producers, but the only contracts in our book metadata are those that determine who sells what where. The hidden contracts may affect the success of a book but book publishing metadata does not track the content. Content may be king and the intellectual property that underlies the business of books, but our data treats it as secondary to the product-in-hand.
- There has been a long identified need to support ISTC, an International Standard Text Code, which could act as a system for content tracking (that could even include systematically following translations, abridgements, editions, and so on). It hasn't been supported by book publishers for the reason that it would interfere with the integrity of our transactional system. It’s outside of the scope of this description to go into, but I’ll note that there is a new standard — International Standard Content Code (ISCC) — that creates an identifier for a digital file (text or book file included) that can support the tracking of published content. It seems designed to support transactional content (the same content that might be published in different ISBN-based book formats) without providing a solution to tracking the generalized concept of content in any systematic way.
There are other datasets, typically held by libraries, that are tracking content and authorship and while they explicitly are trying to track "published works," ISBNs are simply an order number for what they buy. Libraries treat content based on their clients' needs, and authorship as a primary source of content to be followed but they do it by diverse means. Libraries have existed for a long time and work in a systematic but decentralized way. Libraries hold a lot of book content-by-title data with a lot of authorship data but little tracking of other content producer players and few direct linkages to ISBN transactions.

This is the potential ISNI brings to book publishing and my point is simple:

There is a lot of existing book data available. See above.
Book data works well within its own system for meeting its purpose, but the systems are not designed to work together.
Libraries and some publishers or industry support players typically have tracked authorship but the opportunity offered by ISNI is relatively new. Library use of ISNI is growing, making it one of the best hopes of providing a cross system enabler for matching publisher metadata.

How does ISNI work anyway?

Name matching is hard. It should be easy for a publisher who pays royalties to their authors as individuals, yet many, if not most, publishers of any size will have matches or near matches within their dataset’s contributor name lists. Not every book contributor in the bibliographic data will be covered by royalties, so even as raw data, a single publisher’s data would need to be exceptionally (I would go so far as to say impossibly) well managed to come close to 100% reliable name matching. EDItEUR loves to use an example of two HarperCollins UK authors, both historians with a seemingly endless list of shared attributes including identical names and startlingly similar author photos, to prove how hard disambiguation is without ISNI and why it’s important. My point is, errors are to be expected when name matching.
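To make the problem concrete, here is a minimal Python sketch of how near matches pile up inside a single contributor list. Everything in it is invented for illustration: the contributor rows, the internal ids, and the `normalize_name` helper are assumptions, not anything taken from a real publisher system.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Reduce a name to a crude comparison key (hypothetical helper)."""
    # Strip accents, lowercase, turn punctuation into spaces, ignore word order.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower()
    )
    return " ".join(sorted(cleaned.split()))


# Invented contributor rows: (internal id, name as keyed, subject area).
contributors = [
    ("c1", "Smith, Jane", "Tudor history"),
    ("c2", "Jane Smith", "Tudor history"),
    ("c3", "Jané Smith", "Cookery"),
    ("c4", "Omar Haddad", "Poetry"),
]


def find_collisions(rows):
    """Group contributor ids by normalized name and keep groups of two or more."""
    groups = defaultdict(list)
    for contributor_id, name, _subject in rows:
        groups[normalize_name(name)].append(contributor_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


collisions = find_collisions(contributors)
print(collisions)
```

The three "Jane Smith" rows land in one group, and nothing in the name alone says whether they are one person keyed three ways or three different people. That is the gap an identifier is meant to close.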

ISNI solves this with a big data solution: it runs relatively simple files, exportable from any book-based system, and uses the content-by-title and name associations as a form of identifier within an ISNI record. By the names of their content (in multiple languages), thou shalt know the name of thy content producer. Associated names and source information, such as the data sender and in some cases Wikipedia entries, further clarify the record, and quality control comes through use of the data. Inaccuracies within ISNI are exposed by use: mismatches are found by a lot of eyes looking and can be corrected through a simple report-an-error box in each record.
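The idea of letting titles identify the name can be sketched in a few lines of Python. This is an illustration only and not ISNI's actual matching process: the candidate records, the placeholder ids (which are not real ISNIs), the `title_key` and `best_candidate` helpers, and the `min_shared` threshold are all assumptions.

```python
def title_key(title: str) -> str:
    """Lowercase a title and drop punctuation so small keying differences still match."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def best_candidate(publisher_titles, candidates, min_shared=1):
    """Return (candidate id, shared title count) for the clearest overlap, or None.

    A tie for first place returns None: an ambiguous match should go to a person.
    """
    wanted = {title_key(t) for t in publisher_titles}
    scored = []
    for candidate_id, titles in candidates.items():
        shared = len(wanted & {title_key(t) for t in titles})
        scored.append((shared, candidate_id))
    scored.sort(reverse=True)
    if not scored or scored[0][0] < min_shared:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1], scored[0][0]


# Two invented records for same-named people.
candidates = {
    "record-A": ["The Winter Court", "A Short History of Salt"],
    "record-B": ["Baking at Altitude", "The Winter Garden"],
}
publisher_titles = ["A Short History of Salt!", "the winter court", "Rivers of Iron"]

match = best_candidate(publisher_titles, candidates)
print(match)
```

Here the publisher's list shares two titles with record-A and none with record-B, so the name is resolved by the books attached to it. A title the record doesn't yet hold ("Rivers of Iron") is exactly the kind of new data a match can contribute back.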

The neat trick pulled by a big data solution like this is that the data being run as files against the source is used to create the record it’s matched to. Use of the matched data provides the integrity of the record. A publisher with a few bad matches will discover them over time. A library with an aggregated dataset — or perhaps someday author searches against BiblioShare — can find oddball associations (could this content producer really have produced content 80 years apart?) that can be flagged. Greater accuracy promotes better results which continue to expose smaller errors. Over time more content producers will be added but the error rate will be steadily low.
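The "80 years apart" check is the sort of thing an aggregated dataset can automate. A minimal Python sketch, assuming each record carries a list of publication years; the record ids, the years, and the `flag_long_spans` helper are hypothetical.

```python
def flag_long_spans(records, max_span_years=80):
    """Return ids of records whose earliest and latest publication years
    are at least max_span_years apart."""
    flagged = []
    for record_id, years in records.items():
        if years and max(years) - min(years) >= max_span_years:
            flagged.append(record_id)
    return flagged


# Invented records: id -> publication years of the titles attached to it.
records = {
    "record-A": [1998, 2004, 2019],
    "record-B": [1931, 1936, 2021],  # 90 years between first and last title
    "record-C": [],                  # no dated titles, so nothing to judge
}

flagged = flag_long_spans(records)
print(flagged)
```

A flag here is a prompt for a person to look, not a verdict. Reprints and posthumous editions can stretch a span legitimately, so the threshold is a judgment call.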

To go back to book publisher metadata: While it naturally privileges the "published work" to the detriment of "authorship", ISNI compensates by using the published works' name as a key part of the record. Library data has long provided different types of author authority systems; ISNI can function by unifying them, and name associations in ISNI are another key part of the record. The two major book data systems support each other so long as each includes quality control and error reporting as part of their ISNI implementation.

There are other cross pollination opportunities that are being lost. Not that many data sources submit data to ISNI and we should be looking to increase them.

It would be a great boon to bibliographic metadata if:

those relying on ISNI, libraries and book publishers in particular, worked more closely together, perhaps to share the cost of submitting data to ISNI; and
wholesalers, retailers, and other users provided their own quality control of ISNI and systematically reported errors to the data source, with each data source taking responsibility for passing that information along until it reached a responsible body that could confirm it.

There’s one player absent here. Book authors are still being cocooned within book publishers' contract bins. It would be a great boon to bibliographic metadata if authors, agents, and other entities representing the content-based intellectual property sold to publishers stepped up and engaged in the use of identifiers.

Foreseeable problems

Cost

Dumping the cost of supporting a content identifier like ISNI on book publishers is not working very well. It’s hard to argue that they should be responsible for absorbing the cost of tracking the things that are sold to them in order that identification of "content by authorship" can be tracked for needs beyond theirs for published works.

That said, publishers would be irresponsible if they avoided integrating ISNI into their metadata. For a start it would ignore that they made content tracking a problem by focusing solely on ISBN-based-product-only-metadata. The ONIX metadata standard provides an opportunity to track content through Block 5 Related Material's “Work Identifier” — don’t blame EDItEUR for it! My point is only that there’s no solution in dumping the problem on authors and their agents or expecting libraries to solve this for publishers.

All the players, publishers included, should step up and do more because tracking “authorship” provides all the publisher benefits ISNI describes in their white paper. Just go. Go read that ISNI benefit document again.

Publishers will benefit because 400 years of book history says authorship sells book content. Forty years of book history says ISBNs allowed for tracking of saleable products and greatly improved the business of publishing (and helped jump-start the Amazon behemoth). Arguably it didn’t change anything about the importance of the book’s content to readers. Publishers are content producers but their brand doesn’t close the sale for anyone following an author. ISNI is for that part of the industry answering client questions like “Find the author”, “What else have they written?”, and “What other book is in this series?” Publishers have always benefited from supporting good retailers. Maybe future commercial success will be defined in a decentralized utopia by better tracking of content, but we have moved well beyond the scope of what can be addressed here.

Getting ISNIs

ISNIs are hard to get in North America. We’ve seen that ISNI is designed to match the concept of a "content producer" against bibliographic book metadata of all types, but there's no big dataset in North America helping book publishers do this. Québec has two agencies for ISNI but their mandate is to support Québec authors. There are publishers big enough to do it for themselves by contacting ISNI Registrars and a whole lot of publishers who aren't big enough data users to do that. Even big publishers may have trouble finding a registration agency willing to work with them (many would have mandates similar to those found in Québec).

At the moment there is only one source that provides the ability, at reasonable cost, to look up and assign an individual ISNI: the British Library's ISNI portal in the UK. Unfortunately a cyber attack last fall took it offline, but it's expected to be back online soon.

Here, I'm going to provide a caveat based on my industry experience: A single portal providing this service will almost certainly run into volume problems at some point. I don't think anyone who seriously wants ISNI to work should see the completely fabulous and generously available British Library service as a long term solution. But digital gods and cyber criminals willing, this should be up soon as an option.

There may be better answers in a year. BISG has an ISNI Working Group (Metadata Committee) looking at this. BookNet Canada wants to represent Canadian publisher needs but we're not in a position to offer a data based solution at this time. Let's talk if you have ideas.

Authors, their agents, and small publishers have an easier solution: The time it takes to go to ISNI.ORG and look up your author's name is minuscule. If the author (or other content producer type) has a publishing history, there is a 60 to 70% chance there's an existing ISNI waiting for you to use at no cost. That leaves two problems: What if it's not there? And what if the existing entry mixes books from multiple content producers or contains some other error?

Fixing a record

You'll need to familiarize yourself with ISNI records, but each entry has a list of Related Names and a list of Related Titles. Sources are referenced as well, sometimes even Wiki pages. These entries should be recognizable as referencing your target author. If none of the references match, it's because your target author has a same-named competitor out in the world and this is their record. ISNI has proven its worth. Go check another record, and keep going until you've found the right one or have run out of matches without a hit, at which point you apply for an ISNI.

It's very likely you'll find cross-matched data not from your author. Hopefully the record is clearly the one for your author with a lot of hits and information and dates that very clearly say this is the right record. If there are a couple of book titles that don't match, well look them up. Amazon probably has them. Is it a different author? (Have you asked the one you’re working with?) That's what the yellow box on the left is for. Let ISNI know what you know. Do them a favour and look at all the data and if "related names" has an entry from that problem book, reference it and the book title. Tell them who you are. Provide as much info as you can about your authority to speak to this. The very best answer would include “confirmed by author” and provide their Wiki page and any author-focused website. Let ISNI know and it's that easy. They do excellent work keeping their records clean. You just have to help them.

Should I make sure that my book is a title here? ISNI is not trying to list every book but as it matches books it records new data. If the author has certain truly notable books that bookmark their career and one is missing maybe it’s worth the effort, but really, if the list of titles is coherent it will speak to the accuracy of the record. If the author bio names a couple of the entries, you’re gold. Anyone can match your book to this record by matching name and confirming the information using the book’s metadata. It’s fabulous if your records include a birth year and so does ISNI (dates are fallible but provide a fast way to the right record to check). Remember that ISNI's data comparison across databases is designed to build the records. Their staff time is precious. Don't waste it by manually adding entries to a record that doesn’t need them for identification.

It would hurt no one to look again in a few years. Data errors and mischief are real. It’s never a case of doing it once and having it stay accurate forever. For ISNI to continue to work, someone has to care that it remains correct. At the very least, publishing a new book by an identified author is a good occasion to check their record and review possible errors with them.
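One check that can be automated on your own side is whether an ISNI stored in your metadata was keyed correctly. An ISNI is 16 characters long, and the last one is a check character (0 to 9 or X) calculated with ISO 7064 MOD 11-2. Below is a minimal Python sketch; the function names are mine, and the sample value is the widely published ORCID example iD, which sits in a block of numbers reserved within ISNI and uses the same check scheme.

```python
def isni_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_isni(value: str) -> bool:
    """True if value has 16 characters (ignoring spaces and hyphens)
    and its final character matches the computed check character."""
    compact = value.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16:
        return False
    body = compact[:15]
    if not (body.isascii() and body.isdigit()):
        return False
    return compact[15] == isni_check_character(body)


print(is_well_formed_isni("0000 0002 1825 0097"))  # sample value, correctly keyed
print(is_well_formed_isni("0000 0002 1825 0079"))  # last two digits transposed
```

This only catches keying mistakes. It says nothing about whether the number belongs to the right person, which still takes a look at the record itself.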

Do you need more information? ISNI is your best source. Click on resources; I've found their document on pseudonyms (found on this page) to be one of the most helpful, at least from a database manager's perspective.

Do you have any questions or comments? Contact us.
