
Metadata & Linked Data seminar – live blog

Hashtag #cigslod16, follow @cigscot

Welcome one and all to the Cataloguing and Indexing Group in Scotland’s 5th linked data event. We’re in Edinburgh on 12 September.

This year we are delighted to welcome speakers from the British Library, the Bibliothèque Nationale de France, the Universities of Edinburgh and St Andrews, the RDA Steering Committee, and the National Library of Scotland.

Speakers will describe the practice and challenges of implementing linked data in a library and information environment, from local pilots, projects and experiments, to national services.  The opportunities and challenges that linked data presents to cataloguers, libraries and the wider information landscape will be explored, with speakers describing their organisation’s experience, as well as providing an insight into national metadata strategies.

Kicking the day off we have Janet Aucock (JA), St Andrews University Library, providing a cataloguer’s linked data perspective.
Follow St Andrews University Library

Janet is treating us to some gorgeous St Andrews pics and a quick description of some of the special collections and archives that St Andrews manages and preserves. The idea of linked data, and the technical infrastructure it needs, could be a barrier; and how will it fit within the context of libraries and special collections?

Senior managers are not yet at the point of considering a strategy for moving linked data forward within the library. JA is thinking about how the discovery service and research data might fit into the practicalities of linked data, and how to manage and exploit access to thesis data and enrich the cataloguing space and collections such as rare books. We need a framework for linking data throughout the institution.

St Andrews, like most other organisations, has pots of data throughout the organisation: digital collections, photographic collections, the research repository, digitised collections, research publications and the research data system.

JA is considering some use cases:

  • Names and naming authorities might be an area that could best be developed… or ORCID for the living and name authorities for the dead! Need a method for determining which naming convention to use for which sets of data.
  • Biographical register: linking it to geographical information and to borrowing register information.
  • Repository and ChemSpider: text mining of chemistry data to pull out chemical compounds. The compounds are not authority controlled, but there is a possibility of the compound information being fed back to our repository.
  • SAULcat alchemy collections: records covering provenance, binding, names and subject headings. This is flat data, not linked to anything, and nothing more can be done with it within the LMS, but if there were ways to link it to other useful datasets that would be great.

JA is also looking at what LMS suppliers are considering for linked data, including name authorities and identifying things at library level. Now that linked data is on the suppliers’ horizon, this will hopefully help push forward the development of linked data in a library setting. There are opportunities to be had in linking our data with what is already available out there, and hopefully managers will realise the benefit.

Alasdair MacDonald & Ruby Wilkins, Edinburgh University Library, describing a project to link authorities across local datasets

Follow Edinburgh University Main Library

The library team was asked to put forward project ideas for the innovation fund, and proposed looking at personal name authorities across catalogues and at how these could be used across the university’s many datasets. They selected a number of significant people, focusing on linking data within the image archives for historical individuals and considering others who may have links to the university. The first group was the Edinburgh Seven, the first seven women who matriculated to study medicine at the University. Also included was James Miranda Barry, who lived as a man, fought more than one duel and was a physician with a good bedside manner; there is some debate around James actually being a woman!

Once the subjects were determined, the team developed a scoping methodology, looking at digital datasets online, Wikipedia, and names across all datasets in the university: the Alma LMS, Vernon (archives database), the Pure research system and the discovery system Primo. The chosen historical figures were then searched for in each of these systems and checked against LCNAF, VIAF and ISNI.

They looked at the potential for linking data further in their main catalogues. The catalogue could pull in the LCCN, but only at the point of cataloguing the item in hand. Could they resort to batch updating instead? The problem is differentiating between names: a batch update risks linking theses to the wrong names, so batch update is not the answer.

URIs can be put into the Vernon system, but only as a reference. ArchivesSpace does allow both the EAD and the ‘authorised’ name to co-exist. The discovery system Primo can hold URI data, but there are issues in distinguishing whether a person has written something or is the subject of a book. The team is looking at the potential of using the LCCN as a matching point rather than the authority heading.

Moving forward, EUL is looking at adding URIs to authority records in Vernon and ArchivesSpace, investigating the use of the $0 subfield in MARC, and exporting and editing Luna metadata.
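To make the LCCN-as-matching-point idea concrete, here is a minimal Python sketch, assuming each local heading already carries an LCCN captured at cataloguing time. It groups headings by normalised LCCN rather than by name string, then renders a MARC 100 field in MarcEdit-style text with the corresponding id.loc.gov URI in $0. The record identifiers, names and LCCNs are hypothetical placeholders, and this is not EUL’s actual workflow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Base of the Library of Congress Linked Data Service name-authority URIs.
ID_LOC_NAMES = "http://id.loc.gov/authorities/names/"


@dataclass
class LocalHeading:
    record_id: str        # hypothetical local record identifier
    name: str             # heading as recorded in the local system
    lccn: Optional[str]   # LCCN captured at point of cataloguing, if any


def normalise_lccn(lccn: str) -> str:
    """Remove all whitespace so 'n 00000001' and 'n00000001' compare equal."""
    return "".join(lccn.split())


def group_by_lccn(headings: List[LocalHeading]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group record ids by normalised LCCN; headings without one are set aside for review."""
    groups: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for h in headings:
        if h.lccn:
            groups.setdefault(normalise_lccn(h.lccn), []).append(h.record_id)
        else:
            unmatched.append(h.record_id)
    return groups, unmatched


def marc_100_with_uri(h: LocalHeading) -> str:
    """Render a 100 field as MarcEdit-style text (backslash = blank indicator), adding $0 if an LCCN is known."""
    field = "=100  1\\$a" + h.name
    if h.lccn:
        field += "$0" + ID_LOC_NAMES + normalise_lccn(h.lccn)
    return field


# Placeholder data: invented names and LCCNs, not real authority records.
headings = [
    LocalHeading("alma-1", "Smith, John", "n 00000001"),
    LocalHeading("vernon-7", "Smith, J.", "n00000001"),  # same person, different string
    LocalHeading("alma-2", "Smith, John", "n00000002"),  # different person, same string
    LocalHeading("pure-3", "Smith, John", None),         # no identifier yet
]

groups, unmatched = group_by_lccn(headings)
print(groups)      # {'n00000001': ['alma-1', 'vernon-7'], 'n00000002': ['alma-2']}
print(unmatched)   # ['pure-3']
print(marc_100_with_uri(headings[0]))
```

Grouping on the identifier keeps “Smith, J.” with “Smith, John” when they share an LCCN, while keeping apart two different people who happen to share a name string, which is exactly the risk that makes blind batch updates unsafe.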

Visit http://images.is.ed.ac.uk/

Alexandra De Pretto, National Library of Scotland, describing experiments with linked data at the national library.

NLS’s digitisation programmes are increasing, and of course more resources means more metadata! NLS uses many interfaces, datasets and systems, and hopes to connect anyone to relevant library resources, to enable search over the whole of the library space, and also to look at data beyond its control.

NLS doesn’t have a metadata strategy, but does use internationally recognised open standards, though possibly not in a consistent way. NLS has published the element set for its digital objects metadata (the DOD element set) on the Open Metadata Registry.

Alexandra asks: is linked data the solution to enable search over disparate datasets for the NLS? In linked data, every resource described should be identified with its own URI. You can learn more about linked data at Library Juice Academy.

Alex describes how to get started: you need triples and you need URIs, and a good starting point is to look at your own datasets. A good example of a large linked dataset is DBpedia.
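As an illustration of “triples plus URIs from your own data”, the Python sketch below turns rows from a made-up image catalogue extract into N-Triples, minting URIs from local identifiers. The base URI https://example.org/id/ and the sample rows are hypothetical; Dublin Core terms and FOAF are used only as familiar vocabularies, not as a statement of what NLS uses.

```python
import csv
import io

# Hypothetical base URI; a real service needs its own persistent URI scheme.
BASE = "https://example.org/id/"
DCTERMS = "http://purl.org/dc/terms/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Made-up extract from a local image catalogue, standing in for "your own dataset".
SAMPLE = """id,title,creator_id,creator_name
img001,Woodcutter at work,p042,A. Photographer
img002,Harbour view,p042,A. Photographer
"""


class URI(str):
    """Marks a string as a URI (serialised in <>) rather than a literal."""


def mint(kind: str, local_id: str) -> URI:
    """Mint a URI from the base, a type segment and the local identifier."""
    return URI(f"{BASE}{kind}/{local_id}")


def row_to_triples(row: dict) -> list:
    image = mint("image", row["id"])
    person = mint("person", row["creator_id"])
    return [
        (image, URI(DCTERMS + "title"), row["title"]),
        (image, URI(DCTERMS + "creator"), person),
        (person, URI(FOAF_NAME), row["creator_name"]),
    ]


def to_ntriples(triples) -> list:
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, URI):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


triples = set()
for row in csv.DictReader(io.StringIO(SAMPLE)):
    triples.update(row_to_triples(row))  # the set drops the repeated photographer name

lines = to_ntriples(sorted(triples))
print("\n".join(lines))
```

Because both images point at the same minted person URI, the photographer is described once and shared, which is what lets later datasets link to the same resource.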

Looking at the library’s images archive, she can begin to identify triples: who created [resource1], that the creator is of type [photographer], and who is depicted in it ([woodcutter]). These RDF triples can be worked with alongside descriptive metadata held in different schemas, using linked data technologies to query and present them. A triplestore (a triple data repository) can store and index triples and provides a means of managing and accessing them with SPARQL; using SPARQL, the NLS should be able to present results drawn from more than one dataset.
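The kind of cross-dataset query described above can be sketched without a triplestore. This toy Python example evaluates a basic graph pattern (the core of a SPARQL WHERE clause) by joining triple patterns one at a time over triples pooled from two hypothetical datasets: an image catalogue and a separate file about people. The ex: identifiers are invented; a real service would use a triplestore and a SPARQL endpoint instead.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None on a conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.get(p, t) != t:   # variable already bound to something else
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in results:
            for triple in graph:
                b = match(pattern, triple, binding)
                if b is not None:
                    extended.append(b)
        results = extended
    return results


# Two hypothetical datasets; ex: identifiers are invented.
images = [
    ("ex:photo1", "dc:creator", "ex:person7"),
    ("ex:photo1", "foaf:depicts", "ex:person9"),
    ("ex:photo2", "foaf:depicts", "ex:person8"),
]
people = [
    ("ex:person7", "rdf:type", "ex:Photographer"),
    ("ex:person9", "ex:occupation", "woodcutter"),
    ("ex:person8", "ex:occupation", "fisherman"),
]

# Roughly: SELECT * WHERE { ?img foaf:depicts ?who . ?who ex:occupation "woodcutter" .
#                           ?img dc:creator ?photographer . ?photographer a ex:Photographer }
patterns = [
    ("?img", "foaf:depicts", "?who"),
    ("?who", "ex:occupation", "woodcutter"),
    ("?img", "dc:creator", "?photographer"),
    ("?photographer", "rdf:type", "ex:Photographer"),
]
results = query(patterns, images + people)
print(results)
```

The answer only exists once both datasets are pooled: the image file knows who is depicted, while the people file knows who is a woodcutter and who is a photographer.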

NLS is at the beginning of this experiment: it needs to define vocabularies for its datasets, work out URIs, consider the mappings and develop a publishing platform, and teams need the right skills and experience to achieve this. It’s worth checking out the Library of Congress Linked Data Service and the following for further support and guidance:

  • RDA registry
  • RDFS: data-modelling vocabulary for RDF data
  • OWL: Web Ontology Language
  • SKOS: Simple Knowledge Organization System

After coffee, Torsten is joining us via Skype.

Dr Torsten Reimer, Imperial College London (ICL), will be providing an overview of ORCID and the benefits for global scholarly communication systems.
Follow Torsten Reimer

ORCID offers a unique researcher ID that allows humans and machines to reliably identify the authors of scholarly outputs. Within just a few years ORCID has seen rapid uptake, with over 2.4m researchers registered globally. Publishers, funders and research institutions are supporting, and in some cases even mandating, the use of ORCID. Torsten is neither a librarian nor a cataloguer (shock!) but works between researchers and the university.

🙂 Torsten is discussing the issue with identifiers: he has been confused with another Torsten Reimer who works in psychology research, so names are not useful unique identifiers! ORCID provides a persistent digital identifier and offers member integration, allowing institutions to connect with their researchers. ORCID also acts as a hub for machine-readable connections between systems.

ORCID is a not-for-profit, membership-based organisation. Once registered, you receive a randomly assigned number, and individuals control their own iDs and profiles. Profiles can include information on works, grants, employment history and publications. When you publish, you share your ORCID iD with the publisher, and it can be added to the metadata for your content.
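Although the number is randomly assigned, the final character of an ORCID iD is a check character calculated with the ISO 7064 MOD 11-2 algorithm, which catches most typing errors when iDs are keyed into systems. A minimal Python sketch of the check follows; note that a valid check character only shows an iD is well formed, not that it is registered to anyone.

```python
import re

# Four hyphen-separated blocks of four characters; the last may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its final character matches the check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the iD of Josiah Carberry, ORCID's fictitious test researcher.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0096"))  # False: one digit mistyped
```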

ORCID is helping ICL to keep track of its data and traffic over the Janet network. Publications tracking is available and helps with data flows between systems; however, some issues still arise with current workflows:

  • Requires academics to log in and add sources and articles
  • Authorship of articles not always recognised reliably
  • Pre-publication information would be useful to help document and track

UK funders have specific controls in place to meet policy requirements. Some of these requirements can be supported by services such as the Jisc Publications Router https://pubrouter.jisc.ac.uk/about/institutions/, which can use the iD to link the CRIS and CrossRef, with the ORCID iD shared with the publisher.

Research data can be tracked with similar ORCID workflows, by sharing the iD with a repository or embedding it within the content. ICL started a project in 2014 to raise awareness of ORCID, to encourage academics to self-register and update their profiles, and to continue to manage their iDs.

The ORCID project identified 764 existing iDs linked to College staff and created 3,226 new ones. ORCID is becoming the new research identifier, although not all systems are ready or integrated.

ORCID can improve interoperability and aid the transfer of information about researchers and their outputs when they move between organisations.

Read more about some of the work that Imperial College London has completed looking at ORCID.

https://spiral.imperial.ac.uk/bitstream/10044/1/19271/2/Imperial%20College%20ORCID%20project.pdf

https://repository.jisc.ac.uk/5876/1/Imperial_College_ORCID_project.pdf

Visit the Jisc ORCID consortium at https://www.jisc.ac.uk/orcid

Alan Danskin (AD), British Library, describing linked data initiatives and BL metadata strategy
Follow Alan Danskin

The BL has created its metadata strategy. AD reckons the future is bright for metadata, and asks where linked data fits in the vision for the British Library. The BL has three main sites.

The British Library Act 1972 records the BL’s role as the national centre for bibliographic and information services. The BL’s metadata services were originally offered as priced services and have evolved through many technologies; the BL began to offer open data in 2010, when the BNB was made available as linked data. 2015 saw the publication of the first metadata strategy for the British Library.

The BL faces many challenges. In 2013 the regulations changed so that the BL can now collect digital content through legal deposit: 100,000 new printed books are received by legal deposit, compared with 50,000 electronic books from about 10 publishers. A lot of the content received is back-catalogue content, and not just UK imprints but international ones. The challenge is how to catalogue such large amounts of content.

Other challenges include hidden metadata, obsolete formats, printed catalogues, and legacy metadata from catalogues that have never been digitised. Legacy metadata is often recorded in ways that are not easy to translate to the requirements of machine-readable and linked data; examples include publisher details and the language of the content.

People are now interested in ‘bigger picture’ questions, such as discovery and research into collection development. Being able to facet this by language or country would be useful, but it is not possible with legacy data.

Another challenge for legacy data is the silos within the organisation: MARC in the Aleph LMS, archives and manuscripts in IAMS, XML variants such as ETOC and AMED, recorded sound archives using the internal SAMIMARC format, and web content using Dublin Core.
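To illustrate why converging standards is fiddly, here is a deliberately partial Python sketch of a MARC-to-Dublin-Core crosswalk. It uses a simplified in-memory record (not a real MARC parser) and maps 245$a to title, 100$a to creator, 260/264$b to publisher and the 008/35-37 language code to language, in the spirit of the Library of Congress MARC to Dublin Core crosswalk. The record content is invented, and this is not how the BL converts its data.

```python
from typing import Dict, List

# A simplified in-memory record, not a real MARC parser: control field 008 as a
# string, data fields as {tag: [{subfield_code: value}, ...]}. Content is invented.
record = {
    "008": f"{'850101s1850    enk':<35}eng d",  # language code sits at 008/35-37
    "100": [{"a": "Smith, John,", "d": "1800-1870."}],
    "245": [{"a": "A history of the parish /", "c": "by John Smith."}],
    "260": [{"a": "London :", "b": "Printed for the author,", "c": "1850."}],
}

# (MARC tag, subfield, Dublin Core element): a small subset of a typical crosswalk.
FIELD_MAP = [
    ("245", "a", "title"),
    ("100", "a", "creator"),
    ("260", "b", "publisher"),
    ("264", "b", "publisher"),  # RDA-era publication statement
]


def strip_isbd(value: str) -> str:
    """Crudely drop trailing ISBD punctuation (' /', ' :', ',', '.')."""
    return value.rstrip(" /:;,.")


def marc_to_dc(rec: dict) -> Dict[str, List[str]]:
    dc: Dict[str, List[str]] = {}
    for tag, code, element in FIELD_MAP:
        for field in rec.get(tag, []):
            value = field.get(code)
            if value:
                dc.setdefault(element, []).append(strip_isbd(value))
    fixed = rec.get("008", "")
    if len(fixed) >= 38 and fixed[35:38].strip():
        dc.setdefault("language", []).append(fixed[35:38])
    return dc


print(marc_to_dc(record))
```

The publisher comes out as transcribed text (“Printed for the author”) rather than an identifiable entity, which is the kind of legacy-data limitation that makes publisher-level faceting hard.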

These are some of the issues the BL would like to address, and it has championed progress by showing staff what they could and couldn’t do without metadata and a strategy.

Collection metadata identifies attributes & relationships, location & availability, and the status & rights that allow you access to content. It requires stewardship and leadership to ensure its preservation and continued management over time, and achieving this requires resourcing, which in turn can aid efficiency and improve services.

The BL has put in place a structure for how metadata is used and managed within the library: senior management ensures the metadata strategy is being delivered, an advisory group supports work towards achieving the strategy, and a working group flags changes and updates to the metadata in use and reviews and agrees anything that is proposed.

The BL has looked at business cases, at the representation of its metadata and at how it can be used within the organisation, and there is now a Head of Collection Metadata with overarching responsibility for metadata developments.

The BNB has 3.7m entries for UK books; it is reusable and open, as it has a permissive CC0 licence. The BL is currently looking at going out to tender for a new open data platform. Over 1,500 users are using BL open metadata, and the BL hopes to increase access to and reuse of it. The BL is hoping to break down silos, converge standards and exploit synergies with other datasets. Linked data is a potential solution, but for many it is not an objective in itself.

Check out the data services of the BL at:

http://www.bl.uk/bibliographic/datafree.html

http://bnb.data.bl.uk/

Metadata strategy 2015-2018

https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2015-2018.pdf

Mélanie Roche, Bibliothèque Nationale de France, describing linked data initiatives at the BnF.
Follow Melanie Roche

The National Library of France has successfully developed linked data applications that have received worldwide attention. Melanie was inspired by a presentation called ‘Let’s make it happen: linked data in libraries’, which left her feeling energised and was a call to arms for librarians, but she has since felt a little disappointed that this has not been achieved as much as she’d hoped, with most initiatives still at project level.

The BnF has a main catalogue of almost 19 million records dealing with the general collections, and a separate manuscripts database. The BnF has linked authority files and bibliographic information going back to 1975, which can aid linking between the main catalogue and the archives database.

The BnF wanted to give users the opportunity not to have to come to the catalogue to search for content, but instead to use the data.bnf.fr service to find all information and content held in both the main and archives collections. The BnF used an algorithm to bring together all data from the digital library and the main catalogue for any given controlled authority.

The BnF is using these algorithms to help automate the FRBR-isation of its catalogue: they can automatically generate work records for the open data site http://data.bnf.fr, and that data can also be fed back into the catalogue, generating over 100,000 records.
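The BnF’s actual algorithms are not described here, but the basic idea of generating work records from existing catalogue records can be sketched as key-based clustering: manifestations that share a controlled creator authority and a normalised title are grouped under one candidate work. In this hypothetical Python example the authority identifiers and records are invented, and real FRBR-isation has to handle much more (translations, variant and uniform titles, multiple creators).

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Manifestation:
    record_id: str
    creator_authority: str  # identifier of the controlled creator authority (invented here)
    title: str              # title proper as transcribed


def normalise_title(title: str) -> str:
    """Case-fold, strip accents, replace punctuation with spaces and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title.casefold())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(cleaned.split())


def cluster_into_works(records: List[Manifestation]) -> Dict[Tuple[str, str], List[str]]:
    """Group manifestations sharing (creator authority, normalised title) into candidate works."""
    works: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in records:
        works[(r.creator_authority, normalise_title(r.title))].append(r.record_id)
    return dict(works)


records = [
    Manifestation("m1", "auth:hugo", "Les Misérables"),
    Manifestation("m2", "auth:hugo", "Les misérables."),
    Manifestation("m3", "auth:hugo", "Notre-Dame de Paris"),
]
for key, members in cluster_into_works(records).items():
    print(key, members)
```

Keying on the authority identifier rather than the author’s name string is what makes the clustering dependable; linked authority files are doing much of the work.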

Another area the BnF is working on is SPAR (Scalable Preservation and Archiving Repository), a triplestore-based system for the long-term preservation of digitally native documents. SPAR is a modular, OAIS-compliant repository. The data is stored in RDF to ensure that librarians, rather than an IT department, continue to curate it.

Another linked data project is Doremus, which covers open data for music material. The project is currently in its modelling phase, developing a model for music data using RDF. The BnF hopes to use all of these projects to help develop a nationally facing open data house, considering many other types of format and content. For further information, read ‘Doremus: aligning value vocabularies’.
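Aligning value vocabularies often starts with label matching. As a naive sketch (not the Doremus method), this Python function proposes skos:exactMatch links where a normalised label identifies exactly one concept in the other vocabulary, and sets ambiguous cases aside for human review. The two instrument vocabularies and their example.org URIs are hypothetical.

```python
import unicodedata
from typing import Dict, List, Set, Tuple

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def norm(label: str) -> str:
    """Case-fold, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    return " ".join("".join(c for c in decomposed if not unicodedata.combining(c)).split())


def align(vocab_a: Dict[str, List[str]],
          vocab_b: Dict[str, List[str]]) -> Tuple[Set[Tuple[str, str, str]], Set[str]]:
    """Propose exactMatch triples where labels point to exactly one concept in vocab_b."""
    index_b: Dict[str, Set[str]] = {}
    for uri, labels in vocab_b.items():
        for label in labels:
            index_b.setdefault(norm(label), set()).add(uri)

    matches: Set[Tuple[str, str, str]] = set()
    ambiguous: Set[str] = set()
    for uri_a, labels in vocab_a.items():
        candidates: Set[str] = set()
        for label in labels:
            candidates |= index_b.get(norm(label), set())
        if len(candidates) == 1:
            matches.add((uri_a, SKOS_EXACT_MATCH, next(iter(candidates))))
        elif candidates:
            ambiguous.add(uri_a)  # leave for a human to decide
    return matches, ambiguous


# Hypothetical vocabularies: concept URI -> preferred and alternative labels.
A = "https://example.org/vocab-a/"
B = "https://example.org/vocab-b/"
vocab_a = {
    A + "violoncelle": ["violoncelle", "cello"],
    A + "flute": ["flûte"],
    A + "cor": ["cor", "horn"],
    A + "harpe": ["harpe"],
}
vocab_b = {
    B + "cello": ["Cello", "Violoncello"],
    B + "flute": ["flute"],
    B + "french-horn": ["French horn", "horn"],
    B + "horns": ["horn"],
}
matches, ambiguous = align(vocab_a, vocab_b)
print(sorted(matches))
print(ambiguous)
```

“horn” points at two concepts, so the French “cor” is flagged rather than guessed, and “harpe” finds no counterpart at all; both outcomes are where curated, expert alignment has to take over.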

Melanie discusses whether we should upgrade MARC to accommodate open data, as it’s not currently fit for purpose… MARC is dead… long live MARC! 🙂

Visit for further info http://data.bnf.fr/about

Gordon Dunsire, RDA Steering Committee, describing the work carried out within the RDA community.

Follow Gordon Dunsire
rscchair@rdatoolkit.org

Gordon is using examples of RDA data from the RDA Toolkit to show how it is transformed into linked data, and discusses the benefits for users. For more of his work and other presentations, visit http://www.gordondunsire.com/presentations.htm

http://www.rda-rsc.org/

RDA Toolkit http://www.rdatoolkit.org/

All examples of layout are available at the RDA Registry http://www.rdaregistry.info/
and others are available in the rballs service (RDA data, Jane-athons, etc.) at http://www.rballs.info/

RIMMF (RDA in Many Metadata Formats) is a free tool for creating your own RDA data sets, and allows you to view the WEM (Work, Expression, Manifestation) relationships of the full record. RDA doesn’t have an element for the authorised access point; it expects other data to express this.
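As a rough illustration of the WEM chain and of deriving an access point from other data, this hypothetical Python sketch models Work, Expression and Manifestation as linked records and builds an access-point-style string from the creator’s name, the work’s preferred title and, for a translation, the expression’s language. The class and field names are invented for the example and are not RIMMF’s or the RDA Registry’s data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    preferred_title: str
    creator: str                 # preferred name of the creator


@dataclass
class Expression:
    work: Work                   # the Work this Expression realises
    language: str
    is_original: bool = False    # hypothetical flag for the original-language expression


@dataclass
class Manifestation:
    title_proper: str
    expression: Expression       # the Expression this Manifestation embodies
    publisher: str


def access_point(work: Work, expression: Optional[Expression] = None) -> str:
    """Derive an access-point-style string from other elements (creator, title, language)."""
    parts = [work.creator, work.preferred_title]
    if expression is not None and not expression.is_original:
        parts.append(expression.language)
    return ". ".join(parts)


work = Work("Misérables", "Hugo, Victor, 1802-1885")
english = Expression(work, "English")
book = Manifestation("Les misérables", english, "Example Press")  # invented publisher

print(access_point(book.expression.work))                   # Hugo, Victor, 1802-1885. Misérables
print(access_point(book.expression.work, book.expression))  # Hugo, Victor, 1802-1885. Misérables. English
```

Because the access point is computed rather than stored, changing the creator’s preferred name in one place updates every derived string, which is the practical upside of RDA leaving it to other data.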

Example data sources: http://www.rdatoolkit.org/sites/default/files/rsc_rda_complete_examples_bibliographic_april2016.pdf

http://www.rdaregistry.info/Examples/exRSCFullScore.html

http://rballs.info/topics/m/rdaex/rdaexScore.html

Our live blogger had to leave before the end of Gordon’s presentation and the open discussion. The day ended with a lively chat regarding the way forward with linked data in libraries, and how we can move from experiments and projects to more fully fledged services and infrastructure, with national libraries and other bodies needing to fulfil leadership and enabling roles.

Audience feedback was very positive, with rapturous applause for speakers, and the discussion carried on in a nearby hostelry, where current and future solutions and ideas for future events were mulled over.  All in all, a very successful event!

(ORCID remote presentation from Dr Torsten Reimer – which went without a glitch!)
