
Metadata & Linked Data seminar – live blog

Hashtag #cigslod16 – follow @cigscot

Welcome one and all to the Cataloguing and Indexing Group in Scotland’s 5th linked data event; we’re in Edinburgh on the 12th of September.

This year we are delighted to welcome speakers from the British Library, the Bibliothèque Nationale de France, the Universities of Edinburgh and St Andrews, the RDA Steering Committee, and the National Library of Scotland.

Speakers will describe the practice and challenges of implementing linked data in a library and information environment, from local pilots, projects and experiments, to national services.  The opportunities and challenges that linked data presents to cataloguers, libraries and the wider information landscape will be explored, with speakers describing their organisation’s experience, as well as providing an insight into national metadata strategies.

Kicking the day off we have Janet Aucock (JA), St Andrews University Library, providing a cataloguer’s linked data perspective.
Follow St Andrews University Library

Janet is treating us to some gorgeous St Andrews pics and a quick description of some of the special collections and archives that St Andrews manages and preserves. The technical infrastructure that linked data requires could be a barrier, and the question is how it will fit within the context of libraries and special collections.

Senior managers are not yet at the point of considering a strategy for moving linked data forward within libraries. JA is thinking about how the discovery service and research data might fit into the practicalities of linked data: how to manage and exploit access to thesis data, and how to enrich the cataloguing space and collections such as rare books. We need a framework for linking data throughout the institution.

St Andrews, like most other organisations, has pots of data throughout the organisation: digital collections, photographic collections, a research repository, digitised collections, research publications and a research data system.

JA is considering some use cases:

  • Names and naming authorities might be the area best placed for development… or ORCID for the living and name authorities for the dead! A method is needed for determining which naming structure to use for which sets of data.
  • A biographical register that also links geographical information and borrowing register information.
  • The repository and ChemSpider: text mining of chemistry data to pull out chemical compounds (not authority controlled), with the possibility of the compound information being fed back to the repository.
  • The SAULcat alchemy collections, which record provenance, binding, names and subject headings. This is flat data, linked to nothing, and nothing can be done with it within the LMS; but if there were ways to link it to other useful data sets, that would be great.

St Andrews is also looking at what LMS suppliers are considering for linked data, particularly name authorities and identifying things at library level. Now that linked data is on the suppliers’ horizon, hopefully this will help push forward the development of linked data in a library setting. There is an opportunity to be had in linking local data to what is available out there, and hopefully managers will realise the benefit.

Alasdair MacDonald & Ruby Wilkins, Edinburgh University Library, describing a project to link authorities across local datasets

Follow Edinburgh University Main Library

The library team was asked to put forward ideas for a project for the innovation fund, to look at personal name authorities across catalogues and how to use these across the university’s many data sets. They selected a number of significant people and focused on linking data within the image archives, choosing historical individuals and others with links to the university. The first subjects were the Edinburgh Seven, the first seven women who matriculated to study medicine at the university. Also included was James Miranda Barry, who lived as a man, fought more than one duel and was a physician with a good bedside manner; there is some debate around whether Barry was actually a woman!

Once the subjects were determined, the team started on a scoping methodology: looking at digital data sets online, at Wikipedia, and at names across all data sets in the university – the Alma LMS, Vernon (the archives database), the Pure research system and the Primo discovery system. The chosen historical figures were then researched in each of the systems and checked against LCNAF, VIAF and ISNI.

They looked at the potential for linking data further in the main catalogue. The LCCN could be pulled in, but only at the point of cataloguing with the item in hand. Could a batch update do it instead? But what would you do about undifferentiated names – there are issues with linking theses to the wrong names – so batch update is not the answer.

URIs can be put into the Vernon system, but only as a reference. ArchivesSpace does allow both the EAD and the ‘authorised’ form of a name to co-exist. The Primo discovery system can hold URI data, but there are issues in distinguishing whether a person has written something or is the subject of a book. The team is looking at the potential of using the LCCN as a matching point rather than the authority heading.

Moving forward, EUL is looking at adding URIs to authority records in Vernon and ArchivesSpace, investigating the use of the $0 subfield in the MARC format, and also looking at exporting and editing Luna metadata.
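As a sketch of what recording a URI in $0 looks like in practice (using the pymarc library, version 5 or later; the heading and authority URI below are placeholders, not real identifiers):

```python
# Sketch: attaching an authority URI to a MARC name heading via subfield $0.
# Assumes pymarc >= 5; the heading and URI are placeholder values.
from pymarc import Record, Field, Subfield

record = Record()
record.add_field(
    Field(
        tag="100",
        indicators=["1", " "],
        subfields=[
            Subfield(code="a", value="Barry, James Miranda"),
            # $0 carries the authority record control number or URI
            Subfield(code="0", value="http://id.loc.gov/authorities/names/nXXXXXXXX"),
        ],
    )
)
print(record)
```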

Visit the image collections at http://images.is.ed.ac.uk/

Alexandra De Pretto, National Library of Scotland, describing experiments with linked data at the national library.

The NLS has a growing programme of digitisation – and of course more resources mean more metadata! The NLS uses many interfaces, datasets and systems; it hopes to connect anyone to relevant library resources, to enable search over the whole of the library space, and to look at data beyond the NLS’s control.

The NLS doesn’t have a metadata strategy, but it does use internationally recognised open standards, though possibly not in a consistent way. The NLS has published the element set for its digital objects database (DOD) on the Open Metadata Registry.

Alexandra asks: is linked data the solution to enable search over disparate datasets for the NLS? In linked data, every resource described should be identified with its own URI. You can learn more about linked data at Library Juice Academy.

Alex describes how to get started: you need triples and you need URIs, and looking at your own datasets is a good place to start. A good example of a large linked data set is DBpedia.

Looking at the images archive, she can begin to determine triples: for [resource1], who made it [photographer], and who is depicted in it [woodcutter]. The idea is to work with RDF triples derived from descriptive metadata held in different schemas, and to use linked data technologies to query and present them. A triple store can store and index triples and provides a means of managing and accessing them with SPARQL; using SPARQL, the NLS should be able to present results drawn from more than one data set.
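To make that concrete, here is a toy version of those image-archive triples and a SPARQL query over them (a sketch using the rdflib library; the ex: namespace and property names are made up for illustration, not the NLS’s actual vocabulary):

```python
# Sketch: building RDF triples and querying them with SPARQL (rdflib).
# The ex: namespace and property names are illustrative placeholders,
# not the NLS's actual vocabulary.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/nls/")

g = Graph()
g.add((EX.resource1, EX.creator, EX.person1))
g.add((EX.person1, EX.occupation, Literal("photographer")))
g.add((EX.resource1, EX.depicts, Literal("woodcutter")))

# Ask the store: which resources depict a woodcutter, and who made them?
results = g.query("""
    PREFIX ex: <http://example.org/nls/>
    SELECT ?resource ?maker
    WHERE {
        ?resource ex:depicts "woodcutter" ;
                  ex:creator ?maker .
    }
""")
for row in results:
    print(row.resource, row.maker)
```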

The NLS is at the beginning of this experiment: it needs to define vocabularies for its datasets, work out URIs, consider the mappings and develop a publishing platform, and teams need the right skills and experience to achieve this. The Library of Congress Linked Data Service is worth checking out, as are the following links for further support and guidance:

  • RDA registry
  • RDFS: data-modelling vocabulary for RDF data
  • OWL: Web Ontology Language
  • SKOS: Simple Knowledge Organization System

After our coffee refreshments, Torsten joins us via Skype.

Dr Torsten Reimer, Imperial College London (ICL) will be providing an overview of ORCID and the benefits for global scholarly communication systems.
Follow Torsten Reimer

ORCID offers a unique researcher ID that allows humans and machines to reliably identify the authors of scholarly outputs. Within just a few years ORCID has seen rapid uptake, with over 2.4m researchers registered globally. Publishers, funders and research institutions are supporting, and in some cases even mandating, the use of ORCID. Torsten is neither a librarian nor a cataloguer (shock!) but works between the researcher and the university space.

🙂 Torsten is discussing the issue with identifiers and how he has been confused with another Torsten Reimer who works in psychology research – so names are not a useful unique identifier! ORCID provides a persistent digital identifier; it offers member integration so that institutions can connect their researchers, and it acts as a hub for machine-readable connections between systems.

ORCID is a not-for-profit, membership-based organisation. Once registered, you receive a randomly assigned number, and individuals control their own iDs and profiles. Profiles can include information on works, grants, employment history and publications. When you publish, you share your ORCID iD with the publisher, and the iD can be added to the metadata for your content.
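As an illustration of ORCID as machine-readable infrastructure, here is a minimal sketch that fetches a public ORCID record as JSON (this assumes the public v3.0 API endpoint; the iD used is Josiah Carberry, ORCID’s well-known demonstration record):

```python
# Sketch: fetching a public ORCID record as JSON.
# Assumes the public v3.0 API; the iD is ORCID's well-known example record.
import json
import urllib.request

orcid_id = "0000-0002-1825-0097"  # Josiah Carberry, ORCID's demo record
url = f"https://pub.orcid.org/v3.0/{orcid_id}/record"

req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req) as resp:
    record = json.load(resp)

# The record carries the person's name and a list of works, among other data
print(record["orcid-identifier"]["uri"])
```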

ORCID is helping ICL to keep track of its data and traffic over the Janet network. Publications tracking is available and helps with data flows between systems; however, some issues still arise with current workflows:

  • It requires academics to log in and add sources and articles
  • Authorship of articles is not always recognised reliably
  • Pre-publication information would be useful to help document and track

UK funders have specific controls in place to meet policy requirements. Some of the requirements can be helped by services such as the Jisc Publications Router (https://pubrouter.jisc.ac.uk/about/institutions/), which can link via the iD to the CRIS and CrossRef, and shares the ORCID iD with the publisher.

Tracking research data can use similar ORCID workflows, by sharing the iD with a repository or embedding it within the content. ICL started a project in 2014 to raise awareness of ORCID, to encourage academics to self-register and update their profiles, and to get them to continue managing their iDs.

The ORCID project identified 764 existing iDs linked to College staff and created 3,226 new ones. ORCID is becoming the new research identifier, although not all systems are ready or integrated. ORCID can improve interoperability and aid the transfer of information about researchers and their outputs when they move organisation.

Read more about some of the work that Imperial College London has completed looking at ORCID.

https://spiral.imperial.ac.uk/bitstream/10044/1/19271/2/Imperial%20College%20ORCID%20project.pdf

https://repository.jisc.ac.uk/5876/1/Imperial_College_ORCID_project.pdf

Visit the Jisc ORCID consortium at https://www.jisc.ac.uk/orcid

Alan Danskin (AD), British Library, describing linked data initiatives and BL metadata strategy
Follow Alan Danskin

The BL has created its metadata strategy; AD reckons the future is bright for metadata, and asks where linked data sits in the vision for the British Library. The BL has three main sites.

The British Library Act 1972 records the BL’s role as the national centre for bibliographic and information services. The BL’s metadata services were originally priced, and evolved through many technologies; the BL began to offer open data in 2010, when the BNB was made available as linked data. 2015 saw the publication of the first metadata strategy for the British Library.

There are many challenges for the BL, but in 2013 regulations changed so that the BL can now collect digitally formatted content: 100,000 new printed books are received by legal deposit, compared with 50,000 electronic books coming into legal deposit from about 10 publishers. A lot of the content received is back-catalogue content, and not just UK imprints but international ones. The challenge is how to catalogue such large amounts of content.

There are other challenges to contend with, such as hidden metadata, obsolete formats, printed catalogues, and legacy metadata from catalogues that have not been digitised anywhere. The legacy metadata challenge is that data has been recorded in ways that are not easy to translate into the requirements of machine-readable and linked data; examples include publisher details and the language of the content.

People are now interested in ‘bigger picture’ questions such as discovery and research into collection development; being able to facet by language or country would be useful, but this is not possible with legacy data.

Another legacy data challenge is the silos within the organisation: MARC in the Aleph LMS, archives and manuscripts in IAMS, XML variants such as ETOC & AMED, the sound archive using the internal SAMI MARC format, and web content using Dublin Core.

These are some of the issues that the BL would like to address, and it has championed progress by showing staff what they could and couldn’t do without metadata and a strategy.

Collection metadata identifies attributes & relationships, location & availability, and the status & rights that allow you access to content. It requires stewardship and leadership to ensure its preservation and continued management over time, and it requires resourcing, which in turn can aid efficiency and improve services.

The BL has put in place a structure for how metadata is used and managed within the library: senior management ensures the metadata strategy is being delivered, an advisory group supports work towards achieving the strategy, and a working group flags changes and updates to the metadata used, and reviews and agrees anything that is proposed.

The BL has looked at business cases for its metadata and how it can be used within the organisation, and there is now a Head of Collection Metadata with overarching responsibility for metadata developments.

The BNB has 3.7m entries for UK books; it’s reusable and open, as it has a permissive licence (CC0). The BL is currently looking at going out to tender for a new open data platform. Over 1,500 users are using BL open metadata, and the BL hopes to increase access to and reuse of it. The BL is hoping to break down silos, converge standards and exploit synergies with other data sets. Linked data is a potential solution, but for many it is not an objective in itself.

Check out the data services of the BL at:

http://www.bl.uk/bibliographic/datafree.html

http://bnb.data.bl.uk/

Metadata strategy 2015-2018

https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2015-2018.pdf

Mélanie Roche, Bibliothèque Nationale de France, describing linked data initiatives at the BnF.
Follow Melanie Roche

The National Library of France has successfully developed linked data applications that have received worldwide attention. Mélanie was inspired by a presentation called ‘Let’s make it happen: linked data in libraries’, feeling energised by its call to arms for librarians, but has since felt a little disappointed that this has possibly not been achieved as much as she’d hoped, with most initiatives still at project level.

The BnF has a main catalogue of almost 19 million records covering the general collections, and a separate manuscripts database. The BnF has authority files linked to bibliographic information back to 1975, which can aid linking between the main catalogue and the archives database.

The BnF wanted to give users the opportunity not to have to come to the catalogue to search for content: instead they can use the data.bnf.fr service to find all the information and content held in both the main and archives collections. The BnF used an algorithm to bring together all data from the digital library and the main catalogue for any given controlled authority.

The BnF is using these algorithms to automate and to help FRBRise the catalogue: they can automatically generate work records for the open data site http://data.bnf.fr, and that data can also be fed back into the catalogue, generating over 100,000 records.
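As a crude illustration of what such FRBRization-by-algorithm involves (a toy sketch only, not the BnF’s actual algorithm), records sharing a controlled authority identifier and a normalised title can be clustered into candidate work records:

```python
# Toy sketch of FRBRization by clustering: group catalogue records that
# share a controlled authority ID and a normalised title into candidate
# "work" records. Illustrative only, not the BnF's actual algorithm.
import unicodedata
from collections import defaultdict

records = [
    {"authority": "auth123", "title": "L'Île au trésor", "id": "rec1"},
    {"authority": "auth123", "title": "l'ile au tresor", "id": "rec2"},
    {"authority": "auth456", "title": "Catriona", "id": "rec3"},
]

def normalise(title: str) -> str:
    # Naive normalisation: strip accents, lower-case, collapse whitespace
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())

works = defaultdict(list)
for rec in records:
    key = (rec["authority"], normalise(rec["title"]))
    works[key].append(rec["id"])

for key, members in works.items():
    print(key, "->", members)  # each key yields one candidate work record
```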

The other area the BnF is working on is a triple store called SPAR (Scalable Preservation and Archiving Repository), for the long-term preservation of digitally native documents. SPAR is a modular, OAIS-compliant repository. The data is stored in RDF to ensure that librarians, rather than an IT department, continue to curate it.

Another linked data project is DOREMUS, which covers open data for music materials; the project is currently in its modelling phase, looking at a model for music data using RDF. The BnF hopes to use all of these projects to help develop a nationally facing open data hub, taking in many other types of format and content. For further information, read ‘Doremus: aligning value vocabularies’.

Mélanie discusses whether we should upgrade MARC to accommodate open data, as it’s not currently fit for purpose… MARC is dead… long live MARC! 🙂

Visit for further info http://data.bnf.fr/about

Gordon Dunsire, RDA Steering Committee, describing the work carried out within the RDA community.

Follow Gordon Dunsire
rscchair@rdatoolkit.org

Gordon is using examples of RDA data from the RDA Toolkit to show how it is transformed into linked data, and discusses the benefits for users. For more of his work and other presentations, visit http://www.gordondunsire.com/presentations.htm

http://www.rda-rsc.org/

RDA Toolkit http://www.rdatoolkit.org/

All the example layouts are available at the RDA Registry http://www.rdaregistry.info/
and others are available in the rballs service (RDA data, Jane-athons, etc.): http://www.rballs.info/

RIMMF (RDA in Many Metadata Formats) is a free tool for creating your own RDA data sets, and it allows you to view the WEM relationships of the full record. RDA doesn’t have an element for the authorised access point; it expects other data to express this.

Example data sources: http://www.rdatoolkit.org/sites/default/files/rsc_rda_complete_examples_bibliographic_april2016.pdf

http://www.rdaregistry.info/Examples/exRSCFullScore.html

http://rballs.info/topics/m/rdaex/rdaexScore.html

Our live blogger had to leave before the end of Gordon’s presentation, and the open discussion.  The day ended with a lively chat regarding the way forward with linked data in libraries, and how we can move from experiments and projects to more fully fledged services and infrastructure, with national libraries and other bodies needing to fulfil leadership and enabling roles.

Audience feedback was very positive, with rapturous applause for speakers, and the discussion carried on in a nearby hostelry, where current and future solutions and ideas for future events were mulled over.  All in all, a very successful event!

(ORCID remote presentation from Dr Torsten Reimer – which went without a glitch!)

RLS-athon – 9th November, Edinburgh – Update

Places are still available for the RLS jane-athon, taking place in Edinburgh on the 9th of November – workshop update below.

Want to experience RDA: Resource Description and Access without wearing MARC glasses? Want to see how WEMI is supposed to work? Want to put the fun back into cataloguing? Are you ready to RIMMF?

Come to the world’s first RLS-athon – a hackathon for RDA metadata about Robert Louis Stevenson and his works, organised by the Cataloguing & Indexing Group in Scotland (CIGS), the Joint Steering Committee for Development of RDA (JSC), and The Marc of Quality (TMQ).

Based on the successful jane-athon formula ( http://rballs.info/topics/p/jane/janeathon.html ) the RLS-athon will bring cataloguers together to use the RDA editor RIMMF to experiment with pure RDA data and to discuss the good and bad points of the RDA instructions. RIMMF (RDA in Many Metadata Formats) is available for free download ( http://www.rdaregistry.info/rimmf/ ). It comes with comprehensive self-guided web-based tutorials.

When: Monday 9 November 2015, 10.30 am – 4.00 pm

(registration 10.00-10.30 am; informal discussion 4.00-5.00 pm)

Where: Edinburgh Centre for Carbon Innovation ( http://edinburghcentre.org/Venue.html )

Cost: £60.00 (including VAT)

Registration includes:

  • 2-hour RIMMF training webinar on 27 October 2015, and access to RDA Toolkit until the end of the year.
  • The opportunity to be coached by Deborah Fritz, one of the developers of RIMMF, and members of the new RDA Steering Committee.
  • Lunch and refreshments.

Special topics include

  • Pirates! A focus on Treasure Island in print and digital formats, including translations, e-texts, and audio books. Team members are encouraged to wear pirate gear, for example striped shirts, cut-off jeans, parrots, etc. This topic is recommended for those new to RDA and RIMMF.
  • The Scottish book sculptures, and especially Treasure Island. More pirates, but poetry too, and an unusual cataloguing challenge.
  • Scots! A focus on Kidnapped, Catriona, and The Master of Ballantrae. The Master of Ballantrae base r-ball provides examples of RDA applied to multiple digitized versions of multiple editions of a single Work. It exposes a number of issues relating to mass digitization strategies and the utility of RDA in resolving them. Team members are encouraged to wear tartan, kilts, heather, etc. This topic is suitable for those with some knowledge of RDA or digital formats, including JPEG, PDF, DAISY, Kindle, etc.
  • National collections. A focus on RDA for RLS as a national figure. The National Library of Scotland is experimenting with RIMMF to apply the benefits of RDA in bringing together the metadata for its format-based collections, including manuscripts, sound and film recordings, print, and digital. This raises issues of identity and authority, legacy data, and strategies for the future, as well as exposing areas for RDA and RIMMF development.
  • Rare materials and RDA. This topic is associated with the international seminar on RDA and rare materials on 6 Nov 2015, and uses RIMMF to discuss the issues raised.
  • MARC in, MARC out. A focus on RIMMF as a metadata FRBRization and quality improvement tool. RIMMF can import a MARC 21 record and automatically FRBRize it into RDA Work, Expression, and Manifestation data. RIMMF can also export a MARC 21 record from RDA data. This team will use RIMMF to discuss the opportunities afforded by digitization projects to improve legacy data in legacy systems, while future-proofing it for RDA and linked data systems.

Attendees can expect to learn more about RDA, the global standard for resource discovery, and its application to multiple versions of print and digital resources, and have fun doing it.

Registration

Before registering, please refer to the preparatory work expected of participants in the event – this is highly recommended to fully benefit from attendance – rballs.info/xathons/getmost/

To book your place on the RLS-athon, please supply the following details to CIG Scotland at cigscot@gmail.com

  • Your name
  • Your institution
  • Invoice address (or state that you will pay on the day)
  • Any special dietary requirements
  • Confirmation that you have read and understand the expectations for participation in the RLS-athon.

Please note that invoices will be issued after the event. Enquiries regarding purchase orders, payment or invoicing should be directed to the CIGS Treasurer at cigscot@gmail.com

Registration closes 30th October. Please note that cancellations after this date will be charged at the full price.

Further information will be posted on the RLS-athon web pages at http://rballs.info/topics/p/rls/rlsathon1/

#RLSathon

Covers of books by Robert Louis Stevenson

Live Blog: Linked Open Data: current practice in libraries and archives / CIG Scotland

Graeme Forbes, NLS, opens the CIGS LOD day on a beautifully bright and crisp morning at the Edinburgh Centre for Carbon Innovation, and invites our first speaker to take to the stage to present on Publishing the British National Bibliography as Linked Open Data / Corine Deliot, British Library.

Corine’s  presentation describes the development of a linked data instance of the British National Bibliography (BNB) by the British Library (BL). The focus is on the development of an RDF data model and the technical process to convert MARC 21 Bibliographic Data to Linked Data using existing resources. BNB was launched as linked open data in 2011 on a Talis platform. In 2013 it was migrated to a new platform, hosted by TSO.

Corine discusses some of the motivations behind the British Library’s drive to open its data, including publishing for others to use and opening up data to a wider audience. The BL felt that open data would also benefit staff by developing new skill sets. BNB was published as open data with a CC0 licence, allowing people to modify, adapt and reuse it freely: http://bnb.data.bl.uk/

The British Library created a process for opening its data, considering the design of the URIs to be used (so that they are readable by humans) and which concept vocabularies to use. The BL Terms RDF schema is available at http://www.bl.uk/schemas/bibliographic/blterms#

The BNB data model looks a bit complex, but it successfully maps out what the BL has used and how the data is referenced, be it subjects, authors, series or other unique identifiers: http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf

The BL has taken an event-based approach in its data model: publication is modelled as an event, with scope for future publication and out-of-print publishing events; birth and death are modelled as biographical events; and extensive use is made of foaf:focus to relate concepts to ‘things in the world’.
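A rough sketch of the shape this gives the data, expressed in Turtle and loaded with rdflib (the ex: names are placeholders rather than the BL’s actual terms; foaf:focus is the real FOAF property, used here to tie a concept to the real-world thing it stands for):

```python
# Sketch: an event-based description in the spirit of the BL's BNB model.
# All ex: names are illustrative placeholders, not the BL's actual terms.
from rdflib import Graph

turtle = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

# The book's publication is modelled as an event in its own right
ex:book1 ex:publication ex:event1 .
ex:event1 a ex:PublicationEvent ;
    ex:date "1883" ;
    ex:place ex:london .

# A subject concept is tied to the real-world person it stands for
ex:stevensonConcept foaf:focus ex:stevensonPerson .
"""

g = Graph()
g.parse(data=turtle, format="turtle")
print(len(g), "triples loaded")
```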

The BL has created a MARC-to-RDF conversion workflow that documents the process: select the records to be converted; determine the character set conversion (converting to pre-composed UTF-8); generate the URIs; quality-check using Jena Eyeball (http://jena.sourceforge.net/Eyeball); and create and load the RDF data sets.
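In miniature, the conversion step of such a workflow might look like this (a sketch using pymarc and rdflib; the input file, URI pattern and choice of Dublin Core terms are all assumptions, not the BL’s actual configuration):

```python
# Miniature sketch of a MARC 21 -> RDF conversion step. The input file,
# URI pattern and vocabulary are placeholders, not the BL's actual setup.
from pymarc import MARCReader
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/resource/")

g = Graph()
with open("bnb_sample.mrc", "rb") as fh:  # hypothetical sample file
    for record in MARCReader(fh):
        f001 = record.get_fields("001")  # control number used to mint a URI
        if not f001:
            continue
        uri = URIRef(EX[f001[0].data.strip()])
        if record.title:
            g.add((uri, DCTERMS.title, Literal(record.title)))
        if record.author:
            g.add((uri, DCTERMS.creator, Literal(record.author)))

g.serialize(destination="bnb_sample.ttl", format="turtle")
```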

The outcomes of the BL’s linked data work: two datasets (books and serials) have been created, along with a BNB linked data platform with a SPARQL endpoint http://bnb.data.bl.uk/sparql and a SPARQL editor http://bnb.data.bl.uk/flint, and a BNB downloader http://www.bl.uk/bibliographic/download.html that is updated monthly.
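To give a flavour of the endpoint, here is a minimal query for a handful of titles (a sketch using the SPARQLWrapper library, and assuming the endpoint is still live at that address and uses dct:title as in the BL data model):

```python
# Sketch: querying the BNB SPARQL endpoint for a few titles.
# Assumes the endpoint is still live; uses the SPARQLWrapper library.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://bnb.data.bl.uk/sparql")
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?book ?title
    WHERE { ?book dct:title ?title . }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["book"]["value"], "-", row["title"]["value"])
```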

The British Library plans to refine and extend the model, investigate FRBRization, link to further external sources such as DBpedia, GeoNames and DNB bibliographic resources, and expand the scope beyond the current BNB.

Peter McKeague, RCAHMS, takes to the stage to talk about SENESCHAL (Semantic ENrichment Enabling Sustainability of arCHAeological Links) on behalf of the SENESCHAL project team.

Peter discusses issues arising from the development, implementation and running of a linked data service, and also their plans for future developments.

The Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) http://www.rcahms.gov.uk along with its sister organisation in Wales (RCAHMW) and English Heritage are each responsible for a series of heritage thesauri and vocabularies. In Scotland, RCAHMS maintains the Scottish Monuments Thesaurus, and thesauri for archaeological objects and maritime craft.

Although these key thesauri are available online for reference, until now they have lacked the persistent Linked Open Data (LOD) URIs needed to allow them to act as vocabulary hubs for the Web of Data.

Peter discusses the drivers for Linked Data for cultural heritage and the historic environment and the creation of thesauri as Linked Data, published through http://www.heritagedata.org/blog/ before discussing the practicalities of implementing Linked Data products in a working environment.

http://www.heritagedata.org/blog/about-heritage-data/seneschal/

RCAHMS currently publishes its datasets on CANMORE http://www.rcahms.gov.uk/canmore.html but finds that linked data can further push interoperability between archaeology and other useful datasets; it also aligns with the mandate set out at government level that all datasets on the same subjects must be created, published and released using the same data structure.

Previous projects identified the same old issues in converging datasets: a level of data cleansing and unification is required before the published data can be aligned. SENESCHAL’s main remit is to make it easier to create linked data vocabularies and to enable knowledge exchange among different archaeological data sets and a wider global audience.

You can keep up  to date  with the project at: http://www.heritagedata.org/blog

Gordon Dunsire, Chair of the  IFLA Namespaces Technical Group discusses methods for publishing local metadata records as linked data, using the National Library of Scotland’s database of metadata records for digital and digitised resources as a case study.

Topics covered include developing database structures as element sets, extracting data and creating linked data triples, and creating links from local data to the global Semantic Web. The presentation will also include a demonstration system for a primitive linked data OPAC.

Gordon discusses how to get started in open data, taking a pick and mix approach to linked data, using global elements and ensuring that your local elements have the same scope as the elements you choose, so as not to degrade your data or confuse meaning.

Many element sets are available, from the general – Dublin Core, FOAF, SKOS – to the specific – BIBO, FRBR, ISBD, RDA, etc. Searchable registries such as the Open Metadata Registry, LOV or Joinup are also available for finding bibliographic element sets.

Gordon demonstrates the DIY approach with a case study of the NLS digital object database, used to publish the data: http://nlsdata.info/dod/elements/ This gives the NLS the opportunity to map to further external services such as DBpedia.

All the NLS data is stored as RDF triples, held in a MySQL triple store with online access to the data… we are now being treated to a live experimental demo of the triple store… fingers crossed…

So far so good: the demo successfully brings together digital images relating to the Battle of Passchendaele searched for within the NLS DOD, and pulls in DBpedia data too, enabling the user to leave the DOD dataset and begin ‘following your nose’, switching search interest along the way. Instead of a subject, we can link to GeoNames too and view maps of the area from Google Maps.
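‘Following your nose’ here just means dereferencing a URI in one dataset to pull in the next dataset’s triples. A minimal sketch with rdflib (assuming DBpedia still serves RDF for this resource via content negotiation):

```python
# Sketch of "following your nose": dereference a linked data URI and
# read off further links from the triples that come back.
from rdflib import Graph, URIRef

uri = URIRef("http://dbpedia.org/resource/Battle_of_Passchendaele")

g = Graph()
g.parse(uri)  # fetches RDF via content negotiation

# Print a few of the outgoing links we could follow next
for _, predicate, obj in list(g.triples((uri, None, None)))[:10]:
    print(predicate, "->", obj)
```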

Stopped for lunch, where I’ve just scoffed a coffee cake the same size as my head!

Everyone refreshed and happy after some ace sarnies and cakes and looking forward to Kate Byrne’s presentation on Can documents be Linked Data? from the  Edinburgh University School of Informatics

Kate begins with the observation that the semantic web is ancient, first referenced by Berners-Lee way back in 1994; in 2009 Tim further lamented that ‘the web as I envisaged it is not there yet…’

Awesome analogy using The Hobbit opening sentence to express semantic structure!

Plenty of great work is being done on extracting RDF from existing databases to create Linked Data that can be part of the semantic web. But there is a vast amount of information that is not in structured databases. Most of the information we use every day and curate in archives and libraries is in free text form: documents, books, web pages and so forth.

Kate describes some of the research into extracting structured data – which can be turned into Linked Data – from natural language text, and some of the issues likely to face anyone attempting this task; she also considers how to go about building connections between datasets to make them full members of the Web of Data. To read more about Kate’s research, visit http://homepages.inf.ed.ac.uk/kbyrne3/research.html
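As a taste of that extraction task, here is a minimal sketch using spaCy’s named entity recogniser (the model name and sentence are my own choices; real pipelines also need relation extraction, disambiguation and linking to authority URIs):

```python
# Sketch: pulling candidate structured data out of free text with spaCy's
# named entity recogniser. The model and sentence are illustrative choices;
# real pipelines need far more care (relations, disambiguation, linking).
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model, installed separately
doc = nlp("Robert Louis Stevenson was born in Edinburgh in 1850.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# A next step would be turning (PERSON, GPE, DATE) entities into RDF
# triples and linking them to authority URIs such as VIAF or DBpedia.
```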

Kate brings us back to what is the point of all of this and how it would be really cool to form links with many disparate datasets covering the same concepts or events.

Kate mentions some of the project tools she has used, such as the Unlock Text API run by EDINA http://edina.ac.uk/unlock/texts, the Google Ancient Places project (GAP) at http://googleancientplaces.wordpress.com and an online interface called GapVis at http://nrabinowitz.github.io/gapvis/

Gill Hamilton takes to the stage, having created a real snappy title for her presentation – LO(d) and Behold!: extending to the Giant Global Graph (catalogue that!). Gill is the Digital Access Manager at the National Library of Scotland and outlines some of the challenges, along with tools, tips and techniques for linking local library linked data to other linked data sets such as DBpedia (an RDF representation of Wikipedia), Library of Congress Subject Headings and GeoNames. Gill discusses the idea that one connection exists for all time, and how link after link can build a richer landscape of ‘things’.

Gill explains the different ways we can match data – by large-scale numeric processes and by matching strings – and that the big issue is moving from the string to the thing.

Where LCSH, the Getty thesauri and other vocabularies are used, this is recorded in the DOD, and it allows the NLS to make an exact match between the DOD and controlled vocabularies such as LCSH; this in turn can be expressed as an exact match between the DOD triple and the LCSH triple. The Getty thesauri aren’t available as linked data… yet, but real soon now!
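In SKOS terms, that string-to-thing match is typically asserted as a skos:exactMatch between the two concept URIs. A tiny sketch with rdflib (both URIs below are placeholders standing in for a DOD concept and its LCSH counterpart):

```python
# Sketch: asserting a string-to-thing link with skos:exactMatch.
# Both URIs are placeholders standing in for a DOD concept and the
# corresponding LCSH concept.
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

dod_concept = URIRef("http://example.org/dod/concepts/worldWar1")
lcsh_concept = URIRef("http://id.loc.gov/authorities/subjects/shXXXXXXX")  # placeholder

g = Graph()
g.add((dod_concept, SKOS.exactMatch, lcsh_concept))

print(g.serialize(format="turtle"))
```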

The great opportunity and one of the benefits of making these connections and links is that once you have linked your local to a global dataset you are then linked forever.

The work to achieve this paradise is resource-heavy, but it’s mainly about maintaining momentum and enabling staff to dip in and out when they can. It’s the beginning of a crowd-sourced way of working, debating what is the nearest or exact match… in the same guise as something like Project Gutenberg in its heyday. This is an excellent example of getting all things Scottish up on the global data graph!

Gordon Dunsire sums up the day by stating that the data cloud is now so big it’s not as useful as it once was, but using a service such as LOV (lov.okfn.org/dataset/lov) allows you to link via the graph to the meta-metadata of the vocabularies.

Given the nature and ideology of the semantic web and linked data, Gordon warns us not to believe everything we link: if we can’t identify the authorities, we can be lost, and authority control can give us a little of that trust back. There needs to be a declaration of the authority of the vocabulary. LOV highlights who’s using a particular vocab, but not its authority – it’s neutral. But if it’s clear that a vocab is maintained by an individual or an organisation, authority and trust will generally ensue.

It’s a paradigm shift, and the cataloguer’s point of view will fundamentally change – the underlying principles are understood by cataloguers, but the way they will work in the modern environment is not – and cataloguers need to engage with and influence the new age.

Gordon tells the audience about the Joinup semantic assets – a European Commission project: https://joinup.ec.europa.eu/asset/all The project allows you to download the schemas and import the Open Metadata Registry elements.

Persistence is fundamental, but no persistence is guaranteed (this should be emblazoned on a t-shirt). Gordon further backs up the day’s presenters, saying that one link is all it takes to make it work – but we don’t know which links are the correct ones, so use loads of them, but beware, and go to trusted places to find them.