[linux-elitists] [CHMINF-L] Open Data (fwd from pm286@CAM.AC.UK)

Eugen Leitl eugen@leitl.org
Sun Apr 17 02:07:56 PDT 2005

----- Forwarded message from Peter Murray-Rust <pm286@CAM.AC.UK> -----

From: Peter Murray-Rust <pm286@CAM.AC.UK>
Date: Sun, 17 Apr 2005 00:57:53 +0100
Subject: [CHMINF-L] Open Data

Several recent, apparently disconnected, postings emphasise that we are at
a critical time in Chemical Information/informatics, one that could prove
some mixture of exciting and profitable, or stressful and acrimonious. Henry Rzepa
and I have been asked to write on the relationship of chemistry and 
bioinformatics in the Open Access journal BMC Bioinformatics. A key point 
that we are making is that whereas essentially all bioinformatics data is 
Open, almost all chemistry data is either closed or destroyed before or 
during publication.

The major forces for change include:
* Open Access publishing
* New technology (InChI, XML/CML, DSpace/Eprints, Google/MSN)
* Bioscience demand for chemical information
* Open source software
* Globalisation

In my view these make the Opening of Chemical information inevitable.

I'd like to propose the concept of Open Data (analogous to Open Access). I 
have made cursory Web searches for the term, but it seems to be in infrequent
use. I'd be pleased to hear from anyone who can point to systematic use.
"Open Data" is formally implied by the Budapest Open Access Initiative (BOAI)
and other OA initiatives, and is epitomised by bioinformatics data: the right to use and
redistribute data without permission, whilst honouring the originator. In 
theory the BOAI mandates Open Data but in practice it does not deliver it. 
Even if we achieve 100% "Green" Open Access it will not provide Open Data 
and separate licences such as Creative Commons are required.

It is now *technically* possible for a chemical author to publish all their 
data directly into an Open institutional repository, where it is reliably 
indexed by modern search engines without the need for human secondary 
abstracters. The barriers to this are cultural, and include lack of support 
(and sometimes opposition) from primary and secondary publishers. I
applaud the work of Dr. Karthikeyan and urge list members to visit his 
site. He has shown that a large number of theses have been systematically 
deposited in an Open repository where they can be indexed and searched by 
modern techniques. The chemical structural data can be converted into InChI 
and we have recently shown that Google or MSN can search for molecules with 
virtually 100% recall/precision [OBC in press, details on 
http://wwmm.ch.cam.ac.uk/inchifaq]. In a similar vein, the UK National
Crystallography Service at Southampton publishes its results into an
Open eprints repository, which we have shown to be indexed by Google and
MSN. In our own work we have published 250,000 molecular structures,
optimised by semi-empirical calculations, on the Web and in our repository.

All of this provides a modern Open approach for abstracting and searching 
for exact chemical structures without the need for centralised 
repositories. If chemists were to start depositing their theses and data
in repositories, this could provide a resource to mirror the current
bioinformatics ones.

We have used our repository to make available our publications on the 
topic, where the publishers allow. A selection:
* A recent 10-min presentation to the UK JISC funding body: 
* Presentation at the ACS meeting 2005-03 on the future of scientific 
publishing: http://wwmm.ch.cam.ac.uk/presentations/acs2005
* Data associated with a publication in the RSC's OBC: 


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076

----- End forwarded message -----
Eugen* Leitl http://leitl.org
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net