Reusing Existing Data
Can I use someone else’s data?
The short answer is yes!
One of the ideals of open data is that data should be as freely and openly accessible as possible, to encourage its reuse. Data can be expensive and time-consuming to collect, so reusing existing data where possible makes efficient use of available resources and promotes collaboration. Funders view data reuse favourably. The ESRC (Economic and Social Research Council), for example, requires researchers to justify the collection of new data as part of their funding application. Therefore, before collecting research data, you should consider whether any suitable data already exists that you could reuse.
(One caveat: if you're a doctoral student, your study programme may require you to collect your own data. Please consult your supervisor.)
If you’re planning to use someone else’s data, there are several things to consider. First, you need to understand the scope and limitations of the data: for example, when it was collected and whether it can be used to answer your research question. Does the associated metadata (information describing the data) give you enough information to make that judgement? Second, you need to check the licence under which the data has been released. This ensures that you can use and share the data as you intend.
Where can I find existing data to reuse?
There is a variety of resources to explore for secondary datasets, whether you are looking for new data for a study, verifying your own data, calibrating models or preparing teaching materials.
Where you look may depend on the type of data you need. Options range from general-purpose and subject-specific repositories to repository directories, aggregators, portals and search engines. Some examples are listed below (this is by no means an exhaustive list).
Don’t forget to make use of any personal contacts as well, including academic colleagues and supervisors!
Within research articles themselves:
It’s worth highlighting that you may also find datasets through the relevant publications themselves. Publishers are increasingly pushing authors to include information about how to access the underlying research data (for example, in a data availability statement). So when exploring related literature (e.g. journal articles), look for details of how to access the data (e.g. a DOI or other links) and how it may be used.
Data journals:
Another avenue for finding (and, for that matter, publishing) data is data journals. These are journals that publish and share datasets for others to access and reuse. The figure below from Candela et al. (2015) shows the concept of data journals.
Data Journal examples:
- Data in Brief (Elsevier)
- Earth System Science Data (ESSD) (Copernicus Publications)
- Geoscience Data Journal (Wiley)
- Scientific Data (Springer Nature)
- Ubiquity Press Metajournals, covering archaeology, humanities, psychology, open software, health data and bioresources.
Data repositories:
Data repositories are national and international online databases that contain research data. Typically, this data can be downloaded and reused.
The Registry of Research Data Repositories (re3data.org) is a good place to start looking for data across a variety of subject areas, as it provides a global registry of data repositories from different academic disciplines. To use the re3data.org website (below), select Browse from the drop-down menu and then click on your area of interest on the wheel.
Some examples of subject-orientated repositories are:
- NERC Data Centres (environmental data) - Five data centres covering all aspects of environmental science.
- Pangaea (Earth & Environmental Science) - Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research.
- UK Data Service ReShare (economic and social research data) - The UK's largest collection of social, economic and population data for research and teaching purposes covering a range of different disciplines.
Library subject pages:
The Library’s subject pages are a useful place to start.
Other places to look for research data:
Other databases, repositories and search services you may like to explore include:
- FAIRsharing - A catalogue of databases, described according to the BioDBcore guidelines, along with the standards used within them.
- EU Open Data Portal - Data includes geographic, geopolitical and financial data, statistics, election results, legal acts, and data on crime, health, the environment, transport and scientific research.
- DataCite Metadata Search - DataCite gathers metadata for each DOI assigned to an object. The metadata is used for a large index of research data that can be queried directly to find data, obtain stats and explore connections.
- DataONE Search - Earth observational data.
- DataSearch (Elsevier) - Search for research data across domains and types, from many domain-specific, cross-domain and institutional data repositories.
- Registry of Open Data on Amazon Web Services - Discover datasets that are available via AWS resources.
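Because DataCite metadata "can be queried directly", it is also possible to search it from a script rather than the web interface. The sketch below assumes the public DataCite REST API at https://api.datacite.org/dois, its `query`, `resource-type-id` and `page[size]` parameters, and its JSON:API response shape (a `data` list whose items carry `attributes` such as `doi`, `titles`, `publisher` and `rightsList`). Check the current DataCite API documentation before relying on these. The helper names (`build_search_url`, `parse_results`, `search_datasets`) are hypothetical. The licence field is extracted because, as noted above, you should always check how a dataset may be reused.

```python
import json
import urllib.parse
import urllib.request

# Assumed public DataCite REST API endpoint; verify against current DataCite docs.
DATACITE_API = "https://api.datacite.org/dois"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a DataCite search URL restricted to records typed as datasets (hypothetical helper)."""
    params = {
        "query": query,                  # free-text search
        "resource-type-id": "dataset",   # assumed filter: only records typed as datasets
        "page[size]": page_size,         # number of records per page
    }
    return DATACITE_API + "?" + urllib.parse.urlencode(params)


def parse_results(payload: dict) -> list[tuple[str, str, str, str]]:
    """Extract (DOI, title, publisher, licence) tuples from a DataCite JSON:API response."""
    results = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        # Use the first title if present, otherwise a placeholder.
        titles = attrs.get("titles") or [{}]
        title = titles[0].get("title", "(untitled)")
        # Publisher is usually a string, but may be an object in some API versions.
        publisher = attrs.get("publisher") or ""
        if isinstance(publisher, dict):
            publisher = publisher.get("name", "")
        # Licence information, if the depositor supplied any.
        rights = attrs.get("rightsList") or [{}]
        licence = rights[0].get("rights", "")
        doi = attrs.get("doi") or record.get("id", "")
        results.append((doi, title, publisher, licence))
    return results


def search_datasets(query: str, page_size: int = 10) -> list[tuple[str, str, str, str]]:
    """Run the search against the live API (requires network access)."""
    url = build_search_url(query, page_size)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_results(json.load(response))
```

For example, `search_datasets("river discharge", 5)` would return up to five (DOI, title, publisher, licence) tuples, which you can then follow up via the DOI links. Separating URL construction and parsing from the network call keeps those parts easy to check without an internet connection.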
General-purpose repositories: Dryad, Figshare, Zenodo - Repositories where you can find data from a wide variety of subject areas.
Government websites in many countries are a source of public data, for example: data.gov.uk (UK), Data.gov (USA), data.gov.au (Australia), data.gouv.fr (France).
Services: Statista (Campus License) - Statistics and data within 600 industries and 50+ countries.
Google Dataset Search - Dataset Search enables users to find data sets stored across the web by way of a simple keyword search.