Open Science and research data
An article from Wiki URFIST.
Version of 17 June 2020, 14:06
Original in French: Open Science et données de la recherche
While the issue of open access to scientific publications (Open Access) is about twenty years old, the debate today concerns access to the data themselves, that is, the sharing of research data. What are the reasons for this change of scale, and what is at stake? The stakes are scientific, but also economic and legal. But first, what are we talking about? What exactly are research data? We will see that there are several kinds, each of which raises specific questions. Finally, we will consider the consequences of this new issue for the researcher's own activity, and the question of Data Management Plans (DMP).

First approach: a research project and its data
case study (Workshop by Yvette Lafosse and Françoise Cosserat)
What is "research data"?
definitions (data and validation)
"In the context of these Principles and Guidelines, “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.
This term does not cover the following: laboratory notebooks, preliminary analyses, and drafts of scientific papers, plans for future research, peer reviews, or personal communications with colleagues or physical objects (e.g. laboratory samples, strains of bacteria and test animals such as mice). Access to all of these products or outcomes of research is governed by different considerations than those dealt with here.
These Principles and Guidelines are principally aimed at research data in digital, computer-readable format. It is indeed in this format that the greatest potential lies for improvements in the efficient distribution of data and their application to research because the marginal costs of transmitting data through the Internet are close to zero. These Principles and Guidelines could also apply to analogue research data in situations where the marginal costs of giving access to such data can be kept reasonably low."
The definition given by the Australian National Data Service:
"Research Data: Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational. Data includes: laboratory notebooks; field notebooks; primary research data (including research data in hardcopy or in computer readable form); questionnaires; audiotapes; videotapes; models; photographs; films; test responses."
"At the very least, we implicitly agree on the following idea: when we talk about "research data", we mean figures, readings, measurements, results of experiments, responses to surveys, statistics, counts, and other quantitative data on the basis of which a hypothesis will be developed, and/or which will be used to invalidate or validate this hypothesis... in short, essentially quantitative data, which can be processed, sorted, exploited, visualized in a homogeneous manner. The publication of such data is already part, at least in some disciplines, of the canons of scientific article writing (for example, the "Materials and methods" section in the recommendations for writing articles in medical journals)".
the different types of data
from the published paper back to the data (reverse engineering): embedded data > underlying data > raw data...
purposes of sharing research data
- validation (reproducible science)
- reuse (cumulative science)
Issues and Context:
- back to the Science 2.0 training
Open Science
“Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” (Michael Nielsen)
related concepts
- "sciences"
- e-science: e-science / digital
- "e-documentation" (cf. open access)
- granularity issues
- computerization of scientific activity (cf. big data): "ideas like recursion, parallelism and abstraction taken from computer science will redefine modern science. Implicit in the idea of a fourth paradigm is the ability, and the need, to share data. In sciences like physics and astronomy, the instruments are so expensive that data must be shared. Now the data explosion and the falling cost of computing and communications are creating pressure to share all scientific data." (John Markoff)
- open access [ fr ]
- Science 2.0
- Open Science [ fr ]
- data
- Open data
- Big data
- The end of privacy?
- Web squared? / Tim O'Reilly and John Battelle (2009) (cf. Fred Cavazza)
- The end of theory / Chris Anderson (2008): "The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."
- Web of data / Linked Data / Semantic Web: "The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a web of data that can be processed directly or indirectly by machines." Tim Berners-Lee (2000)
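To make the idea of a "web of data that can be processed directly or indirectly by machines" more concrete, here is a minimal sketch in Python. It assumes that each fact is stored as a (subject, predicate, object) triple, which is the general shape used by Linked Data. All identifiers (the `ex:` names, the dataset, the article) and the helper functions `match` and `datasets_behind` are hypothetical and invented for this illustration. It is not an implementation of any particular standard or library.

```python
# Hypothetical mini "web of data": each fact is a (subject, predicate, object) triple.
# Every identifier below is invented for illustration only.
TRIPLES = [
    ("ex:dataset42", "ex:title", "Survey responses 2019"),
    ("ex:dataset42", "ex:underlies", "ex:article7"),
    ("ex:article7", "ex:author", "ex:researcherA"),
    ("ex:dataset42", "ex:license", "ex:open-licence"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def datasets_behind(triples, article):
    """Follow the 'ex:underlies' link from an article back to its datasets."""
    return [s for (s, _, _) in match(triples, predicate="ex:underlies", obj=article)]


if __name__ == "__main__":
    # A machine can now answer: which data underlie this article?
    print(datasets_behind(TRIPLES, "ex:article7"))
```

Because the facts are explicit and uniform, a program can go from a published article back to the data behind it without having to interpret free text.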
Open Science
Open Science Monitor: http://ec.europa.eu/research/openscience/index.cfm?pg=home&section=monitor
validation crisis
- an example: "Le Watergate du clonage" (the Watergate of cloning)
- an observation: "La fraude scientifique est plus répandue qu’on le croit" (scientific fraud is more widespread than one thinks)
- experimental science and validation: > Balibar
"The rule is to describe one's work with sufficient precision so that someone else can understand it in all its details, reproduce it, verify, confirm or refute it."
legal and regulatory context
COUNTRY DEPENDENT
disciplinary variations
Very roughly speaking, in the Humanities and Social Sciences the purpose of "reuse" outweighs the purpose of "validation". > Text mining
Digital Humanities
- Huma-Num: "Huma-Num is a very large research infrastructure (TGIR) aimed at facilitating the digital turn of research in the humanities and social sciences."
- "Text mining : quand le texte devient donnée" (text mining: when text becomes data), Emeline Mercier (2015)
- an example: Mapping the Republic of Letters
http://web.stanford.edu/group/toolingup/rplviz/
- Linkurious and the "Panama Papers"
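As a small illustration of "text mining: when text becomes data", the sketch below turns a short passage into a table of word counts. The sample sentence and the function name `term_frequencies` are assumptions made for this example. Real text-mining work would add steps such as language-aware tokenization and stop-word filtering, which are left out here.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this illustration.
SAMPLE = "Open data and open science: data shared, data reused."


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    counts = term_frequencies(SAMPLE)
    # Print the words from most to least frequent.
    for word, count in counts.most_common():
        print(word, count)
```

Once the text is reduced to counts like these, it can be sorted, compared and visualized like any other quantitative data, which is what makes reuse the dominant purpose in these disciplines.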
disciplinary cultures (some examples)
Managing and sharing our data
> Should we share our data? (Faut-il partager ses données?)
DMP
- The Data Management Plan
- Organizing and describing
- Storage and conservation
- Sharing and dissemination