---
layout: default
nav_active: data
title: Webis Data
description: Overview of corpora that are used by the Webis research group
---
<nav class="uk-container">
<ul class="uk-breadcrumb">
<li><a href="{{ '/' | relative_url }}">Webis.de</a></li>
<li class="uk-disabled"><a href="#">Data</a></li>
</ul>
</nav>
<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "DataCatalog",
"name": "Webis Data",
"description": "Overview of corpora that are used by the Webis research group.",
"url": "https://webis.de/data.html",
"keywords": [
"webis",
"data",
"corpora",
"corpus"
],
"author": [
{
"@type": "Organization",
"url": "https://webis.de/",
"name": "The Web Technology & Information Systems Network",
"alternateName": "Webis"
}
]
}
</script>
<main class="uk-section uk-section-default">
<div class="uk-container">
<h1>Data</h1>
<ul class="uk-list">
<li><span data-uk-icon="chevron-down"></span> <a href="#released-webis-corpora">Released Webis Corpora</a>
</li>
<li><span data-uk-icon="chevron-down"></span> <a href="#pan-corpora">PAN Corpora</a></li>
<li><span data-uk-icon="chevron-down"></span> <a href="#touche-corpora">Touché Corpora</a></li>
<li><span data-uk-icon="chevron-down"></span> <a href="#internal-webis-corpora">Internal Webis Corpora</a>
</li>
<li><span data-uk-icon="chevron-down"></span> <a href="#other-corpora">Other Corpora</a></li>
</ul>
</div>
<div class="uk-container uk-margin-medium">
<p>
This page lists all corpora that have resulted from or been used in our research. For users outside Webis, their availability is as follows:
(1) corpora that have been officially released by Webis can be downloaded here,
(2) corpora of the PAN and Touché series can be downloaded here,
(3) internal Webis corpora (which will be officially released in the future) are supplied upon request,
(4) other corpora can be downloaded from their original publisher/creator via the provided links.
</p>
<p>
Most of our released corpora are hosted at
<a title="Download: Zenodo" href="https://zenodo.org/communities/webis">Zenodo <img src="{{ '/data/img/zenodo-icon.png' | relative_url }}" alt="(Zenodo)"></a> and are indexed in the <a title="Indexed: Google" href="https://toolbox.google.com/datasetsearch/search?query=webis-data-catalog">Google Dataset Search <img src="{{ '/data/img/google-icon.png' | relative_url }}" alt="(Google Dataset Search)"></a>;
a few larger corpora are available in the <a title="Internet Archive" href="https://archive.org/details/webis">Internet Archive <img src="{{ '/data/img/ia-icon.png' | relative_url }}" alt="(Internet Archive)"></a>;
some corpora are accessible via the <a title="Hugging Face" href="https://huggingface.co/webis">Hugging Face <img src="{{ '/data/img/huggingface-icon.png' | relative_url }}" alt="(Huggingface)"></a> and <a title="IR datasets" href="https://ir-datasets.com/index.html">IR datasets <img src="{{ '/data/img/ir-icon.png' | relative_url }}" alt="(ir_datasets)"></a> libraries;
the <img src="{{ '/data/img/browser-icon-magnifier.png' | relative_url }}" alt="Browser"> symbol indicates a browsing facility for the respective corpus.
</p>
<div id="search-control">
<input type="text" class="uk-input" id="filter-field" placeholder="Type here to filter…"/>
</div>
</div>
<div class="uk-container uk-margin-medium webis-data-table">
{% include bib-data.html %}
<div id="filtered-all-message" class="uk-hidden uk-text-muted" aria-hidden="true">
None of our corpora match your filter.
</div>
</div>
</main>
<script src="https://assets.webis.de/js/thirdparty/jquery/jquery.slim.min.js"></script>
<script src="https://assets.webis.de/js/thirdparty/fontawesome/fontawesome.min.js"></script>
<script src="https://assets.webis.de/js/thirdparty/fontawesome/solid.min.js"></script>
<script src="https://assets.webis.de/js/filter.js"></script>
<script src="https://assets.webis.de/js/selection.js"></script>
<script src="https://assets.webis.de/js/tables.js"></script>
<script>
document.addEventListener('DOMContentLoaded', () => {
const tables = document.querySelectorAll('.targetable');
// Initialize sorting and filtering
initWebisDataFiltering(tables);
// Apply default sorting by 'Name' column
sortTablesByName(tables);
});
</script>