alexeyev/RuConceptNet

/ru/ConceptNet

Extraction scripts for the Russian part of ConceptNet 5.7, plus a fast API object for accessing the relations. Note: with a simple modification, the preprocessing script can build a queryable graph of any other subset of ConceptNet.

Requires Python 3.10+.

Installation

pip install ruconceptnet

Usage

>>> from ruconceptnet import ConceptNet
>>> cn = ConceptNet()
>>> cn.get_targets("алкоголь")
[('этиловый_спирт', {'Synonym'}), ('спиртной_напиток', {'Synonym'}), ('алкогольный', {'RelatedTo'}), 
('алкоголик', {'RelatedTo'}), ('спирт', {'Synonym'}), ('алкоголизация', {'RelatedTo'})]

>>> cn.get_sources("йога")
[('йоги', {'FormOf'}), ('йогу', {'FormOf'}), ('йогический', {'RelatedTo'}), ('йогою', {'FormOf'}), 
('йогой', {'FormOf'}), ('йог', {'RelatedTo'}), ('йоге', {'FormOf'})]

>>> cn.check_pair("человек", "зверь")
(['DistinctFrom'], [])

>>> cn.check_pair("зверь", "человек")
([], ['DistinctFrom'])
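
The return shapes above are easy to post-process. The sketch below is not part of the package: `targets_with_relation` and `directed_edges` are hypothetical helpers, and the reading of the `check_pair` tuple (first list is first argument to second, second list is the reverse direction) is an assumption inferred from the two examples above. It works on sample data copied from those outputs, so it runs without loading the graph.

```python
def targets_with_relation(edges, relation):
    """Keep only the nodes linked through `relation`.

    `edges` is assumed to look like the get_targets / get_sources output
    shown above: a list of (node, set_of_relation_names) tuples.
    """
    return [node for node, relations in edges if relation in relations]


def directed_edges(a, b, pair_result):
    """Expand a check_pair-style result into (source, relation, target) triples.

    Assumption inferred from the examples above: the first list holds
    relations a -> b, the second list holds relations b -> a.
    """
    forward, backward = pair_result
    return [(a, rel, b) for rel in forward] + [(b, rel, a) for rel in backward]


# Sample data copied from the outputs above (no data files needed here).
sample_targets = [
    ("этиловый_спирт", {"Synonym"}),
    ("спиртной_напиток", {"Synonym"}),
    ("алкогольный", {"RelatedTo"}),
    ("алкоголик", {"RelatedTo"}),
    ("спирт", {"Synonym"}),
    ("алкоголизация", {"RelatedTo"}),
]

synonyms = targets_with_relation(sample_targets, "Synonym")

# Both argument orders should describe the same directed edge.
edges_a = directed_edges("человек", "зверь", (["DistinctFrom"], []))
edges_b = directed_edges("зверь", "человек", ([], ["DistinctFrom"]))

print(synonyms)
print(edges_a == edges_b)
```

With a loaded graph, the same helpers could be applied directly to `cn.get_targets(...)` and `cn.check_pair(...)` results.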

Edge weights

Every relation carries ConceptNet's weight (assertion confidence). Pass with_weights=True to get {relation: weight} mappings instead of plain sets:

>>> cn.get_targets("алкоголь", with_weights=True)
[('спирт', {'Synonym': 2.0}), ('алкоголизм', {'RelatedTo': 3.5}), ...]

>>> cn.check_pair("человек", "зверь", with_weights=True)
({'DistinctFrom': 0.5}, {})
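
Weighted results can be ranked by confidence. The following is a minimal sketch, assuming the `with_weights=True` output has the list-of-`(node, {relation: weight})` shape shown above; `strongest_first` is a hypothetical helper, not a package function, and the sample list reuses only the two entries visible in the example output.

```python
def strongest_first(weighted_edges):
    """Sort (node, {relation: weight}) tuples by their highest weight, descending.

    An empty relation mapping is treated as weight 0.0 (an assumption made
    only to keep the helper safe on unexpected input).
    """
    return sorted(
        weighted_edges,
        key=lambda item: max(item[1].values(), default=0.0),
        reverse=True,
    )


# The two entries visible in the example output above.
sample_weighted = [
    ("спирт", {"Synonym": 2.0}),
    ("алкоголизм", {"RelatedTo": 3.5}),
]

ranked = strongest_first(sample_weighted)
print([node for node, _ in ranked])
```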

When several assertions share the same (source, target, relation), the strongest (maximum) weight is kept. Data built before weights were added reports 1.0 for every edge.
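
The keep-the-maximum rule can be illustrated in a few lines. This is only a sketch of the rule as described above, not the package's preprocessing code; `merge_assertions` is a hypothetical name, and the node names and weights in the sample are made up for illustration.

```python
def merge_assertions(assertions):
    """Collapse duplicate (source, target, relation) triples, keeping the max weight.

    `assertions` is an iterable of (source, target, relation, weight) tuples.
    """
    merged = {}
    for source, target, relation, weight in assertions:
        key = (source, target, relation)
        # Keep the strongest weight seen so far for this triple.
        merged[key] = max(weight, merged.get(key, weight))
    return merged


# Made-up sample: the first and third rows describe the same triple.
raw = [
    ("a", "b", "RelatedTo", 1.0),
    ("a", "c", "Synonym", 2.0),
    ("a", "b", "RelatedTo", 3.5),
]

merged = merge_assertions(raw)
print(merged)
```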

Preparations for customization

Please see the prepare_data.sh script. It extracts the Russian-Russian node pairs with a simple grep and builds a 3-dimensional array indexed by (source, target, relation), which is stored as a single sparse SciPy matrix.
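
SciPy sparse matrices are 2-dimensional, so a (source, target, relation) index has to be folded into a row and a column. The sketch below shows one plausible folding, with the relation id packed into the column index. This layout is an assumption made for illustration; the repository's actual layout may differ, and `encode`, `decode` and `n_relations` are hypothetical names.

```python
def encode(source_id, target_id, relation_id, n_relations):
    """Fold a 3-D index into 2-D (row, col) coordinates.

    Assumed layout: each target owns a block of `n_relations` adjacent columns.
    """
    if not 0 <= relation_id < n_relations:
        raise ValueError("relation_id out of range")
    return source_id, target_id * n_relations + relation_id


def decode(row, col, n_relations):
    """Invert `encode`: recover (source_id, target_id, relation_id)."""
    target_id, relation_id = divmod(col, n_relations)
    return row, target_id, relation_id


n_relations = 4  # made-up number of relation types
coords = encode(source_id=7, target_id=3, relation_id=2, n_relations=n_relations)
restored = decode(*coords, n_relations=n_relations)
print(coords, restored)

# Coordinates like these, together with the weights, could then be passed to
# scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n_nodes, n_nodes * n_relations)).
```

Under this layout, all outgoing edges of a node sit in one sparse row, so a lookup by source only touches that row.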

Development

# install the package together with the development dependencies
pip install -e ".[dev]"

# run the test suite with coverage (threshold enforced at 80%)
pytest

# lint and format
ruff check .
ruff format .

# optional: enable the git hooks
pre-commit install

Citing

Please do not forget to cite the ConceptNet5 paper.

@inproceedings{10.5555/3298023.3298212,
  author = {Speer, Robyn and Chin, Joshua and Havasi, Catherine},
  title = {ConceptNet 5.5: An Open Multilingual Graph of General Knowledge},
  year = {2017},
  publisher = {AAAI Press},
  booktitle = {Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence},
  pages = {4444–4451},
  numpages = {8},
  location = {San Francisco, California, USA},
  series = {AAAI'17}
}

Citing this repository is not required, but it is greatly appreciated if you use this work.

@misc{ruconceptnet2020alekseev,
  title     = {{alexeyev/RuConceptNet: /ru/ConceptNet5.7 Python wrapper }},
  year      = {2020},
  url       = {https://github.com/alexeyev/RuConceptNet},
  language  = {english}
}

License

The code is released under the MIT license (please see the LICENSE file).

This work includes a subset of the data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 3.0) from http://conceptnet.io.

The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.

The complete data in ConceptNet is available under the Creative Commons Attribution-ShareAlike 4.0 license.

For more details, please see "Copying and sharing ConceptNet".
