Syntactic dependency parser for French with spaCy
This repository contains scripts that fetch and prepare data for training a syntactic dependency parser for French with spaCy, along with a configuration file and a script to train it. The trained model itself is available under Releases.
The training data is an aggregation of three UD (Universal Dependencies) datasets, with some minor changes applied to them.
In the datasets I used, the contraction du is split into its logical components de and le: a text like on parle du ciel becomes on parle de le ciel in the .conllu
files. But in the texts I have to analyze, du isn't split at all, so I need to un-split it. Thus the following:
11-12 du ... _ _ _ _
11 de ... 19 case _ _
12 le ... 11 det _ _
is transformed into:
11 du ... 19 case:det _ _
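The un-splitting can be sketched roughly as below. This is a minimal illustration of the idea, not the repository's actual script: it assumes a multi-word token range covers exactly two tokens and it does not renumber subsequent token IDs or remap heads that pointed at the removed ID.

```python
def unsplit_mwt(conllu_lines):
    """Merge a two-token multi-word range (e.g. '11-12  du') back into a
    single token, joining the deprels of its parts with ':'.

    Sketch only: assumes 10-column CoNLL-U lines, ranges of exactly two
    tokens, and skips the ID renumbering a real script would need."""
    out = []
    i = 0
    while i < len(conllu_lines):
        cols = conllu_lines[i].split('\t')
        if '-' in cols[0]:  # range line like '11-12  du'
            first = conllu_lines[i + 1].split('\t')
            second = conllu_lines[i + 2].split('\t')
            merged = first[:]            # keep the first part's columns
            merged[1] = cols[1]          # surface form, e.g. 'du'
            merged[7] = first[7] + ':' + second[7]  # 'case' + 'det' -> 'case:det'
            out.append('\t'.join(merged))
            i += 3                       # skip the two merged parts
        else:
            out.append(conllu_lines[i])
            i += 1
    return out
```

On the example above, the three lines for 11-12 du / 11 de / 12 le collapse into one line whose deprel is case:det and whose head is that of de.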
On top of that, some labels are replaced by others, and sentences containing certain labels (such as dep,
which indicates that the parsing failed) are removed. For the list of replaced and removed labels, refer to the file lookup_labels.txt.
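The label cleanup amounts to a filter plus a lookup, as in the sketch below. The REPLACE mapping and DROP set here are placeholders for illustration, not the actual contents of lookup_labels.txt.

```python
# Placeholder tables -- the real mappings live in lookup_labels.txt.
REPLACE = {'obl:arg': 'obl'}   # hypothetical example of a replaced label
DROP = {'dep'}                 # sentences containing these labels are removed

def clean_sentences(sentences):
    """Filter and relabel sentences.

    `sentences` is a list of sentences, each a list of (form, deprel)
    pairs. A sentence containing any label in DROP is removed entirely;
    in the remaining sentences, labels found in REPLACE are rewritten."""
    kept = []
    for sent in sentences:
        if any(deprel in DROP for _, deprel in sent):
            continue  # e.g. 'dep' means the original parse failed
        kept.append([(form, REPLACE.get(deprel, deprel))
                     for form, deprel in sent])
    return kept
```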
The parser is not a full pipeline: you have to source it from another pipeline as a component:
import spacy
# load your main pipeline
nlp = spacy.load('fr_core_news_sm', exclude=['parser'])
# load the model containing the parser
nlp_deps = spacy.load('./model', exclude=['tokenizer'])
# put the parser in the main pipeline
nlp.add_pipe('parser', source=nlp_deps)