Sequence labeling models are widely used in many NLP tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation. State-of-the-art sequence labeling models mostly combine a CRF output structure with neural input word features. LSTMs (or bidirectional LSTMs) are popular deep learning feature extractors for sequence labeling, and CNNs can also be used for faster computation. In addition, features within a word are useful for representing it; these can be captured by a character LSTM, a character CNN, or handcrafted neural features.
NCRF++ is a PyTorch-based framework with flexible choices of input features and output structures. The design of a neural sequence labeling model in NCRF++ is fully configurable through a configuration file and requires no code changes. NCRF++ can be regarded as a neural network version of CRF++, the well-known statistical CRF framework.
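For illustration, a configuration file might look like the following. This is a sketch only: the parameter names are based on the sample configurations shipped with the toolkit and may differ across versions, so check the repository's sample config for the authoritative list.

```
### I/O (paths are placeholders) ###
train_dir=sample_data/train.bmes
dev_dir=sample_data/dev.bmes

### Network configuration ###
use_crf=True
use_char=True
word_seq_feature=LSTM
char_seq_feature=CNN
word_emb_dim=50
char_emb_dim=30
```

Switching, say, the word-level extractor from an LSTM to a CNN is then a one-line change to `word_seq_feature`.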
This framework was accepted by ACL 2018 as a demonstration paper, and a detailed experimental report and analysis using NCRF++ was accepted at COLING 2018 as the best paper.
NCRF++ supports different structure combinations on three levels: character sequence representation, word sequence representation, and inference layer.
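The inference layer of such models typically performs Viterbi decoding over CRF scores to find the best tag sequence. A minimal stdlib-only sketch of the idea (simplified: emission and transition scores are plain Python lists rather than the toolkit's actual tensors):

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence.

    emissions: per-token lists of per-tag scores.
    transitions: transitions[i][j] is the score of moving
    from tag i to tag j.
    Returns (best tag path, its score).
    """
    num_tags = len(emissions[0])
    # scores[j]: best score of any path ending in tag j so far
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, bp = [], []
        for j in range(num_tags):
            best_prev, best_score = max(
                ((i, scores[i] + transitions[i][j]) for i in range(num_tags)),
                key=lambda t: t[1],
            )
            new_scores.append(best_score + emit[j])
            bp.append(best_prev)
        scores = new_scores
        backpointers.append(bp)
    # Backtrack from the best final tag
    best_tag = max(range(num_tags), key=lambda j: scores[j])
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = bp[best_tag]
        path.append(best_tag)
    path.reverse()
    return path, max(scores)
```

The CRF's contribution is the `transitions` term: a tag sequence is scored jointly, so a strong transition score can override locally preferred emissions.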
Feel free to star this repository!
If you use NCRF++ in your paper, please cite our ACL demo paper:
@inproceedings{yang2018ncrf,
title={NCRF++: An Open-source Neural Sequence Labeling Toolkit},
author={Yang, Jie and Zhang, Yue},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
url={http://aclweb.org/anthology/P18-4013},
year={2018}
}
If you use the experimental results and analysis of NCRF++, please cite our COLING paper:
@inproceedings{yang2018design,
title={Design Challenges and Misconceptions in Neural Sequence Labeling},
author={Yang, Jie and Liang, Shuailong and Zhang, Yue},
booktitle={Proceedings of the 27th International Conference on Computational Linguistics (COLING)},
url={http://aclweb.org/anthology/C18-1327},
year={2018}
}
YEDDA (previously SUTDAnnotator) is developed for annotating chunks/entities/events in text (almost all languages, including English and Chinese), symbols, and even emoji. It supports shortcut annotation, which makes manual annotation extremely efficient: the user only needs to select a text span and press a shortcut key, and the span is annotated automatically. It also supports a command annotation mode, which annotates multiple entities in batch, and can export annotated text into sequence format. In addition, intelligent recommendation and administrator analysis are included in the updated version. It is compatible with all mainstream operating systems, including Windows, Linux, and macOS.
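Exporting annotated text to sequence format can be sketched as follows. This assumes a bracketed span markup of the form `[@covered text#Label*]` (illustrative, based on examples in the paper; verify against the tool's actual exported files):

```python
import re

# Matches an assumed YEDDA-style annotated span: [@covered text#Label*]
SPAN = re.compile(r"\[@(.+?)#(.+?)\*\]")

def to_bio(annotated):
    """Convert bracket-annotated text into (token, BIO-tag) pairs.

    Tokens are whitespace-split words; words outside any
    annotated span get the tag 'O'.
    """
    pairs = []
    pos = 0
    for m in SPAN.finditer(annotated):
        # Words before the annotated span are outside any entity
        for w in annotated[pos:m.start()].split():
            pairs.append((w, "O"))
        words, label = m.group(1).split(), m.group(2)
        for i, w in enumerate(words):
            pairs.append((w, ("B-" if i == 0 else "I-") + label))
        pos = m.end()
    for w in annotated[pos:].split():
        pairs.append((w, "O"))
    return pairs
```

The resulting pairs can be written one token per line, giving the CoNLL-style sequence files that taggers such as NCRF++ consume.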
For more details, please refer to our paper (ACL 2018 demo track; best demo nomination).
This GUI annotation tool is developed with the tkinter package in Python.
Requirements: Python 2.7
If you use YEDDA in your research, please cite our ACL paper:
@inproceedings{yang2017yedda,
title={YEDDA: A Lightweight Collaborative Text Span Annotation Tool},
author={Yang, Jie and Zhang, Yue and Li, Linwei and Li, Xingxuan},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
url={http://aclweb.org/anthology/P18-4006},
year={2018}
}
Lattice LSTM for Chinese NER: a character-based LSTM with lattice word embeddings as input.
Models and results can be found in our ACL 2018 paper, Chinese NER Using Lattice LSTM.
It achieves a 93.18% F1 score on the MSRA dataset, which is the state-of-the-art result on the Chinese NER task.
Please cite our ACL 2018 paper:
@inproceedings{zhang2018chinese,
title={Chinese NER Using Lattice LSTM},
author={Yue Zhang and Jie Yang},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
year={2018}
}
Tracheostomy
Please cite our under-review paper: