
[INLG 2018] Generation of Company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation

  • study the performance of several state-of-the-art sequence-to-sequence models applied to the generation of short company descriptions
  • a newly created and publicly available company dataset that has been collected from Wikipedia, which consists of around 51K company descriptions (Download) that can be used for both concept-to-text and text-to-text generation tasks
  • the dataset is not ideal for machine learning since the abstract, the body and the infobox are only loosely correlated

  • the basic model used for generating company descriptions is based on the RNN seq2seq architecture (Sutskever et al., 2014), which is divided into two main blocks: an encoder that encodes the input sentence into a fixed-length vector, and a decoder that decodes the vector into a sequence of words
  • as pointed out by See et al. (2017), classical seq2seq models suffer from two commonly known problems: repetition of subsequences and off-topic wording (hallucination)
  • repetition arises at the decoding stage, when the decoder relies too much on its previous output and falls into an infinite cycle; one way to deal with this is a coverage mechanism (Tu et al., 2016), originally used in machine translation, which uses the attention weights to penalize the decoder for attending to input that has already been attended to
  • hallucination can appear when the word to predict is infrequent in the training set and therefore has a poor word embedding, making it close to many other words; a remedy is the Pointer-Generator Network (See et al., 2017), which computes a generation probability p_gen ∈ [0, 1], the probability of 'generating' a word from the vocabulary known by the model versus copying a word from the source (see the sketch after this list)
  • standard automatic measures BLEU (Papineni et al., 2002), ROUGE-L (Lin and Hovy, 2003), Meteor (Denkowski and Lavie, 2014) and CIDEr (Vedantam et al., 2015) were computed using the E2E challenge script.
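
Since the pointer-generator mechanism recurs in several of the papers below, here is a minimal PyTorch sketch of the generate-vs-copy mixture. All shapes, names, and the linear layer are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerGeneratorHead(nn.Module):
    """Mixes a vocabulary distribution with a copy distribution via p_gen."""

    def __init__(self, hidden_size, emb_size):
        super().__init__()
        # p_gen is computed from the context vector, the decoder state,
        # and the decoder input (See et al., 2017)
        self.w_gen = nn.Linear(2 * hidden_size + emb_size, 1)

    def forward(self, context, decoder_state, decoder_input,
                vocab_logits, attn_dist, src_ids):
        # context: [batch, hidden], decoder_state: [batch, hidden],
        # decoder_input: [batch, emb], vocab_logits: [batch, vocab],
        # attn_dist: [batch, src_len], src_ids: [batch, src_len]
        p_gen = torch.sigmoid(
            self.w_gen(torch.cat([context, decoder_state, decoder_input], dim=-1)))

        vocab_dist = F.softmax(vocab_logits, dim=-1)      # "generate" part
        final_dist = p_gen * vocab_dist
        # "copy" part: scatter the remaining attention mass onto the source word ids
        final_dist = final_dist.scatter_add(1, src_ids, (1 - p_gen) * attn_dist)
        return final_dist, p_gen
```

Frequent vocabulary words get most of their probability from the generation term, while rare or out-of-vocabulary source words can still be produced through the copy term.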

[ACL2019] Multi-Style Generative Reading Comprehension

  • reading comprehension (RC) is a challenging task: answering a question given textual evidence provided in a document set
  • generative models suffer from a dearth of training data to cover open-domain questions
  • tackles generative reading comprehension (RC), which requires both answering questions based on textual evidence and natural language generation (NLG)
    • focuses on generating a summary from the question and multiple passages instead of extracting an answer span from the provided passages –> introduces the pointer-generator mechanism (See et al., 2017) for generating an abstractive answer from the question and multiple passages, extending it to a Transformer-based one that allows words both to be generated from a vocabulary and to be copied from the question and passages
    • learns multi-style answers within one model to improve the NLG capability for all styles involved –> also extends the pointer-generator to a conditional decoder by introducing an artificial token corresponding to each style (see the sketch below)

[figures: Masque model; multi-source pointer-generator]
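
A minimal sketch of the style-conditioning trick described above: one artificial token per answer style is prepended to the target sequence, so a single decoder learns all styles and the desired style can be forced at inference time. Token strings and field layout here are illustrative assumptions, not Masque's actual preprocessing.

```python
# hypothetical style tokens; the real model defines one artificial token per style
STYLE_TOKENS = {"nlg": "<style:nlg>", "extractive": "<style:extractive>"}

def make_training_pair(question, passages, answer, style):
    """Build one (source, target) pair; the style token conditions the decoder."""
    source = question + " </s> " + " </s> ".join(passages)
    target = STYLE_TOKENS[style] + " " + answer
    return source, target

# at inference time the desired style is selected by feeding its token first
src, tgt = make_training_pair(
    "who wrote hamlet?",
    ["Hamlet is a tragedy written by William Shakespeare."],
    "William Shakespeare wrote Hamlet.",
    "nlg",
)
```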

[NAACL2019] Keyphrase Generation: A Text Summarization Struggle

  • Problems of most existing keyphrase generation methods:
    • First, they are not able to find an optimal value for N (number of keywords to generate for an article) based on article contents and require it as a preset parameter.
    • Second, the semantic and syntactic properties of article phrases (considered as candidate keywords) are analyzed separately; the meaning of longer text units like paragraphs or the entire abstract/paper is missed.
    • Third, only phrases that appear in the paper are returned; in practice, authors often assign keywords that are not part of their article.
  • Meng et al. (2017) overcome the second and third problems using an encoder-decoder model (COPYRNN) with a bidirectional Gated Recurrent Unit (GRU) encoder and a forward GRU decoder with attention.
  • we explore abstractive text summarization models proposed in the literature, trained with article abstracts and titles as sources and keyword strings as targets.
  • Pointer-Generator network (POINTCOV) is composed of an attention-based encoder that produces the context vector. The decoder is extended with a pointer-generator model that computes a probability p_gen from the context vector, the decoder states, and the decoder output.

[figure: full-match and ROUGE scores]

  • The results show that the tested text summarization models perform poorly on full-match keyword predictions. Their higher ROUGE scores further indicate that the problem is not entirely in the summarization process.

[ACL 2017] Deep Keyphrase Generation [code]

  • propose a generative model for keyphrase prediction with an encoder-decoder framework
    • could identify keyphrases that do not appear in the text
    • capture the real semantic meaning behind the text
  • To apply the RNN Encoder-Decoder model, the data need to be converted into text-keyphrase pairs that contain only one source sequence and one target sequence (see the sketch after this list)
  • the RNN is not able to recall any keyphrase that contains out-of-vocabulary words –> the copying mechanism (Gu et al., 2016) is one feasible solution that enables the RNN to predict out-of-vocabulary words by selecting appropriate words from the source text

[figure: CopyRNN model]
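
A minimal sketch of the data conversion mentioned above, with made-up field names: an article annotated with K keyphrases is split into K (source, target) pairs so that the encoder-decoder always sees one source sequence and one target sequence.

```python
def to_text_keyphrase_pairs(article):
    """Split one article with several keyphrases into one training pair per keyphrase."""
    source = article["title"] + " . " + article["abstract"]
    return [(source, keyphrase) for keyphrase in article["keyphrases"]]

pairs = to_text_keyphrase_pairs({
    "title": "Deep Keyphrase Generation",
    "abstract": "We propose a generative model for keyphrase prediction ...",
    "keyphrases": ["keyphrase generation", "encoder-decoder", "copying mechanism"],
})
# -> 3 pairs sharing the same source sequence, each with a single keyphrase target
```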

[ACL 2017] Get To The Point: Summarization with Pointer-Generator Networks [Tensorflow Python2] [Tensorflow Python3] [Pytorch Python2] [Data Preprocessing]

  • Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization but have two shortcomings:
    • they are liable to reproduce factual details inaccurately
    • they tend to repeat themselves
  • augments the standard sequence-to-sequence attentional model in two orthogonal ways
    • copy words from the source text via pointing
    • use coverage to keep track of what has been summarized (see the sketch after this list)
  • Although our best model is abstractive, it does not produce novel n-grams (i.e., n-grams that don't appear in the source text) as often as the reference summaries. The baseline model produces more novel n-grams, but many of these are erroneous.

[figure: summarization model]
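
A minimal sketch of the coverage idea referenced above: the coverage vector accumulates the attention paid to each source position so far, and the coverage loss penalizes re-attending to positions that are already covered. Shapes and the loop over steps are illustrative assumptions.

```python
import torch

def coverage_loss(attn_dists):
    """attn_dists: [steps, batch, src_len], the attention distribution at each decoder step."""
    coverage = torch.zeros_like(attn_dists[0])            # running sum of past attention
    step_losses = []
    for attn in attn_dists:
        # penalize attention mass that overlaps with already-attended positions
        step_losses.append(torch.minimum(attn, coverage).sum(dim=-1))
        coverage = coverage + attn
    return torch.stack(step_losses).mean()
```

In the paper this term is added to the usual negative log-likelihood loss with a weighting factor, discouraging the repetition problem noted in the first bullet.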

[SIGIR 2019] DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

  • presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document
  • given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks (see the sketch below)
  • effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node
  • propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process

[figures: DivGraphPointer model and results]
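
A minimal sketch of the graph-construction step described above, assuming a simple proximity window: identical words are merged into one node and weighted edges connect words that co-occur within the window. The window size and edge weighting are illustrative, not the paper's exact formulation.

```python
from collections import defaultdict

def build_word_graph(tokens, window=2):
    """One node per distinct word; weighted edges between words within `window` of each other."""
    nodes = sorted(set(tokens))
    index = {word: i for i, word in enumerate(nodes)}
    edges = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            u, v = index[word], index[tokens[j]]
            if u != v:
                # multiple appearances of the same word pair aggregate into one edge
                edges[(u, v)] += 1.0
                edges[(v, u)] += 1.0
    return nodes, dict(edges)   # adjacency that a GCN encoder could consume

nodes, edges = build_word_graph(
    "a graph pointer network extracts diverse keyphrases from a word graph".split())
```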


[arxiv19] The Curious Case of Neural Text Degeneration


[arxiv19] SenseBERT: Driving Some Sense into BERT

  • propose a method to employ self-supervision directly at the word sense level
  • SenseBERT is pre-trained to predict not only the masked words but also their WordNet supersenses –> a lexical-semantic level language model (see the sketch below)

[figure: SenseBERT]
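
A minimal sketch of the supersense-supervision idea, assuming a standard masked-LM setup: a second linear head over the same encoder states predicts a WordNet supersense for each masked position, and its loss is added to the masked-word loss. The module, the label convention, and the default of 45 supersense categories (WordNet's lexicographer files) are assumptions for illustration, not SenseBERT's actual implementation.

```python
import torch
import torch.nn as nn

class MaskedWordAndSupersenseHeads(nn.Module):
    """Two output heads over shared encoder states: masked words and WordNet supersenses."""

    def __init__(self, hidden_size, vocab_size, num_supersenses=45):
        super().__init__()
        self.word_head = nn.Linear(hidden_size, vocab_size)
        self.supersense_head = nn.Linear(hidden_size, num_supersenses)
        self.loss = nn.CrossEntropyLoss(ignore_index=-100)   # -100 marks unmasked positions

    def forward(self, hidden_states, word_labels, supersense_labels):
        # hidden_states: [batch, seq, hidden]; labels: [batch, seq]
        word_logits = self.word_head(hidden_states)
        sense_logits = self.supersense_head(hidden_states)
        # joint self-supervision at the word level and at the word-sense level
        return (self.loss(word_logits.flatten(0, 1), word_labels.flatten())
                + self.loss(sense_logits.flatten(0, 1), supersense_labels.flatten()))
```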