org.openkoreantext.processor

OpenKoreanTextProcessor

object OpenKoreanTextProcessor

OpenKoreanTextProcessor provides error- and slang-tolerant Korean tokenization.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def addNounsToDictionary(words: Seq[String]): Boolean

    Add a user-defined word list to the noun dictionary. Words containing spaces are not allowed.

    words

    Sequence of words to add.
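
    A minimal sketch of registering custom nouns before tokenizing, assuming the open-korean-text library is on the classpath. The object name, the sample words, and the sample sentence are illustrative only. This page does not say what the returned value means, so the sketch only prints it.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object AddNounsExample {
      // Hypothetical user-defined nouns; none of them contains a space.
      val customNouns: Seq[String] = Seq("뷁", "후랴", "츄릅")

      def tokenizeWithCustomNouns() = {
        // Register the nouns, then tokenize a sentence that uses two of them.
        val result = OpenKoreanTextProcessor.addNounsToDictionary(customNouns)
        println(s"addNounsToDictionary returned: $result")
        OpenKoreanTextProcessor.tokenize("후랴 츄릅")
      }

      def main(args: Array[String]): Unit =
        tokenizeWithCustomNouns().foreach(println)
    }
    ```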

  5. def addWordsToDictionary(pos: KoreanPos, words: Seq[String]): Boolean

    Add a user-defined word list to the dictionary for the specified KoreanPos.

    pos

    KoreanPos of words to add.

    words

    Sequence of words to add.

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def detokenize(tokens: Iterable[String]): String

    Detokenize the input list of words.

    tokens

    List of words.

    returns

    Detokenized string.
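
    A short sketch of joining already-split words back into a string, assuming the open-korean-text library is on the classpath. The object name and the word list are illustrative. The exact spacing of the result depends on the library, so no expected output is shown.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object DetokenizeExample {
      // Illustrative input: a sentence that has been split into individual words.
      val words: Seq[String] =
        Seq("연세", "대학교", "보건", "대학원", "에", "오신", "것", "을", "환영", "합니다", "!")

      def detokenized(): String =
        OpenKoreanTextProcessor.detokenize(words)

      def main(args: Array[String]): Unit =
        println(detokenized())
    }
    ```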

  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  11. def extractPhrases(tokens: Seq[KoreanToken], filterSpam: Boolean = false, enableHashtags: Boolean = true): Seq[KoreanPhrase]

    Extract noun phrases from tokenized Korean text.

    tokens

    Korean tokens

    filterSpam

    true if spam/slang terms should be filtered out (default: false)

    enableHashtags

    true if #hashtags should be included (default: true)

    returns

    A sequence of extracted phrases
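
    A sketch of phrase extraction, assuming the open-korean-text library is on the classpath. Since extractPhrases takes tokens rather than raw text, the input is tokenized first. The object name and the sample sentence are illustrative; the named arguments use the parameter names from the signature above.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object ExtractPhrasesExample {
      def phrasesOf(text: String) = {
        // Tokenize first: extractPhrases operates on KoreanTokens.
        val tokens = OpenKoreanTextProcessor.tokenize(text)
        // Filter spam/slang terms and keep #hashtags.
        OpenKoreanTextProcessor.extractPhrases(tokens, filterSpam = true, enableHashtags = true)
      }

      def main(args: Array[String]): Unit =
        phrasesOf("한국어를 처리하는 예시입니다 #한국어").foreach(println)
    }
    ```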

  12. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  17. def normalize(text: CharSequence): CharSequence

    Normalize Korean text. Uses KoreanNormalizer.normalize().

    text

    Input text

    returns

    Normalized Korean text
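
    A minimal sketch of normalizing informal text, assuming the open-korean-text library is on the classpath. The object name and the sample string are illustrative. The result is a CharSequence, so it is converted with toString before use as a String.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object NormalizeExample {
      // Illustrative input: informal text with a slang ending and repeated characters.
      val sample = "한국어를 처리하는 예시입니닼ㅋㅋㅋㅋㅋ"

      def normalized(): String =
        OpenKoreanTextProcessor.normalize(sample).toString

      def main(args: Array[String]): Unit =
        println(normalized())
    }
    ```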

  18. final def notify(): Unit
    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  20. def removeWordsFromDictionary(pos: KoreanPos, words: Seq[String]): Boolean

    Remove a user-defined word list from the dictionary for the specified KoreanPos.

    pos

    KoreanPos of words to remove.

    words

    Sequence of words to remove.
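
    A sketch of adding words for one part of speech and removing them again, assuming the open-korean-text library is on the classpath. The import path of KoreanPos and the existence of a Noun value are assumptions, since neither is shown on this page. The object name and the made-up word are illustrative.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor
    // Assumption: KoreanPos lives in the util package and defines a Noun value.
    import org.openkoreantext.processor.util.KoreanPos

    object DictionaryRoundTripExample {
      // Hypothetical user-defined word.
      val words: Seq[String] = Seq("뷁뷁이")
      val sentence = "뷁뷁이 좋아요"

      def tokenizeBeforeAndAfterRemoval() = {
        // Add the word as a noun and tokenize while it is in the dictionary.
        OpenKoreanTextProcessor.addWordsToDictionary(KoreanPos.Noun, words)
        val withWord = OpenKoreanTextProcessor.tokenize(sentence)
        // Remove the same word and tokenize again for comparison.
        OpenKoreanTextProcessor.removeWordsFromDictionary(KoreanPos.Noun, words)
        val withoutWord = OpenKoreanTextProcessor.tokenize(sentence)
        (withWord, withoutWord)
      }

      def main(args: Array[String]): Unit = {
        val (withWord, withoutWord) = tokenizeBeforeAndAfterRemoval()
        println(withWord.mkString(" "))
        println(withoutWord.mkString(" "))
      }
    }
    ```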

  21. def splitSentences(text: CharSequence): Seq[Sentence]

    Split input text into sentences.

    text

    input text

    returns

    A sequence of sentences.
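
    A short sketch of sentence splitting, assuming the open-korean-text library is on the classpath. The object name and the sample text are illustrative. Each Sentence is printed with its own toString, because its fields are not described on this page.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object SplitSentencesExample {
      // Illustrative input containing three sentences.
      val sample = "안녕하세요. 반갑습니다! 잘 부탁드립니다."

      def sentences() =
        OpenKoreanTextProcessor.splitSentences(sample)

      def main(args: Array[String]): Unit =
        sentences().foreach(println)
    }
    ```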

  22. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  23. def toString(): String
    Definition Classes
    AnyRef → Any
  24. def tokenize(text: CharSequence, profile: TokenizerProfile): Seq[KoreanToken]

    Tokenize text (with a custom profile) into a sequence of KoreanTokens, each of which includes part-of-speech information and a flag indicating whether the token is an out-of-vocabulary term.

    text

    input text

    profile

    TokenizerProfile

    returns

    A sequence of KoreanTokens.

  25. def tokenize(text: CharSequence): Seq[KoreanToken]

    Tokenize text into a sequence of KoreanTokens, each of which includes part-of-speech information and a flag indicating whether the token is an out-of-vocabulary term.

    text

    input text

    returns

    A sequence of KoreanTokens.
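
    A minimal sketch of the usual normalize-then-tokenize flow, assuming the open-korean-text library is on the classpath. The object name and the sample sentence are illustrative. Tokens are printed using whatever KoreanToken.toString produces.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object TokenizeExample {
      def tokensOf(text: String) = {
        // Normalize first so that slang and repeated characters are cleaned up.
        val normalized = OpenKoreanTextProcessor.normalize(text)
        OpenKoreanTextProcessor.tokenize(normalized)
      }

      def main(args: Array[String]): Unit =
        tokensOf("한국어를 처리하는 예시입니다").foreach(println)
    }
    ```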

  26. def tokenizeTopN(text: CharSequence, n: Int, profile: TokenizerProfile): Seq[Seq[Seq[KoreanToken]]]

    Tokenize text (with a custom profile) and return the top n candidate tokenizations. Each KoreanToken includes part-of-speech information and a flag indicating whether it is an out-of-vocabulary term.

    text

    input text

    n

    number of top candidates

    profile

    TokenizerProfile

    returns

    Nested sequences of KoreanTokens holding the top n candidate tokenizations.

  27. def tokenizeTopN(text: CharSequence, n: Int): Seq[Seq[Seq[KoreanToken]]]

    Tokenize text and return the top n candidate tokenizations. Each KoreanToken includes part-of-speech information and a flag indicating whether it is an out-of-vocabulary term.

    text

    input text

    n

    number of top candidates

    returns

    Nested sequences of KoreanTokens holding the top n candidate tokenizations.
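
    A sketch of requesting several candidate tokenizations, assuming the open-korean-text library is on the classpath. The object name and the sample text are illustrative. The return type has three nesting levels; reading the outer level as groups of input and the middle level as candidates per group is an assumption, so the code simply walks all levels.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object TokenizeTopNExample {
      def candidatesOf(text: String, n: Int) =
        OpenKoreanTextProcessor.tokenizeTopN(text, n)

      def main(args: Array[String]): Unit = {
        val candidates = candidatesOf("대선 후보", 3)
        // Walk the outer and middle levels; each innermost Seq is one token sequence.
        for (group <- candidates; candidate <- group)
          println(candidate.mkString(" "))
      }
    }
    ```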

  28. def tokensToStrings(tokens: Seq[KoreanToken]): Seq[String]

    Convert KoreanTokens into a sequence of token strings. This excludes spaces.

    tokens

    Korean tokens

    returns

    A sequence of token strings.
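
    A short sketch of reducing tokens to plain strings, assuming the open-korean-text library is on the classpath. The object name and the sample sentence are illustrative.

    ```scala
    import org.openkoreantext.processor.OpenKoreanTextProcessor

    object TokensToStringsExample {
      def tokenStrings(text: String): Seq[String] = {
        val tokens = OpenKoreanTextProcessor.tokenize(text)
        // Keep only the token text; space tokens are dropped by tokensToStrings.
        OpenKoreanTextProcessor.tokensToStrings(tokens)
      }

      def main(args: Array[String]): Unit =
        println(tokenStrings("한국어를 처리하는 예시입니다").mkString(", "))
    }
    ```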

  29. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
