Packages

package tokenizer

Type Members

  1. case class KoreanChunk (text: String, offset: Int, length: Int) extends Product with Serializable
  2. case class ParsedChunk (posNodes: Seq[KoreanToken], words: Int, profile: TokenizerProfile = TokenizerProfile.defaultProfile) extends Product with Serializable

    A candidate parse for a chunk.

    posNodes

    Sequence of KoreanTokens.

    words

    Number of words in this candidate parse.

  3. case class Sentence (text: String, start: Int, end: Int) extends Product with Serializable
  4. case class TokenizerProfile (tokenCount: Float = 0.18f, unknown: Float = 0.3f, wordCount: Float = 0.3f, freq: Float = 0.2f, unknownCoverage: Float = 0.5f, exactMatch: Float = 0.5f, allNoun: Float = 0.1f, unknownPosCount: Float = 10.0f, determinerPosCount: Float = 0.01f, exclamationPosCount: Float = 0.01f, initialPostPosition: Float = 0.2f, haVerb: Float = 0.3f, preferredPattern: Float = 0.6f, preferredPatterns: Seq[Seq[Any]] = ..., spaceGuide: Set[Int] = Set[Int](), spaceGuidePenalty: Float = 3.0f, josaUnmatchedPenalty: Float = 3.0f) extends Product with Serializable
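
Because TokenizerProfile is a case class whose parameters all have defaults, individual weights can presumably be overridden with named arguments or `copy`. The sketch below shows that pattern on a local stand-in, `ProfileSketch`, which is hypothetical and carries only a subset of the fields listed above. It is not the library class, and the field semantics are not documented on this page.

```scala
// Hypothetical local stand-in: a subset of the TokenizerProfile fields
// listed above, with the same default values. NOT the library class.
final case class ProfileSketch(
  tokenCount: Float = 0.18f,
  unknown: Float = 0.3f,
  wordCount: Float = 0.3f,
  spaceGuide: Set[Int] = Set.empty[Int],
  spaceGuidePenalty: Float = 3.0f
)

object ProfileSketchDemo {
  // With no arguments, every field takes its declared default.
  val default: ProfileSketch = ProfileSketch()

  // copy(...) overrides the named field and leaves the rest untouched.
  val strictUnknown: ProfileSketch = default.copy(unknown = 1.0f)

  // Named arguments set any subset of fields at construction time.
  val guided: ProfileSketch = ProfileSketch(spaceGuide = Set(2, 5))

  def main(args: Array[String]): Unit = {
    println(default)
    println(strictUnknown)
    println(guided)
  }
}
```

The same calls should carry over to the real class, since `copy` and named arguments are generated for every Scala case class.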

Value Members

  1. object KoreanChunker

    Splits input text into Korean chunks (어절).

  2. object KoreanDetokenizer

    Detokenizes a list of tokenized words into a readable sentence.

  3. object KoreanSentenceSplitter

    Splits input text into sentences.

  4. object KoreanTokenizer

    Provides Korean tokenization.

    Chunk (어절): a unit delimited by whitespace (e.g. 사랑하는사람을).
    Word (단어): a single sentence constituent (e.g. 사랑하는, 사람을).
    Token (토큰): a unit similar to a morpheme, though not grammatically exact (e.g. 사랑, 하는, 사람, 을).

    Whenever the behavior of KoreanParser changes, the initial cache has to be updated by running tools.CreateInitialCache.

  5. object ParsedChunk extends Serializable
  6. object TokenizerProfile extends Serializable
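
The chunk definition used by this package (a unit delimited by whitespace) and the `KoreanChunk(text, offset, length)` signature suggest a simple reading: each chunk records where it sits in the input. The following is a minimal sketch of that idea, assuming plain whitespace splitting and UTF-16 offsets. `ChunkSketch` and `WhitespaceChunkerSketch` are hypothetical names, the second word in the sample input is added for illustration, and the real KoreanChunker may apply further rules that are not documented on this page.

```scala
// Hypothetical stand-in mirroring the KoreanChunk(text, offset, length) shape.
final case class ChunkSketch(text: String, offset: Int, length: Int)

object WhitespaceChunkerSketch {
  // Matches maximal runs of non-whitespace characters.
  private val NonSpace = "\\S+".r

  // Returns one chunk per whitespace-delimited run, with its position
  // (in UTF-16 code units) in the original input.
  def chunk(input: String): Seq[ChunkSketch] =
    NonSpace.findAllMatchIn(input).map { m =>
      ChunkSketch(m.matched, m.start, m.end - m.start)
    }.toList

  def main(args: Array[String]): Unit = {
    val input = "사랑하는사람을 만났다"
    chunk(input).foreach { c =>
      // offset and length index back into the original input.
      assert(input.substring(c.offset, c.offset + c.length) == c.text)
      println(c)
    }
  }
}
```

Keeping offsets alongside the text lets later stages map tokens back to positions in the source string without re-scanning it.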
