object OpenKoreanTextProcessor
OpenKoreanTextProcessor provides error- and slang-tolerant Korean tokenization.
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def addNounsToDictionary(words: Seq[String]): Boolean
  Add a user-defined word list to the noun dictionary. Spaced words are not allowed.
  - words: Sequence of words to add.
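  A minimal usage sketch (the org.openkoreantext.processor import path is assumed; words containing spaces are rejected):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Register domain-specific nouns (e.g. product or brand names) before
  // tokenizing, so they are kept as single Noun tokens rather than split apart.
  OpenKoreanTextProcessor.addNounsToDictionary(Seq("트위터", "코틀린"))
  ```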
- def addWordsToDictionary(pos: KoreanPos, words: Seq[String]): Boolean
  Add a user-defined word list to the dictionary for the specified KoreanPos.
  - pos: KoreanPos of the words to add.
  - words: Sequence of words to add.
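  A sketch of adding words under an explicit part of speech (the KoreanPos import path is assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor
  import org.openkoreantext.processor.util.KoreanPos

  // Unlike addNounsToDictionary, this lets you pick the part of speech,
  // e.g. registering a slang intensifier as an adverb.
  OpenKoreanTextProcessor.addWordsToDictionary(KoreanPos.Adverb, Seq("완전"))
  ```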
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- def detokenize(tokens: Iterable[String]): String
  Detokenize the input list of words.
  - tokens: List of words.
  - returns: Detokenized string.
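  A minimal sketch of reassembling tokens into spaced text (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Reassemble a tokenized word list into naturally spaced text.
  val words = Seq("연세", "대학교", "에", "오신", "것", "을", "환영", "합니다")
  val sentence: String = OpenKoreanTextProcessor.detokenize(words)
  println(sentence)
  ```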
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def extractPhrases(tokens: Seq[KoreanToken], filterSpam: Boolean = false, enableHashtags: Boolean = true): Seq[KoreanPhrase]
  Extract noun phrases from Korean text.
  - tokens: Korean tokens.
  - filterSpam: true if spam/slang terms should be filtered out (default: false).
  - enableHashtags: true if #hashtags should be included (default: true).
  - returns: A sequence of extracted phrases.
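  A phrase-extraction sketch (import path assumed; extractPhrases takes tokens, so tokenize first):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("시원한 여름 아이스크림이 진짜 맛있다 #먹스타그램")
  val phrases = OpenKoreanTextProcessor.extractPhrases(tokens, filterSpam = true, enableHashtags = true)
  // Prints the extracted KoreanPhrase values, one noun phrase per entry.
  phrases.foreach(println)
  ```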
- def finalize(): Unit
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def normalize(text: CharSequence): CharSequence
  Normalize Korean text. Uses KoreanNormalizer.normalize().
  - text: Input text.
  - returns: Normalized Korean text.
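  A normalization sketch (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Collapse typos and elongated endings common in colloquial text
  // before tokenizing, e.g. repeated "ㅋㅋㅋㅋ" laughter jamo.
  val normalized: CharSequence = OpenKoreanTextProcessor.normalize("그래욬ㅋㅋㅋㅋ 힘들겟씀다")
  println(normalized)
  ```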
- final def notify(): Unit
  - Definition Classes: AnyRef
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
- def removeWordsFromDictionary(pos: KoreanPos, words: Seq[String]): Boolean
  Remove a user-defined word list from the dictionary for the specified KoreanPos.
  - pos: KoreanPos of the words to remove.
  - words: Sequence of words to remove.
- def splitSentences(text: CharSequence): Seq[Sentence]
  Split input text into sentences.
  - text: Input text.
  - returns: A sequence of sentences.
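  A sentence-splitting sketch (import path assumed; each returned Sentence is expected to carry the sentence text along with its offsets in the input):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val sentences = OpenKoreanTextProcessor.splitSentences("안녕하세요. 반갑습니다. 잘 지내시죠?")
  // Prints each detected sentence in order of appearance.
  sentences.foreach(println)
  ```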
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- def tokenize(text: CharSequence, profile: TokenizerProfile): Seq[KoreanToken]
  Tokenize text (with a custom profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term.
  - text: Input text.
  - profile: TokenizerProfile.
  - returns: A sequence of KoreanTokens.
- def tokenize(text: CharSequence): Seq[KoreanToken]
  Tokenize text into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term.
  - text: Input text.
  - returns: A sequence of KoreanTokens.
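  A basic tokenization sketch (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("한국어를 처리하는 예시입니다ㅋㅋ")
  // Each KoreanToken pairs the surface form with its part of speech
  // and flags out-of-vocabulary terms.
  tokens.foreach(println)
  ```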
- def tokenizeTopN(text: CharSequence, n: Int, profile: TokenizerProfile): Seq[Seq[Seq[KoreanToken]]]
  Tokenize text (with a custom profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term, and return the top n candidates.
  - text: Input text.
  - n: Number of top candidates.
  - profile: TokenizerProfile.
  - returns: A sequence of sequences of KoreanTokens.
- def tokenizeTopN(text: CharSequence, n: Int): Seq[Seq[Seq[KoreanToken]]]
  Tokenize text (with the default profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term, and return the top n candidates.
  - text: Input text.
  - n: Number of top candidates.
  - returns: A sequence of sequences of KoreanTokens.
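  A sketch of top-n tokenization (import path assumed; the nesting interpretation below is an assumption based on the return type):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Assumed nesting: outer Seq = chunks of the input, middle Seq = up to
  // n candidate parses per chunk, inner Seq = tokens of one candidate.
  val candidates = OpenKoreanTextProcessor.tokenizeTopN("집에가는길", 3)
  candidates.foreach(println)
  ```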
- def tokensToStrings(tokens: Seq[KoreanToken]): Seq[String]
  Convert tokens into a sequence of token strings. This excludes spaces.
  - tokens: Korean tokens.
  - returns: A sequence of token strings.
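  A sketch of stripping token metadata down to plain strings (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("한국어 처리")
  // Drop part-of-speech metadata (and space tokens) to get plain strings.
  val strings: Seq[String] = OpenKoreanTextProcessor.tokensToStrings(tokens)
  println(strings)
  ```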
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )