object OpenKoreanTextProcessor
OpenKoreanTextProcessor provides error- and slang-tolerant Korean tokenization.
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def addNounsToDictionary(words: Seq[String]): Boolean
  Add a user-defined word list to the noun dictionary. Spaced words are not allowed.
  - words: Sequence of words to add.
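  A minimal usage sketch (the org.openkoreantext.processor import path is assumed; words containing spaces are rejected):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Register domain-specific nouns (e.g. product or brand names) before
  // tokenizing, so they are kept as single Noun tokens rather than split apart.
  OpenKoreanTextProcessor.addNounsToDictionary(Seq("트위터", "코틀린"))
  ```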
- def addWordsToDictionary(pos: KoreanPos, words: Seq[String]): Boolean
  Add a user-defined word list to the dictionary for the specified KoreanPos.
  - pos: KoreanPos of the words to add.
  - words: Sequence of words to add.
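  A sketch of adding words under an explicit part of speech (the KoreanPos import path is assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor
  import org.openkoreantext.processor.util.KoreanPos

  // Unlike addNounsToDictionary, this lets you pick the part of speech,
  // e.g. registering a slang intensifier as an adverb.
  OpenKoreanTextProcessor.addWordsToDictionary(KoreanPos.Adverb, Seq("완전"))
  ```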
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- def detokenize(tokens: Iterable[String]): String
  Detokenize the input list of words.
  - tokens: List of words.
  - returns: Detokenized string.
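  A minimal sketch of reassembling tokens into spaced text (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Reassemble a tokenized word list into naturally spaced text.
  val words = Seq("연세", "대학교", "에", "오신", "것", "을", "환영", "합니다")
  val sentence: String = OpenKoreanTextProcessor.detokenize(words)
  println(sentence)
  ```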
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def extractPhrases(tokens: Seq[KoreanToken], filterSpam: Boolean = false, enableHashtags: Boolean = true): Seq[KoreanPhrase]
  Extract noun phrases from Korean text.
  - tokens: Korean tokens.
  - filterSpam: true if spam/slang terms should be filtered out (default: false).
  - enableHashtags: true if #hashtags should be included (default: true).
  - returns: A sequence of extracted phrases.
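  A phrase-extraction sketch (import path assumed; extractPhrases takes tokens, so tokenize first):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("시원한 여름 아이스크림이 진짜 맛있다 #먹스타그램")
  val phrases = OpenKoreanTextProcessor.extractPhrases(tokens, filterSpam = true, enableHashtags = true)
  // Prints the extracted KoreanPhrase values, one noun phrase per entry.
  phrases.foreach(println)
  ```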
- def finalize(): Unit
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def normalize(text: CharSequence): CharSequence
  Normalize Korean text. Uses KoreanNormalizer.normalize().
  - text: Input text.
  - returns: Normalized Korean text.
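  A normalization sketch (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Collapse typos and elongated endings common in colloquial text
  // before tokenizing, e.g. repeated "ㅋㅋㅋㅋ" laughter jamo.
  val normalized: CharSequence = OpenKoreanTextProcessor.normalize("그래욬ㅋㅋㅋㅋ 힘들겟씀다")
  println(normalized)
  ```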
- final def notify(): Unit
  - Definition Classes: AnyRef
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
- def removeWordsFromDictionary(pos: KoreanPos, words: Seq[String]): Boolean
  Remove a user-defined word list from the dictionary for the specified KoreanPos.
  - pos: KoreanPos of the words to remove.
  - words: Sequence of words to remove.
- def splitSentences(text: CharSequence): Seq[Sentence]
  Split input text into sentences.
  - text: Input text.
  - returns: A sequence of sentences.
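  A sentence-splitting sketch (import path assumed; each returned Sentence is expected to carry the sentence text along with its offsets in the input):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val sentences = OpenKoreanTextProcessor.splitSentences("안녕하세요. 반갑습니다. 잘 지내시죠?")
  // Prints each detected sentence in order of appearance.
  sentences.foreach(println)
  ```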
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- def tokenize(text: CharSequence, profile: TokenizerProfile): Seq[KoreanToken]
  Tokenize text (with a custom profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term.
  - text: Input text.
  - profile: TokenizerProfile.
  - returns: A sequence of KoreanTokens.
- def tokenize(text: CharSequence): Seq[KoreanToken]
  Tokenize text into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term.
  - text: Input text.
  - returns: A sequence of KoreanTokens.
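  A basic tokenization sketch (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("한국어를 처리하는 예시입니다ㅋㅋ")
  // Each KoreanToken pairs the surface form with its part of speech
  // and flags out-of-vocabulary terms.
  tokens.foreach(println)
  ```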
- def tokenizeTopN(text: CharSequence, n: Int, profile: TokenizerProfile): Seq[Seq[Seq[KoreanToken]]]
  Tokenize text (with a custom profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term, and return the top n candidates.
  - text: Input text.
  - n: Number of top candidates.
  - profile: TokenizerProfile.
  - returns: A sequence of sequences of KoreanTokens.
- def tokenizeTopN(text: CharSequence, n: Int): Seq[Seq[Seq[KoreanToken]]]
  Tokenize text (with the default profile) into a sequence of KoreanTokens, which includes part-of-speech information and whether a token is an out-of-vocabulary term, and return the top n candidates.
  - text: Input text.
  - n: Number of top candidates.
  - returns: A sequence of sequences of KoreanTokens.
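  A sketch of top-n tokenization (import path assumed; the nesting interpretation below is an assumption based on the return type):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  // Assumed nesting: outer Seq = chunks of the input, middle Seq = up to
  // n candidate parses per chunk, inner Seq = tokens of one candidate.
  val candidates = OpenKoreanTextProcessor.tokenizeTopN("집에가는길", 3)
  candidates.foreach(println)
  ```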
- def tokensToStrings(tokens: Seq[KoreanToken]): Seq[String]
  Convert tokens into a sequence of token strings. This excludes spaces.
  - tokens: Korean tokens.
  - returns: A sequence of token strings.
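  A sketch of stripping token metadata down to plain strings (import path assumed):

  ```scala
  import org.openkoreantext.processor.OpenKoreanTextProcessor

  val tokens = OpenKoreanTextProcessor.tokenize("한국어 처리")
  // Drop part-of-speech metadata (and space tokens) to get plain strings.
  val strings: Seq[String] = OpenKoreanTextProcessor.tokensToStrings(tokens)
  println(strings)
  ```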
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )