
Browsing by Author "Hassan, Malik Tahir"

    Terms-based discriminative information space for robust text classification.
    (Information Sciences, Elsevier, 2016-08-22) Karim, Asim; Hassan, Malik Tahir
    With the popularity of Web 2.0, there has been a phenomenal increase in the utility of text classification in applications like document filtering and sentiment categorization. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. In this paper, we propose a novel and efficient method using terms-based discriminative information space for robust text classification. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into category sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms to yield a feature space (discriminative information space) having dimensions equal to the number of classes. Subsequently, a discriminant function is learned to categorize the documents in the feature space. This classification methodology relies upon corpus information only, and is robust to distribution shifts and noise. We develop theoretical parallels of our methodology with generative, discriminative, and hybrid classifiers. We evaluate our methodology extensively with five different discriminative term weighting schemes on six data sets from different application areas. We give a side-by-side comparison with four well-known text classification techniques. The results show that our methodology consistently outperforms the rest, especially when there is a distribution shift from training to test sets. Moreover, our methodology is simple and effective for different application domains and training set sizes. It is also fast with a small and tunable memory footprint.
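The pipeline described in the abstract (discriminative term weights, partition of terms into category sets, pooling into a space with one dimension per class, then a learned discriminant) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the five weighting schemes or the form of the discriminant, so the smoothed document-frequency ratio used as the term weight, the equal-weight normalized sum standing in for the linear opinion pool, and the nearest-centroid discriminant are all assumptions, as are every function name and the toy data.

```python
from collections import Counter


def fit_term_weights(docs, labels):
    """Assign each term to one category together with a discriminative weight.

    Assumption: the weight is a smoothed ratio of document frequencies,
    p(t | c) / p(t | not c). The abstract does not name its schemes.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    total = len(docs)
    df = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        df[y].update(set(tokens))  # document frequency, not raw counts
    vocab = set().union(*(df[c].keys() for c in classes))

    term_class, term_weight = {}, {}
    for t in vocab:
        df_all = sum(df[c][t] for c in classes)
        best_c, best_w = None, 0.0
        for c in classes:
            p_in = (df[c][t] + 1) / (n_docs[c] + 2)
            p_out = (df_all - df[c][t] + 1) / (total - n_docs[c] + 2)
            w = p_in / p_out
            if w > best_w:  # ties go to the first class in sorted order
                best_c, best_w = c, w
        term_class[t], term_weight[t] = best_c, best_w
    return classes, term_class, term_weight


def to_feature_space(tokens, classes, term_class, term_weight):
    """Map a document to a vector with one dimension per class.

    Assumption: an equal-weight normalized sum of term weights per
    category set stands in for the linear opinion pool.
    """
    scores = dict.fromkeys(classes, 0.0)
    known = [t for t in set(tokens) if t in term_class]
    for t in known:
        scores[term_class[t]] += term_weight[t]
    norm = len(known) or 1
    return [scores[c] / norm for c in classes]


def fit_centroids(docs, labels, classes, term_class, term_weight):
    """Learn a nearest-centroid discriminant in the feature space (assumed)."""
    counts = Counter(labels)
    sums = {c: [0.0] * len(classes) for c in classes}
    for tokens, y in zip(docs, labels):
        vec = to_feature_space(tokens, classes, term_class, term_weight)
        sums[y] = [s + v for s, v in zip(sums[y], vec)]
    return {c: [s / counts[c] for s in sums[c]] for c in classes}


def predict(tokens, classes, term_class, term_weight, centroids):
    """Return the class whose centroid is closest to the document's vector."""
    vec = to_feature_space(tokens, classes, term_class, term_weight)
    return min(
        classes,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])),
    )


# Hypothetical toy corpus, for illustration only.
train_docs = [
    ["great", "movie", "fun"],
    ["awful", "movie", "boring"],
    ["fun", "great", "acting"],
    ["boring", "awful", "plot"],
]
train_labels = ["pos", "neg", "pos", "neg"]

classes, term_class, term_weight = fit_term_weights(train_docs, train_labels)
centroids = fit_centroids(train_docs, train_labels, classes, term_class, term_weight)
print(predict(["great", "fun", "plot"], classes, term_class, term_weight, centroids))
print(predict(["boring", "plot"], classes, term_class, term_weight, centroids))
```

Because every document is reduced to a vector whose length equals the number of classes, the discriminant in the final step works in a very low-dimensional space regardless of vocabulary size, which is consistent with the small memory footprint the abstract mentions.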

