Terms-based discriminative information space for robust text classification.
| dc.contributor.author | Karim, Asim | |
| dc.contributor.author | Hassan, Malik Tahir | |
| dc.date.accessioned | 2018-10-15T10:44:15Z | |
| dc.date.available | 2018-10-15T10:44:15Z | |
| dc.date.issued | 2016-08-22 | |
| dc.description.abstract | With the popularity of Web 2.0, there has been a phenomenal increase in the utility of text classification in applications like document filtering and sentiment categorization. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. In this paper, we propose a novel and efficient method using terms-based discriminative information space for robust text classification. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into category sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms to yield a feature space (discriminative information space) having dimensions equal to the number of classes. Subsequently, a discriminant function is learned to categorize the documents in the feature space. This classification methodology relies upon corpus information only, and is robust to distribution shifts and noise. We develop theoretical parallels of our methodology with generative, discriminative, and hybrid classifiers. We evaluate our methodology extensively with five different discriminative term weighting schemes on six data sets from different application areas. We give a side-by-side comparison with four well-known text classification techniques. The results show that our methodology consistently outperforms the rest, especially when there is a distribution shift from training to test sets. Moreover, our methodology is simple and effective for different application domains and training set sizes. It is also fast with a small and tunable memory footprint. | en_US |
| dc.identifier.citation | Junejo, K. N., Karim, A., Hassan, M. T., & Jeon, M. (2016). Terms-based discriminative information space for robust text classification. Information Sciences, 372, 518-538. (Asim Karim (Computer Science/SST), Malik Tahir Hassan; JCR listed, IF: 3.364) | en_US |
| dc.identifier.uri | https://escholar.umt.edu.pk/handle/123456789/3252 | |
| dc.language.iso | en | en_US |
| dc.publisher | Information Sciences, Elsevier | en_US |
| dc.subject | Computer Science | en_US |
| dc.subject | Text classification, Discriminative term weights, Linear opinion pooling, Feature construction. | en_US |
| dc.title | Terms-based discriminative information space for robust text classification. | en_US |
| dc.type | Article | en_US |
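
The abstract outlines the pipeline only at a high level: weight terms by the discrimination information they provide, partition them into category sets, pool each set's information linearly into one feature per class, then classify in that space. A minimal sketch of those steps follows, under loudly stated assumptions: the term weight is a smoothed document-frequency ratio (one plausible discriminative weighting, not necessarily any of the five schemes the paper evaluates), the pooling weights are term-frequency proportions, and a plain argmax stands in for the learned discriminant function. All function names are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def term_weights(docs, labels, classes, smoothing=1.0):
    """For each term and class, return a smoothed ratio of the term's document
    frequency inside the class to its document frequency outside it.
    ASSUMPTION: this ratio is an illustrative weighting, not the paper's."""
    n_in = Counter(labels)
    df = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        df[y].update(set(tokens))  # count each term once per document
    vocab = set().union(*(df[c] for c in classes))
    n_total = len(docs)
    weights = {}
    for t in vocab:
        per_class = {}
        for c in classes:
            in_df = df[c][t]
            out_df = sum(df[o][t] for o in classes if o != c)
            p_in = (in_df + smoothing) / (n_in[c] + 2 * smoothing)
            p_out = (out_df + smoothing) / (n_total - n_in[c] + 2 * smoothing)
            per_class[c] = p_in / p_out
        weights[t] = per_class
    return weights


def partition_terms(weights, threshold=1.0):
    """Assign each term to the class it favours most; drop terms whose best
    weight does not exceed the threshold (ASSUMPTION: 1.0 means 'no evidence')."""
    term_sets = defaultdict(dict)
    for t, per_class in weights.items():
        best_class, best_weight = max(per_class.items(), key=lambda kv: kv[1])
        if best_weight > threshold:
            term_sets[best_class][t] = best_weight
    return term_sets


def to_feature_space(tokens, term_sets, classes):
    """Linear opinion pool: one feature per class, computed as the sum of the
    weights of the document's terms in that class's set, each scaled by the
    term's share of the document (ASSUMPTION: frequency-proportional pooling)."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    features = []
    for c in classes:
        class_terms = term_sets.get(c, {})
        features.append(
            sum(n / total * class_terms[t] for t, n in counts.items() if t in class_terms)
        )
    return features


def predict(tokens, term_sets, classes):
    """Stand-in for the learned discriminant function: pick the largest feature."""
    features = to_feature_space(tokens, term_sets, classes)
    return classes[max(range(len(classes)), key=lambda i: features[i])]


# Toy usage with made-up documents.
train_docs = [["cheap", "pills", "buy"], ["buy", "cheap", "now"],
              ["meeting", "agenda", "notes"], ["agenda", "notes", "team"]]
train_labels = ["spam", "spam", "ham", "ham"]
class_names = ["ham", "spam"]
sets_by_class = partition_terms(term_weights(train_docs, train_labels, class_names))
print(to_feature_space(["cheap", "agenda", "buy"], sets_by_class, class_names))
print(predict(["cheap", "agenda", "buy"], sets_by_class, class_names))
```

The feature vector has one dimension per class regardless of vocabulary size, which is the property the abstract attributes to the discriminative information space. In a fuller implementation the argmax would be replaced by a discriminant function fitted on the transformed training documents.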
Files
Original bundle
- Name: Terms-based discriminative information space for robust text.pdf
- Size: 992.14 KB
- Format: Adobe Portable Document Format