KUET Institutional Repository

An Efficient Compression Scheme for Large Natural Language Text

dc.contributor.advisor Hasan, Prof. Dr. K. M. Azharul
dc.contributor.author Mahmood, Md. Ashiq
dc.date.accessioned 2019-09-26T06:31:25Z
dc.date.available 2019-09-26T06:31:25Z
dc.date.copyright 2019
dc.date.issued 2019-09
dc.identifier.other ID 1707507
dc.identifier.uri http://hdl.handle.net/20.500.12228/531
dc.description This thesis is submitted to the Department of Computer Science and Engineering, Khulna University of Engineering & Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, September 2019. en_US
dc.description Cataloged from PDF Version of Thesis.
dc.description Includes bibliographical references (pages 52-56).
dc.description.abstract Data compression is the process of modifying, encoding, or converting the bit structure of data so that it occupies less space; that is, it reduces the number of bits needed to represent the data. Compressing data can save storage capacity, speed up file transfer, and reduce the cost of storage hardware and network bandwidth. Data compression has a wide range of applications, including data communication, data storage, and database development. Likewise, text compression can be as simple as removing all unneeded characters, inserting a single repeat character to indicate a string of repeated characters, and substituting a shorter bit string for a frequently occurring bit string. The fundamental principle behind compression is to develop a method or protocol that uses fewer bits to represent the original data. Character encoding, which represents a character by some encoding scheme, is closely related to data compression. In this thesis, an efficient and simple compression algorithm for large natural language text, named n-Sequence based m Bit Compression (nSmBC), is proposed, which is able to outperform WinZip and WinRAR in terms of compression ratio. WinZip and WinRAR are two well-known compression tools used for text compression in industry. The scheme provides an efficient encoding algorithm that converts each 8-bit character to 5 bits using a lookup table. The lookup table is produced using Zipf's distribution, which is a discrete distribution of commonly used characters in different languages. The 8-bit characters are converted to 5 bits by partitioning the characters into 7 sets. After the characters are converted to 5 bits, an n-sequence scheme is developed to logically calculate the location number of a particular combination of characters. The reverse algorithm to recover the original input is also demonstrated. The algorithm is finally compared with the well-known WinZip, WinRAR, Huffman, and LZW techniques. Promising performance is demonstrated by both theoretical and experimental analysis. en_US
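The abstract's first step, replacing each 8-bit character with a 5-bit code from a lookup table, can be illustrated with a minimal sketch. This is not the nSmBC algorithm itself: the 32-symbol ALPHABET below is a hypothetical frequency-ordered table chosen for illustration, the partitioning into 7 sets and the n-sequence stage are omitted because the record does not describe them in detail, and the function names pack5 and unpack5 are assumptions.

```python
# Illustrative only: a single hypothetical 32-symbol table, NOT the thesis's
# lookup table. With 32 symbols, every character fits in exactly 5 bits.
ALPHABET = " etaoinshrdlcumwfgypbvkjxqz.,'-?"
assert len(ALPHABET) == 32
ENCODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text: str) -> bytes:
    """Pack each character of `text` as a 5-bit code into a byte string."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for ch in text:
        acc = (acc << 5) | ENCODE[ch]
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Left-align the remaining bits and pad the last byte with zeros.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack5(data: bytes, n_chars: int) -> str:
    """Recover `n_chars` characters from 5-bit codes packed by pack5."""
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5 and len(chars) < n_chars:
            nbits -= 5
            chars.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1
    return "".join(chars)


text = "the rain in spain"
packed = pack5(text)
print(len(text), "bytes ->", len(packed), "bytes")
print(unpack5(packed, len(text)) == text)
```

Under these assumptions, fixed 5-bit codes alone reduce the size to roughly 5/8 of the input; the character count must be stored separately so that padding bits in the final byte are not decoded as a character.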
dc.description.statementofresponsibility Md. Ashiq Mahmood
dc.format.extent 56 pages
dc.language.iso en_US en_US
dc.publisher Khulna University of Engineering & Technology (KUET), Khulna, Bangladesh en_US
dc.rights Khulna University of Engineering & Technology (KUET) thesis/dissertation/internship reports are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject Data Compression en_US
dc.subject Character Encoding en_US
dc.subject Compression Algorithm en_US
dc.subject Compression Techniques en_US
dc.subject Large Natural Text en_US
dc.title An Efficient Compression Scheme for Large Natural Language Text en_US
dc.type Thesis en_US
dc.description.degree Master of Science in Computer Science and Engineering
dc.contributor.department Department of Computer Science and Engineering

