An Efficient Compression Scheme for Large Natural Language Text

Mahmood, Md. Ashiq

dc.contributor.advisor	Hasan, Prof. Dr. K. M. Azharul
dc.contributor.author	Mahmood, Md. Ashiq
dc.date.accessioned	2019-09-26T06:31:25Z
dc.date.available	2019-09-26T06:31:25Z
dc.date.copyright	2019
dc.date.issued	2019-09
dc.identifier.other	ID 1707507
dc.identifier.uri	http://hdl.handle.net/20.500.12228/531
dc.description	This thesis is submitted to the Department of Computer Science and Engineering, Khulna University of Engineering & Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, September 2019.	en_US
dc.description	Cataloged from PDF Version of Thesis.
dc.description	Includes bibliographical references (pages 52-56).
dc.description.abstract	Data compression is the process of modifying, encoding, or converting the bit structure of data so that it requires less space; in other words, it reduces the number of bits needed to represent the data. Compressing data can save storage capacity, speed up file transfer, and reduce the cost of storage hardware and network bandwidth. Data compression has a wide range of applications, including data communication, data storage, and database technology. Text compression can be as simple as removing unneeded characters, inserting a single repeat character to indicate a string of repeated characters, or substituting a shorter bit string for a frequently occurring one. The basic principle behind compression is to develop a method or protocol that uses fewer bits to express the original data. Character encoding, which represents each character by some encoding scheme, is closely related to data compression. In this thesis, an efficient and simple compression algorithm for large natural-language text, named n-Sequence based m Bit Compression (nSmBC), is proposed; it is able to beat WinZip and WinRAR in terms of compression ratio. WinZip and WinRAR are two well-known compression tools used for text compression in industry. The scheme provides an efficient encoding algorithm that converts each 8-bit character to 5 bits using a lookup table. The lookup table is produced using Zipf’s distribution, a discrete distribution of commonly used characters in different languages. The 8-bit characters are converted to 5 bits by partitioning the characters into 7 sets. After the characters are converted to 5 bits, an n-sequence scheme is developed to logically calculate the location number of a particular combination of characters. The reverse algorithm, which recovers the original input, is also demonstrated.
The algorithm is finally compared with the well-known WinZip, WinRAR, Huffman, and LZW techniques. Promising performance is demonstrated by both theoretical and experimental analysis.	en_US
dc.description.statementofresponsibility	Md. Ashiq Mahmood
dc.format.extent	56 pages
dc.language.iso	en_US	en_US
dc.publisher	Khulna University of Engineering & Technology (KUET), Khulna, Bangladesh	en_US
dc.rights	Khulna University of Engineering & Technology (KUET) thesis/dissertation/internship reports are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Data Compression	en_US
dc.subject	Character Encoding	en_US
dc.subject	Compression Algorithm	en_US
dc.subject	Compression Techniques	en_US
dc.subject	Large Natural Text	en_US
dc.title	An Efficient Compression Scheme for Large Natural Language Text	en_US
dc.type	Thesis	en_US
dc.description.degree	Master of Science in Computer Science and Engineering
dc.contributor.department	Department of Computer Science and Engineering
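
The 8-bit to 5-bit conversion described in the abstract can be illustrated with a minimal sketch. This is not the thesis's nSmBC algorithm: the record does not give the lookup table, the 7-set partition, or the n-sequence step, so the sketch assumes a single hypothetical 32-symbol table (`ALPHABET`) and only shows how fixed 5-bit codes pack into bytes. The names `pack5` and `unpack5` are illustrative, not taken from the thesis.

```python
# Hypothetical 32-symbol table (an assumption, not the thesis's table):
# space, the 26 lowercase letters, and five punctuation/control symbols.
ALPHABET = " etaoinshrdlcumwfgypbvkjxqz.,'-\n"
assert len(ALPHABET) == 32 and len(set(ALPHABET)) == 32

# Lookup table: character -> 5-bit code (0..31).
CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text: str) -> bytes:
    """Pack each character of `text` into 5 bits, zero-padded to whole bytes."""
    acc = 0
    for ch in text:
        acc = (acc << 5) | CODE[ch]  # raises KeyError for symbols outside the table
    nbits = 5 * len(text)
    pad = (-nbits) % 8               # zero bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack5(data: bytes, count: int) -> str:
    """Recover `count` characters from bytes produced by `pack5`."""
    acc = int.from_bytes(data, "big")
    acc >>= 8 * len(data) - 5 * count  # drop the padding bits
    chars = []
    for i in range(count):
        shift = 5 * (count - 1 - i)
        chars.append(ALPHABET[(acc >> shift) & 0b11111])
    return "".join(chars)


if __name__ == "__main__":
    text = "the quick brown fox"
    packed = pack5(text)
    print(len(text), "bytes as 8-bit characters ->", len(packed), "bytes packed")
    print(unpack5(packed, len(text)) == text)
```

The character count is passed to `unpack5` because the padding bits could otherwise be read as an extra symbol. The sketch accumulates all bits in one integer for brevity; a version meant for large inputs would stream bytes out as they fill.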