Abstract:
Data compression is the process of modifying, encoding or converting the bit structure of
data so that it requires less space; that is, it reduces the number of bits needed to
represent the data. Compressing data can save storage capacity, speed up file transfer,
and reduce the cost of storage hardware and network bandwidth. Data compression has a
wide range of applications, including data communication, data storage and database
development. Text compression can be as simple as removing all unneeded characters,
inserting a single repeat character to indicate a run of repeated characters, or
substituting a shorter bit string for a frequently occurring bit string. The basic
principle behind compression is to devise a method or protocol that uses fewer bits to
express the original data. Character encoding, which represents each character by a code
under some encoding scheme, is somewhat related to data compression.
In this thesis, an efficient and simple compression algorithm for large
natural-language text, named n-Sequence based m Bit Compression (nSmBC), is proposed; it
can outperform WinZip and WinRAR in terms of compression ratio. WinZip and WinRAR are
two well-known compression tools used for text compression in industry. The
scheme provides an efficient encoding algorithm that represents each 8-bit character with
5 bits using a lookup table. The lookup table is built from Zipf's distribution, a
discrete distribution that models the frequency of commonly used characters in different
languages. The 8-bit characters are mapped to 5 bits by partitioning the characters into
7 sets. After the characters are converted to 5 bits, an n-sequence scheme is developed
to logically calculate the location number of a particular combination of characters. The
reverse algorithm for recovering the original input is also presented. Finally, the
algorithm is compared with the well-known WinZip, WinRAR, Huffman and LZW techniques.
Both theoretical and experimental analysis show promising performance.
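
The 8-bit-to-5-bit step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's lookup table or its n-sequence scheme: the 32-symbol `ALPHABET` and the `pack5` and `unpack5` helpers are hypothetical, and the sketch assumes every input character belongs to that single alphabet, whereas the thesis partitions characters into 7 sets.

```python
# Hypothetical 32-symbol alphabet (an assumption for illustration only).
# The thesis instead builds a lookup table from Zipf's distribution and
# partitions characters into 7 sets; that table is not reproduced here.
ALPHABET = " abcdefghijklmnopqrstuvwxyz.,'-\n"
ENCODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text: str) -> bytes:
    """Pack each character of `text` into 5 bits, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for ch in text:
        acc = (acc << 5) | ENCODE[ch]   # raises KeyError for unknown characters
        nbits += 5
        while nbits >= 8:               # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1         # keep only the pending bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


def unpack5(data: bytes, n_chars: int) -> str:
    """Recover `n_chars` characters from data produced by pack5."""
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5 and len(chars) < n_chars:
            nbits -= 5
            chars.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1
    return "".join(chars)


if __name__ == "__main__":
    sample = "data compression"
    packed = pack5(sample)
    print(len(sample), "characters ->", len(packed), "bytes")
    print(unpack5(packed, len(sample)) == sample)
```

In this sketch a 16-character string occupies 10 bytes instead of 16, a fixed ratio of 5/8. The character count is passed to `unpack5` separately because the zero padding in the final byte would otherwise be ambiguous.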
Description:
This thesis is submitted to the Department of Computer Science and Engineering, Khulna University of Engineering & Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, September 2019.
Cataloged from PDF Version of Thesis.
Includes bibliographical references (pages 52-56).