BZip2 is another compression utility, similar to GZip but with stronger
compression and a more liberal license. It is also free from patented
algorithms and is currently almost as popular as GZip. BZip2 is not an
archiving utility, i.e. it cannot roll several files up together. It simply
compresses the specified file.
A quote from the BZip2 manual: "bzip2 compresses files using the
Burrows-Wheeler block-sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that achieved by more
conventional LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors". More information about bzip2
can be found at the BZip2 homepage, in the BZip2 How-To, and on the
MacBZip2 pages.
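The one-stream-in, one-stream-out behaviour described above can be sketched
with Python's standard bz2 module. This is only an illustration, not the
bzip2 command-line tool itself, and the sample data is made up.

```python
import bz2

# Hypothetical sample data; repetitive input compresses well.
data = b"The quick brown fox jumps over the lazy dog. " * 200

compressed = bz2.compress(data)        # one input stream in, one .bz2 stream out
restored = bz2.decompress(compressed)  # round trip back to the original bytes

print(len(data), "->", len(compressed), "bytes")
```

To bundle several files, an archiver such as tar is typically run first and
the resulting single file is then compressed.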
What is a checksum file?
A checksum file is a relatively small file that contains the checksum
values of other files. Because checksum values have a fixed length
(usually from 16 to 256 bits), the size of a checksum file does not depend
on the size of those files, only on their number. At the cost of a small
amount of extra data, this makes it possible to verify the integrity of the
data files at any time.
There are many checksum file formats, frequently proprietary or closed.
But there are also several open and widely supported formats.
In binary newsgroups and file-sharing networks, a well-known format is
SFV, which stands for "Simple File Verification". It really is "simple":
the format uses the CRC32 algorithm, and its file structure allows
ambiguity. We do not recommend using this format except in case of strong
Another format, best known in the open systems world, is the format of the
MD5SUM utility. This format uses the more secure and reliable MD5 algorithm
and has a more convenient file structure. The Advanced CheckSum Verifier
utility supports both formats, but we strongly recommend using the MD5SUM
format.
There is also a third format, called the "alternative" or "BSD checksum
format", which is more advanced and allows any checksum algorithm to be
used. Unfortunately, Advanced CheckSum Verifier does not currently support
this format, but we plan to add support in a future release.
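To make the difference between the three formats concrete, here is a rough
sketch that renders the entry for one file in each of them. The helper name
checksum_lines and the file name are hypothetical, and the line layouts are
the commonly seen ones (SFV: name then CRC32 in hex; md5sum: digest, two
spaces, name; BSD: algorithm tag, name in parentheses, digest). Individual
tools may differ in details.

```python
import hashlib
import zlib

def checksum_lines(name: str, data: bytes) -> dict:
    """Return one checksum-file line per format for the given file content."""
    md5_hex = hashlib.md5(data).hexdigest()
    crc_hex = format(zlib.crc32(data) & 0xFFFFFFFF, "08X")
    return {
        "sfv": f"{name} {crc_hex}",           # CRC32 only
        "md5sum": f"{md5_hex}  {name}",       # MD5 only
        "bsd": f"MD5 ({name}) = {md5_hex}",   # algorithm is named on the line
    }

for fmt, line in checksum_lines("readme.txt", b"hello\n").items():
    print(f"{fmt:7} {line}")
```

Because the BSD-style line names the algorithm, the same file can in
principle carry entries for different algorithms.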
What is a checksum utility?
The "checksum utility" or "checksum tool" category usually contains two
subcategories of programs: checksum calculators and checksum verifiers.
Checksum calculators are intended for calculating the checksum value of a
single object: a standalone file and/or an entire drive. For example, the
eXpress CheckSum Calculator can calculate the checksum value of a single
file only. The checksum calculator integrated into AccuHash 2.0 can also
calculate checksum values for an entire drive.
The other class of checksum tools, checksum verifiers, usually works with
checksum files: files in a special format that contain a list of checksum
values for several, sometimes very many, files. Although this class is
named "checksum verifier", these programs can frequently also create such
checksum files. For example, the Advanced CheckSum Verifier (ACSV) can both
create and verify checksum files.
What is CRC32?
CRC is an acronym for "Cyclic Redundancy Code", and 32 is the length of
the checksum in bits. The term "CRC" is reserved for algorithms that are
based on the idea of "polynomial" division. The idea behind computing the
checksum is the same for all CRC algorithms: take the data as a VERY long
binary number and divide it by a constant divisor. If you do this with
integer values you get a remainder; this remainder is the CRC checksum (for
example, 7 / 3 = 2 with remainder 1, so 1 is the checksum of 7). CRC is a
family of algorithms, and CRC32 is one particular member of this family
(other members are CRC16, XMODEM, ...); CRC32 produces a checksum with
a length of 32 bits (= 4 bytes).
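The division idea can be sketched in a few lines of Python. The function
crc32_bitwise is a hypothetical name for a plain bit-by-bit version of the
CRC32 variant used by zip and gzip (reflected polynomial 0xEDB88320); it is
meant for illustration, and real programs would normally call zlib.crc32.

```python
import zlib

# Integer analogy from the text: the remainder of a division is the checksum.
print(7 % 3)  # 1

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC32: repeatedly 'divide' by the polynomial, keep the remainder."""
    crc = 0xFFFFFFFF                      # initial register value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320   # subtract (XOR) the divisor
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion

data = b"123456789"
print(f"{crc32_bitwise(data):08X}")           # CBF43926
print(f"{zlib.crc32(data) & 0xFFFFFFFF:08X}")  # CBF43926
```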
What is the "checksum"?
A checksum is a number that has been calculated as a function
of some message. Taken literally, the word "check-sum" suggests that
the function simply adds up the bytes in the message. This was probably
the case in the first checksum algorithms. Today, however, more
sophisticated algorithms are used, but the term "checksum" is still used.
What is the "message"?
The message in this context means the input data that is being
checksummed. This is usually any sequence of bytes.
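The literal "add up the bytes" checksum can be sketched as follows. The
function name simple_sum and the 16-bit width are assumptions made for this
example; it also shows why such sums are weak, since reordering the bytes
of the message does not change the result.

```python
def simple_sum(message: bytes, bits: int = 16) -> int:
    """Add up all bytes of the message and keep only the lowest `bits` bits."""
    return sum(message) % (1 << bits)

print(simple_sum(b"abc"))                        # 97 + 98 + 99 = 294
print(simple_sum(b"cba") == simple_sum(b"abc"))  # True: reordering goes undetected
```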
GZip (GNU zip) is a compression utility with excellent compression,
free from patented algorithms and currently very popular on the Internet.
GZip is not an archiving utility, i.e. it cannot roll several files up
together. It simply compresses the specified file.
The compression algorithm (deflate) and the format of the .gz files
generated by gzip are described in RFCs (Request For Comments) 1951 and
1952, respectively. More information about gzip can be found on its
homepage at gzip.org.
GZip is a command-line utility, but there are several other utilities that
support gzip files, for example our free WinGZip utility.
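As a small illustration of the .gz format, the sketch below uses Python's
standard gzip module (not the gzip program itself) and inspects the first
header bytes defined in RFC 1952. The payload is made-up sample data.

```python
import gzip

data = b"example payload " * 100      # hypothetical sample data
compressed = gzip.compress(data)

magic = compressed[:2]                # RFC 1952 magic bytes: 0x1f 0x8b
method = compressed[2]                # compression method: 8 means deflate

print(magic.hex(), method)
print(gzip.decompress(compressed) == data)
```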
What is a message digest?
A message digest is a compact digital signature for an arbitrarily long
stream of binary data. An ideal message digest algorithm would never
generate the same signature for two different sets of input, but achieving
such theoretical perfection would require a message digest as long as the
input file. Practical message digest algorithms compromise in favour of a
digital signature of modest and usually fixed size created with an algorithm
designed to make preparation of input text with a given signature
Message digest algorithms have much in common with techniques used in
encryption, but to a different end; verification that data have not been
altered since the signature was published.
On the other hand, they also have something in common with 16 or 32 bit cyclical
redundancy codes (CRC) originally developed to verify correct transmission
in data communication protocols, but these short codes, while adequate to
detect the kind of transmission errors for which they were intended, are
insufficiently secure for modern volumes of the data and for applications
such as electronic commerce and verification of security related software
The most commonly used present-day message digest algorithm is the 128 bit
MD5 algorithm, developed by Ron Rivest of the MIT Laboratory for Computer
Science and RSA Data Security, Inc. The algorithm, with a reference
implementation, was published as Internet RFC 1321 in April 1992, and was
placed into the public domain at that time.
Message Digest number 5 (MD5)
The MD5 (Message Digest number 5) algorithm generates a 128-bit
cryptographic message digest value derived from the contents of the input
stream. This value is considered to be a highly reliable fingerprint
that can be used to verify the integrity of the file's contents. If as
little as a single bit value in the file is modified, the MD5 checksum
for the file changes. Forgery of a file in a way that causes MD5 to
generate the same result as that for the original file is considered to
be extremely difficult.
A set of MD5 checksums for critical system, application, and data files
provides a compact way of storing information for use in periodic integrity
checks of those files.
Details for the MD5 cryptographic checksum algorithm and C source code are
provided in RFC 1321. The MD5 algorithm has been implemented in numerous
computer languages including C, Perl, and Java.
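The single-bit sensitivity described above can be observed with Python's
standard hashlib module. This is a minimal sketch with made-up input data;
it shows the behaviour and is not a statement about MD5's security.

```python
import hashlib

original = b"checksum demo"
# Flip the lowest bit of the first byte; everything else stays the same.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.md5(original).hexdigest()
d2 = hashlib.md5(flipped).hexdigest()

print(d1)
print(d2)
print(len(d1) * 4, "bits")   # 32 hex digits = 128 bits
```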
The Advanced CheckSum Verifier is a Windows GUI utility that
generates and verifies message digests (digital signatures) using the MD5
algorithm. This program can be useful for verifying data burned to
CD-R(W) or transmitted over a network, for file comparison, and for
detecting file corruption and tampering.
History of MD5 as described in the RSA Laboratories Crypto FAQ.
MD2, MD4, and MD5 are message-digest algorithms developed by Rivest.
They are meant for digital signature applications where a large message has
to be "compressed" in a secure manner before being signed with the private
key. All three algorithms take a message of arbitrary length and produce a
128-bit message digest. While the structures of these algorithms are
somewhat similar, the design of MD2 is quite different from that of MD4 and
MD5. MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at
32-bit machines. Description and source code for the three algorithms can be
found as Internet RFCs 1319-1321.
MD2 was developed by Rivest in 1989. The message is first padded so its
length in bytes is divisible by 16. A 16-byte checksum is then appended to
the message, and the hash value is computed on the resulting message. Rogier
and Chauvaud have found that collisions for MD2 can be constructed if the
calculation of the checksum is omitted. This is the only cryptanalytic
result known for MD2.
MD4 was developed by Rivest in 1990. The message is padded to ensure that its
length in bits plus 64 is divisible by 512. A 64-bit binary representation of
the original length of the message is then concatenated to the message.
The message is processed in 512-bit blocks in the Damgard/Merkle iterative
structure, and each block is processed in three distinct rounds. Attacks on
versions of MD4 with either the first or the last rounds missing were
developed very quickly by Den Boer, Bosselaers and others. Dobbertin has
shown how collisions for the full version of MD4 can be found in under a
minute on a typical PC. In recent work, Dobbertin (Fast Software Encryption,
1998) has shown that a reduced version of MD4 in which the third round of
the compression function is not executed but everything else remains the
same, is not one-way. Clearly, MD4 should now be considered broken.
MD5 was developed by Rivest in 1991. It is basically MD4 with "safety-belts"
and while it is slightly slower than MD4, it is more secure. The algorithm
consists of four distinct rounds, which has a slightly different design from
that of MD4. Message-digest size, as well as padding requirements, remains
the same. Den Boer and Bosselaers have found pseudo-collisions for MD5. More
recent work by Dobbertin has extended the techniques used so effectively in
the analysis of MD4 to find collisions for the compression function of MD5.
While stopping short of providing collisions for the hash function in its
entirety this is clearly a significant step. For a comparison of these
different techniques and their impact the reader is referred to.
Van Oorschot and Wiener have considered a brute-force search for collisions
in hash functions, and they estimate a collision search machine designed
specifically for MD5 (costing $10 million in 1994) could find a collision
for MD5 in 24 days on average. The general techniques can be applied to
other hash functions.
md5sum is a command-line utility for calculating
the MD5 checksums of files. This utility is public domain and licensed under
It is very popular and has been ported to practically all modern operating
systems. The format of the checksum files produced by this utility is also
widely used and is the de facto standard for MD5 checksum files.
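A rough sketch of how one line of such a checksum file can be checked.
It assumes the common layout of 32 hex digits, a space, a mode character
(a space for text mode or "*" for binary mode) and then the file name.
The function names are hypothetical, and the data is passed in directly
instead of being read from disk.

```python
import hashlib

def parse_md5sum_line(line: str):
    """Split one md5sum-style line into (hex digest, file name)."""
    digest, rest = line.rstrip("\n").split(" ", 1)
    # rest[0] is the mode flag (" " or "*"); the file name follows it.
    return digest.lower(), rest[1:]

def verify(line: str, data: bytes) -> bool:
    """Compare the digest stored on the line with the MD5 of the given data."""
    digest, _name = parse_md5sum_line(line)
    return hashlib.md5(data).hexdigest() == digest

line = "d41d8cd98f00b204e9800998ecf8427e  empty.txt"
print(parse_md5sum_line(line))
print(verify(line, b""))
```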
This glossary item is not yet ready
Search Engine Optimization
This glossary item is not yet ready
Search Engine Position
This glossary item is not yet ready
This glossary item is not yet ready
When people want to find something on the Internet, they go to a search
engine and make a request by typing one or more words into the search form.
These words are called the search phrase (or key phrase). In reply to this
request, the search engine returns a list of the Internet sites that best
match the search phrase. Identifying the key phrases popular among
potential customers and optimizing a site to reach the top position in
search results is one of the main ways of promoting a site on the Internet.
Each search engine has its own criteria for ranking websites for a given
key phrase. Though the details of these criteria are usually kept secret,
two basic principles of site selection always remain: first, the content of
a site, and second, its popularity.
The Golden Phrases utility uses your webserver access log files
to reveal the key phrases already used by your visitors to reach your site
through search engines. It will help you to make better optimized websites.
Simple File Verification (SFV) format
SFV is a file format for storing the CRC32 checksums of files.
This format was first introduced by the WinSFV program and is widely used
in binary newsgroups, MP3 and ISO exchange networks, and other communities.
It should be noted that the format has some disadvantages. First of
all, it uses a very fast but not very reliable simple (non-cryptographic)
checksum algorithm. Second, the file structure is not well chosen: it was
designed for the old, DOS-style file naming convention, and some
ambiguities may arise when long file names are used.
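The long-file-name problem can be illustrated with a small parser sketch.
It assumes the usual layout of one "name checksum" pair per line with ";"
starting a comment line, and it resolves the ambiguity by treating the last
whitespace-separated token as the CRC32 value. That rule is an assumption
of this example, not something the format guarantees, and the sample
entries are made up.

```python
def parse_sfv(text: str) -> dict:
    """Map file names to CRC32 values from SFV-style text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue                       # skip blank lines and comments
        name, crc = line.rsplit(None, 1)   # split at the LAST run of whitespace
        entries[name] = int(crc, 16)
    return entries

sample = (
    "; generated for illustration\n"
    "old.txt 3610A686\n"
    "long file name.iso CBF43926\n"
)
print(parse_sfv(sample))
```

A name that itself ends in eight hex digits after a space would still be
read correctly by this rule, but a naive split at the first space would not
handle any of these long names.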
Secure Hash Algorithm
SHA (Secure Hash Algorithm) was designed by the National Security Agency
(NSA) in 1993 as the algorithm of the Secure Hash Standard (SHS, FIPS 180).
It was modeled after MD4 with additional improvements. An undisclosed
security problem prompted the NSA to release an improved SHA-1 in 1995.
Florent Chabaud and Antoine Joux later discovered a differential collision
attack against SHA in 1998. There are currently no known cryptographic
attacks against SHA-1.
It is also described in the ANSI X9.30 (part 2) standard. SHA-1 produces a
160-bit (20 byte) message digest. Although SHA-1 is slower than MD5, its
larger digest size makes it stronger against brute force attacks.
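The digest sizes can be compared directly with Python's standard hashlib
module. This is a minimal sketch; the input "abc" is simply a commonly used
test message.

```python
import hashlib

data = b"abc"
sha1 = hashlib.sha1(data)
md5 = hashlib.md5(data)

# digest_size is in bytes: 20 for SHA-1, 16 for MD5.
print(sha1.digest_size * 8, sha1.hexdigest())
print(md5.digest_size * 8, md5.hexdigest())
```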
Coordinated Universal Time
UTC is the acronym for "Coordinated Universal Time".
The times of various events, particularly astronomical and weather
phenomena, are often given in "Universal Time" (abbreviated UT) which
is sometimes referred to, now colloquially, as "Greenwich Mean Time"
(abbreviated GMT). The two terms are often used loosely to refer to
time kept on the Greenwich meridian (longitude zero).
However, in the most common civil usage, UT refers to a time scale called
"Coordinated Universal Time" (abbreviated UTC), which is the basis for the
worldwide system of civil time. This time scale is kept by time laboratories
around the world, including the U.S. Naval Observatory, and is determined
using highly precise atomic clocks. The International Bureau of Weights and
Measures makes use of data from the timing laboratories to provide the
international standard UTC which is accurate to approximately a nanosecond
(billionth of a second) per day. The length of a UTC second is defined in
terms of an atomic transition of the element cesium under specific
conditions, and is not directly related to any astronomical phenomena.
UTC is equivalent to the civil time for Iceland, Liberia, Morocco, Senegal,
Ghana, Mali, Mauritania, and several other countries. During the winter
months, UTC is also the civil time scale for the United Kingdom and
Why is UTC used as the acronym for Coordinated Universal Time instead of
CUT?
In 1970 the Coordinated Universal Time system was devised by an
international advisory group of technical experts within the International
Telecommunication Union (ITU). The ITU felt it was best to designate a
single abbreviation for use in all languages in order to minimize confusion.
Since unanimous agreement could not be achieved on using either the English
word order, CUT, or the French word order, TUC, the acronym UTC was chosen
as a compromise.
How can I watch the UTC time?
There are a lot of clocks that can help you. And, certainly, our
AlphaClock also allows this.
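For readers who prefer a script to a clock program, the current UTC time
can also be read from the system clock. This is a minimal sketch using
Python's standard datetime module; its accuracy depends entirely on how
well the computer's clock is synchronized.

```python
from datetime import datetime, timezone

now_utc = datetime.now(timezone.utc)   # timezone-aware "now" in UTC
print(now_utc.isoformat())
print(now_utc.strftime("%H:%M:%S UTC"))
```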
This glossary item is not yet ready
Webserver access log files
Usually a webserver records every processed request in a special
file, called the access log file. The location of this file and
its format depend on the webserver software used and its settings.
For Apache based servers, the common and/or combined log formats are used
most frequently. The Internet Information Server (IIS) uses its own,
proprietary log file format.
The content of log files also varies, but each request entry usually
includes the date and time of the request, the address of the remote host,
the name of the requested file, the result code of the request, and the
size of the data handled. Log files may also contain (and often do) the
referring URL and a description of the user agent.
The presence of such information makes it possible to analyze the access
log files to produce useful statistics.
The presence of referring URLs in log files makes it possible to trace
where the visitors of your site come from and, in particular, from which
search engine results page. And because the address of a search results
page also includes the search phrase and page number, it is possible to
determine the position of your site for the given phrase.
The Golden Phrases utility uses this information to determine the most
promising search phrases.
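The idea of recovering a search phrase from the referring URL can be
sketched as follows. This is not how Golden Phrases itself works; it is a
minimal illustration for the Apache combined log format. The log line, the
host search.example.com and the query parameter names "q" and "start" are
made up for the example, and real search engines use various parameter
names.

```python
import re
from urllib.parse import urlparse, parse_qs

# Apache combined format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def search_phrase(log_line: str, param: str = "q"):
    """Return the search phrase found in the referring URL, or None."""
    m = COMBINED.match(log_line)
    if m is None:
        return None
    query = parse_qs(urlparse(m.group("referer")).query)
    values = query.get(param)
    return values[0] if values else None

line = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
        '200 2326 "http://search.example.com/search?q=checksum+utility&start=10" '
        '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
print(search_phrase(line))            # checksum utility
print(search_phrase(line, "start"))   # 10
```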
What is a webserver? The name is usually applied to webserver software,
for example the "Apache webserver". But HTTPD (HTTP Daemon) software is
only one part of a webserver. And serving websites is only one of many applications
Like many other modern hi-tech products, a webserver consists of
hardware and software.
Depending on the purpose of the webserver, its hardware may vary greatly,
from a microcontroller up to a mainframe. Systems with x86-based
architecture are usually used for hosting websites.
The software in turn consists of the operating system and the actual server
software, though sometimes the two may be combined to achieve high
performance.
The two most often used webserver programs are Apache on *nix (Linux, *BSD,
etc.) systems and Internet Information Server (IIS) on Windows (NT, 2000,
XP, Net) systems.
The Golden Phrases utility is software intended for analyzing webserver
access log files.
XTea encryption algorithm
The Tiny Encryption Algorithm, or TEA, is a public domain
block cipher by David Wheeler and Roger Needham. It is intended for use
in applications where code size is at a premium, or where it is necessary
for someone to remember the algorithm and code it on an arbitrary machine
at a later time.
This algorithm is simple enough to be translated into a number of different
languages and assembly languages very easily. It is short enough to be
programmed from memory or a copy.
XTea is the successor of Tea, modified to avoid some
weaknesses of the original Tea algorithm.
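To give an impression of how small the algorithm is, here is a sketch of
XTea in Python that follows the widely published reference routine: a
64-bit block held as two 32-bit words, a 128-bit key held as four 32-bit
words, and 32 cycles. The function names are hypothetical, and byte-order
handling and modes of operation are left out, so treat it as an
illustration and not as a vetted implementation.

```python
MASK = 0xFFFFFFFF      # keep all arithmetic within 32 bits
DELTA = 0x9E3779B9     # round constant from the reference code

def xtea_encrypt(block, key, rounds=32):
    """Encrypt one block (v0, v1) of 32-bit words with a key of four 32-bit words."""
    v0, v1 = block
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return v0, v1

def xtea_decrypt(block, key, rounds=32):
    """Invert xtea_encrypt by running the same steps backwards."""
    v0, v1 = block
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
    return v0, v1

key = (1, 2, 3, 4)                       # made-up demo key
plain = (0x01234567, 0x89ABCDEF)         # made-up demo block
cipher = xtea_encrypt(plain, key)
print(xtea_decrypt(cipher, key) == plain)
```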