Table of Contents for Managing Gigabytes
Canonical Huffman codes (p. 36, 5 pages)
Computing Huffman code lengths (p. 41, 10 pages)
2.4 Arithmetic coding (p. 51, 10 pages)
How arithmetic coding works (p. 53, 3 pages)
Implementing arithmetic coding (p. 56, 3 pages)
Maintaining cumulative counts (p. 59, 2 pages)
2.5 Symbolwise models (p. 61, 13 pages)
Prediction by partial matching (p. 61, 4 pages)
Block-sorting compression (p. 65, 4 pages)
Dynamic Markov compression (p. 69, 3 pages)
Word-based compression (p. 72, 2 pages)
2.6 Dictionary models (p. 74, 11 pages)
The LZ77 family of adaptive dictionary coders (p. 75, 3 pages)
The gzip variant of LZ77 (p. 78, 1 page)
The LZ78 family of adaptive dictionary coders (p. 79, 2 pages)
The LZW variant of LZ78 (p. 81, 4 pages)
Creating synchronization points (p. 85, 2 pages)
Self-synchronizing codes (p. 87, 3 pages)
2.8 Performance comparisons (p. 90, 9 pages)
Compression performance (p. 91, 4 pages)
Other performance considerations (p. 99, 1 page)
3.1 Sample document collections (p. 106, 3 pages)
3.2 Inverted file indexing (p. 109, 5 pages)
3.3 Inverted file compression (p. 114, 14 pages)
Nonparameterized models (p. 116, 3 pages)
Global Bernoulli model (p. 119, 2 pages)
Global observed frequency model (p. 121, 1 page)
Local Bernoulli model (p. 121, 1 page)
Skewed Bernoulli model (p. 122, 1 page)
Local hyperbolic model (p. 123, 1 page)
Local observed frequency model (p. 124, 1 page)
Context-sensitive compression (p. 125, 3 pages)
3.4 Performance of index compression methods (p. 128, 1 page)
3.5 Signature files and bitmaps (p. 129, 14 pages)
Bitsliced signature files (p. 133, 5 pages)
Analysis of signature files (p. 138, 2 pages)
Compression of signature files and bitmaps (p. 141, 2 pages)
3.6 Comparison of indexing methods (p. 143, 2 pages)
3.7 Case folding, stemming, and stop words (p. 145, 5 pages)
4.1 Accessing the lexicon (p. 156, 14 pages)
Minimal perfect hashing (p. 161, 3 pages)
Design of a minimal perfect hash function (p. 164, 5 pages)
Disk-based lexicon storage (p. 169, 1 page)
4.2 Partially specified query terms (p. 170, 4 pages)
Brute-force string matching (p. 170, 1 page)
Indexing using n-grams (p. 170, 2 pages)
4.3 Boolean query processing (p. 174, 6 pages)
Term processing order (p. 175, 1 page)
Random access and fast lookup (p. 176, 2 pages)
Blocked inverted files (p. 178, 2 pages)
Nonconjunctive queries (p. 180, 1 page)
4.4 Ranking and information retrieval (p. 180, 8 pages)
Inner product similarity (p. 181, 4 pages)
4.5 Evaluating retrieval effectiveness (p. 188, 10 pages)
Recall-precision curves (p. 191, 1 page)
World Wide Web searching (p. 194, 3 pages)
Other effectiveness measures (p. 197, 1 page)
4.6 Implementation of the cosine measure (p. 198, 16 pages)
Within-document frequencies (p. 198, 3 pages)
Calculating the cosine value (p. 201, 2 pages)
Memory for document weights (p. 203, 3 pages)
Memory for accumulators (p. 206, 1 page)
Fast query processing (p. 207, 1 page)
Frequency-sorted indexes (p. 208, 2 pages)
4.7 Interactive retrieval (p. 214, 4 pages)
4.8 Distributed retrieval (p. 218, 3 pages)
Chapter 5: Index Construction (p. 223, 40 pages)
Preview of index construction methods (p. 226, 2 pages)
5.1 Memory-based inversion (p. 228, 3 pages)
5.2 Sort-based inversion (p. 231, 4 pages)
5.3 Exploiting index compression (p. 235, 10 pages)
Compressing the temporary files (p. 236, 2 pages)
In-place multiway merging (p. 239, 6 pages)
5.4 Compressed in-memory inversion (p. 245, 8 pages)
Large memory inversion (p. 245, 5 pages)
Lexicon-based partitioning (p. 250, 1 page)
Text-based partitioning (p. 251, 2 pages)
5.5 Comparison of inversion methods (p. 253, 1 page)
5.6 Constructing signature files and bitmaps (p. 254, 2 pages)
5.7 Dynamic collections (p. 256, 4 pages)
Chapter 6: Image Compression (p. 263, 48 pages)
6.2 The CCITT fax standard for bilevel images (p. 268, 5 pages)
6.3 Context-based compression of bilevel images (p. 273, 8 pages)
Two-level context models (p. 277, 2 pages)
"Clairvoyant" compression (p. 279, 2 pages)
6.4 JBIG: A standard for bilevel images (p. 281, 7 pages)
Templates and adaptive templates (p. 286, 1 page)
Coding and probability estimation (p. 287, 1 page)
6.5 Lossless compression of continuous-tone images (p. 288, 9 pages)
The GIF and PNG lossless image formats (p. 289, 2 pages)
FELICS: Fast, efficient, lossless image compression system (p. 291, 3 pages)
CALIC: Context-based adaptive lossless image codec (p. 294, 2 pages)
JPEG-LS: A new standard for lossless image compression (p. 296, 1 page)
6.6 JPEG: A standard for continuous-tone images (p. 297, 6 pages)
6.7 Progressive transmission of images (p. 303, 5 pages)
Compression for pyramid coding (p. 304, 2 pages)
6.8 Summary of image compression techniques (p. 308, 2 pages)
Chapter 7: Textual Images (p. 311, 44 pages)
7.1 The idea of textual image compression (p. 314, 4 pages)
7.2 Lossy and lossless compression (p. 318, 2 pages)
Tracing the boundary of a mark (p. 321, 2 pages)
Removing the mark from the image (p. 323, 2 pages)
Sorting marks into natural reading order (p. 325, 1 page)
7.4 Template matching (p. 325, 12 pages)
Global template matching (p. 326, 3 pages)
Local template matching (p. 329, 1 page)
Compression-based template matching (p. 330, 2 pages)
Screening library templates (p. 332, 1 page)
Evaluation of template-matching methods (p. 333, 4 pages)
7.5 From marks to symbols (p. 337, 3 pages)
Symbols and their offsets (p. 339, 1 page)
7.6 Coding the components of a textual image (p. 340, 3 pages)
7.7 Performance: Lossy and lossless modes (p. 343, 6 pages)
7.8 System considerations (p. 349, 2 pages)
7.9 JBIG2: A standard for textual image compression (p. 351, 2 pages)
Chapter 8: Mixed Text and Images (p. 355, 34 pages)
Detecting straight lines using the Hough transform (p. 358, 3 pages)
The projection profile (p. 361, 6 pages)
From slope histogram to docstrum (p. 367, 5 pages)
Bottom-up segmentation methods (p. 374, 2 pages)
Top-down and combined segmentation methods (p. 376, 1 page)
Mark-based segmentation (p. 376, 2 pages)
Segmenting short text strings (p. 378, 5 pages)
Segmentation using a document grammar (p. 383, 2 pages)
9.1 Text compression (p. 390, 16 pages)
Choice of compression model (p. 391, 3 pages)
Limitations on Huffman codes (p. 396, 5 pages)
Length-limited coding (p. 401, 5 pages)
9.2 Text compression performance (p. 406, 9 pages)
Compression effectiveness (p. 406, 3 pages)
9.3 Images and textual images (p. 415, 4 pages)
Compression of bilevel images (p. 416, 1 page)
Compression of grayscale images (p. 417, 1 page)
Compression of textual images (p. 417, 2 pages)
9.4 Index construction (p. 419, 2 pages)
9.5 Index compression (p. 421, 2 pages)
Chapter 10: The Information Explosion (p. 431, 20 pages)
10.1 Two millennia of information (p. 431, 2 pages)
10.2 The Internet: A global information resource (p. 433, 3 pages)
10.3 The paper problem (p. 436, 2 pages)
10.4 Coping with the information explosion (p. 438, 4 pages)
Agent-based information retrieval (p. 440, 1 page)
10.5 Digital libraries (p. 442, 2 pages)
10.6 Managing gigabytes better (p. 444, 1 page)
10.7 Small is beautiful (p. 445, 2 pages)
10.8 Personal information support for life (p. 447, 4 pages)
Appendix A: A Guide to the mg System (p. 451, 18 pages)
A.1 Installing the mg system (p. 451, 2 pages)
A.2 A sample storage and retrieval session (p. 453, 6 pages)
A.3 Database creation (p. 459, 4 pages)
A.4 Querying an indexed document collection (p. 463, 2 pages)
A.6 Image compression programs (p. 466, 3 pages)
B.1 What's in the NZDL? (p. 469, 9 pages)
Computer Science Technical Reports (p. 469, 1 page)
Collection development (p. 476, 1 page)
B.2 How the NZDL works (p. 478, 4 pages)
Searching and indexing (p. 480, 2 pages)