Table of Contents for Managing Gigabytes
Chapter/Section Title | Page # | Page Count
--- | --- | ---
Preface | xxiii |
one Overview | 1 | 20
1.1 Document databases | 6 | 3
1.2 Compression | 9 | 2
1.3 Indexes | 11 | 4
1.4 Document images | 15 | 3
1.5 The mg system | 18 | 1
Further reading | 19 | 2
two Text Compression | 21 | 82
2.1 Models | 24 | 3
2.2 Adaptive models | 27 | 3
2.3 Huffman coding | 30 | 21
Canonical Huffman codes | 36 | 5
Computing Huffman code lengths | 41 | 10
Summary | 51 | 1
2.4 Arithmetic coding | 51 | 10
How arithmetic coding works | 53 | 3
Implementing arithmetic coding | 56 | 3
Maintaining cumulative counts | 59 | 2
2.5 Symbolwise models | 61 | 13
Prediction by partial matching | 61 | 4
Block-sorting compression | 65 | 4
Dynamic Markov compression | 69 | 3
Word-based compression | 72 | 2
2.6 Dictionary models | 74 | 11
The LZ77 family of adaptive dictionary coders | 75 | 3
The gzip variant of LZ77 | 78 | 1
The LZ78 family of adaptive dictionary coders | 79 | 2
The LZW variant of LZ78 | 81 | 4
2.7 Synchronization | 85 | 5
Creating synchronization points | 85 | 2
Self-synchronizing codes | 87 | 3
2.8 Performance comparisons | 90 | 9
Compression performance | 91 | 4
Compression speed | 95 | 4
Other performance considerations | 99 | 1
Further reading | 99 | 4
three Indexing | 103 | 50
3.1 Sample document collections | 106 | 3
3.2 Inverted file indexing | 109 | 5
3.3 Inverted file compression | 114 | 14
Nonparameterized models | 116 | 3
Global Bernoulli model | 119 | 2
Global observed frequency model | 121 | 1
Local Bernoulli model | 121 | 1
Skewed Bernoulli model | 122 | 1
Local hyperbolic model | 123 | 1
Local observed frequency model | 124 | 1
Context-sensitive compression | 125 | 3
3.4 Performance of index compression methods | 128 | 1
3.5 Signature files and bitmaps | 129 | 14
Signature files | 130 | 3
Bitsliced signature files | 133 | 5
Analysis of signature files | 138 | 2
Bitmaps | 140 | 1
Compression of signature files and bitmaps | 141 | 2
3.6 Comparison of indexing methods | 143 | 2
3.7 Case folding, stemming, and stop words | 145 | 5
Case folding | 146 | 1
Stemming | 146 | 1
Effect on index size | 147 | 1
Stop words | 147 | 3
Further reading | 150 | 3
four Querying | 153 | 70
4.1 Accessing the lexicon | 156 | 14
Access structures | 156 | 3
Front coding | 159 | 2
Minimal perfect hashing | 161 | 3
Design of a minimal perfect hash function | 164 | 5
Disk-based lexicon storage | 169 | 1
4.2 Partially specified query terms | 170 | 4
Brute-force string matching | 170 | 1
Indexing using n-grams | 170 | 2
Rotated lexicons | 172 | 2
4.3 Boolean query processing | 174 | 6
Conjunctive queries | 174 | 1
Term processing order | 175 | 1
Random access and fast lookup | 176 | 2
Blocked inverted files | 178 | 2
Nonconjunctive queries | 180 | 1
4.4 Ranking and information retrieval | 180 | 8
Coordinate matching | 181 | 1
Inner product similarity | 181 | 4
Vector space models | 185 | 3
4.5 Evaluating retrieval effectiveness | 188 | 10
Recall and precision | 188 | 3
Recall-precision curves | 191 | 1
The TREC project | 192 | 2
World Wide Web searching | 194 | 3
Other effectiveness measures | 197 | 1
4.6 Implementation of the cosine measure | 198 | 16
Within-document frequencies | 198 | 3
Calculating the cosine value | 201 | 2
Memory for document weights | 203 | 3
Memory for accumulators | 206 | 1
Fast query processing | 207 | 1
Frequency-sorted indexes | 208 | 2
Sorting | 210 | 4
4.7 Interactive retrieval | 214 | 4
Relevance feedback | 214 | 2
Probabilistic models | 216 | 2
4.8 Distributed retrieval | 218 | 3
Further reading | 221 | 2
five Index Construction | 223 | 40
Computational model | 226 | 1
Preview of index construction methods | 226 | 2
5.1 Memory-based inversion | 228 | 3
5.2 Sort-based inversion | 231 | 4
5.3 Exploiting index compression | 235 | 10
Compressing the temporary files | 236 | 2
Multiway merging | 238 | 1
In-place multiway merging | 239 | 6
5.4 Compressed in-memory inversion | 245 | 8
Large memory inversion | 245 | 5
Lexicon-based partitioning | 250 | 1
Text-based partitioning | 251 | 2
5.5 Comparison of inversion methods | 253 | 1
5.6 Constructing signature files and bitmaps | 254 | 2
5.7 Dynamic collections | 256 | 4
Expanding the text | 256 | 1
Expanding the index | 257 | 3
Further reading | 260 | 3
six Image Compression | 263 | 48
6.1 Types of images | 265 | 3
6.2 The CCITT fax standard for bilevel images | 268 | 5
6.3 Context-based compression of bilevel images | 273 | 8
Context models | 275 | 2
Two-level context models | 277 | 2
"Clairvoyant" compression | 279 | 2
6.4 JBIG: A standard for bilevel images | 281 | 7
Resolution reduction | 282 | 4
Templates and adaptive templates | 286 | 1
Coding and probability estimation | 287 | 1
6.5 Lossless compression of continuous-tone images | 288 | 9
The GIF and PNG lossless image formats | 289 | 2
FELICS: Fast, efficient, lossless image compression system | 291 | 3
CALIC: Context-based adaptive lossless image codec | 294 | 2
JPEG-LS: A new standard for lossless image compression | 296 | 1
6.6 JPEG: A standard for continuous-tone images | 297 | 6
6.7 Progressive transmission of images | 303 | 5
Pyramid coding | 304 | 1
Compression for pyramid coding | 304 | 2
Median aggregation | 306 | 1
Error modeling | 307 | 1
6.8 Summary of image compression techniques | 308 | 2
Further reading | 310 | 1
seven Textual Images | 311 | 44
7.1 The idea of textual image compression | 314 | 4
7.2 Lossy and lossless compression | 318 | 2
7.3 Extracting marks | 320 | 5
Tracing the boundary of a mark | 321 | 2
Removing the mark from the image | 323 | 2
Sorting marks into natural reading order | 325 | 1
7.4 Template matching | 325 | 12
Global template matching | 326 | 3
Local template matching | 329 | 1
Compression-based template matching | 330 | 2
Screening library templates | 332 | 1
Evaluation of template-matching methods | 333 | 4
7.5 From marks to symbols | 337 | 3
Library construction | 338 | 1
Symbols and their offsets | 339 | 1
7.6 Coding the components of a textual image | 340 | 3
Library | 340 | 1
Symbol numbers | 341 | 1
Symbol offsets | 341 | 1
Original image | 341 | 2
7.7 Performance: Lossy and lossless modes | 343 | 6
7.8 System considerations | 349 | 2
7.9 JBIG2: A standard for textual image compression | 351 | 2
Further reading | 353 | 2
eight Mixed Text and Images | 355 | 34
8.1 Orientation | 357 | 15
Detecting straight lines using the Hough transform | 358 | 3
Left-margin search | 361 | 1
The projection profile | 361 | 6
From slope histogram to docstrum | 367 | 5
8.2 Segmentation | 372 | 13
Bottom-up segmentation methods | 374 | 2
Top-down and combined segmentation methods | 376 | 1
Mark-based segmentation | 376 | 2
Segmenting short text strings | 378 | 5
Segmentation using a document grammar | 383 | 2
8.3 Classification | 385 | 3
Further reading | 388 | 1
nine Implementation | 389 | 42
9.1 Text compression | 390 | 16
Choice of compression model | 391 | 3
Choice of coder | 394 | 2
Limitations on Huffman codes | 396 | 5
Length-limited coding | 401 | 5
9.2 Text compression performance | 406 | 9
Compression effectiveness | 406 | 3
Decompression speed | 409 | 1
Decompression memory | 410 | 2
Dynamic collections | 412 | 3
9.3 Images and textual images | 415 | 4
Compression of bilevel images | 416 | 1
Compression of grayscale images | 417 | 1
Compression of textual images | 417 | 2
9.4 Index construction | 419 | 2
9.5 Index compression | 421 | 2
9.6 Query processing | 423 | 5
Boolean queries | 423 | 2
Ranked queries | 425 | 3
Further reading | 428 | 3
ten The Information Explosion | 431 | 20
10.1 Two millennia of information | 431 | 2
10.2 The Internet: A global information resource | 433 | 3
10.3 The paper problem | 436 | 2
10.4 Coping with the information explosion | 438 | 4
Web search engines | 438 | 2
Agent-based information retrieval | 440 | 1
Data mining | 441 | 1
10.5 Digital libraries | 442 | 2
10.6 Managing gigabytes better | 444 | 1
10.7 Small is beautiful | 445 | 2
10.8 Personal information support for life | 447 | 4
A Guide to the mg System | 451 | 18
A.1 Installing the mg system | 451 | 2
A.2 A sample storage and retrieval session | 453 | 6
A.3 Database creation | 459 | 4
A.4 Querying an indexed document collection | 463 | 2
A.5 Nontextual files | 465 | 1
A.6 Image compression programs | 466 | 3
B Guide to the NZDL | 469 | 16
B.1 What's in the NZDL? | 469 | 9
Computer Science Technical Reports | 469 | 1
Other collections | 470 | 6
Collection development | 476 | 1
Audio collections | 476 | 1
Melody Index | 477 | 1
B.2 How the NZDL works | 478 | 4
The raw material | 478 | 2
Searching and indexing | 480 | 2
B.3 Implications | 482 | 1
Further reading | 483 | 2
References | 485 | 22
Index | 507 | 12
About the Authors | 519 |