Tuesday, June 19, 2012

Adding an index to your pdf-file

Have you ever created a pdf-file from multiple documents and wanted a clickable index for it?
On more than one occasion I have created documents that needed an index, and after some googling I came up with the solution that I will share in this article.
We only need a couple of files and programs to make this work.

First of all, we need the original pdf-file we want to add an index to.
Secondly, we need to know which index title goes with which page.
Third, and most important, is the gs command. If it's missing, we'll install it using:

sudo apt-get install ghostscript

We then create our index-file (index.info):

[/Page 1 /View [/XYZ null null null] /Title (First page) /OUT pdfmark 
[/Page 2 /View [/XYZ null null null] /Title (Table of Content) /OUT pdfmark 
[/Page 3 /View [/XYZ null null null] /Title (Document content) /OUT pdfmark 
[/Page 6 /View [/XYZ null null null] /Title (Appendix A) /OUT pdfmark 
[/Page 8 /View [/XYZ null null null] /Title (Preface) /OUT pdfmark 


This file contains the index for our pdf-file; each line becomes one entry in the bookmark (outline) panel of the PDF viewer, pointing at the given page. We can add as many entries as we like, simply by changing the page number and the title within the (). To create our new indexed pdf-file we run the following command:

gs -sDEVICE=pdfwrite -q -dBATCH -dNOPAUSE \
-sOutputFile=newfile_indexed.pdf index.info \
-f original_unindexed_file.pdf

We now have a new pdf-file with a clickable index.
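
Typing the index-file by hand gets tedious for long documents. As a minimal sketch, assuming Python 3 is available and the titles are plain ASCII, the lines of index.info could be generated from a list of page/title pairs. The function names and the script name make_index.py are made up for this illustration; only the pdfmark line format is taken from the article. Parentheses and backslashes in a title are escaped, because an unbalanced ( or ) would otherwise break the PostScript string.

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for use inside a PostScript (...) string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def pdfmark_line(page, title):
    """Return one index entry in the same format as the index.info lines above."""
    return "[/Page {} /View [/XYZ null null null] /Title ({}) /OUT pdfmark".format(
        page, escape_ps_string(title))


def build_index(entries):
    """entries is a list of (page, title) pairs; returns the whole file as text."""
    return "\n".join(pdfmark_line(page, title) for page, title in entries) + "\n"


if __name__ == "__main__":
    # Example data: the same entries as in the index.info shown in the article.
    entries = [
        (1, "First page"),
        (2, "Table of Content"),
        (3, "Document content"),
        (6, "Appendix A"),
        (8, "Preface"),
    ]
    print(build_index(entries), end="")
```

Saved as make_index.py (a hypothetical name), it would be run as python3 make_index.py > index.info, after which the gs command is used exactly as before.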

7 comments:

  1. Thank you!!!
    pdftk's "update_info" was not working for me, this did!

  2. Thank you too.
    It worked for me also without issue.
    One question: If I want a subsection within my Table of Contents e.g. ---
    1 1
    1.1 3
    1.2 7
    2 12
    2.1 13
    2.2 18
    ... ...

    I assume I alter the "null null null" entries? Do you know where I can find this information? ... I'll search Google and man pages meanwhile...

  3. ...I've found a solution to my above query here:
    http://dekonvoluted.wordpress.com/tag/ghostscript/
    It creates links to the sections within a chapter and adds them to the index of the newly created pdf.

  4. ... just one more note for anyone looking for the same solution as I was, namely, how to include sections in the ToC. This is how I wrote my index.info:

    ...
    [/Page 7 /View [/XYZ null null null] /Title (Foreword) /OUT pdfmark
    [/Page 21 /Count 10 /View [/XYZ null null null] /Title (ONE: TABLES OR NO TABLES?) /OUT pdfmark
    [/Page 21 /View [/XYZ null null null] /Title (Basic Multiplication) /OUT pdfmark
    [/Page 22 /View [/XYZ null null null] /Title (Multiplication by eleven) /OUT pdfmark
    ...
    [/Page 55 /View [/XYZ null null null] /Title (TWO: RAPID MULTIPLICATION BY THE DIRECT METHOD) /OUT pdfmark
    ...
    (There were 10 sections in Chapter 1, hence the /Count 10 on the chapter entry: it makes the next 10 entries children of that chapter.)
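
Counting the sections by hand is easy to get wrong, so here is a rough sketch, assuming Python 3 and plain ASCII titles, that derives /Count from a nested list instead. The tuple layout (page, title, children) and the function names are invented for this example. It relies on /Count giving the number of immediate sub-entries, as used in the comment above.

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for use inside a PostScript (...) string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def outline_lines(entries):
    """Turn nested (page, title, children) tuples into pdfmark lines.

    A parent entry gets /Count set to the number of its direct children,
    and its children are written immediately after it.
    """
    lines = []
    for page, title, children in entries:
        count = " /Count {}".format(len(children)) if children else ""
        lines.append(
            "[/Page {}{} /View [/XYZ null null null] /Title ({}) /OUT pdfmark".format(
                page, count, escape_ps_string(title)))
        lines.extend(outline_lines(children))
    return lines


if __name__ == "__main__":
    # Shortened example: only two of the chapter's sections are listed here.
    toc = [
        (7, "Foreword", []),
        (21, "ONE: TABLES OR NO TABLES?", [
            (21, "Basic Multiplication", []),
            (22, "Multiplication by eleven", []),
        ]),
    ]
    print("\n".join(outline_lines(toc)))
```

Because the count is taken from the length of each children list, adding or removing a section never leaves a stale number behind.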

  5. I'm trying to get the index from an existing pdf file. Further, I'd like to extract all the metadata and reuse it.

    Thanks in advance

  6. To get the metadata and index info you should look into pdftk and its dump_data operation.
    Running: pdftk pdffile.pdf dump_data output index.txt will create a file called index.txt containing the metadata and index information. The index info will not be in a format that gs can use directly, though. You will need to convert it, either manually or with a script that does it for you.
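
Such a conversion script could look roughly like the following. This is only a sketch, assuming Python 3, plain ASCII titles, and that the dump lists each bookmark as BookmarkTitle, BookmarkLevel and BookmarkPageNumber lines; check the index.txt that your pdftk version writes before relying on it. The function names and the script name dump_to_pdfmark.py are made up for this example. Metadata lines such as InfoKey are simply skipped.

```python
import sys


def escape_ps_string(text):
    """Escape backslashes and parentheses for use inside a PostScript (...) string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def parse_bookmarks(dump_text):
    """Collect title, level and page for every bookmark found in the dump text."""
    bookmarks = []
    for line in dump_text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "BookmarkTitle":
            # A title line starts a new bookmark record.
            bookmarks.append({"title": value, "level": 1, "page": 1})
        elif key == "BookmarkLevel" and bookmarks:
            bookmarks[-1]["level"] = int(value)
        elif key == "BookmarkPageNumber" and bookmarks:
            bookmarks[-1]["page"] = int(value)
    return bookmarks


def count_children(bookmarks, i):
    """Count the entries directly below bookmark i (exactly one level deeper)."""
    level = bookmarks[i]["level"]
    count = 0
    for later in bookmarks[i + 1:]:
        if later["level"] <= level:
            break
        if later["level"] == level + 1:
            count += 1
    return count


def to_pdfmark(bookmarks):
    """Return one pdfmark line per bookmark, adding /Count for entries with children."""
    lines = []
    for i, mark in enumerate(bookmarks):
        n = count_children(bookmarks, i)
        count = " /Count {}".format(n) if n else ""
        lines.append(
            "[/Page {}{} /View [/XYZ null null null] /Title ({}) /OUT pdfmark".format(
                mark["page"], count, escape_ps_string(mark["title"])))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first argument is the file written by pdftk's dump_data.
    with open(sys.argv[1]) as dump_file:
        print("\n".join(to_pdfmark(parse_bookmarks(dump_file.read()))))
```

With the dump from above, python3 dump_to_pdfmark.py index.txt > index.info would then produce a file that can be passed to the gs command from the article.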
