Indexing

Index Creation

Index only what you need: DocFetcher Server follows an “index only what you need” philosophy, meaning it won’t index anything until you explicitly tell it to. This not only saves computational resources but also yields better search results, since the results won’t be cluttered with irrelevant files. The flip side of this philosophy is that clients won’t be able to find anything until you create some indexes, so let’s do just that.

Select a folder for indexing: First, get the server up and running. Then open the web interface in your browser and navigate to the Admin Area, where all index management is done on the Indexes tab. This tab contains two tables, the Indexes table and the Index Tasks table. As its name suggests, the Indexes table lists the existing indexes, and below it is a row of buttons for various index-related actions. Click Add > Folder..., then select a folder to be indexed. For starters, pick a small folder so indexing won’t take too long.

Note

On Windows, mapped drives may not be available in the folder chooser dialog, because the application runs under the Local System account, which has no access to mapped drives. As a workaround, try indexing the UNC path of the underlying network resource via Add > Path....

A queue of index tasks: The folder you selected for indexing will appear in the Index Tasks table as a new index task. The Index Tasks table is basically a queue of index tasks to be executed one after another, and the one you just created will sit idly in the queue until you’re done configuring it and you’ve confirmed that it is ready for execution.
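To make the queue behavior concrete, here is a toy model in Python. It is only a sketch, not DocFetcher Server’s implementation: the IndexTask class and the run_ready_tasks function are hypothetical, and the sketch assumes that a task not yet marked ready is left waiting while ready tasks behind it are processed in order. The manual does not say whether the real server uses this policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IndexTask:
    """Hypothetical stand-in for an entry in the Index Tasks table."""
    folder: str
    ready: bool = False  # becomes True once the admin clicks Run Task


def run_ready_tasks(queue: deque) -> list:
    """Process ready tasks one after another, in queue order.

    Tasks that are not ready yet stay in the queue (assumed policy).
    Returns the folders that were processed, in execution order.
    """
    processed = []
    waiting = deque()
    while queue:
        task = queue.popleft()
        if task.ready:
            processed.append(task.folder)  # placeholder for the actual indexing work
        else:
            waiting.append(task)
    queue.extend(waiting)  # put unready tasks back, preserving their order
    return processed


if __name__ == "__main__":
    tasks = deque([IndexTask("/data/a", ready=True),
                   IndexTask("/data/b"),
                   IndexTask("/data/c", ready=True)])
    print(run_ready_tasks(tasks))     # ['/data/a', '/data/c']
    print([t.folder for t in tasks])  # ['/data/b']
```

In this model, the task for /data/b only runs after its ready flag has been set, mirroring the Run Task confirmation step.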

Index task configuration: Clicking an index task in the Index Tasks table reveals a section labeled “Selected Index Task” below the table. This section should already be visible at this point, since the newly created index task should have been selected automatically. Here you can configure the selected index task, but for our purposes the default configuration should suffice, so just click the Run Task button at the bottom of the page to mark the index task as ready for execution. Then sit back and wait for the index task to complete.

New index: Once the index task is completed, the newly created index will be listed in the Indexes table. If you head over to the User Area, you will see the index in the Search Scope pane in the bottom left. Now users will be able to run searches via the search field at the top of the User Area.

Retrieving files: The DocFetcher Server web interface is a web application that runs in a browser, and modern browsers keep web applications locked in a sandbox for security reasons. As a result, the web interface can neither open result files in local viewer applications (PDF readers, Microsoft Word, etc.) nor navigate to them in Windows Explorer or other file managers. The only ways to retrieve files through the web interface are therefore (1) downloading them, and (2) copying their paths to the clipboard and then retrieving them through other means.

Server vs. client file paths: Depending on your use case, you may need to keep in mind that the file paths shown in the search results are, by default, from the server’s point of view. This is a problem if the clients see the same files under different paths, because they run a different operating system and/or have the network drives holding the files mounted at different locations. For example, the server may be running on Linux and indexing files below /path/to/documents, while the clients may be running on Windows and seeing those same files below G:\docs. To resolve this discrepancy, DocFetcher Server lets you set a custom client path for each index: Go to the Admin Area, select an index on the Indexes tab and click the Client Path... button. To continue our example, if you set the client path of the relevant index to G:\docs, then instead of /path/to/documents/test/test.docx the clients would see the path G:\docs\test\test.docx in their search results. (Note that DocFetcher Server will try to guess the correct file path separator, either backslash or forward slash, from the client path you set.)
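The path translation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server’s actual code: the function name to_client_path is hypothetical, the server is assumed to use POSIX paths as in the Linux example, and the separator is guessed with a deliberately simple rule (a backslash anywhere in the client path means Windows-style paths). DocFetcher Server’s own guessing heuristic may differ.

```python
from pathlib import PurePosixPath


def to_client_path(server_path: str, index_root: str, client_root: str) -> str:
    """Rewrite a server-side path below index_root into the client's view.

    Assumes a POSIX server; raises ValueError if server_path is not
    located below index_root.
    """
    # Simplified separator guess: backslash if the client path contains one.
    sep = "\\" if "\\" in client_root else "/"
    # Path components relative to the indexed folder, e.g. ('test', 'test.docx').
    parts = PurePosixPath(server_path).relative_to(index_root).parts
    # Join onto the client path, dropping any trailing separator first.
    return sep.join([client_root.rstrip("\\/"), *parts])


if __name__ == "__main__":
    print(to_client_path("/path/to/documents/test/test.docx",
                         "/path/to/documents", "G:\\docs"))
    # G:\docs\test\test.docx
```

Because the sketch relies on relative_to, a path outside the indexed folder raises ValueError instead of silently producing a meaningless client path.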

Index Updates

The need for index updates: As was explained in the beginning, an index is a kind of dictionary that accelerates searches. But this acceleration comes at a price: First, the index has to be created, and then it has to be updated whenever files are added, removed, renamed or modified. Without updating, the search results will become increasingly out of sync with the indexed files: Newly added files will not show up in the results, while some results may point to files that no longer exist or have been moved elsewhere. Hence the need for index updates.
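The following toy inverted index in Python illustrates both sides of that trade-off: a lookup becomes a dictionary access instead of a scan of every file, but the dictionary reflects the files only as they were when it was built. It is a deliberately minimal sketch; build_index and search are hypothetical names, and a real search index such as DocFetcher Server’s handles tokenization, ranking and storage very differently.

```python
import re
from collections import defaultdict


def build_index(files: dict) -> dict:
    """Map each lowercase word to the set of file paths containing it."""
    index = defaultdict(set)
    for path, text in files.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(path)
    return dict(index)


def search(index: dict, word: str) -> list:
    """Look up a word without touching the files themselves."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    files = {"a.txt": "Quarterly report", "b.txt": "Meeting notes"}
    index = build_index(files)
    print(search(index, "report"))  # ['a.txt']

    # The files change, but the index does not:
    del files["a.txt"]
    files["c.txt"] = "Annual report"
    print(search(index, "report"))  # ['a.txt'] (stale: a.txt is gone, c.txt is missing)

    # Rebuilding brings the index back in sync.
    print(search(build_index(files), "report"))  # ['c.txt']
```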

Manual index updates: DocFetcher Server offers three ways to update indexes. The first is the manual way: Below the Indexes table, you’ll see the buttons Update and Rebuild. Update makes the application scan the indexed folder for any changes that have occurred since the index was last updated (or since it was created, if it has never been updated). Rebuild makes the application rebuild the index from scratch.
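The difference between Update and Rebuild can be sketched as follows. An update only needs to find out what changed, for example by comparing a snapshot of the folder taken at the last update with a fresh one, and then re-index just those files; a rebuild discards the old state and indexes everything again. The snapshot and diff functions below are hypothetical, and comparing modification times is an assumption: the manual does not say how DocFetcher Server detects changes.

```python
import os


def snapshot(folder: str) -> dict:
    """Record the modification time of every file below folder."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snap[path] = os.path.getmtime(path)
            except OSError:
                continue  # file vanished during the walk, or is a broken link
    return snap


def diff(old: dict, new: dict) -> tuple:
    """Return (added, removed, modified) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified


if __name__ == "__main__":
    old = {"a.txt": 100.0, "b.txt": 200.0}
    new = {"a.txt": 150.0, "c.txt": 300.0}
    print(diff(old, new))  # (['c.txt'], ['b.txt'], ['a.txt'])
```

In this model, a renamed file shows up as one removal plus one addition, since a snapshot keyed by path has no notion of a file’s identity across names.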

Automatic index updates: In the Indexes table, you’ll notice an Auto-Update column. This is the second way of updating indexes. The Auto-Update column marks each index with a tick or a cross; clicking a tick turns it into a cross, and vice versa. If an index is marked with a tick, the application will use the operating system’s folder watching capabilities to detect and immediately process file changes occurring in the indexed folder. Two caveats apply: First, folder watching is usually not available if the indexed folder resides on a network drive. Second, you may not want to use folder watching if files in the indexed folder change very frequently, as this would continually trigger index updates and keep the application extremely busy with scanning and updating.

Index update URL: The third way to update indexes is the Index Update URL section on the Access tab. In that section, you’ll find a link that, when visited, triggers index updates for all existing indexes. This means you can update your indexes manually by visiting the link in a browser, or you can write a script that visits the link and use a task scheduler of your choice to run the script on a regular basis, e.g., every day at midnight.
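A minimal script for this purpose might look like the Python sketch below, which uses only the standard library. The script name trigger_update.py and the function trigger_index_update are hypothetical; pass the link copied from the Index Update URL section as the command-line argument. Keep in mind that the server deliberately gives the visitor no indication of whether the link was correct, so a successful HTTP response does not by itself prove that an update was triggered; the URL access log is the place to verify that.

```python
import sys
import urllib.request


def trigger_index_update(url: str, timeout: float = 30.0) -> int:
    """Visit the index update URL with a plain GET request.

    Returns the HTTP status code. urllib raises HTTPError for 4xx/5xx
    responses and URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: trigger_update.py <index-update-url>")
    status = trigger_index_update(sys.argv[1])
    print(f"Server responded with HTTP {status}")
```

On Linux or macOS, a crontab entry such as `0 0 * * * python3 /path/to/trigger_update.py '<index-update-url>'` would run the script every day at midnight; on Windows, Task Scheduler can run the same command.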

Index update URL and security: Disclosing the update link to untrusted parties poses a security risk, as the link could be used to deliberately trigger non-stop index updates, thereby bringing the server to its knees. For this reason, the link ends with a password-like sequence, which you can customize in the Index Update URL section. In addition, when the link is visited, the server gives the visitor no indication of whether the link was correct, so that attackers cannot find the correct link by brute force. You as the server administrator, however, can check whether you visited the correct link by looking at the URL access log in the Index Update URL section.
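If you customize the password-like sequence, a long random value makes brute-forcing impractical. The Python sketch below generates one with the standard library’s secrets module, which is designed for security tokens. The function names are hypothetical, and restricting the alphabet to letters and digits is an assumption made so the sequence needs no URL escaping; check that the Index Update URL section accepts such a value. With 62 possible characters and a length of 32, there are 62^32 (roughly 2^190) possible sequences.

```python
import math
import secrets
import string

# Letters and digits only (an assumption), so the sequence can be appended to a URL as-is.
ALPHABET = string.ascii_letters + string.digits


def make_update_secret(length: int = 32) -> str:
    """Return a cryptographically random alphanumeric sequence."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int = 32) -> float:
    """Bits of entropy for a random sequence of the given length."""
    return length * math.log2(len(ALPHABET))


if __name__ == "__main__":
    print(make_update_secret())
    print(f"{entropy_bits():.1f} bits")  # 190.5 bits
```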