Memory Usage

RAM requirement: DocFetcher Server runs on top of the Java runtime, may have to deal with an enormous amount of data, and serves that data to multiple or even many clients. This combination means your instance may need a lot of RAM to run properly, so when choosing a server computer to run DocFetcher Server on, RAM is the one big factor you probably shouldn’t skimp on. 1–2 GB of RAM may be too little, unless the server has only small and moderate-sized files to deal with; with 16 GB you’re likely on the safe side. Ultimately, though, the amount of RAM you’re actually going to need depends on your data and on the number of clients.

The size of the biggest file matters: A commonly misunderstood point: the amount of RAM needed for successful indexing does not depend on the number of files or their combined size, as is often believed, but on the size of the biggest file. For instance, DocFetcher Server may index multiple terabytes of data without breaking a sweat if all files are small or moderate-sized, while trying to index a single file several hundred MB in size can cause the application to run out of memory. Think of it as food: you can gulp down a lot of porridge, but you may choke if you try to swallow a big lump. More RAM is like a wider throat, which lets you swallow bigger lumps.

File size limits: On the indexing pane, where you can configure new indexing tasks to be run, there are two settings for skipping content indexing of files whose size exceeds a certain limit:

  • Index only files that are smaller than:

  • Index files without file extension as text files, but no bigger than:

By choosing sufficiently low values for these two settings, you can prevent the application from running out of memory during indexing, at the expense of not indexing the contents of the skipped files. If skipping is undesirable, or if the application runs out of memory for reasons other than indexing big files, you need to raise the application’s so-called memory limit, explained below.

Memory limit: Because DocFetcher Server runs on top of the Java runtime, its memory usage is capped: it won’t automatically take all of the available RAM when it needs to, but only as much as it’s allowed to take. This cap is a startup parameter of the Java runtime and is officially known as the “maximum heap size”, but here we’ll refer to it as the “memory limit” for simplicity. In addition to the server configuration discussed earlier, the memory limit is another important parameter you may have to tweak both before and after deployment.

Setting the memory limit: Depending on the platform you’re running DocFetcher Server on, here’s how you can set the memory limit:

  • Windows: On Windows, run DocFetcherServerw.exe to open the configuration GUI. On the Java tab, set the memory limit in MB in the Maximum memory pool field; for example, enter 8192 (= 8*1024) for a memory limit of 8 GB. Then restart the server for the change to take effect. Alternatively, the memory limit can be set via the DocFetcherServer.ini file; see the documentation in that file for instructions. Note that this second method only takes effect when the server is installed or reinstalled.

  • Linux and macOS: On Linux and macOS, the memory limit can be set in the launch scripts server-start.sh and server-start-foreground.sh. If you open them in a text editor, you’ll notice an -Xmx parameter near the end. That’s the memory limit. The default is -Xmx16g, which means the default memory limit is 16 GB.
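As a sketch of how such an edit could be scripted, the snippet below rewrites the -Xmx value with sed. The file here is a demo stand-in, and the java invocation line in it is an assumption, not the launch script’s actual contents; inspect your own server-start.sh before changing it:

```shell
# Create a stand-in file so the sed command has something to operate on
# (when editing for real, run sed against server-start.sh instead):
printf 'java -Xmx16g -jar docfetcher-server.jar\n' > server-start-demo.sh

# Lower the memory limit from 16 GB to 8 GB; -i.bak keeps a backup
# of the original file with a .bak suffix:
sed -i.bak 's/-Xmx16g/-Xmx8g/' server-start-demo.sh

cat server-start-demo.sh
```

The same substitution applies to server-start-foreground.sh; after editing, restart the server for the change to take effect.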

Recommended values for the memory limit: A memory limit of 16 GB is only a good choice if your server computer has at least 16 GB of RAM. If it has less than that, leaving the memory limit at the default 16 GB provides no benefit and will cause the Java runtime to crash silently whenever it needs more RAM than is available. So rule #1: if your server computer has less than 16 GB of RAM, decrease the memory limit to the amount of RAM available, or to less than that. Rule #2: give DocFetcher Server as much of the available RAM as it needs to finish indexing successfully and serve clients.
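For example, on Linux you can read the installed RAM from /proc/meminfo when deciding on a limit. The 75% headroom factor below is purely an illustrative assumption, not an official recommendation:

```shell
# Read total RAM in MB from /proc/meminfo (Linux only):
total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)

# Leave roughly a quarter of RAM for the OS and other processes
# (assumed headroom factor; adjust to your situation):
suggested_mb=$((total_mb * 3 / 4))

echo "Total RAM:            ${total_mb} MB"
echo "Suggested -Xmx value: ${suggested_mb}m"
```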

Dealing with out-of-memory errors during indexing: What if you’ve already given DocFetcher Server all the available RAM and it still runs out of memory during indexing? One option, of course, is to get more RAM. Another is to skip file content indexing for files above a certain size limit. But if you want to skip only specific files above that limit, rather than all of them, you have to find the offending files and add exclusion rules for them on the indexing pane. To find files that cause problems during indexing, have a look at the setting ShowPathsDuringIndexing in the server-conf.txt file.
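To locate candidates for such exclusion rules, a standard find command is enough. The snippet below is a minimal sketch: the demo directory and the size threshold are placeholders, so point the command at your own document folder with a threshold that suits your data (e.g. +100M for 100 MB):

```shell
# Demo setup: two dummy files so the command has something to find.
mkdir -p demo_data
head -c 2097152 /dev/zero > demo_data/big.bin    # 2 MB
head -c 1024    /dev/zero > demo_data/small.txt  # 1 KB

# List files larger than 1 MB; only big.bin should match:
find demo_data -type f -size +1M
```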

Searching and preview generation: Indexing is the most memory- and CPU-intensive part of DocFetcher Server, and therefore the number-one source of out-of-memory errors. However, there are two other notable areas where the application may run out of memory: running searches, and performing text extraction to generate content for the preview pane. Unlike indexing, which is performed one index at a time and one file at a time, searching and preview generation may run many times in parallel to serve multiple active users. This amplifies the potential for out-of-memory errors, so DocFetcher Server has two settings to cap memory usage specifically for searching and preview generation:

  • The setting MaxResultsTotal in the server-conf.txt file limits the number of results the server returns per search. Note that too low a value for this setting may noticeably distort the results, as explained in the documentation in the server-conf.txt file.

  • The setting Preview pane file size limit (MB), to be found in the Admin Area of the web interface, limits the size of the files for which a preview is generated. That is to say, if a file is larger than this size limit, no preview will be generated for it.

In addition to capping memory usage, the two settings above also cap the server’s computational load per user. However, values that are too low degrade the experience for users, so it’s up to you to pick the right values for your specific use case.
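For illustration, the result cap might look like the fragment below in server-conf.txt. Both the value 10000 and the key = value syntax shown are assumptions made here for the sake of the example; defer to the documentation inside the file itself:

```
# Hypothetical example: cap the number of results returned per search.
# Too low a value may distort results, per the file's own documentation.
MaxResultsTotal = 10000
```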