Jason Kim's Blog

Investigating Memory Leak in a Node.js Application
2020-07-12

I recently ran into a memory leak in a Node.js application. Memory leaks are not fun to deal with. Unlike typical bugs caused by syntax errors in your code or failures in upstream services, a memory leak defies the conventional approaches to squashing bugs. You need tools you don't normally use, and you will typically need more time to solve the problem.

Let's evaluate some approaches and tools that you can use to resolve a memory leak in Node.js applications.

Heapdump NPM package

node-heapdump is an easy-to-use NPM package that generates a V8 heap dump of your Node.js application. You can examine the heap dump with Developer Tools in the Chrome browser.

  • Pros
    • Simple setup
    • Can integrate with your Node.js application and create a maintenance endpoint
    • Easily inspect the V8 heap dump using Chrome Developer Tools
  • Cons
    • Only usable if the app is still responsive
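As a rough idea of what such a maintenance endpoint could look like, here is a minimal sketch. The path /maintenance/heapdump, the port, and the createHeapdumpHandler helper are hypothetical names chosen for illustration. The snapshot writer is passed in as a parameter so the handler can be exercised without the native module. In a real app you would pass in heapdump's writeSnapshot(filename, callback), as shown in the usage comment.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical maintenance path; choose one that is not publicly exposed.
const HEAPDUMP_PATH = '/maintenance/heapdump';

// writeSnapshot(filename, callback) is injected by the caller.
// dir is where the .heapsnapshot file is written (assumption: the temp dir).
function createHeapdumpHandler(writeSnapshot, dir = os.tmpdir()) {
  return function handle(req, res) {
    if (req.method !== 'POST' || req.url !== HEAPDUMP_PATH) {
      res.statusCode = 404;
      res.end('not found\n');
      return;
    }
    const file = path.join(dir, `${Date.now()}.heapsnapshot`);
    writeSnapshot(file, (err, written) => {
      res.statusCode = err ? 500 : 200;
      res.end(err ? `heapdump failed: ${err.message}\n` : `wrote ${written}\n`);
    });
  };
}

// Usage (assumes `npm install heapdump` has been run):
//   const http = require('http');
//   const heapdump = require('heapdump');
//   const write = (file, cb) => heapdump.writeSnapshot(file, cb);
//   http.createServer(createHeapdumpHandler(write)).listen(3001);
```

The resulting .heapsnapshot file can then be loaded into the Memory tab of the Chrome developer tools.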

My Node.js app was unresponsive because it was executing a blocking function. The blocking function prevented the app from accepting any requests, including the request to create a heap dump.
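To illustrate why a blocked app cannot serve a heap dump request, here is a small sketch. The busy-wait loop is only a stand-in for the blocking function (the real one is not shown in this post), and timerFiresDuringBlock is a hypothetical helper name. While synchronous code is running, the event loop cannot run any callback, whether it is a timer or an incoming HTTP request.

```javascript
// Returns true if a 0 ms timer managed to fire while we were busy-waiting.
function timerFiresDuringBlock(blockMs) {
  let fired = false;
  setTimeout(() => { fired = true; }, 0);

  const start = Date.now();
  while (Date.now() - start < blockMs) {
    // Busy-wait: stand-in for the blocking function.
  }
  // The timer callback can only run after this function returns.
  return fired;
}

console.log(timerFiresDuringBlock(50)); // prints false
```

A request handler that writes a heap dump is queued in the same way as the timer here, so it never gets a chance to run while the blocking function is executing.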

As you can see, there's no silver bullet for investigating and resolving a memory leak. The essence of solving one can be described simply: 1) get a heap dump, 2) inspect the heap dump, 3) identify the cause of the object creation that consumes the most memory. However, there are many different tools and techniques you can use to solve the problem, and no single one presents a simple solution. Let's continue evaluating other tools.

gcore + MDB on Solaris

Four years ago, I solved another memory leak in a Node.js application. At that time, llnode, a tool that allows you to inspect the memory dump of a process (I'll present it last), wasn't as mature as it is today, and I could not use it with the heap dump. I had to use a tool called MDB together with gcore.

  • Pros
    • You can run gcore even when the Node.js app is unresponsive
  • Cons
    • Difficult setup. You need Solaris to run MDB. To learn more, you can read this previous blog post I wrote on this topic.

Because setting up Solaris is too cumbersome, I decided to explore using llnode to inspect the gcore heap dump.

gcore + llnode

llnode is another tool for inspecting a gcore heap dump. I decided to use llnode because I was able to install it inside the Docker container that hosts my Node.js app.

  • Pros
    • Moderately easy to set up.
    • llnode is easy to use.
    • You can run gcore even when the Node.js app is unresponsive.

I could not find notable reasons why I shouldn't use gcore + llnode to investigate the memory issue.

Here are the steps to prepare the tools for your investigation. I am running the Node.js application inside an Ubuntu 18 Docker container.

  • Go inside the Docker container running the Node.js app.

    docker exec -it app_name bash

  • Update Ubuntu.

    apt-get update

  • Install lldb.

    apt install lldb-4.0 liblldb-4.0-dev

    You might see a warning message that reads

    mount: permission denied
    update-binfmts: warning: Couldn't mount the binfmt_misc filesystem on /proc/sys/fs/binfmt_misc.
    

    You can safely ignore it for our purpose.

  • Install node-gyp.

    npm i node-gyp

  • Install llnode.

    npm install llnode

  • Install gcore, which comes with gdb.

    apt-get install gdb

  • Run top to identify the process number for your Node.js app.

    36 root 20 0 5700608 4.482g 29724 R 99.7 29.7 100:45.02 node /usr/src/a

    In this case, the process number is 36.

  • Run gcore on the process.

    gcore 36

    You might see this error.

    ptrace: Operation not permitted.
    You can't do that without a process to debug.
    The program is not being run.
    gcore: failed to create core.36
    
  • To solve the ptrace error from the previous step, you need to add this to your docker-compose file.

    cap_add:
    - SYS_PTRACE
    

    This is a good blog post on why you need it, if you are curious.

  • Run gcore on the process again. It should work now.

  • Inspect the core dump with llnode.

    ./node_modules/.bin/llnode node -c core.36

The process of investigation goes something like this.

  • Run v8 findjsobjects inside llnode to determine which object is causing the memory leak. You might be wondering how to tell which object that is. There are two main ways to pin it down.

    • When you have a rapidly growing memory leak, your heap dump presents an extreme version of the Pareto principle. The object occupying the vast majority of memory is where you will want to investigate. Here's my result of v8 findjsobjects demonstrating this effect.
    ...
    Instances      Size    Name
            109       3488 ContextifyScript
            138       9936 I
            187      13464 (ArrayBufferView)
            213      11928 NodeError
            220      17600 Layer
            226      12656 Node
            226      14456 Entry
            231      11096 Source
            273       6552 CallSite
           3129     250240 Module
          10930     961840 Tok
          16150    1033480 Loc
          30496     975872 (Array)
         336951    8086824 RowDataPacket
        8901702  286210248 Object
       11881688    3728960 (String)
    
    • When you have a much more slowly growing memory leak, you can't easily tell which JS object is responsible for it. In this case, you have to take two heap dumps some time apart and look for growth in particular JS objects. The object whose number of instances keeps growing will be the cause of your memory leak.

  • Generic JS primitives (String, Number, Array, etc.) and plain Objects are not actionable. Determine which object that is neither a JS primitive nor a plain Object appears to be the cause of the memory leak. In my case, it is RowDataPacket.

  • By adding logs and metrics around suspicious code, determine where your code causes so many RowDataPacket objects to be created.
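If you save the v8 findjsobjects output to a text file, the ranking and comparison described above can be scripted. The following is a minimal sketch, not part of llnode. The function names and the list of generic type names are my own assumptions, and the parser assumes the three-column layout shown in the output above.

```javascript
// Types too generic to act on. This list is an assumption; extend as needed.
const GENERIC_NAMES = new Set(['Object', '(String)', '(Array)', '(ArrayBufferView)']);

// Parse rows of the form "<instances> <size> <name>". The header, "..."
// and blank lines do not start with a number and are skipped.
function parseFindJsObjects(text) {
  const rows = [];
  for (const line of text.split('\n')) {
    const m = /^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$/.exec(line);
    if (m) rows.push({ instances: Number(m[1]), size: Number(m[2]), name: m[3] });
  }
  return rows;
}

// Drop generic types and sort the rest by total size, largest first.
function rankSuspects(rows) {
  return rows
    .filter((row) => !GENERIC_NAMES.has(row.name))
    .sort((a, b) => b.size - a.size);
}

// Instance-count growth per type between two dumps, largest growth first.
function diffInstances(beforeRows, afterRows) {
  const before = new Map(beforeRows.map((row) => [row.name, row.instances]));
  return afterRows
    .map((row) => ({ name: row.name, growth: row.instances - (before.get(row.name) || 0) }))
    .filter((row) => row.growth > 0)
    .sort((a, b) => b.growth - a.growth);
}

// Excerpt of the findjsobjects output shown above.
const sample = `
 Instances      Size    Name
     30496     975872 (Array)
    336951    8086824 RowDataPacket
   8901702  286210248 Object
`;
console.log(rankSuspects(parseFindJsObjects(sample))[0].name); // prints "RowDataPacket"
```

For a slowly growing leak, parse the output from two core dumps and pass both results to diffInstances. The types at the top of that list are the ones to investigate first.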

You've now found the cause of the memory leak. Unfortunately, finding the cause is not sufficient to resolve the actual memory leak. You still have to apply a remedy to mitigate it. And for that step, good luck.