Neetcode - Longest Common Subsequence

Neetcode - Longest Common Subsequence
Photo by Pawel Czerwinski / Unsplash
Code like a scientist, test like a skeptic, and refactor like an artist.

Longest Common Subsequence

Given two strings text1 and text2, return the length of the longest common subsequence between the two strings if one exists, otherwise return 0.

subsequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the relative order of the remaining characters.

  • For example, "cat" is a subsequence of "crabt".

common subsequence of two strings is a subsequence that exists in both strings.

Input: text1 = "cat", text2 = "crabt"

Output: 3

Explanation: The longest common subsequence is "cat" which has a length of 3.

Input: text1 = "abcd", text2 = "abcd"

Output: 4

Input: text1 = "abcd", text2 = "efgh"

Output: 0

This problem asks us to find the longest common subsequence (LCS), which means the characters in the subsequence must appear in order, but not necessarily contiguously.

The most straightforward approach is to use recursion to traverse both strings and find the longest subsequence that appears in both.

For the first example, the function call tree (or recursion flow) looks like this:

dfs(i=0, j=0): 'c' == 'c' → match!
  ↳ dfs(1,1): 'a' != 'r'
    ├─ dfs(2,1): 't' != 'r'
    │   ├─ dfs(3,1): end → return 0
    │   └─ dfs(2,2): 't' != 'a'
    │       ├─ dfs(3,2): end
    │       └─ dfs(2,3): 't' == 'b'? → no
    │           ├─ dfs(3,3): end
    │           └─ dfs(2,4): 't' == 't' → YES!
    │               ↳ dfs(3,5): end → return 1
    └─ dfs(1,2): 'a' == 'a' → match!
        ↳ dfs(2,3): 't' != 'b'
            ├─ dfs(3,3): end
            └─ dfs(2,4): 't' == 't' → match!
                ↳ dfs(3,5): end → return 1

DFS Call - CSY

The DFS is implemented with memo as follows:

class Solution {
public:
    int longestCommonSubsequence(string text1, string text2) {
        int n = text1.size();
        int m = text2.size();

        vector<vector<string>> memo(n + 1, vector<string>(m + 1, ""));
        
        string lcs = dfs(0, 0, text1, text2, memo);

        return lcs.size();
    }

    string dfs(int i, int j, string t1, string t2, vector<vector<string>>& memo) {
        if (i == t1.size() || j == t2.size()) {
            return "";
        }

        if (!memo[i][j].empty()) return memo[i][j];

        if (t1[i] == t2[j]) {
            memo[i][j] = t1[i] + dfs(i + 1, j + 1, t1, t2, memo);
        } else {
            string t1_skip_str = dfs(i, j + 1, t1, t2, memo);
            string t2_skip_str = dfs(i + 1, j, t1, t2, memo);
            memo[i][j] = (t1_skip_str.size() > t2_skip_str.size() ? t1_skip_str : t2_skip_str);
        }

        return memo[i][j];
    }  
};

DFS with Memo - CSY

Initially, I tried solving it using a plain DFS without memoization, but that quickly ran into a Time Limit Exceeded (TLE) error.

To address this, I implemented a memoized version. If you look at the code, you'll notice that the base case depends on the lengths of text1 and text2. Whenever we find matching characters at the current positions, we move to the next index in both strings. Otherwise, we explore two options: either move forward in text1 or in text2, to see if a match can be found further ahead.

Now, let’s try the dynamic programming (DP) approach as well.

class Solution {
public:
    int longestCommonSubsequence(string text1, string text2) {
        int n = text1.size();
        int m = text2.size();

        vector<vector<int>> dp(n + 1, vector<int>(m + 1, 0));
        
        for (int i = 1; i <= n; ++i) {
            for (int j = 1; j <= m; ++j) {
                if (text1[i - 1] == text2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        return dp[n][m];
    }
};

DP - CSY

This solution is much clearer, as it doesn't rely on any additional function calls. Instead, it constructs a 2D table with dimensions based on the lengths of the two strings. You can easily visualize it as an x-y plane, where each cell represents the alignment of characters from text1 and text2. For example, the table for Example 1 looks like this:

"" c r a b t
"" 0 0 0 0 0 0
c 0 1 1 1 1 1
a 0 1 1 2 2 2
t 0 1 1 2 2 3

You can see how elegant this solution is—dp[i][0] represents characters from text1, and dp[0][j] represents those from text2. The matching sequence is embedded directly into the index positions, and the history of previous matches is preserved through the DP table. When you think about it, that’s pretty amazing, isn’t it?

Conclusion

In fact, this question reminds me of the Frozen Lake and Cliff Walking tasks in Gymnasium. The main difference is that I approached those tasks using the Bellman equation to approximate the solution.

That said, it's not surprising—Richard Sutton introduces Dynamic Programming even before formally presenting the Bellman equation in his book.

CSY

CSY

Nagoya, Japan