leetCode Question: Remove Invalid Parentheses

Remove Invalid Parentheses

Remove the minimum number of invalid parentheses in order to make the input string valid. Return all possible results.

Note: The input string may contain letters other than the parentheses ( and ).

Examples:
"()())()" -> ["()()()", "(())()"]
"(a)())()" -> ["(a)()()", "(a())()"]
")(" -> [""]

Analysis:

In this post, I will firstly present the very basic yet workable solutions using BFS, which is implemented in C++.

Then I will explain the DFS solution which is much faster but needs a little more thinking. It is implemented in python.

BFS

First of all, notice that the problem requires all possible results, thus a search algorithm (BFS, or DFS) often would be a good option.

Considering the BFS solution, in this problem we will expand a "tree" with all its child nodes from one level to the other, until we find the leaf nodes we want. So given a string s, the mimimum number of deletion is 0, which is the string itself, also it is the root node of our tree. If it is a valid string, we're done.

Then we expand the tree by delete only 1 parentheses, each child node is one string where 1 parentheses is deleted from the root. Therefore, nodes in this level of tree, represents all possible strings where only 1 parentheses is removed.

We could check each new string and see if it is a valid one. If in current level there is at lease one valid string, the search is terminated after this level is done. Since the minimum of deletion is current level depth, while the possible valid string are all in this level.

For the implementation, we should have:

  • A function to check if the string is valid or not.
  • A queue for the BFS, which stores the nodes in the same level.
    • Here I use a "map" which we will eliminate the duplicate strings, so that the number of "check" could be reduced.
      Please check the C++ code below for more details.

DFS

Some will notice that in the BFS solution above, the bottleneck is the number of calls for "check". Each time we have to check all the nodes in current level and then go to next level. Alternatively, we could try using DFS.

To reduce the number of "check" being called, we could store the state (true or false) of nodes in each search path. So in next search if we go to the same node, we already have the states, which we don't have to check again.

Also, we could firstly compute the number of '(' and ')' to be removed by counting the parentheses in original string. Notice that after the removal, we still have to check the result string again.

Code (C++):

class Solution {
public:
bool check(string s, int k){
int c = 0;
for (int i=0;i<s.size();i++){
if (i!=k && s[i] == '('){ c+=1; }
if (i!=k && s[i] == ')'){ c-=1; if (c<0){return false;}}
}
return c==0;
}
vector<string> removeInvalidParentheses(string s) {
vector<string> res;
unordered_map<string, bool> mp1;
unordered_map<string, bool> mp2;
if (check(s,-1)){
res.push_back(s);
return res;
}
mp1[s] = true;
while (!mp1.empty()){
bool check_flag = false;
for (auto it1 = mp1.begin();it1!=mp1.end();it1++){
string tmp = it1->first;
for (int i=0; i < tmp.size(); i++){
string tmp2 = tmp;
tmp2.erase(i,1);
if (check(tmp, i) && mp2.find(tmp2) == mp2.end()){
res.push_back(tmp2);
check_flag = true;
}
mp2[tmp2] = true;
}
}
if (check_flag){
return res;
}else{
mp1 = mp2;
mp2.clear();
}
}
return res;
}
};

Code (Python):

class Solution(object):
def check(self, s):
c = 0
for ch in s:
if ch == '(':
c += 1
if ch == ')':
c -= 1
if c < 0:
return False
return c == 0
def dfs(self, s, d_l, d_r, res, hist):
if d_l == 0 and d_r == 0 :
if not s in res and self.check(s):
res[s] = True
elif s == "":
return
else:
for i in range(len(s)):
if not s[0:i] + s[i+1:] in hist:
hist[s[0:i] + s[i+1:]] = True
if s[i] == '(' and d_l > 0:
self.dfs(s[0:i] + s[i+1:], d_l-1, d_r, res, hist)
if s[i] == ')' and d_r > 0:
self.dfs(s[0:i] + s[i+1:], d_l, d_r-1, res, hist)
def removeInvalidParentheses(self, s):
"""
:type s: str
:rtype: List[str]
"""
res = {}
hist = {}
d_r = 0 # number of '(' to be deleted
d_l = 0 # number of ')' to be deleted
for ch in s:
if ch == '(':
d_l += 1
if ch == ')':
if d_l > 0:
d_l -= 1
else:
d_r += 1
self.dfs(s, d_l, d_r, res, hist)
return res.keys()

leetCode Question: Find Median from Data Stream

Find Median from Data Stream

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Examples:
[2,3,4] , the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) - Add a integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.
For example:

add(1)
add(2)
findMedian() -> 1.5
add(3)
findMedian() -> 2

Analysis:

The problem give us a data stream instead of a regular data array, and ask to get the median of current data. An obvious but not efficient way of getting median is by either sorting the data ( O( n log n ) time ) or inserting the data in ordered data and ( O( n ) time ). In such scenarios, it's quite slow when findMedian is called very often. Since we have already all the relations between previous data, it would be much more efficient if we could utilize that when adding new data.

Here please refer my previous post (see heap sort) for the basic concept in heap, and it is this very useful data structure that can solve this problem very efficiently ( O( log n ) time to add data, O( 1 ) time to find median).

Specifically in this problem, we still have to figure out how to utilze heap to find the median.

The definition of median can be roughly viewed as the middle one or two values of an ordered list. In other words, the ordered list can be divided into two parts, one part contains all the values less than the median, while values in the other part are all greater than the median. No matter the size of list is even or not, the size of two sub-lists divided by the median must be the same.

OK, now let's see what a heap can help us. We know that a min heap is saving the smallest vaule at root, all the other values are greater than the root. A max heap is saving the largest value at root, with all the child nodes smaller than the root.

Thus, a max heap and a min heap now seems perfect fit for the two sub lists divided by the median. Even, for the median, we don't have to know the full order of the sub lists. We only care about how many elements are greater or smaller than the median.

In order to keep the two heap balanced, once we met a new number, intuitively we should add to each heap alternately one by one. However, we don't want to see any numbers in the max heap is greater than any number in min heap. So we have to adjust the heap after each insertion. Sepecifically,

  • If current size is even, we try to add the new element to the min heap
    • If the new number to be added is greater than the root in min heap. OK, we push the number into min heap.
    • However, if the new number to be added is smaller than the largest number in the other heap (root node in max heap), this number should be in the max heap but not the min heap we intend to add to. So we add the new number to max heap instead of min heap. But, we still want to keep both heaps balanced in size. So we pop the top number in max heap (which we added the new element into), and push it into the min heap (which we intended to add one into in this round).
  • If current size is odd, we try to add the new element to the max heap
    • If the new number to be added is smaller than the root in max heap. OK, we push the number into max heap.
    • Similarly, if the new number to be added is greater than the smallest number in the other heap. We stil have to the same procedure according to the above case.
  • To get the median, if current data size is even, we choose the root of both heap and take average. If current data size is odd, we just return the root of min heap.

For the implementation, in C++, we can use priority queue, which is implemented using heap, it has methods such as push, pop, and top, and we don't have to worry about the node downshift or deletion in detail. The constructor of priority queue provides arguments to set the std::geater for the max heap.

In python, for simplicity, I use headq module which could apply heap operations on list, e.g.,headq.heappush(h, num) is to push num into list h using heap operation. For max heap, I just take a negative of value and insert into a min heap to simulate a max heap.

Code (C++):

class MedianFinder {
public:
// Constructor
MedianFinder(){
count = 0;
}
// Adds a number into the data structure.
void addNum(int num) {
if (count % 2 == 0){
if (minHeap.empty()){
minHeap.push(num);
}else{
if (num <= maxHeap.top()){
maxHeap.push(num);
minHeap.push(maxHeap.top());
maxHeap.pop();
}else {
minHeap.push(num);
}
}
}else{
if (num >= minHeap.top()){
minHeap.push(num);
maxHeap.push(minHeap.top());
minHeap.pop();
}else{
maxHeap.push(num);
}
}
count ++;
// cout << "count = " << count<< endl;
// cout << "maxHeap: ";
// print_queue(maxHeap);
// cout << endl << "minHeap: ";
// print_queue(minHeap);
// cout << endl;
}
// Returns the median of current data stream
double findMedian() {
return count % 2 == 0 ? double(minHeap.top() + maxHeap.top()) / 2 : minHeap.top();
}
private:
priority_queue<int> maxHeap;
priority_queue<int, vector<int>, std::greater<int> > minHeap;
int count;
};
// Your MedianFinder object will be instantiated and called as such:
// MedianFinder mf;
// mf.addNum(1);
// mf.findMedian();

Code (Python):

class MedianFinder:
def __init__(self):
"""
Initialize your data structure here.
"""
import heapq #import heapq to support heap operations on list
self.count = 0
self.minHeap = [] # default heapq is smallest element as root
self.maxHeap = [] # modify the heapq in max heap by making nums negative (* -1), whenever push in or pop out from the maxHeap
def addNum(self, num):
"""
Adds a num into the data structure.
:type num: int
:rtype: void
"""
if self.count % 2 == 0:
if len(self.minHeap) == 0:
heapq.heappush(self.minHeap, num)
else:
if num <= -self.maxHeap[0]:
heapq.heappush(self.maxHeap, -num)
heapq.heappush(self.minHeap, -heapq.heappop(self.maxHeap))
else:
heapq.heappush(self.minHeap, num)
else:
if num >= self.minHeap[0]:
heapq.heappush(self.minHeap, num)
heapq.heappush(self.maxHeap, -heapq.heappop(self.minHeap))
else:
heapq.heappush(self.maxHeap, -num)
self.count += 1
def findMedian(self):
"""
Returns the median of current data stream
:rtype: float
"""
if self.count % 2 == 0:
return (float(self.minHeap[0]) + float(-self.maxHeap[0])) / 2.0
else:
return float(self.minHeap[0])
# Your MedianFinder object will be instantiated and called as such:
# mf = MedianFinder()
# mf.addNum(1)
# mf.findMedian()

leetCode Question: Longest Increasing Subsequence

Longest Increasing Subsequence

Given an unsorted array of integers, find the length of longest increasing subsequence.

For example,
Given [10, 9, 2, 5, 3, 7, 101, 18],
The longest increasing subsequence is [2, 3, 7, 101], therefore the length is 4. Note that there may be more than one LIS combination, it is only necessary for you to return the length.

Your algorithm should run in O(n2) complexity.

Follow up: Could you improve it to O(n log n) time complexity?

Analysis:

The simple but relatively slow solution is classic DP (dynamic programming) approach. For now, I will analyze this solution in detail and breifly introduce the n log n algorithm.

For each index i in the array nums, we could have an array res to store the length of longest increasing subsequence from nums[ 0 ] to nums[ i ]. For index i, we have multiple numbers from nums[ 0 ] to nums[ i-1 ], also, we have multiple results from res[ 0 ] to res[ i ]. Denote j is index from 0 to i-1, for each possible j, if nums[ i ] > nums[ j ] (current number > one previous number), it means the longest increasing subsequence that ends at num[ j ] + num[ i ] is a valid increasing subsequence, and the length of it is res[ j ] + 1. So if we got the longest length of all possible j (from 0 to i-1), it should be the longest length in current index i. The outer loop for i is just the scan the whole nums array from the begining to the end. In the end, the maximum value in array res, is the final longest length we want.


In the figure above, blue color means the data array (green font mean the actural longest increading subseuqnce), red color means the result array storing the longest length.

In the above solution, everytime we have to look back and check from 0 to current index - 1 for updating the current index. This look back part is most time consuming. So for the n log( n ) algorithm, we have to modify the look back part. Instead of looking back, we could have a new array s, s[ i ] stores the minimum A[i] when the len of longest increaing seq is d[ i ]. It is not hard to observe array s is ordered. Then, for each nums [ i ], we search the array s[ i ], to find the index j where s[ j ] is less than but closest to nums[ i ]. The index j is then the result.

Code (C++) (Simple DP):

class Solution {
public:
int lengthOfLIS(vector<int>& nums) {
if (nums.size()==0){return 0;}
int res_max = 1;
vector<int> res(nums.size(), 1);
for (int i=1;i< nums.size();i++){
for (int j = 0; j < i; j++){
if (nums[i] > nums[j]){
res[i] = max(res[i], res[j] + 1);
}
}
//cout << res[i] << ", ";
res_max = max(l, res[i]);
}
return res_max;
}
};

Code (Python) (Simple DP):

class Solution(object):
def lengthOfLIS(self, nums):
"""
:type nums: List[int]
:rtype: int
"""
if len(nums) == 0:
return 0
res_max = 1
res = [1]*len(nums)
for i in range(len(nums)):
for j in range(0,i):
if nums[i] > nums[j]:
res[i] = max(res[j]+1, res[i])
res_max = max(res_max, res[i])
return res_max