哈希表与堆

摘要

数据结构设计题的考点
数据结构设计题的时间复杂度计算方式
在线算法 vs 离线算法
堆 Heap 的实战真题
高级数据结构的 Cheat Sheet

数据结构类面试问题

数据结构可以认为是一个数据存储集合以及定义在这个集合上的若干操作 (功能)

三种考法:

问某种数据结构的基本原理，并要求实现
使用某种数据结构完成事情
实现一种数据结构，提供一些特定的功能

数据结构时间复杂度的衡量方法

数据结构通常会提供“多个”对外接口

只用一个时间复杂度是很难对其进行正确评价的

所以通常要对每个接口的时间复杂度进行描述

按照接口的使用频率做出设计的决策

哈希表 Hash

支持操作：O (1) Insert / O (1) Find / O (1) Delete

问：这些操作都是 O (1) 的前提条件是什么? A: 没有哈希冲突 B: key 是 O (1) 大小

答案是 B，哈希表一定是能够处理哈希冲突的，你不可能在 O (1) 的时间内判断 2 个 1 m 长的字符串是否相等

134 · LRU 缓存策略

新节点从尾部加入，老节点从头部移走

type linkedNode struct {
	key   int
	value int
	next  *linkedNode
}

type LRUCache struct {
	keyToPrev map[int]*linkedNode
	dummy     *linkedNode
	tail      *linkedNode
	capacity  int
	size      int
}

func NewLRUCache(capacity int) *LRUCache {
	dummy := &linkedNode{}
	c := &LRUCache{
		keyToPrev: make(map[int]*linkedNode),
		dummy:     dummy,
		tail:      dummy,
		capacity:  capacity,
	}
	return c
}

func (c *LRUCache) moveToTail(key int) {
	prev := c.keyToPrev[key]
	cur := prev.next

	if cur == c.tail {
		return
	}

	prev.next = prev.next.next
	c.tail.next = cur
	cur.next = nil

	if prev.next != nil {
		c.keyToPrev[prev.next.key] = prev
	}
	c.keyToPrev[cur.key] = c.tail

	c.tail = cur
}

func (c *LRUCache) Get(key int) int {
	if _, ok := c.keyToPrev[key]; !ok {
		return -1
	}
	c.moveToTail(key)
	return c.tail.value
}

func (c *LRUCache) Set(key, value int) {
	if c.Get(key) != -1 {
		c.tail.value = value
		return
	}

	if c.size < c.capacity {
		c.size++
		cur := &linkedNode{
			key:   key,
			value: value,
			next:  nil,
		}
		c.tail.next = cur
		c.keyToPrev[key] = c.tail
		c.tail = cur
		return
	}

	first := c.dummy.next
	delete(c.keyToPrev, first.key)

	first.key = key
	first.value = value
	c.keyToPrev[key] = c.dummy

	c.moveToTail(key)
}

657 · O (1) 实现数组插入/删除/随机访问

使用数组来保存当前集合中的元素，同时用一个 hashMap 来保存数字与它在数组中下标的对应关系。

插入操作时：
- 若已存在此元素返回 false
- 不存在时将新的元素插入数组最后一位，同时更新 hashMap。
删除操作时：
- 若不存在此元素返回 false
- 存在时先根据 hashMap 得到要删除数字的下标，再将数组的最后一个数放到需要删除的数的位置上，删除数组最后一位，同时更新 hashMap。
获取随机数操作时：
- 根据数组的长度来获取一个随机的下标，再根据下标获取元素。

type RandomizedSet struct {
	nums    []int
	num2idx map[int]int
	rand    *rand.Rand
}

func NewRandomizedSet() *RandomizedSet {
	return &RandomizedSet{
		nums:    []int{},
		num2idx: make(map[int]int),
		rand:    rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

// Insert inserts a value to the set
// Returns true if the set did not already contain the specified element
func (this *RandomizedSet) Insert(val int) bool {
	if _, ok := this.num2idx[val]; ok {
		return false
	}
	this.num2idx[val] = len(this.nums)
	this.nums = append(this.nums, val)
	return true
}

// Remove removes a value from the set
// Returns true if the set contained the specified element
func (this *RandomizedSet) Remove(val int) bool {
	idx, ok := this.num2idx[val]
	if !ok {
		return false
	}

	lastIdx := len(this.nums) - 1
	if idx < lastIdx {
		lastNum := this.nums[lastIdx]
		this.nums[idx] = lastNum
		this.num2idx[lastNum] = idx
	}
	this.nums = this.nums[:lastIdx]
	delete(this.num2idx, val)
	return true
}

// GetRandom get a random element from the set
func (this *RandomizedSet) GetRandom() int {
	return this.nums[this.rand.Intn(len(this.nums))]
}

954 · Insert Delete GetRandom O (1) - 允许重复

实现较为困难，看懂参考代码即可

685 · 数据流中第一个唯一的数字

给一个连续的数据流, 写一个函数返回终止数字到达时的第一个唯一数字 (包括终止数字), 如果找不到这个终止数字, 返回 -1.

func FirstUniqueNumber(nums []int, number int) int {
	counter := make(map[int]int)
	for _, num := range nums {
		counter[num]++
		if num == number {
			break
		}
	}
	if _, ok := counter[number]; !ok {
		return -1
	}
	for _, num := range nums {
		if counter[num] == 1 {
			return num
		}
		if num == number {
			break
		}
	}
	return -1
}

960 · 数据流中第一个独特的数 II

只遍历一次

实现一个具有加一个新的数和返回第一个独特的数两个方法的数据结构

type listNode struct {
	num  int
	next *listNode
}

type DataStream struct {
	dummy, tail *listNode
	num2Prev    map[int]*listNode
	duplicates  map[int]bool
}

func NewDataStream() *DataStream {
	node := &listNode{}
	ds := &DataStream{
		dummy:      node,
		tail:       node,
		num2Prev:   make(map[int]*listNode),
		duplicates: make(map[int]bool),
	}
	return ds
}

func (ds *DataStream) Add(num int) {
	if _, ok := ds.duplicates[num]; ok {
		return
	}

	if _, ok := ds.num2Prev[num]; !ok {
		ds.push(num)
		return
	}

	ds.duplicates[num] = true
	ds.remove(num)
}

func (ds *DataStream) push(num int) {
	ds.num2Prev[num] = ds.tail
	newNode := &listNode{
		num: num,
	}
	ds.tail.next = newNode
	ds.tail = newNode
}

func (ds *DataStream) remove(num int) {
	prev := ds.num2Prev[num]
	delete(ds.num2Prev, num)
	prev.next = prev.next.next

	if prev.next != nil {
		ds.num2Prev[prev.next.num] = prev
	} else {
		ds.tail = prev
	}
}

func (ds *DataStream) FirstUnique() int {
	if ds.dummy.next != nil {
		return ds.dummy.next.num
	}
	return -1
}

Data Stream 相关问题

https://www.lintcode.com/problem-tag/355/

数据流问题 = 数据只能遍历一次

Data Stream 大都和 Sliding Window 有关

哈希表的其他练习题

Heap

支持操作：O (log N) Add / O (log N) Remove / O (log N) Pop / O (1) Min or Max

Remove 取最后一个节点覆盖掉要删除的节点然后再调整堆

构建一个 heap 的时间复杂度：O (n)

4 · 丑数 II

设计一个算法，找出只含素因子 2，3，5 的第 n 小的数。

O (nlogn) HashMap + Heap (小顶堆)

import "container/heap"

/
 * @param n: An integer
 * @return: return a  integer as description.
 */
func NthUglyNumber(n int) int {
	h := &IntHeap{1}
	heap.Init(h)
	visited := make(map[int]bool)
	var val int
	// for range n {
	for i := 0; i < n; i++ {
		val = heap.Pop(h).(int)
		for _, factor := range []int{2, 3, 5} {
			cand := val * factor
			if !visited[cand] {
				visited[cand] = true
				heap.Push(h, cand)
			}
		}
	}
	return val
}

// An IntHeap is a min-heap of ints.
type IntHeap []int

func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *IntHeap) Push(x interface{}) {
	// Push and Pop use pointer receivers because they modify the slice's length,
	// not just its contents.
	*h = append(*h, x.(int))
}

func (h *IntHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[0 : n-1]
	return x
}

三个指针法

用三个指针分别指向没有乘过 2, 没有乘过 3, 没有乘过 5 的第一个数

优点：时间复杂度低 O (n)

缺点：思维复杂度比较高

func NthUglyNumber(n int) int {
	uglys := []int{1}

	p2, p3, p5 := 0, 0, 0

	for i := 1; i < n; i++ {
		lastNumber := uglys[i-1]
		for uglys[p2]*2 <= lastNumber {
			p2++
		}
		for uglys[p3]*3 <= lastNumber {
			p3++
		}
		for uglys[p5]*5 <= lastNumber {
			p5++
		}

		nextUgly := min(uglys[p2]*2, uglys[p3]*3, uglys[p5]*5)
		uglys = append(uglys, nextUgly)
	}

	return uglys[n-1]
}

在线算法 vs 离线算法

在线算法 = 数据结构设计类问题 = 数据流问题 = 数据不可二次访问 = 多次输入和输出

离线算法 = 一次输入输出 = 数据是一开始给定的 = 数据可以多次访问

612 · K 个最近的点

Top K 问题离线算法

Microsoft / Apple / Facebook

Quick Select

计算所有的点到原点的 distance – O (n)
Quick Select 去找到 k-th smallest distance – O (n)
遍历所有 distance 找到 top k smallest distance – O (n)
找到的 top k smallest points 按 distance 排序并返回 – O (klogk)

总体时间复杂度 – O (n + klogk)

545 · K 个最近的点 II

	quick select	max heap	min heap
add	O (1)	O (logn)	O (logk)
top k	O (n+klogk)	O (klogn)	O (klogk)

# @param {int} k an integer
def __init__(self, k):
    self.k = k
    self.heap = []
    
# @param {int} num an integer
def add(self, num):
    heapq.heappush(self.heap, num)
    if len(self.heap) > self.k:
        heapq.heappop(self.heap)

# @return {int[]} the top k largest numbers in array
def topk(self):
    return sorted(self.heap, reverse=True)

http://www.lintcode.com/en/problem/high-five/

http://www.lintcode.com/problem/merge-k-sorted-arrays/

http://www.lintcode.com/problem/data-stream-median/

http://www.lintcode.com/problem/top-k-largest-numbers/

http://www.lintcode.com/problem/kth-smallest-number-in-sorted-matrix/

独孤九剑 —— 破掌式

高级数据结构 Cheat Sheet

数据结构知识点及考察频率 Cheat Sheet Part 1

面试算法知识点	考察情况	学习难度	最少刷题数
链表 LinkedList	中小公司考得多，大公司近年来考得少。题目一般不难，主要考察 Reference	低	20
二叉树 Binary Search	中小公司考得多，大公司近年来考得少。题目一般不难，主要考察 Reference	低	20
堆 Heap	高频，经常会用到，原理必须掌握，但不用掌握代码实现。应用必须掌握代码。	中	5
哈希表 Hash Table	高频，原理和应用都需要掌握且需要掌握代码实现。	中	10
线段树 Segment Tree	不太考，有的题目存在多种解法的时候 Segment Tree 可以帮上忙降低思考难度	中	3
树状数组 Binary Indexed Tree	不太考，与其学这个不如学线段树	中	2
跳跃表 Skip List	不太考，需要大致知道原理，分布式数据库里会用到这个数据结构	难	1
字典树 Trie	考察频率中等，跟单词有关的问题一般多多少少都可以用到去优化，可替代哈希表	中	3
并查集 Union Find	考察频率中等，主要是 G 和 F 可能会考，不会的话很多时候可以用 BFS 替代	中	3
红黑树 RB-Tree	只有 G 可能会问到，也只是问大致原理，能干啥，Java 会用 TreeMap 就行	难	1

数据结构知识点及考察频率 Cheat Sheet Part 2

面试算法知识点	能干哪些事情，复杂度如何
数组 Array	O (1) append, update (知道 index) O (n) delete (移动后面数的补空), find
链表 LinkedList	O (1) insert, delete (必须知道前面的点), O (n) find
二叉树 Binary Search	最坏 O (n) 增删查改, 注意二叉树的高度不是 O (logn), 是 O (n) 的
堆 Heap	O (logn) push, delete, pop, O (1) top
哈希表 Hash Table	O (1) 增删查改, 如果 key 是字符串那就是 O (L) 增删查改, L 是字符串长度
线段树 Segment Tree	O (logn) insert, delete, find, update, rangeMax, rangeMin, rangeSum, lowerBound, upperBound, O (1) min, max, sum, 万能数据结构
树状数组 Binary Indexed Tree	O (logn) rangeSum
跳跃表 Skip List	O (logn) insert, delete, find, update, lowerBound, upperBound O (1) max, min
红黑树 RB-Tree	O (logn) insert, delete, find, update, lowerBound, upperBound O (1) max, min
字典树 Trie	O (L) insert, delete, find, update, L 是字符串长度
并查集 Union Find	O (1) find, union, isConnected, getSize

本作品采用知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可。

摘要