Nicksxs's Blog

What hurts more, the pain of hard work or the pain of regret?

0%

上一篇聊了mysql 的 innodb 引擎基于 read view 实现的 mvcc 和事务隔离级别,可能有些细心的小伙伴会发现一些问题,第一个是在 RC 级别下的事务提交后的可见性,这里涉及到了三个参数,m_low_limit_id,m_up_limit_id,m_ids,之前看到知乎的一篇写的非常不错的文章,但是就在这一点上似乎有点疑惑,这里基于源码和注释来解释下这个问题

1
2
3
4
5
6
7
8
9
10
11
/**
Opens a read view where exactly the transactions serialized before this
point in time are seen in the view.
@param id Creator transaction id */

void ReadView::prepare(trx_id_t id) {
ut_ad(mutex_own(&trx_sys->mutex));

m_creator_trx_id = id;

m_low_limit_no = m_low_limit_id = m_up_limit_id = trx_sys->max_trx_id;

m_low_limit_id赋的值是trx_sys->max_trx_id,代表的是当前系统最小的未分配的事务 id,所以呢,举个例子,当前有三个活跃事务,事务 id 分别是 100,200,300,而 m_up_limit_id = 100, m_low_limit_id = 301,当事务 id 是 200 的提交之后,它的更新就是可以被 100 和 300 看到,而不是说 m_ids 里没了 200,并且 200 比 100 大就应该不可见了

幻读

还有一个问题是幻读的问题,这貌似也是个高频面试题,啥意思呢,或者说跟它最常拿来比较的脏读,脏读是指读到了别的事务未提交的数据,因为未提交,严格意义上来讲,不一定是会被最后落到库里,可能会回滚,也就是在 read uncommitted 级别下会出现的问题,但是幻读不太一样,幻读是指两次查询的结果数量不一样,比如我查了第一次是 select * from table1 where id < 10 for update,查出来了一条结果 id 是 5,然后再查一下发现出来了一条 id 是 5,一条 id 是 7,那是不是有点尴尬了,其实呢这个点我觉得脏读跟幻读也比较是从原理层面来命名,如果第一次接触的同学发觉有点不理解也比较正常,因为从逻辑上讲总之都是数据不符合预期,但是基于源码层面其实是不同的情况,幻读是在原先的 read view 无法完全解决的,怎么解决呢,简单的来说就是锁咯,我们知道innodb 是基于 record lock 行锁的,但是貌似没有办法解决这种问题,那么 innodb 就引入了 gap lock 间隙锁,比如上面说的情况下,id 小于 10 的情况下,是都应该锁住的,gap lock 其实是基于索引结构来锁的,因为索引树除了树形结构之外,还有一个next record 的指针,gap lock 也是基于这个来锁的
看一下 mysql 的文档

SELECT … FOR UPDATE sets an exclusive next-key lock on every record the search encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.
对于一个 for update 查询,在 RR 级别下,会设置一个 next-key lock在每一条被查询到的记录上,next-lock 又是啥呢,其实就是 gap 锁和 record 锁的结合体,比如我在数据库里有 id 是 1,3,5,7,10,对于上面那条查询,查出来的结果就是 1,3,5,7,那么按照文档里描述的,对于这几条记录都会加上next-key lock,也就是(-∞, 1], (1, 3], (3, 5], (5, 7], (7, 10) 这些区间和记录会被锁起来,不让插入,再唠叨一下呢,就是其实如果是只读的事务,光 read view 一致性读就够了,如果是有写操作的呢,就需要锁了。

很久以前,有位面试官问到,你知道 mysql 的事务隔离级别吗,“额 O__O …,不太清楚”,完了之后我就去网上找相关的文章,找到了这篇MySQL 四种事务隔离级的说明, 文章写得特别好,看了这个就懂了各个事务隔离级别都是啥,不过看了这个之后多思考一下的话还是会发现问题,这么神奇的事务隔离级别是怎么实现的呢

其中 innodb 的事务隔离用到了标题里说到的 mvcc,Multiversion concurrency control, 直译过来就是多版本并发控制,先不讲这个究竟是个啥,考虑下如果纯猜测,这个事务隔离级别应该会是怎么样实现呢,愚钝的我想了下,可以在事务开始的时候拷贝一个表,这个可以支持 RR 级别,RC 级别就不支持了,而且要是个非常大的表,想想就不可行

腆着脸说虽然这个不可行,但是思路是对的,具体实行起来需要做一系列(肥肠多)的改动,首先根据我的理解,其实这个拷贝一个表是变成拷贝一条记录,但是如果有多个事务,那就得拷贝多次,这个问题其实可以借助版本管理系统来解释,在用版本管理系统,git 之类的之前,很原始的可能是开发完一个功能后,就打个压缩包用时间等信息命名,然后如果后面要找回这个就直接用这个压缩包的就行了,后来有了 svn,git 中心式和分布式的版本管理系统,它的一个特点是粒度可以控制到文件和代码行级别,对应的我们的 mysql 事务是不是也可以从一开始预想的表级别细化到行的级别,可能之前很多人都了解过,数据库的一行记录除了我们用户自定义的字段,还有一些额外的字段,去源码data0type.h里捞一下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/* Precise data types for system columns and the length of those columns;
NOTE: the values must run from 0 up in the order given! All codes must
be less than 256 */
#define DATA_ROW_ID 0 /* row id: a 48-bit integer */
#define DATA_ROW_ID_LEN 6 /* stored length for row id */

/** Transaction id: 6 bytes */
constexpr size_t DATA_TRX_ID = 1;

/** Transaction ID type size in bytes. */
constexpr size_t DATA_TRX_ID_LEN = 6;

/** Rollback data pointer: 7 bytes */
constexpr size_t DATA_ROLL_PTR = 2;

/** Rollback data pointer type size in bytes. */
constexpr size_t DATA_ROLL_PTR_LEN = 7;

一个是 DATA_ROW_ID,这个是在数据没指定主键的时候会生成一个隐藏的,如果用户有指定主键就是主键了

一个是 DATA_TRX_ID,这个表示这条记录的事务 ID

还有一个是 DATA_ROLL_PTR 指向回滚段的指针

指向的回滚段其实就是我们常说的 undo log,这里面的具体结构就是个链表,在 mvcc 里会使用到这个,还有就是这个 DATA_TRX_ID,每条记录都记录了这个事务 ID,表示的是这条记录的当前值是被哪个事务修改的,下面就扯回事务了,我们知道 Read Uncommitted, 其实用不到隔离,直接读取当前值就好了,到了 Read Committed 级别,我们要让事务读取到提交过的值,mysql 使用了一个叫 read view 的玩意,它里面有这些值是我们需要注意的,

m_low_limit_id, 这个是 read view 创建时最大的活跃事务 id

m_up_limit_id, 这个是 read view 创建时最小的活跃事务 id

m_ids, 这个是 read view 创建时所有的活跃事务 id 数组

m_creator_trx_id 这个是当前记录的创建事务 id

判断事务的可见性主要的逻辑是这样,

  1. 当记录的事务 id 小于最小活跃事务 id,说明是可见的,
  2. 如果记录的事务 id 等于当前事务 id,说明是自己的更改,可见
  3. 如果记录的事务 id 大于最大的活跃事务 id, 不可见
  4. 如果记录的事务 id 介于 m_low_limit_idm_up_limit_id 之间,则要判断它是否在 m_ids 中,如果在,不可见,如果不在,表示已提交,可见
    具体的代码捞一下看看
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    /** Check whether the changes by id are visible.
    @param[in] id transaction id to check against the view
    @param[in] name table name
    @return whether the view sees the modifications of id. */
    bool changes_visible(trx_id_t id, const table_name_t &name) const
    MY_ATTRIBUTE((warn_unused_result)) {
    ut_ad(id > 0);

    if (id < m_up_limit_id || id == m_creator_trx_id) {
    return (true);
    }

    check_trx_id_sanity(id, name);

    if (id >= m_low_limit_id) {
    return (false);

    } else if (m_ids.empty()) {
    return (true);
    }

    const ids_t::value_type *p = m_ids.data();

    return (!std::binary_search(p, p + m_ids.size(), id));
    }
    剩下来一点是啥呢,就是 Read CommittedRepeated Read 也不一样,那前面说的 read view 都能支持吗,又是怎么支持呢,假如这个 read view 是在事务一开始就创建,那好像能支持的只是 RR 事务隔离级别,其实呢,这是通过创建 read view的时机,对于 RR 级别,就是在事务的第一个 select 语句是创建,对于 RC 级别,是在每个 select 语句执行前都是创建一次,那样就可以保证能读到所有已提交的数据

LRU

说完了过期策略再说下淘汰策略,redis 使用的策略是近似的 lru 策略,为什么是近似的呢,先来看下什么是 lru,看下 wiki 的介绍
,图中一共有四个槽的存储空间,依次访问顺序是 A B C D E D F,
当第一次访问 D 时刚好占满了坑,并且值是 4,这个值越小代表越先被淘汰,当 E 进来时,看了下已经存在的四个里 A 是最小的,代表是最早存在并且最早被访问的,那就先淘汰它了,E 占领了 A 的位置,并设置值为 4,然后又访问 D 了,D 已经存在了,不过又被访问到了,得更新值为 5,然后是 F 进来了,这时 B 是最老的且最近未被访问,所以就淘汰它了。以上是一个 lru 的简要说明,但是 redis 没有严格按照这个去执行,理由跟前面过期策略一致,最严格的过期策略应该是每个 key 都有对应的定时器,当超时时马上就能清除,但是问题是这样的cpu 消耗太大,所换来的内存效率不太值得,淘汰策略也是这样,类似于上图,要维护所有 key 的一个有序 lru 值,并且遍历将最小的淘汰,redis 采用的是抽样的形式,最初的实现方式是随机从 dict 抽取 5 个 key,淘汰一个 lru 最小的,这样子勉强能达到淘汰的目的,但是效果不是特别好,后面在 redis 3.0开始,将随机抽取改成了维护一个 pool,pool 的大小默认是 16,每次放入的都是按lru 值有序排列好,每一次放入的必须是 lru小于 pool 中最小的 lru 才允许放入,直到放满,后面再有新的就会将大的踢出。
redis 针对这个策略的改进做了一个实验,这里借用下图

首先背景是这图中的所有点都对应一个 redis 的 key,灰色部分加入后被顺序访问过一遍,然后又加入了绿色部分,那么按照理论的 lru 算法,应该是图左上中,浅灰色部分全都被淘汰,那么对比来看看图右上,左下和右下,左下表示 2.8 版本就是随机抽样 5 个 key,淘汰其中 lru 最小的一个,发现是灰色和浅灰色的都有被淘汰的,右下的 3.0 版本抽样数量不变的情况下,稍好一些,当 3.0 版本的抽样数量调整成 10 后,已经较为接近理论上的 lru 策略了,通过代码来简要分析下

1
2
3
4
5
6
7
8
9
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;

对于 lru 策略来说,lru 字段记录的就是redisObj 的LRU time,
redis 在访问数据时,都会调用lookupKey方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/* Low level key lookup API, not actually called directly from commands
* implementations that should instead rely on lookupKeyRead(),
* lookupKeyWrite() and lookupKeyReadWithFlags(). */
robj *lookupKey(redisDb *db, robj *key, int flags) {
dictEntry *de = dictFind(db->dict,key->ptr);
if (de) {
robj *val = dictGetVal(de);

/* Update the access time for the ageing algorithm.
* Don't do it if we have a saving child, as this will trigger
* a copy on write madness. */
if (!hasActiveChildProcess() && !(flags & LOOKUP_NOTOUCH)){
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
// 这个是后面一节的内容
updateLFU(val);
} else {
// 对于这个分支,访问时就会去更新 lru 值
val->lru = LRU_CLOCK();
}
}
return val;
} else {
return NULL;
}
}
/* This function is used to obtain the current LRU clock.
* If the current resolution is lower than the frequency we refresh the
* LRU clock (as it should be in production servers) we return the
* precomputed value, otherwise we need to resort to a system call. */
unsigned int LRU_CLOCK(void) {
unsigned int lruclock;
if (1000/server.hz <= LRU_CLOCK_RESOLUTION) {
// 如果服务器的频率server.hz大于 1 时就是用系统预设的 lruclock
lruclock = server.lruclock;
} else {
lruclock = getLRUClock();
}
return lruclock;
}
/* Return the LRU clock, based on the clock resolution. This is a time
* in a reduced-bits format that can be used to set and check the
* object->lru field of redisObject structures. */
unsigned int getLRUClock(void) {
return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}

redis 处理命令是在这里processCommand

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/* If this function gets called we already read a whole
* command, arguments are in the client argv/argc fields.
* processCommand() execute the command or prepare the
* server for a bulk read from the client.
*
* If C_OK is returned the client is still alive and valid and
* other operations can be performed by the caller. Otherwise
* if C_ERR is returned the client was destroyed (i.e. after QUIT). */
int processCommand(client *c) {
moduleCallCommandFilters(c);



/* Handle the maxmemory directive.
*
* Note that we do not want to reclaim memory if we are here re-entering
* the event loop since there is a busy Lua script running in timeout
* condition, to avoid mixing the propagation of scripts with the
* propagation of DELs due to eviction. */
if (server.maxmemory && !server.lua_timedout) {
int out_of_memory = freeMemoryIfNeededAndSafe() == C_ERR;
/* freeMemoryIfNeeded may flush slave output buffers. This may result
* into a slave, that may be the active client, to be freed. */
if (server.current_client == NULL) return C_ERR;

/* It was impossible to free enough memory, and the command the client
* is trying to execute is denied during OOM conditions or the client
* is in MULTI/EXEC context? Error. */
if (out_of_memory &&
(c->cmd->flags & CMD_DENYOOM ||
(c->flags & CLIENT_MULTI &&
c->cmd->proc != execCommand &&
c->cmd->proc != discardCommand)))
{
flagTransaction(c);
addReply(c, shared.oomerr);
return C_OK;
}
}
}

这里只摘了部分,当需要清理内存时就会调用, 然后调用了freeMemoryIfNeededAndSafe

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
/* This is a wrapper for freeMemoryIfNeeded() that only really calls the
* function if right now there are the conditions to do so safely:
*
* - There must be no script in timeout condition.
* - Nor we are loading data right now.
*
*/
int freeMemoryIfNeededAndSafe(void) {
if (server.lua_timedout || server.loading) return C_OK;
return freeMemoryIfNeeded();
}
/* This function is periodically called to see if there is memory to free
* according to the current "maxmemory" settings. In case we are over the
* memory limit, the function will try to free some memory to return back
* under the limit.
*
* The function returns C_OK if we are under the memory limit or if we
* were over the limit, but the attempt to free memory was successful.
* Otehrwise if we are over the memory limit, but not enough memory
* was freed to return back under the limit, the function returns C_ERR. */
int freeMemoryIfNeeded(void) {
int keys_freed = 0;
/* By default replicas should ignore maxmemory
* and just be masters exact copies. */
if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

size_t mem_reported, mem_tofree, mem_freed;
mstime_t latency, eviction_latency;
long long delta;
int slaves = listLength(server.slaves);

/* When clients are paused the dataset should be static not just from the
* POV of clients not being able to write, but also from the POV of
* expires and evictions of keys not being performed. */
if (clientsArePaused()) return C_OK;
if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
return C_OK;

mem_freed = 0;

if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
goto cant_free; /* We need to free memory, but policy forbids. */

latencyStartMonitor(latency);
while (mem_freed < mem_tofree) {
int j, k, i;
static unsigned int next_db = 0;
sds bestkey = NULL;
int bestdbid;
redisDb *db;
dict *dict;
dictEntry *de;

if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
{
struct evictionPoolEntry *pool = EvictionPoolLRU;

while(bestkey == NULL) {
unsigned long total_keys = 0, keys;

/* We don't want to make local-db choices when expiring keys,
* so to start populate the eviction pool sampling keys from
* every DB. */
for (i = 0; i < server.dbnum; i++) {
db = server.db+i;
dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
db->dict : db->expires;
if ((keys = dictSize(dict)) != 0) {
evictionPoolPopulate(i, dict, db->dict, pool);
total_keys += keys;
}
}
if (!total_keys) break; /* No keys to evict. */

/* Go backward from best to worst element to evict. */
for (k = EVPOOL_SIZE-1; k >= 0; k--) {
if (pool[k].key == NULL) continue;
bestdbid = pool[k].dbid;

if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
de = dictFind(server.db[pool[k].dbid].dict,
pool[k].key);
} else {
de = dictFind(server.db[pool[k].dbid].expires,
pool[k].key);
}

/* Remove the entry from the pool. */
if (pool[k].key != pool[k].cached)
sdsfree(pool[k].key);
pool[k].key = NULL;
pool[k].idle = 0;

/* If the key exists, is our pick. Otherwise it is
* a ghost and we need to try the next element. */
if (de) {
bestkey = dictGetKey(de);
break;
} else {
/* Ghost... Iterate again. */
}
}
}
}

/* volatile-random and allkeys-random policy */
else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
{
/* When evicting a random key, we try to evict a key for
* each DB, so we use the static 'next_db' variable to
* incrementally visit all DBs. */
for (i = 0; i < server.dbnum; i++) {
j = (++next_db) % server.dbnum;
db = server.db+j;
dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
db->dict : db->expires;
if (dictSize(dict) != 0) {
de = dictGetRandomKey(dict);
bestkey = dictGetKey(de);
bestdbid = j;
break;
}
}
}

/* Finally remove the selected key. */
if (bestkey) {
db = server.db+bestdbid;
robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
/* We compute the amount of memory freed by db*Delete() alone.
* It is possible that actually the memory needed to propagate
* the DEL in AOF and replication link is greater than the one
* we are freeing removing the key, but we can't account for
* that otherwise we would never exit the loop.
*
* AOF and Output buffer memory will be freed eventually so
* we only care about memory used by the key space. */
delta = (long long) zmalloc_used_memory();
latencyStartMonitor(eviction_latency);
if (server.lazyfree_lazy_eviction)
dbAsyncDelete(db,keyobj);
else
dbSyncDelete(db,keyobj);
latencyEndMonitor(eviction_latency);
latencyAddSampleIfNeeded("eviction-del",eviction_latency);
latencyRemoveNestedEvent(latency,eviction_latency);
delta -= (long long) zmalloc_used_memory();
mem_freed += delta;
server.stat_evictedkeys++;
notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
keyobj, db->id);
decrRefCount(keyobj);
keys_freed++;

/* When the memory to free starts to be big enough, we may
* start spending so much time here that is impossible to
* deliver data to the slaves fast enough, so we force the
* transmission here inside the loop. */
if (slaves) flushSlavesOutputBuffers();

/* Normally our stop condition is the ability to release
* a fixed, pre-computed amount of memory. However when we
* are deleting objects in another thread, it's better to
* check, from time to time, if we already reached our target
* memory, since the "mem_freed" amount is computed only
* across the dbAsyncDelete() call, while the thread can
* release the memory all the time. */
if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
/* Let's satisfy our stop condition. */
mem_freed = mem_tofree;
}
}
} else {
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("eviction-cycle",latency);
goto cant_free; /* nothing to free... */
}
}
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("eviction-cycle",latency);
return C_OK;

cant_free:
/* We are here if we are not able to reclaim memory. There is only one
* last thing we can try: check if the lazyfree thread has jobs in queue
* and wait... */
while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
break;
usleep(1000);
}
return C_ERR;
}

这里就是根据具体策略去淘汰 key,首先是要往 pool 更新 key,更新key 的方法是evictionPoolPopulate

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
int j, k, count;
dictEntry *samples[server.maxmemory_samples];

count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
for (j = 0; j < count; j++) {
unsigned long long idle;
sds key;
robj *o;
dictEntry *de;

de = samples[j];
key = dictGetKey(de);

/* If the dictionary we are sampling from is not the main
* dictionary (but the expires one) we need to lookup the key
* again in the key dictionary to obtain the value object. */
if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
if (sampledict != keydict) de = dictFind(keydict, key);
o = dictGetVal(de);
}

/* Calculate the idle time according to the policy. This is called
* idle just because the code initially handled LRU, but is in fact
* just a score where an higher score means better candidate. */
if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
idle = estimateObjectIdleTime(o);
} else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
/* When we use an LRU policy, we sort the keys by idle time
* so that we expire keys starting from greater idle time.
* However when the policy is an LFU one, we have a frequency
* estimation, and we want to evict keys with lower frequency
* first. So inside the pool we put objects using the inverted
* frequency subtracting the actual frequency to the maximum
* frequency of 255. */
idle = 255-LFUDecrAndReturn(o);
} else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
/* In this case the sooner the expire the better. */
idle = ULLONG_MAX - (long)dictGetVal(de);
} else {
serverPanic("Unknown eviction policy in evictionPoolPopulate()");
}

/* Insert the element inside the pool.
* First, find the first empty bucket or the first populated
* bucket that has an idle time smaller than our idle time. */
k = 0;
while (k < EVPOOL_SIZE &&
pool[k].key &&
pool[k].idle < idle) k++;
if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
/* Can't insert if the element is < the worst element we have
* and there are no empty buckets. */
continue;
} else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
/* Inserting into empty position. No setup needed before insert. */
} else {
/* Inserting in the middle. Now k points to the first element
* greater than the element to insert. */
if (pool[EVPOOL_SIZE-1].key == NULL) {
/* Free space on the right? Insert at k shifting
* all the elements from k to end to the right. */

/* Save SDS before overwriting. */
sds cached = pool[EVPOOL_SIZE-1].cached;
memmove(pool+k+1,pool+k,
sizeof(pool[0])*(EVPOOL_SIZE-k-1));
pool[k].cached = cached;
} else {
/* No free space on right? Insert at k-1 */
k--;
/* Shift all elements on the left of k (included) to the
* left, so we discard the element with smaller idle time. */
sds cached = pool[0].cached; /* Save SDS before overwriting. */
if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
memmove(pool,pool+1,sizeof(pool[0])*k);
pool[k].cached = cached;
}
}

/* Try to reuse the cached SDS string allocated in the pool entry,
* because allocating and deallocating this object is costly
* (according to the profiler, not my fantasy. Remember:
* premature optimizbla bla bla bla. */
int klen = sdslen(key);
if (klen > EVPOOL_CACHED_SDS_SIZE) {
pool[k].key = sdsdup(key);
} else {
memcpy(pool[k].cached,key,klen+1);
sdssetlen(pool[k].cached,klen);
pool[k].key = pool[k].cached;
}
pool[k].idle = idle;
pool[k].dbid = dbid;
}
}

Redis随机选择maxmemory_samples数量的key,然后计算这些key的空闲时间idle time,当满足条件时(比pool中的某些键的空闲时间还大)就可以进poolpool更新之后,就淘汰pool中空闲时间最大的键。

estimateObjectIdleTime用来计算Redis对象的空闲时间:

1
2
3
4
5
6
7
8
9
10
11
/* Given an object returns the min number of milliseconds the object was never
* requested, using an approximated LRU algorithm. */
unsigned long long estimateObjectIdleTime(robj *o) {
unsigned long long lruclock = LRU_CLOCK();
if (lruclock >= o->lru) {
return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
} else {
return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
LRU_CLOCK_RESOLUTION;
}
}

空闲时间第一种是 lurclock 大于对象的 lru,那么就是减一下乘以精度,因为 lruclock 有可能是已经预生成的,所以会可能走下面这个

LFU

上面介绍了LRU 的算法,但是考虑一种场景

1
2
3
4
~~~~~A~~~~~A~~~~~A~~~~A~~~~~A~~~~~A~~|
~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~|
~~~~~~~~~~C~~~~~~~~~C~~~~~~~~~C~~~~~~|
~~~~~D~~~~~~~~~~D~~~~~~~~~D~~~~~~~~~D|

可以发现,当采用 lru 的淘汰策略的时候,D 是最新的,会被认为是最值得保留的,但是事实上还不如 A 跟 B,然后 antirez 大神就想到了LFU (Least Frequently Used) 这个算法, 显然对于上面的四个 key 的访问频率,保留优先级应该是 B > A > C = D
那要怎么来实现这个 LFU 算法呢,其实像LRU,理想的情况就是维护个链表,把最新访问的放到头上去,但是这个会影响访问速度,注意到前面代码的应该可以看到,redisObject 的 lru 字段其实是两用的,当策略是 LFU 时,这个字段就另作他用了,它的 24 位长度被分成两部分

1
2
3
4
      16 bits      8 bits
+----------------+--------+
+ Last decr time | LOG_C |
+----------------+--------+

前16位字段是最后一次递减时间,因此Redis知道 上一次计数器递减,后8位是 计数器 counter。
LFU 的主体策略就是当这个 key 被访问的次数越多频率越高他就越容易被保留下来,并且是最近被访问的频率越高。这其实有两个事情要做,一个是在访问的时候增加计数值,在一定长时间不访问时进行衰减,所以这里用了两个值,前 16 位记录上一次衰减的时间,后 8 位记录具体的计数值。
Redis4.0之后为maxmemory_policy淘汰策略添加了两个LFU模式:

volatile-lfu:对有过期时间的key采用LFU淘汰策略
allkeys-lfu:对全部key采用LFU淘汰策略
还有2个配置可以调整LFU算法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
lfu-log-factor 10
lfu-decay-time 1
```
`lfu-log-factor` 可以调整计数器counter的增长速度,lfu-log-factor越大,counter增长的越慢。

`lfu-decay-time`是一个以分钟为单位的数值,可以调整counter的减少速度
这里有个问题是 8 位大小够计么,访问一次加 1 的话的确不够,不过大神就是大神,才不会这么简单的加一。往下看代码
```C
/* Low level key lookup API, not actually called directly from commands
* implementations that should instead rely on lookupKeyRead(),
* lookupKeyWrite() and lookupKeyReadWithFlags(). */
robj *lookupKey(redisDb *db, robj *key, int flags) {
dictEntry *de = dictFind(db->dict,key->ptr);
if (de) {
robj *val = dictGetVal(de);

/* Update the access time for the ageing algorithm.
* Don't do it if we have a saving child, as this will trigger
* a copy on write madness. */
if (!hasActiveChildProcess() && !(flags & LOOKUP_NOTOUCH)){
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
// 当淘汰策略是 LFU 时,就会调用这个updateLFU
updateLFU(val);
} else {
val->lru = LRU_CLOCK();
}
}
return val;
} else {
return NULL;
}
}

updateLFU 这个其实个入口,调用了两个重要的方法

1
2
3
4
5
6
7
8
/* Update LFU when an object is accessed.
* Firstly, decrement the counter if the decrement time is reached.
* Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
unsigned long counter = LFUDecrAndReturn(val);
counter = LFULogIncr(counter);
val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

首先来看看LFUDecrAndReturn,这个方法的作用是根据上一次衰减时间和系统配置的 lfu-decay-time 参数来确定需要将 counter 减去多少

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/* If the object decrement time is reached decrement the LFU counter but
* do not update LFU fields of the object, we update the access time
* and counter in an explicit way when the object is really accessed.
* And we will times halve the counter according to the times of
* elapsed time than server.lfu_decay_time.
* Return the object frequency counter.
*
* This function is used in order to scan the dataset for the best object
* to fit: as we check for the candidate, we incrementally decrement the
* counter of the scanned objects if needed. */
unsigned long LFUDecrAndReturn(robj *o) {
// 右移 8 位,拿到上次衰减时间
unsigned long ldt = o->lru >> 8;
// 对 255 做与操作,拿到 counter 值
unsigned long counter = o->lru & 255;
// 根据lfu_decay_time来算出过了多少个衰减周期
unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
if (num_periods)
counter = (num_periods > counter) ? 0 : counter - num_periods;
return counter;
}

然后是加,调用了LFULogIncr

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
/* Logarithmically increment a counter. The greater is the current counter value
* the less likely is that it gets really implemented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
// 最大值就是 255,到顶了就不加了
if (counter == 255) return 255;
// 生成个随机小数
double r = (double)rand()/RAND_MAX;
// 减去个基础值,LFU_INIT_VAL = 5,防止刚进来就被逐出
double baseval = counter - LFU_INIT_VAL;
// 如果是小于 0,
if (baseval < 0) baseval = 0;
// 如果 baseval 是 0,那么 p 就是 1了,后面 counter 直接加一,如果不是的话,得看系统参数lfu_log_factor,这个越大,除出来的 p 越小,那么 counter++的可能性也越小,这样子就把前面的疑问给解决了,不是直接+1 的
double p = 1.0/(baseval*server.lfu_log_factor+1);
if (r < p) counter++;
return counter;
}

大概的变化速度可以参考

1
2
3
4
5
6
7
8
9
10
11
+--------+------------+------------+------------+------------+------------+
| factor | 100 hits | 1000 hits | 100K hits | 1M hits | 10M hits |
+--------+------------+------------+------------+------------+------------+
| 0 | 104 | 255 | 255 | 255 | 255 |
+--------+------------+------------+------------+------------+------------+
| 1 | 18 | 49 | 255 | 255 | 255 |
+--------+------------+------------+------------+------------+------------+
| 10 | 10 | 18 | 142 | 255 | 255 |
+--------+------------+------------+------------+------------+------------+
| 100 | 8 | 11 | 49 | 143 | 255 |
+--------+------------+------------+------------+------------+------------+

简而言之就是 lfu_log_factor 越大变化的越慢

总结

总结一下,redis 实现了近似的 lru 淘汰策略,通过增加了淘汰 key 的池子(pool),并且增大每次抽样的 key 的数量来将淘汰效果更进一步地接近于 lru,这是 lru 策略,但是对于前面举的一个例子,其实 lru 并不能保证 key 的淘汰就如我们预期,所以在后期又引入了 lfu 的策略,lfu的策略比较巧妙,复用了 redis 对象的 lru 字段,并且使用了factor 参数来控制计数器递增的速度,防止 8 位的计数器太早溢出。

这一篇不再是数据结构介绍了,大致的数据结构基本都介绍了,这一篇主要是查漏补缺,或者说讲一些重要且基本的概念,也可能是经常被忽略的,很多讲 redis 的系列文章可能都会忽略,学习 redis 的时候也会,因为觉得源码学习就是讲主要的数据结构和“算法”学习了就好了。
redis 的主要应用就是拿来作为高性能的缓存,那么缓存一般有些啥需要注意的,首先是访问速度,如果取得跟数据库一样快,那就没什么存在的意义,第二个是缓存的字面意思,我只是为了让数据读取快一些,通常大部分的场景这个是需要更新过期的,这里就把我要讲的第一点引出来了(真累,

redis过期策略

redis 是如何过期缓存的,可以猜测下,最无脑的就是每个设置了过期时间的 key 都设个定时器,过期了就删除,这种显然消耗太大,清理地最及时,还有的就是 redis 正在采用的懒汉清理策略和定期清理
懒汉策略就是在使用的时候去检查缓存是否过期,比如 get 操作时,先判断下这个 key 是否已经过期了,如果过期了就删掉,并且返回空,如果没过期则正常返回
主要代码是

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
/* This function is called when we are going to perform some operation
* in a given key, but such key may be already logically expired even if
* it still exists in the database. The main way this function is called
* is via lookupKey*() family of functions.
*
* The behavior of the function depends on the replication role of the
* instance, because slave instances do not expire keys, they wait
* for DELs from the master for consistency matters. However even
* slaves will try to have a coherent return value for the function,
* so that read commands executed in the slave side will be able to
* behave like if the key is expired even if still present (because the
* master has yet to propagate the DEL).
*
* In masters as a side effect of finding a key which is expired, such
* key will be evicted from the database. Also this may trigger the
* propagation of a DEL/UNLINK command in AOF / replication stream.
*
* The return value of the function is 0 if the key is still valid,
* otherwise the function returns 1 if the key is expired. */
int expireIfNeeded(redisDb *db, robj *key) {
if (!keyIsExpired(db,key)) return 0;

/* If we are running in the context of a slave, instead of
* evicting the expired key from the database, we return ASAP:
* the slave key expiration is controlled by the master that will
* send us synthesized DEL operations for expired keys.
*
* Still we try to return the right information to the caller,
* that is, 0 if we think the key should be still valid, 1 if
* we think the key is expired at this time. */
if (server.masterhost != NULL) return 1;

/* Delete the key */
server.stat_expiredkeys++;
propagateExpire(db,key,server.lazyfree_lazy_expire);
notifyKeyspaceEvent(NOTIFY_EXPIRED,
"expired",key,db->id);
return server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
dbSyncDelete(db,key);
}

/* Check if the key is expired. */
int keyIsExpired(redisDb *db, robj *key) {
mstime_t when = getExpire(db,key);
mstime_t now;

if (when < 0) return 0; /* No expire for this key */

/* Don't expire anything while loading. It will be done later. */
if (server.loading) return 0;

/* If we are in the context of a Lua script, we pretend that time is
* blocked to when the Lua script started. This way a key can expire
* only the first time it is accessed and not in the middle of the
* script execution, making propagation to slaves / AOF consistent.
* See issue #1525 on Github for more information. */
if (server.lua_caller) {
now = server.lua_time_start;
}
/* If we are in the middle of a command execution, we still want to use
* a reference time that does not change: in that case we just use the
* cached time, that we update before each call in the call() function.
* This way we avoid that commands such as RPOPLPUSH or similar, that
* may re-open the same key multiple times, can invalidate an already
* open object in a next call, if the next call will see the key expired,
* while the first did not. */
else if (server.fixed_time_expire > 0) {
now = server.mstime;
}
/* For the other cases, we want to use the most fresh time we have. */
else {
now = mstime();
}

/* The key expired if the current (virtual or real) time is greater
* than the expire time of the key. */
return now > when;
}
/* Return the expire time of the specified key, or -1 if no expire
* is associated with this key (i.e. the key is non volatile) */
long long getExpire(redisDb *db, robj *key) {
dictEntry *de;

/* No expire? return ASAP */
if (dictSize(db->expires) == 0 ||
(de = dictFind(db->expires,key->ptr)) == NULL) return -1;

/* The entry was found in the expire dict, this means it should also
* be present in the main dict (safety check). */
serverAssertWithInfo(NULL,key,dictFind(db->dict,key->ptr) != NULL);
return dictGetSignedIntegerVal(de);
}

这里有几点要注意的,第一是当惰性删除时会根据lazyfree_lazy_expire这个参数去判断是执行同步删除还是异步删除,另外一点是对于 slave,是不需要执行的,因为会在 master 过期时向 slave 发送 del 指令。
光采用这个策略会有什么问题呢,假如一些key 一直未被访问,那这些 key 就不会过期了,导致一直被占用着内存,所以 redis 采取了懒汉式过期加定期过期策略,定期策略是怎么执行的呢

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
/* This function handles 'background' operations we are required to do
* incrementally in Redis databases, such as active key expiring, resizing,
* rehashing. */
void databasesCron(void) {
/* Expire keys by random sampling. Not required for slaves
* as master will synthesize DELs for us. */
if (server.active_expire_enabled) {
if (server.masterhost == NULL) {
activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
} else {
expireSlaveKeys();
}
}

/* Defrag keys gradually. */
activeDefragCycle();

/* Perform hash tables rehashing if needed, but only if there are no
* other processes saving the DB on disk. Otherwise rehashing is bad
* as will cause a lot of copy-on-write of memory pages. */
if (!hasActiveChildProcess()) {
/* We use global counters so if we stop the computation at a given
* DB we'll be able to start from the successive in the next
* cron loop iteration. */
static unsigned int resize_db = 0;
static unsigned int rehash_db = 0;
int dbs_per_call = CRON_DBS_PER_CALL;
int j;

/* Don't test more DBs than we have. */
if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

/* Resize */
for (j = 0; j < dbs_per_call; j++) {
tryResizeHashTables(resize_db % server.dbnum);
resize_db++;
}

/* Rehash */
if (server.activerehashing) {
for (j = 0; j < dbs_per_call; j++) {
int work_done = incrementallyRehash(rehash_db);
if (work_done) {
/* If the function did some work, stop here, we'll do
* more at the next cron loop. */
break;
} else {
/* If this db didn't need rehash, we'll try the next one. */
rehash_db++;
rehash_db %= server.dbnum;
}
}
}
}
}
/* Try to expire a few timed out keys. The algorithm used is adaptive and
* will use few CPU cycles if there are few expiring keys, otherwise
* it will get more aggressive to avoid that too much memory is used by
* keys that can be removed from the keyspace.
*
* Every expire cycle tests multiple databases: the next call will start
* again from the next db, with the exception of exists for time limit: in that
* case we restart again from the last database we were processing. Anyway
* no more than CRON_DBS_PER_CALL databases are tested at every iteration.
*
* The function can perform more or less work, depending on the "type"
* argument. It can execute a "fast cycle" or a "slow cycle". The slow
* cycle is the main way we collect expired cycles: this happens with
* the "server.hz" frequency (usually 10 hertz).
*
* However the slow cycle can exit for timeout, since it used too much time.
* For this reason the function is also invoked to perform a fast cycle
* at every event loop cycle, in the beforeSleep() function. The fast cycle
* will try to perform less work, but will do it much more often.
*
* The following are the details of the two expire cycles and their stop
* conditions:
*
* If type is ACTIVE_EXPIRE_CYCLE_FAST the function will try to run a
* "fast" expire cycle that takes no longer than EXPIRE_FAST_CYCLE_DURATION
* microseconds, and is not repeated again before the same amount of time.
* The cycle will also refuse to run at all if the latest slow cycle did not
* terminate because of a time limit condition.
*
* If type is ACTIVE_EXPIRE_CYCLE_SLOW, that normal expire cycle is
* executed, where the time limit is a percentage of the REDIS_HZ period
* as specified by the ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC define. In the
* fast cycle, the check of every database is interrupted once the number
* of already expired keys in the database is estimated to be lower than
* a given percentage, in order to avoid doing too much work to gain too
* little memory.
*
* The configured expire "effort" will modify the baseline parameters in
* order to do more work in both the fast and slow expire cycles.
*/

#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20 /* Keys for each DB loop. */
#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which
we do extra efforts. */
void activeExpireCycle(int type) {
/* Adjust the running parameters according to the configured expire
* effort. The default effort is 1, and the maximum configurable effort
* is 10. */
unsigned long
effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
2*effort,
config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
effort;

/* This function has some global state in order to continue the work
* incrementally across calls. */
static unsigned int current_db = 0; /* Last DB tested. */
static int timelimit_exit = 0; /* Time limit hit in previous call? */
static long long last_fast_cycle = 0; /* When last fast cycle ran. */

int j, iteration = 0;
int dbs_per_call = CRON_DBS_PER_CALL;
long long start = ustime(), timelimit, elapsed;

/* When clients are paused the dataset should be static not just from the
* POV of clients not being able to write, but also from the POV of
* expires and evictions of keys not being performed. */
if (clientsArePaused()) return;

if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
/* Don't start a fast cycle if the previous cycle did not exit
* for time limit, unless the percentage of estimated stale keys is
* too high. Also never repeat a fast cycle for the same period
* as the fast cycle total duration itself. */
if (!timelimit_exit &&
server.stat_expired_stale_perc < config_cycle_acceptable_stale)
return;

if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
return;

last_fast_cycle = start;
}

/* We usually should test CRON_DBS_PER_CALL per iteration, with
* two exceptions:
*
* 1) Don't test more DBs than we have.
* 2) If last time we hit the time limit, we want to scan all DBs
* in this iteration, as there is work to do in some DB and we don't want
* expired keys to use memory for too much time. */
if (dbs_per_call > server.dbnum || timelimit_exit)
dbs_per_call = server.dbnum;

/* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
* time per iteration. Since this function gets called with a frequency of
* server.hz times per second, the following is the max amount of
* microseconds we can spend in this function. */
timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
timelimit_exit = 0;
if (timelimit <= 0) timelimit = 1;

if (type == ACTIVE_EXPIRE_CYCLE_FAST)
timelimit = config_cycle_fast_duration; /* in microseconds. */

/* Accumulate some global stats as we expire keys, to have some idea
* about the number of keys that are already logically expired, but still
* existing inside the database. */
long total_sampled = 0;
long total_expired = 0;

for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
/* Expired and checked in a single loop. */
unsigned long expired, sampled;

redisDb *db = server.db+(current_db % server.dbnum);

/* Increment the DB now so we are sure if we run out of time
* in the current DB we'll restart from the next. This allows to
* distribute the time evenly across DBs. */
current_db++;

/* Continue to expire if at the end of the cycle more than 25%
* of the keys were expired. */
do {
unsigned long num, slots;
long long now, ttl_sum;
int ttl_samples;
iteration++;

/* If there is nothing to expire try next DB ASAP. */
if ((num = dictSize(db->expires)) == 0) {
db->avg_ttl = 0;
break;
}
slots = dictSlots(db->expires);
now = mstime();

/* When there are less than 1% filled slots, sampling the key
* space is expensive, so stop here waiting for better times...
* The dictionary will be resized asap. */
if (num && slots > DICT_HT_INITIAL_SIZE &&
(num*100/slots < 1)) break;

/* The main collection cycle. Sample random keys among keys
* with an expire set, checking for expired ones. */
expired = 0;
sampled = 0;
ttl_sum = 0;
ttl_samples = 0;

if (num > config_keys_per_loop)
num = config_keys_per_loop;

/* Here we access the low level representation of the hash table
* for speed concerns: this makes this code coupled with dict.c,
* but it hardly changed in ten years.
*
* Note that certain places of the hash table may be empty,
* so we want also a stop condition about the number of
* buckets that we scanned. However scanning for free buckets
* is very fast: we are in the cache line scanning a sequential
* array of NULL pointers, so we can scan a lot more buckets
* than keys in the same time. */
long max_buckets = num*20;
long checked_buckets = 0;

while (sampled < num && checked_buckets < max_buckets) {
for (int table = 0; table < 2; table++) {
if (table == 1 && !dictIsRehashing(db->expires)) break;

unsigned long idx = db->expires_cursor;
idx &= db->expires->ht[table].sizemask;
dictEntry *de = db->expires->ht[table].table[idx];
long long ttl;

/* Scan the current bucket of the current table. */
checked_buckets++;
while(de) {
/* Get the next entry now since this entry may get
* deleted. */
dictEntry *e = de;
de = de->next;

ttl = dictGetSignedIntegerVal(e)-now;
if (activeExpireCycleTryExpire(db,e,now)) expired++;
if (ttl > 0) {
/* We want the average TTL of keys yet
* not expired. */
ttl_sum += ttl;
ttl_samples++;
}
sampled++;
}
}
db->expires_cursor++;
}
total_expired += expired;
total_sampled += sampled;

/* Update the average TTL stats for this database. */
if (ttl_samples) {
long long avg_ttl = ttl_sum/ttl_samples;

/* Do a simple running average with a few samples.
* We just use the current estimate with a weight of 2%
* and the previous estimate with a weight of 98%. */
if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
}

/* We can't block forever here even if there are many keys to
* expire. So after a given amount of milliseconds return to the
* caller waiting for the other active expire cycle. */
if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
elapsed = ustime()-start;
if (elapsed > timelimit) {
timelimit_exit = 1;
server.stat_expired_time_cap_reached_count++;
break;
}
}
/* We don't repeat the cycle for the current database if there are
* an acceptable amount of stale keys (logically expired but yet
* not reclained). */
} while ((expired*100/sampled) > config_cycle_acceptable_stale);
}

elapsed = ustime()-start;
server.stat_expire_cycle_time_used += elapsed;
latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

/* Update our estimate of keys existing but yet to be expired.
* Running average with this sample accounting for 5%. */
double current_perc;
if (total_sampled) {
current_perc = (double)total_expired/total_sampled;
} else
current_perc = 0;
server.stat_expired_stale_perc = (current_perc*0.05)+
(server.stat_expired_stale_perc*0.95);
}

执行定期清除分成两种类型,快和慢,分别由beforeSleepdatabasesCron调用,快版有两个限制,一个是执行时长由ACTIVE_EXPIRE_CYCLE_FAST_DURATION限制,另一个是执行间隔是 2 倍的ACTIVE_EXPIRE_CYCLE_FAST_DURATION,另外这还可以由配置的server.active_expire_effort参数来控制,默认是 1,最大是 10

1
2
onfig_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort

然后会从一定数量的 db 中找出一定数量的带过期时间的 key(保存在 expires中),这里的数量是由

1
2
3
4
5
6
7
8
config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort
```
控制,慢速的执行时长是
```C
config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
2*effort
timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;

这里还有一个额外的退出条件,如果当前数据库的抽样结果已经达到我们所允许的过期 key 百分比,则下次不再处理当前 db,继续处理下个 db

在Java8的stream之前,将对象进行排序的时候,可能需要对象实现Comparable接口,或者自己实现一个Comparator,

比如这样子

我的对象是Entity

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public class Entity {

private Long id;

private Long sortValue;

public Long getId() {
return id;
}

public void setId(Long id) {
this.id = id;
}

public Long getSortValue() {
return sortValue;
}

public void setSortValue(Long sortValue) {
this.sortValue = sortValue;
}
}

Comparator

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public class MyComparator implements Comparator {
@Override
public int compare(Object o1, Object o2) {
Entity e1 = (Entity) o1;
Entity e2 = (Entity) o2;
if (e1.getSortValue() < e2.getSortValue()) {
return -1;
} else if (e1.getSortValue().equals(e2.getSortValue())) {
return 0;
} else {
return 1;
}
}
}

比较代码

1
2
3
4
5
6
7
8
9
10
11
12
13
private static MyComparator myComparator = new MyComparator();

public static void main(String[] args) {
List<Entity> list = new ArrayList<Entity>();
Entity e1 = new Entity();
e1.setId(1L);
e1.setSortValue(1L);
list.add(e1);
Entity e2 = new Entity();
e2.setId(2L);
e2.setSortValue(null);
list.add(e2);
Collections.sort(list, myComparator);

看到这里的e2的排序值是null,在Comparator中如果要正常运行的话,就得判空之类的,这里有两点需要,一个是不想写这个MyComparator,然后也没那么好排除掉list里排序值,那么有什么办法能解决这种问题呢,应该说java的这方面真的是很强大

看一下nullsFirst的实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
final static class NullComparator<T> implements Comparator<T>, Serializable {
private static final long serialVersionUID = -7569533591570686392L;
private final boolean nullFirst;
// if null, non-null Ts are considered equal
private final Comparator<T> real;

@SuppressWarnings("unchecked")
NullComparator(boolean nullFirst, Comparator<? super T> real) {
this.nullFirst = nullFirst;
this.real = (Comparator<T>) real;
}

@Override
public int compare(T a, T b) {
if (a == null) {
return (b == null) ? 0 : (nullFirst ? -1 : 1);
} else if (b == null) {
return nullFirst ? 1: -1;
} else {
return (real == null) ? 0 : real.compare(a, b);
}
}

核心代码就是下面这段,其实就是帮我们把前面要做的事情做掉了,是不是挺方便的,小记一下哈

最近做 docker 系列,会经常进到 docker 内部,如上一篇介绍的,这些镜像一般都有用 ubuntu 或者alpine 这样的 Linux 系统作为底包,如果构建镜像的时候没有替换源的话,因为你懂的原因,在内部想编辑下东西要安装 vim 就会很慢很慢,thousand years later~
而且如果在容器内部想改源配置的话也要编辑器,就陷入了一个鸡生蛋,跟蛋生鸡的问题中,对于 linux 大神来说应该有一万种方法解决这个问题,对于我这个渣渣来说可能只想到了这个土方法,先 cp backup 一下 sources.list, 再 echo “xxx” > sources.list, 这里就碰到了一个问题,这个 sources.list 一般不止一行,直接 echo 的话就解析不了了,不过 echo 可以支持”\n”转义,就是加-e
看一下解释和示例

还有一点也是在这个时候要安装 vim 之类的,得知道是什么镜像底包,如果是用 uname 指令,其实看到的是宿主机的系统,得用cat /etc/issue

这里稍稍记一下

运行第一个 Dockerfile

上一篇的 Dockerfile 我们停留在构建阶段,现在来把它跑起来

1
2
docker run -d -p 80 --name static_web nicksxs/static_web \
nginx -g "daemon off;"

这里的-d表示以分离模型运行docker (detached),然后-p 是表示将容器的 80 端口开放给宿主机,然后容器名就叫 static_web,使用了我们上次构建的 static_web 镜像,后面的是让 nginx 在前台运行

可以看到返回了个容器 id,但是具体情况没出现,也没连上去,那我们想看看怎么访问在 Dockerfile 里写的静态页面,我们来看下docker 进程

发现为我们随机分配了一个宿主机的端口,32768,去服务器的防火墙把这个外网端口开一下,看看是不是符合我们的预期呢

好像不太对额,应该是 ubuntu 安装的 nginx 的默认工作目录不对,我们来进容器看看,再熟悉下命令docker exec -it 4792455ca2ed /bin/bash
记得容器 id 换成自己的,进入容器后得找找 nginx 的配置文件,通常在/etc/nginx,/usr/local/etc等目录下,然后找到我们的目录是在这

所以把刚才的内容复制过去再试试

目标达成,give me five✌️

第二个 Dockerfile

然后就想来动态一点的,毕竟写过 PHP,就来试试 PHP
再建一个目录叫 dynamic_web,里面创建 src 目录,放一个 index.php
内容是

1
2
<?php
echo "Hello World!";

然后在 dynamic_web 目录下创建 Dockerfile,

1
2
3
FROM trafex/alpine-nginx-php7:latest
COPY src/ /var/www/html
EXPOSE 80

Dockerfile 虽然只有三行,不过要着重说明下,这个底包其实不是docker 官方的,有两点考虑,一点是官方的基本都是 php apache 的镜像,还有就是 alpine这个,截取一段中文介绍

Alpine 操作系统是一个面向安全的轻型 Linux 发行版。它不同于通常 Linux 发行版,Alpine 采用了 musl libc 和 busybox 以减小系统的体积和运行时资源消耗,但功能上比 busybox 又完善的多,因此得到开源社区越来越多的青睐。在保持瘦身的同时,Alpine 还提供了自己的包管理工具 apk,可以通过 https://pkgs.alpinelinux.org/packages 网站上查询包信息,也可以直接通过 apk 命令直接查询和安装各种软件。
Alpine 由非商业组织维护的,支持广泛场景的 Linux发行版,它特别为资深/重度Linux用户而优化,关注安全,性能和资源效能。Alpine 镜像可以适用于更多常用场景,并且是一个优秀的可以适用于生产的基础系统/环境。

Alpine Docker 镜像也继承了 Alpine Linux 发行版的这些优势。相比于其他 Docker 镜像,它的容量非常小,仅仅只有 5 MB 左右(对比 Ubuntu 系列镜像接近 200 MB),且拥有非常友好的包管理机制。官方镜像来自 docker-alpine 项目。

目前 Docker 官方已开始推荐使用 Alpine 替代之前的 Ubuntu 做为基础镜像环境。这样会带来多个好处。包括镜像下载速度加快,镜像安全性提高,主机之间的切换更方便,占用更少磁盘空间等。

一方面在没有镜像的情况下,拉取 docker 镜像还是比较费力的,第二个就是也能节省硬盘空间,所以目前有大部分的 docker 镜像都将 alpine 作为基础镜像了
然后再来构建下

这里还有个点,就是上面的那个镜像我们也是 EXPOSE 80端口,然后外部宿主机会随机映射一个端口,为了偷个懒,我们就直接指定外部端口了
docker run -d -p 80:80 dynamic_web打开浏览器发现访问不了,咋回事呢
因为我们没看trafex/alpine-nginx-php7:latest这个镜像说明,它内部的服务是 8080 端口的,所以我们映射的暴露端口应该是 8080,再用docker run -d -p 80:8080 dynamic_web这个启动,

限制下 docker 的 cpu 使用率

这里我们开始玩一点有意思的,我们在容器里装下 vim 和 gcc,然后写这样一段 c 代码

1
2
3
4
5
6
7
#include <stdio.h>
int main(void)
{
int i = 0;
for(;;) i++;
return 0;
}

就是一个最简单的死循环,然后在容器里跑起来

1
2
$ gcc 1.c 
$ ./a.out

然后我们来看下系统资源占用(CPU)
Xs562iawhHyMxeO
上图是在容器里的,可以看到 cpu 已经 100%了
然后看看容器外面的
ecqH8XJ4k7rKhzu
可以看到一个核的 cpu 也被占满了,因为是个双核的机器,并且代码是单线程的
然后呢我们要做点啥
因为已经在这个 ubuntu 容器中装了 vim 和 gcc,考虑到国内的网络,所以我们先把这个容器 commit 一下,

1
docker commit -a "nick" -m "my ubuntu" f63c5607df06 my_ubuntu:v1

然后再运行起来

1
docker run -it --cpus=0.1 my_ubuntu:v1 bash


我们的代码跟可执行文件都还在,要的就是这个效果,然后再运行一下

结果是这个样子的,有点神奇是不,关键就在于 run 的时候的--cpus=0.1这个参数,它其实就是基于我前一篇说的 cgroup 技术,能将进程之间的cpu,内存等资源进行隔离

开始第一个 Dockerfile

上一面为了复用那个我装了 vim 跟 gcc 的容器,我把它提交到了本地,使用了docker commit命令,有点类似于 git 的 commit,但是这个不是个很好的操作方式,需要手动介入,这里更推荐使用 Dockerfile 来构建镜像

1
2
3
4
5
6
7
8
From ubuntu:latest
MAINTAINER Nicksxs "nicksxs@hotmail.com"
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN apt-get clean
RUN apt-get update && apt install -y nginx
RUN echo 'Hi, i am in container' \
> /usr/share/nginx/html/index.html
EXPOSE 80

先解释下这是在干嘛,首先是这个From ubuntu:latest基于的 ubuntu 的最新版本的镜像,然后第二行是维护人的信息,第三四行么作为墙内人你懂的,把 ubuntu 的源换成阿里云的,不然就有的等了,然后就是装下 nginx,往默认的 nginx 的入口 html 文件里输入一行欢迎语,然后暴露 80 端口
然后我们使用sudo docker build -t="nicksxs/static_web" .命令来基于这个 Dockerfile 构建我们自己的镜像,过程中是这样的


可以看到图中,我的 Dockerfile 是 7 行,里面就执行了 7 步,并且每一步都有一个类似于容器 id 的层 id 出来,这里就是一个比较重要的东西,docker 在构建的时候其实是有这个层的概念,Dockerfile 里的每一行都会往上加一层,这里有还注意下命令后面的.,代表当前目录下会自行去寻找 Dockerfile 进行构建,构建完了之后我们再看下我们的本地镜像

我们自己的镜像出现啦
然后有个问题,如果这个构建中途报了错咋办呢,来试试看,我们把 nginx 改成随便的一个错误名,nginxx(不知道会不会运气好真的有这玩意),再来 build 一把

找不到 nginxx 包,是不是这个镜像就完全不能用呢,当然也不是,因为前面说到了,docker 是基于层去构建的,可以看到前面的 4 个 step 都没报错,那我们基于最后的成功步骤创建下容器看看
也就是sudo docker run -t -i bd26f991b6c8 /bin/bash
答案是可以的,只是没装成功 nginx

还有一点注意到没,前面的几个 step 都有一句 Using cache,说明 docker 在构建镜像的时候是有缓存的,这也更能说明 docker 是基于层去构建镜像,同样的底包,同样的步骤,这些层是可以被复用的,这就是 docker 的构建缓存,当然我们也可以在 build 的时候加上--no-cache去把构建缓存禁用掉。

因为最近想搭一个phabricator用来做看板和任务管理,一开始了解这个是Easy大大有在微博推荐过,后来苏洋也在群里和博客里说到了,看上去还不错的样子,因为主角是docker所以就不介绍太多,后面有机会写一下。

docker最开始是之前在某位大佬的博客看到的,看上去有点神奇,感觉是一种轻量级的虚拟机,但是能做的事情好像差不多,那时候是在Ubuntu系统的vps里起一个Ubuntu的docker,然后在里面装个nginx,配置端口映射就可以访问了,后来也草草写过一篇使用docker搭建mysql集群,但是最近看了下好像是因为装docker的大佬做了一些别名还是什么操作,导致里面用的操作都不具有普遍性,而且主要是把搭的过程写了下,属于囫囵吞枣,没理解docker是干啥的,为啥用docker,就是操作了下,这几天借着搭phabricator的过程,把一些原来不理解,或者原来理解错误的地方重新理一下。

之前写的 mysql 集群,一主二备,这种架构在很多小型应用里都是这么配置的,而且一般是直接在三台 vps 里启动三个 mysql 实例,但是如果换成 docker 会有什么好处呢,其实就是方便部署,比如其中一台备库挂了,我要加一台,或者说备库的 qps 太高了,需要再加一个,如果要在 vps 上搭建的话,首先要买一台机器,等初始化,然后在上面修改源,更新,装 mysql ,然后配置主从,可能还要处理防火墙等等,如果把这些打包成一个 docker 镜像,并且放在自己的 docker registry,那就直接run 一下就可以了;还有比如在公司要给一个新同学整一套开发测试环境,以 Java 开发为例,要装 git,maven,jdk,配置 maven settings 和各种 rc,整合在一个镜像里的话,就会很方便了;再比如微服务的水平扩展。

但是为啥 docker 会有这种优势,听起来好像虚拟机也可以干这个事,但是虚拟机动辄上 G,而且需要 VMware,virtual box 等支持,不适合在Linux服务器环境使用,而且占用资源也会非常大。说得这么好,那么 docker 是啥呢

docker 主要使用 Linux 中已经存在的两种技术的一个整合升级,一个是 namespace,一个是cgroups,相比于虚拟机需要完整虚拟出一个操作系统运行基础,docker 基于宿主机内核,通过 namespace 和 cgroups 分隔进程,理念就是提供一个隔离的最小化运行依赖,这样子相对于虚拟机就有了巨大的便利性,具体的 namespace 和 cgroups 就先不展开讲,可以参考耗子叔的文章

安装

那么我们先安装下 docker,参考官方的教程,安装,我的系统是 ubuntu 的,就贴了 ubuntu 的链接,用其他系统的可以找到对应的系统文档安装,安装完了的话看看 docker 的信息

1
sudo docker info

输出以下信息

简单运行

然后再来运行个 hello world 呗,

1
sudo docker run hello-world

输出了这些

看看这个运行命令是怎么用的,一般都会看到这样子的,sudo docker run -it ubuntu bash, 前面的 docker run 反正就是运行一个容器的意思,-it是啥呢,还有这个什么 ubuntu bash,来看看docker run`的命令帮助信息

1
-i, --interactive                    Keep STDIN open even if not attached

就是要有输入,我们运行的时候能输入

1
-t, --tty                            Allocate a pseudo-TTY

要有个虚拟终端,

1
2
3
Usage:	docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container

镜像

上面说的-it 就是这里的 options,后面那个 ubuntu 就是 image 辣,image 是啥呢

Docker 把应用程序及其依赖,打包在 image 文件里面,可以把它理解成为类似于虚拟机的镜像或者运行一个进程的代码,跑起来了的叫docker 容器或者进程,比如我们将要运行的docker run -it ubuntu bash的ubuntu 就是个 ubuntu 容器的镜像,将这个镜像运行起来后,我们可以进入容器像使用 ubuntu 一样使用它,来看下我们的镜像,使用sudo docker image ls就能列出我们宿主机上的 docker 镜像了

一个 ubuntu 镜像才 64MB,非常小巧,然后是后面的bash,我通过交互式启动了一个 ubuntu 容器,然后在这个启动的容器里运行了 bash 命令,这样就可以在容器里玩一下了

在容器里看下进程,

只有刚才运行容器的 bash 进程和我刚执行的 ps,这里有个可以注意下的,bash 这个进程的 pid 是 1,其实这里就用到了 linux 中的PID Namespace,容器会隔离出一个 pid 的名字空间,这里面的进程跟外部的 pid 命名独立

查看宿主机上的容器

1
sudo docker ps -a

如何进入一个正在运行中的 docker 容器

这个应该是比较常用的,因为比如是一个微服务容器,有时候就像看下运行状态,日志啥的

1
sudo docker exec -it [containerID] bash

查看日志

1
sudo docker logs [containerID]

我在运行容器的终端里胡乱输入点啥,然后通过上面的命令就可以看到啦

寄生虫这部电影在获得奥斯卡之前就有关注了,豆瓣评分很高,一开始看到这个片名以为是像《铁线虫入侵》那种灾难片,后来看到男主,宋康昊,也是老面孔了,从高中时候在学校操场组织看的《汉江怪物》,有点二的感觉,后来在大学寝室电脑上重新看的时候,室友跟我说是韩国国宝级演员,真人不可貌相,感觉是个呆子的形象。

但是你说这不是个灾难片,而是个反映社会问题的,就业比较容易往这个方向猜,只是剧情会是怎么样的,一时也没啥头绪,后来不知道哪里看了下一个剧情透露,是一个穷人给富人做家教,然后把自己一家都带进富人家,如果是这样的话可能会把这个怎么带进去作为一个主线,不过事实告诉我,这没那么重要,从第一步朋友的介绍,就显得无比顺利,要去当家教了,作为一个穷成这样的人,瞬间转变成一个衣着得体,言行举止都没让富人家看出破绽的延世大学学生,这真的挺难让人理解,所谓江山易改,本性难移,还有就是这人也正好有那么好能力去辅导,并且诡异的是,多惠也是瞬间就喜欢上了男主,多惠跟将男主介绍给她做家教,也就是多惠原来的家教敏赫,应该也有不少的相处时间,这变了有点大了吧,当然这里也可能因为时长需要,如果说这一点是因为时长,那可能我所有的槽点都是因为这个吧,因为我理解的应该是把家里的人如何一步步地带进富人家,这应该是整个剧情的一个需要更多铺垫去克服这个矛盾点,有时候也想过如果我去当导演,是能拍出个啥,没这个机会,可能有也会是很扯淡的,当然这也不能阻拦我谈谈对这个点的一些看法,毕竟评价一台电冰箱不是说我必须得自己会制冷对吧,这大概是我觉得这个电影的第一个槽点,接下去接二连三的,就是我说的这个最核心的矛盾点,不知道谁说过,这种影视剧应该是源自于生活又高于生活,越是好的作品,越要接近生活,这样子才更能有感同身受。

接下去的点是金基宇介绍金基婷去给多颂当美术家教,这一步又是我理解的败笔吧,就怎么说呢,没什么铺垫,突然从一个社会底层的穷姑娘,转变成一个气场爆表,把富人家太太唬得一愣一愣的,如果说富太太是比较简单无脑的,那富人自己应该是比较有见识而且是做 IT 的,给自己儿子女儿做家教的,查查底细也很正常吧,但是啥都没有,然后呢,她又开始耍司机的心机了,真的是莫名其妙了,司机真的很惨,窈窕淑女君子好逑,而且这个操作也让我摸不着头脑,这是多腹黑并且有经验才会这么操作,脱内裤真的是让我看得一愣愣的,更看得我一愣一愣的,富人竟然也完全按着这个思路去想了,完全没有别的可能呢,甚至可以去查下行车记录仪或者怎样的,或者有没有毛发体液啥的去检验下,毕竟金基婷也乘坐过这辆车,但是最最让我不懂的还是脱内裤这个操作,究竟是什么样的人才会的呢,值得思考。

金基泽和忠淑的点也是比较奇怪,首先是金基泽,引起最后那个杀人事件的一个由头,大部分观点都是人为朴社长在之前跟老婆啪啪啪的时候说金基泽的身上有股乘地铁的人的味道,简而言之就是穷人的味道,还有去雯光丈夫身下拿钥匙是对金基泽和雯光丈夫身上的味道的鄙夷,可是这个原因真的站不住脚,即使是同样经济水平,如果身上有比较重的异味,背后讨论下,或者闻到了比较重的味道,有不适的表情和动作很正常吧,像雯光丈夫,在地下室里呆了那么久,身上有异味并且比较重太正常了,就跟在厕所呆久了不会觉得味道大,但是从没味道的地方一进有点味道的厕所就会觉得异样,略尴尬的理由;再说忠淑呢,感觉是太厉害了,能胜任这么一家有钱人的各种挑剔的饮食口味要求的保姆职位,也是让人看懵逼了,看到了不禁想到一个问题,这家人开头是那么地穷,不堪,突然转变成这么地像骗子家族,如果有这么好的骗人能力,应该不会到这种地步吧,如果真的是那么穷,没能力,没志气,又怎么会突然变成这么厉害呢,一家人各司其职,把富人家唬得团团转,而这个前提是,这些人的确能胜任这四个位置,这就是我非常不能理解的点。

然后说回这个标题,寄生虫,不知道是不是翻译过来不准确,如果真的是叫寄生虫的话,这个寄生虫智商未免也太低了,没有像新冠那样机制,致死率低一点,传染能力强一点,潜伏期也能传染,这个寄生虫第一次受到免疫系统的攻击就自爆了;还有呢,作为一个社会比较低层的打工者,乡下人,对这个审题也是不太审的清,是指这一家人是社会的寄生虫,不思进取,并且死的应该,富人是傻白甜,又有钱又善良,这是给有钱人洗地了还是啥,这个奥斯卡真不知道是怎么得的,总觉得奥斯卡,甚至低一点,豆瓣,得奖的,评分高的都是被一群“精英党”把持的,有黑人主角的,得分高;有同性恋的,得分高;结局惨的,得分高;看不懂的,得分高;就像肖申克的救赎,真不知道是哪里好了,最近看了关于明朝那些事的三杨,杨溥的经历应该比这个厉害吧,可是外国人看不懂,就像外国人不懂中国为什么有反分裂国家法,经历了鸦片战争,八国联军,抗日战争等等,其实跟外国对于黑人的权益的问题,因为有南北战争,所以极度重视这个问题,相应的中国也有自己的历史,请理解。

简而言之我对寄生虫的评分大概 5~6 分吧。