
最近在進(jìn)行服務(wù)器內(nèi)存優(yōu)化的時(shí)候,發(fā)現(xiàn)一個(gè)非常奇妙的問題,我們的認(rèn)證服務(wù)器(AuthServer)負(fù)責(zé)跟第三方渠道SDK打交道,由于采用了curl阻塞的方式,所以這里開了128個(gè)線程,奇怪的是每次剛啟動(dòng)的時(shí)候占用的虛擬內(nèi)存在2.3G,然后每次處理消息就增加64M,增加到4.4G就不再增加了,由于我們采用預(yù)分配的方式,在線程內(nèi)部根本沒有大塊分內(nèi)存,那么這些內(nèi)存到底是從哪來的呢?讓人百思不得其解。

為什么linux下多線程程序如此消耗虛擬內(nèi)存

 

1. Investigation

First I ruled out a memory leak: it would be too much of a coincidence to leak exactly 64 MB every time. To back up my hunch, I ran the server under valgrind:

valgrind --leak-check=full --track-fds=yes --log-file=./AuthServer.vlog ./AuthServer &

然后啟動(dòng)測(cè)試,跑至內(nèi)存不再增加,果然valgrind顯示沒有任何內(nèi)存泄露。反復(fù)試驗(yàn)了很多次,結(jié)果都是這樣。

After repeated valgrind runs turned up nothing, I began to suspect the program was making mmap-style calls internally, so I used strace to watch mmap, brk, and related syscalls:

strace -f -e "brk,mmap,munmap" -p $(pidof AuthServer)

其結(jié)果如下:

[pid 19343] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f53c8ca9000
[pid 19343] munmap(0x7f53c8ca9000, 53833728) = 0
[pid 19343] munmap(0x7f53d0000000, 13275136) = 0
[pid 19343] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f53d04a8000
Process 19495 attached

Checking the trace file, I found no flood of large mmap activity either, and the growth attributable to brk was small. (In hindsight, the very first trace line, a 128 MB PROT_NONE mapping immediately trimmed by two munmaps totalling exactly 64 MB, was an arena being created; it just didn't register at the time.) Feeling quite lost at this point, I next wondered whether file caching was somehow eating the virtual memory, so I commented out all the log reading and writing code. Virtual memory kept growing, which ruled that out as well.

2. A flash of insight

后來,我開始減少thread的數(shù)量開始測(cè)試,在測(cè)試的時(shí)候偶然發(fā)現(xiàn)一個(gè)很奇怪的現(xiàn)象。那就是如果進(jìn)程創(chuàng)建了一個(gè)線程并且在該線程內(nèi)分配一個(gè)很小的內(nèi)存1k,整個(gè)進(jìn)程虛擬內(nèi)存立馬增加64M,然后再分配,內(nèi)存就不增加了。測(cè)試代碼如下:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

using namespace std;

volatile bool start = false;

void* thread_run(void*)
{
    while (1) {
        if (start) {
            cout << "Thread malloc" << endl;
            char* buf = new char[1024];   // a single tiny allocation is enough
            (void)buf;                    // intentionally leaked for this test
            start = false;
        }
        sleep(1);
    }
    return 0;
}

int main()
{
    pthread_t th;

    getchar();
    getchar();
    pthread_create(&th, 0, thread_run, 0);

    while (getchar() != EOF) {
        start = true;
    }

    return 0;
}

其運(yùn)行結(jié)果如下圖,剛開始時(shí),進(jìn)程占用虛擬內(nèi)存14M,輸入0,創(chuàng)建子線程,進(jìn)程內(nèi)存達(dá)到23M,這增加的10M是線程堆棧的大小(查看和設(shè)置線程堆棧大小可用ulimit –s),第一次輸入1,程序分配1k內(nèi)存,整個(gè)進(jìn)程增加64M虛擬內(nèi)存,之后再輸入2,3,各再次分配1k,內(nèi)存均不再變化。

為什么linux下多線程程序如此消耗虛擬內(nèi)存

 

這個(gè)結(jié)果讓我欣喜若狂,由于以前學(xué)習(xí)過谷歌的Tcmalloc,其中每個(gè)線程都有自己的緩沖區(qū)來解決多線程內(nèi)存分配的競(jìng)爭(zhēng),估計(jì)新版的glibc同樣學(xué)習(xí)了這個(gè)技巧,于是查看pmap $(pidof main) 查看內(nèi)存情況,如下:

為什么linux下多線程程序如此消耗虛擬內(nèi)存

 

請(qǐng)注意65404這一行,種種跡象表明,這個(gè)再加上它上面那一行(在這里是132)就是增加的那個(gè)64M)。后來增加thread的數(shù)量,就會(huì)有新增thread數(shù)量相應(yīng)的65404的內(nèi)存塊。

3. Getting to the bottom of it

經(jīng)過一番搜索和代碼查看。終于知道了原來是glibc的malloc在這里搗鬼。glibc 版本大于2.11的都會(huì)有這個(gè)問題:在redhat 的官方文檔上:

Red Hat Enterprise Linux 6 features version 2.11 of glibc, providing many features and enhancements, including... An enhanced dynamic memory allocation (malloc) behaviour enabling higher scalability across many sockets and cores. This is achieved by assigning threads their own memory pools and by avoiding locking in some situations. The amount of additional memory used for the memory pools (if any) can be controlled using the environment variables MALLOC_ARENA_TEST and MALLOC_ARENA_MAX. MALLOC_ARENA_TEST specifies that a test for the number of cores is performed once the number of memory pools reaches this value. MALLOC_ARENA_MAX sets the maximum number of memory pools used, regardless of the number of cores.

The developer, Ulrich Drepper, has a much deeper explanation on his blog:

Before, malloc tried to emulate a per-core memory pool. Every time when contention for all existing memory pools was detected a new pool is created. Threads stay with the last used pool if possible... This never worked 100% because a thread can be descheduled while executing a malloc call. When some other thread tries to use the memory pool used in the call it would detect contention. A second problem is that if multiple threads on multiple core/sockets happily use malloc without contention memory from the same pool is used by different cores/on different sockets. This can lead to false sharing and definitely additional cross traffic because of the meta information updates. There are more potential problems not worth going into here in detail.

The changes which are in glibc now create per-thread memory pools. This can eliminate false sharing in most cases. The meta data is usually accessed only in one thread (which hopefully doesn’t get migrated off its assigned core). To prevent the memory handling from blowing up the address space use too much the number of memory pools is capped. By default we create up to two memory pools per core on 32-bit machines and up to eight memory pools per core on 64-bit machines. The code delays testing for the number of cores (which is not cheap, we have to read /proc/stat) until there are already two or eight memory pools allocated, respectively.

While these changes might increase the number of memory pools which are created (and thus increase the address space they use) the number can be controlled. Because using the old mechanism there could be a new pool being created whenever there are collisions the total number could in theory be higher. Unlikely but true, so the new mechanism is more predictable.

... Memory use is not that much of a premium anymore and most of the memory pool doesn’t actually require memory until it is used, only address space... We have done internally some measurements of the effects of the new implementation and they can be quite dramatic.

New versions of glibc present in RHEL6 include a new arena allocator design. In several clusters we've seen this new allocator cause huge amounts of virtual memory to be used, since when multiple threads perform allocations, they each get their own memory arena. On a 64-bit system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number of cores. We've observed a DN process using 14GB of vmem for only 300M of resident set. This causes all kinds of nasty issues for obvious reasons.

Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory arenas and bound the virtual memory, with no noticeable downside in performance - we've been recommending MALLOC_ARENA_MAX=4. We should set this in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common.

總結(jié)一下,glibc為了分配內(nèi)存的性能的問題,使用了很多叫做arena的memory pool,缺省配置在64bit下面是每一個(gè)arena為64M,一個(gè)進(jìn)程可以最多有 cores * 8個(gè)arena。假設(shè)你的機(jī)器是4核的,那么最多可以有4 * 8 = 32個(gè)arena,也就是使用32 * 64 = 2048M內(nèi)存。 當(dāng)然你也可以通過設(shè)置環(huán)境變量來改變arena的數(shù)量.例如export MALLOC_ARENA_MAX=1

Hadoop recommends setting this value to 4. That said, since arenas were introduced to reduce allocation contention on multi-core machines, setting it to the number of CPU cores is probably also a reasonable choice. Whatever value you pick, run a load test afterwards to check whether changing the arena count affects your program's performance.

If you want to set this from code instead, call mallopt(M_ARENA_MAX, xxx). Because our AuthServer preallocates its memory and never allocates inside the worker threads, it doesn't need this optimization, so we call mallopt(M_ARENA_MAX, 1) at initialization to effectively switch arenas off. (A value of 0 means glibc sizes the arena count automatically based on the CPU.)

4. An unexpected discovery

Remembering that tcmalloc serves only small objects from per-thread caches, with large allocations still coming from the central heap, I wondered how glibc was designed. I changed the thread's allocation size in the test program from 1 KB to 1 MB, and sure enough: after the initial 64 MB appeared, virtual memory still grew by 1 MB on every further allocation. Evidently the new glibc has borrowed heavily from tcmalloc's ideas.

為什么linux下多線程程序如此消耗虛擬內(nèi)存

 

A problem that had kept me busy for days was finally solved, and my mood improved considerably. Today's issue taught me that a server programmer who does not understand the compiler and the operating system kernel is simply not qualified; I need to strengthen my study in these areas.

Tags: multithreading, Linux