Analyzing Map Traversal Performance in Java

Posted on 2021-06-26 by WalkonNet

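1. Introduction

We know that resizing a Java HashMap has a cost. To reduce the number of resizes and their cost, you can give the HashMap an initial capacity, as shown below:

HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);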

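In practice, however, performance did not improve; it dropped noticeably. The only operation the code performed on the HashMap was traversal, so traversal looked like the culprit. After some testing, I arrived at the following result:

The cost of iterating a HashMap depends on its initial capacity (the length of the internal table), not only on its size.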

2. Iterator Test

Here is the test code:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class MapForEachTest {

    public static void main(String[] args) {
        // Same six entries in both maps; only the requested capacity differs.
        HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);
        initDataAndPrint(map0);

        HashMap<String, Integer> map1 = new HashMap<String, Integer>();
        initDataAndPrint(map1);
    }

    private static void initDataAndPrint(HashMap<String, Integer> map) {
        initData(map);

        // Time 100 full passes over the map.
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            forEach(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() + " elapsed: " + (end - start) + " ms");
    }

    private static void forEach(HashMap<String, Integer> map) {
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
            Map.Entry<String, Integer> item = it.next();
            System.out.print(item.getKey());
            // do something
        }
    }

    private static void initData(HashMap<String, Integer> map) {
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        map.put("d", 3);
        map.put("e", 4);
        map.put("f", 5);
    }

}

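Here are the results.

The first map is initialized with a capacity of 100,000, and the second has no explicit capacity (16 by default). Both hold the same data, yet iterating each of them 100 times gives very different timings: 36 ms versus 4 ms. The real gap is even larger, because the 4 ms is mostly the cost of 600 System.out.print calls. Let's comment out the print and try again: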

for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
    Map.Entry<String, Integer> item = it.next();
    // System.out.print(item.getKey());
    // do something
}

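The output is as follows.

The second map now takes almost 0 ms, while the first still takes 28 ms, even though nothing is done during traversal. With the link to initial capacity confirmed, the next step is to find out why, by reading the source of the Map iterator.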
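
To repeat the measurement without console output getting in the way, a minimal sketch along the following lines may help. It is not the author's test class: the class name CapacityTimingSketch, its helper methods, and the use of System.nanoTime with a running sum (so that the loop produces a result that is actually used) are assumptions made for illustration. Absolute numbers will differ between machines and JVMs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not from the article.
class CapacityTimingSketch {

    // Same six entries the article's initData method inserts.
    static void fill(Map<String, Integer> map) {
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
    }

    // Iterates the map `rounds` times and returns the sum of all values seen,
    // so the traversal has an observable result.
    static long iterate(Map<String, Integer> map, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    // Times 100 passes and prints the elapsed time in microseconds.
    static void report(String label, Map<String, Integer> map) {
        long start = System.nanoTime();
        long sum = iterate(map, 100);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": size=" + map.size()
                + " sum=" + sum + " elapsed=" + micros + " us");
    }

    public static void main(String[] args) {
        Map<String, Integer> big = new HashMap<>(100000);
        Map<String, Integer> small = new HashMap<>();
        fill(big);
        fill(small);
        report("capacity 100000", big);
        report("default capacity", small);
    }
}
```

Both maps return the same sum, which confirms they hold the same data; only the elapsed time should differ. For more rigorous numbers, a benchmark harness such as JMH would be the usual choice.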

3. Exploring the Iterator Source

Let's look at the source of Map.entrySet().iterator():

public final Iterator<Map.Entry<K,V>> iterator() {
    return new EntryIterator();
}

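EntryIterator is an inner class of HashMap that extends the abstract class HashIterator. There isn't much source, so here is all of HashIterator with comments added: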

abstract class HashIterator {
    // The next node
    Node<K,V> next;        // next entry to return
    // The current node
    Node<K,V> current;     // current entry
    // The modification count this iterator expects. A HashMap can have many iterators (each call to iterator() creates a new one), but only one of them may remove through it; otherwise the others fail immediately (fail-fast)
    int expectedModCount;  // for fast-fail

    // Array index of the current slot. HashMap stores its data in an array; if that is unfamiliar, read the HashMap source first
    int index;             // current slot

    HashIterator() {
        // Initialize expectedModCount
        expectedModCount = modCount;
        // Local reference to the map's table (a reference, not a copy of the data)
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        // If the map is not empty, scan the array for the first occupied slot and assign it to next
        if (t != null && size > 0) { // advance to first entry
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        // Local alias for table
        Node<K,V>[] t;
        // e holds the node to return; it is returned once the node after it has been located
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();

        // Point current at e (the node being returned) and set next = current.next, which is usually null;
        // when it is null, scan forward through the array for the next occupied slot
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    /**
     * Removes an element. Not covered here: it calls HashMap's removeNode, nothing special
     **/
    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);
        // Keeps the fail-fast check consistent after this iterator's own removal
        expectedModCount = modCount;
    }
}

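The code above makes the cause clear. Whenever the iterator reaches the end of a bucket's chain, it scans forward through the array for the next occupied slot, so a full pass touches every slot of table. A very large initial capacity means a large threshold and therefore a large table.length, so traversal is expensive even when the map holds only a few entries.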

The size of the table array is set in the resize() method:

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
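
The arithmetic behind table.length can be sketched as follows. HashMap rounds a requested capacity up to a power of two; the helper roundUpToPowerOfTwo below is a hypothetical stand-in for that rounding (it is not the JDK's private method and it ignores the maximum-capacity clamp). It is used only to estimate how many slots one full pass has to examine for the two maps in the test.

```java
// Illustrative sketch only; names are hypothetical and not from the JDK.
class TableSizeSketch {

    // Smallest power of two that is >= cap, valid for 1 <= cap <= 2^30.
    static int roundUpToPowerOfTwo(int cap) {
        int highest = Integer.highestOneBit(cap);
        return (highest == cap) ? highest : highest << 1;
    }

    public static void main(String[] args) {
        int entries = 6; // the test stores six key/value pairs
        for (int requested : new int[] {16, 100000}) {
            int slots = roundUpToPowerOfTwo(requested);
            System.out.println("requested=" + requested
                    + " table.length=" + slots
                    + " slots examined per stored entry=" + (slots / entries));
        }
    }
}
```

Under this assumption a request for 100,000 yields a table of 131,072 slots, so each pass over six entries examines tens of thousands of empty slots, while the default table has only 16.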

4. Other Traversal Methods

Note that the code uses Map.entrySet().iterator(). In practice keySet().iterator() and values().iterator() behave the same way; the source is as follows:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

The other two need no separate analysis; their performance is the same.
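
Because all three iterators delegate to nextNode(), the three views should also visit entries in the same order. A small check of that, using only the public Map API, might look like this (the class and method names are illustrative, not from the article):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical.
class ViewOrderSketch {

    static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return map;
    }

    // Keys in the order entrySet() yields them.
    static List<String> keysViaEntrySet(Map<String, Integer> map) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            keys.add(entry.getKey());
        }
        return keys;
    }

    // Values in the order entrySet() yields them.
    static List<Integer> valuesViaEntrySet(Map<String, Integer> map) {
        List<Integer> values = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            values.add(entry.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sampleMap();
        // Copying a view into a list preserves that view's iteration order.
        boolean sameKeys = new ArrayList<>(map.keySet()).equals(keysViaEntrySet(map));
        boolean sameValues = new ArrayList<>(map.values()).equals(valuesViaEntrySet(map));
        System.out.println("keySet matches entrySet order: " + sameKeys);
        System.out.println("values matches entrySet order: " + sameValues);
    }
}
```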

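In practice there are several other ways to traverse a collection:

Plain for loop with an index
Enhanced for loop
Map.forEach
Stream.forEach

A plain for loop with an index does not apply to Map, so it is not discussed here.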

4.1 Enhanced for loop

The enhanced for loop is actually implemented with an iterator. Let's compare the two.

Source:

private static void forEach(HashMap map) {
    for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
        Map.Entry<String, Integer> item = it.next();
        System.out.print(item.getKey());
        // do something
    }
}


private static void forEach0(HashMap<String, Integer> map) {
    for (Map.Entry entry : map.entrySet()) {
        System.out.print(entry.getKey());
    }
}

The compiled bytecode:

// access flags 0xA
  private static forEach(Ljava/util/HashMap;)V
   L0
    LINENUMBER 41 L0
    ALOAD 0
    INVOKEVIRTUAL java/util/HashMap.entrySet ()Ljava/util/Set;
    INVOKEINTERFACE java/util/Set.iterator ()Ljava/util/Iterator; (itf)
    ASTORE 1
   L1
   FRAME APPEND [java/util/Iterator]
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z (itf)
    IFEQ L2
   L3
    LINENUMBER 42 L3
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object; (itf)
    CHECKCAST java/util/Map$Entry
    ASTORE 2
   L4
    LINENUMBER 43 L4
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 2
    INVOKEINTERFACE java/util/Map$Entry.getKey ()Ljava/lang/Object; (itf)
    CHECKCAST java/lang/String
    INVOKEVIRTUAL java/io/PrintStream.print (Ljava/lang/String;)V
   L5
    LINENUMBER 45 L5
    GOTO L1
   L2
    LINENUMBER 46 L2
   FRAME CHOP 1
    RETURN
   L6
    LOCALVARIABLE item Ljava/util/Map$Entry; L4 L5 2
    // signature Ljava/util/Map$Entry<Ljava/lang/String;Ljava/lang/Integer;>;
    // declaration: item extends java.util.Map$Entry<java.lang.String, java.lang.Integer>
    LOCALVARIABLE it Ljava/util/Iterator; L1 L2 1
    // signature Ljava/util/Iterator<Ljava/util/Map$Entry<Ljava/lang/String;Ljava/lang/Integer;>;>;
    // declaration: it extends java.util.Iterator<java.util.Map$Entry<java.lang.String, java.lang.Integer>>
    LOCALVARIABLE map Ljava/util/HashMap; L0 L6 0
    MAXSTACK = 2
    MAXLOCALS = 3

  // access flags 0xA
  // signature (Ljava/util/HashMap<Ljava/lang/String;Ljava/lang/Integer;>;)V
  // declaration: void forEach0(java.util.HashMap<java.lang.String, java.lang.Integer>)
  private static forEach0(Ljava/util/HashMap;)V
   L0
    LINENUMBER 50 L0
    ALOAD 0
    INVOKEVIRTUAL java/util/HashMap.entrySet ()Ljava/util/Set;
    INVOKEINTERFACE java/util/Set.iterator ()Ljava/util/Iterator; (itf)
    ASTORE 1
   L1
   FRAME APPEND [java/util/Iterator]
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z (itf)
    IFEQ L2
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object; (itf)
    CHECKCAST java/util/Map$Entry
    ASTORE 2
   L3
    LINENUMBER 51 L3
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 2
    INVOKEINTERFACE java/util/Map$Entry.getKey ()Ljava/lang/Object; (itf)
    INVOKEVIRTUAL java/io/PrintStream.print (Ljava/lang/Object;)V
   L4
    LINENUMBER 52 L4
    GOTO L1
   L2
    LINENUMBER 53 L2
   FRAME CHOP 1
    RETURN
   L5
    LOCALVARIABLE entry Ljava/util/Map$Entry; L3 L4 2
    LOCALVARIABLE map Ljava/util/HashMap; L0 L5 0
    // signature Ljava/util/HashMap<Ljava/lang/String;Ljava/lang/Integer;>;
    // declaration: map extends java.util.HashMap<java.lang.String, java.lang.Integer>
    MAXSTACK = 2
    MAXLOCALS = 3

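Even without a careful look, the bytecode of the two methods is nearly identical apart from the local variables. The enhanced for loop therefore performs the same as the iterator, and the actual run results agree. I won't show them here; if you're interested, copy the code from the beginning and end of this article and try it yourself.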

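4.2 Map.forEach

First, why not run all the methods together and print their timings side by side? Because CPU caching and some JVM optimizations interfere with the measurements; the full test results in the appendix illustrate this.

Let's go straight to the source: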

@Override
public void forEach(BiConsumer<? super K, ? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.key, e.value);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

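The source is short, so I won't annotate it. It tells us the following:

The method is also fail-fast; entries must not be removed during traversal
It has to walk the entire array
BiConsumer is annotated with @FunctionalInterface, so a lambda can be used

The third point has nothing to do with performance; it is only mentioned in passing.

From this we can tell that performance again depends on the size of the table array.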
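
The fail-fast behaviour noted in the list can be observed with a short sketch. It assumes the HashMap.forEach implementation quoted above, in which modCount is compared once the loop has finished; the class name ForEachFailFastSketch and its helper are made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical.
class ForEachFailFastSketch {

    // Returns true if removing an entry inside Map.forEach raised a
    // ConcurrentModificationException.
    static boolean removeInsideForEachThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        try {
            // The first callback removes "a" and bumps modCount;
            // later callbacks find nothing left to remove.
            map.forEach((key, value) -> map.remove("a"));
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("forEach detected the modification: "
                + removeInsideForEachThrows());
    }
}
```

The HashMap documentation describes fail-fast behaviour as best-effort, so code should not rely on this exception for correctness.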

In actual testing, however, it turned out to be noticeably slower than the iterator.

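4.3 Stream.forEach

What Stream and Map.forEach have in common is that both take lambda expressions. Their source code, however, shares nothing.

In case you're getting tired, here are the test results first.

It takes slightly longer than Map.forEach.

Next comes the source behind Stream.forEach on a sequential stream. It isn't complicated, but if you're tired, skip ahead to the summary.

Stream.forEach is driven by a spliterator. HashMap's spliterator lives in the HashMap class itself as a static nested class named EntrySpliterator.

Here is the method that a sequential stream executes: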

public void forEachRemaining(Consumer<? super Map.Entry<K,V>> action) {
    int i, hi, mc;
    if (action == null)
        throw new NullPointerException();
    HashMap<K,V> m = map;
    Node<K,V>[] tab = m.table;
    if ((hi = fence) < 0) {
        mc = expectedModCount = m.modCount;
        hi = fence = (tab == null) ? 0 : tab.length;
    }
    else
        mc = expectedModCount;
    if (tab != null && tab.length >= hi &&
        (i = index) >= 0 && (i < (index = hi) || current != null)) {
        Node<K,V> p = current;
        current = null;
        do {
            if (p == null)
                p = tab[i++];
            else {
                action.accept(p);
                p = p.next;
            }
        } while (p != null || i < hi);
        if (m.modCount != mc)
            throw new ConcurrentModificationException();
    }
}

From this source it is equally easy to see that traversal scans every slot of the array in order.
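
To exercise the spliterator path without the rest of the Stream machinery, one can call forEachRemaining on the entry set's spliterator directly. The sketch below uses only public API (Set.spliterator and Spliterator.forEachRemaining); the class and method names are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; names are hypothetical.
class SpliteratorSketch {

    static Map<String, Integer> sampleMap(int initialCapacity) {
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return map;
    }

    // Visits every entry through the spliterator and counts the callbacks.
    static int countViaSpliterator(Map<String, Integer> map) {
        Spliterator<Map.Entry<String, Integer>> spliterator = map.entrySet().spliterator();
        AtomicInteger visited = new AtomicInteger();
        spliterator.forEachRemaining(entry -> visited.incrementAndGet());
        return visited.get();
    }

    // The same traversal expressed as a sequential stream.
    static int countViaStream(Map<String, Integer> map) {
        AtomicInteger visited = new AtomicInteger();
        map.entrySet().stream().forEach(entry -> visited.incrementAndGet());
        return visited.get();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sampleMap(100000);
        System.out.println("spliterator visited " + countViaSpliterator(map) + " entries");
        System.out.println("stream visited " + countViaStream(map) + " entries");
    }
}
```

Both counts equal the map's size; the table length affects only how long the pass takes, not how many entries are delivered.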

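5. Summary

With that, all four ways of traversing a Map have been tested, and we can draw two simple conclusions:

Map traversal performance depends on the size of the internal table array, and therefore on the commonly used initial capacity parameter. This holds for every traversal method.
Performance (fastest to slowest): iterator == enhanced for loop > Map.forEach > Stream.forEach

I won't quote how many times faster one is than another; such figures are meaningless without the size of the data set. When no initial capacity is specified, all four methods take about 3 ms, and that 3 ms is the cost of the output stream; the traversal itself takes effectively 0 ms, and the total stays under 1 ms once the printing is removed. So with a small data set it doesn't matter which one you use. Very often the real cost lies in the business logic executed during traversal. This article is not meant to make you rewrite forEach calls as iterators: in most scenarios the cost of iteration itself is not worth worrying about, and the readability that Stream and lambdas bring matters more.

So treat this article as background knowledge. Besides the traversal performance issue above, you should also take away the following:

HashMap stores its data in the table array
The table array is initialized by the resize method; new HashMap() does not allocate it
Traversal walks the table array from index 0 upward, so the order follows bucket position rather than insertion order
keySet().iterator(), values().iterator() and entrySet().iterator() do not differ in essence; they all share the same underlying iterator
Of all the traversal methods, only the iterator can remove; the enhanced for loop also uses an iterator underneath, but the syntactic sugar hides the remove method
Every call to iterator() creates a new iterator, but only one of them may modify the map
Map.forEach and Stream.forEach look alike, but their implementations differ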
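
The point about remove can be illustrated with a short sketch. It contrasts removal through an explicit iterator with removal from the map inside an enhanced for loop; the class name IteratorRemoveSketch and its helpers are hypothetical and exist only for this example.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch only; names are hypothetical.
class IteratorRemoveSketch {

    static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        map.put("d", 3);
        return map;
    }

    // Removes entries with odd values through the iterator, which is allowed.
    static Map<String, Integer> removeOddViaIterator() {
        Map<String, Integer> map = sampleMap();
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();) {
            if (it.next().getValue() % 2 != 0) {
                it.remove(); // keeps expectedModCount in sync with modCount
            }
        }
        return map;
    }

    // Removes from the map itself inside an enhanced for loop; the hidden
    // iterator notices the change on its next step.
    static boolean removeInsideEnhancedForThrows() {
        Map<String, Integer> map = sampleMap();
        try {
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                map.remove("a"); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("after iterator removal: " + removeOddViaIterator());
        System.out.println("enhanced for threw: " + removeInsideEnhancedForThrows());
    }
}
```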

Appendix: source of the four traversal methods

private static void forEach(HashMap<String, Integer> map) {
    for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
        Map.Entry<String, Integer> item = it.next();
        // System.out.print(item.getKey());
        // do something
    }
}

private static void forEach0(HashMap<String, Integer> map) {
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        System.out.print(entry.getKey());
    }
}

private static void forEach1(HashMap<String, Integer> map) {
    map.forEach((key, value) -> {
        System.out.print(key);
    });
}

private static void forEach2(HashMap<String, Integer> map) {
    map.entrySet().stream().forEach(e -> {
        System.out.print(e.getKey());
    });
}

Appendix: complete test class, test results, and an odd problem

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class MapForEachTest {

    public static void main(String[] args) {
        HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);
        HashMap<String, Integer> map1 = new HashMap<String, Integer>();
        initData(map0);
        initData(map1);

        testIterator(map0);
        testIterator(map1);
        testFor(map0);
        testFor(map1);
        testMapForeach(map0);
        testMapForeach(map1);
        testMapStreamForeach(map0);
        testMapStreamForeach(map1);
    }

    private static void testIterator(HashMap<String, Integer> map) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            forEach(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() + " Iterator elapsed: " + (end - start) + " ms");
    }

    private static void testFor(HashMap<String, Integer> map) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            forEach0(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() + " Enhanced for elapsed: " + (end - start) + " ms");
    }

    private static void testMapForeach(HashMap<String, Integer> map) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            forEach1(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() + " MapForeach elapsed: " + (end - start) + " ms");
    }

    private static void testMapStreamForeach(HashMap<String, Integer> map) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            forEach2(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() + " MapStreamForeach elapsed: " + (end - start) + " ms");
    }

    // Explicit iterator
    private static void forEach(HashMap<String, Integer> map) {
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
            Map.Entry<String, Integer> item = it.next();
            System.out.print(item.getKey());
            // do something
        }
    }

    // Enhanced for loop
    private static void forEach0(HashMap<String, Integer> map) {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.print(entry.getKey());
        }
    }

    // Map.forEach
    private static void forEach1(HashMap<String, Integer> map) {
        map.forEach((key, value) -> {
            System.out.print(key);
        });
    }

    // Stream.forEach
    private static void forEach2(HashMap<String, Integer> map) {
        map.entrySet().stream().forEach(e -> {
            System.out.print(e.getKey());
        });
    }

    private static void initData(HashMap<String, Integer> map) {
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        map.put("d", 3);
        map.put("e", 4);
        map.put("f", 5);
    }

}

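Test results:

If you read the article carefully, you'll notice something off in these results:

MapStreamForeach seems to take less time

I can tell you it is not caused by the data. Judging from my tests, the direct cause is that Map.forEach ran first. If you swap the execution order of MapForeach and MapStreamForeach, you'll find that whichever runs second takes less time.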
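
One common way to reduce this order effect is to run each method for a number of untimed warm-up rounds before measuring. The sketch below is an illustration built on assumptions, not the author's test: the class name WarmUpSketch, the round counts, and the helper methods are hypothetical, and warm-up does not remove JIT or cache effects entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Illustrative sketch only; names and round counts are hypothetical.
class WarmUpSketch {

    static long viaIterator(Map<String, Integer> map) {
        long sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    static long viaMapForEach(Map<String, Integer> map) {
        long[] sum = {0};
        map.forEach((key, value) -> sum[0] += value);
        return sum[0];
    }

    static long viaStream(Map<String, Integer> map) {
        long[] sum = {0};
        map.entrySet().stream().forEach(entry -> sum[0] += entry.getValue());
        return sum[0];
    }

    // Runs untimed warm-up rounds first, then returns the nanoseconds
    // spent on the timed rounds only.
    static long timeNanos(ToLongFunction<Map<String, Integer>> strategy,
                          Map<String, Integer> map, int warmUpRounds, int timedRounds) {
        long sink = 0;
        for (int i = 0; i < warmUpRounds; i++) {
            sink += strategy.applyAsLong(map);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRounds; i++) {
            sink += strategy.applyAsLong(map);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // never true here; keeps the results in use
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>(100000);
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        System.out.println("iterator:       " + timeNanos(WarmUpSketch::viaIterator, map, 1000, 100) + " ns");
        System.out.println("Map.forEach:    " + timeNanos(WarmUpSketch::viaMapForEach, map, 1000, 100) + " ns");
        System.out.println("Stream.forEach: " + timeNanos(WarmUpSketch::viaStream, map, 1000, 100) + " ns");
    }
}
```

Swapping the order of the three calls in main is a quick way to check whether the ranking still depends on which method runs first.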

