Analysing an Interesting Java's Problem About foreach loop
21 Mar 2017
背景
前几天,和朋友讨论起下面这个奇怪的Java问题,当时我还没想过来,甚是觉得奇怪,然后也写起代码来测试一下,果然真是!
List < String > datas = new ArrayList <>();
datas . add ( "1" );
datas . add ( "2" );
String target = "1" ;
for ( String temp : datas ) {
if ( temp . equals ( target )) {
datas . remove ( temp );
}
}
问这代码会不会报错?如果把target换成“2”又怎么样?
当时脑袋没怎么想,就说,target是“1”会报错,换成“2”就不会报错。结果当然是反过来的,我们甚是觉得惊讶。自己去写代码的时候才发现,把target换成“2”以后报这个错:
Exception in thread "main" java . util . ConcurrentModificationException
at java . util . ArrayList $Itr . checkForComodification ( ArrayList . java : 901 )
at java . util . ArrayList $Itr . next ( ArrayList . java : 851 )
at org . zhangge . testes . TestIterator . test ( TestIterator . java : 24 )
at org . zhangge . testes . TestIterator . main ( TestIterator . java : 15 )
at sun . reflect . NativeMethodAccessorImpl . invoke0 ( Native Method )
at sun . reflect . NativeMethodAccessorImpl . invoke ( NativeMethodAccessorImpl . java : 62 )
at sun . reflect . DelegatingMethodAccessorImpl . invoke ( DelegatingMethodAccessorImpl . java : 43 )
at java . lang . reflect . Method . invoke ( Method . java : 498 )
at com . intellij . rt . execution . application . AppMain . main ( AppMain . java : 140 )
脑袋发热原因
当时没认真想明白foreach循环的本质是什么,我只记得,大学学习计算机编译原理的时候,在c里面,用foreach的话会经过编译器的优化,然后当然效率就高一些咯,但是具体汇编是什么样的,还真没去看过。
又让我想起当年面试的时候被问到Java相关的问题:
当遍历一个列表,然后满足条件需要修改列表长度的时候怎么做?
菜鸟的我回答说,用两个列表来实现呗,满足条件就复制过去就好了。再被问还有别的方法不,再答是倒序遍历。再要别的方法的时候,想不出来了。面试官给了一个方向,往Iterator方面去想。当时一脸懵逼,噢,原来Iterator是可以实现删除的?难道是复制了一份么?
测试代码如下:
List < String > datas = new ArrayList <>();
datas . add ( "1" );
datas . add ( "2" );
datas . add ( "3" );
String target = "2" ;
Iterator < String > iterator = datas . iterator ();
while ( iterator . hasNext ()) {
if ( iterator . next (). equals ( target )) {
iterator . remove ();
}
System . out . println ( datas );
}
看了一下remove的接口,说明如下:
Removes from the underlying collection the last element returned
by this iterator (optional operation). This method can be called
only once per call to {@link #next}. The behavior of an iterator
is unspecified if the underlying collection is modified while the
iteration is in progress in any way other than by calling this
method.
只能删除最后一个返回元素,即调用了next()以后的元素。也没有add()等方法,只有remove()方法,需要注意的是,调用的是iterator的remove()方法,不是List的remove()方法。于是也就只记住了这个,并没有去研究为什么,当然在多线程的环境下,使用CopyOnWriteArrayList解决,这都是网上找到的解决方法。
foreach的本质
上面提到的删除问题,如果是普通的for循环会是怎么样的?下面代码会报错吗?
List < String > datas = new ArrayList <>();
datas . add ( "1" );
datas . add ( "2" );
datas . add ( "3" );
String target = "2" ;
for ( int i = 0 ; i < datas . size (); i ++) {
String temp = datas . get ( i );
if ( temp . equals ( target )) {
datas . remove ( temp );
}
System . out . println ( datas );
}
可能我们潜意识会认为,删除了一个元素以后就修改了就会修改列表大小,然后至少会有一个OutOfIndexBoundsException
,或者是上面的ConcurrentModificationException
。
事实上是不会报错的,如果我们写过汇编就知道循环的逻辑是怎么实现的,这个用do while循环就比较好理解,而本质上,for循环最终都是转换成do while循环的。每次运行完毕循环体,执行索引加一以后就会执行条件判断。我们发现size()方法返回的只是list里面的size的一个属性值。而其实每次remove一个元素这个值都会修改的。因此,并不会抛出异常,但是下一次循环操作的数据就是脏乱的数据了,因为列表发生了移动。因此就用我上面面试回答的方式来解决。
foreach 在Java到底是什么呢?思考到这里了,很容易就猜想,foreach循环就是转换成了Iterator来使用。
编写以下的java代码,分别用三种形式来遍历列表:
public class TestIterator {
private List < String > datas = new ArrayList <>();
public void testSimpleFor () {
for ( int i = 0 ; i < datas . size (); i ++) {
String temp = datas . get ( i );
System . out . println ( temp );
}
}
public void testForeach () {
for ( String temp : datas ) {
System . out . println ( temp );
}
}
public void testIterator () {
Iterator < String > taskIterator = datas . iterator ();
while ( taskIterator . hasNext ()) {
String temp = taskIterator . next ();
System . out . println ( temp );
}
}
}
然后,我们在分析一下编译出来的class字节码,关于Class字节码的相关知识,以前写过一篇blog 。可以去看看,就不详细说了。然后用javap命令来翻译class字节码,使用看以前写blog 。
执行javap -c TestIterator
命令,结果如下:
Compiled from "TestIterator.java"
public class org . zhangge . testes . TestIterator {
public org . zhangge . testes . TestIterator ();
Code:
0 : aload_0
1 : invokespecial # 1 // Method java/lang/Object."<init>":()V
4 : aload_0
5 : new # 2 // class java/util/ArrayList
8 : dup
9 : invokespecial # 3 // Method java/util/ArrayList."<init>":()V
12 : putfield # 4 // Field datas:Ljava/util/List;
15 : return
public void testSimpleFor ();
Code:
0 : iconst_0
1 : istore_1
2 : iload_1
3 : aload_0
4 : getfield # 4 // Field datas:Ljava/util/List;
7 : invokeinterface # 5 , 1 // InterfaceMethod java/util/List.size:()I
12 : if_icmpge 42
15 : aload_0
16 : getfield # 4 // Field datas:Ljava/util/List;
19 : iload_1
20 : invokeinterface # 6 , 2 // InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
25 : checkcast # 7 // class java/lang/String
28 : astore_2
29 : getstatic # 8 // Field java/lang/System.out:Ljava/io/PrintStream;
32 : aload_2
33 : invokevirtual # 9 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
36 : iinc 1 , 1
39 : goto 2
42 : return
public void testForeach ();
Code:
0 : aload_0
1 : getfield # 4 // Field datas:Ljava/util/List;
4 : invokeinterface # 10 , 1 // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
9 : astore_1
10 : aload_1
11 : invokeinterface # 11 , 1 // InterfaceMethod java/util/Iterator.hasNext:()Z
16 : ifeq 39
19 : aload_1
20 : invokeinterface # 12 , 1 // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
25 : checkcast # 7 // class java/lang/String
28 : astore_2
29 : getstatic # 8 // Field java/lang/System.out:Ljava/io/PrintStream;
32 : aload_2
33 : invokevirtual # 9 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
36 : goto 10
39 : return
public void testIterator ();
Code:
0 : aload_0
1 : getfield # 4 // Field datas:Ljava/util/List;
4 : invokeinterface # 10 , 1 // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
9 : astore_1
10 : aload_1
11 : invokeinterface # 11 , 1 // InterfaceMethod java/util/Iterator.hasNext:()Z
16 : ifeq 39
19 : aload_1
20 : invokeinterface # 12 , 1 // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
25 : checkcast # 7 // class java/lang/String
28 : astore_2
29 : getstatic # 8 // Field java/lang/System.out:Ljava/io/PrintStream;
32 : aload_2
33 : invokevirtual # 9 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
36 : goto 10
39 : return
}
由上面的字节码看出来,testForeach和testIterator生成的字节码一模一样!这样,研究的问题就转换到了Iterator了。其实还是可以延伸一个问题,如果是数组使用foreach会是怎么样的呢?
猜想一下下面的字节码是怎么样的?
public void testArray () {
String [] array = new String [ datas . size ()];
datas . toArray ( array );
for ( String temp : array ) {
System . out . println ( temp );
}
}
public void testArrayOppsite () {
String [] array = new String [ datas . size ()];
datas . toArray ( array );
for ( int i = 0 ; i < array . length ; i ++) {
String temp = array [ i ];
System . out . println ( temp );
}
}
执行javap翻译class字节码以后:
public void testArray ();
Code:
0 : aload_0
1 : getfield # 4 // Field datas:Ljava/util/List;
4 : invokeinterface # 21 , 1 // InterfaceMethod java/util/List.size:()I
9 : anewarray # 14 // class java/lang/String
12 : astore_1
13 : aload_0
14 : getfield # 4 // Field datas:Ljava/util/List;
17 : aload_1
18 : invokeinterface # 23 , 2 // InterfaceMethod java/util/List.toArray:([Ljava/lang/Object;)[Ljava/lang/Object;
23 : pop
24 : aload_1
25 : astore_2
26 : aload_2
27 : arraylength
28 : istore_3
29 : iconst_0
30 : istore 4
32 : iload 4
34 : iload_3
35 : if_icmpge 58
38 : aload_2
39 : iload 4
41 : aaload
42 : astore 5
44 : getstatic # 19 // Field java/lang/System.out:Ljava/io/PrintStream;
47 : aload 5
49 : invokevirtual # 24 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
52 : iinc 4 , 1
55 : goto 32
58 : return
public void testArrayOppsite ();
Code:
0 : aload_0
1 : getfield # 4 // Field datas:Ljava/util/List;
4 : invokeinterface # 21 , 1 // InterfaceMethod java/util/List.size:()I
9 : anewarray # 14 // class java/lang/String
12 : astore_1
13 : aload_0
14 : getfield # 4 // Field datas:Ljava/util/List;
17 : aload_1
18 : invokeinterface # 23 , 2 // InterfaceMethod java/util/List.toArray:([Ljava/lang/Object;)[Ljava/lang/Object;
23 : pop
24 : iconst_0
25 : istore_2
26 : iload_2
27 : aload_1
28 : arraylength
29 : if_icmpge 49
32 : aload_1
33 : iload_2
34 : aaload
35 : astore_3
36 : getstatic # 19 // Field java/lang/System.out:Ljava/io/PrintStream;
39 : aload_3
40 : invokevirtual # 24 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
43 : iinc 2 , 1
46 : goto 26
49 : return
虽然上面的指令有区别,但是明显看到了arraylength
这条指令,说明差别不是很大,可能有优化,这点我目前就没有深入了解了,要读过jvm规范才知道这些指令的区别了。
关于Iterator
到这里,要知道最初那个问题的原因,就必须了解Iterator了。当然,上去google一下ConcurrentModificationException,会发现已经有很多blog分析这个问题了,可是我还是决定自己也读读源码,研究一番。
直接查看ArrayList的iterator():
/**
* Returns an iterator over the elements in this list in proper sequence.
*
* <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
*
* @return an iterator over the elements in this list in proper sequence
*/
public Iterator < E > iterator () {
return new Itr ();
}
上面看到是直接new了一个Itr的类,并且注释里面提到了fail-fast
。具体是什么东西,暂时先看不懂,直接去看Itr这个类。它是很短的一个内部类,实现了Iterator接口:
/**
* An optimized version of AbstractList.Itr
*/
private class Itr implements Iterator < E > {
int cursor ; // index of next element to return
int lastRet = - 1 ; // index of last element returned; -1 if no such
int expectedModCount = modCount ;
public boolean hasNext () {
return cursor != size ;
}
@SuppressWarnings ( "unchecked" )
public E next () {
checkForComodification ();
int i = cursor ;
if ( i >= size )
throw new NoSuchElementException ();
Object [] elementData = ArrayList . this . elementData ;
if ( i >= elementData . length )
throw new ConcurrentModificationException ();
cursor = i + 1 ;
return ( E ) elementData [ lastRet = i ];
}
public void remove () {
if ( lastRet < 0 )
throw new IllegalStateException ();
checkForComodification ();
try {
ArrayList . this . remove ( lastRet );
cursor = lastRet ;
lastRet = - 1 ;
expectedModCount = modCount ;
} catch ( IndexOutOfBoundsException ex ) {
throw new ConcurrentModificationException ();
}
}
@Override
@SuppressWarnings ( "unchecked" )
public void forEachRemaining ( Consumer <? super E > consumer ) {
Objects . requireNonNull ( consumer );
final int size = ArrayList . this . size ;
int i = cursor ;
if ( i >= size ) {
return ;
}
final Object [] elementData = ArrayList . this . elementData ;
if ( i >= elementData . length ) {
throw new ConcurrentModificationException ();
}
while ( i != size && modCount == expectedModCount ) {
consumer . accept (( E ) elementData [ i ++]);
}
// update once at end of iteration to reduce heap write traffic
cursor = i ;
lastRet = i - 1 ;
checkForComodification ();
}
final void checkForComodification () {
if ( modCount != expectedModCount )
throw new ConcurrentModificationException ();
}
}
因此看到,它实际操作的还是List的数据,这就推翻了我之前认为iterator复制了一份数据的猜想。
看到hasNext()方法,很简单的判断游标cursor是否等于list的size。再看next()方法,首先执行了方法checkForComodification(),看到里面的实现,如果modCount不等于expectedModCount就抛出了ConcurrentModificationException。接着继续判断游标是否大于size,则抛出NoSuchElementException;最后判断游标是否大于data数组的length,则继续抛出ConcurrentModificationException,这个判断出错概率低,一般认为size和数组的length是相等的,除非出现多线程的情况,修改了数组还没有修改size,所以它不是线程安全的 。
关键在于modCount和expectedModCount了。
可以看到expectedModCount是Itr的变量,初始值就是等于modCount的。在Itr里面expectedModCount的值只有在执行remove的时候发生了改变,而且也是赋值为modCount。看看modCount的定义如下:
/**
* The number of times this list has been <i>structurally modified</i>.
* Structural modifications are those that change the size of the
* list, or otherwise perturb it in such a fashion that iterations in
* progress may yield incorrect results.
*
* <p>This field is used by the iterator and list iterator implementation
* returned by the {@code iterator} and {@code listIterator} methods.
* If the value of this field changes unexpectedly, the iterator (or list
* iterator) will throw a {@code ConcurrentModificationException} in
* response to the {@code next}, {@code remove}, {@code previous},
* {@code set} or {@code add} operations. This provides
* <i>fail-fast</i> behavior, rather than non-deterministic behavior in
* the face of concurrent modification during iteration.
*
* <p><b>Use of this field by subclasses is optional.</b> If a subclass
* wishes to provide fail-fast iterators (and list iterators), then it
* merely has to increment this field in its {@code add(int, E)} and
* {@code remove(int)} methods (and any other methods that it overrides
* that result in structural modifications to the list). A single call to
* {@code add(int, E)} or {@code remove(int)} must add no more than
* one to this field, or the iterators (and list iterators) will throw
* bogus {@code ConcurrentModificationExceptions}. If an implementation
* does not wish to provide fail-fast iterators, this field may be
* ignored.
*/
protected transient int modCount = 0 ;
上面的意思就是,这个变量记录了list结构上的修改次数,结构上的改变就是改变size,否则的话会引起iterator的错误结果。这个属性主要用在了iterator上,并且如果这个值被异常修改,那么iterator就会抛出ConcurrentModificationException,如此一来,就可以fail-fast,这样比非确定性的错误性能高。这样也处理了多线程引发的错误,当多线程操作一个list的时候,当别的线程操作了list,发生了结构的变化,本线程就能即时检查出来,快速抛出异常,而不会去操作脏数据导致数据错误。
也可以从代码里面看到modCount在add,remove等方法都会发生修改。
在ArrayList类的头部还能找到下面这个注释:
* <p><a name="fail-fast">
* The iterators returned by this class's {@link #iterator() iterator} and
* {@link #listIterator(int) listIterator} methods are <em>fail-fast</em>:</a>
* if the list is structurally modified at any time after the iterator is
* created, in any way except through the iterator's own
* {@link ListIterator#remove() remove} or
* {@link ListIterator#add(Object) add} methods, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of
* concurrent modification, the iterator fails quickly and cleanly, rather
* than risking arbitrary, non-deterministic behavior at an undetermined
* time in the future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw {@code ConcurrentModificationException} on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
大概也是同样的意思。
最初的现象本质
到这里就能解释最初的问题现象了。
List < String > datas = new ArrayList <>();
datas . add ( "1" );
datas . add ( "2" );
String target = "1" ;
for ( String temp : datas ) {
if ( temp . equals ( target )) {
datas . remove ( temp );
}
}
当target是”1”的时候,remove了一个元素之后,modCount和size都发生了改变,再次进入循环需要调用hasNext()方法来判断,此时是不会进入循环的了,因为执行next()方法的时候cursor+1了的,cursor和size相等了。
当target是“2”的情况,remove了一个元素之后,modCount和size都发生了改变,注意到这个时候,size是变小了,size=1; cursor却加了两次,cursor=2;调用hasNext()判断返回时true,依然进入循环,这时候再次调用next()方法就出错了!
记住原理,不要记住结论!
到这里我知道了foreach是iterator的本质,而ConcurrentModificationException则是fail-fast机制结果,fail-fast机制则是由modCount引起。只要掌握了这些原理,就可以了解释遇到的问题!不然,我现在问你,问题中的list由两个元素增加到3,4,甚至多个元素的时候,remove哪些会抛异常,哪些不会异常呢?