Java HttpURLConnection InputStream.close() hangs (or takes too long?)

First, some background. There is a worker that expands/resolves a bunch of short URLs:

http://t.co/example -> http://example.com

So we just follow the redirects, that's all; we don't read any data from the connection. After we get a 200, we return the final URL and close the InputStream.
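For context, here is a minimal sketch of that worker, assuming the usual manual-redirect pattern (the class and method names are illustrative, not the actual ru.twitter.times code):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public final class ShortUrlResolver {
        /** Follows redirects hop by hop and returns the final URL. */
        static String resolve(String shortUrl) throws IOException {
            String current = shortUrl;
            for (int hops = 0; hops < 10; hops++) {       // cap the redirect chain
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(current).openConnection();
                conn.setInstanceFollowRedirects(false);   // follow each hop ourselves
                int code = conn.getResponseCode();        // sends the request
                String location = conn.getHeaderField("Location");
                conn.getInputStream().close();            // the close() that hangs
                if (code / 100 == 3 && location != null) {
                    current = location;                   // next hop
                } else {
                    return current;                       // e.g. 200: final URL
                }
            }
            throw new IOException("Too many redirects: " + shortUrl);
        }
    }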

Now, the problem itself: on a production server, one of the resolver threads is hanging in the InputStream.close() call:

"ProcessShortUrlTask" prio=10 tid=0x00007f8810119000 nid=0x402b runnable [0x00007f882b044000]
   java.lang.Thread.State: RUNNABLE
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.skip(BufferedInputStream.java:352)
        - locked <0x0000000561293aa0> (a java.io.BufferedInputStream)
        at sun.net.www.MeteredStream.skip(MeteredStream.java:134)
        - locked <0x0000000561293a70> (a sun.net.www.http.KeepAliveStream)
        at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:76)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.close(HttpURLConnection.java:2735)
        at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:131)
        at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:55)
        at ...

After a bit of research, I learned that skip() is called to clean the remaining data out of the stream before it is returned to the connection pool (when keep-alive is enabled?). I still don't see how to avoid this situation. Moreover, I'm not sure whether this points to bad design in our code or to a problem in the JDK.

So, the questions are:

  1. Is it possible to avoid hanging on close()? For example, by guaranteeing some reasonable timeout (see the sketch after this list)?
  2. Is it possible to avoid reading data from the connection at all? Remember, I just need the final URL. Actually, I don't want skip() to be called at all.
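Regarding question 1, a hedged sketch: setConnectTimeout()/setReadTimeout() put timeouts on the underlying socket, and the read timeout should also bound the blocking reads that skip() performs inside close(), though whether it fully covers this code path depends on the JDK version:

    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setConnectTimeout(3000); // fail fast on unreachable hosts
    conn.setReadTimeout(3000);    // SO_TIMEOUT: bounds blocking socket reads,
                                  // including those behind skip() in close()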

Update:

KeepAliveStream, line 79, the close() method:

    // Skip past the data that's left in the Inputstream because
    // some sort of error may have occurred.
    // Do this ONLY if the skip won't block. The stream may have
    // been closed at the beginning of a big file and we don't want
    // to hang around for nothing. So if we can't skip without blocking
    // we just close the socket and, therefore, terminate the keepAlive
    // NOTE: Don't close super class
    try {
        if (expected > count) {
            long nskip = (long) (expected - count);
            if (nskip <= available()) {
                long n = 0;
                while (n < nskip) {
                    nskip = nskip - n;
                    n = skip(nskip);
                } ...

It looks to me like a bug in the JDK itself. Unfortunately, it is very hard to reproduce.


3 Answers

  1. # Answer 1

    The implementation of KeepAliveStream that you linked violates the contract under which available() guarantees that skip() is non-blocking, and hence it may indeed block.

    The contract of available() guarantees a single non-blocking skip():

    Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.

    Whereas the implementation calls skip() several times per single call to available():

        if (nskip <= available()) {
            long n = 0;
            // The loop below can iterate several times;
            // only the first call is guaranteed to be non-blocking.
            while (n < nskip) {
                nskip = nskip - n;
                n = skip(nskip);
            }
    

    This does not prove that your application blocks because KeepAliveStream uses InputStream incorrectly; some implementations of InputStream may provide stronger non-blocking guarantees. But I think it is very likely the problem.

    EDIT: After a bit more research, this is a recently fixed bug in the JDK: https://bugs.openjdk.java.net/browse/JDK-8004863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel. The bug report mentions an infinite loop, but a blocking skip() could be a consequence as well. The fix seems to address both issues (a single skip() per available()).

  2. # Answer 2

    I faced a similar problem when I was trying to make a "HEAD" request. To fix it, I simply removed the "HEAD" method, since I just wanted to ping the URL.

  3. # Answer 3

    I guess this skip() on close() is there for keep-alive support:

    http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html

    Prior to Java SE 6, if an application closes a HTTP InputStream when more than a small amount of data remains to be read, then the connection had to be closed, rather than being cached. Now in Java SE 6, the behavior is to read up to 512 Kbytes off the connection in a background thread, thus allowing the connection to be reused. The exact amount of data which may be read is configurable through the http.KeepAlive.remainingData system property.

    So setting http.KeepAlive.remainingData=0 or http.keepAlive=false effectively disables keep-alive. However, this may negatively affect performance if you always address the same http://t.co host.
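    For illustration, a minimal sketch of that mitigation (both system properties are documented in the guide linked above; set them before the first HTTP request, e.g. at startup or via -D JVM flags):

        // Never cache connections: close() tears the socket down instead of
        // draining the remaining stream data for reuse.
        System.setProperty("http.keepAlive", "false");
        // Or: keep pooling, but do not read any leftover body data in close().
        System.setProperty("http.KeepAlive.remainingData", "0");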

    As @artbristol suggested, using HEAD instead of GET here seems to be the better solution.
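    A hedged sketch of that idea, assuming the URL shorteners actually answer HEAD requests: a HEAD response carries no body, so close() has nothing to skip():

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://t.co/example").openConnection();
        conn.setRequestMethod("HEAD");           // headers only, no body to drain
        conn.setInstanceFollowRedirects(false);  // inspect each hop ourselves
        int code = conn.getResponseCode();
        String next = conn.getHeaderField("Location"); // the next hop on a 3xx
        conn.disconnect();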