
Re: Bug#64766: apt: The apt program does not obey the HTTP/1.1 RFC



> > apt assumes that HTTP/1.0 servers will have persistent connections.
> 
> Not quite, APT pipelines from the start, see below for section 19.6.2
> which allows this.

Section 19.6.2 does not mention pipelining for HTTP/1.0 servers (see below).

> It assumes that any server unwilling to support 
> persistent connections will do one of two things when presented with a
> pipelined request: 
>    1) Immediately die upon a pipelined query
>    2) Process only the first query and flush the rest of the queries from 
>       the socket buffer [linger]
> The RFC really makes the same assumptions about the server.

The HTTP/1.0 specification does not mention pipelining so it cannot
say anything about number 2.


> BTW, you are confusing Pipelining with Persistence, you need to sort that
> out to make correct sense of the RFC.

I know that they are different, but they are not independent.  You
cannot do pipelining if the connection is not persistent.  If a server
is persistent then it should allow pipelining.  The persistent/not-persistent
nature of the connection can be determined from the headers.  The
pipelining nature of the connection can be assumed to be the same as
the persistent nature.  I do not see any case where Persistent!=Pipeline.


> APT does not have any idea what the server is until it is too late [that
> is what you get for pipelining from the start].

Then why do you do this?  I don't believe that the performance
increase is significant.  You only need to wait for the first
reply from the server before sending the second and subsequent
requests.  Since apt mostly fetches relatively large files, the
performance hit is tiny.  I think that apt should first check whether
the connection is persistent and only then pipeline the rest of the
requests.
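
As a rough sketch of the check I mean (written in Python for brevity,
not taken from the apt source; the function name and the lower-cased
header dictionary are made up for illustration), the first response is
enough to decide whether the remaining requests may be pipelined:

```python
def connection_is_persistent(status_line, headers):
    """Guess from the first response whether the connection stays open.

    `headers` is assumed to be a dict with lower-cased header names.
    """
    # The status line looks like "HTTP/1.1 200 OK"; take the version token.
    version = status_line.split(None, 1)[0].upper()
    # Connection is a comma-separated token list, compared case-insensitively.
    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",")
              if t.strip()}
    if "close" in tokens:
        return False              # the server announced it will close
    if version == "HTTP/1.1":
        return True               # persistent is the HTTP/1.1 default
    return "keep-alive" in tokens  # HTTP/1.0: only if explicitly negotiated
```

A client would send one request, call this on the reply, and only write
the remaining requests back-to-back if it returns True.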


> IIRC the only way a server can have a short response (aside from big
> server screw ups, like SEGV's) is if the server does not do a lingering
> close, or read all pipelined requests into an internal buffer.  This
> causes the TCP stack to abort the unread data upon close and results in a
> short file.  This is a bug in the server, not the clients.

I believe that this is the sort of thing that the HTTP/1.1 RFC is
talking about when it says:

|    Clients which assume persistent connections and pipeline immediately
|    after connection establishment SHOULD be prepared to retry their
|    connection if the first pipelined attempt fails. If a client does
|    such a retry, it MUST NOT pipeline before it knows the connection is
|    persistent. Clients MUST also be prepared to resend their requests if
|    the server closes the connection before sending all of the
|    corresponding responses.

If a server is written for HTTP/1.0 clients then it should not need to
change just because HTTP/1.1 clients start to use it.  The HTTP/1.1
RFC needs to make sure that it specifies behaviour that is backwards
compatible with HTTP/1.0 servers.  This is what I think that paragraph
above is doing.
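
My reading of that paragraph can be sketched roughly as follows (Python
again, not apt's code).  The `send` callable is hypothetical: it is
assumed to open a connection if needed, write the given requests
back-to-back, and return the complete responses it received plus
whether the connection turned out to be persistent.

```python
def fetch_with_retry(requests, send):
    """Pipeline optimistically once, then fall back to one request at a
    time until the connection is known to be persistent."""
    done = []
    pending = list(requests)
    first_attempt = True
    persistent = False
    while pending:
        # Only the very first attempt may pipeline without knowing that
        # the connection is persistent.
        pipelining = first_attempt or persistent
        batch = pending if pipelining else pending[:1]
        responses, persistent = send(batch)
        if not responses and not first_attempt:
            raise ConnectionError("no response to a non-optimistic request")
        done.extend(responses)
        # Anything left unanswered is resent on the next pass.
        pending = pending[len(responses):]
        first_attempt = False
    return done
```

Against a server that answers only the first request and then closes,
this sends everything once and afterwards resends the unanswered
requests one per connection.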


> The only reasonable way to deal with these servers is to not pipeline from
> the start at all (even though Section 19.6.2 says you can). It is the
> pipelined request that breaks the server, not anything to do with
> persistence.  You can make APT do this by setting the pipeline depth to 0. 

Section 19.6.2 does not say that you can pipeline at all.

| 19.6.2 Compatibility with HTTP/1.0 Persistent Connections
| 
|    Some clients and servers might wish to be compatible with some
|    previous implementations of persistent connections in HTTP/1.0
|    clients and servers. Persistent connections in HTTP/1.0 are
|    explicitly negotiated as they are not the default behavior. HTTP/1.0
|    experimental implementations of persistent connections are faulty,
|    and the new facilities in HTTP/1.1 are designed to rectify these
|    problems. The problem was that some existing 1.0 clients may be
|    sending Keep-Alive to a proxy server that doesn't understand
|    Connection, which would then erroneously forward it to the next
|    inbound server, which would establish the Keep-Alive connection and
|    result in a hung HTTP/1.0 proxy waiting for the close on the
|    response. The result is that HTTP/1.0 clients must be prevented from
|    using Keep-Alive when talking to proxies.
| 
|    However, talking to proxies is the most important use of persistent
|    connections, so that prohibition is clearly unacceptable. Therefore,
|    we need some other mechanism for indicating a persistent connection
|    is desired, which is safe to use even when talking to an old proxy
|    that ignores Connection. Persistent connections are the default for
|    HTTP/1.1 messages; we introduce a new keyword (Connection: close) for
|    declaring non-persistence. See section 14.10.
| 
|    The original HTTP/1.0 form of persistent connections (the Connection:
|    Keep-Alive and Keep-Alive header) is documented in RFC 2068. [33]

It also says that you can only do persistent connections if you see a
"Connection: keep-alive".


In your other e-mail:

> Here is a cute little patch that makes APT pre-emptively close the
> connection if the server is not going to be persistent. That fixes the
> only RFC incompatibility you found.

This does indeed seem to check the headers and decide from them
whether the connection is persistent or not.  This is what I was
trying to do when I was messing with the Encoding and Pipeline
variables in the code before.  Obviously you understand the code
better than I do.

> If someone does want to fix any server that does the 'Connection Reset'
> bit, please check out:
>   http://www.apache.org/docs-1.2/misc/fin_wait_2.html
> Read the appendix.

This is indeed an interesting document.

Putting in code that empties the read queue of the client socket in
WWWOFFLE before it is closed does indeed stop apt from having the
problem.  This does not mean that I agree that this was a bug in
WWWOFFLE, but rather that WWWOFFLE now handles those clients that try
pipelining before they realise that they are talking to an HTTP/1.0
server (a case that I believe the HTTP/1.1 spec says the client should
handle with a retry).  I try to program using the rule "be lenient in
what you accept and strict in what you send" (I think I saw this in an
RFC, but I can't find it now).  This means handling broken servers and
clients but being careful to send the correct responses to them.
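
For anyone wanting to do the same in another server, the idea is
roughly the following.  This is a Python sketch, not the actual
WWWOFFLE change (which is in C); the function name, timeout and byte
limit are arbitrary choices for illustration.

```python
import socket


def drain_and_close(sock, timeout=2.0, max_bytes=65536):
    """Discard unread client data before closing the socket, so that
    close() does not abort the connection with data still queued."""
    discarded = 0
    try:
        sock.shutdown(socket.SHUT_WR)   # we have finished our response
        sock.settimeout(timeout)        # never wait forever on a client
        while discarded < max_bytes:
            chunk = sock.recv(4096)
            if not chunk:               # client closed its side
                break
            discarded += len(chunk)     # pipelined requests are thrown away
    except OSError:
        pass                            # timeout or reset: give up draining
    finally:
        sock.close()
    return discarded
```

The timeout and byte limit are there so that a misbehaving client
cannot hold the server's socket open indefinitely.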

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             amb@gedanken.demon.co.uk
                                      http://www.gedanken.demon.co.uk/


