[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Hardware for Debian-68k




On Mon, 26 Mar 2007, Michael Schmitz wrote:

> > > If you could remember what the nature of the race was, I could 
> > > perhaps find a way to prevent it. Note that the transmit function 
> > > now only stops the queue if the ring is full; that may be important.
> >
> > When I look at the 2.2 code now, I'm immediately suspicious of the 
> > interaction between sonic_send_packet() and sonic_interrupt(), with 
> > respect to lp->tx_full, lp->cur_tx, lp->dirty_tx and lp->tx_laddr[]. 
> > Reason being, there is no mutual exclusion to prevent interrupt 
> > handler from interfering with the other routine.
> 
> That would be deadly in a race, perhaps.

tx_skb[n] (the array elements themselves), tx_full and dirty_tx should all 
be made volatile because sonic_interrupt() writes to them.

We may need a write barrier to ensure that send_packet updates cur_tx 
before tx_full?

> > Still, it isn't clear to me how this leads to "Warning: kfree_skb 
> > passed an skb still on a list (from 000e4416)." I don't know how these 
> > lists work. Does the address 000e4416 tell you anything?
> 
> The address doesn't tell me anything without the kernel symbol table 
> (from the range, it seems to be a static symbol, that's all I can tell). 

Well, I was hoping that perhaps you could get access to the machine. We 
know the approximate date of the crash (24th May 2006) we can probably 
find the symbol in one of the kernels installed at the time. But then 
again, you may reproduce the crash yourself sooner or later...

> But if the interrupt handler calls kfree_skb while send_packet hasn't, 
> in fact, pulled the skb off the list (skb_release() ??), kfree_skb may 
> still find the skb being used. This is a warning only, but may be 
> symptomatic of a race between interrupt handler and send routine. 
> send_packet should perhaps protect against interruption while handling 
> the skb list.

Also, it seems to me that

    if (lp->cur_tx < lp->dirty_tx + SONIC_NUM_TDS)

is going to give the wrong result when cur_tx wraps around. There is a 
similar test in sonic_interrupt,

if (lp->tx_full && dev->tbusy
    && dirty_tx + SONIC_NUM_TDS > lp->cur_tx + 2) 

that also looks broken to me...

-f
 
> Just speculating - not looking at the code right now.
> 
> 	Michael
> 



Reply to: