Re: [PATCH 1/4] Blackfin: arch patch for 2.6.18

Arnd Bergmann wrote:

On Saturday 23 September 2006 18:29, Robin Getz wrote:
> I would be interested in the reasons why this is bad. We thought is
> solved the problem, and our driver developers are Ok using it, and it
> is catching more problems at compile time than we use to (which is the
> main reason I thought to remove all the volatile casting.

It adds a lot of unnecessary defines. The problem is mostly that for eachregister you add three macros.

That is what host machines are good for - isn't it - keeping track ofredirection? :)

The simplest way to avoid this would be using your bfin_write16() andrelated functions directly instead of wrapping each register in a separatemacro.

Right - but I am lazy, and I don't want to remember which is a 16 and whichis a 32 when I am maintaining the software.

Having the extra define is good in that it allows hardware nastiness to behidden. For example, we just fixed an issue about how to write to aregister that effects the PLL stability.


/* Writing to VR_CTL initiates a PLL relock sequence. */
static __inline__ void bfin_write_VR_CTL(unsigned int val) {
        unsigned long flags, iwr ;

        bfin_write16(VR_CTL,val);
        __builtin_bfin_ssync();
        /* Enable the PLL Wakeup bit in SIC IWR */
        iwr = bfin_read32(SIC_IWR);
        /* Only allow PPL Wakeup) */
        bfin_write32(SIC_IWR, IWR_ENABLE(0));
        local_irq_save(flags);
        asm("IDLE;");
        local_irq_restore(flags);
        bfin_write32(SIC_IWR, iwr);
}

There are a few bits in the register that controls the on-chip MAC - in theethernet driver, the person was just writing to the register, withoutunderstanding that it unlocked/re-locked the PLL.

This saves every person who writes to this register from duplicating thecode, and we ensures that it is actually done it properly.

> >Now, there are
> >at least two common methods for how to do this better, both involving
> >the readl/writel low-level accessors (or something similar).
>
> The thing to understand about the Blackfin architecture - is not all
> system register, or peripheral registers are 32 bits. Some are 16
> bits, and some are 32. The Hardware will error if you access a 32 bit
> instruction, with a
> 16 bit access, or a 16 bit access on a 32 bit instruction.
>
> This is why we added:
> bfin_write16(); bfin_read16(); bfin_write32(); bfin_read32();

Sure, that's not unusual at all. Instead of readl(), you can use

readw() for 16 bit accesses. Your bfin_{read,write}{16,32} functions arefine as well, but you should make the implementation more robust and change



#define bfin_read16(addr) ({
        unsigned __v; \
        __asm__ __volatile__ ("NOP;\n\t %0 = w[%1] (z);\n\t" \
        : "=d"(__v) : "a"(addr)); \
        unsigned short)__v; \
})

to

static inline __le16 bfin_read16(const __be16 __iomem *addr) {
        __be16 val;
        asm volatile("nop; \n\t %0 = w[%1] (z)" : "=d" (val) : "a"(addr);
        return val;
}

This adds strict type checking (size, endianess, io address space).
You may prefer to use u16 instead of __be16 here, depending on your needs.


No problem - I can change/update that.

> We can't use a common function, like:
>
> bfin_sica_read(int reg)
>
> or use the read16/read32 directly - it would be to hard for the driver
> developers to keep the manual with them all the time to find out if a
> register was 16 or 32 bits. It would move what we have today (compiler
> errors), to run time errors if someone used the wrong function.

Ok, in the example I picked they were all the same size, so I assumed thatwould be the case for most of your register areas.


[snip]

> >
> >The alternative approach uses a structure:
>
> We could use this approach, filling it up with 16 bit padding all over

> the place (which is easy to do), but it is also difficult for the samereason.

> (Although I really like this, and can see lots of value from it - I am
> not sure we can use it).
>
> You are still using writel(reg, value) and readl(reg) - which is hard
> to do, without a hardware reference beside you all the time - to
> determine which time you should use a readl or reads.
>
> Unless I am completely missing something (which might be true).

Ok, I see your point. If you have different size registers in the area fora single device, then it is better to use a structure as I showed below.


> >static struct bfin_sica_regs __iomem *bfin_sica = (void __iomem
> >*)0xffc00100ul;
> >

> Although I think the code is smaller, and nicer - it involves more run
> time complexity, and will consume more processor cycles - the old example:
>
> static void bf561_internal_mask_irq(unsigned int irq) {
>     unsigned long irq_mask;
>     if ((irq - (IRQ_CORETMR + 1)) < 32) {
>         irq_mask = (1 << (irq - (IRQ_CORETMR + 1)));
>         bfin_write_SICA_IMASK0(bfin_read_SICA_IMASK0() &
> ~irq_mask);
>     } else {
>         irq_mask = (1 << (irq - (IRQ_CORETMR + 1) - 32));
>         bfin_write_SICA_IMASK1(bfin_read_SICA_IMASK1() &
> ~irq_mask);
>     }
> }
>
> resolves all addresses at compile time, not run time. So, wouldn't
> your example slow things down?

The run time overhead of this is very small compared to an actual mmioregister access, so you should not notice this at all.

I'm not sure I agree - on machines with tiny cache (only 16k), and highCore Clocks (600+ MHz), and slow SDRAM clocks (133MHz), extra reads toSDRAM will kill performance & pollute cache. It is not just the singlecycle read - if the core is writing something into SDRAM, it takes 12 SDRAMcycles for the SDRAM to turn the bus from a write to a read.

run time overhead is critical to us. People are complaining already thatthe overhead of our drivers is too high, and I don't want to add extrareads to SDRAM if I don't have to.

You can still do the same with a compile time constant by replacing
static struct bfin_sica_regs __iomem *bfin_sica = (void __iomem*)0xffc00100ul;
from my example with

#define bfin_sica_regs ((struct bfin_sica_regs __iomem*)0xffc00100ul)

That should result in the same object code you had before.
However, I'm used to having a single kernel binary that can run onmultiple hardware versions. Normally this means that you have the sameregister layout on every CPU, but on different physical addresses that aredetected at boot time, so I like to avoid hardcoding absolute addresses inthe object code.


People who use our architecture are OK with compiling for a specific platform.

Moreover, if you ever want to run with an MMU, the virtual address of thatdevice is computed by the ioremap function, like
static struct bfin_sica_regs __iomem *bfin_sica;

void __init bfin_init_sica(void)
{
        bfin_sica = ioremap(0xffc00100ul);
}

There are no plans to add a MMU to the Blackfin as is - it would require anextensive change to the architecture.

But still - I can see the value of it for large scale systems, where peopleare not willing to compile a kernel targeted at one specific implementation- but the people who use this kernel are worse - they compile a kernel fora specific board, not a just a specific processor. Anything extra - theydon't want it. Anything they need - they normally put it in the kernel, asto reduce boot time/size.

The processor sells for less than a good cup of coffee. People are sellingcomplete systems (processor, SDRAM, Flash, etc) for less than $50 (US).

People want to do, and expect to do, all kinds of optimizations when theyare working at this level.

This is the hesitation of adding levels of indirection - any additionaloverhead - and people will stop using Linux - but I will try out thestructure, and see how that works.

-Robin-

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Prev by Date: Re: [PATCH] Linux Kernel Markers 0.11 for 2.6.17
Next by Date: Re: When will the lunacy end? (Was Re: [PATCH] uswsusp: add pmops->{prepare,enter,finish} support (aka "platform mode"))
Previous by thread: Re: [PATCH 1/4] Blackfin: arch patch for 2.6.18
Next by thread: Re: [PATCH 1/4] Blackfin: arch patch for 2.6.18
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]