Arnd Bergmann wrote:
On Saturday 23 September 2006 18:29, Robin Getz wrote:
> I would be interested in the reasons why this is bad. We thought is
> solved the problem, and our driver developers are Ok using it, and it
> is catching more problems at compile time than we use to (which is the
> main reason I thought to remove all the volatile casting.
It adds a lot of unnecessary defines. The problem is mostly that for each
register you add three macros.
That is what host machines are good for - isn't it - keeping track of
redirection? :)
The simplest way to avoid this would be using your bfin_write16() and
related functions directly instead of wrapping each register in a separate
macro.
Right - but I am lazy, and I don't want to remember which is a 16 and which
is a 32 when I am maintaining the software.
Having the extra define is good in that it allows hardware nastiness to be
hidden. For example, we just fixed an issue about how to write to a
register that effects the PLL stability.
/* Writing to VR_CTL initiates a PLL relock sequence. */
static __inline__ void bfin_write_VR_CTL(unsigned int val) {
unsigned long flags, iwr ;
bfin_write16(VR_CTL,val);
__builtin_bfin_ssync();
/* Enable the PLL Wakeup bit in SIC IWR */
iwr = bfin_read32(SIC_IWR);
/* Only allow PPL Wakeup) */
bfin_write32(SIC_IWR, IWR_ENABLE(0));
local_irq_save(flags);
asm("IDLE;");
local_irq_restore(flags);
bfin_write32(SIC_IWR, iwr);
}
There are a few bits in the register that controls the on-chip MAC - in the
ethernet driver, the person was just writing to the register, without
understanding that it unlocked/re-locked the PLL.
This saves every person who writes to this register from duplicating the
code, and we ensures that it is actually done it properly.
> >Now, there are
> >at least two common methods for how to do this better, both involving
> >the readl/writel low-level accessors (or something similar).
>
> The thing to understand about the Blackfin architecture - is not all
> system register, or peripheral registers are 32 bits. Some are 16
> bits, and some are 32. The Hardware will error if you access a 32 bit
> instruction, with a
> 16 bit access, or a 16 bit access on a 32 bit instruction.
>
> This is why we added:
> bfin_write16(); bfin_read16(); bfin_write32(); bfin_read32();
Sure, that's not unusual at all. Instead of readl(), you can use
readw() for 16 bit accesses. Your bfin_{read,write}{16,32} functions are
fine as well, but you should make the implementation more robust and change
#define bfin_read16(addr) ({
unsigned __v; \
__asm__ __volatile__ ("NOP;\n\t %0 = w[%1] (z);\n\t" \
: "=d"(__v) : "a"(addr)); \
unsigned short)__v; \
})
to
static inline __le16 bfin_read16(const __be16 __iomem *addr) {
__be16 val;
asm volatile("nop; \n\t %0 = w[%1] (z)" : "=d" (val) : "a"(addr);
return val;
}
This adds strict type checking (size, endianess, io address space).
You may prefer to use u16 instead of __be16 here, depending on your needs.
No problem - I can change/update that.
> We can't use a common function, like:
>
> bfin_sica_read(int reg)
>
> or use the read16/read32 directly - it would be to hard for the driver
> developers to keep the manual with them all the time to find out if a
> register was 16 or 32 bits. It would move what we have today (compiler
> errors), to run time errors if someone used the wrong function.
Ok, in the example I picked they were all the same size, so I assumed that
would be the case for most of your register areas.
[snip]
> >
> >The alternative approach uses a structure:
>
> We could use this approach, filling it up with 16 bit padding all over
> the place (which is easy to do), but it is also difficult for the same
reason.
> (Although I really like this, and can see lots of value from it - I am
> not sure we can use it).
>
> You are still using writel(reg, value) and readl(reg) - which is hard
> to do, without a hardware reference beside you all the time - to
> determine which time you should use a readl or reads.
>
> Unless I am completely missing something (which might be true).
Ok, I see your point. If you have different size registers in the area for
a single device, then it is better to use a structure as I showed below.
> >static struct bfin_sica_regs __iomem *bfin_sica = (void __iomem
> >*)0xffc00100ul;
> >
> Although I think the code is smaller, and nicer - it involves more run
> time complexity, and will consume more processor cycles - the old example:
>
> static void bf561_internal_mask_irq(unsigned int irq) {
> unsigned long irq_mask;
> if ((irq - (IRQ_CORETMR + 1)) < 32) {
> irq_mask = (1 << (irq - (IRQ_CORETMR + 1)));
> bfin_write_SICA_IMASK0(bfin_read_SICA_IMASK0() &
> ~irq_mask);
> } else {
> irq_mask = (1 << (irq - (IRQ_CORETMR + 1) - 32));
> bfin_write_SICA_IMASK1(bfin_read_SICA_IMASK1() &
> ~irq_mask);
> }
> }
>
> resolves all addresses at compile time, not run time. So, wouldn't
> your example slow things down?
The run time overhead of this is very small compared to an actual mmio
register access, so you should not notice this at all.
I'm not sure I agree - on machines with tiny cache (only 16k), and high
Core Clocks (600+ MHz), and slow SDRAM clocks (133MHz), extra reads to
SDRAM will kill performance & pollute cache. It is not just the single
cycle read - if the core is writing something into SDRAM, it takes 12 SDRAM
cycles for the SDRAM to turn the bus from a write to a read.
run time overhead is critical to us. People are complaining already that
the overhead of our drivers is too high, and I don't want to add extra
reads to SDRAM if I don't have to.
You can still do the same with a compile time constant by replacing
static struct bfin_sica_regs __iomem *bfin_sica = (void __iomem
*)0xffc00100ul;
from my example with
#define bfin_sica_regs ((struct bfin_sica_regs __iomem*)0xffc00100ul)
That should result in the same object code you had before.
However, I'm used to having a single kernel binary that can run on
multiple hardware versions. Normally this means that you have the same
register layout on every CPU, but on different physical addresses that are
detected at boot time, so I like to avoid hardcoding absolute addresses in
the object code.
People who use our architecture are OK with compiling for a specific platform.
Moreover, if you ever want to run with an MMU, the virtual address of that
device is computed by the ioremap function, like
static struct bfin_sica_regs __iomem *bfin_sica;
void __init bfin_init_sica(void)
{
bfin_sica = ioremap(0xffc00100ul);
}
There are no plans to add a MMU to the Blackfin as is - it would require an
extensive change to the architecture.
But still - I can see the value of it for large scale systems, where people
are not willing to compile a kernel targeted at one specific implementation
- but the people who use this kernel are worse - they compile a kernel for
a specific board, not a just a specific processor. Anything extra - they
don't want it. Anything they need - they normally put it in the kernel, as
to reduce boot time/size.
The processor sells for less than a good cup of coffee. People are selling
complete systems (processor, SDRAM, Flash, etc) for less than $50 (US).
People want to do, and expect to do, all kinds of optimizations when they
are working at this level.
This is the hesitation of adding levels of indirection - any additional
overhead - and people will stop using Linux - but I will try out the
structure, and see how that works.
-Robin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]