[PATCH 09/10] IOCHK interface for I/O error handling/detecting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[This is 9 of 10 patches, "iochk-09-cpeh.patch"]

- SAL behavior doesn't affect only MCA. There are other
  chances to call SAL_GET_STATE_INFO, that's when CMC, CPE,
  and INIT is happen.

   - CMC(Corrected Machine Check) is for non-fatal, processor
     local errors. Fortunately, calling SAL_GET_STATE_INFO for
     CMC only collect data from a processor issued it, without
     touching any bridge and its status. So, this is safe.

   - CPE(Corrected Platform Error) is for non-fatal, platform
     related errors. Even it says corrected, but calling
     SAL procedure for CPE touchs every bridge on the platform,
     and "correct" bridge status that's bad for iochk works.

   - INIT is a kind of system reset request, as far as I know.
     So restarting from INIT is out of design, also iochk after
     INIT is not required at this time.

  In short, only MCA and CPE have the problem of SAL behavior.
  One of the difference from MCA is that SAL will not gather
  data before OS actually request it.

  MCA:
      1) SAL gathers data and keep it internally
      2) OS gets control
      3) if OS requests, SAL returns data gathered at beginning.
  CPE:
      1) OS gets control
      2) OS request to SAL
      3) SAL gathers data and return it to OS

  Therefore, we can make CPE handler to care bridge states,
  to check states before calling SAL procedure.

Signed-off-by: Hidetoshi Seto <[email protected]>

---

 arch/ia64/kernel/mca.c      |   21 +++++++++++++++++++++
 arch/ia64/lib/iomap_check.c |   17 +++++++++++++++++
 2 files changed, 38 insertions(+)

Index: linux-2.6.11.11/arch/ia64/lib/iomap_check.c
===================================================================
--- linux-2.6.11.11.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.11.11/arch/ia64/lib/iomap_check.c
@@ -19,6 +19,7 @@ static int have_error(struct pci_dev *de

 void notify_bridge_error(struct pci_dev *bridge);
 void clear_bridge_error(struct pci_dev *bridge);
+void save_bridge_error(void);

 void iochk_init(void)
 {
@@ -143,6 +144,22 @@ void clear_bridge_error(struct pci_dev *
 	}
 }

+void save_bridge_error(void)
+{
+	iocookie *cookie;
+
+	if (list_empty(&iochk_devices))
+		return;
+
+	/* mark devices if its root bus bridge have errors */
+	list_for_each_entry(cookie, &iochk_devices, list) {
+		if (cookie->error)
+			continue;
+		if (have_error(cookie->host))
+			notify_bridge_error(cookie->host);
+	}
+}
+
 EXPORT_SYMBOL(iochk_read);
 EXPORT_SYMBOL(iochk_clear);
 EXPORT_SYMBOL(iochk_devices);	/* for MCA driver */
Index: linux-2.6.11.11/arch/ia64/kernel/mca.c
===================================================================
--- linux-2.6.11.11.orig/arch/ia64/kernel/mca.c
+++ linux-2.6.11.11/arch/ia64/kernel/mca.c
@@ -80,6 +80,8 @@
 #ifdef CONFIG_IOMAP_CHECK
 #include <linux/pci.h>
 extern void notify_bridge_error(struct pci_dev *bridge);
+extern void save_bridge_error(void);
+extern spinlock_t iochk_lock;
 #endif

 #if defined(IA64_MCA_DEBUG_INFO)
@@ -288,11 +290,30 @@ ia64_mca_cpe_int_handler (int cpe_irq, v
 	IA64_MCA_DEBUG("%s: received interrupt vector = %#x on CPU %d\n",
 		       __FUNCTION__, cpe_irq, smp_processor_id());

+#ifndef CONFIG_IOMAP_CHECK
+
 	/* SAL spec states this should run w/ interrupts enabled */
 	local_irq_enable();

 	/* Get the CPE error record and log it */
 	ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE);
+#else
+	/*
+	 * Because SAL_GET_STATE_INFO for CPE might clear bridge states
+	 * in process of gathering error information from the system,
+	 * we should check the states before clearing it.
+	 * While OS and SAL are handling bridge status, we have to protect
+	 * the states from changing by any other I/Os running simultaneously,
+	 * so this should be handled w/ lock and interrupts disabled.
+	 */
+	spin_lock(&iochk_lock);
+	save_bridge_error();
+	ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE);
+	spin_unlock(&iochk_lock);
+
+	/* Rests can go w/ interrupt enabled as usual */
+	local_irq_enable();
+#endif

 	spin_lock(&cpe_history_lock);
 	if (!cpe_poll_enabled && cpe_vector >= 0) {

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux