NVMe M.2 fails on Icicle

From earlier posts, it looks like others have NVMe storage working. I’m using a PCIe-to-M.2 adapter from Amazon and an M.2 drive, also from Amazon.

Were there any jumpers or firmware/software settings to change, kernel modules to load, or anything else required?

In my case, the device appears to have been enumerated: the controller device /dev/nvme0 and the namespace block device /dev/nvme0n1 have been created.

I’m able to mount /dev/nvme0n1p5 on /mnt, and it mostly works, but there are continuous I/O timeout errors - see the end of the dmesg output below.
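
For anyone checking a similar setup, a few commands that are useful for sanity-checking the enumeration and the drive itself (assuming pciutils and nvme-cli are available in the image; nvme-cli may need to be added to the Yocto build):

lspci -vv -s 01:00.0        # link status, interrupt configuration, and driver binding for the SSD
nvme list                   # controller and namespace as seen by nvme-cli
nvme smart-log /dev/nvme0   # error and media counters reported by the drive itself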

I also noted that this drive, once installed, failed to boot Mint on the funky little SuperMicro baby-server where it was set up. Not sure if that’s a BIOS problem with the server or a defective NVMe drive.

Any thoughts would be appreciated.

root@icicle-kit-es:~# dmesg |grep pci
[ 0.000000] Kernel command line: earlycon=sbi root=/dev/mmcblk0p3 rootwait console=ttyS0,115200n8 uio_pdrv_genirq.of_id=generic-uio pci-hpmemsize=0M libata.force=noncq
[ 0.318749] microchip-pcie 70000000.pcie: host bridge /soc/pcie@70000000 ranges:
[ 0.326183] microchip-pcie 70000000.pcie: Parsing ranges property...
[ 0.326226] microchip-pcie 70000000.pcie: MEM 0x0078000000..0x007bffffff -> 0x0078000000
[ 0.336906] microchip-pcie 70000000.pcie: ECAM at [mem 0x70000000-0x77ffffff] for [bus 00-7f]
[ 0.345558] microchip-pcie 70000000.pcie: PCI host bridge to bus 0000:00
[ 0.352320] pci_bus 0000:00: root bus resource [bus 00-7f]
[ 0.357843] pci_bus 0000:00: root bus resource [mem 0x78000000-0x7bffffff]
[ 0.364702] pci_bus 0000:00: scanning bus
[ 0.364780] pci 0000:00:00.0: [11aa:1556] type 01 class 0x060400
[ 0.370906] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x7fffffff 64bit pref]
[ 0.378228] pci 0000:00:00.0: supports D1 D2
[ 0.382435] pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.389193] pci 0000:00:00.0: PME# disabled
[ 0.390254] pci_bus 0000:00: fixups for bus
[ 0.390277] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[ 0.390297] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.398335] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[ 0.398480] pci_bus 0000:01: scanning bus
[ 0.398537] pci 0000:01:00.0: [1d97:2263] type 00 class 0x010802
[ 0.404531] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 0.411633] pci 0000:01:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x4 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[ 0.446990] pci_bus 0000:01: fixups for bus
[ 0.447008] pci_bus 0000:01: bus scan returning with max=01
[ 0.447030] pci_bus 0000:01: busn_res: [bus 01-7f] end is updated to 01
[ 0.453588] pci_bus 0000:00: bus scan returning with max=01
[ 0.453637] pci 0000:00:00.0: BAR 0: no space for [mem size 0x80000000 64bit pref]
[ 0.461295] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x80000000 64bit pref]
[ 0.469295] pci 0000:00:00.0: BAR 8: assigned [mem 0x78000000-0x781fffff]
[ 0.476119] pci 0000:00:00.0: BAR 9: assigned [mem 0x78200000-0x783fffff 64bit pref]
[ 0.483878] pci 0000:00:00.0: BAR 7: no space for [io size 0x1000]
[ 0.490258] pci 0000:00:00.0: BAR 7: failed to assign [io size 0x1000]
[ 0.496951] pci 0000:01:00.0: BAR 0: assigned [mem 0x78000000-0x78003fff 64bit]
[ 0.504292] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.509361] pci 0000:00:00.0: bridge window [mem 0x78000000-0x781fffff]
[ 0.516246] pci 0000:00:00.0: bridge window [mem 0x78200000-0x783fffff 64bit pref]
[ 0.791112] nvme nvme0: pci function 0000:01:00.0
[ 0.796349] pci 0000:00:00.0: enabling device (0000 -> 0002)
[ 0.802053] pci 0000:00:00.0: enabling bus mastering
[ 0.850360] ehci-pci: EHCI PCI platform driver
[ 0.866633] ohci-pci: OHCI PCI platform driver
[ 1.367298] pci-hpmemsize=0M
root@icicle-kit-es:~# dmesg |grep nvme
[ 0.790579] nvme 0000:01:00.0: assign IRQ: got 49
[ 0.791112] nvme nvme0: pci function 0000:01:00.0
[ 0.809104] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 0.814871] nvme 0000:01:00.0: enabling bus mastering
[ 0.819565] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x22631d97)
[ 0.819588] nvme 0000:01:00.0: saving config space at offset 0x4 (reading 0x100406)
[ 0.819605] nvme 0000:01:00.0: saving config space at offset 0x8 (reading 0x1080203)
[ 0.819622] nvme 0000:01:00.0: saving config space at offset 0xc (reading 0x0)
[ 0.819639] nvme 0000:01:00.0: saving config space at offset 0x10 (reading 0x78000004)
[ 0.819656] nvme 0000:01:00.0: saving config space at offset 0x14 (reading 0x0)
[ 0.819672] nvme 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[ 0.819690] nvme 0000:01:00.0: saving config space at offset 0x1c (reading 0x0)
[ 0.819707] nvme 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
[ 0.819723] nvme 0000:01:00.0: saving config space at offset 0x24 (reading 0x0)
[ 0.819741] nvme 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[ 0.819758] nvme 0000:01:00.0: saving config space at offset 0x2c (reading 0x22631d97)
[ 0.819775] nvme 0000:01:00.0: saving config space at offset 0x30 (reading 0x0)
[ 0.819791] nvme 0000:01:00.0: saving config space at offset 0x34 (reading 0x40)
[ 0.819808] nvme 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[ 0.819825] nvme 0000:01:00.0: saving config space at offset 0x3c (reading 0x131)
[ 0.936942] nvme nvme0: missing or invalid SUBNQN field.
[ 0.946571] nvme nvme0: allocated 64 MiB host memory buffer.
[ 0.954806] nvme nvme0: 4/0/0 default/read/poll queues
[ 0.988176] nvme0n1: p1 p2 < p5 >
[ 35.716280] nvme nvme0: I/O 97 QID 2 timeout, completion polled
[ 65.796318] nvme nvme0: I/O 66 QID 2 timeout, completion polled
[ 65.802332] nvme nvme0: I/O 476 QID 3 timeout, completion polled
[ 95.876298] nvme nvme0: I/O 483 QID 3 timeout, completion polled
[ 95.882395] nvme nvme0: I/O 321 QID 4 timeout, completion polled
[ 125.956438] nvme nvme0: I/O 328 QID 4 timeout, completion polled
[ 126.044332] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null)
[ 229.636325] nvme nvme0: I/O 493 QID 3 timeout, completion polled
[ 309.636285] nvme nvme0: I/O 106 QID 2 timeout, completion polled
[ 339.716274] nvme nvme0: I/O 108 QID 2 timeout, completion polled
[ 371.076278] nvme nvme0: I/O 67 QID 2 timeout, completion polled
[ 401.796282] nvme nvme0: I/O 113 QID 2 timeout, completion polled
[ 433.156309] nvme nvme0: I/O 81 QID 2 timeout, completion polled
[ 437.636355] nvme nvme0: I/O 456 QID 3 timeout, completion polled
[ 508.676277] nvme nvme0: I/O 268 QID 1 timeout, completion polled
root@icicle-kit-es:~#

Hi @taiowa,
I’m not aware of this specific device having been tested. The PCI enumeration looks correct.
It seems like the I/O queue interrupts are not arriving. Did you maybe check it on other platforms (different from the server)?
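
One way to confirm that is to watch the per-queue interrupt counts while generating I/O; if the counts stay flat while the "completion polled" messages accumulate, the completions are landing but the interrupts are not being delivered. A sketch (the exact names in /proc/interrupts depend on the kernel build):

grep -i nvme /proc/interrupts                # one line per allocated vector, with a fire count per CPU
# or, to watch it live while copying files to the mount (if watch is in the image):
watch -n 1 'grep -i nvme /proc/interrupts'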

@LDC

Thanks for the reply, Leonardo.

Yes, that was the first thing to check off the list the next day. The M.2 SSD was tested inside the baby server, both in the internal M.2 socket and attached to the PCIe-to-M.2 adapter that I used with the Icicle. In both cases, thousands of MB were written and read back without delay or error, and at the expected data rates. My only other M.2 platform is this laptop, which is off limits.
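
For anyone repeating that cross-check, a direct-I/O write, cache drop, and read-back compare along these lines is enough to exercise the drive (a sketch; adjust the mount point and size to taste):

dd if=/dev/urandom of=/mnt/testfile bs=1M count=2048 oflag=direct status=progress
sha256sum /mnt/testfile             # checksum straight after the write
echo 3 > /proc/sys/vm/drop_caches   # make sure the read-back actually hits the drive
sha256sum /mnt/testfile             # should match the first checksum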

Another observation is that the PolarFire core is operating at 1.05 V, not 1.00 V.

Best,
Will

Hi Will,
I would investigate the MSI interrupts, which are currently configured as MSI1. Not sure if that can be increased transparently. I would also try the latest nvme driver to see if anything changes.
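
For reference, the negotiated interrupt mode and vector count can be read back without rebuilding anything, and the NVMe I/O timeout can be raised as a stopgap while debugging (a sketch; the 255-second value is only an example):

lspci -vv -s 01:00.0 | grep -i msi                # MSI/MSI-X capabilities with their Enable and Count fields
cat /sys/module/nvme_core/parameters/io_timeout   # current I/O timeout in seconds (the default is 30)
# to raise it at boot, append to the kernel command line, e.g.:
#   nvme_core.io_timeout=255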

Just wanted to put this one to bed in case others are playing along…

The M.2 drive from TeamGroup via Amazon was the culprit here. It did, and still does, work fine on other systems, but it does not work well with the latest (2021.04) build of the MSS and Yocto on the Icicle.

In contrast, the XLR8 250GB drive from PNY does not exhibit this error.