Server segfault with NVIDIA proprietary drivers CENTOS8


  • #28547
    vladz
    Participant

    Server machine: CLEVO laptop, i7, NVIDIA GEFORCE 870 GPU, 64-bit. Dual display (additional external monitor).

    Multiboots to

    CENTOS 8 GNOME 3.32.2 with NVIDIA proprietary GPU drivers for linux

    CENTOS 8 GNOME 3.32.2 with nouveau GPU drivers for linux

    FEDORA 32 GNOME with  nouveau  GPU drivers for linux

    Windows 10 with proprietary GPU drivers.

    NoMachine v6.11.2_1 client and server on all OS instances (free version)

    NVIDIA driver version 430.50

    —————————————————————————————————————–

    I can connect remotely to this machine when it is booted into CENTOS 8/nouveau or FEDORA 32/nouveau, from 32-bit and 64-bit clients running Mint/Cinnamon and Mageia 7/Plasma respectively, both using the above version of Player. Both of these client laptops (AMD and Intel graphics respectively) also work equally well as servers, and I can connect to them from the CLEVO machine running CENTOS 8 with the proprietary NVIDIA drivers.

    If I try to connect into CENTOS 8/proprietary, the same clients say iNX has connected but present a white screen (with no scroll bars). The server acknowledges the connection, then segfaults. The OS reports the presence of a connected session on shutdown.

    The Windows 10 attempt produces a white screen on the client. I mention this for interest, but I am not reporting it as a problem in this post.

    I have tried some of the workarounds, such as connecting using TCP and switching to H.264.

    None has helped.

    These cases have all been connections to a physical desktop.

    Next I attempted to install VirtualGL support on the CENTOS 8 (proprietary GPU drivers) server (per AR05P00982).

    This proved difficult: the installation script failed, requesting removal of the nvidia_modeset module. This could not be done directly via rmmod because the module was reported as busy.

    rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset

    I think I removed these modules from init 3, but they seemed to reappear? I need to look into this further.
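
The rmmod message above names the modules that still hold a reference to nvidia, so those have to be unloaded before nvidia itself. A minimal Python sketch of reading that message, assuming the error format shown above; the function name `blocking_modules` is made up for illustration. The listed modules may in turn be held by other modules or by a running display server, which this sketch does not check.

```python
import re

def blocking_modules(rmmod_error):
    """Parse 'Module X is in use by: A B' and return (X, [A, B])."""
    m = re.search(r"Module (\S+) is in use by: (.+)", rmmod_error)
    if m is None:
        return None
    return m.group(1), m.group(2).split()

# Error text copied from the rmmod attempt above.
msg = "rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset"
module, holders = blocking_modules(msg)

# Dependent modules go first, the module they depend on goes last.
unload_order = holders + [module]
print(unload_order)
```

Each name in the resulting list would then be passed to rmmod (or modprobe -r) in that order, with the graphical session stopped first.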

     

    Below an extract from nxd.log

    Info: Server started with pid 9275 on Fri Jul 17 15:18:51 2020.
    Info: Listening for connections on any interface on port 4000.
    Info: Accepting connections from any host with encryption enabled.
    Info: Connection from 192.168.0.40 port 34842 accepted on Fri Jul 17 15:21:19 2020.
    Info: Connection from 192.168.0.40 port 34842 process 11347 started on Fri Jul 17 15:21:19 2020.
    Info: Connection from 192.168.0.40 port 34842 process 11347 closed on Fri Jul 17 15:25:41 2020.

     

    Extract from nxserver.log

    2020-07-17 15:18:50 848.111  9143 NXSERVER Starting LS 6.11.2 and services.
    2020-07-17 15:18:50 848.217  9143 NXSERVER System information: CentOS Linux release 8.1.1911 (Core), standalone.
    2020-07-17 15:21:29 837.983 11347 NXSERVER User 'vlad' logged in from '192.168.0.40' using authentication method NX-password.
    2020-07-17 15:22:19 788.646  9143 NXSERVER WARNING! Process '/usr/NX/bin/nxexec --node --user vlad --priority realtime --mode 0 --pid 12' with pid '9274/9274' finished with exit code 11 after 208,728 seconds.
    2020-07-17 15:22:19 788.780  9143 NXSERVER WARNING! 75AFD15E527042D652826A9948D778A0 nxnode died with exit code 11.
    2020-07-17 15:22:19 854.978  9143 NXSERVER WARNING! Session 75AFD15E527042D652826A9948D778A0 crashed on display 1.
    2020-07-17 15:22:22 687.095  9143 NXSERVER WARNING! Process '/usr/NX/bin/nxexec --node --user vlad --priority realtime --mode 0 --pid 14' with pid '12290/12290' finished with exit code 11 after 2,826 seconds.
    2020-07-17 15:22:22 687.254  9143 NXSERVER WARNING! 4EFB0AA9FCCF3E5FD00F00AEB8526B6F nxnode died with exit code 11.
    2020-07-17 15:22:22 775.408  9143 NXSERVER WARNING! Session 4EFB0AA9FCCF3E5FD00F00AEB8526B6F crashed on display 1.

    SEGFAULT from dmesg

    [17832.398181] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 237
    [17832.449716] nxnode.bin[12075]: segfault at e ip 00007ffa5e0880c0 sp 00007ffa8def65e0 error 4 in libnvcuvid.so.430.50[7ffa5e047000+312000]
    [17832.449720] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17835.338677] nxnode.bin[12674]: segfault at e ip 00007f32384fd0c0 sp 00007f3278ef45e0 error 4 in libnvcuvid.so.430.50[7f32384bc000+312000]
    [17835.338682] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17837.566321] nxnode.bin[13095]: segfault at e ip 00007f731366d0c0 sp 00007f73437f95e0 error 4 in libnvcuvid.so.430.50[7f731362c000+312000]
    [17837.566325] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17839.803160] nxnode.bin[13545]: segfault at e ip 00007f66a87240c0 sp 00007f66d3ffa5e0 error 4 in libnvcuvid.so.430.50[7f66a86e3000+312000]
    [17839.803165] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17842.086833] nxnode.bin[14047]: segfault at e ip 00007f054c0d90c0 sp 00007f0573ffa5e0 error 4 in libnvcuvid.so.430.50[7f054c098000+312000]
    [17842.086838] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17844.338016] nxnode.bin[14553]: segfault at e ip 00007f545974b0c0 sp 00007f5485ff65e0 error 4 in libnvcuvid.so.430.50[7f545970a000+312000]
    [17844.338023] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68 01 00
    [17846.571330] nxnode.bin[15054]: segfault at e ip 00007f2b4a77b0c0 sp 00007f2b76ff85e0 error 4 in libnvcuvid.so.430.50[7f2b4a73a000+312000]
    [17846.571335] Code: 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00 00 c3 90 90 90 90 90 90 90 90 90 f6 07 08 53 48 89 fb 75 10 48 8b 97 50 01 00 00 <0f> b7 42 0e 66 ff c0 74 07 5b 31 c0 c3 0f 1f 00 48 8b 87 68
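
The segfault lines above can be compared with a short script. This is a minimal Python sketch, assuming the dmesg line format shown in this extract; the function name `fault_offsets` is made up for illustration. It subtracts the library's mapping base (the number before the `+`) from the instruction pointer, which gives the offset of the faulting instruction inside libnvcuvid.so.430.50.

```python
import re

# Matches the kernel segfault line format seen in the dmesg extract above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offsets(dmesg_text):
    """Return (pid, library, offset) for each segfault line found."""
    results = []
    for m in SEGFAULT_RE.finditer(dmesg_text):
        # Offset of the faulting instruction relative to the library mapping.
        offset = int(m["ip"], 16) - int(m["base"], 16)
        results.append((int(m["pid"]), m["lib"], hex(offset)))
    return results

# Two lines copied from the dmesg extract above.
sample = """\
[17832.449716] nxnode.bin[12075]: segfault at e ip 00007ffa5e0880c0 sp 00007ffa8def65e0 error 4 in libnvcuvid.so.430.50[7ffa5e047000+312000]
[17835.338677] nxnode.bin[12674]: segfault at e ip 00007f32384fd0c0 sp 00007f3278ef45e0 error 4 in libnvcuvid.so.430.50[7f32384bc000+312000]
"""

for pid, lib, offset in fault_offsets(sample):
    print(pid, lib, offset)
```

Both sample lines give offset 0x410c0, and the same arithmetic on the other five segfault lines gives the same value. Every crash therefore lands on the same instruction in libnvcuvid.so.430.50, reading from address 0xe.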

    My specific need is to get CENTOS 8/proprietary to work with NoMachine server so that I can remotely connect into it with full multimedia capability, including audio, as that machine is primarily used for multimedia work. The NVIDIA proprietary drivers are essential, as some applications on that machine require them.

    My last resort might be to revert to nouveau, install VirtualGL support then install the proprietary Nvidia drivers. Would that work?

    Otherwise, is there a proven work-around for connecting a NoMachine client to a server on Linux that uses Nvidia proprietary drivers with a GEFORCE graphics card?

    If not, should I submit this as a trouble report?

    Thanks for any help.

     

    #28607
    fra81
    Moderator

    Hi,

    VirtualGL wouldn’t help in your case, as it is intended for virtual desktop sessions that are supported in the NoMachine Terminal Server product family. If you can find a core file or a crash report from those segfaults, it would be very useful to gather a backtrace from them (see https://www.nomachine.com/AR09L00810). Also logs can be useful (see https://www.nomachine.com/AR10K00697).

    #28633
    vladz
    Participant

    Hi,

    thank you for your reply.  I did not know about GDM at the time.

    As I needed to resolve this quickly, I removed the NVIDIA drivers (originally installed from first principles) and re-enabled nouveau. After this was done, NoMachine worked as expected in all respects. I then re-installed the NVIDIA drivers using the preferred method for CENTOS 8, kmod-nvidia from elrepo. This process went smoothly, and after a reboot the NoMachine server was again functioning faultlessly.

    Backup/restores have destroyed the logs associated with the original problem. However, I have to conclude that the problems from my earlier post were due to a messy installation of NVIDIA drivers and nothing to do with NoMachine, which is a perfect product for my purposes.

    Thank you for explaining the place of VirtualGL. I will remove this.

    How would I mark this post as solved?

    Thank you.

    Vladz



This topic was marked as solved, you can't post.