Client froze, not able to reconnect to server


Viewing 8 posts - 1 through 8 (of 8 total)
  • #19042
    7777777
    Participant

    Here is what happened: I have been using NoMachine for about 1-2 months to connect from a Windows machine to a Linux machine, and it worked beautifully. However, today I went for a coffee, and when I came back the client was frozen. The server machine itself is fine: I can still connect to it via ssh. Attempting to reconnect to the server gives me one of two errors: either “Error is 138: Connection timed out.” or “Connection reset by peer”.

    Here is the recent content of /usr/NX/var/log/nxerror.log on the server side:

    Info: Handler started with pid 2388042 on Thu Jul 12 10:29:22 2018.

    Info: Handling connection from AAA.BBB.CCC.DDD port 57741 on Thu Jul 12 10:29:22 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 57741 closed on Thu Jul 12 10:29:30 2018.

    Info: Handler with pid 2388042 terminated on Thu Jul 12 10:29:30 2018.

    Info: Handler started with pid 869208 on Wed Jul 18 10:42:32 2018.

    Info: Handling connection from AAA.BBB.CCC.DDD port 52396 on Wed Jul 18 10:42:32 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 52396 closed on Wed Jul 18 10:42:47 2018.

    Info: Handler with pid 869208 terminated on Wed Jul 18 10:42:47 2018.

    Info: Handler started with pid 869396 on Wed Jul 18 10:43:19 2018.

    Info: Handling connection from AAA.BBB.CCC.DDD port 52421 on Wed Jul 18 10:43:19 2018.

    kill: sending signal to 869261 failed: No such process

    Usage: nxnode …

    this command is intended to be used only by NX Server

    or NX Workstation.

    Info: Connection from AAA.BBB.CCC.DDD port 52421 closed on Wed Jul 18 10:43:27 2018.

    Info: Handler with pid 869396 terminated on Wed Jul 18 10:43:27 2018.

    Info: Handler started with pid 869916 on Wed Jul 18 10:43:58 2018.

    Info: Handling connection from AAA.BBB.CCC.DDD port 58827 on Wed Jul 18 10:43:58 2018.

    kill: sending signal to 869688 failed: No such process

    Usage: nxnode …

    this command is intended to be used only by NX Server

    or NX Workstation.

    Info: Connection from AAA.BBB.CCC.DDD port 58827 closed on Wed Jul 18 10:44:05 2018.

    Info: Handler with pid 869916 terminated on Wed Jul 18 10:44:05 2018.

    Info: Handler started with pid 870367 on Wed Jul 18 10:44:37 2018.

    Info: Handling connection from AAA.BBB.CCC.DDD port 58851 on Wed Jul 18 10:44:37 2018.

    kill: sending signal to 869984 failed: No such process

    Usage: nxnode …

    this command is intended to be used only by NX Server

    or NX Workstation.

    Info: Connection from AAA.BBB.CCC.DDD port 58851 closed on Wed Jul 18 10:44:44 2018.

    Info: Handler with pid 870367 terminated on Wed Jul 18 10:44:44 2018.

    After that there are many more blocks, from “Info: Handler started with pid…” to “Info: Handler with pid ******* terminated…”, one for each of my login attempts.

    Here are the corresponding parts of my /usr/NX/var/log/nxd.log:

    Info: Connection from AAA.BBB.CCC.DDD port 57741 accepted on Thu Jul 12 10:29:21 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 57741 process 2388042 started on Thu Jul 12 10:29:21 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 52396 accepted on Wed Jul 18 10:42:30 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 52396 process 869208 started on Wed Jul 18 10:42:30 2018.

    747630 2388041 10:42:44 525.845 DaemonServerApplication/DaemonApplication: WARNING! Child process 2388042 died with signal 9, SIGKILL.

    Warning: Handler process 2388042 died with signal 9, SIGKILL.

    Info: Connection from AAA.BBB.CCC.DDD port 57741 process 2388042 closed on Wed Jul 18 10:42:44 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 52421 accepted on Wed Jul 18 10:43:18 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 52421 process 869396 started on Wed Jul 18 10:43:18 2018.

    747630 869207 10:43:26 307.408 DaemonServerApplication/DaemonApplication: WARNING! Child process 869208 died with signal 9, SIGKILL.

    Warning: Handler process 869208 died with signal 9, SIGKILL.

    Info: Connection from AAA.BBB.CCC.DDD port 52396 process 869208 closed on Wed Jul 18 10:43:26 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 58827 accepted on Wed Jul 18 10:43:57 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 58827 process 869916 started on Wed Jul 18 10:43:57 2018.

    747630 869395 10:44:04 235.704 DaemonServerApplication/DaemonApplication: WARNING! Child process 869396 died with signal 9, SIGKILL.

    Warning: Handler process 869396 died with signal 9, SIGKILL.

    Info: Connection from AAA.BBB.CCC.DDD port 52421 process 869396 closed on Wed Jul 18 10:44:04 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 58851 accepted on Wed Jul 18 10:44:36 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 58851 process 870367 started on Wed Jul 18 10:44:36 2018.

    747630 869915 10:44:42 727.645 DaemonServerApplication/DaemonApplication: WARNING! Child process 869916 died with signal 9, SIGKILL.

    Warning: Handler process 869916 died with signal 9, SIGKILL.

    Info: Connection from AAA.BBB.CCC.DDD port 58827 process 869916 closed on Wed Jul 18 10:44:42 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 61465 accepted on Wed Jul 18 10:45:14 2018.

    Info: Connection from AAA.BBB.CCC.DDD port 61465 process 870696 started on Wed Jul 18 10:45:14 2018.

    747630 870366 10:45:20 998.629 DaemonServerApplication/DaemonApplication: WARNING! Child process 870367 died with signal 9, SIGKILL.

    Warning: Handler process 870367 died with signal 9, SIGKILL.

    Info: Connection from AAA.BBB.CCC.DDD port 58851 process 870367 closed on Wed Jul 18 10:45:20 2018.

    Please help me understand what is going on and how to fix it. Thank you for your help!

    #19053
    Haven
    Contributor

    Hello 7777777,
    It looks like nxserver processes that should handle the connection were killed:
    Warning: Handler process 869396 died with signal 9, SIGKILL.
    Warning: Handler process 869916 died with signal 9, SIGKILL.
    Warning: Handler process 870367 died with signal 9, SIGKILL.
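
    The warnings above can also be pulled out of a longer nxd.log automatically. This is a minimal sketch, assuming the nxd.log line format quoted in this thread; `killed_handlers` is a hypothetical helper, not a NoMachine tool. It deliberately ignores the DaemonApplication “Child process … died” lines, which report the same PIDs.

```python
import re

# Matches nxd.log lines such as
# "Warning: Handler process 869396 died with signal 9, SIGKILL."
HANDLER_DIED = re.compile(
    r"Warning: Handler process (\d+) died with signal (\d+), (\w+)\."
)


def killed_handlers(lines):
    """Return (pid, signal number, signal name) for each handler killed by a signal."""
    found = []
    for line in lines:
        match = HANDLER_DIED.search(line)
        if match:
            pid, signum, signame = match.groups()
            found.append((int(pid), int(signum), signame))
    return found
```

    For example, `killed_handlers(open("/usr/NX/var/log/nxd.log"))` would return one `(pid, 9, "SIGKILL")` tuple per killed handler in a log like the one posted above (reading the file may require root).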

    We need nxserver.log to check the possible cause, ideally with debug logging enabled.
    Please follow the instructions here:
    https://www.nomachine.com/AR07K00677

    Please also check system logs for possible issues with those processes.

    #19055
    7777777
    Participant

    Below is the corresponding section of nxserver.log. The problem happened twice yesterday. Each time, the solution was to kill the nxnode.bin process; after that I was able to log in again by creating a new session. It might somehow be caused by multiprocessing plotting with Python matplotlib: I was doing that both times the freeze happened.
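
    A minimal sketch for finding the process to kill, assuming a Linux server where the process name appears as `nxnode.bin` in `/proc/<pid>/comm` (which holds up to 15 characters, enough for this name). It lists PIDs much like `pgrep -x nxnode.bin` would and deliberately does not kill anything; `pids_by_name` is a hypothetical helper.

```python
import os


def pids_by_name(name, proc="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited meanwhile, or comm is unreadable
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

    Signalling a PID is then a separate, deliberate step, for example `os.kill(pid, signal.SIGTERM)` run as root or as the session owner; this may also end the applications running in that session.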

    2018-07-12 10:29:27 858.909 2388042 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-13 10:17:24 329.396 747572 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec /usr/NX/scripts/restricted/nxupdate.sh abbipub background check’ with pid ‘2825183/2825183’ finished with exit code 110 after 90,629 seconds.

    2018-07-15 10:17:40 015.121 747572 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec /usr/NX/scripts/restricted/nxupdate.sh abbipub background check’ with pid ‘3725972/3725972’ finished with exit code 110 after 91,225 seconds.

    2018-07-17 10:17:57 018.808 747572 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec /usr/NX/scripts/restricted/nxupdate.sh abbipub background check’ with pid ‘419205/419205’ finished with exit code 110 after 90,609 seconds.

    2018-07-18 10:42:38 880.964 869208 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-18 10:42:44 247.206 869208 NXSERVER WARNING! Session 24C3934C79F03054F1ED0198336CAA48 reached timeout during termination.

    2018-07-18 10:42:44 248.430 869208 NXSERVER WARNING! Killing session 24C3934C79F03054F1ED0198336CAA48 nxnode with PID 2388065.

    2018-07-18 10:42:44 372.414 2388042 NXSERVER WARNING! Connection to the 24C3934C79F03054F1ED0198336CAA48 nxnode closed unexpectedly.

    2018-07-18 10:42:44 435.893 869208 NXSERVER WARNING! Session 24C3934C79F03054F1ED0198336CAA48 server with PID 2388042 has been killed.

    2018-07-18 10:43:19 995.016 869396 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-18 10:43:25 239.512 869396 NXSERVER WARNING! Session 960784948E1967BABD7EDFAFD82CA749 reached timeout during termination.

    2018-07-18 10:43:25 240.716 869396 NXSERVER WARNING! Killing session 960784948E1967BABD7EDFAFD82CA749 nxnode with PID 869268.

    2018-07-18 10:43:25 521.480 869208 NXSERVER WARNING! Connection to the 960784948E1967BABD7EDFAFD82CA749 nxnode closed unexpectedly.

    2018-07-18 10:43:25 536.241 869208 NXSERVER ERROR! Status ‘Failed’ for ‘started’ session with Id: ‘960784948E1967BABD7EDFAFD82CA749’

    2018-07-18 10:43:25 650.791 869208 NXSERVER User ‘myuser’ from ‘AAA.BBB.CCC.DDD’ logged out.

    2018-07-18 10:43:25 651.642 869208 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec –node –user myuser –priority realtime –mode 0 –pid 15 -H 5’ with pid ‘869261/869261’ finished with exit code 9 after 39,905 seconds.

    2018-07-18 10:43:25 651.756 869208 NXSERVER WARNING! NXExec process 869261 finished abnormally with exit code 9.

    2018-07-18 10:43:25 804.629 869396 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec nxkill.sh 869261 9 ‘ with pid ‘869557/869557’ finished with exit code 1 after 0,259 seconds.

    2018-07-18 10:43:25 831.781 869396 NXSERVER WARNING! Session 960784948E1967BABD7EDFAFD82CA749 server with PID 869208 has been killed.

    2018-07-18 10:43:26 459.336 869658 NXNODE WARNING! Cannot write to FD#1.

    2018-07-18 10:43:26 459.449 869658 NXNODE WARNING! Error is: 32, ‘Broken pipe’.

    2018-07-18 10:43:26 459.548 869658 NXNODE ERROR! Failed to send message: NX> 1000 NoMachine Node – Version 6.2.4: write to FD#1 failed: Broken pipe

    2018-07-18 10:43:58 845.532 869916 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-18 10:44:04 037.228 869916 NXSERVER WARNING! Session 799256F9BEFC2821F39529D9F719F9E6 reached timeout during termination.

    2018-07-18 10:44:04 038.675 869916 NXSERVER WARNING! Killing session 799256F9BEFC2821F39529D9F719F9E6 nxnode with PID 869740.

    2018-07-18 10:44:04 090.591 869396 NXSERVER WARNING! Connection to the 799256F9BEFC2821F39529D9F719F9E6 nxnode closed unexpectedly.

    2018-07-18 10:44:04 121.412 869396 NXSERVER ERROR! Status ‘Failed’ for ‘started’ session with Id: ‘799256F9BEFC2821F39529D9F719F9E6’

    2018-07-18 10:44:04 146.607 869396 NXSERVER User ‘myuser’ from ‘AAA.BBB.CCC.DDD’ logged out.

    2018-07-18 10:44:04 147.447 869396 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec –node –user myuser –priority realtime –mode 0 –pid 16 -H 5’ with pid ‘869688/869688’ finished with exit code 9 after 38,062 seconds.

    2018-07-18 10:44:04 147.583 869396 NXSERVER WARNING! NXExec process 869688 finished abnormally with exit code 9.

    2018-07-18 10:44:04 175.569 869916 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec nxkill.sh 869688 9 ‘ with pid ‘869949/869949’ finished with exit code 1 after 0,090 seconds.

    2018-07-18 10:44:04 189.393 869916 NXSERVER WARNING! Session 799256F9BEFC2821F39529D9F719F9E6 server with PID 869396 has been killed.

    2018-07-18 10:44:05 349.451 869973 NXNODE WARNING! Cannot write to FD#1.

    2018-07-18 10:44:05 349.586 869973 NXNODE WARNING! Error is: 32, ‘Broken pipe’.

    2018-07-18 10:44:05 349.675 869973 NXNODE ERROR! Failed to send message: NX> 1000 NoMachine Node – Version 6.2.4: write to FD#1 failed: Broken pipe

    2018-07-18 10:44:37 314.630 870367 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-18 10:44:42 540.904 870367 NXSERVER WARNING! Session 3959479F1CB5AC18ED88C738520D864F reached timeout during termination.

    2018-07-18 10:44:42 542.619 870367 NXSERVER WARNING! Killing session 3959479F1CB5AC18ED88C738520D864F nxnode with PID 870004.

    2018-07-18 10:44:42 590.532 869916 NXSERVER WARNING! Connection to the 3959479F1CB5AC18ED88C738520D864F nxnode closed unexpectedly.

    2018-07-18 10:44:42 615.176 869916 NXSERVER ERROR! Status ‘Failed’ for ‘started’ session with Id: ‘3959479F1CB5AC18ED88C738520D864F’

    2018-07-18 10:44:42 664.803 870367 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec nxkill.sh 869984 9 ‘ with pid ‘870399/870399’ finished with exit code 1 after 0,073 seconds.

    2018-07-18 10:44:42 679.722 869916 NXSERVER User ‘myuser’ from ‘AAA.BBB.CCC.DDD’ logged out.

    2018-07-18 10:44:42 680.838 869916 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec –node –user myuser –priority realtime –mode 0 –pid 16 -H 5’ with pid ‘869984/869984’ finished with exit code 9 after 38,356 seconds.

    2018-07-18 10:44:42 680.994 869916 NXSERVER WARNING! NXExec process 869984 finished abnormally with exit code 9.

    2018-07-18 10:44:42 690.268 870367 NXSERVER WARNING! Session 3959479F1CB5AC18ED88C738520D864F server with PID 869916 has been killed.

    2018-07-18 10:44:43 393.819 870424 NXNODE WARNING! Cannot write to FD#1.

    2018-07-18 10:44:43 393.930 870424 NXNODE WARNING! Error is: 32, ‘Broken pipe’.

    2018-07-18 10:44:43 394.030 870424 NXNODE ERROR! Failed to send message: NX> 1000 NoMachine Node – Version 6.2.4: write to FD#1 failed: Broken pipe

    2018-07-18 10:45:15 667.082 870696 NXSERVER User ‘myuser’ logged in from ‘AAA.BBB.CCC.DDD’ using authentication method password.

    2018-07-18 10:45:20 843.809 870696 NXSERVER WARNING! Session 1B89660A25493C78E6EB2EBA51DF2871 reached timeout during termination.

    2018-07-18 10:45:20 845.016 870696 NXSERVER WARNING! Killing session 1B89660A25493C78E6EB2EBA51DF2871 nxnode with PID 870466.

    2018-07-18 10:45:20 888.091 870367 NXSERVER WARNING! Connection to the 1B89660A25493C78E6EB2EBA51DF2871 nxnode closed unexpectedly.

    2018-07-18 10:45:20 908.204 870367 NXSERVER ERROR! Status ‘Failed’ for ‘started’ session with Id: ‘1B89660A25493C78E6EB2EBA51DF2871’

    2018-07-18 10:45:20 925.091 870367 NXSERVER User ‘myuser’ from ‘AAA.BBB.CCC.DDD’ logged out.

    2018-07-18 10:45:20 925.895 870367 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec –node –user myuser –priority realtime –mode 0 –pid 16 -H 5’ with pid ‘870425/870425’ finished with exit code 9 after 38,117 seconds.

    2018-07-18 10:45:20 926.014 870367 NXSERVER WARNING! NXExec process 870425 finished abnormally with exit code 9.

    2018-07-18 10:45:20 964.980 870696 NXSERVER WARNING! Process ‘/usr/NX/bin/nxexec nxkill.sh 870425 9 ‘ with pid ‘870728/870728’ finished with exit code 1 after 0,076 seconds.

    2018-07-18 10:45:20 979.002 870696 NXSERVER WARNING! Session 1B89660A25493C78E6EB2EBA51DF2871 server with PID 870367 has been killed.

    2018-07-18 10:45:21 687.710 870760 NXNODE WARNING! Cannot write to FD#1.

    2018-07-18 10:45:21 687.822 870760 NXNODE WARNING! Error is: 32, ‘Broken pipe’.

    2018-07-18 10:45:21 687.908 870760 NXNODE ERROR! Failed to send message: NX> 1000 NoMachine Node – Version 6.2.4: write to FD#1 failed: Broken pipe
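
    In this excerpt, each new login appears to first terminate the previous session: it reports a timeout during termination, kills that session's nxnode, and the connection to the old nxnode is then reported as closed unexpectedly. Grouping the lines by session ID makes that sequence easier to follow. A minimal sketch, assuming session IDs are always 32 uppercase hexadecimal characters as in the log above; `events_by_session` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Session IDs in this log are 32 uppercase hexadecimal characters.
SESSION_ID = re.compile(r"\b[0-9A-F]{32}\b")
# Keep the message after the component tag, e.g. "... 869208 NXSERVER WARNING! ..."
MESSAGE = re.compile(r"\bNX(?:SERVER|NODE) (.*)$")


def events_by_session(lines):
    """Group log messages by the session ID they mention, in log order."""
    events = defaultdict(list)
    for line in lines:
        session = SESSION_ID.search(line)
        message = MESSAGE.search(line)
        if session and message:
            events[session.group(0)].append(message.group(1).strip())
    return dict(events)
```

    Assuming nxserver.log sits next to nxd.log in /usr/NX/var/log, `events_by_session(open("/usr/NX/var/log/nxserver.log"))` would give, per session, the messages that mention it in log order; lines without a session ID (such as the login lines) are skipped.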

    #19058
    Haven
    Contributor

    Hi 7777777,
    It looks like there is a problem establishing the connection to nxagent.
    Are you accessing the physical desktop on this machine, or is it a headless Linux machine?
    We would need the ‘node’ directory to analyze the nxnode working directories.
    You could attach it here or send to forum[at]nomachine[dot]com.
    Please reference your forum topic when sending the logs.

    #19059
    7777777
    Participant

    Hello Haven,

    I connect to a Linux virtual machine, but if it is more reliable, I could connect to a physical server instead.

    I will send the node directory in a few seconds to the e-mail address.

    Thank you for your help!

    #19068
    Haven
    Contributor

    Hi 7777777,
    The logs in the nxnode working directories do not contain any specific error that could point to the problem.
    We need debug logs. Please enable debug-level logging in the server.cfg and node.cfg files on the server.
    Please also set SessionLogClean to 0. Then restart NoMachine and, after the issue occurs again, upload the full var/log directory to forum[at]nomachine[dot]com.
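
    Setting SessionLogClean to 0 is a one-line change in each cfg file. A minimal sketch of doing it programmatically, assuming the usual NoMachine layout of `Key value` lines with commented-out defaults such as `#SessionLogClean 1`, and assuming the files live under /usr/NX/etc/ (check your installation). The key name for the debug log level is not given in this thread, so take it from the AR07K00677 article rather than from this sketch; `set_cfg_key` is a hypothetical helper.

```python
import re


def set_cfg_key(text, key, value):
    """Return `text` with the line `key value` set.

    Replaces the first active or commented-out line for `key`
    (e.g. "#SessionLogClean 1"); appends the line if the key is absent.
    """
    line_re = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(key)}(?:[ \t].*)?$", re.MULTILINE
    )
    new_line = f"{key} {value}"
    if line_re.search(text):
        # A function replacement avoids any backslash interpretation in `new_line`.
        return line_re.sub(lambda _m: new_line, text, count=1)
    separator = "\n" if text and not text.endswith("\n") else ""
    return f"{text}{separator}{new_line}\n"
```

    Writing the result back requires root; keep a backup of each file and restart NoMachine afterwards, as described above.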

    What Linux version are you using in the virtual machine?

    #19075
    7777777
    Participant

    Hi Haven,

    It might take me some time to collect debug logs since I use NoMachine at work and restarting the server would mean killing running applications, which I would do if there is no other way to solve the frozen NoMachine problem. But if/when it happens again, I will be happy to help.

    I use “Red Hat Enterprise Linux Server release 7.4 (Maipo)” in the virtual machine (this is what ‘cat /etc/redhat-release’ gives). It is a distribution about one year old (https://access.redhat.com/articles/3078), but security updates have been installed on top of the base release (including those mitigating the Spectre/Meltdown vulnerabilities, among others). The kernel version is 3.10.0-693.17.1.el7.x86_64 (‘uname -r’).

    As I wrote above, the problem might be related to multiprocessing plotting with Python matplotlib: both freezes happened while I was doing that. You could try to reproduce it with the following two pieces of Python code (copy and paste):

    # input data
    import pandas as pd, matplotlib.pyplot as plt
    df = pd.DataFrame(data={'i': ['A', 'A', 'B', 'B'],
                            'x': [1., 2., 3., 4.],
                            'y': [1., 2., 3., 4.]})
    df.set_index('i', inplace=True)
    df.sort_index(inplace=True)

    # function which creates a figure from the data
    def Draw(df, i):
        fig = plt.figure(i)
        ax = fig.gca()
        df = df.loc[i, :]
        ax.scatter(df['x'], df['y'])
        return fig

    def DrawWrapper(x): return Draw(*x)

    # creating figures in parallel
    from multiprocessing import Pool
    poolSize = 2
    with Pool(poolSize) as p:
        args = [(df, 'A'), (df, 'B')]
        figs = p.map(DrawWrapper, args)

    # attempt to visualize the results
    fig = plt.figure('A')
    plt.show()
    # FIXME: get "RuntimeError: main thread is not in main loop"

    and this

    # input data
    import pandas as pd, matplotlib.pyplot as plt
    df = pd.DataFrame(data={'i': ['A', 'A', 'B', 'B'],
                            'x': [1., 2., 3., 4.],
                            'y': [1., 2., 3., 4.]})
    df.set_index('i', inplace=True)
    df.sort_index(inplace=True)

    # function which creates a figure from the data
    def Draw(df, i):
        fig = plt.figure(i)
        ax = fig.gca()
        df = df.loc[i, :]
        ax.scatter(df['x'], df['y'])
        plt.show()

    # creating figures in parallel
    from multiprocessing import Process
    args = [(df, 'A'), (df, 'B')]
    for a in args:
        p = Process(target=Draw, args=a)
        p.start()

    # FIXME: run this piece after you run the 1st code piece above;
    # the result is the same (might be even worse, since I do not
    # get any result which I could attempt to show):
    # …
    # RuntimeError: main thread is not in main loop
    # RuntimeError: main thread is not in main loop

    In particular, try to cancel the running commands by repeatedly pressing Ctrl+C. I believe this is exactly what I did when the freezes happened (the real code was more complex, but the relevant parts were the same). Alternatively, if you have a sandbox machine I could use, I can help try to reproduce the problem.
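
    The RuntimeError in both snippets appears to come from Tkinter, which raises it when GUI calls are made outside the thread running the Tk main loop; here pyplot figures are created in child processes. Whether or not this explains the NoMachine freeze, a common way to sidestep it is to render off-screen in the workers and display only in the main process. A minimal sketch, assuming plain lists instead of the DataFrame groups (e.g. `df.loc['A', 'x'].tolist()` would produce them) and using matplotlib's object-oriented `Figure` with the Agg canvas, so no GUI backend is touched in the workers; `render_png`, `render_job` and `render_all` are hypothetical names.

```python
import io
from multiprocessing import Pool

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_png(label, xs, ys):
    """Draw one scatter plot off-screen and return it as PNG bytes."""
    fig = Figure()
    FigureCanvasAgg(fig)  # attach the non-interactive Agg canvas; pyplot is never used
    ax = fig.subplots()
    ax.scatter(xs, ys)
    ax.set_title(label)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


def render_job(job):
    """Unpack one (label, xs, ys) tuple; Pool.map passes a single argument."""
    label, xs, ys = job
    return label, render_png(label, xs, ys)


def render_all(jobs, processes=2):
    """Render every job in worker processes; only PNG bytes come back."""
    with Pool(processes) as pool:
        return dict(pool.map(render_job, jobs))
```

    Called from a script under an `if __name__ == "__main__":` guard, `render_all([("A", [1.0, 2.0], [1.0, 2.0]), ("B", [3.0, 4.0], [3.0, 4.0])])` would return a dict mapping each label to its PNG bytes. Those can be written to .png files, or shown in the main process only, for example with `plt.imshow(plt.imread(io.BytesIO(png)))` followed by `plt.show()`. The design choice is simply that no process other than the main one ever opens a window.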

    #19095
    Haven
    Contributor

    Hi 7777777,
    We cannot reproduce the problem in our laboratory.

    Please provide as many details as possible about the environment you use to virtualize the machine, and we will prepare a ‘sandbox environment’ where you could reproduce the issue.

    In particular, we are interested in the virtualization infrastructure (KVM, VMware, VirtualBox, etc.) and the machine parameters (such as CPUs, RAM and any other options that could be important).

    Please also provide the exact NoMachine package version you are using.
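
    Most of the VM details requested here can be read from inside the guest. A minimal sketch for a Linux guest, assuming the standard /proc/meminfo, /etc/redhat-release and /sys/class/dmi/id/ files; `collect_environment` and its helpers are hypothetical, the DMI vendor string is only a hint at the hypervisor (for example “VMware, Inc.” or “QEMU”), and the NoMachine package version still has to be looked up separately.

```python
import os
import platform


def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def mem_total_kib(meminfo_text):
    """Extract MemTotal (in kB, as /proc/meminfo reports it), or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


def collect_environment():
    """Gather basic guest facts; missing files simply yield None."""
    try:
        with open("/proc/meminfo") as f:
            mem = mem_total_kib(f.read())
    except OSError:
        mem = None
    return {
        "distribution": read_first_line("/etc/redhat-release"),
        "kernel": platform.release(),  # same string as `uname -r` on Linux
        "cpus": os.cpu_count(),
        "mem_total_kib": mem,
        # DMI vendor/product usually name the hypervisor inside a VM.
        "dmi_vendor": read_first_line("/sys/class/dmi/id/sys_vendor"),
        "dmi_product": read_first_line("/sys/class/dmi/id/product_name"),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```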

    Also, I am still not sure whether you are accessing the physical desktop of the virtual machine or creating a new virtual session.


Closed because the user did not provide further feedback. Please notify us if you confirm that it is resolved or open a new topic if you have the same problem.