February 15, 2019 at 19:04 #21454
We are experiencing some trouble with the NX Terminal Server (NoMachine Enterprise Terminal Server Subscription – Version 6.3.6) when more than 1,500 users log in “at the same time” (meaning within a span of 30 minutes).
Our users connect from different parts of the country between 9:30 and 10:00. When the number of connections reaches about 1,500, the Terminal Server slows down and subsequent users can’t log in (we have a total of 2,300 users). Everything fails and users who are already logged in get slow connections, so we must restart everything and fall back to our contingency plan (which is FreeNX).
NX support’s first guess was a “probably I/O” disk problem, which was quickly ruled out because we moved the disk to a ramdisk and the problem continued.
No logs are available because, this being a production environment, the administrators went for a full restart as soon as possible, since delays in operations would end in heads being cut.
I’m trying to debug by stressing the terminal server
By using guest sessions
In order to replicate this production problem I searched for a way to script multiple sessions and see what happens with the Terminal Server. The first article I read was this one:
After reading it I enabled guest sessions, generated 100 templates for each guest user and ran a script to launch the NX sessions. Sadly I found that guest sessions were limited to a certain number and stopped connecting.
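For anyone trying the same kind of test, here is a minimal sketch of how the launches could be spread over a login window. It is not the script used above. The `nxplayer --session <file>` command line and the `guestNNNN.nxs` file names are assumptions for illustration, and `dry_run=True` only builds the plan without starting anything.

```python
import subprocess
import time


def launch_offsets(sessions, window_seconds):
    """Return evenly spaced start offsets (in seconds) for `sessions` logins."""
    if sessions <= 0:
        return []
    step = window_seconds / sessions
    return [i * step for i in range(sessions)]


def launch_all(command_for, sessions, window_seconds, dry_run=True):
    """Start one client per offset; `command_for(i)` builds the argv list.

    With dry_run=True nothing is executed and the planned (offset, argv)
    pairs are returned instead of (offset, Popen) pairs.
    """
    started = []
    t0 = time.monotonic()
    for i, offset in enumerate(launch_offsets(sessions, window_seconds)):
        argv = command_for(i)
        if dry_run:
            started.append((offset, argv))
            continue
        # Wait until this session's slot in the window is reached.
        delay = offset - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)
        started.append((offset, subprocess.Popen(argv)))
    return started


if __name__ == "__main__":
    # Hypothetical client command and session file names; adjust to your setup.
    plan = launch_all(
        lambda i: ["nxplayer", "--session", f"guest{i:04d}.nxs"],
        sessions=1500,
        window_seconds=30 * 60,
    )
    print(len(plan), "sessions planned over 30 minutes")
```

Shrinking `window_seconds` gives a burst of logins, and widening it gives a gradual ramp, so both cases can be tested.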
By creating system users
Then I enabled the USERDB and PASSWORDDB in server.cfg to create local users, but in this case I’m unable to authenticate and get the message:
Error: Cannot authenticate to the requested node
What I need help with
I would like to know if there’s a way to extend the limits of guest sessions to 1,500 or more.
Or how I could enable the system users to be authenticated correctly.
Best regards.
February 20, 2019 at 10:04 #21510
This topic can be marked as solved.
The guest accounts do not work with multinode environments: https://www.nomachine.com/TR02Q09138
Since I can’t use production users I could not use this approach.
H.
February 22, 2019 at 22:02 #21569
drichard (Participant)
I would be interested in hearing more about this issue. These are virtual Linux sessions? How many nodes are you running? Are you using the Terminal Server as a node as well or just a broker for the nodes? Your terminal server, being the broker, receives all network traffic from the nodes and then routes it out from one interface. I have 310 users logged in right now, and it’s running around 100Mb with a peak of 240Mb. Scaling that up by a factor of 4, you might be hitting a bottleneck here if you have a 1Gb connection or infrastructure.
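The scaling estimate can be written out as a quick back-of-the-envelope calculation. This is only a sketch: it assumes traffic grows linearly with the number of users, which is a simplification, and it uses the 1,500-user figure from the original post rather than a flat factor of 4.

```python
def scale_bandwidth(measured_mbps, measured_users, target_users):
    """Linearly extrapolate a measured rate (Mb/s) to another user count."""
    return measured_mbps / measured_users * target_users


LINK_MBPS = 1000  # a 1Gb link, the case discussed above

# Figures quoted in this post: 310 users, ~100Mb average, 240Mb peak.
avg_1500 = scale_bandwidth(100, 310, 1500)
peak_1500 = scale_bandwidth(240, 310, 1500)

print(f"average at 1,500 users: {avg_1500:.0f} Mb/s")
print(f"peak at 1,500 users:    {peak_1500:.0f} Mb/s")
print("peak exceeds 1Gb link:", peak_1500 > LINK_MBPS)
```

Under that assumption the average stays under 1Gb, but the extrapolated peak does not.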
Use iftop to watch the traffic from and to the nodes and then the total. If your terminal server is a node, consider turning that off and making it just be the broker. There might be some ways to use the heartbeat feature to increase capacity too.
Also something to consider is that when people log into a server first thing in the morning, they do so because they need to use the computer immediately. So you might have a high number of people opening browsers initially, and then slowly through the day their usage drops.
February 25, 2019 at 14:26 #21578
Yes, those are virtual Linux sessions.
8 nodes, 1 terminal server.
The TS acts as broker to the nodes.
These are the stats for 165 users right now:
TX:     cum:   574MB   peak:   28.7Mb   rates:   25.1Mb  24.1Mb  22.4Mb
RX:            572MB           28.4Mb            24.4Mb  24.0Mb  22.3Mb
TOTAL:        1.12GB           57.1Mb            49.6Mb  48.1Mb  44.7Mb
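To turn these figures into a per-user number, the rate strings can be parsed as below. This is a small sketch that assumes the rates are in bits per second and that K/M/G are decimal multipliers. `parse_rate` is a helper written for this example and is not part of iftop.

```python
import re

# Assumed decimal multipliers for the rate suffixes (bits per second).
UNITS = {"b": 1, "Kb": 1e3, "Mb": 1e6, "Gb": 1e9}


def parse_rate(text):
    """Convert a rate string such as '57.1Mb' to bits per second."""
    match = re.fullmatch(r"([\d.]+)(b|Kb|Mb|Gb)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised rate: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


USERS = 165
peak_total = parse_rate("57.1Mb")  # TOTAL peak from the stats above
per_user = peak_total / USERS

print(f"peak per user: {per_user / 1e6:.3f} Mb/s")
```

The per-user figure can then be compared with the one from the 310-user system mentioned earlier in the thread.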
I never used this tool, checking it out!
And about this:
So you might have a high number of people opening browsers initially, and then slowly through the day their usage drops.
Our infra is quite complex. The NX clients load with a template generated by a script and load the X server via SSH, but yes, because it’s a bank, operations start at a certain time in the morning.
We had to roll back to our 300 NX servers until we find out what is causing the issue.
I’ll try to read about configuring multiple network cards to see if that helps.
H.
February 25, 2019 at 19:18 #21586
Btw, I confirmed with networking that the current network has 10Gb capability and that it was working well at the moment of the incident.
NoMachine technicians suggest a problem with access to the database; they are working on a fix:
The initial results of benchmark tests in our labs allowed us to identify two key points that can be improved. Database warnings have also been reproduced.
This however doesn’t happen if the same number of users is reached progressively (not all of them try to login at the same time).
Without logs from your environment we cannot confirm if the issues found in our labs are exactly the same, but in any case they need to be addressed.
1) Procedure to check users on the system slows down when the number of concurrent accesses increases.
As a workaround, add the following key to the end of /usr/NX/etc/server.cfg on the Enterprise Terminal Server host:
There’s no need to restart the server.
This ticket was about creating a stress test ourselves; I guess it can be closed, as we are following the incident through the NX support team.