Vista computer periodically refuses connections from non-Vista machines

professornayr

I am the administrator for a small network of ten computers. Two are running Server 2003, five are running Vista, and the remaining three are running XP Professional. For some unknown reason, one of the Vista computers will periodically refuse connections from the XP and Server 2003 machines. If we restart the Vista machine, it will allow the connection again, but at some later point it starts denying the connection again. The message I receive is as follows:


\\Officepc is not accessible. You might not have permission to use this network resource. Contact the administrator of this server to find out if you have access permissions.

The specified server cannot perform the requested operation.


As I said before, if I restart this machine, I will have no problem accessing it again. In the meantime, all of my Vista machines can access this computer without any problems. I have logged in as myself (Domain Admin) from both types of machines to make sure it was not a user issue. Any thoughts?
 


Interesting issue.

Do you know how to use a packet sniffer to collect a network trace? It's not hard - install the sniffer app, start a capture, repro the problem, stop and save the capture.

Wireshark (www.wireshark.org) is a great sniffer and free too.

If you can create a small trace of a failure-to-access-from-Win2k3, zip it up, and upload here, it might be possible to give you more info about what's going wrong.

Also, you might want to check the global "paged pool" and "non-paged pool" numbers on the problem Vista box. It wouldn't surprise me if a leaky driver was causing pool memory to be depleted, and hence leading to a gradual breakdown in services, though that's just a wild stab in the dark.
 


Thanks for the advice. I will start with the packet sniffer and see if I can locate anything suspicious.

By the way, I had our secretary (the person using the Vista machine in question) restart her computer, and, of course, the problem went away. However, I suspect it may be back within a few days. I'll post again when I have been able to reproduce the problem.
 


The problem resurfaced, so I have used Wireshark to capture the connection, both from a successful computer (running Vista) and from an unsuccessful attempt (running Server 2003).

I have attached both trace files to this post as txt files. The correct file extension is pcap.

The response to the 2003 Server is "Error: Out of memory", while the Vista computer connects without any problems. Does a connection between two Vista machines use a different port or protocol than between XP/Server 2003 and Vista? It appears the 2003 server tries to connect on the NBSS protocol (port 139), while Vista uses SMB2 (port 445). Is this correct?

If the problem is that port 139 on the problem computer is somehow getting locked on another connection, how do I go about fixing this?

Thanks again for any help anyone can provide!
 

Attachments

  • FailedConnection.txt
    1.9 KB · Views: 102
  • SuccessfulConnnection.txt
    11.6 KB · Views: 80


Unless computer networks are your bread'n'butter, you've done extraordinarily well here, both in the collection of data and the way you analysed it. Have some rep :)

A bit of background...

Vista-to-Vista or Vista-to-Win2K8, or any combination thereof, will indeed use a new protocol called SMB2, as opposed to the old version which (now) gets labeled SMB1. For obvious backwards compatibility reasons, Vista and Win2K8 also operate quite happily via SMB1, although it's far slower over high-latency links. All prior versions including Win2K, XP, and Win2K3 understand only SMB1.

However, port 445 is used by both SMB2 and SMB1. It's called "direct-hosted SMB" to distinguish it from SMB messages wrapped up in a small NetBIOS session header, which the Server service listens to via TCP port 139. If you do a NETSTAT -NA on one of the Win2K3 servers, you should see it LISTENING on both 139 and 445.

By default, a Win2K3 or XP box acting as the client (initiating an SMB session) will launch a two-pronged connection attempt to both 445 and 139, since it doesn't know whether the target is NT4 (which only supported TCP139) or Win2K+ (445 also available). If it receives positive connection responses via both, it'll just dump the 139 session and continue on 445. What's slightly odd in your case is the lack of a concurrent 445 attempt from Win2K3 to Vista, but that might just be the way the bindings are configured (on Win2K3) and it's not the root cause of the failure to connect.
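
To make the port preference described above a bit more concrete, here is a minimal Python sketch. It is only an illustration: the function names are made up for this post, it probes the ports one after the other rather than concurrently, and it only checks whether a TCP connection is accepted. It does not speak SMB, so it cannot reproduce the "out of memory" reply to the Negotiate command.

```python
import socket


def probe_tcp_ports(host, ports=(445, 139), timeout=2.0):
    """Try a plain TCP connect to each port and report which ones accept.

    Hypothetical helper: a real SMB client launches both attempts at once,
    this sketch simply tries them in order.
    """
    results = {}
    for port in ports:
        try:
            # create_connection completes the TCP handshake; the socket is
            # closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, unreachable or unresolvable host:
            # all are treated as "not available" here.
            results[port] = False
    return results


def choose_smb_transport(results):
    """Prefer direct-hosted SMB on 445; fall back to NetBIOS session on 139."""
    if results.get(445):
        return 445
    if results.get(139):
        return 139
    return None
```

Calling `choose_smb_transport(probe_tcp_ports("Officepc"))` from one of the Win2K3 servers would show which transport a default client would be expected to settle on, assuming the host name from the error message resolves.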

As you pointed out, the "out of memory" error in response to the "Negotiate" command via 139 is the lethal factor that kills the Win2K3->Vista connection attempt, and obviously it hasn't (yet) started to affect the slightly different handler on port 445.

I think your Vista box has a memory leak. The fact that it always takes about a week to manifest itself suggests a nice medium-rate leak, probably in paged or non-paged pool down in kernel-mode. Once a certain threshold is reached, stuff starts failing and it's likely that if you left the Vista box in that state for another week - without a reboot - other services would start showing symptoms of breakdown as well.

Wireshark has served its purpose and it can't reveal the reason for the "out of memory" condition, but perfmon can. If you periodically keep an eye on the "Memory\Pool paged bytes" and "Memory\Pool non-paged bytes" during the week, I've got a hunch you'll see at least one of those creeping upwards. If the Vista box is 32-bit, it's really only got several hundred MB of each of those pools, irrespective of the amount of RAM. Once the NPP utilisation reaches say 150MB or higher, the box will start doing straaange stuff.
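
If you jot down the counter values by hand once or twice a day, a few lines of Python can tell you whether there is a real upward trend. This is just a sketch under assumptions: the function names are invented, the samples are (hours since first reading, value) pairs that you collect yourself, and the 150MB figure is only the rough threshold mentioned above, not a documented limit.

```python
def growth_rate_per_hour(samples):
    """Least-squares slope of (hours, value) samples, in value units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def hours_until(samples, threshold):
    """Hours after the last sample until the trend reaches threshold.

    Returns None when the fitted trend is flat or shrinking, i.e. no leak
    is visible in the data.
    """
    slope = growth_rate_per_hour(samples)
    if slope <= 0:
        return None
    last_value = samples[-1][1]
    return max(0.0, (threshold - last_value) / slope)


# Example readings of "Memory\Pool non-paged bytes", expressed in MB.
readings = [(0, 100), (24, 124), (48, 148)]
print(growth_rate_per_hour(readings))  # MB gained per hour
print(hours_until(readings, 196))      # hours until 196 MB at that rate
```

A slope close to zero over a week of readings would argue against a pool leak, which is just as useful to know.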

I don't want to drone on pointlessly, just in case my hunch is wrong, so I'd be interested to hear what happens to those pool counters before suggesting how to find the "leaker" (it's a driver, but which one?).
 


Thank you so much for all of the information! I am going to be watching the information you suggested via perfmon for the next few days, and I will be sure to post again when I see some changes. Thanks!
 


Well, I may have a verdict concerning the pool paged/non-paged bytes. The problem of connecting to this specific Vista machine resurfaced today. I have been following both the paged and non-paged bytes on that computer just about every day over the last week or so. The non-paged bytes have never gone above 42MB, and the paged pool has stayed fairly consistently between 110MB and 133MB. In fact, the problem machine has somewhat similar hardware to my laptop (Intel Core 2 Duo (2.2 GHz) with 2 GB RAM), and the numbers on the two machines have stayed very consistent with one another. I understand from the previous post that the specs may not have an effect on the pool paged/non-paged bytes, but I suspect the numbers were similar on these two machines because of the similar hardware. Would that be correct?

In addition, I also kept an eye on a third computer to use it for comparison. This computer has lower hardware specs, but the non-paged numbers were about the same as the other two machines. However, the paged pool was quite a bit lower, which I suspect is due to the fact it only has 1 GB RAM.

Is there something else that could be eating the memory for this?
 


I think I speak for both of us when I say - bummer :)

A pool leak would have been the best outcome from the perspective of a relatively quick fix, but there's no use wishing. For what it's worth, the answer to your question is that given similar hardware (and therefore similar drivers) it would be expected for two machines to share roughly comparable pool utilisation figures.

OK, so the error message that percolates up to the network layer is "out of memory", and the symptom is periodic in the sense that it occurs a certain time after a reboot, but yet there's no obvious evidence of a pool memory leak. Hmmmm.... this won't be simple...

Can you run that Vista box without anti-virus for a while? Outright uninstall it. There's a chance that the AV filter driver is the culprit.

If you haven't already done so, it would be worthwhile to update all relevant drivers: NIC driver, non-default firewalls (if any), backup agents... anything on that Vista machine which may include a kernel-mode driver which would participate in file access.

The next time it happens check:

- "Memory\Committed Bytes". Is it anywhere near the combined total of the RAM size + pagefile size(s)? For example, with 2GB of RAM and say 3GB of pagefile, the "commit limit" is 5GB. Is that committed bytes counter creeping anywhere even near 5GB?

- "Process\svchost<instance>\private bytes", where the svchost instance is the one whose "ID process" counter (PID) matches the LanmanServer container when you type TASKLIST /SVC. Is the "private bytes" counter going beyond 1GB or perhaps even beyond 1.5GB?

For example, on this machine the relevant PID would (currently) be 420:

W:\>TASKLIST /SVC

Image Name                     PID Services
========================= ======== ============================================
...
svchost.exe                    372 AudioSrv, Dhcp, Eventlog, lmhosts,
                                   p2pimsvc, PNRPsvc, wscsvc
svchost.exe                    420 AeLookupSvc, Appinfo, AppMgmt, BITS,
                                   Browser, gpsvc, IKEEXT, iphlpsvc,
                                   LanmanServer, MMCSS, ProfSvc, RasMan,
                                   Schedule, seclogon, SENS, ShellHWDetection,
                                   Themes, Winmgmt, wuauserv

So in perfmon, I'd be checking the "process\private bytes" of the svchost instance whose current "ID process" is 420 - it contains LanmanServer.

- What happens if you just restart the Server service on the Vista machine instead of rebooting (NET STOP SERVER, then NET START SERVER)? Does that clear up the issue without a reboot?
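
For the "Committed Bytes" check in the first bullet, the arithmetic can be sketched as below. The function names are made up, and treating the commit limit as exactly RAM plus pagefile is the same simplification used in the example above; the real figure reported by the OS will differ slightly.

```python
GB = 1024 ** 3


def commit_limit(ram_bytes, pagefile_sizes):
    """Approximate commit limit: physical RAM plus every pagefile."""
    return ram_bytes + sum(pagefile_sizes)


def commit_fraction(committed_bytes, ram_bytes, pagefile_sizes):
    """How much of the approximate commit limit is currently in use (0..1)."""
    return committed_bytes / commit_limit(ram_bytes, pagefile_sizes)


# The example from the post: 2GB of RAM and a single 3GB pagefile.
limit = commit_limit(2 * GB, [3 * GB])
print(limit // GB)                                 # size of the limit in GB
print(commit_fraction(4 * GB, 2 * GB, [3 * GB]))   # 4GB committed out of 5GB
```

If the fraction sits comfortably below 1 at the moment the connections start failing, system-wide commit exhaustion can probably be ruled out.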

Anything can be fixed. It just depends on how much time you're willing to invest.
 


First of all, thank you so much for all of the help! In addition to hopefully discovering the solution, I have learned quite a bit from all of this information.

I wrote out a rather long response a few minutes ago under the impression that we were still experiencing the problem this morning. Then I realized that an automatic backup process from one of the W2K3 servers to the problem computer ran successfully this morning at 4:00 am, which simply means that the computer must have been restarted last night. (I hate to admit that there is the slight possibility that I was the one who restarted it, but I just do not remember for sure; it's been a long week :confused:) I guess I will have to wait until the problem resurfaces to run some of the tests you suggested.

I do have one question. I will admit that I got a little lost on the "private bytes" section. However, that was only because after determining the PID for the svchost process (1172), I could not locate the corresponding service in perfmon. Under "process\private bytes" on both my laptop and the trouble machine, the svchost processes are simply numbered from 1 on up (although there is one svchost without a number). So under perfmon, I only had choices of "svchost", "svchost#1", "svchost#2", and so on. How do I determine which svchost instance in perfmon corresponds to the correct PID under the tasklist?

Again, thanks for the help!
 


Yeah, I know what you mean about long weeks ;)

What I wrote for the "private bytes" bit is clear as mud. Sorry. You're still on the right track though.

In perfmon, each one of those svchost#1, svchost#2, svchost#3... instances will have its own "ID Process" counter. Once you use TASKLIST /SVC to work out the PID of the svchost instance in which you're interested - let's say 1172 - you can tie that up to the corresponding instance in perfmon by looking at every svchost "ID Process" until you find the one that's 1172. Only one of them will have that ID, and that's the instance whose "private bytes" you want to monitor because it contains the LanmanServer control "applet" (for lack of a better name) for the SMB Server service.
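
The PID matching described above can also be scripted. The sketch below is an illustration only: the function names are invented, it assumes the English column headers that `TASKLIST /SVC /FO CSV` prints ("Image Name", "PID", "Services"), and the instance-to-PID mapping in the second function is something you would read off the perfmon "ID Process" counters yourself.

```python
import csv
import io


def pid_for_service(tasklist_csv_text, service_name):
    """Return the PID of the process hosting service_name, or None.

    Expects the text produced by: TASKLIST /SVC /FO CSV
    """
    reader = csv.DictReader(io.StringIO(tasklist_csv_text))
    wanted = service_name.lower()
    for row in reader:
        # The Services column is itself a comma-separated list.
        services = [s.strip().lower() for s in row["Services"].split(",")]
        if wanted in services:
            return int(row["PID"])
    return None


def perfmon_instance_for_pid(id_process_by_instance, pid):
    """Find the perfmon instance (svchost, svchost#1, ...) with this PID."""
    for instance, value in id_process_by_instance.items():
        if int(value) == pid:
            return instance
    return None
```

On the Vista machine the CSV text could be obtained with `subprocess.run(["tasklist", "/svc", "/fo", "csv"], capture_output=True, text=True).stdout`, and the result of `pid_for_service(text, "LanmanServer")` is the number to look for among the svchost "ID Process" counters.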

What we're trying to work out is where the "out of memory" message might be coming from. There are more sophisticated ways to do that, but they're impractical over a web forum.
 


I have not had a chance to do a whole lot more research on the possibilities we have recently discussed, but I did discover something this morning that may be related to the issue.

The problem Vista computer has a second hard drive that is shared and used as a backup location for several parts of our system. This morning, the problem popped up again while manually running one of these backups from one of the 2K3 Servers. The reason I think this might be helpful to know is that there tends to be a large amount of data being pushed to this machine on a regular basis. We have in the neighborhood of 40GB that could potentially be moved to this hard drive on a weekly basis via the network connection. Most of the backup takes place during the night, but some take place during the day, creating a fairly consistent significant flow of data to this machine from various sources. This may not be a large amount of data relatively speaking, but it seems somewhat significant to me.

It appears that the problem I experienced this morning is not exactly the same as the previous problem, only because about 10 minutes later, the Vista machine started accepting connections from the 2K3 Server again without requiring a restart. (Huh? :confused:) However, the problem appeared to be identical for the time period that it appeared (i.e. it would accept connections from other Vista machines, but not 2K3).

I have no idea if this is helpful, but I think it might be good to mention that there is a large amount of data traveling on this connection.
 


Okay, it looks like the problems could be related. I just ran another manual backup, and the problem started happening again. I did restart the Server service on the machine, and all seems to be well again. I may have to find an easy way to restart the Server service, but it certainly is better than restarting the computer.
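
One possible "easy way" to restart the Server service is a small script wrapping the same NET STOP / NET START commands suggested earlier in the thread. This is a sketch, not a tested fix: the function name is made up, it has to be run from an elevated prompt on the Vista machine, and the /y switch stops any services that depend on Server without restarting them afterwards.

```python
import subprocess


def restart_server_service(run=subprocess.run):
    """Stop and start the Server service using the NET command.

    The run parameter exists only so the command sequence can be
    checked without touching a real service.
    """
    commands = [
        ["net", "stop", "server", "/y"],  # /y answers the dependent-services prompt
        ["net", "start", "server"],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if NET reports a failure.
        run(cmd, check=True)
    return commands
```

This only clears the symptom, of course; it does nothing about whatever is causing the "out of memory" response in the first place.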
 


Because of this recurring issue and the fact that I will be out of the office for a few days, I decided to transfer the destination location for these backups to another machine running Vista Business (64-bit). I had no sooner done this than the problem started happening with that machine as well.

This tells me that this problem must be bigger than a specific driver on a specific machine. It appears to me that it may have something to do with either 1) running the backup utility on a 2K3 Server with the destination box being a Vista machine or 2) the size of the files being transferred.

Any thoughts on these new developments?
 


I think it's more likely to be a driver that's common to both of the Vista machines. 40GB is not tiny, but it's not a gargantuan amount of data either. I've copied amounts an order of magnitude greater than that to the Vista box on which I currently type this, and it never complained.

There is theoretically no way for the Win2K3 machine to induce an "out of memory" response from the Server service on Vista by merely copying files in its direction. SMB1 is fairly simple - copy about 60KB, acknowledge, copy another 60KB, acknowledge...
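
As a rough analogy for why a copy like that should not eat memory, here is a Python sketch of a fixed-size chunked copy. It is not SMB and not how the Server service is implemented; the names are made up and the 60KB chunk size is just the approximate figure quoted above. The point is only that the buffer stays the same size no matter how big the file is.

```python
import io

CHUNK = 60 * 1024  # "about 60KB" per request, as quoted above


def chunked_copy(src, dst, chunk_size=CHUNK):
    """Copy src to dst one chunk at a time.

    Returns (bytes_copied, chunks). Only one chunk is ever held in
    memory, regardless of the total amount of data.
    """
    copied = 0
    chunks = 0
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(block)  # stands in for "send, then wait for the acknowledge"
        copied += len(block)
        chunks += 1
    return copied, chunks


source = io.BytesIO(b"x" * (150 * 1024))
target = io.BytesIO()
print(chunked_copy(source, target))
```

A 40GB backup simply means many more trips around that loop, not a bigger buffer, which is why an ever-growing memory footprint on the receiving side looks wrong.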

One thing that I do find odd is that your Win2k3 server is not making that initial connection request on TCP445 and is confining itself to only using TCP139. Why only connections via TCP139 should be affected on the "server" Vista box - that's the key question.

Since you now appear to have a way to trigger a repro (run that backup job), I'd suggest you consider wiping one of the Vista "servers" and rebuilding it from scratch as a completely clean OS. Yes, it's a lot of work, but you've already invested more and this is a relatively complex problem which may not be practical to debug and fix over a web forum.

Once you've got a completely clean "server", rebuilt from scratch, try to repro by copying the 40GB to it. Presumably that will work, otherwise you really ought to talk to MS directly! Then, start adding things like the antivirus and other apps one-by-one, watching to see whether the file copy breaks again.

I hate rebuilds but I'm trying to advise you on the most practical course of action. We kinda painted ourselves into a "this is hard and very low-level" corner.
 


Okay, I did not intend to disappear from this thread for three weeks. April was an extremely busy month, and I am just now getting back into the office on a normal schedule.

Because I did not have time to "fix" the problem earlier, I simply changed the backup location to another computer. It worked fine for a while, but it looks like it has started having the same problem within the last week or so.

Concerning the idea of wiping one of the Vista machines clean, that is actually what I had done just prior to switching over the backup to this new computer. We had just purchased the computer several weeks ago, and it had Vista Home Premium as the OS, so I reformatted the hard drive and installed Vista Business. In addition to that, I have not had anti-virus running on the computer that was originally having the problem, and it has never been installed on this new computer.

In fact, there are very few programs (or even hardware) that overlap between the two computers. With that being said, there is one program that I think would have the greatest chance of being the culprit. The program is called SyncBack, and we use it to automatically run some specific backup functions on our network. I am going to test a couple of things related to the program and see if it could possibly be related.

By the way, I cannot remember if I mentioned this earlier, but restarting the "Server" service does fix the problem immediately.

________________________________

I just discovered something interesting that might be related. I needed to use the new computer that is currently the target for the aforementioned backup operations. It seemed a little slow when I logged on, so I opened the task manager and found that 5.7 GB of the 6 GB of RAM were being used, even though I was the only user on the computer and there were no applications running.

However, I was also aware that one of the Windows Server Backup processes was currently running and backing up its information to this computer. From what I can tell, the primary process that is eating up the RAM is one of the svchost.exe processes. As soon as the backup process completed, the memory in use dropped from 5.7 GB to 858 MB, which is where I expected it to be when I logged on. However, the svchost.exe process did not drop, or at least not by as much as it had gained.

So it appears to me (although I rarely know what is actually happening) that when the backup processes are running, they essentially use all available RAM on the target machine. My guess is that somehow, even though the RAM is made available again once the backup process has completed, it is somehow creating the "Out of Memory" message.
 


First, I apologize for the length of the last post. Then again, this one will probably be as long.

I think I am finally on to what service might be causing the problem. H2SO4 suggested watching the svchost process associated with LanmanServer (PID 320). This may still be related, but here is the other information I have discovered.

The svchost process that is eating up the RAM is svchost.exe PID 232, which in my case encompasses the following services:

· AudioEndpointBuilder
· CscService
· EMDMgmt
· Hidserv
· Netman
· PcaSvc
· SysMain
· TabletInputService
· TrkWks
· UmRdpService
· UxSms
· WdiSystemHost
· WPDBusEnum
· Wudfsvc

In Task Manager, the svchost.exe process hosting these services regularly gets significantly larger while the backup process is run from the server. When the backup process finishes, this process retains most of the Memory (Private Working Set) that it had gained while the backup was still running.

For example, I just restarted the computer and this process was using about 170 MB. After running one backup process, it jumped up to about 190 MB. By the time I have run the same backup process two or three times, it might be as high as 220 MB. It may drop a little, but it seems to grow much more than it shrinks once the backup has completed. I'm not sure if the attached picture will help, but at least it will give you an idea of what I am seeing.

It also appears that this might be corresponding with what is happening in Perfmon. When the backup process is running, this service sees the most activity. And all of the different meters associated with this service seem to grow. Is there something in one of the processes above that would be affected by either the backup process writing to this computer or any type of network activity going to this computer?
 

Attachments

  • Perfmon - TskMng.jpg
    55 KB · Views: 25


Hello again :)

If it wasn't for your "out of memory" symptom, I'd be tempted to conclude that the increase in working set size is normal - the memory should be released when necessary. However, under the circumstances it would probably be useful to understand that footprint increase. Suggestions:

1) Try splitting off some of those services into their own SVCHOST instances with this syntax from an elevated CMD prompt:

SC CONFIG <service> TYPE= OWN

Note the lack of space before the 'equals' symbol and the space just after it. Once the machine is rebooted, the service will no longer inhabit the same SVCHOST container (it'll have its own), so it should be easier to deduce which service is responsible for the leak-like behaviour. Once you're done with the investigation, it's best to re-integrate all of those services back into their shared SVCHOST:

SC CONFIG <service> TYPE= SHARE

2) Even once you know which specific service seems responsible, it won't necessarily be obvious why the memory usage is going ga-ga. That level of insight generally requires debugging. You might want to look through this write-up, just in case you do wish to go in that direction:

Umdhtools.exe: How to use Umdh.exe to find memory leaks

Less importantly...

3) The Process ID (PID) of each process is volatile and entirely specific to that instance of the process. If you reboot or otherwise restart any given process, its PID will change. Hence, the PID confers no useful information about which process you're referring to, unless it happens to still be running at the time. For example, Notepad.exe on your machine might be 1224 (PIDs are all divisible by 4), and then it'll be 908 the next time you start Notepad, and then 2448... it only depends on the next-available PID designator. PIDs 0 and 4 are the only "special" ones - those are the idle and system processes respectively, and since they never restart their PIDs never change.
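
If typing the SC CONFIG line from suggestion 1 for every service gets tedious, a few lines of Python can generate the commands to paste into the elevated CMD prompt. This is only a convenience sketch: the function name is made up, and the service names are the ones listed for the busy svchost instance earlier in the thread.

```python
SERVICES = [
    "AudioEndpointBuilder", "CscService", "EMDMgmt", "Hidserv", "Netman",
    "PcaSvc", "SysMain", "TabletInputService", "TrkWks", "UmRdpService",
    "UxSms", "WdiSystemHost", "WPDBusEnum", "Wudfsvc",
]


def sc_config_commands(services, share_type="own"):
    """Build one SC CONFIG line per service.

    share_type is "own" to split a service into its own SVCHOST, or
    "share" to put it back. Note the space after "TYPE=" and none before.
    """
    if share_type not in ("own", "share"):
        raise ValueError("share_type must be 'own' or 'share'")
    return [f"SC CONFIG {name} TYPE= {share_type.upper()}" for name in services]


for line in sc_config_commands(SERVICES, "own"):
    print(line)
```

Running it again with "share" produces the matching lines for putting everything back once the investigation is done.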
 


Sounds good. I had noticed that the PID changed when I restarted the computer. It's good to know why.

I will try splitting the services and see if I can figure out which one is primarily affected.

Is it normal for RAM to be used up like that during a normal "copy and paste" or hard drive write action across a network? I know it is a little different for a backup process, but essentially it is just writing to a hard drive across a network. Going back to an earlier post, you mentioned...

There is theoretically no way for the Win2K3 machine to induce an "out of memory" response from the Server service on Vista by merely copying files in its direction. SMB1 is fairly simple - copy about 60KB, acknowledge, copy another 60KB, acknowledge...

It sounds to me like it should not start using system RAM during a file copy. But then again, I am learning a LOT during this process.
 


I don't know, but I'm fairly confident it's not normal. I haven't looked closely into that behaviour under Vista. On previous versions of Windows, it wouldn't be normal, because the way that the file copy is buffered at the "server" end relies on pool memory that certainly wouldn't show up as ever-increasing footprint in a SVCHOST instance. The way your machines are behaving seems to place an upper limit on the size of the file copy based on the amount of installed RAM. That just can't be right. Presumably, there's a leaking DLL somewhere in there, and it's not default, otherwise everyone would find their file copies limited by their amount of RAM.

It will be interesting to find out which particular service is responsible.
 


I split the services into individual instances and discovered one other interesting piece of information in the process. As I mentioned before, the RAM spikes when the Backup program is running on the server. It does not appear, however, to happen during a normal file copy across the network. Furthermore, it appears that the spike does not happen during the actual backup, but rather, during the verification process. So, when the Server 2K3 machine is backing up to the Vista Business computer, things stay relatively normal; when 2K3 verifies that the file has been backed up properly, Vista Business maxes out the RAM to cooperate.

As for the service, at first, I thought it was SysMain, which is the Superfetch service. It appeared to spike slightly as the backup process first started. It also generally uses more RAM than any other service when there are no applications running. However, as the RAM started to max out, all of the services, including Superfetch, dropped in RAM to the point that the service using the most RAM was MMC, which was less than 10MB in the task manager. Superfetch bottomed out at about 6MB before jumping back to just under 100MB before the backup verification process on the server finished. As soon as the verification process finished, Superfetch jumped up to 165MB, which, by the way, is about 20MB higher than where it was before the backup process. This explains why I thought this process's memory was growing during the backup: I had not taken the time to watch it through the entire backup process.

So now I am thoroughly confused. :confused: All of the services appear to drop all of their memory in favor of a service that is not showing up in Task Manager. My primary question is this: how can a service that is not listed in Task Manager be using up all of the memory on a computer? I have made doubly sure that all processes from all users are shown, so I know I am not missing one that should be appearing in Task Manager.

Please let me know if I am losing my mind! :D
 
