10 threads in a single program or a 1-thread program run 10 times (C++)?

I am wondering whether there is any difference in performance between running a single program (exe) with 10 different threads and running that program, single-threaded, 10 times in parallel (starting it from a .bat file), assuming the work done is the same and only the number of threads spawned by the program changes.
I am developing a client/server communication program and want to test it for throughput. I'm currently learning about parallel programming and threading, and I wasn't sure how Windows would handle the above scenario. Will the scheduler schedule work the same way in both cases? Will there be a performance difference?
The machine the program runs on has 4 hardware threads.

Threads are slightly lighter weight than processes, as there are many things a process gets its own copy of. This is especially visible in the time it takes to start a new thread versus starting a new process from scratch (fork, where available, also avoids a lot of that cost). In either case, though, you can generally get even better performance by using a worker pool where possible, rather than repeatedly starting and stopping fresh processes or threads.
The other major difference is that threads by default all share the same memory, while processes each get their own and must communicate through more explicit means (which may include blocks of shared memory). This makes it easier for a threaded solution to avoid copying data, but it is also one of the dangers of multithreaded programming when care is not taken with how shared memory and objects are used.
There may also be APIs that are oriented around a single process. For example, Windows has I/O Completion Ports, which are built around having many in-progress I/O operations on different files, sockets, etc., with multiple threads (generally far fewer than the number of files/sockets) handling the results as they become available through a GetQueuedCompletionStatus loop.
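To illustrate the worker-pool idea mentioned above, here is a minimal sketch in standard C++ (the WorkerPool class and the trivial tasks are invented for the example; a production pool would need more care around exceptions and shutdown):

    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Minimal worker pool: threads are started once and reused, instead of
    // paying the thread-creation cost for every task.
    class WorkerPool {
    public:
        explicit WorkerPool(unsigned n) {
            for (unsigned i = 0; i < n; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~WorkerPool() {                       // drains the queue, then joins
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }
        void submit(std::function<void()> task) {
            { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                    if (done_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();  // run the task outside the lock
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

    int main() {
        WorkerPool pool(4);
        for (int i = 0; i < 10; ++i)
            pool.submit([i] { std::cout << "task " << i << "\n"; });
    }   // the destructor runs the remaining tasks and joins the workers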

Related

What is the point of being able to spawn dozens of processes efficiently if only very few of them can be executed in parallel?

Erlang is very efficient at spawning new processes, but what is the point if the CPU can only execute, say, 4 of them in parallel?
The rest should therefore wait for the Erlang "context switch".
Do you get more done, faster, with for example 10k processes than you would with Java/C#/C++?
There are many reasons:
Conceptually, processes are easy to reason about. Asynchronous callbacks and promises in languages like JavaScript are harder to reason about because the code in the callbacks can change the values of variables used by other code in the thread.
Processes provide isolation for the code running inside them. A process can only affect other processes by placing messages in their mailboxes. A process cannot meddle with the state of other processes.
Processes are granular. This means:
If you have 400 processes on a 4-core machine, the scheduler will distribute them across the threads in such a way as to fully utilize the 4 cores. One core is always going to be handling OS work, so the scheduler will likely give the thread running on that core less work than the other 3 threads. But it adapts: in any situation the scheduler does its best to make sure processes wait as little as possible and that threads always have a queue of processes waiting for CPU time.
Moving to better hardware with more cores doesn't require changes to the code or architecture of the application. Moving your Erlang application from a 4-core machine to a 64-core machine will make it run roughly 16 times faster without any changes, assuming your application is structured in such a way that it can take advantage of the extra cores (usually this means making sure tasks that could be done in parallel are executed in separate processes).
Processes are very lightweight, so there is very little overhead. In most applications the benefits provided by processes and the scheduler far outweigh the small overhead from running thousands of processes. Commodity hardware can easily handle hundreds of thousands of processes.
So in closing, whether or not processes execute in parallel isn't that important. The other benefits they provide are enough to justify their usage.

Parallel Thread Execution to achieve performance

I am a little bit confused about multithreading. As I understand it, we create multiple threads to break the main task into subtasks, to achieve responsiveness and to eliminate waiting time.
But here I have a situation where I have to execute the same task in parallel using multiple threads.
My processor can execute 4 threads in parallel, so will it improve performance if I create more than 4 threads (10 or more)? When I put this question to my colleague, he said nothing bad will happen: we already run many threads in many other applications (browser threads, kernel threads, etc.), so he advised creating multiple threads for the same task.
But won't creating more than 4 threads that execute in parallel cause more context switches and decrease performance?
Or, even though we create multiple threads to execute in parallel, will they just execute one after the other, so that performance stays the same?
So what should I do in the above situation, and is my understanding correct?
Edit:
1 thread: time to process 120 seconds.
2 threads: time to process about 60 seconds.
3 threads: time to process about 60 seconds (no change from 2 threads).
Is this because my hardware can only run 2 threads (being dual-core)?
Software thread = piece of code.
Hardware thread = core (processor) for running a software thread.
So my CPU supports only 2 concurrent threads; if I purchase an AMD CPU with 8 or 12 cores, can I achieve higher performance?
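Incidentally, an experiment like the one in the edit can be reproduced with a small timing harness along these lines (a sketch only; busy_work is a placeholder for the real job). On a machine with k hardware threads, the time should stop dropping around n = k:

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Placeholder for the real workload: spin on some arithmetic.
    void busy_work(unsigned long iterations) {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < iterations; ++i) x += i;
    }

    int main() {
        const unsigned long total = 400'000'000;  // same total work every run
        for (unsigned n = 1; n <= 8; ++n) {
            auto start = std::chrono::steady_clock::now();
            std::vector<std::thread> threads;
            for (unsigned i = 0; i < n; ++i)
                threads.emplace_back(busy_work, total / n);  // split the work n ways
            for (auto& t : threads) t.join();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::cout << n << " thread(s): " << ms << " ms\n";
        }
    }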
Multitasking is pretty complex, and performance gains usually depend a lot on the problem itself:
Only part of the application can be worked on in parallel (there is always a first part that splits the work into multiple tasks). So the first question is: how much of the work can be done in parallel, and how much of it needs to be synchronized? (In some cases you can stop right here, because so little can be done in parallel that the whole effort isn't worth it.)
Multiple tasks may depend on each other (one task may need the result of another). Such tasks cannot be executed in parallel.
Multiple tasks may work on the same data/resources (read/write situations). Here we need to synchronize access to the data/resources. If all tasks need write access to the same object during the WHOLE process, then we cannot work in parallel at all.
Basically this means that without an exact definition of the problem (dependencies between tasks, dependencies on data, number of parallel tasks, ...) it's very hard to tell how much performance you'll gain by using multiple threads (and whether it's really worth it).
http://en.wikipedia.org/wiki/Amdahl%27s_law
Amdahl's law states, in a nutshell, that the performance boost you receive from parallel execution is limited by the portion of your code that must run sequentially.
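To put a number on it: if a fraction p of the program can be parallelized across n cores, the speedup is bounded by S(n) = 1 / ((1 - p) + p/n). For example, with p = 0.9 on a 4-core machine, S(4) = 1 / (0.1 + 0.9/4) ≈ 3.1, and even with infinitely many cores the speedup can never exceed 1 / 0.1 = 10.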
Without knowing your problem space here are some general things you should look at:
Refactor to eliminate mutexes/locks. By definition they force code to run sequentially.
Reduce context-switch overhead by pinning threads to physical cores. This becomes more complicated when threads must wait for work (i.e. block on I/O), but in general you want to keep your cores as busy as possible running your program, not switching between threads.
Unless you absolutely need to use raw threads and synchronization primitives, try to use a task scheduler or parallel-algorithms library to parallelize your work. Examples are Intel TBB, Thrust, or Apple's libdispatch.
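To give a flavor of that last option, here is a minimal sketch using Intel TBB's parallel_for (the data and the transform function are invented for the example):

    #include <tbb/blocked_range.h>
    #include <tbb/parallel_for.h>
    #include <vector>

    // Hypothetical per-element work; stands in for the real task.
    double transform(double x) { return x * x + 1.0; }

    int main() {
        std::vector<double> data(1'000'000, 2.0);
        // TBB splits the range into chunks and load-balances them across
        // its internal worker threads; no manual thread management needed.
        tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
            [&](const tbb::blocked_range<size_t>& r) {
                for (size_t i = r.begin(); i != r.end(); ++i)
                    data[i] = transform(data[i]);
            });
    }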

Benefits of a multi thread program in a unicore system [duplicate]

My professor casually mentioned that we should write multi-threaded programs even if we are using a unicore processor; however, because of a lack of time, he did not elaborate on it.
I would like to know: what are the benefits of a multi-threaded program on a unicore processor?
It won't be as significant as on a multi-core system, but it can still provide some benefits.
Mainly, the benefits come from the context switch that happens when the currently executing thread has to wait. The executing thread may be waiting on anything: a hardware resource, a branch misprediction, or a data transfer after a cache miss.
At that point another, waiting thread can be run to make use of this "waiting time". Of course the context switch itself takes some time, and managing threads in code, rather than computing sequentially, adds some extra complexity to your program. And, as has been said, some applications need to be multi-threaded anyway, so in those cases there is no escaping the context switch.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
The benefits can take different forms.
One widely used example is a GUI application that is supposed to perform some kind of computation. With a single thread, the user has to wait for the result before doing anything else with the application; if you run the computation in a separate thread, the user interface remains available during the computation. So a multi-threaded program can emulate a multi-tasking environment even on a unicore system. That's one of the points.
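As a minimal sketch of that pattern in standard C++ (a console stand-in for a real GUI event loop; the computation is a placeholder):

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        std::atomic<bool> done{false};
        long long result = 0;

        // The worker thread performs the long computation...
        std::thread worker([&] {
            for (long long i = 0; i < 1'000'000'000; ++i) result += i;  // placeholder work
            done = true;
        });

        // ...while the "UI" thread stays responsive (here it just prints).
        while (!done) {
            std::cout << "still responsive...\n";
            std::this_thread::sleep_for(std::chrono::milliseconds(200));
        }
        worker.join();
        std::cout << "result: " << result << "\n";  // safe to read after join
    }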
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. Futures and boost::asio provide ways of doing non-blocking work without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and what logic actually runs in them. Some benefits you can get even on a single core:
A thread can wrap a blocking or long-running call that you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code, for example background polling for updates, monitoring some resource (e.g. free storage), or checking internet connectivity. If you keep these in a separate thread, the code stays relatively simple in its own 'runtime' without you caring too much about the impact on the main program; the sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS's scheduler works, but if it allocates time per thread, then the more threads you have, the more often your app will be scheduled.
Some benefits long-term:
Where it's not hard to do, you benefit when your hardware evolves. You never know what's going to happen: today your app runs on a single-core embedded device, tomorrow that device gets a quad core. Programming with threads from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash, so that all related operations end up on the same thread. The advantage on a single core is small, but it's not hard to do, since you need few synchronization primitives and so the overhead stays small.
That said, I think there are situations where it's very ill-advised:
As soon as the synchronization you need with other threads becomes complex (e.g. multiple locks, lots of critical sections, ...). Multithreading may still pay off once you effectively move to multiple CPUs, but the overhead is huge, both for your single core and for your programming time.
For instance, think about operations that block because of slow peripheral devices (hard-disk access, etc.). While these are waiting, even a single core can do other things asynchronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for I/O requests to complete (user input, network/disk I/O), for critical resources to become available, or for any sort of asynchronously triggered event, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multitasking concepts like asynchronous I/O, coroutines, or fibers come to mind.
If, however, the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel up front.
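As a small C++ sketch of overlapping a blocking call with other work (slow_read is a stand-in that simulates blocking I/O with a sleep):

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <string>
    #include <thread>

    // Simulated slow, blocking operation (e.g. a disk or network read).
    std::string slow_read() {
        std::this_thread::sleep_for(std::chrono::seconds(2));
        return "payload";
    }

    int main() {
        // Kick off the blocking call on another thread...
        auto pending = std::async(std::launch::async, slow_read);

        // ...and do other useful work while it is in flight.
        std::cout << "doing other work while the read is pending\n";

        std::cout << "got: " << pending.get() << "\n";  // blocks only if not finished yet
    }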
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores will double every 18 months in the future. People have learned single-threaded programming for 50 years now, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This refers to low-level aspects like avoiding race conditions and proper synchronization, as well as the high-level aspects like the general algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, scalability and the development of the skills that are required to achieve these goals.

Multithreading vs multiprocessing

I am new to this kind of programming and need your point of view.
I have to build an application but I can't get it to compute fast enough. I have already tried Intel TBB, and it is easy to use, but I have never used other libraries.
For multiprocessor programming I am reading about OpenMP, and about Boost for multithreading, but I don't know their pros and cons.
In C++, when is multithreaded programming advantageous compared to multiprocess programming, and vice versa? Which is better suited to heavy computations or to launching many tasks? What are their pros and cons when building an application with them? And finally, which library is best to work with?
Multithreading means exactly that, running multiple threads. This can be done on a uni-processor system, or on a multi-processor system.
On a single-processor system, when running multiple threads, the actual observation of the computer doing multiple things at the same time (i.e., multi-tasking) is an illusion, because what's really happening under the hood is that there is a software scheduler performing time-slicing on the single CPU. So only a single task is happening at any given time, but the scheduler is switching between tasks fast enough so that you never notice that there are multiple processes, threads, etc., contending for the same CPU resource.
On a multi-processor system, the need for time-slicing is reduced. The time-slicing effect is still there, because a modern OS can have hundreds of threads contending for two or more processors, and there is typically never a 1-to-1 relationship between the number of threads and the number of processing cores available. So at some point a thread will have to stop and another thread will start on a CPU the two threads share. This, again, is handled by the OS's scheduler. That said, with a multiprocessor system you can have two things happening at the same time, unlike with the uni-processor system.
In the end, the two paradigms are really somewhat orthogonal, in the sense that you need multithreading whenever you want two or more tasks running asynchronously, but because of time-slicing you do not necessarily need a multi-processor system to accomplish that. If you are trying to run multiple threads and are doing a task that is highly parallel (e.g. evaluating an integral), then yes, the more cores you can throw at the problem, the better. You won't necessarily need a 1-to-1 relationship between threads and processing cores, but at the same time you don't want to spin off so many threads that you end up with tons of idle ones, because they must wait to be scheduled on one of the available cores. On the other hand, if your parallel tasks require some sequential component, i.e. a thread will be waiting for the result of another thread before it can continue, then you may be able to run more threads by using some type of barrier or synchronization method, so that the threads that need to be idle are not spinning away using CPU time and only the threads that need to run are contending for CPU resources.
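On the point about not spinning off far more threads than cores: in C++ you can size a thread pool to the hardware with std::thread::hardware_concurrency. A minimal sketch:

    #include <iostream>
    #include <thread>

    int main() {
        // hardware_concurrency() may return 0 if the value is unknown,
        // so fall back to a conservative default in that case.
        unsigned cores = std::thread::hardware_concurrency();
        unsigned workers = cores != 0 ? cores : 2;
        std::cout << "sizing the pool to " << workers << " worker thread(s)\n";
    }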
There are a few important points that I believe should be added to the excellent answer by @Jason.
First, multithreading is not always an illusion, even on a single processor: there are operations that do not involve the processor. These are mainly I/O: disk, network, terminal, etc. The basic form of such an operation is blocking, or synchronous, i.e. your program waits until the operation is completed and then proceeds. While waiting, the CPU is switched to another process/thread.
If you have anything you can do during that time (e.g. background computation while waiting for user input, serving another request, etc.), you have basically two options:
use asynchronous I/O: you call a non-blocking I/O providing it with a callback function, telling it "call this function when you are done". The call returns immediately and the I/O operation continues in the background. You go on with the other stuff.
use multithreading: you have a dedicated thread for each kind of task. While one waits for the blocking I/O call, the other goes on.
Both approaches are difficult programming paradigms, each has its pros and cons.
with async I/O, the program's logic is less obvious and more difficult to follow and debug. However, you avoid thread-safety issues.
with threads, the challenge is to write thread-safe programs. Thread-safety faults are nasty bugs that are quite difficult to reproduce. Over-use of locking can actually degrade performance instead of improving it.
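For a taste of the callback style, here is a minimal boost::asio sketch; a timer stands in for a real non-blocking I/O operation:

    #include <boost/asio.hpp>
    #include <chrono>
    #include <iostream>

    int main() {
        boost::asio::io_context io;
        boost::asio::steady_timer timer(io, std::chrono::seconds(1));

        // "Call this function when you are done" -- the callback style.
        timer.async_wait([](const boost::system::error_code& ec) {
            if (!ec) std::cout << "operation finished, callback invoked\n";
        });

        std::cout << "async_wait returned immediately; free to do other work\n";
        io.run();  // a single thread dispatches completions as they arrive
    }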
(Coming to the multiprocessing side:)
Multithreading became popular on Windows because manipulating processes is quite heavy on Windows (creating a process, context switching, etc.), as opposed to threads, which are much more lightweight (at least this was the case back when I worked on Win2K).
On Linux/Unix, processes are much more lightweight. Also (AFAIK), threads on Linux are actually implemented internally as a kind of process, so there is no gain in context switching between threads vs. processes. However, with processes you need to use some form of IPC (inter-process communication), such as shared memory, pipes, or message queues.
On a lighter note, look at the SQLite FAQ, which declares "Threads are evil"! :)
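To make the IPC point concrete, here is a minimal POSIX sketch: the forked child has its own address space, so parent and child communicate through a pipe rather than through shared variables:

    #include <cstdio>
    #include <cstring>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        int fds[2];
        if (pipe(fds) != 0) return 1;

        pid_t pid = fork();
        if (pid == 0) {                     // child: separate address space
            close(fds[0]);
            const char* msg = "hello from the child process";
            write(fds[1], msg, strlen(msg) + 1);
            close(fds[1]);
            _exit(0);
        }

        close(fds[1]);                      // parent: must use IPC, not shared variables
        char buf[64] = {0};
        read(fds[0], buf, sizeof(buf) - 1);
        close(fds[0]);
        waitpid(pid, nullptr, 0);
        std::printf("parent received: %s\n", buf);
    }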
To answer the first question:
The best approach is to just use multithreading techniques in your code until you get to the point where even that doesn't give you enough benefit. Assume the OS will handle delegation to multiple processors if they're available.
If you actually are working on a problem where multithreading isn't enough, even with multiple processors (or if you're running on an OS that isn't using its multiple processors), then you can worry about discovering how to get more power. Which might mean spawning processes across a network to other machines.
I haven't used TBB, but I have used IPP and found it to be efficient and well-designed. Boost is portable.
Just wanted to mention that the Flow-Based Programming ( http://www.jpaulmorrison.com/fbp ) paradigm is a naturally multiprogramming/multiprocessing approach to application development. It provides a consistent application view from high level to low level. The Java and C# implementations take advantage of all the processors on your machine, but the older C++ implementation only uses one processor. However, it could fairly easily be extended to use BOOST (or pthreads, I assume) by the use of locking on connections. I had started converting it to use fibers, but I'm not sure if there's any point in continuing on this route. :-) Feedback would be appreciated. BTW The Java and C# implementations can even intercommunicate using sockets.

What is process and thread?

Yes, I have read many materials related to operating systems, and I am still reading. But it seems all of them describe processes and threads in an "abstract" way, offering a lot of high-level elaboration on their behavior and logical organization. I am wondering what they are physically. In my opinion, they are just in-memory "data structures" that are maintained and used by kernel code to facilitate the execution of programs. For example, the operating system uses a process data structure (PCB) to describe aspects of the process assigned to a certain program, such as its priority, its address space and so on. Is this right?
The first thing you need to know to understand the difference between a process and a thread is this: processes do not run, threads do.
So, what is a thread? The closest I can get to explaining it is: an execution state, as in a combination of CPU registers, a stack, the lot. You can see proof of that by breaking into a debugger at any given moment. What do you see? A call stack and a set of registers. That's pretty much it. That's the thread.
Now then, what is a process? Well, it's like an abstract "container" entity for running threads. As far as the OS is concerned, to a first approximation it's an entity the OS allocates some VM to and assigns some system resources to (like file handles and network sockets), etc.
How do they work together? The OS creates a "process" by reserving some resources for it and starting a "main" thread. That thread can then spawn more threads. Those are the threads of one process. They can more or less share those resources one way or another (say, locking might be needed so they don't spoil the fun for the others, etc.). From there on, the OS is normally responsible for keeping those threads "inside" that VM (detecting and preventing attempts to access memory that doesn't "belong" to that process), and for providing some type of scheduling of those threads, so that they can run "one-after-another-and-not-just-one-all-the-time".
Normally, when you run an executable like notepad.exe, a single process is created. That process can spawn other processes, but in most cases there is a single process for each executable that you run. Within the process there can be many threads. Usually at first there is one thread, which starts at the program's "entry point", usually the main function. Instructions are executed one by one, in order; like a person who has only one hand, a thread can only do one thing at a time before moving on to the next.
That first thread can create additional threads. Each additional thread has its own entry point, which is usually defined with a function. The process is like a container for all the threads spawned within it.
That is a pretty simplistic explanation. I could go into more detail but probably would overlap with what you will find in your textbooks.
EDIT: You'll notice there are lots of "usually"s in my explanation, as there are occasionally rare programs that do things drastically differently.
One of the reasons why it is pretty much impossible to describe threads and processes in a non-abstract way is that they are abstractions.
Their concrete implementations differ tremendously.
Compare for example an Erlang Process and a Windows Process: an Erlang Process is very lightweight, often less than 400 Bytes. You can start 10 million processes on a not very recent laptop without any problems. They start up very quickly, they die very quickly and you are expected to be able to use them for very short tasks. Every Erlang Process has its own Garbage Collector associated with it. Erlang Processes can never share memory, ever.
Windows Processes are very heavy, sometimes hundreds of MiBytes. You can start maybe a couple of thousand of them on a beefy server, if you are lucky. They start up and die pretty slowly. Windows Processes are the units of Applications such as IDEs or Text Editors or Word Processors, so they are usually expected to live quite a long time (at least several minutes). They have their own Address Space, but no Garbage Collector. Windows Processes can share memory, although by default they don't.
Threads are a similar matter: an NPTL Linux Thread on x86 can be as small as 4 KiByte and with some tricks you can start 800000+ on a 32 Bit x86 machine. The machine will certainly be useable with thousands, maybe tens of thousands of threads. A .NET CLR Thread has a minimum size of about 1 MiByte, which means that just 4000 of those will eat up your entire address space on a 32 Bit machine. So, while 4000 NPTL Linux Threads is generally not a problem, you can't even start 4000 .NET CLR Threads because you will run out of memory before that.
OS processes and OS threads are also implemented very differently between operating systems. There have been two main approaches. In the first, the kernel knows only about processes; threads are implemented by a userspace library, without any involvement of the kernel at all. Within that approach there are again two variants: 1:1 (every thread maps to one kernel process) or m:n (m threads map to n processes, where usually m > n and often n == #CPUs). This was the early approach taken on many operating systems after threads were invented. However, it is usually deemed inefficient and has been replaced on almost all systems by the second approach: threads are implemented (at least partially) in the kernel, so that the kernel now knows about two distinct entities, threads and processes.
One operating system that goes a third route is Linux. In Linux, threads are implemented neither in userspace nor in the kernel as a separate entity. Instead, the kernel provides an abstraction of both a thread and a process (and indeed a couple more things), called a Task. A Task is a kernel-scheduled entity that carries with it a set of flags determining which resources it shares with its siblings and which ones are private.
Depending on how you set those flags, you get either a thread (share pretty much everything) or a process (share all system resources like the system clock, the filesystem namespace, the networking namespace, the user ID namespace, and the process ID namespace, but do not share the address space). But you can get some other pretty interesting things, too. You can trivially get BSD-style jails (basically the same flags as a process, but without sharing the filesystem or the networking namespace). Or you can get what other OSs call a virtualization container or zone (like a jail, but without sharing the UID and PID namespaces and the system clock). For a few years now, via a technology called KVM (Kernel Virtual Machine), you can even get a full-blown virtual machine (share nothing, not even the processor's page tables). [The cool thing about this is that you get to reuse the highly tuned, mature task scheduler in the kernel for all of these things. One of the things the Xen virtual machine has often been criticized for is the poor performance of its scheduler. The KVM developers have a much superior scheduler than Xen, and the best thing is they didn't even have to write a single line of code for it!]
So, on Linux, the performance of Threads and Processes is much closer than on Windows and many other systems, because on Linux, they are actually the same thing. Which means that the usage patterns are very different: on Windows, you typically decide between using a Thread and a Process based on their weight: can I afford a Process or should I use a Thread, even though I actually don't want to share state? On Linux (and usually Unix in general), you decide based on their semantics: do I actually want to share state or not?
One reason why Processes tend to be lighter on Unix than on Windows, is different usage: on Unix, Processes are the basic unit of both concurrency and functionality. If you want to use concurrency, you use multiple Processes. If your application can be broken down into multiple independent pieces, you use multiple Processes. Every Process does exactly one thing and only that one thing. Even a simple one-line shell script often involves dozens or hundreds of Processes. Applications usually consist of many, often short-lived Processes.
On Windows, Threads are the basic units of concurrency and COM components or .NET objects are the basic units of functionality. Applications usually consist of a single long-running Process.
Again, they are used for very different purposes and have very different design goals. It's not that one or the other is better or worse, it's just that they are so different that the common characteristics can only be described very abstractly.
Pretty much the only few things you can say about Threads and Processes are that:
Threads belong to Processes
Threads are lighter than Processes
Threads share most state with each other
Processes share significantly less state than Threads (in particular, they generally share no memory, unless specifically requested)
I would say that:
A process has a memory space, opened files,..., and one or more threads.
A thread is an instruction stream that can be scheduled by the system on a processor.
Have a look at the detailed answer I gave previously here on SO. It gives an insight into a toy kernel structure responsible for maintaining processes and threads...
Hope this helps,
Best regards,
Tom.
We have discussed this very issue a number of times here. Perhaps you will find some helpful information here:
What is the difference between a process and a thread
Process vs Thread
Thread and Process
A process is a container for a set of resources used while executing a program.
A process includes the following:
Private virtual address space
A program.
A list of handles.
An access token.
A unique process ID.
At least one thread.
A pointer to the parent process, whether or not that process still exists.
That being said, a process can contain multiple threads.
Processes themselves can be grouped into jobs, which are containers for processes and are executed as single units.
A thread is what Windows uses to schedule execution of instructions on the CPU. Every process has at least one.
I have a couple of pages on my wiki you could take a look at:
Process
Thread
Threads are memory structures in the scheduler of the operating system, as you say. A thread points to the start of some instructions in memory and processes them when the scheduler decides it should. While the thread is executing, the hardware timer runs; once it hits the configured time, an interrupt is invoked. The hardware then stops execution of the current program and invokes the registered interrupt handler function, which is part of the scheduler, to inform it that the current thread's slice of execution has finished.
Physically:
A process is a structure that maintains the owning credentials, the thread list, and an open-handle list.
A thread is a structure containing a context (i.e. a saved register set plus a location to execute), a set of PTEs describing what pages are mapped into the process's virtual address space, and an owner.
This is of course an extremely simplified explanation, but it captures the important bits. The fundamental unit of execution on both Linux and Windows is the thread; the kernel scheduler doesn't care about processes (much). This is why, on Linux, a thread is just a process that happens to share PTEs with another process.
A process is an area in memory managed by the OS to run an application; a thread is a small area in memory within a process to run a dedicated task.
Processes and threads are abstractions - there is nothing physical about them, or about any other part of an operating system for that matter. That is why we call it software.
If you view a computer in physical terms you end up with a jumble of electronics that emulates what a Turing machine does. Trying to do anything useful with a raw Turing machine would turn your brain to Jell-O in five minutes flat. To avoid that unpleasant experience, computer folks developed a set of abstractions to compartmentalize various aspects of computing. This lets you focus on the level of abstraction that interests you without having to worry about all the other stuff supporting it.
Some things have been cast into circuitry (e.g. adders and the like), which makes them physical, but the vast majority of what we work with is based on a set of abstractions. As a general rule, the abstractions we use have some form of mathematical underpinning to them. This is why stacks, queues and "state" play such an important role in computing - there is a well-founded body of mathematics around these abstractions that lets us build upon them and reason about their manipulation.
The key is realizing that software is always based on a composite of abstract models of "things". Those "things" don't always relate to anything physical; more likely they relate to some other abstraction. This is why you cannot find a satisfactory "physical" basis for processes and threads anywhere in your textbooks.
Several other people have posted links to and explanations of what threads and processes are, but none of them point to anything "physical". As you guessed, they are really just a set of data structures and rules that live within the larger context of an operating system (which in turn is just more data structures and rules...).
Software is like an onion, layers on layers on layers; once you peel all the layers (abstractions) away, nothing much is left! But the onion is still very real.
It's kind of hard to give a short answer which does this question justice.
And at the risk of getting this horribly wrong and simplifying things, you can say that threads and processes are an operating-system/platform concept, and that under the hood you can define a single-threaded process by:
Low-level CPU instructions (a.k.a. the program).
The state of execution, meaning the instruction pointer (really a special register), register values, and the stack.
The heap (a.k.a. general-purpose memory).
In modern operating systems, each process has its own memory space. Aside from shared memory (which only some OSes support), the operating system forbids one process from writing into the memory space of another. In Windows, you'll see a general protection fault if a process tries.
So you can say a multi-threaded process is the whole package. And each thread is basically nothing more than state of execution.
So when a thread is pre-empted for another (say, on a uni-processor system), all the operating system has to do in principle is save the state of execution of the thread (not sure if it has to do anything special for the stack) and load in another.
Pre-empting an entire process, on the other hand, is more expensive as you can imagine.
Edit: The ideas apply in abstracted platforms like Java as well.
They are not physical pieces of string, if that's what you're asking. ;)
As I understand it, pretty much everything inside the operating system is just data. Modern operating systems depend on a few hardware requirements: virtual memory address translation, interrupts, and memory protection (There's a lot of fuzzy hardware/software magic that happens during boot, but I'm not very familiar with that process). Once those physical requirements are in place, everything else is up to the operating system designer. It's all just chunks of data.
The reason they are only mentioned in an abstract way is that they are concepts; while they will be implemented as data structures, there is no universal rule for how they have to be implemented.
This is at least true for the threads/processes on their own: they won't do much good without a scheduler and an interrupt timer.
The scheduler is the algorithm by which the operating system chooses the next thread to run for a limited amount of time and the interrupt timer is a piece of hardware which periodically interrupts the execution of the current thread and hands control back to the scheduler.
Forgot something: the above is not true if you only have cooperative threading. Cooperative threads have to actively yield control to the next thread, which can get ugly, e.g. with one thread polling for the results of another thread that is waiting for the first to yield.
Cooperative threads are even more lightweight than other threads, as they don't require support from the underlying operating system to work.
I have seen many of the answers, but most of them are not clear enough for an OS beginner.
In any modern-day operating system, a process has a virtual CPU, virtual memory, and virtual I/O.
Virtual CPU: if you have multiple cores, the process might be assigned one or more of them for processing by the scheduler.
Virtual I/O: I/O might be shared between various processes. For example, the keyboard can be shared by multiple processes: while you type in Notepad you see the text change, while a key logger running as a daemon is storing all the keystrokes. So processes are sharing an I/O resource.
Virtual memory: http://en.wikipedia.org/wiki/Virtual_memory - you can go through the link.
So when a process is taken out of execution by the scheduler, its state (the values stored in registers, its stack and heap, and much more) is saved into a data structure.
So when we compare a process with a thread: threads started by a process share the virtual I/O and virtual memory assigned to the process that started them, but not the virtual CPU.
So there might be multiple threads started by a process, all sharing the same virtual memory and virtual I/O, but each having a different virtual CPU.
This is why we need to lock a process's resources, whether statically allocated (stack) or dynamically allocated (heap): the virtual memory space is shared among the threads of a process. A sketch of this follows below.
Also, since each thread has its own virtual CPU, threads can run in parallel on different cores and significantly reduce the completion time of a process (the reduction will be observable only if you have managed memory wisely and there are multiple cores).
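A minimal C++ sketch of why that locking matters: four threads increment a shared counter, and the mutex is what keeps the final value correct:

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        long long counter = 0;   // shared state in the process's memory
        std::mutex m;            // guards counter against concurrent writes

        auto work = [&] {
            for (int i = 0; i < 100'000; ++i) {
                std::lock_guard<std::mutex> lock(m);  // without this: a data race
                ++counter;
            }
        };

        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) threads.emplace_back(work);
        for (auto& t : threads) t.join();

        std::cout << counter << "\n";  // always 400000 with the lock in place
    }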
A thread is controlled by a process; a process is controlled by the operating system.
Processes don't share memory with each other (they work in a so-called "protected flat model"); threads, on the other hand, share the same memory.
With Windows, at least once you get past Windows 3.1, the operating system contains multiple processes, each with its own memory space, which cannot interact with other processes without going through the OS.
Each process has one or more threads, which share the same memory space and do not need the OS to interact with each other.
A process is a container of threads.
Well, I haven't seen an answer to "what are they physically" yet, so I'll give it a try.
Processes and threads are nothing physical. They are a feature of the operating system. Usually no physical component of the computer knows about them. The CPU only processes a sequential stream of opcodes. These opcodes might belong to a thread; the OS then uses traps and interrupts to regain control, decide which code to execute, and switch to another thread.
A process is one complete entity, e.g. an exe file or one JVM. There can be a child process of a parent process, in which the exe file runs again in a separate space. A thread is a separate path of execution within the same process, and the process controls which thread executes, halts, etc.
Trying to answer this question from the Java world.
A process is an execution of a program but a thread is a single execution sequence within the process. A process can contain multiple threads. A thread is sometimes called a lightweight process.
For example:
Example 1:
A JVM runs in a single process and threads in a JVM share the heap belonging to that process. That is why several threads may access the same object. Threads share the heap and have their own stack space. This is how one thread’s invocation of a method and its local variables are kept thread safe from other threads. But the heap is not thread-safe and must be synchronized for thread safety.
Example 2:
A program might not be able to draw pictures while reading keystrokes. The program must give its full attention to the keyboard input, and lacking the ability to handle more than one event at a time will lead to trouble. The ideal solution to this problem is the seamless execution of two or more sections of a program at the same time. Threads allow us to do this. Here, drawing the picture is the main task (the process) and reading keystrokes is a sub-task (a thread).