The well-known GNU C/C++ Compiler (GCC), an optimizing 32-bit compiler at the heart of the GNU project, supports the x86 architecture quite well, and includes the ability to insert assembly code in C programs, in such a way that register allocation can be either specified or left to GCC. GCC works on most available platforms, notably Linux, *BSD, VSTa, OS/2, *DOS, Win*, etc.
GCC home page is http://gcc.gnu.org.
There are two Win32 GCC ports: cygwin and mingw
There is also an OS/2 port of GCC called EMX; it works under DOS too, and includes lots of unix-emulation library routines. Look around the following site: ftp://ftp.leo.org/pub/comp/os/os2/leo/gnu/emx+gcc/.
The documentation of GCC includes documentation files in TeXinfo format. You can compile them with TeX and print then result, or convert them to .info, and browse them with emacs, or convert them to .html, or nearly whatever you like; convert (with the right tools) to whatever you like, or just read as is. The .info files are generally found on any good installation for GCC.
The right section to look for is C Extensions::Extended Asm::
Section Invoking GCC::Submodel Options::i386 Options:: might help too. Particularly, it gives the i386 specific constraint names for registers: abcdSDB correspond to %eax, %ebx, %ecx, %edx, %esi, %edi and %ebp respectively (no letter for %esp).
The DJGPP Games resource (not only for game hackers) had page specifically about assembly, but it's down. Its data have nonetheless been recovered on the DJGPP site, that contains a mine of other useful information: http://www.delorie.com/djgpp/doc/brennan/.
GCC depends on GAS for assembling and follows its syntax (see below); do mind that inline asm needs percent characters to be quoted, they will be passed to GAS. See the section about GAS below.
Find lots of useful examples in the linux/include/asm-i386/ subdirectory of the sources for the Linux kernel.
Because assembly routines from the kernel headers (and most likely your own headers, if you try making your assembly programming as clean as it is in the linux kernel) are embedded in extern inline functions, GCC must be invoked with the -O flag (or -O2, -O3, etc), for these routines to be available. If not, your code may compile, but not link properly, since it will be looking for non-inlined extern functions in the libraries against which your program is being linked! Another way is to link against libraries that include fallback versions of the routines.
Inline assembly can be disabled with -fno-asm, which will have the compiler die when using extended inline asm syntax, or else generate calls to an external function named asm() that the linker can't resolve. To counter such flag, -fasm restores treatment of the asm keyword.
More generally, good compile flags for GCC on the x86 platform are
gcc -O2 -fomit-frame-pointer -W -Wall
-O2 is the good optimization level in most cases. Optimizing besides it takes more time, and yields code that is much larger, but only a bit faster; such over-optimization might be useful for tight loops only (if any), which you may be doing in assembly anyway. In cases when you need really strong compiler optimization for a few files, do consider using up to -O6.
-fomit-frame-pointer allows generated code to skip the stupid frame pointer maintenance, which makes code smaller and faster, and frees a register for further optimizations. It precludes the easy use of debugging tools (gdb), but when you use these, you just don't care about size and speed anymore anyway.
-W -Wall enables all useful warnings and helps you to catch obvious stupid errors.
You can add some CPU-specific -m486 or such flag so that GCC will produce code that is more adapted to your precise CPU. Note that modern GCC has -mpentium and such flags (and PGCC has even more), whereas GCC 2.7.x and older versions do not. A good choice of CPU-specific flags should be in the Linux kernel. Check the TeXinfo documentation of your current GCC installation for more.
-m386 will help optimize for size, hence also for speed on computers whose memory is tight and/or loaded, since big programs cause swap, which more than counters any "optimization" intended by the larger code. In such settings, it might be useful to stop using C, and use instead a language that favors code factorization, such as a functional language and/or FORTH, and use a bytecode- or wordcode- based implementation.
Note that you can vary code generation flags from file to file, so performance-critical files will use maximum optimization, whereas other files will be optimized for size.
To optimize even more, option -mregparm=2 and/or corresponding function attribute might help, but might pose lots of problems when linking to foreign code, including libc. There are ways to correctly declare foreign functions so the right call sequences be generated, or you might want to recompile the foreign libraries to use the same register-based calling convention...
Note that you can add make these flags the default by editing file /usr/lib/gcc-lib/i486-linux/126.96.36.199/specs or wherever that is on your system (better not add -W -Wall there, though). The exact location of the GCC specs files on system can be found by gcc -v.
GCC allows (and requires) you to specify register constraints in your inline assembly code, so the optimizer always know about it; thus, inline assembly code is really made of patterns, not forcibly exact code.
Thus, you can put your assembly into CPP macros, and inline C functions, so anyone can use it in as any C function/macro. Inline functions resemble macros very much, but are sometimes cleaner to use. Beware that in all those cases, code will be duplicated, so only local labels (of 1: style) should be defined in that asm code. However, a macro would allow the name for a non local defined label to be passed as a parameter (or else, you should use additional meta-programming methods). Also, note that propagating inline asm code will spread potential bugs in them; so watch out doubly for register constraints in such inline asm code.
Lastly, the C language itself may be considered as a good abstraction to assembly programming, which relieves you from most of the trouble of assembling.