Intel Software


P2P Hopping Protocol

September 5, 2010 - 2:58pm


In the last week, I started working on the last big missing infrastructure piece of the mesh networking peer-to-peer network: a way to send messages between any two nodes in the peer-to-peer network. Each node communicates with at most 20 neighboring nodes. This is useful to form a mesh, but it only gets interesting if you can route messages by hopping from node to node. Now, each node's identifier is a long hash of the node's public certificate.

A key question is how you route a message over a peer-to-peer network from one node to another with only the hash of the source and target nodes. None of the nodes have a general routing table or a full view of the network topology. Still, we can route the message from hop to hop by, at each hop, reducing the "distance" between the message and its destination. The distance is measured not in physical or network distance, but in the difference between a node identifier and the target of the message. In any case, it works quite well.
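The post does not say which distance function is used. Purely as an illustration, the sketch below assumes an XOR metric over 64-bit identifiers (in the spirit of Kademlia) and uses made-up names such as NodeId, nextHop and route. Each node knows only its own neighbors and forwards to the neighbor whose identifier is closest to the target, stopping when no neighbor is closer than the current node.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using NodeId = std::uint64_t;  // stand-in for the long certificate hash

// Assumed metric (not from the post): XOR of the identifiers, read as an integer.
inline std::uint64_t distance(NodeId a, NodeId b) { return a ^ b; }

// Pick the neighbor that is strictly closer to the target than we are.
// Returns `self` when no neighbor improves the distance (routing stalls).
inline NodeId nextHop(NodeId self, const std::vector<NodeId>& neighbors, NodeId target) {
    NodeId best = self;
    for (NodeId n : neighbors)
        if (distance(n, target) < distance(best, target)) best = n;
    return best;
}

// Follow next hops through a toy topology and return the path taken.
// The distance shrinks on every hop, so the loop always terminates.
inline std::vector<NodeId> route(const std::map<NodeId, std::vector<NodeId>>& mesh,
                                 NodeId source, NodeId target) {
    std::vector<NodeId> path{source};
    NodeId current = source;
    while (current != target) {
        auto it = mesh.find(current);
        if (it == mesh.end()) break;          // unknown node: stop
        NodeId next = nextHop(current, it->second, target);
        if (next == current) break;           // no closer neighbor: give up
        path.push_back(next);
        current = next;
    }
    return path;
}
```

A greedy scheme like this can stall at a node with no closer neighbor, which is why the sketch returns the partial path instead of assuming delivery.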

One great thing about this system is that you can message nodes that are behind a firewall or proxy, and it will still be routed correctly. There are plenty of interesting applications; notably, you can do lots of outside-in scenarios where you access your home computers from a cellphone anywhere in the world.

This week, I will keep working on this system and add replay-detection and message signing and encryption. Should be fun.

By the way, only one week to go before I take up and head to IDF. Lots of excitement to come.

Ylian
meshcentral.homeip.net

Parallelism Education and the Role of Abstraction

September 3, 2010 - 3:56pm


On September 13th I will be participating in a panel at IDF on how the shift to parallelism will, or should, affect computer science education. My opinion is that this is a huge challenge, but one that can be met. However, it will require rethinking certain aspects of the CS curriculum, from how (and what) algorithms and data structures are introduced, to what languages are used. It is also worthwhile contemplating how and when the physical design of computers is covered, and what the role of abstraction should be---and which abstractions are appropriate.


Parallelism makes it even more difficult to write correct programs, let alone programs that perform well. However, it is vital that parallel programming be included as part of the CS curriculum broadly. This has to be done in a way that does not neglect core skills, and that ultimately enables students to create reliable, maintainable, and efficient applications.


Ultimately, the goal of CS education should be to instill in students accurate conceptual models of how programs will behave on physical computing machines, along with strong abstraction and design skills that will allow them to efficiently develop usable and efficient software for these machines. The real challenge is going to be one of balance: in a pragmatic sense, how can parallelism be covered in the limited time available for an undergraduate education? How can the necessary new concepts be covered, without sacrificing other crucial aspects of that education? What should the balance be between practical skills useful immediately and underlying concepts useful in the long run? What emphasis should be placed on performance and efficiency?


I personally feel that efficiency is important: if you don’t care about efficiency or performance, then you don’t have to care about parallelism. However, most applications of parallelism, from virtualization to high performance computing, are about achieving as much as possible with a given amount of hardware and power. Of course, this desire for execution efficiency needs to be balanced with productivity: the need to minimize development time. However, software developers (and their managers…) need to realize that unnecessarily inefficient programs are, in fact, environmental hazards, and result in extra costs, and possibly missed business opportunities. Is there a way to achieve both performance and productivity? I believe so---through the intelligent and informed use of appropriate abstractions.


In order to achieve performance, students need to have a clear understanding of the underlying hardware mechanisms in the computer architecture, and in particular what assumptions the hardware is making about programs to achieve performance. In order to achieve reliability and to construct reliable parallel programs efficiently, software developers also need to have a knowledge of good practices for constructing efficient parallel programs, as well as a practical knowledge of the tools.


These goals are complementary, not contradictory, if it is understood that the properties that the hardware requires from programs to achieve performance are relatively simple: data locality and latent parallelism, at multiple levels. Best practices, in turn, can be encapsulated in appropriate design patterns. Design patterns can be taught that have the properties of parallelism and locality, and case studies can show how they work in practice. Software developers with a good conceptual model of the hardware (and the compilers and other systems that map programs onto that hardware) can then use these abstractions intelligently to architect efficient programs.


To be an efficient software developer, one needs to use abstractions. It is not possible to code everything at the lowest level, all the time. Programming languages were invented for a reason. However, abstractions are often taught as a way to “hide information”. This is not quite right. My opinion is that abstractions should be taught as a way to automate and delegate the management of details. However, a good software engineer should, in theory, be able to understand the details, and use that knowledge to guide the selection of appropriate abstractions. “Information hiding” should not extend to a professional’s education.


As for tools and languages: these are still evolving, but what strikes me most is not the variety of parallel programming models available, but how often they are based on a common set of core design principles.


So, in my opinion, a curriculum based around appropriate and well-structured design patterns at multiple levels of abstraction, motivated by real applications and case studies of real systems, can achieve the desired goal: software developers who can efficiently create sophisticated, scalable, and reliable parallel applications, and whose skills can evolve along with the technology.

Brace for Test Impact

September 3, 2010 - 12:52pm


What is Test Impact Analysis?

Test Impact Analysis is a new feature in Visual Studio 2010 (Ultimate and Premium editions) that allows Developers to identify and focus on tests that are affected by specific code changes.

When to use Test Impact Analysis?

Test Impact Analysis eases the maintenance of a large test suite when making many code changes. By detecting code changes and reporting impacted tests, the Developer can focus on maintaining the relevant impacted tests. Time spent re-running irrelevant tests, and the possibility of neglecting relevant tests are both greatly reduced.

How to use Test Impact Analysis?


This simple solution contains a library project and a test project. As you can see from the screenshot above, the Test Results window shows 1 passing test, Add_PositiveInts_ReturnsSum.


The Test Impact View provides access to all the information and functionality that has been introduced with Test Impact Analysis. This view can be loaded by navigating to Test > Windows > Test Impact View. The screenshot above shows the Test Impact View in its initial disabled state. Clicking the “Enable the test impact diagnostics data adapter in the active test settings” link will configure the test settings to capture test impact analysis data.


The “Data and Diagnostics” page not only shows that Test Impact has been enabled, but that many more features are available. Once enabled, a rebuild is required to seed data into the Test Impact View.


Right-clicking on the Add method and selecting “Show Calling Tests” from the context menu updates the Test Impact View to report all the tests that exercise the Add method. In this simple scenario Test Impact Analysis has successfully discovered the relationship between Add and Add_PositiveInts_ReturnsSum.


Changing Add and rebuilding the library has triggered Test Impact Analysis to detect a code change. The Test Impact View is now reporting that Add has changed and is also reporting that Add_PositiveInts_ReturnsSum should be refreshed. Refreshing the impacted tests can be done by simply pressing CTRL+R, Y which is a new shortcut that corresponds to the Run All Impacted Tests command. This command can also be accessed by navigating to Test > Run > All Impacted Tests.

References

How to: Identify the Test Impact of Code Changes During Development
Identifying Code Change Impact on Tests

Intel(r) vPro developers: What do YOU want to emulate?

September 3, 2010 - 12:44pm


Many developers know that we used to provide an emulator for Intel(r) Active Management Technology inside our Software Developer Toolkit.  Unfortunately, we stopped including it and this was not good news to a lot of our developers.

So why would anyone need an emulator for Intel(R) AMT?  Well there are quite a few reasons.  I think (and correct me if I'm wrong)  companies that are located in other geographies and who wish to build Management Consoles may have the greatest need for a way to test their software without actually having hardware.  From my understanding it is difficult for them to get vPro systems, even after they have been launched.  I have also heard that there are large tariffs on these systems as well so buying them is often prohibitive.  How wonderful would it be for these companies to be able to test their software even though they are not able to access systems?

I'm sure an emulator would benefit many other Software Vendors as well, even if they are here in the US.  We would like to hear from our Community.  We want to know a couple things:

  • Do you have a need for an Intel AMT emulator?  If so, how critical is this need?
  • What do you need an emulator to do?  In-band vs Out of Band; what features are most critical for you to test?
  • Do you need an emulator for Provisioning your Intel AMT System?

I hope to get tons of comments on this blog.  PLEASE tell us what you think, want, need and must have in an emulator!

SIMD Parallelism using Array Notation

September 3, 2010 - 9:56am


Are you a C or C++ programmer who has ever envied APL or Fortran 90's array expressions?   Read on.  If you don't know what array expressions are, then you really should read on, to find out what you should have envied.  In any case, the envy is over, because  Intel Parallel Composer 2011 brings array expressions to C and C++.

Background

A while back I wrote about the Three Layer Cake pattern for parallel programming.  The pattern is a way of organizing programs to fully exploit modern  multi-core chips.   Two of the layers are:

  • fork-join: harnesses multiple hardware threads.
  • SIMD: harnesses SIMD instructions.

The compiler in Intel Parallel Composer 2011 extends C++ to directly support these two layers.  The extensions are called Intel(R) Cilk Plus.  They are:

  • Cilk notation for specifying fork-join parallelism.
  • Array notation for specifying SIMD parallelism.

This blog introduces the array notation, with a Seismic Duck kernel as the example.  I'll introduce Cilk notation in another blog.  The two notations are independent.  Indeed, the array notation is valuable with other threading packages too, such as Threading Building Blocks, or just for writing faster serial code.

Quick Introduction to Array Notation

The array notation extension is reminiscent of APL and Fortran-90 style array expressions.   The expression:

a[index:count]

denotes an array section starting at index with count elements.  Scalar operations can be used on conformable array sections in an intuitive manner.   Operations between scalars and array sections work too; scalars are extended in the obvious way (like in APL or Fortran 90).  Examples:

z[i:n] = x[i:n];      // Copies x[i..i+n-1] to z[i..i+n-1].
z[i:n] = 2*x[i+1:n];  // Sets z[i..i+n-1] to twice the corresponding elements in x[i+1..i+n].
u[i:m][j:n] += 1;     // Increments elements of two-dimensional mxn array section with upper left corner [i][j].

Section notation also permits expressions of the form array[index:count:stride], reductions, and shorthands that I will not go into here.   I'm presenting just enough to pique your interest.  To learn more about it, follow this link to the compiler documentation.
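For readers without a compiler that supports the extension, the meaning of the section examples can be spelled out as ordinary loops. The sketch below is plain standard C++, not array notation; the helper names copySection, doubleShifted and fillStrided are made up for illustration, and the loops assume the source and destination ranges do not overlap.

```cpp
#include <cstddef>

// Loop equivalent of:  z[i:n] = x[i:n];
inline void copySection(float* z, const float* x, std::size_t i, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) z[i + k] = x[i + k];
}

// Loop equivalent of:  z[i:n] = 2*x[i+1:n];
inline void doubleShifted(float* z, const float* x, std::size_t i, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) z[i + k] = 2 * x[i + 1 + k];
}

// Loop equivalent of:  z[i:n:s] = c;
// A strided section touches n elements: i, i+s, ..., i+(n-1)*s.
inline void fillStrided(float* z, std::size_t i, std::size_t n, std::size_t s, float c) {
    for (std::size_t k = 0; k < n; ++k) z[i + k * s] = c;
}
```

The point of the notation is that the compiler is told up front that the iterations are independent, whereas with loops like these it has to work that out for itself.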

Example

I've described in other blogs how seismic wave propagation in Seismic Duck depends on updating a "tile", a small subarray that fits in cache.  Here is the scalar code that dominates execution time.  It updates a tile with uniform A and B coefficients:

float a = 2*A[iFirst][jFirst];
float b = B[iFirst][jFirst];
for( int i=iFirst; i<iLast; ++i ) {
    for( int j=jFirst; j<jLast; ++j ) {
        Vx[i][j] += a*(U[i][j+1]-U[i][j]);
        Vy[i][j] += a*(U[i+1][j]-U[i][j]);
        U[i][j] += b*((Vx[i][j]-Vx[i][j-1])+(Vy[i][j]-Vy[i-1][j]));
    }
}

To improve the speed on compilers that did not automatically generate SIMD code from the scalar loops, I wrote the key loops with SSE intrinsics, so that calculations are done four wide instead of one at a time.  The resulting code looks like this:

#define CAST(x) (*(__m128*)&(x))    /* for aligned load or store */
#define LOAD(x) _mm_loadu_ps(&(x))  /* for unaligned load */
#define ADD _mm_add_ps
#define MUL _mm_mul_ps
#define SUB _mm_sub_ps
...
__m128 a = CAST(A[iFirst][jFirst]);
a = ADD(a,a);
__m128 b = CAST(B[iFirst][jFirst]);
for( int i=iFirst; i<iLast; ++i ) {
    for( int j=jFirst; j<jLast; j+=4 ) {
        __m128 u = CAST(U[i][j]);
        CAST(Vx[i][j]) = ADD(CAST(Vx[i][j]),MUL(a,SUB(LOAD(U[i][j+1]),u)));
        CAST(Vy[i][j]) = ADD(CAST(Vy[i][j]),MUL(a,SUB(CAST(U[i+1][j]),u)));
        CAST(U[i][j]) = ADD(u,MUL(b,ADD(SUB(CAST(Vx[i][j]),LOAD(Vx[i][j-1])),SUB(CAST(Vy[i][j]),CAST(Vy[i-1][j])))));
    }
}

The downside of the change is obvious - it's hard to read. And this was a simple case because logic elsewhere guarantees that jLast-jFirst is a multiple of 4.  Otherwise, dealing with the extra iterations would have further obfuscated the code.

For this particular example, explicit SSE intrinsics are not actually necessary with a compiler that automatically vectorizes (converts scalar loops to SIMD instructions).  Indeed, recent compilers that I tried seem to be able to do so.  (Though one older compiler from 2008 did not.)   But I was careful to cater to the optimizer.  I declared the arrays Vx, Vy, and U as static file-scope arrays in the source code, not pointers.  That's not trendy OO programming, but it lets the compiler easily prove absence of aliasing, and thus absence of loop-carried dependences that could thwart vectorization.   It's not always practical to cater this way to the optimizer.  Furthermore, array notation has its own elegance.  So I'll use the kernel as a running example anyway.

The array notation in Intel(R) Cilk Plus lets me state my intent ("SIMD parallelism!") to the compiler more bluntly.  Below is an array notation version of the example:

int i = iFirst;
int j = jFirst;
size_t m = iLast-iFirst;
size_t n = jLast-jFirst;
float a = 2*A[i][j];
float b = B[i][j];
Vx[i:m][j:n] += a*(U[i:m][j+1:n]-U[i:m][j:n]);
Vy[i:m][j:n] += a*(U[i+1:m][j:n]-U[i:m][j:n]);
U[i:m][j:n] += b*((Vx[i:m][j:n]-Vx[i:m][j-1:n])+(Vy[i:m][j:n]-Vy[i-1:m][j:n]));

Compare the last three lines with the forall loops from which the code was derived in another blog:

forall i, j {
    Vx[i][j] += (A[i][j+1]+A[i][j])*(U[i][j+1]-U[i][j]);
    Vy[i][j] += (A[i+1][j]+A[i][j])*(U[i+1][j]-U[i][j]);
}
forall i, j {
    U[i][j] += B[i][j]*((Vx[i][j]-Vx[i][j-1]) + (Vy[i][j]-Vy[i-1][j]));
}

The array notation has let me clearly convey the parallel nature of the updates.  I had to add the setup of i, j, m, n.  But that's an accident of history.  Elsewhere the code computes { iFirst, iLast, jFirst, jLast} from the equivalent of {i, j, m, n} because the former simplified writing the C++ for loops.  If I adapt the rest of the code to use array notation, then { iFirst, iLast, jFirst, jLast} will disappear and {i,j,m,n} will be set up in their place.

Of course I'm still depending on a clever optimizer to eliminate the temporary subarrays.  For example, the subexpression:

a*(U[i:m][j+1:n]-U[i:m][j:n]);

conceptually generates two temporary array sections, for the results of - and *.   In practice, the Intel compiler is good at eliminating those temporaries.   (It's had years of practice doing so for Fortran 90.)  But even if some temporary sections remained, the parallelism is still clear.  The compiler does not have to deduce parallelism from dependence analysis of serial for loops. 

Concise notation is nice, but how about the performance?  When compiled by the Intel compiler and run on the Core-2 Quad system in my office, the array notation variant performed faster than my hand-coded SSE.   [Your mileage may vary.  See optimization disclaimer here.]  I dug through the object code to figure out why.  It turns out that updating Vx, Vy, and U in separate loops does better than with a single loop.   I found out that the hand-coded SSE does as well if changed to use separate loops to update the three arrays.  Anyway, I'm happy that the array notation matches the best that I can do by hand for this example.

Summary

The array notation is a concise way to express SIMD parallelism.  I'm hoping it catches on with other compilers.  In another blog I'll introduce the application of Cilk fork-join parallelism to Seismic Duck.

My favorite PC accessory of 2010 is my Lap Desk!

September 2, 2010 - 7:13pm


So this may sound a little odd, but my favorite PC accessory purchase this year has to be my Lap Desk. (See pic below.) To make a very long story short, I used a rigid Art Board for years. However, after many years it started getting pretty worn out. Several months back I noticed several Lap Desks at my local Barnes & Noble and figured I’d give one a try. (Note: Lap Desks are sold in many venues, typically in stores where books are sold.)

I currently love it! I only wish I’d bought one earlier for my laptop. However, this goes far beyond the added convenience of the Lap Desk itself; it is really about what it allows me to do more comfortably, which all boils down to being able to use my laptop more easily in other areas of the house. For starters, there’s the living room when I’m sitting on the couch or one of the chairs. Also, it enables me to work, or play games, from the comfort of my bed. In the latter scenario I tend to be a night owl and the Lap Desk definitely provides more options when I don’t feel like reading a book or drawing.

So there’s my very random choice for my favorite PC accessory of 2010 thus far.

Intel Threading Challenge 2010: One Down, Three to Go

August 31, 2010 - 5:15pm


The first problem in Phase 2 of the 2010 Intel Threading Challenge contest has closed.  I'm very pleased that we had over 20 entries across the two problem classes from around the world.  Right now the judging staff is preparing to begin compiling and running the entered applications against the chosen data sets.

The biggest change for this contest is the use of the Intel(R) Manycore Testing Lab. The shared platforms allow contestants to have access to hardware and software that is the same as that used by the judges when evaluating the entered codes. Contestants are now able to tune their entries to the exact machine specs rather than having to code "in the dark" and hope that their solution will execute well when the judges test it. For the judges, the hope is that entries will compile and run without any modifications. (In the past, working through differences in systems, libraries used, and compiler versions between the diverse set of development platforms used and the platforms used for scoring took many hours.)

As with previous problems, the clever participants were able to think beyond the scope envisioned by the judges as the problem descriptions were written up. This led to some long discussions and clarifications of the original problem intentions within the ISN forums devoted to the problems. We expect it all worked out well enough for all participants. The judges have taken these discussions to heart and, hopefully, have been able to better anticipate the loopholes that might be found in the current and upcoming problems.

Best of luck to those that submitted an entry for this first problem of Phase 2.  If you missed the deadline for the first problem, it is not too late to get started. The second problem has been posted and will end at noon (PDT) on 20 SEP 2010. Each problem awards prizes to the top three point total entries and the grand prize will be given to the contestant with the highest combined point total from their top three scoring entries. Go to the Threading Challenge 2010 page for details on the problems and how to enter.

A Sea Change in Computer Science Education

August 30, 2010 - 6:39pm


After decades of sturm und drang over whether or not to include parallelism in the undergraduate computer science curriculum, we can announce definitively that the battle is over. Parallelism is here, and it already abides.
Fortunately, we are not left staring into the abyss. Academia, Industry and Developers are cooperating to help define what the new landscape (or seascape) should look like. While the details are still coming into focus, certain aspects now dominate the discussion:

  • We should get beyond thinking about teaching “parallel programming” -- it’s all just programming.
  • Parallelism must be introduced early into the curriculum - no later than second year, and it must inform all relevant courses.
  • New focus must be paid to architecture - but not the same architecture we’ve been teaching for years.
  • Design patterns will take on increasing importance.
  • Parallel models are no longer in their infancy - some are mature and can be widely adopted.
  • Hiring managers are looking for general knowledge of parallelism more than specific tool sets.
  • As educators, we must prepare our students to make the decisions that industry demands - the tools, models, and patterns will lead our way forward.

Agree? Disagree? Good.
We’re going to be having a conversation about this at the IDF education panel “Navigating in a Sea of Cores” on Monday, September 13th. We have representatives from industry and from academia on the panel, and expect to have a lively discussion there continuing through lunch afterwards.

http://idfcommunity.intel.com/planner/SessionCatalog.aspx?track=(ACA).

Free passes are available to educators. Enter the code ACAWEB1 when you register.

For those of you who can’t attend, or for those who want to dry-lab the discussion, we’re also going to have a series of blog posts by some of the contributors. We’ll also be adding links to those further discussions here, so you can just check back to keep up to date!

nulstein v2 plog - back to sequential !

August 30, 2010 - 5:03am


(note: this is slide 5 of the nulstein plog)

In an Agatha Christie novel, this slide would really go at the end of the talk, everything would finally get revealed only in the end, leaving the reader to play detective, picking up clues as the story unfolds... But that's not what I want, this slide goes here, like in a good old Columbo episode: you get to see what happened straight away, and we'll go over everything again to understand the details and make everything right again.

Having found the functional decomposition too complex for my taste, I went back a few steps, trying to split the work up along a different edge. At the time, I was looking at Google's map-reduce and also at the Larrabee renderer, and one thing really struck me then: what Abrash describes can be seen as a system that maps triangles to tiles and then reduces these to bitmaps... And to generalize this: it is okay to break a process into sequential steps, as long as each step exhibits good parallelism.

Isn't that great: sequential is okay, after all!

This led me to break the update in two parts: a read-only part and a write-only part. When all that gets done is reading, there is no need for locks; everybody can do it at once. When all that gets done is changing own state, again there is no contention possible and no arbitration is required. Of course, there are a few more nuts and bolts required to make this work, but here you have it: split that update in two phases, and everything can get updated in parallel much more easily.
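A minimal sketch of the two-phase idea, in plain C++ and not taken from the nulstein source: the Entity struct, its fields, and the toy rule (each entity moves to the midpoint of its two ring neighbors) are all assumptions made for the example. The per-entity scratch field `next` stands in for one of the "nuts and bolts": during the read phase an entity reads anyone's current state but writes only its own scratch, and during the write phase it touches only its own state.

```cpp
#include <cstddef>
#include <vector>

struct Entity {
    float position = 0.0f;  // current state, readable by everyone
    float next = 0.0f;      // private scratch, written only by its owner
};

// Read phase for one entity: look at neighbors' current positions and
// record the intended new position in our own scratch slot.
inline void plan(std::vector<Entity>& world, std::size_t i) {
    const std::size_t n = world.size();
    const Entity& left = world[(i + n - 1) % n];
    const Entity& right = world[(i + 1) % n];
    world[i].next = 0.5f * (left.position + right.position);
}

// Write phase for one entity: change own state only.
inline void commit(std::vector<Entity>& world, std::size_t i) {
    world[i].position = world[i].next;
}

// Run both phases, visiting entities in the given order. Within a phase
// no entity depends on another's work, so any order (or any split across
// threads) gives the same result; the only barrier is between phases.
inline void update(std::vector<Entity>& world, const std::vector<std::size_t>& order) {
    for (std::size_t i : order) plan(world, i);
    for (std::size_t i : order) commit(world, i);
}
```

Calling update with the entities in forward or reverse order yields identical positions, which is the property that lets each phase be handed to a task scheduler without locks.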

Breaking the draw phase down is quite easy too: collect information about what everything wants to draw, sort that and do the actual render in multiple deferred render-contexts (as provided by DirectX 11). Here, it can almost be seen as straight map-reduce: a set of objects is processed to generate a set of intermediate key/value pairs, which gets reduced into a set of command lists, ready to render.
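The collect, sort and reduce steps can be sketched without any graphics API. The code below is plain C++ with hypothetical types (DrawItem, CommandList) and a hypothetical integer sort key; it makes no DirectX calls and only shows how intermediate key/value pairs could be grouped into per-key lists that a deferred context would later record.

```cpp
#include <algorithm>
#include <vector>

struct DrawItem {          // intermediate key/value pair (hypothetical)
    int sortKey;           // e.g. a material or render-state id
    int objectId;          // what wants to be drawn
};

struct CommandList {       // stand-in for a recorded command list
    int sortKey;
    std::vector<int> objectIds;
};

// "Reduce" step: sort the collected items by key, then fold each run of
// equal keys into one command list. stable_sort keeps submission order
// within a key.
inline std::vector<CommandList> buildCommandLists(std::vector<DrawItem> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.sortKey < b.sortKey;
                     });
    std::vector<CommandList> lists;
    for (const DrawItem& item : items) {
        if (lists.empty() || lists.back().sortKey != item.sortKey)
            lists.push_back(CommandList{item.sortKey, {}});
        lists.back().objectIds.push_back(item.objectId);
    }
    return lists;
}
```

In this picture the "map" step is each object appending its own DrawItem entries, which fits naturally into the read-only half of the update.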

The unexpected part for me, when carrying out development, was that I would still have one thread just talking to DX and very busy doing that. That's the part I was trying to kill and, as it turned out, it did survive but stopped being a problem.

There are five slides before we really dive into the details of how this works, which should turn into a couple of weeks' worth of blogging, so I'd like to invite everyone who is interested to start playing in their mind with the idea of two-stage update (all read and, later, all write). Look for things that don't look like they should work, look for ways to address those problems, try and see what in your game would stop working if it had to be written like that, and try to think if there would be any way to fix it. That's the great thing about making this talk a plog: you get to really think about what I present while I'm presenting it. The unfortunate thing is that our comments system seems to be down at the moment and we can't interact properly. So, if anybody wants to give feedback, feel free: you can twitter @jmuffat or email jerome dot muffat dash meridol at intel dot com.

Next time, demo! : to be continued
Spoiler (slides+source code): here

Seismic Duck goes Open Source

August 28, 2010 - 10:42pm


Now you can read the source code for my Seismic Duck game on Source Forge.  I open-sourced the code for several reasons:

  • My blogs on parallelizing it with SSE and TBB omit details of interest.  The blogs chiefly concern the seismic wave propagation code in Source/Wavefield.cpp .
  • Games about reflection seismology are not runaway best sellers.
  • It's limited to Windows.  I'd like to find volunteers to port it to other platforms.   Mac OS is of particular interest, since it is common in educational settings.   The OS-specific parts are about 600 lines of C++.

The code is my own hobby, not Intel's.   As such, comments are  sparse.  I'll expand them as questions arise.  If you are interested in porting it to new platforms, please contact me and I'll help as best I can.

Shopping athletes get new embedded help

August 28, 2010 - 10:02pm


I'm not really big on "shopping" as a sport. Some people love to wander up and down the aisles of Costco on a "treasure hunt" to see new and unexpected items for sale or wander from store to store in a shopping mall.  I'm more one of those "hunters" who targets what I need, goes in for the kill, and relaxes in the cave after the hunt.

But I can see how something like the smart kiosks that Paul Otellini showed off in his Comdex keynote would be cool for shopping athletes. (Spendletes?) In case you missed it the first time around, here it is again:

Here's the link:
http://www.youtube.com/watch?v=1ConBYAzAfk

Offhand, if you were given the job to build something like this for a customer, what kind of technology would you need to pull it off?

  • 3D graphics. Plenty of animations which would be nicely rendered in OpenGL or its cousins.
  • Touch input with a really big touch screen area.
  • Recognition algorithms which can take a video feed and determine the height and gender of the shopper.
  • Video playback. You probably want efficient video decode.
  • Networking and a quick database connector to look up loyalty memberships.
  • Bluetooth connection to send directions or coupons to your phone.
  • How about enough processing horsepower to handle several of these kiosks in the store?

Although all of these appear to be important, the most critical technology of them all is not even listed here.

The most important technology is that it doesn't crash.

When I see something like this demo, I imagine kids coming into the store and playing around with it endlessly. I see teen hackers trying to break into the network somehow. I see a store staffed with employees who don't have the training to fix it if it goes haywire.

That's why this kind of thing is considered an embedded application. If you deploy something like this in your store, the last thing you want to see is a BIOS setup screen or the Blue Screen of Death. You design it to be tough as a tank and to run forever.

Frankly, this is one reason why people start with systems like an embedded Linux system, which has a great reputation for running for months on end with no intervention even with advanced technologies running on top.

Of course, it's just a short step from this kiosk to the kinds of displays we saw in "Minority Report"

http://www.youtube.com/watch?v=nQbVD5hlddk

No telling when people will be willing to give up their privacy for the sake of a better tank top at The Gap.

Testing Intel’s Parallel Studio

August 27, 2010 - 6:44pm


I just published a post at my other blog about Intel's Parallel Studio for OpenMP and MPI applications. My conclusion is that this tool is useful for OpenMP (and TBB) but not for MPI.
You are welcome to read it from here (http://telzur.blogspot.com/2010/08/testing-intels-parallel-studio.html).
Starting from the forthcoming semester I am going to incorporate Intel's Parallel Studio in my Parallel Processing course for demos and for hands on practice during lab sessions.

Have a Cisco ACE XML Gateway? Intel(R) SOA Expressway to the Rescue

August 27, 2010 - 3:34pm


It looks like Cisco has issued both an end-of-sale and end-of-life announcement for their Cisco ACE XML Gateway.

In response, the SOA Expressway team has teed up a special offer for Cisco customers looking to move to a replacement XML Gateway.

This is an interesting development to be sure, and it probably signals that Cisco is seeing less demand for XML traffic than anticipated. Voice and video probably have taken over a larger share.

All this being said, I remember talking to a CTO at a major networking company (not Cisco) about 5 years ago about the proportion of XML traffic as a fraction of total Internet traffic, and while I forgot his name, I can't forget his comment which was something along the lines of: "90 percent of Internet traffic is spam and porn."

I wonder how much the proportion has changed in the last 5 years? Hopefully it is an upward trend :)

Adding web page tooltips that are iPad compatible

August 27, 2010 - 3:27pm


In recent weeks, I have been working on MeshCentral, a central web site for managing all our computers. I got lots of features added in, with more to come, but I wanted to make the web site more user friendly by adding pop-up tool tips at certain places. I also wanted it to not be annoying and be compatible with touch screen devices such as the Apple iPad and the many others that are likely to hit the market.

I just started getting familiar with JQuery, and finding JQuery-compatible tool tips is easy; there are many of them. In fact, JQuery has basic support already built in. Sadly, all of these detect that the mouse hovers over an area and then display the help. First, I wanted to have the user click to get help, avoiding having tip windows pop in and out as the mouse moves. I also wanted the tip window to close using a close button. This way, the site would work great for tablet owners.

After some research I found a great example in "atooltip", it did exactly what I wanted but did have some quarks I had to fix. For example: When clicking many tips in a row, the help text would pile up on the window. I also has a problem with my social configuration page. The user on that page can opt to hide portions of the page, and hitting a tool top and closing the area on the page should close the tool tip.

Well, took about two hours but tool tip help support is now added to MeshCentral. I only use it in two pages (account & social settings), but more to come.

Ylian
meshcentral.homeip.net

nulstein v2 plog - divide and surrender

August 27, 2010 - 11:05am


(note: this is slide 4 of the nulstein plog)

I like calling the time when I started writing games "the good old days". It was the nineties, DOOM's era; I had quit doing IT development work for hire to join a crazy team, doing creative stuff, pushing machines and people to the limits of what they could achieve. Everything felt heroic: there were no ready-made bricks, and you'd start almost from scratch every time. As Hervé Lange used to put it: "imagine you're making movies and building the camera is part of the process". It felt good, and it feels like a long time ago now (it is). To be fair, the only thing that's now false in this statement is that you don't have to build the camera any more (but still, you can...).

Back on topic. The sequence of operations in a video game is very straightforward: frame, frame, frame, frame... We keep rendering frames in a loop, each one evolving based on the players' inputs and the state of the previous frame (i.e. "render-farm" style is not the way to go). The work for a frame splits into two main parts: updating game state (advancing time) and drawing it. Inside these parts, we find modules that are loosely connected: collision/physics, AI, audio, scripted mechanisms, visibility/culling, the actual rendering and more... When everything was sequential, it was possible to use the order in which these were processed to make our lives simpler, but order wasn't so critical. So, when CPUs with multiple cores started to appear, the natural thing to try was to split per module and spread work like that. GPUs had been around for a while too, and the idea of a pipeline that would go from one core to the next and then move on to the GPU felt like it made sense.
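As a toy illustration of the frame loop just described (this is not nulstein code; the state fields, the `move` input and the function names are all made up for the example), each frame updates the state from the previous frame's state plus the player's input, then draws it:

```python
def update(state, move, dt):
    """Advance the game by dt seconds; the result depends on the previous state."""
    return {
        "x": state["x"] + move * state["speed"] * dt,
        "speed": state["speed"],
        "time": state["time"] + dt,
    }

def draw(state):
    """Stand-in for rendering: describe what would be put on screen."""
    return "t=%.1f x=%.1f" % (state["time"], state["x"])

def run_frames(state, moves, dt):
    """frame, frame, frame... one update and one draw per player input."""
    rendered = []
    for move in moves:
        state = update(state, move, dt)   # part 1: update game state
        rendered.append(draw(state))      # part 2: draw it
    return state, rendered

# Hypothetical inputs: right, right, idle, left
final_state, rendered = run_frames(
    {"x": 0.0, "speed": 2.0, "time": 0.0}, [1, 1, 0, -1], dt=0.5
)
print(rendered)
```

Because every frame needs the state produced by the one before it, frames cannot simply be farmed out independently; the parallelism has to be found inside a frame.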

This approach sort of works with two cores: one core runs the update side of things, the second deals with drawing things, and off everything goes to the GPU. Nice and easy breakdown, data always flowing in the same direction, looks cool on paper. Until you start tuning... In the really old days, when the term "3D accelerator" hadn't even been invented, everything happened on the same chip; it was like juggling with one ball (I can do that). Add a GPU: depending on what you do you might be CPU bound or GPU bound, and the optimal use of the machine is to get both to spend about the same amount of time working. You can achieve this by varying the amount of graphics detail; it's as easy as juggling with two balls (I can do that too!). Now, in a pipelined model like the one I was describing, you want the core that updates the game, the one that draws the frame and the GPU rendering polygons to all spend about the same amount of time working, and I'll call that juggling with three balls (which I can't do, but hope to achieve one day). But then there are four cores, six cores, hyper-threading, and it's clear that pipelining won't work.
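To put rough numbers on the juggling act, here is a minimal sketch assuming an idealized steady-state pipeline where a new frame comes out every time the slowest stage finishes. The stage names and millisecond figures are invented for illustration, not measurements:

```python
def pipeline_frame_time(stage_ms):
    """In steady state, the slowest stage sets the pace for the whole pipeline."""
    return max(stage_ms.values())

def utilization(stage_ms):
    """Fraction of each frame interval that every stage actually spends working."""
    frame = pipeline_frame_time(stage_ms)
    return {name: ms / frame for name, ms in stage_ms.items()}

# Hypothetical per-frame cost of each stage, in milliseconds
stages = {"update": 6.0, "draw": 12.0, "gpu": 9.0}

print(pipeline_frame_time(stages))  # paced by the "draw" stage
print(utilization(stages))          # "update" and "gpu" sit idle part of the time
```

With these made-up numbers, the update core is busy only half of the time, and every extra stage added to the pipeline is one more ball to keep level with the others.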

The next logical step is to keep the principle of functional decomposition and drop only the pipeline idea. Also, because some modules can leverage parallelism internally, like physics that can process islands separately, it becomes possible to spread work a little more easily. Data flow is more complicated, but the real difficulty is synchronisation between the various modules. Together with the need to balance work between all cores, this pushes the engine to split work into ever smaller jobs. The ideal chunk of work is one that executes in a reasonably small amount of time and never needs to wait for any other...

This is how one goes from moving AI to a separate thread, to splitting it into sub-groups, and again until reaching the level of the individual, and then moving on to split behaviours into separate "aspects". Only then do you finally have jobs that never wait (and are small). Cores can be kept busy and dependencies are implicitly managed by jobs firing as their prerequisites complete. The main difficulty that arises now is complexity: you have to think in terms of objects at a level of granularity below the individual unit (from the player's point of view). The resulting jobs are also likely to end up being so small that the overhead of managing them starts to weigh on performance. Like I said last time: this is hard!
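The idea of jobs firing as their prerequisites complete can be sketched as follows. This is a single-threaded toy under my own assumptions, not how any particular engine does it: the `JobGraph` class and the job names are hypothetical, and a real engine would hand each ready job to a worker thread instead of running it inline.

```python
from collections import deque

class JobGraph:
    """Toy job graph: a job becomes ready once all its prerequisites have run."""

    def __init__(self):
        self.work = {}        # job name -> callable doing the work
        self.pending = {}     # job name -> number of prerequisites
        self.dependents = {}  # job name -> jobs that wait on it

    def add(self, name, fn, after=()):
        self.work[name] = fn
        self.pending[name] = len(after)
        self.dependents.setdefault(name, [])
        for dep in after:
            self.dependents.setdefault(dep, []).append(name)

    def run(self):
        pending = dict(self.pending)  # work on a copy so run() can be repeated
        ready = deque(name for name, count in pending.items() if count == 0)
        order = []
        while ready:
            name = ready.popleft()
            self.work[name]()         # a real engine would dispatch to a core here
            order.append(name)
            for child in self.dependents[name]:
                pending[child] -= 1
                if pending[child] == 0:   # last prerequisite done: fire the job
                    ready.append(child)
        return order

log = []
graph = JobGraph()
graph.add("input", lambda: log.append("input"))
graph.add("ai", lambda: log.append("ai"), after=["input"])
graph.add("physics", lambda: log.append("physics"), after=["input"])
graph.add("draw", lambda: log.append("draw"), after=["ai", "physics"])
order = graph.run()
print(order)
```

Here "ai" and "physics" become ready at the same moment and could run on different cores, while "draw" has to wait for both. Even this tiny graph carries bookkeeping per job, which is the overhead that starts to hurt once jobs get very small.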

I think the best description available online of a system that works is the one about DICE's Frostbite engine. On slide 26, the CPU job graph looks scary at first and probably isn't that bad, but it does illustrate really well both the need for breaking things down into small blocks and the need for keeping track of dependencies. Slide 29 shows that even in one of the most elaborate engines today, it is unavoidable to have cores going idle waiting for other cores to finish.

But do we have to go down to that level of complexity? In this project, I'm showing an alternative approach that attacks the problem from a different angle. But this post is already long, so you'll have to wait for next time...

Next time we'll see how work is subdivided in nulstein
Spoiler (slides+source code): here

Smart/iPhones – a convergence hub device? I believe so.

August 26, 2010 - 10:18pm


A few days ago I received my iPhone 4 and I absolutely love it! The setup was pretty straightforward and easy to navigate. So far the only thing that took any length of time was the necessary step of downloading iTunes for the device which is primarily used for syncing contacts, email, and the like from my PC. After toying around with this device for a few days and gathering my initial impressions I started pondering the past, present, and likely the future for Smart/iPhones.

Historically speaking, my cell phones progressed from an early MicroTac (1996) to an MPX200 Smartphone (2003). After that I found I pretty much only used the Smartphone for the calendar functionality, so I sort of gave up on them for a while and went back to a normal cell phone (Razr). Over the past few years I then moved to a BlackBerry Pearl, which I had more success with in leveraging other functionality such as email, texting, etc.

So what fascinates me the most about all of this, and what does it have to do with my position statement about Smart/iPhones being a convergence hub device? There are actually several things. Going back over the history of my cell phones, I can’t help but notice that the displays have transitioned from the 2D LEDs of the ’90s to the gorgeous touch-capable 2D/3D LCDs that we see today. Furthermore, when you look under the hood at the capabilities of something like an iPhone 4, it’s literally about twice as powerful as the first desktop PC I purchased, which was a Micron P266 with an 8 MB video card! Throw in a 5-12 MPix camera and portable music/media players (e.g. Walkman, iPod), and I believe there’s been some significant convergence going on in the handheld space.

Given that the capabilities of Smart/iPhones are currently accelerating at a Moore’s law pace or faster, where do they go next? My prediction is that they likely won’t replace PCs (laptops/desktops) but will end up being a central “hub-like” device that enhances one’s existing PCs and other smart devices. If past trends are any indication of the future, then it’s not out of the realm of possibility that in the next five to ten years Smart/iPhones become more important as they become glorified memory sticks (hard drives) tethered to the cloud, keeping your data synchronized. I expect that they’ll end up with processors in the 2-3+ GHz range, with drive capacities in the range of 100-250 GB or more, and sporting graphics capable of handling DirectX 11/OpenGL 4 games & applications. When we add to the mix the ability of these devices to push or pull wireless HDMI, WiGig and similar signals, I foresee some interesting game-changing scenarios emerging. At least I hope we’re heading in this direction! In the meantime I’m going to go download some games for my new phone!

Life, the game (you and Scott Pilgrim vs. the world)

August 26, 2010 - 5:26pm

As a game designer and developer I know the power of points. Even if they have no connection to the real world, it feels great to score points, and even better to score big points, and level up.

In case you haven't noticed, the rest of the world is catching on. For years we've been earning points (miles) in frequent flier programs, which have of course expanded into all kinds of frequent buyer programs (your 11th haircut is free). As Jesse Schell's DICE talk

read more

My Lab's Computer Power State Now Public

August 26, 2010 - 5:25pm


Today I added a new feature in MeshCentral that allows an administrator to make some of the state of their computers public for anyone to see. Obviously, this is not enabled by default, but when enabled, MeshCentral gives you a link to a publicly accessible web page you can share with anyone. Now your friends can see the power state of your computers at any given time, which is great if you want to share this on social networks. By the way, if you opt to make the information public and also set up Twitter, a link will be added to each Tweet. So, if a computer just turned off, you will see a message like "Contact lost with ComputerX" followed by a link to the public information about ComputerX.

At Intel, I have a lab with a bunch of machines and I am making their power states public for everyone to see. The link is: http://meshcentral.homeip.net/m.aspx?39. I am also making the mesh "graph" viewable by the public, so you will see a link to the graph showing the connections between the nodes in my lab. None of my lab's computers move much, so there is not much chance of seeing anything other than a fully connected graph. We may get more interesting graphs as I add more computers.

By the way, I added OpenID support for Google and Yahoo yesterday, seems to work pretty well.

Ylian
meshcentral.homeip.net

Unity as a multi-platform development environment.

August 26, 2010 - 3:49pm
To all the folks here who are looking at game creation on AppUp (or elsewhere) we wanted to share our recent experience with the Unity Game Engine.

We'll show a little video here but in short: UNITY ROCKS!

read more

Team BPN does not cheat!

BulletProof Nerds is officially recognized by the Central Outpost as a genuine gaming organization.