This blog post summarizes the work I have been doing for the past 8 months or so on an automated design system for (civilian) UAVs. I’ve been meaning to write this post for a long time but wanted the tools to be in a beta-quality state before writing about them. In good software engineering tradition there have been a few hiccups and detours along the way, but I’m happy to say the beta stage has been reached.
I will go over the vision behind the system, its architecture, and the various tools I have used to put it together. For the impatient, this is the idea: we need a system that allows us to rapidly go through this process:
Agile UAV design, if you want to call it that. Tools I will talk about include Python, NumPy, RabbitMQ, Redis, Flask, Django, Bootstrap, Knockout.js, and Vagrant, and how they link up with application specific codes such as Ansys Fluent, SolidWorks, Vanguard and AnyLogic.
This post is an extension of the talk I gave at PyConUK 2011 a couple of months ago and is also what I will cover in my upcoming talk at Imperial College and at the upcoming AIAA conference in Honolulu in April 2012.
Background & vision
The system I am going to talk about came about as part of the EPSRC funded DECODE project here at the University of Southampton aerospace department. Many things were promised in the project proposal but the fundamental vision is phrased like this.
DECODE addresses the design and integration of complex systems such where the problems of cost overruns, delays and performance shortfalls are particularly acute. The current value of an aerospace prime contract is now typically measured in £ Billions, yet surprisingly, many companies still use relatively traditional, ad-hoc and organic processes and design tools within their programmes.
The research to be undertaken under DECODE will investigate design decision making tools that provide holistic optimisation at a system level to maximise satisfaction of all the stakeholders associated with the system. Uniquely DECODE will provide active design exploration of systems level trade-offs between performance, unit cost and system life-cycle costs based on a “design mission control capability” .
To do this an innovative design decision support environment that focuses on Value metrics is being built. This will allow a Value-Driven “mission control capability” for design decision making to be researched. It will integrate early concept design exploration with full resolution, geometry backed design computations and detailed manufacturing and operations models.
So let’s look at what this means in a more practical sense. The traditional decision making process in aerospace design projects is that a group of engineers representing the different subsystems/domains sit together in a room and fire off PowerPoint presentations and Excel spreadsheets until consensus is reached.
The vision we are aiming for here is that you still put the same people in the same room, but you give them a flexible environment that they can query in real time and that gives them access to:
- Design rationale: quickly pull up the design rationale maps and drill down to the evidence behind a certain decision. For example: the rationale behind the decision to use a particular airfoil section may link to mission specification documents and CFD analysis.
- Live geometry: it’s much easier to debate the merits of a particular design decision (e.g., the need for structural reinforcement around the landing gear mounting) if you can easily pull up the relevant CAD geometry and manipulate it in real time.
- Probe the design & its performance in real time: engineers are able to vary the parameters of the design & get direct feedback on how they impact performance. They can opt for a rough answer now or an accurate one later (e.g., over lunch, or overnight). For example: what if we decrease the wing span by 10%?
- Find out the value of a particular design choice: Essentially you want to answer the question “Is it worth it?”. We know a 10% increase of wing span will add to the weight/cost, allow us to land slower, etc. But is it worth it?
It’s this last point that is the main novelty of the DECODE project, though it’s also the most crucial to get right. Answering the “is it worth it?” question depends on the mission and thus depends on a trustworthy operational simulation and a realistic value model.
Together this is quite an ambitious vision and requires a sizeable amount of architecting and software engineering to get right. There are about 7 people on the team, each responsible for different domain specific components that are plugged into the system. The team responsible for actually building the system and linking everything together is…err…me
With about 18 months left on the project I doubt we will get all the way there. But we should definitely have proved that you can get something working. For the rest of the post I plan to focus more on the software system side of things. I could talk at length about each of the 4 points above but that would lead us too far.
Unmanned Aerial Vehicles (UAVs)
The vision outlined above is very generic; it could apply to designing space shuttles or Volkswagen Polos. The focus of the project has been to use UAVs as the main use case. Mainly because there is a lot of very cool and relevant technology around UAVs (like Quadcopter ping pong) but also since they are complex enough to be taken seriously as an application but simple enough to be designed and built within a year with a university research group budget. Not to mention they are a great way to teach students about complex systems design. We are a university after all.
So far our group has focused on sub 25kg fixed wing UAVs. As soon as you go above 25kg the certification and authorization issues become hairy. Other groups at the university & NOC are working with Quadcopters and Autonomous Underwater Vehicles. Some of the aircraft we have designed and flown are:
The autopilot we use was developed at the NOC by one of our team members who has recently set up a separate company.
Given UAVs as the main use case, the task of the DECODE system is to help decide which UAV should be built and to discriminate between alternatives. There are a bewildering number of decisions that have to be made when building a system like a UAV, ranging from obvious configuration choices (do we need an undercarriage?) to more obscure ones (do we need square or round section tail booms?). Which ones are important, which ones don’t matter, which ones are critical? The color of the paint you use may not seem important but, depending on the mission and flight conditions, it may very well be. This is a topic one of my colleagues is delving into and one in which I’m involved from the sidelines. The following figure illustrates the different geometry related decision variables on one of our airframes.
At heart this is really a multidisciplinary design optimization (MDO) problem that revolves around solving conflicting requirements. The following cartoon illustrating the tensions between requirements is a classic:
Similar tensions exist when building software, the memory-speed trade-off being the obvious one. Luckily, in the engineering world, there are powerful mathematical techniques that can help you resolve such trade-offs and find an optimum. A luxury that the software world, as far as I am aware, does not have.
Dropping bombs vs saving lives
As soon as you say UAV or drone people immediately think of Predator type aircraft and have visions of soulless robots dropping bombs on civilians & all things Orwellian. This is (rightfully) a controversial topic and one I’m not going to get into. I would not feel comfortable working on military applications so I am very happy that all our work focuses on the civilian applications of UAVs. As with any bit of science or technology UAVs can be, and are, used for military & law enforcement applications but people often don’t realize how many civilian applications there are. And this market is only growing.
The focus of DECODE is the use of UAVs for search and rescue missions. Using a helicopter to search for individuals lost at sea or in the mountains is extremely expensive, costing thousands of pounds per hour. A fleet of UAVs should be able to drastically reduce the price and search time. Not replacing helicopters and lifeboats, but complementing them.
As part of this we developed a detailed operational simulation model of the coast around Southampton (though it can be applied to any area) using the AnyLogic agent based simulator. This can be used to simulate missions and get an idea of what kind of performance is needed from a UAV. For example, it allows us to answer the question of whether we are better off with 10 small, cheap UAVs, or 3 expensive, high performance UAVs.
But let’s not get ahead of ourselves just yet. The question we should ask ourselves at this stage is: what does a good search and rescue UAV look like?
System design drivers
Designing an efficient and performant UAV is a multidisciplinary undertaking, requiring codes, solvers, & expertise from varying domains. The following figure illustrates the building blocks that are needed.
Coming up with a good UAV design requires exercising these blocks in concert for every design decision and assessing the impact of changes. Some blocks may be very simple web service calls which complete in less than a second, while other blocks require many hours of computing time on high performance clusters. A system is needed that can orchestrate all this and that can be driven by the designer in an intuitive fashion.
We can come up with a number of requirements for such a software system:
- the system should be pluggable and support a wide range of components/solvers on any platform
- the system must be capable of running on high performance clusters, grids, and clouds
- multiple users should be able to drive the system simultaneously, asking questions such as:
  - How many more lives can we save if the wing span is 10% longer?
  - What is the optimal propeller diameter?
  - Should we use a canard?
  - How does the parasitic drag vary with cruise speed?
- there must be a clear provenance record to track the flow of data through the system
Which brings us to a number of desirable design principles:
- de-coupled and asynchronous
- fault tolerant
- horizontally scalable
- cross platform and cross language
- support multiple workflows
- user friendly
Which, in turn, can help us think about a suitable architecture for the system….
I will not go into all the details but fundamentally the system follows a typical 3-tiered architecture.
At the bottom there is a data layer which holds all the data related to a particular UAV design & all the data generated as part of analyzing the design (e.g., CFD results, CAD geometry, …). This information is best stored in a proper data repository such as Fedora Commons or similar.
Above that is the orchestration layer which is tasked with implementing the business logic. In this case that means coordinating between the different solvers & components in order to answer the various questions a designer has about the design.
Finally, there is the presentation layer which presents an intuitive user interface to the available workflows and the underlying data repositories. There may be multiple client interfaces, at least one of them being a web based UI.
As most work so far has concentrated on the orchestration layer I shall primarily talk about it. I chose to implement this layer as a series of lightweight & independent components which interact with each other in a fully asynchronous and message based manner. This automatically gives us flexibility over how we implement each component, performance (no blocking calls) and horizontal scalability (multiple components can feed from the same message queue).
I chose Python as the main implementation language as it seemed like a nice high level language for this kind of work. Also, I was only starting off with it and wanted to learn more. As messaging protocol I selected AMQP, using RabbitMQ as the implementation and broker. While not perfect, I liked the concepts of queues, exchanges, and acknowledgements; it was easy to set up, it seems to be quite widely used, and it has good support among all major programming languages. As message format I selected protocol buffers. Again: good support among the major programming languages, easy to use, good documentation, and the type safety and utility methods it gives you over basic JSON come in handy at times.
A very important decision I made here was to avoid using a central database & data model that every component relies on & that I would have to keep in sync. I wanted to have the individual components as lightweight and stateless as I could possibly have them. (Note: Looking back at this it strikes me that what I essentially did was implement the Actor pattern).
However, the problem is that, yes, in an ideal world everything is stateless, but in reality you need to keep track of certain things. For example, each component executes a part of the overall workflow. But if one component dies you don’t want to have to start all over again; you would like to continue where you left off, or at least let one of your peers continue if you can’t. Especially with long, expensive CFD solves you want to avoid doing things twice. Secondly, from a user/debugging perspective you would like to keep track of which components are available and what each is doing.
So some kind of persistent state is needed, and for that I chose Redis, an open source, high performance data structure server. I use it both as a registry to store information on the available components (the EXPIRE command can be used to implement a heartbeat) and as a way to store simple state information. Lightweight is the key here. I still want to avoid any kind of central data model: Redis only stores simple JSON dictionaries and the system will not break if state information is suddenly lost (though a particular job may fail to complete).
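To make the heartbeat idea concrete, here is a minimal sketch of a component registry with expiring entries. It is not the actual DECODE code: `TTLRegistry` and `FakeClock` are hypothetical names, and the class is an in-memory stand-in so the example runs without a Redis server. With a real Redis instance, each heartbeat would roughly correspond to writing the key and resetting its time-to-live with EXPIRE.

```python
import json
import time


class TTLRegistry:
    """In-memory stand-in for a key/value store whose entries expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # component id -> (expiry time, JSON string)

    def heartbeat(self, component_id, info, ttl=10.0):
        # Rewriting the entry restarts its time-to-live, which is the
        # same effect as SET followed by EXPIRE in Redis.
        self._entries[component_id] = (self._clock() + ttl, json.dumps(info))

    def alive(self):
        # Drop entries whose TTL has elapsed, then return what is left.
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items() if v[0] > now}
        return {k: json.loads(v[1]) for k, v in self._entries.items()}


class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
registry = TTLRegistry(clock=clock)
registry.heartbeat("cfd-1", {"type": "cfd", "state": "idle"}, ttl=10)
registry.heartbeat("cost-1", {"type": "costing", "state": "busy"}, ttl=10)

clock.now = 8.0
registry.heartbeat("cfd-1", {"type": "cfd", "state": "busy"}, ttl=10)

clock.now = 12.0  # cost-1 never refreshed its entry, so it has expired
alive_now = registry.alive()
print(sorted(alive_now))
```

A component that stops sending heartbeats simply disappears from the registry, with no explicit deregistration step needed.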
The following figure illustrates all the components in the system and the communication between them:
There are two types of components: relay components and analysis components. Relay components (like the balancer and controller in the image) are responsible for the orchestration of analysis components; they implement the actual system logic. Analysis components are less intelligent in that they essentially wrap a particular solver (e.g., Fluent for CFD or Vanguard for costing). The balancing code is a bit of a special case since it acts like an analysis code but also plays an important role in the overall orchestration. Together with the Balancer component it is responsible for ensuring the aircraft is balanced, i.e., it is a feasible aircraft where lift = weight, the center of gravity is in the right place, etc. (Note: this is where the Excel -> Python compilation code I blogged about previously comes in).
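As a toy illustration of what “balanced” means for the lift = weight condition, the sketch below iterates on wing area until lift matches weight at cruise. This is not the real balancing code: the linear mass model and all the numbers are made up for the example, and the function names are hypothetical. Only the lift equation, L = ½ρV²S·C_L, is standard.

```python
G = 9.81      # gravitational acceleration, m/s^2
RHO = 1.225   # sea-level air density, kg/m^3


def lift(wing_area, speed, cl):
    """Standard lift equation: L = 0.5 * rho * V^2 * S * CL (in newtons)."""
    return 0.5 * RHO * speed ** 2 * wing_area * cl


def mass(wing_area, fixed_mass, wing_mass_per_area):
    """Hypothetical mass model: fixed payload/systems plus a wing that
    gets heavier in proportion to its area."""
    return fixed_mass + wing_mass_per_area * wing_area


def balance_wing_area(speed, cl, fixed_mass, wing_mass_per_area,
                      tol=1e-9, max_iter=100):
    """Fixed-point iteration: resize the wing to carry the current weight,
    recompute the weight for the new wing, and repeat until it settles."""
    area = 1.0  # initial guess, m^2
    for _ in range(max_iter):
        weight = mass(area, fixed_mass, wing_mass_per_area) * G
        new_area = weight / (0.5 * RHO * speed ** 2 * cl)
        if abs(new_area - area) < tol:
            return new_area
        area = new_area
    raise RuntimeError("balance loop did not converge")


# Made-up inputs: 20 m/s cruise, CL of 0.5, 10 kg of fixed mass,
# and 2 kg of wing structure per square metre.
area = balance_wing_area(speed=20.0, cl=0.5, fixed_mass=10.0,
                         wing_mass_per_area=2.0)
residual = lift(area, 20.0, 0.5) - mass(area, 10.0, 2.0) * G
print(f"wing area: {area:.4f} m^2, lift - weight: {residual:.2e} N")
```

The loop only converges when a larger wing adds more lift than it adds weight, which is one simple way an infeasible design shows up.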
Each component is thus a lightweight Python process and has two queues at its disposal: a direct input queue which is exclusive to that component, and an input queue which is shared between all components of the same type. Each message received on either queue will be processed and the result returned to the component specified in the “reply-to” field of the message, so workflows are not hardwired. A component also starts a small embedded HTTP server using Flask, which gives you access to the local result directory and which I will probably use to add some simple monitoring and administration.
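The two-queue layout and the reply-to routing can be sketched with plain in-process queues. This is only an illustration under stated assumptions: the real system uses AMQP queues on a RabbitMQ broker, and the `Component` class, `fake_solver`, and the message dictionary layout are all hypothetical.

```python
import queue


class Component:
    """One analysis component with a direct queue and a shared queue."""

    def __init__(self, name, shared_queue, solver):
        self.name = name
        self.direct_queue = queue.Queue()  # exclusive to this instance
        self.shared_queue = shared_queue   # shared by all instances of a type
        self.solver = solver

    def poll_once(self, outboxes):
        """Process at most one message from each input queue."""
        for inbox in (self.direct_queue, self.shared_queue):
            try:
                message = inbox.get_nowait()
            except queue.Empty:
                continue
            result = self.solver(message["body"])
            # The sender decides where the answer goes, so the component
            # has no hardwired knowledge of the workflow it is part of.
            outboxes[message["reply_to"]].put(
                {"from": self.name, "body": result})


def fake_solver(body):
    # Placeholder for a wrapped solver such as a CFD run.
    return {"solved": body["case"]}


cfd_shared = queue.Queue()
outboxes = {"controller": queue.Queue(), "balancer": queue.Queue()}
cfd_a = Component("cfd-a", cfd_shared, fake_solver)

cfd_a.direct_queue.put({"reply_to": "controller", "body": {"case": "wing-2"}})
cfd_shared.put({"reply_to": "balancer", "body": {"case": "wing-1"}})
cfd_a.poll_once(outboxes)

to_controller = outboxes["controller"].get_nowait()
to_balancer = outboxes["balancer"].get_nowait()
print(to_controller)
print(to_balancer)
```

The same component answers the controller in one case and the balancer in the other, purely because of what the messages asked for.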
There can be an arbitrary number of instances of each component type running (horizontal scalability). Also, thanks to the semantics of AMQP acknowledgements, we can ensure we only acknowledge a message once we have successfully processed it and performed the next action in the overall workflow. This leads to natural failover between different components. To demonstrate components can really come and go as they please I wrote some proof of principle code linking the component lifetime with the local screensaver.
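The failover behaviour can be sketched with a tiny in-memory broker that mimics the relevant AMQP semantics: a delivered message stays “unacknowledged” until the consumer acks it, and is put back on the queue if the consumer goes away first. `TinyBroker` and `run_worker` are hypothetical names for this illustration and not part of any real client library.

```python
from collections import deque


class TinyBroker:
    """Toy broker: messages are only removed for good once acknowledged."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def get(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]

    def consumer_died(self, tag):
        # Mimics a broker requeueing a message whose consumer vanished.
        self.ready.appendleft(self.unacked.pop(tag))


def run_worker(broker, name, work, log):
    tag, body = broker.get()
    try:
        result = work(body)
    except Exception:
        broker.consumer_died(tag)  # no ack was sent, so the job survives
        return
    log.append((name, result))     # the "next action" in the workflow
    broker.ack(tag)                # acknowledge only after all of the above


def crashing(body):
    raise RuntimeError("worker went away mid-job")


def healthy(body):
    return body.upper()


broker, log = TinyBroker(), []
broker.publish("cfd job")
run_worker(broker, "worker-a", crashing, log)
run_worker(broker, "worker-b", healthy, log)
print(log)
```

Note that this gives at-least-once delivery: a worker that dies after doing the work but before acknowledging will cause the job to be repeated, so handlers should tolerate duplicates.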
As with every solution, a decoupled, message based system also brings with it a number of weaknesses:
- Sometimes you may want synchronous, sequential behavior, emulating this in an asynchronous system is ugly
- Ideally you want to “fire-and-forget” but often you do want somebody to reply. So you need to detect the fact that nobody replies which requires extra monitoring.
- Once you’ve sent out your messages you cannot simply recall them. Cancelling an operation becomes tricky.
- As soon as you have some shared state you have to worry about race conditions and deadlock. This is something you cannot really avoid without using a distributed coordination service like ZooKeeper.
- The RabbitMQ broker and, to a lesser degree, Redis are still central points of failure. Both have high availability features with failover and persistence but it’s not as clean and robust as using something like 0MQ, for example.
- Loosely coupled systems are difficult to test since there are many moving parts. You also have to beware of ending up in a message queue spaghetti.
So far none of these are a real problem, it will depend on how the project evolves to see if that changes. In any case I have taken care to abstract the interactions with Redis and RabbitMQ in order to make any transition or change not too painful.
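To illustrate the “nobody replies” weakness from the list above, here is one possible way to track outstanding requests with a deadline per message. It is a sketch only: `PendingRequests` and the correlation ids are hypothetical, and a fake clock is used so the example runs instantly.

```python
import time


class PendingRequests:
    """Tracks sent messages that still expect a reply."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._deadlines = {}  # correlation id -> time by which a reply is due

    def sent(self, correlation_id, timeout):
        self._deadlines[correlation_id] = self._clock() + timeout

    def replied(self, correlation_id):
        # Returns False for unknown ids, e.g. a reply that arrives after
        # the request was already reported as overdue.
        return self._deadlines.pop(correlation_id, None) is not None

    def overdue(self):
        # Report each timed-out request once and stop tracking it.
        now = self._clock()
        late = sorted(c for c, d in self._deadlines.items() if d <= now)
        for correlation_id in late:
            del self._deadlines[correlation_id]
        return late


now = [0.0]
pending = PendingRequests(clock=lambda: now[0])
pending.sent("q1", timeout=5)
pending.sent("q2", timeout=5)
pending.sent("q3", timeout=60)   # e.g. a long-running solve

now[0] = 3.0
q1_in_time = pending.replied("q1")

now[0] = 10.0
late = pending.overdue()
q2_too_late = pending.replied("q2")
print(late)
```

What to do with an overdue request (resend it, fail the job, alert someone) is a separate policy decision that this sketch leaves open.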
Almost all the interaction with the system is through the Controller component. All it requires is a protobuf Query message, which is a simple message representing a system query. Among other things it contains: inputs (constraints) to set, aircraft configuration to use, required fidelity level (quick answer now, or accurate answer later) and some other options. You can then have different client libraries that interface with the controller, as long as they can produce a protobuf Query message and send it to an AMQP broker.
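As a rough picture of what such a query carries, the sketch below models it as a small Python dataclass. This is an assumption-laden stand-in: the real message is a protobuf definition whose field names are not shown in this post, so `configuration`, `inputs`, `fidelity`, the fidelity level names, and the example values are all hypothetical, and JSON is used here only so the example runs without generated protobuf classes.

```python
import json
from dataclasses import asdict, dataclass, field

FIDELITY_LEVELS = ("quick", "accurate")  # hypothetical level names


@dataclass(frozen=True)
class Query:
    configuration: str                          # which aircraft layout to use
    inputs: dict = field(default_factory=dict)  # inputs (constraints) to set
    fidelity: str = "quick"                     # answer now, or better later

    def __post_init__(self):
        if self.fidelity not in FIDELITY_LEVELS:
            raise ValueError(f"unknown fidelity level: {self.fidelity!r}")

    def to_bytes(self):
        # Stand-in for protobuf serialization.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, payload):
        return cls(**json.loads(payload.decode("utf-8")))


query = Query(configuration="example-airframe",
              inputs={"wing_span_m": 2.7},
              fidelity="accurate")
payload = query.to_bytes()
round_tripped = Query.from_bytes(payload)
print(payload)
```

Any client that can build this payload and publish it to the broker can drive the system, which is what keeps the clients thin.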
So far I have implemented three clients: a Matlab client (using the Java API),
an Excel client (using the C# API),
The Excel client is deprecated at the moment, though, as there seems to be little or no need for it. It is quite cool to use though. Simply drag down a formula with your mouse and kick off hundreds of number crunching jobs without even noticing. The Python interface is great for testing and scripting, while the web interface is easy to use for non techy people and for proving the concept. The Matlab interface is great if you want to get all nitty gritty with parameter sweeps, response surface modeling and optimization. A topic that I devoted 4 years of my life to previously.
Where the code hits the runway
So you now have this design system which allows you to explore the impact of various design decisions in a quick and intuitive manner. By iterating through the system you can gradually pin down the UAV you want to build. Then of course you actually have to build it.
Here we also try to stay agile by leveraging rapid manufacturing technologies such as 3D printing as much as possible. Additive manufacturing has the great advantage that you can prototype things really quickly and you can have a lot of complexity in the design, essentially for free. This is great for designers since it gives them more freedom in designing complex CAD parts without having to worry too much about whether they can be made.
The extreme case here is to build the whole airframe with rapid manufacturing. This is what we did with our SULSA aircraft, with which we gained quite a lot of publicity as we were the first in the world to do it and fly successfully.
But it’s important to not get carried away. Additive manufacturing is great but you have to be aware of the limits of the technique and the materials used. Traditional materials such as aluminium, carbon, etc. still have their place.
From the point of view of the system, the ultimate goal here is to have a button on the web page like this:
And that’s it, no more interaction needed, go straight from design to flying in the space of a couple of days.
Unfortunately, that vision is not 100% possible (yet). The main problem is that this requires a fully parametric CAD model. This is something we are actively working on (maybe I will give a sneak peek in a future blog post) but unfortunately not everything can be parametrized. Well, you can try, but either the tool breaks or you break from all the complexity. The thing is that designing a CAD model that can be manufactured requires a very large amount of detail, for example: servo mounts and screw holes. And this detailing will always require manual intervention.
However, that being said, you are still able to get a much shorter turnaround time than you would traditionally, not to mention the savings in cost.
The obvious question now is: I want to try this out, where’s the GitHub repo? Well, the current answer is that you’d have to be on the campus network to be able to use the system, and even then, you would have to ask me nicely. As I said at the beginning of this post, the system has just reached beta state. Everything seems to work and we are starting to get useful and sensible results. My main concern now is doing a proper deployment. Currently I have most components running on my own machine, with some of the more expensive ones automatically being farmed out over our 8000+ core Linux cluster (using ssh & Torque).
With cloud computing being all cool these days I now want to set up an automated VM generation workflow using Vagrant and Puppet. The goal is that I simply run a script and it generates a nice clean VM containing all the components, which I can then easily deploy to one or more of our own servers or a public cloud. Using something like supervisor or, more likely, Salt, I can then remotely administer these instances, starting and stopping components when needed.
Such a deployment process will come in useful when more people, including students, start using the system and break it.
The plan is also to gradually open the system to the public, i.e., you, but that is still a couple of months away probably. As for the system code itself, I hope that it may some day be open sourced and made publicly available. But that will depend on how things evolve.
The main open issue, especially when we come to public use, is related to security. What I really need is an end-to-end single sign-on system, across all components and all clients. This is something I really need to think about. Suggestions welcome.
Other open issues:
- Think about handling result files. A CFD run easily generates a few GB of data; serving this through Flask for multiple users does not seem like a good idea.
- Do I want to make the controller more intelligent (e.g., to handle sweeps or optimizations) or do I keep that intelligence in the client? Both have pros and cons.
- Add support for sessions so you can have multiple clients active at the same time for the same user.
- Think about how to deal with having different versions of components running at the same time. How and when to (dis)allow this.
- Evolve the web client into a more powerful design exploration tool. Initially we were going to use a third party tool for this but the company went bust.
- Add support for automatically integrating flight test data recorded by the autopilot as a special type of “analysis code”
- Possibly get involved in the parametric CAD side of things.
The longer term vision is to explore different MDO paradigms, and extend this to more types of missions and configurations, possibly including rotary aircraft such as quadrocopters. However, these kinds of things will require a brand new concept design tool and that is still a long way away. In February we will know if we get EU funding to be involved in the 2 Seas project. If so, that will drive my activities as well.
Phew… while I haven’t been able to cover everything, this has become quite a lengthy post but should give a good overview of what has been going on. Please follow me on Twitter if you would like to keep up to date.
I would be very interested in any suggestions, comments, constructive criticisms, etc. So don’t hold back.