Code Obfuscation for AI Self-Driving Cars

By Lance Eliot, the AI Trends Insider

Earlier in my career, I was hired to reverse engineer a million lines of code for a system whose original developer had long since disappeared. He had left behind no documentation. The firm had at least gotten him to provide a copy of the source code. Nobody at the firm knew anything about how the code itself worked. The firm was dependent upon the compiled code executing right and they simply hoped and prayed that they would not need to make any changes to the system.

Not a very good spot to be in.

I was told that the project was a hush-hush one and that I should not tell anyone else what I was doing. They would only let me see the source code while physically at their office, and otherwise I wasn’t to make a copy of it or take it off the premises. They even gave me a private room to work in, rather than sitting in a cubicle or other area where fellow staffers were. I became my own miniature skunk works, of sorts.

There was a mixture of excitement and trepidation for me about this project. I had done other reverse engineering efforts before and knew how tough it could be to figure out someone else’s code. Any morsels of “documentation” were always welcomed, even if the former developer(s) had only written things onto napkins or the back of recycled sheets of paper. Also, I usually had someone that kind of knew something about the structure of the code or at least had heard rumors by water cooler chats with the tech team. In this case, the only thing I had available were the end-users that used the system. I was able to converse with them and find out what the system was supposed to do, how they interacted with it, the outputs it produced, etc.

For a million lines of code, and with supposedly just one developer, he presumably was churning out a lot of lines of code for being just one person. I was told that he was a “coding genius” and that he was always able to “magically” make the system do whatever they needed. He was a great resource, they said. He was willing to make changes on the fly. He would come in during weekends to make changes. They felt like they had been given the “hacker from heaven” (with the word hacker in this case meaning a proficient programmer, and not the nowadays more common use as a criminal or cyber hacker).

I gently pointed out that if he was such a great developer, dare I say software engineer, how come he hadn’t documented his work? How come no one else was ever able to lay eyes on his work? How come he was the only one that knew what it did? I pointed out that they had painted themselves into a corner. If this heavenly hacker got hit by a bus (and floated upstairs, if you know what I mean), what then?

Well, they sheepishly admitted that I must be some kind of mind reader because he had one day just gotten up and left the company. There were stories that his girlfriend had gotten kidnapped in some foreign country and that he had arranged for mercenaries to rescue her, and that he personally was going there to be part of the rescue team. My mouth gaped open at this story. Sure, I suppose it could be true. I kind of doubted it. Seemed bogus.

The whole thing smelled like the classic case of someone that was protective of their work, and also maybe wanted a bit of job security. It’s pretty common that some developers will purposely aim to not document their code and make it as obscure as they can, in hopes of staving off losing their job. The idea is that if you are the only one that knows the secret sauce, the firm won’t dare get rid of you. You will have them trapped. Many companies have gotten themselves into that same predicament. And, though it seems like an obvious ploy to you and me, these firms often are clueless about what is taking place and fall into the trap without any awareness. When the person suddenly departs, the firm wakes up “shockingly” to what they’ve allowed to happen.

Some developers that get themselves into this posture will also at times try to push their luck. They demand that the firm pay them more money. They demand that the firm let them have some special perks. They keep upping the ante figuring that they’ll see how far they can push their leverage. This will at times trigger a firm to realize that things aren’t so kosher. At that point, they often aren’t sure of what to do. I’ve been hired as a “code mercenary” to parachute into such situations and try to help bail out the firm. As you might guess, the original developer, if still around, becomes nearly impossible to deal with and will refuse to lift a finger to help share or explain the secret sauce.

When I’ve discussed these situations with the programmer that had led things in that direction, they usually justified it. They would tell me that the firm at first paid them less than what a McDonald’s hamburger slinger would get. They got no respect for having finely honed programming skills. If the firm was stupid enough to then allow things to get into a posture whereby the programmer now had the upper hand, it seems like fair play. The company was willing to “cheat” him, so why shouldn’t he do likewise back to the company. The world’s a tough place and we each need to make our own choices, is what I was usually told.

Besides, it often played out over months and sometimes years, and the firm could have at any time opted to do something to prevent the continuing and deepening dependency. One such programmer told me that he had “saved” the company a lot of money. The doing of documentation would have required more hours and more billable time. The act of showing the code to others and teaching them about how it worked, once again more billable time. Furthermore, just like the case that I began to describe herein, he had worked evenings and weekends, being at the beck and call of the firm. They had gotten a great deal and had no right to complain.

Anyway, I’ll put to the side for the moment the ethics involved in all of this.

For those of you interested in the ethical aspects of programmers, please see my article: https://aitrends.com/selfdrivingcars/algorithmic-transparency-self-driving-cars-call-action/

When I took a look at the code of the “man that went to save his girlfriend in a strange land,” here’s what I found:   Ludwig Van Beethoven, Wolfgang Amadeus Mozart, Johann Sebastian Bach, Richard Wagner, Joseph Haydn, Johannes Brahms, Franz Schubert, Peter Ilyich Tchaikovsky, etc.

Huh?

Allow me to elaborate. The entire source code consisted of variables with names of famous musical composers, and likewise all of the structure and objects and subroutines were named after such composers or were based on titles of their songs. Instead of seeing something like LoopCounter = LoopCounter + 1, it would say Mozart = Mozart + 1. Imagine a financial banking application that instead of referring to Account Name, Account Balance, Account Type, it instead said Bach, Wagner, and Brahms, respectively.

So, when trying to figure out the code, you’d need to tease out of the code that whenever you see the use of “Bach” it really means the Account Name field. When you see the use of Wagner it really means the Account Balance. And so on.
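To make this concrete, here is a minimal sketch in Python. It is purely illustrative: the function names, the fields, and the fee rule are hypothetical and are not taken from the system described above. The two functions do exactly the same thing, and only the naming differs.

```python
# Hypothetical illustration: the same logic written twice.
# Neither version comes from the original system; the fee rule is invented.

def haydn(bach, wagner, brahms):
    # Obfuscated: composer names carry no hint of what the values mean.
    schubert = 0
    if brahms == "checking" and wagner < 1000:
        schubert = 5
    mozart = wagner - schubert
    return bach, mozart


def apply_monthly_fee(account_name, account_balance, account_type):
    # Readable: identical logic, but the names explain the intent.
    monthly_fee = 0
    if account_type == "checking" and account_balance < 1000:
        monthly_fee = 5
    new_balance = account_balance - monthly_fee
    return account_name, new_balance


print(haydn("Pat", 800, "checking"))
print(apply_monthly_fee("Pat", 800, "checking"))
```

Both calls print the same pair. A reader of the first version has to work out, from usage alone, that wagner is a balance and schubert is a fee.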

I was kind of curious about this seeming fascination with musical composers. When I asked if the developer was known for perhaps having a passion for classical music, I was told that maybe so, but not that anyone noticed.

I’d guess that it wasn’t so much his personal tastes in composers, and instead it was more likely his interest in code obfuscation.

You might not be aware that some programmers will purposely write their code in a manner to obfuscate it. They will do exactly what this developer had done. Instead of using naming that would be logically befitting the circumstance, they make up other names. The idea is that this makes it much harder for anyone else to figure out the code. This ties back to my earlier point about the potential desire to become the only person that can do the maintenance and upkeep on the code. By making things as obfuscated as you can, you cause anyone else to either be baffled or have to climb a steep learning curve to divine your secret sauce code.

If the person’s hand was forced by the company insisting that they share the code with Joe or Samantha, the programmer could say, sure, I’ll do so, and then hand them something that seems like utter mush. Here you go, have fun, the developer would say. If Joe and Samantha had not seen this kind of trickery before, they would likely roll their eyes and report back to management that it was going to be a long time to ferret out how the thing works.

At one software company where this very thing happened, I was the one who told the CEO that the programmer had obfuscated the code, and the CEO nearly blew his top. We’ll sue him for every dime we ever paid him, the CEO exclaimed. We’ll hang him out to dry and tell any future prospective employer that he’s poison and don’t ever hire him. And so on. Of course, trying to go after the programmer for this is going to be somewhat problematic. Did the code work? Yes. Did it do what the firm wanted? Yes. Did the firm ever say anything about the code having to be more transparently written? No.

Motivations for Code Obfuscation Vary

I realize that some of you have dealt with code that appears to be the product of obfuscation, and yet you might say that it wasn’t done intentionally. Yes, I agree that sometimes the code obfuscation can occur by happenstance. A programmer that doesn’t consider the ramifications of their coding practices might indeed write such code. They maybe didn’t intend to write something obfuscated, it just turned out that way. Suppose this programmer loved the classics and the composers, and when he started the coding he opted to use their names. That was well and good for say the first thousand lines of code.

He then kept building upon the initial base of code. Might as well continue the theme of using composer names. After a while, the whole darned thing is shaped in that way. It can happen, bit by bit. At each point in time, you think it doesn’t make sense to redo what you’ve already done, and so you just keep going. It might be like constructing a building that you first laid down some wood beams for, and even if maybe you should be using steel instead because that building is actually ultimately going to be a skyscraper, you started with wood, you kept adding into it with wood, and so wood it is.

For those of you that have pride as a software engineer, these stories often make you ill to your stomach. It’s those seat-of-the-pants programmers that give software development and software developers a bad name. Code obfuscation for a true software engineer is the antithesis of what they try to achieve. It’s like seeing a bridge with rivets and struts made of paper and you know the whole thing was done in a jury rigged manner. That’s not how you believe good and proper software is written.

In any case, I think we can say this: code obfuscation can happen for a number of reasons, including possibly:

  •         Unintentionally and without awareness of it as a concern
  •         Unintentionally, by falling into it one step at a time
  •         Intentionally and with some loathsome intent to obfuscate
  •         Intentionally but with an innocent or good meaning intent

So far, the intent to obfuscate has been suggested as something being done for job security or other personal reasons that have seemed somewhat untoward. There’s another reason to want to obfuscate the code, namely for code security or privacy, and rightfully so.

Suppose you are worried that someone else might find the code. This someone is not supposed to have it. You want the code to remain relatively private and you are hopeful of securing it so that no one else can rip it off or otherwise see what’s in it. This could be rightfully the case, since you’ve written the code and the Intellectual Property (IP) rights to it belong to you. Companies often invest millions of dollars into developing proprietary code and they obviously would like to prevent others from readily taking it or stealing it.

You might opt to encrypt the file that contains the source code. Thus, if someone gets the file, they need to find a means to decrypt it to see the contents. You can use some really strong form of encryption and hopefully the person wanting to inappropriately decrypt the file will have a hard time doing so and might be unable to do so or give up trying.

Using encryption is pretty much an on-or-off kind of thing. In the encrypted state, no sense can be made of the contents, presumably. Suppose though that you realize that one way or another, someone has a chance of actually getting to the source code and being able to read what it says. Either they decrypt the file, or they happen to come along when it is otherwise in a decrypted state and grab a copy of it. Maybe they wander over to the programmer’s desktop, put in a USB stick, and quickly get a copy while it is in plaintext format.

So, another layer of protection would be to obfuscate the code. You render the code less understandable. This can be done by changing the names in the code so that they no longer convey what the code means. The example of the musical composer names showcases how you might do this obfuscation. The musical composer names are written in English and readily read. But, from a logical perspective, in the context of this code, they wouldn’t have any meaning to someone else. The programmer(s) working on the code might have agreed that they all accept the idea that Bach means Account Name and Wagner means Account Balance.

Anyone else that somehow gets their hands on the code will be perplexed. What does Bach mean here? What does Wagner refer to? It puts those interlopers at a disadvantage. Rather than just picking up the code and immediately comprehending it, now they need to carefully study it and try to “reverse engineer” what it seems to be doing and how it is working.

This might require a laborious line-by-line inspection. It might take lots of time to figure out. Maybe it is so well obfuscated that there’s no reasonable way to figure it out at all.

The code obfuscation can also act like a watermark. Suppose that someone else grabs your code, and they opt to reuse it in their own system. They go around telling everyone that it is their own code, written from scratch, and no one else’s. Meanwhile, you come along and are able to take a look at their code. Imagine that you look at their code and observe that the code has musical composer names for all of the key objects in the code. Coincidence? Maybe, maybe not. It could be a means to try and argue that the code was ripped off from your code.
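As a rough sketch of how such a watermark check might look, the Python below collects the identifiers in a piece of suspect source and reports which ones match a known vocabulary. The vocabulary, the helper names, and the suspect snippet are all hypothetical, and a real dispute would need far more than a name overlap.

```python
import ast

# Hypothetical watermark vocabulary: names used throughout "our" code base.
WATERMARK_NAMES = {"bach", "wagner", "brahms", "mozart", "haydn", "schubert"}


def identifiers_in(source):
    """Collect variable, argument, function, and class names in Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id.lower())
        elif isinstance(node, ast.arg):
            names.add(node.arg.lower())
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name.lower())
    return names


def watermark_overlap(source):
    """Return the watermark names that show up in the suspect source."""
    return identifiers_in(source) & WATERMARK_NAMES


suspect = """
def haydn(bach, wagner):
    mozart = wagner - 5
    return bach, mozart
"""
print(sorted(watermark_overlap(suspect)))
```

A large overlap on unusual names is suggestive rather than conclusive, which matches the "maybe, maybe not" nature of the argument.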

There are ways to programmatically make code obfuscated. Thus, you don’t necessarily need to do so by hand. You can use a tool to do the code obfuscation. Likewise, there are tools to help you crack a code obfuscation. Thus, you don’t necessarily need to do so entirely by hand.

In the case of the musical composer names, I might simply substitute the word “Bach” with the words “Account Name” and so on, which might make the code more comprehensible. The reality is that it isn’t quite that easy, and there are lots of clever ways to make the code so obfuscated that it is very hard to render it fully un-obfuscated. There is still often a lot of by-hand effort required.
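A tool-assisted substitution of this kind can be sketched in a few lines of Python using the standard ast module. The mapping and the sample function are hypothetical, and the sketch assumes the mapping has already been worked out by studying the code, which is the hard part.

```python
import ast

# Hypothetical mapping recovered by studying the code; not from any real system.
RENAMES = {
    "Bach": "account_name",
    "Wagner": "account_balance",
    "Brahms": "account_type",
}


class Renamer(ast.NodeTransformer):
    """Rename variables and arguments per a mapping, leaving the logic untouched."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        self.generic_visit(node)  # also rename names inside annotations
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename_identifiers(source, mapping):
    tree = Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or later


obfuscated = """
def describe(Bach, Wagner, Brahms):
    return f"{Bach} ({Brahms}): {Wagner}"
"""
print(rename_identifiers(obfuscated, RENAMES))
```

Running the same function with the mapping inverted would obfuscate instead of clarify. The sketch only touches variable and argument names. Function names, attribute names, strings, and comments are left alone, which hints at why by-hand effort remains.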

In this sense, the use of code obfuscation can be by purposeful design. You are trying to achieve the so-called “security by obscurity” kind of trickery. If you can make something obscure, it tends to make it harder to figure out and break into. At my house, I might put a key outside in my backyard so that I can get in whenever I want, but of course a burglar can now do the same. I might put the key under the doormat, but that’s pretty minimal obscurity. If I instead put the key inside a fake rock and I put it amongst a whole dirt area of rocks, the obfuscation is a lot stronger.

One thing about the source code obfuscation that needs to be kept in mind is that you don’t want to alter the code such that it computationally does something different than what it otherwise was going to do. That’s not usually considered in the realm of obfuscation. In other words, you can change the appearance of the code, you can possibly change around the code so that it doesn’t seem as recognizable, but if you’ve now made it that the code can no longer calculate the person’s banking balance, or if you’ve changed it such that the banking balance now gets calculated in a different way, you aren’t doing just code obfuscation.
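One hedged way to guard against accidentally changing behavior is to run the original and the obfuscated versions side by side on the same inputs. The sketch below does that in Python. The two balance functions and the test cases are hypothetical, and passing a finite set of cases is evidence of equivalence, not proof.

```python
def original_balance(balance, deposits, withdrawals):
    # Readable version: starting balance plus deposits minus withdrawals.
    return balance + sum(deposits) - sum(withdrawals)


def obfuscated_balance(wagner, chopin, liszt):
    # Same computation, renamed and rearranged into explicit loops.
    verdi = wagner
    for puccini in chopin:
        verdi += puccini
    for puccini in liszt:
        verdi -= puccini
    return verdi


def same_behavior(f, g, cases):
    """True if f and g return equal results on every test case."""
    return all(f(*case) == g(*case) for case in cases)


# Hypothetical test cases, using whole numbers so comparisons are exact.
CASES = [
    (100, [], []),
    (100, [50, 25], [10]),
    (0, [1, 2, 3], [6]),
    (-20, [20], [0]),
]
print(same_behavior(original_balance, obfuscated_balance, CASES))
```

If the check ever reports a mismatch, the transformation has gone beyond obfuscation and has altered what the code computes.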

In quick recap, here are some aspects of code obfuscation:

  •         You are changing the naming and the look, but not the computational effect
  •         Code obfuscation can be done by-hand and/or by the use of tools
  •         Trying to reverse engineer the obfuscation can be done by-hand and/or by the use of tools
  •         There is weak obfuscation that doesn’t do an extensive code obfuscation
  •         There is strong obfuscation that makes the code obfuscation deep and arcane to unwind
  •         Code obfuscation can serve an additional purpose of trying to act like a watermark

What does this have to do with AI self-driving cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. And, like many of the auto makers and tech firms, we consider the source code to be proprietary and worthy of protecting.

One means for the auto makers and tech firms to try and achieve some “security via obscurity” is to go ahead and apply code obfuscation to their precious and highly costly source code.

This will help too for circumstances where someone somehow gets a copy of the source code. It could be an insider that opts to leak it to another firm or sell it to a competitor. Or, it could be that a breach took place into the systems holding the source code and a determined attacker managed to grab it. At some later point in time, if the matter gets exposed and there is a legal dispute, it’s possible that the code obfuscation aspects could come into play as a type of watermark of the original code.

For my article about the stealing of secrets and AI self-driving cars, see: https://aitrends.com/selfdrivingcars/stealing-secrets-about-ai-self-driving-cars/

For my article about the egocentric designs of AI self-driving cars, see:  https://aitrends.com/selfdrivingcars/egocentric-design-and-ai-self-driving-cars/

If you are considering using code obfuscation for this kind of purpose, you’ll obviously want to make sure that the rest of the team involved in the code development is on-board with the notion too. Some developers will like the idea, some will not. Some firms will say that when you check-out the code from a versioning system, they will have it automatically undo the code obfuscation, and only when it is resting in the code management system will it be in the code obfuscation form. Anyway, there are lots of issues to be considered before jumping into this.
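The check-in and check-out idea can be sketched as a pair of reversible text transforms. The Python below is a minimal illustration with a hypothetical name mapping and hypothetical hook names; it is not the mechanism of any particular version control system. In Git, for instance, transforms of this kind could presumably be attached through clean and smudge filters, though that wiring is not shown here.

```python
import re

# Hypothetical mapping from working names to stored (obfuscated) names.
TO_STORED = {
    "account_name": "Bach",
    "account_balance": "Wagner",
    "account_type": "Brahms",
}
# Reversal only works if no two working names share a stored name.
assert len(set(TO_STORED.values())) == len(TO_STORED)
TO_WORKING = {stored: working for working, stored in TO_STORED.items()}

WORD = re.compile(r"\b[A-Za-z_]\w*\b")


def _swap_words(text, mapping):
    # Replace whole words only, so "account_name" is never matched inside
    # a longer identifier such as "account_name_history".
    return WORD.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def on_checkin(text):
    """Obfuscate before the file is stored in the repository."""
    clashes = set(WORD.findall(text)) & set(TO_WORKING)
    if clashes:
        # A stored name already present would be wrongly "restored" later.
        raise ValueError(f"names collide with the mapping: {sorted(clashes)}")
    return _swap_words(text, TO_STORED)


def on_checkout(text):
    """Restore readable names when a developer checks the file out."""
    return _swap_words(text, TO_WORKING)


working = (
    "def show(account_name, account_balance):\n"
    "    return account_name, account_balance\n"
)
stored = on_checkin(working)
print(stored)
print(on_checkout(stored) == working)
```

Note that this word-level swap also rewrites matching words inside strings and comments, which is one of the issues a team would need to settle before adopting the approach.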

For my article about AI developers and groupthink, see: https://aitrends.com/selfdrivingcars/groupthink-dilemmas-for-developing-ai-self-driving-cars/

For the dangers of making an AI system into a Frankenstein, see my article: https://aitrends.com/selfdrivingcars/frankenstein-and-ai-self-driving-cars/

Let’s also remember that there are other ways that one can end-up with code obfuscation. For some of the auto makers and tech firms, and with some of the open source code that has been posted for AI self-driving cars, I’ve right away noticed a certain amount of code obfuscation that has crept into the code when I’ve gotten an opportunity to inspect it.

As mentioned earlier, it could be that the natural inclination of the programmers or AI developers involves writing code that has code obfuscation in it. This can be especially true for some of the AI developers that were working in university research labs and now they have taken a job at an auto maker or tech firm that is creating AI software for self-driving cars. In the academic environment, often any kind of code you want to sling is fine, no need to “pretty it up” since it usually is done as a one-off to do an experiment or provide some kind of proof about an algorithm.

Self-Driving Car Software Needs to be Well-Built

The software intended to run a self-driving car ought to be better made than that – lives are at stake.

In some cases, the AI developers are under such immense pressures to churn out code for a self-driving car, due to the auto maker or tech firm having unimaginable or unattainable deadlines, that they write code without stopping to consider whether it is clear-cut or not. As often has been said, there is no style in a knife fight. There can also be AI developers that aren’t given guidance to write clearer code, or not given the time to do so, or not rewarded for doing so, and thus all of those reasons can come into play in code obfuscation too.

See my article about AI developer burnout: https://aitrends.com/selfdrivingcars/developer-burnout-and-ai-self-driving-cars/

See my article about API’s and AI self-driving cars: https://aitrends.com/selfdrivingcars/apis-and-ai-self-driving-cars/

Per my framework about AI self-driving cars, these are the major tasks involved in the AI driving the car:

  •         Sensor data collection and interpretation
  •         Sensor fusion
  •         Virtual world model updating
  •         AI action plan formulation
  •         Car controls command issuance

See my framework at: https://aitrends.com/selfdrivingcars/framework-ai-self-driving-driverless-cars-big-picture/

There is a lot of code involved in each of those tasks. This is a real-time system that must be able to act and react quickly. The code needs to be tightly done so that it can run in optimal time. Meanwhile, the code needs to be understandable since the humans that wrote the code will need to find bugs in it, when they appear (which they will), and the humans need to update the code (such as when new sensors are added), and so on.

Some of the elements are based on “non-code” such as a machine learning model. Let’s agree to carve that out of the code obfuscation topic for the moment, though there are certainly ways to craft a machine learning model that can be more transparent or less transparent. In any case, taking out those pre-canned portions, I assure you that there’s a lot of code still leftover.

See my article about machine learning models and AI self-driving cars: https://aitrends.com/selfdrivingcars/machine-learning-benchmarks-and-ai-self-driving-cars/

The auto makers and tech firms are a mixed bag right now, with some of them developing AI software for self-driving cars that is well written, robust, and ready for being maintained and updated. Others are rushing to write the code, or are unaware of the ramifications of writing obfuscated code, and might not realize the error of their ways until further along in the life cycle of advancing their self-driving cars. There are even some AI developers that are like the music man that wrote his code with musical composers in mind, for which it could be an unintentional act or an intentional act. In any case, it might be “good” for them right now, but later on will most likely turn out to be “bad” for them and others too.

Here, then, are the final rules for today’s discussion on code obfuscation for AI self-driving cars:

  •         If it is happening and you don’t realize it, please wake up and decide explicitly what you should be doing
  •         If you are using it as a rightful technique for security by obscurity, please make sure you do so aptly
  •         If you are using it for nefarious purposes, just be aware that what goes around comes around
  •         If you aren’t using it, decide explicitly whether to consider it or not, making a calculated decision about the value and ROI of using code obfuscation

For those of you reading this article, please be aware that in thirty seconds this text will self-obfuscate into English language obfuscation and the article will no longer appear to be about code obfuscation and instead will be about underwater basket weaving. The secrets of code obfuscation herein will no longer be visible. Voila!

Copyright 2018 Dr. Lance Eliot

This content is originally posted on AI Trends.