Trusted Programming
Table of contents
  1. Trusted Programming – Our Rust Mission at Huawei
    1. Innovations by Rust
    2. Initial adoption of Rust at Huawei
    3. Contributions to Rust community from Huawei
    4. Adapting end-to-end Rust tooling for Huawei
      1. tokei
      2. cargo-geiger
    5. Research on Rust through Deep Code Learning
    6. Conclusion
  2. Updates


Trusted Programming – Our Rust Mission at Huawei §

Yijun Yu

Chief Expert on Trusted Programming
Trustworthy Open-Source Software Engineering Lab &
Ireland Research Centre
Huawei Technology, Inc.

Amanieu d’Antras

Principal Rust Expert
Trustworthy Open-Source Software Engineering Lab &
Ireland Research Centre
Huawei Technology, Inc.

Nghi D. Q. Bui

Research Scientist
Trustworthy Open-Source Software Engineering Lab &
Ireland Research Centre
Huawei Technology, Inc.

Innovations by Rust §

Since 2016, Rust has consistently been voted the most loved programming language in the Stack Overflow Developer Survey.

There has also been an increasing number of publications on Rust at the recent top programming languages and software engineering conferences.

If that’s not enough, a 2020 Nature article, ‘Why scientists are turning to Rust’, reports increasing momentum in the adoption of Rust among scientists.

Initial adoption of Rust at Huawei §

At Huawei, we aim to engineer trustworthy software systems for the telecom industry.

For example, we are working to migrate parts of our codebase to Rust, which is safer than C/C++ and equally performant. To assist our developers in this process, we use the open-source C2Rust transpiler to generate Rust code directly from C. We have also created automated tools that refactor and clean up the generated Rust code through source-to-source transformations.

We are also developing a rich set of internal libraries in Rust built around an actor-based concurrency paradigm. This simplifies asynchronous programming by building on Rust language features such as async and await.
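To illustrate the actor idea, here is a minimal sketch that is not taken from our internal libraries. It assumes a hypothetical counter actor, and it deliberately swaps async/await for a plain thread and a standard-library channel so that it runs without any external runtime. The names Msg, spawn_counter, and get_total are invented for this example.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Messages understood by the hypothetical counter actor.
enum Msg {
    // Add a value to the running total.
    Add(u64),
    // Ask for the current total; the reply is sent on the enclosed channel.
    Get(Sender<u64>),
}

// Spawn the actor and return its mailbox, the only way to reach its state.
fn spawn_counter() -> Sender<Msg> {
    let (tx, rx) = channel::<Msg>();
    thread::spawn(move || {
        // The state is owned by the actor alone, so no locks are needed.
        let mut total = 0u64;
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

// Helper: query the actor for its current total.
fn get_total(actor: &Sender<Msg>) -> u64 {
    let (reply_tx, reply_rx) = channel();
    actor.send(Msg::Get(reply_tx)).expect("actor stopped");
    reply_rx.recv().expect("actor stopped")
}

fn main() {
    let actor = spawn_counter();
    actor.send(Msg::Add(2)).unwrap();
    actor.send(Msg::Add(40)).unwrap();
    println!("total = {}", get_total(&actor));
}
```

Because messages from a single sender arrive in order, the query sees both additions. An async version would keep the same shape, with the thread replaced by a task and the blocking channel by an asynchronous one.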

All these factors have led to increased adoption of Rust within Huawei and a smooth migration from C/C++ programs, which are dominant in the telecom industry. As the leading company in this industry and a founding member of the Rust Foundation, Huawei is committed to the success of Rust and will continue contributing back to the Rust community.

Contributions to Rust community from Huawei §

We also contribute significant features back to the Rust community. For example, our recent contributions to the Rust compiler enable the compilation of Rust programs for big-endian and ILP32 variants of AArch64. These changes allow Huawei and other hardware companies to run Rust code on networking hardware, which commonly uses these architecture variants. The work was carried out with the help of our Rust expert Amanieu d’Antras, who pushed the corresponding pull requests through the LLVM compiler, the libc crate, and the Rust compiler itself. These changes introduce new end-to-end cross-compilation targets for the Rust compiler, making it easier to build Rust products for bespoke hardware using a single command:

cargo build --target aarch64_be-unknown-linux-gnu
cargo build --target aarch64-unknown-linux-gnu_ilp32
cargo build --target aarch64_be-unknown-linux-gnu_ilp32
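Code that is meant to run on all of these variants should not depend on the host byte order or on 64-bit pointers. The following is a minimal sketch of that idea, using only the standard library; the length-prefixed frame format and the function names encode_frame and decode_len are assumptions made for illustration, not part of any Huawei protocol.

```rust
use std::mem::size_of;

// Encode a payload with a 4-byte length header in explicit big-endian
// (network) order, so the bytes are the same on little- and big-endian hosts.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

// Read the length header back; returns None if the frame is too short.
fn decode_len(frame: &[u8]) -> Option<u32> {
    let h = frame.get(..4)?;
    Some(u32::from_be_bytes([h[0], h[1], h[2], h[3]]))
}

fn main() {
    let frame = encode_frame(b"rust");
    println!("frame bytes: {:?}", frame);
    println!("length header: {:?}", decode_len(&frame));
    // These two values depend on the compilation target: the aarch64_be
    // targets are big-endian, and the ILP32 targets use 32-bit pointers.
    println!("big-endian target: {}", cfg!(target_endian = "big"));
    println!("pointer width: {} bits", size_of::<usize>() * 8);
}
```

The encoded frame is identical on every target, while the last two lines report the properties that differ between the AArch64 variants.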

With respect to community engagement, Huawei has been leading the effort in China and was a strategic sponsor of the first Rust China Conf, held on December 26-27 in Shenzhen. We have started to lead the community through several activities, including creating Rust tutorials and Rust coding conventions in Chinese for the vast number of developers who are interested in Rust.

Adapting end-to-end Rust tooling for Huawei §

There are many end-to-end tools out there in the Rust community and we have started to benefit from the interactions with developers of these tools.

Here are just a few examples.

tokei §

Because trustworthy programming typically involves migrating between programming languages, we have adopted tokei as our code metrics tool; it can recognize as many as 200 languages. For example, the following statistics show how many lines of code have been written in each programming language in Google’s Fuchsia project:

It is relatively easy to plot the proportion of C, C++, and Rust code over the evolution of Fuchsia, as follows:

To accommodate the need to process multiple programming languages in our projects, we have made a pull request to tokei to support batch processing of recognized languages.
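To give a sense of what such a line-counting tool measures, here is a deliberately simplified sketch. It is not tokei's implementation: it only recognizes a single line-comment prefix, ignores block comments and string literals, and the names Stats and count_lines are invented for this example.

```rust
// Line counts for one source file, in the spirit of a code-metrics report.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    code: usize,
    comments: usize,
    blanks: usize,
}

// Classify every line as blank, comment, or code.
// `line_comment` is the language's line-comment prefix, e.g. "//" or "#".
fn count_lines(source: &str, line_comment: &str) -> Stats {
    let mut stats = Stats::default();
    for line in source.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            stats.blanks += 1;
        } else if trimmed.starts_with(line_comment) {
            stats.comments += 1;
        } else {
            stats.code += 1;
        }
    }
    stats
}

fn main() {
    let rust_src = "// entry point\nfn main() {\n\n    println!(\"hi\");\n}\n";
    let python_src = "# entry point\nprint('hi')\n";
    // Process several languages in one pass, each with its own comment prefix.
    for (lang, src, prefix) in [("Rust", rust_src, "//"), ("Python", python_src, "#")] {
        println!("{}: {:?}", lang, count_lines(src, prefix));
    }
}
```

Summing such per-language counts over the history of a repository is what makes a migration from C/C++ to Rust measurable.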

cargo-geiger §

To improve safety, we would like to know how much code has been checked by the Rust compiler. Fortunately, cargo-geiger comes close to this by counting unsafe items such as fn, expr, struct, impl, and trait, and their occurrences in the various dependent crates:

However, these statistics do not reflect the ratio of safe items, and hence do not show how much has been achieved overall for a Rust project. Therefore, we made a pull request to cargo-geiger to report the checked safe ratios of Rust projects. Since it was accepted, the tool has been used by our product teams on a daily basis. A report looks like the following, which makes it easier to tell which crates have not been fully checked by the Rust compiler:
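The idea of a safe ratio can be sketched as follows. This is only an illustration under our own assumptions: the CrateCounts structure, the crate names, and the numbers are hypothetical, and the ratio is computed simply as safe items over total items, which may differ from the exact formula cargo-geiger reports.

```rust
// Hypothetical per-crate item counts; not cargo-geiger's real data model.
struct CrateCounts {
    name: &'static str,
    total_items: u32,
    unsafe_items: u32,
}

// Fraction of items that are safe, i.e. fully checked by the compiler.
// A crate with no items at all is treated as fully safe.
fn safe_ratio(c: &CrateCounts) -> f64 {
    if c.total_items == 0 {
        return 1.0;
    }
    let safe = c.total_items.saturating_sub(c.unsafe_items);
    f64::from(safe) / f64::from(c.total_items)
}

fn main() {
    // Made-up numbers, for illustration only.
    let report = [
        CrateCounts { name: "app", total_items: 200, unsafe_items: 0 },
        CrateCounts { name: "ffi-bindings", total_items: 80, unsafe_items: 20 },
    ];
    for c in &report {
        println!("{:<14} {:>6.1}% safe", c.name, 100.0 * safe_ratio(c));
    }
}
```

A crate whose ratio is below 100% is one that still contains code the compiler cannot fully check.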

Research on Rust through Deep Code Learning §

As codebases from the Rust open-source community evolve and grow, new developers need to learn the best practices, including but not limited to the language itself. Software engineering research communities have studied statistical machine learning methods that learn from large amounts of source code, also known as Big Code. Image processing and natural language processing rely on deep neural networks (DNN) to extract a vast number of features; in the same way, big code can be used to train a DNN to capture the statistical patterns of programs, which is called ‘Deep Code Learning’.

In this respect, Huawei is pushing the limits by improving the state of the art of ‘cross-language’ deep code learning, through a technical collaboration with The Open University, UK and Singapore Management University.

For example, early deep code learning methods were trained and evaluated on a benchmark of 52,000 C/C++ programs in 104 algorithm classes, collected from the programming courses of Peking University. Tree-based convolutional neural networks (TBCNN) achieved 94% accuracy in algorithm classification on this dataset (AAAI’16). A later advance in the state of the art (SOTA), using abstract syntax trees at the statement level (ICSE’19), achieved 98% accuracy. Our recent work pushes the SOTA even higher, to 98.4% accuracy (AAAI’21), through an innovation based on Tree-based Capsule Networks.

Earlier, we used cross-language datasets to show that a model learned for one programming language applies to another. For example, using the Rosetta Code datasets from GitHub, we showed that it is possible to obtain 86% accuracy for algorithm classification (Java to C) (SANER’19), and to address cross-language API mapping problems (Java to C#) (ESEC/FSE’19). These statistical language models have found multiple applications in software engineering, including code classification, code search, code recommendation, code summarization, method name prediction, and code clone detection (ICSE’21). Such models can also transfer knowledge across many tasks, which reduces the effort of retraining a model for each task separately.

To analyze Rust projects, we have made further pull requests to the Rust parser project tree-sitter and the XML serialization crate quick-xml, which allow us to feed the abstract syntax trees of Rust programs into the training of a deep code learning model. The preliminary results are quite promising: algorithm detection in Rust can reach an accuracy as high as 85.5%. This number is still climbing as we continue to improve the toolchains.
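The step of turning a syntax tree into XML can be sketched as follows. This is a standard-library-only illustration, not the tree-sitter or quick-xml API: the Node type and the to_xml function are hypothetical, the node kinds are only loosely modelled on a Rust grammar, and source text, attributes, and escaping are left out.

```rust
// A toy syntax-tree node: a kind plus child nodes.
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

// Build a node without children.
fn leaf(kind: &'static str) -> Node {
    Node { kind, children: Vec::new() }
}

// Serialize the tree depth-first, one XML element per node.
fn to_xml(node: &Node, out: &mut String) {
    if node.children.is_empty() {
        out.push_str(&format!("<{}/>", node.kind));
    } else {
        out.push_str(&format!("<{}>", node.kind));
        for child in &node.children {
            to_xml(child, out);
        }
        out.push_str(&format!("</{}>", node.kind));
    }
}

fn main() {
    // Roughly the shape of `fn f() {}`.
    let tree = Node {
        kind: "function_item",
        children: vec![leaf("identifier"), leaf("parameters"), leaf("block")],
    };
    let mut xml = String::new();
    to_xml(&tree, &mut xml);
    println!("{}", xml);
}
```

A serialized tree of this kind keeps the nesting structure of the program, which is the information a tree-based model consumes.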

A prototype of such an IDE has been built as an extension to Visual Studio Code, where programmers are assisted with the recommendation of a suitable algorithm and an explanation of the choice.

Conclusion §

In summary, the Huawei Trustworthy Open-Source Software Engineering Lab is working hard to provide programmers with an end-to-end IDE toolchain that intelligently assists in maximizing safety and performance.

The journey towards the vision of Trusted Programming has just begun, and we hope to work collaboratively with the Rust community and the upcoming Rust Foundation to lead a smooth revolution in the telecom software industry.

Updates §