Resource Modification On Multicore Server With Kernel Bypass

Dimas Febriyan Priambodo(1*), Ahmad Ashari(2)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


Technology is developing rapidly, with many innovations in both hardware and software. Multicore servers with a growing number of cores require efficient software. The kernel and hardware that handle a server's operational needs have limitations. These limitations stem from the high complexity of server workloads and include a single socket descriptor, a single IRQ, and a lack of pooling, so some modifications are required. Kernel bypass is one method for overcoming the deficiencies of the kernel. The modifications applied to this server combine techniques that increase throughput with techniques that decrease latency. To increase throughput, the driver is modified to hash the RX signal, and the receive path is modified to use multiple IP receivers, multiple thread receivers, and multiple port listeners. To decrease latency, pooling principles are applied at either the kernel level or the program level. This combination of modifications makes the server more reliable, with an average throughput increase of 250.44% and a latency decrease of 65.83%.
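As a rough illustration of the receive-side ideas named in the abstract, the following user-space Python sketch opens several UDP ports, gives each port its own receiver thread, and hands packets to a shared worker pool. It is not the authors' implementation and does not perform kernel bypass. The names `rx_queue_for` and `MultiPortReceiver` are hypothetical, and CRC32 is only a stand-in for whatever hash the modified driver applies to incoming traffic.

```python
import socket
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def rx_queue_for(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Map a flow to a receive queue index (CRC32 is an assumed stand-in hash)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_queues


class MultiPortReceiver:
    """One UDP socket and one receiver thread per port, sharing a worker pool."""

    def __init__(self, n_ports, handler, pool_size=4, host="127.0.0.1"):
        self.handler = handler
        # Pooling at the program level: workers are reused across packets
        # instead of spawning a new thread for every packet.
        self.pool = ThreadPoolExecutor(max_workers=pool_size)
        self.socks = []
        self.threads = []
        self._stop = threading.Event()
        for _ in range(n_ports):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind((host, 0))    # port 0 lets the OS pick a free port
            s.settimeout(0.1)    # wake up regularly to check the stop flag
            self.socks.append(s)

    @property
    def ports(self):
        return [s.getsockname()[1] for s in self.socks]

    def start(self):
        for s in self.socks:
            t = threading.Thread(target=self._recv_loop, args=(s,), daemon=True)
            t.start()
            self.threads.append(t)

    def _recv_loop(self, sock):
        port = sock.getsockname()[1]
        while not self._stop.is_set():
            try:
                data, _addr = sock.recvfrom(2048)
            except socket.timeout:
                continue
            except OSError:
                break
            # Hand the packet to the pool so this thread can keep receiving.
            self.pool.submit(self.handler, port, data)

    def stop(self):
        self._stop.set()
        for t in self.threads:
            t.join()
        self.pool.shutdown(wait=True)
        for s in self.socks:
            s.close()
```

A caller would construct `MultiPortReceiver(n_ports, handler)`, call `start()`, read the chosen port numbers from `ports`, and call `stop()` when finished. Separating the receiver threads from the worker pool keeps each listener free to accept the next packet while earlier packets are still being processed.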


hash rx; multiple ip; multiple port; multiple thread; pooling; kernel bypass




Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal publishing research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
