Wechat ID: NeuroMem
Focus: model compression, low-bit quantization, mobile inference acceleration and optimization, and deployment.
A curated list of awesome A.I. & Embedded/Mobile-devices resources, tools and more.
Looking for contributors. Submit a pull request if you have something to add :)
Please check the contribution guidelines for info on formatting and writing pull requests.
- Device Benchmark
- Papers
- App-Experience
- Demo-Codes
- Frameworks
- Course/Guide/Tutorial
- Hardware
- Company
- News
- Qualcomm Snapdragon processor rankings: powerful performance at a glance | Qualcomm
- Mobile CPU performance ladder chart: a comparison of mobile CPU performance | mydriver
- Qualcomm Adreno GPU performance comparison
- [1512.03385] Deep Residual Learning for Image Recognition
- [1610.02357] Xception: Deep Learning with Depthwise Separable Convolutions
- [1611.05431] ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- [1703.09039] Efficient Processing of Deep Neural Networks: A Tutorial and Survey
- [1707.01209] Model compression as constrained optimization, with application to neural nets. Part I: general framework
- [1707.04319] Model compression as constrained optimization, with application to neural nets. Part II: quantization
- [1707.09926] A Framework for Super-Resolution of Scalable Video via Sparse Reconstruction of Residual Frames
- [1608.01409] Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- [SenSys'16] Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables
- [IoT-App'15] An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices
- [1704.06904] Residual Attention Network for Image Classification [code]
- BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks
- [CVPR'17] Squeeze-and-Excitation Networks (ILSVRC 2017 winner)
- [1707.06342] ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- [1707.01083] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- [1704.04861] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- [1707.06990] Memory-Efficient Implementation of DenseNets
- [1706.03912] SEP-Nets: Small and Effective Pattern Networks
- [CVPR'17] Local Binary Convolutional Neural Networks [code]
- [1707.04693] Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- [1602.02830] Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- [1603.05279] XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- [1606.06160] DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- [CVPR'17] Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- [ICLR'17] Pruning Filters for Efficient ConvNets
- [ICLR'17] Pruning Convolutional Neural Networks for Resource Efficient Inference
- [ICLR'17] Soft Weight-Sharing for Neural Network Compression
- [ICLR'16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- [NIPS'16] Dynamic Network Surgery for Efficient DNNs
- [NIPS'15] Learning both Weights and Connections for Efficient Neural Networks
- [ICML'17] The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- [1412.6115] Compressing Deep Convolutional Networks using Vector Quantization
- [CVPR'16] Quantized Convolutional Neural Networks for Mobile Devices
- [ICASSP'16] Fixed-Point Performance Analysis of Recurrent Neural Networks
- [arXiv'16] Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- [ICLR'17] Loss-aware Binarization of Deep Networks
- [ICLR'17] Towards the Limit of Network Quantization
- [CVPR'17] Deep Learning with Low Precision by Half-wave Gaussian Quantization
- [1706.02393] ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- [CVPR'15] Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- [1511.06067] Convolutional neural networks with low-rank regularization
- [NIPS'14] Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- [ICLR'16] Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- [1503.02531] Distilling the Knowledge in a Neural Network
- Face Model Compression by Distilling Knowledge from Neurons
- [1707.09102] Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- [1703.09746] Coordinating Filters for Faster Deep Neural Networks
- [1606.05316] Learning Infinite-Layer Networks: Without the Kernel Trick
- [ICML'17] Analytical Guarantees on Numerical Precision of Deep Neural Networks
- [1707.09068] Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability
- [1708.00999] Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning
- [ASPLOS'17] Neurosurgeon: Collaborative intelligence between the cloud and mobile edge
- [1705.04630] Forecasting using incomplete models
- [1608.02893] Syntactically Informed Text Compression with Recurrent Neural Networks
- [1608.05148] Full Resolution Image Compression with Recurrent Neural Networks
- [1707.09422] Hyperprofile-based Computation Offloading for Mobile Edge Networks
- [1707.09855] Convolution with Logarithmic Filter Groups for Efficient Shallow CNN
- [1707.09597] ScanNet: A Fast and Dense Scanning Framework for Metastatic Breast Cancer Detection from Whole-Slide Images
- [1604.08772] Towards Conceptual Compression
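Several of the quantization papers above (for example DoReFa-Net) constrain weights to a small set of evenly spaced levels. The following is a minimal sketch, assuming plain Python lists rather than framework tensors, of the DoReFa-Net forward rule for k-bit weights (k ≥ 2). The function names are hypothetical, and the straight-through gradient estimator used during training is not shown.

```python
import math
from typing import List


def quantize_k(r: float, k: int) -> float:
    """Round r in [0, 1] to the nearest of 2**k evenly spaced levels in [0, 1]."""
    n = (1 << k) - 1  # number of intervals between the 2**k levels
    return round(r * n) / n


def dorefa_weights(w: List[float], k: int) -> List[float]:
    """Forward pass of DoReFa-style k-bit weight quantization (k >= 2).

    w_q = 2 * quantize_k(tanh(w) / (2 * max|tanh(w)|) + 1/2) - 1,
    so every output lies on the grid {2i / (2**k - 1) - 1} inside [-1, 1].
    """
    if k < 2:
        raise ValueError("this sketch only covers k >= 2 (1-bit weights use a separate rule)")
    t = [math.tanh(x) for x in w]
    m = max(abs(x) for x in t)
    if m == 0.0:
        # Hypothetical edge-case handling: all-zero weights carry no scale, so keep zeros.
        return [0.0] * len(w)
    return [2.0 * quantize_k(x / (2.0 * m) + 0.5, k) - 1.0 for x in t]


if __name__ == "__main__":
    # With k = 2 the weights snap to the four levels {-1, -1/3, 1/3, 1}.
    print(dorefa_weights([0.5, -0.5, 2.0], k=2))
```

Python's built-in `round` rounds exact ties to the nearest even integer, so values that land exactly halfway between two levels may be assigned differently than in other implementations.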
- [1605.04614] DeepLearningKit - a GPU-Optimized Deep Learning Framework for Apple's iOS, OS X and tvOS developed in Metal and Swift
- [MobiSys '17] DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
- [MobiSys '17] DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware
- [EMDL '17] MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
- [WearSys '16] DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices
- [IPSN '16] DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
- [ISCA '16] EIE: Efficient Inference Engine on Compressed Deep Neural Network
- [MobiSys '16] MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
- [MobiCASE '16] DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
- [MM '16] CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
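Learning both Weights and Connections, Deep Compression and EIE (listed above) all work with a network whose small-magnitude weights have been removed and whose remaining weights are stored sparsely. The following is a minimal sketch of that idea, assuming a single fully connected layer held as nested Python lists. It prunes by a fixed magnitude threshold (a hypothetical value; the papers choose thresholds per layer and retrain afterwards), stores the result in compressed sparse row (CSR) form, and multiplies only the stored nonzeros. EIE itself uses a column-oriented format with shared weights, so this is a simplification, and the function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def prune_by_magnitude(w: Matrix, threshold: float) -> Matrix:
    """Zero out every weight whose absolute value is below `threshold`."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in w]


def to_csr(w: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Compressed sparse row: nonzero values, their column indices, and row start offsets."""
    values: List[float] = []
    cols: List[int] = []
    row_ptr: List[int] = [0]
    for row in w:
        for j, x in enumerate(row):
            if x != 0.0:
                values.append(x)
                cols.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    return values, cols, row_ptr


def csr_matvec(values: List[float], cols: List[int], row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = W x while touching only the stored nonzero weights."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[cols[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    w = [[0.9, 0.01, -0.5], [0.02, -0.03, 0.7]]
    pruned = prune_by_magnitude(w, threshold=0.1)  # hypothetical threshold
    print(csr_matvec(*to_csr(pruned), [1.0, 2.0, 3.0]))
```

The work of the multiply loop scales with the number of surviving weights rather than with the dense layer size, which is the property that the sparse accelerators above exploit.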
- yonghenglh6/DepthwiseConvolution: A personal mobile (depthwise) convolution implementation on Caffe by liuhao (GPU only)
- liuzhuang13/DenseNet: Densely Connected Convolutional Networks, in CVPR 2017 (Best Paper Award)
- kevinzakka/DenseNet: PyTorch Implementation of "Densely Connected Convolutional Networks"
- hollance/MobileNet-CoreML: The MobileNet neural network using Apple's new CoreML framework
- AngusG/tensorflow-xnor-bnn: BinaryNets in TensorFlow with XNOR GEMM op
- jonathanmarek1/binarynet-tensorflow
- farmingyard/caffe-mobilenet: A caffe implementation of mobilenet's depthwise convolution layer
- kedartatwawadi/NN_compression
- chuanqi305/MobileNet-SSD: Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.
- msracver/FCIS: Fully Convolutional Instance-aware Semantic Segmentation
- bearpaw/PyraNet: Code for "Learning Feature Pyramids for Human Pose Estimation" (ICCV 2017)
- aquaviter/iot-demo-mxnet-greengrass
- CongWeilin/mtcnn-caffe: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks
- foreverYoungGitHub/MTCNN: Repository for "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks", implemented with Caffe, C++ interface.
- OAID/mtcnn: C++ project that implements MTCNN, a strong face detection algorithm, on different DL frameworks. The most popular frameworks (Caffe, MXNet and TensorFlow) are all supported now
- Seanlinx/mtcnn: an implementation of MTCNN in MXNet
- LaoDar/cnn_head_pose_estimator: a simple and fast CNN-based head pose estimator in MXNet
- ProjectDent/ARKit-CoreLocation: Combines the high accuracy of AR with the scale of GPS data
- bjarnel/arkit-tictactoe: Tic-Tac-Toe implemented using ARKit + SceneKit
- arirawr/ARKit-FloorIsLava: Basic ARKit example that detects planes and makes them lava.
- exyte/ARTetris: Augmented Reality Tetris made with ARKit and SceneKit
- bjarnel/arkit-portal: Simple portal demo implemented with ARKit+SceneKit, the trick is to change the rendering order and render invisible "masks" to hide what's inside.
- bjarnel/scenekit-tictactoe
- harvardnlp/nmt-android: Neural Machine Translation on Android
- TensorFlow Android Camera Demo
- KleinYuan/Caffe2-iOS: Caffe2 on iOS Real-time Demo. Test with Your Own Model and Photos.
- MXNet Android Classification App - Image classification on Android with MXNet.
- bwasti/AICamera: Demonstration of using Caffe2 inside an Android application.
- mtmd/Mobile_ConvNet: RenderScript based implementation of Convolutional Neural Networks for Android phones
- MXNet iOS Classification App - Image classification on iOS with MXNet.
- Compile MXnet on Xcode (in Chinese) - a step-by-step tutorial of compiling MXnet on Xcode for iOS app
- KimDarren/FaceCropper: Crop faces, inside of your image, with iOS 11 Vision api.
- hollance/TensorFlow-iOS-Example: Source code for my blog post "Getting started with TensorFlow on iOS"
- kingreza/SeeFood: Inspired by HBO's Silicon Valley: SeeFood is an iOS app that uses CoreML to detect various dishes
- Naituw/CoreMLDemo: Demo for CoreML & Vision Framework
- SaschaWillems/Vulkan: Examples and demos for the new Vulkan API
- ARM-software/vulkan-sdk: ARM Vulkan SDK
- alexhultman/libvc: Vulkan Compute for C++ (experimentation project)
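MobileNets and Xception (listed under Papers) and the depthwise convolution layers in the Caffe repositories above replace a standard convolution with a depthwise convolution followed by a 1×1 pointwise convolution. The cost accounting from the MobileNets paper can be checked in a few lines. The layer shape below is illustrative, and the counts are multiply-accumulates for a stride-1 layer with D_F × D_F output maps, M input channels, N output channels and a D_K × D_K kernel.

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-accumulates of a standard DK x DK convolution: DK^2 * M * N * DF^2."""
    return dk * dk * m * n * df * df


def separable_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise (DK^2 * M * DF^2) plus pointwise 1x1 (M * N * DF^2) multiply-accumulates."""
    depthwise = dk * dk * m * df * df  # one DK x DK filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution mixing M channels into N
    return depthwise + pointwise


if __name__ == "__main__":
    dk, m, n, df = 3, 32, 64, 112  # illustrative layer shape
    std = standard_conv_macs(dk, m, n, df)
    sep = separable_conv_macs(dk, m, n, df)
    # The ratio sep / std simplifies to 1/N + 1/DK^2 (about 0.127 for this shape).
    print(std, sep, sep / std)
```

Because the ratio is 1/N + 1/D_K², a 3×3 kernel with many output channels costs roughly 8 to 9 times less than a standard convolution, in line with the saving reported in the MobileNets paper.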
- Deep Learning in a Single File for Smart Devices — mxnet
- ARM-software/ComputeLibrary: The ARM Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies [Intro]
- Apple CoreML
- Microsoft Embedded Learning Library
- mil-tokyo/webdnn: Fastest DNN Execution Framework on Web Browser
- jiaxiang-wu/quantized-cnn: An efficient framework for convolutional neural networks
- Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
- Darknet with NNPACK: NNPACK was used to optimize Darknet without using a GPU. It is useful for embedded devices using ARM CPUs
- naibaf7/libdnn: Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
- blei-lab/edward: A library for probabilistic modeling, inference, and criticism. Deep generative models, variational inference. Runs on TensorFlow
- dmlc/nnvm-fusion: Kernel Fusion and Runtime Compilation Based on NNVM
- hollance/BNNS-vs-MPSCNN: Compares the speed of Apple's two deep learning frameworks: BNNS and Metal Performance Shaders
- DeepMark/deepmark: THE Deep Learning Benchmarks
Model converters: for more converters, please refer to deep-learning-model-convertor.
- MTG/essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
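- Pili: a perfect live-streaming experience (Pili Streaming Cloud)
- pili-engineering/PLDroidMediaStreaming: PLDroidMediaStreaming is the Android publishing client of the Pili live-streaming SDK. It supports RTMP publishing, H.264 and AAC encoding, and both hardware and software encoding. It provides rich data and status callbacks so users can customize it for their own business needs, and it includes key live-streaming features such as beautification, background music and watermarks. PLDroidMediaStreaming is the version currently under active maintenance; it ships with its own capture module and also supports user-implemented capture.
- pili-engineering/PLDroidShortVideo: PLDroidShortVideo is a short-video SDK for the Android platform from Qiniu. It offers beautification, filters, watermarks, pause-and-resume recording, deletion of recorded segments, video editing, audio mixing effects and local/cloud storage, and supports extensive customization and further development.
- pili-engineering/PLDroidPlayer: PLDroidPlayer is the Android player of the Pili live-streaming SDK. It supports all common live-streaming formats, such as RTMP, HLS and FLV, and offers features such as instant first-frame playback, frame catch-up optimization, rich data and status callbacks, and both hardware and software decoding. It can also be extensively customized for your own business needs.
- pili-engineering/PLMediaStreamingKit: PLMediaStreamingKit is the iOS publishing client of the Pili live-streaming SDK. It supports RTMP publishing, H.264 and AAC encoding, and both hardware and software encoding. It provides rich data and status callbacks so users can customize it for their own business needs, and it includes key live-streaming features such as beautification, background music and watermarks.
- pili-engineering/PLShortVideoKit: PLShortVideoKit is a short-video SDK for the iOS platform from Qiniu. It offers beautification, filters, watermarks, pause-and-resume recording, deletion of recorded segments, video editing, audio mixing effects and local/cloud storage, and supports extensive customization and further development.
- pili-engineering/PLPlayerKit: PLPlayerKit is the iOS player of the Pili live-streaming SDK. It supports all common live-streaming formats, such as RTMP, HLS and FLV, and offers features such as instant first-frame playback, frame catch-up optimization, rich data and status callbacks, and both hardware and software decoding. It can also be extensively customized for your own business needs.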
- facebook/fb-caffe-exts: Some handy utility libraries and tools for the Caffe deep learning framework.
- Samsung/iotjs: Platform for Internet of Things with JavaScript code
- hollance/Forge: A neural network toolkit for Metal
- christopher5106/FastAnnotationTool: A tool using OpenCV to annotate images for image classification, optical character reading, etc.
- raphui/rnk: rnk is an RTOS targeting the ARM architecture.
This part contains related courses, guides and tutorials.
- Deep learning systems: UW course schedule (focused on systems design, not learning)
- Squeezing Deep Learning Into Mobile Phones
- Deep Learning – Tutorial and Recent Trends
- Efficient Convolutional Neural Network Inference on Mobile GPUs
- ARM® Mali™ GPU OpenCL Developer Guide (HTML, PDF)
- Optimal Compute on ARM Mali™ GPUs
- GPU Compute for Mobile Devices
- Compute for Mobile Devices: Performance Focused
- Hands On OpenCL
- Adreno OpenCL Programming Guide
- Better OpenCL Performance on Qualcomm Adreno GPU
- Tutorial on Hardware Architectures for Deep Neural Networks | MIT MICRO-50
- Developing a real-time face detection and recognition system based on MTCNN and FaceNet | Zhihu column
- Creating insanely fast image classifiers with MobileNet in TensorFlow | HACKERNOON
- How to squeeze the most from your training data | KDNUGGETS
- Tencent ncnn framework on Ubuntu 16.04: from getting started to applications | CSDN
- Building Cross-Platform CUDA Applications with CMake | NVIDIA
- Caffe2 Bay Area Meetup (5/31/2017) | YouTube
- Bifrost GPU architecture and ARM Mali-G71 GPU
- Midgard GPU Architecture
- ARM Mali-T880 GPU
- Mobile GPU market share
- Lift: A novel approach to achieving performance portability on parallel accelerators. | Where High-Level Programming Meets Performance Portability
- mlmodelzoo.com – deep learning models on mobile
2017-08-07
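- OpenCV 3.3 released
- Having it both ways: DNN joins the OpenCV suite | Zhihu column
- Qualcomm Snapdragon Neural Processing Engine (NPE) | Qualcomm Developer Network
- AI reshuffles the chip industry: Apple, Microsoft and Google squeeze into the race, while Intel, NVIDIA, Qualcomm and AMD see mixed fortunes | AI Era
- Decoding TuSimple: why NVIDIA invested in this self-driving company, and the elder behind its 1.8 billion valuation | QbitAI
- Picked by NVIDIA and supplying Tier 1 vendors, CalmCar (天瞳威视) handles ADAS with AI alone | Chedongxi
- ARM's latest NB-IoT report | 5G
- ARM strikes back! Mobile processors will be transformed by these in a few months! | Zhiqugou
- AI and cloud computing reshuffle the chip industry, leaving Intel the biggest loser | QbitAI
- The Rise of AI Is Forcing Google and Microsoft to Become Chipmakers | WIRED
- What do you think of Tencent's newly released ncnn library? | Zhihu
- Harry Shum announces that Microsoft is developing an AI chip, the HPU, aimed at the weak spots of NVIDIA and other chip giants | AI Era
- Beyond GPUs: FPGAs, ASICs and smarter phones | AI Era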
- "TensorFire - runs neural networks in the browser using WebGL" [Demo: style-transfer]
- Getting Started with Neural Compute Stick and Raspberry Pi 3 | YouTube
2017-07-24