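- Posted: 2019-10-01T23:28:48+09:00
Aiming to Become F-ZERO Champion with Reinforcement Learning (Part 1)
What I have so far (work in progress)
A deep learning model that segments the F-ZERO course is running on a Jetson Nano. It processes 480x288 frames at 30 FPS.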
Realtime "F-ZERO" course segmentation model is running on Jetson Nano. It is part of "gaming AI making challenge" as my personal work. #jetson #nvidia pic.twitter.com/AYCqE75JbG
nobu_e753 (@nobu_e753) September 27, 2019
Here is a captured version (the capture itself adds load, so the frame rate drops).
Realtime "F-ZERO" course segmentation model is running on Jetson Nano (captured movie) #jetson #nvidia pic.twitter.com/gnn79ys9Oa
nobu_e753 (@nobu_e753) October 1, 2019
Introduction
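When I hear "reinforcement learning", the first thing that comes to mind is automated game play. I wanted to try it, but looking through papers and existing implementations, most of them seemed to use games I'm not familiar with. ROM availability is probably a big factor.
So, since I'd be trying it partly as a learning exercise anyway, I figured I'd rather do it with a game I've actually played and like, and started from the question of how to build such a system.
This article is Part 1 and covers everything up to building the system.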
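1. Choosing the game
I've hacked on fighting games before, so I considered one again, but with two characters on screen and a lot of freedom of movement it looked very hard. I didn't want a high difficulty level on my first attempt, so I used these criteria:
- Racing or shooting (I'd heard that games that move in only one direction are easier to analyze)
- Simple backgrounds, with numeric values that are easy to read
- A game I've played myself (most important, to stay motivated!)
That led me to F-ZERO (the original, 1990, Super Famicom). I remember playing it a fair amount: rocket starts, shortcuts off the jump plates, and so on.
There is quite a bit of prior work on reinforcement learning x F-ZERO. A paper with "F-ZERO" in its title (Racing F-Zero with Imitation Learning, 2017) and the very on-topic tutorial video "FZERO AI - PPO2 and A2C - OpenAI and Stable-baselines" are fun just to look through.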
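2. Overall architecture
Since the concept is "reinforcement learning with a game I like", the setup incorporates a real game console. And since I was doing it anyway, I assigned a Jetson Nano as the compute device. Training aside, if inference works, an "AI game tournament limited to Jetson Nano" may not be just a dream (I'd love to hold one). Incidentally, it seems there was a contest for Sonic.
Equipment used
- Jetson Nano (ML computer)
- Retro freak (Game console)
- ezcap 261 (Video capture)
- GameControllerizer (Game control)
For the ML computer I originally planned to use an Edge TPU, but passed on it because the quantization procedure was too much of a hassle. For video capture I chose the ezcap 261: it explicitly states support for UVC (USB Video Class) and Linux, and it is cheap.
Just connect the capture unit to the Jetson Nano and call `cv2.VideoCapture()`, and game frames start streaming in.

```python
import cv2

cap = cv2.VideoCapture(0)                 # 0 is the camera device number
cap.set(cv2.CAP_PROP_FPS, 30)             # set the capture rate to 30 FPS
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # set the frame width to 1280
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)   # set the frame height to 720
ret, frame = cap.read()                   # grab one frame
```

Using a real game console might mean that "training never finishes", but I'll think about that when it happens.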
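3. Extracting information
The starting point is extracting information from the game screen. Unlike with Gym Retro, a common framework for game reinforcement learning, this is the first hurdle when you build the system from scratch. The information needed is: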
- Power
- Speed
- Time (option)
- Course (option)
First I noted where each item appears on screen, then tackled them one by one.
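3.1 Power
At first I tried to judge it by "the length of the red part", but when the Power gauge drops below 25% it blinks, alternating between red and gray, so that didn't work. Later, looking closely at the gauge, I noticed a black line on its right side, so I wrote a routine that finds the pixel position with the lowest luminance along the X direction.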
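The black-line search described above could look roughly like the following. This is only a sketch under assumptions: the gauge is taken to be already cropped into a small grayscale strip, represented here as plain Python lists of luminance values (0-255), and the darkest column is assumed to mark the current gauge edge. The name `estimate_power` and the synthetic strip are hypothetical, not taken from the original program.

```python
def estimate_power(gauge_rows):
    """Return the gauge fill ratio (0.0-1.0) from a cropped grayscale strip.

    gauge_rows: list of rows, each a list of luminance values (0-255).
    Assumption: the darkest column is the black line at the gauge's edge.
    """
    width = len(gauge_rows[0])
    # Average luminance of every column (collapse the Y direction).
    column_mean = [
        sum(row[x] for row in gauge_rows) / len(gauge_rows)
        for x in range(width)
    ]
    # X position with the lowest luminance = position of the black line.
    darkest_x = min(range(width), key=column_mean.__getitem__)
    return darkest_x / (width - 1)


# Tiny synthetic strip: bright gauge, black line at x=6, dimmer remainder.
strip = [[200, 200, 200, 200, 200, 200, 0, 90, 90, 90, 90]] * 3
print(estimate_power(strip))
```

Because only luminance is used, the result does not depend on whether the filled part is currently drawn in red or gray, which is the property the blinking gauge requires.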
3.2 Speed
Here too I first assumed some quick template matching (SSD, SAD) would do, but it didn't. The digits are slightly transparent, so the background bleeds through and accuracy was terrible.
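For reference, SAD and SSD scoring can be sketched in a few lines. The flat 4-value "patches", the `templates` dictionary and the helper names are all hypothetical illustrations, not the author's code; real digit crops would be much larger.

```python
def sad(patch, template):
    """Sum of absolute differences between two equally sized flat patches."""
    return sum(abs(p - t) for p, t in zip(patch, template))


def ssd(patch, template):
    """Sum of squared differences between two equally sized flat patches."""
    return sum((p - t) ** 2 for p, t in zip(patch, template))


def best_match(patch, templates, score=sad):
    """Return the label whose template has the lowest score against patch."""
    return min(templates, key=lambda label: score(patch, templates[label]))


# Hypothetical 2x2 "digit" templates, flattened to 4 luminance values.
templates = {"0": [255, 0, 0, 255], "1": [0, 255, 0, 255]}
clean = [250, 5, 0, 255]
print(best_match(clean, templates))

# A bright background bleeding through the dark pixels changes the scores.
blended = [255, 130, 125, 255]
print(sad(blended, templates["0"]), sad(blended, templates["1"]))
```

In this toy case the gap between the best and second-best score narrows once the background bleeds in, which is the kind of effect that makes fixed templates unreliable on semi-transparent digits.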
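So I fell back to classifying the digits with a small CNN. With 28x24 inputs and 10 classes, the task is roughly equivalent to MNIST.
Model construction

```python
from tensorflow import keras as k

DIGIT_SIZE = [24, 28, 1]  # height, width, channels


def __build_model(lr=1e-3):
    # Functional API
    i = k.layers.Input(DIGIT_SIZE, name="input_0")
    x = k.layers.Conv2D(8, (3, 3), activation="relu", padding="valid")(i)
    x = k.layers.MaxPool2D()(x)
    x = k.layers.Flatten()(x)
    x = k.layers.Dense(10, activation="softmax", name="output")(x)

    model = k.models.Model(inputs=i, outputs=x)
    model.summary()
    # 10-way softmax output, so the loss is categorical cross-entropy
    model.compile(loss='categorical_crossentropy',
                  optimizer=k.optimizers.Adam(lr=lr, decay=1e-5),
                  metrics=['accuracy'])
    return model
```

The training data was cropped from game screens and sorted by hand: 685 images in total.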
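The training part follows standard tf.keras conventions. One point worth noting: the number of collected examples was fairly unbalanced across classes, so to soften the imbalance I pass per-class weights via `labels_weight`. Perhaps because the problem is easy, accuracy reaches 1.0 quickly.
Training

```python
# Found 685 images belonging to 10 classes.
# Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55, 5: 53, 7: 51, 4: 48, 8: 45, 9: 37})
model = __build_model()
model.fit_generator(
    batch_generator,
    epochs=10,
    class_weight=labels_weight,
    verbose=1)
```

Overlaying the Power and Speed results on the game screen (see the tweet that follows) runs comfortably in real time even on the Jetson Nano.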
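The article does not show how `labels_weight` is built. One common choice, shown here purely as an assumption, is inverse-frequency weighting scaled so the most frequent class gets weight 1.0. The helper name `inverse_frequency_weights` is hypothetical; the counts are the ones reported in the training log above.

```python
from collections import Counter


def inverse_frequency_weights(counts):
    """Map class index -> weight so that rarer classes weigh more.

    Weights are scaled so the most frequent class gets weight 1.0.
    """
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


# Class counts reported in the training log above.
counts = Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55,
                  5: 53, 7: 51, 4: 48, 8: 45, 9: 37})
labels_weight = inverse_frequency_weights(counts)
print(labels_weight[0], round(labels_weight[9], 2))
```

A dictionary of this shape (class index to float) is what the Keras `class_weight` argument accepts.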
Speed & power value extraction from F-ZERO screen. The image processing program is running on JetsonNano. For speed value extraction, very small CNN model is used. #jetson #nvidia pic.twitter.com/wVJtNCy3th
nobu_e753 (@nobu_e753) October 1, 2019
3.3 Time (option)
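After thinking it over for a while, I decided to skip it.
My reasoning is that [shorter Time] can be read as [high Speed] and [Power not decreasing]. Yes, I'm cutting corners here (lol).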
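As an illustration only, the substitution above could be turned into a per-frame reward along these lines. The reinforcement learning part is not covered in this article, so everything here is a hypothetical sketch: the function name `step_reward`, the `speed_scale` and `damage_penalty` constants, and the convention that Power is a ratio in [0.0, 1.0] are all assumptions.

```python
def step_reward(speed, power, prev_power, speed_scale=500.0, damage_penalty=1.0):
    """Per-frame reward: favor high Speed, penalize any drop in Power.

    speed_scale and damage_penalty are hypothetical tuning constants.
    power and prev_power are gauge ratios in [0.0, 1.0].
    """
    speed_term = speed / speed_scale
    power_loss = max(0.0, prev_power - power)  # only count decreases
    return speed_term - damage_penalty * power_loss


print(step_reward(speed=400, power=0.8, prev_power=0.8))
print(step_reward(speed=400, power=0.7, prev_power=0.8))
```

The second call scores lower than the first because Power dropped between frames, while a Power increase is neither rewarded nor penalized.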
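3.4 Course (option)
With plain reinforcement learning, course information is presumably something the agent acquires on its own during training. Still, I wondered whether a model that extracts the course in advance could shorten training time, so I gave it a try. I was also curious how much performance a segmentation task could reach on the Jetson Nano.
Model selection
I wanted the fastest model possible, so I looked for the fastest model on the benchmark for Cityscapes, a well-known segmentation dataset.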
- Fast-SCNN: Fast Semantic Segmentation Network, 2019
- https://arxiv.org/abs/1902.04502
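It reportedly reaches 285 FPS on 1024 x 512 inputs, so it looked promising (in practice I cut the number of channels in half and used a 480 x 288 input). Helpfully, an annotated TensorFlow implementation of Fast-SCNN was also publicly available.
Training data
There is no way around this part, so I brute-forced it: I created annotations (course, magnetic zones) for 150 frames in total, 75 each from course 1 (Mute City) and course 2 (Big Blue). I used AnnoFab, a free annotation tool that runs in the browser. It took a little over an hour.
Training & inference
Ignoring the magnetic zones, I first built a training flow that segments only the course. Training ran for 5,000 steps (an arbitrary choice) and took 18 minutes on an RTX 2080 Ti. Checking the results on still images, the inference quality is reasonable for a model thrown together this quickly.
The one weak spot: when the car hits the edge of the course and the collision effect plays, the segmentation degrades, but adding training data should cover that.
Not fast enough!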
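Excited to have a finished model, I deployed it to the Jetson Nano and ran it on video, but it wasn't fast enough! It was choppy!! Looking into it, the cause was:
Pre- and post-processing such as the overlay was slow
I had written the post-processing that visualizes the segmentation (building the mask, overlaying it) with OpenCV + numpy, and this alone took 50-60 ms. The CNN part, by contrast, took under 30 ms. On an edge device the CNN has a GPU as an accelerator while the CPU is weak, which likely explains this result. This is worth watching for whenever you use an edge device, not just in this case.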
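Speeding up (parallelization, TF-TRT)
At this rate it would likely affect training, and more than anything it was frustrating, so I sped things up.
Parallelization
Using Python's `multiprocessing.Process`, I split the processing into three parts:
- Pre-processing (capture, downscale)
- Segmentation
- Post-processing (overlay, display)
and made them run as a pipeline.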
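A three-stage pipeline of this kind can be sketched with `multiprocessing.Process` and queues as follows. This is a minimal sketch, not the author's code: the stage bodies are placeholder arithmetic standing in for capture, inference and overlay, the helper names are hypothetical, and the "fork" start method is an assumption that fits Linux (as on the Jetson Nano).

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(func(item))


# Placeholder work for each stage (stand-ins for the real processing).
def preprocess(frame):
    return frame * 2          # e.g. capture + downscale


def segment(frame):
    return frame + 1          # e.g. CNN inference


def postprocess(mask):
    return f"frame:{mask}"    # e.g. overlay + display


def run_pipeline(frames):
    ctx = mp.get_context("fork")  # assumption: Linux
    queues = [ctx.Queue() for _ in range(4)]
    funcs = [preprocess, segment, postprocess]
    procs = [ctx.Process(target=stage, args=(f, queues[i], queues[i + 1]))
             for i, f in enumerate(funcs)]
    for p in procs:
        p.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

With one process per stage, a new frame can be pre-processed while the previous one is still being segmented or overlaid, so throughput is bounded by the slowest stage rather than by the sum of all three.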
Checking with jtop, a tool for monitoring GPU utilization on Jetson, utilization had visibly improved.
TF-TRT (TensorFlow with TensorRT optimization)
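This one turned out to be quite useful. When running a CNN model on the Jetson Nano, the options are:
- Run the TensorFlow model as is, on the TensorFlow runtime
- Run a TensorRT-optimized model on the TensorRT runtime
The latter is of course more efficient, but if the model contains operations that TensorRT does not support, it cannot be optimized. For example, this model contains a `ResizeBilinear` operation, which TensorRT (5.x) did not support. Reading the documentation carefully, though, there is a third option, a hybrid approach: "optimize only the operations TensorRT supports and run those on the TensorRT runtime; run everything else as the TensorFlow model on the TensorFlow runtime" (official page).
Adopting it is also easy: after loading the TensorFlow inference graph (.pb), you just add a few lines. Measuring only the inference part afterwards showed a large improvement, from 24 ms to 15 ms.
Adding TF-TRT

```python
import tensorflow as tf
# Required for TF-TRT
from tensorflow.python.compiler.tensorrt import trt_convert as trt

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.65)
...
graph = tf.Graph()
with graph.as_default():
    with open(pb, 'rb') as model_file:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(model_file.read())

    #############################################
    # Additions for TF-TRT
    converter = trt.TrtGraphConverter(
        input_graph_def=graph_def,
        # precision_mode="FP16",  # the precision mode can also be selected
        nodes_blacklist=[OUTPUT_NODE])
    graph_def = converter.convert()
    #############################################

    tf.import_graph_def(graph_def, name="")
```

Comparing segmentation speed before and after the optimization, I just about reached 30 fps. With this I could finally show the real-time segmentation screen from the top of the article.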
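Summary
Starting from the concept "I want to do reinforcement learning with a game I like", I built a system that makes it possible. Using an edge device (Jetson Nano) as the compute part also taught me about its characteristics.
In particular, when aiming for real-time processing on the Jetson Nano,
- The CPU is weak compared with the GPU, so watch the load of non-CNN processing too
- Even when TensorRT cannot be applied, there is TF-TRT
were two very useful lessons.
With the groundwork done, next comes the main topic: reinforcement learning. I don't know when automated play will manage a full lap of Mute City, but I'll work on it at a relaxed pace alongside my day job.
Challengers welcome!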
- Posted: 2019-10-01T22:46:07+09:00
[Tensorflow] Tensorflow 2
TensorFlow 2.0 has been announced, and it is fun to look through what has changed.
Now that Keras is the basis of the API, it has become more convenient.
Release 2.0.0
Major Features and Improvements
TensorFlow 2.0 focuses on simplicity and ease of use, featuring updates like:
- Easy model building with Keras and eager execution.
- Robust model deployment in production on any platform.
- Powerful experimentation for research.
- API simplification by reducing duplication and removing deprecated endpoints.
For details on best practices with 2.0, see the Effective 2.0 guide
For information on upgrading your existing TensorFlow 1.x models, please refer to our Upgrade and Migration guides. We have also released a collection of tutorials and getting started guides.
Highlights
- TF 2.0 delivers Keras as the central high level API used to build and train models. Keras provides several model-building APIs such as Sequential, Functional, and Subclassing along with eager execution, for immediate iteration and intuitive debugging, and `tf.data`, for building scalable input pipelines. Check out the guide for additional details.
- Distribution Strategy: TF 2.0 users will be able to use the `tf.distribute.Strategy` API to distribute training with minimal code changes, yielding great out-of-the-box performance. It supports distributed training with Keras `model.fit`, as well as with custom training loops. Multi-GPU support is available, along with experimental support for multi worker and Cloud TPUs. Check out the guide for more details.
- Functions, not Sessions. The traditional declarative programming model of building a graph and executing it via a `tf.Session` is discouraged, and replaced by writing regular Python functions. Using the `tf.function` decorator, such functions can be turned into graphs which can be executed remotely, serialized, and optimized for performance.
- Unification of `tf.train.Optimizers` and `tf.keras.Optimizers`. Use `tf.keras.Optimizers` for TF2.0. `compute_gradients` is removed as public API, use `GradientTape` to compute gradients.
- AutoGraph translates Python control flow into TensorFlow expressions, allowing users to write regular Python inside `tf.function`-decorated functions. AutoGraph is also applied in functions used with tf.data, tf.distribute and tf.keras APIs.
- Unification of exchange formats to SavedModel. All TensorFlow ecosystem projects (TensorFlow Lite, TensorFlow JS, TensorFlow Serving, TensorFlow Hub) accept SavedModels. Model state should be saved to and restored from SavedModels.
- API Changes: Many API symbols have been renamed or removed, and argument names have changed. Many of these changes are motivated by consistency and clarity. The 1.x API remains available in the compat.v1 module. A list of all symbol changes can be found here.
- API clean-up, included removing `tf.app`, `tf.flags`, and `tf.logging` in favor of absl-py.
- No more global variables with helper methods like `tf.global_variables_initializer` and `tf.get_global_step`.
- Add toggles `tf.enable_control_flow_v2()` and `tf.disable_control_flow_v2()` for enabling/disabling v2 control flow.
- Enable v2 control flow as part of `tf.enable_v2_behavior()` and `TF2_BEHAVIOR=1`.
- Fixes autocomplete for most TensorFlow API references by switching to use relative imports in API `__init__.py` files.
- Auto Mixed-Precision graph optimizer simplifies converting models to `float16` for acceleration on Volta and Turing Tensor Cores. This feature can be enabled by wrapping an optimizer class with `tf.train.experimental.enable_mixed_precision_graph_rewrite()`.
- Add environment variable `TF_CUDNN_DETERMINISTIC`. Setting to `TRUE` or "1" forces the selection of deterministic cuDNN convolution and max-pooling algorithms. When this is enabled, the algorithm selection procedure itself is also deterministic.
Breaking Changes
- Many backwards incompatible API changes have been made to clean up the APIs and make them more consistent.
Toolchains:
- TensorFlow 1.15 is built using devtoolset7 (GCC7) on Ubuntu 16. This may lead to ABI incompatibilities with extensions built against earlier versions of TensorFlow.
- Tensorflow code now produces 2 different pip packages: tensorflow_core containing all the code (in the future it will contain only the private implementation) and tensorflow which is a virtual pip package doing forwarding to tensorflow_core (and in the future will contain only the public API of tensorflow). We don't expect this to be breaking, unless you were importing directly from the implementation.
- Removed the `freeze_graph` command line tool; `SavedModel` should be used in place of frozen graphs.
tf.contrib:
- `tf.contrib` has been deprecated, and functionality has been either migrated to the core TensorFlow API, to an ecosystem project such as tensorflow/addons or tensorflow/io, or removed entirely.
- Remove `tf.contrib.timeseries` dependency on TF distributions.
- Replace contrib references with `tf.estimator.experimental.*` for apis in `early_stopping.py`.
tf.estimator:
- Premade estimators in the tf.estimator.DNN/Linear/DNNLinearCombined family have been updated to use `tf.keras.optimizers` instead of the `tf.compat.v1.train.Optimizers`. If you do not pass in an `optimizer=` arg or if you use a string, the premade estimator will use the Keras optimizer. This is checkpoint breaking, as the optimizers have separate variables. A checkpoint converter tool for converting optimizers is included with the release, but if you want to avoid any change, switch to the v1 version of the estimator: `tf.compat.v1.estimator.DNN/Linear/DNNLinearCombined*`.
- Default aggregation for canned Estimators is now `SUM_OVER_BATCH_SIZE`. To maintain previous default behavior, please pass `SUM` as the loss aggregation method.
- Canned Estimators don’t support `input_layer_partitioner` arg in the API. If you have this arg, you will have to switch to `tf.compat.v1 canned Estimators`.
- `Estimator.export_savedmodel` has been renamed to `export_saved_model`.
- When saving to SavedModel, Estimators will strip default op attributes. This is almost always the correct behavior, as it is more forwards compatible, but if you require that default attributes to be saved with the model, please use `tf.compat.v1.Estimator`.
- Feature Columns have been upgraded to be more Eager-friendly and to work with Keras. As a result, `tf.feature_column.input_layer` has been deprecated in favor of `tf.keras.layers.DenseFeatures`. v1 feature columns have direct analogues in v2 except for `shared_embedding_columns`, which are not cross-compatible with v1 and v2. Use `tf.feature_column.shared_embeddings` instead.
tf.keras:
- `OMP_NUM_THREADS` is no longer used by the default Keras config. To configure the number of threads, use `tf.config.threading` APIs.
- `tf.keras.model.save_model` and `model.save` now defaults to saving a TensorFlow SavedModel. HDF5 files are still supported.
- Deprecated `tf.keras.experimental.export_saved_model` and `tf.keras.experimental.function`. Please use `tf.keras.models.save_model(..., save_format='tf')` and `tf.keras.models.load_model` instead.
- Layers now default to float32, and automatically cast their inputs to the layer's dtype. If you had a model that used float64, it will probably silently use float32 in TensorFlow 2, and a warning will be issued that starts with `Layer <layer-name>` is casting an input tensor from dtype float64 to the layer's dtype of float32. To fix, either set the default dtype to float64 with `tf.keras.backend.set_floatx('float64')`, or pass `dtype='float64'` to each of the Layer constructors. See `tf.keras.layers.Layer` for more information.
tf.lite:
- Removed `lite.OpHint`, `lite.experimental`, and `lite.constant` from 2.0 API.
- Tensors are no longer hashable, but instead compare element-wise with `==` and `!=`. Use `tf.compat.v1.disable_tensor_equality()` to return to the previous behavior.
- Performing equality operations on Tensors or Variables with incompatible shapes an exception is no longer thrown. Instead `__eq__` returns False and `__ne__` returns True.
- Removed `tf.string_split` from v2 API.
- Deprecated the use of `constraint=` and `.constraint` with ResourceVariable.
- Add `UnifiedGRU` as the new GRU implementation for tf2.0. Change the default recurrent activation function for GRU from `hard_sigmoid` to `sigmoid`, and `reset_after` to True in 2.0. Historically recurrent activation is `hard_sigmoid` since it is faster than 'sigmoid'. With new unified backend between CPU and GPU mode, since the CuDNN kernel is using sigmoid, we change the default for CPU mode to sigmoid as well. With that, the default GRU will be compatible with both CPU and GPU kernel. This will enable user with GPU to use CuDNN kernel by default and get a 10x performance boost in training. Note that this is checkpoint breaking change. If user want to use their 1.x pre-trained checkpoint, please construct the layer with GRU(recurrent_activation='hard_sigmoid', reset_after=False) to fallback to 1.x behavior.
- `CUDNN_INSTALL_PATH`, `TENSORRT_INSTALL_PATH`, `NCCL_INSTALL_PATH`, `NCCL_HDR_PATH` are deprecated. Use `TF_CUDA_PATHS` instead which supports a comma-separated list of base paths that are searched to find CUDA libraries and headers.
Refer to our public project status tracker and issues tagged with `2.0` on GitHub for insight into recent issues and development progress.
If you experience any snags when using TF 2.0, please let us know at the TF 2.0 Testing User Group. We have a support mailing list as well as weekly testing meetings, and would love to hear your migration feedback and questions.
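To see why the GRU default change listed above is checkpoint breaking, it helps to compare the two activations numerically. The sketch below uses plain Python rather than TensorFlow; `hard_sigmoid` is written as the piecewise-linear form `clip(0.2 * x + 0.5, 0, 1)` used by classic Keras, which is stated here as an assumption about the definition rather than quoted from the release notes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x):
    # Piecewise-linear approximation: clip(0.2 * x + 0.5, 0, 1)
    return max(0.0, min(1.0, 0.2 * x + 0.5))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(sigmoid(x), 3), round(hard_sigmoid(x), 3))
```

The two functions agree at 0 but diverge elsewhere, so weights trained with one activation do not produce the same gate values under the other, which is why the notes recommend `recurrent_activation='hard_sigmoid'` when loading 1.x checkpoints.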
Bug Fixes and Other Changes
tf.contrib:
- Expose `tf.contrib.proto.*` ops in `tf.io` (they will exist in TF2)
tf.data:
- Add support for TensorArrays to `tf.data Dataset`.
- Integrate Ragged Tensors with `tf.data`.
- All core and experimental tf.data transformations that input user-defined functions can span multiple devices now.
- Extending the TF 2.0 support for `shuffle(..., reshuffle_each_iteration=True)` and `cache()` to work across different Python iterators for the same dataset.
- Removing the `experimental_numa_aware` option from `tf.data.Options`.
- Add `num_parallel_reads` and passing in a Dataset containing filenames into `TextLineDataset` and `FixedLengthRecordDataset`.
- Add support for defaulting the value of `cycle_length` argument of `tf.data.Dataset.interleave` to the number of schedulable CPU cores.
- Promoting `tf.data.experimental.enumerate_dataset` to core as `tf.data.Dataset.enumerate`.
- Promoting `tf.data.experimental.unbatch` to core as `tf.data.Dataset.unbatch`.
- Adds option for introducing slack in the pipeline to reduce CPU contention, via `tf.data.Options().experimental_slack = True`
- Added experimental support for parallel batching to `batch()` and `padded_batch()`. This functionality can be enabled through `tf.data.Options()`.
- Support cancellation of long-running `reduce`.
- Now we use `dataset` node name as prefix instead of the op name, to identify the component correctly in metrics, for pipelines with repeated components.
- Improve the performance of datasets using `from_tensors()`.
- Promoting `unbatch` from experimental to core API.
- Adding support for datasets as inputs to `from_tensors` and `from_tensor_slices` and batching and unbatching of nested datasets.
tf.distribute:
- Enable `tf.distribute.experimental.MultiWorkerMirroredStrategy` working in eager mode.
- Callbacks are supported in `MultiWorkerMirroredStrategy`.
- Disable `run_eagerly` and distribution strategy if there are symbolic tensors added to the model using `add_metric` or `add_loss`.
- Loss and gradients should now more reliably be correctly scaled w.r.t. the global batch size when using a `tf.distribute.Strategy`.
- Set default loss reduction as `AUTO` for improving reliability of loss scaling with distribution strategy and custom training loops. `AUTO` indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When used in distribution strategy scope, outside of built-in training loops such as `tf.keras` `compile` and `fit`, we expect reduction value to be 'None' or 'SUM'. Using other values will raise an error.
- Support for multi-host `ncclAllReduce` in Distribution Strategy.
tf.estimator:
- Replace `tf.contrib.estimator.add_metrics` with `tf.estimator.add_metrics`
- Use `tf.compat.v1.estimator.inputs` instead of `tf.estimator.inputs`
- Replace contrib references with `tf.estimator.experimental.*` for apis in early_s in Estimator
- Canned Estimators will now use keras optimizers by default. An error will be raised if tf.train.Optimizers are used, and you will have to switch to tf.keras.optimizers or tf.compat.v1 canned Estimators.
- A checkpoint converter for canned Estimators has been provided to transition canned Estimators that are warm started from `tf.train.Optimizers` to `tf.keras.optimizers`.
- Losses are scaled in canned estimator v2 and not in the optimizers anymore. If you are using Estimator + distribution strategy + optimizer v1 then the behavior does not change. This implies that if you are using custom estimator with optimizer v2, you have to scale losses. We have new utilities to help scale losses `tf.nn.compute_average_loss`, `tf.nn.scale_regularization_loss`.
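The loss-scaling note above comes down to simple arithmetic: each replica should divide the sum of its per-example losses by the global batch size, not by its local batch size. The following plain-Python stand-in (no TensorFlow required) mirrors what `tf.nn.compute_average_loss` computes when no sample weights are given; the replica values are made up for illustration.

```python
def compute_average_loss(per_example_losses, global_batch_size):
    """Plain-Python stand-in: sum of per-example losses / global batch size."""
    return sum(per_example_losses) / global_batch_size


replica_losses = [[1.0, 3.0], [2.0, 6.0]]   # 2 replicas x 2 examples each
global_batch_size = 4

# Scaling by the global batch size: replica results add up to the true mean.
scaled = sum(compute_average_loss(l, global_batch_size) for l in replica_losses)
# Averaging per replica and then summing overcounts by the number of replicas.
naive = sum(sum(l) / len(l) for l in replica_losses)

print(scaled)  # 3.0
print(naive)   # 6.0
```

Since gradients from replicas are summed, only the first form yields the same update as a single device processing all four examples.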
tf.keras:
- Premade models (including Linear and WideDeep) have been introduced for the purpose of replacing Premade estimators.
- Model saving changes
  - `model.save` and `tf.saved_model.save` may now save to the TensorFlow SavedModel format. The model can be restored using `tf.keras.models.load_model`. HDF5 files are still supported, and may be used by specifying `save_format="h5"` when saving.
- Raw TensorFlow functions can now be used in conjunction with the Keras Functional API during model creation. This obviates the need for users to create Lambda layers in most cases when using the Functional API. Like Lambda layers, TensorFlow functions that result in Variable creation or assign ops are not supported.
- Add support for passing list of lists to the `metrics` argument in Keras `compile`.
- Add `tf.keras.layers.AbstractRNNCell` as the preferred implementation for RNN cells in TF v2. User can use it to implement RNN cells with custom behavior.
- Keras training and validation curves are shown on the same plot when using the TensorBoard callback.
- Switched Keras `fit/evaluate/predict` execution to use only a single unified path by default unless eager execution has been explicitly disabled, regardless of input type. This unified path places an eager-friendly training step inside of a `tf.function`. With this
  - All input types are converted to `Dataset`.
  - The path assumes there is always a distribution strategy. When distribution strategy is not specified the path uses a no-op distribution strategy.
  - The training step is wrapped in `tf.function` unless `run_eagerly=True` is set in compile. The single path execution code does not yet support all use cases. We fallback to the existing v1 execution paths if your model contains the following:
    - `sample_weight_mode` in compile
    - `weighted_metrics` in compile
    - v1 optimizer
    - target tensors in compile
  If you are experiencing any issues because of this change, please inform us (file an issue) about your use case and you can unblock yourself by setting `experimental_run_tf_function=False` in compile meanwhile. We have seen couple of use cases where the model usage pattern is not as expected and would not work with this change.
    - output tensors of one layer is used in the constructor of another.
    - symbolic tensors outside the scope of the model are used in custom loss functions.
  The flag can be disabled for these cases and ideally the usage pattern will need to be fixed.
- Mark Keras `set_session` as `compat.v1` only.
- `tf.keras.estimator.model_to_estimator` now supports exporting to `tf.train.Checkpoint` format, which allows the saved checkpoints to be compatible with `model.load_weights`.
- `keras.backend.resize_images` (and consequently, `keras.layers.Upsampling2D`) behavior has changed, a bug in the resizing implementation was fixed.
- Add an `implementation=3` mode for `tf.keras.layers.LocallyConnected2D` and `tf.keras.layers.LocallyConnected1D` layers using `tf.SparseTensor` to store weights, allowing a dramatic speedup for large sparse models.
- Raise error if `batch_size` argument is used when input is dataset/generator/keras sequence.
- Update TF 2.0 `keras.backend.name_scope` to use TF 2.0 `name_scope`.
- Add v2 module aliases for losses, metrics, initializers and optimizers: `tf.losses = tf.keras.losses` & `tf.metrics = tf.keras.metrics` & `tf.initializers = tf.keras.initializers` & `tf.optimizers = tf.keras.optimizers`.
- Updates binary cross entropy logic in Keras when input is probabilities. Instead of converting probabilities to logits, we are using the cross entropy formula for probabilities.
- Added public APIs for `cumsum` and `cumprod` keras backend functions.
- Add support for temporal sample weight mode in subclassed models.
- Raise `ValueError` if an integer is passed to the training APIs.
- Added fault-tolerance support for training Keras model via `model.fit()` with `MultiWorkerMirroredStrategy`, tutorial available.
- Custom Callback tutorial is now available.
- To train with `tf.distribute`, Keras API is recommended over estimator.
- `steps_per_epoch` and `steps` arguments are supported with numpy arrays.
- New error message when unexpected keys are used in sample_weight/class_weight dictionaries
- Losses are scaled in Keras compile/fit and not in the optimizers anymore. If you are using custom training loop, we have new utilities to help scale losses `tf.nn.compute_average_loss`, `tf.nn.scale_regularization_loss`.
- `Layer` apply and add_variable APIs are deprecated.
- Added support for channels first data format in cross entropy losses with logits and support for tensors with unknown ranks.
- Error messages will be raised if `add_update`, `add_metric`, `add_loss`, activity regularizers are used inside of a control flow branch.
- New loss reduction types:
  1. `AUTO`: Indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When used with `tf.distribute.Strategy`, outside of built-in training loops such as `tf.keras` `compile` and `fit`, we expect reduction value to be `SUM` or `NONE`. Using `AUTO` in that case will raise an error.
  2. `NONE`: Weighted losses with one dimension reduced (axis=-1, or axis specified by loss function). When this reduction type used with built-in Keras training loops like `fit`/`evaluate`, the unreduced vector loss is passed to the optimizer but the reported loss will be a scalar value.
  3. `SUM`: Scalar sum of weighted losses.
  4. `SUM_OVER_BATCH_SIZE`: Scalar `SUM` divided by number of elements in losses. This reduction type is not supported when used with `tf.distribute.Strategy` outside of built-in training loops like `tf.keras` `compile`/`fit`.
- Wraps losses passed to the `compile` API (strings and v1 losses) which are not instances of v2 `Loss` class in `LossWrapper` class. => All losses will now use `SUM_OVER_BATCH_SIZE` reduction as default.
- `model.add_loss(symbolic_tensor)` should work in ambient eager.
- Update metric name to always reflect what the user has given in compile. Affects following cases
  - When name is given as 'accuracy'/'crossentropy'
  - When an aliased function name is used eg. 'mse'
  - Removing the `weighted` prefix from weighted metric names.
- Allow non-Tensors through v2 losses.
- Add v2 sparse categorical crossentropy metric.
- Add v2 APIs for `AUCCurve` and `AUCSummationMethod` enums.
- `add_update` can now be passed a zero-arg callable in order to support turning off the update when setting `trainable=False` on a Layer of a Model compiled with `run_eagerly=True`.
- Standardize the LayerNormalization API by replacing the args `norm_axis` and `params_axis` with `axis`.
- Fixed critical bugs that help with DenseFeatures usability in TF2
tf.lite:
- Added evaluation script for `COCO` minival
- Add delegate support for `QUANTIZE`.
- Add `GATHER` support to NN API delegate.
- Added support for TFLiteConverter Python API in 2.0. Contains functions from_saved_model, from_keras_file, and from_concrete_functions.
- Add `EXPAND_DIMS` support to NN API delegate TEST.
- Add `narrow_range` attribute to QuantizeAndDequantizeV2 and V3.
- Added support for `tflite_convert` command line tool in 2.0.
- Post-training quantization tool supports quantizing weights shared by multiple operations. The models made with versions of this tool will use INT8 types for weights and will only be executable interpreters from this version onwards.
- Post-training quantization tool supports fp16 weights and GPU delegate acceleration for fp16.
- Add delegate support for `QUANTIZED_16BIT_LSTM`.
- Extracts `NNAPIDelegateKernel` from nnapi_delegate.cc
TensorRT
- Add TensorFlow 2.0-compatible `TrtGraphConverterV2` API for TensorRT conversion. TensorRT initialization arguments are now passed wrapped in a named-tuple, `TrtConversionParams`, rather than as separate arguments as in `TrtGraphConverter`.
- Changed API to optimize TensorRT engines during graph optimization. This is now done by calling `converter.build()` where previously `is_dynamic_op=False` would be set.
- `converter.convert()` no longer returns a `tf.function`. Now the function must be accessed from the saved model.
- The `converter.calibrate()` method has been removed. To trigger calibration, a `calibration_input_fn` should be provided to `converter.convert()`.
Other:
- Fix accidental quadratic graph construction cost in graph-mode `tf.gradients()`.
- ResourceVariable's gather op supports batch dimensions.
- ResourceVariable support for `gather_nd`.
- `ResourceVariable` and `Variable` no longer accepts `constraint` in the constructor, nor expose it as a @property.
- Added gradient for `SparseToDense` op.
- Expose a flag that allows the number of threads to vary across Python benchmarks.
- `image.resize` in 2.0 now supports gradients for the new resize kernels.
- `image.resize` now considers proper pixel centers and has new kernels (incl. anti-aliasing).
- Renamed `tf.image` functions to remove duplicate "image" where it is redundant.
- Variadic reduce is supported on CPU
- Remove unused `StringViewVariantWrapper`.
- Delete unused `Fingerprint64Map` op registration
- Add broadcasting support to `tf.matmul`.
- Add C++ Gradient for `BatchMatMulV2`.
- Add `tf.math.cumulative_logsumexp` operation.
- Add ellipsis (...) support for `tf.einsum()`.
- Add expand_composites argument to all `nest.*` methods.
- Added `strings.byte_split`.
- Add a new "result_type" parameter to `tf.strings.split`.
- Add name argument to `tf.string_split` and `tf.strings_split`.
- Extend `tf.strings.split` to support inputs with any rank.
- Added `tf.random.binomial`.
- Added `key` and `skip` methods to `random.experimental.Generator`.
- Extend `tf.function` with basic support for CompositeTensors arguments (such as `SparseTensor` and `RaggedTensor`).
- `parallel_for.pfor`: add converters for Softmax, LogSoftmax, IsNaN, All, Any, and MatrixSetDiag.
- `parallel_for`: add converters for LowerTriangularSolve and Cholesky.
- `parallel_for`: add converters for `LogMatrixDeterminant` and `MatrixBandPart`.
- `parallel_for`: Add converter for `MatrixDiag`.
- `parallel_for`: Add converters for `OneHot`, `LowerBound`, `UpperBound`.
- `parallel_for`: add converter for `BroadcastTo`.
- Add `pfor` converter for `Squeeze`.
- Add `RaggedTensor.placeholder()`.
- Add ragged tensor support to `tf.squeeze`.
- Update RaggedTensors to support int32 row_splits.
- Allow `LinearOperator.solve` to take a `LinearOperator`.
- Allow all dtypes for `LinearOperatorCirculant`.
- Introduce MaxParallelism method
- Add `LinearOperatorHouseholder`.
- Adds Philox support to new stateful RNG's XLA path.
- Added `TensorSpec` support for CompositeTensors.
- Added `tf.linalg.tridiagonal_solve` op.
- Added partial_pivoting input parameter to `tf.linalg.tridiagonal_solve`.
- Added gradient to `tf.linalg.tridiagonal_solve`.
- Added `tf.linalg.tridiagonal_mul op`.
- Added GPU implementation of `tf.linalg.tridiagonal_matmul`.
- Added `LinearOperatorToeplitz`.
- Upgraded LIBXSMM to version 1.11.
- Uniform processing of quantized embeddings by Gather and EmbeddingLookup Ops.
- Correct a misstatement in the documentation of the sparse softmax cross entropy logit parameter.
- Add `tf.ragged.boolean_mask`.
- `tf.switch_case` added, which selects a branch_fn based on a branch_index.
- The C++ kernel of gather op supports batch dimensions.
- Fixed default value and documentation for `trainable` arg of tf.Variable.
- `EagerTensor` now supports numpy buffer interface for tensors.
- This change bumps the version number of the `FullyConnected` Op to 5.
- Added new op: `tf.strings.unsorted_segment_join`.
- Added HW acceleration support for `topK_v2`.
- CloudBigtable version updated to v0.10.0.
- Expose `Head` as public API.
- Added `tf.sparse.from_dense` utility function.
- Improved ragged tensor support in `TensorFlowTestCase`.
- Added a function `nested_value_rowids` for ragged tensors.
- Added `tf.ragged.stack`.
- Makes the a-normal form transformation in Pyct configurable as to which nodes are converted to variables and which are not.
- `ResizeInputTensor` now works for all delegates.
- `tf.cond` emits a StatelessIf op if the branch functions are stateless and do not touch any resources.
- Add support of local soft device placement for eager op.
- Pass partial_pivoting to the `_TridiagonalSolveGrad`.
- Add HW acceleration support for `LogSoftMax`.
- Add guard to avoid acceleration of L2 Normalization with input rank != 4
- Fix memory allocation problem when calling `AddNewInputConstantTensor`.
- Delegate application failure leaves interpreter in valid state
- `tf.while_loop` emits a StatelessWhile op if the cond and body functions are stateless and do not touch any resources.
- `tf.cond`, `tf.while` and if and while in AutoGraph now accept a nonscalar predicate if has a single element. This does not affect non-V2 control flow.
- Fix potential security vulnerability where decoding variant tensors from proto could result in heap out of bounds memory access.
- Only create a GCS directory object if the object does not already exist.
- Introduce `dynamic` constructor argument in Layer and Model, which should be set to `True` when using imperative control flow in the `call` method.
- Begin adding Go wrapper for C Eager API.
- XLA HLO graphs can be inspected with interactive_graphviz tool now.
- Add dataset ops to the graph (or create kernels in Eager execution) during the python Dataset object creation instead of doing it during Iterator creation time.
- Add `batch_dims` argument to `tf.gather`.
- The behavior of `tf.gather` is now correct when `axis=None` and `batch_dims<0`.
- Update docstring for gather to properly describe the non-empty `batch_dims` case.
- Removing of dtype in the constructor of initializers and partition_info in call.
- Add `tf.math.nextafter` op.
- Turn on MKL-DNN contraction kernels by default. MKL-DNN dynamically dispatches the best kernel implementation based on CPU vector architecture. To disable them, build with `--define=tensorflow_mkldnn_contraction_kernel=0`.
- `tf.linspace(start, stop, num)` now always uses "stop" as last value (for num > 1)
- Added top-k to precision and recall to keras metrics.
- Add a ragged size op and register it to the op dispatcher
- Transitive dependencies on :`pooling_ops` were removed. Some users may need to add explicit dependencies on :`pooling_ops` if they reference the operators from that library.
- Add `CompositeTensor` base class.
- Malformed gif images could result in an access out of bounds in the color palette of the frame. This has been fixed now
- Add templates and interfaces for creating lookup tables
- `Tensor::UnsafeCopyFromInternal` deprecated in favor of `Tensor::BitcastFrom`.
- In `map_vectorization` optimization, reduce the degree of parallelism in the vectorized map node.
- Add variant wrapper for `absl::string_view`.
- Add OpKernels for some stateless maps.
- DType is no longer convertible to an int. Use `dtype.as_datatype_enum` instead of `int(dtype)` to get the same result.
- Support both binary and -1/1 label input in v2 hinge and squared hinge losses.
- Added `LinearOperator.adjoint` and `LinearOperator.H` (alias).
- Expose CriticalSection in core as `tf.CriticalSection`.
- Enhanced graphviz output.
- Add opkernel templates for common table operations.
- Fix callbacks do not log values in eager mode when a deferred build model is used.
- `SignatureDef` util functions have been deprecated.
- Update `Fingerprint64Map` to use aliases
- Add legacy string flat hash map op kernels.
- Add support for `add_metric` in the graph function mode.
- Updating cosine similarity loss - removed the negate sign from cosine similarity.
- Changed default for gradient accumulation for TPU embeddings to true.
- Adds summary trace API for collecting graph and profile information.
- The `precision_mode` argument to `TrtGraphConverter` is now case insensitive.






