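- Posted: 2019-10-01T23:28:48+09:00
Aiming for the F-ZERO Crown with Reinforcement Learning (Part 1)
What I built (work in progress)
A deep learning model that segments the F-ZERO course is running on a Jetson Nano. It processes 480x288 frames at 30 FPS.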
Realtime "F-ZERO" course segmentation model is running on Jetson Nano. It is part of "gaming AI making challenge" as my personal work.#jetson #nvidia pic.twitter.com/AYCqE75JbG
— nobu_e753 (@nobu_e753) September 27, 2019
Here is a direct screen capture (the capture process adds load, so the frame rate drops).
Realtime "F-ZERO" course segmentation model is running on Jetson Nano (captured movie)#jetson #nvidia pic.twitter.com/gnn79ys9Oa
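— nobu_e753 (@nobu_e753) October 1, 2019
Introduction
When I hear "reinforcement learning", the first thing that comes to mind is automated game play. I wanted to give it a try, but when I looked at papers and existing implementations, most of them seemed to use games I had never played. I suspect ROM availability has a lot to do with that.
Since I would be trying this partly as a learning exercise anyway, I figured "I would rather try it with a game I have actually played and love", and started from the question of how to build a system that makes that possible.
This article is the first part and covers everything up to building the system.
1. Choosing the game
I have hacked on fighting games before, so I considered doing that again, but with two characters on screen and a lot of freedom of movement it reeked of difficulty. I did not want a hard target on my first try, so I set these conditions:
- Racing or shooting (I had heard that games that only move in one direction are easier to analyze)
- A simple background, with numbers and other values that are easy to read off the screen
- A game I have played myself (the most important one, for keeping up motivation!)
Based on these, I settled on F-ZERO (the original, 1990, Super Famicom). I remember playing it a fair amount, rocket starts, jump-pad shortcuts and all.
There are several prior examples of reinforcement learning x F-ZERO. A paper with "F-ZERO" right in its title (Racing F-Zero with Imitation Learning, 2017) and a tutorial video on exactly this topic, "FZERO AI - PPO2 and A2C - OpenAI and Stable-baslines", are fun just to watch.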
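2. Overall architecture
The concept this time is "reinforcement learning on a game I love", so the setup incorporates a real game console. And since I was doing it anyway, I assigned a Jetson Nano as the compute device. Training aside, if inference works on it, a "Jetson Nano-only AI gaming tournament" may not be a pipe dream (I would love to hold one). Apparently there has already been a contest for Sonic.
Equipment used
- Jetson Nano (ML computer)
- Retro freak (Game console)
- ezcap 261 (Video capture)
- GameControllerizer (Game control)
For the ML computer, I originally planned to use an Edge TPU, but passed on it because the quantization procedure was too much of a hassle. For video capture I chose the ezcap 261: it explicitly states support for UVC (USB Video Class) and Linux, and it is cheap.
Just connect the capture unit to the Jetson Nano and call cv2.VideoCapture(), and game frames keep streaming in.
    import cv2
    cap = cv2.VideoCapture(-1)                 # argument is the camera device index
    cap.set(cv2.CAP_PROP_FPS, 30)              # request 30 FPS from the capture device
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)    # request a frame width of 1280
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)    # request a frame height of 720
    ret, frame = cap.read()                    # grab one frame
With a real console, training might "never finish", but I will worry about that when it happens.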
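3. Extracting information
The starting point is extracting information from the game screen. Unlike Gym Retro, the usual framework for game reinforcement learning, a system built from scratch hits this as its first wall. The information needed is:
- Power
- Speed
- Time (option)
- Course (option)
I first noted where each item appears on screen, then tackled them one by one.
3.1 Power
At first I tried to read it from "the length of the red part", but that failed because once the Power gauge drops below 25% it blinks, alternating between red and gray. Looking at the gauge more closely, I then noticed a black line on its right side, so I wrote a detection routine that finds the pixel position with the lowest luminance along the X direction.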
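The darkest-column idea can be sketched as follows. This is a minimal illustration, not the routine from the article: the function name `read_power`, the assumption that the input is a grayscale crop containing only the gauge bar, and the synthetic test image are all hypothetical.

```python
import numpy as np


def read_power(gauge_gray: np.ndarray) -> float:
    """Return the gauge fill ratio in [0, 1] from a grayscale crop.

    gauge_gray: 2-D array (height, width) covering only the gauge bar.
    Assumes the darkest column is the black line at the right end of
    the filled part, and that the crop is at least 2 pixels wide.
    """
    column_brightness = gauge_gray.mean(axis=0)   # average each column over Y
    x_dark = int(np.argmin(column_brightness))    # X index of the darkest column
    return x_dark / (gauge_gray.shape[1] - 1)


# Synthetic 4x11 gauge: bright everywhere, black marker at column 5.
gauge = np.full((4, 11), 200, dtype=np.uint8)
gauge[:, 5] = 0
print(read_power(gauge))
```

Averaging over Y before taking the minimum makes the estimate less sensitive to a single noisy pixel, and it does not depend on the gauge colour, which is what broke the red-length approach.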
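3.2 Speed
Here too, I first assumed some rough template matching (SSD, SAD) would do the job, but it did not. The digits are slightly translucent, so the background bleeds through and accuracy was terrible.
With no other choice, I classified the digits with a small CNN. With 28x24 inputs and 10 classes, the task is roughly on par with MNIST.
Model construction
    from tensorflow import keras as k  # assumed alias; the snippet is shown without its imports

    DIGIT_SIZE = [24, 28, 1]

    def __build_model(lr=1e-3):
        # functional API
        i = k.layers.Input(DIGIT_SIZE, name="input_0")
        x = k.layers.Conv2D(8, (3,3), activation="relu", padding="valid")(i)
        x = k.layers.MaxPool2D()(x)
        x = k.layers.Flatten()(x)
        x = k.layers.Dense(10, activation="softmax", name="output")(x)
        model = k.models.Model(inputs=i, outputs=x)
        model.summary()
        # 10-way softmax output, so categorical cross-entropy (assumes one-hot labels)
        model.compile(loss='categorical_crossentropy',
                      optimizer=k.optimizers.Adam(lr=lr, decay=1e-5),
                      metrics=['accuracy'])
        return model
The training data was cropped from game screens and sorted into classes, 685 images in total.
The training part follows the usual tf.keras conventions as-is. One note: the number of collected examples was fairly skewed across classes, so I pass per-class weights via labels_weight to soften the imbalance. Perhaps because the problem is easy, it quickly reached Accuracy = 1.0.
Training
    # Found 685 images belonging to 10 classes.
    # Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55, 5: 53, 7: 51, 4: 48, 8: 45, 9: 37})
    model = __build_model()
    model.fit_generator(
        batch_generator,
        epochs=10,
        class_weight=labels_weight,
        verbose=1)
Both results (Power and Speed), overlaid on the game screen, are shown in the tweet quoted in this section. They run comfortably in real time even on the Jetson Nano.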
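The article does not show how labels_weight is computed. One common choice, sketched here purely as an assumption, is the "balanced" heuristic total / (n_classes * count), applied to the per-digit counts from the training log; the helper name `balanced_class_weight` is hypothetical.

```python
from collections import Counter

# Per-digit image counts quoted in the training log.
counts = Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55,
                  5: 53, 7: 51, 4: 48, 8: 45, 9: 37})


def balanced_class_weight(counts):
    """Weight each class by total / (n_classes * count).

    Rare classes get weights above 1 and common ones below 1, so every
    class contributes the same total weight to the loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


labels_weight = balanced_class_weight(counts)
for label in sorted(labels_weight):
    print(label, round(labels_weight[label], 3))
```

A dictionary of this shape (class index to weight) is what the class_weight argument of Keras training functions accepts.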
Speed & power value extraction from F-ZERO screen. The image processing program is running on JetsonNano. For speed value extraction, very small CNN model is used.#jetson #nvidia pic.twitter.com/wVJtNCy3th
— nobu_e753 (@nobu_e753) October 1, 2019
3.3 Time (option)
After thinking it over for a while, I decided to skip it.
My reasoning is that [a shorter Time] can be read as [high Speed] and [Power not decreasing]. Yes, I am slacking off, lol.
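To make the substitution concrete, here is one hypothetical way the two conditions could be folded into a single per-frame score. Nothing here comes from the article: the function name `proxy_score`, the normalisation constant `max_speed`, the `penalty` factor, and the convention that Power is a ratio in [0, 1] are all assumptions for illustration.

```python
def proxy_score(speed, power, prev_power, max_speed=500.0, penalty=1.0):
    """Hypothetical stand-in for lap time: reward Speed, punish Power loss.

    speed      : current Speed reading
    power      : current Power ratio in [0, 1]
    prev_power : Power ratio on the previous frame
    """
    speed_term = speed / max_speed                 # higher Speed is better
    power_loss = max(0.0, prev_power - power)      # only decreases are penalised
    return speed_term - penalty * power_loss


print(proxy_score(250.0, 1.0, 1.0))   # Power unchanged
print(proxy_score(250.0, 0.8, 1.0))   # Power dropped on this frame
```

Penalising only decreases means that recovering Power is not rewarded on its own, which keeps the score focused on "fast without taking damage".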
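3.4 Course (option)
With plain reinforcement learning, the agent would presumably pick up course information on its own during training. Still, I wondered whether preparing a model that extracts the course in advance could shorten training time, so I gave it a try. I was also curious how much performance a segmentation task could deliver on the Jetson Nano.
Model selection
I wanted the fastest model possible, so I looked for the fastest model on the benchmark for Cityscapes, a well-known segmentation dataset.
- Fast-SCNN: Fast Semantic Segmentation Network, 2019
- https://arxiv.org/abs/1902.04502
It reportedly runs at 285 FPS on 1024 x 512 input, which is promising (in practice I further cut the channel counts in half and used a 480 x 288 input). Helpfully, an annotated TensorFlow implementation of Fast-SCNN had also been published.
Training data
There is no shortcut here, so I powered through: I created annotations (course, magnetic zones) for 150 frames in total, 75 each from stage 1 (Mute city) and stage 2 (Big blue). I used AnnoFab, a free browser-based annotation tool. It took a little over an hour.
Training & inference
Ignoring the magnetic zones for now, I built a training flow that segments only the course. Training ran for 5000 steps (picked arbitrarily) and took 18 minutes on an RTX 2080i. Checking the results on still images, the inference output is decent for a model thrown together this quickly.
The one weak spot: when my machine slams into the course edge and the impact effect plays, the segmentation gets sloppy, but adding training data should cover that.
Not fast enough!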
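With the model done, I excitedly deployed it to the Jetson Nano and ran it on video, only to find it was not fast enough. It stuttered badly! Looking into it, the cause was:
Pre- and post-processing such as the overlay is slow
To visualize the segmentation, I had written post-processing in OpenCV + numpy that builds a mask and overlays it, and this alone took 50-60 ms. The CNN part, by contrast, took under 30 ms. On an edge device the CNN has a GPU as an accelerator while the CPU is weak, which I think explains this result. Beyond this particular case, it is something to watch out for whenever you use an edge device.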
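A quick back-of-the-envelope check shows why these numbers produce stutter. The sketch assumes the stages run strictly one after another in a single loop and takes the CNN at its quoted 30 ms upper bound, so it is a pessimistic estimate rather than a measurement.

```python
def fps(ms_per_frame: float) -> float:
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


CNN_MS = 30.0             # upper bound quoted in the text
POST_MS = (50.0, 60.0)    # quoted range for mask + overlay
TARGET_FPS = 30.0

budget_ms = 1000.0 / TARGET_FPS
serial_fps = [fps(CNN_MS + post) for post in POST_MS]

print(f"budget per frame at 30 FPS: {budget_ms:.1f} ms")
print(f"serial loop: {serial_fps[1]:.1f} to {serial_fps[0]:.1f} FPS")
```

The post-processing alone already exceeds the roughly 33 ms frame budget, so speeding up the CNN by itself cannot reach 30 FPS in a serial loop.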
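Speeding up (parallelization, TF-TRT)
Left as is, this looked likely to affect training, and above all it was frustrating, so I set out to speed it up.
Parallelization
Using Python's multiprocessing.Process, I split the processing into three stages:
- Preprocessing (capture, downscale)
- Segmentation
- Postprocessing (overlay, display)
and made them run as a pipeline.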
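The three-stage split can be sketched with multiprocessing.Process and queues as below. This is a toy under stated assumptions, not the article's code: the stage bodies are arithmetic stand-ins for capture, CNN, and overlay, the names `preprocess`, `segment`, `postprocess`, and `run_pipeline` are hypothetical, and the "fork" start method is assumed (Linux, as on the Jetson Nano).

```python
import multiprocessing as mp

STOP = None  # sentinel passed down the pipeline to shut each stage down


def preprocess(frames, out_q):
    """Stage 1 (stand-in for capture + downscale): forward each frame."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(STOP)


def segment(in_q, out_q):
    """Stage 2 (stand-in for the CNN): double each value."""
    while True:
        frame = in_q.get()
        if frame is STOP:
            out_q.put(STOP)
            return
        out_q.put(frame * 2)


def postprocess(in_q, out_q):
    """Stage 3 (stand-in for overlay + display): add one to each value."""
    while True:
        mask = in_q.get()
        if mask is STOP:
            out_q.put(STOP)
            return
        out_q.put(mask + 1)


def run_pipeline(frames):
    # "fork" is an assumption (Linux); other start methods additionally
    # need an `if __name__ == "__main__":` guard around the entry point.
    ctx = mp.get_context("fork")
    q1, q2, q3 = ctx.Queue(maxsize=4), ctx.Queue(maxsize=4), ctx.Queue(maxsize=4)
    stages = [
        ctx.Process(target=preprocess, args=(frames, q1)),
        ctx.Process(target=segment, args=(q1, q2)),
        ctx.Process(target=postprocess, args=(q2, q3)),
    ]
    for p in stages:
        p.start()
    results = []
    while True:                  # drain the last queue before joining
        item = q3.get()
        if item is STOP:
            break
        results.append(item)
    for p in stages:
        p.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Bounded queues keep a fast stage from running arbitrarily far ahead of a slow one, so while the CPU-bound overlay works on one frame the GPU stage can already process the next.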
Checking with jtop, a tool for monitoring GPU utilization on Jetson, utilization had visibly improved.
TF-TRT (TensorFlow with TensorRT optimization)
This one turned out to be quite useful. There are two ways to run a CNN model on the Jetson Nano:
- Run the TensorFlow model as-is on the TensorFlow runtime
- Run a TensorRT-optimized model on the TensorRT runtime
The second option is of course more efficient, but the optimization fails if the model contains operations that TensorRT does not support. For example, this model contains the ResizeBilinear operation, which TensorRT (5.X) did not support. Reading the documentation carefully, however, there is a third, hybrid option: "optimize only the operations TensorRT supports and run those on the TensorRT runtime, while the rest stays in the TensorFlow model and runs on the TensorFlow runtime" (official page).
It is also easy to adopt: after loading the TensorFlow inference graph (.pb), you add just a few lines. Measuring the inference part alone after adopting it, the time went from 24 ms to 15 ms, a big improvement.
Adopting TF-TRT
    # Required for TF-TRT
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.65)
    ...
    graph = tf.Graph()
    with graph.as_default():
        model_file = open(pb, 'rb')
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(model_file.read())
        #############################################
        # Additions for TF-TRT
        converter = trt.TrtGraphConverter(
            input_graph_def=graph_def,
            # precision_mode="FP16",   # the precision mode can also be chosen
            nodes_blacklist=[OUTPUT_NODE])
        graph_def = converter.convert()
        #############################################
        tf.import_graph_def(graph_def, name="")
Comparing segmentation speed before and after the optimization, I just about reached 30 fps. With this, I could display the real-time segmentation screen shown at the top of the article.
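Summary
Starting from the concept of "reinforcement learning on a game I love", I built a system that makes it possible. Trying out an edge device (Jetson Nano) as the compute part also taught me about its characteristics.
In particular, when aiming for real-time processing on the Jetson Nano:
- The CPU is weak compared with the GPU, so watch the load of everything other than the CNN
- Even when TensorRT cannot be applied, there is TF-TRT
These two points were very valuable lessons.
With the groundwork in place, it is finally time for the main topic: reinforcement learning. I do not know when automated play will manage a full lap of Mute city, but I will keep at it at a leisurely pace between my day job.
Challengers welcome!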
- Posted: 2019-10-01T22:46:07+09:00
[Tensorflow] Tensorflow 2
TensorFlow 2.0 has been announced, and it is fun to look through what has changed.
Keras is now the default way to work, which makes things more convenient.
Release 2.0.0
Major Features and Improvements
TensorFlow 2.0 focuses on simplicity and ease of use, featuring updates like:
- Easy model building with Keras and eager execution.
- Robust model deployment in production on any platform.
- Powerful experimentation for research.
- API simplification by reducing duplication and removing deprecated endpoints.
For details on best practices with 2.0, see the Effective 2.0 guide
For information on upgrading your existing TensorFlow 1.x models, please refer to our Upgrade and Migration guides. We have also released a collection of tutorials and getting started guides.
Highlights
- TF 2.0 delivers Keras as the central high level API used to build and train models. Keras provides several model-building APIs such as Sequential, Functional, and Subclassing along with eager execution, for immediate iteration and intuitive debugging, and tf.data, for building scalable input pipelines. Checkout guide for additional details.
- Distribution Strategy: TF 2.0 users will be able to use the tf.distribute.Strategy API to distribute training with minimal code changes, yielding great out-of-the-box performance. It supports distributed training with Keras model.fit, as well as with custom training loops. Multi-GPU support is available, along with experimental support for multi worker and Cloud TPUs. Check out the guide for more details.
- Functions, not Sessions. The traditional declarative programming model of building a graph and executing it via a tf.Session is discouraged, and replaced by writing regular Python functions. Using the tf.function decorator, such functions can be turned into graphs which can be executed remotely, serialized, and optimized for performance.
- Unification of tf.train.Optimizers and tf.keras.Optimizers. Use tf.keras.Optimizers for TF2.0. compute_gradients is removed as public API, use GradientTape to compute gradients.
- AutoGraph translates Python control flow into TensorFlow expressions, allowing users to write regular Python inside tf.function-decorated functions. AutoGraph is also applied in functions used with tf.data, tf.distribute and tf.keras APIs.
- Unification of exchange formats to SavedModel. All TensorFlow ecosystem projects (TensorFlow Lite, TensorFlow JS, TensorFlow Serving, TensorFlow Hub) accept SavedModels. Model state should be saved to and restored from SavedModels.
- API Changes: Many API symbols have been renamed or removed, and argument names have changed. Many of these changes are motivated by consistency and clarity. The 1.x API remains available in the compat.v1 module. A list of all symbol changes can be found here.
- API clean-up, included removing tf.app, tf.flags, and tf.logging in favor of absl-py.
- No more global variables with helper methods like tf.global_variables_initializer and tf.get_global_step.
- Add toggles tf.enable_control_flow_v2() and tf.disable_control_flow_v2() for enabling/disabling v2 control flow.
- Enable v2 control flow as part of tf.enable_v2_behavior() and TF2_BEHAVIOR=1.
- Fixes autocomplete for most TensorFlow API references by switching to use relative imports in API __init__.py files.
- Auto Mixed-Precision graph optimizer simplifies converting models to float16 for acceleration on Volta and Turing Tensor Cores. This feature can be enabled by wrapping an optimizer class with tf.train.experimental.enable_mixed_precision_graph_rewrite().
- Add environment variable TF_CUDNN_DETERMINISTIC. Setting to TRUE or "1" forces the selection of deterministic cuDNN convolution and max-pooling algorithms. When this is enabled, the algorithm selection procedure itself is also deterministic.
Breaking Changes
- Many backwards incompatible API changes have been made to clean up the APIs and make them more consistent.
Toolchains:
- TensorFlow 1.15 is built using devtoolset7 (GCC7) on Ubuntu 16. This may lead to ABI incompatibilities with extensions built against earlier versions of TensorFlow.
- Tensorflow code now produces 2 different pip packages: tensorflow_core containing all the code (in the future it will contain only the private implementation) and tensorflow which is a virtual pip package doing forwarding to tensorflow_core (and in the future will contain only the public API of tensorflow). We don't expect this to be breaking, unless you were importing directly from the implementation.
- Removed the freeze_graph command line tool; SavedModel should be used in place of frozen graphs.
tf.contrib:
- tf.contrib has been deprecated, and functionality has been either migrated to the core TensorFlow API, to an ecosystem project such as tensorflow/addons or tensorflow/io, or removed entirely.
- Remove tf.contrib.timeseries dependency on TF distributions.
- Replace contrib references with tf.estimator.experimental.* for apis in early_stopping.py.
tf.estimator:
- Premade estimators in the tf.estimator.DNN/Linear/DNNLinearCombined family have been updated to use tf.keras.optimizers instead of the tf.compat.v1.train.Optimizers. If you do not pass in an optimizer= arg or if you use a string, the premade estimator will use the Keras optimizer. This is checkpoint breaking, as the optimizers have separate variables. A checkpoint converter tool for converting optimizers is included with the release, but if you want to avoid any change, switch to the v1 version of the estimator: tf.compat.v1.estimator.DNN/Linear/DNNLinearCombined*.
- Default aggregation for canned Estimators is now SUM_OVER_BATCH_SIZE. To maintain previous default behavior, please pass SUM as the loss aggregation method.
- Canned Estimators don’t support input_layer_partitioner arg in the API. If you have this arg, you will have to switch to tf.compat.v1 canned Estimators.
- Estimator.export_savedmodel has been renamed to export_saved_model.
- When saving to SavedModel, Estimators will strip default op attributes. This is almost always the correct behavior, as it is more forwards compatible, but if you require that default attributes to be saved with the model, please use tf.compat.v1.Estimator.
- Feature Columns have been upgraded to be more Eager-friendly and to work with Keras. As a result, tf.feature_column.input_layer has been deprecated in favor of tf.keras.layers.DenseFeatures. v1 feature columns have direct analogues in v2 except for shared_embedding_columns, which are not cross-compatible with v1 and v2. Use tf.feature_column.shared_embeddings instead.
tf.keras:
- OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
- tf.keras.model.save_model and model.save now defaults to saving a TensorFlow SavedModel. HDF5 files are still supported.
- Deprecated tf.keras.experimental.export_saved_model and tf.keras.experimental.function. Please use tf.keras.models.save_model(..., save_format='tf') and tf.keras.models.load_model instead.
- Layers now default to float32, and automatically cast their inputs to the layer's dtype. If you had a model that used float64, it will probably silently use float32 in TensorFlow 2, and a warning will be issued that starts with "Layer <layer-name> is casting an input tensor from dtype float64 to the layer's dtype of float32". To fix, either set the default dtype to float64 with tf.keras.backend.set_floatx('float64'), or pass dtype='float64' to each of the Layer constructors. See tf.keras.layers.Layer for more information.
tf.lite:
- Removed lite.OpHint, lite.experimental, and lite.constant from 2.0 API.
Tensors are no longer hashable, but instead compare element-wise with == and !=. Use tf.compat.v1.disable_tensor_equality() to return to the previous behavior.
Performing equality operations on Tensors or Variables with incompatible shapes an exception is no longer thrown. Instead __eq__ returns False and __ne__ returns True.
Removed tf.string_split from v2 API.
Deprecated the use of constraint= and .constraint with ResourceVariable.
Add UnifiedGRU as the new GRU implementation for tf2.0. Change the default recurrent activation function for GRU from hard_sigmoid to sigmoid, and reset_after to True in 2.0. Historically recurrent activation is hard_sigmoid since it is faster than 'sigmoid'. With new unified backend between CPU and GPU mode, since the CuDNN kernel is using sigmoid, we change the default for CPU mode to sigmoid as well. With that, the default GRU will be compatible with both CPU and GPU kernel. This will enable user with GPU to use CuDNN kernel by default and get a 10x performance boost in training. Note that this is checkpoint breaking change. If user want to use their 1.x pre-trained checkpoint, please construct the layer with GRU(recurrent_activation='hard_sigmoid', reset_after=False) to fallback to 1.x behavior.
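To see why the change of default recurrent activation breaks checkpoints, it helps to compare the two functions numerically. The sketch below uses plain Python and assumes the Keras definition of hard_sigmoid from the 1.x/2.0 era, clip(0.2 * x + 0.5, 0, 1); the helper names are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, the new default recurrent activation for GRU."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation: clip(0.2 * x + 0.5, 0, 1)."""
    return max(0.0, min(1.0, 0.2 * x + 0.5))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  hard_sigmoid={hard_sigmoid(x):.4f}  sigmoid={sigmoid(x):.4f}")
```

The two agree at 0 but diverge elsewhere, so weights trained against one gate function produce different gate values under the other, which is why the notes recommend constructing the layer with the 1.x settings when loading 1.x checkpoints.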
CUDNN_INSTALL_PATH, TENSORRT_INSTALL_PATH, NCCL_INSTALL_PATH, NCCL_HDR_PATH are deprecated. Use TF_CUDA_PATHS instead which supports a comma-separated list of base paths that are searched to find CUDA libraries and headers.
Refer to our public project status tracker and issues tagged with 2.0 on GitHub for insight into recent issues and development progress.
If you experience any snags when using TF 2.0, please let us know at the TF 2.0 Testing User Group. We have a support mailing list as well as weekly testing meetings, and would love to hear your migration feedback and questions.
Bug Fixes and Other Changes
tf.contrib:
- Expose tf.contrib.proto.* ops in tf.io (they will exist in TF2)
tf.data:
- Add support for TensorArrays to tf.data Dataset.
- Integrate Ragged Tensors with tf.data.
- All core and experimental tf.data transformations that input user-defined functions can span multiple devices now.
- Extending the TF 2.0 support for shuffle(..., reshuffle_each_iteration=True) and cache() to work across different Python iterators for the same dataset.
- Removing the experimental_numa_aware option from tf.data.Options.
- Add num_parallel_reads and passing in a Dataset containing filenames into TextLineDataset and FixedLengthRecordDataset.
- Add support for defaulting the value of cycle_length argument of tf.data.Dataset.interleave to the number of schedulable CPU cores.
- Promoting tf.data.experimental.enumerate_dataset to core as tf.data.Dataset.enumerate.
- Promoting tf.data.experimental.unbatch to core as tf.data.Dataset.unbatch.
- Adds option for introducing slack in the pipeline to reduce CPU contention, via tf.data.Options().experimental_slack = True
- Added experimental support for parallel batching to batch() and padded_batch(). This functionality can be enabled through tf.data.Options().
- Support cancellation of long-running reduce.
- Now we use dataset node name as prefix instead of the op name, to identify the component correctly in metrics, for pipelines with repeated components.
- Improve the performance of datasets using from_tensors().
- Promoting unbatch from experimental to core API.
- Adding support for datasets as inputs to from_tensors and from_tensor_slices and batching and unbatching of nested datasets.
tf.distribute:
- Enable tf.distribute.experimental.MultiWorkerMirroredStrategy working in eager mode.
- Callbacks are supported in MultiWorkerMirroredStrategy.
- Disable run_eagerly and distribution strategy if there are symbolic tensors added to the model using add_metric or add_loss.
- Loss and gradients should now more reliably be correctly scaled w.r.t. the global batch size when using a tf.distribute.Strategy.
- Set default loss reduction as AUTO for improving reliability of loss scaling with distribution strategy and custom training loops. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used in distribution strategy scope, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be 'None' or 'SUM'. Using other values will raise an error.
- Support for multi-host ncclAllReduce in Distribution Strategy.
tf.estimator:
- Replace tf.contrib.estimator.add_metrics with tf.estimator.add_metrics
- Use tf.compat.v1.estimator.inputs instead of tf.estimator.inputs
- Replace contrib references with tf.estimator.experimental.* for apis in early_s in Estimator
- Canned Estimators will now use keras optimizers by default. An error will be raised if tf.train.Optimizers are used, and you will have to switch to tf.keras.optimizers or tf.compat.v1 canned Estimators.
- A checkpoint converter for canned Estimators has been provided to transition canned Estimators that are warm started from tf.train.Optimizers to tf.keras.optimizers.
- Losses are scaled in canned estimator v2 and not in the optimizers anymore. If you are using Estimator + distribution strategy + optimizer v1 then the behavior does not change. This implies that if you are using custom estimator with optimizer v2, you have to scale losses. We have new utilities to help scale losses tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.
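The reason losses must be scaled by the global batch size can be shown with a little arithmetic. The sketch below is plain Python and does not call TensorFlow: the local function `compute_average_loss` is a stand-in that mimics what tf.nn.compute_average_loss is documented to do without sample weights (sum of per-example losses divided by the global batch size), and the loss values are made up.

```python
def compute_average_loss(per_example_loss, global_batch_size):
    """Pure-Python stand-in: sum of per-example losses / GLOBAL batch size."""
    return sum(per_example_loss) / global_batch_size


# A global batch of 4 examples split across 2 replicas (values are made up).
replica_losses = [[1.0, 3.0], [2.0, 6.0]]
global_batch_size = 4

# Each replica scales by the global batch size; the replicas are then summed.
scaled = [compute_average_loss(losses, global_batch_size) for losses in replica_losses]
total = sum(scaled)

# Dividing by the per-replica batch size instead would overcount.
naive = sum(sum(losses) / len(losses) for losses in replica_losses)

print(total, naive)
```

Because distribution strategies sum gradients across replicas, only the globally scaled version reproduces the mean loss over the whole batch; the naive per-replica mean grows with the number of replicas.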
tf.keras:
- Premade models (including Linear and WideDeep) have been introduced for the purpose of replacing Premade estimators.
- Model saving changes
- model.save and tf.saved_model.save may now save to the TensorFlow SavedModel format. The model can be restored using tf.keras.models.load_model. HDF5 files are still supported, and may be used by specifying save_format="h5" when saving.
- Raw TensorFlow functions can now be used in conjunction with the Keras Functional API during model creation. This obviates the need for users to create Lambda layers in most cases when using the Functional API. Like Lambda layers, TensorFlow functions that result in Variable creation or assign ops are not supported.
- Add support for passing list of lists to the metrics argument in Keras compile.
- Add tf.keras.layers.AbstractRNNCell as the preferred implementation for RNN cells in TF v2. User can use it to implement RNN cells with custom behavior.
- Keras training and validation curves are shown on the same plot when using the TensorBoard callback.
- Switched Keras fit/evaluate/predict execution to use only a single unified path by default unless eager execution has been explicitly disabled, regardless of input type. This unified path places an eager-friendly training step inside of a tf.function. With this
  - All input types are converted to Dataset.
  - The path assumes there is always a distribution strategy. when distribution strategy is not specified the path uses a no-op distribution strategy.
  - The training step is wrapped in tf.function unless run_eagerly=True is set in compile. The single path execution code does not yet support all use cases. We fallback to the existing v1 execution paths if your model contains the following:
    - sample_weight_mode in compile
    - weighted_metrics in compile
    - v1 optimizer
    - target tensors in compile
  If you are experiencing any issues because of this change, please inform us (file an issue) about your use case and you can unblock yourself by setting experimental_run_tf_function=False in compile meanwhile. We have seen couple of use cases where the model usage pattern is not as expected and would not work with this change.
  - output tensors of one layer is used in the constructor of another.
  - symbolic tensors outside the scope of the model are used in custom loss functions. The flag can be disabled for these cases and ideally the usage pattern will need to be fixed.
- Mark Keras set_session as compat.v1 only.
- tf.keras.estimator.model_to_estimator now supports exporting to tf.train.Checkpoint format, which allows the saved checkpoints to be compatible with model.load_weights.
- keras.backend.resize_images (and consequently, keras.layers.Upsampling2D) behavior has changed, a bug in the resizing implementation was fixed.
- Add an implementation=3 mode for tf.keras.layers.LocallyConnected2D and tf.keras.layers.LocallyConnected1D layers using tf.SparseTensor to store weights, allowing a dramatic speedup for large sparse models.
- Raise error if batch_size argument is used when input is dataset/generator/keras sequence.
- Update TF 2.0 keras.backend.name_scope to use TF 2.0 name_scope.
- Add v2 module aliases for losses, metrics, initializers and optimizers: tf.losses = tf.keras.losses & tf.metrics = tf.keras.metrics & tf.initializers = tf.keras.initializers & tf.optimizers = tf.keras.optimizers.
- Updates binary cross entropy logic in Keras when input is probabilities. Instead of converting probabilities to logits, we are using the cross entropy formula for probabilities.
- Added public APIs for cumsum and cumprod keras backend functions.
- Add support for temporal sample weight mode in subclassed models.
- Raise ValueError if an integer is passed to the training APIs.
- Added fault-tolerance support for training Keras model via model.fit() with MultiWorkerMirroredStrategy, tutorial available.
- Custom Callback tutorial is now available.
- To train with tf.distribute, Keras API is recommended over estimator.
- steps_per_epoch and steps arguments are supported with numpy arrays.
- New error message when unexpected keys are used in sample_weight/class_weight dictionaries
- Losses are scaled in Keras compile/fit and not in the optimizers anymore. If you are using custom training loop, we have new utilities to help scale losses tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.
- Layer apply and add_variable APIs are deprecated.
- Added support for channels first data format in cross entropy losses with logits and support for tensors with unknown ranks.
- Error messages will be raised if add_update, add_metric, add_loss, activity regularizers are used inside of a control flow branch.
- New loss reduction types:
  - AUTO: Indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be SUM or NONE. Using AUTO in that case will raise an error.
  - NONE: Weighted losses with one dimension reduced (axis=-1, or axis specified by loss function). When this reduction type used with built-in Keras training loops like fit/evaluate, the unreduced vector loss is passed to the optimizer but the reported loss will be a scalar value.
  - SUM: Scalar sum of weighted losses.
  - SUM_OVER_BATCH_SIZE: Scalar SUM divided by number of elements in losses. This reduction type is not supported when used with tf.distribute.Strategy outside of built-in training loops like tf.keras compile/fit.
- Wraps losses passed to the compile API (strings and v1 losses) which are not instances of v2 Loss class in LossWrapper class. => All losses will now use SUM_OVER_BATCH_SIZE reduction as default.
- model.add_loss(symbolic_tensor) should work in ambient eager.
- Update metric name to always reflect what the user has given in compile. Affects following cases
  - When name is given as 'accuracy'/'crossentropy'
  - When an aliased function name is used eg. 'mse'
  - Removing the weighted prefix from weighted metric names.
- Allow non-Tensors through v2 losses.
- Add v2 sparse categorical crossentropy metric.
- Add v2 APIs for AUCCurve and AUCSummationMethod enums.
- add_update can now be passed a zero-arg callable in order to support turning off the update when setting trainable=False on a Layer of a Model compiled with run_eagerly=True.
- Standardize the LayerNormalization API by replacing the args norm_axis and params_axis with axis.
- Fixed critical bugs that help with DenseFeatures usability in TF2
tf.lite:
- Added evaluation script for COCO minival
- Add delegate support for QUANTIZE.
- Add GATHER support to NN API delegate.
- Added support for TFLiteConverter Python API in 2.0. Contains functions from_saved_model, from_keras_file, and from_concrete_functions.
- Add EXPAND_DIMS support to NN API delegate TEST.
- Add narrow_range attribute to QuantizeAndDequantizeV2 and V3.
- Added support for tflite_convert command line tool in 2.0.
- Post-training quantization tool supports quantizing weights shared by multiple operations. The models made with versions of this tool will use INT8 types for weights and will only be executable interpreters from this version onwards.
- Post-training quantization tool supports fp16 weights and GPU delegate acceleration for fp16.
- Add delegate support for QUANTIZED_16BIT_LSTM.
- Extracts NNAPIDelegateKernel from nnapi_delegate.cc
TensorRT
- Add TensorFlow 2.0-compatible TrtGraphConverterV2 API for TensorRT conversion. TensorRT initialization arguments are now passed wrapped in a named-tuple, TrtConversionParams, rather than as separate arguments as in TrtGraphConverter.
- Changed API to optimize TensorRT engines during graph optimization. This is now done by calling converter.build() where previously is_dynamic_op=False would be set.
- converter.convert() no longer returns a tf.function. Now the function must be accessed from the saved model.
- The converter.calibrate() method has been removed. To trigger calibration, a calibration_input_fn should be provided to converter.convert().
Other:
- Fix accidental quadratic graph construction cost in graph-mode tf.gradients().
- ResourceVariable's gather op supports batch dimensions.
- ResourceVariable support for gather_nd.
- ResourceVariable and Variable no longer accepts constraint in the constructor, nor expose it as a @property.
- Added gradient for SparseToDense op.
- Expose a flag that allows the number of threads to vary across Python benchmarks.
- image.resize in 2.0 now supports gradients for the new resize kernels.
- image.resize now considers proper pixel centers and has new kernels (incl. anti-aliasing).
- Renamed tf.image functions to remove duplicate "image" where it is redundant.
- Variadic reduce is supported on CPU
- Remove unused StringViewVariantWrapper.
- Delete unused Fingerprint64Map op registration
- Add broadcasting support to tf.matmul.
- Add C++ Gradient for BatchMatMulV2.
- Add tf.math.cumulative_logsumexp operation.
- Add ellipsis (...) support for tf.einsum().
- Add expand_composites argument to all nest.* methods.
- Added strings.byte_split.
- Add a new "result_type" parameter to tf.strings.split.
- Add name argument to tf.string_split and tf.strings_split.
- Extend tf.strings.split to support inputs with any rank.
- Added tf.random.binomial.
- Added key and skip methods to random.experimental.Generator.
- Extend tf.function with basic support for CompositeTensors arguments (such as SparseTensor and RaggedTensor).
- parallel_for.pfor: add converters for Softmax, LogSoftmax, IsNaN, All, Any, and MatrixSetDiag.
- parallel_for: add converters for LowerTriangularSolve and Cholesky.
- parallel_for: add converters for LogMatrixDeterminant and MatrixBandPart.
- parallel_for: Add converter for MatrixDiag.
- parallel_for: Add converters for OneHot, LowerBound, UpperBound.
- parallel_for: add converter for BroadcastTo.
- Add pfor converter for Squeeze.
- Add RaggedTensor.placeholder().
- Add ragged tensor support to tf.squeeze.
- Update RaggedTensors to support int32 row_splits.
- Allow LinearOperator.solve to take a LinearOperator.
- Allow all dtypes for LinearOperatorCirculant.
- Introduce MaxParallelism method
- Add LinearOperatorHouseholder.
- Adds Philox support to new stateful RNG's XLA path.
- Added TensorSpec support for CompositeTensors.
- Added tf.linalg.tridiagonal_solve op.
- Added partial_pivoting input parameter to tf.linalg.tridiagonal_solve.
- Added gradient to tf.linalg.tridiagonal_solve.
- Added tf.linalg.tridiagonal_mul op.
- Added GPU implementation of tf.linalg.tridiagonal_matmul.
- Added LinearOperatorToeplitz.
- Upgraded LIBXSMM to version 1.11.
- Uniform processing of quantized embeddings by Gather and EmbeddingLookup Ops.
- Correct a misstatement in the documentation of the sparse softmax cross entropy logit parameter.
- Add tf.ragged.boolean_mask.
- tf.switch_case added, which selects a branch_fn based on a branch_index.
- The C++ kernel of gather op supports batch dimensions.
- Fixed default value and documentation for trainable arg of tf.Variable.
- EagerTensor now supports numpy buffer interface for tensors.
- This change bumps the version number of the FullyConnected Op to 5.
- Added new op: tf.strings.unsorted_segment_join.
- Added HW acceleration support for topK_v2.
- CloudBigtable version updated to v0.10.0.
- Expose Head as public API.
- Added tf.sparse.from_dense utility function.
- Improved ragged tensor support in TensorFlowTestCase.
- Added a function nested_value_rowids for ragged tensors.
- Added tf.ragged.stack.
- Makes the a-normal form transformation in Pyct configurable as to which nodes are converted to variables and which are not.
- ResizeInputTensor now works for all delegates.
- tf.cond emits a StatelessIf op if the branch functions are stateless and do not touch any resources.
- Add support of local soft device placement for eager op.
- Pass partial_pivoting to the _TridiagonalSolveGrad.
- Add HW acceleration support for LogSoftMax.
- Add guard to avoid acceleration of L2 Normalization with input rank != 4
- Fix memory allocation problem when calling AddNewInputConstantTensor.
- Delegate application failure leaves interpreter in valid state
- tf.while_loop emits a StatelessWhile op if the cond and body functions are stateless and do not touch any resources.
- tf.cond, tf.while and if and while in AutoGraph now accept a nonscalar predicate if has a single element. This does not affect non-V2 control flow.
- Fix potential security vulnerability where decoding variant tensors from proto could result in heap out of bounds memory access.
- Only create a GCS directory object if the object does not already exist.
- Introduce dynamic constructor argument in Layer and Model, which should be set to True when using imperative control flow in the call method.
- Begin adding Go wrapper for C Eager API.
- XLA HLO graphs can be inspected with interactive_graphviz tool now.
- Add dataset ops to the graph (or create kernels in Eager execution) during the python Dataset object creation instead doing it during Iterator creation time.
- Add batch_dims argument to tf.gather.
- The behavior of tf.gather is now correct when axis=None and batch_dims<0.
- Update docstring for gather to properly describe the non-empty batch_dims case.
- Removing of dtype in the constructor of initializers and partition_info in call.
- Add tf.math.nextafter op.
- Turn on MKL-DNN contraction kernels by default. MKL-DNN dynamically dispatches the best kernel implementation based on CPU vector architecture. To disable them, build with --define=tensorflow_mkldnn_contraction_kernel=0.
- tf.linspace(start, stop, num) now always uses "stop" as last value (for num > 1)
- Added top-k to precision and recall to keras metrics.
- Add a ragged size op and register it to the op dispatcher
- Transitive dependencies on :pooling_ops were removed. Some users may need to add explicit dependencies on :pooling_ops if they reference the operators from that library.
- Add CompositeTensor base class.
- Malformed gif images could result in an access out of bounds in the color palette of the frame. This has been fixed now
- Add templates and interfaces for creating lookup tables
- Tensor::UnsafeCopyFromInternal deprecated in favor Tensor::BitcastFrom.
- In map_vectorization optimization, reduce the degree of parallelism in the vectorized map node.
- Add variant wrapper for absl::string_view.
- Add OpKernels for some stateless maps.
- DType is no longer convertible to an int. Use dtype.as_datatype_enum instead of int(dtype) to get the same result.
- Support both binary and -1/1 label input in v2 hinge and squared hinge losses.
- Added LinearOperator.adjoint and LinearOperator.H (alias).
- Expose CriticalSection in core as tf.CriticalSection.
- Enhanced graphviz output.
- Add opkernel templates for common table operations.
- Fix callbacks do not log values in eager mode when a deferred build model is used.
- SignatureDef util functions have been deprecated.
- Update Fingerprint64Map to use aliases
- Add legacy string flat hash map op kernels.
- Add support for add_metric in the graph function mode.
- Updating cosine similarity loss - removed the negate sign from cosine similarity.
- Changed default for gradient accumulation for TPU embeddings to true.
- Adds summary trace API for collecting graph and profile information.
- The precision_mode argument to TrtGraphConverter is now case insensitive.