There are 2 articles about TensorFlow for 2019-10-01.

Posted: 2019-10-01T23:28:48+09:00

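Aiming to Become F-ZERO Champion with Reinforcement Learning (Part 1)

What I've built (work in progress)

A deep learning model that segments F-ZERO courses is running on a Jetson Nano. It processes 480x288 frames at 30 FPS.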

Realtime "F-ZERO" course segmentation model is running on Jetson Nano. It is part of "gaming AI making challenge" as my personal work.#jetson #nvidia pic.twitter.com/AYCqE75JbG
— nobu_e753 (@nobu_e753) September 27, 2019

Here is a captured version (the capture process itself adds load, so the frame rate drops).

Realtime "F-ZERO" course segmentation model is running on Jetson Nano (captured movie)#jetson #nvidia pic.twitter.com/gnn79ys9Oa
— nobu_e753 (@nobu_e753) October 1, 2019

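Introduction

When I hear "reinforcement learning", the first thing that comes to mind is automatic game playing. I wanted to give it a try, but when I looked at papers and existing implementations, most of them seemed to use games I was not familiar with. ROM availability is probably a big factor.

So, if I was going to try this partly as a learning exercise anyway, I wanted to do it with a game I had actually played and liked, and I started from the question of how to build a system that makes that possible.

This article is part 1 and covers everything up to building the system.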

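1. Choosing the game

I have been hacking on fighting games for a while, so I considered one again, but with two characters on screen and a lot of freedom of movement, it looked very hard. I did not want a hard problem for my first attempt, so I set these criteria:

Racing or shooting (I had heard that games that only move in one direction are easier to analyze)

Simple background, with numbers and other values that are easy to read off

A game I have played myself (the most important point, to stay motivated!)

Based on these I settled on F-ZERO (the original, 1990, Super Famicom). I remember playing it a fair amount: rocket starts, shortcuts via jump plates, and so on.

Note that there are several prior examples of reinforcement learning x F-ZERO. A paper with "F-ZERO" in its title (Racing F-Zero with Imitation Learning, 2017) and the right-on-target tutorial video "FZERO AI - PPO2 and A2C - OpenAI and Stable-baslines" are fun just to look at.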

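2. Overall architecture

The concept this time is "reinforcement learning with a game I like", so I went with a setup that incorporates an actual game console. And since I was doing it anyway, I assigned a Jetson Nano as the computer. Training aside, if inference works, a "Jetson Nano only AI gaming tournament" might not be just a dream (I would love to hold one). Apparently there was a competition for Sonic, by the way.

Equipment used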

Jetson Nano (ML computer)

Retro freak (Game console)

ezcap 261 (Video capture)

GameControllerizer (Game control)

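For the ML computer I originally planned to use an Edge TPU, but the quantization procedure was too much of a hassle, so I passed. For video capture I chose the ezcap 261. It is explicitly documented as supporting UVC (USB Video Class) and Linux, and it is cheap.
Just connect the capture unit to the Jetson Nano and call cv2.VideoCapture(), and game frames come streaming in.
import cv2

cap = cv2.VideoCapture(-1)               # -1 asks OpenCV to pick any available capture device; pass an index such as 0 to choose one
cap.set(cv2.CAP_PROP_FPS, 30)            # request 30 FPS from the capture device
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)  # request a frame width of 1280
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)  # request a frame height of 720

ret, frame = cap.read()  # grab one frame; ret is False if the read failed
Using a real console may mean that "training never finishes", but I will deal with that if and when it happens.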

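3. Extracting information

The starting point is extracting information from the game screen. Unlike with Gym Retro, a common framework for game reinforcement learning, this is the first wall you hit when you build the system from scratch. I figured the information needed was roughly the following:

Power

Speed

Time (option)

Course (option)

I first noted down where each piece of information sits on the screen, then tackled them one by one.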

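3.1 Power

At first I tried to read it from the "length of the red part", but when the Power gauge drops below 25% it blinks, alternating red ⇔ gray, so that did not work. Later, looking closely at the gauge, I noticed a black line on its right side, so I wrote a detection routine that finds the position of the pixel with the lowest luminance along the X direction.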
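The darkest-pixel search described above can be sketched roughly as follows. This is only an illustration in plain Python, not the author's routine: the function names, the column averaging, and the linear mapping from marker position to a 0.0-1.0 power ratio are all assumptions, and the input is a synthetic grayscale crop rather than a real game frame.

```python
def gauge_marker_x(gauge_rows):
    """Return the x index of the darkest column in a grayscale gauge crop.

    gauge_rows: list of rows, each a list of luminance values (0-255).
    """
    width = len(gauge_rows[0])
    # Average each column over all rows so a single noisy pixel cannot win.
    column_means = [
        sum(row[x] for row in gauge_rows) / len(gauge_rows)
        for x in range(width)
    ]
    # Assumption: the black line is the darkest column of the crop.
    return min(range(width), key=column_means.__getitem__)


def power_ratio(gauge_rows):
    """Map the marker position to 0.0-1.0 (hypothetical linear mapping)."""
    width = len(gauge_rows[0])
    return gauge_marker_x(gauge_rows) / (width - 1)


# Synthetic 3x8 crop: bright gauge (200), black line at x=5, gray rest (90).
crop = [[200, 200, 200, 200, 200, 0, 90, 90] for _ in range(3)]
marker = gauge_marker_x(crop)
ratio = power_ratio(crop)
```

Because the search keys on the dark line rather than on the red fill, it should be unaffected by the red/gray blinking. On real frames the same idea would more likely be written with numpy on a cropped OpenCV image.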

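3.2 Speed

Here too I first thought some quick template matching (SSD, SAD) would do the job, but it did not. The digits are slightly transparent, so the background bleeds through and accuracy was terrible.

So I gave in and classified the digits with a small CNN. With 28x24 inputs and 10 classes, the task is roughly equivalent to MNIST.
Model definition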
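from tensorflow import keras as k  # assumed import: the original snippet uses `k` without showing where it comes from

DIGIT_SIZE = [24, 28, 1]

def __build_model(lr=1e-3):
    # Functional API: one small conv layer, pooling, then a 10-way softmax
    i = k.layers.Input(DIGIT_SIZE, name="input_0")
    x = k.layers.Conv2D(8, (3,3), activation="relu", padding="valid")(i)
    x = k.layers.MaxPool2D()(x)
    x = k.layers.Flatten()(x)
    x = k.layers.Dense(10, activation="softmax", name="output")(x)
    model = k.models.Model(inputs=i, outputs=x)
    model.summary()
    # 10 mutually exclusive classes with a softmax output, so categorical
    # (not binary) cross-entropy; this assumes the generator yields one-hot labels
    model.compile(loss='categorical_crossentropy',
                  optimizer=k.optimizers.Adam(lr=lr, decay=1e-5),
                  metrics=['accuracy'])
    return model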
The training data was cropped from game screens and sorted by hand: 685 images in total.

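The training part follows tf.keras conventions as-is. One thing to note: the number of collected samples was quite imbalanced across classes, so to soften the gap I pass per-class weights via labels_weight. Perhaps because the problem is easy, accuracy quickly reached 1.0.
Training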
# Found 685 images belonging to 10 classes.
# Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55, 5: 53, 7: 51, 4: 48, 8: 45, 9: 37})
model = __build_model()
model.fit_generator(
    batch_generator,
    epochs=10,
    class_weight=labels_weight,
    verbose=1)
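The article does not show how labels_weight was built. One common way to derive such weights from the class counts printed in the log is the "balanced" heuristic, total / (n_classes * count); the sketch below assumes that heuristic, and the helper name balanced_class_weights is made up for illustration.

```python
from collections import Counter

# Per-class sample counts as printed in the training log above.
counts = Counter({0: 154, 3: 99, 1: 76, 2: 67, 6: 55,
                  5: 53, 7: 51, 4: 48, 8: 45, 9: 37})


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n)
            for label, n in sorted(counts.items())}


# Frequent classes (digit 0) get a weight below 1, rare ones (digit 9) above 1.
labels_weight = balanced_class_weights(counts)
```

A dictionary of this shape, mapping class index to weight, is the form that the class_weight argument of Keras training methods accepts.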
Below is the result of both, shown as an overlay. It runs comfortably in real time on the Jetson Nano too.

Speed & power value extraction from F-ZERO screen. The image processing program is running on JetsonNano. For speed value extraction, very small CNN model is used.#jetson #nvidia pic.twitter.com/wVJtNCy3th
— nobu_e753 (@nobu_e753) October 1, 2019

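3.3 Time (option)

After thinking it over for a while, I decided to skip it.

[shorter Time] := [high Speed] and [Power not decreasing]

I figured Time could be reinterpreted this way. Yes, I am cutting corners (lol).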
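To make the substitution above concrete, here is a hypothetical per-frame reward built only from Speed and Power. The article does not define a reward, so everything in this sketch is an assumption: the function name, the max_speed and damage_penalty placeholders, and the convention that power is normalized to 0.0-1.0.

```python
def proxy_reward(speed, power, prev_power,
                 max_speed=500.0, damage_penalty=1.0):
    """Hypothetical reward: high speed is good, losing power is bad.

    speed: current speed reading; max_speed is a placeholder scale.
    power, prev_power: gauge values normalized to 0.0-1.0.
    """
    # Normalize and clamp the speed contribution to 0.0-1.0.
    speed_term = max(0.0, min(speed / max_speed, 1.0))
    # Only a drop in power is penalized; recovering power earns nothing.
    power_loss = max(0.0, prev_power - power)
    return speed_term - damage_penalty * power_loss


cruising = proxy_reward(speed=250, power=0.8, prev_power=0.8)
after_hit = proxy_reward(speed=250, power=0.7, prev_power=0.8)
```

Here after_hit comes out lower than cruising because the power gauge dropped between frames, which matches the intent of treating "Power not decreasing" as part of a good lap time.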

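3.4 Course (option)

In plain reinforcement learning, course information is presumably something the agent acquires on its own during training. Still, I thought that having a model that can extract the course in advance might shorten training time, so I gave it a try. I was also curious how much performance I could get when running a segmentation task on the Jetson Nano.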

Choosing a model

I wanted a fast model above all, so I looked for the fastest model on the benchmark for Cityscapes, a well-known segmentation dataset.

Fast-SCNN: Fast Semantic Segmentation Network, 2019

https://arxiv.org/abs/1902.04502

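It reportedly runs at 285 FPS on 1024 x 512 input, so it looked promising (in practice I further cut the number of channels in half and used an input size of 480 x 288). Helpfully, a TensorFlow implementation of Fast-SCNN with an explanation has also been published.

Training data

There is no way around this part, so I powered through it: I took 75 frames each from course 1 (Mute City) and course 2 (Big Blue), 150 frames in total, and created annotations (course, magnetic zones) for them. For annotation I used AnnoFab, a free labeling tool that runs in the browser. It took a little over an hour.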

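Training & inference

Ignoring the magnetic zones for now, I built a training flow that segments only the course. Training ran for 5000 steps (chosen arbitrarily) and took 18 minutes on an RTX 2080i. Checking the results on still images, the predictions are decent for a model thrown together in a hurry.
The one exception is when my car slams into the course edge and the collision effect plays: the segmentation gets sloppy there, but adding training data should cover that.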

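Not fast enough!

With the model done, I happily deployed it to the Jetson Nano and ran it on video, only to find it was not fast enough! It stuttered!! Investigating, the cause was:

Pre- and post-processing such as overlays is slow

To visualize the segmentation I was building masks and overlays as post-processing in OpenCV + numpy, and this alone took 50-60 ms. The CNN part, by contrast, took under 30 ms.

On an edge device, the CNN has a GPU accelerator behind it while the CPU is weak, which is presumably what produced this result. This is worth watching out for whenever you use an edge device, not just in this project.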

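Speeding things up (parallelization, TF-TRT)

Left as is, this would likely affect training, and more than anything it was frustrating, so I sped it up.

Parallelization

Using Python's multiprocessing.Process, I split the processing into three stages:

Preprocessing (capture, downscaling)

Segmentation

Postprocessing (overlay, display)

and made them run as a pipeline.
Checking with jtop, a tool for monitoring GPU utilization on Jetson, utilization had clearly improved.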
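A three-stage pipeline of this kind can be sketched with multiprocessing.Process and queues as below. This is a minimal illustration, not the author's code: the stage bodies are trivial stand-ins for capture/downscale, CNN inference, and overlay/display, and all names (stage_worker, run_pipeline, SENTINEL) are made up for the example.

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the frame stream


def preprocess(frame):
    """Stand-in for capture + downscale: keep every other sample."""
    return frame[::2]


def segment(frame):
    """Stand-in for the CNN: threshold each value into a 0/1 mask."""
    return [1 if v > 127 else 0 for v in frame]


def postprocess(mask):
    """Stand-in for overlay + display: count the 'course' pixels."""
    return sum(mask)


def stage_worker(func, in_queue, out_queue):
    """Apply func to items from in_queue until SENTINEL, then forward it."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            out_queue.put(SENTINEL)
            break
        out_queue.put(func(item))


def run_pipeline(frames):
    """Run the three stages as separate processes connected by queues."""
    # Bounded queues between stages give back-pressure; the last queue is
    # unbounded because this sketch only drains it after feeding all frames.
    queues = [mp.Queue(maxsize=4) for _ in range(3)] + [mp.Queue()]
    stages = [preprocess, segment, postprocess]
    procs = [
        mp.Process(target=stage_worker, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for p in procs:
        p.start()
    for frame in list(frames) + [SENTINEL]:
        queues[0].put(frame)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for p in procs:
        p.join()
    return results
```

Call run_pipeline(...) from under an `if __name__ == "__main__":` guard so it also works with the spawn start method. With this layout, stage N can work on one frame while stage N+1 handles the previous one, so the slow CPU-side overlay no longer stalls the GPU-side inference.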

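TF-TRT (TensorFlow with TensorRT optimization)

This one looks quite useful. To run a CNN model on the Jetson Nano, you can:

1. Run the TensorFlow model as is, on the TensorFlow runtime

2. Run a TensorRT-optimized model on the TensorRT runtime

Option 2 is of course more efficient, but if the model contains operations that TensorRT does not support, it cannot be optimized. For example, my model contains a ResizeBilinear operation, which TensorRT (5.X) did not support.

Reading the documentation carefully, however, there is a third option besides these two: a hybrid approach in which only the operations TensorRT supports are optimized and run on the TensorRT runtime, while the rest stay in the TensorFlow model and run on the TensorFlow runtime (official page).

It is easy to adopt: after loading the TensorFlow inference graph (.pb), you just add a few lines. Measuring only the inference part after adopting it, the time went from 24 ms to 15 ms, a big improvement.
Adopting TF-TRT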
import tensorflow as tf
# Required for TF-TRT
from tensorflow.python.compiler.tensorrt import trt_convert as trt
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.65)

...

graph = tf.Graph()
with graph.as_default():
    # pb: path to the frozen inference graph (.pb)
    with open(pb, 'rb') as model_file:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(model_file.read())

    #############################################
    # Additions for TF-TRT
    converter = trt.TrtGraphConverter(
        input_graph_def=graph_def,
        # precision_mode="FP16",  # the precision mode can also be selected
        nodes_blacklist=[OUTPUT_NODE])
    graph_def = converter.convert()
    #############################################

    tf.import_graph_def(graph_def, name="")
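These are the segmentation speeds before and after optimization. I just managed to reach 30 fps. With this, I was able to display the real-time segmentation shown at the top of this article.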

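Summary

Starting from the concept of "reinforcement learning with a game I like", I built a system that makes it possible. Along the way, trying out an edge device (Jetson Nano) as the computer taught me about its characteristics.

In particular, when aiming for real-time processing on the Jetson Nano:

The CPU is weak compared to the GPU, so watch the load of everything other than the CNN

Even when TensorRT cannot be applied, there is TF-TRT

These two points were very valuable lessons.

Now that the groundwork is in place, it is finally time for the main event: reinforcement learning. I do not know when automatic play will manage a full lap of Mute City, but I will work on it at a leisurely pace in between my day job.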

Challengers welcome!

Posted: 2019-10-01T22:46:07+09:00

[Tensorflow] Tensorflow 2

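TensorFlow 2.0 has been released, and it is fun just to look through what has changed.
Now that Keras is the default way to use it, things have become more convenient.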

Release 2.0.0

Major Features and Improvements

TensorFlow 2.0 focuses on simplicity and ease of use, featuring updates like:

Easy model building with Keras and eager execution.

Robust model deployment in production on any platform.

Powerful experimentation for research.

API simplification by reducing duplication and removing deprecated endpoints.

For details on best practices with 2.0, see the Effective 2.0 guide

For information on upgrading your existing TensorFlow 1.x models, please refer to our Upgrade and Migration guides. We have also released a collection of tutorials and getting started guides.

Highlights

TF 2.0 delivers Keras as the central high level API used to build and train models. Keras provides several model-building APIs such as Sequential, Functional, and Subclassing along with eager execution, for immediate iteration and intuitive debugging, and tf.data, for building scalable input pipelines. Check out the guide for additional details.

Distribution Strategy: TF 2.0 users will be able to use the tf.distribute.Strategy API to distribute training with minimal code changes, yielding great out-of-the-box performance. It supports distributed training with Keras model.fit, as well as with custom training loops. Multi-GPU support is available, along with experimental support for multi worker and Cloud TPUs. Check out the guide for more details.

Functions, not Sessions. The traditional declarative programming model of building a graph and executing it via a tf.Session is discouraged, and replaced by writing regular Python functions. Using the tf.function decorator, such functions can be turned into graphs which can be executed remotely, serialized, and optimized for performance.

Unification of tf.train.Optimizers and tf.keras.Optimizers. Use tf.keras.Optimizers for TF2.0. compute_gradients is removed as public API, use GradientTape to compute gradients.

AutoGraph translates Python control flow into TensorFlow expressions, allowing users to write regular Python inside tf.function-decorated functions. AutoGraph is also applied in functions used with tf.data, tf.distribute and tf.keras APIs.

Unification of exchange formats to SavedModel. All TensorFlow ecosystem projects (TensorFlow Lite, TensorFlow JS, TensorFlow Serving, TensorFlow Hub) accept SavedModels. Model state should be saved to and restored from SavedModels.

API Changes: Many API symbols have been renamed or removed, and argument names have changed. Many of these changes are motivated by consistency and clarity. The 1.x API remains available in the compat.v1 module. A list of all symbol changes can be found here.

API clean-up, including removing tf.app, tf.flags, and tf.logging in favor of absl-py.

No more global variables with helper methods like tf.global_variables_initializer and tf.get_global_step.

Add toggles tf.enable_control_flow_v2() and tf.disable_control_flow_v2() for enabling/disabling v2 control flow.

Enable v2 control flow as part of tf.enable_v2_behavior() and TF2_BEHAVIOR=1.

Fixes autocomplete for most TensorFlow API references by switching to use relative imports in API __init__.py files.

Auto Mixed-Precision graph optimizer simplifies converting models to float16 for acceleration on Volta and Turing Tensor Cores. This feature can be enabled by wrapping an optimizer class with tf.train.experimental.enable_mixed_precision_graph_rewrite().

Add environment variable TF_CUDNN_DETERMINISTIC. Setting to TRUE or "1" forces the selection of deterministic cuDNN convolution and max-pooling algorithms. When this is enabled, the algorithm selection procedure itself is also deterministic.

Breaking Changes

Many backwards incompatible API changes have been made to clean up the APIs and make them more consistent.

Toolchains:

TensorFlow 1.15 is built using devtoolset7 (GCC7) on Ubuntu 16. This may lead to ABI incompatibilities with extensions built against earlier versions of TensorFlow.

Tensorflow code now produces 2 different pip packages: tensorflow_core containing all the code (in the future it will contain only the private implementation) and tensorflow which is a virtual pip package doing forwarding to tensorflow_core (and in the future will contain only the public API of tensorflow). We don't expect this to be breaking, unless you were importing directly from the implementation.

Removed the freeze_graph command line tool; SavedModel should be used in place of frozen graphs.

tf.contrib:

tf.contrib has been deprecated, and functionality has been either migrated to the core TensorFlow API, to an ecosystem project such as tensorflow/addons or tensorflow/io, or removed entirely.

Remove tf.contrib.timeseries dependency on TF distributions.

Replace contrib references with tf.estimator.experimental.* for apis in early_stopping.py.

tf.estimator:

Premade estimators in the tf.estimator.DNN/Linear/DNNLinearCombined family have been updated to use tf.keras.optimizers instead of the tf.compat.v1.train.Optimizers. If you do not pass in an optimizer= arg or if you use a string, the premade estimator will use the Keras optimizer. This is checkpoint breaking, as the optimizers have separate variables. A checkpoint converter tool for converting optimizers is included with the release, but if you want to avoid any change, switch to the v1 version of the estimator: tf.compat.v1.estimator.DNN/Linear/DNNLinearCombined*.

Default aggregation for canned Estimators is now SUM_OVER_BATCH_SIZE. To maintain previous default behavior, please pass SUM as the loss aggregation method.

Canned Estimators don’t support input_layer_partitioner arg in the API. If you have this arg, you will have to switch to tf.compat.v1 canned Estimators.

Estimator.export_savedmodel has been renamed to export_saved_model.

When saving to SavedModel, Estimators will strip default op attributes. This is almost always the correct behavior, as it is more forwards compatible, but if you require default attributes to be saved with the model, please use tf.compat.v1.Estimator.

Feature Columns have been upgraded to be more Eager-friendly and to work with Keras. As a result, tf.feature_column.input_layer has been deprecated in favor of tf.keras.layers.DenseFeatures. v1 feature columns have direct analogues in v2 except for shared_embedding_columns, which are not cross-compatible with v1 and v2. Use tf.feature_column.shared_embeddings instead.

tf.keras:

OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.

tf.keras.models.save_model and model.save now default to saving a TensorFlow SavedModel. HDF5 files are still supported.

Deprecated tf.keras.experimental.export_saved_model and tf.keras.experimental.function. Please use tf.keras.models.save_model(..., save_format='tf') and tf.keras.models.load_model instead.

Layers now default to float32, and automatically cast their inputs to the layer's dtype. If you had a model that used float64, it will probably silently use float32 in TensorFlow 2, and a warning will be issued that starts with Layer <layer-name> is casting an input tensor from dtype float64 to the layer's dtype of float32. To fix, either set the default dtype to float64 with tf.keras.backend.set_floatx('float64'), or pass dtype='float64' to each of the Layer constructors. See tf.keras.layers.Layer for more information.

tf.lite:

Removed lite.OpHint, lite.experimental, and lite.constant from 2.0 API.

Tensors are no longer hashable, but instead compare element-wise with == and !=. Use tf.compat.v1.disable_tensor_equality() to return to the previous behavior.

Performing equality operations on Tensors or Variables with incompatible shapes no longer throws an exception. Instead __eq__ returns False and __ne__ returns True.

Removed tf.string_split from v2 API.

Deprecated the use of constraint= and .constraint with ResourceVariable.

Add UnifiedGRU as the new GRU implementation for tf2.0. Change the default recurrent activation function for GRU from hard_sigmoid to sigmoid, and reset_after to True in 2.0. Historically the recurrent activation was hard_sigmoid since it is faster than sigmoid. With the new unified backend between CPU and GPU mode, since the CuDNN kernel uses sigmoid, the default for CPU mode is changed to sigmoid as well. With that, the default GRU will be compatible with both CPU and GPU kernels. This will enable users with a GPU to use the CuDNN kernel by default and get a 10x performance boost in training. Note that this is a checkpoint breaking change. If users want to use their 1.x pre-trained checkpoints, they should construct the layer with GRU(recurrent_activation='hard_sigmoid', reset_after=False) to fall back to 1.x behavior.

CUDNN_INSTALL_PATH, TENSORRT_INSTALL_PATH, NCCL_INSTALL_PATH, NCCL_HDR_PATH are deprecated. Use TF_CUDA_PATHS instead which supports a comma-separated list of base paths that are searched to find CUDA libraries and headers.
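As a small aside on the GRU item above: the two recurrent activations are numerically different functions, which is why weights trained with one do not carry over cleanly to the other. The plain-Python sketch below compares them, assuming the usual Keras definition of hard_sigmoid as the piecewise-linear clip(0.2 * x + 0.5, 0, 1); it is an illustration only and does not call TensorFlow.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, the new default recurrent activation."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x):
    """Piecewise-linear approximation (assumed Keras definition)."""
    return max(0.0, min(1.0, 0.2 * x + 0.5))


# Gate values produced by the same pre-activation under each function.
gaps = {x: abs(sigmoid(x) - hard_sigmoid(x)) for x in (-3.0, 0.0, 1.0, 3.0)}
```

The two agree at x = 0 but drift apart elsewhere, and hard_sigmoid saturates to exactly 0 or 1 while sigmoid never does, so the same checkpointed weights would produce different gate values.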

Refer to our public project status tracker and issues tagged with 2.0 on GitHub for insight into recent issues and development progress.

If you experience any snags when using TF 2.0, please let us know at the TF 2.0 Testing User Group. We have a support mailing list as well as weekly testing meetings, and would love to hear your migration feedback and questions.

Bug Fixes and Other Changes

tf.contrib:

Expose tf.contrib.proto.* ops in tf.io (they will exist in TF2)

tf.data:

Add support for TensorArrays to tf.data Dataset.

Integrate Ragged Tensors with tf.data.

All core and experimental tf.data transformations that input user-defined functions can span multiple devices now.

Extending the TF 2.0 support for shuffle(..., reshuffle_each_iteration=True) and cache() to work across different Python iterators for the same dataset.

Removing the experimental_numa_aware option from tf.data.Options.

Add num_parallel_reads and passing in a Dataset containing filenames into TextLineDataset and FixedLengthRecordDataset.

Add support for defaulting the value of cycle_length argument of tf.data.Dataset.interleave to the number of schedulable CPU cores.

Promoting tf.data.experimental.enumerate_dataset to core as tf.data.Dataset.enumerate.

Promoting tf.data.experimental.unbatch to core as tf.data.Dataset.unbatch.

Adds option for introducing slack in the pipeline to reduce CPU contention, via tf.data.Options().experimental_slack = True

Added experimental support for parallel batching to batch() and padded_batch(). This functionality can be enabled through tf.data.Options().

Support cancellation of long-running reduce.

Now we use dataset node name as prefix instead of the op name, to identify the component correctly in metrics, for pipelines with repeated components.

Improve the performance of datasets using from_tensors().

Promoting unbatch from experimental to core API.

Adding support for datasets as inputs to from_tensors and from_tensor_slices and batching and unbatching of nested datasets.

tf.distribute:

Enable tf.distribute.experimental.MultiWorkerMirroredStrategy working in eager mode.

Callbacks are supported in MultiWorkerMirroredStrategy.

Disable run_eagerly and distribution strategy if there are symbolic tensors added to the model using add_metric or add_loss.

Loss and gradients should now more reliably be correctly scaled w.r.t. the global batch size when using a tf.distribute.Strategy.

Set default loss reduction as AUTO for improving reliability of loss scaling with distribution strategy and custom training loops. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used in distribution strategy scope, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be 'None' or 'SUM'. Using other values will raise an error.

Support for multi-host ncclAllReduce in Distribution Strategy.
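The point about scaling the loss with respect to the global batch size (in the tf.distribute items above) comes down to simple arithmetic. The sketch below is plain Python with made-up numbers, not TensorFlow code: it shows why each replica should divide its summed loss by the global batch size rather than by its local one.

```python
# Per-example losses on 2 replicas (illustrative values).
replica_losses = [[0.5, 1.5], [2.0, 4.0]]
global_batch_size = sum(len(r) for r in replica_losses)

# Each replica divides its summed loss by the GLOBAL batch size...
scaled = [sum(r) / global_batch_size for r in replica_losses]
# ...so adding the per-replica values gives the mean over the whole batch.
total_loss = sum(scaled)

# Dividing by the local batch size instead overcounts by the replica count.
naive_total = sum(sum(r) / len(r) for r in replica_losses)
```

Here total_loss equals the mean of all four per-example losses, while naive_total is twice that because there are two replicas.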

tf.estimator:

Replace tf.contrib.estimator.add_metrics with tf.estimator.add_metrics

Use tf.compat.v1.estimator.inputs instead of tf.estimator.inputs

Replace contrib references with tf.estimator.experimental.* for apis in early_stopping.py in Estimator

Canned Estimators will now use keras optimizers by default. An error will be raised if tf.train.Optimizers are used, and you will have to switch to tf.keras.optimizers or tf.compat.v1 canned Estimators.

A checkpoint converter for canned Estimators has been provided to transition canned Estimators that are warm started from tf.train.Optimizers to tf.keras.optimizers.

Losses are scaled in canned estimator v2 and not in the optimizers anymore. If you are using Estimator + distribution strategy + optimizer v1 then the behavior does not change. This implies that if you are using custom estimator with optimizer v2, you have to scale losses. We have new utilities to help scale losses: tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.

tf.keras:

Premade models (including Linear and WideDeep) have been introduced for the purpose of replacing Premade estimators.

Model saving changes

model.save and tf.saved_model.save may now save to the TensorFlow SavedModel format. The model can be restored using tf.keras.models.load_model. HDF5 files are still supported, and may be used by specifying save_format="h5" when saving.

Raw TensorFlow functions can now be used in conjunction with the Keras Functional API during model creation. This obviates the need for users to create Lambda layers in most cases when using the Functional API. Like Lambda layers, TensorFlow functions that result in Variable creation or assign ops are not supported.

Add support for passing list of lists to the metrics argument in Keras compile.

Add tf.keras.layers.AbstractRNNCell as the preferred implementation for RNN cells in TF v2. User can use it to implement RNN cells with custom behavior.

Keras training and validation curves are shown on the same plot when using the TensorBoard callback.

Switched Keras fit/evaluate/predict execution to use only a single unified path by default unless eager execution has been explicitly disabled, regardless of input type. This unified path places an eager-friendly training step inside of a tf.function. With this change:

All input types are converted to Dataset.

The path assumes there is always a distribution strategy. when distribution strategy is not specified the path uses a no-op distribution strategy.

The training step is wrapped in tf.function unless run_eagerly=True is set in compile. The single path execution code does not yet support all use cases. We fallback to the existing v1 execution paths if your model contains the following:

sample_weight_mode in compile

weighted_metrics in compile

v1 optimizer

target tensors in compile

If you are experiencing any issues because of this change, please inform us (file an issue) about your use case and you can unblock yourself by setting experimental_run_tf_function=False in compile meanwhile. We have seen a couple of use cases where the model usage pattern is not as expected and would not work with this change.

output tensors of one layer is used in the constructor of another.

symbolic tensors outside the scope of the model are used in custom loss functions.

The flag can be disabled for these cases and ideally the usage pattern will need to be fixed.

Mark Keras set_session as compat.v1 only.

tf.keras.estimator.model_to_estimator now supports exporting to tf.train.Checkpoint format, which allows the saved checkpoints to be compatible with model.load_weights.

keras.backend.resize_images (and consequently, keras.layers.Upsampling2D) behavior has changed, a bug in the resizing implementation was fixed.

Add an implementation=3 mode for tf.keras.layers.LocallyConnected2D and tf.keras.layers.LocallyConnected1D layers using tf.SparseTensor to store weights, allowing a dramatic speedup for large sparse models.

Raise error if batch_size argument is used when input is dataset/generator/keras sequence.

Update TF 2.0 keras.backend.name_scope to use TF 2.0 name_scope.

Add v2 module aliases for losses, metrics, initializers and optimizers: tf.losses = tf.keras.losses & tf.metrics = tf.keras.metrics & tf.initializers = tf.keras.initializers & tf.optimizers = tf.keras.optimizers.

Updates binary cross entropy logic in Keras when input is probabilities. Instead of converting probabilities to logits, we are using the cross entropy formula for probabilities.

Added public APIs for cumsum and cumprod keras backend functions.

Add support for temporal sample weight mode in subclassed models.

Raise ValueError if an integer is passed to the training APIs.

Added fault-tolerance support for training Keras model via model.fit() with MultiWorkerMirroredStrategy, tutorial available.

Custom Callback tutorial is now available.

To train with tf.distribute, Keras API is recommended over estimator.

steps_per_epoch and steps arguments are supported with numpy arrays.

New error message when unexpected keys are used in sample_weight/class_weight dictionaries

Losses are scaled in Keras compile/fit and not in the optimizers anymore. If you are using custom training loop, we have new utilities to help scale losses tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.

Layer apply and add_variable APIs are deprecated.

Added support for channels first data format in cross entropy losses with logits and support for tensors with unknown ranks.

Error messages will be raised if add_update, add_metric, add_loss, activity regularizers are used inside of a control flow branch.

New loss reduction types:

AUTO: Indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be SUM or NONE. Using AUTO in that case will raise an error.

NONE: Weighted losses with one dimension reduced (axis=-1, or axis specified by loss function). When this reduction type used with built-in Keras training loops like fit/evaluate, the unreduced vector loss is passed to the optimizer but the reported loss will be a scalar value.

SUM: Scalar sum of weighted losses.

SUM_OVER_BATCH_SIZE: Scalar SUM divided by number of elements in losses. This reduction type is not supported when used with tf.distribute.Strategy outside of built-in training loops like tf.keras compile/fit.
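The reduction types listed above differ only in how a vector of per-example losses is collapsed. The following plain-Python sketch mirrors those definitions on made-up values; the function names are illustrative and this is not the TensorFlow implementation.

```python
per_example_losses = [0.5, 1.5, 2.0, 4.0]  # illustrative weighted losses


def reduce_none(losses):
    """NONE: keep one loss value per example."""
    return list(losses)


def reduce_sum(losses):
    """SUM: scalar sum of the losses."""
    return sum(losses)


def reduce_sum_over_batch_size(losses):
    """SUM_OVER_BATCH_SIZE: scalar sum divided by the number of elements."""
    return sum(losses) / len(losses)


unreduced = reduce_none(per_example_losses)
summed = reduce_sum(per_example_losses)
averaged = reduce_sum_over_batch_size(per_example_losses)
```

SUM_OVER_BATCH_SIZE is what most people think of as the "mean loss"; SUM grows with the batch size, which matters when choosing a learning rate.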

Wraps losses passed to the compile API (strings and v1 losses) which are not instances of v2 Loss class in LossWrapper class. => All losses will now use SUM_OVER_BATCH_SIZE reduction as default.

model.add_loss(symbolic_tensor) should work in ambient eager.

Update metric name to always reflect what the user has given in compile. Affects following cases

When name is given as 'accuracy'/'crossentropy'

When an aliased function name is used eg. 'mse'

Removing the weighted prefix from weighted metric names.

Allow non-Tensors through v2 losses.

Add v2 sparse categorical crossentropy metric.

Add v2 APIs for AUCCurve and AUCSummationMethod enums.

add_update can now be passed a zero-arg callable in order to support turning off the update when setting trainable=False on a Layer of a Model compiled with run_eagerly=True.

Standardize the LayerNormalization API by replacing the args norm_axis and params_axis with axis.

Fixed critical bugs that help with DenseFeatures usability in TF2

tf.lite:

Added evaluation script for COCO minival

Add delegate support for QUANTIZE.

Add GATHER support to NN API delegate.

Added support for TFLiteConverter Python API in 2.0. Contains functions from_saved_model, from_keras_file, and from_concrete_functions.

Add EXPAND_DIMS support to NN API delegate TEST.

Add narrow_range attribute to QuantizeAndDequantizeV2 and V3.

Added support for tflite_convert command line tool in 2.0.

Post-training quantization tool supports quantizing weights shared by multiple operations. The models made with versions of this tool will use INT8 types for weights and will only be executable interpreters from this version onwards.

Post-training quantization tool supports fp16 weights and GPU delegate acceleration for fp16.

Add delegate support for QUANTIZED_16BIT_LSTM.

Extracts NNAPIDelegateKernel from nnapi_delegate.cc

TensorRT

Add TensorFlow 2.0-compatible TrtGraphConverterV2 API for TensorRT conversion. TensorRT initialization arguments are now passed wrapped in a named-tuple, TrtConversionParams, rather than as separate arguments as in TrtGraphConverter.

Changed API to optimize TensorRT engines during graph optimization. This is now done by calling converter.build() where previously is_dynamic_op=False would be set.

converter.convert() no longer returns a tf.function. Now the function must be accessed from the saved model.

The converter.calibrate() method has been removed. To trigger calibration, a calibration_input_fn should be provided to converter.convert().

Other:

Fix accidental quadratic graph construction cost in graph-mode tf.gradients().

ResourceVariable's gather op supports batch dimensions.

ResourceVariable support for gather_nd.

ResourceVariable and Variable no longer accepts constraint in the constructor, nor expose it as a @property.

Added gradient for SparseToDense op.

Expose a flag that allows the number of threads to vary across Python benchmarks.

image.resize in 2.0 now supports gradients for the new resize kernels.

image.resize now considers proper pixel centers and has new kernels (incl. anti-aliasing).

Renamed tf.image functions to remove duplicate "image" where it is redundant.

Variadic reduce is supported on CPU

Remove unused StringViewVariantWrapper.

Delete unused Fingerprint64Map op registration

Add broadcasting support to tf.matmul.

Add C++ Gradient for BatchMatMulV2.

Add tf.math.cumulative_logsumexp operation.

Add ellipsis (...) support for tf.einsum().

Add expand_composites argument to all nest.* methods.

Added strings.byte_split.

Add a new "result_type" parameter to tf.strings.split.

Add name argument to tf.string_split and tf.strings_split.

Extend tf.strings.split to support inputs with any rank.

Added tf.random.binomial.

Added key and skip methods to random.experimental.Generator.

Extend tf.function with basic support for CompositeTensors arguments (such as SparseTensor and RaggedTensor).

parallel_for.pfor: add converters for Softmax, LogSoftmax, IsNaN, All, Any, and MatrixSetDiag.

parallel_for: add converters for LowerTriangularSolve and Cholesky.

parallel_for: add converters for LogMatrixDeterminant and MatrixBandPart.

parallel_for: Add converter for MatrixDiag.

parallel_for: Add converters for OneHot, LowerBound, UpperBound.

parallel_for: add converter for BroadcastTo.

Add pfor converter for Squeeze.

Add RaggedTensor.placeholder().

Add ragged tensor support to tf.squeeze.

Update RaggedTensors to support int32 row_splits.

Allow LinearOperator.solve to take a LinearOperator.

Allow all dtypes for LinearOperatorCirculant.

Introduce MaxParallelism method

Add LinearOperatorHouseholder.

Adds Philox support to new stateful RNG's XLA path.

Added TensorSpec support for CompositeTensors.

Added tf.linalg.tridiagonal_solve op.

Added partial_pivoting input parameter to tf.linalg.tridiagonal_solve.

Added gradient to tf.linalg.tridiagonal_solve.

Added tf.linalg.tridiagonal_mul op.

Added GPU implementation of tf.linalg.tridiagonal_matmul.

Added LinearOperatorToeplitz.

Upgraded LIBXSMM to version 1.11.

Uniform processing of quantized embeddings by Gather and EmbeddingLookup Ops.

Correct a misstatement in the documentation of the sparse softmax cross entropy logit parameter.

Add tf.ragged.boolean_mask.

tf.switch_case added, which selects a branch_fn based on a branch_index.

The C++ kernel of gather op supports batch dimensions.

Fixed default value and documentation for trainable arg of tf.Variable.

EagerTensor now supports numpy buffer interface for tensors.

This change bumps the version number of the FullyConnected Op to 5.

Added new op: tf.strings.unsorted_segment_join.

Added HW acceleration support for topK_v2.

CloudBigtable version updated to v0.10.0.

Expose Head as public API.

Added tf.sparse.from_dense utility function.

Improved ragged tensor support in TensorFlowTestCase.

Added a function nested_value_rowids for ragged tensors.

Added tf.ragged.stack.

Makes the a-normal form transformation in Pyct configurable as to which nodes are converted to variables and which are not.

ResizeInputTensor now works for all delegates.

tf.cond emits a StatelessIf op if the branch functions are stateless and do not touch any resources.

Add support for local soft device placement for eager ops.

Pass partial_pivoting to the _TridiagonalSolveGrad.

Add HW acceleration support for LogSoftMax.

Add guard to avoid acceleration of L2 Normalization with input rank != 4.

Fix memory allocation problem when calling AddNewInputConstantTensor.

Delegate application failure leaves interpreter in valid state.

tf.while_loop emits a StatelessWhile op if the cond and body functions are stateless and do not touch any resources.

tf.cond, tf.while_loop, and if and while statements in AutoGraph now accept a nonscalar predicate if it has a single element. This does not affect non-V2 control flow.

Fix potential security vulnerability where decoding variant tensors from proto could result in heap out of bounds memory access.

Only create a GCS directory object if the object does not already exist.

Introduce dynamic constructor argument in Layer and Model, which should be set to True when using imperative control flow in the call method.

Begin adding Go wrapper for C Eager API.

XLA HLO graphs can be inspected with interactive_graphviz tool now.

Add dataset ops to the graph (or create kernels in Eager execution) during Python Dataset object creation instead of doing it at Iterator creation time.

Add batch_dims argument to tf.gather.

The behavior of tf.gather is now correct when axis=None and batch_dims<0.

Update docstring for gather to properly describe the non-empty batch_dims case.
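The difference a batch dimension makes can be sketched with nested lists. In the model below, gather_rows picks whole rows along the first axis, while gather_batched corresponds roughly to gathering with one batch dimension, where each row of indices selects from the matching row of params. Both helper names are hypothetical and the code is plain Python, not TensorFlow.

```python
def gather_rows(params, indices):
    """No batch dimensions: out[k] = params[indices[k]]."""
    return [params[i] for i in indices]


def gather_batched(params, indices):
    """One batch dimension: out[b][m] = params[b][indices[b][m]]."""
    if len(params) != len(indices):
        raise ValueError("params and indices must share the batch dimension")
    return [[row[i] for i in row_indices]
            for row, row_indices in zip(params, indices)]


params = [[10, 11, 12], [20, 21, 22]]
print(gather_rows(params, [1, 0, 1]))
# [[20, 21, 22], [10, 11, 12], [20, 21, 22]]
print(gather_batched(params, [[2, 0], [1, 1]]))
# [[12, 10], [21, 21]]
```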

Removed dtype from the constructor of initializers and partition_info from call.

Add tf.math.nextafter op.
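A nextafter operation returns the next representable floating-point value after one number in the direction of another. The sketch below shows the idea with Python's own math.nextafter (available from Python 3.9) on double-precision scalars; it is only an analogy for the element-wise TensorFlow op, whose results depend on the tensor's dtype.

```python
import math
import sys

x = 1.0
up = math.nextafter(x, 2.0)    # smallest double greater than 1.0
down = math.nextafter(x, 0.0)  # largest double less than 1.0

print(up - x == sys.float_info.epsilon)  # True
print(down < x < up)                     # True
print(math.nextafter(x, x) == x)         # True: no movement when both arguments are equal
```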

Turn on MKL-DNN contraction kernels by default. MKL-DNN dynamically dispatches the best kernel implementation based on CPU vector architecture. To disable them, build with --define=tensorflow_mkldnn_contraction_kernel=0.

tf.linspace(start, stop, num) now always uses "stop" as the last value (for num > 1).
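One way to see why this matters: computing every point as start + i * step can leave the final point slightly off from stop because of floating-point rounding. The plain-Python sketch below pins the endpoint explicitly. It is an illustration of the guarantee, under the assumption that a single-point request returns start; it is not a description of how TensorFlow's kernel is written.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values whose last value is exactly stop (num > 1)."""
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    values = [start + i * step for i in range(num - 1)]
    values.append(float(stop))  # pin the endpoint instead of computing it
    return values


print(linspace(0.0, 1.0, 5))       # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linspace(0.0, 0.3, 4)[-1] == 0.3)  # True
```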

Added top-k to the precision and recall Keras metrics.

Add a ragged size op and register it to the op dispatcher.

Transitive dependencies on :pooling_ops were removed. Some users may need to add explicit dependencies on :pooling_ops if they reference the operators from that library.

Add CompositeTensor base class.

Malformed GIF images could result in an out-of-bounds access in the color palette of the frame. This has now been fixed.

Add templates and interfaces for creating lookup tables.

Tensor::UnsafeCopyFromInternal deprecated in favor of Tensor::BitcastFrom.

In map_vectorization optimization, reduce the degree of parallelism in the vectorized map node.

Add variant wrapper for absl::string_view.

Add OpKernels for some stateless maps.

DType is no longer convertible to an int. Use dtype.as_datatype_enum instead of int(dtype) to get the same result.

Support both binary and -1/1 label input in v2 hinge and squared hinge losses.
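Supporting both label conventions amounts to mapping 0/1 labels to -1/1 before applying the usual hinge formula. The plain-Python sketch below illustrates that idea on flat lists; the helper names are hypothetical and this is a model of the behavior, not the Keras source.

```python
def _to_signed_labels(y_true):
    """Map 0/1 labels to -1/1; leave labels already in -1/1 form untouched."""
    if all(label in (0, 1) for label in y_true):
        return [2 * label - 1 for label in y_true]
    return list(y_true)


def hinge(y_true, y_pred):
    """Mean of max(1 - y_true * y_pred, 0) over all samples."""
    signed = _to_signed_labels(y_true)
    return sum(max(1 - t * p, 0) for t, p in zip(signed, y_pred)) / len(signed)


def squared_hinge(y_true, y_pred):
    """Mean of max(1 - y_true * y_pred, 0) squared over all samples."""
    signed = _to_signed_labels(y_true)
    return sum(max(1 - t * p, 0) ** 2 for t, p in zip(signed, y_pred)) / len(signed)


preds = [0.5, -0.5, 2.0, -2.0]
print(hinge([1, 0, 1, 0], preds))           # 0.25
print(hinge([1, -1, 1, -1], preds))         # 0.25
print(squared_hinge([1, 0, 1, 0], preds))   # 0.125
```

Both label encodings give the same loss because the binary labels are converted to the signed form first.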

Added LinearOperator.adjoint and LinearOperator.H (alias).

Expose CriticalSection in core as tf.CriticalSection.

Enhanced graphviz output.

Add opkernel templates for common table operations.

Fix callbacks not logging values in eager mode when a deferred-build model is used.

SignatureDef util functions have been deprecated.

Update Fingerprint64Map to use aliases.

Add legacy string flat hash map op kernels.

Add support for add_metric in the graph function mode.

Updated cosine similarity loss: removed the negation sign from cosine similarity.

Changed default for gradient accumulation for TPU embeddings to true.

Adds summary trace API for collecting graph and profile information.

The precision_mode argument to TrtGraphConverter is now case insensitive.
