### Step 1: set model ##################################################################################
os.environ['RWKV_FLOAT_MODE'] = 'bf16'  # 'bf16' (stable) or 'fp16' (may overflow when a large model is trained for a very long time; this can be addressed in the future)
### This is using DeepSpeed stage2 + FP16 ##############################################################
#
# Initializing a new model is currently slow, so I suggest this procedure for multi-GPU training: