DataFrame/DataSet 创建

读文件接口

import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
val df=spark.read.xxx

DataFrame/DataSet 读取数据源文档

spark.read 返回 DataFrameReader

spark.readStream 返回 DataStreamReader

后续读文件操作雷同，可以参考作者的 Structured Streaming 文章

RDD 转换成 DataFrame/DataSet

方式1：已知元数据

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()/toDS

方式2：未知元数据

val schemaString = "name age"
// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
// Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

优质内容筛选与推荐>>
1、@Scope("prototype")创建的对象为什么会一样？
2、第三次作业
3、日志分析第四章安装filebeat
4、SSM框架随笔
5、tensorflow实现简单的softmat分类器

0 │ 收藏 │ 举报

DataFrame/DataSet 创建

朋友将在看一看看到

分享想法到看一看